PROCEEDINGS

2002 Pacific Rim International Symposium on Dependable Computing

IEEE COMPUTER SOCIETY

Sponsored by
IEEE Computer Society Technical Committee on Fault-Tolerant Computing
IEICE Technical Group on Dependable Computing

HPE, Exh. 1013, p. 1
Proceedings

2002 Pacific Rim International Symposium on Dependable Computing

PRDC 2002

16-18 December 2002 • Tsukuba, Japan

Sponsored by
IEEE Computer Society Technical Committee on Fault-Tolerant Computing
IEICE Technical Group on Dependable Computing

In cooperation with
IFIP WG10.4 on Dependable Computing and Fault Tolerance
Jet Propulsion Laboratory

IEEE COMPUTER SOCIETY
Los Alamitos, California • Washington • Brussels • Tokyo

Copyright © 2002 by The Institute of Electrical and Electronics Engineers, Inc.
All rights reserved

Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries may photocopy beyond the limits of US copyright law, for private use of patrons, those articles in this volume that carry a code at the bottom of the first page, provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.

Other copying, reprint, or republication requests should be addressed to: IEEE Copyrights Manager, IEEE Service Center, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331.

The papers in this book comprise the proceedings of the meeting mentioned on the cover and title page. They reflect the authors' opinions and, in the interests of timely dissemination, are published as presented and without change. Their inclusion in this publication does not necessarily constitute endorsement by the editors, the IEEE Computer Society, or the Institute of Electrical and Electronics Engineers, Inc.

IEEE Computer Society Order Number PR01852
ISBN 0-7695-1852-4
Library of Congress Number 2002113942

Additional copies may be ordered from:

IEEE Computer Society
Customer Service Center
10662 Los Vaqueros Circle
P.O. Box 3014
Los Alamitos, CA 90720-1314
Tel: +1 800 272 6657
Fax: +1 714 821 4641
http://computer.org/
csbooks@computer.org

IEEE Service Center
445 Hoes Lane
P.O. Box 1331
Piscataway, NJ 08855-1331
Tel: +1 732 981 0060
Fax: +1 732 981 9667
http://shop.ieee.org/store/
customer-service@ieee.org

IEEE Computer Society
Asia/Pacific Office
Watanabe Bldg., 1-4-2
Minami-Aoyama
Minato-ku, Tokyo 107-0062
JAPAN
Tel: +81 3 3408 3118
Fax: +81 3 3408 3553
tokyo.ofc@computer.org

Editorial production by Anne Jacobs
Cover art production by Joe Daigle/Studio Productions
Printed in the United States of America by The Printing House
Table of Contents
2002 Pacific Rim International Symposium on Dependable Computing

General Chair's Message .......... ix
Program Co-Chairs' Message .......... x
Organizing Committee .......... xi
Steering Committee .......... xii
Program Committee .......... xiii
Reviewers .......... xiv

Keynote Speech (I)

Fault Tolerance in Autonomic Computing Environment .......... 3
    Y. Tohma

Session 1A: Dependable Distributed Systems (I)

Formal Specification and Verification of a Group Membership Protocol
for an Intrusion-Tolerant Group Communication System .......... 9
    H. V. Ramasamy, M. Cukier, and W. H. Sanders
Asynchronous Active Replication in Three-tier Distributed Systems .......... 19
    R. Baldoni, C. Marchetti, and S. Tucci Piergiovanni

Session 1B: Checkpointing

On Characteristics of DEF Communication-Induced Checkpointing Protocols .......... 29
    J. Tsai and J.-W. Lin
A Low Overhead Checkpointing Protocol for Mobile Computing Systems .......... 37
    C.-Y. Lin, S.-C. Wang, S.-Y. Kuo, and I.-Y. Chen
The Cost of Checkpointing, Logging and Recovery for the Mobile Agent Systems .......... 45
    H. Kim, H. Y. Yeom, T. Park, and H. Park

Session 2A: Dependable Systems (I)

Analysis of the Effects of Real and Injected Software Faults: Linux as a Case Study .......... 51
    T. Jarboui, J. Arlat, Y. Crouzet, K. Kanoun, and T. Marteau
Principles of Multi-Level Reflection for Fault Tolerant Architectures .......... 59
    F. Taiani, J.-C. Fabre, and M.-O. Killijian
Hardware/Software Co-Reliability of Configurable Digital Systems .......... 67
    M. Choi, N. Park, Y. Kim, and F. Lombardi
Highly Fault-Tolerant FPGA Processor by Degrading Strategy .......... 75
    Y. Nakamura and K. Hiraki
Session 2B: Reliability/Dependability Analysis

Using Software Implemented Fault Inserter in Dependability Analysis .......... 81
    P. Gawkowski and J. Sosnowski
Analyzing Network Reliability with Imperfect Nodes using OBDD .......... 89
    F.-M. Yeh, H.-Y. Lin, and S.-Y. Kuo
Reliability Analysis of Grid Computing Systems .......... 97
    Y. S. Dai, M. Xie, and K. L. Poh
A Control Theory Approach for Analyzing the Effects of Data Errors
in Safety-Critical Control Systems .......... 105
    O. Askerdal, M. Gäfvert, M. Hiller, and N. Suri

Keynote Speech (II)

Caveat Emptor: Making Grid Services Dependable from the Client Side .......... 117
    M. Livny and D. Thain

Session 3A: Dependable Distributed Systems (II)

An Introduction to the Renaming Problem .......... 121
    M. Raynal
Passive Replication Schemes in AQuA .......... 125
    Y. Ren, P. Rubel, M. Seri, M. Cukier, W. H. Sanders, and T. Courtney
Detecting Feature Interactions in Telecommunication Services with a SAT
Solver .......... 131
    T. Tsuchiya, M. Nakamura, and T. Kikuno

Session 3B: Dependable Networks

Fault-Tolerant Properties of Generalized Hierarchical Completely-Connected
Networks .......... 137
    T. Takabatake, M. Kitakami, and H. Ito
Energy Efficient and Robust Multicast Protocol for Mobile Ad Hoc Networks .......... 145
    S. Moh, C. Yu, B. Lee, and H. Y. Youn
Configurable PC Clusters Using a Hierarchical Complete-Connection-Based
Switching Network .......... 153
    N. Tsuda

Keynote Speech (III)

Perspectives on Dependable Computing for Solar System Exploration .......... 163
    L. Alkalai

vi
Session 4A: Dependable Systems (II)

Enhancing Access Control with SysGuard, a Reference Monitor Supporting
Portable and Composable Kernel Module .......... 167
    Y. Shinjo, K. Eiraku, A. Suzuki, K. Itano, and C. Pu
An Adaptive High-Low Water Mark Destage Algorithm for Cached RAID5 .......... 177
    Y. J. Nam and C. Park
A Packet-Based Caching Proxy with Loss Recovery for Video Streaming .......... 185
    K.-S. Hsueh and S.-D. Wang

Session 4B: Dependability Evaluation and Verification

Reliability Evaluation of Multi-State Systems Subject to Imperfect
Coverage Using OBDD .......... 193
    Y.-R. Chang, S. V. Amari, and S.-Y. Kuo
Characterization of Operating Systems Behavior in the Presence of Faulty
Drivers through Software Fault Emulation .......... 201
    J. Durães and H. Madeira
Level Oriented Formal Model for Asynchronous Circuit Verification and
Its Efficient Analysis Method .......... 210
    T. Kitai, Y. Oguro, T. Yoneda, E. Mercer, and C. Myers

Session 5A: Consensus in Distributed Systems

Consensus in Synchronous Systems: A Concise Guided Tour .......... 221
    M. Raynal
A Consensus Algorithm for Synchronous Distributed Systems using Mobile Agent .......... 229
    J. Cao, X. Wang, S. Lo, and S. K. Das

Session 5B: Coding

Evaluation of Convolutional-Code-Based FEC under Limited Recovery Time
and Its Application to Real-Time Transmission .......... 239
    M. Arai, H. Kurosu, S. Fukumoto, and K. Iwasaki
A Class of Random Multiple Bits in a Byte Error Correcting (St/bEC) Codes
for Semiconductor Memory Systems .......... 247
    G. Umanesan and E. Fujiwara

Session 6A: Dependable Database

Providing Persistence for Sensor Streams with Light Neighbor WAL .......... 257
    H. Kawashima, M. Toyama, M. Imai, and Y. Anzai
Definition of Faultloads Based on Operator Faults for DBMS Recovery
Benchmarking .......... 265
    M. Vieira and H. Madeira

vii
Session 6B: Test

An Alternative Method of Generating Tests for Path Delay Faults Using
N-Detection Test Sets .......... 275
    H. Takahashi, K. K. Saluja, and Y. Takamatsu
Enhancing Delay Fault Testability for Iterative Logic Arrays .......... 283
    S.-K. Lu and C.-H. Yeh

Session 7A: Neural Network

Comparison with Defect Compensation Methods for Feed-Forward Neural
Networks .......... 293
    K. Takahashi, K. Yamamori, S. Horiguchi, and I. Yoshihara
An Extreme Value Injection Approach with Reduced Learning Time to Make
MLNs Multiple-Weight-Fault Tolerant .......... 301
    I. Takanami and Y. Oyama

Session 7B: Practical Reports

Event Log Based Dependability Analysis of Windows NT and 2K Systems .......... 311
    C. Simache, M. Kaaniche, and A. Saidane
Using VHDL-Based Fault Injection to Exercise Error Detection Mechanisms
in the Time-Triggered Architecture .......... 316
    J. Gracia, D. Gil, J. C. Baraza, and P. J. Gil
Formal Verification of a Pipelined Processor with New Memory Hierarchy
Using a Commercial Model Checker .......... 321
    H. Nakamura, T. Arai, and M. Fujita

Author Index .......... 325
An Adaptive High-Low Water Mark Destage Algorithm for Cached RAID5

Young Jin Nam and Chanik Park
Department of Computer Science and Engineering/PIRL
Pohang University of Science and Technology
Kyungbuk, Republic of Korea
{yjnam,cipark}@postech.ac.kr

Abstract

The High-Low Water Mark destage (HLWM) algorithm is widely used to enable a cached RAID5 to flush dirty data from its write cache to disks. It activates and deactivates a destaging process based on two time-invariant thresholds which are determined by cache occupancy levels. However, the opportunity exists to improve I/O throughput by adaptively changing the thresholds. This paper proposes an adaptive HLWM algorithm which dynamically changes its thresholds according to a varying I/O workload. The two thresholds are defined as the product of the changing rates of the cache occupancy level and the times required to fill and empty the cache. Performance evaluations with a cached RAID5 simulator reveal that the proposed algorithm outperforms the HLWM algorithm in terms of read response time, write cache hit ratio, and disk utilization.

1. Introduction

Despite its prevalence, RAID5 [4] has suffered from a small write problem, which refers to the phenomenon that RAID5 requires four disk accesses to write a single disk block: old parity/data reads and new parity/data writes. To overcome the small write problem in RAID5, several solutions have been proposed [1, 2, 5, 6, 7]. One utilizes a non-volatile write cache [1]. Since the service of a write request entails only writing data onto the write cache, it can be performed quickly. The written (i.e., dirty) data in the cache should be flushed into physical disks when the cache occupancy level arrives at a pre-defined threshold. The operation of flushing dirty data is called a destaging process, which is usually initiated by a destage scheduler or a destage algorithm.

The desirable properties that a destage algorithm encompasses are as follows. First, it has no effect on the response time of read requests. This implies that the destaging process is initiated when read requests are no longer in service. Second, it keeps the cache occupancy level as high as possible to increase the write cache hit ratio. In addition, keeping the cache occupancy level high may reduce the number of disk I/Os required for destaging. Third, it should avoid cache overflow, i.e., maintain a sufficient number of clean or free cache entries. Observe that a trade-off should be made to pursue the last two properties at the same time. In addition, the following two issues ought to be taken into consideration to meet the properties noted above: how to elaborately determine the thresholds that manage the destaging process (e.g., starting or stopping destaging), and how to decide which dirty cache entries are to be destaged. This paper mainly emphasizes the first issue: how to efficiently determine the destaging thresholds.

Two destage algorithms have been available in terms of manipulating destaging thresholds: the High-Low Water Mark (HLWM) algorithm [1] and the Linear Threshold (LT) algorithm [7]. The HLWM algorithm is based on two destaging thresholds: the High Water Mark (HWM) and the Low Water Mark (LWM). It starts the destage process when the current cache occupancy level goes up to the HWM. Conversely, it stops destaging when the cache occupancy level goes down to the LWM. The LT algorithm defines more than two thresholds. As the current cache occupancy level reaches each higher threshold, it accelerates the destaging process by including more dirty entries in a single destaging write operation. Even though this algorithm outperforms the HLWM algorithm, it requires a complicated operation to compute the destaging costs for all the dirty entries at every disk head move. Observe that the HLWM and LT algorithms define their destaging thresholds as time-invariant cache occupancy levels; that is, the thresholds remain unchanged in the presence of a time-varying I/O workload. However, it is desirable to set the HWM and LWM to higher levels under a light I/O workload in order to increase the write cache hit ratio. Conversely, it is required that the HWM and LWM be set to lower levels under a heavy I/O workload, so as to prepare a sufficient number of free or clean cache entries. In sum, we claim that the opportunity exists to improve the I/O throughput by adaptively changing the HWM and LWM according to a varying I/O workload.

0-7695-1852-4/02 $17.00 © 2002 IEEE

177
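The start/stop behavior of the HLWM algorithm described above is a simple hysteresis controller. A minimal sketch follows; the function names and structure are our illustration, not code from the paper:

```python
# Sketch of the static High-Low Water Mark (HLWM) destage policy described
# above. The thresholds are fixed fractions of the write-cache size S;
# destaging starts when occupancy reaches the HWM and stops once it falls
# to the LWM (hysteresis). All names here are illustrative.

def make_hlwm(S, p_H=0.7, p_L=0.3):
    hwm, lwm = p_H * S, p_L * S   # time-invariant thresholds
    destaging = False

    def should_destage(occupancy):
        nonlocal destaging
        if not destaging and occupancy >= hwm:
            destaging = True       # start flushing dirty entries
        elif destaging and occupancy <= lwm:
            destaging = False      # stop once the LWM is reached
        return destaging

    return should_destage

should_destage = make_hlwm(S=1024)  # e.g., a 1024-entry write cache
```

The hysteresis gap between the two marks is what prevents the destage process from rapidly toggling on and off around a single threshold.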
In this paper, we propose an Adaptive High-Low Water Mark destage (AHLWM) algorithm which dynamically determines its thresholds according to a varying I/O workload. The thresholds are defined as the product of the changing rates of the cache occupancy level and the times required to fill and empty the cache. We show that the AHLWM algorithm outperforms the HLWM algorithm in terms of read response time, write cache hit ratio, and disk utilization via performance evaluations on our cached RAID5 simulator. This paper is organized as follows: Section 2 describes the AHLWM algorithm, and Section 3 evaluates its performance by comparison with the HLWM algorithm and then discusses an issue related to a bursty I/O workload. Finally, we conclude the paper in Section 4.

2. Adaptive High-Low Water Mark Destage Algorithm

This section describes the Adaptive High-Low Water Mark destage algorithm and then gives illustrative examples to explain its operation under various workload environments and configuration parameters.

Algorithm Description: Figure 1 compares the High-Low Water Mark destage (briefly, HLWM) algorithm and the Adaptive High-Low Water Mark destage (AHLWM) algorithm in terms of managing their thresholds. Typically,
[Figure 1 omitted: cache occupancy level versus time, with (a) the static thresholds of the HLWM algorithm and (b) the time-varying thresholds of the AHLWM algorithm.]

Figure 1. Comparisons of two thresholds of the HLWM algorithm and the AHLWM algorithm
the HLWM algorithm defines two static threshold values, the High Water Mark (HWM) and the Low Water Mark (LWM), by using cache occupancy levels, as shown in Figure 1(a). The HWM (H_static) and LWM (L_static) are written as

    H_static = p_H S,    (1)
    L_static = p_L S,    (2)

where S is the cache size and 0 < p_L < p_H < 1. Observe that both the HWM and LWM of the HLWM algorithm remain unchanged during its operation under a time-variant I/O workload.

178
While the HWM and LWM are time-invariant in the HLWM algorithm, they are time-variant in the AHLWM algorithm, as shown in Figure 1(b). The HWM and LWM of the AHLWM algorithm can be written as follows:

    H_adaptive(t) = p_H(t) S,    (3)
    L_adaptive(t) = p_L(t) S.    (4)

Next, we will derive the time-variant p_H(t) and p_L(t). Let λ_B and μ_B represent base increasing and decreasing rates of the cache occupancy level. Note that λ_B is related to the increasing rate of the cache occupancy level, which accounts for the write cache hit and is not directly proportional to the arrival rate of write I/O requests. Let the current increasing and decreasing rates of the cache occupancy level at time t be λ(t) and μ(t). In addition, we define an effective decreasing rate of the cache occupancy level at time t, μ̃(t) = μ(t) − λ(t). Similarly, μ̃_B = μ_B − λ_B. Then, let us define T_H and T_L, which represent the times required to fill and empty the cache, as follows:

    T_H = (1 − p_H) S / λ_B,    (5)
    T_L = p_L S / μ̃_B.    (6)

Given λ(t) and μ̃(t), the following equations should be valid:

    λ(t) T_H + p_H(t) S = S,    (7)
    p_L(t) S − T_L μ̃(t) = 0.    (8)

Finally, the time-variant p_H(t) and p_L(t) are written as:

    p_H(t) = max{0, 1 − (1 − p_H) λ(t)/λ_B},    (9)
    p_L(t) = min{1, max{0, p_L μ̃(t)/μ̃_B}}.    (10)

Note that p_H(t) can be smaller than p_L(t) under a rare condition. To avoid this, the following condition has to be added in its implementation: p_L(t) = p_H(t) if p_H(t) < p_L(t).
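Equations (3)-(10) can be computed directly from the monitored rates. The function below is our own sketch of that computation, not the authors' code; the variable names (lam_B, mu_B, and so on) are ours:

```python
# Sketch of the adaptive thresholds of Equations (3)-(10). lam_B/mu_B are
# the base increasing/decreasing rates of the cache occupancy level;
# lam_t/mu_t are the monitored current rates at time t.

def adaptive_thresholds(S, p_H, p_L, lam_B, mu_B, lam_t, mu_t):
    mu_eff_B = mu_B - lam_B                 # base effective decreasing rate
    mu_eff_t = mu_t - lam_t                 # effective decreasing rate at t
    # Equations (9)-(10), clamped to [0, 1]:
    pH_t = max(0.0, 1.0 - (1.0 - p_H) * lam_t / lam_B)
    pL_t = min(1.0, max(0.0, p_L * mu_eff_t / mu_eff_B))
    if pH_t < pL_t:                         # rare case noted in the text
        pL_t = pH_t
    return pH_t * S, pL_t * S               # H_adaptive(t), L_adaptive(t)
```

With the current rates equal to the base rates, the thresholds reduce to the static HLWM values p_H S and p_L S, which is exactly the degeneration stated by Lemma 1 below.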
Next, we will look into the useful properties of the AHLWM algorithm associated with the current increasing and decreasing rates. To begin, we can easily see that the following lemma holds without proof.
Lemma 1. Given λ(t) = λ_B and μ(t) = μ_B at any time t, the AHLWM algorithm degenerates to the HLWM algorithm.

Note that the two thresholds of the AHLWM algorithm are dynamically determined as λ(t) and μ(t) change. However, the following inequality should be met at any given time t for normal operation: H_adaptive(t) ≥ L_adaptive(t).

Theorem 1. H_adaptive(t) ≥ L_adaptive(t) at any time t if

    (1 − p_H)k ≤ λ_B ≤ μ_B − p_L k,    (11)

for a given μ̃(t) = k, where 0 ≤ k ≤ μ_B / (1 − p_H + p_L).

Proof: Basically, Equation (11) can be derived from the two boundary conditions in Equations (9) and (10). It is easily seen that H_adaptive(t) ≥ L_adaptive(t) when λ(t) = λ_B and μ(t) = μ_B, according to Lemma 1. Next, we consider the two boundary conditions: 1) p_L(t)|μ̃(t)=k ≤ p_H(t)|λ(t)=0 = 1, and 2) p_H(t)|λ(t)=k ≥ p_L(t)|μ̃(t)=0 = 0. We can derive the relation λ_B ≤ μ_B − p_L k from boundary condition 1. Also, the relation (1 − p_H)k ≤ λ_B can be obtained from boundary condition 2. Since (1 − p_H)k ≤ μ_B − p_L k, we have k ≤ μ_B / (1 − p_H + p_L). This implies that the maximum value of k, i.e., of μ̃(t), is confined to μ_B / (1 − p_H + p_L), which we call μ̃_B^max. Also, we can show that the two boundary conditions hold when k = μ̃_B^max. Thus, this theorem follows. ∎

Corollary 1. H_adaptive(t) ≥ L_adaptive(t) holds at any time t if λ_B = ((1 − p_H) / (1 − p_H + p_L)) μ_B.

Proof: If μ̃(t) = μ̃_B^max in Equation (11) of Theorem 1, the upper and lower bounds of λ_B converge to the single point λ_B = ((1 − p_H) / (1 − p_H + p_L)) μ_B, which always makes the condition hold. Thus, this corollary follows. ∎

Corollary 2. Given 0 ≤ μ̃(t) ≤ μ̃_B^max, H_adaptive(t) ≥ L_adaptive(t) holds at any time t if

    (1 − p_H) μ̃_B^max ≤ λ_B ≤ (1 − p_L) μ̃_B^max.    (12)

Proof: Equation (12) can be obtained by simply replacing k with μ̃_B^max in Equation (11) of Theorem 1. Thus, this corollary follows. ∎

Illustrative Examples: Figure 2 demonstrates the expected adaptiveness of the HWM and LWM by the AHLWM algorithm according to the variation of an I/O workload. In Figure 2(a), H_B decreases by Δ_HL with a higher λ(t) = λ_B + δ_HL, mainly attributed to a higher increasing rate of dirty data in the write cache. Conversely, H_B goes up by Δ_HH when the increasing rate λ(t) is lowered by δ_HH. Similarly, L_B increases by Δ_LH with a higher μ̃(t) = μ̃_B + δ_LH, attributed to a higher rate of destaging data to disks or a lower increasing rate of dirty data in the write cache, as shown in Figure 2(b). Conversely, L_B goes down by Δ_LL with the decrease of μ̃(t) by δ_LL. Observe that Equations (5)-(8) remain valid under any I/O workload level.

[Figure 2 omitted: cache occupancy level over time; (a) Variation of HWM, (b) Variation of LWM.]

Figure 2. Variations of HWM and LWM according to different levels of I/O workloads, where S represents the cache size

3. Performance Evaluations under Cached RAID5

For performance evaluations of the AHLWM algorithm, we have implemented it within a cached RAID5 simulator which employs a write cache. Then, we compare the performance of the proposed algorithm with the HLWM algorithm. We begin by describing the cached RAID5 simulator.

3.1. Simulation Environments

The cached RAID5 simulator consists of a host I/O request queue, a host I/O scheduler, RAID I/O processes, a volatile read cache, a non-volatile write cache, disk I/O request queues, disk I/O schedulers, a destage process which includes different types of destage algorithms, and a synthetic I/O workload generator, as shown in Figure 3.

Overall I/O Operations: Host I/O requests issued by a group of workload generators are first placed into the
[Figure 3 omitted: block diagram of the simulator, including the non-volatile write cache.]

Figure 3. The internal structure of the cached RAID5 simulator
host I/O request queue in a FIFO manner. Next, the host I/O scheduler receives a host I/O request from the host I/O request queue. Then, it spawns either a RAID read I/O process or a RAID write I/O process, according to the request type. The RAID read I/O process searches for its data first in the write cache and then in the read cache. If valid data resides in either of the two caches (a read cache hit), the cached data is immediately transferred to the host with no disk I/O. If not (a read cache miss), the data is fetched from the physical disks into the read cache and is then returned to the host. The RAID write I/O process writes its data simply into the write cache and notifies its host of the completion of the current write request service. Later, the destage process flushes the newly written data, called dirty data (entries), into physical disks. The behavior of the destage process will be described later in detail.

Each individual disk has a pair of queues: the host request queue and the destage request queue. While disk I/O requests made from a read request are placed into the host request queue of a disk, the destage read/write I/O requests issued by the destage process are lined up in the destage queue of the disk. Note that a host request queue is given a higher priority in service than a destage queue, i.e., the requests in the destage request queue are processed only while the host request queue is empty. In the event of a write cache overflow, the destage requests are placed into the host request queue. Each disk also contains its own I/O scheduler. Even though a few efficient disk scheduling algorithms are currently available, we employ a simple FIFO disk scheduling algorithm because the performance of disk scheduling is not our concern.
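The strict two-queue priority just described can be sketched as follows; the class and method names are our illustration, not code from the simulator:

```python
# Sketch of the per-disk queueing discipline described above: each disk
# keeps a host request queue and a destage request queue, and destage
# requests are served only while the host queue is empty.
from collections import deque

class DiskQueues:
    def __init__(self):
        self.host = deque()      # disk I/Os from host read requests
        self.destage = deque()   # destage read/write I/Os (lower priority)

    def next_request(self):
        if self.host:            # strict priority: host requests first
            return self.host.popleft()
        if self.destage:         # destage served only when host queue empty
            return self.destage.popleft()
        return None              # disk idle
```

This is the mechanism that keeps destaging off the read path: a destage I/O is dispatched only when no host request is waiting at that disk.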
Write I/O Operations and Destaging Process: To begin, Figure 4 depicts a state diagram of a cache entry within the write cache. Initially, the state of each entry is NOT_USED and INVALID. When the entry is allocated for storing dirty data from the host, its state is changed to IN_USE and VALID/DIRTY. If new dirty data arrives at the same location before the destaging of the dirty data, it is overwritten to the entry. This is called a write cache hit. Conversely, if new dirty data arrives during the destaging of the dirty data, it is stored separately by allocating a different cache entry. At this point, its state is changed to IN_USE and VALID/DDIRTY. Subsequent dirty data at the same location will be overwritten, as previously. Once the destaging of a dirty data entry completes, its state is changed to NOT_USED and VALID/CLEAN. In addition, if an associated cache entry in the IN_USE and VALID/DDIRTY state exists, then its state is changed to IN_USE and VALID/DIRTY.

[Figure 4 omitted: state transitions triggered by a new request on a write cache miss, write cache hits (overwrite), a new request arriving during destaging, and destage completion.]

Figure 4. State diagram of cache entries within the write cache by the destage process
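The transitions described above and drawn in Figure 4 can be captured in a small transition table. The state and event names below are ours, chosen to mirror the text:

```python
# Sketch of the write-cache entry life cycle described above (Figure 4).
# DDIRTY marks data that arrived while an older version of the same block
# was being destaged; on destage completion it becomes plain DIRTY.

TRANSITIONS = {
    ("NOT_USED/INVALID", "write_miss"):           "IN_USE/DIRTY",
    ("NOT_USED/CLEAN",   "write_miss"):           "IN_USE/DIRTY",
    ("IN_USE/DIRTY",     "write_hit"):            "IN_USE/DIRTY",   # overwrite
    ("IN_USE/DIRTY",     "write_during_destage"): "IN_USE/DDIRTY",
    ("IN_USE/DDIRTY",    "write_hit"):            "IN_USE/DDIRTY",  # overwrite
    ("IN_USE/DIRTY",     "destage_done"):         "NOT_USED/CLEAN",
    ("IN_USE/DDIRTY",    "destage_done"):         "IN_USE/DIRTY",   # still dirty
}

def step(state, event):
    # Look up the next state; unknown (state, event) pairs raise KeyError.
    return TRANSITIONS[(state, event)]
```

The last transition is the key one: data written during a destage survives as an ordinary dirty entry once the older version has reached disk.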
`Next, we explain the destage process which flushesdirty
`write cache entries into physical disks in the background.
`Note that the old data and parity blocks are read into the
`read cache. However, those entries are not locked in the
`read cache until the computation of its new parity blocks
`completes. This implies that these blocks can be replaced
`with others before the new parity is calculated. Newly com-
`putedparity blocksare overwrittenintothe old parity blocks
`in the read cache, not the write cache. Also, the write of
`the new parity blocks are not delayed.
`In the presence of
`write cache overflows implying that no available cache en-
`tries exist for the newly arrived data from the host, subse-
`quent write requests bypass the write cache, i.e., the write
`requests are treated as in the traditional RAIDS. Finally,
`host read requests are given a higherpriority than destage
`requests. Then, the destage processstarts when the current
`occupancyreaches its HWM andthen stops whenthecur-
`rent occupancy goes downto its LWM.The destage process
`first scans the write cache andthenselects a dirty cache en-
`
`180
`
`HPE, Exh. 1013, p. 11
`
`HPE, Exh. 1013, p. 11
`
`

`

`try not currently in destaging. Then, it decidesif the old data
`and parity blocks of the entry reside in the read cache.If not,
`it issues destage read requests for the missing blocks. Af-
`ter readingall the old data and parity blocks by the destage
`reads, it computes the new parity and generates two destage
`write requests for the new data and parity. Note that a pos-
`sibility exists that the old blocks in the read cachecan bere-
`placed while the destage read requests of missing blocksare
`in progress. Thus, the destage process goes through another
`checking process to determine whether all the blocks are
`still there. When the destage write requests are completed,
`the state of the dirty cache entry is changedinto clean, as
`previously described.
`
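Each destage thus performs the RAID5 read-modify-write that the Introduction called the small-write problem: read the old data and parity, fold in the new data, and write the new data and parity back. A minimal sketch of the parity computation (our illustration, not the paper's code):

```python
# Sketch of the RAID5 small-write parity update performed per destage:
# new_parity = old_parity XOR old_data XOR new_data. Reading the two old
# blocks and writing the two new ones accounts for the four disk accesses
# mentioned in the Introduction. Byte strings stand in for disk blocks.

def xor_blocks(a: bytes, b: bytes) -> bytes:
    # XOR two equal-sized blocks byte by byte.
    return bytes(x ^ y for x, y in zip(a, b))

def small_write_parity(old_data: bytes, old_parity: bytes,
                       new_data: bytes) -> bytes:
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)
```

Because XOR is its own inverse, removing the old data from the parity and adding the new data yields exactly the parity of the updated stripe, without touching the other data disks.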
I/O Workload and Simulation Parameters: Basically, we employ synthetically generated I/O workloads throughout our performance evaluations. Each workload generator models an I/O application which issues an I/O request and then waits for its completion in a synchronous manner. We assume that each I/O application generates 8 KB read or write requests with one percent spatial locality, where the read and write ratio is set to 4:1. Using realistic I/O workloads is part of our future work.

Table 1 shows the configuration parameters used in our cached RAID5 simulator. The simulated RAID system consists of five (5) disks, four of which are used for data and one for parity. The size of the read cache is two times larger than that of the write cache. The cache entry size of each cache is set to 8 KB.

Table 1. Configuration parameters of our cached RAID5 simulator

    RAID Size:            4.19 GB
    Number of Disks:      5 (data: 4, parity: 1)
    Read Cache Size:      16 MB (2048 cache entries)
    Write Cache Size:     8 MB (1024 cache entries)
    Cache Entry Size:     8 KB
    Read/Write Ratio:     4:1
    p_H:                  0.7
    p_L:                  0.3
    μ_B:                  70 IOPS
    λ_B:                  49 IOPS (0.7 μ̃_B^max)
    Monitoring Interval:  400 msec

In the experiments, the static cache occupancy levels for the HWM and LWM are set to 0.7 and 0.3 of the whole write cache size, respectively. Note that such static threshold values are generally determined in an ad-hoc manner, independently of a time-varying I/O workload; recall that this is the main motivation of this paper. The configuration of the base increasing and decreasing rates of the cache occupancy level takes the following two steps. First, the base decreasing rate μ̃_B^max was empirically obtained from our simulator under the condition that no intervention of host read requests exists; thus, we obtained 70 IOPS with 8 KB requests. Second, the inequality of Corollary 2 is applied to the obtained μ̃_B^max value in order to decide the value of λ_B. Our experiments take the largest value of λ_B by multiplying 1 − p_L (= 0.7) with 70. Finally, the current changing rates of the cache occupancy level are monitored every 400 msec. Also, each independent disk features 512-byte sectors, 1962 cylinders, 19 tracks per cylinder, 72 sectors per track, and 4002 rpm.

3.2. Simulation Results

The I/O performances of the HLWM and AHLWM algorithms are evaluated in terms of the response time of reads, write cache hit ratio, and disk utilization with an increase of I/O throughput. The response time of read requests has become more significant, because fast writes are achieved with non-volatile memory. This response time can be affected by disk utilization, especially the number of destage I/Os, and write cache overflows, as well as the read cache hit ratio. A high level of write cache occupancy enhances the write cache hit ratio, in turn reducing the response time of reads. Disk utilization depends mainly on the number of read and destage I/O requests. Of these, only destage I/O requests are directly related to the underlying destage algorithm.

I/O Performance with Different I/O Workloads: Figures 5-6 show the variation of read response time, write cache hit ratio, the number of disk I/Os issued by destage requests (briefly, destage I/Os), and disk utilization of the HLWM algorithm and the AHLWM algorithm with the increase of I/O throughput. Throughout all experiments, the base increasing rate of the cache occupancy λ_B and the base decreasing rate of the cache occupancy μ_B are set to 49 (0.7 μ_B) IOs/sec and 70 IOs/sec. As shown in Figures 5-6, the AHLWM algorithm outperforms the HLWM algorithm. The performance differences are maximized especially under a light I/O workload, in that its λ(t) is relatively lower than the base value. As the I/O workload increases, however, the performances of both algorithms become almost equal, even though the AHLWM algorithm works slightly better than the HLWM algorithm. The performance
[Figure 5 omitted: two plots versus I/O throughput (70-150 IOPS).]

Figure 5. Variations of read response time and write hit ratio of the HLWM algorithm and the AHLWM algorithm with the increase of I/O throughput: (a) read response time, (b) write cache hit ratio, where p_H = 0.7, p_L = 0.3, λ_B = 49 IOs/sec, and μ_B = 70 IOs/sec
enhancements of read response time in the AHLWM algorithm are primarily due to its time-varying, adaptive thresholds, i.e., both the HWM and LWM at each time are determined based on the current increasing and decreasing rates of the cache occupancy level. Under a light I/O workload, the AHLWM
