PROCEEDINGS

2002 Pacific Rim International Symposium on Dependable Computing

Sponsored by
IEEE Computer Society Technical Committee on Fault-Tolerant Computing
IEICE Technical Group on Dependable Computing

IEEE Computer Society

Proceedings

2002 Pacific Rim International Symposium on
Dependable Computing

PRDC 2002

16-18 December 2002 • Tsukuba, Japan

Sponsored by
IEEE Computer Society Technical Committee on Fault-Tolerant Computing
IEICE Technical Group on Dependable Computing

In cooperation with
IFIP WG10.4 on Dependable Computing and Fault Tolerance
Jet Propulsion Laboratory

IEEE Computer Society
Los Alamitos, California • Washington • Brussels • Tokyo

Copyright © 2002 by The Institute of Electrical and Electronics Engineers, Inc.
All rights reserved.

Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries may photocopy beyond the limits of US copyright law, for private use of patrons, those articles in this volume that carry a code at the bottom of the first page, provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.

Other copying, reprint, or republication requests should be addressed to: IEEE Copyrights Manager, IEEE Service Center, 445 Hoes Lane, P.O. Box 133, Piscataway, NJ 08855-1331.

The papers in this book comprise the proceedings of the meeting mentioned on the cover and title page. They reflect the authors' opinions and, in the interests of timely dissemination, are published as presented and without change. Their inclusion in this publication does not necessarily constitute endorsement by the editors, the IEEE Computer Society, or the Institute of Electrical and Electronics Engineers, Inc.

IEEE Computer Society Order Number PRO1852
ISBN 0-7695-1852-4
Library of Congress Number 2002113942

Additional copies may be ordered from:

IEEE Computer Society
Customer Service Center
10662 Los Vaqueros Circle
P.O. Box 3014
Los Alamitos, CA 90720-1314
Tel: +1 800 272 6657
Fax: +1 714 821 4641
http://computer.org/
csbooks@computer.org

IEEE Service Center
445 Hoes Lane
P.O. Box 1331
Piscataway, NJ 08855-1331
Tel: +1 732 981 0060
Fax: +1 732 981 9667
http://shop.ieee.org/store/
customer-service@ieee.org

IEEE Computer Society
Asia/Pacific Office
Watanabe Bldg., 1-4-2
Minami-Aoyama
Minato-ku, Tokyo 107-0062
JAPAN
Tel: +81 3 3408 3118
Fax: +81 3 3408 3553
tokyo.ofc@computer.org

Editorial production by Anne Jacobs
Cover art production by Joe Daigle/Studio Productions
Printed in the United States of America by The Printing House

Table of Contents
2002 Pacific Rim International Symposium on Dependable Computing

General Chair's Message .......... ix
Program Co-Chairs' Message .......... x
Organizing Committee .......... xi
Steering Committee .......... xii
Program Committee .......... xiii
Reviewers .......... xiv

Keynote Speech (I)
Fault Tolerance in Autonomic Computing Environment .......... 3
Y. Tohma

Session 1A: Dependable Distributed Systems (I)
Formal Specification and Verification of a Group Membership Protocol
for an Intrusion-Tolerant Group Communication System .......... 9
H. V. Ramasamy, M. Cukier, and W. H. Sanders
Asynchronous Active Replication in Three-tier Distributed Systems .......... 19
R. Baldoni, C. Marchetti, and S. Tucci Piergiovanni

Session 1B: Checkpointing
On Characteristics of DEF Communication-Induced Checkpointing Protocols .......... 29
J. Tsai and J.-W. Lin
A Low Overhead Checkpointing Protocol for Mobile Computing Systems .......... 37
C.-Y. Lin, S.-C. Wang, S.-Y. Kuo, and I.-Y. Chen
The Cost of Checkpointing, Logging and Recovery for the Mobile Agent Systems .......... 45
H. Kim, H. Y. Yeom, T. Park, and H. Park

Session 2A: Dependable Systems (I)
Analysis of the Effects of Real and Injected Software Faults: Linux as a Case Study .......... 51
T. Jarboui, J. Arlat, Y. Crouzet, K. Kanoun, and T. Marteau
Principles of Multi-Level Reflection for Fault Tolerant Architectures .......... 59
F. Taiani, J.-C. Fabre, and M.-O. Killijian
Hardware/Software Co-Reliability of Configurable Digital Systems .......... 67
M. Choi, N. Park, Y. Kim, and F. Lombardi
Highly Fault-Tolerant FPGA Processor by Degrading Strategy .......... 75
Y. Nakamura and K. Hiraki

Session 2B: Reliability/Dependability Analysis
Using Software Implemented Fault Inserter in Dependability Analysis .......... 81
P. Gawkowski and J. Sosnowski
Analyzing Network Reliability with Imperfect Nodes using OBDD .......... 89
F.-M. Yeh, H.-Y. Lin, and S.-Y. Kuo
Reliability Analysis of Grid Computing Systems .......... 97
Y. S. Dai, M. Xie, and K. L. Poh
A Control Theory Approach for Analyzing the Effects of Data Errors
in Safety-Critical Control Systems .......... 105
O. Askerdal, M. Gäfvert, M. Hiller, and N. Suri

Keynote Speech (II)
Caveat Emptor: Making Grid Services Dependable from the Client Side .......... 117
M. Livny and D. Thain

Session 3A: Dependable Distributed Systems (II)
An Introduction to the Renaming Problem .......... 121
M. Raynal
Passive Replication Schemes in AQuA .......... 125
Y. Ren, P. Rubel, M. Seri, M. Cukier, W. H. Sanders, and T. Courtney
Detecting Feature Interactions in Telecommunication Services with a SAT
Solver .......... 131
T. Tsuchiya, M. Nakamura, and T. Kikuno

Session 3B: Dependable Networks
Fault-Tolerant Properties of Generalized Hierarchical Completely-Connected
Networks .......... 137
T. Takabatake, M. Kitakami, and H. Ito
Energy Efficient and Robust Multicast Protocol for Mobile Ad Hoc Networks .......... 145
S. Moh, C. Yu, B. Lee, and H. Y. Youn
Configurable PC Clusters Using a Hierarchical Complete-Connection-Based
Switching Network .......... 153
N. Tsuda

Keynote Speech (III)
Perspectives on Dependable Computing for Solar System Exploration .......... 163
L. Alkalai

Session 4A: Dependable Systems (II)
Enhancing Access Control with SysGuard, a Reference Monitor Supporting
Portable and Composable Kernel Module .......... 167
Y. Shinjo, K. Eiraku, A. Suzuki, K. Itano, and C. Pu
An Adaptive High-Low Water Mark Destage Algorithm for Cached RAID5 .......... 177
Y. J. Nam and C. Park
A Packet-Based Caching Proxy with Loss Recovery for Video Streaming .......... 185
K.-S. Hsueh and S.-D. Wang

Session 4B: Dependability Evaluation and Verification
Reliability Evaluation of Multi-State Systems Subject to Imperfect
Coverage Using OBDD .......... 193
Y.-R. Chang, S. V. Amari, and S.-Y. Kuo
Characterization of Operating Systems Behavior in the Presence of Faulty
Drivers through Software Fault Emulation .......... 201
J. Durães and H. Madeira
Level Oriented Formal Model for Asynchronous Circuit Verification and
Its Efficient Analysis Method .......... 210
T. Kitai, Y. Oguro, T. Yoneda, E. Mercer, and C. Myers

Session 5A: Consensus in Distributed Systems
Consensus in Synchronous Systems: A Concise Guided Tour .......... 221
M. Raynal
A Consensus Algorithm for Synchronous Distributed Systems using Mobile Agent .......... 229
J. Cao, X. Wang, S. Lo, and S. K. Das

Session 5B: Coding
Evaluation of Convolutional-Code-Based FEC under Limited Recovery Time
and Its Application to Real-Time Transmission .......... 239
M. Arai, H. Kurosu, S. Fukumoto, and K. Iwasaki
A Class of Random Multiple Bits in a Byte Error Correcting (St/bEC) Codes
for Semiconductor Memory Systems .......... 247
G. Umanesan and E. Fujiwara

Session 6A: Dependable Database
Providing Persistence for Sensor Streams with Light Neighbor WAL .......... 257
H. Kawashima, M. Toyama, M. Imai, and Y. Anzai
Definition of Faultloads Based on Operator Faults for DBMS Recovery
Benchmarking .......... 265
M. Vieira and H. Madeira

Session 6B: Test
An Alternative Method of Generating Tests for Path Delay Faults Using
N-Detection Test Sets .......... 275
H. Takahashi, K. K. Saluja, and Y. Takamatsu
Enhancing Delay Fault Testability for Iterative Logic Arrays .......... 283
S.-K. Lu and C.-H. Yeh

Session 7A: Neural Network
Comparison with Defect Compensation Methods for Feed-Forward Neural
Networks .......... 293
K. Takahashi, K. Yamamori, S. Horiguchi, and I. Yoshihara
An Extreme Value Injection Approach with Reduced Learning Time to Make
MLNs Multiple-Weight-Fault Tolerant .......... 301
I. Takanami and Y. Oyama

Session 7B: Practical Reports
Event Log Based Dependability Analysis of Windows NT and 2K Systems .......... 311
C. Simache, M. Kaaniche, and A. Saidane
Using VHDL-Based Fault Injection to Exercise Error Detection Mechanisms
in the Time-Triggered Architecture .......... 316
J. Gracia, D. Gil, J. C. Baraza, and P. J. Gil
Formal Verification of a Pipelined Processor with New Memory Hierarchy
Using a Commercial Model Checker .......... 321
H. Nakamura, T. Arai, and M. Fujita

Author Index .......... 325

An Adaptive High-Low Water Mark Destage Algorithm for Cached RAID5

Young Jin Nam and Chanik Park
Department of Computer Science and Engineering/PIRL
Pohang University of Science and Technology
Kyungbuk, Republic of Korea
{yjnam,cipark}@postech.ac.kr

0-7695-1852-4/02 $17.00 © 2002 IEEE

Abstract

The High-Low Water Mark destage (HLWM) algorithm is widely used to enable a cached RAID5 to flush dirty data from its write cache to disks. It activates and deactivates a destaging process based on two time-invariant thresholds which are determined by cache occupancy levels. However, the opportunity exists to improve I/O throughput by adaptively changing the thresholds. This paper proposes an adaptive HLWM algorithm which dynamically changes its thresholds according to a varying I/O workload. Two thresholds are defined as the multiplication of changing rates of the cache occupancy level and the time required to fill and empty the cache. Performance evaluations with a cached RAID5 simulator reveal that the proposed algorithm outperforms the HLWM algorithm in terms of read response time, write cache hit ratio, and disk utilization.

1. Introduction

Despite its prevalence, RAID5 [4] has suffered from a small write problem, which refers to the phenomenon that RAID5 requires four disk accesses to write a single disk block: old parity/data reads and new parity/data writes. To overcome the small write problem in RAID5, several solutions have been proposed [1, 2, 5, 6, 7]. One utilizes the non-volatile write cache [1]. Since the service of a write request entails only writing data onto the write cache, it can be performed quickly. The written (i.e., dirty) data in the cache should be flushed to physical disks when the cache occupancy level arrives at a pre-defined threshold. The operation of flushing dirty data is called a destaging process, which is usually initiated by a destage scheduler or a destage algorithm.

The desirable properties that a destage algorithm encompasses are as follows. First, it should have no effect on the response time of read requests. This implies that the destaging process is initiated when read requests are no longer in service. Second, it should keep the cache occupancy level as high as possible to increase the write cache hit ratio. In addition, keeping the cache occupancy level high may reduce the number of disk I/Os required for destaging. Third, it should avoid cache overflow, i.e., maintain a sufficient number of clean or free cache entries. Observe that a trade-off should be made to pursue the last two properties at the same time. In addition, the following two issues ought to be taken into consideration to meet the properties noted above: how to elaborately determine the thresholds that manage the destaging process (e.g., starting or stopping destaging), and how to decide which dirty cache entries are to be destaged. This paper mainly emphasizes the first issue: how to efficiently determine the destaging thresholds.

Two destage algorithms have been available in terms of manipulating destaging thresholds: the High-Low Water Mark (HLWM) algorithm [1] and the Linear Threshold (LT) algorithm [7]. The HLWM algorithm is based on two destaging thresholds: the High Water Mark (HWM) and the Low Water Mark (LWM). It starts the destage process when the current cache occupancy level goes up to the HWM. Conversely, it stops destaging when the cache occupancy level goes down to the LWM. The LT algorithm defines more than two thresholds. As the current cache occupancy level reaches each higher threshold, it accelerates the destaging process by including more dirty entries in a single destaging write operation. Even though the LT algorithm outperforms the HLWM algorithm, it requires a complicated operation to compute the destaging costs for all the dirty entries at every disk head move. Observe that the HLWM and LT algorithms define their destaging thresholds as time-invariant cache occupancy levels; that is, the thresholds remain unchanged in the presence of a time-varying I/O workload. However, it is desirable to set the HWM and LWM to higher levels under a light I/O workload in order to increase the write cache hit ratio. Conversely, it is required that the HWM and LWM be set to lower levels under a heavy I/O workload, so as to prepare a sufficient number of free or clean cache entries. In sum, we claim that the opportunity exists to improve the I/O throughput by adaptively changing the HWM and LWM according to a varying I/O workload.

In this paper, we propose an Adaptive High-Low Water Mark destage (AHLWM) algorithm which dynamically determines its thresholds according to a varying I/O workload. The thresholds are defined as the multiplication of changing rates of the cache occupancy level and the time required to fill and empty the cache. We show, via performance evaluations on our cached RAID5 simulator, that the AHLWM algorithm outperforms the HLWM algorithm in terms of read response time, write cache hit ratio, and disk utilization. This paper is organized as follows: Section 2 describes the AHLWM algorithm, Section 3 evaluates its performance by comparison with the HLWM algorithm and then discusses an issue related to a bursty I/O workload, and Section 4 concludes this paper.

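Since the HLWM policy is the baseline throughout the paper, a minimal sketch may help. The function below is our own illustration of the on/off hysteresis just described; the names and polling structure are assumptions, not code from the paper:

```python
# Minimal sketch of the HLWM on/off policy described above;
# all names are illustrative, not from the paper.

def hlwm_destage_active(occupancy, active, hwm, lwm):
    """Start destaging at HWM, stop at LWM, otherwise keep state."""
    if not active and occupancy >= hwm:
        return True    # cache filled up to HWM: activate destaging
    if active and occupancy <= lwm:
        return False   # cache drained down to LWM: deactivate
    return active
```
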
2. Adaptive High-Low Water Mark Destage Algorithm

This section describes the Adaptive High-Low Water Mark destage algorithm and then gives illustrative examples to explain its operation under various workload environments and configuration parameters.

Algorithm Description: Figure 1 compares a High-Low Water Mark destage (briefly HLWM) algorithm and an Adaptive High-Low Water Mark destage (AHLWM) algorithm in terms of managing their thresholds.

[Figure 1. Comparison of the two thresholds of the HLWM algorithm and the AHLWM algorithm: (a) the HLWM algorithm, with fixed levels $\rho_H S$ and $\rho_L S$; (b) the AHLWM algorithm, with time-varying levels $\rho_H(t)S$ and $\rho_L(t)S$]

Typically, the HLWM algorithm defines two static threshold values, the High Water Mark (HWM) and the Low Water Mark (LWM), by using cache occupancy levels, as shown in Figure 1(a). The HWM ($H_{static}$) and LWM ($L_{static}$) are written as

$$H_{static} = \rho_H S, \qquad (1)$$
$$L_{static} = \rho_L S, \qquad (2)$$

where $S$ is the cache size and $0 < \rho_L < \rho_H < 1$. Observe that both HWM and LWM of the HLWM algorithm remain unchanged during its operation under a time-variant I/O workload.

While the HWM and LWM are time-invariant in the HLWM algorithm, they are time-variant in the AHLWM algorithm, as shown in Figure 1(b). The HWM and LWM of the AHLWM algorithm can be written as follows:

$$H_{adaptive}(t) = \rho_H(t)\,S, \qquad (3)$$
$$L_{adaptive}(t) = \rho_L(t)\,S. \qquad (4)$$

Next, we will derive the time-variant $\rho_H(t)$ and $\rho_L(t)$. Let $\lambda_B$ and $\mu_B$ represent the base increasing and decreasing rates of the cache occupancy level. Note that $\lambda_B$ is related to the increasing rate of the cache occupancy level, which accounts for the write cache hit and is not directly proportional to the arrival rate of write I/O requests. Let the current increasing and decreasing rates of the cache occupancy level at time $t$ be $\lambda(t)$ and $\mu(t)$. In addition, we define an effective decreasing rate of the cache occupancy level at time $t$, $\alpha(t) = \mu(t) - \lambda(t)$. Similarly, $\alpha_B = \mu_B - \lambda_B$. Then, let us define $T_H$ and $T_L$, which represent the times required to fill and empty the cache, as follows:

$$T_H = \frac{(1 - \rho_H)\,S}{\lambda_B}, \qquad (5)$$
$$T_L = \frac{\rho_L\,S}{\alpha_B}. \qquad (6)$$

Given $\lambda(t)$ and $\alpha(t)$, the following equations should be valid:

$$\lambda(t)\,T_H + \rho_H(t)\,S = S, \qquad (7)$$
$$\rho_L(t)\,S - T_L\,\alpha(t) = 0. \qquad (8)$$

Finally, the time-variant $\rho_H(t)$ and $\rho_L(t)$ are written as:

$$\rho_H(t) = \max\Big\{0,\; 1 - (1 - \rho_H)\,\frac{\lambda(t)}{\lambda_B}\Big\}, \qquad (9)$$
$$\rho_L(t) = \min\Big\{1,\; \max\Big\{0,\; \rho_L\,\frac{\alpha(t)}{\alpha_B}\Big\}\Big\}. \qquad (10)$$

Note that $\rho_H(t)$ can be smaller than $\rho_L(t)$ under a rare condition. To avoid this, the following condition has to be added in its implementation: $\rho_L(t) = \rho_H(t)$ if $\rho_H(t) < \rho_L(t)$.
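Equations (5)-(10) translate directly into a small routine. The sketch below is ours (the paper gives no code); it assumes rates are measured in cache entries per second and that $\lambda_B > 0$ and $\alpha_B > 0$:

```python
# Sketch of the adaptive threshold computation of Eqs. (5)-(10).
# Names are ours; the paper does not give an implementation.

def adaptive_thresholds(lam_t, mu_t, lam_B, mu_B, rho_H, rho_L, S):
    """Return (HWM, LWM) in cache entries for the current rates.

    lam_t, mu_t:  current increasing/decreasing rates (entries/sec)
    lam_B, mu_B:  base increasing/decreasing rates (lam_B > 0)
    rho_H, rho_L: static occupancy fractions (0 < rho_L < rho_H < 1)
    S:            write cache size in entries
    """
    alpha_t = mu_t - lam_t   # effective decreasing rate at time t
    alpha_B = mu_B - lam_B   # effective base decreasing rate (> 0)
    # Eq. (9): shrink the head-room (1 - rho_H) as the cache fills faster.
    rho_H_t = max(0.0, 1.0 - (1.0 - rho_H) * lam_t / lam_B)
    # Eq. (10): scale rho_L by how fast the cache currently drains.
    rho_L_t = min(1.0, max(0.0, rho_L * alpha_t / alpha_B))
    # Guard from the paper: force LWM <= HWM in the rare bad case.
    rho_L_t = min(rho_L_t, rho_H_t)
    return rho_H_t * S, rho_L_t * S
```
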
Next, we will look into the useful properties of the AHLWM algorithm associated with the current increasing and decreasing rates. To begin, we can easily see that the following lemma holds without proof.

Lemma 1 Given $\lambda(t) = \lambda_B$ and $\mu(t) = \mu_B$ at any time $t$, the AHLWM algorithm degenerates to the HLWM algorithm.

Note that the two thresholds of the AHLWM algorithm are dynamically determined as $\lambda(t)$ and $\mu(t)$ change. However, the following inequality should be met at any given time $t$ for normal operations: $H_{adaptive}(t) \geq L_{adaptive}(t)$.

Theorem 1 $H_{adaptive}(t) \geq L_{adaptive}(t)$ at any time $t$ if

$$(1 - \rho_H)\,k \;\leq\; \lambda_B \;\leq\; \mu_B - \rho_L\,k, \qquad (11)$$

for a given $\mu(t) = k$, where $0 < k \leq \mu_B^{max}$.

Proof: Basically, Equation (11) can be derived from two boundary conditions on Equations (9) and (10). It is easily seen that $H_{adaptive}(t) \geq L_{adaptive}(t)$ when $\lambda(t) = \lambda_B$ and $\mu(t) = \mu_B$, according to Lemma 1. Next, we consider the two boundary conditions: 1) at $\lambda(t) = 0$ we have $\rho_H(t) = 1$, so $\rho_L(t) \leq 1$ must hold; and 2) at $\alpha(t) = 0$ we have $\rho_L(t) = 0$, so $\rho_H(t) \geq 0$ must hold. From boundary condition 1 we can derive the relation $\lambda_B \leq \mu_B - \rho_L k$, and from boundary condition 2 the relation $(1 - \rho_H)k \leq \lambda_B$. Since $(1 - \rho_H)k \leq \mu_B - \rho_L k$, we have $k \leq \mu_B/(1 - \rho_H + \rho_L)$, which implies that the maximum value of $k$, i.e., of $\mu(t)$, is confined to $\mu_B/(1 - \rho_H + \rho_L)$; we call this value $\mu_B^{max}$. Also, we can show that the two boundary conditions hold when $k = \mu_B^{max}$. Thus, this theorem follows. ∎

Corollary 1 $H_{adaptive}(t) \geq L_{adaptive}(t)$ holds at any time $t$ with $\mu(t) = \mu_B^{max}$ if $\lambda_B = (1 - \rho_H)\,\mu_B / (1 - \rho_H + \rho_L)$.

Proof: If $\mu(t) = \mu_B^{max}$ in Equation (11) of Theorem 1, the upper and lower bounds of $\lambda_B$ converge into the single point $\lambda_B = (1 - \rho_H)\,\mu_B / (1 - \rho_H + \rho_L)$, which always makes the condition hold. Thus, this corollary follows. ∎

Corollary 2 Given $0 < \mu(t) \leq \mu_B$, $H_{adaptive}(t) \geq L_{adaptive}(t)$ holds at any time $t$ if

$$(1 - \rho_H)\,\mu_B \;\leq\; \lambda_B \;\leq\; (1 - \rho_L)\,\mu_B. \qquad (12)$$

Proof: Equation (12) can be obtained by simply replacing $k$ with $\mu_B$ in Equation (11) of Theorem 1. Thus, this corollary follows. ∎

Illustrative Examples: Figure 2 demonstrates the expected adaptiveness of HWM and LWM by the AHLWM algorithm according to the variation of an I/O workload. In Figure 2(a), the HWM drops below its base level $H_B$ when $\lambda(t)$ rises above $\lambda_B$, which is mainly attributed to a higher increasing rate of dirty data in the write cache; conversely, the HWM rises above $H_B$ when $\lambda(t)$ falls below $\lambda_B$. Similarly, as shown in Figure 2(b), the LWM rises above its base level $L_B$ when $\alpha(t)$ rises above $\alpha_B$, which is attributed to a higher rate of destaging data to disks or a lower increasing rate of dirty data in the write cache; conversely, the LWM drops below $L_B$ when $\alpha(t)$ decreases. Observe that Equations (5)-(8) remain valid under any I/O workload level.

[Figure 2. Variations of HWM and LWM according to different levels of I/O workloads, where $S$ represents the cache size: (a) variation of HWM, (b) variation of LWM]

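As a quick numeric check of Corollary 2 (ours, using the parameter values that appear later in Table 1), the admissible range of $\lambda_B$ can be computed directly:

```python
# Sanity check of the Corollary 2 bounds with the Table 1 values.
# This is our own check, not code from the paper.
rho_H, rho_L, mu_B = 0.7, 0.3, 70.0
low = (1 - rho_H) * mu_B     # 21.0 IOPS
high = (1 - rho_L) * mu_B    # 49.0 IOPS
assert low <= 49.0 <= high   # the paper's choice lambda_B = 49 IOPS fits
print(f"admissible lambda_B: [{low:.0f}, {high:.0f}] IOPS")
```

Indeed, Section 3 picks the largest admissible value, $\lambda_B = (1 - \rho_L)\,\mu_B = 49$ IOPS.
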
3. Performance Evaluations under Cached RAID5

For performance evaluations of the AHLWM algorithm, we have implemented it within a cached RAID5 simulator which employs a write cache. Then, we compare the performance of the proposed algorithm with that of the HLWM algorithm. We begin by describing the cached RAID5 simulator.

3.1. Simulation Environments

The cached RAID5 simulator consists of a host I/O request queue, a host I/O scheduler, RAID I/O processes, a volatile read cache, a non-volatile write cache, disk I/O request queues, disk I/O schedulers, a destage process which includes different types of destage algorithms, and a synthetic I/O workload generator, as shown in Figure 3.

[Figure 3. The internal structure of the cached RAID5 simulator]

Overall I/O Operations: Host I/O requests issued by a group of workload generators are first placed into the host I/O request queue in an FIFO manner.

Next, the host I/O scheduler receives a host I/O request from the host I/O request queue. Then, it spawns either a RAID read I/O process or a RAID write I/O process, according to the request type. The RAID read I/O process searches for its data first in the write cache and then in the read cache. If valid data resides in either of the two caches (a read cache hit), the cached data is immediately transferred to hosts with no disk I/O. If not (a read cache miss), the data is fetched from physical disks into the read cache and is then returned to hosts. The RAID write I/O process writes its data simply into the write cache and notifies its host of the completion of the current write request service. Later, the destage process flushes the newly written data, called dirty data (entries), into physical disks. The behavior of the destage process will be described later in detail.

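As a concrete rendering of the read path just described, the sketch below (our own; the cache and disk objects are hypothetical stand-ins) searches the write cache first, then the read cache, and goes to disk only on a miss:

```python
# Illustrative sketch of the RAID read path described above.
# write_cache/read_cache are dict-like; disks is a hypothetical object.

def raid_read(block, write_cache, read_cache, disks):
    if block in write_cache:        # newest (dirty) copy wins
        return write_cache[block]
    if block in read_cache:         # read cache hit: no disk I/O
        return read_cache[block]
    data = disks.read(block)        # read cache miss: fetch from disk
    read_cache[block] = data        # keep the fetched block cached
    return data
```
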
Each individual disk has a pair of queues, which are the host request queue and the destage request queue. While disk I/O requests made from a read request are placed into the host request queue of a disk, the destage read/write I/O requests issued by the destage process are lined up in the destage queue of a disk. Note that a host request queue is given a higher priority in service than a destage queue, i.e., the requests in the destage request queue are processed only while the host request queue is empty. In the event of a write cache overflow, the destage requests are placed into the host request queue. Each disk also contains its own I/O scheduler. Even though a few efficient disk scheduling algorithms are currently available, we employ a simple FIFO disk scheduling algorithm because the performance of disk scheduling is not our concern.

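The two-queue priority can be captured in a few lines. The class below is a sketch under our own naming; both queues are FIFO, as in the simulator:

```python
# Sketch of the per-disk scheduling described above: host requests
# have strict priority; destage requests run only when that queue
# is empty. On write cache overflow, destage requests are promoted.
from collections import deque

class DiskScheduler:
    def __init__(self):
        self.host_q = deque()       # disk I/Os from host read requests
        self.destage_q = deque()    # destage read/write I/Os

    def submit(self, req, from_destage, cache_overflow=False):
        if from_destage and not cache_overflow:
            self.destage_q.append(req)
        else:
            self.host_q.append(req)

    def next_request(self):
        if self.host_q:                  # strict priority for host I/Os
            return self.host_q.popleft()
        if self.destage_q:               # served only when host_q is empty
            return self.destage_q.popleft()
        return None
```
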
Write I/O Operations and Destaging Process: To begin, Figure 4 depicts a state diagram of a cache entry within the write cache. Initially, the state of each entry is NOT_USED and INVALID. When the entry is allocated for storing dirty data from the host, its state is changed to IN_USE and VALID/DIRTY. If new dirty data arrives to the same location before the destaging of the dirty data, it is overwritten to the entry. This is called a write cache hit. Conversely, if new dirty data arrives during the destaging of the dirty data, it is stored separately by allocating a different cache entry. At this point, its state is changed to IN_USE and VALID/DDIRTY. Subsequent dirty data at the same location will be overwritten, as previously. Once the destaging of a dirty data/entry completes, its state is changed to NOT_USED and VALID/CLEAN. In addition, if an associated cache entry of the IN_USE and VALID/DDIRTY state exists, then its state is changed to IN_USE and VALID/DIRTY.

[Figure 4. State diagram of cache entries within the write cache managed by the destage process. Transitions: a new request on a write cache miss allocates an entry (IN_USE, VALID/DIRTY); a write cache hit overwrites in place; a new request arriving during destaging allocates a separate entry (IN_USE, VALID/DDIRTY); destage done returns an entry to NOT_USED, VALID/CLEAN]

Next, we explain the destage process, which flushes dirty write cache entries into physical disks in the background. Note that the old data and parity blocks are read into the read cache. However, those entries are not locked in the read cache until the computation of the new parity blocks completes. This implies that these blocks can be replaced with others before the new parity is calculated. Newly computed parity blocks are overwritten into the old parity blocks in the read cache, not the write cache. Also, the write of the new parity blocks is not delayed. In the presence of write cache overflows, implying that no available cache entries exist for the newly arrived data from the host, subsequent write requests bypass the write cache, i.e., the write requests are treated as in the traditional RAID5. Finally, host read requests are given a higher priority than destage requests. The destage process starts when the current occupancy reaches its HWM and then stops when the current occupancy goes down to its LWM. The destage process first scans the write cache and then selects a dirty cache entry not currently in destaging.

Then, it decides whether the old data and parity blocks of the entry reside in the read cache. If not, it issues destage read requests for the missing blocks. After reading all the old data and parity blocks by the destage reads, it computes the new parity and generates two destage write requests for the new data and parity. Note that a possibility exists that the old blocks in the read cache can be replaced while the destage read requests of missing blocks are in progress. Thus, the destage process goes through another checking process to determine whether all the blocks are still there. When the destage write requests are completed, the state of the dirty cache entry is changed into clean, as previously described.

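The entry life cycle of Figure 4 can be summarized as a small state machine. The encoding below is our own sketch; the states and transitions follow the prose above:

```python
# Sketch of the write-cache entry states of Figure 4 (our encoding).
from enum import Enum

class EntryState(Enum):
    NOT_USED_INVALID = "initial"
    IN_USE_DIRTY = "dirty data awaiting destage"
    IN_USE_DDIRTY = "dirty data written during destaging"
    NOT_USED_CLEAN = "destaging completed"

def on_host_write(state, destaging):
    """State of the entry holding the newest data after a host write."""
    if state in (EntryState.NOT_USED_INVALID, EntryState.NOT_USED_CLEAN):
        return EntryState.IN_USE_DIRTY      # write cache miss: allocate
    if state == EntryState.IN_USE_DIRTY and destaging:
        # the paper allocates a separate entry and marks it DDIRTY
        return EntryState.IN_USE_DDIRTY
    return state                            # write cache hit: overwrite

def on_destage_done(state):
    if state == EntryState.IN_USE_DDIRTY:
        return EntryState.IN_USE_DIRTY      # DDIRTY twin becomes DIRTY
    return EntryState.NOT_USED_CLEAN        # destaged entry is now clean
```
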
Table 1. Configuration parameters of our cached RAID5 simulator

    RAID Size             4.19 GB
    Number of Disks       5 (data: 4, parity: 1)
    Read Cache Size       16 MB (2048 cache entries)
    Write Cache Size      8 MB (1024 cache entries)
    Cache Entry Size      8 KB
    Read/Write Ratio      4:1
    $\rho_H$              0.7
    $\rho_L$              0.3
    $\mu_B$               70 IOPS
    $\lambda_B$           49 IOPS ($0.7\,\mu_B$)
    Monitoring Interval   400 msec

I/O Workload and Simulation Parameters: Basically, we employ synthetically generated I/O workloads throughout our performance evaluations. Each workload generator models an I/O application which issues an I/O request and then waits for its completion in a synchronous manner. We assume that each I/O application generates 8KB read or write requests with one percent of spatial locality, where the read and write ratio is set to 4:1. Using realistic I/O workloads is left as future work.

Table 1 shows the configuration parameters used in our cached RAID5 simulator. The simulated RAID system consists of five (5) disks, four of which are used for data and one for parity. The read cache is larger than the write cache: the size of the read cache is two times larger than that of the write cache. The cache entry size of each cache is set to 8KB.

In the experiments, the static cache occupancy levels for HWM and LWM are set to 0.7 and 0.3 of the whole write cache size, respectively. Note that such static threshold values are generally determined in an ad-hoc manner, independently of a time-varying I/O workload; recall that this is the main motivation of this paper. The configuration of the base increasing and decreasing rates of the cache occupancy level takes the following two steps. First, the base decreasing rate $\mu_B$ was empirically obtained from our simulator under the condition that no intervention of host read requests exists; thus, we obtained 70 IOPS with 8KB requests. Second, the inequality of Corollary 2 is applied to the obtained $\mu_B$ value in order to decide the value of $\lambda_B$. Our experiments take the largest admissible value of $\lambda_B$ by multiplying $1 - \rho_L$ (= 0.7) by 70. Finally, the current changing rates of the cache occupancy level are monitored every 400 msec. Also, each disk features 512-byte sectors, 1962 cylinders, 19 tracks per cylinder, 72 sectors per track, and 4002 rpm.

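The 400 msec monitoring interval implies a simple counter-based rate estimator. The sketch below is ours, with assumed names; it produces the $\lambda(t)$ and $\mu(t)$ samples that the adaptive threshold computation of Section 2 consumes:

```python
# Sketch of the per-interval rate monitoring implied above.
INTERVAL_SEC = 0.4  # monitoring interval from Table 1 (400 msec)

def sample_rates(entries_written, entries_destaged, interval=INTERVAL_SEC):
    """Estimate lambda(t) and mu(t) from counters over one interval."""
    lam_t = entries_written / interval    # cache-filling rate (entries/sec)
    mu_t = entries_destaged / interval    # cache-draining rate (entries/sec)
    return lam_t, mu_t
```
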
3.2. Simulation Results

The I/O performances of the HLWM and AHLWM algorithms are evaluated in terms of the response time of reads, write cache hit ratio, and disk utilization with an increase of I/O throughput. The response time of read requests has become more significant because fast writes are achieved with non-volatile memory. This response time can be affected by disk utilization, especially the number of destage I/Os, and write cache overflows, as well as the read cache hit ratio. A high write cache occupancy level enhances the write cache hit ratio, in turn reducing the response time of reads. Disk utilization depends mainly on the number of read and destage I/O requests. Of these, only destage I/O requests are directly related to an underlying destage algorithm.

I/O Performance with Different I/O Workloads: Figures 5-6 show the variation of read response time, write cache hit ratio, the number of disk I/Os issued by destage requests (briefly called destage I/Os), and disk utilization of the HLWM algorithm and the AHLWM algorithm with the increase of I/O throughput. Throughout all experiments, the base increasing rate of the cache occupancy $\lambda_B$ and the base decreasing rate of the cache occupancy $\mu_B$ are set to 49 ($0.7\,\mu_B$) IOs/sec and 70 IOs/sec. As shown in Figures 5-6, the AHLWM algorithm outperforms the HLWM algorithm. The performance differences are maximized especially under a light I/O workload, in which $\lambda(t)$ is relatively lower than the base value. As the I/O workload increases, however, the performances of both algorithms become almost equal, even though the AHLWM algorithm still works slightly better than the HLWM algorithm.

[Figure 5. Variations of read response time and write hit ratio of the HLWM algorithm and the AHLWM algorithm with the increase of I/O throughput: (a) read response time, (b) write cache hit ratio, where $\rho_H = 0.7$, $\rho_L = 0.3$, $\lambda_B = 49$ IOs/sec, and $\mu_B = 70$ IOs/sec]

The performance enhancements of read response time in the AHLWM algorithm are primarily due to its time-varying, adaptive thresholds, i.e., both HWM and LWM at each time are determined based on the current increasing and decreasing rates of the cache occupancy level. Under a light I/O workload, the AHLWM