PROCEEDINGS

2002 Pacific Rim International Symposium on Dependable Computing

16-18 December 2002
Tsukuba City, Ibaraki, Japan

IEEE Computer Society

Sponsored by
IEEE Computer Society Technical Committee on Fault-Tolerant Computing
IEICE Technical Group on Dependable Computing
Proceedings

2002 Pacific Rim International Symposium on Dependable Computing

PRDC 2002

16-18 December 2002 • Tsukuba, Japan

Sponsored by
IEEE Computer Society Technical Committee on Fault-Tolerant Computing
IEICE Technical Group on Dependable Computing

In cooperation with
IFIP WG10.4 on Dependable Computing and Fault Tolerance
Jet Propulsion Laboratory

IEEE Computer Society
Los Alamitos, California • Washington • Brussels • Tokyo
Copyright © 2002 by The Institute of Electrical and Electronics Engineers, Inc.
All rights reserved

Copyright and Reprint Permissions: Abstracting is permitted with credit to the source. Libraries may photocopy beyond the limits of US copyright law, for private use of patrons, those articles in this volume that carry a code at the bottom of the first page, provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.

Other copying, reprint, or republication requests should be addressed to: IEEE Copyrights Manager, IEEE Service Center, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331.

The papers in this book comprise the proceedings of the meeting mentioned on the cover and title page. They reflect the authors' opinions and, in the interests of timely dissemination, are published as presented and without change. Their inclusion in this publication does not necessarily constitute endorsement by the editors, the IEEE Computer Society, or the Institute of Electrical and Electronics Engineers, Inc.

IEEE Computer Society Order Number PR01852
ISBN 0-7695-1852-4
Library of Congress Number 2002113942

Additional copies may be ordered from:

IEEE Computer Society
Customer Service Center
10662 Los Vaqueros Circle
P.O. Box 3014
Los Alamitos, CA 90720-1314
Tel: +1 800 272 6657
Fax: +1 714 821 4641
http://computer.org/
csbooks@computer.org

IEEE Service Center
445 Hoes Lane
P.O. Box 1331
Piscataway, NJ 08855-1331
Tel: +1 732 981 0060
Fax: +1 732 981 9667
http://shop.ieee.org/store/
customer-service@ieee.org

IEEE Computer Society
Asia/Pacific Office
Watanabe Bldg., 1-4-2
Minami-Aoyama
Minato-ku, Tokyo 107-0062
JAPAN
Tel: +81 3 3408 3118
Fax: +81 3 3408 3553
tokyo.ofc@computer.org

Editorial production by Anne Jacobs
Cover art production by Joe Daigle/Studio Productions
Printed in the United States of America by The Printing House
Table of Contents
2002 Pacific Rim International Symposium on Dependable Computing

General Chair's Message ... ix
Program Co-Chairs' Message ... x
Organizing Committee ... xi
Steering Committee ... xii
Program Committee ... xiii
Reviewers ... xiv

Keynote Speech (I)

Fault Tolerance in Autonomic Computing Environment ... 3
  Y. Tohma

Session 1A: Dependable Distributed Systems (I)

Formal Specification and Verification of a Group Membership Protocol for an Intrusion-Tolerant Group Communication System ... 9
  H. V. Ramasamy, M. Cukier, and W. H. Sanders
Asynchronous Active Replication in Three-tier Distributed Systems ... 19
  R. Baldoni, C. Marchetti, and S. Tucci Piergiovanni

Session 1B: Checkpointing

On Characteristics of DEF Communication-Induced Checkpointing Protocols ... 29
  J. Tsai and J.-W. Lin
A Low Overhead Checkpointing Protocol for Mobile Computing Systems ... 37
  C.-Y. Lin, S.-C. Wang, S.-Y. Kuo, and I.-Y. Chen
The Cost of Checkpointing, Logging and Recovery for the Mobile Agent Systems ... 45
  H. Kim, H. Y. Yeom, T. Park, and H. Park

Session 2A: Dependable Systems (I)

Analysis of the Effects of Real and Injected Software Faults: Linux as a Case Study ... 51
  T. Jarboui, J. Arlat, Y. Crouzet, K. Kanoun, and T. Marteau
Principles of Multi-Level Reflection for Fault Tolerant Architectures ... 59
  F. Taïani, J.-C. Fabre, and M.-O. Killijian
Hardware/Software Co-Reliability of Configurable Digital Systems ... 67
  M. Choi, N. Park, Y. Kim, and F. Lombardi
Highly Fault-Tolerant FPGA Processor by Degrading Strategy ... 75
  Y. Nakamura and K. Hiraki
Session 2B: Reliability/Dependability Analysis

Using Software Implemented Fault Inserter in Dependability Analysis ... 81
  P. Gawkowski and J. Sosnowski
Analyzing Network Reliability with Imperfect Nodes using OBDD ... 89
  F.-M. Yeh, H.-Y. Lin, and S.-Y. Kuo
Reliability Analysis of Grid Computing Systems ... 97
  Y. S. Dai, M. Xie, and K. L. Poh
A Control Theory Approach for Analyzing the Effects of Data Errors in Safety-Critical Control Systems ... 105
  O. Askerdal, M. Gafvert, M. Hiller, and N. Suri

Keynote Speech (II)

Caveat Emptor: Making Grid Services Dependable from the Client Side ... 117
  M. Livny and D. Thain

Session 3A: Dependable Distributed Systems (II)

An Introduction to the Renaming Problem ... 121
  M. Raynal
Passive Replication Schemes in AQuA ... 125
  Y. Ren, P. Rubel, M. Seri, M. Cukier, W. H. Sanders, and T. Courtney
Detecting Feature Interactions in Telecommunication Services with a SAT Solver ... 131
  T. Tsuchiya, M. Nakamura, and T. Kikuno

Session 3B: Dependable Networks

Fault-Tolerant Properties of Generalized Hierarchical Completely-Connected Networks ... 137
  T. Takabatake, M. Kitakami, and H. Ito
Energy Efficient and Robust Multicast Protocol for Mobile Ad Hoc Networks ... 145
  S. Moh, C. Yu, B. Lee, and H. Y. Youn
Configurable PC Clusters Using a Hierarchical Complete-Connection-Based Switching Network ... 153
  N. Tsuda

Keynote Speech (III)

Perspectives on Dependable Computing for Solar System Exploration ... 163
  L. Alkalai
Session 4A: Dependable Systems (II)

Enhancing Access Control with SysGuard, a Reference Monitor Supporting Portable and Composable Kernel Module ... 167
  Y. Shinjo, K. Eiraku, A. Suzuki, K. Itano, and C. Pu
An Adaptive High-Low Water Mark Destage Algorithm for Cached RAID5 ... 177
  Y. J. Nam and C. Park
A Packet-Based Caching Proxy with Loss Recovery for Video Streaming ... 185
  K.-S. Hsueh and S.-D. Wang

Session 4B: Dependability Evaluation and Verification

Reliability Evaluation of Multi-State Systems Subject to Imperfect Coverage Using OBDD ... 193
  Y.-R. Chang, S. V. Amari, and S.-Y. Kuo
Characterization of Operating Systems Behavior in the Presence of Faulty Drivers through Software Fault Emulation ... 201
  J. Duraes and H. Madeira
Level Oriented Formal Model for Asynchronous Circuit Verification and Its Efficient Analysis Method ... 210
  T. Kitai, Y. Oguro, T. Yoneda, E. Mercer, and C. Myers

Session 5A: Consensus in Distributed Systems

Consensus in Synchronous Systems: A Concise Guided Tour ... 221
  M. Raynal
A Consensus Algorithm for Synchronous Distributed Systems using Mobile Agent ... 229
  J. Cao, X. Wang, S. Lo, and S. K. Das

Session 5B: Coding

Evaluation of Convolutional-Code-Based FEC under Limited Recovery Time and Its Application to Real-Time Transmission ... 239
  M. Arai, H. Kurosu, S. Fukumoto, and K. Iwasaki
A Class of Random Multiple Bits in a Byte Error Correcting (St/bEC) Codes for Semiconductor Memory Systems ... 247
  G. Umanesan and E. Fujiwara

Session 6A: Dependable Database

Providing Persistence for Sensor Streams with Light Neighbor WAL ... 257
  H. Kawashima, M. Toyama, M. Imai, and Y. Anzai
Definition of Faultloads Based on Operator Faults for DBMS Recovery Benchmarking ... 265
  M. Vieira and H. Madeira
Session 6B: Test

An Alternative Method of Generating Tests for Path Delay Faults Using N-Detection Test Sets ... 275
  H. Takahashi, K. K. Saluja, and Y. Takamatsu
Enhancing Delay Fault Testability for Iterative Logic Arrays ... 283
  S.-K. Lu and C.-H. Yeh

Session 7A: Neural Network

Comparison with Defect Compensation Methods for Feed-Forward Neural Networks ... 293
  K. Takahashi, K. Yamamori, S. Horiguchi, and I. Yoshihara
An Extreme Value Injection Approach with Reduced Learning Time to Make MLNs Multiple-Weight-Fault Tolerant ... 301
  I. Takanami and Y. Oyama

Session 7B: Practical Reports

Event Log Based Dependability Analysis of Windows NT and 2K Systems ... 311
  C. Simache, M. Kaaniche, and A. Saidane
Using VHDL-Based Fault Injection to Exercise Error Detection Mechanisms in the Time-Triggered Architecture ... 316
  J. Gracia, D. Gil, J. C. Baraza, and P. J. Gil
Formal Verification of a Pipelined Processor with New Memory Hierarchy Using a Commercial Model Checker ... 321
  H. Nakamura, T. Arai, and M. Fujita

Author Index ... 325
An Adaptive High-Low Water Mark Destage Algorithm for Cached RAID5

Young Jin Nam and Chanik Park
Department of Computer Science and Engineering/PIRL
Pohang University of Science and Technology
Kyungbuk, Republic of Korea
{yjnam,cipark}@postech.ac.kr

Abstract

The High-Low Water Mark destage (HLWM) algorithm is widely used to enable a cached RAID5 to flush dirty data from its write cache to disks. It activates and deactivates a destaging process based on two time-invariant thresholds which are determined by cache occupancy levels. However, the opportunity exists to improve I/O throughput by adaptively changing the thresholds. This paper proposes an adaptive HLWM algorithm which dynamically changes its thresholds according to a varying I/O workload. The two thresholds are defined as the product of the changing rates of the cache occupancy level and the times required to fill and empty the cache. Performance evaluations with a cached RAID5 simulator reveal that the proposed algorithm outperforms the HLWM algorithm in terms of read response time, write cache hit ratio, and disk utilization.
1. Introduction

Despite its prevalence, RAID5 [4] has suffered from the small write problem, which refers to the phenomenon that RAID5 requires four disk accesses to write a single disk block: old parity/data reads and new parity/data writes. To overcome the small write problem in RAID5, several solutions have been proposed [1, 2, 5, 6, 7]. One utilizes a non-volatile write cache [1]. Since the service of a write request entails only writing data into the write cache, it can be performed quickly. The written (i.e., dirty) data in the cache should be flushed to physical disks when the cache occupancy level reaches a pre-defined threshold. The operation of flushing dirty data is called a destaging process, which is usually initiated by a destage scheduler or a destage algorithm.
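To make the four accesses concrete, the following sketch (ours, not from the paper; the disk object and its read/write interface are hypothetical) spells out the RAID5 small-write sequence, using the standard parity identity P_new = P_old xor D_old xor D_new:

    def xor(a: bytes, b: bytes) -> bytes:
        """Bytewise XOR of two equal-length blocks."""
        return bytes(x ^ y for x, y in zip(a, b))

    def raid5_small_write(disk, stripe, data_disk, parity_disk, new_data):
        """One RAID5 small write: four disk accesses for a single block."""
        old_data = disk.read(data_disk, stripe)        # access 1: old data read
        old_parity = disk.read(parity_disk, stripe)    # access 2: old parity read
        # Fold the change into the parity: P_new = P_old ^ D_old ^ D_new.
        new_parity = xor(old_parity, xor(old_data, new_data))
        disk.write(data_disk, stripe, new_data)        # access 3: new data write
        disk.write(parity_disk, stripe, new_parity)    # access 4: new parity write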
`The desirable properties that a destage algorithm encom-
`passes are as follows. First, it has no effect on the response
`time of read requests. This implies that the destaging pro-
`cess is initiated when read requests are no longer in service.
`
`Second, it keeps the cache occupancy level as high as pos-
`sible to increase the write cache hit ratio. In addition, keep-
`ing the cache occupancy level high may reduce the number
`of disk I/Os required for destaging. Third, it should avoid
`cache overflow, i.e., maintain a sufficient number of clean
`or free cache entries. Observe that a trade-off should be
`made to pursue the last two properties at the same time. In
`addition, the following two issues ought to be taken into
`consideration to meet the properties noted above: how to
`elaborately determine its thresholds to manage the destag-
`ing process (e.g., starting or stopping destaging), and how
`to decide which dirty cache entries are to be destaged. This
`paper mainly emphasizes the first issue: how to efficiently
`determine the destaging thresholds.
Two destage algorithms are available in terms of manipulating destaging thresholds: the High-Low Water Mark (HLWM) algorithm [1] and the Linear Threshold (LT) algorithm [7]. The HLWM algorithm is based on two destaging thresholds: the High Water Mark (HWM) and the Low Water Mark (LWM). It starts the destage process when the current cache occupancy level rises to the HWM. Conversely, it stops destaging when the cache occupancy level falls to the LWM. The LT algorithm defines more than two thresholds. As the current cache occupancy level reaches each higher threshold, it accelerates the destaging process by including more dirty entries in a single destaging write operation. Even though this algorithm outperforms the HLWM algorithm, it requires a complicated operation to compute the destaging costs for all the dirty entries at every disk head move. Observe that the HLWM and LT algorithms define their destaging thresholds as time-invariant cache occupancy levels; the thresholds remain unchanged in the presence of a time-varying I/O workload. However, it is desirable to set the HWM and LWM to higher levels under a light I/O workload in order to increase the write cache hit ratio. Conversely, the HWM and LWM should be set to lower levels under a heavy I/O workload, so as to prepare a sufficient number of free or clean cache entries. In sum, we claim that the opportunity exists to improve I/O throughput by adaptively changing the HWM and LWM according to a varying I/O workload.
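For reference, the HLWM start/stop rule amounts to a two-threshold hysteresis test. A minimal sketch (ours, with occupancy and the two marks counted in cache entries):

    def hlwm_should_destage(occupancy: int, destaging: bool,
                            hwm: int, lwm: int) -> bool:
        """Start destaging at the HWM, stop once occupancy falls to the LWM."""
        if not destaging:
            return occupancy >= hwm   # idle: begin only at the high water mark
        return occupancy > lwm        # active: continue until the low water mark

With static hwm and lwm values this is exactly the behavior criticized above; the AHLWM algorithm of Section 2 recomputes both marks as the workload changes.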
In this paper, we propose an Adaptive High-Low Water Mark destage (AHLWM) algorithm which dynamically determines its thresholds according to a varying I/O workload. The thresholds are defined as the product of the changing rates of the cache occupancy level and the times required to fill and empty the cache. We show that the AHLWM algorithm outperforms the HLWM algorithm in terms of read response time, write cache hit ratio, and disk utilization via performance evaluations on our cached RAID5 simulator. This paper is organized as follows: Section 2 describes the AHLWM algorithm, and Section 3 evaluates its performance by comparison with the HLWM algorithm and then discusses an issue related to a bursty I/O workload. Finally, we conclude the paper with Section 4.

2. Adaptive High-Low Water Mark Destage Algorithm

This section describes the Adaptive High-Low Water Mark destage algorithm and then gives illustrative examples to explain its operation under various workload environments and configuration parameters.
Algorithm Description: Figure 1 compares the High-Low Water Mark destage (briefly, HLWM) algorithm and the Adaptive High-Low Water Mark destage (AHLWM) algorithm in terms of managing their thresholds. Typically, the HLWM algorithm defines two static threshold values, the High Water Mark (HWM) and the Low Water Mark (LWM), by using cache occupancy levels, as shown in Figure 1(a). The HWM (H_static) and LWM (L_static) are written as

    H_static = p_H S,    (1)
    L_static = p_L S,    (2)

where S is the cache size and 0 < p_L < p_H < 1. Observe that both the HWM and LWM of the HLWM algorithm remain unchanged during its operation under a time-variant I/O workload.

[Figure 1. Comparison of the two thresholds of the HLWM algorithm (a) and the AHLWM algorithm (b): the HLWM marks p_H S and p_L S are fixed, while the AHLWM marks p_H(t) S and p_L(t) S vary with time]

While the HWM and LWM are time-invariant in the HLWM algorithm, they are time-variant in the AHLWM algorithm, as shown in Figure 1(b). The HWM and LWM of the AHLWM algorithm can be written as follows:

    H_adaptive(t) = p_H(t) S,    (3)
    L_adaptive(t) = p_L(t) S.    (4)

Next, we will derive the time-variant p_H(t) and p_L(t). Let λ_B and μ_B represent the base increasing and decreasing rates of the cache occupancy level. Note that λ_B is related to the increasing rate of the cache occupancy level, which accounts for write cache hits and is not directly proportional to the arrival rate of write I/O requests. Let the current increasing and decreasing rates of the cache occupancy level at time t be λ(t) and μ(t). In addition, we define an effective decreasing rate of the cache occupancy level at time t, α(t) = μ(t) − λ(t). Similarly, α_B = μ_B − λ_B. Then, let us define T_H and T_L, which represent the times required to fill and empty the cache, as follows:

    T_H = (1 − p_H) S / λ_B,    (5)
    T_L = p_L S / α_B.    (6)

Given λ(t) and α(t), the following equations should be valid:

    λ(t) T_H + p_H(t) S = S,    (7)
    p_L(t) S − T_L α(t) = 0.    (8)

Finally, the time-variant p_H(t) and p_L(t) are written as:

    p_H(t) = max{0, 1 − (1 − p_H) λ(t) / λ_B},    (9)
    p_L(t) = min{1, max{0, p_L α(t) / α_B}}.    (10)

Note that p_H(t) can be smaller than p_L(t) under a rare condition. To avoid this, the following condition has to be added in the implementation: p_L(t) = p_H(t) if p_H(t) < p_L(t).
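The two clamped formulas translate directly into code. Below is a minimal sketch (ours, under the paper's definitions; the variable names are our assumptions) that computes the adaptive thresholds from the monitored rates:

    def ahlwm_thresholds(S, p_H, p_L, lam_B, mu_B, lam_t, mu_t):
        """Compute AHLWM thresholds per Equations (9) and (10).

        S: cache size (entries); p_H, p_L: static occupancy fractions;
        lam_B, mu_B: base increasing/decreasing rates of cache occupancy;
        lam_t, mu_t: rates monitored over the last interval.
        """
        alpha_B = mu_B - lam_B            # base effective decreasing rate
        alpha_t = mu_t - lam_t            # current effective decreasing rate
        ph_t = max(0.0, 1.0 - (1.0 - p_H) * lam_t / lam_B)   # Eq. (9)
        pl_t = min(1.0, max(0.0, p_L * alpha_t / alpha_B))   # Eq. (10)
        if ph_t < pl_t:                   # rare case noted in the text
            pl_t = ph_t
        return ph_t * S, pl_t * S         # (HWM(t), LWM(t)) in cache entries

    # Example with Table 1's parameters: S=1024 entries, p_H=0.7, p_L=0.3,
    # lam_B=49, mu_B=70. A light workload (lam_t=20, mu_t=70) raises both
    # marks above the static 717/307 entries, as the paper describes.
    print(ahlwm_thresholds(1024, 0.7, 0.3, 49.0, 70.0, 20.0, 70.0))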
`Next, we will look into the useful properties of the
`AHLWM algorithm associated with the current increasing
`and decreasing rates. To begin, we can easily see that the
`following lemma holds without proof.
Lemma 1. Given λ(t) = λ_B and μ(t) = μ_B at any time t, the AHLWM algorithm degenerates to the HLWM algorithm.

Note that the two thresholds of the AHLWM algorithm are dynamically determined as λ(t) and μ(t) change. However, the following inequality should be met at any given time t for normal operation: H_adaptive(t) > L_adaptive(t).

Theorem 1. H_adaptive(t) > L_adaptive(t) at any time t if

    (1 − p_H) k ≤ λ_B ≤ μ_B − p_L k,    (11)

for a given μ(t) = k, where 0 < k ≤ μ_B / (1 − (p_H − p_L)).

Proof: Basically, Equation (11) can be derived from the two boundary conditions in Equations (9) and (10). It is easily seen that H_adaptive(t) > L_adaptive(t) when λ(t) = λ_B and μ(t) = μ_B, according to Lemma 1. Next, we consider the two boundary conditions: 1) p_L(t)|λ(t)=0 ≤ p_H(t)|λ(t)=0 = 1, and 2) p_H(t)|λ(t)=k ≥ p_L(t)|λ(t)=k = 0. From boundary condition 1, we can derive the relation λ_B ≤ μ_B − p_L k. Also, the relation (1 − p_H) k ≤ λ_B can be obtained from boundary condition 2. Since (1 − p_H) k ≤ μ_B − p_L k, we have k ≤ μ_B / (1 − (p_H − p_L)). This implies that the maximum value of k, i.e., of μ(t), is confined to μ_B / (1 − (p_H − p_L)), called μ_B^max. Also, we can show that the two boundary conditions hold when k = μ_B^max. Thus, the theorem follows. ∎

Corollary 1. H_adaptive(t) > L_adaptive(t) holds at any time t if λ_B = ((1 − p_H) / (1 − p_H + p_L)) μ_B.

Proof: If μ(t) = μ_B^max in Equation (11) of Theorem 1, the upper and lower bounds of λ_B converge to the point λ_B = ((1 − p_H) / (1 − p_H + p_L)) μ_B, which always makes the condition hold. Thus, the corollary follows. ∎
Corollary 2. Given 0 < μ(t) ≤ μ_B^max, H_adaptive(t) > L_adaptive(t) holds at any time t if

    (1 − p_H) μ_B^max ≤ λ_B ≤ (1 − p_L) μ_B^max.    (12)

Proof: Equation (12) can be obtained by simply replacing k with μ_B^max in Equation (11) of Theorem 1. Thus, the corollary follows. ∎
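As a quick numeric check (ours, using the parameter values that Section 3.1 later adopts: p_H = 0.7, p_L = 0.3, and a measured maximum destage rate of 70 IOPS taken as μ_B^max), Corollary 2 brackets λ_B as follows:

    p_H, p_L = 0.7, 0.3      # static occupancy fractions (Table 1)
    mu_B_max = 70.0          # IOPS; maximum destage rate measured in Section 3.1

    lo = (1.0 - p_H) * mu_B_max   # 21.0 IOPS: lower bound of Eq. (12)
    hi = (1.0 - p_L) * mu_B_max   # 49.0 IOPS: upper bound of Eq. (12)
    print(lo, hi)                 # Section 3.1 picks the largest value: 49 IOPS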
Illustrative Examples: Figure 2 demonstrates the expected adaptiveness of the HWM and LWM under the AHLWM algorithm according to the variation of an I/O workload. In Figure 2(a), H_B becomes H_B − Δ_BH with a higher λ(t) = λ_B + δ_BH, mainly attributed to a higher increasing rate of dirty data in the write cache. Conversely, H_B goes up to H_B + Δ_BL when the increasing rate λ(t) drops by δ_BL. Similarly, as shown in Figure 2(b), L_B becomes L_B + Δ_LH with a higher α(t) = α_B + δ_LH, attributed to a higher rate of destaging data to disks or a lower increasing rate of dirty data in the write cache. Conversely, L_B goes down to L_B − Δ_LL with the decrease of α(t) by δ_LL. Observe that Equations (5)-(8) remain valid under any I/O workload level.

[Figure 2. Variations of HWM and LWM according to different levels of I/O workloads, where S represents the cache size: (a) variation of HWM, (b) variation of LWM]
3. Performance Evaluations under Cached RAID5

For performance evaluations of the AHLWM algorithm, we have implemented it within a cached RAID5 simulator which employs a write cache. Then, we compare the performance of the proposed algorithm with the HLWM algorithm. We begin by describing the cached RAID5 simulator.
3.1. Simulation Environments

The cached RAID5 simulator consists of a host I/O request queue, a host I/O scheduler, RAID I/O processes, a volatile read cache, a non-volatile write cache, disk I/O request queues, disk I/O schedulers, a destage process which includes different types of destage algorithms, and a synthetic I/O workload generator, as shown in Figure 3.
[Figure 3. The internal structure of the cached RAID5 simulator]

Overall I/O Operations: Host I/O requests issued by a group of workload generators are first placed into the host I/O request queue in FIFO order. Next, the host I/O scheduler takes a host I/O request from the queue and spawns either a RAID read I/O process or a RAID write I/O process, according to the request type. The RAID read I/O process searches for its data first in the write cache and then in the read cache. If valid data resides in either of the two caches (a read cache hit), the cached data is immediately transferred to the host with no disk I/O. If not (a read cache miss), the data is fetched from the physical disks into the read cache and is then returned to the host. The RAID write I/O process simply writes its data into the write cache and notifies its host of the completion of the current write request. Later, the destage process flushes the newly written data, called dirty data (entries), to the physical disks. The behavior of the destage process will be described later in detail.

Each individual disk has a pair of queues: a host request queue and a destage request queue. While the disk I/O requests made on behalf of host read requests are placed into the host request queue of a disk, the destage read/write I/O requests issued by the destage process are lined up in the destage queue of that disk. Note that the host request queue is given a higher service priority than the destage queue, i.e., the requests in the destage request queue are processed only while the host request queue is empty. In the event of a write cache overflow, the destage requests are placed into the host request queue instead. Each disk also contains its own I/O scheduler. Even though a few efficient disk scheduling algorithms are available, we employ a simple FIFO disk scheduling algorithm because disk scheduling performance is not our concern.
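The two-queue priority rule can be summarized in a short sketch (ours; the queue interface is an assumption, not the simulator's actual API):

    from collections import deque

    class DiskQueues:
        """Per-disk queues: destage I/Os run only when no host I/O waits."""
        def __init__(self):
            self.host_q = deque()     # disk I/Os stemming from host reads
            self.destage_q = deque()  # reads/writes from the destage process

        def next_request(self):
            if self.host_q:               # host requests always have priority
                return self.host_q.popleft()
            if self.destage_q:            # served only while host_q is empty
                return self.destage_q.popleft()
            return None                   # disk idles

On a write cache overflow the simulator instead enqueues destage requests into host_q, which is what grants them immediate service.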
Write I/O Operations and Destaging Process: To begin, Figure 4 depicts the state diagram of a cache entry within the write cache. Initially, the state of each entry is NOT_USED and INVALID. When the entry is allocated for storing dirty data from the host, its state is changed to IN_USE and VALID/DIRTY. If new dirty data arrives for the same location before the destaging of the dirty data, it overwrites the entry; this is called a write cache hit. Conversely, if new dirty data arrives during the destaging of the dirty data, it is stored separately by allocating a different cache entry. At this point, the state is changed to IN_USE and VALID/DDIRTY. Subsequent dirty data at the same location is overwritten, as before. Once the destaging of a dirty entry completes, its state is changed to NOT_USED and VALID/CLEAN. In addition, if an associated cache entry in the IN_USE and VALID/DDIRTY state exists, its state is changed to IN_USE and VALID/DIRTY.
[Figure 4. State diagram of cache entries within the write cache, as driven by host write requests and the destage process]
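One plausible encoding of the diagram as a transition table (ours; the state and event names follow the text above):

    # (state, event) -> next state, for a single write-cache entry.
    WRITE_CACHE_FSM = {
        ("NOT_USED/INVALID",     "write miss"):                "IN_USE/VALID-DIRTY",
        ("NOT_USED/VALID-CLEAN", "write miss"):                "IN_USE/VALID-DIRTY",
        ("IN_USE/VALID-DIRTY",   "write hit (overwrite)"):     "IN_USE/VALID-DIRTY",
        ("IN_USE/VALID-DIRTY",   "write hit while destaging"): "IN_USE/VALID-DDIRTY",
        ("IN_USE/VALID-DDIRTY",  "subsequent write hit"):      "IN_USE/VALID-DDIRTY",
        ("IN_USE/VALID-DIRTY",   "destage done"):              "NOT_USED/VALID-CLEAN",
        ("IN_USE/VALID-DDIRTY",  "destage done"):              "IN_USE/VALID-DIRTY",
    }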
Next, we explain the destage process, which flushes dirty write cache entries to the physical disks in the background. Note that the old data and parity blocks are read into the read cache. However, those entries are not locked in the read cache until the computation of the new parity blocks completes; this implies that these blocks can be replaced with others before the new parity is calculated. Newly computed parity blocks are overwritten onto the old parity blocks in the read cache, not the write cache. Also, the write of the new parity blocks is not delayed. In the presence of write cache overflows, i.e., when no cache entries are available for the newly arrived data from the host, subsequent write requests bypass the write cache and are treated as in traditional RAID5. Finally, host read requests are given a higher priority than destage requests. The destage process starts when the current occupancy reaches the HWM and stops when the current occupancy goes down to the LWM. The destage process first scans the write cache and selects a dirty cache entry not currently being destaged. Then, it decides whether the old data and parity blocks of the entry reside in the read cache. If not, it issues destage read requests for the missing blocks. After reading all the old data and parity blocks via the destage reads, it computes the new parity and generates two destage write requests for the new data and parity. Note that a possibility exists that the old blocks in the read cache are replaced while the destage read requests for the missing blocks are in progress. Thus, the destage process goes through another check to determine whether all the blocks are still there. When the destage write requests are completed, the state of the dirty cache entry is changed to clean, as previously described.
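Putting those steps together, here is a minimal sketch of one destage iteration (ours; helpers such as issue_destage_read, and the xor block helper from the Section 1 sketch, are assumptions rather than the simulator's API):

    def destage_entry(entry, read_cache, disks):
        """Destage one dirty entry: RAID5 read-modify-write in the background."""
        needed = (entry.old_data_addr, entry.old_parity_addr)
        while True:
            missing = [a for a in needed if not read_cache.contains(a)]
            for addr in missing:
                disks.issue_destage_read(addr)   # fetch old data/parity blocks
            disks.wait_for(missing)
            # The old blocks are not locked, so verify they were not evicted
            # while the destage reads were in flight; retry if they were.
            if all(read_cache.contains(a) for a in needed):
                break
        new_parity = xor(read_cache.get(entry.old_parity_addr),
                         xor(read_cache.get(entry.old_data_addr), entry.data))
        disks.issue_destage_write(entry.data_addr, entry.data)      # new data
        disks.issue_destage_write(entry.parity_addr, new_parity)    # new parity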
I/O Workload and Simulation Parameters: Basically, we employ synthetically generated I/O workloads throughout our performance evaluations. Each workload generator models an I/O application which issues an I/O request and then waits for its completion in a synchronous manner. We assume that each I/O application generates 8KB read or write requests with one percent spatial locality, where the read/write ratio is set to 4:1. Using realistic I/O workloads is part of our future work.
Table 1 shows the configuration parameters used in our cached RAID5 simulator. The simulated RAID system consists of five disks, four of which are used for data and one for parity. The size of the read cache is set to two times that of the write cache. The cache entry size of each cache is set to 8KB.
In the experiments, the static cache occupancy levels for the HWM and LWM are set to 0.7 and 0.3 of the whole write cache size, respectively. Note that these static threshold values are generally determined in an ad-hoc manner, independently of a time-varying I/O workload; recall that this is the main motivation of this paper. The configuration of the base increasing and decreasing rates of the cache occupancy level takes the following two steps. First, the maximum decreasing rate μ_B^max was empirically obtained from our simulator under the condition that no intervention of host read requests exists; we obtained 70 IOPS with 8KB requests. Second, the inequality of Corollary 2 is applied to the obtained μ_B^max value in order to decide the value of λ_B. Our experiments take the largest admissible value of λ_B, obtained by multiplying 1 − p_L (= 0.7) by 70. Finally, the current changing rates of the cache occupancy level are monitored every 400 msec. Also, each individual disk features 512-byte sectors, 1962 cylinders, 19 tracks per cylinder, 72 sectors per track, and 4002 rpm.
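As an illustration of the 400 msec monitoring step, a sketch (ours; the counter names are assumptions) of how λ(t) and μ(t) could be sampled:

    INTERVAL = 0.4  # seconds; monitoring period from Table 1

    def sample_rates(curr, prev):
        """Estimate lambda(t) and mu(t) from cumulative cache counters.

        curr/prev expose .filled (entries that became dirty; write hits that
        overwrite an existing entry do not count, matching the remark that
        lambda reflects cache occupancy, not write arrivals) and .destaged
        (dirty entries flushed to disk).
        """
        lam_t = (curr.filled - prev.filled) / INTERVAL      # increasing rate
        mu_t = (curr.destaged - prev.destaged) / INTERVAL   # decreasing rate
        return lam_t, mu_t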
Table 1. Configuration parameters of our cached RAID5 simulator

    Parameter             Value
    RAID Size             4.19 GB
    Number of Disks       5 (data: 4, parity: 1)
    Read Cache Size       16 MB (2048 cache entries)
    Write Cache Size      8 MB (1024 cache entries)
    Cache Entry Size      8 KB
    Read/Write Ratio      4:1
    p_H                   0.7
    p_L                   0.3
    μ_B                   70 IOPS
    λ_B                   49 IOPS (0.7 μ_B^max)
    Monitoring Interval   400 msec
3.2. Simulation Results

The I/O performance of the HLWM and AHLWM algorithms is evaluated in terms of the response time of reads, the write cache hit ratio, and disk utilization as I/O throughput increases. The response time of read requests has become more significant because fast writes are achieved with non-volatile memory. This response time can be affected by disk utilization (especially the number of destage I/Os) and write cache overflows, as well as by the read cache hit ratio. A high write cache occupancy level enhances the write cache hit ratio, consequently reducing the response time of reads. Disk utilization depends mainly on the number of read and destage I/O requests; of these, only destage I/O requests are directly related to the underlying destage algorithm.
I/O Performance with Different I/O Workloads: Figures 5-6 show the variation of read response time, write cache hit ratio, the number of disk I/Os issued by destage requests (briefly called destage I/Os), and disk utilization of the HLWM algorithm and the AHLWM algorithm with the increase of I/O throughput. Throughout all experiments, the base increasing rate of the cache occupancy λ_B and the base decreasing rate of the cache occupancy μ_B are set to 49 IOs/sec (0.7 μ_B^max) and 70 IOs/sec, respectively. As shown in Figures 5-6, the AHLWM algorithm outperforms the HLWM algorithm. The performance differences are maximized especially under a light I/O workload, in which λ(t) is relatively lower than the base value. As the I/O workload increases, however, the performances of both algorithms become almost equal, even though the AHLWM algorithm still works slightly better than the HLWM algorithm. The performance
[Figure 5. Variations of read response time and write cache hit ratio of the HLWM algorithm and the AHLWM algorithm with the increase of I/O throughput: (a) read response time, (b) write cache hit ratio, where p_H = 0.7, p_L = 0.3, λ_B = 49 IOs/sec, and μ_B = 70 IOs/sec]
enhancements of read response time in the AHLWM algorithm are primarily due to its time-varying, adaptive thresholds, i.e., both the HWM and LWM at each time are determined based on the current increasing and decreasing rates of the cache occupancy level. Under a light I/O workload, the AHLWM algorithm keeps the cache occupancy level higher than the HLWM algorithm, meaning that the write cache hit ratio increases. It is easily inferred that a high write cache hit ratio improves the response time of reads: a high write cache hit ratio can reduce the number of destage I/Os, which in turn lessens the disk utilization caused by destage I/Os.
A Variation of the Cache Occupancy Level: Figure 7 presents the average cache occupancy levels at each I/O request arrival rate of the previous results. It is clearly shown that the AHLWM algorithm maintains the cache occupancy level adaptively to the current I/O workload, i.e., it keeps the cache occupancy level relatively high under light workloads, ranging from 82 to 140 IOs/sec. We can expect that the HWM and LWM of the HLWM algorithm become similar to those of the AHLWM algorithm as the I/O workload increases; a series of subsequent results will confirm this expectation. Also, we observed that the cache occupancy levels of both the HLWM and the AHLWM algorithms grow abruptly under heavy workloads beyond 140 IOs/sec, because the increasing rate of the cache occupancy level λ(t) is almost equal to the decreasing rate of the cache occupancy level μ(t) under such a condition. Note that the write cache will be filled up shortly
