(12) United States Patent
Gill et al.

(10) Patent No.: US 7,574,556 B2
(45) Date of Patent: Aug. 11, 2009

(54) WISE ORDERING FOR WRITES COMBINING SPATIAL AND
TEMPORAL LOCALITY IN WRITE CACHES
(75) Inventors: Binny Sher Gill, San Jose, CA (US);
Dharmendra Shantilal Modha, San Jose, CA (US)
`
`(73) Assignee: International Business Machines
`Corporation, Armonk, NY (US)
`
(*) Notice: Subject to any disclaimer, the term of this
patent is extended or adjusted under 35
U.S.C. 154(b) by 491 days.

(21) Appl. No.: 11/384,939

(22) Filed: Mar. 20, 2006

(65) Prior Publication Data
US 2007/0220201 A1      Sep. 20, 2007

(51) Int. Cl.
G06F 12/00      (2006.01)

(52) U.S. Cl. ........................ 711/113; 711/130; 711/114
(58) Field of Classification Search ................. 711/4,
711/113-114, 118, 130, 141-144
See application file for complete search history.
(56) References Cited

U.S. PATENT DOCUMENTS
`
5,543,066 A      7/1996
5,754,888 A      5/1998   Yang et al.
6,272,662 B1     8/2001   Jadav et al.
6,341,331 B1     1/2002   McNutt
6,493,800 B1     12/2002  Blumrich
6,785,771 B2     8/2004   Ash et al.
6,865,647 B2     3/2005   Olarig et al.
6,865,648 B1     3/2005   Naamad et al.
6,898,672 B2     5/2005   Lambright et al.
2004/0078518 A1  4/2004   Kuwata
2005/0010722 A1  1/2005   Chen
2007/0220200 A1  9/2007   Gill
FOREIGN PATENT DOCUMENTS

JP      2001-249835 A   9/2002
OTHER PUBLICATIONS

Lee, Jung-Hoon et al., "Application-adaptive intelligent cache
memory system," ACM Transactions on Embedded Computing
Systems (TECS), vol. 1, Issue 1 (Nov. 2002), pp. 56-78.

(Continued)

Primary Examiner: Jasmine Song
(74) Attorney, Agent, or Firm: Jeffrey P. Aiello; Anthony V.
S. England
(57) ABSTRACT

A storage system has a storage controller for an array of
storage disks, the array being ordered in a sequence of write
groups. A write cache is shared by the disks. The storage
controller temporarily stores write groups in the write cache,
responsive to write groups being written, and lists the write
groups in order of their sequence in the array and in circular
fashion, so that a lowest one of the write groups is listed next
to a highest one. The storage controller selects the listed write
groups in rotating sequence. Such a write group is destaged
from the write cache to the disk responsive to i) the selecting
of the write group and ii) a state of a recency indicator for the
write group, wherein the recency indicator shows recency of
writing to the write group.
`
[Front-page figure: ranks R1-R3 of disks D1R1-D3R3, with strip 102
and stripe 104 boundaries grouping pages P1-P4 into write groups
WG1-WG9; each rank's storage controller (e.g., CXR1) maintains a
sorted circular queue with a per-write-group recency bit 310.]

27 Claims, 4 Drawing Sheets

HPE, Exh. 1011, p. 1
`
OTHER PUBLICATIONS

Panda, P.R. et al., "Data and memory optimization techniques for
embedded systems," ACM Transactions on Design Automation of
Electronic Systems (TODAES), vol. 6, Issue 2 (Apr. 2001), pp.
149-206, http://www.ee.ucla.edu/~ingrid/Courses/Reading/p149-
pandaACM2001.pdf.
Song, Yonghong et al., "Data Locality Enhancement by Memory
Reduction," Proceedings of the 15th International Conference on
Supercomputing, Jun. 16-21, 2001, Sorrento, Italy, pp. 50-64, http://
www.cs.purdue.edu/homes/li/draft/ics01.pdf.
Lee, Jung-Hoon et al., "Application-adaptive intelligent cache
memory system," ACM Transactions on Embedded Computing Sys-
tems (TECS), vol. 1, Issue 1 (Nov. 2002), pp. 56-78.
Sermulins, Janis et al., "Cache Aware Optimization of Stream Pro-
grams," Proceedings of the 2005 ACM SIGPLAN/SIGBED Confer-
ence on Languages, Compilers, and Tools for Embedded Systems,
Jun. 15-17, 2005, Chicago, vol. 40, Issue 7, pp. 115-126, http://web.
mit.edu/rabbah/www/docs/sermulins-lctes-2005.pdf.
Marathe, Jaydeep et al., "METRIC: tracking down inefficiencies in
the memory hierarchy via binary rewriting," Performance monitoring
session, ACM International Conference Proceeding Series, vol. 37,
Proceedings of the International Symposium on Code Generation
and Optimization: Feedback-Directed and Runtime Optimization,
Mar. 23-26, 2003, pp. 289-300, http://www.csl.cornell.edu/~sam/
papers/cgo03.pdf.
`
[Sheet 1 of 4, FIG. 1: block diagram of storage system 100, showing
processors 115, volatile memory 127, nonvolatile memory 129, and
display device 137.]
[Sheet 2 of 4, FIG. 2: storage ranks R1-R3 of RAID-configured disks
(D1R1 through DNR3), each rank with its own storage controller and
shared write cache; strip 102 and stripe 104 boundaries are shown.]
[Sheet 3 of 4, FIG. 3: pages P1-P4 grouped into write groups WG1-WG9
across strips 102 and stripes 104 of ranks R1-R3; each rank's storage
controller (CXR1-CXR3) maintains a rank-sorted circular queue ordered
by sequence, with a recency bit 310 per write group.]
[Sheet 4 of 4, FIG. 4: storage controller CXR1, containing storing
logic 410, listing logic 420, selecting logic 430, recency logic 440,
destaging logic 450, issuing logic 460, and queue 465.]
WISE ORDERING FOR WRITES COMBINING SPATIAL AND
TEMPORAL LOCALITY IN WRITE CACHES
`
`CROSS-REFERENCE
`
This application is related to U.S. patent application Ser.
No. 11/384,890, Wise Ordering For Writes Combining
Spatial and Temporal Locality in Write Caches For Multi-
Rank Storage, which is filed on the same date as the present
application, and the related application is hereby incorpo-
rated herein by reference.
`
`BACKGROUND
`
`1. Field of the Invention
`The present invention concerns caching in computer sys
`tems and, more particularly, concerns write caching for disk
`storage devices.
`2. Related Art
`Processor speeds have grown about sixty percent per year,
`while electromechanical disk storage devices have improved
`their access times at a comparatively meager annual rate of
`about eight percent. At the same time, capacity of disk storage
`devices has grown more than fifty percent per year, which
`tends to reduce the amount of data available by parallel,
`concurrent disk access. By themselves, these trends would
`dictate that a processor must wait longer for disk read and
`write operations. Consequently, huge efforts have gone into
`hiding latency for disk bound computer applications.
`A cache has been described by the Storage Networking
`Industry Association as “A high speed memory or storage
`device used to reduce the effective time required to read data
`from or write data to a lower speed memory or device.”
`Caching is a fundamental technique in hiding I/O latency and
`is widely used in storage controllers, databases, file systems,
`and operating systems. A modern storage controller's cache
`typically contains Volatile memory used as a read cache and a
`non-volatile memory used as a write cache. Non-volatile
`storage (“NVS), which is typically fast, but relatively expen
`sive, random access memory, enables writes to be stored
`quickly and safely in an NVS write cache and destaged, i.e.,
`written to disk storage, later in an asynchronous fashion,
`which hides the latency of writing to the disk storage, which
`is slower, but relatively less expensive than the NVS write
`cache. Read cache management is a well studied discipline
`and there are a large number of cache replacement algorithms
`in this context. In contrast, write caching is a relatively less
`developed subject.
`In destaging, a balance must be obtained. Data in the write
`cache must be destaged quickly enough so that there is always
`space for incoming writes, but not so quickly that the benefit
`of the write cache is under utilized. This balance has been
`addressed, for example, by linear threshold scheduling, in
`which the rate of destaging is varied in response to instanta
`neous occupancy of the write cache. It has also been
`addressed by Scheduling using least-cost and high/low
`“water marks.
`Another issue presented in destaging concerns the destag
`ing order. As long as the write cache is drained fast enough,
`the precise order in which data is destaged does not affect
`performance from the standpoint of fairness or of write
`requests becoming starved because write requests are consid
`ered complete, in most respects, once they are written to the
`write cache. However, destage ordering can crucially affect
`peak write throughput and performance of concurrent reads.
`Peak write throughput is affected by destage ordering because
`
the capacity of disks to physically support sequential or
nearly sequential write traffic is significantly higher than their
capacity to support random writes. That is, destaging writes in
an order that exploits this physical fact can significantly
improve peak write throughput. This was a fundamental moti-
vation for development of log-structured file systems in a
different context. Performance of concurrent reads is affected
by destage ordering because to the extent destage ordering
improves write performance, it improves read performance.
That is, with concurrent reads and writes, the less time spent
writing to a disk, the more time there is available to read from
the disk.
As may be seen from this brief background, problems are
presented in balancing tradeoffs involved in write caching.
Problems are presented even regarding how to frame all the
issues regarding write caching and in recognizing resultant
interactions among the issues.
`
`SUMMARY OF THE INVENTION
`
`The present invention addresses the foregoing problem.
According to one form of the invention, in a storage system
having an array of storage disks, the array is ordered in a
sequence of write groups and a write cache is shared by the
disks. A method of destaging the write cache includes placing
ones of the write groups in the write cache temporarily,
responsive to the ones of the write groups being written to the
array. The write groups stored in the write cache are listed in
order of the sequence in the array. The listing is circular, so
that a lowest one of the write groups is listed next to a highest
one of the write groups. The listed write groups are selected in
rotation according to the sequence. Such a listed write group
is destaged from the write cache to the disk responsive to the
selecting of the write group and a state of a recency indicator
for the write group. The recency indicator shows recency of
writing to the write group.
In another aspect, such a recency indicator includes a bit
having one state indicating the write group has been written to
recently and another state indicating the write group has not
been written to recently.
In another aspect, the method includes setting such a
recency indicator for such a write group responsive to placing
the write group in the write cache.
In another aspect, the method includes clearing the state of
such a recency indicator responsive to the selecting of the
indicator's respective write group, and wherein the destaging
of a write group responsive to the write group's recency
indicator includes destaging responsive to the indicator being
cleared.
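The circular listing, rotating selection, and recency-bit behavior described above can be sketched as follows. This is an illustrative model only, not the patented controller's implementation; the class and method names, and the use of group addresses as keys, are assumptions made for the sketch.

```python
from bisect import insort

class WowCacheSketch:
    """Illustrative sketch: write groups kept in a sorted circular list,
    each with a recency bit (akin to CLOCK); a rotating pointer selects
    groups in address order and destages only those not recently written."""

    def __init__(self):
        self.groups = []        # write-group addresses, kept in sorted (array) order
        self.recency = {}       # address -> recency bit
        self.clock_hand = 0     # index of the next group to consider

    def write(self, addr):
        if addr not in self.recency:
            insort(self.groups, addr)           # insert in sequence order
        self.recency[addr] = True               # set recency bit on (re)write

    def next_destage(self):
        """Rotate through the sorted list; clear recency bits as groups are
        visited, and destage the first group whose bit is already clear."""
        while self.groups:
            self.clock_hand %= len(self.groups) # circular: highest wraps to lowest
            addr = self.groups[self.clock_hand]
            if self.recency[addr]:
                self.recency[addr] = False      # recently written: skip this pass
                self.clock_hand += 1
            else:
                self.groups.pop(self.clock_hand)
                del self.recency[addr]
                return addr                     # destage this write group
        return None
```

As in CLOCK, a group written to again while in the cache gets one extra rotation before it is destaged, while the pointer continues in address order.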
In another aspect, the destaging of such write groups
includes issuing numerous, concurrent destaging write
requests to ones of the disks, and the method includes varying
the concurrent write requests in number responsive to space
in the shared write cache.
In another aspect, varying the concurrent write requests
includes issuing concurrent destaging write requests at a cer-
tain predetermined maximum rate responsive to write cache
space exceeding a certain high threshold.
`In another aspect, varying the concurrent write requests
`includes increasing the number of concurrent destaging write
`requests responsive to decreasing space in the write cache if
`the write cache space is within a range between a certain high
`threshold and a certain low threshold.
`In another aspect, varying the concurrent write requests
`includes issuing concurrent destaging write requests at a cer
`tain predetermined minimum rate responsive to write cache
`space falling below the low threshold for a write group that is
`
`a member of a sequence of write groups in the write cache, if
`a next listed write group for destaging is also a member of a
`sequence of write groups.
`In another aspect, varying the concurrent destaging write
`requests includes issuing no write requests responsive to the
`write cache space falling below the low threshold for a write
`group that is not a member of a sequence of write groups in the
`write cache.
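The occupancy-driven variation described in the aspects above resembles linear threshold scheduling. The sketch below maps write-cache occupancy to a number of concurrent destage requests; the specific threshold values, the maximum of 20 requests, and the linear ramp are illustrative assumptions, not figures from the patent.

```python
def concurrent_destages(occupancy, low=0.70, high=0.90, max_requests=20):
    """Illustrative linear-threshold schedule: no destaging below the low
    threshold, a linear ramp between the thresholds, and the maximum
    number of concurrent destage requests above the high threshold."""
    if occupancy <= low:
        return 0                      # cache has room; defer destaging
    if occupancy >= high:
        return max_requests           # cache nearly full; destage at full rate
    # linear ramp between the low and high occupancy thresholds
    return round(max_requests * (occupancy - low) / (high - low))
```

The patent additionally distinguishes, below the low threshold, between write groups that belong to a sequence (destaged at a minimum rate) and those that do not (not destaged at all); that refinement is omitted from this sketch.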
`In another form of the invention, a computer program
product resides on a computer usable medium having com-
puter readable program code. The program code includes
`instructions for placing ones of the write groups in the write
`cache temporarily, responsive to the ones of the write groups
`being written to the array. The program code further includes
`instructions for listing the write groups stored in the write
`cache in order of the sequence in the array, wherein the listing
`is circular, so that a lowest one of the write groups is listed
`next to a highest one of the write groups. The program code
`further includes instructions for selecting the listed write
`groups in rotation according to the sequence. The program
code further includes instructions for destaging such a listed
`write group from the write cache to the disk responsive to i)
`the selecting of the write group and ii) a state of a recency
`indicator for the write group, wherein the recency indicator
`shows recency of writing to the write group.
`In another aspect, the program code includes instructions
for setting such a recency indicator for such a write group
`responsive to placing the write group in the write cache.
`In another aspect, the program code includes instructions
`for clearing the state of such a recency indicator responsive to
`the selecting of the indicator's respective write group, and the
`instructions for destaging of a write group responsive to the
`write group's recency indicator include instructions for
`destaging responsive to the indicator being cleared.
In another aspect, the program code includes instructions
for destaging such write groups, including instructions for
issuing numerous, concurrent destaging write requests to ones
of the disks, and the program code includes instructions for
`varying the concurrent write requests in number responsive to
`space in the shared write cache.
`In another aspect, the instructions for varying the concur
`rent write requests include instructions for issuing concurrent
`destaging write requests at a certain predetermined maximum
rate responsive to write cache space exceeding a certain
high threshold.
`In another aspect, the instructions for varying the concur
`rent write requests include instructions for increasing the
`number of concurrent destaging write requests responsive to
`decreasing space in the write cache if the write cache space is
`within a range between a certain high threshold and a certain
`low threshold.
`In another aspect, the instructions for varying the concur
`rent write requests include instructions for issuing concurrent
`destaging write requests at a certain predetermined minimum
`rate responsive to write cache space falling below the low
`threshold for a write group that is a member of a sequence of
`write groups in the write cache, if a next listed write group for
`destaging is also a member of a sequence of write groups.
`In another aspect, the instructions for varying the concur
`rent destaging write requests include instructions for issuing
`no write requests responsive to the write cache space falling
`below the low threshold for a write group that is not a member
`of a sequence of write groups in the write cache.
`In another form of the invention, a storage system has a
`storage controller and a write cache that is shared by an array
of storage disks. The array is ordered in a sequence of write
`groups. The storage controller includes storing logic for plac
`
`ing ones of the write groups in the write cache temporarily,
`responsive to the ones of the write groups being written to the
`array. The storage controller also includes listing logic for
`listing the write groups stored in the write cache in order of
`their sequence in the array, wherein the listing is circular, so
`that a lowest one of the write groups is listed next to a highest
`one of the write groups. The storage controller also includes
`selecting logic for selecting the listed write groups in rotation
`according to the sequence. The storage controller also
`includes destaging logic for destaging such a listed write
`group from the write cache to the disk responsive to i) the
`selecting of the write group and ii) a state of a recency indi
`cator for the write group, wherein the recency indicator shows
`recency of writing to the write group.
`In another aspect of the storage system, such a recency
`indicator includes a bit having one state indicating the write
`group has been written to recently and another state indicat
`ing the write group has not been written to recently.
`In another aspect, the storage system includes recency
logic for setting such a recency indicator for such a write
`group responsive to placing the write group in the write cache.
`In another aspect, the recency logic includes logic for clear
ing the state of such a recency indicator responsive to the
`selecting of the indicator's respective write group, and
`wherein the destaging logic includes logic for destaging
`responsive to the indicator being cleared.
`In another aspect, the destaging logic includes logic for
issuing numerous, concurrent destaging write requests to ones
`of the disks, and the system includes logic for varying the
`concurrent write requests in number responsive to space in
`the shared write cache.
`In another aspect, the logic for varying the concurrent write
`requests includes logic for issuing concurrent destaging write
`requests at a certain predetermined maximum rate responsive
to write cache space exceeding a certain high threshold.
`In another aspect, the logic for varying the concurrent write
`requests includes logic for increasing the number of concur
`rent destaging write requests responsive to decreasing space
`in the write cache if the write cache space is within a range
`between a certain high threshold and a certain low threshold.
`In another aspect, the logic for varying the concurrent write
`requests includes logic for issuing concurrent destaging write
`requests at a certain predetermined minimum rate responsive
`to write cache space falling below the low threshold for a
`write group that is a member of a sequence of write groups in
`the write cache, if a next listed write group for destaging is
`also a member of a sequence of write groups.
`In another aspect, the logic for varying the concurrent
`destaging write requests includes logic for issuing no write
`requests responsive to the write cache space falling below the
`low threshold for a write group that is not a member of a
`sequence of write groups in the write cache.
`Other variations, objects, advantages, and forms of the
`invention will become apparent upon reading the following
`detailed description and upon reference to the accompanying
`drawings.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`The foregoing and other objects, aspects and advantages
`will be better understood from the following detailed descrip
`tion of a preferred embodiment(s) of the invention with ref
`erence to the drawings, in which:
`FIG. 1 illustrates a storage system, according to an embodi
`ment of the present invention.
`FIG. 2 illustrates certain details of the storage system of
`FIG. 1, according to an embodiment of the present invention.
`
`FIG. 3 illustrates certain additional details of the storage
`system of FIGS. 1-2, according to an embodiment of the
`present invention.
`FIG. 4 illustrates certain additional details of one of the
`storage controllers of FIGS. 1-3, according to an embodiment
`of the present invention.
`
DETAILED DESCRIPTION OF PREFERRED
EMBODIMENTS OF THE INVENTION

In the following detailed description of the preferred
embodiments, reference is made to the accompanying draw-
ings illustrating embodiments in which the invention may be
practiced. It should be understood that other embodiments
may be utilized and changes may be made without departing
from the scope of the present invention. The drawings and
detailed description are not intended to limit the invention to
the particular form disclosed. On the contrary, the intention is
to cover all modifications, equivalents and alternatives falling
within the spirit and scope of the present invention as defined
by the appended claims. Headings herein are not intended to
limit the subject matter in any way.

Overview

There are numerous incentives to improving write cache
destaging performance, and, hence, overall write perfor-
mance, which also leads to improved read performance, as
previously stated. For one thing, the widely used redundant
array of independent disks ("RAID") technology tends to
make writes slower than reads. Also, write caches are com-
monly smaller than read caches. Further, client-side caching
does not significantly benefit write operations as it does read
operations. Also, the proportion of write operations in the
widely used storage benchmark SPC-1 is larger than the
proportion of read operations.

The present invention involves a recognition that write
destaging performance depends upon a combination of fac-
tors, including (i) the total number of destages to disks,
namely, the write miss ratio, and (ii) the average cost of each
destage. In the absence of any read operations, it is important
for a write destage algorithm to minimize the product of these
factors. Even in the presence of concurrent reads, the product
of these factors tends to minimize the amount of time that the
disk heads are occupied in serving writes, which tends to
minimize the average read response time and maximize
aggregate throughput.

To minimize the first factor, the present invention seeks to
exploit temporal locality, which may favor destaging seg-
ments of data least likely to be written to while in the write
cache, and thus most likely to ultimately be destaged. Clas-
sically, a least recently used ("LRU") policy has been used to
exploit temporal locality in read caches. In write caches this
translates to a least recently written ("LRW") policy. Other
policies that have been used in read caches for exploiting
temporal locality include CLOCK, FBR, LRU-2, 2Q, LRFU,
LIRS, MQ, ARC and CAR, all of which attempt to reduce the
miss ratio in a read cache.

To minimize the second factor, the present invention seeks
to exploit spatial locality, which may favor destaging seg-
ments of data located closest together on the disks, and thus
subject to the fastest write operation, according to the heads
and the geometry of the disks in the system. A number of
algorithms are known for exploiting spatial locality, includ-
ing those which seek to serve I/O's on a first-come-first-serve
("FCFS") basis, a shortest-seek-time-first ("SSTF") basis, a
shortest-access-time-first ("SATF") basis, and on the basis that
I/O's are served first in increasing order and then in decreas-
ing order of their logical addresses ("SCAN"). There are
many other variants known as LOOK, VSCAN, FSCAN,
Shortest Positioning Time First ("SPTF"), GSTF and WSTF,
and largest segment per track ("LST").

These known algorithms that are advantageous regarding
spatial locality may depend upon a detailed knowledge of the
instantaneous position of a disk head and the location of each
data segment relative to the disk head, particularly for read
operations. However, the present invention is in the context of
a storage controller at a level in a memory hierarchy that is
somewhat removed from the actual disks themselves. Further,
it is for write operations instead of read operations. In this
context, most disk parameters are hidden, due to RAID, for
example, so that spatial locality is generally hard to exploit.
For example, it has been found that applying SATF at a high
level in the memory hierarchy such as that of a storage con-
troller was not practical. See L. Huang et al., "Experience in
Building a Software Based SATF Scheduler," Tech. Rep.
ECSL-TR81, SUNY at Stony Brook (July 2001) ("... mod-
ern disks have too many internal control mechanisms that are
too complicated to properly account for in the disk service
time model. This exercise lead us to conclude that software
based SATF disk schedulers are less and less feasible as the
disk technology evolves."), also ("Even when a reasonably
accurate software based SATF disk scheduler can be success-
fully built, the performance gain over a SCAN based disk
scheduler that it can realistically achieve appears to be insig-
nificant ..."). This conclusion regarding SATF was for mere
single-disk applications. For applications having an array of
redundant disks, as in the context of the present invention, the
conclusion is all the more certain. Previous work has
attempted to apply principles of both spatial and temporal
locality to the problem of write cache control by partitioning
the write cache into a "hot" zone managed according to an
LRW policy and a "cold" zone managed via an LST policy.
However, this work only deals with interaction between the
cache and one disk. T. R. Haining, "Non-volatile Cache Man-
agement for Improving Write Response Time with Rotating
Magnetic Media," 126, Ph.D. Dissertation, University of
California, Santa Cruz (2000). Also, even with regard to a
single disk, further work is needed to develop an adaptive
algorithm for tuning the size of the hot zone. See id. at 125
("One of the most immediate aspects of this work requir-
ing more research is the method to determine the size of the
hot zone for the stack model based replacement algorithm.
We determined the best size for the hot zone empirically in
our experiments"). The work does not fully address how to
determine the best adaptive partition. In addition, the hot zone
optimizes for temporal locality, whereas the cold zone
optimizes for spatial locality.

In seeking to apply principles of both spatial and temporal
locality for write cache control in the context of a storage
controller, i.e., in upper levels of the memory hierarchy, the
present invention applies processes related to LRW for tem-
poral locality and CSCAN for spatial locality. In CSCAN,
I/O's are served only in increasing order of their logical
addresses. A difficulty arises because the order of destaging
that is suggested by LRW tends to be different than that of
CSCAN. This difficulty is addressed in the present invention
by a novel combination, referred to herein as Wise Ordering
for Writes ("WOW"), wherein write groups in the write cache
are maintained in a CSCAN-like arrangement, but with
recency bits akin to those of CLOCK. (For details of CLOCK,
see F. J. Corbato, "A paging experiment with the multics
system," In Honor of P. M. Morse, pp. 217-228, MIT Press,
1969.) Destaging is skipped for write groups to which data
has been recently written, as indicated by the recency bits.
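The tension between the LRW and CSCAN destage orders can be made concrete with a toy sketch; the addresses and write sequence below are invented for illustration and the functions model only the orderings, not the patented controller.

```python
def lrw_order(writes):
    """Least-recently-written destage order: the group whose last write is
    oldest drains first; rewriting a group refreshes its recency."""
    last_write = {}
    for t, addr in enumerate(writes):
        last_write[addr] = t            # later writes refresh recency
    return sorted(last_write, key=last_write.get)

def cscan_order(writes, head=0):
    """CSCAN-like destage order: strictly increasing logical address,
    wrapping from the highest pending address back to the lowest."""
    pending = sorted(set(writes))
    above = [a for a in pending if a >= head]
    below = [a for a in pending if a < head]
    return above + below                # one sweep upward, then wrap around
```

For the same set of cached writes, the two policies can disagree on who goes first, which is exactly the conflict that WOW's recency bits resolve.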
`
`

`

`Described another way, the present invention frames the
`destaging problem as interrelated issues concerning what
`write data to destage and when to destage the selected write
`data. According to the present invention, the issues are
`defined and addressed by a novel combination of a particular
`kind of adaptive water mark and a particular kind of linear
threshold scheduling.
`Example of Destaging Context
`Referring now to FIG. 1, a storage system 100 is shown
`according to an embodiment of the present invention. System
100 takes the form of a computer system. It should be under-
stood that the term "computer system" is intended to encom-
pass any device having a processor that executes instructions
from a memory medium, regardless of whether referred to in
terms of a microcontroller, personal computer system, main-
frame computer system, workstation, server, or in some other
terminology. Computer system 100 includes processors 115,
a volatile memory 127, e.g., RAM, and a nonvolatile memory
129. Memories 127 and 129 store program instructions (also
known as a "software program"), which are executable by
processors 115, to implement various embodiments of a soft-
ware program in accordance with the present invention. Pro-
cessors 115 and memories 127 and 129 are interconnected by
bus 140. An input/output adapter (not shown) is also con-
nected to bus 140 to enable information exchange between
processors 115 and other devices or circuitry. System 100 also
includes a keyboard 133, a pointing device 130, e.g., a mouse,
floppy disk, CD-ROM and DVD drives, and a display device 137.
In the illustrated embodiment, system 100 is an IBM
xSeries 345 computer equipped with two Intel Xeon 2 GHz
processors 115. In the illustrated embodiment, nonvolatile
memory 129 includes 10K RPM, SCSI, DDR disks D1R1
through DNR1, D1R2 through DNR2, and D1R3 through
DNR3, each disk being 36.4 GB. The number of ranks
and the number of disks N in each rank may vary in different
embodiments, of course.
Nonvolatile memory 129 also includes another disk (not
shown) that is used for an operating system, software appli-
cations, and workloads. In other embodiments the operating
system may be on multiple disks or on some other nonvolatile
store, not necessarily a disk. In another embodiment the oper-
ating system may even be programmed in specialized chip
hardware. A Linux kernel runs on system 100 for hosting the
applications and standard workload generators. Memory 129
also includes ROM, which is not shown, and may include
other devices, which are also not shown, such as floppy disks,
CD-ROMs, and DVDs.
Referring now to FIG. 2, certain details of system 100 are
illustrated according to an embodiment of the invention. SCSI
disks, D1R1, etc., are configured as three RAID-10 arrays R1,
R2, and R3, each having a respective set of data disks, each set
being N in number. For example, array R1 has data disks
D1R1, D2R1, D3R1, etc. through DNR1. Arrays R1, R2 and
R3 have respective RAID storage controllers CXR1, CXR2
and CXR3, each for controlling its respective set of the
RAID-configured disks. Each disk has its own disk controller.
For example, disk D1R1 has its own disk controller C1R1, as
shown.
In the RAID-10 implementation illustrated in FIG. 2, storage
controllers CXR1, CXR2 and CXR3 have respective write
caches WXR1, WXR2 and WXR3, which are managed in
terms of 64 KB strip 102 write groups. Strip 102 size may
vary, of course. Write cache WXR1 is shared by the disks
D1R1, etc. of array R1, WXR2 is shared by the disks D1R2,
etc. of array R2 and WXR3 is shared by the disks D1R3,
etc. of array R3.
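As a rough illustration of the 64 KB strip write groups just described, a byte offset in an array's logical space can be mapped to a strip-sized write group and the data disk holding it. The striping arithmetic below is a simplified assumption for the sketch, not the controllers' actual RAID-10 layout (mirroring is ignored).

```python
STRIP_SIZE = 64 * 1024          # 64 KB write-group (strip) size, per the text

def write_group_of(offset_bytes, n_disks):
    """Map a byte offset in the array's logical space to (write group,
    data disk index, stripe number), assuming simple rotating striping:
    each full row of N strips forms one stripe (cf. stripe 104)."""
    group = offset_bytes // STRIP_SIZE     # which 64 KB strip (= write group)
    disk = group % n_disks                 # strips rotate across the N data disks
    stripe = group // n_disks              # row of strips across the rank
    return group, disk, stripe
```

Under this assumed layout, consecutive write groups land on different disks, which is why destaging in nearly sequential write-group order keeps all of a rank's disk heads moving in the same direction.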
`
In an alternative embodiment of the invention, disks D1R1-
DNR1, D1R2-DNR2 and D1R3-DNR3 are configured as
RAID-5 arrays. For the RAID-5 implemen-
