CONFERENCE ON MEASUREMENT &
MODELING OF COMPUTER SYSTEMS
BOULDER, COLORADO USA
MAY 22-25, 1990

PROCEEDINGS

ACM SIGMETRICS

EMC EXHIBIT 1007

PERFORMANCE EVALUATION REVIEW

SPECIAL ISSUE

VOLUME 18, NO. 1, MAY 1990

1990 ACM SIGMETRICS
CONFERENCE ON MEASUREMENT AND MODELING OF COMPUTER SYSTEMS

May 22-25, 1990

University of Colorado
Boulder, Colorado

`
The Association for Computing Machinery
11 West 42nd Street
New York, New York 10036
© 1990 by the Association for Computing Machinery, Inc. Copying without fee is permitted
provided that the copies are not made or distributed for direct commercial advantage, and credit
to the source is given. Abstracting with credit is permitted. For other copying of articles that
carry a code at the bottom of the first page, copying is permitted provided that the per-copy fee
indicated in the code is paid through the Copyright Clearance Center, 27 Congress Street, Salem,
MA 01970. For permission to republish write to: Director of Publications, Association for
Computing Machinery. To copy otherwise, or republish, requires a fee and/or specific permission.
ISBN 0-89791-359-0
Additional copies may be ordered prepaid from:

ACM Order Department
P.O. Box 64145
Baltimore, MD 21264

Price: Members ..... $16.00
       All others .. $22.00

ACM order number: 488900
TABLE OF CONTENTS

SESSION 1: PARALLEL PROCESSING

Unboundedly Parallel Simulations Via Recurrence Relations
    Albert G. Greenberg and Boris D. Lubachevsky, AT&T Bell Labs
    Isi Mitrani, University of Newcastle upon Tyne ............................

A Performance Evaluation of a General Parallel Processing Model
    Randolph Nelson, IBM T. J. Watson Research Center ......................  13

SESSION 2: TRACE-DRIVEN SIMULATION

Efficient Trace-Driven Simulation Methods for Cache Performance Analysis
    Wen-Hann Wang, IBM T. J. Watson Research Center
    Jean-Loup Baer, University of Washington ...............................  27

Techniques for Efficient Inline Tracing on a Shared-Memory Multiprocessor
    Susan J. Eggers, David R. Keppel, Eric J. Koldinger, and Henry M. Levy,
    University of Washington ...............................................  37

Blocking: Exploiting Spatial Locality for Trace Compaction
    Anant Agarwal and Minor Huffman, MIT ...................................  48

SESSION 3: FAULT TOLERANT SYSTEMS

A Bayesian Approach to Fault Classification
    Tein-Hsiang Lin, SUNY at Buffalo
    Kang G. Shin, University of Michigan ...................................  58

Probabilistic Language Analysis of Weighted Voting Algorithms
    Louise E. Moser, Vikas Kapur, and P. M. Melliar-Smith,
    University of California, Santa Barbara ................................  67

An Evaluation of Redundant Arrays of Disks using an Amdahl 5890
    Peter M. Chen, Garth A. Gibson, Randy H. Katz, and David A. Patterson,
    University of California, Berkeley .....................................  74

SESSION 4: COMPUTER NETWORKS AND CONCURRENCY CONTROL

Simultaneous Analysis of Flow and Error Control Strategies With
Congestion-Dependent Errors
    Amarnath Mukherjee, Lawrence H. Landweber, and John C. Strikwerda,
    University of Wisconsin-Madison ........................................  86

Queueing Analysis of an ATM Switch with Multichannel Transmission Groups
    Arthur Y. M. Lin and John A. Silvester,
    University of Southern California ......................................  95

Approximate Analysis of Reader and Writer Access to a Shared Resource
    Theodore Johnson, Courant Institute, NYU ............................... 105

SESSION 5: PERFORMANCE OPTIMIZATION AND TUNING

Quartz: A Tool for Tuning Parallel Program Performance
    Thomas E. Anderson and Edward D. Lazowska,
    University of Washington ............................................... 115

A Calculus of Variations Approach to File Allocation Problems in Computer Systems
    Krishna R. Pattipati, University of Connecticut
    Joel Wolf, IBM T. J. Watson Research Center
    Somnath Deb, University of Connecticut ................................. 126

SESSION 6: MEMORY MANAGEMENT

Data Cache Management Using Frequency-Based Replacement
    John T. Robinson and Murthy V. Devarakonda,
    IBM T. J. Watson Research Center ....................................... 134

An Approximate Analysis of the LRU and FIFO Buffer Replacement Schemes
    Asit Dan and Don Towsley,
    University of Massachusetts, Amherst ................................... 143

An Advisor for Flexible Working Sets
    Raphael Alonso and Andrew W. Appel,
    Princeton University ................................................... 153

SESSION 7: MULTIPROCESSOR INTERCONNECTS

Analysis of Critical Architectural and Programming Parameters in a
Hierarchical Shared Memory Multiprocessor
    Josep Torrellas, John Hennessy, and Thierry Weil,
    Stanford University .................................................... 163

Performance Evaluation of a Commercial Cache-Coherent Shared Memory
Multiprocessor
    Rajeev Jog, Philip L. Vitale, and James R. Callister,
    Hewlett-Packard ........................................................ 173

Performance Analysis of the Connection Machine
    Erol Gelenbe, EHEI-Paris ............................................... 183

An Analytic Model of Multistage Interconnection Networks
    Darryl L. Willick and Derek L. Eager,
    University of Saskatchewan ............................................. 192

SESSION 8: PROCESSOR SCHEDULING

Dynamic Partitioning in a Transputer Environment
    K. Dussa, B. Carlson, L. Dowdy, and K.-H. Park,
    Vanderbilt University .................................................. 203

Processor Scheduling in Shared Memory Multiprocessors
    John Zahorjan and Cathy McCann,
    University of Washington ............................................... 214

The Performance of Multiprogrammed Multiprocessor Scheduling Algorithms
    Scott T. Leutenegger and Mary K. Vernon,
    University of Wisconsin-Madison ...........................................

POSTER SESSION EXTENDED ABSTRACTS

Efficient Simulation of Multiprogramming
    W. P. Dawkins, V. Debbad, J. R. Jump, and J. B. Sinclair,
    Rice University ........................................................ 237

Making Flow Control Work in Networks: A Control-Theoretic Analysis of
Gateway Service Disciplines
    Scott Shenker, Xerox PARC .............................................. 239

Making Greed Work in Networks: A Game-Theoretic Analysis of Gateway
Service Disciplines
    Scott Shenker, Xerox PARC .............................................. 241

Factors Affecting the Performance of Multiuser Database Management Systems
    Shahram Ghandeharizadeh and David J. DeWitt,
    University of Wisconsin ................................................ 243

A Benchmark of NonStop SQL Release 2 Demonstrating Near-Linear Speedup
and Scaleup on Large Databases
    Susanne Englert, Jim Gray, Terrye Kocher, and Praful Shah,
    Tandem Computers, Inc. ................................................. 245

Phased Mission Reliability Analysis
    Arun K. Somani, James A. Ritcey, and Stephen H. L. Au,
    University of Washington ............................................... 247

Performance Analysis of a Fault Tolerant Computer System
    Lionel C. Mitchell, CTA Incorporated ................................... 249

Ray Tracing on Distributed Memory Parallel Systems
    David W. Jensen and Daniel A. Reed,
    University of Illinois ................................................. 251

Characterizing and Modeling Ethernet Performance of Distributed
DECwindows Applications
    Dinesh Mirchandani and Prabuddha Biswas,
    Digital Equipment Corporation .......................................... 253

Challenges in Obtaining Peak Parallel Performance with a Convex C240,
a Parallel Vector Processor
    Patrick F. McGehearty, Convex Computer Corporation ..................... 255

Traffic Characterization of the NSFNET National Backbone
    Steven A. Heimlich, University of Maryland ............................. 257

Ease: An Environment for Architecture Study and Experimentation
    Jack W. Davidson and David B. Whalley,
    University of Virginia ................................................. 259

Dynamic Queue Behavior in Networks with Window Protocols
    John G. Waclawsky and Ashok K. Agrawala,
    University of Maryland ................................................. 261

The Performance of Multistage Interconnection Networks with Finite Buffers
    John D. Garofalakis and Paul G. Spirakis,
    Computer Technology Institute, Patras .................................. 263

Adaptive Window Flow Control and Learning Algorithms for Adaptive Routing
in Data Networks
    Athanasios V. Vasilakos, Christos A. Moschonas and Constantinos T. Paximadis,
    University of Patras ................................................... 265

Modeling a Circuit Switched Multiprocessor Interconnect
    Daniel Nussbaum, Ingmar Vuong-Adlerberg, and Anant Agarwal,
    MIT .................................................................... 267

TUTORIALS .................................................................. 270

AUTHOR INDEX ............................................................... 273
Data Cache Management Using Frequency-Based Replacement

John T. Robinson
Murthy V. Devarakonda

IBM Research Division, T. J. Watson Research Center
P. O. Box 704, Yorktown Heights, NY 10598
Abstract. We propose a new frequency-based replacement algorithm for managing caches used for disk blocks by a file system, database management system, or disk control unit, which we refer to here as data caches. Previously, LRU replacement has usually been used for such caches. We describe a replacement algorithm based on the concept of maintaining reference counts in which locality has been "factored out". In this algorithm replacement choices are made using a combination of reference frequency and block age. Simulation results based on traces of file system and I/O activity from actual systems show that this algorithm can offer up to 34% performance improvement over LRU replacement, where the improvement is expressed as the fraction of the performance gain achieved between LRU replacement and the theoretically optimal policy in which the reference string must be known in advance. Furthermore, the implementation complexity and efficiency of this algorithm is comparable to one using LRU replacement.
1. Introduction

In any memory hierarchy performance can be improved by caching data from a lower level of the hierarchy in a higher level. The goal in designing a cache management algorithm is to achieve as low a cache miss ratio as possible subject to constraints on the complexity of the algorithm. Cache management algorithms have been most widely studied in the context of CPU caches (for example see [SMIT82]), in which there are significant complexity constraints. There have also been numerous studies concerning virtual memory management, in which the complexity constraints are somewhat relaxed (a hit must be handled quickly, e.g. by means of address translation and the setting of a reference bit, whereas a miss can be handled by operating system software). Here, however, we are primarily concerned with caches residing in main memory that are managed entirely by software. Examples include caches for file systems and database management systems. In these cases the average access time for file or database disk blocks is reduced by storing some subset of these blocks in the cache. Another example is a cache within a disk control unit (disk cache), managed by control unit microcode, which can be used to reduce the average device access time without involving host system software. Here we will refer to these various types of caches as data caches.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1990 ACM 089791-359-0/90/0005/0134 $1.50
The design issues for data caches are somewhat different than those for CPU caches or virtual memory. For example, because the cache is managed by software (or control unit microcode), a strict implementation of LRU replacement is possible. In addition to relaxed complexity constraints, the reference behavior may be different.
Previous studies of file system caches and disk caches include [OUST85] and [SMIT85], respectively. Issues of block size, cache size, and so on were investigated, but in both studies an LRU replacement policy was assumed throughout. On the other hand, various replacement policies were studied for database caches in [EFFE84]. Due to factors related to a number of other design issues, this study did not conclusively show that any one replacement policy was best. However, it was concluded that LRU replacement and the CLOCK algorithm (which approximates LRU replacement using reference bits, and has been used in virtual memory management) gave satisfactory overall behavior.
In terms of differences in reference behavior, in general it seems that there is less locality (the degree to which recently referenced data tends to be re-referenced) in data caches as compared to CPU caches or virtual memory systems (this was one of the conclusions of [EFFE84] for database caches). If there were in fact only a small degree of dependence of each reference on previous references, then it is possible that an independent reference assumption (in which each block is referenced with a given fixed probability) could be used to accurately model reference behavior. In such a case the optimal cache management policy would be to keep the blocks with the highest reference probabilities in the cache at all times (see, e.g., [COFF73]). In practice of course there would be a problem in determining these probabilities. However, given a reference string produced by a trace, the blocks can be sorted by reference frequency, and this policy can be simulated. It has been our experience that for the file system traces we describe later this leads to poor performance as compared to LRU replacement. The conclusion is that although there is less locality than in CPU caches or virtual memory, there is still sufficient locality so that LRU replacement performs quite well. Similarly, in [EFFE84] replacement algorithms based on estimating reference probabilities did not consistently outperform those using LRU replacement. This agrees with general experience: currently, LRU replacement is almost universally used in software managed buffers or caches.
Here, we study a new frequency-based replacement cache management algorithm based on the concept of "factoring out locality" from reference counts. Like previously studied algorithms, reference counts are maintained for each block in the cache; unlike previous algorithms, reference counts are not incremented on every reference. Instead, reference counts are incremented only for those blocks that have aged out of a "new section" of the cache. Another novel aspect of the algorithm is that replacement choices are confined to those blocks that have aged into an "old section" of the cache. These techniques and the intuition behind their use are discussed in Section 2, in which we describe the frequency-based replacement cache management algorithm.

In Section 3 we present performance analysis results for the new algorithm. These results were generated via detailed trace-driven simulations for five actual systems, with different hardware configurations, operating systems, and/or workload characteristics. The performance of the new algorithm is compared to LRU replacement and the theoretically optimal replacement algorithm in which the reference string must be known in advance. The results indicate that significant performance improvements can be obtained for most systems at small to intermediate cache sizes, and for all systems at intermediate cache sizes (for increasingly large cache sizes, the performance of all three replacement algorithms tends to converge). Finally, in Section 4 we discuss the major conclusions of our study, and identify some areas for further research.
2. Frequency-Based Replacement

In this section we present the frequency-based replacement (FBR) algorithm by means of a sequence of refinements. We discuss (1) the use of a "new section" which controls the incrementing of reference counts; (2) the use of an "old section" to confine replacement choices to blocks that have not been too recently referenced; and (3) periodically adjusting reference counts so as to approximate reference frequencies and avoid practical problems. Finally we describe an implementation which is comparable in complexity to an implementation of a cache management algorithm using LRU replacement.
2.1. Factoring Out Locality

The basis of our cache management algorithm is the LRU replacement algorithm, in which logically the cache consists of a stack of blocks, with the most recently referenced block always pushed on the top of the stack. Unlike
[Figure 2.1. Use of the New Section: on a hit to a block in the new section its count is unchanged; on a hit to a block below the new-section boundary, count := count + 1; on a miss, the incoming block's count is set to one.]
LRU replacement, the least recently used block will not necessarily be selected for replacement on a cache miss. Instead, a reference count is maintained for each block in the cache, and in general the blocks with the smallest reference counts will be candidates for replacement.

When a block is first brought into the cache, its reference count is initialized to one. Previous algorithms using reference counts have incremented the count for a given block on every reference to that block. This results in the following kind of problem: certain blocks are relatively infrequently referenced overall, and yet when they are referenced, due to locality there are short intervals of repeated re-references, thus building up high reference counts. After such an interval is over, the high reference count is misleading: it is due to locality, and cannot be used to estimate the probability that such a block will be re-referenced following the end of this interval (this description is intuitive, since in practice such intervals are not as well-defined as the preceding discussion may suggest).

Here, this problem is addressed by "factoring out locality" from reference counts, as follows. A certain portion of the top part of the stack is set aside as a new section. When there is a cache hit, if the block is in the new section then its reference count is not incremented, but it is incremented otherwise. This is illustrated in Figure 2.1. Given a sufficiently large new section, this results in the reference counts for blocks that are repeatedly re-referenced within a short interval remaining unchanged. The fraction of blocks in the new section, F_new, is a parameter of the algorithm.
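As a concreteness check, the new-section rule can be sketched in a few lines of Python (the function and variable names are ours, not the paper's; the real implementation in Section 2.4 uses linked cache directory entries rather than a list):

```python
def on_hit(stack, counts, block, f_new):
    """Handle a cache hit; stack[0] is the most recently used block."""
    new_section_size = max(1, int(f_new * len(stack)))
    if stack.index(block) >= new_section_size:
        counts[block] += 1        # hit below the new section: count it
    # hits inside the new section leave the count unchanged ("factored out")
    stack.remove(block)
    stack.insert(0, block)        # block moves to the top of the LRU stack

stack = ["a", "b", "c", "d"]      # "a" is most recently used
counts = {b: 1 for b in stack}
on_hit(stack, counts, "a", f_new=0.5)   # inside the new section
on_hit(stack, counts, "d", f_new=0.5)   # outside the new section
```

Repeated hits to a block sitting near the top of the stack leave its count at one, while a block re-referenced only after drifting below the boundary does accumulate count.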
2.2. Confining Replacement

The technique just described for factoring out locality from reference counts can be used without further refinement as the basis of a replacement policy: on a miss, select the block with the smallest reference count that is not in the new section for replacement, choosing the least recently used such block if there is more than one.

Preliminary experiments with the policy just described showed that although it was sometimes possible to obtain slightly better performance than LRU replacement, consistently significant improvements could not be obtained, and
`
[Figure 2.2. Three Sections of the Cache: the LRU stack is divided into a new section at the top, a middle section, and an old section at the bottom, delimited by the new boundary and the old boundary.]
often the performance was worse. The problem seemed to be the following: (1) on a cache miss, a new block is brought into the new section and initialized with a count of one; (2) the count remains at one as long as the block remains in the new section; (3) eventually the block ages out of the new section, with its count still at one; (4) if the block is not now re-referenced fairly quickly it is very likely to be replaced, since it necessarily has the smallest reference count of those blocks that are not in the new section, and all other less recently used blocks with a count of one may have already been replaced. In other words, there did not seem to be a sufficiently long interval for blocks aging out of the new section to build up their reference counts even if they were relatively frequently referenced.
This problem is addressed by confining replacement, as follows. A portion of the bottom part of the stack is defined as the old section, and we call the remaining part of the stack between the new and old sections the middle section. This is illustrated in Figure 2.2. Using the old section, the replacement policy is as follows: on a cache miss, select the block with the smallest reference count in the old section for replacement, choosing the least recently used such block if there is more than one. Assuming a sufficiently large middle section, intuitively this allows relatively frequently referenced blocks a chance to build up their reference counts before becoming eligible for replacement. The fraction of blocks in the old section, F_old, is another parameter of the algorithm, where we require that F_old ≤ 1 − F_new. Note that (1) by setting F_old = 1 − F_new, the middle section is empty, and we have the replacement policy described at the beginning of this section; and (2) by setting F_old = 1/(cache size), the old section consists of one block, and we have the LRU replacement policy.
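Under stated assumptions (a Python list standing in for the LRU stack, and our own names), the confined replacement choice can be sketched as:

```python
def choose_victim(stack, counts, f_old):
    """Pick the replacement victim; stack[0] is most recently used."""
    old_size = max(1, int(f_old * len(stack)))
    old_section = stack[-old_size:]              # bottom of the LRU stack
    min_count = min(counts[b] for b in old_section)
    # scan from the tail so ties go to the least recently used block
    for b in reversed(old_section):
        if counts[b] == min_count:
            return b

stack = ["a", "b", "c", "d"]                     # "d" is least recently used
counts = {"a": 3, "b": 1, "c": 2, "d": 2}
victim = choose_victim(stack, counts, f_old=0.5)          # old section: c, d
lru_victim = choose_victim(stack, counts, f_old=1 / len(stack))
full_victim = choose_victim(stack, counts, f_old=1.0)     # old section: whole stack
```

With F_old = 1/(cache size) the old section is a single block and the choice degenerates to plain LRU, exactly as noted in the text; with the old section covering the whole stack, the globally smallest count wins instead.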
2.3. Adjusting Reference Counts

There are some practical problems associated with the use of reference counts. One is overflow, which is easily dealt with by not incrementing a count if it is currently at a given maximum value. Another problem, not quite so easily handled, is that even using the new section as described above, certain blocks may build up high reference counts and never be replaced: such blocks are essentially fixed in the cache. Now it could be that such blocks should be fixed in the cache since they are among the most frequently referenced blocks. However, in the absence of additional information, it must be assumed that it is equally likely that the high reference frequency for such a block is short-lived. In this case fixing such blocks in the cache would cause a gradual degradation in performance. Another issue is relating reference counts to reference frequency (that is, references per unit of time, where time could be defined as total references).
One approach to these issues was described in [EFFE84], where the concept of reference density was introduced. There, the reference density of a block was defined as the reference count for the block divided by the total number of references made since the block was brought into the cache. Unfortunately reference density, defined in this fashion, is a floating point number, and given a set of blocks, there is no simple fast way to find the block with the minimum reference density (at least compared to the implementation we discuss below, in which the least recently used block with the smallest reference count in the old section is almost always found directly). For this reason we follow another approach mentioned in [EFFE84], which was classified there as "periodic aging by division".
More precisely, we dynamically maintain the sum of all reference counts. Whenever the average reference count exceeds a predetermined maximum value, A_max (another parameter of the algorithm), every reference count C is reduced to ⌈C/2⌉. Thus, in the steady state the sum of all reference counts stays between N × A_max/2 and N × A_max (where N is the number of cache blocks), and the reference count of a block is approximately linearly related to the frequency of references to the block while it is in the middle and old sections. Note that in this reduction of reference counts, a count of one remains at one, a count of two is reduced to one, a count of three is reduced to two, etc. As will later be seen, most blocks selected for replacement have a reference count of one, and it is desirable for efficiency to maintain this property in the steady state. Therefore, after reference counts have been reduced, we want to have a count of two reduced to one (since using division with a fixed divisor and integer arithmetic, the only other alternative is to have it remain at two indefinitely). For this reason we did not make the divisor a parameter above (as it was in [EFFE84]), since division by two using the ceiling function yields the desired behavior. Finally, we did not consider reduction of reference counts by subtraction, since in that case there is no intuitive relationship between reference counts and reference frequencies.
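A minimal sketch of this "periodic aging by division" step (our naming; the paper maintains the sum of counts incrementally rather than recomputing it on each check):

```python
import math

def maybe_reduce(counts, a_max):
    """Halve all counts (rounding up) once the average exceeds a_max."""
    if sum(counts.values()) > a_max * len(counts):
        for b in counts:
            counts[b] = math.ceil(counts[b] / 2)   # ceiling: 2 -> 1, 1 -> 1

counts = {"a": 1, "b": 2, "c": 3, "d": 8}
maybe_reduce(counts, a_max=3)    # average is 3.5, which exceeds 3: reduce
```

After reduction the counts are 1, 1, 2, 4: a count of one stays one and a count of two drops back to one, preserving the property that most replacement victims have a count of one.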
2.4. Implementation

The preceding discussions essentially define the frequency-based replacement algorithm at a logical level; in this section we discuss an efficient implementation. As a starting point we assume the following typical implementation of a cache management algorithm using LRU replacement: (1) the cache directory consists of a collection of cache directory entries, one for each block in the cache; (2) each cache directory entry (CDE) describes the contents of the associated cache block (e.g. file name and block number) along with any additional information associated with that block; (3) a CDE for a given block is searched for using a hash table, where CDEs with the same hash value are linked together in a "synonym" chain to handle collisions (for example); (4)
the LRU stack is implemented as a doubly-linked list of CDEs, called the LRU chain, and there are pointers to the head (most recently used block) and tail (least recently used block) of the list; (5) on a cache hit the CDE for the block is moved to the head of the list; and (6) on a cache miss the block associated with the CDE at the tail of the list is selected to be replaced, this CDE is set to refer to the new block, and is moved to the head of the list. Since there is a one-to-one correspondence between the CDEs and the blocks in the cache, we will sometimes use the term "block" for CDE without ambiguity.
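For reference, this LRU baseline can be sketched in Python with an OrderedDict standing in for the hash table plus doubly-linked LRU chain (the paper's implementation links cache directory entries explicitly; this compressed form is only illustrative):

```python
from collections import OrderedDict

class LRUCache:
    """Steps (1)-(6) above, collapsed into a dictionary keyed by block id."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.chain = OrderedDict()        # front of dict = head of LRU chain

    def access(self, block):
        """Return True on a hit, False on a miss."""
        if block in self.chain:
            self.chain.move_to_end(block, last=False)   # hit: move to head
            return True
        if len(self.chain) >= self.capacity:
            self.chain.popitem(last=True)               # miss: evict the tail
        self.chain[block] = None
        self.chain.move_to_end(block, last=False)       # new block at head
        return False

cache = LRUCache(2)
cache.access("a")
cache.access("b")
cache.access("a")          # hit
cache.access("c")          # miss: evicts "b", the least recently used block
```

The frequency-based algorithm below keeps this exact structure and adds reference counts, section boundaries, and count chains on top of it.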
First we describe how the three cache sections are implemented. In addition to the pointers to the head and tail of the LRU chain, pointers to the most recently used blocks in the middle and old sections, respectively called the new boundary and the old boundary, are maintained. These pointers were indicated earlier in Figure 2.2. Also, each CDE contains two boolean fields: (1) a new field, which is true if the block is in the new section and false otherwise, and (2) an old field, which is true if the block is in the old section and false otherwise. As blocks are moved in the LRU chain the boundary pointers are adjusted so as to keep the sizes of each section constant, and the new and old fields of CDEs are set appropriately. For example, suppose there were a hit on a block in the old section. This block will be moved out of the old section to the head of the LRU chain. Therefore, in this case: (1) the new boundary is set to point to the next more recently used block in the LRU chain, and the new field for this block is set false; (2) the old boundary is set to point to the next more recently used block in the LRU chain, and the old field for this block is set true; and last (3) the block that was referenced is moved to the head of the LRU chain, with new set true and old set false.
Next we describe how the least recently used block with the smallest reference count in the old section is found efficiently. Each CDE contains a count field used for the reference count, and two additional pointer fields that are used when the CDE is in a count chain, which we now define. For each reference count value 1, 2, 3, ..., Cmax (where Cmax is a parameter), a count chain is maintained for that value. Count chain i is a doubly-linked list of all blocks with a reference count of i, in the same order that they appear in the LRU chain. Pointers to the head and tail of each count chain are maintained. The algorithm for finding a block to replace is as follows.
1. Do the following for i = 1 to Cmax:
       if count chain i is not empty then
           if old is true for the least recently used block in this chain then
               select this block to replace and exit.

2. If no block was found in step (1), select the least recently used block to replace.
Note that if the smallest reference count for blocks in the old section is greater than Cmax, the least recently used block is replaced by default. We will later see that even with fairly small values for Cmax this happens very rarely. In fact, most of the time (typically more than 90%) a block with a reference count of one is replaced, and of the remaining cases it is most likely for a block with a reference count of two to be replaced, etc. Thus the above search algorithm gives extremely good expected performance, since the expected number of blocks examined is very close to one.
Although there are multiple count chains, each block is in at most one such chain, and therefore the overhead for maintaining the count chains is on the average less than that of maintaining the LRU chain. Maintenance of the count chains can be summarized by the following rules: (1) on a miss, the block being replaced is removed from its count chain (if it is in one), and the new block is put at the head of count chain 1; (2) on a hit to the new section, the block is moved to the head of its count chain (if it is in one); and (3) on a hit to a block that is not in the new section with reference count C, the block is removed from its count chain (if it is in one), C is incremented, and then if C ≤ Cmax the block is put at the head of count chain C.
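The three maintenance rules can be sketched directly. As before, this is our own list-based illustration (the paper uses doubly-linked chains), and the function names are hypothetical; the "(if it is in one)" qualifiers become explicit membership checks:

```python
from collections import defaultdict

class Block:
    """Minimal stand-in for a CDE: a name, a reference count, a new flag."""
    def __init__(self, name):
        self.name = name
        self.count = 0
        self.new = False    # true while the block is in the new section

def _remove(chains, block):
    """Remove a block from its count chain, if it is in one."""
    chain = chains[block.count]
    if block in chain:
        chain.remove(block)

def on_miss(chains, victim, new_block):
    """Rule 1: drop the victim from its chain; new block heads chain 1."""
    if victim is not None:
        _remove(chains, victim)
    new_block.count = 1
    chains[1].insert(0, new_block)       # head = most recently used

def on_hit(chains, block, cmax):
    """Rule 2: in the new section, move-to-head only.
    Rule 3: otherwise remove, increment C, reinsert if C <= Cmax."""
    if block.new:
        if block in chains[block.count]:
            chains[block.count].remove(block)
            chains[block.count].insert(0, block)
    else:
        _remove(chains, block)
        block.count += 1
        if block.count <= cmax:
            chains[block.count].insert(0, block)
```

Note that rule 2 deliberately leaves the reference count unchanged, which is what confines counting to re-references outside the new section.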
`
Given the discussion in this and the preceding sections, implementation of the frequency-based replacement algorithm using the above techniques is straightforward. This concludes our description of an efficient implementation.
`
3. Performance Analysis

This section describes the trace-driven simulation of data caches, and the results obtained from our analysis. First we describe the traces that were used.
`
3.1. Description of the Traces

The traces used in the simulations were obtained from actual systems running