`MODELING OF COMPUTER SYSTEMS
`BOULDER, COLORADO USA
MAY 22-25, 1990
`
`PROCEEDINGS
`
SIGMETRICS
`
`EMC
EXHIBIT 1007
`
`
`
TX 3 323 896
`
`PERFORMANCE EVALUATION REVIEW
`
`SPECIAL ISSUE
`
`VOLUME 18
`
`NO. 1 MAY, 1990
`
`1990 ACM SIGMETRICS
`
`CONFERENCE ON
`MEASUREMENT AND MODELING
`
`OF COMPUTER SYSTEMS
`
`May 22-25, 1990
`
`University of Colorado
`
`Boulder, Colorado
`
`
`
`
`
`
`
The Association for Computing Machinery
11 West 42nd Street
New York, New York 10036
`
`© 1990 by the Association for Computing Machinery, Inc. Copying without fee is permitted
`provided that the copies are not made or distributed for direct commercial advantage, and credit
`to the source is given. Abstracting with credit is permitted. For other copying of articles that
`carry a code at the bottom of the first page, copying is permitted provided that the per-copy fee
`indicated in the code is paid through the Copyright Clearance Center, 27 Congress Street, Salem,
`MA 01970. For permission to republish write to: Director of Publications, Association for
`Computing Machinery. To copy otherwise, or republish, requires a fee and/or specific permission.
`
ISBN 0-89791-359-0
`
Additional copies may be ordered prepaid from:

ACM Order Department
P.O. Box 64145
Baltimore, MD 21264

Price: Members ............ $16.00
       All others ......... $22.00

ACM Order Number: 488900
`
`
`
`
TABLE OF CONTENTS

SESSION 1: PARALLEL PROCESSING

Unboundedly Parallel Simulations Via Recurrence Relations
Albert G. Greenberg and Boris D. Lubachevsky, AT&T Bell Labs
Isi Mitrani, University of Newcastle upon Tyne .....................................

A Performance Evaluation of a General Parallel Processing Model
Randolph Nelson, IBM T. J. Watson Research Center .................................. 13

SESSION 2: TRACE-DRIVEN SIMULATION

Efficient Trace-Driven Simulation Methods for Cache Performance Analysis
Wen-Hann Wang, IBM T. J. Watson Research Center
Jean-Loup Baer, University of Washington ........................................... 27

Techniques for Efficient Inline Tracing on a Shared-Memory Multiprocessor
Susan J. Eggers, David R. Keppel, Eric J. Koldinger, and Henry M. Levy,
University of Washington ........................................................... 37

Blocking: Exploiting Spatial Locality for Trace Compaction
Anant Agarwal and Minor Huffman, MIT ............................................... 48

SESSION 3: FAULT TOLERANT SYSTEMS

A Bayesian Approach to Fault Classification
Tein-Hsiang Lin, SUNY at Buffalo
Kang G. Shin, University of Michigan ............................................... 58

Probabilistic Language Analysis of Weighted Voting Algorithms
Louise E. Moser, Vikas Kapur, and P.M. Melliar-Smith,
University of California, Santa Barbara ............................................ 67

An Evaluation of Redundant Arrays of Disks using an Amdahl 5890
Peter M. Chen, Garth A. Gibson, Randy H. Katz, and David A. Patterson,
University of California, Berkeley ................................................. 74

SESSION 4: COMPUTER NETWORKS AND CONCURRENCY CONTROL

Simultaneous Analysis of Flow and Error Control Strategies With Congestion-Dependent Errors
Amarnath Mukherjee, Lawrence H. Landweber, and John C. Strikwerda,
University of Wisconsin-Madison .................................................... 86

Queueing Analysis of an ATM Switch with Multichannel Transmission Groups
Arthur Y.M. Lin and John A. Silvester,
University of Southern California .................................................. 95

Approximate Analysis of Reader and Writer Access to a Shared Resource
Theodore Johnson, Courant Institute, NYU .......................................... 105

SESSION 5: PERFORMANCE OPTIMIZATION AND TUNING

Quartz: A Tool for Tuning Parallel Program Performance
Thomas E. Anderson and Edward D. Lazowska,
University of Washington .......................................................... 115

A Calculus of Variations Approach to File Allocation Problems in Computer Systems
Krishna R. Pattipati, University of Connecticut
Joel Wolf, IBM T. J. Watson Research Center
Somnath Deb, University of Connecticut ............................................ 126

SESSION 6: MEMORY MANAGEMENT

Data Cache Management Using Frequency-Based Replacement
John T. Robinson and Murthy V. Devarakonda,
IBM T. J. Watson Research Center .................................................. 134

An Approximate Analysis of the LRU and FIFO Buffer Replacement Schemes
Asit Dan and Don Towsley,
University of Massachusetts, Amherst .............................................. 143

An Advisor for Flexible Working Sets
Raphael Alonso and Andrew W. Appel,
Princeton University .............................................................. 153

SESSION 7: MULTIPROCESSOR INTERCONNECTS

Analysis of Critical Architectural and Programming Parameters in a Hierarchical Shared Memory Multiprocessor
Josep Torrellas, John Hennessy, and Thierry Weil,
Stanford University ............................................................... 163

Performance Evaluation of a Commercial Cache-Coherent Shared Memory Multiprocessor
Rajeev Jog, Philip L. Vitale, and James R. Callister,
Hewlett-Packard ................................................................... 173

Performance Analysis of the Connection Machine
Erol Gelenbe, EHEI-Paris .......................................................... 183

An Analytic Model of Multistage Interconnection Networks
Darryl L. Willick and Derek L. Eager,
University of Saskatchewan ........................................................ 192

SESSION 8: PROCESSOR SCHEDULING

Dynamic Partitioning in a Transputer Environment
K. Dussa, B. Carlson, L. Dowdy, and K.-H. Park,
Vanderbilt University ............................................................. 203

Processor Scheduling in Shared Memory Multiprocessors
John Zahorjan and Cathy McCann,
University of Washington .......................................................... 214

The Performance of Multiprogrammed Multiprocessor Scheduling Algorithms
Scott T. Leutenegger and Mary K. Vernon,
University of Wisconsin-Madison ....................................................

POSTER SESSION EXTENDED ABSTRACTS

Efficient Simulation of Multiprogramming
W. P. Dawkins, V. Debbad, J. R. Jump, and J. B. Sinclair,
Rice University ................................................................... 237

Making Flow Control Work in Networks: A Control-Theoretic Analysis of Gateway Service Disciplines
Scott Shenker, Xerox PARC ......................................................... 239

Making Greed Work in Networks: A Game-Theoretic Analysis of Gateway Service Disciplines
Scott Shenker, Xerox PARC ......................................................... 241

Factors Affecting the Performance of Multiuser Database Management Systems
Shahram Ghandeharizadeh and David J. DeWitt,
University of Wisconsin ........................................................... 243

A Benchmark of NonStop SQL Release 2 Demonstrating Near-Linear Speedup and Scaleup on Large Databases
Susanne Englert, Jim Gray, Terrye Kocher, and Praful Shah,
Tandem Computers, Inc. ............................................................ 245

Phased Mission Reliability Analysis
Arun K. Somani, James A. Ritcey, and Stephen H.L. Au,
University of Washington .......................................................... 247

Performance Analysis of a Fault Tolerant Computer System
Lionel C. Mitchell, CTA Incorporated .............................................. 249

Ray Tracing on Distributed Memory Parallel Systems
David W. Jensen and Daniel A. Reed,
University of Illinois ............................................................ 251

Characterizing and Modeling Ethernet Performance of Distributed DECwindows Applications
Dinesh Mirchandani and Prabuddha Biswas,
Digital Equipment Corporation ..................................................... 253

Challenges in Obtaining Peak Parallel Performance with a Convex C240, a Parallel Vector Processor
Patrick F. McGehearty, Convex Computer Corporation ................................ 255

Traffic Characterization of the NSFNET National Backbone
Steven A. Heimlich, University of Maryland ........................................ 257

Ease: An Environment for Architecture Study and Experimentation
Jack W. Davidson and David B. Whalley,
University of Virginia ............................................................ 259

Dynamic Queue Behavior in Networks with Window Protocols
John G. Waclawsky and Ashok K. Agrawala,
University of Maryland ............................................................ 261

The Performance of Multistage Interconnection Networks with Finite Buffers
John D. Garofalakis and Paul G. Spirakis,
Computer Technology Institute-Patras .............................................. 263

Adaptive Window Flow Control and Learning Algorithms for Adaptive Routing in Data Networks
Athanasios V. Vasilakos, Christos A. Moschonas and Constantinos T. Paximadis,
University of Patras .............................................................. 265

Modeling a Circuit Switched Multiprocessor Interconnect
Daniel Nussbaum, Ingmar Vuong-Adlerberg, and Anant Agarwal,
MIT ............................................................................... 267

TUTORIALS ......................................................................... 270

AUTHOR INDEX ...................................................................... 273
`
`
`
`
`Data Cache Management Using Frequency-Based Replacement
`
`John T. Robinson
`
`Murthy V. Devarakonda
`
`IBM Research Division, T. J. Watson Research Center
`P. O. Box 704, Yorktown Heights, NY 10598
`
Abstract. We propose a new frequency-based replacement algorithm for managing caches used for disk blocks by a file system, database management system, or disk control unit, which we refer to here as data caches. Previously, LRU replacement has usually been used for such caches. We describe a replacement algorithm based on the concept of maintaining reference counts in which locality has been "factored out". In this algorithm replacement choices are made using a combination of reference frequency and block age. Simulation results based on traces of file system and I/O activity from actual systems show that this algorithm can offer up to 34% performance improvement over LRU replacement, where the improvement is expressed as the fraction of the performance gain achieved between LRU replacement and the theoretically optimal policy in which the reference string must be known in advance. Furthermore, the implementation complexity and efficiency of this algorithm is comparable to one using LRU replacement.
`
`1. Introduction
`
In any memory hierarchy performance can be improved by caching data from a lower level of the hierarchy in a higher level. The goal in designing a cache management algorithm is to achieve as low a cache miss ratio as possible subject to constraints on the complexity of the algorithm. Cache management algorithms have been most widely studied in the context of CPU caches (for example see [SMIT82]), in which there are significant complexity constraints. There have also been numerous studies concerning virtual memory management, in which the complexity constraints are somewhat relaxed (a hit must be handled quickly, e.g. by means of address translation and the setting of a reference bit, whereas a miss can be handled by operating system software). Here, however, we are primarily concerned with caches residing in main memory that are managed entirely by software. Examples include caches for file systems and database management systems. In these
`
`Permission to copy without fee all or part of this material is granted provided
`that the copies are not made or distributed for direct commercial advantage,
`the ACM copyright notice and the title of the publication and its date appear,
`and notice is given that copying is by permission of the Association for
`Computing Machinery. To copy otherwise, or to republish, requires a fee
`and/or specific permission.
© 1990 ACM 089791-359-0/90/0005/0134 $1.50
`
`cases the average access time for file or database disk blocks
`is reduced by storing some subset of these blocks in the
`cache. Another example is a cache within a disk control unit
`(disk cache), managed by control unit microcode, which can
`be used to reduce the average device access time without
`involving host system software. Here we will refer to these
`various types of caches as data caches.
`
The design issues for data caches are somewhat different than those for CPU caches or virtual memory. For example, due to the fact that the cache is managed by software (or control unit microcode), a strict implementation of LRU replacement is possible. In addition to relaxed complexity constraints, the reference behavior may be different.
`
Previous studies of file system caches and disk caches include [OUST85] and [SMIT85], respectively. Issues of block size, cache size, and so on were investigated, but in both studies an LRU replacement policy was assumed throughout. On the other hand, various replacement policies were studied for database caches in [EFFE84]. Due to factors related to a number of other design issues, this study did not conclusively show that any one replacement policy was best. However, it was concluded that LRU replacement and the CLOCK algorithm (which approximates LRU replacement using reference bits, and has been used in virtual memory management) gave satisfactory overall behavior.
`
In terms of differences in reference behavior, in general it seems that there is less locality (the degree to which recently referenced data tends to be re-referenced) in data caches as compared to CPU caches or virtual memory systems (this was one of the conclusions of [EFFE84] for database caches). If there were in fact a small degree of dependence of each reference on previous references, then it is possible that an independent reference assumption (in which each block is referenced with a given fixed probability) could be used to accurately model reference behavior. In such a case the optimal cache management policy would be to keep the blocks with the highest reference probabilities in the cache at all times (see, e.g., [COFF73]). In practice of course there would be a problem in determining these probabilities. However, given a reference string produced by a trace, the blocks can be sorted by reference frequency, and this policy can be simulated. It has been our experience
`
`
`
`
`
that for the file system traces we describe later this leads to poor performance as compared to LRU replacement. The conclusion is that although there is less locality than in CPU caches or virtual memory, there is still sufficient locality so that LRU replacement performs quite well. Similarly, in [EFFE84] replacement algorithms based on estimating reference probabilities did not consistently outperform those using LRU replacement. This agrees with general experience: currently, LRU replacement is almost universally used in software managed buffers or caches.
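The comparison described above is easy to reproduce. The sketch below is our illustration, not code from the paper (the function names are ours): it counts misses both for the static policy that permanently caches the most frequently referenced blocks of a full trace, and for plain LRU replacement, so the two can be compared on any block reference string.

```python
from collections import Counter, OrderedDict

def static_best_misses(trace, cache_size):
    """Misses under the static policy: with the whole trace known,
    permanently cache the most frequently referenced blocks."""
    top = {b for b, _ in Counter(trace).most_common(cache_size)}
    misses, loaded = 0, set()
    for b in trace:
        if b in top and b in loaded:
            continue                      # hit on a fixed block
        misses += 1
        if b in top:
            loaded.add(b)                 # first touch faults it in
    return misses

def lru_misses(trace, cache_size):
    """Misses under plain LRU replacement, for comparison."""
    cache, misses = OrderedDict(), 0
    for b in trace:
        if b in cache:
            cache.move_to_end(b)          # most recently used at the end
        else:
            misses += 1
            if len(cache) >= cache_size:
                cache.popitem(last=False)  # evict least recently used
            cache[b] = True
    return misses
```

On a trace with strong locality (each block referenced in a burst), LRU misses far less often than the static frequency-ranked policy, which is the paper's observation.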
`
Here, we study a new frequency-based replacement cache management algorithm based on the concept of "factoring out locality" from reference counts. Like previously studied algorithms, reference counts are maintained for each block in the cache; unlike previous algorithms, reference counts are not incremented on every reference. Instead, reference counts are incremented only for those blocks that have aged out of a "new section" of the cache. Another novel aspect of the algorithm is that replacement choices are confined to those blocks that have aged into an "old section" of the cache. These techniques and the intuition behind their use are discussed in Section 2, in which we describe the frequency-based replacement cache management algorithm.
`
In Section 3 we present performance analysis results for the new algorithm. These results were generated via detailed trace-driven simulations for five actual systems, with different hardware configurations, operating systems, and/or workload characteristics. The performance of the new algorithm is compared to LRU replacement and the theoretically optimal replacement algorithm in which the reference string must be known in advance. The results indicate that significant performance improvements can be obtained for most systems at small to intermediate cache sizes, and for all systems at intermediate cache sizes (for increasingly large cache sizes, the performance of all three replacement algorithms tends to converge). Finally, in Section 4 we discuss the major conclusions of our study, and identify some areas for further research.
`
`2. Frequency-Based Replacement
`
In this section we present the frequency-based replacement (FBR) algorithm by means of a sequence of refinements. We discuss (1) the use of a "new section" which controls the incrementing of reference counts; (2) the use of an "old section" to confine replacement choices to blocks that have not been too recently referenced; and (3) periodically adjusting reference counts so as to approximate reference frequencies and avoid practical problems. Finally we describe an implementation which is comparable in complexity to an implementation of a cache management algorithm using LRU replacement.
`
`2.1. Factoring Out Locality
`
The basis of our cache management algorithm is the LRU replacement algorithm, in which logically the cache consists of a stack of blocks, with the most recently referenced block always pushed on the top of the stack. Unlike
`
`
Figure 2.1. Use of the New Section. (On a hit in the new section the count is unchanged; on a hit below the new section boundary, count := count + 1; on a MISS the incoming block's count is initialized, count := 1.)
`
LRU replacement, the least recently used block will not necessarily be selected for replacement on a cache miss. Instead, a reference count is maintained for each block in the cache, and in general the blocks with the smallest reference counts will be candidates for replacement.
`
When a block is first brought into the cache, its reference count is initialized to one. Previous algorithms using reference counts have incremented the count for a given block on every reference to that block. This results in the following kind of problem: certain blocks are relatively infrequently referenced overall, and yet when they are referenced, due to locality there are short intervals of repeated re-references, thus building up high reference counts. After such an interval is over, the high reference count is misleading: it is due to locality, and cannot be used to estimate the probability that such a block will be re-referenced following the end of this interval (this description is intuitive, since in practice such intervals are not as well-defined as the preceding discussion may suggest).
`
Here, this problem is addressed by "factoring out locality" from reference counts, as follows. A certain portion of the top part of the stack is set aside as a new section. When there is a cache hit, if the block is in the new section then its reference count is not incremented, but it is incremented otherwise. This is illustrated in Figure 2.1. Given a sufficiently large new section, this results in the reference counts for blocks that are repeatedly re-referenced within a short interval remaining unchanged. The fraction of blocks in the new section, Fnew, is a parameter of the algorithm.
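The hit-handling rule can be sketched as follows. This is our illustration only, not the paper's data structures: the LRU stack is a plain Python list, and `f_new` plays the role of Fnew.

```python
def fbr_hit(stack, counts, block, f_new):
    """Handle a cache hit under the new-section rule: the reference
    count is incremented only if the block has aged out of the new
    section (the most recently used f_new fraction of the stack)."""
    new_size = max(1, int(f_new * len(stack)))
    if stack.index(block) >= new_size:   # index 0 = most recently used
        counts[block] += 1               # hit outside the new section
    stack.remove(block)
    stack.insert(0, block)               # block becomes most recently used
```

A hit in the new section thus leaves the count untouched, so a burst of re-references to one block does not inflate its count.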
`
`2.2. Confining Replacement
`
The technique just described for factoring out locality from reference counts can be used without further refinement as the basis of a replacement policy: on a miss, select the block with the smallest reference count that is not in the new section for replacement, choosing the least recently used such block if there is more than one.
`
Preliminary experiments with the policy just described showed that although it was sometimes possible to obtain slightly better performance than LRU replacement, consistently significant improvements could not be obtained, and
`
`
Figure 2.2. Three Sections of the Cache. (From the top of the LRU stack: the new section, then the middle section, then the old section, separated by the new boundary and the old boundary.)
`
often the performance was worse. The problem seemed to be the following: (1) on a cache miss, a new block is brought into the new section and initialized with a count of one; (2) the count remains at one as long as the block remains in the new section; (3) eventually the block ages out of the new section, with its count still at one; (4) if the block is not now re-referenced fairly quickly it is very likely to be replaced since it necessarily has the smallest reference count of those blocks that are not in the new section, and all other less recently used blocks with a count of one may have already been replaced. In other words, there did not seem to be a sufficiently long interval for blocks aging out of the new section to build up their reference counts even if they were relatively frequently referenced.
`
This problem is addressed by confining replacement, as follows. A portion of the bottom part of the stack is defined as the old section, and we call the remaining part of the stack between the new and old sections the middle section. This is illustrated in Figure 2.2. Using the old section, the replacement policy is as follows: on a cache miss, select the block with the smallest reference count in the old section for replacement, choosing the least recently used such block if there is more than one. Assuming a sufficiently large middle section, intuitively this allows relatively frequently referenced blocks a chance to build up their reference counts before becoming eligible for replacement. The fraction of blocks in the old section, Fold, is another parameter of the algorithm, where we require that Fold ≤ 1 − Fnew. Note that (1) by setting Fold = 1 − Fnew, the middle section is empty, and we have the replacement policy described at the beginning of this section; and (2) by setting Fold = 1/(cache size), the old section consists of one block, and we have the LRU replacement policy.
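A sketch of the resulting victim selection (ours, for illustration; Section 2.4 describes the paper's efficient count-chain implementation, which this linear scan merely approximates):

```python
def select_victim(stack, counts, f_old):
    """On a miss, pick the replacement victim from the old section
    (the least recently used f_old fraction of the stack): smallest
    reference count, ties broken toward the least recently used block."""
    old_size = max(1, int(f_old * len(stack)))
    old_section = stack[-old_size:]          # tail of the LRU stack
    # reversed() visits the least recently used blocks first, and min()
    # keeps the first minimum it sees, so ties go to the LRU block.
    return min(reversed(old_section), key=lambda b: counts[b])
```

Setting `f_old` to one block's worth of the stack degenerates to plain LRU, as the text notes.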
`
`2.3. Adjusting Reference Counts
`
There are some practical problems associated with the use of reference counts. One is overflow, which is easily dealt with by not incrementing a count if it is currently at a given maximum value. Another problem, not quite so easily handled, is that even using the new section as described above, certain blocks may build up high reference counts and never be replaced: such blocks are essentially fixed in the cache. Now it could be that such blocks should be fixed in the cache since they are among the most frequently referenced blocks. However, in the absence of additional information, it must be assumed that it is equally likely that the high reference frequency for such a block is short-lived. In this case fixing such blocks in the cache would cause a gradual degradation in performance. Another issue is relating reference counts to reference frequency (that is, references per unit of time, where time could be defined as total references).
`
One approach to these issues was described in [EFFE84], where the concept of reference density was introduced. There, the reference density of a block was defined as the reference count for the block divided by the total number of references made since the block was brought into the cache. Unfortunately reference density, defined in this fashion, is a floating point number, and given a set of blocks, there is no simple fast way to find the block with the minimum reference density (at least compared to the implementation we discuss below, in which the least recently used block with the smallest reference count in the old section is almost always found directly). For this reason we follow another approach mentioned in [EFFE84], which was classified there as "periodic aging by division".
`
More precisely, we dynamically maintain the sum of all reference counts. Whenever the average reference count exceeds a predetermined maximum value, Amax (another parameter of the algorithm), every reference count C is reduced to ⌈C/2⌉. Thus, in the steady state the sum of all reference counts stays between N × Amax/2 and N × Amax (where N is the number of cache blocks), and the reference count of a block is approximately linearly related to the frequency of references to the block while it is in the middle and old sections. Note that in this reduction of reference counts, a count of one remains at one, a count of two is reduced to one, a count of three is reduced to two, etc. As will later be seen, most blocks selected for replacement have a reference count of one, and it is desirable for efficiency to maintain this property in the steady state. Therefore, after reference counts have been reduced, we want to have a count of two reduced to one (since using division with a fixed divisor and integer arithmetic, the only other alternative is to have it remain at two indefinitely). For this reason we did not make the divisor a parameter above (as it was in [EFFE84]), since division by two using the ceiling function yields the desired behavior. Finally, we did not consider reduction of reference counts by subtraction, since in that case there is no intuitive relationship between reference counts and reference frequencies.
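The aging step can be written directly from this description. This is a sketch under our naming: `a_max` stands for Amax and `n_blocks` for N.

```python
def age_counts(counts, n_blocks, a_max):
    """Periodic aging by division: when the average reference count
    exceeds a_max, replace every count C by ceil(C/2), so a count of
    two decays to one and the sum of counts stays between
    n_blocks * a_max / 2 and n_blocks * a_max in the steady state."""
    if sum(counts.values()) > n_blocks * a_max:
        for b in counts:
            counts[b] = (counts[b] + 1) // 2    # integer ceil(C / 2)
```

Note that `(C + 1) // 2` maps 1 → 1, 2 → 1, and 3 → 2, exactly the behavior the text asks of ceiling division by two.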
`
`2.4. Implementation
`
The preceding discussions essentially define the frequency-based replacement algorithm at a logical level; in this section we discuss an efficient implementation. As a starting point we assume the following typical implementation of a cache management algorithm using LRU replacement: (1) the cache directory consists of a collection of cache directory entries, one for each block in the cache; (2) each cache directory entry (CDE) describes the contents of the associated cache block (e.g. file name and block number) along with any additional information associated with that block; (3) a CDE for a given block is searched for using a hash table, where CDEs with the same hash value are linked together in a "synonym" chain to handle collisions (for example); (4)
`
`
`
`
`
`
`4;-,..-./..—..—...,.\.7......»A«vs<(.-«—Vsun,
`
`
`
`“a
`
`
`
`
`tvar/>3'(J~.“"““'\7r‘*"'-v—-4--wv..,,,.,-.-,_,___
`.,.._.
`
`Av"
`
`“
`
the LRU stack is implemented as a doubly-linked list of CDEs, called the LRU chain, and there are pointers to the head (most recently used block) and tail (least recently used block) of the list; (5) on a cache hit the CDE for the block is moved to the head of the list; and (6) on a cache miss the block associated with the CDE at the tail of the list is selected to be replaced, this CDE is set to refer to the new block, and is moved to the head of the list. Since there is a one-to-one correspondence between the CDEs and the blocks in the cache, we will sometimes use the term "block" for CDE without ambiguity.
`
First we describe how the three cache sections are implemented. In addition to the pointers to the head and tail of the LRU chain, pointers to the most recently used blocks in the middle and old sections, respectively called the new boundary and the old boundary, are maintained. These pointers were indicated earlier in Figure 2.2. Also, each CDE contains two boolean fields: (1) a new field, which is true if the block is in the new section and false otherwise, and (2) an old field, which is true if the block is in the old section and false otherwise. As blocks are moved in the LRU chain the boundary pointers are adjusted so as to keep the sizes of each section constant, and the new and old fields of CDEs are set appropriately. For example, suppose there were a hit on a block in the old section. This block will be moved out of the old section to the head of the LRU chain. Therefore, in this case: (1) new boundary is set to point to the next more recently used block in the LRU chain, and the new field for this block is set false; (2) old boundary is set to point to the next more recently used block in the LRU chain, and the old field for this block is set true; and last (3) the block that was referenced is moved to the head of the LRU chain, with new set true and old set false.
`
Next we describe how the least recently used block with the smallest reference count in the old section is found efficiently. Each CDE contains a count field used for the reference count, and two additional pointer fields that are used when the CDE is in a count chain, which we now define. For each reference count value 1, 2, 3, ..., Cmax (where Cmax is a parameter), a count chain is maintained for that value. Count chain i is a doubly-linked list of all blocks with a reference count of i in the same order that they appear in the LRU chain. Pointers to the head and tail of each count chain are maintained. The algorithm for finding a block to replace is as follows.
`
1. Do the following for i = 1 to Cmax:
      if count chain i is not empty then
         if old is true for the least recently used block in this chain then
            select this block to replace and exit.

2. If no block was found in step (1), select the least recently used block to replace.
`
Note that if the smallest reference count for blocks in the old section is greater than Cmax, the least recently used block is replaced by default. We will later see that even with fairly small values for Cmax this happens very rarely. In fact most of the time (typically more than 90%) a block with a reference count of one is replaced, and of the remaining cases it is most likely for a block with a reference count of two to be replaced, etc. Thus the above search algorithm gives extremely good expected performance, since the expected number of blocks examined is very close to one.
`
Although there are multiple count chains, each block is in at most one such chain, and therefore the overhead for maintaining the count chains is on the average less than that of maintaining the LRU chain. Maintenance of the count chains can be summarized by the following rules: (1) on a miss, the block being replaced is removed from its count chain (if it is in one), and the new block is put at the head of count chain 1; (2) on a hit to the new section, the block is moved to the head of its count chain (if it is in one); and (3) on a hit to a block that is not in the new section with reference count C, the block is removed from its count chain (if it is in one), C is incremented, then if C ≤ Cmax the block is put at the head of count chain C.
`
`Given the discussion in this and the preceding sections,
`implementation of the frequency-based replacement algo-
`rithm using the above techniques is straightforward. This
`concludes our description of an efficient implementation.
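Putting the pieces together, the logical algorithm of Section 2 can be sketched end to end. This is our illustration, not the paper's implementation: plain list scans stand in for the LRU chain and count chains, the class and parameter names (`f_new`, `f_old`, `a_max`) are ours, and the aging check is performed on every reference rather than via a maintained running sum.

```python
class FBRCache:
    """Logical frequency-based replacement: new/old sections over an
    LRU stack, counts incremented only outside the new section,
    victims chosen from the old section, periodic aging by division."""

    def __init__(self, size, f_new=0.25, f_old=0.25, a_max=10):
        assert f_old <= 1.0 - f_new
        self.size, self.a_max = size, a_max
        self.new_size = max(1, int(f_new * size))
        self.old_size = max(1, int(f_old * size))
        self.stack = []          # index 0 = most recently used
        self.count = {}          # block -> reference count

    def reference(self, block):
        """Process one reference; returns True on a hit, False on a miss."""
        hit = block in self.count
        if hit:
            if self.stack.index(block) >= self.new_size:
                self.count[block] += 1        # hit outside the new section
            self.stack.remove(block)
        else:
            if len(self.stack) >= self.size:  # evict from the old section
                old = self.stack[-self.old_size:]
                victim = min(reversed(old), key=self.count.get)
                self.stack.remove(victim)
                del self.count[victim]
            self.count[block] = 1             # new block starts at one
        self.stack.insert(0, block)
        if sum(self.count.values()) > self.size * self.a_max:
            for b in self.count:              # periodic aging by ceil(C/2)
                self.count[b] = (self.count[b] + 1) // 2
        return hit
```

With `f_old` set to one block's worth of the cache, the old section has one block and the sketch behaves as plain LRU, matching observation (2) of Section 2.2.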
`
`3. Performance Analysis
`
`This section describes the trace-driven simulation of data
`caches, and the results obtained from our analysis. First
`we describe the traces that were used.
`
`3.1. Description of the Traces
`
`The traces used in the simulations were obtained from
`actual systems runnin