`ACM SIGMETRICS '2000
`International Conference on Measurement
`and Modeling of Computer Systems
`
`Sponsored by ACM SIGMETRICS
`
`HPE, Exh. 1012, p. 1
`
`
`
`Proceedings
`ACM SIGMETRICS '2000
`International Conference on Measurement
`and Modeling of Computer Systems
`
`Sponsored by ACM SIGMETRICS
`
KURT F. WENDT LIBRARY
COLLEGE OF ENGINEERING
JUN 22 2000
UW-MADISON, WI 53706
`
`The Association for Computing Machinery
`1515 Broadway
New York, N.Y. 10036
`
Copyright © 2000 by Association for Computing Machinery, Inc. (ACM). Permission to make
digital or hard copies of portions of this work for personal or classroom use is granted without fee
provided that the copies are not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. Copyright for components of this
work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy
otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific
permission and/or a fee. Request permission to republish from: Publications Dept. ACM, Inc.
Fax +1 (212) 869-0481 or E-mail permissions@acm.org
`
`For other copying of articles that carry a code at the bottom of the first or last page, copying is
`permitted provided that the per-copy fee indicated in the code is paid through the Copyright
Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, +1-978-750-8400 (phone),
+1-978-750-4470 (fax).
`
Notice to Contributing Authors to SIG Newsletters:

By submitting your article for distribution in this Special Interest Group publication,
you hereby grant to ACM the following non-exclusive, perpetual, worldwide rights:

--to publish in print on condition of acceptance by the editor
--to digitize and post your article in the electronic version of this publication
--to include the article in the ACM Digital Library
--to allow users to copy and distribute the article for noncommercial, educational or research
purposes

However, as a contributing author, you retain copyright to your article and
ACM will make every effort to refer requests for commercial use directly to you.
`
`.
`
`;I
`
`I
`
`I
`
`~ .. , . '
`
`ACM ISBN: 1-58113-194-1
`
`Additional copies may be ordered prepaid from:
`
`ACM Order Department
`P.O. Box 11405
`New York, N.Y. 10286-1405
`
`Phone: 1-800-342-6626
`(U.S.A. and Canada)
`+ 1-212-626-0500
`(All other countries)
`Fax: +1-212-944-1318
`E-mail: acmhelp@acm.org
`
`ACM Order Number: 488000
`Printed in the U.S.A.
`
`TABLE OF CONTENTS
`
`. ..
`.
`.. C
`0
`rgan1z1ng omrmttee .. ................................ _ ........................................................ 111
Referees ........................................................................................................... iv
Message from General Chair ................................................................................... v
Message from Program Co-Chairs ............................................................................ vi
Message from Sigmetrics Chair .............................................................................. vii
`
`Keynote Address: The Internet and Its Future
Leonard Kleinrock, Chairman and Founder, Nomadix, Inc., Professor, Department of Computer
`Science, UCLA
`
`Session 1: Network Architecture and Protocols
`
A Case for End System Multicast .............................................................................. 1
Yang-hua Chu, Sanjay Rao, Hui Zhang, Carnegie Mellon University
`
`PLM: Fast Convergence for Cumulative Layered Multicast Transmission Schemes ............... 13
Arnaud Legout, Ernst W. Biersack, Institut EURECOM
`
On Achievable Service Differentiation with Token Bucket Marking for TCP ........................ 23
Sambit Sahu, University of Massachusetts, Philippe Nain, INRIA Sophia Antipolis,
Don Towsley, University of Massachusetts, Christophe Diot, Sprint ATL, Victor Firoiu, Nortel
`Networks
`
`Session 2: File and Storage Systems
`
`Feasibility of a Serverless Distributed File System Deployed on an Existing Set of
Desktop PCs .................................................................................................... 34
`William J. Bolosky, John R. Douceur, Microsoft Research, David Ely, University of Washington,
`Marvin Theimer, Microsoft Research
`
Comparing Random Data Allocation and Data Striping in Multimedia Servers ...................... 44
`Jose Renato Santos, Richard Muntz, UCLA, Berthier Ribeiro-Neto, Universidade Federal de Minas
`Gerais
`
Modeling and Performance of MEMS-Based Storage Devices .......................................... 56
John Linwood Griffin, Steven W. Schlosser, Gregory R. Ganger, David F. Nagle, Carnegie Mellon
University
`
`viii
`
`Session 3: Web and Multimedia Servers
`
`Implications of Proxy Caching for Provisioning Networks and Servers .................................... 66
`Mohammad S. Raunak, Prashant Shenoy, University of Massachusetts, Pawan Goyal, Ensim
`Corporation, Krithi Ramamritham, University of Massachusetts
`
`Collaborative Web Caching Based on Proxy Affinities .......................................................... 78
`Jiong Yang, Wei Wang, IBM, Richard Muntz, UCLA
`
`Cluster Reserves: A Mechanism for Resource Management in Cluster-Based Network Servers ... 90
`Mohit Aron, Peter Druschel, Willy Zwaenepoel, Rice University
`
`Poster Session
`
`Analysis of the Phenomenon of Several Consecutive Slow Start Phases in TCP ....................... 102
`Chadi Barakat, Eitan Altman, INRIA Sophia Antipolis
`
`Providing Guaranteed Quality of Service for Interactive Visualization Applications ............... 104
`Wai-Man R. Wong, Richard R. Muntz, UCLA
`
`IP Multicast Fault Recovery in PIM over OSPF ......................................................... 106
`Xin Wang, Chienming Yu, Henning Schulzrinne, Columbia University,
`Paul Stirpe, Wei Wu, Reuters
`
`Cell-based Multicast Grouping in Large-Scale Virtual Environments ................................ 108
`Emmanuel Lety, Thierry Turletti, INRIA Sophia Antipolis, Francois Baccelli, INRIA/ENS
`
Temporal Locality in Web Request Streams: Sources, Characteristics, and Caching
Implications ................................................................................................... 110
`Azer Bestavros, Shudong Jin, Boston University
`
`Automated Disk Drive Characterization ................................................................. . 112
`Jiri Schindler, Gregory Ganger, Carnegie Mellon University
`
`Online Superpage Promotion Revisited ................................................................. ... 114
Zhen Fang, Lixin Zhang, John Carter, Sally McKee, Wilson C. Hsieh, University of Utah
`
`An Inherently Loss-Less and Bandwidth Efficient Periodic Broadcast Scheme for VBR Video .... 116
Ioanis Nikolaidis, Fulu Li, Ailan Hu, University of Alberta
`
`An Analysis of Short-Term Fairness in Wireless Media Access Protocols ........................... 118
`Can Emre Koksal, Hisham Kassab, Hari Balakrishnan, MIT
`
`RESCU: Dynamic Hybrid Packet Loss Recovery for Video Transmission over the Internet ....... 120
`Srinath R. Joshi, Injong Rhee, North Carolina State University
`
`ix
`
The Content and Access Dynamics of a Busy Web Server .............................................. 122
`Venkata N. Padmanabhan, Microsoft Research, Lili Qiu, Cornell University
`
Session 4: Networking: Congestion Control
`
TCP in Presence of Bursty Losses ........................................................................... 124
`Eitan Altman, Konstantin Avrachenkov, Chadi Barakat, INRIA Sophia Antipolis
`
The Incremental Deployability of RTT-Based Congestion Avoidance for High Speed TCP Internet
Connections ................................................................................................... 134
Jim Martin, GartnerGroup, Arne Nilsson, Injong Rhee, North Carolina State University
`
Detecting Shared Congestion of Flows Via End-to-end Measurement ................................ 145
`Dan Rubenstein, Jim Kurose, Don Towsley, University of Massachusetts
`Best Student Paper Award
`
Session 5: Network Measurement and Performance Modeling
`
Measurement and Analysis of LDAP Performance ...................................................... 156
`Xin Wang, Henning Schulzrinne, Columbia University, Dilip Kandlur, Dinesh Verma, IBM T.J.
`Watson Research Center
`
`IP Packet Generation: Statistical Models for TCP Start Times Based on Connection-Rate
Superposition ................................................................................................. 166
`William S. Cleveland, Dong Lin, Don X. Sun, Bell Laboratories Lucent Technologies
`
On the Impact of Soft Hand-off in Cellular Systems ..................................................... 178
`Nidhi Hegde, Khosrow Sohraby, University of Missouri-Kansas City
`
`Session 6: Queueing and Performance Evaluation Techniques
`
Delay Asymptotics for a Priority Queueing System ..................................................... 188
Sanjay Shakkottai, R. Srikant, University of Illinois at Urbana-Champaign
`
`A Fast and Accurate Iterative Solution of a Multi-Class Threshold-based Queueing System with
Hysteresis ..................................................................................................... 196
`Leana Golubchik, University of Maryland, John C.S. Lui, The Chinese University of Hong Kong
`
`Using the Exact State Space of a Markov Model to Compute Approximate Stationary Measures 207
Andrew S. Miner, Gianfranco Ciardo, College of William and Mary,
`Susanna Donatelli, Universita di Torino
`
x
`
`AMVA Techniques for High Service Time Variability ......................................................... 217
`Derek L. Eager, Univ. of Saskatchewan, Daniel J. Sorin, Mary K. Vernon, University of Wisconsin
`
`Session 7: Tools and Benchmarking
`
Efficient Performance Prediction for Modern Microprocessors ....................................... 229
`David Ofelt, Juniper Networks, John Hennessy, Stanford University
`
Improving Interactive System Performance Using TIPME ............................................. 240
`Yasuhiro Endo, Network Appliance, Margo Seltzer, Harvard University
`
`Quantifying the Energy Consumption of a Pocket Computer and a Java Virtual Machine ......... 252
`Keith Farkas, Compaq Western Research Lab, Jason Flinn, Carnegie Mellon University,
`Godmar Back, University of Utah, Dirk Grunwald, University of Colorado,
`Jennifer Anderson, VMware
`
`Session 8: Memory Management and Databases
`
`Memory System Behavior of Java Programs: Methodology and Analysis ............................ 264
`Jin-Soo Kim, Seoul National University, Yarsun Hsu, National Tsing Hua University
`Best Systems Paper Award
`
`An Analytical Model of the Working-Set Sizes in Decision-Support Systems ........................ 275
`Magnus Karlsson, HP Labs, Fredrik Dahlgren, Ericsson, Per Stenstroem, Chalmers University
`
`Towards Application/File-level Characterization of Block References:
`A Case for Fine-Grained Buffer Management ................................................... .......... 286
Jongmoo Choi, Seoul National University, Sam H. Noh, Hong-Ik University, Sang Lyul Min,
`Yookun Cho, Seoul National University
`
`Session 9: Network Routing
`
`Multicast Routing with Bandwidth Guarantees: A New Approach Using Multicast Network
Flow .......................................................................................................... 296
`Murali S. Kodialam, T. V. Lakshman, Bell Laboratories Lucent Technologies,
`Sudipta Sengupta, MIT
`
`Stable Internet Routing Without Global Coordination ................................................... 307
`Lixin Gao, Smith College, Jennifer Rexford, AT&T Labs Research
`
`An Efficient Algorithm for Finding a Path Subject to Two Additive Constraints ..................... 318
Turgay Korkmaz, Marwan Krunz, University of Arizona, Spyros Tragoudas, Southern Illinois University
`
`Author Index ................................................................................................. 328
`
`xi
`
`Feasibility of a Serverless Distributed File System
`Deployed on an Existing Set of Desktop PCs
`William J. Bolosky, John R. Douceur, David Ely, and Marvin Theimer
`Microsoft Research, Redmond, WA 98052
`{bolosky, johndo, theimer}@microsoft.com; ely@cs.washington.edu
`
`ABSTRACT
`We consider an architecture for a serverless distributed file system
`that does not assume mutual trust among the client computers.
`The system provides security, availability, and reliability by
`distributing multiple encrypted replicas of each file among the
`client machines. To assess the feasibility of deploying this system
on an existing desktop infrastructure, we measure and analyze a
large set of client machines in a commercial environment. In
particular, we measure and report results on disk usage and
content; file activity; and machine uptimes, lifetimes, and loads.
`We conclude that the measured desktop infrastructure would
`passably support our proposed system, providing availability on
`the order of one unfilled file request per user per thousand days.
`Keywords
Serverless distributed file system architecture, personal computer
usage data, availability, reliability, security, trust, workload
characterization, analytical modeling, feasibility analysis.
`1. INTRODUCTION
`We present an architecture for a serverless distributed file system
`and assess the feasibility of deploying this system on an existing
`desktop infrastructure. The distinguishing feature of our proposed
`system is that it does not assume the client computers to be
`carefully administered or mutually trusting. Instead, the system
`provides high availability (file access) and high reliability (file
`persistence) by making multiple replicas of each file and
`distributing them among the client machines. To determine how
`well such a system might work, we gathered usage data from a
`large number of client machines at Microsoft Corporation, and we
`use this data to analyze the feasibility of our architecture.
`Our proposed serverless distributed file system is intended to
`provide a global name space for files, location-transparent access
`to both private files and shared public files, and improved
`reliability relative to a desktop workstation.
`These goals can be achieved with a centralized, server-based file
`system, but this has several disadvantages: Servers tend to be
expensive because they need special hardware, such as high-performance
I/O (to support multiple clients simultaneously) and
RAID disks (for reliability) [21]. They rely on system
administrators, in whom the users must place their faith [28], both
to authorize access and to perform reliability functions, such as
`
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that
copies are not made or distributed for profit or commercial advantage
and that copies bear this notice and the full citation on the first page.
To copy otherwise, to republish, to post on servers or to
redistribute to lists, requires prior specific permission and/or a fee.
SIGMETRICS 2000 6/00 Santa Clara, California, USA
© 2000 ACM 1-58113-194-1/00/0006 ... $5.00
`
`regular backups. Finally, they are vulnerable to geographically
`localized faults, such as a failed router or a broken network link.
`Serverless distributed file systems have been developed before
`(§ 6), but these have all assumed that client machines are mutually
`trusting. This assumption greatly eases the system design, but we
`believe it is unrealistic, especially in a large-scale system. If we
`can build a system that works without requiring trust, then the
`system can function with no central administration: The owner of
`each machine determines who can store files on it, and owners
`establish contracts with other machine owners to collaboratively
`share their resources. With no central administrator, any user can
`add resources to the system. One can even add userless client
`machines to function as dedicated servers.
`The lack of trust between clients pervades our entire design. We
`need to use cryptographic techniques to ensure data privacy and
`integrity. We also need to create and securely distribute multiple
`replicas of each file throughout the system, both to prevent a
`malicious user from easily destroying all copies of any given file
`and because we cannot expect users to significantly alter their
`behavior with regard to keeping their machines on and available.
`The need for multiple file replicas significantly increases the
`storage demand. However, since disk space is inexpensive and
becoming more so all the time [25], we consider this an
`acceptable trade-off. Also, previous research [7] has shown a
`significant amount of free storage space on client machines. This
`inspired us to investigate how well our proposed system would
`work if deployed on an existing client-computing infrastructure,
`without adding any new resources to support our system. This
`question is the primary concern of the present paper, which we
`answer by measuring file-system space, machine availability, and
`machine load on client machines in a commercial environment.
`The remainder of this paper describes our system architecture,
`explains our measurement methodology, presents our results, and
`analyzes these results in terms of our proposed architecture.
2. SYSTEM ARCHITECTURE
`The principal construct of our proposed system is the global file
`store, which is a logically monolithic file system indexed by a
`hierarchical directory tree. Although logically monolithic, the
`global file store is physically distributed among the disks of the
`client machines participating in the distributed file system.
`Each disk of a participating machine is partitioned into three
`regions: a scratch area, a global storage area, and a local cache.
`The scratch area holds ephemeral data, such as virtual-memory
`paging files and temporary files. The global storage area houses a
portion of the global file store, which is accessible by other
`machines in the distributed system. The local cache is of variable
`size, caching all files accessed within a certain period of time (the
`cache retention period), like some in-memory file-system buffer
caches [19, 22, 26] and unlike traditional fixed-size caches.
`To provide high availability and reliability in a not-fully-trusted
`environment, our proposed system makes multiple replicas of
`
`34
`
each file and distributes them among the global storage areas of
`the client machines. We define the replication factor as the
`number of distributed replicas of each file.
`Since file availability and reliability are drastically affected by the
`file replication factor, it is important to maintain a minimum
`number of replicas for each file. To ensure sufficient disk space
for these replicas, our proposed system enforces quotas [16] on
`the space available to the users of the file system. We propose
`that each user's space allotment depend on how much space the
`user has contributed to the global store.
`Part of the local cache and global storage area is reserved for a
`highly available, reliable, global, distributed directory service,
`much like that provided by xFS [1]. Each machine can use this
`service to locate replicas of requested files. This directory must
`be maintained in a strongly consistent fashion, to ensure that
`clients see the latest version of each file.
`2.1 Efficiency Considerations
`Our proposed system can increase the usable space in the global
`file store through two techniques. First, it can compress files
`before storing them and decompress them on the fly when they are
`read [26]. Second, it can coalesce distinct files that happen to
`have identical contents [5]; for example, many users may store
`copies of a common application program in the global file store,
`and these can be coalesced into a single instance, prior to
`replicating this instance among multiple machines. For clarity, we
`refer to logically distinct files with identical content as duplicates,
`and we refer to the file copies our system generates as replicas.
`We plan for our system to employ a lazy-update strategy, meaning
`that the system waits for a short time after a file is written before
`updating the file's replicas. Since a large fraction of written files
`are deleted or overwritten within a few seconds [29], lazy update
`can significantly reduce file-update traffic. Lazy update also
`allows a file to be written without requiring all (or even a quorum)
`of the replicas to be immediately available. The disadvantage of
`this approach is that the content of newly written files will briefly
`reside on only one machine, so loss of that machine will result in
`loss of the update. The directory service must keep track of which
`replicas contain up-to-date data, so users will not accidentally
`access out-of-date versions of files.
`Since file replicas are distributed among multiple machines, our
system can select which machine should be accessed to service a
`client request [30]. There are two primary considerations in this
`decision: First, the selected machine should be topologically
`close to the machine that is requesting the file, to minimize both
`transmission delay and generated network traffic. Second, the
`selected machine should be lightly loaded, to minimize both read
`delay and performance impact on the sending machine. The
`impact on the sending machine can also be reduced by performing
`non-cached reads and writes, to prevent buffer cache pollution.
`If remote reads of very popular files are targeted at a small
`number of machines, those machines could become overloaded.
`Our system can avoid creating these hotspots by allowing
`machines to copy files from other machines' on-disk local caches.
`As a side benefit, this improves the availability of popular files,
`since the effective number of replicas is substantially increased.
`2.2 Replica Management
`In our proposed system, the selection of machines for storing file
`replicas is driven by the availability of those machines. The
`system measures machine uptimes and distributes replicas so as to
`maximize the minimum file availability. (§ 2.2.1 describes such a
`replica-placement algorithm.)
`
`In selecting locations for replicas of a given file, the system could
`select a set of machines whose uptimes are negatively correlated,
`thus reducing the likelihood that all machines containing a replica
`will be down at the same time. However, our measurements
`(described in § 3.2.2 and reported in § 4.2.3) suggest that this
`would provide little marginal benefit.
`Our system can improve the availability of sets of related files by
`storing them in the same locations. If a user needs a given set of
`files to accomplish a task, then if any of those files is inaccessible,
`the task cannot be completed; therefore, it is beneficial to store
`related files together, so that either all or none of the files are
`available. Our system could attempt to determine relatedness of
`files by observing file access patterns, but we propose a simpler
approach for at least the initial implementation: Since files
accessed together temporally tend to be grouped together
spatially, replicas for all files in a given directory are stored on the
same set of machines, and entire directories are cached together.
`When a new machine joins the system, its files are replicated to
`the global storage areas on other machines; space for those
`replicas is made by relocating replicas of other files onto the new
`machine. Similarly, when a machine is decommissioned, the
`system creates and distributes additional replicas of the files
`stored on that machine.
`'
`Machines join the system by explicitly announcing their presence;
`however, machines can leave without notice, particularly if they
`leave due to permanent failure, such as a disk-head crash. If the
`system notices that a machine has been down for an extended
`period of time, it must assume that the machine has been
decommissioned and accordingly generate new replicas;
`otherwise, the possibility of another failure can jeopardize the
`reliability of files with replicas on that machine.
2.2.1 Replica management: placement algorithm
`Ideally, replicas should be assigned to machines so as to maximize
`the minimum availability of any file, while also maximizing the
`minimum reliability of any file. The latter goal merely requires
`minimizing the variance of the replica count, whereas the former
`is more involved. Measured logarithmically, the availability of a
`file equals the sum of the availability of all machines that store a
`replica of the file, assuming randomly correlated machine uptime.
`The following heuristic algorithm yields a low availability
`variance: Set the provisional availability of all files to zero; then,
`iteratively select machines in order of decreasing availability; for
`each selected machine, assign the files with the lowest provisional
`availability to the selected machine, and update the provisional
`availability of those files; repeat until all machines are full.
`Unfortunately, the above algorithm does not minimize reliability
`variance, so we modify it as follows: If an assignment of a file to
`a machine would reduce the remaining free space to below that
`necessary for any two files to differ in replica count by at most
`one, abort the above iteration and begin the following procedure:
Iteratively select machines in order of increasing availability; for
each selected machine, identify the file with the highest
provisional availability from those with the lowest replica count,
assign it to the selected machine, and update the provisional
availability of that file; repeat until all machines are full.
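The first greedy pass of the heuristic above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: it assumes a uniform per-machine replica capacity and independent machine uptimes, omits the reliability-variance refinement, and all names and the function signature are invented for the sketch.

```python
import math

def place_replicas(machines, files, capacity):
    """machines: dict of machine name -> availability (fraction of time up).
    files: list of file names.  capacity: replicas each machine can hold."""
    # Availability is measured logarithmically: a file held on machines
    # with uptimes p1..pk has log-availability sum(-log10(1 - pi)),
    # assuming independent downtime.
    avail = {f: 0.0 for f in files}       # provisional availability, all zero
    placement = {m: [] for m in machines}
    # Iterate machines in order of decreasing availability.
    for m in sorted(machines, key=machines.get, reverse=True):
        gain = -math.log10(1.0 - machines[m])
        while len(placement[m]) < capacity:
            # Assign the file with the lowest provisional availability
            # that this machine does not already hold.
            candidates = [f for f in files if f not in placement[m]]
            if not candidates:
                break
            worst = min(candidates, key=lambda f: avail[f])
            placement[m].append(worst)
            avail[worst] += gain          # update provisional availability
    return placement, avail
```

Because each machine takes the worst-off files first, the pass tends to equalize both replica counts and log-availabilities across files, which is exactly the low-variance outcome the text asks for.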
`2.3 Data Security
`Distributing multiple replicas of a file protects not only against
`accidental failure but also against malicious attack. To destroy a
`file, an adversary must compromise all machines that hold replicas
`of that file. To prevent an adversary from coercing the system
`into placing all replicas of a given file on a small set of machines
`
`35
`
under the adversary's control, the system must use secure methods
`for selecting machines to house file replicas.
`File-update messages are digitally signed by the writer to the file.
`Before applying an update to a replica it stores, each machine
`verifies the digital signature on the update and compares the
writer's identity to a list of authorized writers associated with the
`file. This prevents an adversary from forging and distributing file
`updates.
`To prevent files from being read by unauthorized users, the
`contents of files are encrypted before they are replicated.
`However, encryption could interfere with the automatic detection
`and coalescing of duplicate files, since different users may encrypt
`identical plaintext files with different keys, which would normally
`produce different ciphertext files. We have developed a
`cryptographic technology, called convergent encryption, that
`allows the detection and coalescing of identical files even when
`these files are encrypted with separate keys. Rather than
`enciphering the contents of a user's files directly with the user's
`key, the contents of each file are one-way hashed, and the
`resulting hash value is used as a key for enciphering the file
contents. The user's key is then used to encipher the hash value,
`and this enciphered value is attached to the file as meta-data. The
`user decrypts a file by first deciphering the hash value and then
`deciphering the file using the hash value as a key. With this
`approach, two files with identical plaintext will also have identical
`ciphertext, irrespective of the secret keys used to encrypt them.
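The scheme can be sketched in a few lines of Python. The SHA-256 counter-mode keystream below is a toy stand-in for whatever real cipher a deployment would use (the text does not name one), and all identifiers are illustrative, not from the paper.

```python
import hashlib

def _keystream_xor(data: bytes, key: bytes) -> bytes:
    # Toy stream cipher: SHA-256 in counter mode; XOR is its own inverse,
    # so the same function both enciphers and deciphers.
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, out))

def convergent_encrypt(plaintext: bytes, user_key: bytes):
    # 1. One-way hash of the file contents becomes the file key.
    file_key = hashlib.sha256(plaintext).digest()
    # 2. Encipher the contents with the content-derived key.
    ciphertext = _keystream_xor(plaintext, file_key)
    # 3. Encipher the file key with the user's key; store as meta-data.
    wrapped_key = _keystream_xor(file_key, user_key)
    return ciphertext, wrapped_key

def convergent_decrypt(ciphertext: bytes, wrapped_key: bytes,
                       user_key: bytes) -> bytes:
    # Recover the file key with the user's key, then the plaintext.
    file_key = _keystream_xor(wrapped_key, user_key)
    return _keystream_xor(ciphertext, file_key)
```

Two users encrypting the same plaintext get identical ciphertext (so duplicates can be coalesced) but different wrapped keys, each readable only with its owner's key.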
`3. METHODOLOGY
`To analyze the feasibility of deploying our proposed system on an
`existing computing infrastructure, we collected usage data from a
`large number of desktop personal computers at Microsoft
`Corporation. We measured contents of file systems, availability
`of machines, and load on machines.
`To judge the breadth of applicability of our results, we partitioned
`our measurements into six categories by the job function of each
`machine's owner: administration (clerical), business (marketing,
`accounting,
`legal), management, non-technical development
`(writers, artists), technical development (programmers, testers),
`and technical support. Roughly half of all machines belong to
`technical developers; the other half are distributed approximately
`equally among the remaining categories.
`3.1 File System Measurement
`We measured a set of file systems to determine the amount of disk
`space that is free and the amount of disk space that would be free
`if all duplicate files were eliminated. We also use the measured
data to estimate the rate of cache misses, the size of local system
`caches, and the rate of file writes.
`3.1.1 Disk usage measurement
We collected two data sets for our analysis of disk usage. In
September 1998, we asked Microsoft employees to run a scanning
program on their Windows and Windows NT computers that
collected directory information (file names, sizes, and timestamps)
from their file systems [7]. By this means, we obtained
measurements of 10,568 file systems [8].
`In August 1999, we remotely read the performance counters (free
disk space, total disk space, and logon name of the primary user)
`of every Windows NT computer we could access. By this means,
`we obtained measurements of 8669 file systems.
`3.1.2 Disk content measurement
`In February 1999, we asked a random subset of the participants in
`the 1998 study to run a scanning program that computed and
`
`recorded hashes of all the files on their file systems. By this
`means, we obtained data from 550 file systems, which we used to
`determine the amount of duplicate file content.
`3.1.3 File access rate estimation
`We used the timestamps in the 1998 data set to estimate the rate at
`which files are read and written. We determined each file's last
`read time as the most recent timestamp (create, update, or access)
`on the file, and its last write time as the most recent of the create
`and update timestamps.
`There are three aspects of file access rate that are relevant to our
`design: the miss rate of the local file cache, the size of the local
`file cache, and the amount of write traffic sent to file replicas on
`other machines. We consider only files that would be stored in
`the global file store, so we excluded inherently local files,
`including the system paging file and those files in temporary
`directories or the Internet browser cache, all of which account for
`2% of the bytes in the data set.
`Cache miss rate is a function of the cache retention period. We
`estimate miss rate as follows: For a cache retention period of n
days, the files in directories last accessed n+1 days ago will exit
`the cache on the current day. Over the long term, the rate at
`which files exit the cache must match the rate at which they enter
`the cache, so the rate of file exit approximates the rate of file
`entry. Since a file enters the cache only in response to a miss, the
`cache entry rate equals the cache miss rate.
`We estimate the cache size similarly. For a cache retention period
`of n days, the size of the local cache is equal to the sum of the file
sizes in all directories accessed in the last n days.
`Write traffic rate is a function of the lazy-update propagation lag.
`We estimate write traffic rate as follows: For a propagation lag of
`n hours, files last written between 2n hours ago and n hours ago
`will have been propagated during the past n hours.
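These three estimators follow directly from the timestamp data. The input shapes below (per-directory access-age tuples, per-file write ages) are assumptions about how the snapshot data would be tabulated, not the authors' code:

```python
def estimate_cache_stats(dirs, retention_days):
    """dirs: list of (days_since_last_access, total_bytes, file_count)
    per directory.  Returns (cache misses per day, cache size in bytes)."""
    # Files in directories last accessed retention+1 days ago exit the
    # cache today; at steady state, the exit rate approximates the miss
    # rate, which equals the entry rate.
    misses_per_day = sum(n for age, _, n in dirs if age == retention_days + 1)
    # Cache size: bytes in every directory accessed within the window.
    cache_bytes = sum(b for age, b, _ in dirs if age <= retention_days)
    return misses_per_day, cache_bytes

def estimate_write_traffic(write_ages_hours, sizes_bytes, lag_hours):
    """Bytes propagated to replicas during the past n hours under lazy
    update with propagation lag n: files last written between 2n and n
    hours ago."""
    return sum(s for age, s in zip(write_ages_hours, sizes_bytes)
               if lag_hours <= age < 2 * lag_hours)
```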
`3.2 Machine Availability Measurement
`To determine file availability in our proposed system, we need to
know machine uptimes, whether these uptimes are consistent over
time, and whether the uptimes of different machines are
correlated. When our proposed system sees a mac