Proceedings

ACM SIGMETRICS '2000
International Conference on Measurement
and Modeling of Computer Systems

Sponsored by ACM SIGMETRICS

HPE, Exh. 1012, p. 1
The Association for Computing Machinery
1515 Broadway
New York, N.Y. 10036

Copyright © 2000 by Association for Computing Machinery, Inc. (ACM). Permission to make
digital or hard copies of portions of this work for personal or classroom use is granted without fee
provided that the copies are not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. Copyright for components of this
work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy
otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific
permission and/or a fee. Request permission to republish from: Publications Dept., ACM, Inc.
Fax +1 (212) 869-0481 or E-mail permissions@acm.org

For other copying of articles that carry a code at the bottom of the first or last page, copying is
permitted provided that the per-copy fee indicated in the code is paid through the Copyright
Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, +1-978-750-8400, +1-978-750-4470 (fax).
Notice to Contributing Authors to SIG Newsletters:

By submitting your article for distribution in this Special Interest Group publication,
you hereby grant to ACM the following non-exclusive, perpetual, worldwide rights:

--to publish in print on condition of acceptance by the editor
--to digitize and post your article in the electronic version of this publication
--to include the article in the ACM Digital Library
--to allow users to copy and distribute the article for noncommercial, educational or research
purposes

However, as a contributing author, you retain copyright to your article and
ACM will make every effort to refer requests for commercial use directly to
you.
ACM ISBN: 1-58113-194-1

Additional copies may be ordered prepaid from:

ACM Order Department
P.O. Box 11405
New York, N.Y. 10286-1405

Phone: 1-800-342-6626
(U.S.A. and Canada)
+1-212-626-0500
(All other countries)
Fax: +1-212-944-1318
E-mail: acmhelp@acm.org

ACM Order Number: 488000
Printed in the U.S.A.
TABLE OF CONTENTS

Organizing Committee ....................................................................... iii
Referees ................................................................................... iv
Message from General Chair ................................................................. v
Message from Program Co-Chairs ............................................................. vi
Message from Sigmetrics Chair .............................................................. vii

Keynote Address: The Internet and Its Future
Leonard Kleinrock, Chairman and Founder, Nomadix Inc., Professor, Department of Computer
Science, UCLA

Session 1: Network Architecture and Protocols

A Case for End System Multicast ............................................................ 1
Yang-hua Chu, Sanjay Rao, Hui Zhang, Carnegie Mellon University

PLM: Fast Convergence for Cumulative Layered Multicast Transmission Schemes ................ 13
Arnaud Legout, Ernst W. Biersack, Institut EURECOM

On Achievable Service Differentiation with Token Bucket Marking for TCP .................... 23
Sambit Sahu, University of Massachusetts, Philippe Nain, INRIA Sophia Antipolis,
Don Towsley, University of Massachusetts, Christophe Diot, Sprint ATL, Victor Firoiu, Nortel
Networks

Session 2: File and Storage Systems

Feasibility of a Serverless Distributed File System Deployed on an Existing Set of
Desktop PCs ................................................................................ 34
William J. Bolosky, John R. Douceur, Microsoft Research, David Ely, University of Washington,
Marvin Theimer, Microsoft Research

Comparing Random Data Allocation and Data Striping in Multimedia Servers ................... 44
Jose Renato Santos, Richard Muntz, UCLA, Berthier Ribeiro-Neto, Universidade Federal de Minas
Gerais

Modeling and Performance of MEMS-Based Storage Devices ..................................... 56
John Linwood Griffin, Steven W. Schlosser, Gregory R. Ganger, David F. Nagle, Carnegie Mellon
University
Session 3: Web and Multimedia Servers

Implications of Proxy Caching for Provisioning Networks and Servers ........................ 66
Mohammad S. Raunak, Prashant Shenoy, University of Massachusetts, Pawan Goyal, Ensim
Corporation, Krithi Ramamritham, University of Massachusetts

Collaborative Web Caching Based on Proxy Affinities ........................................ 78
Jiong Yang, Wei Wang, IBM, Richard Muntz, UCLA

Cluster Reserves: A Mechanism for Resource Management in Cluster-Based Network Servers ..... 90
Mohit Aron, Peter Druschel, Willy Zwaenepoel, Rice University

Poster Session

Analysis of the Phenomenon of Several Consecutive Slow Start Phases in TCP ................. 102
Chadi Barakat, Eitan Altman, INRIA Sophia Antipolis

Providing Guaranteed Quality of Service for Interactive Visualization Applications ......... 104
Wai-Man R. Wong, Richard R. Muntz, UCLA

IP Multicast Fault Recovery in PIM over OSPF ............................................... 106
Xin Wang, Chienming Yu, Henning Schulzrinne, Columbia University,
Paul Stirpe, Wei Wu, Reuters

Cell-based Multicast Grouping in Large-Scale Virtual Environments .......................... 108
Emmanuel Lety, Thierry Turletti, INRIA Sophia Antipolis, Francois Baccelli, INRIA/ENS

Temporal Locality in Web Request Streams: Sources, Characteristics, and Caching
Implications ............................................................................... 110
Azer Bestavros, Shudong Jin, Boston University

Automated Disk Drive Characterization ...................................................... 112
Jiri Schindler, Gregory Ganger, Carnegie Mellon University

Online Superpage Promotion Revisited ....................................................... 114
Zhen Fang, Lixin Zhang, John Carter, Sally McKee, Wilson C. Hsieh, University of Utah

An Inherently Loss-Less and Bandwidth Efficient Periodic Broadcast Scheme for VBR Video .... 116
Ioanis Nikolaidis, Fulu Li, Ailan Hu, University of Alberta

An Analysis of Short-Term Fairness in Wireless Media Access Protocols ...................... 118
Can Emre Koksal, Hisham Kassab, Hari Balakrishnan, MIT

RESCU: Dynamic Hybrid Packet Loss Recovery for Video Transmission over the Internet ........ 120
Srinath R. Joshi, Injong Rhee, North Carolina State University
The Content and Access Dynamics of a Busy Web Server ....................................... 122
Venkata N. Padmanabhan, Microsoft Research, Lili Qiu, Cornell University

Session 4: Networking: Congestion Control

TCP in presence of Bursty Losses ........................................................... 124
Eitan Altman, Konstantin Avrachenkov, Chadi Barakat, INRIA Sophia Antipolis

The Incremental Deployability of RTT-Based Congestion Avoidance for High Speed TCP Internet
Connections ................................................................................ 134
Jim Martin, GartnerGroup, Arne Nilsson, Injong Rhee, North Carolina State University

Detecting Shared Congestion of Flows Via End-to-end Measurement ............................ 145
Dan Rubenstein, Jim Kurose, Don Towsley, University of Massachusetts
Best Student Paper Award

Session 5: Network Measurement and Performance Modeling

Measurement and Analysis of LDAP Performance ............................................... 156
Xin Wang, Henning Schulzrinne, Columbia University, Dilip Kandlur, Dinesh Verma, IBM T.J.
Watson Research Center

IP Packet Generation: Statistical Models for TCP Start Times Based on Connection-Rate
Superposition .............................................................................. 166
William S. Cleveland, Dong Lin, Don X. Sun, Bell Laboratories Lucent Technologies

On the Impact of Soft Hand-off in Cellular Systems ......................................... 178
Nidhi Hegde, Khosrow Sohraby, University of Missouri-Kansas City

Session 6: Queueing and Performance Evaluation Techniques

Delay Asymptotics for a Priority Queueing System ........................................... 188
Sanjay Shakkottai, R. Srikant, University of Illinois at Urbana-Champaign

A Fast and Accurate Iterative Solution of a Multi-Class Threshold-based Queueing System with
Hysteresis ................................................................................. 196
Leana Golubchik, University of Maryland, John C.S. Lui, The Chinese University of Hong Kong

Using the Exact State Space of a Markov Model to Compute Approximate Stationary Measures ... 207
Andrew S. Miner, Gianfranco Ciardo, College of William and Mary,
Susanna Donatelli, Universita di Torino
AMVA Techniques for High Service Time Variability .......................................... 217
Derek L. Eager, Univ. of Saskatchewan, Daniel J. Sorin, Mary K. Vernon, University of Wisconsin

Session 7: Tools and Benchmarking

Efficient Performance Prediction for Modern Microprocessors ................................ 229
David Ofelt, Juniper Networks, John Hennessy, Stanford University

Improving Interactive System Performance Using TIPME ....................................... 240
Yasuhiro Endo, Network Appliance, Margo Seltzer, Harvard University

Quantifying the Energy Consumption of a Pocket Computer and a Java Virtual Machine ......... 252
Keith Farkas, Compaq Western Research Lab, Jason Flinn, Carnegie Mellon University,
Godmar Back, University of Utah, Dirk Grunwald, University of Colorado,
Jennifer Anderson, VMware

Session 8: Memory Management and Databases

Memory System Behavior of Java Programs: Methodology and Analysis .......................... 264
Jin-Soo Kim, Seoul National University, Yarsun Hsu, National Tsing Hua University
Best Systems Paper Award

An Analytical Model of the Working-Set Sizes in Decision-Support Systems ................... 275
Magnus Karlsson, HP Labs, Fredrik Dahlgren, Ericsson, Per Stenstroem, Chalmers University

Towards Application/File-level Characterization of Block References:
A Case for Fine-Grained Buffer Management .................................................. 286
Jongmoo Choi, Seoul National University, Sam H. Noh, Hong-Ik University, Sang Lyul Min,
Yookun Cho, Seoul National University

Session 9: Network Routing

Multicast Routing with Bandwidth Guarantees: A New Approach Using Multicast Network
Flow ....................................................................................... 296
Murali S. Kodialam, T. V. Lakshman, Bell Laboratories Lucent Technologies,
Sudipta Sengupta, MIT

Stable Internet Routing Without Global Coordination ........................................ 307
Lixin Gao, Smith College, Jennifer Rexford, AT&T Labs Research

An Efficient Algorithm for Finding a Path Subject to Two Additive Constraints .............. 318
Turgay Korkmaz, Marwan Krunz, University of Arizona, Spyros Tragoudas, Southern Illinois Univ.

Author Index ............................................................................... 328
Feasibility of a Serverless Distributed File System
Deployed on an Existing Set of Desktop PCs
William J. Bolosky, John R. Douceur, David Ely, and Marvin Theimer
Microsoft Research, Redmond, WA 98052
{bolosky, johndo, theimer}@microsoft.com; ely@cs.washington.edu

ABSTRACT
We consider an architecture for a serverless distributed file system
that does not assume mutual trust among the client computers.
The system provides security, availability, and reliability by
distributing multiple encrypted replicas of each file among the
client machines. To assess the feasibility of deploying this system
on an existing desktop infrastructure, we measure and analyze a
large set of client machines in a commercial environment. In
particular, we measure and report results on disk usage and
content; file activity; and machine uptimes, lifetimes, and loads.
We conclude that the measured desktop infrastructure would
passably support our proposed system, providing availability on
the order of one unfilled file request per user per thousand days.

Keywords
Serverless distributed file system architecture, personal computer
usage data, availability, reliability, security, trust, workload
characterization, analytical modeling, feasibility analysis.
1. INTRODUCTION
We present an architecture for a serverless distributed file system
and assess the feasibility of deploying this system on an existing
desktop infrastructure. The distinguishing feature of our proposed
system is that it does not assume the client computers to be
carefully administered or mutually trusting. Instead, the system
provides high availability (file access) and high reliability (file
persistence) by making multiple replicas of each file and
distributing them among the client machines. To determine how
well such a system might work, we gathered usage data from a
large number of client machines at Microsoft Corporation, and we
use this data to analyze the feasibility of our architecture.
Our proposed serverless distributed file system is intended to
provide a global name space for files, location-transparent access
to both private files and shared public files, and improved
reliability relative to a desktop workstation.
These goals can be achieved with a centralized, server-based file
system, but this has several disadvantages: Servers tend to be
expensive because they need special hardware, such as high-
performance I/O (to support multiple clients simultaneously) and
RAID disks (for reliability) [21]. They rely on system
administrators, in whom the users must place their faith [28], both
to authorize access and to perform reliability functions, such as
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that
copies are not made or distributed for profit or commercial advantage
and that copies bear this notice and the full citation on the first page.
To copy otherwise, to republish, to post on servers or to
redistribute to lists, requires prior specific permission and/or a fee.
SIGMETRICS 2000 6/00 Santa Clara, California, USA
© 2000 ACM 1-58113-194-1/00/0006 ... $5.00
`
`regular backups. Finally, they are vulnerable to geographically
`localized faults, such as a failed router or a broken network link.
`Serverless distributed file systems have been developed before
`(§ 6), but these have all assumed that client machines are mutually
`trusting. This assumption greatly eases the system design, but we
`believe it is unrealistic, especially in a large-scale system. If we
`can build a system that works without requiring trust, then the
`system can function with no central administration: The owner of
`each machine determines who can store files on it, and owners
`establish contracts with other machine owners to collaboratively
`share their resources. With no central administrator, any user can
`add resources to the system. One can even add userless client
`machines to function as dedicated servers.
`The lack of trust between clients pervades our entire design. We
`need to use cryptographic techniques to ensure data privacy and
`integrity. We also need to create and securely distribute multiple
`replicas of each file throughout the system, both to prevent a
`malicious user from easily destroying all copies of any given file
`and because we cannot expect users to significantly alter their
`behavior with regard to keeping their machines on and available.
`The need for multiple file replicas significantly increases the
`storage demand. However, since disk space is inexpensive and
`becoming more so all the time (25], we consider this an
`acceptable trade-off. Also, previous research [7] has shown a
`significant amount of free storage space on client machines. This
`inspired us to investigate how well our proposed system would
`work if deployed on an existing client-computing infrastructure,
`without adding any new resources to support our system. This
`question is the primary concern of the present paper, which we
`answer by measuring file-system space, machine availability, and
`machine load on client machines in a commercial environment.
`The remainder of this paper describes our system architecture,
`explains our measurement methodology, presents our results, and
`analyzes these results in terms of our proposed architecture.
2. SYSTEM ARCHITECTURE
The principal construct of our proposed system is the global file
store, which is a logically monolithic file system indexed by a
hierarchical directory tree. Although logically monolithic, the
global file store is physically distributed among the disks of the
client machines participating in the distributed file system.
Each disk of a participating machine is partitioned into three
regions: a scratch area, a global storage area, and a local cache.
The scratch area holds ephemeral data, such as virtual-memory
paging files and temporary files. The global storage area houses a
portion of the global file store, which is accessible by other
machines in the distributed system. The local cache is of variable
size, caching all files accessed within a certain period of time (the
cache retention period), like some in-memory file-system buffer
caches [19, 22, 26] and unlike traditional fixed-size caches.
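The retention-period policy described above differs from a fixed-size LRU cache: the cache grows and shrinks so that it holds exactly the files touched within the retention period. A minimal sketch of that policy (the class and method names are our own, not from the paper, and real contents would live on disk rather than in a dict):

```python
import time

class RetentionCache:
    """Variable-size cache keeping every file accessed within the
    retention period, like the paper's local on-disk cache region."""

    def __init__(self, retention_seconds, clock=time.time):
        self.retention = retention_seconds
        self.clock = clock
        self.last_access = {}   # path -> time of last access
        self.data = {}          # path -> cached contents

    def access(self, path, loader):
        now = self.clock()
        if path not in self.data:        # a miss pulls the file in
            self.data[path] = loader(path)
        self.last_access[path] = now
        self._evict(now)
        return self.data[path]

    def _evict(self, now):
        # Drop everything not touched within the retention period;
        # this is what makes the cache variable-sized.
        stale = [p for p, t in self.last_access.items()
                 if now - t > self.retention]
        for p in stale:
            del self.last_access[p], self.data[p]
```

With a retention period of 10 time units, a file idle for longer than 10 units is evicted on the next access, regardless of how many other files the cache holds.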
To provide high availability and reliability in a not-fully-trusted
environment, our proposed system makes multiple replicas of
each file and distributes them among the global storage areas of
the client machines. We define the replication factor as the
number of distributed replicas of each file.
Since file availability and reliability are drastically affected by the
file replication factor, it is important to maintain a minimum
number of replicas for each file. To ensure sufficient disk space
for these replicas, our proposed system enforces quotas [16] on
the space available to the users of the file system. We propose
that each user's space allotment depend on how much space the
user has contributed to the global store.
Part of the local cache and global storage area is reserved for a
highly available, reliable, global, distributed directory service,
much like that provided by xFS [1]. Each machine can use this
service to locate replicas of requested files. This directory must
be maintained in a strongly consistent fashion, to ensure that
clients see the latest version of each file.
2.1 Efficiency Considerations
Our proposed system can increase the usable space in the global
file store through two techniques. First, it can compress files
before storing them and decompress them on the fly when they are
read [26]. Second, it can coalesce distinct files that happen to
have identical contents [5]; for example, many users may store
copies of a common application program in the global file store,
and these can be coalesced into a single instance, prior to
replicating this instance among multiple machines. For clarity, we
refer to logically distinct files with identical content as duplicates,
and we refer to the file copies our system generates as replicas.
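Duplicate coalescing of this kind is commonly keyed on a content hash. The paper does not specify the mechanism, so the following is only an illustrative sketch: each path is mapped to the hash of its contents, and only one stored instance is kept per distinct content:

```python
import hashlib

def coalesce_duplicates(files):
    """Group logically distinct files that have identical contents.

    files: dict mapping path -> bytes. Returns (store, index) where
    `store` maps a content hash to the single stored instance and
    `index` maps each path to the hash of the instance backing it.
    """
    store, index = {}, {}
    for path, data in files.items():
        h = hashlib.sha256(data).hexdigest()
        store.setdefault(h, data)   # keep one instance per distinct content
        index[path] = h
    return store, index
```

If two users each store a copy of the same application binary, both paths end up indexed to the same stored instance, and only that one instance is later replicated.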
We plan for our system to employ a lazy-update strategy, meaning
that the system waits for a short time after a file is written before
updating the file's replicas. Since a large fraction of written files
are deleted or overwritten within a few seconds [29], lazy update
can significantly reduce file-update traffic. Lazy update also
allows a file to be written without requiring all (or even a quorum)
of the replicas to be immediately available. The disadvantage of
this approach is that the content of newly written files will briefly
reside on only one machine, so loss of that machine will result in
loss of the update. The directory service must keep track of which
replicas contain up-to-date data, so users will not accidentally
access out-of-date versions of files.
Since file replicas are distributed among multiple machines, our
system can select which machine should be accessed to service a
client request [30]. There are two primary considerations in this
decision: First, the selected machine should be topologically
close to the machine that is requesting the file, to minimize both
transmission delay and generated network traffic. Second, the
selected machine should be lightly loaded, to minimize both read
delay and performance impact on the sending machine. The
impact on the sending machine can also be reduced by performing
non-cached reads and writes, to prevent buffer cache pollution.
If remote reads of very popular files are targeted at a small
number of machines, those machines could become overloaded.
Our system can avoid creating these hotspots by allowing
machines to copy files from other machines' on-disk local caches.
As a side benefit, this improves the availability of popular files,
since the effective number of replicas is substantially increased.
2.2 Replica Management
In our proposed system, the selection of machines for storing file
replicas is driven by the availability of those machines. The
system measures machine uptimes and distributes replicas so as to
maximize the minimum file availability. (§ 2.2.1 describes such a
replica-placement algorithm.)

In selecting locations for replicas of a given file, the system could
select a set of machines whose uptimes are negatively correlated,
thus reducing the likelihood that all machines containing a replica
will be down at the same time. However, our measurements
(described in § 3.2.2 and reported in § 4.2.3) suggest that this
would provide little marginal benefit.
Our system can improve the availability of sets of related files by
storing them in the same locations. If a user needs a given set of
files to accomplish a task, then if any of those files is inaccessible,
the task cannot be completed; therefore, it is beneficial to store
related files together, so that either all or none of the files are
available. Our system could attempt to determine relatedness of
files by observing file access patterns, but we propose a simpler
approach for at least the initial implementation: Since files
accessed together temporally tend to be grouped together
spatially, replicas for all files in a given directory are stored on the
same set of machines, and entire directories are cached together.
When a new machine joins the system, its files are replicated to
the global storage areas on other machines; space for those
replicas is made by relocating replicas of other files onto the new
machine. Similarly, when a machine is decommissioned, the
system creates and distributes additional replicas of the files
stored on that machine.
Machines join the system by explicitly announcing their presence;
however, machines can leave without notice, particularly if they
leave due to permanent failure, such as a disk-head crash. If the
system notices that a machine has been down for an extended
period of time, it must assume that the machine has been
decommissioned and accordingly generate new replicas;
otherwise, the possibility of another failure can jeopardize the
reliability of files with replicas on that machine.
2.2.1 Replica management: placement algorithm
Ideally, replicas should be assigned to machines so as to maximize
the minimum availability of any file, while also maximizing the
minimum reliability of any file. The latter goal merely requires
minimizing the variance of the replica count, whereas the former
is more involved. Measured logarithmically, the availability of a
file equals the sum of the availability of all machines that store a
replica of the file, assuming randomly correlated machine uptime.
The following heuristic algorithm yields a low availability
variance: Set the provisional availability of all files to zero; then,
iteratively select machines in order of decreasing availability; for
each selected machine, assign the files with the lowest provisional
availability to the selected machine, and update the provisional
availability of those files; repeat until all machines are full.
Unfortunately, the above algorithm does not minimize reliability
variance, so we modify it as follows: If an assignment of a file to
a machine would reduce the remaining free space to below that
necessary for any two files to differ in replica count by at most
one, abort the above iteration and begin the following procedure:
Iteratively select machines in order of increasing availability; for
each selected machine, identify the file with the highest
provisional availability from those with the lowest replica count,
assign it to the selected machine; and update the provisional
availability of that file; repeat until all machines are full.
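The first phase of this heuristic can be sketched as follows. The data layout (machine names, per-machine capacity counted in replicas, and availability in logarithmic units such as "nines" of uptime) is our simplification, and the second, reliability-variance phase is omitted for brevity:

```python
import heapq

def place_replicas(machines, files):
    """First-phase greedy placement sketch.

    machines: dict name -> (availability, capacity_in_replicas),
              availability in logarithmic units so values add.
    files:    list of file names.
    Returns dict file -> list of machines holding a replica.
    """
    avail = {f: 0.0 for f in files}       # provisional availability: zero
    placement = {f: [] for f in files}
    # Visit machines in order of decreasing availability.
    for name, (a, cap) in sorted(machines.items(), key=lambda m: -m[1][0]):
        # A min-heap yields the files with the lowest provisional
        # availability; skip files already stored on this machine.
        heap = [(avail[f], f) for f in files if name not in placement[f]]
        heapq.heapify(heap)
        for _ in range(min(cap, len(heap))):
            _, f = heapq.heappop(heap)
            placement[f].append(name)
            avail[f] += a     # logarithmic availabilities sum
    return placement
```

Because the lowest-availability files are served first at every step, the minimum provisional availability is pushed up greedily, which is the stated goal of the first phase.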
2.3 Data Security
Distributing multiple replicas of a file protects not only against
accidental failure but also against malicious attack. To destroy a
file, an adversary must compromise all machines that hold replicas
of that file. To prevent an adversary from coercing the system
into placing all replicas of a given file on a small set of machines
under the adversary's control, the system must use secure methods
for selecting machines to house file replicas.
File-update messages are digitally signed by the writer to the file.
Before applying an update to a replica it stores, each machine
verifies the digital signature on the update and compares the
writer's identity to a list of authorized writers associated with the
file. This prevents an adversary from forging and distributing file
updates.
To prevent files from being read by unauthorized users, the
contents of files are encrypted before they are replicated.
However, encryption could interfere with the automatic detection
and coalescing of duplicate files, since different users may encrypt
identical plaintext files with different keys, which would normally
produce different ciphertext files. We have developed a
cryptographic technology, called convergent encryption, that
allows the detection and coalescing of identical files even when
these files are encrypted with separate keys. Rather than
enciphering the contents of a user's files directly with the user's
key, the contents of each file are one-way hashed, and the
resulting hash value is used as a key for enciphering the file
contents. The user's key is then used to encipher the hash value,
and this enciphered value is attached to the file as meta-data. The
user decrypts a file by first deciphering the hash value and then
deciphering the file using the hash value as a key. With this
approach, two files with identical plaintext will also have identical
ciphertext, irrespective of the secret keys used to encrypt them.
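The key structure of convergent encryption can be illustrated with a toy sketch. We stand in a SHA-256 counter-mode keystream for whatever real cipher the deployed system would use, so this demonstrates only the hash-as-key construction, not a secure implementation:

```python
import hashlib

def _keystream(key: bytes, n: int) -> bytes:
    # Toy counter-mode keystream from SHA-256; a stand-in for a real cipher.
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def _xor(data: bytes, key: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))

def convergent_encrypt(plaintext: bytes, user_key: bytes):
    # The one-way hash of the contents becomes the file key...
    file_key = hashlib.sha256(plaintext).digest()
    ciphertext = _xor(plaintext, file_key)
    # ...and the user's key enciphers that hash, stored as meta-data.
    wrapped_key = _xor(file_key, user_key)
    return ciphertext, wrapped_key

def convergent_decrypt(ciphertext: bytes, wrapped_key: bytes,
                       user_key: bytes) -> bytes:
    # Decipher the hash value first, then the file under that hash.
    file_key = _xor(wrapped_key, user_key)
    return _xor(ciphertext, file_key)
```

Two users encrypting identical plaintexts under different keys produce identical ciphertexts, so duplicate detection still works, while each user's wrapped key differs.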
3. METHODOLOGY
To analyze the feasibility of deploying our proposed system on an
existing computing infrastructure, we collected usage data from a
large number of desktop personal computers at Microsoft
Corporation. We measured contents of file systems, availability
of machines, and load on machines.
To judge the breadth of applicability of our results, we partitioned
our measurements into six categories by the job function of each
machine's owner: administration (clerical), business (marketing,
accounting, legal), management, non-technical development
(writers, artists), technical development (programmers, testers),
and technical support. Roughly half of all machines belong to
technical developers; the other half are distributed approximately
equally among the remaining categories.
3.1 File System Measurement
We measured a set of file systems to determine the amount of disk
space that is free and the amount of disk space that would be free
if all duplicate files were eliminated. We also use the measured
data to estimate the rate of cache misses, the size of local system
caches, and the rate of file writes.
3.1.1 Disk usage measurement
We collected two data sets for our analysis of disk usage. In
September 1998, we asked Microsoft employees to run a scanning
program on their Windows and Windows NT computers that
collected directory information (file names, sizes, and timestamps)
from their file systems [7]. By this means, we obtained
measurements of 10,568 file systems [8].
In August 1999, we remotely read the performance counters (free
disk space, total disk space, and logon name of the primary user)
of every Windows NT computer we could access. By this means,
we obtained measurements of 8669 file systems.
3.1.2 Disk content measurement
In February 1999, we asked a random subset of the participants in
the 1998 study to run a scanning program that computed and
recorded hashes of all the files on their file systems. By this
means, we obtained data from 550 file systems, which we used to
determine the amount of duplicate file content.
3.1.3 File access rate estimation
We used the timestamps in the 1998 data set to estimate the rate at
which files are read and written. We determined each file's last
read time as the most recent timestamp (create, update, or access)
on the file, and its last write time as the most recent of the create
and update timestamps.
There are three aspects of file access rate that are relevant to our
design: the miss rate of the local file cache, the size of the local
file cache, and the amount of write traffic sent to file replicas on
other machines. We consider only files that would be stored in
the global file store, so we excluded inherently local files,
including the system paging file and those files in temporary
directories or the Internet browser cache, all of which account for
2% of the bytes in the data set.
Cache miss rate is a function of the cache retention period. We
estimate miss rate as follows: For a cache retention period of n
days, the files in directories last accessed n+1 days ago will exit
the cache on the current day. Over the long term, the rate at
which files exit the cache must match the rate at which they enter
the cache, so the rate of file exit approximates the rate of file
entry. Since a file enters the cache only in response to a miss, the
cache entry rate equals the cache miss rate.
We estimate the cache size similarly. For a cache retention period
of n days, the size of the local cache is equal to the sum of the file
sizes in all directories accessed in the last n days.
Write traffic rate is a function of the lazy-update propagation lag.
We estimate write traffic rate as follows: For a propagation lag of
n hours, files last written between 2n hours ago and n hours ago
will have been propagated during the past n hours.
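The miss-rate and cache-size estimators above can be expressed directly over the scanned directory data. A sketch, where the tuple layout and day-granularity timestamps are our simplification of the actual scan records:

```python
from collections import defaultdict

def cache_estimates(files, retention_days, today):
    """Estimate per-day cache misses and cache size for one machine.

    files: list of (directory, size_bytes, last_access_day) tuples,
    where last_access_day is the most recent timestamp on the file.
    """
    dirs = defaultdict(list)            # directory -> [(size, day), ...]
    for d, size, day in files:
        dirs[d].append((size, day))
    misses = cache_size = 0
    for entries in dirs.values():
        last = max(day for _, day in entries)   # directory's last access
        age = today - last
        if age == retention_days + 1:
            misses += len(entries)   # these files exit the cache today;
                                     # exit rate approximates miss rate
        if age <= retention_days:
            cache_size += sum(size for size, _ in entries)
    return misses, cache_size
```

Note that both estimates operate at directory granularity, matching the design choice of caching entire directories together.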
3.2 Machine Availability Measurement
To determine file availability in our proposed system, we need to
know machine uptimes, whether these uptimes are consistent over
time, and whether the uptimes of different machines are
correlated. When our proposed system sees a mac