`
`Technical Conference
`
`San Diego, California
`
`January 22-26, 1996
`
USENIX
`
`The UNIX® and Advanced
`Computing Systems Professional
`and Technical Association
`
`Netflix, Inc. - Ex. 1014, Page 000001
`
`IPR2021-01319 (Netflix, Inc. v. CA, Inc.)
`
`
`
`For additional copies of these proceedings, write:
`
`USENIX Association
`2560 Ninth Street, Suite 215
`Berkeley, CA 94710 USA
`
`The price is $32 for members and $40 for nonmembers.
`Outside the USA and Canada, please add
`$18 per copy for postage (via air printed matter).
`
`Past USENIX Technical Conferences
`
`1995 New Orleans
`1994 Summer Boston
`1994 Winter San Francisco
`1993 Summer Cincinnati
`1993 Winter San Diego
`1992 Summer San Antonio
`1992 Winter San Francisco
`1991 Summer Nashville
1991 Winter Dallas
`1990 Summer Anaheim
`1990 Winter Washington, DC
`1989 Summer Baltimore
`1989 Winter San Diego
`
`1988 Summer San Francisco
`1988 Winter Dallas
`1987 Summer Phoenix
`1987 Winter Washington, DC
`1986 Summer Atlanta
`1986 Winter Denver
`1985 Summer Portland
`1985 Winter Dallas
`1984 Summer Salt Lake City
`1984 Winter Washington, DC
`1983 Summer Toronto
`1983 Winter San Diego
`
Copyright © 1996 by The USENIX Association
`All Rights Reserved.
`
`This volume is published as a collective work. Rights to individual
`papers remain with the author or the author's employer. Permission is
`granted for the noncommercial reproduction of the complete work for
educational or research purposes. USENIX acknowledges all trademarks
herein.
`
`ISBN 1-880446-76-6
`
Printed in the United States of America on 50% recycled paper, 10-15% post-consumer waste.
`
`
`
`
`The USENIX Association
`
`Proceedings of the
`USENIX 1996 Annual Technical Conference
`
January 22-26, 1996
`San Diego, California, USA
`
`
`
`
`TABLE OF CONTENTS
`
`USENIX 1996 Annual Technical Conference
`January 22-26, 1996
`San Diego, California
`
`Wednesday, January 24
`
`Keynote Address: Nature and Nurture: The Interplay of UNIX and Networking
`Van Jacobson, Lawrence Berkeley National Laboratory
`
`FILE SYSTEMS
`
11am-12:30pm
`
`Session Chair: John Ousterhout, Sun Microsystems Laboratories
`
Scalability in the XFS File System .............................................................................................................................. 1
`Adam Sweeney, Don Doucette, Wei Hu, Curtis Anderson, Mike Nishimoto, and Geoff Peck,
`Silicon Graphics
`
`A Comparison of FFS Disk Allocation Policies .......................................................................................................... 15
`Keith A. Smith and Margo Seltzer, Harvard University
`
`AFRAID--A Frequently Redundant Array of Independent Disks ............................................................................... 27
`Stefan Savage, University of Washington; John Wilkes, Hewlett-Packard Laboratories
`
`OS EXTENSIONS
`
`2pm-3:30pm
`
`Session Chair: Darrell Long, University of California, Santa Cruz
`
`A Comparison of OS Extension Technologies ............................................................................................................. 41
`Christopher Small and Margo Seltzer, Harvard University
`
`An Extensible Protocol Architecture for Application-Specific Networking .............................................................. 55
Marc E. Fiuczynski and Brian N. Bershad, University of Washington
`
`Linux Device Driver Emulation in Mach .................................................................................................................... 65
`Shantanu Goel and Dan Duchamp, Columbia University
`
`MULTIMEDIA
`
`4pm-5pm
`
`Session Chair: Brent Welch, Sun Microsystems Laboratories
`
`Calliope: A Distributed, Scalable Multimedia Server ................................................................................................. 75
`Andrew Heybey, Mark Sullivan, and Paul England, Bellcore
`
`Simple Continuous Media Storage Server on Real-Time Mach .................................................................................. 87
Hiroshi Tezuka and Tatsuo Nakajima, Japan Advanced Institute of Science and Technology
`
`
`
`
`Thursday, January 25
`
`NETWORK SESSION
`
`9am-10:30am
`
`Session Chair: John Kohl, Atria Software
`
`Eliminating Receive Livelock in an Interrupt-driven Kernel... .................................................................................... 99
`Jeffrey C. Mogul, Digital Equipment Corporation Western Research Laboratory;
`K. K. Ramakrishnan, AT&T Bell Laboratories
`
Implementation of IPv6 in 4.4 BSD ............................................................................................................................. 113
`Randall J. Atkinson, Daniel L. McDonald, and Bao G. Phan, Naval Research Laboratory;
`Craig W. Metz, Kaman Sciences Corporation; Kenneth C. Chin, Naval Research Laboratory
`
`Supporting Mobility in MosquitoNet ........................................................................................................................... 127
Mary Baker, Xinhua Zhao, Stuart Cheshire, and Jonathan Stone, Stanford University
`
`WEB
`
11am-12:30pm
`
`Session Chair: Christopher Small, Harvard University
`
`World Wide Web Cache Consistency .......................................................................................................................... 141
`James Gwertzman and Margo Seltzer, Harvard University
`
`A Hierarchical Internet Object Cache .......................................................................................................................... 153
`Anawat Chankhunthod, Peter B. Danzig, and Chuck Neerdaels, University of Southern California;
`Michael F. Schwartz and Kurt J. Worrell, University of Colorado, Boulder
`
`Tracking and Viewing Changes on the Web ................................................................................................................ 165
Fred Douglis and Thomas Ball, AT&T Bell Laboratories
`
`DISTRIBUTED SYSTEMS
`
`4pm-6pm
`
`Session Chair: David Black, Open Software Foundation Research Institute
`
`Implementation of a Reliable Remote Memory Pager ................................................................................................. 177
`Evangelos P. Markatos and George Dramitinos, Institute of Computer Science, FORTH, Crete
`
`Solaris MC: A Multi Computer OS .............................................................................................................................. 191
`Yousef A. Khalidi, Jose M. Bernabeu, Vlada Matena, Ken Shirriff, and Moti Thadani,
`Sun Microsystems Laboratories
`
`A New Approach to Distributed Memory Management in the Mach Microkernel ..................................................... 205
`Stephan Zeisset, Stefan Tritscher, and Martin Mairandres, Intel GmbH
`
`Fault Tolerance in a Distributed CHORUS/MiX System ............................................................................................ 219
Sunil Kittur, Online Media; Douglas Steel, ICL High Performance Systems; Francois Armand and
`Jim Lipkis, Chorus Systems
`
`
`
`
`Friday, January 26
`
`PROTOCOLS
`
`9am-10:30am
`
`Session Chair: Jonathan Smith, University of Pennsylvania
`
`FLIPC: A Low Latency Messaging System for Distributed Real Time Environments ............................................... 229
David L. Black, Randall D. Smith, Steven J. Sears, and Randall W. Dean, Open Software
`Foundation Research Institute
`
`An Analysis of Process and Memory Models to Support High-Speed Networking in a UNIX Environment... .......... 239
`B. J. Murphy, University of Cambridge; S. Zeadally and C. J. Adams, University of Buckingham
`
`Zero-Copy TCP in Solaris ............................................................................................................................................ 253
`Hsiao-keng Jerry Chu, SunSoft, Inc.
`
`PERFORMANCE
`
11am-12:30pm
`
Session Chair: Miche Baker-Harvey, Digital Equipment Corporation
`
`A Performance Comparison of UNIX Operating Systems on the Pentium .................................................................. 265
`Kevin Lai and Mary Baker, Stanford University
`
`lmbench: Portable Tools for Performance Analysis ..................................................................................................... 279
`Larry McVoy, Silicon Graphics; Carl Staelin, Hewlett-Packard Laboratories
`
`Process Labeled Kernel Profiling: A New Facility to Profile System Activities ......................................................... 295
`Shingo Nishioka, Atsuo Kawaguchi, and Hiroshi Motoda, Advanced Research Laboratory,
`Hitachi Ltd.
`
`POTPOURRI
`
`2pm-4pm
`
`Session Chair: Matt Blaze, AT&T Bell Laboratories
`
`Cut-and-Paste File-Systems: Integrating Simulators and File-Systems ....................................................................... 307
Peter Bosch and Sape J. Mullender, Universiteit Twente, Netherlands
`
`Predicting File-System Actions From Prior Events ...................................................................................................... 319
`Thomas M. Kroeger and Darrell D. E. Long, University of California, Santa Cruz
`
`Transparent Fault Tolerance for Parallel Applications on Networks of Workstations ................................................. 329
`Daniel J. Scales, Digital Equipment Corporation Western Research Laboratory; Monica S. Lam,
`Stanford University
`
`Why Use a Fishing Line When you Have a Net? An Adaptive Multicast Data Distribution Protocol... ..................... 343
`Jeremy R. Cooperstock and Steve Kotsopoulos, University of Toronto
`
`
`
`
`A Hierarchical Internet Object Cache
`
`Anawat Chankhunthod
`Peter B. Danzig
`Chuck Neerdaels
`Computer Science Department
`University of Southern California
`
`Michael F. Schwartz
`Kurt J. Worrell
`Computer Science Department
`University of Colorado - Boulder
`
Abstract: This paper discusses the design and performance of a hierarchical proxy-cache designed to make Internet information systems scale better. The design was motivated by our earlier trace-driven simulation study of Internet traffic. We challenge the conventional wisdom that the benefits of hierarchical file caching do not merit the costs, and believe the issue merits reconsideration in the Internet environment.

The cache implementation supports a highly concurrent stream of requests. We present performance measurements that show that our cache outperforms other popular Internet cache implementations by an order of magnitude under concurrent load. These measurements indicate that hierarchy does not measurably increase access latency. Our software can also be configured as a Web-server accelerator; we present data that our httpd-accelerator is ten times faster than Netscape's Netsite and NCSA 1.4 servers.

Finally, we relate our experience fitting the cache into the increasingly complex and operational world of Internet information systems, including issues related to security, transparency to cache-unaware clients, and the role of file systems in support of ubiquitous wide-area information systems.
`
1 Introduction

Perhaps for expedience or because software developers perceive network bandwidth and connectivity as free commodities, Internet information services like FTP, Gopher, and WWW were designed without caching support in their core protocols. The consequence of this misperception now haunts popular WWW and FTP servers. For example, NCSA, the home of Mosaic, moved to a multi-node cluster of servers to meet demand. NASA's Jet Propulsion Laboratory wide-area network links were saturated by the demand for Shoemaker-Levy 9 comet images in July 1994, and Starwave Corporation runs a five-node SPARCcenter 1000 just to keep up with demand for college basketball scores. Beyond distributing load away from server "hot spots", caching can also save bandwidth, reduce latency, and protect the network from clients that erroneously loop and generate repeated requests [9].

This paper describes the design and performance of the Harvest [5] cache, which we designed to make Internet information services scale better. The cache implementation is optimized to support a highly concurrent stream of requests with minimal queuing for OS-level resources, using non-blocking I/O, application-level threading and virtual memory management, and a Domain Naming System (DNS) cache. Because of its high performance, the Harvest cache can also be paired with existing HTTP servers (httpd's) to increase document server throughput by an order of magnitude.

Individual caches can be interconnected hierarchically to mirror an internetwork's topology, implementing the design motivated by our earlier NSFNET trace-driven simulation study [10].
`
1.1 Hierarchical Web versus File System Caches

Our 1993 study of Internet traffic showed that hierarchical caching of FTP files could eliminate half of all file transfers over the Internet's wide-area network links [10]. In contrast, the hierarchical caching studies of Blaze and Alonso [2] and Muntz and Honeyman [17] showed that hierarchical caches can, at best, achieve 20% hit rates and cut file server workload in half. We believe the different conclusions reached by our study and these two file system studies lie in the workloads traced.

Our study traced wide-area FTP traffic from a switch near the NSFNET backbone. In contrast, Blaze and Alonso [2] and Muntz and Honeyman [17] traced LAN workstation file system traffic. While workstation file systems share a large, relatively static collection of files, such as gcc, the Internet exhibits a high degree of read-only sharing among a rapidly evolving set of popular objects. Because LAN utility files rarely change over a five day period, both studies [2, 17] found little value in hierarchical caching over flat file caches at each workstation: after the first reference to a shared file, the file stayed in the local cache indefinitely and the upper-level caches saw low hit rates.

In contrast to workstation file systems, FTP, WWW, and Gopher facilitate read-only sharing of autonomously owned and rapidly evolving object spaces. Hence, we found that over half of NSFNET FTP traffic is due to sharing of read-only objects [10] and, since Internet topology tends to be organized hierarchically, that hierarchical caching can yield a 50% hit rate and reduce server load dramatically. Claffy and Braun reported similar statistics for WWW traffic [7], which has displaced FTP traffic as the largest contributor to Internet traffic.
Second, the cost of a cache miss is much lower for Internet information systems than it is for traditional caching applications. Since a page fault can take 10^5 times longer to service than hitting RAM, the RAM hit rate must be 99.99% to keep the average access speed at twice the cost of a RAM hit. In contrast, the typical miss-to-hit cost ratio for Internet information systems is 10:1¹, and hence a 50% hit ratio will suffice to keep the average cost at twice the hit cost.
Finally, Internet object caching addresses more than latency reduction. As noted above and in the file system papers, hierarchical caching moves load from server hot spots. Not mentioned in the file system papers, many commercial sites proxy all access to the Web and FTP space through proxy caches, out of concern for Internet security. Many Internet sites are forced to use hierarchical object caches.

The Harvest cache has been in use for 1.5 years by a growing collection of about 100 sites across the Internet, as both a proxy-cache and as an httpd-accelerator. Our experiences during this time highlight several important issues. First, cache policy choices are made more difficult because of the prevalence of information systems that provide neither a standard means of setting object Time-To-Live (TTL) values, nor a standard for specifying objects as non-cacheable. For example, it is popular to create WWW pages that modify their content each time they are retrieved, by returning the date or access count. Such objects should not be cached. Second, because it is used in a wide-area network environment (in which link capacity and congestion vary greatly), cache topology is important. Third, because the cache is used in an administratively decentralized environment, security and privacy are important. Fourth, the widespread use of location-dependent names (in the form of Uniform Resource Locators, or URLs) makes it difficult to distinguish duplicated or aliased objects. Finally, the large number of implementations of both clients and servers leads to errors that worsen cache behavior.

We discuss these issues in more depth below.
`
`2 Design
`This section describes our design to make the Harvest cache
`fast, efficient, portable, and transparent.
`
2.1 Cache Hierarchy

To reduce wide-area network bandwidth demand and to reduce the load on Internet information servers, caches resolve misses through other caches higher in a hierarchy, as illustrated in Figure 1. In addition to the parent-child relationships illustrated in this figure, the cache supports a notion of siblings. These are caches at the same level in the hierarchy, provided to distribute cache server load.

Each cache in the hierarchy independently decides whether to fetch the reference from the object's home site or from its parent or sibling caches, using a simple resolution protocol that works as follows.

If the URL contains any of a configurable list of substrings, then the object is fetched directly from the object's home, rather than through the cache hierarchy. This feature is used to force the cache to resolve non-cacheable ("cgi-bin") URLs and local URLs directly from the object's home. Similarly, if the URL's domain name matches a configurable list of substrings, then the object is resolved through the particular parent bound to that domain.

Otherwise, when a cache receives a request for a URL that misses, it performs a remote procedure call to all of its siblings and parents, checking if the URL hits any sibling or
`
¹This rough estimate is based on the observation that it takes about one second for a browser like Netscape to load an object from disk and render it for display, while a remote object takes about 10 seconds to be retrieved and displayed.
`
Figure 1: Hierarchical Cache Arrangement.
`
parent. The cache retrieves the object from the site with the lowest measured latency.

Additionally, a cache option can be enabled that tricks the referenced URL's home site into implementing the resolution protocol. When this option is enabled, the cache sends a "hit" message to the UDP echo port of the object's home machine. When the object's home echoes this message, it looks to the cache like a hit, as would be generated by a remote cache that had the object. This option allows the cache to retrieve the object from the home site if it happens to be closer than any of the sibling or parent caches.

A cache resolves a reference through the first sibling, parent, or home site to return a UDP "Hit" packet, or through the first parent to return a UDP "Miss" message if all caches miss and the home's UDP "Hit" packet fails to arrive within two seconds. However, the cache will not wait for a home machine to time out; it will begin transmitting as soon as all of the parent and sibling caches have responded. The resolution protocol's goal is for a cache to resolve an object through the source (cache or home) that can provide it most efficiently. This protocol is really a heuristic, as fast response to a ping indicates low latency. We plan to evolve to a metric that combines both response time and available bandwidth.

As will be shown in Section 3.5, hierarchies as deep as three caches add little noticeable access latency. The only case where the cache adds noticeable latency is when one of its parents fails, but the child cache has not yet detected it. In this case, references to this object are delayed by two seconds, the parent-to-child cache timeout. As the hierarchy deepens, the root caches become responsible for more and more clients. To keep root cache servers from becoming overloaded, we recommend that the hierarchy terminate at the first place in the regional or backbone network where bandwidth is plentiful.
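The resolution heuristic above can be sketched as follows. This is an illustrative Python sketch, not the cache's actual implementation: the function name and the (source, rtt, verdict) reply tuples are invented, and the real cache overlaps its UDP probes with non-blocking I/O rather than collecting all replies up front.

```python
def resolve(replies, parents):
    """Choose where to fetch a missed URL, per the protocol above.

    `replies` holds (source, rtt_seconds, verdict) tuples gathered from
    the UDP probes of siblings, parents, and the home site's echo port;
    verdict is "hit" or "miss". Returns the chosen source name.
    """
    # Fastest "hit" wins, whether from a sibling, a parent, or the
    # home machine's echoed probe.
    hits = [(rtt, src) for src, rtt, verdict in replies if verdict == "hit"]
    if hits:
        return min(hits)[1]
    # All caches missed (or the home's echo never arrived within the
    # two-second window): resolve through the fastest parent that
    # answered "miss", since a parent can fetch on the child's behalf.
    parent_misses = [(rtt, src) for src, rtt, verdict in replies
                     if verdict == "miss" and src in parents]
    if parent_misses:
        return min(parent_misses)[1]
    # No parents configured or none answered: go directly to the home.
    return "home"
```

With a sibling hit at 20 ms and a parent miss at 30 ms, the sibling is chosen; with misses everywhere, the fastest responding parent is.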
`
2.2 Cache Access Protocols

The cache supports three access protocols: encapsulating, connectionless, and proxy-http. The encapsulating protocol encapsulates cache-to-cache data exchanges to permit end-to-end error detection via checksums and, eventually, digital signatures. This protocol also enables a parent cache to transmit an object's remaining time-to-live to the child cache. The cache uses the UDP-based connectionless protocol to implement the parent-child resolution protocol. This protocol also permits caches to exchange small objects without establishing a TCP connection, for efficiency. While the encapsulating and connectionless protocols both support end-to-end reliability, the proxy-http protocol is the protocol supported by most Web browsers. In that arrangement, clients request objects via one of the standard information access protocols (FTP, Gopher, or HTTP) from a cache process. The term "proxy" arose because the mechanism was primarily designed to allow clients to interact with the WWW from behind a firewall gateway.
`
2.3 Cacheable Objects

The wide variety of Internet information systems leads to a number of cases where objects should not be cached. In the absence of a standard for specifying TTLs in objects themselves, the Harvest cache chooses not to cache a number of types of objects. For example, objects that are password protected are not cached. Rather, the cache acts as an application gateway and discards the retrieved object as soon as it has been delivered. The cache similarly discards URLs whose name implies the object is not cacheable (e.g., http://www.xyz.com/cgi-bin/query.cgi). It is possible to limit the size of the largest cacheable object, so that a few large FTP objects do not purge ten thousand smaller objects from the cache.
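The checks above amount to a small predicate. The sketch below is illustrative only: the substring stoplist and the size cap are configurable in the real cache, and the names used here are invented.

```python
STOPLIST = ("cgi-bin",)        # illustrative; the real list is configurable
MAX_OBJECT_BYTES = 4 << 20     # assumed cap; chosen per-site in practice

def is_cacheable(url, size_bytes, password_protected=False):
    """Return True if a retrieved object may be stored in the cache."""
    if password_protected:
        # Act as an application gateway: deliver the object, then discard it.
        return False
    if any(s in url for s in STOPLIST):
        # The name implies dynamically generated, non-cacheable content.
        return False
    if size_bytes > MAX_OBJECT_BYTES:
        # Keep a few huge FTP objects from purging thousands of small ones.
        return False
    return True
```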
`
2.4 Unique Object Naming

A URL does not name an object uniquely; the URL plus the MIME² header issued with the request uniquely identify an object. For example, a WWW server may return a text version of a PostScript object if the client's browser is not able to view PostScript. We believe that this capability is not used widely, and currently the cache does not insist that the request MIME headers match when a request hits the cache.
`
2.5 Negative Caching

To reduce the costs of repeated failures (e.g., from erroneously looping clients), we implemented two forms of negative caching. First, when a DNS lookup failure occurs, we cache the negative result for five minutes (chosen because transient Internet conditions are typically resolved this quickly). Second, when an object retrieval failure occurs, we cache the negative result for a parameterized period of time, with a default of five minutes.
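Both forms of negative caching reduce to remembering a failure together with an expiry time. A toy sketch (the class and method names are invented; the clock parameter exists only to make the sketch testable):

```python
import time

class NegativeCache:
    """Remember recent failures (DNS lookups, object retrievals) so that
    repeated requests, e.g. from an erroneously looping client, are
    answered locally rather than retried upstream."""

    def __init__(self, ttl=300.0, clock=time.time):
        self.ttl = ttl            # default: five minutes
        self.clock = clock
        self.failures = {}        # key -> time the failure was recorded

    def record_failure(self, key):
        self.failures[key] = self.clock()

    def is_known_failure(self, key):
        stamp = self.failures.get(key)
        if stamp is None:
            return False
        if self.clock() - stamp >= self.ttl:
            del self.failures[key]   # expired; conditions may have cleared
            return False
        return True
```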
`
2.6 Cache-Awareness

When we started designing the cache, we anticipated cache-aware clients that would decide between resolving an object indirectly through a parent cache or directly from the object's home. Towards this end, we created a version of Mosaic that could resolve objects through multiple caches, as illustrated in Figure 2. Within a few months, we reconsidered and dropped this idea as the number of new Web clients blossomed (cello, lynx, netscape, tkwww, etc.).
`While no Web client is completely cache-aware, most
`support access through IP firewalls. Clients send all their
`
²MIME stands for "Multipurpose Internet Mail Extensions". It was originally developed for multimedia mail systems [4], but was later adopted by HTTP for passing typing and other meta data between clients and servers.
`
Figure 2: Cache-aware client.
`
requests to their proxy-server, and the proxy-server decides how best to resolve it.

There are advantages and disadvantages to the cache-aware and cache-unaware approaches. Cache-unaware clients are simpler to configure; just set the proxy bindings that users already understand and which are needed to provide Web service through firewalls. On the other hand, cache-aware clients would permit load balancing, avoid the single point of failure caused by proxy caching, and (as noted in Section 2.2) allow a wider range of security and end-to-end error checking.
`
2.7 Security, Privacy, and Proxy-Caching

What is the effect of proxy-caching on Web security and privacy? WWW browsers support various authorization mechanisms, all encoded in MIME headers exchanged between browser and server. The basic authorization mechanism involves clear-text exchange of passwords. For protection from eavesdropping, the Public Key authorization mechanism is available. Here, the server announces its own public key in clear-text, but the rest of the exchange is encrypted for privacy. This mechanism is vulnerable to IP-spoofing, where a phony server can masquerade as the desired server, but the mechanism is otherwise invulnerable to eavesdroppers. Thirdly, for those who want both privacy and authentication, a PGP-based mechanism is available, where public key exchange is done externally.
A basic authentication exchange follows this dialog:

1. Client: GET <URL>

2. Server: HTTP/1.0 401 Unauthorized
   Authentication failed

3. Client: GET <URL> Authorization:
   <7-bit-encoded name:password>

4. Server: <returns a, b, c, or d>
   a. Reply
   b. Unauthorized 401
   c. Forbidden 403
   d. Not Found 404
`
Given the above introduction to HTTP security mechanisms, we now explain how the cache transparently passes this protocol between browser and server.

When a server passes a 401 Unauthorized message to a cache, the cache forwards it back to the client and purges the URL from the cache. The client browser, using the desired security model, prompts for a username and password, and reissues the GET URL with the authentication and authorization encoded in the request MIME header. The cache detects the authorization-related MIME header, treats it as any other kind of non-cacheable object, returns the retrieved document to the client, but otherwise purges all records of the object. Note that under the clear-text basic authorization model, anyone, including the cache, could snoop the authorization data. Hence, the cache does not weaken this already weak model. Under the Public Key or PGP-based models, neither the cache nor other eavesdroppers can interpret the authentication data.

Proxy-caching defeats IP address-based authentication, since the requests appear to come from the cache's IP address rather than the client's. However, since IP addresses can be spoofed, we consider this liability an asset of sorts. Proxy-caching does not prevent servers from encrypting or applying digital signatures to their documents.

As a final issue, unless Web objects are digitally signed, an unscrupulous system administrator could insert invalid data into his proxy-cache. You have to trust the people who run your caches, just as you must trust the people who run your DNS servers, packet switches, and route servers. Hence, proxy-caching does not seriously weaken Web privacy.
`
2.8 Threading

For efficiency and portability across UNIX-like platforms, the cache implements its own non-blocking disk and network I/O abstractions directly atop a BSD select loop. The cache avoids forking except for misses to FTP URLs; we retrieve FTP URLs via an external process because the complexity of the protocol makes it difficult to fit into our select loop state machine. The cache implements its own DNS cache and, when the DNS cache misses, performs non-blocking DNS lookups (although without currently respecting DNS TTLs). As referenced bytes pour into the cache, these bytes are simultaneously forwarded to all sites that referenced the same object and are written to disk, using non-blocking I/O. The only way the cache will stall is if it takes a virtual memory page fault, and the cache avoids page faults by managing the size of its VM image (see Section 2.9). The cache employs non-preemptive, run-to-completion scheduling internally, so it has no need for file or data structure locking. However, to its clients, it appears multi-threaded.
`
2.9 Memory Management

The cache keeps all meta-data for cached objects (URL, TTL, reference counts, disk file reference, and various flags) in virtual memory. This consumes 48 bytes + strlen(URL) per object on machines with 32-bit words³. The cache will also keep exceptionally hot objects loaded in virtual memory, if this option is enabled. However, when the quantity of VM dedicated to hot object storage exceeds a parameterized high water mark, the cache discards hot objects by LRU until VM usage hits the low water mark. Note that these objects still reside on disk; just their VM image is reclaimed. The hot-object VM cache is particularly useful when the cache is deployed as an httpd-accelerator (discussed in Section 3.1).

The cache is write-through rather than write-back. Even objects in the hot-object VM cache appear on disk. We considered memory-mapping the files that represent objects, but could not apply this technique because it would lead to page-faults. Instead, objects are brought into cache via non-blocking I/O, despite the extra copies.

³We plan to replace the variable length URL with a fixed length MD5, reducing this number to 44+16=60 bytes/object.
`
Objects in the cache are referenced via a hash table keyed by URL. Cacheable objects remain cached until their cache-assigned TTL expires and they are evicted by the cache replacement policy, or the user manually evicts them by clicking the browser's "reload" button (the mechanism for which is discussed in Section 5.1). If a reference touches an expired Web object, the cache refreshes the object's TTL with an HTTP "get-if-modified".

The cache keeps the URL and per-object data structures in virtual memory but stores the object itself on disk. We made this decision on the grounds that memory should buy performance in a server-bottlenecked system: the meta-data for 1,000,000 objects will consume 60-80MB of real memory. If a site cannot afford the memory, then it should use a cache optimized for memory space rather than performance.
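The water-mark policy for the hot-object VM cache can be sketched as below. This is an illustrative model with invented names: the real cache manages its whole VM image, not just object bodies, and the per-object overhead formula simply restates the figure from the text.

```python
from collections import OrderedDict

class HotObjectStore:
    """Hold hot objects in VM; above the high water mark, evict by LRU
    down to the low water mark. Evicted objects remain on disk, since
    the cache is write-through; only their VM image is reclaimed."""

    def __init__(self, high_water, low_water):
        self.high, self.low = high_water, low_water
        self.vm_bytes = 0
        self.objects = OrderedDict()       # URL -> body, in LRU order

    def touch(self, url, body):
        if url in self.objects:
            self.objects.move_to_end(url)  # mark most recently used
            return
        self.objects[url] = body
        self.vm_bytes += len(body)
        if self.vm_bytes > self.high:      # crossed the high water mark
            while self.vm_bytes > self.low:
                _, old = self.objects.popitem(last=False)  # LRU victim
                self.vm_bytes -= len(old)  # VM reclaimed; disk copy stays

def metadata_bytes(url):
    """Per-object meta-data cost from the text: 48 bytes + strlen(URL)."""
    return 48 + len(url)
```

At an average of 60 bytes per object this accounting matches the text's estimate of 60-80MB of meta-data for a million objects.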
`
2.10 Disk Management

When disk space exceeds the high water mark, the cache enters its garbage collection mode. In this mode, every few dozen cache references, the cache discards the oldest objects encountered in a row of its object hash table. When disk usage drops below the low water mark, the cache exits from its garbage collection mode. If disk usage ever reaches the configured maximum, it immediately discards the oldest objects from the next row of the hash table.

The cache manages multiple disks and attempts to balance load across them. It creates 100 numbered directories on each disk and rotates object creation among the various disks and directories. Hence, an average directory of a cache that manages four disks and a million objects stores 2500 numbered files. Since directory entries average about 24 bytes, an average directory may grow to 15 4KB disk blocks. The number of files per directory can be increased