United States Patent [19]
Aviani, Jr.

(54) DATA TRANSMISSION OVER THE INTERNET USING A CACHE MEMORY FILE SYSTEM

(75) Inventor: James A. Aviani, Jr., Santa Barbara, Calif.

(73) Assignee: Cisco Technology, Inc., San Jose, Calif.

(21) Appl. No.: 08/937,966

(22) Filed: Sep. 25, 1997

(51) Int. Cl.: G06F 17/30
(52) U.S. Cl.: 707/103; 707/205; 711/3; 711/135
(58) Field of Search: 707/103, 205; 710/5, 36; 711/129, 162, 134, 135; 345/502; 348/7; 395/670, 674, 675, 200.49, 200.61
(56) References Cited

U.S. PATENT DOCUMENTS

4,827,411  5/1989   Arrowood et al. ......... 364/300
4,965,772  10/1990  Daniel et al. ........... 364/900
5,357,618  10/1994  Mirza et al. ............ 711/3
5,446,839  8/1995   Dea et al. .............. 345/502
5,555,244  9/1996   Gupta et al. ............ 370/60.1
5,673,265  9/1997   Gupta et al. ............ 370/432
5,680,573  10/1997  Rubin et al. ............ 711/129
5,805,821  9/1998   Saxena et al. ........... 395/200.61
5,829,046  10/1998  Tzelnic et al. .......... 711/162
5,832,297  11/1998  Ramagopal et al. ........ 395/825
5,838,916  11/1998  Domenikos et al. ........ 395/200.49
5,890,100  1/1999   Feiste et al. ........... 711/135
5,890,169  3/1999   Wong et al. ............. 707/206
OTHER PUBLICATIONS

Valloppillil, Vinod, “Cache Array Routing Protocol v1.0”, Oct. 20, 1997, Internet-Draft, http://ds1.internic.net/internet-drafts/draft-vinod-carp-v1-02.txt, pp. 1-6.
US005950205A
(11) Patent Number: 5,950,205
(45) Date of Patent: Sep. 7, 1999
Cisco Technology, Inc., “Configuring IP Routing Protocols,” Dec. 10, 1997, http://www.cisco.com/univercd/data/doc/software/112/cnp1/5ciprout.htm#REF40277, pp. 1-6, 120-122.
Ousterhout, John K., “A Trace-Driven Analysis of the UNIX 4.2 BSD File System,” Jan. 2, 1993, Computer Science Division, Electrical Engineering and Computer Science, University of California, Berkeley, CA, pp. 1-12.
Welch, Brent, “A Comparison of the Vnode and Sprite File System Architectures,” Proceedings of the File System Workshop, May 1992, pp. 29-44.
Ousterhout, John K., “Beating the I/O Bottleneck: A Case for Log-Structured File Systems,” Jan. 30, 1992, Computer Science Division, Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, pp. 1-17.
Primary Examiner: Paul R. Lintz
Assistant Examiner: Ella Colbert
Attorney, Agent, or Firm: Beyer & Weaver, LLP

(57) ABSTRACT
A method for storing a plurality of multimedia objects in a cache memory is described. First ones of the multimedia objects are written into the cache memory sequentially from the beginning of the cache memory in the order in which they are received. When a first memory amount from a most recently stored one of the first multimedia objects to the end of the cache memory is insufficient to accommodate a new multimedia object, the new multimedia object is written from the beginning of the cache memory, thereby writing over a previously stored one of the first multimedia objects. Second ones of the multimedia objects are then written into the cache memory sequentially following the new multimedia object in the order in which they are received, thereby writing over the first ones of the multimedia objects. This cycle is repeated, thereby maintaining a substantially full cache memory.
24 Claims, 11 Drawing Sheets
Amazon v. Audio Pod
US Patent 9,954,922
Amazon EX-1037
U.S. Patent          Sep. 7, 1999          Sheet 1 of 11          5,950,205

[FIG. 1: hardware environment with client platform, routers, caching engines, and destination platform]
U.S. Patent          Sep. 7, 1999          Sheet 2 of 11          5,950,205

[FIG. 2: caching engine 200 with SCSI drives 202, 204, and 206]
U.S. Patent          Sep. 7, 1999          Sheet 3 of 11          5,950,205

[FIGS. 3a-3d: drive 300 with disk pointer file 302, write pointer 304, and flush pointer 306 during the initial writes and first wrap-around]
U.S. Patent          Sep. 7, 1999          Sheet 4 of 11          5,950,205

[FIGS. 3e-3h: drive 300 as the write and flush pointers advance through subsequent write cycles]
U.S. Patent          Sep. 7, 1999          Sheet 5 of 11          5,950,205

[FIG. 4: flowchart for the creation of a file]
400 create file
402 receive new object and assess size
404 existing reference to outdated copy? (yes: remove reference via the delete function)
408 read pointers to determine locations
410 compute free disk space
412 free disk space > new object size? (no: 420 determine size of object indicated by flush pointer; 422 delete object from directory; 424 move flush pointer to next object; repeat from 410)
414 write Inode
416 move write pointer
418 update directory
419 create handle
426 end
U.S. Patent          Sep. 7, 1999          Sheet 6 of 11          5,950,205

[FIG. 5: flowchart for the writing of a file]
500 write file
502 look up handle
504 perform security check
506 proceed?
508 copy object data to disk
510 update handle write offset
512 update Inode write offset
514 update handle CRC
516 write complete? (no: repeat from 508)
U.S. Patent          Sep. 7, 1999          Sheet 7 of 11          5,950,205

[FIG. 6: flowchart for the reading of a file]
600 read file
602 look up handle
604 perform security check
606 proceed?
608 copy data from disk
610 update handle read offset
612 update handle CRC
614 read complete? (no: repeat from 608)
U.S. Patent          Sep. 7, 1999          Sheet 8 of 11          5,950,205

[FIG. 7: flowchart for the opening of a file]
700 open file
702 look up file name in directory
704 Inode in volatile memory? (no: 706 read Inode from disk)
708 create new handle for Inode
710 increment Inode reference count
712 return handle
U.S. Patent          Sep. 7, 1999          Sheet 9 of 11          5,950,205

[FIG. 8: flowchart for the closing of a file]
800 close file
802 look up handle
804 perform security check
806 write or read? (write: 808 update Inode CRC; read: 814 compare handle CRC to Inode CRC; 816 CRCs same? no: 818 return error code; 820 delete file)
810 decrement Inode reference count
812 release handle
822 end
U.S. Patent          Sep. 7, 1999          Sheet 10 of 11          5,950,205

[FIG. 9: flowchart for the deleting of a file]
900 delete file
902 look up file name in directory
904 Inode in volatile memory? (no: 906 read Inode from disk)
908 wipe out Inode name field
910 write Inode back to disk
912 release handles
914 end
U.S. Patent          Sep. 7, 1999          Sheet 11 of 11          5,950,205

[FIG. 10: flowchart for the stat function]
1000 stat function
1002 look up file name in directory
1004 entry found? (no: 1006 return code indicating file does not exist)
1008 read Inode from disk
1010 compare file name to Inode name
1012 do names match? (no: 1006 return code indicating file does not exist; yes: 1014 return code indicating file exists)
1016 end
BACKGROUND OF THE INVENTION
The present invention relates to transmission of data in a network environment. More specifically, the present invention relates to methods and apparatus for improving the efficiency with which data are transmitted over the Internet. Even more specifically, the invention relates to a file system designed with the special requirements of a cache memory in mind.
Currently, most Internet service providers (ISPs) employ various techniques such as, for example, caching proxy servers, to accelerate access to the World Wide Web. Having evolved from early firewall models, caching proxy servers service data requests intended for the Web using data stored locally from previous similar requests. For historical reasons, the file system employed by such a proxy server is a Unix-based general purpose file system. Unfortunately, while Unix-based file systems are extremely versatile and useful in the applications for which they were designed, they are not optimized for the caching function and, in fact, represent the primary traffic bottleneck in caching proxy servers.
The basic requirements of a Unix-based file system are that its files are persistent and mutable. That is, once a file is created, the user can reasonably anticipate that the file will remain there forever, i.e., persistence, or at least until he explicitly deletes it. Moreover, a file, once created, may be changed at any time, i.e., mutability. By contrast, the basic requirements of a caching file system are just the opposite. That is, because there is a finite amount of memory in which the most recently and frequently requested objects must be stored, older files become obsolete. Thus, files in a caching file system should be transient. Moreover, objects stored in a caching file system are downloaded, read a number of times before being deleted or overwritten (many times if the cache is operating efficiently), but never modified. Thus, files in a caching file system should be immutable.
The Unix file system was designed with the expectation that a relatively small number of relatively large files would be stored within its multilevel hierarchy. This is in contrast to the millions of small files, e.g., multimedia objects, typically stored in a caching proxy server. The small number of files anticipated meant, in turn, that a correlatively small number of “creates” was expected. Because only a small amount of time overhead was anticipated to be associated with the “create file” function, it was not optimized for speed. By contrast, caching proxy servers create files at a rate well beyond that anticipated for a Unix-based system, e.g., 1000 creates/second. Therefore, the create function must be fast.
The multilevel directory structure itself has a rather large overhead, consuming about 20% of available storage. This represents an inefficiency in a caching file system which, by its very nature, needs to make use of as much of its memory as possible for the storage of cache objects. In addition, because of the multilevel nature of the directory structure, significant time is required to traverse the directory levels when searching for files. This overhead becomes problematic given the number of transactions per unit time typical for a caching proxy server. The overhead is also problematic in that the Unix-based file system is scalable only to about 20-25 Gigabytes, at which point it becomes unacceptably unwieldy and slow. This is unfortunate because current Internet traffic is already pushing this limit and it is obviously desirable to increase cache capacity as traffic increases.
Finally, an extremely important feature of a caching file system, i.e., garbage collection, is not typically provided in a Unix-based file system. Rather, in such a file system it is assumed that the user will delete files and optimize the disk as needed. Therefore, if a Unix-based file system is to be used as a caching file system, a separate process must be designed with the capability of repeatedly searching through the complex directory structure and identifying and flushing obsolete files. This garbage collection process must not begin flushing objects too soon or the disk space will be underutilized and the cache hit rate correspondingly lowered. On the other hand, the process must not wait too long to begin garbage collection because the performance of the file system bogs down as the disk becomes full. Thus, not only must a separate process be designed to perform the necessary garbage collection function, it must be an extremely sophisticated process, thereby significantly increasing the already large overhead associated with the file system.
The Unix-based file system currently employed by ISP proxy servers incorporates file characteristics (persistence and mutability) which are opposed to the requirements of a caching file system. It is based on a multilevel directory structure which is problematic for the caching function in a number of respects. Finally, because it does not provide garbage collection, a separate, highly complex process must be designed. It is therefore apparent that a file system is desirable which is more in line with the requirements of the caching function.
SUMMARY OF THE INVENTION
According to the present invention, a file system is provided which is designed specifically with regard to the requirements of a caching file system. The file system of the present invention essentially organizes the memory of a caching server as a large first-in-first-out (FIFO) memory. Files are referenced in a single-level directory structure and are identified by a file name and a memory location offset. These files are immutable and transient and the directory itself may be named arbitrarily by the user. As is desirable for a caching file system, garbage collection is inherent in the system, thereby eliminating the need for creation of a separate collection process. According to a specific embodiment, a flush pointer is positioned in the cache memory just ahead of a write pointer. In general, the free disk space, i.e., the memory available for new file creation, is just the distance between the write and flush pointers. The situations in which this is not the case will be discussed below. When a new file is to be created, the flush pointer is moved past currently stored files one at a time until there is enough space behind the flush pointer (and ahead of the write pointer) for the new file. The write pointer is then moved enough to accommodate the new file and the new file data is written over the old files. The write pointer follows the flush pointer in this manner through the cache memory, eventually wrapping around the disk and beginning the process again. Because, in general, the flush pointer is only barely ahead of the write pointer, the cache memory is always substantially full. This is a desirable condition for efficient operation of the caching function because a full cache memory increases the probability of cache hits.
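The pointer scheme just described can be sketched in code. The following is a minimal illustration only; the class and method names, the byte-offset representation, and the treatment of objects stranded at the end of the disk during wrap-around are assumptions, not the patented implementation:

```python
from collections import deque

class FifoCache:
    """Sketch of a cache memory organized as a large FIFO, with a
    flush pointer kept just ahead of the write pointer once the
    disk has filled. Byte offsets stand in for disk locations."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.write_ptr = 0       # end of the most recent allocation
        self.flush_ptr = 0       # end of the most recently flushed object
        self.objects = deque()   # (offset, size), oldest first
        self.filling = True      # disk has not yet wrapped around

    def _flush_one(self):
        # Move the flush pointer past the next (oldest) stored object.
        offset, size = self.objects.popleft()
        self.flush_ptr = offset + size

    def create(self, size):
        if size > self.capacity:
            raise ValueError("object larger than cache")
        # Too little room before the end of the disk: wrap around,
        # discarding any objects stranded between here and the end
        # (an assumption; the patent only describes moving the flush
        # pointer past the first file before the wrap).
        if self.capacity - self.write_ptr < size:
            while self.objects and self.objects[0][0] >= self.write_ptr:
                self.objects.popleft()
            self.write_ptr = 0
            self.flush_ptr = 0
            self.filling = False
        # Advance the flush pointer one object at a time until the
        # gap between the pointers accommodates the new object.
        if not self.filling:
            while self.objects and self.flush_ptr - self.write_ptr < size:
                self._flush_one()
        offset = self.write_ptr
        self.write_ptr += size
        self.objects.append((offset, size))
        return offset
```

Because the flush pointer only ever advances far enough to admit the next object, the gap between the two pointers stays small and the cache remains substantially full, as described above.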
Thus, the present invention provides a method for storing a plurality of objects in a cache memory. First ones of the objects are written into the cache memory sequentially from the beginning of the cache memory in the order in which they are received. When a first memory amount from a most recently stored one of the first objects to the end of the cache
memory is insufficient to accommodate a new object, the new object is written from the beginning of the cache memory, thereby writing over a previously stored one of the first objects. Second ones of the objects are then written into the cache memory sequentially following the new object in the order in which they are received, thereby writing over the first ones of the objects. This cycle is repeated, thereby maintaining a substantially full cache memory.
In a specific embodiment, the invention provides a method for storing a first object in a cache memory using a write pointer to indicate a most recently allocated memory portion and a flush pointer to indicate a most recently flushed object. The size of the first object is first determined. Then, a first memory amount available for writing the first object is determined with reference to the write and flush pointers. Where the first memory amount is insufficient to accommodate the first object, the flush pointer is moved to the end of a next flushed object immediately following the most recently flushed object, the next flushed object then becoming the most recently flushed object. The first memory amount is determined and the flush pointer is moved in this manner until the first memory amount is sufficient to accommodate the first object. Where the first memory amount is sufficient to accommodate the first object, the write pointer is moved to allocate a second memory amount corresponding to a first portion of the cache memory, the second memory amount being smaller than the first memory amount and also being sufficient to accommodate the first object. The first object is then written in the first portion of the cache memory.
A further understanding of the nature and advantages of the present invention may be realized by reference to the remaining portions of the specification and the drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a diagram of a hardware environment in which a specific embodiment of the invention may be employed;
FIG. 2 is a diagram of a caching engine for use with a specific embodiment of the present invention;
FIGS. 3a-3h illustrate operation of a caching file system according to a specific embodiment of the present invention;
FIG. 4 is a flowchart illustrating the creation of a file according to a specific embodiment of the invention;
FIG. 5 is a flowchart illustrating the writing of a file according to a specific embodiment of the invention;
FIG. 6 is a flowchart illustrating the reading of a file according to a specific embodiment of the invention;
FIG. 7 is a flowchart illustrating the opening of a file according to a specific embodiment of the invention;
FIG. 8 is a flowchart illustrating the closing of a file according to a specific embodiment of the invention;
FIG. 9 is a flowchart illustrating the deleting of a file according to a specific embodiment of the invention; and
FIG. 10 is a flowchart illustrating a stat function according to a specific embodiment of the invention.
DESCRIPTION OF THE PREFERRED EMBODIMENT
FIG. 1 shows an example of a hardware environment in which the present invention may be implemented. A client platform 100 is connected to router 102 which is connected via network 104 to destination platform 106. It will be assumed for the purposes of this discussion that client platform 100 is a single personal computer, that router 102
connects platform 100 to the Internet, i.e., network 104, and that destination platform 106 is a server on the World Wide Web. It should be noted, however, that a variety of configurations similar to this simple model may be employed without departing from the scope of the invention. For example, client platform 100 could be a personal computer or workstation which is part of a local or wide area network. Router 102 could be an internal router in such a network (e.g., an intranet connection to an internal web page), the network's general gateway to the Internet, a direct connection to destination platform 106, or some intermediate platform between the network and destination platform 106. The connection between router 102 and client platform 100 could include several intervening routers. Network 104 could represent a local or wide area network which includes client platform 100 and router 102, or the Internet. Destination platform 106 could be part of the local or wide area network, or a remote server on the Internet. Caching engines 108 and 110 are connected to router 102. Additional router 112 is connected to router 102 and has an additional caching engine 114 connected thereto.
Initially, client platform 100 transmits a request to retrieve data such as, for example, a multimedia object from destination platform 106. Cache-enabled router 102 receives the request in the form of at least one data packet. Router 102 reads the packet header to determine whether, for example, it is a TCP packet and indicates port 80 as its destination port. If the packet is of a different protocol or is not destined for the World Wide Web, the packet is simply passed through the router and routed according to standard Internet protocols.
If, on the other hand, the packet is TCP and port 80 is specified, router 102 determines to which of its associated caching engines (108 and 110) it will redirect the packet based on the destination IP address specified in the packet. Before sending the packet to one of its associated caching engines, router 102 encapsulates the packet for transmission to the selected caching engine by adding another TCP/IP header which designates the router as the source of the packet and the caching engine as the destination. That is, the router encapsulates the packet for transmission to a caching engine which might be several “hops” away. So, for example, router 102 might encapsulate the packet for transmission to caching engine 114 which is connected to router 102 via router 112. Thus, not only may multiple caching engines be associated with a particular router, but multiple routers may be supported by an individual caching engine or a group of caching engines. This allows a tremendous amount of flexibility in where the caching engine and router need to be in relation to each other.
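The redirect decision described above can be illustrated with a short sketch. The packet representation, the engine-selection hash (a CRC here), and the encapsulation fields are all assumptions for illustration; the patent specifies only the port-80 test, the address-based engine selection, and the added outer header:

```python
import zlib

def classify_and_redirect(packet, caching_engines, router_addr="router-102"):
    """Decide whether to pass a packet through or redirect it to a
    caching engine (illustrative sketch; field names are assumed)."""
    # Non-TCP traffic, or traffic not destined for port 80 (the Web),
    # is routed according to standard Internet protocols.
    if packet["protocol"] != "TCP" or packet["dst_port"] != 80:
        return ("route_normally", packet)
    # Choose an associated engine based on the destination IP address.
    index = zlib.crc32(packet["dst_ip"].encode()) % len(caching_engines)
    engine = caching_engines[index]
    # Encapsulate: an outer header names the router as the source and
    # the selected engine (possibly several hops away) as destination.
    outer = {"outer_src": router_addr, "outer_dst": engine, "inner": packet}
    return ("redirect", outer)
```

Because the outer header fully addresses the chosen engine, the same logic works whether the engine is directly attached to the router or reached through intermediate routers such as router 112.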
Router 102 opens a TCP connection between the client and the selected caching engine and transmits the encapsulated packet to the caching engine. The caching engine determines if it has the requested object stored locally by comparing the packet URL to its directory. If the object is not in the cache, the caching engine makes its own request for the object (using its own address as the source IP address) to destination platform 106 via router 102. That is, router 102 establishes a TCP connection between the caching engine and destination platform 106. The router sees that the new request is from the caching engine (by looking at the source address) and thereby knows not to redirect the packet to the caching engine. This request and the subsequent retrieval of the object from destination platform 106 is done according to standard TCP/IP protocols. The retrieved object is then placed in the memory of the caching engine and transmitted to client platform 100. If, on the other hand, the
object is determined to be locally stored in the caching engine, it is transmitted to client platform 100.
It will be understood that the above-described caching system may be implemented using any of a variety of file systems for the caching engines. However, as discussed above, a file system specifically designed with regard to the requirements of such a caching engine will result in greater overall system efficiency. Such a file system will now be described with reference to FIGS. 2-10.
FIG. 2 is a diagram of a caching engine 200 for use with a specific embodiment of the file system of the present invention. Caching engine 200 includes three SCSI drives 202, 204 and 206. According to a specific embodiment, the shaded portions of the drives correspond to the memory actually being used for the caching function, i.e., for the storing of cache data. This allows for the other portions of the drives to be used for other purposes. Typically, however, all of each of the drives would be used for the storage of cache objects. Upon initialization of the file system a disk drive table is created which lets the file system know the number of drives, the size of the drives, and the locations of the data. The disk drive table specifies each drive name, the starting location of the data on the drive, and the length of the data.
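As a sketch, the disk drive table might be represented as follows. The field and function names are assumptions; the text specifies only what the table records:

```python
from dataclasses import dataclass

@dataclass
class DriveEntry:
    """One row of the disk drive table created at initialization."""
    name: str          # drive name
    data_start: int    # starting location of the cache data on the drive
    data_length: int   # length of the cache data region

def build_drive_table(drives):
    """drives: iterable of (name, data_start, data_length) tuples,
    e.g. one per SCSI drive of caching engine 200."""
    return [DriveEntry(*d) for d in drives]
```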
According to a specific embodiment, the directory structure of the present invention for each of the disk drives is stored in the caching server's volatile memory, e.g., the server's RAM. However, because the file system of the present invention is intended to store potentially tens of millions of objects, it is untenable to store the full names of all of these objects in the volatile memory because the directory structure would quickly consume all of the volatile memory. Instead, an MD5 hash is kept in volatile memory which correlates the object names with memory offsets. Certain advantages are realized as a result of the fact that the directory structure is stored in volatile memory. For example, the file system is less vulnerable to corruption from hardware failures because, as discussed below, the directory, i.e., the MD5 hash, may easily be rebuilt from information in the disk drives when the system reboots. Also, because the directory is in volatile memory rather than in the disk drive, access to the file system is extremely fast.
The offset points to an Inode at the beginning of an associated object on the disk drive. The term “Inode” (or index node) as used herein describes a fixed-size data field at the beginning of each object in the cache which is analogous to a similarly named feature in the Unix environment. The Inode of the present invention is an abstraction of the data in the associated file which provides specific information about that file including, for example, the file name, its memory offset, and the current write pointer location. The Inode has a fixed length because it stores only the first 256 bytes of the file name. And while the full file name is not provided in the Inode, the odds are extremely low that a false positive match between the requested file name and the 256-byte name field in the Inode will occur. When a directory look-up is performed for a particular file name and the name in the referenced Inode does not match the file name, the file is assumed not to be in the cache. Each directory entry is very compact, e.g., a 120-bit hash and a 31-bit offset value. The specific size of the directory entry depends, of course, on the hash algorithm being used. The directory look-up employs a binary search, which is an efficient search algorithm. Moreover, the binary search is performed primarily in volatile memory (except for retrieving the Inode). Therefore, the directory look-up is extremely fast.
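The look-up path described above can be sketched as follows. A Python dict stands in for the in-memory hash table and for the disk, and the helper names are illustrative; the actual embodiment stores a compact hash and offset per entry and searches with a binary search:

```python
import hashlib

INODE_NAME_BYTES = 256   # fixed-size name field stored in each Inode

def directory_key(name):
    # The in-memory directory correlates a digest of the object name
    # (MD5 in the described embodiment) with a memory offset.
    return hashlib.md5(name.encode()).digest()

def lookup(directory, disk_inodes, name):
    """Return the object's offset, or None if it is assumed absent.
    directory: digest -> offset; disk_inodes: offset -> Inode dict."""
    offset = directory.get(directory_key(name))
    if offset is None:
        return None
    inode = disk_inodes[offset]          # the one disk access needed
    # Guard against a false positive: the requested name must match
    # the first 256 bytes stored in the Inode's name field.
    if inode["name"] != name.encode()[:INODE_NAME_BYTES]:
        return None
    return offset
```

Note that the full-name comparison is bounded by the 256-byte field, so a match confirms only the truncated name, which is the false-positive trade-off the text describes.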
When an object is written to the disk, the amount of space is determined and then the Inode information is filled in. This information includes a variety of meta-information about the object including the first 256 bytes of the object name, the creation date, and the size of the object. The Inode reference is inserted into the directory and the object is subsequently written to the disk with a sequential write as the data comes in. File creation is protected by a mutex until the entry is inserted in the directory to avoid the possibility of some other transaction thread coming in and trying to create the same file. However, once the directory entry has been inserted and the object data has begun to be written, any subsequent attempt to create will realize the file has already been created and perform a simple read instead. If the data is still being written, the read will come in right behind and read the data as it is written.
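The mutex-protected create can be sketched as below. The in-memory structures and return values are illustrative assumptions; only the locking discipline follows the text: hold the mutex just until the directory reference is inserted, and let any later creator fall back to a read:

```python
import threading

_create_mutex = threading.Lock()   # protects file creation

def create_object(directory, disk, name, data):
    """Sketch of the mutex-protected create. directory: name -> offset;
    disk: append-only list standing in for sequential disk writes."""
    with _create_mutex:
        if name in directory:
            # Another thread created the file first: fall back to a read.
            return ("read", directory[name])
        inode = {"name": name[:256], "size": len(data)}  # meta-information
        offset = len(disk)
        directory[name] = offset     # insert the reference, then release
        disk.append((inode, None))   # Inode written; data follows
    # The object data itself is written sequentially outside the mutex;
    # a concurrent reader may follow right behind the writer.
    disk[offset] = (inode, data)
    return ("created", offset)
```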
In a more specific embodiment, the disk drive(s) are partitioned into several partitions, each having an MD5 hash associated with it. That is, each partition has its own directory structure, disk pointers, etc. This is useful where the total disk capacity is so large that one MD5 hash would be too large to handle efficiently. In such an embodiment, a 32-bit CRC hash correlating a file name to a particular partition may be used to identify in which MD5 hash to perform a directory look-up for the file name. In a still more specific embodiment, each partition is further divided into smaller partitions and the offset associated with the file name in the MD5 hash also indicates in which of the smaller partitions the file can be found. Alternatively, each of the smaller partitions could have a smaller MD5 hash associated with it which is referenced by the larger MD5 hash associated with the larger partition. Not only are such smaller hashes more manageable, but if one partition gets corrupted, the rest of the file system is isolated from the corruption. That is, only a relatively small portion of the total cache capacity needs to be rebuilt.
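A sketch of the two-level scheme (a 32-bit CRC choosing the partition, an MD5 table per partition) follows; the data structures and helper names are illustrative assumptions:

```python
import hashlib
import zlib

def pick_partition(name, num_partitions):
    # A 32-bit CRC of the file name selects the partition whose own
    # MD5 directory is then searched.
    return zlib.crc32(name.encode()) % num_partitions

def partitioned_insert(partitions, name, offset):
    """partitions: one digest -> offset directory per partition."""
    directory = partitions[pick_partition(name, len(partitions))]
    directory[hashlib.md5(name.encode()).digest()] = offset

def partitioned_lookup(partitions, name):
    directory = partitions[pick_partition(name, len(partitions))]
    return directory.get(hashlib.md5(name.encode()).digest())
```

Since every entry for a given file name lives in exactly one partition's directory, corruption of one partition requires rebuilding only that partition's table, matching the isolation property noted above.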
It will be understood that, notwithstanding the references to specific types of hashes, a wide variety of hash algorithms may be employed to implement various aspects of the present invention. For example, the following hash algorithms may also be employed: 8-bit, 16-bit, and 32-bit CRCs; MD4; MD5; the secure hash standard (SHS) being developed by the U.S. Department of Commerce; etc. Thus, a wide variety of hash algorithms may be employed as long as the requisite functionality is supported.
According to another specific embodiment, the directory structure of the present invention is stored in a separate IDE drive 208. The single-level directory structure is essentially a table of offsets and identifiers, e.g., a B-tree, identifying the names and locations of data objects stored in SCSI drives 202, 204 and 206. Although the IDE drive is slower than the SCSI drives, it is less costly. In addition, given the small amount of information transferred, the penalty for access to the directory structure can be tolerated.
FIGS. 3a-3h illustrate operation of a caching file system according to a specific embodiment of the present invention. This embodiment is described with reference to a single memory drive 300. However, it will be understood that the following description may easily be generalized to include the combined SCSI drives of FIG. 2, or a variety of other memory configurations. That is, drive 300 is shown as a single entity for illustrative purposes only. Drive 300 has a disk pointer file 302 stored at the beginning of the drive which includes a minimum location and a maximum location between which data are stored, a checksum (to check for data corruption), a write pointer location (represented by arrow 304), and a flush pointer location (represented by
arrow 306). Initially, i.e., when nothing has been written to drive 300, write pointer 304 and flush pointer 306 are at the same location. As will become clear, this is the only time at which both pointers are at the same location.
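The contents of disk pointer file 302 can be collected into a short sketch; the field names are assumptions, since the text lists only what the file records:

```python
from dataclasses import dataclass

@dataclass
class DiskPointerFile:
    """Sketch of disk pointer file 302, stored at the start of the drive."""
    min_location: int   # minimum location between which data are stored
    max_location: int   # maximum location between which data are stored
    checksum: int       # used to check for data corruption
    write_ptr: int      # write pointer location (arrow 304)
    flush_ptr: int      # flush pointer location (arrow 306)

def never_written(p):
    # The two pointers coincide only before anything has been
    # written to the drive.
    return p.write_ptr == p.flush_ptr
```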
When the first object is written to drive 300, i.e., file 310, write pointer 304 is moved to the end of file 310, at which location a new file may be written (FIG. 3b). Write pointer 304 is moved to the end of each newly created file in this manner until the drive is substantially full and write pointer 304 nears the end 312 of drive 300 as shown in FIG. 3c. If, at this point, a new file is to be created which is larger than the amount of memory between the current location of write pointer 304 (at the end of file 314) and the end of the drive, write pointer 304 is wrapped around to the beginning of drive 300 as shown in FIG. 3d. This is done because the sequential reading of objects is extremely efficient and becomes less so when files are fragmented. Moreover, the drive is so large that the number of times this occurs as compared to the number of creates is very small. Finally, unless you are very unlucky, the amount of memory left open at the end of the disk will typically be very small. It is important to note that, before write pointer 304 is wrapped around in the manner described above, flush pointer 306 is moved to the end of the first file in the drive, i.e., file 310, so that it remains ahead of write pointer 304 for future write operations (see FIG. 3d). Thus, once drive 300 is full, flush pointer 306 remains just ahead of write pointer 304 for the remainder of the operation of the file system.
If the new file to be created will not fit between write
pointer