`Leighton et al.
`
`54
`(75)
`
`GLOBAL, HOSTING SYSTEM
`
`Inventors: F. Thomson Leighton, Newtonville;
`Daniel M. Lewin, Cambridge, both of
`Mass.
`
`Assignee: Massachusetts Institute of
`Technology, Cambridge, Mass.
`
`Appl. No.: 09/314,863
`Filed:
`May 19, 1999
`Related U.S. Application Data
`Provisional application No. 60/092,710, Jul. 14, 1998.
`Int. Cl. ............................................... G06F 13/00
`U.S. Cl. .......................... 709/226; 709/105; 709/219;
`709/223; 709/224; 709/235
`Field of Search .................................. 707/10, 2, 104,
`707/203, 500, 501, 511, 512, 513,515;
`709/200, 201, 203, 218, 219, 230, 235,
`238, 245, 246, 226, 224, 105, 220; 711/118,
`119, 120, 122, 126, 130, 200, 202, 216
`
`56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`4,922,417 5/1990 Churm et al. ............................... 707/1
`5,287.499 2/1994
`... 707/2
`5,341,477 8/1994 Pitkin et al. ............................ 709/226
`5,542,087 7/1996 Neimat et al. ............................ 707/10
`5,638,443
`6/1997 Stefik et al. ......
`... 705/54
`5,646,676 7/1997 Dewkett et al. ............................ 348/7
`5,715,453 2/1998 Stewart .............
`... 707/104
`5,740,423 4/1998 Logan et al. .............................. 707/10
`5,751,961
`5/1998 Smyk ........
`... 709/217
`5,761,507 12/1999 Govett ...........
`... 395/684
`5,774,660 6/1998 Brendel et al.
`... 709/201
`5,777,989 7/1998 McGarvey ........
`... 370/254
`5,802,291
`9/1998 Balick et al. .
`... 709/202
`5,832,506 11/1998 Kuzma ..............
`... 707/200
`5,856,974
`1/1999 Gervais et al. ...
`... 370/392
`5,870,559 2/1999 Leshem et al. ...
`... 709/224
`5,878.212 3/1999 Civanlar et al. ..
`... 709/203
`5,884,038 3/1999 Kapoor ......
`... 709/226
`5,903,723 5/1999 Becket al. ...
`... 709/200
`5,919,247 12/1999 Van Hoff et al. ....................... 709/217
`
`USOO61087.03A
`Patent Number:
`11
`(45) Date of Patent:
`
`6,108,703
`Aug. 22, 2000
`
`5,933,832 8/1999 Suzuoka et al. ........................ 707/101
`5.945,989 8/1999 Freishtat et al.
`345/329
`5.956,716 9/1999 Kenner et al. ..
`... 707/10
`5,961,596 10/1999 Takubo et al. .......................... 709/224
`5.991,809 11/1999 Kriegsman .............................. 709/226
`6,003,030 12/1999 Kenner et al. ..
`... 707/10
`6,006,264 12/1999 Colby et al. ............................ 709/226
`FOREIGN PATENT DOCUMENTS
`
`
`
`22O2572 10/1998 Canada.
`865180A2 9/1998 European Pat. Off..
`9804985 2/1998 WIPO.
`
`OTHER PUBLICATIONS
`Shaw, David M. “A Low Latency, High Throughput Web
`Service Using Internet-wide Replication.” Department of
`Computer Science, Johns Hopkins University, Aug. 1998, 33
`pgS.
`
`(List continued on next page.)
`Primary Examiner Dung C. Dinh
`ASSistant Examiner Abdullahi E. Salad
`Attorney, Agent, or Firm-David H. Judson
`57
`ABSTRACT
`The present invention is a network architecture or frame
`work that Supports hosting and content distribution on a
`truly global scale. The inventive framework allows a Con
`tent Provider to replicate and Serve its most popular content
`at an unlimited number of points throughout the World. The
`inventive framework comprises a set of ServerS operating in
`a distributed manner. The actual content to be served is
`preferably Supported on a set of hosting Servers (Sometimes
`referred to as ghost servers). This content comprises HTML
`page objects that, conventionally, are served from a Content
`Provider site. In accordance with the invention, however, a
`base HTML document portion of a Web page is served from
`the Content Provider's site while one or more embedded
`objects for the page are served from the hosting Servers,
`preferably, those hosting Servers near the client machine. By
`serving the base HTML document from the Content Pro
`vider's site, the Content Provider maintains control over the
`COntent.
`
`34 Claims, 2 Drawing Sheets
`
`
`
`CLIENT
`
`CONTENT
`PROVIDER
`SITE
`
`E
`
`-1-
`
`Amazon v. Audio Pod
`US Patent 10,805,111
`Amazon EX-1032
`
`
`
`6,108,703
`Page 2
`
`OTHER PUBLICATIONS
`Amir, Yair, et al. “Seamlessly Selecting the Best Copy from
`Internet-Wide Replicated Web Servers.” Department of
`Computer Science, Johns Hopkins University, Jun. 1998, 14
`pgS.
`Bestavros, Azer. “Speculative Data Dissemination and Ser
`vice to Reduce Server Load, Network Traffic and Service
`Time in Distributed Information Systems.” In Proceedings
`of ICDE '96. The 1996 International Conference on Data
`Engineering, Mar. 1996, 4pgs.
`Carter, J. Lawrence, et al. “Universal Classes of Hash
`Function.” Journal of Computer and System Sciences, vol.
`18, No. 2, Apr. 1979, pp. 143–154.
`Chankhunthod, Anawat, et al. “A Hierarchical Internet
`Object Cache.” In Usenix Proceedings, Jan. 1996, pgs.
`153-163.
`Cormen, Thomas H., et al. Introduction to Algorithms, The
`MIT Press, Cambrdige, Massachusetts, 1994, pgs. 219–243,
`991-993.
`Deering, Stephen, et al. “Multicast Routing in Datagram
`Internetworks and Extended LANs.” ACM Transactions On
`Computer Systems, vol. 8, No. 2, May 1990, pgs. 85-110.
`Devine, Robert. “Design and Implementation of DDH: A
`Distributed Dynamic Hashing Algorithm.” In Proceedings
`of 4" International Conference on Foundations of Data
`Organizations and Algorithms, 1993, pgs. 101-114.
`Grigni, Michelangelo, et al. “Tight Bounds on Minimum
`Broadcasts Networks.” SIAM Journal of Discrete Math
`ematics, vol. 4, No. 2, May 1991, pgs. 207-222.
`Gwertzman, James, et al. “The Case for Geographical Push
`Caching.” Technical Report HU TR 34–94(excerpt), Har
`vard University, DAS, Cambridge, MA 02138, 1994, 2 pgs.
`Gwertzman, James, et al. “World-Wide Web Cache Consis
`tency.” In Proceedings of the 1996 USENIX Technical
`Conference, Jan. 1996, 8 pgs.
`Feeley, Michael, et al. “Implementing Global Memory Man
`agement in a Workstation Cluster.” In Proceedings of the
`15th ACM Symposium on Operating Systems Principles,
`1995, pgs. 201-212.
`Floyd, Sally, et al. “A Reliable Multicast Framework for
`Light-Weight Sessions and Application Level Framing.” In
`Proceeding of ACM SIGCOMM'95, pgs. 342–356.
`Fredman, Michael, et al. “Storing a Sparse Table with 0(1)
`Worst Case Access Time.” Journal of the Association for
`Computing Machinery, vol. 31., No. 3, Jul. 1984, pgs.
`538-544.
`Karger, David, et al. “Consistent Hashing and Random
`Trees: Distributed Caching Protocols for Relieving Hot
`Spots on the World Wide Web.” In Proceedings of the
`Twenty-Ninth Annual ACM Symposium On Theory of Com
`puting, May 1997, pgs. 654-663.
`Litwin, Withold, et al. “LH-A Scaleable, Distributed Data
`Structure.” ACM Transactions on Database Systems, vol.
`21, No. 4, Dec. 1996, pgs. 480-525.
`
`Malpani, Radhika, et al. “Making World WideWeb Caching
`Servers Cooperate.” In Proceedings of World Wide Web
`Conference, 1996, 6 pgs.
`Naor, Moni, et al. “The Load, Capacity and Availability of
`Quorum Systems.” In Proceedings of the 35th IEEE Sym
`posium on Foundations of Computer Science, Nov. 1994,
`pgS. 214-225.
`Nisan, Noam. “Psuedorandom Generators for Space
`Bounded Computation.” In Proceedings of the Twenty
`Second Annual ACM Symposium On Theory of Computing,
`May 1990, pgs. 204–212.
`Palmer, Mark, et al. "Fido: A Cache that Learns to Fetch.”
`In Proceedings of the 17th International Conference on Very
`Large Data Bases, Sep. 1991, pgs. 255-264.
`Panigraphy, Rina. Relieving Hot Spots on the World Wide
`Web. Massachusetts Institute of Technology, Jun. 1997, pgs.
`1-66.
`Peleg, David, et al. “The Availability of Quorum Systems.”
`Information and Computation 123, 1995, 210-223.
`Plaxton, C. Greg, et al. “Fast Fault-Tolerant Concurrent
`Access to Shared Objects.” In Proceedings of 37th IEEE
`Symposium On Foundations of Computer Science, 1996, pgs.
`570-579.
`Rabin, Michael. “Efficient Dispersal of Information for
`Security, Load Balancing, and Fault Tolerance.” Journal of
`the ACM, vol. 36, No. 2, Apr. 1989, pgs. 335-348.
`Ravi, R., “Rapid Rumor Ramification: Approximating the
`Miniumum Broadcast Time.” In Proceedings of the 35th
`IEEE Symposium On Foundations of Computer Science,
`Nov. 1994, pgs. 202-213.
`Schmidt, Jeanette, et al. “Chernoff Hoeffding Bounds for
`Applications with Limited Independence.” In Proceedings
`of the 4th ACS SIAM Symposium on Discrete Algorithms,
`1993, pgs. 331-340.
`Tarjan, Robert Endre, et al. “Storing a Sparse Table.”
`Communications of the ACM, vol. 22, No. 11, Nov. 1979,
`pgs. 606–611.
`Vitter, Jeffrey Scott, et al. “Optimal Prefetching via Data
`Compression.” In Proceedings of the 32nd IEEE Symposium
`On Foundations of Computer Science, Nov. 1991, pgs.
`121-130.
`Wegman, Mark, et al. “New Hash Functions and Their Use
`in Authentication and Set Equality.” Journal of Computer
`and System Sciences vol. 22, Jun. 1981, pgs. 265-279.
`Yao, Andrew Chi-Chih. “Should Tables be Sorted?” Journal
`of the Association for Computing Machinery, vol. 28, No. 3,
`Jul. 1981, pgs. 615–628.
`Beavan, Colin “Web Life They're Watching You.” Esquire,
`Aug. 1997, pgs. 104-105.
`Beavan, Colin “Web Life They're Watching You.” Esquire,
`Aug. 1997, pp. 104-105.
`
`-2-
`
`
`
`U.S. Patent
`
`Aug. 22, 2000
`
`Sheet 1 of 2
`
`6,108,703
`
`CLIENT
`
`14
`
`SERVER
`
`
`
`BROWSER
`APPLICATION
`
`30
`
`F 28
`
`FIG. 1
`
`OS
`WEB SERVER
`
`AP
`
`22
`
`26
`
`FIC. 2
`
`30
`
`CONTENT
`PROVIDER
`SITE
`
`50
`30
`
`30
`
`
`
`
`
`CLENT
`
`-3-
`
`
`
`U.S. Patent
`
`Aug. 22, 2000
`
`Sheet 2 of 2
`
`6,108,703
`
`FIC. 4
`
`PREPEND VIRTUAL
`SERVER HOST NAME
`
`
`
`
`
`HASH WA
`
`-HASH VALUE INPUT
`54
`
`H 1 ASH VALUE INPUT
`
`PREPEND HASH
`VALUE AS FINGERPRINT
`
`56
`
`
`
`CONTENT
`PROVIDER
`
`
`
`"TOP LEVEL'
`DNS SERVER
`
`
`
`
`
`
`
`"LOW LEVEL'
`DNS SERVER
`
`-4-
`
`
`
`1
`GLOBAL, HOSTING SYSTEM
`
`6,108,703
`
`2
`mirroring requires Content Providers to waste economic and
`other resources on functions that are not relevant to their
`core business of creating content.
`Moreover, Content Providers also desire to retain control
`of their content. Today, Some ISPs are installing caching
`hardware that interrupts the link between the Content Pro
`vider and the end-user. The effect of Such caching can
`produce devastating results to the Content Provider, includ
`ing (1) preventing the Content Provider from obtaining
`accurate hit counts on its Web pages (thereby decreasing
`revenue from advertisers), (2) preventing the Content Pro
`vider from tailoring content and advertising to specific
`audiences (which severely limits the effectiveness of the
`Content Provider's Web page), and (3) providing outdated
`information to its customers (which can lead to a frustrated
`and angry end user).
`There remains a Significant need in the art to provide a
`decentralized hosting Solution that enables users to obtain
`Internet content on a more efficient basis (i.e., without
`burdening network resources unnecessarily) and that like
`wise enables the Content Provider to maintain control over
`its content.
`The present invention solves these and other problems
`asSociated with the prior art.
`BRIEF SUMMARY OF THE INVENTION
`It is a general object of the present invention to provide a
`computer network comprising a large number of widely
`deployed Internet Servers that form an organic, massively
`fault-tolerant infrastructure designed to serve Web content
`efficiently, effectively, and reliably to end users.
`Another more general object of the present invention is to
`provide a fundamentally new and better method to distribute
`Web-based content. The inventive architecture provides a
`method for intelligently routing and replicating content over
`a large network of distributed servers, preferably with no
`centralized control.
`Another object of the present invention is to provide a
`network architecture that moves content close to the user.
`The inventive architecture allows Web sites to develop large
`audiences without worrying about building a massive infra
`Structure to handle the associated traffic.
`Still another object of the present invention is to provide
`a fault-tolerant network for distributing Web content. The
`network architecture is used to Speed-up the delivery of
`richer Web pages, and it allows Content Providers with large
`audiences to Serve them reliably and economically, prefer
`ably from Servers located close to end users.
`A further feature of the present invention is the ability to
`distribute and manage content over a large network without
`disrupting the Content Provider's direct relationship with the
`end user.
`Yet another feature of the present invention is to provide
`a distributed Scalable infrastructure for the Internet that
`shifts the burden of Web content distribution from the
`Content Provider to a network of preferably hundreds of
`hosting Servers deployed, for example, on a global basis.
`In general, the present invention is a network architecture
`that Supports hosting on a truly global Scale. The inventive
`framework allows a Content Provider to replicate its most
`popular content at an unlimited number of points throughout
`the World. AS an additional feature, the actual content that is
`replicated at any one geographic location is specifically
`tailored to viewers in that location. Moreover, content is
`automatically Sent to the location where it is requested,
`without any effort or overhead on the part of a Content
`Provider.
`
`This application is based on Provisional Application No.
`60/092,710, filed Jul 14, 1998. This application includes
`Subject matter protected by copyright.
`BACKGROUND OF THE INVENTION
`1. Technical Field
`This invention relates generally to information retrieval in
`a computer network. More particularly, the invention relates
`to a novel method of hosting and distributing content on the
`Internet that addresses the problems of Internet Service
`Providers (ISPs) and Internet Content Providers.
`2. Description of the Related Art
`The World Wide Web is the Internet's multimedia infor
`mation retrieval system. In the Web environment, client
`machines effect transactions to Web Servers using the Hyper
`text Transfer Protocol (HTTP), which is a known application
`protocol providing users access to files (e.g., text, graphics,
`images, Sound, Video, etc.) using a standard page description
`language known as Hypertext Markup Language (HTML).
`HTML provides basic document formatting and allows the
`developer to specify “links” to other servers and files. In the
`Internet paradigm, a network path to a server is identified by
`a so-called Uniform Resource Locator (URL) having a
`Special Syntax for defining a network connection. Use of an
`HTML-compatible browser (e.g., Netscape Navigator or
`Microsoft Internet Explorer) at a client machine involves
`Specification of a link via the URL. In response, the client
`makes a request to the Server identified in the link and, in
`return, receives a document or other object formatted
`according to HTML. A collection of documents supported
`on a Web server is sometimes referred to as a Web site.
`It is well known in the prior art for a Web site to mirror
`its content at another Server. Indeed, at present, the only
`method for a Content Provider to place its content closer to
`its readers is to build copies of its Web site on machines that
`are located at Web hosting farms in different locations
`domestically and internationally. These copies of Web sites
`are known as mirror Sites. Unfortunately, mirror Sites place
`unnecessary economic and operational burdens on Content
`Providers, and they do not offer economies of scale.
`Economically, the overall cost to a Content Provider with
`one primary Site and one mirror Site is more than twice the
`cost of a single primary Site. This additional cost is the result
`of two factors: (1) the Content Provider must contract with
`a separate hosting facility for each mirror site, and (2) the
`Content Provider must incur additional overhead expenses
`asSociated with keeping the mirror Sites Synchronized.
`In an effort to address problems associated with mirroring,
`companies Such as Cisco, Resonate, Bright Tiger, F5 Labs
`and Alteon, are developing Software and hardware that will
`help keep mirror Sites Synchronized and load balanced.
`Although these mechanisms are helpful to the Content
`Provider, they fail to address the underlying problem of
`scalability. Even if a Content Provider is willing to incur the
`costs associated with mirroring, the technology itself will
`not scale beyond a few (i.e., less than 10) Web sites.
`In addition to these economic and Scalability issues,
`mirroring also entails operational difficulties. A Content
`Provider that uses a mirror Site must not only lease and
`manage physical Space in distant locations, but it must also
`buy and maintain the Software or hardware that Synchronizes
`and load balances the Sites. Current Solutions require Con
`tent Providers to Supply perSonnel, technology and other
`items necessary to maintain multiple Web Sites. In Summary,
`
`15
`
`25
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`-5-
`
`
`
`3
`It is thus a more general object of this invention to provide
`a global hosting framework to enable Content Providers to
`retain control of their content.
`The hosting framework of the present invention com
`prises a set of ServerS operating in a distributed manner. The
`actual content to be served is preferably Supported on a Set
`of hosting servers (sometimes referred to as ghost servers).
`This content comprises HTML page objects that,
`conventionally, are served from a Content Provider site. In
`accordance with the invention, however, a base HTML
`1O
`document portion of a Web page is served from the Content
`Provider's site while one or more embedded objects for the
`page are Served from the hosting Servers, preferably, those
`hosting Servers nearest the client machine. By Serving the
`base HTML document from the Content Provider's site, the
`Content Provider maintains control over the content.
`The determination of which hosting server to use to serve
`a given embedded object is effected by other resources in the
`hosting framework. In particular, the framework includes a
`Second set of Servers (or server resources) that are config
`ured to provide top level Domain Name Service (DNS). In
`addition, the framework also includes a third Set of Servers
`(or server resources) that are configured to provide low level
`DNS functionality. When a client machine issues an HTTP
`25
`request to the Web site for a given Web page, the base
`HMTL document is served from the Web site as previously
`noted. Embedded objects for the page preferably are Served
`from particular hosting Servers identified by the top- and
`low-level DNS servers. To locate the appropriate hosting
`servers to use, the top-level DNS server determines the
`user's location in the network to identify a given low-level
`DNS server to respond to the request for the embedded
`object. The top-level DNS server then redirects the request
`to the identified low-level DNS server that, in turn, resolves
`the request into an IP address for the given hosting Server
`that serves the object back to the client.
`More generally, it is possible (and, in Some cases,
`desirable) to have a hierarchy of DNS servers that consisting
`of several levels. The lower one moves in the hierarchy, the
`closer one gets to the best region.
`A further aspect of the invention is a means by which
`content can be distributed and replicated through a collec
`tion of Servers So that the use of memory is optimized
`Subject to the constraints that there are a Sufficient number
`of copies of any object to Satisfy the demand, the copies of
`objects are spread So that no Server becomes overloaded,
`copies tend to be located on the same Servers as time moves
`forward, and copies are located in regions close to the clients
`that are requesting them. Thus, ServerS operating within the
`framework do not keep copies of all of the content database.
`Rather, given Servers keep copies of a minimal amount of
`data So that the entire System provides the required level of
`Service. This aspect of the invention allows the hosting
`Scheme to be far more efficient than Schemes that cache
`everything everywhere, or that cache objects only in pre
`Specified locations.
`The global hosting framework is fault tolerant at each
`level of operation. In particular, the top level DNS server
`returns a list of low-level DNS servers that may be used by
`the client to service the request for the embedded object.
`Likewise, each hosting Server preferably includes a buddy
`Server that is used to assume the hosting responsibilities of
`its associated hosting Server in the event of a failure condi
`tion.
`According to the present invention, load balancing acroSS
`the Set of hosting Servers is achieved in part through a novel
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`6,108,703
`
`15
`
`4
`technique for distributing the embedded object requests. In
`particular, each embedded object URL is preferably modi
`fied by prepending a virtual server hostname into the URL.
`More generally, the Virtual Server hostname is inserted into
`the URL. Preferably, the virtual server hostname includes a
`value (Sometimes referred to as a Serial number) generated
`by applying a given hash function to the URL or by encoding
`given information about the object into the value. This
`function serves to randomly distribute the embedded objects
`over a given set of Virtual Server hostnames. In addition, a
`given fingerprint value for the embedded object is generated
`by applying a given hash function to the embedded object
`itself. This given value Serves as a fingerprint that identifies
`whether the embedded object has been modified. Preferably,
`the functions used to generate the values (i.e., for the virtual
`Server hostname and the fingerprint) are applied to a given
`Web page in an off-line process. Thus, when an HTTP
`request for the page is received, the base HTML document
`is served by the Web site and some portion of the page's
`embedded objects are served from the hosting Servers near
`(although not necessarily the closest) to the client machine
`that initiated the request.
`The foregoing has outlined Some of the more pertinent
`objects and features of the present invention. These objects
`should be construed to be merely illustrative of some of the
`more prominent features and applications of the invention.
`Many other beneficial results can be attained by applying the
`disclosed invention in a different manner or modifying the
`invention as will be described. Accordingly, other objects
`and a fuller understanding of the invention may be had by
`referring to the following Detailed Description of the Pre
`ferred Embodiment.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`For a more complete understanding of the present inven
`tion and the advantages thereof, reference should be made to
`the following Detailed Description taken in connection with
`the accompanying drawings in which:
`FIG. 1 is a representative system in which the present
`invention is implemented;
`FIG. 2 is a simplified representation of a markup language
`document illustrating the base document and a set of embed
`ded objects,
`FIG. 3 is a high level diagram of a global hosting System
`according to the present invention;
`FIG. 4 is a simplified flowchart illustrating a method of
`processing a Web page to modified embedded object URLs
`that is used in the present invention;
`FIG. 5 is a simplified state diagram illustrating how the
`present invention responds to a HTTP request for a Web
`page.
`
`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENT
`A known Internet client-Server System is implemented as
`illustrated in FIG. 1. A client machine 10 is connected to a
`Web server 12 via a network 14. For illustrative purposes,
`network 14 is the Internet, an intranet, an extranet or any
`other known network. Web server 12 is one of a plurality of
`Servers which are accessible by clients, one of which is
`illustrated by machine 10. A representative client machine
`includes a browser 16, which is a known Software tool used
`to access the servers of the network. The Web server
`supports files (collectively referred to as a “Web site) in the
`form of hypertext documents and objects. In the Internet
`
`-6-
`
`
`
`6,108,703
`
`15
`
`35
`
`40
`
`25
`
`S
`paradigm, a network path to a server is identified by a
`so-called Uniform Resource Locator (URL).
`A representative Web Server 12 is a computer comprising
`a processor 18, an operating System 20, and a Web Server
`program 22, Such as Netscape Enterprise Server. The Server
`12 also includes a display Supporting a graphical user
`interface (GUI) for management and administration, and an
`Application Programming Interface (API) that provides
`extensions to enable application developerS to extend and/or
`customize the core functionality thereof through Software
`programs including Common Gateway Interface (CGI)
`programs, plug-ins, Servlets, active Server pages, Server Side
`include (SSI) functions or the like.
`A representative Web client is a personal computer that is
`x86-, PowerPC(R)-or RISC-based, that includes an operating
`system such as IBM(R) OS/26) or Microsoft Windows 95,
`and that includes a Web browser, such as Netscape Navi
`gator 4.0 (or higher), having a Java Virtual Machine (JVM)
`and Support for application plug-ins or helper applications.
`A client may also be a notebook computer, a handheld
`computing device (e.g., a PDA), an Internet appliance, or
`any other Such device connectable to the computer network.
`AS Seen in FIG.2, a typical Web page comprises a markup
`language (e.g. HTML) master or base document 28, and
`many embedded objects (e.g., images, audio, Video, or the
`like) 30. Thus, in a typical page, twenty or more embedded
`images or objects are quite common. Each of these images
`is an independent object in the Web, retrieved (or validated
`for change) separately. The common behavior of a Web
`client, therefore, is to fetch the base HTML document, and
`then immediately fetch the embedded objects, which are
`typically (but not always) located on the same server.
`According to the present invention, preferably the markup
`language base document 28 is served from the Web server
`(i.e., the Content Provider site) whereas a given number (or
`perhaps all) of the embedded objects are served from other
`Servers. AS will be seen, preferably a given embedded object
`is served from a server (other than the Web server itself) that
`is close to the client machine, that is not overloaded, and that
`is most likely to already have a current version of the
`required file.
`Referring now to FIG. 3, this operation is achieved by the
`hosting System of the present invention. AS will be seen, the
`hosting System 35 comprises a set of widely-deployed
`Servers (or Server resources) that form a large, fault-tolerant
`infrastructure designed to serve Web content efficiently,
`effectively, and reliably to end users. The servers may be
`deployed globally, or acroSS any desired geographic regions.
`AS will be seen, the hosting System provides a distributed
`architecture for intelligently routing and replicating Such
`content. To this end, the global hosting System 35 comprises
`three (3) basic types of Servers (or server resources): hosting
`servers (sometimes called ghosts)36, top-level DNS servers
`38, and low-level DNS servers 40. Although not illustrated,
`there may be additional levels in the DNS hierarchy.
`55
`Alternatively, there may be a single DNS level that com
`bines the functionality of the top level and low-level servers.
`In this illustrative embodiment, the inventive framework 35
`is deployed by an Internet Service Provider (ISP), although
`this is not a limitation of the present invention. The ISP or
`ISPs that deploy the inventive global hosting framework 35
`preferably have a large number of machines that run both the
`ghost server component 36 and the low-level DNS compo
`nent 40 on their networks. These machines are distributed
`throughout the network, preferably, they are concentrated
`around network exchange points 42 and network acceSS
`points 44, although this is not a requirement. In addition, the
`
`45
`
`50
`
`60
`
`65
`
`6
`ISP preferably has a small number of machines running the
`top-level DNS 38 that may also be distributed throughout
`the network.
`Although not meant to be limiting, preferably a given
`Server used in the framework 35 includes a processor, an
`operating system (e.g., Linux, UNIX, Windows NT, or the
`like), a Web server application, and a set of application
`routines used by the invention. These routines are conve
`niently implemented in Software as a Set of instructions
`executed by the processor to perform various process or
`method steps as will be described in more detail below. The
`Servers are preferably located at the edges of the network
`(e.g., in points of presence, or POPs).
`Several factors may determine where the hosting Servers
`are placed in the network. Thus, for example, the Server
`locations are preferably determined by a demand driven
`network map that allows the provider (e.g., the ISP) to
`monitor traffic requests. By studying traffic patterns, the ISP
`may optimize the Server locations for the given traffic
`profiles.
`According to the present invention, a given Web page
`(comprising a base HTML document and a set of embedded
`objects) is served in a distributed manner. Thus, preferably,
`the base HTML document is served from the Content
`Provider that normally hosts the page. The embedded
`objects, or Some Subset thereof, are preferentially Served
`from the hosting Servers 36 and, Specifically, given hosting
`servers 36 that are near the client machine that in the first
`instance initiated the request for the Web page. In addition,
`preferably loads across the hosting Servers are balanced to
`ensure that a given embedded object may be efficiently
`Served from a given hosting Server near the client when Such
`client requires that object to complete the page.
`To Serve the page contents in this manner, the URL
`asSociated with an embedded object is modified. AS is
`well-known, each embedded object that may be served in a
`page has its own URL. Typically, the URL has a hostname
`identifying the Content Provider's site from where the object
`is conventionally Served, i.e., without reference to the
`present invention. According to the invention, the embedded
`object URL is first modified, preferably in an off-line
`process, to condition the URL to be served by the global
`hosting Servers. A flowchart illustrating the preferred
`method for modifying the object URL is illustrated in FIG.
`4.
`The routine begins at step 50 by determining whether all
`of the embedded objects in a given page have been pro
`cessed. If So, the routine ends. If not, however, the routine
`gets the next embedded object at step 52. At step 54, a virtual
`server hostname is prepended into the URL for the given
`embedded object. The virtual server hostname includes a
`value (e.g., a number) that is generated, for example, by
`applying a given hash function to the URL. AS is well
`known, a hash function takes arbitrary length bit Strings as
`inputs and produces fixed length bit strings (hash values) as
`outputs. Such functions Satisfy two conditions: (1) it is
`infeasible to find two different inputs that produce the same
`hash value, and (2) given an input and its hash value, it is
`infeasible to find a different input with the same hash value.
`In step 54, the URL for the embedded object is hashed into
`a value XX,XXX that is then included in the Virtual Server
`hostname. This Step randomly distributes the object to a
`given virtual Server hostname.
`The present invention is not limited to generating the
`Virtual Server hostname by applying a hash function as
`described above. As an alternative and preferred
`
`-7-
`
`
`
`7
`embodiment, a virtual Server hostname is generated as
`follow S. Consider the representative host name
`a 1234.g.akamaitech.net. The 1234 value, Sometimes
`referred to as a Serial number, preferably includes informa
`tion about the object Such as its size (big or Small), its
`anticipated popularity, the date on which the object was
`created, the identity of the Web site, the type of object (e.g.,
`movie or Static picture), and perhaps Some random bits
`generated by a given random function. Of course, it is not
`required that any given Serial number encode all of Such
`information or even a significant number of Such compo
`nents. Indeed, in the simplest case, the Serial number may be
`a simple integer. In any event, the information is encoded
`into a Serial number in any convenient manner. Thus, for
`example, a first bit is used to denote size, a Second bit is used
`to denote popularity