`Leighton et al.
`
`[54] GLOBAL HOSTING SYSTEM
`[75] Inventors: F. Thomson Leighton, Newtonville;
`Daniel M. Lewin, Cambridge, both of Mass.
`[73] Assignee: Massachusetts Institute of
`Technology, Cambridge, Mass.
`
`[21] Appl. No.: 09/314,863
`[22] Filed: May 19, 1999
`Related U.S. Application Data
`[60] Provisional application No. 60/092,710, Jul. 14, 1998.
`[51] Int. Cl.7: G06F 13/00
`[52] U.S. Cl.: 709/226; 709/105; 709/219;
`709/223; 709/224; 709/235
`[58] Field of Search: 707/10, 2, 104,
`707/203, 500, 501, 511, 512, 513, 515;
`709/200, 201, 203, 218, 219, 230, 235,
`238, 245, 246, 226, 224, 105, 220; 711/118,
`119, 120, 122, 126, 130, 200, 202, 216
`
`US006108703A
`[11] Patent Number: 6,108,703
`[45] Date of Patent: Aug. 22, 2000
`
`[56] References Cited
`U.S. PATENT DOCUMENTS
`4,922,417   5/1990  Churm et al.       707/1
`5,287,499   2/1994  Nemes              707/2
`5,341,477   8/1994  Pitkin et al.      709/226
`5,542,087   7/1996  Neimat et al.      707/10
`5,638,443   6/1997  Stefik et al.      705/54
`5,646,676   7/1997  Dewkett et al.     348/7
`5,715,453   2/1998  Stewart            707/104
`5,740,423   4/1998  Logan et al.       707/10
`5,751,961   5/1998  Smyk               709/217
`5,761,507  12/1999  Govett             395/684
`5,774,660   6/1998  Brendel et al.     709/201
`5,777,989   7/1998  McGarvey           370/254
`5,802,291   9/1998  Balick et al.      709/202
`5,832,506  11/1998  Kuzma              707/200
`5,856,974   1/1999  Gervais et al.     370/392
`5,870,559   2/1999  Leshem et al.      709/224
`5,878,212   3/1999  Civanlar et al.    709/203
`5,884,038   3/1999  Kapoor             709/226
`5,903,723   5/1999  Beck et al.        709/200
`5,919,247  12/1999  Van Hoff et al.    709/217
`5,933,832   8/1999  Suzuoka et al.     707/101
`5,945,989   8/1999  Freishtat et al.   345/329
`5,956,716   9/1999  Kenner et al.      707/10
`5,961,596  10/1999  Takubo et al.      709/224
`5,991,809  11/1999  Kriegsman          709/226
`6,003,030  12/1999  Kenner et al.      707/10
`6,006,264  12/1999  Colby et al.       709/226
`FOREIGN PATENT DOCUMENTS
`2202572    10/1998  Canada
`865180A2    9/1998  European Pat. Off.
`9804985     2/1998  WIPO
`OTHER PUBLICATIONS
`Shaw, David M. "A Low Latency, High Throughput Web
`Service Using Internet-wide Replication." Department of
`Computer Science, Johns Hopkins University, Aug. 1998, 33
`pgs.
`
`(List continued on next page.)
`Primary Examiner: Dung C. Dinh
`Assistant Examiner: Abdullahi E. Salad
`Attorney, Agent, or Firm: David H. Judson
`[57] ABSTRACT
`The present invention is a network architecture or frame-
`work that supports hosting and content distribution on a
`truly global scale. The inventive framework allows a Con-
`tent Provider to replicate and serve its most popular content
`at an unlimited number of points throughout the world. The
`inventive framework comprises a set of servers operating in
`a distributed manner. The actual content to be served is
`preferably supported on a set of hosting servers (sometimes
`referred to as ghost servers). This content comprises HTML
`page objects that, conventionally, are served from a Content
`Provider site. In accordance with the invention, however, a
`base HTML document portion of a Web page is served from
`the Content Provider's site while one or more embedded
`objects for the page are served from the hosting servers,
`preferably, those hosting servers near the client machine. By
`serving the base HTML document from the Content Pro-
`vider's site, the Content Provider maintains control over the
`content.
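The split described in the abstract, between a base HTML document and the embedded objects it references, can be illustrated with a minimal sketch. The sketch assumes that embedded objects are the targets of `src` attributes on `img`, `embed` and `script` tags; that tag list, the class name `EmbeddedObjectCollector` and the function name `list_embedded_objects` are illustrative choices and are not taken from the patent.

```python
from html.parser import HTMLParser


class EmbeddedObjectCollector(HTMLParser):
    """Collects URLs of objects embedded in a base HTML document."""

    # Assumption for this sketch: these tag/attribute pairs denote embedded objects.
    EMBED_ATTRS = {"img": "src", "embed": "src", "script": "src"}

    def __init__(self):
        super().__init__()
        self.embedded = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        wanted = self.EMBED_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.embedded.append(value)


def list_embedded_objects(base_html):
    """Return the embedded object URLs, in document order."""
    collector = EmbeddedObjectCollector()
    collector.feed(base_html)
    collector.close()
    return collector.embedded


page = (
    "<html><body><h1>News</h1>"
    '<img src="http://www.provider.com/a.gif">'
    '<img src="http://www.provider.com/b.gif">'
    "</body></html>"
)
print(list_embedded_objects(page))
```

In terms of the abstract, the string `page` stands for the base document that stays on the Content Provider's site, and the returned URLs are the candidates for serving from hosting servers.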
`
`34 Claims, 2 Drawing Sheets
`
`
`OTHER PUBLICATIONS
`Amir, Yair, et al. "Seamlessly Selecting the Best Copy from
`Internet-Wide Replicated Web Servers." Department of
`Computer Science, Johns Hopkins University, Jun. 1998, 14
`pgs.
`Bestavros, Azer. "Speculative Data Dissemination and Ser-
`vice to Reduce Server Load, Network Traffic and Service
`Time in Distributed Information Systems." In Proceedings
`of ICDE '96: The 1996 International Conference on Data
`Engineering, Mar. 1996, 4 pgs.
`Carter, J. Lawrence, et al. "Universal Classes of Hash
`Functions." Journal of Computer and System Sciences, vol.
`18, No. 2, Apr. 1979, pp. 143-154.
`Chankhunthod, Anawat, et al. "A Hierarchical Internet
`Object Cache." In Usenix Proceedings, Jan. 1996, pgs.
`153-163.
`Cormen, Thomas H., et al. Introduction to Algorithms, The
`MIT Press, Cambridge, Massachusetts, 1994, pgs. 219-243,
`991-993.
`Deering, Stephen, et al. "Multicast Routing in Datagram
`Internetworks and Extended LANs." ACM Transactions on
`Computer Systems, vol. 8, No. 2, May 1990, pgs. 85-110.
`Devine, Robert. "Design and Implementation of DDH: A
`Distributed Dynamic Hashing Algorithm." In Proceedings
`of 4th International Conference on Foundations of Data
`Organizations and Algorithms, 1993, pgs. 101-114.
`Grigni, Michelangelo, et al. "Tight Bounds on Minimum
`Broadcast Networks." SIAM Journal of Discrete Math-
`ematics, vol. 4, No. 2, May 1991, pgs. 207-222.
`Gwertzman, James, et al. "The Case for Geographical Push-
`Caching." Technical Report HU TR 34-94 (excerpt), Har-
`vard University, DAS, Cambridge, MA 02138, 1994, 2 pgs.
`Gwertzman, James, et al. "World-Wide Web Cache Consis-
`tency." In Proceedings of the 1996 USENIX Technical
`Conference, Jan. 1996, 8 pgs.
`Feeley, Michael, et al. "Implementing Global Memory Man-
`agement in a Workstation Cluster." In Proceedings of the
`15th ACM Symposium on Operating Systems Principles,
`1995, pgs. 201-212.
`Floyd, Sally, et al. "A Reliable Multicast Framework for
`Light-Weight Sessions and Application Level Framing." In
`Proceedings of ACM SIGCOMM '95, pgs. 342-356.
`Fredman, Michael, et al. "Storing a Sparse Table with O(1)
`Worst Case Access Time." Journal of the Association for
`Computing Machinery, vol. 31, No. 3, Jul. 1984, pgs.
`538-544.
`Karger, David, et al. "Consistent Hashing and Random
`Trees: Distributed Caching Protocols for Relieving Hot
`Spots on the World Wide Web." In Proceedings of the
`Twenty-Ninth Annual ACM Symposium on Theory of Com-
`puting, May 1997, pgs. 654-663.
`Litwin, Witold, et al. "LH*: A Scalable, Distributed Data
`Structure." ACM Transactions on Database Systems, vol.
`21, No. 4, Dec. 1996, pgs. 480-525.
`
`Malpani, Radhika, et al. "Making World Wide Web Caching
`Servers Cooperate." In Proceedings of World Wide Web
`Conference, 1996, 6 pgs.
`Naor, Moni, et al. "The Load, Capacity and Availability of
`Quorum Systems." In Proceedings of the 35th IEEE Sym-
`posium on Foundations of Computer Science, Nov. 1994,
`pgs. 214-225.
`Nisan, Noam. "Pseudorandom Generators for Space-
`Bounded Computation." In Proceedings of the Twenty-
`Second Annual ACM Symposium on Theory of Computing,
`May 1990, pgs. 204-212.
`Palmer, Mark, et al. "Fido: A Cache that Learns to Fetch."
`In Proceedings of the 17th International Conference on Very
`Large Data Bases, Sep. 1991, pgs. 255-264.
`Panigrahy, Rina. Relieving Hot Spots on the World Wide
`Web. Massachusetts Institute of Technology, Jun. 1997, pgs.
`1-66.
`Peleg, David, et al. "The Availability of Quorum Systems."
`Information and Computation 123, 1995, 210-223.
`Plaxton, C. Greg, et al. "Fast Fault-Tolerant Concurrent
`Access to Shared Objects." In Proceedings of 37th IEEE
`Symposium on Foundations of Computer Science, 1996, pgs.
`570-579.
`Rabin, Michael. "Efficient Dispersal of Information for
`Security, Load Balancing, and Fault Tolerance." Journal of
`the ACM, vol. 36, No. 2, Apr. 1989, pgs. 335-348.
`Ravi, R., "Rapid Rumor Ramification: Approximating the
`Minimum Broadcast Time." In Proceedings of the 35th
`IEEE Symposium on Foundations of Computer Science,
`Nov. 1994, pgs. 202-213.
`Schmidt, Jeanette, et al. "Chernoff-Hoeffding Bounds for
`Applications with Limited Independence." In Proceedings
`of the 4th ACM-SIAM Symposium on Discrete Algorithms,
`1993, pgs. 331-340.
`Tarjan, Robert Endre, et al. "Storing a Sparse Table."
`Communications of the ACM, vol. 22, No. 11, Nov. 1979,
`pgs. 606-611.
`Vitter, Jeffrey Scott, et al. "Optimal Prefetching via Data
`Compression." In Proceedings of the 32nd IEEE Symposium
`on Foundations of Computer Science, Nov. 1991, pgs.
`121-130.
`Wegman, Mark, et al. "New Hash Functions and Their Use
`in Authentication and Set Equality." Journal of Computer
`and System Sciences vol. 22, Jun. 1981, pgs. 265-279.
`Yao, Andrew Chi-Chih. "Should Tables be Sorted?" Journal
`of the Association for Computing Machinery, vol. 28, No. 3,
`Jul. 1981, pgs. 615-628.
`Beavan, Colin "Web Life They're Watching You." Esquire,
`Aug. 1997, pgs. 104-105.
`
`[Drawing Sheet 1 of 2. FIG. 1: block diagram with the labels CLIENT, BROWSER APPLICATION, SERVER 12, PROCESSOR, OS, WEB SERVER, API and CONTENT PROVIDER SITE.]
`
`[Drawing Sheet 2 of 2. FIG. 4: flowchart with the steps GET NEXT OBJECT, PREPEND VIRTUAL SERVER HOST NAME and PREPEND HASH VALUE AS FINGERPRINT. FIG. 5: diagram with the labels INPUT and HASH VALUE.]
`
`GLOBAL HOSTING SYSTEM
`This application is based on Provisional Application No.
`60/092,710, filed Jul. 14, 1998. This application includes
`subject matter protected by copyright.
`BACKGROUND OF THE INVENTION
`1. Technical Field
`This invention relates generally to information retrieval in
`a computer network. More particularly, the invention relates
`to a novel method of hosting and distributing content on the
`Internet that addresses the problems of Internet Service
`Providers (ISPs) and Internet Content Providers.
`2. Description of the Related Art
`The World Wide Web is the Internet's multimedia infor-
`mation retrieval system. In the Web environment, client
`machines effect transactions to Web servers using the Hyper-
`text Transfer Protocol (HTTP), which is a known application
`protocol providing users access to files (e.g., text, graphics,
`images, sound, video, etc.) using a standard page description
`language known as Hypertext Markup Language (HTML).
`HTML provides basic document formatting and allows the
`developer to specify "links" to other servers and files. In the
`Internet paradigm, a network path to a server is identified by
`a so-called Uniform Resource Locator (URL) having a
`special syntax for defining a network connection. Use of an
`HTML-compatible browser (e.g., Netscape Navigator or
`Microsoft Internet Explorer) at a client machine involves
`specification of a link via the URL. In response, the client
`makes a request to the server identified in the link and, in
`return, receives a document or other object formatted
`according to HTML. A collection of documents supported
`on a Web server is sometimes referred to as a Web site.
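The way a URL identifies both a server and a file on that server can be shown with a short sketch using Python's standard `urllib.parse` module. The URL is the example that appears later in this description; the function name `describe_url` is an assumption introduced only for illustration.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the parts a browser uses to reach a server."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,   # application protocol, e.g. http
        "host": parts.hostname,   # server named in the link
        "path": parts.path,       # file requested from that server
    }


info = describe_url("http://www.provider.com/TECH/images/space.story.gif")
print(info)
```

The hostname component is the part of the URL that the present invention later modifies for embedded objects.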
`It is well known in the prior art for a Web site to mirror
`its content at another server. Indeed, at present, the only
`method for a Content Provider to place its content closer to
`its readers is to build copies of its Web site on machines that
`are located at Web hosting farms in different locations
`domestically and internationally. These copies of Web sites
`are known as mirror sites. Unfortunately, mirror sites place
`unnecessary economic and operational burdens on Content
`Providers, and they do not offer economies of scale.
`Economically, the overall cost to a Content Provider with
`one primary site and one mirror site is more than twice the
`cost of a single primary site. This additional cost is the result
`of two factors: (1) the Content Provider must contract with
`a separate hosting facility for each mirror site, and (2) the
`Content Provider must incur additional overhead expenses
`associated with keeping the mirror sites synchronized.
`In an effort to address problems associated with mirroring,
`companies such as Cisco, Resonate, Bright Tiger, F5 Labs
`and Alteon, are developing software and hardware that will
`help keep mirror sites synchronized and load balanced.
`Although these mechanisms are helpful to the Content
`Provider, they fail to address the underlying problem of
`scalability. Even if a Content Provider is willing to incur the
`costs associated with mirroring, the technology itself will
`not scale beyond a few (i.e., fewer than 10) Web sites.
`In addition to these economic and scalability issues,
`mirroring also entails operational difficulties. A Content
`Provider that uses a mirror site must not only lease and
`manage physical space in distant locations, but it must also
`buy and maintain the software or hardware that synchronizes
`and load balances the sites. Current solutions require Con-
`tent Providers to supply personnel, technology and other
`items necessary to maintain multiple Web sites. In summary,
`mirroring requires Content Providers to waste economic and
`other resources on functions that are not relevant to their
`core business of creating content.
`Moreover, Content Providers also desire to retain control
`of their content. Today, some ISPs are installing caching
`hardware that interrupts the link between the Content Pro-
`vider and the end-user. The effect of such caching can
`produce devastating results to the Content Provider, includ-
`ing (1) preventing the Content Provider from obtaining
`accurate hit counts on its Web pages (thereby decreasing
`revenue from advertisers), (2) preventing the Content Pro-
`vider from tailoring content and advertising to specific
`audiences (which severely limits the effectiveness of the
`Content Provider's Web page), and (3) providing outdated
`information to its customers (which can lead to a frustrated
`and angry end user).
`There remains a significant need in the art to provide a
`decentralized hosting solution that enables users to obtain
`Internet content on a more efficient basis (i.e., without
`burdening network resources unnecessarily) and that like-
`wise enables the Content Provider to maintain control over
`its content.
`The present invention solves these and other problems
`associated with the prior art.
`BRIEF SUMMARY OF THE INVENTION
`It is a general object of the present invention to provide a
`computer network comprising a large number of widely
`deployed Internet servers that form an organic, massively
`fault-tolerant infrastructure designed to serve Web content
`efficiently, effectively, and reliably to end users.
`Another more general object of the present invention is to
`provide a fundamentally new and better method to distribute
`Web-based content. The inventive architecture provides a
`method for intelligently routing and replicating content over
`a large network of distributed servers, preferably with no
`centralized control.
`Another object of the present invention is to provide a
`network architecture that moves content close to the user.
`The inventive architecture allows Web sites to develop large
`audiences without worrying about building a massive infra-
`structure to handle the associated traffic.
`Still another object of the present invention is to provide
`a fault-tolerant network for distributing Web content. The
`network architecture is used to speed up the delivery of
`richer Web pages, and it allows Content Providers with large
`audiences to serve them reliably and economically, prefer-
`ably from servers located close to end users.
`A further feature of the present invention is the ability to
`distribute and manage content over a large network without
`disrupting the Content Provider's direct relationship with the
`end user.
`Yet another feature of the present invention is to provide
`a distributed scalable infrastructure for the Internet that
`shifts the burden of Web content distribution from the
`Content Provider to a network of preferably hundreds of
`hosting servers deployed, for example, on a global basis.
`In general, the present invention is a network architecture
`that supports hosting on a truly global scale. The inventive
`framework allows a Content Provider to replicate its most
`popular content at an unlimited number of points throughout
`the world. As an additional feature, the actual content that is
`replicated at any one geographic location is specifically
`tailored to viewers in that location. Moreover, content is
`automatically sent to the location where it is requested,
`without any effort or overhead on the part of a Content
`Provider.
`
`It is thus a more general object of this invention to provide
`a global hosting framework to enable Content Providers to
`retain control of their content.
`The hosting framework of the present invention com-
`prises a set of servers operating in a distributed manner. The
`actual content to be served is preferably supported on a set
`of hosting servers (sometimes referred to as ghost servers).
`This content comprises HTML page objects that,
`conventionally, are served from a Content Provider site. In
`accordance with the invention, however, a base HTML
`document portion of a Web page is served from the Content
`Provider's site while one or more embedded objects for the
`page are served from the hosting servers, preferably, those
`hosting servers nearest the client machine. By serving the
`base HTML document from the Content Provider's site, the
`Content Provider maintains control over the content.
`The determination of which hosting server to use to serve
`a given embedded object is effected by other resources in the
`hosting framework. In particular, the framework includes a
`second set of servers (or server resources) that are config-
`ured to provide top level Domain Name Service (DNS). In
`addition, the framework also includes a third set of servers
`(or server resources) that are configured to provide low level
`DNS functionality. When a client machine issues an HTTP
`request to the Web site for a given Web page, the base
`HTML document is served from the Web site as previously
`noted. Embedded objects for the page preferably are served
`from particular hosting servers identified by the top- and
`low-level DNS servers. To locate the appropriate hosting
`servers to use, the top-level DNS server determines the
`user's location in the network to identify a given low-level
`DNS server to respond to the request for the embedded
`object. The top-level DNS server then redirects the request
`to the identified low-level DNS server that, in turn, resolves
`the request into an IP address for the given hosting server
`that serves the object back to the client.
`More generally, it is possible (and, in some cases,
`desirable) to have a hierarchy of DNS servers consisting
`of several levels. The lower one moves in the hierarchy, the
`closer one gets to the best region.
`A further aspect of the invention is a means by which
`content can be distributed and replicated through a collec-
`tion of servers so that the use of memory is optimized
`subject to the constraints that there are a sufficient number
`of copies of any object to satisfy the demand, the copies of
`objects are spread so that no server becomes overloaded,
`copies tend to be located on the same servers as time moves
`forward, and copies are located in regions close to the clients
`that are requesting them. Thus, servers operating within the
`framework do not keep copies of all of the content database.
`Rather, given servers keep copies of a minimal amount of
`data so that the entire system provides the required level of
`service. This aspect of the invention allows the hosting
`scheme to be far more efficient than schemes that cache
`everything everywhere, or that cache objects only in pre-
`specified locations.
`The global hosting framework is fault tolerant at each
`level of operation. In particular, the top level DNS server
`returns a list of low-level DNS servers that may be used by
`the client to service the request for the embedded object.
`Likewise, each hosting server preferably includes a buddy
`server that is used to assume the hosting responsibilities of
`its associated hosting server in the event of a failure condi-
`tion.
`According to the present invention, load balancing across
`the set of hosting servers is achieved in part through a novel
`technique for distributing the embedded object requests. In
`particular, each embedded object URL is preferably modi-
`fied by prepending a virtual server hostname into the URL.
`More generally, the virtual server hostname is inserted into
`the URL. Preferably, the virtual server hostname includes a
`value (sometimes referred to as a serial number) generated
`by applying a given hash function to the URL or by encoding
`given information about the object into the value. This
`function serves to randomly distribute the embedded objects
`over a given set of virtual server hostnames. In addition, a
`given fingerprint value for the embedded object is generated
`by applying a given hash function to the embedded object
`itself. This given value serves as a fingerprint that identifies
`whether the embedded object has been modified. Preferably,
`the functions used to generate the values (i.e., for the virtual
`server hostname and the fingerprint) are applied to a given
`Web page in an off-line process. Thus, when an HTTP
`request for the page is received, the base HTML document
`is served by the Web site and some portion of the page's
`embedded objects are served from the hosting servers near
`(although not necessarily the closest) to the client machine
`that initiated the request.
`The foregoing has outlined some of the more pertinent
`objects and features of the present invention. These objects
`should be construed to be merely illustrative of some of the
`more prominent features and applications of the invention.
`Many other beneficial results can be attained by applying the
`disclosed invention in a different manner or modifying the
`invention as will be described. Accordingly, other objects
`and a fuller understanding of the invention may be had by
`referring to the following Detailed Description of the Pre-
`ferred Embodiment.
`BRIEF DESCRIPTION OF THE DRAWINGS
`For a more complete understanding of the present inven-
`tion and the advantages thereof, reference should be made to
`the following Detailed Description taken in connection with
`the accompanying drawings in which:
`FIG. 1 is a representative system in which the present
`invention is implemented;
`FIG. 2 is a simplified representation of a markup language
`document illustrating the base document and a set of embed-
`ded objects;
`FIG. 3 is a high level diagram of a global hosting system
`according to the present invention;
`FIG. 4 is a simplified flowchart illustrating a method of
`processing a Web page to modify embedded object URLs
`that is used in the present invention;
`FIG. 5 is a simplified state diagram illustrating how the
`present invention responds to an HTTP request for a Web
`page.
`
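The URL-modification technique summarized above, in which a hash of the URL selects a virtual server hostname and a hash of the object itself serves as a fingerprint, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the use of SHA-256, the domain `ghosting.example.net`, the layout of the rewritten URL and all function names are hypothetical. Only the serial-number range of 0 to 99,999 and the `a<number>` hostname style come from the description, which notes that the range is not a limitation.

```python
import hashlib
from urllib.parse import urlsplit

NUM_VIRTUAL_SERVERS = 100_000           # serial numbers 0..99,999, per the description
GHOST_DOMAIN = "ghosting.example.net"   # hypothetical domain for this sketch


def serial_number(url):
    """Hash the URL into a virtual server number (SHA-256 is an assumption)."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_VIRTUAL_SERVERS


def fingerprint(content):
    """Hash the object bytes so that a modified object gets a different value."""
    return hashlib.sha256(content).hexdigest()[:8]


def rewrite_embedded_url(url, content):
    """Prepend a virtual server hostname and a fingerprint to the original URL.

    The resulting layout (host/fingerprint/original-host/path) is illustrative.
    """
    parts = urlsplit(url)
    virtual_host = f"a{serial_number(url)}.{GHOST_DOMAIN}"
    return (
        f"{parts.scheme}://{virtual_host}/"
        f"{fingerprint(content)}/{parts.netloc}{parts.path}"
    )


original = "http://www.provider.com/TECH/images/space.story.gif"
print(rewrite_embedded_url(original, b"GIF89a...image bytes..."))
```

Because both values depend only on the URL and the object bytes, the rewrite can run in an off-line pass over the page, as the summary suggests. Changing the object changes the fingerprint but leaves the virtual server hostname unchanged.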
`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENT
`A known Internet client-server system is implemented as
`illustrated in FIG. 1. A client machine 10 is connected to a
`Web server 12 via a network 14. For illustrative purposes,
`network 14 is the Internet, an intranet, an extranet or any
`other known network. Web server 12 is one of a plurality of
`servers which are accessible by clients, one of which is
`illustrated by machine 10. A representative client machine
`includes a browser 16, which is a known software tool used
`to access the servers of the network. The Web server
`supports files (collectively referred to as a "Web" site) in the
`form of hypertext documents and objects. In the Internet
`paradigm, a network path to a server is identified by a
`so-called Uniform Resource Locator (URL).
`A representative Web server 12 is a computer comprising
`a processor 18, an operating system 20, and a Web server
`program 22, such as Netscape Enterprise Server. The server
`12 also includes a display supporting a graphical user
`interface (GUI) for management and administration, and an
`Application Programming Interface (API) that provides
`extensions to enable application developers to extend and/or
`customize the core functionality thereof through software
`programs including Common Gateway Interface (CGI)
`programs, plug-ins, servlets, active server pages, server side
`include (SSI) functions or the like.
`A representative Web client is a personal computer that is
`x86-, PowerPC®- or RISC-based, that includes an operating
`system such as IBM® OS/2® or Microsoft Windows '95,
`and that includes a Web browser, such as Netscape Navi-
`gator 4.0 (or higher), having a Java Virtual Machine (JVM)
`and support for application plug-ins or helper applications.
`A client may also be a notebook computer, a handheld
`computing device (e.g., a PDA), an Internet appliance, or
`any other such device connectable to the computer network.
`As seen in FIG. 2, a typical Web page comprises a markup
`language (e.g. HTML) master or base document 28, and
`many embedded objects (e.g., images, audio, video, or the
`like) 30. Thus, in a typical page, twenty or more embedded
`images or objects are quite common. Each of these images
`is an independent object in the Web, retrieved (or validated
`for change) separately. The common behavior of a Web
`client, therefore, is to fetch the base HTML document, and
`then immediately fetch the embedded objects, which are
`typically (but not always) located on the same server.
`According to the present invention, preferably the markup
`language base document 28 is served from the Web server
`(i.e., the Content Provider site) whereas a given number (or
`perhaps all) of the embedded objects are served from other
`servers. As will be seen, preferably a given embedded object
`is served from a server (other than the Web server itself) that
`is close to the client machine, that is not overloaded, and that
`is most likely to already have a current version of the
`required file.
`Referring now to FIG. 3, this operation is achieved by the
`hosting system of the present invention. As will be seen, the
`hosting system 35 comprises a set of widely-deployed
`servers (or server resources) that form a large, fault-tolerant
`infrastructure designed to serve Web content efficiently,
`effectively, and reliably to end users. The servers may be
`deployed globally, or across any desired geographic regions.
`As will be seen, the hosting system provides a distributed
`architecture for intelligently routing and replicating such
`content. To this end, the global hosting system 35 comprises
`three (3) basic types of servers (or server resources): hosting
`servers (sometimes called ghosts) 36, top-level DNS servers
`38, and low-level DNS servers 40. Although not illustrated,
`there may be additional levels in the DNS hierarchy.
`Alternatively, there may be a single DNS level that com-
`bines the functionality of the top level and low-level servers.
`In this illustrative embodiment, the inventive framework 35
`is deployed by an Internet Service Provider (ISP), although
`this is not a limitation of the present invention. The ISP or
`ISPs that deploy the inventive global hosting framework 35
`preferably have a large number of machines that run both the
`ghost server component 36 and the low-level DNS compo-
`nent 40 on their networks. These machines are distributed
`throughout the network; preferably, they are concentrated
`around network exchange points 42 and network access
`points 44, although this is not a requirement. In addition, the
`ISP preferably has a small number of machines running the
`top-level DNS 38 that may also be distributed throughout
`the network.
`Although not meant to be limiting, preferably a given
`server used in the framework 35 includes a processor, an
`operating system (e.g., Linux, UNIX, Windows NT, or the
`like), a Web server application, and a set of application
`routines used by the invention. These routines are conve-
`niently implemented in software as a set of instructions
`executed by the processor to perform various process or
`method steps as will be described in more detail below. The
`servers are preferably located at the edges of the network
`(e.g., in points of presence, or POPs).
`Several factors may determine where the hosting servers
`are placed in the network. Thus, for example, the server
`locations are preferably determined by a demand driven
`network map that allows the provider (e.g., the ISP) to
`monitor traffic requests. By studying traffic patterns, the ISP
`may optimize the server locations for the given traffic
`profiles.
`According to the present invention, a given Web page
`(comprising a base HTML document and a set of embedded
`objects) is served in a distributed manner. Thus, preferably,
`the base HTML document is served from the Content
`Provider that normally hosts the page. The embedded
`objects, or some subset thereof, are preferentially served
`from the hosting servers 36 and, specifically, given hosting
`servers 36 that are near the client machine that in the first
`instance initiated the request for the Web page. In addition,
`preferably loads across the hosting servers are balanced to
`ensure that a given embedded object may be efficiently
`served from a given hosting server near the client when such
`client requires that object to complete the page.
`To serve the page contents in this manner, the URL
`associated with an embedded object is modified. As is
`well-known, each embedded object that may be served in a
`page has its own URL. Typically, the URL has a hostname
`identifying the Content Provider's site from where the object
`is conventionally served, i.e., without reference to the
`present invention. According to the invention, the embedded
`object URL is first modified, preferably in an off-line
`process, to condition the URL to be served by the global
`hosting servers. A flowchart illustrating the preferred
`method for modifying the object URL is illustrated in FIG.
`4.
`The routine begins at step 50 by determining whether all
`of the embedded objects in a given page have been pro-
`cessed. If so, the routine ends. If not, however, the routine
`gets the next embedded object at step 52. At step 54, a virtual
`server hostname is prepended into the URL for the given
`embedded object. The virtual server hostname includes a
`value (e.g., a number) that is generated, for example, by
`applying a given hash function to the URL. As is well-
`known, a hash function takes arbitrary length bit strings as
`inputs and produces fixed length bit strings (hash values) as
`outputs. Such functions satisfy two conditions: (1) it is
`infeasible to find two different inputs that produce the same
`hash value, and (2) given an input and its hash value, it is
`infeasible to find a different input with the same hash value.
`In step 54, the URL for the embedded object is hashed into
`a value xx,xxx that is then included in the virtual server
`hostname. This step randomly distributes the object to a
`given virtual server hostname.
`The present invention is not limited to generating the
`virtual server hostname by applying a hash function as
`described above. As an alternative and preferred
`embodiment, a virtual server hostname is generated as
`follows. Consider the representative hostname
`a1234.g.akamaitech.net. The 1234 value, sometimes
`referred to as a serial number, preferably includes informa-
`tion about the object such as its size (big or small), its
`anticipated popularity, the date on which the object was
`created, the identity of the Web site, the type of object (e.g.,
`movie or static picture), and perhaps some random bits
`generated by a given random function. Of course, it is not
`
`hashed into numbers between 0 and 99,999, although this
`range is not a limitation of the present invention. An
`embedded URL is then switched to reference the virtual
`ghost with that number. For example, the following is an
`embedded URL from the Provider's site:
`<IMG SRC=http://www.provider.com/TECH/images/space.story.gif>
`If the serial number for the object referred to by this URL
`is the number 1