(12) Patent Application Publication
Devanneaux et al.

(10) Pub. No.: US 2007/0156845 A1
(43) Pub. Date: Jul. 5, 2007

(54) SITE ACCELERATION WITH CONTENT
PREFETCHING ENABLED THROUGH
CUSTOMER-SPECIFIC CONFIGURATIONS

(75) Inventors: Thomas P. Devanneaux, Los Altos, CA
(US); Laszlo Kovacs, Foster City, CA
(US); Stephen L. Ludin, Mill Valley,
CA (US)

Correspondence Address:
LAW OFFICE OF DAVID H. JUDSON
15950 DALLAS PARKWAY
SUITE 225
DALLAS, TX 75243 (US)

(73) Assignee: Akamai Technologies, Inc.

(21) Appl. No.: 11/647,750

(22) Filed: Dec. 29, 2006

Related U.S. Application Data

(60) Provisional application No. 60/755,176, filed on Dec.
30, 2005. Provisional application No. 60/755,908,
filed on Dec. 31, 2005.

Publication Classification

(51) Int. Cl.
G06F 15/16 (2006.01)

(52) U.S. Cl. ........ 709/217

(57) ABSTRACT

`A CDN edge server is configured to provide one or more
`extended content delivery features on a domain-specific,
`customer-specific basis, preferably using configuration files
`that are distributed to the edge servers using a configuration
`system. A given configuration file includes a set of content
`handling rules and directives that facilitate one or more
advanced content handling features, such as content
`prefetching. When prefetching is enabled, the edge server
`retrieves objects embedded in pages (normally HTML con-
`tent) at the same time it serves the page to the browser rather
`than waiting for the browser’s request for these objects. This
`can significantly decrease the overall rendering time of the
`page and improve the user experience of a Web site. Using
`a set of metadata tags, prefetching can be applied to either
`cacheable or uncacheable content. When prefetching is used
`for cacheable content, and the object to be prefetched is
`already in cache, the object is moved from disk into memory
`so that it is ready to be served. When prefetching is used for
uncacheable content, preferably the retrieved objects are
uniquely associated with the client browser request that
`triggered the prefetch so that these objects cannot be served
`to a different end user. By applying metadata in the con-
`figuration file, prefetching can be combined with tiered
`distribution and other edge server configuration options to
`further improve the speed of delivery and/or to protect the
`origin server from bursts of prefetching requests.
`
`AKAMAI
`
`EXHIBIT 1003
`
`
`
`
`Patent Application Publication
`
`Jul. 5, 2007 Sheet 1 of 3
`
`US 2007/0156845 A1
`
`
`
`108
`
`§ 5 5:
`
`100
`
`Monitoring 110
`
`I Logging112
`
`Web
`proxy 207
`
`Name server 208
`
`Monitoring process 210
`
`Hardware 202
`
`
`
`Data CO“8Cfi0f'|g PWOC33 212Application
`
`Operating system 204
`
`Figure 2
`
`20°
`
`
`
`Patent Application Publication
`
`Jul. 5, 2007 Sheet 2 of 3
`
`US 2007/0156845 A1
`
`318
`
`Custorner ongln
`server
`
`302
`
`CDN Authoritative DNS
`Mechanism
`
`CDN metadatn
`configuration system
`
`306
`
`304
`
`
`
`
`
`Intemei-accessib!e
`nnchine whh web
`bmwserlplayet
`
`300
`
`Figure 3
`
Table 1.

Supported Tags/Attributes

[Table body not recoverable from the scan; legible tag names: BASE, A, SCRIPT, IFRAME, OBJECT, AREA]

Figure 4

`
`Patent Application Publication
`
`Jul. 5, 2007 Sheet 3 of 3
`
`US 2007/0156845 A1
`
<!-- Global fetch limit settings to protect origin; should be included in all
configurations -->

<edgeservices:prefetch>
  <status>off</status>
  <fetch>
    <max-prefetches-per-page>100</max-prefetches-per-page>
    <max-urls-per-page>100</max-urls-per-page>
    <log-r-lines>on</log-r-lines>
  </fetch>
  <fetch.limits>
    <status>on</status>
    <time-scale>1s</time-scale>
    <requests-high-watermark>20</requests-high-watermark>
    <requests-low-watermark>15</requests-low-watermark>
  </fetch.limits>
</edgeservices:prefetch>

<!-- turn prefetch on for content likely to be HTML -->

<match:uri.ext value="jsp html htm asp">
  <edgeservices:prefetch.status>on</edgeservices:prefetch.status>
</match:uri.ext>

<!-- turn prefetch on for directory defaults requested without extension -->
<match:uri.ext value="">
  <edgeservices:prefetch.status>on</edgeservices:prefetch.status>
</match:uri.ext>

<!-- mark prefetchable content as such; expand this list of extensions as
needed -->
<match:uri.ext value="css gif jpg jpeg js ico mov png swf txt wav wma xml">
  <edgeservices:prefetch.prefetchable-object>on</edgeservices:prefetch.prefetchable-object>
  <edgeservices:prefetch.status>off</edgeservices:prefetch.status>
</match:uri.ext>

<!-- prefetch even if the HTML is already cached -->
<edgeservices:prefetch.prefetch-on-hit>on</edgeservices:prefetch.prefetch-on-hit>

<!-- when Tiered Distribution is used -->
<edgeservices:prefetch.indirect-only>on</edgeservices:prefetch.indirect-only>
`
`Figure 5
`
`
`
`US 2007/0156845 A1
`
`Jul. 5, 2007
`
`SITE ACCELERATION WITH CONTENT
`PREFETCHING ENABLED THROUGH
`CUSTOMER-SPECIFIC CONFIGURATIONS
`
`[0001] This application claims priority to Ser. No. 60/755,
`176, filed Dec. 30, 2005, and Ser. No. 60/755,908, filed Dec.
`31, 2005.
`
`[0002] Portions of this application contain subject matter
`that is protected by copyright.
`
`BACKGROUND OF THE INVENTION
`
`[0003]
`
`1. Technical Field
`
`[0004] The present invention relates generally to content
`delivery in distributed networks.
`
`[0005]
`
`2. Brief Description of the Related Art
`
`[0006] A company’s Web site represents its public face. It
`is often the initial point of contact for obtaining access to the
company’s information or doing business with the company.
`Public facing Web sites are used for many purposes. They
`can be used to transact commerce, where end consumers
`evaluate and buy products and services, and they are often
`linked to revenue generation and satisfying customer
`requests. They can be used as news and information portals
`for supplying the latest content for consumers. A company’s
`Web site can be used as a customer self-service venue, where
`customer satisfaction is critical to loyalty in getting custom-
`ers to return to the Web site. These are merely representative
`examples, of course. As companies place greater importance
`on the Internet, Web sites increasingly become a key com-
`ponent of a company’s business and its external communi-
cations. As such, the capability and flexibility of the supporting
Internet infrastructure for the Web site becomes
`mission-critical. In particular, the infrastructure must pro-
`vide good performance for all end user consumers, regard-
`less of their location. The site must scale to handle high
traffic load during peak usage periods. It must remain
`available 24x7, regardless of conditions on the Internet.
`When performance, reliability, or scalability problems do
`occur, Web site adoption and usage can be negatively
`impacted, resulting in greater costs, decreased revenue, and
`customer satisfaction issues.
`
[0007] It is known in the prior art to off-load Web site
content for delivery by a third party distributed computer
system. One such distributed computer system is a “content
delivery network” or “CDN” that is operated and managed
`by a service provider. The service provider typically pro-
`vides the service on behalf of third parties. A “distributed
`system” of this type typically refers to a collection of
`autonomous computers linked by a network or networks,
`together with the software, systems, protocols and tech-
`niques designed to facilitate various services, such as con-
`tent delivery or the support of outsourced site infrastructure.
`Typically, “content delivery” means the storage, caching, or
`transmission of content, streaming media and applications
`on behalf of content providers, including ancillary technolo-
`gies used therewith including, without
`limitation, DNS
`request handling, provisioning, data monitoring and report-
`ing, content targeting, personalization, and business intelli-
`gence. The term “outsourced site infrastructure” means the
`distributed systems and associated technologies that enable
`an entity to operate and/or manage a third party’s Web site
`infrastructure, in whole or in part, on the third party’s behalf.
`
`[0008] FIGS. 1-2 illustrate a known CDN infrastructure
`for managing content delivery on behalf of participating
`content providers. In this example, computer system 100 is
`configured as a CDN and is managed by a service provider.
The CDN is assumed to have a set of machines 102a-n
`distributed around the Internet, and some or even all of these
`machines may be located in data centers owned or operated
`by third parties. Typically, most of the machines are servers
`located near the edge of the Internet, i.e., at or adjacent end
`user access networks. A Network Operations Command
`Center (NOCC) 104 may be used to administer and manage
`operations of the various machines in the system. Third
party content sites, such as Web site 106, offload delivery of
`content (e.g., HTML, embedded page objects, streaming
`media, software downloads, and the like) to the distributed
`computer system 100 and, in particular, to “edge” servers.
`Typically, this service is provided for a fee. In one common
scenario, CDN content provider customers offload their
`content delivery by aliasing (e.g., by a DNS canonical name)
`given content provider domains or sub-domains to domains
`that are managed by the service provider’s authoritative
domain name service. End users that desire such content
may be directed to the distributed computer system to obtain
`that content more reliably and efficiently.
`[0009] The distributed computer system typically also
`includes other infrastructure, such as a distributed data
`collection system 108 that collects usage and other data from
`the edge servers, aggregates that data across a region or set
`of regions, and passes that data to other back-end systems
`110, 112, 114 and 116 to facilitate monitoring,
`logging,
`alerts, billing, management and other operational and
`administrative functions. Distributed network agents 118
`monitor the network as well as the server loads and provide
`network,
`traffic and load data to a DNS query handling
`mechanism 115, which is authoritative for content domains
`being managed by the CDN. A distributed data transport
`mechanism 120 may be used to distribute control informa-
`tion (e.g., metadata to manage content,
`to facilitate load
`balancing, and the like) to the edge servers. As illustrated in
`FIG. 2, a given machine 200 comprises commodity hard-
`ware (e.g., an Intel Pentium processor) 202 running an
`operating system kernel (such as Linux or variant) 204 that
`supports one or more applications 206a-n. To facilitate
content delivery services, for example, given machines
`typically run a set of applications, such as an HTTP Web
`proxy 207, a name server 208, a local monitoring process
`210, a distributed data collection process 212, and the like.
`For streaming media, the machine typically includes one or
`more media servers, such as a Windows Media Server
`(WMS) or Flash 2.0 server, as required by the supported
`media formats.
`
`[0010] The CDN may be configured to provide certain
`advanced content delivery functionality, for example, in the
`case where the edge server does not have the requested
`content (e.g., the content is not present, the content is present
`but is stale, the content is “dynamic” and must be created on
`the origin server, and the like). In such circumstances, the
`edge server must “go forward” to obtain the requested
`content. An enhanced CDN often provides the capability to
`facilitate this “go forward” process. Thus, it is known to
`provide a “tiered distribution” by which additional edge
`servers in the CDN provide a buffer mechanism to the Web
`site origin server. In a tiered distribution scheme, a subset of
the edge servers in the CDN is organized as a cache
hierarchy, so that a given edge server in an edge region has
`an associated “parent” region that may store an authoritative
`copy of certain requested content. A cache hierarchy of this
`type is then controlled at a fine-grain level using edge server
`and parent server configuration rules that are provided
`through the distributed data transport mechanism. U.S. Pat.
No. 7,133,905, which is assigned to the assignee of the
present application, describes this scheme. Another
`advanced function that may be implemented is quite useful
`when an edge server has to go forward to an origin server for
`dynamic or non-cacheable content. According to this tech-
`nique, the CDN is configured so that a given edge server has
`the option of going forward (to the origin) using interme-
`diate CDN edge nodes instead of relying upon default BGP
`routing. In this function, the CDN performs tests to deter-
`mine a set of alternative best paths between a given edge
`server and the origin server, and it makes those paths known
`to the edge server dynamically, typically in the form of a
`map. When the edge server needs to go forward, it examines
`the map to determine whether to go forward using default
`BGP or one of the alternate paths through an intermediate
`CDN node. This path optimization process is quite useful
`when the content in question must be generated dynamically,
`although the process can be used whenever it is necessary
`for a given edge server to obtain given content from a given
`source. This performance-based path optimization scheme is
`described in U.S. Publication No. 2002/0163882, which is
`also assigned to the assignee of the present application.
`
`BRIEF SUMMARY OF THE INVENTION
`
`[0011] A CDN edge server is configured to provide one or
`more extended content delivery features on a domain-
`specific, customer-specific basis, preferably using configu-
`ration files that are distributed to the edge servers using a
`configuration system. A given configuration file preferably is
`XML-based and includes a set of content handling rules and
directives that facilitate one or more advanced content
handling features, such as content prefetching. When
prefetching is enabled, the edge server retrieves objects
`(such as images and scripts) embedded in pages (normally
`HTML content) at the same time it serves the page to the
`browser rather than waiting for the browser’s request for
`these objects. This can significantly decrease the overall
`rendering time of the page and improve the user experience
`of a Web site. Using a set of metadata tags, prefetching can
`be applied to either cacheable or uncacheable content. When
`prefetching is used for cacheable content, and the object to
`be prefetched is already in cache, the object is moved from
`disk into memory so that it is ready to be served. When
`prefetching is used for uncacheable content, the retrieved
`objects are uniquely associated with the client browser
request that triggered the prefetch so that these objects
cannot be served to a different end user. By applying
`metadata in the configuration file, prefetching can be com-
`bined with tiered distribution and other edge server configu-
`ration options to further improve the speed of delivery
`and/or to protect the origin server from bursts of prefetching
`requests.
`
`[0012] The foregoing has outlined some of the more
`pertinent features of the present invention. These features
`should be construed to be merely illustrative. Many other
`beneficial results can be attained by applying the disclosed
`invention in a different manner or by modifying the inven-
`tion as will be described.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`[0013] For a more complete understanding of the present
`invention and the advantages thereof, reference is now made
`to the following descriptions taken in conjunction with the
`accompanying drawings, in which:
`
[0014] FIG. 1 is a representative prior art content delivery
network in which the present invention may be implemented;

[0015] FIG. 2 is a representative edge server of the content
delivery network of FIG. 1;

[0016] FIG. 3 is a portion of the CDN of FIG. 1 in which
content prefetching is enabled according to the present
invention;

[0017] FIG. 4 is a table of HTML elements that may be
prefetched based on settings in a customer-specific
configuration file; and

[0018] FIG. 5 is a representative default set of metadata to
enable the prefetching feature according to the present
invention.
`
`DETAILED DESCRIPTION OF AN
`ILLUSTRATIVE EMBODIMENT
`
[0019] As seen in FIG. 3, a CDN customer has off-loaded
`all or some of its content delivery requirements to the CDN
`service provider in a well-known manner. In this case, the
`CDN customer operates a site at the origin server 316. An
`Internet-accessible client 300 (e.g., an end user client
`machine having a browser and media player) has been
`directed by CDN authoritative DNS mechanism 302 to a
`nearby edge server 304. This process is described, for
`example, in U.S. Pat. Nos. 6,108,703, 6,553,413 and 6,996,
`616. Edge server 304 may be configured as described above
`and illustrated in FIG. 2. This server includes a management
`process that provides the content prefetching functionality of
`the present invention, as will be described in more detail
`below.
`
`[0020] The given edge server 304 may be located in a set
`(or “region”) of edge servers that are co-located at a given
`Internet-accessible data center. For convenience, only one
`edge server per region is shown. Content handling rules are
`configured into each edge server, preferably via a metadata
`configuration system 306. As shown, the configuration sys-
`tem provides edge server content control metadata via links
`318, which themselves may include other infrastructure
(servers, and the like). U.S. Pat. No. 7,111,057 illustrates a
useful infrastructure for delivering and managing edge
`server content control information, and this and other edge
`server control information can be provisioned by the CDN
`service provider itself, or (via an extranet or the like) the
`content provider customer who operates the origin server
`316. If configured appropriately, given subsets of edge
`servers, such as edge servers 304 and 310, may comprise a
`cache hierarchy so that edge server 304 may “go forward”
`to a CDN parent instead of to the origin server as needed.
`This tiered distribution is described in U.S. Pat. No. 7,133,
`905, as noted above. Also, if configured appropriately, the
`CDN may provide overlay path routing to enable the edge
`server 304 to go forward to the origin server 316 through an
`alternate CDN path, such as the path through edge server
308, or through edge server 312, depending on whether one
of these alternative paths provides better performance than
`a default BGP path. As noted above, this performance-based
overlay path delivery scheme is described in U.S. Publication
No. 2002/0163882. The disclosures of each of the
above-identified references are incorporated herein by reference.
As also seen in FIG. 3, an edge server 314 may be
`co-located with the customer origin server, although this is
`not required.
`[0021] According to the present invention, a given CDN
`edge server is configured to provide one or more extended
`content delivery features. To this end, the CDN edge servers
are configurable to provide these delivery features on a
customer-specific, customer domain-specific basis, preferably
using XML-based configuration files that are distributed to
`the edge servers using a metadata configuration system such
`as described above. A given XML-based configuration file
`includes a set of content handling rules and directives that
`facilitate one or more advanced content handling features.
`Thus, for example, when an edge server management pro-
`cess receives a request for content, it searches an index file
for a match on a customer hostname associated with the
request. If there is no match, the edge server process rejects
`the request. If there is a match, the edge server process loads
`metadata from the configuration file to determine how it will
`handle the request. Thus, for example, the metadata for the
`hostname may indicate whether to serve the request from
`cache or from the origin. If the metadata indicates that the
`request is associated with cached or cacheable content, the
`information may then direct the edge server to look for the
`content in a local cache or, failing that, to fetch the content
`from a CDN cache hierarchy parent node. If content is
`cacheable, the metadata may instruct the edge server process
to apply given content handling directives. One such set of
directives implements a content prefetch function, which will
be explained in detail below. If, on the other hand, the
`configuration file indicates that the edge server should go
`forward to handle the request (because, e.g., the request
`involves a transaction that must occur at the origin server)
`the metadata may indicate how the edge server should go
`forward, e.g., using path optimization to try to reach the
`origin using intermediate CDN paths. Other metadata may
`control how a given edge server establishes and maintains
`connections with one or more other edge servers or other
`machines, or how the edge server should deliver the content
`to the requesting end user browser once it has been obtained.
In any event, a set of content handling directives is set forth
in the XML configuration file for a given customer domain
and used to control the edge server to provide these
advanced functions.
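
The request-handling decisions described in the preceding paragraph can be sketched roughly as follows. This is a minimal illustration, not the edge server's actual logic: the `HostMetadata` fields, the `plan_request` function, and the returned labels are hypothetical names chosen for the example, and the real configuration files carry far more directives than the four flags assumed here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class HostMetadata:
    """Hypothetical per-hostname directives loaded from a configuration file."""
    cacheable: bool = True
    use_cache_parent: bool = False       # assumed flag for tiered distribution
    use_path_optimization: bool = False  # assumed flag for overlay go-forward
    prefetch_status: bool = False        # assumed flag for content prefetching


def plan_request(hostname: str, url: str,
                 index: Dict[str, HostMetadata],
                 local_cache: Dict[str, bytes]) -> str:
    """Return a label describing how the request would be handled."""
    metadata = index.get(hostname.lower())
    if metadata is None:
        return "reject"  # no match on the customer hostname
    if metadata.cacheable:
        if url in local_cache:
            return "serve-from-local-cache"
        if metadata.use_cache_parent:
            return "fetch-from-cache-parent"
        return "fetch-from-origin"
    # Non-cacheable content: the edge server must go forward.
    if metadata.use_path_optimization:
        return "go-forward-via-overlay-path"
    return "go-forward-via-default-route"
```

For example, a hostname that is absent from the index yields `"reject"`, while a cacheable object that misses the local cache on a tiered-distribution hostname yields `"fetch-from-cache-parent"`.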
`
[0022] As noted above, in one embodiment, an XML-based
configuration file controls an edge server to provide an
enhanced content delivery function, namely, content
prefetching, on a per-customer, per-customer-domain basis.
`Using this service, a given CDN customer can set up
`(directly or with the assistance of the CDN service provider)
`an edge server handling configuration for all or part of the
customer’s Web site or other content to be delivered over the
CDN. A participating content provider can be a formal Web
`publisher, or the content in question can be user-generated
`content (UGC). In a typical use scenario, the edge server is
`shared among participating content providers, and one or
`more of such providers establish a prefetching configuration
`(by default, or as a custom configuration) that is enabled and
enforced on the edge server. Thus, for example, when
prefetching is enabled, the edge server retrieves images and
`scripts embedded in pages (normally HTML content) at the
`same time it serves the page to the browser rather than
`waiting for the browser’s request for these objects. This
`operation can significantly decrease the overall rendering
`time of the page and improve the user experience of a Web
`site.
`
`[0023] Although the remainder of this description focuses
`primarily on the content prefetching capability, one of
ordinary skill in the art will appreciate that, by using
`XML-based configurations, this function can be combined
`readily with other edge server functions that are also defined
`in such customer-specific, domain-specific configurations.
`These functions include, without limitation, path optimiza-
`tion (e.g., for non-cacheable or dynamic content), client-
`server (e.g., edge server-to-edge server) TCP connection
`optimizations, content compression, and the like. Path opti-
`mization, as described in U.S. Publication No. 2002/
`0163882, the disclosure of which is incorporated by refer-
`ence, significantly decreases latency when the edge server
`has to go forward to the origin (or other source), which is
`often required to facilitate transactions (or other occur-
`rences) that call for dynamic content generation. TCP con-
`nection optimization involves adjusting one or more TCP
`settings (e.g., congestion window size, retransmit timeout,
`packet reordering, and the like), which reduces edge server-
`to-edge server communication latency, as does content com-
`pression.
`
`[0024] The following describes a content prefetching
`enhancement.
`
`[0025] As will be seen, prefetching can be applied to either
`cacheable or uncacheable content. When prefetching is used
`for cacheable content, and the object to be prefetched is
`already in cache, the object is moved from disk into memory
`so that it is ready to be served. When prefetching is used for
uncacheable content, the retrieved objects are uniquely
associated with the client browser request that triggered the
prefetch so that these objects cannot be served to a different
`end user. By applying metadata in the configuration file,
`prefetching can be combined with tiered distribution to
`further improve the speed of object delivery and to protect
`the origin server from bursts of prefetching requests.
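
The difference between the cacheable and uncacheable cases might be modeled as in the sketch below. Everything here is an assumption made for illustration: the `EdgeStore` class, its three dictionaries, and the opaque `browser_id` string are hypothetical, and copying a disk entry into memory is only one way to read "moved from disk into memory".

```python
from typing import Callable, Dict, Optional, Tuple


class EdgeStore:
    """Toy model of an edge server's object stores (all names hypothetical)."""

    def __init__(self) -> None:
        self.disk: Dict[str, bytes] = {}     # cached objects on disk
        self.memory: Dict[str, bytes] = {}   # objects ready to be served
        # Uncacheable prefetched objects, keyed by the requesting browser.
        self.private: Dict[Tuple[str, str], bytes] = {}

    def prefetch_cacheable(self, url: str, fetch: Callable[[str], bytes]) -> None:
        if url in self.memory:
            return
        if url in self.disk:
            # Already cached: bring it into memory so it is ready to serve.
            self.memory[url] = self.disk[url]
        else:
            self.memory[url] = fetch(url)

    def prefetch_uncacheable(self, browser_id: str, url: str,
                             fetch: Callable[[str], bytes]) -> None:
        # Tie the object to the browser whose request triggered the prefetch.
        self.private[(browser_id, url)] = fetch(url)

    def take_uncacheable(self, browser_id: str, url: str) -> Optional[bytes]:
        # Served at most once, and only to the same browser.
        return self.private.pop((browser_id, url), None)
```

In this model a second browser asking for the same uncacheable URL gets `None` from `take_uncacheable`, so the edge server would have to go forward on its behalf.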
`
[0026] In one embodiment, the following restrictions
apply when determining whether to scan a response body for
prefetchable content or to prefetch a referenced object: the
edge server applies prefetching only to responses with a
content-type header that begins with certain strings (e.g.,
text/html, or some other given format); only responses with
HTTP status codes of 200 or 404 are scanned for prefetchable
objects; objects to be prefetched are referenced using
the same protocol (HTTP or HTTPS) as the client used to
request the original page; and object references use the same
hostname as the original request.
`
`[0027] The following provides a more detailed description
`of the prefetching feature including descriptions of the
request flow, the conditions for scanning the base page, the
`content of prefetch requests, and composition of the
`browser-ID for uncacheable content. By way of background,
`without prefetching, the edge server requests content from
`the origin server (or a parent edge server) only when it
`receives a request for the content from an end client
(browser). This means that images referenced by a page are
not retrieved until the end user’s browser has received and
read the page and requested those images. The normal
`request flow is as follows: the browser requests the page
`from the edge server, the edge server retrieves the page from
`cache (or from the origin server if the page is not already in
`cache), the edge server returns the page to the browser, the
`browser scans the contents of the page and requests the
`objects referenced by the page, the edge server retrieves the
`images and other content from cache (or from the origin
`server if the objects are not already in cache), and the edge
`server returns the requested objects to the browser. With
`prefetching enabled, however, the edge server actively scans
`the page for embedded images and scripts and retrieves
`these objects before they are requested by the end-user’s
`browser. The new request flow is as follows. The end-user
`client requests a page from the edge server; the edge server
`retrieves the page from cache (or from the origin server if the
`page is not already in cache). The edge server then scans the
`page (usually HTML) for referenced images and scripts at
`the same time it serves the page to the browser. Note that it
`is not required that every response is scanned. The condi-
`tions that determine whether a response is scanned are
`described below. Now, overlapping in time, the following
`events occur: the edge server retrieves the referenced images
`and scripts from cache (or from the origin server if the
`objects are not already in cache), and the browser scans the
`page and requests the objects referenced by it. The edge
`server returns the requested objects to the client browser.
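
The overlap in time between serving the page and retrieving its embedded objects can be illustrated with a small asyncio sketch. The function `serve_with_prefetch` and its three callback parameters (`fetch`, `find_refs`, `send`) are hypothetical stand-ins for the edge server's cache/origin retrieval, page scanner, and client connection; the sketch shows only the ordering of events, not an actual implementation.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def serve_with_prefetch(
    page_url: str,
    fetch: Callable[[str], Awaitable[bytes]],
    find_refs: Callable[[bytes], List[str]],
    send: Callable[[bytes], Awaitable[None]],
) -> Dict[str, bytes]:
    """Serve a page while prefetching the objects it references."""
    page = await fetch(page_url)  # from cache, or from origin on a miss
    refs = find_refs(page)        # scan the page for referenced objects
    # Start the prefetches without waiting for them to complete.
    prefetches = [asyncio.create_task(fetch(ref)) for ref in refs]
    await send(page)              # page delivery overlaps the prefetches
    bodies = await asyncio.gather(*prefetches)
    return dict(zip(refs, bodies))
```

By the time the browser has parsed the page and asks for the embedded objects, the returned mapping would already hold them (or the fetches would at least be in flight).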
[0028] Preferably, when prefetching is used, the edge
server scans appropriate responses from the origin. Not
every response is required to be scanned. The edge server
scans the response and begins prefetching embedded objects
if one or more of the following conditions are true (and
preferably all of them must be): a prefetching status is “on”
for this request, an HTTP status code of the response sent to
the client is 200 or 404, a response content-type starts with
one of a set of configured strings (by default, the edge server
scans the response only if the content type starts with
text/html), no preset limit (a configurable threshold) on an
average number of prefetch requests per unit time has been
reached, a prefetch-on-hit metadata tag is “on” if the page is
already in cache, an indirect-only metadata tag is “off” if the
edge server is an edge server that connects directly to the
origin, a push.status metadata tag is “off” if the edge server
is connecting to a cache hierarchy parent, and the edge
server received a special request header from a child peer
(that reads X-Cdnsp-Prefetch: type=push-to-edge) if the
edge server is a cache hierarchy parent.
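
The scan conditions listed above can be read as a single predicate. The sketch below takes the "preferably all of them" reading and assumes a hypothetical `ScanContext` record whose field names are invented for the example; only the tag names and the 200/404 and text/html values come from the description.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ScanContext:
    """Hypothetical snapshot of request/response state and metadata tags."""
    prefetch_status: bool = False
    response_status: int = 200
    content_type: str = "text/html"
    rate_limit_reached: bool = False
    page_was_cache_hit: bool = False
    prefetch_on_hit: bool = False
    connects_directly_to_origin: bool = False
    indirect_only: bool = False
    connects_to_cache_parent: bool = False
    push_status: bool = False
    is_cache_parent: bool = False
    has_push_to_edge_header: bool = False
    scannable_types: Tuple[str, ...] = ("text/html",)


def should_scan(ctx: ScanContext) -> bool:
    """Return True only if every applicable condition holds."""
    if not ctx.prefetch_status:
        return False
    if ctx.response_status not in (200, 404):
        return False
    if not ctx.content_type.lower().startswith(ctx.scannable_types):
        return False
    if ctx.rate_limit_reached:
        return False
    if ctx.page_was_cache_hit and not ctx.prefetch_on_hit:
        return False
    if ctx.connects_directly_to_origin and ctx.indirect_only:
        return False
    if ctx.connects_to_cache_parent and ctx.push_status:
        return False
    if ctx.is_cache_parent and not ctx.has_push_to_edge_header:
        return False
    return True
```

The last four checks are conditional: each tag matters only in the situation it is paired with, which is why a page served from origin is unaffected by the prefetch-on-hit setting.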
[0029] When the edge server identifies a response that
should trigger prefetching, it parses the page (usually
`HTML) as it sends the response to the client. Each time the
`edge server encounters a tag in the page that is a candidate
`for prefetching (with two exceptions noted below), it creates
`a dummy request for the URL for this object and applies
`metadata to the request. Preferably, this is done only for the
`first instance of an object reference, so that multiple refer-
`ences to the same object within the page do not result in
`multiple requests for that object. The following are several
`possible settings for scanning a page: scan using an HTML
`processor (which treats the entire page as SGML and scans
`for URLs contained in specific elements as listed below);
`scan using a regular expressions processor (this treats the
`entire page as plaintext and uses regular expression rules to
identify links to prefetch), and/or scan using the HTML
processor for the defined tags, using a regular expressions
`processor on the <script> elements to identify URLs within
JavaScript sections. By default, HTML elements that generate
candidates for prefetching are <IMG> and <SCRIPT>,
`although optionally a given configuration file can be used to
`configure prefetching of objects referenced by any of the
`elements listed in the table in FIG. 4. Any type of content
`object may be prefetched and not just images and scripts.
`After metadata is applied to the dummy object request, the
`edge server decides to prefetch an object if the following are
`true: prefetchable-object metadata is “on” for this object, the
`object is not found in cache (if the object is found in cache,
preferably it is moved to a hot object cache), and no
`prefetching limit has been reached. In an alternative embodi-
`ment, a regular expression parser in the edge server allows
`scanning of any text file for prefetch candidate URLs. This
`is particularly useful when the response is JavaScript. If the
`regular expressions processor is used, the configuration file
`must enclose the metadata within a match on a given
`content-type or file extension of the object to avoid having
`these settings apply to other parsing. Using the configuration
`file, it is also possible to define regular expression rules for
`selecting URLs within the JavaScript sections of an HTML
`page. This can be done by defining the regular expression
`matching rules in the configuration file but then leaving the
`processor type set to HTML; this will cause all <script>
sections in the HTML to be scanned with the regular
`expressions processor.
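
A rough sketch of the default HTML scan, using Python's standard `html.parser` module, may help make the candidate selection concrete. The class `PrefetchScanner`, the helper functions, and the assumption that the `src` attribute carries the URL are illustrative choices, not details given in the description; the real processor and the attributes in FIG. 4 may differ.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Set

# Default candidate elements named in the text; the "src" attribute is assumed.
DEFAULT_CANDIDATES: Dict[str, str] = {"img": "src", "script": "src"}


class PrefetchScanner(HTMLParser):
    """Collects the first reference to each candidate URL, in page order."""

    def __init__(self, candidates: Optional[Dict[str, str]] = None) -> None:
        super().__init__()
        self.candidates = dict(candidates or DEFAULT_CANDIDATES)
        self.urls: List[str] = []
        self._seen: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        wanted = self.candidates.get(tag)  # tag names arrive lowercased
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value and value not in self._seen:
                self._seen.add(value)      # only the first instance counts
                self.urls.append(value)


def find_candidates(page: str) -> List[str]:
    scanner = PrefetchScanner()
    scanner.feed(page)
    scanner.close()
    return scanner.urls


def should_prefetch(prefetchable_object: bool, in_cache: bool,
                    limit_reached: bool) -> bool:
    """Decision applied after metadata is applied to the dummy request."""
    return prefetchable_object and not in_cache and not limit_reached
```

Passing a larger mapping to `PrefetchScanner` (for instance adding `"iframe": "src"`) would correspond to enabling additional elements through the configuration file.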
`
`[0030] As noted above, when the page is scanned for
`prefetch candidates, an exception may prevent an otherwise
`qualified URL from being prefetched. For example, in one
`embodiment, an exception that may cause an object to be
`skipped is that the object’s URL does not have the same
hostname as the base page. Another possible exception is
that the protocol (HTTP or HTTPS) of the embedded object
`reference is not the same as the request that triggered the
`prefetch. This is not a limitation of the prefetching func-
`tionality of the invention, however, as in certain circum-
`stances it may be desired to allow the edge server to prefetch
`objects from different hostnames, or if a protocol match is
`not present.
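
The two exceptions described in this embodiment (hostname mismatch and protocol mismatch) could be checked as in the following sketch, which uses the standard `urllib.parse` module. The function name `passes_default_exceptions` is hypothetical, and resolving relative references against the base page URL is an assumption about how references would be normalized before comparison.

```python
from urllib.parse import urljoin, urlsplit


def passes_default_exceptions(base_url: str, reference: str) -> bool:
    """True if the reference uses the same protocol and hostname as the base page."""
    base = urlsplit(base_url)
    # Resolve relative references (e.g. "/img/a.gif") against the base page.
    target = urlsplit(urljoin(base_url, reference))
    same_protocol = target.scheme == base.scheme
    same_host = (target.hostname or "") == (base.hostname or "")
    return same_protocol and same_host
```

An embodiment that permits prefetching across hostnames or protocols would simply relax or skip this check.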
`
`[0031] When the edge server creates the requests to
`prefetch embedded objects, the request typically contains a
number of components. These include, for example, a
`header X-Cdnsp-Prefetch-Object, which is used to prevent
`these requests from triggering further prefetching. The
`header preferably is also sent to the origin, and it may
`include a current level of recursion when recursive prefetch-
`ing is enabled (as described below). The request may also
`include the request headers from the base page (including all
`cookies, regardless of whether the path specified for a cookie
`matches the path for the embedded object). Preferably, the
`request also includes cookies created from any Set-Cookie
`headers in the response page, where the domain of the
`Set-Cookie matches the hostname for the request. Prefer-
`ably, the edge server ignores path and secure parameters of
`the Set-Cookie headers. If necessary, using the configuration
`fil