`(12) Patent Application Publication
`     Freedman
`(10) Pub. No.: US 2003/0002484 A1
`(43) Pub. Date: Jan. 2, 2003
`
`(54) CONTENT DELIVERY NETWORK MAP GENERATION USING PASSIVE
`     MEASUREMENT DATA
`
`(76) Inventor: Avraham T. Freedman, Elkins Park, PA (US)
`
`     Correspondence Address:
`     AKAMAI TECHNOLOGIES, INC.
`     ATTENTION: DAVID H. JUDSON
`     500 TECHNOLOGY SQUARE
`     CAMBRIDGE, MA 02139 (US)
`
`(21) Appl. No.: 10/163,969
`(22) Filed: Jun. 6, 2002
`
`     Related U.S. Application Data
`(60) Provisional application No. 60/296,375, filed on Jun. 6, 2001.
`
`     Publication Classification
`(51) Int. Cl. ........ H04L 12/66
`(52) U.S. Cl. ........ 370/352; 370/401
`
`(57) ABSTRACT
`
`A routing method operative in a content delivery network (CDN)
`where the CDN includes a request routing mechanism for routing
`clients to subsets of edge servers within the CDN. According to the
`routing method, TCP connection data statistics are collected at edge
`servers located within a CDN region. The TCP connection data
`statistics are collected as connections are established between
`requesting clients and the CDN region and requests are serviced by
`those edge servers. Periodically, e.g., daily, the connection data
`statistics are provided from the edge servers in a region back to the
`request routing mechanism. The TCP connection data statistics are
`then used by the request routing mechanism in subsequent routing
`decisions and, in particular, in the map generation processes. Thus,
`for example, the TCP connection data may be used to determine
`whether a given quality of service is being obtained by routing
`requesting clients to the CDN region. If not, the request routing
`mechanism generates a map that directs requesting clients away from
`the CDN region for a given time period or until the quality of
`service improves.
`
`[Representative drawing: FIG. 1. Recoverable labels: ORIGIN SERVER,
`TAGGING, INTERNET, AGENT, MAP MAKER, DNS, CDN SERVER, REGION, POP,
`METADATA CONTROL, STAGING; reference numerals 102, 106, 107, 108,
`109, 118, 119, 120.]
`
`Amazon v. Audio Pod
`US Patent 10,735,488
`Amazon EX-1016
`
`
`
`Patent Application Publication   Jan. 2, 2003   Sheet 1 of 3   US 2003/0002484 A1
`
`[FIG. 1: content delivery network. Recoverable labels: REGION 106,
`CDN SERVER 102, METADATA CONTROL 118, STAGING 120a, 120b.]
`
`
`
`[Sheet 2 of 3]
`
`[FIG. 2: CDN DNS request routing mechanism (distributed) 200. A user
`request 202 passes through a TOP LEVEL MAP, which identifies a region
`within the CDN, and a LOW LEVEL MAP, which identifies a server within
`the region, giving the optimal server region/server identification
`(REGION X, CDN CONTENT SERVER Y).]
`
`[FIG. 3: CDN content server. Recoverable labels: GHOST, HOT OBJECT
`CACHE, KERNEL, FILE SYSTEM CACHE, TCP, DISK STORAGE; reference
`numerals 304, 306.]
`
`
`
`[Sheet 3 of 3]
`
`[FIG. 4: TCP statistics collection. Recoverable labels: EDGE SERVER,
`MONITORING 402, AGGREGATOR 408, CDN REQUEST ROUTING MECHANISM MAP
`MAKER, TCP STATS DATA, SERVER LINK, DATA FROM OTHER IN-REGION EDGE
`SERVERS.]
`
`
`
`CONTENT DELIVERY NETWORK MAP
`GENERATION USING PASSIVE MEASUREMENT
`DATA
`[0001] This application is based on and claims priority from
`Provisional Application Serial No. 60/296,375, filed Jun. 6, 2001.
`
`BACKGROUND OF THE INVENTION
`[0002] 1. Technical Field
`[0003] The present invention relates generally to high-performance,
`fault-tolerant HTTP, streaming media and applications delivery in a
`content delivery network (CDN).
`[0004] 2. Description of the Related Art
`[0005] It is well-known to deliver HTTP and streaming media using a
`content delivery network (CDN). A CDN is a network of geographically
`distributed content delivery nodes that are arranged for efficient
`delivery of digital content (e.g., Web content, streaming media and
`applications) on behalf of third party content providers. A request
`from a requesting end user for given content is directed to a “best”
`replica, where “best” usually means that the item is served to the
`client quickly compared to the time it would take to fetch it from
`the content provider origin server. An entity that provides a CDN is
`sometimes referred to as a content delivery network service provider
`or CDNSP.
`[0006] Typically, a CDN is implemented as a combination of a content
`delivery infrastructure, a request-routing mechanism, and a
`distribution infrastructure. The content delivery infrastructure
`usually comprises a set of “surrogate” origin servers that are
`located at strategic locations (e.g., Internet network access points,
`Internet Points of Presence, and the like) for delivering copies of
`content to requesting end users. The request-routing mechanism
`allocates servers in the content delivery infrastructure to
`requesting clients in a way that, for web content delivery, minimizes
`a given client's response time and, for streaming media delivery,
`provides for the highest quality. The distribution infrastructure
`consists of on-demand or push-based mechanisms that move content from
`the origin server to the surrogates. An effective CDN serves
`frequently-accessed content from a surrogate that is optimal for a
`given requesting client. In a typical CDN, a single service provider
`operates the request-routers, the surrogates, and the content
`distributors. In addition, that service provider establishes business
`relationships with content publishers and acts on behalf of their
`origin server sites to provide a distributed delivery system. A
`commercial CDN service that provides web content and media streaming
`is provided by Akamai Technologies, Inc. of Cambridge, Mass.
`[0007] A typical CDN edge server includes commodity hardware, an
`operating system such as Linux, a TCP/IP connection manager, a cache,
`and one or more applications that provide various functions such as
`cache management, logging, and other control routines that facilitate
`the content delivery techniques implemented by the CDNSP at the
`server. In an illustrative case, the operating system kernel is
`Linux-based and tracks and provides access to per-session and
`aggregate TCP/IP information, such as per-system number of packets,
`bytes sent and received, number of retransmits, and the like. The TCP
`connection information that is available from monitoring the
`operating system kernel has not been fully mined for its potential
`value, especially to CDN service providers. TCP stream state data,
`however, generates implicit information about the state of the
`network. Thus, for example, packet retransmissions can indicate
`congestion within the network. An estimated round-trip-time (RTT)
`derived from TCP connection information indicates latency to a remote
`host. Early FIN message receipt can indicate a dropped connection. A
`lower window size than usual can indicate instability in the
`topological path. Each session's overall and smaller time-scale
`throughput is one of the best measures of actual end-user performance.
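As a rough illustration of how the signals listed above might be read from one connection record, the following Python sketch flags each condition and computes overall throughput. The `ConnStats` type, its field names, and both thresholds are assumptions made for this example; they are not taken from the kernel or from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class ConnStats:
    """Hypothetical per-connection record; field names are illustrative."""
    packets_sent: int
    retransmits: int
    srtt_ms: float          # estimated round-trip time (latency to remote host)
    early_fin: bool         # FIN arrived before the response was complete
    avg_window_bytes: int
    bytes_sent: int
    duration_s: float


def interpret(stats, typical_window_bytes=65535, retransmit_limit=0.05):
    """Return (hints, throughput in bytes/s) for one connection.

    Both thresholds are assumed values chosen only for this sketch.
    """
    hints = []
    if stats.packets_sent and stats.retransmits / stats.packets_sent > retransmit_limit:
        hints.append("possible congestion")
    if stats.early_fin:
        hints.append("possible dropped connection")
    if stats.avg_window_bytes < typical_window_bytes // 2:
        hints.append("possible path instability")
    throughput = stats.bytes_sent / stats.duration_s if stats.duration_s > 0 else 0.0
    return hints, throughput
```

For example, a connection that retransmitted 10 of 100 packets would be flagged as possibly congested under these assumed thresholds, while its throughput is reported separately as the end-user performance measure.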
`[0008] It would be desirable to be able to use edge server CDN
`statistics in other CDN control processes.
`
`BRIEF SUMMARY OF THE INVENTION
`[0009] According to the invention, TCP connection information
`resulting from prior CDN mapping decisions to a given edge server
`region (or to a given edge server therein) is logged, aggregated, and
`then used to improve subsequent routing of client requests to servers
`in a content delivery network.
`[0010] More generally, it is an object of the invention to use
`passive measurement data to facilitate the generation or evaluation
`of client-to-server request routing maps in a content delivery
`network. Passive measurement data is logged at CDN edge server
`machines, preferably on a per-connection basis or a per-HTTP
`connection basis.
`[0011] It is another more specific object of the invention to collect
`TCP connection information from CDN edge servers to allow network
`performance to be correlated with particular hosts or address blocks,
`allowing for improved maps to be generated during the CDN map
`generation process.
`[0012] According to the present invention, TCP statistics data from
`remote machines is logged and delivered back to a central location
`and used by a CDN to generate request routing maps, such as an IP
`block to CDN region map. This enables the CDN map to be modified as a
`function of passive measurement data that reflects how well the CDN
`request routing mechanism actually mapped prior web requests.
`[0013] The present invention generally describes a routing method
`operative in a content delivery network having a request routing
`mechanism for routing clients to edge servers. At a given edge server
`located within a CDN region, data associated with one or more
`connections that have been established between requesting clients and
`the CDN region is collected. That data is then provided back to the
`request routing mechanism, where it is used in a subsequent routing
`decision. Preferably the data is per-HTTP connection data collected
`from a configurable percentage of client requests that are serviced
`by the given edge server. This TCP connection data preferably is
`aggregated with similar data from other edge servers in the CDN
`region before being passed back to the CDN request routing mechanism.
`This enables the request routing mechanism to make new maps based on
`an accurate view as to how well given connections are being serviced
`within the CDN region.
`[0014] In a more detailed, yet illustrative embodiment, a routing
`method is operative in a content delivery network (CDN) where the CDN
`includes a request routing mechanism for routing clients to subsets
`of edge servers within the CDN. According to the routing method, TCP
`connection data statistics are collected at edge servers located
`within a CDN region comprising a subset of edge servers. The TCP
`connection data statistics are collected as connections are
`established between requesting clients and the CDN region and
`requests are serviced by those edge servers. Either in real-time or
`delayed (e.g., hourly or daily), the detailed and/or summarized
`connection data statistics are provided from the edge servers in a
`region back to the request routing mechanism. The TCP connection data
`statistics are then used by the request routing mechanism in
`subsequent routing decisions and, in particular, in the map
`generation processes. Thus, for example, the TCP connection data may
`be used to determine whether a given quality of service is being
`obtained by routing requesting clients to the CDN region. If not, the
`request routing mechanism generates a map that directs requesting
`clients away from the CDN region for a given time period or until the
`quality of service improves.
`[0015] The foregoing has outlined some of the more pertinent objects
`and features of the present invention. These objects should be
`construed to be merely illustrative of some of the more prominent
`features and applications of the invention.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`[0016] For a more complete understanding of the present invention and
`the advantages thereof, reference should be made to the following
`Detailed Description taken in connection with the accompanying
`drawings, in which:
`[0017] FIG. 1 is a diagram of a known content delivery network in
`which the present invention may be implemented;
`[0018] FIG. 2 is a simplified diagram of a two level request routing
`mechanism used in the content delivery network of FIG. 1;
`[0019] FIG. 3 is a simplified diagram of a typical CDN edge server
`that has been modified to include the TCP statistics monitoring
`process according to the present invention; and
`[0020] FIG. 4 is a simplified diagram of how TCP data is logged,
`aggregated and then delivered to a CDN request routing mechanism in
`an illustrative embodiment of the present invention.
`
`DESCRIPTION OF THE PREFERRED
`EMBODIMENT
`[0021] As seen in FIG. 1, an Internet content delivery infrastructure
`usually comprises a set of “surrogate” origin servers 102 that are
`located at strategic locations (e.g., Internet network access points,
`and the like) for delivering copies of content to requesting end
`users 119. A surrogate origin server is defined, for example, in IETF
`Internet Draft titled “Requirements for Surrogates in the HTTP” dated
`Aug. 9, 2000, which is incorporated herein by reference. The
`request-routing mechanism 104 allocates servers 102 in the content
`delivery infrastructure to requesting clients in a way that, for web
`content delivery, minimizes a given client's response time and, for
`streaming media delivery, provides for the highest quality. The
`distribution infrastructure consists of on-demand or push-based
`mechanisms that move content from the origin server to the
`surrogates. A CDN service provider (CDNSP) may organize sets of
`surrogate origin servers as a “region.” In this type of arrangement,
`a CDN region 106 typically comprises a set of one or more content
`servers that share a common backend, e.g., a LAN, and that are
`located at or near an Internet access point. Thus, for example, a
`typical CDN region may be co-located within an Internet Service
`Provider (ISP) Point of Presence (PoP) 108. A representative CDN
`content server is a Pentium-based caching appliance running an
`operating system (e.g., Linux, Windows NT, Windows 2000) and having
`suitable RAM and disk storage for CDN applications and content
`delivery network content (e.g., HTTP content, streaming media and
`applications). Such content servers are sometimes referred to as
`“edge” servers as they are located at or near the so-called outer
`reach or “edges” of the Internet. The CDN typically also includes
`network agents 109 that monitor the network as well as the server
`loads. These network agents are typically co-located at third party
`data centers or other locations. Map maker software 107 receives data
`generated from the network agents and periodically creates maps that
`dynamically associate IP addresses (e.g., the IP addresses of
`client-side local name servers) with the CDN regions.
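A map that associates IP addresses with CDN regions can be pictured as a lookup table keyed by address block. The Python sketch below is one minimal way to consult such a table, assuming CIDR-style blocks and a most-specific-block-wins rule; the table format, the function names, and the region names are hypothetical, and the disclosure does not state how the map is stored or searched.

```python
import ipaddress


def build_map(entries):
    """entries: iterable of (cidr_string, region_name) pairs."""
    table = [(ipaddress.ip_network(cidr), region) for cidr, region in entries]
    # Assumed rule: longest prefixes first, so the most specific block wins.
    table.sort(key=lambda item: item[0].prefixlen, reverse=True)
    return table


def lookup_region(table, address):
    """Return the region mapped to the address, or None if no block matches."""
    addr = ipaddress.ip_address(address)
    for network, region in table:
        if addr.version == network.version and addr in network:
            return region
    return None
```

With a table containing `192.0.2.0/24` mapped to one region and `192.0.2.128/25` mapped to another, an address in the upper half of the block resolves to the second region, and an address outside every block resolves to `None`.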
`[0022] In one service offering, available from Akamai Technologies,
`Inc. of Cambridge, Mass., content is marked for delivery from the CDN
`using a content migrator or rewrite tool 106 operated, for example,
`at a participating content provider server. Tool 106 rewrites
`embedded object URLs to point to the CDNSP domain. A request for
`CDN-enabled content is resolved through a CDNSP-managed DNS to
`identify a “best” region, and then to identify an edge server within
`the region that is not overloaded and that is likely to host the
`requested content. An illustrative request routing technique is
`described in U.S. Pat. No. 6,108,703, which is incorporated by
`reference. Instead of using content provider-side migration (e.g.,
`using the tool 106), a participating content provider may simply
`direct the CDNSP to serve an entire domain (or subdomain) by a DNS
`directive (e.g., a CNAME). In such case, the CDNSP may provide
`object-specific metadata to the CDN content servers to determine how
`the CDN content servers will handle a request for an object being
`served by the CDN. Metadata, as used herein, refers to the set of
`control options and parameters for an object (e.g., coherence
`information, origin server identity information, load balancing
`information, customer code, other control codes, etc.), and such
`information may be provided to the CDN content servers via a
`configuration file, in HTTP headers, or in other ways. An object URL
`that is served from the CDN in this manner need not be modified by
`the content provider. When a request for the object is made, for
`example, by having an end user navigate to a site and select the URL,
`a customer's DNS system directs the name query (for a domain in the
`URL) to the CDNSP DNS request routing mechanism. Once an edge server
`is identified, the browser passes the object request to the server,
`which applies the metadata supplied from a configuration file or HTTP
`response headers to determine how the object will be handled.
`[0023] The CDNSP may operate a metadata transmission system 116
`comprising a set of one or more servers to enable metadata to be
`provided to the CDNSP content servers. The system 116 may comprise at
`least one control server 118, and one or more staging servers 120a-n,
`each of which is typically an HTTP server (e.g., Apache). Metadata is
`provided to the control server 118 by the CDNSP or the content
`provider (e.g., using a secure extranet application) and periodically
`delivered to the staging servers 120a-n. The staging servers deliver
`the metadata to the CDN content servers as necessary.
`[0024] As illustrated in FIG. 2, a dynamic DNS system 200 such as
`described generally above directs each user web request 202 to the
`optimal server 204 for content delivery. In one approach, a “top
`level” map 206 directs a specific request to one of a given number of
`server regions, while a “low level” map 208 further directs the
`request to a given server within a region. Thus, for example, the top
`level map 206 may associate each Internet IP address block with a CDN
`server region that can deliver content to clients in that block most
`quickly. To prepare for generating this map, mapping agents (e.g.,
`one per CDN server region) may collect the following information: (a)
`IP blocks (a list of IP address blocks currently in use in the
`Internet), (b) load (per-IP block measurements of the amount of web
`load currently being handled by the CDN), (c) communication costs
`(e.g., a table listing the measured communication cost for each {IP
`block, CDN server region} pair), and (d) capacity (e.g., an aggregate
`server and network capacity of each CDN server region). A combination
`of different methods may be used to put together the list of IP
`blocks representing all of the leaf networks (e.g., endpoint LANs on
`the global Internet): BGP peering, harvesting information from
`network registration databases (e.g., RIPE, APNIC and ARIN), and
`random traceroutes into very large blocks (e.g., UUNET). The load on
`the CDN generated by each IP block may be determined by gathering and
`aggregating measurements from the CDN content servers. One or more
`different communication costs may be used to determine the cost of
`communication between an IP block and a CDN server region: network
`health of server region (e.g., a binary metric indicating that the
`region is up or down), ASPATH length between the block and the server
`region (e.g., as supplied by BGP), round trip time (RTT) between the
`region's mapping agent and a given point in the IP block, packet loss
`rate between the region's mapping agent and the given point in the IP
`block, geographic distance, and perhaps others. These metrics may be
`combined into a single cost metric for each {IP block, server region}
`pair, with the priority, or weighting, of each individual metric set
`to be proportional to its position on the list. Two types of capacity
`measurement may be made: total server capacity in each region and
`physical network capacity in each region. The server capacity is
`determined, for example, from the number of servers currently up in a
`region. Physical network capacity is determined, for example, with
`packet pair measurements. Region capacity may be calculated as a
`given function (e.g., the minimum) of these two measurements.
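The combination of individual metrics into a single cost, and the derivation of region capacity, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each metric is taken to be already normalised to the range 0 to 1, the weights (4, 3, 2, 1 in list order) are one possible reading of "proportional to its position on the list", a region that is down is given infinite cost, and all names are hypothetical.

```python
def combined_cost(metrics, weights):
    """Weighted sum of normalised metrics for one {IP block, server region} pair.

    metrics: dict with a boolean "region_up" plus one value per weighted metric.
    weights: dict of metric name -> weight (assumed values, see above).
    """
    if not metrics["region_up"]:
        return float("inf")  # assumed treatment of the binary health metric
    return sum(weights[name] * metrics[name] for name in weights)


def region_capacity(server_capacity, network_capacity):
    """Region capacity as the minimum of the two measurements, as in the text."""
    return min(server_capacity, network_capacity)
```

For instance, with weights `{"as_path": 4, "rtt": 3, "loss": 2, "distance": 1}` and metric values 0.5, 0.2, 0.0 and 1.0, the combined cost is 4(0.5) + 3(0.2) + 2(0.0) + 1(1.0) = 3.6. A region whose servers could carry 800 units but whose network could carry only 500 would be given a capacity of 500.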
`[0025] As noted above, the top level map 206 maps each IP block to an
`optimal CDN server region. One technique for generating the top level
`map involves identifying a number of candidate regions for each IP
`block (e.g., based on the {IP block, server region} communication
`costs), generating a bipartite graph using all of the measured and
`collected network information (e.g., with one side of the graph
`representing each of the IP blocks and the other side representing
`CDN server regions), and then running a min-cost flow algorithm on
`the graph. Each IP block node is labeled with its measured load,
`which is treated as the “flow” coming from that node. Running the
`algorithm results in an optimal assignment of IP block load to server
`regions. This assignment is the top level map, which is generated
`periodically and then delivered to the dynamic DNS request routing
`mechanism. The above map generation process is merely exemplary and
`is not meant to limit the present invention of course.
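The bipartite min-cost flow formulation described above can be made concrete with a small Python sketch. It uses a textbook successive-shortest-path solver, chosen here only because it is short; the disclosure does not say which min-cost flow algorithm is run. The class and function names, the input dictionaries, and the choice of region capacity as the limit on each region-to-sink edge are all assumptions for illustration.

```python
from collections import deque


class MinCostFlow:
    """Successive shortest paths on a residual graph (small inputs only)."""

    def __init__(self, node_count):
        self.n = node_count
        # Each edge is [to, residual_capacity, unit_cost, index_of_reverse_edge].
        self.graph = [[] for _ in range(node_count)]

    def add_edge(self, u, v, capacity, cost):
        self.graph[u].append([v, capacity, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def solve(self, source, sink):
        total_flow = total_cost = 0
        while True:
            # Queue-based Bellman-Ford: residual edges may have negative cost.
            dist = [float("inf")] * self.n
            prev = [None] * self.n
            queued = [False] * self.n
            dist[source] = 0
            queue = deque([source])
            while queue:
                u = queue.popleft()
                queued[u] = False
                for i, (v, cap, cost, _) in enumerate(self.graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, i)
                        if not queued[v]:
                            queued[v] = True
                            queue.append(v)
            if prev[sink] is None:
                return total_flow, total_cost
            # Find the bottleneck on the cheapest path, then push flow along it.
            push, v = float("inf"), sink
            while v != source:
                u, i = prev[v]
                push = min(push, self.graph[u][i][1])
                v = u
            v = sink
            while v != source:
                u, i = prev[v]
                edge = self.graph[u][i]
                edge[1] -= push
                self.graph[v][edge[3]][1] += push
                v = u
            total_flow += push
            total_cost += push * dist[sink]


def top_level_map(loads, capacities, costs):
    """loads: {block: load}; capacities: {region: capacity};
    costs: {(block, region): unit cost} for candidate pairs only.
    Returns {block: {region: assigned load}}."""
    blocks, regions = list(loads), list(capacities)
    b_id = {b: 1 + i for i, b in enumerate(blocks)}
    r_id = {r: 1 + len(blocks) + j for j, r in enumerate(regions)}
    sink = 1 + len(blocks) + len(regions)
    mcf = MinCostFlow(sink + 1)          # node 0 is the source
    for b in blocks:
        mcf.add_edge(0, b_id[b], loads[b], 0)
    for r in regions:
        mcf.add_edge(r_id[r], sink, capacities[r], 0)
    pair_edge = {}
    for (b, r), cost in costs.items():
        pair_edge[(b, r)] = len(mcf.graph[b_id[b]])
        mcf.add_edge(b_id[b], r_id[r], loads[b], cost)
    mcf.solve(0, sink)
    assignment = {b: {} for b in blocks}
    for (b, r), idx in pair_edge.items():
        used = loads[b] - mcf.graph[b_id[b]][idx][1]
        if used > 0:
            assignment[b][r] = used
    return assignment
```

In this sketch a block's load may be split across regions when the cheapest region lacks capacity. Whether a production map would allow such splits, or would round each block to a single region, is not something the text specifies.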
`[0026] FIG. 3 illustrates a typical machine configuration for a CDN
`content server. Typically, the content server 300 is a Pentium-based
`caching appliance running an operating system kernel 302 (e.g., based
`on Linux), a file system cache 304, CDN control software 306, TCP
`connection manager 308, and disk storage 310. CDN control software
`306, among other things, is useful to create an object cache 312 for
`popular objects being served by the CDN. In operation, the content
`server 300 receives end user requests for HTTP content, determines
`whether the requested object is present in the hot object cache or
`the disk storage, serves the requested object (if it is present) via
`HTTP, or it establishes a connection to another content server or an
`origin server to attempt to retrieve the requested object upon a
`cache miss. According to the invention, the CDN software 306 also
`includes a logging routine, called TCPStats 314, which in an
`illustrative embodiment logs a record for every TCP connection made
`to/by the machine on which this software is running in addition to
`connections made to/by the CDN software process itself. Generalizing,
`the TCPStats process logs arbitrary pieces of information about a TCP
`connection.
`[0027] In an illustrative embodiment as shown in FIG. 4, each edge
`server 400 in a region runs one or more monitoring processes 402, and
`an instance of a query process 404. A monitoring process monitors the
`health of the local machine and the network to which it is connected;
`another monitoring process monitors the hits and bytes served by the
`CDN software running on the machine. The TCP statistics monitoring is
`preferably performed by one of these monitoring processes 402.
`Generally, the TCP statistics data is collected by that process and
`made available to the local instance of the query process 404.
`Periodically, a central instance of the query process 406 running on
`an aggregator machine 408 (typically somewhere else in the network)
`makes a request to the local instance of the query process. There may
`be a hierarchy of aggregators, depending on the size and scope of the
`network deployment. When requested, the query process collects tables
`of data from machines in the same region (typically within a given
`data center) and relays them to the aggregator machine 408, which
`accumulates and stores the data. According to the invention, the TCP
`statistics data is then supplied to the CDN request routing mechanism
`410 to facilitate future mapping decisions. Data preferably is
`delivered between machines over a secure connection, which can be
`implemented with known software tools.
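The collect-and-relay flow described above can be sketched as two small pieces in Python: a stand-in for the in-region query process that flattens per-machine tables, and an aggregator that accumulates them per region and client block. The row format, the field names, and the particular summaries kept (retransmit rate and mean SRTT) are assumptions for this example; transport and security between machines are omitted.

```python
from collections import defaultdict


def collect_region(region, machine_tables):
    """Stand-in for the in-region query process.

    machine_tables: {machine_name: [row, ...]}, each row a dict with the
    assumed keys "client_block", "packets_sent", "retransmits", "srtt_ms".
    Returns a flat list of rows tagged with region and machine.
    """
    return [dict(row, region=region, machine=machine)
            for machine, rows in machine_tables.items()
            for row in rows]


class Aggregator:
    """Accumulates relayed rows per (region, client block)."""

    def __init__(self):
        self.totals = defaultdict(lambda: {
            "connections": 0, "packets_sent": 0,
            "retransmits": 0, "srtt_sum_ms": 0.0})

    def accumulate(self, rows):
        for row in rows:
            t = self.totals[(row["region"], row["client_block"])]
            t["connections"] += 1
            t["packets_sent"] += row["packets_sent"]
            t["retransmits"] += row["retransmits"]
            t["srtt_sum_ms"] += row["srtt_ms"]

    def summary(self):
        """Per-key summary suitable for handing to a map-making step."""
        return {
            key: {
                "connections": t["connections"],
                "retransmit_rate": (t["retransmits"] / t["packets_sent"]
                                    if t["packets_sent"] else 0.0),
                "mean_srtt_ms": t["srtt_sum_ms"] / t["connections"],
            }
            for key, t in self.totals.items()
        }
```

A hierarchy of aggregators could be modelled by having a higher-level instance call `accumulate` on rows relayed from several lower-level ones, although that arrangement is not shown here.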
`[0028] Generalizing, TCPStats data aggregated from the CDN content
`servers is used in subsequent revisions to a given map, e.g., the IP
`block to CDN region map. In particular, the TCPStats data provides an
`additional refinement to the map making process to provide a map that
`includes passive measurement data about how a given number of
`individual requests were previously routed by the CDN request routing
`mechanism. This feedback mechanism enables a more accurate map to be
`generated in the future based, in part, on an after-the-fact
`evaluation of how well earlier CDN mapping decisions routed prior
`requests, preferably on an aggregate basis, as evidenced by the
`actual TCP statistics logged at the CDN content servers within a
`given region. If, for example, those statistics illustrate that prior
`mapping decisions with respect to a given region did not provide a
`sufficient quality of service, then the map making process can be
`modified appropriately.
`[0029] As a specific example, assume that TCPStats data is aggregated
`on a per machine and per region basis. This data enables a given
`process to monitor the health of the region itself, especially if the
`data is used in conjunction with other historical data. The TCPStats
`data provides detailed information about the quality of the
`connections to the various machines in the region. If that data
`establishes that connections to the region (in general, or for a
`specific IP block mapped to the region) are receiving a quality of
`service below a given threshold, the map making algorithm may then
`bias requests away from that region for a given time period or until
`the connectivity data shows improvement. As another example, assume
`that the map generation process identifies two (2) regions that
`appear equally good for handling a given request. In such case, the
`TCPStats data can be used as a tiebreaker. In addition, the TCPStats
`data may be used to provide an indication of how well the mapping
`algorithm performed over a given time period (e.g., daily). Of
`course, the above examples are merely exemplary and should not be
`taken to limit the scope of the present invention, which should be
`broadly construed to cover the use of the TCPStats passive
`measurement data to facilitate the generation or evaluation of
`client-to-server request routing maps in a content delivery network.
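The two uses of TCPStats data described above, biasing away from a region below a quality threshold and breaking ties between equally good regions, can be sketched as a single selection function in Python. The quality score (a number between 0 and 1 derived from the aggregated statistics), the threshold, the fallback when every candidate is below threshold, and the function name are all assumptions for this illustration.

```python
def choose_region(candidates, qos, threshold):
    """Pick a region for one IP block.

    candidates: list of (region, cost) pairs from the map generation process.
    qos: {region: observed quality score in 0..1}; regions with no data are
         treated as sitting exactly at the threshold (an assumed default).
    """
    def score(region):
        return qos.get(region, threshold)

    # Bias away from regions whose observed quality is below the threshold.
    healthy = [(r, c) for r, c in candidates if score(r) >= threshold]
    # Assumed fallback: if nothing passes, keep all candidates so the block
    # is still mapped somewhere.
    pool = healthy or candidates
    # Lowest cost wins; equal costs are broken by the higher observed quality.
    return min(pool, key=lambda rc: (rc[1], -score(rc[0])))[0]
```

The time-limited aspect ("for a given time period or until the connectivity data shows improvement") is not modelled. In this sketch it would follow from recomputing `qos` each time fresh statistics arrive.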
`[0030] The TCP/IP protocol’s fully-reliable transport model and
`intricate congestion control mechanisms allow a CDNSP to gather a
`great deal of useful information. The following is representative.
`Thus, on a per client × server × URL basis, the CDNSP can determine,
`for example: the number of bytes transmitted, the duration of
`connection (including the duration of each phase of the connection),
`loss seen in the connection, latency between client and server as
`measured and used, variance in latency seen between the client and
`server, the maximum/average measurements of the size of the network
`connection between the client and server, overall and instantaneous
`throughput, window size and the like. In an illustrative embodiment,
`TCP statistics across three (3) axes (client, server, and URL) are
`collected by the TCPStats process and are used to provide a profiling
`tool for every connection.
`[0031] More specifically, TCP statistics entries may include one or
`more of the following fields (familiarity with the TCP/IP protocol is
`assumed):
`[0032] Time initial SYN packet was received (sent): this is the time
`the first packet on the connection was received (if the connection
`came from a remote client) or sent (if a connection is being
`established to a remote server). The time is expressed in sec.msec,
`where sec is the number of seconds since a Unix epoch and msec is the
`number of milliseconds since the beginning of that second. All other
`times preferably are offsets from this time.
`[0033] Local IP address:port: the IP address of the machine that the
`CDN software runs on, which is specified in the 4 byte dotted quad
`notation (w.x.y.z) followed by a colon (:) and the local IP port
`number.
`[0034] Direction: a single character identifier that tells if the
`connection was made local to remote machine (>) or remote to local
`machine (<).
`[0035] Remote IP address:port: IP address of the remote machine in 4
`byte format, a colon, and the remote IP port number.
`[0036] Number of packets received.
`[0037] Number of packets sent.
`[0038] Number of duplicate packets sent (retransmits).
`[0039] Total bytes sent.
`[0040] Total bytes received.
`[0041] Total duplicate bytes sent and received.
`[0042] Max Smooth Round Trip Time (SRTT) during the connection (in
`msec).
`[0043] Min Smooth Round Trip Time during the connection.
`[0044] Log of RTT estimates obtained and/or summary statistics.
`[0045] Log of calculated SRTT values and/or summary statistics.
`[0046] Time spent in each phase of the states associated with the TCP
`connection:
`[0047] From begin until ESTABLISHED: the elapsed time from the
`receipt of the initial SYN from the client (the second field in the
`log entry) until the ACK of the initial SYN-ACK is received by the
`CDN software process. In the case of a forward connection, this is
`the time from SYN send until the SYN-ACK was received by the remote
`server. This and all other delta times below are expressed as msec,
`the number of milliseconds from the connection begin time (SYN time,
`as described above).
`[0048] Time from begin until FIN_WAIT: the elapsed time between when
`the connection began and when the connection got into the FIN_WAIT
`state (zero if not applicable).
`[0049] Time from begin until FIN_WAIT1 state (zero if not
`applicable).
`[0050] Time from begin until FIN_WAIT2 state (zero if not
`applicable).
`[0051] Time from begin until CLOSING state (zero if not applicable).
`[0052] Time from begin until the last ACK was received (zero if not
`applicable).
`[0053] Time from begin until WAIT state (zero if not applicable).
`[0054] Number of duplicate ACKs sent.
`[0055] Max window size (in bytes).
`[0056] Number of times the RTO timer expired.
`[0057] Delayed ACK count.
`[0058] Average window size.
`[0059] Average IP TTL observed.
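To make the first few fields of the list above concrete, the following Python sketch parses them from one log line. The on-disk layout is not given in the text, so a whitespace-separated line carrying only the first seven fields is assumed here, and the msec part of the timestamp is read as an integer count of milliseconds. The `TcpStatsEntry` type and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TcpStatsEntry:
    start_ms: int           # connection start, milliseconds since the Unix epoch
    local: tuple            # (ip, port) of the machine running the CDN software
    outbound: bool          # True for '>' (local to remote), False for '<'
    remote: tuple           # (ip, port) of the remote machine
    packets_received: int
    packets_sent: int
    retransmits: int


def parse_endpoint(text):
    """Split 'w.x.y.z:port' into (ip, port)."""
    ip, _, port = text.rpartition(":")
    return ip, int(port)


def parse_entry(line):
    """Parse the assumed layout:
    sec.msec local:port direction remote:port pkts_rcvd pkts_sent retransmits
    """
    fields = line.split()
    sec, msec = fields[0].split(".")
    if fields[2] not in ("<", ">"):
        raise ValueError("direction must be '<' or '>'")
    return TcpStatsEntry(
        start_ms=int(sec) * 1000 + int(msec),
        local=parse_endpoint(fields[1]),
        outbound=(fields[2] == ">"),
        remote=parse_endpoint(fields[3]),
        packets_received=int(fields[4]),
        packets_sent=int(fields[5]),
        retransmits=int(fields[6]),
    )
```

Because the later fields are offsets from the start time, storing `start_ms` as a single integer makes it simple to convert any phase time, such as the time until ESTABLISHED, to an absolute time by addition.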
`[0060] TCPStats data is generated by any convenient mechanism. The
`Linux operating system kernel provides some of this data directly. In
`particular, the kernel keeps track of and provides access to
`aggregate information including per-system number of packets, bytes
`sent and received, and number of retransmits, among other things. To
`facilitate TCP statistics collection, the operating system kernel
`preferably is modified to provide access to per-connection
`statistics. The modified code keeps track of that information per
`connection (in the kernel) and provides an interface for an
`application to mark a connection as interesting and to get its
`connection information when the connection is complete. Preferably,
`the application also implements per-HTTP connection statistics. This
`requires marking a TCP connection with the beginning and end points
`of an HTTP request. The kernel keeps track of bytes sent/received for
`the duration of the request and provides statistics to the
`application upon request. This allows a more accurate estimation of
`per-connection bandwidth than is possible with per-connection
`statistics because many TCP connections are allowed to stay open (in
`an HTTP persistent connection state) after the HTTP response has been
`sent, in the hopes another request will reuse the established
`connection. In contrast, just looking at bytes sent/total time is not
`as accurate a measure, as the per-connection time will reduce the
`apparent bandwidth by a significant amount. In an illustrative
`embodiment, these statistics are provided by the kernel to user space
`preferably through a device file interface, which is a standard way
`for the kernel to communicate with an application. The statistics
`themselves preferably are kept in a circular memory buffer so that
`the kernel does not run out of memory even if the logging application
`lags behind. Preferably, the application is designed to read all
`available statistics out of the kernel at a configurable interval
`(e.g., once per second) and to write statistics into a log for a
`configurable fraction of all requests (e.g., 1%). This allows the
`application to obtain a statistical sample of all of the traffic
`served from the machine. Preferably, the application marks when it is
`sending and receiving data to get better bandwidth measurements. More
`information about the TCP/IP protocol and the Linux operating system
`kernel can be obtained from the following resources: Stevens, TCP/IP
`Illustrated Volume 1: The Protocols. Addison-Wesley, and Beck, et
`al., Linux Kernel Internals, Second Edition. Addison-Wesley.
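Three ideas from the paragraph above lend themselves to a short user-space sketch in Python: a bounded buffer that overwrites its oldest entries, sampled logging of a configurable fraction of entries, and the difference between per-request and per-connection bandwidth. The sketch only models the behaviour; it is not kernel code, it does not use the device file interface, and the class and function names are hypothetical.

```python
import random
from collections import deque


class StatsRing:
    """Fixed-size buffer: once full, each new entry overwrites the oldest,
    so memory use stays bounded even if the reader lags behind."""

    def __init__(self, capacity):
        self._buf = deque(maxlen=capacity)

    def push(self, entry):
        self._buf.append(entry)

    def drain(self):
        """Return and clear everything currently buffered (oldest first)."""
        items = list(self._buf)
        self._buf.clear()
        return items


def sample(entries, fraction, rng=random.random):
    """Keep roughly `fraction` of the entries (e.g., 0.01 for 1%)."""
    return [entry for entry in entries if rng() < fraction]


def bandwidth(bytes_sent, start_s, end_s):
    """Bytes per second over the marked interval."""
    return bytes_sent / (end_s - start_s)
```

As a worked comparison, suppose a 100,000-byte response is sent in 0.5 s on a persistent connection that then stays open until 15 s. Measured over the marked request, the bandwidth is 100,000 / 0.5 = 200,000 bytes/s. Measured over the whole connection, it is 100,000 / 15, or roughly 6,667 bytes/s, which understates what the client actually experienced.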
`[0061] Other techniques for collecting the TCP statistics information
`may also be used. Thus, for example, the CDN edge server may be
`provisioned with a tcpdump process and a filter to look at the TCP
`packet headers and collect information from them (



