`
`HTTP: The Definitive Guide
`
`O'REILLY®
`
`David Gourley & Brian Totty
`with Marjorie Sayer, Sailu Reddy & Anshu Aggarwal
`
`HTTP: The Definitive Guide
`by David Gourley and Brian Totty
`with Marjorie Sayer, Sailu Reddy, and Anshu Aggarwal
`
`Copyright © 2002 O'Reilly Media, Inc. All rights reserved.
`Printed in the United States of America.
`
`Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol,
`CA 95472.
`
`O'Reilly Media, Inc. books may be purchased for educational, business, or sales promotional use. Online
`editions are also available for most titles (safari.oreilly.com). For more information, contact our
`corporate/institutional sales department: (800) 998-9938 or corporate@oreilly.com.
`
`Editor: Linda Mui
`Production Editor: Rachel Wheeler
`Cover Designer: Ellie Volckhausen
`Interior Designers: David Futato and Melanie Wang
`
`Printing History:
`
`September 2002: First Edition.
`
`Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of
`O'Reilly Media, Inc. HTTP: The Definitive Guide, the image of a thirteen-lined ground squirrel, and
`related trade dress are trademarks of O'Reilly Media, Inc. Many of the designations used by
`manufacturers and sellers to distinguish their products are claimed as trademarks. Where those
`designations appear in this book, and O'Reilly Media, Inc. was aware of a trademark claim, the
`designations have been printed in caps or initial caps.
`
`While every precaution has been taken in the preparation of this book, the publisher and authors
`assume no responsibility for errors or omissions, or for damages resulting from the use of the
`information contained herein.
`
`ISBN: 978-1-56592-509-0
`[LSI]
`
`[2011-01-27]
`
`
`
`Table of Contents
`
`Preface
`
`Part I. HTTP: The Web's Foundation
`
`1. Overview of HTTP
`    HTTP: The Internet's Multimedia Courier
`    Web Clients and Servers
`    Resources
`    Transactions
`    Messages
`    Connections
`    Protocol Versions
`    Architectural Components of the Web
`    The End of the Beginning
`    For More Information
`
`2. URLs and Resources
`    Navigating the Internet's Resources
`    URL Syntax
`    URL Shortcuts
`    Shady Characters
`    A Sea of Schemes
`    The Future
`    For More Information
`
`3. HTTP Messages
`    The Flow of Messages
`    The Parts of a Message
`    Methods
`    Status Codes
`    Headers
`    For More Information
`
`4. Connection Management
`    TCP Connections
`    TCP Performance Considerations
`    HTTP Connection Handling
`    Parallel Connections
`    Persistent Connections
`    Pipelined Connections
`    The Mysteries of Connection Close
`    For More Information
`
`Part II. HTTP Architecture
`
`5. Web Servers
`    Web Servers Come in All Shapes and Sizes
`    A Minimal Perl Web Server
`    What Real Web Servers Do
`    Step 1: Accepting Client Connections
`    Step 2: Receiving Request Messages
`    Step 3: Processing Requests
`    Step 4: Mapping and Accessing Resources
`    Step 5: Building Responses
`    Step 6: Sending Responses
`    Step 7: Logging
`    For More Information
`
`6. Proxies
`    Web Intermediaries
`    Why Use Proxies?
`    Where Do Proxies Go?
`    Client Proxy Settings
`    Tricky Things About Proxy Requests
`    Tracing Messages
`    Proxy Authentication
`    Proxy Interoperation
`    For More Information
`
`7. Caching
`    Redundant Data Transfers
`    Bandwidth Bottlenecks
`    Flash Crowds
`    Distance Delays
`    Hits and Misses
`    Cache Topologies
`    Cache Processing Steps
`    Keeping Copies Fresh
`    Controlling Cachability
`    Setting Cache Controls
`    Detailed Algorithms
`    Caches and Advertising
`    For More Information
`
`8. Integration Points: Gateways, Tunnels, and Relays
`    Gateways
`    Protocol Gateways
`    Resource Gateways
`    Application Interfaces and Web Services
`    Tunnels
`    Relays
`    For More Information
`
`9. Web Robots
`    Crawlers and Crawling
`    Robotic HTTP
`    Misbehaving Robots
`    Excluding Robots
`    Robot Etiquette
`    Search Engines
`    For More Information
`
`10. HTTP-NG
`    HTTP's Growing Pains
`    HTTP-NG Activity
`    Modularize and Enhance
`    Distributed Objects
`    Layer 1: Messaging
`    Layer 2: Remote Invocation
`    Layer 3: Web Application
`    WebMUX
`    Binary Wire Protocol
`    Current Status
`    For More Information
`
`Part III. Identification, Authorization, and Security
`
`11. Client Identification and Cookies
`    The Personal Touch
`    HTTP Headers
`    Client IP Address
`    User Login
`    Fat URLs
`    Cookies
`    For More Information
`
`12. Basic Authentication
`    Authentication
`    Basic Authentication
`    The Security Flaws of Basic Authentication
`    For More Information
`
`13. Digest Authentication
`    The Improvements of Digest Authentication
`    Digest Calculations
`    Quality of Protection Enhancements
`    Practical Considerations
`    Security Considerations
`    For More Information
`
`14. Secure HTTP
`    Making HTTP Safe
`    Digital Cryptography
`    Symmetric-Key Cryptography
`    Public-Key Cryptography
`    Digital Signatures
`    Digital Certificates
`    HTTPS: The Details
`    A Real HTTPS Client
`    Tunneling Secure Traffic Through Proxies
`    For More Information
`
`Part IV. Entities, Encodings, and Internationalization
`
`15. Entities and Encodings
`    Messages Are Crates, Entities Are Cargo
`    Content-Length: The Entity's Size
`    Entity Digests
`    Media Type and Charset
`    Content Encoding
`    Transfer Encoding and Chunked Encoding
`    Time-Varying Instances
`    Validators and Freshness
`    Range Requests
`    Delta Encoding
`    For More Information
`
`16. Internationalization
`    HTTP Support for International Content
`    Character Sets and HTTP
`    Multilingual Character Encoding Primer
`    Language Tags and HTTP
`    Internationalized URIs
`    Other Considerations
`    For More Information
`
`17. Content Negotiation and Transcoding
`    Content-Negotiation Techniques
`    Client-Driven Negotiation
`    Server-Driven Negotiation
`    Transparent Negotiation
`    Transcoding
`    Next Steps
`    For More Information
`
`Part V. Content Publishing and Distribution
`
`18. Web Hosting
`    Hosting Services
`    Virtual Hosting
`    Making Web Sites Reliable
`    Making Web Sites Fast
`    For More Information
`
`19. Publishing Systems
`    FrontPage Server Extensions for Publishing Support
`    WebDAV and Collaborative Authoring
`    For More Information
`
`20. Redirection and Load Balancing
`    Why Redirect?
`    Where to Redirect
`    Overview of Redirection Protocols
`    General Redirection Methods
`    Proxy Redirection Methods
`    Cache Redirection Methods
`    Internet Cache Protocol
`    Cache Array Routing Protocol
`    Hyper Text Caching Protocol
`    For More Information
`
`21. Logging and Usage Tracking
`    What to Log?
`    Log Formats
`    Hit Metering
`    A Word on Privacy
`    For More Information
`
`Part VI. Appendixes
`
`A. URI Schemes
`B. HTTP Status Codes
`C. HTTP Header Reference
`D. MIME Types
`E. Base-64 Encoding
`F. Digest Authentication
`G. Language Tags
`H. MIME Charset Registry
`
`Index
`
`
`
`Preface
`
`The Hypertext Transfer Protocol (HTTP) is the protocol programs use to communicate
`over the World Wide Web. There are many applications of HTTP, but HTTP is
`most famous for two-way conversation between web browsers and web servers.
`HTTP began as a simple protocol, so you might think there really isn't that much to
`say about it. And yet here you stand, with a two-pound book in your hands. If you're
`wondering how we could have written 650 pages on HTTP, take a look at the Table
`of Contents. This book isn't just an HTTP header reference manual; it's a veritable
`bible of web architecture.
`
`In this book, we try to tease apart HTTP's interrelated and often misunderstood
`rules, and we offer you a series of topic-based chapters that explain all the aspects of
`HTTP. Throughout the book, we are careful to explain the "why" of HTTP, not just
`the "how." And to save you time chasing references, we explain many of the critical
`non-HTTP technologies that are required to make HTTP applications work. You can
`find the alphabetical header reference (which forms the basis of most conventional
`HTTP texts) in a conveniently organized appendix. We hope this conceptual design
`makes it easy for you to work with HTTP.
`
`This book is written for anyone who wants to understand HTTP and the underlying
`architecture of the Web. Software and hardware engineers can use this book as a
`coherent reference for HTTP and related web technologies. Systems architects and
`network administrators can use this book to better understand how to design,
`deploy, and manage complicated web architectures. Performance engineers and analysts
`can benefit from the sections on caching and performance optimization. Marketing
`and consulting professionals will be able to use the conceptual orientation to
`better understand the landscape of web technologies.
`
`This book illustrates common misconceptions, advises on "tricks of the trade," provides
`convenient reference material, and serves as a readable introduction to dry and
`confusing standards specifications. In a single book, we detail the essential and
`interrelated technologies that make the Web work.
`
`This book is the result of a tremendous amount of work by many people who share
`an enthusiasm for Internet technologies. We hope you find it useful.
`
`Running Example: Joe's Hardware Store
`Many of our chapters include a running example of a hypothetical online hardware
`and home-improvement store called "Joe's Hardware" to demonstrate technology
`concepts. We have set up a real web site for the store (http://www.joes-hardware.com)
`for you to test some of the examples in the book. We will maintain this web site
`while this book remains in print.
`
`Chapter-by-Chapter Guide
`This book contains 21 chapters, divided into 5 logical parts (each with a technology
`theme), and 8 useful appendixes containing reference data and surveys of related
`technologies:
`
`Part I, HTTP: The Web's Foundation
`Part II, HTTP Architecture
`Part III, Identification, Authorization, and Security
`Part IV, Entities, Encodings, and Internationalization
`Part V, Content Publishing and Distribution
`Part VI, Appendixes
`
`Part I, HTTP: The Web's Foundation, describes the core technology of HTTP, the
`foundation of the Web, in four chapters:
`
`• Chapter 1, Overview of HTTP, is a rapid-paced overview of HTTP.
`• Chapter 2, URLs and Resources, details the formats of uniform resource locators
`(URLs) and the various types of resources that URLs name across the Internet. It
`also outlines the evolution to uniform resource names (URNs).
`• Chapter 3, HTTP Messages, details how HTTP messages transport web content.
`• Chapter 4, Connection Management, explains the commonly misunderstood and
`poorly documented rules and behavior for managing HTTP connections.
`
`Part II, HTTP Architecture, highlights the HTTP server, proxy, cache, gateway, and
`robot applications that are the architectural building blocks of web systems. (Web
`browsers are another building block, of course, but browsers already were covered
`thoroughly in Part I of the book.) Part II contains the following six chapters:
`
`• Chapter 5, Web Servers, gives an overview of web server architectures.
`• Chapter 6, Proxies, explores HTTP proxy servers, which are intermediary servers
`that act as platforms for HTTP services and controls.
`• Chapter 7, Caching, delves into the science of web caches-devices that improve
`performance and reduce traffic by making local copies of popular documents.
`• Chapter 8, Integration Points: Gateways, Tunnels, and Relays, explains gateways
`and application servers that allow HTTP to work with software that speaks different
`protocols, including Secure Sockets Layer (SSL) encrypted protocols.
`• Chapter 9, Web Robots, describes the various types of clients that pervade the
`Web, including the ubiquitous browsers, robots and spiders, and search engines.
`• Chapter 10, HTTP-NG, talks about HTTP developments still in the works: the
`HTTP-NG protocol.
`
`Part III, Identification, Authorization, and Security, presents a suite of techniques and
`technologies to track identity, enforce security, and control access to content. It
`contains the following four chapters:
`
`• Chapter 11, Client Identification and Cookies, talks about techniques to identify
`users so that content can be personalized to the user audience.
`• Chapter 12, Basic Authentication, highlights the basic mechanisms to verify user
`identity. The chapter also examines how HTTP authentication interfaces with
`databases.
`• Chapter 13, Digest Authentication, explains digest authentication, a complex
`proposed enhancement to HTTP that provides significantly enhanced security.
`• Chapter 14, Secure HTTP, is a detailed overview of Internet cryptography, digital
`certificates, and SSL.
`
`Part IV, Entities, Encodings, and Internationalization, focuses on the bodies of HTTP
`messages (which contain the actual web content) and on the web standards that
`describe and manipulate content stored in the message bodies. Part IV contains three
`chapters:
`
`• Chapter 15, Entities and Encodings, describes the structure of HTTP content.
`• Chapter 16, Internationalization, surveys the web standards that allow users
`around the globe to exchange content in different languages and character sets.
`• Chapter 17, Content Negotiation and Transcoding, explains mechanisms for
`negotiating acceptable content.
`
`Part V, Content Publishing and Distribution, discusses the technology for publishing
`and disseminating web content. It contains four chapters:
`
`• Chapter 18, Web Hosting, discusses the ways people deploy servers in modern
`web hosting environments and HTTP support for virtual web hosting.
`• Chapter 19, Publishing Systems, discusses the technologies for creating web content
`and installing it onto web servers.
`• Chapter 20, Redirection and Load Balancing, surveys the tools and techniques for
`distributing incoming web traffic among a collection of servers.
`• Chapter 21, Logging and Usage Tracking, covers log formats and common
`questions.
`
`Part VI, Appendixes, contains helpful reference appendixes and tutorials in related
`technologies:
`
`• Appendix A, URI Schemes, summarizes the protocols supported through uniform
`resource identifier (URI) schemes.
`• Appendix B, HTTP Status Codes, conveniently lists the HTTP response codes.
`• Appendix C, HTTP Header Reference, provides a reference list of HTTP header
`fields.
`• Appendix D, MIME Types, provides an extensive list of MIME types and
`explains how MIME types are registered.
`• Appendix E, Base-64 Encoding, explains base-64 encoding, used by HTTP
`authentication.
`• Appendix F, Digest Authentication, gives details on how to implement various
`authentication schemes in HTTP.
`• Appendix G, Language Tags, defines language tag values for HTTP language
`headers.
`• Appendix H, MIME Charset Registry, provides a detailed list of character encodings,
`used for HTTP internationalization support.
`
`Each chapter contains many examples and pointers to additional reference material.
`
`Typographic Conventions
`In this book, we use the following typographic conventions:
`Italic
`    Used for URLs, C functions, command names, MIME types, new terms where
`    they are defined, and emphasis
`Constant width
`    Used for computer output, code, and any literal text
`Constant width bold
`    Used for user input
`
`Comments and Questions
`Please address comments and questions concerning this book to the publisher:
`
`O'Reilly & Associates, Inc.
`1005 Gravenstein Highway North
`Sebastopol, CA 95472
`(800) 998-9938 (in the United States or Canada)
`(707) 829-0515 (international/local)
`(707) 829-0104 (fax)
`
`There is a web page for this book, which lists errata, examples, or any additional
`information. You can access this page at:
`http://www.oreilly.com/catalog/httptdg/
`To comment or ask technical questions about this book, send email to:
`bookquestions@oreilly.com
`
`For more information about books, conferences, Resource Centers, and the O'Reilly
`Network, see the O'Reilly web site at:
`http://www.oreilly.com
`
`Acknowledgments
`This book is the labor of many. The five authors would like to hold up a few people
`in thanks for their significant contributions to this project.
`
`To start, we'd like to thank Linda Mui, our editor at O'Reilly. Linda first met with
`David and Brian way back in 1996, and she refined and steered several concepts into
`the book you hold today. Linda also helped keep our wandering gang of first-time
`book authors moving in a coherent direction and on a progressing (if not rapid) timeline.
`Most of all, Linda gave us the chance to create this book. We're very grateful.
`
`We'd also like to thank several tremendously bright, knowledgeable, and kind souls
`who devoted noteworthy energy to reviewing, commenting on, and correcting drafts
`of this book. These include Tony Bourke, Sean Burke, Mike Chowla, Shernaz Daver,
`Fred Douglis, Paula Ferguson, Vikas Jha, Yves Lafon, Peter Mattis, Chuck Neerdaels,
`Luis Tavera, Duane Wessels, Dave Wu, and Marco Zagha. Their viewpoints
`and suggestions have improved the book tremendously.
`
`Rob Romano from O'Reilly created most of the amazing artwork you'll find in this
`book. The book contains an unusually large number of detailed illustrations that
`make subtle concepts very clear. Many of these illustrations were painstakingly created
`and revised numerous times. If a picture is worth a thousand words, Rob added
`hundreds of pages of value to this book.
`
`Brian would like to personally thank all of the authors for their dedication to this
`project. A tremendous amount of time was invested by the authors in a challenge to
`make the first detailed but accessible treatment of HTTP. Weddings, childbirths,
`killer work projects, startup companies, and graduate schools intervened, but the
`authors held together to bring this project to a successful completion. We believe the
`result is worthy of everyone's hard work and, most importantly, that it provides a
`valuable service. Brian also would like to thank the employees of Inktomi for their
`enthusiasm and support and for their deep insights about the use of HTTP in real-world
`applications. Also, thanks to the fine folks at Cajun-shop.com for allowing us
`to use their site for some of the examples in this book.
`
`David would like to thank his family, particularly his mother and grandfather for
`their ongoing support. He'd like to thank those that have put up with his erratic
`schedule over the years writing the book. He'd also like to thank Slurp, Orctomi, and
`Norma for everything they've done, and his fellow authors for all their hard work.
`Finally, he would like to thank Brian for roping him into yet another adventure.
`
`Marjorie would like to thank her husband, Alan Liu, for technical insight, familial
`support and understanding. Marjorie thanks her fellow authors for many insights
`and inspirations. She is grateful for the experience of working together on this book.
`
`Sailu would like to thank David and Brian for the opportunity to work on this book,
`and Chuck Neerdaels for introducing him to HTTP.
`
`Anshu would like to thank his wife, Rashi, and his parents for their patience, support,
`and encouragement during the long years spent writing this book.
`
`Finally, the authors collectively thank the famous and nameless Internet pioneers,
`whose research, development, and evangelism over the past four decades contributed
`so much to our scientific, social, and economic community. Without these
`labors, there would be no subject for this book.
`
`PART I
`HTTP: The Web's Foundation
`
`This section is an introduction to the HTTP protocol. The next four chapters
`describe the core technology of HTTP, the foundation of the Web:
`
`• Chapter 1, Overview of HTTP, is a rapid-paced overview of HTTP.
`• Chapter 2, URLs and Resources, details the formats of URLs and the various
`types of resources that URLs name across the Internet. We also outline the evolution
`to URNs.
`• Chapter 3, HTTP Messages, details the HTTP messages that transport web
`content.
`• Chapter 4, Connection Management, discusses the commonly misunderstood
`and poorly documented rules and behavior for managing TCP connections by
`HTTP.
`
`
`
`
`
`CHAPTER 1
`Overview of HTTP
`
`The world's web browsers, servers, and related web applications all talk to each
`other through HTTP, the Hypertext Transfer Protocol. HTTP is the common language
`of the modern global Internet.
`
`This chapter is a concise overview of HTTP. You'll see how web applications use
`HTTP to communicate, and you'll get a rough idea of how HTTP does its job. In
`particular, we talk about:
`
`• How web clients and servers communicate
`• Where resources (web content) come from
`• How web transactions work
`• The format of the messages used for HTTP communication
`• The underlying TCP network transport
`• The different variations of the HTTP protocol
`• Some of the many HTTP architectural components installed around the Internet
`
`We've got a lot of ground to cover, so let's get started on our tour of HTTP.
`
`HTTP: The Internet's Multimedia Courier
`Billions of JPEG images, HTML pages, text files, MPEG movies, WAV audio files,
`Java applets, and more cruise through the Internet each and every day. HTTP moves
`the bulk of this information quickly, conveniently, and reliably from web servers all
`around the world to web browsers on people's desktops.
`
`Because HTTP uses reliable data-transmission protocols, it guarantees that your data
`will not be damaged or scrambled in transit, even when it comes from the other side of
`the globe. This is good for you as a user, because you can access information without
`worrying about its integrity. Reliable transmission is also good for you as an Internet
`application developer, because you don't have to worry about HTTP communications
`being destroyed, duplicated, or distorted in transit. You can focus on programming
`the distinguishing details of your application, without worrying about the flaws and
`foibles of the Internet.
`
`Let's look more closely at how HTTP transports the Web's traffic.
`
`Web Clients and Servers
`Web content lives on web servers. Web servers speak the HTTP protocol, so they are
`often called HTTP servers. These HTTP servers store the Internet's data and provide
`the data when it is requested by HTTP clients. The clients send HTTP requests to
`servers, and servers return the requested data in HTTP responses, as sketched in
`Figure 1-1. Together, HTTP clients and HTTP servers make up the basic components
`of the World Wide Web.
`
`Figure 1-1. Web clients and servers
`
`You probably use HTTP clients every day. The most common client is a web
`browser, such as Microsoft Internet Explorer or Netscape Navigator. Web browsers
`request HTTP objects from servers and display the objects on your screen.
`
`When you browse to a page, such as "http://www.oreilly.com/index.html," your
`browser sends an HTTP request to the server www.oreilly.com (see Figure 1-1). The
`server tries to find the desired object (in this case, "/index.html") and, if successful,
`sends the object to the client in an HTTP response, along with the type of the object,
`the length of the object, and other information.
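`
As a rough illustration of the exchange just described, the following Python sketch builds the request a client might send for http://www.oreilly.com/index.html and reads the status, type, and length out of a response. Several things here are assumptions that the text above does not state: the HTTP/1.0 message layout, the helper names build_get_request and parse_response_head, and the canned response, which is made up for illustration so that the sketch runs without a network connection.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> bytes:
    """Build the bytes of a minimal HTTP/1.0 GET request for `url`."""
    parts = urlsplit(url)
    path = parts.path or "/"
    lines = [
        f"GET {path} HTTP/1.0",     # request line: method, resource, version
        f"Host: {parts.hostname}",  # the server the request is meant for
        "",                         # an empty line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response_head(raw: bytes) -> tuple[int, dict[str, str]]:
    """Return the status code and the headers found at the start of a response."""
    head, _, _body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status = int(status_line.split()[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers


request = build_get_request("http://www.oreilly.com/index.html")

# Hypothetical response, written by hand rather than captured from a server.
canned_response = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-type: text/html\r\n"
    b"Content-length: 5\r\n"
    b"\r\n"
    b"Hello"
)
status, headers = parse_response_head(canned_response)
```

Here request holds the text the client would write to the server, and headers holds the type and length information that accompanies the returned object. A real client would also need to open a connection and handle errors, which this sketch leaves out.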
`
`Resources
`Web servers host web resources. A web resource is the source of web content. The
`simplest kind of web resource is a static file on the web server's filesystem. These
`files can contain anything: they might be text files, HTML files, Microsoft Word
`files, Adobe Acrobat files, JPEG image files, AVI movie files, or any other format you
`can think of.
`
`However, resources don't have to be static files. Resources can also be software programs
`that generate content on demand. These dynamic content resources can generate
`content based on your identity, on what information you've requested, or on
`the time of day. They can show you a live image from a camera, or let you trade
`stocks, search real estate databases, or buy gifts from online stores (see Figure 1-2).
`
`[Figure 1-2 labels: image file, text file, real estate search gateway, e-commerce gateway]
`
`Figure 1-2. A web resource is anything that provides web content
`
`In summary, a resource is any kind of content source. A file containing your company's
`sales forecast spreadsheet is a resource. A web gateway to scan your local
`public library's shelves is a resource. An Internet search engine is a resource.
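`
The difference between static and dynamic resources can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paths, the page content, the request dictionary with an "hour" key, and the names RESOURCES, time_of_day_greeting, and get_content are all hypothetical and do not come from any real server.

```python
from typing import Callable, Union

# A resource is either fixed bytes (static) or a function that
# produces bytes for each request (dynamic).
Resource = Union[bytes, Callable[[dict], bytes]]


def time_of_day_greeting(request: dict) -> bytes:
    """Dynamic resource: the content depends on the time of day."""
    return b"Good morning" if request["hour"] < 12 else b"Good afternoon"


# Hypothetical mapping from resource names to content sources.
RESOURCES: dict[str, Resource] = {
    "/index.html": b"<html>Welcome to Joe's Hardware</html>",  # static file
    "/greeting": time_of_day_greeting,                         # program
}


def get_content(path: str, request: dict) -> bytes:
    """Return the content for `path`, generating it on demand if needed."""
    resource = RESOURCES[path]
    if callable(resource):
        return resource(request)
    return resource
```

For example, get_content("/greeting", {"hour": 9}) runs the program and returns its output, while get_content("/index.html", {"hour": 9}) returns the same stored bytes every time. From the client's point of view both are simply resources.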
`
`Media Types
`Because the Internet hosts many thousands of different data types, HTTP carefully
`tags each object being transported through the Web with a data format label called a
`MIME type. MIME (Multipurpose Internet Mail Extensions) was originally designed
`to solve problems encountered in moving messages between different electronic mail
`systems. MIME worked so well for email that HTTP adopted it to describe and label
`its own multimedia content.
`
`Web servers attach a MIME type to all HTTP object data (see Figure 1-3). When a
`web browser gets an object back from a server, it looks at the associated MIME type
`to see if it knows how to handle the object. Most browsers can handle hundreds of
`popular object types: displaying image files, parsing and formatting HTML files,
`playing audio files through the computer's speakers, or launching external plug-in
`software to handle special formats.
`
`[Figure 1-3 labels: the server's response to the client carries the MIME type header "Content-type: image/jpeg" along with "Content-length: 12984"]
`
`Figure 1-3. MIME types are sent back with the data content
`
`A MIME type is a textual label, represented as a primary object type and a specific
`subtype, separated by a slash. For example:
`
`• An HTML-formatted text document would be labeled with type text/html.
`• A plain ASCII text document would be labeled with type text/plain.
`• A JPEG version of an image would be image/jpeg.
`• A GIF-format image would be image/gif.
`• An Apple QuickTime movie would be video/quicktime.
`• A Microsoft PowerPoint presentation would be application/vnd.ms-powerpoint.
`
`There are hundreds of popular MIME types, and many more experimental or limited-use
`types. A very thorough MIME type list is provided in Appendix D.
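`
To make the type/subtype structure concrete, here is a minimal Python sketch of how a server might pick a MIME type for a file and how a client might split the label into its two parts. The MIME types are the ones listed above; the file extensions, the fallback type application/octet-stream, and the names MIME_TYPES, mime_type_for, and split_mime_type are assumptions made for this example. Real servers rely on much larger tables.

```python
import os

# Extension-to-type table using the example types from the text.
# The extensions themselves are assumed for illustration.
MIME_TYPES = {
    ".html": "text/html",
    ".txt": "text/plain",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".mov": "video/quicktime",
    ".ppt": "application/vnd.ms-powerpoint",
}


def mime_type_for(filename: str, default: str = "application/octet-stream") -> str:
    """Look up a MIME type by file extension, falling back to `default`."""
    extension = os.path.splitext(filename)[1].lower()
    return MIME_TYPES.get(extension, default)


def split_mime_type(mime_type: str) -> tuple[str, str]:
    """Split a label such as 'image/gif' into (primary type, subtype)."""
    primary, _, subtype = mime_type.partition("/")
    return primary, subtype
```

With this table, mime_type_for("saw-blade.gif") yields image/gif, and split_mime_type separates that label into the primary type image and the subtype gif. Python's standard library also ships a mimetypes module that performs a similar lookup with a far more complete table.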
`
`URIs
`Each web server resource has a name, so clients can point out what resources they
`are interested in. The server resource name is called a uniform resource identifier, or
`URI. URIs are like the postal addresses of the Internet, uniquely identifying and
`locating information resources around the world.
`
`Here's a URI for an image resource on Joe's Hardware store's web server:
`http://www.joes-hardware.com/specials/saw-blade.gif
`Figure 1-4 shows how the URI specifies the HTTP protocol to access the saw-blade
`GIF resource on Joe's store's server. Given the URI, HTTP can retrieve the object.
`URIs come in two flavors, called URLs and URNs. Let's take a peek at each of these
`types of resource identifiers now.
`
`URLs
`The uniform resource locator (URL) is the most common form of resource identifier.
`URLs describe the specific location of a resource on a particular server. They tell you
`exactly how to fetch a resource from a precise, fixed location. Figure 1-4 shows how
`a URL tells precisely where a resource is located and how to access it. Table 1-1
`shows a few examples of URLs.
`
`[Figure 1-4 labels: (1) Use HTTP protocol; (2) Go to www.joes-hardware.com; (3) Grab the resource called /specials/saw-blade.gif]
`
`Figure 1-4. URLs specify protocol, server, and local resource
`
`Table 1-1. Example URLs
`
`URL
`    Description
`http://www.oreilly.com/index.html
`    The home URL for O'Reilly & Associates, Inc.
`http://www.yahoo.com/images/logo.gif
`    The URL for the Yahoo! web site's logo
`http://www.joes-hardware.com/inventory-check.cgi?item=12731
`    The URL for a program that checks if inventory item #12731 is in stock
`ftp://joe:tools4u@ftp.joes-hardware.com/locking-pliers.gif
`    The URL for the locking-pliers.gif image file, using password-protected FTP as the access protocol
`
`Most URLs follow a standardized format of three main parts:
`
`• The first part of the URL is called the scheme, and it describes the protocol used
`to access the resource. This is usually the HTTP protocol (http://).
`• The second part gives the server Internet address (e.g., www.joes-hardware.com).
`• The rest names a resource on the web server (e.g., /specials/saw-blade.gif).
`
`Today, almost every URI is a URL.
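`
The three-part structure can be checked with Python's standard urllib.parse module, which splits a URL into its components. The sketch below is only one way to do this; the helper name describe_url and the dictionary keys are chosen for this example, and the URLs are the ones used earlier in this chapter.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the scheme, the server address, and the resource name."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,      # protocol used to access the resource
        "server": parts.hostname,    # Internet address of the server
        "resource": parts.path,      # name of the resource on that server
        "query": parts.query,        # optional parameters after "?"
        "username": parts.username,  # optional login name (None if absent)
        "password": parts.password,  # optional password (None if absent)
    }


image = describe_url("http://www.joes-hardware.com/specials/saw-blade.gif")
program = describe_url("http://www.joes-hardware.com/inventory-check.cgi?item=12731")
ftp_file = describe_url("ftp://joe:tools4u@ftp.joes-hardware.com/locking-pliers.gif")
```

For the saw-blade URL, the scheme is http, the server is www.joes-hardware.com, and the resource is /specials/saw-blade.gif. The other two URLs show that the same split also exposes the query string of the inventory program and the username and password embedded in the FTP URL.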
`
`URNs
`The second flavor of URI is the uniform resource name, or URN. A URN serves as a
`unique name for a particular piece of content, independent of where the resource
`currently resides. These location-independent URNs allow resources to move from
`place to place. URNs also allow resources to be accessed by multiple network access
`protocols while maintaining the same name.
`
`For example, the following