`
`VMware - Exhibit 1014
`
`VMware v. IV I - IPR2020-00470
`
`Page 1 of 435
`
`
`
`
`Create and effectively manage agents and explore their effects on the Internet
`
`
`
`
`Internet Agents:
`Spiders, Wanderers, Brokers, and 'Bots
`
`Fah-Chun Cheong
`
`New Riders Publishing, Indianapolis, Indiana
`
`
`
`
`
`Internet Agents: Spiders, Wanderers, Brokers, and 'Bots
`By Fah-Chun Cheong
`
`Published by:
`New Riders Publishing
`201 West 103rd Street
`Indianapolis, IN 46290 USA
`
`All rights reserved. No part of this book may be reproduced or
`transmitted in any form or by any means, electronic or mechanical,
`including photocopying, recording, or by any information storage and
`retrieval system, without written permission from the publisher,
`except for the inclusion of brief quotations in a review.
`
`Copyright© 1996 by New Riders Publishing
`
`Printed in the United States of America 1 2 3 4 5 6 7 8 9 0
`
`CIP Data Available upon Request
`
`Warning and Disclaimer
`This book is designed to provide information about Internet agents.
`Every effort has been made to make this book as complete and as
`accurate as possible, but no warranty or fitness is implied.
`
`The information is provided on an "as is" basis. The author and New
`Riders Publishing shall have neither liability nor responsibility to any
`person or entity with respect to any loss or damages arising from
`the information contained in this book or from the use of the disks or
`programs that may accompany it.
`
`Publisher
`Don Fowley
`
`Publishing Manager
`Jim LeValley
`
`Marketing Manager
`Ray Robinson
`
`Managing Editor
`Tad Ringo
`
`Product Development Specialist
`Julie Fairweather
`
`Development Editor
`Suzanne Snyder
`
`Production Editor
`Cliff Shubs
`
`Copy Editors
`Amy Bezek, Fran Blauw, Gail Burlakoff, Laura Frey, Lisa Wilson
`
`Associate Marketing Manager
`Tamara Apple
`
`Acquisitions Coordinator
`Tracy Turgeson
`
`Publisher's Assistant
`Karen Opal
`
`Cover Designer
`Jay Corpus
`
`Cover Illustration
`Roger Morgan
`
`Book Designer
`Sandra Schroeder
`
`Manufacturing Coordinator
`Paul Gilchrist
`
`Production Manager
`Kelly D. Dobbs
`
`Production Team Supervisor
`Laurie Casey
`
`Graphics Image Specialists
`Jason Hand, Clint Lahnen, Laura Robbins, Craig Small, Todd Wente
`
`Production Analysts
`Angela D. Bannan
`Bobbi Satterfield
`
`Production Team
`Heather Butler, Dan Caparo, Kim Cofer, Kevin Foltz, Erika Millen,
`Erich J. Richter, Christine Tyner, Karen Walsh
`
`Indexer
`Christopher Cleveland
`
`
`
`
`About the Author
`Fah-Chun Cheong consults with start-up companies
`around the San Francisco Bay Area in the
`application of agent technologies for electronic
`commerce on the World Wide Web and Internet.
`
`Mr. Cheong received his B.S. in Electrical Engineering
`from The University of Texas at Austin in
`1986, and his M.S. and Ph.D. degrees in Computer
`Science from the University of Michigan in
`1988 and 1992, respectively. His Ph.D. research
`was on the design and development of an experimental
`agent-oriented programming language
`and compiler system for heterogeneous distributed
`computing environments. He founded Agent
`Computing, Inc. in 1994, with a vision to develop
`innovative application-specific agent technologies
`for the Internet.
`
`Trademark Acknowledgments
`All terms mentioned in this book that are known to
`be trademarks or service marks have been appropriately
`capitalized. New Riders Publishing cannot
`attest to the accuracy of this information. Use of a
`term in this book should not be regarded as affecting
`the validity of any trademark or service mark.
`
`Dedication
`To my parents and sisters
`
`Acknowledgments
`This book might not have been written (well, at
`least not in 1995!) if Vinay Kumar had not invited
`me along to a dinner earlier this year at a sushi
`place in San Francisco with Jim LeValley, Publishing
`Manager for New Riders Publishing. I thank
`him for that and for the many interesting and insightful
`discussions on a variety of topics we have
`had over many cups of espresso.
`
`A very big thank you to Kevin Hughes for reviewing
`drafts of this book. I am grateful to ex-colleagues
`at EIT and ex-EIT friends, especially
`Jeff Pan and Jim McGuire, for information in a
`variety of areas, most notably procurement
`agents, Web robots, and secure HTTP.
`
`I would like to thank all the people on the Internet
`whose pioneering work in agents, spiders, wanderers,
`and Web robots has made an early book
`on this topic a possibility. Special thanks to all the
`authors of Web robots, spiders, and wanderers
`who have answered e-mail questionnaires on
`Internet agents; their insightful comments and
`responses have contributed much toward shaping
`the content of this book.
`
`I am indebted to Roy Fielding for his libwww-perl
`and MOMspider source code, which, in a vastly simplified
`form, have now become the basis upon which
`WebWalker is built. Many thanks to Bruce Krulwich,
`whose Bargain Finder agent on the Web inspired the
`development of WebShopper for this book.
`
`Martijn Koster has authored and maintained a
`number of marvelous Web pages on the net.
`Among his creations, I have found the List of Robots
`a comprehensive reference and an invaluable
`resource for much of this book.
`
`The Stanford Libraries have proved invaluable to
`me on this project, as on others. I am extremely
`grateful that Stanford opens its Mathematics and
`Computer Science Library, and also the Engineering
`Library, to the surrounding community at large.
`
`A very big thank you to the friendly, competent,
`and generally fantastic editorial staff at New Riders
`who prepared this book for publication. I am
`indebted to Jim LeValley for taking an interest in
`Internet agents, coming up with an initial plan for
`this book, and supplying me continuously with an
`unending stream of helpful sources and materials.
`I am especially thankful to Julie Fairweather
`for developing the book, coordinating the process
`to keep publication on schedule, and for helping
`with numerous screen-shots of the Web. Special
`thanks to Cliff Shubs for his excellent editing and
`his many thoughtful remarks on the book, and to
`Suzanne Snyder for helping with the development
`of the book. Many thanks go to Roger Morgan for
`designing the great spider on the front cover.
`
`
`
`
`Contents at a Glance
`
`Part I: Introduction
`1 The World of Agents ....................................................................... 3
`
`2 The Internet: Past, Present, and Future ........................................ 37
`
`3 World Wide Web: Playground for Robots ..................................... 61
`
`Part II: Web Robot Construction
`4 Spiders for Indexing the Web ........................................................ 81
`
`5 Web Robots: Operational Guidelines .......................................... 105
`
`6 HTTP: Protocol of Web Robots ................................................... 125
`
`7 WebWalker: Your Web Maintenance Robot ............................... 153
`
`Part III: Agents and Money on the Net
`8 Web Transaction Security ........................................................... 185
`
`9 Electronic Cash and Payment Services ....................................... 205
`
`Part IV: Bots in Cyberspace
`10 Worms and Viruses ..................................................................... 229
`
`11 MUD Agents and Chatterbots ..................................................... 249
`
`Part V: Appendices
`A HTTP 1.0 Protocol Specifications ................................................ 283
`
`B WebWalker 1.00 Program Listing ............................................... 293
`
`C WebShopper 1.00 Program Listing ............................................. 337
`
`D List of Online Bookstores Visited by BookFinder ...................................... 347
`
`E List of Online Music Stores Visited by CDFinder ........................ 351
`
`F List of Active MUD Sites on the Internet .................................... 355
`
`G List of World Wide Web Spiders and Robots .............................. 375
`
`Bibliography ................................................................................. 387
`
`Index ............................................................................................ 401
`
`
`
`
`Table of Contents
`
`Part I: Introduction
`
`1
`
`1 The World of Agents
`3
`What are Agents? ........................................................................................ 5
`Agents and Delegation ................................................................................. 6
`Personal Assistants ................................................................................. 6
`Envoy Desktop Agents ............................................................................ 8
`New Wave Desktop Agents .................................................................... 8
`Surrogate Bots ........................................................................................ 9
`Internet Softbots ................................................................................... 10
`Agents and Coordination ............................................................................ 12
`Conference-Support Agents .................................................................. 12
`Integrated Agents .................................................................................. 13
`Communicative Agents ......................................................................... 15
`Agents and Knowledge .............................................................................. 17
`Teaching Agents .................................................................................... 17
`Learning Agents .................................................................................... 19
`Common-Sense Agents ........................................................................ 21
`Physical Agents ..................................................................................... 23
`Agents and Creativity ............................................................................... 24
`Creative Agents ..................................................................................... 24
`Automated Design Agents .................................................................... 27
`Agents and Emotion ................................................................................... 27
`Art of Animation .................................................................................... 28
`Artificial Intelligence .............................................................................. 28
`The Oz Project ....................................................................................... 28
`Agents and Programming .......................................................................... 30
`KidSim ................................................................................................... 30
`Oasis ..................................................................................................... 31
`Agents and Society .................................................................................... 32
`Control ................................................................................................... 33
`Over Expectations ................................................................................. 33
`Safety ..................................................................................................... 33
`Privacy ................................................................................................... 33
`Commercial Future of Agents .................................................................... 33
`Product Suites ....................................................................................... 34
`Mobile Computing ................................................................................. 34
`Concluding Remarks .................................................................................. 35
`
`
`
`
`2 The Internet: Past, Present, and Future
`37
`Early Days of ARPAnet ............................................................................... 38
`Notable Computer Networks ..................................................................... 39
`Internet and NSFnet ................................................................................... 41
`NSF and AUP ............................................................................................. 42
`Growth of the Internet ............................................................................... 42
`How Big is the Internet? ............................................................................ 45
`Internet Society, IAB, and IETF .................................................................. 55
`Information Superhighway and the National Information Infrastructure .... 57
`
`3 World Wide Web: Playground for Robots
`61
`World Wide Web Development ................................................................. 62
`Growth of the Web ............................................................................... 62
`Information Dissemination with the Web ............................................. 62
`Innovative Uses of the Web .................................................................. 65
`Architecture of the World Wide Web ......................................................... 65
`Web Clients ........................................................................................... 66
`Web Servers .......................................................................................... 66
`Web Proxies .......................................................................................... 67
`Web Resource Naming, Protocols, and Formats .................................. 67
`URI and URL: Universal Resource Identifier and Locator .......................... 67
`Common URI Syntax ............................................................................. 68
`URLs for Various Protocols ................................................................... 69
`Gopher and WAIS .................................................................................. 69
`HTTP: HyperText Transfer Protocol ........................................................... 69
`Statelessness in HTTP .......................................................................... 70
`Format Negotiations .............................................................................. 70
`HTML: HyperText Markup Language ......................................................... 71
`Level of HTML Conformance ................................................................ 71
`HTML Tags ............................................................................................ 72
`Forms and Image maps: Enhanced Web Interactivity ................................ 73
`Fill-Out Forms ........................................................................................ 73
`Clickable Images ................................................................................... 73
`Gateway Programming: Processing Client Input ....................................... 74
`Gateway Program Interaction ............................................................... 75
`The Next Step: Agents on the Web ........................................................... 76
`Early Commerce Agents ....................................................................... 76
`Web Agents of the Future? ................................................................... 78
`
`Part II: Web Robot Construction
`
`79
`
`4 Spiders for Indexing the Web
`81
`Web Indexing Spiders ................................................................................ 82
`WebCrawler: Finding What People Want .................................................. 84
`
`
`
`
`Searching with WebCrawler .................................................................. 84
`How WebCrawler Moves in Webspace ................................................ 85
`Lycos: Hunting WWW Information ............................................................ 89
`Searching with Lycos ............................................................................ 90
`Lycos' Search Space ............................................................................. 91
`Lycos Indexing ...................................................................................... 92
`How Lycos Moves in Webspace ........................................................... 92
`Harvest: Gathering and Brokering Information ........................................... 93
`Searching with Harvest ......................................................................... 94
`Harvest Architecture ............................................................................. 95
`WebAnts: Hunting in Packs ....................................................................... 99
`WebAnts Motivation ........................................................................... 100
`WebAnts Searching and Indexing ....................................................... 100
`Issues of Web Indexing ........................................................................... 100
`Recall and Precision ............................................................................ 101
`Good Web Citizenship ......................................................................... 101
`Performance ........................................................................................ 102
`Scalability ............................................................................................. 102
`Spiders of the Future ............................................................................... 103
`
`5 Web Robots: Operational Guidelines
`105
`Web Robot Uses ...................................................................................... 106
`Web Resource Discovery .................................................................... 107
`Web Maintenance ............................................................................... 107
`Web Mirroring ..................................................................................... 107
`Proposed Standard for Robot Exclusion ................................................... 108
`Robot Exclusion Method ..................................................................... 108
`Robot Exclusion File Format ................................................................ 109
`Recognized Field Names ..................................................................... 109
`Sample Robot Exclusion Files ............................................................. 110
`The Four Laws of Web Robotics .............................................................. 110
`I. A Web Robot Must Show Identifications ......................................... 111
`II. A Web Robot Must Obey Exclusion Standard ................................ 112
`III. A Web Robot Must Not Hog Resources ........................................ 113
`IV. A Web Robot Must Report Errors .................................................. 115
`The Six Commandments for Robot Operators ..................................... 115
`I. Thou Shalt Announce thy Robot ..................................................... 116
`II. Thou Shalt Test, Test, and Test thy Robot Locally .......................... 117
`III. Thou Shalt Keep thy Robot Under Control ..................................... 118
`IV. Thou Shalt Stay in Contact with the World .................................... 119
`V. Thou Shalt Respect the Wishes of Webmasters ............................ 119
`VI. Thou Shalt Share Results with thy Neighbors ................................ 120
`Robot Tips for Webmasters ..................................................................... 121
`Web Ethics ............................................................................................... 122
`
`
`
`
`6 HTTP: Protocol of Web Robots
`125
`Understanding HTTP Operation ............................................................... 126
`Messaging with HTTP .............................................................................. 128
`Message Headers ............................................................................... 128
`General Message Header Fields ......................................................... 129
`Request Message .................................................................................... 130
`Method ................................................................................................ 130
`Request Header Fields ........................................................................ 133
`Response Message ................................................................................. 136
`Status Codes and Reason Phrases ..................................................... 137
`Response Header Fields ..................................................................... 140
`Entity ........................................................................................................ 141
`Entity Header Fields ............................................................................ 141
`Entity Body .......................................................................................... 146
`Protocol Parameters ................................................................................. 147
`HTTP Version ....................................................................................... 147
`Universal Resource Identifiers ............................................................ 147
`Date/Time Formats .............................................................................. 147
`Content Parameters ................................................................................. 148
`Media Types ........................................................................................ 148
`Character Sets ..................................................................................... 148
`Encoding Mechanisms ........................................................................ 149
`Transfer Encodings .............................................................................. 149
`Language Tags .................................................................................... 150
`Content Negotiation ................................................................................. 150
`Access Authentication ............................................................................. 151
`
`7 WebWalker: Your Web Maintenance Robot
`153
`The Web Maintenance Problem .......................................................... 154
`Web Infostructure ............................................................................... 154
`Past Approaches ................................................................................. 154
`Web Maintenance Spiders .................................................................. 155
`WebWalker Operation .............................................................................. 156
`Processing Task Descriptions ............................................................. 156
`Avoiding and Excluding URLs ................................................................. 156
`Keeping History ................................................................................... 157
`Traversing the Web ............................................................................. 157
`Generating Reports ............................................................................. 157
`Is WebWalker a Good Robot? ............................................................. 157
`WebWalker Limitations ....................................................................... 158
`WebWalker Program Installation .............................................................. 158
`WebWalker Task File ............................................................................... 159
`Global Directives ................................................................................. 159
`Task Directives .................................................................................... 160
`Task File Format .................................................................................. 161
`
`
`
`
`WebWalker Usage Examples ................................................................... 161
`Sample WebWalker Output ................................................................ 162
`WebWalker Forms Interface ............................................................... 167
`WebWalker Program Organization ........................................................... 169
`External Library Calls ........................................................................... 169
`WebWalker Program Call-Graph .......................................................... 170
`Configuration Section .......................................................................... 172
`Avoidance Package ............................................................................. 174
`History Package ................................................................................... 175
`Traversal Package ................................................................................ 177
`Summary Package ............................................................................... 178
`Growing into the Future ........................................................................... 181
`
`Part III: Agents and Money on the Net
`
`183
`
`8 Web Transaction Security
`185
`Concepts of Security ................................................................................ 186
`Privacy: Keeping Private Messages Private ........................................ 187
`Authentication: Proving You Are Who You Claim to Be ...................... 188
`Integrity: Ensuring Message Content Remains Unaltered .................. 189
`Brief Tour of Classical Cryptography ........................................................ 189
`The Role of NSA .................................................................................. 190
`Development of Data Encryption Standard (DES) ............................... 190
`Development of Public-Key Cryptography ............................................... 191
`Problems with Secret Keys ................................................................. 191
`Key Management ................................................................................ 192
`The RSA Alternative ............................................................................ 192
`Comparing Secret-Key and Public-Key Cryptography .......................... 193
`Digital Signatures ..................................................................................... 194
`How Digital Signatures Work .............................................................. 194
`The Digital Signature Standard ............................................................ 197
`Key Certification ....................................................................................... 197
`Certifying Authority ............................................................................. 197
`Certificate Format ................................................................................ 198
`Two Approaches to Web Security ............................................................ 198
`Secure Socket Layer (SSL) .................................................................. 200
`Secure HTTP (S-HTTP) ........................................................................ 201
`Current Practice and Future Trend in Web Security ............................ 203
`
`9 Electronic Cash and Payment Services
`205
`Brief History of Money ............................................................................. 206
`Choice of Payment Methods ................................................................... 207
`What is Digital Cash? ............................................................................... 207
`Digital Cashier's Check ........................................................................ 208
`Anonymous Digital Cash through Blind Signatures ............................. 210
`
`
`
`
`Ecash from DigiCash ........................................................................... 211
`Ecash Security and Other Issues ........................................................ 213
`Payment Systems on the Internet ........................................................... 214
`U.S. Payment Systems Today ............................................................. 214
`CyberCash Internet Payment Service ................................................. 215
`Information Commerce on the Internet ................................................... 220
`Economics of Information Commerce ................................................ 220
`First Virtual Payment System .............................................................. 222
`The Future ................................................................................................ 225
`
`Part IV: Bots in Cyberspace
`
`227
`
`10 Worms and Viruses
`229
`Short History of Worms ........................................................................... 230
`The First Worm ................................................................................... 230
`The Christmas Tree Worm .................................................................. 232
`The Internet Worm .............................................................................. 232
`Anatomy of the Internet Worm ................................................................ 232
`Method of Worm Attack .......................................................................... 232
`Method of Worm Defense .................................................................. 233
`What Does the Worm Not Do? ........................................................... 234
`Brief History of Viruses ............................................................................ 234
`Types of Viruses ....................................................................................... 236
`Boot-Sector Infectors .......................................................................... 236
`File Infectors ........................................................................................ 236
`PC Virus Basics ........................................................................................ 237
`Viral Activation in the Boot Process ......................................................... 238
`Step One: ROM BIOS Routines Execution ......................................... 238
`Step Two: Partition Record Code Execution ....................................... 238
`Step Three: Boot-Sector Code Execution ............................................ 238
`Step Four: IQ