`
`on the Internet
`
`Create and effectively manage agents and explore their effects
`
`DATABRICKS EX1008 Page 2
`
`IPR2025-00715
`
`Contents at a Glance
`
`Part I: Introduction
`1 The World of Agents ..................................................................... 3
`
`2 The Internet: Past, Present, and Future ........................................ 37
`
`3 World Wide Web: Playground for Robots ..................................... 61
`
`Part II: Web Robot Construction
`4 Spiders for Indexing the Web ........................................................ 81
`
`5 Web Robots: Operational Guidelines .......................................... 105
`
`6 HTTP: Protocol of Web Robots ................................................... 125
`
`7 WebWalker: Your Web Maintenance Robot ............................... 153
`
`Part III: Agents and Money on the Net
`8 Web Transaction Security ........................................................... 185
`
`9 Electronic Cash and Payment Services ....................................... 205
`
`Part IV: Bots in Cyberspace
`10 Worms and Viruses ..................................................................... 229
`
`11 MUD Agents and Chatterbots ..................................................... 249
`
`Part V: Appendices
`A HTTP 1.0 Protocol Specifications ................................................ 283
`
`B WebWalker 1.00 Program Listing ............................................... 293
`
`C WebShopper 1.00 Program Listing ............................................. 337
`
`D List of Online Bookstores Visited by BookFinder ................................. 347
`
`E List of Online Music Stores Visited by CDFinder ........................ 351
`
`F List of Active MUD Sites on the Internet .................................... 355
`
`G List of World Wide Web Spiders and Robots .............................. 375
`
`Bibliography ................................................................................. 387
`
`Index ............................................................................................ 401
`
`
`Internet Agents: Spiders, Wanderers, Brokers, and 'Bots
`
`Table of Contents
`
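`Part I: Introduction ........................................................................ 1
`
`1 The World of Agents ..................................................................... 3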
`What are Agents? ........................................................................................ 5
`Agents and Delegation ................................................................................. 6
`Personal Assistants ................................................................................. 6
`Envoy Desktop Agents ............................................................................ 8
`New Wave Desktop Agents .................................................................... 8
`Surrogate Bots ........................................................................................ 9
`Internet Softbots ................................................................................... 10
`Agents and Coordination ............................................................................ 12
`Conference-Support Agents .................................................................. 12
`Integrated Agents .................................................................................. 13
`Communicative Agents ......................................................................... 15
`Agents and Knowledge .............................................................................. 17
`Teaching Agents .................................................................................... 17
`Learning Agents .................................................................................... 19
`Common-Sense Agents ........................................................................ 21
`Physical Agents ..................................................................................... 23
`Agents and Creativity ............................................................................... 24
`Creative Agents ..................................................................................... 24
`Automated Design Agents .................................................................... 27
`Agents and Emotion ................................................................................... 27
`Art of Animation .................................................................................... 28
`Artificial Intelligence .............................................................................. 28
`The Oz Project ....................................................................................... 28
`Agents and Programming .......................................................................... 30
`KidSim ................................................................................................... 30
`Oasis ..................................................................................................... 31
`Agents and Society .................................................................................... 32
`Control .................................................................................................... 33
`Over Expectations ................................................................................... 33
`Safety ...................................................................................................... 33
`Privacy ................................................................................................... 33
`Commercial Future of Agents .................................................................... 33
`Product Suites ....................................................................................... 34
`Mobile Computing ................................................................................. 34
`Concluding Remarks .................................................................................. 35
`
`Searching with WebCrawler .................................................................. 84
`How WebCrawler Moves in Webspace ................................................ 85
`Lycos: Hunting WWW Information ............................................................ 89
`Searching with Lycos ............................................................................ 90
`Lycos' Search Space ............................................................................. 91
`Lycos Indexing ...................................................................................... 92
`How Lycos Moves in Webspace ........................................................... 92
`Harvest: Gathering and Brokering Information ........................................... 93
`Searching with Harvest ......................................................................... 94
`Harvest Architecture ............................................................................. 95
`WebAnts: Hunting in Packs ....................................................................... 99
`WebAnts Motivation ........................................................................... 100
`WebAnts Searching and Indexing ....................................................... 100
`Issues of Web Indexing ........................................................................... 100
`Recall and Precision ............................................................................ 101
`Good Web Citizenship ......................................................................... 101
`Performance ........................................................................................ 102
`Scalability ............................................................................................. 102
`Spiders of the Future ............................................................................... 103
`
`5 Web Robots: Operational Guidelines ................................................ 105
`Web Robot Uses ...................................................................................... 106
`Web Resource Discovery .................................................................... 107
`Web Maintenance ............................................................................... 107
`Web Mirroring ..................................................................................... 107
`Proposed Standard for Robot Exclusion ................................................... 108
`Robot Exclusion Method ..................................................................... 108
`Robot Exclusion File Format ................................................................ 109
`Recognized Field Names ..................................................................... 109
`Sample Robot Exclusion Files ............................................................. 110
`The Four Laws of Web Robotics .............................................................. 110
`I. A Web Robot Must Show Identifications ......................................... 111
`II. A Web Robot Must Obey Exclusion Standard ................................ 112
`III. A Web Robot Must Not Hog Resources ........................................ 113
`IV. A Web Robot Must Report Errors .................................................. 115
`The Six Commandments for Robot Operators ........................................... 115
`I. Thou Shalt Announce thy Robot ......................................................... 116
`II. Thou Shalt Test, Test, and Test thy Robot Locally .......................... 117
`III. Thou Shalt Keep thy Robot Under Control ..................................... 118
`IV. Thou Shalt Stay in Contact with the World .................................... 119
`V. Thou Shalt Respect the Wishes of Webmasters ............................ 119
`VI. Thou Shalt Share Results with thy Neighbors ................................ 120
`Robot Tips for Webmasters ..................................................................... 121
`Web Ethics ............................................................................................... 122
`
`WebWalker Usage Examples ................................................................... 161
`Sample WebWalker Output ................................................................ 162
`WebWalker Forms Interface ............................................................... 167
`WebWalker Program Organization ........................................................... 169
`External Library Calls ........................................................................... 169
`WebWalker Program Call-Graph .......................................................... 170
`Configuration Section .......................................................................... 172
`Avoidance Package ............................................................................. 174
`History Package ................................................................................... 175
`Traversal Package ................................................................................ 177
`Summary Package ............................................................................... 178
`Growing into the Future ........................................................................... 181
`
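`Part III: Agents and Money on the Net ................................................. 183
`
`8 Web Transaction Security ............................................................... 185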
`Concepts of Security ................................................................................ 186
`Privacy: Keeping Private Messages Private ........................................ 187
`Authentication: Proving You Are Who You Claim to Be ...................... 188
`Integrity: Ensuring Message Content Remains Unaltered .................. 189
`Brief Tour of Classical Cryptography ........................................................ 189
`The Role of NSA .................................................................................. 190
`Development of Data Encryption Standard (DES) ............................... 190
`Development of Public-Key Cryptography ............................................... 191
`Problems with Secret Keys ................................................................. 191
`Key Management ................................................................................ 192
`The RSA Alternative ............................................................................ 192
`Comparing Secret-Key and Public-Key Cryptography .......................... 193
`Digital Signatures ..................................................................................... 194
`How Digital Signatures Work .............................................................. 194
`The Digital Signature Standard ............................................................ 197
`Key Certification ....................................................................................... 197
`Certifying Authority ............................................................................. 197
`Certificate Format ................................................................................ 198
`Two Approaches to Web Security ............................................................ 198
`Secure Socket Layer (SSL) .................................................................. 200
`Secure HTTP (S-HTTP) ........................................................................ 201
`Current Practice and Future Trend in Web Security ............................ 203
`
`9 Electronic Cash and Payment Services ............................................. 205
`Brief History of Money ............................................................................. 206
`Choice of Payment Methods ................................................................... 207
`What is Digital Cash? ............................................................................... 207
`Digital Cashier's Check ........................................................................ 208
`Anonymous Digital Cash through Blind Signatures ............................. 210
`
`Advanced Viral Techniques ...................................................................... 245
`Encrypted Virus ................................................................................... 245
`Multi-Encrypted Virus .......................................................................... 246
`Instructions Rescheduling ................................................................... 246
`Mutation Engine .................................................................................. 246
`Armored Virus ..................................................................................... 246
`Worms and Viruses Summarized ............................................................. 247
`
`11 MUD Agents and Chatterbots ........................................................... 249
`The Turing Test ........................................................................................ 250
`Eliza: The Mother of All Chatterbots ........................................................ 251
`A Conversation with Eliza .................................................................... 252
`Eliza Internals ...................................................................................... 252
`Parry: The Artificial Paranoia Agent .......................................................... 253
`An Interview with Parry ....................................................................... 254
`Distinguishing Parry from Human Patients ......................................... 255
`MUDs: Virtual Worlds on the Internet ...................................................... 255
`Inside MUDs ....................................................................................... 256
`Sample MUD Interactions ................................................................... 256
`TinyMUDs: Virtual Communities on the Internet ..................................... 266
`Sample TinyMUD Interactions ............................................................ 267
`Social Interactions on TinyMUDs ........................................................ 268
`Colin: The Prototypical MUD Agent ......................................................... 269
`Colin's Information Services ................................................................ 270
`Colin's Mapping Services .................................................................... 271
`Colin's Miscellaneous Services ........................................................... 273
`Julia: Chatterbot with a Personality .......................................................... 273
`An Interaction with Julia ...................................................................... 274
`Inside Julia ........................................................................................... 274
`The Loebner Prize Competition ........................................................... 275
`Julia in the Loebner Prize Competition ................................................ 276
`CHAT: A Knowledgeable Chatterbot ................................................... 278
`Tricks of the Chatterbots .......................................................................... 278
`Eliza's Tricks ........................................................................................ 278
`Parry's Tricks ....................................................................................... 279
`Julia's Tricks ......................................................................................... 279
`Closing Words .......................................................................................... 279
`
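`Part V: Appendices ............................................................................. 281
`
`A HTTP 1.0 Protocol Specifications ..................................................... 283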
`Notational Conventions ............................................................................ 284
`Augmented BNF .................................................................................. 284
`Basic Rules .......................................................................................... 285
`
`G List of World Wide Web Spiders and Robots ...................................... 375
`The JumpStation Robot ........................................................................... 376
`RBSE Spider ............................................................................................. 376
`The WebCrawler ...................................................................................... 376
`The NorthStar Robot ................................................................................ 376
`W4 (World Wide Web Wanderer) ............................................................ 377
`The Fish Search ........................................................................................ 377
`The Python Robot .................................................................................... 377
`HTML Analyzer ......................................................................................... 377
`MOMspider .............................................................................................. 377
`HTMLgobble ............................................................................................ 378
`WWWW-the WORLD WIDE WEB WORM ........................................... 378
`WM32 Robot ............................................................................................ 378
`Websnarf .................................................................................................. 379
`The Webfoot Robot .................................................................................. 379
`Lycos ........................................................................................................ 379
`ASpider (Associative Spider) .................................................................... 379
`SG-Scout .................................................................................................. 379
`EIT Link Verifier Robot ............................................................................. 380
`NHSE Web Forager .................................................................................. 380
`WebLinker ................................................................................................ 380
`Emacs W3 Search Engine ........................................................................ 380
`Arachnophilia ............................................................................................ 381
`Mac WWWWorm .................................................................................... 381
`ChURL ...................................................................................................... 381
`Tarspider .................................................................................................. 381
`The Peregrinator ....................................................................................... 381
`Checkbot .................................................................................................. 382
`Webwalk .................................................................................................. 382
`Harvest ..................................................................................................... 382
`Katipo ....................................................................................................... 382
`InfoSeek Robot ........................................................................................ 383
`GetURL .................................................................................................... 383
`Open Text Corporation Robot .................................................................. 383
`NIKOS ...................................................................................................... 383
`The TkWWW Robot ................................................................................. 384
`A Tel W3 Robot ........................................................................................ 384
`TITAN ....................................................................................................... 384
`CS-HKUST WWW Index Server ............................................................... 384
`Spry Wizard Robot ................................................................................... 384
`Weblayers ................................................................................................ 385
`WebCopy ................................................................................................. 385
`Scooter ..................................................................................................... 385
`Aretha ....................................................................................................... 385
`WebWatch ............................................................................................... 385
`
`The definition of agents, however, usually goes beyond the simple notion of delegated software programs given above. Agent research has drawn on the ideas and results of people from diverse disciplines, including robotics, software engineering, programming languages, computer networks, knowledge engineering, machine learning, cognitive science, psychology, computer graphics, and even art, music, and film. From this diversity of perspectives, not one definition but a rich set of views on agents has emerged.
`
`In addition to being understood as delegated software entities, agents can also be studied along other important dimensions, such as coordination, knowledge, creativity, and emotion. The programming and social aspects of agents are also important considerations. The remaining sections of this chapter explore the concept of agents along these dimensions.
`
`Agents and Delegation
`Agents are primarily human-delegated software
`entities that can perform a variety of tasks for their
`human masters. This section examines their roles
`as personal assistants, desktop agents, surrogate
`bots, and softbots.
`
`Personal Assistants
`Pattie Maes, an assistant professor with the Massachusetts Institute of Technology Media Lab, has been working to create agents that reduce work and information overload for computer users (1994). She believes that as computers and networks begin to reach a larger populace, the current dominant metaphor of direct manipulation (Schneiderman 1988), which requires the user to initiate all tasks explicitly and to monitor all events, might not be the most convenient for many new, untrained users. She favors an alternative, complementary style of interaction called "indirect management" (Kay 1990), which engages the user in a cooperative process with a computer program known as the intelligent personal assistant.
`
`Maes's work has resulted in agents that provide personalized assistance for a variety of tasks, including meeting scheduling, e-mail handling, electronic news filtering, and the selection of books, music, and other forms of entertainment. In the process of constructing such agents, Maes has identified the following two problems:
`
`➔ Competence. How does an agent acquire the knowledge to decide when, with what, and how to help the user?
`
`➔ Trust. How do you ensure that users feel comfortable delegating tasks to an agent?
`
`According to Maes, both problems can be solved with a machine-learning approach, in which the agent learns about its user's habits through interactions over time. Specifically, a learning agent gradually acquires its competence in the following ways:
`
`➔ Observing and imitating the user
`
`➔ Receiving positive and negative feedback from the user
`
`➔ Receiving explicit instructions from the user
`
`➔ Asking other agents for advice
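`
`The four learning channels above can be made concrete with a toy interface agent. The sketch below is a minimal illustration, not Maes's implementation. It assumes that situations and actions are plain strings and that the agent keeps simple per-situation action counts. It also uses two hypothetical confidence thresholds (`tell_me` and `do_it`) to decide whether to ask the user, suggest an action, or act on its own. All class and method names are invented for this example.

```python
from collections import Counter, defaultdict


class LearningAgent:
    """Toy interface agent that learns a user's habits (illustrative sketch)."""

    def __init__(self, tell_me=0.5, do_it=0.8):
        self.tell_me = tell_me  # minimum confidence for suggesting an action
        self.do_it = do_it      # minimum confidence for acting without asking
        self.memory = defaultdict(Counter)  # situation -> counts of user actions

    def observe(self, situation, action):
        """Observing and imitating: remember what the user did."""
        self.memory[situation][action] += 1

    def feedback(self, situation, action, positive):
        """Positive or negative feedback strengthens or weakens an action."""
        counts = self.memory[situation]
        counts[action] += 1 if positive else -1
        if counts[action] <= 0:
            del counts[action]  # forget actions the user has rejected

    def instruct(self, situation, action, weight=5):
        """Explicit instructions count as strong evidence (weight is arbitrary)."""
        self.memory[situation][action] += weight

    def predict(self, situation):
        """Return (most likely action, confidence in [0, 1]) or (None, 0.0)."""
        counts = self.memory.get(situation)
        if not counts:
            return None, 0.0
        action, n = counts.most_common(1)[0]
        return action, n / sum(counts.values())

    def ask_peers(self, peers, situation):
        """Asking other agents: pick the action with the most confident support."""
        votes = Counter()
        for peer in peers:
            action, confidence = peer.predict(situation)
            if action is not None:
                votes[action] += confidence
        return votes.most_common(1)[0][0] if votes else None

    def decide(self, situation):
        """Ask the user, suggest an action, or act, depending on confidence."""
        action, confidence = self.predict(situation)
        if action is None or confidence < self.tell_me:
            return ("ask", None)
        if confidence < self.do_it:
            return ("suggest", action)
        return ("act", action)
```

`Suppose the agent has seen the user respond the same way to a given kind of mail three times out of four. At that confidence (0.75) it would only suggest the action. After two more positive feedbacks (5 of 6, about 0.83) it would act on its own. This mirrors the gradual build-up of trust described above.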
`
`tasks users perform frequently. Application developers implement a defined set of protocols to make their applications agent-aware in the New Wave environment.
`
`A New Wave user can specify routine tasks by demonstration. Say, for example, the user wants to start a database access application, download specific information into a spreadsheet, generate a graph from the spreadsheet data, copy the graph to a text document, and mail it to a group of users. All the user needs to do is turn on the recording feature and perform the desired sequence of actions interactively. The task is represented as a script document on the desktop and can be scheduled for execution using the calendar. The user can also edit the script if needed.
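`
`Programming by demonstration amounts to logging actions while recording is on and replaying the log later. The Python sketch below is a hypothetical model, not the New Wave agent interface. The application names, the `Step` and `Recorder` classes, and the handler functions are all invented for illustration, and the calendar is simply a dictionary that maps start times to scripts.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Step:
    app: str                                  # application that receives the action
    action: str                               # what the user did
    args: dict = field(default_factory=dict)  # parameters of the action


class Recorder:
    """Captures interactive actions as an editable script (toy model only)."""

    def __init__(self):
        self.recording = False
        self.script = []

    def start(self):
        self.recording = True
        self.script = []

    def perform(self, app, action, **args):
        # A real desktop would execute the action here; this sketch only logs it.
        if self.recording:
            self.script.append(Step(app, action, args))

    def stop(self):
        self.recording = False
        return list(self.script)


def replay(script, handlers):
    """Re-run a script by dispatching each step to the handler for its application."""
    return [handlers[step.app](step.action, **step.args) for step in script]


def run_due(calendar, now, handlers):
    """Run (and remove) every scheduled script whose start time has arrived."""
    due = sorted(when for when in calendar if when <= now)
    return [replay(calendar.pop(when), handlers) for when in due]


# Record the example task from the text once, interactively.
rec = Recorder()
rec.start()
rec.perform("database", "query", table="sales")
rec.perform("spreadsheet", "import", source="query result")
rec.perform("spreadsheet", "chart", kind="bar")
rec.perform("editor", "paste", item="chart")
rec.perform("mail", "send", to="group")
script = rec.stop()

script[-1].args["to"] = "managers"  # the recorded script stays user-editable

# Stand-in handlers that just describe what they would do.
handlers = {app: (lambda action, **kw: f"{action} {kw}")
            for app in ("database", "spreadsheet", "editor", "mail")}
calendar = {datetime(1996, 1, 8, 9, 0): script}
results = run_due(calendar, datetime(1996, 1, 8, 9, 30), handlers)
```

`Keeping the recording as data rather than code is what allows the user to edit it without learning a full scripting language.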
`
`The integration of agent functionality into the desktop environment enables users to automate routine and repetitive tasks quite easily. Because tasks can be defined by example, the cognitive overhead of learning a scripting language is substantially reduced: a user need only be familiar enough with the language to make any necessary modifications to scripts. In addition, the calendar on the New Wave desktop provides an intuitive metaphor and a convenient mechanism for scheduling agent tasks.
`
`Surrogate Bots
`Agents can relieve users of low-level administrative and clerical tasks, such as setting up meetings, sending out papers, locating information, and tracking the whereabouts of people. Research scientists Henry Kautz and Bart Selman of AT&T Bell Labs, together with MIT graduate student Michael Coen, have built and tested an agent system consisting of surrogate bots that addresses the real-world problem of handling the communication involved in scheduling a visitor to their laboratory at AT&T Bell Labs (1994).
`
`Kautz, Selman, and Coen have identified reliability, security, and ease of use as important for the successful deployment of agents. Users should be able to assume that the surrogate bots are reliable and predictable, and human users should remain in ultimate control.
`
`They approach the problem in a bottom-up fashion, first identifying specific tasks that are both feasible with current technology and truly useful to everyday users. A set of software surrogate bots is then designed, implemented, and tested with real users.
`
`Visitor Scheduling Bots
`The job of scheduling visitors is quite routine, but it consumes a substantial amount of the host's time. The normal sequence of tasks is as follows:
`
`1. Announce the upcoming visit by e-mail.
`
`2. Collect responses from people who would like
`to meet with the visitor.
`
`3. Put together a schedule that satisfies as many
`constraints as possible.
`
`4. Send out the schedule to participants.
`
`5. Possibly reschedule people at the last minute
`due to unforeseen events.
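`
`Of these steps, step 3 requires the most reasoning. One simple way to approach it, assumed here rather than taken from the visitorbot itself, is a greedy heuristic that places the most constrained people first. Slot labels, names, and function names below are hypothetical. A greedy pass is not guaranteed to seat the largest possible number of people; an exact answer would call for bipartite matching.

```python
def schedule_visit(slots, requests):
    """Greedy sketch of step 3: give each requester one free slot they can attend.

    slots    -- the visitor's open meeting slots, in time order
    requests -- mapping person -> set of slots that person can attend
    Returns (schedule, unscheduled), where schedule maps slot -> person.
    """
    schedule, unscheduled = {}, []
    # Most constrained first: people with the fewest acceptable slots choose
    # before flexible people (ties broken by name so results are repeatable).
    for person in sorted(requests, key=lambda p: (len(requests[p]), p)):
        free = [s for s in slots if s in requests[person] and s not in schedule]
        if free:
            schedule[free[0]] = person  # take the earliest free slot
        else:
            unscheduled.append(person)
    return schedule, unscheduled


def reschedule(slots, requests, lost_slot):
    """Step 5: a slot is lost at the last minute, so rebuild the schedule."""
    return schedule_visit([s for s in slots if s != lost_slot], requests)


def announcement(visitor, schedule, slots):
    """Step 4: the text mailed to participants, listed in slot order."""
    rows = [f"{slot}  {schedule[slot]}" for slot in slots if slot in schedule]
    return f"Schedule for {visitor}:\n" + "\n".join(rows)
```

`With slots at 09:00, 10:00, 11:00, and 14:00, a person who can only attend 10:00 is placed before someone who accepts three different times, which keeps the flexible person from taking the only slot the constrained one could use.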
`
`In their agent system, a specialized surrogate bot,
`the visitorbot, handles the visitor scheduling. For
`each individual user, there is a userbot whose job
`
`The Internet Softbot uses a Unix shell and the World Wide Web to interact with a wide range of Internet resources. Softbot sensors are analogous to whiskers on a physical insect robot and include Internet facilities such as archie, gopher, netfind, and others. Softbot effectors are analogous to the mechanical arms and legs on a physical robot and include ftp, telnet, mail, and numerous file manipulation commands. The softbot is designed to incorporate new sensor and effector facilities into its repertoire of Internet-based tools as they become available.
`
`According to Etzioni and Weld, the softbot supports
`a qualitatively different kind of human-computer
`interface. In addition to simply allowing the user to
`interact with the computer, the softbot behaves like
`an intelligent personal assistant. The user can make
`a high-level request, and the softbot uses search,
`inference, and knowledge to determine how to
`satisfy the request. Furthermore, the softbot is
`designed to be robust enough that it can tolerate
`and recover from ambiguity, omissions, and errors
`in human requests.
`
`Softbot Planner
`The planning component of the softbot is called the softbot planner. It takes as input a logical expression that describes the user's goal as a sentence in first-order predicate logic. For users unfamiliar with logical expressions, a graphical fill-in form that automatically translates to a softbot goal is available.
`
`After searching a library of action schemata describing available information sources, databases, utilities, and software commands, the planner generates a sequence of actions to achieve the goal. The softbot planner is able to decompose complex goal expressions into simpler components and solve them with divide-and-conquer techniques. Interactions between subgoals, which are usually problematic, are automatically detected and resolved.
`
`The softbot planner relies on a logical model of the available Internet resources that tells it how these resources can be invoked or accessed, as well as the effect of doing so. Unlike traditional programs and scripts, which are committed to a rigid flow of control determined by the programmer when the program was coded, the softbot planner synthesizes plans on demand when the program is run, based on the user's goal. In the words of Etzioni and Weld, a softbot "is worth a thousand shell scripts."
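`
`The idea of action schemata with preconditions and effects can be illustrated with a much simpler planner than the softbot's. The sketch below is a plain breadth-first, STRIPS-style search over sets of facts. It does not perform the goal decomposition or subgoal-interaction handling described above. The fact strings and action names are hypothetical, loosely modeled on the memo example that follows.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """An action schema: what must hold before, and what changes after."""
    name: str
    pre: frozenset                   # facts required before the action runs
    add: frozenset                   # facts the action makes true
    delete: frozenset = frozenset()  # facts the action makes false


def plan(state, goal, actions, max_depth=8):
    """Breadth-first search for a shortest action sequence reaching every goal fact."""
    start = frozenset(state)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        current, steps = frontier.popleft()
        if set(goal) <= current:
            return steps
        if len(steps) >= max_depth:
            continue
        for action in actions:
            if action.pre <= current:
                nxt = (current - action.delete) | action.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [action.name]))
    return None  # no plan found within max_depth


# Hypothetical schemata loosely modeled on the memo example.
ACTIONS = [
    # A sensing action (compare finger): it only adds knowledge.
    Action("lookup-address", frozenset(), frozenset({"known(address)"})),
    Action("convert-to-postscript", frozenset({"have(memo.tex)"}),
           frozenset({"have(memo.ps)"})),
    # An effecting action (compare mail): it changes the world.
    Action("send-mail", frozenset({"known(address)", "have(memo.ps)"}),
           frozenset({"sent(memo)"})),
]
```

`Starting from a state containing only `have(memo.tex)` with the goal `sent(memo)`, the search finds the sequence lookup-address, convert-to-postscript, send-mail. Because the plan is computed from the goal at run time, changing the goal or the available schemata changes the plan without editing any script.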
`
`Example Softbot Usage
`With the Internet Softbot, for example, a user can easily perform the task of "sending the budget memos to Mitchell at CMU" (see fig. 1.5).
`
`The softbot first disambiguates the reference to Mitchell at CMU by executing the command finger mitchell@cmu.edu and recording who the various Mitchells at CMU are. If necessary, it prompts the user to select the intended recipient. If it decides to send the memos, the softbot determines the correct e-mail address and reasons about the document format (for example, PostScript if it contains figures and LaTeX source otherwise). Furthermore, if