1997 International Conference on Intelligent User Interfaces

Editors
Johanna Moore, Papers & Program Chair
Ernest Edmonds, Panels
Angel Puerta, Papers & Debate

Sponsored by:
ACM SIGART - Special Interest Group on Artificial Intelligence
ACM SIGCHI - Special Interest Group on Computer-Human Interaction

The Association for Computing Machinery
1515 Broadway
New York, N.Y. 10036

Copyright © 1997 by the Association for Computing Machinery, Inc. (ACM). Permission to make digital or hard copies of all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers or to redistribute to lists requires prior specific permission and/or a fee. Request permission to republish from: Publications Dept. ACM, Inc. Fax +1 (212) 869-0481 or <permissions@acm.org>

For other copying of articles that carry a code at the bottom of the first or last page, copying is permitted provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923.

ACM ISBN: 0-89791-839-8

Additional copies may be ordered prepaid from:

ACM Order Department
PO Box 12114
Church Street Station
New York NY 10257
Phone: 1-800-342-6626 (USA and Canada)
       +1-212-944-1318
Fax: +1-212-944-1318
E-mail: acmpubs@acm.org

ACM European Service Center
108 Cowley Road
Oxford OX4 1JF UK
Phone: +44-1-865-382338
Fax: +44-1-865-381338
E-mail: acm_europe@acm.org
URL: http://www.acm.org

ACM Order Number: 608970

Printed in the U.S.A.

Multimodal User Interfaces in the Open Agent Architecture

Douglas B. Moran
Adam J. Cheyer
Luc E. Julia
David L. Martin
SRI International
333 Ravenswood Avenue
Menlo Park CA 94025 USA
+1 415 859 6486
{moran,cheyer,julia,martin}@ai.sri.com

Sangkyu Park
Artificial Intelligence Section
Electronics and Telecommunications Research Institute (ETRI)
161 Kajong-Dong
Yusong-Gu, Taejon 305-350 KOREA
+82 42 860 5641
skpark@com.etri.re.kr

ABSTRACT
The design and development of the Open Agent Architecture (OAA)¹ system has focused on providing access to agent-based applications through an intelligent, cooperative, distributed, and multimodal agent-based user interface. The current multimodal interface supports a mix of spoken language, handwriting and gesture, and is adaptable to the user's preferences, resources and environment. Only the primary user interface agents need run on the local computer, thereby simplifying the task of using a range of applications from a variety of platforms, especially low-powered computers such as Personal Digital Assistants (PDAs). An important consideration in the design of the OAA was to support mix-and-match: the reuse of agents in new and unanticipated applications, and rapid prototyping in which agents are replaced by improved versions.
The utility of the agents and tools developed as part of this ongoing research project has been demonstrated by their use as infrastructure in unrelated projects.
Keywords: agent architecture, multimodal, speech, gesture, handwriting, natural language

INTRODUCTION
A major component of our research on multiagent systems is in the user interface to large communities of agents. We have developed agent-based multimodal user interfaces using the same agent architecture used to build the back ends of these applications. We describe these interfaces and the larger architecture, and outline some of the applications that have been built using this architecture and interface agents.

Permission to make digital/hard copies of all or part of this material for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copyright is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires specific permission and/or fee.
IUI 97, Orlando Florida USA
© 1997 ACM 0-89791-839-8/96/01 ... $3.50

OVERVIEW OF OPEN AGENT ARCHITECTURE
The Open Agent Architecture (OAA) is a multiagent system that focuses on supporting the creation of applications from agents that were not designed to work together, thereby facilitating the wider reuse of the expertise embodied by an agent. Part of this focus is the user interface to these applications, which can be viewed as supporting the access of human agents to the automated agents. Key attributes of the OAA are

• Open: The OAA supports agents written in multiple languages and on multiple platforms. Currently supported languages are C, Prolog, Lisp, Java, Microsoft's Visual Basic and Borland's Delphi. Currently supported platforms are PCs (Windows 3.1 and 95), Sun workstations (Solaris 1.1 and 2.x) and SGIs.

• Distributed: The agents that compose an application can run on multiple platforms.

• Extensible: Agents can be added to the system while it is running, and their capabilities will become immediately available to the rest of the agents. Similarly, agents can be dynamically removed from the system (intentionally or not).

• Mobile: OAA-based applications can be run from a lightweight portable computer (or PDA) because only the user interface agents need run on the portable. They provide the user with access to a range of agents running on other platforms.

• Collaborative: The user interface is implemented with agents, and thus the user appears to be just another agent to the automated agents. This greatly simplifies creating systems where multiple humans and automated agents cooperate.

• Multiple Modalities: The user interface supports handwriting, gesture and spoken language in addition to the traditional graphical user interface modalities.

• Multimodal Interaction: Users can enter commands with a mix of modalities, for example, a spoken command in which the object to be acted on is identified by a pen gesture (or other graphical pointing operation).

The OAA has been influenced by work being done as part of DARPA's I3 (Intelligent Integration of Information) program (http://isx.com/pub/I3) and the Knowledge Sharing Effort (http://www-ksl.stanford.edu/knowledge-sharing/) [13].

THE USER INTERFACE
The User Interface Agent
The user interface is implemented with a set of agents that have at their logical center an agent called the User Interface (UI) Agent. The User Interface Agent manages the various modalities and applies additional interpretation to those inputs as needed. Our current system supports speech, handwriting and pen-based gestures in addition to the conventional keyboard and mouse inputs. When speech input is detected, the UI Agent sends a command to the Speech Recognition agent to process the audio input and to return the corresponding text. Three modes are supported for speech input: open microphone, push-to-talk, and click-to-start-talking. Spoken and handwritten inputs can be treated as either raw text, or interpreted by a natural language understanding agent.
There are two basic styles of user interface. The first style parallels the traditional graphical user interface (GUI) for an application: The user selects an application and is presented with a window that has been designed for the application implemented by that agent and that is composed of the familiar GUI-style items. In this style of interface, the application is typically implemented as a primary agent, with which the user interacts, and a number of supporting agents that are used by the primary agent, and whose existence is hidden from the user. When text entry is needed, the user may use handwriting or speech instead of the keyboard, and the pen may be used as an alternative to the mouse. Because the UI Agent handles all the alternate modalities, the applications are isolated from the details of which modalities are being used. This simplifies the design of the applications, and simplifies adding new modalities.
In the second basic style of interface, not only is there no primary agent, the individual agents are largely invisible to the user, and the user's requests may involve the cooperative actions of multiple agents. In the systems we have implemented, this interface is based on natural language (for example, English), and is entered with either speech or handwriting. When the UI Agent detects speech or pen-based input, it invokes a speech recognition agent or handwriting recognition agent, and sends the text returned by that agent to a natural language understanding agent, which produces a logical form representation of the user's request. This logical form is then passed to a Facilitator agent, which identifies the subtasks and delegates them to the appropriate application agents. For example, in our Map-based Tourist Information application for the city of San Francisco, the user can ask for the distance between a hotel and a sightseeing destination. The locations of the two places are in different databases, which are managed by different agents, and the distance calculation is performed by yet another agent.
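To make this flow concrete, the following is a minimal Prolog sketch of the second interface style: a recognized utterance is parsed into a logical form, which is then decomposed into subtasks routed to agents that registered matching capabilities. All predicate names, terms and data here are invented for illustration and are not the actual OAA or ICL interfaces.

    % Illustrative sketch only: invented predicates, not actual OAA/ICL code.

    % Canned "speech recognition": an audio token maps to recognized words.
    recognize_speech(audio1, [show, distance, from, hotel1, to, dest1]).

    % Toy "natural language understanding": word list to logical form.
    parse([show, distance, from, X, to, Y], show(distance(X, Y))).

    % Capabilities as a Facilitator-like registry might record them.
    capability(hotel_db,  location(hotel1, _)).
    capability(sight_db,  location(dest1, _)).
    capability(geo_agent, distance(_, _, _)).

    % Decompose a complex request into subtasks, each routed to an agent
    % that registered a matching capability.
    decompose(show(distance(X, Y)),
              [task(A1, location(X, _)),
               task(A2, location(Y, _)),
               task(A3, distance(X, Y, _))]) :-
        capability(A1, location(X, _)),
        capability(A2, location(Y, _)),
        capability(A3, distance(_, _, _)).

    % ?- recognize_speech(audio1, Words), parse(Words, LF), decompose(LF, Tasks).

In this sketch the decomposition is hard-wired for one request shape; in the system described here that decision is made by the Facilitator at run time from the registered capabilities.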
These two basic styles of interfaces can be combined in a single interface. In our Office Assistant application, the user is presented with a user interface based on the Rooms metaphor and is able to access conventional applications such as e-mail, calendar, and databases in the familiar manner. In addition there is a subwindow for spoken or written natural language commands that can involve multiple agents.
A major focus of our research is multimodal inputs, typically a mix of gesture/pointing with spoken or handwritten language. The UI agent manages the interpretation of the individual modalities and passes the results to a Modality Coordination agent, which returns the composite query, which is then passed to the Facilitator agent for delegation to the appropriate application agents (described in subsequent sections).

Speech Recognition
We have used different speech recognition systems, substituting to meet different criteria. We use research systems developed by another laboratory in our organization (http://www-speech.sri.com/) [3] and by a commercial spin-off from that laboratory.² We are currently evaluating other speech recognizers, and will create agents to interface to their application programming interfaces (APIs) if they satisfy the requirements for new applications being considered.

Natural Language Understanding
A major advantage of using an agent-based architecture is that it provides simple mix-and-match for the components. In developing systems, we have used three different natural language (NL) systems: a simple one, based on Prolog DCG (Definite Clause Grammar), then an intermediate one, based on CHAT [16], and finally, our most capable research system, GEMINI [6, 7]. The ability to trivially substitute one natural language agent for another has been very useful in rapid prototyping of systems. The DCG-based agent is used during the early stages of development because grammars are easily written and modified. Writing grammars for the more sophisticated NL agents requires more effort, but provides better coverage of the language that real users are likely to use, and hence we typically delay upgrading to the more sophisticated agents until the application crosses certain thresholds of maturity and usage.
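As an illustration of why the DCG-based agent is quick to build, the fragment below is a toy Definite Clause Grammar in Prolog that maps two commands onto logical forms; the vocabulary and logical-form constructors are invented for this example and do not reflect the project's actual grammars.

    % Toy DCG sketch; grammar and logical forms are invented for illustration.
    command(show(photo(Ref)))     --> [show, me, the, photo, of], object(Ref).
    command(show(distance(A, B))) --> [show, me, the, distance, from], object(A), [to], object(B).

    object(deictic(this, hotel)) --> [this, hotel].
    object(named(N))             --> [N], { atom(N) }.

    % ?- phrase(command(LF), [show, me, the, photo, of, this, hotel]).
    % LF = show(photo(deictic(this, hotel))).

Extending coverage is a matter of adding rules of this kind, which is what makes the DCG agent convenient during early development.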

¹Open Agent Architecture and OAA are trademarks of SRI International. Other brand names and product names herein are trademarks and registered trademarks of their respective holders.
²Nuance Corporation (formerly Corona Corp.), Building 110, 333 Ravenswood Avenue, Menlo Park, CA 94025 (domain: coronacorp.com)

Pen Input
We have found that including a pen in the user interface has several significant advantages. First, the gestures that users employ with a pen-based system are substantially richer than those employed by other pointing and tracking systems (e.g., a mouse). Second, handwriting is an important adjunct to spoken language. Speech recognizers (including humans) can have problems with unfamiliar words (e.g., new names). Users can use the pen to correct misspelled words, or may even anticipate the problem and switch from speaking to handwriting. Third, our personal experience is that when a person who has been using a speech-and-gesture interface faces an environment where speech is inappropriate, replacing speech with handwriting is more natural.
Using 2D gestures in human-computer interaction holds promise for recreating the pen-and-paper situation where the user is able to quickly express visual ideas while she or he is using another modality such as speech. However, to successfully attain a high level of human-computer cooperation, the interpretation of on-line data must be accurate and fast enough to give rapid and correct feedback to the user.
The gesture-recognition engine used in our application is fully described in [9] as the early recognition process. There is no constraint on the number of strokes. The latest evaluations gave better than 96% accuracy, and the recognition was performed in less than half a second on a PC 486/50, satisfying what we judge is required in terms of quality and speed.
In most applications, this engine shares pen data with a handwriting recognizer. The use of the same medium to handle two different modalities is a source of ambiguities, which are resolved by a competition between the two recognizers to determine whether the user wrote (a sentence or a command) or produced a gesture. A remaining problem is handling mixed input, where the user draws and writes in the same set of strokes.
The main strength of the gesture-recognition engine is its adaptability and reusability. It allows the developer to easily define the set of gestures appropriate to the application. Each gesture is described with a set of parameters, such as the number of directions, the presence of a broken segment, and so forth. Adding a new gesture consists of finding the description for each parameter. If a conflict appears with an existing object, the discrimination is done by creating a new parameter. For a given application, as few as four parameters are typically required to describe and discriminate the set of gestures.
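As a rough illustration of this parameter-based description, the sketch below encodes a few gestures as Prolog facts and classifies a stroke by matching its measured parameters; the parameter names and values are invented here, and the engine's actual parameter set is the one described in [9].

    % Invented parameter names and values, for illustration only.
    % gesture(Name, NumDirections, HasBrokenSegment, IsClosed).
    gesture(arrow,       2, yes, no).
    gesture(circle,      4, no,  yes).
    gesture(scratch_out, 6, yes, no).

    % A stroke is classified by matching its measured parameters against the
    % stored descriptions; a conflict is resolved by adding a new parameter.
    classify(Directions, Broken, Closed, Name) :-
        gesture(Name, Directions, Broken, Closed).

    % ?- classify(4, no, yes, G).   % G = circle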
We can use any handwriting recognizer compatible with Microsoft's PenWindows.³

Modality Coordination Agent
Our interface supports a rich set of interactions between natural language (spoken, written, or typed) and gesturing (e.g., pointing, circling), much richer than that seen in the put-that-there systems. Deictic words (e.g., this, them, here) can be used to refer to many classes of objects, and also can be used to refer to either individuals or collections of individuals.
The Modality Coordination (MC) agent is responsible for combining the inputs in the different modalities to produce a single meaning that matches the user's intention. It is responsible for resolving references, for filling in missing information for an incoming request, and for resolving ambiguities by using contexts, equivalence or redundancy.
Taking contexts into account implies establishing a hierarchy of rules between them. The importance of each context and the hierarchy may vary during a single session. In the current system, missing information is extracted from the dialogue context (no graphical or interaction context is used).
When the user says "Show me the photo of this hotel" and simultaneously points with the pen to a hotel, the MC agent resolves references based on that gesture. If no hotel is explicitly indicated, the MC agent searches the conversation context for an appropriate reference (for example, the hotel may have been selected by a gesture in the previous command). If there is no selected hotel in the current context, the MC Agent will wait a certain amount of time (currently 2 to 3 seconds) before asking the user to identify the hotel intended. This short delay is designed to accommodate different synchronizations of speech and gesture: different users (or a single user in different circumstances) may point before, during or just after speaking.
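A minimal Prolog sketch of this resolution strategy is given below, under invented predicate names: an explicit gesture is preferred, then the dialogue context, and only after a short wait for a late gesture is the user asked.

    % Hypothetical sketch of the MC agent's fallback chain; names are invented.
    resolve_referent(Gestures, _Context, Hotel) :-
        member(points_at(Hotel), Gestures), !.            % 1. explicit gesture
    resolve_referent(_Gestures, Context, Hotel) :-
        member(in_focus(Hotel), Context), !.              % 2. dialogue context
    resolve_referent(_Gestures, _Context, Hotel) :-
        wait_for_late_gesture(2.5, LateGestures),         % 3. wait 2-3 seconds
        (   member(points_at(Hotel), LateGestures) -> true
        ;   ask_user('Which hotel do you mean?', Hotel)   % 4. ask the user
        ).

    % Stand-ins for real timing and dialogue machinery.
    wait_for_late_gesture(Seconds, []) :- sleep(Seconds).
    ask_user(Prompt, Answer) :- format("~w~n", [Prompt]), read(Answer).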
In another example, the user says "Show me the distance from the hotel to here" while pointing at a destination. The previous queries have resulted in a single hotel being focused upon, and the MC agent resolves "the hotel" from this context.⁴ The gesture provides the MC agent with the referent of "here". Processing the resulting query may involve multiple agents; for example, the locations of hotels and sightseeing destinations may well be in different databases, and these locations may be expressed in different formats, requiring another agent to resolve the differences and then compute the distance.

Flexible Sets of Modalities
The OAA allows the user maximum flexibility in what modalities will be used. Sometimes, the user will be on a computer that does not support the full range of modalities (e.g., no pen or handwriting recognition). Sometimes, the user's environment limits the choice of modalities: for example, spoken commands are inappropriate in a meeting where someone else is speaking, whereas in a moving vehicle, speech is likely to be more reliable than handwriting. And sometimes, the user's choice of modalities is influenced by the data being entered [14].
With this flexibility, the telephone has become our low-end user interface to the system.

³Our preferred recognizer is Handwriter for Windows from Communication Intelligence Corp (CIC) of Redwood City, CA.
⁴User feedback about which items are in focus (contextually) is provided by graphically highlighting them.

For example, we can use the telephone to check on our appointments, and we use the telephone to notify us of the arrival and content of important e-mail when we are away from our computers.
This flexibility has also proven quite advantageous in accommodating hardware failure. For example, moving the PC for one demonstration of the system shook loose a connection on the video card. The UI agent detected that no monitor was present, and used the text-to-speech agent to generate the output that was normally displayed graphically.
In another project's demonstration (CommandTalk), the designated computer was nonfunctional, and an underpowered computer had to be substituted. Using the OAA's innate capabilities, the application's components were distributed to other computers on the net. However, the application had been designed and tested using the microphone on the local computer, and the substitute had none. The solution was to add the Telephone agent that had been created for other applications: it automatically replaced the microphone as the input to the speech recognizer.

Learning the System
One of the well-known problems with systems that utilize natural language is in communicating to the user what can and cannot be said. A good solution to this is an open research problem. Our approach has been to use the design of the GUI to help illustrate what can be said: all the simple operations can also be invoked through traditional GUI items, such as menus, that cover much of the vocabulary.

OAA AGENTS
Overview
OAA agents communicate with each other in a high-level logical language called the Interagent Communication Language (ICL). ICL is similar in style and functionality to the Knowledge Query and Manipulation Language (KQML) of the DARPA Knowledge Sharing Effort. The differences are a result of our focus on the user interface: ICL was designed to be compatible with the output of our natural language understanding systems, thereby simplifying the transformation of a user's query or command into one that can be handled by the automated agents.
We have developed an initial set of tools (the Agent Development Toolkit) to assist in the creation of agents [11]. These tools guide the developer through the process, and automatically generate code templates from specifications (in the style of various commercial CASE tools). These tools are implemented as OAA agents, so they can interact with, and build upon, existing agents. The common agent support routines have been packaged as libraries, with coordinated libraries for the various languages that we support.⁵
These tools support building both entirely new agents and creating agents from existing applications, including legacy systems. These latter agents are called wrappers (or transducers); they convert between ICL and the application's API (or other interface if there is no API).
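The sketch below illustrates the wrapper idea in Prolog under invented names: an ICL-style request is translated into a call on a legacy application's native interface and the result is converted back; neither the request terms nor the legacy predicate are actual OAA code.

    % Illustrative wrapper (transducer); all names are invented.
    handle_request(get_appointments(User, Date, Appointments)) :-
        legacy_calendar_query(User, Date, Raw),      % call the legacy interface
        maplist(raw_to_icl, Raw, Appointments).      % convert results to ICL-style terms

    % Stand-in for the legacy system's native API.
    legacy_calendar_query(alice, date(1997, 1, 8),
                          [row("14:00", "design review", "conference room A")]).

    raw_to_icl(row(Time, What, Where), appointment(Time, What, Where)).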

The Facilitator Agent
In the OAA framework, the Facilitator agents play a key role. When an agent is added to the application, it registers its capabilities with the Facilitator. Part of this registration is the natural language vocabulary that can be used to talk about the tasks that the agent can perform. When an agent needs work done by other agents within the application, it sends a request to the Facilitator, which then delegates it to an agent, or agents, that have registered that they can handle the needed tasks. The ability of the Facilitator to handle complex requests from agents is an important attribute of the OAA design. The goal is to minimize the information and assumptions that the developer must embed in an agent, thereby making it easier to reuse agents in disparate applications.
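A minimal sketch of this registration-and-delegation cycle, with agents added and removed while the system runs, might look as follows in Prolog; the predicate names are invented and do not reflect the actual ICL registration calls.

    % Invented predicates illustrating Facilitator-style registration.
    :- dynamic can_solve/2.

    register_agent(Agent, GoalPattern) :- assertz(can_solve(Agent, GoalPattern)).
    unregister_agent(Agent)            :- retractall(can_solve(Agent, _)).

    % Delegation: find every registered agent whose pattern covers the goal.
    delegate(Goal, Agents) :-
        findall(A, (can_solve(A, Pattern), subsumes_term(Pattern, Goal)), Agents).

    % ?- register_agent(mail_agent, send_message(_, _)),
    %    delegate(send_message(bob, hello), Who).
    % Who = [mail_agent].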
The OAA supports direct communication between application agents, but this has not been heavily utilized in our implementations because our focus has been on aspects of applications in which the role of the Facilitator is crucial. First, we are interested in user interfaces that support interactions with the broader community of agents, and the Facilitator is key to handling complex queries. The Facilitator (and supporting agents) handle the translation of the user's model of the task into the system model (analogous to how natural language interfaces to databases handle transforming the user's model into the database's schemas). Second, the Facilitator simplifies reusing agents in new applications. If a community of agents is assembled using agents acquired from other communities, those agents cannot be assumed to all make atomic requests that can be handled by other agents: simple requests in one application may be implemented by a combination of agents in another application. The Facilitator is responsible for decomposing complex requests and translating the terminology used. This translation is typically handled by delegating it to another agent.
In the OAA, the Facilitator is a potential bottleneck if there is a high volume of communication between the agents. Our focus has been on supporting a natural user interface to a very large community of intelligent agents, and this environment produces relatively low volume through the Facilitator. In the CommandTalk application (discussed later), the multiagent system is actually partitioned into two communities: the user interface and the simulator. The simulator has very high-volume interaction and a carefully crafted communication channel, and appears as a single agent to the Facilitator and the user interface agents.

Triggers
In an increasing variety of conventional applications, users can set triggers (also called monitors, daemons or watchdogs) to take specific action when an event occurs.

⁵A release of a version of this software is planned. The announcement will appear on http://www.ai.sri.com/~oaa/.

However, the possible actions are limited to those provided in that application. The OAA supports triggers in which both the condition and action parts of a request can cover the full range of functionality represented by the agents dynamically connected to the network.
In a practical real-world example, one of the authors successfully used agent triggers to find a new home. The local rental housing market is very tight, with all desirable offerings being taken immediately. Thus, you need to be among the first to respond to a new listing. Several of the local newspapers provide on-line versions of their advertisements before the printed versions are available, but there is considerable variability in when they actually become accessible. To automatically check for suitable candidates, the author made the following request to the agent system: "When a house for rent is available in Menlo Park for less than 1800 dollars, notify me immediately." This natural language request installed a trigger on an agent knowledgeable about the domain of World Wide Web sources for house rental listings. At regular intervals, the agent instructs a Web retrieval agent to scan data from three on-line newspaper databases. When an advertisement meeting the specified criteria is detected, a request is sent to the Facilitator for a notify action to be delegated to the appropriate other agents.
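The trigger mechanism can be pictured with the Prolog sketch below, in which the condition part is checked against newly retrieved listings and the action part is fired when it matches; the listing format and predicate names are invented, not the system's actual trigger syntax.

    % Invented trigger sketch: condition and action stored together.
    :- dynamic trigger/2.

    % Install the trigger the natural language request would produce.
    install_rental_trigger :-
        assertz(trigger(rental(menlo_park, Price, Ad),
                        (Price < 1800 -> notify(Ad) ; true))).

    % Periodic check: fire any trigger whose condition matches a fresh listing.
    check_listings(Listings) :-
        forall(( member(rental(City, Price, Ad), Listings),
                 trigger(rental(City, Price, Ad), Action) ),
               call(Action)).

    notify(Ad) :- format("Delegating notification for: ~w~n", [Ad]).

    % ?- install_rental_trigger, check_listings([rental(menlo_park, 1750, ad42)]).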
The notify action involves a complex series of interactions between several agents, coordinated by the Notify and Facilitator agents. For example, if the user is in a meeting in a conference room, the Notify agent first determines his current location by checking his calendar (if no listing is found, the default location is his office, which is found from another database). The Notify agent then requests contact information for the conference room, and finds only a telephone number. Subsequent requests create a spoken version of the advertisement and retrieve the user's confirmation password. When all required information is collected, the Facilitator contacts the Telephone agent with a request to dial the telephone, ask for the user, confirm his identity with the password (entered by TouchTone), and finally play the message. Other media, including FAX, e-mail and pager, can be considered by the Notify agent if agents for handling these services happen to be connected to the network.

DISTRIBUTED SYSTEMS
Multiple Platforms
The OAA applications that we have implemented run on a variety of platforms, and the exact location of individual agents is easily changed. We currently support PCs (Windows 3.1 and 95) and Sun and SGI workstations. Our primary user interface platform is the PC, partly because it currently offers better support for pen-based computing and partly because of our emphasis on providing user interfaces on lightweight computers (portable PCs and PDAs in the near future). PCs also have the advantage of mass-market GUI-building packages such as Visual Basic and Delphi. A lesser version of the user interface has been implemented under X for UNIX workstations.

Even when the UI is on a PC, some of the agents in the UI package are running elsewhere. Our preferred speech recognizer requires a UNIX workstation, and our natural language agents and Modality Coordination agent have been written for UNIX systems.

Mobile Computing
We view mobile computing not only as people moving about with portable computers using wireless communication, but also as people moving between computers. Today's user may have a workstation in his office, a personal computer at home, and a portable or PDA for meetings. In addition, when the user meets with management, colleagues and customers ("customers" in the broad sense of the people who require his services), their computers may be different platforms. From each of these environments, the user should be able to access his data and run his applications.
The OAA facilitates supporting multiple platforms because only the primary user interface agents need to be running on the local computer, thereby simplifying the problem of porting to new platforms and modality devices. Also, since only a minimal set of agents need to be run locally, lightweight computers (portables, PDAs, and older systems) have the resources needed to be able to utilize heavyweight, resource-hungry applications.

COLLABORATION
One of the major advantages of having an agent-based interface to a multiagent application is that it greatly simplifies the interactions between the user and the application: application agents may interact with a human in the same way they interact with any other agent.
This advantage is readily seen when building collaborative systems. Perhaps the simplest form of collaboration is to allow users to share input and output to each other's applications. This form of cooperation is inherent in the design of the OAA: it facilitates the interoperation of software developed by distributed communities, especially disparate user communities (different platforms, different conventions).
We are currently integrating more sophisticated styles of collaboration into the OAA framework, using the synchronous collaborative technology [5] built by another group within our organization. In the resulting systems, humans can communicate with agents, agents can work with other automated agents, and humans can interact in real time with other human users.

APPLICATIONS AND REUSE
Two applications, the Office Assistant and Map-based Tourist Information, have been the primary experimental environments for this research project. The agent architecture and the specific agents developed on this research project have proved to be so useful that they are being used by an expanding set of other projects within our organization. These other internal projects are helping us improve the documentation and packaging of our toolkits and libraries, and we are hoping to release a version in the near future.
Some of the projects adopting the OAA have been motivated by the availability of various agents, especially the user interface agents. Some projects have gone further and used the OAA to integrate the major software components being developed on those projects.

Office Assistant
The OAA has been used as the framework for a number of applications in several domain areas. In the first OAA-based system, a multifunctional "office assistant", fourteen autonomous agents provide information retrieval and communication services for a group of coworkers in a networked computing environment [4]. This system makes use of a multimodal user interface running on a pen-enabled portable PC, and allows for the use of a telephone to give spoken commands to the system. Services are provided by agents running on UNIX workstations, many of which were created by providing agent wrappers for legacy applications.
In a typical scenario, agents with expertise in e-mail processing, text-to-speech translation, notification planning, calendar and database access, and telephone control cooperate to find a user and alert him or her of an important message. The office assistant system provides a compelling demonstration of how new services can arise from the synergistic combination of the capabilities of components that were originally intended to operate in isolation. In addition, as described earlier, it demonstrates the combination of two basic styles of user interaction, one that directly involves a particular agent as the primary point of contact and one that anonymously delegates requests across a collection of agents, in a way that allows the user to switch freely between the two.
In the interface for this system, the initial screen portrays an office, in which familiar objects are associated with the appropriate functionality, as provided by some agent. For instance, clicking on a wall clock brings up a dialogue that allows one to interact with the calendar agent (that is, browsing and editing one's appointments). In this style of interaction, even though the calendar agent may call on other agents in responding to some request, it has primary responsibility, in that all requests through that dialogue are handled by it.
The alternative style of interaction is one in which the user might say "Where will I be at 2:00 this afternoon?". In this case, the delegation of the request to the appropriate agents, which is done by the User Interface agent in concert with a Facilitator agent, reflects a style that is less direct and more anonymous.

Map-based Tourist Information
In a number of domains, access