Lecture Notes in Artificial Intelligence 1374

Subseries of Lecture Notes in Computer Science
Edited by J. G. Carbonell and J. Siekmann

Lecture Notes in Computer Science
Edited by G. Goos, J. Hartmanis and J. van Leeuwen
Springer
Berlin Heidelberg New York Barcelona Budapest Hong Kong
London Milan Paris Santa Clara Singapore Tokyo
Harry Bunt, Robbert-Jan Beun, Tijn Borghuis (Eds.)

Multimodal Human-Computer Communication

Systems, Techniques, and Experiments
Volume Editors

Harry Bunt
Tilburg University
Warandelaan 2, 5000 LE Tilburg, The Netherlands
E-mail: bunt@kub.nl

Robbert-Jan Beun
Center for Research on User-System Interaction (IPO)
P.O. Box 513, 5600 MB Eindhoven, The Netherlands
E-mail: rjbeun@ipo.tue.nl

Tijn Borghuis
Eindhoven University of Technology
P.O. Box 513, 5600 MB Eindhoven, The Netherlands
E-mail: tijn@win.tue.nl

Cataloging-in-Publication Data applied for

Die Deutsche Bibliothek - CIP-Einheitsaufnahme
Multimodal human computer communication : systems, techniques, and experiments / Harry Bunt ... (ed.). - Berlin ; Heidelberg ; New York ; Barcelona ; Budapest ; Hong Kong ; London ; Milan ; Paris ; Santa Clara ; Singapore ; Tokyo : Springer, 1998
(Lecture notes in computer science ; Vol. 1374 : Lecture notes in artificial intelligence)
ISBN 3-540-64380-X

CR Subject Classification (1991): I.2, H.5.1-2, I.3.6, D.2.2, K.4.2

ISSN 0302-9743
ISBN 3-540-64380-X Springer-Verlag Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

(c) Springer-Verlag Berlin Heidelberg 1998
Printed in Germany

Typesetting: Camera-ready by author
SPIN 10631926 06/3142 - 5 4 3 2 1 0 - Printed on acid-free paper
Preface

This volume contains revised versions of seventeen selected papers from the First International Conference on Cooperative Multimodal Communication (CMC/95), held in Eindhoven, The Netherlands, in May 1995. This was the first conference in a series, of which the second one was held in Tilburg, The Netherlands, in January 1998. Three of these papers were presented by invited speakers: those by Mark Maybury, Bonnie Webber, and Kent Wittenburg. From the submitted papers that were accepted by the CMC/95 program committee, thirteen were selected for publication in this volume, after revision.

We thank the program committee for their excellent and timely feedback to authors of submitted papers, and at a later stage for advising on the contents of this volume and for providing additional suggestions for improving the selected contributions. The program committee consisted of Norman Badler, Harry Bunt, Jeroen Groenendijk, Walther von Hahn, Dieter Huber, Hans Kamp, John Lee, Joseph Mariani, Mark Maybury, Paul Mc Kevitt, Rob Nederpelt, Kees van Overveld, Ray Perrault, Donia Scott, Wolfgang Wahlster, Bonnie Webber, and Kent Wittenburg. We thank the Royal Dutch Academy of Sciences (KNAW) and the Organization for Cooperation among Universities in Brabant (SOBU) for their grants that made the conference possible.

Harry Bunt
Robbert-Jan Beun
Tijn Borghuis
Table of Contents

Issues in Multimodal Human-Computer Communication

Towards Cooperative Multimedia Interaction

Multimodal Cooperation with the DenK System
Harry Bunt, René Ahn, Robbert-Jan Beun, Tijn Borghuis and ...

Synthesizing Cooperative Conversation
Catherine Pelachaud, Justine Cassell, Norman Badler, Mark Steedman, Scott Prevost and Matthew Stone

Instructing Animated Agents: Viewing Language in Behavioral ...

Part II: Techniques

Modeling and Processing of Oral and Tactile Activities in the ...
Jacques Siroux, Marc Guyomard, Franck Multon and Christophe Remondeau

Multimodal Maps: An Agent-Based Approach
Adam Cheyer and Luc Julia

Using Cooperative Agents to Plan Multimodal Presentations
Yi Han and Ingrid Zukerman

Developing Multimodal Interfaces: A Theoretical Framework and Guided Propagation Networks
Jean-Claude Martin, Remko Veldman and Dominique Béroule

A Multimedia Interface for Circuit Board Assembly
Fergal McCaffery, Michael McTear and Maureen Murphy

Visual Language Parsing: If I Had a Hammer...
Kent Wittenburg

Anaphora in Multimodal Discourse
John Lee and Keith Stenning

Part III: Experiments

Speakers' Responses to Requests for Repetition in a Multimedia Language Processing Environment
Laurel Fais, Kyung-ho Loke-Kim and Young-Duk Park

Object Reference in Task-Oriented Keyboard Dialogues
Anita Cremers

Referent Identification Requests in Multi-Modal Dialogs
Tsuneaki Kato and Yukiko I. Nakano

Studies into Full Integration of Language and Action
Carla Huls and Edwin Bos

The Role of Multimodal Communication in Cooperation: The Cases of Air Traffic Control
Marie-Christine Bressolle, Bruno Pavard and Marcel Leroux

Author Index
Multimodal Maps: An Agent-Based Approach

Adam Cheyer and Luc Julia

SRI International
333 Ravenswood Ave
Menlo Park, CA 94025 - USA

In this paper, we discuss how multiple input modalities may be combined to produce more natural user interfaces. To illustrate this technique, we present a prototype map-based application for a travel planning domain. The application is distinguished by a synergistic combination of handwriting, gesture and speech modalities; access to existing data sources including the World Wide Web; and a mobile handheld interface. To implement the described application, a hierarchical distributed network of heterogeneous software agents was augmented by appropriate functionality for developing synergistic multimodal applications.
1 Introduction

As computer systems become more powerful and complex, efforts to make computer interfaces more simple and natural become increasingly important. Natural interfaces should be designed to facilitate communication in ways people are already accustomed to using. Such interfaces allow users to concentrate on the tasks they are trying to accomplish, not worry about what they must do to control the interface.

In this paper, we begin by discussing what input modalities humans are comfortable using when interacting with computers, and how these modalities should best be combined in order to produce natural interfaces. In Sect. 3, we present a prototype map-based application for the travel planning domain which uses a synergistic combination of several input modalities. Section 4 describes the agent-based approach we used to implement the application and the work on which it is based. In Sect. 5, we summarize our conclusions and future directions.

2 Natural Input

2.1 Input Modalities
... dimension direct manipulation interfaces. Gestures allow users to communicate a surprisingly wide range of meaningful requests with a few simple strokes. Research has shown that multiple gestures can be combined to form dialog, with rules of temporal grouping overriding temporal sequencing (Rhyne, 1987). Gestural commands are particularly applicable to graphical or editing type tasks.

Direct manipulation interactions possess many desirable qualities: communication is generally fast and concise; input techniques are easy to learn and remember; the user has a good idea about what can be accomplished, as the visual presentation of the available actions is generally easily accessible. However, direct manipulation suffers from limitations when trying to access or describe entities which are not or cannot be visualized by the user.

Limitations of direct manipulation style interfaces can be addressed by another interface technology, that of natural language interfaces. Natural language interfaces excel in describing entities that are not currently displayed on the monitor, in specifying temporal relations between entities or actions, and in identifying members of sets. These strengths are exactly the weaknesses of direct manipulation interfaces, and conversely, the weaknesses of natural language interfaces (ambiguity, conceptual coverage, etc.) can be overcome by the strengths of direct manipulation.

Natural language content can be entered through different input modalities, including typing, handwriting, and speech. It is important to note that, while the same textual content can be provided by the three modalities, each modality has widely varying properties.

- Spoken language is the modality used first and foremost in human-human interactive problem solving (Cohen et al., 1990). Speech is an extremely fast medium, several times faster than typing or handwriting. In addition, speech input contains content that is not present in other forms of natural language input, such as prosody, tone and characteristics of the speaker (age, sex, accent).
- Typing is the most common way of entering information into a computer, because it is reasonably fast, very accurate, and requires no computational resources.
- Handwriting has been shown to be useful for certain types of tasks, such as performing numerical calculations and manipulating names which are difficult to pronounce (Oviatt, 1994; Oviatt and Olson, 1994). Because of its relatively slow production rate, handwriting may induce users to produce different types of input than is generated by spoken language; abbreviations, symbols and non-grammatical patterns may be expected to be more prevalent in written input.
2.2 Combination of Modalities

As noted in the previous section, direct manipulation and natural language seem to be very complementary modalities. It is therefore not surprising that a number of multimodal systems combine the two.

Notable among such systems is Cohen's Shoptalk system (Cohen, 1992), a prototype manufacturing and decision-support system that aids in tasks such as quality assurance monitoring and production scheduling. The natural language module of Shoptalk is based on the Chat-80 natural language system (Warren and Pereira, 1982) and is particularly good at handling time, tense, and temporal reasoning.

A number of systems have focused on combining the speed of speech with the reference provided by direct manipulation of a mouse pointer. Such systems include the XTRA system (Allegayer et al., 1989), CUBRICON (Neal and Shapiro, 1991), the PAC-Amodeus model (Nigay and Coutaz, 1993), and TAPAGE (Faure and Julia, 1994).

XTRA and CUBRICON are both systems that combine complex spoken input with mouse clicks, using several knowledge sources for reference identification. CUBRICON's domain is a map-based task, making it similar to the application developed in this paper. However, the two are different in that CUBRICON can only use direct manipulation to indicate a specific item, whereas our system produces a richer mixing of modalities by adding both gestural and written language as input modalities.

The PAC-Amodeus systems such as VoicePaint and Notebook allow the user to synergistically combine vocal or mouse-click commands when interacting with notes or graphical objects. However, due to the selected domains, the natural language input is very simple, generally of the style "Insert a note here".

TAPAGE is another system that allows true synergistic combination of spoken input with direct manipulation. Like PAC-Amodeus, TAPAGE's domain provides only simple linguistic input. However, TAPAGE uses a pen-based interface instead of a mouse, allowing gestural commands. TAPAGE, selected as a building block for our map application, will be described in more detail in Sect. 4.

Other interesting work regarding the simultaneous combination of hand gestures and gaze can be found in Bolt (1980) and Koons, Sparrell and Thorisson (1993).
3 A Multimodal Map Application

In this section, we will describe a prototype map-based application for a travel planning domain. In order to provide the most natural user interface possible, the following requirements were established:

- The user interface must be light and fast enough to run on a handheld PDA while able to access applications and data that may require a more powerful machine.
- Existing commercial or research natural language and speech recognition systems should be used.
- Through the multimodal interface, a user must be able to transparently access a wide variety of data sources, including information stored in HTML form on the World Wide Web.

Fig. 1. Multimodal application for travel planning

As illustrated in Fig. 1, the user is presented with a pen-sensitive map display on which drawn gestures and written natural language statements may be combined with spoken input. As opposed to a static paper map, the location, resolution, and content presented by the map change according to the requests of the user. Objects of interest, such as restaurants, movie theaters, hotels, tourist sites, municipal buildings, etc. are displayed as icons. The user may ask the map to perform various actions. For example:

- distance calculation: e.g. "How far is the hotel from Fisherman's Wharf?"
- object location: e.g. "Where is the nearest post office?"
- filtering: e.g. "Display the French restaurants within 1 mile of this hotel."
- information retrieval: e.g. "Show me all available information about Alcatraz."
The application also makes use of multimodal (multimedia) output as well as input; video, text, sound and voice can all be combined when presenting an answer to a query.
During input, requests can be entered using gestures (see Fig. 2 for sample gestures), handwriting, voice, or a combination of pen and voice. For instance, in order to calculate the distance between two points on the map, a command may be issued using the following:

- gesture, by simply drawing a line between the two points of interest.
- voice, by speaking "What is the distance from the post office to the hotel?"
- handwriting, by writing "dist. p.o. to hotel?"
- synergistic combination of pen and voice, by speaking "What is the distance from here to this hotel?" while simultaneously indicating the specified locations by pointing or circling.

Notice that in our example of synergistic combination of pen and voice, the arguments to the verb "distance" can be specified before, at the same time, or shortly after the vocalization of the request to calculate the distance. If a user's request is ambiguous or underspecified, the system will wait several seconds and then issue a prompt requesting additional information.
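To make this timing behavior concrete, here is a small Python sketch (our own illustration, not code from the paper; all class and parameter names are hypothetical) of how an interface might pair the deictic slots of a spoken request with pointing gestures that arrive shortly before, during, or after the utterance, prompting the user when a slot stays unfilled past a timeout:

```python
import time
from dataclasses import dataclass, field

@dataclass
class Gesture:
    kind: str            # e.g. "point", "circle", "line"
    location: tuple      # (x, y) map coordinates
    timestamp: float

@dataclass
class SpokenRequest:
    verb: str                      # e.g. "distance"
    deictic_slots: list            # slot names still needing a location
    timestamp: float
    bindings: dict = field(default_factory=dict)

class FusionBuffer:
    """Pairs deictic slots in a spoken request with temporally close gestures."""

    def __init__(self, window=4.0, prompt_after=5.0):
        self.window = window              # seconds of temporal grouping
        self.prompt_after = prompt_after  # how long to wait before prompting
        self.gestures = []

    def add_gesture(self, gesture):
        self.gestures.append(gesture)

    def resolve(self, request, prompt, now=None):
        """Fill each deictic slot with a nearby gesture, else prompt the user."""
        now = time.time() if now is None else now
        for slot in list(request.deictic_slots):
            candidates = [g for g in self.gestures
                          if abs(g.timestamp - request.timestamp) <= self.window]
            if candidates:
                g = min(candidates, key=lambda g: abs(g.timestamp - request.timestamp))
                request.bindings[slot] = g.location
                self.gestures.remove(g)
                request.deictic_slots.remove(slot)
            elif now - request.timestamp > self.prompt_after:
                prompt(f"Please indicate the location for '{slot}'.")
        return request

buf = FusionBuffer()
buf.add_gesture(Gesture("point", (120, 88), timestamp=10.2))
req = SpokenRequest("distance", deictic_slots=["here", "this hotel"], timestamp=10.5)
buf.resolve(req, prompt=print, now=16.0)   # one slot filled, one prompted
```

The single window parameter stands in for the temporal-grouping rules mentioned in Sect. 2.1; a real system would also weigh gesture type against the expected argument type.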
The user interface runs on pen-equipped PCs or a Dauphin handheld PDA (Dauphin, DTR-1 User's Manual) using either a microphone or a telephone for voice input. The interface is connected either by modem or ethernet to a server machine which will manage database access, natural language processing and speech recognition for the application. The result is a mobile system that provides a synergistic pen/voice interface to remote databases.

In general, the speed of the system is quite acceptable. For gestural commands, which are handled locally on the user interface machine, a response is produced in less than one second. For handwritten commands, the time to recognize the handwriting, process the English query, access a database and begin to display the results on the user interface is less than three seconds (assuming an ethernet connection, and good network and database response). Solutions to verbal commands are displayed in three to five seconds after the end of speech has been detected; partial feedback indicating the current status of the speech recognition is provided earlier.
Fig. 2. Sample gestures (labels shown: Select; Move, Scroll, Select; Zoom In; Distance)
... to augment a proven agent-based architecture with functionalities developed for a synergistic multimodal application. The result is a flexible methodology for designing and implementing distributed multimodal applications.
4.1 Building Blocks

Open Agent Architecture. The Open Agent Architecture (OAA) (Cohen et al., 1994) provides a framework for coordinating a society of agents which interact to solve problems for the user. Through the use of agents, the OAA provides distributed access to commercial applications, such as mail systems, calendar programs, databases, etc.

The Open Agent Architecture possesses several properties which make it a good candidate for our needs:

- An Interagent Communication Language (ICL) and Query Protocol have been developed, allowing agents to communicate among themselves. Agents can run on different platforms and be implemented in a variety of programming languages.
- Several natural language systems have been integrated into the OAA which convert English into the Interagent Communication Language. In addition, a speech recognition agent has been developed to provide transparent access to the Corona speech recognition system.
- The agent architecture has been used to provide natural language and agent access to various heterogeneous data and knowledge sources.
- Agent interaction is very fine-grained. The architecture was designed so that a number of agents can work together, when appropriate in parallel, to produce fast responses to queries.

The architecture for the OAA, based loosely on Schwartz's FLiPSIDE system (Schwartz, 1993), uses a hierarchical configuration where client agents connect to a "facilitator" server. Facilitators provide content-based message routing, global data management, and process coordination for their set of connected agents. Facilitators can, in turn, be connected as clients of other facilitators. Each facilitator records the published functionality of its sub-agents, and when queries arrive in Interagent Communication Language form, it is responsible for breaking apart any complex queries and for distributing goals to the appropriate agents. An agent solving a goal may require supporting information, and the agent architecture provides numerous means of requesting data from other agents or from the user.
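A minimal sketch of the facilitator idea (our own construction in Python, not SRI's implementation; the agent names and predicates are invented): agents publish the goals they can solve, and the facilitator splits a compound query into subgoals and routes each one to a registered solver.

```python
class Agent:
    """A client agent that publishes the goal predicates it can solve."""
    def __init__(self, name, capabilities):
        self.name = name
        self.capabilities = capabilities   # dict: predicate name -> handler

    def solve(self, predicate, args):
        return self.capabilities[predicate](args)


class Facilitator:
    """Content-based routing: record each agent's published functionality,
    break a compound query into subgoals, and dispatch every subgoal to a
    sub-agent that declared it can handle that predicate."""
    def __init__(self):
        self.registry = {}   # predicate name -> agent

    def register(self, agent):
        for predicate in agent.capabilities:
            self.registry[predicate] = agent

    def ask(self, query):
        # A query is a list of (predicate, args) goals, solved left to right.
        results = []
        for predicate, args in query:
            agent = self.registry.get(predicate)
            if agent is None:
                raise LookupError(f"no agent has published '{predicate}'")
            results.append(agent.solve(predicate, args))
        return results


# Hypothetical agents for the map domain (toy handlers only).
hotels = Agent("hotel_db", {
    "coordinates": lambda name: {"Hotel Majestic": (420, 310)}.get(name),
})
geometry = Agent("geometry", {
    "midpoint": lambda pts: tuple((a + b) / 2 for a, b in zip(*pts)),
})

facilitator = Facilitator()
facilitator.register(hotels)
facilitator.register(geometry)

where = facilitator.ask([("coordinates", "Hotel Majestic")])[0]
print(facilitator.ask([("midpoint", (where, (180, 95)))]))   # [(300.0, 202.5)]
```

Real facilitators also handle parallel dispatch and partial results; this sketch keeps only the capability registry and the goal-splitting step.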
Among the assortment of agent architectures, the Open Agent Architecture can be most closely compared to work by the ARPA knowledge sharing community (Genesereth and Singh, 1994). The OAA's query protocol, Interagent Communication Language and Facilitator mechanisms have similar instantiations in the SHADE project, in the form of KQML, KIF and various independent capability matchmakers. Other agent architectures, such as General Magic's Telescript (General Magic, 1995), MASCOS (Park et al., submitted), or the CORBA distributed object approach (Object Management Group, 1991) do not provide as fully developed mechanisms for interagent communication and delegation.

The Open Agent Architecture provides capability for accessing distributed knowledge sources through natural language and voice, but it is lacking integration with a synergistic multimodal interface.
TAPAGE. TAPAGE (édition de Tableaux par la Parole et le Geste) is a synergistic pen/voice system for designing and correcting tables.

To capture signals emitted during a user's interaction, TAPAGE integrates a set of modality agents, each responsible for a very specialized kind of signal (Faure and Julia, 1994). The modality agents are connected to an 'interpret agent' which is responsible for combining the inputs across all modalities to form a valid command for the application. The interpret agent receives filtered results from the modality agents, sorts the information into the correct fields, performs type-checking on the arguments, and prompts the user for any missing information, according to the model of the interaction. The interpret agent is also responsible for merging the data streams sent by the modality agents, and for resolving ambiguities among them, based on its knowledge of the application's internal state. Another function of the interpret agent is to produce reflexes: reflexes are actions output at the interface level without involving the functional core of the application.

The TAPAGE system can accept multimodal input, but it is not a distributed system; its functional core is fixed. In TAPAGE, the set of linguistic input is limited to a verb-object-argument format.
In the Open Agent Architecture, agents are distributed entities that can run on different machines, and communicate together to solve a task for the user. In TAPAGE, agents are used to provide streams of input to a central interpret process, responsible for merging incoming data. A generalization of these two types of agents could be:

Macro Agents: contain some knowledge and ability to reason about a domain, and can answer or make queries to other macro agents using the Interagent Communication Language.

Micro Agents: are responsible for handling a single input or output data stream, either filtering the signal to or from a hierarchically superior 'interpret' ...

The network architecture that we used was hierarchical at two resolutions: micro agents are connected to a superior macro agent, and macro agents are ...
... among agents produced by a user's request.

Speech Recognition (SR) Agent: The SR agent provides a mapping from the Interagent Communication Language to the API for the Decipher (Corona) speech recognition system (Cohen et al., 1990), a continuous-speech, speaker-independent recognizer based on hidden Markov model technology. This macro agent is also responsible for supervising a child micro agent whose task is to control the speech data stream. The SR agent can provide feedback to an interface agent about the current status and progress of the micro agent (e.g. "listening", "end of speech detected", etc.). This agent is written in C.

Natural Language (NL) Parser Agent: translates English expressions into the Interagent Communication Language (ICL). For a more complete description of the ICL, see Cohen et al. (1994). The NL agent we selected for our application is the simplest of those integrated into the OAA. It is written in Prolog using Definite Clause Grammars, and supports a distributed vocabulary; each agent dynamically adds word definitions as it connects to the network. A current project is underway to integrate the Gemini natural language system (Cohen et al., 1990), a robust bottom-up parser and semantic interpreter specifically designed for use in Spoken Language Understanding projects.
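The distributed-vocabulary behaviour can be pictured with a short sketch (ours, not the OAA code; the word entries, categories and class names are invented): each agent contributes lexical entries when it connects, so the parser's lexicon grows with the agent society.

```python
class Lexicon:
    """A parser-side lexicon that grows as agents connect to the network."""
    def __init__(self):
        self.entries = {}   # word -> (category, semantics)

    def add(self, word, category, semantics):
        self.entries[word.lower()] = (category, semantics)

    def lookup(self, word):
        return self.entries.get(word.lower())


class VocabularyAgent:
    """An agent that registers its domain words when it joins the network."""
    def __init__(self, name, vocabulary):
        self.name = name
        self.vocabulary = vocabulary   # list of (word, category, semantics)

    def connect(self, lexicon):
        for word, category, semantics in self.vocabulary:
            lexicon.add(word, category, semantics)


lexicon = Lexicon()
VocabularyAgent("hotel_db", [("hotel", "noun", "object_type(hotel)"),
                             ("rates", "noun", "attribute(rate)")]).connect(lexicon)
VocabularyAgent("map_ui", [("zoom", "verb", "action(zoom)")]).connect(lexicon)
print(lexicon.lookup("Hotel"))   # ('noun', 'object_type(hotel)')
```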
Database Agents: Database agents can reside at local or remote locations and can be grouped hierarchically according to content. Micro agents can be connected to database agents to monitor relevant positions or events in real time. In our travel planning application, database agents provide maps for each city, as well as icons, vocabulary and information about available hotels, restaurants, movies, theaters, municipal buildings and tourist attractions. Three types of databases were used: Prolog databases, X.500 hierarchical databases, and data loaded automatically by scanning HTML pages from the World Wide Web (WWW). In one instance, a local newspaper provides weekly updates to its Mosaic-accessible list of current movie times and reviews, as well as adding several new restaurant reviews to a growing collection; this information is extracted by an HTML-reading database agent and made accessible to the agent architecture. Descriptions and addresses of new restaurants are presented to the user on request, and the user can choose to add them to the permanent database by specifying positional coordinates on the map (e.g. "add this new restaurant here"), information lacking in the WWW database.
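Purely as an illustration (the page layout, field names and class names below are invented, not those of the actual newspaper feed), an HTML-reading database agent might pull listing text out of a fetched page and publish records whose coordinates stay empty until the user places them on the map:

```python
from html.parser import HTMLParser

class ListingParser(HTMLParser):
    """Collects the text of <li> items from a (hypothetical) listings page."""
    def __init__(self):
        super().__init__()
        self.in_item = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_item = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_item = False

    def handle_data(self, data):
        if self.in_item:
            self.items[-1] += data.strip()


class HTMLDatabaseAgent:
    """Turns scraped listings into records other agents can query.
    Coordinates stay unset until the user supplies them on the map."""
    def __init__(self):
        self.records = []

    def ingest(self, html_text):
        parser = ListingParser()
        parser.feed(html_text)
        for text in parser.items:
            self.records.append({"description": text, "coordinates": None})
        return self.records


agent = HTMLDatabaseAgent()
page = "<ul><li>Chez Panisse - new review</li><li>Zuni Cafe - updated hours</li></ul>"
print(agent.ingest(page))
```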
Reference Resolution Agent: This agent is responsible for merging requests arriving in parallel from different modalities, and for controlling interactions between the user interface agent, database agents and modality agents. In this implementation, the reference resolution agent is domain specific: knowledge is encoded as to what actions must be performed to resolve each possible type of ICL request in its particular domain. For a given ICL logical form, the agent can verify argument types, supply default values, and resolve argument references. Some argument references are descriptive ("How far is it to the hotel on Emerson Street?"); in this case, a domain agent will try to resolve the definite reference by
sending database agent requests. Other references, particularly when contextual or deictic, are resolved by the user interface agent ("What are the rates for this hotel?"). Once arguments to a query have been resolved, this agent coordinates the actions and calculations necessary to produce the result of the request.
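Sketched in Python under our own assumptions about the shape of an ICL logical form (which the paper leaves abstract), the dispatch just described might look like this: literal arguments pass through, descriptive references become database lookups, and deictic references are delegated to the user interface agent.

```python
def resolve_arguments(request, database_agent, ui_agent):
    """Resolve each argument of an ICL-style request before execution.

    request:        {"verb": ..., "args": [arg, ...]} where an arg is a dict
                    with a "ref" field of "literal", "descriptive" or "deictic"
    database_agent: callable taking a description, returning coordinates
    ui_agent:       callable returning the location of the current gesture
    """
    resolved = []
    for arg in request["args"]:
        if arg["ref"] == "literal":
            resolved.append(arg["value"])
        elif arg["ref"] == "descriptive":
            # e.g. "the hotel on Emerson Street" -> database lookup
            resolved.append(database_agent(arg["description"]))
        elif arg["ref"] == "deictic":
            # e.g. "this hotel" -> ask the interface agent for the gesture target
            resolved.append(ui_agent(arg["object_type"]))
        else:
            raise ValueError(f"unknown reference type: {arg['ref']}")
    return {"verb": request["verb"], "args": resolved}


# Hypothetical usage for "How far is it to the hotel on Emerson Street?"
resolved = resolve_arguments(
    {"verb": "distance",
     "args": [{"ref": "deictic", "object_type": "current_location"},
              {"ref": "descriptive", "description": "hotel on Emerson Street"}]},
    database_agent=lambda desc: (420, 310),
    ui_agent=lambda obj_type: (180, 95))
```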
Interface Agent: This macro agent is responsible for managing what is currently being displayed to the user, and for accepting the user's multimodal input. The Interface Agent also coordinates client modality agents and resolves ambiguities among them: handwriting and gestures are interpreted locally by micro agents and combined with results from the speech recognition agent, running on a remote speech server. The handwriting micro agent interfaces with the Microsoft PenWindows API and accesses a handwriting recognizer by CIC Corporation. The gesture micro agent accesses recognition algorithms developed ...

An important task for the interface agent is to record which objects of each type are currently salient, in order to resolve contextual references such as "the hotel" or "where I was before." Deictic references are resolved by gestural or direct manipulation commands. If no such indication is currently specified, the user interface agent waits long enough to give the user an opportunity to supply the value, and then prompts the user for it.
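A rough sketch of that bookkeeping (our own; the structures and timeouts are assumptions, not the published design): the interface agent remembers the most recently salient object of each type for contextual references, and for deictic references it waits briefly for a gesture before prompting.

```python
from collections import defaultdict

class InterfaceContext:
    """Tracks which object of each type is currently salient on the display."""
    def __init__(self):
        self.salient = defaultdict(list)   # object type -> most recent last

    def note(self, obj_type, obj_id):
        self.salient[obj_type].append(obj_id)

    def resolve_contextual(self, obj_type):
        """'the hotel' -> the most recently salient hotel, if any."""
        items = self.salient.get(obj_type)
        return items[-1] if items else None

    def resolve_deictic(self, obj_type, wait_for_gesture, prompt):
        """'this hotel' -> whatever the user points at; prompt if nothing comes."""
        target = wait_for_gesture(timeout=5.0)
        if target is None:
            prompt(f"Which {obj_type} do you mean? Please point to it.")
            target = wait_for_gesture(timeout=30.0)
        return target


ctx = InterfaceContext()
ctx.note("hotel", "Hotel Majestic")
print(ctx.resolve_contextual("hotel"))   # Hotel Majestic
```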
Fig. 3. Agent Architecture for Map Application (legend: Facilitator Agents, Macro Agents, Modality Agents; NL: Natural Language Agent, SR: Speech Recognition Agent, RR: Reference Resolution Agent, UI: User Interface Agent, WWW: World Wide Web Agent)
1. A user speaks: "How far is the restaurant from this hotel?"
2. The speech recognition agent monitors the status and results from its micro agent, sending feedback received by the user interface agent. When the string is recognized, a translation is requested.
3. The English request is received by the NL agent and translated into ICL form.
4. The reference resolution agent (RR) receives the ICL distance request containing one definite and one deictic reference and asks for resolution of these references.
5. The interface agent uses contextual structures to find what "the restaurant" refers to, and waits for the user to make a gesture indicating "the hotel", issuing prompts if necessary.
6. When the references have been resolved, the domain agent (RR) sends database requests asking for the coordinates of the items in question. It then calculates the distance according to the scale of the currently displayed map, and requests the user interface to produce output displaying the result of the calculation.
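Step 6 is simple arithmetic once both references have been resolved to positions on the displayed map; a small sketch follows (our own, assuming a planar map and a pixels-per-mile scale, which the paper does not specify).

```python
import math

def map_distance(p1, p2, pixels_per_mile):
    """Distance in miles between two resolved points on the displayed map.

    p1, p2:          (x, y) positions in screen pixels
    pixels_per_mile: scale of the currently displayed map
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    return math.hypot(dx, dy) / pixels_per_mile

# e.g. restaurant at (420, 310), hotel pointed at (180, 95), map scale 160 px/mile
print(round(map_distance((420, 310), (180, 95), pixels_per_mile=160), 2), "miles")
```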
5 Conclusions

By augmenting an existing agent-based architecture with concepts necessary for synergistic multimodal input, we were able to rapidly develop a map-based application for a travel planning task. The resulting application has met our initial requirements: a mobile, synergistic pen/voice interface providing good natural language access to heterogeneous distributed knowledge sources. The approach used was general and should provide a framework for developing synergistic multimodal applications for other domains.

The system described here is one of the first that accepts commands made of synergistic combinations of spoken language, handwriting and gestural input. This fusion of modalities can produce more complex interactions than in many systems, and the prototype application will serve as a testbed for acquiring a better understanding of multimodal input.

In the near future, we will continue to verify and extend our approach by building other multimodal applications. We are interested in generalizing the methodology even further; work has already begun on an agent-building tool which will simplify and automate many of the details of developing new agents and domains.
References

Allegayer, J., Jansen-Winkeln, R., Reddig, C. and Reithinger, N. (1989) Bidirectional use of knowledge in the multi-modal NL access system XTRA. In Proceedings of IJCAI-89, Detroit, pp. 1492-1497.

Bolt, R. (1980) Put That There: Voice and Gesture at the Graphic Interface. Computer Graphics, 14(3), pp. 262-270.

Cohen, M., Murveit, H., Bernstein, J., Price, P., and Weintraub, M. (1990) The DECIPHER Speech Recognition System. In 1990 IEEE ICASSP, pp. 77-80.

Cohen, P. (1992) The role of natural language in a multimodal interface. In Proceedings of UIST'92, pp. 143-149.

Cohen, P.R., Cheyer, A., Wang, M. and Baeg, S.C. (1994) An Open Agent Architecture. In Proceedings AAAI'94 - SA, Stanford, pp. 1-8.

Dauphin DTR-1 User's Manual, Dauphin Technology, Inc., Lombard, Ill. 60148.

Faure, C. and Julia, L. (1994) An Agent-Based Architecture for a Multimodal Interface. In Proceedings AAAI'94 - IM4S, Stanford, pp. 82-86.

Genesereth, M. and Singh, N.P. (1994) A knowledge sharing approach to software interoperation. Unpublished manuscript, Computer Science Department, Stanford University.

General Magic Inc. (1995) Telescript Product Documentation.

Koons, D.B., Sparrell, C.J., and Thorisson, K.R. (1993) Integrating Simultaneous Input from Speech, Gaze and Hand Gestures. In Intelligent Multimedia Interfaces, Maybury, M.T. (ed.), Menlo Park: AAAI Press/MIT Press.

Maybury, M.T. (ed.) (1993) Intelligent Multimedia Interfaces, Menlo Park: AAAI Press/MIT Press.

Neal, J.G. and Shapiro, S.C. (1991) Intelligent Multi-media Interface Technology. In Intelligent User Interfaces, Sullivan, J.W. and Tyler, S.W. (eds.), Reading: Addison-Wesley Pub. Co., pp. 11-43.

Nigay, L. and Coutaz, J. (1993) A Design Space for Multimodal Systems: Concurrent Processing and Data Fusion. In Proceedings InterCHI'93, Amsterdam, ACM Press.

Object Management Group (1991) The Common Object Request Broker: Architecture and Specification. OMG Document Number 91.12.1.

Oviatt, S. (1994) Toward Empirically-Based Design of Multimodal Dialogue Systems. In Proceedings of AAAI'94 - IM4S, Stanford, pp. 30-36.

Oviatt, S. and Olsen, E. (1994) Integration Themes in Multimodal Human-Computer Interaction. In Proceedings of ICSLP'94, Yokohama, pp. 551-554.

Park, S.K., Choi, J.M., Myeong-Wuk, J., Lee, G.L., and Lim, Y.H. (submitted for publication) MASCOS: A Multi-Agent System as the Computer Secretary.

Rhyne, J. (1987) Dialogue Management for Gestural Interfaces. Computer Graphics.

Schwartz (1993) Cooperating heterogeneous systems: A blackboard-based meta approach. Technical Report 93-112, Center for Automation and Intelligent Systems Research, Case Western Reserve University, Cleveland, Ohio. (Unpublished PhD ...)

Sullivan, J. and Tyler, S. (eds.) (1991) Intelligent User Interfaces, Reading: Addison-Wesley.

Warren, D. and Pereira, F. (1982) An Efficient Easily Adaptable System for Interpreting Natural Language Queries. American Journal of Computational Linguistics.