(12) United States Patent
Maes et al.

(10) Patent No.: US 7,003,463 B1
(45) Date of Patent: Feb. 21, 2006

(54) SYSTEM AND METHOD FOR PROVIDING NETWORK COORDINATED CONVERSATIONAL SERVICES

(75) Inventors: Stephane H. Maes, Danbury, CT (US); Ponani Gopalakrishnan, Yorktown Heights, NY (US)

(73) Assignee: International Business Machines Corporation, Armonk, NY (US)

( * ) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/806,425

(22) PCT Filed: Oct. 1, 1999

(86) PCT No.: PCT/US99/22925
§ 371 (c)(1), (2), (4) Date: Jun. 25, 2001

PCT Pub. Date: Apr. 13, 2000

Related U.S. Application Data

(60) Provisional application No. 60/102,957, filed on Oct. 2, 1998; provisional application No. 60/117,595, filed on Jan. 27, 1999.

(51) Int. Cl.
G10L 21/00 (2006.01)
G10L 15/00 (2006.01)
G06F 15/16 (2006.01)

(52) U.S. Cl. ......................... 704/270.1; 704/231

(58) Field of Classification Search ........... 704/270.1; 709/203
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

5,544,228 A 8/1996 Wagner et al.
5,594,789 A 1/1997 Seazholtz et al.
5,774,857 A 6/1998 Newlin

(Continued)

FOREIGN PATENT DOCUMENTS

EP 0450610 A2 10/1991

(Continued)

OTHER PUBLICATIONS

Patent Abstract of Japan, for publication No.: 09-098221.

(Continued)

Primary Examiner—W. R. Young
Assistant Examiner—Matthew J. Sked
(74) Attorney, Agent, or Firm—Frank V. DeRosa; F. Chau & Associates, LLC

(57) ABSTRACT

A system and method for providing automatic and coordinated sharing of conversational resources, e.g., functions and arguments, between network-connected servers and devices and their corresponding applications. In one aspect, a system for providing automatic and coordinated sharing of conversational resources includes a network having a first and second network device, the first and second network device each comprising a set of conversational resources, a dialog manager for managing a conversation and executing calls requesting a conversational service, and a communication stack for communicating messages over the network using conversational protocols, wherein the conversational protocols establish coordinated network communication between the dialog managers of the first and second network device to automatically share the set of conversational resources of the first and second network device, when necessary, to perform their respective requested conversational service.

6 Claims, 5 Drawing Sheets

Page 1 of 17
U.S. PATENT DOCUMENTS

5,943,648 A * 8/1999 Tel ........................ 704/270.1
5,956,683 A * 9/1999 Jacobs et al. .............. 704/270.1
5,960,399 A 9/1999 Barclay et al.
6,098,041 A * 8/2000 Matsumoto ................. 704/260
6,119,087 A * 9/2000 Kuhn et al. ................ 704/270
6,173,259 B1 * 1/2001 Bijl et al. ................ 704/235
6,195,641 B1 * 2/2001 Loring et al. .............. 704/275
6,282,268 B1 * 8/2001 Hughes et al. .............. 379/88.03
6,282,508 B1 * 8/2001 Kimura et al. .............. 704/10
6,327,568 B1 * 12/2001 Joost ...................... 704/270.1
6,363,348 B1 * 3/2002 Besling et al. ............. 704/270.1
6,408,272 B1 * 6/2002 White et al. ............... 704/270.1
6,456,974 B1 * 9/2002 Baker et al. ............... 704/270.1
6,594,628 B1 * 7/2003 Jacobs et al. .............. 704/231
6,615,171 B1 * 9/2003 Kanevsky et al. ............ 704/246
2005/0131704 A1 * 6/2005 Dragosh et al. ............. 704/270.1

FOREIGN PATENT DOCUMENTS

EP 0654930 A1 5/1995
JP 09-098221 4/1997
JP 10-207683 8/1998
JP 10-214258 8/1998
JP 10-228431 8/1998
WO WO 97/4712 12/1997

OTHER PUBLICATIONS

Patent Abstract of Japan, for publication No.: 10-207683.
Patent Abstract of Japan, for publication No.: 10-214258.
Patent Abstract of Japan, for publication No.: 10-228431.

* cited by examiner
[Sheet 1 of 5 — FIG. 1: block diagram (drawing not reproducible as text). Legible labels include: Local Dialog Manager; Conversational Discovery, Registration and Negotiation Protocols; Networked Server; Conversational Engines; Server Communication; Server Application(s); Server Conversational Engines; Server Dialog Manager.]
[Sheet 2 of 5 — FIG. 2: flow diagram (drawing not reproducible as text). Legible labels include: Receive Input Speech Locally or Requests from Local Application (200); Transmit Features/Waveform to Remote Server for Processing (205).]
[Sheet 3 of 5 — FIG. 3: flow diagram (drawing not reproducible as text). Legible steps: Receive Input Speech Locally or Requests from Local Application (300); Local Processing? (301); Allocate Local Conversational Engine to a Port (302); Allocate Local Engine to Another Port if Not Currently Used (303); Allocate Another Engine to Original Port if Local Engine Is Not Available (304); Is Remote Server Overloaded? (305); Perform Remote Processing (306).]
[Sheet 4 of 5 — FIG. 4: block diagram of a distributed system employing a conversational browser (drawing text not legible in this reproduction).]
[Sheet 5 of 5 — FIG. 5: block diagram of a distributed system employing a conversational browser (drawing text not legible in this reproduction).]
SYSTEM AND METHOD FOR PROVIDING NETWORK COORDINATED CONVERSATIONAL SERVICES

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a U.S. National Phase Application filed under 35 U.S.C. 371 based on International Application No. PCT/US99/22925, filed on Oct. 1, 1999, which is based on provisional applications U.S. Ser. No. 60/102,957, filed on Oct. 2, 1998, and U.S. Ser. No. 60/117,595, filed on Jan. 27, 1999.
BACKGROUND

1. Technical Field

The present application relates generally to conversational systems and, more particularly, to a system and method for automatic and coordinated sharing of conversational functions/resources between network-connected devices, servers and applications.

2. Description of Related Art

Conventional conversational systems (i.e., systems with purely voice I/O or multi-modal systems with voice I/O) are typically limited to personal computers (PCs) and local machines having suitable architecture and sufficient processing power. For telephony applications, on the other hand, conversational systems are typically located on a server (e.g., an IVR server) and accessible via conventional and cellular phones. Although such conversational systems are becoming increasingly popular, typically all the conversational processing is performed either on the client side or on the server side (i.e., all the configurations are either fully local or fully client/server).

With the emergence of pervasive computing, it is expected that billions of low-resource client devices (e.g., PDAs, smartphones, etc.) will be networked together. Due to the decreasing size of these client devices and the increasing complexity of the tasks that users expect such devices to perform, the user interface (UI) becomes a critical issue, since conventional graphical user interfaces (GUIs) on such small client devices would be impractical. For this reason, it is to be expected that conversational systems will be a key element of the user interface, providing purely speech/audio I/O or multi-modal I/O with speech/audio I/O.

Consequently, speech-embedded conversational applications in portable client devices are being developed and are reaching maturity. Unfortunately, because of limited resources, such client devices may not be able to perform complex conversational services such as, for example, speech recognition (especially when the vocabulary size is large or specialized, or when domain-specific/application-specific language models or grammars are needed), NLU (natural language understanding), NLG (natural language generation), TTS (text-to-speech synthesis), audio capture and compression/decompression, playback, dialog generation, dialog management, speaker recognition, topic recognition, and audio/multimedia indexing and searching. For instance, the memory and CPU (and other resource) limitations of a device can limit the conversational capabilities that the device can offer.

Moreover, even if a networked device is "powerful" enough (in terms of CPU and memory) to execute all these conversational tasks, the device may not have the appropriate conversational resources (e.g., engines) or conversational arguments (i.e., the data files used by the engines, such as grammars, language models, vocabulary files, parsing, tags, voiceprints, TTS rules, etc.) to perform the appropriate task. Indeed, some conversational functions may be too specific and proper to a given service, thereby requiring back-end information that is only available from other devices or machines on the network. For example, NLU and NLG services on a client device typically require server-side assistance, since the complete set of conversational arguments or functions needed to generate the dialog (e.g., parser, tagger, translator, etc.) either requires a large amount of memory for storage (not available in the client devices) or is too extensive (in terms of communication bandwidth) to transfer to the client side. This problem is further exacerbated with multi-lingual applications, when a client device or local application has insufficient memory or processing power to store and process the arguments that are needed to process speech and perform conversational functions in multiple languages. Instead, the user must manually connect to a remote server to perform such tasks.

Also, the problems associated with a distributed architecture and distributed processing between clients and servers require new methods for conversational networking. Such methods comprise management of traffic and resources distributed across the network to guarantee appropriate dialog flow for each user engaged in a conversational interaction across the network.

Accordingly, a system and method that allows a network device with limited resources to perform complex, specific conversational tasks using networked resources in a manner that is automatic and transparent to the user is highly desirable.
SUMMARY OF THE INVENTION

The present invention is directed to a system and method for providing automatic and coordinated sharing of conversational resources between network-connected servers and devices (and their corresponding applications). A system according to one embodiment of the present invention comprises a plurality of networked servers, devices and/or applications that are made "conversationally aware" of each other by communicating messages using conversational network protocols (or methods) that allow each conversationally aware network device to share conversational resources automatically and in a coordinated and synchronized manner, so as to provide a seamless conversational interface through an interface of one of the network devices.

In accordance with one aspect of the present invention, a system for providing automatic and coordinated sharing of conversational resources comprises:

a network comprising at least a first and second network device;

the first and second network device each comprising:

a set of conversational resources;

a dialog manager for managing a conversation and executing calls requesting a conversational service; and

a communication stack for communicating messages using conversational protocols over the network, wherein the messages communicated by the conversational protocols establish coordinated network communication between the dialog managers of the first and second device to automatically share the set of conversational resources of the first and second network device, when necessary, to perform their respective requested conversational service.
The present invention allows a low-resource client device to transparently perform simple tasks locally, as well as complex tasks in binary or analog connection with a server (or other device) having more complex conversational capabilities. The server-side functions (such as speech recognition) can be performed through a regular IP network or LAN network, as well as via digital transmission over a conventional telephone line or a packet-switched network, or via any conventional wireless data protocol over a wireless network.

Advantageously, the present invention offers a full-fledged conversational user interface on any device (such as a pervasive embedded device) with limited CPU, memory and power capabilities (as well as limited conversational resources), which provides complex conversational services using a low-resource client device without the need to download, for example, the necessary conversational arguments from a network server. The local capabilities allow the user to utilize the local device without requiring connection, e.g., outside the coverage of a wireless phone provider. Also, the cost of a continuous connection is reduced, and the difficulties of recovery when such continuous connections are lost can be mitigated.

These and other aspects, features and advantages of the present invention will be described and become apparent from the following detailed description of preferred embodiments, which is to be read in connection with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a system for providing conversational services via automatic and coordinated sharing of conversational resources between networked devices according to an embodiment of the present invention;

FIG. 2 is a flow diagram of a method for providing conversational services via automatic and coordinated sharing of conversational resources between networked devices according to one aspect of the present invention;

FIG. 3 is a flow diagram of a method for providing conversational services via automatic and coordinated sharing of conversational resources between networked devices according to another aspect of the present invention;

FIG. 4 is a block diagram of a distributed system for providing conversational services according to another embodiment of the present invention employing a conversational browser; and

FIG. 5 is a block diagram of a distributed system for providing conversational services according to another embodiment of the present invention employing a conversational browser.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
It is to be understood that the present invention may be implemented in various forms of hardware, software, firmware, special-purpose processors, or a combination thereof. Preferably, the present invention is implemented in software as an application comprising program instructions that are tangibly embodied on a program storage device (e.g., magnetic floppy disk, RAM, CD-ROM, ROM and Flash memory) and executable by any device or machine comprising suitable architecture, such as one or more central processing units (CPU), a random access memory (RAM), and audio input/output (I/O) interface(s).
It is to be further understood that, because some of the constituent system components and method steps depicted in the accompanying Figures are preferably implemented in software, the actual connections between the system components (or the process steps) may differ depending upon the manner in which the present invention is programmed. Given the teachings herein, one of ordinary skill in the related art will be able to contemplate these and similar implementations or configurations of the present invention.
Referring now to FIG. 1, a block diagram illustrates a system for providing conversational services through the automatic and coordinated sharing of conversational resources and conversational arguments (data files) between networked devices according to an exemplary embodiment of the present invention. The system comprises a local client device 100 comprising an acoustic front end 101 for processing audio/speech input and outputting audio/speech generated by the client device 100. The client device 100 may be, for example, a smartphone or any speech-enabled PDA (personal digital assistant). The client device 100 further comprises one or more local conversational engines 102 for processing the acoustic features and/or waveforms generated and/or captured by the acoustic front end 101 and generating dialog for output to the user. The local conversational engines 102 can include, for instance, an embedded speech recognition engine, a speaker recognition engine, a TTS engine, an NLU and NLG engine, and an audio capture and compression/decompression engine, as well as any other type of conversational engine.

The client device 100 further comprises a local dialog manager 103 that performs task management and controls and coordinates the execution of a conversational service (either locally or via a network device) that is requested via a system call (API or protocol call), as well as managing the dialog locally and with networked devices. More specifically, as explained in greater detail below, the dialog manager 103 determines whether a given conversational service is to be processed and executed locally on the client 100 or on a remote network-connected server (or device). This determination is based on factors such as the conversational capabilities of the client 100 as compared with the capabilities of other networked devices, as well as the available resources and conversational arguments that may be necessary for processing a requested conversational service. Other factors include network traffic and anticipated delays in receiving results from networked devices. The dialog manager 103 performs task management and resource management tasks such as load management and resource allocation, as well as managing the dialog between the local conversational engines 102 and speech-enabled local applications 104.
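Purely by way of illustration, the local-versus-remote decision described above can be sketched as follows. This is a minimal sketch, not the patented embodiment; all class and attribute names (Engine, Device, DialogManager, vocabulary_size, load) are invented for the example:

```python
from dataclasses import dataclass, field

@dataclass
class Engine:
    """A conversational engine advertised by a device."""
    service: str            # e.g. "speech_recognition", "tts", "nlu"
    vocabulary_size: int    # rough proxy for the resources/arguments available
    load: float = 0.0       # current utilization, 0.0-1.0

@dataclass
class Device:
    name: str
    engines: dict = field(default_factory=dict)  # service name -> Engine

    def can_serve(self, service, vocabulary_size):
        eng = self.engines.get(service)
        return eng is not None and eng.vocabulary_size >= vocabulary_size

class DialogManager:
    """Decides whether a requested conversational service runs locally
    or is delegated to a networked device."""
    def __init__(self, local, remotes):
        self.local, self.remotes = local, remotes

    def route(self, service, vocabulary_size=0):
        # Prefer local execution when the local engines suffice.
        if self.local.can_serve(service, vocabulary_size):
            return self.local.name
        # Otherwise pick the least-loaded capable remote device.
        capable = [d for d in self.remotes if d.can_serve(service, vocabulary_size)]
        if not capable:
            raise LookupError(f"no device offers {service}")
        return min(capable, key=lambda d: d.engines[service].load).name

client = Device("client", {"speech_recognition": Engine("speech_recognition", 500)})
server = Device("server", {
    "speech_recognition": Engine("speech_recognition", 65000, load=0.2),
    "nlu": Engine("nlu", 65000, load=0.5),
})
dm = DialogManager(client, [server])

small_task = dm.route("speech_recognition", vocabulary_size=100)   # fits locally
large_task = dm.route("speech_recognition", vocabulary_size=20000) # needs the server
nlu_task = dm.route("nlu")                                         # client has no NLU engine
```

A full implementation would also weigh network traffic and anticipated delays, as the description notes; those factors are omitted here for brevity.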
As shown in FIG. 1 by way of example, the client device 100 is network-connected via network 105 to a server 106 that comprises server applications 109, as well as server conversational engines 107 for providing conversational services to the client device 100 (or any other network device or application) as necessary. As with the local engines 102, the server engines 107 can include, for instance, an embedded speech recognition engine, a TTS engine, an NLU and NLG engine, an audio capture and compression/decompression engine, as well as any other type of conversational engine. The server 106 comprises a server dialog manager 108 which operates in a manner similar to the local dialog manager 103 as described above. For example, the server dialog manager 108 determines whether a request for a conversational service from the local dialog manager 103 is to be processed and executed by the server 106 or on another remote network-connected server or device. In addition, the server dialog manager 108 manages the dialog between the server conversational engines 107 and speech-enabled server applications 109.
The system of FIG. 1 further illustrates the client device 100 and the remote server 106 being network-connected to a server 110 having conversational engines and/or conversational arguments that are accessible by the client 100 and server 106 as needed. The network 105 may be, for example, the Internet, a LAN (local area network), a corporate intranet, a PSTN (public switched telephone network) or a wireless network (for wireless communication via RF (radio frequency) or IR (infrared)). It is to be understood that although FIG. 1 depicts a client/server system as that term is understood by those skilled in the art, the system of FIG. 1 can include a plurality of networked servers, devices and applications that are "conversationally aware" of each other to provide automatic and coordinated sharing of conversational functions, arguments and resources. As explained in further detail below, such "conversational awareness" may be achieved using conversational network protocols (or methods) to transmit messages that are processed by the respective dialog managers to allow the networked devices to share conversational resources and functions in an automatic and synchronized manner. Such conversational coordination provides a seamless conversational interface for accessing remote servers, devices and applications through the interface of one network device.

In particular, to provide conversational coordination between the networked devices so as to share their conversational functions, resources and arguments, each of the networked devices communicates messages using conversational protocols (or methods) to exchange information regarding their conversational capabilities and requirements. For instance, as shown in FIG. 1, the client device 100 comprises a communication stack 111 for transmitting and receiving messages using conversational protocols 112, conversational discovery, registration and negotiation protocols 113, and speech transmission protocols 114 (or conversational coding protocols). Likewise, the server 106 comprises a server communication stack 115 comprising conversational protocols 116, conversational discovery, registration and negotiation protocols 117, and speech transmission protocols 118. These protocols (methods) are discussed in detail with respect to a CVM (conversational virtual machine) in the patent application IBM Docket No. YO999-111P, filed concurrently herewith, entitled "Conversational Computing Via Conversational Virtual Machine" (i.e., International Appl. No. PCT/US99/22927, filed on Oct. 1, 1999, and corresponding U.S. patent application Ser. No. 09/806,565), which is commonly assigned and incorporated herein by reference.
Briefly, the conversational protocols 112, 116 (or what is referred to as "distributed conversational protocols" in YO999-111P) are protocols (or methods) that allow the networked devices (e.g., client 100 and server 106) or applications to transmit messages for registering their conversational state, arguments and context with the dialog managers of other network devices. The conversational protocols 112, 116 also allow the devices to exchange other information such as applets, ActiveX components, and other executable code that allows the devices or associated applications to coordinate a conversation between such devices in, e.g., a master/slave or peer-to-peer conversational network configuration. The distributed conversational protocols 112, 116 allow the exchange of information to coordinate the conversation involving multiple devices or applications, including master/slave conversational networks, peer conversational networks, and silent partners. The information that may be exchanged between networked devices using the distributed conversational protocols comprises: pointers to data files (arguments); transfer (if needed) of data files and other conversational arguments; notification of input and output events and recognition results; conversational engine API calls and results; notification of state and context changes and other system events; registration updates (handshake for registration); negotiation updates (handshake for negotiation); and discovery updates when a requested resource is lost.
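The kinds of information enumerated above could be carried as typed messages between the communication stacks. The sketch below is illustrative only: the message class, the kind names, and the wire format (JSON) are invented for the example and are not part of the disclosed protocols:

```python
import json
from dataclasses import dataclass, asdict

# Message kinds mirroring the information exchanged by the distributed
# conversational protocols (names are illustrative, not from the patent).
KINDS = {
    "data_file_pointer", "data_file_transfer", "io_event",
    "recognition_result", "engine_api_call", "state_change",
    "registration_update", "negotiation_update", "discovery_update",
}

@dataclass
class ConversationalMessage:
    kind: str       # one of KINDS
    sender: str     # device identifier
    payload: dict   # kind-specific content

    def encode(self) -> str:
        """Serialize for transmission over the communication stack."""
        assert self.kind in KINDS, f"unknown message kind: {self.kind}"
        return json.dumps(asdict(self))

    @staticmethod
    def decode(raw: str) -> "ConversationalMessage":
        return ConversationalMessage(**json.loads(raw))

msg = ConversationalMessage(
    kind="registration_update",
    sender="client-100",
    payload={"engines": ["speech_recognition", "tts"], "state": "idle"},
)
wire = msg.encode()
roundtrip = ConversationalMessage.decode(wire)
```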
The (distributed) conversational protocols also comprise dialog manager (DM) protocols, which allow the dialog managers to distribute services, behavior and conversational applications, I/O and engine APIs, such as described in IBM Docket No. YO999-111P. For instance, the DM protocols allow the following information to be exchanged: (1) DM architecture registration (e.g., each DM can be a collection of local DMs); (2) pointers to associated meta-information (user, device capabilities, application needs, etc.); (3) negotiation of DM network topology (e.g., master/slave, peer-to-peer); (4) data files (conversational arguments), if applicable (i.e., if engines are used that are controlled by a master DM); (5) notification of I/O events, such as user input and outputs to users, for transfer to engines and/or addition to contexts; (6) notification of recognition events; (7) transfer of processed input from engines to a master DM; (8) transfer of responsibility of master DM to registered DMs; (9) DM processing result events; (10) DM exceptions; (11) transfer of confidence and ambiguity results, proposed feedback and output, proposed expectation state, proposed action, proposed context changes, proposed new dialog state; (12) decision notification, context update, action update, state update, etc.; (13) notification of completed, failed or interrupted action; (14) notification of context changes; and/or (15) data files, context and state updates due to action.
For instance, in a master/slave network configuration, only one of the networked devices drives the conversation at any given time. In particular, the master device (i.e., the dialog manager of the master device) manages and coordinates the conversation between the network devices and decides which device will perform a given conversational service or function. This decision can be based on the information provided by each of the devices or applications regarding their conversational capabilities. This decision may also be based on the master determining which slave device (having the necessary conversational capabilities) can perform the given conversational function most optimally. For instance, the master can request a plurality of slaves to perform speech recognition and provide the results to the master. The master can then select the optimal result. It is to be understood that what is described here at the level of speech recognition is the mechanism at the level of the DM (dialog manager) protocols between distributed dialog managers (as described in YO999-111P). Indeed, when dialog occurs between multiple dialog managers, the master will obtain a measure of the score of the results of each dialog manager, and a decision will be taken accordingly to determine which dialog manager proceeds with the input, not only on the basis of the speech recognition accuracy, but based on the dialog (meaning), context and history (as well as other items under consideration, such as the preferences of the user, the history, and the preferences of the application).
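As an illustration only (this is not the patent's scoring algorithm), a master selecting among slave results might combine a recognition score with a dialog/context score, reflecting that the decision is not based on recognition accuracy alone. The weights and field names below are invented:

```python
# Sketch of a master dialog manager selecting among results returned by
# several slave devices. Each result carries both a recognition score and
# a dialog/context score; the combination weights are arbitrary examples.

def select_result(results, w_recognition=0.5, w_dialog=0.5):
    """results: list of dicts with 'device', 'hypothesis',
    'recognition_score' and 'dialog_score' (each in [0, 1]).
    Returns the result with the best combined score."""
    def combined(r):
        return (w_recognition * r["recognition_score"]
                + w_dialog * r["dialog_score"])
    return max(results, key=combined)

results = [
    # Acoustically strong, but a poor fit for the dialog context.
    {"device": "slave-a", "hypothesis": "call home",
     "recognition_score": 0.90, "dialog_score": 0.30},
    # Slightly weaker acoustically, but consistent with context/history.
    {"device": "slave-b", "hypothesis": "call Rome",
     "recognition_score": 0.85, "dialog_score": 0.80},
]
best = select_result(results)
```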
In peer-to-peer connections, each device will attempt to determine the functions that it can perform and log a request to do so. The device that has accepted the task will perform the task and then score its performance. The devices will then negotiate which device will perform the task based on their scores.
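A minimal sketch of that peer-to-peer negotiation, assuming each peer reports a self-assessed score and all peers apply the same deterministic rule (the tie-breaking convention is an invented detail, not from the patent):

```python
# Each peer that accepted the task reports a self-assessed score; the peers
# agree that the highest-scoring device performs the task. Ties go to the
# lexicographically smallest device name so every peer reaches the same
# decision independently.

def negotiate(scores):
    """scores: mapping device name -> self-reported score for the task."""
    return min(scores, key=lambda device: (-scores[device], device))

winner = negotiate({"pda": 0.4, "phone": 0.7, "server": 0.7})
```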
In one embodiment, the distributed conversational protocols 112, 116 are implemented via RMI (remote method invocation) or RPC (remote procedure call) system calls to implement the calls between the applications and the different conversational engines over the network. As is known in the art, RPC is a protocol that allows one application to request a service from another application across the network. Similarly, RMI is a method by which objects can interact in a distributed network; RMI allows one or more objects to be passed along with the request. In addition, the information can be stored in an object which is exchanged via CORBA or DCOM, or presented in a declarative manner (such as via XML). As discussed in the above-incorporated patent application IBM Docket No. YO999-111P, the conversational protocols (or distributed protocols) can be used for achieving a distributed implementation of conversational functions supported by a CVM (conversational virtual machine) shell, between conversational applications and the CVM shell via conversational APIs, or between the CVM and conversational engines via conversational engine APIs. The conversational engine APIs are interfaces between the core engines and the applications using them, and protocols to communicate with core engines (local and/or networked). The conversational APIs provide an API layer to hook in or develop conversationally aware applications, which includes foundation classes and components to build conversational user interfaces.
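By way of illustration of the RPC style described above (not the embodiment's actual transport), the sketch below uses Python's standard xmlrpc module as a stand-in RPC layer; the engine's `recognize` method and its behavior are invented for the example. A remote engine procedure is invoked by the caller as if it were local:

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def recognize(utterance_id: str) -> str:
    # Placeholder for a server-side speech recognition engine call.
    return f"transcript-for-{utterance_id}"

# The "server conversational engine" exposes its procedure over RPC,
# bound to an ephemeral port on the loopback interface.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(recognize, "recognize")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client-side dialog manager calls the remote engine like a local one.
proxy = ServerProxy(f"http://127.0.0.1:{port}")
result = proxy.recognize("utt-42")

server.shutdown()
```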
Similarly, a dialog manager in accordance with the present invention can communicate via APIs with applications and engines (local and/or networked). In this manner, a dialog manager can act on the results and callbacks from all remote procedures (procedural calls to remote engines and applications) as if it were a local application so as to, e.g., arbitrate between the applications and resources (local and/or networked) to prioritize and determine the active application, and determine which result to consider as active.
The conversational discovery, registration and negotiation protocols 113, 117 are network protocols (or methods) that are used to "discover" local or network conversationally aware systems (i.e., applications or devices that "speak" conversational protocols). The registration protocols allow devices or applications to register their conversational capabilities, state and arguments. The negotiation protocols allow devices to negotiate master/slave, peer-to-peer or silent partner networks.
In one embodiment, the discovery protocols implement a "broadcast and listen" approach to trigger a reaction from other "broadcast and listen" devices. This can allow, for instance, the creation of dynamic and spontaneous networks (such as the Bluetooth and hopping networks discussed below). In another embodiment, a default server (possibly the master) setting can be used, which registers the "address" of the different network devices. In this embodiment, discovery amounts to each device in the network communicating with the server to check the list of registered devices so as to determine which devices to connect to. The information that is exchanged via the discovery protocols comprises the following: (1) broadcast requests for handshake or listening for requests; (2) exchange of device identifiers; (3) exchange of handles/pointers for first registration; and (4) exchange of handles for first negotiation.
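The "default server" discovery variant described above can be sketched as follows; this is an illustration only (the broadcast-and-listen variant would instead use, e.g., UDP broadcast), and the class name, method names, and addresses are invented:

```python
# Devices register their network address with a well-known default server;
# discovery then amounts to consulting that server's list of registered
# devices to decide which devices to connect to.

class DiscoveryServer:
    """Registry that a default (possibly master) server would maintain."""
    def __init__(self):
        self._devices = {}   # device id -> network address

    def register(self, device_id, address):
        self._devices[device_id] = address

    def registered_devices(self):
        # Return a copy so callers cannot mutate the registry.
        return dict(self._devices)

registry = DiscoveryServer()
registry.register("client-100", "192.0.2.10:5000")
registry.register("server-106", "192.0.2.20:5000")

# A newly connected device discovers its peers by checking the registry.
peers = registry.registered_devices()
```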
In one embodiment for implementing the registration protocols, upon connection, the devices can exchange information about their conversational capabilities with a prearranged protocol (e.g., TTS English, any text, speech recognition, 500 words + FSG grammar, no speaker recognition, etc.) by exchanging a set of flags or a device property object. Likewise, applications can exchange engine requirement lists. With a master/slave network configuration, the master dialog manager can compile all the lists and match the functions and needs with the conversational capabilities. In the absence of a master device (dialog manager), a common server can be used to transmit the conversational information to each machine or device in the network. The registration protocols allow the following information to be exchanged: (1) capabilities and load messages, including definition and update events; (2) engine resources (whether a given device includes NLU, DM, NLG, TTS, speaker recognition, speech recognition, compression, coding, storage, etc.); (3) I/O capabilities; (4) CPU, memory, and load capabilities; (5) data file types (domain-specific, dictionary, language models, languages, etc.); (6) network addresses and features; (7