`
`US 20020072918A1
`
`(19) United States
`(12) Patent Application Publication (10) Pub. No.: US 2002/0072918 A1
`White et al.
(43) Pub. Date: Jun. 13, 2002
`
`(54) DISTRIBUTED VOICE USER INTERFACE
`
(76) Inventors: George M. White, Delray Beach, FL (US);
James J. Buteau, Mountain View, CA (US);
Glen E. Shires, Danville, CA (US);
Kevin J. Surace, Sunnyvale, CA (US);
Steven Markman, Los Gatos, CA (US)

Correspondence Address:
SKJERVEN MORRILL MACPHERSON LLP
THREE EMBARCADERO CENTER
28TH FLOOR
SAN FRANCISCO, CA 94111 (US)

(21) Appl. No.: 10/057,523

(22) Filed: Jan. 22, 2002

Related U.S. Application Data

(62) Division of application No. 09/290,508, filed on Apr. 12, 1999.

Publication Classification

(51) Int. Cl.7 .......... G10L 21/00; G10L 11/00
(52) U.S. Cl. .......... 704/270.1
`
(57) ABSTRACT

A distributed voice user interface system includes a local
device which receives speech input issued from a user. Such
speech input may specify a command or a request by the
user. The local device performs preliminary processing of
the speech input and determines whether it is able to respond
to the command or request by itself. If not, the local device
initiates communication with a remote system for further
processing of the speech input.
`
[Representative drawing: Distributed Voice User Interface System (10). Labels recovered: Local Device (14b, 14d, 14e), LAN (18), Telecommunications Network, Remote System (12).]
`
`
`
[Drawing sheet 1 of 5 (FIG. 1): drawing text not recoverable from the scan.]
`
`
`
[Drawing sheet 2 of 5 (FIG. 2, details for a local device): drawing text not recoverable from the scan.]
`
`
`
`
`
`
`
`
[Drawing sheet 3 of 5 (FIG. 3, details for a remote system): drawing text not recoverable from the scan, apart from the labels "Local Devices" and "Internet".]
`
`
`
`
`
`
[Drawing sheet 4 of 5 (FIG. 4, flow diagram of operation for a local device). Block labels recovered: Start; Activation event?; Receive speech (104); Processing at local device (106); Local processing sufficient? (108); Establish connection between local device and remote system (110); Transmit data/speech from local device to remote system (112); Response received from remote system?; Timeout?; Terminate connection between local device and remote system (118); Take action based on processing; End session?; End.]
`
`
`
[Drawing sheet 5 of 5 (FIG. 5, flow diagram of operation for a remote system). Block labels recovered: Compare to grammars; Match?; Request more input; More attempts?; Have user select from listed commands or requests; End session? Reference numerals on the sheet: 200, 204, 206, 208, 210, 212, 214, 216, 218, 220.]
`
`
`
`
`DISTRIBUTED VOICE USER INTERFACE
`
`CROSS-REFERENCE TO RELATED
`APPLICATIONS
`
[0001] This Application relates to the subject matter disclosed
in the following co-pending U.S. Applications: U.S.
application Ser. No. 08/609,699, filed Mar. 1, 1996, entitled
“Method and Apparatus For Telephonically Accessing and
Navigating the Internet;” and U.S. application Ser. No.
09/071,717, filed May 1, 1998, entitled “Voice User Interface
With Personality.” These co-pending applications are
assigned to the present Assignee and are incorporated herein
by reference.
`
`BACKGROUND OF THE INVENTION
`
[0002] A voice user interface (VUI) allows a human user
to interact with an intelligent, electronic device (e.g., a
computer) by merely “talking” to the device. The electronic
device is thus able to receive, and respond to, directions,
commands, instructions, or requests issued verbally by the
human user. As such, a VUI facilitates the use of the device.

[0003] A typical VUI is implemented using various techniques
which enable an electronic device to “understand”
particular words or phrases spoken by the human user, and
to output or “speak” the same or different words/phrases for
prompting, or responding to, the user. The words or phrases
understood and/or spoken by a device constitute its
“vocabulary.” In general, the number of words/phrases within a
device’s vocabulary is directly related to the computing
power which supports its VUI. Thus, a device with more
computing power can understand more words or phrases
than a device with less computing power.
`
[0004] Many modern electronic devices, such as personal
digital assistants (PDAs), radios, stereo systems, television
sets, remote controls, household security systems, cable and
satellite receivers, video game stations, automotive dashboard
electronics, household appliances, and the like, have
some computing power, but typically not enough to support
a sophisticated VUI with a large vocabulary, i.e., a VUI
capable of understanding and/or speaking many words and
phrases. Accordingly, it is generally pointless to attempt to
implement a VUI on such devices as the speech recognition
and speech output capabilities would be far too limited for
practical use.
`
`SUMMARY
[0005] The present invention provides a system and
method for a distributed voice user interface (VUI) in which
a remote system cooperates with one or more local devices
to deliver a sophisticated voice user interface at the local
devices. The remote system and the local devices may
communicate via a suitable network, such as, for example,
a telecommunications network or a local area network
(LAN). In one embodiment, the distributed VUI is achieved
by the local devices performing preliminary signal processing
(e.g., speech parameter extraction and/or elementary
speech recognition) and accessing more sophisticated
speech recognition and/or speech output functionality
implemented at the remote system only if and when necessary.
`
[0006] According to an embodiment of the present invention,
a local device includes an input device which can
receive speech input issued from a user. A processing
component, coupled to the input device, extracts feature
parameters (which can be frequency domain parameters
and/or time domain parameters) from the speech input for
processing at the local device or, alternatively, at a remote
system.
`
[0007] According to another embodiment of the present
invention, a distributed voice user interface system includes
a local device which continuously monitors for speech input
issued by a user, scans the speech input for one or more
keywords, and initiates communication with a remote system
when a keyword is detected. The remote system
receives the speech input from the local device and can then
recognize words therein.
`
[0008] According to yet another embodiment of the
present invention, a local device includes an input device for
receiving speech input issued from a user. Such speech input
may specify a command or a request by the user. A processing
component, coupled to the input device, is operable
to perform preliminary processing of the speech input. The
processing component determines whether the local device
is by itself able to respond to the command or request
specified in the speech input. If not, the processing component
initiates communication with a remote system for
further processing of the speech input.
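
The decision described in this embodiment (respond locally when possible, otherwise hand off to the remote system) can be illustrated with a minimal sketch. It is not taken from the application: the command table, the `Result` type, and the `fake_remote` stand-in are all hypothetical, and the speech input is assumed to have already been reduced to text.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical table of phrases the local device can act on by itself.
LOCAL_COMMANDS = {
    "power on": "power:on",
    "power off": "power:off",
}


@dataclass
class Result:
    handled_by: str  # "local" or "remote"
    action: str


def handle_speech(text: str, ask_remote: Callable[[str], str]) -> Result:
    """Respond locally when possible; otherwise defer to the remote system."""
    phrase = text.strip().lower()
    action = LOCAL_COMMANDS.get(phrase)
    if action is not None:
        return Result("local", action)
    # Local processing is insufficient: initiate communication with the remote system.
    return Result("remote", ask_remote(phrase))


def fake_remote(phrase: str) -> str:
    # Stand-in for the remote system; a real one would run full recognition.
    return f"remote_interpreted:{phrase}"
```

In this sketch the remote callable is only invoked on a local miss, which mirrors the "if not" branch of the paragraph above.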
`
[0009] According to still another embodiment of the
present invention, a remote system includes a transceiver
which receives speech input, such speech input previously
issued by a user and preliminarily processed and forwarded
by a local device. A processing component, coupled to the
transceiver at the remote system, recognizes words in the
speech input.

[0010] According to still yet another embodiment of the
present invention, a method includes the following steps:
continuously monitoring at a local device for speech input
issued by a user; scanning the speech input at the local
device for one or more keywords; initiating a connection
between the local device and a remote system when a
keyword is detected; and passing the speech input, or
appropriate feature parameters extracted from the speech
input, from the local device to the remote system for
interpretation.
[0011] A technical advantage of the present invention
includes providing functional control over various local
devices (e.g., PDAs, radios, stereo systems, television sets,
remote controls, household security systems, cable and
satellite receivers, video game stations, automotive dashboard
electronics, household appliances, etc.) using sophisticated
speech recognition capability enabled primarily at a
remote site. The speech recognition capability is delivered to
each local device in the form of a distributed VUI. Thus,
functional control of the local devices via speech recognition
can be provided in a cost-effective manner.
`
[0012] Another technical advantage of the present invention
includes providing the vast bulk of hardware and/or
software for implementing a sophisticated voice user interface
at a single remote system, while only requiring minor
hardware/software implementations at each of a number of
local devices. This substantially reduces the cost of deploying
a sophisticated voice user interface at the various local
devices, because the incremental cost for each local device
is small. Furthermore, the sophisticated voice user interface
is delivered to each local device without substantially
increasing its size. In addition, the power required to operate
each local device is minimal since most of the capability for
the voice user interface resides in the remote system; this can
be crucial for applications in which a local device is battery
powered. Furthermore, the single remote system can be
more easily maintained and upgraded with new features or
hardware, than can the individual local devices.
[0013] Yet another technical advantage of the present
invention includes providing a transient, on-demand connection
between each local device and the remote system,
i.e., communication between a local device and the remote
system is enabled only if the local device requires the
assistance of the remote system. Accordingly, communication
costs, such as, for example, long distance charges, are
minimized. Furthermore, the remote system is capable of
supporting a larger number of local devices if each such
device is only connected on a transient basis.
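
One way to picture a transient, on-demand connection is a link that is opened for a single request and closed immediately afterwards. The sketch below is an illustration only: `RemoteLink`, its methods, and the payload string are hypothetical names, and no real network transport is involved.

```python
from contextlib import contextmanager


class RemoteLink:
    """Hypothetical link to the remote system; tracks open state and session count."""

    def __init__(self):
        self.is_open = False
        self.sessions = 0

    def open(self):
        self.is_open = True
        self.sessions += 1

    def close(self):
        self.is_open = False

    def send(self, payload: str) -> str:
        if not self.is_open:
            raise RuntimeError("link is closed")
        return f"ack:{payload}"


@contextmanager
def transient_connection(link: RemoteLink):
    """Open the link only for the duration of one request, then drop it."""
    link.open()
    try:
        yield link
    finally:
        link.close()


link = RemoteLink()
with transient_connection(link) as conn:
    reply = conn.send("features-for-utterance-1")
```

Because the link is closed in the `finally` clause, it is released even if the request fails, so the remote system is never held by an idle device.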
[0014] Still another technical advantage of the present
invention includes providing the capability for data to be
downloaded from the remote system to each of the local
devices, either automatically or in response to a user’s
request. Thus, the data already present in each local device
can be updated, replaced, or supplemented as desired, for
example, to modify the voice user interface capability (e.g.,
speech recognition/output) supported at the local device. In
addition, data from news sources or databases can be
downloaded (e.g., from the Internet) and made available to the
local devices for output to users.

[0015] Other aspects and advantages of the present invention
will become apparent from the following descriptions
and accompanying drawings.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
[0016] For a more complete understanding of the present
invention and for further features and advantages, reference
is now made to the following description taken in conjunction
with the accompanying drawings, in which:

[0017] FIG. 1 illustrates a distributed voice user interface
system, according to an embodiment of the present invention;

[0018] FIG. 2 illustrates details for a local device, according
to an embodiment of the present invention;

[0019] FIG. 3 illustrates details for a remote system,
according to an embodiment of the present invention;

[0020] FIG. 4 is a flow diagram of an exemplary method
of operation for a local device, according to an embodiment
of the present invention; and

[0021] FIG. 5 is a flow diagram of an exemplary method
of operation for a remote system, according to an embodiment
of the present invention.
`
`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENTS
`
[0022] The preferred embodiments of the present invention
and their advantages are best understood by referring to
FIGS. 1 through 5 of the drawings. Like numerals are used
for like and corresponding parts of the various drawings.
`
[0023] Turning first to the nomenclature of the specification,
the detailed description which follows is represented
largely in terms of processes and symbolic representations
of operations performed by conventional computer components,
such as a central processing unit (CPU) or processor
associated with a general purpose computer system, memory
storage devices for the processor, and connected pixel-oriented
display devices. These operations include the
manipulation of data bits by the processor and the maintenance
of these bits within data structures resident in one or
more of the memory storage devices. Such data structures
impose a physical organization upon the collection of data
bits stored within computer memory and represent specific
electrical or magnetic elements. These symbolic representations
are the means used by those skilled in the art of
computer programming and computer construction to most
effectively convey teachings and discoveries to others
skilled in the art.
`
[0024] For purposes of this discussion, a process, method,
routine, or sub-routine is generally considered to be a
sequence of computer-executed steps leading to a desired
result. These steps generally require manipulations of physical
quantities. Usually, although not necessarily, these quantities
take the form of electrical, magnetic, or optical signals
capable of being stored, transferred, combined, compared, or
otherwise manipulated. It is conventional for those skilled in
the art to refer to these signals as bits, values, elements,
symbols, characters, text, terms, numbers, records, files, or
the like. It should be kept in mind, however, that these and
some other terms should be associated with appropriate
physical quantities for computer operations, and that these
terms are merely conventional labels applied to physical
quantities that exist within and during operation of the
computer.
[0025] It should also be understood that manipulations
within the computer are often referred to in terms such as
adding, comparing, moving, or the like, which are often
associated with manual operations performed by a human
operator. It must be understood that no involvement of the
human operator may be necessary, or even desirable, in the
present invention. The operations described herein are
machine operations performed in conjunction with the
human operator or user that interacts with the computer or
computers.
[0026] In addition, it should be understood that the programs,
processes, methods, and the like, described herein are
but an exemplary implementation of the present invention
and are not related, or limited, to any particular computer,
apparatus, or computer language. Rather, various types of
general purpose computing machines or devices may be
used with programs constructed in accordance with the
teachings described herein. Similarly, it may prove advantageous
to construct a specialized apparatus to perform the
method steps described herein by way of dedicated computer
systems with hard-wired logic or programs stored in
non-volatile memory, such as read-only memory (ROM).

[0027] Network System Overview
`
[0028] Referring now to the drawings, FIG. 1 illustrates a
distributed voice user interface (VUI) system 10, according
to an embodiment of the present invention. In general,
distributed VUI system 10 allows one or more users to
interact (via speech or verbal communication) with one or
more electronic devices or systems into which distributed
VUI system 10 is incorporated, or alternatively, to which
distributed VUI system 10 is connected. As used herein, the
terms “connected,” “coupled,” or any variant thereof, mean
any connection or coupling, either direct or indirect,
between two or more elements; the coupling or connection
can be physical or logical.
[0029] More particularly, distributed VUI system 10
includes a remote system 12 which may communicate with
a number of local devices 14 (separately designated with
reference numerals 14a, 14b, 14c, 14d, 14e, 14f, 14g, 14h,
and 14i) to implement one or more distributed VUIs. In one
embodiment, a “distributed VUI” comprises a voice user
interface that may control the functioning of a respective
local device 14 through the services and capabilities of
remote system 12. That is, remote system 12 cooperates with
each local device 14 to deliver a separate, sophisticated VUI
capable of responding to a user and controlling that local
device 14. In this way, the sophisticated VUIs provided at
local devices 14 by distributed VUI system 10 facilitate the
use of the local devices 14. In another embodiment, the
distributed VUI enables control of another apparatus or
system (e.g., a database or a website), in which case, the
local device 14 serves as a “medium.”
[0030] Each such VUI of system 10 may be “distributed”
in the sense that speech recognition and speech output
software and/or hardware can be implemented in remote
system 12 and the corresponding functionality distributed to
the respective local device 14. Some speech recognition/output
software or hardware can be implemented in each of
local devices 14 as well.
[0031] When implementing distributed VUI system 10
described herein, a number of factors may be considered in
dividing the speech recognition/output functionality
between local devices 14 and remote system 12. These
factors may include, for example, the amount of processing
and memory capability available at each of local devices 14
and remote system 12; the bandwidth of the link between
each local device 14 and remote system 12; the kinds of
commands, instructions, directions, or requests expected
from a user, and the respective, expected frequency of each;
the expected amount of use of a local device 14 by a given
user; the desired cost for implementing each local device 14;
etc. In one embodiment, each local device 14 may be
customized to address the specific needs of a particular user,
thus providing a technical advantage.
`[0032] Local Devices
[0033] Each local device 14 can be an electronic device
with a processor having a limited amount of processing or
computing power. For example, a local device 14 can be a
relatively small, portable, inexpensive, and/or low power
consuming “smart device,” such as a personal digital assistant
(PDA), a wireless remote control (e.g., for a television
set or stereo system), a smart telephone (such as a cellular
phone or a stationary phone with a screen), or smart jewelry
(e.g., an electronic watch). A local device 14 may also
comprise or be incorporated into a larger device or system,
such as a television set, a television set top box (e.g., a cable
receiver, a satellite receiver, or a video game station), a video
cassette recorder, a video disc player, a radio, a stereo
system, an automobile dashboard component, a microwave
oven, a refrigerator, a household security system, a climate
control system (for heating and cooling), or the like.
`
[0034] In one embodiment, a local device 14 uses elementary
techniques (e.g., the push of a button) to detect the onset
of speech. Local device 14 then performs preliminary processing
on the speech waveform. For example, local device
14 may transform speech into a series of feature vectors or
frequency domain parameters (which differ from the digitized
or compressed speech used in vocoders or cellular
phones). Specifically, from the speech waveform, the local
device 14 may extract various feature parameters, such as,
for example, cepstral coefficients, Fourier coefficients, linear
predictive coding (LPC) coefficients, or other spectral
parameters in the time or frequency domain. These spectral
parameters (also referred to as features in automatic speech
recognition systems), which would normally be extracted in
the first stage of a speech recognition system, are transmitted
to remote system 12 for processing therein. Speech recognition
and/or speech output hardware/software at remote
system 12 (in communication with the local device 14) then
provides a sophisticated VUI through which a user can input
commands, instructions, or directions into, and/or retrieve
information or obtain responses from, the local device 14.
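
As an illustration of what such preliminary processing might look like, the following sketch splits a waveform into frames and computes Fourier magnitudes and a real cepstrum for each frame. It is a simplified stand-in, not the application's method: the frame size, hop, number of coefficients, and the synthetic test tone are assumptions, no window function is applied, and a naive O(n^2) transform is used so that only the Python standard library is needed.

```python
import cmath
import math


def frames(samples, size, hop):
    """Split a sample sequence into overlapping fixed-size frames."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, hop)]


def dft_magnitudes(frame):
    """Magnitudes of the discrete Fourier transform of one frame (naive O(n^2))."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n)
    ]


def real_cepstrum(frame, floor=1e-10):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    n = len(frame)
    # Floor the magnitudes so that log() is defined for empty bins.
    log_mag = [math.log(max(m, floor)) for m in dft_magnitudes(frame)]
    return [
        sum(v * cmath.exp(2j * math.pi * k * q / n) for k, v in enumerate(log_mag)).real / n
        for q in range(n)
    ]


def extract_features(samples, size=32, hop=16, n_coeffs=8):
    """One short feature vector (leading cepstral coefficients) per frame."""
    return [real_cepstrum(f)[:n_coeffs] for f in frames(samples, size, hop)]


# Synthetic input: a 1 kHz tone sampled at 8 kHz (illustrative values only).
signal = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(128)]
features = extract_features(signal)
```

Each entry of `features` is a compact vector that could be sent to a remote recognizer in place of the raw samples, which is the point of extracting parameters locally.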
`
[0035] In another embodiment, in addition to performing
preliminary signal processing (including feature parameter
extraction), at least a portion of local devices 14 may each
be provided with its own resident VUI. This resident VUI
allows the respective local device 14 to understand and
speak to a user, at least on an elementary level, without
remote system 12. To accomplish this, each such resident
VUI may include, or be coupled to, suitable input/output
devices (e.g., microphone and speaker) for receiving and
outputting audible speech. Furthermore, each resident VUI
may include hardware and/or software for implementing
speech recognition (e.g., automatic speech recognition
(ASR) software) and speech output (e.g., recorded or generated
speech output software). An exemplary embodiment
for a resident VUI of a local device 14 is described below in
more detail.
`
[0036] A local device 14 with a resident VUI may be, for
example, a remote control for a television set. A user may
issue a command to the local device 14 by stating “Channel
four” or “Volume up,” to which the local device 14 responds
by changing the channel on the television set to channel four
or by turning up the volume on the set.
`
[0037] Because each local device 14, by definition, has a
processor with limited computing power, the respective
resident VUI for a local device 14, taken alone, generally
does not provide extensive speech recognition and/or speech
output capability. For example, rather than implement a
more complex and sophisticated natural language (NL)
technique for speech recognition, each resident VUI may
perform “word spotting” by scanning speech input for the
occurrence of one or more “keywords.” Furthermore, each
local device 14 will have a relatively limited vocabulary
(e.g., less than one hundred words) for its resident VUI. As
such, a local device 14, by itself, is only capable of responding
to relatively simple commands, instructions, directions,
or requests from a user.
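
Word spotting against a small vocabulary can be sketched as a scan of the input for known keyword phrases. The example below is a simplification under stated assumptions: it operates on text that has already been transcribed rather than on audio, and the keyword table and action tuples are hypothetical, modelled on the television remote control commands mentioned above.

```python
import re

# Hypothetical keyword vocabulary for a television remote control.
KEYWORDS = {
    ("channel", "four"): ("set_channel", 4),
    ("volume", "up"): ("volume", 1),
}


def spot_keywords(utterance: str):
    """Return the action for every keyword phrase found in the utterance."""
    tokens = re.findall(r"[a-z]+", utterance.lower())
    hits = []
    for start in range(len(tokens)):
        for phrase, action in KEYWORDS.items():
            # Compare the phrase with the token window beginning at `start`.
            if tuple(tokens[start:start + len(phrase)]) == phrase:
                hits.append(action)
    return hits
```

Words outside the vocabulary are simply ignored, so "please turn the volume up a little" yields the same action as "Volume up", while an utterance with no keyword yields nothing and would be a candidate for remote processing.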
`
[0038] In instances where the speech recognition and/or
speech output capability provided by a resident VUI of a
local device 14 is not adequate to address the needs of a user,
the resident VUI can be supplemented with the more extensive
capability provided by remote system 12. Thus, the
local device 14 can be controlled by spoken commands and
otherwise actively participate in verbal exchanges with the
user by utilizing more complex speech recognition/output
hardware and/or software implemented at remote system 12
(as further described herein).
[0039] Each local device 14 may further comprise a
manual input device (such as a button, a toggle switch, a
keypad, or the like) by which a user can interact with the
local device 14 (and also remote system 12 via a suitable
communication network) to input commands, instructions,
requests, or directions without using either the resident or
distributed VUI. For example, each local device 14 may
include hardware and/or software supporting the interpretation
and issuance of dual tone multiple frequency (DTMF)
commands. In one embodiment, such manual input device
can be used by the user to activate or turn on the respective
local device 14 and/or initiate communication with remote
system 12.
`[0040] Remote System
[0041] In general, remote system 12 supports a relatively
sophisticated VUI which can be utilized when the capabilities
of any given local device 14 alone are insufficient to
address or respond to instructions, commands, directions, or
requests issued by a user at the local device 14. The VUI at
remote system 12 can be implemented with speech recognition/output
hardware and/or software suitable for performing
the functionality described herein.
`
[0042] The VUI of remote system 12 interprets the vocalized
expressions of a user, communicated from a local
device 14, so that remote system 12 may itself respond, or
alternatively, direct the local device 14 to respond, to the
commands, directions, instructions, requests, and other input
spoken by the user. As such, remote system 12 completes the
task of recognizing words and phrases.
`
[0043] The VUI at remote system 12 can be implemented
with a different type of automatic speech recognition (ASR)
hardware/software than local devices 14. For example, in
one embodiment, rather than performing “word spotting,” as
may occur at local devices 14, remote system 12 may use a
larger vocabulary recognizer, implemented with word and
optional sentence recognition grammars. A recognition
grammar specifies a set of directions, commands, instructions,
or requests that, when spoken by a user, can be
understood by a VUI. In other words, a recognition grammar
specifies what sentences and phrases are to be recognized by
the VUI. For example, if a local device 14 comprises a
microwave oven, a distributed VUI for the same can include
a recognition grammar that allows a user to set a cooking
time by saying, “Oven high for half a minute,” or “Cook on
high for thirty seconds,” or, alternatively, “Please cook for
thirty seconds at high.” Commercially available speech
recognition systems with recognition grammars are provided
by ASR technology vendors such as, for example, the
following: Nuance Corporation of Menlo Park, Calif.;
Dragon Systems of Newton, Mass.; IBM of Austin, Tex.;
Kurzweil Applied Intelligence of Waltham, Mass.; Lernout &
Hauspie Speech Products of Burlington, Mass.; and
PureSpeech, Inc. of Cambridge, Mass.
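
The microwave oven example can be made concrete with a toy recognition grammar. This sketch is not how any of the named vendors' products work: it expresses the grammar as regular expressions over transcribed text, and the extra power levels ("medium", "low") and the "one minute" duration are hypothetical additions for illustration.

```python
import re

# Spoken durations mapped to seconds; "one minute" is a hypothetical extra entry.
DURATIONS = {"half a minute": 30, "thirty seconds": 30, "one minute": 60}
# Power levels; only "high" appears in the text, the others are hypothetical.
POWERS = ("high", "medium", "low")

_dur = "|".join(re.escape(d) for d in DURATIONS)
_pow = "|".join(POWERS)

# One pattern per sentence form allowed by the grammar.
PATTERNS = [
    re.compile(rf"^oven (?P<power>{_pow}) for (?P<dur>{_dur})$"),
    re.compile(rf"^(?:please )?cook on (?P<power>{_pow}) for (?P<dur>{_dur})$"),
    re.compile(rf"^(?:please )?cook for (?P<dur>{_dur}) at (?P<power>{_pow})$"),
]


def parse_cook_command(utterance: str):
    """Return (power, seconds) if the utterance fits the grammar, else None."""
    text = re.sub(r"[^a-z ]", "", utterance.lower()).strip()
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return match.group("power"), DURATIONS[match.group("dur")]
    return None
```

All three phrasings quoted in the paragraph resolve to the same pair, ("high", 30), which shows how a grammar lets differently worded sentences carry one meaning.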
`
[0044] Remote system 12 may process the directions,
commands, instructions, or requests that it has recognized or
understood from the utterances of a user. During processing,
remote system 12 can, among other things, generate control
signals and reply messages, which are returned to a local
device 14. Control signals are used to direct or control the
local device 14 in response to user input. For example, in
response to a user command of “Turn up the heat to 82
degrees,” control signals may direct a local device 14
incorporating a thermostat to adjust the temperature of a
climate control system. Reply messages are intended for the
immediate consumption of a user at the local device and may
take the form of video or audio, or text to be displayed at the
local device. As a reply message, the VUI at remote system
12 may issue audible output in the form of speech that is
understandable by a user.
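
The distinction between a control signal (for the device) and a reply message (for the user) can be sketched as two separate return values. The data types, field names, and reply wording below are hypothetical; only the thermostat command itself comes from the text, and the input is assumed to be already transcribed.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ControlSignal:
    target: str   # which part of the local device to control
    setting: str  # which setting to change
    value: int    # new value for the setting


@dataclass
class ReplyMessage:
    text: str     # text to display or synthesize for the user


def process_command(utterance: str) -> Tuple[Optional[ControlSignal], ReplyMessage]:
    """Turn a recognized command into a control signal plus a reply message."""
    match = re.search(r"turn (up|down) the heat to (\d+) degrees", utterance.lower())
    if match is None:
        return None, ReplyMessage("Sorry, I did not understand that.")
    degrees = int(match.group(2))
    signal = ControlSignal("thermostat", "temperature", degrees)
    return signal, ReplyMessage(f"Setting the temperature to {degrees} degrees.")
```

Returning both values together lets the local device act on the signal and present the reply in whatever form it supports.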
[0045] For issuing reply messages, the VUI of remote
system 12 may include capability for speech generation
(synthesized speech) and/or play-back (previously recorded
speech). Speech generation capability can be implemented
with text-to-speech (TTS) hardware/software, which converts
textual information into synthesized, audible speech.
Speech play-back capability may be implemented with an
analog-to-digital converter driven by CD ROM (or
other digital memory device), a tape player, a laser disc
player, a specialized integrated circuit (IC) device, or the
like, which plays back previously recorded human speech.
[0046] In speech play-back, a person (preferably a voice
model) recites various statements which may desirably be
issued during an interactive session with a user at a local
device 14 of distributed VUI system 10. The person’s voice
is recorded as the recitations are made. The recordings are
separated into discrete messages, each message comprising
one or more statements that would desirably be issued in a
particular context (e.g., greeting, farewell, requesting
instructions, receiving instructions, etc.). Afterwards, when
a user interacts with distributed VUI system 10, the recorded
messages are played back to the user when the proper
context arises.
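
Selecting a recorded message by context amounts to a lookup from context to recording. The sketch below assumes a simple in-memory table; the context keys follow the examples in the paragraph, while the file names are invented placeholders.

```python
# Hypothetical mapping from interaction context to recorded message files.
RECORDED_MESSAGES = {
    "greeting": ["hello.wav"],
    "farewell": ["goodbye.wav"],
    "requesting_instructions": ["what_would_you_like.wav"],
    "receiving_instructions": ["okay.wav", "working_on_it.wav"],
}


def select_recordings(context: str):
    """Return the recordings to play back when the given context arises."""
    try:
        return RECORDED_MESSAGES[context]
    except KeyError:
        raise ValueError(f"no recording for context {context!r}") from None
```

A message may hold more than one recording, matching the description of each message comprising one or more statements.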
[0047] The reply messages generated by the VUI at
remote system 12 can be made to be consistent with any
messages provided by the resident VUI of a local device 14.
For example, if speech play-back capability is used for
generating speech, the same person’s voice may be recorded
for messages output by the resident VUI of the local device
14 and the VUI of remote system 12. If synthesized
(computer-generated) speech capability is used, a similar sounding
artificial voice may be provided for the VUIs of both
local devices 14 and remote system 12. In this way, the
distributed VUI of system 10 provides to a user an interactive
interface which is “seamless” in the sense that the user
cannot distinguish between the simpler, resident VUI of the
local device 14 and the more sophisticated VUI of remote
system 12.
[0048] In one embodiment, the speech recognition and
speech play-back capabilities described herein can be used
to implement a voice user interface with personality, as
taught by U.S. patent application Ser. No. 09/071,717,
entitled “Voice User Interface With Personality,” the text of
which is incorporated herein by reference.
[0049] Remote system 12 may also comprise hardware
and/or software supporting the interpretation and issuance of
commands, such as dual tone multiple frequency (DTMF)
commands, so that a user may alternatively interact with
remote system 12 using an alternative input device, such as
a telephone key pad.
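
For reference, DTMF signalling represents each key as a pair of tones, one from a low-frequency (row) group and one from a high-frequency (column) group. The sketch below builds the standard 4x4 key table and decodes a frequency pair back to a key; tone detection in audio is out of scope, and the function name is illustrative.

```python
# Standard DTMF row (low group) and column (high group) frequencies in hertz.
ROWS_HZ = (697, 770, 852, 941)
COLS_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")

# Each key maps to its (row frequency, column frequency) pair.
DTMF = {
    key: (ROWS_HZ[r], COLS_HZ[c])
    for r, row in enumerate(KEYPAD)
    for c, key in enumerate(row)
}


def decode(low_hz: int, high_hz: int) -> str:
    """Map a detected (row, column) frequency pair back to its key."""
    for key, pair in DTMF.items():
        if pair == (low_hz, high_hz):
            return key
    raise ValueError("not a DTMF frequency pair")
```

A system interpreting key presses would first detect the two tones present in the signal and then perform a lookup of this kind.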
`
`
[0050] Remote system 12 may be in communication with
the “Internet,” thus providing access thereto for users at
local devices 14. The Internet is an interconnection of
computer “clients” and “servers” located throughout the
world and exchanging information according to Transmission
Control Protocol/Internet Protocol (TCP/IP), Internetwork
Packet Exchange/Sequence Packet Exchange (IPX/SPX),
AppleTalk, or other suitable protocol. The Internet
supports the distributed application known as the “World
Wide Web.” Web servers may exchange information with
one another using a protocol known as hypertext transport
protocol (HTTP). Information may be communicated from
one server to any other computer using HTTP and is
maintained in the form of web pages, each of which can be
identified by a respective uniform resource locator (URL).
Remote system 12 may function as a client to interconnect
with Web servers. The interconnection may use any of a
variety of communication links, such as, for example, a local
telephone communication line or a dedicated communication
line. Remote system 12 may comprise and locally
execute a “web browser” or “web proxy” program. A web
browser is a computer program that allows remote system
12, acting as a client, to exchange information with the
World Wide Web. Any of a variety of web browsers are
available, such as NETSCAPE NAVIGATOR from
Netscape Communications Corp. of Mountain View, Calif.,
`INTERNE