US006742021B1

(12) United States Patent
Halverson et al.

(10) Patent No.: US 6,742,021 B1
(45) Date of Patent: May 25, 2004
(54) NAVIGATING NETWORK-BASED ELECTRONIC INFORMATION USING SPOKEN INPUT WITH MULTIMODAL ERROR FEEDBACK

(75) Inventors: Christine Halverson, San Jose, CA (US); Luc Julia, Menlo Park, CA (US); Dimitris Voutsas, Thessaloniki (GR); Adam J. Cheyer, Palo Alto, CA (US)
(73) Assignee: SRI International, Inc., Menlo Park, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.
(21) Appl. No.: 09/524,095

(22) Filed: Mar. 13, 2000

Related U.S. Application Data

(63) Continuation-in-part of application No. 09/225,198, filed on Jan. 5, 1999.

(60) Provisional application No. 60/124,718, filed on Mar. 17, 1999, provisional application No. 60/124,720, filed on Mar. 17, 1999, and provisional application No. 60/124,719, filed on Mar. 17, 1999.

(51) Int. Cl.7 .......................... G06F 15/16
(52) U.S. Cl. .......... 709/218; 707/5; 707/4; 707/102
(58) Field of Search .......... 709/218; 707/5, 707/4, 102; 704/257, 231

(56) References Cited
U.S. PATENT DOCUMENTS

5,197,005 A    3/1993   Schwartz et al. ......... 364/419
5,386,556 A    1/1995   Hedin et al. ............ 395/600

(List continued on next page.)

FOREIGN PATENT DOCUMENTS

EP   0 803 826 A2   10/1997
WO   WO 00/11869     3/2000
OTHER PUBLICATIONS

http://www.ai.sri.com/~lesaf/commandtalk.html: "CommandTalk: A Spoken-Language Interface for Battlefield Simulations", 1997, by Robert Moore, John Dowding, Harry Bratt, J. Mark Gawron, Yonael Gorfu and Adam Cheyer, in "Proceedings of the Fifth Conference on Applied Natural Language Processing", Washington, DC, pp. 1-7, Association for Computational Linguistics.
"The CommandTalk Spoken Dialogue System", 1999, by Amanda Stent, John Dowding, Jean Mark Gawron, Elizabeth Owen Bratt and Robert Moore, in "Proceedings of the Thirty-Seventh Annual Meeting of the ACL", pp. 183-190, University of Maryland, College Park, MD, Association for Computational Linguistics.
Stent, Amanda et al., "The CommandTalk Spoken Dialogue System", SRI International.
Moore, Robert et al., "CommandTalk: A Spoken-Language Interface for Battlefield Simulations", Oct. 23, 1997, SRI International.
Dowding, John et al., "Interpreting Language in Context in CommandTalk", Feb. 5, 1999, SRI International.
http://www.ai.sri.com/~oaa/infowiz.html, InfoWiz: An Animated Voice Interactive Information System, May 8, 2000.

(List continued on next page.)
Primary Examiner—James P. Trammell
Assistant Examiner—Firmin Backer
(74) Attorney, Agent, or Firm—Moser, Patterson & Sheridan, L.L.P.; Kin-Wah Tong, Esq.

(57) ABSTRACT
A system, method, and article of manufacture are provided for navigating an electronic data source by means of spoken language. When a spoken input request is received from a user, it is interpreted. Additional input is solicited from the user in a modality different than the original request and used to refine the navigation query. The resulting interpretation of the request is thereupon used to automatically construct an operational navigation query to retrieve the desired information from one or more electronic network data sources.
132 Claims, 7 Drawing Sheets
[Front-page illustration: the FIG. 4 flowchart, reproduced on Sheet 5 of 7.]
U.S. PATENT DOCUMENTS

5,434,777 A      7/1995   Luciw ...................... 364/419
5,519,608 A      5/1996   Kupiec ................. 364/419.08
5,608,624 A      3/1997   Luciw ...................... 395/794
5,721,938 A      2/1998   Stuckey .................... 395/754
5,729,659 A      3/1998   Potter .................... 395/2.79
5,748,974 A      5/1998   Johnson .................... 395/759
5,774,859 A      6/1998   Houser et al. .............. 704/275
5,794,050 A      8/1998   Dahlgren et al. ............ 395/708
5,802,526 A      9/1998   Fawcett et al. ............. 707/104
5,805,775 A      9/1998   Eberman et al. .............. 395/12
5,855,002 A     12/1998   Armstrong .................. 704/270
5,890,123 A      3/1999   Brown et al. ............... 704/275
5,963,940 A     10/1999   Liddy et al. ................. 707/5
6,003,072 A     12/1999   Gerritsen et al. ........... 709/218
6,012,030 A      1/2000   French-St. George et al.
6,021,427 A      1/2000   Spagna et al.
6,026,388 A      2/2000   Liddy et al. ................. 707/1
6,080,202 A      6/2000   Strickland et al.
6,144,989 A     11/2000   Hodjat et al.
6,173,279 B1 *   1/2001   Levin et al. ................. 707/5
6,192,338 B1 *   2/2001   Haszto et al. .............. 704/257
6,226,666 B1     5/2001   Chang et al.
6,338,081 B1     1/2002   Furusawa et al. ............ 704/275
OTHER PUBLICATIONS

Dowding, John, "Interleaving Syntax and Semantics in an Efficient Bottom-up Parser", SRI International.
Moore, Robert et al., "Combining Linguistic and Statistical Knowledge Sources in Natural-Language Processing for ATIS", SRI International.
Dowding, John et al., "Gemini: A Natural Language System For Spoken-Language Understanding", SRI International.
Moran, Douglas B. et al., "Intelligent Agent-based User Interfaces", Artificial Intelligence Center, SRI International.
Martin, David L. et al., "Building Distributed Software Systems with the Open Agent Architecture".
Julia, Luc et al., "Cooperative Agents and Recognition System (CARS) for Drivers and Passengers", SRI International.
Moran, Douglas et al., "Multimodal User Interfaces in the Open Agent Architecture".
Cheyer, Adam et al., "Multimodal Maps: An Agent-based Approach", SRI International.
Cutkosky, Mark R. et al., "An Experiment in Integrating Concurrent Engineering Systems".
Martin, David et al., "Development Tools for the Open Agent Architecture", The Practical Application of Intelligent Agents and Multi-Agent Technology (PAAM96), London, Apr. 1996.
Cheyer, Adam et al., "The Open Agent Architecture", SRI International, AI Center.
Dejima, Inc., http://www.dejima.com/.
Cohen, Philip et al., "An Open Agent Architecture", AAAI Spring Symposium, pp. 1-8, Mar. 1994.
Martin, David et al., "Information Brokering in an Agent Architecture", Proceedings of the 2nd Int'l Conference on Practical Application of Intelligent Agents & Multi-Agent Technology, London, Apr. 1997.

* cited by examiner
[U.S. Patent, May 25, 2004, Sheet 1 of 7. FIG. 1a: network diagram showing voice input device 102, communications box 104, network 106, request processing logic 300 (see FIG. 3) at server 108, and data source 110.]
[U.S. Patent, May 25, 2004, Sheet 2 of 7. FIG. 1b: network diagram for the client-side processing embodiment, showing network 106 and associated components.]
[U.S. Patent, May 25, 2004, Sheet 3 of 7. FIG. 2: mobile computing embodiment, showing mobile information appliances 202 through 202n, network 206, request processing logic 300 (see FIG. 3), server 208, and data sources 210 through 210n.]
[U.S. Patent, May 25, 2004, Sheet 4 of 7. FIG. 3: REQUEST PROCESSING LOGIC 300, comprising the SPEECH RECOGNITION ENGINE, NATURAL LANGUAGE PARSER, QUERY CONSTRUCTION LOGIC, and QUERY REFINEMENT LOGIC.]
[U.S. Patent, May 25, 2004, Sheet 5 of 7. FIG. 4 flowchart: 402 RECEIVE SPOKEN NL REQUEST; 404 INTERPRET REQUEST; 405 IDENTIFY/SELECT DATA SOURCE; 406 CONSTRUCT NAVIGATION QUERY; 407 DEFICIENCIES? (yes: 412 SOLICIT ADDITIONAL (MULTIMODAL) USER INPUT; no: 408 NAVIGATE DATA SOURCE); REFINE QUERY? (yes: 412; no: 410 TRANSMIT AND DISPLAY TO CLIENT).]
[U.S. Patent, May 25, 2004, Sheet 6 of 7. FIG. 5: (from step 406, FIG. 4) 520 SCRAPE THE ONLINE SCRIPTED FORM TO EXTRACT AN INPUT TEMPLATE; 522 INSTANTIATE THE INPUT TEMPLATE USING INTERPRETATION OF STEP 404; (to step 407, FIG. 4).]
[U.S. Patent, May 25, 2004, Sheet 7 of 7. FIG. 6: community of distributed, collaborating electronic agents, including a facilitator agent, user interface agent, speech recognition agent, natural language agent, and agents for resources such as a database, the web, a calendar, and telephones.]

NAVIGATING NETWORK-BASED ELECTRONIC INFORMATION USING SPOKEN INPUT WITH MULTIMODAL ERROR FEEDBACK

This is a Continuation-In-Part of co-pending U.S. patent application Ser. No. 09/225,198, filed Jan. 5, 1999, Provisional U.S. patent application Ser. No. 60/124,718, filed Mar. 17, 1999, Provisional U.S. patent application Ser. No. 60/124,720, filed Mar. 17, 1999, and Provisional U.S. patent application Ser. No. 60/124,719, filed Mar. 17, 1999, from which applications priority is claimed and these applications are incorporated herein by reference.

BACKGROUND OF THE INVENTION

The present invention relates generally to the navigation of electronic data by means of spoken natural language requests, and to feedback mechanisms and methods for resolving the errors and ambiguities that may be associated with such requests.

As global electronic connectivity continues to grow, and the universe of electronic data potentially available to users continues to expand, there is a growing need for information navigation technology that allows relatively naive users to navigate and access desired data by means of natural language input. In many of the most important markets—including the home entertainment arena, as well as mobile computing—spoken natural language input is highly desirable, if not ideal. As just one example, the proliferation of high-bandwidth communications infrastructure for the home entertainment market (cable, satellite, broadband) enables delivery of movies-on-demand and other interactive multimedia content to the consumer's home television set. For users to take full advantage of this content stream ultimately requires interactive navigation of content databases in a manner that is too complex for user-friendly selection by means of a traditional remote-control clicker. Allowing spoken natural language requests as the input modality for rapidly searching and accessing desired content is an important objective for a successful consumer entertainment product in a context offering a dizzying range of database content choices. As further examples, this same need to drive navigation of (and transaction with) relatively complex data warehouses using spoken natural language requests applies equally to surfing the Internet/Web or other networks for general information, multimedia content, or e-commerce transactions.

In general, the existing navigational systems for browsing electronic databases and data warehouses (search engines, menus, etc.) have been designed without navigation via spoken natural language as a specific goal. So today's world is full of existing electronic data navigation systems that do not assume browsing via natural spoken commands, but rather assume text and mouse-click inputs (or in the case of TV remote controls, even less). Simply recognizing voice commands within an extremely limited vocabulary and grammar—the spoken equivalent of button/click input (e.g., speaking "channel 5" selects TV channel 5)—is really not sufficient by itself to satisfy the objectives described above. In order to deliver a true "win" for users, the voice-driven front-end must accept spoken natural language input in a manner that is intuitive to users. For example, the front-end should not require learning a highly specialized command language or format. More fundamentally, the front-end must allow users to speak directly in terms of what the user ultimately wants—e.g., "I'd like to see a Western film directed by Clint Eastwood"—as opposed to speaking in terms of arbitrary
navigation structures (e.g., hierarchical layers of menus, commands, etc.) that are essentially artifacts reflecting constraints of the pre-existing text/click navigation system. At the same time, the front-end must recognize and accommodate the reality that a stream of naive spoken natural language input will, over time, typically present a variety of errors and/or ambiguities: e.g., garbled/unrecognized words (did the user say "Eastwood" or "Easter"?) and under-constrained requests ("Show me the Clint Eastwood movie"). An approach is needed for handling and resolving such errors and ambiguities in a rapid, user-friendly, non-frustrating manner.

What is needed is a methodology and apparatus for rapidly constructing a voice-driven front-end atop an existing, non-voice data navigation system, whereby users can interact by means of intuitive natural language input not strictly conforming to the step-by-step browsing architecture of the existing navigation system, and wherein any errors or ambiguities in user input are rapidly and conveniently resolved. The solution to this need should be compatible with the constraints of a multi-user, distributed environment such as the Internet/Web or a proprietary high-bandwidth content delivery network; a solution contemplating one-at-a-time user interactions at a single location is insufficient, for example.

SUMMARY OF THE INVENTION

The present invention addresses the above needs by providing a system, method, and article of manufacture for navigating network-based electronic data sources in response to spoken input requests. When a spoken input request is received from a user, it is interpreted, such as by using a speech recognition engine to extract speech data from acoustic voice signals, and using a language parser to linguistically parse the speech data. The interpretation of the spoken request can be performed on a computing device locally with the user or remotely from the user. The resulting interpretation of the request is thereupon used to automatically construct an operational navigation query to retrieve the desired information from one or more electronic network data sources, which is then transmitted to a client device of the user. If the network data source is a database, the navigation query is constructed in the format of a database query language.
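
By way of illustration, the following is a minimal Python sketch of this query-construction step. The frame keys, the movie-table schema, and all function names are assumptions invented for the example; the patent leaves the particular query language and schema open.

    # Hypothetical sketch: deriving a SQL navigation query from an interpreted
    # request frame. Schema and slot names are illustrative assumptions.
    def build_navigation_query(frame: dict) -> tuple:
        """Build a parameterized SQL query from the interpreted request."""
        clauses, params = [], []
        for slot in ("genre", "director"):      # assumed template slots
            if slot in frame:
                clauses.append(f"{slot} = ?")
                params.append(frame[slot])
        where = " AND ".join(clauses) or "1=1"  # no constraints: match all
        return f"SELECT title FROM movies WHERE {where}", params

    # A parse of "I'd like to see a Western film directed by Clint Eastwood":
    sql, args = build_navigation_query(
        {"genre": "western", "director": "clint eastwood"})
    print(sql, args)
    # SELECT title FROM movies WHERE genre = ? AND director = ? ['western', 'clint eastwood']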
Typically, errors or ambiguities emerge in the interpretation of the spoken request, such that the system cannot instantiate a complete, valid navigational template. This is to be expected occasionally, and one preferred aspect of the invention is the ability to handle such errors and ambiguities in a relatively graceful and user-friendly manner. Instead of simply rejecting such input and defaulting to traditional input modes or simply asking the user to try again, a preferred embodiment of the present invention seeks to converge rapidly toward instantiation of a valid navigational template by soliciting additional clarification from the user as necessary, either before or after a navigation of the data source, via multimodal input, i.e., by means of menu selection or other input modalities including and in addition to spoken input. This clarifying, multi-modal dialogue takes advantage of whatever partial navigational information has been gleaned from the initial interpretation of the user's spoken request. This clarification process continues until the system converges toward an adequately instantiated navigational template, which is in turn used to navigate the network-based data and retrieve the user's desired information. The retrieved information is transmitted across the network and presented to the user on a suitable client display device.
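A minimal Python sketch of such a convergence loop appears below (compare FIG. 4). The two-slot template and the solicit callback are hypothetical stand-ins; in the invention the solicitation could be a menu selection, a button press, or a further utterance, occurring before or after a navigation attempt.

    # Hypothetical sketch of the clarification loop: keep asking for missing
    # slots, via whatever modality `solicit` represents, until the
    # navigational template is fully instantiated.
    REQUIRED_SLOTS = ("genre", "director")  # assumed template for the example

    def converge(frame: dict, solicit) -> dict:
        while True:
            missing = [s for s in REQUIRED_SLOTS if s not in frame]
            if not missing:
                return frame                 # template instantiated; navigate
            frame[missing[0]] = solicit(missing[0])  # e.g., on-screen menu pick

    # Canned responder standing in for real multimodal user input:
    answers = {"director": "clint eastwood"}
    print(converge({"genre": "western"}, solicit=answers.get))
    # {'genre': 'western', 'director': 'clint eastwood'}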
In a further aspect of the present invention, the construction of the navigation query includes extracting an input template for an online scripted interface to the data source and using the input template to construct the navigation query. The extraction of the input template can include dynamically scraping the online scripted interface.
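For illustration, a sketch of such scraping using Python's standard-library HTML parser appears below; the form markup and field names are invented, and the patent does not prescribe a particular parsing mechanism.

    # Hypothetical sketch of FIG. 5, step 520: scrape a CGI form to recover an
    # input template (action URL plus named fields) that step 522 can then
    # instantiate from the interpretation of the spoken request.
    from html.parser import HTMLParser

    class FormTemplateScraper(HTMLParser):
        def __init__(self):
            super().__init__()
            self.action, self.fields = None, []

        def handle_starttag(self, tag, attrs):
            a = dict(attrs)
            if tag == "form":
                self.action = a.get("action")
            elif tag in ("input", "select") and a.get("name"):
                self.fields.append(a["name"])

    scraper = FormTemplateScraper()
    scraper.feed('<form action="/cgi-bin/moviesearch">'
                 '<input name="title"><select name="genre"></select></form>')
    print({"action": scraper.action, "fields": scraper.fields})
    # {'action': '/cgi-bin/moviesearch', 'fields': ['title', 'genre']}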
BRIEF DESCRIPTION OF THE DRAWINGS

The invention, together with further advantages thereof, may best be understood by reference to the following description taken in conjunction with the accompanying drawings in which:

FIG. 1a illustrates a system providing a spoken natural language interface for network-based information navigation, in accordance with an embodiment of the present invention with server-side processing of requests;

FIG. 1b illustrates another system providing a spoken natural language interface for network-based information navigation, in accordance with an embodiment of the present invention with client-side processing of requests;

FIG. 2 illustrates a system providing a spoken natural language interface for network-based information navigation, in accordance with an embodiment of the present invention for a mobile computing scenario;

FIG. 3 illustrates the functional logic components of a request processing module in accordance with an embodiment of the present invention;

FIG. 4 illustrates a process utilizing spoken natural language for navigating an electronic database in accordance with one embodiment of the present invention;

FIG. 5 illustrates a process for constructing a navigational query for accessing an online data source via an interactive, scripted (e.g., CGI) form; and

FIG. 6 illustrates an embodiment of the present invention utilizing a community of distributed, collaborating electronic agents.

DETAILED DESCRIPTION OF THE INVENTION

1. System Architecture

a. Server-End Processing of Spoken Input

FIG. 1a is an illustration of a data navigation system driven by spoken natural language input, in accordance with one embodiment of the present invention. As shown, a user's voice input data is captured by a voice input device 102, such as a microphone. Preferably voice input device 102 includes a button or the like that can be pressed or held down to activate a listening mode, so that the system need not continually pay attention to, or be confused by, irrelevant background noise. In one preferred embodiment well-suited for the home entertainment setting, voice input device 102 is a portable remote control device with an integrated microphone, and the voice data is transmitted from device 102 preferably via infrared (or other wireless) link to communications box 104 (e.g., a set-top box or a similar communications device that is capable of retransmitting the raw voice data and/or processing the voice data) local to the user's environment and coupled to communications network 106. The voice data is then transmitted across network 106 to a remote server or servers 108. The voice data may preferably be transmitted in compressed digitized form, or alternatively—particularly where bandwidth constraints are significant—in analog format (e.g., via frequency modulated transmission), in the latter case being digitized upon arrival at remote server 108.

At remote server 108, the voice data is processed by request processing logic 300 in order to understand the user's request and construct an appropriate query or request for navigation of remote data source 110, in accordance with the interpretation process exemplified in FIG. 4 and FIG. 5 and discussed in greater detail below. For purposes of executing this process, request processing logic 300 comprises functional modules including speech recognition engine 310, natural language (NL) parser 320, query construction logic 330, and query refinement logic 340, as shown in FIG. 3. Data source 110 may comprise database(s), Internet/web site(s), or other electronic information repositories, and preferably resides on a central server or servers—which may or may not be the same as server 108, depending on the storage and bandwidth needs of the application and the resources available to the practitioner. Data source 110 may include multimedia content, such as movies or other digital video and audio content, other various forms of entertainment data, or other electronic information. The contents of data source 110 are navigated—i.e., the contents are accessed and searched, for retrieval of the particular information desired by the user—using the processes of FIGS. 4 and 5 as described in greater detail below.

Once the desired information has been retrieved from data source 110, it is electronically transmitted via network 106 to the user for viewing on client display device 112. In a preferred embodiment well-suited for the home entertainment setting, display device 112 is a television monitor or similar audiovisual entertainment device, typically in stationary position for comfortable viewing by users. In addition, in such preferred embodiment, display device 112 is coupled to or integrated with a communications box (which is preferably the same as communications box 104, but may also be a separate unit) for receiving and decoding/formatting the desired electronic information that is received across communications network 106.

Network 106 is a two-way electronic communications network and may be embodied in electronic communication infrastructure including coaxial (cable television) lines, DSL, fiber-optic cable, traditional copper wire (twisted pair), or any other type of hardwired connection. Network 106 may also include a wireless connection such as a satellite-based connection, cellular connection, or other type of wireless connection. Network 106 may be part of the Internet and may support TCP/IP communications, or may be embodied in a proprietary network, or in any other electronic communications network infrastructure, whether packet-switched or connection-oriented. A design consideration is that network 106 preferably provide suitable bandwidth depending upon the nature of the content anticipated for the desired application.

b. Client-End Processing of Spoken Input

FIG. 1b is an illustration of a data navigation system driven by spoken natural language input, in accordance with a second embodiment of the present invention. Again, a user's voice input data is captured by a voice input device 102, such as a microphone. In the embodiment shown in FIG. 1b, the voice data is transmitted from device 202 to request processing logic 300, hosted on a local speech processor, for processing and interpretation. In the preferred embodiment illustrated in FIG. 1b, the local speech processor is conveniently integrated as part of communications box 104, although implementation in a physically separate (but communicatively coupled) unit is also possible as will be readily apparent to those of skill in the art. The voice data is processed by the components of request processing logic
300 in order to understand the user's request and construct an appropriate query or request for navigation of remote data source 110, in accordance with the interpretation process exemplified in FIGS. 4 and 5 as discussed in greater detail below.

The resulting navigational query is then transmitted electronically across network 106 to data source 110, which preferably resides on a central server or servers 108. As in FIG. 1a, data source 110 may comprise database(s), Internet/web site(s), or other electronic information repositories, and preferably may include multimedia content, such as movies or other digital video and audio content, other various forms of entertainment data, or other electronic information. The contents of data source 110 are then navigated—i.e., the contents are accessed and searched, for retrieval of the particular information desired by the user—preferably using the process of FIGS. 4 and 5 as described in greater detail below. Once the desired information has been retrieved from data source 110, it is electronically transmitted via network 106 to the user for viewing on client display device 112.

In one embodiment in accordance with FIG. 1b and well-suited for the home entertainment setting, voice input device 102 is a portable remote control device with an integrated microphone, and the voice data is transmitted from device 102 preferably via infrared (or other wireless) link to the local speech processor. The local speech processor is coupled to communications network 106, and also preferably to client display device 112 (especially for purposes of query refinement transmissions, as discussed below in connection with FIG. 4, step 412), and preferably may be integrated within or coupled to communications box 104. In addition, especially for purposes of a home entertainment application, display device 112 is preferably a television monitor or similar audiovisual entertainment device, typically in stationary position for comfortable viewing by users. In addition, in such preferred embodiment, display device 112 is coupled to a communications box (which is preferably the same as communications box 104, but may also be a physically separate unit) for receiving and decoding/formatting the desired electronic information that is received across communications network 106.

Design considerations favoring server-side processing and interpretation of spoken input requests, as exemplified in FIG. 1a, include minimizing the need to distribute costly computational hardware and software to all client users in order to perform speech and language processing. Design considerations favoring client-side processing, as exemplified in FIG. 1b, include minimizing the quantity of data sent upstream across the network from each client, as the speech recognition is performed before transmission across the network and only the query data and/or request needs to be sent, thus reducing the upstream bandwidth requirements.

c. Mobile Client Embodiment

A mobile computing embodiment of the present invention may be implemented by practitioners as a variation on the embodiments of either FIG. 1a or FIG. 1b. For example, as depicted in FIG. 2, a mobile variation in accordance with the server-side processing architecture illustrated in FIG. 1a may be implemented by replacing voice input device 102, communications box 104, and client display device 112 with an integrated, mobile information appliance 202 such as a cellular telephone or wireless personal digital assistant (wireless PDA). Mobile information appliance 202 essentially performs the functions of the replaced components. Thus, mobile information appliance 202 receives spoken natural language input requests from the user in the form of voice data, and transmits that data (preferably via wireless
data receiving station 204) across communications network 206 for server-side interpretation of the request, in similar fashion as described above in connection with FIG. 1. Navigation of data source 210 and retrieval of desired information likewise proceeds in an analogous manner as described above. Display information transmitted electronically back to the user across network 206 is displayed for the user on the display of information appliance 202, and audio information is output through the appliance's speakers.

Practitioners will further appreciate, in light of the above teachings, that if mobile information appliance 202 is equipped with sufficient computational processing power, then a mobile variation of the client-side architecture exemplified in FIG. 1b may similarly be implemented. In that case, the modules corresponding to request processing logic 300 would be embodied locally in the computational resources of mobile information appliance 202, and the logical flow of data would otherwise follow in a manner analogous to that previously described in connection with FIG. 1b.

As illustrated in FIG. 2, multiple users, each having their own client input device, may issue requests, simultaneously or otherwise, for navigation of data source 210. This is equally true (though not explicitly drawn) for the embodiments depicted in FIGS. 1a and 1b. Data source 210 (or 110), being a network accessible information resource, has typically already been constructed to support access requests from simultaneous multiple network users, as known by practitioners of ordinary skill in the art. In the case of server-side speech processing, as exemplified in FIGS. 1a and 2, the interpretation logic and error correction logic modules are also preferably designed and implemented to support queuing and multi-tasking of requests from multiple simultaneous network users, as will be appreciated by those of skill in the art.

It will be apparent to those skilled in the art that additional implementations, permutations and combinations of the embodiments set forth in FIGS. 1a, 1b, and 2 may be created without straying from the scope and spirit of the present invention. For example, practitioners will understand, in light of the above teachings and design considerations, that it is possible to divide and allocate the functional components of request processing logic 300 between client and server. For example, speech recognition—in entirety, or perhaps just early stages such as feature extraction—might be performed locally on the client end, perhaps to reduce bandwidth requirements, while natural language parsing and other necessary processing might be performed upstream on the server end, so that more extensive computational power need not be distributed locally to each client. In that case, corresponding portions of request processing logic 300, such as speech recognition engine 310 or portions thereof, would reside locally at the client as in FIG. 1b, while other component modules would be hosted at the server end as in FIGS. 1a and 2.

Further, practitioners may choose to implement each of the various embodiments described above on any number of different hardware and software computing platforms and environments and various combinations thereof, including, by way of just a few examples: a general-purpose hardware microprocessor such as the Intel Pentium series; operating system software such as Microsoft Windows/CE, Palm OS, or Apple Mac OS (particularly for client devices and client-side processing), or Unix, Linux, or Windows/NT (the latter three particularly for network data servers and server-side processing); and/or proprietary information access platforms such as Microsoft's WebTV or the Diva Systems video-on-demand system.

2. Processing Methodology

The present invention provides a spoken natural language interface for interrogation of remote electronic databases and retrieval of desired information. A preferred embodiment of the present invention utilizes the basic methodology outlined in the flow diagram of FIG. 4 in order to provide this interface. This methodology will now be discussed.

a. Interpreting Spoken Natural Language Requests

At step 402, the user's spoken request for information is initially received in the form of raw (acoustic) voice data by a suitable input device, as previously discussed in connection with FIGS. 1-2. At step 404 the voice data received from the user is interpreted in order to understand the user's request for information. Preferably this step includes performing speech recognition in order to extract words from the voice data, and further includes natural language parsing of those words in order to generate a structured linguistic representation of the user's request.
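
As a deliberately toy illustration of step 404, the Python sketch below stubs out the recognition engine and uses a regular expression in place of a real parser; the stub output and frame slots are invented for the example. The actual embodiment uses speech recognition engine 310 and natural language parser 320, discussed next.

    # Hypothetical sketch of step 404: recognize speech, then parse the text
    # into a structured frame. The recognizer is a stub so the example runs.
    import re

    def recognize(voice_data: bytes) -> str:
        return "show me a western directed by clint eastwood"  # stub output

    def parse(text: str) -> dict:
        frame = {}
        if "western" in text:
            frame["genre"] = "western"
        m = re.search(r"directed by ([a-z ]+)", text)
        if m:
            frame["director"] = m.group(1).strip()
        return frame

    print(parse(recognize(b"...")))
    # {'genre': 'western', 'director': 'clint eastwood'}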
Speech recognition in step 404 is performed using speech recognition engine 310. A variety of commercial quality speech recognition engines are readily available on the market, as practitioners will know. For example, Nuance Communications offers a suite of speech recognition engines, including Nuance 6, its current flagship product, and Nuance Express, a lower cost package for entry-level applications. As one other example, IBM offers the ViaVoice speech recognition engine, including a low-cost shrink-wrapped version available through popular consumer distribution channels. Basically, a speech recognition engine processes acoustic voice data and attempts to generate a text stream of recognized words.

Typically, the speech recognition engine is provided with a vocabulary lexicon of likely words or phrases that the recognition engine can match against its analysis of acoustical signals, for purposes of a given application. Preferably, the lexicon is dynamically adjusted to reflect the current user context, as established by the preceding user inputs. For example, if a user is engaged in a dialogue with the system about movie selection, the recognition engine's vocabulary may preferably be adjusted to favor relevant words and phrases, such as a stored list of proper names for popular movie actors and directors, etc. Whereas if t
