(12) United States Patent
Halverson et al.

(10) Patent No.: US 6,757,718 B1
(45) Date of Patent: Jun. 29, 2004

(54) MOBILE NAVIGATION OF NETWORK-BASED ELECTRONIC INFORMATION USING SPOKEN INPUT

(75) Inventors: Christine Halverson, San Jose, CA (US); Luc Julia, Menlo Park, CA (US); Dimitris Voutsas, Thessaloniki (GR); Adam Cheyer, Palo Alto, CA (US)

(73) Assignee: SRI International, Menlo Park, CA (US)
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by … days.
FOREIGN PATENT DOCUMENTS

EP    0 867 861    9/1998    ............. G10L/5/06
WO    99/50826    10/1999    ............. G10L/3/00
WO    00/05638     2/2000
(21) Appl. No.: 09/608,872

(22) Filed: Jun. 30, 2000
Related U.S. Application Data

(63) Continuation of application No. 09/524,095, filed on Mar. 13, 2000, which is a continuation-in-part of application No. 09/225,198, filed on Jan. 5, 1999.

(60) Provisional application No. 60/124,720, filed on Mar. 17, 1999, provisional application No. 60/124,719, filed on Mar. 17, 1999, and provisional application No. 60/124,718, filed on Mar. 17, 1999.
(51) Int. Cl.7 ................................ G06F 15/16
(52) U.S. Cl. ....................... 709/218; 709/202; 709/217; 709/219; 709/227; 704/257
(58) Field of Search ................. 709/202, 218, 217, 219, 227; 707/5, 3, 4; 704/257, 270.1, 275, 246
(56) References Cited

U.S. PATENT DOCUMENTS

5,197,005 A     3/1993   Shwartz et al. ......... 364/419
5,386,556 A     1/1995   Hedin et al. ........... 395/600
5,434,777 A     7/1995   Luciw .................. 364/419.13
5,519,608 A     5/1996   Kupiec ................. 364/419.08
5,608,624 A     3/1997   Luciw .................. 395/794
5,721,938 A     2/1998   Stuckey ................ 395/754
5,729,659 A     3/1998   Potter ................. 395/2.79
5,748,974 A     5/1998   Johnson ................ 395/759
5,774,859 A     6/1998   Houser et al. .......... 704/275
5,794,050 A     8/1998   Dahlgren et al. ........ 395/708
5,802,526 A     9/1998   Fawcett et al. ......... 707/104
5,805,775 A     9/1998   Eberman et al. ......... 395/12
5,855,002 A    12/1998   Armstrong .............. 704/270
5,890,123 A     3/1999   Brown et al. ........... 704/275
5,963,940 A    10/1999   Liddy et al. ........... 707/5
6,003,072 A    12/1999   Gerritsen et al. ....... 709/218
6,016,476 A     1/2000   Maes et al. ............ 705/1
6,026,388 A     2/2000   Liddy et al. ........... 707/1
6,102,030 A     8/2000   Brown et al. ........... 704/275
6,173,279 B1    1/2001   Levin et al. ........... 707/5
6,192,338 B1    2/2001   Haszto et al. .......... 704/257
6,314,365 B1   11/2001   Smith .................. 340/988
6,317,684 B1   11/2001   Roeseler et al. ........ 340/990
6,349,257 B1    2/2002   Liu et al. ............. 340/56
6,353,661 B1 *  3/2002   Bailey, III ............ 379/88.17
OTHER PUBLICATIONS

International Search Report, Int'l Appl. No. PCT/US01/07987.

Stent, Amanda et al., "The CommandTalk Spoken Dialogue System", SRI International.

Moore, Robert et al., "CommandTalk: A Spoken-Language Interface for Battlefield Simulations", Oct. 23, 1997, SRI International.

Dowding, John et al., "Interpreting Language in Context in CommandTalk", Feb. 5, 1999, SRI International.

http://www.ai.sri.com/~oaa/infowiz.html, InfoWiz: An Animated Voice Interactive Information System, May 8, 2000.

Dowding, John, "Interleaving Syntax and Semantics in an Efficient Bottom-up Parser", SRI International.

Moore, Robert et al., "Combining Linguistic and Statistical Knowledge Sources in Natural-Language Processing for ATIS", SRI International.

Dowding, John et al., "Gemini: A Natural Language System for Spoken-Language Understanding", SRI International.

* cited by examiner
Primary Examiner—Frantz B. Jean
(74) Attorney, Agent, or Firm—Moser, Patterson & Sheridan, LLP; Kin-Wah Tong

(57) ABSTRACT

A system, method, and article of manufacture are provided for navigating an electronic data source by means of spoken language where a portion of the data link between a mobile information appliance of the user and the data source utilizes wireless communication. When a spoken input request is received from a user who is using the mobile information appliance, it is interpreted. The resulting interpretation of the request is thereupon used to automatically construct an operational navigation query to retrieve the desired information from one or more electronic network data sources, which is transmitted to the mobile information appliance.

27 Claims, 7 Drawing Sheets
[Front-page representative drawing not reproduced; visible labels include 102, 300 (see FIG. 3), and 110.]
[Drawing sheets 1-7 not reproduced; recoverable labels follow.]
[Sheet 1, FIG. 1a: overall system 100, with request processing logic 300 (see FIG. 3) and data source 110.]
[Sheet 2, FIG. 1b: elements 102 and 104, network 106, request processing logic 300 (see FIG. 3).]
[Sheet 3, FIG. 2: network 208, element 210, request processing logic 300 (see FIG. 3).]
[Sheet 4, FIG. 3: REQUEST PROCESSING LOGIC 300, comprising SPEECH RECOGNITION ENGINE, NATURAL LANGUAGE PARSER, QUERY REFINEMENT LOGIC, and QUERY CONSTRUCTION LOGIC.]
[Sheet 5, FIG. 4: flowchart with steps 402 RECEIVE SPOKEN NL REQUEST; 404 INTERPRET REQUEST; 405 IDENTIFY/SELECT DATA SOURCE; 406 CONSTRUCT NAVIGATION QUERY; 408 NAVIGATE DATA SOURCE; REFINE QUERY? decision; 412 SOLICIT ADDITIONAL (MULTIMODAL) USER INPUT; 410 TRANSMIT AND DISPLAY TO CLIENT.]
[Sheet 6, FIG. 5: from step 406, SCRAPE THE ONLINE SCRIPTED FORM TO EXTRACT AN INPUT TEMPLATE; INSTANTIATE THE INPUT TEMPLATE USING INTERPRETATION OF STEP 404; to step 407.]
[Sheet 7, FIG. 6: community of distributed, collaborating electronic agents; remaining text is illegible in the source.]
MOBILE NAVIGATION OF NETWORK-BASED ELECTRONIC INFORMATION USING SPOKEN INPUT

This application is a continuation of an application entitled NAVIGATING NETWORK-BASED ELECTRONIC INFORMATION USING SPOKEN NATURAL LANGUAGE INPUT WITH MULTIMODAL ERROR FEEDBACK, which was filed on Mar. 13, 2000 under Ser. No. 09/524,095 and which is a continuation-in-part of co-pending U.S. patent application Ser. No. 09/225,198, filed Jan. 5, 1999, provisional U.S. patent application Ser. No. 60/124,718, filed Mar. 17, 1999, provisional U.S. patent application Ser. No. 60/124,720, filed Mar. 17, 1999, and provisional U.S. patent application Ser. No. 60/124,719, filed Mar. 17, 1999, from which applications priority is claimed and which applications are incorporated herein by reference.
BACKGROUND OF THE INVENTION

The present invention relates generally to the navigation of electronic data by means of spoken natural language requests, and to feedback mechanisms and methods for resolving the errors and ambiguities that may be associated with such requests.

As global electronic connectivity continues to grow, and the universe of electronic data potentially available to users continues to expand, there is a growing need for information navigation technology that allows relatively naive users to navigate and access desired data by means of natural language input. In many of the most important markets—including the home entertainment arena, as well as mobile computing—spoken natural language input is highly desirable, if not ideal. As just one example, the proliferation of high-bandwidth communications infrastructure for the home entertainment market (cable, satellite, broadband) enables delivery of movies-on-demand and other interactive multimedia content to the consumer's home television set. For users to take full advantage of this content stream ultimately requires interactive navigation of content databases in a manner that is too complex for user-friendly selection by means of a traditional remote-control clicker. Allowing spoken natural language requests as the input modality for rapidly searching and accessing desired content is an important objective for a successful consumer entertainment product in a context offering a dizzying range of database content choices. As further examples, this same need to drive navigation of (and transaction with) relatively complex data warehouses using spoken natural language requests applies equally to surfing the Internet/Web or other networks for general information, multimedia content, or e-commerce transactions.
In general, the existing navigational systems for browsing electronic databases and data warehouses (search engines, menus, etc.) have been designed without navigation via spoken natural language as a specific goal. So today's world is full of existing electronic data navigation systems that do not assume browsing via natural spoken commands, but rather assume text and mouse-click inputs (or, in the case of TV remote controls, even less). Simply recognizing voice commands within an extremely limited vocabulary and grammar—the spoken equivalent of button/click input (e.g., speaking "channel 5" selects TV channel 5)—is really not sufficient by itself to satisfy the objectives described above. In order to deliver a true "win" for users, the voice-driven front-end must accept spoken natural language input in a manner that is intuitive to users. For example, the front-end should not require learning a highly specialized command language or format. More fundamentally, the front-end must allow users to speak directly in terms of what the user ultimately wants—e.g., "I'd like to see a Western film directed by Clint Eastwood"—as opposed to speaking in terms of arbitrary navigation structures (e.g., hierarchical layers of menus, commands, etc.) that are essentially artifacts reflecting constraints of the pre-existing text/click navigation system. At the same time, the front-end must recognize and accommodate the reality that a stream of naive spoken natural language input will, over time, typically present a variety of errors and/or ambiguities: e.g., garbled/unrecognized words (did the user say "Eastwood" or "Easter"?) and under-constrained requests ("Show me the Clint Eastwood movie"). An approach is needed for handling and resolving such errors and ambiguities in a rapid, user-friendly, non-frustrating manner.
What is needed is a methodology and apparatus for rapidly constructing a voice-driven front-end atop an existing, non-voice data navigation system, whereby users can interact by means of intuitive natural language input not strictly conforming to the step-by-step browsing architecture of the existing navigation system, and wherein any errors or ambiguities in user input are rapidly and conveniently resolved. The solution to this need should be compatible with the constraints of a multi-user, distributed environment such as the Internet/Web or a proprietary high-bandwidth content delivery network; a solution contemplating one-at-a-time user interactions at a single location is insufficient, for example.
SUMMARY OF THE INVENTION

The present invention addresses the above needs by providing a system, method, and article of manufacture for mobile navigation of network-based electronic data sources in response to spoken input requests. When a spoken input request is received from a user using a mobile information appliance that communicates with a network server via an at least partially wireless communications system, it is interpreted, such as by using a speech recognition engine to extract speech data from acoustic voice signals, and using a language parser to linguistically parse the speech data. The interpretation of the spoken request can be performed on a computing device locally with the user, such as the mobile information appliance, or remotely from the user. The resulting interpretation of the request is thereupon used to automatically construct an operational navigation query to retrieve the desired information from one or more electronic network data sources, which is then transmitted to a client device of the user. If the network data source is a database, the navigation query is constructed in the format of a database query language.
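For concreteness, the following is a minimal sketch of this last step, assuming the interpreter has already reduced the spoken request to attribute/value slots. The slot names, table, and columns are hypothetical illustrations, and SQL stands in as just one example of a database query language; the patent does not prescribe this code.

```python
# Minimal sketch: turning an interpreted spoken request into a database
# navigation query. Slot names, table, and columns are hypothetical.

def build_navigation_query(slots: dict) -> tuple:
    """Build a parameterized SQL query from parsed request slots,
    e.g. {'genre': 'Western', 'director': 'Clint Eastwood'}."""
    conditions, params = [], []
    for column, value in slots.items():
        conditions.append(f"{column} = ?")     # one constraint per slot
        params.append(value)
    where = " AND ".join(conditions) or "1=1"  # no slots: match all
    return f"SELECT title FROM movies WHERE {where}", params

sql, params = build_navigation_query(
    {"genre": "Western", "director": "Clint Eastwood"})
# sql -> "SELECT title FROM movies WHERE genre = ? AND director = ?"
```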
Typically, errors or ambiguities emerge in the interpretation of the spoken request, such that the system cannot instantiate a complete, valid navigational template. This is to be expected occasionally, and one preferred aspect of the invention is the ability to handle such errors and ambiguities in a relatively graceful and user-friendly manner. Instead of simply rejecting such input and defaulting to traditional input modes or simply asking the user to try again, a preferred embodiment of the present invention seeks to converge rapidly toward instantiation of a valid navigational template by soliciting additional clarification from the user as necessary, either before or after a navigation of the data source, via multimodal input, i.e., by means of menu selection or other input modalities including and in addition to spoken input. This clarifying, multi-modal dialogue takes advantage of whatever partial navigational information has been gleaned from the initial interpretation of the user's spoken request. This clarification process continues until the system converges toward an adequately instantiated navigational template, which is in turn used to navigate the network-based data and retrieve the user's desired information. The retrieved information is transmitted across the network and presented to the user on a suitable client display device.
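As an illustration of this convergence loop, here is a minimal sketch. The template slot names and the ask_user callback are invented for illustration; a real system would drive menus, re-prompts, or other multimodal input at that point, and might treat some slots as optional rather than required as this sketch does.

```python
# Hedged sketch of the clarification loop described above. TEMPLATE_SLOTS
# and ask_user are hypothetical stand-ins, not names from the patent.

TEMPLATE_SLOTS = ("title", "genre", "director")

def converge(interpret, ask_user, spoken_request):
    """Iteratively fill a navigational template, soliciting the user
    for any slot the initial interpretation left empty or ambiguous."""
    slots = interpret(spoken_request)       # partial interpretation
    while missing := [s for s in TEMPLATE_SLOTS if not slots.get(s)]:
        slots.update(ask_user(missing))     # multimodal clarification
    return slots                            # instantiated template
```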
In a further aspect of the present invention, the construction of the navigation query includes extracting an input template for an online scripted interface to the data source and using the input template to construct the navigation query. The extraction of the input template can include dynamically scraping the online scripted interface.
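One way to picture this, as a rough sketch only: parse the HTML of a scripted (e.g., CGI) search form, collect its named fields as an empty input template, and then fill the fields from the interpreted request. The form markup and field names below are invented for illustration and are not from the patent.

```python
from html.parser import HTMLParser

class FormTemplateScraper(HTMLParser):
    """Collect the named input fields of an online scripted form,
    yielding an empty input template (field name -> default value)."""
    def __init__(self):
        super().__init__()
        self.template = {}
    def handle_starttag(self, tag, attrs):
        if tag in ("input", "select"):
            attrs = dict(attrs)
            if "name" in attrs:
                self.template[attrs["name"]] = attrs.get("value", "")

scraper = FormTemplateScraper()
scraper.feed('<form action="/search">'
             '<input name="title"><input name="year"></form>')
template = scraper.template               # {'title': '', 'year': ''}
template["title"] = "Unforgiven"          # instantiate from step 404
```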
BRIEF DESCRIPTION OF THE DRAWINGS

The invention, together with further advantages thereof, may best be understood by reference to the following description taken in conjunction with the accompanying drawings in which:

FIG. 1a illustrates a system providing a spoken natural language interface for network-based information navigation, in accordance with an embodiment of the present invention with server-side processing of requests;

FIG. 1b illustrates another system providing a spoken natural language interface for network-based information navigation, in accordance with an embodiment of the present invention with client-side processing of requests;

FIG. 2 illustrates a system providing a spoken natural language interface for network-based information navigation, in accordance with an embodiment of the present invention for a mobile computing scenario;

FIG. 3 illustrates the functional logic components of a request processing module in accordance with an embodiment of the present invention;

FIG. 4 illustrates a process utilizing spoken natural language for navigating an electronic database in accordance with one embodiment of the present invention;

FIG. 5 illustrates a process for constructing a navigational query for accessing an online data source via an interactive, scripted (e.g., CGI) form; and

FIG. 6 illustrates an embodiment of the present invention utilizing a community of distributed, collaborating electronic agents.
DETAILED DESCRIPTION OF THE INVENTION

1. System Architecture

a. Server-End Processing of Spoken Input

FIG. 1a is an illustration of a data navigation system driven by spoken natural language input, in accordance with one embodiment of the present invention. As shown, a user's voice input data is captured by a voice input device 102, such as a microphone. Preferably voice input device 102 includes a button or the like that can be pressed or held down to activate a listening mode, so that the system need not continually pay attention to, or be confused by, irrelevant background noise. In one preferred embodiment well-suited for the home entertainment setting, voice input device 102 is a portable remote control device with an integrated microphone, and the voice data is transmitted from device 102 preferably via infrared (or other wireless) link to communications box 104 (e.g., a set-top box or a similar communications device that is capable of retransmitting the raw voice data and/or processing the voice data) local to the user's environment and coupled to communications network 106. The voice data is then transmitted across network 106 to a remote server or servers 108. The voice data may preferably be transmitted in compressed digitized form, or alternatively—particularly where bandwidth constraints are significant—in analog format (e.g., via frequency modulated transmission), in the latter case being digitized upon arrival at remote server 108.
At remote server 108, the voice data is processed by request processing logic 300 in order to understand the user's request and construct an appropriate query or request for navigation of remote data source 110, in accordance with the interpretation process exemplified in FIG. 4 and FIG. 5 and discussed in greater detail below. For purposes of executing this process, request processing logic 300 comprises functional modules including speech recognition engine 310, natural language (NL) parser 320, query construction logic 330, and query refinement logic 340, as shown in FIG. 3. Data source 110 may comprise database(s), Internet/web site(s), or other electronic information repositories, and preferably resides on a central server or servers—which may or may not be the same as server 108, depending on the storage and bandwidth needs of the application and the resources available to the practitioner. Data source 110 may include multimedia content, such as movies or other digital video and audio content, other various forms of entertainment data, or other electronic information. The contents of data source 110 are navigated—i.e., the contents are accessed and searched, for retrieval of the particular information desired by the user—using the processes of FIGS. 4 and 5 as described in greater detail below.

Once the desired information has been retrieved from data source 110, it is electronically transmitted via network 106 to the user for viewing on client display device 112. In a preferred embodiment well-suited for the home entertainment setting, display device 112 is a television monitor or similar audiovisual entertainment device, typically in stationary position for comfortable viewing by users. In addition, in such preferred embodiment, display device 112 is coupled to or integrated with a communications box (which is preferably the same as communications box 104, but may also be a separate unit) for receiving and decoding/formatting the desired electronic information that is received across communications network 106.
Network 106 is a two-way electronic communications network and may be embodied in electronic communication infrastructure including coaxial (cable television) lines, DSL, fiber-optic cable, traditional copper wire (twisted pair), or any other type of hardwired connection. Network 106 may also include a wireless connection such as a satellite-based connection, cellular connection, or other type of wireless connection. Network 106 may be part of the Internet and may support TCP/IP communications, or may be embodied in a proprietary network, or in any other electronic communications network infrastructure, whether packet-switched or connection-oriented. A design consideration is that network 106 preferably provide suitable bandwidth depending upon the nature of the content anticipated for the desired application.
b. Client-End Processing of Spoken Input

FIG. 1b is an illustration of a data navigation system driven by spoken natural language input, in accordance with a second embodiment of the present invention. Again, a user's voice input data is captured by a voice input device 102, such as a microphone. In the embodiment shown in FIG. 1b, the voice data is transmitted from device 102 to request processing logic 300, hosted on a local speech processor, for processing and interpretation. In the preferred embodiment illustrated in FIG. 1b, the local speech processor is conveniently integrated as part of communications box 104, although implementation in a physically separate (but communicatively coupled) unit is also possible, as will be readily apparent to those of skill in the art. The voice data is processed by the components of request processing logic 300 in order to understand the user's request and construct an appropriate query or request for navigation of remote data source 110, in accordance with the interpretation process exemplified in FIGS. 4 and 5 as discussed in greater detail below.

The resulting navigational query is then transmitted electronically across network 106 to data source 110, which preferably resides on a central server or servers 108. As in FIG. 1a, data source 110 may comprise database(s), Internet/web site(s), or other electronic information repositories, and preferably may include multimedia content, such as movies or other digital video and audio content, other various forms of entertainment data, or other electronic information. The contents of data source 110 are then navigated—i.e., the contents are accessed and searched, for retrieval of the particular information desired by the user—preferably using the process of FIGS. 4 and 5 as described in greater detail below. Once the desired information has been retrieved from data source 110, it is electronically transmitted via network 106 to the user for viewing on client display device 112.
In one embodiment in accordance with FIG. 1b and well-suited for the home entertainment setting, voice input device 102 is a portable remote control device with an integrated microphone, and the voice data is transmitted from device 102 preferably via infrared (or other wireless) link to the local speech processor. The local speech processor is coupled to communications network 106, and also preferably to client display device 112 (especially for purposes of query refinement transmissions, as discussed below in connection with FIG. 4, step 412), and preferably may be integrated within or coupled to communications box 104. In addition, especially for purposes of a home entertainment application, display device 112 is preferably a television monitor or similar audiovisual entertainment device, typically in stationary position for comfortable viewing by users. In addition, in such preferred embodiment, display device 112 is coupled to a communications box (which is preferably the same as communications box 104, but may also be a physically separate unit) for receiving and decoding/formatting the desired electronic information that is received across communications network 106.
Design considerations favoring server-side processing and interpretation of spoken input requests, as exemplified in FIG. 1a, include minimizing the need to distribute costly computational hardware and software to all client users in order to perform speech and language processing. Design considerations favoring client-side processing, as exemplified in FIG. 1b, include minimizing the quantity of data sent upstream across the network from each client, as the speech recognition is performed before transmission across the network and only the query data and/or request needs to be sent, thus reducing the upstream bandwidth requirements.
c. Mobile Client Embodiment

A mobile computing embodiment of the present invention may be implemented by practitioners as a variation on the embodiments of either FIG. 1a or FIG. 1b. For example, as depicted in FIG. 2, a mobile variation in accordance with the server-side processing architecture illustrated in FIG. 1a may be implemented by replacing voice input device 102, communications box 104, and client display device 112 with an integrated, mobile information appliance 202 such as a cellular telephone or wireless personal digital assistant (wireless PDA). Mobile information appliance 202 essentially performs the functions of the replaced components. Thus, mobile information appliance 202 receives spoken natural language input requests from the user in the form of voice data, and transmits that data (preferably via wireless data receiving station 204) across communications network 206 for server-side interpretation of the request, in similar fashion as described above in connection with FIG. 1. Navigation of data source 210 and retrieval of desired information likewise proceeds in an analogous manner as described above. Display information transmitted electronically back to the user across network 206 is displayed for the user on the display of information appliance 202, and audio information is output through the appliance's speakers.

Practitioners will further appreciate, in light of the above teachings, that if mobile information appliance 202 is equipped with sufficient computational processing power, then a mobile variation of the client-side architecture exemplified in FIG. 1b may similarly be implemented. In that case, the modules corresponding to request processing logic 300 would be embodied locally in the computational resources of mobile information appliance 202, and the logical flow of data would otherwise follow in a manner analogous to that previously described in connection with FIG. 1b.
As illustrated in FIG. 2, multiple users, each having their own client input device, may issue requests, simultaneously or otherwise, for navigation of data source 210. This is equally true (though not explicitly drawn) for the embodiments depicted in FIGS. 1a and 1b. Data source 210 (or 110), being a network-accessible information resource, has typically already been constructed to support access requests from simultaneous multiple network users, as known by practitioners of ordinary skill in the art. In the case of server-side speech processing, as exemplified in FIGS. 1a and 2, the interpretation logic and error correction logic modules are also preferably designed and implemented to support queuing and multi-tasking of requests from multiple simultaneous network users, as will be appreciated by those of skill in the art.
It will be apparent to those skilled in the art that additional implementations, permutations and combinations of the embodiments set forth in FIGS. 1a, 1b, and 2 may be created without straying from the scope and spirit of the present invention. For example, practitioners will understand, in light of the above teachings and design considerations, that it is possible to divide and allocate the functional components of request processing logic 300 between client and server. For example, speech recognition—in entirety, or perhaps just early stages such as feature extraction—might be performed locally on the client end, perhaps to reduce bandwidth requirements, while natural language parsing and other necessary processing might be performed upstream on the server end, so that more extensive computational power need not be distributed locally to each client. In that case, corresponding portions of request processing logic 300, such as speech recognition engine 310 or portions thereof, would reside locally at the client as in FIG. 1b, while other component modules would be hosted at the server end as in FIGS. 1a and 2.
Further, practitioners may choose to implement each of the various embodiments described above on any number of different hardware and software computing platforms and environments and various combinations thereof, including, by way of just a few examples: a general-purpose hardware microprocessor such as the Intel Pentium series; operating system software such as Microsoft Windows/CE, Palm OS, or Apple Mac OS (particularly for client devices and client-side processing), or Unix, Linux, or Windows/NT (the latter three particularly for network data servers and server-side processing); and/or proprietary information access platforms such as Microsoft's WebTV or the Diva Systems video-on-demand system.
2. Processing Methodology

The present invention provides a spoken natural language interface for interrogation of remote electronic databases and retrieval of desired information. A preferred embodiment of the present invention utilizes the basic methodology outlined in the flow diagram of FIG. 4 in order to provide this interface. This methodology will now be discussed.
a. Interpreting Spoken Natural Language Requests

At step 402, the user's spoken request for information is initially received in the form of raw (acoustic) voice data by a suitable input device, as previously discussed in connection with FIGS. 1-2. At step 404 the voice data received from the user is interpreted in order to understand the user's request for information. Preferably this step includes performing speech recognition in order to extract words from the voice data, and further includes natural language parsing of those words in order to generate a structured linguistic representation of the user's request.
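To make the shape of step 404 concrete, here is a minimal sketch in which the recognizer and parser are passed in as opaque callables standing in for speech recognition engine 310 and NL parser 320. The Interpretation structure is an invented stand-in for whatever representation a real parser emits, not a structure defined by the patent.

```python
from dataclasses import dataclass, field

@dataclass
class Interpretation:
    """Invented stand-in for a structured linguistic representation."""
    words: list                                 # recognized word sequence
    slots: dict = field(default_factory=dict)   # semantic slots

def interpret_request(voice_data: bytes, recognize, parse) -> Interpretation:
    """Step 404 sketch: speech recognition extracts words from raw
    voice data; natural language parsing structures the result."""
    text = recognize(voice_data)    # engine 310: acoustic data -> text
    return Interpretation(words=text.split(), slots=parse(text))
```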
Speech recognition in step 404 is performed using speech recognition engine 310. A variety of commercial-quality speech recognition engines are readily available on the market, as practitioners will know. For example, Nuance Communications offers a suite of speech recognition engines, including Nuance 6, its current flagship product, and Nuance Express, a lower-cost package for entry-level applications. As one other example, IBM offers the ViaVoice speech recognition engine, including a low-cost shrink-wrapped version available through popular consumer distribution channels. Basically, a speech recognition engine processes acoustic voice data and attempts to generate a text stream of recognized words.
Typically, the speech recognition engine is provided with a vocabulary lexicon of likely words or phrases that the recognition engine can match against its analysis of acoustical signals, for purposes of a given application. Preferably, the lexicon is dynamically adjusted to reflect the current user context, as established by the preceding user inputs. For example, if a user is engaged in a dialogue with the system about movie selection, the recognition engine's vocabulary may preferably be adjusted to favor relevant words and phrases, such as a stored list of proper names for popular movie actors and directors, etc. Whereas if the current dialogue involves selection and viewing of a sports event, the engine's vocabulary might preferably be adjusted to favor a stored list of proper names for professional sports teams, etc. In addition, a speech recognition engine is provided with language models that help the engine predict the most likely interpretation of a given segment of acoustical voice data, in the current context of phonemes or words in which the segment appears. In addition, speech recognition engines often echo to the user, in more or less real-time, a transcription of the engine's best guess at what the user has said, giving the user an opportunity to confirm or reject.
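As a rough sketch of the context-dependent lexicon adjustment just described: the context names, word lists, and boost weight below are all invented, and real engines expose this capability through vocabulary or grammar APIs of their own, which this does not attempt to reproduce.

```python
# Invented illustration of biasing recognition toward the active
# dialogue context, as established by preceding user inputs.
CONTEXT_LEXICONS = {
    "movies": {"eastwood", "western", "director", "film"},
    "sports": {"yankees", "lakers", "score", "game"},
}

def rescore(hypothesis: str, context: str, base_score: float) -> float:
    """Boost a recognition hypothesis whose words appear in the
    lexicon associated with the current dialogue context."""
    lexicon = CONTEXT_LEXICONS.get(context, set())
    boost = sum(1 for w in hypothesis.lower().split() if w in lexicon)
    return base_score + 0.1 * boost  # weight 0.1 is arbitrary

rescore("show me a clint eastwood western", "movies", 0.5)  # -> 0.7
```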
In a further aspect of step 404, natural language interpreter (or parser) 320 linguistically parses and interprets the textual output of the speech recognition engine. In a preferred embodiment of the present invention, the natural-language interpreter attempts to determine both the meaning of spoken words (semantic processing) as well as the grammar of the statement (syntactic processing), such as the Gemini Natural Language Understanding System developed by SRI International. The Gemini system is described in detail in publications entitled "Gemini: A Natural Language System for Spoken-Language Understanding" and
