environments and various combinations thereof, including, by way of just a few examples: a general-purpose hardware microprocessor such as the Intel Pentium series; operating system software such as Microsoft Windows/CE, Palm OS, or Apple Mac OS (particularly for client devices and client-side processing), or Unix, Linux, or Windows/NT (the latter three particularly for network data servers and server-side processing), and/or proprietary information access platforms such as Microsoft's WebTV or the Diva Systems video-on-demand system.

2. Processing Methodology

The present invention provides a spoken natural language interface for interrogation of remote electronic databases and retrieval of desired information. A preferred embodiment of the present invention utilizes the basic methodology outlined in the flow diagram of FIG. 4 in order to provide this interface. This methodology will now be discussed.

a. Interpreting Spoken Natural Language Requests

At step 402, the user's spoken request for information is initially received in the form of raw (acoustic) voice data by a suitable input device, as previously discussed in connection with FIGS. 1-2. At step 404 the voice data received from the user is interpreted in order to understand the user's request for information. Preferably this step includes performing speech recognition in order to extract words from the voice data, and further includes natural language parsing of those words in order to generate a structured linguistic representation of the user's request.

Speech recognition in step 404 is performed using speech recognition engine 310. A variety of commercial-quality speech recognition engines are readily available on the market, as practitioners will know. For example, Nuance Communications offers a suite of speech recognition engines, including Nuance 6, its current flagship product, and Nuance Express, a lower-cost package for entry-level applications. As one other example, IBM offers the ViaVoice speech recognition engine, including a low-cost shrink-wrapped version available through popular consumer distribution channels. Basically, a speech recognition engine processes acoustic voice data and attempts to generate a text stream of recognized words.

Typically, the speech recognition engine is provided with a vocabulary lexicon of likely words or phrases that the recognition engine can match against its analysis of acoustical signals, for purposes of a given application. Preferably, the lexicon is dynamically adjusted to reflect the current user context, as established by the preceding user inputs. For example, if a user is engaged in a dialogue with the system about movie selection, the recognition engine's vocabulary may preferably be adjusted to favor relevant words and phrases, such as a stored list of proper names for popular movie actors and directors, etc.; whereas if the current dialogue involves selection and viewing of a sports event, the engine's vocabulary might preferably be adjusted to favor a stored list of proper names for professional sports teams, etc. In addition, a speech recognition engine is provided with language models that help the engine predict the most likely interpretation of a given segment of acoustical voice data, in the current context of phonemes or words in which the segment appears. In addition, speech recognition engines often echo to the user, in more or less real time, a transcription of the engine's best guess at what the user has said, giving the user an opportunity to confirm or reject.

In a further aspect of step 404, natural language interpreter (or parser) 320 linguistically parses and interprets the textual output of the speech recognition engine. In a preferred embodiment of the present invention, the natural language interpreter attempts to determine both the meaning of spoken words (semantic processing) as well as the grammar of the statement (syntactic processing), such as the Gemini Natural Language Understanding System developed by SRI International. The Gemini system is described in detail in publications entitled "Gemini: A Natural Language System for Spoken-Language Understanding" and "Interleaving Syntax and Semantics in an Efficient Bottom-Up Parser," both of which are currently available online at http://www.ai.sri.com/natural-language/projects/arpa-sls/nat-lang.html. (Copies of those publications are also included in an information disclosure statement submitted herewith, and are incorporated herein by this reference.) Briefly, Gemini applies a set of syntactic and semantic grammar rules to a word string using a bottom-up parser to generate a logical form, which is a structured representation of the context-independent meaning of the string. Gemini can be used with a variety of grammars, including general English grammar as well as application-specific grammars. The Gemini parser is based on "unification grammar," meaning that grammatical categories incorporate features that can be assigned values, so that when grammatical category expressions are matched in the course of parsing or semantic interpretation, the information contained in the features is combined; if the feature values are incompatible, the match fails.

It is possible for some applications to achieve a significant reduction in speech recognition error by using the natural-language processing system to re-score recognition hypotheses. For example, the grammars defined for a language parser like Gemini may be compiled into a context-free grammar that, in turn, can be used directly as a language model for speech recognition engines like the Nuance recognizer. Further details on this methodology are provided in the publication "Combining Linguistic and Statistical Knowledge Sources in Natural-Language Processing for ATIS," which is currently available online through http://www.ai.sri.com/natural-language/projects/arpa-sls/spnl-int.html. A copy of this publication is included in an information disclosure submitted herewith, and is incorporated herein by this reference.

In an embodiment of the present invention that may be preferable for some applications, the natural language interpreter "learns" from the past usage patterns of a particular user or of groups of users. In such an embodiment, the successfully interpreted requests of users are stored, and can then be used to enhance accuracy by comparing a current request to the stored requests, thereby allowing selection of a most probable result.

b. Constructing Navigation Queries

In step 405 request processing logic 300 identifies and selects an appropriate online data source where the desired information (in this case, current weather reports for a given city) can be found. Such selection may involve look-up in a locally stored table, or possibly dynamic searching through an online search engine, or other online search techniques. For some applications, an embodiment of the present invention may be implemented in which only access to a particular data source (such as a particular vendor's proprietary content database) is supported; in that case, step 405 may be trivial or may be eliminated entirely.

Step 406 attempts to construct a navigation query, reflecting the interpretation of step 404. This operation is preferably performed by query construction logic 330.

Petitioner Microsoft Corporation - Ex. 1008, p. 695

US 6,757,718 B1

A "navigation query" means an electronic query, form, series of menu selections, or the like, being structured appropriately so as to navigate a particular data source of interest in search of desired information. In other words, a navigation query is constructed such that it includes whatever content and structure is required in order to access desired information electronically from a particular database or data source of interest.
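The unification-grammar matching described above in connection with the Gemini parser (features are combined when category expressions match, and incompatible feature values cause the match to fail) can be illustrated with a minimal sketch. The feature names below are hypothetical examples, not Gemini's actual representation:

```python
def unify(features_a, features_b):
    """Combine two feature sets; return None if any feature values conflict."""
    combined = dict(features_a)
    for name, value in features_b.items():
        if name in combined and combined[name] != value:
            return None  # incompatible feature values: the match fails
        combined[name] = value
    return combined

# A rule requiring third-person singular agreement accepts a singular
# candidate but rejects a plural one:
rule = {"person": 3, "number": "singular"}
print(unify(rule, {"number": "singular", "lemma": "run"}))
print(unify(rule, {"number": "plural", "lemma": "run"}))  # None
```

The same mechanism applies during both parsing and semantic interpretation: matching is simply unification, so agreement constraints and meaning fragments accumulate in a single operation.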
For example, for many existing electronic databases, a navigation query can be embodied using a formal database query language such as Structured Query Language (SQL). For many databases, a navigation query can be constructed through a more user-friendly interactive front-end, such as a series of menus and/or interactive forms to be selected or filled in. SQL is a standard interactive and programming language for getting information from and updating a database. SQL is both an ANSI and an ISO standard. As is well known to practitioners, a Relational Database Management System (RDBMS), such as Microsoft's Access, Oracle's Oracle7, or Computer Associates' CA-OpenIngres, allows programmers to create, update, and administer a relational database. Practitioners of ordinary skill in the art will be thoroughly familiar with the notion of database navigation through structured query, and will be readily able to appreciate and utilize the existing data structures and navigational mechanisms for a given database, or to create such structures and mechanisms where desired.
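As a concrete sketch of a SQL-embodied navigation query, the following uses an in-memory SQLite database with a hypothetical movie schema (the table layout, field names, and data are illustrative assumptions, not part of the invention):

```python
import sqlite3

# Hypothetical video-on-demand catalog, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, actor TEXT, director TEXT)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [("Unforgiven", "Clint Eastwood", "Clint Eastwood"),
     ("True Crime", "Clint Eastwood", "Clint Eastwood"),
     ("The Matrix", "Keanu Reeves", "Lana Wachowski")],
)

def build_navigation_query(constraints):
    """Translate an interpreted request into a parameterized SQL query."""
    where = " AND ".join(f"{field} = ?" for field in constraints)
    return f"SELECT title FROM movies WHERE {where}", list(constraints.values())

# Interpreted request: a film both starring and directed by Clint Eastwood.
sql, params = build_navigation_query(
    {"actor": "Clint Eastwood", "director": "Clint Eastwood"})
titles = [row[0] for row in conn.execute(sql, params)]
# titles now holds the matching Eastwood films.
```

Parameterized placeholders (the `?` markers) keep the user-derived values out of the SQL text itself, which is the conventional way to navigate an RDBMS safely from interpreted input.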
In accordance with the present invention, the query constructed in step 406 must reflect the user's request as interpreted by the speech recognition engine and the NL parser in step 404. In embodiments of the present invention wherein data source 110 (or 210 in the corresponding embodiment of FIG. 2) is a structured relational database or the like, step 406 of the present invention may entail constructing an appropriate Structured Query Language (SQL) query or the like, or automatically filling out a front-end query form, series of menus, or the like, as described above.
In many existing Internet (and Intranet) applications, an online electronic data source is accessible to users only through the medium of interaction with a so-called Common Gateway Interface (CGI) script. Typically, the user who visits a web site of this nature must fill in the fields of an online interactive form. The online form is in turn linked to a CGI script, which transparently handles actual navigation of the associated data source and produces output for viewing by the user's web browser. In other words, direct user access to the data source is not supported; only mediated access through the form and CGI script is offered.
For applications of this nature, an advantageous embodiment of the present invention "scrapes" the scripted online site where information desired by a user may be found in order to facilitate construction of an effective navigation query. For example, suppose that a user's spoken natural language request is: "What's the weather in Miami?" After this request is received at step 402 and interpreted at step 404, assume that step 405 determines that the desired weather information is available online through the medium of a CGI-scripted interactive form. Step 406 is then preferably carried out using the expanded process diagrammed in FIG. 5. In particular, at sub-step 520, query construction logic 330 electronically "scrapes" the online interactive form, meaning that query construction logic 330 automatically extracts the format and structure of input fields accepted by the online form. At sub-step 522, a navigation query is then constructed by instantiating (filling in) the extracted input format (essentially an electronic template) in a manner reflecting the user's request for information as interpreted in step 404. The flow of control then returns to step 407 of FIG. 4. Ultimately, when the query thus constructed by scraping is used to navigate the online data source in step 408, the query effectively initiates the same scripted response as if a human user had visited the online site and had typed appropriate entries into the input fields of the online form.
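The scrape-and-instantiate sequence of sub-steps 520 and 522 can be sketched with Python's standard HTML parser. The form markup below is a hypothetical weather page standing in for one fetched over HTTP; the field names are illustrative assumptions:

```python
from html.parser import HTMLParser

class FormScraper(HTMLParser):
    """Sub-step 520: extract the input fields accepted by an online form."""
    def __init__(self):
        super().__init__()
        self.action = None   # CGI script the form submits to
        self.fields = []     # names of the form's input fields

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.action = attrs.get("action")
        elif tag == "input" and attrs.get("type") != "submit":
            self.fields.append(attrs.get("name"))

# Hypothetical CGI-scripted weather form.
page = ('<form action="/cgi-bin/weather">'
        '<input type="text" name="city">'
        '<input type="submit" value="Go"></form>')
scraper = FormScraper()
scraper.feed(page)

# Sub-step 522: instantiate the extracted template from the interpreted
# request "What's the weather in Miami?"
query = {field: "Miami" for field in scraper.fields}
```

Submitting `query` to the scraped `action` URL would then trigger the same scripted response as a human user typing "Miami" into the form.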
In the embodiment just described, scraping step 520 is preferably carried out with the assistance of an online extraction utility such as WebL. WebL is a scripting language for automating tasks on the World Wide Web. It is an imperative, interpreted language that has built-in support for common web protocols like HTTP and FTP, and popular data types like HTML and XML. WebL's implementation language is Java, and the complete source code is available from Compaq. In addition, step 520 is preferably performed dynamically when necessary (in other words, on the fly in response to a particular user query), but in some applications it may be possible to scrape relatively stable (unchanging) web sites of likely interest in advance and to cache the resulting template information.
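The choice between on-the-fly scraping and advance scraping of stable sites reduces to a simple cache keyed by site, as in this sketch (the URL and scrape function are hypothetical stand-ins):

```python
template_cache = {}

def get_template(url, scrape):
    """Return the input template for a site, scraping only on a cache miss."""
    if url not in template_cache:
        template_cache[url] = scrape(url)  # on-the-fly extraction (step 520)
    return template_cache[url]

# Track how often the (stand-in) scraper actually runs.
calls = []
def scrape(url):
    calls.append(url)
    return ["city"]  # extracted input fields

get_template("http://weather.example/form", scrape)
get_template("http://weather.example/form", scrape)
# The second request reuses the cached template; the site was scraped once.
```

For stable sites the cache can be pre-populated in advance; for volatile sites an entry can simply be expired so the next request scrapes afresh.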
It will be apparent, in light of the above teachings, that preferred embodiments of the present invention can provide a spoken natural language interface atop an existing, non-voice data navigation system, whereby users can interact by means of intuitive natural language input not strictly conforming to the linear browsing architecture or other artifacts of an existing menu/text/click navigation system. For example, users of an appropriate embodiment of the present invention for a video-on-demand application can directly speak the natural request "Show me the movie 'Unforgiven'" instead of walking step-by-step through a typically linear sequence of genre/title/actor/director menus, scrolling and selecting from potentially long lists on each menu, or instead of being forced to use an alphanumeric keyboard that cannot be as comfortable to hold or use as a lightweight remote control. Similarly, users of an appropriate embodiment of the present invention for a web-surfing application in accordance with the process shown in FIG. 5 can directly speak the natural request "Show me a one-month price chart for Microsoft stock" instead of potentially having to navigate to an appropriate web site, search for the right ticker symbol, enter/select the symbol, and specify display of the desired one-month price chart, each of those steps potentially involving manual navigation and data entry on one or more different interaction screens. (Note that these examples are offered to illustrate some of the potential benefits offered by appropriate embodiments of the present invention, and not to limit the scope of the invention in any respect.)
c. Error Correction

Several problems can arise when attempting to perform searches based on spoken natural language input. As indicated at decision step 407 in the process of FIG. 4, certain deficiencies may be identified during the process of query construction, before search of the data source is even attempted. For example, the user's request may fail to specify enough information to construct a navigation query that is specific enough to obtain a satisfactory search result. For instance, a user might orally request "what's the weather?" whereas the national online data source identified in step 405 and scraped in step 520 might require specifying a particular city.
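The step 407 deficiency check reduces to comparing the required fields of the scraped template against what the interpretation actually supplied. A minimal sketch, with hypothetical field names:

```python
def find_missing_fields(template_fields, interpretation):
    """Step 407: which required input fields of the scraped template
    cannot be filled from the interpreted request?"""
    return [f for f in template_fields if f not in interpretation]

# "What's the weather in Miami?" supplies the city; "what's the weather?"
# does not, so the second check flags a deficiency before any search runs.
print(find_missing_fields(["city"], {"city": "Miami"}))  # []
print(find_missing_fields(["city"], {}))                 # ['city']
```

A non-empty result routes control to the refinement step rather than to the data source, avoiding a search that could not succeed.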
Additionally, certain deficiencies and problems may arise following the navigational search of the data source at step 408, as indicated at decision step 409 in FIG. 4. For example, with reference to a video-on-demand application, a user may wish to see the movie "Unforgiven", but perhaps the user can't recall the name of the film, knowing only that it was directed by and starred the actor Clint Eastwood. A typical video-on-demand database might indeed be expected to allow queries specifying the name of a leading actor and/or director, but in the case of this query, as in many cases, that will not be enough to narrow the search to a single film, and additional user input in some form is required.
In the event that one or more deficiencies in the user's spoken request, as processed, result in the problems described, either at step 407 or 409, some form of error handling is in order. A straightforward, crude technique might be for the system to respond simply "input not understood/insufficient, please try again." However, that approach will likely result in frustrated users, and is not optimal or even acceptable for most applications. Instead, a preferred technique in accordance with the present invention handles such errors and deficiencies in user input at step 412, whether detected at step 407 or step 409, by soliciting additional input from the user in a manner taking advantage of the partial construction already performed and via user interface modalities in addition to spoken natural language ("multi-modality"). This supplemental interaction is preferably conducted through client display device 112 (202, in the embodiment of FIG. 2), and may include textual, graphical, audio and/or video media. Further details and examples are provided below. Query refinement logic 340 preferably carries out step 412. The additional input received from the user is fed into and augments interpreting step 404, and query construction step 406 is likewise repeated with the benefit of the augmented interpretation. These operations, and subsequent navigation step 408, are preferably repeated until no remaining problems or deficiencies are identified at decision points 407 or 409. Further details and examples for this query refinement process are provided immediately below.
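The interpret/construct/navigate cycle with refinement at step 412 can be sketched as a loop. The four callables are hypothetical stand-ins for the components named in the text (interpreting step 404, query construction logic 330, navigation step 408, and query refinement logic 340):

```python
def fulfill_request(interpret, construct, navigate, solicit, voice_data):
    """Sketch of the FIG. 4 loop: repeat until steps 407/409 find no problem."""
    interpretation = interpret(voice_data)          # step 404
    while True:
        query, problem = construct(interpretation)  # steps 406/407
        if problem is None:
            results, problem = navigate(query)      # steps 408/409
            if problem is None:
                return results
        # Step 412: solicit supplemental input and augment the interpretation,
        # preserving the partial construction already performed.
        interpretation.update(solicit(problem))

# Hypothetical stand-ins for the weather example:
def construct(i):
    return (dict(i), None) if "city" in i else (None, "city is unspecified")

result = fulfill_request(
    interpret=dict,
    construct=construct,
    navigate=lambda q: (f"weather report for {q['city']}", None),
    solicit=lambda problem: {"city": "Miami"},
    voice_data={},
)
print(result)  # weather report for Miami
```

Note that the loop re-runs construction with the augmented interpretation rather than starting over, which is what lets the supplemental input build on the partial query already constructed.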
Consider again the example in which the user of a video-on-demand application wishes to see "Unforgiven" but can only recall that it was directed by and starred Clint Eastwood. First, it bears noting that using a prior art navigational interface, such as a conventional menu interface, will likely be relatively tedious in this case. The user can proceed through a sequence of menus, such as Genre (select "western"), Title (skip), Actor ("Clint Eastwood"), and Director ("Clint Eastwood"). In each case, and especially for the last two items, the user would typically scroll and select from fairly long lists in order to enter his or her desired name, or perhaps use a relatively couch-unfriendly keypad to manually type the actor's name twice.
Using a preferred embodiment of the present invention, the user instead speaks aloud, holding remote control microphone 102, "I want to see that movie starring and directed by Clint Eastwood. Can't remember the title." At step 402 the voice data is received. At step 404 the voice data is interpreted. At step 405 an appropriate online data source is selected (or perhaps the system is directly connected to a proprietary video-on-demand provider). At step 406 a query is automatically constructed by the query construction logic 330 specifying "Clint Eastwood" in both the actor and director fields. Step 407 detects no obvious problems, and so the query is electronically submitted and the data source is navigated at step 408, yielding a list of several records satisfying the query (e.g., "Unforgiven", "True Crime", "Absolute Power", etc.). Step 409 detects that additional user input is needed to further refine the query in order to select a particular film for viewing.

At that point, in step 412 query refinement logic 340 might preferably generate a display for client display device 112 showing the (relatively short) list of film titles that
satisfy the user's stated constraints. The user can then preferably use a relatively convenient input modality, such as buttons on the remote control, to select the desired title from the menu. In a further preferred embodiment, the first title on the list is highlighted by default, so that the user can simply press an "OK" button to choose that selection. In a further preferred feature, the user can mix input modalities by speaking a response like "I want number one on the list." Alternatively, the user can preferably say, "Let's see Unforgiven," having now been reminded of the title by the menu display.

Utilizing the user's supplemental input, request processing logic 300 iterates again through steps 404 and 406, this time constructing a fully-specified query that specifically requests the Eastwood film "Unforgiven." Step 408 navigates the data source using that query and retrieves the desired film, which is then electronically transmitted in step 410 from network server 108 to client display device 112 via communications network 106.
Now consider again the example in which the user of a web surfing application wants to know his or her local weather, and simply asks, "what's the weather?" At step 402 the voice data is received. At step 404 the voice data is interpreted. At step 405 an online web site providing current weather information for major cities around the world is selected. At step 406 and sub-step 520, the online site is scraped using a WebL-style tool to extract an input template for interacting with the site. At sub-step 522, query construction logic 330 attempts to construct a navigation query by instantiating the input template, but determines (quite rightly) that a required field, the name of the city, cannot be determined from the user's spoken request as interpreted in step 404. Step 407 detects this deficiency, and in step 412 query refinement logic 340 preferably generates output for client display device 112 soliciting the necessary supplemental input. In a preferred embodiment, the output might display the name of the city where the user is located, highlighted by default. The user can then simply press an "OK" button, or perhaps mix modalities by saying "yes, exactly", to choose that selection. A preferred embodiment would further display an alphabetical scrollable menu listing other major cities, and/or invite the user to speak or select the name of the desired city.

Here again, utilizing the user's supplemental input, request processing logic 300 iterates through steps 404 and 406. This time, in performing sub-step 520, a cached version of the input template already scraped in the previous iteration might preferably be retrieved. In sub-step 522, query construction logic 330 succeeds this time in instantiating the input template and constructing an effective query, since the desired city has now been clarified. Step 408 navigates the data source using that query and retrieves the desired weather information, which is then electronically transmitted in step 410 from network server 108 to client display device 112 via communications network 106.
It is worth noting that in some instances, there may be details that are not explicitly provided by the user, but that query construction logic 330 or query refinement logic 340 may preferably deduce on their own through reasonable assumptions, rather than requiring the user to provide explicit clarification. For example, in the example previously described regarding a request for a weather report, in some applications it might be preferable for the system to simply assume that the user means a weather report for his or her home area and to retrieve that information, if the cost of doing so is not significantly greater than the cost of asking the user to clarify the query. Making such an assumption
might be even more strongly justified in a preferred embodiment, as described earlier, where user histories are tracked, and where such history indicates that a particular user or group of users typically expects local information when asking for a weather forecast. At any rate, in the event such an assumption is made, if the user actually intended to request the weather for a different city, the user would then need to ask his or her question again. It will be apparent to practitioners, in light of the above teachings, that the choice of whether to program query construction logic 330 and query refinement logic 340 to make particular assumptions will typically involve trade-offs involving user convenience that can be assessed in the context of specific applications.

3. Open Agent Architecture (OAA®)

Open Agent Architecture™ (OAA®) is a software platform, developed by the assignee of the present invention, that enables effective, dynamic collaboration among communities of distributed electronic agents. OAA is described in greater detail in co-pending U.S. patent application Ser. No. 09/225,198, which has been incorporated herein by reference. Very briefly, the functionality of each client agent is made available to the agent community through registration of the client agent's capabilities with a facilitator. A software "wrapper" essentially surrounds the underlying application program performing the services offered by each client. The common infrastructure for constructing agents is preferably supplied by an agent library. The agent library is preferably accessible in the runtime environment of several different programming languages. The agent library preferably minimizes the effort required to construct a new system and maximizes the ease with which legacy systems can be "wrapped" and made compatible with the agent-based architecture of the present invention. When invoked, a client agent makes a connection to a facilitator, which is known as its parent facilitator. Upon connection, an agent registers with its parent facilitator a specification of the capabilities and services it can provide, using a high-level, declarative Interagent Communication Language ("ICL") to express those capabilities. Tasks are presented to the facilitator in the form of ICL goal expressions. When a facilitator determines that the registered capabilities of one of its client agents will help satisfy a current goal or sub-goal thereof, the facilitator delegates that sub-goal to the client agent in the form of an ICL request. The client agent processes the request and returns answers or information to the facilitator. In processing a request, the client agent can use ICL to request services of other agents, or utilize other infrastructure services for collaborative work. The facilitator coordinates and integrates the results received from different client agents on various sub-goals, in order to satisfy the overall goal.

OAA provides a useful software platform for building systems that integrate spoken natural language as well as other user input modalities. For example, see the above-referenced co-pending patent application, especially FIG. 13 and the corresponding discussion of a "multi-modal maps" application, and FIG. 12 and the corresponding discussion of a "unified messaging" application. Another example is the InfoWiz interactive information kiosk developed by the assignee and described in the document entitled "InfoWiz: An Animated Voice Interactive Information System" available online at http://www.ai.sri.com/~oaa/applications.html. A copy of the InfoWiz document is provided in an Information Disclosure Statement submitted herewith and incorporated herein by this reference. A further example is the "CommandTalk" application developed by the assignee for the U.S. military, as described online at http://www.ai.sri.com/~lesaf/commandtalk.html and in the following publications, copies of which are provided in an Information Disclosure Statement submitted herewith and incorporated herein by this reference:

"CommandTalk: A Spoken-Language Interface for Battlefield Simulations", 1997, by Robert Moore, John Dowding, Harry Bratt, J. Mark Gawron, Yonael Gorfu and Adam Cheyer, in "Proceedings of the Fifth Conference on Applied Natural Language Processing", Washington, D.C., pp. 1-7, Association for Computational Linguistics

"The CommandTalk Spoken Dialogue System", 1999, by Amanda Stent, John Dowding, Jean Mark Gawron, Elizabeth Owen Bratt and Robert Moore, in "Proceedings of the Thirty-Seventh Annual Meeting of the ACL", pp. 183-190, University of Maryland, College Park, Md., Association for Computational Linguistics

"Interpreting Language in Context in CommandTalk", 1999, by John Dowding and Elizabeth Owen Bratt and Sharon Goldwater, in "Communicative Agents: The Use of Natural Language in Embodied Systems", pp. 63-67, Association for Computing Machinery (ACM) Special Interest Group on Artificial Intelligence (SIGART), Seattle, Wash.
For some applications and systems, OAA can provide an advantageous platform for constructing embodiments of the present invention. For example, a representative application is now briefly presented, with reference to FIG. 6. If the statement "show me movies starring John Wayne" is spoken into the voice input device, the voice data for this request will be sent by UI agent 650 to facilitator 600, which in turn will ask natural language (NL) agent 620 and speech recognition agent 610 to interpret the query and return the interpretation in ICL format. The resulting ICL goal expression is then routed by the facilitator to appropriate agents (in this case, video-on-demand database agent 640) to execute the request. Video database agent 640 preferably includes or is coupled to an appropriate embodiment of query construction logic 330 and query refinement logic 340, and may also issue ICL requests to facilitator 600 for additional assistance (e.g., display of menus and capture of additional user input in the event that query refinement is needed), and facilitator 600 will delegate such requests to appropriate client agents in the community. When the desired video content is ultimately retrieved by video database agent 640, UI agent 650 is invoked by facilitator 600 to display the movie.
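The register-and-delegate pattern just described can be sketched in a few lines. This is an illustrative toy, not OAA's actual agent library or ICL: capabilities are plain strings and goals are keyword arguments, where real ICL uses declarative goal expressions:

```python
class Facilitator:
    """Toy facilitator: client agents register capabilities; goals are
    delegated to whichever agent registered a matching capability."""
    def __init__(self):
        self.registry = {}  # capability name -> client agent handler

    def register(self, capability, handler):
        self.registry[capability] = handler

    def solve(self, capability, **goal):
        # Delegate the goal to the matching client agent. The facilitator
        # passes itself along so the agent can issue further requests,
        # mirroring agents that ask for menu display or extra user input.
        return self.registry[capability](self, **goal)

facilitator = Facilitator()
facilitator.register(
    "find_movies",
    lambda fac, actor: [f"movies starring {actor}"],  # stand-in database agent
)
# A UI agent would submit the interpreted goal to the facilitator:
print(facilitator.solve("find_movies", actor="John Wayne"))
```

In a fuller system the facilitator would also decompose compound goals into sub-goals and integrate the answers returned by several agents, as the preceding section describes.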
Other spoken user requests, such as a request for the current weather in New York City or for a stock quote, would eventually lead the facilitator to invoke web database agent 630 to access the desired information from an appropriate Internet site. Here again, web database agent 630 preferably includes or is coupled to an appropriate embodiment of query construction logic 330 and query refinement logic 340, including a scraping utility such as WebL. Other spoken requests, such as a request to view recent emails or access voice mail, would lead the facilitator to invoke the appropriate email agent 660 and/or telephone agent 680. A request to record a televised program of interest might lead facilitator 600 to invoke web database agent 630 to return televised program schedule information, and then invoke VCR controller agent 680 to program the associated VCR unit to record the desired television program at the scheduled time.
Control and connectivity embracing additional electronic home appliances (e.g., microwave oven, home surveillance system, etc.) can be integrated in comparable fashion. Indeed, an advantage of OAA-based embodiments of the present invention, that will be apparent to practitioners in light of the above teachings and in light of the teachings disclosed in the cited co-pending patent applications, is the relative ease and flexibility with which additional service agents can be plugged into the existing platform, immediately enabling the facilitator to respond dynamically to spoken natural language requests for the corresponding services.
4. Further Embodiments and Equivalents

While the present invention has been described in terms of several preferred embodiments, there are many alterations, permutations, and equivalents that may fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and apparatuses of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.
What is claimed is:

1. A method for speech-based navigation of an electronic data source located at one or more network servers located remotely from a user, wherein a data link is established between a mobile information appliance of the user and the one or more network servers, comprising the steps of:
(a) receiving a spoken request for desired information from the user utilizing the mobile information appliance of the user, wherein said mobile information appliance comprises a portable remote control device or a set-top box for a television;
(b) rendering an interpretation of the spoken request;
(c) constructing a navigation query based upon the interpretation;
(d) utilizing the navigation query to select a portion of the electronic data source; and
(e) transmitting the selected portion of the electronic data source from the network server to the mobile information appliance of the user.

2. The method of claim 1, wherein the step of rend