U.S. UTILITY Patent Application

PTO-2040

ISSUING CLASSIFICATION

CROSS REFERENCE(S)

SUBCLASS (ONE SUBCLASS PER BLOCK)

Continued on Issue Slip Inside File Jacket

[ ] The term of this patent subsequent to ________ (date) has been disclaimed.
[ ] The term of this patent shall not extend beyond the expiration date of U.S. Patent No. ________.
[ ] The terminal ________ months of this patent have been disclaimed.

NOTICE OF ALLOWANCE MAILED

3-11-03

ISSUE FEE

ISSUE BATCH NUMBER

The information disclosed herein may be restricted. Unauthorized disclosure may be prohibited by the United States Code Title 35, Sections 122, 181 and 368. Possession outside the U.S. Patent & Trademark Office is restricted to authorized employees and contractors only.

FILED WITH: [ ] DISK (CRF)  [ ] FICHE  [ ] CD-ROM
(Attached in pocket on right inside flap)

ISSUE FEE IN FILE

(FACE)

Form PTO-436A
(Rev. 6/99)

GOOGLE EXHIBIT 1004

Page 1 of 214

PATENT APPLICATION

09608872

CONTENTS

Date Received (Incl. C. of M.) or Date Mailed

(Numbered contents entries with handwritten dates, largely illegible in the scan)

(LEFT OUTSIDE)

Page 2 of 214

PATENT APPLICATION SERIAL NO.:

U.S. DEPARTMENT OF COMMERCE
PATENT AND TRADEMARK OFFICE
FEE RECORD SHEET

07/19/2000 00000040 09608872
01 FC:201 345.00 OP
02 FC:203 63.00 OP

PTO-1556
(5/87)

*U.S. GPO: 1999-459-082/19144

Page 3 of 214

UNITED STATES PATENT AND TRADEMARK OFFICE

Page 1 of 2

UNITED STATES PATENT AND TRADEMARK OFFICE
WASHINGTON, D.C. 20231
www.uspto.gov

CONFIRMATION NO. 2382

ATTORNEY DOCKET NO.
SRI1P037B

Bib Data Sheet

SERIAL NUMBER
09/608,872

GROUP ART UNIT
2741

Christine Halversen, San Jose, CA;
Luc Julia, Menlo Park, CA;
Dimitris Voutsas, Thessaloniki, GREECE;
Adam Cheyer, Palo Alto, CA;

** CONTINUING DATA *********************
THIS APPLICATION IS A CON OF 09/524,095 03/13/2000
WHICH IS A CIP OF 09/225,198 01/05/1999
WHICH CLAIMS BENEFIT OF 60/124,718 03/17/1999
AND SAID 09/524,095 03/13/2000
CLAIMS BENEFIT OF 60/124,720 03/17/1999
AND CLAIMS BENEFIT OF 60/124,719 03/17/1999

** FOREIGN APPLICATIONS *********************

IF REQUIRED, FOREIGN FILING LICENSE GRANTED 08/31/2000  ** SMALL ENTITY **

Foreign Priority claimed: [ ] yes [X] no
35 USC 119 (a-d) conditions met: [ ] yes [ ] no [ ] Met after Allowance
Verified and Acknowledged    Examiner's Signature    Initials

THOMASON, MOSER & PATTERSON, LLP
SHREWSBURY AVENUE

Mobile navigation of network-based electronic information using spoken input

FILING FEE RECEIVED
473

FEES: Authority has been given in Paper No. ____ to charge/credit DEPOSIT ACCOUNT No. ____ for following:
[ ] 1.16 Fees ( Filing )
[ ] 1.17 Fees ( Processing Ext. of time )
[ ] 1.18 Fees ( Issue )

Page 4 of 214

UNITED STATES PATENT AND TRADEMARK OFFICE

Page 1 of 1

UNITED STATES PATENT AND TRADEMARK OFFICE
WASHINGTON, D.C. 20231
www.uspto.gov

Bib Data Sheet

SERIAL NUMBER
09/608,872

FILING DATE
06/30/2000
RULE

CLASS
704

GROUP ART UNIT
2741

ATTORNEY DOCKET NO.
SRI1P037B

Christine Halversen, San Jose, CA;
Luc Julia, Menlo Park, CA;
Dimitris Voutsas, Thessaloniki, GREECE;
Adam Cheyer, Palo Alto, CA;

** CONTINUING DATA *********************
THIS APPLICATION IS A CON OF 09/524,095 03/13/2000
WHICH IS A CIP OF 09/225,198 01/05/1999
WHICH CLAIMS BENEFIT OF 60/124,718 03/17/1999
WHICH CLAIMS BENEFIT OF 60/124,719 03/17/1999
WHICH CLAIMS BENEFIT OF 60/124,720 03/17/1999

** FOREIGN APPLICATIONS *********************

IF REQUIRED, FOREIGN FILING LICENSE GRANTED ** 08/31/2000

Foreign Priority claimed: [ ] yes [X] no
35 USC 119 (a-d) conditions met: [ ] yes [ ] no [ ] Met after Allowance
Verified and Acknowledged    Examiner's Signature    Initials

TITLE

** SMALL ENTITY **

STATE OR COUNTRY
CA

SHEETS DRAWING
1

TOTAL CLAIMS
27

INDEPENDENT CLAIMS
3

Mobile navigation of network-based electronic information using spoken input

FILING FEE RECEIVED
473

FEES: Authority has been given in Paper No. ____ to charge/credit DEPOSIT ACCOUNT No. ____ for following:
[ ] All Fees
[ ] 1.16 Fees ( Filing )
[ ] 1.17 Fees ( Processing Ext. of time )
[ ] 1.18 Fees ( Issue )
[ ] Credit

file://C:\Apps\Prelkam\correspondence\1A.xml

11/15/00

Page 5 of 214

IN THE UNITED STATES PATENT AND TRADEMARK OFFICE

Attorney Docket No.: SRI1P037B
First Named Inventor: HALVERSEN, Christine

CERTIFICATE OF EXPRESS MAILING
I hereby certify that this paper and the documents and/or fees referred to as attached therein are being deposited with the United States Postal Service "Express Mail Post Office to Addressee" service under 37 C.F.R. § 1.10, Mailing Label Number ________, addressed to the Assistant Commissioner for Patents, on June 30, 2000.

UTILITY PATENT APPLICATION TRANSMITTAL (37 C.F.R. § 1.53(b))
(Continuation, Divisional or Continuation-in-part application)

Assistant Commissioner for Patents
Box Patent Application
Washington, DC 20231

Duplicate for fee processing

Sir:

This is a request for filing a patent application under 37 C.F.R. § 1.53(b) in the name of inventors:
Christine Halversen, Luc Julia, Dimitris Voutsas, Adam Cheyer

For: MOBILE NAVIGATION OF NETWORK-BASED ELECTRONIC INFORMATION USING SPOKEN INPUT

This application is a [X] Continuation  [ ] Divisional  [ ] Continuation-in-part
of prior Application No. 09/524,095, from which priority under 35 U.S.C. § 120 is claimed.

Application Elements:
[X] ___ Pages of Specification, Claims and Abstract
[X] ___ Sheets of Drawings
Declaration:
  [ ] Newly executed (original or copy)
  [X] Copy from a prior application (37 CFR 1.63(d)) for a continuation or divisional.
      The entire disclosure of the prior application from which a copy of the declaration is herein supplied is considered as being part of the disclosure of the accompanying application and is hereby incorporated by reference therein.
  [ ] Deletion of inventors: Signed statement attached deleting inventor(s) named in the prior application, see 37 CFR 1.63(d)(2) and 1.33(b).

Accompanying Application Parts:

[ ] Assignment and Assignment Recordation Cover Sheet (recording fee of $40.00 enclosed)
[ ] Power of Attorney
[ ] 37 CFR 3.73(b) Statement by Assignee

(Revised 12/97, Pat App Trans 53(b) ContDivCIP)  Page 1 of 3

Page 6 of 214

[ ] Information Disclosure Statement with Form PTO-1449
    [ ] Copies of IDS Citations
[ ] Preliminary Amendment
[ ] Return Receipt Postcard
[ ] Small Entity Statement(s): [X] Statement filed in prior application. Status still proper and desired.
[ ] Other:

Claim For Foreign Priority

[ ] Priority of Application No. ________, filed on ________, is claimed under 35 U.S.C. § 119.
    [ ] The certified copy has been filed in prior application, U.S. Application No. ________.
    [ ] The certified copy will follow.

Extension of Time for Prior Pending Application

[ ] A Petition for Extension of Time is being concurrently filed in the prior pending application. A copy of the Petition for Extension of Time is attached.

Amendments

[ ] Amend the specification by inserting before the first line the sentence: "This is a [ ] Continuation, [ ] Continuation-in-part, [ ] Divisional application of copending prior [ ] Application No. ________, filed on ________, [ ] International Application ________, filed on ________, which designated the United States, the disclosure of which is incorporated herein by reference."

[X] Cancel in this application original claims 2-55 of the prior application before calculating the filing fee. (At least one original independent claim must be retained.)

Fee Calculation (37 CFR § 1.16)

                 (Col. 1)     (Col. 2)      SMALL ENTITY        LARGE ENTITY
                 NO. FILED    NO. EXTRA     RATE     FEE        RATE     FEE
BASIC FEE                                   $345   $ 345    OR  $690   $
TOTAL CLAIMS     27 - 20      = 7           x $9 = $  63    OR  x $18 = $
INDEP CLAIMS     3 - 3        = 0           x $39 = $  0    OR  x $78 = $
[ ] Multiple Dependent Claim Presented      $130 = $        OR  $260 = $
                                            Total  $ 408    OR  Total  $

* If the difference in Col. 1 is less than zero, enter "0" in Col. 2.

[X] Check No. 137 in the amount of $408.00 is enclosed.

(Revised 12/97, Pat App Trans 53(b) ContDivCIP)  Page 2 of 3

Page 7 of 214

[X] The Commissioner is authorized to charge any fees beyond the amount enclosed which may be required, or to credit any overpayment, to Deposit Account No. 50-1351 (Order No. SRI1P037B).

[X] Applicants hereby make and generally authorize any Petitions for Extensions of Time as may be needed for any subsequent filings. The Commissioner is also authorized to charge any extension fees under 37 CFR § 1.17 as may be needed to Deposit Account No. 50-1351 (Order No. SRI1P037B).

[X] Please send correspondence to the following address:
Kevin J. Zilka
P.O. Box 721030
San Jose, California 95172-1030

Direct Telephone Calls To:
Kevin J. Zilka at telephone number (408) 505-5100

Date: June 30, 2000

Kevin J. Zilka

(Revised 12/97, Pat App Trans 53(b) ContDivCIP)  Page 3 of 3

Page 8 of 214

NAVIGATING NETWORK-BASED ELECTRONIC INFORMATION USING SPOKEN NATURAL LANGUAGE INPUT WITH MULTIMODAL ERROR FEEDBACK

BACKGROUND OF THE INVENTION

This is a Continuation In Part of co-pending U.S. Patent Application No. 09/225,198, filed January 5, 1999, Provisional U.S. Patent Application No. 60/124,718, filed March 17, 1999, Provisional U.S. Patent Application No. 60/124,720, filed March 17, 1999, and Provisional U.S. Patent Application No. 60/124,719, filed March 17, 1999, from which applications priority is claimed and these applications are incorporated herein by reference.

The present invention relates generally to the navigation of electronic data by means of spoken natural language requests, and to feedback mechanisms and methods for resolving the errors and ambiguities that may be associated with such requests.

As global electronic connectivity continues to grow, and the universe of electronic data potentially available to users continues to expand, there is a growing need for information navigation technology that allows relatively naive users to navigate and access desired data by means of natural language input. In many of the most important markets -- including the home entertainment arena, as well as mobile computing -- spoken natural language input is highly desirable, if not ideal. As just one example, the proliferation of high-bandwidth communications infrastructure for the home entertainment market (cable, satellite, broadband) enables delivery of movies-on-demand and other interactive multimedia content to the consumer's home television set. For users to take full advantage of this content stream ultimately requires interactive navigation of content databases in a manner that is too complex for user-friendly selection by means of a traditional remote-control clicker. Allowing spoken natural language requests as the input modality for rapidly searching and accessing desired content is an important objective for a successful consumer entertainment product in a context offering a dizzying range of database content choices. As further examples, this same need to drive navigation of (and transaction with) relatively complex data warehouses using spoken natural language requests applies equally to surfing the Internet/Web or other networks for general information, multimedia content, or e-commerce transactions.

Page 9 of 214

In general, the existing navigational systems for browsing electronic databases and data warehouses (search engines, menus, etc.), have been designed without navigation via spoken natural language as a specific goal. So today's world is full of existing electronic data navigation systems that do not assume browsing via natural spoken commands, but rather assume text and mouse-click inputs (or in the case of TV remote controls, even less). Simply recognizing voice commands within an extremely limited vocabulary and grammar -- the spoken equivalent of button/click input (e.g., speaking "channel 5" selects TV channel 5) -- is really not sufficient by itself to satisfy the objectives described above. In order to deliver a true "win" for users, the voice-driven front-end must accept spoken natural language input in a manner that is intuitive to users. For example, the front-end should not require learning a highly specialized command language or format. More fundamentally, the front-end must allow users to speak directly in terms of what the user ultimately wants -- e.g., "I'd like to see a western film directed by Clint Eastwood" -- as opposed to speaking in terms of arbitrary navigation structures (e.g., hierarchical layers of menus, commands, etc.) that are essentially artifacts reflecting constraints of the pre-existing text/click navigation system. At the same time, the front-end must recognize and accommodate the reality that a stream of naive spoken natural language input will, over time, typically present a variety of errors and/or ambiguities: e.g., garbled/unrecognized words (did the user say "Eastwood" or "Easter"?) and under-constrained requests ("Show me the Clint Eastwood movie"). An approach is needed for handling and resolving such errors and ambiguities in a rapid, user-friendly, non-frustrating manner.

What is needed is a methodology and apparatus for rapidly constructing a voice-driven front-end atop an existing, non-voice data navigation system, whereby users can interact by means of intuitive natural language input not strictly conforming to the step-by-step browsing architecture of the existing navigation system, and wherein any errors or ambiguities in user input are rapidly and conveniently resolved. The solution to this need should be compatible with the constraints of a multi-user, distributed environment such as the Internet/Web or a proprietary high-bandwidth content delivery network; a solution contemplating one-at-a-time user interactions at a single location is insufficient, for example.

Page 10 of 214

SUMMARY OF THE INVENTION

The present invention addresses the above needs by providing a system, method, and article of manufacture for navigating network-based electronic data sources in response to spoken NL input requests. When a spoken natural language input request is received from a user, it is interpreted, such as by using a speech recognition engine to extract speech data from acoustic voice signals, and using a natural language parser to linguistically parse the speech data. The interpretation of the spoken natural language request can be performed on a computing device locally with the user or remotely from the user. The resulting interpretation of the request is thereupon used to automatically construct an operational navigation query to retrieve the desired information from one or more electronic network data sources, which is then transmitted to a client device of the user. If the network data source is a database, the navigation query is constructed in the format of a database query language.

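The flow just summarized -- recognized speech is parsed into a structured interpretation, which is then turned into a database-query-language navigation query -- can be sketched in outline. The class, field, and table names below are illustrative assumptions for this sketch only, not the application's actual implementation.

```python
# Hypothetical sketch of the interpretation pipeline described above:
# speech data -> linguistic parse -> operational navigation query.

from dataclasses import dataclass


@dataclass
class ParsedRequest:
    """Result of linguistically parsing the recognized speech data."""
    category: str      # e.g. "movie"
    constraints: dict  # e.g. {"genre": "western", "director": "Clint Eastwood"}


def build_navigation_query(parsed: ParsedRequest, table: str) -> str:
    """Construct a database-query-language (SQL-style) navigation query
    from the interpreted request, for the case where the network data
    source is a database."""
    where = " AND ".join(
        f"{field} = '{value}'"
        for field, value in sorted(parsed.constraints.items())
    )
    return f"SELECT * FROM {table} WHERE {where};"


request = ParsedRequest(
    category="movie",
    constraints={"genre": "western", "director": "Clint Eastwood"},
)
print(build_navigation_query(request, table="movies"))
# SELECT * FROM movies WHERE director = 'Clint Eastwood' AND genre = 'western';
```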
Typically, errors or ambiguities emerge in the interpretation of the spoken NL request, such that the system cannot instantiate a complete, valid navigational template. This is to be expected occasionally, and one preferred aspect of the invention is the ability to handle such errors and ambiguities in relatively graceful and user-friendly ways. Instead of simply rejecting such input and defaulting to traditional input modes or simply asking the user to try again, a preferred embodiment of the present invention seeks to converge rapidly toward instantiation of a valid navigational template by soliciting additional clarification from the user as necessary, either before or after a navigation of the data source, via multimodal input, i.e., by means of menu selection or other input modalities including and in addition to spoken natural language. This clarifying, multi-modal dialogue takes advantage of whatever partial navigational information has been gleaned from the initial interpretation of the user's NL request. This clarification process continues until the system converges toward an adequately instantiated navigational template, which is in turn used to navigate the network-based data and retrieve the user's desired information. The retrieved information is transmitted across the network and presented to the user on a suitable client display device.

-3-

Page 11 of 214

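The clarification behavior described in the Summary -- unfilled or ambiguous slots trigger a multimodal dialogue rather than outright rejection -- can be sketched as a simple slot-filling loop. This is an illustrative sketch, not the patent's code; the slot names, the list-means-ambiguous convention, and the `ask_user` callback are all assumptions of the sketch.

```python
# Illustrative sketch of converging on a valid navigational template:
# a None slot means an under-constrained request, and a list of candidate
# values means an ambiguous recognition result; both solicit clarification.

def clarify_template(template: dict, ask_user) -> dict:
    """Repeatedly solicit clarification until every slot holds exactly
    one unambiguous value, then return the instantiated template."""
    filled = dict(template)
    for slot, value in filled.items():
        if value is None:                # under-constrained request
            filled[slot] = ask_user(f"Please choose a {slot}:")
        elif isinstance(value, list):    # ambiguous recognition result
            filled[slot] = ask_user(f"Did you mean one of {value} for {slot}?")
    return filled


# Simulated user answers, arriving via menu selection or further speech:
answers = iter(["Eastwood", "Unforgiven"])
result = clarify_template(
    {"director": ["Eastwood", "Easter"], "title": None, "genre": "western"},
    ask_user=lambda prompt: next(answers),
)
print(result)
# {'director': 'Eastwood', 'title': 'Unforgiven', 'genre': 'western'}
```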
In a further aspect of the present invention, the construction of the navigation query includes extracting an input template for an online scripted interface to the data source and using the input template to construct the navigation query. The extraction of the input template can include dynamically scraping the online scripted interface.

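One way such scraping of a scripted (e.g., CGI) interface might look, using only the Python standard library: parse the form markup and collect its named fields as an input template whose slots can later be filled from the interpreted request. The form markup and field names below are hypothetical, and this is a minimal sketch rather than the application's actual extraction logic.

```python
# Minimal sketch: extract an input template from an HTML/CGI form.

from html.parser import HTMLParser


class FormTemplateExtractor(HTMLParser):
    """Collect the named input fields of an HTML form, yielding an input
    template (field name -> default value, if any)."""

    def __init__(self):
        super().__init__()
        self.template = {}

    def handle_starttag(self, tag, attrs):
        if tag in ("input", "select"):
            attrs = dict(attrs)
            name = attrs.get("name")
            if name:
                # Default value if the markup supplies one, else None.
                self.template[name] = attrs.get("value")


cgi_form = """
<form action="/cgi-bin/movies" method="get">
  <input name="title">
  <input name="director">
  <select name="genre"></select>
  <input type="hidden" name="site" value="demo">
</form>
"""

extractor = FormTemplateExtractor()
extractor.feed(cgi_form)
print(extractor.template)
# {'title': None, 'director': None, 'genre': None, 'site': 'demo'}
```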
-4-

Page 12 of 214

BRIEF DESCRIPTION OF THE DRAWINGS

The invention, together with further advantages thereof, may best be understood by reference to the following description taken in conjunction with the accompanying drawings in which:

Figure 1a illustrates a system providing a spoken natural language interface for network-based information navigation, in accordance with an embodiment of the present invention with server-side processing of requests;

Figure 1b illustrates another system providing a spoken natural language interface for network-based information navigation, in accordance with an embodiment of the present invention with client-side processing of requests;

Figure 2 illustrates a system providing a spoken natural language interface for network-based information navigation, in accordance with an embodiment of the present invention for a mobile computing scenario;

Figure 3 illustrates the functional logic components of a request processing module in accordance with an embodiment of the present invention;

Figure 4 illustrates a process utilizing spoken natural language for navigating an electronic database in accordance with one embodiment of the present invention;

Figure 5 illustrates a process for constructing a navigational query for accessing an online data source via an interactive, scripted (e.g., CGI) form; and

Figure 6 illustrates an embodiment of the present invention utilizing a community of distributed, collaborating electronic agents.

-5-

Page 13 of 214

`
`
`
DETAILED DESCRIPTION OF THE INVENTION

1. System Architecture

a. Server-End Processing of Spoken Input

Figure 1a is an illustration of a data navigation system driven by spoken natural language input, in accordance with one embodiment of the present invention. As shown, a user's voice input data is captured by a voice input device 102, such as a microphone. Preferably voice input device 102 includes a button or the like that can be pressed or held down to activate a listening mode, so that the system need not continually pay attention to, or be confused by, irrelevant background noise. In one preferred embodiment well-suited for the home entertainment setting, voice input device 102 is a portable remote control device with an integrated microphone, and the voice data is transmitted from device 102 preferably via infrared (or other wireless) link to communications box 104 (e.g., a set-top box or a similar communications device that is capable of retransmitting the raw voice data and/or processing the voice data) local to the user's environment and coupled to communications network 106. The voice data is then transmitted across network 106 to a remote server or servers 108. The voice data may preferably be transmitted in compressed digitized form, or alternatively -- particularly where bandwidth constraints are significant -- in analog format (e.g., via frequency modulated transmission), in the latter case being digitized upon arrival at remote server 108.

At remote server 108, the voice data is processed by request processing logic 300 in order to understand the user's request and construct an appropriate query or request for navigation of remote data source 110, in accordance with the interpretation process exemplified in Figure 4 and Figure 5 and discussed in greater detail below. For purposes of executing this process, request processing logic 300 comprises functional modules including speech recognition engine 310, natural language (NL) parser 320, query construction logic 330, and query refinement logic 340, as shown in Figure 3. Data source 110 may comprise database(s), Internet/web site(s), or other electronic information repositories, and preferably resides on a central server or servers -- which may or may not be the same as server 108, depending on the storage

-6-

Page 14 of 214

and bandwidth needs of the application and the resources available to the practitioner. Data source 110 may include multimedia content, such as movies or other digital video and audio content, other various forms of entertainment data, or other electronic information. The contents of data source 110 are navigated -- i.e., the contents are accessed and searched, for retrieval of the particular information desired by the user -- using the processes of Figures 4 and 5 as described in greater detail below.

Once the desired information has been retrieved from data source 110, it is electronically transmitted via network 106 to the user for viewing on client display device 112. In a preferred embodiment well-suited for the home entertainment setting, display device 112 is a television monitor or similar audiovisual entertainment device, typically in stationary position for comfortable viewing by users. In addition, in such preferred embodiment, display device 112 is coupled to or integrated with a communications box (which is preferably the same as communications box 104, but may also be a separate unit) for receiving and decoding/formatting the desired electronic information that is received across communications network 106.

Network 106 is a two-way electronic communications network and may be embodied in electronic communication infrastructure including coaxial (cable television) lines, DSL, fiber-optic cable, traditional copper wire (twisted pair), or any other type of hardwired connection. Network 106 may also include a wireless connection such as a satellite-based connection, cellular connection, or other type of wireless connection. Network 106 may be part of the Internet and may support TCP/IP communications, or may be embodied in a proprietary network, or in any other electronic communications network infrastructure, whether packet-switched or connection-oriented. A design consideration is that network 106 preferably provide suitable bandwidth depending upon the nature of the content anticipated for the desired application.

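The server-side composition of the four functional modules of request processing logic 300 (speech recognition engine 310, NL parser 320, query construction logic 330, query refinement logic 340) can be sketched as a simple pipeline. Each function below is a stub standing in for the corresponding engine, and the returned strings are illustrative assumptions only.

```python
# Hedged sketch of the server-side pipeline of Figure 3; all stubs.

def recognize_speech(voice_data: bytes) -> str:
    """Speech recognition engine 310: acoustic voice data -> text."""
    return "show me westerns directed by Clint Eastwood"


def parse_natural_language(utterance: str) -> dict:
    """NL parser 320: text -> structured interpretation frame."""
    return {"genre": "western", "director": "Clint Eastwood"}


def construct_query(frame: dict) -> str:
    """Query construction logic 330: frame -> navigation query."""
    return ("SELECT * FROM movies WHERE genre = 'western' "
            "AND director = 'Clint Eastwood';")


def handle_request(voice_data: bytes) -> str:
    """Voice data in, navigation query out. Query refinement logic 340
    would intervene here when interpretation is ambiguous or incomplete."""
    utterance = recognize_speech(voice_data)      # module 310
    frame = parse_natural_language(utterance)     # module 320
    return construct_query(frame)                 # module 330


print(handle_request(b"\x00\x01"))
```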
b. Client-End Processing of Spoken Input

Figure 1b is an illustration of a data navigation system driven by spoken natural language input, in accordance with a second embodiment of the present invention. Again, a user's voice input data is captured by a voice input device 102, such as a microphone. In the embodiment shown in Figure 1b, the voice data is

-7-

Page 15 of 214

transmitted from device 202 to request processing logic 300, hosted on a local speech processor, for processing and interpretation. In the preferred embodiment illustrated in Figure 1b, the local speech processor is conveniently integrated as part of communications box 104, although implementation in a physically separate (but communicatively coupled) unit is also possible as will be readily apparent to those of skill in the art. The voice data is processed by the components of request processing logic 300 in order to understand the user's request and construct an appropriate query or request for navigation of remote data source 110, in accordance with the interpretation process exemplified in Figures 4 and 5 as discussed in greater detail below.

The resulting navigational query is then transmitted electronically across network 106 to data source 110, which preferably resides on a central server or servers 108. As in Figure 1a, data source 110 may comprise database(s), Internet/web site(s), or other electronic information repositories, and preferably may include multimedia content, such as movies or other digital video and audio content, other various forms of entertainment data, or other electronic information. The contents of data source 110 are then navigated -- i.e., the contents are accessed and searched, for retrieval of the particular information desired by the user -- preferably using the process of Figures 4 and 5 as described in greater detail below. Once the desired information has been retrieved from data source 110, it is electronically transmitted via network 106 to the user for viewing on client display device 112.

In one embodiment in accordance with Figure 1b and well-suited for the home entertainment setting, voice input device 102 is a portable remote control device with an integrated microphone, and the voice data is transmitted from device 102 preferably via infrared (or other wireless) link to the local speech processor. The local speech processor is coupled to communications network 106, and also preferably to client display device 112 (especially for purposes of query refinement transmissions, as discussed below in connection with Figure 4, step 412), and preferably may be integrated within or coupled to communications box 104. In addition, especially for purposes of a home entertainment application, display device 112 is preferably a television monitor or similar audiovisual entertainment device, typically in stationary position for comfortable viewing by users. In addition, in such

-8-

Page 16 of 214

`
`
`
preferred embodiment, display device 112 is coupled to a communications box (which is preferably the same as communications box 104, but may also be a physically separate unit) for receiving and decoding/formatting the desired electronic information that is received across communications network 106.

Design considerations favoring server-side processing and interpretation of spoken input requests, as exemplified in Figure 1a, include minimizing the need to distribute costly computational hardware and software to all client users in order to perform speech and language processing. Design considerations favoring client-side processing, as exemplified in Figure 1b, include minimizing the quantity of data sent upstream across the network from each client, as the speech recognition is performed before transmission across the network and only the query data and/or request needs to be sent, thus reducing the upstream bandwidth requirements.

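The bandwidth trade-off between the two architectures can be illustrated with back-of-the-envelope arithmetic. The sampling parameters here (8 kHz, 16-bit samples) are hypothetical figures for the sketch, not values taken from the application.

```python
# Illustrative comparison (hypothetical numbers) of upstream payload:
# raw digitized voice (server-side recognition) vs. recognized text only
# (client-side recognition).

def upstream_bytes_server_side(seconds: float,
                               sample_rate: int = 8000,
                               bytes_per_sample: int = 2) -> int:
    """Client ships raw digitized voice; the server does all recognition."""
    return int(seconds * sample_rate * bytes_per_sample)


def upstream_bytes_client_side(utterance: str) -> int:
    """Client recognizes locally and ships only the text of the request."""
    return len(utterance.encode("utf-8"))


voice = upstream_bytes_server_side(3.0)  # a three-second spoken request
text = upstream_bytes_client_side("show me westerns directed by Clint Eastwood")
print(voice, text)
# 48000 43
```

Even with these coarse assumptions, the recognized text is three orders of magnitude smaller than the raw voice data, which is the upstream-bandwidth argument made above for client-side processing.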
c. Mobile Client Embodiment

A mobile computing embodiment of the present invention may be implemented by practitioners as a variation on the embodiments of either Figure 1a or Figure 1b. For example, as depicted in Figure 2, a mobile variation in accordance with the server-side processing architecture illustrated in Figure 1a may be implemented by replacing voice input device 102, communications box 104, and client display device 112, with an integrated, mobile, information appliance 202 such as a cellular telephone or wireless personal digital assistant (wireless PDA). Mobile information appliance 202 essentially performs the functions of the replaced components. Thus, mobile information appliance 202 receives spoken natural language input requests from the user in the form of voice data, and transmits that data (preferably via wireless data receiving station 204) across communications network 206 for server-side interpretation of the request, in similar fashion as described above in connection with Figure 1. Navigation of data source 210 and retrieval of desired information likewise proceeds in an analogous manner as described above. Display information transmitted electronically back to the user across network 206 is displayed for the user on the display of information appliance 202, and audio information is output through the appliance's speakers.

-9-

Page 17 of 214

Practitioners will further appreciate, in light of the above teachings, that if mobile information appliance 202 is equipped with sufficient computational processing power, then a mobile variation of the client-side architecture exemplified in Figure 2 may similarly be implemented. In that case, the modules corresponding to request processing logic 300 would be embodied locally in the computational resources of mobile information appliance 202, and the logical flow of data would otherwise follow in a manner analogous to that previously described in connection with Figure 1b.

As illustrated in Figure 2, multiple users, each having their own client input device, may issue requests, simultaneously or otherwise, for navigation of data source 210. This is equally true (though not explicitly drawn) for the embodiments depicted in Figures 1a and 1b. Data source 210 (or 110), being a network accessible information resource, has typically already been constructed to support access requests from simultaneous multiple network users, as known by practitioners of ordinary skill in the art. In the case of server-side speech processing, as exemplified in Figures 1a and 2, the interpretation logic and error correction logic modules are also preferably designed and implemented to support queuing and multi-tasking of requests from multiple simultaneous network users, as will be appreciated by those of skill in the art.

It will be apparent to those skilled in the art that additional implementations, permutations and combinations of the embodiments set forth in Figures 1a, 1b, and 2 may be created without straying from the scope and spirit of the present invention. For example, practitioners will understand, in light of the above teachings and design considerations, that it is possible to divide and allocate the functional components of request processing logic 300 between client and server. For example, speech recognition -- in entirety, or perhaps just early stages such as feature extraction -- might be performed locally on the client end, perhaps to reduce bandwidth requirements, while natural language parsing and other necessary processing might be performed upstream on the server end, so that more extensive computational power need not be distributed locally to each client. In that case, corresponding portions of request processing logic 300, such as speech recognition engine 310 or portions

-10-

Page 18 of 214

thereof, would reside locally at the client as in Figure 1b, while other component modules would be hosted at the server end as in Figures 1a and 2.

Further, practitioners may choose to implement each of the various embodiments described above on any number of different hardware and software computing platforms and environments and various combinations thereof, including, by way of just a few examples: a general-purpose hardware microprocessor such as the Intel Pentium series; operating system software such as Microsoft Windows/CE, Palm OS, or Apple Mac OS (particularly for client devices and client-side processing), or Unix, Linux, or Windows/NT (the latter three particularly for network data servers and server-side processing); and/or proprietary information access platforms such as Microsoft's WebTV or the Diva Systems video-on-demand system.

2. Processing Methodology

The present invention provides a spoken natural language interface for interrogation of remote electronic databases and retrieval of desired information. A preferred embodiment of the present invention utilizes the basic methodology outlined in the flow diagram of Figure 4 in order to provide this interface. This methodology will now be discussed.

a. Interpreting Spoken Natural Language Requests

At step 402, the user's spoken request for information is initially received in the form of raw (acoustic) voice