PCT
`INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
`WORLD INTELLECTUAL PROPERTY ORGANIZATION
`International Bureau
`
`(51) International Patent Classification 7: H04L
`(11) International Publication Number: WO 00/21232 A2
`(43) International Publication Date: 13 April 2000 (13.04.00)
`(21) International Application Number: PCT/US99/23008
`(22) International Filing Date: 1 October 1999 (01.10.99)
`(81) Designated States: CA, CN, IL, IN, JP, KR, US, European patent (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE).
`
`(30) Priority Data:
`60/102,957    2 October 1998 (02.10.98)    US
`60/117,595    27 January 1999 (27.01.99)    US
`
`Published
`Without international search report and to be republished upon receipt of that report.
`
`(71) Applicant (for all designated States except US): INTERNATIONAL BUSINESS MACHINES CORPORATION [US/US]; Old Orchard Road, Armonk, NY 10504 (US).
`
`(72) Inventors; and
`(75) Inventors/Applicants (for US only): GOPALAKRISHNAN, Ponani [IN/US]; 3073 Radcliff Drive, Yorktown Heights, NY 10598 (US). LUCAS, Bruce, D. [US/US]; 2408 Mill Pond Road, Yorktown Heights, NY 10598 (US). MAES, Stephane, H. [BE/US]; 1 Wintergreen Hill Road, Danbury, CT 06811 (US). NAHAMOO, David [IR/US]; 12 Elmwood Road, White Plains, NY 10605 (US). SEDIVY, Jan [CZ/CZ]; U lesa 11, Praha (CZ).
`
`(74) Agent: OTTERSTEDT, Paul, J.; International Business Machines Corporation, Yorktown IP Law Department, T.J. Watson Research Center, Route 134 and Kitchawan Road, Yorktown Heights, NY 10598 (US).
`
`(54) Title: CONVERSATIONAL BROWSER AND CONVERSATIONAL SYSTEMS
`
`(57) Abstract
`
`A conversational browsing system (10) comprising a conversational browser (11) having a command and control interface (12) for converting speech commands or multi-modal input from I/O resources (27) into a navigation request, a processor (14) for parsing and interpreting a CML (conversational markup language) file, the CML file comprising meta-information representing a conversational user interface for presentation to a user. The system (10) comprises conversational engines (23) for decoding input commands for interpretation by the command and control interface and decoding meta-information provided by the CML processor for generating synthesized audio output. The browser (11) accesses the engine (23) via system calls through a system platform (15). The system includes a communication stack (19) for transmitting the navigation request to a content server and receiving a CML file from the content server based on the navigation request. A conversational transcoder (13) transforms presentation material from one modality to a conversational modality. The transcoder (13) includes a functional transcoder (13a) to transform a page of GUI to a page of CUI (conversational user interface) and a logical transcoder (13b) to transform business logic of an application, transaction or site into an acceptable dialog. Conversational transcoding can convert HTML files into CML files that are interpreted by the conversational browser (11).
`
`[Front-page drawing: conversational browsing system 10, showing the conversational browser 11 with its transcoder (functional transcoder and logical transcoder 13b); the conversational API/CVM API 16; the conversational virtual machine (CVM) 17; the communication stack 19 with conventional protocols (TCP/IP, HTTP, WAP, etc.) and conversational protocols (coordination, registration, discovery, negotiation, speech coding, etc.); conversational drivers/APIs; and audio capture, compression, decompression and reconstruction 33.]
`
`Amazon Exhibit 1006
`IPR Petition - USP 9,716,732
`
`

`

`FOR THE PURPOSES OF INFORMATION ONLY
`
`Codes used to identify States party to the PCT on the front pages of pamphlets publishing international applications under the PCT.
`
`AL  Albania
`AM  Armenia
`AT  Austria
`AU  Australia
`AZ  Azerbaijan
`BA  Bosnia and Herzegovina
`BB  Barbados
`BE  Belgium
`BF  Burkina Faso
`BG  Bulgaria
`BJ  Benin
`BR  Brazil
`BY  Belarus
`CA  Canada
`CF  Central African Republic
`CG  Congo
`CH  Switzerland
`CI  Cote d'Ivoire
`CM  Cameroon
`CN  China
`CU  Cuba
`CZ  Czech Republic
`DE  Germany
`DK  Denmark
`EE  Estonia
`ES  Spain
`FI  Finland
`FR  France
`GA  Gabon
`GB  United Kingdom
`GE  Georgia
`GH  Ghana
`GN  Guinea
`GR  Greece
`HU  Hungary
`IE  Ireland
`IL  Israel
`IS  Iceland
`IT  Italy
`JP  Japan
`KE  Kenya
`KG  Kyrgyzstan
`KP  Democratic People's Republic of Korea
`KR  Republic of Korea
`KZ  Kazakstan
`LC  Saint Lucia
`LI  Liechtenstein
`LK  Sri Lanka
`LR  Liberia
`LS  Lesotho
`LT  Lithuania
`LU  Luxembourg
`LV  Latvia
`MC  Monaco
`MD  Republic of Moldova
`MG  Madagascar
`MK  The former Yugoslav Republic of Macedonia
`ML  Mali
`MN  Mongolia
`MR  Mauritania
`MW  Malawi
`MX  Mexico
`NE  Niger
`NL  Netherlands
`NO  Norway
`NZ  New Zealand
`PL  Poland
`PT  Portugal
`RO  Romania
`RU  Russian Federation
`SD  Sudan
`SE  Sweden
`SG  Singapore
`SI  Slovenia
`SK  Slovakia
`SN  Senegal
`SZ  Swaziland
`TD  Chad
`TG  Togo
`TJ  Tajikistan
`TM  Turkmenistan
`TR  Turkey
`TT  Trinidad and Tobago
`UA  Ukraine
`UG  Uganda
`US  United States of America
`UZ  Uzbekistan
`VN  Viet Nam
`YU  Yugoslavia
`ZW  Zimbabwe
`
`

`

`
`CONVERSATIONAL BROWSER AND
`CONVERSATIONAL SYSTEMS
`
`This application is based on provisional applications U.S. Serial No. 60/102,957, filed on October 2, 1998, and U.S. Serial No. 60/117,595, filed on January 27, 1999.
`
`BACKGROUND
`
`1. Technical Field:
`
`The present invention relates generally to systems and methods for accessing information and, more particularly, to a conversational browser that provides unification of the access to various information sources to a standard network protocol (such as HTTP) thereby allowing a pure GUI (graphical user interface) modality and pure speech interface modality to be used individually (or in combination) to access the same bank of transaction and information services without the need for modifying the current networking infrastructure.
`
`2. Description of Related Art:
`
`Currently, there is widespread use of IVR (Interactive Voice Response) services for telephony access to information and transactions. An IVR system uses spoken directed dialog and generally operates as follows. A user will dial into an IVR system and then listen to audio prompts that provide choices for accessing certain menus and particular information. Each choice is either assigned to one number on the phone keypad or associated with a word to be uttered by the user (in voice enabled IVRs) and the user will make a desired selection by pushing the appropriate button or uttering the proper word. Conventional IVR applications are typically written in specialized script languages that are offered by manufacturers in various incarnations and for different HW (hardware) platforms. The development and maintenance of such IVR applications requires qualified staff. Conventional IVR applications use specialized (and expensive) telephony HW, and each IVR application uses different SW (software) layers for accessing legacy database servers. These layers must be specifically designed for each application.
`
`Furthermore, IVR systems are not designed to handle GUI or other modalities other than DTMF and speech. Although it is possible to mix binary data and voice on a conventional analog connection, it is not possible to do so with a conventional IVR as the receiver. Therefore, IVR systems typically do not allow data/binary input and voice to be merged. Currently, such service would require a separate system configured for handling binary connections (e.g., a form of modem). In the near future, Voice over IP (VoIP) and wireless communication (e.g., GSM) will allow simultaneous transmission of voice and data. Currently, more than one simultaneous call is needed for simultaneous exchange of binary and voice (which, as explained later, is useful for adequately handling specialized tasks), or a later call or callback is required for asynchronous transmission of the data. This is typically not convenient. In particular, the data exchange can be more than sending or receiving compressed speech and information related to building a speech UI; it can also be the information necessary to add modalities to the UI (e.g., GUI). Assuming that services will be using multiple lines to offer, for example, a voice in/web out (or voice in/web and voice out) modality where the results of the queries and the presentation material also result in GUI material (e.g., HTML displayed on a GUI browser like Netscape Navigator), the service provider must now add all the IT infrastructure and backend to appropriately network and synchronize its backends, IVR and web servers. A simple but very difficult task is the coordination between the behavior/evolution of the speech presentation material with respect to the GUI or HTML portion of the presentation.
`
`With the rapidly increasing evolution of mobile and home computing, as well as the prevalence of the Internet, the use of networked PCs, NCs, information kiosks and other consumer devices (as opposed to IVR telephony services) to access information services and transactions has also become widespread. Indeed, the explosion of Internet and Intranet has afforded access to virtually every possible information source, database or transaction accessible through what is generally known as a GUI "Web browser," with the conversion of the data and the transactions being performed via proxies, servers and/or transcoders.
`
`In general, a Web browser is an application program (or client program) that allows a user to view and interact with information on the WWW (World Wide Web or the "Web") (i.e., a client program that utilizes HTTP (Hypertext Transfer Protocol) to make requests of HTTP servers on the Internet). The HTTP servers on the Internet include "Web pages" that are written in standard HTML (Hypertext Markup Language). An Internet Web page may be accessed from an HTTP server over a packet-switched network, interpreted by the Web browser, and then presented to the user in graphical form. The textual information presented to the user includes highlighted hyperlinks to new sources of information. The user can then select a hyperlink by, e.g., clicking on it with a mouse, to download a new Web page for presentation by the Web browser. The access to legacy databases over the Internet is enabled by several known standards such as LiveWire and JDBC (Java Database Connectivity). Furthermore, Web pages can include executable code such as applets (e.g., Java programs) that can be downloaded from a server and executed on the browser or on a JVM (Java virtual machine) of the system on top of which the browser is built. Other information can be provided by servlets (e.g., Java programs) running on the server and pushing changes in the connected browser. The applets and servlets can include CGI (common gateway interface) functions which allow a Web server and applications to communicate with each other. In addition, other information accessing methods include scripts which are predetermined program languages that are interpreted and executed on the browser. This includes, for example, JavaScript and DHTML (Dynamic HTML) languages. Plug-ins are programs outside the browser that can be downloaded by the browser and automatically recognized by the browser to run native on the local device and be executed on arguments that are subsequently provided (via download) by the browser. CGI scripts are server side scripts that implement the business logic and produce, as the output of running them, the next presentation material. Applets and plug-ins can communicate via RMI (remote method invocation), socket connections, RPC (remote procedure call), etc. In addition, complex transcoding schemes, XML (Extensible Markup Language) extensions and scripting languages are used for specific information or services or to simplify the interaction.
`
`As explained above, the purpose of the Internet Web browser and IVR is to access information. The following example describes a typical scenario in connection with a banking application to demonstrate that the paradigm used for accessing the information via IVR with a telephone and via the Internet using a PC and Web browser is similar. For instance, the typical banking ATM transaction allows a customer to perform money transfers between savings, checking and credit card accounts, and check account balances using IVR over the telephone. These transactions can also be performed using a PC with Internet access and a Web browser. In general, using the PC, the customer can obtain information in the form of text menus. In the case of the telephone, the information is presented via audio menus. The mouse clicks on the PC application are transformed to pushing telephone buttons or spoken commands. More specifically, a typical home banking IVR application begins with a welcome message. Similarly, the Internet home page of the Bank may display a picture and welcome text and allow the user to choose from a list of services, for example:
`
`a. instant account information;
`b. transfer and money payment;
`c. fund information;
`d. check information;
`e. stock quotes; and
`f. help.
`
`With the IVR application, the above menu can be played to the user over the telephone, whereby the menu messages are followed by the number or button the user should press to select the desired option:
`
`a. "for instant account information, press one;"
`b. "for transfer and money payment, press two;"
`c. "for fund information, press three;"
`d. "for check information, press four;"
`e. "for stock quotes, press five;"
`f. "for help, press seven;"
`
`The IVR system may implement speech recognition in lieu of, or in addition to, DTMF keys. Let's assume that the user wants to get the credit card related information. To obtain this information via the Internet based application, the user would click on a particular hypertext link in a menu to display the next page. In the telephone application, the user would press the appropriate telephone key to transmit a corresponding DTMF signal. Then, the next menu that is played back may be:
`
`a. "for available credit, press one";
`b. "for outstanding balance, press two";
`c. "if your account is linked to the checking account, you can pay your credit card balance, press three."
`
`Again, the user can make a desired selection by pressing the appropriate key.
`
`To continue, the user may be prompted to provide identification information. For this purpose, the Internet application may display, for example, a menu with an empty field for the user's account number and another for the user's social security number. After the information is filled in, it is posted to the server and processed, and the reply is formatted and sent back to the user. Over the telephone the scenario is the same. The IVR system may play back (over the telephone) an audio prompt requesting the user to enter his/her account number (via DTMF or speech), and the information is received from the user by processing the DTMF signaling or recognizing the speech. The user may then be prompted to input his/her SSN and the reply is processed in a similar way. When the processing is complete, the information is sent to a server, wherein the account information is accessed, formatted to audio replay, and then played back to the user over the telephone.
`
`As demonstrated above, IVRs use the same paradigm for information access as Web browsers and fulfill the same functionality. Indeed, beyond their interface and modality differences, IVR systems and Web browsers are currently designed and developed as fundamentally different systems. In the near future, however, banks and large corporations will be moving their publicly accessible information sources to the Internet while keeping the old IVRs. Unfortunately, this would require these institutions to maintain separate systems for the same type of information and transaction services. It would be beneficial for banks and corporations to be able to provide information and services via IVR over the Internet using the existing infrastructure. In view of this, a universal system and method that would allow a user to access information and perform transactions over the Internet using IVR and conventional browsers is desired.
`
`SUMMARY OF THE INVENTION
`
`The present invention is directed to a system and method for unifying the access to applications to a standard protocol, irrespective of the mode of access. In particular, the present invention provides a universal method and system for accessing information and performing transactions utilizing, for example, a standard networking protocol based on TCP/IP (such as HTTP (Hypertext Transfer Protocol) or WAP (Wireless Application Protocol)) and architecture to access information from, e.g., an HTTP server over the Internet such that a pure GUI (graphical user interface) modality and pure speech interface modality can be used individually (or in combination) to access the same bank of transaction and information services without requiring modification of the current infrastructure of currently available networks.
`
`
`In one embodiment of the present invention, a conversational browser is provided that translates commands over the telephone to an HTTP protocol. The introduction of the conversational browser allows us to unify Internet and Telephone (IVR) and thereby decrease the cost and enlarge the coverage and flexibility of such applications. In particular, for IVR applications, the conversational browser (or telephony browser) can interpret DTMF signaling and/or spoken commands from a user, generate HTTP requests to access information from the appropriate HTTP server, and then interpret HTML-based information and present it to the user via audio messages. The conversational browser can also decode compressed audio which is received from the HTTP server in the HTTP protocol, and play it back, reconstructed, to the user. Conversely, it can capture the audio and transmit it (compressed or not) to the server for distributed recognition and processing. When the audio is captured locally and shipped to the server, this can be done with a plug-in (native implementation) or, for example, with a Java applet or Java program using audio and multimedia API to capture the user's input.
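
The step in which the conversational browser turns a key press into an HTTP request can be sketched as follows. This is a minimal illustration only, not the implementation described in the application: the `MENU` table, the `dtmf_to_request` function, the relative paths, and the `bank.example` host are all hypothetical names chosen to mirror the banking menu in the Background.

```python
from urllib.parse import urlencode, urljoin

# Hypothetical menu: maps a DTMF key to a relative link on the content server.
MENU = {
    "1": "account/info",
    "2": "transfer",
    "3": "funds",
}


def dtmf_to_request(base_url, menu, key, params=None):
    """Translate one DTMF key press into the URL of an HTTP GET request."""
    if key not in menu:
        raise KeyError(f"no menu entry for key {key!r}")
    url = urljoin(base_url, menu[key])
    if params:
        # Extra user input (e.g. an account number) travels as query parameters.
        url = f"{url}?{urlencode(params)}"
    return url


url = dtmf_to_request("http://bank.example/ivr/", MENU, "1", {"account": "12345"})
```

A spoken command could be handled the same way by having a recognizer (not shown) map the utterance to one of the menu keys before calling the function.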
`
`For the new proposed IVR architecture and conversational browser, the content pages are on the same HTTP servers that are accessed by conventional modes such as GUI browsers, and use the same information access methods, sharing the legacy database access SW layer, etc. In other words, an IVR is now a special case of an HTTP server with a conversational browser. As with the conventional GUI browser and PC, with the conversational browser the information and queries will be sent over the packet-switched network using the same protocol (HTTP).
`
`The present invention will allow an application designer to set up the application using one framework, irrespective of the mode of access, whether it is through telephone or a WWW browser. All interactions between the application and the client are standardized to the HTTP protocol, with information presented through HTML and its extensions, as appropriate. The application on the WWW server has access to the type of client that is accessing the application (telephone, PC browser or other networked consumer device) and the information that is presented to the client can be structured appropriately. The application still needs to support only one standard protocol for client access. In addition, the application and content are presented in a uniform framework which is easy to design, maintain and modify.
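
As a rough sketch of how one application might structure the same content differently depending on the type of client, consider the following. The `render_menu` function and the `"telephone"` and `"pc"` labels are assumptions made for illustration; the application does not specify how the client type is signaled or how the output is built.

```python
from html import escape


def render_menu(services, client_type):
    """Return the same list of services as an audio prompt or as an HTML list."""
    if client_type == "telephone":
        # Voice clients get one spoken sentence per option, numbered from 1.
        return " ".join(
            f"For {name}, press {number}."
            for number, name in enumerate(services, start=1)
        )
    # Any other client is assumed here to be a GUI browser.
    items = "".join(f"<li>{escape(name)}</li>" for name in services)
    return f"<ul>{items}</ul>"


services = ["instant account information", "stock quotes", "help"]
prompt = render_menu(services, "telephone")
page = render_menu(services, "pc")
```

In both cases the application logic and the list of services are shared; only the final presentation step differs.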
`
`In another aspect of the present invention, a conversational browser interprets conversational mark-up language (CML) which follows the XML specifications. CML allows new experienced application developers to rapidly develop conversational dialogs. In another aspect, CML may follow other declarative syntax or method. Pursuing further the analogy with HTML and the World Wide Web, CML and the conversational browser provide a simple and systematic way to build a conversational user interface around legacy enterprise applications and legacy databases.
`
`CML files/documents can be accessed from an HTTP server using standard networking protocols. The CML pages describe the conversational UI to be presented to the user via the conversational browser. Preferably, CML pages are defined by tags which are based on the XML application. The primary elements are <page>, <body>, <menu>, and <form>. Pages group other CML elements, and serve as the top-level element for a CML document (as required by XML). Bodies specify output to be spoken by the browser. Menus present the user with a list of choices, and associate with each choice a URL identifying a CML element to visit if the user selects that choice. Forms allow the user to provide one or more pieces of information, where the content of each piece of information is described by a grammar. The form element also specifies a URL to visit when the user has completed the form.
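
To make the roles of the primary elements concrete, the sketch below parses a small CML-like page with Python's standard XML parser. Only the element names <page>, <body> and <menu> come from the text above; the <choice> element, its `target` attribute, and the `read_page` helper are hypothetical and are used here only to show how a menu could associate each choice with a URL.

```python
import xml.etree.ElementTree as ET

# Hypothetical CML page: <choice> and its target attribute are assumptions.
CML_TEXT = """
<page>
  <body>Welcome to the bank.</body>
  <menu>
    <choice target="#balance">account balance</choice>
    <choice target="#transfer">transfer</choice>
  </menu>
</page>
"""


def read_page(text):
    """Return the spoken prompts and the choice-to-URL map of one CML page."""
    root = ET.fromstring(text.strip())
    if root.tag != "page":
        raise ValueError("top-level element must be <page>")
    # Bodies hold the output to be spoken by the browser.
    prompts = [body.text.strip() for body in root.iter("body") if body.text]
    # Menus associate each choice with the URL to visit when it is selected.
    choices = {
        choice.text.strip(): choice.get("target")
        for menu in root.iter("menu")
        for choice in menu.iter("choice")
    }
    return prompts, choices


prompts, choices = read_page(CML_TEXT)
```

A browser built along these lines would hand each prompt to a speech synthesizer and use the choice map as the active vocabulary for the next user input.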
`
`In another aspect, conversational mark-up language rules can be added by a content provider to an HTML file (or used in place of HTML) to take full advantage of the conversational browser.
`
`In yet another aspect, a conversational transcoder transforms presentation material from one modality to a conversational modality (typically, speech only and/or speech and GUI). This involves functional transformation to transform one page of GUI to a page of CUI (conversational user interface), as well as logical transcoding to transform business logic of an application, transaction or site into an acceptable dialog. Conversational transcoding can convert HTML files into CML files that are interpreted by the conversational browser. The transcoder may be a proprietary application of the server, browser or content provider.
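
One very small piece of functional transcoding, turning the hyperlinks of an HTML page into a CML-style menu, might look like the sketch below. It is an assumption-laden illustration rather than the transcoding rules of the application: the `LinkCollector` class, the `html_links_to_cml` function, and the <choice target="..."> output format are hypothetical.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from the <a> elements of an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def html_links_to_cml(html):
    """Emit a hypothetical CML menu with one choice per HTML hyperlink."""
    collector = LinkCollector()
    collector.feed(html)
    choices = "".join(
        f"<choice target={quoteattr(href)}>{escape(text)}</choice>"
        for href, text in collector.links
    )
    return f"<page><menu>{choices}</menu></page>"


cml = html_links_to_cml(
    '<ul><li><a href="balance.html">Balance</a></li>'
    '<li><a href="help.html">Help</a></li></ul>'
)
```

Logical transcoding, which reorganizes the business logic of a whole site into a dialog, is a separate and harder step that this sketch does not attempt.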
`
`In another aspect, HTML/GUI based structure skeletons can be used to capture the dialog logic or business logic of a GUI site. This information can be used to map the site, logic or application. After appropriate organization of the dialog flow, each element can undergo functional transcoding into a speech only content or a multi-modal (synchronized GUI and speech interface) page.
`
`In another aspect, a conversational proxy is provided to modify and/or prepare the content description of the application, logic or site to the capabilities of, e.g., the device, browser and/or engines, preferences of the user or application, load on the servers, traffic on the network, location of the conversational arguments (data files). For instance, the conversational proxy can directly convert proprietary formats such as screen maps of corporate software.
`
`These and other aspects, features and advantages of the present invention will be described and become apparent from the following detailed description of preferred embodiments, which is to be read in connection with the accompanying drawings.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`Fig. 1 is a diagram of a conversational browsing system according to a preferred embodiment of the present invention;
`
`Fig. 2 is a block diagram of a system for accessing information implementing a conversational browsing system according to an embodiment of the present invention;
`
`Fig. 3 is a block diagram of a system for accessing information implementing a conversational browsing system according to another embodiment of the present invention;
`
`Fig. 4a is a block diagram illustrating a distributed system for accessing information implementing a conversational browsing system according to an embodiment of the present invention;
`
`Fig. 4b is a block diagram illustrating a distributed system for accessing information implementing a conversational browsing system according to another embodiment of the present invention;
`
`Fig. 5 is a block diagram of a conversational information accessing system using conversational markup language according to an embodiment of the present invention;
`
`Fig. 6 is a general diagram of a distributed conversational system using conversational markup language according to an embodiment of the present invention;
`
`Fig. 7 is a diagram of an exemplary distributed conversational system using conversational markup language according to an embodiment of the present invention;
`
`Fig. 8 is a diagram of another exemplary distributed conversational system using conversational markup language according to another embodiment of the present invention;
`
`Fig. 9 is a diagram of yet another distributed conversational information accessing system using conversational markup language according to an embodiment of the present invention; and
`
`Fig. 10 is a diagram of another exemplary distributed conversational information accessing system using conversational markup language according to an embodiment of the present invention.
`
`DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
`
`The present invention is directed to a conversational browsing system and CML (conversational markup language) for building a conversational browsing system using a set of interlinked CML pages. This conversational system is conceptually analogous to building conventional GUI browser applications using a set of interlinked pages written using HTML (hypertext markup language). Moreover, just as HTML provides a set of mechanisms for translating GUI actions into application actions such as visiting other pages or communicating with a server, the conversational browser and CML are used for translating spoken inputs into similar application actions. A CML page describes the conversational UI to be interpreted and presented to the user via the conversational browser. Preferably, CML pages are defined by tags which are based on the current XML (extensible markup language) application (as described in detail below).
`
`It is to be understood that the conversational systems and methods described herein may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof. In particular, the conversational browser is preferably implemented as an application comprising program instructions that are tangibly embodied on a program storage device (e.g., magnetic floppy disk, RAM, ROM, CD ROM and/or Flash memory) and executable by any device or machine comprising suitable architecture such as personal computers and pervasive computing devices such as PDAs and smart phones.
`
`It is to be further understood that, because some of the constituent components of the conversational browser and other system components depicted in the accompanying Figures are preferably implemented in software, the actual connections between such components may differ depending upon the manner in which the present invention is programmed. Given the teachings herein, one of ordinary skill in the related art will be able to contemplate these and similar implementations or configurations of the present invention.
`
`Conversational Browser Architecture
`
`Referring now to Fig. 1, a block diagram illustrates a conversational browser system according to a preferred embodiment of the present invention. In general, a conversational browsing system 10 allows a user to access legacy information services and transactions through voice input (either uniquely or in conjunction with other modalities such as DTMF, keyboard, mouse, etc.) using a standard networking protocol such as HTTP. In addition, it is to be understood that the HTTP protocol is a preferred embodiment of the present invention but other similar protocols can be used advantageously. For example, this can be deployed on top of any protocol such as TCP/IP, WAP (Wireless Application Protocol), GSM, VoIP, etc., or any other protocol that supports IP (and therefore provides TCP/IP or similar features). Even more generally, if TCP/IP is not available we can implement another protocol offering features similar to TCP/IP or at least performing network and transport functions (the present invention is not dependent on the transport and network layer).
`
`In Fig. 1, a conversational browsing system 10 according to one embodiment of the present invention comprises a conversational browser 11 (conversational application) which executes on top of a CVM (conversational virtual machine) system 15. The conversational browser 11 comprises a transcoder module 13 which, in general, transcodes conventional (legacy) structured document formats such as HTML or DB2 into an intermediate document, or CML (conversational markup language) document, in accordance with prespecified transcoding rules (as discussed below). A CML document describes the conversational UI of the legacy information format to be presented to the user. More specifically, a CML document comprises meta-information which is processed by a CML parser/processor 14 to present, for example, HTML-based information to a user as synthesized audio messages. It is to be understood that various embodiments for a CML document are contemplated for implementation with the present invention. In a preferred embodiment described in detail below, a CML document is defined by tags which are based on XML (extensible markup language). It is to be understood, however, that any declarative method for implementing CML may be employed. XML is currently preferred because of its simplicity, power and current popularity.
`
`The conversational browser 11 further comprises a command/request processor 12 (a command and control interface) which converts user command (multi-modal) inputs such as speech commands, DTMF signals, and keyboard input into navigation requests such as HTTP requests. It is to be understood that in a pure speech conversational browser, the only input is speech. However, the conversational browser 11 can be configured for multi-modal input.
`
`When certain conversational functions or services are needed, the conversational browser 11 will make API calls to the CVM 15 requesting such services (as described below). For instance, when interpreting a CML document (via the CML parser/processor 14), the conversational browser 11 may hook to a TTS (text-to-speech synthesis) engine 26 (via the CVM 15) to provide synthesized speech output to the user. In addition, when speech commands or natural language queries (e.g., navigation requests) are input, the conversational browser 11 may hook to a speech recognition engine 24 and NLU (natural language understanding) engine 25 to process such input commands, thereby allowing the command/request processor to generate, e.g., the appropriate HTTP requests. The CVM system 15 is a shell that can run on top of any conventional OS (operating system) or RTOS (real-time operating system). A detailed discussion of the architecture and operation of the CVM system 15 is provided in the patent application IBM Docket No. YO999-111P, filed concurrently herewith, entitled "Conversational Computing Via Conversational Virtual Machine," which is commonly
