INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)

WORLD INTELLECTUAL PROPERTY ORGANIZATION
International Bureau

(51) International Patent Classification 7: H04L
(11) International Publication Number: WO 00/21232 A2
(43) International Publication Date: 13 April 2000 (13.04.00)
(21) International Application Number: PCT/US99/23008
(22) International Filing Date: 1 October 1999 (01.10.99)
`
`(81) Designated States: CA, CN, IL, IN, JP, KR, US, European
patent (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR,
`IE, IT, LU, MC, NL, PT, SE).
`
(30) Priority Data:
60/102,957    2 October 1998 (02.10.98)     US
60/117,595    27 January 1999 (27.01.99)    US

Published
Without international search report and to be republished
upon receipt of that report.
`
(71) Applicant (for all designated States except US):
INTERNATIONAL BUSINESS MACHINES CORPORATION
[US/US]; Old Orchard Road, Armonk, NY 10504 (US).
`
`(72) Inventors; and
`(75) Inventors/Applicants (for US only): GOPALAKRISHNAN,
`Ponani [IN/US]; 3073 Radcliff Drive, Yorktown Heights,
`NY 10598 (US). LUCAS, Bruce, D. [US/US]; 2408 Mill
`Pond Road, Yorktown Heights, NY 10598 (US). MAES,
`Stephane, H. [BE/US]; 1 Wintergreen Hill Road, Danbury,
CT 06811 (US). NAHAMOO, David [IR/US]; 12 Elmwood
Road, White Plains, NY 10605 (US). SEDIVY, Jan
`[CZ/CZ]; U lesa 11, Praha (CZ).
`
(74) Agent: OTTERSTEDT, Paul, J.; International Business Machines
Corporation, Yorktown IP Law Department, T.J.
`Watson Research Center, Route 134 and Kitchawan Road,
`Yorktown Heights, NY 10598 (US).
`
`(54) Title: CONVERSATIONAL BROWSER AND CONVERSATIONAL SYSTEMS
`
`(57) Abstract
`
A conversational browsing system (10) comprises a conversational browser (11) having a
command and control interface (12) for converting speech commands or multi-modal input from
I/O resources (27) into a navigation request, and a processor (14) for parsing and interpreting
a CML (conversational markup language) file, the CML file comprising meta-information
representing a conversational user interface for presentation to a user. The system (10)
comprises conversational engines (23) for decoding input commands for interpretation by the
command and control interface and decoding meta-information provided by the CML processor
for generating synthesized audio output. The browser (11) accesses the engines (23) via system
calls through a system platform (15). The system includes a communication stack (19) for
transmitting the navigation request to a content server and receiving a CML file from the
content server based on the navigation request. A conversational transcoder (13) transforms
presentation material from one modality to a conversational modality. The transcoder (13)
includes a functional transcoder (13a) to transform a page of GUI to a page of CUI
(conversational user interface) and a logical transcoder (13b) to transform business logic of
an application, transaction or site into an acceptable dialog. Conversational transcoding can
convert HTML files into CML files that are interpreted by the conversational browser (11).
`
[Front-page drawing of the conversational browsing system (10). Recoverable labels:
Conversational Browser; Transcoder (Functional Transcoder, Logical Transcoder);
Conversational Virtual Machine (CVM) (Kernel); Communication Stack; Conventional Protocols
(TCP/IP, HTTP, WAP, etc.); Conversational Protocols (Coordination, Registration, Discovery,
Negotiation, Speech Coding, etc.); Audio Capture, Compression, Decompression, Reconstruction.]
`
`Amazon Exhibit 1006
`IPR Petition - USP 9,716,732
`
`
`
`FOR THE PURPOSES OF INFORMATION ONLY
`
`Codes used to identify States party to the PCT on the front pages of pamphlets publishing international applications under the PCT.
`
AL  Albania
AM  Armenia
AT  Austria
AU  Australia
AZ  Azerbaijan
BA  Bosnia and Herzegovina
BB  Barbados
BE  Belgium
BF  Burkina Faso
BG  Bulgaria
BJ  Benin
BR  Brazil
BY  Belarus
CA  Canada
CF  Central African Republic
CG  Congo
CH  Switzerland
CI  Côte d'Ivoire
CM  Cameroon
CN  China
CU  Cuba
CZ  Czech Republic
DE  Germany
DK  Denmark
EE  Estonia
ES  Spain
FI  Finland
FR  France
GA  Gabon
GB  United Kingdom
GE  Georgia
GH  Ghana
GN  Guinea
GR  Greece
HU  Hungary
IE  Ireland
IL  Israel
IS  Iceland
IT  Italy
JP  Japan
KE  Kenya
KG  Kyrgyzstan
KP  Democratic People's Republic of Korea
KR  Republic of Korea
KZ  Kazakstan
LC  Saint Lucia
LI  Liechtenstein
LK  Sri Lanka
LR  Liberia
LS  Lesotho
LT  Lithuania
LU  Luxembourg
LV  Latvia
MC  Monaco
MD  Republic of Moldova
MG  Madagascar
MK  The former Yugoslav Republic of Macedonia
ML  Mali
MN  Mongolia
MR  Mauritania
MW  Malawi
MX  Mexico
NE  Niger
NL  Netherlands
NO  Norway
NZ  New Zealand
PL  Poland
PT  Portugal
RO  Romania
RU  Russian Federation
SD  Sudan
SE  Sweden
SG  Singapore
SI  Slovenia
SK  Slovakia
SN  Senegal
SZ  Swaziland
TD  Chad
TG  Togo
TJ  Tajikistan
TM  Turkmenistan
TR  Turkey
TT  Trinidad and Tobago
UA  Ukraine
UG  Uganda
US  United States of America
UZ  Uzbekistan
VN  Viet Nam
YU  Yugoslavia
ZW  Zimbabwe
`
`
`
`
`CONVERSATIONAL BROWSER AND
`CONVERSATIONAL SYSTEMS
`
This application is based on provisional applications U.S. Serial No. 60/102,957, filed on
October 2, 1998, and U.S. Serial No. 60/117,595, filed on January 27, 1999.
`
`BACKGROUND
`
`1. Technical Field:
`
The present invention relates generally to systems and methods for accessing information
and, more particularly, to a conversational browser that unifies access to various information
sources under a standard network protocol (such as HTTP), thereby allowing a pure GUI
(graphical user interface) modality and a pure speech interface modality to be used
individually (or in combination) to access the same bank of transaction and information
services without the need for modifying the current networking infrastructure.
`
`2. Description of Related Art:
`
Currently, there is widespread use of IVR (Interactive Voice Response) services for
telephony access to information and transactions. An IVR system uses spoken directed dialog
and generally operates as follows. A user will dial into an IVR system and then listen to
audio prompts that provide choices for accessing certain menus and particular information. Each
choice is either assigned to one number on the phone keypad or associated with a word to be
uttered by the user (in voice-enabled IVRs), and the user will make a desired selection by
pushing the appropriate button or uttering the proper word. Conventional IVR applications are
typically written in specialized script languages that are offered by manufacturers in various
incarnations and for different HW (hardware) platforms. The development and maintenance of
such IVR applications requires qualified staff. Conventional IVR applications use specialized
(and expensive) telephony HW, and each IVR application uses different SW (software) layers for
accessing legacy database servers. These layers must be specifically designed for each
application.
`
Furthermore, IVR systems are not designed to handle GUI or modalities other than DTMF and
speech. Although it is possible to mix binary data and voice on a conventional analog
connection, it is not possible to do so with a conventional IVR as the receiver. Therefore, IVR
systems typically do not allow data/binary input and voice to be merged. Currently, such a
service would require a separate system configured for handling binary connections (e.g., a
form of modem). In the near future, Voice over IP (VoIP) and wireless communication (e.g., GSM)
will allow simultaneous transmission of voice and data. Currently, more than one simultaneous
call is needed for simultaneous exchange of binary data and voice (which, as explained later,
is useful for adequately handling specialized tasks), or a later call or callback is required
for asynchronous transmission of the data. This is typically not convenient. In particular, the
data exchange can be more than sending or receiving compressed speech and information related
to building a speech UI; it can also be the information necessary to add modalities to the UI
(e.g., GUI). Assuming that services will be using multiple lines to offer, for example, a
voice in/web out (or voice in/web and voice out) modality where the results of the queries and
the presentation material also result in GUI material (e.g., HTML displayed on a GUI browser
like Netscape Navigator), the service provider must now add all the IT infrastructure and
backend to appropriately network and synchronize its backends, IVR and web servers. A
conceptually simple but very difficult task is the coordination between the behavior/evolution
of the speech presentation material with respect to the GUI or HTML portion of the presentation.
`
With the rapid evolution of mobile and home computing, as well as the prevalence of the
Internet, the use of networked PCs, NCs, information kiosks and other consumer devices (as
opposed to IVR telephony services) to access information services and transactions has also
become widespread. Indeed, the explosion of the Internet and intranets has afforded access to
virtually every possible information source, database or transaction accessible through what
is generally known as a GUI "Web browser," with the conversion of the data and the
transactions being performed via proxies, servers and/or transcoders.
`
In general, a Web browser is an application program (or client program) that allows a
user to view and interact with information on the WWW (World Wide Web or the "Web") (i.e.,
a client program that utilizes HTTP (Hypertext Transfer Protocol) to make requests of HTTP
servers on the Internet). The HTTP servers on the Internet include "Web pages" that are written
in standard HTML (Hypertext Markup Language). An Internet Web page may be accessed from
an HTTP server over a packet-switched network, interpreted by the Web browser, and then
presented to the user in graphical form. The textual information presented to the user includes
highlighted hyperlinks to new sources of information. The user can then select a hyperlink by,
e.g., clicking on it with a mouse, to download a new Web page for presentation by the Web
browser. Access to legacy databases over the Internet is enabled by several known standards
such as LiveWire and JDBC (Java Database Connectivity). Furthermore, Web pages can
include executable code such as applets (e.g., Java programs) that can be downloaded from a
server and executed on the browser or on a JVM (Java virtual machine) of the system on top of
which the browser is built. Other information can be provided by servlets (e.g., Java programs)
running on the server and pushing changes to the connected browser. The applets and servlets
can include CGI (common gateway interface) functions which allow a Web server and
applications to communicate with each other. In addition, other information accessing methods
include scripts, which are predetermined program languages that are interpreted and executed on
the browser. These include, for example, JavaScript and DHTML (Dynamic HTML) languages.
Plug-ins are programs outside the browser that can be downloaded by the browser and
automatically recognized by the browser to run natively on the local device and be executed on
arguments that are subsequently provided (via download) by the browser. CGI scripts are
server-side scripts that implement the business logic and produce the next presentation
material as their output. Applets and plug-ins can communicate via RMI (remote method
invocation), socket connections, RPC (remote procedure call), etc. In addition, complex
transcoding schemes, XML (Extensible Markup Language) extensions and scripting languages
are used for specific information or services or to simplify the interaction.
`
As explained above, the purpose of the Internet Web browser and IVR is to access
information. The following example describes a typical scenario in connection with a banking
application to demonstrate that the paradigm used for accessing the information via IVR with a
telephone and via the Internet using a PC and Web browser is similar. For instance, the typical
banking ATM transaction allows a customer to perform money transfers between savings,
checking and credit card accounts, and to check account balances, using IVR over the telephone.
These transactions can also be performed using a PC with Internet access and a Web browser. In
general, using the PC, the customer can obtain information in the form of text menus. In the
case of the telephone, the information is presented via audio menus. The mouse clicks on the PC
application are transformed to pushing telephone buttons or spoken commands. More
specifically, a typical home banking IVR application begins with a welcome message. Similarly,
the Internet home page of the Bank may display a picture and welcome text and allow the user to
choose from a list of services, for example:

a. instant account information;
b. transfer and money payment;
c. fund information;
d. check information;
e. stock quotes; and
f. help.
`
With the IVR application, the above menu can be played to the user over the telephone,
whereby the menu messages are followed by the number or button the user should press to select
the desired option:

a. "for instant account information, press one;"
b. "for transfer and money payment, press two;"
c. "for fund information, press three;"
d. "for check information, press four;"
e. "for stock quotes, press five;"
f. "for help, press seven;"
`
The IVR system may implement speech recognition in lieu of, or in addition to, DTMF
keys. Assume that the user wants to get the credit card related information. To obtain this
information via the Internet-based application, the user would click on a particular hypertext
link in a menu to display the next page. In the telephone application, the user would press the
appropriate telephone key to transmit a corresponding DTMF signal. Then, the next menu that
is played back may be:

a. "for available credit, press one";
b. "for outstanding balance, press two";
c. "if your account is linked to the checking account, you can pay your credit card
   balance, press three."
`
`Again, the user can make a desired selection by pressing the appropriate key.
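
The menu-driven paradigm described above can be sketched in a few lines of Python. This is only an illustration of the idea, not part of the original disclosure: the menu names (`main`, `credit_card`), the key assignments and the helper functions are hypothetical, and the prompts are abbreviated from the banking example.

```python
# Hypothetical menu tree modeled loosely on the banking example.
# Each choice maps a DTMF key to (spoken label, name of the next menu or page).
MENUS = {
    "main": {
        "prompt": "Welcome to the bank.",
        "choices": {
            "1": ("instant account information", "credit_card"),
            "2": ("transfer and money payment", "transfer"),
        },
    },
    "credit_card": {
        "prompt": "Credit card services.",
        "choices": {
            "1": ("available credit", "available_credit"),
            "2": ("outstanding balance", "outstanding_balance"),
        },
    },
}

NUMBER_WORDS = {"1": "one", "2": "two", "3": "three"}


def render_prompt(menu_name):
    """Build the text of the audio prompt that would be played for a menu."""
    menu = MENUS[menu_name]
    lines = [menu["prompt"]]
    for key, (label, _target) in menu["choices"].items():
        lines.append(f"For {label}, press {NUMBER_WORDS[key]}.")
    return " ".join(lines)


def select(menu_name, dtmf_key):
    """Return the next menu or page name; stay on the same menu for an invalid key.

    Targets that are not keys of MENUS stand for leaf pages in this sketch.
    """
    choice = MENUS[menu_name]["choices"].get(dtmf_key)
    return choice[1] if choice else menu_name
```

The same table could equally drive a page of hyperlinks, which is the similarity between the telephone and Web paradigms that the example is meant to show.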
`
To continue, the user may be prompted to provide identification information. For this
purpose, the Internet application may display, for example, a menu with an empty field for the
user's account number and another for the user's social security number. After the information
is filled in, it is posted to the server and processed, and the reply is formatted and sent
back to the user. Over the telephone the scenario is the same. The IVR system may play back
(over the telephone) an audio prompt requesting the user to enter his/her account number (via
DTMF or speech), and the information is received from the user by processing the DTMF
signaling or recognizing the speech. The user may then be prompted to input his/her SSN and the
reply is processed in a similar way. When the processing is complete, the information is sent
to a server, wherein the account information is accessed, formatted for audio replay, and then
played back to the user over the telephone.
`
As demonstrated above, IVRs use the same paradigm for information access as Web
browsers and fulfill the same functionality. Nevertheless, beyond their interface and modality
differences, IVR systems and Web browsers are currently designed and developed as
fundamentally different systems. In the near future, however, banks and large corporations will
be moving their publicly accessible information sources to the Internet while keeping the old
IVRs. Unfortunately, this would require these institutions to maintain separate systems for the
same type of information and transaction services. It would be beneficial for banks and
corporations to be able to provide information and services via IVR over the Internet using the
existing infrastructure. In view of this, a universal system and method that would allow a user
to access information and perform transactions over the Internet using IVR and conventional
browsers is desired.
`
`SUMMARY OF THE INVENTION
`
The present invention is directed to a system and method for unifying the access to
applications under a standard protocol, irrespective of the mode of access. In particular, the
present invention provides a universal method and system for accessing information and
performing transactions utilizing, for example, a standard networking protocol based on TCP/IP
(such as HTTP (Hypertext Transfer Protocol) or WAP (Wireless Application Protocol)) and
architecture to access information from, e.g., an HTTP server over the Internet such that a
pure GUI (graphical user interface) modality and a pure speech interface modality can be used
individually (or in combination) to access the same bank of transaction and information
services without requiring modification of the current infrastructure of currently available
networks.
`
In one embodiment of the present invention, a conversational browser is provided that
translates commands received over the telephone into the HTTP protocol. The introduction of the
conversational browser allows the Internet and telephone (IVR) to be unified, thereby
decreasing the cost and enlarging the coverage and flexibility of such applications. In
particular, for IVR applications, the conversational browser (or telephony browser) can
interpret DTMF signaling and/or spoken commands from a user, generate HTTP requests to access
information from the appropriate HTTP server, and then interpret HTML-based information and
present it to the user via audio messages. The conversational browser can also decode
compressed audio which is received from the HTTP server in the HTTP protocol, and play the
reconstructed audio to the user. Conversely, it can capture the audio and transmit it
(compressed or not) to the server for distributed recognition and processing. When the audio is
captured locally and shipped to the server, this can be done with a plug-in (native
implementation) or, for example, with a Java applet or Java program using audio and multimedia
APIs to capture the user's input.
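
As a rough illustration of the translation step, the following Python sketch maps a DTMF key or a recognized spoken keyword to an HTTP request object. It is a minimal sketch under stated assumptions: the host `bank.example`, the page names and the `LINKS` table are hypothetical, speech recognition is assumed to have already produced a keyword, and the request is only constructed, not sent.

```python
from urllib.parse import urljoin
from urllib.request import Request

BASE_URL = "http://bank.example/"  # placeholder host, not from the source

# Hypothetical link table: on each page, DTMF keys and spoken keywords
# resolve to the same target pages.
LINKS = {
    "index.html": {
        "1": "account.html",
        "account": "account.html",
        "2": "transfer.html",
        "transfer": "transfer.html",
    },
}


def to_http_request(current_page: str, user_input: str) -> Request:
    """Translate a DTMF digit or recognized keyword into an HTTP GET request.

    Unrecognized input re-requests the current page so the menu can be replayed.
    """
    token = user_input.strip().lower()
    target = LINKS.get(current_page, {}).get(token, current_page)
    return Request(urljoin(BASE_URL, target), method="GET")
```

Because both input modalities resolve to the same URL, the server does not need to know whether the user pressed a key or spoke a word.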
`
For the newly proposed IVR architecture and conversational browser, the content pages
reside on the same HTTP servers that are accessed by conventional modes such as GUI browsers,
and use the same information access methods, sharing the legacy database access SW layer, etc.
In other words, an IVR is now a special case of an HTTP server with a conversational browser.
As with the conventional GUI browser and PC, the information and queries of the conversational
browser will be sent over the packet-switched network using the same protocol (HTTP).
`
The present invention will allow an application designer to set up the application using
one framework, irrespective of the mode of access, whether it is through a telephone or a WWW
browser. All interactions between the application and the client are standardized to the HTTP
protocol, with information presented through HTML and its extensions, as appropriate. The
application on the WWW server has access to the type of client that is accessing the application
(telephone, PC browser or other networked consumer device) and the information that is
presented to the client can be structured appropriately. The application still needs to
support only one standard protocol for client access. In addition, the application and content
are presented in a uniform framework which is easy to design, maintain and modify.
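
One way to picture an application that structures the same content according to the client type is the short Python sketch below. The `present` function, the client-type strings and the output formats are assumptions made for illustration; the source does not specify how the client type is conveyed or how the content is laid out.

```python
import html
from typing import Sequence


def present(client_type: str, title: str, options: Sequence[str]) -> str:
    """Structure one menu either for a telephone client or for a GUI client.

    client_type is assumed to be supplied by the server framework
    (for example, derived from the incoming request).
    """
    if client_type == "telephone":
        # Audio-oriented rendering: one spoken prompt per option.
        prompts = [f"For {opt}, press {i}." for i, opt in enumerate(options, start=1)]
        return " ".join([f"{title}."] + prompts)
    # Default rendering: a simple HTML list for a GUI browser.
    items = "".join(f"<li>{html.escape(opt)}</li>" for opt in options)
    return f"<h1>{html.escape(title)}</h1><ul>{items}</ul>"
```

Both renderings come from the same application data, so only the presentation step differs between modes of access.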
`
In another aspect of the present invention, a conversational browser interprets
conversational markup language (CML) which follows the XML specifications. CML allows
new experienced application developers to rapidly develop conversational dialogs. In another
aspect, CML may follow other declarative syntax or method. Pursuing further the analogy with
HTML and the World Wide Web, CML and the conversational browser provide a simple and
systematic way to build a conversational user interface around legacy enterprise applications
and legacy databases.
`
CML files/documents can be accessed from an HTTP server using standard networking
protocols. The CML pages describe the conversational UI to be presented to the user via the
conversational browser. Preferably, CML pages are defined by tags which are based on the XML
application. The primary elements are <page>, <body>, <menu>, and <form>. Pages group other
CML elements, and serve as the top-level element for a CML document (as required by XML).
Bodies specify output to be spoken by the browser. Menus present the user with a list of
choices, and associate with each choice a URL identifying a CML element to visit if the user
selects that choice. Forms allow the user to provide one or more pieces of information, where
the content of each piece of information is described by a grammar. The form element also
specifies a URL to visit when the user has completed the form.
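
To make the element roles concrete, the following Python sketch parses a small CML-like page with the standard library XML parser. Only the element names <page>, <body> and <menu> come from the text above; the <choice> element, its `target` attribute and the file names are hypothetical stand-ins for however a menu associates a choice with a URL.

```python
import xml.etree.ElementTree as ET

# A hypothetical CML page. <choice> and its "target" attribute are
# illustrative assumptions, not syntax taken from the source.
CML = """
<page>
  <body>Welcome to the bank.</body>
  <menu>
    <choice target="account.cml">instant account information</choice>
    <choice target="help.cml">help</choice>
  </menu>
</page>
"""


def interpret(cml_text):
    """Return (texts to be spoken, mapping from choice label to target URL)."""
    page = ET.fromstring(cml_text.strip())
    spoken = [body.text.strip() for body in page.iter("body") if body.text]
    choices = {c.text.strip(): c.get("target") for c in page.iter("choice")}
    return spoken, choices
```

A browser built along these lines would speak the body text, listen for one of the choice labels, and then request the associated URL.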
`
In another aspect, conversational markup language rules can be added by a content
provider to an HTML file (or used in place of HTML) to take full advantage of the
conversational browser.
`
In yet another aspect, a conversational transcoder transforms presentation material from
one modality to a conversational modality (typically, speech only and/or speech and GUI). This
involves functional transformation to transform one page of GUI to a page of CUI
(conversational user interface), as well as logical transcoding to transform business logic of
an application, transaction or site into an acceptable dialog. Conversational transcoding can
convert HTML files into CML files that are interpreted by the conversational browser. The
transcoder may be a proprietary application of the server, browser or content provider.
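
A very small functional transcoding step can be sketched as follows: the hyperlinks of an HTML page are collected and re-emitted as a menu inside a CML-like page. This is an illustrative assumption about one possible transcoding rule, not the transcoder of the invention; the <choice> element and its `target` attribute are hypothetical, and logical transcoding of business logic is not attempted.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from the <a> elements of an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def html_to_cml(html_text):
    """Turn the hyperlinks of an HTML page into a <menu> of a CML-like page."""
    collector = LinkCollector()
    collector.feed(html_text)
    page = ET.Element("page")
    menu = ET.SubElement(page, "menu")
    for href, text in collector.links:
        choice = ET.SubElement(menu, "choice", target=href)
        choice.text = text
    return ET.tostring(page, encoding="unicode")
```

A fuller transcoder would also have to handle forms, headings and plain text, and decide how the resulting dialog should be ordered.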
`
In another aspect, HTML/GUI-based structure skeletons can be used to capture the dialog
logic or business logic of a GUI site. This information can be used to map the site, logic or
application. After appropriate organization of the dialog flow, each element can undergo
functional transcoding into speech-only content or a multi-modal (synchronized GUI and
speech interface) page.
`
In another aspect, a conversational proxy is provided to modify and/or prepare the
content description of the application, logic or site according to, e.g., the capabilities of
the device, browser and/or engines, the preferences of the user or application, the load on the
servers, the traffic on the network, and the location of the conversational arguments (data
files). For instance, the conversational proxy can directly convert proprietary formats such as
screen maps of corporate software.
`
These and other aspects, features and advantages of the present invention will be
described and become apparent from the following detailed description of preferred
embodiments, which is to be read in connection with the accompanying drawings.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
Fig. 1 is a diagram of a conversational browsing system according to a preferred
embodiment of the present invention;

Fig. 2 is a block diagram of a system for accessing information implementing a
conversational browsing system according to an embodiment of the present invention;

Fig. 3 is a block diagram of a system for accessing information implementing a
conversational browsing system according to another embodiment of the present invention;

Fig. 4a is a block diagram illustrating a distributed system for accessing information
implementing a conversational browsing system according to an embodiment of the present
invention;

Fig. 4b is a block diagram illustrating a distributed system for accessing information
implementing a conversational browsing system according to another embodiment of the present
invention;

Fig. 5 is a block diagram of a conversational information accessing system using
conversational markup language according to an embodiment of the present invention;

Fig. 6 is a general diagram of a distributed conversational system using conversational
markup language according to an embodiment of the present invention;

Fig. 7 is a diagram of an exemplary distributed conversational system using
conversational markup language according to an embodiment of the present invention;

Fig. 8 is a diagram of another exemplary distributed conversational system using
conversational markup language according to another embodiment of the present invention;

Fig. 9 is a diagram of yet another distributed conversational information accessing
system using conversational markup language according to an embodiment of the present
invention; and

Fig. 10 is a diagram of another exemplary distributed conversational information
accessing system using conversational markup language according to an embodiment of the
present invention.
`
`DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
`
The present invention is directed to a conversational browsing system and CML
(conversational markup language) for building a conversational browsing system using a set of
interlinked CML pages. This conversational system is conceptually analogous to building
conventional GUI browser applications using a set of interlinked pages written using HTML
(hypertext markup language). Moreover, just as HTML provides a set of mechanisms for
translating GUI actions into application actions such as visiting other pages or communicating
with a server, the conversational browser and CML are used for translating spoken inputs into
similar application actions. A CML page describes the conversational UI to be interpreted and
presented to the user via the conversational browser. Preferably, CML pages are defined by tags
which are based on the current XML (extensible markup language) application (as described in
detail below).
`
It is to be understood that the conversational systems and methods described herein may
be implemented in various forms of hardware, software, firmware, special purpose processors, or
a combination thereof. In particular, the conversational browser is preferably implemented as
an application comprising program instructions that are tangibly embodied on a program storage
device (e.g., magnetic floppy disk, RAM, ROM, CD ROM and/or Flash memory) and executable
by any device or machine comprising suitable architecture such as personal computers and
pervasive computing devices such as PDAs and smart phones.
`
It is to be further understood that, because some of the constituent components of the
conversational browser and other system components depicted in the accompanying Figures are
preferably implemented in software, the actual connections between such components may differ
depending upon the manner in which the present invention is programmed. Given the teachings
herein, one of ordinary skill in the related art will be able to contemplate these and similar
implementations or configurations of the present invention.
`
`Conversational Browser Architecture
`
Referring now to Fig. 1, a block diagram illustrates a conversational browser system
according to a preferred embodiment of the present invention. In general, a conversational
browsing system 10 allows a user to access legacy information services and transactions through
voice input (either uniquely or in conjunction with other modalities such as DTMF, keyboard,
mouse, etc.) using a standard networking protocol such as HTTP. In addition, it is to be
understood that the HTTP protocol is a preferred embodiment of the present invention but other
similar protocols can be used advantageously. For example, this can be deployed on top of any
protocol such as TCP/IP, WAP (Wireless Application Protocol), GSM, VoIP, etc., or any
other protocol that supports IP (and therefore provides TCP/IP or similar features). Even more
generally, if TCP/IP is not available, another protocol can be implemented that offers features
similar to TCP/IP or at least performs network and transport functions (the present invention
is not dependent on the transport and network layer).
`
In Fig. 1, a conversational browsing system 10 according to one embodiment of the
present invention comprises a conversational browser 11 (conversational application) which
executes on top of a CVM (conversational virtual machine) system 15. The conversational
browser 11 comprises a transcoder module 13 which, in general, transcodes conventional
(legacy) structured document formats such as HTML or DB2 into an intermediate document, or
CML (conversational markup language) document, in accordance with prespecified transcoding
rules (as discussed below). A CML document describes the conversational UI of the legacy
information format to be presented to the user. More specifically, a CML document comprises
meta-information which is processed by a CML parser/processor 14 to present, for example,
HTML-based information to a user as synthesized audio messages. It is to be understood that
various embodiments for a CML document are contemplated for implementation with the
present invention. In a preferred embodiment described in detail below, a CML document is
defined by tags which are based on XML (extensible markup language). It is to be understood,
however, that any declarative method for implementing CML may be employed. XML is
currently preferred because of its simplicity, power and current popularity.
`
The conversational browser 11 further comprises a command/request processor 12 (a
command and control interface) which converts user command (multi-modal) inputs such as
speech commands, DTMF signals, and keyboard input into navigation requests such as HTTP
requests. It is to be understood that in a pure speech conversational browser, the only input
is speech. However, the conversational browser 11 can be configured for multi-modal input.
`
When certain conversational functions or services are needed, the conversational browser
11 will make API calls to the CVM 15 requesting such services (as described below). For
instance, when interpreting a CML document (via the CML parser/processor 14), the
conversational browser 11 may hook to a TTS (text-to-speech synthesis) engine 26 (via the
CVM 15) to provide synthesized speech output to the user. In addition, when speech commands
or natural language queries (e.g., navigation requests) are input, the conversational browser 11
may hook to a speech recognition engine 24 and NLU (natural language understanding) engine
25 to process such input commands, thereby allowing the command/request processor to
generate, e.g., the appropriate HTTP requests. The CVM system 15 is a shell that can run on top
of any conventional OS (operating system) or RTOS (real-time operating system). A detailed
discussion of the architecture and operation of the CVM system 15 is provided in the patent
application IBM Docket No. YO999-111P, filed concurrently herewith, entitled
"Conversational Computing Via Conversational Virtual Machine," which is commonly