(12) United States Patent
Maes et al.
(10) Patent No.: US 6,801,604 B2
(45) Date of Patent: Oct. 5, 2004
US006801604B2
(54) UNIVERSAL IP-BASED AND SCALABLE ARCHITECTURES ACROSS CONVERSATIONAL APPLICATIONS USING WEB SERVICES FOR SPEECH AND AUDIO PROCESSING RESOURCES
(75) Inventors: Stephane H. Maes, Danbury, CT (US); David M. Lubensky, Brookfield, CT (US); Andrzej Sakrajda, White Plains, NY (US)
(73) Assignee: International Business Machines Corporation, Armonk, NY (US)
( * ) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.
(21) Appl. No.: 10/183,125
(22) Filed: Jun. 25, 2002
(65) Prior Publication Data
US 2003/0088421 A1, May 8, 2003
Related U.S. Application Data
(60) Provisional application No. 60/300,755, filed on Jun. 25, 2001.
(51) Int. Cl.7 ................................................. H04M 1/64
(52) U.S. Cl. ................................ 379/88.17; 379/88.16; 704/270.1; 709/203; 709/231
(58) Field of Search ........................... 379/88.01-88.04, 88.16, 88.17, 88.23-88.25; 704/270-275; 717/114, 116; 709/228-231, 201-203, 249, 250
(56) References Cited
U.S. PATENT DOCUMENTS
2002/0184373 A1 * 12/2002 Maes ......................... 709/228
2002/0194388 A1 * 12/2002 Boloker et al. .............. 709/310
2003/0005174 A1 *  1/2003 Coffman et al. ............. 709/318
2003/0088421 A1 *  5/2003 Maes et al. ................. 704/270.1
* cited by examiner

Primary Examiner—Roland Foster
(74) Attorney, Agent, or Firm—F. Chau & Associates, LLC
(57) ABSTRACT
Systems and methods for conversational computing and, in particular, systems and methods for building distributed conversational applications using a Web services-based model wherein speech engines (e.g., speech recognition) and audio I/O systems are programmable services that can be asynchronously programmed by an application using a standard, extensible SERCP (speech engine remote control protocol), to thereby provide scalable and flexible IP-based architectures that enable deployment of the same application or application development environment across a wide range of voice processing platforms and networks/gateways (e.g., PSTN (public switched telephone network), wireless, Internet, and VoIP (voice over IP)). Systems and methods are further provided for dynamically allocating, assigning, configuring and controlling speech resources such as speech engines, speech pre/post processing systems, audio subsystems, and exchanges between speech engines using SERCP in a web service-based framework.

22 Claims, 17 Drawing Sheets
[Representative drawing: Application (14); Task Manager; Router and Load Manager; Voice Response System (e.g., DT/6000)]

TELESIGN EX1003
Page 1
U.S. Patent          Oct. 5, 2004          Sheet 1 of 17          US 6,801,604 B2

[FIG. 1: block diagram of a speech processing system (application, task manager, router and load manager, voice response system); the rotated drawing text is not recoverable from the scan.]
U.S. Patent          Oct. 5, 2004          Sheet 2 of 17          US 6,801,604 B2

[FIG. 2: block diagram of a speech processing system (elements 21, 25, 26, 13a). Each server has a unique address: <profile>.<service>.<instance>.<host:port, listener audio port>.]
U.S. Patent          Oct. 5, 2004          Sheet 3 of 17          US 6,801,604 B2

[FIGS. 3a-3b: application frameworks involving a voice browser, VoiceXML documents and backend exchanges; the rotated drawing text is largely not recoverable from the scan.]
U.S. Patent          Oct. 5, 2004          Sheet 4 of 17          US 6,801,604 B2

[FIGS. 3c-3d: application frameworks, including a multi-modal configuration with a VoiceXML browser and backend exchanges; the rotated drawing text is largely not recoverable from the scan.]
U.S. Patent          Oct. 5, 2004          Sheet 5 of 17          US 6,801,604 B2

[FIG. 4: block diagram of a speech processing system using a conversational browser; recognizable labels include "Conversational browser" and "Voice XML"; the remaining rotated drawing text is not recoverable from the scan.]
U.S. Patent          Oct. 5, 2004          Sheet 6 of 17          US 6,801,604 B2

[FIG. 5: block diagram of a speech processing system; the rotated drawing text is not recoverable from the scan.]
U.S. Patent          Oct. 5, 2004          Sheet 7 of 17          US 6,801,604 B2

[FIG. 6: block diagram of a speech processing system; the rotated drawing text is not recoverable from the scan.]
U.S. Patent          Oct. 5, 2004          Sheet 8 of 17          US 6,801,604 B2

[FIG. 7: block diagram of a speech processing system, including a speech server (ASR, TTS, SPID, NLU), telephony interfaces (e.g., Dialogic, NMS), a PC with telephony and audio control, a controller, DT, and data components (element 82).]
U.S. Patent          Oct. 5, 2004          Sheet 9 of 17          US 6,801,604 B2

FIG. 8 (flow diagram for processing a call):
90: Receive Call
91: Determine Call ID
92: Send Application Instance Request
93: Assign Application to Call
94: Load Application Presentation Layer
95: Provide Application with Audio I/O Port
96: Application Sends Request to Accept Call
97: Application Generates Control Message Requesting Audio Processing Services
98: Task Manager Sends Control Message to Router/Load Manager Requesting Services
99: Router/Load Manager Allocates/Assigns Appropriate Resources (speech engines)
100: Task Manager Transmits Control Messages to Assigned Speech Engine(s) to Program Engine(s) for Processing Incoming Call
101: Task Manager Receives Processing Results and Sends Results to Application
U.S. Patent          Oct. 5, 2004          Sheet 10 of 17          US 6,801,604 B2

[FIG. 9: block diagram of a speech processing system; the rotated drawing text is largely not recoverable from the scan.]
U.S. Patent          Oct. 5, 2004          Sheet 11 of 17          US 6,801,604 B2

[FIG. 10: block diagram of a speech processing system (elements 130, 115). Each server has a unique address; the components are connected by an audio bus (52).]
U.S. Patent          Oct. 5, 2004          Sheet 12 of 17          US 6,801,604 B2

[FIG. 11: block diagram of a speech processing system; the rotated drawing text is not recoverable from the scan.]
U.S. Patent          Oct. 5, 2004          Sheet 13 of 17          US 6,801,604 B2

[FIG. 12: an application environment / task manager (132) controls conversational engines (131) via SERCP, and an audio subsystem (130) via SERCP (optional), TEL control, and an unspecified additional protocol; the RT-DSR leg comprises an RTP stream, RTCP, and call control.]

[FIG. 17: Web services: a conventional Web service with compliant interface and behavior (e.g., WSDL and SOAP) (150); a Web service with simplified or optimized API or protocol (151); a simplified protocol (XML-RPC, RPC, limited API, simpler messaging, etc.) (152).]
U.S. Patent          Oct. 5, 2004          Sheet 14 of 17          US 6,801,604 B2

[FIG. 13: block diagram of a DSR system; the rotated drawing text is not recoverable from the scan.]
U.S. Patent          Oct. 5, 2004          Sheet 15 of 17          US 6,801,604 B2

[FIG. 14: web service system. A consumer (140) (client application, task manager, load manager) issues XML service requests (SERCP) to a telephony/audio I/O service (141) and a speech engine service (142); each service comprises a business facade and business logic (146).]
U.S. Patent          Oct. 5, 2004          Sheet 16 of 17          US 6,801,604 B2

[FIG. 15: client/server communication over a DSR protocol stack (RTP + RTCP + RSVP + SERCP): upstream and downstream codec negotiations; connection establishment; GSM payload-type exchanges; barge-in detection at frame xx; requests for a new upstream codec after frame xxx, followed by renewed upstream codec negotiation; etc.]
U.S. Patent          Oct. 5, 2004          Sheet 17 of 17          US 6,801,604 B2

[FIG. 16: SERCP data exchanges over an established DSR connection with negotiated codecs: engine capability determinations; engine reservation (service combination); remote control commands, including parameters and data file settings and associated CDP and frame; results, events, and resulting downstream RTP; etc.]
US 6,801,604 B2

UNIVERSAL IP-BASED AND SCALABLE ARCHITECTURES ACROSS CONVERSATIONAL APPLICATIONS USING WEB SERVICES FOR SPEECH AND AUDIO PROCESSING RESOURCES

CROSS REFERENCE TO RELATED APPLICATION

This application claims priority to U.S. Provisional Application Ser. No. 60/300,755, filed on Jun. 25, 2001, which is incorporated herein by reference.
TECHNICAL FIELD

The present invention relates generally to systems and methods for conversational computing and, in particular, to systems and methods for building distributed conversational applications using a Web services-based model wherein speech engines (e.g., speech recognition) and audio I/O systems are implemented as programmable services that can be asynchronously programmed by an application using a standard, extensible SERCP (speech engine remote control protocol), to thereby provide scalable and flexible IP-based architectures that enable deployment of the same application or application development environment across a wide range of voice processing platforms and networks/gateways (e.g., PSTN (public switched telephone network), wireless, Internet, and VoIP (voice over IP)). The invention is further directed to systems and methods for dynamically allocating, assigning, configuring and controlling speech resources such as speech engines, speech pre/post processing systems, audio subsystems, and exchanges between speech engines using SERCP in a web service-based framework.
BACKGROUND

Telephony generally refers to any telecommunications system involving the transmission of speech information in either wired or wireless environments. Telephony applications include, for example, IP telephony and Interactive Voice Response (IVR), and other voice processing platforms. IP telephony allows voice, data and video collaboration through existing IP telephony-based networks such as LANs, WANs and the Internet, as well as IMS (IP multimedia services) over wireless networks. Previously, separate networks were required to handle traditional voice, data and video traffic, which limited their usefulness. Voice and data connections were typically not available simultaneously. Each required separate transport protocols/mechanisms and infrastructures, which made them costly to install, maintain and reconfigure, and unable to interoperate. Currently, various applications and APIs are commercially available that enable convergence of PSTN telephony and telephony over Internet Protocol networks and 2.5G/3G wireless networks. There is a convergence among fixed, mobile and nomadic wireless networks, as well as with the Internet and voice networks, as exemplified by 2.5G, 3G and 4G.

IVR is a technology that allows a telephone-based user to input or receive information remotely to or from a database. Currently, there is widespread use of IVR services for telephony access to information and transactions. An IVR system typically (but not exclusively) uses spoken directed dialog and generally operates as follows. A user will dial into an IVR system and then listen to audio prompts that provide choices for accessing certain menus and particular information. Each choice is either assigned to one number on the phone keypad or associated with a word to be uttered by the user (in voice-enabled IVRs), and the user will make a desired selection by pushing the appropriate button or uttering the proper word.

By way of example, a typical banking ATM transaction allows a customer to perform money transfers between savings, checking and credit card accounts, and to check account balances, using IVR over the telephone, wherein information is presented via audio menus. With the IVR application, a menu can be played to the user over the telephone, whereby the menu messages are followed by the number or button the user should press to select the desired option:

a. "for instant account information, press one,"
b. "for transfer and money payment, press two,"
c. "for fund information, press three,"
d. "for check information, press four,"
e. "for stock quotes, press five,"
f. "for help, press seven," etc.

To continue, the user may be prompted to provide identification information. Over the telephone, the IVR system may play back an audio prompt requesting the user to enter his/her account number (via DTMF or speech), and the information is received from the user by processing the DTMF signaling or recognizing the speech. The user may then be prompted to input his/her SSN and the reply is processed in a similar way. When the processing is complete, the information is sent to a server, wherein the account information is accessed, formatted to audio replay, and then played back to the user over the telephone.
An IVR system may implement speech recognition in lieu of, or in addition to, DTMF keys. Conventional IVR applications use specialized telephony hardware, and IVR applications use different software layers for accessing legacy database servers. These layers must be specifically designed for each application. Typically, IVR application developers offer their own proprietary speech engines and APIs (application program interfaces). The dialog development requires complex scripting and expert programmers, and these proprietary applications are typically not portable from vendor to vendor (i.e., each application is painstakingly crafted and designed for specific business logic). Conventional IVR applications are typically written in specialized script languages that are offered by manufacturers in various incarnations and for different hardware platforms. The development and maintenance of such IVR applications requires qualified staff. Thus, current telephony systems typically do not provide interoperability, i.e., the ability of software and hardware on multiple machines from multiple vendors to communicate meaningfully.

VoiceXML is a markup language that has been designed to facilitate the creation of speech applications such as IVR applications. Compared to conventional IVR programming frameworks that employ proprietary scripts and programming languages over proprietary/closed platforms, the VoiceXML standard provides a declarative programming framework based on XML (eXtensible Markup Language) and ECMAScript (see, e.g., the W3C XML specifications (www.w3.org/XML) and the VoiceXML forum (www.voicexml.org)). VoiceXML is designed to run on web-like infrastructures of web servers and web application servers (i.e., the voice browser). VoiceXML allows information to be accessed by voice through a regular phone or a mobile phone whenever it is difficult or not optimal to interact through a wireless GUI micro-browser.
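The directed-dialog banking menu from the Background maps naturally onto VoiceXML's declarative elements. The following is a minimal sketch, not a complete application: the document URI, dialog ids and field name are hypothetical, and only the first menu choice is carried through; element names follow the VoiceXML 2.0 specification.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Directed-dialog menu; each choice is bound to a DTMF key and, in a
       voice-enabled IVR, may also be selected by speaking the choice. -->
  <menu id="main">
    <prompt>
      For instant account information, press one.
      For transfer and money payment, press two.
      For fund information, press three.
    </prompt>
    <choice dtmf="1" next="#account"/>
    <!-- choices two, three, etc. would point at analogous dialogs, elided here -->
  </menu>

  <!-- Identification step: collect the account number via DTMF or speech,
       then submit it to a (hypothetical) server-side application. -->
  <form id="account">
    <field name="acct" type="digits">
      <prompt>Please enter your account number.</prompt>
    </field>
    <block>
      <submit next="http://bank.example.com/ivr/balance" namelist="acct"/>
    </block>
  </form>
</vxml>
```

A voice browser interprets such a document much as a visual browser interprets HTML, which is what lets the same web-server infrastructure drive both channels.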
More importantly, VoiceXML is a key component to building multi-modal systems such as multi-modal and conversational user interfaces or mobile multi-modal browsers. Multi-modal solutions exploit the fact that different interaction modes are more efficient for different user interactions. For example, depending on the interaction, talking may be easier than typing, whereas reading may be faster than listening. Multi-modal interfaces combine the use of multiple interaction modes, such as voice, keypad and display, to improve the user interface to e-business. Advantageously, multi-modal browsers can rely on VoiceXML browsers and authoring to describe and render the voice interface.
There are still key inhibitors to the deployment of compelling multi-modal applications. Most arise out of the current infrastructure and device platforms. Indeed, the current networking infrastructure is not configured for providing seamless, multi-modal access to information. Although a plethora of information can be accessed from servers over a communications network using an access device (e.g., personal information and corporate information available on private networks, and public information accessible via a global computer network such as the Internet), the availability of such information may be limited by the modality of the client/access device or the platform-specific software applications with which the user is interacting to obtain such information. For instance, current wireless network infrastructure and handsets do not provide simultaneous voice and data access. Middleware, interfaces and protocols are needed to synchronize and manage the different channels. In light of the ubiquity of IP-based networks such as the Internet, and the availability of a plethora of services and resources on the Internet, the advantages of open and interoperable telephony systems are particularly compelling for voice processing applications such as IP telephony systems and IVR.

Another hurdle is that development of multi-modal/conversational applications using current technologies requires not only knowledge of the goal of the application and how the interaction with the users should be defined, but also a wide variety of other interfaces and modules external to the application at hand, such as (i) connection to input and output devices (telephone interfaces, microphones, web browsers, palm pilot display); (ii) connection to a variety of engines (speech recognition, natural language understanding, speech synthesis and possibly language generation); (iii) resource and network management; and (iv) synchronization between various modalities for multi-modal or conversational applications.

Accordingly, there is a strong desire for development of distributed conversational systems having scalable and flexible architectures, which enable implementation of such systems over a wide range of application environments and voice processing platforms.
SUMMARY OF THE INVENTION

The present invention relates generally to systems and methods for conversational computing and, in particular, to systems and methods for building distributed conversational applications using a Web services-based model wherein speech engines (e.g., speech recognition) and audio I/O systems are implemented as programmable services that can be asynchronously programmed by an application using a standard, extensible SERCP (speech engine remote control protocol), to thereby provide scalable and flexible IP-based architectures that enable deployment of the same application or application development environment across a wide range of voice processing platforms and networks/gateways (e.g., PSTN (public switched telephone network), wireless, Internet, and VoIP (voice over IP)).
The invention is further directed to systems and methods for dynamically allocating, assigning, configuring and controlling speech resources such as speech engines, speech pre/post processing systems, audio subsystems, and exchanges between speech engines using SERCP in a web service-based framework.
In one preferred embodiment, a SERCP framework, which is used for speech engine remote control and network and system load management, is implemented using an XML-based web service framework wherein speech engines and resources comprise programmable services, wherein (i) XML is used to represent data (and XML Schemas to describe data types); (ii) an extensible messaging format is based on SOAP; (iii) an extensible service description language is based on WSDL, or an extension thereof, as a mechanism to describe the commands/interface supported by a given service; (iv) UDDI (Universal Description, Discovery, and Integration) is used to advertise and locate the service; and wherein (v) WSFL (Web Services Flow Language) is used to provide a generic mechanism for combining speech processing services through flow composition.
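Under this web-services model, a SERCP control exchange is simply an XML payload carried in a SOAP envelope. The sketch below builds such a message in Python. The SERCP element names, attributes and namespace are illustrative assumptions, since the framework above fixes the carrier technologies (XML, SOAP, WSDL, UDDI, WSFL) rather than a particular message schema; only the SOAP envelope namespace is standard.

```python
# Sketch of a SERCP-style control message in the web-services framework
# described above (XML data inside a SOAP envelope).  The "sercp" element
# names and namespace below are hypothetical, not a published schema.
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
SERCP_NS = "urn:example:sercp"                          # hypothetical namespace


def build_engine_control_request(engine_service: str, grammar_uri: str) -> bytes:
    """Build a SOAP envelope asking a speech-engine service to load a
    grammar and start recognition; returns the serialized XML."""
    ET.register_namespace("soap", SOAP_NS)
    ET.register_namespace("sercp", SERCP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    control = ET.SubElement(body, f"{{{SERCP_NS}}}ControlRequest",
                            {"service": engine_service})
    ET.SubElement(control, f"{{{SERCP_NS}}}LoadGrammar", {"uri": grammar_uri})
    ET.SubElement(control, f"{{{SERCP_NS}}}StartRecognition")
    return ET.tostring(envelope, encoding="utf-8")


msg = build_engine_control_request("asr.reservation.instance1",
                                   "http://example.com/grammars/digits.grxml")
```

In a deployment, the serialized envelope would be posted to the endpoint advertised for the engine service via UDDI and described by its WSDL, which is what makes the control exchange independent of where the engine actually runs.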
A conversational system according to an embodiment of the present invention assumes an application environment in which a conversational application comprises a collection of audio processing engines (e.g., audio I/O system, speech processing engines, etc.) that are dynamically associated with the application, wherein the exchange of audio between the audio processing engines is decoupled from control and application-level exchanges, and wherein the application generates control messages that configure and control the audio processing engines in a manner that renders the exchange of control messages independent of the application model and location of the engines. The speech processing engines can be dynamically allocated to the application on a call, session, utterance or persistent basis. Preferably, the audio processing engines comprise web services that are described and accessed using WSDL (Web Services Description Language), or an extension thereof.
In yet another aspect, a conversational system comprises a task manager, which is used to abstract, from the application, the discovery of the audio processing engines and the remote control of the engines.
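A minimal sketch of that abstraction follows. The class shape, registry contents and message format are hypothetical; the text above specifies only that the task manager hides engine discovery and remote control from the application.

```python
# Sketch of a task manager that hides engine discovery and remote control
# from the application.  Service names, the registry and the control-message
# shape are illustrative assumptions, not interfaces from the text.
class TaskManager:
    def __init__(self, registry):
        # registry: maps a service type (e.g. "asr") to known engine
        # endpoints, standing in for a UDDI-style discovery step
        self.registry = registry
        self.sent = []  # control messages dispatched on the app's behalf

    def discover(self, service_type):
        """Return an engine endpoint for the requested service type."""
        engines = self.registry.get(service_type)
        if not engines:
            raise LookupError(f"no engine advertises {service_type!r}")
        return engines[0]  # a real router/load manager would load-balance

    def control(self, service_type, command, **params):
        """Program an engine for the application without exposing its
        identity or location to the application code."""
        endpoint = self.discover(service_type)
        message = {"endpoint": endpoint, "command": command, "params": params}
        self.sent.append(message)  # stand-in for a SOAP/SERCP exchange
        return message


tm = TaskManager({"asr": ["asr-engine-1.example.com:554"]})
reply = tm.control("asr", "start_recognition", grammar="digits.grxml")
```

The application only names a service type and a command; which engine instance answers, and where it lives, is the task manager's concern.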
The systems and methods described herein may be used in various frameworks. One framework comprises a terminal-based application (located on the client or local to the audio subsystem) that remotely controls speech engine resources. One example of a terminal-based application is a wireless handset-based application that uses remote speech engines, e.g., a multimodal application in a "fat client configuration" with a voice browser embedded on the client that uses remote speech engines. Another example of a terminal-based application comprises a voice application that operates on a client having local embedded engines that are used for some speech processing tasks, and wherein the voice application uses remote speech engines when (i) the task is too complex for the local engine, (ii) the task requires a specialized engine, (iii) it would not be possible to download speech data files (grammars, etc.) without introducing significant delays, or (iv) for IP, security or privacy reasons, it would not be appropriate to download such data files on the client, to perform the processing on the client, or to send results from the client.
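Criteria (i) through (iv) amount to a per-task routing decision on the client. A hedged sketch, in which the task fields and the thresholds are illustrative assumptions rather than quantities defined in the text:

```python
# Sketch of the local-vs-remote engine decision for a terminal-based
# application with embedded engines.  Field names and thresholds are
# illustrative stand-ins for criteria (i)-(iv).
from dataclasses import dataclass


@dataclass
class SpeechTask:
    complexity: float               # 0..1, relative to local engine capability
    needs_specialized_engine: bool  # e.g. large-vocabulary NLU
    download_delay: float           # seconds to fetch grammars / data files
    data_files_restricted: bool     # IP / security / privacy constraints


def use_remote_engine(task: SpeechTask,
                      local_capacity: float = 0.5,
                      max_download_delay: float = 2.0) -> bool:
    """Return True when any of criteria (i)-(iv) forces remote processing."""
    return (task.complexity > local_capacity          # (i) too complex locally
            or task.needs_specialized_engine          # (ii) specialized engine
            or task.download_delay > max_download_delay  # (iii) data-file delay
            or task.data_files_restricted)            # (iv) IP/security/privacy


command_task = SpeechTask(0.2, False, 0.1, False)   # small command grammar
dictation_task = SpeechTask(0.9, False, 0.1, False)  # beyond local engine
```

The point of the sketch is that the decision can be made task by task, so the same application transparently mixes embedded and network engines.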
Another usage framework for the invention is to enable an application located in a network to remotely control different speech engines located in the network. For example, the invention may be used to (i) distribute the processing and perform load balancing, (ii) allow the use of engines optimized for specific tasks, and/or (iii) enable access and control of third-party services specialized in providing speech engine capabilities.

These and other aspects, features, and advantages of the present invention will become apparent from the following detailed description of the preferred embodiments, which is to be read in connection with the accompanying drawings.
DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a speech processing system according to an embodiment of the present invention.

FIG. 2 is a block diagram of a speech processing system according to an embodiment of the invention.

FIGS. 3a-3d are diagrams illustrating application frameworks that can be implemented in a speech processing system according to the invention.

FIG. 4 is a block diagram of a speech processing system according to an embodiment of the invention, which uses a conversational browser.

FIG. 5 is a block diagram of a speech processing system according to an embodiment of the invention.

FIG. 6 is a block diagram of a speech processing system according to an embodiment of the invention.

FIG. 7 is a block diagram of a speech processing system according to an embodiment of the invention.

FIG. 8 is a flow diagram of a method for processing a call according to one aspect of the invention.

FIG. 9 is a block diagram of a speech processing system according to an embodiment of the invention.

FIG. 10 is a block diagram of a speech processing system according to an embodiment of the invention.

FIG. 11 is a block diagram of a speech processing system according to an embodiment of the invention.

FIG. 12 is a block diagram of a speech processing system according to an embodiment of the invention.

FIG. 13 is a block diagram illustrating a DSR system that may be implemented in a speech processing system according to an embodiment of the invention.

FIG. 14 is a block diagram of a web service system according to an embodiment of the invention.

FIG. 15 is a diagram illustrating client/server communication using a DSR protocol stack according to an embodiment of the present invention.

FIG. 16 is a diagram illustrating client/server communication of SERCP (speech engine remote control protocol) data exchanges according to an embodiment of the present invention.

FIG. 17 is a block diagram of a web service system according to another embodiment of the invention.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

The present invention is directed to systems and methods for implementing universal IP-based and scalable conversational applications and platforms that are interoperable across a plurality of conversational applications, programming or execution models, and systems. The terms "conversational" and "conversational computing" as used herein refer to seamless, multi-modal (or voice-only) dialog (information exchanges) between user and machine and between devices or platforms of varying modalities (I/O capabilities), regardless of the I/O capabilities of the access device/channel, preferably using open, interoperable communication protocols and standards, as well as a conversational programming model (e.g., conversational gesture-based markup language) that separates the application data content (tier 3) and business logic (tier 2) from the user interaction and data model that the user manipulates. Conversational computing enables humans and machines to carry on a dialog as natural as human-to-human dialog.

Further, the term "conversational application" refers to an application that supports multi-modal, free-flow interactions (e.g., mixed-initiative dialogs) within the application and across independently developed applications, preferably using short-term and long-term context (including previous input and output) to disambiguate and understand the user's intention. Preferably, conversational applications utilize NLU (natural language understanding). Multi-modal interactive dialog comprises modalities such as speech (e.g., authored in VoiceXML), visual (GUI) (e.g., HTML (hypertext markup language)), constrained GUI (e.g., WML (wireless markup language), CHTML (compact HTML), HDML (handheld device markup language)), and a combination of such modalities (e.g., speech and GUI). Further, the invention supports voice-only (mono-modal) machine-driven dialogs and any level of dialog capability in between voice-only and free-flow multimodal capabilities. As explained below, the invention provides a universal architecture that can handle all these types of capabilities and deployments.
Conversational applications and platforms according to the present invention preferably comprise a scalable and flexible framework that enables deployment of various types of applications and application development environments to provide voice access using various voice processing platforms, such as telephony cards and IVR systems, over networks/mechanisms such as PSTN, wireless, Internet, and VoIP networks/gateways. A conversational system according to the invention is preferably implemented in a distributed, multi-tier client/server environment, which decouples the conversational applications from distributed speech engines and the telephony/audio I/O components. A conversational platform according to the invention is preferably interoperable with the existing Web infrastructure to enable delivery of voice applications over telephony, for example, taking advantage of the ubiquity of applications and resources available over the Internet. For example, preferred telephony applications and systems according to the invention enable business enterprises and service providers to give callers access to their business applications and data, anytime, anyplace, using any telephone or voice access device.
`Referring now to FIG. 1, a block diagram illustrates a
`conversational system 10 according to an embodiment of the
`invention. The system 10 comprises a client voice response
`system 11 that executes on a host machine, which is based,
`for example, on a AIX, UNIX, or DOS/Windows operating
`system platform. The client application 11 provides the
`connectivity to the telephone line (analog or digital), other
`voice networks (such as IMS, VoIP, etc., wherein the appli-
`cation 11 may be considered as a gateway to the network (or
`a media processing entity), and other voice processing
`services (as explained below). Incoming calls/connections
`are answered by an appropriate client application running on
`the host machine. More specifically, the host machine can be
`connected to a PSTN, VoIP network, wireless network, etc.,
`and accessible by a user over an analog telephone line or an
`IDSN (Integrated Services Digital Network)
`line,
`for
`example. In addition,
`the host client machine 11 can be
`connected to a PBX (private branch exchange) system,
central office or automatic call distribution center, a VoIP gateway, a wireless support node gateway, etc. The host comprises the appropriate software and APIs that allow the client application 11 to interface to various telephone systems and video phone systems, such as PSTN, digital ISDN and PBX access, VoIP gateways, and the voice services on the servers. The system 10 is preferably operable in various connectivity environments including, for example, T1, E1, ISDN, CAS, SS7, VoIP, wireless, etc.

The voice response system 11 (or gateway) comprises client enabling code that operates with one or more application servers and conversational engine servers over an IP (Internet Protocol)-based network 13. The IP network 13 may comprise a LAN, WAN, or a global communication network such as the Internet or a wireless network (IMS). In one exemplary embodiment, the host machine 11 comprises an IBM RS/6000 computer that runs DirectTalk (DT/6000)®, a commercially available platform for voice processing applications. DirectTalk® is a versatile voice processing platform that provides expanded functionality to IVR applications. DirectTalk enables the development and operation of automated customer service solutions for various enterprises and service providers. Clients, customers, employees and other users can interact directly with business applications using telephones connected via public or private networks. DirectTalk supports scalable solutions from various telephony channels operating in customer premises or within telecommunication networks. It is to be understood, however, that the voice response system 11 may comprise any application that is accessible via telephone to provide telephone access to one or more applications and databases, provide interactive dialog with voice response, and accept data input via DTMF (dual tone multi-frequency). It is to be appreciated that other gateways and media processing entities can be considered.
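As a hypothetical illustration of the DTMF-driven interaction that such a voice response system provides, the sketch below maps a caller's key press to a back-end application action. The menu entries and the route_dtmf name are invented examples for this sketch, not part of the specification.

```python
# Hypothetical sketch of a DTMF-driven IVR menu of the kind a voice
# response system could present. The menu contents are invented examples.
MENU = {
    "1": "account balance",
    "2": "transfer funds",
    "0": "operator",
}


def route_dtmf(digit: str) -> str:
    """Map a caller's DTMF key press to a back-end application action."""
    # An unrecognized key falls through to re-prompting the caller.
    return MENU.get(digit, "replay menu")


print(route_dtmf("1"))  # caller pressed 1
print(route_dtmf("9"))  # key not in the menu
```

In a deployed system the digits would arrive from the telephony channel (T1, ISDN, VoIP, etc.) and the actions would invoke the business applications and databases the caller is accessing; the dictionary lookup stands in for that dispatch.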

The system 10 further comprises one or more application servers (or web servers) and speech servers that are distributed over the network 13. The system 10 comprises one or more conversational applications 14 and associate