[Cover: BT Technology Journal, Vol. 14 No. 1, January 1996. The contents listing printed on the cover is unreadable in this scan; a legible contents listing follows below.]
The BT Technology Journal is a quarterly periodical of technical papers published by British Telecommunications plc to promote an awareness, among workers in similar fields world-wide, of the Research and Development undertaken by BT in telecommunications and related sciences.

EDITORIAL BOARD

G White BSc PhD CEng FIEE SMIEEE, Chairman
J R W Ames MSc CEng MIEE
E L Cusack BSc PhD
I G Dufour EurIng CEng FIEE
P G Flavin CEng MIEE
P W France MSc PhD
J R Grierson MA PhD CEng FIEE
R C Nicol PhD CEng FIEE
S G Stockman BA PhD
A M Jell BA, Editor
D N Clough MA, Assistant Editor

Enquiries to the Editor: (01473) 623232, facsimile (01473) 620915, e-mail: bttj@ipswich.sac.co.uk
Internet access to the BT Laboratories information pages is available on: http://www.labs.bt.com

Unless otherwise stated, copyright of the papers appearing in the Journal is reserved by British Telecommunications plc. The views of contributors are not necessarily those of the Editorial Board, do not necessarily represent BT policy, nor reflect an endorsement for any commercial products.

The BT Technology Journal is distributed by Chapman and Hall, 2-6 Boundary Row, London SE1 8HN, UK.

The Journal is published four times per year, in January, April, July and October. Subscription prices for 1996 are: print + Internet access: $214 (USA/Canada), £126 (EU), £140 (all other countries); print only: $185 (USA/Canada), £108 (EU), £122 (all other countries). Subscription prices for individuals are (print only): $95 (USA/Canada), £54 (EU), £54 (all other countries). Individual subscriptions must be paid for by personal cheque or credit card.

Any payment in US$ should be made to Routledge, Chapman and Hall Dollar Account: 051-70700-4, Barclays Bank New York Ltd., 300 Park Avenue, New York, NY 10022, USA.

Subscription rates to the USA include airfreight to New York and second-class postage thereafter. All other territories outside the UK and Europe will be served by accelerated surface post. Second-class postage paid at Rahway, NJ. Postmaster: send address corrections to The BT Technology Journal, c/o Mercury Airfreight International Ltd Inc., 2323 Randolph Avenue, Avenel, NJ 07001, USA (US mailing agent).

All subscription enquiries should be made to Chapman and Hall Subscriptions Department. All enquiries concerning editorial matters should be made to the Editor, SAC Technographic Ltd, 38 Anson Road, Martlesham Heath, Ipswich, Suffolk IP5 7RG (for voice, fax and e-mail details, see above).

BT Laboratories
Martlesham Heath, Ipswich, Suffolk, England IP5 7RE
BT Technology Journal

Vol. 14 No. 1 JANUARY 1996

THEME

Speech technology for telecommunications

Foreword by C Wheddon

Editorial by F A Westall, R D Johnston and A V Lewis

F A Westall, R D Johnston and A V Lewis
Speech technology for telecommunications

Speech is the easiest, most expressive and most natural means of human communication. Most of us have received intensive training in using it from the day we were born! But speech is more than just a way of transmitting words or ideas — it conveys the essence of human emotion, moods, and personality. It is BT's core business, accounting for over 90% of revenues. It is also our primary means to access the 26 million customers of the UK telephone networks, and around half a billion telephone users world-wide. This paper introduces the key speech technologies, described in detail in the associated papers in this issue, and makes some personal predictions about future trends and challenges in this important, exciting and far-reaching field.

W T K Wong
Low rate speech coding for telecommunications   28

Over the last decade major advances have been made in speech coding technology, which is now widely used in international, digital mobile and satellite networks. The most recent techniques permit telephone network quality speech transmission at 8 kbit/s, but there are still demands for even lower rates and more flexible, good quality coding techniques for various network applications. This paper reviews the developments so far, and describes a new class of speech coding methods known as speech interpolation coding which has the potential to provide toll-quality speech coding at or below 4 kbit/s.

P A Barrett, R M Voelcker and A V Lewis
Speech transmission over digital mobile radio channels   45

The design of a speech channel for digital mobile radio applications is a trade-off between the key performance dimensions of speech quality, robustness to errors, delay, complexity and bit rate. An appropriate balance is often difficult to achieve, but is vital to customer satisfaction. This paper identifies the considerations in selecting a speech codec for mobile telephony applications, outlines techniques for robust and efficient speech transmission over a digital mobile radio channel, and discusses how the resulting performance can be assessed. Throughout the paper, the half-rate GSM digital mobile radio system is used as an example.
Spoken language systems — beyond prompt and response

P J Wyard, A D Simons, S Appleby, E Kaneen, S H Williams and K R Preston

Spoken language systems allow users to interact with computers by speaking to them. This paper focuses on the most advanced systems, which seek to allow as natural a style of interaction as possible. Specifically this means the use of continuous speech recognition, natural language understanding to interpret the utterance, and an intelligent dialogue manager which allows a flexible style of 'conversation' between computer and user. This paper discusses the architecture of spoken language systems and the components of which they are made, and describes both a variety of possible approaches and the particular design decisions made in some systems developed at BT Laboratories. Three spoken language systems in the course of development are described — a multimodal interface to the BT Business Catalogue, an e-mail secretary which can be consulted over the telephone network, and a multimodal system to allow selection of films in the interactive TV environment.
1. Introduction

No science fiction image of the future is complete without the ever-present personable computer which can understand every word said to it. In spite of these popular media images, the goal of completely natural interaction between humans and machines is still some way off.

Interactive voice response (IVR) systems, which provide services over the telephone network, have been available since the mid-1980s. Initially they were restricted to interactive TouchTone® input with voice providing the response to the user. The use of such services was therefore limited to the population with TouchTone keypads. More recently, applications using automatic speech recognition (ASR) have been developed. These often simply allow the option of spoken digit recognition as an alternative to keypad entry, thus allowing the service to be launched even in areas where TouchTone penetration is poor. Moving on from such systems, the words which are spoken can be matched to the service. This allows these ASR-based services to be more user-friendly than their TouchTone counterparts, because the user can directly answer the question 'Which service do you require?' with 'weather' or 'sport', rather than 'for weather press 1, for sport press 2', etc. However, they still rely on selection from a predetermined menu of items at any point in the dialogue.

More sophisticated services are now becoming possible using emerging larger vocabulary speech recognition technology. However, it is not sensible simply to extend the menu-based approach to accommodate larger vocabularies. Although well-engineered simple applications may be easy to use, more advanced services are likely to have complicated menu structures. If information can only be provided one item at a time, using a 'prompt and response' dialogue, rigid interaction styles may steer the user through a complex dialogue. This can result in the user becoming lost, or ending up with the wrong information. These problems are particularly significant for inexperienced users. On the other hand, experienced users may become bored by the large number of responses needed when they know exactly what they want. The menu-based structure required by systems which rely on isolated word input is often the limiting factor for new services. This limitation of the user interface is one of the greatest barriers to the usability of many IVR services.

Moving beyond the menu-style interaction towards conversational spoken language will allow users to express their requirements more directly and avoid tedious navigation through menus. This approach will also allow the user to take control of the interaction, rather than using the more common 'prompt and response' dialogue.

BT is interested in the development of spoken language systems (SLS) to provide a key competitive advantage. SLSs allow users to interact with computers using conversational language, rather than simply responding to system prompts with short or one-word utterances. With the rapid increase in competition, service differentiation becomes a key factor in gaining market share. Systems which allow users 24-hour remote access to information provide a very useful service for people who are in different time zones, or away from their office, or who need information immediately during unsocial hours. SLSs can be used to automate such services, and also those which currently require human operators, thus freeing their time to deal with difficult situations where more complex, or more personalised, advice is needed.
Current trends in information networking and the phenomenal growth of the Internet bring their attendant problems for our customers in keeping up with technology, finding what they need, and using information to their best advantage. Spoken language system technology can greatly enhance our customers' ease of access to information, thus increasing network revenue through new and increased usage. Systems which combine several modes of input and output, such as speech, graphics, text, video, mouse-control, touch and virtual reality, are known as multimodal spoken language systems. These allow far greater freedom of expression for users who, as a result, should feel more comfortable and less as though they are 'talking to a computer'. They are able to point, use gestures, speak, type; whatever comes most naturally to them. Spoken language systems will become increasingly important in the near future as progress in technology becomes more widely available.

The goal is to be able to build systems which are not restricted only to those motivated users who are prepared to spend time learning the language the machine understands. These new systems can be used by anyone who wants occasional access to a particular service. They will also help the user successfully gain the information or service they require by simply calling a number and asking for what they want. In fact, the aim is to put back some of the intelligence which existed in the network 50 years ago, when a user simply lifted the handset and asked to be connected to the service or number required.

This paper discusses the design and implementation of spoken language systems and is organised as follows. Section 2 gives an outline of the architecture of an SLS. Section 3 discusses some of the systems currently under development at BT Laboratories, including a multimodal system for access to the BT Business Catalogue, a speech-in/speech-out system for remote e-mail access, and a system for accessing information about films. Section 4 describes the components of an SLS in some detail, giving concrete examples from current systems. Section 5 discusses future work which needs to be carried out to improve the quality and usability of SLSs, and section 6 draws some conclusions.

2. System overview

This section outlines a typical spoken language system architecture from the information processing point of view (platform and inter-process communication issues are not dealt with to any great extent in this paper). The architecture and the key processing components are outlined.

The most basic form of SLS, a speech-in/speech-out (rather than multimodal) system, requires at least the following major components (described briefly below and in more detail in section 4).

• Speech recognition — to convert an input speech utterance to a string of words.

• Meaning extraction — to extract as much of the meaning as is necessary for the application from the recogniser output and encode it into a suitable meaning representation.

• Database query — to retrieve the information specified by the output of the meaning extraction component. Some applications (e.g. home banking) may require a specific transaction to occur. Many applications may be a mixture of database query and transaction processing.

• Dialogue manager — this controls the interaction or 'dialogue' between the system and the user, and co-ordinates the operation of all the other system components. It uses a dialogue model (generic information about how conversations progress) to aid the final interpretation of an utterance. This may not have been achieved by the 'meaning extraction' component, because the interpretation relies on an understanding of the conversation as a whole.

• Response generation — to generate the text to be output in spoken form. Information retrieved by the database query component will be passed to the response generation component, together with instructions from the dialogue manager about how to generate the text (e.g. terse/verbose, polite/curt, etc).

• Speech output module (text-to-speech synthesis or recorded speech).

At its simplest, processing consists of a linear sequence of calls to each component, as shown in Fig 1. A typical output of each stage from an application which accesses the BT Business Catalogue is shown. It is not necessary to understand the output of the 'meaning extraction' component in detail to realise that meaning extraction can be a non-trivial exercise. The simple linear sequence shown in Fig 1 is, in general, too inflexible. It is better if the dialogue manager is given greater control, to call the other components in a flexible order, according to the results at each stage. This leads to an architecture of the type shown in Fig 2.
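As a concrete illustration of this linear sequence, the following minimal sketch (in Python, which is not the implementation language of the BT systems) chains stubbed versions of the five components, mirroring the Fig 1 example; every function body, price and return value is an invented stand-in rather than a detail of the actual components.

    # Minimal sketch of the linear process flow of Fig 1.
    # All components are hypothetical stubs for illustration only.

    def recognise(audio: bytes) -> str:
        """Speech recognition: convert an utterance to a string of words."""
        return "which phones cost less than the duet 100"   # stubbed result

    def extract_meaning(words: str) -> dict:
        """Meaning extraction: encode what the application needs."""
        return {"type": "productQuestion", "class": "phone",
                "feature": "buyPrice", "relation": "<", "ref": "duet_100"}

    def query_database(meaning: dict) -> list:
        """Database query: retrieve the products the meaning specifies."""
        catalogue = {"duet_50": 30, "duet_80": 45, "duet_100": 60}  # invented prices
        limit = catalogue[meaning["ref"]]
        return [name for name, price in catalogue.items()
                if price < limit and name != meaning["ref"]]

    def generate_response(products: list) -> str:
        """Response generation: turn the result set into text."""
        names = [p.replace("_", " ").title() for p in products]
        return "The " + " and the ".join(names)

    def speak(text: str) -> None:
        print(text)   # stand-in for the speech output module

    # Linear sequence: each component is called exactly once, in order.
    speak(generate_response(query_database(extract_meaning(recognise(b"")))))
    # prints: The Duet 50 and the Duet 80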
Fig 1. Example of a linear process flow in a spoken language system. In the example, the user asks 'Which phones cost less than the Duet 100?'; the speech recogniser returns the word string; meaning extraction encodes it as the semantic representation

    productQuestion,
    P1,
    [buyPrice],
    [ [product, phone, P1],
      [valueFeature, buyPrice, P1, Price1],
      [product, phone, P2],
      [P2 = duet_100],
      [valueFeature, buyPrice, P2, Price2],
      [Price1 < Price2] ]

(that is: find all products P1, where P1 is a phone, the price of P1 is Price1, and Price1 is less than Price2, where Price2 is the price of the Duet 100); the database query retrieves the matching products; and response generation outputs 'The Duet 50 and the Duet 80'.

The need for this more flexible architecture is illustrated by the processing sequence in Fig 3, which shows the dialogue manager as control centre, calling each component in an order determined by the results of processing at each stage. Although every processing stage is passed through the dialogue manager, this is not included in the sequence unless some non-trivial decision or action is taken. The example given in Fig 3 is largely driven by limitations of the recogniser, but the need for this sort of flexible architecture goes far beyond this. It will eventually enable the dialogue manager to act in an intelligent manner, co-ordinating the components and combining their outputs in a nonlinear manner.

Fig 2. Role of a dialogue manager in a spoken language system: the dialogue manager is at the centre, calling the speech recognition, meaning extraction, database query and response generation components as required.

So far in this section, the discussion has covered speech-in/speech-out systems. However, systems such as the BT Business Catalogue access system (see section 3.1) are multimodal and require a screen, a means of inputting text and mouse clicks, and a means of outputting text and graphics. These components must be added to the architecture shown in Fig 2, and the dialogue manager and response generator must be upgraded to deal with the extra modalities. However, most of the discussion of this section applies equally to multimodal systems.

3. Example systems

In this section three spoken language systems under development at BT Laboratories are described:

• access to the BT Business Catalogue, known as BusCat — this was the first multimodal continuous speech input spoken language system,
• an e-mail access system, which is speech-in/speech-out only, but has the conversational features described in this paper — it is also a dial-up service over the telephone network,

• a film access system, in which users will be able to select films and videos using continuous speech and button pushes on a remote control handset — this system is targeted at the interactive TV environment.

Fig 3. Nonlinear process flow in spoken language systems. The sequence runs as follows.

User input: 'Which phones come in blue?'
    The speech recogniser returns with a low confidence that it made a satisfactory recognition; the dialogue manager tells the response module to prompt the user for repeat input; the response module generates and outputs 'I did not understand that - please repeat'.

User input: 'Which phones come in blue?'
    The recogniser outputs 'Which ones come in blue?' (one word misrecognised); meaning extraction produces a semantic representation which contains an unresolved product class; the dialogue manager realises that it cannot interpret the question, so it tells the response module to tell the user that it is missing information about the product class; the response module generates and outputs 'What type of product do you require?'.

User input: 'Telephones'
    The recogniser outputs 'Telephones'; meaning extraction produces a semantic representation of the word telephones; the dialogue manager combines this semantic representation with the previous one, and now realises that it has sufficient information to make a database query; the database query returns with a list of the blue phones; the response module generates and outputs 'We have the following blue phones: the Relate 100, the Relate 200 and the Duet 100'.
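The repair behaviour shown in Fig 3 amounts to a small control loop inside the dialogue manager. The sketch below is a hypothetical illustration of that loop; the component interfaces, the confidence threshold and the 'product_class' slot name are all assumptions for illustration, not details taken from the BT systems.

    # Sketch of the dialogue manager as control centre (Fig 3).
    # listen/extract/query/respond are hypothetical component callables.

    CONFIDENCE_THRESHOLD = 0.6   # invented value

    def run_dialogue(listen, extract, query, respond):
        context = {}                               # semantics built up over turns
        while True:
            words, confidence = listen()           # speech recogniser
            if confidence < CONFIDENCE_THRESHOLD:  # turn 1 of Fig 3
                respond("I did not understand that - please repeat")
                continue
            context.update(extract(words))         # merge with previous turn
            if "product_class" not in context:     # turn 2: missing information
                respond("What type of product do you require?")
                continue
            respond(query(context))                # turn 3: enough for a query
            return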
3.1 BusCat

The SLS BusCat provides direct access to a subset of the BT Business Catalogue, which covers a range of products such as telephones, answering machines and phone systems. The user has a screen displaying a Netscape WWW browser and speech input/output facilities. All the normal WWW browser features are present, such as the ability to click on links to other pages, and a display consisting of mixed text and graphics (see Fig 4). Additionally, in this system users may use continuous speech input, type questions into a free-text window, and listen to speech output generated by a text-to-speech (TTS) system. This multimodal interface enables users to request specific information about the products in the catalogue, or to browse through the catalogue.

Fig 4. The SLS BusCat system in use.

In addition to its internal knowledge bases, the system has the capability to access external databases across a network. One application for this might be to provide a multimodal interface for such databases. Another is to allow the internal knowledge bases to be periodically updated from an external database.

The speech recogniser used is BT's Stap recogniser [1], and the text-to-speech system is BT's Laureate [2] system. The overall structure of the system is shown in Fig 5, and the system can cope with multiple simultaneous users.

The example in Table 1 gives a flavour of what it feels like to interact with the system. Here the user is already logged on to the system. From each WWW page there is a choice of:
• speaking to the system,

• clicking on a link,

• typing into the free-text field.

Fig 5. Architecture of BusCat: a Prolog database of products and services and domain knowledge, together with dialogue information (the current query template, current user preferences and the query/dialogue history), supports the meaning extraction, dialogue manager and response generation components, which reach the user through the WWW browser, the speech recogniser and the speech output module.

Table 1. An example session with BusCat.

User input: 'What is on-hook dialling?' (typed, and optionally spoken)
System response: a textual explanation of on-hook dialling ('Time spent waiting for someone to answer the phone can often be lost time. But with this feature, you can dial without picking up the phone handset, leaving you free to carry on with something else until the second your call connects') and a list of five phones which have this feature: Vanguard 10e, Relate 200, Relate 300, Relate 400, Converse 300.

User input: 'Which phones have on-hook dialling and cost less than 60 pounds?'
System response: the text 'The following products meet your requirements,' and a list of four phones, each with a small picture, a short description and a price (Vanguard 10e, Relate 200, Relate 300, Converse 300).

User input: 'Which ones come in grey?'
System response: the text 'The following products meet your requirements,' and a list of three phones, each with a small picture, a short description and a price (Vanguard 10e, Relate 200, Relate 300).

User input: the user clicks on the link next to the picture of the Relate 200.
System response: a large picture of the Relate 200, a full description including all its features, and a price.

In the interaction the user wants to know what on-hook dialling is. Having received an explanation of this feature, he decides he wants a phone with on-hook dialling which costs less than £60. Then he remembers he also wants it in grey to match his living room. He finally selects the Relate 200 telephone.
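The constraint queries in this session can be read as successive filters over a product knowledge base, with each follow-up query ('Which ones come in grey?') refining the previous result set rather than the whole catalogue, which is one reason the dialogue manager keeps a query history (see Fig 5). The sketch below illustrates this; the prices and colours are invented, chosen only so that the results match Table 1, and the actual BusCat knowledge base is held in Prolog rather than Python.

    # Invented product data for illustration; not BT's catalogue schema.
    products = [
        {"name": "Vanguard 10e", "price": 50, "on_hook": True, "colours": {"grey", "white"}},
        {"name": "Relate 200",   "price": 40, "on_hook": True, "colours": {"grey"}},
        {"name": "Relate 300",   "price": 55, "on_hook": True, "colours": {"grey"}},
        {"name": "Converse 300", "price": 58, "on_hook": True, "colours": {"black"}},
    ]

    # 'Which phones have on-hook dialling and cost less than 60 pounds?'
    matches = [p for p in products if p["on_hook"] and p["price"] < 60]

    # 'Which ones come in grey?' - the follow-up filters the previous
    # result set, not the whole catalogue, using the stored query history.
    matches = [p for p in matches if "grey" in p["colours"]]
    print([p["name"] for p in matches])
    # ['Vanguard 10e', 'Relate 200', 'Relate 300']  (as in Table 1)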
3.2 E-mail access

BT is very interested in the mobile telephony market. Speech-only natural language systems are very attractive to this market because people want to be able to keep in touch while on the move, and are likely to have only a mobile phone with which to do so. Speech access to information can have an added safety advantage over TouchTone interfaces, as well as helping to improve the usability of the service.

E-mail has been chosen as the vehicle for producing a speech-in/speech-out natural language system because the information source (the user's own e-mail) is readily available. An existing system allows selected users to access their e-mail using TouchTone commands, thus providing information on whether natural language access confers real benefits to the users. It will certainly enable queries which are not possible currently, such as: 'Do I have any messages from John?'

The main differences between the e-mail service and the BusCat service are as follows. The e-mail service uses telephony speech only, rather than a multimodal interface. The dialogue is more robust to errors and will allow either the user or the system to take the initiative, whereas the Business Catalogue system is based on the user taking the initiative. The e-mail system architecture is being designed to decouple the domain-independent and domain-dependent parts of the system, and the same approach is being used for the film access system (see section 3.3).
This means that in the future there will be a good level of confidence in being able to produce spoken language systems for new domains more quickly than at present.

An example of a dialogue the system might have with the user is shown in Table 2.

Table 2. An example session with an e-mail service.

User: (dials up the system)
System: 'Good morning Alison. You have ten new messages and 20 old messages. Your first new message is from Mike Harper, regarding, Thanks for demo.'
(The default start-up is to tell the user how many old and new messages there are, and to start listing the new ones.)

User: 'Read it.'
System: 'Message is: Thank you for your demo yesterday. The customers were most impressed.'

User: 'Forward it to Peter Smith and Sandra.'
System: 'Sandra Jones is already on the distribution list. Forwarding message to Peter Smith.'
(Resolution of a partially specified name, and a check on the preconditions for forwarding.)

System: 'Next message is from Colin North, regarding, Team Meeting.'
(The system assumes the user wants to carry on listing new messages unless told otherwise.)

User: 'Forward to Language Group.'
System: 'Forwarding message to all the people on your distribution list called Language Group. Next message is from Laurence Cooper: Plans for Workshop.'
(The system does not confirm on forwarding a message.)

User: 'Have I any recent messages from Steve Brown or Steve Clark?'
System: 'In the last week you have received 5 messages from Steve Brown and 4 from Steve Clark.'
(A slightly more complex user query; the system makes explicit what it thinks of as 'recent'.)

User: 'List the ones from Clark.'
System: 'Message 1 is about Meeting on the 31st. Message 2 is about Agenda for team meeting.'
(Resolution of a partially specified name.)

User: 'Let's hear it.'
System: 'Message is: Hi Alison, Have you completed the plans yet?'

User: 'Reply.'
System: 'I have four possible responses: yes, OK, no, or I'll call you.'
(Choice of several stock replies.)

User: 'Tell him yes.'
System: 'Reply to Laurence Cooper: Text is Yes. Please confirm.'
(The system confirms before sending a reply.)

User: 'OK.'
System: 'Reply sent. Next message is from...'

User: 'Stop.'
System: '... OK. What would you like?'
(Detection of the user speaking terminates the prompt; the listing of new messages is ended by the user.)
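A request such as 'Have I any recent messages from Steve Brown or Steve Clark?' decomposes into sender and time constraints over the mailbox. The sketch below illustrates one plausible mapping; the mailbox structure, the one-week reading of 'recent' (which Table 2 shows the system making explicit) and the substring test for partial-name resolution are all assumptions for illustration, not the BT design.

    from datetime import datetime, timedelta

    def recent_messages(mailbox, senders, now=None):
        """Count messages from each (possibly partially specified) sender
        received within the last week - an assumed reading of 'recent'."""
        now = now or datetime.now()
        cutoff = now - timedelta(days=7)
        counts = {s: 0 for s in senders}
        for msg in mailbox:
            for s in senders:
                # crude partial-name resolution: 'Clark' matches 'Steve Clark'
                if s.lower() in msg["from"].lower() and msg["date"] >= cutoff:
                    counts[s] += 1
        return counts

    mailbox = [
        {"from": "Steve Clark", "date": datetime.now() - timedelta(days=2),
         "subject": "Meeting on the 31st"},
        {"from": "Steve Brown", "date": datetime.now() - timedelta(days=5),
         "subject": "Plans for Workshop"},
    ]
    print(recent_messages(mailbox, ["Steve Brown", "Steve Clark"]))
    # {'Steve Brown': 1, 'Steve Clark': 1}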
3.3 Film access for interactive multimedia services

Interactive multimedia services (IMS) are currently being implemented by BT in Colchester and Ipswich [3]. The usability of the interface is still constrained by navigation using a TV remote control. This means that multiple layers of menus need to be traversed to get to the information required. Spoken language access would allow users to go straight to the information they are searching for, without requiring them to learn complex navigation procedures.

The video-on-demand subset of the IMS, which consists of over 4000 hours of material, including films, educational programmes, children's programmes, etc, was chosen. The SLS will allow users to give instructions such as: 'I want a comedy film starring Harrison Ford'. Part of the benefit of developing such a system is to ensure that the generic SLS framework is truly domain independent.

There is currently a text-based interface to the Internet movie database [4]. This allows users to enter queries such as: 'Tell me the ratings of comedy movies starring Harrison Ford'. The system performs the meaning extraction using a caseframe parser (section 4.2), which allows it to pick out the salient information from among extraneous words. It seems likely, from human analysis of typical queries about films, that this method is suitable.

An issue yet to be addressed is how best to reconcile the advantages of using speech with the limitations of current recognition technology. This is clearly illustrated in the present example, since the text-based interface can query the database of over 50 000 films and 100 000 cast names. No speech recogniser yet built can cope with this range of vocabulary. The obvious solution is to restrict the size of the database. A possible step in the right direction would be to couple the 'meaning extraction' component and the recogniser much more closely, so that meaning extraction and recognition happen simultaneously. This might enable the recogniser to cut down the vocabulary size 'on the fly'. For example, given the input sentence: 'Which comedy movies star Burt Lancaster,' it could be established straight away that the user was talking about comedies, then only about cinema films, and finally that the user was only interested in an actor. Therefore, by the time the recogniser gets to the name 'Burt Lancaster,' the number of possible words has reduced considerably. This is the subject of further research and is discussed in more detail in the next section.
4. Components of a spoken language system

4.1 Speech recognition

The job of a speech recogniser is typically thought of as converting a speech utterance into a string of text. The internal workings of speech recognisers are explained in some depth elsewhere [5]. This section looks at the recogniser's place within an SLS and, in particular, at the language model (LM) the recogniser uses and the form of output that it provides.

A language model embodies information about which words or phrases are more likely than others at a given point in a dialogue.

One might imagine that, in a system that accepts fluent language, for example an automated travel agent, the speech recogniser might need only one language model: that of the entire English language. It could then recognise anything that anyone said to it (assuming they were speaking English) and could inform the dialogue manager accordingly. However, speech recognition is not yet accurate enough, and a model of the entire English language does not exist. Instead, to get a working system, the recogniser must be given as much help as possible. It must be given hints about what the user is likely to say next, to improve the chances of correctly recognising what has been said. If the dialogue manager knows that the customer wants to go on a cruise and has just asked them where they would like to go, it should prime the recogniser to expect a response that may well concern one of a number of specified cruise ports and, by the same token, is unlikely to have anything to do with backpacking in Nepal.
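One way to realise such priming is for the dialogue manager to load a language model matched to the prompt it has just issued. The sketch below assumes a hypothetical recogniser interface, state names and word probabilities; it illustrates the idea rather than the BT implementation.

    # Hypothetical illustration of dialogue-state-dependent priming.

    class Recogniser:
        """Stand-in for a recogniser that accepts a word-probability table."""
        def load_language_model(self, model: dict) -> None:
            self.language_model = model

    LANGUAGE_MODELS = {
        # After 'Where would you like to go?', favour cruise ports;
        # words about backpacking in Nepal get no probability at all.
        "ask_destination": {"barcelona": 0.2, "lisbon": 0.2, "naples": 0.2,
                            "athens": 0.2, "i": 0.1, "want": 0.1},
        "confirm_booking": {"yes": 0.5, "no": 0.4, "maybe": 0.1},
    }

    def prime_recogniser(recogniser: Recogniser, dialogue_state: str) -> None:
        # Called by the dialogue manager each time it issues a prompt.
        recogniser.load_language_model(LANGUAGE_MODELS[dialogue_state])

    r = Recogniser()
    prime_recogniser(r, "ask_destination")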
The following aspects of language models are discussed below:

• language models,

• perplexity of a language model,

• advantages and disadvantages of language models,

• loading language models into the recogniser,

• output from the recogniser.

4.1.1 Language models for the recogniser

The primary knowledge source for the speech recognition component is a set of statistical models, known as hidden Markov models or HMMs, which encode how likely a given acoustic utterance is, given a string of spoken words. A recogniser can decode a speech utterance purely on the basis of this acoustic-phonetic knowledge, and this is basically what happens in the case of single isolated-word recognition. However, in the case of recognising a string of words (which form part of a spoken language), the recogniser can use a second knowledge source, namely the intrinsic probability of the given string. This second knowledge source is known as the language model.

To take a classic example, a given utterance may have almost equal acoustic-phonetic probabilities of being 'recognise speech' or 'wreck a nice beach'. However, the intrinsic probability of the first string is likely to be higher than that of the second, particularly if this utterance came from the domain of a technical journal on speech technology.

This can be expressed mathematically as follows. Let X be the acoustic utterance and let S be the sentence to be recognised. The recogniser then seeks the sentence which maximises P(S|X); by Bayes' rule this is equivalent to maximising the product P(X|S)P(S), where P(X|S) is supplied by the acoustic (HMM) models and P(S) by the language model.
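The standard combination of the two knowledge sources scores each candidate sentence by the product of its acoustic likelihood P(X|S) and its language model probability P(S), normally in the log domain. The sketch below replays the 'recognise speech' example with invented probabilities; in a real recogniser both terms come from trained models.

    import math

    candidates = {
        # sentence: (log P(X|S) from the HMMs, log P(S) from the LM)
        # All values are invented for illustration.
        "recognise speech":   (math.log(0.48), math.log(1e-6)),
        "wreck a nice beach": (math.log(0.52), math.log(1e-9)),
    }

    # Choose the sentence maximising log P(X|S) + log P(S).
    best = max(candidates, key=lambda s: sum(candidates[s]))
    print(best)
    # 'recognise speech': the language model outweighs the small acoustic edge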
