United States Patent [19]
Stanley et al.

US005684924A
[11] Patent Number: 5,684,924
[45] Date of Patent: Nov. 4, 1997
`
[54] USER ADAPTABLE SPEECH RECOGNITION SYSTEM

5,127,055   6/1992   Larkey ................................ 395/2.53
5,231,670   7/1993   Goldhor et al. ........................ 395/2.84
`
[75] Inventors: Barbara Ann Stanley, Lexington; Mary-Marshall Teel, Watertown; Susan Rousmaniere Avery, Westford; Vladimir Sejnoha, Cambridge, all of Mass.
`
[73] Assignee: Kurzweil Applied Intelligence, Inc., Waltham, Mass.
`
[21] Appl. No.: 444,391
[22] Filed: May 19, 1995
[51] Int. Cl.6 ................................... G10L 5/00
[52] U.S. Cl. .......................... 395/2.42; 395/2.53
[58] Field of Search ............ 395/2.6, 2.49, 2.47, 2.61, 2.39, 2.45, 2.64, 2.5, 2.31, 2.52, 2.53, 2.42
`
[56] References Cited

U.S. PATENT DOCUMENTS
`
OTHER PUBLICATIONS

"Building Better Wizards", User Interface Engineering, Dec. 1994.

Primary Examiner: Krista M. Zele
Assistant Examiner: Scott Wolinsky
Attorney, Agent, or Firm: Henry D. Pahl, Jr.

[57] ABSTRACT
`
A speech recognition system is disclosed which comprises a core speech recognition program and a plurality of utility program modules for adjusting various recognition parameters such as gain, sensitivity and acceptance threshold and for improving the training of word models. The system further provides a decision tree and utility controlling program module which can be invoked by a user confronting problems during the running of the core program. The system utilizes user input to traverse the decision tree and to initiate appropriate ones of the utility program modules, thereby to alter the on-going behavior of the core recognition program.
`
5,008,941   4/1991   Sejnoha ............................... 395/2.31

8 Claims, 4 Drawing Sheets
`
[Front-page drawing (reproduction of FIG. 2): utility modules for gain 35, sensitivity 36, train word 37, discriminative adjust 38, acceptance threshold 39 and gender select 40, with vocabulary models 34, a decision tree and utility control module, the core speech recognizer 33, a word processor 50, voice responsive user interface 32, operating system 31 and computer hardware 28; front end 20.]
`
Page 1

AMAZON 1016
Amazon v. SpeakWare
IPR2019-00999
[U.S. Patent, Nov. 4, 1997, Sheet 1 of 4, 5,684,924. FIG. 1: block diagram of the speech recognition system.]
[U.S. Patent, Nov. 4, 1997, Sheet 2 of 4, 5,684,924. FIG. 2: software component diagram. Utility modules for gain 35, sensitivity 36, train word 37, discriminative adjust 38 and acceptance threshold 39, together with vocabulary models 34, connect through a decision tree and utility control module to the core speech recognizer 33, which runs over the voice responsive user interface 32, operating system 31 and computer hardware 28; an application program is shown at 50.]
[U.S. Patent, Nov. 4, 1997, Sheet 3 of 4, 5,684,924. FIG. 3: decision tree flowchart. Problem choices: NOT RECOGNIZE A WORD 41, NOT HEAR ONE WORD 43, CONFUSES TWO WORDS 44, ONE WORD INTRUDES 45, HEARS NOISE AS WORD 47, HEARS WHEN NOT SPEAKING 49, OCCASIONALLY NOT HEAR 51, TOO LONG 53, DEAF 55, TOO LOUD 56, GENERALLY POOR RECOGNITION 58, RESET 59. Actions: identify word 61, input samples 62, train models 63; identify words 64, input samples 65, delete model 66; sound level check 67, adjust acceptance 69, adjust sensitivity 70; sound level checks 91, 93 and 95; adjust gain 94; verify gender 96; adjust acceptance 97; adjust sensitivity 98.]
[U.S. Patent, Nov. 4, 1997, Sheet 4 of 4, 5,684,924. FIG. 4: flowchart for improving discrimination between two words. Identify words 71; input samples of each 73; compare samples of desired word with corresponding existing models and select samples for addition 75; compare models of intruder word with samples of desired word 77; on "YES", delete closest intruder model 83, otherwise exit; on "NO" (no intruder models remain), add new models based on intruder samples 85; exit.]
USER ADAPTABLE SPEECH RECOGNITION SYSTEM

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to speech recognition systems and more particularly to a speech recognition system which facilitates the adjustment, by a non-technical user, of operating parameters of the system and of the training or selection of word models, thereby altering the on-going operating behavior of the recognition system itself.

2. Description of the Prior Art

While it is conventional to provide, with various application programs, a HELP file or even a so-called Wizard which guides an unfamiliar user through the operation of the functionality of the program, these prior aids have not been entirely successful in facilitating user interaction with speech recognition programs. The basic problem with these prior art approaches is that they provide aid or guidance in enabling the user to control the functionality or operation of the application program, but they do not basically affect or tune the on-going operation of the program itself. For example, there are so-called Wizards which aid a user in creating a chart using a spreadsheet program or in generating structured reports. In each case, however, the behavior of the underlying application program remains unchanged.

Unlike typical application programs, a speech recognition program must, to some extent, be tailored to a particular user and his environment in order to provide the highest accuracy of recognition and general ease of operation. Non-technical users can easily be confused about the effects of various adjustments and their impact on recognition results. For example, a user may easily confuse gain and sensitivity. As understood by those skilled in the art, but not typically by users, gain is the amplification provided between the input microphone and the conversion to digital values which typically precedes any analysis of an input utterance, while the term "sensitivity" is typically used to describe a controlled parameter which defines a discrimination level between background noise levels and sound levels which will be accepted as an utterance. Likewise, non-technical users may experience confusion between the sensitivity parameter and a threshold parameter. The threshold parameter is applied after scoring of models against an utterance to exclude the presumably unlikely models from being transferred to an application program, e.g. a word processor. As is understood by those skilled in the art, large vocabulary speech recognition systems typically employ one or more coarse preselection stages to narrow the field of vocabulary models before performing a detailed or high accuracy analysis of the reduced field of models.

Non-technical users may also not recognize when their personal speaking characteristics are interfering with recognition, so that it is appropriate to re-train one or more word models in the vocabulary or, in some cases, to actually delete models to aid discrimination between a pair of confusable words.

While utility program modules for adjusting the operating parameters of a speech recognition program are typically provided with that program, the selection of which utility program to run in order to improve performance may be obscure, even though the individual utility programs have user interfaces which aid the user in accomplishing the specific tasks performed by that utility.

Among the several objects of the present invention may be noted the provision of a novel, user-friendly speech recognition system; the provision of such a system which facilitates the adjustment of on-going system operating parameters by a user; the provision of such a system which allows a user to identify a problem which he has encountered during operation of the speech recognition system and to transparently invoke one or more utility programs to make corresponding changes in the on-going operation of the core speech recognition program; the provision of such a system which facilitates re-training of word models when user input identifies problems arising from mismatch of vocabulary models with the user's speech characteristics; the provision of such a system which allows a user to improve the discrimination between an identified pair of words; and the provision of such a system which is easy to use, which is highly reliable and which is of relatively simple and inexpensive implementation. Other objects and features will be in part apparent and in part pointed out hereinafter.

SUMMARY OF THE INVENTION

In a speech recognition system in accordance with the present invention, an input utterance from a user is provided to input means which provides an adjustable amplitude gain which is applied both to the input utterance and to background noise. The system incorporates a core recognition program which compares an input utterance with a multiplicity of trained word models and generates scores for each of at least a portion of the models. The core program has adjustable operating parameters, including a sensitivity parameter which controls the discrimination level between background noise and sound levels accepted as an utterance, and a threshold parameter for scores. The system also includes a plurality of primary utility program modules which respond to user input to change system parameters, e.g. the gain, sensitivity and threshold parameters. A training utility program module is also incorporated which responds to user input to further train a selected word model on the basis of a particular user's voice characteristics. A decision tree and utility controlling program module can be invoked by a user who confronts problems during running of the core program, and this module utilizes user input to traverse the decision tree and to initiate a corresponding utility program module, thereby to alter the on-going operating behavior of the core program.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a speech recognition system in accordance with the present invention;

FIG. 2 is a diagram illustrating the relationship of various software components employed in the speech recognition system;

FIG. 3 is a flow chart illustrating the operation of a decision tree and utility control component of the software of FIG. 2; and

FIG. 4 is a flow chart illustrating the operation of a software component which improves discrimination between a pair of words.

Corresponding reference characters indicate corresponding parts throughout the several views of the drawings.

DESCRIPTION OF THE PREFERRED EMBODIMENT

In common with many prior art systems, the preferred embodiment of the system of the present invention operates by first transducing acoustic speech waveforms to obtain corresponding electrical signals and then digitizing those
signals. With reference to FIG. 1, the transducer indicated there is a microphone 11 which is connected, through a suitable preamplifier 13, to an analog-to-digital converter 15. The gain of preamplifier 13 is preferably adjustable under software control. As is usual in the art, the digitized speech signal is treated to obtain, at a succession of sample times, a sequence of digital values or data frames which characterize the speech. In the embodiment illustrated, these values are obtained by passing the speech signal through a digital signal processor 17 which performs a Fourier transform so as to extract spectral features characterizing the input speech. The collection of digital values defining the input spectrum at a given moment of time is referred to hereinafter as a frame. Each frame may be considered to be a multidimensional vector, as understood by those skilled in the art.
Collectively, the front end circuitry is identified by reference character 20, which receives its input from microphone 11. Though the input signal processing is illustrated as being implemented digitally, it should be understood that analog filtering followed by analog-to-digital conversion might also be used. Likewise, while multichannel filtering is presently preferred, it should be understood that other methods of treating or encoding the raw input signal might also be employed, for example, linear predictive encoding, which might also be done by special purpose hardware.

A general purpose microcomputer system 23, e.g., one employing an Intel 80486 microprocessor, is provided for general system management and control functions, as well as for the processing of distance or scoring calculations. As is conventional, computer 23 incorporates a video display 24 and a keyboard 26 for providing interaction with the system user.
The raw spectral information obtained from the front end circuitry 20 is further preprocessed in the computer 23 to replace each sample or input frame with an index which corresponds to or identifies one of a predetermined set of standard or prototype spectral distributions or frames. In the particular embodiment being described, 1024 such standard frames are utilized. In the art, this substitution is conventionally referred to as vector quantization and the indices are commonly referred to as VQ indices. The preprocessing of the input data by the computer 23 also includes an estimating of the beginning and end of a word in an unknown speech input segment based on the energy level values. It should also be understood that the particular embodiment being described is a discrete word recognizer rather than a continuous speech recognizer, but the basic methods of the present invention could be applied to either type. To determine the start and end of words spoken by the user, the input circuitry preferably incorporates a software adjustable control parameter, designated the "sensitivity" value, which sets a threshold distinguishing user speech from background noise.
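The vector quantization step described above replaces each incoming spectral frame with the index of its nearest standard frame. A minimal sketch, assuming Euclidean distance over small illustrative vectors (the embodiment's 1024 prototypes and its actual distance metric are not specified here):

```python
import numpy as np

def vector_quantize(frames, prototypes):
    """Replace each spectral frame with the index of the nearest
    prototype frame (its VQ index). `prototypes` stands in for the
    patent's set of standard frames; all values are illustrative."""
    indices = []
    for frame in frames:
        # Euclidean distance to every prototype; keep the closest.
        dists = np.linalg.norm(prototypes - frame, axis=1)
        indices.append(int(np.argmin(dists)))
    return indices

# Toy example: 3 prototypes in a 2-dimensional spectral space.
protos = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
frames = np.array([[0.1, -0.1], [4.8, 5.2], [0.9, 1.1]])
print(vector_quantize(frames, protos))  # -> [0, 2, 1]
```

In a real recognizer the prototype set is produced by clustering training data; here it is fixed by hand purely to show the index substitution.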
As indicated previously, vocabulary models are represented by sequences of standard or prototype states. Rather than representing spectral distributions, the state indices identify or correspond to probability distribution functions. The state spectral index essentially serves as a pointer into a table which identifies, for each state index, the set of probabilities that each prototype frame or VQ index will be observed to correspond to that state index. The table is, in effect, a precalculated mapping between all possible frame indices and all state indices. Thus, for comparing a single frame and single state, a distance measurement or a measure of match can be obtained by directly indexing into the tables using the respective indices and combining the values obtained with appropriate weighting. It is thus possible to build a table or array storing a distance metric representing the closeness of match of each standard or prototype input frame with each standard or prototype model state.
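The precalculated table described above can be sketched as follows. The probability values are illustrative stand-ins for statistically trained distributions, and the negative-log "distance" is one common choice rather than the patent's actual metric:

```python
import numpy as np

def build_distance_table(prototype_frames, state_distributions):
    """Precompute a [num_states x num_prototypes] table so that
    matching a VQ index against a state index becomes one lookup.
    Here 'distance' is a negative log probability (illustrative)."""
    num_states = len(state_distributions)
    num_protos = len(prototype_frames)
    table = np.zeros((num_states, num_protos))
    for s, dist in enumerate(state_distributions):
        for p in range(num_protos):
            table[s, p] = -np.log(dist[p])  # cost of prototype p in state s
    return table

def frame_state_distance(table, vq_index, state_index):
    # Comparing a frame with a state is just an indexed lookup.
    return table[state_index, vq_index]

# Two states over three prototype frames (probabilities sum to 1).
state_probs = [np.array([0.7, 0.2, 0.1]), np.array([0.1, 0.1, 0.8])]
table = build_distance_table(range(3), state_probs)
# State 1 strongly expects prototype 2, so that pairing is cheapest.
assert frame_state_distance(table, 2, 1) < frame_state_distance(table, 0, 1)
```

The point of the table is that no probability arithmetic happens at recognition time; scoring a frame against a state is a single array access.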
As is understood by those skilled in the art, the distance or likelihood values which fill the tables can be generated by statistical training methods. Various such training methods are known in the art and, as they do not form a part of the present invention, they are not described in further detail herein. Rather, for the purposes of the present invention, it is merely assumed that there is some metric for determining degree of match or likelihood of correspondence between input frames and the states which are used to represent vocabulary models. A preferred system for precalculating and storing a table of distance measurements is disclosed in co-assigned U.S. Pat. No. 5,546,499, issued Aug. 13, 1996 to Thomas Lynch, Vladimir Sejnoha and Thomas Dinger, and entitled Speech Recognition System Utilizing Precalculated Similarity Measurements. The disclosure of that patent is incorporated herein by reference.
As is understood by those skilled in the art, natural variations in speaking rate require that some method be employed for time aligning a sequence of frames representing an unknown speech segment with each sequence of states representing a vocabulary word. This process is commonly referred to as time warping. The sequence of frames which constitute the unknown speech segment, taken together with a sequence of states representing a vocabulary model, in effect define a matrix, and the time warping process involves finding a path across the matrix which produces the best score, e.g., least distance or cost. The distance or cost is typically arrived at by accumulating the cost or distance values associated with each pairing of frame index with state index, as described previously with respect to the VQ (vector quantization) process.
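The time warping just described can be sketched as a standard dynamic-programming pass over the frame-by-state cost matrix. This is a generic minimal version, not the patent's implementation:

```python
def time_warp_score(frame_costs):
    """Minimal dynamic-time-warping pass over a precomputed cost
    matrix frame_costs[i][j] = distance between input frame i and
    model state j. Returns the least accumulated cost of any
    monotone path from the first frame/state to the last."""
    n, m = len(frame_costs), len(frame_costs[0])
    INF = float("inf")
    acc = [[INF] * m for _ in range(n)]
    acc[0][0] = frame_costs[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                acc[i - 1][j] if i > 0 else INF,                # stay in state j
                acc[i - 1][j - 1] if i > 0 and j > 0 else INF,  # advance one state
                acc[i][j - 1] if j > 0 else INF,                # skip ahead a state
            )
            acc[i][j] = frame_costs[i][j] + best_prev
    return acc[-1][-1]

costs = [[1, 4, 5],
         [2, 1, 4],
         [5, 2, 1]]
print(time_warp_score(costs))  # -> 3 (diagonal path 1 + 1 + 1)
```

Each cell of `frame_costs` would, in the system described, come from the precalculated distance table rather than being computed on the fly.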
As is also understood by those skilled in the art, the final comparing of an unknown speech segment with competing vocabulary models must be highly accurate if the procedure is to discern between similar and competing vocabulary models in a system having a large vocabulary. High accuracy comparisons are computationally intensive, even using vector quantization techniques, since a large number of possible paths across the matrix must be explored. Accordingly, in order to reduce the number of high accuracy comparisons which must be performed, the apparatus of the present invention preferably also utilizes a succession of screening steps to preselect candidates. Such preselection processes are known in the art. It may be noted that the time warping functions and the pre-screening function are both facilitated by the availability of precalculated distance measurements, as is the final and most precise comparison of an input utterance with candidate word models. A preferred system for preselection is described in co-assigned U.S. Pat. No. 5,386,492, issued Jan. 31, 1995 to Wilson et al.
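The two-stage structure (coarse preselection followed by high-accuracy comparison) can be sketched as follows. The scoring functions and shortlist size here are illustrative placeholders, not the method of the cited preselection patent:

```python
def recognize(utterance, models, coarse_score, fine_score, shortlist=5):
    """Two-stage matching: a cheap coarse score screens the whole
    vocabulary, and the expensive fine score runs only on the
    survivors. Both scoring functions are caller-supplied stand-ins
    for the recognizer's preselection and detailed comparisons."""
    # Coarse pass: keep the `shortlist` lowest-cost candidates.
    ranked = sorted(models, key=lambda m: coarse_score(utterance, m))
    candidates = ranked[:shortlist]
    # Fine pass: precise comparison on the reduced field only.
    return min(candidates, key=lambda m: fine_score(utterance, m))

# Toy demonstration with numeric "models" and simple difference costs.
vocab = [3, 8, 15, 22, 40, 41]
best = recognize(20, vocab,
                 coarse_score=lambda u, m: abs(u - m),
                 fine_score=lambda u, m: (u - m) ** 2,
                 shortlist=3)
print(best)  # -> 22
```

The benefit is that the expensive fine comparison runs on three candidates instead of the full vocabulary, which is the point of the screening stages described above.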
Preferably the system can provide a plurality of models for each vocabulary word and, preferably also, the system incorporates means for improving the training of models during use. A preferred form of automatic training is disclosed in co-assigned U.S. Pat. No. 5,127,055, issued Jun. 30, 1992 to Leah S. Larkey. The automatic training is based upon how often each model is involved in correct and incorrect recognitions. In the particular version employed in the embodiment being described, up to eight models can be available for a given word. If improved samples based upon actual use by a particular user are available, they are added and less well performing models are deleted.
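The usage-based model pool just described can be sketched as follows, assuming each word keeps (model, correct-use count) pairs. The bookkeeping and the counts are illustrative, not the procedure of U.S. Pat. No. 5,127,055:

```python
def update_model_pool(pool, new_model, max_models=8):
    """Maintain up to `max_models` models per word: add the newly
    trained model and, if the cap is exceeded, drop the model that
    has contributed to correct recognitions least often. Each entry
    is a (model_data, correct_use_count) pair (illustrative layout)."""
    pool = pool + [new_model]
    if len(pool) > max_models:
        worst = min(pool, key=lambda entry: entry[1])  # least useful model
        pool.remove(worst)
    return pool

# Three models exist; adding a fourth over a cap of three evicts the
# least-used one ("m2"). Counts are made up for the demonstration.
pool = [("m1", 12), ("m2", 3), ("m3", 7)]
pool = update_model_pool(pool, ("m4", 5), max_models=3)
print([name for name, _ in pool])  # -> ['m1', 'm3', 'm4']
```

The cap of eight models per word matches the embodiment described above; everything else in the sketch is a stand-in.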
In accordance with one aspect of the present invention, the software portion of the system is organized into various
modules, not all of which are operative at any given time. Referring now to FIG. 2, the operating system for the computer hardware 28 is designated generally by reference character 31. The operating system may, for example, be the Windows operating system available from Microsoft Corporation of Bellevue, Washington. A voice responsive user interface operating in conjunction with the Windows operating system is designated by reference character 32. The presently preferred form of user interface is that disclosed in copending, co-assigned application Ser. No. 08/128,990, filed Sep. 29, 1993, by William F. Cote et al., and entitled System For Controlling Multiple User Application Programs By Spoken Input. The disclosure of that application is incorporated herein by reference.
The software which performs the basic comparison and pattern recognition functions is generally designated by reference character 33, while the collection of vocabulary models is designated by reference character 34. As is also understood by those skilled in the art, speech recognition systems are for the most part employed to provide translations of spoken input to another user application program. For example, the user may wish to conduct word processing, spreadsheet or report generation functions utilizing speech input rather than keyboard input. By way of example, a word processing program is shown and designated by reference character 50. The system also incorporates various utility program modules which are called into play as needed, as described hereinafter. Utility program modules for adjusting gain and sensitivity are indicated by reference characters 35 and 36, respectively. The utility for training up vocabulary models from user provided spoken samples is indicated at reference character 37. As is understood by those skilled in the art, it may also be useful to train models of noises which commonly occur in the user's environment. When the system then recognizes a sound as being an unwanted noise rather than a spoken word, the translation can be designated as a null or no action code so that no input is provided to the application program, e.g. the word processor 50.
As is described in greater detail hereinafter, it is sometimes useful to improve the discrimination between an identified pair of words, and a utility program for performing that function is indicated at reference character 38. Utility program modules for setting the acceptance threshold level and for designating gender are indicated by reference characters 39 and 40, respectively. As is understood by those skilled in the art, the utility program modules operate to modify or adjust parameters and/or model data which are essentially a part of the speech recognition system itself and which affect or determine ongoing operational characteristics.

As indicated earlier, a technically sophisticated user may be able to reliably identify a problem in the ongoing operation of the speech recognizer and to invoke, typically through keyboard commands, an appropriate utility program for making the necessary adjustments in the ongoing operating parameters of the recognizer. However, for most users, who are merely interested in performing the intended work, e.g., word processing, the choice of utility and action to take is often confusing and incomprehensible. In accordance with the present invention, there is provided a decision tree and utility control module, as indicated at reference character 52. The functioning of this program module is illustrated in the flowchart of FIG. 3.
Referring now to the decision tree flowchart illustrated in FIG. 3, the blocks along the left hand side represent different problem descriptions which a user can choose and thereby invoke an adaptation or tuning procedure in accordance with the present invention. As will be seen, some of the problem definitions are merely different approaches or wordings for essentially the same problem and solution, while other problem statements, though appearing similar, actually invoke different procedures.
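The decision tree and utility control module can be pictured as a mapping from user-selected problem descriptions to sequences of utility invocations. The table below is a hypothetical sketch loosely following FIG. 3; the utility names are placeholders for the modules of FIG. 2, not the patent's actual code:

```python
# Hypothetical problem-to-utilities table modeled on FIG. 3.
DECISION_TREE = {
    "does not recognize a word":  ["identify_word", "input_samples", "train_models"],
    "confuses two words":         ["identify_words", "input_samples", "delete_model"],
    "hears when not speaking":    ["sound_level_check", "adjust_acceptance", "adjust_sensitivity"],
    "deaf":                       ["sound_level_check", "adjust_gain"],
    "generally poor recognition": ["sound_level_check", "verify_gender"],
    "reset":                      ["adjust_acceptance", "adjust_sensitivity"],
}

def handle_problem(description, utilities):
    """Traverse the tree for the user's chosen problem description and
    invoke the corresponding utility program modules in order."""
    invoked = []
    for name in DECISION_TREE[description]:
        utilities[name]()  # run the utility program module
        invoked.append(name)
    return invoked

# Stub utilities that simply record being called.
calls = []
stubs = {n: (lambda n=n: calls.append(n))
         for steps in DECISION_TREE.values() for n in steps}
print(handle_problem("deaf", stubs))  # -> ['sound_level_check', 'adjust_gain']
```

The key property, matching the description above, is that the user only names a symptom; which utilities run, and in what order, is decided by the tree.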
If the user chooses problem description 41, i.e., that the system does NOT RECOGNIZE A WORD, the user is prompted to identify that word from the vocabulary as indicated at block 61. He will also be guided, at block 62, into speaking samples of the identified word. The samples are then utilized by the training module or utility program 63 to generate new models of the vocabulary word which can be utilized in subsequent recognition operations. In the preferred embodiment of the present invention, up to eight models can be provided for each vocabulary word. When new models beyond the eight are generated, the pre-existing model which has been used least often in providing correct results is deleted, as described in greater detail in the previously identified U.S. Pat. No. 5,127,055.
If the user chooses problem description 44, i.e., that the system consistently CONFUSES TWO WORDS, the user is prompted to identify the words, as indicated by reference character 48, but then a somewhat different procedure is initiated, as indicated by reference characters 64, 65 and 66, to improve the distinguishability of the words. This procedure is illustrated in greater detail in FIG. 4.

As illustrated in FIG. 4, the user is first prompted to identify the words which are being confused, as represented at block 71, and is then prompted (block 73) to input a preselected number of samples, e.g., three, of each of the words. As indicated at block 75, the system then compares the samples of the desired word with corresponding existing models for that word and selects, for addition to the collection of models, those sample-derived new models which are closest to the existing models. Again, if the total number of models exceeds the number provided for in the system, e.g., eight, the least used pre-existing model is deleted as described previously.
In accordance with one aspect of the present invention, it has been found that merely adding newly trained models for the intruder is not typically effective in curing the problems encountered by a user who perceives two words as being confused. Rather, the problem originates with the existence and inclusion of models which do not well represent the intruder word, particularly as compared with the desired word. Accordingly, all existing models of the intruder word are compared with the samples of the desired word at block 77, and, if any existing model of the intruder is within a preselected threshold distance or metric of the desired word, as indicated by the "YES" output of decision diamond 79, the closest one is deleted at block 83.

If, after the deletion, there are no remaining intruder word models, as indicated by the "NO" output of decision diamond 81, the newly obtained samples of the intruder word are used to generate models as indicated at block 85. If multiple models for the intruder word exist at the start of the procedure, the newly obtained samples are not used.
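The deletion step of the FIG. 4 procedure can be sketched as follows, with scalar "models" and an absolute-difference distance standing in for real word models and the recognizer's scoring:

```python
def improve_discrimination(intruder_models, desired_samples,
                           distance, threshold):
    """Sketch of the FIG. 4 deletion step: any intruder-word model
    lying within `threshold` of the desired word's samples is taken
    to misrepresent the intruder, and the closest such model is
    deleted. Rebuilding models from fresh intruder samples when none
    remain is not shown. `distance` is a caller-supplied stand-in."""
    # Distance of each intruder model to its nearest desired sample.
    closeness = {m: min(distance(m, s) for s in desired_samples)
                 for m in intruder_models}
    close = [m for m, d in closeness.items() if d < threshold]
    if close:
        worst = min(close, key=closeness.get)  # closest to the desired word
        intruder_models = [m for m in intruder_models if m != worst]
    return intruder_models

models = [10.0, 14.0, 30.0]   # intruder models (illustrative scalars)
samples = [11.0, 12.0]        # samples of the desired word
kept = improve_discrimination(models, samples,
                              distance=lambda a, b: abs(a - b),
                              threshold=3.0)
print(kept)  # -> [14.0, 30.0]
```

Here the model at 10.0 sits closest to the desired word's samples, so it is the one removed, mirroring the "delete closest intruder model" box of FIG. 4.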
Referring back to FIG. 3, if the user selects the problem identified as one wherein ONE WORD INTRUDES (block 45), essentially similar procedures are invoked. Likewise, if the user selects the problem (43) that the system does NOT HEAR ONE WORD, essentially similar procedures are invoked, but the noise models are considered as possible intruders since, as described previously, the translation of a noise model is a null or "no action". Thus, to allow the desired word to be recognized, it may be necessary to delete an intruding noise model in accordance with the procedures described with reference to FIG. 4.
If the system is hearing a common noise as a word, as indicated by reference character 47, the user is prompted merely to indicate that it is a noise that he wishes to characterize, and he is then prompted to cause the noise to occur. The model then created from the sample is tagged as a noise model which, as indicated previously, is "recognized" in the same sense as a word, but is ignored in the sense that no translation is provided to the application program, e.g., word processing, for which the user is employing the speech recognition system as an input mechanism.
If the user indicates that the system HEARS when he is NOT SPEAKING (block 49), the user is prompted to perform a sound level check, as indicated at block 67. The acceptance level is then adjusted as indicated at block 69, and likewise the sensitivity level is adjusted as indicated at block 70. Essentially similar procedures are followed if the user indicates that the system occasionally does not hear him, i.e., as indicated at block 51.

If the system repeatedly reports to the user that the discrete word input is TOO LONG, the user, by identifying this problem at block 53, can be prompted to invoke the sound level check, as indicated at block 91, which is then utilized to adjust sensitivity (block 70).
If the user indicates that the system is DEAF, as indicated at block 55, he is prompted to initiate a sound level check as indicated at block 93. The gain is then adjusted as indicated at block 94. Similarly, if the user input is repeatedly identified as being TOO LOUD and the user selects that problem (block 56), he is then prompted to do a sound level check (block 93) and to adjust gain (block 94).

If the user indicates that he is obtaining generally poor recognition results, as indicated at block 58, he is prompted to first perform a sound level check as indicated at block 95 and to then designate gender as indicated at block 96. As indicated previously, the preferred embodiment of the present invention preferably provides different data transformations for model data based upon gender to obtain improved recognition results. As a final choice, the user can elect to reset parameters as indicated at block 59, in which case he is prompted to adjust the acceptance threshold, as indicated at block 97, and sensitivity is also adjusted, as indicated at block 98.
As indicated, the new setting levels are stored with the recognition program itself and employed in all future recognition, i.e., until further adjustments are made, so that the ongoing performance of the recognition system is altered. Thus, by periodically invoking the decision tree mechanism, the operation of the overall system will increasingly adapt to the characteristics of a particular user and will provide better performance and higher accuracy of recognition on an ongoing basis.

In view of the foregoing it may be seen that several objects of the present invention are achieved and other advantageous results have been attained.

As various changes could be made in the above constructions without departing from the scope of the invention, it should be understood that all matter contained in the above description or shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.
What is claimed is:

1. A speech recognition system comprising:

input means for receiving an input utterance from a user and receiving also background noise;

a core speech recognizer which compares an input utterance with a multiplicity of trained models representing words and generates scores for each of at least some of said models, which scores represent the degree of match of the respective model with the utterance,

said core speech recognizer having a plurality of adjustable parameters including a sensitivity parameter which controls a discrimination level defining sound levels accepted as an utterance rather than background noise;

a first utility program module which responds to user input to change said sensitivity parameter;

a training utility program module which responds to user input to further train a selected word model on the basis of a particular user's voice characteristics;

a discrimination improving utility program module which responds to user input identifying a desired word and an intruder word to improve the differentiation in match between the respective models representing the desired and intruder words; and

a decision tree and utility controlling program module which can be invoked by a user confronting problems during running of said core speech recognizer and which utilizes input provided by the user to traverse said decision tree and to initiate corresponding ones of said utility program modules and thereby al
