McAuliffe et al.

US005870705A
[11] Patent Number: 5,870,705
[45] Date of Patent: Feb. 9, 1999

[54] METHOD OF SETTING INPUT LEVELS IN A VOICE RECOGNITION SYSTEM
[75] Inventors: Garrett McAuliffe, Kirkland; Leonard Zuvela, Mukilteo, both of Wash.
[73] Assignee: Microsoft Corporation, Redmond, Wash.
[21] Appl. No.: 327,543
[22] Filed: Oct. 21, 1994
[51] Int. Cl.: G10L 9/00
[52] U.S. Cl.: 704/225; 704/275; 704/200; 381/106; 381/107; 381/108
[58] Field of Search: 395/2.34, 2.84, 395/2; 381/68.4, 28, 107, 106, 108

[56] References Cited

U.S. PATENT DOCUMENTS
4,250,637   2/1981  Scott
4,292,469   9/1981  Scott et al.
4,297,527  10/1981  Pate ................ 381/107
4,354,064  10/1982  Scott
4,383,135   5/1983  Scott et al.
4,455,676   6/1984  Kaneda .............. 381/106
4,468,204   8/1984  Scott et al.
4,495,384   1/1985  Scott et al.
4,610,023   9/1986  Noso et al. ......... 395/2.84
4,672,667   6/1987  Scott et al.
4,776,016  10/1988  Hansen .............. 395/2.84
4,777,649  10/1988  Carlson et al. ...... 381/26
4,783,803  11/1988  Baker et al.
4,829,576   5/1989  Porter
4,829,578   5/1989  Roberts
4,837,831   6/1989  Gillick et al.
4,866,778   9/1989  Baker
[entry illegible in source] ............. 381/31
4,914,703   4/1990  Gillick
4,969,193  11/1990  Scott et al.
5,025,471   6/1991  [name illegible in source]
5,027,406   6/1991  Roberts et al.
5,208,866   5/1993  Kato et al. ......... 381/107
5,267,322  11/1993  Smith et al. ........ 381/107
5,345,538   9/1994  Narayannan et al. ... 395/2.84
5,363,147  11/1994  Joseph et al. ....... 381/108
Primary Examiner: David R. Hudspeth
Assistant Examiner: Vijay B. Chawan
Attorney, Agent, or Firm: Ratner & Prestia

[57] ABSTRACT

A computer implemented voice recognition method and system for adjusting an input level to adjust the input signal amplitude level of spoken words to enhance voice recognition. A user is prompted with a word to speak into a microphone. The spoken word is converted into an analog electrical signal having an input signal amplitude level. A sound card then converts this analog signal to a digital stream of data. This input signal amplitude level is compared to a reference amplitude level. An adjustment to an input volume control is made with respect to the comparison to adjust the input signal amplitude level to approach the reference amplitude level. The invention also uses an iterative process for a set number of iterations to make the adjustment for the input signal amplitude level to approach the reference amplitude level.

28 Claims, 4 Drawing Sheets
`
[Front-page figure: flow chart. Recoverable box labels: Prompt User for Word; Generate Input Signal Amplitude Level; Word Detected; Compare Input Signal Amplitude Level to Preselected Signal Amplitude Level; Start Over; Acceptable. Reference numerals visible: 28, 30.]
`
`Page 1
`
`AMAZON 1017
`Amazon v. SpeakWare
`IPR2019-00999
`
`
`
U.S. Patent    Feb. 9, 1999    Sheet 1 of 4    5,870,705

[Flow chart. Recoverable box labels: Prompt User for Word; Generate Input Signal Amplitude Level; Word Detected; Compare Input Signal Amplitude Level to Preselected Signal Amplitude Level; No; Start Over; Acceptable. Reference numerals visible: 12, 14, 28, 30, 32.]
`
`
`
`
U.S. Patent    Feb. 9, 1999    Sheet 2 of 4    5,870,705

[Drawing sheet 2 of 4: figure content not recoverable from the source.]
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
U.S. Patent    Feb. 9, 1999    Sheet 3 of 4    5,870,705

FIG. 3 (Prior Art)
`
`
`
`
U.S. Patent    Feb. 9, 1999    Sheet 4 of 4    5,870,705
`
`
`
`
`
`
METHOD OF SETTING INPUT LEVELS IN A VOICE RECOGNITION SYSTEM

FIELD OF THE INVENTION

This invention relates to a method and system for adjusting audio input volume for a system which uses voice recognition.

BACKGROUND OF THE INVENTION

Voice recognition is the process by which spoken words are interpreted and "understood" by a computer. Voice recognition systems thus become another means for entering data and controlling a computer, similar to the function of a keyboard or a pointing device (e.g., mouse).

In a typical voice recognition system, a user speaks into an input device such as a microphone, which converts the audible sound waves of voice into an analog electrical signal. This analog electrical signal has a characteristic waveform defined by several factors including the volume at which the words are spoken. The volume component of the spoken word translates into the amplitude of the waveform.

Voice recognition involves pattern matching to compare the electrical signal associated with a spoken word against a reference signal associated with a "known" word. A "known" word is stored in a computer by a user. In a typical system, the user speaks a word into a microphone and the electrical signal of this spoken word is associated with a typed word. Instead of a typed word, the word can also be called up from a database, for example. After a word is "known," voice recognition can take place.

Thus, if the electrical signal of a spoken word matches the waveform of the reference signal of the "known" word, within an acceptable range of error, the system "recognizes" the spoken word as the "known" word (which has previously been associated with the reference signal). A software application which uses voice recognition could then use the voice input for entering data or controlling a software application (similar to the way a keyboard would be used). For example, in a word processor or dictation system using voice recognition, text could be audibly entered into the body of a document via a microphone instead of typing the words into the text on a keyboard.

Digital signal processing can be used to provide an accurate comparison between the waveform of the voice audio input and that of the reference signal. Digital signal processing requires that the waveform of the voice audio input, as well as the waveform of the reference signal, be represented as digital signals. Having a sufficient amplitude level for the voice audio input provides a better signal for conversion to a digital signal and thus a better reference signal for voice recognition. If the amplitude level is too low, there may not be enough range in the electrical signal of either the reference signal or the spoken word to provide a high enough level of confidence that the electrical signal of a spoken word matches that of the "known" word. If the amplitude level is too high, certain attributes of the electrical signals may be "clipped." This, too, may lower the confidence level of the pattern matching. In more extreme cases, the electrical signals may be too low or too high, resulting in no match. The sufficiency of the amplitude level is determined for a particular voice recognition "engine." The voice recognition engine is software or hardware which carries out the interpretation and analysis of the voice audio input (or its digital representative) to determine whether a match has occurred and the confidence level of the match. The Dragon Recognizer by Dragon Systems, Inc. of Newton, Mass. is an example of a voice recognition engine which can be run on a personal computer.

Because different users speak at different sound levels, and because background sound levels differ, both of which can affect the reception of a user's speech by a voice recognition system, it is likely that in many situations a voice recognition engine or system may not function at an optimal level. The audio input may not be within the acceptable range for the voice recognition engine being used.

Computers equipped for voice recognition typically have a sound card in addition to an input device such as a microphone. A sound card typically includes a coder/decoder or CODEC. The Microsoft Sound System sound card uses the Analog Devices AD1848 Parallel-Port SoundPort Stereo CODEC. Among other functions, the CODEC contains an input volume control which can be used to adjust the amplitude level of an analog input signal from the microphone. The CODEC also converts an analog signal (representative of a voice input) into a digital signal. The digital signal can then be transmitted from the sound card through the computer bus for processing (such as pattern matching) by the computer.

One widely sold operating system program which helps control a computer is WINDOWS(TM) version 3.1 ("WINDOWS") of Microsoft Corporation. Among other features, WINDOWS provides a graphical user interface allowing the user the option of using a pointing device such as a mouse to control the operation of the computer without the need to memorize text commands usually required in DOS based applications. WINDOWS also provides application programmers with tools so that applications have a common look in structure as well as execution of common operations. A WINDOWS application programmer is thus provided with a variety of tools to assist in controlling various computer functions as well as designing "user friendly" applications.

A software program written for WINDOWS operation uses dynamic link libraries (DLLs) which contain a plurality of application programming interfaces (APIs). Examples of such DLLs are USER.EXE, KRNL386.EXE, and GDI.EXE, which contain the core functionality APIs that make up Microsoft Windows 3.1. Although each of these three DLLs has the .EXE extension (usually representing an executable application), each is a DLL. The APIs are used to carry out various WINDOWS functions. For example, if a software program requires a dialog box displayed on a computer monitor to prompt a user for a command or data entry, the software program would make a call to the DialogBox API, which brings up a dialog box on the computer monitor. The contents of the dialog box are local to or associated with the particular application which made the call. Another example of a WINDOWS API is the SetWindowLong API. This API associates data with a particular window, allowing a user who has switched applications to return to the point in the original application where processing had been taking place prior to the switch to the other application. WINDOWS operation and WINDOWS programming, including the use of DLLs and APIs, are well known by those skilled in the art.

The Microsoft WINDOWS Software Development Kit, Guide to Programming, Volumes 1-3, 1992, is incorporated by reference herein. It is available to and used by WINDOWS programmers and provides reference information for many of the DLLs and APIs which are available to WINDOWS programmers.

WINDOWS, while providing ease of use for running applications, may serve as a platform for a voice recognition system. WINDOWS lacks, however, a system for adjusting input levels to optimize voice recognition for a particular user at a given location.
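
The Background's point about input level (too low leaves little range, too high clips) can be illustrated with a small sketch. It assumes 16-bit signed samples and a simple linear gain, neither of which is stated in the source; `apply_gain` and `clipped_fraction` are hypothetical helper names used only for illustration.

```python
import math

INT16_MAX = 32767   # assumed 16-bit sample format; not stated in the source
INT16_MIN = -32768


def apply_gain(samples, gain):
    """Scale samples by a linear gain and clamp them to the 16-bit range."""
    out = []
    for s in samples:
        v = int(round(s * gain))
        out.append(max(INT16_MIN, min(INT16_MAX, v)))
    return out


def clipped_fraction(samples):
    """Fraction of samples sitting at either extreme of the 16-bit range."""
    if not samples:
        return 0.0
    hits = sum(1 for s in samples if s in (INT16_MIN, INT16_MAX))
    return hits / len(samples)


# A 440 Hz test tone at roughly half of full scale, sampled at 8 kHz
# (illustrative values only).
tone = [int(16000 * math.sin(2 * math.pi * 440 * n / 8000)) for n in range(800)]

quiet = apply_gain(tone, 0.01)  # too low: the waveform spans only a small range
loud = apply_gain(tone, 4.0)    # too high: peaks flatten against the limits
```

With these illustrative numbers the quiet copy never exceeds a magnitude of 160, while most samples of the loud copy sit at the clamp limits, which is the "clipped" condition the text describes.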
A speech detection and recognition apparatus for use with background noise of varying levels is described in U.S. Pat. No. 4,829,578, to Roberts. The apparatus compares the amplitude of an audio signal during successive time periods with certain speech detection thresholds and generates an indication of whether the signal contains speech. The amplitude of the audio signal is altered relative to speech detection thresholds as a function of background noise signals which are detected to improve speech detection.

Roberts, and other systems which relate to speech detection, do not address adjusting the input amplitude level to assist in and improve voice recognition. Still further, there is a lack of a system to make systematic adjustments to input amplitude levels by sampling a user's speech and analyzing it in a controlled fashion and then adjusting an input device based on that sampling and analysis.

SUMMARY OF THE INVENTION

There is provided, in accordance with the present invention, a voice recognition method and system for adjusting an input volume control of an input device. This, in turn, adjusts an input signal amplitude level of a word spoken into the input device. The user is prompted with a word to speak into an input device, such as a microphone connected to a sound card. The input device converts the spoken word into an electrical signal with an amplitude level ("input signal amplitude level") relative to the volume at which the words were spoken. The input signal amplitude level is compared against a preselected reference signal amplitude level. The preselected reference signal amplitude level is set to a level to enhance voice recognition. An input volume control of the input device is then adjusted to cause the input signal amplitude level to approach the preselected reference signal amplitude level in a predetermined manner.

In a preferred embodiment of the present invention, a step of determining if a word was spoken is performed prior to comparing the spoken word to a reference. The steps of prompting the user for a word, generating an input signal amplitude level for the spoken word, determining if a word was spoken, comparing the input signal amplitude level of the spoken word with respect to the preselected reference signal amplitude level, and adjusting an input volume control with respect to the comparison are repeated nine times. During each of the nine iterations, the user is prompted to speak a different word than had been previously prompted.

BRIEF DESCRIPTION OF THE FIGURES

The invention will now be described by way of non-limiting example, with reference to the attached drawings in which:

FIG. 1 is a flow diagram showing the method which operates in accordance with the present invention;

FIG. 2 is a flow chart showing the operation of process block 24 shown in FIG. 1;

FIG. 3 shows a personal computer and associated peripheral devices used in operating the system and performing the method in accordance with the present invention; and

FIG. 4 shows a screen display in accordance with the present invention.

DETAILED DESCRIPTION OF THE INVENTION
There is shown in FIG. 3 an example computer system 50 for carrying out process 10 shown in the flow charts of FIGS. 1 and 2. Computer system 50 comprises a personal computer 42 having several peripheral devices including monitor 44, keyboard 46, mouse 48 (resting on mouse pad 60), microphone 52, sound card 54 (including a CODEC 56) plugged inside of computer 42, and a speaker 58. The present invention is not limited to the configuration for computer system 50 shown in FIG. 3. Other configurations which can operate the present method and system will be understood by those skilled in the art.

Microphone 52 is an input device for generating a representative input signal amplitude level, as microphone 52 converts acoustic energy into an analog electrical signal including audio signal information. Microphone 52 is shown connected to sound card 54 in FIG. 3. The combination of microphone 52 and sound card 54 can also be viewed as an input device for generating an input signal amplitude level. A device having a similar function, such as a microphone containing circuitry to carry out the functions of a sound card, including an input volume control, could also serve as an input device.

Other input devices, such as an optical storage device or magnetic storage device, could also serve as an input device containing prerecorded "audio" information. In such a system, a digital representation of the audio information is stored on the respective storage device, and the stored digital audio information could also be used directly in determining an input signal amplitude level.

In a preferred embodiment of the present invention, a microphone is used, and the invention thus operates on a real time signal, not a prerecorded signal.

As previously mentioned, microphone 52 is connected to sound card 54. Sound card 54 handles the interface between audio input and output (I/O) and the computer. It also converts analog signals (i.e., audio waveforms) into a stream of digital data. An example of sound card 54 is the Microsoft WINDOWS Sound System model #206-151v200, which contains a CODEC 56. An example for CODEC 56 is the Analog Devices AD1848 Parallel-Port SoundPort Stereo CODEC ("AD1848 CODEC"). The operation of the AD1848 CODEC is described in the Analog Devices Specification REV 0 for the AD1848 CODEC. Sound card 54, CODEC 56, and the Windows Sound System software allow adjustments to the volume, or amplitude level, of an input signal to enhance voice recognition. Other sound cards or devices which handle the interface between audio input/output (I/O) and computer 42 can be used in the system of the present invention.

A monitor 44 is used in the present invention to display a visual prompt to a user with words and messages. Although in a preferred embodiment the user is provided with visual prompts on monitor 44, a user could also be prompted audibly through a speaker 58 or through another output device such as a serial or parallel connected printer (not shown).

In a preferred embodiment, the input signal amplitude level adjustment of the present invention is used in a WINDOWS voice recognition application, such as Voice Pilot version 2.0, which is included with the Microsoft WINDOWS Sound System software (version 2.0) (the reference manual for which is incorporated herein by reference). With this software, spoken words can be used to execute commands, such as resizing a window (i.e., using the spoken words "minimize" or "maximize," respectively). As a WINDOWS application, certain application programming interfaces (APIs) are used to access functions supported by the WINDOWS operating system.
`
For example, the mixer API, which is included with MSMIXMGR.DLL and is part of the WINDOWS Sound System device driver kit, controls sound card 54, which is plugged into personal computer 42. The mixer API controls a particular sound card 54 through a mixer driver written for the particular sound card being used. The WINDOWS Sound System mixer driver is one example of such a driver. The mixer driver allows communication through, and control of, the CODEC 56 located on sound card 54. The mixer API can control several functions, including the input and output volume of sound card 54. Accordingly, it is through the mixer API that input volume adjustments to sound card 54 are made based on the comparison of an input signal amplitude level and a preselected reference signal amplitude level, as discussed below.
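
As a rough illustration of the role the input volume control plays, the sketch below models it as a bounded setting that can be nudged up or down. `InputVolumeControl` and its 0 to 100 percent scale are assumptions made for this example; they are not the interface of the mixer API, MSMIXMGR.DLL, or any mixer driver.

```python
class InputVolumeControl:
    """Hypothetical stand-in for a sound card's input volume setting.

    The percent scale (0 to 100) is an assumption for illustration only.
    """

    def __init__(self, percent=50.0):
        self.percent = self._clamp(percent)

    @staticmethod
    def _clamp(value):
        # Keep the setting between the minimum and maximum volume.
        return max(0.0, min(100.0, float(value)))

    def adjust(self, delta):
        """Raise (positive delta) or lower (negative delta) the input volume."""
        self.percent = self._clamp(self.percent + delta)
        return self.percent
```

Clamping keeps repeated adjustments from running past the minimum or maximum setting, whatever the underlying hardware scale turns out to be.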
Another API used by WINDOWS is the WaveInOpen API, which is a part of MMSystem.DLL. It is used to access sound card 54 to input sound via microphone 52. The WaveInOpen API calls the Wave driver, which in turn is used to "push" data into and out of sound card 54. Sound files used by WINDOWS are formatted in the WAV file format.

Voice Pilot also contains a voice recognition engine known as the Dragon Recognizer by Dragon Systems, Inc. of Newton, Mass. This voice recognition engine is called and operated through a DLL of Microsoft Sound System called VLAYER.DLL. This DLL contains several APIs which allow function calls to the Dragon Recognizer and is used for polling the Dragon Recognizer to make the comparison between the input signal amplitude level and the preselected reference signal amplitude level. Other voice recognition engines could also be used in voice recognition systems in accordance with the present invention. If a voice recognition engine other than the Dragon Recognizer is used, the APIs in VLAYER.DLL would be changed accordingly to accommodate the different voice recognition engine. As previously noted, the preselected reference signal amplitude level is already entered into the Dragon Recognizer. Some of the APIs used in VLAYER.DLL, and their associated functions, are:

1) SetVoiceWindow: associates a window with the Dragon Recognizer;
2) InitVoiceRecognizer: initializes the Dragon Recognizer to the input hardware (sound card);
3) Recognize: determines whether a word was spoken; and
4) GetUttMeasure: determines whether the input signal amplitude is too high, too low, or within an acceptable range as determined by the Dragon Recognizer (by comparison to the reference signal amplitude level).

Other DLLs and APIs for WINDOWS, the WINDOWS Sound System, and the Dragon Recognizer will be understood by those skilled in the art.
There is shown in FIG. 1 a flow chart of a process 10 of the present invention, which is carried out on computer system 50. In block 12, a user is prompted for a word to speak into an input means such as microphone 52 or the combination of microphone 52 and sound card 54 shown in FIG. 3. The prompt to the user occurs by way of a dialog box (smaller window) shown in FIG. 4, with text identifying the word to be spoken by the user. In block 14, an input signal amplitude level is determined. The input signal amplitude level is one component of the waveform which results from the conversion of the spoken word (audio signal) into an analog electrical signal by microphone 52. The present invention adjusts the audio input volume, thereby adjusting the input signal amplitude level, to enhance voice recognition.

Although the input waveform has been previously described as including an input signal amplitude level, it is also comprised of a plurality of amplitude levels which result from normal speech. This is a result of both how the human voice operates and how language is communicated with the human voice. As there is no one amplitude level "value," an algorithm is necessary to either generate a single value (such as an average of all amplitude levels sampled, if digital signal processing is being used) or analyze the series of amplitude values which comprise the waveform of the spoken word. In the preferred embodiment of the present invention, it is not critical how the comparison of the input signal amplitude level is made, as the preferred embodiment is used to enhance the input signal amplitude level for the voice recognition engine being used, in this case the Dragon Recognizer. The generation of an algorithm to accomplish this comparison would be understood by those skilled in the art, such as described in Principles of Digital Audio, Ken Pohlmann, Sams, 1989 (2nd edition).
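
The text leaves the choice of algorithm open. As a minimal sketch, assuming the waveform is already available as a list of digital samples, the functions below reduce it to a single value in three common ways. The average of absolute sample values follows the "average of all amplitude levels sampled" mentioned above; RMS and peak are alternatives added here for comparison, not methods named in the source.

```python
import math


def mean_abs_level(samples):
    """Average of the absolute sample values (one possible single value)."""
    if not samples:
        return 0.0
    return sum(abs(s) for s in samples) / len(samples)


def rms_level(samples):
    """Root-mean-square level, a common alternative measure."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def peak_level(samples):
    """Largest absolute sample value, useful for spotting clipping."""
    return max((abs(s) for s in samples), default=0)
```

Any of these could feed the comparison step; which one tracks a given recognition engine's notion of "level" best is not something the source specifies.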
In block 16 it is determined whether or not a user has spoken a word as prompted in block 12. This determination is made on the basis of a preselected threshold input signal amplitude level being detected, not whether a word match has occurred. The prompt to the user in block 12 is to have the user speak and thereby generate a waveform which has an amplitude level which can be detected and analyzed by the present invention. Certain input devices 52, such as a microphone, can be adjusted or built with varying sensitivity to help isolate or pick up a user's voice.

The preselected threshold input signal amplitude level can be set to a value which accounts for any normal background noise in a typical home or office setting, taking into account the sensitivity and directional characteristics of the microphone or other input device being used. Whether block 16 determines whether a word has been spoken or a preselected threshold input signal amplitude level has been detected, the description which follows refers to block 16 as detecting whether or not a word has been detected or spoken.
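
A threshold test of this kind can be sketched in a few lines. The example assumes the level is measured as the mean absolute sample value and that the threshold is expressed in the same units; both are illustrative choices, and `word_detected` is a hypothetical name.

```python
def word_detected(samples, threshold):
    """Return True if the measured level reaches the preselected threshold.

    No pattern matching is involved; only the level is examined.
    """
    if not samples:
        return False
    level = sum(abs(s) for s in samples) / len(samples)
    return level >= threshold
```

For instance, with a threshold of 100, near-silent input such as `[0, 1, -1, 0]` is not treated as a word, while `[500, -600, 700]` is.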
If no word is detected in block 16, the user is prompted in block 18 to acknowledge whether or not a word has actually been spoken. In a preferred embodiment, the user is prompted via a WINDOWS dialog box displayed on a computer monitor. The WINDOWS dialog box asks the user to acknowledge whether a word has been spoken by selecting the appropriate button ("yes" or "no" or "help") in the dialog box. As in many WINDOWS applications, the selection of the appropriate button can be made using the keyboard or a pointing device such as a mouse. Also in a preferred embodiment, a timer counts approximately five seconds from when the user is first prompted in block 12 before the user is prompted in block 18 to acknowledge whether a word has been spoken. If a user acknowledges that a word has not been spoken, control returns to block 12 and the user is prompted to speak the particular word into the microphone. If the user acknowledges that a word has been spoken, control drops down to block 22 and the volume level of the input means is adjusted upwardly. The adjustment will always be upward in such a situation, as the input volume of input device 52 (such as microphone 52 in FIG. 3) was so low that the system could not even detect that the prompted word had been spoken.
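
The branching around blocks 16, 18, and 22 can be summarized as a small decision function. This is a sketch under stated assumptions: the `NextStep` names and the `after_block_16` function are hypothetical, the "help" button is not modeled, and the five-second wait is passed in as a parameter because the source gives it only as approximate.

```python
from enum import Enum


class NextStep(Enum):
    COMPARE_LEVEL = "block 24: compare and adjust"
    KEEP_WAITING = "still within the wait period"
    ASK_USER = "block 18: ask whether a word was spoken"
    REPROMPT = "block 12: prompt for the word again"
    RAISE_VOLUME = "block 22: adjust input volume upward"


def after_block_16(detected, seconds_waited, user_answer=None, wait_seconds=5.0):
    """Pick the next step once block 16 has (or has not) detected a word.

    user_answer is None until the user has answered the block 18 dialog,
    then True ("yes, a word was spoken") or False ("no").
    """
    if detected:
        return NextStep.COMPARE_LEVEL
    if seconds_waited < wait_seconds:
        return NextStep.KEEP_WAITING
    if user_answer is None:
        return NextStep.ASK_USER
    # A spoken but undetected word means the input volume was too low.
    return NextStep.RAISE_VOLUME if user_answer else NextStep.REPROMPT
```

Writing the flow this way makes the one-directional adjustment explicit: the only path into a forced upward adjustment is "user spoke, system heard nothing."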
If a word is detected in block 16, processing continues into block 24. Block 24 is comprised of two steps shown in blocks 20 and 22, respectively.

In block 20, the input signal amplitude level is compared to a preselected reference signal amplitude level. The preselected reference signal amplitude level is a desired amplitude level for input signals which is programmed into the system. In one embodiment, the voice recognition engine is preprogrammed with the desired input level, in order to enhance voice recognition. The voice recognition engine also includes a tool which can be queried or polled with an input signal amplitude level to determine whether or not the input signal amplitude level is acceptable (i.e., is within an acceptable range of the preselected reference signal amplitude level). In an additional embodiment, a voice recognition engine may not be polled to make the comparison between the input signal amplitude level and the preselected reference signal amplitude level. In such a case, an algorithm would have to be generated to determine whether the input signal amplitude level is within an acceptable range of the preselected reference signal amplitude level.

Once the comparison is made in block 20, processing continues in block 22, where the input volume control, or input level, of the input means is adjusted relative to the comparison made in block 20. If the comparison made in block 20 determines that the input signal amplitude level is below the preselected reference signal amplitude level, the input volume control is adjusted to amplify or increase the amplitude of the electrical signal generated from the audio input signal. If the comparison in block 20 determines that the input signal amplitude level is higher than the preselected reference signal amplitude level, the input volume control is adjusted to reduce or lower the amplitude of the electrical signal generated from the audio input signal. If the comparison determines that the input signal amplitude level is within an accepted range of the preselected reference signal amplitude level, then no adjustment to the input volume control is made. In a preferred embodiment, an acceptable range for the comparison is within two percent of the preselected reference signal amplitude level. The acceptable range may be adjusted, depending upon the requirements of a particular voice recognition system and/or voice recognition application. Once the volume of the input means has been adjusted in block 22, processing continues to block 26. In block 26, it is determined whether or not the user is required to be prompted for additional words. This takes place if multiple passes or iterations are used to initialize the system.

In a preferred embodiment, an iterative process is used whereby the user is prompted to speak nine different words, one following the prompt associated with each iteration, with a comparison to the preselected reference signal amplitude level (the same reference level for each iteration/word) and corresponding adjustments to the input means volume made after each word is spoken. Nine iterations is an arbitrary value which has provided satisfactory results. A fewer or greater number of iterations can be used, depending upon the voice recognition system and/or voice recognition application for which the present invention is being used. Although the user is prompted for nine different words, the present invention is not "recognizing" (matching) the words; instead, it is detecting and analyzing the input signal amplitude levels.

The iterative process of the present invention begins by prompting the user with the first word in block 12, as previously described. Processing continues as previously described until block 22 is reached. In this iterative method, the input volume control of the input means is initially set to a 50 percent level, halfway between the maximum volume setting and the minimum volume setting of the input means. The subsequent adjustments to the input volume control vary for each of the nine words as shown in Table 1:
`
TABLE 1

Word No.    Percent Volume is Adjusted
[table values not recoverable from the source]

Table 1 shows the percent upward or downward that the volume control of the input means can be adjusted for each of the nine words which the user is prompted to speak. The iterations shown in Table 1 are chosen to achieve a result within an acceptable range of two percent between the input signal amplitude level and the preselected reference signal amplitude level. This assumes that the user is speaking at a fairly consistent level for each of the nine words for which the user is prompted. Each iteration may result in a corresponding adjustment made in CODEC 56, which contains the input volume control in a preferred embodiment.
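
A single pass through blocks 20 and 22 can be sketched as a comparison followed by a bounded adjustment. Two assumptions are made here and should be read as such: "within two percent" is taken as a fraction of the reference level, and the adjustment size is taken in percentage points of the volume scale (the document's own example, where a setting of 50 lowered by 10 percent becomes 40, suggests this reading). The function names are hypothetical, and the step size is supplied by the caller because the Table 1 values are not available.

```python
def compare_level(level, reference, tolerance=0.02):
    """Block 20 sketch: return 'low', 'high', or 'ok' relative to the reference.

    Assumption: the acceptable range is a fraction of the reference level.
    """
    if abs(level - reference) <= tolerance * reference:
        return "ok"
    return "low" if level < reference else "high"


def next_volume(volume, verdict, step):
    """Block 22 sketch: move the volume by `step` points toward the reference."""
    if verdict == "low":
        volume += step
    elif verdict == "high":
        volume -= step
    # An 'ok' verdict leaves the volume unchanged.
    return max(0.0, min(100.0, volume))
```

For example, `compare_level(50, 38)` gives `"high"`, and `next_volume(50, "high", 10)` then gives `40`.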
The operation of the iterative process of the present invention can be understood by the following example. Because the volume control of the input means is initially set to 50 percent for the first word spoken, the input signal amplitude level of the first word spoken by the user is correspondingly given a value of 50 percent. The preselected reference signal amplitude level is then given a value relative to the 50 percent level assigned to the input signal amplitude level of the first word. For this example, assume that the preselected reference signal amplitude level is at 38 percent, relative to the 50 percent value assigned to the input signal amplitude level.

On pass one through process 10, the comparison carried out in block 20 would indicate that the input signal amplitude level of the first word is higher than the preselected reference signal amplitude level (50>38). Looking at Table 1, the input volume control would be adjusted downward 10 percent, adjusting the input signal amplitude level for the first spoken word to a value of 40. Since this is the first of nine iterations, block 26 would determine that there are more words with which to prompt the user and return process control to block 12. This determination can be made by initializing an iteration or word counter with the first word prompted and incrementing the counter with each additional prompt. When the counter equals the total number of words to be prompted, the process is complete.
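
The counter-driven loop described above might look like the following sketch. It rests on several assumptions: `measure` is a hypothetical callback that prompts for a word and returns the measured level at the current volume, `steps` holds the per-word adjustment sizes (the caller supplies them, since the Table 1 values are not reproduced here), and the two percent range is read as a fraction of the reference.

```python
def calibrate(words, steps, measure, reference, start_volume=50.0, tolerance=0.02):
    """Run one compare-and-adjust pass per prompted word.

    Returns the final volume and the volume after each pass.
    """
    volume = start_volume
    count = 0  # word counter checked in block 26
    history = []
    while count < len(words):
        level = measure(words[count], volume)
        if abs(level - reference) > tolerance * reference:
            # Step toward the reference, staying inside the 0 to 100 scale.
            delta = steps[count] if level < reference else -steps[count]
            volume = max(0.0, min(100.0, volume + delta))
        history.append(volume)
        count += 1
    return volume, history


# Pass one of the example in the text: level 50 against a reference of 38,
# with a first step of 10, lowers the setting to 40. The lambda simply
# treats the measured level as equal to the volume setting.
final, trace = calibrate(["word one"], [10], lambda word, vol: vol, reference=38)
```

Here `final` is `40.0`, matching the first pass of the example. Later passes depend on step sizes that only Table 1 provides.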
The user would be prompted with a second word, which would pass through process 10 for comparison in block 20. Block 20 would again provide a signal that the input signal amplitude level (now for the second word) is greater than the preselected reference signal amplitude level (40>38). Looking at Table 1, for the second word, the input volume control would be adjusted down another 10 percent, adjusting the input signal
`amplitude l