United States Patent: Tran

US006070140A
[11] Patent Number: 6,070,140
[45] Date of Patent: May 30, 2000

[54] SPEECH RECOGNIZER

[76] Inventor: Bao Q. Tran, 10206 Grove Glen, Houston, Tex. 77099

[21] Appl. No.: 09/190,691
[22] Filed: Nov. 12, 1998

Related U.S. Application Data
[62] Division of application No. 08/461,646, Jun. 5, 1995, abandoned.

[51] Int. Cl.: G10L 21/06
[52] U.S. Cl.: 704/275; 704/232
[58] Field of Search: 704/200, 270, 231, 232, 233, 275, 251

[56] References Cited

U.S. PATENT DOCUMENTS

4,771,390   9/1988  Dolph et al. .......... 395/2.83
5,319,736   6/1994  Hunt .................. 395/2.36
5,487,129   1/1996  Paiss et al. .......... 395/2.42
5,548,681   8/1996  Gleaves et al. ........ 395/2.42
5,562,453  10/1996  Wen ................... 434/185
5,983,186  11/1999  Miyazawa et al. ....... 704/275

Primary Examiner: Richemond Dorvil
Attorney, Agent, or Firm: Kurt J. Brown

[57] ABSTRACT

A computer system has a power-down mode to conserve energy. The computer system includes a speech transducer for capturing speech; a low-energy consuming power-up indicator coupled to said speech transducer, said power-up indicator detecting speech directed at said speech transducer and asserting a wake-up signal to a powered-down processor; and a voice recognizer coupled to said speech transducer and said wake-up signal, said voice recognizer waking up from the power-down mode when said wake-up signal is asserted.

10 Claims, 18 Drawing Sheets
`
`
`
`
[Drawing sheets 1 through 18: figure pages from which only the block and flowchart labels survive text extraction. The recoverable content is summarized below.]

Sheet 1 (FIGS. 1-2): computer system blocks, including a display and a keyboard.
Sheet 2 (FIGS. 3-5): wake-up logic variants built around a low-pass filter 25, a reference voltage Vref comparator, and an RMS device 27.
Sheet 3 (FIGS. 6-7): wake-up logic with ADC 12 and logic block 24; noise cancelling transducer arrangement.
Sheet 4 (FIG. 8): watch-sized computer system (perspective view).
Sheet 5 (FIGS. 9-10): CPU, memory, keyboard, radio frequency transmitter/receiver, infrared transmitter/receiver, UART, and display; radio frequency transmitter and receiver with isolator and bandpass filters.
Sheet 6 (FIGS. 11-13): DQPSK modulator with isolator and bandpass filter; RF receiver; infrared transmitter 150 with infrared format decoder.
Sheet 7 (FIGS. 14-15): hand-held computer system 162 and jewelry-sized computer system 172 (perspective views).
Sheet 8 (FIGS. 16-17): recognizer processing blocks with dictionary 184; feature extractor with noise canceller 198, auto-correlator, LPC/FFT/auditory/fractal/wavelet parameter blocks, parameter weighter, temporal derivator, and vector quantizer.
Sheet 9 (FIG. 18): vector quantizer flowchart: set initial vector equal to centroid of sample data; split vector; assign data to closest vector; recombine vectors if substantially identical; adjust and balance tree; repeat until the desired number of vectors is reached.
Sheet 10 (FIG. 19): word preselector: vector quantizer 214 feeding HMM, DTW, neural network, fuzzy logic, and template matching models; initial N-gram generator, inner N-gram generator, and candidate preselector; word N-gram model and word grammar model.
Sheet 11 (FIGS. 20-21): dynamic programming alignment grids over the letters T, E, S, T.
Sheet 12 (FIGS. 22-23): HMM diagram; parameter estimation, feature extraction, and likelihood extraction blocks.
Sheet 13 (FIGS. 24-25): parameter estimation and likelihood extraction over templates; neural network front-end with feature extraction.
Sheet 14 (FIG. 26): trigram flowchart: generate trigram from phoneme string; generate start substitution, deletion, and insertion trigrams; generate inner trigrams; combine substitution, deletion, and insertion start trigrams with inner trigrams; compare candidate with dictionary templates and generate similarity count; select candidate with highest similarity count.
Sheet 15 (FIG. 27): word preselector operation.
Sheet 16 (FIGS. 28-29): grammar state machine with add, delete, search, and edit appointment states.
Sheet 17 (FIG. 30): edit state machine with change/replace string states and exit on silence.
Sheet 18 (FIG. 31): unknown word generator flowchart: read in all phoneme rules; get input phoneme string; find a matching rule covering an initial portion of the phoneme string; store associated letters; at end of word, get phoneme string after end of word; repeat until end of phoneme string.
`
`
`
`1
`SPEECH RECOGNIZER
`
`6,070,140
`
`2
`to the reference is converted into a control Signal which is
`applied to the control circuit of the watch. Although wear
`able speech recognizers are shown in Bui and Monbaron, the
`devices disclosed therein do not provide for Speaker inde
`pendent Speech recognition.
`Another problem facing the Speech recognizer is the
`presence of noise, as the user's verbal command and data
`entry may be made in a noisy environment or in an envi
`ronment in which multiple Speakers are Speaking Simulta
`neously. Additionally, the user's voice may fluctuate due to
`the user's health and mental State. These voice fluctuations
`Severely test the accuracy of traditional Speech recognizers.
`Thus, a need exists for an efficient Speech recognizer that can
`handle medium and large Vocabulary robustly in a variety of
`environments.
`Yet another problem facing the portable voice recognizer
`is the power consumption requirement. Additionally, tradi
`tional Speech recognizers require the computer to continu
`ously monitor the microphone for verbal activities directed
`at the computer. However, the continuous monitoring for
`Speech activity even during an extended period of Silence
`wastes a significant amount of battery power. Hence, a need
`exists for a low-power monitoring of Speech activities to
`wake-up a powered-down computer when commands are
`being directed to the computer.
`Speech recognition is particularly useful as a data entry
`tool for a personal information management (PIM) system,
`which trackS telephone numbers, appointments, travel
`expenses, time entry, note-taking and personal data
`collection, among others. Although many personal organiz
`ers and handheld computers offer PIM capability, these
`Systems are largely under-utilized because of the tedious
`process of keying in the data using a miniaturized keyboard.
`Hence, a need exists for an efficient speech recognizer for
`entering data to a PIM system.
`SUMMARY OF THE INVENTION
`In the present invention, a speech transducer captures
`Sound and delivers the data to a robust and efficient speech
`recognizer. To minimize power consumption, a voice wake
`up indicator detects Sounds directed at the Voice recognizer
`and generates a power-up signal to wake up the Speech
`recognizer from a powered-down State. Further, to isolate
`Speech in noisy environments, a robust high order Speech
`transducer comprising a plurality of microphones positioned
`to collect different aspects of Sound is used. Alternatively,
`the high order Speech transducer may consist of a micro
`phone and a noise canceller which characterizes the back
`ground noise when the user is not Speaking and Subtracts the
`background noise when the user is speaking to the computer
`to provide a cleaner Speech Signal.
`The user's Speech Signal is next presented to a voice
`feature extractor which extracts features using linear pre
`dictive coding, fast Fourier transform, auditory model, frac
`tal model, wavelet model, or combinations thereof. The
`input Speech Signal is compared with word models Stored in
`a dictionary using a template matcher, a fuzzy logic matcher,
`a neural network, a dynamic programming System, a hidden
`Markov model, or combinations thereof. The word model is
`Stored in a dictionary with an entry for each word, each entry
`having word labels and a context guide.
`A word preselector receives the output of the Voice feature
`extractor and queries the dictionary to compile a list of
`candidate words with the most similar phonetic labels. These
`candidate words are presented to a Syntax checker for
`Selecting a first representative word from the candidate
`
`This is a divisional of application Ser. No. 08/461,646,
`filed Jun. 5, 1995 now abandoned.
`FIELD OF THE INVENTION
`This invention relates generally to a computer, and more
`Specifically, to a computer with Speech recognition capabil
`ity.
`
`25
`
`35
`
`40
`
`BACKGROUND OF THE INVENTION
`The recent development of Speech recognition technology
`has opened up a new era of man-machine interaction. A
`Speech user interface provides a convenient and highly
`15
`natural method of data entry. However, traditional Speech
`recognizers use complex algorithms which in turn need large
`Storage Systems and/or dedicated digital Signal processors
`with high performance computers. Further, due to the com
`putational complexity, these Systems generally cannot rec
`ognize speech in real-time. Thus, a need exists for an
`efficient Speech recognizer that can operate in real-time and
`that does not require a dedicated high performance com
`puter.
`The advent of powerful Single chip computerS has made
`possible compact and inexpensive desktop, notebook, note
`pad and palmtop computers. These Single chip computers
`can be incorporated into personal items. Such as watches,
`rings, necklaces and other forms of jewelry. Because these
`personal items are accessible at all times, the computeriza
`tion of these items delivers truly personal computing power
`to the users. These personal Systems are constrained by the
`battery capacity and Storage capacity. Further, due to their
`miniature size, the computer mounted in the watch or the
`jewelry cannot house a bulky keyboard for text entry or a
`Writing Surface for pen-based data entry. Thus, a need exists
`for an efficient Speaker independent, continuous speech
`recognizer to act as a user interface for these tiny personal
`computers.
`U.S. Pat. No. 4,717.261, issued to Kita, et al., discloses an
`electronic wrist watch with a random access memory for
`recording voice messages from a user. Kita et al. further
`discloses the use of a voice Synthesizer to reproduce keyed
`in characters as a voice or Speech from the electronic
`wrist-watch. However, Kita only passively records and playS
`audio messages, but does not recognize the user's Voice and
`act in response thereto.
`U.S. Pat. No. 4,509,133, issued to Monbaron et al. and
`U.S. Pat. No. 4,573,187, issued to Buiet al. disclose watches
`which recognize and respond to Verbal commands. These
`patents teach the use of preprogrammed training for the
`references stored in the vocabulary. When the user first uses
`the watch, he or she pronounces a word corresponding to a
`command to the watch to train the recognizer. After training,
`the user can repeatedly pronounce the trained word to the
`watch until the watch display shows the correct word on the
`Screen of the watch. U.S. Pat. No. 4,635,286, issued to Bui
`et al., further discloses a speech-controlled watch having an
`electro-acoustic means for converting a pronounced word
`into an analog Signal representing that word, a means for
`transforming the analog signal into a logic control
`information, and a means for transforming the logic infor
`mation into a control Signal to control the watch display.
`When a word is pronounced by a user, it is coded and
`compared with a part of the memorized references. The
`watch retains the reference whose coding is closest to that of
`the word pronounced. The digital information corresponding
`
`65
`
`45
`
`50
`
`55
`
`60
`
`Page 20
`
`AMAZON 1007
`Amazon v. SpeakWare
`IPR2019-00999
`
`
`
`3
words, as ranked by the context guide and the grammar structure, among others. The user can accept or reject the first representative word via a voice user interface. If rejected, the voice user interface presents the next likely word selected from the candidate words. If all the candidates are rejected by the user or if the word does not exist in the dictionary, the system can generate a predicted word based on the labels. Finally, the voice recognizer also allows the user to manually enter the word or spell the word out for the system. In this manner, a robust and efficient human-machine interface is provided for recognizing speaker-independent, continuous speech.
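By way of illustration only, the preselect-and-confirm flow just summarized might be organized as in the following Python sketch. Every interface name below (preselect, rank, confirm, predict_from_labels, spell_out) is hypothetical and not part of the disclosure; the summary specifies the behavior, not this API.

# Hedged sketch of the summarized recognition flow; all names are
# hypothetical placeholders for the blocks described above.
def recognize_word(features, dictionary, grammar, ui):
    # Word preselector: compile candidate words whose stored phonetic
    # labels are most similar to the extracted features.
    candidates = dictionary.preselect(features)
    # Syntax checker: rank candidates by context guide and grammar.
    ranked = grammar.rank(candidates)
    # Voice user interface: offer the best candidate, then the next
    # most likely one, until the user accepts a word.
    for word in ranked:
        if ui.confirm(word):
            return word
    # All candidates rejected, or the word is missing from the
    # dictionary: predict a word from the labels, then fall back to
    # manual entry or spelling.
    predicted = dictionary.predict_from_labels(features)
    if ui.confirm(predicted):
        return predicted
    return ui.spell_out()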
BRIEF DESCRIPTION OF THE DRAWINGS

A better understanding of the present invention can be obtained when the following detailed description of the preferred embodiment is considered in conjunction with the following drawings, in which:
FIG. 1 is a block diagram of a computer system for capturing and processing a speech signal from a user;
FIG. 2 is a block diagram of a computer system with a wake-up logic in accordance with one aspect of the invention;
FIG. 3 is a circuit block diagram of the wake-up logic in accordance with one aspect of the invention shown in FIG. 2;
FIG. 4 is a circuit block diagram of the wake-up logic in accordance with another aspect of the invention shown in FIG. 2;
FIG. 5 is a circuit block diagram of the wake-up logic in accordance with one aspect of the invention shown in FIG. 4;
FIG. 6 is a block diagram of the wake-up logic in accordance with another aspect of the invention shown in FIG. 2;
FIG. 7 is a block diagram of a computer system with a high order noise cancelling sound transducer in accordance with another aspect of the present invention;
FIG. 8 is a perspective view of a watch-sized computer system in accordance with another aspect of the present invention;
FIG. 9 is a block diagram of the computer system of FIG. 8 in accordance with another aspect of the invention;
FIG. 10 is a block diagram of a RF transmitter/receiver system of FIG. 9;
FIG. 11 is a block diagram of the RF transmitter of FIG. 10;
FIG. 12 is a block diagram of the RF receiver of FIG. 10;
FIG. 13 is a block diagram of an optical transmitter/receiver system of FIG. 9;
FIG. 14 is a perspective view of a hand-held computer system in accordance with another aspect of the present invention;
FIG. 15 is a perspective view of a jewelry-sized computer system in accordance with yet another aspect of the present invention;
FIG. 16 is a block diagram of the processing blocks of the speech recognizer of the present invention;
FIG. 17 is an expanded diagram of a feature extractor of the speech recognizer of FIG. 16;
FIG. 18 is an expanded diagram of a vector quantizer in the feature extractor of the speech recognizer of FIG. 16;
FIG. 19 is an expanded diagram of a word preselector of the speech recognizer of FIG. 16;
`
FIGS. 20 and 21 are diagrams showing the matching operation performed by the dynamic programming block of the word preselector of the speech recognizer of FIG. 19;
FIG. 22 is a diagram of a HMM of the word preselector of the speech recognizer of FIG. 19;
FIG. 23 is a diagram showing the relationship of the parameter estimation block and likelihood evaluation block with respect to a plurality of HMM templates of FIG. 22;
FIG. 24 is a diagram showing the relationship of the parameter estimation block and likelihood evaluation block with respect to a plurality of neural network templates of FIG. 23;
FIG. 25 is a diagram of a neural network front-end in combination with the HMM in accordance with another aspect of the invention;
FIG. 26 is a block diagram of a word preselector of the speech recognizer of FIG. 16;
FIG. 27 is another block diagram illustrating the operation of the word preselector of FIG. 26;
FIG. 28 is a state machine for a grammar in accordance with another aspect of the present invention;
FIG. 29 is a state machine for a parameter state machine of FIG. 28;
FIG. 30 is a state machine for an edit state machine of FIG. 28; and
FIG. 31 is a block diagram of an unknown word generator in accordance with yet another aspect of the present invention.
`
DETAILED DESCRIPTION OF THE INVENTION

Referring now to FIG. 1, a general purpose architecture for recognizing speech is illustrated. As shown in FIG. 1, a microphone 10 is connected to an analog-to-digital converter (ADC) 12 which interfaces with a central processing unit (CPU) 14. The CPU 14 in turn is connected to a plurality of devices, including a read-only memory (ROM) 16, a random access memory (RAM) 18, a display 20 and a keyboard 22. For portable applications, in addition to using low power devices, the CPU 14 has a power-down mode where non-essential parts of the computer are placed in a sleep mode once they are no longer needed, to conserve energy. However, to detect the presence of voice commands, the computer needs to continually monitor the output of the ADC 12, thus unnecessarily draining power even during an extended period of silence. This requirement thus defeats the advantages of power-down.

FIG. 2 discloses one aspect of the invention which wakes up the computer from its sleep mode when spoken to, without requiring the CPU to continuously monitor the ADC output. As shown in FIG. 2, a wake-up logic 24 is connected to the output of microphone 10 to listen for commands being directed at the computer during the power-down periods. The wake-up logic 24 is further connected to the ADC 12 and the CPU 14 to provide wake-up commands to turn on the computer in the event a command is being directed at the computer.

The wake-up logic of the present invention can be implemented in a number of ways. In FIG. 3, the output of the microphone 10 is presented to one or more stages of low-pass filters 25 to preferably limit the detector to audio signals below two kilohertz. The wake-up logic block 24 comprises a power comparator 26 connected to the filters 25 and a threshold reference voltage.
`
As shown in FIG. 3, sounds captured by the microphone 10, including the voice and the background noise, are applied to the power comparator 26, which compares the signal power level with a power threshold value and outputs a wake-up signal to the ADC 12 and the CPU 14 as a result of the comparison. Thus, when the analog input is voice, the comparator output asserts a wake-up signal to the computer. The power comparator 26 is preferably a Schmitt trigger comparator. The threshold reference voltage is preferably a floating DC reference which allows the use of the detector under varying conditions of temperature, battery supply voltage fluctuations, and battery supply types.
FIG. 4 shows an embodiment of the wake-up logic 24 suitable for noisy environments. In FIG. 4, a root-mean-square (RMS) device 27 is connected to the output of the microphone 10, after filtering by the low-pass filter 25, to better distinguish noise from voice and wake up the computer when the user's voice is directed to the computer. The RMS of any voltage over an interval T is defined to be:

\mathrm{RMS} = \sqrt{\frac{1}{T}\int_{0}^{T} f^{2}(t)\,dt}
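By way of illustration, the windowed RMS test of FIG. 4 can be sketched digitally as follows. This is a sketch only: the window length and threshold value are assumed for illustration, and the disclosed embodiment computes the RMS with the analog circuit of FIG. 5 described below.

# Discrete sketch of RMS = sqrt((1/T) * integral of f(t)^2 dt) over a
# window of T samples, followed by the wake-up comparison of FIG. 4.
# The threshold and window length are illustrative assumptions.
import numpy as np

def rms(frame: np.ndarray) -> float:
    # Root-mean-square of one window of low-pass-filtered samples.
    return float(np.sqrt(np.mean(frame ** 2)))

def wake_up_asserted(frame: np.ndarray, threshold: float = 0.1) -> bool:
    # Assert the wake-up signal when the windowed RMS exceeds the
    # (preferably floating) reference level.
    return rms(frame) > threshold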
`
`6
`as targets for the outputs of the middle layer, changing the
`weights in the middle layer following the same procedure as
`above. This way the corrections are back-propagated to
`eventually reach the input layer. After reaching the input
`layer, a new test is entered and forward propagation takes
`place again. This process is repeated until either a prese
`lected allowable error is achieved or a maximum number of
`training cycles has been executed.
`Referring now to FIG. 6, the first layer of the neural
`network is the input layer 38, while the last layer is the
`output layer 42. Each layer in between is called a middle
`layer 40. Each layer 38, 40, or 42 has a plurality of neurons,
`or processing elements, each of which is connected to Some
`or all the neurons in the adjacent layers. The input layer 38
`of the present invention comprises a plurality of units 44, 46
`and 48, which are configured to receive input information
`from the ADC 12. The nodes of the input layer does not have
`any weight associated with it. Rather, its Sole purpose is to
`Store data to be forward propagated to the next layer. A
`middle layer 40 comprising a plurality of neurons 50 and 52
`accepts as input the output of the plurality of neurons 44-48
`from the input layer 38. The neurons of the middle layer 40
`transmit output to a neuron 54 in the output layer 42 which
`generates the wake-up command to CPU 14.
`The connections between the individual processing units
`in an artificial neural network are also modeled after bio
`logical processes. Each input to an artificial neuron unit is
`weighted by multiplying it by a weight value in a proceSS
`that is analogous to the biological Synapse function. In
`biological Systems, a Synapse acts as a connector between
`one neuron and another, generally between the axon (output)
`end of one neuron and the dendrite (input) end of another
`cell. Synaptic junctions have the ability to enhance or inhibit
`(i.e., weigh) the output of one neuron as it is inputted to
`another neuron. Artificial neural networks model the
`enhance or inhibit function by weighing the inputs to each
`artificial neuron. During operation, the output of each neu
`ron in the input layer is propagated forward to each input of
`each neuron in the next layer, the middle layer. The thus
`arranged neurons of the input layer Scans the inputs which
`are neighboring Sound values and after training using tech
`niques known to one skilled in the art, detects the appro
`priate utterance to wake-up the computer. Once the CPU 14
`wakes up, the neural network 36 is put to sleep to conserve
`power. In this manner, power consumption is minimized
`while retaining the advantages of Speech recognition.
`Although an accurate Speech recognition by humans or
`computerS requires a quiet environment, Such a requirement
`is at times impracticable. In noisy environments, the present
`invention provides a high-order, gradient noise cancelling
`microphone to reject noise interference from distant Sources.
`The high-order microphone is robust to noise because it
`precisely balances the phase and amplitude response of each
`acoustical component. A high-order noise cancelling micro
`phone can be constructed by combining outputs from lower
`order microphones. As shown in FIG. 7, one embodiment
`with a first order microphone is built from three Zero order,
`or conventional preSSure-based, microphones. Each micro
`phone is positioned in a port located Such that the port
`captures a different Sample of the Sound field.
`The microphones of FIG. 7 are preferably positioned as
`far away from each other as possible to ensure that different
`Samples of the Sound field are captured by each microphone.
`If the noise Source is relatively distant and the wavelength of
`the noise is Sufficiently longer than the distance between the
`ports, the noise acoustics differ from the user's Voice in the
`magnitude and phase. For distant Sounds, the magnitude of
`
`25
`
Referring now to FIG. 6, the first layer of the neural network is the input layer 38, while the last layer is the output layer 42. Each layer in between is called a middle layer 40. Each layer 38, 40, or 42 has a plurality of neurons, or processing elements, each of which is connected to some or all the neurons in the adjacent layers. The input layer 38 of the present invention comprises a plurality of units 44, 46 and 48, which are configured to receive input information from the ADC 12. The nodes of the input layer do not have any weights associated with them; rather, their sole purpose is to store data to be forward propagated to the next layer. A middle layer 40 comprising a plurality of neurons 50 and 52 accepts as input the output of the plurality of neurons 44-48 from the input layer 38. The neurons of the middle layer 40 transmit output to a neuron 54 in the output layer 42 which generates the wake-up command to CPU 14.

The connections between the individual processing units in an artificial neural network are also modeled after biological processes. Each input to an artificial neuron unit is weighted by multiplying it by a weight value in a process that is analogous to the biological synapse function. In biological systems, a synapse acts as a connector between one neuron and another, generally between the axon (output) end of one neuron and the dendrite (input) end of another cell. Synaptic junctions have the ability to enhance or inhibit (i.e., weigh) the output of one neuron as it is inputted to another neuron. Artificial neural networks model the enhance-or-inhibit function by weighing the inputs to each artificial neuron. During operation, the output of each neuron in the input layer is propagated forward to each input of each neuron in the next layer, the middle layer. The thus-arranged neurons of the input layer scan the inputs, which are neighboring sound values, and after training using techniques known to one skilled in the art, detect the appropriate utterance to wake up the computer. Once the CPU 14 wakes up, the neural network 36 is put to sleep to conserve power. In this manner, power consumption is minimized while retaining the advantages of speech recognition.

Although accurate speech recognition by humans or computers requires a quiet environment, such a requirement is at times impracticable. In noisy environments, the present invention provides a high-order, gradient noise cancelling microphone to reject noise interference from distant sources. The high-order microphone is robust to noise because it precisely balances the phase and amplitude response of each acoustical component. A high-order noise cancelling microphone can be constructed by combining outputs from lower-order microphones. As shown in FIG. 7, one embodiment with a first order microphone is built from three zero order, or conventional pressure-based, microphones. Each microphone is positioned in a port located such that the port captures a different sample of the sound field.

The microphones of FIG. 7 are preferably positioned as far away from each other as possible to ensure that different samples of the sound field are captured by each microphone. If the noise source is relatively distant and the wavelength of the noise is sufficiently longer than the distance between the ports, the noise acoustics differ from the user's voice in magnitude and phase. For distant sounds, the magnitude of
the sound waves arriving at the microphones is approximately equal, with a small phase difference among them. For sound sources close to the microphones, the magnitude of the local sound dominates that of the distant sounds. After the sounds are captured, the sound pressure at the ports adapted to receive distant sounds is subtracted and converted into an electrical signal for conversion by the ADC 12.
As shown in FIG. 7, a first order microphone comprises a plurality of zero order microphones 56, 58 and 60. The microphone 56 is connected to a resistor 62 which is connected to a first input of an analog multiplier 70. Microphones 58 and 60 are connected to resistors 64 and 66, which are connected to a second input of the analog multiplier 70. A resistor 68 is connected between ground and the second input of the analog multiplier 70. The output of the analog multiplier 70 is connected to the first input of the analog multiplier 70 via a resistor 72. In this embodiment, microphones 56 and 58 are positioned to measure the distant sound sources, while the microphone 60 is positioned toward the user. The thus configured multiplier 70 takes the difference of the sound arriving at microphones 56, 58 and 60 to arrive at a less noisy representation of the user's speech. Although FIG. 7 illustrates a first order microphone for cancelling noise, any higher order microphone, such as a second order or even third order microphone, can be utilized for even better noise rejection. The second order microphone is built from a pair of first order microphones, while a third order microphone is constructed from a pair of second order microphones.
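As a purely digital analogue of this difference operation (a sketch under the assumption of equal port gains; in the disclosed circuit the resistor network sets the actual weighting), the first order output can be formed as:

# Sketch: subtract the averaged far-field ports (microphones 56, 58)
# from the user-facing port (microphone 60). The equal 0.5 weights are
# an assumption; the analog resistor network fixes the real gains.
import numpy as np

def first_order_output(mic_user: np.ndarray,
                       mic_far_a: np.ndarray,
                       mic_far_b: np.ndarray) -> np.ndarray:
    # Far-field noise, roughly equal in magnitude at all ports, cancels;
    # the near-field voice, dominant at the user port, survives.
    return mic_user - 0.5 * (mic_far_a + mic_far_b)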
Additionally, the outputs of the microphones can be digitally enhanced to separate the speech from the noise. As disclosed in U.S. Pat. No. 5,400,409, issued to Linhard, the noise reduction is performed with two or more channels such that the temporal and acoustical signal properties of speech and interference are systematically determined. Noise reduction is first executed in each individual channel. After noise components have been estimated during speaking pauses, a spectral subtraction is performed in the spectral range that corresponds to the magnitude. In this instance the temporally stationary noise components are damped.

Point-like noise sources are damped using an acoustic directional lobe which, together with the phase estimation, is oriented toward the speaker with digital directional filters at the inlet of the channels. The pivotable, acoustic directional lobe is produced for the individual voice channels by respective digital directional filters and a linear phase estimation to correct for a phase difference between the two channels. The linear phase shift of the noisy voice signals is determined in the power domain by means of a specific number of maxima of the cross-power density. Thus, each of the first and second related signals is transformed into the frequency domain prior to the step of estimating, and the phase correction and the directional filtering are carried out in the frequency domain. This method is effective with respect to noises and only requires a low computation expenditure. The directional filters are at a fixed setting. The method assumes that the speaker is relatively close to the microphones, preferably within 1 meter, and that the speaker only moves within a limited area. Non-stationary and stationary point-like noise sources are damped by means of this spatial evaluation. Because noise reduction cannot take place error-free, distortions and artificial insertions such as "musical tones" can occur due to the spatial separation of the receiving channels (microphones at a specific spacing). When the individually-processed channels are combined, an averaging is performed to reduce these errors.
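A minimal sketch of the per-channel, magnitude-domain spectral subtraction described above follows. The noise estimate is assumed to come from speaking pauses, and the spectral floor factor is an illustrative assumption rather than part of the Linhard method.

# Sketch of magnitude-domain spectral subtraction for one channel.
# noise_mag is the magnitude spectrum (length len(frame)//2 + 1)
# estimated during speaking pauses; the floor factor (an assumption)
# limits the "musical tone" artifacts noted above.
import numpy as np

def spectral_subtract(frame: np.ndarray,
                      noise_mag: np.ndarray,
                      floor: float = 0.05) -> np.ndarray:
    spec = np.fft.rfft(frame)                         # to frequency domain
    mag, phase = np.abs(spec), np.angle(spec)
    clean = np.maximum(mag - noise_mag, floor * mag)  # damp stationary noise
    return np.fft.irfft(clean * np.exp(1j * phase), n=len(frame))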
`
`45
`
`50
`
`55
`
`60
`
`65
`
`8
The composite signal is subsequently further processed with the use of cross-correlation of the signals in the individual channels, thus damping diffuse noise and echo components during subsequent processing. The individual voice channels are subsequently added, whereby the statistical disturbances of spectral subtraction are averaged. Finally, the composite signal resulting from the addition is subsequently processed with a modified coherence function to damp diffuse noise and echo components. The thus disclosed noise canceller effectively reduces noise using minimal computing resources.

FIG. 8 shows a portable embodiment of the present invention where the voice recognizer is hou