US006070140A

United States Patent [19]                    [11] Patent Number: 6,070,140
Tran                                         [45] Date of Patent: May 30, 2000

[54] SPEECH RECOGNIZER

[76] Inventor: Bao Q. Tran, 10206 Grove Glen, Houston, Tex. 77099

[21] Appl. No.: 09/190,691
[22] Filed: Nov. 12, 1998

Related U.S. Application Data

[62] Division of application No. 08/461,646, Jun. 5, 1995, abandoned.

[51] Int. Cl.: G10L 21/06
[52] U.S. Cl.: 704/275; 704/232
[58] Field of Search: 704/200, 270, 231, 232, 233, 275, 251

[56] References Cited

U.S. PATENT DOCUMENTS

4,771,390   9/1988   Dolph et al.        395/2.83
5,319,736   6/1994   Hunt                395/2.36
5,487,129   1/1996   Paiss et al.        395/2.42
5,548,681   8/1996   Gleaves et al.      395/2.42
5,562,453  10/1996   Wen                 434/185
5,983,186  11/1999   Miyazawa et al.     704/275

Primary Examiner: Richemond Dorvil
Attorney, Agent, or Firm: Kurt J. Brown

[57] ABSTRACT

A computer system has a power-down mode to conserve energy. The computer system includes a speech transducer for capturing speech; a low-energy consuming power-up indicator coupled to said speech transducer, said power-up indicator detecting speech directed at said speech transducer and asserting a wake-up signal to a powered-down processor; and a voice recognizer coupled to said speech transducer and said wake-up signal, said voice recognizer waking up from the power-down mode when said wake-up signal is asserted.

10 Claims, 18 Drawing Sheets
Drawing Sheets 1-18 (figures; recovered block labels and flowchart text only):

Sheet 1, FIGS. 1-2: block diagrams (labels: DISPLAY, KEYBOARD).
Sheet 2, FIGS. 3-5: wake-up logic circuits (labels: LOW-PASS FILTER, Vref, RMS).
Sheet 3, FIGS. 6-7: (labels: ADC, LOGIC).
Sheet 4, FIG. 8: watch-sized computer system (perspective view).
Sheet 5, FIGS. 9-10: (labels: CPU, MEMORY, RADIO FREQ. TRANSMITTER/RECEIVER, INFRARED TRANSMITTER/RECEIVER, UART, DISPLAY, ISOLATOR, BANDPASS FILTER).
Sheet 6, FIGS. 11-13: (labels: DQPSK MODULATOR, ISOLATOR, BP FILTER, INFRARED TRANSMITTER, INFRARED FORMAT DECODER).
Sheet 7, FIGS. 14-15: hand-held and jewelry-sized computer systems (perspective views).
Sheet 8, FIGS. 16-17: (labels: DICTIONARY, AUTO CORRELATOR, NOISE CANCELLER, LPC PARAM, FFT PARAM, AUDITORY PARAM, FRACTAL PARAM, WAVELET PARAM, PARAMETER WEIGHTER, TEMPORAL DERIVATOR, VECTOR QUANTIZER).
Sheet 9, FIG. 18: vector quantizer flowchart (set initial vector equal to centroid of sample data; split vector; assign data to closest vector; if vectors substantially identical, recombine vectors and adjust and balance tree; repeat until desired number of vectors reached).
Sheet 10, FIG. 19: (labels: VECTOR QUANTIZER, HMM MODEL, DTW MODEL, NEURAL NETWORK, FUZZY LOGIC, TEMPLATE MATCHING, INITIAL N-GRAM GENERATOR, INNER N-GRAM GENERATOR, CANDIDATE PRESELECTOR, WORD N-GRAM MODEL, WORD GRAMMAR MODEL).
Sheet 11, FIGS. 20-21: dynamic programming letter lattices for the word "TEST".
Sheet 12, FIGS. 22-23: (labels: PARAMETER ESTIMATION, FEATURE EXTRACTION, LIKELIHOOD EXTRACTION).
Sheet 13, FIGS. 24-25: (labels: PARAMETER ESTIMATION, FEATURE EXTRACTION, LIKELIHOOD EXTRACTION).
Sheet 14, FIG. 26: phoneme trigram flowchart (generate trigram from phoneme string; generate start substitution, deletion, and insertion trigrams; generate inner trigrams; combine substitution, deletion, and insertion start trigrams with inner trigrams; compare candidate with dictionary templates and generate similarity count; select candidate with highest similarity count).
Sheet 15, FIG. 27: word preselector operation.
Sheet 16, FIGS. 28-29: appointment grammar states (labels: ADD APPOINTMENT, DELETE APPOINTMENT, SEARCH APPOINTMENT, EDIT APPOINTMENT, ACTIVITY).
Sheet 17, FIG. 30: edit state machine (labels: CHANGE/REPLACE, STRING, EXIT OR SILENCE).
Sheet 18, FIG. 31: phoneme-to-letter flowchart (read in all phoneme rules; get input phoneme string; find a matching rule covering an initial portion of the phoneme string; store associated letters; at end of word, get phoneme string after end of word; repeat until end of phoneme string).

SPEECH RECOGNIZER

This is a divisional of application Ser. No. 08/461,646, filed Jun. 5, 1995, now abandoned.

FIELD OF THE INVENTION

This invention relates generally to a computer, and more specifically, to a computer with speech recognition capability.

BACKGROUND OF THE INVENTION

The recent development of speech recognition technology has opened up a new era of man-machine interaction. A speech user interface provides a convenient and highly natural method of data entry. However, traditional speech recognizers use complex algorithms which in turn need large storage systems and/or dedicated digital signal processors with high performance computers. Further, due to the computational complexity, these systems generally cannot recognize speech in real-time. Thus, a need exists for an efficient speech recognizer that can operate in real-time and that does not require a dedicated high performance computer.

The advent of powerful single chip computers has made possible compact and inexpensive desktop, notebook, notepad and palmtop computers. These single chip computers can be incorporated into personal items such as watches, rings, necklaces and other forms of jewelry. Because these personal items are accessible at all times, the computerization of these items delivers truly personal computing power to the users. These personal systems are constrained by the battery capacity and storage capacity. Further, due to their miniature size, the computer mounted in the watch or the jewelry cannot house a bulky keyboard for text entry or a writing surface for pen-based data entry. Thus, a need exists for an efficient speaker independent, continuous speech recognizer to act as a user interface for these tiny personal computers.

U.S. Pat. No. 4,717,261, issued to Kita et al., discloses an electronic wrist watch with a random access memory for recording voice messages from a user. Kita et al. further discloses the use of a voice synthesizer to reproduce keyed-in characters as a voice or speech from the electronic wrist-watch. However, Kita only passively records and plays audio messages, and does not recognize the user's voice and act in response thereto.

U.S. Pat. No. 4,509,133, issued to Monbaron et al., and U.S. Pat. No. 4,573,187, issued to Bui et al., disclose watches which recognize and respond to verbal commands. These patents teach the use of preprogrammed training for the references stored in the vocabulary. When the user first uses the watch, he or she pronounces a word corresponding to a command to the watch to train the recognizer. After training, the user can repeatedly pronounce the trained word to the watch until the watch display shows the correct word on the screen of the watch. U.S. Pat. No. 4,635,286, issued to Bui et al., further discloses a speech-controlled watch having an electro-acoustic means for converting a pronounced word into an analog signal representing that word, a means for transforming the analog signal into logic control information, and a means for transforming the logic information into a control signal to control the watch display. When a word is pronounced by a user, it is coded and compared with a part of the memorized references. The watch retains the reference whose coding is closest to that of the word pronounced. The digital information corresponding to the reference is converted into a control signal which is applied to the control circuit of the watch. Although wearable speech recognizers are shown in Bui and Monbaron, the devices disclosed therein do not provide for speaker independent speech recognition.

Another problem facing the speech recognizer is the presence of noise, as the user's verbal command and data entry may be made in a noisy environment or in an environment in which multiple speakers are speaking simultaneously. Additionally, the user's voice may fluctuate due to the user's health and mental state. These voice fluctuations severely test the accuracy of traditional speech recognizers. Thus, a need exists for an efficient speech recognizer that can handle medium and large vocabularies robustly in a variety of environments.

Yet another problem facing the portable voice recognizer is the power consumption requirement. Additionally, traditional speech recognizers require the computer to continuously monitor the microphone for verbal activities directed at the computer. However, the continuous monitoring for speech activity even during an extended period of silence wastes a significant amount of battery power. Hence, a need exists for low-power monitoring of speech activities to wake up a powered-down computer when commands are being directed to the computer.

Speech recognition is particularly useful as a data entry tool for a personal information management (PIM) system, which tracks telephone numbers, appointments, travel expenses, time entry, note-taking and personal data collection, among others. Although many personal organizers and handheld computers offer PIM capability, these systems are largely under-utilized because of the tedious process of keying in the data using a miniaturized keyboard. Hence, a need exists for an efficient speech recognizer for entering data into a PIM system.

SUMMARY OF THE INVENTION

In the present invention, a speech transducer captures sound and delivers the data to a robust and efficient speech recognizer. To minimize power consumption, a voice wake-up indicator detects sounds directed at the voice recognizer and generates a power-up signal to wake up the speech recognizer from a powered-down state. Further, to isolate speech in noisy environments, a robust high order speech transducer comprising a plurality of microphones positioned to collect different aspects of sound is used. Alternatively, the high order speech transducer may consist of a microphone and a noise canceller which characterizes the background noise when the user is not speaking and subtracts the background noise when the user is speaking to the computer to provide a cleaner speech signal.

The user's speech signal is next presented to a voice feature extractor which extracts features using linear predictive coding, fast Fourier transform, auditory model, fractal model, wavelet model, or combinations thereof. The input speech signal is compared with word models stored in a dictionary using a template matcher, a fuzzy logic matcher, a neural network, a dynamic programming system, a hidden Markov model, or combinations thereof. The word model is stored in a dictionary with an entry for each word, each entry having word labels and a context guide.

A word preselector receives the output of the voice feature extractor and queries the dictionary to compile a list of candidate words with the most similar phonetic labels.
These candidate words are presented to a syntax checker for selecting a first representative word from the candidate words, as ranked by the context guide and the grammar structure, among others. The user can accept or reject the first representative word via a voice user interface. If rejected, the voice user interface presents the next likely word selected from the candidate words. If all the candidates are rejected by the user or if the word does not exist in the dictionary, the system can generate a predicted word based on the labels. Finally, the voice recognizer also allows the user to manually enter the word or spell the word out for the system. In this manner, a robust and efficient human-machine interface is provided for recognizing speaker independent, continuous speech.
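The preselection step just described lends itself to a compact illustration. The sketch below is a reconstruction for illustration only, not code from the patent; the toy dictionary, the phoneme label strings, and the use of a generic string-similarity ratio are all assumptions.

```python
# Illustrative word preselector: compile candidate words whose stored
# phonetic labels are most similar to the labels decoded from speech.
# The dictionary contents and similarity measure are assumptions.
import difflib

dictionary = {
    "test":  "T EH S T",
    "text":  "T EH K S T",
    "toast": "T OW S T",
}

def preselect(labels, top_n=2):
    scored = sorted(
        ((difflib.SequenceMatcher(None, labels, ph).ratio(), word)
         for word, ph in dictionary.items()),
        reverse=True,
    )
    return [word for _, word in scored[:top_n]]

# The ranked candidate list would then go to the syntax checker for
# final selection against the context guide and grammar.
print(preselect("T EH S T"))  # ['test', 'text']
```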
BRIEF DESCRIPTION OF THE DRAWINGS

A better understanding of the present invention can be obtained when the following detailed description of the preferred embodiment is considered in conjunction with the following drawings, in which:

FIG. 1 is a block diagram of a computer system for capturing and processing a speech signal from a user;

FIG. 2 is a block diagram of a computer system with a wake-up logic in accordance with one aspect of the invention;

FIG. 3 is a circuit block diagram of the wake-up logic in accordance with one aspect of the invention shown in FIG. 2;

FIG. 4 is a circuit block diagram of the wake-up logic in accordance with another aspect of the invention shown in FIG. 2;

FIG. 5 is a circuit block diagram of the wake-up logic in accordance with one aspect of the invention shown in FIG. 4;

FIG. 6 is a block diagram of the wake-up logic in accordance with another aspect of the invention shown in FIG. 2;

FIG. 7 is a block diagram of a computer system with a high order noise cancelling sound transducer in accordance with another aspect of the present invention;

FIG. 8 is a perspective view of a watch-sized computer system in accordance with another aspect of the present invention;

FIG. 9 is a block diagram of the computer system of FIG. 8 in accordance with another aspect of the invention;

FIG. 10 is a block diagram of a RF transmitter/receiver system of FIG. 9;

FIG. 11 is a block diagram of the RF transmitter of FIG. 10;

FIG. 12 is a block diagram of the RF receiver of FIG. 10;

FIG. 13 is a block diagram of an optical transmitter/receiver system of FIG. 9;

FIG. 14 is a perspective view of a hand-held computer system in accordance with another aspect of the present invention;

FIG. 15 is a perspective view of a jewelry-sized computer system in accordance with yet another aspect of the present invention;

FIG. 16 is a block diagram of the processing blocks of the speech recognizer of the present invention;

FIG. 17 is an expanded diagram of a feature extractor of the speech recognizer of FIG. 16;

FIG. 18 is an expanded diagram of a vector quantizer in the feature extractor of the speech recognizer of FIG. 16;

FIG. 19 is an expanded diagram of a word preselector of the speech recognizer of FIG. 16;
FIGS. 20 and 21 are diagrams showing the matching operation performed by the dynamic programming block of the word preselector of the speech recognizer of FIG. 19;

FIG. 22 is a diagram of a HMM of the word preselector of the speech recognizer of FIG. 19;

FIG. 23 is a diagram showing the relationship of the parameter estimation block and likelihood evaluation block with respect to a plurality of HMM templates of FIG. 22;

FIG. 24 is a diagram showing the relationship of the parameter estimation block and likelihood evaluation block with respect to a plurality of neural network templates of FIG. 23;

FIG. 25 is a diagram of a neural network front-end in combination with the HMM in accordance with another aspect of the invention;

FIG. 26 is a block diagram of a word preselector of the speech recognizer of FIG. 16;

FIG. 27 is another block diagram illustrating the operation of the word preselector of FIG. 26;

FIG. 28 is a state machine for a grammar in accordance with another aspect of the present invention;

FIG. 29 is a state machine for a parameter state machine of FIG. 28;

FIG. 30 is a state machine for an edit state machine of FIG. 28; and

FIG. 31 is a block diagram of an unknown word generator in accordance with yet another aspect of the present invention.
DETAILED DESCRIPTION OF THE INVENTION

Referring now to FIG. 1, a general purpose architecture for recognizing speech is illustrated. As shown in FIG. 1, a microphone 10 is connected to an analog-to-digital converter (ADC) 12 which interfaces with a central processing unit (CPU) 14. The CPU 14 in turn is connected to a plurality of devices, including a read-only memory (ROM) 16, a random access memory (RAM) 18, a display 20 and a keyboard 22. For portable applications, in addition to using low power devices, the CPU 14 has a power-down mode where non-essential parts of the computer are placed in a sleep mode once they are no longer needed to conserve energy. However, to detect the presence of voice commands, the computer needs to continually monitor the output of the ADC 12, thus unnecessarily draining power even during an extended period of silence. This requirement thus defeats the advantages of power-down.

FIG. 2 discloses one aspect of the invention which wakes up the computer from its sleep mode when spoken to, without requiring the CPU to continuously monitor the ADC output. As shown in FIG. 2, a wake-up logic 24 is connected to the output of microphone 10 to listen for commands being directed at the computer during the power-down periods. The wake-up logic 24 is further connected to the ADC 12 and the CPU 14 to provide wake-up commands to turn on the computer in the event a command is being directed at the computer.

The wake-up logic of the present invention can be implemented in a number of ways. In FIG. 3, the output of the microphone 10 is presented to one or more stages of low-pass filters 25 to preferably limit the detector to audio signals below two kilohertz. The wake-up logic block 24 comprises a power comparator 26 connected to the filters 25 and a threshold reference voltage.
As shown in FIG. 3, sounds captured by the microphone 10, including the voice and the background noise, are applied to the power comparator 26, which compares the signal power level with a power threshold value and outputs a wake-up signal to the ADC 12 and the CPU 14 as a result of the comparison. Thus, when the analog input is voice, the comparator output asserts a wake-up signal to the computer. The power comparator 26 is preferably a Schmidt trigger comparator. The threshold reference voltage is preferably a floating DC reference which allows the use of the detector under varying conditions of temperature, battery supply voltage fluctuations, and battery supply types.

FIG. 4 shows an embodiment of the wake-up logic 24 suitable for noisy environments. In FIG. 4, a root-mean-square (RMS) device 27 is connected to the output of the microphone 10, after filtering by the low-pass filter 25, to better distinguish noise from voice and wake up the computer when the user's voice is directed to the computer. The RMS of any voltage over an interval T is defined to be:

    RMS = \sqrt{ \frac{1}{T} \int_0^T f^2(t) \, dt }
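In digital form, this detector reduces to a windowed RMS computation compared against a threshold. The sketch below is an illustrative reconstruction only (the patent implements the detector in analog circuitry); the window contents and the 0.1 threshold are invented for the example.

```python
# Illustrative digital analogue of the RMS wake-up detector of FIG. 4.
import math

def rms(samples):
    """Root-mean-square of a sample window: sqrt((1/T) * sum of f(t)^2)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def wake_up(samples, threshold=0.1):
    # Assert a wake-up signal when the windowed signal level exceeds the
    # threshold, as the power comparator 26 does in hardware.
    return rms(samples) > threshold

# Example: a quiet window stays asleep, a loud one wakes the CPU.
quiet = [0.01, -0.02, 0.015, -0.01]
loud = [0.4, -0.5, 0.45, -0.35]
print(wake_up(quiet), wake_up(loud))  # False True
```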
FIG. 5 shows in greater detail a low-power implementation of the RMS device 27. In FIG. 5, an amplifier 31 is connected to the output of filter 25 at one input. At the other input of amplifier 31, a capacitor 29 is connected in series with a resistor 28 which is grounded at the other end. The output of amplifier 31 is connected to the input of amplifier 31 and capacitor 29 via a resistor 30. A diode 32 is connected to the output of amplifier 31 to rectify the waveform. The rectified sound input is provided to a low-pass filter, comprising a resistor 33 connected to a capacitor 34, to create a low-power version of the RMS device. The RC time constant is chosen to be considerably longer than the longest period present in the signal, but short enough to follow variations in the signal's RMS value without inducing excessive delay errors. The output of the RMS circuit is converted to a wake-up pulse to activate the computer system. Although half-wave rectification is shown, full-wave rectification can be used. Further, other types of RMS device known in the art can also be used.

FIG. 6 shows another embodiment of the wake-up logic 24 more suitable for multi-speaker environments. In FIG. 6, a neural network 36 is connected to the ADC 12, the wake-up logic 24 and the CPU 14. When the wake-up logic 24 detects speech being directed to the microphone 10, the wake-up logic 24 wakes up the ADC 12 to acquire sound data. The wake-up logic 24 also wakes up the neural network 36 to examine the sound data for a wake-up command from the user. The wake-up command may be as simple as the word "wake-up" or may be a preassigned name for the computer to cue the computer that commands are being directed to it, among others. The neural network is trained to detect one or more of these phrases, and upon detection, provides a wake-up signal to the CPU 14. The neural network 36 preferably is provided by a low-power microcontroller or programmable gate array logic to minimize power consumption. After the neural network 36 wakes up the CPU 14, the neural network 36 then puts itself to sleep to conserve power. Once the CPU 14 completes its operation and before the CPU 14 puts itself to sleep, it reactivates the neural network 36 to listen for the wake-up command.
A particularly robust architecture for the neural network 36 is referred to as the back-propagation network. The training process for back-propagation type neural networks starts by modifying the weights at the output layer. Once the weights in the output layer have been altered, they can act as targets for the outputs of the middle layer, changing the weights in the middle layer following the same procedure as above. This way the corrections are back-propagated to eventually reach the input layer. After reaching the input layer, a new test is entered and forward propagation takes place again. This process is repeated until either a preselected allowable error is achieved or a maximum number of training cycles has been executed.
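The training loop just described can be made concrete with a small numerical sketch. This is not code from the patent: the layer sizes, learning rate, toy training windows, and stopping constants below are all assumptions chosen to illustrate the back-propagation procedure.

```python
# Minimal back-propagation sketch: output-layer weights are corrected
# first, the error is propagated back to the middle layer, and forward
# propagation repeats until the error is small enough or a cycle limit
# is reached. All sizes and constants are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy data: 4 "sound windows" of 3 neighboring values each, labeled
# 1 if the window should wake the computer, else 0.
X = np.array([[0.9, 0.8, 0.7], [0.1, 0.0, 0.2],
              [0.7, 0.9, 0.6], [0.0, 0.1, 0.1]])
y = np.array([[1.0], [0.0], [1.0], [0.0]])

W1 = rng.normal(size=(3, 4))   # input layer -> middle layer
W2 = rng.normal(size=(4, 1))   # middle layer -> output layer

for cycle in range(5000):                # maximum number of training cycles
    h = sigmoid(X @ W1)                  # forward propagation, middle layer
    out = sigmoid(h @ W2)                # forward propagation, output layer
    err = y - out
    if np.mean(err ** 2) < 1e-3:         # preselected allowable error
        break
    # Back-propagate: output-layer correction first, then middle layer.
    d_out = err * out * (1.0 - out)
    d_h = (d_out @ W2.T) * h * (1.0 - h)
    W2 += 0.5 * h.T @ d_out
    W1 += 0.5 * X.T @ d_h

print(cycle, out.round(2).ravel())
```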
Referring now to FIG. 6, the first layer of the neural network is the input layer 38, while the last layer is the output layer 42. Each layer in between is called a middle layer 40. Each layer 38, 40, or 42 has a plurality of neurons, or processing elements, each of which is connected to some or all the neurons in the adjacent layers. The input layer 38 of the present invention comprises a plurality of units 44, 46 and 48, which are configured to receive input information from the ADC 12. The nodes of the input layer do not have any weights associated with them. Rather, their sole purpose is to store data to be forward propagated to the next layer. A middle layer 40 comprising a plurality of neurons 50 and 52 accepts as input the output of the plurality of neurons 44-48 from the input layer 38. The neurons of the middle layer 40 transmit output to a neuron 54 in the output layer 42 which generates the wake-up command to CPU 14.

The connections between the individual processing units in an artificial neural network are also modeled after biological processes. Each input to an artificial neuron unit is weighted by multiplying it by a weight value in a process that is analogous to the biological synapse function. In biological systems, a synapse acts as a connector between one neuron and another, generally between the axon (output) end of one neuron and the dendrite (input) end of another cell. Synaptic junctions have the ability to enhance or inhibit (i.e., weigh) the output of one neuron as it is inputted to another neuron. Artificial neural networks model the enhance-or-inhibit function by weighing the inputs to each artificial neuron. During operation, the output of each neuron in the input layer is propagated forward to each input of each neuron in the next layer, the middle layer. The thus arranged neurons of the input layer scan the inputs, which are neighboring sound values, and after training using techniques known to one skilled in the art, detect the appropriate utterance to wake up the computer. Once the CPU 14 wakes up, the neural network 36 is put to sleep to conserve power. In this manner, power consumption is minimized while retaining the advantages of speech recognition.

Although accurate speech recognition by humans or computers requires a quiet environment, such a requirement is at times impracticable. In noisy environments, the present invention provides a high-order, gradient noise cancelling microphone to reject noise interference from distant sources. The high-order microphone is robust to noise because it precisely balances the phase and amplitude response of each acoustical component. A high-order noise cancelling microphone can be constructed by combining outputs from lower order microphones. As shown in FIG. 7, one embodiment with a first order microphone is built from three zero order, or conventional pressure-based, microphones. Each microphone is positioned in a port located such that the port captures a different sample of the sound field.
The microphones of FIG. 7 are preferably positioned as far away from each other as possible to ensure that different samples of the sound field are captured by each microphone. If the noise source is relatively distant and the wavelength of the noise is sufficiently longer than the distance between the ports, the noise acoustics differ from the user's voice in magnitude and phase. For distant sounds, the magnitude of the sound waves arriving at the microphones is approximately equal, with a small phase difference among them. For sound sources close to the microphones, the magnitude of the local sound dominates that of the distant sounds. After the sounds are captured, the sound pressure at the ports adapted to receive distant sounds is subtracted and converted into an electrical signal for conversion by the ADC 12.
As shown in FIG. 7, a first order microphone comprises a plurality of zero order microphones 56, 58 and 60. The microphone 56 is connected to a resistor 62 which is connected to a first input of an analog multiplier 70. Microphones 58 and 60 are connected to resistors 64 and 66, which are connected to a second input of the analog multiplier 70. A resistor 68 is connected between ground and the second input of the analog multiplier 70. The output of the analog multiplier 70 is connected to the first input of the analog multiplier 70 via a resistor 72. In this embodiment, microphones 56 and 58 are positioned to measure the distant sound sources, while the microphone 60 is positioned toward the user. The multiplier 70, so configured, takes the difference of the sound arriving at microphones 56, 58 and 60 to arrive at a less noisy representation of the user's speech. Although FIG. 7 illustrates a first order microphone for cancelling noise, any high order microphone such as a second order or even third order microphone can be utilized for even better noise rejection. The second order microphone is built from a pair of first order microphones, while a third order microphone is constructed from a pair of second order microphones.
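Digitally, the first-order differencing can be sketched as follows. This is an illustrative reconstruction, not the patent's analog circuit; the equal weighting of the two distant-facing ports is an assumption standing in for the resistor network.

```python
# Sketch of the first-order differencing idea of FIG. 7 in digital form:
# subtract the average of the two distant-facing microphones from the
# user-facing microphone, so far-field noise that reaches all ports with
# nearly equal magnitude largely cancels. The 0.5 weights are assumed.
import numpy as np

def first_order_mic(user_mic, distant_a, distant_b):
    return user_mic - 0.5 * (distant_a + distant_b)

# Example: identical far-field noise on all three ports cancels,
# while the near-field speech on the user port survives.
t = np.linspace(0.0, 1.0, 8000)
noise = 0.3 * np.sin(2 * np.pi * 100 * t)     # distant source
speech = 0.8 * np.sin(2 * np.pi * 440 * t)    # local source
out = first_order_mic(speech + noise, noise, noise)
print(np.allclose(out, speech))  # True
```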
Additionally, the outputs of the microphones can be digitally enhanced to separate the speech from the noise. As disclosed in U.S. Pat. No. 5,400,409, issued to Linhard, the noise reduction is performed with two or more channels such that the temporal and acoustical signal properties of speech and interference are systematically determined. Noise reduction is first executed in each individual channel. After noise components have been estimated during speaking pauses, a spectral subtraction is performed in the spectral range that corresponds to the magnitude. In this instance the temporally stationary noise components are damped.
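A minimal sketch of this magnitude-domain spectral subtraction follows; the frame length, the flooring at zero, and the synthetic signals are assumptions made for illustration, not details of the Linhard method.

```python
# Estimate the noise magnitude spectrum during a speaking pause,
# subtract it from each frame's magnitude, and resynthesize with the
# noisy phase. Stationary noise components are thereby damped.
import numpy as np

def spectral_subtract(noisy_frame, noise_mag):
    spec = np.fft.rfft(noisy_frame)
    mag = np.maximum(np.abs(spec) - noise_mag, 0.0)   # floor at zero
    return np.fft.irfft(mag * np.exp(1j * np.angle(spec)), len(noisy_frame))

rng = np.random.default_rng(1)
pause = 0.05 * rng.standard_normal(256)           # speaking pause: noise only
noise_mag = np.abs(np.fft.rfft(pause))            # noise estimate
frame = (np.sin(2 * np.pi * 8 * np.arange(256) / 256)
         + 0.05 * rng.standard_normal(256))       # speech plus noise
clean = spectral_subtract(frame, noise_mag)
print(clean.shape)  # (256,)
```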
Point-like noise sources are damped using an acoustic directional lobe which, together with the phase estimation, is oriented toward the speaker with digital directional filters at the inlet of the channels. The pivotable, acoustic directional lobe is produced for the individual voice channels by respective digital directional filters and a linear phase estimation to correct for a phase difference between the two channels. The linear phase shift of the noisy voice signals is determined in the power domain by means of a specific number of maxima of the cross-power density. Thus, each of the first and second related signals is transformed into the frequency domain prior to the step of estimating, and the phase correction and the directional filtering are carried out in the frequency domain. This method is effective with respect to noises and only requires a low computation expenditure. The directional filters are at a fixed setting. The method assumes that the speaker is relatively close to the microphones, preferably within 1 meter, and that the speaker only moves within a limited area. Non-stationary and stationary point-like noise sources are damped by means of this spatial evaluation. Because noise reduction cannot take place error-free, distortions and artificial insertions such as "musical tones" can occur due to the spatial separation of the receiving channels (microphones at a specific spacing). When the individually-processed channels are combined, an averaging is performed to reduce these errors.
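The linear phase (delay) estimation from the cross-power density can be sketched as follows. The peak-picking over the circular cross-correlation and the synthetic 3-sample delay are illustrative assumptions, not the Linhard method verbatim.

```python
# Estimate the inter-channel delay from the peak of the cross-correlation,
# computed via the cross-power density in the frequency domain.
import numpy as np

def estimate_delay(ch1, ch2):
    """Estimate how many samples ch2 lags ch1 (cross-power peak)."""
    n = len(ch1)
    cross_power = np.fft.rfft(ch1) * np.conj(np.fft.rfft(ch2))
    xcorr = np.fft.irfft(cross_power, n)
    lag = int(np.argmax(xcorr))
    if lag > n // 2:
        lag -= n            # wrap circular lags to negative values
    return -lag

rng = np.random.default_rng(2)
x = rng.standard_normal(512)
y = np.roll(x, 3)           # second channel delayed by 3 samples
print(estimate_delay(x, y))  # 3
```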
The composite signal is subsequently further processed with the use of cross-correlation of the signals in the individual channels, thus damping diffuse noise and echo components during subsequent processing. The individual voice channels are subsequently added, whereby the statistical disturbances of spectral subtraction are averaged. Finally, the composite signal resulting from the addition is subsequently processed with a modified coherence function to damp diffuse noise and echo components. The thus disclosed noise canceller effectively reduces noise using minimal computing resources.
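A simplified stand-in for the coherence processing is sketched below. The patent does not specify the "modified" coherence function, so the plain magnitude-squared coherence and the frame-averaged spectral estimates here are assumptions.

```python
# Magnitude-squared coherence between two channels is high for the
# common (speech) component and low for diffuse noise, so it can serve
# as a per-frequency gain on the summed signal.
import numpy as np

def coherence_gain(frames_a, frames_b, eps=1e-12):
    A = np.fft.rfft(frames_a, axis=1)
    B = np.fft.rfft(frames_b, axis=1)
    s_ab = np.mean(A * np.conj(B), axis=0)            # cross-power density
    s_aa = np.mean(np.abs(A) ** 2, axis=0)
    s_bb = np.mean(np.abs(B) ** 2, axis=0)
    return np.abs(s_ab) ** 2 / (s_aa * s_bb + eps)    # 0..1 per bin

rng = np.random.default_rng(3)
common = rng.standard_normal((8, 128))                # coherent "speech"
fa = common + 0.1 * rng.standard_normal((8, 128))     # diffuse noise, ch. A
fb = common + 0.1 * rng.standard_normal((8, 128))     # diffuse noise, ch. B
print(coherence_gain(fa, fb).mean() > 0.9)            # True: mostly coherent
```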
FIG. 8 shows a portable embodiment of the present invention where the voice recognizer is hou
