`Douma et al.
`
`[19]
`
`[11] Patent Number: 5,583,965
`[45] Date of Patent: Dec. 10, 1996
`
`[54] METHODS AND APPARATUS FOR TRAINING AND OPERATING VOICE
`RECOGNITION SYSTEMS
`
`[75] Inventors: Peter Douma, Wykoff, N.J.; Geoffrey Anderson,
`Cornwall, N.Y.; Masaaki Akahane, Mahwah; Semyen Mizikovsky,
`Union, both of N.J.
`
`[73] Assignees: Sony Corporation, Tokyo, Japan; Sony
`Electronics, Inc., Park Ridge, N.J.
`
`[21] Appl. No.: 302,460
`
`[22] Filed: Sep. 12, 1994
`
`[51] Int. Cl.⁶ ........ G10L 3/00
`[52] U.S. Cl. ........ 395/2.84; 395/2.55; 395/2.6
`[58] Field of Search ........ 395/2, 2.4, 2.41, 2.55, 2.6, 2.67,
`2.79, 2.84; 381/51
`
`[56] References Cited
`
`U.S. PATENT DOCUMENTS
`
`5,086,385   2/1992   Launey et al. ........ 395/2.84
`5,386,494   1/1995   White ........ 395/2.84
`5,450,525   9/1995   Russell et al. ........ 395/2.84
`
`Primary Examiner: Kee M. Tung
`Attorney, Agent, or Firm: William S. Frommer; Alvin
`Sinderbrand
`
`[57] ABSTRACT
`
`A voice recognition system and method for training the same
`are provided wherein a first voice signal representing an
`instruction as well as a predetermined instruction signal
`corresponding to the first voice signal and identifying the
`instruction are input to the voice recognition system. The
`system processes the first voice signal based on the
`predetermined instruction signal to produce voice recognition data
`for use by the system in identifying the instruction based on
`a second voice signal representing the same instruction. The
`processor stores the voice recognition data for subsequent
`use upon receipt of the second voice signal and carries out
`the instruction in response to the predetermined instruction
`signal corresponding to the first voice signal.
`
`13 Claims, 4 Drawing Sheets
`[Front-page drawing: block diagram. Legible block labels:
`PRINTER 30, GAME PROCESSOR, HI FI, PLAYER, FEEDBACK DEVICE,
`STORAGE, COMPUTER, MOUSE.]
`
`Page 1 of 10
`
`GOOGLE EXHIBIT 1008
`
`
`
`
`[Sheet 1 of 4, FIG. 1: block diagram. Legible block labels:
`VCR, GAME PROCESSOR 36, COMPUTER, HI FI, CD PLAYER, FEEDBACK
`DEVICE, STORAGE, MOUSE.]
`
`
`
`[Sheet 2 of 4, FIG. 2: flow chart. Legible box labels: WAIT FOR
`INPUT FROM INPUT DEVICE; INPUT FROM INPUT DEVICE, NO VOICE
`DATA; INPUT FROM INPUT DEVICE AND VOICE DATA; YES; ASSOCIATE
`ITEM SELECTED, VOICE DATA; IGNORE DATA, GO TO 50; ACT ON ITEM,
`GO TO 50.]
`
`
`
`[Sheet 3 of 4, FIG. 3: block diagram. Legible block labels:
`MICROPHONE 104, KEYPAD & SWITCHES, MICROPROCESSOR, FEEDBACK
`DEVICE.]
`
`
`
`[Sheet 4 of 4, FIG. 4: flow chart. Legible box labels: OPERATE
`PHONE; DISREGARD VOICE DATA AND OTHER INPUTS; RETAIN VOICE
`SAMPLE AND ASSOCIATE WITH INPUT FROM 108; INPUT FROM 108 AND
`VOICE DATA?; ACT ACCORDING (remainder illegible); CARRY OUT
`VOICE RECOGNITION FUNCTION AND ASSOCIATE WITH AN INSTRUCTION.]
`
`
`
`METHODS AND APPARATUS FOR
`TRAINING AND OPERATING VOICE
`RECOGNITION SYSTEMS
`
`BACKGROUND OF THE INVENTION
`
`The present invention relates to voice recognition systems
`and methods involving training to identify an instruction
`corresponding to a voice signal.
`Conventional voice recognition systems are categorized
`generally as either speaker independent systems, which are
`intended to recognize instructions corresponding to voice
`signals without training of the system to identify such
`instructions, or speaker dependent systems, which employ
`such training. In the case of speaker dependent systems,
`voice samples are supplied to the system in response to a
`request from the system that a certain word or groups of
`words be spoken. The system processes the received voice
`signal to produce voice recognition data for future use in
`identifying an instruction corresponding to the same word or
`words expressed by the voice signal. In general, the greater
`the number of such samples provided to the system, the
`more reliably it operates subsequently to identify an
`instruction corresponding to a particular voice signal.
`The training periods required for operating such speaker
`dependent systems are typically quite lengthy and complex.
`Users often find the training procedures tedious and
`wasteful.
`
`Training is normally conducted in a single session on a
`given day. During the session, the user of the system
`provides a large number of voice samples to the system so
`that it can "train" by matching the received voice samples to
`data indicating the corresponding instruction. However,
`one's voice changes from day to day. For example, illness or
`stress can cause one's voice to change over the course of
`time. Consequently, the voice samples provided during the
`usual single training session might not be fairly
`representative of the speaker's voice under different conditions.
`
`OBJECTS AND SUMMARY OF THE
`INVENTION
`
`It is an object of the present invention to overcome the
`problems and shortcomings of conventional voice recogni-
`tion systems as expressed above.
`It is another object of the present invention to provide
`methods and systems for voice recognition which may be
`trained at the same time that they are put into use, so that a
`separate training procedure is not required.
`In accordance with a first aspect of the invention, a
`method for training and operating a voice recognition
`system is provided, comprising the steps of: inputting a first
`voice signal to a voice recognition system, the first voice
`signal representing an instruction for the system; inputting a
`first predetermined instruction signal to the system, the first
`predetermined instruction signal corresponding to the first
`voice signal and identifying the instruction separately of the
`first voice signal; processing the first voice signal based on
`the first predetermined instruction signal with the voice
`recognition system to produce voice recognition data for use
`by the system in identifying a second voice signal
`corresponding to the first voice signal and representing the
`instruction; storing the voice recognition data in the system;
`carrying out the instruction with the use of the voice
`recognition system in response to the first predetermined
`instruction signal corresponding to the first voice signal;
`inputting the second voice signal; identifying the instruction
`represented by the second voice signal based on the voice
`recognition data; and carrying out the identified instruction.
`In accordance with another aspect of the present
`invention, a voice recognition system is provided comprising: a
`digital voice recognition processor programmed to receive a
`first voice signal corresponding to a predetermined
`instruction, a first predetermined instruction signal corresponding
`to the first voice signal and identifying the instruction
`separately of the first voice signal, and a second voice signal
`representing the predetermined instruction; means for
`inputting the first and second voice signals to the processor; and
`means for inputting the predetermined instruction signal to
`the processor; the processor being programmed further to
`process the first voice signal based on the first predetermined
`instruction signal to produce voice recognition data enabling
`the processor to identify the instruction based on the second
`voice signal, to store the voice recognition data and to carry
`out the predetermined instruction based on the second voice
`signal and the voice recognition data; the processor being
`programmed to carry out the predetermined instruction in
`response to the first predetermined instruction signal
`corresponding to the first voice signal.
`In accordance with a further aspect of the present
`invention, a voice recognition control system comprises: means
`for inputting a first voice signal representing an instruction
`for the system; means for inputting a first predetermined
`instruction signal corresponding to the first voice signal and
`identifying the instruction separately of the first voice signal;
`and processing means for processing the first voice signal
`based on the first predetermined instruction signal to
`produce voice recognition data for identifying a second voice
`signal corresponding to the first voice signal and
`representing the instruction; the processing means being operative to
`store the voice recognition data in the system and to carry
`out the instruction in response to the first predetermined
`instruction signal corresponding to the first voice signal; the
`means for inputting the first voice signal being further
`operative to input the second voice signal; the processing
`means being operative to identify the instruction represented
`by the second voice signal based on the voice recognition
`data and to carry out the identified instruction.
`The above, and other objects, features and advantages
`of the present invention, will be apparent in the detailed
`description of certain advantageous embodiments thereof
`which is to be read in connection with the accompanying
`drawings forming a part hereof, and wherein corresponding
`parts and components are identified by the same reference
`numerals in the several views of the drawings.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIG. 1 is a generalized block diagram of various embodi-
`ments of voice recognition systems in accordance with the
`present invention;
`FIG. 2 is a flow chart for use in illustrating operation of
`the voice recognition systems of FIG. 1;
`FIG. 3 is a block diagram of a control system for a cellular
`telephone in accordance with certain embodiments of the
`present invention; and
`FIG. 4 is a flow chart for use in illustrating operation of
`the control system of FIG. 3.
`
`DETAILED DESCRIPTION OF CERTAIN
`ADVANTAGEOUS EMBODIMENTS
`
`With reference now to FIG. 1, a generalized block
`diagram of a computerized voice recognition system in
`accordance with various embodiments of the present invention is
`illustrated therein. The system of FIG. 1 includes a computer
`10 programmed to carry out voice recognition based on
`digitized voice data produced from sounds spoken by a user
`and received by a microphone 12. The voice recognition
`function of computer 10 is carried out by comparing voice
`data produced with the use of microphone 12 or a signature
`derived therefrom with voice recognition data previously
`stored by the computer 10. The computer 10 may be
`implemented, for example, by a microprocessor,
`microcomputer, digital signal processor (DSP), RISC, CISC or other
`digital processor. The functions carried out by computer 10
`are carried out in other embodiments by multiple processors
`or a combination of different types of processors (such as a
`microcontroller and a DSP). In still other embodiments,
`application specific integrated circuits (ASICs) employing
`neural nets or fuzzy logic are employed to carry out the
`functions of computer 10.
`The system of FIG. 1 further includes at least one input
`device which enables a user to input instruction data to the
`computer 10 separately from the voice data input with the
`use of microphone 12. In certain embodiments, a keyboard
`16 coupled with the computer 10 is provided for this
`purpose. In other embodiments, a mouse 18 coupled with the
`computer 10 serves this purpose. Moreover, in certain
`embodiments both a keyboard 16 and mouse 18 are provided
`to afford the user an option for inputting instruction data.
`Keyboard 16 and mouse 18 are depicted in block form using
`dashed lines, as are further elements discussed below, to
`indicate that one or more of these devices are selected for
`use in a given embodiment depending on the application.
`Various other input devices, such as buttons, switches,
`keypads, touch sensitive displays, etc., may be employed to
`input instruction data, although not illustrated in FIG. 1 for
`simplicity and clarity. Keypads and remote control devices
`are useful for many consumer electronic devices for
`inputting instructions and may also be employed in place of
`keyboard 16 and/or mouse 18.
`The generalized system of FIG. 1 also includes a feedback
`device 22 coupled with computer 10 which serves to provide
`information to the user from the computer 10. Where an
`embodiment takes the form of a personal computer system,
`a monitor or other suitable visual display typically serves as
`the feedback device 22. In consumer electronic applications,
`LED, LCD and other types of visual displays are typically
`employed. In some embodiments, an audible feedback
`device is employed such as a speaker or other sound
`transducer to provide coded sounds or synthesized speech as
`feedback to the user. However, the use of feedback device 22
`is not essential to the present invention.
`The computer 10 responds to received instruction data by
`carrying out an action such as storage of data therein or
`output of display data or sound data to the feedback device
`22 or to another peripheral. Exemplary instructions which
`may be carried out by the computer 10 in response to such
`instruction data include changing directories, opening and
`closing files, editing files, printing, outputting other control
`signals to one or more peripherals, and so on. In short, the
`actions which may be so initiated include any which may be
`executed by a computer.
`In many applications, the computer 10 is used to control
`a function of a peripheral device such as a printer 30 coupled
`with the computer 10, or a data storage device 32. For
`example, documents produced and stored in the form of data
`through speech recognition may be printed by means of the
`printer 30 under the control of the computer 10.
`The system of the present invention finds a broad range of
`applications in the consumer electronics field. In one such
`application, the computer 10 controls a game image
`processor 36 in a video game apparatus to produce and modify
`image data to be displayed by a television set or monitor in
`playing a video game. The computer 10 responds to voice
`commands received via the microphone 12 for controlling
`the movement of objects within an image represented by
`such image data or else a change in the point of view of such
`an image.
`In other applications, various functions of a TV 40, a VCR
`42, a high fidelity sound reproduction system 44 and/or a CD
`player 46 are controlled by the computer 10 in response to
`a voice command input via the microphone 12. Other types
`of consumer and office electronics devices (such as
`answering machines and remote controllers), toys, home appliances
`(such as door openers) and other devices may be controlled
`with the use of the invention.
`
`With reference also to the flow chart of FIG. 2, an
`operation of the computer 10 in one embodiment in
`responding to a voice input and/or an input from one of the input
`devices 16, 18 or otherwise to carry out an instruction is
`illustrated therein. Once the system has been initiated, the
`computer 10 waits to receive at least one of an input from an
`input device and voice data input with the use of the
`microphone 12. Upon receipt of one or more such signals, in
`a step 50 the computer 10 determines whether it has received
`(1) both an input from an input device as well as voice data
`received by means of the microphone 12, (2) voice data only,
`or (3) an input from the input device without receipt of voice
`data. In case (1), the computer 10 produces a training mode
`signal; in case (2), it produces a voice recognition mode
`signal; and in case (3), it produces a non-voice command
`signal. Input of voice data may be determined by monitoring
`power levels of data produced with the use of microphone
`12. However, in the alternative, mode selection may be
`carried out by means of a switch or soft key. In case (1), in
`response to the training mode signal the computer 10
`proceeds to a step 54 to store the input as well as the voice
`data to carry out a training function, as indicated in a step 56.
`In the training function of step 56, the computer 10 produces
`voice recognition data which it stores for future use in
`identifying the same spoken word, words or other sounds
`from the user. The voice recognition data is stored in
`association with data identifying an instruction designated
`by means of the corresponding input from the input device,
`such as the keyboard 16, mouse 18 or other input device.
`The voice recognition data is thus associated with an
`instruction code representing the corresponding instruction to be
`carried out upon receipt of voice data matching the voice
`recognition data. The nature of the instruction will, of
`course, depend upon the application and the particular action
`which the user wishes to associate with the spoken voice
`command. For example, a user may wish the computer to
`respond to a voice command "open file" to open a
`designated computer file or access such a function.
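The three-way decision of step 50 and the branches that follow it can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Mode`, `select_mode` and `Recognizer` are hypothetical, voice data is modeled as a plain list of numbers, and matching is reduced to exact equality so that the control flow stays visible.

```python
from enum import Enum
from typing import List, Optional


class Mode(Enum):
    TRAINING = "training"        # case (1): device input and voice data
    RECOGNITION = "recognition"  # case (2): voice data only
    NON_VOICE = "non_voice"      # case (3): device input only


def select_mode(device_input: Optional[str],
                voice_data: Optional[List[float]]) -> Optional[Mode]:
    """Step 50: classify what has been received."""
    has_input = device_input is not None
    has_voice = bool(voice_data)
    if has_input and has_voice:
        return Mode.TRAINING
    if has_voice:
        return Mode.RECOGNITION
    if has_input:
        return Mode.NON_VOICE
    return None  # nothing received yet; keep waiting


class Recognizer:
    """Hypothetical stand-in for computer 10."""

    def __init__(self) -> None:
        self.store = []     # (voice data, instruction code) pairs
        self.executed = []  # instruction codes acted upon, in order

    def match(self, voice_data: List[float]) -> Optional[str]:
        # Placeholder matcher (assumption): exact equality only.
        for stored, code in self.store:
            if stored == list(voice_data):
                return code
        return None

    def handle(self, device_input=None, voice_data=None) -> Optional[Mode]:
        mode = select_mode(device_input, voice_data)
        if mode is Mode.TRAINING:
            # Steps 54 and 56: keep the voice data with its instruction code.
            self.store.append((list(voice_data), device_input))
            self.executed.append(device_input)  # step 58
        elif mode is Mode.RECOGNITION:
            code = self.match(voice_data)       # steps 60 and 62
            if code is not None:
                self.executed.append(code)      # step 58
            # Otherwise step 64: the voice data is ignored.
        elif mode is Mode.NON_VOICE:
            self.executed.append(device_input)  # step 70, then step 58
        return mode
```

Calling `handle("open_file", samples)` both trains and executes, which is the point of case (1): the instruction is carried out on the very input that trains the system.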
`While the computer may store voice recognition data in
`the form of digitized received voice sounds, preferably the
`received sounds are processed to produce a voice signature
`requiring less data and which is easier to match with a
`subsequently produced signature representing a
`subsequently input voice signal. Such signatures can be produced,
`for example, by carrying out one or more spectral analyses
`of a received voice signal. For example, the received signal
`may be separated into time segments and each segment then
`subjected to a spectral analysis, such as a Fast Fourier
`Transform, to separate each segment into spectral
`components. A signature may then be produced from the various
`spectral components of the segments. The signatures are
`stored by the computer 10 in memory circuits, a hard drive,
`memory disk, tape or other storage device or medium for
`subsequent use in matching a stored signature with the
`signature of a received voice signal. Each signature is stored
`with a code representing the corresponding instruction to be
`carried out by the computer 10, so that once a match has
`been made, the instruction code is then used by the computer
`10 to carry out the corresponding action. Preferably, the
`system of FIGS. 1 and 2 does not generate audible emissions
`during the training mode.
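The segment-then-analyze procedure described above can be illustrated as follows. This is only a sketch under assumptions: the text does not specify segment length, band layout or how components are combined, so `segment_len`, `bands` and the band-energy summary are illustrative choices. A direct discrete Fourier transform is used in place of a Fast Fourier Transform to keep the example dependency-free; it yields the same spectrum, only more slowly.

```python
import cmath
import math
from typing import List


def dft_magnitudes(segment: List[float]) -> List[float]:
    """Magnitudes of the DFT bins 0..n/2 of one time segment."""
    n = len(segment)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(segment[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc))
    return mags


def signature(samples: List[float],
              segment_len: int = 64,
              bands: int = 4) -> List[List[float]]:
    """Split the signal into segments and summarize each spectrum.

    Each segment contributes one row of `bands` values, each value
    being the summed magnitude of a contiguous group of bins.
    Trailing samples that do not fill a segment are dropped, as are
    any bins left over after the last full band.
    """
    step = (segment_len // 2 + 1) // bands
    if step == 0:
        raise ValueError("more bands than spectral bins")
    sig = []
    for start in range(0, len(samples) - segment_len + 1, segment_len):
        mags = dft_magnitudes(samples[start:start + segment_len])
        sig.append([sum(mags[b * step:(b + 1) * step])
                    for b in range(bands)])
    return sig
```

The resulting rows are far smaller than the raw samples, which reflects the stated motivation for signatures: less data to store and an easier comparison against a later utterance.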
`The input from the device 16 or 18 provided along with
`the voice data identifies the instruction. Based on this input,
`in a step 58 the computer 10 carries out the corresponding
`instruction. Accordingly, the system is placed into use for
`carrying out a desired function at the same time that it is
`separately trained to recognize voice commands.
`If, however, only voice data has been input to the
`computer 10 without an accompanying input from a device such
`as the keyboard 16, mouse 18 or other device (i.e., case (2)
`above), in response to the voice recognition mode signal, the
`computer inputs the voice data in a step 60 and computer 10
`either attempts to match the voice data directly with stored
`voice recognition data or else converts the voice data to a
`signature which is then used to determine whether a match
`exists with any stored signature. If a match is found, as
`determined in a step 62, the computer proceeds in step 58 to
`execute the corresponding instruction and then return to step
`50. If, however, a match is not found, as indicated by the step
`62, the computer ignores the voice data (step 64). Then the
`computer returns to the step 50 to await further inputs. In
`addition, the computer, in certain embodiments, outputs an
`appropriate indication to the user via the feedback device 22
`that the voice data was not recognized.
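The match test of steps 60 to 64 can be sketched as a nearest-signature search with a rejection threshold. The text does not state how a match is decided, so everything specific here is an assumption made for illustration: signatures are flat lists of numbers, closeness is Euclidean distance, and `max_distance` is a hypothetical tuning parameter.

```python
import math
from typing import List, Optional, Tuple

Signature = List[float]


def distance(a: Signature, b: Signature) -> float:
    """Euclidean distance between two equal-length signatures."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_match(candidate: Signature,
               stored: List[Tuple[Signature, str]],
               max_distance: float) -> Optional[str]:
    """Return the instruction code of the closest stored signature.

    Returns None when nothing lies within max_distance, which
    corresponds to the "no match" branch in which the voice data is
    ignored (step 64).
    """
    best_code, best_dist = None, max_distance
    for sig, code in stored:
        if len(sig) != len(candidate):
            continue  # not comparable; skip rather than guess
        d = distance(sig, candidate)
        if d <= best_dist:
            best_code, best_dist = code, d
    return best_code
```

A `None` result is also the natural place to drive the optional "not recognized" indication on the feedback device.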
`Finally, if in the step 50 an input from an input device is
`received without any accompanying voice data (i.e., case
`(3)), in response the computer proceeds as indicated by step
`70 directly to step 58 to act upon the instruction represented
`by the input.
`In case (1) described above, the computer 10 not only
`trains itself to recognize a particular voice command by
`storing appropriate voice recognition data and associating it
`with an input separately identifying the corresponding
`instruction, but it also carries out the command which is
`identified by the input. The input may be supplied
`concurrently by means of the keyboard 16, the mouse 18 or other
`device. Accordingly, the system may be put to use
`immediately as it trains itself to recognize voice commands, and
`the training period may be extended over days or even
`weeks. A further benefit thus realized is that changes in the
`user's voice over the same period (which might not be
`encountered during the course of a single training session)
`will be experienced by the system so that it produces voice
`recognition data representative of the user's voice under
`different conditions which could affect the quality of the
`user's voice.
`The present invention is particularly useful in telephone
`applications, especially where the need to locate and press
`switch buttons is distracting and operation is preferably
`carried out without the need to look away from some activity
`which simultaneously requires the user's attention. In further
`embodiments of the present invention, telephones are
`provided with a voice recognition function which permits
`training of the function at the same time that the telephone
`is being controlled by the input of instruction data through
`a keypad or the like so that it is put to use right away.
`An embodiment of a control system of a cellular
`telephone having a voice recognition capability in accordance
`with the present invention is illustrated in FIG. 3 in block
`diagram format. In the system of FIG. 3, a microprocessor
`and associated program and main memories are indicated by
`a block 100. The microprocessor is programmed to respond
`either to voice data input by means of a microphone 104 or
`an input from a keypad or one or more switches (such as an
`off-hook switch), collectively indicated by a block 108
`coupled to the microprocessor 100. The microprocessor
`provides a voice signal to a codec 112 which carries out
`analog-to-digital conversion of the voice signal. The codec
`112 also carries out digital compression of the voice signal
`for cellular transmission.
`
`The system also includes a feedback device 120 which
`receives an output from the microprocessor 100 and
`provides a corresponding user-understandable signal to the user
`as information fed back from the microprocessor 100. In
`various embodiments the feedback device 120 is comprised
`of one or more of an LCD, LED or other visual display,
`and/or a voice synthesizer, tone generator or other sound
`generating device.
`The control system of FIG. 3 serves to control the various
`operations of the cellular telephone, including operating
`mode selection, generation of DTMF tones, etc., in response
`to inputs from the keypad and/or switches 108 or to voice
`data input with the use of the microphone 104 and the codec
`112. The overall operation of the FIG. 3 system will now be
`described in connection with the flow chart of FIG. 4.
`In a step 130 of FIG. 4, the microprocessor 100
`determines, based upon an input from one of the keypad and
`switches 108, whether the user has selected a conversation
`operating mode of the cellular telephone. If so, in a step 134,
`the microprocessor 100 ignores further voice data and inputs
`from the keypad and switches 108, and instead outputs
`digitized voice signals and DTMF tones via an output 138
`(FIG. 3) to transmission circuits of the cellular telephone
`(not shown for purposes of simplicity and clarity) for
`carrying on a telephone conversation.
`If in the step 130 it is determined that the conversation
`operating mode has not been selected, or if the conversation
`operating mode has been discontinued as indicated by an
`input from one of the keypad and switches 108, processing
`continues to a step 140 where it is determined whether an
`input has been received from one of the keypad and switches
`108 without an accompanying voice input from the
`microphone 104 and codec 112. Detection of a voice signal may
`be carried out, for example, by detecting power levels
`represented by data output from the codec 112 to determine
`whether a predetermined power level threshold has been
`exceeded, thus indicating that a voice signal has been
`received. If a non-voice input only has been received by the
`microprocessor 100, as indicated by the step 140 the
`microprocessor proceeds to carry out an instruction represented by
`the input, as indicated in a step 144. The input may represent
`standard cellular telephone operating instructions, such as an
`instruction to go off-hook, produce a respective DTMF tone,
`initiate the conversation operating mode, etc. Once step 144
`has been carried out, processing returns to step 130.
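The power-level test mentioned above can be sketched as a comparison of mean signal power against a fixed threshold. This is an illustrative reading only: the text says a predetermined power level threshold is exceeded, but does not say how power is computed, so the mean-square measure, the function names and the sample values below are assumptions.

```python
from typing import Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Mean squared amplitude of a block of digitized samples."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def voice_present(samples: Sequence[float], threshold: float) -> bool:
    """True when the block's power exceeds the predetermined threshold."""
    return mean_power(samples) > threshold
```

In the flow of FIG. 4, a check of this kind is what would distinguish a key press alone (step 140) from a key press accompanied by speech (step 148).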
`If, however, the inquiry of the step 140 is answered in the
`negative, processing continues in a step 148 in which it is
`determined whether both an input from one of the input
`devices 108 and voice data have been received by the
`microprocessor 100. If so, in a step 150 the microprocessor
`100 either stores voice sample data or produces a signature
`for the input voice data and stores either the sample or the
`signature with an indication of the command represented by
`the input from the devices 108 for future use in recognizing
`a voice command and carrying out the corresponding
`instruction identified by the data associated with the voice
`sample or signature. Preferably, generation of DTMF tones
`and other audible emissions by the telephone are suppressed
`in this mode of operation until all voice data has been entered.
`As a further feature in certain embodiments, after a
`telephone number has been entered, a verbal identifier (such as
`the name of the person whose telephone number has been
`entered) may be spoken into microphone 104 and also
`entered. The microprocessor 100 responds by storing voice
`sample data or a corresponding signature with data
`identifying the associated telephone number. Then processing
`continues in the step 144 in which the microprocessor
`carries out the instruction indicated by the input from the
`device 108.
`If the answer to the inquiry in step 148 is negative, in a
`further step 154 it is determined whether only voice data has
`been received by the microprocessor 100. If so, in a step 160
`the microprocessor 100 attempts to match either a sample of
`the newly received voice data or a corresponding signature
`with either voice data or a signature stored in its memory to
`produce a match. If a match is produced, as indicated in a
`subsequent step 162, the corresponding data stored with the
`voice sample or signature which has matched is used to carry
`out the indicated instruction in the step 144. For example, if
`the voice data or signature matches stored data representing
`the name of a person and indicating his or her telephone
`number, microprocessor 100 responds by outputting
`corresponding DTMF tones or else a command to the
`transmission circuits to generate the tones representing that
`telephone number, in order to place a call. If, however, the voice
`data does not produce a match, the user is informed by
`means of the feedback device 120 that a voice command has
`not been recognized, as indicated in the step 166, and
`processing returns to the step 130.
`
`Accordingly, it will be appreciated that the system of
`FIGS. 3 and 4 carries out a voice recognition training
`function simultaneously with operation of the cellular
`telephone. That is, to train the system, a user operates a selected
`button or switch and simultaneously speaks the
`corresponding command into the microphone 104. The cellular
`telephone under the control of the microprocessor 100 responds
`to the command input by means of the button or switch and
`simultaneously stores appropriate voice sample or signature
`data for carrying out a voice recognition function at a later
`time. Accordingly, a separate training session is not required,
`but rather the cellular telephone may be placed in use
`immediately as training to recognize voice commands is
`simultaneously carried out.
`It will be appreciated that the embodiment of FIGS. 3 and
`4 may also be employed in telephones other than cellular
`telephones.
`Although specific embodiments of the invention have
`been described in detail herein with reference to the
`accompanying drawings, it is to be understood that the invention
`is not limited to those precise embodiments and that various
`changes and modifications may be effected therein by one
`skilled in the art without departing from the scope or spirit
`of the invention as defined in the appended claims.
`
`What is claimed is:
`1. A method for training and operating a voice recognition
`system, comprising the steps of:
`inputting a first voice signal to a voice recognition system,
`the first voice signal representing an instruction for the
`system;
`inputting a first predetermined instruction signal to the
`system, the first predetermined instruction signal
`corresponding to the first voice signal and identifying the
`instruction separately of the first voice signal;
`processing the first voice signal based on the first
`predetermined instruction signal with the voice recognition
`system to produce voice recognition data for use by the
`system in identifying a second voice signal
`corresponding to the first voice signal and representing the
`instruction;
`storing the voice recognition data in the system;
`carrying out the instruction with the use of the voice
`recognition system in response to the first
`predetermined instruction signal corresponding to the first
`voice signal;
`inputting the second voice signal;
`identifying the instruction represented by the second
`voice signal based on the voice recognition data; and
`carrying out the identified instruction.
`2. The method of claim 1, further comprising the step of
`producing a training mode signal in response to the input of
`the first voice signal with the first predetermined instruction
`signal, and wherein the step of processing the first voice
`signal is carried out in response to the training mode signal.
`3. The method of claim 1, further comprising the step of
`producing a voice recognition mode signal in response to the
`input of the second voice signal in the absence of the input
`of a corresponding signal with the second voice signal
`identifying the instruction separately from the second voice
`signal, and wherein the step of identifying the instruction
`represented by the second voice signal is carried out in
`response to the voice recognition mode signal.
`4. The method of claim 1, further comprising the steps of
`producing a non-voice command signal in response to an
`input of a second predetermined instruction signal
`identifying the instruction and in the absence of a concurrent input
`of a voice signal to the system, and carrying out the
`instruction based on the second predetermined instruction
`signal and the non-voice command signal.
`5. The method of claim 1, wherein the steps of inputting
`the voice signals comprise entering first and second voice
`signals representing an instruction for operating a telephone,
`and the steps of carrying out the instruction and the
`identified instruction comprise carrying out said instruction for
`operating a telephone.
`6. The method of claim 1, wherein the steps of inputting
`the voice signals comprise entering first and second voice
`signals representing an instruction for operating a device
`selected from one of a television receiver, a video cassette
`recorder, a video game image processor, a high fidelity audio
`reproduction system and a compact disk player and the steps
`of carrying out the instruction and the identified instruction
`comprise carrying out the instruction for operating the
`selected device.
`7. A voice recognition system, comprising:
`a digital voice recognition processor programmed to
`receive a first voice signal corresponding to a
`predetermined instruction, a predetermined instruction signal
`corresponding to the first voice signal and identifying
`the instruction separately of the first voice signal, and
`a second voice signal representing the predetermined
`instruction;
`means for inputting the first and second voice signals to
`the processor; and
`means for inputting the predetermined instruction signal
`to the processor;
`the processor being programmed further to process the
`first voice signal based on the predetermined instruc-
`
`