`Knittel
`
US006606280B1

(10) Patent No.: US 6,606,280 B1
(45) Date of Patent: Aug. 12, 2003
`
`(54) VOICE-OPERATED REMOTE CONTROL
`
(75) Inventor: Guenter Knittel, Mountain View, CA (US)

(73) Assignee: Hewlett-Packard Development Company, Houston, TX (US)

(*) Notice: Subject to any disclaimer, the term of this
patent is extended or adjusted under 35
U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/255,288

(22) Filed: Feb. 22, 1999

(51) Int. Cl.7 .................................................. H04D 1/00
(52) U.S. Cl. ................... 367/198; 340/825.69; 341/176
(58) Field of Search ......................... 367/198; 381/73.1,
381/110; 340/825.69, 825.25; 341/176;
704/275, 246; 348/734
`
(56) References Cited

U.S. PATENT DOCUMENTS
4,641,292 A * 2/1987 Tunnell ...................... 367/198
5,199,080 A * 3/1993 Kimura et al. .............. 381/110
5,226,090 A * 7/1993 Kimura ...................... 381/110
5,241,692 A * 8/1993 Harrison ..................... 381/110
5,247,580 A * 9/1993 Kimura et al. ................ 381/43
5,255,326 A * 10/1993 Stevenson ................... 381/110
5,267,323 A * 11/1993 Kimura ...................... 381/110
5,335,276 A * 8/1994 Thompson et al. ........... 380/21
5,465,401 A * 11/1995 Thompson ................... 455/89
5,602,963 A * 2/1997 Bissonnette et al. ........ 704/275
5,636,464 A * 6/1997 Ciluffo ....................... 367/198
5,650,831 A * 7/1997 Farwell ...................... 348/734
5,668,929 A * 9/1997 Foster ........................ 367/198
5,848,163 A * 12/1998 Gopalakrishnan et al. .. 704/275
6,119,088 A * 9/2000 Ciluffo ....................... 704/275
6,188,985 B1 * 2/2001 Thrift et al. ................ 704/275
6,204,796 B1 * 3/2001 Chan ..................... 340/825.69

FOREIGN PATENT DOCUMENTS
WO WO 94/03017 * 2/1994 ........... H04Q/1/00
WO WO 94/03020 * 2/1994 ........... H04Q/9/00

* cited by examiner

Primary Examiner-Brian Zimmerman
(74) Attorney, Agent, or Firm-Marc P. Schuyler
`
(57) ABSTRACT

This disclosure provides a voice-operated remote control
intended to replace multiple entertainment system remotes,
and it preferably includes two parts, a base unit and a remote
(or table-top) unit. During normal operation, the base unit
receives each electronic speaker driver signal from a stereo
receiver or other sound source and uses speaker-specific
transfer functions to generate an "audio mimic signal" which
accounts for room acoustics and circuitry distortions. This
signal is then subtracted from detected sound and a residual
is used to detect spoken commands. In response to spoken
commands, learned IR commands are transmitted by the
base unit to the remote unit, which then repeats these
commands, directing them toward the appropriate
entertainment system. Learning of room acoustics and of IR and
spoken commands is performed in discrete modes.
During a speaker learning mode, the base unit causes each
speaker in turn to generate a test pattern which is measured
via microphone and used to develop a speaker-specific
transfer function. During a command learning mode, a user
speaks each command (e.g., "TV on," "Tape Off," "louder,"
etc.) several times into the remote unit until that spoken
command is "learned" and recognizable.
`
`21 Claims, 5 Drawing Sheets
`
[Representative drawing: block diagram of the REMOTE UNIT 29,
showing the MIC, the KEYPAD 49 and the IR repeater 51, with
signal paths FROM BASE UNIT and TO DEVICES AND BASE UNIT.]
`
`Page 1 of 15
`
`GOOGLE EXHIBIT 1017
`
`
`
[Drawing Sheet 1 of 5: FIG. 1 (reference numerals visible include
11, 15, 17, 19, 23, 27, 29 and 31) and FIG. 2 (REMOTE UNIT 29 with
KEYPAD and IR repeater 51, with paths FROM BASE UNIT and TO DEVICES
AND BASE UNIT). Labels matching the base unit block diagram also
appear on this sheet: RF, FILTRATION 57, SPEECH RECOGNITION 59,
IR 61, and numerals 73 and 75. The remaining drawing content is
not recoverable from the extraction.]
`
`
`
[Drawing Sheet 2 of 5]

FIG. 4

INITIAL CONFIGURATION 101

SPEAKER LEARNING MODE 103
  BASE UNIT FUNCTIONS (113):
    DRIVE EACH OF 1 TO N SPEAKERS USING TEST PATTERNS
    RECEIVE RADIO OF MICROPHONE RESPONSE AT REMOTE UNIT
    CALCULATE Hn(w) FOR 1 TO N SPEAKERS
  REMOTE UNIT FUNCTIONS (123):
    PROCESS KEYPAD / RECEIVE AUDIO USING MICROPHONE /
    RELAY AUDIO TO BASE UNIT VIA RADIO

COMMAND LEARNING MODE 105
  BASE UNIT FUNCTIONS (115):
    STORE J REPETITIONS OF SPOKEN COMMAND
    PERFORM K SUCCESSFUL DETECTS OF SPOKEN COMMAND
    DETECT AND STORE INFRARED COMMAND FROM DEVICE REMOTES
  REMOTE UNIT FUNCTIONS (125):
    PROCESS KEYPAD / RECEIVE AUDIO USING MICROPHONE /
    RELAY AUDIO TO BASE UNIT VIA RADIO

NORMAL OPERATION 107
  BASE UNIT FUNCTIONS (117):
    RECEIVE SPEAKER AUDIO AND APPLY Hn(w) FOR 1 TO N CHANNELS
    SUM TOGETHER COMPONENTS TO GET AUDIO MIMIC SIGNAL
    RECEIVE MIC OUT SIGNAL VIA RADIO FROM REMOTE UNIT
    SUBTRACT AUDIO MIMIC SIGNAL FROM MIC OUT
    MONITOR RESIDUAL AND RECOGNIZE ANY SPOKEN COMMAND
    FETCH IR COMMAND(S) AND TRANSMIT TO REMOTE UNIT
  REMOTE UNIT FUNCTIONS (127):
    PROCESS KEYPAD / RECEIVE AUDIO USING MICROPHONE /
    RELAY AUDIO TO BASE UNIT VIA RADIO
    RECEIVE IR COMMAND VIA RADIO FROM BASE UNIT AND ECHO
    USING IR TRANSMITTER
`
`
`
[Drawing Sheet 3 of 5: FIG. 5, FIG. 6 (including KEYPAD 175) and
FIG. 8. Boxes recoverable from FIG. 8 include: <LEARN> PRESSED;
USER PROMPTED TO REPEAT Q TIMES; USER PROMPTED TO TEST; USER
PROMPTED TO ENTER IR COMMANDS; <END> PRESSED; DONE. Reference
numerals visible on the sheet include 131, 147, 149, 153, 155
and 171. The remaining drawing content is not recoverable from
the extraction.]
`
`
`
[Drawing Sheet 4 of 5: FIG. 7. Recoverable block labels: AMP, BPF,
A/D, DSP and BUFFER. Reference numerals visible on the sheet
include 201, 207, 209, 211, 215, 231, 233, 235, 239, 249 and 251.
The remaining drawing content is not recoverable from the
extraction.]
`
`
`
U.S. Patent    Aug. 12, 2003    Sheet 5 of 5    US 6,606,280 B1

FIG. 9

SPEAKER CONFIGURATION MODE; USER STARTS "TEST CD"

FOR EACH OF 1 TO N SPEAKERS:

  EACH MICROPHONE INPUT IS FILTERED & STORED

  CROSS-CORRELATE MICROPHONE INPUTS; IDENTIFY
  LOCATION OF SPEAKER n

  STORE PHASE INFORMATION FOR SPEAKER n

FIG. 10

NORMAL OPERATION

FOR EACH OF 1 TO N SPEAKERS:

  CORRELATE ALL MIC INPUTS USING STORED PHASE
  INFORMATION TO ISOLATE SPEAKER CONTRIBUTION

SUM SPEAKER CONTRIBUTIONS TO YIELD AUDIO MIMIC SIGNAL

SUBTRACT AUDIO MIMIC SIGNAL FROM ALL MIC INPUTS AND
TAKE STRONGEST RESIDUAL

MONITOR RESIDUAL TO DETECT POSSIBLE SPOKEN COMMAND

IF SPOKEN COMMAND DETECTED, DETERMINE CORRESPONDING IR
COMMANDS AND TRANSMIT TO ENTERTAINMENT SYSTEM(S)
`
`
`
VOICE-OPERATED REMOTE CONTROL

The present invention relates to electronic remote control
devices, such as may be used to control a television,
videocassette recorder or stereo component. In particular, this
disclosure provides a voice-operated remote control that can
be used for a wide variety of entertainment systems.

BACKGROUND

Many people today have televisions (TVs), videocassette
recorders (VCRs), home theater systems, digital versatile
disk (DVD) players, stereo components and other
entertainment systems and, on an increasing basis, these devices are
conveniently operated using remote controls (sometimes
also called "remotes," "clickers" or "zappers"). These
"remotes" typically use infrared light and special device
codes to transmit commands to particular entertainment
systems. Each remote/device pair usually uses a different
device code, which prevents signals from being crossed.
"Universal" remotes receive programming of multiple
device codes and provide a user with many different control
buttons, such that a single universal remote can often control
several entertainment systems in a house or other
environment, thereby replacing the need for at least some
remotes.
While useful for their intended purpose, however, these
modern remotes are not necessarily optimal. A remote may
become lost or damaged through frequent handling, or may
run out of battery power, which must be replenished from
time to time. Typically also, a user must first locate and
grasp a remote before it may be used and then aim it toward
the particular entertainment system to be controlled. Modern
entertainment systems also have complicated control menus,
which can require special buttons not found on the universal
remotes. Not infrequently, and despite availability of
universal remotes, a person may need three or more remotes for
complete control of multiple home entertainment systems,
particularly where devices such as cable boxes, laser disk
players, DVD players and home theater systems are also
involved. Even a relatively simple action, such as changing
the television station, may require a sequence of
interactions.
Finally, it should also be considered that the presence of
complicated menus and numerous remotes increases the
possibility of error and confusion, which can lead to user
dissatisfaction.
What is needed is a remote control that is easy to operate
under all circumstances. Ideally, such a remote control
should be user friendly, and "universal" to many different
systems, notwithstanding the presence of complicated
control menus. Also, such a remote control should withstand
frequent use, being relatively insensitive to the wear from
frequent handling that often affects handled remotes. The
present invention solves these needs and provides further,
related advantages.

SUMMARY OF THE INVENTION
The present invention provides a voice-operated remote
control. By permitting a system to understand a user's
spoken commands and reducing the requirement to
frequently handle a remote, the present invention provides a
remote control that is easy to use and should have
significantly longer life than conventional handheld remotes. At
the same time, by using spoken commands in place of
buttons, the present invention potentially reduces user
confusion and frustration that might result from having to search
for the proper remote, or navigate a menu in a darkened
entertainment room; a user "speaks," and a recognized
command results in the proper electronic command being
automatically effected. As can be seen, therefore, the present
invention provides still additional convenience in using
entertainment systems.
One form of the present invention provides a
voice-operated remote control having a sound detector (such as a
microphone) that detects sound. The remote also includes a
memory that stores commands to be transmitted to one or
more entertainment systems, a filtration module, a
recognition module, and a wireless transmitter. The microphone's
output is passed to the filtration module, which filters
background sound such as music to more clearly detect the
user's voice. The recognition module compares the user's
voice with spoken command data, which can also be stored
in the memory. If the spoken command is recognized, the
commands are retrieved from memory and transmitted to an
entertainment system.
In more particular features of the invention, the
commands can be transmitted to the entertainment system
through a transmitter, such as an infrared transmitter, just like
present-day remotes or "zappers," which also transmit in
infrared. In this manner, a voice-operated remote control can
be used to replace remotes that come with televisions (TVs)
and other entertainment systems, e.g., the voice-operated
remote control is used instead of a remote provided along
with the TV or other entertainment system. The
voice-operated remote can be made "universal" such that a user
can program the voice-operated remote control with infrared
commands and device codes for video tape recorders, DVD
players, TVs, stereo components, cable boxes, etc.
More particularly, the preferred voice-operated system is
embodied as two units, including a base unit and a remote
(or table-top) unit. The remote unit preferably uses little
power, and relays a microphone signal to the base unit that
represents user speech among other "noise." The base unit is
either in-line with electronic speaker signals, or is connected
to receive a copy of those signals (e.g., connected to a TV
to receive its audio output), and these signals are used to
generate an audio mimic signal (e.g., a music signal) which
is subtracted from the microphone output. The base unit
thereby produces a residual used to recognize the user's
spoken commands, notwithstanding the presence of a home
theater system, sub-woofer, and other types of electronic
speakers within a room. Upon detection of a spoken
command, infrared commands can then be transmitted to the
remote unit, which can have an infrared "repeater" for
relaying commands back to the appropriate entertainment
system or systems.
The invention may be better understood by referring to
the following detailed description, which should be read in
conjunction with the accompanying drawings. The detailed
description of a particular preferred embodiment, set out
below to enable one to build and use one particular
implementation of the invention, is not intended to limit the
enumerated claims, but to serve as a particular example
thereof.

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 shows a user, a preferred remote control, and
several home entertainment systems having electronic
speakers. The preferred remote control is seen in FIG. 1 to
include a remote unit 29 and a base unit 31.
FIG. 2 shows a basic block diagram of the preferred
remote unit from FIG. 1, and shows a microphone and radio
frequency (RF) transmitter, a keypad and an infrared (IR)
repeater.
`
`
`
FIG. 3 shows a basic block diagram of the preferred base
unit from FIG. 1, including several speaker inputs, an RF
receiver for receiving the microphone output from the
preferred remote unit, a filtration module (indicated by
phantom lines) for isolating user spoken commands, a
speech recognition unit, and an infrared transmitter and
receiver for issuing commands to entertainment systems;
receipt of infrared commands is used in a command learning
process while issued commands are preferably sent to (and
repeated by) the remote unit, such that they are directed
toward the appropriate entertainment system.
FIG. 4 is a three-part functional diagram showing in a left
column several basic modes of the preferred remote control
and in middle and right columns the functions performed by
each of the base unit (middle column) and remote unit (far
right column) while in these modes.
FIG. 5 is a perspective view of a remote unit, including a
microphone grille, keypad and window for the IR repeater,
all visible from the exterior of the remote unit; the remote
unit is preferably placed in front of a user with the
microphone grille facing the user, while the IR window is directed
towards one or more entertainment systems and the base
unit.
FIG. 6 is a detailed block diagram showing the circuitry
of the preferred remote unit of FIG. 2.
FIG. 7 is a detailed block diagram showing the circuitry
of the preferred base unit of FIG. 3.
FIG. 8 is a block diagram showing the process of learning
to recognize user voice commands.
FIGS. 9-10 are block diagrams of alternative processing,
where two or more microphones (illustrated in the remote
unit in FIG. 1) are used, to track and identify sound sources
based on relative position to the remote unit.
FIG. 9 is a block diagram showing use of multiple
microphones in a speaker learn mode.
FIG. 10 is a block diagram showing use of multiple
microphones in normal operation.

DETAILED DESCRIPTION

The invention summarized above and defined by the
enumerated claims may be better understood by referring to
the following detailed description, which should be read in
conjunction with the accompanying drawings. This detailed
description of a particular preferred embodiment, set out
below to enable one to build and use one particular
implementation of the invention, is not intended to limit the
enumerated claims, but to serve as a particular example
thereof. The particular example set out below is the
preferred specific implementation of a voice-operated remote
control having two distinct components, including a base
unit and a remote unit. The invention, however, may also be
applied to other types of systems.
I. The Principal Parts
In accordance with the principles of the present invention,
the preferred embodiment is a voice-operated remote control
that is split into two separate boxes or "units." Voice control
immediately raises the issue of noise cancellation, especially
in an environment in which sound at a high volume is a
wanted feature (such as is typically the case when viewing
entertainment). However, in the entertainment setting, the
"noise" is relatively well known, e.g., it is roughly the sound
produced by the speakers and reflected by a room's interior.
Therefore, one of these two units, the "remote unit" (or
"table-top unit") is preferably a small, battery-powered
device that is located near a user. The primary functions of
the preferred remote unit are to capture a good voice signal
from the user, and also to relay infrared (IR) commands to
one or more entertainment systems. [The preferred
embodiment may be applied to systems that use some other
communication besides IR, but since most entertainment
systems use IR remotes, IR communication is preferably used.]
The remote unit is preferably located close to the user,
usually on a sofa table. It contains a microphone,
amplification and filtering circuitry and a radio frequency (RF)
transmitter. It also has an IR receiver and transmitter,
collectively called the IR repeater.
The second of these boxes or units, the "base unit" (or
"rack unit") is preferably connected to all speaker outlets of
all amplifiers in the room or, more precisely, all speakers
which contribute to the "noise." This unit will most
conveniently be placed in a stereo rack or entertainment center,
and it contains noise cancellation circuitry, a signal
generator, an RF receiver, a speech recognition unit, a small
computer and an IR receiver/transmitter pair ("transceiver").
Because this circuitry requires significantly more power than
the remote unit, the base unit will preferably be a rectangular
box that plugs into a conventional electrical outlet.
Notably, while the preferred embodiment uses the remote
unit and base unit to respectively house circuitry for various
functions, this functional allocation and two-unit
arrangement are not required for implementation of the invention,
and the functionality described below may be rearranged
between these two units or even combined within a single
housing without significantly changing the basic operating
features described herein. For example, in an alternative
embodiment, all communication between the remote unit
and the base unit can occur by RF transmission, or by a
direct electrical connection.
FIG. 1 illustrates positioning of the preferred two-unit
arrangement in a hypothetical home. In particular, FIG. 1
shows an entertainment center 11 having several
entertainment systems, including a television (TV) 13, a
videocassette recorder (VCR) 15, a compact disk (CD) player 17 and
a stereo receiver 19. The entertainment center may have
many other common devices not seen in FIG. 1, such as a
digital versatile disk (DVD) player, a cassette tape player, an
equalizer, a laser disk player, a cable box, a satellite dish
control, and other similar devices. As with many such
entertainment systems, audio is produced, usually for stereo
or television, and FIG. 1 shows two speaker sets, including
a left channel speaker 21 and a right channel speaker 23, and
a pair of TV speakers 25. Many modern day entertainment
centers provide "home theater sound" and have all speakers
driven by one element, often the stereo receiver 19, to
produce five channels of audio output (not seen in FIG. 1)
including front and back sets of left and right audio channels
and a center channel. The entertainment center 11 may also
include a sub-woofer (not seen in FIG. 1). [Since most user
spoken commands can be detected and distinguished by
considering only the spectral range of 200 Hertz-4,000
Hertz, the base unit and remote unit filter both the sound
detected at the microphone and the electronic speaker signals to
consider this range only. Thus, sub-woofer driver signals
usually need not be processed electronically, and will
not be extensively discussed herein.]
While the preferred embodiment as further described
below accepts a home theater input (e.g., five channel
audio), FIG. 1 illustrates four speakers for the purpose of
providing an introduction to the principal parts.
A user 27 of the entertainment center may have a
multitude of remotes that have been pre-supplied with the various
entertainment systems 13-19, and the preferred embodiment
`
`
`
`is a voice-operated "universal" remote control that replaces
`all of these pre-supplied remotes. In particular, the preferred
`embodiment follows the two-unit format mentioned above
`and includes a remote unit 29 positioned near the user, and
`a base unit 31 positioned near or within the entertainment
`center 11. The remote unit is depicted as having an antenna
`33 (although the antenna will typically be within the remote
`unit, and not externally visible), at least one microphone
`(with two microphones 35 being illustrated in FIG. 1), and
`an infrared transmission window 36 through which the
`remote unit receives and sends infrared commands intended
`for the various entertainment systems 13-19. Importantly,
`only one microphone is used in the preferred embodiment,
`but an alternative embodiment discussed below which filters
`sound sources based on relative position to the remote unit
`might use at least two microphones.
`The various entertainment systems are all depicted as
`having cable connections 37 between one another, partly to
`enable provision of sound via electronic speaker cables 39
`and 41 to the left and right channel speakers 21 and 23. The
`base unit 31 is preferably positioned to intercept electronic
`speaker signals output by the stereo receiver 19, for a
`purpose that will be described below. In fact, it is desired for
`the base unit 31 to intercept all speaker signals produced by
`the home entertainment system and, to this effect, the audio
`output of the television in the hypothetical system illustrated
`is also coupled via a cable 43 to the base unit to provide a
`copy of the signals that drive the TV speakers 25. [In many
`home theater systems, the TV speakers will be muted, with
`all audio outputs being provided by the stereo receiver.]
Basic operation of the remote unit 29 is illustrated with
reference to FIG. 2, which illustrates microphone circuitry
45, an antenna 47, keypad circuitry 49 for entering mode
commands, audio mute and any desired numeric entries, and
an IR repeater 51. The IR repeater receives keypad entries,
which are transmitted via infrared to the base unit, and it also
echoes infrared commands intended for the home
entertainment systems, which are originally generated at the base unit
in the preferred embodiment.
FIG. 3 illustrates basic layout of the base unit 31, and
shows an antenna 53, an RF demodulator 55, a filtration
module 57, a speech recognition module 59, and an IR
transceiver 61. The filtration module 57 receives a
continuous radio transmission from the remote unit's microphone,
and it also receives a number of speaker inputs 63, each of
which is put through analog-to-digital (A/D) conversion and
transformed by application of a speaker-specific transfer
function; these functions are respectively designated by the
numerals 65 and 67 in FIG. 3. The filtration module 57 sums
these transformed speaker signals together via a summing
junction 72 to yield an audio mimic signal 69. This audio
mimic signal, in turn, is subtracted from information 71
representing sound received at the microphone (not seen in
FIG. 3) to thereby generate a residual 73. Because the audio
mimic signal represents TV and stereo sound at the summing
junction, the residual 73 will represent primarily speech of
the user.
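The filtration path just described can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the circuitry of the preferred embodiment: each speaker-specific transfer function is assumed to be available as a short time-domain impulse response, the signals are assumed to be equal-length lists of samples, and the function names (convolve, audio_mimic, residual) are hypothetical.

```python
from typing import List, Sequence


def convolve(signal: Sequence[float], impulse_response: Sequence[float]) -> List[float]:
    """Apply one speaker-specific transfer function in the time domain.

    Assumption: the transfer function is represented by a finite impulse
    response, so applying it is a causal convolution truncated to the
    length of the input signal.
    """
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(impulse_response):
            if n - k < 0:
                break  # no samples exist before the start of the signal
            acc += h * signal[n - k]
        out[n] = acc
    return out


def audio_mimic(speaker_signals: Sequence[Sequence[float]],
                impulse_responses: Sequence[Sequence[float]]) -> List[float]:
    """Sum the transformed speaker signals (the role of the summing junction)."""
    mimic = [0.0] * len(speaker_signals[0])
    for sig, h in zip(speaker_signals, impulse_responses):
        for n, value in enumerate(convolve(sig, h)):
            mimic[n] += value
    return mimic


def residual(mic: Sequence[float], mimic: Sequence[float]) -> List[float]:
    """Subtract the audio mimic signal from the microphone signal."""
    return [m - a for m, a in zip(mic, mimic)]
```

If the microphone signal is exactly the mimic signal plus speech, the residual equals the speech. In practice the match is approximate, which is why the description says the residual represents "primarily" the speech of the user.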
The residual 73 is input to the speech recognition module
59 which processes the residual to detect user speech, to
learn new user spoken commands, and to associate a
detection of a known spoken command with an IR command
intended for one or more of the entertainment systems
(which are seen in FIG. 1). As indicated by FIG. 3, these
commands are stored in an IR code selection table 75 for
selective transmission using the IR transceiver 61.
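One way to picture the IR code selection table is as a lookup from a recognized spoken command to the IR command or commands learned for it. The sketch below is hypothetical throughout: the class name, the use of a text label to stand in for the recognizer's output, and the device names and numeric codes are all assumptions for illustration, not values from this disclosure.

```python
from typing import Dict, List, Optional, Tuple

# (hypothetical device name, hypothetical IR code value)
IRCommand = Tuple[str, int]


class IRCodeSelectionTable:
    """Maps a recognized spoken command to its learned IR commands."""

    def __init__(self) -> None:
        self._table: Dict[str, List[IRCommand]] = {}

    @staticmethod
    def _key(spoken_command: str) -> str:
        # Normalize case and spacing so "TV on" and "tv  ON" share an entry.
        return " ".join(spoken_command.lower().split())

    def learn(self, spoken_command: str, ir_commands: List[IRCommand]) -> None:
        """Store the IR commands to associate with a spoken command."""
        self._table[self._key(spoken_command)] = list(ir_commands)

    def lookup(self, spoken_command: str) -> Optional[List[IRCommand]]:
        """Return the stored IR commands, or None if the command is unknown."""
        commands = self._table.get(self._key(spoken_command))
        return list(commands) if commands is not None else None
```

A single spoken command may map to several IR commands, which fits the statement that a detection can be associated with commands for one or more entertainment systems.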
Significantly, the remote unit 29 of FIG. 2, and the base
unit 31 of FIG. 3, do not process all generated audio, since
only user speech is of interest in the preferred embodiment.
Rather, a microphone filter (not seen in FIG. 2) removes
high and low audio frequencies, such that less information
has to be sent via RF to the base unit. Similarly, speaker
bandpass filters (not seen in FIG. 3) filter the speaker inputs
to the base unit, to similarly remove unneeded high and low
audio frequencies.
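The effect of such band limiting can be illustrated with a simple frequency-domain sketch. The 200 Hertz to 4,000 Hertz band is the range given earlier in this description; everything else here is an assumption for illustration, including the brick-wall behavior, the naive discrete Fourier transform and the function names. A practical unit would more likely use analog or recursive digital bandpass filters.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: Sequence[complex]) -> List[float]:
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def bandpass(x: Sequence[float], sample_rate: float,
             low_hz: float = 200.0, high_hz: float = 4000.0) -> List[float]:
    """Zero every frequency bin outside [low_hz, high_hz]."""
    n = len(x)
    spectrum = dft(x)
    for k in range(n):
        # Bins above n/2 are the mirrored negative frequencies.
        freq = min(k, n - k) * sample_rate / n
        if not (low_hz <= freq <= high_hz):
            spectrum[k] = 0.0
    return idft(spectrum)
```

For example, with a hypothetical 16 kHz sample rate, a 100 Hertz tone such as a sub-woofer might produce is removed while a 1,000 Hertz tone in the speech band passes through.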
With the principal hardware components of the preferred
embodiment thus introduced, the operation and
implementation of the preferred embodiment will now be described in
additional detail.
First, the preferred embodiment is designed to accept
speaker inputs from a 5.1 channel system, such as defined by
the 5.1 Dolby Digital Standard used by DVD recordings.
The "0.1 channel" is an effects channel, usually fed into a
sub-woofer and cut off sharply above circa 100 Hertz. Since
this range is below the audio range of interest (e.g., the audio
range for user command processing), this input is either
disregarded or passed-through by the base unit. Second, the
5.1 channel amplifier is preferably the only device
connected to any speaker, i.e., any built-in TV speakers are
always off. Thus, the base unit is preferably configured to
receive only five speaker outputs of the amplifier: left and
right front speakers; a center speaker; and left and right
surround speakers. For reasons explained above, the
sub-woofer is not monitored. The base unit preferably also
accepts a two-channel input from a conventional stereo
system, in case the user does not have a 5.1 channel system.
To function in normal operation, the preferred
embodiment must first be configured to learn spoken commands, to
learn IR commands that are to be associated with each
spoken command, and to learn speaker configuration within
a given room so as to accurately mimic audio (i.e., to
generate an accurate audio mimic signal). This configuration
and learning are triggered by pressing certain mode buttons
found on the remote unit, which causes the preferred remote
control to enter into configuration and learning modes,
respectively. FIG. 4 illustrates functions performed in these
modes vis-a-vis normal operation of the preferred remote
control.
A left hand column of FIG. 4 shows blocks 103, 105 and
107 for the basic operating modes of the preferred device,
including the speaker learning mode 103, the command
learning mode 105, and normal operation 107. The purpose
of the speaker learning mode is to set up a programmable
processing unit for each speaker channel inside the base unit,
which mimics the signal transformations by the speakers,
the circuitry of the remote unit and the base unit, the delay
by the air travel and the room acoustics such as echoes from
walls of the room. An exact reproduction of this chain
enables the base unit to separate the sound of the speakers
from any other sound, e.g., spoken commands, received by
the remote unit. The purpose of the command learning mode
is to enable the base unit to detect spoken commands and
associate them with infrared commands for sending to the
various entertainment systems.
Thus, the learning from the speaker learning mode 103 and
the command learning mode 105 is required for use of the
preferred remote control and, therefore, the preferred remote
automatically enters these modes for initial configuration
(represented by numeral 101) and when room acoustic
information and stored user spoken commands and IR
commands are otherwise not available. In addition, the
speaker learning mode 103 is preferably entered whenever
the room acoustics are changed permanently (e.g., new
furniture, changed speaker placement), and the user is
provided with a speaker learning mode button directly on the
housing of the remote unit to enable re-calibration of room
acoustics. As the need for re-calibration implies, the remote
unit is preferably left at a fixed position within a room during
regular operation. Optionally, the base unit may
automatically enter the speaker learning mode 103 and the command
learning mode 105 at periodic intervals, or in response to an
inability to process detected user spoken commands.
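The mode selection described above can be summarized as a small decision function. The sketch below is a simplified illustration, not the control logic of the preferred embodiment: the names Mode and next_mode are hypothetical, the enum values merely reuse the reference numerals of FIG. 4 for readability, and running speaker learning before command learning when both are missing is an assumption. The optional periodic re-entry is not modeled.

```python
from enum import Enum


class Mode(Enum):
    SPEAKER_LEARNING = 103
    COMMAND_LEARNING = 105
    NORMAL_OPERATION = 107


def next_mode(have_room_acoustics: bool,
              have_commands: bool,
              speaker_learn_pressed: bool = False) -> Mode:
    """Pick the operating mode from what has been learned so far.

    have_room_acoustics: speaker transfer functions are stored.
    have_commands: spoken commands and their IR commands are stored.
    speaker_learn_pressed: the user pressed the speaker learning button,
        e.g., after moving furniture or speakers.
    """
    if speaker_learn_pressed or not have_room_acoustics:
        return Mode.SPEAKER_LEARNING
    if not have_commands:
        return Mode.COMMAND_LEARNING
    return Mode.NORMAL_OPERATION
```

On first power-up neither kind of information is stored, so this sketch starts in the speaker learning mode, then moves to command learning, and reaches normal operation only once both are available.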
A middle column of FIG. 4 indicates functions of the base
unit in each of the three modes mentioned, via separate
dashed-line blocks 113, 115 and 117; these blocks
correspond to the speaker learning mode 103, the command
learning mode 105 and normal operation 107. Each of these
dashed-line blocks 113, 115 and 117 includes various
function blocks explaining operation of the base unit while in the
corresponding mode. For example, as indicated by the
top-most dashed-line block 113, during the speaker learning
mode, the base unit provides a test pattern to a tuner or
Dolby Digital 5.1 standard input, for purposes of testing
each speaker in succession. The base unit receives detected
sound from the microphone representing the speaker
currently being tested as well as an electronic speaker driver
signal from the stereo receiver and, using this information,
the base unit calculates a transfer function Hn(w) for each of
N speakers (n=1 to N) as they are individually tested. This
transfer function represents all of the room reflections and
delays that produce sound in response to each speaker. These
various functions of the base unit during these various
modes will be further discussed below.
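One common way to obtain a transfer function from a known driver signal and the measured microphone response is to divide their spectra bin by bin. The sketch below uses that approach purely as an illustration; the text above does not state how Hn(w) is actually computed. The function names, the naive discrete Fourier transform, the treatment of the signals as one period of a periodic test pattern, and the eps threshold are all assumptions.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[float]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def estimate_transfer_function(driver_signal: Sequence[float],
                               mic_signal: Sequence[float],
                               eps: float = 1e-12) -> List[complex]:
    """Estimate H(w) = Y(w) / X(w) for one speaker.

    driver_signal: the electronic speaker driver signal (the test pattern).
    mic_signal: the sound detected at the microphone for that speaker.
    Bins where the test pattern has no energy are set to zero, since
    nothing can be learned about the room at those frequencies.
    """
    spectrum_in = dft(driver_signal)
    spectrum_out = dft(mic_signal)
    return [y / x if abs(x) > eps else 0j
            for x, y in zip(spectrum_in, spectrum_out)]
```

As a check on the idea, if the test pattern is a single unit impulse and the microphone hears it two samples later at half amplitude, every bin of the estimate has magnitude 0.5 and a phase that grows linearly with frequency, which is the signature of a pure delay with attenuation.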
Finally, a third column of FIG. 4 also includes three
dashed-line blocks 123, 125 and 127, which show remote
unit operation during the speaker learning mode 103, the
command learning mode 105 and normal operation 107. For
example, during the speaker learning mode 103, the remote
`un