`P. S. Cohen and E. B. Sherwin, Jr.
`
IBM Technical Disclosure Bulletin
`
`Vol. 37 No. 10 October 1994
`
`Minimizing Power Consumption in Micro-Processor Based Systems which
`Utilize Speech Recognition Devices
`
Described are hardware implementations designed to minimize power consumption in
micro-processor based systems which utilize speech recognition devices. The implementation
provides multi-state power management and resource allocation techniques for micro-processor
based systems, such as portable or laptop systems. Five scenarios describe how the technique
can be used.
`
Typically, battery powered micro-processor systems can consume large amounts of power,
particularly when they are equipped with speech recognition capabilities. This is because
power is consumed while waiting for the user to say something, as well as during
word-spotting functions, where a control word activates an entire speech-recognition
subsystem. Active microphones, used in speech recognition systems, not only cause excessive
power consumption but, in some situations, also produce false recognitions that cause the
system to perform an incorrect action. If a micro-processor based system is unable to sense
the presence of the user, the microphone is operated needlessly and will inevitably cause
mis-recognition.
`
The concept described herein utilizes various electro-mechanical and optical sensors to
activate and de-activate microphones in speech recognition systems based on the presence and
appropriate orientation of a person. In this way, power consumption can be reduced. A
multi-state power management and resource allocation technique is used to distinguish between
three states.
`
`The three states are as follows:
`
1. Microphone off.
`
2. Microphone on and waiting for a control word, or control phrases, as in word-spotting
applications.
`
`3. A fully active speech-recognition system.
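The three states can be pictured as a small state machine. The following is only an illustrative software sketch of the hardware behavior described here; the event names (user_detected, control_word, user_left, session_ended) and the transitions back to lower-power states are assumptions made for illustration and are not defined in this disclosure.

```python
from enum import Enum


class PowerState(Enum):
    MIC_OFF = 1          # state 1: microphone off
    WORD_SPOTTING = 2    # state 2: microphone on, waiting for a control word
    FULLY_ACTIVE = 3     # state 3: fully active speech-recognition system


# Hypothetical event names; the disclosure does not define an event vocabulary.
TRANSITIONS = {
    (PowerState.MIC_OFF, "user_detected"): PowerState.WORD_SPOTTING,
    (PowerState.WORD_SPOTTING, "control_word"): PowerState.FULLY_ACTIVE,
    (PowerState.WORD_SPOTTING, "user_left"): PowerState.MIC_OFF,
    (PowerState.FULLY_ACTIVE, "session_ended"): PowerState.WORD_SPOTTING,
    (PowerState.FULLY_ACTIVE, "user_left"): PowerState.MIC_OFF,
}


def next_state(state, event):
    """Return the state that follows `event`; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, state=PowerState.MIC_OFF):
    """Apply a sequence of events and return the list of states visited."""
    visited = [state]
    for event in events:
        state = next_state(state, event)
        visited.append(state)
    return visited
```

In this sketch a control word spoken while the microphone is off has no effect, which mirrors the intent that the microphone is only opened once a user has been sensed.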
`
`The following five scenarios illustrate how the power saving concept can be used:
`
`Scenario #1 - User approaches a kiosk
`
• The microphone is off. The Direct Access Storage Device (DASD) is powered down. The
micro-processor and memory are in their lowest power state, capable of monitoring external
interrupts. The control program, speech-recognition sub-system, and initial power state
devices of the application are loaded and ready to be powered.
`
• The kiosk senses the approach of a user. This could be by way of pressure sensitive pads,
infrared detectors, motion detectors, push buttons, a touch-screen, or other electro-mechanical
means of detecting a large object. The microphone is opened, on-screen prompts are
displayed, the DASD is spun up, and the entire system is activated.
`
`
Scenario #2 - Word Spotting and Phrase Spotting
`
This scenario would be used in most office, client-server, and home-automation
applications. It consists of solutions with three or more states, where the presence of the
user wakes up, or partially wakes up, the computer but does not necessarily activate the
entire system, because the user is frequently present and not talking to the computer.
Scenario #2 might be a user entering an office or walking by an intercom or speaker phone.
`
The following illustrates scenario #2:
`
• An infrared detector, motion detector, or other electro-mechanical or optical means of
detecting the presence of a human being or object in motion is activated.

• The microphone is turned on.

• The system ensures that the control program and the word spotting, or phrase spotting,
programs are loaded. The DASD is spun up only on an exception basis, for example, when the
system has been completely powered down over a weekend or during a vacation.
`
• The system then word spots, or phrase spots, until an attention word, such as "computer" or
a "wake-up" type of command, is spoken.
`
• Typically, word spotting and phrase spotting algorithms can be implemented on
micro-processors with little memory, thus avoiding powering up the complete system or server.
`
• Once the word spotting or phrase spotting front-end detects the control sequence, the system
is powered up. At this point, the speech recognition system may engage the user with displayed
prompts, or may re-validate the word spotting sequence. It may determine, for example, that
a phone ring, or other background noise, was responsible for the interrupt. This duplicate, or
repeat, analysis of the acoustics may be useful because the primary speech recognition system
may have much more involved word spotting and phrase spotting abilities than an always-active,
or initially activated, speech recognition front end.
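The two-stage check in the last step can be sketched as follows. This is a minimal illustration under strong assumptions: input is represented as already-separated text tokens rather than acoustic data, and both functions are hypothetical stand-ins (a deliberately loose prefix match for the low-power front end, an exact match for the primary recognizer). It is not the spotting algorithm of this disclosure.

```python
# Attention words taken from the examples in the text.
ATTENTION_WORDS = {"computer", "wake-up"}


def front_end_spot(tokens):
    """Cheap first pass: hypothetical stand-in for the low-power word spotter.

    Matches on the first four characters only, so it can raise false alarms,
    as a small always-active front end might."""
    prefixes = {word[:4] for word in ATTENTION_WORDS}
    return any(token.lower()[:4] in prefixes for token in tokens)


def full_system_validate(tokens):
    """Second pass: hypothetical stand-in for the primary recognizer
    re-checking the same input with an exact match."""
    return any(token.lower() in ATTENTION_WORDS for token in tokens)


def handle_input(tokens):
    """Decide what the system does with one stretch of input."""
    if not front_end_spot(tokens):
        return "stay_low_power"        # complete system is never powered up
    # The front end fired, so the complete system powers up and re-validates.
    if full_system_validate(tokens):
        return "engage_user"
    return "return_to_low_power"       # false alarm, e.g. background noise
```

With these stand-ins, handle_input(["compare", "these"]) passes the loose front end but fails re-validation, so the system returns to its low-power state instead of engaging the user.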
`
`Scenario #3 - Buffered Front-Ends and Mimicked Phrase Spotting
`
In this scenario, the microphone is turned on as before by electrical, mechanical, or optical
sensors, or the microphone may always be on, simply waiting for noise above a certain threshold
level, or noise at a certain level above the background acoustics. The user proceeds, as in prior
scenarios, either to issue a command phrase or to start into the dialogue, depending upon the
screen prompts. In this instance, the user is not actually talking to the speech recognition
sub-system, which may still be in an Off mode. The input acoustics are saved, buffered, or
compressed at the user station or at the micro-processor while the speech recognition system is
powered up. To the user, it may appear that the microphone in the system has been activated
instantaneously. The buffering technique allows many microphones to be attached to one server.
The technique is also useful in other devices, such as portable phones or cellular phones, where
minimizing power consumption is critical. For example, the portable or cellular phone might
turn on the microphone when motion has been detected in the last five seconds. Once sound is
detected over a threshold or background level, the speech system is activated and the buffered
acoustic data is sent.
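The threshold trigger and buffering described in this scenario might look like the sketch below. All names and parameter values are hypothetical: the trigger margin, the smoothed background estimate, and the short pre-roll of frames kept from before the trigger are assumptions for illustration, and audio frames are treated as opaque objects paired with a precomputed level.

```python
from collections import deque


class BufferedFrontEnd:
    """Holds audio frames while the speech recognition system powers up."""

    def __init__(self, margin=2.0, preroll_frames=3, alpha=0.1):
        self.margin = margin          # trigger when level > margin * background
        self.alpha = alpha            # smoothing factor for the background estimate
        self.background = None        # running estimate of the background level
        self.preroll = deque(maxlen=preroll_frames)  # last quiet frames (assumption)
        self.buffer = []              # frames waiting for the recognizer
        self.triggered = False

    def push(self, frame, level):
        """Feed one frame and its level; return True once the trigger has fired."""
        if self.triggered:
            self.buffer.append(frame)
            return True
        if self.background is None:
            self.background = level
        if level > self.margin * self.background:
            # Sound above the background level: start buffering for the recognizer.
            self.triggered = True
            self.buffer = list(self.preroll) + [frame]
            return True
        # Quiet frame: update the background estimate and remember the frame.
        self.background = (1 - self.alpha) * self.background + self.alpha * level
        self.preroll.append(frame)
        return False

    def drain(self):
        """Return the buffered frames, to be sent once the recognizer is ready."""
        frames, self.buffer = self.buffer, []
        return frames
```

The caller would start powering up the speech system when push first returns True and call drain once it is ready, so that no speech from the start of the utterance is lost.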
`
`
`Scenario #4 - Handset Activation
`
In situations where the speech recognition sub-system is driven by a dedicated telephone
handset, the system must maintain an alert status capable of buffering speech input as soon as the
phone handset is picked up and be capable of fully activating the speech recognition sub-system
in one to three seconds. Therefore, the sub-system can be powered down and the processor can
be off, or in a low power usage state, until the telephone handset is picked up. Similarly, in
situations where the speech recognition sub-system is activated by a specific keystroke sequence
on the telephone, or handset, the sub-system does not need to be fully activated until those
keystrokes are entered.
`
`Scenario #5 - Buffered Client Server and Dial Environments
`
Certain situations arise where the speech recognition sub-system is infrequently used. Other
situations might arise in a network of kiosks or roadside call boxes. In these situations, the
client, or microphone, must be capable of prompting the user and buffering the first response
while the session is established with the speech recognition server.
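A client of this kind might be sketched as follows. The class, its method names, the prompt text, and the send callable are all hypothetical; the sketch only illustrates holding the first response locally until the session with the server exists, and then forwarding it in order.

```python
class ClientStation:
    """Buffers the user's first response while the server session is set up."""

    def __init__(self, send):
        self.send = send        # hypothetical callable that delivers audio to the server
        self.connected = False
        self.pending = []       # audio chunks captured before the session exists

    def prompt(self):
        """Prompt the user at once, without waiting for the server."""
        return "Please state your request."   # hypothetical prompt text

    def on_audio(self, chunk):
        """Forward audio if the session is up; otherwise hold it locally."""
        if self.connected:
            self.send(chunk)
        else:
            self.pending.append(chunk)

    def on_session_established(self):
        """Flush the buffered response in its original order, then go live."""
        self.connected = True
        for chunk in self.pending:
            self.send(chunk)
        self.pending.clear()
```

Passing a list's append method as send is enough to try the sketch: chunks given to on_audio before on_session_established is called arrive at the server side only after the session is up, followed by any later chunks.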
`
`
`