[Front-page drawing. Recoverable labels: TV 100, communication channel 101, loudspeakers 102, amplifiers 106, acquisition module 107, DSP 108.]
`
`Amazon Ex. 1005
`IPR Petition - US RE47,049
`Amazon Ex. 1005, Page 1 of 23
`
`(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
`
`(19) World Intellectual Property Organization
`International Bureau
`
`(43) International Publication Date
`10 April 2008 (10.04.2008)
`
`PCT
`
`(51) International Patent Classification:
`
`Not classified
`
`(21) International Application Number:
`PCT/RS2007/000017
`
`(22) International Filing Date:
`19 September 2007 (19.09.2007)
`
(25) Filing Language: English

(26) Publication Language: English
`
`(10) International Publication Number
`WO 2008/041878 A2
`(81) Designated States (unless otherwise indicated, for every
`kind of national protection available): AE, AG, AL, AM,
`AT, AU, AZ, BA, BB, BG, BH, BR, BW, BY, BZ, CA, CH,
`CN, CO, CR, CU, CZ, DE, DK, DM, DO, DZ, EC, EE, EG,
`ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL,
`IN, IS, JP, KE, KG, KM, KN, KP, KR, KZ, LA, LC, LK,
`LR, LS, LT, LU, LY, MA, MD, ME, MG, MK, MN, MW,
`MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PG, PH, PL,
`PT, RO, RS, RU, SC, SD, SE, SG, SK, SL, SM, SV, SY,
`TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA,
`ZM, ZW
`
(30) Priority Data:
P-2006/0551    4 October 2006 (04.10.2006)    RS
`
(84) Designated States (unless otherwise indicated, for every
kind of regional protection available): ARIPO (BW, GH,
GM, KE, LS, MW, MZ, NA, SD, SL, SZ, TZ, UG, ZM,
ZW), Eurasian (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM),
European (AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI,
FR, GB, GR, HU, IE, IS, IT, LT, LU, LV, MC, MT, NL, PL,
PT, RO, SE, SI, SK, TR), OAPI (BF, BJ, CF, CG, CI, CM,
GA, GN, GQ, GW, ML, MR, NE, SN, TD, TG).

(72) Inventors; and
(75) Inventors/Applicants (for US only): SARIC, Zoran
[RS/RS]; Vukasovica 65/7, 11000 Novi Beograd (RS).
JOVICIC, Slobodan [RS/RS]; Visnjicki venac 67,
11000 Beograd (RS). KOVACEVIC, Vladimir [RS/RS];
Radnicka 35A, 21000 Novi Sad (RS). TESLIC, Nikola
[RS/RS]; Bul. Cara Lazara 29, 21000 Novi Sad (RS).
KUKOLJ, Dragan [RS/RS]; Narodnog fronta 31, 21000
Novi Sad (RS).

(71) Applicant: MICRONAS NIT [RS/RS]; Fruskogorska
11a, 21000 Novi Sad (RS).

Declaration under Rule 4.17:
- of inventorship (Rule 4.17(iv))

Published:
- without international search report and to be republished
upon receipt of that report
`
`(54) Title: SYSTEM AND PROCEDURE OF FREE SPEECH COMMUNICATION USING A MICROPHONE ARRAY
`
(57) Abstract: The invention relates to a system and procedure for hands-free voice communication in
video-phone or teleconference applications using a microphone array. Its main purpose is to make a
quality recording of a speaker in a room at larger distances, in the presence of noise, acoustic echo
produced by the far-end speaker and the TV program, room reverberation, and movement of the speaker
in the room. The system contains: a digital TV receiver and a digital camera for picture reproduction
and capture, respectively; stereo loudspeakers and a microphone array for sound reproduction and
recording, respectively; an amplifier and acquisition module for audio signals; and a DSP for acoustic
signal processing. The microphone signals are processed in the frequency domain, and the procedure
contains: suppression of the acoustic echo of two signals (the far-end speaker signal and the stereo
TV signal); acoustic spatial filtering of the near-end speaker against noise sources and room
reverberation, based on the adaptive directivity characteristic of the microphone array; localization
of the speaker in the horizontal plane; suppression of all residual noise; and adaptive gain control
of the transmitted signal.
`
`SYSTEM AND PROCEDURE OF FREE SPEECH COMMUNICATION USING A
`MICROPHONE ARRAY
`
`Technical Field
`
The invention belongs to the field of acoustic signal processing, more precisely to methods of
acoustic echo cancellation, localization and selection of an active speaker in the presence of
reverberation in the acoustic environment, and noise suppression by means of a microphone array.
`
`Background Art
`
Hands-free full-duplex speech communication systems are used in many existing applications, such
as video-phone systems, teleconference systems, room and car hands-free systems, and voice-based
human-machine interfaces.

Use of a hands-free speech communication system implies an unspecified talker position in the
acoustic environment, at variable distances from the system's microphones and loudspeakers.
Hands-free speech communication under such unknown conditions gives rise to a number of technical
problems that must be solved in order to preserve good speech communication quality.

The basic problem is acoustic echo, generated by partial transmission of acoustic energy from the
loudspeaker to the microphone, so that the far-end speaker hears his own voice as a disturbance.
Conventionally, echo cancellation is done by an adaptive filter that estimates the transfer
function of the acoustic echo path between loudspeaker and microphone, so that its output is
approximately the same as the acoustic echo signal. Subtracting these two signals cancels the
acoustic echo. However, echo cancellation cannot be perfect because of system non-linearity and
the non-stationarity of the acoustic environment. The result is a residual echo signal. The basic
requirement remains that the recorded near-end speech signal should not be degraded by the echo
suppression process.

In the acoustic environment, acoustic disturbances of different nature and origin may appear.
These disturbances can be stationary or non-stationary (for example, computer noise or car noise),
and they come from many different sources located at different positions in the room or space
where the speaker is.
`
Besides that, in closed rooms (such as work rooms, halls and automobile cabins) reverberation
appears as a consequence of multiple reflections of acoustic waves from walls and obstacles.
Since the acoustic environment contains sources of disturbance besides the speaker, the desired
signal (coming from the speaker) must be separated from the disturbances in order to make its
recording possible. Conventionally, this problem may be solved by using a microphone array having
a number of microphones arranged in a line at a minimum inter-distance. With appropriate
processing of the microphone array signals, direction-dependent sensitivity of the microphone
system may be achieved. Such microphone systems have a narrow directivity characteristic,
sufficient to record only the actual speaker in the acoustic environment, while the signals of
noise sources at other locations are suppressed, thereby providing a higher signal-to-disturbance
ratio. The gain depends on: the directivity of the microphone array (width of the main lobe),
side-lobe size, separability of speech sources and noise sources (sources that are too close are
difficult to separate), reverberation time, non-stationarity of the acoustic sources, etc.
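
As an illustration of direction-dependent sensitivity, the following minimal sketch computes the
relative gain of a delay-and-sum beamformer for a uniform linear array. It is not the processing
described in this document; the function names, the 5 cm spacing, the 2 kHz test frequency and the
speed of sound of 343 m/s are assumptions chosen only for the example.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_vector(num_mics, spacing_m, freq_hz, azimuth_deg):
    """Phase terms of a plane wave arriving at a uniform linear array."""
    delay_step = spacing_m * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND
    return [cmath.exp(-2j * math.pi * freq_hz * m * delay_step)
            for m in range(num_mics)]


def delay_and_sum_response(num_mics, spacing_m, freq_hz, look_deg, source_deg):
    """Relative gain (0..1) for a source at source_deg when steered to look_deg."""
    weights = steering_vector(num_mics, spacing_m, freq_hz, look_deg)
    arriving = steering_vector(num_mics, spacing_m, freq_hz, source_deg)
    # Conjugate weights undo the look-direction delays before summing.
    total = sum(w.conjugate() * a for w, a in zip(weights, arriving))
    return abs(total) / num_mics


if __name__ == "__main__":
    # Hypothetical 5-microphone array steered to +30 degrees.
    for angle in (-60, -30, 0, 30, 60):
        gain = delay_and_sum_response(5, 0.05, 2000.0, 30.0, angle)
        print(f"source at {angle:+4d} deg -> relative gain {gain:.3f}")
```

The gain is 1 in the look direction and smaller elsewhere, which is the narrow main lobe and the
side lobes referred to in the text.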
`
Determining the speaker direction in the acoustic environment and steering the directivity of the
microphone array toward it is an important problem in hands-free communication systems.
Procedures for determining the speaker direction are very sensitive to disturbances present in
the environment, especially to a non-stationary speaker (one who moves within the environment)
and to several speakers talking simultaneously (cocktail party effect). Determining the direction
of the actual speaker relative to the microphone array in the horizontal plane (the azimuth) is a
very important step in video-phone and teleconferencing systems, because the speaker coordinates
are needed to control the movable camera in the system.

During speech recording in an acoustic environment, the problem of additive stationary or
non-stationary noise always appears, as does residual noise from the processing of acoustic
signals. These degrade the quality of the recorded speech signal. If they are intense enough, they
may even reduce the intelligibility of the speech. There are many algorithms for noise reduction
(NR), optimized for specific noise types. The common requirement for all of them is to improve the
signal-to-noise ratio while avoiding distortion of the speech signal and reduction of its
intelligibility.

Variable ambient conditions and variable distance between the speaker and the microphone array
require automatic gain control (AGC), which keeps the speaker's voice level constant and more
comfortable for the receiver at the far end of the communication channel. Automatic gain control
in full-duplex systems requires additional information from the near-end speech activity detector,
the far-end speech activity detector and the acoustic echo canceller.

The technical problems mentioned above, which arise in a hands-free communication system for
full-duplex speech transmission and its use in video-phone and/or teleconference systems, are very
complex. They demand an integral and optimal solution approach that takes into account real-time
operation on a commercial digital signal processor (DSP) platform.
`
Quality speech recording in the presence of acoustic noise and room reverberation is a complex
problem. When the spectrum of the useful speech signal overlaps with the spectrum of the noise
present, single-channel processing cannot significantly improve speech signal quality. With the
development of digital signal processing and the availability of sufficiently powerful DSPs, the
way is open for multi-microphone processing of acoustic signals. The benefit of a microphone array
over single-channel processing is its ability to adapt its spatial reception characteristic
(directivity characteristic) to the momentary arrangement of the chosen speaker and the noise
sources in the room. In that way, maximum suppression of the noise present is achieved while the
speaker is emphasized. The main problems in using microphone arrays are (M.S. Brandstein, D.B. Ward
(Eds.), Microphone Arrays: Signal Processing Techniques and Applications, Springer, Berlin 2001;
Y. Huang, J. Benesty, Audio signal processing for next generation multimedia communication
systems, Kluwer Academic Publ., 2004): determining the exact location of the chosen speaker,
determining the exact number and positions of the noise sources present in the room, multiple
reflections of the useful source and of the noise from the room walls, and non-stationarity of the
acoustic noise sources and of the chosen speaker.
`
When the microphone array is used in video-phone or teleconference systems in full-duplex
operation, the number of possible problems grows. The biggest problem is the presence of acoustic
echo, followed by the need for automatic gain control (AGC) of the transmitting part of the system,
as well as the possible presence of system instability, called microphony. An additional problem,
addressed in this patent, is the presence of the TV program signal, which appears as an additional
acoustic echo at the input of the microphone array.

The large number of problems mentioned has produced many different kinds of solutions, which have
been patented and which solve some of the problems individually or a few of them integrally. For
example: U.S. published patent application 2006/0153360 A1, filed September 2nd 2005, entitled
"Speech signal processing with combined noise reduction and echo compensation", gives an integral
solution for echo reduction and noise reduction; U.S. patent 7,035,415 B2, filed May 15th 2001,
entitled "Method and device for acoustic echo cancellation combined with adaptive beamforming",
gives an integral solution for echo reduction and for forming a directional microphone array
characteristic; EP published patent application 1 633 121 A1, filed September 3rd 2004, entitled
"Speech signal processing with combined adaptive noise reduction and adaptive echo compensation",
gives an integral solution for residual echo reduction and noise reduction; EP published patent
application 1 571 875 A2, filed February 23rd 2005, entitled "A system and method for beamforming
using a microphone array", gives a solution only for forming a directional microphone array
characteristic; EP published patent application 1 581 026 A1, filed March 17th 2004, entitled
"Method for detecting and reducing noise from a microphone array", gives a solution only for noise
reduction in a microphone array; and EP published patent application 1 286 175 A2, filed August 1st
2002, entitled "Robust talker localization in reverberant environment", gives a solution only for
talker localization in a reverberant room.

The integral solution of all the problems mentioned, realized in this patent, combines the positive
characteristics of the individual signal processing solutions. The problems are solved integrally
in the frequency domain, which optimizes the use of computing resources, gives a real-time solution
and secures the quality of hands-free speech communication in video-phone and/or teleconference
systems.
`
`Disclosure of the Invention
`
The subject of this patent is a hands-free speech communication system for video-phone or
teleconference applications, which uses a microphone array and complex acoustic signal processing
to secure better quality and clarity of the speech signal in a complex acoustic environment, and
in which many of the previously mentioned shortcomings are eliminated, separately or integrally.
`
The system that is the subject of this patent transmits speech and uses digital television as the
transmission medium. A microphone array and loudspeakers, which are integral components of the TV
receiver, are used for recording and reproduction of the speech signal, respectively. In video-phone
or teleconference applications, a digital camera and the digital TV receiver are used for picture
capture and reproduction, respectively.

The essence of the invention is the specific processing of the speech signal recorded in the
acoustic environment of the room where the speaker and the system are present. To record a speaker
in the room who stands at some distance (a few meters) from the TV receiver, the system uses a
microphone array of N microphones. The microphone array records all signals present in the room:
the useful signal, as the direct wave travelling from the talker to the microphones, and various
noise signals. The noise signals are: acoustic echo, as the direct wave from the loudspeaker
emitting the interlocutor's voice from the far end of the communication channel; acoustic echo, as
the direct sound wave of the stereo TV program; direct waves from one or more noise sources or
other sources audible in the room; and reflected waves (room echo) of all these sources, including
the speaker, which together make up the room reverberation. It should be emphasized that noise
sources in the room can be stationary or, as is frequently the case, non-stationary, both in their
characteristics and in their location in the room (mobile sound sources).

Different kinds of noise require different techniques for their elimination, and the essence of
this invention is an optimally designed algorithm that eliminates all noise as far as possible and
secures the best quality of the speech signal transmitted to the interlocutor at the far end of the
communication channel.

The microphone signals from the microphone array are processed in digital form in the DSP,
entirely in the frequency domain. This domain offers certain advantages in processing speed and
number of computing operations, which is very important for real-time operation of the DSP. For
acoustic echo cancellation, all loudspeaker signals must also be fed into the DSP.

The DSP runs several complex algorithms: an acoustic echo cancellation (AEC) algorithm; a
microphone array signal processing algorithm for adaptive beamforming (ABF) of its directivity
characteristic; an algorithm for estimating the direction of arrival (DOA) of the useful signal,
that is, for localizing the speaker in the room; an algorithm for reduction of stationary noise,
non-stationary noise and residual echo (NR, noise reduction); and an algorithm for automatic gain
control (AGC) of the system, which compensates for different distances of the speaker from the
microphone array. Besides these basic algorithms, the DSP runs some additional ones, such as: a
voice activity detector (VAD) at the near end, a VAD at the far end, a double talk detector (DTD)
on both sides, additional post-filtering (PF) for noise reduction, etc. The aim of these algorithms
is maximum reduction of all noise present with minimum degradation of the speech signal, thereby
securing maximum quality of the transmitted speech signal.
`
A specific aspect of the invention is adaptive acoustic echo cancellation using an adaptive filter
that models the transfer characteristic of the acoustic path from loudspeaker to microphone. The
transfer characteristic is complex, since it covers the transmission paths from 2 (stereo)
loudspeakers to the N microphones in the microphone array, and each microphone signal is filtered
by its own adaptive filter. The operation of the adaptive filters is controlled by the speech
activity detectors on both sides.
`
The next specific aspect of the invention is the adaptive directivity characteristic of the
microphone array, which secures spatial filtering and directional separation in the room with the
speaker, where the useful signal is boosted to the maximum relative to the other, interfering
signals. The directivity characteristic of the microphone array is accomplished by adaptive
weighting and summing of the microphone signals, which secures stability of the directivity index
in the frequency domain in a reverberant acoustic environment.

Determining the direction of arrival of the speaker's direct acoustic wave is the next specific
aspect of the invention. This function of the hands-free speech communication system is necessary
for controlling the directivity characteristic of the microphone array in azimuth; it can also be
used for controlling and guiding the video camera. It uses the microphone signals after acoustic
echo cancellation. The direction of arrival of the speaker's direct acoustic wave is estimated
from the cross-correlation of the microphone signals and its phase transform. This function is
directly controlled by the speech activity detector.
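
Cross-correlation with a phase transform is commonly known as GCC-PHAT. The sketch below is one
possible minimal form for a single microphone pair, written with a naive DFT so that it needs only
the standard library; it is an illustration under stated assumptions, not the estimator used in
this system. The function names, the far-field conversion to azimuth, the sign convention and the
speed of sound are assumptions.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def dft(samples):
    """Naive discrete Fourier transform (adequate for short example frames)."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def gcc_phat_delay(sig_a, sig_b, max_lag):
    """Return the lag in samples by which sig_b trails sig_a."""
    spec_a, spec_b = dft(sig_a), dft(sig_b)
    cross = [b * a.conjugate() for a, b in zip(spec_a, spec_b)]
    # Phase transform: discard the magnitude, keep only the phase.
    whitened = [c / max(abs(c), 1e-12) for c in cross]
    corr = [value.real for value in idft(whitened)]
    n = len(corr)
    # Negative lags wrap around to the end of the circular correlation.
    return max(range(-max_lag, max_lag + 1), key=lambda lag: corr[lag % n])


def delay_to_azimuth_deg(lag_samples, sample_rate_hz, mic_distance_m):
    """Far-field azimuth for one microphone pair (sign convention is an assumption)."""
    sine = SPEED_OF_SOUND * lag_samples / sample_rate_hz / mic_distance_m
    return math.degrees(math.asin(max(-1.0, min(1.0, sine))))
```

In a complete system the estimate would be updated only while the speech activity detector reports
near-end speech, as the text requires.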
`
A further specific aspect of the invention is the process of adaptive suppression of stationary
and non-stationary noise. The process is based on non-linear noise estimation and compression,
divided into several sub-bands. Two noise estimates are used, securing an optimal suppression
result with respect to the speech signal characteristics. This is done for safety, in the sense
that the adaptive noise reduction process should not degrade the quality of the speech signal. The
filtering process is completed by an adaptive Wiener post-filter.
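
The Wiener post-filter step can be illustrated with the textbook per-bin Wiener gain. The sketch
below is a generic form under stated assumptions and does not reproduce the two noise estimators or
the sub-band compressor of this system; the function name and the gain floor of 0.1 (used here to
limit speech distortion) are hypothetical.

```python
def wiener_gain(noisy_power, noise_power, gain_floor=0.1):
    """Per-bin Wiener gain from noisy-signal and noise power estimates."""
    gains = []
    for p_noisy, p_noise in zip(noisy_power, noise_power):
        if p_noisy <= 0.0:
            gains.append(gain_floor)
            continue
        # Estimated speech power (clipped at zero) divided by the noisy power.
        gain = max(p_noisy - p_noise, 0.0) / p_noisy
        # The floor keeps the filter from removing a bin completely.
        gains.append(max(gain, gain_floor))
    return gains
```

Bins dominated by speech pass almost unchanged, while bins dominated by noise are attenuated down
to the floor.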
`
A specific aspect of the invention is automatic gain control of the speech signal before
transmission to the far-end interlocutor. This feature is an important connecting element of the
hands-free speech communication system. The system compensates for differences in speech signal
intensity that arise, on the one hand, from individual speech characteristics and, on the other
hand, from the speaker's position, nearer to or farther from the microphone array. The solution
distinguishes speaker activity from pauses in the useful signal, residual echo, acoustic noise or
the far-end speech signal, for which it uses further information previously detected in the
system. The analysis of possible scenarios has to be reliable; otherwise the useful speech signal
may be attenuated, which is a negative effect.

A particular feature of this invention is the improvement of each of the aspects mentioned, as
well as improvement in the integration of all algorithms into one unit whose operation is stable
and of high quality. The algorithm procedures are optimized by sharing resources.

These and other aspects, features and benefits of the invention will become more evident after
review of the detailed description of the invention, the patent claims and the accompanying
figures.
`
`Brief Description of the Drawings
`
Figure 1 - shows the elements of the hands-free video-phone communication system using a
microphone array and digital television.

Figure 2 - shows the ambient conditions in which the hands-free video-phone communication system
using a microphone array is applied.

Figure 3 - shows a block diagram of the audio signal processing subsystem within the hands-free
video-phone communication system; it contains a microphone array with adaptive directivity
characteristic (SD-BF), a block for indoor speaker localization (DOA), an echo cancellation block
(AEC), a noise reduction block (NR) and an automatic gain control block (AGC).

Figure 4 - shows the block diagram of acoustic echo cancellation (AEC).

Figure 5 - shows the block diagram of adaptive determination of the near-end speaker direction
in the horizontal plane (DOA azimuth).

Figure 6 - shows the block diagram of spatial filtering (SD-BF).

Figure 7 - shows the block diagram of noise reduction (NR).

Figure 8 - shows the block diagram of automatic gain control (AGC).
`
Best Mode for Carrying Out the Invention

This invention presents a system and method of acoustic signal processing for hands-free speech
communication using a microphone array.

Figure 1 shows the elements of the hands-free video-phone communication system using a microphone
array and digital television. The digital television 100, which the user otherwise uses for
ordinary TV watching, serves in the hands-free video-phone communication system as a video terminal
and as an audio terminal for communication with another speaker. Namely, when a call arrives over
the communication channel 101 and a connection with another speaker is made, the TV 100 is used as
a multimedia interface, where the speaker listens to the far-end interlocutor over the loudspeakers
102 and watches him on one part 105 of the screen of the TV 100. At the same time, at the other end
of the communication channel (far-end side), the speaker at a similar TV receiver also sees the
interlocutor located at the near-end side, by means of the camera 104 and the microphone array 103.
The camera 104 is movable and is controlled by coordinates obtained by processing the microphone
signals from the microphone array 103.
`
Analog signals from the microphones in the microphone array 103 are amplified by the amplifier 106
and, together with the stereo signals of the loudspeakers 102, are fed to the acquisition module
107, which digitizes them and sends them to the DSP 108 for further processing. The speech signal
of the near-end speaker processed in the DSP is sent over the communication channel 101 to the
speaker at the far end. Acoustic signal processing in the DSP 108 yields the spatial coordinates of
the speaker's location in the room with the hands-free communication system. With them, the DSP 108
controls the steering of the camera 104, which is directed at the active speaker. In that way,
hands-free audio and video communication between two speakers by means of a digital television
system is fully assured.
`
Figure 2 schematically shows the ambient conditions of hands-free video-phone communication using
a microphone array; it shows only the part of the system related to acoustic signal processing.
The room 201 contains the hands-free video-phone communication system, the speaker 202 and a noise
source 203, which is a normal feature of every acoustic environment. Over the loudspeakers 102 of
the stereo audio system of the digital television, the speaker 202 listens to the incoming speech
signal of the far-end interlocutor 204, mostly as a mono signal. The microphone array (made of N
microphones) records the sound in the room 201. After complex microphone signal processing in the
block 207, the speech signal of the speaker 202 is transmitted by the block 208 to the far-end
speaker as a mono signal.

The ambient conditions in the room 201 during speech communication are very complex. In the case
of hands-free video-phone communication in the room 201, three sound sources are present: the
stereo loudspeakers 102, which emit the far-end speaker's voice and the TV program, the speaker
202, and at least one noise source 203. The room may contain more noise sources: computer noise,
air-conditioning noise, street noise, neighbors' noise, building vibrations, another speaker or
even several speakers, music, etc.

Therefore, the acoustic picture of the room is very complex. The microphone array 103, as a sensor
system, records all sounds in the room: all direct sound waves from each sound source and, at the
same time, all sound reflections. For example, from the loudspeaker 102 one direct wave 209 arrives
at the microphone array 103, followed by many reflected waves, of which only one wave 210 is shown
in Figure 2; the speaker 202 sends a direct wave 211 and, among other waves, two reflected waves
212a and 212b; the noise source 203 sends one direct wave 213 and, among other waves, one reflected
wave 214.

Of all the sounds recorded by the microphone array, only the direct wave 211 from the speaker 202
is useful; all other waves are treated as disturbances. The biggest disturbance is the acoustic
echo 209, which comes from the loudspeaker 102. All the reflections together produce the room
reverberation. The task of the audio signal processing block 207 is to cancel the acoustic echo
signal, to select the useful signal 211 from the other signals, to suppress the reverberation
signals, and to suppress the direct signals of the noise sources, of which there can be more than
one. A special task of the block 207 is to follow the acoustic scene in the room and its
non-stationarity, which depends on the mobility or position of the speaker and on whether the noise
sources are mobile or changeable. These aspects of the invention are described in detail in the
following text.
`
Figure 3 shows a schematic diagram of the complete audio signal processing procedure in the
hands-free video-phone communication system using a microphone array. All microphone signals 103,
from M1 to M5, as well as the stereo loudspeaker signals 102, Sp-L and Sp-R, are digitized in the
acquisition block 107 (Figure 1) and converted into the frequency domain using a fast Fourier
transform (FFT) 301, giving the signals x1 to x7. It should be emphasized that the microphone array
in this patent contains 5 microphones, but additional microphones can be installed if the
application requires them. The block 302 suppresses the acoustic echo in all microphone signals (x1
to x5), using the signals x6 and x7 as references. The echo-suppressed signals SAEC1 to SAEC5 are
used in the block 304 to determine the direction of arrival (DOA) of the sound wave in the
horizontal plane (azimuth θa) of the actual speaker. In that way, tracking of the active speaker is
possible. Given the azimuth angle θa, the block 303 optimizes the weighting coefficients of the
signals x1 to x5 with one purpose: to form a horizontal directivity characteristic of the
microphone array with maximum reception in the azimuth direction θa. The reception characteristic
formed in the block 303 is superdirective, which means that its directivity index is larger than
that obtained by delay compensation and summation of the microphone signals.
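
The directivity index used in this comparison can be made concrete for the delay-and-sum baseline.
The sketch below evaluates it at one frequency for a uniform linear array, assuming a spherically
isotropic (diffuse) noise field with the usual sin(kd)/(kd) coherence. The function names, the
array geometry and the noise model are assumptions made for the example; the superdirective weights
of block 303 are not reproduced here.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def diffuse_coherence(num_mics, spacing_m, freq_hz):
    """Coherence matrix of a spherically isotropic noise field (sinc model)."""
    wavenumber = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    matrix = []
    for i in range(num_mics):
        row = []
        for j in range(num_mics):
            arg = wavenumber * spacing_m * abs(i - j)
            row.append(1.0 if arg == 0.0 else math.sin(arg) / arg)
        matrix.append(row)
    return matrix


def directivity_index_db(weights, steering, coherence):
    """Look-direction power gain over diffuse-noise power gain, in dB."""
    signal_gain = abs(sum(w.conjugate() * d for w, d in zip(weights, steering))) ** 2
    count = len(weights)
    # Quadratic form w^H * Gamma * w gives the output power for diffuse noise.
    noise_gain = sum((weights[i].conjugate() * coherence[i][j] * weights[j]).real
                     for i in range(count) for j in range(count))
    return 10.0 * math.log10(signal_gain / noise_gain)


def delay_and_sum_di_db(num_mics, spacing_m, freq_hz, azimuth_deg):
    """Directivity index of a delay-and-sum beamformer steered to azimuth_deg."""
    step = spacing_m * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND
    steering = [cmath.exp(-2j * math.pi * freq_hz * m * step) for m in range(num_mics)]
    weights = [d / num_mics for d in steering]
    coherence = diffuse_coherence(num_mics, spacing_m, freq_hz)
    return directivity_index_db(weights, steering, coherence)
```

A single microphone gives 0 dB by construction; a superdirective design would be judged by how far
it exceeds the delay-and-sum value returned by `delay_and_sum_di_db` for the same array.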
`
Block 303 compensates the time delays of the acoustic signal between the speaker on the one side
and the individual microphones on the other. By controlling this delay with the DOA signal (θa)
from the block 304, the directivity of the microphone array is steered in azimuth. The directivity
characteristic of the microphone array, SD-BF (Superdirective Beamformer), is formed in the block
303. The main lobe of this characteristic is narrow and directed toward the wanted target, and the
side lobes are considerably lower. This gives the microphone array spatial filtering, that is,
separation of noise sources in the horizontal plane. This form of directivity characteristic is
very important for reducing unwanted noise and the room reverberation effect by separating them
from the useful signal. The directivity characteristic is formed by weighting the microphone
signals and summing them into a single-channel output signal.

The output signal of the block 303 always contains the speech signal and a noise signal, which
consists of the residual signal after acoustic echo cancellation, the suppressed ambient noise and
the reduced reverberation noise. This signal goes to the noise reduction block NR 305, where
additional reduction of the noise signal is done. The reduction process is adaptive, to account for
the non-stationarity of the noise signal. An important requirement in the realization of the NR
block is that the noise reduction process should not affect the quality of the speech signal.

The final signal processing block of the hands-free speech communication system for video-phone or
teleconference applications is the block 306 for automatic gain control (AGC) of the speech signal.
This block uses further information obtained from the system, which is important for identifying
the possible states of the speech signal and the cases in which its amplitude needs to be corrected
in a suitable manner. In that way, almost the same level of the transmitted speech signal can be
secured independently of the distance between the actual speaker and the microphone array, which
assures better quality at the opposite side of the communication channel.
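
The idea of adapting the gain only when the near-end speaker is active can be sketched as follows.
This is a simplified illustration, not the AGC of block 306: the function name, the target level,
the smoothing step and the gain limit are hypothetical, and the `speech_active` flag stands in for
the combined decisions of the activity and double talk detectors.

```python
import math


def agc_step(frame, gain, speech_active, target_rms=0.1, step=0.1, max_gain=20.0):
    """Scale one frame; adapt the gain only while near-end speech is active."""
    if speech_active:
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms > 0.0:
            desired = min(target_rms / rms, max_gain)
            # Move a fraction of the way toward the desired gain (smoothing).
            gain += step * (desired - gain)
    # During pauses, noise or far-end speech the previous gain is kept.
    return [gain * s for s in frame], gain
```

Freezing the gain when no near-end speech is detected avoids amplifying residual echo or noise
during pauses, which is the failure case the text warns about.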
`
At the system output, the result of the signal processing is transformed from the frequency domain
to the time domain using an inverse FFT in the block 307. The estimated near-end speech signal (S)
is sent through the channel to the distant speaker.
`
Figure 4 shows the block diagram of acoustic echo cancellation (AEC) 302, which contains two main
blocks: the block 401, which contains 5 adaptive NLMS (Normalized Least Mean Square) algorithms,
and the block 402, whose main function is detection of simultaneous speech activity of the near-end
and far-end speakers, DTD (Double Talk Detection).
`
The NLMS algorithms, from NLMS1 to NLMS5, process the microphone signals x1 to x5 and deliver the
signals SAEC1 to SAEC5 to the blocks 303, 304 and 306 (Figure 3).

The function of the NLMS algorithm is to cancel the echo present in each microphone signal. This
function requires the reference signals from the loudspeakers 102 and the control signal from the
DTD detector 402. The NLMS algorithm models the transfer functions of the acoustic path from each
loudspeaker 102 to each microphone 103: for example, NLMS1 models the transfer function hL1 from
the loudspeaker Sp-L to the microphone M1 and hR1 from the loudspeaker Sp-R to the microphone M1,
etc.
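
A minimal sketch of one such NLMS update, for a single frequency bin of one microphone with the two
loudspeaker signals as references, is given below. It assumes one complex coefficient per bin and
per loudspeaker; the function names, the step size `mu`, the regularization `eps` and the demo
values are hypothetical, and the `adapt` flag stands in for the control signal of the DTD detector.

```python
import random


def nlms_bin_update(weights, refs, mic, adapt, mu=0.5, eps=1e-9):
    """One NLMS step for a single frequency bin.

    weights: [h_left, h_right] complex echo-path estimates
    refs:    [x_left, x_right] complex loudspeaker spectra for this bin
    mic:     complex microphone spectrum for this bin
    adapt:   False freezes the filter, e.g. during double talk
    """
    echo_estimate = sum(h * x for h, x in zip(weights, refs))
    error = mic - echo_estimate  # echo-suppressed output for this bin
    if adapt:
        power = sum(abs(x) ** 2 for x in refs) + eps
        # Normalized gradient step toward the true echo paths.
        weights = [h + mu * error * x.conjugate() / power
                   for h, x in zip(weights, refs)]
    return weights, error


def demo_convergence(num_frames=300, seed=0):
    """Hypothetical demo: identify two fixed echo paths from random references."""
    rng = random.Random(seed)
    true_paths = [0.5 + 0.2j, -0.3 + 0.1j]
    weights = [0j, 0j]
    for _ in range(num_frames):
        refs = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)]
        mic = sum(h * x for h, x in zip(true_paths, refs))  # echo only
        weights, _error = nlms_bin_update(weights, refs, mic, adapt=True)
    return weights, true_paths
```

With echo only and uncorrelated references the estimates approach the true paths; when near-end
speech is present the update is skipped so that the speech does not disturb the echo-path model.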
`
The loudspeaker signal passed through the NLMS filters gives a replica of the signal that reaches
the microphones by the acoustic path, and subtracting these two signals cancels the echo at the
output of the NLMS algorithm. To get maximum quality of echo reduction, similar to the case o