`
`(19) World Intellectual Property Organization
`International Bureau
`
`(43) International Publication Date
`10 April 2008 (10.04.2008)
`
` (10) International Publication Number
`
`WO 2008/041878 A2
`
`(51) International Patent Classification:
`
`Not classified
`
`(21) International Application Number:
`PCT/RS2007/000017
`
`(22) International Filing Date:
`19 September 2007 (19.09.2007)
`
(25) Filing Language: English

(26) Publication Language: English
`
(30) Priority Data:
P-2006/0551    4 October 2006 (04.10.2006)    RS
`
(71) Applicant: MICRONAS NIT [RS/RS]; Fruskogorska
11a, 21000 Novi Sad (RS).
`
(72) Inventors; and
(75) Inventors/Applicants (for US only): SARIC, Zoran
[RS/RS]; Vukasovica 65/7, 11000 Novi Beograd (RS).
JOVICIC, Slobodan [RS/RS]; Visnjicki venac 67,
11000 Beograd (RS). KOVACEVIC, Vladimir [RS/RS];
Radnicka 35A, 21000 Novi Sad (RS). TESLIC, Nikola
[RS/RS]; Bul. Cara Lazara 29, 21000 Novi Sad (RS).
KUKOLJ, Dragan [RS/RS]; Narodnog fronta 31, 21000
Novi Sad (RS).
`
(81) Designated States (unless otherwise indicated, for every
kind of national protection available): AE, AG, AL, AM,
AT, AU, AZ, BA, BB, BG, BH, BR, BW, BY, BZ, CA, CH,
CN, CO, CR, CU, CZ, DE, DK, DM, DO, DZ, EC, EE, EG,
ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL,
IN, IS, JP, KE, KG, KM, KN, KP, KR, KZ, LA, LC, LK,
LR, LS, LT, LU, LY, MA, MD, ME, MG, MK, MN, MW,
MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PG, PH, PL,
PT, RO, RS, RU, SC, SD, SE, SG, SK, SL, SM, SV, SY,
TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA,
ZM, ZW.
`
(84) Designated States (unless otherwise indicated, for every
kind of regional protection available): ARIPO (BW, GH,
GM, KE, LS, MW, MZ, NA, SD, SL, SZ, TZ, UG, ZM,
ZW), Eurasian (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM),
European (AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI,
FR, GB, GR, HU, IE, IS, IT, LT, LU, LV, MC, MT, NL, PL,
PT, RO, SE, SI, SK, TR), OAPI (BF, BJ, CF, CG, CI, CM,
GA, GN, GQ, GW, ML, MR, NE, SN, TD, TG).
`
`Declaration under Rule 4.17:
`
`of inventorship (Rule 4.17(iv))
`
`Published:
`
`— without international search report and to be republished
`upon receipt of that report
`
`(54) Title: SYSTEM AND PROCEDURE OF FREE SPEECH COMMUNICATION USING A MICROPHONE ARRAY
`
[Front-page drawing: camera 104; microphone array 103 with microphones M1 to M5; amplifiers 106; acquisition module 107; block 108; communication channel 101]
`
(57) Abstract: The invention relates to a system and procedure for hands-free voice communication in
video-phone or teleconference applications using a microphone array, whose main purpose is to make a
quality recording of a speaker in a room at a larger distance, in the presence of noise, acoustic echo
produced by the far-end speaker and the TV program, room reverberation and movement of the speaker in
the room. The system contains: a digital TV receiver and a digital camera for picture reproduction and
shooting, respectively; stereo loudspeakers and a microphone array for sound reproduction and recording,
respectively; an amplifier and acquisition module for audio signals; and a DSP for acoustic signal
processing. The procedure for microphone signal processing is done in the frequency domain and contains:
suppression of the acoustic echo made of two signals, the far-end speaker signal and the stereo TV signal;
acoustic spatial filtering of the near-end speaker with respect to noise sources and room reverberation,
based on an adaptive directivity characteristic of the microphone array; localization of the speaker in the
horizontal plane; suppression of all residual noise; and adaptive gain control of the transmitted signal.
`
`SYSTEM AND PROCEDURE OF FREE SPEECH COMMUNICATION USING A
`MICROPHONE ARRAY
`
`Technical Field
`
The invention belongs to the field of acoustic signal processing, more precisely to methods of
acoustic echo cancellation, localization and selection of an active speaker in the presence of
reverberation in the acoustic environment, and noise suppression by means of a microphone array.
`
`Background Art
`
Hands-free full-duplex speech communication systems are used in many existing
applications, such as video-phone systems, teleconference systems, room and car hands-free
systems, voice-based human-machine interfaces, etc.
`
Use of a hands-free speech communication system implies that the talker's position in the
acoustic environment is not specified, with variable distances from the system's microphones and
loudspeakers. Hands-free speech communication under such unknown conditions gives rise to a
number of technical problems which must be solved in order to preserve good quality of the
speech communication.
`
The basic problem is acoustic echo, generated by partial transmission of acoustic energy from the
loudspeaker to the microphone, so that the far-end speaker hears his own voice as a
disturbance. Conventionally, echo cancelling is done by an adaptive filter that estimates the
transfer function of the acoustic echo path between loudspeaker and microphone, so that its
output is approximately the same signal as the acoustic echo signal. Subtracting these two
signals cancels the acoustic echo. However, echo cancellation cannot be perfect because of
system non-linearity and the non-stationarity of the acoustic environment. As a result, a residual
echo signal remains. The basic requirement remains that the recorded near-end speech signal
should not be degraded by the echo suppression process.
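
The conventional scheme described above can be written compactly. The notation below is introduced here for illustration and is not taken from the source: x(n) is the vector of recent loudspeaker samples, d(n) the microphone signal, and ĥ(n) the adaptive estimate of the echo path.

```latex
\hat{y}(n) = \hat{\mathbf{h}}^{T}(n)\,\mathbf{x}(n), \qquad
e(n) = d(n) - \hat{y}(n)
```

Here ŷ(n) is the echo replica at the filter output and e(n) is the signal after subtraction. When ĥ(n) differs from the true echo path, e(n) contains residual echo in addition to the near-end speech and noise.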
`
In the acoustic environment, acoustic disturbances of different nature and causes may appear.
These disturbances may be stationary or non-stationary (for example, computer noise or
car noise) and they come from many different sources located at different positions in the
room or space where the speaker stands.
`
Besides that, in closed rooms (such as work rooms, halls and automobile cabins) reverberation
appears as a consequence of multiple reflections of acoustic waves from walls and obstacles.
Since the acoustic environment contains sources of disturbance besides the speaker,
the desired signal (coming from the speaker) must be separated from the
disturbances in order to make its recording possible. Conventionally, this problem may
be solved by using a microphone array having a number of microphones arranged in a line at a
minimum mutual distance. With appropriate processing of the microphone array signals,
direction-dependent sensitivity of the microphone system may be achieved. Such microphone systems
have a narrow directivity characteristic, sufficient to record only the actual speaker in the
acoustic environment, while the signals of noise sources at other locations are suppressed,
thereby providing a higher signal-to-disturbance ratio. The gain depends on: directivity of the
microphone array (width of the main lobe), side-lobe size, separability of speech sources
and noise sources (sources that are too close are difficult to separate), reverberation time,
non-stationary acoustic sources, etc.
`
Determining the speaker direction in the acoustic environment and steering the directivity of
the microphone array toward it is an important problem in hands-free communication
systems. The procedures for determining the speaker direction are very sensitive to
disturbances present in the environment, especially to a non-stationary speaker (one who moves
within the environment) and to several speakers talking simultaneously in a given environment
(cocktail party effect). Determining the relative direction of the actual speaker to the
microphone array in the horizontal plane (determination of azimuth) is a very important step in
video-phone and teleconferencing systems, because the speaker coordinates are needed to
control the movable camera in the system.
`
During speech recording in an acoustic environment, the problem of additive stationary or
non-stationary noise always appears, as does residual noise from the processing of acoustic signals.
They degrade the quality of the recorded speech signal. If they are intense enough, they may
even reduce the intelligibility of the speech. There are many algorithms for noise reduction
(NR), optimized for specific noise types. The common requirement for all of them is to
improve the signal-to-noise ratio while avoiding distortion of the speech signal and reduction
of its intelligibility.
`
Variable ambient conditions, and variable distance between the speaker and the microphone
array, require automatic gain control (AGC), which keeps the speaker's voice level constant
and more comfortable for the listener at the far end of the communication channel. Automatic
gain control in full-duplex systems requires additional information from the near-end speech
activity detector, the far-end speech activity detector and the acoustic echo canceller.
`
The above-mentioned technical problems in realizing a hands-free communication
system for full-duplex speech signal transmission, and its use in video-phone and/or
teleconference systems, are very complex. These problems demand one integral and optimal
solution approach, taking into account real-time system operation on a commercial
digital signal processor (DSP) platform.
`
Quality speech recording in the presence of acoustic noise and room reverberation is a
complex problem. When the spectrum of the useful speech signal overlaps with the spectrum
of the noise present, single-channel processing cannot significantly improve the speech
signal quality. With the development of digital signal processing and the availability of
sufficiently powerful DSPs, the way is open for multi-microphone procedures for processing
acoustic signals. The benefit of a microphone array over single-channel processing is the
capability to adapt its spatial reception characteristic (directivity characteristic) to the
current arrangement of the chosen speaker and the noise sources in the room. In this way,
maximum suppression of the noise present is achieved while the speaker is emphasized. The
main problems in using microphone arrays are
(M. S. Brandstein, D. B. Ward (Eds.), Microphone Arrays: Signal Processing Techniques and
Applications, Springer, Berlin, 2001; Y. Huang, J. Benesty, Audio Signal Processing for
Next-Generation Multimedia Communication Systems, Kluwer Academic Publ., 2004): determining the
exact location of the chosen speaker, determining the exact number and positions of the noise
sources present in the room, multiple reflections of the useful source and of the noise from
the room walls, and the non-stationarity of the acoustic noise sources and of the chosen speaker.
`
When the microphone array is used in video-phone or teleconference systems in full-duplex
operation, the number of possible problems grows. The biggest problem is the
presence of acoustic echo, followed by the need for automatic gain control (AGC) of the
transmitting part of the system, as well as the possible presence of system instability, called
microphony. An additional problem, which is addressed in this patent, is the presence of the TV
program signal, which appears as an additive acoustic echo at the input of the microphone array.
`
The large number of problems mentioned has generated many different solutions, which have been
patented and which solve some of the problems separately or several of them integrally.
For example: U.S. published patent application 2006/0153360 A1, filed
September 2nd 2005, entitled “Speech signal processing with combined noise reduction and
echo compensation”, gives an integral solution for echo reduction and noise reduction;
U.S. patent 7,035,415 B2, filed May 15th 2001, entitled “Method and
device for acoustic echo cancellation combined with adaptive beamforming”, gives an
integral solution for echo reduction and forming of a directed microphone array characteristic;
EP published patent application 1 633 121 A1, filed September 3rd 2004, entitled
“Speech signal processing with combined adaptive noise reduction and adaptive echo
compensation”, gives an integral solution for residual echo reduction and noise reduction;
EP published patent application 1 571 875 A2, filed February 23rd 2005, entitled “A
system and method for beamforming using a microphone array”, gives a solution only for
forming a directed microphone array characteristic; EP published patent application
1 581 026 A1, filed March 17th 2004, entitled “Method for detecting and reducing noise
from a microphone array”, gives a solution only for noise reduction in a microphone array; and
EP published patent application 1 286 175 A2, filed August 1st 2002, entitled
“Robust talker localization in reverberant environment”, gives a solution only for talker
localization in a reverberant room.
`
The integral solution of all the mentioned problems, realized in this patent, joins the positive
characteristics of the individual signal processing solutions to these problems. The problems
are solved integrally in the frequency domain, which optimizes the use of computing resources
and gives a real-time solution, securing the quality of free speech communication in
video-phone and/or teleconference systems.
`
Disclosure of the Invention
`
The subject of this patent is a free speech communication system for video-phone or
teleconference applications, which uses a microphone array and complex acoustic signal
processing to secure better quality and clarity of the speech signal in a complex
acoustic environment, and in which many of the previously mentioned shortcomings are
eliminated separately or integrally.
`
The system which is the subject of this patent transmits speech, and digital television is
used as the transmitting medium. A microphone array and loudspeakers, which are integral
components of the TV receiver, are used for recording and reproduction of the speech signal,
respectively. In video-phone or teleconference applications, a digital camera and the digital
TV receiver are used for recording and reproduction of the picture, respectively.
`
The essence of the invention is the specific processing of the speech signal recorded in the
acoustic environment of the room where the speaker and the system are present. To record a
speaker in the room, who stands at some distance (a few meters) from the TV receiver,
the system uses a microphone array of N microphones. The microphone array records all signals
present in the room: the useful signal, as a direct wave travelling from the talker to the
microphone, and various noise signals. The noise signals are: acoustic echo as a direct wave
from the loudspeaker, which emits the voice of the interlocutor at the far end of the
communication channel; acoustic echo as a direct sound wave from the loudspeakers emitting the
stereo TV program; direct waves from one or more noise sources or other sources that can be
heard in the room; and reflected waves (room echo) produced by all of these sources,
including the speaker, which together make up the room reverberation.
It should be emphasized that noise sources in the room can be stationary or, as is frequently
the case, non-stationary, both in their characteristics and in their location in the room
(mobile sound sources).
`
Different kinds of noise require different techniques for their elimination, and the essence of
this invention is one optimally designed algorithm which should eliminate all noise as far as
possible and secure the best quality of the speech signal transmitted to the
interlocutor at the far end of the communication channel.
`
Microphone signals from the microphone array are processed in digital form in the DSP,
completely in the frequency domain. This domain offers certain advantages in
processing speed and number of computing operations, which is very important for the DSP and
its real-time operation. For acoustic echo cancellation it is necessary to feed all loudspeaker
signals into the DSP.
`
The DSP runs several complex algorithms: an acoustic echo cancelling (AEC) algorithm; a
microphone array signal processing algorithm for adaptive beamforming (ABF), which shapes the
array's directivity characteristic; an algorithm for estimating the direction of arrival (DOA)
of the useful signal, in other words for localizing the speaker in the room; an algorithm for
reduction of stationary noise, non-stationary noise and residual echo (NR, noise reduction);
and an algorithm for automatic gain control (AGC) of the system, which compensates for
different distances of the speaker from the microphone array. Besides these basic algorithms,
the DSP runs some additional ones, such as: a voice activity detector (VAD) at the near
end, a VAD at the far end, a double talk detector (DTD) on both sides, additional postfiltering
(PF) for noise reduction, etc. The aim of the mentioned algorithms is maximal reduction of all
noise present with minimal degradation of the speech signal, thereby securing maximum
quality of the transmitted speech signal.
`
A specific aspect of the invention is adaptive acoustic echo cancellation using an adaptive
filter, which models the transfer characteristic of the acoustic path from the loudspeaker to the
microphone. The transfer characteristic is complex, covering the transmission paths from 2
(stereo) loudspeakers to the N microphones in the microphone array, and each microphone
signal is filtered by its own adaptive filter. The operation of the adaptive filters is controlled
by speech activity detectors on both sides.
`
The next specific aspect of the invention is the adaptive directivity characteristic of the
microphone array, which secures spatial filtering and directional separation in the room with the
speaker, where the useful signal is boosted as much as possible relative to the other,
interfering signals. The directivity characteristic of the microphone array is
accomplished by adaptive weighting and summing of the microphone signals, which secures
stability of the directivity index across the frequency domain in a reverberant acoustic environment.
`
Determining the direction of arrival of the speaker's direct acoustic wave is the next specific
aspect of the invention. This function of the free speech communication system is necessary for
controlling the directivity characteristic of the microphone array in azimuth, and it can also be
used for controlling and guiding the video camera. It uses the microphone signals after acoustic
echo cancellation. The direction of arrival of the speaker's direct acoustic wave is estimated
from the cross-correlation of the microphone signals and its phase transform. This
function is directly controlled by the speech activity detector.
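
Cross-correlation of two microphone signals followed by a phase transform is commonly known as GCC-PHAT. The following Python sketch illustrates the idea for a single microphone pair; it is not the patent's implementation. The function names, the naive DFT, the far-field model and the parameters `fs`, `spacing_m` and `c` are assumptions introduced for illustration.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (fine for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def gcc_phat_delay(sig_a, sig_b, max_lag):
    """Return the lag (in samples) by which sig_b trails sig_a.

    The cross-spectrum is normalised to unit magnitude (the phase
    transform), so only phase, i.e. delay, information remains.
    """
    n = len(sig_a)
    spec_a, spec_b = dft(sig_a), dft(sig_b)
    cross = [b * a.conjugate() for a, b in zip(spec_a, spec_b)]
    phat = [c / abs(c) if abs(c) > 1e-12 else 0.0 for c in cross]
    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # Inverse DFT evaluated only at the candidate lag.
        val = sum(p * cmath.exp(2j * math.pi * k * lag / n)
                  for k, p in enumerate(phat)).real
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


def azimuth_deg(lag, fs, spacing_m, c=343.0):
    """Convert an inter-microphone lag to an azimuth (far-field model)."""
    s = max(-1.0, min(1.0, lag * c / (fs * spacing_m)))
    return math.degrees(math.asin(s))
```

In a system like the one described, such an estimate would only be updated in frames where the speech activity detector reports near-end speech, so that noise-only frames do not move the estimated azimuth.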
`
The next specific aspect of the invention is the process of adaptive suppression of stationary
and non-stationary noise. The process is based on non-linear noise estimation and compression,
performed in several sub-bands. Two noise estimates are used, securing optimal suppression
with respect to the speech signal characteristics. This is done for safety reasons, safety
meaning that the adaptive noise reduction process should not degrade
the quality of the speech signal. The filtering process is completed by an
adaptive Wiener post-filter.

A specific aspect of the invention is automatic gain control of the speech signal before
transmission to the far-end interlocutor. This feature is an important connecting element of the
free speech communication system. The system compensates for differences in speech
signal intensity arising, on the one hand, from individual speech characteristics and, on the
other hand, from the speaker's position nearer to or farther from the microphone array.
The solution distinguishes between speaker activity and the appearance of a pause in the
useful signal, residual echo, acoustic noise or far-end speech signal, for which purpose it
uses further information previously detected in the system. The analysis of possible scenarios
has to be reliable; otherwise the negative effect of attenuating the useful speech signal may occur.
`
A special feature of this invention is the improvement of each of the mentioned specifics, as
well as the improvement in integrating all algorithms into one unit whose operation is stable
and of high quality. The algorithm procedures are optimized by sharing resources.
`
These and other aspects, specifics and benefits of the invention will become more
evident after review of the detailed description of the invention, the patent claims and the
accompanying figures.
`
`Brief Description of the Drawings
`
Figure 1 - shows elements of the free video-phone communication system using a microphone
array and digital television.

Figure 2 - shows ambience conditions for the application of the free speech video-phone
communication system using a microphone array.

Figure 3 - shows a block diagram of the audio signal processing subsystem within the free
video-phone communication system; it contains a microphone array with adaptive directivity
characteristic (SD-BF), a block for speaker indoor localization (DOA), a block for echo
cancellation (AEC), a block for noise reduction (NR) and a block for automatic gain control (AGC).

Figure 4 - shows the block diagram of acoustic echo canceling (AEC).
`
`Figure 5 - shows the block diagram of adaptive determination of near-end speaker direction
`in horizontal plane (DOA-azimuth).
`
Figure 6 - shows the block diagram of spatial filtering (SD-BF).
`
`Figure 7 - represents the block diagram of noise reduction (NR).
`
`Figure 8 - represents the block diagram of automatic gain control (AGC).
`
`Best Mode for Carrying Out of the Invention
`
This invention presents a system and method of acoustic signal processing in free speech
communication using a microphone array.
`
Figure 1 represents the system elements of free video-phone communication using a
microphone array and digital television. The digital television 100, which ordinarily serves the
user for watching TV, is used in the free video-phone communication system as a
video terminal and as an audio terminal for communication with another
speaker. Namely, when a call arrives over the communication channel 101 and a connection with
another speaker is made, the TV 100 is used as a multimedia interface, where
the speaker listens over the loudspeakers 102 and watches the far-end interlocutor on one
part 105 of the screen of the TV 100. At the same time, at the other end of the
communication channel (far-end side), the speaker at a similar TV receiver, by means of the camera
104 and the microphone array 103, also sees the interlocutor located at the near-end side. The
camera 104 is movable and is controlled by coordinates obtained by processing the microphone
signals from the microphone array 103.
`
Analog signals from the microphones in the microphone array 103 are amplified by the amplifier
106 and, together with the stereo signals of the loudspeakers 102, are fed to the acquisition
module 107, which digitizes them and sends them to the DSP 108 for further processing. The
processed speech signal of the near-end speaker is sent from the DSP over the communication
channel 101 to the speaker at the far end. Acoustic signal processing in the DSP 108 yields the
spatial coordinates of the speaker's location in the room with the free communication system.
With them the DSP 108 steers the camera 104 toward the active speaker. In that
way, free audio and video communication between two speakers by means of a digital television
system is completely assured.
`
Figure 2 schematically shows the ambient conditions of free video-phone communication using
a microphone array; it shows only the part of the system related to acoustic signal
processing. The room 201 contains the installed free video-phone communication system, a
speaker 202 and a noise source 203, which is a normal feature of every acoustic environment.
Over the loudspeakers 102 of the stereo audio system of the digital television, the speaker 202
listens to the incoming speech signal of the interlocutor 204 from the far end, mostly as a
mono signal. The microphone array (made of N microphones) records the sound of the
room 201. After complex microphone signal processing in the block 207, the speech signal of
the speaker 202 is transmitted by the block 208 to the far-end speaker as a mono signal.
`
The ambient conditions in the room 201 during speech communication are very complex. In the case
of free video-phone communication in the room 201, three sound sources are present: the
stereo loudspeakers 102, which emit the far-end speaker's voice and the TV program, the speaker 202
and at least one noise source 203. The room may contain more noise sources:
computer noise, air-conditioning noise, street noise, neighbors' noise, building
vibrations, another speaker or even several speakers, music, etc.
`
Therefore, the acoustic picture of the room is very complex. The microphone array 103,
as a sensor system, records all sounds in the room: all direct sound waves from each sound
source and, at the same time, all sound reflections. For example, from the
loudspeaker 102 a direct wave 209 arrives at the microphone array 103, followed by
many reflected waves, of which only one wave 210 is shown in Figure 2; the
speaker 202 sends a direct wave 211 and, among other reflected waves, the two reflected
waves 212a and 212b; the noise source 203 sends a direct wave 213 and, among other
waves, a reflected wave 214.
`
Of all the sounds recorded by the microphone array, only the direct wave 211 from the
speaker 202 is useful; all other waves are treated as disturbances. The biggest
disturbance is the acoustic echo 209, which comes from the loudspeaker 102. All the other
reflections together produce the room reverberation. The task of the audio signal
processing block 207 is to cancel the acoustic echo signal, to select the useful signal 211 from
the other signals, to suppress the reverberation signals, and to suppress the signals of the
direct noise sources, of which there may be more than one. A special task of the block 207 is to
follow the acoustic scene of the room and its non-stationarity, which depends on the mobility or
position of the speaker and on whether the noise sources are mobile or otherwise changing.
These aspects of the invention are described in detail in the following text.
`
Figure 3 shows a schematic diagram of the complete audio signal processing procedure in the free
video-phone communication system using a microphone array. All signals of the microphone array 103,
from M1 to M5, as well as the stereo signals of the loudspeakers 102, Sp-L and Sp-R, are
digitized in the acquisition block 107 (Figure 1) and converted into the frequency domain
by a fast Fourier transform (FFT) 301 into the signals x1 to x7. It should be
emphasized that the microphone array in this embodiment contains 5 microphones, but if
additional microphones are needed, they can be installed to suit the
application. The block 302 suppresses the acoustic echo in all microphone signals (x1 to x5) using
the signals x6 and x7 as references. The echo-suppressed signals S_AEC1 to S_AEC5 are used in the
block 304 to determine the direction of arrival (DOA) of the sound wave in the horizontal plane
(azimuth θ) from the actual speaker. In that way tracking of the active speaker is possible. Given
the azimuth angle θ, the weighting coefficients of the signals x1 to x5 are optimized in the
block 303 with one purpose: to form a horizontal directivity characteristic of the microphone
array with its reception maximum in the azimuth direction θ. The reception characteristic formed in
the block 303 is superdirective, which means that its directivity index is
larger than that of the characteristic obtained by delay compensation and summation of the
microphone signals.
`
The block 303 compensates the delay of the acoustic signal between the speaker and each of
the microphones. By controlling this delay with the DOA signal (θ)
from the block 304, the directivity of the microphone array is steered in azimuth. The
directivity characteristic of the microphone array, SD-BF (Superdirective
Beamformer), is formed in the block 303. The main lobe of this characteristic is narrow
and directed toward the wanted target, and the side lobes are considerably lower.
This gives the microphone array its spatial filtering, more precisely the separation of noise
sources in the horizontal plane. Such a form of the directivity characteristic is very important
for reducing unwanted noise and the effect of room reverberation and separating them from the
useful signal. The directivity characteristic is formed by weighting the microphone signals
and summing them into a one-channel output signal.
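
As a point of reference, delay compensation followed by weighting and summation can be sketched for a single frequency bin as follows. This is the plain delay-and-sum baseline with uniform weights, not the superdirective weighting of the block 303, whose optimization is not given in this passage. The uniform linear array geometry, the function names and all parameters are illustrative assumptions.

```python
import cmath
import math


def steering_delays(num_mics, spacing_m, azimuth_deg, c=343.0):
    """Per-microphone arrival delays (seconds) of a far-field plane wave
    on a uniform linear array; azimuth 0 is broadside."""
    s = math.sin(math.radians(azimuth_deg))
    return [m * spacing_m * s / c for m in range(num_mics)]


def delay_and_sum(mic_bins, freq_hz, delays, weights=None):
    """Combine one frequency bin from every microphone into one output.

    mic_bins: complex spectral value of each microphone at freq_hz.
    delays:   arrival delay of the target wave at each microphone.
    weights:  optional real weights; uniform if omitted.
    """
    n = len(mic_bins)
    weights = weights or [1.0 / n] * n
    out = 0j
    for x, tau, w in zip(mic_bins, delays, weights):
        # Advance each channel by its delay so the target adds in phase.
        out += w * x * cmath.exp(2j * math.pi * freq_hz * tau)
    return out
```

A wave arriving from the steered azimuth passes with unit gain, while waves from other directions add with mismatched phases and are attenuated. A superdirective design would replace the uniform `weights` with frequency-dependent optimized ones.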
`
The output signal of the block 303 contains the speech signal and a noise signal, which
consists of the residual signal after acoustic echo cancellation, the suppressed
ambient noise and the reduced reverberation noise. This signal goes to the noise
reduction block NR 305, where additional reduction of the noise signal is done. The reduction
process is adaptive, to account for the non-stationarity of the noise signal. An important
requirement in the realization of the NR block is that the noise reduction process should not
affect the quality of the speech signal.
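
A generic way to meet both requirements, adaptivity and protection of the speech signal, is a per-band Wiener gain with a lower limit. The Python sketch below illustrates that general technique only; it is not the NR block 305 itself. The function names, the smoothing constant `alpha` and the gain floor `floor` are assumptions introduced for illustration.

```python
def update_noise(noise_psd, frame_psd, speech_active, alpha=0.9):
    """Recursively average the noise power per band while no speech is
    detected; freeze the estimate during speech."""
    if speech_active:
        return list(noise_psd)
    return [alpha * n + (1.0 - alpha) * p for n, p in zip(noise_psd, frame_psd)]


def wiener_gains(frame_psd, noise_psd, floor=0.1):
    """Per-band Wiener gain G = SNR / (1 + SNR), with the SNR estimated by
    power subtraction and the gain limited from below by `floor`."""
    gains = []
    for p, n in zip(frame_psd, noise_psd):
        snr = max(p - n, 0.0) / n if n > 0.0 else float("inf")
        g = 1.0 if snr == float("inf") else snr / (1.0 + snr)
        gains.append(max(g, floor))
    return gains
```

The floor keeps the attenuation bounded in bands where the noise estimate is too high, which limits speech distortion at the price of some remaining noise.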
`
The final signal processing block of the free speech communication system in video-phone or
teleconference applications is the block 306 for automatic gain control (AGC) of the speech signal.
This block uses further information taken from the system, which is important for
identifying the possible states of the speech signal and the cases where its
amplitude needs to be corrected in a suitable manner. In that way an almost constant level of
the transmitted speech signal can be secured, independently of the distance between the actual
speaker and the microphone array, assuring better quality at the opposite side of the
communication channel.
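
The gating idea, adapting the gain only when the near-end talker alone is active, can be sketched as a per-frame update. This is a minimal illustration under stated assumptions, not the logic of the block 306: the function name, the target level `target_rms`, the adaptation `rate` and the limit `max_gain` are all hypothetical.

```python
def agc_step(gain, frame_rms, near_speech, far_speech,
             target_rms=0.1, rate=0.2, max_gain=20.0):
    """Return the updated gain for one frame.

    The gain adapts only while the near-end talker alone is active, so
    pauses, residual echo and far-end speech do not pull it around.
    """
    if not near_speech or far_speech or frame_rms <= 0.0:
        return gain  # hold the previous gain
    desired = min(target_rms / frame_rms, max_gain)
    return gain + rate * (desired - gain)
```

Holding the gain during pauses avoids amplifying background noise between utterances, and holding it during far-end activity avoids reacting to residual echo.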
`
At the system output, the result of the signal processing is transformed from the frequency
domain to the time domain by an inverse FFT in the block 307. The estimated near-end speech
signal (ŝ) is sent through the channel to the distant speaker.

Figure 4 represents the block diagram of acoustic echo canceling (AEC) 302, which
contains two main blocks: block 401, which contains 5 adaptive NLMS (Normalized
Least Mean Square) algorithms, and block 402, whose main function is detection of simultaneous
speech activity of the near-end speaker and the far-end speaker, DTD (Double Talk Detection).
`
The NLMS algorithms, NLMS1 to NLMS5, process the microphone signals x1 to x5 and
deliver the signals S_AEC1 to S_AEC5 to the blocks 303, 304 and 306 (Figure 3).
`
The function of the NLMS algorithm is to cancel the echo present in each microphone signal. This
function relies on the reference signals from the loudspeakers 102 and the control signal from
the DTD detector 402. The NLMS algorithm models the transfer functions of the acoustic paths from
each loudspeaker 102 to each microphone 103:
for example, NLMS1 models the transfer
function h_L1 from the loudspeaker Sp-L to the microphone M1 and h_R1 from the loudspeaker Sp-R
to the microphone M1, etc.
`
The loudspeaker signal passed through the NLMS filters gives a replica of the signal that
reached the microphones along the acoustic path, and subtracting these two signals
cancels the echo at the output of the NLMS algorithm. To get maximum
quality of echo reduction, DFT coefficients from previous processing blocks are used, as in the
RLS1 AEC algorithm (RLS, Recursive Least Squares) described in the text below.
The NLMS algorithm needs considerably less computing time than the RLS algorithm;
in the realization of the NLMS algorithm the DFT coefficients of the
previous 5 processed blocks are used.
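
For one frequency bin, an NLMS update of this kind can be sketched as follows. The sketch is a textbook complex NLMS step, offered as an illustration rather than as the patent's algorithm: the function name, the step size `mu`, the regularization `eps` and the layout of `refs` (left and right loudspeaker coefficients from the current and some previous blocks) are assumptions.

```python
def nlms_bin_step(weights, refs, mic, adapt=True, mu=0.5, eps=1e-8):
    """One NLMS update for a single frequency bin.

    weights: complex filter taps, one per reference coefficient.
    refs:    DFT coefficients of the loudspeaker signals (left and right,
             current and a few previous blocks) at this bin.
    mic:     DFT coefficient of the microphone signal at this bin.
    adapt:   False freezes the taps, e.g. while double talk is detected.
    Returns (echo-cancelled coefficient, updated weights).
    """
    echo_est = sum(w * x for w, x in zip(weights, refs))
    err = mic - echo_est
    if not adapt:
        return err, list(weights)
    norm = sum(abs(x) ** 2 for x in refs) + eps
    new_w = [w + mu * err * x.conjugate() / norm for w, x in zip(weights, refs)]
    return err, new_w
```

One such filter would run per microphone and per bin, with `adapt` driven by the double talk detector so that near-end speech does not disturb the echo path estimate.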
`
The block 403, marked RLS1 AEC, is the main part of the double talk detection procedure of
the block 402. RLS1 AEC performs a rough reduction of the acoustic echo in the
signal of the microphone M1 using an RLS algorithm. The RLS algorithm converges quickly, which
ensures a good estimate of the speech signal as well as of the additive echo component
of the signal. Since the DFT window length of 1024 samples is not
large enough to secure maximum echo reduction in a reverberant room, the regression
vector takes DFT coefficients from the previous three processed blocks. This secures a
double benefit: maximum echo reduction andS