Kaneda

[11] Patent Number: 5,208,864 (US005208864A)
[45] Date of Patent: May 4, 1993

[54] METHOD OF DETECTING ACOUSTIC SIGNAL

[75] Inventor: Yutaka Kaneda, Tokyo, Japan

[73] Assignee: Nippon Telegraph & Telephone Corporation, Japan

[21] Appl. No.: 490,773

[22] Filed: Mar. 8, 1990

[30] Foreign Application Priority Data
Mar. 10, 1989 [JP] Japan .......... 1-58953

[51] Int. Cl.5: G10L 3/00; H04R 3/00; H04R 75/00
[52] U.S. Cl.: 381/46; 381/47; 381/41; 381/92; 381/155
[58] Field of Search: 381/46, 47, 40, 41, 92, 155

[56] References Cited

U.S. PATENT DOCUMENTS
4,195,360   3/1980   Fothergill ........ 367/136
4,215,241   7/1980   Pinkney ........... 367/197
4,412,097  10/1983   Ishigaki et al. ... 381/92
4,536,887   8/1985   Kaneda et al. ..... 381/92
4,559,642  12/1985   Miyaji et al. ..... 381/92
4,589,137   5/1986   Miller ............ 381/92
4,653,102   3/1987   Hansen ............ 381/47
4,696,043   9/1987   Iwahara et al. .... 381/92
4,888,807  12/1989   Reichel ........... 381/92

FOREIGN PATENT DOCUMENTS
2128054   4/1984   United Kingdom
OTHER PUBLICATIONS
IEEE 1966 International Convention Record, Part 2, "Radio Communication; Broadcasting; Audio", Mar. 21-25, 1966, pp. 148-156, Torick et al.
IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 34, No. 6, Dec. 1986, pp. 1391-1400, Kaneda et al., "Adaptive Microphone Array System for Noise Reduction".
"Computer-Steered Microphone Arrays for Sound Transduction in Large Rooms" by Flanagan et al., Acoustical Society of America, Nov. 1985.

Primary Examiner: Dale M. Shaw
Assistant Examiner: Kee M. Tung
Attorney, Agent, or Firm: Blakely, Sokoloff, Taylor & Zafman
[57] ABSTRACT
According to a method of detecting an acoustic signal, first and second sound receiving units are located at substantially the same position and are used to output signals having different target signal power to noise power ratios (S/N ratios). When a difference between the powers of the signals output from the first and second sound receiving units or a ratio of the power of the signal from the first sound receiving unit to that from the second sound receiving unit in a given period falls within a predetermined range, reception of the target signal within the given period is discriminated. The first sound receiving unit is an adaptive microphone array capable of controlling directivity characteristics in correspondence with a noise position.

5 Claims, 12 Drawing Sheets
`
[Representative drawing: block diagram with microphone array 51, directivity controller 52, short time power calculation units 43 and 44 (outputs P1, P2), and speech period detection unit 45 (blocks 84, 85, and 86; signals S1 and S2)]
`
`GOOGLE EXHIBIT 1028
`
`
`
`U.S. Patent
`
`May 4, 1993
`
`Sheet 1 of 12
`
`5,208,864
`
[FIG. 1: short time power versus time, showing stationary noise 11, unstationary noise 12, speech 13, threshold Th 14, and periods 15 and 16]
[FIGS. 2(a) and 2(b): arrangements of microphones 1 and 2 and speaker 3]
`
`
`
[Sheet 2 of 12. FIGS. 3(a) to 3(c): short time powers and their difference versus time, with threshold Pth 17 and detected period 18. FIG. 4: microphones 1 and 2, speaker 3, and noise source 4 (far field).]
`
`
`
[Sheet 3 of 12. FIGS. 5(a) to 5(c): short time powers and power difference versus time, with time offsets 31 and 32, threshold Pth 17, and period 33. FIGS. 6(a) and 6(b): microphones 1 and 2, speaker 3, and noise source 4.]
`
`
`
[Sheet 4 of 12. FIG. 7 (prior art): block diagram with microphones 1 and 2, short time power calculation 21, speech period candidate detection 22, average power calculation 23 and 24, power difference 25 (POL), and candidate testing 26. FIG. 8: short time power versus time, with threshold Th 14 and periods 34, 35, and 36.]
`
`
`
[Sheet 5 of 12. FIG. 9: block diagram with sound receiving units 41 and 42, microphone array 51, directivity controller 52, short time power calculation units 43 and 44 (inputs X1, X2; outputs P1, P2), and speech period detection 45 based on short time power difference. FIGS. 10(a) and 10(b): directivity patterns 61 and 62 with speaker 3 and noise source positions. FIG. 11.]
`
`
`
[Sheet 6 of 12. FIG. 12. FIG. 13. FIGS. 14(a) and 14(b): waveforms versus time.]
`
`
`
[Sheet 7 of 12. FIG. 15: block diagram with microphone array 51, directivity controller 52, short time power calculation units 43 and 44 (outputs P1, P2), and speech period detection unit 45 comprising power difference 83 (PD), detection based on power 84 (output S1), detection based on power difference 85 (output S2), and speech period determination 86.]
`
`
`
[Sheet 8 of 12. FIGS. 16(a) to 16(c): short time powers and power difference versus time, with threshold Pth.]
`
`
`
[Sheet 9 of 12. FIGS. 17(a) to 17(d).]
`
`
`
[Sheet 10 of 12. FIG. 18: arrangement with microphone array 51 and directivity controllers 52 and 52A, with outputs to units 43 and 44 of FIG. 9. FIG. 19: speech period determination unit 86 with blocks 86a and 86b, receiving S1 from 84 and S2 from 85 of FIG. 15.]
`
`
`
[Sheet 11 of 12. FIG. 20.]
`
`
`
[Sheet 12 of 12. FIG. 21: microphone array 51 and directivity controller 52, with signals U1 to UM.]
`
`
`
METHOD OF DETECTING ACOUSTIC SIGNAL

BACKGROUND OF THE INVENTION

The present invention relates to a method of detecting an acoustic signal, and a method of detecting a period of a desired acoustic signal in a signal including noise and the desired acoustic signal.
In recent years, although speech recognition apparatuses have been remarkably developed, the development of a speech recognition apparatus for recognizing speech in a noisy environment has been retarded because it is difficult to correctly detect a speech period (i.e., to detect a period during which speech is present on the time axis) in a signal contaminated by noise. When a noise period is recognized as a speech period, noise is forcibly caused to correspond to any phoneme, and it is impossible to obtain a correct speech recognition result. Therefore, it is very important to develop a speech period detection technique which can be used in a noisy environment.
FIG. 1 is a timing chart for explaining the first conventional speech period detection method. This chart shows changes in short time power as a function of time. The short time power of a signal output from a microphone is plotted along the ordinate, and the time is plotted along the abscissa. In the following description, the short time power will be referred to as a "power". A signal generally contains stationary noise 11 (noise having almost a constant power, such as air-conditioning noise or fan noise of equipment), unstationary noise 12 (noise whose power is greatly changed, such as a door closing sound and undesired speech), and desired speech 13. Although the power of the stationary noise can be known in advance, the unstationary noise power is unpredictable.
According to the first conventional method, a power of a signal is kept monitored. When this power exceeds a threshold value Th 14 determined on the basis of the stationary noise power, the corresponding period is recognized as a speech period. Most of the existing speech recognition apparatuses perform speech period detection by using this method. According to this method, although a correct speech period 16 shown in FIG. 1 can be detected, an unstationary noise period 15 having a high power is also erroneously detected as a speech period, resulting in inconvenience.
The second conventional method will be described below.
According to the second conventional method, two microphones are located to cause an S/N ratio difference between outputs from the two microphones. The examples of microphone arrangement for the method are shown in FIGS. 2(a) and 2(b). That is, as shown in FIG. 2(a), a first microphone 1 is located near a speaker 3, and a second microphone 2 is located away from the speaker 3. Alternatively, as shown in FIG. 2(b), the first microphone 1 is located in front of the speaker 3, and the second microphone 2 is located near the side of the speaker 3. In these arrangements, the speech power level of the output from the first microphone is higher than that from the second microphone. On the other hand, assuming that noise is generated in a remote location, the noise power levels of the outputs from these microphones are almost equal to each other. As a result, an S/N ratio difference in outputs of the two microphones occurs.
FIGS. 3(a), 3(b), and 3(c) are charts for explaining an ideal operation of the second conventional method. More specifically, FIG. 3(a) shows a time change in power P1 of the output from the first microphone, and FIG. 3(b) shows a time change in power P2 of the output from the second microphone. Reference numerals 11 in FIGS. 3(a) and 3(b) as in FIG. 1 denote stationary noise; 12, unstationary noise; and 13, speech. Since the two microphones are arranged as shown in FIG. 2(a) or FIG. 2(b), the power of the speech in FIG. 3(b) is lower than that in FIG. 3(a), while the noise power levels of these outputs are equal to each other. As shown in FIG. 3(c), according to the second conventional method, a difference PD (= P1 - P2) between the short time powers P1 and P2 of the two signals is calculated. When the power difference PD is larger than a given threshold value Pth 17, a corresponding time period 18 is detected as a speech period. According to the second conventional method, as is apparent from FIG. 3(c), the unstationary noise period having a high power is not detected as a speech period, unlike in the first conventional method.
The second conventional method, however, is rarely operated in an ideal state because the following three conditions must be satisfied to correctly detect a speech period by utilizing a power difference in the two signals:
Condition 1: An S/N ratio difference in two signals must be present.
Condition 2: Noise and speech periods of the two signals must be matched with each other as a function of time.
Condition 3: A variation in S/N ratio difference caused by various factors is small (stability of the S/N ratio difference).
According to the second conventional method, the first condition is satisfied, while the second and third conditions are not satisfied. Therefore, the following problems are posed.
The first problem will be described below. FIG. 4 shows an arrangement obtained by adding a noise source 4 to the arrangement of FIG. 2(a). At this time, speech is input to the first microphone 1 and then the second microphone 2. However, noise is input to the second microphone 2 and then the first microphone 1. Therefore, the speech and noise periods of the two microphone output signals are not matched as a function of time.
The above situation is shown in FIGS. 5(a), 5(b), and 5(c). FIG. 5(a) shows the power P1 of the output from the first microphone 1, FIG. 5(b) shows the power P2 of the output from the second microphone 2, and FIG. 5(c) shows the power difference PD. Reference numeral 11 denotes stationary noise; 12, unstationary noise; and 13, speech, as in FIGS. 3(a) to 3(c).
Relationships between the speech powers and the noise powers in FIGS. 5(a) and 5(b) are the same as those in FIGS. 3(a) and 3(b). However, in the relationships shown in FIGS. 5(a) and 5(b), the speech as the output from the second microphone 2 is delayed from that as the output from the first microphone 1 by a period TS 31, whereas the noise as the output from the second microphone 2 advances from that from the output from the first microphone by a period TN 32. The speech and noise periods are not matched with each other as a function of time. As a result, the difference PD between the two signal powers is different from that of FIG. 3(c), as shown in FIG. 5(c). When a period
`
`
`
during which the difference exceeds the threshold value Pth 17 is detected as a speech period, a period 33 in FIG. 5(c) is erroneously detected as a speech period, thus posing the first problem. Because the time difference TN 32 in this noise period is greatly changed depending on the position of the noise source, it is impossible to establish matching by using a delay element.
As the second problem, there are various factors for changing an S/N ratio difference between the two microphone outputs in a practical situation. Therefore, it is difficult to assure stability of the S/N ratio difference between the two signals as follows.
The first variation factor is the position of the noise source. As described above, the noise source is assumed to be located in a remote location. When, however, the noise source is located at a relatively close location, the position of the noise source becomes a large variation factor for the S/N ratio difference. FIGS. 6(a) and 6(b) explain this situation. Reference numerals 1 and 2 in FIGS. 6(a) and 6(b) denote first and second microphones, respectively; 3, speakers; and 4, noise sources, as in FIG. 4. When the noise source 4 is located at positions indicated in FIGS. 6(a) or 6(b), the noise power of the output from the first microphone 1 is higher than that from the second microphone 2, as in the speech powers. As a result, an S/N ratio difference between the two microphone outputs becomes fairly small.
The second variation factor is movement of the speaker. For example, when the speaker 3 turns his head in a right 45° direction in FIG. 6(b), the speech signal is received by each microphone at almost the same level. As a result, a speech power difference does not occur in the outputs of the two microphones, thus an S/N ratio difference varies.
The third variation factor is an influence of room echoes. When two microphones are located so as to cause the S/N ratio difference in their outputs, room echoes having different time structures and magnitudes are added to the noise and speech components of each microphone output. As a result, an S/N ratio difference is greatly changed as a function of time.
In addition to the above mentioned major variation factors there are other factors such as electrical noise and vibration noise. Therefore, it is very difficult to find a microphone arrangement which assures a stable S/N ratio difference in an atmosphere where these various factors for changing the S/N ratios are present.
As described above, the second conventional method has the above decisive drawback and cannot be effectively utilized in practical applications.
The third conventional method for overcoming this drawback of the second conventional method will be described with reference to FIG. 7. Referring to FIG. 7, reference numeral 1 denotes a first microphone; 2, a second microphone; 21, a short time power calculation unit; 22, a speech period candidate detection unit; 23 and 24, average power calculation units for speech period candidates; 25, a power difference detection unit; and 26, a speech period candidate testing unit.
According to this method, as in the second conventional method, the first microphone is located such that a ratio of speech to ambient noise is large, whereas the second microphone is located such that an S/N ratio is smaller than that of the first microphone. According to this method, a short time power of an output signal from the first microphone 1 is calculated by the short time power calculation unit 21. The short time power of the signal is kept monitored by the speech period candidate detection unit 22. The speech period candidate detection unit 22 detects a speech period candidate as a period when its power exceeds a threshold value Th. The above operations are the same as those in the first conventional method shown in FIG. 1. The noise period 15 shown in FIG. 1 is detected as a speech period candidate. Then, average powers of the outputs from the first and second microphones during this candidate period are calculated by the average power calculation units 23 and 24. Next, the difference POL between two average powers is obtained by the power difference detection unit 25. Finally, when the power difference POL exceeds a predetermined threshold value POLt, this candidate period is recognized as a correct speech period by the speech period candidate testing unit 26. Otherwise, this candidate period is discarded.
According to the characteristic feature of the third conventional method, a difference between the average powers obtained within a relatively long time candidate period is calculated in place of the short time power difference. Even if the speech and noise periods of one microphone output are not matched with those of the other microphone output, as shown in FIGS. 5(a) and 5(b), or even if variations in S/N ratio caused by the room echoes occur, its influence on the average power difference is relatively small. Therefore, the third conventional method seems to solve the problems of the second conventional method.
In the third conventional method, however, since the speech period is detected based on the average power within the candidate period, an incorrect discrimination result occurs when the noise and speech periods appear continuously, as shown in FIG. 8. FIG. 8 shows an output from the first microphone. A correct speech period is a period 34 in FIG. 8. As shown in FIG. 8, since unstationary noise 12 is close to speech 13 along the time axis, a period 35 which contains both the noise and speech periods and the short time power of which exceeds a threshold value Th 14 is detected as a speech period candidate. When this candidate period 35 is discriminated as a correct speech period upon calculation of an average power difference, a period 36 shown in FIG. 8 becomes an erroneously detected period. When the above speech period is discarded, the correct speech period is recognized as a non-speech period. In either case, an erroneous discrimination result is obtained.
The third conventional method, therefore, cannot serve as a means for solving the drawback of the second conventional method.
Various problems are present in the conventional speech period detection methods. It is therefore difficult to correctly detect a speech period when unstationary noise is present in an input signal.

SUMMARY OF THE INVENTION

It is therefore a principal object of the present invention to provide a method of detecting an acoustic signal, capable of detecting a speech period in an atmosphere of unstationary noise with higher precision than a conventional technique.
It is another object of the present invention to provide a method of detecting an acoustic signal, capable of detecting a speech period with high precision even if a noise source is present at an arbitrary position except for a position near a speaker (±30° range when the speaker
`
`
`
is viewed from the microphone), and even if the speaker moves within an expected range.
In order to achieve the above objects of the present invention, the following requirements are indispensable. That is, in order to correctly detect a speech period by using a power difference between two signals, the following three conditions must be satisfied:
Condition 1: An S/N ratio difference in two signals must be present.
Condition 2: Noise and speech periods of the two signals must be matched with each other as a function of time.
Condition 3: A variation in S/N ratio difference caused by various factors is small (stability of the S/N ratio difference).
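Why Condition 2 matters can be illustrated with a small numerical sketch. The frame powers, the one-frame offset, and the threshold `P_TH` below are hypothetical values chosen for illustration only; they are not taken from the specification. The sketch compares the power difference for a noise burst that reaches both sound receiving units at the same time with one that reaches the second unit one frame earlier.

```python
def power_difference(p1, p2):
    """Frame-by-frame difference PD = P1 - P2 of two power sequences."""
    return [a - b for a, b in zip(p1, p2)]


def frames_above(values, threshold):
    """Indices of frames whose value exceeds the threshold."""
    return [i for i, v in enumerate(values) if v > threshold]


# Hypothetical short time powers of a noise burst (arbitrary units).
noise_burst = [1, 1, 9, 9, 9, 1, 1, 1]

# Aligned case: the burst appears in the same frames in both outputs.
p1 = noise_burst
p2_aligned = noise_burst

# Misaligned case: the burst reaches the second unit one frame earlier.
p2_early = noise_burst[1:] + [1]

P_TH = 3  # hypothetical threshold on the power difference

aligned_hits = frames_above(power_difference(p1, p2_aligned), P_TH)
misaligned_hits = frames_above(power_difference(p1, p2_early), P_TH)

print(aligned_hits)     # no frame is flagged
print(misaligned_hits)  # the trailing edge of the burst is flagged
```

With aligned periods the noise cancels in the difference, whereas the one-frame offset leaves a positive residue at the trailing edge of the burst, which a threshold test would mistake for speech.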
According to the first feature of the present invention, in order to satisfy both the first and second conditions, two sound receiving units for generating signals having different S/N ratios are located at a single position (strictly speaking, this single position can be positions which can be deemed to be a single position to effectively operate the present invention), and a speech period is detected by using a power difference between the two output signals. According to the second feature of the present invention, one of the two sound receiving units comprises a microphone array system having a directivity control function to satisfy the third condition.
According to the first feature of the present invention, since noise and speech reach both the sound receiving units at the identical time, the noise and speech periods of an output from one sound receiving unit are matched with those from the other sound receiving unit as a function of time, thus satisfying the second condition and solving the first problem of the second conventional method.
When the two sound receiving units are located at the single position, the time structures of the echoes added to the signals are equal to each other. Therefore, the influence of the echoes which causes variations in S/N ratio difference between the two sound receiving unit outputs, as pointed out as the second problem of the second conventional method, can be greatly reduced by the first feature of the present invention.
According to the second feature of the present invention, variations in S/N ratio difference between the two sound receiving unit outputs caused by the position of the noise source and movement of the speaker, as pointed out as the second problem of the second conventional method, can be decreased. This will be described in detail later.
The present invention will be described in detail with reference to preferred embodiments in conjunction with the accompanying drawings.
`
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a chart showing the first conventional speech period detecting method;
FIGS. 2(a) and 2(b) are views showing microphone arrangements for explaining the second conventional speech period detecting method;
FIGS. 3(a), 3(b), and 3(c) are charts for explaining an ideal operation of the second conventional method;
FIG. 4 is a view showing a positional relationship between microphones and a noise source;
FIGS. 5(a), 5(b), and 5(c) are charts for explaining problems of the second conventional method;
FIGS. 6(a) and 6(b) are views each showing a relationship between microphones and a noise source;
FIG. 7 is a block diagram showing a third conventional speech period detecting method;
FIG. 8 is a chart for explaining a problem of the third conventional method described in FIG. 7;
FIG. 9 is a block diagram for explaining an embodiment of a method of detecting an acoustic signal according to the present invention;
FIGS. 10(a) and 10(b) are views for explaining problems posed when unidirectional and omnidirectional microphones are used;
FIG. 11 is a view for explaining a problem posed when a superdirectional sound receiving unit is used;
FIG. 12 is a block diagram of a detailed arrangement of a first sound receiving unit shown in FIG. 9;
FIG. 13 is a view showing directivity characteristics of an adaptive microphone array;
FIGS. 14(a) and 14(b) are charts showing waveforms of reception signals of impulsive noise with room echoes when an omnidirectional microphone and an adaptive microphone array are used;
FIG. 15 is a block diagram showing a detailed arrangement of the embodiment shown in FIG. 9;
FIGS. 16(a), 16(b), and 16(c) are charts for explaining an operation of a speech period detection unit shown in FIG. 15;
FIGS. 17(a), 17(b), 17(c), and 17(d) are charts showing experimental results to confirm effectiveness of the present invention;
FIGS. 18, 19 and 20 are block diagrams showing other embodiments of the present invention; and
FIG. 21 is an alternative, yet equivalent, illustration of the diagram of FIG. 12.
`
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

An arrangement of the present invention is shown in FIG. 9. Referring to FIG. 9, reference numeral 41 denotes a first sound receiving unit (i.e., a microphone array system) for outputting a signal having a high S/N ratio. The first sound receiving unit 41 comprises a microphone array 51 consisting of a plurality of microphone elements and a directivity controller 52. Reference numeral 42 denotes a second sound receiving unit for outputting a signal having an S/N ratio lower than that of the output from the first sound receiving unit 41. These two sound receiving units 41 and 42 are located at the same position. Reference numerals 43 and 44 denote short time power calculation units; and 45, a speech period detection unit based on a short time power difference.
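The signal flow of FIG. 9 can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment itself: the non-overlapping frames, the frame length, the function names, and the bounds `lo` and `hi` of the accepted range are all hypothetical. The only rule taken from the text is that a period is discriminated as containing the target signal when the difference between the two short time powers falls within a predetermined range.

```python
def short_time_power(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame (units 43, 44)."""
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        powers.append(sum(x * x for x in frame) / frame_len)
    return powers


def detect_by_power_difference(x1, x2, frame_len, lo, hi):
    """Flag frames whose power difference P1 - P2 lies within [lo, hi] (unit 45).

    The bounds lo and hi are hypothetical; the text only states that the
    difference must fall within a predetermined range.
    """
    p1 = short_time_power(x1, frame_len)
    p2 = short_time_power(x2, frame_len)
    return [lo <= (a - b) <= hi for a, b in zip(p1, p2)]


# Hypothetical outputs of the two sound receiving units, two frames of four
# samples each. Frame 0: noise only, attenuated in x1. Frame 1: the target
# signal, received at the same level by both units.
x1 = [0.1, -0.1, 0.1, -0.1, 1.0, -1.0, 1.0, -1.0]
x2 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

flags = detect_by_power_difference(x1, x2, frame_len=4, lo=-0.5, hi=0.5)
print(flags)
```

A range test on the difference alone would also accept frames in which both outputs are silent, so a practical detector would presumably combine it with a test on the power itself.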
In order to describe the effectiveness of the microphone array system in the present invention, assume that a unidirectional microphone is used as the first sound receiving unit 41 in place of the microphone array system, and that an omnidirectional microphone is used as the second sound receiving unit 42. With this arrangement, an S/N ratio of an output from the first sound receiving unit directed toward the speaker is larger than that of the output from the omnidirectional second sound receiving unit.
The above method is not always operated well, as will be described with reference to FIGS. 10(a) and 10(b). Referring to FIGS. 10(a) and 10(b), reference numeral 61 denotes a directivity pattern of a unidirectional microphone; and 62, a directivity pattern of an omnidirectional microphone. Reference numerals 3 denote speakers; and 63 and 64, positions of the noise sources. As shown in FIG. 10(a), the unidirectional microphone has a high sensitivity in the speaker side and a low sensitivity in the opposite side. FIG. 10(b) shows the omnidirectional microphone has equal sensitivity levels in all directions. When the noise source is located at the position 63 in each of FIGS. 10(a) and 10(b), an S/N ratio of an output from the unidirectional microphone is larger than that of an output from the omnidirectional microphone. However, when the noise source is located at the position 64 (or moved to the position 64) in FIGS. 10(a) and 10(b), the sensitivity of the unidirectional microphone for noise is much increased, and a difference between the S/N ratios of the outputs from the unidirectional and omnidirectional microphones becomes fairly small. In this manner, by the method using the unidirectional microphone as the first sound receiving unit, the S/N ratios are greatly changed depending on the position of the noise source.
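The dependence on the noise position can be made concrete with a small calculation. The sketch below assumes an ideal cardioid pattern, 0.5(1 + cos θ), as a stand-in for the unidirectional microphone, and unit gain in all directions for the omnidirectional microphone; the pattern and the two noise angles are assumptions for illustration, since the text does not give the exact pattern or the angles of positions 63 and 64.

```python
import math


def cardioid_gain(angle_deg):
    """Amplitude gain of an ideal cardioid; 0 degrees points at the speaker."""
    return 0.5 * (1.0 + math.cos(math.radians(angle_deg)))


def sn_difference_db(noise_angle_deg):
    """S/N advantage (dB) of the cardioid output over an omnidirectional one.

    Speech arrives from 0 degrees with gain 1 in both microphones, so the
    advantage equals the attenuation the cardioid applies to the noise.
    """
    g = cardioid_gain(noise_angle_deg)
    if g <= 0.0:
        return math.inf  # noise exactly in the null of the pattern
    return -20.0 * math.log10(g)


# Hypothetical noise directions: far behind the microphone, and close to the
# speaker direction.
behind = sn_difference_db(150.0)
near_front = sn_difference_db(30.0)

print(round(behind, 1), round(near_front, 1))
```

Under these assumptions the S/N difference is above 20 dB for noise arriving from behind and under 1 dB for noise arriving 30° off the speaker direction, which is the kind of variation that violates Condition 3.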
The problem posed by use of the unidirectional microphone may be solved by using a so-called "superdirectional sound receiving unit" as the first sound receiving unit 41 of FIG. 9. However, the directivity characteristics of the "superdirectional sound receiving unit" generally vary depending on frequencies. The directivity characteristics have almost omnidirectivity in a low-frequency range and very sharp directivity as shown in FIG. 11 in a high-frequency range. As a result, the S/N ratios are changed depending on the position of the noise source in the low-frequency range, and the S/N ratios are changed depending on slight movement of the speaker in the high-frequency range.
As described above, in order to obtain good speech period detection results, it is difficult to use a general-purpose directional sound receiving unit as the first sound receiving unit 41 in the arrangement of the present invention shown in FIG. 9.
In the present invention using the microphone array system having a directivity control function, the variations in S/N ratio can be kept small for changes in noise source position and movement of the speaker. This will be described in detail below.
A typical example of a microphone array system having a directivity control function is a sound receiving unit called an adaptive microphone array. An arrangement of the adaptive microphone array is shown in FIG. 12. Referring to FIG. 12, reference numeral 51 denotes a microphone array consisting of M microphone elements 561 to 56M, and 52, a directivity controller. The directivity controller 52 comprises filters 531 to 53M respectively connected to microphone outputs, an adder 55 for adding filter outputs, and a filter controller 54.
The filter controller 54 receives each microphone output signal and an output x1 from the adder 55 and controls the characteristics of the filters 531 to 53M to reduce a noise component contained in the output x1.
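The signal path through the filters and the adder can be sketched as a filter-and-sum structure. The sketch below is an illustration under assumptions: the filters are modeled as FIR filters with fixed, hand-picked coefficients, the two-microphone signals are contrived so that the noise has opposite sign in the two channels, and the adaptive update performed by the filter controller 54 is not modeled at all.

```python
def fir_filter(signal, taps):
    """Causal FIR filter: y[n] = sum over k of taps[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def filter_and_sum(mic_signals, filter_taps):
    """Filter each microphone signal and add the results (adder output x1)."""
    filtered = [fir_filter(u, h) for u, h in zip(mic_signals, filter_taps)]
    return [sum(column) for column in zip(*filtered)]


# Hypothetical two-element example: the speech component s reaches both
# elements identically, while the noise component n has opposite sign in
# the two channels.
s = [1.0, 2.0, 3.0]
n = [0.5, -0.5, 0.5]
u1 = [a + b for a, b in zip(s, n)]
u2 = [a - b for a, b in zip(s, n)]

# Fixed single-tap filters chosen by hand for this contrived case.
x1 = filter_and_sum([u1, u2], [[0.5], [0.5]])
print(x1)
```

In this contrived case the hand-picked filters pass the speech component unchanged and cancel the noise component; in the adaptive microphone array the filter controller would be responsible for finding suitable filter characteristics.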
The principle of operation of the filter controller 54 will be described below. The output signal x1 from the adder 55 can be expressed as a sum of a speech component s and a noise component n as follows:

x1 = s + n    (1)

When filter characteristics for minimizing a power n² of the noise component are unconditionally obtained, all the filters 531 to 53M