US007970144B1

(12) United States Patent
Avendano et al.

(10) Patent No.: US 7,970,144 B1
(45) Date of Patent: Jun. 28, 2011

(54) EXTRACTING AND MODIFYING A PANNED SOURCE FOR ENHANCEMENT AND UPMIX OF AUDIO SIGNALS

(75) Inventors: Carlos Avendano, Campbell, CA (US); Michael Goodwin, Scotts Valley, CA (US); Ramkumar Sridharan, Capitola, CA (US); Martin Wolters, Nuremberg (DE); Jean-Marc Jot, Aptos, CA (US)

(73) Assignee: Creative Technology Ltd, Singapore (SG)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 963 days.

(21) Appl. No.: 10/738,607

(22) Filed: Dec. 17, 2003

(51) Int. Cl.
    H04R 5/00 (2006.01)

(52) U.S. Cl. ............... 381/1; 381/17; 381/61; 381/27; 381/97

(58) Field of Classification Search ................ 381/119, 381/61, 63, 99, 10, 17-18, 19, 1-2, 98, 103, 381/27, 303, 306-307, 309-310; 315/291
    See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

3,697,692 A    10/1972  Hafler
4,024,344 A     5/1977  Dolby et al.
5,666,424 A     9/1997  Fosgate et al.
5,671,287 A     9/1997  Gerzon
5,872,851 A     2/1999  Petroff
5,878,389 A     3/1999  Hermansky et al.
5,886,276 A     3/1999  Levine et al.
5,953,696 A     9/1999  Nishiguchi et al.
6,011,851 A *   1/2000  Connor et al. .............. 381/17
6,021,386 A     2/2000  Davis et al.
6,098,038 A     8/2000  Hermansky et al.
6,405,163 B1 *  6/2002  Laroche ................... 704/205
6,430,528 B1    8/2002  Jourjine et al.
6,449,368 B1    9/2002  Davis et al.
6,473,733 B1   10/2002  McArthur et al.
6,570,991 B1    5/2003  Scheirer et al.
6,766,028 B1 *  7/2004  Dickens .................. 381/310
6,792,118 B2    9/2004  Watts
6,917,686 B2    7/2005  Jot et al.
7,006,636 B2    2/2006  Baumgarte et al.
7,039,204 B2 *  5/2006  Baumgarte ................ 381/119
(Continued)

FOREIGN PATENT DOCUMENTS

WO  WO 01/24577  4/2001

OTHER PUBLICATIONS

Carlos Avendano and Jean-Marc Jot: Ambience Extraction and Synthesis from Stereo Signals for Multi-Channel Audio Up-Mix; vol. II, pp. 1957-1960; © 2002 IEEE.
(Continued)

Primary Examiner — Devona B Faulk
Assistant Examiner — Disler Paul

(57) ABSTRACT

Modifying a panned source in an audio signal comprising a plurality of channel signals is disclosed. Portions associated with the panned source are identified in at least selected ones of the channel signals. The identified portions are modified based at least in part on a user input.

31 Claims, 13 Drawing Sheets

[Representative drawing, system 900: the left and right short-time spectra S_L(m,k) and S_R(m,k) are multiplied by modification functions M[Θ(m,k)] to produce modified outputs S_L'(m,k), S_C(m,k) and S_R'(m,k).]

Samsung v. Zophonos
IPR2026-00083
Exhibit 1022
U.S. PATENT DOCUMENTS

7,076,071 B2       7/2006  Katz
7,257,231 B1       8/2007  Avendano et al.
7,272,556 B1 *     9/2007  Aguilar et al. ............ 704/230
7,277,550 B1      10/2007  Avendano et al.
7,353,169 B1       4/2008  Goodwin et al.
7,412,380 B1       8/2008  Avendano et al.
7,567,845 B1       7/2009  Avendano et al.
2002/0054685 A1 *  5/2002  Avendano et al. ........... 381/66
2002/0094795 A1    7/2002  Mitzlaff
2002/0136412 A1    9/2002  Sugimoto
2002/0154783 A1   10/2002  Fincham
2003/0026441 A1    2/2003  Faller
2003/0174845 A1 *  9/2003  Hagiwara .................. 381/17
2003/0233158 A1 * 12/2003  Aiso et al. ............... 700/94
2004/0044525 A1    3/2004  Vinton et al.
2004/0122662 A1    6/2004  Crockett
2004/0196988 A1 * 10/2004  Moulios et al. ........... 381/119
2004/0212320 A1 * 10/2004  Dowling et al. ........... 315/291
2007/0041592 A1    2/2007  Avendano et al.

OTHER PUBLICATIONS

Jean-Marc Jot and Carlos Avendano: Spatial Enhancement of Audio Recordings; AES 23rd International Conference, Copenhagen, Denmark, May 23-25, 2003.

Carlos Avendano: Frequency-Domain Source Identification and Manipulation in Stereo Mixes for Enhancement, Suppression and Re-Panning Applications; 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics; Oct. 19-22, 2003, New Paltz, NY.

Eric Lindemann: Two Microphone Nonlinear Frequency Domain Beamformer for Hearing Aid Noise Reduction; Applications of Signal Processing to Audio and Acoustics, Oct. 15-18, 1995, pp. 24-27, New Paltz, NY.

U.S. Appl. No. 10/163,158, filed Jun. 4, 2002, Avendano et al.

U.S. Appl. No. 10/163,168, filed Jun. 4, 2002, Avendano et al.

Allen et al., "Multimicrophone signal-processing technique to remove room reverberation from speech signals," J. Acoust. Soc. Am., vol. 62, No. 4, Oct. 1977, pp. 912-915.

Baumgarte, Frank, et al., "Estimation of Auditory Spatial Cues for Binaural Cue Coding," IEEE Int'l. Conf. on Acoustics, Speech and Signal Processing, May 2000.

Begault, Durand R., "3-D Sound for Virtual Reality and Multimedia," AP Professional, pp. 226-229.

Blauert, Jens, "Spatial Hearing: The Psychophysics of Human Sound Localization," The MIT Press, pp. 238-257.

Dressler, Roger, "Dolby Surround Pro Logic II Decoder Principles of Operation," Dolby Laboratories, Inc., 100 Potrero Ave., San Francisco, CA 94103.

Faller, Christof, et al., "Binaural Cue Coding: A Novel and Efficient Representation of Spatial Audio," IEEE Int'l. Conf. on Acoustics, Speech & Signal Processing, May 2002.

Gerzon, Michael A., "Optimum Reproduction Matrices for Multispeaker Stereo," J. Audio Eng. Soc., vol. 40, No. 7/8, Jul./Aug. 1992.

Holman, Tomlinson, "Mixing the Sound," Surround Magazine, pp. 35-37, Jun. 2001.

Jot, Jean-Marc, et al., "A Comparative Study of 3-D Audio Encoding and Rendering Techniques," AES 16th Int'l. Conf. on Spatial Sound Reproduction, Rovaniemi, Finland, 1999.

Kyriakakis, C., et al., "Virtual Microphone for Multichannel Audio Applications," in Proc. IEEE ICME 2000, vol. 1, pp. 11-14, Aug. 2000.

Miles, Michael T., "An Optimum Linear-Matrix Stereo Imaging System," AES 101st Convention, 1996, preprint 4364 (J-4).

Pulkki, Ville, et al., "Localization of Amplitude-Panned Virtual Sources I: Stereophonic Panning," J. Audio Eng. Soc., vol. 49, No. 9, Sep. 2002.

Rumsey, Francis, "Controlled Subjective Assessments of Two-to-Five-Channel Surround Sound Processing Algorithms," J. Audio Eng. Soc., vol. 47, No. 7/8, Jul./Aug. 1999.

Schroeder, Manfred R., "An Artificial Stereophonic Effect Obtained from a Single Audio Signal," Journal of the Audio Engineering Society, vol. 6, pp. 74-79, Apr. 1958.

Jourjine et al., "Blind Separation of Disjoint Orthogonal Signals: Demixing N Sources from 2 Mixtures," IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 5, pp. 2985-2988, Apr. 2000.

Boll, Steven F., "Suppression of Acoustic Noise in Speech Using Spectral Subtraction," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP-27, No. 2, pp. 113-120, Apr. 1979.

Bosi, Marina, et al., "ISO/IEC MPEG-2 Advanced Audio Coding," AES 101st Convention, Los Angeles, Nov. 1996; J. Audio Eng. Soc., vol. 45, No. 10, Oct. 1997.

Duxbury, Chris, et al., "Separation of Transient Information in Musical Audio Using Multiresolution Analysis Techniques," Proceedings of the COST G-6 Conference on Digital Audio Effects (DAFX-01), Dec. 2001.

Levine, Scott N., et al., "Improvements to the Switched Parametric and Transform Audio Coder," Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Oct. 1999, pp. 43-46.

Pan, Davis, "A Tutorial on MPEG/Audio Compression," IEEE MultiMedia, Summer 1995.

Quatieri, T. F., et al., "Speech Enhancement Based on Auditory Spectral Change," Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Oct. 1999, pp. 43-46.

Baumgarte et al., "Estimation of Auditory Spatial Cues for Binaural Cue Coding," IEEE International Conference on Acoustics, Speech and Signal Processing, May 2002.

* cited by examiner
[Drawing Sheets 1-13 of U.S. Patent US 7,970,144 B1, Jun. 28, 2011. The figures survive only as captions:]

Sheet 1 — FIGS. 1A-1D: plots of the panning function and panning index versus the panning coefficient α.
Sheet 2 — FIG. 2: block diagram of a system for extracting, from the left and right channels of a stereo signal, a signal panned in a particular direction (inputs S_L(m,k), S_R(m,k); modification function M[Θ(m,k)]; extracted time-domain output).
Sheet 3 — FIG. 3: plot of energy (dB) versus panning index Γ, showing peaks at prominent panned sources.
Sheet 4 — FIG. 4: flow chart — identify portions of a received audio signal that are associated with a panned source of interest (402); modify the panned source in accordance with a user input to create a modified audio signal (404); provide the modified audio signal as output (406).
Sheet 5 — FIG. 5: block diagram of a system (500) to identify and modify a panned source.
Sheet 6 — FIG. 6: the system of FIG. 5 with a transient analysis block producing T(m).
Sheet 7 — FIG. 7: block diagram of a system to extract and modify a panned source.
Sheet 8 — FIG. 8: the system of FIG. 7 with transient analysis and gain determination blocks.
Sheet 9 — FIG. 9A: alternative system (900) to extract and modify a panned source.
Sheet 10 — FIG. 9B: computationally more efficient approach for extracting the phase information in a system such as system 900 of FIG. 9A.
Sheet 11 — FIG. 10: system using an intermediate modification factor (simplified implementation of the approach of FIG. 9A).
Sheet 12 — FIG. 11: system (1100) for modification and upmix of a multichannel audio signal.
Sheet 13 — FIG. 12: user interface (1200) with a control ranging from MAX ENHANCE to MAX SUPPRESS VOCAL.
EXTRACTING AND MODIFYING A PANNED SOURCE FOR ENHANCEMENT AND UPMIX OF AUDIO SIGNALS

INCORPORATION BY REFERENCE

U.S. patent application Ser. No. 10/163,158, entitled Ambience Generation for Stereo Signals, filed Jun. 4, 2002, now U.S. Pat. No. 7,567,845 B1, is incorporated herein by reference for all purposes. U.S. patent application Ser. No. 10/163,168, entitled Stream Segregation for Stereo Signals, filed Jun. 4, 2002, now U.S. Pat. No. 7,257,231, is incorporated herein by reference for all purposes.

U.S. patent application Ser. No. 10/738,361, entitled Ambience Extraction and Modification for Enhancement and Upmix of Audio Signals, filed Dec. 17, 2003, now U.S. Pat. No. 7,412,380, is incorporated herein by reference for all purposes.

FIELD OF THE INVENTION

The present invention relates generally to digital signal processing. More specifically, extracting and modifying a panned source for enhancement and upmix of audio signals is disclosed.

BACKGROUND OF THE INVENTION

Stereo recordings and other multichannel audio signals may comprise one or more components designed to give a listener the sense that a particular source of sound is positioned at a particular location relative to the listener. For example, in the case of a stereo recording made in a studio, the recording engineer might mix the left and right signal so as to give the listener a sense that a particular source recorded in isolation of other sources is located at some angle off the axis between the left and right speakers. The term "panning" is often used to describe such techniques, and a source panned to a particular location relative to a listener located at a certain spot equidistant from both the left and right speakers (and/or other or different speakers in the case of audio signals other than stereo signals) will be referred to herein as a "panned source".

A special case of a panned source is a source panned to the center. Vocal components of music recordings, for example, typically are center-panned, to give a listener a sense that the singer or speaker is located in the center of a virtual stage defined by the left and right speakers. Other sources might be panned to other locations to the left or right of center.

The level of a panned source relative to the overall signal is determined in the case of a studio recording by a sound engineer and in the case of a live recording by such factors as the location of each source in relation to the microphones used to make the recording, the equipment used, the characteristics of the venue, etc. An individual listener, however, may prefer that a particular panned source have a level relative to the rest of the audio signal that is different (higher or lower) than the level it has in the original audio signal. Therefore, there is a need for a way to allow a user to control the level of a panned source in an audio signal.

As noted above, vocal components typically are panned to the center. However, other sources, e.g., percussion instruments, also typically may be panned to the center. A listener may wish to modify (e.g., enhance or suppress) a center-panned vocal component without modifying other center-panned sources at the same time. Therefore, there is a need for
a way to isolate a center-panned vocal component from other sources, such as percussion instruments, that may be panned to the center.

Finally, listeners with surround sound systems of various configurations (e.g., five speaker, seven speaker, etc.) may desire a way to "upmix" a received audio signal, if necessary, to make use of the full capabilities of their playback system. For example, a user may wish to generate an audio signal for a playback channel by extracting a panned source from one or more channels of an input audio signal and providing the extracted component to the playback channel. A user might want to extract a center-panned vocal component, for example, and provide the vocal component as a generated signal for the center playback channel. Some users may wish to generate such a signal regardless of whether the received audio signal has a corresponding channel. In such embodiments, listeners further need a way to control the level of the panned source signal generated for such channels in accordance with their individual preferences.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be readily understood by the following detailed description in conjunction with the accompanying drawings, wherein like reference numerals designate like structural elements, and in which:

FIG. 1A is a plot of this panning function as a function of the panning coefficient α in an embodiment in which β = 1 − α.

FIG. 1B is a plot of this panning index as a function of α in an embodiment in which β = 1 − α.

FIG. 1C is a plot of the panning function ψ(m,k) as a function of α in an embodiment in which β = (1 − α²)^(1/2).

FIG. 1D is a plot of the panning index in (5) as a function of α in an embodiment in which β = (1 − α²)^(1/2).

FIG. 2 is a block diagram illustrating a system used in one embodiment to extract from a stereo signal a signal panned in a particular direction.

FIG. 3 is a plot of the average energy from an energy histogram over a period of time as a function of Γ for the sample signal described above.

FIG. 4 is a flow chart illustrating a process used in one embodiment to identify and modify a panned source in an audio signal.

FIG. 5 is a block diagram of a system used in one embodiment to identify and modify a panned source in an audio signal.

FIG. 6 is a block diagram of a system used in one embodiment to identify and modify a panned source in an audio signal, in which transient analysis has been incorporated.

FIG. 7 is a block diagram of a system used in one embodiment to extract and modify a panned source.

FIG. 8 is a block diagram of a system used in one embodiment to extract and modify a panned source, in which transient analysis has been incorporated.

FIG. 9A is a block diagram of an alternative system used in one embodiment to extract and modify a panned source.

FIG. 9B illustrates an alternative and computationally more efficient approach for extracting the phase information in a system such as system 900 of FIG. 9A.

FIG. 10 is a block diagram of a system used in one embodiment to extract and modify a panned source using a simplified implementation of the approach used in the system 900 of FIG. 9A.

FIG. 11 is a block diagram of a system used in one embodiment to extract and modify a panned source for enhancement of a multichannel audio signal.
FIG. 12 illustrates a user interface provided in one embodiment to enable a user to indicate a desired level of modification of a panned source.

DETAILED DESCRIPTION

It should be appreciated that the present invention can be implemented in numerous ways, including as a process, an apparatus, a system, or a computer readable medium such as a computer readable storage medium or a computer network wherein program instructions are sent over optical or electronic communication links. It should be noted that the order of the steps of disclosed processes may be altered within the scope of the invention.

A detailed description of one or more preferred embodiments of the invention is provided below along with accompanying figures that illustrate by way of example the principles of the invention. While the invention is described in connection with such embodiments, it should be understood that the invention is not limited to any embodiment. On the contrary, the scope of the invention is limited only by the appended claims and the invention encompasses numerous alternatives, modifications and equivalents. For the purpose of example, numerous specific details are set forth in the following description in order to provide a thorough understanding of the present invention. The present invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the present invention is not unnecessarily obscured.

Extracting and modifying a panned source for enhancement and upmix of audio signals is disclosed. In one embodiment, a panned source is identified in an audio signal and portions of the audio signal associated with the panned source are modified, such as by enhancing or suppressing such portions relative to other portions of the signal. In one embodiment, a panned source is identified and extracted, and a user-controlled modification is applied to the panned source prior to routing the modified panned source as a generated signal for an appropriate channel of a multichannel playback system, such as a surround sound system. In one embodiment, a center-panned vocal component is distinguished from certain other sources that may also be panned to the center by incorporating transient analysis. These and other embodiments are described more fully below.

As used herein, the term "audio signal" comprises any set of audio data susceptible to being rendered via a playback system, including without limitation a signal received via a network or wireless communication, a live feed received in real-time from a local and/or remote location, and/or a signal generated by a playback system or component by reading data stored on a storage device, such as a sound recording stored on a compact disc, magnetic tape, flash or other memory device, or any type of media that may be used to store audio data, and may include without limitation a mono, stereo, or multichannel audio signal including any number of channel signals.

1. Identifying and Extracting a Panned Source

In this section we describe a metric used to compare two complementary channels of a multichannel audio signal, such as the left and right channels of a stereo signal. This metric allows us to estimate the panning coefficients, via a panning index, of the different sources in the stereo mix. Let us start by defining our signal model. We assume that the stereo record-
ing consists of multiple sources that are panned in amplitude. The stereo signal with N amplitude-panned sources can be written as

S_L(t) = Σ_i β_i S_i(t)  and  S_R(t) = Σ_i α_i S_i(t),  for i = 1, . . . , N,    (1)

where α_i are the panning coefficients and β_i are factors derived from the panning coefficients. In one embodiment, β_i = (1 − α_i²)^(1/2), which preserves the energy of each source. In one embodiment, β_i = 1 − α_i. Since the time-domain signals corresponding to the sources overlap in amplitude, it is very difficult (if not impossible) to determine in the time domain which portions of the signal correspond to a given source, not to mention the difficulty in estimating the corresponding panning coefficients. However, if we transform the signals using the short-time Fourier transform (STFT), we can look at the signals in different frequencies at different instants in time, thus making the task of estimating the panning coefficients less difficult.
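The signal model in (1) and the move to the STFT domain can be sketched in a few lines of NumPy. This is an illustrative sketch only: the test tones, sample rate, Hann window, and hop size are assumptions, not values taken from the patent.

```python
import numpy as np

def stft(x, win_len=1024, hop=512):
    """Short-time Fourier transform S(m, k): frame index m, bin index k."""
    win = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[m * hop:m * hop + win_len] * win
                       for m in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

# Build a two-source amplitude-panned stereo mix per Eq. (1),
# using the beta_i = 1 - alpha_i variant.
t = np.arange(8192) / 44100.0
s1 = np.sin(2 * np.pi * 440.0 * t)    # center-panned source
s2 = np.sin(2 * np.pi * 1000.0 * t)   # source panned toward one side
alpha1, alpha2 = 0.5, 0.9
beta1, beta2 = 1.0 - alpha1, 1.0 - alpha2
sL = beta1 * s1 + beta2 * s2          # S_L(t) = sum_i beta_i S_i(t)
sR = alpha1 * s1 + alpha2 * s2        # S_R(t) = sum_i alpha_i S_i(t)
SL, SR = stft(sL), stft(sR)           # short-time spectra S_L(m,k), S_R(m,k)
```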
In one embodiment, the left and right channel signals are compared in the STFT domain using an instantaneous correlation, or similarity measure. The proposed short-time similarity can be written as

ψ(m,k) = 2 |S_L(m,k) S_R*(m,k)| / [|S_L(m,k)|² + |S_R(m,k)|²].    (2)

We also define two partial similarity functions that will become useful later on:

ψ_L(m,k) = |S_L(m,k) S_R*(m,k)| / |S_L(m,k)|²,    (2a)

ψ_R(m,k) = |S_R(m,k) S_L*(m,k)| / |S_R(m,k)|².    (2b)

In other embodiments, other similarity functions may be used.
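The similarity measures in (2), (2a) and (2b) translate directly into per-bin NumPy operations. A minimal sketch, assuming complex STFT arrays as inputs; the small `eps` regularizer is an added assumption to avoid division by zero in silent bins:

```python
import numpy as np

def similarity(SL, SR, eps=1e-12):
    """Eq. (2): psi(m,k) = 2|S_L S_R*| / (|S_L|^2 + |S_R|^2), in [0, 1]."""
    cross = np.abs(SL * np.conj(SR))
    return 2.0 * cross / (np.abs(SL) ** 2 + np.abs(SR) ** 2 + eps)

def partial_similarities(SL, SR, eps=1e-12):
    """Eqs. (2a), (2b): psi_L = |S_L S_R*|/|S_L|^2, psi_R = |S_R S_L*|/|S_R|^2."""
    cross = np.abs(SL * np.conj(SR))
    return cross / (np.abs(SL) ** 2 + eps), cross / (np.abs(SR) ** 2 + eps)
```

For a single source panned with α = 0.2, β = 0.8, Eq. (2) gives ψ = 2(0.2)(0.8) / (0.2² + 0.8²) ≈ 0.47, matching the value discussed below for FIG. 1A.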
The similarity in (2) has the following important properties. If we assume that only one amplitude-panned source is present, then the function will have a value proportional to the panning coefficient at those time/frequency regions where the source has some energy, i.e.

ψ(m,k) = 2 |αS(m,k)| |βS(m,k)| / [|αS(m,k)|² + |βS(m,k)|²] = 2αβ / (α² + β²).

If the source is center-panned (α = β), then the function will attain its maximum value of one, and if the source is panned completely to one side, the function will attain its minimum value of zero. In other words, the function is bounded. Given its properties, this function allows us to identify and separate time-frequency regions with similar panning coefficients. For example, by segregating time-frequency bins with a given similarity value we can generate a new short-time transform signal, which upon reconstruction will produce a time-domain signal with an individual source (if only one source was panned in that location).

FIG. 1A is a plot of this panning function as a function of the panning coefficient α in an embodiment in which β = 1 − α. Notice that given the quadratic dependence on α, the function ψ(m,k) is multi-valued and symmetrical about 0.5. That is, if a source is panned say at α = 0.2, then the similarity function will have a value of ψ = 0.47, but a source panned at α = 0.8 will have the same similarity value.

While this ambiguity might appear to be a disadvantage for source localization and segregation, it can easily be resolved using the difference between the partial similarity measures in (2). The difference is computed simply as

D(m,k) = ψ_L(m,k) − ψ_R(m,k),    (3)
and we notice that time-frequency regions with positive values of D(m,k) correspond to signals panned to the left (i.e. α < 0.5), and negative values correspond to signals panned to the right (i.e. α > 0.5). Regions with zero value correspond to non-overlapping regions of signals panned to the center. Thus we can define an ambiguity-resolving function as

D'(m,k) = 1 if D(m,k) > 0    (4)

and

D'(m,k) = −1 if D(m,k) ≤ 0.

Multiplying the quantity one minus the similarity function by D'(m,k) we obtain a new metric, referred to herein as a panning index, which is anti-symmetrical and still bounded but whose values now vary from one to minus one as a function of the panning coefficient, i.e.

Γ(m,k) = [1 − ψ(m,k)] D'(m,k).    (5)

FIG. 1B is a plot of this panning index as a function of α in an embodiment in which β = 1 − α. FIG. 1C is a plot of the panning function ψ(m,k) as a function of α in an embodiment in which β = (1 − α²)^(1/2). FIG. 1D is a plot of the panning index in (5) as a function of α in an embodiment in which β = (1 − α²)^(1/2).

In the following sections we describe the application of the short-time similarity and panning index to upmix, unmix, and source identification (localization). Notice that given a panning index we can obtain the corresponding panning coefficient given the one-to-one correspondence of the functions.
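Equations (3)-(5) can be combined into one panning-index routine. This is a sketch under the reconstructed sign convention of Eq. (4) (opposite pannings map to panning indices of opposite sign, center-panned bins to values near zero); the `eps` regularizer is an added assumption:

```python
import numpy as np

def panning_index(SL, SR, eps=1e-12):
    """Eqs. (3)-(5): antisymmetric panning index Gamma(m,k) in [-1, 1]."""
    cross = np.abs(SL * np.conj(SR))
    psi = 2.0 * cross / (np.abs(SL) ** 2 + np.abs(SR) ** 2 + eps)  # Eq. (2)
    psi_L = cross / (np.abs(SL) ** 2 + eps)                        # Eq. (2a)
    psi_R = cross / (np.abs(SR) ** 2 + eps)                        # Eq. (2b)
    D = psi_L - psi_R                                              # Eq. (3)
    D_prime = np.where(D > 0, 1.0, -1.0)                           # Eq. (4)
    return (1.0 - psi) * D_prime                                   # Eq. (5)
```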
The above concepts and equations are applied in one embodiment to extract one or more audio streams comprising a panned source from a two-channel signal by selecting directions in the stereo image. As we discussed above, the panning index in (5) can be used to estimate the panning coefficient of an amplitude-panned signal. If multiple panned signals are present in the mix and if we assume that the signals do not overlap significantly in the time-frequency domain, then the panning index Γ(m,k) will have different values in different time-frequency regions corresponding to the panning coefficients of the signals that dominate those regions. Thus, the signals can be separated by grouping the time-frequency regions where Γ(m,k) has a given value and using these regions to synthesize time-domain signals.

FIG. 2 is a block diagram illustrating a system used in one embodiment to extract from a stereo signal a signal panned in a particular direction. For example, in one embodiment to extract the center-panned signal(s) we find all time-frequency regions for which the panning index Γ(m,k) is zero and define a function Θ(m,k) that is one for all Γ(m,k) = 0, and zero (or, in one embodiment, a small non-zero number, to avoid artifacts) otherwise. In one variation on this approach, we find all time-frequency regions for which the panning index Γ(m,k) falls within a window centered on zero (e.g., all regions for which −ε ≤ Γ(m,k) ≤ ε) and define a function Θ(m,k) that is one for all regions having a panning index that falls in the window and zero (or, in one embodiment, a small non-zero number, to avoid artifacts) otherwise. In some alternative embodiments, the value of the function Θ(m,k) is one for all regions having a panning index equal to zero and a value less than one and greater than or equal to zero for regions having a panning index that falls within the window, depending on the value, such that for panning index values close to zero (or the non-zero center of the window, for a window not centered on zero) the value of Θ(m,k) is close to one and for panning index values at the edges of the window (e.g., Γ(m,k) = ε or −ε) the value of Θ(m,k) is close to zero. We can then synthesize a
time-domain function by multiplying S_L(m,k) and S_R(m,k) by a modification function M[Θ(m,k)] and applying the ISTFT. In one embodiment, the value of the modification function M[Θ(m,k)] is the same as the value of the function Θ(m,k). In one alternative embodiment, the value of the modification function M[Θ(m,k)] is not the same as the value of the function Θ(m,k) but is determined by the value of the function Θ(m,k). The same procedure can be applied to signals panned to other directions, with the function Θ(m,k) being defined to equal one when Γ(m,k) is equal to the panning index value associated with the panned source (or a window centered on or otherwise comprising the panning index value associated with the source), and zero (or a small number) for all other values of Γ(m,k). In one embodiment in which the function Θ(m,k) is defined to equal one when Γ(m,k) is a panning index value that falls within a window of panning index values associated with the source, a user interface is provided to enable a user to provide an input to define the size of the window, such as by indicating the value of the window size variable ε in the inequality −ε ≤ Γ(m,k) ≤ ε.

In some embodiments, the width of the panning index window is determined based on the desired trade-off between separation and distortion (a wider window will produce smoother transitions but will allow signal components panned near zero to pass).
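One way to realize the windowed mask Θ(m,k) and the modification function M[Θ(m,k)] described above is a tapered soft mask. The raised-cosine taper, window width ε, and small spectral floor below are illustrative assumptions; the patent leaves the exact shapes of Θ and M open:

```python
import numpy as np

def extract_panned(SL, SR, gamma, center=0.0, eps_win=0.1, floor=0.05):
    """Mask Theta(m,k): 1 at the window center, tapering to 0 at the
    window edges (raised cosine), with a small non-zero floor elsewhere
    to avoid artifacts; M[Theta] is applied to both channel spectra."""
    d = np.abs(gamma - center)
    theta = np.where(d <= eps_win,
                     0.5 * (1.0 + np.cos(np.pi * d / eps_win)), 0.0)
    M = np.maximum(theta, floor)    # modification function M[Theta(m,k)]
    return SL * M, SR * M           # masked spectra; the ISTFT would follow
```

Setting `center` to a non-zero panning index value extracts a source panned away from center, exactly as the text describes for other directions.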
To illustrate the operation of the un-mixing algorithm we performed the following simulation. We generated a stereo mix by amplitude-panning three sources, a speech signal S_1(t), an acoustic guitar S_2(t) and a trumpet S_3(t), with the following weights:

S_L(t) = 0.5 S_1(t) + 0.7 S_2(t) + 0.1 S_3(t)  and  S_R(t) = 0.5 S_1(t) + 0.3 S_2(t) + 0.9 S_3(t).

We applied a window centered at Γ = 0 to extract the center-panned signal, in this case the speech signal, and two windows at Γ = −0.8 and Γ = 0.27 (corresponding to α = 0.1 and α = 0.3) to extract the horn and guitar signals respectively. In this case we know the panning coefficients of the signals that we wish to separate. This scenario corresponds to applications where we wish to extract or separate a signal at a given location.

We now describe a method for identifying amplitude-panned sources in a stereo mix. In one embodiment, the process is to compute the short-time panning index Γ(m,k) and produce an energy histogram by integrating the energy in time-frequency regions with the same (or similar) panning index value. This can be done in running time to detect the presence of a panned signal at a given time interval, or as an average over the duration of the signal. FIG. 3 is a plot of the average energy from an energy histogram over a period of time as a function of Γ for the sample signal described above. The histogram was computed by integrating the energy in both stereo signals for each panning index value from −1 to 1 in 0.01 increments. Notice how the plot shows three very strong peaks at panning index values of Γ = −0.8, 0 and 0.275, which correspond to values of α = 0.1, 0.5 and 0.7 respectively.
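The energy-histogram identification step can be sketched as follows. The default of 200 bins mirrors the 0.01 increments over [−1, 1] described above; everything else is an illustrative assumption:

```python
import numpy as np

def energy_histogram(SL, SR, gamma, n_bins=200):
    """Integrate |S_L|^2 + |S_R|^2 over time-frequency bins that share
    the same quantized panning index value, from -1 to 1."""
    energy = (np.abs(SL) ** 2 + np.abs(SR) ** 2).ravel()
    edges = np.linspace(-1.0, 1.0, n_bins + 1)
    hist, _ = np.histogram(gamma.ravel(), bins=edges, weights=energy)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, hist    # peaks in hist mark prominent panned sources
```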
Once the prominent sources are identified automatically from the peaks in the energy histogram, the techniques described above can be used to extract and synthesize signals that consist primarily of the prominent sources, or, if desired, to extract and synthesize a particular source of interest.

2. Identification and Modification of a Panned Source

In the preceding section, we describe how a prominent panned source may be identified and segregated. In this section, we disclose applying the techniques described above to selectively modify portions of an audio signal associated with a panned source of interest.
FIG. 4 is a flow chart illustrating a process used in one embodiment to identify and modify a panned source in an audio signal. The process begins in step 402, in which portions of the audio signal that are associated with a panned source of interest are identified. In one embodiment, the energy histogram approach described above in connection with FIG. 3 may be used to identify a panned source of interest. In one embodiment, the panning index (or coefficient) of the panned source of interest may be known, determined, or estimated based on knowledge regarding the audio signal and how it was created. For example, in one embodiment it may be assumed that a featured vocal component has been panned to the center.

In step 404, the portions of the audio signal associated with the panned source are modified in accordance with a user input to create a modified audio signal. In one embodiment, the modification performed in step 404 is determined not by a user input but instead by one or more settings established in advance, such as by a sound designer. In one embodiment, the modified audio signal comprises a channel of an input audio signal in which portions associated with the panned source have been modified, e.g., enhanced or suppressed. The modified audio signal is provided as output in step 406.
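The user-controlled modification of step 404 can be sketched as a per-bin gain blended through the identification mask. The dB-to-linear mapping and the linear blend between unity and the target gain are illustrative assumptions, not the patent's prescribed mapping from user input to M[Θ]:

```python
import numpy as np

def modify_panned(SL, SR, theta, gain_db):
    """Scale the portions identified by the mask theta(m,k) by a
    user-selected gain (positive dB enhances, negative dB suppresses),
    leaving unmasked portions of the spectra unchanged."""
    g = 10.0 ** (gain_db / 20.0)
    M = 1.0 + theta * (g - 1.0)   # M = 1 where theta = 0, M = g where theta = 1
    return SL * M, SR * M
```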
FIG. 5 is a block diagram of a system used in one embodiment to identify and m



