United States Patent
US007970144B1

(10) Patent No.: US 7,970,144 B1
(45) Date of Patent: Jun. 28, 2011
Avendano et al.

(54) EXTRACTING AND MODIFYING A PANNED SOURCE FOR
     ENHANCEMENT AND UPMIX OF AUDIO SIGNALS

(75) Inventors: Carlos Avendano, Campbell, CA (US);
     Michael Goodwin, Scotts Valley, CA (US);
     Ramkumar Sridharan, Capitola, CA (US);
     Martin Wolters, Nuremberg (DE);
     Jean-Marc Jot, Aptos, CA (US)

(73) Assignee: Creative Technology Ltd, Singapore (SG)

(*) Notice: Subject to any disclaimer, the term of this patent is
     extended or adjusted under 35 U.S.C. 154(b) by 963 days.

(21) Appl. No.: 10/738,607

(22) Filed: Dec. 17, 2003

(51) Int. Cl.
     H04R 5/00 (2006.01)

(52) U.S. Cl. ........ 381/1; 381/17; 381/61; 381/27; 381/97

(58) Field of Classification Search ........ 381/119, 61, 63, 99, 10,
     17-18, 19, 1-2, 98, 103, 27, 303, 306-307, 309-310; 315/291
     See application file for complete search history.

(56) References Cited

     U.S. PATENT DOCUMENTS

3,697,692 A     10/1972  Hafler
4,024,344 A      5/1977  Dolby et al.
5,666,424 A      9/1997  Fosgate et al.
5,671,287 A      9/1997  Gerzon
5,872,851 A      2/1999  Petroff
5,878,389 A      3/1999  Hermansky et al.
5,886,276 A      3/1999  Levine et al.
5,…       A  *      …    Davis et al.
5,…       A         …    Iijima et al.
5,953,696 A      9/1999  Nishiguchi et al.
6,011,851 A  *   1/2000  Connor et al. .............. 381/17
6,021,386 A      2/2000  Davis et al.
6,098,038 A      8/2000  Hermansky et al.
6,405,…   B1 *      …    Laroche ....................... …
6,430,528 B1     8/2002  Jourjine et al.
6,449,368 B1     9/2002  Davis et al.
6,473,733 B1    10/2002  McArthur et al.
6,570,991 B1     5/2003  Scheirer et al.
6,766,028 B1 *   7/2004  Dickens .................... 381/310
6,792,118 B2     9/2004  Watts
6,917,686 B2     7/2005  Jot et al.
7,006,636 B2     2/2006  Baumgarte et al.
7,039,204 B2 *   5/2006  Baumgarte .................. 381/119

(Continued)

     FOREIGN PATENT DOCUMENTS

WO WO/01/24577  4/2001

     OTHER PUBLICATIONS

Carlos Avendano and Jean-Marc Jot: Ambience Extraction and Syn-
thesis from Stereo Signals for Multi-Channel Audio Up-Mix; vol. 11,
1957-1960; © 2002 IEEE.

(Continued)

Primary Examiner — Devona B Faulk
Assistant Examiner — Disler Paul
(57) ABSTRACT

Modifying a panned source in an audio signal comprising a
plurality of channel signals is disclosed. Portions associated
with the panned source are identified in at least selected ones
of the channel signals. The identified portions are modified
based at least in part on a user input.

31 Claims, 13 Drawing Sheets
[Cover drawing: block diagram corresponding to FIG. 9A]

Samsung v. Zophonos
IPR2026-00083
Exhibit 1022
Page 01 of 23
US 7,970,144 B1, Page 2

     U.S. PATENT DOCUMENTS

7,076,071    B2        7/2006  Katz
7,257,231    B1        8/2007  Avendano et al.
7,272,556    B1 *      9/2007  Aguilar et al. ............. 704/230
7,277,550    B1       10/2007  Avendano et al.
7,353,169    B1        4/2008  Goodwin et al.
7,412,380    B1        8/2008  Avendano et al.
7,567,845    B1        7/2009  Avendano et al.
2002/0054685 A1 *      5/2002  Avendano et al. ............ 381/66
2002/0094795 A1        7/2002  Mitzlaff
2002/0136412 A1        9/2002  Sugimoto
2002/0154783 A1       10/2002  Fincham
2003/0026441 A1        2/2003  Faller
2003/0174845 A1 *      9/2003  Hagiwara ................... 381/17
2003/0233158 A1 *     12/2003  Aiso et al. ................ 700/94
2004/0044525 A1        3/2004  Vinton et al.
2004/0122662 A1        6/2004  Crockett
2004/0196988 A1 *     10/2004  Moulios et al. ............. 381/119
2004/0212320 A1 *     10/2004  Dowling et al. ............. 315/291
2007/0041592 A1        2/2007  Avendano et al.
     OTHER PUBLICATIONS

Jean-Marc Jot and Carlos Avendano: Spatial Enhancement of Audio
Recordings; AES 23rd International Conference, Copenhagen, Den-
mark, May 23-25, 2003.

Carlos Avendano: Frequency-Domain Source Identification and
Manipulation in Stereo Mixes for Enhancement, Suppression and
Re-Panning Applications; 2003 IEEE Workshop on Applications of
Signal Processing to Audio and Acoustics; Oct. 19-22, 2003, New
Paltz, NY.

Eric Lindemann: Two Microphone Nonlinear Frequency Domain
Beamformer for Hearing Aid Noise Reduction; Applications of Signal
Processing to Audio and Acoustics, Oct. 15-18, 1995, pp. 24-27, New
Paltz, NY.

U.S. Appl. No. 10/163,158, filed Jun. 4, 2002, Avendano et al.

U.S. Appl. No. 10/163,168, filed Jun. 4, 2002, Avendano et al.

Allen et al., "Multimicrophone signal-processing technique to
remove room reverberation from speech signals," J. Acoust. Soc.
Am., vol. 62, No. 4, Oct. 1977, pp. 912-915.

Baumgarte, Frank, et al., "Estimation of Auditory Spatial Cues for
Binaural Cue Coding," IEEE Int'l. Conf. on Acoustics, Speech and
Signal Processing, May 2000.

Begault, Durand R., "3-D Sound for Virtual Reality and Multimedia,"
AP Professional, pp. 226-229.

Blauert, Jens, "Spatial Hearing: The Psychophysics of Human Sound
Localization," The MIT Press, pp. 238-257.

Dressler, Roger, "Dolby Surround Pro Logic I Decoder Principles of
Operation," Dolby Laboratories, Inc., 100 Potrero Ave., San Fran-
cisco, CA 94103.

Faller, Christof, et al., "Binaural Cue Coding: A Novel and Efficient
Representation of Spatial Audio," IEEE Int'l. Conf. on Acoustics,
Speech & Signal Processing, May 2002.

Gerzon, Michael A., "Optimum Reproduction Matrices for
Multispeaker Stereo," J. Audio Eng. Soc., vol. 40, No. 7/8, Jul./Aug.
1992.

Holman, Tomlinson, "Mixing the Sound," Surround Magazine, pp.
35-37, Jun. 2001.

Jot, Jean-Marc, et al., "A Comparative Study of 3-D Audio Encoding
and Rendering Techniques," AES 16th Int'l. Conf. on Spatial Sound
Reproduction, Rovaniemi, Finland, 1999.

Kyriakakis, C., et al., "Virtual Microphone for Multichannel Audio
Applications," in Proc. IEEE ICME 2000, vol. 1, pp. 11-14, Aug.
2000.

Miles, Michael T., "An Optimum Linear-Matrix Stereo Imaging Sys-
tem," AES 101st Convention, 1996, preprint 4364 (J-4).

Pulkki, Ville, et al., "Localization of Amplitude-Panned Virtual
Sources I: Stereophonic Panning," J. Audio Eng. Soc., vol. 49, No. 9,
Sep. 2002.

Rumsey, Francis, "Controlled Subjective Assessments of Two-to-
Five-Channel Surround Sound Processing Algorithms," J. Audio
Eng. Soc., vol. 47, No. 7/8, Jul./Aug. 1999.

Schroeder, Manfred R., "An Artificial Stereophonic Effect Obtained
from a Single Audio Signal," Journal of the Audio Engineering
Society, vol. 6, pp. 74-79, Apr. 1958.

Jourjine et al., "Blind Separation of Disjoint Orthogonal Signals:
Demixing N Sources from 2 Mixtures," IEEE International Confer-
ence on Acoustics, Speech and Signal Processing, vol. 5, pp. 2985-
2988, Apr. 2000.

Boll, Steven F., "Suppression of Acoustic Noise in Speech Using
Spectral Subtraction," IEEE Transactions on Acoustics, Speech and
Signal Processing, vol. ASSP-27, No. 2, Apr. 1979, pp. 113-120.

Bosi, Marina, et al., "ISO/IEC MPEG-2 Advanced Audio Coding,"
AES 101st Convention, Los Angeles, Nov. 1996; J. Audio Eng. Soc.,
vol. 45, No. 10, Oct. 1997.

Duxbury, Chris, et al., "Separation of Transient Information in Musi-
cal Audio Using Multiresolution Analysis Techniques," Proceedings
of the COST G-6 Conference on Digital Audio Effects (DAFX-01),
Dec. 2001.

Levine, Scott N., et al., "Improvements to the Switched Parametric
and Transform Audio Coder," Proceedings of the IEEE Workshop on
Applications of Signal Processing to Audio and Acoustics, Oct. 1999,
pp. 43-46.

Pan, Davis, "A Tutorial on MPEG/Audio Compression," IEEE
MultiMedia, Summer 1995.

Quatieri, T.F., et al., "Speech Enhancement Based on Auditory Spec-
tral Change," Proceedings of the IEEE Workshop on Applications of
Signal Processing to Audio and Acoustics, Oct. 1999, pp. 43-46.

Baumgarte et al., "Estimation of Auditory Spatial Cues for Binaural
Cue Coding," IEEE International Conference on Acoustics, Speech
and Signal Processing, May 2002.

* cited by examiner

[Sheet 1 of 13: FIGS. 1A-1D, plots of the panning function and panning index versus the panning coefficient α]

[Sheet 2 of 13: FIG. 2, block diagram of a system for extracting a signal panned in a particular direction from the left and right channels of a stereo signal]

[Sheet 3 of 13: FIG. 3, plot of average energy in dB versus panning index Γ]

[Sheet 4 of 13: FIG. 4, flow chart: identify portions of a received audio signal that are associated with a panned source of interest (402); modify the panned source in accordance with a user input to create a modified audio signal (404); provide the modified audio signal as output (406)]

[Sheet 5 of 13: FIG. 5, block diagram of a system for identifying and modifying a panned source]

[Sheet 6 of 13: FIG. 6, block diagram of the system of FIG. 5 with transient analysis T(m) incorporated]

[Sheet 7 of 13: FIG. 7, block diagram of a system for extracting and modifying a panned source]

[Sheet 8 of 13: FIG. 8, block diagram of the system of FIG. 7 with transient analysis and gain determination incorporated]

[Sheet 9 of 13: FIG. 9A, block diagram of an alternative system for extracting and modifying a panned source]

[Sheet 10 of 13: FIG. 9B, alternative arrangement for extracting the phase information]

[Sheet 11 of 13: FIG. 10, block diagram of a simplified implementation using an intermediate modification factor]

[Sheet 12 of 13: FIG. 11, block diagram of a system for modification and upmix of a multichannel signal]

[Sheet 13 of 13: FIG. 12, user interface with a control ranging from MAX SUPPRESS VOCAL (1210) to MAX ENHANCE (1206)]

EXTRACTING AND MODIFYING A PANNED
SOURCE FOR ENHANCEMENT AND UPMIX
OF AUDIO SIGNALS

INCORPORATION BY REFERENCE

U.S. patent application Ser. No. 10/163,158, entitled
Ambience Generation for Stereo Signals, filed Jun. 4, 2002,
now U.S. Pat. No. 7,567,845 B1, is incorporated herein by
reference for all purposes. U.S. patent application Ser. No.
10/163,168, entitled Stream Segregation for Stereo Signals,
filed Jun. 4, 2002, now U.S. Pat. No. 7,257,231, is incorpo-
rated herein by reference for all purposes.

U.S. patent application Ser. No. 10/738,361, entitled
Ambience Extraction and Modification for Enhancement and
Upmix of Audio Signals, filed Dec. 17, 2003, now U.S. Pat.
No. 7,412,380, is incorporated herein by reference for all
purposes.

FIELD OF THE INVENTION

The present invention relates generally to digital signal
processing. More specifically, extracting and modifying a
panned source for enhancement and upmix of audio signals is
disclosed.

BACKGROUND OF THE INVENTION

Stereo recordings and other multichannel audio signals
may comprise one or more components designed to give a
listener the sense that a particular source of sound is posi-
tioned at a particular location relative to the listener. For
example, in the case of a stereo recording made in a studio, the
recording engineer might mix the left and right signal so as to
give the listener a sense that a particular source recorded in
isolation of other sources is located at some angle off the axis
between the left and right speakers. The term "panning" is
often used to describe such techniques, and a source panned
to a particular location relative to a listener located at a certain
spot equidistant from both the left and right speakers (and/or
other or different speakers in the case of audio signals other
than stereo signals) will be referred to herein as a "panned
source".

A special case of a panned source is a source panned to the
center. Vocal components of music recordings, for example,
typically are center-panned, to give a listener a sense that the
singer or speaker is located in the center of a virtual stage
defined by the left and right speakers. Other sources might be
panned to other locations to the left or right of center.

The level of a panned source relative to the overall signal is
determined in the case of a studio recording by a sound
engineer and in the case of a live recording by such factors as
the location of each source in relation to the microphones
used to make the recording, the equipment used, the charac-
teristics of the venue, etc. An individual listener, however,
may prefer that a particular panned source have a level rela-
tive to the rest of the audio signal that is different (higher or
lower) than the level it has in the original audio signal. There-
fore, there is a need for a way to allow a user to control the
level of a panned source in an audio signal.

As noted above, vocal components typically are panned to
the center. However, other sources, e.g., percussion instru-
ments, also typically may be panned to the center. A listener
may wish to modify (e.g., enhance or suppress) a center-
panned vocal component without modifying other center-
panned sources at the same time. Therefore, there is a need for
a way to isolate a center-panned vocal component from other
sources, such as percussion instruments, that may be panned
to the center.

Finally, listeners with surround sound systems of various
configurations (e.g., five speaker, seven speaker, etc.) may
desire a way to "upmix" a received audio signal, if necessary,
to make use of the full capabilities of their playback system.
For example, a user may wish to generate an audio signal for
a playback channel by extracting a panned source from one or
more channels of an input audio signal and providing the
extracted component to the playback channel. A user might
want to extract a center-panned vocal component, for
example, and provide the vocal component as a generated
signal for the center playback channel. Some users may wish
to generate such a signal regardless of whether the received
audio signal has a corresponding channel. In such embodi-
ments, listeners further need a way to control the level of the
panned source signal generated for such channels in accor-
dance with their individual preferences.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be readily understood by the
following detailed description in conjunction with the accom-
panying drawings, wherein like reference numerals designate
like structural elements, and in which:

FIG. 1A is a plot of this panning function as a function of
the panning coefficient α in an embodiment in which β=1-α.

FIG. 1B is a plot of this panning index as a function of α in
an embodiment in which β=1-α.

FIG. 1C is a plot of the panning function ψ(m,k) as a
function of α in an embodiment in which β=(1-α^2)^1/2.

FIG. 1D is a plot of the panning index in (5) as a function
of α in an embodiment in which β=(1-α^2)^1/2.

FIG. 2 is a block diagram illustrating a system used in one
embodiment to extract from a stereo signal a signal panned in
a particular direction.

FIG. 3 is a plot of the average energy from an energy
histogram over a period of time as a function of Γ for the
sample signal described above.

FIG. 4 is a flow chart illustrating a process used in one
embodiment to identify and modify a panned source in an
audio signal.

FIG. 5 is a block diagram of a system used in one embodi-
ment to identify and modify a panned source in an audio
signal.

FIG. 6 is a block diagram of a system used in one embodi-
ment to identify and modify a panned source in an audio
signal, in which transient analysis has been incorporated.

FIG. 7 is a block diagram of a system used in one embodi-
ment to extract and modify a panned source.

FIG. 8 is a block diagram of a system used in one embodi-
ment to extract and modify a panned source, in which tran-
sient analysis has been incorporated.

FIG. 9A is a block diagram of an alternative system used in
one embodiment to extract and modify a panned source.

FIG. 9B illustrates an alternative and computationally
more efficient approach for extracting the phase information
in a system such as system 900 of FIG. 9A.

FIG. 10 is a block diagram of a system used in one embodi-
ment to extract and modify a panned source using a simplified
implementation of the approach used in the system 900 of
FIG. 9A.

FIG. 11 is a block diagram of a system used in one embodi-
ment to extract and modify a panned source for enhancement
of a multichannel audio signal.
FIG. 12 illustrates a user interface provided in one embodi-
ment to enable a user to indicate a desired level of modifica-
tion of a panned source.

DETAILED DESCRIPTION

It should be appreciated that the present invention can be
implemented in numerous ways, including as a process, an
apparatus, a system, or a computer readable medium such as
a computer readable storage medium or a computer network
wherein program instructions are sent over optical or elec-
tronic communication links. It should be noted that the order
of the steps of disclosed processes may be altered within the
scope of the invention.

A detailed description of one or more preferred embodi-
ments of the invention is provided below along with accom-
panying figures that illustrate by way of example the prin-
ciples of the invention. While the invention is described in
connection with such embodiments, it should be understood
that the invention is not limited to any embodiment. On the
contrary, the scope of the invention is limited only by the
appended claims and the invention encompasses numerous
alternatives, modifications and equivalents. For the purpose
of example, numerous specific details are set forth in the
following description in order to provide a thorough under-
standing of the present invention. The present invention may
be practiced according to the claims without some or all of
these specific details. For the purpose of clarity, technical
material that is known in the technical fields related to the
invention has not been described in detail so that the present
invention is not unnecessarily obscured.

Extracting and modifying a panned source for enhance-
ment and upmix of audio signals is disclosed. In one embodi-
ment, a panned source is identified in an audio signal and
portions of the audio signal associated with the panned source
are modified, such as by enhancing or suppressing such por-
tions relative to other portions of the signal. In one embodi-
ment, a panned source is identified and extracted, and a user-
controlled modification is applied to the panned source prior
to routing the modified panned source as a generated signal
for an appropriate channel of a multichannel playback sys-
tem, such as a surround sound system. In one embodiment, a
center-panned vocal component is distinguished from certain
other sources that may also be panned to the center by incor-
porating transient analysis. These and other embodiments are
described more fully below.

As used herein, the term "audio signal" comprises any set
of audio data susceptible to being rendered via a playback
system, including without limitation a signal received via a
network or wireless communication, a live feed received in
real-time from a local and/or remote location, and/or a signal
generated by a playback system or component by reading
data stored on a storage device, such as a sound recording
stored on a compact disc, magnetic tape, flash or other
memory device, or any type of media that may be used to store
audio data, and may include without limitation a mono, ste-
reo, or multichannel audio signal including any number of
channel signals.

1. Identifying and Extracting a Panned Source

In this section we describe a metric used to compare two
complementary channels of a multichannel audio signal, such
as the left and right channels of a stereo signal. This metric
allows us to estimate the panning coefficients, via a panning
index, of the different sources in the stereo mix. Let us start by
defining our signal model. We assume that the stereo record-
ing consists of multiple sources that are panned in amplitude.
The stereo signal with N amplitude-panned sources can be
written as

S_L(t) = Σ_i β_i S_i(t) and S_R(t) = Σ_i α_i S_i(t), for i = 1, . . . , N,   (1)

where α_i are the panning coefficients and β_i are factors
derived from the panning coefficients. In one embodiment,
β_i = (1-α_i^2)^1/2, which preserves the energy of each source. In
one embodiment, β_i = 1-α_i. Since the time-domain signals
corresponding to the sources overlap in amplitude, it is very
difficult (if not impossible) to determine in the time domain
which portions of the signal correspond to a given source, not
to mention the difficulty in estimating the corresponding pan-
ning coefficients. However, if we transform the signals using
the short-time Fourier transform (STFT), we can look at the
signals in different frequencies at different instants in time,
thus making the task of estimating the panning coefficients
less difficult.
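The mixing model in equation (1) is simple to simulate. The sketch below is illustrative only (NumPy and the function name are my choices, not the patent's), using the energy-preserving factor β_i = (1-α_i^2)^1/2 described above.

```python
import numpy as np

def pan_sources(sources, alphas):
    """Build the stereo pair of equation (1): S_L = sum_i beta_i * S_i,
    S_R = sum_i alpha_i * S_i, with beta_i = sqrt(1 - alpha_i^2)."""
    left = np.zeros_like(sources[0], dtype=float)
    right = np.zeros_like(sources[0], dtype=float)
    for s, a in zip(sources, alphas):
        b = np.sqrt(1.0 - a * a)   # beta_i, preserves each source's energy
        left += b * s
        right += a * s
    return left, right
```

With this choice of β_i, the total energy of the stereo pair equals the energy of the mono source, since β_i^2 + α_i^2 = 1.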
In one embodiment, the left and right channel signals are
compared in the STFT domain using an instantaneous corre-
lation, or similarity measure. The proposed short-time simi-
larity can be written as

ψ(m,k) = 2|S_L(m,k) S_R*(m,k)| [|S_L(m,k)|^2 + |S_R(m,k)|^2]^-1.   (2)

We also define two partial similarity functions that will
become useful later on:

ψ_L(m,k) = |S_L(m,k) S_R*(m,k)| |S_L(m,k)|^-2   (2a)

ψ_R(m,k) = |S_R(m,k) S_L*(m,k)| |S_R(m,k)|^-2   (2b)

In other embodiments, other similarity functions may be
used.
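On complex STFT arrays, the similarity (2) and the partials (2a) and (2b) reduce to a few element-wise operations. The sketch below follows the equations as reconstructed above; the small eps guard for silent bins is an implementation choice, not part of the text.

```python
import numpy as np

def similarity(SL, SR):
    """Short-time similarity (2) and partial similarities (2a), (2b).

    SL, SR are the complex STFTs S_L(m,k), S_R(m,k)."""
    eps = 1e-12
    cross = np.abs(SL * np.conj(SR))
    psi = 2.0 * cross / (np.abs(SL) ** 2 + np.abs(SR) ** 2 + eps)   # eq. (2)
    psi_L = cross / (np.abs(SL) ** 2 + eps)                         # eq. (2a)
    psi_R = cross / (np.abs(SR) ** 2 + eps)                         # eq. (2b)
    return psi, psi_L, psi_R
```

For a single amplitude-panned source this gives ψ = 2αβ/(α^2 + β^2), which is 1 for a center-panned source, as the next paragraph derives.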
The similarity in (2) has the following important proper-
ties. If we assume that only one amplitude-panned source is
present, then the function will have a value proportional to the
panning coefficient at those time/frequency regions where the
source has some energy, i.e.

ψ(m,k) = 2|αS(m,k)||βS(m,k)| [|αS(m,k)|^2 + |βS(m,k)|^2]^-1 = 2αβ(α^2 + β^2)^-1.

If the source is center-panned (α=β), then the function will
attain its maximum value of one, and if the source is panned
completely to one side, the function will attain its minimum
value of zero. In other words, the function is bounded. Given
its properties, this function allows us to identify and separate
time-frequency regions with similar panning coefficients. For
example, by segregating time-frequency bins with a given
similarity value we can generate a new short-time transform
signal, which upon reconstruction will produce a time-do-
main signal with an individual source (if only one source was
panned in that location).

FIG. 1A is a plot of this panning function as a function of
the panning coefficient α in an embodiment in which β=1-α.
Notice that given the quadratic dependence on α, the function
ψ(m,k) is multi-valued and symmetrical about 0.5. That is, if
a source is panned say at α=0.2, then the similarity function
will have a value of ψ=0.47, but a source panned at α=0.8 will
have the same similarity value.

While this ambiguity might appear to be a disadvantage for
source localization and segregation, it can easily be resolved
using the difference between the partial similarity measures
in (2). The difference is computed simply as

D(m,k) = ψ_L(m,k) - ψ_R(m,k),   (3)

and we notice that time-frequency regions with positive val-
ues of D(m,k) correspond to signals panned to the left (i.e.
α<0.5), and negative values correspond to signals panned to
the right (i.e. α>0.5). Regions with zero value correspond to
non-overlapping regions of signals panned to the center. Thus
we can define an ambiguity-resolving function as

D'(m,k) = 1 if D(m,k) > 0   (4)

and

D'(m,k) = -1 if D(m,k) <= 0.

Multiplying the quantity one minus the similarity function
by D'(m,k) we obtain a new metric, referred to herein as a
panning index, which is anti-symmetrical and still bounded
but whose values now vary from one to minus one as a
function of the panning coefficient, i.e.

Γ(m,k) = [1 - ψ(m,k)] D'(m,k).   (5)

FIG. 1B is a plot of this panning index as a function of α in
an embodiment in which β=1-α. FIG. 1C is a plot of the
panning function ψ(m,k) as a function of α in an embodiment
in which β=(1-α^2)^1/2. FIG. 1D is a plot of the panning index
in (5) as a function of α in an embodiment in which
β=(1-α^2)^1/2.

In the following sections we describe the application of the
short-time similarity and panning index to upmix, unmix, and
source identification (localization). Notice that given a
panning index we can obtain the corresponding panning
coefficient given the one-to-one correspondence of the
functions.
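Equations (3) through (5) can be combined into one routine. The sketch below follows the reconstructed equations; the eps guard is an implementation detail. With the partials as given in (2a) and (2b), a source at α=0.1 (β=1-α=0.9) yields Γ close to -0.8, matching the histogram peak reported later in the text.

```python
import numpy as np

def panning_index(SL, SR):
    """Panning index of equation (5): Gamma = [1 - psi(m,k)] D'(m,k),
    where D (eq. 3) is the difference of the partial similarities and
    D' (eq. 4) is +1 where D > 0 and -1 where D <= 0."""
    eps = 1e-12
    cross = np.abs(SL * np.conj(SR))
    psi = 2.0 * cross / (np.abs(SL) ** 2 + np.abs(SR) ** 2 + eps)
    D = cross / (np.abs(SL) ** 2 + eps) - cross / (np.abs(SR) ** 2 + eps)
    return (1.0 - psi) * np.where(D > 0, 1.0, -1.0)
```

The sign convention here follows directly from the reconstructed (2a)/(2b); a center-panned source (α=β) maps to Γ=0 since ψ=1 there.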
The above concepts and equations are applied in one
embodiment to extract one or more audio streams comprising
a panned source from a two-channel signal by selecting direc-
tions in the stereo image. As we discussed above, the panning
index in (5) can be used to estimate the panning coefficient of
an amplitude-panned signal. If multiple panned signals are
present in the mix and if we assume that the signals do not
overlap significantly in the time-frequency domain, then the
panning index Γ(m,k) will have different values in different
time-frequency regions corresponding to the panning coeffi-
cients of the signals that dominate those regions. Thus, the
signals can be separated by grouping the time-frequency
regions where Γ(m,k) has a given value and using these
regions to synthesize time-domain signals.

FIG. 2 is a block diagram illustrating a system used in one
embodiment to extract from a stereo signal a signal panned in
a particular direction. For example, in one embodiment to
extract the center-panned signal(s) we find all time-frequency
regions for which the panning index Γ(m,k) is zero and define
a function Θ(m,k) that is one for all Γ(m,k)=0, and zero (or, in
one embodiment, a small non-zero number, to avoid artifacts)
otherwise. In one variation on this approach, we find all
time-frequency regions for which the panning index Γ(m,k)
falls within a window centered on zero (e.g., all regions for
which -ε <= Γ(m,k) <= ε) and define a function Θ(m,k) that is
one for all regions having a panning index that falls in the
window and zero (or, in one embodiment, a small non-zero
number, to avoid artifacts) otherwise. In some alternative
embodiments, the value of the function Θ(m,k) is one for all
regions having a panning index equal to zero and a value less
than one and greater than or equal to zero for regions having a
panning index that falls within the window, depending on the
value, such that for panning index values close to zero (or the
non-zero center of the window, for a window not centered on
zero) the value of Θ(m,k) is close to one and for panning index
values at the edges of the window (e.g., Γ(m,k)=ε or -ε) the
value of Θ(m,k) is close to zero. We can then synthesize a
time-domain function by multiplying S_L(m,k) and S_R(m,k)
by a modification function M[Θ(m,k)] and applying the
ISTFT. In one embodiment, the value of the modification
function M[Θ(m,k)] is the same as the value of the function
Θ(m,k). In one alternative embodiment, the value of the
modification function M[Θ(m,k)] is not the same as the value
of the function Θ(m,k) but is determined by the value of the
function Θ(m,k). The same procedure can be applied to sig-
nals panned to other directions, with the function Θ(m,k)
being defined to equal one when Γ(m,k) is equal to the pan-
ning index value associated with the panned source (or a
window centered on or otherwise comprising the panning
index value associated with the source), and zero (or a small
number) for all other values of Γ(m,k). In one embodiment in
which the function Θ(m,k) is defined to equal one when
Γ(m,k) is a panning index value that falls within a window of
panning index values associated with the source, a user inter-
face is provided to enable a user to provide an input to define
the size of the window, such as by indicating the value of the
window size variable ε in the inequality -ε <= Γ(m,k) <= ε.

In some embodiments, the width of the panning index
window is determined based on the desired trade-off between
separation and distortion (a wider window will produce
smoother transitions but will allow signal components
panned near zero to pass).
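The windowed Θ(m,k) described above can be sketched as a soft mask over the panning index; multiplying S_L and S_R by M[Θ] and inverting the STFT then yields the extracted source. The linear taper and the floor value below are illustrative assumptions, standing in for the "small non-zero number" that avoids artifacts.

```python
import numpy as np

def extraction_mask(Gamma, center=0.0, eps=0.2, floor=0.01):
    """Soft window on the panning index: close to 1 where Gamma is
    within +/- eps of `center`, tapering linearly down to a small
    floor outside the window."""
    taper = np.clip(1.0 - np.abs(Gamma - center) / eps, 0.0, 1.0)
    return floor + (1.0 - floor) * taper
```

A wider eps trades separation for smoothness, exactly the trade-off noted in the paragraph above.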
To illustrate the operation of the un-mixing algorithm we
performed the following simulation. We generated a stereo
mix by amplitude-panning three sources, a speech signal
S_1(t), an acoustic guitar S_2(t) and a trumpet S_3(t), with the
following weights:

S_L(t) = 0.5 S_1(t) + 0.7 S_2(t) + 0.1 S_3(t) and
S_R(t) = 0.5 S_1(t) + 0.3 S_2(t) + 0.9 S_3(t).

We applied a window centered at Γ=0 to extract the center-
panned signal, in this case the speech signal, and two win-
dows at Γ=-0.8 and Γ=0.27 (corresponding to α=0.1 and
α=0.3) to extract the horn and guitar signals respectively. In
this case we know the panning coefficients of the signals that
we wish to separate. This scenario corresponds to applica-
tions where we wish to extract or separate a signal at a given
location.

We now describe a method for identifying amplitude-
panned sources in a stereo mix. In one embodiment, the
process is to compute the short-time panning index Γ(m,k)
and produce an energy histogram by integrating the energy in
time-frequency regions with the same (or similar) panning
index value. This can be done in running time to detect the
presence of a panned signal at a given time interval, or as an
average over the duration of the signal. FIG. 3 is a plot of the
average energy from an energy histogram over a period of
time as a function of Γ for the sample signal described above.
The histogram was computed by integrating the energy in
both stereo signals for each panning index value from -1 to 1
in 0.01 increments. Notice how the plot shows three very
strong peaks at panning index values of Γ=-0.8, 0 and 0.275,
which correspond to values of α=0.1, 0.5 and 0.7 respectively.

Once the prominent sources are identified automatically
from the peaks in the energy histogram, the techniques
described above can be used to extract and synthesize signals
that consist primarily of the prominent sources, or if desired
to extract and synthesize a particular source of interest.
`
2. Identification and Modification of a Panned Source

In the preceding section, we describe how a prominent panned source may be identified and segregated. In this section, we disclose applying the techniques described above to selectively modify portions of an audio signal associated with a panned source of interest.

FIG. 4 is a flow chart illustrating a process used in one embodiment to identify and modify a panned source in an audio signal. The process begins in step 402, in which portions of the audio signal that are associated with a panned source of interest are identified. In one embodiment, the energy histogram approach described above in connection with FIG. 3 may be used to identify a panned source of interest. In one embodiment, the panning index (or coefficient) of the panned source of interest may be known, determined, or estimated based on knowledge regarding the audio signal and how it was created. For example, in one embodiment it may be assumed that a featured vocal component has been panned to the center.

In step 404, the portions of the audio signal associated with the panned source are modified in accordance with a user input to create a modified audio signal. In one embodiment, the modification performed in step 404 is determined not by a user input but instead by one or more settings established in advance, such as by a sound designer. In one embodiment, the modified audio signal comprises a channel of an input audio signal in which portions associated with the panned source have been modified, e.g., enhanced or suppressed. The modified audio signal is provided as output in step 406.

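Steps 402 through 406 can be sketched as a single per-bin gain applied around the panning index of the source of interest. As before, scipy's STFT, the Gaussian gain profile, the sign convention for Γ, and all names here are illustrative assumptions rather than the patent's own implementation:

```python
import numpy as np
from scipy.signal import stft, istft

def modify_panned_source(left, right, gamma0, gain, width=0.1, nperseg=1024):
    """Scale components panned near index gamma0 by `gain`:
    gain > 1 enhances the source, gain < 1 suppresses it."""
    _, _, L = stft(left, nperseg=nperseg)
    _, _, R = stft(right, nperseg=nperseg)

    eps = 1e-12
    psi = 2 * np.abs(L * np.conj(R)) / (np.abs(L) ** 2 + np.abs(R) ** 2 + eps)
    gamma = (1 - psi) * np.sign(np.abs(L) ** 2 - np.abs(R) ** 2)

    # Per-bin gain: unity far from gamma0, ramping to `gain` at gamma0.
    g = 1.0 + (gain - 1.0) * np.exp(-0.5 * ((gamma - gamma0) / width) ** 2)

    _, out_left = istft(g * L, nperseg=nperseg)
    _, out_right = istft(g * R, nperseg=nperseg)
    return out_left, out_right
```

For instance, `modify_panned_source(s_left, s_right, 0.0, gain=2.0)` would enhance a center-panned vocal, while `gain=0.0` would suppress it.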
FIG. 5 is a block diagram of a system used in one embodiment to identify and modify a panned source.
