GOOGLE EXHIBIT 1010

[Cover]

PROCEEDINGS OF THE IEEE
THE INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS, INC.
SEPTEMBER 1997

AUTOMATED BIOMETRICS

Papers on:
+ Fingerprint Identity Authentication
+ Fingerprint Features
+ Speaker Recognition
+ Evaluation of Identification and Verification Systems

1915 Classic Paper: "The Pure Electron Discharge and Its Applications in Radio Telegraphy and Telephony" by Irving Langmuir

Scanning the Past: Ralph Bown & The Golden Age of Propagation Research

PROCEEDINGS OF THE IEEE
Published monthly by the Institute of Electrical and Electronics Engineers, Inc.
September 1997, Vol. 85, No. 9

SPECIAL ISSUE ON AUTOMATED BIOMETRICS
Edited by Weicheng Shen and Rajiv Khanna

1343  Scanning the Special Issue on Automated Biometrics, W. Shen and R. Khanna

PAPERS

1347  Prolog, W. Shen and R. Khanna
1348  Iris Recognition: An Emerging Biometric Technology, R. P. Wildes
1364  Prolog, W. Shen and R. Khanna
1365  An Identity-Authentication System Using Fingerprints, A. K. Jain, L. Hong, S. Pankanti, and R. Bolle
1389  Prolog, W. Shen and R. Khanna
1390  Fingerprint Features—Statistical Analysis and System Performance Estimates (Invited Paper), A. R. Roddy and J. D. Stosz
1422  Prolog, W. Shen and R. Khanna
1423  Face Recognition: Eigenface, Elastic Matching, and Neural Nets (Invited Paper), J. Zhang, Y. Yan, and M. Lades
1436  Prolog, W. Shen and R. Khanna
1437  Speaker Recognition: A Tutorial (Invited Paper), J. P. Campbell, Jr.
1463  Prolog, W. Shen and R. Khanna
1464  Evaluation of Automated Biometrics-Based Identification and Verification Systems, W. Shen, M. Surette, and R. Khanna
1479  Prolog, W. Shen, R. Khanna, and J. D. Woodward
1480  Biometrics: Privacy's Foe or Privacy's Friend? J. D. Woodward
1493  Comments on "The Pure Electron Discharge and Its Applications in Radio Telegraphy and Telephony" (Invited Paper), C. K. Birdsall
1496  The Pure Electron Discharge and Its Applications in Radio Telegraphy and Telephony (Classic Paper), I. Langmuir

BOOK REVIEWS

1509  The Balanced Scorecard: Translating Strategy into Action by R. S. Kaplan and D. P. Norton, Reviewed by R. C. Dorf and M. Rattanen

SCANNING THE PAST

1511  Ralph Bown and the Golden Age of Propagation Research, J. E. Brittain
1514  FUTURE SPECIAL ISSUES/SPECIAL SECTIONS OF THE PROCEEDINGS
1516  PROCEEDINGS CLASSIC PAPER REPRINT SCHEDULE

PROCEEDINGS OF THE IEEE
1997 EDITORIAL BOARD
Richard B. Fair, Editor
James E. Brittain, Associate Editor, History

Winser E. Alexander, Roger Barr, Albert Benveniste, G. M. Borsuk, Bimal K. Bose, Lawrence Carin, Giovanni De Micheli, Per Enge, E. K. Gannett, Erol Gelenbe, T. G. Giallorenzi, J. D. Gibson, Bijan Jabbari, Dwight L. Jaggard, Peter Kaiser, Sung-Mo (Steve) Kang, Murat Kunt, Chen-Ching Liu, Massimo Maresca, K. W. Martin, Yale N. Patt, Theo Pavlidis, P. B. Schneck, Marwan Simaan, L. M. Terman, Fawwaz T. Ulaby, Paul P. Wang, H. R. Wittmann, Francis T. S. Yu

1997 IEEE PUBLICATIONS BOARD
Friedolf Smits, Chair
Tariq S. Durrani, Vice Chair

Frederick T. Andrews, David Daut, Kenneth Dawson, Stephen L. Diamond, Richard B. Fair, Gregg Gibson, Roger Hoyt, W. Dexter Johnston, Jr., Marcel Keschner, Deborah Flaherty Kizer, Prasad Kodali, Frank Lord, William Middleton, Robert T. Nash, Charles Robinson, Allan C. Schell, Steven Unger, George W. Zobrist

IEEE STAFF
Daniel J. Senese, Executive Director

STAFF EXECUTIVES
Anthony J. Ferraro, Publications
Richard D. Schwartz, Business Administration

MANAGING DIRECTORS
Donald Curtis, Human Resources
Cecelia Jankowski, Regional Activities
Peter A. Lewis, Educational Activities
Andrew G. Salem, Standards Activities
W. Thomas Suttle, Professional Activities
John Witsken, Information Technology

PUBLICATIONS DIRECTORS
Kenneth Moore, IEEE Press
Lewis Moore, Publications Administration
Fran Zappulla, Staff Director, IEEE Periodicals

PROCEEDINGS STAFF
Jim Calder, Managing Editor
Margery Scanlon, Editorial Coordinator
Gail S. Ferenc, Transactions Manager
Valerie Cammarata, Editorial Manager
Geraldine E. Krolin, Managing Editor, TRANSACTIONS/JOURNALS
Tonya Ugoretz Buzby, Associate Editor
Frank Caruthers, Jim Esch, Howard Falk, Richard A. O'Donnell, Kevin Self, George Likourezos, Contributing Editors
Stephen Goldberg, Cover Artist
Susan Schneiderman, Richard C. Faust, Advertising Sales

Manuscripts should be submitted in triplicate to the Editor at the IEEE Operations Center. A summary of instructions for preparation is found in the most recent January issue of this journal. Detailed instructions are contained in "Information for IEEE Transactions and Journals Authors," available on request. After a manuscript has been accepted for publication, the author's organization will be requested to honor a charge of $110 per printed page (one-page minimum charge) to cover part of the publication cost. Responsibility for contents of papers rests upon the authors and not on the IEEE or its members.

Copyright: It is the policy of the IEEE to own the copyright to the technical contributions it publishes on behalf of the interests of the IEEE, its authors, and their employers and to facilitate the appropriate reuse of this material by others. To comply with the U.S. Copyright Law, authors are requested to sign an IEEE copyright form before publication. This form, a copy of which is found in the most recent January issue of the journal, returns to authors and their employers rights to reuse their material for their own purposes.

PROCEEDINGS OF THE IEEE (ISSN 0018-9219) is published monthly by the Institute of Electrical and Electronics Engineers, Inc. IEEE Corporate Office: 345 East 47th Street, New York, NY 10017-2394 USA. IEEE Operations Center: 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331 USA. NJ Telephone: 732-981-0060.

Annual Subscription: Member and nonmember prices available on request. Single copies: IEEE members $10.00 (first copy only), nonmembers $20.00 per copy. (Note: Add $4.00 postage and handling charge to any order from $1.00 to $50.00, including prepaid orders.) Other: Available in microfiche and microfilm. Member copies of the PROCEEDINGS are for personal use only. Change of address must be received by the first of a month to be effective for the following month's issue. Send new address, plus mailing label showing old address, to the IEEE Operations Center.

Advertising correspondence should be addressed to PROCEEDINGS Advertising Department, IEEE Operations Center, 445 Hoes Lane, Piscataway, NJ 08855-1331.

Copyright and Reprint Permission: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy for private use of patrons, provided the per-copy fee indicated in the code at the bottom of the first page is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923. For all other copying, reprint, or republication permission, write to Copyrights and Permissions Department, IEEE Publications Administration, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331. Copyright © 1997 by the Institute of Electrical and Electronics Engineers, Inc. All rights reserved. Periodicals Postage Paid at New York, NY and at additional mailing offices.

Postmaster: Send address changes to PROCEEDINGS OF THE IEEE, IEEE, 445 Hoes Lane, P.O. Box 1331, Piscataway, NJ 08855-1331. GST Registration No. 125634188. Printed in U.S.A.

CONTRIBUTIONS

The PROCEEDINGS OF THE IEEE publishes comprehensive, in-depth review, tutorial, and survey papers written for technically knowledgeable readers who are not necessarily specialists in the subjects being treated. The papers are of long-range interest and broad significance. Applications and technological issues, as well as theory, are emphasized. The topics include all aspects of electrical and computer engineering and science. From time to time, papers on managerial, historical, economic, and ethical aspects of technology are published. Papers are authored by recognized authorities and reviewed by experts. They include extensive introductions written at a level suitable for the nonspecialist, with ample references for those who wish to probe further. Several issues a year are devoted to a single subject of special importance.

IMPORTANT: Prospective authors, before preparing a full-length manuscript, should submit a proposal containing a description of the topic and its importance to PROCEEDINGS readers, a detailed outline of the proposed paper and its type of coverage, and a brief biography showing the author's qualifications for writing the paper (including reference to previously published material as well as information on the author's relation to the topic). If the proposal receives a favorable review, the author will be encouraged to prepare the paper, which after submittal will go through the normal review process. Guidelines for proposals are available from the address below or the PROCEEDINGS home page: http://www.ieee.org/pubs/transjour/proc.

Technical letters are no longer published in the PROCEEDINGS. Comments on and corrections to material published in this journal will be considered, however.

Please send proposals to the Editor, PROCEEDINGS OF THE IEEE, 445 Hoes Lane, Piscataway, NJ 08855-1331 USA. (Telephone: 732-562-5478, fax: 732-562-5456, email: j.calder@ieee.org.)

COVER: This special issue covers the subject of automated biometric systems that are used to verify individual identity using unique biometric measurements of the human body. Our cover illustrates the concept of one of the growing number of applications of these systems.

Speaker Recognition: A Tutorial

JOSEPH P. CAMPBELL, JR., SENIOR MEMBER, IEEE

Invited Paper

A tutorial on the design and development of automatic speaker-recognition systems is presented. Automatic speaker recognition is the use of a machine to recognize a person from a spoken phrase. These systems can operate in two modes: to identify a particular person or to verify a person's claimed identity. Speech processing and the basic components of automatic speaker-recognition systems are shown, and design tradeoffs are discussed. Then, a new automatic speaker-recognition system is given. This recognizer performs with 98.9% correct identification. Last, the performances of various systems are compared.

Keywords—Access control, authentication, biomedical measurements, biomedical signal processing, biomedical transducers, biometric, communication system security, computer network security, computer security, corpus, data bases, identification of persons, public safety, site security monitoring, speaker recognition, speech processing, verification.

I. INTRODUCTION

In keeping with this special issue on biometrics, the focus of this paper is on facilities and network access-control applications of speaker recognition. Speech processing is a diverse field with many applications. Fig. 1 shows a few of these areas and how speaker recognition relates to the rest of the field; this paper focuses on the three boxed areas.

Speaker recognition encompasses verification and identification. Automatic speaker verification (ASV) is the use of a machine to verify a person's claimed identity from his voice. The literature abounds with different terms for speaker verification, including voice verification, speaker authentication, voice authentication, talker authentication, and talker verification. In automatic speaker identification (ASI), there is no a priori identity claim, and the system decides who the person is, what group the person is a member of, or (in the open-set case) that the person is unknown. General overviews of speaker recognition have been given in [2], [12], [17], [37], [51], [52], and [59].
Speaker verification is defined as deciding if a speaker is whom he claims to be. This is different from the speaker identification problem, which is deciding if a speaker is a specific person or is among a group of persons.
Manuscript received April 20, 1997; revised June 27, 1997.
The author is with the National Security Agency, R22, Ft. Meade, MD 20755-6516 USA, and the Whiting School of Engineering, The Johns Hopkins University, Baltimore, MD 21218 USA (e-mail: j.campbell@ieee.org).
Publisher Item Identifier S 0018-9219(97)06947-8.

Fig. 1. Speech processing. [Tree diagram: speech processing divides into analysis/synthesis, recognition, and coding; recognition divides into speech recognition, speaker recognition, and language identification; speaker recognition divides into speaker identification, speaker detection, and speaker verification. The leaves are distinguished by text dependence (text independent vs. text dependent), speaker cooperation (unwitting vs. cooperative speakers), and speech quality (variable vs. high quality).]

In speaker verification, a person makes an identity claim (e.g., by entering an employee number or presenting his smart card). In text-dependent recognition, the phrase is known to the system and can be fixed or prompted (visually or orally). The claimant speaks the phrase into a microphone. This signal is analyzed by a verification system that makes the binary decision to accept or reject the user's identity claim, or possibly to report insufficient confidence and request additional input before making the decision.
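
To make the three possible outcomes concrete, a minimal sketch follows (not from the paper; the score scale and thresholds are hypothetical and would be tuned on held-out data):

    # Minimal sketch of the three-way verification decision described
    # above: accept, reject, or ask for more speech.  The thresholds
    # are hypothetical stand-ins, not values from the paper.

    def verify(match_score: float,
               accept_threshold: float = 0.8,
               reject_threshold: float = 0.4) -> str:
        """Map a match score for a claimed identity to a decision."""
        if match_score >= accept_threshold:
            return "accept"            # confident the claimant is the target
        if match_score <= reject_threshold:
            return "reject"            # confident the claimant is an impostor
        return "request more input"    # insufficient confidence either way

    print(verify(0.9))   # accept
    print(verify(0.6))   # request more input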
A typical ASV setup is shown in Fig. 2. The claimant, who has previously enrolled in the system, presents an encrypted smart card containing his identification information. He then attempts to be authenticated by speaking a prompted phrase(s) into the microphone. There is generally a tradeoff between accuracy and test-session duration. In addition to his voice, ambient room noise and delayed versions of his voice enter the microphone via reflective acoustic surfaces. Prior to a verification session, users must enroll in the system (typically under supervised conditions). During this enrollment, voice models are generated and stored (possibly on a smart card) for use in later verification sessions.
Fig. 3. Generic speaker-verification system. [Block diagram: filtering and A/D conversion produce digital speech; feature extraction feeds pattern matching against the model for the claimed ID, producing match scores.]

B. Problem Formulation

Speech is a complicated signal produced as a result of several transformations occurring at several different levels: semantic, linguistic, articulatory, and acoustic. Differences in these transformations appear as differences in the acoustic properties of the speech signal. Speaker-related differences are a result of a combination of anatomical differences inherent in the vocal tract and the learned speaking habits of different individuals. In speaker recognition, all these differences can be used to discriminate between speakers.

C. Generic Speaker Verification

The general approach to ASV consists of five steps: digital speech data acquisition, feature extraction, pattern matching, making an accept/reject decision, and enrollment to generate speaker reference models. A block diagram of this procedure is shown in Fig. 3. Feature extraction maps each interval of speech to a multidimensional feature space. (A speech interval typically spans 10-30 ms of the speech waveform and is referred to as a frame of speech.) This sequence of feature vectors x_i is then compared to speaker models by pattern matching. This results in a match score z_i for each vector or sequence of vectors. The match score measures the similarity of the computed input feature vectors to models of the claimed speaker or feature vector patterns for the claimed speaker. Last, a decision is made to either accept or reject the claimant according to the match score or sequence of match scores, which is a hypothesis-testing problem.
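
As an illustration of this pipeline, the sketch below frames a signal into 20-ms frames, maps each frame to a toy feature vector x_i, and scores the sequence against a stored speaker model. The frame features and the distance-based scorer are deliberately simple stand-ins for the methods surveyed later, not the paper's system:

    # Schematic ASV pipeline: framing, feature extraction, pattern
    # matching, decision.  Everything below is illustrative only.
    import numpy as np

    def frames(signal, rate, frame_ms=20):
        """Split speech into ~10-30 ms frames (here 20 ms, no overlap)."""
        n = int(rate * frame_ms / 1000)
        usable = len(signal) - len(signal) % n
        return signal[:usable].reshape(-1, n)

    def features(frame):
        """Map one frame to a toy feature vector x_i (energy, zero crossings)."""
        energy = np.log(np.mean(frame ** 2) + 1e-12)
        zero_crossings = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
        return np.array([energy, zero_crossings])

    def match_scores(feature_vectors, speaker_model):
        """Score z_i for each x_i: negative distance to the claimed model."""
        return -np.linalg.norm(feature_vectors - speaker_model, axis=1)

    rate = 8000
    speech = np.random.randn(rate)                 # 1 s of placeholder "speech"
    x = np.array([features(f) for f in frames(speech, rate)])
    model = x.mean(axis=0)                         # toy enrollment model
    z = match_scores(x, model)
    accept = z.mean() > -1.0                       # hypothetical threshold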
For speaker recognition, features that exhibit high speaker discrimination power, high interspeaker variability, and low intraspeaker variability are desired. Many forms of pattern matching and corresponding models are possible. Pattern-matching methods include dynamic time warping (DTW), the hidden Markov model (HMM), artificial neural networks, and vector quantization (VQ). Template models are used in DTW, statistical models are used in HMM, and codebook models are used in VQ.
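
To make the codebook idea concrete, here is a bare-bones VQ sketch: a speaker's codebook is a small set of centroids over enrollment features, and the match score is the average distortion of test vectors against their nearest codewords. The toy k-means trainer and the sizes are illustrative; real systems use larger codebooks and more careful training:

    import numpy as np

    def train_codebook(X, size=4, iters=10, seed=0):
        """Toy k-means codebook over enrollment feature vectors X (rows)."""
        rng = np.random.default_rng(seed)
        codebook = X[rng.choice(len(X), size, replace=False)]
        for _ in range(iters):
            nearest = np.argmin(
                ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(-1), axis=1)
            for k in range(size):
                if np.any(nearest == k):
                    codebook[k] = X[nearest == k].mean(axis=0)
        return codebook

    def vq_distortion(X, codebook):
        """Average distance to the nearest codeword; lower = better match."""
        d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        return d.min(axis=1).mean()

    enroll = np.random.randn(500, 12)   # stand-in enrollment features
    test = np.random.randn(200, 12)     # stand-in test features
    score = vq_distortion(test, train_codebook(enroll))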
D. Overview

The purpose of this introductory section is to present a general framework and motivation for speaker recognition, an overview of the entire paper, and a presentation of previous work in speaker recognition.

Section II contains an overview of speech processing, including speech signal acquisition, the data base used in later experiments, speech production, linear prediction (LP), transformations, and the cepstrum.

Fig. 2. Typical speaker-verification setup. [Illustration: the claimant presents a smart card carrying the claimed ID and speaks into a microphone; ambient noise and reflections from acoustic surfaces also reach the microphone.]

Table 1. Sources of Verification Error

Misspoken or misread prompted phrases
Extreme emotional states (e.g., stress or duress)
Time-varying (intra- or intersession) microphone placement
Poor or inconsistent room acoustics (e.g., multipath and noise)
Channel mismatch (e.g., using different microphones for enrollment and verification)
Sickness (e.g., head colds can alter the vocal tract)
Aging (the vocal tract can drift away from models with age)

There is also generally a tradeoff between accuracy and the duration and number of enrollment sessions.

Many factors can contribute to verification and identification errors. Table 1 lists some of the human and environmental factors that contribute to these errors, a few of which are shown in Fig. 2. These factors generally are outside the scope of algorithms or are better corrected by means other than algorithms (e.g., better microphones). These factors are important, however, because no matter how good a speaker-recognition algorithm is, human error (e.g., misreading or misspeaking) ultimately limits its performance.

A. Motivation

ASV and ASI are probably the most natural and economical methods for solving the problems of unauthorized use of computer and communications systems and multilevel access control. With the ubiquitous telephone network and microphones bundled with computers, the cost of a speaker-recognition system might only be for software.

Biometric systems automatically recognize a person by using distinguishing traits (a narrow definition). Speaker recognition is a performance biometric, i.e., you perform a task to be recognized. Your voice, like other biometrics, cannot be forgotten or misplaced, unlike knowledge-based (e.g., password) or possession-based (e.g., key) access-control methods. Speaker-recognition systems can be made somewhat robust against noise and channel variations [33], [49], ordinary human changes (e.g., time-of-day voice changes and minor head colds), and mimicry by humans and tape recorders [22].

Section III presents feature selection, the divergence measure, and the Bhattacharyya distance. This section is highlighted by the development of the divergence shape measure and the Bhattacharyya distance shape. Section IV introduces pattern matching, and Section V presents classification, decision theory, and receiver operating characteristic (ROC) curves. Section VI describes a simple but effective speaker-recognition algorithm. Section VII demonstrates the performance of various speaker-recognition algorithms, and Section VIII concludes by summarizing this paper.

E. Previous Work

There is considerable speaker-recognition activity in industry, national laboratories, and universities. Among those who have researched and designed several generations of speaker-recognition systems are AT&T (and its derivatives); Bolt, Beranek, and Newman; the Dalle Molle Institute for Perceptual Artificial Intelligence (Switzerland); ITT; Massachusetts Institute of Technology Lincoln Labs; National Tsing Hua University (Taiwan); Nagoya University (Japan); Nippon Telegraph and Telephone (Japan); Rensselaer Polytechnic Institute; Rutgers University; and Texas Instruments (TI). The majority of ASV research is directed at verification over telephone lines [36]. Sandia National Laboratories, the National Institute of Standards and Technology [35], and the National Security Agency [8] have conducted evaluations of speaker-recognition systems.

Table 2 shows a sampling of the chronological advancement in speaker verification. The following terms are used to define the columns in Table 2: "source" refers to a citation in the references, "org" is the company or school where the work was done, "features" are the signal measurements (e.g., cepstrum), "input" is the type of input speech (laboratory, office quality, or telephone), "text" indicates whether a text-dependent or text-independent mode of operation is used, "method" is the heart of the pattern-matching process, "pop" is the population size of the test (number of people), and "error" is the equal error percentage for speaker-verification systems ("v") or the recognition error percentage for speaker-identification systems ("i"), given the specified duration of test speech in seconds. This data is presented to give a simplified general view of past speaker-recognition research. The references should be consulted for important distinctions that are not included, e.g., differences in enrollment, differences in cross-gender impostor trials, differences in normalizing "cohort" speakers [53], differences in partitioning the impostor and cohort sets, and differences in known versus unknown impostors [8]. It should be noted that it is difficult to make meaningful comparisons between the text-dependent and the generally more difficult text-independent tasks. Text-independent approaches, such as Gish's segmental Gaussian model [18] and Reynolds' Gaussian Mixture Model [49], need to deal with unique problems (e.g., sounds or articulations present in the test material but not in training). It is also difficult to compare between the binary-choice verification task and the generally more difficult multiple-choice identification task [12], [39].

The general trend shows accuracy improvements over time with larger tests (enabled by larger data bases), thus increasing confidence in the performance measurements. For high-security applications, these speaker-recognition systems would need to be used in combination with other authenticators (e.g., smart card). The performance of current speaker-recognition systems, however, makes them suitable for many practical applications. There are more than a dozen commercial ASV systems, including those from ITT, Lernout & Hauspie, T-NETIX, Veritel, and Voice Control Systems. Perhaps the largest scale deployment of any biometric to date is Sprint's Voice FONCARD®, which uses TI's voice verification engine.

Speaker-verification applications include access control, telephone banking, and telephone credit cards. The accounting firm of Ernst and Young estimates that high-tech computer thieves in the United States steal $3-5 billion annually. Automatic speaker-recognition technology could substantially reduce this crime by reducing these fraudulent transactions.

As automatic speaker-verification systems gain widespread use, it is imperative to understand the errors made by these systems. There are two types of errors: the false acceptance of an invalid user (FA or Type I) and the false rejection of a valid user (FR or Type II). It takes a pair of subjects to make a false acceptance error: an impostor and a target. Because of this hunter-and-prey relationship, in this paper, the impostor is referred to as a wolf and the target as a sheep. False acceptance errors are the ultimate concern of high-security speaker-verification applications; however, they can be traded off for false rejection errors.
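
A short sketch of this tradeoff, using synthetic score distributions for wolves and sheep: sweeping the decision threshold trades FA against FR, and the operating point where the two are equal is the equal error rate (EER) reported in Table 2. The distributions below are illustrative only:

    import numpy as np

    def fa_fr(threshold, wolf_scores, sheep_scores):
        fa = np.mean(wolf_scores >= threshold)   # impostors accepted
        fr = np.mean(sheep_scores < threshold)   # valid users rejected
        return fa, fr

    rng = np.random.default_rng(0)
    wolves = rng.normal(0.0, 1.0, 10_000)        # synthetic impostor scores
    sheep = rng.normal(2.0, 1.0, 10_000)         # synthetic target scores

    # Find the threshold where FA ~= FR (the equal error rate).
    thresholds = np.linspace(-4, 6, 1001)
    errors = [fa_fr(t, wolves, sheep) for t in thresholds]
    eer_idx = int(np.argmin([abs(fa - fr) for fa, fr in errors]))
    print(f"EER ~ {errors[eer_idx][0]:.1%} at threshold {thresholds[eer_idx]:.2f}")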
After reviewing the methods of speaker recognition, a simple speaker-recognition system will be presented. A data base of 186 people collected over a three-month period was used in closed-set speaker identification experiments. A speaker-recognition system using methods presented here is practical to implement in software on a modest personal computer. The example system uses features and measures for speaker recognition based upon speaker-discrimination criteria (the ultimate goal of any recognition system). Experimental results show that these new features and measures yield 1.1% closed-set speaker identification error on data bases of 44 and 43 people. The features and measures use long-term statistics based upon an information-theoretic shape measure between line spectrum pair (LSP) frequency features. This new measure, the divergence shape, can be interpreted geometrically as the shape of an information-theoretic measure called divergence. The LSP's were found to be very effective features in this divergence shape measure.
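
As a preview of Section III, the sketch below assumes Gaussian models of each speaker's LSP feature vectors; for Gaussians, the symmetric divergence splits into a covariance term and a mean term, and the divergence shape keeps only the covariance term. This is an illustrative reading, not the paper's exact implementation:

    import numpy as np

    def divergence_shape(X1, X2):
        """Covariance-only part of the symmetric divergence between two
        Gaussian models fit to feature sets X1, X2 (rows are vectors x_i)."""
        C1 = np.cov(X1, rowvar=False)
        C2 = np.cov(X2, rowvar=False)
        P1, P2 = np.linalg.inv(C1), np.linalg.inv(C2)
        return 0.5 * np.trace((C1 - C2) @ (P2 - P1))

    rng = np.random.default_rng(0)
    speaker_a = rng.normal(0, 1.0, size=(2000, 10))   # stand-in LSP features
    speaker_b = rng.normal(0, 1.3, size=(2000, 10))
    print(divergence_shape(speaker_a, speaker_b))     # larger = more separable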
The following section contains an overview of digital signal acquisition, speech production, speech signal processing, LP, and mel cepstra.

II. SPEECH PROCESSING

Speech processing extracts the desired information from a speech signal. To process a signal by a digital computer, the signal must be represented in digital form so that it can be used by a digital computer.

Table 2. Selected Chronology of Speaker-Recognition Progress

Source | Org | Features | Method | Input | Text | Pop | Error
Atal 1974 [1] | AT&T | Cepstrum | Pattern Match | Lab | Dependent | 10 | i: 2%@0.5s; v: 2%@1s
Markel and Davis 1979 [34] | STI | LP | Long-Term Statistics | Lab | Independent | 17 | i: 2%@39s
Furui 1981 [16] | AT&T | Normalized Cepstrum | Pattern Match | Telephone | Dependent | 10 | v: 0.2%@3s
Schwartz et al. 1982 [56] | BBN | LAR | Nonparametric pdf | Telephone | Independent | 21 | i: 2.5%@2s
Li and Wrench 1983 [31] | ITT | LP, Cepstrum | Pattern Match | Office | Independent | 11 | i: 21%@3s; i: 4%@10s
Doddington 1985 [12] | TI | Filter-bank | DTW | Lab | Dependent | 200 | v: 0.8%@6s
Soong et al. 1985 [57] | AT&T | LP | VQ (size 64) Likelihood Ratio Distortion | Telephone | 10 isolated digits | 100 | i: 5%@1.5s; i: 1.5%@3.5s
Higgins and Wohlford 1986 [23] | ITT | Cepstrum | DTW Likelihood Scoring | — | Independent | — | v: 10%@2.5s; v: 4.5%@10s
Attili et al. 1988 [3] | RPI | Cepstrum, LP, Autocorr | Projected Long-Term Statistics | Lab | Dependent | 90 | v: 1%@3s
Higgins et al. 1991 [22] | ITT | LAR, LP-Cepstrum | DTW Likelihood Scoring | Office | Dependent | 186 | v: 1.7%@10s
Tishby 1991 [60] | AT&T | LP | HMM (AR mix) | Telephone | 10 isolated digits | 100 | v: 2.8%@1.5s; v: 0.8%@3.5s
Reynolds 1995 [48]; Reynolds and Carlson 1995 [49] | MIT-LL | Mel-Cepstrum | HMM (GMM) | Office | Dependent | 138 | i: 0.8%@10s; v: 0.12%@10s
Che and Lin 1995 [9] | Rutgers | Cepstrum | HMM | Office | Dependent | 138 | i: 0.56%@2.5s; i: 0.14%@10s; v: 0.62%@2.5s
Colombi et al. 1996 [10] | AFIT | Cepstrum | HMM monophone | Office | Dependent | 138 | i: 0.22%@10s; v: 0.28%@10s
Reynolds 1996 [50] | MIT-LL | Mel-Cepstrum, Mel-dCepstrum | HMM (GMM) | Telephone | Independent | 416 | v: 11%/16%@3s; v: 6%/8%@10s; v: 3%/5%@30s (matched/mismatched handset)

A. Speech Signal Acquisition

Initially, the acoustic sound pressure wave is transformed into a digital signal suitable for voice processing. A microphone or telephone handset can be used to convert the acoustic wave into an analog signal. This analog signal is conditioned with antialiasing filtering (and possibly additional filtering to compensate for any channel impairments). The antialiasing filter limits the bandwidth of the signal to approximately the Nyquist rate (half the sampling rate) before sampling. The conditioned analog signal is then sampled to form a digital signal by an analog-to-digital (A/D) converter. Today's A/D converters for speech applications typically sample with 12-16 bits of resolution at 8000-20000 samples per second. Oversampling is commonly used to allow a simpler analog antialiasing filter and to control the fidelity of the sampled signal precisely (e.g., sigma-delta converters).
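
The sampling arithmetic above can be made concrete with a short sketch: the antialiasing low-pass must cut off near the new Nyquist frequency before samples are discarded. The windowed-sinc filter and the rates below are illustrative choices, not recommendations from the paper:

    import numpy as np

    def antialias_decimate(signal, factor, taps=101):
        """Low-pass to the new Nyquist band, then keep every factor-th sample."""
        cutoff = 0.5 / factor                        # fraction of the old rate
        n = np.arange(taps) - (taps - 1) / 2
        h = 2 * cutoff * np.sinc(2 * cutoff * n)     # ideal low-pass, truncated
        h *= np.hamming(taps)                        # window to tame ripple
        filtered = np.convolve(signal, h, mode="same")
        return filtered[::factor]

    rate_in = 48_000                                 # oversampled capture
    audio = np.random.randn(rate_in)                 # 1 s of placeholder audio
    audio_8k = antialias_decimate(audio, factor=6)   # down to 8 kHz (telephone)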
In local speaker-verification applications, the analog channel is simply the microphone, its cable, and analog signal conditioning. Thus, the resulting digital signal can be very high quality, lacking distortions produced by transmission of analog signals over long-distance telephone lines.

Table 3. The YOHO Corpus

"Combination lock" phrases (e.g., "twenty-six, eighty-one, fifty-seven")
138 subjects: 106 males, 32 females
Collected with a STU-III electret-microphone telephone handset over a 3-month period in a real-world office environment
4 enrollment sessions per subject with 24 phrases per session
10 verification sessions per subject at approximately 3-day intervals with 4 phrases per session
Total of 1380 validated test sessions
8 kHz sampling with 3.8 kHz analog bandwidth (STU-III like)
1.2 gigabytes of data

Fig. 4. Human vocal system. (Reprinted with permission from J. Flanagan, Speech Analysis, Synthesis and Perception, 2nd ed. New York and Berlin: Springer-Verlag, 1972, p. 10, Fig. 2.1. © Springer-Verlag.) [Diagram labels include the nasal cavity, hard palate, soft palate (velum), hyoid bone, epiglottis, cricoid cartilage, and lungs.]

The vocal tract comprises several cavities, including the:

* oral cavity (forward of the velum and bounded by the lips, tongue, and palate);
* nasal pharynx (above the velum, rear end of nasal cavity);
* nasal cavity (above the palate and extending from the pharynx to the nostrils).

An adult male vocal tract is approximately 17 cm long [14].

The vocal folds (formerly known as vocal cords) are shown in Fig. 4. The larynx is composed of the vocal folds, the top of the cricoid cartilage, the arytenoid cartilages, and the thyroid cartilage (also known as the "Adam's apple"). The vocal folds are stretched between the thyroid cartilage and the arytenoid cartilages. The area between the vocal folds is called the glottis.

As the acoustic wave passes through the vocal tract, its frequency content (spectrum) is altered by the resonances of the vocal tract. Vocal tract resonances are called formants. Thus, the vocal tract shape can be estimated from the spectral shape (e.g., formant location and spectral tilt) of the voice signal.
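
Since formants are the pole resonances of an all-pole vocal tract model, they can be estimated with the LP analysis introduced later in the paper. The sketch below uses the standard autocorrelation method and reads candidate formant frequencies from the pole angles; the model order and the stand-in frame are illustrative:

    import numpy as np

    def lpc(frame, order=10):
        """LP coefficients a_1..a_p via the autocorrelation method."""
        r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
        return np.linalg.solve(R, r[1:order + 1])

    def formant_candidates(frame, rate, order=10):
        """Resonance frequencies from the angles of the all-pole model's poles."""
        a = lpc(frame, order)
        poles = np.roots(np.concatenate(([1.0], -a)))   # roots of A(z)
        poles = poles[poles.imag > 0]                   # one of each conjugate pair
        return np.sort(np.angle(poles) * rate / (2 * np.pi))

    frame = np.hamming(320) * np.random.randn(320)      # stand-in voiced frame
    print(formant_candidates(frame, rate=8000))         # candidate formants in Hz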
Voice verification systems typically use features derived only from the vocal tract. As seen in Fig. 4, the human vocal mechanism is driven by an excitation source, which also contains speaker-dependent information. The excitation is generated by airflow from the lungs, carried by the trachea (also called the "windpipe") through the vocal folds (or the arytenoid cartilages). The excitation can be characterized as phonation, whispering, frication, compression, vibration, or a combination of these.

B. YOHO Speaker-Verification Corpus

The work presented here is based on high-quality signals for benign-channel speaker-verification applications. The primary data base for this work is known as the YOHO Speaker-Verification Corpus, which was collected by ITT under a U.S. government contract. The YOHO data base was the first large-scale, scientifically controlled and collected, high-quality speech data base for speaker-verification testing at high confidence levels. Table 3 describes the YOHO data base [21]. YOHO is available from the Linguistic Data Consortium (University of Pennsylvania), and test plans have been developed for its use [8]. This data base already is in digital form, emulating the third generation Secure Terminal Unit's (STU-III) secure voice telephone input characteristics, so the first signal processing block of the verification system in Fig. 3 (signal conditioning and acquisition) is taken care of.

In a text-dependent speaker-verification scenario, the phrases are known to the system (e.g., the claimant is prompted to say them). The syntax used in the YOHO data base is "combination lock" phrases. For example, the prompt might read, "Say: twenty-six, eighty-one, fifty-seven."
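
A toy generator for prompts of this form is shown below; the vocabulary and ranges are illustrative, not the corpus specification:

    # Illustrative "combination lock" prompt generator in the style of
    # YOHO's prompts; not the actual phrase list used by the corpus.
    import random

    TENS = {20: "twenty", 30: "thirty", 40: "forty", 50: "fifty",
            60: "sixty", 70: "seventy", 80: "eighty", 90: "ninety"}
    ONES = ["one", "two", "three", "four", "five",
            "six", "seven", "eight", "nine"]

    def two_digit_word():
        tens = random.choice(list(TENS))
        ones = random.randrange(1, 10)
        return f"{TENS[tens]}-{ONES[ones - 1]}"

    def prompt():
        return "Say: " + ", ".join(two_digit_word() for _ in range(3))

    print(prompt())   # e.g., "Say: twenty-six, eighty-one, fifty-seven"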
YOHO was designed for U.S. government evaluation of speaker-verification systems in "office" environments. In addition to office environments, there are enormous consumer markets that must contend with noisy speech (e.g., telephone services) and far-field microphones (e.g., com
