UNITED STATES PATENT AND TRADEMARK OFFICE

BEFORE THE PATENT TRIAL AND APPEAL BOARD

APPLE INC.,
Petitioner,

v.

ZENTIAN LIMITED,
Patent Owner.
____________________

Case IPR2023-00036
Patent No. 10,839,789
____________________

DECLARATION OF DAVID ANDERSON, Ph.D. IN SUPPORT OF PATENT OWNER’S PRELIMINARY RESPONSE
TABLE OF CONTENTS

I. Introduction
   A. Engagement
   B. Background and qualifications
   C. Materials considered
II. Relevant legal standards
   A. Person of ordinary skill in the art
   B. Burden of proof
   C. Claim construction
   D. Obviousness
III. Overview of the ’789 Patent
IV. Smyth
V. Mozer
VI. The ’789’s requirement for “[a]n acoustic coprocessor”
VII. The ’789 patent’s requirement that “the calculation apparatus and the acoustic model memory are fabricated on a single integrated circuit”
EXHIBIT LIST

Exhibit No.   Description
2001
I, David Anderson, Ph.D., do hereby declare as follows:

I. Introduction

A. Engagement

1. I have been retained by Patent Owner Zentian Limited (“Zentian” or “Patent Owner”) to provide my opinions with respect to Zentian’s Preliminary Response to the Petition in Inter Partes Review proceeding IPR2023-00036, concerning U.S. Patent No. 10,839,789. I am being compensated for my time spent on this matter. I have no interest in the outcome of this proceeding, and the payment of my fees is in no way contingent on my providing any particular opinions.
2. As part of this engagement, I have also been asked to provide my technical review, analysis, insights, and opinions regarding the materials cited and relied upon by the Petition, including the prior art references and the supporting Declaration of Mr. Schmandt.

3. The statements made herein are based on my own knowledge and opinions.
B. Background and qualifications

4. My full qualifications, including my professional experience and education, can be found in my Curriculum Vitae, which includes a complete list of my publications, and is attached as Ex. A to this declaration.

5. I am a professor in the School of Electrical and Computer Engineering at the Georgia Institute of Technology (“Georgia Tech”) in Atlanta, Georgia. I have been a professor at Georgia Tech since 1999. In 2009 I served as a visiting professor in the Department of Computer Science at Korea University in Seoul, South Korea.

6. I received my Ph.D. in Electrical and Computer Engineering from Georgia Tech in 1999. I received my B.S. and M.S. in Electrical Engineering from Brigham Young University in 1993 and 1994, respectively.
7. In my employment prior to Georgia Tech, as well as in my subsequent studies and research, I have worked extensively in areas related to the research, design, and implementation of speech and audio processing systems. I have also taught graduate and undergraduate level courses at Georgia Tech on the implementation of signal processing and embedded systems. For example, I have taught courses on statistical machine learning, machine learning for speech, pattern recognition, multimedia processing and systems, software design, computer architecture, real-time signal processing systems, and applications of signal processing (covering topics in audio processing and speech recognition). I have also designed and taught a course on signal processing in the context of human perception. These courses and my research have covered many topics relevant to the subject matter of the ’789 patent and the prior art cited therein.
8. I have served as principal investigator or co-principal investigator in numerous multi-disciplinary research projects including “Blind Source Separation for Audio,” “Audio Classification,” “Auditory Scene Analysis,” “Hearing Aid Audio Processing,” “Speaker Driver Sound Enhancement,” “I-Vector Based Voice Quality,” “Analysis of Voice Exercise Using Signal Processing,” and “Smart Homes for Effective and Safe Remote Work During a Pandemic and Beyond.”
9. I also have extensive experience with the practical implementation of signal processing algorithms, information theory, signal detection, and related topics through my research and consulting. I have published over 200 book chapters and papers in reviewed journals and conferences. Topics include “Speech recognition using filter bank features,” “Speaker adaptation using speaker similarity score on DNN features,” “Segmentation based speech enhancement using auxiliary sensors,” “A framework for estimation of clean speech by fusion of outputs from multiple speech enhancement systems,” “Distributed acquisition and processing systems for speech and audio,” “A missing data-based feature fusion strategy for noise-robust automatic speech recognition using noisy sensors,” “Learning distances to improve phoneme classification,” “Identification of voice quality variation using i-vectors,” “Varying time-constants and gain adaptation in feature extraction for speech processing,” “Low bit-rate coding of speech in harsh conditions using non-acoustic auxiliary devices,” “Speech analysis and coding using a multi-resolution sinusoidal transform,” “Biologically inspired auditory sensing system interfaces on a chip,” “Cascade classifiers for audio classification,” and “Single acoustic channel speech enhancement based on glottal correlation using non-acoustic sensors.” I have also contributed book chapters for treatises such as “Independent Component Analysis for Audio and Biosignal Applications,” and written a book on Fixed-Point Signal Processing, which is related to the practical implementation of systems for processing sound and other signals.
10. I am a named inventor on eight patents, including “Speech activity detector for use in noise reduction system, and methods therefor” (U.S. Patent No. 6,351,731), and “Analog audio signal enhancement system using a noise suppression algorithm” (U.S. Patent No. 7,590,250).

11. I am a Senior Member of the Institute of Electrical and Electronics Engineers (“IEEE”) and have been a Member since 1991. I am also a Member of the IEEE Signal Processing Society. From 1994 to 2016, I was also a member of the Acoustical Society of America. In 2003, I served as the Co-Chair for the NSF Symposium on Next Generation Automatic Speech Recognition. In 2004, I received the Presidential Early Career Award for Scientists and Engineers, presented by then-President George W. Bush, for my work on ultra-low-power signal processing system design.
C. Materials considered

12. In the course of preparing my opinions, I have reviewed and am familiar with the ’789 patent, including its written description, figures, and claims. I have also reviewed and am familiar with the Petition in this proceeding, the supporting Declaration of Mr. Schmandt, and the relied-upon prior art, including Smyth and Mozer. I have also reviewed the materials cited in this declaration. My opinions are based on my review of these materials as well as my more than 30 years of experience, research, and education in the relevant field.
II. Relevant legal standards

13. I am not an attorney. I offer no opinions on the law. But counsel has informed me of the following legal standards relevant to my analysis here. I have applied these standards in arriving at my conclusions.
A. Person of ordinary skill in the art

14. I understand that an analysis of the claims of a patent in view of prior art has to be provided from the perspective of a person having ordinary skill in the art at the time of invention of the ’789 patent. I understand that I should consider factors such as the educational level and years of experience of those working in the pertinent art; the types of problems encountered in the art; the teachings of the prior art; patents and publications of other persons or companies; and the sophistication of the technology. I understand that the person of ordinary skill in the art is not a specific real individual, but rather a hypothetical individual having the qualities reflected by the factors discussed above.

15. I understand that the Petition applies a priority date of September 14, 2004, for the challenged claims, Pet. 3, and I apply the same date.
16. I further understand that the Petition defines the person of ordinary skill in the art at the time of the invention as having had a master’s degree in computer engineering, computer science, electrical engineering, or a related field, with at least two years of experience in the field of speech recognition, or a bachelor’s degree in the same fields with at least four years of experience in the field of speech recognition. The Petition adds that further education or experience might substitute for the above requirements. I do not dispute the Petition’s assumptions at this time, and my opinions are rendered on the basis of the same definition of the ordinary artisan set forth in the Petition.
17. I also note, however, that an ordinarily skilled engineer at the time of the invention would have been trained in evaluating both the costs and benefits of a particular design choice. Engineers are trained (both in school and through general experience in the workforce) to recognize that design choices can have complex consequences that need to be evaluated before forming a motivation to pursue a particular design choice, and before forming an expectation of success as to that design choice. In my opinion, anyone who did not recognize these realities would not be a person of ordinary skill in the art. Thus, a person who would have simply formed design motivations based only on the premise that a particular combination of known elements would be possible would not be a person of ordinary skill, regardless of their education, experience, or technical knowledge. Likewise, a person who would have formed design motivations as to a particular combination of known elements based only on the premise that the combination may provide some benefit, with no consideration of the relevance of the benefit in the specific context and in relation to the costs or disadvantages of that combination, would also not be a person of ordinary skill in the art, regardless of their education, experience, or technical knowledge. In my opinion, a person of ordinary skill in the art would have been deliberative and considered, rather than impulsive.
18. Throughout my declaration, even if I discuss my analysis in the present tense, I am always making my determinations based on what a person of ordinary skill in the art (“POSA”) would have known at the time of the invention. Based on my background and qualifications, I have experience and knowledge exceeding the level of a POSA, and am qualified to offer the testimony set forth in this declaration.

B. Burden of proof

19. I understand that in an inter partes review the petitioner has the burden of proving a proposition of unpatentability by a preponderance of the evidence.

C. Claim construction

20. I understand that in an inter partes review, claims are interpreted based on the same standard applied by Article III courts, i.e., based on their ordinary and customary meaning as understood in view of the claim language, the patent’s description, and the prosecution history, viewed from the perspective of the ordinary artisan. I further understand that where a patent defines claim language, the definition in the patent controls, regardless of whether those working in the art may have understood the claim language differently based on ordinary meaning.
D. Obviousness

21. I understand that a patent claim may be invalid even though the invention is not identically disclosed or described in the prior art, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious to a person having ordinary skill in the relevant art at the time the invention was made.

22. I understand that, to demonstrate obviousness, it is not sufficient for a petition to merely show that all of the elements of the claims at issue are found in separate prior art references, or even scattered across different embodiments and teachings of a single reference. The petition must thus go further, to explain how a person of ordinary skill would combine specific prior art references or teachings, which combinations of elements in specific references would yield a predictable result, and how any specific combination would operate or read on the claims. Similarly, it is not sufficient to allege that the prior art could be combined; rather, the petition must show why and how a person of ordinary skill would have combined them.
23. I understand that where an alleged motivation to combine relies on a particular factual premise, the petitioner bears the burden of providing specific support for that premise. I understand that obviousness cannot be shown by conclusory statements, and that the petition must provide articulated reasoning with some rational underpinning to support its conclusion of obviousness. I also understand that skill in the art and “common sense” rarely operate to supply missing knowledge to show obviousness, nor does skill in the art or “common sense” act as a bridge over gaps in the substantive presentation of an obviousness case.
III. Overview of the ’789 Patent

24. U.S. Patent 10,839,789, titled “Speech recognition circuit and method,” is directed to an improved speech recognition circuit and associated methods. Ex. 1001, 1:20-21. The ’789 patent teaches and claims an acoustic co-processor comprising a calculating apparatus for calculating distances between a feature vector and acoustic states of an acoustic model, which acoustic states and acoustic model are stored in and read from an acoustic model memory that is fabricated on the same integrated circuit as the calculating apparatus. Claim 1; Ex. 1001, 25:40-55, 34:56-63.

25. The ’789 patent teaches that an “audio input for speech recognition” may be input to the front end in the form of digital audio, or analog audio that is converted to digital audio using an analog-to-digital converter. Ex. 1001, 12:52-55. “The audio input is divided into time frames, each time frame typically being on the order of 10 ms.” Ex. 1001, 12:55-57. “For each audio input time frame, the audio signal is converted into a feature vector. This may be done by splitting the audio signal into spectral components,” such as, for instance, 13 components plus their first and second derivatives, creating a total of 39 components. Ex. 1001, 12:58-60. The feature vector thus “represents a point in an N-dimensional space,” where N is generally in the range of 20 to 39. Ex. 1001, 13:22-23.
26. Each feature vector is then passed to the calculating circuit, or distance calculation engine, which calculates a distance indicating the similarity between a feature vector and one or more predetermined acoustic states of an acoustic model. Ex. 1001, 6:63-65, 25:42-44 (“Each feature vector is transferred to a distance calculation engine circuit 204, to obtain distances for each state of the acoustic model.”). “The distance calculator stage of the recognition process computes a probability or likelihood that a feature vector corresponds to a particular state.” Ex. 1001, 13:26-29. “The likelihood of each state is determined by the distance between the feature vector and each state.” Ex. 1001, 13:3-4. The distance calculation may be a Mahalanobis distance using Gaussian distributions. Ex. 1001, 4:23-24. “The MHD (Mahalanobis Distance) is a distance between two N-dimensional points, scaled by the statistical variation in each component.” Ex. 1001, 13:15-18. The ’789 patent teaches calculating the distance between a feature vector and 8,000 states, “i.e. one distance for each of the 8,000 states,” Ex. 1001, 13:61-63, which it teaches “gives the best recognition results when used with a language model.” Id. at 13:10-15, 14:5-6 (“Each state is also a 39 dimensional vector, having the same spectral components as the feature vector.”). “Due to the 10 ms frame length, a feature vector arrives at the MHD engine,” i.e., the distance calculation engine or calculating circuit, “every 10 ms.” Ex. 1001, 14:1-2.
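The variance-scaled distance described in the passage above can be illustrated with a short sketch. This is illustrative only: the function, the diagonal-variance simplification, and the toy 3-dimensional values are my own, not code from the ’789 patent.

```python
def mahalanobis_distance(feature, mean, variance):
    """Squared distance between a feature vector and one acoustic state,
    with each component scaled by that component's statistical variation
    (diagonal-covariance simplification)."""
    return sum((f - m) ** 2 / v for f, m, v in zip(feature, mean, variance))

# Toy 3-dimensional example; a real engine would use ~39 components and
# compute one such distance per acoustic state (e.g., 8,000 states) for
# each 10 ms frame.
feature = [1.0, 2.0, 3.0]
state_mean = [0.0, 2.0, 5.0]
state_variance = [1.0, 4.0, 2.0]
print(mahalanobis_distance(feature, state_mean, state_variance))  # 3.0
```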
27. The distance calculation engine or calculating circuit “may be included within an accelerator,” Ex. 1001, 3:62-64, which may be a “loosely bound co-processor for a CPU running speech recognition software,” and which “has the advantage of reducing computational load on the CPU, and reducing memory bandwidth load for the CPU.” Id. at 24:26-31; see Figs. 17-23. “Each time a feature vector is loaded into the accelerator, the accelerator computes the distances for all states for that feature vector[.]” Ex. 1001, 26:15-18.

28. “The distances calculated by the distance calculation engine are then transferred to the search stage 106 of the speech recognition circuit, which uses models such as one or more word models and/or language models to generate and output recognised text.” Ex. 1001, 23:14-19.
IV. Smyth

29. U.S. Patent No. 5,819,222, titled “Task-constrained connected speech recognition of propagation of tokens only if valid propagation path is present” (“Smyth”), is directed to “task-constrained connected word recognition where the task, for example, might be to recognise one of a set of account numbers or product codes.” Ex. 1005, 1:15-21.
V. Mozer

30. U.S. Patent No. 6,832,194, titled “Audio recognition peripheral system” (“Mozer”), is directed to “integrated circuits for implementing audio recognition,” offering a “low cost audio recognition peripheral to operate in conjunction with a processor.” Ex. 1046, 1:5-10, 2:4-7.

VI. The ’789’s requirement for “[a]n acoustic coprocessor”

31. Every challenged claim of the ’789 patent recites “[a]n acoustic coprocessor.”

32. The ’789 patent teaches that in the relevant embodiment, “the distance calculation engine is designed as a speech accelerator, to operate as a loosely bound co-processor for a CPU running speech recognition software. This has the advantage of reducing the computational load on the CPU, and reducing memory bandwidth load for the CPU.” Ex. 1001, 24:26-31 (emphasis added).

33. The ’789 patent explains that the claimed “coprocessor” (1) performs the function of the “accelerator” taught throughout the specification; (2) is not the main CPU; and (3) must work in conjunction with a main CPU. The ’789 patent’s teachings as to the claimed “coprocessor” in those respects are consistent with contemporaneous dictionary definitions, which define “coprocessor” as “a processor that is connected to a main processor and operates concurrently with the main processor,” Ex. 2003 at 3 (emphasis added), and as “[a] processor used in conjunction with a central processing unit. . . .” Ex. 2009 at 3.
34. The ’789 patent’s teachings as to the “accelerator” are likewise consistent with the above definitions and teachings, consistently disclosing that the “accelerator” must work in conjunction with another processor that is the main CPU. See, e.g., Ex. 1001, Figs. 18-22, 26:44-46 (“The Acoustic Model may be loaded into the Acoustic Model Memory by software running on the CPU prior to the first use of the Accelerator.”); 32:11-16 (“the operation of the CPU and accelerator are tightly coupled”). There are no embodiments to the contrary.

35. The Petition identifies Smyth’s classifier 34 and at least one alleged signal line as the claimed “coprocessor.” Pet. 11, 16, 44. The Petition, citing Mr. Schmandt’s declaration at paragraph 129, alleges that Smyth’s classifier and the alleged associated signal lines “are collectively a coprocessor because the classifier is designed to operate with the feature extractor and the sequencer (and other components of speech recognizer 3) to calculate distances and performed [sic] other claimed functionality.” Pet. 16.

36. The Petition’s showing falls short of meeting the basic and well-known meaning of “coprocessor,” as taught in the ’789 patent and contemporaneous dictionaries. In particular, the Petition’s theory as to “coprocessor” as set forth with respect to independent claims 1 and 10 never demonstrates (or even alleges) that (1) Smyth’s classifier 34 is not the main CPU; and (2) Smyth’s classifier 34 works in conjunction with another processor that is the main CPU.

37. The Petition’s allegation that the classifier 34 works with “the feature extractor and the sequencer (and other components of speech recognizer 3)” is not evidence that Smyth’s classifier 34 is not the main CPU, or that it works in conjunction with a main CPU, as the definition of “coprocessor” requires. A processor that simply works with other processors and other components of a speech recognition system is not necessarily a coprocessor, because such a processor may in fact be the main CPU, or else may not work in conjunction with a main CPU at all.
VII. The ’789 patent’s requirement that “the calculation apparatus and the acoustic model memory are fabricated on a single integrated circuit”

38. I understand that independent claim 1 and its dependents, as well as claims 11 and 24, require that the claimed “calculating apparatus” and the claimed “acoustic model memory” are “fabricated on a single integrated circuit.”

39. The Petition initially relies on Smyth alone to meet those requirements. Pet. 38-40. According to the Petition, Smyth’s “calculating apparatus” is the classifier processor 341, and Smyth’s “acoustic model memory” is the state memory 342. Pet. 38.

40. Smyth teaches that the classifier processor 341 and the state memory 342 are distinct structures, and does not depict or otherwise describe them as being “fabricated on a single integrated circuit.” Ex. 1005, Fig. 3, 5:54-6:7.

41. The Petition theorizes that because the processor 341 could be a Motorola DSP56000, and because the Motorola DSP56000 had on-board memory, “Smyth teaches or renders obvious a single integrated circuit (IC) on which is fabricated both the classifier processor 341 and state memory 342.” Pet. 38-40 (emphasis added). The Petition provides and relies on Kloker, Ex. 1009, which describes the design details of the Motorola DSP56000.
42. The combined teachings of Kloker and Smyth, however, definitively prove that the onboard memory of the Motorola DSP56000 could not be the state memory 342 of Smyth, and could not serve as the recited “acoustic model memory” of the challenged claims.

43. Kloker teaches that the Motorola DSP56000 had a total memory size of only 512 words (24-bit) of RAM, as shown in Kloker’s Fig. 2 under the labels “x memory” and “y memory” (256 words of RAM in each memory). Ex. 1009 at 4 (Fig. 2) (annotated). Since bytes are 8 bits, each of Kloker’s 24-bit words is 3 bytes. Thus, Kloker’s 512 words of memory amount to only approximately 1.5 kilobytes that could even theoretically be used for storing data structures.
44. While the Motorola DSP56000 also has 2048 words (24-bit) of program ROM, that ROM is read-only memory used for storing executable code and defining the processor operation. The ROM blocks in the “x memory” and “y memory” are likewise programmed with data used for Fourier transforms. These are not modifiable by a programmer, developer, or other user of the DSP processor. Accordingly, the ROM memory could not be used to store the recited “acoustic model data.” In any event, even the entire ROM memory is only approximately 7.5 kilobytes in size, and the entirety of the Motorola DSP56000’s memory is barely 9 kilobytes.
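The RAM, ROM, and total figures above follow from simple arithmetic on Kloker’s word counts. The sketch below is illustrative only; the 256-word sizes assumed for the x and y data-ROM tables are my inference, consistent with the approximately 7.5 kB ROM total stated above.

```python
BYTES_PER_WORD = 24 // 8  # each 24-bit word occupies 3 bytes

ram_words = 256 + 256            # x-memory RAM + y-memory RAM (Kloker Fig. 2)
rom_words = 2048 + 256 + 256     # program ROM + assumed x/y data-ROM tables

ram_bytes = ram_words * BYTES_PER_WORD
rom_bytes = rom_words * BYTES_PER_WORD

print(ram_bytes / 1024)                # 1.5 -> ~1.5 kB of usable RAM
print(rom_bytes / 1024)                # 7.5 -> ~7.5 kB of non-writable ROM
print((ram_bytes + rom_bytes) / 1024)  # 9.0 -> barely 9 kB of on-chip memory
```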
45. By contrast, Smyth teaches that the state memory 342 holds multiple “state field[s]” for each of a “plurality of speech states,” e.g., three states per allophone. Ex. 1005, 5:58-62. As explained below, an ordinary artisan would have understood from that teaching that Smyth’s state memory 342 must have at least 750 kilobytes of data storage capacity—480x more than the usable RAM in the Motorola DSP56000, and 80x more than the combined ROM and RAM in that device.

46. An ordinary artisan would have been able to calculate the approximate storage requirement of Smyth’s state memory 342 because Smyth discloses using what a POSITA would recognize as triphones having three states per allophone, Ex. 1005, 5:58-62, of which there are tens of thousands. The ordinary artisan would have known that even in a resource-constrained system, realistically at least 2000 of the more common allophones (triphones) are used. Each acoustic state requires storage of the mean and variance for each Gaussian for each feature in the vector, which may be of dimension 16-40 (Smyth references a paper that uses 16 features per vector, but more features is more common). Ex. 1005, 5:31-45 (citing “On the Evaluation of Speech Recognisers and Databases using a Reference System,” Chollet & Gagnoulet, 1982 Proc. IEEE, p. 2026, and “On the use of Instantaneous and Transitional Spectral Information in Speaker Recognition,” Soong & Rosenberg, 1988 IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. 36, No. 6, p. 871). Using 16 to 40 features per vector would require 128-320 bytes of memory per state. Thus, if 2000 allophones are used, with three states each (as disclosed by Smyth and commonly used in ASR), the acoustic model would require more than 750 kBytes of memory. This estimate is on the low end. At the time of the patents in question, typical acoustic models had states represented by multiple Gaussians (Gaussian mixtures) and higher-dimensional feature vectors to have acceptable performance. As the ’789 patent itself teaches, such models may occupy many megabytes of storage space. See Ex. 1001, 34:52-66.
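The low-end estimate above can be reproduced arithmetically. The sketch below is illustrative only; the 4-byte size per stored mean or variance value is an assumption consistent with the 128-320 bytes-per-state range given above.

```python
BYTES_PER_VALUE = 4      # assumed storage per mean or variance value
VALUES_PER_FEATURE = 2   # one mean and one variance per feature (single Gaussian)

def state_bytes(num_features):
    """Approximate storage for one acoustic state."""
    return num_features * VALUES_PER_FEATURE * BYTES_PER_VALUE

print(state_bytes(16))  # 128 bytes per state for 16-feature vectors
print(state_bytes(40))  # 320 bytes per state for 40-feature vectors

allophones = 2000            # conservative count of common triphones
states_per_allophone = 3     # three states per allophone, per Smyth
model_bytes = allophones * states_per_allophone * state_bytes(16)
print(model_bytes / 1024)    # 750.0 -> ~750 kB even at the low end
```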
47. Accordingly, the 1.5 kB of onboard RAM in the Motorola DSP56000 (or even the entire 9-kilobyte RAM and ROM memory) could not be the state memory 342 taught in Smyth, and also could not be an “acoustic model memory” as recited in the claims.

48. It necessarily follows that Smyth’s state memory 342 was not fabricated on the same integrated circuit as the Motorola DSP56000, since the design of the integrated circuit of the Motorola DSP56000 is shown in Kloker, and that design does not include any memory that meets the requirements of the state memory 342. Smyth therefore did not teach or suggest that its state memory 342 and processor 341 were fabricated on a single integrated circuit, as claims 1-9, 11, and 24 require.
49. The Petition alternatively proposes a combination of Smyth and Mozer. Pet. 41-44.

50. Mozer’s relevant embodiment teaches a memory 460 consisting of a 4K SRAM. Ex. 1046, 10:20-21. Mozer does not specify whether its 4K SRAM has 4K bits (which would be roughly 512 bytes, or 0.5 kB) or 4K bytes (which would be 4 kB). But even assuming generously that Mozer’s RAM had 4 kB of storage space, that memory size would have been far too small to serve as Smyth’s state memory 342, which would have required more than 750 kB, as explained above. Indeed, in the sections of Mozer cited by the Petition, Mozer only teaches that the memory 460 holds “a first vector representation of the audio signal 425 and a second vector representing a template.” Ex. 1046, 10:17-20. That is a tiny fraction of the information that Smyth teaches as being stored in its state memory 342. Thus, Mozer could not have taught or suggested to an ordinary artisan the idea of fabricating Smyth’s state memory 342 and its classifier processor 341 on a single integrated circuit.
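The two readings of Mozer’s “4K SRAM” discussed above differ as follows; this arithmetic sketch is illustrative only.

```python
KILO = 1024

sram_if_bits = 4 * KILO // 8   # "4K" read as 4,096 bits  -> 512 bytes (~0.5 kB)
sram_if_bytes = 4 * KILO       # "4K" read as 4,096 bytes -> 4 kB

required = 750 * KILO          # low-end size of Smyth's state memory 342

print(sram_if_bits)              # 512
print(sram_if_bytes)             # 4096
print(required / sram_if_bytes)  # 187.5 -> even the generous reading falls
                                 # short by a factor of roughly 190
```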
51. The Petition’s stated motivation for modifying Smyth “to form the classifier processor and associated memory on a single IC” could not possibly have motivated an ordinary artisan to undertake such a modification. The Petition’s articulated motivation is that modifying Smyth in that manner would have allowed Smyth’s recognition system to “offload” vector processing operations associated with audio recognition to Smyth’s audio recognition peripheral, i.e., Smyth’s classifier 34. Pet. 42. But Smyth already offloads vector processing operations associated with audio recognition to its classifier 34, without the Petition’s proposed modification. Pet. 42 (“Smyth already teaches the classifier performing the distance calculations is performed by a processor distinct from the processor for performing feature vector calculation and the processor for performing word identification.”).

52. An ordinary artisan likewise would have known that Smyth, without modification, already achieves the exact benefit the Petition contends the ordinary artisan would have sought to achieve by modifying Smyth. Accordingly, the Petition’s articulated motivation could not possibly have motivated the ordinary artisan to “fabricate classifier processor 341 . . . and state memory 342 . . . on a single integrated circuit.” Pet. 44.
53. The Petition’s proposed modification requires fabricating an integrated circuit. Pet. 42. Fabricating such a circuit would have been well beyond the skill level of the Petition’s stated person of ordinary skill. According to the Petition, the person of ordinary skill in this context would have had a master’s degree in computer engineering, computer science, electrical engineering, or a related field, along with two to four years of work experience in “speech recognition.” Pet. 3. It is my opinion that a person with those qualifications would not have been able to undertake the highly complex task of fabricating a new integrated circuit.
54. The Petition’s assertion that others in the past had succeeded in fabricating integrated circuits, even those allegedly with a calculating apparatus and an acoustic model memory, Pet. 43-44 (citing Toyoda), would have been irrelevant to the ordinary artisan’s expectation of success, because the ordinary artisan as defined by the Petition did not have the capability to achieve that feat. Accordingly, the ordinary artisan would not have had a reasonable expectation of success with respect to the Petition’s proposed modification.

55. I hereby declare that all statements made herein of my own knowledge are true and that all opinions expressed herein are my own; and further that these statements were made with the knowledge that willful false statements and the like are punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of the United States Code.

Executed on March 15, 2023

______________________________
David Anderson, Ph.D.
School of Electrical and Computer Engineering
Georgia Institute of Technology
770-883-0708
anderson@gatech.edu

David V. Anderson
March 1, 2023

Earned Degrees
1999 Ph.D. in Electrical and Co