UNITED STATES PATENT AND TRADEMARK OFFICE
___________________

BEFORE THE PATENT TRIAL AND APPEAL BOARD
___________________

INTEL CORPORATION
Petitioner

v.

HEALTH DISCOVERY CORPORATION
Patent Owner
___________________

Case IPR2021-00549
Patent 7,117,188
___________________

PETITION FOR INTER PARTES REVIEW OF
U.S. PATENT 7,117,188

Mail Stop PATENT BOARD
Patent Trial and Appeal Board
U.S. Patent & Trademark Office
P.O. Box 1450
Alexandria, VA 22313–1450

TABLE OF CONTENTS

I. Introduction .......................................................... 1
II. Grounds for Standing ................................................. 1
III. Identification of Challenge ......................................... 1
     A. Citation of prior art ............................................ 1
     B. Grounds for Challenge ............................................ 5
IV. The ’188 patent ...................................................... 5
     A. Technology Background ............................................ 5
        1. Machine Learning .............................................. 5
        2. SVMs .......................................................... 7
        3. Feature Selection ............................................. 12
     B. The ’188 patent .................................................. 16
     C. Priority Date .................................................... 18
     D. Claim Construction ............................................... 21
     E. Level of Ordinary Skill in the Art ............................... 21
V. GROUND 1: Combination of Mukherjee and Platt renders claims 1-10 and
   13-23 obvious. ........................................................ 21
     A. Combination Overview ............................................. 21
        1. Mukherjee ..................................................... 21
        2. Platt ......................................................... 23
        3. Motivation to Combine ......................................... 24
     B. Independent Claim 1 .............................................. 27
        1. Preamble [1P] ................................................. 27
        2. “Inputting” limitation [1A] ................................... 29
        3. “Optimizing the plurality of weights” limitation [1B] ......... 35
        4. “Computing” limitations [1C] .................................. 38
        5. “Eliminating” limitation [1D] ................................. 40
        6. “Repeating steps” limitation [1E] ............................. 41
        7. “Inputting … a live set of data” limitation [1F] .............. 45
     C. Independent claims 13 and 19 ..................................... 47
        1. Preambles [13P]/[19P] ......................................... 48
        2. “Optimum subset of features” limitations [13E]/[19E] .......... 49
     D. Dependent Claims 2-10, 14-18, 20-23 .............................. 50
        1. Soft Margin SVM: Claim 2 ...................................... 50
        2. Ranking Criterion: Claim 3 .................................... 52
        3. Quadratic Decision Function: Claim 4 .......................... 53
        4. Feature Elimination: Claims 5-7, 14-16, and 21-23 ............. 55
        5. Gene Expression Data: Claim 8 ................................. 57
        6. Pre-Processing: Claims 9 and 18 ............................... 58
        7. New SVM: Claims 10, 17, and 20 ................................ 59
VI. GROUND 2: Combination of Mukherjee, Platt, and Kohavi renders claims
    1-10 and 13-23 obvious. .............................................. 59
     A. Independent claims 1, 13, and 19 ................................. 61
     B. Motivation to Combine ............................................ 67
VII. GROUND 3: Combination of Mukherjee, Platt, Kohavi, and Cortes renders
     claim 2 obvious. .................................................... 69
VIII. GROUND 4: Combination of Mukherjee, Platt, Kohavi, and Castelli
      renders claims 11-12 obvious. ...................................... 72
     A. Combination Overview ............................................. 72
     B. Dependent Claim 11 ............................................... 75
        1. “Pre-processing” limitation [11A] ............................. 75
        2. “Selecting a cluster center” and “using the cluster centers”
           limitations [11B]/[11C] ....................................... 77
     C. Dependent Claim 12 ............................................... 78
IX. The Board Should Reach the Merits of This Petition ................... 79
     A. Evidence Weighs Against Fintiv-based Discretionary Denial ........ 79
     B. Interference Estoppel Does Not Apply or Preclude Review .......... 81
X. Mandatory notices (37 C.F.R. § 42.8(b)) ............................... 82
     A. Real Party-in-Interest ........................................... 82
     B. Related Matters .................................................. 82
     C. Lead and Backup Counsel .......................................... 82
XI. Conclusion ........................................................... 83

EXHIBIT LIST

INTEL Exhibit No.   Description

1001   U.S. Patent 7,117,188 to Isabelle Guyon et al. (“’188 Patent”)
1002   File History for U.S. Patent 7,117,188 (“’188 FH”)
1003   Declaration of Theodoros Evgeniou, Ph.D. in Support of IPR Petition
1004   Curriculum Vitae of Theodoros Evgeniou, Ph.D.
1005   Mukherjee et al., Support Vector Machine Classification of Microarray Data, Technical Report C.B.C.L. Paper No. 182, A.I. Memo No. 1677, M.I.T. (1998) (“Mukherjee”)
1006   U.S. Patent 6,327,581 to Platt, filed April 6, 1998 and issued December 4, 2001 (“Platt”)
1007   Kohavi et al., Wrappers for feature subset selection, Artificial Intelligence 97, 273-324 (1997) (“Kohavi”)
1008   U.S. Patent 5,649,068 to Boser et al., filed May 16, 1996 and issued July 15, 1997 (“Boser”)
1009   Hocking et al., Selection of the Best Subset in Regression Analysis, Technometrics, 9:4, 531-540 (1967) (“Hocking”)
1010   Cristianini, N., et al., An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, Cambridge University Press (2000) (“Cristianini”)
1011   Cortes, C., et al., Support-Vector Networks, Machine Learning, 20, 273-297 (1995) (“Cortes”)
1012   U.S. Patent 6,122,628 to Castelli et al., filed October 31, 1997 and issued September 19, 2000 (“Castelli”)
1013   Saunders, C., et al., Support Vector Machine Reference Manual, Department of Computer Science, Royal Holloway, CSD-TR-98-03 (1998) (“Saunders”)
1014   Burros, R.H., Three Rational Methods for Reduction of Skewness, Psychological Bulletin, Vol. 48, No. 6, 505-511 (1951) (“Burros”)
1015   Bradley, Paul, Mathematical Programming Approaches to Machine Learning and Data Mining, University of Wisconsin-Madison (Aug. 27, 1998) (“Bradley”)
1016   Samuel, A.L., Some Studies in Machine Learning Using the Game of Checkers, IBM Journal (1959)
1017   Burges, C., A Tutorial on Support Vector Machines for Pattern Recognition, Kluwer Acad. Pub., Boston (1998) (“Burges”)
1018   da Silva, F., Notes on Support Vector Machine, INESC (Nov. 1998) (“da Silva”)
1019   Hamaker, H.C., On Multiple Regression Analysis (March 1962) (“Hamaker”)
1020   Rendell, Larry, et al., The Feature Selection Problem: Traditional Methods and a New Algorithm, AAAI-92 Proceedings (1992) (“Rendell”)
1021   Aha, David W., et al., A Comparative Evaluation of Sequential Feature Selection Algorithms, AI & Statistics Workshop (1995) (“Aha”)
1022   Golub, T.R., et al., Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring, Science, Vol. 286 (Oct. 15, 1999) (“Golub”)
1023   Mitchell, T., Machine Learning, McGraw-Hill, Inc. (1997) (“Mitchell”)
1024   Herbrich, R., Learning Kernel Classifiers: Theory and Algorithms (Adaptive Computation and Machine Learning), The MIT Press (2001)
1025   Boser et al., A Training Algorithm for Optimal Margin Classifiers, Computational Learning Theory, 144-152 (July 1992) (“Boser Article”)
1026   Cochran, W.G., The Omission or Addition of an Independent Variate in Multiple Linear Regression, Wiley for the Royal Statistical Society, Vol. 5, No. 2 (1938) (“Cochran”)
1027   Oosterhoff, J., On the selection of independent variables in a regression equation, Preliminary Report S319 (VP23), Stichting Mathematisch Centrum, Amsterdam (1963) (“Oosterhoff”)
1028   Furnival, G.M., and Wilson, R.W., Regression by leaps and bounds, Technometrics 16, 499-511 (1974) (“Furnival”)
1029   Osuna, E., et al., Support Vector Machines: Training and Applications, MIT C.B.C.L. Paper No. 144 (March 1997)
1030   Declaration of Sylvia D. Hall-Ellis, Ph.D. and Curriculum Vitae
1031   Website: Archive Publications - Theory of Learning, https://web.archive.org/web/20000308145521/http://www.ai.mit.edu/projects/cbcl/publications/theory-learning.html
1032   Website: MIT AI Lab Projects and Research Groups, https://web.archive.org/web/19990221235902/http://www.ai.mit.edu/projects/
1033   Website: MIT CBCL, https://web.archive.org/web/20000418092038/http://www.ai.mit.edu/projects/cbcl/publications/index-pubs.html
1034   Mukherjee, S., et al., Support Vector Machine Classification of Microarray Data, Artificial Intelligence Lab and Center for Biological and Computational Learning, MIT (May 2000)
1035   Reserved
1036   Reserved
1037   Reserved
1038   MARC Record for Kohavi (INTEL-1007) in Karl F. Wendt Engineering Library at the University of Wisconsin – Madison
1039   MARC Record for journal Artificial Intelligence from OCLC bibliographic database
1040   MARC Record for journal Technometrics at Linda Hall Library
1041   Library of Congress subject heading sh2008112270
1042   Library of Congress subject heading sh2008110286
1043   Library of Congress subject heading sh85046441
1044   MARC Record for journal Technometrics from OCLC bibliographic database
1045   MARC Record for Cristianini in Library of Congress
1046   MARC Record for Cristianini from OCLC bibliographic database
1047   Library of Congress subject heading sh2008009003
1048   Library of Congress subject heading sh85072061
1049   MARC Record for Machine Learning in Karl F. Wendt Engineering Library
1050   Library of Congress subject heading sh85079324
1051   Library of Congress subject heading sh85099890
1052   Library of Congress subject heading sh2007101478
1053   MARC record for the journal Machine Learning from the OCLC bibliographic database
1054   Technical Reports (Selection), entry 6 for Saunders
1055   MARC record for the journal Psychological Bulletin at the University of Wisconsin – Madison Libraries
1056   Library of Congress subject heading sh2010108771
1057   MARC record for the journal Psychological Bulletin obtained from the OCLC bibliographic database
1058   Online catalog record for Bradley from the University of Wisconsin – Madison Library
1059   Metadata record for the Bradley dissertation in the digital collection
1060   MARC record for the doctoral dissertation, Mathematical Programming Approaches to Machine Learning and Data Mining, by Bradley obtained from the OCLC bibliographic database
1061   Stamped version of INTEL-1011 (Cortes)
1062   Stamped version of INTEL-1014 (Burros)
1063   Stamped version of INTEL-1007 (Kohavi)
1064   Stamped version of INTEL-1009 (Hocking)
1065   ACM Digital Library entry showing 1999 publication of INTEL-1010
1066   Excerpts from Health Discovery Corp. v. Intel Corp., Civil Action 6:20-cv-00666-ADA, Preliminary Infringement Contentions and Exhibits 1-4, served December 1, 2020
1067   Health Discovery Corp. v. Intel Corp., Civil Action 6:20-cv-00666-ADA, Scheduling Order, Dkt. No. 27 (W.D. Tex. Dec. 21, 2020)
1068   United States District Court, Western District of Texas, General Order Regarding Emergency Procedures Authorized by the Coronavirus Aid, Relief, dated Mar. 30, 2020
1069   United States District Court, Western District of Texas, Seventh Supplemental Order Regarding Court Operations Under the Exigent Circumstances Created by the COVID-19, dated Aug. 6, 2020
1070   United States District Court, Western District of Texas, Thirteenth Supplemental Order Regarding Court Operations Under the Exigent Circumstances Created by the COVID-19, dated Feb. 2, 2021
1071   Eric Q. Li et al. v. Jason Weston et al., Patent Interference No. 106,066-JTM, Li Substantive Motion 1, Paper 20 (PTAB Jan. 23, 2017)
1072   Eric Q. Li et al. v. Jason Weston et al., Patent Interference No. 106,066-JTM, Decision on Motions, Paper 148 (PTAB Feb. 27, 2019)
1073   Reserved
1074   Reserved
1075   Reserved
1076   Reserved
1077   Reserved
1078   Reserved
1079   U.S. Patent 6,658,395 to Barnhill, filed May 24, 2000 and issued Dec. 2, 2003 (“’395 patent”)
1080   U.S. Patent 6,128,608 to Barnhill, filed May 1, 1999 and issued Oct. 3, 2000 (“’608 patent”)
1081   Reserved
1082   U.S. Patent 6,427,141 to Barnhill, filed May 9, 2000 and issued July 30, 2002 (“’141 patent”)
1083   Reserved
1084   Reserved
1085   U.S. Patent 6,882,990 to Barnhill et al., filed Aug. 7, 2000 and issued April 19, 2005 (“’990 patent”)
1086   U.S. Provisional 60/083,961

I. Introduction

Intel Corporation (“Petitioner”) requests inter partes review of claims 1-23 (the “challenged claims”) of U.S. Patent 7,117,188 (“the ’188 patent”; INTEL-1001). The ’188 patent is directed to the use of support vector machines (“SVMs”) and recursive feature elimination (“RFE”) to identify patterns in data. (INTEL-1001, Abstract.) As such, the challenged claims merely combine a well-known machine-learning algorithm, SVM, with a known feature-selection technique, RFE. The Petition, supported by the Declaration of Dr. Theodoros Evgeniou, demonstrates that the challenged claims are unpatentable.

II. Grounds for Standing

Petitioner certifies the ’188 patent is available for inter partes review and Petitioner is not barred or estopped from requesting inter partes review.

III. Identification of Challenge

A. Citation of prior art

In its preliminary infringement contentions, Patent Owner (“PO”) contends the priority date for the ’188 patent is May 1, 1999. (INTEL-1066, Ex. 1, 1.) Contrary to PO’s assertions, the ’188 patent is not entitled to the benefit of U.S. Patent 6,128,608 (“the ’608 patent”; INTEL-1080), filed May 1, 1999, for the reasons discussed in Section IV.C. The priority date for the ’188 patent is no earlier than the filing date of U.S. Patent 6,882,990, August 7, 2000. (INTEL-1085.)

The Grounds cite the following references, each published or filed before August 7, 2000.

“Support Vector Machine Classification of Microarray Data” to Mukherjee et al. (“Mukherjee”; INTEL-1005) is prior art under pre-AIA 35 U.S.C. §102(a). Mukherjee was publicly accessible by at least December 1999 through MIT’s Artificial Intelligence (“AI”) Laboratory, Center for Biological and Computational Learning (“CBCL”). (See INTEL-1030, ¶¶42-44.) Mukherjee was listed on the CBCL website[1] as a 1999 publication.[2] (See INTEL-1031 (website listing publications, archived March 8, 2000).) CBCL was a research group within MIT’s AI Laboratory focused on “the problem of learning within a multi-disciplinary approach, in the areas of theory, engineering application and neuroscience.” (INTEL-1032 (website listing research projects and groups within MIT’s AI Laboratory, archived February 21, 1999).) MIT was recognized prior to August 2000 as a leading research institution in AI/machine learning. (INTEL-1003, ¶87.) An individual or person of ordinary skill in the art (“POSITA”) interested in machine learning prior to August 2000 would have known about MIT’s AI Laboratory and CBCL and would have been able to locate the MIT CBCL website. (Id.) CBCL further indexed its publications by category. (See INTEL-1033 (website listing categories of publications, archived April 18, 2000).) Mukherjee was made available in the “Theory of Learning” category. (See INTEL-1031 (website archived March 8, 2000).) Mukherjee further provides a website on its front page through which any member of the public can access the publication without restriction. (See INTEL-1005, 1.) Thus, Mukherjee “was disseminated or otherwise made available to the extent that persons interested and ordinarily skilled in the subject matter or art exercising reasonable diligence[] can locate it” prior to August 7, 2000. See Medtronic, Inc. v. Barry, 891 F.3d 1368, 1380 (Fed. Cir. 2018).

[1] www.ai.mit.edu/projects/cbcl/publications/theory-learning.html.
[2] Mukherjee bears a copyright date of 1998.

A May 2000 paper cites Mukherjee, further evidencing its public availability. (See INTEL-1034.) Finally, PO cited Mukherjee as prior art in the ’188 patent. (See INTEL-1002, 280, 321.)

U.S. Patent 6,327,581 to Platt (“Platt”; INTEL-1006), filed April 6, 1998, is prior art under pre-AIA 35 U.S.C. §102(e).

“Wrappers for feature subset selection” to Kohavi et al. (“Kohavi”; INTEL-1007) is prior art under pre-AIA 35 U.S.C. §102(b). Kohavi was published in Volume 97, Issues 1-2 of the journal Artificial Intelligence. (INTEL-1030, ¶45.) This volume of Artificial Intelligence was catalogued, indexed, and made publicly accessible through the Karl F. Wendt Engineering Library at the University of Wisconsin–Madison by December 31, 1997. (INTEL-1030, ¶¶46-48.) Further, Kohavi was catalogued and indexed at the University of Minnesota Libraries and made part of the OCLC bibliographic database by December 31, 1997, further evidencing its public availability. (See INTEL-1030, ¶¶49-50; see also INTEL-1063 (stamped version from Linda Hall Library).)

“Support-Vector Networks” to Cortes et al. (“Cortes”; INTEL-1011) is prior art under pre-AIA 35 U.S.C. §102(b). Cortes was published in Volume 20, Number 3 of the journal Machine Learning. This volume of Machine Learning was catalogued, indexed, and made publicly available through the Karl F. Wendt Engineering Library at the University of Wisconsin–Madison by October 5, 1995. (INTEL-1030, ¶¶62-67; see also INTEL-1061 (stamped version of Cortes from UW-Madison).)

U.S. Patent 6,122,628 to Castelli et al. (“Castelli”; INTEL-1012), filed October 31, 1997, is prior art under pre-AIA 35 U.S.C. §102(e).

These references were not applied by the Examiner in a rejection or substantively discussed during prosecution of the ’188 patent. (See INTEL-1002, 315.)


B. Grounds for Challenge

Ground   Basis   Combination                           Claims
1        §103    Mukherjee, Platt                      1-10, 13-23
2        §103    Mukherjee, Platt, Kohavi              1-10, 13-23
3        §103    Mukherjee, Platt, Cortes              2
4        §103    Mukherjee, Platt, Kohavi, Cortes      2
5        §103    Mukherjee, Platt, Castelli            11-12
6        §103    Mukherjee, Platt, Kohavi, Castelli    11-12

IV. The ’188 patent

The ’188 patent is directed to an SVM implementing RFE for feature selection. (INTEL-1001, Abstract.) As detailed in the Technology Background and acknowledged in the ’188 patent, SVMs were well-known machine-learning classifiers. RFE is simply an iterative, backwards feature-selection technique used in statistical methods and machine learning long before the ’188 patent.

A. Technology Background

1. Machine Learning

Machine learning gives computers the ability to learn without being explicitly programmed. (INTEL-1016, 210-229.) Machine learning algorithms can be categorized as supervised (like the ’188 patent) or unsupervised. In supervised learning, a sample of input-output pairs, $(x_1, y_1), \ldots, (x_m, y_m)$, referred to as “training data” (illustrated below), is provided to the algorithm. The values $x_i$ are referred to as observations/patterns and the values $y_i$ as labels/classifications. “Classification” is a common supervised learning problem in which outputs are discrete values (e.g., 0 or 1). (INTEL-1010, 2.) In the example below, training data represented by solid circles classifies as positive (e.g., 1) and by open circles classifies as negative (e.g., 0). (INTEL-1003, ¶31.)

[Figure: training data plotted as solid circles (positive) and open circles (negative).]

The underlying function mapping inputs to outputs is referred to as a decision function for classification problems. (INTEL-1010, 2.) Using the decision function, the learning algorithm generalizes to new, unseen data points; i.e., given a new data pattern, x (i.e., “live data”), the learning algorithm predicts the corresponding label, y (classification). As shown below, many functions (represented by lines) can be used to separate (classify) data. The goal is to optimize the “fit” of the function to maximize predictive accuracy. (INTEL-1003, ¶¶32-33.)

[Figure: several candidate separating functions (lines) drawn through the same training data.]

The selected function, however, can be subject to overfitting, caused by a decision function that fits the available data but does not generalize well to predict new data. (INTEL-1003, ¶34.) Overfitting usually indicates the selection of an overly complex decision function having too many features. (Id.)
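
For illustration only, the supervised-learning workflow just described can be sketched in a few lines of Python. The data values and the choice of the scikit-learn library are assumptions of this sketch, not drawn from the record or any exhibit:

    # Illustration only: supervised classification on invented training data.
    import numpy as np
    from sklearn.svm import SVC

    # Training data: observations/patterns x_i with labels y_i (1 = positive, 0 = negative).
    X_train = np.array([[1.0, 2.0], [2.0, 1.5], [6.0, 7.0], [7.5, 6.5]])
    y_train = np.array([0, 0, 1, 1])

    # Learn a decision function from the input-output pairs.
    clf = SVC(kernel="linear").fit(X_train, y_train)

    # Given a new, unseen ("live") pattern x, predict the corresponding label y.
    x_live = np.array([[6.5, 6.0]])
    print(clf.predict(x_live))  # expected: [1]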

2. SVMs

SVMs, introduced in 1992, are a set of supervised learning methods used for classification. (INTEL-1003, ¶35.) SVMs use “a hypothesis [or ‘primal’] space of linear functions in a high dimensional feature [or ‘dual’] space, trained with a learning algorithm from optimization theory that implements a learning bias derived from statistical learning theory.” (INTEL-1010, 7.) SVMs therefore belong to the family of generalized linear classifiers. (INTEL-1003, ¶35.)

The decision boundary created by an SVM is called a hyperplane (labeled H0), separating positive from negative samples/observations. (INTEL-1017, 8.) The distance from the separating hyperplane (H0) to the closest sample/observation is the margin, illustrated below for a linearly separable case. (Id.) A pair of hyperplanes (H1, H2), parallel to the separating hyperplane, gives the maximum margin. Training points that lie on one of these hyperplanes (H1 or H2) are called support vectors (indicated with red circles). (Id.)

[Figure: separating hyperplane H0 with parallel margin hyperplanes H1 and H2; support vectors circled.]

A linear SVM fits the widest possible margin between output classes by predicting the output class (y) using the decision function:

$$\mathbf{w}^{T} \cdot \mathbf{x} + b = w_1 x_1 + \cdots + w_n x_n + b$$

where w is the weight vector,[3] x is the input vector (or pattern) having components (features) $x_1, x_2, \ldots, x_n$, and b is a constant (bias). (INTEL-1003, ¶37.) The linear combination of features in the input vector, x, and weights in the weight vector, w, adjusting for bias, predicts the output, y. (Id.)

[3] $\mathbf{w}^{T}$ denotes the transpose of the weight vector matrix.
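
As a concrete numerical illustration of this decision function (the weight, bias, and pattern values below are invented for the example, not taken from the record), the weighted sum can be computed directly:

    # Illustration only: evaluating the linear decision function w.x + b.
    import numpy as np

    w = np.array([0.5, -1.2, 0.3])  # weight vector w, one weight per feature
    b = 0.1                         # bias term b
    x = np.array([1.0, 0.2, 2.0])   # input pattern x with features x1, x2, x3

    score = np.dot(w, x) + b        # w1*x1 + w2*x2 + w3*x3 + b = 0.96
    y = 1 if score >= 0 else 0      # the sign of the score gives the predicted class
    print(score, y)                 # 0.96 1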

In the case of linearly separable data, the following inequality-constrained equations can be satisfied:

$$\mathbf{x}_i \cdot \mathbf{w} + b \ge +1 \quad \text{for } y_i = +1$$
$$\mathbf{x}_i \cdot \mathbf{w} + b \le -1 \quad \text{for } y_i = -1$$

(INTEL-1003, ¶39.) SVMs use optimization (maximizing the margin) to construct the optimal hyperplane:

$$\mathbf{w}_0 \cdot \mathbf{x} + b_0 = 0.$$

(Id.; see also INTEL-1011, 278, 291.) SVM theory dictates that the hyperplane providing the maximum margin can be determined by minimizing $\|\mathbf{w}\|^2$. (INTEL-1003, ¶39.) Thus, finding the optimal hyperplane is an optimization problem that can be solved by optimization techniques.

Numerous optimization techniques were known prior to the ’188 patent. (INTEL-1003, ¶¶47-54.) “Dual optimization” provides an alternative formulation of a mathematical problem that is computationally easier to solve. (INTEL-1003, ¶48.) In dual optimization, an original parameter (e.g., w) is replaced by a new parameter (e.g., α) to derive the alternative formulation. (Id.) Lagrange multipliers (i.e., α) are one example of a dual optimization technique. (Id.; see also INTEL-1011, 291.) Lagrange multipliers can be used to find an extreme value of a function f(x) subject to a constraint g(x). In the case of an SVM with $f(x) = \|\mathbf{w}\|^2$ and $g(x) = y_i(\mathbf{x}_i \cdot \mathbf{w} + b) - 1$, the resulting Lagrangian is:

$$\frac{1}{2}\,\mathbf{w} \cdot \mathbf{w} - \sum_{i=1}^{l} \alpha_i \left[ y_i(\mathbf{x}_i \cdot \mathbf{w} + b) - 1 \right] \qquad (1)$$

(INTEL-1003, ¶49; see also INTEL-1011, 291-92.) In constructing this “dual” equation, the following additional constraints on equation (1) must be used to account for requirements of the “primal” (original) problem:

$$\mathbf{w}_0 = \sum_{i=1}^{l} \alpha_i y_i \mathbf{x}_i \qquad (2)$$

$$\sum_{i} \alpha_i y_i = 0 \qquad (3)$$

where l is the number of samples/observations x. (INTEL-1003, ¶49; see also INTEL-1011, 291.) Only the input patterns (training vectors) $\mathbf{x}_i$ with $\alpha_i > 0$ contribute to the optimal hyperplane $\mathbf{w}_0$. The hyperplane (i.e., $\mathbf{w}_0 \cdot \mathbf{x} + b = 0$) is therefore defined by the linear combination of only a subset of the initial input pattern vectors: the “support vectors.” (INTEL-1003, ¶51.)

As can be seen from constraint (2), a relationship exists between feature weight vectors, $\mathbf{w}$, and pattern weight vectors, $\boldsymbol{\alpha}$, such that one can be mathematically derived and calculated from the other. (INTEL-1003, ¶¶52-54.) Specifically, substituting constraints (2) and (3) into equation (1) results in the following:

$$\sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j \, \mathbf{x}_i \cdot \mathbf{x}_j \qquad (4)$$

(INTEL-1003, ¶54; see also INTEL-1011, 291-92.) A POSITA would recognize that the dot product, $\mathbf{x}_i \cdot \mathbf{x}_j$, is a linear kernel of the form $K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i \cdot \mathbf{x}_j$. (INTEL-1003, ¶54; see also INTEL-1025.) Accordingly, in the linear kernel case, the “dual” decision function, $d(\mathbf{x}) = \sum_{k=1}^{n} \alpha_k K(\mathbf{x}, \mathbf{x}_k) + b$, becomes $d(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b$ in “primal” space, with $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$. (INTEL-1003, ¶54.)
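
This primal-dual relationship can be checked numerically. The sketch below is for illustration only: the data are invented, and the attribute names dual_coef_ (which holds the products α_i·y_i) and support_vectors_ are scikit-learn conventions rather than anything in the record. It recovers the primal weight vector w from the dual solution per constraint (2):

    # Illustration only: with a linear kernel, w = sum_i alpha_i * y_i * x_i
    # over the support vectors (constraint (2) above).
    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[1.0, 1.0], [2.0, 0.5], [4.0, 4.0], [5.0, 3.5]])
    y = np.array([-1, -1, 1, 1])

    svm = SVC(kernel="linear", C=1e6).fit(X, y)  # large C approximates a hard margin

    # scikit-learn stores alpha_i * y_i in dual_coef_ and the x_i in support_vectors_.
    w_from_dual = svm.dual_coef_ @ svm.support_vectors_
    print(w_from_dual)  # matches the primal weight vector below
    print(svm.coef_)    # w as computed by the library for the linear kernel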

Many data sets, however, are not linearly separable. One solution to solve non-linear classification problems with linear classifiers is to generate new features from the input data such that the input patterns become linearly separable in expanded “dual” (or feature) space. (INTEL-1018, 19.) Feature expansion requires explicit translation of training patterns from low-dimensional space to higher-dimensional space where the problem is linearly separable, as shown below. (INTEL-1018, 29.) SVMs use kernel functions (e.g., linear, polynomial, gaussian, and sigmoidal) to specify this non-linear transformation. (INTEL-1018, 30-32; INTEL-1003, ¶¶55-59.)

[Figure: non-linearly separable input patterns mapped to a higher-dimensional feature space where they become linearly separable.]
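
A short sketch can make the kernel idea concrete (illustration only; the XOR-style data below are invented). No line separates the four points in the input space, but a polynomial kernel implicitly maps them to a higher-dimensional feature space where a separating hyperplane exists:

    # Illustration only: a non-linear classification problem solved with a kernel.
    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
    y = np.array([0, 0, 1, 1])  # XOR labels: not linearly separable in two dimensions

    # A degree-2 polynomial kernel induces features (including x1*x2) in which
    # the patterns become linearly separable.
    svm = SVC(kernel="poly", degree=2, coef0=1.0, C=1e6).fit(X, y)
    print(svm.predict(X))  # expected: [0 0 1 1]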

Even in scenarios where it is possible to completely separate datasets, better generalization may be achieved when errors are permitted in the hypothesis. (INTEL-1003, ¶60; INTEL-1011, 286.) This type of SVM is known as a soft margin SVM. (INTEL-1003, ¶60; INTEL-1011, 280.) In soft margin SVMs, new variables, referred to as “slack variables,” are introduced in the optimization problem to account for errors. (INTEL-1003, ¶60; INTEL-1010, Ch. 6.1.2.)
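
A brief sketch shows the practical effect (illustration only; the data are generated at random, and the regularization parameter C is scikit-learn's handle on the slack/margin trade-off rather than anything cited in the record):

    # Illustration only: soft vs. (approximately) hard margin on overlapping classes.
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, (20, 2)),   # class 0
                   rng.normal(2.0, 1.0, (20, 2))])  # class 1, overlapping class 0
    y = np.array([0] * 20 + [1] * 20)

    soft = SVC(kernel="linear", C=0.1).fit(X, y)  # small C: more slack, errors tolerated
    hard = SVC(kernel="linear", C=1e6).fit(X, y)  # large C: errors heavily penalized
    # A softer margin typically retains more support vectors on overlapping data.
    print(len(soft.support_), len(hard.support_))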

3. Feature Selection

A machine learning dataset is generally comprised of rows (samples/patterns) and columns, with a set of columns associated with sample features and one column associated with the output/label. For example, the training set in an exemplary SVM predicting people most likely to buy lottery tickets consists of known information of various people (top table) plotted as data points (bottom table):

Known Information

           Age   Salary     No. of Children   Gender   Married   Buy
Person1    40    $50,000    0                 M        Y         N
Person2    60    $500,000   1                 F        Y         Y
Person3    28    $30,000    5                 M        N         Y
Person4    37    $150,000   3                 F        N         N
Person5    65    $30,000    3                 F        Y         Y

Data Points

           Feature1   Feature2   Feature3   Feature4   Feature5   Buy
Person1    0.4        0.1        0          0          1          0
Person2    0.6        1          0.2        1          1          1
Person3    0.28       0.06       1          0          0          1
Person4    0.37       0.3        0.6        1          0          0
Person5    0.65       0.06       0.6        1          1          1

The first 5 columns are features and the last column is the classification. In this exemplary dataset, features can have continuous values (e.g., age) or can be binary (e.g., marital status). The data in the bottom table has been converted (pre-processed) into a usable format (e.g., 0/1 versus N/Y) and scaled so feature values are comparable (e.g., $500,000 scaled to 1).
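
This pre-processing step can be sketched as follows (illustration only; the scaling conventions, dividing age by 100 and the other numeric features by their maxima, are inferred from the example values in the tables above):

    # Illustration only: converting the "Known Information" table into the
    # scaled "Data Points" table -- Y/N becomes 1/0 and numeric features are
    # scaled so their values are comparable.
    import numpy as np

    ages     = np.array([40, 60, 28, 37, 65])
    salaries = np.array([50_000, 500_000, 30_000, 150_000, 30_000])
    children = np.array([0, 1, 5, 3, 3])
    married  = np.array(["Y", "Y", "N", "N", "Y"])

    age_scaled      = ages / 100                    # 40 -> 0.4, as in the table
    salary_scaled   = salaries / salaries.max()     # $500,000 -> 1.0; $50,000 -> 0.1
    children_scaled = children / children.max()     # 5 -> 1.0; 1 -> 0.2
    married_binary  = (married == "Y").astype(int)  # Y/N -> 1/0

    print(age_scaled, salary_scaled, children_scaled, married_binary, sep="\n")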

Feature selection selects the most relevant features to improve predictability of new classifications. The terms “feature elimination” and “feature selection” both refer to the process of starting from a set of features and ending up with a smaller subset: “selecting” the features included in the subset and “eliminating” the remainder. (INTEL-1003, ¶64.)
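
For orientation, the backwards elimination loop that RFE performs with a linear SVM can be sketched in a few lines (illustration only; the data are invented, and the rank-by-squared-weight criterion follows the general description of RFE above rather than any particular exhibit):

    # Illustration only: recursive feature elimination (RFE) with a linear SVM --
    # train, rank features by squared weight w_i^2, eliminate the lowest-ranked
    # feature, and repeat until a small subset of features remains.
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 5))                   # 30 samples, 5 candidate features
    y = (X[:, 0] + 2.0 * X[:, 2] > 0).astype(int)  # only features 0 and 2 matter here

    remaining = list(range(X.shape[1]))
    while len(remaining) > 2:
        svm = SVC(kernel="linear").fit(X[:, remaining], y)
        weights = svm.coef_.ravel()
        worst = int(np.argmin(weights ** 2))       # feature with the smallest |w_i|
        remaining.pop(worst)                       # eliminate it and retrain

    print(remaining)  # surviving feature indices; expected to include 0 and 2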

a. History of Feature Selection

Feature elimination, individually or by feature groups, has been considered by various communities for at least four decades, including the “classical statistics” community, focusing on the widely used statistical method of linear regression; the “traditional” (less statistical) machine learning community; and the “statistical” machine learning community. The ’188 patent belongs to the “statistical” machine learning community.
