`
`
`UNITED STATES PATENT AND TRADEMARK OFFICE
`___________________
`
`BEFORE THE PATENT TRIAL AND APPEAL BOARD
`___________________
`
`INTEL CORPORATION
`Petitioner
`
`
`v.
`
`
`
`HEALTH DISCOVERY CORPORATION
`Patent Owner
`___________________
`
`Case IPR2021-00549
`Patent 7,117,188
`___________________
`
`PETITION FOR INTER PARTES REVIEW OF
`U.S. PATENT 7,117,188
`
`
`
`
`
`
`
`Mail Stop PATENT BOARD
`Patent Trial and Appeal Board
`U.S. Patent & Trademark Office
`P.O. Box 1450
`Alexandria, VA 22313–1450
`
`
`
`
`
`
`
`
`
`
`Petition for IPR of U.S. 7,117,188
`IPR2021-00549
`
`
TABLE OF CONTENTS

I.    Introduction
II.   Grounds for Standing
III.  Identification of Challenge
      A. Citation of prior art
      B. Grounds for Challenge
IV.   The ’188 patent
      A. Technology Background
         1. Machine Learning
         2. SVMs
         3. Feature Selection
      B. The ’188 patent
      C. Priority Date
      D. Claim Construction
      E. Level of Ordinary Skill in the Art
V.    GROUND 1: Combination of Mukherjee and Platt renders claims 1-10 and 13-23 obvious.
      A. Combination Overview
         1. Mukherjee
         2. Platt
         3. Motivation to Combine
      B. Independent Claim 1
         1. Preamble [1P].
         2. “Inputting” limitation [1A].
         3. “Optimizing the plurality of weights” limitation [1B].
         4. “Computing” limitations [1C].
         5. “Eliminating” limitation [1D].
         6. “Repeating steps” limitation [1E].
         7. “Inputting … a live set of data” limitation [1F].
      C. Independent claims 13 and 19
         1. Preambles [13P]/[19P]
         2. “Optimum subset of features” limitations [13E]/[19E]
      D. Dependent Claims 2-10, 14-18, 20-23
         1. Soft Margin SVM: Claim 2
         2. Ranking Criterion: Claim 3
         3. Quadratic Decision Function: Claim 4
         4. Feature Elimination: Claims 5-7, 14-16, and 21-23
         5. Gene Expression Data: Claim 8
         6. Pre-Processing: Claims 9 and 18
         7. New SVM: Claims 10, 17, and 20
VI.   GROUND 2: Combination of Mukherjee, Platt, and Kohavi renders claims 1-10 and 13-23 obvious.
      A. Independent claims 1, 13, and 19
      B. Motivation to Combine
VII.  GROUND 3: Combination of Mukherjee, Platt, Kohavi, and Cortes renders claim 2 obvious.
VIII. GROUND 4: Combination of Mukherjee, Platt, Kohavi, and Castelli renders claims 11-12 obvious.
      A. Combination Overview
      B. Dependent Claim 11
         1. “Pre-processing” limitation [11A].
         2. “Selecting a cluster center” and “using the cluster centers” limitations [11B]/[11C].
      C. Dependent Claim 12
IX.   The Board Should Reach the Merits of This Petition
      A. Evidence Weighs Against Fintiv-based Discretionary Denial.
      B. Interference Estoppel Does Not Apply or Preclude Review
X.    Mandatory notices (37 C.F.R. § 42.8(b))
      A. Real Party-in-Interest
      B. Related Matters
      C. Lead and Backup Counsel
XI.   Conclusion
`
`
`
`
`
`
EXHIBIT LIST

INTEL Exhibit No.   Description

1001   U.S. Patent 7,117,188 to Isabelle Guyon et al. (“’188 Patent”)
1002   File History for U.S. Patent 7,117,188 (“’188 FH”)
1003   Declaration of Theodoros Evgeniou, Ph.D. in Support of IPR Petition
1004   Curriculum Vitae of Theodoros Evgeniou, Ph.D.
1005   Mukherjee et al., Support Vector Machine Classification of Microarray Data, Technical Report C.B.C.L. Paper No. 182, A.I. Memo No. 1677, M.I.T. (1998) (“Mukherjee”)
1006   U.S. Patent 6,327,581 to Platt, filed April 6, 1998, issued December 4, 2001 (“Platt”)
1007   Kohavi et al., Wrappers for feature subset selection, Artificial Intelligence 97, 273-324 (1997) (“Kohavi”)
1008   U.S. Patent 5,649,068 to Boser et al., filed May 16, 1996, issued July 15, 1997 (“Boser”)
1009   Hocking et al., Selection of the Best Subset in Regression Analysis, Technometrics, 9:4, 531-540 (1967) (“Hocking”)
1010   Cristianini, N., et al., An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, Cambridge University Press (2000) (“Cristianini”)
1011   Cortes, C., et al., Support-Vector Networks, Machine Learning, 20, 273-297 (1995) (“Cortes”)
1012   U.S. Patent 6,122,628 to Castelli et al., filed October 31, 1997, issued September 19, 2000 (“Castelli”)
1013   Saunders, C., et al., Support Vector Machine Reference Manual, Department of Computer Science, Royal Holloway, CSD-TR-98-03 (1998) (“Saunders”)
1014   Burros, R.H., Three Rational Methods for Reduction of Skewness, Psychological Bulletin, Vol. 48, No. 6, 505-511 (1951) (“Burros”)
1015   Bradley, Paul, Mathematical Programming Approaches to Machine Learning and Data Mining, University of Wisconsin-Madison (Aug. 27, 1998) (“Bradley”)
1016   Samuel, A.L., Some Studies in Machine Learning Using the Game of Checkers, IBM Journal (1959)
1017   Burges, C., A Tutorial on Support Vector Machines for Pattern Recognition, Kluwer Acad. Pub., Boston (1998) (“Burges”)
1018   da Silva, F., Notes on Support Vector Machine, INESC (Nov. 1998) (“da Silva”)
1019   Hamaker, H.C., On Multiple Regression Analysis (March 1962) (“Hamaker”)
1020   Rendell, Larry, et al., The Feature Selection Problem: Traditional Methods and a New Algorithm, AAAI-92 Proceedings (1992) (“Rendell”)
1021   Aha, David W., et al., A Comparative Evaluation of Sequential Feature Selection Algorithms, AI & Statistics Workshop (1995) (“Aha”)
1022   Golub, T.R., et al., Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring, Science, Vol. 286 (Oct. 15, 1999) (“Golub”)
1023   Mitchell, T., Machine Learning, McGraw-Hill, Inc. (1997) (“Mitchell”)
1024   Herbrich, R., Learning Kernel Classifiers: Theory and Algorithms (Adaptive Computation and Machine Learning), The MIT Press (2001)
1025   Boser et al., A Training Algorithm for Optimal Margin Classifiers, Computational Learning Theory, 144-152 (July 1992) (“Boser Article”)
1026   Cochran, W.G., The Omission or Addition of an Independent Variate in Multiple Linear Regression, Wiley for the Royal Statistical Society, Vol. 5, No. 2 (1938) (“Cochran”)
1027   Oosterhoff, J., On the selection of independent variables in a regression equation, Preliminary Report S319 (VP23), Stichting Mathematisch Centrum, Amsterdam (1963) (“Oosterhoff”)
1028   Furnival, G.M., and Wilson, R.W., Regression by leaps and bounds, Technometrics 16, 499-511 (1974) (“Furnival”)
1029   Osuna, E., et al., Support Vector Machines: Training and Applications, MIT C.B.C.L. Paper No. 144 (March 1997)
1030   Declaration of Sylvia D. Hall-Ellis, Ph.D. and Curriculum Vitae
1031   Website: Archive Publications - Theory of Learning, https://web.archive.org/web/20000308145521/http://www.ai.mit.edu/projects/cbcl/publications/theory-learning.html
1032   Website: MIT AI Lab Projects and Research Groups, https://web.archive.org/web/19990221235902/http://www.ai.mit.edu/projects/
1033   Website: MIT CBCL, https://web.archive.org/web/20000418092038/http://www.ai.mit.edu/projects/cbcl/publications/index-pubs.html
1034   Mukherjee, S., et al., Support Vector Machine Classification of Microarray Data, Artificial Intelligence Lab and Center for Biological and Computational Learning, MIT (May 2000)
1035   Reserved
1036   Reserved
1037   Reserved
1038   MARC Record for Kohavi (INTEL-1007) in Karl F. Wendt Engineering Library at the University of Wisconsin–Madison
1039   MARC Record for journal Artificial Intelligence from OCLC bibliographic database
1040   MARC Record for journal Technometrics at Linda Hall Library
1041   Library of Congress subject heading sh2008112270
1042   Library of Congress subject heading sh2008110286
1043   Library of Congress subject heading sh85046441
1044   MARC Record for journal Technometrics from OCLC bibliographic database
1045   MARC Record for Cristianini in Library of Congress
1046   MARC Record for Cristianini from OCLC bibliographic database
1047   Library of Congress subject heading sh2008009003
1048   Library of Congress subject heading sh85072061
1049   MARC Record for Machine Learning in Karl F. Wendt Engineering Library
1050   Library of Congress subject heading sh85079324
1051   Library of Congress subject heading sh85099890
1052   Library of Congress subject heading sh2007101478
1053   MARC record for the journal Machine Learning from the OCLC bibliographic database
1054   Technical Reports (Selection), entry 6 for Saunders
1055   MARC record for the journal Psychological Bulletin at the University of Wisconsin–Madison Libraries
1056   Library of Congress subject heading sh2010108771
1057   MARC record for the journal Psychological Bulletin obtained from the OCLC bibliographic database
1058   Online catalog record for Bradley from the University of Wisconsin–Madison Library
1059   Metadata record for the Bradley dissertation in the digital collection
1060   MARC record for the doctoral dissertation, Mathematical Programming Approaches to Machine Learning and Data Mining, by Bradley, obtained from the OCLC bibliographic database
1061   Stamped version of INTEL-1011 (Cortes)
1062   Stamped version of INTEL-1014 (Burros)
1063   Stamped version of INTEL-1007 (Kohavi)
1064   Stamped version of INTEL-1009 (Hocking)
1065   ACM Digital Library entry showing 1999 publication of INTEL-1010
1066   Excerpts from Health Discovery Corp. v. Intel Corp., Civil Action 6:20-cv-00666-ADA, Preliminary Infringement Contentions and Exhibits 1-4, served December 1, 2020
1067   Health Discovery Corp. v. Intel Corp., Civil Action 6:20-cv-00666-ADA, Scheduling Order, Dkt. No. 27 (W.D. Tex. Dec. 21, 2020)
1068   United States District Court, Western District of Texas, General Order Regarding Emergency Procedures Authorized by the Coronavirus Aid, Relief, dated Mar. 30, 2020
1069   United States District Court, Western District of Texas, Seventh Supplemental Order Regarding Court Operations Under the Exigent Circumstances Created by the COVID-19, dated Aug. 6, 2020
1070   United States District Court, Western District of Texas, Thirteenth Supplemental Order Regarding Court Operations Under the Exigent Circumstances Created by the COVID-19, dated Feb. 2, 2021
1071   Eric Q. Li et al. v. Jason Weston et al., Patent Interference No. 106,066-JTM, Li Substantive Motion 1, Paper 20 (PTAB Jan. 23, 2017)
1072   Eric Q. Li et al. v. Jason Weston et al., Patent Interference No. 106,066-JTM, Decision on Motions, Paper 148 (PTAB Feb. 27, 2019)
1073   Reserved
1074   Reserved
1075   Reserved
1076   Reserved
1077   Reserved
1078   Reserved
1079   U.S. Patent 6,658,395 to Barnhill, filed May 24, 2000, issued Dec. 2, 2003 (“’395 patent”)
1080   U.S. Patent 6,128,608 to Barnhill, filed May 1, 1999, issued Oct. 3, 2000 (“’608 patent”)
1081   Reserved
1082   U.S. Patent 6,427,141 to Barnhill, filed May 9, 2000, issued July 30, 2002 (“’141 patent”)
1083   Reserved
1084   Reserved
1085   U.S. Patent 6,882,990 to Barnhill et al., filed Aug. 7, 2000, issued April 19, 2005 (“’990 patent”)
1086   U.S. Provisional 60/083,961
`
`
`
`
`
`
I. Introduction
`
Intel Corporation (“Petitioner”) requests inter partes review of claims 1-23 (the “challenged claims”) of U.S. Patent 7,117,188 (“the ’188 patent”; INTEL-1001). The ’188 patent is directed to the use of support vector machines (“SVMs”) and recursive feature elimination (“RFE”) to identify patterns in data. (INTEL-1001, Abstract.) As such, the challenged claims merely combine a well-known machine-learning algorithm, SVM, with a known feature-selection technique, RFE. The Petition, supported by the Declaration of Dr. Theodoros Evgeniou, demonstrates that the challenged claims are unpatentable.

II. Grounds for Standing

Petitioner certifies that the ’188 patent is available for inter partes review and that Petitioner is not barred or estopped from requesting inter partes review.

III. Identification of Challenge

A. Citation of prior art

In its preliminary infringement contentions, Patent Owner (“PO”) contends the priority date for the ’188 patent is May 1, 1999. (INTEL-1066, Ex. 1, 1.) Contrary to PO’s assertions, the ’188 patent is not entitled to the benefit of U.S. Patent 6,128,608 (“the ’608 patent”; INTEL-1080), filed May 1, 1999, for the reasons discussed in Section IV.C. The priority date for the ’188 patent is no earlier than the filing date of U.S. Patent 6,882,990, August 7, 2000. (INTEL-1085.)
`
`1
`
`
`
`
`
`
`
`
`
`Petition for IPR of U.S. 7,117,188
`IPR2021-00549
`
`
The Grounds cite the following references, each published or filed before August 7, 2000.

“Support Vector Machine Classification of Microarray Data” to Mukherjee et al. (“Mukherjee”; INTEL-1005) is prior art under pre-AIA 35 U.S.C. §102(a). Mukherjee was publicly accessible by at least December 1999 through MIT’s Artificial Intelligence (“AI”) Laboratory, Center for Biological and Computational Learning (“CBCL”). (See INTEL-1030, ¶¶42-44.) Mukherjee was listed on the CBCL website[1] as a 1999 publication.[2] (See INTEL-1031 (website listing publications, archived March 8, 2000).) CBCL was a research group within MIT’s AI Laboratory focused on “the problem of learning within a multi-disciplinary approach, in the areas of theory, engineering application and neuroscience.” (INTEL-1032 (website listing research projects and groups within MIT’s AI Laboratory, archived February 21, 1999).) MIT was recognized prior to August 2000 as a leading research institution in AI/machine learning. (INTEL-1003, ¶87.) An individual or person of ordinary skill in the art (“POSITA”) interested in machine learning prior to August 2000 would have known about MIT’s AI Laboratory and CBCL and would have been able to locate the MIT CBCL website. (Id.) CBCL further indexed its publications by category. (See INTEL-1033 (website listing categories of publications, archived April 18, 2000).) Mukherjee was made available in the “Theory of Learning” category. (See INTEL-1031 (website archived March 8, 2000).) Mukherjee further provides a website on its front page through which any member of the public can access the publication without restriction. (See INTEL-1005, 1.) Thus, Mukherjee “was disseminated or otherwise made available to the extent that persons interested and ordinarily skilled in the subject matter or art exercising reasonable diligence[ ] can locate it” prior to August 7, 2000. See Medtronic, Inc. v. Barry, 891 F.3d 1368, 1380 (Fed. Cir. 2018).

[1] www.ai.mit.edu/projects/cbcl/publications/theory-learning.html.
[2] Mukherjee bears a copyright date of 1998.

A May 2000 paper cites Mukherjee, further evidencing its public availability. (See INTEL-1034.) Finally, PO cited Mukherjee as prior art in the ’188 patent. (See INTEL-1002, 280, 321.)
`
U.S. Patent 6,327,581 to Platt (“Platt”; INTEL-1006), filed on April 6, 1998, is prior art under pre-AIA 35 U.S.C. §102(e).

“Wrappers for feature subset selection” to Kohavi et al. (“Kohavi”; INTEL-1007) is prior art under pre-AIA 35 U.S.C. §102(b). Kohavi was published in Volume 97, Issues 1-2 of the journal Artificial Intelligence. (INTEL-1030, ¶45.) This volume of Artificial Intelligence was catalogued, indexed, and made publicly accessible through the Karl F. Wendt Engineering Library at the University of Wisconsin–Madison by December 31, 1997. (INTEL-1030, ¶¶46-48.) Further, Kohavi was catalogued and indexed at the University of Minnesota Libraries and made part of the OCLC bibliographic database by December 31, 1997, further evidencing its public availability. (See INTEL-1030, ¶¶49-50; see also INTEL-1063 (stamped version from Linda Hall Library).)

“Support-Vector Networks” to Cortes et al. (“Cortes”; INTEL-1011) is prior art under pre-AIA 35 U.S.C. §102(b). Cortes was published in Volume 20, Number 3 of the journal Machine Learning. This volume of Machine Learning was catalogued, indexed, and made publicly available through the Karl F. Wendt Engineering Library at the University of Wisconsin–Madison by October 5, 1995. (INTEL-1030, ¶¶62-67; see also INTEL-1061 (stamped version of Cortes from UW-Madison).)

U.S. Patent 6,122,628 to Castelli, et al. (“Castelli”; INTEL-1012), filed October 31, 1997, is prior art under pre-AIA 35 U.S.C. §102(e).

These references were not applied by the Examiner in a rejection or substantively discussed during prosecution of the ’188 patent. (See INTEL-1002, 315.)
`
`4
`
`
`
`
`
`
`1
`2
`3
`4
`5
`6
`
`B. Grounds for Challenge
`
`
`
`
`Petition for IPR of U.S. 7,117,188
`IPR2021-00549
`
`
`Ground
`
`§103
`
`§103
`
`§103
`
`§103
`
`§103
`
`§103
`
`Combination
`
`Mukherjee, Platt
`
`Mukherjee, Platt, Kohavi
`
`Mukherjee, Platt, Cortes
`
`Mukherjee, Platt, Kohavi, Cortes
`
`Mukherjee, Platt, Castelli
`
`Mukherjee, Platt, Kohavi, Castelli
`
`Claims
`
`1-10, 13-23
`
`1-10, 13-23
`
`2
`
`2
`
`11-12
`
`11-12
`
`
IV. The ’188 patent

The ’188 patent is directed to an SVM implementing RFE for feature selection. (INTEL-1001, Abstract.) As detailed in the Technology Background and acknowledged in the ’188 patent, SVMs were well-known machine learning classifiers. RFE is simply an iterative, backwards feature-selection technique used in statistical methods and machine learning long before the ’188 patent.
`
A. Technology Background

1. Machine Learning

Machine learning gives computers the ability to learn without being explicitly programmed. (INTEL-1016, 210-229.) Machine learning algorithms can be categorized as supervised (like the ’188 patent) or unsupervised. In supervised learning, a sample of input-output pairs, (x, y): (x1, y1), …, (xm, ym), referred to as “training data” (illustrated below), is provided to the algorithm. The values xi are referred to as observations/patterns and the values yi as labels/classifications. “Classification” is a common supervised learning problem in which outputs are discrete values (e.g., 0 or 1). (INTEL-1010, 2.) In the example below, training data represented by solid circles classifies as positive (e.g., 1) and by open circles classifies as negative (e.g., 0). (INTEL-1003, ¶31.)
`
`
`
The underlying function mapping inputs to outputs is referred to as a decision function for classification problems. (INTEL-1010, 2.) Using the decision function, the learning algorithm generalizes to new, unseen data points; that is, given a new data pattern, x (i.e., “live data”), the learning algorithm predicts the corresponding label, y (classification). As shown below, many functions (represented by lines) can be used to separate (classify) data. The goal is to optimize the “fit” of the function to maximize predictive accuracy. (INTEL-1003, ¶¶32-33.)

The selected function, however, can be subject to overfitting, caused by a decision function that fits the available data but does not generalize well to predict new data. (INTEL-1003, ¶34.) Overfitting usually indicates the selection of an overly complex decision function having too many features. (Id.)
`
`2.
`SVMs
`SVMs, introduced in 1992, are a set of supervised learning methods used for
`
`classification. (INTEL-1003, ¶35.) SVMs use “a hypothesis [or “primal”] space of
`
`linear functions in a high dimensional feature [or “dual”] space, trained with a
`
`learning algorithm from optimization theory that implements a learning bias
`
`derived from statistical learning theory.” (INTEL-1010, 7.) SVMs therefore belong
`
`to the family of generalized linear classifiers. (INTEL-1003, ¶35.)
`
`7
`
`
`
`
`
`
`
`
`
`Petition for IPR of U.S. 7,117,188
`IPR2021-00549
`
`
`The decision boundary created by an SVM is called a hyperplane (labeled
`
`H0) separating positive from negative samples/observations. (INTEL-1017, 8.) The
`
`distance from the separating hyperplane (H0) to the closest sample/observation is
`
`the margin, illustrated below for a linearly separable case. (Id.) A pair of
`
`hyperplanes (H1, H2), parallel to the separating hyperplane, gives the maximum
`
`margin. Training points that lie on one of these hyperplanes (H1 or H2) are called
`
`support vectors (indicated with red circles). (Id.)
`
`
`
A linear SVM fits the widest possible margin between output classes by predicting the output class (y) using the decision function:

   wᵀ ∙ x + b = w₁x₁ + ⋯ + wₙxₙ + b

where w is the weight vector,³ x is the input vector (or pattern) having components (features) x₁, x₂, …, xₙ, and b is a constant (bias). (INTEL-1003, ¶37.) The linear combination of features in the input vector, x, and weights in the weight vector, w, adjusting for bias, predicts the output, y. (Id.)
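The linear decision function above can be sketched in a few lines of code; the weight vector, bias, and sample values below are illustrative assumptions, not values from the record.

```python
# Minimal sketch of the linear SVM decision function w^T*x + b described
# above. All numeric values are hypothetical examples.

def decision_function(w, x, b):
    """Linear combination of features and weights, adjusted for bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, x, b):
    """Predict the output class y from the sign of the decision function."""
    return 1 if decision_function(w, x, b) >= 0 else 0

w = [0.5, -1.0, 2.0]   # hypothetical weight vector
b = -0.25              # hypothetical bias
print(classify(w, [1.0, 0.2, 0.4], b))   # prints 1: positive side of H0
```

A pattern is classified by which side of the hyperplane it falls on, i.e., by the sign of wᵀ ∙ x + b.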
`
In the case of linearly separable data, the following inequality-constrained equations can be satisfied:

   xᵢ ∙ w + b ≥ +1 for yᵢ = +1
   xᵢ ∙ w + b ≤ −1 for yᵢ = −1

(INTEL-1003, ¶39.) SVMs use optimization (maximizing the margin) to construct the optimal hyperplane:

   w₀ ∙ x + b₀ = 0.

(Id.; see also INTEL-1011, 278, 291.) SVM theory dictates that the hyperplane providing the maximum margin can be determined by minimizing ‖w‖². (INTEL-1003, ¶39.) Thus, finding the optimal hyperplane is an optimization problem that can be solved by optimization techniques.
`
Numerous optimization techniques were known prior to the ’188 patent. (INTEL-1003, ¶¶47-54.) “Dual optimization” provides an alternative formulation of a mathematical problem that is computationally easier to solve. (INTEL-1003, ¶48.) In dual optimization, an original parameter (e.g., w) is replaced by a new parameter (e.g., α) to derive the alternative formulation. (Id.) Lagrange multipliers (i.e., α) are one example of a dual optimization technique. (Id.; see also INTEL-1011, 291.) Lagrange multipliers can be used to find an extreme value of a function f(x) subject to constraint g(x). In the case of an SVM with f(x) = ‖w‖² and g(x) = yᵢ(xᵢ ∙ w + b) − 1, the resulting Lagrangian is:

   ½ w ∙ w − Σᵢ₌₁ˡ αᵢ[yᵢ(xᵢ ∙ w + b) − 1]   (1)

(INTEL-1003, ¶49; see also INTEL-1011, 291-92.) In constructing this “dual” equation, the following additional constraints on equation (1) must be used to account for requirements of the “primal” (original) problem:

   w₀ = Σᵢ₌₁ˡ αᵢ yᵢ xᵢ   (2)

   Σᵢ αᵢ yᵢ = 0   (3)

where l is the number of samples/observations x. (INTEL-1003, ¶49; see also INTEL-1011, 291.) Only the input patterns (training vectors) xᵢ with αᵢ > 0 contribute to the optimal hyperplane w₀. The hyperplane (i.e., w₀ ∙ x + b = 0) is therefore defined by the linear combination of only a subset of the initial input pattern vectors, the “support vectors.” (INTEL-1003, ¶51.)

³ wᵀ denotes the transpose of the weight vector matrix.
`
`
`
As can be seen from constraint (2), a relationship exists between feature weight vectors, w, and pattern weight vectors, α, such that one can be mathematically derived and calculated from the other. (INTEL-1003, ¶¶52-54.) Specifically, substituting constraints (2) and (3) into equation (1) results in the following:

   Σᵢ₌₁ˡ αᵢ − ½ Σᵢ,ⱼ₌₁ˡ αᵢαⱼ yᵢyⱼ xᵢ ∙ xⱼ   (4)

(INTEL-1003, ¶54; see also INTEL-1011, 291-92.) A POSITA would recognize that the dot product, xᵢ ∙ xⱼ, is a linear kernel of the form K(xᵢ, xⱼ) = xᵢ ∙ xⱼ. (INTEL-1003, ¶54; see also INTEL-1025.) Accordingly, in the linear kernel case, the “dual” decision function, d(x) = Σₖ₌₁ⁿ αₖ K(x, xₖ) + b, becomes d(x) = w ∙ x + b in “primal” space, with w = Σᵢ αᵢ yᵢ xᵢ. (INTEL-1003, ¶54.)
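The primal/dual relationship described above can be illustrated numerically; the samples, labels, multipliers, and bias below are assumed values, and the label yᵢ is written explicitly in the dual sum (some formulations fold it into αᵢ).

```python
# Minimal sketch (assumed alpha values, samples, and bias, not record
# evidence): constraint (2) recovers the primal weight vector w from the
# dual coefficients, and the dual decision function with a linear kernel
# K(x, z) = x . z then matches the primal form w . x + b.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(x, z):
    return dot(x, z)

X = [[1.0, 2.0], [2.0, 0.5], [0.0, 1.0]]   # training patterns x_i
y = [+1, -1, +1]                            # labels y_i
alpha = [0.3, 0.4, 0.0]                     # alpha_i = 0: not a support vector
b = 0.1

# constraint (2): w0 = sum_i alpha_i * y_i * x_i
w = [sum(a * yi * xi[j] for a, yi, xi in zip(alpha, y, X)) for j in range(2)]

def dual_decision(x):
    return sum(a * yi * linear_kernel(x, xi)
               for a, yi, xi in zip(alpha, y, X)) + b

def primal_decision(x):
    return dot(w, x) + b
```

For any input x, `dual_decision(x)` and `primal_decision(x)` agree, which is the linear-kernel equivalence of the “dual” and “primal” decision functions noted above.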
`
Many data sets, however, are not linearly separable. One solution to solve non-linear classification problems with linear classifiers is to generate new features from the input data such that the input patterns become linearly separable in expanded “dual” (or feature) space. (INTEL-1018, 19.) Feature expansion requires explicit translation of training patterns from low-dimensional space to higher-dimensional space where the problem is linearly separable, as shown below. (INTEL-1018, 29.) SVMs use kernel functions (e.g., linear, polynomial, gaussian, and sigmoidal) to specify this non-linear transformation. (INTEL-1018, 30-32; INTEL-1003, ¶¶55-59.)
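A brief sketch of the kernel idea, using a homogeneous polynomial kernel K(x, z) = (x ∙ z)² as an assumed example: the kernel value equals a dot product in an explicitly expanded feature space, so the non-linear transformation need not be computed directly.

```python
# Minimal sketch (assumed values): the polynomial kernel K(x, z) = (x . z)^2
# computes the dot product in the expanded feature space
# phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2) without explicitly translating the
# patterns, which is how SVMs specify the non-linear transformation.
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, z):
    return dot(x, z) ** 2

def phi(x):
    """Explicit translation to the higher-dimensional feature space."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

x, z = [1.0, 2.0], [3.0, 0.5]
assert abs(poly_kernel(x, z) - dot(phi(x), phi(z))) < 1e-9   # both equal 16
```

The kernel trick replaces every xᵢ ∙ xⱼ in equation (4) with K(xᵢ, xⱼ), leaving the optimization otherwise unchanged.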
`
`
Even in scenarios where it is possible to completely separate datasets, better generalization may be achieved when errors are permitted in the hypothesis. (INTEL-1003, ¶60; INTEL-1011, 286.) This type of SVM is known as a soft margin SVM. (INTEL-1003, ¶60; INTEL-1011, 280.) In soft margin SVMs, new variables, referred to as “slack variables,” are introduced in the optimization problem to account for errors. (INTEL-1003, ¶60; INTEL-1010, Ch. 6.1.2.)
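The role of slack variables can be sketched as follows; the hyperplane and points are assumed values, with labels coded ±1 as in the constraints above.

```python
# Minimal sketch (assumed values): in a soft margin SVM, each training point
# receives a slack variable xi_i = max(0, 1 - y_i*(w . x_i + b)) measuring
# how far it falls inside the margin or onto the wrong side; the optimization
# penalizes total slack rather than forbidding errors outright.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def slack(w, b, x, yi):
    return max(0.0, 1.0 - yi * (dot(w, x) + b))

w, b = [1.0, -1.0], 0.0   # hypothetical separating hyperplane
print(slack(w, b, [2.0, 0.0], +1))    # prints 0.0: outside the margin
print(slack(w, b, [0.5, 0.0], +1))    # prints 0.5: inside the margin
print(slack(w, b, [-1.0, 0.0], +1))   # prints 2.0: misclassified
```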
`
`3.
`Feature Selection
`A machine learning dataset is generally comprised of rows
`
`(samples/patterns) and columns with a set of columns associated with sample
`
`features and one column associated with the output/label. For example, the training
`
`set in an exemplary SVM predicting people most likely to buy lottery tickets
`
`12
`
`
`
`
`
`
`
`
`
`Petition for IPR of U.S. 7,117,188
`IPR2021-00549
`
`
`consists of known information of various people (top table) plotted as data points
`
`(bottom table):
`
`Known Information
`
`
`
`Age
`
`Salary
`
`Person1
`Person2
`Person3
`Person4
`Person5
`
`40
`
`60
`
`28
`
`37
`
`65
`
`$50,000
`
`$500,000
`
`$30,000
`
`$150,000
`
`$30,000
`
`No. of
`Children Gender Married
`0
`M
`Y
`
`1
`
`5
`
`3
`
`3
`
`F
`
`M
`
`F
`
`F
`
`Y
`
`N
`
`N
`
`Y
`
`Buy
`
`N
`
`Y
`
`Y
`
`N
`
`Y
`
`Data Points
`
`
`
`Feature1 Feature2 Feature3 Feature4 Feature5
`
`Buy
`
`Person1
`Person2
`Person3
`Person4
`Person5
`
`0.4
`
`0.6
`
`.28
`
`.37
`
`.65
`
`0.1
`
`1
`
`0.06
`
`0.3
`
`0.06
`
`0
`
`0.2
`
`1
`
`0.6
`
`0.6
`
`0
`
`1
`
`0
`
`1
`
`1
`
`1
`
`1
`
`0
`
`0
`
`1
`
`0
`
`1
`
`1
`
`0
`
`1
`
`The first 5 columns are features and the last column is the classification. In
`
`
`
`
`
`this exemplary dataset, features can have continuous values (e.g., age) or can be
`
`binary (e.g., marital status). The data in the bottom table has been converted (pre-
`
`13
`
`
`
`
`
`
`
`
`
`Petition for IPR of U.S. 7,117,188
`IPR2021-00549
`
`
`processed) into a usable format (e.g., 0/1 versus N/Y) and scaled so feature values
`
`are comparable (e.g., $500,000 scaled to 1).
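The pre-processing step described above (converting N/Y and gender flags to 0/1 and scaling continuous features) can be sketched as follows; the max-based scaling rule and helper names are illustrative assumptions, not the ’188 patent’s method.

```python
# Minimal sketch of the pre-processing described above: Y/N and gender
# flags become 1/0 (matching the tables, where F and Y map to 1), and each
# column is scaled by its maximum so feature values are comparable.

def preprocess(rows):
    """rows: one [age, salary, children, gender, married] list per person."""
    flag = {"Y": 1, "N": 0, "F": 1, "M": 0}
    numeric = [[r[0], r[1], r[2], flag[r[3]], flag[r[4]]] for r in rows]
    # scale each column by its maximum so features land in [0, 1]
    maxima = [max(r[j] for r in numeric) or 1 for j in range(5)]
    return [[r[j] / maxima[j] for j in range(5)] for r in numeric]

people = [[40, 50_000, 0, "M", "Y"],
          [60, 500_000, 1, "F", "Y"]]
scaled = preprocess(people)   # e.g., $500,000 scales to 1.0
```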
`
Feature selection selects the most relevant features to improve predictability of new classifications. The terms “feature elimination” and “feature selection” both refer to the process of starting from a set of features and ending up with a smaller subset, “selecting” the features included in the subset and “eliminating” the remainder. (INTEL-1003, ¶64.)
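The select-by-eliminating loop described above can be sketched as a generic recursive feature elimination routine. This is an illustrative sketch, not the ’188 patent’s claimed method; the stand-in trainer (class-mean differences) merely substitutes for the SVM weight optimization discussed earlier, and the dataset is invented.

```python
# Minimal sketch of recursive feature elimination: train, rank features by
# weight magnitude, eliminate the lowest-ranked feature, and repeat until
# the desired subset size remains.

def rfe(X, y, train, keep):
    """Return the indices of the surviving features."""
    features = list(range(len(X[0])))
    while len(features) > keep:
        cols = [[row[f] for f in features] for row in X]
        w = train(cols, y)
        worst = min(range(len(features)), key=lambda i: abs(w[i]))
        features.pop(worst)   # eliminate the lowest-ranked feature
    return features

def mean_diff_trainer(X, y):
    """Stand-in for an SVM: weight = difference of per-class feature means."""
    pos = [r for r, yi in zip(X, y) if yi == 1]
    neg = [r for r, yi in zip(X, y) if yi == 0]
    return [sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg)
            for j in range(len(X[0]))]

X = [[0.9, 0.5, 0.1], [0.8, 0.5, 0.2], [0.1, 0.5, 0.9], [0.2, 0.5, 0.8]]
y = [1, 1, 0, 0]
print(rfe(X, y, mean_diff_trainer, keep=2))   # prints [0, 2]
```

The uninformative constant feature (index 1) receives a near-zero weight and is eliminated first, illustrating how iterative, backwards elimination arrives at a smaller subset.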
`
a. History of Feature Selection

Feature elimination, individually or by feature groups, has been considered by various communities for at least four decades, including the “classical statistics” community, focusing on the widely used statistical method of linear regression; the “traditional” (less statistical) machine learning community; and the “statistical” machine learning community. The ’188 patent belongs to the statistical com