`By: Jason R. Mudd, Reg. No. 57,700
`Eric A. Buresh, Reg. No. 50,394
`jason.mudd@eriseip.com
`eric.buresh@eriseip.com
`ERISE IP, P.A.
`7015 College Blvd., Suite 700
`Overland Park, Kansas 66211
`Telephone: (913) 777-5600
`Roshan Mansinghani, Reg. No. 62,429
`roshan@unifiedpatents.com
`Unified Patents Inc.
`13355 Noel Road, Suite 1100
`Dallas, TX, 75240
`Telephone: (214) 945-0200
`
`
`
`
`
`IPR2019-00472
`U.S. Patent 7,454,430
`
`
`
`
`Jonathan Bowser, Reg. No. 54,574
`jbowser@unifiedpatents.com
`Unified Patents Inc.
`1875 Connecticut Ave. NW, Floor 10
`Washington, D.C. 20009
`Telephone: (202) 701-1015
`
`UNITED STATES PATENT AND TRADEMARK OFFICE
`____________
`
`BEFORE THE PATENT TRIAL AND APPEAL BOARD
` ____________
`
`UNIFIED PATENTS INC.
`Petitioner
`
`v.
`
`RECURSIVE WEB TECHNOLOGIES, LLC
`Patent Owner
`
`SPIDER SEARCH ANALYTICS, LLC
`Patent Owner
`____________
`
`IPR2019-00472
`U.S. 7,454,430
` ____________
`
` PETITION FOR INTER PARTES REVIEW
`OF U.S. PATENT 7,454,430
`
`
`
`
`
`
`
`
Table of Contents

I. Introduction
II. U.S. Patent 7,454,430
A. Alleged invention
B. Prosecution history
III. Requirements for inter partes review under 37 C.F.R. § 42.104
A. Grounds for standing under 37 C.F.R. § 42.104(a)
B. Identification of challenge under 37 C.F.R. § 42.104(b) and relief requested
C. Level of ordinary skill in the art
D. Claim Construction
IV. The Challenged Claims Are Unpatentable
A. Ground 1: Claims 1, 10, 12-13, 19-20, and 23 are obvious in view of Quass and Bharat
B. Ground 2: Claims 2 and 3 are obvious over Quass in view of Bharat in further view of Vanderveldt
C. Ground 3: Claim 11 is obvious over Quass in view of Bharat in further view of EHC
D. Ground 4: Claims 1, 10, 12-13, and 19-20 are obvious over Bergholz in view of Bharat
E. Ground 5: Claims 2-3, 5, and 7-8 are obvious over Bergholz in view of Bharat in further view of Vanderveldt
F. Ground 6: Claim 11 is obvious over Bergholz in view of Bharat in further view of EHC
G. Ground 7: Claims 5 and 7-8 are obvious over Baldi in view of Bharat in further view of Vanderveldt
V. Conclusion
VI. Mandatory Notices Under 37 C.F.R. § 42.8(a)(1)
A. Real Party-In-Interest
B. Related Matters
C. Lead and Back-Up Counsel
`
`
`
`
`
`
I. INTRODUCTION
`Petitioner Unified Patents Inc. (“Petitioner”) respectfully requests inter partes
`
`review (“IPR”) of claims 1-3, 5, 7-8, 10-13, 19-20, and 23 (collectively, the
`
`“Challenged Claims”) of U.S. Patent 7,454,430 (“the ’430 Patent”) (Ex. 1001).
`
`II. U.S. PATENT 7,454,430
`A. Alleged Invention
`The ’430 Patent relates to automatically finding and extracting information
`
`from electronic documents, such as web pages, in a process commonly known as a
`
`“crawl.” ’430 Patent (Ex. 1001) at Abstract, 1:17-22. The ’430 Patent also recites
`
`steps for analyzing web pages to generate requests appropriately configured to
`
`harvest resulting dynamic pages from a server (i.e., from what is known as the “Deep
`
`Web”). Id. at 13:1-5, 13:48-55, 14:59-67.
`
`Breadth First Crawling
`
`
`
`The ’430 Patent describes the well-known method of conducting a crawl in a
`
`“breadth first” manner, meaning that “all links from a particular page are first
`
`explored then each one of them is used as a starting point for the next step.” Id. at
`
13:32-35.[1] This is in contrast to a “depth first” search, in which a particular link from
the particular (top) page is followed to a maximum depth of search (further explained
below) before returning to explore additional links from the top page. Smyth Decl.
(Ex. 1003) at ¶40.

[1] All emphases appearing in quotations have been added by Petitioner unless indicated otherwise.
`
`Depth and Relevance
`
`
`
`The “depth” of a subsequent page is equal to the minimum number of links
`
`that must be followed from a starting page in order to reach a subsequent page. ’430
`
`Patent (Ex. 1001) at 6:48-53. As discussed by the ’430 Patent, pages of interest to a
`
`given application (i.e. “relevant” pages) are unlikely to be at a great depth from a
`
`starting page (“…the relevant pages are in most cases no deeper than 2-3 levels down
`
`from the main page.”), and thus crawlers may be configured to only crawl to a certain
`
`maximum depth (i.e. number of links) from starting pages in the interests of speed
`
`and efficiency. ’430 Patent (Ex. 1001) at 13:24-31.
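
For illustration only, the breadth-first, depth-limited crawl described above can be sketched as follows. This is a minimal sketch under stated assumptions, not code from the ’430 Patent or any cited reference: the in-memory link graph, the page names, and the function name crawl_breadth_first are all hypothetical, and a real crawler would fetch pages over a network rather than read a dictionary.

```python
from collections import deque

# Hypothetical in-memory link graph standing in for real web pages.
LINKS = {
    "home": ["jobs", "about"],
    "jobs": ["job1", "job2"],
    "about": ["team"],
    "job1": ["apply"],
    "job2": [],
    "team": [],
    "apply": [],
}


def crawl_breadth_first(start_pages, links, max_depth):
    """Return {page: depth} for pages within max_depth links of a start page."""
    depth = {page: 0 for page in start_pages}
    queue = deque(start_pages)
    while queue:
        page = queue.popleft()
        if depth[page] == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(page, []):
            if target not in depth:  # first visit yields the minimum depth
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


result = crawl_breadth_first(["home"], LINKS, max_depth=2)
print(result)
```

With a maximum depth of 2, the hypothetical page "apply" (three links from "home") is never visited, which mirrors the speed and efficiency rationale for limiting crawl depth.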
`
`Dynamic Web Pages
`
`
`
The ’430 Patent describes dynamic web pages as pages that do not exist until after
they are requested (e.g., in response to user input), which was known to pose
a challenge for standard web crawlers. ’430 Patent (Ex. 1001) at 4:54-67. This type
`
`of content is often stored in a server and available to users via a search form, for
`
`example as seen in job boards, online dictionaries, and airline travel websites. Id.
`
`Analysis and Request Generation
`
`
`
`In order for a crawler to access dynamic pages, the ’430 Patent teaches
`
`collecting dynamic pages and determining their underlying structure to generate
`
`
`
`appropriate requests to be submitted to the database. Id. at 13:48-55. A plurality of
`
`these requests may be configured to create exhaustive enumerations of questions that
`
`will generate all dynamic pages that the server can produce. Id. at 14:7-19.
`
`
`
`However, as shown below, all of the above concepts were well-known in the
`
`art prior to the ’430 Patent.
`
B. Prosecution History
During prosecution of the ’430 Patent, the examiner issued a restriction
requirement but did not issue any claim rejections. File History (Ex. 1002) at pp. 91-95.
None of the prior art relied upon here was of record during prosecution.
`
`III. REQUIREMENTS FOR INTER PARTES REVIEW UNDER 37 C.F.R.
`§ 42.104
`A. Grounds for standing under 37 C.F.R. § 42.104(a)
Petitioner certifies that the ’430 Patent is available for IPR and that
Petitioner is not barred or estopped from requesting IPR challenging the claims of
the ’430 Patent identified in this Petition.
`
B. Identification of challenge under 37 C.F.R. § 42.104(b) and relief requested
`
`In view of the prior art and evidence, at least claims 1-3, 5, 7-8, 10-13, 19-20,
`
`and 23 of the ’430 Patent are unpatentable and should be cancelled. 37 C.F.R.
`
`§ 42.104(b)(1). Based on the prior art references identified below, IPR of the
`
`Challenged Claims should be granted. 37 C.F.R. § 42.104(b)(2).
`
`
`
Proposed Grounds of Unpatentability and Exhibit Nos.

Ground 1: Claims 1, 10, 12-13, 19-20, and 23 are obvious over U.S. Pub. 2002/0083068 to Quass et al. (“Quass”) in view of U.S. Patent 6,411,952 to Bharat et al. (“Bharat”)
Exhibit Nos.: 1005, 1006

Ground 2: Claims 2 and 3 are obvious over Quass in view of Bharat in further view of U.S. Patent 6,266,668 to Vanderveldt et al. (“Vanderveldt”)
Exhibit Nos.: 1005, 1006, 1007

Ground 3: Claim 11 is obvious over Quass in view of Bharat further in view of Enhanced Hyperlink Categorization Using Hyperlinks, Chakrabarti et al. (“EHC”)
Exhibit Nos.: 1005, 1006, 1008

Ground 4: Claims 1, 10, 12-13, and 19-20 are obvious over “Crawling for Domain-Specific Hidden Web Resources,” Bergholz et al. (“Bergholz”) in view of Bharat
Exhibit Nos.: 1009, 1006

Ground 5: Claims 2-3, 5, and 7-8 are obvious over Bergholz in view of Bharat further in view of Vanderveldt
Exhibit Nos.: 1009, 1006, 1007

Ground 6: Claim 11 is obvious over Bergholz in view of Bharat in further view of EHC
Exhibit Nos.: 1009, 1006, 1008

Ground 7: Claims 5 and 7-8 are obvious over Modeling the Internet and the Web by Baldi et al. (“Baldi”) in view of Bharat further in view of Vanderveldt
Exhibit Nos.: 1010, 1006, 1007
`
`
`
`
`Section IV identifies where each element of the Challenged Claims is found
`
`in the prior art and identifies the relevance of the evidence to the challenges. 37
`
`C.F.R. § 42.104(b)(4)-(5).
`
C. Level of ordinary skill in the art

A person having ordinary skill in the art (“PHOSITA”) of the ’430 Patent by
June 18, 2004, would have been a person having at least the equivalent of a
bachelor’s degree in computer science, electrical engineering, computer
engineering, or a similar discipline, and two (2) years of experience working with
web crawlers, though additional education may substitute for less experience and
vice versa. Smyth Decl. (Ex. 1003) at ¶30.
`
`D. Claim Construction
`
`At this time, Petitioner does not believe express construction of any term is
`
`
`
`necessary to resolve this dispute. See Nidec Motor Corp. v. Zhongshan Broad Ocean
`
`Motor Co. Ltd., 868 F.3d 1013, 1017 (Fed. Cir. 2017).
`
`IV. THE CHALLENGED CLAIMS ARE UNPATENTABLE
`A. Ground 1: Claims 1, 10, 12-13, 19-20, and 23 are obvious in view of
`Quass and Bharat
`1. Quass
`
`
`
`Quass published on June 27, 2002, and is prior art to the ’430 Patent under 35
`
`U.S.C. § 102(b). Quass (Ex. 1005).
`
`Quass is both within the same field of endeavor as and reasonably pertinent
`
`to the ’430 Patent. Like the ’430 Patent, Quass relates to a method of crawling that
`
`may “fill out forms so it can visit web pages hidden behind the forms.” Id. at ¶¶3,
`
`29, 37. Similar to the ’430 Patent, Quass’s methods perform steps of starting from
`
`an “initial URL list” and visiting links “in order to retrieve particular information of
`
`interest to the user.” Id. at ¶¶31-32. Therefore, Quass is analogous to the claimed
`
`invention of the ’430 Patent. Smyth Decl. (Ex. 1003) at ¶¶40-42.
`
`
`
`
`2. Bharat
`
`Bharat issued on June 25, 2002, and is prior art to the ’430 Patent under 35
`
`U.S.C. § 102(b). Bharat (Ex. 1006).
`
`Bharat is both within the same field of endeavor as and reasonably pertinent
`
`to the ’430 Patent. Like the ’430 Patent, Bharat relates to a method of crawling that
`
`may learn character patterns to “control the scope of web crawler searches for web
`
pages.” Id. at 1:5-8; see also 1:39-45. Similar to the ’430 Patent, Bharat’s methods
perform a breadth-first search from a starting set of URLs to a particular depth.
`
`Id. at 4:32-47. Therefore, Bharat is analogous to the claimed invention of the ’430
`
`Patent. Smyth Decl. (Ex. 1003) at ¶¶55-56.
`
i. Claim 1
`
`1[Preamble]. A method for crawling the internet to locate pages relevant to an
`application and thus building a Web Crawler comprising:
`
`To the extent the preamble is limiting, Quass teaches a system and method of
`
crawling the web for pages relevant to finding particular information. Quass (Ex. 1005)
`
`at ¶29. Quass expressly teaches that its crawler may be adapted to the needs of
`
`specific applications, such as job listings and book catalogs. Id. at ¶74. A PHOSITA
`
`would have understood this teaches and renders obvious locating pages “relevant to
`
`an application,” as further discussed below. Smyth Decl. (Ex. 1003) at ¶43.
`
`
`
`1[a]. starting from a base set of application-dependent web pages or crystallization
`points; and
`
`Quass teaches or at least renders obvious starting a crawl from an initial list
`
`of URLs, which constitute a base set of web pages. Quass (Ex. 1005) at ¶31. Quass
`
`teaches that its web crawler visits initial web pages “in order to retrieve particular
`
`information of interest to the user:”
`
`Web crawler 101 visits an initial list of web pages, plus additional web
`pages that are reachable from the initial set, in order to retrieve
`particular information of interest to the user of the present invention.
`Referring to FIG. 2, in a step 121, the web crawler 101 obtains the URL
`list 102 (FIG. 1) identifying the initial web pages to be visited. The web
`crawler 101 then enters a loop 122 and begins processing the URLs in
the list 102 one at a time until each of the URLs has been traverse[d], or
`in other words, until step 123 determines that the list is empty.
`
`Id. Further, Quass touts specialization of crawler elements to particular applications
`
`such as job listings and book catalog searches. Id. at ¶74. A PHOSITA would have
`
`therefore understood that Quass teaches an “application-dependent” base set of web
`
`pages because it is a base set chosen to crawl to provide only “particular information
`
`of interest to the user” (i.e., for the user’s current application, whatever it may be).
`
`Id. at ¶33; Smyth Decl. (Ex. 1003) at ¶43.
`
`Alternatively, it also would have been obvious based on Quass for a
`
`PHOSITA to crawl a base set of web page URLs for a particular application,
`
`returning the most relevant results of interest to a user, specifically adapting its
`
`
`
`methods to the needs of the application. Smyth Decl. (Ex. 1003) at ¶44. A PHOSITA
`
`would have had a reasonable expectation of success in doing so because it would
`
`require only a simple substitution of a set of application-dependent URLs for the
`
`base set of URLs, with no other changes. Id. Doing so would have been a logical
`
`choice for a PHOSITA designing an efficient web crawler because a PHOSITA
`
`would have understood that the most direct way to “customize” the crawler to a
`
`specific application would be to start with pages already specific to that application.
`
`Id.
`
`1[b]. applying breadth-first recursive crawling.
`
`From the initial set of web pages, Quass teaches adding the linked web pages
`
`(“reachable from the initial set”) to the list of web pages yet to be visited:
`
`Web crawler 101 visits an initial list of web pages, plus additional web
`pages that are reachable from the initial set, in order to retrieve
`particular information of interest to the user of the present invention.
`Referring to FIG. 2, in a step 121, the web crawler 101 obtains the URL
`list 102 (FIG. 1) identifying the initial web pages to be visited. The web
`crawler 101 then enters a loop 122 and begins processing the URLs in
`the list 102 one at a time until each of the URLs has been traverse[d],
`or in other words, until step 123 determines that the list is empty.
`
`Quass (Ex. 1005) at ¶32 (emphasis added). The web crawler then recursively loops over
`
`the list, visiting each page sequentially until the list of pages to be crawled is empty. Id.
`
`at ¶36; see also id. at Fig. 2. A PHOSITA would have found it obvious to configure
`
`this breadth-first crawling process to be performed “recursively” based on Quass’s
`
`
`
`teaching of looping over the list of URLs, returning to step 123 as each page is
`
`crawled and new links are extracted in step 130, as illustrated below in Figure 2:
`
`
`Quass (Ex. 1005) at Fig. 2; Smyth Decl. (Ex. 1003) at ¶40.
`
`A PHOSITA would have understood that the additional discovered links
`
`would have been added to the bottom of the list of URLs to be crawled, based on
`
`Quass’s description of the base set as being the “initial” list of URLs — i.e. those
`
`crawled first. Smyth Decl. (Ex. 1003) at ¶40. Alternatively, a PHOSITA would have
`
`been motivated to try adding the additional discovered links to the bottom of the list,
`
`as doing so is one of two choices for adding the links to the list (the bottom or the
`
`
`
`top). Id. Thus, Quass teaches or renders obvious the ’430 Patent’s definition of
`
`breadth-first crawling, in which “all links from a particular page are first explored
`
`then each one of them is used as a starting point for the next step.” ’430 Patent (Ex.
`
`1001) at 13:32-35; Smyth Decl. (Ex. 1003) at ¶40.
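
For illustration only, the effect of where newly discovered links are placed in a pending URL list can be sketched as follows. This is a hypothetical sketch, not Quass’s implementation: the link graph, the function name visit_order, and the add_to_bottom parameter are assumptions introduced here. It processes the list one URL at a time until the list is empty, and shows that adding new links to the bottom of the list yields a breadth-first order, while adding them to the top yields a depth-first order.

```python
from collections import deque

# Hypothetical link graph: page A links to B and C, B links to D, C links to E.
LINKS = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}


def visit_order(initial_urls, links, add_to_bottom):
    """Process a pending URL list until empty and return the visit order."""
    pending = deque(initial_urls)
    seen = set(initial_urls)
    order = []
    while pending:  # loop until the list is empty
        url = pending.popleft()  # always take the next URL from the top
        order.append(url)
        new_links = [t for t in links.get(url, []) if t not in seen]
        seen.update(new_links)
        if add_to_bottom:
            pending.extend(new_links)  # bottom of the list: breadth-first
        else:
            pending.extendleft(reversed(new_links))  # top of the list: depth-first
    return order


print(visit_order(["A"], LINKS, add_to_bottom=True))   # ['A', 'B', 'C', 'D', 'E']
print(visit_order(["A"], LINKS, add_to_bottom=False))  # ['A', 'B', 'D', 'C', 'E']
```

In the first ordering, every link found on page A is visited before any link found on pages B or C, consistent with the quoted description of breadth-first crawling.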
`
`
`
`Alternatively, Bharat expressly teaches performing a breadth-first crawl from
`
`a “start set” of URLs. Bharat (Ex. 1006) at 4:32-47. A PHOSITA would have been
`
`motivated to perform breadth-first crawling as taught by Bharat as an operable
`
`configuration of the recursive crawler taught by Quass because there are a finite
`
`number of configurations to try as a crawl order, namely breadth-first, depth-first,
`
`and hybrid, and thus structuring the crawl to be breadth-first would have been
`
`obvious to try for a PHOSITA. Smyth Decl. (Ex. 1003) at ¶60. Therefore, Quass in
`
`view of Bharat teaches or at least renders obvious applying breadth-first recursive
`
`crawling. Id.
`
`ii. Claim 10
`
`10[Preamble]. A method for building a deep web crawler, comprising:
`
`
`
`To the extent the preamble is limiting, Quass teaches a web crawler that
`
`accesses content “concealed” behind electronic forms, specifically referring to these
`
`concealed, dynamically generated pages as “deeper” information. Quass (Ex. 1005)
`
at ¶¶3, 12, 37; Smyth Decl. (Ex. 1003) at ¶41. The ’430 Patent defines the “deep
web” as the portion of the internet containing dynamically generated pages. See, e.g.,
’430 Patent (Ex. 1001) at 13:1-5. Thus, a PHOSITA would have understood that
Quass teaches a method for building a deep web crawler. Smyth Decl. (Ex. 1003) at
¶41.
`
`10[a]. utilizing scout crawling rules to collect dynamic pages;
`
`Quass in view of Bharat teaches or at least renders obvious this limitation.
`
`Quass teaches that its crawling method begins with a step of “retrieving electronic
`
`data having electronic-form data” from a host database:
`
1. An automated method for obtaining targeted information from a
database accessible through an electronic form, said method
comprising the steps of:
retrieving electronic data having electronic-form data representative of
said electronic form therein from a database host;
`
`Quass at Claim 1 (emphasis added); see also id. at ¶ 40, Claims 8, 9, and 15. Quass
`
`teaches its preferred embodiment is for the web, and these retrieved electronic data
`
`are HTML documents containing electronic forms to be filled out (i.e., dynamic
`
`pages) prior to allowing “deeper” information to be accessed. Id. at ¶40. A PHOSITA
`
`would have understood that this teaches scout crawling rules “collecting” dynamic
`
`pages. Smyth Decl. (Ex. 1003) at ¶45.
`
`
`
`To the extent it is argued that Quass does not expressly teach “rules” for scout
`
`crawling, Bharat teaches that the behavior of a crawler may be defined by a “walking
`
`rule,” determining which pages should and should not be walked (crawled). Bharat
`
`(Ex. 1006) at 1:39-45, 3:9-15, 3:31-39. A PHOSITA would have been motivated to
`
`achieve the scout crawling taught by Quass through a walking “rule” as taught by
`
`
`
`Bharat as a simple method of structuring the computer-implemented method in
`
`software, combining the prior art elements according to known methods to achieve
`
`the predictable result of a rules-based crawler. Smyth Decl. (Ex. 1003) at ¶¶57, 61.
`
`Doing so would have afforded a PHOSITA a reasonable expectation of success
`
`because computer-implemented methods operating according to rules were well-
`
`known and ubiquitous at the time of the ’430 Patent. Id.
`
`10[b]. utilizing an analyzer and extractor to determine underlying structure of
`queries;
`
`Quass teaches or at least renders obvious this limitation. The ’430 Patent
`
`describes that the claimed “queries” are synonymous with “questions” or “requests”
`
`to a database for information the database contains, presented as a dynamically
`
`generated page. ’430 Patent (Ex. 1001) at 4:54-62; Smyth Decl. (Ex. 1003) at ¶46.
`
`Therefore, a PHOSITA would have understood that the recited step of
`
`“determin[ing] underlying structure of queries” at least includes determining what
`
`types of information may be input to the dynamic page in order to appropriately
`
`configure instructions for requesting pages from a database. Id. This understanding
`
`is consistent with examples described in the specification. ’430 Patent (Ex. 1001) at
`
`13:48-55.
`
`Quass teaches this step at least through its “classifier” (i.e., the recited
`
`“analyzer”) that analyzes a collected page to determine forms it contains:
`
`One or more classifiers 166 then determine which forms should be
`filled out and how to do so. Classifiers 166 make their determination
`
`
`
`using each electronic form’s object model 165. Classifiers 166 may
`also employ the candidate XHTML document 163 and the candidate
`HTML document 161 in the determination process.
`
`Quass (Ex. 1005) at ¶43; Smyth Decl. (Ex. 1003) at ¶48. A PHOSITA would have
`
`understood that Quass’s classifier satisfies an “analyzer” based on its name – a classifier
`
`performs classification, which is a kind of analysis, and is thus an “analyzer.” Id.
`
`In a specific example, Quass details how a classifier may determine the underlying
`
`structure of forms such as those from the exemplary web page illustrated in Figure 3:
`
`
`FIG. 9 is an illustrative flowchart 250 of an example classifier
`illustrated as an appliance category classifier that determines whether
`or not a FormField object 224 represents a list of appliance categories.
`
`
`
`Step 251 matches the descriptive text for the For[m]Field's values
`against a predefined list of potential appliance categories 252. In the
`case of the category options 141 in FIG. 3, “Washers”, “Dryers”, and
`“Dishwashers” would match while “Refrigerators” would not.
`
`Quass (Ex. 1005) at ¶66 (emphasis added). Quass describes an analysis of buttons that
`
`may be pressed as an underlying structure of forms that the classifier may determine. Id.
`
`at ¶70 (emphasis added); Smyth Decl. (Ex. 1003) at ¶48.
`
`Additionally, Quass teaches a “form parser” (i.e., the recited “extractor”) that
`
`may extract information from the page to assist in query structure determination and
`
subsequent form filling, such as by performing optical character recognition (OCR):
`
`A form parser can use additional components to help gather information
`that may prove useful to the form filling process. For example, an OCR
`(Optical Character Recognition) component might be employed to
`recognize fancy characters embedded in a graphic image and convert
`them into regular text strings. Another example, described in the next
`few paragraphs, is a separate parser that tries to find descriptions for
`form controls.
`Each form control is usually associated with descriptive text, icons or
`other graphics, etc. that suggest the form control's purpose. The
`association between form controls and their descriptions is often
`implicit, possibly based on how things are laid out in the form.
`
Quass (Ex. 1005) at ¶¶50-51; Smyth Decl. (Ex. 1003) at ¶49. A PHOSITA would have
`
`understood that Quass’s form parser satisfies an “extractor” based on its name – a form
`
`parser performs extraction of information by parsing the form, and is thus an “extractor.”
`
`
`
`Id. Quass teaches that its “extractor” may comprise an input text parser that finds
`
`descriptive text for a given input element:
`
The input text parser 204 uses an ordered list of rules to find descriptive
text for an <input> element. It returns the text from the first rule that
succeeds in finding text that is more than just blank spaces. If no rules
succeed, the input text parser indicates that the <input> element has no
descriptive text.
`
`Id. at ¶54. Thus, Quass teaches utilizing an analyzer (classifier) and extractor (form
`
`parser) to determine underlying structure of queries. Smyth Decl. (Ex. 1003) at ¶¶46-49.
`
`10[c]. generating instructions for a harvester, wherein the harvester provides
`requests to a server and collects available pages from the server.
`
`Quass in view of Bharat teaches or at least renders obvious this limitation.
`
`Quass teaches that the classifier’s decisions output instructions used by software (a
`
`harvester) that collectively includes: i) a “form filler” for generating requests to a
`
`server, and ii) software for collecting available pages from the server. Specifically,
`
`Quass teaches a form filler 168 that generates “requests” using the resulting structure
`
`decisions of the classifier in order to access content behind pages requiring input.
`
`Quass (Ex. 1005) at ¶44. The “decisions” of the classifier constitute instructions provided
`
`to the harvester (form filler), used to fill out forms in order to collect pages from the
`
`server. Smyth Decl. (Ex. 1003) at ¶50. Thus, Quass teaches software instructions for
`
`electronically populating forms on a network database (a “server” of dynamic content) in
`
`
`
order to access information concealed by the electronic form. Quass (Ex. 1005) at ¶3; see also id. at ¶12;
`
`Smyth Decl. (Ex. 1003) at ¶50.
`
`Quass teaches that its classifier may determine the underlying structure of forms
`
`to generate instructions for populating the forms, such as whether it is appropriate to
`
`select one particular value, spin through all values (in a plurality of requests), or change
`
`nothing:
`
`A classifier might choose from multiple classifications. For example, a
`classifier might classify a FormField object 224 as one of: (1) spin
`through all values; (2) choose one particular value; (3) don't change
`anything.
`
`Quass (Ex. 1005) at ¶68, see also id. at claim 5. Second, Quass further teaches storing
`
`retrieved pages, which constitutes “collecting” them. Id. at ¶33, see also id. at claim 1;
`
`Smyth Decl. (Ex. 1003) at ¶50. While Quass does not expressly give a name to the portion
`
`of software performing the steps of page retrieval, a PHOSITA would have understood
`
`that a portion of the software performs the steps of collecting available pages from the
`
`server. This portion of software, along with the form filler, would be understood by a
`
`PHOSITA to constitute the claimed “harvester.” Smyth Decl. (Ex. 1003) at ¶50.
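
For illustration only, the step of turning per-field classifications into a plurality of requests can be sketched as follows. This is a hypothetical sketch, not code from Quass: the field names, the classification labels ("spin_all", "choose_one", "leave_unchanged"), and the function name enumerate_requests are assumptions chosen to mirror the three classifications quoted above.

```python
from itertools import product

# Hypothetical form description: field name -> (classification, candidate values).
FORM_FIELDS = {
    "category": ("spin_all", ["Washers", "Dryers", "Dishwashers"]),
    "sort": ("choose_one", ["price"]),
    "session": ("leave_unchanged", []),
}


def enumerate_requests(fields):
    """Build one request (a dict of field values) per combination of choices."""
    names, choices = [], []
    for name, (classification, values) in fields.items():
        if classification == "spin_all":
            names.append(name)
            choices.append(values)  # try every value of this field
        elif classification == "choose_one":
            names.append(name)
            choices.append(values[:1])  # use a single chosen value
        # "leave_unchanged" fields are omitted so the form default is kept
    return [dict(zip(names, combo)) for combo in product(*choices)]


requests = enumerate_requests(FORM_FIELDS)
for request in requests:
    print(request)
```

Under these assumptions, a field classified as "spin through all values" contributes one request per value, so the three hypothetical category values produce three requests.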
`
`
`
`In the alternative, generating instructions for the claimed “harvester” would have
`
`been obvious to a PHOSITA over Quass based on a fundamental understanding that
`
`computer-implemented methods operate on instructions. Id. at ¶51. A PHOSITA would
`
`have understood that a primary purpose of methods taught by Quass is to create requests
`
`specially tailored to access all of the information hidden behind search forms, which
`
`
`
`would have been best accomplished by generating instructions for a harvester to collect
`
the available pages from the server. Id. at ¶¶50-51; Quass (Ex. 1005) at ¶12. Specifically,
`
`since Quass directly teaches an analyzer and extractor determining the underlying
`
`structure of queries (information which was not known prior to beginning the crawl),
`
`a PHOSITA would have found the natural next step to be using the determined
`
`underlying structure information to generate instructions for a harvester in order to
`
`accomplish Quass’s purpose of accessing information hidden behind the analyzed
`
form. Id. Therefore, a PHOSITA would have understood that this could have been
accomplished using only the fundamental software development techniques and
programming languages already used by the methods and systems taught by Quass,
which would have been well within the skill of a PHOSITA. Id.
`
`To the extent it is argued that Quass does not expressly teach “instructions”
`
`for a harvester, Bharat teaches that the behavior of a crawler may be defined by
`
`rules, as discussed above. See supra Section IV.A.ii, claim 10(a). A PHOSITA
`
`would have understood that a “rule” is a form of instructions, as shown by Bharat’s
`
`express use of the term “instructions” in its claims. Smyth Decl. (Ex. 1003) at ¶58;
`
`Bharat at 12:55-13:6. A PHOSITA would have been motivated to configure the
`
`harvester taught by Quass using “instructions” as taught by Bharat as a simple
`
`method of structuring the computer-implemented method in software. A PHOSITA
`
`would have a reasonable expectation of success in doing so because computer-
`
`implemented methods operating according to instructions were well-known and
`
`
`
`ubiquitous at the time of the ’430 Patent, and are expressly discussed as being
`
utilized in the invention of Quass. Quass (Ex. 1005) at ¶¶23-24, 47, Figs. 5, 6; Smyth
`
`Decl. (Ex. 1003) at ¶61.
`
`iii. Claim 12
`12. The method of claim 10, wherein the scout crawling rules are divided into rules
`dealing with static pages and rules dealing with dynamic pages.
`
`Quass in view of Bharat teaches or at least renders obvious this limitation.
`
`The ’430 Patent describes these “divided” rules as merely treating static pages
`
`and dynamic pages differently, directly storing static content while analyzing and
`
`harvesting dynamic content. ’430 Patent (Ex. 1001) at 13:56-64. The ’430 Patent
`
`does not support a narrower understanding of “divided” rules in which disjointed
`
`software modules are applied to each of the two types of pages. Smyth Decl. (Ex.
`
`1003) at ¶52.
`
`
`
`Quass teaches or at least renders obvious treating static pages and dynamic
`
`pages differently, with simple storage of static pages and more sophisticated
`
`analysis, parsing, and request submission for dynamic pages and, thus, teaches this
`
`limitation. See supra Section IV.A at claims 1, 10; Smyth Decl. (Ex. 1003) at ¶53. A
`
PHOSITA would have understood that programming a module to treat static and
dynamic pages differently would have involved performing different steps for each,
requiring different rules to be applied to each of the static and dynamic pages. Smyth Decl. (Ex. 1003)
`
`at ¶53. Therefore, a PHOSITA would have been motivated to use rules divided for
`
`static and dynamic pages. Id.
`
`
`
`Further, as previously discussed, to the extent it is argued that Quass does not
`
`
`
`expressly teach “rules,” Bharat teaches computer-implemented methods operating
`
`according to “rules,” and a PHOSITA would have been motivated to configure the
`
`computer-implemented method of Quass to operate according to scout crawling “rules.”
`
`See supra Section IV.A.ii, claim 10(a). Therefore, Quass in view of Bharat teaches or at
`
`least renders obvious scout crawling rules divided into rules dealing with static pages and
`
rules dealing with dynamic pages. Smyth Decl. (Ex. 1003) at ¶¶52-53.
`
`iv. Claim 13
`13. The method of claim 12, wherein a plurality of questions is selected to cover
`all possible patterns of the dynamic pages produced by a server, to allow the
`analyzer and the harvester to create exhaustive enumerations of questions that
`generate all dynamic pages that the server can produce.
`
`The ’430 Patent describes that the claimed “questions” are synonymous with
`
`“queries” or “requests” to a database for information the database contains. See
`
`supra Section IV.A.ii, claim 10(b). Quass in view of Bharat teaches or at least
`
`renders obvious this limitation.
`
`
`
`Quass describes a pluralit