Filed on behalf of Unified Patents Inc.
`By: Jason R. Mudd, Reg. No. 57,700
`Eric A. Buresh, Reg. No. 50,394
`jason.mudd@eriseip.com
`eric.buresh@eriseip.com
`ERISE IP, P.A.
`7015 College Blvd., Suite 700
`Overland Park, Kansas 66211
`Telephone: (913) 777-5600
`Roshan Mansinghani, Reg. No. 62,429
`roshan@unifiedpatents.com
`Unified Patents Inc.
`13355 Noel Road, Suite 1100
`Dallas, TX, 75240
`Telephone: (214) 945-0200
`
`
`
`
`
`IPR2019-00472
`U.S. Patent 7,454,430
`
`
`
`
`Jonathan Bowser, Reg. No. 54,574
`jbowser@unifiedpatents.com
`Unified Patents Inc.
`1875 Connecticut Ave. NW, Floor 10
`Washington, D.C. 20009
`Telephone: (202) 701-1015
`
`UNITED STATES PATENT AND TRADEMARK OFFICE
`____________
`
`BEFORE THE PATENT TRIAL AND APPEAL BOARD
` ____________
`
`UNIFIED PATENTS INC.
`Petitioner
`
`v.
`
`RECURSIVE WEB TECHNOLOGIES, LLC
`Patent Owner
`
`SPIDER SEARCH ANALYTICS, LLC
`Patent Owner
`____________
`
`IPR2019-00472
`U.S. 7,454,430
` ____________
`
` PETITION FOR INTER PARTES REVIEW
`OF U.S. PATENT 7,454,430
`
`
`
`

`

`
`
`
`Table of Contents
`
I. Introduction
II. U.S. Patent 7,454,430
A. Alleged invention
B. Prosecution history
III. Requirements for inter partes review under 37 C.F.R. § 42.104
A. Grounds for standing under 37 C.F.R. § 42.104(a)
B. Identification of challenge under 37 C.F.R. § 42.104(b) and relief requested
C. Level of ordinary skill in the art
D. Claim Construction
IV. The Challenged Claims Are Unpatentable
A. Ground 1: Claims 1, 10, 12-13, 19-20, and 23 are obvious in view of Quass and Bharat
B. Ground 2: Claims 2 and 3 are obvious over Quass in view of Bharat in further view of Vanderveldt
C. Ground 3: Claim 11 is obvious over Quass in view of Bharat in further view of EHC
D. Ground 4: Claims 1, 10, 12-13, and 19-20 are obvious over Bergholz in view of Bharat
E. Ground 5: Claims 2-3, 5, and 7-8 are obvious over Bergholz in view of Bharat in further view of Vanderveldt
F. Ground 6: Claim 11 is obvious over Bergholz in view of Bharat in further view of EHC
G. Ground 7: Claims 5 and 7-8 are obvious over Baldi in view of Bharat in further view of Vanderveldt
V. Conclusion
VI. Mandatory Notices Under 37 C.F.R. § 42.8(a)(1)
A. Real Party-In-Interest
B. Related Matters
C. Lead and Back-Up Counsel
`
`
`
`

`

`
I. INTRODUCTION
`Petitioner Unified Patents Inc. (“Petitioner”) respectfully requests inter partes
`
`review (“IPR”) of claims 1-3, 5, 7-8, 10-13, 19-20, and 23 (collectively, the
`
`“Challenged Claims”) of U.S. Patent 7,454,430 (“the ’430 Patent”) (Ex. 1001).
`
`II. U.S. PATENT 7,454,430
`A. Alleged Invention
`The ’430 Patent relates to automatically finding and extracting information
`
`from electronic documents, such as web pages, in a process commonly known as a
`
`“crawl.” ’430 Patent (Ex. 1001) at Abstract, 1:17-22. The ’430 Patent also recites
`
`steps for analyzing web pages to generate requests appropriately configured to
`
`harvest resulting dynamic pages from a server (i.e., from what is known as the “Deep
`
`Web”). Id. at 13:1-5, 13:48-55, 14:59-67.
`
`Breadth First Crawling
`
`
`
The ’430 Patent describes the well-known method of conducting a crawl in a “breadth first” manner, meaning that “all links from a particular page are first explored then each one of them is used as a starting point for the next step.” Id. at 13:32-35.[1] This is in contrast to a “depth first” search, in which a particular link from the particular (top) page is followed to a maximum depth of search (further explained below) before returning to explore additional links from the top page. Smyth Decl. (Ex. 1003) at ¶40.

[1] All emphases appearing in quotations have been added by Petitioner unless indicated otherwise.
`
`Depth and Relevance
`
`
`
`The “depth” of a subsequent page is equal to the minimum number of links
`
`that must be followed from a starting page in order to reach a subsequent page. ’430
`
`Patent (Ex. 1001) at 6:48-53. As discussed by the ’430 Patent, pages of interest to a
`
`given application (i.e. “relevant” pages) are unlikely to be at a great depth from a
`
`starting page (“…the relevant pages are in most cases no deeper than 2-3 levels down
`
`from the main page.”), and thus crawlers may be configured to only crawl to a certain
`
`maximum depth (i.e. number of links) from starting pages in the interests of speed
`
`and efficiency. ’430 Patent (Ex. 1001) at 13:24-31.
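
For illustration only, the breadth-first and maximum-depth concepts described above can be sketched in Python as follows. This sketch is not taken from the ’430 Patent or from any cited reference. The function name `breadth_first_crawl`, the in-memory link graph `SITE`, and the page names are hypothetical, and an actual crawler would fetch pages over a network rather than read them from a dictionary.

```python
from collections import deque


def breadth_first_crawl(links, start_pages, max_depth):
    """Return {page: depth} for every page within max_depth links of a start page.

    `links` is a hypothetical stand-in for the web: it maps each page to the
    pages it links to.
    """
    depth = {page: 0 for page in start_pages}
    queue = deque(start_pages)
    while queue:
        page = queue.popleft()
        if depth[page] == max_depth:
            continue  # do not follow links beyond the maximum depth
        for target in links.get(page, []):
            if target not in depth:
                # The first time a page is reached in a breadth-first walk,
                # it has been reached by the minimum number of links.
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


# Hypothetical site used only for this illustration.
SITE = {
    "main": ["jobs", "about"],
    "jobs": ["job1", "job2"],
    "about": ["main"],
    "job1": ["apply"],
    "apply": ["confirm"],
}

result = breadth_first_crawl(SITE, ["main"], max_depth=2)
```

Under these assumptions, every page linked from "main" is recorded before any page two links away, and pages more than two links from "main" are never visited.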
`
`Dynamic Web Pages
`
`
`
The ’430 Patent describes dynamic web pages as pages that do not exist until after

they are requested (e.g., in response to user input), which was known to pose
`
`a challenge for standard web crawlers. ’430 Patent (Ex. 1001) at 4:54-67. This type
`
`of content is often stored in a server and available to users via a search form, for
`
`example as seen in job boards, online dictionaries, and airline travel websites. Id.
`
`Analysis and Request Generation
`
`
`
`In order for a crawler to access dynamic pages, the ’430 Patent teaches
`
`collecting dynamic pages and determining their underlying structure to generate
`
`appropriate requests to be submitted to the database. Id. at 13:48-55. A plurality of
`
`these requests may be configured to create exhaustive enumerations of questions that
`
`will generate all dynamic pages that the server can produce. Id. at 14:7-19.
`
`
`
`However, as shown below, all of the above concepts were well-known in the
`
`art prior to the ’430 Patent.
`
B. Prosecution History
During prosecution of the ’430 Patent, the examiner issued a restriction requirement but did not issue any claim rejections. File History (Ex. 1002) at pp. 91-95. None of the prior art relied upon here was of record during prosecution.
`
`III. REQUIREMENTS FOR INTER PARTES REVIEW UNDER 37 C.F.R.
`§ 42.104
`A. Grounds for standing under 37 C.F.R. § 42.104(a)
Petitioner certifies that the ’430 Patent is available for IPR and that the

Petitioner is not barred or estopped from requesting IPR challenging the claims of
`
`the ’430 Patent identified in this Petition.
`
B. Identification of challenge under 37 C.F.R. § 42.104(b) and relief requested
`
`In view of the prior art and evidence, at least claims 1-3, 5, 7-8, 10-13, 19-20,
`
`and 23 of the ’430 Patent are unpatentable and should be cancelled. 37 C.F.R.
`
`§ 42.104(b)(1). Based on the prior art references identified below, IPR of the
`
`Challenged Claims should be granted. 37 C.F.R. § 42.104(b)(2).
`
Proposed Grounds of Unpatentability and Exhibit Nos.

Ground 1: Claims 1, 10, 12-13, 19-20, and 23 are obvious over U.S. Pub. 2002/0083068 to Quass et al. (“Quass”) in view of U.S. Patent 6,411,952 to Bharat et al. (“Bharat”). Exhibit Nos. 1005, 1006.

Ground 2: Claims 2 and 3 are obvious over Quass in view of Bharat in further view of U.S. Patent 6,266,668 to Vanderveldt et al. (“Vanderveldt”). Exhibit Nos. 1005, 1006, 1007.

Ground 3: Claim 11 is obvious over Quass in view of Bharat further in view of Enhanced Hyperlink Categorization Using Hyperlinks, Chakrabarti et al. (“EHC”). Exhibit Nos. 1005, 1006, 1008.

Ground 4: Claims 1, 10, 12-13, and 19-20 are obvious over “Crawling for Domain-Specific Hidden Web Resources,” Bergholz et al. (“Bergholz”) in view of Bharat. Exhibit Nos. 1009, 1006.

Ground 5: Claims 2-3, 5, and 7-8 are obvious over Bergholz in view of Bharat further in view of Vanderveldt. Exhibit Nos. 1009, 1006, 1007.

Ground 6: Claim 11 is obvious over Bergholz in view of Bharat in further view of EHC. Exhibit Nos. 1009, 1006, 1008.

Ground 7: Claims 5 and 7-8 are obvious over Modeling the Internet and the Web by Baldi et al. (“Baldi”) in view of Bharat further in view of Vanderveldt. Exhibit Nos. 1010, 1006, 1007.
`
`
`
`Section IV identifies where each element of the Challenged Claims is found
`
`in the prior art and identifies the relevance of the evidence to the challenges. 37
`
`C.F.R. § 42.104(b)(4)-(5).
`
C. Level of ordinary skill in the art

A person having ordinary skill in the art (“PHOSITA”) of the ’430 Patent by June 18, 2004, would have been a person having at least (1) the equivalent of a bachelor’s degree in computer science, electrical engineering, computer engineering, or a similar discipline, and two (2) years of experience working with web crawlers, though additional education may substitute for less experience and vice versa. Smyth Decl. (Ex. 1003) at ¶30.
`
`D. Claim Construction
`
`At this time, Petitioner does not believe express construction of any term is
`
`
`
`necessary to resolve this dispute. See Nidec Motor Corp. v. Zhongshan Broad Ocean
`
`Motor Co. Ltd., 868 F.3d 1013, 1017 (Fed. Cir. 2017).
`
`IV. THE CHALLENGED CLAIMS ARE UNPATENTABLE
`A. Ground 1: Claims 1, 10, 12-13, 19-20, and 23 are obvious in view of
`Quass and Bharat
`1. Quass
`
`
`
`Quass published on June 27, 2002, and is prior art to the ’430 Patent under 35
`
`U.S.C. § 102(b). Quass (Ex. 1005).
`
`Quass is both within the same field of endeavor as and reasonably pertinent
`
`to the ’430 Patent. Like the ’430 Patent, Quass relates to a method of crawling that
`
`may “fill out forms so it can visit web pages hidden behind the forms.” Id. at ¶¶3,
`
`29, 37. Similar to the ’430 Patent, Quass’s methods perform steps of starting from
`
`an “initial URL list” and visiting links “in order to retrieve particular information of
`
`interest to the user.” Id. at ¶¶31-32. Therefore, Quass is analogous to the claimed
`
`invention of the ’430 Patent. Smyth Decl. (Ex. 1003) at ¶¶40-42.
`
`
`2. Bharat
`
`Bharat issued on June 25, 2002, and is prior art to the ’430 Patent under 35
`
`U.S.C. § 102(b). Bharat (Ex. 1006).
`
`Bharat is both within the same field of endeavor as and reasonably pertinent
`
`to the ’430 Patent. Like the ’430 Patent, Bharat relates to a method of crawling that
`
`may learn character patterns to “control the scope of web crawler searches for web
`
pages.” Id. at 1:5-8; see also 1:39-45. Similar to the ’430 Patent, Bharat’s methods

perform breadth-first searching from a starting set of URLs to a particular depth.
`
`Id. at 4:32-47. Therefore, Bharat is analogous to the claimed invention of the ’430
`
`Patent. Smyth Decl. (Ex. 1003) at ¶¶55-56.
`
i. Claim 1
`
`1[Preamble]. A method for crawling the internet to locate pages relevant to an
`application and thus building a Web Crawler comprising:
`
To the extent the preamble is limiting, Quass teaches a system and method of

crawling the web for pages relevant to finding particular information. Quass (Ex. 1005)
`
`at ¶29. Quass expressly teaches that its crawler may be adapted to the needs of
`
`specific applications, such as job listings and book catalogs. Id. at ¶74. A PHOSITA
`
`would have understood this teaches and renders obvious locating pages “relevant to
`
`an application,” as further discussed below. Smyth Decl. (Ex. 1003) at ¶43.
`
`1[a]. starting from a base set of application-dependent web pages or crystallization
`points; and
`
`Quass teaches or at least renders obvious starting a crawl from an initial list
`
`of URLs, which constitute a base set of web pages. Quass (Ex. 1005) at ¶31. Quass
`
`teaches that its web crawler visits initial web pages “in order to retrieve particular
`
`information of interest to the user:”
`
`Web crawler 101 visits an initial list of web pages, plus additional web
`pages that are reachable from the initial set, in order to retrieve
`particular information of interest to the user of the present invention.
`Referring to FIG. 2, in a step 121, the web crawler 101 obtains the URL
`list 102 (FIG. 1) identifying the initial web pages to be visited. The web
`crawler 101 then enters a loop 122 and begins processing the URLs in
the list 102 one at a time until each of the URLs has been traverse[d], or
`in other words, until step 123 determines that the list is empty.
`
`Id. Further, Quass touts specialization of crawler elements to particular applications
`
`such as job listings and book catalog searches. Id. at ¶74. A PHOSITA would have
`
`therefore understood that Quass teaches an “application-dependent” base set of web
`
`pages because it is a base set chosen to crawl to provide only “particular information
`
`of interest to the user” (i.e., for the user’s current application, whatever it may be).
`
`Id. at ¶33; Smyth Decl. (Ex. 1003) at ¶43.
`
`Alternatively, it also would have been obvious based on Quass for a
`
`PHOSITA to crawl a base set of web page URLs for a particular application,
`
`returning the most relevant results of interest to a user, specifically adapting its
`
`methods to the needs of the application. Smyth Decl. (Ex. 1003) at ¶44. A PHOSITA
`
`would have had a reasonable expectation of success in doing so because it would
`
`require only a simple substitution of a set of application-dependent URLs for the
`
`base set of URLs, with no other changes. Id. Doing so would have been a logical
`
`choice for a PHOSITA designing an efficient web crawler because a PHOSITA
`
`would have understood that the most direct way to “customize” the crawler to a
`
`specific application would be to start with pages already specific to that application.
`
`Id.
`
`1[b]. applying breadth-first recursive crawling.
`
`From the initial set of web pages, Quass teaches adding the linked web pages
`
`(“reachable from the initial set”) to the list of web pages yet to be visited:
`
`Web crawler 101 visits an initial list of web pages, plus additional web
`pages that are reachable from the initial set, in order to retrieve
`particular information of interest to the user of the present invention.
`Referring to FIG. 2, in a step 121, the web crawler 101 obtains the URL
`list 102 (FIG. 1) identifying the initial web pages to be visited. The web
`crawler 101 then enters a loop 122 and begins processing the URLs in
`the list 102 one at a time until each of the URLs has been traverse[d],
`or in other words, until step 123 determines that the list is empty.
`
`Quass (Ex. 1005) at ¶32 (emphasis added). The web crawler then recursively loops over
`
`the list, visiting each page sequentially until the list of pages to be crawled is empty. Id.
`
`at ¶36; see also id. at Fig. 2. A PHOSITA would have found it obvious to configure
`
`this breadth-first crawling process to be performed “recursively” based on Quass’s
`
`teaching of looping over the list of URLs, returning to step 123 as each page is
`
`crawled and new links are extracted in step 130, as illustrated below in Figure 2:
`
`
`Quass (Ex. 1005) at Fig. 2; Smyth Decl. (Ex. 1003) at ¶40.
`
`A PHOSITA would have understood that the additional discovered links
`
`would have been added to the bottom of the list of URLs to be crawled, based on
`
`Quass’s description of the base set as being the “initial” list of URLs — i.e. those
`
`crawled first. Smyth Decl. (Ex. 1003) at ¶40. Alternatively, a PHOSITA would have
`
`been motivated to try adding the additional discovered links to the bottom of the list,
`
`as doing so is one of two choices for adding the links to the list (the bottom or the
`
`top). Id. Thus, Quass teaches or renders obvious the ’430 Patent’s definition of
`
`breadth-first crawling, in which “all links from a particular page are first explored
`
`then each one of them is used as a starting point for the next step.” ’430 Patent (Ex.
`
`1001) at 13:32-35; Smyth Decl. (Ex. 1003) at ¶40.
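
The two placement choices discussed above (adding newly discovered links to the bottom or to the top of the list) can be illustrated with the following Python sketch, offered only as an aid to understanding. It is not code from Quass, Bharat, or the ’430 Patent. The function `crawl_order`, the flag `add_to_bottom`, and the link graph `LINKS` are hypothetical names chosen for this example.

```python
from collections import deque


def crawl_order(links, initial_urls, add_to_bottom=True):
    """Return the order in which URLs are visited.

    `links` is a hypothetical mapping from a URL to the URLs found on that page.
    """
    pending = deque(initial_urls)  # list of URLs still to be visited
    seen = set(initial_urls)
    order = []
    while pending:  # loop until the list is empty
        url = pending.popleft()
        order.append(url)
        new_links = []
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                new_links.append(target)
        if add_to_bottom:
            pending.extend(new_links)  # bottom of the list: breadth-first order
        else:
            pending.extendleft(reversed(new_links))  # top of the list: depth-first order
    return order


# Hypothetical link graph used only for this illustration.
LINKS = {"A": ["B", "C"], "B": ["D"], "C": ["E"]}

bottom_order = crawl_order(LINKS, ["A"])
top_order = crawl_order(LINKS, ["A"], add_to_bottom=False)
```

With this graph, adding to the bottom visits A, B, C, D, E (every link from A before any link from B), while adding to the top visits A, B, D, C, E (following B down to D before returning to C).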
`
`
`
`Alternatively, Bharat expressly teaches performing a breadth-first crawl from
`
`a “start set” of URLs. Bharat (Ex. 1006) at 4:32-47. A PHOSITA would have been
`
`motivated to perform breadth-first crawling as taught by Bharat as an operable
`
`configuration of the recursive crawler taught by Quass because there are a finite
`
`number of configurations to try as a crawl order, namely breadth-first, depth-first,
`
`and hybrid, and thus structuring the crawl to be breadth-first would have been
`
`obvious to try for a PHOSITA. Smyth Decl. (Ex. 1003) at ¶60. Therefore, Quass in
`
`view of Bharat teaches or at least renders obvious applying breadth-first recursive
`
`crawling. Id.
`
`ii. Claim 10
`
`10[Preamble]. A method for building a deep web crawler, comprising:
`
`
`
`To the extent the preamble is limiting, Quass teaches a web crawler that
`
`accesses content “concealed” behind electronic forms, specifically referring to these
`
`concealed, dynamically generated pages as “deeper” information. Quass (Ex. 1005)
`
at ¶¶3, 12, 37; Smyth Decl. (Ex. 1003) at ¶41. The ’430 Patent defines the “deep

web” as the portion of the internet containing dynamically generated pages, see, e.g.,

’430 Patent (Ex. 1001) at 13:1-5, and thus a PHOSITA would have understood that

Quass teaches a method for building a deep web crawler. Smyth Decl. (Ex. 1003) at

¶41.
`
`10[a]. utilizing scout crawling rules to collect dynamic pages;
`
`Quass in view of Bharat teaches or at least renders obvious this limitation.
`
`Quass teaches that its crawling method begins with a step of “retrieving electronic
`
`data having electronic-form data” from a host database:
`
1. An automated method for obtaining targeted information from a
database accessible through an electronic form, said method
comprising the steps of:
retrieving electronic data having electronic-form data representative of
said electronic form therein from a database host;
`
Quass at Claim 1 (emphasis added); see also id. at ¶40, Claims 8, 9, and 15. Quass
`
`teaches its preferred embodiment is for the web, and these retrieved electronic data
`
`are HTML documents containing electronic forms to be filled out (i.e., dynamic
`
`pages) prior to allowing “deeper” information to be accessed. Id. at ¶40. A PHOSITA
`
`would have understood that this teaches scout crawling rules “collecting” dynamic
`
`pages. Smyth Decl. (Ex. 1003) at ¶45.
`
`
`
`To the extent it is argued that Quass does not expressly teach “rules” for scout
`
`crawling, Bharat teaches that the behavior of a crawler may be defined by a “walking
`
`rule,” determining which pages should and should not be walked (crawled). Bharat
`
`(Ex. 1006) at 1:39-45, 3:9-15, 3:31-39. A PHOSITA would have been motivated to
`
`achieve the scout crawling taught by Quass through a walking “rule” as taught by
`
`Bharat as a simple method of structuring the computer-implemented method in
`
`software, combining the prior art elements according to known methods to achieve
`
`the predictable result of a rules-based crawler. Smyth Decl. (Ex. 1003) at ¶¶57, 61.
`
`Doing so would have afforded a PHOSITA a reasonable expectation of success
`
`because computer-implemented methods operating according to rules were well-
`
`known and ubiquitous at the time of the ’430 Patent. Id.
`
`10[b]. utilizing an analyzer and extractor to determine underlying structure of
`queries;
`
`Quass teaches or at least renders obvious this limitation. The ’430 Patent
`
`describes that the claimed “queries” are synonymous with “questions” or “requests”
`
`to a database for information the database contains, presented as a dynamically
`
`generated page. ’430 Patent (Ex. 1001) at 4:54-62; Smyth Decl. (Ex. 1003) at ¶46.
`
`Therefore, a PHOSITA would have understood that the recited step of
`
`“determin[ing] underlying structure of queries” at least includes determining what
`
`types of information may be input to the dynamic page in order to appropriately
`
`configure instructions for requesting pages from a database. Id. This understanding
`
`is consistent with examples described in the specification. ’430 Patent (Ex. 1001) at
`
`13:48-55.
`
`Quass teaches this step at least through its “classifier” (i.e., the recited
`
`“analyzer”) that analyzes a collected page to determine forms it contains:
`
`One or more classifiers 166 then determine which forms should be
`filled out and how to do so. Classifiers 166 make their determination
`
`using each electronic form’s object model 165. Classifiers 166 may
`also employ the candidate XHTML document 163 and the candidate
`HTML document 161 in the determination process.
`
`Quass (Ex. 1005) at ¶43; Smyth Decl. (Ex. 1003) at ¶48. A PHOSITA would have
`
`understood that Quass’s classifier satisfies an “analyzer” based on its name – a classifier
`
`performs classification, which is a kind of analysis, and is thus an “analyzer.” Id.
`
`In a specific example, Quass details how a classifier may determine the underlying
`
`structure of forms such as those from the exemplary web page illustrated in Figure 3:
`
`
`FIG. 9 is an illustrative flowchart 250 of an example classifier
`illustrated as an appliance category classifier that determines whether
`or not a FormField object 224 represents a list of appliance categories.
`Step 251 matches the descriptive text for the For[m]Field's values
`against a predefined list of potential appliance categories 252. In the
`case of the category options 141 in FIG. 3, “Washers”, “Dryers”, and
`“Dishwashers” would match while “Refrigerators” would not.
`
`Quass (Ex. 1005) at ¶66 (emphasis added). Quass describes an analysis of buttons that
`
`may be pressed as an underlying structure of forms that the classifier may determine. Id.
`
`at ¶70 (emphasis added); Smyth Decl. (Ex. 1003) at ¶48.
`
`Additionally, Quass teaches a “form parser” (i.e., the recited “extractor”) that
`
`may extract information from the page to assist in query structure determination and
`
subsequent form filling, such as by performing Optical Character Recognition (OCR):
`
`A form parser can use additional components to help gather information
`that may prove useful to the form filling process. For example, an OCR
`(Optical Character Recognition) component might be employed to
`recognize fancy characters embedded in a graphic image and convert
`them into regular text strings. Another example, described in the next
`few paragraphs, is a separate parser that tries to find descriptions for
`form controls.
`Each form control is usually associated with descriptive text, icons or
`other graphics, etc. that suggest the form control's purpose. The
`association between form controls and their descriptions is often
`implicit, possibly based on how things are laid out in the form.
`
Quass (Ex. 1005) at ¶¶50-51; Smyth Decl. (Ex. 1003) at ¶49. A PHOSITA would have
`
`understood that Quass’s form parser satisfies an “extractor” based on its name – a form
`
`parser performs extraction of information by parsing the form, and is thus an “extractor.”
`
`Id. Quass teaches that its “extractor” may comprise an input text parser that finds
`
`descriptive text for a given input element:
`
The input text parser 204 uses an ordered list of rules to find descriptive
text for an <input> element. It returns the text from the first rule that
succeeds in finding text that is more than just blank spaces. If no rules
succeed, the input text parser indicates that the <input> element has no
descriptive text.
`
`Id. at ¶54. Thus, Quass teaches utilizing an analyzer (classifier) and extractor (form
`
`parser) to determine underlying structure of queries. Smyth Decl. (Ex. 1003) at ¶¶46-49.
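
The ordered-rule behavior quoted above (return the text from the first rule that finds more than blank spaces, otherwise report no descriptive text) can be sketched in Python as follows. This is an illustrative sketch only and is not Quass’s implementation. The rule functions, the dictionary representation of a form control, and the field names `label`, `preceding_text`, and `title` are assumptions made for this example.

```python
def from_label(control):
    """Hypothetical rule: text of an associated label."""
    return control.get("label")


def from_preceding_text(control):
    """Hypothetical rule: text appearing just before the control."""
    return control.get("preceding_text")


def from_title(control):
    """Hypothetical rule: the control's title attribute."""
    return control.get("title")


# Rules are tried in this order; the order itself is an assumption.
RULES = [from_label, from_preceding_text, from_title]


def find_descriptive_text(control, rules):
    """Return text from the first rule that yields more than blank spaces, else None."""
    for rule in rules:
        text = rule(control)
        if text is not None and text.strip():
            return text
    return None  # no rule succeeded: the control has no descriptive text


# Hypothetical form control: the label is only blank spaces, so the second rule wins.
control = {"label": "   ", "preceding_text": "Zip code", "title": "zip"}
description = find_descriptive_text(control, RULES)
no_description = find_descriptive_text({"label": "  "}, RULES)
```

Here the first rule returns only blank spaces and is skipped, so the text from the second rule is returned. A control for which no rule finds text yields `None`.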
`
`10[c]. generating instructions for a harvester, wherein the harvester provides
`requests to a server and collects available pages from the server.
`
`Quass in view of Bharat teaches or at least renders obvious this limitation.
`
`Quass teaches that the classifier’s decisions output instructions used by software (a
`
`harvester) that collectively includes: i) a “form filler” for generating requests to a
`
`server, and ii) software for collecting available pages from the server. Specifically,
`
`Quass teaches a form filler 168 that generates “requests” using the resulting structure
`
`decisions of the classifier in order to access content behind pages requiring input.
`
`Quass (Ex. 1005) at ¶44. The “decisions” of the classifier constitute instructions provided
`
`to the harvester (form filler), used to fill out forms in order to collect pages from the
`
`server. Smyth Decl. (Ex. 1003) at ¶50. Thus, Quass teaches software instructions for
`
`electronically populating forms on a network database (a “server” of dynamic content) in
`
order to access information concealed by the electronic form. Quass (Ex. 1005) at ¶3; see also id. at ¶12;
`
`Smyth Decl. (Ex. 1003) at ¶50.
`
`Quass teaches that its classifier may determine the underlying structure of forms
`
`to generate instructions for populating the forms, such as whether it is appropriate to
`
`select one particular value, spin through all values (in a plurality of requests), or change
`
`nothing:
`
`A classifier might choose from multiple classifications. For example, a
`classifier might classify a FormField object 224 as one of: (1) spin
`through all values; (2) choose one particular value; (3) don't change
`anything.
`
`Quass (Ex. 1005) at ¶68, see also id. at claim 5. Second, Quass further teaches storing
`
`retrieved pages, which constitutes “collecting” them. Id. at ¶33, see also id. at claim 1;
`
`Smyth Decl. (Ex. 1003) at ¶50. While Quass does not expressly give a name to the portion
`
`of software performing the steps of page retrieval, a PHOSITA would have understood
`
`that a portion of the software performs the steps of collecting available pages from the
`
`server. This portion of software, along with the form filler, would be understood by a
`
`PHOSITA to constitute the claimed “harvester.” Smyth Decl. (Ex. 1003) at ¶50.
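
As a hedged illustration of how the three classifications quoted above could drive request generation, the following Python sketch builds one request per combination of field values. It is not taken from Quass. The constants, the dictionary layout of a form field, the field names, and the function `generate_requests` are all hypothetical.

```python
from itertools import product

# Hypothetical labels for the three classifications quoted above.
SPIN_ALL = "spin through all values"
CHOOSE_ONE = "choose one particular value"
LEAVE_UNCHANGED = "don't change anything"


def generate_requests(fields):
    """Build one request (a dict of field values) per combination of choices."""
    choices = []
    for field in fields:
        if field["classification"] == SPIN_ALL:
            values = list(field["options"])  # one request per available value
        elif field["classification"] == CHOOSE_ONE:
            values = [field["chosen"]]  # a single selected value
        else:
            values = [field["default"]]  # keep the form's own default
        choices.append([(field["name"], value) for value in values])
    return [dict(combo) for combo in product(*choices)]


# Hypothetical form with three fields, used only for this illustration.
FORM_FIELDS = [
    {"name": "category", "classification": SPIN_ALL,
     "options": ["Washers", "Dryers", "Dishwashers"]},
    {"name": "zip", "classification": CHOOSE_ONE, "chosen": "00000"},
    {"name": "sort", "classification": LEAVE_UNCHANGED, "default": "price"},
]

requests = generate_requests(FORM_FIELDS)
```

Under these assumptions the form yields three requests, one for each category value, each carrying the single chosen zip value and the unchanged default sort value.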
`
`
`
`In the alternative, generating instructions for the claimed “harvester” would have
`
`been obvious to a PHOSITA over Quass based on a fundamental understanding that
`
`computer-implemented methods operate on instructions. Id. at ¶51. A PHOSITA would
`
`have understood that a primary purpose of methods taught by Quass is to create requests
`
`specially tailored to access all of the information hidden behind search forms, which
`
`would have been best accomplished by generating instructions for a harvester to collect
`
the available pages from the server. Id. at ¶¶50-51; Quass (Ex. 1005) at ¶12. Specifically,
`
`since Quass directly teaches an analyzer and extractor determining the underlying
`
`structure of queries (information which was not known prior to beginning the crawl),
`
`a PHOSITA would have found the natural next step to be using the determined
`
`underlying structure information to generate instructions for a harvester in order to
`
`accomplish Quass’s purpose of accessing information hidden behind the analyzed
`
`form. Id. Therefore, a PHOSITA would have understood this would have been
`
`accomplished using only fundamental software development and programming
`
`languages used by methods and systems taught by Quass otherwise, which would
`
`have been well within the skill of a PHOSITA to accomplish. Id.
`
`To the extent it is argued that Quass does not expressly teach “instructions”
`
`for a harvester, Bharat teaches that the behavior of a crawler may be defined by
`
`rules, as discussed above. See supra Section IV.A.ii, claim 10(a). A PHOSITA
`
`would have understood that a “rule” is a form of instructions, as shown by Bharat’s
`
`express use of the term “instructions” in its claims. Smyth Decl. (Ex. 1003) at ¶58;
`
`Bharat at 12:55-13:6. A PHOSITA would have been motivated to configure the
`
`harvester taught by Quass using “instructions” as taught by Bharat as a simple
`
`method of structuring the computer-implemented method in software. A PHOSITA
`
`would have a reasonable expectation of success in doing so because computer-
`
`implemented methods operating according to instructions were well-known and
`
`ubiquitous at the time of the ’430 Patent, and are expressly discussed as being
`
utilized in the invention of Quass. Quass (Ex. 1005) at ¶¶23-24, 47, Figs. 5, 6; Smyth
`
`Decl. (Ex. 1003) at ¶61.
`
`iii. Claim 12
`12. The method of claim 10, wherein the scout crawling rules are divided into rules
`dealing with static pages and rules dealing with dynamic pages.
`
`Quass in view of Bharat teaches or at least renders obvious this limitation.
`
`The ’430 Patent describes these “divided” rules as merely treating static pages
`
`and dynamic pages differently, directly storing static content while analyzing and
`
`harvesting dynamic content. ’430 Patent (Ex. 1001) at 13:56-64. The ’430 Patent
`
`does not support a narrower understanding of “divided” rules in which disjointed
`
`software modules are applied to each of the two types of pages. Smyth Decl. (Ex.
`
`1003) at ¶52.
`
`
`
`Quass teaches or at least renders obvious treating static pages and dynamic
`
`pages differently, with simple storage of static pages and more sophisticated
`
`analysis, parsing, and request submission for dynamic pages and, thus, teaches this
`
`limitation. See supra Section IV.A at claims 1, 10; Smyth Decl. (Ex. 1003) at ¶53. A
`
PHOSITA would have understood that a module programmed to treat static and

dynamic pages differently would perform different steps for each, requiring different

rules to be applied to each of the static and dynamic pages. Smyth Decl. (Ex. 1003)
`
`at ¶53. Therefore, a PHOSITA would have been motivated to use rules divided for
`
`static and dynamic pages. Id.
`
`Further, as previously discussed, to the extent it is argued that Quass does not
`
`
`
`expressly teach “rules,” Bharat teaches computer-implemented methods operating
`
`according to “rules,” and a PHOSITA would have been motivated to configure the
`
`computer-implemented method of Quass to operate according to scout crawling “rules.”
`
`See supra Section IV.A.ii, claim 10(a). Therefore, Quass in view of Bharat teaches or at
`
`least renders obvious scout crawling rules divided into rules dealing with static pages and
`
rules dealing with dynamic pages. Smyth Decl. (Ex. 1003) at ¶¶52-53.
`
`iv. Claim 13
`13. The method of claim 12, wherein a plurality of questions is selected to cover
`all possible patterns of the dynamic pages produced by a server, to allow the
`analyzer and the harvester to create exhaustive enumerations of questions that
`generate all dynamic pages that the server can produce.
`
`The ’430 Patent describes that the claimed “questions” are synonymous with
`
`“queries” or “requests” to a database for information the database contains. See
`
`supra Section IV.A.ii, claim 10(b). Quass in view of Bharat teaches or at least
`
`renders obvious this limitation.
`
`
`
`Quass describes a pluralit
