Exhibit A

Case 6:20-cv-00592-ADA Document 1-1 Filed 06/29/20 Page 2 of 17

US007689617B2

(12) United States Patent
     Parikh

(10) Patent No.: US 7,689,617 B2
(45) Date of Patent: Mar. 30, 2010

(54) DYNAMIC LEARNING FOR NAVIGATION SYSTEMS

(76) Inventor: Prashant Parikh, 254 E. 68th St., Apartment 21D, New York, NY (US) 10021

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 305 days.

(21) Appl. No.: 11/361,726

(22) Filed: Feb. 24, 2006

(65) Prior Publication Data

     US 2006/0200442 A1     Sep. 7, 2006

     Related U.S. Application Data

(60) Provisional application No. 60/656,631, filed on Feb. 25, 2005.

(51) Int. Cl.
     G06F 7/00 (2006.01)
     G06F 17/30 (2006.01)

(52) U.S. Cl. ........ 707/737; 707/749; 707/756

(58) Field of Classification Search ........ None
     See application file for complete search history.

(56) References Cited

     U.S. PATENT DOCUMENTS

     2005/0055344 A1   3/2005   Liu et al. ........ 707/3

     OTHER PUBLICATIONS

Belew, "Adaptive Information Retrieval: Using a connectionist representation to retrieve and learn about documents", Proceedings of the 12th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 11-20, ACM, 1989.*

* cited by examiner

Primary Examiner: Neveen Abel Jalil
Assistant Examiner: Michael J Hicks
(74) Attorney, Agent, or Firm: Weitzman Law Offices, LLC

(57) ABSTRACT

A method performed in a system involves, at a node within the system, receiving an input from a user, determining that the input contains an unknown word, presenting at least one response to the user, and based upon at least one additional input from the user, learning one or more associations for the unknown word.

20 Claims, 7 Drawing Sheets
`
[Representative drawing (FIG. 5 flowchart): Accept query from the user (500) → Break the query into individual words and discard stop words (505) → Are there any keywords? (510). If yes: Fetch documents for each keyword in the query (530). If no: Are there any learned words? (515). If yes: Fetch keywords associated with each learned word (535) → Fetch documents using the associated keywords (540). If no: Allow the user to navigate the graph and select desired document (520). Then: Rank the documents and offer to user for selection (545) → Is the user satisfied? If yes: Store the new associations and scoring information (550).]
`
U.S. Patent     Mar. 30, 2010     Sheet 1 of 7     US 7,689,617 B2

[FIG. 1: drawing of the various devices usable with the invention; legible reference numerals include 10, 50, and 60 ("Other input"). The drawing itself is not reproducible as text.]
`
`
`
U.S. Patent     Mar. 30, 2010     Sheet 2 of 7     US 7,689,617 B2

[FIG. 2 flowchart: Associate keywords with documents based on available data on the device (200) → Create data structures based on the associations to facilitate search (210).]

[FIG. 3 flowchart: Accept query from the user (300) → Select documents using words in user query (310) → Offer the ranked list to user for selection (320) → Get user response and perform learning and scoring if required (330).]
`
`
`
U.S. Patent     Mar. 30, 2010     Sheet 3 of 7     US 7,689,617 B2

[FIG. 4 flowchart: Create a list of documents on the device (400) → For each document create a list of unique words belonging to it (410) → Discard all stop words from the list of unique words (420) → Create an association of the document with each remaining keyword (430) → Store the information for future use (440).]
`
`
`
U.S. Patent     Mar. 30, 2010     Sheet 4 of 7     US 7,689,617 B2

[FIG. 5 flowchart: Accept query from the user → Break the query into individual words and discard stop words → Are there any keywords? (510). If yes: Fetch documents for each keyword in the query (530). If no: Are there any learned words? If yes: Fetch keywords associated with each learned word → Fetch documents using the associated keywords (540). If no: Allow the user to navigate the graph and select desired document. Then: Rank the documents and offer to user for selection → Is the user satisfied? If yes: Store the new associations and scoring information.]
`
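The FIG. 5 query flow can be sketched in code. The following Python sketch is illustrative only: the stop-word list, the toy inverted index and learned-word table, and all function names are assumptions rather than the patented implementation, and the ranking step (545) is left as a placeholder ordering.

```python
# Assumed toy data, mirroring the patent's d1..d5 example.
STOP_WORDS = {"a", "an", "and", "the", "for"}
INDEX = {                                   # keyword -> documents
    "apple": {"d1", "d2", "d3", "d5"},
    "cider": {"d1"},
    "orange": {"d1", "d3"},
    "desk": {"d4"},
}
LEARNED = {"fruit": {"apple", "cider", "orange"}}  # learned word -> keywords

def handle_query(query):
    """Follow the FIG. 5 branches; None signals fallback to navigation (520)."""
    words = [w for w in query.lower().split() if w not in STOP_WORDS]  # 505
    keywords = [w for w in words if w in INDEX]                        # 510
    learned = [w for w in words if w in LEARNED]                       # 515
    docs = set()
    if keywords:                            # 530: fetch documents per keyword
        for kw in keywords:
            docs |= INDEX[kw]
    elif learned:                           # 535/540: go through learned words
        for lw in learned:
            for kw in LEARNED[lw]:
                docs |= INDEX.get(kw, set())
    else:                                   # 520: let the user navigate
        return None
    return sorted(docs)                     # 545: rank (placeholder order)
```

A satisfied selection would then feed step 550, storing new associations and scores.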
`
`
U.S. Patent     Mar. 30, 2010     Sheet 5 of 7     US 7,689,617 B2

[FIG. 6 flowchart: Accept query from the user (600) → Break the query into individual words and discard stop words (605) → Are there any keywords? If yes: Fetch documents for each keyword in the query. If no: Are there any learned words? If yes: Fetch keywords associated with each learned word. If no: Allow the user to navigate the graph and select desired document. Then: Rank the documents and offer to user for selection → Is the user satisfied? If yes: Store the new associations and scoring information (645).]
`
`
`
U.S. Patent     Mar. 30, 2010     Sheet 6 of 7     US 7,689,617 B2

[FIG. 7 flowchart: Fetch the keywords associated with the learned word (700) → Discard keywords having frequency of occurrence across the documents higher than (first) threshold value (710) → Select only those documents having more associated keywords than the (second) threshold percentage (720).]

[FIG. 8 flowchart: Fetch the keywords associated with the learned word (800) → Discard keywords having frequency of occurrence across the documents higher than threshold value → Identify the keywords that have co-occurrence pattern similar to learned word using standard matrix calculation (820) → Select documents associated with the identified keywords (830).]
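The co-occurrence step of FIG. 8 can be sketched in code. The patent says only that a "standard matrix calculation" is used; treating each word's document membership as a 0/1 vector and comparing vectors by cosine similarity is one common such calculation, adopted here purely as an assumption. All function names and the toy data in the usage are likewise illustrative.

```python
from math import sqrt

def occurrence_vector(word, index, all_docs):
    """0/1 vector marking which documents contain the word."""
    return [1 if d in index.get(word, set()) else 0 for d in all_docs]

def cosine_similarity(word_a, word_b, index, all_docs):
    """Cosine similarity of two words' document-occurrence vectors;
    a hypothetical stand-in for FIG. 8's 'standard matrix calculation'."""
    va = occurrence_vector(word_a, index, all_docs)
    vb = occurrence_vector(word_b, index, all_docs)
    dot = sum(x * y for x, y in zip(va, vb))
    na, nb = sqrt(sum(va)), sqrt(sum(vb))
    return dot / (na * nb) if na and nb else 0.0
```

Keywords scoring above a chosen similarity cutoff against the learned word's associated keywords would then drive document selection (830).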
`
`
`
U.S. Patent     Mar. 30, 2010     Sheet 7 of 7     US 7,689,617 B2

[FIG. 9 flowchart: Fetch the score of every keyword from the user query with selected documents from the stored data (900) → Total the score for each document (910) → Sort the documents in the descending order of their scores (920).]

[FIG. 10 flowchart: Fetch the score of every keyword associated with the learned word from the stored data → For every selected document, total the score of all the keywords used to select that document (1010) → Divide the total score for the document by the number of keywords used to select that document to get the average score (1020) → Sort the documents in the descending order of their scores (1030).]
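The two ranking passes of FIG. 9 and FIG. 10 can be sketched together. This is a hypothetical illustration, not the patent's code: the per-document score table shape (document → {keyword: score}) and the function names are assumptions.

```python
def rank_by_total(doc_keyword_scores):
    """FIG. 9: total each document's keyword scores (910),
    then sort documents in descending order of total (920)."""
    totals = {d: sum(scores.values()) for d, scores in doc_keyword_scores.items()}
    return sorted(totals, key=lambda d: -totals[d])

def rank_by_average(doc_keyword_scores):
    """FIG. 10: total the scores of the keywords used to select each
    document (1010), divide by the keyword count to get the average
    (1020), then sort in descending order (1030)."""
    avgs = {d: sum(s.values()) / len(s) for d, s in doc_keyword_scores.items()}
    return sorted(avgs, key=lambda d: -avgs[d])
```

Note the two orderings can differ: a document selected by one high-scoring keyword can out-rank, on average, a document selected by several modest ones.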
`
`
`
DYNAMIC LEARNING FOR NAVIGATION SYSTEMS

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims benefit of priority, pursuant to 35 U.S.C. § 119(e), of U.S. Provisional Application Ser. No. 60/656,631 filed Feb. 25, 2005, incorporated herein by reference in its entirety.

FIELD OF THE INVENTION

The present invention relates to information processing and, more particularly, to learning in the context of search and navigation systems.

NOTICE OF COPYRIGHT RIGHTS

A portion of the disclosure of this patent document contains material that is protected by copyright. The copyright owner has no objection to the facsimile reproduction of the patent document or the patent disclosure as it appears in the Patent and Trademark Office file or records, but otherwise reserves all copyright rights whatsoever.

COMPUTER PROGRAM LISTING APPENDIX

This application includes, and incorporates herein by reference as if fully set forth herein, program code submitted herewith (and in the parent application) on two (2) CD-R compact disks (an original and a duplicate) that are direct copies of those created on Feb. 25, 2005, submitted with the parent application and respectively labeled "Copy 1" and "Copy 2", and respectively containing identical ASCII text files of 105 kilobytes named "Computer Program Listing Appendix 4428-4003.txt".

BACKGROUND OF THE INVENTION

In modern life, there are a number of situations involving search and navigation for relevant data. All of these share two characteristics: a repository of structured (e.g. relational databases), semi-structured (e.g. a website), or unstructured (a text document like the Declaration of Independence or this patent application) data; and a device through which the user conducts the search or navigation (e.g. a personal computer or a multifunctional handheld unit like a PDA or cellular phone). The data may be stored on the device itself (e.g. on a computer's hard disk or in a cellular phone's memory) or may reside remotely on one or more servers to which the device is connected in one of several possible ways during the search or navigation.

Familiar examples of such systems are traditional search engines for finding information on the World Wide Web or in specialized databases (e.g. EDGAR, the U.S. Patent Office's PATFT and AppFT databases containing patent documents, or MEDLINE, the database of medical information), or the many search systems available for information residing on personal computers; navigation systems also abound (e.g. so-called IVRS or Interactive Voice Response Systems, which are ubiquitous today and involve tracing a path down a menu tree one step at a time, or more advanced "dialogue systems" that involve more sophisticated speech recognition).

All these systems share a common limitation: the search or navigation is based entirely on keywords. If a user is searching for a "vintage car" then a document containing "antique automobile" will be missed. This is true not just of traditional search systems but also of navigation systems like IVRS and speech recognition systems, which require the choosing of specific options (e.g. "press 3 for domestic reservations" in an airline reservation system) or the utterance of specific keywords (e.g. "domestic reservations") to advance the user's position in the navigation.

U.S. Published Patent Application No. 20040098381, entitled "Navigation in a Hierarchical Structured Transaction Processing System" to Parikh et al., disclosed a way of navigating through a set of interconnected nodes having a distinguished "start node" or "root node", where the nodes are connected hierarchically or as a directed graph and each node is, contains, or is associated with a "document" or a "verbal description", through generation and use of associations among words from the documents or verbal descriptions. However, that method has some limitations in that it may not be possible to generate all useful associations in advance, and the disclosed techniques for learning meanings for words did not handle certain cases.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a simplified representation of the various devices that can be used in connection with the present invention.

FIG. 2 illustrates in simplified form a flowchart relating to the generation of data structures in the system.

FIG. 3 illustrates in simplified form a flowchart relating to a process leading to the response of the system to a user query.

FIG. 4 illustrates in simplified form a flowchart of a process for the generation of data relating to the association of documents and keywords within the system.

FIG. 5 illustrates in simplified form a flowchart for a learning process involving keywords.

FIG. 6 illustrates in simplified form a flowchart for a learning process involving documents.

FIG. 7 illustrates in simplified form a flowchart for the selection of documents using a learned word.

FIG. 8 illustrates in simplified form a flowchart of an alternate process for selecting documents using a learned word.

FIG. 9 illustrates in simplified form a flowchart of a process for the ranking of documents using keyword scores.

FIG. 10 illustrates in simplified form a flowchart of a process for the ranking of documents using learned word scores.

DETAILED DESCRIPTION

The present claimed invention improves upon the invention of Parikh et al. and can be implemented in the same contexts. In addition, for ease of understanding, the terminology of that application is generally maintained unless noted to the contrary, and the text of U.S. Published Patent Application No. 20040098381, entitled "Navigation in a Hierarchical Structured Transaction Processing System" to Parikh et al., is incorporated herein by reference as if fully set forth herein.

In overview, the present approach builds upon that approach by generating associations between general words (in particular, words that are not keywords and are not synonyms given in a "thesaurus" generated in advance, hereinafter called "learned words" or "unknown words" depending on the context) and keywords, and using these associations to direct the user to relevant nodes of the graph that contain information or instructions the user may be seeking. However, in the present case, when a user types an unknown word (i.e. a word that is neither a keyword nor an existing synonym, either generated in advance or through the learning method itself), the user is asked in one of several different ways to
`
`
`
trace down a path from the root node of the graph (or from the level of the root node's children) to a node that contains information relevant to the user. Then one of two actions is taken: i) either the unknown word is associated with that node itself (or, equivalently, with the document attached to that node), or ii) more powerfully, the unknown word is associated with all the keywords in the document. Thereafter, as before, the next time a user inputs the same hitherto unknown (but now learned) word to the system, the system looks up the associated node(s) or keywords and returns either the nodes associated with the learned word (if the first action above is employed) or the nodes/documents that contain the associated keywords (if the second action above is taken) to the user. As before, the process of learning continues, so a learned word can acquire additional associated nodes or keywords as users continue to use the system.

The first method of learning is applicable in situations where the documents contain very little or no information identifiable by the system and thus keywords are not available for association with the unknown word. A typical example would be an archive of images or photographs with no comments or meaningful names. In such a circumstance the unknown word is associated with the document itself. Thus, this approach allows the system to learn even in the absence of keywords. In addition, this approach also optionally can provide for the user to train the system to select certain documents with specific words without having to rename the documents or having to add comments to them.

The second approach ignores the issue of exactly how many associated keywords should be considered in the selection of the results conveyed to the user and thus allows for various options, including providing some suitable function of the selected results to the user rather than the results themselves.

Optionally, a new approach to ranking can also be incorporated. As described in more detail below, this is done by keeping an appropriate type of score against each associated keyword and using this score in an appropriate way to determine the rank of a resulting node/document.

The present invention can be used with a variety of electronic devices such as the pager (10), Personal Digital Assistant ("PDA") (30), conventional or IP telephone (20), cellular telephone (80), or computer (with commonly associated peripherals and input devices, such as keyboard (40), microphone (50), video cam (70), or other input devices (60)), as shown in FIG. 1. The minimum requirements for any such device are some means for accepting input from a user (as text or verbally), one or more processor(s) that execute stored program instructions to process the input, storage for the data and the program instructions, and a display or other output device of some sort to make output visible or available to the user. Representative, non-exhaustive example input devices can include, but are not limited to, a keyboard (40) or telephone keypad (90), as well as a handwriting or voice recognition system, a touchpad, a pointing device like a mouse, joystick, trackball or multi-directional pivoting switch, or other analogous or related input devices (individually and collectively represented for simplicity in FIG. 1 as "Other Input" (60)). The storage, whether contained in a computer, phone or other device mentioned above, preferably includes non-volatile memory, and can also include volatile semiconductor-based memory, electromagnetic media, optical media or other types of rewriteable storage used with computer devices. If a display is used, the display may be small and capable of displaying only text, or capable of displaying monochrome or color images in addition to text. If another output device is used, like a text-to-speech converter, appropriate implementing equipment will be included. Although described, for purposes of clarity, with reference to keyboard-type entry, it is to be understood that the present invention is independent of the particular mode of, or device used for, user input.

At the outset, it should be noted that, for the purposes of this description, a "document" as used herein is intended to be a very general term covering one or more characters, whether alone or in conjunction with numerals, pictures or other items. A document's length can vary from a single "word" to any number of words, and it can contain many types of data other than words (e.g. numbers, images, sounds etc.). Thus, ordinary documents such as pages of text are documents, but so are spreadsheets, image files, sound files, emails, SMS text messages etc. Not only these, but also text in the form of, or that refers to, executable instructions/actions/transactions, executable by the device being used or by a remote device connected with the device being used, like a "menu", can also constitute a document in this sense. (This latter type of document becomes relevant when a user is looking for a relevant command to execute (e.g. changing a setting on a cellular phone, transferring a monetary amount from one account to another on a website or, more prosaically, deleting/moving/renaming/copying a file, sending an SMS message, calling up a contact etc.) by typing in a suitable query or selecting a specific option.)

Often, when referring to a document, the node in the graph to which it is attached will also be referred to. This is because while, in some applications involving a more traditional kind of search, it is the documents themselves that are relevant, in other applications (e.g. an interactive voice response unit or "IVR") it may be the node that is relevant, since getting to a relevant node may then trigger an entire sequence of further actions or transactions.

Finally, documents may or may not have "names". For example, a file on a computer hard disk will typically have a name, whereas a prompt in an IVR typically does not. These names, when they exist, may simply be considered to be part of the documents.

As noted above, a "word," for the purposes of this description, can be considered to be more than a string of alphabetic characters; it may include numeric and other symbols as well. Broadly, the invention also contemplates character strings that include any discrete collection of characters, symbols, or stroke-based pictographs or ideograms, for example those used in languages like Chinese, Korean and Japanese, which thus can benefit from use of the present invention. Thus, although for simplicity the term "word" is used in the following discussion, it should be understood to encompass any discrete collection of characters, symbols or other stroke-based representations of communicative concepts, thoughts or ideas. The present techniques, although described with reference to English, are therefore independent of any particular language. They can be used for phonetic, pictographic or ideographic languages when the characters, pictograms or ideograms used therein (or "stroke" components thereof) are considered "words" and thereby are intended to be encompassed by the terms "text" and "textual." Similarly, words also encompass (sound data) wave files corresponding to spoken words in any language. That is, when indexes are mentioned later, the association between two words, or between a word and a node or document, could involve association with (sound data) wave files rather than actual text. Likewise, for simplicity, in the following examples the terms "typing" or "typed" are often used to describe data entry. However, those terms should be broadly read to encompass any and all methods of data entry, whether involving entry through use of a keyboard, a pointing or selection device, a stylus or other handwriting
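The two learning actions described earlier (associating an unknown word either directly with the selected node/document, or with all the keywords in that document) can be sketched in Python. This is a minimal illustrative sketch only: the data shapes (plain dicts of sets) and all names are assumptions, not the patent's implementation.

```python
def learn(unknown_word, selected_doc, doc_keywords, learned, node_assoc):
    """Record an association for an unknown word after the user has
    navigated to (selected) a document."""
    keywords = doc_keywords.get(selected_doc, set())
    if keywords:
        # Action ii): associate the word with all keywords in the document,
        # so future queries can reach any document sharing those keywords.
        learned.setdefault(unknown_word, set()).update(keywords)
    else:
        # Action i): no keywords available (e.g. an unlabeled image archive),
        # so associate the word with the node/document itself.
        node_assoc.setdefault(unknown_word, set()).add(selected_doc)
```

Because `update` and `add` accumulate, repeated use lets a learned word acquire additional associated keywords or nodes over time, as the text describes.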
`
`
`
recognition system, as well as sound entries via a microphone, etc. They are not in any way intended to be limited only to methods that make use of a typewriter-like keyboard.

Examples of devices that can use and benefit from incorporation of the invention therein range from large computer networks, where an implementation of the invention may be part of or an application on the network, to small portable handheld devices of more limited or specialized function such as cell phones, text messaging devices and pagers. Implementations incorporating the invention can be used to assist users in interacting with large databases.

In further overview, as shown in the simplified flowchart of FIG. 2, given documents organized by attachment to the nodes in a graph, new or unknown or learned words (and also, optionally, keywords and their synonyms already pre-existing in the system) are associated with keywords derived from the documents, or associated directly with nodes in the graph (200). Data structures are then created based on these associations to facilitate future searches (210).

Moreover, upon a user inputting into or querying the system with such words, the associated keywords are used in multiple possible ways to determine the results (i.e. documents/nodes) returned to the user, ranked in a way determined by a scoring system associated with the associated keywords. One non-limiting embodiment of this approach is depicted in the simplified flowchart of FIG. 3. The device accepts a query from the user (300) and selects documents using words from the query (310). A ranked list is then offered to the user (320) and, based on the response of the user, new associations can be learned (330).

The various approaches for doing the above are now described by way of example.

Assume that the documents in the collection (i.e. the entire set of documents under consideration) are organized in the form of a graph of nodes, with one or more documents attached to each node. (For simplicity, it will be assumed that there is just one document attached to each node; however, it is easy to extend the techniques when multiple documents are connected to a single node, although one can usually separately attach each document to a single node by adding more nodes to the graph.)

One way of thinking of a document that contains one or more words is as a bag or multiset of words. A bag or multiset is like an ordinary set in mathematics, a collection, except that it can contain multiple occurrences of the same element. For example, {book, cape, pencil, book} is a bag containing four words, of which the word "book" appears twice. The order of occurrence of elements in a bag does not matter, and the bag could equally be written as {book, book, pencil, cape}. Also, any bag can be converted to a set just by dropping multiple occurrences of the same element. Thus, the example bag above, when converted to a set, would be {book, cape, pencil}. To create the bag or multiset, the contents of a document, with the exception of numbers, which are a special case, are stripped of all internal structure (e.g. syntactic structure, punctuation etc.), including all non-lexical items like images, sounds etc. (A special case is when wave files corresponding to useful information are retained so that the result offered to a user might be speech corresponding to the wave file derived from a speech synthesizer.) The resulting stripped document would be a bag or multiset of words as described above, which may also include numbers and in which some words may occur multiple times. For a user who has a device with a number of stored documents, or a service offering its stored databases for search, or a menu tree offered by a company through its phone system, each pertinent document is similarly stripped down to form bags, and the mathematical union of these bags can be taken to form a larger bag.

As a side note, optionally, a certain class of words, typically called "stop words," are removed from such document-derived bags. Stop words are words like "the", "of", "in" etc., and are removable because they usually are not very informative about the content of the document. A non-limiting example flowchart for handling stop words is illustrated in FIG. 4. A list of documents on the device is prepared (400) and a list of unique words for each document created (410). Stop words are then removed from this list (420) and an association is made between each of the remaining keywords and the document (430). These associations can be stored for future use (440).

Stop words, if removed, can be removed from the bags either before or after a mathematical union of the bags is made, as the end result is the same. Typically stop words are identified in a list which can be used for the exclusion process. Since the stop word removal process is well known, it is not described herein. In addition, in some implementations where a stop word list is used, the list may be editable so that additional words can be defined as "stop words." For example, otherwise non-trivial words that are trivial in the particular context because they occur too often in that context (e.g. words like "shares" in stock-related government filings) may be added to the list of stop words either in an automated or manual way based on their high frequency.

By way of simplified example, if the user has just two documents to consider, "d1" made up of "an apple, apple cider and an orange" and "d2" made up of "a paper apple", then the corresponding bags are {apple, cider, apple, orange} and {paper, apple}. Their union is the larger bag {apple, cider, apple, orange, paper, apple}, and a set for the bag would be {apple, cider, orange, paper}.

Now consider a somewhat larger example as an illustration of how associations are formed and used in the invention.

Let there be five documents d1, d2, d3, d4, and d5 containing the following words (i.e. the sets rather than the bags are described):

d1: apple, cider, orange
d2: apple, paper
d3: apple, orange, table, chair
d4: table, chair, desk, furniture
d5: apple, strawberry, banana, raspberry

Assume further that the graph in which these documents are organized is particularly simple: there is a root node and five children nodes at the next level, to each of which one of the documents above is attached. Assume also that an inverted index associating each word in each document (i.e. the keywords) with the documents that contain the word has been formed. This would look as follows: apple: d1, d2, d3, d5; cider: d1; orange: d1, d3; and so on for the rest of the words.

With ordinary navigation or search by keywords, this would work in the usual way, i.e. if a user types the word "desk", the system would look up the inverted index, find that "desk" is associated just with d4, and then return d4 to the user. However, unlike conventional systems, systems implementing the techniques described herein can provide the user not with d4 itself but with some function of d4 (e.g. a parent or a child).

Now assume a user inputs the new or unknown word "fruit" into the system. (Note that the word "fruit" can be considered new because it does not belong to any document.) According to one approach, the user is taken to the root node and offered the next level of the graph, namely, the five nodes/documents
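The bag/set construction and the inverted index described above can be sketched as follows. This is an illustrative sketch, not the patent's code: the tokenizer, punctuation handling, and the short stop-word list are simplifying assumptions.

```python
from collections import Counter

# Assumed toy stop-word list; real lists are longer and may be editable.
STOP_WORDS = {"a", "an", "and", "the", "of", "in"}

def to_bag(text):
    """A document as a bag (multiset) of words, stop words removed."""
    words = (w.strip(".,;:") for w in text.lower().split())
    return Counter(w for w in words if w and w not in STOP_WORDS)

def build_inverted_index(docs):
    """Associate each remaining keyword with the documents containing it,
    as in FIG. 4 step (430); docs maps document name -> raw text."""
    index = {}
    for name, text in docs.items():
        for word in to_bag(text):  # the bag's keys form the document's set
            index.setdefault(word, set()).add(name)
    return index
```

Run on the d1/d2 example above, `to_bag` yields the bag {apple: 2, cider: 1, orange: 1} for d1, and the index maps "apple" to both documents, matching the text's worked example.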
`
`
`
above. Assume the user picks d1. Then the system creates a new entry for "fruit" as follows: fruit: apple, cider, orange, associating all the keywords in d1 with "fruit".

This is where the first set of options arises.

One optional approach is as follows. When gathering keywords from the document selected by the user after tracing down the path in the graph, the system gathers all the keywords (i.e. all the words except for possible stop words) from the document (or optionally just some select ones based on one or more of various exclusion criteria). For example, if the system does not collect all the keywords, then with one exclusion approach the system selects those keywords in the document based on one or more criteria of "importance" (e.g. high-frequency words, capitalized words, words appearing in any titles, or any of a number of such criteria singly or in combination). Here, for purposes of illustration, since the relevant files are small, all the keywords have been selected.

The next time the system receives user input with the same word "fruit", the system will look up the entry for "fruit", find the keywords associated with it (i.e. "apple", "cider", "orange", in this example), and return the documents that contain some or all of these keywords (or some function of these documents). Since "apple" is contained in d1, d2, d3, d5, "cider" is contained in just d1, and "orange" in d1 and d3, there is now a choice of how to select the documents to be returned from these listed documents.

At one extreme, only those documents are provided tha

If the user's query does not contain a keyword or a learned word, then the user is allowed to navigate to the desired document (520), the user is deemed "satisfied" by the navigation, and an association between the selected document and the unknown word in the user query is learned from that navigation, making the unknown word a learned word. This new association is stored by the system for future use (550).

It should be clear that each keyword associated with a new or learned word is associated with a set of documents that contain it, and that it is possible to return any subset of documents between the two extremes of intersection and union of these associated sets. In another example, if there had been a hundred keywords associated with a new word, then it would be possible in principle to return either only those documents corresponding to the intersection of all associated document sets (those that contain all hundred keywords), or the union of these associated document sets (those that contain at least one of the hundred keywords), or any sort of overlap in between, expressible by combinations of the union and intersection operations on the document sets (those that contain anywhere between 1 and 100 of the associated keywords, say at least 35 keywords, as one example).

One simple way of implementing this approach is to return those documents that contain some percentage ("p") of the associated keywords. Depending upon the implementation, this parameter "p" can be set in a variety of ways: it can be set by the user (where the user could alter it even for the same
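The percentage-"p" selection just described can be sketched in Python. This is a hypothetical sketch under the text's toy data; the function and variable names are assumptions, not the patent's implementation.

```python
def select_by_percentage(learned_word, learned, index, p):
    """Return documents containing at least p percent of the keywords
    associated with the learned word. p = 100 gives the intersection
    extreme; p near 0 approaches the union extreme."""
    keywords = learned.get(learned_word, set())
    if not keywords:
        return []
    # Candidate documents: union of the associated keywords' document sets.
    docs = set().union(*(index.get(k, set()) for k in keywords))
    out = []
    for d in sorted(docs):
        hits = sum(1 for k in keywords if d in index.get(k, set()))
        if 100.0 * hits / len(keywords) >= p:
            out.append(d)
    return out
```

With fruit associated to {apple, cider, orange} over the d1..d5 index, p = 100 returns only d1 (the one document containing all three keywords), while a very small p returns every document containing any of them.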