`EXHIBIT A
`
`
`
`Case 1:20-cv-23178-WPD Document 1-1 Entered on FLSD Docket 07/31/2020 Page 2 of 18
`
`(12) United States Patent
`Parikh
`
(54) AUTOMATIC DYNAMIC CONTEXTUAL
DATA ENTRY COMPLETION SYSTEM
`(76) Inventor: Prashant Parikh, New York, NY (US)
(*) Notice: Subject to any disclaimer, the term of this
patent is extended or adjusted under 35
U.S.C. 154(b) by 0 days.
(21) Appl. No.: 12/623,093
(22) Filed: Nov. 20, 2009
(65) Prior Publication Data
US 2010/0070855 A1 Mar. 18, 2010
`Related U.S. Application Data
`(63) Continuation of application No. 11/040,470, filed on
`Jan. 21, 2005, now Pat. No. 7,630,980.
(51) Int. Cl.
G06F 7/30 (2006.01)
(52) U.S. Cl. ........ 707/759; 707/763; 707/771; 707/780
(58) Field of Classification Search .................. 707/1, 3,
707/6, 104.1, 759, 769, 771, 763, 780
See application file for complete search history.
US007991784B2
(10) Patent No.: US 7,991,784 B2
(45) Date of Patent: Aug. 2, 2011

(56) References Cited
U.S. PATENT DOCUMENTS
5,805,911 A      9/1998   Miller
5,845,300 A     12/1998   Comer et al.
6,377,965 B1     4/2002   Hachamovitch et al.
6,529,187 B1     3/2003   Dickelman
6,564,213 B1     5/2003   Ortega et al.
6,751,603 B1     6/2004   Bauer et al.
6,801,190 B1    10/2004   Robinson et al.
6,810,272 B2    10/2004   Kraft et al.
6,820,075 B2 *  11/2004   Shanahan et al. ............. 715/205
6,829,607 B1    12/2004   Tafoya et al.
6,886,010 B2     4/2005   Kostoff
6,947,930 B2     9/2005   Anick et al.
7,043,700 B1     5/2006   Bertram et al.
7,103,534 B2     9/2006   Goodman
7,111,248 B2     9/2006   Mulvey et al.
RE39,326 E      10/2006   Comer et al.
7,139,756 B2    11/2006   Cooper et al.
7,185,271 B2 *   2/2007   Lee et al. ...................... 715/226
7,257,567 B2     8/2007   Toshima
7,266,554 B2     9/2007   Kayahara et al.
(Continued)
`OTHER PUBLICATIONS
“A New Dictionary of Kanji Usage”, Gakken Co., Ltd., 1982, pp.
433-435.
`(Continued)
`Primary Examiner — Tim T. Vo
`Assistant Examiner — Dangelino N Gortayo
`(74) Attorney, Agent, or Firm — Weitzman Law Offices,
`LLC
(57) ABSTRACT
A method, performed in a character entry system involves
computing contextual associations between multiple character
strings based upon occurrence of character strings relative
to each other in documents present in the system, wherein the
computing contextual associations involves i) identifying
pertinent documents present in the system, ii) creating a list of
character strings contained within documents in the system;
and iii) creating an interrelationship between the character
strings to contents of the system; in response to the user
inputting a specified threshold of individual characters,
identifying at least one selectable character string from among the
character strings used in creating the computed contextual
associations that can complete the incomplete input character
string in context; providing the identified at least one
selectable character string to a user for selection; and receiving, in
the system, the user's selection and completing the incomplete
input character string based upon the selection.
`1 Claim, 6 Drawing Sheets
`
`
`
Representative drawing (excerpt of FIG. 1). Elements shown: Suggested
Word Completions (170); Contextual Word Associations (150);
Completed Input (180).
`
`
`
`
`
`
`US 7,991,784 B2
`Page 2
`
`U.S. PATENT DOCUMENTS
7,386,438 B1 *   6/2008   Franz et al. ....................... 704/8
7,610,194 B2 *  10/2009   Bradford et al. ................ 704/10
7,644,102 B2 *   1/2010   Gaussier et al. ... 707/999.104
7,650,348 B2 *   1/2010   Lowles et al. .......... 707/999.101
7,657,423 B1 *   2/2010   Harik et al. ....................... 704/9
7,679,534 B2 *   3/2010   Kay et al. ........................ 341/22
`
`
`
`OTHER PUBLICATIONS
MIT's Magazine of Innovation Technology Review, Nov. 2004, pp. 4,
5, and 27.
`* cited by examiner
`
`
`
`
U.S. Patent    Aug. 2, 2011    Sheet 1 of 6    US 7,991,784 B2

FIG. 1 (top-level flowchart). Elements shown: PDA (device);
Documents (140); Contextual Word Associations (150);
Character Input (160); Suggested Completion(s) (170);
Completed Input (180).
`
`
`
`
U.S. Patent    Aug. 2, 2011    Sheet 2 of 6    US 7,991,784 B2

Offline or Online Computation
200  Create list of pertinent documents on the device.
205  Create list of unique words from documents.
210  (optionally) Remove stop words from word list.
215  For each document, count number of occurrences of each word
     in word list.
220  Store in a matrix of documents vs. words.
225  Calculate the similarity value for all possible pairs of
     documents using matrix.
230  Compare similarity values to the threshold value.
235  Discard document pairs whose similarity value falls below
     specified threshold value.
240  Form groups of documents, using remaining document pairs, such
     that similarity value of all possible pairs in each group is
     above the threshold value. Words within each similar document
     group are contextually related.
245  Create lists of unique words from each group of similar
     documents.

FIG. 2a
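
The offline steps 200 to 245 of FIG. 2a can be sketched in Python roughly as follows. This is an illustrative sketch only, not the implementation disclosed in the patent: the function names, the stop-word list, the regular-expression tokenizer, the use of the cosine coefficient, the 0.5 threshold, the greedy grouping strategy, and the third document "d3" are all assumptions introduced here for illustration.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Illustrative stop-word list only (an assumption, not taken from the patent).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "at"}

def bag_of_words(text, stop_words=STOP_WORDS):
    """Steps 205-215: tokenize, drop stop words, count occurrences."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if w not in stop_words)

def cosine(bag_a, bag_b):
    """Step 225: cosine of the angle between two word-count vectors."""
    dot = sum(count * bag_b[word] for word, count in bag_a.items())
    norm_a = math.sqrt(sum(c * c for c in bag_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bag_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def group_documents(bags, threshold):
    """Steps 230-240: keep pairs at or above the threshold, then greedily
    build groups in which every pair of members is similar."""
    names = sorted(bags)
    similar = {
        frozenset(pair)
        for pair in combinations(names, 2)
        if cosine(bags[pair[0]], bags[pair[1]]) >= threshold
    }
    groups = []
    for name in names:
        placed = False
        for group in groups:
            # A document may join several groups, so groups can overlap.
            if all(frozenset((name, other)) in similar for other in group):
                group.append(name)
                placed = True
        if not placed:
            groups.append([name])
    return groups

def word_lists(documents, threshold=0.5):
    """Step 245: one list of unique words per group of similar documents."""
    bags = {name: bag_of_words(text) for name, text in documents.items()}
    return [
        sorted(set().union(*(bags[name] for name in group)))
        for group in group_documents(bags, threshold)
    ]

documents = {
    "d1": "an apple, apple cider and an orange",
    "d2": "a paper apple",
    "d3": "quarterly finance summary",  # hypothetical extra document
}
print(word_lists(documents))
```

With the d1 and d2 texts from the description, the cosine value is 2/sqrt(12), about 0.58, so at the assumed threshold of 0.5 the two documents fall into one group and d3 forms a group of its own.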
`
`
`
`
U.S. Patent    Aug. 2, 2011    Sheet 3 of 6    US 7,991,784 B2

Online Computation
250  Accept characters from the user until the threshold number of
     required characters are entered.
255  Identify relevant document groups from entered characters.
260  Using the identified document groups, choose words that are
     contextually associated with and match the characters entered.
265  Offer the chosen contextually associated words to the user for
     selection to complete the entry.
270  The user accepts or rejects the offered contextually associated
     words and the process repeats with the beginning of the next
     word entry.

FIG. 2b
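
The online steps 250 to 260 of FIG. 2b might be sketched in Python as below. The function name `suggest`, the `min_chars` default, and in particular the ranking of word lists by overlap with previously typed words are assumptions made for illustration; the flowchart itself only says that relevant groups are identified from the entered characters.

```python
def suggest(word_lists, typed, context_words=(), min_chars=2):
    """Return candidate completions for `typed`, most relevant list first."""
    if len(typed) < min_chars:  # step 250: wait for the threshold
        return []
    prefix = typed.lower()
    context = {w.lower() for w in context_words}
    # Step 255: a word list is relevant if it holds a word with this prefix.
    relevant = [wl for wl in word_lists if any(w.startswith(prefix) for w in wl)]
    # Assumed ranking: prefer lists sharing more words with the typed context.
    relevant.sort(key=lambda wl: len(context & set(wl)), reverse=True)
    suggestions = []
    for wl in relevant:  # step 260: collect the matching words
        for w in wl:
            if w.startswith(prefix) and w not in suggestions:
                suggestions.append(w)
    return suggestions

# Hypothetical word lists, loosely following the finance/sugar example.
lists = [["spoon", "sugar", "tea"], ["finance", "report", "summary"]]
print(suggest(lists, "su", context_words=["finance"]))
print(suggest(lists, "su", context_words=["two", "spoon"]))
```

Under these assumptions the first call ranks "summary" ahead of "sugar" and the second call reverses that order, mirroring the finance and spoon example in the description.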
`
`
`
`
U.S. Patent    Aug. 2, 2011    Sheet 4 of 6    US 7,991,784 B2

FIG. 3: documents versus words matrix (reference numerals 302, 310,
312, 314, 316, 318).
FIG. 5: matrix of pairs of words (reference numerals 506, 508, 510,
512, 514, 516).
FIG. 6: word lists (604): book, chapter, summary, two, three, ...
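
For the documents versus words matrix of FIG. 3, the cell values implied by the two-document example in the detailed description (bags {apple, cider, apple, orange} for d1 and {paper, apple} for d2) can be reproduced with a short Python sketch. The helper name `build_matrix` and the alphabetical column order are assumptions; only the bag contents and the value 2 for "apple" in d1 come from the text.

```python
from collections import Counter

def build_matrix(bags):
    """Return (column words, {document: row of counts}) for the given bags."""
    words = sorted(set().union(*bags.values()))
    rows = {doc: [bag[word] for word in words] for doc, bag in bags.items()}
    return words, rows

bags = {
    "d1": Counter(["apple", "cider", "apple", "orange"]),
    "d2": Counter(["paper", "apple"]),
}
words, rows = build_matrix(bags)
print("  ", *words)
for doc, row in rows.items():
    print(doc, *row)
```

Each row can then be read as a four-dimensional count vector, one dimension per word.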
`
`
`
`
`
`
U.S. Patent    Aug. 2, 2011    Sheet 5 of 6    US 7,991,784 B2

Offline or Online Computation
400  Create list of pertinent documents on the device.
410  Create list of unique words from documents.
420  (optionally) Remove stop words from word list.
430  Count frequency of co-occurrence, within a unit, across
     documents for all possible pairs of words from list.
440  Enter frequency results into a matrix.
450  Use matrix to identify word pairs that are contextually
     associated based on their frequency of co-occurrence.

FIG. 4a
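
Steps 400 to 450 of FIG. 4a could be sketched in Python along the following lines. The choice of a sentence as the "unit", the tokenizer, the stop-word list, the `min_count` cutoff, and the sample documents are all assumptions for illustration; the flowchart does not fix any of them.

```python
import re
from collections import Counter
from itertools import combinations

# Illustrative stop-word list only (an assumption, not taken from the patent).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "at"}

def cooccurrence_counts(documents, stop_words=STOP_WORDS):
    """Steps 400-440: count, per word pair, the units in which both occur."""
    counts = Counter()
    for text in documents:
        # Assumed unit of co-occurrence: one sentence.
        for unit in re.split(r"[.!?]+", text.lower()):
            words = sorted(set(re.findall(r"[a-z0-9]+", unit)) - stop_words)
            # Sorted input makes every pair an (earlier, later) tuple.
            counts.update(combinations(words, 2))
    return counts

def associated_pairs(counts, min_count=2):
    """Step 450: keep pairs whose co-occurrence frequency is high enough."""
    return {pair for pair, n in counts.items() if n >= min_count}

# Hypothetical documents for illustration.
docs = [
    "The finance summary is due. Send the finance summary today.",
    "Two spoons of sugar.",
]
counts = cooccurrence_counts(docs)
print(counts[("finance", "summary")])
print(associated_pairs(counts))
```

Here the pair (finance, summary) co-occurs in two sentences and is the only pair that passes the assumed cutoff of 2.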
`
`
`
`
U.S. Patent    Aug. 2, 2011    Sheet 6 of 6    US 7,991,784 B2

Online Computation
460  Accept characters from the user until the threshold number of
     required characters are entered.
470  Identify relevant words in the matrix from entered characters.
480  Using identified words in matrix, choose words that are
     contextually associated.
490  Offer the chosen contextually associated words to the user for
     selection to complete the entry.

FIG. 4b
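
A minimal Python sketch of steps 460 to 490 of FIG. 4b follows, assuming the pair-frequency matrix is held as a dictionary keyed by word pairs. The function name `complete`, the scoring by summed pair frequency against previously typed words, and the sample counts are assumptions introduced here, not details given in the flowchart.

```python
from collections import Counter

def complete(counts, context_words, typed, min_chars=2):
    """Rank words that start with `typed` by co-occurrence with the context."""
    if len(typed) < min_chars:  # step 460: wait for the threshold
        return []
    prefix = typed.lower()
    context = {w.lower() for w in context_words}
    scores = Counter()
    for (a, b), n in counts.items():
        # Steps 470-480: a candidate must match the prefix and be paired
        # in the matrix with a word the user has already entered.
        for candidate, other in ((a, b), (b, a)):
            if candidate.startswith(prefix) and other in context:
                scores[candidate] += n
    # Step 490: offer candidates, highest frequency first, ties alphabetical.
    return [w for w, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]

# Hypothetical pair frequencies for illustration.
pair_counts = {("finance", "summary"): 3, ("spoon", "sugar"): 2}
print(complete(pair_counts, ["finance"], "su"))
print(complete(pair_counts, ["two", "spoon"], "su"))
```

With these assumed counts, "finance" as context yields "summary" and "spoon" as context yields "sugar".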
`
`
`
`
`US 7,991,784 B2
AUTOMATIC DYNAMIC CONTEXTUAL
DATA ENTRY COMPLETION SYSTEM

CROSS-REFERENCE TO RELATED
APPLICATIONS
This application is a continuation of U.S. patent application
Ser. No. 11/040,470 filed Jan. 21, 2005 now U.S. Pat. No.
7,630,980, the entirety of which is incorporated herein by
reference in its entirety.
FIELD OF THE INVENTION
The present invention relates to information processing
and, more particularly, computer, cellphone, personal digital
assistant, or other similar device-based text entry.
BACKGROUND OF THE INVENTION
In modern life, there are a number of devices, notably
digital computers and multifunctional handheld units that
involve data entry, typically text, including for example
cellular phones and other devices like organizers and handheld
computers. For all of these, one important use is the entry of
linguistic items like words, phrases, and sentences. For
example, a user may create an unstructured text document or
might formulate an email message or a short text message to
be sent as an SMS message on a cellphone. In such cases, text
entry may occur through use of a keyboard or stylus for some
handheld computers or cellphones, etc. However, data entry
can be difficult when the keyboard is relatively small as it is on
a handheld cell phone, organizer or computer, or uses
individual keys for entry of multiple letters, text, especially when
a large number of characters must be entered. Similarly, with
devices employing a stylus for text entry, entry of text can be
slow and burdensome.
Automated word completion programs have eased the
burden somewhat. Such automated word completion programs
have appeared recently in a variety of applications in a variety
of devices. These programs are typically based on either
predefined word suggestion lists (e.g. a dictionary) or are
culled from the user's own most recently typed terms, the
latter often called MRU (i.e. “Most Recently Used”)
programs. For example, the former type of program is based on
a pre-given word suggestion list based on a dictionary
augmented with information about which words are more
frequently used. If a user types the characters “su” in a
document, then it might suggest “super” as the appropriate word
completion based on the fact that it belongs to the pre-given
word suggestion list and has a high frequency of use in
general English. On the other hand, the latter type of program
would suggest a word completion based on the user's own
recently used words (e.g. “Supreme” may be suggested to a
lawyer who has recently input “Supreme Court”). Such
programs are often found in web browsers for example and will
suggest the most recently used “uniform resource locator” or
URL (e.g. www.google.com when the user types “www.g”)
as characters are input.
A third type of program is able to detect that the user is in
a particular type of field (e.g. the closing of a letter) and will
suggest word completions (e.g. “Sincerely” when the user
types “Si”) based on a more limited “contextual” list. An
extension of this is to maintain many separate word
suggestion lists and allow the user to choose an appropriate list for
each document the user creates. Other variants allow users to
actually insert entries manually into word suggestion lists
(e.g. a name and address) or to maintain frequencies of word
usage by a user and thus, rather than offering the most
recently used word, offer the user's most frequently used
words.
`SUMMARY OF THE INVENTION
`While the methods delineated above have many useful
`features, there is still a lack of a true context based system that
`is dynamic and automatic and thus, there is still much room
for improvement when it comes to data entry in such devices.
Systems that maintain separate word lists and allow the user
`to choose an appropriate list are contextual to some degree,
`but still have the drawback of requiring the user to make a list
`selection each time, something that can become annoying for
`a user who typically creates several documents within the
course of a single day. Moreover, separate word suggestion
`lists are still inefficient because they are not automatically
`generated but instead depend on the user's guidance and
`input.
`The present invention combines certain features from
`existing techniques but goes significantly beyond them in
`creating a family of techniques that are automatic, dynamic,
`and context based as explained in greater detail herein.
The present invention involves a method, performed in a
character entry system. The method is used for interrelating
character strings so that incomplete input character strings
can be completed by a selection of a presented character
string. The approach involves computing contextual
associations between multiple character strings based upon
co-occurrence of character strings relative to each other in
documents present in the character entry system, identifying at
least one selectable character string from among the
computed contextual associations that can complete the
incomplete input character string in context (performed in response
to inputting of a specified threshold of individual characters),
and providing the identified at least one selectable character
string to a user for selection.
`The advantages and features described herein are a few of
`the many advantages and features available from representa
`tive embodiments and are presented only to assist in under
`standing the invention. It should be understood that they are
`not to be considered limitations on the invention as defined by
`the claims, or limitations on equivalents to the claims. For
instance, some of these advantages or features are mutually
`exclusive or contradictory, in that they cannot be simulta
`neously present in a single embodiment. Similarly, some
`advantages are applicable to one aspect of the invention, and
`inapplicable to others. Thus, the elaborated features and
`advantages should not be considered dispositive in determin
`ing equivalence. Additional features and advantages of the
`invention will become apparent in the following description,
`from the drawings, and from the claims.
`BRIEF DESCRIPTION OF THE FIGURES
`FIG. 1 illustrates, in simplified form, a top-level flowchart
`for the automatic completion of character input using contex
`tual word associations;
`FIG. 2a illustrates a simplified flowchart for computing
`contextual associations in one example implementation of the
`invention;
FIG. 2b illustrates a simplified flowchart for the selection
`of contextual associations in an example implementation of
`the invention;
`FIG. 3 illustrates an example documents versus words
`matrix used to compute contextual associations with an
`example implementation of the invention;
`
FIG. 4a illustrates a simplified flowchart for computing
contextual associations in an alternative example
implementation of the invention;
FIG. 4b illustrates a simplified flowchart for the selection
of contextual associations in an alternative example
implementation of the invention;
FIG. 5 illustrates an example matrix of pairs of words used
to compute contextual associations in the alternative example
implementation of the invention; and
FIG. 6 illustrates an example set of word lists for a word
completion example involving the alternative example
implementation.

DETAILED DESCRIPTION OF THE INVENTION
The present invention can be used with a variety of
electronic devices. The minimum requirements for any such
device are some means for accepting textual input from a user,
one or more processor(s) that execute stored program
instructions to process the input, storage for the data and the program
instructions and a display or other output device of some sort
to make output visible or available to the user. Representative,
non-exhaustive, example input devices can include, but are
not limited to, a keyboard, a handwriting recognition system
that makes use of a stylus, a touch pad, a telephone keypad, a
pointing device like a mouse, joystick, trackball or
multi-directional pivoting switch or other analogous or related input
devices. The storage preferably includes non-volatile
memory, and can also include volatile semiconductor-based
memory, electro-magnetic media, optical media or other
types of rewriteable storage used with computer devices. If a
display is used, the display may be small and capable of
displaying only text or much larger and capable of displaying
monochrome or color images in addition to text. If another
output device is used, like a text to speech converter,
appropriate implementing equipment will be included. Although
described, for purposes of clarity, with reference to keyboard
type entry, it is to be understood that the present invention is
independent of the particular mode of, or device used for, text
data entry.
At the outset, it should be noted that, for the purposes of
this invention, a “document” as used herein is intended to be
a very general term covering one or more characters, whether
alone or in conjunction with numerals, pictures or other
items. A document's length can vary from a single “word” to
any number of words and it can contain many types of data
other than words (e.g. numbers, images, sounds etc.). Thus,
ordinary documents such as pages of text are documents, but
so are spreadsheets, image files, sound files, emails, SMS text
messages etc.
As noted above, a “word,” for the purposes of this
invention, can be considered to be more than a string of alphabetic
characters, it may include numeric and other symbols as well.
Broadly, the invention provides contextual completion of
character strings, where a character string includes not only
alphabetic words but any other discrete collection of
characters, symbols, or stroke based pictographs or ideograms, for
example, those used in languages like Chinese, Korean and
Japanese, and thus can benefit from use of the present
invention. Thus, although for simplicity the term “word” is used in
the following discussion, it should be understood to
encompass any discrete collection of characters, symbols or other
stroke based representations of communicative concepts,
thoughts or ideas. Thus, the present invention, although
described with reference to English, is independent of any
particular language. It can be used for phonetic, pictographic
or ideographic languages when the characters, pictograms or
ideograms used therein (or “stroke” components thereof) are
considered “words” and thereby are intended to be
encompassed by the terms “text” and “textual.” In some cases, an
entire pictogram or ideogram will be usable as a “word” as
described herein with entry of a component of the pictogram
or ideogram, such as a defined stroke, being analogous to
entry of a letter in English. Likewise, for simplicity in the
following examples, the terms “typing” or “typed” are used to
describe data entry. However, those terms should be broadly
read to encompass any and all methods of data entry, whether
involving entry through use of a keyboard, a pointing or
selection device, a stylus or other handwriting recognition
system, etc. They are not in any way intended to be limited
only to methods that make use of a typewriter-like keyboard.
Examples of devices that can use and benefit from
incorporation of the invention therein range from large computer
networks, where an implementation of the invention may be
part of or an application on the network, to small portable
hand held devices of more limited or specialized function
such as cell phones, text messaging devices and pagers.
Implementations incorporating the invention can be used to
assist users in interacting with large databases by helping in
the entry of search terms or in data entry. Other
implementations incorporating the invention are particularly useful for
portable devices, in which the input device is limited by size
and difficult to work with, because the automatic completion
of character string entries provides greater benefits in such
devices. Still other implementations incorporating the
invention are particularly useful for devices used by those with
physical handicaps. In addition to the methods of character
input already mentioned, devices intended for use by
handicapped individuals may rely on some type of pointing device
to select individual characters for input. The pointing device
may be controlled by movement of the eyes, head, hands, feet
or other body part depending on the abilities of the particular
individual. The present invention may also be used with
“text” that is implemented in braille or other tactile
representations for individuals with impaired vision.
In overview, in connection with the invention, words from
one or more documents are associated, in either a fully or
partially automated way, based on context. Context is derived
from the co-occurrence of individual words in documents. In
addition, the associations can be pre-computed and static or
dynamic so they can thereby evolve and improve with
continued use.
For example, in an implementation of the invention, an
association between “finance” and “summary” may be
generated but not one between “finance” and “sugar”; in this case,
if a user has typed in the word “finance” followed by the
characters “su”, then, based on the association, the invention
will suggest “summary” as the appropriate word completion
rather than “sugar.” Here, the word “finance” has provided the
context that suggests the appropriate completion; if instead
the user had typed “two spoons of” and then the characters
“su” and if an association had been generated between,
“spoon” and “sugar” rather than “spoon” and “summary”
then the invention would suggest “sugar” as the contextually
appropriate completion. As more words are entered in the
document, the contextual associations become richer.
The invention permits the use of different techniques for
actually creating the associations. As a result, for purposes of
understanding, two fully automated example techniques are
described below with the understanding that semi-automatic
implementation techniques are considered to be literally the
same as the fully automated ones. The automatic or manual
nature of a technique is, in most respects, independent of the
`
`
`
invention because it relates more to the ease of processing
large amounts of text, not the technique itself.
The general approach is illustrated, in simplified overview,
in FIG. 1 with respect to a single document. The approach
begins with a device such as a personal digital assistant, cell
phone, computer or other device (100, 110, 120 or 130) which
has documents (140) stored in its memory. These documents
are used to create associations (150) between pairs of words
or character strings within the document and use these
associations to suggest word or character string completions (170)
to the user entering text (160) in a document. The associations
among the words or strings may be static or dynamic. With
implementations incorporating a more dynamic approach, as
the user adds to a document or creates more documents on the
device, the associations are recomputed or suitably
augmented. This will alter the set of associations by either adding
new associations, deleting existing associations or both.
Thus, with implementations of the automatic contextual word
completion system having this “dynamic” aspect, the system
evolves as the user adds to or creates new documents and thus
generally improves with use. Extensions to these
implementations further allow the device to impliedly track the user's
evolving interests.
Associations between words can be computed in a variety
of ways and, as non-limiting examples, two alternative
automatic methods of doing so are described.
In the first method, the first step is to assess the similarity of
words within one document or from one document to other
documents that may exist on the user's device. In this method,
contextual associations are arrived at by grouping documents
based on similarity and creating lists of words that are
common to each group. There are many known methods to assess
document similarity including the Jaccard, Dice or cosine
coefficients and the K-vec methods. For purposes of
explanation, one such example similarity assessment method,
based on treating documents as vectors in a multidimensional
space, is used, it being understood that, depending on the
particular implementation, other similarity assessment
methods can be used in addition to, or instead of those used in the
examples described herein for practical reasons.
This example method is outlined in the flowcharts in FIGS.
2a and 2b. The method starts by creating a list of all the
pertinent documents (200) on the device. From this list of
pertinent documents a list of unique words is created (205).
An optional step is to remove stop words from the word list
(210). Stop words are described in greater detail below but
include words like “the,” “at” and “in.” For each word in the
word list, the number of times it occurs in each document is
counted (215) and this number is stored in a matrix of
documents vs. words (220). This matrix is used to calculate a
similarity value (225) for each possible pair of documents in
the document list. The similarity value for each document pair
is compared to a threshold value (230) and those document
pairs whose similarity value falls below the specified
threshold value are discarded (235). The remaining document pairs
are used to group documents such that the similarity value of
each possible pair in each group is above a specified threshold
value (240). Lists of unique words from each group of similar
documents are created (245). Words within each of these lists
are contextually related. The steps of the example method to
this point may be carried out independently of user text entry
or, in implementations where the dynamic aspects of the
invention are utilized, carried out simultaneously with user
text entry, so that the contextual associations are updated as
the user enters more words into the device.
Once at least an initial set of contextual associations exists,
it can be used at some point thereafter. The approach to use is
as follows. The device accepts character input from the user
until a specified threshold number of characters has been
entered (250). Using the entered characters, relevant word
lists are identified (255). Due to the processing, the words
within these identified lists are deemed contextually related
and thus, words in the identified lists having a corresponding
initial character string matching the entered characters are
chosen (260) to be offered for selection by the user to
complete the character entry (265).
The above referenced process can be fully understood by
way of the following simplified example. To assess the
similarity or dissimilarity of documents, one way of thinking of a
document that contains one or more words is as a bag or
multiset of words. A bag or multiset is like an ordinary set in
mathematics, a collection, except that it can contain multiple
occurrences of the same element. For example, {book, cape,
pencil, book} is a bag containing four words of which the
word “book” appears twice. The order of occurrence of
elements in a bag does not matter, and could equally be written
as {book, book, pencil, cape}. Also, any bag can be converted
to a set just by dropping multiple occurrences of the same
element. Thus, the example bag above, when converted to a
set, would be {book, cape, pencil}. To create the bag or
multiset, the contents of a document with the exception of
numbers which are a special case are stripped of all internal
structure (e.g. syntactic structure, punctuation etc.) including
all nonlexical items like images, sounds etc. The resulting
stripped document would be a bag or multiset of words as
described above which may also include numbers and in
which some words may occur multiple times. For a user who
has a device with a number of stored documents, each
pertinent document is similarly stripped down to form bags and
the mathematical union of these bags can be taken to form a
larger bag.
As a side note, optionally, a certain class of words, typically
called “stop words,” are removed from such
document-derived bags. Stop words are words like “the,” “of,” “in” etc. and
are removable because they usually are not very informative
about the content of the document. Stop words, if removed,
can be removed from the bags either before or after a
mathematical union of the bags is made, as the end result is the
same. Typically stop words are identified in a list which can
be used for the exclusion process. Since the stop word
removal process is well known it is not described herein. In
addition, in some implementations where a stop word list is
used, the list may be editable so that additional words can be
defined as “stop words.” For example, otherwise non-trivial
words that are trivial in the particular context because they
occur too often in that context (e.g. words like “shares” in
stock related government filings).
By way of simplified example (FIG. 3), if the user has just
two documents on a device: “d1” (306) made up of “an apple,
apple cider and an orange” and “d2” (308) made up of “a
paper apple” then, each corresponding bag is {apple, cider,
apple, orange} and {paper, apple}. Their union is the larger
bag {apple, cider, apple, orange, paper, apple} and a set for
the bag would be {apple, cider, orange, paper}.
A matrix (300) is then formed with for example, each
element in the set of words derived from the documents on the
user's device listed along the columns (302) of the matrix and
each document itself (symbolized in some way) along the
rows (304) of the matrix. In the cell corresponding to the
intersection of a document “d” with a word “w,” the number
of times “w” occurs in “d” is entered (318). For the simple
example above, as shown in FIG. 3, for the cell corresponding
to the intersection of the row for the first document “d1” and
the column for the word “apple” a “2” (318) is entered since
`
`approaches, such as Jaccard, Dice or cosine coefficients, the
it occurs twice in document “d1.” This occurrence frequency
`K-vec methods or some other method, is a judgment of simi
`information is obtained from the document bags. If a word
`larity and dissimilarity of document pairs in the pertinent
does not occur in a particular document at all, a zero is entered
`in the corresponding cell. Note that depending upon the num
`document collection or set.
`This similarity judgment is then used to form groups of
`ber of documents and the number of words, the size of the
`matrix can be exceedingly large. Moreover, there is no sig
`documents, each group of which contains only documents
that are sufficiently similar to one another when compared in
`nificance to whether rows list documents and columns list
`a pair wise fashion. Note that the relationship of similarity is
`words or vice versa—the contents of the rows and columns
`reflexive and symmetric, but it is not necessarily transitive.
`could be exchanged without affecting the invention.
`This means that the groups may not be disjoint i.e. the same
`Once the matrix is created, each document is treated as a
`document may belong to more than one group, particularly in
`vector in a multidimensional Euclidean space, with the num
`implementations where a document need not be sufficiently
`ber of dimensions being the number of words or columns of
the matrix. Thus, in the simplified example of FIG. 3, each of
`similar to every other member of the group, but only some
`specified portion thereof. In other words, as a result of the
`documents d1 and d2 can be treated as a four dimensional
`grouping, two or more groups will be formed wherein each
`vector since there are four elements in the corresponding set
{apple, cider, orange, paper}. Notably, by using this
document is meaningfully similar to at least some specified
`portion of the other documents in that group. In general, each
`approach, the words can also be listed in