Wical

[54] CONCEPT KNOWLEDGE BASE SEARCH AND RETRIEVAL SYSTEM

[75] Inventor: Kelly Wical, San Carlos, Calif.

[73] Assignee: Oracle Corporation, Redwood Shores, Calif.

[21] Appl. No.: 08/861,983

[22] Filed: May 21, 1997

[51] Int. Cl.⁷: G06F 17/30
[52] U.S. Cl.: 707/5; 706/50
[58] Field of Search: 706/50, 61, 934; 707/5
[56] References Cited

U.S. PATENT DOCUMENTS

5,103,498    4/1992   Lanier et al.        706/11
5,159,667   10/1992   Borrey et al.        707/500
5,167,011   11/1992   Priest               706/62
5,226,111    7/1993   Black et al.         706/50
5,257,185   10/1993   Farley et al.        707/100
5,276,616    1/1994   Kuga et al.          704/10
5,325,298    6/1994   Gallant              707/5
5,369,763   11/1994   Biles                707/3
5,442,780    8/1995   Takanashi et al.     707/1
5,555,408    9/1996   Fujisawa et al.      707/5
5,598,557    1/1997   Doner et al.         707/5
5,615,112    3/1997   Liusheng et al.      706/50
5,625,767    4/1997   Bartell et al.       345/440
5,630,117    5/1997   Oren et al.          707/100
5,630,125    5/1997   Zellweger            707/103
5,634,051    5/1997   Thomson              707/5
5,659,724    8/1997   Borgida et al.       706/50
5,720,008    2/1998   McGuinness et al.    706/50
`
OTHER PUBLICATIONS

Cox, John, "'Text-Analysis' Server to Simplify Queries,"
Communications Week, Apr. 19, 1993.

"Verity finds the Topic," The Seybold Report on Publishing
Systems, vol. 19(4), Oct. 1989.
`
US006038560A
[11] Patent Number: 6,038,560
[45] Date of Patent: Mar. 14, 2000
`
D.R. Cutting, et al., "Constant interaction-time scatter/
gather browsing of very large document collections," Proc.
Sixteenth Annual International ACM SIGIR Conf. on
Research and Development in Information Retrieval, pp.
126-134, Dec. 1993.
R.B. Allen, "An Interface for Navigating Clustered Document
Sets Returned by Queries," Proc. of the Conf. on
Organizational Computing Systems, pp. 166-171, Dec.
1993.
E.D. Liddy, et al., "Text Categorization for Multiple Users
Based on Semantic Features from a Machine-Readable
Dictionary," ACM Transactions on Information Systems, vol.
12(3), pp. 278-295, Jul. 1994.
R.B. Allen, "Two Digital Library Interfaces that Exploit
Hierarchical Structure," DAGS95: Electronic Publishing
and the Information Superhighway, (10 pages), May 1995.
A. Celentano, et al., "Knowledge-based Document
Retrieval in Office Environments: the Kabiria System,"
ACM Trans. on Information Systems, vol. 13(3), pp.
237-268, Jul. 1995.
`
`(List continued on next page.)
`
Primary Examiner: Robert W. Downs
Attorney, Agent, or Firm: Fliesler, Dubb, Meyer & Lovejoy LLP
`
[57] ABSTRACT

A knowledge base search and retrieval system, which
includes factual knowledge base queries and concept
knowledge base queries, is disclosed. A knowledge base stores
associations among terminology/categories that have a
lexical, semantic, or usage association. Document theme
vectors identify the content of documents through themes as
well as through classification of the documents in categories
that reflect what the documents are primarily about. The
factual knowledge base queries identify, in response to an
input query, documents relevant to the input query through
expansion of the query terms as well as through expansion
of themes. The concept knowledge base query does not
identify specific documents in response to a query, but
specifies terminology that identifies the potential existence
of documents in a particular area.
`
`29 Claims, 21 Drawing Sheets
`
`IBM-1006
`Page 1 of 41
`
`IPR2016-00019
`Petitioners Old Republic General Ins. Group, Inc., et al. Ex. 1034, p.1
`
`
`
M. Iwayama and T. Tokunaga, "Cluster-based Text
Categorization: a Comparison of Category Search Strategies," Proc.
18th Annual Int'l. ACM SIGIR Conf. on Research and
Development in Information Retrieval, pp. 273-280, Dec.
1995.
G. Salton, et al., "Automatic Text Decomposition Using Text
Segments and Text Themes," Proc. Seventh ACM Conf. on
Hypertext '96, pp. 53-65, Dec. 1996.
P. Pirolli, et al., "Scatter/Gather Browsing Communicates
the Topic Structure of a Very Large Text Collection," Conf.
Proc. on Human Factors in Computing Systems, pp.
213-220, Dec. 1996.
`
`
`
[FIG. 1 (Sheet 1 of 21): block diagram of the search and retrieval system 100. Labeled blocks: Document 130; Content Processing System 110; Knowledge Scoring 140; Inference Processing 145; Learning Processing 165; Document(s) Theme Vector 160; Knowledge Catalog 150; Knowledge Base 155; Query Processing 175. Other labels: User Query; To Screen Module.]
`
`
`
[FIG. 2 (Sheet 2 of 21): block diagram of Query Processing 175. Labeled blocks: Concept Query Processing 200; Query Term Processing 205; Factual Query Processing 210; Document Signatures 160; Knowledge Base 155. Other labels: User Query; Mode; Information Retrieval; To Screen Module 230.]
`
`
`
[FIG. 3 (Sheet 3 of 21): response to an example query. Reference numerals appear in brackets.]

Query: Legal, Betting, China [610]
  Government, Casino, Asia [620]
    Gaming Industry (2) [625]
  Patents, Slot Machines, Japan [630]
    Patent Law (4) [635]
    Gaming Industry (2) [640]
  Crime, Wagering, China [645]
    Insects (1) [650]
    Conservation - Ecology (2) [655]
`
`
`
[FIG. 4 (Sheet 4 of 21): example portion of a knowledge base that includes a directed graph. Node labels: Geography; Political Geography; Europe; Western Europe; Leisure and Recreation; Tourism; Arts & Entertainment; Visual Arts; Eiffel Tower (dashed box). Other labels: (Marker); 8; 10.]
`
`
`
[FIG. 5 (Sheet 5 of 21): flow diagram for factual knowledge base query processing. Steps, with reference numerals:]

400  Generate senses and distinct parts from query
402  Generate query term strengths
405  Expand query terms using knowledge base
410  Select categories in knowledge base identified by expanded query terms
420  Select documents classified for those categories
430  Select themes from documents
440  Sort and compile information by theme
450  List themes in order of strongest themes
460  Select top themes from additional documents based on predetermined criteria
465  Organize themes in groups
470  Order theme groups
475  Order documents within groups
480  Display groups and associated document names
485  Display categories classified for documents
`
`
`
[FIG. 6 (Sheet 6 of 21): directed graph of knowledge base nodes used to expand query terms. The individual node labels are not recoverable from the extracted text.]
`
`
`
[FIG. 7 (Sheet 7 of 21): flow diagram for processing concept knowledge base queries. Steps, with reference numerals:]

500  Generate applicable senses and forms for distinctive query terms
510  Generate strengths for query terms
520  Map query terms to knowledge base
530  Expand query terms through knowledge base
540  Select theme set for expanded query terms
550  Expand theme set through knowledge base
560  Select common denominators of expanded themes among expanded query terms to satisfy input query
570  Relevance rank query terms, expanded query terms, and themes
580  Display query response
`
`
`
[FIG. 8A (Sheet 8 of 21): categories, arranged hierarchically, for the "social sciences" topic. Categories in the order shown: Social Sciences; History; Ancient History; Ancient Rome; Anthropology; Customs and Practices; Kinship and Marriage; Peoples; Races of Peoples; Linguistics; Languages.]
`
`
`
[FIG. 8B (Sheet 9 of 21): categories, arranged hierarchically, for the "food and agriculture" topic. Categories in the order shown: Food and Agriculture; Cereal and Grains; Condiments; Dairy Products; Drinking and Dining; Beers; Liquors; Liqueurs; Wines; Meats; Beef; Lamb; Pate and Sausages; Seafood; Pastas; Prepared Foods; Desserts; Cakes; Cookies; Pastries; Sauces; Soups and Stews.]
`
`
`
[FIG. 8C (Sheet 10 of 21): categories, arranged hierarchically, for the "geography" topic. Categories in the order shown: Geography; Political Geography; Europe; Western Europe; Austria; Germany; France; Iberia; Spain; Ireland; Italy; Sweden; Netherlands; United Kingdom; England; Eastern Europe; Greece.]
`
`
`
[FIG. 9A (Sheet 11 of 21): example portion of a knowledge base used to expand the query term "festivals." Node labels: Leisure and Recreation; Arts and Entertainment; Performing Arts; Dance; Ballet; Folk Dance; Social Sciences; Anthropology; Customs and Practices; Festivals; National; Religious Festivals; History; Ancient History; Ancient Rome. Other labels: Marker; 7; 8; 8; 9.]
`
`
`
[FIG. 9B (Sheet 12 of 21): portion of an example knowledge base used to expand themes. Node labels: Food and Agriculture; Drinking and Dining; Occupations; Chefs; French Chefs; Crepes; Tripe Sausages; Chicken Cordon Bleu; French Cheeses; Dairy Products; Cheeses; Brie. Other labels: Theme Strength = 5; Theme Strength = 50; 8.]
`
`
`
[FIG. 9C (Sheet 13 of 21): search and retrieval response for the example query input.]

I) Festivals, Foods, Western Europe
   A) Festivals, Drinking and Dining, Germany
      1) Beer
      2) Knockwurst
      3) Oktoberfest
      4) Stein
      5) Sauerkraut
   B) Festivals, Drinking and Dining, France
      1) Mardi Gras
      2) Crepes
      3) Camembert
      4) Croissant
      5) Brie
      6) Tripe Sausage
      7) Onion Soup
      8) Chicken Cordon Bleu
II) Festivals, Food
   A) Ancient Rome, Wines
      1) Wine
      2) Grapes
      3) Fermentation
      4) Barrels
      5) Vineyards
`
`
`
[FIG. 10A (Sheet 14 of 21): example "Virtual Clerk" display for the query "Internet." Menu items: Concept Search; Knowledge Search; List Topics; Help.]

Found 15 Documents and 5 Categories
**** Computer Networking (15)
*    Internet Credit Bureau, Incorporated (0)
*    Internet Fax Server (0)
*    Internet Productions, Incorporated (0)
*    Internet Newbies (0)
`
`
`
[FIG. 10B (Sheet 15 of 21): another example display for the query "Internet."]

Category path: Science and Technology (2380) | Communications (279) | Telecommunications Industry (90)
Computer Networking (15): Electronic Mail (1); GE Networks (1); Internet Technology (2); Messaging (1); NBC Networks (3); Networks (1)
Documents About Computer Networking and Also (labels in the order extracted): Colorado; 7/01/88 Business Brief: Noted...; 8/19/88 The Americas: Mexico's...; Mexican; NBC Officials; 7/05/88 NBC Talks With European...; State Agencies; 10/07/88 Three Companies Win $180...; Television and Radio; 8/09/88 NBC-TV Trying to Beat...
See Also: Computer Hardware Industry (56); Computer Industry (256); Computer Standards (1); Information Technology (9); Mathematics (4)
`
`
`
[FIG. 11A-1 (Sheet 16 of 21): example "Virtual Clerk" display for the query "Stocks." Menu items: Concept Search; Knowledge Search; List Topics; Help. The per-category document counts cannot be reliably matched to categories in the extracted text.]

Found 152 Documents and 64 Categories
Categories listed: Commerce and Trade; Companies; Financial Investments; Investors; Portfolios; Pharmaceutical Industry; Magazines; Automotive Industry; Mineralogy; Computer Software Industry; Stocks and Bonds; Food and Drink Industry; Petroleum Products Industry; Television and Radio; New York Life Insurance Company; McGraw-Hill, Incorporated; Banking Industry; Industrial Goods Manufacturing; Texaco, Incorporated; Insurance Industry; Lawyers; CitiCorp; Preferred Stocks; Computer Hardware Industry; Walt Disney Company; Diversified Companies; Buys.
`
`
`
[FIG. 11A-2 (Sheet 17 of 21): continuation of the display for the query "Stocks." The per-category document counts cannot be reliably matched to categories in the extracted text.]

Categories listed: Dun & Bradstreet Corporation; Health-care Companies; Brokers; Personal Finance; Lawsuits; Leveraged Buy-outs; Airlines; Cinema; Construction Industry; Automotive Service and Repair; Retail Trade Industry; Dow Chemical Company; Real Estate; Consumer Electronics; Chemical Industry; Convenience Products Businesses; American Brands, Incorporated; Motorola, Incorporated; Package Delivery Industry; Masco Corporation; Computer Industry; Aviation; Plastic and Rubber; Drugs; Clothing; Itel Corporation; Hard Sciences; Rail Transportation; Financial Lending; Chrysler Corporation; Gillette; Brush Wellman, Incorporated; Taxes and Tariffs; Manufacturing; Japanese Companies; Shares Outstanding.
`
`
`
[FIG. 11B (Sheet 18 of 21): example display after selecting the category "portfolios."]

Category path: Business and Economics (5438) | Business and Industry (2889) | Corporate Practices (263)
Portfolios (4)
Documents About Portfolios and Also (labels in the order extracted): Commerce and Trade; 11/16/88 Money Managers With ...; Interest Rates; 8/24/88 Your Money Matters: Many...; 10/10/88 These Stocks Are a...; Investors; Securities; 7/14/88 Fannie Mae Net Rose 97%...
`
`
`
[FIG. 12 (Sheet 19 of 21): example "Virtual Clerk" display for a profile query. Menu items: Subject Location; Knowledge Search; List Topics; Help.]

President George Herbert Walker Bush
Appears in 28 Docs/17 Categories:
President George Herbert Walker Bush (7)
Republican Party (6)
Capital Gains Taxes (1)
White House (1)
President Ronald Wilson Reagan (1)
Senate (1)
Democratic Party (1)
Iran Contra Affair (1)
Congress (1)
Job Actions (1)
Campaigns (1)
Meetings (1)
Tax Rates (1)
Presidential Candidates (1)
Senators (1)
Florida Governor (1)
AIDS - Acquired Immune Deficiency Syndrome (1)
`
`
`
[FIG. 13 (Sheet 20 of 21): block diagram of a content processing system. Labels as extracted: Document 130; Linguistic Engine 700; Structured Output (Contextual Tags 720; Thematic Tags 730; Stylistic Tags 735; Content Carrying Words 737); Morphology Section 770; Knowledge Catalog 150; Lexicon 760; Knowledge Catalog Processor 740; Theme Vector Processor 750; Content Indexing Processor 770; Document(s) Theme Vector 160; Knowledge Base 155. Unattached numeral: 710.]
`
`
`
[FIG. 14 (Sheet 21 of 21): high level block diagram of a general purpose computer system 1000. Labeled blocks: Processor Unit 1005; Memory 1010; Mass Storage Device 1020; Peripheral Device(s) 1030; Portable Storage Medium Drive 1040; Graphics Subsystem 1050; Output Display 1060; Input Control Device(s) 1070. Unattached numeral: 1025.]
`
`
`
CONCEPT KNOWLEDGE BASE SEARCH
AND RETRIEVAL SYSTEM
`
`BACKGROUND OF THE INVENTION
`
`1. Field of the Invention
`The present invention is directed toward the field of
`search and retrieval systems, and more particularly to a
`knowledge base search and retrieval system.
`2. Art Background
`In general, search and retrieval systems permit a user to
`locate specific information from a repository of documents,
`such as articles, books, periodicals, etc. For example, a
`search and retrieval system may be utilized to locate specific
`medical journals from a large database that consists of a
`medical library. Typically, to locate the desired information,
`a user enters a "search string" or "search query." The search
`query consists of one or more words, or terms, composed by
`the user. In response to the query, some prior art search and
`retrieval systems match words of the search query to words
`in the repository of information to locate information.
Additionally, Boolean prior art search and retrieval systems
`permit a user to specify a logic function to connect the
`search terms, such as "stocks AND bonds", or "stocks OR
`bonds."
`In response to a query, a word match based search and
`retrieval system parses the repository of information to
`locate a match by comparing the words of the query to words
`of documents in the repository. If there is an exact word
match between the query and words of one or more
`documents, then the search and retrieval system identifies
`those documents. These types of prior art search and
`retrieval systems are thus extremely sensitive to the words
`selected for the query.
The terminology used in a query reflects each individual
user's view of the topic for which information is sought.
Thus, different users may select different query terms to
search for the same information. For example, to locate
information about financial securities, a first user may
compose the query "stocks and bonds", and a second user may
compose the query "equity and debt." For these two different
queries, a word match based search and retrieval system
would identify two different sets of documents (i.e., the first
query would return all documents that have the words stocks
and bonds and the second query would return all documents
that contain the words equity and debt). Although both of
these queries seek to locate the same information, with
a word search and retrieval system, different terms in the
query generate different responses. Thus, the contents of the
query, and subsequently the response from word based
search and retrieval systems, are highly dependent upon how
the user expresses the query term. Consequently, it is
desirable to construct a search and retrieval system that is not
highly dependent upon the exact words chosen for the query,
but one that generates a similar response for different queries
that have similar meanings.
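The sensitivity described above can be illustrated with a minimal sketch of exact word matching. The function name `word_match_search` and the two sample documents are hypothetical and are not taken from the patent; the sketch only assumes that a document matches when it contains every query word.

```python
def word_match_search(query_terms, documents):
    """Return ids of documents that contain every query term as an exact word."""
    wanted = {term.lower() for term in query_terms}
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if wanted <= set(text.lower().split())
    )


# Hypothetical two-document repository about the same subject.
documents = {
    "doc1": "stocks and bonds rallied after the report",
    "doc2": "investors shifted between equity and debt",
}

stocks_hits = word_match_search(["stocks", "bonds"], documents)
equity_hits = word_match_search(["equity", "debt"], documents)
print(stocks_hits, equity_hits)
```

Under these assumptions the two queries, which seek the same information, return disjoint result sets.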
Prior art search and retrieval systems do not draw inferences
about the true content of documents available. If the
search and retrieval system merely compares words in a
document with words in a query, then the content of a
document is not really being compared with the subject
matter identified by the query term. For example, a restaurant
review article may include words such as food quality,
food presentation, service, etc., without expressly using the
word restaurant because the topic, restaurant, may be
inferred from the context of the article (e.g., the restaurant
review article appeared in the dining section of a newspaper
or travel magazine). For this example, a word comparison
between a query term "restaurant" and the restaurant review
article may not generate a match. Although the main topic of
the restaurant review article is "restaurant", the article would
not be identified. Accordingly, it is desirable to infer topics
from documents in a search and retrieval system in order to
truly compare the content of documents with a query term.
Some words in the English language connote more than a
single meaning. These words have different senses (i.e.,
different senses of the word connote different meanings).
Typically, prior art search and retrieval systems do not
differentiate between the different senses. For example, the
query "stock" may refer to a type of financial security or to
cattle. In prior art search and retrieval systems, a response to
the query "stock" may include displaying a list of
documents, some about financial securities and others about
cattle. Without any further mechanism, if the query term has
more than one sense, a user is forced to review the documents
to determine the context of the response to the query
term. Therefore, it is desirable to construct a search and
retrieval system that displays the context of the response to
the query.
Some prior art search and retrieval systems include a
classification system to facilitate the location of information.
For these systems, information is classified into several
pre-defined categories. For example, Yahoo!™, an Internet
directory guide, includes a number of categories to help
users locate information on the World Wide Web. To locate
information in response to a search query, Yahoo!™ compares
the words of the search query to the word strings of the
pre-defined categories. If there is a match, the user is referred
to web sites that have been classified for the matching
category. However, similar to the word match search and
retrieval systems, words of the search query must match
words in the category names. Thus, it is desirable to construct
a search and retrieval system that utilizes a classification
system, but does not require matching words of the
search query with words in the name strings of the
categories.
`
`SUMMARY OF THE INVENTION
`
Concept knowledge base query processing in a search and
retrieval system identifies, in response to a query, the potential
existence of documents by displaying terminology
related to the query. The search and retrieval system includes
a knowledge base that links terminology having a lexical,
semantic or usage association. In response to an input query,
the search and retrieval system selects and displays terminology
relevant to one or more terms of the input query. The
terminology guides the user in the overall search because the
user may view the terminology to learn different contexts for
the query.
The knowledge base includes a plurality of categories and
terminology, arranged hierarchically. To process a query, the
search and retrieval system maps the terms of the query to
categories/terminology in the knowledge base. In one
embodiment, an expanded set of query terms is generated
through use of the knowledge base, and the expanded set of
query terms is used to identify relevant terminology.
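As a rough illustration of mapping a query term into a hierarchical knowledge base and expanding it, the sketch below walks parent and child links up to a fixed distance. The category names echo the "geography" hierarchy of FIG. 8c, but the dictionary layout, the `expand_term` helper, and the distance cutoff are assumptions made for illustration; the text above does not specify how the expansion is computed.

```python
from collections import deque

# Hypothetical fragment of a hierarchical knowledge base (parent -> children).
KNOWLEDGE_BASE = {
    "Geography": ["Political Geography"],
    "Political Geography": ["Europe"],
    "Europe": ["Western Europe", "Eastern Europe"],
    "Western Europe": ["France", "Germany"],
}


def build_neighbors(hierarchy):
    """Turn the parent -> children map into an undirected adjacency map."""
    neighbors = {}
    for parent, children in hierarchy.items():
        for child in children:
            neighbors.setdefault(parent, set()).add(child)
            neighbors.setdefault(child, set()).add(parent)
    return neighbors


def expand_term(term, hierarchy, max_distance=1):
    """Return {category: distance} for categories within max_distance links."""
    neighbors = build_neighbors(hierarchy)
    if term not in neighbors:
        return {}  # the term does not map to any category
    distances = {term: 0}
    queue = deque([term])
    while queue:
        node = queue.popleft()
        if distances[node] == max_distance:
            continue  # do not expand past the cutoff
        for nxt in neighbors[node]:
            if nxt not in distances:
                distances[nxt] = distances[node] + 1
                queue.append(nxt)
    return distances


expanded = expand_term("Western Europe", KNOWLEDGE_BASE, max_distance=1)
print(expanded)
```

The distance recorded for each expanded category could serve as one input to a later relevance ranking, although that use is also an assumption here.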
The search and retrieval system further uses a plurality of
themes that relate context information to one or more of the
categories. In one embodiment, a content processing system
processes a plurality of documents to identify themes for a
document, and classifies the documents, including themes
identified for the documents, in categories of the knowledge
base. The themes are selected from the categories/terminology
identified by the query terms for potential
display as terminology for the query response.
In one embodiment, concept knowledge base query processing
further includes selecting additional themes, based
on the original themes selected, by associating, through use
of the knowledge base, the additional themes with the
themes selected. To identify terminology for the query
response, themes are matched with the query terms, or the
expanded set of query terms, to select terminology common
to both the themes and the expanded set of query terms that
satisfies as many query terms as possible. Groupings of
expanded query terms and themes, which satisfy more than
one query term, are extracted for display with the query
terms. Furthermore, the groupings and the themes are relevance
ranked to display the most relevant groups and
themes first.
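One way to picture the grouping and ranking step is the sketch below, which keeps only groupings that satisfy more than one query term and orders them by how many terms they satisfy. The scoring rule, the `rank_groupings` helper, and the sample data (loosely modeled on FIG. 9c) are assumptions for illustration; the summary does not define the actual relevance ranking.

```python
def rank_groupings(groupings, query_terms):
    """Order candidate groupings by how many query terms each one satisfies."""
    query = set(query_terms)
    scored = []
    for label, satisfied_terms in groupings.items():
        count = len(query & set(satisfied_terms))
        if count > 1:  # keep only groupings that satisfy more than one term
            scored.append((count, label))
    # Most query terms satisfied first; ties broken alphabetically.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [label for _, label in scored]


# Hypothetical groupings and the query terms each one satisfies.
groupings = {
    "Festivals, Drinking and Dining, Germany": ["festivals", "foods", "western europe"],
    "Ancient Rome, Wines": ["festivals", "foods"],
    "Ballet": ["festivals"],
}

ranked = rank_groupings(groupings, ["festivals", "foods", "western europe"])
print(ranked)
```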
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
FIG. 1 is a block diagram illustrating one embodiment for
the search and retrieval system of the present invention.
FIG. 2 is a block diagram illustrating one embodiment for
query processing.
FIG. 3 illustrates a response to an example query configured
in accordance with one embodiment of the search and
retrieval system of the present invention.
FIG. 4 illustrates an example portion of a knowledge base
that includes a directed graph.
FIG. 5 is a flow diagram illustrating one embodiment for
factual knowledge base query processing.
FIG. 6 illustrates one embodiment for expanding query
terms using a directed graph of the knowledge base.
FIG. 7 is a flow diagram illustrating one embodiment for
processing concept knowledge base queries.
FIG. 8a illustrates one embodiment of categories,
arranged hierarchically, for the "social sciences" topic.
FIG. 8b illustrates one embodiment of categories,
arranged hierarchically, for the "food and agriculture" topic.
FIG. 8c illustrates one embodiment of categories,
arranged hierarchically, for the "geography" topic.
FIG. 9a illustrates an example portion of a knowledge
base used to expand the query term "festivals."
FIG. 9b is a block diagram illustrating a portion of an
example knowledge base used to expand themes.
FIG. 9c illustrates one embodiment for a search and
retrieval response in accordance with the example query
input.
FIG. 10a illustrates an example display of the search and
retrieval system to the query "Internet."
FIG. 10b illustrates another example display for the
query "Internet."
FIG. 11a illustrates an example display of the search and
retrieval system to the query "stocks."
FIG. 11b illustrates an example display in response to the
selection of the category "portfolios" from the display
shown in FIG. 11a.
FIG. 12 illustrates an example display for a profile query
in accordance with one embodiment of the present invention.
FIG. 13 is a block diagram illustrating one embodiment
for a content processing system.
FIG. 14 illustrates a high level block diagram of a general
purpose computer system in which the search and retrieval
system of the present invention may be implemented.
`
DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENTS
Search and Retrieval Paradigm
The search and retrieval system of the present invention
utilizes a rich and comprehensive content processing system
to accurately identify themes that define the content of the
source material (e.g., documents). In response to a search
query, the search and retrieval system identifies themes, and
the documents classified for those themes. In addition, the
search and retrieval system of the present invention draws
inferences from the themes extracted from a document. For
example, a document about wine, appearing in a wine club
magazine, may include the words "vineyards",
"Chardonnay", "barrel fermented", and "French oak", which
are all words associated with wine. As described more fully
below, if the article includes many content carrying words
that relate to the making of wine, then the search and
retrieval system infers that the main topic of the document
is wine, even though the word "wine" may only
appear a few times, if at all, in the article. Consequently, by
inferring topics from terminology of a document, and
thereby identifying the content of a document, the search
and retrieval system locates documents with content that
truly reflects the information sought by the user. In addition,
the inferences of the search and retrieval system provide the
user with a global view of the information sought by
identifying topics related to the search query although not
directly included in the search query.
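The wine example above can be sketched as a simple vote among content carrying terms. The `TERM_CATEGORIES` table and the `infer_topic` helper are hypothetical, and naive substring counting stands in for the content processing system, which this passage does not describe in detail.

```python
from collections import Counter

# Hypothetical term-to-category associations; entries are illustrative only.
TERM_CATEGORIES = {
    "vineyards": "wines",
    "chardonnay": "wines",
    "barrel fermented": "wines",
    "french oak": "wines",
    "stocks": "financial securities",
}


def infer_topic(text, term_categories):
    """Pick the category whose associated terms occur most often in the text."""
    lowered = text.lower()
    votes = Counter()
    for term, category in term_categories.items():
        # Naive substring count; a real system would tokenize and normalize.
        votes[category] += lowered.count(term)
    if not votes or max(votes.values()) == 0:
        return None  # no associated term appears in the text
    return votes.most_common(1)[0][0]


article = (
    "The vineyards produced a barrel fermented Chardonnay "
    "aged in French oak."
)
topic = infer_topic(article, TERM_CATEGORIES)
print(topic)
```

In this sketch the article is assigned the topic "wines" although the word "wine" never appears in it.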
The search and retrieval system of the present invention
utilizes sense associations to identify related terms and
concepts. In general, sense associations relate terminology
to topics or categories based on contexts in which the term
may appear. In one embodiment, to implement
the use of sense association in a search and retrieval system,
a knowledge base is compiled. The knowledge base reflects
the context of certain terminology by associating terms with
categories based on the use of the terms in documents. For
the above example about wine making, the term "barrel
fermented" may be associated with the category "wines." A
user, by processing documents in the content processing
system described herein, may compile a knowledge base
that associates terms of the documents with categories of a
classification system to develop contextual associations for
terminology.
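A minimal sketch of compiling such term-to-category associations follows. It assumes each processed document arrives as a (category, content carrying terms) pair; that input shape, the `compile_associations` helper, and the sample documents are assumptions for illustration rather than the format used by the content processing system.

```python
from collections import defaultdict


def compile_associations(classified_docs):
    """Map each term to the categories of the documents it appeared in."""
    associations = defaultdict(lambda: defaultdict(int))
    for category, terms in classified_docs:
        for term in set(terms):  # count each term once per document
            associations[term][category] += 1
    return {term: dict(cats) for term, cats in associations.items()}


# Hypothetical processed documents: (category, content carrying terms).
classified_docs = [
    ("wines", ["barrel fermented", "vineyards"]),
    ("wines", ["barrel fermented", "french oak"]),
    ("financial securities", ["stock", "bonds"]),
    ("animals", ["stock", "cattle"]),
]

associations = compile_associations(classified_docs)
print(associations["barrel fermented"])
print(associations["stock"])
```

A term that ends up associated with several categories, such as "stock" here, corresponds to a term with more than one sense.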
As described more fully below, the search and retrieval
system of the present invention maps search queries to all
senses, and presents the results of the query to reflect the
contextual mapping of the query to all possible senses. In
one embodiment, the search and retrieval system presents
the results relative to a classification system to reflect a
context associated with the query result. For example, if the
user search term is "stock", the search and retrieval system
response may include a first list of documents under the
category "financial securities", a second list of documents
under the category "animals", and a third list of documents
under the category "race automobiles." In addition, the search and
retrieval system groups categories identified in response to
a query. The grouping of categories further reflects a context
for the search results. Accordingly, with contextual mapping
of the present invention, a user is pre