`
`[19}
`
`[11] Patent Number:
`
`5,454,106
`
`Bums et a].
`
`[45] Date of Patent:
`
`Sep. 26, 1995
`
`”111 1111 III "III "III Ill” [[11 111 1111 "III "III 11" M III" 11
`USUOS454106A
`
`[54]
`
`DATABASE RETRIEVAL SYSTEM USING
`NATURAL LANGUAGE FOR PRESENTING
`UNDERSTOOD COMPONENTS OF AN
`AMIBIGUOUS QUERY ON A USER
`INTERFACE
`
`[75]
`
`Inventors: Luanne M. Burns, Ridgefield, Conn;
`Ashe]: Malhotra, Croton-Onnfludson,
`N.Y.
`-
`
`[73]
`
`Assignee:
`
`International Business Machines
`Corporation, Armonk, N.Y.
`
`[21]
`
`[22]
`
`[51]
`[52]
`
`[531
`
`[561
`
`Appl. No.: 62,502
`
`May 17, 1993
`
`Filed:
`Int. Cl.“i
`U.S. C1.
`
`
`
`Field of Search
`
`G06F 17130
`3951600: 364141908; 3641963.];
`36419746; 3641DIG. 2
`3951600, 419;
`3641419
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`111976 Goddard eta}.
`3,932,948
`4341314
`4,241,402 1211980 Mayper,.1r.eta3.
`3951575
`
`4,506,326
`311985 Shaw el al.
`.............
`3951100
`4,553,222
`1111985 Kurland et a1.
`.
`3641401
`
`4,570,217
`211986 Allen et a1.
`3641188
`
`4,670,848
`611987 Scbmmm ........
`.. 395160
`
`4,674,042
`611987 Hernandez etal.
`3641401
`
`4,688,195
`811987 Thompson et a1.
`.. 395112
`.
`
`4,688,198
`811987 Wiggins ..........
`367146
`3951147
`4,723,211
`211988 Barker et a].
`
`364141908
`411988 Katayama et a1.
`..
`4,736,296
`511988 Hirayama et a1.
`..
`4,747,059
`3641496
`
`.. 395176
`4,803,641
`211989 Hardy et a1.
`
`3951146
`4,815,029
`311989 Barker et 81.
`
`.
`4,816,994
`311989 Freiling eta].
`395175
`511989 Tennantet a1.
`.
`364141908
`4,829,423
`
`4,829,427
`511989 Green
`..... 3641300
`
`511989 Yamatla ........... 3951133
`4,831,580
`
`4,349,398
`71.1989 Adj .............
`.. 364141921
`4,866,634
`911989 Reboh el al.
`....... 395176
`111990 Den- et a]. ............. 395176
`4,891,766
`
`364141903
`4,914,590
`411990 Loattnan et al.
`
`611990 Isle et a].
`395111
`4,931,950
`711990 Okajima et a1.
`364141908
`4,942,526
`711990 Lark etal.
`...... 395176
`4,943,932
`1111990 Amirghodsi et a].
`..
`3951275
`4,974,191
`
`1211990 Hutchins ............
`39512.63
`4,980,917
`......... 3641419
`211991 Asakawa
`4,994,967
`
`711991 lto ..............
`364141908
`5,029,085
`1111991 Pajak et a1. .......... 3951159
`5,065,347
`
`211992 Spielmal et 31.
`. 3951158
`5.088.052
`5.175.814 1211992 Articltetal.
`. 3951161
`5,301,317
`411994 Lehman et al.
`3951600
`
`..
`364141908
`5,369,575
`1111994 [amberti et a1.
`.................. 364141908
`5,377,103
`1211994 Lambefii el al.
`
`OTHER PUBLICATIONS
`
`M. D. Williams, “What Makes Rabbit Run?", 1984, Aca-
`demic Press Inc. (London) limited. pp. 333—352.
`K. Y. Whang, A Malhotra, G. Sockut, L. Burns Sr. K. Choi,
`‘Two—Dimensional Specification of Universal Quantifica-
`tion in a Graphical Database Query Language” [BEE Trans
`actions on Software Engineering, vol. 18, No. 3, Mar. 1992
`pp. 216-224.
`L. M. Burns, A Malhotra. G. Soekut, K. Whang “Aerial Ad
`Hoc Entity—Relationship Investigation and Learning", [El-3E
`Transactions on Soflware Eng. (USA vol. 18, No. 3 Mar.
`1992 pp. 115141159.
`
`Primary Examiner—Thomas G. Black
`Assistant Exmniner—Peter Y. Wang
`Attorney, Agent, or Finm—louis J. PerceDo
`
`[57]
`
`ABSTRACT
`
`Information is retrieved from a database using natural lan-
`guage (NL) queries and graphical interfaces and displays. A
`query is separated into tokens which are parsed into ele-
`ments. The parsed elements are matched to a list of database
`names. If all the parsed elements can be uniquely matched
`to database names, a database query is constructed and used
`to query the database to retrieve information and to present
`to a user. However. when an ambiguous query is encoun—
`tered, i.e. all of whose elements cannot be uniquely matched
`with database names, the understood components of the
`ambiguous query, i.e., those elements matching database
`names, are presented the user along with relationships of the
`elements to other names in the database so that the user can
`use an interface to explore the database by accessing and
`displaying this database information and these relationships.
`
`
`
`.—_.. 501E
`Ell REWIZED
`
`
`
`
`RECOGNIZED
`
`(‘NEI 229
`PICK BEST
`
`
`PARSE
`
`
`(2'3
` No)
`
`mast: /EI5
`SUEZHUL'I [F'LE
`
`HATCHED
`
`
`DISPLRY
`
`01 SP1. A? PMTIRL.
`
`DPT IONS
`'U'NDERS'IIIUD‘
`
`
`SCIIEM
`216
`
`CREME
`MTABASE
`
`QUERY RESUBHI
`GLERY
`
`
`DEFINE
`
`
`
`
`
`2&0
`
`Petitioner Microsoft Corporation - Ex. 1004, p. 1
`Petitioner Microsoft Corporation - EX. 1004, p. 1
`
`
`
`5,454,106
`Page 2
`
`The interface can take the form of views on a graphical
`interface. Using the displayed information,
`the user can
`create associations between database names and compo-
`nents not understood in the query. In other words, database
`names can be associated with the natural language words or
`phrases. These associations are added to the system knowl-
`edge and used to respond to future queries. In this way. the
`
`i.e., by using the added associations the
`system learns.
`system is able to respond to queries that it was unable to
`respond in a satisfactory manner before the association was
`added.
`
`14 Claims, 7 Drawing Sheets
`
`Petitioner Microsoft Corporation - Ex. 1004, p. 2
`Petitioner Microsoft Corporation - EX. 1004, p. 2
`
`
`
`US. Patent
`
`Sep. 26, 1995
`
`Sheet 1 of 7
`
`5,454,106
`
`{05
`
`100
`
`PROCESSOR
`
`DICTIONARY
`L35
`
`
`
`
`I _
`
`PARSER
`{20
`
`SCANNERITOKENIZER
`I40
`
`DATABASE
`130
`
`STORAGE
`125
`
`Petitioner Microsoft Corporation - Ex. 1004, p. 3
`Petitioner Microsoft Corporation - EX. 1004, p. 3
`
`
`
`US. Patent
`
`Sep.26,1995
`
`Sheet 2 of 7
`
`5,454,106
`
`
`
`._.DZPDQZH
`
`nunkmmwnza
`
`mmm
`
`hmwmMOE
`
`wmma‘m
`
`
`
`mmmmmmnumwmaiuzu
`
`mzawpmm
`
`mam
`
`
`>¢4mmHmw0m_vfim
`
`_D._.Dz
`
`xummmzuP¢z
`
`Iuh¢z
`
`
`
`mm2¢zmw¢m¢H¢Q
`
`E.mhzmzwufi
`
`omm
`
`3m
`
`
`
`E..mezzuu
`
`
`
`>mu30um¢m¢h¢m
`
`amazon—DUN...
`
`
`44¢~4¢Hhm¢m>¢4mmHn
`
`Dom
`
`0mm
`
`«FinU>m=mhum
`
`
`
`k¢4ammm.w
`
`ohm
`
`owmmeuzau
`
`mZHuwm
`
`numuunm>mm3a«zwxum
`
`
`
`um¢m¢h¢nmanoaxm
`
`mw¢mmu
`
`352085
`uzam.r||_fim
`
`w4¢¢432\mznw
`
`mum—IDES;
`
`
`
`QMIUhczwh¢umu
`
`HQ4wm2¢z
`
`.QDDHWEMQZD.
`
`dzwIUm
`
`pfizmnmmm
`
`>muna
`
`Petitioner Microsoft Corporation - Ex. 1004, p. 4
`Petitioner Microsoft Corporation - EX. 1004, p. 4
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`US. Patent
`
`Sep. 26, 1995
`
`Sheet 3 of 7
`
`5,454,106
`
`320
`
`535
`
`328 DEPARTMENT-ff 322
`
`52?
`
`355“
`WORK LOCATION
`
`
`BO
`
`D EPA RTMEMT DEPARTMENT
`
`NO
`
`NAME
`
`
`
`3
`
`10\
`
`3T4
`
`3%?
`EMPLOYEE
`
`
`
`
`
`EMPLOYEE DEPARTMENT
`
`330
`
`
`
`Petitioner Microsoft Corporation - Ex. 1004, p. 5
`Petitioner Microsoft Corporation - EX. 1004, p. 5
`
`
`
`US. Patent
`
`Sep. 26, 1995
`
`Sheet 4 of 7
`
`5,454,106
`
`mzczHES111%
`
`¢.o_n_
`
`m:
`
`Dav
`
`
`
`mmvflfffff
`
`”:225mg;Emmv
`mz¢zBu:HEv
`ma:3%Emmv
`
`E328%Enew
`39:ENEE36
`
`mm;
`
`mm;/mznfiazuuI261
`
`omv
`
`m3.
`
`3%
`
`vow
`
`wt»
`
`QN¢Nthum»...mz¢z
`
`Illulllill!
`
` 33>.mzu.pxmIIIII
`“:2Faunaim
`L.«Mia—H3
`nI“I“IIII
`mozEwEw
`
`OS»
`
`31»
`
`m:
`
`omv
`
`mmw
`
`mmv
`
`3%
`
`omv
`
`mhv
`
`NNV
`
`
`
`Sawhauzuzuuqmw
`
`MZDHHEZDU2:2;
`
`wzngumfiuma
`
`wnv
`
`ht“
`
`mnw
`
`9:.
`
`Petitioner Microsoft Corporation - Ex. 1004, p. 6
`Petitioner Microsoft Corporation - EX. 1004, p. 6
`
`
`
`
`
`US. Patent
`
`Sep. 26, 1995
`
`Sheet 5 of 7
`
`5,454,106
`
`Dom
`
`omm
`
`ovm
`
`0mm
`
`com
`
`\\2m
`mmzqzM753,9?.
`
`IUPQZbulk
`
`mmzqzMJmE.mad Iota:
` mlfiqh.and 10.2251.;wmz¢z1.5.3MZHD...JDLDZHZa‘m—z
`
`
`
`0mm.1.53mmzqzHjmqfiand.
`
`
`
`uhqzfizjm mmIUPd‘zHQIHmzqz
`«zwluw4.ZSMDZEDDMU\WQHIMZD:¢JMM
`1:335>a.ZHmmzqz
`mu£¢2mimic...mtqumfufim
`
`
`>mQMFUMZZDQ
`DZQPM:BZHHJDmmm\CQIEME
`
`
`3m?).q
`
`
`.EIHmmzqz94H:
`
`mmm
`
`mvm
`
`mmloqu
`
`wm2¢z
`
`mom
`
`Petitioner Microsoft Corporation - Ex. 1004, p. 7
`Petitioner Microsoft Corporation - EX. 1004, p. 7
`
`
`
`
`US. Patent
`
`Sep. 26, 1995
`
`Sheet 6 of 7
`
`5,454,106
`
`v
`H
`0
`
`v
`0.!
`
`O
`
`TABLE4
`
`600
`J(//
`
`FIG.6
`
`6H
`
`TABLE3_
`
`
`625([’
`
`623
`
`Petitioner Microsoft Corporation - Ex. 1004, p. 8
`Petitioner Microsoft Corporation - EX. 1004, p. 8
`
`
`
`US. Patent
`
`Sep.26,1995
`
`Sheet 7 of”?
`
`5,454,106
`
`FIG. 7
`
`700
`
`/
`
`IDENTIFY NAMES UF TABLES
`
`780
`
`CUMPRISING THE VIEW
`
`IDENTIFY MEANINCFUL JUINS BETWEEN SLECTED
`
`730
`
`TABLES DR SPECIFY NEW JDINS BETWEEN TABLES
`
`DEFINE SELECTIDN CRITERIA DN
`FIELD VALUES EIF SELECTED TABLES
`
`7
`
`4‘3
`
`SELECT FIELDS DF SELECTED TABLES
`TU BE INCLUDED IN THE WIEW
`
`750
`
`SELECT DR INPUT NL WEIRD DR PHRASE
`TD ASSDCIATE WITH THE CREATED VIEW
`
`7 0
`6
`
`STDRE THE ASSDCIATIDN [IF THE VIEW &
`THE NL CDNCERT ABDVE IN MEMDRY
`
`770
`
`Petitioner Microsoft Corporation - Ex. 1004, p. 9
`Petitioner Microsoft Corporation - EX. 1004, p. 9
`
`
`
`5,454,106
`
`1
`DATABASE RETRIEVAL SYSTEM USING
`NATURAL LANGUAGE FOR PRESENTING
`UNDERSTOOD COWONENTS OF AN
`AMLBIGUOUS QUERY ON A USER
`INTERFACE
`
`FIELD OF THE INVENTION
`
`This invention relates to the field of gaining access to
`computer database information by using a novel combina-
`tion of natural language and graphical queries and displays.
`
`BACKGROUND OF THE INVENTION
`
`The prior art includes many computer systems that allow
`the user to obtain information from databases by entering a
`nattu‘al
`language query or command. Examples include
`Intellectfl], Natural Language from NLlfz], and Language
`Aceess from IBM[3]. (language Access“ is a trademark of
`the IBM Corporation.) These prior art systems generally
`follow the same method. First a natural language query is
`posed to the system using some sort of interface like a
`computer keyboard and screen. The system runs the input
`through a scanner or tokenizer that breaks the natural
`language (NL) query into individual words or tokens and
`looks up each worda'token in a system dictionary. The system
`then uses a NL parser that parses the query into its elements.
`The output of the parser is organized as a parse tree that
`shows the relationship between the elements. The parser
`may also provide additional information about each parse
`tree element, called element attributes, that might include:
`the parse tree element part of speech. its tense, under any
`parse one element synonyms, hypouyms, and hypernyms. A
`matching step is then performed where one or more parse
`tree elements anchor attributes are matched to names in the
`database. For a relational database. the names would include
`table and table field names. If the NL query can be com—
`pletely and unambiguously parsed and if the relevant ele-
`ments can be matched to the database names and, further, if
`the NL query can be transformed into a complete and correct
`database query then the desired information is retrieved
`from the database and displayed in some format on the user
`interface (e.g., computer screen). However, if the query
`cannot be unambiguously parsed or if there is a partial or
`multiple match between the parse tree elementis] and the
`database names, or a correct database query cannot be
`constructed, then the system is unable to ‘hnderstand" the
`user request. Incorrect database information or no informa»
`tion at all will be retrieved in these cases.
`
`There are many ways in which the system can fail to
`“understand" the user request. First, the scannen'tokeniaer
`may not recognize one or more wordsltokens of the NL
`query if, for example, one or more of the wordsftokens (or
`their synonyms) making up the NL query do not match the
`entries in the system dictionary. Second, the parser may fail
`to correctly parse the natural language input. This can occur
`if the natural language input has a structure which the parser
`does not recognize. Alternatively, the parser can fail by
`yielding multiple parses. This can occur even for relatively
`simple NL queries.
`The prior art tries to resolve these problems in a number
`of ways. Often the prior art asks for clarification. Clarifica»
`tion is helpful if the natural language query can be resolved
`by using a different word or by defining the misunderstood
`word. If the system does not understand the syntactic
`structure of the query, the system may ask the user to clarify
`
`10
`
`15
`
`20
`
`35
`
`40
`
`45
`
`5t}
`
`55
`
`60
`
`65
`
`2
`the query by rephrasing the NL query in an understandable
`form. However, in case of multiple parses, the system must
`then decide to which of the possible parses it should
`respond. Several heuristics are used to determine this. For
`example, the parse that best matches the names in the
`database may be selected. However,
`these heuristics are
`often no better than guesswork.
`Failures can occur in the prior an even after a single
`correct parse. In these cases, some or all of the elements in
`the parse u-ee cannot be matched to the names in the
`database. In these instances. the process fails and a database
`query cannot be developed to retrieve the desired informa-
`tion from the database. The system can still ask for clarifi—
`cation or rephrasing but it is very ditiicult in this case to tell
`the user how to change the query. Repeated, non-specific
`requests to rephrase the query can quickly discourage the
`user and cause the system to be rejected. To avoid this, some
`prior art guesses at the meaning of the natural language
`query. Guessing sometime permits information retrieval
`from the database, but the user has no way of telling if the
`information retrieved and presented is the correct system
`response or not. Guessing and presenting the wrong infor-
`mation can rapidly cause the user to lose faith in the system
`and stop using it.
`
`STATEMENT OF PROBLEMS WITH THE
`PRIOR ART
`
`Understanding nanu‘al language using computer recogni-
`tion systems is very difiicult and is still the subject of a great
`deal of ongoing research. The prior art fails to understand a
`largefraction of natural language queries and fails to accu—
`rately translate them into database queries. As discussed
`above,
`this can occur due to a variety of causes: not
`recognizing individual wordsltokens or phrases, not recog—
`nizing the syntactic form employed, and not developing a
`proper parse. Further, at times the prior art fails to query the
`database information even though the input query can be
`parsed unambiguously because it cannot match all
`the
`elements in the parse tree unambiguously to database names
`or is unable to develop a correct database query from the
`matched elements. In all of these situations, no information
`is retrieved that responds to the input query or, worse yet, the
`retrieved infomation is erroneous.
`
`The prior art does not improve with usage. There is no
`capability to “teach" the prior an new concepts. The prior art
`systems do not improve with use over time because they
`cannot “learn“ to retrieve database information using new
`and different input words and NL phrases.
`Finally, and most importantly, the prior art cannot explain
`itself to the user. In case of partial understanding, it is unable
`tell the user what it has understood. Neither can it explain
`what concepts are included in the database that can be
`queried. Therefore. prior art systems lack the ability to teach
`the user to use system more effectively.
`
`OBJECTS OF THE INVENTION
`
`An object of this invention is an improved, user-friendly
`apparatus and method that enables a user to respond to
`ambiguous queries.
`Another object of this invention is an improved database
`query system using a combination of natural language and
`graphical displays and operations.
`Another object of this invention is an improved combined
`natural language and graphical data retrieval system and
`method that resolves ambiguous input queries by displaying
`
`Petitioner Microsoft Corporation - Ex. 1004, p. 10
`Petitioner Microsoft Corporation - EX. 1004, p. 10
`
`
`
`5,454,106
`
`3
`a subset of the database organization according to what is
`understood of the query and by allowing the user to further
`explore the database using the graphical capabilities.
`A further object of this invention is an improved data
`retrieval system and method that learns in the process of
`resolving ambiguous NI. queries so that those queries can
`subsequently be understood.
`
`SUMMARY OF THE INVENTION
`
`The present invention is a novel method and apparatus for
`retrieving information from a database using natural lan—
`guage (NL) queries and graphical interfaces and displays.
`The system is able to resolve ambiguities that result from the
`N1 query by presenting the user with information in the
`database that matches wordsr'rokens or parsed elements of
`the query. Specifically. the system presents the user what it
`“understands" of the query in the preferred form of a
`graphical display. Using the graphical display and what the
`system understands as a starting point1 the user explores the
`database by accessing and displaying database information
`and information relationships. The user is also able to
`specify queries directly by using graphical operations on the
`information presented. Finally, the user can specify a data-
`base “view” by using graphical operations on the display
`and establishing an association between the view and 21 NL
`word or phrase. This association is stored in the system and
`used to match elements in the parse tree for future queries.
`In this way. the system learns and improves with use and
`over time. As associations are added to it. the system can
`respond to an ever increasing set of NL queries.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIG. 1 is a block diagram showing the hardware compo-
`nents of a preferred natural language data retrieval system.
`FIG. 2 is a flow chart showing the steps of the process of
`resolving ambiguous queries and retrieving matched infor-
`mation from the natural language data retrieval system.
`FIG. 3 shows one preferred database structure used to
`store information that is to be retrieved using the present
`invention.
`
`FIG. 4 shows a data structure of the present invention
`used to store the database names.
`
`FIG. 5 is a flow chart showing how database names are
`matched to input query wordsr'tokcns or parser result ele-
`ments to resolve ambiguous input queries.
`FIG. 6 is an example of a preferred schema display
`showing the user only what the system understands of an
`ambiguous query.
`FIG. 7 is a flow chart showing how the present invention
`learns to subsequently respond to an ambiguous NL query
`by associating a NL word or phrase with a selected set of
`information within the database.
`
`DETAILED DESCRlPTION OF THE
`INVENTION
`
`FIG. 1 shows a preferred computer system apparatus 100
`of the present invention comprising a user interface 105. a
`processor 110, and a storage memory 125. The user interface
`105 can he a standard keyboardICRT display. a speech
`recognition interface. andlor an interface to another system,
`such as another computer, that requests information from the
`database 130 resident on the storage memory 125. A dictio-
`nary 135 is also resident on the storage memory 125 or
`alternatively in main memory 115. User interfaces 105 like
`
`10
`
`15
`
`20
`
`25
`
`3t]
`
`35
`
`45
`
`SD
`
`55
`
`60
`
`65
`
`4
`interfaces, are well
`these. along with other equivalent
`known. The processor 110 can be a standard personal
`computer, mainframe computer or other computer system
`that is also well know. The processor 110 has some main
`memory 115, a parser program 120, and a seannen‘tolrenizer
`program 140. The storage memory 125 can be a disc drive
`or other equivalent bulk storage apparatus. Alternatively, the
`database 130 andlor dictionary 135 can be stored in the main
`memory 115 and the storage memory 125 can be omiued.
`One preferred system for embodying this invention is an
`IBM P512 Model 95 computer equipped a microchannel
`speech accelerator card #6306. a graphical CRT display. 16
`megabytes of random access memory. and 300 megabytes of
`storage memory. One preferred parser 120 is Lbeparserin the
`IBM Language Access product [3]. and one preferred scan—
`nerltokenizer 140 is the scannerftokenizer in [3]. Equivalent
`parsers 120 and seannerltokenizers 140 may also be used.
`FIG. 2 shows a. flow chart of the method 200 used by the
`present apparatus shown in FIG. 1.
`A user 202 (a human or an apparatus capable of querying
`a system database) interacts with the system interface 205.
`The interface 205. embodied in FIG. 1 as component 105,
`receives an input query. preferably a natural language (NL)
`query. from the user. The NL query can be in the form of text
`input to a keyboard interface 205 or auditory speech to a
`speech recognition interface 205. The user interface 205
`ultimately converts the user input query into a system
`recognizable format so that it can be scanned by the scanner!
`tokeru'zer 210.
`Block 210 shows the scannen'tokenizer. A seannen’token-
`izer takes the input (NL) query and breaks it up into
`identifiable word or token components. [A token is a sym-
`bol. like a NL word, that can be recognized by matching it
`to entries in a system dictionary 135.) For typed input. the
`wordltokens are identified by being delimited by blanks. For
`speech input. the wordftoken may be delinrited by pauses.
`Other ways of identifying speech tokens e.g., matching of
`words or partial words. are well known in the art.
`In one embodiment, if one or more words/tokens of the
`NL query are scanned and not understood (e.g.. the words!
`tokens fail
`to match data in the scannen’tokenizer 210
`dictionary), feedback is given to the user 202 through the
`user interface 205 by block 208. Block 208 typically would
`indicate that the NL query is not understood by the system
`100 and would request an alternate NL query. The user 202
`would continue to input alternative N1. queries until all the
`wordsftokens are understood by the scannen'tokcnizer 210.
`At this point, the scannen’tokenizer 210 would convert the
`wordsr’toltens of the NL query into a form recognized by the
`parser 212.
`The case where some of the wordsr‘tokens of the NLqucry
`are never recognized by the scannen'tokenizer 210. for our
`purposes. is called a scanner or tokenizer ambiguity 211.
`Resolution of the tokcnizer ambiguity 211 is described
`below.
`
`The parser 212 can be any known parser that is capable of
`parsing the user input into elements. As stated above, the
`parsed query elements can have information about the
`relationships of the components with in the NL query and
`possibly additional attribute information about the NLquety
`components.
`The parser 212 results can take three forms: many parses
`of the NL query. one parse of the NL query. and no parses
`of the NLqucry. For example, many parses can occur where
`one or more wordltoken in the N1. query has more than one
`meaning or can be used as more than one part of speech in
`
`Petitioner Microsoft Corporation - Ex. 1004, p. 11
`Petitioner Microsoft Corporation - EX. 1004, p. 11
`
`
`
`5,454,106
`
`5
`
`the NI. query. Other examples are also known. The one parse
`case occurs when there is only one meaning and one
`organization of the NLquery components that the parser 212
`is able to provide. The no parse case 213 might occur if the
`query contains a NL construction that the parser does not
`recognize or if there is a typographical error in NL query
`which yields recognizable wordsltokens that do not make
`sense in a sentence. For example, misspelling “show" as
`“how".
`
`Ifthere are many parse results, an algorithm is used in box
`225 to select the best or most likely parse of the NL query.
`This can be done in two ways (22? and 229). First 229, the
`system can display the misunderstood candidate parse
`results (box 230) to the user 202 through the user interface
`205 with a request to the user 202 to select the correct parse.
`Alternatively, the system can select one of candidate parse
`results using a heuristic 227. There are many known heu-
`ristics 227 for doing this (for example using statistical
`analysis to select the most probable parse), but often the
`heuristics boil down to the system guessing which candidate
`parse result represents the user’s intention. The elements of
`a single successfill parse or a selected candidate parse (if
`there is more than one parser result) are matched to names
`in the database in box 214.
`
`The last parser result form, the no parse result. is called a
`parser failure 213 for our purposes. The case of the parser
`failure 213 will be discussed below.
`
`The database 130 of the system 100 is stored in the system
`memory (typically in bulk storage 125) and can have the
`structure of databases generally known in the art. See FIG.
`3. The preferred database comprises records having fields
`(or attributes) that contain values. Collections or records are
`called files or tables. The “Harries" in the database are the
`names of the tables (these names are the same as the names
`of the records in the table) and the names of the fields of the
`table. They also include meaningful join names and database
`view names. These are explained below.
`For example, data about employees in a company is
`organized in a table 310. The table name 312 is “Employee".
`(Likewise, each given record 318 of the table 310 named
`“Employee" is canted “Employee” or “Employee Record".)
`Each given record 318 of the table named "Employee" has
`one or more fields. Each of the plurality of fields also has a
`name. In the example, three field names (also called column
`names) are shown for the “Employee" table 310: Employee
`number 314, Employee department 315, Salary 316, etc.
`Tables in a relational database may be related to one
`another. In the relational database case, there are two or
`more tables organized like the employee table 310. That is,
`each of the tables in the relational database has table (record)
`names and field (column) names for each field in a given
`record. For example, there can be an additional table 320 in
`the database with the name “Depanment” 322. The “Depart-
`ment" table also has records with one or more fields. Each
`of the fields also have names. In this example, the fields
`shown are: “Department number 325, “Department name"
`326, “Manager" 327, etc. When a given field of two or more
`tables in a relational database represents the same type of
`information, there is a possible join between the two tables.
`In the case 350, this means that infonnation can be accessed
`from the “Department" table through the "Employee" table
`by using the common field (here “Employee Department"
`315) as a key. A subset of the possible joins are designated
`meaningful joins, such as 350, because they represent a
`useful relationship between the tables. Using meaningful
`joins to access information in relational database tables is
`
`it)
`
`15
`
`20
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`6
`well known. In the preferred embodiment, the meaningful
`joins 350 of the relational database 130 are also given names
`355.
`
`Using relational database techniques, other tables, called
`extended tables or views 330, can be created. The views are
`created using information from other related tables (310,
`320) in the database 130. The newly created tables (views)
`also have a table (view) name 332 (here “Extended
`Employee") and fields with names (here “Employee Num—
`ber” 334, “Department Name" 335, “Manager" 337. and
`“Salary” 336). Views (extended tables) can be created by the
`system designer or alternatively by the user (see box 280 of
`FIG. 2). The steps in defining a view are exactly the same as
`the steps in defining a query except dint information is not
`used to access the database (as in query) but stored in the
`system for future use.
`FIG. 3 also shows examples of the values that reside in
`fields of the database tables. A given record, 318, 328, and
`338 in tables 312, 322, and 332, is shown to have the values:
`6000 for the field named “Employee Number", Engineering
`for the field named “Department Name", Smith for the field
`name “Manage ", and $l05,000 for the field named “Sal-
`ary“, respectively.
`Refer again to FIG. 2. in box 214, the elements of the
`results of the parse can be matched to database names in a
`number of ways that are known in the art. For example. one
`or more selected elements of the parser output or their
`synonyms can be matched against a list of names in the
`database i.e., for a relational database,
`the names of the
`tables, the names of the fields or columns, the names of the
`meaningful joins and the names of the views [1].
`There are four possible results of the matching step in box
`214:
`
`1. None of the parser result elements match any database
`names.
`
`2. Each of the parser result elements matches a single
`database name.
`
`3. Some parser result elements (but not all) match data-
`base names.
`
`4. Some or all of the parser result elements match more
`than one database name.
`For the first result, the no match case 219, a message is
`given to the user 202 through the user interface 205 using a
`function like that in box 208. For the second set of parser
`results, the individually matched case, the matched parser
`result elements are convened into a database query in box
`216. Results 3 and 4 are called match ambiguities 215 and
`will be discussed below.
`When all of the parser result elements are successfully
`matched to names in the database, box 216 attempts to
`convert the parse tree and the matched elements into a
`database query. A database query is a formal specification
`used to retrieve data from a database. Thus, the output of box
`216 is a set of field and record names in the database along
`with join specifications and some logical selection of the
`values of the fields in those records. For example, a simple,
`single table database query might take the form of: Select the
`“Department" (field name 315) from “Employee" (table!
`record name 312) with “Salary" (field name 316) greater
`than “$100,000" (logical lower limit of the value in the
`“Salary" field.) This is well known prior art. See.
`for
`example, [3}.
`Sometimes the system fails to conven the successfully
`matched elements of the parser into a database query in box
`216. For our purposes, this is called a query conversion
`ambiguity 217.15. query conversion ambiguity 217 can occur
`
`Petitioner Microsoft Corporation - Ex. 1004, p. 12
`Petitioner Microsoft Corporation - EX. 1004, p. 12
`
`
`
`7
`
`8
`
`5,454,106
`
`in a number of ways. For example, if a single table and query
`name is matched an ambiguity occurs if there is no field in
`the selected table with a name that matches a field nante
`selected by the database query. The handling of query
`conversion ambiguities 217 is described below.
`In box 20, database queries, successfully converted from
`matched elements, are used to retrieve data from the data-
`base. The data is displayed (or provided} to the user 2202
`through the user interface 205. This is handled by a database
`management system using well known methods.
`In the above discussion. specific preferred examples of
`the user 202, user interface 205, searmerftokenizer 210,
`display feedback 208, parser 212, picking the “best parse”
`225, displaying misunderstood parsed options 230, match—
`ing elements 214, converting to database queries 216, and
`retrieving and displaying database data 220 where pre~
`sentcd. These aspects of the system have many alternative
`embodiments in the prior art that are envisioned within the
`scope of this invention. However, the prior art, as mentioned
`above, fails to resolve the many ambiguities (21.1, 213. 215,
`and 217} that may result in servicing a user input query,
`specifically the scanner/tokeniaer ambiguity 211, the parser
`ambiguity 213, the matching ambiguity 215, and the query
`conversion ambiguity 217 that are identified above.
`The present invention introduces a novel solution to
`resolving these ambiguities and furthermore is able to
`"learn" how to subsequently retrieve database information
`requested by input queries that initially produce ambiguities.
`These novel features are: selecting words/tokens or parser
`elements that the system 109 understands (create matched
`names list) 240, displaying the understood wordsltoltcns or
`parser result elements to the user 250, and enabling the user
`to define new concepts so that ambiguities associated with
`the NL query do not arise in the future 200, i.c., the system
`learns.
`Box 240 initially processes the scanning 211, parsing 223,
`match 215 and conversion 21';l ambiguities 211.
`In the event diat some wordsr'tokens of the input query are
`not recognized 211 [matched to entries in the dictionary} or
`the wordst‘tokens of the input query cannot be parsed 213.
`the input query wordsl‘tokens themselves are matched to the
`names in the database. In the case of a match ambiguity 215,
`the parser result elements that matched database names
`andror have multiple matches to database names in hex 214
`are passed to box 240. In the case of a conversion ambiguity
`217, all the parser result elements are passed to box 240.
`Box 240 uses matching techniques similar
`to those
`described in box 214. However, only those wordsltokens or
`parser elements that match database names (i.e., table, field,
`meaningful join or view names) are “understood” by the
`system 10!). Those matching database names pass as output
`from box 240. The other wordltokens or parser result
`elements, those not having a match to a database name, are
`ignorcdldiscardcd.'1‘hereforc, the output of box 240 is a list
`of database names that match the recognised components.
`(A component is a wordltoken or a parser result element.) In
`boxes 240 and 250, there may be one or more matches for
`each query component.
`Box 250 resolves the ambiguities created in the scanner!
`tokenizer 210 {ambiguity 211), parser 212 (ambiguity 213),
`the matching elements 214 (ambiguity 215), a