Shwartz et al.

[11] Patent Number: 5,197,005
[45] Date of Patent: Mar. 23, 1993

[54] DATABASE RETRIEVAL SYSTEM HAVING A NATURAL LANGUAGE INTERFACE

[75] Inventors: Steven Shwartz, Orange; Claudio Fratarcangeli, Trumbull; Richard E. Cullingford, Monroe; Gregory S. Aimi, North Haven; Donald P. Strasburger, Stratford, all of Conn.

[73] Assignee: Intelligent Business Systems, Milford, Conn.

[21] Appl. No.: 345,966
[22] Filed: May 1, 1989
[51] Int. Cl.5: G06F 15/40
[52] U.S. Cl.: 364/419; 395/600; 364/DIG. 1; 364/274; 364/274.2; 364/274.3; 364/274.8
[58] Field of Search: 364/513, 419, 200 MS File, 364/900 MS File; 395/600, 12

[56] References Cited

U.S. PATENT DOCUMENTS

4,736,296    4/1988    Katayama et al.     364/419
4,811,207    3/1989    Hikita et al.       364/900
4,839,853    6/1989    Deerwester et al.   364/900
4,914,590    4/1990    Loatman et al.      364/419
4,930,071    5/1990    Tou et al.          364/300
4,931,935    6/1990    Ohira et al.        364/419
4,943,933    7/1990    Miyamoto et al.     364/513
4,974,191    11/1990   Amirghodsi et al.   395/700
4,994,967    2/1991    Asakawa             364/419

FOREIGN PATENT DOCUMENTS

63-219034    9/1988    Japan

OTHER PUBLICATIONS
`
“Inside Computer Understanding”, Schank and Riesbeck, Erlbaum Press, 1981, Chapter 14.
“The LIFER Manual: A Guide to Building Practical Natural Language Interfaces”, by Gary G. Hendrix, SRI International, Technical Note 138, Feb. 1977.
“Human Engineering for Applied Natural Language Processing”, by Gary G. Hendrix, SRI International, Technical Note 139, Feb. 1977.
“Applied Natural Language Processing”, Shwartz, Steven C., Petrocelli Books, Princeton, N.J., 1987.

Primary Examiner: Michael R. Fleming
Assistant Examiner: Debra A. Chun
Attorney, Agent, or Firm: Barry R. Lipsitz
[57] ABSTRACT

A database retrieval system having a natural language interface is provided. A database developer creates a knowledge base containing a structural description and a semantic description of an application database from which data is to be retrieved. A database-independent, canonical internal meaning representation of a natural language query is produced. An expert system accesses structural and semantic description information in the knowledge base and, in accordance with predefined rules, identifies from said information the database elements that are necessary to satisfy the query represented by the internal meaning representation. A database query is generated among the database elements, enabling the retrieval and aggregation of data from the database to satisfy the natural language query. A debugging facility derives an external meaning representation from the internal meaning representation. The external meaning representation is database-independent, canonical, and easily understandable to the database developer. It enables the database developer to comprehend the internal meaning representation and verify that a natural language query is properly interpreted by the system to effect the accurate retrieval and aggregation of data from the database. The external meaning representation comprises entities and constraints relating to the entities, without reference to factual or linguistic relationships between entities that would prevent the external meaning representation from being easily understood.
`
Winston, “Natural Language Understanding”, Artificial Intelligence, Ch. 9, pp. 291-334.
Rich, “Natural Language Interfaces”, IEEE Computer, Sep. 1984, pp. 39-47.
Kao et al., “Providing Quality Responses with Natural Language Interfaces: the Null Value Problem”, IEEE Trans. Software Eng., vol. 14, No. 7, 1988.
“Natural Language Interfaces: Benefits, Requirements, State of the Art and Applications”, by John L. Manferdelli, A.I. East, Oct. 1987.
`
`41 Claims, 11 Drawing Sheets
`Microfiche Appendix Included
`
`(11,603 Microfiche, 47 Pages)
`
Page 1 of 29

GOOGLE EXHIBIT 1013
`
`
`
U.S. Patent    Mar. 23, 1993    Sheet 1 of 11    5,197,005

FIG. 1 (Sheet 1 of 11): Block diagram of the system. Legible labels: windows/menus; natural language; user profile maintenance; data dictionary analyzer; knowledge base editor; knowledge base reports; nominal data analyzer; debugger (external MR); navigator & query language generator; reporter; DB access system; application databases.
`
`
`
FIG. 2 (Sheet 2 of 11): Diagrammatic illustration of a relational database.
`
`
`
FIG. 3 (Sheet 3 of 11): Flowchart of the translation of a natural language query into an internal meaning representation (MR) and an external MR. Legible labels: natural language interface (code based on grammar for internal MR); internal MR to DBES; debugger; external MR (308); display to developer.
`
`
`
FIG. 4a (Sheet 4 of 11): First part of the flowchart of the steps a system developer takes to create and maintain a knowledge base. Legible steps: analyze data dictionary; define join criteria; modify column characteristics; define data groups; associate words and phrases; add, delete, modify indices; add, delete, modify columns; delete tables; delete application databases; establish user profiles; define DBM authorization names; establish system profile; subtype maintenance; refine column references.
`
`
`
FIG. 4b (Sheet 5 of 11): Second part of the knowledge base creation and maintenance flowchart. Legible steps: maintenance; specify application characteristics; abbreviations; define word-to-data class rules; data class-to-table rules; definitions; generate reports; knowledge debugger.
`
`
`
FIG. 5 (Sheet 6 of 11): Entity-relationship diagram for a knowledge base. Legible entities: abbreviation; expression; word-to-data class rule; concept index; nominal data column; column entry; display column; join column; data class-to-table rule; nominal data definition; nominal status; nominal definition lexicon; column display column; index columns; indices; key columns; keys; table display column; table attributes; table row count; table calendars; table; database node.
`
`
`
FIG. 6a (Sheet 7 of 11): First part of the column selection flowchart. Legible steps: enter; locate initial indexed columns; expand candidate set to include join column domain; test subtypes (does query contain a constraint on, restriction concept for, or direct reference to the next column? if not, delete column; last column?); test data-class-to-table rules (rule specified for next data class? rule returns tables? do candidate columns include such tables? if so, delete all columns not from such tables; last data class?); match characteristics and constraints; select preferred columns (constraint type or output; component of primary key).
`
`
`
FIG. 6b (Sheet 8 of 11): Second part of the column selection flowchart. Legible steps: match time constraints (perfect time match? if yes, delete columns without perfect match; if no, group columns, if possible, to make a perfect match and delete those not used); test master table quantifiers (quantification present? if yes, select master files for concepts that index the primary key column thereof; where no primary key is indexed, do not choose columns from those tables); test for detail file preference (detail constraint attached to concept? if yes, choose detail file column); preserve columns from tables selected at box 618.
`
`
`
FIG. 6c (Sheet 9 of 11): Third part of the column selection flowchart. Legible steps: test for summary file reference (do any candidate columns include time attributes? if yes, select columns with time attributes unless the table for another column was selected at box 618); eliminate foreign-key-only tables (candidate columns from more than one table? non-foreign key field? if so, eliminate columns from key-only tables); minimize tables (candidate columns from more than one table? if so, choose columns from the table with the highest slot count); do concepts with columns from multiple tables remain? if so, create the set of all possible combinations of tables; select optimal navigation path.
`
`
`
FIG. 6d (Sheet 10 of 11): Final part of the column selection flowchart. Legible steps: multiple tables selected by concept rule? if yes, test only paths with all such tables; order candidate navigation paths by rules; delete columns from tables not in best path; add columns to be displayed; new tables introduced? (yes: go to 688); compute break levels; store break levels.
`
`
`
FIG. 7 (Sheet 11 of 11): Semantic network diagram of the various concepts which can be included in an internal meaning representation.
`
`
`
`
`
`DATABASE RETRIEVAL SYSTEM HAVING A
`NATURAL LANGUAGE INTERFACE
`
This application includes a microfiche appendix, having 47 fiche with a total of 11,603 frames.

BACKGROUND OF THE INVENTION

The present invention relates to a database retrieval system, and more particularly to such a system having a natural language interface.
Business managers and staff require information to run their companies. Data processing departments of companies have been attempting to meet this information need since the early 1950s. The record keeping of most organizations is now computerized, and an abundance of data of all kinds, often describing transactions in minute detail, resides on the central computers of these organizations. In theory, all this data is available for review by employees of such companies. In practice, however, users of such information have faced serious obstacles in retrieving the information they need.
A frequent response to a user's request for data from a database is that the data is not stored in a way that enables it to be used to meet the user's need. Additionally, the complexity of current database systems requires a trained specialist to figure out how the data requested by a user can be retrieved from the database. This specialist must interpret the user's request or “query”, determine exactly what the user is looking for, and figure out how to get that information from the database. Then, once the data is retrieved, it must be formatted into a report that the user can use and understand.
In recent years, a type of database known as a “relational database” has come into widespread use within the business community. An “entity-relationship” model is often used when mapping a real world system to a relational database management system. The entity-relationship model characterizes all elements of a system as either an entity (e.g., a person, place, or thing) or a relationship between entities. Both constructs are represented by the same structure, referred to as a “table”.
A table is a collection of data organized into rows and columns, and represents a unit of a relational database. In an order-entry system, for example, entities will include parts and orders. Such information may be represented in two different tables. The relationship of which parts are requested by an order may be represented by a third table.
Thus, in applying the entity-relationship model, the entities of a system are identified and tables are constructed to represent the entities. Then, relationships between the entities are identified and the current tables are extended (or new tables created) to represent these relationships. Finally, the attributes of each entity are identified and the tables are extended to include such attributes. Those skilled in the art are well familiar with the application of the entity-relationship model to relational database management systems.
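As a rough illustration of the order-entry example and the entity-relationship steps above, the sketch below builds two entity tables and one relationship table in an in-memory SQLite database using Python's standard `sqlite3` module. The table names, column names, and sample rows are hypothetical; the patent does not prescribe a schema or a particular DBMS.

```python
import sqlite3

# In-memory database; all names and rows below are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Entity tables (one per entity, attributes as columns)
    CREATE TABLE parts  (part_id  INTEGER PRIMARY KEY, description TEXT, unit_price REAL);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, order_date TEXT);
    -- Relationship table: which parts are requested by which order
    CREATE TABLE order_parts (
        order_id INTEGER REFERENCES orders(order_id),
        part_id  INTEGER REFERENCES parts(part_id),
        quantity INTEGER,
        PRIMARY KEY (order_id, part_id)
    );
""")
conn.executemany("INSERT INTO parts VALUES (?, ?, ?)",
                 [(1, "bolt", 0.10), (2, "washer", 0.05)])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(100, "Acme", "1989-05-01")])
conn.executemany("INSERT INTO order_parts VALUES (?, ?, ?)",
                 [(100, 1, 500), (100, 2, 250)])

# "Which parts are on order 100?" must be answered through the relationship table.
rows = conn.execute("""
    SELECT p.description, op.quantity
    FROM orders o
    JOIN order_parts op ON op.order_id = o.order_id
    JOIN parts p        ON p.part_id   = op.part_id
    WHERE o.order_id = 100
    ORDER BY p.description
""").fetchall()
print(rows)  # [('bolt', 500), ('washer', 250)]
```

Even this small question needs two joins, which is the kind of structural knowledge an end user should not have to supply.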
In recent years, there have been proposals for providing a natural language interface to relational databases. An English language interface, for example, would enable unskilled users of a database to query the database for desired information, and receive such information without the need to rely on a trained specialist to interpret the query, access the database, generate a report, and communicate the report to the end user. Thus, a natural language interface would save enormous time and money for companies using relational databases, and would enable users with little or no computer experience to use a sophisticated database system by merely inputting (e.g., via a keyboard) a natural language (e.g., English) question.
An example of a natural language interface proposed in the past can be found in the article entitled Natural Language Interfaces: Benefits, Requirements, State of the Art and Applications, by John L. Manferdelli, A.I. East, October, 1987. This article describes a system in which an English sentence is converted into a grammatical structure (“parsed”), much like a sentence diagram. The diagrammed sentence is then translated into a “representation language” that is a hybrid of a semantic network and first order predicate logic. The representation captures time-dependent facts, quantified statements, tense information, and general sets, and is based on concepts contained in the original English sentence.
The representation language provided by the prior art system referenced above is complex, and not easily understandable even to a skilled user of the system. Thus, it is difficult for such a system to be implemented as a general purpose interface for any application database that might be desired. Customization of the interface to specific application databases was difficult and time consuming, and no means were provided for enabling a skilled user to easily comprehend the representation language produced by the natural language interface for a given query. Without such means, the building and testing of an interface for a particular application is extremely difficult and costly.
Various other articles have been published concerning software that is currently available to enable a natural language, such as English, to be translated into a representation language that can be used by a computer system to respond to a natural language query. For example, a program known as “McELI” is available for this purpose and is discussed in Inside Computer Understanding, Schank and Riesbeck, Erlbaum Press, 1981. Another program known as “LIFER” is described in the article LIFER: A Natural Language Interface Facility, by Gary G. Hendrix, SIGART Newsletter, Issue 61, 1977, pp. 25-26. Each of these programs will translate a natural language into another formal syntax, such as a representation language. However, to date the representation language syntaxes have been complicated and difficult to understand. Therefore, no means have been available to enable anyone but the most sophisticated computer programmers to utilize such languages in providing a natural language interface capability to desired applications, such as the retrieval of information from a database.
A particular problem in providing a natural language interface for a database resides in enabling the system to locate data responsive to a natural language query regardless of the words used in the original query. A primary objection of end users of most prior art database retrieval systems is that they have to learn the names of the database elements; i.e., if the term “salary” is used in the database, the end user would have to use the same term in order to retrieve salary information, and could not use synonyms such as “wage”, “earns”, “makes”, or “pay”. This problem is referred to as the “synonym problem”.
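A minimal sketch of the two mappings involved, assuming a hand-built lexicon: many words map to one concept (many-to-one), and one concept maps to several candidate columns (one-to-many). The dictionaries `WORD_TO_CONCEPT` and `CONCEPT_TO_COLUMNS` and the column names are hypothetical illustrations, not part of any system described here.

```python
import re

# Hypothetical lexicon: many surface words -> one concept (many-to-one).
WORD_TO_CONCEPT = {
    "salary": "salary", "wage": "salary", "wages": "salary", "pay": "salary",
    "earn": "salary", "earns": "salary", "make": "salary", "makes": "salary",
}

# Hypothetical concept index: one concept -> several columns (one-to-many).
CONCEPT_TO_COLUMNS = {
    "salary": ["employee.annual_salary", "payroll.gross_pay"],
}

def concepts_in(query: str) -> list[str]:
    """Distinct concepts mentioned in a query, in order of first mention."""
    found: list[str] = []
    for word in re.findall(r"[a-z]+", query.lower()):
        concept = WORD_TO_CONCEPT.get(word)
        if concept and concept not in found:
            found.append(concept)
    return found

def candidate_columns(query: str) -> list[str]:
    """Expand every concept in the query to all columns that might hold it."""
    return [col for c in concepts_in(query) for col in CONCEPT_TO_COLUMNS.get(c, [])]

print(concepts_in("How much does Smith make?"))  # ['salary']
print(candidate_columns("What wage does Smith earn?"))
# ['employee.annual_salary', 'payroll.gross_pay']
```

Resolving synonyms still leaves more than one candidate column; choosing among them is the separate data location problem discussed below.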
Some products have attempted to solve this problem
by having the system programmers define all of the
synonyms that can be thought of for each database element, and to program these synonyms into the system. Such a requirement makes the setup procedure of a natural language interface extremely cumbersome, and often impractical.

Another problem with providing a natural language interface for database retrieval stems from the fact that the end user does not know where desired information resides in the database. For example, some information would have to be retrieved from detail-level columns in the database, whereas other data would have to come from summary-level columns. The choice of which column(s) to use must be made by the system, since the end user is unable to specify the data location. This problem is referred to as the “data location problem”.

The assignee of the present application has marketed a product in the past which attempted to resolve the data location and synonym problems. That product included a built-in database expert system containing rules to resolve many-to-one relationships between words/phrases and concepts, and also to resolve one-to-many relationships between concepts represented in a natural language query and database columns. For example, words such as sales, sell, bought, purchases, and revenues contained in a query would be mapped to a concept known as “sales”. Then, the concept “sales” would be mapped to the various columns of a specific database containing sales information. The specific product involved was a turnkey wholesale distribution application that provided a natural language interface to a specific database. The natural language interface was custom designed for the specific database, and was not database independent. The system did not provide means to enable a skilled user thereof to tailor the interface for any other database. The representation language provided by the natural language interface was not easily understandable to a skilled user. Thus, it will be appreciated that the prior system was not a general purpose database retrieval system.

It would be advantageous to provide a truly general purpose natural language interface for database retrieval, allowing skilled users (who are not experts in artificial intelligence computer theory and application) to easily custom tailor the interface to a specific application database. Such a system should solve both the data location problem and the synonym problem inherent in prior art natural language interfaces.

It would be further advantageous for such a system to generate a representation language, or “meaning representation”, that is easily understandable, database independent, and canonical (i.e., two different queries having the same meaning must have the same final meaning representation, and two queries having different meanings must have different final meaning representations). Such a meaning representation should capture, at a conceptual level, the information requirement expressed in the natural language query.

It would be further advantageous to provide such a system in which a skilled user or “developer” builds a knowledge base, pertaining specifically to an application database, that enables the system to efficiently and economically retrieve and report data that is a proper response to a natural language query entered by an unskilled user. Such a system should interpret the query, use the knowledge captured in its database expert system to locate the relevant data tables and columns in a database, and then transparently generate the most efficient code (e.g., structured query language, “SQL”) to produce a report instantly. No knowledge of SQL, database field names, or other technical jargon should be required of the end user.

The present invention provides such a database retrieval system and method for retrieving data from a database.

SUMMARY OF THE INVENTION

In accordance with the present invention, a database retrieval system having a natural language interface is provided. The system comprises a computer processor, and a natural language interface coupled to the computer processor. Tool kit means are also provided to enable a database developer to create a knowledge base containing a structural description and a semantic description of a database from which data is to be retrieved. First means, operatively associated with the computer processor, produces a database-independent, canonical, internal meaning representation of a natural language query entered into the natural language interface. Second means, operatively associated with the computer processor, identifies database elements that are necessary to satisfy the query represented by the meaning representation. Third means, operatively associated with the computer processor, generates a database query among database elements identified by the second means, to enable the retrieval and aggregation of data from a database to satisfy the natural language query. Debugging means derive an easily understandable external meaning representation from the internal meaning representation. The external meaning representation enables the database developer to comprehend the internal meaning representation, and verify that a natural language query entered into the natural language interface is properly interpreted to effect the correct retrieval and aggregation of data from the database.

The meaning representation comprises entities and constraints relating to the entities, without reference to factual or linguistic relationships between entities that would prevent the meaning representation from being easily understood by a system developer.

The second means of the database retrieval system comprises an expert system coupled to access structural and semantic description information in the knowledge base, and identifies the database elements from the structural and semantic description information in accordance with predefined rules. The rules comprise steps for identifying an optimal set of database elements to satisfy the query represented by the meaning representation.

The structure of a database used in connection with the system of the present invention may be columnar, and the semantic description information can comprise a concept index of database columns. The semantic description can further comprise the time frame, value unit of measure, and aggregation level of database columns.

Means, operatively associated with the computer processor, can be provided for generating a formatted report containing data responsive to a natural language query. The debugging means can be used by a database developer to view the external meaning representation. A developer can also view a representation of the database elements identified by the second means. The debugging means can further enable the database developer to view the database query generated by the third means.
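The division of labor in the summary can be pictured as a three-stage pipeline whose intermediate results remain available for inspection, as the debugging means requires. The sketch below is a simplified illustration under stated assumptions: the `Trace` class, the `run_query` function, the `COLUMN_FOR` table, and the lambda stages are hypothetical stand-ins for the first, second, and third means, not the patented implementation.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Trace:
    """Intermediate results a developer could inspect with a debugging facility."""
    meaning: dict   # internal meaning representation (stand-in for the "first means")
    elements: list  # database elements chosen (stand-in for the "second means")
    sql: str        # generated database query (stand-in for the "third means")

def run_query(query: str,
              interpret: Callable[[str], dict],
              select_elements: Callable[[dict], list],
              generate_sql: Callable[[list], str]) -> Trace:
    """Chain the three stages and keep every intermediate result."""
    meaning = interpret(query)
    elements = select_elements(meaning)
    sql = generate_sql(elements)
    return Trace(meaning, elements, sql)

# Hypothetical toy stages; a real system would use a parser, a knowledge base,
# and an expert system in their place.
COLUMN_FOR = {"sales": "orders.amount", "region": "orders.region"}

trace = run_query(
    "total sales by region",
    interpret=lambda q: {"sales": ["total"], "region": ["group-by"]},
    select_elements=lambda mr: [COLUMN_FOR[concept] for concept in mr],
    generate_sql=lambda cols: (f"SELECT {cols[1]}, SUM({cols[0]}) "
                               f"FROM orders GROUP BY {cols[1]}"),
)
print(trace.sql)
# SELECT orders.region, SUM(orders.amount) FROM orders GROUP BY orders.region
```

Keeping each stage's output in one record is what lets a developer check the meaning representation, the chosen elements, and the generated query separately.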
`
`
`
In order to identify an optimal set of database elements, the system can locate initial indexed columns, test subtypes, test data class-to-table rules, match characteristics and constraints, match time constraints, choose master table quantifiers, test for detail file columns, test for summary file preference, eliminate foreign-key-only tables, minimize tables, and then select the optimal navigation path through the database for satisfying a query. Data retrieved from a database in response to a natural language query can be displayed on a user's workstation, or printed for later reference.
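The sketch below illustrates a few of the listed steps as successive filters over candidate columns: locating indexed columns, matching time constraints, applying a detail/summary preference, and minimizing tables. It assumes a hypothetical `Column` record and a simple rule that a filter is skipped when it would remove every candidate; the actual rules are those of FIGS. 6a through 6d, which this toy version does not reproduce.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Column:
    table: str
    name: str
    concept: str     # concept that indexes this column (hypothetical concept index)
    time_frame: str  # hypothetical labels such as "day", "month", "ytd"
    level: str       # "detail" or "summary"

def keep_if_any(cands, pred):
    """Apply a filter, but keep the previous candidates if the filter would empty the set."""
    kept = [c for c in cands if pred(c)]
    return kept or cands

def select_columns(concept, time_frame, want_detail, columns):
    # Locate initial indexed columns for the query concept.
    cands = [c for c in columns if c.concept == concept]
    # Match time constraints: prefer exact time-frame matches when any exist.
    cands = keep_if_any(cands, lambda c: c.time_frame == time_frame)
    # Detail/summary preference.
    wanted = "detail" if want_detail else "summary"
    cands = keep_if_any(cands, lambda c: c.level == wanted)
    # Minimize tables: keep only the table contributing the most candidates.
    if cands:
        best_table, _ = Counter(c.table for c in cands).most_common(1)[0]
        cands = [c for c in cands if c.table == best_table]
    return cands

cols = [
    Column("sales_detail", "amount", "sales", "day", "detail"),
    Column("sales_summary", "ytd_amount", "sales", "ytd", "summary"),
    Column("sales_summary", "mtd_amount", "sales", "month", "summary"),
    Column("customers", "name", "customer", "none", "detail"),
]
print([c.name for c in select_columns("sales", "ytd", False, cols)])  # ['ytd_amount']
```

Ordering the filters from most to least specific means each later step only has to break ties left by the earlier ones.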
In building the knowledge base, the database developer is able to enter join criteria, column semantics, data group definitions, and word and phrase associations into the knowledge base. The database developer can also build and modify the knowledge base by adding, deleting, and modifying subtypes; refining column references; adding, deleting, and modifying word-to-data class rules; adding, deleting, and modifying data class-to-table rules; and adding, deleting, and modifying nominal data definitions. Other information can also be entered into the knowledge base and manipulated by the database developer.
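One way to picture the knowledge base contents listed above is as a few keyed collections that the developer edits. The `KnowledgeBase` class below is a hypothetical in-memory sketch covering join criteria, column semantics, word associations, and data class-to-table rules; its field and method names are assumptions, and the knowledge base structure the patent actually describes is the one diagrammed in FIG. 5.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeBase:
    """Hypothetical in-memory knowledge base; field names are illustrative only."""
    # (table, table) -> join condition entered by the developer
    join_criteria: dict = field(default_factory=dict)
    # column -> semantic attributes such as time frame, unit of measure, aggregation level
    column_semantics: dict = field(default_factory=dict)
    # word or phrase (lower-cased) -> concept
    word_associations: dict = field(default_factory=dict)
    # data class -> set of tables allowed to supply it
    class_to_table: dict = field(default_factory=dict)

    def add_join(self, table_a: str, table_b: str, condition: str) -> None:
        # Key on the sorted pair so lookups do not depend on argument order.
        self.join_criteria[tuple(sorted((table_a, table_b)))] = condition

    def join_for(self, table_a: str, table_b: str):
        return self.join_criteria.get(tuple(sorted((table_a, table_b))))

    def associate(self, phrase: str, concept: str) -> None:
        self.word_associations[phrase.lower()] = concept

    def delete_association(self, phrase: str) -> None:
        self.word_associations.pop(phrase.lower(), None)

kb = KnowledgeBase()
kb.add_join("orders", "customers", "orders.cust_id = customers.cust_id")
kb.column_semantics["sales_summary.ytd_amount"] = {
    "time_frame": "year-to-date", "unit": "dollars", "level": "summary"}
for word in ("sales", "sell", "revenues"):
    kb.associate(word, "sales")
kb.delete_association("SELL")
kb.class_to_table["sales"] = {"sales_detail", "sales_summary"}
print(kb.join_for("customers", "orders"))  # orders.cust_id = customers.cust_id
```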
`
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of the system of the present invention;
FIG. 2 is a diagrammatic illustration of a relational database;
FIG. 3 is a flowchart depicting the translation of a natural language query to an internal meaning representation and an external meaning representation which is database independent, canonical, and easily understandable to a system developer;
FIGS. 4a and 4b comprise a flowchart of the steps a system developer takes to create and maintain a knowledge base in accordance with the present invention;
FIG. 5 is an entity-relationship diagram for a knowledge base created in accordance with the present invention;
FIGS. 6a, 6b, 6c, and 6d comprise a flowchart of the column selection process used by the system of the present invention to identify an optimal set of database elements necessary to satisfy a query; and
FIG. 7 is a semantic network diagram of the various concepts which can be included in an internal meaning representation.

DETAILED DESCRIPTION OF THE INVENTION

The present invention provides a database-independent natural language interface for information retrieval. Unlike prior art systems, the present system can deal with large databases having complex semantics. For example, an item such as year-to-date dollars may be labelled in several tables of a complex database. The system of the present invention determines exactly where to get data in response to a specific request. The data may come, for example, from a table that contains summary-level or detail-level values. The system includes two key components; namely, a developer tool kit and a query system. The developer tool kit enables a system developer to build, test, and maintain a knowledge base containing information about an application database that the system will be used to query. The developer does not need to have any expertise in the computer science field of artificial intelligence, since the system produces external meaning representations that are easily understandable to the developer without such knowledge.

The query system allows users with little or no computer experience to enter a conversational English (or other natural language) query. A natural language interface interprets the query and reduces it to an internal meaning representation used by the system and to an external meaning representation that is easily understandable to the developer.

The system also includes a context expert system that fills in the implicit meanings of a query. Then, the data responsive to the query is located using a database expert system that enables retrieval of the data from the proper tables and columns in the database. The database expert system is essentially an artificial intelligence engine that understands the database through the knowledge base set up by the developer, and through this understanding is able to locate the requested data in the database.

Turning now to FIG. 1, the system of the present invention is depicted in block diagram form. A user, who can be either a developer (a high-level user with some computer experience) or an end user (typically a manager or administrator with little or no computer experience), accesses the system through a workstation 10. Any number of workstations 10 can be provided to enable various system developers and end users to interact with the system.

A computer processor 12, coupled to workstation 10, controls the overall operation of the system. The elements generally designated 14 in FIG. 1 comprise the developer tool kit, which is used by system developers to communicate their knowledge of application databases to the system. It is important to recognize that the overall system of the present invention is database independent, and can be used with any application database once a developer builds a knowledge base containing information about the application database.

The elements generally designated 16 in FIG. 1 comprise the query system, which is used to process a natural language query input by an end user, and to extract relevant information in response to the query.

A system developer accesses the developer tool kit 14 through a series of windows and menus displayed on workstation 10. In building a knowledge base, the system developer goes through a series of steps which are described in detail below in connection with FIGS. 4a and 4b. Generally, the steps taken by a developer include setting up a profile of the end user(s) as indicated at box 18 in FIG. 1, running a data dictionary analyzer 20, editing the knowledge base through editor 22, running and formatting reports through knowledge base reporter 24, running a nominal data analyzer 26, and debugging the knowledge base using debugger 28.

Data dictionary analyzer 20 automatically reads the relational database management system (“DBMS”) catalog for the application database to learn all about the database structures such as tables, fields (i.e., “columns”), and data formats. Knowledge base editor 22 is used by the developer to give the system an understanding of the semantics, or meaning, of the data. For example, knowledge base editor 22 is used to provide the knowledge base with information as to how the DBMS tables are related to one another, and to define whether columns in the tables contain summary-level or detail-level data. Time and other attributes (i.e., information) for the columns are also entered through the knowledge base editor.
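To make the role of the data dictionary analyzer concrete, the sketch below reads table, column, declared type, primary-key, and foreign-key information from a DBMS catalog. SQLite, its `sqlite_master` table, and its `PRAGMA table_info` and `PRAGMA foreign_key_list` statements are used only as a convenient stand-in; the text does not name a DBMS, and the functions `analyze_catalog` and `foreign_keys` are hypothetical.

```python
import sqlite3

def analyze_catalog(conn: sqlite3.Connection) -> dict:
    """Return {table: [(column, declared_type, is_primary_key), ...]} from the catalog."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    catalog = {}
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        catalog[table] = [(name, ctype, pk > 0) for _, name, ctype, _, _, pk in info]
    return catalog

def foreign_keys(conn: sqlite3.Connection, table: str) -> list:
    """Return [(column, referenced_table, referenced_column), ...] declared on a table."""
    # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
    rows = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
    return [(row[3], row[2], row[4]) for row in rows]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         cust_id  INTEGER REFERENCES customers(cust_id),
                         amount   REAL);
""")
catalog = analyze_catalog(conn)
print(catalog["orders"])
# [('order_id', 'INTEGER', True), ('cust_id', 'INTEGER', False), ('amount', 'REAL', False)]
print(foreign_keys(conn, "orders"))
# [('cust_id', 'customers', 'cust_id')]
```

Declared foreign keys are one plausible starting point for the join criteria that the developer otherwise enters through the knowledge base editor; that use is an assumption here, not a step stated in the text.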
`
`
`
the internal meaning representation is provided in FIG. 7. Each line in the network diagram represents an “ISA” link. For example, ISA link 702 specifies that minor-concept 706 is an attribute of concept 704.
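A semantic network of ISA links can be held as a mapping from each node to its parent. In the sketch below, only the first link (minor-concept under concept) comes from the text; the other node names are hypothetical placeholders, and the single-parent assumption may not hold for the network actually shown in FIG. 7.

```python
# Hypothetical ISA network: each key is joined by one ISA link to its parent.
# Only "minor-concept" -> "concept" comes from the text; the rest are placeholders.
ISA_LINKS = {
    "minor-concept": "concept",
    "concept": "mr-node",
    "constraint": "mr-node",
    "time-constraint": "constraint",
}

def ancestors(node: str) -> list[str]:
    """Follow ISA links upward from a node until reaching a node with no parent."""
    chain = []
    while node in ISA_LINKS:
        node = ISA_LINKS[node]
        chain.append(node)
    return chain

def isa(node: str, kind: str) -> bool:
    """True if node is kind itself or reaches kind through ISA links."""
    return node == kind or kind in ancestors(node)

print(ancestors("time-constraint"))  # ['constraint', 'mr-node']
```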
The internal meaning representation 304 is converted, in accordance with the present invention, to an easily understood external meaning representation 308 by debugger 28 in the developer tool kit. The external meaning representation is database independent, canonical, and easily understandable. The feature of understandability enables a developer to comprehend how the natural language interface 34 has interpreted the query. With this comprehension, the developer can refine the knowledge base, as necessary, to ensure that the interpretation of a query by the natural language interface will be proper and that, as a result, information retrieved by a query will properly reflect the intent of the end user.
Debugger 28 derives the external meaning representation from the internal meaning representation by finding all the entities in the internal meaning representation and placing them to the left of a colon. All of the constraints associated with the entities in the internal meaning representation are then located, and placed to the right of the colon, to form the external meaning representation. Thus, in a preferred embodiment the external meaning representation takes the form:

ENTITY: CONSTRAINT
In the external meaning representation, the hierarchy of the constraints in the internal meaning representation is ignored, as this information is not pertinent to the developer, and its inclusion would defeat the desired characteristic of easy understandability. Factual (i.e., “real world”) and linguistic relationships between entities in the internal meaning representation that would prevent the external meaning representation from being easily understood are also ignored.
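The entity/constraint flattening described above can be sketched as a tree walk. This version assumes a hypothetical nested-dictionary form for the internal meaning representation (keys `entity`, `constraint`, `constraints`, and `relations`); it collects each entity's constraints while discarding their nesting and never reads the `relations` entries. It also sorts the constraints so that equivalent inputs print identically; that sorting is an assumption made here for canonical output, not a detail given in the text.

```python
# Hypothetical internal meaning representation as nested dictionaries.
internal_mr = {
    "entity": "SALES",
    "constraints": [
        {"constraint": "TIME = YEAR-TO-DATE"},
        {"constraint": "AGGREGATION = TOTAL",
         "constraints": [{"constraint": "UNIT = DOLLARS"}]},  # nesting is discarded
        {"entity": "CUSTOMER",
         "constraints": [{"constraint": "STATE = CONN"}]},
    ],
    "relations": [("CUSTOMER", "BUYS", "SALES")],  # factual link: never read below
}

def find_entities(node: dict) -> list[dict]:
    """Every entity node in the tree, in depth-first order."""
    found = [node] if "entity" in node else []
    for child in node.get("constraints", []):
        found.extend(find_entities(child))
    return found

def collect_constraints(node: dict) -> list[str]:
    """All constraint labels below a node, flattened; nested entities are skipped."""
    out = []
    for child in node.get("constraints", []):
        if "entity" in child:
            continue  # a nested entity gets its own ENTITY: CONSTRAINT line
        out.append(child["constraint"])
        out.extend(collect_constraints(child))
    return out

def external_mr(root: dict) -> str:
    """Render one 'ENTITY: CONSTRAINT, ...' line per entity."""
    lines = []
    for ent in find_entities(root):
        constraints = sorted(collect_constraints(ent))  # sorted: canonical-order assumption
        lines.append(f"{ent['entity']}: {', '.join(constraints)}")
    return "\n".join(lines)

print(external_mr(internal_mr))
# SALES: AGGREGATION = TOTAL, TIME = YEAR-TO-DATE, UNIT = DOLLARS
# CUSTOMER: STATE = CONN
```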
`An example of an internal meaning representation,
`wh