`
`(12)
`
`United States Patent
`Manson
`
`(10) Patent N0.:
`(45) Date of Patent:
`
`US 7,085,708 B2
`Aug. 1, 2006
`
`(54) COMPUTER SYSTEM WITH NATURAL
`LANGUAGE TO MACHINE LANGUAGE
`TRANSLATOR
`
`(75) IIIVBIIIOI‘Z Keith S. Manson, Albany, CA (US)
`
`(73) Assignee; Raven?ow, Inc,’ Emeryville, CA (Us)
`
`( * ) Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 1095 days.
`
`(21) Appl. N0.: 09/883,693
`
`(22) Filed:
`
`Jun. 18, 2001
`
`2/2000 Suda et a1.
`6,029,123 A
`5/2000 Richardson 6t ?l~
`6,070,134 A
`8/2000 Richardson et 211.
`6,108,620 A
`6,311,150 B1* 10/2001 Ramaswamy et a1. ....... .. 704/1
`OTHER PUBLICATIONS
`
`Pereira, Categorial Semantics and Scoping (1990) Compu
`tations Linguistics V.1, p. 1-10.*
`Uchinami et al., Linguistic model based on the generative
`topological information space, Osaka University (1980), p.
`93-100.*
`Warren et al., Using Semantics in Non-Context-Free Parsing
`of Monta e Grammar, 1982, Ameican Journal of Com u
`811
`P
`tational Linguistics, V.8, No. 3-4, Jul-Dec. 1982, p. 123
`138_*
`
`(65)
`
`Prior Publication Data
`
`* Cited by examiner
`
`US 2004/0181390 A1
`
`Sep. 16, 2004
`
`Related US Application Data
`(60) Provisional application No. 60/235,165, ?led on Sep.
`23’ 2000'
`
`(51) Int‘ Cl‘
`G06F 17/27
`
`(2006.01)
`
`_
`(52) US. Cl. ....... ...... ..: ........................... .. 704/9, 704/1
`(58) Field of ~Classi?cation Search ..............
`704/ 9
`See aPPhCanOn ?le for Complete Search hlstory-
`_
`References Cited
`
`(56)
`
`U.S. PATENT DOCUMENTS
`
`g
`gamlga et 3}‘ """""""" "
`i i
`5’682’539 A * l 0/1997 Carla; 3t 21' """""""" " 704/9
`5,878,385 A
`3/1999 Bralich et a1.
`5,884,302 A *
`3/1999 H0 .............................. .. 707/3
`5,966,686 A 10/ 1999 Heidorn et a1.
`
`Primary ExamineriRichemond Dorvil
`Assistant ExamineriLamont Spooner
`(74)Att0rney, Agent, or F irmilohn Carpenter; Reed Smith,
`LLP
`(57)
`
`ABSTRACT
`
`Presented is a system and method for converting or trans
`1
`~
`~
`~
`~
`~
`atmg expresslons 1n a natural language such as Engllsh mto
`machine executable expressions in a format tanguage_ This
`translatiOn enables a transformation from the SyntactiC Struc_
`tures of a natural language into effective algebraic forms for
`further exact processing. The invention utilizes algorithms
`employing a reduction of sequences of terms de?ned over an
`extensible lexicon into formal syntactic and semantic struc
`tures. This term reduction incorporates both syntactic type
`and semantic context to achieve an effective formal repre
`sentation and interpretation of the meaning conveyed by any
`natural language expresslon'
`
`9 Claims, 7 Drawing Sheets
`
`WWW“!
`
`nmmmdm
`WM
`mm)
`
`:ynba'in
`mm mm?“
`/ mm
`
`M — ‘in?rm-“MM
`/
`WNW)
`1.1.3
`
`22 /
`
`exra-ma/synrm
`
`General System Process and Data Flow
`
`Page 1 of 22
`
`
`
`U.S. Patent
`
`Aug. 1, 2006
`
`Sheet 1 0f 7
`
`US 7,085,708 B2
`
`‘213
`
`1.1.2
`
`:
`$ /1.1
`
`/
`
`1.3
`
`ciient i VDM ,
`
`fem“: ' “us it / server I CPU
`
`+
`
`
`
`5 coiiaterai devices 5
`
`
`
`1.1.1
`
`|
`
`
`
`i 1.4
`
`Figure 1: Computer System Architecture
`
`Page 2 of 22
`
`
`
`U.S. Patent
`
`Aug. 1, 2006
`
`Sheet 2 0f 7
`
`US 7,085,708 B2
`
`input: namlai language text
`
`..................................... .‘
`
`output: text data
`(sequence of
`rare-expressions)
`
`syntactic
`precesslbg
`
`21 .2
`
`output: syntac?c data
`(sequence of
`syntactic complexes}
`
`output: fauna! data
`(sequence oi
`formal exprewions)
`
`text
`pmcessing
`
`2.1 .1
`
`i
`
`I
`
`I
`
`2.1
`
`extemal
`processing
`
`‘
`
`22‘‘
`
`output: external data
`(sequence of
`executable expressions}
`
`operational
`environment
`
`222
`
`external sys rem
`
`2.2
`
`Figure 2: General System Process and Data Flow
`
`Page 3 of 22
`
`
`
`
`
`3 2 1 ~
`
`;
`
`g
`
`s
`
`3.2.3 '
`
`/
`
`
`
`ieagcal msertion / 3 2.1.3 3.2-1.2
`
`
`
`
`
`o
`
`s
`
`
`
`1 3-2-1 J \ 3.2.1 .4 contextua?zatinn
`
`
`
`/s.2.2.2 /3.2.2.3
`
`/
`term
`reduc?un
`
`/ 5
`term
`5
`inveIsiun I
`3
`
`.
`syntactic
`representation 1
`E
`
`\ 3 2 2
`
`U.S. Patent
`
`Aug. 1, 2006
`
`Sheet 3 0f 7
`
`US 7,085,708 B2
`
`_
`
`3.2
`
`
`
`3 2.0 , \
`
`~~~~~
`
`...............
`
`-/
`
`.
`M = A“ ‘Wes ass‘gned?
`( 0 = no I 1 = ya)
`
`
`
`g‘ assignment tyne associatxbn
`
`/3.2.2.1
`
`/
`term
`cnn'eiaticn
`
`telm resalutian
`
`syntactic processing
`
`[3.3.1
`/
`semantic
`
`3
`
`/3.3.2
`/
`formal
`
`/3.3.3
`/
`E
`formal
`
`3'3
`
`
`
`Q reptesentation 3.4.1
`
`
`
`representation 3.4.2 ' ‘ ' ' ' interpretation ' ' ' ' ' u
`
`
`
`
`
`\
`5 \ ‘
`enema!
`reprecentatiuu
`
`\\ '
`extem?
`interpretation
`
`:
`
`3-’;
`
`enema! processing
`
`0.
`
`operational!
`environment
`
`Figure 3: Detailed System Process and Data Flow
`
`Page 4 of 22
`
`
`
`U.S. Patent
`
`Aug.1, 2006
`
`Sheet 4 of 7
`
`US 7,085,708 B2
`
`0)
`2}
`
`2)
`3}
`4)
`5)
`
`6)
`7}
`B)
`9)
`10)
`
`11)
`12)
`
`13)
`24)
`
`15)
`16)
`
`17)
`
`0}
`1)
`2)
`3}
`
`4)
`5}
`6)
`7)
`8}
`
`3)
`
`2e}
`11}
`
`12)
`
`13)
`
`15)
`
`16)
`
`27)
`
`{send,act) = (send, Jextyp(0,send)) ;
`
`act = action
`
`{Bob,pnm) = {Bob, lextyp{d,Bok));
`(an,adj} = {an, lextyp(0,an));
`
`pnm = proper name/male
`adj = adjective
`
`(email,xao) = (email, lexrtyp(0,email));
`
`xao = ambiguous action/object
`
`{asking,ing) = (asking, lextyp(0,asking));
`(him,ppm) = (him, lextyp(0, him);
`(if,xde) = (if, lexeypio,if));
`
`ing = ambiguous participle/gerund
`ppm = personal pronoun/male
`xdce = ambiguous delimiter/conditional
`
`(he,ppm) = (he, lextyp({0,he));
`{is,sob) = (is, lextyp(0,is));
`
`(going,ing) = (going, iextypi0,going));
`{to,xpi) = (to, lextyp(0,to});
`
`{go,act) = (go, lextyp(0,go});
`(to,xpij = (to, lextyp(0,to));
`
`ppm = personal pronoun/male
`sob = state-of-being verb
`
`ing = ambiguous participle/gerund
`xpi = ambiguous preposition/infinitive
`
`act = action
`xpi = ambiguous preposition/infinitive
`
`psm = personal possessive/male
`this,psm) = (his, lextyp(0,his));
`(appointment,xom) = (appointment, lextyp(0,appointment)};
`xom = ambiguous object/modifier
`
`{by,prp) = (by, lextyp(o,by}};
`(himself,pxrm) = (himself, lextyp(0,himself));
`(.,trm) = {.,lextyp(0, .));
`
`prp = preposition
`prm = personal reflexive/male
`trm = termination
`
`Figure 4a: Virtual Type Assignment
`
`(send,act) = (send, lextyp(0,send});
`
`act = action
`
`(Bob,pnm) = (Bob, lextyp{0, Bob) } ;
`{an,adj} = (an, lextyp(0,an)};
`{email,obj} = {email, lextyp{1,email});
`(asking, ptc) = {asking, lextyp(1, asking) };
`(him,ppm} = (him, lextyp(0,him));
`{if,dlp) = (if, Jextyp(0,if));
`
`the,ppm) = (he, lextyp(0,he));
`(is,sob) « (is, lextyp(0,is)};
`
`pnm = proper name/male
`adj = adjective
`obj = object
`ptc = participle
`ppm = personal pronoun/male
`dip = phrase delimiter
`
`ppm = personal pronoun/male
`sob = state-of-being verb
`
`{going,ptc) = {going, lextyp{1, going) );
`(co,inf) = (te, lextyp(2,to)};
`
`pte = participle
`inf = infinitive
`
`act = action
`igo,act} = (go, lextyp(0,go));
`Pprp = preposition
`{to,prp) = (to, lextyp(1,to));
`psm = personal possessive/male
`(his,psm) = (his, lextyp(0,his)});
`{appointment,obj}) = (appointment, lextyp(1,appointment) } ;
`obj = object
`
`iby, pxp} = (by, lextyp(0,by));
`{himsel£,pxm) = (himself, lextyp(0,himsel£));
`{.,trm} = {.,lextyp(6,.));
`
`prp = preposition
`prm = personal reflexive/male
`tra = texmination
`
`Figure 4b: Actual Type Assignment
`
`Page 5 of 22
`
`Page 5 of 22
`
`
`
`U.S. Patent
`
`Aug. 1, 2006
`
`Sheet 5 0f 7
`
`US 7,085,708 B2
`
`Obj
`i
`
`PIP
`i
`i
`prp
`
`obj
`
`trm
`
`I
`obj .
`
`I
`Pr?
`l
`i
`prp
`l
`l
`prp
`l
`
`prp
`
`f
`inf
`i
`
`l
`act
`l
`
`to his appointment by himself
`send Bob an email asking him if he is going to
`90
`Ill!!!
`I
`{ill
`I
`act obj adj obj
`ptc obj dlp obj act ptc inf act prp adj
`1
`f
`: inf act prp
`I?
`11
`inf act: prp
`3!
`f
`inf act
`f
`I]
`inf
`act
`I!
`I
`inf
`-I
`IIII
`. dlp obj act ptc
`I!!!
`~liii
`. dip obj act:
`ptc
`IL____II
`~ll
`i
`, 61;:
`act
`ptc
`?lm;
`'1
`I
`. :11};
`act:
`-l__.________1
`klllili
`act obj adj obj
`ptc obj
`dlp
`EI3_IiEi
`ii! 1%]
`act obj
`obj
`pcc obj
`dlp
`[II it!
`1:‘;
`1
`act Ob]
`obj
`dlp
`I11
`I
`ill
`act obj
`obj
`(1|
`I!
`act obj
`2*:
`1
`act
`!
`
`1
`trm
`;
`|
`tm
`i
`:
`trm
`1
`(
`trm
`i
`I
`trm
`,
`i
`trm
`l
`1
`trm
`i
`
`i
`obj
`I
`E
`obj
`l
`
`i
`act
`l
`
`5
`ptc
`1
`
`I
`act
`
`Figure 5: Term Reduction Sequence
`
`Page 6 of 22
`
`
`
`U.S.
`
`Patent
`
`Aug. 1, 2006
`
`Sheet 6 of 7
`
`US 7,085,708 B2
`
`0)
`
`1)
`
`2)
`
`3)
`
`4)
`
`5)
`
`(send,act} -> (Bob, obi}
`
`{send,act) —> {email,obd) -> {an,adj}
`
`(send,act) — {email,obd) - (asking,ptc) -> (him, obd)
`
`(send,act) —- {email,obd) — (asking,ptc) > (if,dlp) ~~ (is,act) — (he,obs)
`
`(send,act} -— femail,obd) — (asking,ptc) -> (if,dlip) - (is,act) - (going,ptc) ->
`({to,inf) ~ {go,act) > (to,prp) - (appointment,obp) - (his, adj)
`
`(send,act) + (email,obd) - {asking,ptc) ~ (if,dlp) + (is,act) ->» (going,ptc) 7
`{to,inf) - (go,act}) — (by,prp} - (himself, obp)
`
`Figure Ga: Syntactic Dependency Chains
`
`+- (act, send)
`
`+~ (obi, Bob)
`
`|+
`
`|
`
`ii|||Ii]|l|{||
`
`|
`|
`
`||i{]||||]i+
`
`Page 7 of 22
`
`|+
`
`- (obp, himself)
`
`=
`
`{., txrm)
`
`Figure 6b: Syntactic Tree
`
`~- (obd, email)
`
`|+
`
`- (adj,an)
`
`|+
`
`- (ptc, asking)
`
`+- (obd, him}
`
`|+
`
`+ (dlp, if)
`
`J4
`
`+-(act,is}
`
`|+
`
`~ fobs, he)
`
`I+
`
`- (pte, going)
`
`+- {inf, to)
`|
`+- {aet,go}
`
`|+
`
`|
`
`- (pxp, £0)
`
`|cop. appointment)
`|aaj,bie)
`
`I {pxrp, by}
`
`Page 7 of 22
`
`
`
`U.S. Patent
`
`Aug. 1, 2006
`
`Sheet 7 0f 7
`
`US 7,085,708 B2
`
`fmiint
`rnsrep
`synrep
`ExplNL) ----~——> SynINU -—--—~——+ TnsfNU --—---—-+ ExpDGJ
`
`Exp(EL)
`
`semrep
`
`envexcE
`
`Figure 7a: External Representation I interpretation Schema
`
`fmlint
`tnsrep
`synrep
`Exp(NL) ---—+ Syn(NL) -——-—+ TnsiNL} ;-—--——> ExplXL)
`"""""""""""""
`L
`
`protocol
`
`................................................................... .~
`
`88mm”
`
`
`
`pdcod modixuxzxprxu ——-——+ 5mm --—--*;Exp£Eu pcmn
`
`T
`‘FEW
`L envexcE
`
`
`
`: Sem(NL)
`
`fmirep
`
`ModlXU
`
`
`
`. .......................................... Envmu M E
`
`Figure 7b: Protocol Representation / Interpretation Schema
`
`Page 8 of 22
`
`
`
`US 7,085,708 B2
`
`1
`COMPUTER SYSTEM WITH NATURAL
`LANGUAGE TO MACHINE LANGUAGE
`TRANSLATOR
`
`This utility application claims the priority date of provi
`sional application Ser. No. 60/235,165 ?led on Sep. 23, 2000
`With the same title and by the same applicant, Keith Manson.
`
`BACKGROUND OF THE INVENTION
`
`The present invention is directed to a system Which
`translates natural (human) language into an abstract formal
`language. This formal language is explicitly designed to
`serve as a universal template for further translations into a
`comprehensive variety of machine languages Which are
`executable in speci?c operational environments. Extensive
`efforts have been made, many articles have been published,
`and many patents have been issued, all directed toWard the
`goal of providing computers With the capacity to understand
`natural (human) language suf?ciently Well to respond reli
`ably and accurately to directives issued from human users.
`Many companies and research groups, such as AT&T, IBM,
`and Microsoft, and an assortment of academic institutions,
`are presently Working on natural language processing
`(NLP).
`To date, many different approaches have been tried to
`provide a system Which effectively converts natural lan
`guage to a formal language for computer applications. One
`such approach is disclosed in an article published by
`Microsoft Corporation titled “Microsoft Research: Natural
`Language Processing Hits High Gear” dated May 3, 2000.
`The article discloses that Microsoft is heavily focused on a
`database of logical forms, called MindNet (TM), and the
`creation of a machine translation application. It is stated that
`MindNet is an initiative in an area of research called
`“example-based processing”, Whereby a computer processes
`input based on something it has encountered before. The
`MindNet database is created by storing and Weighting the
`semantic graphs produced during the analysis of a document
`or collection of documents. The system uses this database to
`?nd links in meaning betWeen Words Within a single lan
`guage or across languages. These stored relationships
`among Words give the system a basis for “understanding”,
`thereby alloWing the system to respond to natural language
`input. MindNet apparently contains the contents of several
`dictionaries and an encyclopedia to increase its level of
`understanding. Another approach is disclosed in Microsoft
`U.S. Pat. No. 5,966,686. This approach provides a rule
`based computer system for semantically analyZing natural
`language sentences. The system ?rst transforms an input
`sentence into a syntactic parse tree. Semantic analysis then
`applies three sets of semantic rules to create an initial logical
`form graph from this tree. Additional rules provide seman
`tically meaningful labels to create additional logical form
`graph models and to unify redundant elements. The ?nal
`logical form graph represents the semantic analysis of the
`input sentence.
`Yet another, and apparently more common, approach is
`provided by U.S. Pat. No. 5,895,466, Wherein a database
`stores a plurality of ansWers Which are indexed to natural
`language keys. The natural language device receives a
`natural language question over the netWork from a remote
`device and the question is analyZed using a natural language
`understanding system. Based on this analysis, the database
`is then queried and an ansWer is provided to the remote
`device.
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`2
`Applicant is aWare that various other approaches toWard
`providing a conversion from natural language to some
`machine language have been tried. HoWever, the prior art
`has not provided a truly effective conversion system of this
`sort.
`
`SUMMARY OF THE INVENTION
`
`Presented is a system and method for converting or
`translating expressions in a natural language such as English
`into machine executable expressions in a formal language.
`This translation enables a transformation from the syntactic
`structures of a natural language into effective algebraic
`forms for further exact processing. The invention utiliZes
`algorithms employing a reduction of sequences of terms
`de?ned over an extensible lexicon into formal syntactic and
`semantic structures. This term reduction incorporates both
`syntactic type and semantic context to achieve an effective
`formal representation and interpretation of the meaning
`conveyed by any natural language expression.
`The foregoing features and advantages of the present
`invention Will be apparent from the folloWing more particu
`lar description of the invention. The accompanying draW
`ings, listed herein beloW, are useful in explaining the inven
`tion.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIG. 1 shoWs the hardWare architecture of a computer
`system comprising the natural language converter of the
`present invention;
`FIG. 2 shoWs the general process and data How of the
`inventive system;
`FIG. 3 shoWs a more detailed ?oW diagram for the
`inventive system;
`FIG. 4a shoWs the results of virtual type assignment
`applied to a sample text;
`FIG. 4b shoWs the results of actual type assignment for
`the same text;
`FIG. 5 shoWs the term reduction sequence for a sample
`text;
`FIG. 6a shoWs the sequence of dependency chains for a
`sample text;
`FIG. 6b shoWs the associated syntactic tree for the same
`text; and
`FIG. 7a shoWs the schema of structures and maps
`involved in the external interpretation of a text;
`FIG. 7b shoWs this external interpretation schema as
`controlled by a metasemantic protocol.
`
`BRIEF DESCRIPTION OF THE INVENTION
`
`Refer to FIG. 1 for an overvieW of the system architecture.
`As mentioned above, the inventive system, called METASCRIPT
`(TM), provides a method for translating expressions in a
`natural language such as English into machine executable
`expressions. In the embodiment of the system and method to
`be described, the user inputs text in a natural language
`through some input device to a knoWn computer system
`Which may comprise a standalone computer system, a local
`netWork of computing devices, or a global netWork such as
`the Internet, using Wired land lines, Wireless communica
`tion, or some combination thereof, etc. This computer sys
`tem includes memory for storing data, and a data processor.
`The text may be entered into the client device or local VDM
`(Video Display Monitor) (1.1) by any suitable means, such
`as direct input via keyboard (1.1.1), voice input via speech
`
`Page 9 of 22
`
`
`
`US 7,085,708 B2
`
`3
`recognition means (an SR system) (1.1.2), or indirect input
`via optical scanning (an OCR system) (1.1.3). The natural
`language text input to the system is passed along the netWork
`or local bus (1.3) to a server or local CPU (Central Process
`ing Unit) (1.2) Where it is processed in accordance With the
`inventive method and system. This processed output of the
`system is then provided to the system for distribution to the
`original input device (1.1), or to other collateral devices
`(1.4) Which may be one or more digital computers, mobile
`devices, etc. The inventive system thus comprises a natural
`language interface to any suf?ciently capable digital envi
`ronment.
`Refer noW to FIG. 2 for an overvieW of the process and
`data How of the inventive system. The invention Will be
`subsequently discussed in more detail herein beloW. Natural
`language text input is entered by the user (2.0) into the
`internal system (2.1) by means of a text processing module
`(2.1.1) Which parses the text. The output of the text pro
`cessing module comprises a parsed sequence of preexpres
`sions Which is entered into the syntactic processing module
`(2.1.2) Which provides syntactic type information, estab
`lishes proper syntactic dependencies betWeen terms in
`expressions, and represents these expressions as complexes
`in a syntactic algebra. The output of the syntactic processing
`module, comprising a sequence of these syntactic com
`plexes, is entered into the semantic processing module
`(2.1.3) in order to achieve a semantic interpretation of the
`input text. The output of the semantic processing module,
`comprising a formal interpretation of the input text, is
`entered into an external system by means of the external
`processing module (2.2.1), Which ?nally provides a
`sequence of executable expressions derived from the input
`text for use in a speci?c operational environment (2.2.2).
`As noted above, the means for providing text input to the
`system, such as through a keyboard, scanner, or speech
`recognition system, are Well knoWn in the art and are
`commercially available. Another standard component of the
`present system is a text parser, construed here in an
`extremely narroW sense as a limited process restricted to
`partitioning text strings into syntactic subcomponents such
`as paragraphs, sentences, and Words. As such, the text parser
`discussed herein does not provide farther linguistic infor
`mation such as grammatical types, syntactic dependencies,
`semantic import, etc. Such limited text parsers are standard
`components of any natural language processing system, and
`exceedingly Well knoWn in the art. Yet another component in
`the present system Which plays a relatively standard role is
`the lexicon or “electronic dictionary”. In general, lexicons
`are also Well knoWn in the art and are discussed in many
`patents including US. Pat. Nos. 5,371,807; 5,724,594;
`5,794,050; and 5,966,686. HoWever, the notion and function
`of “virtual” types, Which play a signi?cant syntactic catego
`riZation role in the passive speci?cation of lexical terms, and
`hence strongly contribute to the de?nition of the particular
`lexicon used in the inventive system, are not standard, and
`so require careful description. On the other hand, since text
`input devices, text parsers, and their operation are so Well
`knoWn, they Will not be further described in detail herein.
`Refer noW to FIG. 3, Which shoWs more details of the
`inventive system. The components, modules, and submod
`ules of the inventive system are enumerated for convenient
`reference so that the operation and application of the system
`and method may be described in detail.
`As mentioned above, natural language text is entered by
`the user (3.0) into the text input submodule (3.1.1) of the text
`processing module (3.1) via any suitable means including a
`keyboard or a speech recognition system. For the purposes
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`4
`of this discussion, the user input signal is simply some
`linguistic data stream Which is digitiZed into a string of
`ASCII characters. This ASCII string is the input text.
`In order to clarify the folloWing discussion, it is helpful to
`note that any natural language text is typically organiZed into
`a sequence of paragraphs, each of Which is a sequence of
`sentences, each of Which is a sequence of Words, each of
`Which is a sequence of characters (alphanumeric symbols).
`All of this nested syntactic structure must be taken into
`account if an effective interpretive analysis is to be achieved.
`The role of the text parser is to determine and then present
`these nested sequential structures to the system for further
`processing. Thus in general, the adequate output of the text
`parser is a sequence of sequences of sequences of sequences
`of ASCII characters. This level of generality, hoWever, tends
`to obscure the basic points of any useful description of the
`inventive system, so a technical compromise is adopted
`herein, Whereby any text is considered to comprise a
`sequence of sentences, or more properly, of expressions,
`each of Which comprises a sequence of Words. Until recog
`niZed by the system as a meaningful unit of linguistic
`analysis, hoWever, any such Word in a text is simply treated
`as a partitioned substring of the input text string. Thus the
`proper output of the text parser is considered here to be a
`sequence of sequences of “pretokens”, Where a pretoken is
`a text fragment Which is a candidate for a Word, i.e. anASCII
`(sub)string presented for recognition as a system “token”.
`The system lexicon is a lexicographically ordered list of
`such tokens (With associated type and reference data), and
`recognition by the system of a pretoken as an actual token
`is simply a matter of exact string comparison.
`Accordingly, the output of the text parser (3.1.2) is a
`sequence of sequences of pretokens, or sequence of “pre
`expressions”, Which is then passed to the type assignment
`submodule (3.2.1.1) of the type association submodule
`(3.2.1), Where syntactic processing is initiated. Each preto
`ken is checked against the system lexicon (3.2.0) for its
`status as a recogniZed lexical token. If a pretoken is recog
`niZed, i.e. if the string comprising that pretoken is included
`in the lexicon as an actual token (With associated syntactic
`and semantic data), then it is assigned a lexically associated
`syntactic type. The system determines at decision node
`(3.2.1.2) Whether all the pretokens from the entered text
`have been recogniZed as system tokens. If the determination
`is negative, then as indicated by the “no” connection to the
`lexical insertion submodule (3.2.1.3), the user is given the
`option to add the unrecogniZed pretokens to the system as
`tokens With associated type and reference data, i.e. to insert
`neW terms into the lexicon, for further processing. On the
`other hand, if the determination is af?rmative, then the
`resulting sequence of sequences of lexically typed tokens, or
`sequence of “virtual” expressions, is passed along the “yes”
`connection to the type contextualiZation submodule
`(3.2.1.4). This submodule initiates a second order type
`assignment Which uses the initial (or virtual) lexical type
`assignments as data for a contextual process Which may
`reassign these initial types depending on the relative syn
`tactic roles of the tokens in the virtual expressions being
`processed. Upon complete (re)assignment of appropriate
`types to tokens, each virtual expression is promoted to an
`“actual” expression, and each token/type pair becomes a
`fully functional lexical term With associated semantic data.
`Thus the output of the type association submodule (3.2.1)
`of the syntactic processing module (3.2) comprises a
`sequence of (actual) expressions, and is passed to the term
`correlation submodule (3.2.2.1) of the term resolution mod
`ule (3.2.2). The output of this submodule is a sequence of
`
`Page 10 of 22
`
`
`
`US 7,085,708 B2
`
`5
`sequences of fully correlated lexical terms, Which is then
`entered into the term reduction submodule (3.2.2.2), Wherein
`proper syntactic dependencies betWeen terms in an expres
`sion are established by means of a type reduction matrix.
`The output of this submodule is a sequence of sequences of
`reduction links, Which is entered into the term inversion
`submodule (3.2.2.3), Wherein these reduction links are used
`to construct syntactic trees, each tree representing a pro
`cessed expression. The resulting sequence of syntactic trees
`is passed to the syntactic representation submodule (3.2.3),
`Wherein each expression is then represented as a syntactic
`complex, i.e. a (usually composite) term in the syntactic
`algebra.
`Semantic processing (3.3) is initiated in the semantic
`representation submodule (3.3.1), Wherein the input
`sequence of syntactic complexes from the syntactic process
`ing module (3.2) is represented as a full semantic complex,
`i.e. a structure of internal objects in the semantic algebra.
`This semantic complex is then passed to the formal repre
`sentation submodule (3.3.2), Wherein the input semantic
`complex is represented as a formal structure adhering to a
`fundamental transaction paradigm. This formal semantic
`model is then combined With the sequence of syntactic
`complexes output from the syntactic processing module to
`form the input to the formal interpretation submodule
`(3.3.3), Wherein a sequence of formal expressions is con
`structed as an interpretation of the presented syntactic and
`semantic data.
`In addition, the output of the formal representation sub
`module is passed to the external representation submodule
`(3.4.1) of the external processing module (3.4), Wherein a
`speci?c external representation appropriate for the formal
`semantic data presented is identi?ed. This external repre
`sentation is combined With the sequence of formal expres
`sions output from the formal interpretation submodule to
`form the input to the external interpretation submodule
`(3.4.2), Wherein a sequence of executable expressions is
`constructed accordingly for ultimate processing in the
`appropriate operational environment (3.5).
`
`20
`
`25
`
`30
`
`35
`
`DETAILED DESCRIPTION OF THE
`INVENTION
`
`Introduction:
`METASCRIPT is a translation from a natural language into an
`executable formal language. This translation is essentially a
`transformation from the syntactic structures of natural lan
`guage into effective algebraic forms suitable for further
`processing. The formal semantics Which ?nally determines
`the ensuing interpretations and executions of these formal
`expressions in external operational environments is object
`oriented.
`The fundamental algorithm upon Which METASCRIPT is
`based employs a reduction to formal syntactic structures
`over terms de?ned in an extensible lexicon. This term
`reduction incorporates both syntactic type and semantic
`context to achieve an effective formal representation and
`interpretation of the meaning conveyed by any natural
`language expression. Extensibility of the lexicon under
`speci?c user direction provides the capacity for the system
`to expand its knowledge of vocabulary and usage, and
`consequently, offers an effective mechanism under user
`control for establishing de?nite incremental enhancements
`to the system’s linguistic capabilities, hence substantially
`increasing the system’s familiarity With (and competence in)
`particular operational environments. Put simply, the system
`
`50
`
`55
`
`60
`
`65
`
`6
`learns as it goes. In addition, any desired level of syntactic
`disambiguation is attainable by increasing the local dimen
`sionality of the underlying reduction matrix, though this
`feature is part of the underlying algorithm, and therefore
`independent of user modulation.
`It should be noted that METASCRIPT is not a speech recog
`nition system. Instead, it is a fully capable natural language
`interpreter. Speci?cally, METASCRIPT translates natural lan
`guage expressions into expressions in a formal language
`associated With an abstract netWork protocol. A more
`detailed account of this process folloWs.
`
`Notation:
`Standard mathematical notation is used to clarify the
`presentation of certain technical features of the system. In
`particular, the folloWing set-theoretical notation appears
`throughout this discussion:
`a) a set is a collection of things, called elements. For
`example, N:{0,l,2,3, .
`.
`. } is the set of natural
`numbers.
`Note: In general, a set A is most conveniently deter
`mined by some property P of its elements, indicated
`by use of so-called “set-builder notation” as A:{x|P
`(x)}?he set of things x satisfying property P.
`b) the expression ‘xeA’ indicates that the thing x is an
`element of the set A
`c) the expression ‘Ag B’ indicates that the set A is a
`subset of the set B, i.e. that every element of A is an
`element of B as Well
`d) a map (or function) is a relation betWeen tWo sets A,B
`such that each element in A is assigned a unique
`element in B. The expression ‘f: A—>B’ indicates that f
`is a map from the set A to the set B, i.e. f assigns a
`unique element y:f(x)eB to each element xeA. The
`composition of maps f: A—>B and g: B—>C on sets
`A,B,C is the map hIgof: AQC de?ned such that
`h(x):g(f(x))eC for any xeA.
`Note: A program is a function Which maps input onto
`output in an effective manner, i.e. by means of a ?nite,
`discrete, deterministic procedure; in fact, any process or
`procedure is effective precisely to the extent that it is
`executable as a program of this sort.
`e) for any sets A,B, the Cartesian product A><B consists of
`all pairs (x,y) such that xeA and yeB, i.e. A><B:{(x,y)
`lxeA, yeB}
`f) for any sets A,B the union AUB is the set consisting of
`all elements x such that xeA or xeB, i.e. AUB:{x|xeA
`or xeB}; for any collection C:{A|0 i j in} of sets Aj for
`some neN, the overall union UC is the set consisting of
`unions over all sets A], i.e. UC:U{A].|0§j§n}:
`AOU .
`.
`. UAW
`g) for any algebras A,B and representation f: AQB, the
`correlated tensor product AfB is the distinguished sub
`set of A><B Which consists of all pairs (x,f(x)) for xeA,
`i.e. AfB:{(x,f(x))|xeA}?he graph of f; for an implicit
`representation, the map subscript may be omitted, i.e.
`ABIAfB for some implicit f: A—>B
`h) for any set A, Seq(A) is the set of ?nite sequences from
`A, i.e. Seq(A):{(aO, .
`.
`. ,an)|ajeA, neN}
`
`De?nitions:
`language: a structure over the folloWing components:
`a) alphabet: a set of basic symbols
`b) punctuation symbols: a set of basic symbols disjoint
`from the alphabet
`c) Words: admissible sequences of basic symbols
`d) punctuations: admissible sequences of punctuation
`symbols
`
`Page 11 of 22
`
`
`
`US 7,085,708 B2
`
`7
`e) expressions: admissible sequences of Words and/or
`punctuations
`f) sentences: complete expressions
`g) syntax: a speci?cation of Which
`sequences of basic symbols are admissible as Words
`sequences of punctuation symbols are admissible as
`punctuations
`sequences of Words and/or punctuations are admissible
`as expressions
`expressions are admissible as sentences
`g) semantics: a scheme of interpretation over Words
`Whereby expressions acquire meaning With respect to
`certain external structures
`A number of languages enter into this discussion:
`1) natural language: any of the human languages in
`current use, eg English, each characterized by an
`informal, and hence notoriously ambiguous, syntax and
`semantics
`2) formal language: a highly structured language, usually
`mathematical in origin and use, characteriZed by a
`uniquely readable, recursive syntax and an extensional,
`usually ?rst-order semantics; in short, a language for
`Which the syntax and semantics is effectively unam
`biguous
`3) object language: a formal language Which is interpret
`able relative to a class of extensional structures, i.e. a
`formal language With an object-oriented semantics
`4) protocol language: a formal language Which mediates
`transactions betWeen addressable nodes on a netWork
`5) executable language: a formal, programmable language
`Which encodes instructions directly implementable by a
`suitably capable machine such as a computer
`system: the integrated process Which manifests METASCRIPT,
`and Which may be implemented as softWare running on
`any programmable device
`string: a sequence of ASCII characters
`text: a string presented to the system as the fundamental unit
`of initial input
`parser: a process Which partitions texts into sequences of
`sequences o