lex & yacc

UNIX Programming Tools

O'REILLY®

John R. Levine, Tony Mason & Doug Brown
lex & yacc
by John R. Levine, Tony Mason and Doug Brown

Copyright © 1990, 1992 O'Reilly & Associates, Inc. All rights reserved.
Printed in the United States of America.

Editor: Dale Dougherty

Printing History:

May 1990:       First Edition.
January 1991:   Minor corrections.
October 1992:   Second Edition. Major revisions.
February 1995:  Minor corrections.
Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks and The Java Series is a trademark of O'Reilly & Associates, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly and Associates, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher assumes no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 1-56592-000-7
Availability of Lex and Yacc

Lex and yacc were both developed at Bell Laboratories in the 1970s. Yacc was the first of the two, developed by Stephen C. Johnson. Lex was designed by Mike Lesk and Eric Schmidt to work with yacc. Both lex and yacc have been standard UNIX utilities since 7th Edition UNIX. System V and older versions of BSD use the original AT&T versions, while the newest version of BSD uses flex (see below) and Berkeley yacc. The articles written by the developers remain the primary source of information on lex and yacc.
The GNU Project of the Free Software Foundation distributes bison, a yacc replacement; bison was written by Robert Corbett and Richard Stallman. The bison manual, written by Charles Donnelly and Richard Stallman, is excellent, especially for referencing specific features. Appendix D discusses bison.
BSD and the GNU Project also distribute flex (Fast Lexical Analyzer Generator), "a rewrite of lex intended to right some of that tool's deficiencies," according to its reference page. Flex was originally written by Jef Poskanzer; Vern Paxson and Van Jacobson have considerably improved it and Vern currently maintains it. Appendix E covers topics specific to flex.
There are at least two versions of lex and yacc available for MS-DOS and OS/2 machines. MKS (Mortice Kern Systems Inc.), publishers of the MKS Toolkit, offers lex and yacc as a separate product that supports many PC C compilers. MKS lex and yacc comes with a very good manual. Appendix F covers MKS lex and yacc. Abraxas Software publishes PCYACC, a version of lex and yacc which comes with sample parsers for a dozen widely used programming languages. Appendix G covers Abraxas' version of lex and yacc.
Sample Programs

The programs in this book are available free from UUNET (that is, free except for UUNET's usual connect-time charges). If you have access to UUNET, you can retrieve the source code using UUCP or FTP. For UUCP, find a machine with direct access to UUNET, and type the following command:

% uucp uunet\!~/nutshell/lexyacc/progs.tar.Z yourhost\!~/yourname/

The backslashes can be omitted if you use the Bourne shell (sh) instead of the C shell (csh). The file should appear some time later (up to a day or more) in the directory /usr/spool/uucppublic/yourname. If you don't have
an account but would like one so that you can get electronic mail, then contact UUNET at 703-204-8000.
To use ftp, find a machine with direct access to the Internet. Here is a sample session, with commands in boldface.

% ftp ftp.oreilly.com
Connected to ftp.oreilly.com.
220 FTP server (Version 5.99 Wed May 23 14:40:19 EDT 1990) ready.
Name (ftp.oreilly.com:yourname): anonymous
331 Guest login ok, send ident as password.
Password: ambar@ora.com   (use your user name and host here)
230 Guest login ok, access restrictions apply.
ftp> cd published/oreilly/nutshell/lexyacc
250 CWD command successful.
ftp> binary   (you must specify binary transfer for compressed files)
200 Type set to I.
ftp> get progs.tar.Z
200 PORT command successful.
150 Opening BINARY mode data connection for progs.tar.Z.
226 Transfer complete.
ftp> quit
221 Goodbye.
%
The file is a compressed tar archive. To extract files once you have retrieved the archive, type:

% zcat progs.tar.Z | tar xf -

System V systems require the following tar command instead:

% zcat progs.tar.Z | tar xof -
Conventions Used in This Handbook

The following conventions are used in this book:

Bold           is used for statements and functions, identifiers, and program
               names.

Italic         is used for file, directory, and command names when they
               appear in the body of a paragraph as well as for data types
               and to emphasize new terms and concepts when they are
               introduced.

Constant
Width          is used in examples to show the contents of files or the
               output from commands.

Constant
Bold           is used in examples to show command lines and options that
               you type literally.

Quotes         are used to identify a code fragment in explanatory text.
               System messages, signs, and symbols are quoted as well.

%              is the shell prompt.

[]             surround optional elements in a description of program
               syntax. (Don't type the brackets themselves.)
Acknowledgments

The first edition of this book began with Tony Mason's MGL and SGL compilers. Tony developed most of the material in this book, working with Dale Dougherty to make it a "Nutshell." Doug Brown contributed Chapter 8, Yacc Ambiguities and Conflicts. Dale also edited and revised portions of the book. Tim O'Reilly made it a better book by withholding his editorial blessing until he found what he was looking for in the book. Thanks to Butch Anton, Ed Engler, and Mike Loukides for their comments on technical content. Thanks also to John W. Lockhart for reading a draft with an eye for stylistic issues. And thanks to Chris Reilley for his work on the graphics. Finally, Ruth Terry brought the book into print with her usual diligence and her sharp eye for every editorial detail. Though she was trying to work odd hours to also care for her family, it seemed she was caring for this book all hours of the day.

For the second edition, Tony rewrote Chapters 1 and 2, and Doug updated Chapter 8. John Levine wrote Chapters 3, 5, 6, 7, and most of the appendices, and edited the rest of the text. Thanks to the technical reviewers, Bill Burke, Warren Carithers, Jon Mauney, Gary Merrill, Eugene Miya, Andy Oram, Bill Torcaso, and particularly Vern Paxson, whose detailed page-by-page suggestions made the fine points much clearer. Margaret Levine Young's blue pencil (which was actually pink) tightened up the text and gave the book editorial consistency. She also compiled most of the index. Chris Reilly again did the graphics, and Donna Woonteiler did the final editing and shepherded the book through the production process.
Lex and Yacc

In this chapter:
• The Simplest Lex Program
• Recognizing Words with Lex
• Grammars
• Running Lex and Yacc
• Lex vs. Hand-written Lexers
• Exercises
Lex and yacc help you write programs that transform structured input. This includes an enormous range of applications, anything from a simple text search program that looks for patterns in its input file to a C compiler that transforms a source program into optimized object code.

In programs with structured input, two tasks that occur over and over are dividing the input into meaningful units, and then discovering the relationship among the units. For a text search program, the units would probably be lines of text, with a distinction between lines that contain a match of the target string and lines that don't. For a C program, the units are variable names, constants, strings, operators, punctuation, and so forth. This division into units (which are usually called tokens) is known as lexical analysis, or lexing for short. Lex helps you by taking a set of descriptions of possible tokens and producing a C routine, which we call a lexical analyzer, or a lexer, or a scanner for short, that can identify those tokens. The set of descriptions you give to lex is called a lex specification.

The token descriptions that lex uses are known as regular expressions, extended versions of the familiar patterns used by the grep and egrep commands. Lex turns these regular expressions into a form that the lexer can use to scan the input text extremely fast, independent of the number of expressions that it is trying to match. A lex lexer is almost always faster than a lexer that you might write in C by hand.

As the input is divided into tokens, a program often needs to establish the relationship among the tokens. A C compiler needs to find the expressions, statements, declarations, blocks, and procedures in the program. This task is known as parsing and the list of rules that define the relationships that
the program understands is a grammar. Yacc takes a concise description of a grammar and produces a C routine that can parse that grammar, a parser. The yacc parser automatically detects whenever a sequence of input tokens matches one of the rules in the grammar and also detects a syntax error whenever its input doesn't match any of the rules. A yacc parser is generally not as fast as a parser you could write by hand, but the ease in writing and modifying the parser is invariably worth any speed loss. The amount of time a program spends in a parser is rarely enough to be an issue anyway.

When a task involves dividing the input into units and establishing some relationship among those units, you should think of lex and yacc. (A search program is so simple that it doesn't need to do any parsing so it uses lex but doesn't need yacc. We'll see this again in Chapter 2, where we build several applications using lex but not yacc.)

By now, we hope we've whetted your appetite for more details. We do not intend for this chapter to be a complete tutorial on lex and yacc, but rather a gentle introduction to their use.
The Simplest Lex Program

This lex program copies its standard input to its standard output:

%%
.|\n    ECHO;
%%

It acts very much like the UNIX cat command run with no arguments.

Lex automatically generates the actual C program code needed to handle reading the input file and sometimes, as in this case, writing the output as well.
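
If you want to try this one-liner yourself, the build steps are the same as the ones shown later in this chapter for the word recognizer. A minimal sketch, assuming your system provides lex and the lex library (the file name simplest.l is our own choice, not the book's):

% lex simplest.l
% cc lex.yy.c -o simplest -ll
% echo "hello, world" | ./simplest
hello, world

The -ll library supplies a default main() that simply calls yylex(), which is why this specification needs no C code of its own.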

Whether you use lex and yacc to build parts of your program or to build tools to aid you in programming, once you master them they will prove their worth many times over by simplifying difficult input handling problems, providing a more easily maintainable code base, and allowing for easier "tinkering" to get the right semantics for your program.
Recognizing Words with Lex

Let's build a simple program that recognizes different types of English words. We start by identifying parts of speech (noun, verb, etc.) and will later extend it to handle multiword sentences that conform to a simple English grammar.

We start by listing a set of verbs to recognize:

is        am        are       were
was       be        being     been
do        does      did       will
would     should    can       could
has       have      had       go

Example 1-1 shows a simple lex specification to recognize these verbs.
Example 1-1: Word recognizer ch1-02.l

%{
/*
 * this sample demonstrates (very) simple recognition:
 * a verb/not a verb.
 */
%}

%%

[\t ]+                /* ignore whitespace */ ;

is |
am |
are |
were |
was |
be |
being |
been |
do |
does |
did |
will |
would |
should |
can |
could |
has |
have |
had |
go                    { printf("%s: is a verb\n", yytext); }

[a-zA-Z]+             { printf("%s: is not a verb\n", yytext); }

.|\n                  { ECHO; /* normal default anyway */ }
%%

main()
{
    yylex();
}

Here's what happens when we compile and run this program. What we type is in bold.

% example1
did I have fun?
did: is a verb
I: is not a verb
have: is a verb
fun: is not a verb
?
^D
%
To explain what's going on, let's start with the first section:

%{
/*
 * This sample demonstrates very simple recognition:
 * a verb/not a verb.
 */

This first section, the definition section, introduces any initial C program code we want copied into the final program. This is especially important if, for example, we have header files that must be included for code later in the file to work. We surround the C code with the special delimiters "%{" and "%}." Lex copies the material between "%{" and "%}" directly to the generated C file, so you may write any valid C code here.

In this example, the only thing in the definition section is some C comments. You might wonder whether we could have included the comments without the delimiters. Outside of "%{" and "%}", comments must be indented with whitespace for lex to recognize them correctly. We've seen some amazing bugs when people forgot to indent their comments and lex interpreted them as something else.
The %% marks the end of this section.

The next section is the rules section. Each rule is made up of two parts: a pattern and an action, separated by whitespace. The lexer that lex generates will execute the action when it recognizes the pattern. These patterns are UNIX-style regular expressions, a slightly extended version of the same expressions used by tools such as grep, sed, and ed. Chapter 6 describes all the rules for regular expressions. The first rule in our example is the following:

[\t ]+                /* ignore whitespace */ ;

The square brackets, "[ ]", indicate that any one of the characters within the brackets matches the pattern. For our example, we accept either "\t" (a tab character) or " " (a space). The "+" means that the pattern matches one or more consecutive copies of the subpattern that precedes the plus. Thus, this pattern describes whitespace (any combination of tabs and spaces). The second part of the rule, the action, is simply a semicolon, a do-nothing C statement. Its effect is to ignore the input.

The next set of rules uses the "|" (vertical bar) action. This is a special action that means to use the same action as the next pattern, so all of the verbs use the action specified for the last one.*
Our first set of patterns is:

is |
am |
are |
were |
was |
be |
being |
been |
do |
does |
did |
will |
would |
should |
can |
could |
has |
have |
had |
go        { printf("%s: is a verb\n", yytext); }

*You can also use a vertical bar within a pattern, e.g., foo|bar is a pattern that matches either the string "foo" or the string "bar." We leave some space between the pattern and the vertical bar to indicate that "bar" is the action, not part of the pattern.
Our patterns match any of the verbs in the list. Once we recognize a verb, we execute the action, a C printf statement. The array yytext contains the text that matched the pattern. This action will print the recognized verb followed by the string ": is a verb\n".

The last two rules are:

[a-zA-Z]+             { printf("%s: is not a verb\n", yytext); }

.|\n                  { ECHO; /* normal default anyway */ }

The pattern "[a-zA-Z]+" is a common one: it indicates any alphabetic string with at least one character. The "-" character has a special meaning when used inside square brackets: it denotes a range of characters beginning with the character to the left of the "-" and ending with the character to its right. Our action when we see one of these patterns is to print the matched token and the string ": is not a verb\n".
It doesn't take long to realize that any word that matches any of the verbs listed in the earlier rules will match this rule as well. You might then wonder why it won't execute both actions when it sees a verb in the list. And would both actions be executed when it sees the word "island," since "island" starts with "is"? The answer is that lex has a set of simple disambiguating rules. The two that make our lexer work are:

1. Lex patterns only match a given input character or string once.

2. Lex executes the action for the longest possible match for the current input. Thus, lex would see "island" as matching our all-inclusive rule because that was a longer match than "is."

If you think about how a lex lexer matches patterns, you should be able to see how our example matches only the verbs listed.
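
To see the longest-match rule at work, you can type a word like "island" to the lexer from Example 1-1. This session is a sketch of what the rules above imply, not a run reproduced from the book:

% example1
is
is: is a verb
island
island: is not a verb

The first line matches the "is" pattern, but the second matches the catch-all "[a-zA-Z]+" rule, because a six-character match beats a two-character one.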

The last line is the default case. The special character "." (period) matches any single character other than a newline, and "\n" matches a newline character. The special action ECHO prints the matched pattern on the output, copying any punctuation or other characters. We explicitly list this case although it is the default behavior. We have seen some complex lexers that worked incorrectly because of this very feature, producing occasional strange output when the default pattern matched unanticipated input characters. (Even though there is a default action for unmatched input characters, well-written lexers invariably have explicit rules to match all possible input.)
The end of the rules section is delimited by another %%.

The final section is the user subroutines section, which can consist of any legal C code. Lex copies it to the C file after the end of the lex-generated code. We have included a main() program.

%%

main()
{
    yylex();
}

The lexer produced by lex is a C routine called yylex(), so we call it.* Unless the actions contain explicit return statements, yylex() won't return until it has processed the entire input.

*Actually, we could have left out the main program because the lex library contains a default main routine just like this one.
We placed our original example in a file called ch1-02.l since it is our second example. To create an executable program on our UNIX system we enter these commands:

% lex ch1-02.l
% cc lex.yy.c -o first -ll

Lex translates the lex specification into a C source file called lex.yy.c which we compiled and linked with the lex library, -ll. We then execute the resulting program to check that it works as we expect, as we saw earlier in this section. Try it to convince yourself that this simple description really does recognize exactly the verbs we decided to recognize.
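
If you have flex rather than AT&T lex (see the Preface), the same build works with slightly different names. A minimal sketch, assuming flex and its library are installed as flex and -lfl:

% flex ch1-02.l
% cc lex.yy.c -o first -lfl

Flex also writes its output to lex.yy.c, so only the generator command and the library name change.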

Now that we've tackled our first example, let's "spruce it up." Our second example, Example 1-2, extends the lexer to recognize different parts of speech.
Example 1-2: Lex example with multiple parts of speech ch1-03.l

%{
/*
 * We expand upon the first example by adding recognition of some other
 * parts of speech.
 */
%}

%%

[\t ]+                /* ignore whitespace */ ;

is |
am |
are |
were |
was |
be |
being |
been |
do |
does |
did |
will |
would |
should |
can |
could |
has |
have |
had |
go          { printf("%s: is a verb\n", yytext); }

very |
simply |
gently |
quietly |
calmly |
angrily     { printf("%s: is an adverb\n", yytext); }

to |
from |
behind |
above |
below |
between |
below       { printf("%s: is a preposition\n", yytext); }

if |
then |
and |
but |
or          { printf("%s: is a conjunction\n", yytext); }

their |
my |
your |
his |
her |
its         { printf("%s: is an adjective\n", yytext); }

I |
you |
he |
she |
we |
they        { printf("%s: is a pronoun\n", yytext); }

[a-zA-Z]+   {
             printf("%s: don't recognize, might be a noun\n", yytext);
            }

.|\n        { ECHO; /* normal default anyway */ }
%%

main()
{
    yylex();
}

Symbol Tables

Our second example isn't really very different. We list more words than we did before, and in principle we could extend this example to as many words as we want. It would be more convenient, though, if we could build a table of words as the lexer is running, so we can add new words without modifying and recompiling the lex program. In our next example, we do just that: we allow for the dynamic declaration of parts of speech as the lexer is running, reading the words to declare from the input file. Declaration lines start with the name of a part of speech followed by the words to declare. These lines, for example, declare four nouns and three verbs:

noun dog cat horse cow
verb chew eat lick

The table of words is a simple symbol table, a common structure in lex and yacc applications. A C compiler, for example, stores the variable and structure names, labels, enumeration tags, and all other names used in the program in its symbol table. Each name is stored along with information describing the name. In a C compiler the information is the type of symbol, declaration scope, variable type, etc. In our current example, the information is the part of speech.

Adding a symbol table changes the lexer quite substantially. Rather than putting separate patterns in the lexer for each word to match, we have a single pattern that matches any word and we consult the symbol table to decide which part of speech we've found. The names of parts of speech
(noun, verb, etc.) are now "reserved words" since they introduce a declaration line. We still have a separate lex pattern for each reserved word. We also have to add symbol table maintenance routines, in this case add_word(), which puts a new word into the symbol table, and lookup_word(), which looks up a word which should already be entered.

In the program's code, we declare a variable state that keeps track of whether we're looking up words, state LOOKUP, or declaring them, in which case state remembers what kind of words we're declaring. Whenever we see a line starting with the name of a part of speech, we set the state to declare that kind of word; each time we see a \n we switch back to the normal lookup state.

Example 1-3 shows the definition section.

Example 1-3: Lexer with symbol table (part 1 of 3) ch1-04.l

%{
/*
 * Word recognizer with a symbol table.
 */

enum {
    LOOKUP = 0, /* default - looking rather than defining. */
    VERB,
    ADJ,
    ADV,
    NOUN,
    PREP,
    PRON,
    CONJ
};

int state;

int add_word(int type, char *word);
int lookup_word(char *word);
%}

We define an enum that we use in our table to record the types of individual words, and we declare a variable state. We use this enumerated type both in the state variable, to track what we're defining, and in the symbol table, to record what type each defined word is. We also declare our symbol table routines.

Example 1-4 shows the rules section.
Example 1-4: Lexer with symbol table (part 2 of 3) ch1-04.l

%%

\n      { state = LOOKUP; }    /* end of line, return to default state */

        /* whenever a line starts with a reserved part of speech name */
        /* start defining words of that type */
^verb   { state = VERB; }
^adj    { state = ADJ; }
^adv    { state = ADV; }
^noun   { state = NOUN; }
^prep   { state = PREP; }
^pron   { state = PRON; }
^conj   { state = CONJ; }

[a-zA-Z]+ {
            /* a normal word, define it or look it up */
            if(state != LOOKUP) {
                /* define the current word */
                add_word(state, yytext);
            } else {
                switch(lookup_word(yytext)) {
                case VERB: printf("%s: verb\n", yytext); break;
                case ADJ:  printf("%s: adjective\n", yytext); break;
                case ADV:  printf("%s: adverb\n", yytext); break;
                case NOUN: printf("%s: noun\n", yytext); break;
                case PREP: printf("%s: preposition\n", yytext); break;
                case PRON: printf("%s: pronoun\n", yytext); break;
                case CONJ: printf("%s: conjunction\n", yytext); break;
                default:
                    printf("%s: don't recognize\n", yytext);
                    break;
                }
            }
          }

.       /* ignore anything else */ ;

%%

For declaring words, the first group of rules sets the state to the type corresponding to the part of speech being declared. (The caret, "^", at the beginning of the pattern makes the pattern match only at the beginning of an input line.) We reset the state to LOOKUP at the beginning of each line so that after we add new words interactively we can test our table of words to determine if it is working correctly. If the state is LOOKUP when the pattern "[a-zA-Z]+" matches, we look up the word, using lookup_word(), and if found print out its type. If we're in any other state, we define the word with add_word().
The user subroutines section in Example 1-5 contains the same skeletal main() routine and our two supporting functions.

Example 1-5: Lexer with symbol table (part 3 of 3) ch1-04.l

main()
{
    yylex();
}

/* define a linked list of words and types */
struct word {
    char *word_name;
    int word_type;
    struct word *next;
};

struct word *word_list; /* first element in word list */

extern void *malloc();

int
add_word(int type, char *word)
{
    struct word *wp;

    if(lookup_word(word) != LOOKUP) {
        printf("!!! warning: word %s already defined \n", word);
        return 0;
    }

    /* word not there, allocate a new entry and link it on the list */
    wp = (struct word *) malloc(sizeof(struct word));

    wp->next = word_list;

    /* have to copy the word itself as well */
    wp->word_name = (char *) malloc(strlen(word)+1);
    strcpy(wp->word_name, word);
    wp->word_type = type;
    word_list = wp;
    return 1;   /* it worked */
}

int
lookup_word(char *word)
{
    struct word *wp = word_list;

    /* search down the list looking for the word */
    for(; wp; wp = wp->next) {
        if(strcmp(wp->word_name, word) == 0)
            return wp->word_type;
    }

    return LOOKUP;  /* not found */
}

These last two functions create and search a linked list of words. If there are a lot of words, the functions will be slow since, for each word, they might have to search through the entire list. In a production environment we would use a faster but more complex scheme, probably using a hash table. Our simple example does the job, albeit slowly.
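
As a concrete illustration of that faster scheme, here is a minimal sketch of a hash-table version of lookup_word(). It is our own sketch, not the book's code; the table size and the names hash_buckets and hash_word are assumptions introduced for illustration, and add_word() would have to link new entries onto hash_buckets[hash_word(word)] instead of word_list:

#define NHASH 251                 /* table size; a small prime, chosen arbitrarily */

static struct word *hash_buckets[NHASH];

/* a simple string hash; any reasonable hash function would do */
static unsigned
hash_word(const char *s)
{
    unsigned h = 0;

    while (*s)
        h = h * 31 + (unsigned char) *s++;
    return h % NHASH;
}

int
lookup_word(char *word)
{
    /* search only the one chain this word can hash to */
    struct word *wp = hash_buckets[hash_word(word)];

    for (; wp; wp = wp->next)
        if (strcmp(wp->word_name, word) == 0)
            return wp->word_type;
    return LOOKUP;      /* not found */
}

Lookups now touch only the words that share a hash bucket rather than the whole list, so the cost stays roughly constant as the vocabulary grows.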

Here is an example of a session we had with our last example:

verb is am are was were be being been do
is
is: verb
noun dog cat horse cow
verb chew eat lick
verb run stand sleep
dog run
dog: noun
run: verb
chew eat sleep cow horse
chew: verb
eat: verb
sleep: verb
cow: noun
horse: noun
verb talk
talk
talk: verb

We strongly encourage you to play with this example until you are satisfied you understand it.
Grammars

For some applications, the simple kind of word recognition we've already done may be more than adequate; others need to recognize specific sequences of tokens and perform appropriate actions. Traditionally, a description of such a set of actions is known as a grammar. It seems
especially appropriate for our example. Suppose that we wished to recognize common sentences. Here is a list of simple sentence types:

noun verb.
noun verb noun.

At this point, it seems convenient to introduce some notation for describing grammars. We use the right-facing arrow, "→", to mean that a particular set of tokens can be replaced by a new symbol.* For instance:

subject → noun | pronoun

would indicate that the new symbol subject is either a noun or a pronoun. We haven't changed the meaning of the underlying symbols; rather we have built our new symbol from the more fundamental symbols we've already defined. As an added example we could define an object as follows:

object → noun

While not strictly correct as English grammar, we can now define a sentence:

sentence → subject verb object

Indeed, we could expand this definition of sentence to fit a much wider variety of sentences. However, at this stage we would like to build a yacc grammar so we can test our ideas out interactively. Before we introduce our yacc grammar, we must modify our lexical analyzer in order to return values useful to our new parser.
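
To give a flavor of what these rules look like in yacc's input language, here is a sketch we have written from the rules above; the book's own grammar appears later in the chapter and may differ in detail, and the token names are the ones defined in the next section:

%token NOUN PRONOUN VERB ADVERB ADJECTIVE PREPOSITION CONJUNCTION

%%
sentence: subject VERB object   { printf("Sentence is valid.\n"); }
        ;

subject:  NOUN
        | PRONOUN
        ;

object:   NOUN
        ;
%%

Lowercase names such as sentence and subject are symbols built by the grammar itself, while the uppercase names are tokens supplied by the lexer.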

Parser-Lexer Communication

When you use a lex scanner and a yacc parser together, the parser is the higher level routine. It calls the lexer yylex() whenever it needs a token from the input. The lexer then scans through the input recognizing tokens. As soon as it finds a token of interest to the parser, it returns to the parser, returning the token's code as the value of yylex().
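
Conceptually, and greatly simplified (the generated parser is table driven, not a simple loop), the relationship looks something like this sketch of ours:

/* not real yacc output; just the calling pattern between parser and lexer */
int token;

while ((token = yylex()) != 0) {    /* 0 means end of input */
    /* the parser consumes the token, applying grammar rules as they complete */
}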

Not all tokens are of interest to the parser; in most programming languages the parser doesn't want to hear about comments and whitespace, for example. For these ignored tokens, the lexer doesn't return so that it can continue on to the next token without bothering the parser.

*We say symbol rather than token here, because we reserve the name "token" for symbols returned from the lexer, and the symbol to the left of the arrow did not come from the lexer. All tokens are symbols, but not all symbols are tokens.

The lexer and the parser have to agree what the token codes are. We solve this problem by letting yacc define the token codes. The tokens in our grammar are the parts of speech: NOUN, PRONOUN, VERB, ADVERB, ADJECTIVE, PREPOSITION, and CONJUNCTION. Yacc defines each of these as a small integer using a preprocessor #define. Here are the definitions it used in this example:

# define NOUN 257
# define PRONOUN 258
# define VERB 259
# define ADVERB 260
# define ADJECTIVE 261
# define PREPOSITION 262
# define CONJUNCTION 263

Token code zero is always returned for the logical end of the input. Yacc doesn't define a symbol for it, but you can yourself if you want.

Yacc can optionally write a C header file containing all of the token definitions. You include this file, called y.tab.h on UNIX systems and ytab.h or yytab.h on MS-DOS, in the lexer and use the preprocessor symbols in your lexer action code.

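That header is produced when yacc is run with its -d flag. One possible build for this lexer/parser pair, assuming the grammar lives in a file named ch1-05.y (our name for it, matching the lexer's ch1-05.l) and supplies main() and yyerror():

% yacc -d ch1-05.y        (writes y.tab.c, the parser, and y.tab.h, the token codes)
% lex ch1-05.l            (the lexer #includes y.tab.h)
% cc y.tab.c lex.yy.c -o example -ll
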
The Parts of Speech Lexer

Example 1-6 shows the declarations and rules sections of the new lexer.

Example 1-6: Lexer to be called from the parser ch1-05.l

%{
/*
 * We now build a lexical analyzer to be used by a higher-level parser.
 */

#include "y.tab.h"    /* token codes from the parser */

#define LOOKUP 0      /* default - not a defined word type. */

int state;
%}

%%

\n      { state = LOOKUP; }

\.\n    { state = LOOKUP;
          return 0; /* end of sentence */
        }

^verb   { state = VERB; }
^adj    { state = ADJECTIVE; }
^adv    { state = ADVERB; }
^noun   { state = NOUN; }
^prep   { state = PREPOSITION; }
^pron   { state = PRONOUN; }
^conj   { state = CONJUNCTION; }

[a-zA-Z]+ {
            if(state != LOOKUP) {
                add_word(state, yytext);
            } else {
                switch(lookup_word(yytext)) {
                case VERB:
                    return(VERB);
                case ADJECTIVE:
                    return(ADJECTIVE);
                case ADVERB:
                    return(ADVERB);
                case NOUN:
                    return(NOUN);
                case PREPOSITION:
                    return(PREPOSITION);
                case PRONOUN:
                    return(PRONOUN);
                case CONJUNCTION:
                    return(CONJUNCTION);
                default:
                    printf("%s: don't recognize\n", yytext);
                    /* don't return, just ignore it */
                }
            }
          }

%%

... same add_word() and lookup_word() as before ...

There are several important differences here. We've changed the part of speech names used in the lexer to agree with the token names in the parser. We have also added return statements to pass to the parser the token codes for the words that it recognizes. There aren't any return statements for the tokens that define new words to the lexer, since the parser doesn't care about them.
`the parser calls it, it takes up processing at the exact point it left off. This
`allows us to examine and operate upon the input stream incrementally.
`Our first programs didn't need to take advantage of this, but it becomes
`more useful as we use th

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket