yacc(1)
NAME
yacc - Generates an LR(1) parsing program from input consisting of a
context-free grammar specification
SYNOPSIS
yacc [-vltds] [-b prefix] [-N number] [-p symbol_prefix] [-P pathname]
grammar
The yacc command converts a context-free grammar specification into a set
of tables for a simple automaton that executes an LR(1) parsing algorithm.
STANDARDS
Interfaces documented on this reference page conform to industry standards
as follows:
yacc: XPG4, XPG4-UNIX
Refer to the standards(5) reference page for more information about
industry standards and associated tags.
OPTIONS
-b prefix
Uses prefix instead of y as the prefix for all output filenames
(prefix.tab.c, prefix.tab.h, and prefix.output).
-d Produces the <y.tab.h> file, which contains the #define statements that
associate the yacc-assigned token codes with your token names. This
allows source files other than y.tab.c to access the token codes by
including this header file.
-l Includes no #line constructs in y.tab.c. Use this only after the
grammar and associated actions are fully debugged.
-N number
[Compaq] Provides yacc with extra storage for building its LALR
tables, which may be necessary when compiling very large grammars. The
number should be larger than 40,000 when you use this option.
-p symbol_prefix
Allows multiple yacc parsers to be linked together. Use symbol_prefix
instead of yy to prefix global symbols.
-P pathname
[Compaq] Specifies an alternative parser (instead of
/usr/ccs/lib/yaccpar). The pathname specifies the filename of the
skeleton to be used in place of yaccpar.
-s [Compaq] Breaks the yyparse() function into several smaller functions.
Because its size is somewhat proportional to that of the grammar, it is
possible for yyparse() to become too large to compile, optimize, or
execute efficiently.
-t Compiles run-time debugging code. By default, this code is not
included when y.tab.c is compiled. If YYDEBUG has a nonzero value, the
C compiler (cc) includes the debugging code, whether or not the -t
option was used. Without compiling this code, yyparse() will run more
quickly.
-v Produces the y.output file, which contains a readable description of
the parsing tables and a report on conflicts generated by grammar
ambiguities.
OPERANDS
grammar
The pathname of a file containing input instructions. The format of
this file is described in the section Syntax for yacc Input under the
DESCRIPTION.
DESCRIPTION
The yacc grammar can be ambiguous; specified precedence rules are used to
break ambiguities.
You must compile the y.tab.c output file with a C language compiler to
produce the yyparse() function. This function must be loaded with a yylex
lexical analyzer function, as well as main() and yyerror(), an error-
handling routine (you must provide these routines). The lex command is
useful for creating lexical analyzers usable by yacc.
The yacc program reads its skeleton parser from the file
/usr/ccs/lib/yaccpar. Use the environment variable YACCPAR to specify
another location for the yacc program to read from. If you use this
environment variable, the -P option is ignored, if specified.
Syntax for yacc Input
This section contains a formal description of the yacc input file (or
grammar file), which is normally named with a .y suffix. The section
provides a listing of the special values, macros, and functions recognized
by yacc.
The general format of the yacc input file is:
[ definitions ]
%%
[ rules ]
[ %%
[ user functions ]]
where
definitions
Is the section where you define the variables to be used later in the
grammar, such as in the rules section. It is also where files are
included (#include) and processing conditions are defined. This
section is optional.
rules
Is the section that contains grammar rules for the parser. A yacc
input file must have a rules section.
user functions
Is the section that contains user-supplied functions that can be used
by the actions in the rules section. This section is optional.
The NUL character must not be used in grammar rules or literals. Each
line in the definitions can be:
%{
%} When placed on lines by themselves, these enclose C code to be passed
into the global definitions of the output file. Such lines commonly
include preprocessor directives and declarations of external variables
and functions.
%token [<type>] token [number] [name [number]]...
Lists tokens or terminal symbols to be used in the rest of the input
file. This line is needed for tokens that do not appear in other %
definitions. If type is present, the C type for all tokens on this line
is declared to be the type referenced by type. If a positive integer
number follows a token, that value is assigned to the token.
%left [<type>] token [number] [name [number]]...
Indicates that each token is an operator, that all tokens in this
definition have equal precedence, and that a succession of the
operators listed in this definition are evaluated left to right.
%right [<type>] token [number] [name [number]]...
Indicates that each token is an operator, that all tokens in this
definition have equal precedence, and that a succession of the
operators listed in this definition are evaluated right to left.
%nonassoc [<type>] name [number] [name [number]]...
Indicates that each token is an operator, and that the operators listed
in this definition cannot appear in succession. Indicates that the
token cannot be used associatively.
%start symbol
Indicates the highest-level production rule to be reduced; in other
words, the rule where the parser can consider its work done and
terminate. If this definition is not included, the parser uses the
first production rule. The symbol must be non-terminal (not a token).
%type <type> symbol [symbol ...]
Defines each symbol as data type type, to resolve ambiguities. If this
construct is present, yacc performs type checking; otherwise, it
assumes all symbols to be of type integer.
%union union-def
Defines the yylval global variable as a union, where union-def is a
standard C definition in the format:
{ type member ; [type member ; ... ] }
At least one member should be an int. Any valid C data type can be
defined, including structures. When you run yacc with the -d option,
the definition of yylval is placed in the <y.tab.h> file and can be
referred to in a lex input file.
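As an illustration of the %union definition, the following sketch shows
roughly what a lexical analyzer sees for a declaration such as
%union { int ival; double dval; }. The member names ival and dval and the
helper set_int_value() are hypothetical; YYSTYPE is the type name that the
calc.l example on this page also uses. In a real program, the type and the
yylval variable come from the files that yacc generates, not from
handwritten code.

```c
/* Hypothetical union, as %union { int ival; double dval; } might yield. */
typedef union {
    int    ival;   /* value carried by integer tokens */
    double dval;   /* value carried by floating-point tokens */
} YYSTYPE;

/* In a real parser this variable is defined in y.tab.c. */
YYSTYPE yylval;

/* What a lexer action would do just before returning an integer token. */
void set_int_value(int v)
{
    yylval.ival = v;
}
```

A lexer action would call set_int_value(42) (or assign yylval.ival
directly) and then return the token code.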
Every token (terminal symbol) must be listed in one of the preceding %
definitions. Multiple tokens can be separated by white space or commas.
All the tokens in %left, %right, and %nonassoc definitions are assigned a
precedence with tokens in later definitions having precedence over those in
earlier definitions.
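The effect of precedence and associativity on evaluation can be sketched
with a small precedence-climbing evaluator. This is not how yacc works
internally (yacc encodes precedence in its parsing tables); it only shows
the results that the declarations produce. The operator table mirrors
%left '+' '-' followed by %left '*' '/' and, as an assumption for the sake
of the example, %right '^'. All names are hypothetical, and operands are
single digits to keep the sketch short.

```c
#include <stddef.h>

/* Later rows bind tighter, like later %left/%right definitions. */
struct op { char ch; int prec; int right_assoc; };

static const struct op ops[] = {
    { '+', 1, 0 }, { '-', 1, 0 },   /* %left '+' '-'  */
    { '*', 2, 0 }, { '/', 2, 0 },   /* %left '*' '/'  */
    { '^', 3, 1 },                  /* %right '^'     */
};

static const struct op *find_op(char c)
{
    for (size_t i = 0; i < sizeof ops / sizeof ops[0]; i++)
        if (ops[i].ch == c)
            return &ops[i];
    return NULL;
}

static long ipow(long b, long e)
{
    long r = 1;
    while (e-- > 0)
        r *= b;
    return r;
}

static long apply(char c, long a, long b)
{
    switch (c) {
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    case '/': return b != 0 ? a / b : 0;  /* sketch: no error reporting */
    default:  return ipow(a, b);          /* '^' */
    }
}

/* Parse operators of precedence >= min_prec; *s advances past the input. */
static long parse_expr(const char **s, int min_prec)
{
    long lhs = *(*s)++ - '0';             /* single-digit operand */
    const struct op *o;

    while ((o = find_op(**s)) != NULL && o->prec >= min_prec) {
        (*s)++;
        /* Left-associative: the right operand may hold only tighter
           operators. Right-associative: it may hold the same level. */
        long rhs = parse_expr(s, o->right_assoc ? o->prec : o->prec + 1);
        lhs = apply(o->ch, lhs, rhs);
    }
    return lhs;
}

long eval(const char *s)
{
    return parse_expr(&s, 1);
}
```

With this table, eval("8-3-2") groups as (8-3)-2, eval("2^3^2") groups as
2^(3^2), and eval("1+2*3") multiplies before adding.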
In addition to symbols, a token can be a literal character enclosed in single
quotes. (Multibyte characters are recognized by the lexical analyzer and
returned as tokens.) The following special characters can be used, just as
in C programs:
\a Alert
\n Newline
\t Tab
\v Vertical tab
\r Carriage Return
\b Backspace
\f Form Feed
\\ Backslash
\' Single Quote
\? Question mark
\nnn One or more octal digits specifying the integer value of the character
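Because the token number of a literal is the numeric value of the
character, C character constants show the values that the parser
receives. The helper literal_token() below is hypothetical, and the
specific values mentioned after the code assume an ASCII-based character
set.

```c
/* Token number for a single-character literal: its value taken as an
   unsigned char, so that the result is never negative. */
int literal_token(char c)
{
    return (unsigned char)c;
}
```

On an ASCII system, literal_token('\n') is 10 and the octal escape
'\101' yields 65, the same token number as 'A'.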
The rules section consists of a series of production rules that the parser
tries to reduce. The format of each production rule is:
symbol : symbol-sequence [ action ] [ | symbol-sequence [ action ] ... ] ;
where symbol-sequence consists of zero or more symbols separated by white
space. The first symbol must be the first character of the line, but
newlines and other white space can appear anywhere else in the rule. All
terminal symbols must be declared in %token definitions.
Each symbol-sequence represents an alternative way of reducing the rule. A
symbol can appear recursively in its own rule. Always use left-recursion
(where the recursive symbol appears before the terminating case in symbol-
sequence).
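One commonly cited reason for preferring left recursion is the amount of
parser stack that it uses. The following sketch is a simplified model, not
the yacc implementation: it counts only grammar symbols on the stack and
ignores parser states and the endmarker. The function names are
hypothetical.

```c
#include <stddef.h>

/* list : item | list item ;   (left recursion)
   The parser can reduce after every item, so the stack stays shallow. */
size_t max_depth_left(size_t n)
{
    size_t depth = 0, max = 0;

    for (size_t i = 0; i < n; i++) {
        depth++;                   /* shift item */
        if (depth > max)
            max = depth;
        if (i > 0)
            depth--;               /* reduce "list item": pop 2, push 1 */
        /* i == 0: reduce "item" to list: pop 1, push 1, depth unchanged */
    }
    return max;
}

/* list : item | item list ;   (right recursion)
   Nothing can be reduced until the last item has been shifted. */
size_t max_depth_right(size_t n)
{
    size_t depth = 0, max = 0;

    for (size_t i = 0; i < n; i++) {
        depth++;                   /* shift item */
        if (depth > max)
            max = depth;
    }
    /* Reduce the last "item" to list (depth unchanged); each later
       reduce of "item list" pops 2 symbols and pushes 1. */
    while (depth > 1)
        depth--;
    return max;
}
```

In this model a 1000-item list needs a stack depth of 2 with the
left-recursive rule and 1000 with the right-recursive rule.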
The specific sequence:
%prec token
indicates that the current sequence of symbols is to be preferred over
others, at the level of precedence assigned to token in the definitions
section.
The specially defined token error matches any unrecognized sequence of
input. This token causes the parser to invoke the yyerror function. By
default, the parser tries to synchronize with the input and continue
processing it by reading and discarding all input up to the symbol
following error. (You can override this behavior through the yyerrok
action.) If no error token appears in the yacc input file, the parser
exits with an error message upon encountering unrecognized input.
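The default resynchronization described above can be pictured as skipping
tokens until the symbol that follows error in the rule is found. The
sketch below models only that skipping step over an array of token
numbers; a real parser also unwinds its own stack. The function resync()
and its parameters are hypothetical.

```c
#include <stddef.h>

/* Discards tokens starting at pos until the synchronizing token is
   found. Returns the index of the first token after it, or n if the
   input ends before the synchronizing token is seen. */
size_t resync(const int *tokens, size_t n, size_t pos, int sync)
{
    while (pos < n && tokens[pos] != sync)
        pos++;                     /* discard one token */
    return pos < n ? pos + 1 : n;
}
```

For a rule such as list error '\n', the synchronizing token is '\n', so
parsing resumes with the first token of the next input line.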
The parser always executes action after encountering the symbol that
precedes it. Thus, an action can appear in the middle of a symbol-
sequence, after each symbol-sequence, or after multiple instances of
symbol-sequence. In the last case, action is executed when the parser
matches any of the sequences.
The action consists of standard C code within braces and can also take the
following values, variables, and keywords.
yylval
If the token returned by the yylex function is associated with a
significant value, yylex should place the value in this global
variable. By default, yylval is of type long. The definitions section
can include a %union definition to associate with other data types,
including structures. If you run yacc with the -d option, the full
yylval definition is passed into the <y.tab.h> file for access by lex.
yyerrok
Causes the parser to start parsing tokens immediately after an
erroneous sequence, instead of performing the default action of reading
and discarding tokens up to a synchronization token. The yyerrok
action should appear immediately after the error token.
$[<type>]n
Refers to the value of symbol n, where n is the index of the symbol in
the production, counting from the beginning of the production rule;
the first symbol after the colon is $1. The type variable is the name
of one of the union members listed in the %union definition in the
definitions section. The <type> syntax (non-standard) allows the value
to be cast to a specific data type. Note that you will rarely need to
use the type syntax.
$[<type>]$
Refers to the value returned by the matched symbol-sequence and used
for the matched symbol when reducing other rules. The action for the
symbol-sequence generally assigns a value to $$. The type variable is
the name of one of the union members listed in the %union definition
in the definitions section. The <type> syntax (non-standard) allows
the value to be cast to a specific data type. Note that you will
rarely need to use the type syntax.
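The relationship between $n, $$, and the values on the parser stack can be
sketched as follows. This is a simplified model under the assumption that
the parser keeps one value per stacked symbol; the names vstack,
push_value(), and reduce_add() are hypothetical and do not correspond to
identifiers in the generated y.tab.c.

```c
#include <assert.h>

#define STACK_MAX 64

static long vstack[STACK_MAX];  /* values of the symbols on the stack */
static int  vtop = 0;           /* number of values currently stacked */

void push_value(long v)
{
    assert(vtop < STACK_MAX);
    vstack[vtop++] = v;
}

/* Reduce by  expr : expr '+' expr  { $$ = $1 + $3; }
   The rule has three right-hand-side symbols, so $1 through $3 are
   the top three values on the stack. */
long reduce_add(void)
{
    assert(vtop >= 3);
    long *rhs = &vstack[vtop - 3];   /* rhs[0] is $1, rhs[2] is $3 */
    long result = rhs[0] + rhs[2];   /* $$ = $1 + $3 */

    vtop -= 3;                       /* pop the right-hand side */
    push_value(result);              /* push $$ as the value of expr */
    return result;
}
```

After the values for 4, '+', and 5 are pushed, reduce_add() leaves the
single value 9 on the stack, which a later reduction sees as one of its
own $n values.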
The user functions section contains user-supplied programs. If you supply
a lexical analyzer (yylex) to the parser, it must be contained in the user
functions section.
The following functions, which are contained in the user functions section,
are invoked within the yyparse function generated by yacc.
yylex()
The lexical analyzer called by yyparse to recognize each token of
input. Usually this function is created by lex. The yylex function
reads input, recognizes expressions within the input, and returns a
token number representing the kind of token read. The function returns
an int value. A return value of 0 (zero) means the end of input.
If the parser and yylex do not agree on these token numbers, reliable
communication between them cannot occur. For (one character) literals,
the token is simply the numeric value of the character in the current
character set. The numbers for other tokens can either be chosen by
yacc, or by the user. In either case, the #define construct of C is
used to allow yylex() to return these numbers symbolically. The #define
statements are put into the code file, and the header file if that file
is requested. The set of characters permitted by yacc in an identifier
is larger than that permitted by C. Token names found to contain such
characters will not be included in the #define declarations.
If the token numbers are chosen by yacc, the tokens other than
literals, are assigned numbers greater than 256, although no order is
implied. A token can be explicitly assigned a number by following its
first appearance in the declaration section with a number. Names and
literals not defined this way retain their default definition. All
assigned token numbers are unique and distinct from the token numbers
used for literals. If duplicate token numbers cause conflicts in
parser generation, yacc reports an error; otherwise, it is unspecified
whether the token assignment is accepted or an error is reported.
The end of the input is marked by a special token called the endmarker
that has a token number that is zero or negative. All lexical analyzers
return zero or negative as a token number upon reaching the end of
their input. If the tokens up to, but not including, the endmarker form
a structure that matches the start symbol, the parser accepts the
input. If the endmarker is seen in any other context, it is considered
an error.
yyerror(string)
The function that the parser calls upon encountering an input error.
The default function, defined in liby.a, simply prints string to the
standard error. The user can redefine the function. The function's
type is void.
The liby.a library contains default main() and yyerror() functions. These
look like the following, respectively:
main()
{
    setlocale(LC_ALL, "");
    (void) yyparse();
    return(0);
}

int yyerror(s)
char *s;
{
    fprintf(stderr,"%s\n",s);
    return (0);
}
Comments, in C syntax, can appear anywhere in the user functions or
definitions sections. In the rules section, comments can appear wherever a
symbol is allowed. Blank lines or lines consisting of white space can be
inserted anywhere in the file, and are ignored.
NOTES
The LANG and LC_* variables affect the execution of the yacc command as
stated. The main() function defined by yacc calls
setlocale(LC_ALL, "")
thus, the program generated by yacc will also be affected by the contents
of these variables at runtime.
EXIT STATUS
The following exit values are returned:
0 Successful completion.
>0 An error occurred.
EXAMPLES
This section describes the example programs for the lex and yacc commands,
which together create a simple desk calculator program that performs
addition, subtraction, multiplication, and division operations. The
calculator program also allows you to assign values to variables (each
designated by a single lowercase ASCII letter), and then use the variables
in calculations. The files that contain the program are as follows:
calc.l
The lex specification file that defines the lexical analysis rules.
calc.y
The yacc grammar file that defines the parsing rules and calls the
yylex() function created by lex to provide input.
The remaining text expects that the current directory is the directory that
contains the lex and yacc example program files.
Compiling the Example Program
Perform the following steps to create the example program using lex and
yacc:
1. Process the yacc grammar file using the -d option. The -d option
tells yacc to create a file that defines the tokens it uses in
addition to the C language source code.
yacc -d calc.y
The following files are created:
y.tab.c
The C language source file that yacc created for the parser.
<y.tab.h>
A header file containing #define statements for the tokens used by
the parser.
2. Process the lex specification file:
lex calc.l
The following file is created:
lex.yy.c
The C language source file that lex created for the lexical
analyzer.
3. Compile and link the two C language source files:
cc -o calc y.tab.c lex.yy.c
The following files are created (the *.o files are created temporarily
and then removed):
y.tab.o
The object file for y.tab.c.
lex.yy.o
The object file for lex.yy.c.
calc
The executable program file.
You can then run the program directly by entering:
calc
Then enter numbers and operators in calculator fashion. After you
press <Return>, the program displays the result of the operation. If
you assign a value to a variable as follows, the cursor moves to the
next line:
m=4 <Return>
_
You can then use the variable in calculations and it will have the
value assigned to it:
m+5 <Return>
9
The Parser Source Code
The text that follows shows the contents of the file calc.y. This file has
entries in all three of the sections of a yacc grammar file: declarations,
rules, and programs.
%{
#include <stdio.h>
int regs[26];
int base;
%}
%start list
%token DIGIT LETTER
%left '|'
%left '&'
%left '+' '-'
%left '*' '/' '%'
%left UMINUS /*supplies precedence for unary minus */
%% /*beginning of rules section */
list    :   /* empty */
        |   list stat '\n'
        |   list error '\n'
                { yyerrok; }
        ;
stat    :   expr
                { printf("%d\n",$1); }
        |   LETTER '=' expr
                { regs[$1] = $3; }
        ;
expr    :   '(' expr ')'
                { $$ = $2; }
        |   expr '*' expr
                { $$ = $1 * $3; }
        |   expr '/' expr
                { $$ = $1 / $3; }
        |   expr '%' expr
                { $$ = $1 % $3; }
        |   expr '+' expr
                { $$ = $1 + $3; }
        |   expr '-' expr
                { $$ = $1 - $3; }
        |   expr '&' expr
                { $$ = $1 & $3; }
        |   expr '|' expr
                { $$ = $1 | $3; }
        |   '-' expr %prec UMINUS
                { $$ = -$2; }
        |   LETTER
                { $$ = regs[$1]; }
        |   number
        ;
number  :   DIGIT
                { $$ = $1; base = ($1==0) ? 8:10; }
        |   number DIGIT
                { $$ = base * $1 + $2; }
        ;
%%
int main(void)
{
    return(yyparse());
}

int yyerror(char *s)
{
    fprintf(stderr,"%s\n",s);
    return(0);
}

int yywrap(void)
{
    return(1);
}
Declarations Section
This section contains entries that perform the following functions:
· Includes the standard I/O header file.
· Defines global variables.
· Defines the list rule as the place to start processing.
· Defines the tokens used by the parser.
· Defines the operators and their precedence.
Rules Section
The rules section defines the rules that parse the input stream.
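The number rules in calc.y build a value one digit at a time and choose
the base from the first digit. The following sketch performs the same
arithmetic on a string of digits, outside of any parser; the function
parse_number() is hypothetical and exists only to show the calculation.

```c
/* Mirrors the "number" rules in calc.y: the first digit selects the
   base (8 if it is 0, otherwise 10); later digits are accumulated.
   Returns -1 if the string does not start with a digit. */
int parse_number(const char *digits)
{
    int base, value;

    if (*digits < '0' || *digits > '9')
        return -1;

    value = *digits - '0';                /* number : DIGIT */
    base = (value == 0) ? 8 : 10;

    for (digits++; *digits >= '0' && *digits <= '9'; digits++)
        value = base * value + (*digits - '0');   /* number : number DIGIT */

    return value;
}
```

A leading zero therefore makes the calculator read the number as octal:
parse_number("017") is 15, while parse_number("17") is 17. Like the
grammar, the sketch does not reject the digits 8 and 9 in an octal number.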
Programs Section
The programs section contains the following routines. Because these
routines are included in this file, you do not need to use the yacc library
when processing this file.
main()
The required main program that calls yyparse() to start the program.
yyerror(s)
This error handling routine only prints a syntax error message.
yywrap()
The wrap-up routine that returns a value of 1 when the end of input
occurs.
The Lexical Analyzer Source Code
The text that follows shows the contents of the file calc.l. This file contains include
statements for standard input and output, as well as for the <y.tab.h>
file. The yacc program generates that file from the yacc grammar file
information, if you use the -d option with the yacc command. The file
<y.tab.h> contains definitions for the tokens that the parser program uses.
In addition, calc.l contains the rules used to generate the tokens from the
input stream.
%{
#include <stdio.h>
#include "y.tab.h"
int c;
#if !defined (YYSTYPE)
#define YYSTYPE long
#endif
extern YYSTYPE yylval;
%}
%%
" "         ;
[a-z]       {
                c = yytext[0];
                yylval = c - 'a';
                return(LETTER);
            }
[0-9]       {
                c = yytext[0];
                yylval = c - '0';
                return(DIGIT);
            }
[^a-z 0-9]  {
                c = yytext[0];
                return(c);
            }
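For comparison, a handwritten yylex() that follows the same rules as
calc.l might look like the sketch below. It reads from a string instead of
standard input so that it can be tried in isolation. The token codes 257
and 258 are assumptions; in practice they come from the <y.tab.h> file
that yacc -d produces. The set_input() helper and the input variable are
hypothetical, and the range tests assume a character set in which the
lowercase letters are contiguous, as in ASCII.

```c
/* Assumed token codes; normally supplied by the yacc-generated header. */
#define DIGIT  257
#define LETTER 258

long yylval;                     /* value of the most recent token */
static const char *input = "";   /* hypothetical input buffer */

void set_input(const char *s)
{
    input = s;
}

int yylex(void)
{
    int c;

    while (*input == ' ')        /* same as the rule: " " ; */
        input++;
    if (*input == '\0')
        return 0;                /* 0 tells the parser: end of input */

    c = (unsigned char)*input++;
    if (c >= 'a' && c <= 'z') {
        yylval = c - 'a';        /* index into regs[] */
        return LETTER;
    }
    if (c >= '0' && c <= '9') {
        yylval = c - '0';
        return DIGIT;
    }
    return c;                    /* any other character is its own token */
}
```

For the input "m = 4\n", successive calls return LETTER (with yylval set
to 12), '=', DIGIT (with yylval set to 4), '\n', and finally 0.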
ENVIRONMENT VARIABLES
The following environment variables affect the execution of yacc:
LANG
Provides a default value for the internationalization variables that
are unset or null. If LANG is unset or null, the corresponding value
from the default locale is used. If any of the internationalization
variables contain an invalid setting, the utility behaves as if none of
the variables had been defined.
LC_ALL
If set to a non-empty string value, overrides the values of all the
other internationalization variables.
LC_CTYPE
Determines the locale for the interpretation of sequences of bytes of
text data as characters (for example, single-byte as opposed to multi-
byte characters in arguments and input files).
LC_MESSAGES
Determines the locale for the format and contents of diagnostic
messages written to standard error.
NLSPATH
Determines the location of message catalogs for the processing of
LC_MESSAGES.
FILES
y.output
A readable description of parsing tables and a report on conflicts
generated by grammar ambiguities.
y.tab.c
Output file.
<y.tab.h>
Definitions for token names.
yacc.tmp
Temporary file.
yacc.debug
Temporary file.
yacc.acts
Temporary file.
/usr/ccs/lib/yaccpar
Default skeleton parser for C programs.
/usr/ccs/lib/liby.a
yacc library.
SEE ALSO
Commands: lex(1)
Standards: standards(5)
Programming Support Tools