<HTML>
<HEAD>
<!-- Created by texi2html 1.56k from treecc.texi on 11 June 2002 -->
<TITLE>Tree Compiler-Compiler</TITLE>
</HEAD>
<BODY>
<H1>Tree Compiler-Compiler</H1>
<P>
<P><HR><P>
<H1><A NAME="SEC1" HREF="treecc_toc.html#TOC1">Overview</A></H1>
<P>
<A NAME="IDX1"></A>
<H2><A NAME="SEC2" HREF="treecc_toc.html#TOC2">Introduction</A></H2>
<P>
Traditional compiler construction tools such as lex and yacc focus on
the lexical analysis and parsing phases of compilation. But they
provide very little to support semantic analysis and code generation.
<P>
Yacc allows grammar rules to be tagged with semantic actions and values,
but it doesn't provide any routines that assist in the process of tree
building, semantic analysis, or code generation. Because those processes
are language-specific, yacc leaves the details to the programmer.
<P>
Support for semantic analysis was also a lot simpler in the languages
that were prevalent when lex and yacc were devised. C and Pascal
require declaration before use, which allows the semantic information
about a statement to be determined within the parser at the point of
use.<A NAME="DOCF1" HREF="treecc_foot.html#FOOT1">(1)</A> If extensive optimization
is not required, then code generation can also be performed within
the grammar, leading to a simple one-pass compiler structure.
<P>
Modern languages allow deferred declaration of methods, fields, and
types. For example, Java allows a method to refer to a field that
is declared further down the .java source file. A field can be
declared with a type whose class definition has not yet been parsed.
<P>
Hence, most of the semantic analysis that used to be performed inline
within a yacc grammar must now be performed after the entire program
has been parsed. Tree building and walking is now more important
than it was in older declare-before-use languages.
<H2><A NAME="SEC3" HREF="treecc_toc.html#TOC3">Tree walking: the need for something better</A></H2>
<P>
Building parse tree data structures and walking them is not terribly
difficult, but it is extremely time-consuming and error-prone. A
modern programming language may have hundreds of node types, divided
into categories for statements, expressions, types, declarations, etc.
When a new programming language is being devised, new node types may
be added quite frequently, which makes the code's complexity difficult
to manage.<A NAME="DOCF2" HREF="treecc_foot.html#FOOT2">(2)</A>
<P>
For example, consider nodes that correspond to programming language
types in a C-like language. There will be node types for integer
types, floating-point types, pointers, structures, functions, etc.
There will be semantic analysis routines for testing types for
equality, comparing types for coercions and casts, evaluating the
size of a type for memory layout purposes, determining if the type
falls into a general category such as "integer" or "pointer", etc.
<P>
Let's say we wanted to add a new "128-bit integer" type to this
language. Adding a new node type is fairly straightforward.
But we also need to track down every place in the code where the
compiler walks a type or deals with integers and add an appropriate
case for the new type. This is very error-prone. Such code is
likely to be split over many files, and good coding practices only
help to a certain extent.
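<P>
To make the problem concrete, here is a hypothetical hand-written C type walker of the kind described above. The names (<CODE>type_kind</CODE>, <CODE>TYPE_INT128</CODE>, <CODE>type_is_integer</CODE>) are illustrative only and are not part of treecc. Because the <CODE>switch</CODE> has a <CODE>default</CODE> branch, adding the new 128-bit kind compiles cleanly but silently gives the wrong answer:

```c
/* Hypothetical type-node kinds for a C-like language.
 * TYPE_INT128 is the newly added kind. */
typedef enum {
    TYPE_INT,
    TYPE_FLOAT,
    TYPE_POINTER,
    TYPE_STRUCT,
    TYPE_INT128   /* new: nobody remembered to update the walker below */
} type_kind;

/* Returns 1 if the kind belongs to the general "integer" category. */
int type_is_integer(type_kind kind)
{
    switch (kind) {
    case TYPE_INT:
        return 1;
    default:
        /* Catches TYPE_FLOAT, TYPE_POINTER, and TYPE_STRUCT, but also,
         * unintentionally, the new TYPE_INT128. */
        return 0;
    }
}
```

<P>
Here <CODE>type_is_integer(TYPE_INT128)</CODE> returns 0. Compiler options such as GCC's <CODE>-Wswitch-enum</CODE> can flag enumerators missing from a single <CODE>switch</CODE>, but they cannot check that every walker scattered across the compiler has been updated.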
<P>
This problem gets worse when new kinds of expressions and statements
are added to the language. The change not only affects semantic
analysis, but also optimization and code generation. Some compilers
use multiple passes over the tree to perform optimization, with
different algorithms used in each pass. Code generation may use a
number of different strategies, depending upon how an expression or
statement is used. If even one of these places is missed when the
new node type is added, then there is the potential for a very nasty
bug that may go unnoticed for months or years.
<P>
Object-oriented languages such as C++ can help a bit in constructing
robust tree structures. The base class can declare abstract methods
for any semantic analysis, optimization, or code generation routine
that needs to be implemented for all members of the node category.
But another code maintenance problem arises. What happens when
we want to add a new optimization pass in the future? We must go
into hundreds of classes and implement the methods.
<P>
To avoid changing hundreds of classes, texts on Design Patterns
suggest using a Visitor pattern. Then the new optimization pass
can be encapsulated in a visitor. This would work, except for
the following drawback of visitor patterns, as described in Gamma,
et al:
<BLOCKQUOTE>
<P>
<EM>The Visitor pattern makes it hard to add new subclasses of
Element. Each new ConcreteElement gives rise to a new abstract
operation on Visitor and a corresponding implementation in
every ConcreteVisitor class.</EM>
<P>
<EM>... The Visitor class hierarchy can be difficult to maintain
when new ConcreteElement classes are added frequently. In such
cases, it's probably easier just to define operations on the
classes that make up the structure.</EM>
</BLOCKQUOTE>
<P>
That is, if we add a new node type in the future, we have a large
maintenance problem on our hands. The only remedy is to scatter the
implementation throughout every class, which is the situation we
were trying to avoid by using the Visitor pattern.
<P>
Because compiler construction deals with a large set of rapidly
changing node types and operations, neither of the usual approaches
works very well.
<P>
The ideal programming language for designing compilers needs to have
some way to detect when the programmer forgets to implement an operation
for a new node type, and to ensure that a new operation covers all
existing node types adequately. Existing OO languages do not perform
this kind of global error checking. What few checking procedures they
have change the maintenance problem into a different problem of
similar complexity.
<H2><A NAME="SEC4" HREF="treecc_toc.html#TOC4">Aspect-oriented programming</A></H2>
<P>
A new field in language design has emerged in recent years called
"Aspect-Oriented Programming" (AOP). A good review of the field
can be found in the October 2001 issue of the <EM>Communications of
the ACM</EM>, and on the AspectJ Web site, <A HREF="http://www.aspectj.org/">http://www.aspectj.org/</A>.
<P>
The following excerpt from the introduction to the AOP section in the
CACM issue describes the essential aspects of AOP, and the difference
between OOP and AOP:
<BLOCKQUOTE>
<P>
<EM>AOP is based on the idea that computer systems are better programmed
by separately specifying the various concerns (properties or areas
of interest) of a system and some description of their relationships,
and then relying on mechanisms in the underlying AOP environment to
weave or compose them together into a coherent program. ...
While the tendency in OOP's is to find commonality among classes
and push it up the inheritance tree, AOP attempts to realize
scattered concerns as first-class elements, and eject them
horizontally from the object structure.</EM>
</BLOCKQUOTE>
<P>
Aspect-orientation gives us some hope of solving our compiler
complexity problems. We can view each operation on node types
(semantic analysis, optimization, code generation, etc.) as an
"aspect" of the compiler's construction. The AOP language weaves
these aspects with the node types to create the final compiler.
<H2><A NAME="SEC5" HREF="treecc_toc.html#TOC5">The treecc approach</A></H2>
<P>
We don't really want to implement a new programming language
just for compiler construction, especially since the new language's
implementation would suffer from all of the problems described above and would
therefore also be difficult to debug and maintain.
<P>
The approach that we take with "treecc" is similar to that used by
"yacc". A simple rule-based language is devised that is used to describe
the intended behaviour declaratively. Embedded code is used to provide
the specific implementation details. A translator then converts the input
into source code that can be compiled in the usual fashion.
<P>
The translator is responsible for generating the tree building and
walking code, and for checking that all relevant operations have been
implemented on the node types. Functions are provided that make
it easier to build and walk the tree data structures from within
a "yacc" grammar and other parts of the compiler.
<H1><A NAME="SEC6" HREF="treecc_toc.html#TOC6">A simple example for expressions</A></H1>
<P>
<A NAME="IDX2"></A>
<P>
Consider the following yacc grammar for a simple expression language:
<PRE>
%token INT FLOAT
%%
expr: INT
| FLOAT
| '(' expr ')'
| expr '+' expr
| expr '-' expr
| expr '*' expr
| expr '/' expr
| '-' expr
;
</PRE>
<P>
(We will ignore the problems of precedence and associativity and
assume that the reader is familiar with how to resolve such issues
in yacc grammars).
<P>
There are 7 types of nodes for this grammar: <SAMP>`intnum'</SAMP>, <SAMP>`floatnum'</SAMP>,
<SAMP>`plus'</SAMP>, <SAMP>`minus'</SAMP>, <SAMP>`multiply'</SAMP>, <SAMP>`divide'</SAMP>, and <SAMP>`negate'</SAMP>.
They are defined in treecc as follows:
<PRE>
%node expression %abstract %typedef
%node binary expression %abstract =
{
expression *expr1;
expression *expr2;
}
%node unary expression %abstract =
{
expression *expr;
}
%node intnum expression =
{
int num;
}
%node floatnum expression =
{
float num;
}
%node plus binary
%node minus binary
%node multiply binary
%node divide binary
%node negate unary
</PRE>
<P>
We have introduced three extra node types that refer
to any expression, binary expressions, and unary expressions. These
can be seen as superclasses in an OO-style framework. We have
declared these node types as <SAMP>`abstract'</SAMP> because the yacc grammar
will not be permitted to create instances of these classes directly.
<P>
The <SAMP>`binary'</SAMP>, <SAMP>`unary'</SAMP>, <SAMP>`intnum'</SAMP>, and <SAMP>`floatnum'</SAMP>
node types have field definitions associated with them. These have
a similar syntax to C <CODE>struct</CODE> declarations.
<P>
The yacc grammar is augmented as follows to build the parse tree:
<PRE>
%union {
expression *node;
int inum;
float fnum;
}
%token INT FLOAT
%type <node> expr
%type <inum> INT
%type <fnum> FLOAT
%%
expr: INT { $$ = intnum_create($1); }
| FLOAT { $$ = floatnum_create($1); }
| '(' expr ')' { $$ = $2; }
| expr '+' expr { $$ = plus_create($1, $3); }
| expr '-' expr { $$ = minus_create($1, $3); }
| expr '*' expr { $$ = multiply_create($1, $3); }
| expr '/' expr { $$ = divide_create($1, $3); }
| '-' expr { $$ = negate_create($2); }
;
</PRE>
<P>
The treecc translator generates the <SAMP>`*_create'</SAMP> functions so that
the rest of the compiler can build the necessary data structures
on demand. The parameters to the <SAMP>`*_create'</SAMP> functions
are identical in type and order to the members of the structure for
that node type.
<P>
Because <SAMP>`expression'</SAMP>, <SAMP>`binary'</SAMP>, and <SAMP>`unary'</SAMP> are abstract,
there will be no <SAMP>`*_create'</SAMP> functions associated with them. This will
help the programmer catch certain kinds of errors.
<P>
The type that is returned from a <SAMP>`*_create'</SAMP> function is the first
superclass of the node that has a <SAMP>`%typedef'</SAMP> keyword associated with it;
<SAMP>`expression *'</SAMP> in this case.
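<P>
For intuition, the following is a rough hand-written approximation in C of what the generated creation functions do. It is a sketch, not treecc's actual output: the real code uses treecc's own node allocator, a vtable, and the house-keeping fields described under node declarations, whereas the <CODE>kind</CODE> field, the <CODE>node_kind</CODE> enumeration, and the single shared struct below are simplified stand-ins.

```c
#include <stdlib.h>

/* Simplified stand-in for treecc's internal node kind tag. */
typedef enum { KIND_INTNUM, KIND_PLUS, KIND_NEGATE } node_kind;

/* For brevity, one struct holds the fields of every node type;
 * treecc instead generates a separate struct per node type. */
typedef struct expression expression;
struct expression {
    node_kind kind;
    int num;              /* intnum field */
    expression *expr1;    /* binary fields */
    expression *expr2;
    expression *expr;     /* unary field */
};

static expression *node_alloc(node_kind kind)
{
    expression *e = calloc(1, sizeof(expression));
    if (e == NULL)
        abort();          /* out of memory */
    e->kind = kind;
    return e;
}

/* Parameters follow the field order of each node type, and every
 * function returns the %typedef ancestor type, expression *. */
expression *intnum_create(int num)
{
    expression *e = node_alloc(KIND_INTNUM);
    e->num = num;
    return e;
}

expression *plus_create(expression *expr1, expression *expr2)
{
    expression *e = node_alloc(KIND_PLUS);
    e->expr1 = expr1;
    e->expr2 = expr2;
    return e;
}

expression *negate_create(expression *expr)
{
    expression *e = node_alloc(KIND_NEGATE);
    e->expr = expr;
    return e;
}

/* Note: there is no expression_create, binary_create, or
 * unary_create, because those node types are %abstract. */
```

<P>
With these definitions, the yacc action for <CODE>1 + -2</CODE> would build <CODE>plus_create(intnum_create(1), negate_create(intnum_create(2)))</CODE>.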
<H2><A NAME="SEC7" HREF="treecc_toc.html#TOC7">Storing extra information</A></H2>
<P>
Normally we will want to store extra information with a node beyond
that which is extracted by the yacc grammar. In our expression
example, we probably want to store type information in the nodes
so that we can determine if the whole expression is integer or
floating point during semantic analysis. We can add type information
to the <SAMP>`expression'</SAMP> node type as follows:
<PRE>
%node expression %abstract %typedef =
{
%nocreate type_code type;
}
</PRE>
<P>
The <SAMP>`%nocreate'</SAMP> flag indicates that the field should not be passed
to the <SAMP>`*_create'</SAMP> functions as a parameter; that is, it holds semantic
information that isn't present in the grammar. When nodes are created,
any fields that are declared as <SAMP>`%nocreate'</SAMP> will be undefined in value.
A default value can be specified as follows:
<PRE>
%node expression %abstract %typedef =
{
%nocreate type_code type = {int_type};
}
</PRE>
<P>
Default values must be enclosed in <SAMP>`{'</SAMP> and <SAMP>`}'</SAMP> because they are
pieces of code in the underlying source language (C, C++, etc.), instead
of tokens in the treecc syntax. Any legitimate expression in the
underlying source language may be used.
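<P>
The effect of a default value can be sketched in C as follows. This is a simplified illustration, not treecc's generated code: the flattened <CODE>intnum</CODE> struct is hand-written, and <CODE>type_code</CODE> is declared inline here only to keep the example self-contained. The <CODE>%nocreate</CODE> field <CODE>type</CODE> is not a parameter of <CODE>intnum_create</CODE>; it is initialized from the default expression <CODE>int_type</CODE> instead of being left undefined.

```c
#include <stdlib.h>

typedef enum { int_type, float_type } type_code;

/* Simplified flattened layout: the inherited %nocreate field comes first. */
typedef struct intnum {
    type_code type;   /* %nocreate, default {int_type} */
    int num;          /* ordinary field: passed to intnum_create */
} intnum;

intnum *intnum_create(int num)
{
    intnum *node = malloc(sizeof(intnum));
    if (node == NULL)
        abort();
    node->type = int_type;  /* the default value expression */
    node->num = num;        /* the constructor parameter */
    return node;
}
```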
<P>
We also need to arrange for <SAMP>`type_code'</SAMP> to be declared. One way to
do this is by adding a <SAMP>`%decls'</SAMP> section to the front of the treecc
input file:
<PRE>
%decls %{
typedef enum
{
int_type,
float_type
} type_code;
%}
</PRE>
<P>
We could have introduced the definition by placing a <SAMP>`#include'</SAMP>
directive into the <SAMP>`%decls'</SAMP> section instead, or by defining a
treecc enumerated type:
<PRE>
%enum type_code =
{
int_type,
float_type
}
</PRE>
<P>
Now that we have these definitions, type-inferencing can be implemented
as follows:
<PRE>
%operation void infer_type(expression *e)
infer_type(binary)
{
infer_type(e->expr1);
infer_type(e->expr2);
if(e->expr1->type == float_type || e->expr2->type == float_type)
{
e->type = float_type;
}
else
{
e->type = int_type;
}
}
infer_type(unary)
{
infer_type(e->expr);
e->type = e->expr->type;
}
infer_type(intnum)
{
e->type = int_type;
}
</PRE>
<P>
This example demonstrates using the abstract node types <SAMP>`binary'</SAMP> and
<SAMP>`unary'</SAMP> to define operations on all subclasses. The treecc translator
will generate code for a full C function called <SAMP>`infer_type'</SAMP> that
incorporates all of the cases.
<P>
But hang on a second! What happened to <SAMP>`floatnum'</SAMP>? Where did it
go? It turns out that treecc will catch this. It will report
an error to the effect that <SAMP>`node type `floatnum' is not handled in
operation `infer_type''</SAMP>. Here is its definition:
<PRE>
infer_type(floatnum)
{
e->type = float_type;
}
</PRE>
<P>
As we can see, treecc has just caught a bug in the language
implementation and reported it to us as soon as we introduced it.
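<P>
Conceptually, the generated <CODE>infer_type</CODE> selects on the node's kind and runs the case written for that node type or its nearest ancestor. The hand-written C approximation below shows the now-complete set of cases. It assumes a simplified node struct with a hypothetical <CODE>kind</CODE> tag; treecc's actual output is organized differently.

```c
typedef enum { int_type, float_type } type_code;

/* Hypothetical kind tag; treecc tracks node kinds internally. */
typedef enum {
    K_INTNUM, K_FLOATNUM, K_PLUS, K_MINUS, K_MULTIPLY, K_DIVIDE, K_NEGATE
} node_kind;

typedef struct expression expression;
struct expression {
    node_kind kind;
    type_code type;             /* %nocreate field of expression */
    expression *expr1, *expr2;  /* binary operands */
    expression *expr;           /* unary operand */
};

void infer_type(expression *e)
{
    switch (e->kind) {
    /* infer_type(binary): shared by every binary subtype */
    case K_PLUS: case K_MINUS: case K_MULTIPLY: case K_DIVIDE:
        infer_type(e->expr1);
        infer_type(e->expr2);
        if (e->expr1->type == float_type || e->expr2->type == float_type)
            e->type = float_type;
        else
            e->type = int_type;
        break;
    /* infer_type(unary) */
    case K_NEGATE:
        infer_type(e->expr);
        e->type = e->expr->type;
        break;
    /* infer_type(intnum) */
    case K_INTNUM:
        e->type = int_type;
        break;
    /* infer_type(floatnum): the case treecc reported as missing */
    case K_FLOATNUM:
        e->type = float_type;
        break;
    }
}
```

<P>
In hand-written code like this, nothing forces the <CODE>K_FLOATNUM</CODE> case to exist; treecc performs that check for every operation.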
<P>
Let's now extend the language with a <SAMP>`power'</SAMP> operator:
<PRE>
yacc:
expr: expr '^' expr { $$ = power_create($1, $3); }
;
treecc:
%node power binary
</PRE>
<P>
That's all there is to it! When treecc re-translates the input
file, it will modify the definition of <SAMP>`infer_type'</SAMP> to include the
extra case for <SAMP>`power'</SAMP> nodes. Because <SAMP>`power'</SAMP> is a subclass of
<SAMP>`binary'</SAMP>, treecc already knows how to perform type inferencing for the
new node and it doesn't warn us about a missing declaration.
<P>
What if we wanted to restrict the second argument of <SAMP>`power'</SAMP> to be
an integer value? We can add the following case to <SAMP>`infer_type'</SAMP>:
<PRE>
infer_type(power)
{
infer_type(e->expr1);
infer_type(e->expr2);
if(e->expr2->type != int_type)
{
error("second argument to `^' is not an integer");
}
e->type = e->expr1->type;
}
</PRE>
<P>
The translator now notices that there is a more specific implementation
of <SAMP>`infer_type'</SAMP> for <SAMP>`power'</SAMP>, and won't use the <SAMP>`binary'</SAMP>
case for it.
<P>
The most important thing to realise here is that the translator always
checks that there are sufficient declarations for <SAMP>`infer_type'</SAMP> to cover
all relevant node types. If any node type is left uncovered, it will immediately
raise an error to the user. This allows tree coverage problems to
be found a lot sooner than with the traditional approach.
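<P>
For an operation with a single trigger parameter, the two rules just described (use the most specific case, and report any concrete node type with no applicable case) can be modelled as a walk up the inheritance chain. This is an illustrative model using hypothetical tables and a reduced set of node types; it is not treecc's actual implementation.

```c
#include <stdio.h>

/* A reduced set of node types from the example. */
enum { EXPRESSION, BINARY, UNARY, INTNUM, FLOATNUM, PLUS, POWER, NEGATE, NUM_TYPES };

static const char *const type_names[NUM_TYPES] = {
    "expression", "binary", "unary", "intnum",
    "floatnum", "plus", "power", "negate"
};

/* Parent of each node type; -1 marks a top-level type. */
static const int parent[NUM_TYPES] = {
    [EXPRESSION] = -1, [BINARY] = EXPRESSION, [UNARY] = EXPRESSION,
    [INTNUM] = EXPRESSION, [FLOATNUM] = EXPRESSION,
    [PLUS] = BINARY, [POWER] = BINARY, [NEGATE] = UNARY
};

static const int is_abstract[NUM_TYPES] = {
    [EXPRESSION] = 1, [BINARY] = 1, [UNARY] = 1
};

/* has_case[t] is nonzero if the operation has a case written for type t.
 * Returns the type whose case handles `type` (the type itself or its
 * nearest ancestor with a case), or -1 if no case applies. */
int resolve_case(const int has_case[NUM_TYPES], int type)
{
    for (int t = type; t != -1; t = parent[t])
        if (has_case[t])
            return t;
    return -1;
}

/* Reports every concrete node type that no case covers; returns the count. */
int check_coverage(const int has_case[NUM_TYPES])
{
    int missing = 0;
    for (int t = 0; t < NUM_TYPES; ++t) {
        if (!is_abstract[t] && resolve_case(has_case, t) == -1) {
            fprintf(stderr, "node type `%s' is not handled\n", type_names[t]);
            ++missing;
        }
    }
    return missing;
}
```

<P>
With cases for <CODE>binary</CODE>, <CODE>unary</CODE>, and <CODE>intnum</CODE> only, <CODE>check_coverage</CODE> reports <CODE>floatnum</CODE>. Once a <CODE>power</CODE> case is added, <CODE>resolve_case</CODE> picks it in preference to the <CODE>binary</CODE> case, while <CODE>plus</CODE> still resolves to <CODE>binary</CODE>.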
<P>
See section <A HREF="treecc.html#SEC23">Full expression example code</A>, for a complete listing of the above
example files.
<H1><A NAME="SEC8" HREF="treecc_toc.html#TOC8">Invoking treecc from the command-line</A></H1>
<P>
<A NAME="IDX3"></A>
<A NAME="IDX4"></A>
<P>
The general form of treecc's command-line syntax is as follows:
<PRE>
treecc [OPTIONS] INPUT ...
</PRE>
<P>
Treecc accepts the following command-line options:
<DL COMPACT>
<DT><CODE>-o FILE</CODE>
<DD>
<DT><CODE>--output FILE</CODE>
<DD>
Set the name of the output file to <SAMP>`FILE'</SAMP>. If this option is not
supplied, then the name of the first input file will be used, with its
extension changed to <SAMP>`.c'</SAMP>. If the input is standard input,
the default output file is <SAMP>`yy_tree.c'</SAMP>.
This option may be overridden using the <SAMP>`%output'</SAMP> keyword in
the input files.
<DT><CODE>-h FILE</CODE>
<DD>
<DT><CODE>--header FILE</CODE>
<DD>
Set the name of the header output file to <SAMP>`FILE'</SAMP>. This is only
used for the C and C++ output languages. If this option is not supplied,
then the name of the output file will be used, with its extension
changed to <SAMP>`.h'</SAMP>. If the input is standard input, the default header
output file is <SAMP>`yy_tree.h'</SAMP>.
This option may be overridden using the <SAMP>`%header'</SAMP> keyword in the
input files. If this option is used with a language that does not require
headers, it will be ignored.
<DT><CODE>-d DIR</CODE>
<DD>
<DT><CODE>--output-dir DIR</CODE>
<DD>
Set the name of the Java output directory to <SAMP>`DIR'</SAMP>. This is only
used for the Java language. If this option is not supplied, then the
directory corresponding to the first input file is used. If the input
is standard input, the default is the current directory.
This option may be overridden using the <SAMP>`%outdir'</SAMP> keyword in the
input files. If this option is used with a language other than Java,
it will be ignored.
<DT><CODE>-s DIR</CODE>
<DD>
<DT><CODE>--skeleton-dir DIR</CODE>
<DD>
Set the name of the directory that contains the skeleton files for the
C and C++ node memory managers to <SAMP>`DIR'</SAMP>.
<DT><CODE>-e EXT</CODE>
<DD>
<DT><CODE>--extension EXT</CODE>
<DD>
Change the default output file extension to <SAMP>`EXT'</SAMP>, instead of
<SAMP>`.c'</SAMP>. The value <SAMP>`EXT'</SAMP> may have a leading dot, but this is
not required.
<DT><CODE>-f</CODE>
<DD>
<DT><CODE>--force-create</CODE>
<DD>
Treecc normally attempts to optimise the creation of output files
so that they are only modified if a non-trivial change has
occurred in the input. This can reduce the number of source
code recompiles when treecc is used in combination with make.
This option forces the output files to be created, even if they
are the same as existing files with the same name.
The declaration <SAMP>`%option force'</SAMP> can be used in the input files
to achieve the same effect as this option.
<DT><CODE>-n</CODE>
<DD>
<DT><CODE>--no-output</CODE>
<DD>
Suppress the generation of output files. Treecc parses the
input files, checks for errors, and then stops.
<DT><CODE>--help</CODE>
<DD>
Print a usage message for the treecc program.
<DT><CODE>-v</CODE>
<DD>
<DT><CODE>--version</CODE>
<DD>
Print the version of the treecc program.
<DT><CODE>--</CODE>
<DD>
Marks the end of the command-line options, and the beginning of
the input filenames. You may need to use this if your filename
begins with <SAMP>`-'</SAMP>. e.g. <SAMP>`treecc -- -input.tc'</SAMP>. This is
not needed if the input is standard input: <SAMP>`treecc -'</SAMP>
is perfectly valid.
</DL>
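<P>
The default file-naming rules for <CODE>-o</CODE>, <CODE>-h</CODE>, and <CODE>-e</CODE> can be summarized in code. The helper <CODE>default_output_name</CODE> below is hypothetical and written only to illustrate the rules as documented above: replace the input file's extension, fall back to <CODE>yy_tree</CODE> for standard input, and accept an extension with or without a leading dot. It is not taken from treecc's source, and its handling of names without an extension is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Writes the default output name for `input` into `out`.
 * `input` is NULL or "-" for standard input. `ext` is the desired
 * extension ("c" for the output file, "h" for the header), given
 * with or without a leading dot. */
void default_output_name(const char *input, const char *ext,
                         char *out, size_t out_size)
{
    size_t stem_len;

    if (ext[0] == '.')
        ++ext;                               /* the leading dot is optional */
    if (input == NULL || strcmp(input, "-") == 0) {
        snprintf(out, out_size, "yy_tree.%s", ext);
        return;
    }
    const char *slash = strrchr(input, '/');
    const char *dot = strrchr(input, '.');
    if (dot != NULL && (slash == NULL || dot > slash))
        stem_len = (size_t)(dot - input);    /* strip the existing extension */
    else
        stem_len = strlen(input);            /* assumption: nothing to strip */
    snprintf(out, out_size, "%.*s.%s", (int)stem_len, input, ext);
}
```

<P>
Note that the header name is derived from the output file name, not from the input file name: <CODE>expr.tc</CODE> gives <CODE>expr.c</CODE>, and <CODE>expr.c</CODE> in turn gives <CODE>expr.h</CODE>.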
<H1><A NAME="SEC9" HREF="treecc_toc.html#TOC9">Syntax of input files</A></H1>
<P>
<A NAME="IDX5"></A>
<P>
Treecc input files consist of zero or more declarations that define
nodes, operations, options, etc. The following sections describe each
of these elements.
<H2><A NAME="SEC10" HREF="treecc_toc.html#TOC10">Node declarations</A></H2>
<P>
<A NAME="IDX6"></A>
<A NAME="IDX7"></A>
<A NAME="IDX8"></A>
<P>
Node types are defined using the <SAMP>`%node'</SAMP> keyword in input files.
The general form of the declaration is:
<PRE>
%node NAME [ PNAME ] [ FLAGS ] [ = FIELDS ]
</PRE>
<DL COMPACT>
<DT><SAMP>`NAME'</SAMP>
<DD>
An identifier that is used to refer to the node type elsewhere
in the treecc definition. It is also the name of the type that will be
visible to the programmer in literal code blocks.
<DT><SAMP>`PNAME'</SAMP>
<DD>
An identifier that refers to the parent node type that <SAMP>`NAME'</SAMP> inherits
from. If <SAMP>`PNAME'</SAMP> is not supplied, then <SAMP>`NAME'</SAMP> is a top-level
declaration. It is legal to supply a <SAMP>`PNAME'</SAMP> that has not yet
been defined in the input.
<DT><SAMP>`FLAGS'</SAMP>
<DD>
Any combination of <SAMP>`%abstract'</SAMP> and <SAMP>`%typedef'</SAMP>:
<DL COMPACT>
<DT><SAMP>`%abstract'</SAMP>
<DD>
<A NAME="IDX9"></A>
The node type cannot be constructed by the programmer. In addition,
the programmer does not need to define operation cases for this node
type if all subtypes have cases associated with them.
<DT><SAMP>`%typedef'</SAMP>
<DD>
<A NAME="IDX10"></A>
The node type is used as the common return type for node creation
functions. Top-level declarations must have a <SAMP>`%typedef'</SAMP> keyword.
</DL>
</DL>
<P>
The <SAMP>`FIELDS'</SAMP> part of a node declaration defines the fields that
make up the node type. Each field has the following general form:
<PRE>
[ %nocreate ] TYPE FNAME [ = VALUE ] ';'
</PRE>
<DL COMPACT>
<DT><SAMP>`%nocreate'</SAMP>
<DD>
<A NAME="IDX11"></A>
The field is not used in the node's constructor. When the node is
constructed, the value of this field will be undefined unless
<SAMP>`VALUE'</SAMP> is specified.
<DT><SAMP>`TYPE'</SAMP>
<DD>
The type that is associated with the field. Types can be declared
using a subset of the C declaration syntax, augmented with some C++
and Java features. See section <A HREF="treecc.html#SEC11">Types used in fields and parameters</A>, for more information.
<DT><SAMP>`FNAME'</SAMP>
<DD>
The name to associate with the field. Treecc verifies that the field
does not currently exist in this node type, or in any of its ancestor
node types.
<DT><SAMP>`VALUE'</SAMP>
<DD>
The default value to assign to the field in the node's constructor.
This can only be used on fields that are declared with <SAMP>`%nocreate'</SAMP>.
The value must be enclosed in braces. For example <SAMP>`{NULL}'</SAMP> would
be used to initialize a field with <SAMP>`NULL'</SAMP>.
The braces are required because the default value is expressed in
the underlying source language, and can use any of the usual constant
declaration features present in that language.
</DL>
<P>
When the output language is C, treecc creates a struct-based type
called <SAMP>`NAME'</SAMP> that contains the fields for <SAMP>`NAME'</SAMP> and
all of its ancestor classes. The type also contains some house-keeping
fields that are used internally by the generated code. The following
is an example:
<PRE>
typedef struct binary__ binary;
struct binary__ {
const struct binary_vtable__ *vtable__;
int kind__;
char *filename__;
long linenum__;
type_code type;
expression * expr1;
expression * expr2;
};
</PRE>
<P>
The programmer should avoid using any identifier that
ends with <SAMP>`__'</SAMP>, because it may clash with house-keeping
identifiers that are generated by treecc.
<P>
When the output language is C++, Java, or C#, treecc creates a class
called <SAMP>`NAME'</SAMP>, that inherits from the class <SAMP>`PNAME'</SAMP>.
The field definitions for <SAMP>`NAME'</SAMP> are converted into public members
in the output.
<H2><A NAME="SEC11" HREF="treecc_toc.html#TOC11">Types used in fields and parameters</A></H2>
<P>
<A NAME="IDX12"></A>
<P>
Types that are used in field and parameter declarations have a
syntax which is a subset of features found in C, C++, and Java:
<PRE>
TypeAndName ::= Type [ IDENTIFIER ]
Type ::= TypeName
| Type '*'
| Type '&'
| Type '[' ']'
TypeName ::= IDENTIFIER { IDENTIFIER }
</PRE>
<P>
Types are usually followed by an identifier that names the field or
parameter. The name is required for fields and is optional for parameters.
For example <SAMP>`int'</SAMP> is usually equivalent to <SAMP>`int x'</SAMP> in parameter
declarations.
<P>
The following are some examples of using types:
<PRE>
int
int x
const char *str
expression *expr
Element[][] array
Item&
unsigned int y
const Element
</PRE>
<P>
The grammar used by treecc is slightly ambiguous. The last example above
declares a parameter called <SAMP>`Element'</SAMP>, that has type <SAMP>`const'</SAMP>.
The programmer probably intended to declare an anonymous parameter with type
<SAMP>`const Element'</SAMP> instead.
<P>
This ambiguity is unavoidable given that treecc is not fully
aware of the underlying language's type system. When treecc
sees a type that ends in a sequence of identifiers, it will
always interpret the last identifier as the field or parameter
name. Thus, the programmer must write the following instead:
<PRE>
const Element e
</PRE>
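<P>
The naming rule can be approximated as follows: if the declaration ends in an identifier and some other type text precedes it, that identifier is the name; otherwise the whole declaration is the type. The helper <CODE>split_decl</CODE> is a hypothetical, simplified sketch that works on a plain string; it is not treecc's parser.

```c
#include <ctype.h>
#include <string.h>

static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Splits a declaration such as "const char *str" into its type
 * ("const char *") and name ("str"). If there is no name, `name`
 * is set to "". Both buffers must hold strlen(decl) + 1 bytes. */
void split_decl(const char *decl, char *type, char *name)
{
    size_t start = 0, end = strlen(decl);

    /* Trim surrounding whitespace. */
    while (start < end && isspace((unsigned char)decl[start]))
        ++start;
    while (end > start && isspace((unsigned char)decl[end - 1]))
        --end;

    /* Scan back over a trailing identifier, if there is one. */
    size_t id = end;
    while (id > start && is_ident_char(decl[id - 1]))
        --id;

    if (id < end && id > start) {
        /* A trailing identifier preceded by type text: it is the name. */
        size_t type_end = id;
        while (type_end > start && isspace((unsigned char)decl[type_end - 1]))
            --type_end;
        memcpy(type, decl + start, type_end - start);
        type[type_end - start] = '\0';
        memcpy(name, decl + id, end - id);
        name[end - id] = '\0';
    } else {
        /* A lone type name, or a trailing '*', '&', or "[]": no name. */
        memcpy(type, decl + start, end - start);
        type[end - start] = '\0';
        name[0] = '\0';
    }
}
```

<P>
Under this rule, <CODE>const Element</CODE> splits into the type <CODE>const</CODE> and the name <CODE>Element</CODE>, which is exactly the surprising interpretation described above.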
<P>
Treecc cannot declare types using the full power of C's type system.
The most common forms of declarations are supported, and the rest
can usually be obtained by defining a <SAMP>`typedef'</SAMP> within a
literal code block. See section <A HREF="treecc.html#SEC15">Literal code declarations</A>, for more information
on literal code blocks.
<P>
It is the responsibility of the programmer to use type constructs
that are supported by the underlying programming language. Types such
as <SAMP>`const char *'</SAMP> will give an error when the output is compiled
with a Java compiler, for example.
<H2><A NAME="SEC12" HREF="treecc_toc.html#TOC12">Enumerated type declarations</A></H2>
<P>
<A NAME="IDX13"></A>
<A NAME="IDX14"></A>
<A NAME="IDX15"></A>
<P>
Enumerated types are a special kind of node type that can be used
by the programmer for simple values that don't require a full abstract
syntax tree node. The following is an example of defining a list
of the primitive machine types used in a Java virtual machine:
<PRE>
%enum JavaType =
{
JT_BYTE,
JT_SHORT,
JT_CHAR,
JT_INT,
JT_LONG,
JT_FLOAT,
JT_DOUBLE,
JT_OBJECT_REF
}
</PRE>
<P>
Enumerations are useful when writing code generators and type
inferencing routines. The general form is:
<PRE>
%enum NAME = { VALUES }
</PRE>
<DL COMPACT>
<DT><SAMP>`NAME'</SAMP>
<DD>
An identifier to be used to name the enumerated type. The name must
not have been previously used as a node type, an enumerated type, or
an enumerated value.
<DT><SAMP>`VALUES'</SAMP>
<DD>
A comma-separated list of identifiers that name the values within
the enumeration. Each of the names must be unique, and must not have
been used previously as a node type, an enumerated type, or an
enumerated value.
</DL>
<P>
Logically, each enumerated value is a special node type that inherits from
a parent node type corresponding to the enumerated type <SAMP>`NAME'</SAMP>.
<P>
When the output language is C or C++, treecc generates an enumerated
typedef for <SAMP>`NAME'</SAMP> that contains the enumerated values in the
same order as was used in the input file. The typedef name can be
used elsewhere in the code as the type of the enumeration.
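<P>
For the <CODE>JavaType</CODE> example, the C output is roughly equivalent to the typedef below; the exact formatting of treecc's output may differ. The function <CODE>java_type_slots</CODE> is a hypothetical example of the kind of code-generator helper such an enumeration supports. It relies on the JVM rule that <CODE>long</CODE> and <CODE>double</CODE> values occupy two local-variable slots while the other types occupy one.

```c
typedef enum
{
    JT_BYTE,
    JT_SHORT,
    JT_CHAR,
    JT_INT,
    JT_LONG,
    JT_FLOAT,
    JT_DOUBLE,
    JT_OBJECT_REF
} JavaType;

/* Number of JVM local-variable slots used by a value of type t. */
int java_type_slots(JavaType t)
{
    switch (t) {
    case JT_LONG:
    case JT_DOUBLE:
        return 2;
    case JT_BYTE: case JT_SHORT: case JT_CHAR: case JT_INT:
    case JT_FLOAT: case JT_OBJECT_REF:
        return 1;
    }
    return 0;  /* not reached for valid JavaType values */
}
```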
<P>
When the output language is Java, treecc generates a class called
<SAMP>`NAME'</SAMP> that contains the enumerated values as integer constants.
Elsewhere in the code, the type <SAMP>`int'</SAMP> must be used to declare
variables of the enumerated type. Enumerated values are referred
to as <SAMP>`NAME.VALUE'</SAMP>. If the enumerated type is used as a trigger
parameter, then <SAMP>`NAME'</SAMP> must be used instead of <SAMP>`int'</SAMP>:
treecc will convert the type when the Java code is output.
<P>
When the output language is C#, treecc generates an enumerated value
type called <SAMP>`NAME'</SAMP> that contains the enumerated values as
members. The C# type <SAMP>`NAME'</SAMP> can be used elsewhere in the code
as the type of the enumeration. Enumerated values are referred to
as <SAMP>`NAME.VALUE'</SAMP>.
<H2><A NAME="SEC13" HREF="treecc_toc.html#TOC13">Operation declarations</A></H2>
<P>
<A NAME="IDX16"></A>
<A NAME="IDX17"></A>
<A NAME="IDX18"></A>
<A NAME="IDX19"></A>
<A NAME="IDX20"></A>
<P>
Operations are declared in two parts: the declaration, and the
cases. The declaration part defines the prototype for the
operation and the cases define how to handle specific kinds of
nodes for the operation.
<P>
Operations are defined over one or more trigger parameters. Each
trigger parameter specifies a node type or an enumerated type that
is selected upon to determine what course of action to take. The
following are some examples of operation declarations:
<PRE>
%operation void infer_type(expression *e)
%operation type_code common_type([type_code t1], [type_code t2])
</PRE>
<P>
Trigger parameters are specified by enclosing them in square
brackets. If none of the parameters are enclosed in square
brackets, then treecc assumes that the first parameter is the
trigger.
<P>
The general form of an operation declaration is as follows:
<PRE>
%operation { %virtual | %inline | %split } RTYPE [CLASS::]NAME(PARAMS)
</PRE>
<DL COMPACT>
<DT><SAMP>`%virtual'</SAMP>
<DD>
<A NAME="IDX21"></A>
Specifies that the operation is associated with a node type as
a virtual method. There must be only one trigger parameter,
and it must be the first parameter.
Non-virtual operations are written to the output source files
as global functions.
<DT><SAMP>`%inline'</SAMP>
<DD>
<A NAME="IDX22"></A>
Optimise the generation of the operation code so that all cases
are inline within the code for the function itself. This can
only be used with non-virtual operations, and may improve
code efficiency if there are lots of operation cases with a
small amount of code in each.
<DT><SAMP>`%split'</SAMP>
<DD>
<A NAME="IDX23"></A>
Split the generation of the multi-trigger operation code across
multiple functions, to reduce the size of each individual function.
It is sometimes necessary to split large <CODE>%inline</CODE> operations
to avoid compiler limits on function size.
<DT><SAMP>`RTYPE'</SAMP>
<DD>
The type of the return value for the operation. This should be
<SAMP>`void'</SAMP> if the operation does not have a return value.
<DT><SAMP>`CLASS'</SAMP>
<DD>
The name of the class to place the operation's definition within.
This can only be used with non-virtual operations, and is
intended for languages such as Java and C# that cannot declare
methods outside of classes. The class name will be ignored if
the output language is C.
If a class name is required, but the programmer did not supply it,
then <SAMP>`NAME'</SAMP> will be used as the default. The exception to
this is the C# language: <SAMP>`CLASS'</SAMP> must always be supplied and
it must be different from <SAMP>`NAME'</SAMP>. This is due to a "feature"
in some C# compilers that forbid a method with the same name as
its enclosing class.
<DT><SAMP>`NAME'</SAMP>
<DD>
The name of the operation.
<DT><SAMP>`PARAMS'</SAMP>
<DD>
The parameters to the operation. Trigger parameters may be
enclosed in square brackets. Trigger parameters must be
either node types or enumerated types.
</DL>
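<P>
In C, the generated node structures carry a <CODE>vtable__</CODE> pointer (see the struct example under node declarations), and a <CODE>%virtual</CODE> operation is, roughly, a call through such a table of function pointers. The sketch below illustrates that mechanism with hypothetical names (<CODE>expression_vtable</CODE>, <CODE>eval</CODE>); treecc's generated names and layout differ.

```c
#include <stddef.h>

typedef struct expression expression;

/* One table of function pointers per node type. */
typedef struct expression_vtable {
    int (*eval)(expression *e);   /* a %virtual operation */
} expression_vtable;

struct expression {
    const expression_vtable *vtable;
    int num;                      /* intnum field */
    expression *expr1, *expr2;    /* binary fields */
};

static int intnum_eval(expression *e)
{
    return e->num;
}

static int plus_eval(expression *e)
{
    /* Each child is evaluated through its own vtable. */
    return e->expr1->vtable->eval(e->expr1) + e->expr2->vtable->eval(e->expr2);
}

static const expression_vtable intnum_vtable = { intnum_eval };
static const expression_vtable plus_vtable = { plus_eval };

/* Entry point for the virtual operation: dispatch on the node's vtable. */
int eval(expression *e)
{
    return e->vtable->eval(e);
}
```

<P>
A non-virtual operation, by contrast, is emitted as an ordinary global function that selects on the node's kind itself, as in the <CODE>infer_type</CODE> example earlier.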
<P>
Once an operation has been declared, the programmer can specify
its cases anywhere in the input files. It is not necessary that
the cases appear after the operation, or that they be contiguous
within the input files. This permits the programmer to place
operation cases where they are logically required for maintenance
reasons.
<P>
There must be sufficient operation cases defined to cover every
possible combination of node types and enumerated values that
inherit from the specified trigger types. An operation case
has the following general form:
<PRE>
NAME(TRIGGERS) [, NAME(TRIGGERS2) ...]
{
CODE
}
</PRE>
<DL COMPACT>
<DT><SAMP>`NAME'</SAMP>
<DD>
The name of the operation for which this case applies.
<DT><SAMP>`TRIGGERS'</SAMP>
<DD>
A comma-separated list of node types or enumerated values that
define the specific case that is handled by the following code.
<DT><SAMP>`CODE'</SAMP>
<DD>
Source code in the output source language that implements the
operation case.
</DL>
<P>
Multiple trigger combinations can be associated with a single
block of code by listing them all, separated by commas. For
example:
<PRE>
common_type(int_type, int_type)
{
return int_type;
}
common_type(int_type, float_type),
common_type(float_type, int_type),
common_type(float_type, float_type)
{
return float_type;
}
</PRE>
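<P>
To make the coverage rule concrete, the hand-written C sketch below shows roughly what dispatching on two enumerated triggers amounts to. It is not treecc's generated code. The enumeration <SAMP>`type_kind'</SAMP> is a hypothetical stand-in that assumes <SAMP>`int_type'</SAMP> and <SAMP>`float_type'</SAMP> are values of a single enumerated type, as in the <SAMP>`common_type'</SAMP> example above.

```c
/* Hypothetical enumerated type standing in for the trigger values
   used in the common_type example. */
typedef enum
{
    int_type,
    float_type
} type_kind;

/* Hand-written equivalent of the operation cases above: every
   (left, right) combination must be handled somewhere. */
type_kind common_type(type_kind left, type_kind right)
{
    switch(left)
    {
        case int_type:
            switch(right)
            {
                case int_type:   return int_type;
                case float_type: return float_type;
            }
            break;

        case float_type:
            /* (float_type, int_type) and (float_type, float_type)
               share one block, like the comma-separated case above. */
            return float_type;
    }
    return int_type; /* not reached when every combination is covered */
}
```

<P>
Dropping any inner case from the sketch would let that combination fall through to the final <CODE>return</CODE>, which is exactly the kind of gap that the requirement to cover every trigger combination is meant to rule out.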
<H2><A NAME="SEC14" HREF="treecc_toc.html#TOC14">Options that modify treecc's behaviour</A></H2>
<P>
<A NAME="IDX24"></A>
<A NAME="IDX25"></A>
<A NAME="IDX26"></A>
<P>
"(*)" is used below to indicate an option that is enabled by default.
<DL COMPACT>
<DT><SAMP>`%option track_lines'</SAMP>
<DD>
<A NAME="IDX27"></A>
Enable the generation of code that can track the current filename and
line number when nodes are created. See section <A HREF="treecc.html#SEC17">Tracking line numbers in source files</A>, for more
information. (*)
<DT><SAMP>`%option no_track_lines'</SAMP>
<DD>
<A NAME="IDX28"></A>
Disable the generation of code that performs line number tracking.
<DT><SAMP>`%option singletons'</SAMP>
<DD>
<A NAME="IDX29"></A>
Optimise the creation of singleton node types. These are
node types without any fields. Treecc can optimise the code
so that only one instance of a singleton node type exists in
the system. This can speed up the creation of nodes for
constants within compilers. (*)
Singleton optimisations will have no effect if <SAMP>`track_lines'</SAMP>
is enabled, because line tracking uses special hidden fields in
every node.
<DT><SAMP>`%option no_singletons'</SAMP>
<DD>
<A NAME="IDX30"></A>
Disable the optimisation of singleton node types.
<DT><SAMP>`%option reentrant'</SAMP>
<DD>
<A NAME="IDX31"></A>
Enable the generation of reentrant code that does not rely
upon any global variables. Separate copies of the compiler
state can be used safely in separate threads. However, the
same copy of the compiler state cannot be used safely in two or
more threads.
<DT><SAMP>`%option no_reentrant'</SAMP>
<DD>
<A NAME="IDX32"></A>
Disable the generation of reentrant code. The interface to
node management functions is simpler, but cannot be used
in a threaded environment. (*)
<DT><SAMP>`%option force'</SAMP>
<DD>
<A NAME="IDX33"></A>
Force output source files to be written, even if they are
unchanged. This option can also be set using the <SAMP>`-f'</SAMP>
command-line option.
<DT><SAMP>`%option no_force'</SAMP>
<DD>
<A NAME="IDX34"></A>
Don't force output source files to be written if they are the
same as before. (*)
This option can help smooth integration of treecc with make.
Only those output files that have changed will be modified.
This reduces the number of files that the underlying source
language compiler must process after treecc is executed.
<DT><SAMP>`%option virtual_factory'</SAMP>
<DD>
<A NAME="IDX35"></A>
Use virtual methods in the node type factories, so that the
programmer can subclass the factory and provide new
implementations of node creation functions. This option is
ignored for C, which does not use factories.
<DT><SAMP>`%option no_virtual_factory'</SAMP>
<DD>
<A NAME="IDX36"></A>
Don't use virtual methods in the node type factories. (*)
<DT><SAMP>`%option abstract_factory'</SAMP>
<DD>
<A NAME="IDX37"></A>
Use abstract virtual methods in the node type factories.
The programmer is responsible for subclassing the factory
to provide node creation functionality.
<DT><SAMP>`%option no_abstract_factory'</SAMP>
<DD>
<A NAME="IDX38"></A>
Don't use abstract virtual methods in the node type factories. (*)
<DT><SAMP>`%option kind_in_node'</SAMP>
<DD>
<A NAME="IDX39"></A>
Put the kind field in the node, for more efficient access at runtime. (*)
<DT><SAMP>`%option kind_in_vtable'</SAMP>
<DD>
<A NAME="IDX40"></A>
Put the kind field in the vtable, and not the node. This saves some
memory, at the cost of slower access to the kind value at runtime.
This option only applies when the language is C. The kind field is
always placed in the node in other languages, because it isn't possible
to modify the vtable.
<DT><SAMP>`%option prefix = PREFIX'</SAMP>
<DD>
<A NAME="IDX41"></A>
Specify the prefix to be used in output files in place of "yy".
<DT><SAMP>`%option state_type = NAME'</SAMP>
<DD>
<A NAME="IDX42"></A>
Specify the name of the state type. The state type is generated
by treecc to perform centralised memory management and reentrancy
support. The default value is <SAMP>`YYNODESTATE'</SAMP>. If the output language
uses factories, then this will also be the name of the factory
base class.
<DT><SAMP>`%option namespace = NAME'</SAMP>
<DD>
<A NAME="IDX43"></A>
Specify the namespace to write definitions to in the output
source files. This option is ignored when the output language
is C.
<DT><SAMP>`%option package = NAME'</SAMP>
<DD>
<A NAME="IDX44"></A>
Same as <SAMP>`%option namespace = NAME'</SAMP>. Provided because <SAMP>`package'</SAMP>
is more natural for Java programmers.
<DT><SAMP>`%option base = NUM'</SAMP>
<DD>
<A NAME="IDX45"></A>
Specify the numeric base to use for allocating numeric values to
node types. By default, node type allocation begins at 1.
<DT><SAMP>`%option lang = LANGUAGE'</SAMP>
<DD>
<A NAME="IDX46"></A>
Specify the output language. Must be one of <CODE>"C"</CODE>, <CODE>"C++"</CODE>,
<CODE>"Java"</CODE>, or <CODE>"C#"</CODE>. The default is <CODE>"C"</CODE>.
<DT><SAMP>`%option block_size = NUM'</SAMP>
<DD>
<A NAME="IDX47"></A>
Specify the size of the memory blocks to use in C and C++ node allocators.
<DT><SAMP>`%option strip_filenames'</SAMP>
<DD>
<A NAME="IDX48"></A>
Strip filenames down to their base name in <CODE>#line</CODE> directives,
i.e. strip off the directory component. This can be helpful in
combination with the <CODE>%include %readonly</CODE> command when
treecc input files may be processed from different directories,
causing common output files to change unexpectedly.
<DT><SAMP>`%option no_strip_filenames'</SAMP>
<DD>
<A NAME="IDX49"></A>
Don't strip filenames in <CODE>#line</CODE> directives. (*)
</DL>
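<P>
The idea behind <SAMP>`%option singletons'</SAMP> can be pictured with the following C sketch. The names <SAMP>`null_stmt'</SAMP>, <SAMP>`null_stmt_create'</SAMP> and the kind value are hypothetical, and this is not the code treecc emits; it only illustrates why a node type without fields can share a single instance, and why the hidden per-node line tracking fields would defeat that sharing.

```c
/* A hypothetical node type with no fields of its own, apart from the
   kind: all instances would be indistinguishable. */
typedef struct null_stmt_s
{
    int kind;
} null_stmt;

#define null_stmt_kind 7  /* hypothetical kind value */

/* Returns the same statically allocated node on every call instead of
   allocating a new one. If each node also carried its own filename and
   line number, instances would differ and could not be shared. */
null_stmt *null_stmt_create(void)
{
    static null_stmt instance = { null_stmt_kind };
    return &instance;
}
```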
<H2><A NAME="SEC15" HREF="treecc_toc.html#TOC15">Literal code declarations</A></H2>
<P>
<A NAME="IDX50"></A>
<P>
Sometimes it is necessary to embed literal code within output <SAMP>`.h'</SAMP>
and source files. Usually this is to <SAMP>`#include'</SAMP> definitions
from other files, or to define functions that cannot be easily expressed
as operations.
<P>
A literal code block is specified by enclosing it in <SAMP>`%{'</SAMP> and
<SAMP>`%}'</SAMP>. The block can also be prefixed with the following flags:
<DL COMPACT>
<DT><SAMP>`%decls'</SAMP>
<DD>
<A NAME="IDX51"></A>
Write the literal code to the currently active declaration header file,
instead of the source file.
<DT><SAMP>`%both'</SAMP>
<DD>
<A NAME="IDX52"></A>
Write the literal code to both the currently active declaration header file
and the currently active source file.
<DT><SAMP>`%end'</SAMP>
<DD>
<A NAME="IDX53"></A>
Write the literal code to the end of the file, instead of the beginning.
</DL>
<P>
Another form of literal code block is one which begins with <SAMP>`%%'</SAMP> and
extends to the end of the current input file. This form implicitly has
the <SAMP>`%end'</SAMP> flag.
<H2><A NAME="SEC16" HREF="treecc_toc.html#TOC16">Changing input and output files</A></H2>
<P>
<A NAME="IDX54"></A>
<P>
Most treecc compiler definitions will be too large to be manageable
in a single input file. They also will be too large to write to a
single output file, because that may overload the source language
compiler.
<P>
Multiple input files can be specified on the command-line, or
they can be explicitly included by other input files with
the following declarations:
<DL COMPACT>
<DT><SAMP>`%include [ %readonly ] FILENAME'</SAMP>
<DD>
<A NAME="IDX55"></A>
<A NAME="IDX56"></A>
<A NAME="IDX57"></A>
Include the contents of the specified file at the current point
within the current input file. <SAMP>`FILENAME'</SAMP> is interpreted
relative to the name of the current input file.
If the <SAMP>`%readonly'</SAMP> keyword is supplied, then any output
files that are generated by the included file must be read-only.
That is, performing the inclusion is not expected to change them.
The <SAMP>`%readonly'</SAMP> keyword is useful for building compilers
in layers. The programmer may group a large number of useful
node types and operations together that are independent of the
particulars of a given language. The programmer then defines
language-specific compilers that "inherit" the common definitions.
Read-only inclusions ensure that any extensions that are added
by the language-specific parts do not "leak" into the common code.
</DL>
<P>
Output files can be changed using the following declarations:
<DL COMPACT>
<DT><SAMP>`%header FILENAME'</SAMP>
<DD>
<A NAME="IDX58"></A>
<A NAME="IDX59"></A>
Change the currently active declaration header file to <SAMP>`FILENAME'</SAMP>,
which is interpreted relative to the current input file. This option
has no effect for languages without header files (Java and C#).
Any node types and operations that are defined after a <SAMP>`%header'</SAMP>
declaration will be declared in <SAMP>`FILENAME'</SAMP>.
<DT><SAMP>`%output FILENAME'</SAMP>
<DD>
<A NAME="IDX60"></A>
<A NAME="IDX61"></A>
Change the currently active source file to <SAMP>`FILENAME'</SAMP>,
which is interpreted relative to the current input file. This option
has no effect for languages that require a single class per file (Java).
Any node types and operations that are defined after a <SAMP>`%output'</SAMP>
declaration will have their implementations placed in <SAMP>`FILENAME'</SAMP>.
<DT><SAMP>`%outdir DIRNAME'</SAMP>
<DD>
<A NAME="IDX62"></A>
<A NAME="IDX63"></A>
Change the output source directory to <SAMP>`DIRNAME'</SAMP>. This is only
used for Java, which requires that a single file be used for each class.
All classes are written to the specified directory. By default,
<SAMP>`DIRNAME'</SAMP> is the current directory where treecc was invoked.
</DL>
<P>
When treecc generates the output source code, it must insert several
common house-keeping functions and classes into the code. By default,
these are written to the first header and source files. This can
be changed with the <SAMP>`%common'</SAMP> declaration:
<DL COMPACT>
<DT><SAMP>`%common'</SAMP>
<DD>
<A NAME="IDX64"></A>
<A NAME="IDX65"></A>
Output the common house-keeping code to the currently active
declaration header file and the currently active source file.
This is typically used as follows:
<PRE>
%header "common.h"
%output "common.c"
%common
</PRE>
</DL>
<H1><A NAME="SEC17" HREF="treecc_toc.html#TOC17">Tracking line numbers in source files</A></H1>
<P>
<A NAME="IDX66"></A>
<P>
When compilers emit error messages to the programmer, it is generally
a good idea to indicate which file and which line gave rise to the
error. Syntax errors can be emitted fairly easily because the parser
usually has access to the current line number. However, semantic
errors are harder to report because the parser may no longer be
active when the error is detected.
<P>
Treecc can generate code that automatically keeps track of what line
in the source file was active when a node is created. Every node
has two extra private fields that specify the name of the file and the
line number. Semantic analysis routines can query this information
when reporting errors.
<P>
Because treecc is not aware of how to obtain this information, the
programmer must supply some additional functions. See section <A HREF="treecc.html#SEC18">API's available in the generated output</A>,
for more information.
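<P>
For non-reentrant C output, the two functions the programmer must supply are <SAMP>`yycurrfilename'</SAMP> and <SAMP>`yycurrlinenum'</SAMP> (the <SAMP>`yy'</SAMP> prefix changes if <SAMP>`%option prefix'</SAMP> is used). The minimal sketch below assumes, hypothetically, that the lexer keeps its position in two globals, <SAMP>`lexer_filename'</SAMP> and <SAMP>`lexer_linenum'</SAMP>, and calls a hypothetical <SAMP>`lexer_newline'</SAMP> helper at each newline.

```c
/* Hypothetical lexer position. The filename string must outlive every
   node that records it, because the pointer is stored without copying
   the string. */
static char *lexer_filename = "input.src";
static long lexer_linenum = 1;

/* Hypothetical hook called by the scanner whenever it reads a newline. */
void lexer_newline(void)
{
    ++lexer_linenum;
}

/* Supplied by the programmer for non-reentrant C output when
   %option track_lines is in effect. */
char *yycurrfilename(void)
{
    return lexer_filename;
}

long yycurrlinenum(void)
{
    return lexer_linenum;
}
```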
<H1><A NAME="SEC18" HREF="treecc_toc.html#TOC18">API's available in the generated output</A></H1>
<P>
<A NAME="IDX67"></A>
<P>
The source code that is generated by treecc exports a number of
application programmer interfaces (API's) to the programmer. These
can be used elsewhere in the compiler implementation to manipulate
abstract syntax trees. The following sections describe the API's
for each of the output languages.
<H2><A NAME="SEC19" HREF="treecc_toc.html#TOC19">C Language APIs</A></H2>
<P>
<A NAME="IDX68"></A>
<P>
In the C output language, each node type is converted into a <SAMP>`typedef'</SAMP>
that contains the node's fields, and the fields of its ancestor node
types. The following example demonstrates how treecc node declarations
are converted into C source code:
<PRE>
%node expression %abstract %typedef =
{
%nocreate type_code type;
}
%node binary expression %abstract =
{
expression *expr1;
expression *expr2;
}
%node plus binary
</PRE>
<P>
becomes:
<PRE>
typedef struct expression__ expression;
typedef struct binary__ binary;
typedef struct plus__ plus;
struct expression__ {
const struct expression_vtable__ *vtable__;
int kind__;
char *filename__;
long linenum__;
type_code type;
};
struct binary__ {
const struct binary_vtable__ *vtable__;
int kind__;
char *filename__;
long linenum__;
type_code type;
expression * expr1;
expression * expr2;
};
struct plus__ {
const struct plus_vtable__ *vtable__;
int kind__;
char *filename__;
long linenum__;
type_code type;
expression * expr1;
expression * expr2;
};
</PRE>
<P>
Programmers should avoid using any identifiers that end in
<SAMP>`__'</SAMP>. Such identifiers are reserved for internal use by treecc
and its support routines.
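<P>
Each generated structure begins with exactly the fields of its ancestors, in the same order, which is what allows a <SAMP>`plus'</SAMP> node to be handled through an <SAMP>`expression'</SAMP> pointer. The C sketch below checks that property on trimmed-down, hand-written copies of the structures (no vtable or line tracking fields, and a plain <CODE>int</CODE> standing in for <SAMP>`type_code'</SAMP>); it illustrates the layout rather than reproducing treecc output.

```c
#include <stddef.h>

/* Hand-written, simplified versions of the structures above. */
typedef struct expression_s
{
    int kind;
    int type;            /* stands in for type_code */
} expression;

typedef struct plus_s
{
    int kind;            /* inherited from expression */
    int type;            /* inherited from expression */
    expression *expr1;   /* inherited from binary */
    expression *expr2;   /* inherited from binary */
} plus;

/* Returns non-zero if the shared prefix sits at the same offsets in
   both structures, so code that only reads the common fields can
   treat either node the same way. */
int prefix_layout_matches(void)
{
    return offsetof(expression, kind) == offsetof(plus, kind)
        && offsetof(expression, type) == offsetof(plus, type);
}
```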
<P>
For each non-abstract node type called <SAMP>`NAME'</SAMP>, treecc generates a
function called <SAMP>`NAME_create'</SAMP> that creates nodes of that type.
The general form of the function's prototype is as follows:
<PRE>
TYPE *NAME_create([YYNODESTATE *state,] PARAMS)
</PRE>
<DL COMPACT>
<DT><SAMP>`TYPE'</SAMP>
<DD>
The return node type, which is the nearest ancestor that has the
<SAMP>`%typedef'</SAMP> flag.
<DT><SAMP>`NAME'</SAMP>
<DD>
The name of the node type that is being created.
<DT><SAMP>`state'</SAMP>
<DD>
The system state, if reentrant code is being generated.
<DT><SAMP>`PARAMS'</SAMP>
<DD>
The create parameters, consisting of every field that does not
have the <SAMP>`%nocreate'</SAMP> flag. The parameters appear in the
same order as the fields in the node types, from the top-most
ancestor down to the node type itself. For example:
<PRE>
expression *plus_create(expression * expr1, expression * expr2);
</PRE>
</DL>
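<P>
The work done by a <SAMP>`NAME_create'</SAMP> function can be sketched by hand. In the version below, a single simplified structure stands in for the generated types, <CODE>malloc</CODE> stands in for <SAMP>`yynodealloc'</SAMP>, the vtable is omitted, and <SAMP>`current_file'</SAMP> and <SAMP>`current_line'</SAMP> are hypothetical helpers playing the role of <SAMP>`yycurrfilename'</SAMP> and <SAMP>`yycurrlinenum'</SAMP>. Only <SAMP>`expr1'</SAMP> and <SAMP>`expr2'</SAMP> are parameters, because <SAMP>`type'</SAMP> is a <SAMP>`%nocreate'</SAMP> field.

```c
#include <stdlib.h>

typedef struct expression_s expression;

/* Simplified stand-in for the generated node structure. */
struct expression_s
{
    int kind;
    char *filename;
    long linenum;
    int type;            /* %nocreate: not a create parameter */
    expression *expr1;
    expression *expr2;
};

#define plus_kind 3      /* hypothetical kind value */

/* Hypothetical position helpers. */
static char *current_file(void) { return "example.src"; }
static long current_line(void) { return 42; }

/* Sketch of plus_create: allocate, set the kind, record the position,
   then copy the create parameters into their fields. */
expression *plus_create(expression *expr1, expression *expr2)
{
    expression *node = malloc(sizeof(expression));
    if(!node)
        return NULL;     /* yynodealloc would call yynodefailed first */
    node->kind = plus_kind;
    node->filename = current_file();
    node->linenum = current_line();
    node->type = 0;      /* left for semantic analysis to fill in */
    node->expr1 = expr1;
    node->expr2 = expr2;
    return node;
}
```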
<P>
Enumerated types are converted into a C <SAMP>`typedef'</SAMP> with the
same name and values:
<PRE>
%enum JavaType =
{
JT_BYTE,
JT_SHORT,
JT_CHAR,
JT_INT,
JT_LONG,
JT_FLOAT,
JT_DOUBLE,
JT_OBJECT_REF
}
</PRE>
<P>
becomes:
<PRE>
typedef enum
{
JT_BYTE,
JT_SHORT,
JT_CHAR,
JT_INT,
JT_LONG,
JT_FLOAT,
JT_DOUBLE,
JT_OBJECT_REF
} JavaType;
</PRE>
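<P>
Because the result is an ordinary C enumeration, its values can be used directly in <CODE>switch</CODE> statements. The function below is a hypothetical helper, not part of treecc, that returns how many JVM local-variable slots a value of each type occupies: <CODE>long</CODE> and <CODE>double</CODE> values take two slots, and everything else takes one.

```c
/* The generated typedef, reproduced from the example above. */
typedef enum
{
    JT_BYTE,
    JT_SHORT,
    JT_CHAR,
    JT_INT,
    JT_LONG,
    JT_FLOAT,
    JT_DOUBLE,
    JT_OBJECT_REF
} JavaType;

/* Hypothetical helper: JVM local-variable slots used by each type. */
int java_type_slots(JavaType type)
{
    switch(type)
    {
        case JT_LONG:
        case JT_DOUBLE:
            return 2;
        default:
            return 1;
    }
}
```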
<P>
Virtual operations are converted into C macros that invoke the
correct vtable entry on a node type:
<PRE>
%operation %virtual void infer_type(expression *e)
</PRE>
<P>
becomes:
<PRE>
#define infer_type(this__) \
((*(((struct expression_vtable__ *) \
((this__)->vtable__))->infer_type_v__)) \
((expression *)(this__)))
</PRE>
<P>
Calls to <SAMP>`infer_type'</SAMP> can then be made with <SAMP>`infer_type(node)'</SAMP>.
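<P>
The macro works because every node starts with a pointer to a per-type table of function pointers. The self-contained C sketch below reproduces that mechanism by hand for one operation. The structure, table and function names are hypothetical, and a generated vtable would hold one entry for each virtual operation.

```c
typedef struct node_s node;

/* Per-type table of operation pointers (a single operation here). */
struct node_vtable
{
    const char *(*describe_v)(node *self);
};

struct node_s
{
    const struct node_vtable *vtable;  /* first, as in generated nodes */
    int value;                         /* ordinary fields follow */
};

/* One operation case per node type. */
static const char *describe_constant(node *self) { (void)self; return "constant"; }
static const char *describe_negate(node *self)   { (void)self; return "negate"; }

static const struct node_vtable constant_vtable = { describe_constant };
static const struct node_vtable negate_vtable   = { describe_negate };

/* Same shape as the generated infer_type macro: fetch the entry from
   the node's vtable and call it with the node itself. */
#define describe(n) ((*((n)->vtable->describe_v))(n))
```

<P>
A node built with <CODE>node c = { &amp;constant_vtable, 1 };</CODE> then dispatches through <CODE>describe(&amp;c)</CODE> without the caller knowing its concrete type.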
<P>
Non-virtual operations are converted into C functions:
<PRE>
%operation void infer_type(expression *e)
</PRE>
<P>
becomes:
<PRE>
extern void infer_type(expression *e);
</PRE>
<P>
Because virtual and non-virtual operations use a similar call syntax,
it is very easy to convert a virtual operation into a non-virtual
operation when the output language is C. This isn't possible with
the other output languages.
<P>
Other house-keeping tasks are performed by the following functions
and macros. Some of these must be supplied by the programmer.
The <SAMP>`state'</SAMP> parameter is required only if a reentrant compiler is
being built.
<DL COMPACT>
<DT><CODE>int yykind(ANY *node)</CODE>
<DD>
<A NAME="IDX69"></A>
Gets the numeric kind value associated with a particular node.
The kind value for node type <SAMP>`NAME'</SAMP> is called <SAMP>`NAME_kind'</SAMP>.
<DT><CODE>const char *yykindname(ANY *node)</CODE>
<DD>
<A NAME="IDX70"></A>
Gets the name of the node kind associated with a particular node.
This may be helpful for debugging and logging code.
<DT><CODE>int yyisa(ANY *node, type)</CODE>
<DD>
<A NAME="IDX71"></A>
Determines if <SAMP>`node'</SAMP> is an instance of the node type <SAMP>`type'</SAMP>.
<DT><CODE>char *yygetfilename(ANY *node)</CODE>
<DD>
<A NAME="IDX72"></A>
Gets the filename corresponding to where <SAMP>`node'</SAMP> was created
during parsing. This macro is only generated if <SAMP>`%option track_lines'</SAMP>
was specified.
<DT><CODE>long yygetlinenum(ANY *node)</CODE>
<DD>
<A NAME="IDX73"></A>
Gets the line number corresponding to where <SAMP>`node'</SAMP> was created
during parsing. This macro is only generated if <SAMP>`%option track_lines'</SAMP>
was specified.
<DT><CODE>void yysetfilename(ANY *node, char *value)</CODE>
<DD>
<A NAME="IDX74"></A>
Sets the filename associated with <SAMP>`node'</SAMP> to <SAMP>`value'</SAMP>. The
string is not copied, so <SAMP>`value'</SAMP> must persist for the lifetime
of the node. This macro will rarely be required, unless a node
corresponds to a different line than the current parse line. This
macro is only generated if <SAMP>`%option track_lines'</SAMP> was specified.
<DT><CODE>void yysetlinenum(ANY *node, long value)</CODE>
<DD>
<A NAME="IDX75"></A>
Sets the line number associated with <SAMP>`node'</SAMP> to <SAMP>`value'</SAMP>.
This macro will rarely be required, unless a node corresponds to a
different line than the current parse line. This macro is only
generated if <SAMP>`%option track_lines'</SAMP> was specified.
<DT><CODE>char *yycurrfilename([YYNODESTATE *state])</CODE>
<DD>
<A NAME="IDX76"></A>
Get the name of the current input file from the parser. The pointer
that is returned from this function is stored as-is: the string is
not copied. Therefore, the value must persist for at least as long
as the node will persist. This function must be supplied by the programmer
if <SAMP>`%option track_lines'</SAMP> was specified.
<DT><CODE>long yycurrlinenum([YYNODESTATE *state])</CODE>
<DD>
<A NAME="IDX77"></A>
Get the number of the current input line from the parser. This
function must be supplied by the programmer if <SAMP>`%option track_lines'</SAMP>
was specified.
<DT><CODE>void yynodeinit([YYNODESTATE *state])</CODE>
<DD>
<A NAME="IDX78"></A>
Initializes the node memory manager. If the system is reentrant, then
the node memory manager is <SAMP>`state'</SAMP>. Otherwise a global node
memory manager is used.
<DT><CODE>void *yynodealloc([YYNODESTATE *state,] unsigned int size)</CODE>
<DD>
<A NAME="IDX79"></A>
Allocates a block of memory of <SAMP>`size'</SAMP> bytes in size from the
node memory manager. This function is called automatically from
the node-specific <SAMP>`*_create'</SAMP> functions. The programmer will
not normally need to call this function.
This function will return <CODE>NULL</CODE> if the system is out of
memory, or if <SAMP>`size'</SAMP> is too large to be allocated within
the node memory manager. If the system is out of memory, then
<SAMP>`yynodealloc'</SAMP> will call <SAMP>`yynodefailed'</SAMP> prior to
returning <CODE>NULL</CODE>.
<DT><CODE>int yynodepush([YYNODESTATE *state])</CODE>
<DD>
<A NAME="IDX80"></A>
Pushes the current node memory manager position. The next time
<CODE>yynodepop</CODE> is called, the node memory manager will reset to
the pushed position. This function returns zero if the system
is out of memory.
<DT><CODE>void yynodepop([YYNODESTATE *state])</CODE>
<DD>
<A NAME="IDX81"></A>
Pops the current node memory manager position. This function has
no effect if <CODE>yynodepush</CODE> was not called previously.
The <CODE>yynodepush</CODE> and <CODE>yynodepop</CODE> functions can be used
to perform a simple kind of garbage collection on nodes. When
the parser enters a scope, it pushes the node memory manager
position. After all definitions in the scope have been dealt
with, the parser pops the node memory manager to reclaim all
of the memory used.
<DT><CODE>void yynodeclear([YYNODESTATE *state])</CODE>
<DD>
<A NAME="IDX82"></A>
Clears the entire node memory manager and returns it to the
state it had after calling <CODE>yynodeinit</CODE>. This is typically
used upon program shutdown to free all remaining node memory.
<DT><CODE>void yynodefailed([YYNODESTATE *state])</CODE>
<DD>
<A NAME="IDX83"></A>
Called when <CODE>yynodealloc</CODE> or <CODE>yynodepush</CODE> detects that
the system is out of memory. This function must be supplied by
the programmer. The programmer may choose to exit the program
when the system is out of memory, in which case <CODE>yynodealloc</CODE>
will never return <CODE>NULL</CODE>.
</DL>
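<P>
The push/pop behaviour described above is essentially a mark-and-release arena. The C sketch below illustrates the idea with one fixed-size buffer and hypothetical names (<SAMP>`arena_alloc'</SAMP>, <SAMP>`arena_push'</SAMP>, <SAMP>`arena_pop'</SAMP>, <SAMP>`arena_clear'</SAMP>). It is an assumption-laden model, not treecc's allocator, which allocates memory in blocks whose size is set by <SAMP>`%option block_size'</SAMP> and calls <SAMP>`yynodefailed'</SAMP> where this sketch simply returns <CODE>NULL</CODE>.

```c
#include <stddef.h>

#define ARENA_UNITS 256   /* hypothetical capacity, in max_align_t units */
#define MAX_MARKS   32    /* hypothetical limit on nested pushes */

static max_align_t arena[ARENA_UNITS];  /* suitably aligned storage */
static size_t arena_used = 0;           /* units handed out so far */
static size_t marks[MAX_MARKS];         /* positions saved by push */
static int mark_count = 0;

/* Bump-allocate size bytes, rounded up to whole aligned units. */
void *arena_alloc(size_t size)
{
    size_t units = (size + sizeof(max_align_t) - 1) / sizeof(max_align_t);
    if(size == 0 || units > ARENA_UNITS - arena_used)
        return NULL;    /* no room left */
    void *block = &arena[arena_used];
    arena_used += units;
    return block;
}

/* Save the current position; returns 0 if no mark slot is free. */
int arena_push(void)
{
    if(mark_count >= MAX_MARKS)
        return 0;
    marks[mark_count++] = arena_used;
    return 1;
}

/* Release everything allocated since the last push; no-op otherwise. */
void arena_pop(void)
{
    if(mark_count > 0)
        arena_used = marks[--mark_count];
}

/* Return to the freshly initialised state. */
void arena_clear(void)
{
    arena_used = 0;
    mark_count = 0;
}
```

<P>
A parser would call <SAMP>`arena_push'</SAMP> on entering a scope and <SAMP>`arena_pop'</SAMP> once the scope's nodes are no longer needed, reclaiming them all in one step instead of freeing nodes individually.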
<H2><A NAME="SEC20" HREF="treecc_toc.html#TOC20">C++ Language APIs</A></H2>
<P>
<A NAME="IDX84"></A>
<P>
In the C++ output language, each node type is converted into a <SAMP>`class'</SAMP>
that contains the node's fields, virtual operations, and other house-keeping
definitions. The following example demonstrates how treecc node declarations
are converted into C++ source code:
<PRE>
%node expression %abstract %typedef =
{
%nocreate type_code type;
}
%node binary expression %abstract =
{
expression *expr1;
expression *expr2;
}
%node plus binary
</PRE>
<P>
becomes:
<PRE>
class expression
{
protected:
int kind__;
char *filename__;
long linenum__;
public:
int getKind() const { return kind__; }
const char *getFilename() const { return filename__; }
long getLinenum() const { return linenum__; }
void setFilename(char *filename) { filename__ = filename; }
void setLinenum(long linenum) { linenum__ = linenum; }
void *operator new(size_t);
void operator delete(void *, size_t);
protected:
expression();
public:
type_code type;
virtual int isA(int kind) const;
virtual const char *getKindName() const;
protected:
virtual ~expression();
};
class binary : public expression
{
protected:
binary(expression * expr1, expression * expr2);
public:
expression * expr1;
expression * expr2;
virtual int isA(int kind) const;
virtual const char *getKindName() const;
protected:
virtual ~binary();
};
class plus : public binary
{
public:
plus(expression * expr1, expression * expr2);
public:
virtual int isA(int kind) const;
virtual const char *getKindName() const;
protected:
virtual ~plus();
};
</PRE>
<P>
The following standard methods are available on every node type:
<DL COMPACT>
<DT><CODE>int getKind()</CODE>
<DD>
<A NAME="IDX85"></A>
Gets the numeric kind value associated with a particular node.
The kind value for node type <SAMP>`NAME'</SAMP> is called <SAMP>`NAME_kind'</SAMP>.
<DT><CODE>virtual const char *getKindName()</CODE>
<DD>
<A NAME="IDX86"></A>
Gets the name of the node kind associated with a particular node.
This may be helpful for debugging and logging code.
<DT><CODE>virtual int isA(int kind)</CODE>
<DD>
<A NAME="IDX87"></A>
Determines if the node is a member of the node type that corresponds
to the numeric kind value <SAMP>`kind'</SAMP>.
<DT><CODE>const char *getFilename()</CODE>
<DD>
<A NAME="IDX88"></A>
Gets the filename corresponding to where the node was created
during parsing. This method is only generated if <SAMP>`%option track_lines'</SAMP>
was specified.
<DT><CODE>long getLinenum()</CODE>
<DD>
<A NAME="IDX89"></A>
Gets the line number corresponding to where the node was created
during parsing. This method is only generated if <SAMP>`%option track_lines'</SAMP>
was specified.
<DT><CODE>void setFilename(char *value)</CODE>
<DD>
<A NAME="IDX90"></A>
Sets the filename associated with the node to <SAMP>`value'</SAMP>. The
string is not copied, so <SAMP>`value'</SAMP> must persist for the lifetime
of the node. This method will rarely be required, unless a node
corresponds to a different line than the current parse line. This
method is only generated if <SAMP>`%option track_lines'</SAMP> was specified.
<DT><CODE>void setLinenum(long value)</CODE>
<DD>
<A NAME="IDX91"></A>
Sets the line number associated with the node to <SAMP>`value'</SAMP>.
This method will rarely be required, unless a node corresponds to a
different line than the current parse line. This method is only
generated if <SAMP>`%option track_lines'</SAMP> was specified.
</DL>
<P>
If the generated code is non-reentrant, then the constructor for the
class can be used to construct nodes of the specified node type. The
constructor parameters are the same as the fields within the node type's
definition, except for <SAMP>`%nocreate'</SAMP> fields.
<P>
If the generated code is reentrant, then nodes cannot be constructed
using the C++ <SAMP>`new'</SAMP> operator. The <SAMP>`*Create'</SAMP> methods
on the <SAMP>`YYNODESTATE'</SAMP> factory class must be used instead.
<P>
The <SAMP>`YYNODESTATE'</SAMP> class contains a number of house-keeping methods
that are used to manage nodes:
<DL COMPACT>
<DT><CODE>static YYNODESTATE *getState()</CODE>
<DD>
<A NAME="IDX92"></A>
Gets the global <SAMP>`YYNODESTATE'</SAMP> instance that is being used by
non-reentrant code. If an instance has not yet been created,
this method will create one.
When using non-reentrant code, the programmer will normally subclass
<SAMP>`YYNODESTATE'</SAMP>, override some of the methods below, and then
construct an instance of the subclass. This constructed instance
will then be returned by future calls to <SAMP>`getState'</SAMP>.
<DT><CODE>void *alloc(size_t size)</CODE>
<DD>
<A NAME="IDX93"></A>
Allocates a block of memory of <SAMP>`size'</SAMP> bytes in size from the
node memory manager. This function is called automatically from
the node-specific constructors and <SAMP>`*Create'</SAMP> methods. The programmer
will not normally need to call this function.
This function will return <CODE>NULL</CODE> if the system is out of
memory, or if <SAMP>`size'</SAMP> is too large to be allocated within
the node memory manager. If the system is out of memory, then
<SAMP>`alloc'</SAMP> will call <SAMP>`failed'</SAMP> prior to returning <CODE>NULL</CODE>.
<DT><CODE>int push()</CODE>
<DD>
<A NAME="IDX94"></A>
Pushes the current node memory manager position. The next time
<CODE>pop</CODE> is called, the node memory manager will reset to
the pushed position. This function returns zero if the system
is out of memory.
<DT><CODE>void pop()</CODE>
<DD>
<A NAME="IDX95"></A>
Pops the current node memory manager position. This function has
no effect if <CODE>push</CODE> was not called previously.
The <CODE>push</CODE> and <CODE>pop</CODE> methods can be used
to perform a simple kind of garbage collection on nodes. When
the parser enters a scope, it pushes the node memory manager
position. After all definitions in the scope have been dealt
with, the parser pops the node memory manager to reclaim all
of the memory used.
<DT><CODE>void clear()</CODE>
<DD>
<A NAME="IDX96"></A>
Clears the entire node memory manager and returns it to the
state it had after construction.
<DT><CODE>virtual void failed()</CODE>
<DD>
<A NAME="IDX97"></A>
Called when <CODE>alloc</CODE> or <CODE>push</CODE> detects that
the system is out of memory. This method is typically
overridden by the programmer in subclasses. The programmer may
choose to exit the program when the system is out of memory, in
which case <CODE>alloc</CODE> will never return <CODE>NULL</CODE>.
<DT><CODE>virtual char *currFilename()</CODE>
<DD>
<A NAME="IDX98"></A>
Get the name of the current input file from the parser. The pointer
that is returned from this function is stored as-is: the string is
not copied. Therefore, the value must persist for at least as long
as the node will persist. This method is usually overridden by
the programmer in subclasses if <SAMP>`%option track_lines'</SAMP> was specified.
<DT><CODE>virtual long currLinenum()</CODE>
<DD>
<A NAME="IDX99"></A>
Get the number of the current input line from the parser. This
method is usually overridden by the programmer in subclasses
if <SAMP>`%option track_lines'</SAMP> was specified.
</DL>
<P>
The programmer will typically subclass <SAMP>`YYNODESTATE'</SAMP> to provide
additional functionality, and then create an instance of this class
to act as the node memory manager and node creation factory.
<H2><A NAME="SEC21" HREF="treecc_toc.html#TOC21">Java Language APIs</A></H2>
<P>
<A NAME="IDX100"></A>
<P>
In the Java output language, each node type is converted into a <SAMP>`class'</SAMP>
that contains the node's fields, virtual operations, and other house-keeping
definitions. The following example demonstrates how treecc node declarations
are converted into Java source code:
<PRE>
%node expression %abstract %typedef =
{
%nocreate type_code type;
}
%node binary expression %abstract =
{
expression expr1;
expression expr2;
}
%node plus binary
</PRE>
<P>
becomes:
<PRE>
public class expression
{
protected int kind__;
protected String filename__;
protected long linenum__;
public int getKind() { return kind__; }
public String getFilename() { return filename__; }
public long getLinenum() { return linenum__; }
public void setFilename(String filename) { filename__ = filename; }
public void setLinenum(long linenum) { linenum__ = linenum; }
public static final int KIND = 1;
public type_code type;
protected expression()
{
this.kind__ = KIND;
this.filename__ = YYNODESTATE.getState().currFilename();
this.linenum__ = YYNODESTATE.getState().currLinenum();
}
public int isA(int kind)
{
if(kind == KIND)
return 1;
else
return 0;
}
public String getKindName()
{
return "expression";
}
}
public class binary extends expression
{
public static final int KIND = 2;
public expression expr1;
public expression expr2;
protected binary(expression expr1, expression expr2)
{
super();
this.kind__ = KIND;
this.expr1 = expr1;
this.expr2 = expr2;
}
public int isA(int kind)
{
if(kind == KIND)
return 1;
else
return super.isA(kind);
}
public String getKindName()
{
return "binary";
}
}
public class plus extends binary
{
public static final int KIND = 3;
public plus(expression expr1, expression expr2)
{
super(expr1, expr2);
this.kind__ = KIND;
}
public int isA(int kind)
{
if(kind == KIND)
return 1;
else
return super.isA(kind);
}
public String getKindName()
{
return "plus";
}
}
</PRE>
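<P>
Each generated <SAMP>`isA'</SAMP> method checks its own <SAMP>`KIND'</SAMP> and otherwise defers to its superclass, so a query walks up the inheritance chain until it either matches or runs out of ancestors. The same walk can be sketched in C (the language used for all added sketches here) with a hypothetical table of parent kinds, using the kind values 1, 2 and 3 from the example above and 0 to mean "no parent". This is an illustration of the logic only; treecc does not necessarily implement it this way.

```c
/* Kind values from the example: expression = 1, binary = 2, plus = 3.
   parent_kind[k] is the kind of k's parent type, or 0 for a root. */
static const int parent_kind[] = { 0, 0, 1, 2 };

/* Returns 1 if a node of kind node_kind is an instance of kind, either
   directly or through an ancestor, and 0 otherwise. node_kind must be
   a valid index into parent_kind. */
int kind_is_a(int node_kind, int kind)
{
    while(node_kind != 0)
    {
        if(node_kind == kind)
            return 1;
        node_kind = parent_kind[node_kind];
    }
    return 0;
}
```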
<P>
The following standard members are available on every node type:
<DL COMPACT>
<DT><CODE>int KIND</CODE>
<DD>
<A NAME="IDX101"></A>
The kind value for the node type corresponding to this class.
<DT><CODE>int getKind()</CODE>
<DD>
<A NAME="IDX102"></A>
Gets the numeric kind value associated with a particular node.
The kind value for node type <SAMP>`NAME'</SAMP> is called <SAMP>`NAME.KIND'</SAMP>.
<DT><CODE>String getKindName()</CODE>
<DD>
<A NAME="IDX103"></A>
Gets the name of the node kind associated with a particular node.
This may be helpful for debugging and logging code.
<DT><CODE>int isA(int kind)</CODE>
<DD>
<A NAME="IDX104"></A>
Determines if the node is a member of the node type that corresponds
to the numeric kind value <SAMP>`kind'</SAMP>.
<DT><CODE>String getFilename()</CODE>
<DD>
<A NAME="IDX105"></A>
Gets the filename corresponding to where the node was created
during parsing. This method is only generated if <SAMP>`%option track_lines'</SAMP>
was specified.
<DT><CODE>long getLinenum()</CODE>
<DD>
<A NAME="IDX106"></A>
Gets the line number corresponding to where the node was created
during parsing. This method is only generated if <SAMP>`%option track_lines'</SAMP>
was specified.
<DT><CODE>void setFilename(String value)</CODE>
<DD>
<A NAME="IDX107"></A>
Sets the filename associated with the node to <SAMP>`value'</SAMP>.
This method will rarely be required, unless a node corresponds to
a different line than the current parse line. This method is only
generated if <SAMP>`%option track_lines'</SAMP> was specified.
<DT><CODE>void setLinenum(long value)</CODE>
<DD>
<A NAME="IDX108"></A>
Sets the line number associated with the node to <SAMP>`value'</SAMP>.
This method will rarely be required, unless a node corresponds to a
different line than the current parse line. This method is only
generated if <SAMP>`%option track_lines'</SAMP> was specified.
</DL>
<P>
If the generated code is non-reentrant, then the constructor for the
class can be used to construct nodes of the specified node type. The
constructor parameters are the same as the fields within the node type's
definition, except for <SAMP>`%nocreate'</SAMP> fields.
<P>
If the generated code is reentrant, then nodes cannot be constructed
using the Java <SAMP>`new'</SAMP> operator. The <SAMP>`*Create'</SAMP> methods
on the <SAMP>`YYNODESTATE'</SAMP> factory class must be used instead.
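<P>
The point of the factory is that all per-parse state lives in an object rather than in static fields, so two parsers can build trees at the same time. The following is a minimal sketch of that pattern under stated assumptions: <SAMP>`NodeState'</SAMP> stands in for a reentrant <SAMP>`YYNODESTATE'</SAMP>, and the method name <SAMP>`intnumCreate'</SAMP> is a hypothetical instance of the <SAMP>`*Create'</SAMP> convention, not a confirmed treecc signature.

```java
// Hypothetical sketch of a reentrant node factory (NOT treecc's generated API).
final class NodeState {
    // Node type whose constructor is private, so only the factory can build it.
    static final class IntNum {
        final int num;
        final String filename;
        final long linenum;

        private IntNum(int num, String filename, long linenum) {
            this.num = num;
            this.filename = filename;
            this.linenum = linenum;
        }
    }

    // Per-parse state: each NodeState tracks its own position and count.
    private final String filename;
    private long linenum = 1;
    private int nodesCreated;

    NodeState(String filename) { this.filename = filename; }

    void nextLine() { linenum++; }

    // Assumed name following the "*Create" convention described above.
    IntNum intnumCreate(int num) {
        nodesCreated++;
        return new IntNum(num, filename, linenum);
    }

    int createdCount() { return nodesCreated; }
}
```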
<P>
Enumerated types are converted into a Java <SAMP>`class'</SAMP>:
<PRE>
%enum JavaType =
{
JT_BYTE,
JT_SHORT,
JT_CHAR,
JT_INT,
JT_LONG,
JT_FLOAT,
JT_DOUBLE,
JT_OBJECT_REF
}
</PRE>
<P>
becomes:
<PRE>
public class JavaType
{
public static final int JT_BYTE = 0;
public static final int JT_SHORT = 1;
public static final int JT_CHAR = 2;
public static final int JT_INT = 3;
public static final int JT_LONG = 4;
public static final int JT_FLOAT = 5;
public static final int JT_DOUBLE = 6;
public static final int JT_OBJECT_REF = 7;
}
</PRE>
<P>
References to enumerated types in fields and operation parameters
are replaced with the type <SAMP>`int'</SAMP>.
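<P>
Because enumerated fields and parameters are plain <SAMP>`int'</SAMP>s, they can be used directly in a <SAMP>`switch'</SAMP>. In this sketch the <SAMP>`JavaType'</SAMP> class is copied from the generated output above, while the helper <SAMP>`slotCount'</SAMP> is hypothetical: it returns how many JVM local-variable slots a value of each type occupies.

```java
// Copy of the generated class shown above.
class JavaType {
    public static final int JT_BYTE = 0;
    public static final int JT_SHORT = 1;
    public static final int JT_CHAR = 2;
    public static final int JT_INT = 3;
    public static final int JT_LONG = 4;
    public static final int JT_FLOAT = 5;
    public static final int JT_DOUBLE = 6;
    public static final int JT_OBJECT_REF = 7;
}

// Hypothetical helper: enum values are compile-time int constants,
// so they are valid case labels.
class JavaTypes {
    static int slotCount(int type) {
        switch (type) {
            case JavaType.JT_LONG:
            case JavaType.JT_DOUBLE:
                return 2;   // long and double take two JVM local slots
            default:
                return 1;   // every other type, including references, takes one
        }
    }
}
```

<P>
The trade-off of using <SAMP>`int'</SAMP> constants is that the compiler cannot stop an unrelated integer from being passed where a <SAMP>`JavaType'</SAMP> value is expected.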
<P>
Virtual operations are converted into public methods on the Java
node classes.
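<P>
A virtual operation therefore relies on ordinary Java method overriding: each node class supplies its own body, and the JVM picks the right one at run time. The sketch below is hand-written rather than generated; the operation name <SAMP>`describe'</SAMP> and the classes <SAMP>`Expr'</SAMP>, <SAMP>`Lit'</SAMP> and <SAMP>`Add'</SAMP> are hypothetical.

```java
// Hand-written sketch: a virtual operation becomes a public method
// that each node class overrides.
abstract class Expr {
    public abstract String describe();
}

class Lit extends Expr {
    private final int num;

    Lit(int num) { this.num = num; }

    @Override
    public String describe() { return Integer.toString(num); }
}

class Add extends Expr {
    private final Expr left;
    private final Expr right;

    Add(Expr left, Expr right) {
        this.left = left;
        this.right = right;
    }

    // Calls on the children use normal dynamic dispatch.
    @Override
    public String describe() {
        return "(" + left.describe() + " + " + right.describe() + ")";
    }
}
```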
<P>
Non-virtual operations are converted into a static method within
a class named for the operation. For example,
<PRE>
%operation void InferType::infer_type(expression e)
</PRE>
<P>
becomes:
<PRE>
public class InferType
{
public static void infer_type(expression e)
{
...
}
}
</PRE>
<P>
If the class name (<SAMP>`InferType'</SAMP> in the above example) is omitted,
then the name of the operation is used as both the class name and
the method name.
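<P>
How treecc selects the right case inside the generated static method is not shown here; the sketch below assumes a <SAMP>`switch'</SAMP> on the node's kind value, which is one straightforward way to do it. The operation (<SAMP>`Eval::eval'</SAMP>), the node classes and their kind values are all hypothetical.

```java
// Illustrative node classes; kind values are assumptions.
abstract class Node {
    static final int INTNUM_KIND = 1;
    static final int PLUS_KIND = 2;
    static final int NEGATE_KIND = 3;

    abstract int getKind();
}

class IntNum extends Node {
    final int num;
    IntNum(int num) { this.num = num; }
    @Override int getKind() { return INTNUM_KIND; }
}

class Plus extends Node {
    final Node expr1, expr2;
    Plus(Node expr1, Node expr2) { this.expr1 = expr1; this.expr2 = expr2; }
    @Override int getKind() { return PLUS_KIND; }
}

class Negate extends Node {
    final Node expr;
    Negate(Node expr) { this.expr = expr; }
    @Override int getKind() { return NEGATE_KIND; }
}

// Sketch of a non-virtual "Eval::eval" operation: one static method in a
// class named for the operation, choosing the case from the node's kind.
class Eval {
    static int eval(Node e) {
        switch (e.getKind()) {
            case Node.INTNUM_KIND:
                return ((IntNum) e).num;
            case Node.PLUS_KIND: {
                Plus p = (Plus) e;
                return eval(p.expr1) + eval(p.expr2);
            }
            case Node.NEGATE_KIND:
                return -eval(((Negate) e).expr);
            default:
                throw new IllegalArgumentException("no case for kind " + e.getKind());
        }
    }
}
```

<P>
A consequence of this layout is that adding a new non-virtual operation does not touch the node classes, whereas adding a new node type means adding a case to every such method.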
<P>
The <SAMP>`YYNODESTATE'</SAMP> class contains a number of house-keeping methods
that are used to manage nodes:
<DL COMPACT>
<DT><CODE>static YYNODESTATE getState()</CODE>
<DD>
<A NAME="IDX109"></A>
Gets the global <SAMP>`YYNODESTATE'</SAMP> instance that is being used by
non-reentrant code. If an instance has not yet been created,
this method will create one.
When using non-reentrant code, the programmer will normally subclass
<SAMP>`YYNODESTATE'</SAMP>, override some of the methods below, and then
construct an instance of the subclass. This constructed instance
will then be returned by future calls to <SAMP>`getState'</SAMP>.
This method will not be present if a reentrant system is being
generated.
<DT><CODE>String currFilename()</CODE>
<DD>
<A NAME="IDX110"></A>
Gets the name of the current input file from the parser. This method
is usually overridden by the programmer in subclasses if
<SAMP>`%option track_lines'</SAMP> was specified.
<DT><CODE>long currLinenum()</CODE>
<DD>
<A NAME="IDX111"></A>
Gets the number of the current input line from the parser. This
method is usually overridden by the programmer in subclasses
if <SAMP>`%option track_lines'</SAMP> was specified.
</DL>
<P>
The programmer will typically subclass <SAMP>`YYNODESTATE'</SAMP> to provide
additional functionality, and then create an instance of this class
to act as the global state and node creation factory.
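<P>
A minimal sketch of that setup, assuming the non-reentrant <SAMP>`YYNODESTATE'</SAMP> records the most recently constructed instance as the global state. The base class here is a hand-written stand-in (the generated class also contains node-creation methods, omitted here), and <SAMP>`ParserState'</SAMP> with its <SAMP>`file'</SAMP> and <SAMP>`line'</SAMP> fields is a hypothetical subclass that a parser would keep up to date.

```java
// Hand-written stand-in for a non-reentrant YYNODESTATE (NOT generated code).
// Assumption: constructing any instance makes it the global state.
class YYNODESTATE {
    private static YYNODESTATE state__;

    protected YYNODESTATE() {
        state__ = this;
    }

    // Returns the global state, creating a default one on first use.
    static YYNODESTATE getState() {
        if (state__ == null) {
            new YYNODESTATE();
        }
        return state__;
    }

    // Assumed defaults for when no subclass supplies parser positions.
    String currFilename() { return null; }
    long currLinenum() { return 0; }
}

// Hypothetical programmer-written subclass; a real parser would update
// 'file' and 'line' as it reads input.
class ParserState extends YYNODESTATE {
    String file = "input.expr";
    long line = 1;

    @Override String currFilename() { return file; }
    @Override long currLinenum() { return line; }
}
```

<P>
Because the state is held in a single static field, this arrangement suits one parse at a time.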
<H2><A NAME="SEC22" HREF="treecc_toc.html#TOC22">C# Language APIs</A></H2>
<P>
<A NAME="IDX112"></A>
<P>
In the C# output language, each node type is converted into a <SAMP>`class'</SAMP>
that contains the node's fields, virtual operations, and other house-keeping
definitions. The following example demonstrates how treecc node declarations
are converted into C# source code:
<PRE>
%node expression %abstract %typedef =
{
%nocreate type_code type;
}
%node binary expression %abstract =
{
expression expr1;
expression expr2;
}
%node plus binary
</PRE>
<P>
becomes:
<PRE>
public class expression
{
protected int kind__;
protected String filename__;
protected long linenum__;
public int getKind() { return kind__; }
public String getFilename() { return filename__; }
public long getLinenum() { return linenum__; }
public void setFilename(String filename) { filename__ = filename; }
public void setLinenum(long linenum) { linenum__ = linenum; }
public const int KIND = 1;
public type_code type;
protected expression()
{
this.kind__ = KIND;
this.filename__ = YYNODESTATE.getState().currFilename();
this.linenum__ = YYNODESTATE.getState().currLinenum();
}
public virtual int isA(int kind)
{
if(kind == KIND)
return 1;
else
return 0;
}
public virtual String getKindName()
{
return "expression";
}
}
public class binary : expression
{
public const int KIND = 2;
public expression expr1;
public expression expr2;
protected binary(expression expr1, expression expr2)
: base()
{
this.kind__ = KIND;
this.expr1 = expr1;
this.expr2 = expr2;
}
public override int isA(int kind)
{
if(kind == KIND)
return 1;
else
return base.isA(kind);
}
public override String getKindName()
{
return "binary";
}
}
public class plus : binary
{
public const int KIND = 5;
public plus(expression expr1, expression expr2)
: base(expr1, expr2)
{
this.kind__ = KIND;
}
public override int isA(int kind)
{
if(kind == KIND)
return 1;
else
return base.isA(kind);
}
public override String getKindName()
{
return "plus";
}
}
</PRE>
<P>
The following standard members are available on every node type:
<DL COMPACT>
<DT><CODE>const int KIND</CODE>
<DD>
<A NAME="IDX113"></A>
The kind value for the node type corresponding to this class.
<DT><CODE>int getKind()</CODE>
<DD>
<A NAME="IDX114"></A>
Gets the numeric kind value associated with a particular node.
The kind value for node type <SAMP>`NAME'</SAMP> is called <SAMP>`NAME.KIND'</SAMP>.
<DT><CODE>virtual String getKindName()</CODE>
<DD>
<A NAME="IDX115"></A>
Gets the name of the node kind associated with a particular node.
This may be helpful for debugging and logging code.
<DT><CODE>virtual int isA(int kind)</CODE>
<DD>
<A NAME="IDX116"></A>
Determines if the node is a member of the node type that corresponds
to the numeric kind value <SAMP>`kind'</SAMP>.
<DT><CODE>String getFilename()</CODE>
<DD>
<A NAME="IDX117"></A>
Gets the filename corresponding to where the node was created
during parsing. This method is only generated if <SAMP>`%option track_lines'</SAMP>
was specified.
<DT><CODE>long getLinenum()</CODE>
<DD>
<A NAME="IDX118"></A>
Gets the line number corresponding to where the node was created
during parsing. This method is only generated if <SAMP>`%option track_lines'</SAMP>
was specified.
<DT><CODE>void setFilename(String value)</CODE>
<DD>
<A NAME="IDX119"></A>
Sets the filename associated with the node to <SAMP>`value'</SAMP>.
This method is rarely required, except when a node corresponds to a
different file than the one currently being parsed. This method is only
generated if <SAMP>`%option track_lines'</SAMP> was specified.
<DT><CODE>void setLinenum(long value)</CODE>
<DD>
<A NAME="IDX120"></A>
Sets the line number associated with the node to <SAMP>`value'</SAMP>.
This method is rarely required, except when a node corresponds to a
different line than the current parse line. This method is only
generated if <SAMP>`%option track_lines'</SAMP> was specified.
</DL>
<P>
If the generated code is non-reentrant, then the constructor for the
class can be used to construct nodes of the specified node type. The
constructor parameters are the same as the fields within the node type's
definition, except for <SAMP>`%nocreate'</SAMP> fields.
<P>
If the generated code is reentrant, then nodes cannot be constructed
using the C# <SAMP>`new'</SAMP> operator. The <SAMP>`*Create'</SAMP> methods
on the <SAMP>`YYNODESTATE'</SAMP> factory class must be used instead.
<P>
Enumerated types are converted into a C# <SAMP>`enum'</SAMP> definition:
<PRE>
%enum JavaType =
{
JT_BYTE,
JT_SHORT,
JT_CHAR,
JT_INT,
JT_LONG,
JT_FLOAT,
JT_DOUBLE,
JT_OBJECT_REF
}
</PRE>
<P>
becomes:
<PRE>
public enum JavaType
{
JT_BYTE,
JT_SHORT,
JT_CHAR,
JT_INT,
JT_LONG,
JT_FLOAT,
JT_DOUBLE,
JT_OBJECT_REF,
}
</PRE>
<P>
Virtual operations are converted into public virtual methods on the C#
node classes.
<P>
Non-virtual operations are converted into a static method within
a class named for the operation. For example,
<PRE>
%operation void InferType::infer_type(expression e)
</PRE>
<P>
becomes:
<PRE>
public class InferType
{
public static void infer_type(expression e)
{
...
}
}
</PRE>
<P>
If the class name (<SAMP>`InferType'</SAMP> in the above example) is omitted,
then the name of the operation is used as both the class name and
the method name.
<P>
The <SAMP>`YYNODESTATE'</SAMP> class contains a number of house-keeping methods
that are used to manage nodes:
<DL COMPACT>
<DT><CODE>static YYNODESTATE getState()</CODE>
<DD>
<A NAME="IDX121"></A>
Gets the global <SAMP>`YYNODESTATE'</SAMP> instance that is being used by
non-reentrant code. If an instance has not yet been created,
this method will create one.
When using non-reentrant code, the programmer will normally subclass
<SAMP>`YYNODESTATE'</SAMP>, override some of the methods below, and then
construct an instance of the subclass. This constructed instance
will then be returned by future calls to <SAMP>`getState'</SAMP>.
This method will not be present if a reentrant system is being
generated.
<DT><CODE>virtual String currFilename()</CODE>
<DD>
<A NAME="IDX122"></A>
Gets the name of the current input file from the parser. This method
is usually overridden by the programmer in subclasses if
<SAMP>`%option track_lines'</SAMP> was specified.
<DT><CODE>virtual long currLinenum()</CODE>
<DD>
<A NAME="IDX123"></A>
Gets the number of the current input line from the parser. This
method is usually overridden by the programmer in subclasses
if <SAMP>`%option track_lines'</SAMP> was specified.
</DL>
<P>
The programmer will typically subclass <SAMP>`YYNODESTATE'</SAMP> to provide
additional functionality, and then create an instance of this class
to act as the global state and node creation factory.
<H1><A NAME="SEC23" HREF="treecc_toc.html#TOC23">Full expression example code</A></H1>
<P>
<A NAME="IDX124"></A>
<P>
The full treecc input file for the expression example is as follows:
<PRE>
%enum type_code =
{
int_type,
float_type
}
%node expression %abstract %typedef =
{
%nocreate type_code type = {int_type};
}
%node binary expression %abstract =
{
expression *expr1;
expression *expr2;
}
%node unary expression %abstract =
{
expression *expr;
}
%node intnum expression =
{
int num;
}
%node floatnum expression =
{
float num;
}
%node plus binary
%node minus binary
%node multiply binary
%node divide binary
%node power binary
%node negate unary
%operation void infer_type(expression *e)
infer_type(binary)
{
infer_type(e->expr1);
infer_type(e->expr2);
if(e->expr1->type == float_type || e->expr2->type == float_type)
{
e->type = float_type;
}
else
{
e->type = int_type;
}
}
infer_type(unary)
{
infer_type(e->expr);
e->type = e->expr->type;
}
infer_type(intnum)
{
e->type = int_type;
}
infer_type(floatnum)
{
e->type = float_type;
}
infer_type(power)
{
infer_type(e->expr1);
infer_type(e->expr2);
if(e->expr2->type != int_type)
{
error("second argument to `^' is not an integer");
}
e->type = e->expr1->type;
}
</PRE>
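<P>
For readers more familiar with Java than with treecc's C output, the following hand translation shows the same type-inference rules as the <SAMP>`infer_type'</SAMP> cases above. It is a sketch, not generated code: it uses virtual methods instead of a non-virtual operation, shows only <SAMP>`plus'</SAMP> among the plain binary operators, and collects messages in a list in place of the <SAMP>`error'</SAMP> function, which the input file does not define.

```java
import java.util.ArrayList;
import java.util.List;

// Hand translation of the example's infer_type rules (NOT treecc output).
enum TypeCode { INT_TYPE, FLOAT_TYPE }

abstract class Expression {
    TypeCode type = TypeCode.INT_TYPE;   // mirrors the %nocreate default {int_type}
    abstract void inferType(List<String> errors);
}

class IntNum extends Expression {
    final int num;
    IntNum(int num) { this.num = num; }
    void inferType(List<String> errors) { type = TypeCode.INT_TYPE; }
}

class FloatNum extends Expression {
    final float num;
    FloatNum(float num) { this.num = num; }
    void inferType(List<String> errors) { type = TypeCode.FLOAT_TYPE; }
}

abstract class Binary extends Expression {
    final Expression expr1, expr2;
    Binary(Expression expr1, Expression expr2) { this.expr1 = expr1; this.expr2 = expr2; }

    // infer_type(binary): float if either operand is float, else int.
    void inferType(List<String> errors) {
        expr1.inferType(errors);
        expr2.inferType(errors);
        type = (expr1.type == TypeCode.FLOAT_TYPE || expr2.type == TypeCode.FLOAT_TYPE)
                ? TypeCode.FLOAT_TYPE : TypeCode.INT_TYPE;
    }
}

class Plus extends Binary {
    Plus(Expression a, Expression b) { super(a, b); }
}

class Power extends Binary {
    Power(Expression a, Expression b) { super(a, b); }

    // infer_type(power): exponent must be int; result has the base's type.
    @Override
    void inferType(List<String> errors) {
        expr1.inferType(errors);
        expr2.inferType(errors);
        if (expr2.type != TypeCode.INT_TYPE) {
            errors.add("second argument to `^' is not an integer");
        }
        type = expr1.type;
    }
}

class Negate extends Expression {
    final Expression expr;
    Negate(Expression expr) { this.expr = expr; }

    // infer_type(unary): same type as the operand.
    void inferType(List<String> errors) {
        expr.inferType(errors);
        type = expr.type;
    }
}

class TypeChecker {
    // Runs inference over a whole tree and returns any error messages.
    static List<String> check(Expression root) {
        List<String> errors = new ArrayList<>();
        root.inferType(errors);
        return errors;
    }
}
```

<P>
Note how <SAMP>`Power'</SAMP> overrides the inherited <SAMP>`Binary'</SAMP> rule, just as <SAMP>`infer_type(power)'</SAMP> takes precedence over <SAMP>`infer_type(binary)'</SAMP> for <SAMP>`power'</SAMP> nodes.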
<P>
The full yacc grammar is as follows:
<PRE>
%union {
expression *node;
int inum;
float fnum;
}
%token <inum> INT
%token <fnum> FLOAT
%type <node> expr
%left '+' '-'
%left '*' '/'
%right UMINUS
%right '^'
%%
expr: INT { $$ = intnum_create($1); }
| FLOAT { $$ = floatnum_create($1); }
| '(' expr ')' { $$ = $2; }
| expr '+' expr { $$ = plus_create($1, $3); }
| expr '-' expr { $$ = minus_create($1, $3); }
| expr '*' expr { $$ = multiply_create($1, $3); }
| expr '/' expr { $$ = divide_create($1, $3); }
| expr '^' expr { $$ = power_create($1, $3); }
| '-' expr %prec UMINUS { $$ = negate_create($2); }
;
</PRE>
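<P>
yacc builds the tree bottom-up, calling one <SAMP>`*_create'</SAMP> function per reduced rule. To show the tree shapes those actions produce without running yacc, here is a hand-written recursive-descent sketch in Java. Each action returns a string such as <SAMP>`plus(intnum(1), intnum(2))'</SAMP> instead of a node, and the precedence levels (<SAMP>`+ -'</SAMP> lowest, then <SAMP>`* /'</SAMP>, then unary minus, then right-associative <SAMP>`^'</SAMP>) are conventional choices made for this sketch.

```java
// Hand-written recursive-descent stand-in for the yacc grammar above.
// Each rule returns a string naming the node it would create.
class ExprParser {
    private final String src;
    private int pos;

    private ExprParser(String src) { this.src = src; }

    static String parse(String src) {
        ExprParser p = new ExprParser(src);
        String tree = p.additive();
        p.skipSpaces();
        if (p.pos != p.src.length()) throw new IllegalArgumentException("trailing input at " + p.pos);
        return tree;
    }

    private void skipSpaces() {
        while (pos < src.length() && src.charAt(pos) == ' ') pos++;
    }

    private boolean accept(char c) {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    // expr '+' expr | expr '-' expr (left-associative, lowest precedence)
    private String additive() {
        String left = multiplicative();
        while (true) {
            if (accept('+')) left = "plus(" + left + ", " + multiplicative() + ")";
            else if (accept('-')) left = "minus(" + left + ", " + multiplicative() + ")";
            else return left;
        }
    }

    // expr '*' expr | expr '/' expr (left-associative)
    private String multiplicative() {
        String left = unary();
        while (true) {
            if (accept('*')) left = "multiply(" + left + ", " + unary() + ")";
            else if (accept('/')) left = "divide(" + left + ", " + unary() + ")";
            else return left;
        }
    }

    // '-' expr: looser than '^', so -2^2 is negate(power(2, 2))
    private String unary() {
        if (accept('-')) return "negate(" + unary() + ")";
        return power();
    }

    // expr '^' expr (right-associative)
    private String power() {
        String base = primary();
        if (accept('^')) return "power(" + base + ", " + unary() + ")";
        return base;
    }

    // INT | FLOAT | '(' expr ')'
    private String primary() {
        if (accept('(')) {
            String inner = additive();
            if (!accept(')')) throw new IllegalArgumentException("expected ')' at " + pos);
            return inner;
        }
        skipSpaces();
        int start = pos;
        while (pos < src.length() && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.')) pos++;
        if (start == pos) throw new IllegalArgumentException("expected number at " + pos);
        String text = src.substring(start, pos);
        return text.contains(".") ? "floatnum(" + text + ")" : "intnum(" + text + ")";
    }
}
```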
<H1><A NAME="SEC24" HREF="treecc_toc.html#TOC24">EBNF syntax for treecc input files</A></H1>
<P>
<A NAME="IDX125"></A>
<P>
The EBNF syntax for treecc input files uses the following
lexical tokens:
<PRE>
IDENTIFIER ::= <A-Za-z_> { <A-Za-z0-9_> }
STRING ::= '"' <anything that does not include '"'> '"'
| "'" <anything that does not include "'"> "'"
LITERAL_DEFNS ::= "%{" <anything except "%}"> "%}"
LITERAL_END ::= "%%" <any character sequence until EOF>
LITERAL_CODE ::= '{' <anything with matched '{' and '}'> '}'
</PRE>
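<P>
Two of these rules can be checked mechanically. The sketch below is illustrative, not treecc's scanner: <SAMP>`TreeccLex'</SAMP> is a hypothetical helper that matches <SAMP>`IDENTIFIER'</SAMP> with a regular expression and finds the end of a <SAMP>`LITERAL_CODE'</SAMP> block by counting braces. It counts every brace, including any inside string literals or comments; whether treecc's real scanner skips those is not specified by the EBNF.

```java
import java.util.regex.Pattern;

// Hypothetical helper for two of the lexical rules (not treecc's scanner).
class TreeccLex {
    // IDENTIFIER ::= <A-Za-z_> { <A-Za-z0-9_> }
    static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static boolean isIdentifier(String s) {
        return IDENTIFIER.matcher(s).matches();
    }

    // LITERAL_CODE ::= '{' <anything with matched '{' and '}'> '}'
    // Returns the index just past the brace that closes the '{' at 'start',
    // or -1 if there is no '{' there or the braces never balance.
    static int endOfLiteralCode(String text, int start) {
        if (start >= text.length() || text.charAt(start) != '{') return -1;
        int depth = 0;
        for (int i = start; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) return i + 1;
            }
        }
        return -1;
    }
}
```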
<P>
In addition, anything that begins with "%" in the following syntax
is a lexical keyword.
<P>
The EBNF syntax is as follows:
<PRE>
File ::= { Declaration }
Declaration ::= Node
| Operation
| OperationCase
| Option
| Enum
| Literal
| Header
| Output
| Common
| Include
Node ::= %node IDENTIFIER [ IDENTIFIER ] { NodeFlag } [ '=' Fields ]
NodeFlag ::= %abstract | %typedef
Fields ::= '{' { Field } '}'
Field ::= [ %nocreate ] TypeAndName [ '=' LITERAL_CODE ] ';'
TypeAndName ::= Type [ IDENTIFIER ]
Type ::= TypeName
| Type '*'
| Type '&'
| Type '[' ']'
TypeName ::= IDENTIFIER { IDENTIFIER }
Operation ::= %operation { OperFlag } Type
[ ClassName ] IDENTIFIER '(' [ Params ] ')'
[ '=' LITERAL_CODE ] [ ';' ]
OperFlag ::= %virtual | %inline | %split
ClassName ::= IDENTIFIER "::"
Params ::= Param { ',' Param }
Param ::= TypeAndName | '[' TypeAndName ']'
OperationCase ::= OperationHead { ',' OperationHead } LITERAL_CODE
OperationHead ::= IDENTIFIER '(' [ TypeList ] ')'
TypeList ::= IDENTIFIER { ',' IDENTIFIER }
Option ::= %option IDENTIFIER [ '=' Value ]
Value ::= IDENTIFIER | STRING
Enum ::= %enum IDENTIFIER '=' '{' EnumBody [ ',' ] '}'
EnumBody ::= IDENTIFIER { ',' IDENTIFIER }
Literal ::= { LiteralFlag } (LITERAL_DEFNS | LITERAL_END)
LiteralFlag ::= %both | %decls | %end
Header ::= %header STRING
Output ::= %output STRING
Common ::= %common
Include ::= %include [ %readonly ] STRING
</PRE>
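<P>
Each production maps naturally onto a small recursive-descent routine. As an illustration, here is a hypothetical Java parser for the <SAMP>`Enum'</SAMP> production alone, including the optional trailing comma; its tokenizer is simplified and it is not the parser treecc itself uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of a parser for the Enum production only (not treecc's parser):
//   Enum     ::= %enum IDENTIFIER '=' '{' EnumBody [ ',' ] '}'
//   EnumBody ::= IDENTIFIER { ',' IDENTIFIER }
// The tokenizer only knows %keywords, identifiers and the marks = , { }.
class EnumRule {
    private static final Pattern TOKEN =
        Pattern.compile("\\s*(%[A-Za-z_]+|[A-Za-z_][A-Za-z0-9_]*|[=,{}])");

    final List<String> values = new ArrayList<>();
    final String name;
    private final List<String> toks;
    private int pos;

    EnumRule(String text) {
        toks = tokenize(text);
        expect("%enum");
        name = ident();
        expect("=");
        expect("{");
        values.add(ident());                 // EnumBody needs at least one name
        while (true) {
            String t = next();
            if (t.equals("}")) break;
            if (!t.equals(",")) throw new IllegalArgumentException("expected ',' or '}', got " + t);
            if (peek().equals("}")) {        // the optional trailing ','
                next();
                break;
            }
            values.add(ident());
        }
        if (pos != toks.size()) throw new IllegalArgumentException("trailing tokens after '}'");
    }

    private static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        int at = 0;
        while (true) {
            m.region(at, text.length());
            if (!m.lookingAt()) break;
            out.add(m.group(1));
            at = m.end();
        }
        if (!text.substring(at).trim().isEmpty())
            throw new IllegalArgumentException("unexpected text at " + at);
        return out;
    }

    private String peek() {
        if (pos >= toks.size()) throw new IllegalArgumentException("unexpected end of input");
        return toks.get(pos);
    }

    private String next() {
        String t = peek();
        pos++;
        return t;
    }

    private void expect(String want) {
        String got = next();
        if (!got.equals(want)) throw new IllegalArgumentException("expected " + want + ", got " + got);
    }

    private String ident() {
        String t = next();
        char c = t.charAt(0);
        if (!(Character.isLetter(c) || c == '_')) throw new IllegalArgumentException("expected identifier, got " + t);
        return t;
    }
}
```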
<H1><A NAME="SEC25" HREF="treecc_toc.html#TOC25">Index</A></H1>
<P>
Jump to:
<A HREF="#cindex_%">%</A>
-
<A HREF="#cindex_a">a</A>
-
<A HREF="#cindex_b">b</A>
-
<A HREF="#cindex_c">c</A>
-
<A HREF="#cindex_e">e</A>
-
<A HREF="#cindex_f">f</A>
-
<A HREF="#cindex_g">g</A>
-
<A HREF="#cindex_h">h</A>
-
<A HREF="#cindex_i">i</A>
-
<A HREF="#cindex_j">j</A>
-
<A HREF="#cindex_k">k</A>
-
<A HREF="#cindex_l">l</A>
-
<A HREF="#cindex_n">n</A>
-
<A HREF="#cindex_o">o</A>
-
<A HREF="#cindex_p">p</A>
-
<A HREF="#cindex_r">r</A>
-
<A HREF="#cindex_s">s</A>
-
<A HREF="#cindex_t">t</A>
-
<A HREF="#cindex_v">v</A>
-
<A HREF="#cindex_y">y</A>
<P>
<H2><A NAME="cindex_%">%</A></H2>
<DIR>
<LI><A HREF="treecc.html#IDX9">%abstract keyword</A>
<LI><A HREF="treecc.html#IDX52">%both keyword</A>
<LI><A HREF="treecc.html#IDX65">%common keyword</A>
<LI><A HREF="treecc.html#IDX51">%decls keyword</A>
<LI><A HREF="treecc.html#IDX53">%end keyword</A>
<LI><A HREF="treecc.html#IDX15">%enum keyword</A>
<LI><A HREF="treecc.html#IDX59">%header keyword</A>
<LI><A HREF="treecc.html#IDX56">%include keyword</A>
<LI><A HREF="treecc.html#IDX22">%inline keyword</A>
<LI><A HREF="treecc.html#IDX11">%nocreate keyword</A>
<LI><A HREF="treecc.html#IDX7">%node keyword</A>
<LI><A HREF="treecc.html#IDX19">%operation keyword</A>
<LI><A HREF="treecc.html#IDX26">%option keyword</A>
<LI><A HREF="treecc.html#IDX63">%outdir keyword</A>
<LI><A HREF="treecc.html#IDX61">%output keyword</A>
<LI><A HREF="treecc.html#IDX57">%readonly keyword</A>
<LI><A HREF="treecc.html#IDX23">%split keyword</A>
<LI><A HREF="treecc.html#IDX10">%typedef keyword</A>
<LI><A HREF="treecc.html#IDX21">%virtual keyword</A>
</DIR>
<H2><A NAME="cindex_a">a</A></H2>
<DIR>
<LI><A HREF="treecc.html#IDX37">abstract_factory option</A>
<LI><A HREF="treecc.html#IDX93">alloc method (C++)</A>
</DIR>
<H2><A NAME="cindex_b">b</A></H2>
<DIR>
<LI><A HREF="treecc.html#IDX45">base option</A>
<LI><A HREF="treecc.html#IDX47">block_size option</A>
</DIR>
<H2><A NAME="cindex_c">c</A></H2>
<DIR>
<LI><A HREF="treecc.html#IDX68">C APIs</A>
<LI><A HREF="treecc.html#IDX112">C# APIs</A>
<LI><A HREF="treecc.html#IDX84">C++ APIs</A>
<LI><A HREF="treecc.html#IDX54">Changing files</A>
<LI><A HREF="treecc.html#IDX96">clear method (C++)</A>
<LI><A HREF="treecc.html#IDX4">Command-line options</A>
<LI><A HREF="treecc.html#IDX64">common declaration</A>
<LI><A HREF="treecc.html#IDX122">currFilename method (C#)</A>
<LI><A HREF="treecc.html#IDX98">currFilename method (C++)</A>
<LI><A HREF="treecc.html#IDX110">currFilename method (Java)</A>
<LI><A HREF="treecc.html#IDX123">currLinenum method (C#)</A>
<LI><A HREF="treecc.html#IDX99">currLinenum method (C++)</A>
<LI><A HREF="treecc.html#IDX111">currLinenum method (Java)</A>
</DIR>
<H2><A NAME="cindex_e">e</A></H2>
<DIR>
<LI><A HREF="treecc.html#IDX125">EBNF syntax</A>
<LI><A HREF="treecc.html#IDX14">enum declaration</A>
<LI><A HREF="treecc.html#IDX13">Enumerations</A>
<LI><A HREF="treecc.html#IDX2">Expression example</A>
</DIR>
<H2><A NAME="cindex_f">f</A></H2>
<DIR>
<LI><A HREF="treecc.html#IDX97">failed method (C++)</A>
<LI><A HREF="treecc.html#IDX8">Fields</A>
<LI><A HREF="treecc.html#IDX33">force option</A>
<LI><A HREF="treecc.html#IDX124">Full expression example</A>
</DIR>
<H2><A NAME="cindex_g">g</A></H2>
<DIR>
<LI><A HREF="treecc.html#IDX117">getFilename method (C#)</A>
<LI><A HREF="treecc.html#IDX88">getFilename method (C++)</A>
<LI><A HREF="treecc.html#IDX105">getFilename method (Java)</A>
<LI><A HREF="treecc.html#IDX114">getKind method (C#)</A>
<LI><A HREF="treecc.html#IDX85">getKind method (C++)</A>
<LI><A HREF="treecc.html#IDX102">getKind method (Java)</A>
<LI><A HREF="treecc.html#IDX115">getKindName method (C#)</A>
<LI><A HREF="treecc.html#IDX86">getKindName method (C++)</A>
<LI><A HREF="treecc.html#IDX103">getKindName method (Java)</A>
<LI><A HREF="treecc.html#IDX118">getLinenum method (C#)</A>
<LI><A HREF="treecc.html#IDX89">getLinenum method (C++)</A>
<LI><A HREF="treecc.html#IDX106">getLinenum method (Java)</A>
<LI><A HREF="treecc.html#IDX121">getState method (C#)</A>
<LI><A HREF="treecc.html#IDX92">getState method (C++)</A>
<LI><A HREF="treecc.html#IDX109">getState method (Java)</A>
</DIR>
<H2><A NAME="cindex_h">h</A></H2>
<DIR>
<LI><A HREF="treecc.html#IDX58">header declaration</A>
</DIR>
<H2><A NAME="cindex_i">i</A></H2>
<DIR>
<LI><A HREF="treecc.html#IDX55">include declaration</A>
<LI><A HREF="treecc.html#IDX3">Invoking treecc</A>
<LI><A HREF="treecc.html#IDX116">isA method (C#)</A>
<LI><A HREF="treecc.html#IDX87">isA method (C++)</A>
<LI><A HREF="treecc.html#IDX104">isA method (Java)</A>
</DIR>
<H2><A NAME="cindex_j">j</A></H2>
<DIR>
<LI><A HREF="treecc.html#IDX100">Java APIs</A>
</DIR>
<H2><A NAME="cindex_k">k</A></H2>
<DIR>
<LI><A HREF="treecc.html#IDX113">KIND field (C#)</A>
<LI><A HREF="treecc.html#IDX101">KIND field (Java)</A>
<LI><A HREF="treecc.html#IDX39">kind_in_node option</A>
<LI><A HREF="treecc.html#IDX40">kind_in_vtable option</A>
</DIR>
<H2><A NAME="cindex_l">l</A></H2>
<DIR>
<LI><A HREF="treecc.html#IDX46">lang option</A>
<LI><A HREF="treecc.html#IDX66">Line tracking</A>
<LI><A HREF="treecc.html#IDX50">Literal code</A>
</DIR>
<H2><A NAME="cindex_n">n</A></H2>
<DIR>
<LI><A HREF="treecc.html#IDX43">namespace option</A>
<LI><A HREF="treecc.html#IDX38">no_abstract_factory option</A>
<LI><A HREF="treecc.html#IDX34">no_force option</A>
<LI><A HREF="treecc.html#IDX32">no_reentrant option</A>
<LI><A HREF="treecc.html#IDX30">no_singletons option</A>
<LI><A HREF="treecc.html#IDX49">no_strip_filenames option</A>
<LI><A HREF="treecc.html#IDX28">no_track_lines option</A>
<LI><A HREF="treecc.html#IDX36">no_virtual_factory option</A>
<LI><A HREF="treecc.html#IDX6">Nodes</A>
</DIR>
<H2><A NAME="cindex_o">o</A></H2>
<DIR>
<LI><A HREF="treecc.html#IDX18">operation cases</A>
<LI><A HREF="treecc.html#IDX17">operation declarations</A>
<LI><A HREF="treecc.html#IDX16">Operations</A>
<LI><A HREF="treecc.html#IDX25">option declaration</A>
<LI><A HREF="treecc.html#IDX24">Options</A>
<LI><A HREF="treecc.html#IDX62">outdir declaration</A>
<LI><A HREF="treecc.html#IDX67">Output APIs</A>
<LI><A HREF="treecc.html#IDX60">output declaration</A>
<LI><A HREF="treecc.html#IDX1">Overview</A>
</DIR>
<H2><A NAME="cindex_p">p</A></H2>
<DIR>
<LI><A HREF="treecc.html#IDX44">package option</A>
<LI><A HREF="treecc.html#IDX95">pop method (C++)</A>
<LI><A HREF="treecc.html#IDX41">prefix option</A>
<LI><A HREF="treecc.html#IDX94">push method (C++)</A>
</DIR>
<H2><A NAME="cindex_r">r</A></H2>
<DIR>
<LI><A HREF="treecc.html#IDX31">reentrant option</A>
</DIR>
<H2><A NAME="cindex_s">s</A></H2>
<DIR>
<LI><A HREF="treecc.html#IDX119">setFilename method (C#)</A>
<LI><A HREF="treecc.html#IDX90">setFilename method (C++)</A>
<LI><A HREF="treecc.html#IDX107">setFilename method (Java)</A>
<LI><A HREF="treecc.html#IDX120">setLinenum method (C#)</A>
<LI><A HREF="treecc.html#IDX91">setLinenum method (C++)</A>
<LI><A HREF="treecc.html#IDX108">setLinenum method (Java)</A>
<LI><A HREF="treecc.html#IDX29">singletons option</A>
<LI><A HREF="treecc.html#IDX42">state_type option</A>
<LI><A HREF="treecc.html#IDX48">strip_filenames option</A>
<LI><A HREF="treecc.html#IDX5">Syntax</A>
</DIR>
<H2><A NAME="cindex_t">t</A></H2>
<DIR>
<LI><A HREF="treecc.html#IDX27">track_lines option</A>
<LI><A HREF="treecc.html#IDX20">trigger parameters</A>
<LI><A HREF="treecc.html#IDX12">Types</A>
</DIR>
<H2><A NAME="cindex_v">v</A></H2>
<DIR>
<LI><A HREF="treecc.html#IDX35">virtual_factory option</A>
</DIR>
<H2><A NAME="cindex_y">y</A></H2>
<DIR>
<LI><A HREF="treecc.html#IDX76">yycurrfilename function</A>
<LI><A HREF="treecc.html#IDX77">yycurrlinenum function</A>
<LI><A HREF="treecc.html#IDX72">yygetfilename macro</A>
<LI><A HREF="treecc.html#IDX73">yygetlinenum macro</A>
<LI><A HREF="treecc.html#IDX71">yyisa macro</A>
<LI><A HREF="treecc.html#IDX69">yykind macro</A>
<LI><A HREF="treecc.html#IDX70">yykindname macro</A>
<LI><A HREF="treecc.html#IDX79">yynodealloc function</A>
<LI><A HREF="treecc.html#IDX82">yynodeclear function</A>
<LI><A HREF="treecc.html#IDX83">yynodefailed function</A>
<LI><A HREF="treecc.html#IDX78">yynodeinit function</A>
<LI><A HREF="treecc.html#IDX81">yynodepop function</A>
<LI><A HREF="treecc.html#IDX80">yynodepush function</A>
<LI><A HREF="treecc.html#IDX74">yysetfilename macro</A>
<LI><A HREF="treecc.html#IDX75">yysetlinenum macro</A>
</DIR>
<P><HR><P>
This document was generated on 11 June 2002 using
<A HREF="http://wwwinfo.cern.ch/dis/texi2html/">texi2html</A> 1.56k.
</BODY>
</HTML>