Defining Rules


Rule Definition




Most of the grammar you write will specify the syntactic structure of the language. When an input string (such as the user's program) is parsed, it is stored in a tree that follows this syntactic structure. There are several ways to specify the structure of a grammar, and different parsing systems use different notations and formats.

Regardless of the parser, practically all use a variation of a notation known as Backus-Naur Form (or BNF for short). GOLD uses BNF to describe the syntax of the grammar and attempts to stay close to the original notation.

Terminals and Nonterminals

Backus-Naur Form consists of two different types of symbols: terminals and nonterminals. Terminals represent the pieces of text that make up a valid input string (such as a user's program). When an input string is parsed, terminals will be the "leaves" of the parse tree. Nonterminals represent other syntactic structures defined in the grammar. These will be the interior nodes of the parse tree.

Terminals are left without special formatting or are delimited by single quotes. Examples include: if, while, '=' and identifier. Typically, nonterminals are delimited by angle-brackets. Examples include <statement> and <exp>. Both terminals and nonterminals are referred to generically as "symbols".

When text is read by the Builder, all characters delimited by single quotes are analyzed as literal strings. Any text delimited by single quotes is taken exactly as written. This allows you to specify characters that the notation would otherwise reserve. For instance, when defining a rule, angle brackets are used to delimit nonterminals. By typing '<' and '>', you can specify these two characters without worrying about the system misinterpreting them. A single quote character can be specified by typing two single quotes ''.
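The quoting convention above can be illustrated with a small Python sketch. This is not the Builder's actual implementation; `split_symbols` is a hypothetical helper, and it assumes that '' appears only as a standalone token standing for one single quote character.

```python
def split_symbols(handle):
    """Split a handle string into (kind, text) pairs.

    Hypothetical helper for illustration only; not part of GOLD.
    """
    symbols = []
    i, n = 0, len(handle)
    while i < n:
        ch = handle[i]
        if ch.isspace():
            i += 1
        elif ch == "'":
            # Quoted literal: everything up to the next quote is taken as-is.
            end = handle.find("'", i + 1)
            if end == -1:
                raise ValueError("unterminated literal")
            text = handle[i + 1:end]
            # Assumption: an empty pair '' stands for one single quote.
            symbols.append(("terminal", text if text else "'"))
            i = end + 1
        elif ch == "<":
            # Nonterminal: the name runs to the closing angle bracket.
            end = handle.find(">", i + 1)
            if end == -1:
                raise ValueError("unterminated nonterminal")
            symbols.append(("nonterminal", handle[i + 1:end]))
            i = end + 1
        else:
            # Bare terminal: runs until whitespace, a quote, or '<'.
            start = i
            while i < n and not handle[i].isspace() and handle[i] not in "'<":
                i += 1
            symbols.append(("terminal", handle[start:i]))
    return symbols


symbols = split_symbols("if <Expression> then '<' '' end")
```

In this sketch the quoted '<' comes back as a terminal rather than being read as the start of a nonterminal, which is the point of the quoting rule.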


The syntax of the grammar is defined using a series of "productions". Each production has a single nonterminal called the "head". The head is defined as a sequence of symbols, which make up the production's "handle".

For instance, the following production defines a generic If-Statement syntax. The symbols 'if', 'then', and 'end' are terminals. The symbols <Expression> and <Statements> are nonterminals. The nonterminal before the ::= symbol is the "head". The terminals and nonterminals after the ::= symbol form the "handle". Together, the head and handle are called a "production".

<Statement> ::= if <Expression> then <Statements> end

The production above defines a <Statement> as an 'if' terminal, an <Expression> defined elsewhere, a 'then' terminal, <Statements> defined elsewhere, and, finally, an 'end' terminal.


If you are declaring a series of productions that derive the same nonterminal (they have the same head), you can use pipe characters '|' to create a series of different handles. The pipe symbol basically means "or". So, in the example below, <Statement> is defined using three different handles. It can be equivalent to any of these.

<Statement> ::= if <Expression> then <Statements> end
             |  while <Expression> do <Statements> end
             |  for Id = <Range> do <Statements> end

Internally, the notation above will create three different productions:

<Statement> ::= if <Expression> then <Statements> end
<Statement> ::= while <Expression> do <Statements> end
<Statement> ::= for Id = <Range> do <Statements> end

However, to prevent typos, GOLD permits only a single definition for each nonterminal, so all of its handles must be listed together, separated by pipes. In GOLD terminology, a series of related productions with the same head is called a "Rule".
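The expansion of one piped definition into separate productions can be sketched in Python. This is only an illustration, not GOLD's internal code; `expand_rule` is a hypothetical name, and the naive split on '|' assumes no handle contains a quoted pipe literal.

```python
def expand_rule(rule_text):
    """Expand one piped rule definition into (head, handle) productions.

    Hypothetical helper; assumes '|' never appears inside a quoted literal.
    """
    head, _, body = rule_text.partition("::=")
    head = head.strip()
    productions = []
    for handle in body.split("|"):
        # Collapse the layout whitespace used to align the handles.
        productions.append((head, " ".join(handle.split())))
    return productions


rule = """
<Statement> ::= if <Expression> then <Statements> end
             |  while <Expression> do <Statements> end
             |  for Id = <Range> do <Statements> end
"""
productions = expand_rule(rule)
```

Each entry in `productions` pairs the shared head with one handle, matching the three productions listed above.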

Nullable Rules

A rule can also be declared as "nullable". This means that the nonterminal the rule represents can contain zero terminals. For all intents and purposes, the nonterminal can be seen as optional. In grammars, nullable rules are often used for optional clauses on statements or for lists that contain zero or more items.

A null production (and hence a nullable rule) is declared by simply creating a production that contains no symbols.

<Optional Keyword> ::= Keyword
                    |

The second handle in the definition contains no symbols. The rule is therefore nullable. GOLD version 5 also permits an alternative notation. The text <>, used by itself, will declare a null production.

<Optional Keyword> ::= Keyword
                    |  <>        
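Which nonterminals are nullable can be worked out with a simple fixed-point iteration. The Python sketch below shows one common way to do this; it is not taken from GOLD, and `nullable_nonterminals` as well as the <Declaration> and <Modifiers> rules are hypothetical names used only for the example.

```python
def nullable_nonterminals(productions):
    """Return the set of heads that can derive zero terminals.

    productions: list of (head, handle) where handle is a list of symbols.
    """
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, handle in productions:
            # An empty handle is trivially nullable (all() of nothing is
            # True); otherwise every symbol must itself be nullable.
            if head not in nullable and all(s in nullable for s in handle):
                nullable.add(head)
                changed = True
    return nullable


productions = [
    ("<Optional Keyword>", ["Keyword"]),
    ("<Optional Keyword>", []),  # the null production
    # Hypothetical rules, added only to show how nullability propagates:
    ("<Declaration>", ["<Optional Keyword>", "Identifier"]),
    ("<Modifiers>", ["<Optional Keyword>", "<Optional Keyword>"]),
]
nullable = nullable_nonterminals(productions)
```

Here <Modifiers> ends up nullable because every symbol in its handle is nullable, while <Declaration> does not because it always contains an Identifier terminal.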



Fall-Through Logic

The following defines a rule called <Value>, which can contain either an Identifier terminal or the contents of another rule called <Literal>.

<Value> ::= Identifier
         |  <Literal>

<Literal> ::= Number
           |  String

The <Literal> rule can contain either a Number or a String terminal. As a result of this definition, a <Value> can contain an Identifier, a Number, or a String.
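The fall-through effect can be checked mechanically by following single-symbol handles from one rule to the next. The Python sketch below is only an illustration; the dictionary layout and the name `fall_through_terminals` are assumptions made for this example.

```python
def fall_through_terminals(grammar, start):
    """Collect terminals reachable from `start` via single-symbol handles.

    grammar: dict mapping each nonterminal to a list of handles,
    where each handle is a list of symbols.
    """
    found, seen, pending = set(), set(), [start]
    while pending:
        nonterminal = pending.pop()
        if nonterminal in seen:
            continue
        seen.add(nonterminal)
        for handle in grammar[nonterminal]:
            if len(handle) != 1:
                continue  # only single-symbol handles fall through
            symbol = handle[0]
            if symbol in grammar:
                pending.append(symbol)  # another rule: keep following
            else:
                found.add(symbol)  # a terminal
    return found


grammar = {
    "<Value>": [["Identifier"], ["<Literal>"]],
    "<Literal>": [["Number"], ["String"]],
}
terminals = fall_through_terminals(grammar, "<Value>")
```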

Nullable Clause

The three clauses in the C-style For Statement are optional. The <Opt Exp> rule defines an optional expression.

<For Statement> ::= for '(' <Opt Exp> ';' <Opt Exp> ';' <Opt Exp> ')' <Statements>

<Opt Exp> ::= <Expression>
           |


The following rule, made up of two productions, defines a comma-delimited list of Identifiers.

<List> ::= <List> ',' Identifier
         | Identifier
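Because the first production refers to <List> on its left side, each additional item wraps the list parsed so far. The Python sketch below builds the tree shape this grammar implies for a sequence of identifiers; the tuple layout and the name `build_list_tree` are hypothetical, chosen only for illustration.

```python
def build_list_tree(identifiers):
    """Build the left-nested parse tree implied by the <List> rule.

    Requires at least one identifier, since the rule is not nullable.
    """
    if not identifiers:
        raise ValueError("<List> needs at least one Identifier")
    # <List> ::= Identifier
    tree = ("<List>", identifiers[0])
    for name in identifiers[1:]:
        # <List> ::= <List> ',' Identifier
        tree = ("<List>", tree, ",", name)
    return tree


tree = build_list_tree(["a", "b", "c"])
```

The first identifier sits deepest in the tree and the last one sits at the top, which is the usual result of a left-recursive list rule.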

Operator Precedence

Operator precedence is an important aspect of most programming languages. The following rules define the common arithmetic operators.

<Expression> ::= <Expression> '+' <Mult Exp>
               | <Expression> '-' <Mult Exp>
               | <Mult Exp>

<Mult Exp>   ::= <Mult Exp> '*' <Negate Exp>
               | <Mult Exp> '/' <Negate Exp>
               | <Negate Exp>

<Negate Exp> ::= '-' <Value>
               | <Value>

<Value>      ::= ID
               | Integer
               | '(' <Expression> ')'
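To see how the layering of these rules produces precedence, the following Python sketch parses expressions by hand with one function per rule. It is not how GOLD parses: this is a recursive-descent sketch in which loops stand in for the left-recursive productions, and all class and function names are hypothetical.

```python
import re

# One token per match: an integer, an identifier, or a single operator.
TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def expression(self):
        # <Expression>: lowest precedence, groups to the left.
        node = self.mult_exp()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.mult_exp())
        return node

    def mult_exp(self):
        # <Mult Exp>: binds tighter than '+' and '-'.
        node = self.negate_exp()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.negate_exp())
        return node

    def negate_exp(self):
        # <Negate Exp>: an optional single '-' in front of a <Value>.
        if self.peek() == "-":
            self.take()
            return ("neg", self.value())
        return self.value()

    def value(self):
        # <Value>: ID, Integer, or a parenthesized <Expression>.
        token = self.take()
        if token == "(":
            node = self.expression()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token in ("+", "-", "*", "/", ")"):
            raise SyntaxError(f"unexpected token {token!r}")
        return token


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError("unexpected trailing input")
    return tree


tree = parse("1 + 2 * 3")
```

Because <Mult Exp> sits below <Expression> in the grammar, the multiplication is grouped first and becomes the right operand of the addition. Rules further from the start symbol bind more tightly.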
