Terminal Attributes

Overview

Symbol Attributes

Attribute    Valid Values
Type         Content / Noise
Source       Lexer / Virtual

Details

Type

Terminal attributes allow you to control how a symbol is viewed by the parser. The Type attribute determines whether the symbol is merely noise or an essential part of the grammar. Noise symbols are ignored by the parser. Normally, the symbols 'Whitespace' and 'Comment' are automatically defined as 'Noise'. Any other terminal defaults to 'Content'.

Source

This controls whether GOLD will generate DFA states to recognize the terminal. In practically all cases it will, and the terminal will be produced by the lexer. However, in rare circumstances, the developer may want to create the terminal manually at runtime. The alternative value, Virtual, places the symbol into the Symbol Table but does not create any DFA states for it.

Languages such as Python do not use symbols to mark the start and end of a block of statements. Instead, the indentation of each statement determines where a block begins and ends. For Python, the content of whitespace is important, or at least the position of a token matters rather than solely its classification by the lexer. If a program has an indent of 10 spaces, the grammar must contain a set of rules for statements at this level. The same is true for every other level of indentation, so an infinite number of rules would be required to parse the language.

Virtual terminals are designed to resolve problems like this by allowing the developer to define terminals that will be created manually. Each of these terminals is entered into the Symbol Table, but will not be recognized by the Engine's lexer. In other words, these terminals exist only in the Symbol Table and will never be created by the lexer's Deterministic Finite Automaton.

As a result, the developer must create these terminals at runtime and push them onto the front of the input queue. The actual semantics and method calls will vary greatly between different implementations of the Engine. In the case of Python, tokens could be created to represent an increase or decrease in indentation. Of course, the developer can create special tokens for any number of other reasons as well.
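The indentation case can be sketched in a few lines of Python. This is only an illustration of the idea, not the API of any GOLD Engine: the function indent_tokens, the (symbol, text) tuple shape, and the "Line" symbol are hypothetical, and a real implementation would push the generated tokens onto the Engine's input queue using whatever calls that Engine provides. The sketch keeps a stack of open indentation widths and emits the virtual terminals IndentIncrease and IndentDecrease whenever the width changes.

```python
from typing import Iterator, List, Tuple

# Hypothetical token shape for this sketch: (symbol name, matched text).
Token = Tuple[str, str]


def indent_tokens(lines: List[str]) -> Iterator[Token]:
    """Yield virtual indentation tokens interleaved with one token per line."""
    stack = [0]  # indentation widths of the blocks that are currently open
    for line in lines:
        stripped = line.lstrip(" ")
        if not stripped.strip():
            continue  # blank lines do not change the block structure
        width = len(line) - len(stripped)
        if width > stack[-1]:
            # Deeper than the current block: one new block opens.
            stack.append(width)
            yield ("IndentIncrease", "")
        else:
            # Shallower: close every block that is deeper than this line.
            while width < stack[-1]:
                stack.pop()
                yield ("IndentDecrease", "")
            if width != stack[-1]:
                raise ValueError(f"inconsistent indentation: {line!r}")
        # Stand-in for the ordinary tokens the lexer would produce for the line.
        yield ("Line", stripped.rstrip())
    # Close any blocks that are still open at the end of the input.
    while len(stack) > 1:
        stack.pop()
        yield ("IndentDecrease", "")


if __name__ == "__main__":
    source = ["if x", "    a", "    b", "c"]
    for token in indent_tokens(source):
        print(token)
```

Because the block structure is carried by two terminals rather than by the width of the whitespace, the grammar needs only one set of rules for statements, regardless of how deeply they are nested.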

This attribute is redundant with the old "Virtual Terminals" grammar property. Both can be used.

Examples

Virtual

The following example sets the "source" attribute of IndentIncrease and IndentDecrease to "virtual".

IndentIncrease @= { Source = Virtual }
IndentDecrease @= { Source = Virtual }

These "virtual" terminals can be used in the grammar, but must be created by the developer at parse-time.

<Statement> ::= If <Exp> then IndentIncrease <Statements> IndentDecrease

Noise Terminal

The following example creates a noise terminal called IgnoreMe. It consists of two dashes followed by one or more letters. Since it is set to "noise", it is treated the same as whitespace and ignored by the parser.

IgnoreMe  = '--' {Letter}+

IgnoreMe @= { Type = Noise }
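The effect of a noise terminal can be imitated with a small Python sketch. Everything here is an assumption made for illustration: the TOKEN_SPEC table, the Identifier terminal, the lex and content_tokens functions, and the use of [A-Za-z] as a stand-in for {Letter} are not part of GOLD. The sketch also tries patterns in order instead of building a DFA as GOLD does. It shows only the division of labour: the lexer recognizes IgnoreMe like any other terminal, and the tokens marked as noise are dropped before the parser sees them.

```python
import re
from typing import List, Tuple

# Hypothetical terminal table: (symbol name, pattern, is_noise).
TOKEN_SPEC = [
    ("IgnoreMe", re.compile(r"--[A-Za-z]+"), True),
    ("Whitespace", re.compile(r"\s+"), True),
    ("Identifier", re.compile(r"[A-Za-z]+"), False),
]

# (symbol name, matched text, is_noise)
Token = Tuple[str, str, bool]


def lex(text: str) -> List[Token]:
    """Recognize every terminal, noise included, using the first matching pattern."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(text):
        for name, pattern, is_noise in TOKEN_SPEC:
            match = pattern.match(text, pos)
            if match:
                tokens.append((name, match.group(), is_noise))
                pos = match.end()
                break
        else:
            raise ValueError(f"unexpected character at position {pos}")
    return tokens


def content_tokens(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Keep only the tokens a parser would act on; noise is discarded."""
    return [(name, text) for name, text, is_noise in tokens if not is_noise]


if __name__ == "__main__":
    print(content_tokens(lex("abc --skip def")))
```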