The Tower Bridge in Sacramento, California Future Work
Proposed Changes to GOLD Parsing System
Main
Latest News
Getting Started
Screen Shots
Download
Documentation
Contributors
Contact
About GOLD
How It Works
FAQ
Why Use GOLD?
Comparison
Revision History
Freeware License
More ...
Articles
What is a Parser?
Backus-Naur Form
DFA Lexer
LALR Parsing
Glossary
Links
More ...


This pages contains a number of proposed and possible changes to the GOLD Meta-Language. As ideas come in, I will update this page.

Previously, this page contained different notations for set ranges. This feature was subsequently added to the GOLD Builder v 2.6 after a discussion on the Yahoo Group.

 

Resolving Conflicts

Overview

Often, grammars can contain ambiguities that ultimately result in conflicts. When conflicts occur during the construction of the LALR tables, they either take the form of shift-reduce or reduce-reduce errors. Currently, shift-reduce conflicts are resolved by selecting the "shift" action.

DFA conflicts are resolved by selecting the terminal that has a fixed length. These are most likely reserved words used in the grammar and typically have the same format as identifiers.

Parameter Approach

One possible solution is to add a number of parameters that the developer can use to specify how conflicts will be resolved.

"Resolve LALR" = First declared

The exact name and values of each still needs to be decided. However, the obvious "possible" values would include 'First Declared' and 'Last Declared'.

Rank Approach

The other approach would be to resolve conflicts by assigning a "rank" to each terminal and rule definition. When the system analyzes the grammar and constructs the Deterministic Finite Automata and the LALR State Table, conflicts could be resolved by simply selecting the higher-ranked object.

<Stm> ::= IF <Exp> THEN <Stm>                #1
       |  IF <Exp> THEN <Stm> ELSE <Stm>     #2 

In the example above, the "If-Then-Else" rule has a higher rank. The system would then resolve the conflict by selecting it over the "If-Then" rule.   The notation used in the example is preliminary.

Whether #1 should be considered a higher rank than #2 or the other way around is still needed to be determined.

Compatibility

Component Changes
Meta-Language Grammars will be completely backward compatible.
Builder No Changes.
CGT File Format No Changes.
Engine No Changes.

Rule Tags

Overview

It might be a great aid to developers if additional information can be saved along with each rule. This information would be passive and would not affect how the parse tables are constructed. However, this information could be useful to the developer implement their compiler or interpreter.

These "tags" could be stored in the "Reserved" field of the Compiled Grammar Table file. This would result in a new version of the GOLD CGT file format - version 1.1. Current implementations of the Engine should be able to read the updated file, but would not have access to the field. It is currently thrown away since it contains no information.

Numeric Tags

If the tag contains a numeric value, the developer could use it to specify enumerated constants independent of those generated by the Builder.  The notation used in the example below is preliminary.

<Stm> ::= IF <Exp> THEN <Stms> END IF      %12%

Text Tags

Tags could also be stored as a string - giving more flexibility. In the example below, the developer has specified the text "If-Then".

<Stm> ::= IF <Exp> THEN <Stms> END IF      %If-Then%

However, there is the danger that tags could be used to to embed source code into the grammar. While this would follow the compiler-compiler paradigm, it would create language-dependent elements in grammars.

In either case, the information stored in the tag (and saved to the CGT), could be accessed through an object property or exported to the program template.

Compatibility

Component Changes
Meta-Language Grammars will be completely backward compatible.
Builder No Changes.
CGT File Format Compiled Grammar Table files should work all Engines. A few problems could result if the Engine reads the "reserved" field as a fixed number of bytes rather than using the entry's prefix byte.
Engine No Changes.

Includes & Namespacing

Overview

Suggested by Max Battcher. One possible change to the Meta-Language is adding "includes". Most programming languages contain the ability to add the source written in another module or file.

In C++, this is the #Include directive. Pascal and Delphi have a "uses" section.

Compatibility

Component Changes
Meta-Language Grammars will be completely backward compatible.
Builder No Changes.
CGT File Format No Changes.
Engine No Changes.

Regular Expressions - Minimum & Maximum

Overview

Suggested by Don. POSIX uses an extended version of regular expressions that allows developers to specify the minimum and maximum number occurrences of a character group.

DOS_NAME = {dos-char}[1,8] ( '.' {dos-char}[1,3] )?

The notation conflicts with GOLD's notation for set literals. Any notation used in the Meta-language will have to modified so no conflicts exist.

Compatibility

Component Changes
Meta-Language Grammars should be backwards compatible given the notation is chosen carefully.
Builder No Changes.
CGT File Format No Changes.
Engine No Changes.

Multiple Block Comments

Overview

Currently, most Engines use an internal "comment level" variable to determine when a the system is in "comment mode". When a block comment start terminal is read, the variable is incremented. While the variable is greater than or equal to one, the system will ignore all tokens and lexical errors. If a block comment end token is read, the variable is decremented by one.

This approach allows the system to easily handle the logic of block comments, but has one major drawback. Many programming languages, such as Pascal, have multiple block comments. In Pascal, the comment terminals must match. For instance, a (* must match a *)and a {  must match a }. Currently, GOLD cannot handle this logic.

Engine Changes

One approach would change how Engines handle block comments. Instead of using the single "comment level" variable, Engines would use an internal stack. Each time a Start Token is read, its symbol would be placed on a stack. If the stack has one or more items, the system would be in "comment mode".

When a End Token is read, the system would check if the top of the stack is the matching symbol. If it matches, the top of the stack would be discarded. If not, a "block comment mismatch" error must be reported.

CGT File Format Changes

The solution would require a modification of the CGT File Format. The Symbol Record in the Compiled Grammar Table file would require an additional field that contains the matching token for block comments. However, some problems can occur when the file is loaded.

Meta-language Changes

The changes to the Meta-language will be minor. The notation can be modified to to handle multiple block comments by adding a number to the Comment Start / End definition. The following is an example for Pascal:

Comment Start   = '{'
Comment End     = '}'

Comment 2 Start = '(*'
Comment 2 End   = '*)'

Compatibility

Component Changes
Meta-Language Grammars will be backwards-compatible.
Builder The Builder will have to have to allow the developers to save the parse tables in Version 1.1 (current) or Version 2.
CGT File Format The CGT file should be backwards compatible to existing Engines. However, some problems can occur when the file is loaded. This is due to the new field in the Symbol Record.
Engine To use this new feature, each Engine will have be modified to use a stack rather than a single "comment level" variable.