GOLD Parsing System
System Changes in Version 5


Overview

Version 5.0 of the Builder changes some of the algorithms and interactions used by the internal parsing system. This was done to address shortcomings of the previous design, in particular the size (and loading speed) of stored character sets and the way block comments are treated. As a result, there is a new file type for storing parse tables, and there are new "Version 5" Engines that can handle the new features. The GOLD Builder application can still save the generated parse data in the old file format, but tables saved that way won't have access to some of the newer features.

Version 5.0 introduces "lexical groups". This system completely replaces the dedicated "Comment Level" logic of the original version. Groups are designed to be flexible and allow the user to specify both "noise" and "content" groups.

For information about different types of block comments, please follow the link below:

Block Comment Features

Changes

Grammars

With the addition of lexical groups, grammars were modified slightly. In practically all cases, older grammars can be used without any changes. For more information, please see the grammar documentation.

Grammar Documentation

File Changes

The Builder is able to save parse tables in either the original "1.0" format or the new "5.0" format. The two formats are nearly identical, so an Engine can implement a generic reader that automatically supports either version. However, newer Engines would do best to simply support the new format.
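As a rough illustration of such a generic reader, the sketch below reads a header string from the start of a table file and chooses a version from it. The header texts, the UTF-16 little-endian encoding, and the names `KNOWN_HEADERS`, `read_header`, and `detect_version` are all assumptions made for this example, so check the file format documentation before relying on any of them.

```python
import io

# Assumed header strings; verify against the file format documentation.
KNOWN_HEADERS = {
    "GOLD Parser Tables/v1.0": "1.0",
    "GOLD Parser Tables/v5.0": "5.0",
}

def read_header(stream):
    """Read a null-terminated string of 16-bit code units (assumed UTF-16LE)."""
    units = []
    while True:
        pair = stream.read(2)
        if len(pair) < 2:
            raise ValueError("unexpected end of file in header")
        if pair == b"\x00\x00":
            break
        units.append(pair)
    return b"".join(units).decode("utf-16-le")

def detect_version(stream):
    """Return "1.0" or "5.0" depending on the header, or raise ValueError."""
    header = read_header(stream)
    if header not in KNOWN_HEADERS:
        raise ValueError(f"unrecognized table header: {header!r}")
    return KNOWN_HEADERS[header]
```

An Engine could call `detect_version` once after opening the file and then hand the stream to the matching record reader.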

The 5.0 format adds several new records. Many of them replace their 1.0 counterparts.

  • New Property record. The old 1.0 CGT file contained a single record with information about the grammar. This was not easily expandable. The new 5.0 format contains a separate record for each property. Each record contains an index, the property name, and its associated value. The idea is to allow additional information to be added in the future. This may include more information about the grammar and/or user-defined metadata.
  • Group records. Each record stores data about one group, including its index, name, and attributes.
  • New Character Set record. This was designed for efficiency, both to decrease load times and to decrease file sizes. In 1.0, all character sets are stored as null-terminated Unicode strings. While this works well for small sets, sets consisting of hundreds (or thousands) of characters must be stored one character at a time. The new record instead uses a series of ranges.
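To show why ranges are more compact than storing every character, here is a minimal sketch that collapses a character set into inclusive code point ranges and tests membership with a binary search. The function names `to_ranges` and `contains` are hypothetical, and the sketch does not reproduce the actual record layout used by the 5.0 format.

```python
from bisect import bisect_right

def to_ranges(chars):
    """Collapse characters into sorted, inclusive (start, end) code point ranges."""
    ranges = []
    for code in sorted({ord(c) for c in chars}):
        if ranges and code == ranges[-1][1] + 1:
            # Extend the current range by one code point.
            ranges[-1] = (ranges[-1][0], code)
        else:
            # Start a new range.
            ranges.append((code, code))
    return ranges

def contains(ranges, ch):
    """Check membership by locating the last range that starts at or before ch."""
    code = ord(ch)
    i = bisect_right(ranges, (code, float("inf"))) - 1
    return i >= 0 and ranges[i][0] <= code <= ranges[i][1]

# Example: 26 uppercase letters plus 10 digits collapse into two ranges.
alnum = [chr(c) for c in range(0x41, 0x5B)] + [chr(c) for c in range(0x30, 0x3A)]
alnum_ranges = to_ranges(alnum)
```

With this representation, a set covering thousands of consecutive characters costs a single pair of numbers instead of one entry per character.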

The new file type is called an "Enhanced Grammar Table" file (.egt) rather than the older Compiled Grammar Table file (.cgt).

Engine

Engines must be updated to take advantage of the new semantics. However, the Builder will also allow tables to be saved using the old format. As a result, no existing Engine will be "broken" or abandoned.

Currently, most Engines use an internal "comment level" variable to determine when the system is in "comment mode". When a block comment start terminal is read, the variable is incremented. While the variable is greater than or equal to one, the system ignores all tokens and lexical errors. When a block comment end token is read, the variable is decremented by one.
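The comment level logic described above can be sketched roughly as follows. The token kinds (`"comment_start"`, `"comment_end"`, `"error"`) and the function name `filter_comments` are hypothetical and not taken from any particular Engine. Passing a stray comment end token through when the level is already zero is also an assumption of this sketch.

```python
def filter_comments(tokens):
    """Drop everything read while the comment level is one or more.

    `tokens` is an iterable of (kind, text) pairs; the kinds are hypothetical.
    """
    comment_level = 0
    kept = []
    for kind, text in tokens:
        if kind == "comment_start":
            comment_level += 1          # entering (or nesting deeper into) a comment
        elif kind == "comment_end" and comment_level > 0:
            comment_level -= 1          # leaving one level of comment
        elif comment_level == 0:
            kept.append((kind, text))   # outside comment mode: pass the token on
        # Otherwise we are in comment mode: tokens and lexical errors are ignored.
    return kept

stream = [
    ("id", "a"),
    ("comment_start", "/*"),
    ("error", "#"),
    ("comment_start", "/*"),
    ("id", "x"),
    ("comment_end", "*/"),
    ("id", "y"),
    ("comment_end", "*/"),
    ("id", "b"),
]
visible = filter_comments(stream)
```

Because the variable is a counter rather than a flag, nested block comments are handled naturally: the system only leaves comment mode when every start has been matched by an end.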

Program Templates

Program templates were modified to include a ##GROUPS ... ##END-GROUPS directive. More information will be available soon.