In GOLD, lexical groups are used for situations where a number of recognized tokens should be organized into a single "group". This mechanism is most commonly used to handle line and block comments. However, it is not limited to "noise", but can be used for any content.
One of the key principles in programming languages is the ability to incorporate comments and other documentation directly in the source code. Whether it is FORTRAN, COBOL or C++, the ability exists, but in varying forms. Essentially, there are two different types of comments used in programming languages: those that tell the compiler to ignore the remaining text in the current line of code, and those that use separate symbols to denote the start and end of a multi-line comment.
To accommodate the intricacies of comments, the GOLD Parser Builder provides a special class of terminals called "groups". These allow you to define comments with little typing.
Any time a symbol is defined whose name ends with Start, End, or Line, a lexical group will be created. Each group acts as a container for data, such as block comments or any other group the user may need. Once a group is read, the system will treat it as a single token: the group's "container".
All groups have a start and end symbol. In the case of the following definition, both are quite obvious.
Comment Start = '/*'
Comment End = '*/'
This will create a normal "block" group with the start symbol '/*' and end symbol '*/'. For this block, the system will create a group called "Comment Block". The start and end definitions actually refer to symbol names as opposed to regular expressions. This allows the same symbol to be used to end multiple blocks.
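To make the idea of a group being read as a single token concrete, here is a minimal sketch in Python. It is not GOLD's implementation or API: the function names `read_block_group` and `tokenize`, the "Word" token kind, and the error handling are all assumptions made for illustration. It only shows a scanner that, on seeing the start symbol, consumes everything up to the end symbol and emits one token named after the container.

```python
def read_block_group(text, pos, start="/*", end="*/", container="Comment"):
    """Return ((container, lexeme), new_pos) if a group starts at pos, else None.

    Hypothetical helper: the whole span from the start symbol through the
    end symbol becomes one token carrying the container's name.
    """
    if not text.startswith(start, pos):
        return None
    close = text.find(end, pos + len(start))
    if close == -1:
        raise ValueError("group started but never ended")
    stop = close + len(end)
    return (container, text[pos:stop]), stop


def tokenize(text):
    """Split text into 'Word' tokens and single 'Comment' group tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        group = read_block_group(text, pos)
        if group is not None:
            token, pos = group
            tokens.append(token)
        elif text[pos].isspace():
            pos += 1
        else:
            # Read a run of non-space characters, stopping early if a
            # group start symbol appears in the middle of the run.
            stop = pos
            while (stop < len(text)
                   and not text[stop].isspace()
                   and not text.startswith("/*", stop)):
                stop += 1
            tokens.append(("Word", text[pos:stop]))
            pos = stop
    return tokens


print(tokenize("a /* b c */ d"))
```

Everything between '/*' and '*/' arrives as one ("Comment", ...) token, no matter how many words it contains. The sketch does not model nested groups.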
Many programming languages have constructs, such as line comments, that read an entire line and end at the newline. The same group mechanism is used for block comments and line comments. However, a few things happen "behind the scenes" to make this possible. The following definition creates a line comment:
Comment Line = '//'
Like the block comment above, this definition has a start and an end symbol. The difference here is that the start symbol is '//' and the end symbol is implied to be the newline. GOLD will automatically define 'newline' if it is needed and adjust the definition for "whitespace" as needed. The following is the logic used:
Below is a comparison of comment terminals in several common programming languages. Blank fields denote that the programming language lacks a terminal of that type. For instance, Visual Basic does not provide block comments.
|Programming Language||Line Comment||Comment Start||Comment End|
|C / C++||//||/*||*/|
Comments are not the only type of group available to developers. When a comment group is created, the container "comment" is automatically declared to be "noise" and will be ignored. This should be no surprise: comments are generally discarded by parsers. However, you can create other groups, which can likewise be ignored or used directly in your grammar. The following chart shows what is created from group definitions. There are some examples available of when, and how, you can use them.
GOLD uses the first part of the definition, the 'name', to create the group name and container. Note: line-based groups and block groups of the same name share the same container.
|Definition||Generated Group Name||Generated Container|
|name Start||name Block||name|
|name Line||name Line||name|
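The naming rule in the chart can be sketched as a small Python function. This is only an illustration of the rule as described here, not code from GOLD: `derive_groups` is a hypothetical name, and the sketch assumes the suffix is the last space-separated word of the symbol name.

```python
def derive_groups(symbol_names):
    """Map each generated group name to its container name.

    Assumption: a symbol named '<name> Start' or '<name> End' contributes
    to the group '<name> Block', and '<name> Line' creates the group
    '<name> Line'. Both groups use the container '<name>'.
    """
    groups = {}
    for symbol in symbol_names:
        name, _, suffix = symbol.rpartition(" ")
        if not name:
            continue  # a bare word has no 'name' part, so no group
        if suffix in ("Start", "End"):
            groups[f"{name} Block"] = name
        elif suffix == "Line":
            groups[f"{name} Line"] = name
    return groups


print(derive_groups(["Comment Start", "Comment End", "Comment Line"]))
```

With the three comment definitions from this page, both "Comment Block" and "Comment Line" map to the single container "Comment", which is why a line comment and a block comment show up as the same kind of token.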