Defining Sets

Syntax

Set Definition

Set Item

Details

Basics

Literal sets of characters are delimited using the square brackets '[' and ']' and pre-defined sets are delimited by the braces '{' and '}'. For instance, the text "[abcde]" denotes a set of characters consisting of the first five letters of the alphabet; while the text "{abc}" refers to a set named "abc".

A set is declared by adding (+) and subtracting (-) previously declared sets and literal sets. The GOLD Builder provides a collection of pre-defined sets that contain characters often used to define terminals.

Character Constants and Set Ranges

Each character in the Basic Multilingual Plane of the Unicode Character Set is represented by a 16-bit integer. This value is known as a "code point" in Unicode terminology. The characters from 0xD800 to 0xDBFF and from 0xFFF0 to 0xFFFF are reserved by Unicode for encoding. As a result, these values cannot be used.

The developer can specify any character using either its decimal or hexadecimal code point. Decimal values are denoted by a number-sign prefix (#) and hexadecimal values are denoted by an ampersand prefix (&).

Set ranges can be specified by using a ".." between two values. Both the start and end values can be in either decimal or hexadecimal.

Set Name      Description
{#n}          Using this notation, you can specify any character, in particular those not accessible via the keyboard. For instance, {#169} specifies the copyright character ©. The value of n can be any number from 1 to 55295 or from 56320 to 65519.
{&n}          This is the hexadecimal notation for a single character. The value of n can be any number from &1 to &D7FF or from &DC00 to &FFEF.
{#n .. #m}    Using this notation, you can specify a set containing the characters from n to m. The number sign denotes a decimal value.
{&n .. &m}    Set ranges can also be defined using hexadecimal values.
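
As a rough cross-check of the limits in the table above, the following Python sketch models which code points are usable. The names ALLOWED_RANGES and is_allowed are hypothetical and are not part of the GOLD Builder; the ranges are simply the ones stated above.

```python
# Inclusive code point ranges that the table above lists as usable.
# Hypothetical helper for illustration only; not part of GOLD.
ALLOWED_RANGES = [(0x1, 0xD7FF), (0xDC00, 0xFFEF)]


def is_allowed(code_point: int) -> bool:
    """Return True if code_point lies in one of the stated ranges."""
    return any(low <= code_point <= high for low, high in ALLOWED_RANGES)


# The decimal and hexadecimal limits in the table describe the same values.
assert (0xD7FF, 0xDC00, 0xFFEF) == (55295, 56320, 65519)

# {#169} and {&A9} both name the copyright character.
assert chr(169) == chr(0xA9) == "\u00a9"

assert is_allowed(169)          # inside the first range
assert not is_allowed(0xD800)   # inside the reserved block
assert not is_allowed(0xFFF0)   # above the last usable value
```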

Examples

Basic

Declaration                            Resulting Set
{Bracket} = [']']                      ]
{Quote} = ['']                         '
{Vowels} = [aeiou]                     aeiou
{Vowels 2} = {Vowels} + [y]            aeiouy
{Set 1} = [abc]                        abc
{Set 2} = {Set 1} + [12] - [c]         ab12
{Set 3} = {Set 2} + [0123456789]       ab0123456789
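
The resulting sets in the table above can be checked by modelling each GOLD set as a Python set of characters. This is only a sketch: it assumes that '+' behaves as set union and '-' as set difference, applied left to right, and the variable names are made up for the illustration.

```python
# Model each GOLD set as a Python set of single characters.
# Assumption: '+' is union and '-' is difference, applied left to right.
vowels = set("aeiou")                     # {Vowels} = [aeiou]
vowels_2 = vowels | set("y")              # {Vowels 2} = {Vowels} + [y]

set_1 = set("abc")                        # {Set 1} = [abc]
set_2 = (set_1 | set("12")) - set("c")    # {Set 2} = {Set 1} + [12] - [c]
set_3 = set_2 | set("0123456789")         # {Set 3} = {Set 2} + [0123456789]

assert vowels_2 == set("aeiouy")
assert set_2 == set("ab12")
# '1' and '2' are already in Set 2, so they appear only once in Set 3.
assert set_3 == set("ab0123456789")
```

Because a set holds each character at most once, adding the digits to {Set 2} does not duplicate the '1' and '2' it already contains.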

Set Ranges

Declaration                  Characters   Comments
Example1 = {#65}             A            This example specifies the character with the Unicode code point 65, the letter 'A'.
Example2 = {&41}             A            Hexadecimal value for 'A'.
Example3 = {#65 .. #70}      ABCDEF       This set range defines a set from the letter 'A' (#65) to 'F' (#70).
Example4 = {&41 .. &46}      ABCDEF       This is the same set range using hexadecimal values.
Example5 = {#65 .. &46}      ABCDEF       Decimal and hexadecimal notation can be mixed. This, however, can be confusing and is not recommended.
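
To make the range notation concrete, here is a minimal Python sketch that expands a character constant or set range into the characters it denotes. The functions parse_constant and expand_range are hypothetical helpers written for this illustration; they are not how the GOLD Builder is implemented, and they do not check the reserved code point ranges.

```python
def parse_constant(text: str) -> int:
    """Convert '#65' (decimal) or '&41' (hexadecimal) to a code point."""
    text = text.strip()
    if text.startswith("#"):
        return int(text[1:], 10)
    if text.startswith("&"):
        return int(text[1:], 16)
    raise ValueError(f"expected a '#' or '&' prefix: {text!r}")


def expand_range(spec: str) -> str:
    """Expand '{#n}', '{&n}' or '{a .. b}' into a string of characters."""
    spec = spec.strip()
    if not (spec.startswith("{") and spec.endswith("}")):
        raise ValueError(f"expected braces around the set: {spec!r}")
    start_text, separator, end_text = spec[1:-1].partition("..")
    start = parse_constant(start_text)
    # A single constant is treated as a range of length one.
    end = parse_constant(end_text) if separator else start
    return "".join(chr(code) for code in range(start, end + 1))


assert expand_range("{#65}") == "A"
assert expand_range("{&41}") == "A"
assert expand_range("{#65 .. #70}") == "ABCDEF"
assert expand_range("{&41 .. &46}") == "ABCDEF"
assert expand_range("{#65 .. &46}") == "ABCDEF"  # mixed notation
```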

Additional Examples

The following declares a set named "Hex Char" containing the characters that are valid in a hexadecimal number.

{Hex Char} = {Digit} + [ABCDEF]

The following declares a set containing the characters that can be placed inside a normal "string". In this case, the double quote is the delimiting character (which it is in most programming languages).

{String Char} = {Printable} - ["]