Literal sets of characters are delimited by the square brackets '[' and ']', and named sets (pre-defined or previously declared) are delimited by the braces '{' and '}'. For instance, the text "[abcde]" denotes a set of characters consisting of the first five letters of the alphabet, while the text "{abc}" refers to a set named "abc".
Sets can then be declared by adding and subtracting previously declared sets and literal sets. The GOLD Builder provides a collection of pre-defined sets that contain characters often used to define terminals.
Each character in the Basic Multilingual Plane of the Unicode Character Set is represented by a 16-bit integer. This value is known as a "code point" in Unicode terminology. The values from 0xD800 to 0xDBFF and from 0xFFF0 to 0xFFFF are reserved by Unicode for encoding purposes. As a result, these values cannot be used in set declarations.
The developer can specify any character using either its decimal or hexadecimal code point. Decimal values are denoted by a number-sign prefix (#) and hexadecimal values are denoted by an ampersand (&).
Set ranges can be specified by using a ".." between two values. Both the start and end values can be in either decimal or hexadecimal.
Set Name | Description |
---|---|
{#n} | Using this notation, you can specify any character, in particular those not accessible via the keyboard. For instance, {#169} specifies the copyright character ©. The value of n can be any number from 1 to 55295 or from 56320 to 65519. |
{&n} | This is the hexadecimal notation for a single character. The value of n can be any number from &1 to &D7FF or from &DC00 to &FFEF. |
{#n .. #m} | Using this notation, you can specify a set containing the characters from n to m. The number-sign denotes a decimal value. |
{&n .. &m} | Set ranges can also be defined using hexadecimal values. |
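
The limits in the table above can be checked mechanically. The following Python sketch is only an illustration of the stated ranges, not part of the GOLD Builder; the function name `is_allowed_code_point` is hypothetical.

```python
def is_allowed_code_point(n: int) -> bool:
    """Return True if n lies in one of the two ranges the table permits.

    Decimal 1..55295 is &1..&D7FF, and decimal 56320..65519 is &DC00..&FFEF.
    """
    return 1 <= n <= 0xD7FF or 0xDC00 <= n <= 0xFFEF


# The decimal and hexadecimal limits in the table describe the same values.
assert 0xD7FF == 55295 and 0xDC00 == 56320 and 0xFFEF == 65519

print(is_allowed_code_point(169))     # True: {#169} is the copyright character
print(is_allowed_code_point(0xD800))  # False: inside the first reserved range
print(is_allowed_code_point(0xFFF0))  # False: inside the second reserved range
```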
Declaration | Resulting Set |
---|---|
{Bracket} = [']'] | ] |
{Quote} = [''] | ' |
{Vowels} = [aeiou] | aeiou |
{Vowels 2} = {Vowels} + [y] | aeiouy |
{Set 1} = [abc] | abc |
{Set 2} = {Set 1} + [12] - [c] | ab12 |
{Set 3} = {Set 2} + [0123456789] | ab0123456789 |
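
The declarations in the table above behave like ordinary set union and difference. As a rough model, assuming that '+' and '-' are applied from left to right (which matches the "ab12" result for {Set 2}), the same sets can be built with Python's `frozenset`. The helper names `literal` and `show` are hypothetical and exist only for this sketch.

```python
def literal(chars: str) -> frozenset:
    """Model a literal set such as [abc]: each character becomes a member."""
    return frozenset(chars)


def show(chars: frozenset) -> str:
    """Render a set as a string, sorted by code point for a stable result."""
    return "".join(sorted(chars))


vowels = literal("aeiou")                        # {Vowels} = [aeiou]
vowels_2 = vowels | literal("y")                 # {Vowels 2} = {Vowels} + [y]
set_1 = literal("abc")                           # {Set 1} = [abc]
set_2 = (set_1 | literal("12")) - literal("c")   # {Set 2} = {Set 1} + [12] - [c]
set_3 = set_2 | literal("0123456789")            # {Set 3} = {Set 2} + [0123456789]

print(show(vowels_2))  # aeiouy
print(show(set_2))     # 12ab
print(show(set_3))     # 0123456789ab
```

The printed order differs from the table only because `show` sorts by code point, where digits precede letters; the members are the same.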
Declaration | Characters | Comments |
---|---|---|
Example1 = {#65} | A | This example specifies the character with the Unicode code point 65, the letter 'A'. |
Example2 = {&41} | A | Hexadecimal value for 'A'. |
Example3 = {#65 .. #70} | ABCDEF | This set range defines a set from the letter 'A' (#65) to 'F' (#70). |
Example4 = {&41 .. &46} | ABCDEF | This is the same set range using the hexadecimal values. |
Example5 = {#65 .. &46} | ABCDEF | Both decimal and hexadecimal notation can be mixed. This, however, can be confusing and it is not recommended. |
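
To make the notation in the table above concrete, the following Python sketch expands a single value or a ".." range into its characters. It is an informal reading of the notation, not the GOLD Builder's parser; the names `parse_value` and `expand` are hypothetical, and the validity limits on code points are not enforced here.

```python
import re

# '#' introduces a decimal value, '&' introduces a hexadecimal value.
_VALUE = re.compile(r"#([0-9]+)|&([0-9A-Fa-f]+)")


def parse_value(token: str) -> int:
    """Convert '#65' or '&41' into an integer code point."""
    match = _VALUE.fullmatch(token.strip())
    if match is None:
        raise ValueError(f"not a decimal or hexadecimal value: {token!r}")
    if match.group(1) is not None:
        return int(match.group(1))
    return int(match.group(2), 16)


def expand(text: str) -> str:
    """Expand '{#n}', '{&n}', or a '{start .. end}' range into characters."""
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError(f"expected braces around the value: {text!r}")
    start_text, separator, end_text = text[1:-1].partition("..")
    start = parse_value(start_text)
    end = parse_value(end_text) if separator else start
    if end < start:
        raise ValueError("range end is below range start")
    return "".join(chr(code) for code in range(start, end + 1))


print(expand("{#65}"))         # A
print(expand("{&41 .. &46}"))  # ABCDEF
print(expand("{#65 .. &46}"))  # ABCDEF (mixed notation)
```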
The following declares a set named "Hex Char" containing the characters that are valid in a hexadecimal number.
{Hex Char} = {Digit} + [ABCDEF] |
The following declares a set containing the characters that can be placed inside a normal "string". In this case, the double quote is the delimiting character (which it is in most programming languages).
{String Char} = {Printable} - ["] |
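
The two declarations above can be modeled in Python to see which characters end up in each set. This is a sketch under stated assumptions: {Digit} is taken to be the characters 0 through 9, and {Printable} is approximated as ASCII 32 through 126. The actual pre-defined sets in the GOLD Builder may contain other characters.

```python
# Assumed stand-ins for the pre-defined sets (not the Builder's definitions).
digit = frozenset("0123456789")
printable = frozenset(chr(code) for code in range(32, 127))

# {Hex Char} = {Digit} + [ABCDEF]
hex_char = digit | frozenset("ABCDEF")

# {String Char} = {Printable} - ["]
string_char = printable - frozenset('"')

print("".join(sorted(hex_char)))  # 0123456789ABCDEF
print('"' in string_char)         # False: the delimiter is excluded
print("f" in hex_char)            # False: only uppercase letters were added
```

As the last line shows, the declaration as written admits only uppercase hexadecimal digits; a grammar that accepts lowercase digits would need to add [abcdef] as well.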