Pre-Defined Character Sets

The GOLD Builder provides a collection of pre-defined character sets. These include sets that are commonly used to define terminals, as well as characters that cannot be typed directly on a keyboard.

Constants

Name	Codepoint	Description
{HT}	&09	Horizontal Tab character.
{LF}	&0A	Line Feed character.
{VT}	&0B	Vertical Tab character. This character is rarely used.
{FF}	&0C	Form Feed character. This character is also known as "New Page".
{CR}	&0D	Carriage Return character (decimal #13).
{Space}	&20	Space character. Technically, this set is not needed, since a space can be expressed using single quotes: ' '. The set was added so that the developer can indicate the character explicitly and improve readability.
{NBSP}	&A0	No-Break Space character. The No-Break Space character is used to represent a space where a line break is not allowed. It is often used in source code for indentation.
{LS}	&2028	Unicode Line Separator
{PS}	&2029	Unicode Paragraph Separator

Common Character Sets

Please see the Pre-Defined Character Set Chart for pictures of these characters.

Name	Codepoints	Description
{Number}	&30 .. &39	0123456789
{Digit}	&30 .. &39	0123456789. This set is maintained to support older grammars. The term "digit" is technically inaccurate, and this set will eventually be removed, but not for a long time. Please use the {Number} set instead.
{Letter}	&41 .. &5A, &61 .. &7A	abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
{AlphaNumeric}	&30 .. &39, &41 .. &5A, &61 .. &7A	This set includes all the characters in {Letter} and {Number}
{Printable}	&20 .. &7E, &A0	This set includes all standard characters that can be printed onscreen. This includes the characters from #32 to #126 and #160 (No-Break Space). The No-Break Space character was included since it is often used in source code.
{Letter Extended}	&C0 .. &D6, &D8 .. &F6, &F8 .. &FF	This set includes all the letters which are part of the extended characters in the first 256 characters (ANSI).
{Printable Extended}	&A1 .. &FF	This set includes all the printable characters above #127. Although rarely used in programming languages, they could be used, for instance, as valid characters in a string literal.
{Whitespace}	&09 .. &0D, &20, &A0	This set includes all characters that are normally considered whitespace and ignored by the parser. The set consists of the Space, Horizontal Tab, Line Feed, Vertical Tab, Form Feed, Carriage Return and No-Break Space.
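
The Codepoints column lists hexadecimal code points and inclusive ranges. As a rough illustration of how to read that notation, the following Python sketch expands a specification into a set of integers and checks two statements from the table above. The helper name expand is hypothetical and is not part of GOLD.

```python
def expand(spec):
    """Expand a spec such as '&30 .. &39, &41' into a set of code points."""
    points = set()
    for part in spec.split(","):
        # Each part is either a single value or 'low .. high', written in hex.
        bounds = [int(tok.strip().lstrip("&"), 16) for tok in part.split("..")]
        low, high = bounds[0], bounds[-1]    # a single value is a one-element range
        points.update(range(low, high + 1))  # ranges include both ends
    return points

number = expand("&30 .. &39")
letter = expand("&41 .. &5A, &61 .. &7A")
alphanumeric = expand("&30 .. &39, &41 .. &5A, &61 .. &7A")
whitespace = expand("&09 .. &0D, &20, &A0")

# {AlphaNumeric} is the union of {Letter} and {Number}.
print(alphanumeric == letter | number)
# {Whitespace} holds the seven characters named in its description.
print(sorted(whitespace) == [0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20, 0xA0])
```

Both checks print True for the code points as listed.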

Useful

Name	Codepoints
{All Latin}	&41 .. &5A, &61 .. &7A, &AA, &B5, &BA, &C0 .. &D6, &D8 .. &F6, &F8 .. &024F, &1E00 .. &1EFF, &2C60 .. &2C7F, &A720 .. &A7FF
{All Letters}	This is too large to display here.
{All Printable}	&20 .. &7F, &A0 .. &200A, &2010 .. &2027, &202F .. &205F, &2065 .. &2069, &2070 .. &D7FF, &E000 .. &FEFE, &FF00 .. &FFEF
{All Space}	&20, &A0, &1680, &180E, &2000 .. &200A, &202F, &205F, &3000
{All Newline}	&0A, &0D, &2028, &2029
{All Whitespace}	&09 .. &0D, &20, &85, &A0, &1680, &180E, &2000 .. &200A, &2028, &2029, &202F, &205F, &3000
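
The table does not state how these three space-related sets relate to each other, but the relationship can be derived from the listed code points. The sketch below, written in Python with illustrative variable names, suggests that {All Whitespace} covers {All Space} and {All Newline} plus four further characters.

```python
# Code points copied from the table above; the variable names are illustrative.
all_space = {0x20, 0xA0, 0x1680, 0x180E, *range(0x2000, 0x200B),
             0x202F, 0x205F, 0x3000}
all_newline = {0x0A, 0x0D, 0x2028, 0x2029}
all_whitespace = {*range(0x09, 0x0E), 0x20, 0x85, 0xA0, 0x1680, 0x180E,
                  *range(0x2000, 0x200B), 0x2028, 0x2029, 0x202F, 0x205F,
                  0x3000}

# Characters in {All Whitespace} that are in neither of the other two sets.
extra = all_whitespace - (all_space | all_newline)
print([f"&{cp:02X}" for cp in sorted(extra)])  # ['&09', '&0B', '&0C', '&85']
```

The four extra characters are Horizontal Tab, Vertical Tab, Form Feed and &85, which Unicode names Next Line.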

Unicode Blocks

Unicode is divided into distinct blocks, many of which cover the script of one of the world's languages. The GOLD Meta-Language contains a number of predefined sets based on the Unicode standard.

For more information please visit: www.unicode.org.

Name	Codepoints
{Basic Latin}	&00 .. &7F
{Latin-1 Supplement}	&80 .. &FF
{Latin Extended-A}	&0100 .. &017F
{Latin Extended-B}	&0180 .. &024F
{IPA Extensions}	&0250 .. &02AF
{Spacing Modifier Letters}	&02B0 .. &02FF
{Combining Diacritical Marks}	&0300 .. &036F
{Greek and Coptic}	&0370 .. &03FF
{Cyrillic}	&0400 .. &04FF
{Cyrillic Supplement}	&0500 .. &052F
{Armenian}	&0530 .. &058F
{Hebrew}	&0590 .. &05FF
{Arabic}	&0600 .. &06FF
{Syriac}	&0700 .. &074F
{Arabic Supplement}	&0750 .. &077F
{Thaana}	&0780 .. &07BF
{NKo}	&07C0 .. &07FF
{Samaritan}	&0800 .. &083F
{Devanagari}	&0900 .. &097F
{Bengali}	&0980 .. &09FF
{Gurmukhi}	&0A00 .. &0A7F
{Gujarati}	&0A80 .. &0AFF
{Oriya}	&0B00 .. &0B7F
{Tamil}	&0B80 .. &0BFF
{Telugu}	&0C00 .. &0C7F
{Kannada}	&0C80 .. &0CFF
{Malayalam}	&0D00 .. &0D7F
{Sinhala}	&0D80 .. &0DFF
{Thai}	&0E00 .. &0E7F
{Lao}	&0E80 .. &0EFF
{Tibetan}	&0F00 .. &0FFF
{Myanmar}	&1000 .. &109F
{Georgian}	&10A0 .. &10FF
{Hangul Jamo}	&1100 .. &11FF
{Ethiopic}	&1200 .. &137F
{Ethiopic Supplement}	&1380 .. &139F
{Cherokee}	&13A0 .. &13FF
{Unified Canadian Aboriginal Syllabics}	&1400 .. &167F
{Ogham}	&1680 .. &169F
{Runic}	&16A0 .. &16FF
{Tagalog}	&1700 .. &171F
{Hanunoo}	&1720 .. &173F
{Buhid}	&1740 .. &175F
{Tagbanwa}	&1760 .. &177F
{Khmer}	&1780 .. &17FF
{Mongolian}	&1800 .. &18AF
{Unified Canadian Aboriginal Syllabics Extended}	&18B0 .. &18FF
{Limbu}	&1900 .. &194F
{Tai Le}	&1950 .. &197F
{New Tai Lue}	&1980 .. &19DF
{Khmer Symbols}	&19E0 .. &19FF
{Buginese}	&1A00 .. &1A1F
{Tai Tham}	&1A20 .. &1AAF
{Balinese}	&1B00 .. &1B7F
{Sundanese}	&1B80 .. &1BBF
{Lepcha}	&1C00 .. &1C4F
{Ol Chiki}	&1C50 .. &1C7F
{Vedic Extensions}	&1CD0 .. &1CFF
{Phonetic Extensions}	&1D00 .. &1D7F
{Phonetic Extensions Supplement}	&1D80 .. &1DBF
{Combining Diacritical Marks Supplement}	&1DC0 .. &1DFF
{Latin Extended Additional}	&1E00 .. &1EFF
{Greek Extended}	&1F00 .. &1FFF
{General Punctuation}	&2000 .. &206F
{Superscripts and Subscripts}	&2070 .. &209F
{Currency Symbols}	&20A0 .. &20CF
{Combining Diacritical Marks for Symbols}	&20D0 .. &20FF
{Letterlike Symbols}	&2100 .. &214F
{Number Forms}	&2150 .. &218F
{Arrows}	&2190 .. &21FF
{Mathematical Operators}	&2200 .. &22FF
{Miscellaneous Technical}	&2300 .. &23FF
{Control Pictures}	&2400 .. &243F
{Optical Character Recognition}	&2440 .. &245F
{Enclosed Alphanumerics}	&2460 .. &24FF
{Box Drawing}	&2500 .. &257F
{Block Elements}	&2580 .. &259F
{Geometric Shapes}	&25A0 .. &25FF
{Miscellaneous Symbols}	&2600 .. &26FF
{Dingbats}	&2700 .. &27BF
{Miscellaneous Mathematical Symbols-A}	&27C0 .. &27EF
{Supplemental Arrows-A}	&27F0 .. &27FF
{Braille Patterns}	&2800 .. &28FF
{Supplemental Arrows-B}	&2900 .. &297F
{Miscellaneous Mathematical Symbols-B}	&2980 .. &29FF
{Supplemental Mathematical Operators}	&2A00 .. &2AFF
{Miscellaneous Symbols and Arrows}	&2B00 .. &2BFF
{Glagolitic}	&2C00 .. &2C5F
{Latin Extended-C}	&2C60 .. &2C7F
{Coptic}	&2C80 .. &2CFF
{Georgian Supplement}	&2D00 .. &2D2F
{Tifinagh}	&2D30 .. &2D7F
{Ethiopic Extended}	&2D80 .. &2DDF
{Cyrillic Extended-A}	&2DE0 .. &2DFF
{Supplemental Punctuation}	&2E00 .. &2E7F
{CJK Radicals Supplement}	&2E80 .. &2EFF
{Kangxi Radicals}	&2F00 .. &2FDF
{Ideographic Description Characters}	&2FF0 .. &2FFF
{CJK Symbols and Punctuation}	&3000 .. &303F
{Hiragana}	&3040 .. &309F
{Katakana}	&30A0 .. &30FF
{Bopomofo}	&3100 .. &312F
{Hangul Compatibility Jamo}	&3130 .. &318F
{Kanbun}	&3190 .. &319F
{Bopomofo Extended}	&31A0 .. &31BF
{CJK Strokes}	&31C0 .. &31EF
{Katakana Phonetic Extensions}	&31F0 .. &31FF
{Enclosed CJK Letters and Months}	&3200 .. &32FF
{CJK Compatibility}	&3300 .. &33FF
{CJK Unified Ideographs Extension A}	&3400 .. &4DBF
{Yijing Hexagram Symbols}	&4DC0 .. &4DFF
{CJK Unified Ideographs}	&4E00 .. &9FFF
{Yi Syllables}	&A000 .. &A48F
{Yi Radicals}	&A490 .. &A4CF
{Lisu}	&A4D0 .. &A4FF
{Vai}	&A500 .. &A63F
{Cyrillic Extended-B}	&A640 .. &A69F
{Bamum}	&A6A0 .. &A6FF
{Modifier Tone Letters}	&A700 .. &A71F
{Latin Extended-D}	&A720 .. &A7FF
{Syloti Nagri}	&A800 .. &A82F
{Common Indic Number Forms}	&A830 .. &A83F
{Phags-pa}	&A840 .. &A87F
{Saurashtra}	&A880 .. &A8DF
{Devanagari Extended}	&A8E0 .. &A8FF
{Kayah Li}	&A900 .. &A92F
{Rejang}	&A930 .. &A95F
{Hangul Jamo Extended-A}	&A960 .. &A97F
{Javanese}	&A980 .. &A9DF
{Cham}	&AA00 .. &AA5F
{Myanmar Extended-A}	&AA60 .. &AA7F
{Tai Viet}	&AA80 .. &AADF
{Meetei Mayek}	&ABC0 .. &ABFF
{Hangul Syllables}	&AC00 .. &D7AF
{Hangul Jamo Extended-B}	&D7B0 .. &D7FF
{Private Use Area}	&E000 .. &F8FF
{CJK Compatibility Ideographs}	&F900 .. &FAFF
{Alphabetic Presentation Forms}	&FB00 .. &FB4F
{Arabic Presentation Forms-A}	&FB50 .. &FDFF
{Variation Selectors}	&FE00 .. &FE0F
{Vertical Forms}	&FE10 .. &FE1F
{Combining Half Marks}	&FE20 .. &FE2F
{CJK Compatibility Forms}	&FE30 .. &FE4F
{Small Form Variants}	&FE50 .. &FE6F
{Arabic Presentation Forms-B}	&FE70 .. &FEFF
{Halfwidth and Fullwidth Forms}	&FF00 .. &FFEF
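
Because the blocks are contiguous, non-overlapping ranges, finding the block that contains a given character is a simple range lookup. The following Python sketch shows one possible approach using a handful of rows from the table. The names BLOCKS, STARTS and block_of are hypothetical, and this is not how the GOLD Builder itself is implemented.

```python
from bisect import bisect_right

# A few rows copied from the table above as (first, last, name).
# A complete lookup would list every block.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x3040, 0x309F, "Hiragana"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
]
STARTS = [first for first, _, _ in BLOCKS]

def block_of(ch):
    """Return the block name for a character, or None if it is not listed."""
    cp = ord(ch)
    i = bisect_right(STARTS, cp) - 1  # last block starting at or before cp
    if i >= 0 and cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return None

print(block_of("A"))        # Basic Latin
print(block_of("\u00e9"))   # Latin-1 Supplement
print(block_of("\u0416"))   # Cyrillic
print(block_of("\u3042"))   # Hiragana
```

A character that falls in a gap of this shortened list, such as "\u0100" from {Latin Extended-A}, returns None.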

Miscellaneous Character Sets

Name	Codepoints	Description
{All Valid}	&01 .. &D7FF, &E000 .. &FFEF	The {All Valid} character set contains every valid character in the Basic Multilingual Plane of the Unicode Character Set. This includes the characters from &01 to &D7FF and &E000 to &FFEF. Please note that this set also includes whitespace and control characters that are rarely used in programming languages. It is only available in versions 2.6 and later of the Builder.
{ANSI Mapped}	&0152, &0153, &0160, &0161, &0178, &017D, &017E, &0192, &02C6, &02DC, &2013, &2014, &2018 .. &201A, &201C .. &201E, &2020 .. &2022, &2026, &2030, &2039, &203A, &20AC, &2122	This set contains the characters that ANSI places between 128 and 159 but that have different values in Unicode. The set is only available in versions 2.0.6 and later of the Builder.
{ANSI Printable}	&20 .. &7E, &A0 .. &FF	This set contains all printable characters available in ANSI. Essentially, this is a union of {Printable}, {Printable Extended} and {ANSI Mapped}. The set is only available in versions 2.0.6 and later of the Builder.
{Control Codes}	&00 .. &1F, &7F .. &9F	This set includes the characters from 1 to 31 and from 127 to 159. It is only available in versions 2.6 and later of the Builder.
{Euro Sign}	&20AC	The Euro Currency Sign.
{Formatting}	&200B .. &200F, &202A .. &202E, &2060 .. &2064, &206A .. &206F, &FEFF, &FFF9 .. &FFFB	Unicode formatting characters.
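
The {ANSI Mapped} code points can be cross-checked against a code page. The Python sketch below assumes that "ANSI" here refers to Windows code page 1252, which the table does not state explicitly. It decodes every byte from 128 to 159 with Python's cp1252 codec and compares the result with the listed set.

```python
# Code points of {ANSI Mapped}, copied from the table above.
ansi_mapped = {
    0x0152, 0x0153, 0x0160, 0x0161, 0x0178, 0x017D, 0x017E, 0x0192,
    0x02C6, 0x02DC, 0x2013, 0x2014, *range(0x2018, 0x201B),
    *range(0x201C, 0x201F), *range(0x2020, 0x2023), 0x2026, 0x2030,
    0x2039, 0x203A, 0x20AC, 0x2122,
}

# Decode each byte from 128 to 159, assuming ANSI means Windows-1252.
decoded = set()
for byte in range(0x80, 0xA0):
    try:
        decoded.add(ord(bytes([byte]).decode("cp1252")))
    except UnicodeDecodeError:
        pass  # the codec leaves a few byte values in this range undefined

print(decoded == ansi_mapped)
```

Under that assumption, every byte in the range that the codec can decode maps to a member of {ANSI Mapped}, and no member is left over.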