What is a "Parser"?

While the text of a program is easy for humans to understand, the computer must convert it into a form that it can process before any interpretation or compilation can begin.

This process is known generally as "parsing" and consists of two distinct parts.

Components

Lexical Analysis

The first component is the "lexer", sometimes called the "scanner". The lexer takes the source text and breaks it into the reserved words, constants, identifiers, and symbols that are defined in the language.

Lexical analysis is concerned with a grammar's terminals.

The result of the lexical analysis is a series of "tokens" that contain the text of the source broken into individual pieces of data. While terminals represent the classification of information, tokens contain the actual information.

Essentially, a token is an instance of a terminal. For instance, the common identifier is a specific type of terminal, but it can exist in various forms such as "Value1", "cat", "Sacramento", etc.
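
The distinction between terminals and tokens can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the terminal names, the regular expressions, and the tokenize function are hypothetical and do not come from any particular tool.

```python
import re

# Hypothetical terminals for a toy language (names and patterns are assumptions).
# Each terminal is a classification; each token is one concrete instance of it.
TERMINALS = [
    ("Whitespace", re.compile(r"\s+")),
    ("Number", re.compile(r"\d+")),
    ("Identifier", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("Symbol", re.compile(r"[=+\-*/()]")),
]


def tokenize(text):
    """Break the source text into (terminal, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, pattern in TERMINALS:
            match = pattern.match(text, pos)
            if match:
                if name != "Whitespace":
                    tokens.append((name, match.group()))
                pos = match.end()
                break
        else:
            # No terminal matched at this position.
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
    return tokens


tokens = tokenize("Value1 = cat + 42")
for terminal, lexeme in tokens:
    print(terminal, lexeme)
```

Here "Value1" and "cat" are two different tokens, but both are instances of the same Identifier terminal.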

Syntactic Analysis

Syntactic analysis is concerned with a grammar's productions.

After the text is broken into a stream of tokens, the system needs to determine which groups of tokens form the meaningful constructs of the language.

The second component is called the "parser". This is where the terminology gets a tad confusing. Since a parser requires a lexer to function properly, the term "parser" is often used to refer to both.

The "tokens" created by the lexer are subsequently passed to the actual "parser", which analyzes the series of tokens and determines when one of the language's syntax rules is complete.
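
As a rough sketch of this step, the following Python function reads a list of tokens and records each syntax rule as it is completed. Everything here is an assumption made for illustration: the two-rule toy grammar, the token format, and the function name are hypothetical, and the hand-written loop is used only for brevity; it is not meant to describe how any particular parser engine works.

```python
# Toy grammar (an assumption for this sketch):
#   Expression ::= Expression '+' Term
#   Expression ::= Term
#   Term       ::= Number
#   Term       ::= Identifier


def completed_rules(tokens):
    """Return the rules recognized in the token list, in order of completion.

    `tokens` is a list of (terminal, text) tuples, as a lexer might produce.
    """
    completed = []
    pos = 0

    def parse_term():
        nonlocal pos
        if pos < len(tokens) and tokens[pos][0] in ("Number", "Identifier"):
            terminal, _text = tokens[pos]
            pos += 1
            completed.append(f"Term ::= {terminal}")
        else:
            raise SyntaxError(f"expected a Number or Identifier at token {pos}")

    # The left-recursive Expression rule is handled with a loop.
    parse_term()
    completed.append("Expression ::= Term")
    while pos < len(tokens) and tokens[pos] == ("Symbol", "+"):
        pos += 1
        parse_term()
        completed.append("Expression ::= Expression '+' Term")

    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return completed


tokens = [("Identifier", "cat"), ("Symbol", "+"), ("Number", "42")]
for rule in completed_rules(tokens):
    print(rule)
```

The parser never looks at the characters of the source text; it works only with the terminals that classify each token.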

Finally...

The result of the lexical and syntactic analysis components is a tree that follows the structure of the grammar and contains all the tokens created by the lexer. Essentially, nonterminals function as the tree's interior nodes while tokens form the tree's leaves.
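
One way to picture such a tree is with two small Python classes, one for leaves and one for interior nodes. The class names, the fields, and the toy Expression and Term nonterminals are assumptions made for this sketch, and the tree is built by hand rather than by a parser.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """A leaf: one instance of a terminal, carrying the actual source text."""
    terminal: str
    text: str


@dataclass
class Node:
    """An interior node: a nonterminal whose children are nodes or tokens."""
    nonterminal: str
    children: list = field(default_factory=list)


def leaves(tree):
    """Collect the token text at the leaves, left to right."""
    if isinstance(tree, Token):
        return [tree.text]
    result = []
    for child in tree.children:
        result.extend(leaves(child))
    return result


def render(tree, depth=0):
    """Return the tree as indented lines, one per node or token."""
    indent = "  " * depth
    if isinstance(tree, Token):
        return [f"{indent}{tree.terminal}: {tree.text}"]
    lines = [f"{indent}<{tree.nonterminal}>"]
    for child in tree.children:
        lines.extend(render(child, depth + 1))
    return lines


# Hand-built tree for the source text "cat + 42".
tree = Node("Expression", [
    Node("Expression", [Node("Term", [Token("Identifier", "cat")])]),
    Token("Symbol", "+"),
    Node("Term", [Token("Number", "42")]),
])

print("\n".join(render(tree)))
```

Reading the leaves from left to right gives back the original sequence of tokens, while the interior nodes record which rules grouped them.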

In this form, the program is ready to be interpreted or compiled by the application. The application can compile it to a new program, run it through an interpreter, or translate the text to another programming language.
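
As one possible illustration of interpretation, the sketch below walks a small tree and computes a value. The nested-tuple representation, the toy Expression and Term rules, and the evaluate function are all assumptions made for this example, not the output format of any real tool.

```python
def evaluate(tree, variables):
    """Interpret a toy parse tree given as nested tuples.

    Interior nodes look like (nonterminal, child, ...);
    leaves look like (terminal, text).
    """
    kind = tree[0]
    if kind == "Number":
        return int(tree[1])
    if kind == "Identifier":
        return variables[tree[1]]
    if kind == "Term":
        return evaluate(tree[1], variables)
    if kind == "Expression":
        children = tree[1:]
        if len(children) == 1:
            # Expression ::= Term
            return evaluate(children[0], variables)
        # Expression ::= Expression '+' Term
        left, _plus, right = children
        return evaluate(left, variables) + evaluate(right, variables)
    raise ValueError(f"unknown node kind: {kind}")


# Hand-built tree for the source text "cat + 42".
tree = (
    "Expression",
    ("Expression", ("Term", ("Identifier", "cat"))),
    ("Symbol", "+"),
    ("Term", ("Number", "42")),
)

print(evaluate(tree, {"cat": 8}))  # prints 50
```

A compiler or translator would walk the same tree but emit code or text at each node instead of computing a value.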
