Block Comments

Overview

The block comments of a programming language have two major properties that affect how text is read and interpreted. These properties are defined by the language specification and may not be apparent from inspecting the language's syntax.

The first, and most obvious, property is whether the language permits block comments to be nested. When nesting is allowed, each embedded block comment is read as a whole, with its own start and end symbols. Otherwise, the block comment ends at the first end symbol encountered.

The second property concerns how the text inside the comment is interpreted. The text can be read as a series of meaningless characters or as a series of tokens. This affects how text embedded in terminals, such as string literals, is interpreted by the lexer.

printf("Comment below");

/*
   printf("*/ Test /*");
   /* Nested comment */
*/
printf("Done");

For the following examples, the sample code above will be used.

Even though this appears to be C code, please don't instinctively apply the behavior of C comments. The syntax (and function names) is based on C since it will be familiar to most programmers. For those not familiar with C, the symbol /* is used to create a block comment. The symbol */ is used to end it.

Notice that the printf call inside the comment is passed a string literal containing both of these symbols. The two group attributes, nesting and tokenization, have a profound effect on how this text is parsed.

Examples

Character Unnested
printf("Comment below");

/*
   printf("*/ Test /*");
   /* Nested comment */
*/
printf("Done");

With the text untokenized, the lexer advances one character at a time within a group. As a result, the embedded */ within the string literal ends the group. The following /* starts a second lexical group.

Since the group is unnested, the next /* does not create an embedded group. As a result, the */ at the end of that line ends the second group. The */ on the last line is left without an open group to end.
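The character, unnested behavior can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name comment_spans_char_unnested is hypothetical, and the sketch ignores string literals everywhere, including in ordinary code outside a comment.

```python
SAMPLE = '''printf("Comment below");

/*
   printf("*/ Test /*");
   /* Nested comment */
*/
printf("Done");
'''

def comment_spans_char_unnested(text):
    """Hypothetical helper: return (start, end) index pairs of comment groups.

    Character mode: string literals are not recognized.
    Unnested: a group ends at the first end symbol that follows its start.
    """
    spans = []
    pos = 0
    while True:
        start = text.find("/*", pos)
        if start == -1:
            return spans
        end = text.find("*/", start + 2)
        if end == -1:
            # Unterminated group: it runs to the end of the text.
            spans.append((start, len(text)))
            return spans
        spans.append((start, end + 2))
        pos = end + 2

for start, end in comment_spans_char_unnested(SAMPLE):
    print(repr(SAMPLE[start:end]))
```

On the sample, this finds two groups: the first ends inside the string literal, and the second runs from the /* after Test to the end of the "Nested comment" line.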

Character Nested
printf("Comment below");

/*
   printf("*/ Test /*");
   /* Nested comment */
*/
printf("Done");

With the text untokenized, the lexer advances one character at a time within a group. As a result, the embedded */ within the string literal ends the group. The following /* starts a second lexical group.

Since the group is nested, the next /* starts another, embedded, group. This causes a new record to be pushed onto the group stack. The */ at the end of that line ends the embedded group and returns to the main one. The */ on the next line ends the entire group.
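The group stack described above can be sketched as follows. This is a minimal illustration, not the lexer's actual implementation: the function name comment_spans_char_nested is hypothetical, and the stack simply holds the start index of each open group.

```python
SAMPLE = '''printf("Comment below");

/*
   printf("*/ Test /*");
   /* Nested comment */
*/
printf("Done");
'''

def comment_spans_char_nested(text):
    """Hypothetical helper: return (start, end) pairs of outermost groups.

    Character mode: string literals are not recognized.
    Nested: every start symbol pushes a record, every end symbol pops one.
    """
    spans = []
    stack = []  # start indexes of the groups that are currently open
    i = 0
    while i < len(text):
        if text.startswith("/*", i):
            stack.append(i)
            i += 2
        elif stack and text.startswith("*/", i):
            start = stack.pop()
            i += 2
            if not stack:
                # The outermost group has just been closed.
                spans.append((start, i))
        else:
            i += 1
    if stack:
        # Unterminated group: it runs to the end of the text.
        spans.append((stack[0], len(text)))
    return spans

for start, end in comment_spans_char_nested(SAMPLE):
    print(repr(SAMPLE[start:end]))
```

On the sample, the first group still ends inside the string literal, but the second group now absorbs the embedded comment and ends at the final */.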

Token Unnested
printf("Comment below");

/*
   printf("*/ Test /*");
   /* Nested comment */
*/
printf("Done");

With the text tokenized, the lexer advances one recognizable token at a time. As a result, the embedded string literal is read as a single token. The embedded /* and */ are part of the string literal and do not start or end a group.

Since the group is unnested, the next /* does not create an embedded group. As a result, the */ at the end of that line ends the entire group. The */ on the last line is left without an open group to end.
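A rough sketch of the token, unnested behavior follows. It assumes, for illustration only, that the sole token that matters is a double-quoted string literal without escape sequences; the function name comment_spans_token_unnested is hypothetical.

```python
SAMPLE = '''printf("Comment below");

/*
   printf("*/ Test /*");
   /* Nested comment */
*/
printf("Done");
'''

def comment_spans_token_unnested(text):
    """Hypothetical helper: return (start, end) index pairs of comment groups.

    Token mode: a string literal is consumed as one unit, so comment
    symbols inside it are never seen.
    Unnested: a start symbol inside an open group is ignored.
    """
    spans = []
    start = None  # start index of the open group, or None outside a group
    i = 0
    while i < len(text):
        if text[i] == '"':
            # Skip the whole string literal (assumption: no escapes).
            close = text.find('"', i + 1)
            i = len(text) if close == -1 else close + 1
        elif start is None and text.startswith("/*", i):
            start = i
            i += 2
        elif start is not None and text.startswith("*/", i):
            i += 2
            spans.append((start, i))
            start = None
        else:
            i += 1
    if start is not None:
        # Unterminated group: it runs to the end of the text.
        spans.append((start, len(text)))
    return spans

for start, end in comment_spans_token_unnested(SAMPLE):
    print(repr(SAMPLE[start:end]))
```

On the sample, this finds a single group that runs from the opening /* to the end of the "Nested comment" line.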

Token Nested
printf("Comment below");

/*
   printf("*/ Test /*");
   /* Nested comment */
*/
printf("Done");

With the text tokenized, the lexer advances one recognizable token at a time. As a result, the embedded string literal is read as a single token. The embedded /* and */ are part of the string literal and do not start or end a group.

Since the group is nested, the next /* starts another, embedded, group. This causes a new record to be pushed onto the group stack. The */ at the end of that line ends the embedded group and returns to the main one. The */ on the next line ends the entire group.

In this case, no syntax error will occur.
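The token, nested case combines string skipping with a group stack. The sketch below is illustrative only: the function name comment_spans_token_nested is hypothetical, and string literals are assumed to be double-quoted with no escape sequences.

```python
SAMPLE = '''printf("Comment below");

/*
   printf("*/ Test /*");
   /* Nested comment */
*/
printf("Done");
'''

def comment_spans_token_nested(text):
    """Hypothetical helper: return (start, end) pairs of outermost groups.

    Token mode: string literals are consumed as one unit.
    Nested: every start symbol pushes a record, every end symbol pops one.
    """
    spans = []
    stack = []  # start indexes of the groups that are currently open
    i = 0
    while i < len(text):
        if text[i] == '"':
            # Skip the whole string literal (assumption: no escapes).
            close = text.find('"', i + 1)
            i = len(text) if close == -1 else close + 1
        elif text.startswith("/*", i):
            stack.append(i)
            i += 2
        elif stack and text.startswith("*/", i):
            start = stack.pop()
            i += 2
            if not stack:
                spans.append((start, i))
        else:
            i += 1
    if stack:
        # Unterminated group: it runs to the end of the text.
        spans.append((stack[0], len(text)))
    return spans

spans = comment_spans_token_nested(SAMPLE)
start, end = spans[0]
print(repr(SAMPLE[:start] + SAMPLE[end:]))  # the text left outside the group
```

On the sample, the whole commented region is one group, so only the two outer printf statements remain as code.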

Summary

Here are the four tables again for a side-by-side view.

Case 1
Character Unnested
printf("Comment below");

/*
   printf("*/ Test /*");
   /* Nested comment */
*/
printf("Done");
 
Case 2
Token Unnested
printf("Comment below");

/*
   printf("*/ Test /*");
   /* Nested comment */
*/
printf("Done");
     
Case 3
Character Nested
printf("Comment below");

/*
   printf("*/ Test /*");
   /* Nested comment */
*/
printf("Done");
 
Case 4
Token Nested
printf("Comment below");

/*
   printf("*/ Test /*");
   /* Nested comment */
*/
printf("Done");
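
To compare the four cases mechanically, the two properties can be treated as two boolean flags on a single scanner. The following is a sketch under stated assumptions rather than a definitive implementation: the names comment_spans and strip_comments are hypothetical, string literals are double-quoted with no escapes, and in character mode strings are ignored even outside comments.

```python
SAMPLE = '''printf("Comment below");

/*
   printf("*/ Test /*");
   /* Nested comment */
*/
printf("Done");
'''

def comment_spans(text, nested, tokenized):
    """Hypothetical helper: (start, end) pairs of outermost comment groups."""
    spans, stack, i = [], [], 0
    while i < len(text):
        if tokenized and text[i] == '"':
            # Token mode: consume the string literal as one unit.
            close = text.find('"', i + 1)
            i = len(text) if close == -1 else close + 1
        elif text.startswith("/*", i) and (nested or not stack):
            stack.append(i)
            i += 2
        elif stack and text.startswith("*/", i):
            start = stack.pop()
            i += 2
            if not stack:
                spans.append((start, i))
        else:
            i += 1
    if stack:
        spans.append((stack[0], len(text)))
    return spans

def strip_comments(text, spans):
    """Hypothetical helper: return the text that lies outside every span."""
    out, pos = [], 0
    for start, end in spans:
        out.append(text[pos:start])
        pos = end
    out.append(text[pos:])
    return "".join(out)

CASES = [
    ("Character Unnested", False, False),
    ("Token Unnested", False, True),
    ("Character Nested", True, False),
    ("Token Nested", True, True),
]
for name, nested, tokenized in CASES:
    spans = comment_spans(SAMPLE, nested, tokenized)
    code = strip_comments(SAMPLE, spans)
    print(name, "| groups:", len(spans), "| stray */ left:", "*/" in code,
          "| Test left as code:", "Test" in code)
```

Only the token, nested case leaves neither a stray */ nor the word Test in the remaining code.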