hvn-network

Sunday, August 21, 2016

Preprocessor

The C Preprocessor, often known as CPP, is a macro processor that the C compiler runs automatically. It is a separate step, the first one, in the compilation process: CPP transforms the source text before the compiler proper sees it. In simple terms, the C preprocessor is a text substitution tool.


1. Overview

1.1. Character sets

The files input to CPP might be in any character set at all. CPP’s very first action, before it even looks for line boundaries, is to convert the file into the character set it uses for internal processing. That set is what the C standard calls the source character set. It must be isomorphic with Unicode. CPP uses the UTF-8 encoding of Unicode. If you request textual output from the preprocessor with the ‘-E’ option, it will be in UTF-8.

1.2. Initial processing

Step 1. The input file is read into memory and broken into lines.

Different systems use different conventions to indicate the end of a line. GCC accepts the ASCII control sequences LF, CR LF and CR as end-of-line markers. 
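To make the three conventions concrete, here is a minimal sketch that counts lines in a buffer while treating LF, CR LF and a lone CR as one end-of-line marker each. count_lines is a hypothetical helper written for illustration, not GCC's actual line reader.

```c
#include <stddef.h>

/* Hypothetical helper, not GCC's implementation: count the lines in
 * buf[0..len), treating LF, CR LF and a lone CR each as one end-of-line
 * marker. A final line without a marker still counts. */
size_t count_lines(const char *buf, size_t len)
{
    size_t lines = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r') {
            if (i + 1 < len && buf[i + 1] == '\n')
                i++;            /* CR LF: consume the LF as part of the marker */
            lines++;
        } else if (buf[i] == '\n') {
            lines++;
        }
    }
    if (len > 0 && buf[len - 1] != '\n' && buf[len - 1] != '\r')
        lines++;                /* unterminated last line */
    return lines;
}
```

For example, count_lines("a\nb\r\nc\rd", 8) returns 4, since each convention ends exactly one line and the trailing "d" forms the fourth.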

Step 2. Trigraph Replacement

If trigraphs are enabled, they are replaced by their corresponding single characters. Trigraphs are nine three-character sequences, all starting with ‘??’, that ISO C defines to stand for single characters. By default GCC ignores them, but if you request a strictly conforming mode with the ‘-std’ option, or specify the ‘-trigraphs’ option, it converts them.
Trigraph     ??(  ??)  ??<  ??>  ??=  ??/  ??'  ??!  ??-
Replacement    [    ]    {    }    #    \    ^    |    ~
Trigraphs (and the digraphs added later) were created for programmers whose keyboards or character sets offered only the invariant subset of ISO 646, which lacks characters such as #, [, ], {, } and |.
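Trigraph replacement is a plain textual scan over the input. The sketch below shows the idea with a hypothetical replace_trigraphs helper (not GCC's code) that maps each of the nine sequences in the table to its single character and copies everything else unchanged.

```c
#include <string.h>

/* Hypothetical helper: replace the nine ISO C trigraphs in `in` and write
 * the result to `out`, which needs room for strlen(in) + 1 bytes (the
 * output is never longer than the input). */
void replace_trigraphs(const char *in, char *out)
{
    static const char from[] = "()<>=/'!-";   /* third character of each trigraph */
    static const char to[]   = "[]{}#\\^|~";  /* its replacement, same position */

    while (*in != '\0') {
        if (in[0] == '?' && in[1] == '?' && in[2] != '\0') {
            const char *p = strchr(from, in[2]);
            if (p != NULL) {
                *out++ = to[p - from];
                in += 3;
                continue;
            }
        }
        *out++ = *in++;   /* not a trigraph: copy one character */
    }
    *out = '\0';
}
```

For example, replace_trigraphs("?\?=define", out) leaves "#define" in out. The ?\? escape in the string literal keeps the compiler from converting the literal itself when trigraphs are enabled.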

Step 3. Continued lines are merged into one long line.

A continued line is a line that ends with a backslash, ‘\’. The backslash is removed and the following line is joined with the current one. No space is inserted, so you may split a line anywhere, even in the middle of a word. If there is white space between the backslash and the end of the line, GCC still treats it as a continued line; however, since this is usually the result of an editing mistake and many compilers will not accept it as a continued line, GCC warns about it.
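Line splicing can be sketched in a few lines. splice_lines below is a hypothetical, simplified helper: it deletes every backslash that is immediately followed by a newline, together with that newline, but unlike GCC it does not accept white space between the two.

```c
#include <string.h>

/* Hypothetical helper, simplified from what GCC does: remove each
 * backslash-newline pair in place, joining the two physical lines with
 * no space in between. */
void splice_lines(char *s)
{
    char *p = s;

    while ((p = strchr(p, '\\')) != NULL) {
        if (p[1] == '\n')
            memmove(p, p + 2, strlen(p + 2) + 1);  /* shift the rest, NUL included */
        else
            p++;   /* a backslash not at the end of a line is left alone */
    }
}
```

Applied to the "defi\ ne NAM\ E Arno\ ld" lines from the next step, it produces "define NAME Arnold" with no extra spaces.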

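Step 4. All comments are replaced with single spaces.

This happens after continued lines are merged, so even a comment delimiter can be split with a backslash-newline. For example, this deliberately obscure fragment: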

/\
*
*/ # /*
*/ defi\
ne NAM\
E Arno\
ld
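is equivalent to #define NAME Arnold. Splicing first rebuilds the comment opener /* and the words define NAME Arnold; each of the two comments then becomes a single space, leaving a # followed by define NAME Arnold on one logical line.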
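A comment-removal pass can be sketched as follows. strip_comments is a hypothetical helper: it assumes lines have already been spliced, replaces each block or line comment with one space, and copies string and character literals unchanged so that comment markers inside quotes survive. It is a simplification, not GCC's lexer.

```c
#include <string.h>

/* Hypothetical helper: copy `in` to `out` (room for strlen(in) + 1 bytes),
 * replacing every comment with a single space. Assumes lines are already
 * spliced. An unterminated block comment runs to the end of the input. */
void strip_comments(const char *in, char *out)
{
    while (*in != '\0') {
        if (in[0] == '/' && in[1] == '*') {
            const char *end = strstr(in + 2, "*/");
            in = (end != NULL) ? end + 2 : in + strlen(in);
            *out++ = ' ';
        } else if (in[0] == '/' && in[1] == '/') {
            /* A line comment stops before the newline, which is kept. */
            while (*in != '\0' && *in != '\n')
                in++;
            *out++ = ' ';
        } else if (*in == '"' || *in == '\'') {
            /* Copy a string or character literal verbatim. */
            char quote = *in;
            *out++ = *in++;
            while (*in != '\0' && *in != quote) {
                if (*in == '\\' && in[1] != '\0')
                    *out++ = *in++;   /* backslash of an escape sequence */
                *out++ = *in++;
            }
            if (*in == quote)
                *out++ = *in++;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

Fed the spliced fragment above, it returns "  #   define NAME Arnold", which the directive parser reads as #define NAME Arnold.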

1.3. Tokenization

After the textual transformations are finished, the input file is converted into a sequence of preprocessing tokens. These mostly correspond to the syntactic tokens used by the C compiler, but there are a few differences. Preprocessing tokens fall into five broad classes: identifiers, preprocessing numbers, string literals, punctuators, and other.

2. Macros

2.1. Predefined macros

ANSI C defines a number of predefined macros. Each one can be used in your program, but none of them should be redefined or undefined.

Macro       Description
__DATE__    The date of compilation, as a string literal in "Mmm dd yyyy" format.
__TIME__    The time of compilation, as a string literal in "hh:mm:ss" format.
__FILE__    The name of the current source file, as a string literal.
__LINE__    The current line number, as a decimal integer constant.
__STDC__    Defined as 1 when the compiler conforms to the ISO (ANSI) C standard.

Example:
#include <stdio.h>   /* needed for printf */

int main(void) {
   printf("File :%s\n", __FILE__ );
   printf("Date :%s\n", __DATE__ );
   printf("Time :%s\n", __TIME__ );
   printf("Line :%d\n", __LINE__ );
   printf("ANSI :%d\n", __STDC__ );
   return 0;
}

2.2. Object-like macros

An object-like macro is defined with a line of the form:
#define identifier token-sequence
where every later occurrence of identifier in ordinary program text (but not inside a string literal) is replaced with token-sequence. Example:
#define BUFFER_SIZE 1024
#define NUMBERS 1,2,3
char *buf = (char *) malloc (BUFFER_SIZE);
int x[] = {NUMBERS};
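Because an object-like macro is pasted in verbatim, NUMBERS really does become three separate initializers. The sketch below reuses the two macros from the text; numbers_count, size and name are hypothetical names added for illustration.

```c
#include <stddef.h>

#define BUFFER_SIZE 1024
#define NUMBERS 1,2,3

static int x[] = {NUMBERS};               /* expands to {1,2,3} */
static int size = BUFFER_SIZE;            /* expands to 1024 */
static const char name[] = "BUFFER_SIZE"; /* unchanged: macros are not
                                             expanded inside string literals */

/* Number of initializers produced by NUMBERS. */
size_t numbers_count(void)
{
    return sizeof x / sizeof x[0];
}
```

Here numbers_count() returns 3 and size holds 1024, while name still contains the text "BUFFER_SIZE".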

2.3. Function-like macros

A function-like macro is defined with a line of the form:
#define identifier(identifier-list) token-sequence
where the macro parameters are listed in the comma-separated identifier-list. The token-sequence after the parameter list is called the replacement list and determines what the macro expands to. There must be no space between the macro name and the '(' character in the definition; otherwise the macro is object-like and the parenthesis becomes part of its body. For example:
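#define PRINT_INT(x) printf("x = %d\n", x)
PRINT_INT (2); <=> printf("x = %d\n", 2);
The x inside the string literal is not replaced, so this prints "x = 2". A space before the '(' is allowed when the macro is called.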
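Since the replacement list is substituted as text, operator precedence can leak across an expansion. The sketch below contrasts two hypothetical macros, SQUARE_NAIVE and SQUARE, wrapped in small hypothetical functions so the results are easy to check.

```c
/* SQUARE_NAIVE pastes its argument in as-is; SQUARE parenthesizes each
 * use of the parameter and the whole body. */
#define SQUARE_NAIVE(x) x * x
#define SQUARE(x) ((x) * (x))

int naive_square_of_sum(int a, int b)
{
    return SQUARE_NAIVE(a + b);   /* expands to a + b * a + b */
}

int square_of_sum(int a, int b)
{
    return SQUARE(a + b);         /* expands to ((a + b) * (a + b)) */
}
```

With a = 1 and b = 2, the naive version returns 1 + 2 * 1 + 2 = 5 instead of 9. SQUARE still evaluates its argument twice, so an argument with side effects, such as i++, is unsafe even with the parentheses.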

2.4. Preprocessor operators

- The Macro Continuation Operator (\):
The macro continuation operator (\) is used to continue a macro that is too long for a single line. For example:
#define NUMBERS 1,\
                2,\
                3
int x[] = {NUMBERS};
<=> int x[] = {1, 2, 3};
- The Stringification Operator (#): 
The stringification (number-sign) operator, #, when used in a macro's replacement list, converts the macro parameter that follows it into a string literal. Because it must be applied to a parameter, it can be used only in function-like macros. For example:
#define PRINT_INT(x) printf(#x " = %d\n", x)
PRINT_INT (2);
<=> printf("2" " = %d\n", 2);
Adjacent string literals are then concatenated, so the format string becomes "2 = %d\n".
- The Concatenation Operator (##):
The concatenation (token-pasting) operator, ##, within a macro definition joins the tokens on either side of it into a single token. For example:
#define MAX(type, x, y)\
type type##_max(type x, type y) \
{\
 return (x > y ? x : y);\
}
MAX (float, x, y)
<=>
float float_max(float x, float y)
{
 return (x > y ? x : y);
}
- The defined Operator:
The defined operator is used in the constant expression of an #if or #elif directive to test whether an identifier is currently defined as a macro. defined(NAME), or equivalently defined NAME, evaluates to 1 if NAME is defined and to 0 otherwise. For example:
#if !defined (MAX)
        #define MAX 1000
#endif
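The three operators above can be tried together in one small file. The sketch below assumes hypothetical names throughout: STR, XSTR and FORMAT_INT illustrate #, DEFINE_MAX is the MAX example renamed so it does not clash with the object-like MAX in the defined example, and USE_CACHE and MAX_ITEMS stand in for build settings.

```c
#include <stdio.h>

/* # : STR stringifies its argument as written; XSTR lets the argument be
 * macro-expanded first, then stringifies the result. */
#define STR(x) #x
#define XSTR(x) STR(x)
#define BUFFER_SIZE 1024

static const char raw[] = STR(BUFFER_SIZE);       /* "BUFFER_SIZE" */
static const char expanded[] = XSTR(BUFFER_SIZE); /* "1024" */

/* Like PRINT_INT, but formats into a char array `buf` instead of
 * printing; buf must be an array, not a pointer, for sizeof to work. */
#define FORMAT_INT(buf, x) snprintf((buf), sizeof(buf), #x " = %d", (x))

/* ## : type##_max pastes the type name and _max into one identifier. */
#define DEFINE_MAX(type)            \
    type type##_max(type x, type y) \
    {                               \
        return x > y ? x : y;       \
    }

DEFINE_MAX(int)      /* int int_max(int x, int y) */
DEFINE_MAX(double)   /* double double_max(double x, double y) */

/* defined: unlike #ifdef, it can be combined with &&, || and !. */
#define USE_CACHE
#if defined(USE_CACHE) && !defined(MAX_ITEMS)
#define MAX_ITEMS 64
#elif !defined(MAX_ITEMS)
#define MAX_ITEMS 1000
#endif
```

After FORMAT_INT(out, 2 + 3), the array out holds "2 + 3 = 5", and MAX_ITEMS is 64 because USE_CACHE is defined. DEFINE_MAX only works for types spelled as a single identifier such as int or double; with unsigned int the expansion would declare a function named unsigned int_max, which does not compile.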

2.5. Variadic macros

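A macro can be declared to accept a variable number of arguments, much as a function can, and the syntax is similar to that of a function. The example below uses GCC's named variadic parameter (args...), which is a GNU extension; in standard C99 the parameter is written ... and referred to as __VA_ARGS__. The ## before args is also a GNU extension: it deletes the preceding comma when no extra arguments are passed, as in the first call below.
#define TRACE_LOG(fmt, args...) fprintf(stdout, fmt, ##args)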
In main function:
int arr[3] = {1, 2, 3};   /* example data */
int i;

TRACE_LOG("Array: ");
for (i = 0; i < 3; i++) {
       TRACE_LOG("arr[%d] = %d\t", i, arr[i]);
}
printf("\n");
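For comparison, here is a sketch of the same loop written with the standard C99 form. TRACE_TO and format_array are hypothetical names; the macro writes into a caller-supplied buffer instead of stdout. Making the format string part of ... means a call with no extra arguments leaves no stray comma, so the GNU ## trick is not needed.

```c
#include <stdio.h>
#include <string.h>

/* `...` collects the format string and any extra arguments;
 * __VA_ARGS__ pastes them back in unchanged. */
#define TRACE_TO(buf, size, ...) snprintf((buf), (size), __VA_ARGS__)

/* Mirrors the TRACE_LOG loop, but appends to `out` instead of printing.
 * Output that does not fit in `size` bytes is truncated. */
void format_array(char *out, size_t size, const int *arr, int n)
{
    if (size == 0)
        return;
    out[0] = '\0';
    TRACE_TO(out, size, "Array: ");
    for (int i = 0; i < n; i++) {
        size_t used = strlen(out);   /* always <= size - 1 */
        TRACE_TO(out + used, size - used, "arr[%d] = %d\t", i, arr[i]);
    }
}
```

With a 64-byte buffer and arr = {1, 2, 3}, format_array produces "Array: arr[0] = 1\tarr[1] = 2\tarr[2] = 3\t".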

3. File inclusion

- #include "filename" 
This variant is used for header files of your own program. It searches for a file named filename first in the directory containing the current file, then in the quote directories, and then in the same directories used for <filename>. You can prepend directories to the list of quote directories with the ‘-iquote’ option.
- #include <filename> 
This variant is used for system header files. It searches for a file named filename in a standard list of system directories. You can prepend directories to this list with the ‘-I’ option.

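A header can easily end up included twice, for example when two other headers both include it. The usual protection is an include guard built from #ifndef, #define and #endif. The sketch below fakes a double inclusion of a hypothetical lib.h by repeating the guarded block in one file; LIB_H, struct point and point_sum are made-up names.

```c
/* First "inclusion": LIB_H is not yet defined, so the contents are kept. */
#ifndef LIB_H
#define LIB_H
struct point { int x; int y; };
static inline int point_sum(struct point p) { return p.x + p.y; }
#endif /* LIB_H */

/* Second "inclusion": LIB_H is already defined, so this copy is skipped
 * and struct point is not redefined. */
#ifndef LIB_H
#define LIB_H
struct point { int x; int y; };
static inline int point_sum(struct point p) { return p.x + p.y; }
#endif /* LIB_H */
```

In a real project the guard wraps the whole contents of lib.h, and the guard name only needs to be unique to that header.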
4. Conditional compilation


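Directive   Description
#ifdef      Includes the following code if the macro is defined.
#ifndef     Includes the following code if the macro is not defined.
#if         Includes the following code if a compile-time constant expression is nonzero.
#else       Starts the alternative branch of an #if, #ifdef or #ifndef.
#elif       Combines #else and #if: tests another condition when the previous branches were false.
#endif      Ends a preprocessor conditional.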
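The directives in the table are typically chained to pick one of several configurations. In this sketch, LOG_LEVEL, LOG_NAME and log_level_name are hypothetical names: #ifndef supplies a default, and #if, #elif and #else choose which definition of LOG_NAME survives preprocessing.

```c
/* Hypothetical build setting: 0 = quiet, 1 = normal, 2 = verbose. */
#ifndef LOG_LEVEL
#define LOG_LEVEL 1
#endif

#if LOG_LEVEL >= 2
#define LOG_NAME "verbose"
#elif LOG_LEVEL == 1
#define LOG_NAME "normal"
#else
#define LOG_NAME "quiet"
#endif

/* Only the branch selected above reaches the compiler. */
const char *log_level_name(void)
{
    return LOG_NAME;
}
```

Because the default is wrapped in #ifndef, compiling with -DLOG_LEVEL=2 selects the "verbose" branch without editing the file.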


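Other preprocessor directives: assertions (#assert, an obsolete GCC extension), version control (#ident), error generation (#error, plus the widely supported #warning), and pragmas (#pragma).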
You can find the example code here, along with the lib.h library.
