Preprocessor

Introduction

The C compiler uses a preprocessor to process the code before compiling the program.

The preprocessor first cleans up the code, removing comments, combining multiple lines of statements into one logical line, etc. Then, the preprocessor instructions beginning with # are executed. This chapter describes the preprocessor instructions in C.

Preprocessing instructions can appear anywhere in a program, but by convention, they are often placed at the beginning of the code.

Each preprocessing instruction begins with # at the start of a line, optionally preceded by whitespace (spaces or tabs). Spaces are also allowed between # and the rest of the instruction, but they are generally left out for compatibility with older compilers.

All preprocessing directives are one line long, unless a backslash is used at the end of the line to break it. No semicolon is required at the end of directives.

#define

#define is the most common preprocessing directive and is used to replace a specified word with other text. It takes two parts: the first is the name to be replaced, and everything after it is the replacement text. Each such substitution rule is called a macro.

#define MAX 100

In the above example, #define specifies that MAX inside the source code is to be replaced with 100 in its entirety. MAX is then called a macro.

Macro names do not allow spaces and must follow the C variable naming rules, using only letters, numbers and underscores (_), and no first character can be a number.

Macros are replaced verbatim: every occurrence of the name is replaced with exactly the text specified.

#define HELLO "Hello, world"

// Equivalent to printf("%s", "Hello, world");
printf("%s", HELLO);

In the above example, the macro HELLO is replaced with "Hello, world" as is.

The #define directive can appear anywhere in the source file, and is valid from where the directive appears to the end of that file. It is customary to place #define at the head of the source file. The main benefit is that it makes the program more readable and easier to modify.

The #define directive starts with # and continues until the newline character. If the whole directive is too long, a backslash can be used at the line break to continue to the next line.

#define OW "The C programming language is invented \
in the 1970s."

In the example above, the backslash at the end of the first line splits the #define instruction into two lines.

#define allows multiple substitutions, i.e. a macro can contain another macro.

#define TWO 2
#define FOUR TWO*TWO

In the example above, FOUR will be replaced with 2*2.
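Because the substitution is purely textual, an unparenthesised replacement such as TWO*TWO can interact with neighbouring operators. The following is a minimal sketch of that effect; FOUR_SAFE and the two helper functions are hypothetical names added for illustration.

```c
#define TWO 2
#define FOUR TWO*TWO           /* expands to 2*2 */
#define FOUR_SAFE (TWO*TWO)    /* hypothetical name: parenthesised variant */

/* 100 / FOUR expands to 100 / 2*2, which is (100 / 2) * 2 = 100, not 25. */
int divide_by_four(void) {
    return 100 / FOUR;
}

/* 100 / FOUR_SAFE expands to 100 / (2*2) = 25. */
int divide_by_four_safe(void) {
    return 100 / FOUR_SAFE;
}
```

Wrapping the replacement text in parentheses is usually the safer habit whenever it contains an operator.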

Note that a macro name is not substituted when it appears inside a string (i.e. in double quotes) or as part of another identifier.

#define TWO 2

// Output TWO
printf("TWO\n");

// Output 22
const int TWOs = 22;
printf("%d\n", TWOs);

In the above example, the TWO inside the double quotes, and the identifier TWOs, are not replaced.

Macros with the same name can be defined repeatedly, and as long as the definitions are the same, there is no problem. If the definitions differ, the compiler reports the redefinition (GCC issues a warning by default; other compilers may treat it as an error).

// Correct
#define FOO hello
#define FOO hello

// error is reported
#define BAR hello
#define BAR world

In the above example, the macro FOO has not changed, so it can be defined repeatedly. The macro BAR has changed, so the redefinition is reported.

Macros with parameters

Basic usage

The power of a macro is that its name can be followed by parentheses specifying that one or more parameters are accepted.

#define SQUARE(X) X*X

In the example above, the macro SQUARE can accept one argument X, replacing it with X*X.

Note that there must be no space between the name of the macro and the left parenthesis; otherwise the parenthesis is treated as the start of the replacement text.

The usage of this macro is as follows.

// Replace with z = 2*2;
z = SQUARE(2);

This is written much like a function, but is not a function; it is a replacement exactly as it is, and will behave differently from a function.

#define SQUARE(X) X*X

// output 19
printf("%d\n", SQUARE(3 + 4));

In the example above, SQUARE(3 + 4) would have output 49 (7*7) if it were a function; the macro is an as-is replacement, so it replaces it with 3 + 4*3 + 4, which ends up outputting 19.

As you can see, as-is substitution can lead to unexpected behaviour. The solution is to parenthesise every parameter, and the replacement text as a whole, when defining macros, which avoids many surprises.

#define SQUARE(X) ((X) * (X))

In the example above, the replaced form of SQUARE(X), with two levels of parentheses, avoids many errors.
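Parentheses fix operator precedence, but not repeated evaluation: the argument text is pasted in twice, so an argument with side effects runs twice. The sketch below illustrates this with a hypothetical counter function next_id(); the helper names are not from the original text.

```c
#define SQUARE(X) ((X) * (X))

static int calls = 0;          /* counts how often next_id() has run */

/* Hypothetical helper with a side effect: returns 1, 2, 3, ... */
int next_id(void) {
    return ++calls;
}

/* SQUARE(next_id()) expands to ((next_id()) * (next_id())),
   so next_id() is called twice and the result is 1 * 2. */
int square_of_next_id(void) {
    calls = 0;
    return SQUARE(next_id());
}

/* Reports how many times next_id() ran during the last call above. */
int call_count(void) {
    return calls;
}
```

A real function would evaluate its argument once and return 1 here, which is one reason to be careful about passing expressions with side effects to macros.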

The parameters of the macro can also be empty.

#define getchar() getc(stdin)

In the example above, the macro getchar() takes no arguments. The parentheses could be left out of the definition, but keeping them makes the macro look like a function call.

In general, macros with arguments are one line. Here are two examples.

#define MAX(x, y) ((x)>(y)? (x):(y))
#define IS_EVEN(n) ((n)%2==0)

If the macro is too long, you can use a backslash (\) to fold the line and write the macro as multiple lines.

#define PRINT_NUMS_TO_PRODUCT(a, b) { \
    int product = (a) * (b); \
    for (int i = 0; i < product; i++) { \
        printf("%d\n", i); \
    } \
}

In the example above, the replacement text is placed inside curly brackets to create a block scope, so that variables declared inside the macro do not pollute the surrounding code.
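A bare { ... } block can misbehave when the macro call is followed by a semicolon inside an if/else. A widely used alternative, sketched here and not part of the original example, wraps the body in do { ... } while (0). The macro name SWAP_INT and the helper order_pair are made up for illustration.

```c
/* The do { ... } while (0) wrapper makes the macro behave like a single
   statement, so "SWAP_INT(a, b);" is safe inside if/else without braces. */
#define SWAP_INT(a, b) do { \
    int tmp_ = (a);         \
    (a) = (b);              \
    (b) = tmp_;             \
} while (0)

/* Puts the smaller value in *lo and the larger in *hi.
   Returns 1 if the values were swapped, 0 if they were already in order. */
int order_pair(int *lo, int *hi) {
    if (*lo > *hi)
        SWAP_INT(*lo, *hi);   /* with a bare { } body, this ';' would orphan the else */
    else
        return 0;
    return 1;
}
```

With a plain braces-only macro, the semicolon after the call would end the if statement and the following else would no longer compile.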

Macros with parameters can also be nested, with one macro containing another macro inside.

#define QUADP(a, b, c) ((-(b) + sqrt((b) * (b) - 4 * (a) * (c))) / (2 * (a)))
#define QUADM(a, b, c) ((-(b) - sqrt((b) * (b) - 4 * (a) * (c))) / (2 * (a)))
#define QUAD(a, b, c) QUADP(a, b, c), QUADM(a, b, c)

The above example is a macro for solving a quadratic equation. Since the equation has two roots, the macro QUAD is first replaced with two other macros, QUADP and QUADM, each of which is then replaced with the expression for one root.

So when do you use macros with parameters and when do you use functions?

Generally speaking, prefer functions; they are more powerful and easier to understand. Macros sometimes produce unexpected substitutions, and they must be written on a single line unless the line breaks are escaped with backslashes, which hurts readability.

Macros have the advantage of being relatively simple: they are essentially text substitution with no data types involved, unlike functions, whose parameter and return types must be declared. Macros also insert the actual code at every point of use, which avoids function-call overhead and can improve performance. Finally, older code made extensive use of macros, especially for simple mathematical operations, so some knowledge of them is needed to read legacy code.

# operator, ## operator

Since macros do not involve data types, the replacement may result in values of various types. If you want the replacement to be a string, prefix the parameter in the replacement text with #.

#define STR(x) #x

// Equivalent to printf("%s\n", "3.14159");
printf("%s\n", STR(3.14159));

In the example above, STR(3.14159) is replaced with "3.14159". Without the # before x, the replacement would be the floating-point number 3.14159; with #, it becomes a string.

Here is another example.

#define XNAME(n) "x"#n

// Output x4
printf("%s\n", XNAME(4));

In the example above, #n converts the argument to a string, which is then concatenated with the preceding string literal to give "x4". Without #, this would be awkward to implement.
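One subtlety of # is that it stringifies the argument exactly as written, without first expanding any macro inside it. A common workaround is a second macro layer. In the sketch below, XSTR, LIMIT and the helper functions are illustrative names and not part of the examples above.

```c
#include <string.h>

#define LIMIT 400
#define STR(x) #x          /* stringifies the argument as written */
#define XSTR(x) STR(x)     /* expands the argument first, then stringifies */

/* STR(LIMIT) yields "LIMIT": the argument is not expanded before #. */
const char *name_of_limit(void) {
    return STR(LIMIT);
}

/* XSTR(LIMIT) yields "400": LIMIT is expanded while passing through XSTR. */
const char *value_of_limit(void) {
    return XSTR(LIMIT);
}

/* Returns 1 when the two strings differ, to make the contrast explicit. */
int strings_differ(void) {
    return strcmp(name_of_limit(), value_of_limit()) != 0;
}
```

The extra layer is only needed when the argument is itself a macro whose value, rather than its name, should end up in the string.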

If the parameter needs to be concatenated with other identifiers to form a new identifier within the replaced text, the ## operator can be used. It acts as a glue, "embedding" the parameter into an identifier.

#define MK_ID(n) i##n

In the above example, n is the argument to the macro MK_ID, which needs to be glued to the identifier i, so the ## operator is used between i and n. Here is an example of how this macro is used.

int MK_ID(1), MK_ID(2), MK_ID(3);
// Replace with
int i1, i2, i3;

In the above example, the replaced text i1, i2 and i3 are three identifiers and the argument n is part of the identifier. As you can see from this example, one of the main uses of the ## operator is to generate variable names and identifiers in bulk.
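The same gluing works for function names. As a rough sketch, the hypothetical macro DEFINE_ADDER below generates a family of small functions whose names embed the argument; none of these names come from the original text.

```c
/* Hypothetical generator: defines a function add_<n>() that adds n. */
#define DEFINE_ADDER(n) \
    int add_##n(int x) { return x + n; }

DEFINE_ADDER(1)    /* defines int add_1(int x) */
DEFINE_ADDER(10)   /* defines int add_10(int x) */

/* Uses both generated functions: (x + 1) + 10. */
int add_eleven(int x) {
    return add_10(add_1(x));
}
```

Here add_ and the argument are pasted into a single identifier, while the plain n in the function body is replaced by the number itself.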

Macros with an indefinite number of arguments

Macros can also accept a variable number of arguments, with ... standing for the remaining arguments.

#define X(a, b, ...) (10*(a) + 20*(b)), __VA_ARGS__

In the above example, X(a, b, ...) means that X() takes at least two arguments, and ... stands for any extra ones. In the replacement text, __VA_ARGS__ represents the extra arguments (separated by commas). Here is an example of usage.

X(5, 4, 3.14, "Hi!", 12)
// Replace with
(10*(5) + 20*(4)), 3.14, "Hi!", 12

Note that ... can only stand for the trailing arguments of a macro; it cannot be written as follows.

// Reports an error
#define WRONG(X, ... , Y) #X #__VA_ARGS__ #Y

In the example above, ... replaces the middle part of the argument, which is not allowed and will report an error.

Preceding __VA_ARGS__ with a # sign will make the output a string.

#define X(...) #__VA_ARGS__

printf("%s\n", X(1, 2, 3)); // Prints "1, 2, 3"
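A typical practical use of variadic macros is wrapping a printf-style function. The sketch below assumes a hypothetical FORMAT macro that forwards its arguments to snprintf; the buffer and helper names are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical wrapper: forwards everything after buf to snprintf.
   Because the format string is part of __VA_ARGS__, the macro also
   works when nothing follows the format. buf must be an array. */
#define FORMAT(buf, ...) snprintf((buf), sizeof(buf), __VA_ARGS__)

char message[64];   /* buffer filled by the helpers below */

/* Writes "id=<id> name=<name>" into message; returns its length. */
int describe(int id, const char *name) {
    return FORMAT(message, "id=%d name=%s", id, name);
}

/* Writes "hello" into message; returns its length. */
int greet(void) {
    return FORMAT(message, "hello");
}
```

Keeping the format string inside ... sidesteps the trailing-comma problem that arises when a macro written as FORMAT(buf, fmt, ...) is called with no extra arguments.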

#undef

The #undef directive is used to undefine macros that have been defined using #define.

#define LIMIT 400
#undef LIMIT

The #undef directive in the above example cancels the macro LIMIT that has already been defined, so that the name LIMIT can later be redefined.

Sometimes you want to redefine a macro, but you are not sure if it has been defined before, so you can first cancel it with #undef and then define it again. This is because macros with the same name will report an error if they are defined differently twice, whereas #undef does not report an error if its argument is a non-existent macro.
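The pattern just described might look like the following sketch. The values and the name NEVER_DEFINED are arbitrary and only serve the illustration.

```c
#define LIMIT 400

/* Later, a different value is wanted. Cancel the old definition first,
   so the new one does not clash with it. */
#undef LIMIT
#define LIMIT 1000

/* #undef on a name that was never defined is harmless. */
#undef NEVER_DEFINED

/* Returns the value LIMIT has at this point in the file. */
int current_limit(void) {
    return LIMIT;
}
```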

GCC's -U option can undefine macros on the command line, which is equivalent to #undef.

gcc -ULIMIT foo.c

The -U argument in the above example undefines the macro LIMIT, which is equivalent to #undef LIMIT inside the source file.

#include

The #include directive inserts the contents of another source file into the current file at compile time. It has two forms.

// Form one
#include <foo.h> // Load the system-supplied file

// Form two
#include "foo.h" // Load the user-supplied file

In form one, the file name is written inside angle brackets, indicating that the file is system-supplied, usually a header from the standard library. The path does not need to be written, because the compiler looks for these files in the system's installation directories.

In form two, the file name is written inside double quotes, indicating that the file is provided by the user. The exact path depends on the compiler's settings, and may be the current directory or the project's working directory. If the file to be included is in another location, the path needs to be specified. Here is an example.

#include "/usr/local/lib/foo.h"

The GCC compiler's -I option can also be used to specify a directory to search for the files named in #include directives.

gcc -Iinclude/ -o code code.c

In the above command, -Iinclude/ specifies that the user's own files are loaded from the include subdirectory of the current directory.

The most common use of #include is to load header files (with the suffix .h) containing function prototypes, see the chapter on Multi-file compilation. The order of multiple #include directives is irrelevant, and it is legal to include the same header file multiple times.

#if... #endif

The #if... #endif directive is used for conditional compilation in the preprocessor. If the condition is met, the enclosed lines are compiled; otherwise the compiler ignores them.

#if 0
const double pi = 3.1415; // will not be compiled
#endif

The 0 after #if in the above example means that the condition is false. Therefore, the enclosed variable definition is ignored by the compiler. The #if 0 idiom is often used to comment out code: any code that is not needed is placed inside #if 0.

The judgement condition following #if is usually an expression. If the value of the expression is not equal to 0, it means that the condition is true and the internal statement is compiled; if the value of the expression is equal to 0, it means that the condition is false and the internal statement is ignored.

#if... #endif can also include the #else directive between them, to specify the statement to be compiled if the judgement condition is not true.

#define FOO 1

#if FOO
printf("defined\n");
#else
printf("not defined\n");
#endif

In the above example, the macro FOO is defined as 1, so #if FOO is true and the program outputs defined. If FOO were 0 or had not been defined, it would output not defined.

You can also add the #elif directive if there is more than one condition.

#if HAPPY_FACTOR == 0
printf("I'm not happy!\n");
#elif HAPPY_FACTOR == 1
printf("I'm just regular\n");
#else
printf("I'm extra happy!\n");
#endif

In the above example, a second condition is specified by #elif. Note that #elif must come before #else. If none of the conditions are met, the #else part is compiled.

In an #if condition, a macro that has not been defined is treated as 0. So if UNDEFINED is an undefined macro, then #if UNDEFINED is false and #if !UNDEFINED is true.
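The two points above, that the condition is an expression and that an undefined name counts as 0, can be sketched as follows. The version numbers and the names HAS_NEW_API, EXPERIMENTAL and MODE are hypothetical.

```c
/* Hypothetical version numbers, for illustration only. */
#define VERSION_MAJOR 2
#define VERSION_MINOR 3

/* The condition is an integer expression evaluated by the preprocessor. */
#if VERSION_MAJOR * 100 + VERSION_MINOR >= 203
#define HAS_NEW_API 1
#else
#define HAS_NEW_API 0
#endif

/* EXPERIMENTAL is never defined here, so it counts as 0 inside #if. */
#if EXPERIMENTAL
#define MODE 2
#elif !EXPERIMENTAL
#define MODE 1
#endif

int has_new_api(void) { return HAS_NEW_API; }
int mode(void)        { return MODE; }
```

Only integer constants and operators are allowed in such conditions; the preprocessor knows nothing about variables or sizeof.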

A common application of #if is to turn on (or off) debug mode.

#define DEBUG 1

#if DEBUG
printf("value of i : %d\n", i);
printf("value of j : %d\n", j);
#endif

In the above example, by setting DEBUG to 1, debug mode is turned on and debug information can be output.

GCC's -D parameter allows you to specify the value of the macro at compile time, so you can easily turn on the debug switch.

gcc -DDEBUG=1 foo.c

In the above example, the -D parameter specifies the macro DEBUG as 1, which is equivalent to specifying #define DEBUG 1 in the code.
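One way to combine the DEBUG switch with a macro, so that debug output disappears entirely from a normal build, is sketched below. DEBUG_PRINT and sum_to are hypothetical names, and the default of 0 is an assumption of this sketch.

```c
#include <stdio.h>

/* Default to 0 unless the build sets it, e.g. gcc -DDEBUG=1 foo.c */
#ifndef DEBUG
#define DEBUG 0
#endif

#if DEBUG
#define DEBUG_PRINT(...) fprintf(stderr, __VA_ARGS__)
#else
#define DEBUG_PRINT(...) ((void)0)   /* expands to a no-op */
#endif

/* Sums 1..n, tracing each step only when DEBUG is non-zero. */
int sum_to(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += i;
        DEBUG_PRINT("i=%d total=%d\n", i, total);
    }
    return total;
}
```

The function returns the same result in both builds; only the tracing on stderr differs.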

#ifdef... #endif

The #ifdef... #endif directive is used to determine if a macro has been defined.

Sometimes a library may be loaded repeatedly in the source code file. To avoid this, an empty macro can be defined in the library file using #define. This macro is used to determine whether the library file has been loaded or not.

#define EXTRA_HAPPY

In the example above, EXTRA_HAPPY is an empty macro.

The source file then uses #ifdef... #endif to check if this macro has been defined.

#ifdef EXTRA_HAPPY
printf("I'm extra happy!\n");
#endif

In the above example, #ifdef checks if the macro EXTRA_HAPPY has been defined. If it already exists, it means that the library file has been loaded and a prompt line is printed.

#ifdef can be used in conjunction with the #else directive.

#ifdef EXTRA_HAPPY
printf("I'm extra happy!\n");
#else
printf("I'm just regular\n");
#endif

In the above example, the #else part is compiled if the macro EXTRA_HAPPY has not been defined before.

#ifdef... #else... #endif can be used to implement conditional loading.

#ifdef MAVIS
#include "foo.h"
#define STABLES 1
#else
#include "bar.h"
#define STABLES 2
#endif

The above example loads a different header file by determining whether the macro MAVIS has been defined or not.

defined operator

The #ifdef directive in the previous section is equivalent to #if defined.

#ifdef FOO
// Equivalent to
#if defined FOO

In the above example, defined is a preprocessing operator that returns 1 if its argument is a defined macro, and 0 otherwise.

Using this syntax, several conditions can be tested in sequence.

#if defined FOO
x = 2;
#elif defined BAR
x = 3;
#endif

One application of this operator is to load different header files for systems of different architectures.

#if defined IBMPC
#include "ibmpc.h"
#elif defined MAC
#include "mac.h"
#else
#include "general.h"
#endif

In the above example, each architecture needs to define its corresponding macro; the code then loads the matching header file.
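defined can also be written with parentheses, as defined(NAME), and combined with !, && and ||. The sketch below uses _WIN32, __APPLE__, __MACH__, __linux__ and __unix__, which common compilers predefine on the respective platforms; PLATFORM_NAME and platform_name are illustrative names.

```c
/* Pick a label based on macros the compiler predefines for the target. */
#if defined(_WIN32)
#define PLATFORM_NAME "windows"
#elif defined(__APPLE__) && defined(__MACH__)
#define PLATFORM_NAME "apple"
#elif defined(__linux__) || defined(__unix__)
#define PLATFORM_NAME "unix-like"
#else
#define PLATFORM_NAME "unknown"
#endif

/* Returns the label chosen at compile time. */
const char *platform_name(void) {
    return PLATFORM_NAME;
}
```

Exactly one branch survives preprocessing, so the function body contains a single string literal.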

#ifndef... #endif

The #ifndef... #endif directive is the opposite of #ifdef... #endif. It tests whether a macro has not been defined, and compiles the enclosed code if so.

#ifdef EXTRA_HAPPY
printf("I'm extra happy!\n");
#endif

#ifndef EXTRA_HAPPY
printf("I'm just regular\n");
#endif

In the above example, #ifdef and #ifndef specify the code that needs to be compiled for each of the two cases, depending on whether the macro EXTRA_HAPPY has been defined before or not.

#ifndef is often used to prevent double loading. For example, to prevent the header file myheader.h from being loaded repeatedly, its #include can be placed inside #ifndef... #endif.

#ifndef MYHEADER_H
#define MYHEADER_H
#include "myheader.h"
#endif

In the example above, the macro MYHEADER_H is the upper-case form of the file name myheader.h. If #ifndef finds that this macro has not been defined, the header file has not been loaded yet, so the enclosed code is processed: it defines the macro MYHEADER_H and loads the header, which prevents the header from being loaded again.
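A more common arrangement, shown here only as a sketch, puts the guard inside the header itself so that every file including it is protected automatically. To keep the example in one file, the would-be contents of myheader.h are pasted twice, simulating two #include directives; struct point and point_sum are made-up contents.

```c
/* ---- first inclusion of the (hypothetical) myheader.h ---- */
#ifndef MYHEADER_H
#define MYHEADER_H

struct point { int x; int y; };

static inline int point_sum(struct point p) {
    return p.x + p.y;
}

#endif /* MYHEADER_H */

/* ---- second inclusion: MYHEADER_H is already defined, so the
   preprocessor skips everything and struct point is not redefined ---- */
#ifndef MYHEADER_H
#define MYHEADER_H

struct point { int x; int y; };

static inline int point_sum(struct point p) {
    return p.x + p.y;
}

#endif /* MYHEADER_H */

/* Uses the declarations that survived from the first inclusion. */
int demo_sum(void) {
    struct point p = { 3, 4 };
    return point_sum(p);
}
```

Without the guard, the second copy would redefine struct point and the file would fail to compile.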

#ifndef is equivalent to #if !defined.

#ifndef FOO
// Equivalent to
#if !defined FOO

Predefined macros

C provides some predefined macros that can be used directly.

  • __DATE__: the compilation date as a string in the format "Mmm dd yyyy" (e.g. Nov 23 2021).
  • __TIME__: the compilation time as a string in the format "hh:mm:ss".
  • __FILE__: the current file name.
  • __LINE__: the current line number.
  • __func__: the name of the enclosing function. Strictly speaking it is a predefined identifier rather than a macro, and it can only be used inside a function.
  • __STDC__: if set to 1, indicates that the current compiler follows the C standard.
  • __STDC_HOSTED__: if set to 1, the current compiler can provide a full standard library; otherwise it is set to 0 (standard libraries for embedded systems are often incomplete).
  • __STDC_VERSION__: the C language version used for compilation, a long integer in the format yyyymmL: 199901L for C99, 201112L for C11 and 201710L for C17.

The following example prints the values of these predefined macros.

#include <stdio.h>

int main(void) {
    printf("This function: %s\n", __func__);
    printf("This file: %s\n", __FILE__);
    printf("This line: %d\n", __LINE__);
    printf("Compiled on: %s %s\n", __DATE__, __TIME__);
    printf("C Version: %ld\n", __STDC_VERSION__);
}

/* The output is as follows

This function: main
This file: test.c
This line: 6
Compiled on: Mar 29 2021 19:19:37
C Version: 201710

*/
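A common use of __FILE__ and __LINE__ is tagging log messages with their source location. The following is a minimal sketch; LOG_HERE, last_message and log_and_get_line are hypothetical names.

```c
#include <stdio.h>

char last_message[256];   /* holds the most recent log line */

/* Hypothetical macro: it has to be a macro rather than a function so that
   __FILE__ and __LINE__ expand at the call site, not inside a helper. */
#define LOG_HERE(msg) \
    snprintf(last_message, sizeof(last_message), "%s:%d: %s", \
             __FILE__, __LINE__, (msg))

/* Logs a message and returns the line number of the LOG_HERE call. */
int log_and_get_line(void) {
    int line = __LINE__ + 1;     /* the LOG_HERE call is on the next line */
    LOG_HERE("disk full");
    return line;
}
```

After the call, last_message holds text of the form file:line: disk full, with the file name and line number of the calling code.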

#line

The #line directive is used to override the predefined macro __LINE__ and change it to a custom line number. Subsequent lines will be counted from the new value of __LINE__.

// Reset the line number of the next line to 300
#line 300

In the above example, the line number of the line immediately following #line 300 will be changed to 300, and subsequent lines will be numbered incrementally from 300.

#line can also override the predefined macro __FILE__ with a custom file name.

#line 300 "newfilename"

In the example above, the next line number is reset to 300 and the filename is reset to newfilename.
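The effect can be observed directly through __LINE__ and __FILE__. In this sketch the number 300 and the name generated.c are arbitrary, and the two helper functions are illustrative.

```c
/* Everything after this directive is numbered as if it started
   at line 300 of a file called generated.c. */
#line 300 "generated.c"
int line_here(void) { return __LINE__; }          /* this line is now 300 */
const char *file_here(void) { return __FILE__; }  /* now "generated.c" */
```

Code generators often emit #line so that compiler messages point back to the original input file rather than to the generated C file.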

#error

The #error directive is used to get the preprocessor to throw an error and terminate compilation.

#if __STDC_VERSION__ != 201112L
#error Not C11
#endif

The above example specifies that if the compiler does not use the C11 standard, the compilation will be aborted and the GCC compiler will report an error like the following.

$ gcc -std=c99 newish.c
newish.c:14:2: error: #error Not C11

The above example, compiled with the C99 standard by GCC, reports an error.

#if INT_MAX < 100000
#error int type is too small
#endif

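In the above example, compilation stops if the maximum value of the int type is less than 100,000. Note that INT_MAX is defined in <limits.h>, which must be included first; otherwise the name is treated as 0 and the error always fires.

The #error directive can also be used in #if... #elif... #else sections.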
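A slightly more defensive version of these checks is sketched below. It includes <limits.h> for INT_MAX and compares the language version with >= rather than !=, so that newer standards also pass; HAVE_C99 and the helper functions are hypothetical names.

```c
#include <limits.h>   /* defines INT_MAX */

/* The C standard guarantees INT_MAX >= 32767, so this never fires
   on a conforming compiler; it only documents the assumption. */
#if INT_MAX < 32767
#error int type is too small
#endif

/* The defined() test covers compilers that do not set __STDC_VERSION__. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define HAVE_C99 1
#else
#define HAVE_C99 0
#endif

int int_is_wide_enough(void) {
    return INT_MAX >= 32767;
}

int have_c99(void) {
    return HAVE_C99;
}
```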

#if defined WIN32
// ...
#elif defined MAC_OS
// ...
#elif defined LINUX
// ...
#else
#error Unsupported operating system
#endif

#pragma

The #pragma directive passes compiler-specific instructions to the compiler. Which pragmas are recognised depends on the compiler.

// Use the C99 standard
#pragma c9x on

On a compiler that recognises this pragma, the above example tells it to compile with the C99 standard. Compilers ignore pragmas they do not recognise.