Pre-processors
Introduction
The C compiler runs a preprocessor over the source code before compiling the program.
The preprocessor first cleans up the code: it removes comments, joins lines that were split with a backslash into one logical line, and so on. It then executes the preprocessor directives, which begin with #. This chapter describes the preprocessor directives in C.
Preprocessing directives can appear anywhere in a program, but by convention they are usually placed at the beginning of the file.
Each preprocessing directive begins with # at the start of a line; the # may be preceded by whitespace (spaces or tabs). Whitespace is also allowed between # and the rest of the directive, but it is generally left out for compatibility with older compilers.
Every preprocessing directive occupies a single line, unless a backslash at the end of the line continues it onto the next one. No semicolon is needed at the end of a directive.
#define
#define is the most common preprocessing directive and is used to replace a specified word with other text. It takes two arguments: the first is the name to be replaced, and everything after it is the replacement text. Each substitution rule is called a macro.
#define MAX 100
In the above example, #define specifies that every occurrence of MAX in the source code is to be replaced with 100. MAX is then called a macro.
Macro names cannot contain spaces and must follow the C rules for identifiers: only letters, digits and underscores (_) are allowed, and the first character cannot be a digit.
Macro replacement is purely textual: the name is replaced with exactly the text given in the definition.
#define HELLO "Hello, world"
// Equivalent to printf("%s", "Hello, world");
printf("%s", HELLO);
In the above example, the macro HELLO is replaced with "Hello, world" as is.
The #define directive can appear anywhere in the source file and is in effect from the point where it appears to the end of that file. It is customary to place #define directives at the head of the source file, which makes the program more readable and easier to modify.
The #define directive starts with # and continues until the newline character. If the directive is too long, a backslash at the end of the line continues it onto the next line.
#define OW "The C programming language was invented \
in the 1970s."
In the example above, the backslash at the end of the first line splits the #define directive across two lines.
#define allows nested substitutions, i.e. the replacement text of a macro can contain another macro.
#define TWO 2
#define FOUR TWO*TWO
In the example above, FOUR will be replaced with 2*2.
Note that if the macro name appears inside a string (i.e. in double quotes) or is part of a longer identifier, no substitution takes place.
#define TWO 2
// Output TWO
printf("TWO\n");
// Output 22
const int TWOs = 22;
printf("%d\n", TWOs);
In the above example, neither the TWO inside the double quotes nor the identifier TWOs is replaced.
Macros with the same name can be defined repeatedly without any problem, as long as the definitions are identical. If the definitions differ, the compiler reports the redefinition (GCC, for example, issues a warning by default).
// Correct
#define FOO hello
#define FOO hello
// The redefinition is reported
#define BAR hello
#define BAR world
In the above example, the macro FOO is unchanged, so it can be defined repeatedly; the macro BAR is given a different definition, so the compiler reports the redefinition.
Macros with parameters
Basic usage
The power of macros is that the name can be followed by parentheses that specify one or more parameters.
#define SQUARE(X) X*X
In the example above, the macro SQUARE accepts one argument X and replaces it with X*X.
Note that there must be no space between the macro name and the left parenthesis.
The usage of this macro is as follows.
// Replace with z = 2*2;
z = SQUARE(2);
This looks much like a function call, but it is not one; it is a purely textual replacement and can behave differently from a function.
#define SQUARE(X) X*X
// output 19
printf("%d\n", SQUARE(3 + 4));
In the example above, SQUARE(3 + 4) would output 49 (7*7) if it were a function. Because a macro is a textual replacement, it expands to 3 + 4*3 + 4, which outputs 19.
As you can see, textual substitution can lead to unexpected behaviour. The solution is to parenthesize generously when defining macros: wrap every use of a parameter, and the replacement text as a whole, in round brackets.
#define SQUARE(X) ((X) * (X))
In the example above, the replacement text of SQUARE(X) uses two levels of parentheses, which avoids many errors.
The parameters of the macro can also be empty.
#define getchar() getc(stdin)
In the example above, the macro getchar() has an empty parameter list. The macro could also be defined without the parentheses, in which case it would be used without them as well, but the parentheses make it look more like a function.
In general, macros with parameters fit on one line. Here are two examples.
#define MAX(x, y) ((x)>(y)? (x):(y))
#define IS_EVEN(n) ((n)%2==0)
If the macro is too long, you can use a backslash (\) to break the line and write the macro over multiple lines.
#define PRINT_NUMS_TO_PRODUCT(a, b) { \
int product = (a) * (b); \
for (int i = 0; i < product; i++) { \
printf("%d\n", i); \
} \
}
In the example above, the replacement text is placed inside curly brackets. This creates a block scope and prevents variables inside the macro from polluting the surrounding code.
Macros with parameters can also be nested, with one macro containing another macro inside.
#define QUADP(a, b, c) ((-(b) + sqrt((b) * (b) - 4 * (a) * (c))) / (2 * (a)))
#define QUADM(a, b, c) ((-(b) - sqrt((b) * (b) - 4 * (a) * (c))) / (2 * (a)))
#define QUAD(a, b, c) QUADP(a, b, c), QUADM(a, b, c)
The above example is a macro for solving a quadratic equation. Since the equation has two roots, one using the positive and one the negative square root, the macro QUAD is first replaced with the two macros QUADP and QUADM, each of which is then replaced with one root.
So when do you use macros with parameters and when do you use functions?
Generally speaking, prefer functions; they are more powerful and easier to understand. Macros sometimes produce unexpected substitutions, and they normally have to be written on a single line; they can be continued with backslashes, but readability suffers.
Macros have the advantage of being relatively simple: they are essentially text substitution with no data types involved, unlike functions, whose parameter and return types must be declared. Macros also paste the actual code into every place they are used, which eliminates function-call overhead and can improve performance. Finally, older code made extensive use of macros, especially for simple mathematical operations, so some knowledge of them is needed in order to read it.
# operator, ## operator
Since macros do not involve data types, the replacement may produce values of various types. If you want the replaced value to be a string, prefix the parameter in the replacement text with #.
#define STR(x) #x
// Equivalent to printf("%s\n", "3.14159");
printf("%s\n", STR(3.14159));
In the example above, STR(3.14159) will be replaced with "3.14159". Without the # before x, the result would be interpreted as a floating-point number; with #, it is converted to a string.
Here is another example.
#define XNAME(n) "x"#n
// Output x4
printf("%s\n", XNAME(4));
In the example above, #n turns the argument into a string, which is then concatenated with the preceding string literal to give the final output "x4". Without #, this would be awkward to implement.
If, within the replacement text, a parameter needs to be joined with other characters to form a new identifier, the ## operator can be used. It acts as glue, "embedding" the parameter into an identifier.
#define MK_ID(n) i##n
In the above example, n is the parameter of the macro MK_ID and needs to be glued to the identifier i, so the ## operator is placed between i and n. Here is an example of how this macro is used.
int MK_ID(1), MK_ID(2), MK_ID(3);
// Replace with
int i1, i2, i3;
In the above example, the replaced text i1, i2 and i3 are three identifiers, and the argument n forms part of each one. As this example shows, one of the main uses of the ## operator is to generate variable names and other identifiers in bulk.
Macros with a variable number of arguments
Macros can also accept a variable number of arguments, with ... standing for the remaining arguments.
#define X(a, b, ...) (10*(a) + 20*(b)), __VA_ARGS__
In the above example, X(a, b, ...) means that X() takes at least two arguments, and any extra arguments are represented by the trailing .... In the replacement text, __VA_ARGS__ stands for those extra arguments (separated by commas). Here is an example of usage.
X(5, 4, 3.14, "Hi!", 12)
// Replace with
(10*(5) + 20*(4)), 3.14, "Hi!", 12
Note that ... can only stand for the trailing arguments of a macro; it cannot be written like the following.
// reports an error
#define WRONG(X, ... , Y) #X #__VA_ARGS__ #Y
In the example above, ... appears in the middle of the parameter list, which is not allowed and will be reported as an error.
Preceding __VA_ARGS__ with a # sign turns the arguments into a string.
#define X(...) #__VA_ARGS__
printf("%s\n", X(1,2,3)); // Prints "1,2,3"
#undef
The #undef directive is used to undefine macros that have been defined using #define.
#define LIMIT 400
#undef LIMIT
The #undef directive in the above example cancels the macro LIMIT that has already been defined, so that LIMIT can later be defined again as a new macro.
Sometimes you want to redefine a macro but are not sure whether it has been defined before, so you can first cancel it with #undef and then define it again. This works because defining a macro of the same name twice with different definitions is reported by the compiler, whereas #undef does not complain if its argument is a macro that does not exist.
GCC's -U option can undefine macros on the command line, which is equivalent to #undef.
gcc -ULIMIT foo.c
The -U option in the above example undefines the macro LIMIT, which is equivalent to #undef LIMIT inside the source file.
#include
The #include directive is used to load other source files into the current file at compile time. It has two forms.
// Form one
#include <foo.h> // load a system-supplied file
// Form two
#include "foo.h" // load a user-supplied file
In form one, the filename is written inside angle brackets, indicating that the file is supplied by the system, usually a header from the standard library. No path needs to be written, because the compiler looks for these files in the system's designated installation directories.
In form two, the filename is written inside double quotes, indicating that the file is provided by the user. The exact search path depends on the compiler's settings, and may be the current directory or the project's working directory. If the file to be included is in another location, the path needs to be specified. Here is an example.
#include "/usr/local/lib/foo.h"
The GCC compiler's -I option can also be used to specify a directory in which to look for the files named in #include directives.
gcc -Iinclude/ -o code code.c
In the above command, -Iinclude/ specifies that the user's own files are loaded from the include subdirectory of the current directory.
The most common use of #include is to load header files (with the suffix .h) containing function prototypes; see the chapter on Multi-file compilation. The order of multiple #include directives generally does not matter, and it is legal to include the same header file multiple times (the standard headers are written to allow this).
#if... #endif
The #if... #endif directive is used for conditional checks in the preprocessor. If the condition is met, the enclosed lines are compiled; otherwise they are ignored by the compiler.
#if 0
const double pi = 3.1415; // will not be compiled
#endif
The 0 after #if in the above example means that the condition is false, so the enclosed variable definition is ignored by the compiler. Writing #if 0 is often used as a form of commenting out: any code that is temporarily not needed is placed inside #if 0.
The condition following #if is usually an expression. If the value of the expression is not 0, the condition is true and the enclosed statements are compiled; if the value is 0, the condition is false and the enclosed statements are ignored.
An #else directive can be placed between #if and #endif to specify the statements to be compiled when the condition is false.
#define FOO 1
#if FOO
printf("defined\n");
#else
printf("not defined\n");
#endif
In the above example, the macro FOO is replaced with 1, so the condition is true and the program outputs defined. If FOO were 0, or had not been defined at all, the #else branch would be compiled instead and the program would output not defined.
If there is more than one condition, you can also add #elif directives.
#if HAPPY_FACTOR == 0
printf("I'm not happy!\n");
#elif HAPPY_FACTOR == 1
printf("I'm just regular\n");
#else
printf("I'm extra happy!\n");
#endif
In the above example, a second condition is specified with #elif. Note that #elif must come before #else. If none of the conditions is met, the #else part is compiled.
Inside an #if condition, a macro that has not been defined is treated as 0. So if UNDEFINED is an undefined macro, then #if UNDEFINED is false and #if !UNDEFINED is true.
A common application of #if is to turn debug mode on or off.
#define DEBUG 1
#if DEBUG
printf("value of i : %d\n", i);
printf("value of j : %d\n", j);
#endif
In the above example, setting DEBUG to 1 turns on debug mode so that debug information is output.
GCC's -D option lets you specify the value of a macro at compile time, which makes it easy to turn on the debug switch.
gcc -DDEBUG=1 foo.c
In the above example, the -D option sets the macro DEBUG to 1, which is equivalent to writing #define DEBUG 1 in the code.
#ifdef... #endif
The #ifdef... #endif directive is used to determine whether a macro has been defined.
Sometimes a library may be loaded repeatedly in a source file. To avoid this, an empty macro can be defined in the library file using #define. This macro is then used to determine whether the library file has already been loaded.
#define EXTRA_HAPPY
In the example above, EXTRA_HAPPY is an empty macro.
The source file then uses #ifdef... #endif to check whether this macro has been defined.
#ifdef EXTRA_HAPPY
printf("I'm extra happy!\n");
#endif
In the above example, #ifdef checks whether the macro EXTRA_HAPPY has been defined. If it has, the library file has been loaded and a message is printed.
#ifdef can be used in conjunction with the #else directive.
#ifdef EXTRA_HAPPY
printf("I'm extra happy!\n");
#else
printf("I'm just regular\n");
#endif
In the above example, the #else part is compiled if the macro EXTRA_HAPPY has not been defined.
#ifdef... #else... #endif can be used to implement conditional loading.
#ifdef MAVIS
#include "foo.h"
#define STABLES 1
#else
#include "bar.h"
#define STABLES 2
#endif
The above example loads a different header file depending on whether the macro MAVIS has been defined.
defined operator
The #ifdef directive in the previous section is equivalent to #if defined.
#ifdef FOO
// Equivalent to
#if defined FOO
In the above example, defined is a preprocessing operator that evaluates to 1 if its argument is a defined macro, and 0 otherwise.
With this syntax, several conditions can be checked in sequence.
#if defined FOO
x = 2;
#elif defined BAR
x = 3;
#endif
One application of this operator is to load different header files for systems of different architectures.
#if defined IBMPC
#include "ibmpc.h"
#elif defined MAC
#include "mac.h"
#else
#include "general.h"
#endif
In the above example, each system architecture needs to define its corresponding macro. The code then loads the matching header file according to which macro is defined.
#ifndef... #endif
The #ifndef... #endif directive is the opposite of #ifdef... #endif. It checks whether a macro has not been defined, and performs the specified action if so.
#ifdef EXTRA_HAPPY
printf("I'm extra happy!\n");
#endif
#ifndef EXTRA_HAPPY
printf("I'm just regular\n");
#endif
In the above example, #ifdef and #ifndef specify the code to be compiled in each of the two cases, depending on whether the macro EXTRA_HAPPY has been defined.
#ifndef is often used to prevent double loading. For example, to prevent the header file myheader.h from being loaded repeatedly, its #include can be placed inside #ifndef... #endif.
#ifndef MYHEADER_H
#define MYHEADER_H
#include "myheader.h"
#endif
In the example above, the macro MYHEADER_H is the file name myheader.h written in capitals. If #ifndef finds that this macro has not been defined, the header file has not been loaded yet, so the enclosed lines are processed: they define the macro MYHEADER_H, which prevents the file from being loaded again, and load the header.
#ifndef is equivalent to #if !defined.
#ifndef FOO
// Equivalent to
#if !defined FOO
Predefined macros
C provides some predefined macros that can be used directly.
__DATE__: the compilation date, as a string in the format "Mmm dd yyyy" (e.g. Nov 23 2021).
__TIME__: the compilation time, as a string in the format "hh:mm:ss".
__FILE__: the current file name.
__LINE__: the current line number.
__func__: the name of the function currently being executed. It must be used inside a function.
__STDC__: if set to 1, indicates that the current compiler follows the C standard.
__STDC_HOSTED__: if set to 1, the current compiler provides a full standard library; otherwise it is set to 0 (standard libraries for embedded systems are often incomplete).
__STDC_VERSION__: the C language version used for compilation, a long integer in the format yyyymmL: 199901L for C99, 201112L for C11 and 201710L for C17.
The following example prints the values of these predefined macros.
#include <stdio.h>
int main(void) {
printf("This function: %s\n", __func__);
printf("This file: %s\n", __FILE__);
printf("This line: %d\n", __LINE__);
printf("Compiled on: %s %s\n", __DATE__, __TIME__);
printf("C Version: %ld\n", __STDC_VERSION__);
}
/* The output is as follows
This function: main
This file: test.c
This line: 7
Compiled on: Mar 29 2021 19:19:37
C Version: 201710
*/
#line
The #line directive is used to override the predefined macro __LINE__ with a custom line number. Subsequent lines are counted from the new value of __LINE__.
// Reset the line number of the next line to 300
#line 300
In the above example, the line immediately following #line 300 is numbered 300, and subsequent lines are numbered incrementally from there.
#line can also override the predefined macro __FILE__ with a custom file name.
#line 300 "newfilename"
In the example above, the line number of the next line is reset to 300 and the file name is reset to newfilename.
#error
The #error directive makes the preprocessor raise an error and abort compilation.
#if __STDC_VERSION__ != 201112L
#error Not C11
#endif
The above example specifies that if the compiler is not using the C11 standard, compilation is aborted. GCC reports an error like the following.
$ gcc -std=c99 newish.c
newish.c:14:2: error: #error Not C11
In the above example, compiling with GCC under the C99 standard produces the error.
#include <limits.h>
#if INT_MAX < 100000
#error int type is too small
#endif
In the above example, the compiler stops compiling as soon as it finds that the maximum value of the int type is less than 100,000. The header limits.h must be included first, because it is what defines INT_MAX.
The #error directive can also be used in the branches of #if... #elif... #else.
#if defined WIN32
// ...
#elif defined MAC_OS
// ...
#elif defined LINUX
// ...
#else
#error This operating system is not supported
#endif
#pragma
The #pragma directive is used to adjust compiler settings. Which pragmas are available depends on the compiler.
// Use the C99 standard
#pragma c9x on
On a compiler that supports this pragma, the above example tells it to compile with the C99 standard.