Multi-file projects

Introduction

A software project often contains multiple source files, which must be compiled together to produce a single executable file.

Suppose a project has two source files, foo.c and bar.c, where foo.c is the main file and bar.c is the library file. The "main" file is the project's entry file: it contains the main() function, which calls the functions defined in the library file.

// File foo.c
#include <stdio.h>

int main(void) {
    printf("%d\n", add(2, 3)); // 5
}

In the above code, the main file foo.c calls the function add(), which is defined inside the library file bar.c.

// File bar.c

int add(int x, int y) {
    return x + y;
}

Now, compile the two files together.

$ gcc -o foo foo.c bar.c

# A more convenient way to write it
$ gcc -o foo *.c

In the above command, the -o argument to gcc specifies the filename of the resulting binary executable, in this case foo.

When this command is run, the compiler issues a warning (newer GCC versions treat this as an error by default) because, while compiling foo.c, it encounters a call to add(), a function that has no prototype or definition in foo.c. It is therefore a good idea to add the prototype of add() at the top of foo.c.

// File foo.c
#include <stdio.h>

int add(int, int);

int main(void) {
    printf("%d\n", add(2, 3)); // 5
}

Now the program compiles without the warning.

You might immediately notice that if several files use add(), each of them needs to include the function prototype. As soon as add() is modified (e.g. to change the number of arguments), every one of those files has to be updated by hand, which is cumbersome. So it is common practice to create a dedicated header file bar.h that holds the prototypes of all the functions defined in bar.c.

// File bar.h

int add(int, int);

Then use the #include directive to load the header file bar.h in each source file that uses the function.

// File foo.c

#include <stdio.h>
#include "bar.h"

int main(void) {
    printf("%d\n", add(2, 3)); // 5
}

In the code above, #include "bar.h" includes the header file bar.h. The file name is written in double quotes rather than angle brackets, which indicates that the file is user-supplied; no path is given, which indicates that it is in the same directory as the current source file.

It is then a good idea to load this header file inside bar.c as well, so that the compiler can verify that the function prototype is consistent with the function definition.

// File bar.c
#include "bar.h"

int add(int a, int b) {
    return a + b;
}

Now recompile and you'll get the binary executable without any problems.

$ gcc -o foo foo.c bar.c

Duplicate loading

Other header files can be loaded inside the header file, so it is possible to create duplicate loads. For example, a.h and b.h both load c.h, and then foo.c loads both a.h and b.h, which means that foo.c will compile c.h twice.

It is best to avoid this duplicate loading. While it is not an error to declare the same function prototype multiple times, some constructs cause an error when repeated, such as defining the same struct type more than once. A common solution is to define a special macro in the header file; if that macro already exists when the file is loaded, the rest of the file is skipped.

// File bar.h
#ifndef BAR_H
#define BAR_H
int add(int, int);
#endif

In the example above, the header file bar.h sets up a conditional check using #ifndef and #endif. Whenever this header file is loaded, the preprocessor checks whether the macro BAR_H has been defined. If it has, the header file has already been loaded and its contents are skipped; otherwise the macro is defined and the function prototype is loaded.

extern specifier

The current file can also use variables defined in other files. In that case the extern specifier is used to declare, in the current file, that the variable is defined elsewhere.

extern int myVar;

In the example above, the extern specifier tells the compiler that the variable myVar is defined in another source file and that no memory needs to be allocated for it here.

Since no memory is allocated, an extern declaration of an array does not need to give the length of the array.

extern int a[];

This declaration of shared variables can be written directly inside the source file or placed in the header file and loaded via the #include directive.

static specifier

Under normal circumstances, global variables defined in the current file can be used by other files. Sometimes this is undesirable, and a variable should be restricted to the current file so that other files cannot reference it.

In this case, you can use the static keyword when declaring the variable to make it private to the current file.

static int foo = 3;

In the above example, the variable foo can only be used inside the current file and cannot be referenced by other files.

Compilation strategy

When all the source files of a project are compiled together in a single command, every file is recompiled each time. Even if only one line has changed, the whole project is compiled from scratch, which is very time-consuming.

To save time, it is common practice to split the compilation into two steps. In the first step, each source file is compiled separately into an object file using GCC's -c option. In the second step, all object files are linked together into a single binary executable.

$ gcc -c foo.c # Generate foo.o
$ gcc -c bar.c # Generate bar.o

# A more convenient way to write it
$ gcc -c *.c

The above command generates the object files foo.o and bar.o for the source files foo.c and bar.c respectively.

Object files are not executable; they are only an intermediate stage of the compilation process. An object file has the same name as its source file, with the suffix changed to .o.

Once you have all the object files, use the gcc command again to merge them into one executable file by linking them.

$ gcc -o foo foo.o bar.o

# A more convenient way to write it
$ gcc -o foo *.o

Later, whenever a source file is modified, only that file needs to be recompiled into an object file. The other files do not need to be recompiled, because their existing object files can be reused; the final step is simply to link all the object files again. Since linking takes much less time than compiling, this saves a lot of time.

make command

Compiling large projects can be very cumbersome and error-prone if done manually. This is usually done using a special automated compilation tool, such as make.

make is a command-line tool that automatically searches the current directory for the configuration file makefile (which can also be written as Makefile). This file defines all the compilation rules, each of which corresponds to a compilation product. To build a compilation product, make needs to know two things.

  • Dependencies (which files are needed to generate this compilation product)
  • Generation command (the command that generates this compilation product)

For example, the object file foo.o is a compilation product whose dependency is foo.c and the generation command is gcc -c foo.c. The corresponding compilation rules are as follows.

foo.o: foo.c
	gcc -c foo.c

In the above example, the compilation rule consists of two lines. The first line starts with the compilation product, followed by its dependencies after the colon, and the second line contains the generation command.

Note that the second line must be indented with a Tab character; if you use spaces instead, make reports an error.

The complete configuration file makefile consists of several compilation rules, and may look like the following.

foo: foo.o bar.o
	gcc -o foo foo.o bar.o

foo.o: bar.h foo.c
	gcc -c foo.c

bar.o: bar.h bar.c
	gcc -c bar.c

The above is a sample makefile. It contains three compilation rules corresponding to three compilation products (foo, foo.o and bar.o), each separated by a blank line.

With a makefile in place, specifying a compilation target (the name of a compilation product) after the make command automatically invokes the corresponding compilation rule.

$ make foo.o

# or
$ make bar.o

# or
$ make foo

In the above example, the make command generates a different compilation product depending on the target it is given.

If the compilation target is omitted, the make command will execute the first compilation rule and build the corresponding product.

$ make

In the above example, make is not followed by a compilation target, so the first compilation rule of the makefile is executed, which here is equivalent to make foo. Since the user usually expects the final executable after running make, it is advisable to put the rule for the final executable first in the makefile. Apart from that, the makefile does not require the compilation rules to be in any particular order.

The power of the make command is that it does not recompile everything each time it runs, but first checks whether recompilation is necessary. It does this by comparing the modification timestamp of each compilation product with the timestamps of its dependencies, to determine which files have changed since the last build. The affected compilation products (i.e. those that depend directly or indirectly on the changed source files) are rebuilt, and the unaffected ones are left alone.

For example, suppose that after the last build foo.c was modified, but bar.c and bar.h were not. When you re-run make foo, make finds that bar.c and bar.h are unchanged, so it does not recompile bar.o and only recompiles foo.o. Once the new foo.o exists, it is linked with bar.o to produce the new executable foo.

The great advantage of this design is that Make automatically handles the compilation process and only recompiles the changed files, thus saving a lot of time.