Understanding the C program compilation process with GCC | by Cristian Camilo Peralta | Oct, 2021


Cristian Camilo Peralta


C is a general-purpose, procedural programming language that features economy of expression, modern control flow and data structures, and a rich set of operators.

As a successor to the B programming language, C was originally designed by Dennis Ritchie for, and implemented on, the UNIX operating system. It supports structured programming, lexical variable scope, and recursion. C is not a very high-level language, nor a big one; nevertheless, as Brian W. Kernighan and Dennis Ritchie wrote in their best-known book, The C Programming Language (1978):

“its absence of restrictions and its generality make it more convenient and effective for many tasks than supposedly more powerful languages.”

The growing popularity of C, the changes to the language over the years, and programmers’ need to translate source code from a human-readable “high-level programming language” into a “lower-level programming language” in order to create an executable program all led different groups to build C compilers. One of them is none other than GCC.

But first of all, many of you might wonder: what is a compiler? The answer is simple but instructive. In computing, a compiler is a program that translates code written in one programming language (the source language) into another language (the target language). For C, the target is usually machine code for a specific processor and operating system, which is why the same source file has to be recompiled for each platform it should run on. To get there, a compiler performs several operations in sequence.

Compiling a C program is therefore a multi-stage process with four basic steps:

Preprocessing.

Compilation.

Assembly.

Linking.

The GNU Compiler Collection (GCC) is an optimizing compiler produced by the GNU Project supporting various programming languages, hardware architectures and operating systems.

Because C is a compiled language, its source code is written as a plain text file in any editor of the programmer’s choice, such as Vi, Emacs, or Visual Studio Code. Once the file is written, it has to be compiled into machine code. The commands below invoke the compiler as cc, which on many Linux systems runs GCC.

Let’s look at each stage in turn:

Preprocessing:

In this stage, the preprocessor takes the source code as input, removes all comments, expands macros, and replaces each #include directive with the contents of the named file. It also drops sections of code excluded by conditional directives such as #if and #ifdef. Lines starting with a # character are interpreted by the preprocessor as preprocessor directives; together, these directives form a simple macro language with its own syntax and semantics.

To print the result of the preprocessing stage, pass the -E option to cc, as in cc -E hello_world.c.

The -E option writes the preprocessed result to standard output: the contents of the stdio.h header file (and of any headers it pulls in) pasted where the #include line was, together with the rest of hello_world.c, with comments removed and macros expanded.
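A minimal sketch of what the preprocessor does with comments, conditional directives, and macros. The file name macros.c and the names USE_SQUARE, SQUARE, and square_of_sum are hypothetical, chosen only for illustration; running cc -E macros.c would show the expanded text. The comments describe the expansion rather than reproducing exact -E output, which also contains line markers and the contents of any included headers.

```c
/* macros.c: hypothetical example for the preprocessing stage. */

#define USE_SQUARE 1              /* object-like macro, tested by #if below */
#define SQUARE(x) ((x) * (x))     /* function-like macro */

/* This comment, like every comment, is removed by the preprocessor. */

#if USE_SQUARE
/* After preprocessing, the return statement below reads:
       return ((a + b) * (a + b));
   the argument "a + b" is substituted textually for x. */
int square_of_sum(int a, int b) {
    return SQUARE(a + b);
}
#else
/* USE_SQUARE is 1, so this whole branch is dropped during
   preprocessing and never reaches the compiler. */
int square_of_sum(int a, int b) {
    return (a + b) + (a + b);
}
#endif
```

The parentheses in SQUARE matter because substitution is purely textual: a definition without them, #define SQUARE(x) x * x, would expand SQUARE(a + b) to a + b * a + b, which is not the square of the sum.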

Compilation:

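In this stage, the preprocessed code is translated into assembly instructions specific to the target processor architecture. Assembly is an intermediate, human-readable language. Some compilers use an integrated assembler that generates machine code directly, skipping the assembly text; GCC normally writes assembly and then hands it to a separate assembler (as, from GNU Binutils). The existence of this step is what allows C code to contain inline assembly instructions: the compiler copies them into its output, and the assembler reads and translates them along with the rest of the code.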

To save the result of the compilation stage, pass the -S option to cc, as in cc -S hello_world.c.

This will create a file named hello_world.s, containing the generated assembly instructions.
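The inline-assembly point can be illustrated with GCC’s extended asm syntax, which is a GNU extension rather than standard C (Clang accepts it too). The sketch below uses an empty assembly template so that it builds on any target architecture; the file name inline_asm.c and the function name opaque_identity are hypothetical. After cc -S inline_asm.c, the (empty) template text is copied into inline_asm.s, typically between compiler-generated comment markers such as #APP and #NO_APP.

```c
/* inline_asm.c: hypothetical example for the compilation stage.
   Requires GCC or Clang, since extended asm is not ISO C. */

/* Returns x unchanged, but routes it through an inline asm statement.
   The template "" contains no instructions, so this works on any
   architecture; the compiler still copies the template into its
   assembly output for the assembler to process. */
int opaque_identity(int x) {
    /* "+r"(x): x is both read and written and must be held in a
       register. GCC cannot see inside the asm, so it has to assume x
       may have changed and cannot constant-fold through this point. */
    __asm__ ("" : "+r"(x));
    return x;
}
```

Calling opaque_identity(42) simply returns 42; the interesting part is what appears in the .s file, not the result.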

Assembly:

During this stage, an assembler translates the assembly instructions into object code. The output consists of actual machine instructions to be run by the target processor. Passing the -c option to cc, as in cc -c hello_world.c, runs every stage up to and including assembly and creates a file named hello_world.o containing the program’s object code in binary form.
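One way to see what the assembly stage produces is to compile a file with -c and list the symbols in the resulting object file with nm, a GNU Binutils tool. In the hypothetical file below (arith.c and all function names are illustrative), add and quadruple have external linkage and should appear as defined text symbols (T in nm’s output), while the static helper twice is local (t) if the compiler emits it at all; with optimization enabled it may be inlined away. No main function is needed, because cc -c stops before linking.

```c
/* arith.c: hypothetical example for the assembly stage.
   Build the object file only (no linking):  cc -c arith.c
   Inspect its symbol table:                 nm arith.o     */

/* External linkage: exported from arith.o so that other object files
   can call it once everything is linked together. */
int add(int a, int b) {
    return a + b;
}

/* Internal linkage (static): visible only inside this object file. */
static int twice(int x) {
    return add(x, x);
}

/* External linkage; uses the local helper twice. */
int quadruple(int x) {
    return twice(twice(x));
}
```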

Linking:

During this stage, the linker takes the object code generated in the assembly stage. That code is made of machine instructions the processor understands, but some pieces of the program are still missing or not yet in their final place: calls to functions defined elsewhere, such as in the C standard library, are unresolved references, and addresses have not been fixed. To produce an executable program, the linker combines the object files, fills in the missing pieces from libraries, and adjusts the addresses so that everything fits together. This process is called linking.

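In the case of the “Hello, World!” program, the linker resolves the call to the puts function against the C standard library (with dynamic linking, the code for puts itself is loaded from the shared library when the program starts). The result of this stage is the final executable program. By default, cc names this file a.out; to name it something else, pass the -o option, as in cc -o hello_world hello_world.c.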
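The reference to puts can be made concrete with a small sketch. The file name greet.c and the function name greet are hypothetical, and the block has no main so that it can be compiled on its own with cc -c greet.c; nm greet.o should then list puts as undefined (U), because only a declaration from stdio.h is available at that point. Linking against the C standard library, which cc does by default, is what supplies the definition.

```c
/* greet.c: hypothetical example for the linking stage. */
#include <stdio.h>   /* declares puts, but does not define it */

/* Prints a greeting. Until link time, the call to puts is an unresolved
   reference in greet.o; the linker resolves it against the C standard
   library. */
int greet(void) {
    /* puts appends a newline and returns a non-negative value on
       success, or EOF on error. */
    return puts("Hello, World!");
}
```

A complete program would call greet from a main function, for example int main(void) { return greet() < 0; }, and a single cc -o command would then run all four stages, ending with the link step that pulls in the C library.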

If you want to know more, you can always consult the gcc man page (man gcc), the GCC online manual, or one of the many tutorials on the internet.
