History of UNIX Project Build Tools

(The following is derived from the HACKING.txt file of the old open source project which I stopped supporting many years ago. R.I.P.)

You might have noticed above that there are SIX STEPS required to do a rebuild after editing configure.in. Why is it so complicated?

You might remember the days when all the dependencies and rules were encapsulated in one file (called Makefile) and no matter what you changed (including the Makefile itself) the make command would figure out how to rebuild everything. That's not true anymore.

The original Makefile model worked well before the proliferation of many different types of UNIX and the advent of cross-platform compatibility.

To handle all the different types of Unix, the Makefile has to be complex, so complex that it is no longer practical to edit by hand. Furthermore, many things have to be figured out by the machine where the program is going to be built, rather than the machine where the programmer developed it. So the "meta-rules" for creating Makefiles also got too complicated to edit by hand. Eventually there were many different types of source files, and many different rules for what to do when you want to change something. The best way to explain this is to go through the history in chronological order, starting before the origin of Makefile and make.

Originally, all changes were made just by changing C source code (in the foo.c and foo.h files) and typing cc to compile it into a binary:

    sources -.
              \
               cc
                \
                 `-> binary

Then came compiling and linking as separate steps. Now, if you changed just one foo.c file, you only had to recompile that one file and then link, but if you changed a foo.h file you probably had to recompile everything. To save time, people figured out how to write a list of rules telling which ".c" files depend on which ".h" files. The make tool was developed: a program that automatically figures out what needs to be recompiled. make takes a new source file, called Makefile, and also uses all the program source files. Rebuilding still consisted of just one step:

    Makefile and -.
         sources   \
                    make
                     \
                      `-> binary
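The decision make has to take for each target can be sketched in a few lines. The following is only an illustration in Python of the timestamp comparison, not how make itself is implemented; the function name needs_rebuild and the file names are made up for the example.

```python
import os
import tempfile


def needs_rebuild(target, dependencies):
    """Return True if target is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_time for dep in dependencies)


# Demonstration with throwaway files. Timestamps are set explicitly so the
# outcome does not depend on how quickly the files are written.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "foo.c")
header = os.path.join(workdir, "foo.h")
obj = os.path.join(workdir, "foo.o")
for path in (source, header, obj):
    open(path, "w").close()

os.utime(source, (1000, 1000))
os.utime(header, (1000, 1000))
os.utime(obj, (2000, 2000))      # object file is newer than both inputs
up_to_date = not needs_rebuild(obj, [source, header])

os.utime(header, (3000, 3000))   # simulate editing foo.h afterwards
stale_after_edit = needs_rebuild(obj, [source, header])
```

Here foo.o is first considered up to date, and becomes stale as soon as foo.h carries a newer timestamp. The list of dependencies passed in plays the role of the rules written in the Makefile.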

make wasn't quite as smart as it should have been. For example, there's no way to get it to check if Makefile itself was changed. To work around this, programmers started adding a "target" called "clean" (or something similar) that removes all the object files. Then, if you change Makefile (for example, after realizing you left out an "#include" dependency) you can type make clean to force it to remove the objects, and make to recompile everything. Commands like make clean are still used today.

After a few years, different versions of Unix started to exist (like BSD vs. AT&T System III), and people noticed that you had to change the foo.c and foo.h files and Makefile in different ways depending on what type of Unix you were compiling for. Those changes were called "configuration". Manual configuration is tedious: it takes a lot of knowledge and diligence to make all the correct changes for your own particular type of Unix. Eventually it was decided all such changes should be controlled by #ifdef tests (like "#ifdef BSD_4_1"), and the #defines could be specified in the Makefile or a header file called config.h (or something similar). Building then required two steps. In the first step, you edit the Makefile or config.h to #define each of the symbols (like BSD_4_1) that you need on your system:

    generic Makefile and -.
                config.h   \
                            STEP 1. manually-edit
                             \
                              `-> custom Makefile and config.h

    Makefile and -.
         sources   \
                    STEP 2. make
                     \
                      `-> binary

Next, standard "configuration systems" were created. Usually a configuration system was a set of shell scripts that made all the tests and modifications automatically. For example, it is easy to test for BSD version 4.1, and everyone agreed to use BSD_4_1 to indicate you're on that system. So (in theory) all it takes is one big set of instructions (typically a shell script called configure) to test for all the different types of hardware, operating systems, libraries, etc. and generate the #defines for that system. The result, for most programmers, was to replace the first manual step with something more automatic.
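The idea of "run a list of tests, then write out the #defines" can be sketched as follows. This is a toy illustration in Python, not the code of any real configuration system: render_config_h, the probe functions, and the HAVE_STRING_H symbol are all invented for the example, and a real script would inspect the compiler, headers and libraries instead of returning fixed answers.

```python
def render_config_h(probes):
    """Run each probe and emit one line per symbol, defined or not."""
    lines = ["/* config.h: generated automatically, do not edit */"]
    for macro, probe in probes:
        if probe():
            lines.append(f"#define {macro} 1")
        else:
            lines.append(f"/* #undef {macro} */")
    return "\n".join(lines) + "\n"


# Stand-in probes with hard-coded results (assumptions for illustration).
example_probes = [
    ("BSD_4_1", lambda: False),
    ("HAVE_STRING_H", lambda: True),
]
config_h = render_config_h(example_probes)
```

With these stand-in probes, config_h holds a header in which HAVE_STRING_H is defined and BSD_4_1 is left commented out, which is the kind of file the C sources would then test with #ifdef.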

Because Makefile was now auto-generated by configure, the configure file was what you edited when you wanted to change the Makefile, and the Makefile became an uneditable, automatically-generated file just like the program binary. The two build steps became:

    configuration-files -.
                          \
                           STEP 1. configure
                            \
                             `-> Makefile

    Makefile and -.
         sources   \
                    STEP 2. make
                     \
                      `-> binary

Several different types of configuration systems were in place by 1992. Some consisted of a script called configure that did all the tests to see what type of Unix you're running on, then generated the Makefiles. The configure script had to know a lot about the syntax of makefiles, as well as knowing a lot about how to test for different features of operating systems.

Eventually, the job of doing the operating-system tests and the job of creating the Makefiles from "Makefile templates" was split up into two different tools.

By 1994 it was generally agreed that the best tool for the operating-system tests was autoconf. It took one new source file, configure.in, and generated a script called configure as output. This configure script, in turn, took one new source file called Makefile.in, and generated Makefile as an output file. At this point the build had three steps that worked like this:

    configure.in -.
                   \
                    STEP 0. autoconf
                     \
                      `-> configure

    - - - - tarfile is distributed in this form - - - -

    Makefile.in -.
                  \
                   STEP 1. configure
                    \
                     `-> Makefile

    Makefile and -.
         sources   \
                    STEP 2. make
                     \
                      `-> binary
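Step 1 is essentially template substitution: Makefile.in contains placeholders of the form @NAME@, and configure fills them in with the values it detected. A minimal sketch of that idea in Python follows. The substitute function, the sample template and the detected values are assumptions made for illustration; the real configure script is a generated shell script and does considerably more.

```python
import re


def substitute(template, values):
    """Replace each @NAME@ placeholder that has a known value."""
    def replace(match):
        name = match.group(1)
        # Unknown placeholders are left untouched rather than guessed.
        return values.get(name, match.group(0))
    return re.sub(r"@([A-Za-z_][A-Za-z0-9_]*)@", replace, template)


# Hypothetical Makefile.in fragment and hypothetical test results.
makefile_in = "CC = @CC@\nCFLAGS = @CFLAGS@\nLIBS = @LIBS@\n"
detected = {"CC": "cc", "CFLAGS": "-O", "LIBS": "-lm"}
makefile = substitute(makefile_in, detected)
```

The point of the split is visible here: the template is written once by the programmer, while the dictionary of values is computed on the machine where the build happens.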

Note that Step 0 only had to be done if you changed the configuration requirements, for example if you added a major new feature that depended on something that differs between systems (such as adding a graphical user interface to a program that was previously text-only). Therefore, the build process was now split into the "user installation" steps (steps 1 and 2) and the "complete rebuild from scratch" (steps 0, 1 and 2). Typically, the programmer would perform step 0 and distribute the result to the users, who perform steps 1 and 2. This is indicated above where it says "tarfile is distributed in this form".

The weak point in this system was Makefile.in. This had to be a very large and complex file, because it contained all the rules for how to generate a Makefile, and Makefiles were by this point very complex (about as complex as a programming language) and varied a lot from one OS to another. Since Makefile.in was a source file it had to be edited manually. Most of Makefile.in was the same regardless of what program you were building, so programmers found maintaining it cumbersome.

The solution to that was automake. It automatically creates Makefile.in from another new source file, called Makefile.am. By 1996, the standard build process had four steps (two for users doing an install and two more for people adding new features) and the steps were:

    configure.in -.
                   \
                    STEP 0-A. autoconf
                     \
                      `-> configure

    Makefile.am -.
                  \
                   STEP 0-B. automake
                    \
                     `-> Makefile.in

    - - - - tarfile is distributed in this form - - - -

    Makefile.in -.
                  \
                   STEP 1. configure
                    \
                     `-> Makefile

    Makefile and -.
         sources   \
                    STEP 2. make
                     \
                      `-> binary

Over the next couple of years, configure.in got bigger and included lots of code to test for lots of different types of libraries, drivers, operating systems, etc. Eventually configure.in became the biggest and hardest-to-maintain file, just like Makefile.in had been. More recent versions of autoconf have solved this by allowing for the use of a "macros" file called aclocal.m4. The "macros" are written in a language called m4, and they contain the rules for performing all sorts of different operating-system tests. As far as the build process is concerned, these can be treated as part of step 0-A, except that you don't ever have to worry about changing the contents of aclocal.m4:

    configure.in -.
      aclocal.m4   \
                    STEP 0-A. autoconf
                     \
                      `-> configure

    STEP 0-B. (automake step, same as above)

    - - - - tarfile is distributed in this form - - - -

    STEP 1. (configure step, same as above)

    STEP 2. (make step, same as above)

Around the same time it also became common to use a tool called aclocal to generate aclocal.m4 from a directory of macro files called "macros". This added a fifth step to the full build process:

    configure.in -.
     macros/*.m4   \
                    STEP 0-A. aclocal -I macros
                     \
                      `-> aclocal.m4

    configure.in -.
      aclocal.m4   \
                    STEP 0-B. autoconf
                     \
                      `-> configure

    STEP 0-C. (automake step, same as above)

    - - - - tarfile is distributed in this form - - - -

    STEP 1. (configure step, same as above)

    STEP 2. (make step, same as above)

This was the way things were done by around the year 2000.

Complete list of files and the order in which they are built:

ORIGINAL FILES

    the file:         configure.in
    is created from:  typed in by hand

    the file:         Makefile.am
    is created from:  typed in by hand

    the file:         src/adam.c
    is created from:  typed in by hand

    the file:         src/adam.h
    is created from:  typed in by hand

    the file:         src/anything.c (any ".c" not listed below)
    is created from:  typed in by hand

    the file:         src/anything.h (any ".h" not listed below)
    is created from:  typed in by hand

AUTO_GENERATED FILES

    the file:         config.h.in
    is created from:  configure.in acconfig.h
    by:               autoheader

    the file:         config.h
    is created from:  config.h.in
    by:               ./configure

    the file:         Makefile
    is created from:  Makefile.in
    by:               ./configure

    the file:         configure
    is created from:  configure.in aclocal.m4
    by:               autoconf

    the file:         aclocal.m4
    is created from:  configure.in macros/*.m4
    by:               aclocal -I macros

    the file:         Makefile.in
    is created from:  Makefile.am
    by:               automake
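The listing above is a dependency graph, and a valid build order is any ordering in which every file comes after the files it is created from. As a rough check, the graph can be fed to a topological sort. This sketch uses Python's standard graphlib module (Python 3.9 or later). The dictionary is transcribed from the listing, with one assumption added: config.h and Makefile are also made to depend on configure, since the script has to exist before ./configure can be run.

```python
from graphlib import TopologicalSorter

# Each generated file maps to the files that must exist before it is built.
generated = {
    "config.h.in": ["configure.in", "acconfig.h"],
    "aclocal.m4": ["configure.in", "macros/*.m4"],
    "configure": ["configure.in", "aclocal.m4"],
    "Makefile.in": ["Makefile.am"],
    # "configure" is added here as an assumption: it is the tool being run.
    "config.h": ["config.h.in", "configure"],
    "Makefile": ["Makefile.in", "configure"],
}

# static_order() yields every node with its prerequisites first; the
# hand-written files are filtered out to leave only the generated ones.
order = [
    name
    for name in TopologicalSorter(generated).static_order()
    if name in generated
]
```

Several orders satisfy the constraints, so the exact sequence in order may vary, but aclocal.m4 always precedes configure, and configure always precedes config.h and Makefile.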

This page was written in the "embarrassingly readable" markup language RHTF, and was last updated on 2011 Dec 21.
