Make
This page provides a simple introduction to using make. Most of this page is based on https://github.com/theicfire/makefiletutorial with modifications specific to MicroData.
Makefiles are used to help decide which parts of a large program need to be recompiled. In a data-intensive project it can be helpful in determining which data transformations or analysis codes need to be re-run due to a change. In general terms, it can be used when you need a series of instructions to run depending on what files have changed. This tutorial will focus on the data analysis use case.
Here's an example dependency graph that you might build with Make. If any of a file's dependencies change, then the file will get re-created:
To run these examples, you'll need a terminal and "make" installed. For each example, put the contents in a file called Makefile, and in that directory run the command make. Let's start with the simplest of Makefiles:
Here is the output of running the above example:
That's it!
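As a sketch of such a minimal Makefile (the target name hello and the echoed text are assumptions, not taken from this page), the file could contain:

```makefile
# A single rule: the target "hello" has no prerequisites.
# The recipe line below must begin with a tab character.
hello:
	echo "hello world"
```

Running make in that directory prints the command itself (echo "hello world") and then its output (hello world).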
A Makefile consists of a set of rules. A rule generally looks like this:
The targets are file names, separated by spaces. Typically, there is only one per rule.
The commands are a series of steps typically used to make the target(s). These need to start with a tab character, not spaces.
The prerequisites are also file names, separated by spaces. These files need to exist before the commands for the target are run. These are also called dependencies.
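A sketch of the general shape of a rule, followed by one concrete rule; the file names output.csv and input.csv are placeholders chosen for illustration:

```makefile
# General shape:
#   targets: prerequisites
#   <tab>command
#   <tab>command

# Concrete rule: output.csv is rebuilt whenever input.csv is newer than it
output.csv: input.csv
	cp input.csv output.csv
```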
The following Makefile has three separate rules. When you run make sample.csv in the terminal, it will create a dataset called sample.csv in a series of steps:
Make is given sample.csv as the target, so it first searches for this target
sample.csv requires consistent.csv, so make searches for the consistent.csv target
consistent.csv requires source.csv, so make searches for the source.csv target
source.csv has no dependencies, so the test command is run
The python command is then run, because all of the consistent.csv dependencies are finished
The top stata command is run, because all the sample.csv dependencies are finished
That's it: sample.csv is our last dataset to create.
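The steps above can be sketched with a Makefile like the following. The script names create_sample.do and make_consistent.py are hypothetical, and the exact commands used in the original example may differ:

```makefile
# Final dataset, built by a (hypothetical) Stata do-file run in batch mode
sample.csv: consistent.csv
	stata -b do create_sample.do

# Intermediate dataset, built by a (hypothetical) Python script
consistent.csv: source.csv
	python make_consistent.py

# Raw data has no prerequisites. If the file is missing, make runs this
# command, and test -f fails with an error instead of silently continuing.
source.csv:
	test -f source.csv
```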
This makefile has a single target, called some_file. The default target is the first target, so in this case some_file will run. Notice that the command below does not create any actual file; it is a simple echo command.
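A minimal sketch of this case; the echoed text is illustrative:

```makefile
# No file named some_file is ever created, so the recipe runs on every call
some_file:
	echo "This line will always print"
```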
This Makefile will make some_file, which is an actual file in this case. The first time we run this code the file some_file will be created. The second time you try to make it, since it's already made and the dependencies did not change, it will result in make: 'some_file' is up to date.
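A sketch of a rule that really creates its target; the echoed text is illustrative:

```makefile
# touch creates some_file, so a second "make" finds it up to date
some_file:
	echo "This line will only print once"
	touch some_file
```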
Here, the target some_file "depends" on other_file. When we run make, the default target (some_file, since it's first) will get called. It will first look at its list of dependencies, and if any of them is missing or out of date, it will first run the targets for those dependencies, and then run itself. The second time this is run, neither target will run because both files exist and are up to date.
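A sketch of two such rules, with illustrative echo messages:

```makefile
some_file: other_file
	echo "This will run second, because it depends on other_file"
	touch some_file

other_file:
	echo "This will run first"
	touch other_file
```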
This will always run both targets, because some_file depends on other_file, which is never created.
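A sketch of this situation; note that the second recipe never creates a file called other_file:

```makefile
some_file: other_file
	touch some_file

# other_file is never written, so it is always considered out of date
other_file:
	echo "nothing"
```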
clean is often used as a target that removes the output of other targets, but it is not a special word in make.
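A minimal sketch of a clean target next to a normal one:

```makefile
some_file:
	touch some_file

# Not the first target and not a prerequisite, so it only runs via "make clean"
clean:
	rm -f some_file
```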
Variables can only be strings. Here's an example of using them:
You can reference variables using ${} or $().
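A sketch of defining and referencing a variable; the names files, file1 and file2 are placeholders:

```makefile
files = file1 file2

# $(files) and ${files} are equivalent ways to reference the variable
some_file: $(files)
	echo "Look at this variable: " ${files}
	touch some_file

file1:
	touch file1

file2:
	touch file2

clean:
	rm -f file1 file2 some_file
```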
Making multiple targets and you want all of them to run? Make an all target.
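A sketch with three placeholder targets; because all comes first, a plain make builds everything:

```makefile
all: one two three

one:
	touch one

two:
	touch two

three:
	touch three

clean:
	rm -f one two three
```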
When there are multiple targets for a rule, the commands will be run for each target. $@ is an automatic variable that contains the target name.
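A sketch of one rule with two targets; the names f1.out and f2.out are placeholders:

```makefile
all: f1.out f2.out

# The recipe runs once per target, and $@ expands to the target being built
f1.out f2.out:
	echo $@
```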
Both * and % are called wildcards in Make, but they mean entirely different things. * searches your filesystem for matching filenames. I suggest that you always wrap it in the wildcard function, because otherwise you may fall into a common pitfall described below. Used on its own, * is oddly unhelpful and I find it more confusing than useful.
* may be used in the target, prerequisites, or in the wildcard function.
Danger: * may not be directly used in variable definitions
Danger: When * matches no files, it is left as it is (unless run in the wildcard function)
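A sketch of the difference between a bare * in a variable definition and the wildcard function; the variable names and the .csv extension are assumptions:

```makefile
# '*' is not expanded here: the value is the literal text *.csv
csv_literal := *.csv
# The wildcard function expands to the matching file names (empty if none match)
csv_files := $(wildcard *.csv)

# Single quotes keep the shell from expanding the pattern when printing
show:
	@echo 'csv_literal is: $(csv_literal)'
	@echo 'csv_files is: $(csv_files)'
```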
% is really useful, but is somewhat confusing because of the variety of situations it can be used in.
When used in "matching" mode, it matches one or more characters in a string. This match is called the stem.
When used in "replacing" mode, it takes the stem that was matched and replaces that in a string.
% is most often used in rule definitions and in some specific functions.
See these sections for examples of it being used:
Static Pattern Rules
Pattern Rules
String Substitution
The vpath Directive
There are many automatic variables, but often only a few show up:
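A sketch showing three commonly used automatic variables; the target and prerequisite names are placeholders:

```makefile
hey: one two
# $@ is the target name: hey
	echo $@
# $? lists the prerequisites that are newer than the target
	echo $?
# $^ lists all prerequisites: one two
	echo $^
	touch hey

one:
	touch one

two:
	touch two

clean:
	rm -f hey one two
```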
Make loves C compilation. And every time it expresses its love, things get confusing. Here's the syntax for a new type of rule called a static pattern:
The essence is that the given target is matched by the target-pattern (via a % wildcard). Whatever was matched is called the stem. The stem is then substituted into the prereq-pattern, to generate the target's prereqs.
A typical use case is to compile .c files into .o files. Here's the manual way:
Here's the more efficient way, using a static pattern rule:
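A sketch of a static pattern rule, assuming source files foo.c and bar.c exist in the current directory:

```makefile
objects = foo.o bar.o

all: $(objects)

# Manual way, one rule per object file:
#   foo.o: foo.c
#   bar.o: bar.c

# Static pattern rule, of the form  targets: target-pattern: prereq-pattern
# For foo.o the stem is "foo", so the prerequisite becomes foo.c
$(objects): %.o: %.c
	$(CC) -c $(CFLAGS) $< -o $@
```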
While I introduce functions later on, I'll foreshadow what you can do with them. The filter function can be used in static pattern rules to match the correct files. In this example, I made up the .raw and .result extensions.
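A sketch of filter used to split one list between two static pattern rules; the file names are placeholders, and the last two rules only create empty inputs so the sketch runs in an empty directory:

```makefile
out_files = foo.result bar.o lose.o

all: $(out_files)

# Only the .o entries of out_files are handled by this rule
$(filter %.o,$(out_files)): %.o: %.c
	echo "target: $@ prereq: $<"

# Only the .result entries are handled by this rule
$(filter %.result,$(out_files)): %.result: %.raw
	echo "target: $@ prereq: $<"

%.c:
	touch $@

%.raw:
	touch $@
```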
Perhaps the most confusing part of make is the magic rules and variables that are made. Here's a list of implicit rules:
Compiling a C program: n.o is made automatically from n.c with a command of the form $(CC) -c $(CPPFLAGS) $(CFLAGS)
Compiling a C++ program: n.o is made automatically from n.cc or n.cpp with a command of the form $(CXX) -c $(CPPFLAGS) $(CXXFLAGS)
Linking a single object file: n is made automatically from n.o by running the command $(CC) $(LDFLAGS) n.o $(LOADLIBES) $(LDLIBS)
As such, the important variables used by implicit rules are:
CC: Program for compiling C programs; default cc
CXX: Program for compiling C++ programs; default g++
CFLAGS: Extra flags to give to the C compiler
CXXFLAGS: Extra flags to give to the C++ compiler
CPPFLAGS: Extra flags to give to the C preprocessor
LDFLAGS: Extra flags to give to compilers when they are supposed to invoke the linker
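A sketch that relies only on implicit rules, assuming gcc is installed; the name blah is a placeholder:

```makefile
# Variables picked up by the built-in rules
CC = gcc
CFLAGS = -g

# No recipes are written for these two steps:
# blah.o is compiled from blah.c by the implicit C compilation rule,
# and blah is linked from blah.o by the implicit linking rule.
blah: blah.o

# Generate a trivial source file so the sketch is self-contained
blah.c:
	echo "int main(void) { return 0; }" > blah.c

clean:
	rm -f blah blah.o blah.c
```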
Pattern rules are often used but quite confusing. You can look at them in two ways:
A way to define your own implicit rules
A simpler form of static pattern rules
Let's start with an example first:
Pattern rules contain a '%' in the target. This '%' matches any nonempty string, and the other characters match themselves. ‘%’ in a prerequisite of a pattern rule stands for the same stem that was matched by the ‘%’ in the target.
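Two pattern rules as a sketch. The first mirrors the built-in C compilation rule; the second is a data-oriented variant in which clean.py is a hypothetical script that takes an input and an output path:

```makefile
# Any X.o can be built from X.c
%.o: %.c
	$(CC) -c $(CFLAGS) $(CPPFLAGS) $< -o $@

# Any X.clean.csv can be built from X.raw.csv
%.clean.csv: %.raw.csv
	python clean.py $< $@
```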
Here's another example:
Double-Colon Rules are rarely used, but allow multiple rules to be defined for the same target. If these were single colons, a warning would be printed and only the second set of commands would run.
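A sketch of two double-colon rules for the same placeholder target; both recipes run:

```makefile
all: blah

blah::
	echo "hello"

blah::
	echo "hello again"
```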
Add an @ before a command to stop it from being printed
You can also run make with -s to add an @ before each line
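A short sketch of the @ prefix:

```makefile
all:
	@echo "This command line is not printed, only its output"
	echo "This command line is printed before it runs"
```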
Each command is run in a new shell (or at least the effect is as such)
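A sketch showing that a directory change does not carry over between recipe lines:

```makefile
all:
	cd ..
# The cd above has no effect here: this line runs in a fresh shell
	pwd
# Joined with ';' both commands share one shell, so pwd shows the parent directory
	cd ..; pwd
```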
The default shell is /bin/sh. You can change this by changing the variable SHELL:
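A sketch, assuming bash is installed at /bin/bash:

```makefile
SHELL = /bin/bash

cool:
	echo "Hello from bash"
```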
-k, -i, and -
Add -k when running make to continue running even in the face of errors. Helpful if you want to see all the errors of Make at once.
Add a - before a command to suppress the error
Add -i to make to have this happen for every command.
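A sketch of the - prefix; the target name one is a placeholder:

```makefile
one:
# false exits with a nonzero status; the leading '-' tells make to report
# the error but keep going
	-false
	touch one
```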
Note: if you press Ctrl+C while make is running, it will delete the newer targets it just made.
To recursively call a makefile, use the special $(MAKE) instead of make because it will pass the make flags for you and won't itself be affected by them.
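A sketch, assuming a directory named subdir that contains its own makefile:

```makefile
all:
	cd subdir && $(MAKE)

# Equivalent form using make's -C option:
#	$(MAKE) -C subdir
```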
The export directive takes a variable and makes it accessible to sub-make commands. In this example, cooly is exported such that the makefile in subdir can use it.
Note: export has the same syntax as sh, but they aren't related (although similar in function).
You also need to export variables for them to be visible to commands run in the shell.
.EXPORT_ALL_VARIABLES exports all variables for you.
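A sketch of both uses of export. The variable names local_only and shared are assumptions, and subdir is assumed to contain a makefile that uses cooly:

```makefile
cooly = The subdirectory can see me!
export cooly

local_only = visible to make, but not to the shell
export shared = visible to the shell as well

all:
# Expanded by make before the shell runs
	@echo "$(local_only)"
# Expanded by the shell: empty for local_only, set for shared
	@echo "$$local_only"
	@echo "$$shared"
	cd subdir && $(MAKE)
```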
There's a nice list of options that can be run from make. Check out --dry-run, --touch, --old-file.
You can have multiple targets to make; for example, make clean run test runs the clean goal, then run, and then test.
There are two flavors of variables:
recursive (use =) - only looks for the variables when the command is used, not when it's defined.
simply expanded (use :=) - like normal imperative programming: only those defined so far get expanded
Simply expanded (using :=) allows you to append to a variable. A recursive definition that refers to itself will give an infinite loop error.
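A sketch contrasting the two flavors; the variable names are placeholders:

```makefile
# Recursive: expanded when used, so it sees later_variable
one = one ${later_variable}
# Simply expanded: expanded right here, where later_variable is still empty
two := two ${later_variable}

later_variable = later

# Appending to itself works with := (with = it would be an infinite loop error)
greeting = hello
greeting := ${greeting} there

all:
	echo $(one)
	echo $(two)
	echo $(greeting)
```

This prints "one later", "two" and "hello there".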
?= only sets variables if they have not yet been set
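A sketch of conditional assignment with placeholder names:

```makefile
one = hello
# one is already set, so this line has no effect
one ?= will not be set
# two is not set yet, so this assignment takes effect
two ?= will be set

all:
	echo $(one)
	echo $(two)
```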
Spaces at the end of a line are not stripped, but those at the start are. To make a variable with a single space, use $(nullstring)
An undefined variable is actually an empty string!
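A sketch of trailing spaces, the single-space idiom and an undefined variable; all variable names are placeholders:

```makefile
with_spaces = hello   # the spaces between "hello" and this comment are kept
after = $(with_spaces)there

nullstring =
space = $(nullstring) # the value is exactly one space

all:
	echo "$(after)"
	echo start"$(space)"end
	echo "undefined expands to nothing: [$(never_defined)]"
```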
Use += to append
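A short sketch of appending:

```makefile
foo := start
# += adds a space and then the new text
foo += more

all:
	echo $(foo)
```

This prints "start more".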
String Substitution is also a really common and useful way to modify variables. Also check out Text Functions and Filename Functions.
You can override variables that come from the command line by using override. Here we ran make with make option_one=hi
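A sketch of override; option_two is an extra variable added here for contrast:

```makefile
# Wins even when option_one is given on the command line
override option_one = did_override
# Loses to a value given on the command line
option_two = not_override

all:
	echo $(option_one)
	echo $(option_two)
```

With make option_one=hi this prints "did_override" and "not_override"; adding option_two=hi to the command line would change the second line to "hi".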
"define" is actually just a list of commands. It has nothing to do with being a function. Note here that it's a bit different than having a semi-colon between commands, because each is run in a separate shell, as expected.
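A sketch comparing a one-line variable with a define block; the names one, two and blah are placeholders:

```makefile
# Both commands are on one line, so they share a shell
one = export blah="I was set!"; echo $$blah

# Each line of a define block runs as its own recipe line, in its own shell
define two
export blah="I was set!"
echo $$blah
endef

all:
	@echo "This prints 'I was set!':"
	@$(one)
	@echo "This prints an empty line, because the commands run in separate shells:"
	@$(two)
```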
Variables can be assigned for specific targets
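A sketch of a target-specific variable with placeholder names:

```makefile
# one is only set while building "all" (and its prerequisites)
all: one = cool

all:
	echo one is defined: $(one)

other:
	echo one is nothing: $(one)
```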
You can assign variables for specific target patterns
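A sketch of a pattern-specific variable with placeholder names:

```makefile
# one is set for every target whose name matches %.c
%.c: one = cool

blah.c:
	echo one is defined: $(one)

other:
	echo one is nothing: $(one)
```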
ifdef does not expand variable references; it just sees if something is defined at all
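A sketch of this behavior; foo and bar are placeholder names:

```makefile
bar =
foo = $(bar)

all:
# foo has a non-empty (unexpanded) value, the text $(bar), so ifdef is true
ifdef foo
	echo "foo is defined"
endif
# bar has an empty value, which ifdef treats as not defined
ifndef bar
	echo "but bar is not"
endif
```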
This example shows you how to test make flags with findstring and MAKEFLAGS. Run this example with make -i to see it print out the echo statement.
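A sketch of such a test. It uses the firstword form suggested in the GNU make manual, which may differ slightly from the original example:

```makefile
all:
# The single-letter flags are collected in the first word of MAKEFLAGS,
# so look for "i" there.
ifneq (,$(findstring i,$(firstword -$(MAKEFLAGS))))
	echo "i was passed to MAKEFLAGS"
endif
```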
Functions are mainly just for text processing. Call functions with $(fn arguments) or ${fn arguments}, where the function name is followed by a space and the arguments are separated by commas. You can make your own using the call builtin function. Make has a decent amount of builtin functions.
If you want to replace spaces or commas, use variables
Do NOT include spaces in the arguments after the first. That will be seen as part of the string.
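A sketch of replacing spaces with commas through helper variables; the variable names are placeholders:

```makefile
comma := ,
empty :=
# A single space, built from two empty expansions
space := $(empty) $(empty)
foo := a b c
bar := $(subst $(space),$(comma),$(foo))

all:
	@echo $(bar)
```

This prints "a,b,c".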
$(patsubst pattern,replacement,text) does the following:
"Finds whitespace-separated words in text that match pattern and replaces them with replacement. Here pattern may contain a ‘%’ which acts as a wildcard, matching any number of any characters within a word. If replacement also contains a ‘%’, the ‘%’ is replaced by the text that matched the ‘%’ in pattern. Only the first ‘%’ in the pattern and replacement is treated this way; any subsequent ‘%’ is unchanged." (GNU docs)
The substitution reference $(text:pattern=replacement) is a shorthand for this.
There's another shorthand that replaces only suffixes: $(text:suffix=replacement). No % wildcard is used here.
Note: don't add extra spaces for this shorthand. It will be seen as a search or replacement term.
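A sketch of the three equivalent forms, using a placeholder list:

```makefile
foo := a.o b.o l.a c.o
one := $(patsubst %.o,%.c,$(foo))
# Substitution reference, shorthand for the line above
two := $(foo:%.o=%.c)
# Suffix-only shorthand
three := $(foo:.o=.c)

all:
	echo $(one)
	echo $(two)
	echo $(three)
```

Each echo prints "a.c b.c l.a c.c".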
The foreach function looks like this: $(foreach var,list,text). It converts one list of words (separated by spaces) to another. var is set to each word in list, and text is expanded for each word.
This appends an exclamation after each word:
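A sketch with a placeholder word list:

```makefile
foo := who are you
# For each word in foo, emit the same word followed by an exclamation mark
bar := $(foreach wrd,$(foo),$(wrd)!)

all:
	@echo $(bar)
```

This prints "who! are! you!".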
if checks if the first argument is nonempty. If so, it runs the second argument; otherwise it runs the third.
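A sketch of the if function with placeholder names:

```makefile
foo := $(if this-is-not-empty,then!,else!)
empty :=
bar := $(if $(empty),then!,else!)

all:
	@echo $(foo)
	@echo $(bar)
```

This prints "then!" followed by "else!".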
Make supports creating basic functions. You "define" the function just by creating a variable, but use the parameters $(0), $(1), etc. You then call the function with the special call function. The syntax is $(call variable,param,param). $(0) is the variable, while $(1), $(2), etc. are the params.
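A sketch of a user-defined function; the name sweet_new_fn and its arguments are placeholders:

```makefile
sweet_new_fn = Variable Name: $(0) First: $(1) Second: $(2) Empty Variable: $(3)

all:
	@echo $(call sweet_new_fn,go,tigers)
```

This prints "Variable Name: sweet_new_fn First: go Second: tigers Empty Variable:".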
shell - This calls the shell, but it replaces newlines with spaces!
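A short sketch of the shell function:

```makefile
all:
# The multi-line output of ls is joined into one long line
	@echo $(shell ls -la)
```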
The include directive tells make to read one or more other makefiles. It's a line in the makefile that looks like this:
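A sketch of the directive; settings.mk and the rules/ directory are hypothetical names:

```makefile
# General form: include filenames...
# Read settings.mk and every .mk file in rules/ before continuing
include settings.mk $(wildcard rules/*.mk)
```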
This is particularly useful when you use compiler flags like -M that create Makefiles based on the source. For example, if some C files include a header, that header will be added to a Makefile that's written by gcc. I talk about this more in the Makefile Cookbook.
Use vpath to specify where some set of prerequisites exist. The format is vpath <pattern> <directories, space/colon separated>
<pattern> can have a %, which matches any zero or more characters.
You can also do this globallyish with the variable VPATH
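A sketch of vpath for a data project, assuming survey.csv lives in one of the hypothetical directories data/ or raw/:

```makefile
# Look for any .csv prerequisite in data/ first, then in raw/
vpath %.csv data raw

# $< expands to the path where survey.csv was actually found
report.txt: survey.csv
	wc -l $< > report.txt
```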
The backslash ("\") character gives us the ability to use multiple lines when the commands are too long.
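A short sketch of a continued command:

```makefile
some_file:
	echo This line is too long, so \
		it is broken up into multiple lines
```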
Adding .PHONY to a target will prevent make from confusing the phony target with a file name. In this example, if the file clean is created, make clean will still be run. .PHONY is great to use, but I'll skip it in the rest of the examples for simplicity.
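A sketch in which a file named clean is created on purpose:

```makefile
some_file:
	touch some_file
	touch clean

# Without this line, the file "clean" would make the target look up to date
.PHONY: clean
clean:
	rm -f some_file
	rm -f clean
```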
The make tool will stop running a rule (and will propagate back to prerequisites) if a command returns a nonzero exit status.
.DELETE_ON_ERROR will delete the target of a rule if the rule fails in this manner. This applies to all targets, not only to the ones listed after it as with .PHONY. It's a good idea to always use this, even though make does not do so by default for historical reasons.
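A sketch with two placeholder targets whose recipes fail after creating their file:

```makefile
.DELETE_ON_ERROR:

all: one two

# touch creates the file, then false fails, so make deletes "one" again
one:
	touch one
	false

# Only reached when make is run with -k; "two" is deleted in the same way
two:
	touch two
	false
```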
Let's go through a really juicy Make example that works well for medium sized projects.
The neat thing about this makefile is it automatically determines dependencies for you. All you have to do is put your C/C++ files in the src/ folder.
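A sketch of such a makefile, following the commonly used layout from the tutorial this page is based on. The program name final_program and the build directory are assumptions, and the find utility plus a gcc-compatible compiler are assumed to be available:

```makefile
TARGET_EXEC := final_program
BUILD_DIR := ./build
SRC_DIRS := ./src

# Find all the C and C++ files we want to compile
SRCS := $(shell find $(SRC_DIRS) -name '*.cpp' -or -name '*.c')
# ./src/hello.cpp becomes ./build/./src/hello.cpp.o
OBJS := $(SRCS:%=$(BUILD_DIR)/%.o)
# One .d dependency file per object file
DEPS := $(OBJS:.o=.d)

# Every folder under src/ is passed to the compiler as an include directory
INC_DIRS := $(shell find $(SRC_DIRS) -type d)
INC_FLAGS := $(addprefix -I,$(INC_DIRS))

# -MMD and -MP make the compiler write the .d files with header dependencies
CPPFLAGS := $(INC_FLAGS) -MMD -MP

# Final link step
$(BUILD_DIR)/$(TARGET_EXEC): $(OBJS)
	$(CXX) $(OBJS) -o $@ $(LDFLAGS)

# Build step for C sources
$(BUILD_DIR)/%.c.o: %.c
	mkdir -p $(dir $@)
	$(CC) $(CPPFLAGS) $(CFLAGS) -c $< -o $@

# Build step for C++ sources
$(BUILD_DIR)/%.cpp.o: %.cpp
	mkdir -p $(dir $@)
	$(CXX) $(CPPFLAGS) $(CXXFLAGS) -c $< -o $@

.PHONY: clean
clean:
	rm -rf $(BUILD_DIR)

# Include the generated .d files; the leading '-' silences errors while
# they do not exist yet
-include $(DEPS)
```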