This chapter describes the m4 macro preprocessor, a front-end filter that lets you define macros by placing m4 macro definitions at the beginning of your source files. You can use the m4 preprocessor with either program source files or document source files.
Macros ease your programming or writing tasks by allowing you to substitute a simple word or two for a great amount of material. Macro calls in a source file have the following form:
name [ ( arg1 [ ,arg2... ] ) ]
For example, suppose you have a C program in which you want to print the same message at several points. You could code a series of printf statements like the following:
printf("\nThese %d files are in %s:\n\n",cnt,dir);
As your program evolves, you decide to change the wording, but you then have to edit each instance of the message. Defining a macro like the following will save you a great deal of work:
define(filmsg,`printf("\nThese %d files are in %s:\n\n",$1,$2)')
Then, everywhere you want to output this message, you use the macro this way:
filmsg(cnt,dir);
With this implementation, you need only edit the message in one place.
A macro definition consists of a symbolic name (called a token) and the character string that is to replace it. A token is any string of alphanumeric characters (letters, digits, and underscores) beginning with a letter or an underscore and delimited by nonalphanumeric characters (punctuation or white space). For example, N12 and N are both tokens, but A+B is not a token. When you process your file through m4, each occurrence of a recognized macro is replaced by its definition.
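In addition to replacing symbolic names with text, m4 can also perform integer arithmetic, include and redirect files, run system commands, test conditions, and manipulate strings, as described in the sections that follow.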
The m4 program reads each token in the file and determines if the token is a macro name. Macro names that are embedded in other tokens are not recognized; for example, m4 does not interpret N12 as containing an occurrence of the token N. If the token is a macro name, m4 replaces it with its defining text and pushes the resulting string back onto the input to be rescanned.
Macro expansion is thus recursive; macro definitions can include nested occurrences of other macros to any depth of nesting. You can call macros with arguments, in which case the arguments are collected and substituted into the right places in the defining text before the defining text is rescanned.
The m4 preprocessor is a standard UNIX filter. It accepts input from standard input or from a list of input files and writes its output to standard output. The following lines illustrate correct m4 usage:
% grep -v '#include' file1 file2 | m4 > outfile
% m4 file1 file2 | cc
The m4 program processes each argument in order. If there are no arguments, or if an argument is a minus sign (-), m4 reads standard input as its input file.
You create a macro definition with the define command, one of more than 30 built-in macros provided by m4. For example:
define(N,100)
The open parenthesis must follow the word define with no intervening space.
Given this macro definition, the token N will be replaced by 100 wherever it appears in the file being processed. The defining text can be any text, except that if the text contains parentheses, the number of open (left) parentheses must match the number of close (right) parentheses unless you protect an unmatched parenthesis by quoting it. See Section 5.2.1 for an explanation of quoting.
Built-in and user-defined macros work the same way except that some of the built-in macros change the state of the process. Refer to Section 5.3 for a list of the built-in macros.
You can define macros in terms of other macros. For example:
define(N,100)
define(M,N)
This example defines both M and N to be 100. If you later change the definition of N and assign it a new value, M retains the value of 100, not the new value you give N. The value of M does not track that of N because the m4 preprocessor expands macro names into their defining text as soon as possible. The overall result, as far as M is concerned, is the same as using the following input in the first place:
define(M,100)
If you want the value of M to track the value of N, you can reverse the order of the definitions, as follows:
define(M,N)
define(N,100)
Now M is defined to be the string N. When the value of M is requested later, the M is replaced by N, which is then rescanned and replaced by whatever value N has at that time.
Macro definitions made with the define command do not delete characters following the close parenthesis. For example:
Now is the time for all good persons.
define(N,100)
Testing N definition.
This example produces the following result:
Now is the time for all good persons.

Testing 100 definition.
The blank line results from the presence of a newline character at the end of the line containing the define macro. The built-in dnl macro deletes all characters that follow it, up to and including the next newline character. Use this macro to delete empty lines. For example:
Now is the time for all good persons.
define(N,100)dnl
Testing N definition.
This example produces the following result:
Now is the time for all good persons.
Testing 100 definition.
To delay the expansion of a define macro's arguments, enclose them in a matched pair of quote characters. The default quote characters are left and right single quotation marks (` and '), but you can use the built-in changequote macro to specify different characters. (See Section 5.3.) Any text surrounded by quote characters is not expanded immediately, but the quote characters are removed. The value of a quoted string is the string with the quote characters removed. Consider the following example:
define(N,100)
define(M,`N')
The quote characters around the N are removed as the argument is being collected. The result of using quote characters is to define M as the string N, not 100. This example makes the value of M track that of N, and it is thus another way to accomplish the effect of the following definitions, shown in Section 5.2:
define(M,N)
define(N,100)
The general rule is that m4 always strips off one level of quote characters whenever it evaluates something. This is true even outside macros. For example, to make the word "define" appear in the output, enter the word in quote characters, as follows:
`define' = 1
Because of the way m4 handles quoted strings, you must be careful about nesting macros. For example:
define(dog,canine)
define(cat,animal chased by `dog')
define(mouse,animal chased by cat)
When the definition of cat is processed, dog is not replaced with canine because it is quoted. But when mouse is processed, the definition of cat (animal chased by dog) is used; this time, dog is not quoted, and the definition of mouse becomes animal chased by animal chased by canine.
When you redefine an existing macro, you must quote the first argument (the macro name), as follows:
define(N,100)
.
.
.
define(`N',200)
Without the quote characters, the second define macro sees N, recognizes it, and substitutes its value, producing the following result:
define(100,200)
The m4 program ignores this statement because it can only define names, not numbers.
The simplest form of macro processing is replacing one string with another (fixed) string as illustrated in the previous sections. However, macros can also have arguments, so that you can use a given macro in different places with different results. To indicate where an argument is to be used within the replacement text for a macro (the second argument of its definition), use the symbol $n to indicate the nth argument. For example, the symbol $1 refers to the first argument of a macro. When the macro is used, m4 replaces the symbol with the value of the indicated argument. For example:
define(bump,$1=$1+1)
.
.
.
bump(x);
In this example, m4 will replace the bump(x) statement with x=x+1.
A macro can have as many arguments as needed. However, you can access only nine arguments by using the $n symbols ($1 through $9). To access arguments past the ninth argument, use the shift macro, which drops the first argument and reassigns the remaining arguments to the $n symbols (second argument to $1, third to $2, and so on). Using the shift macro more than once allows access to all arguments used with the macro.
The symbol $0 returns the name of the macro. Arguments that are not supplied are replaced by null strings, so that you can define a macro that concatenates its arguments as follows:
define(cat,$1$2$3$4$5$6$7$8$9)
.
.
.
cat(x,y,z)
This example replaces the cat(x,y,z) statement with xyz. Arguments $4 through $9 in this example are null because corresponding arguments were not provided.
When scanning a macro, the m4 program discards leading unquoted blanks, tabs, or newline characters in arguments, but keeps all other white space. For example:
define(a, "$1 $2$3")
.
.
.
a(b, c, d)
This example expands the a macro to be "b cd". In the define macro, however, newline characters are meaningful. For example:
define(a,$1
$2$3)
.
.
.
a(b,c,d)
This latter example expands the a macro as follows:
b
cd
Macro arguments are separated by commas. Use parentheses to enclose arguments containing commas, so that the commas are not misinterpreted as ending the arguments containing them. For example, the following statement has only two arguments:
define(a, (b,c))
The first argument is a, and the second is (b,c). To use a single parenthesis in an argument, enclose it in quote characters:
define(a,b`)'c)
In this example, b)c is the second argument.
The m4 program provides a set of macros that are already defined (built-in macros). Table 5-1 lists all of these macros and describes them briefly. The following sections further explain many of the macros and how to use them.
Macro | Description |
changecom(l,r) | Changes the left and right comment characters to the characters represented by l and r. The two characters must be different. |
changequote(l,r) | Changes the left and right quote characters to the characters represented by l and r. The two characters must be different. |
decr(n) | Returns the value of n-1. |
define(name,replacement) | Defines a new macro, named name, with a value of replacement. |
defn(name) | Returns the quoted definition of name. |
divert(n) | Changes the output stream to the temporary file number n. |
divnum | Returns the number of the currently active temporary file. |
dnl | Deletes text up to a newline character. |
dumpdef(`name'[,`name'...]) | Prints the names and current definitions of the named macros. |
errprint(str) | Prints str to the standard error file. |
eval(expr) | Evaluates expr as a 32-bit arithmetic expression. |
ifdef(`name',arg1,arg2) | If macro name is defined, returns arg1; otherwise, returns arg2. |
ifelse(str1,str2,arg1,arg2) | Compares the strings str1 and str2. If they match, ifelse returns the value of arg1; otherwise, it returns the value of arg2. |
include(file), sinclude(file) | Returns the contents of file. The sinclude macro does not report an error if it cannot access the file. |
incr(n) | Returns the value of n+1. |
index(str1,str2) | Returns the character position in string str1 where str2 starts, or -1 if str1 does not contain str2. |
len(str), dlen(str) | Returns the number of characters in str. The dlen macro operates on strings containing 2-byte representations of international characters. |
m4exit(code) | Exits m4 with a return code of code. |
m4wrap(name) | Runs macro name before exiting, after completing all other processing. |
maketemp(strXXXXXstr) | Creates a unique file name by replacing the literal string XXXXX in the argument string with the current process ID. |
popdef(name) | Replaces the current definition of name with the previous definition, saved with the pushdef macro. |
pushdef(name,replacement) | Saves the current definition of name and then defines name to be replacement in the same way as define. |
shift(param_list) | Shifts the parameter list leftward one position, destroying the original first element of the list. |
substr(string,pos,len) | Returns the substring of string that begins at character position pos and is len characters long. |
syscmd(command) | Executes the specified system command with no return value. |
sysval | Gets the return code from the last use of the syscmd macro. |
traceoff(macro_list) | Turns off trace for any macro in the list. If macro_list is null, turns off all tracing. |
traceon(name) | Turns on trace for the named macro. If name is null, turns trace on for all macros. |
translit(string,set1,set2) | Replaces any characters from set1 that appear in string with the corresponding characters from set2. |
undefine(`name') | Removes the definition of the named macro. |
undivert(n,n[,n...]) | Appends the contents of the indicated temporary files to the current temporary file. |
To include comments in your m4 programs, delimit the comment lines with the comment characters. The default left comment character is the number sign (#); the default right comment character is the newline character. If these characters are not convenient, use the built-in changecom macro. For example:
changecom({,})
This example makes the left and right braces the new comment characters. To restore the original comment characters, use changecom as follows:
changecom(#,
)
Using changecom with no arguments disables commenting.
The default quote characters are the left and right single quotation marks (` and '). If these characters are not convenient, change the quote characters with the built-in changequote macro. For example:
changequote([,])
This example makes the left and right brackets the new quote characters. To restore the original quote characters, use changequote without arguments, as follows:
changequote
The undefine macro removes macro definitions. For example:
undefine(`N')
This example removes the definition of N. You must quote the name of the macro to be undefined. You can use undefine to remove built-in macros, but once you remove a built-in macro, you cannot recover that macro for later use.
The built-in ifdef macro determines if a macro is currently defined. The ifdef macro accepts three arguments. If the first argument is defined, the value of ifdef is the second argument. If the first argument is not defined, the value of ifdef is the third argument. If there is no third argument, the value of ifdef is null.
The m4 program provides the following built-in functions for doing arithmetic on integers only:
incr | Increments its numeric argument by 1 |
decr | Decrements its numeric argument by 1 |
eval | Evaluates an arithmetic expression |
For example, you can create a variable N1 such that its value will always be one greater than N, as follows:
define(N,100)
define(N1,`incr(N)')
The eval function can evaluate expressions containing the following operators (listed in decreasing order of precedence):
Use parentheses to group operations where needed. All operands of an expression must be numeric. The numeric value of a true relation such as 1>0 is 1, and false is 0 (zero). The precision in eval is 32 bits. For example, to define M as 2==N+1, use eval as follows:
define(N,3)
define(M,`eval(2==N+1)')
Use quote characters around the text that defines a macro, unless the text is simple and contains no instances of macro names.
To merge a new file in the input, use the built-in include macro as follows:
include(myfile)
This example inserts the contents of myfile in place of the include command. As the included file is read, m4 scans it for macros as if it were part of the primary input.
With the include macro, a fatal error occurs if the named file cannot be accessed. To avoid an error, use the alternative form, sinclude (silent include). The sinclude macro continues without error if the named file cannot be accessed.
You can redirect the output of m4 to temporary files during processing, and the collected material can be output upon command. The m4 program can maintain up to nine temporary files, numbered 1 through 9. To redirect output, use the divert macro as in the following example:
divert(4)
When this command is encountered, m4 begins writing its output to the end of temporary file 4. The m4 program discards the output if you redirect the output to a temporary file other than 1 through 9; you can use this feature to make m4 omit a portion of the input file. Use divert(0) or divert with no argument to return the output to the standard output stream.
At the end of its processing, m4 writes all redirected output to the standard output stream, reading from the temporary files in numeric order and then destroying the temporary files.
To retrieve the information from all temporary files in numeric order at any time before processing is completed, use the built-in undivert macro with no arguments. To retrieve selected temporary files in a specified order, use undivert with arguments. When using undivert, m4 discards the temporary files that are recovered and does not search the recovered information for macros.
The value of undivert is not the diverted text.
The built-in divnum macro returns the number of the currently active temporary file. If you do not change the output file with the divert macro, m4 puts all output in temporary file 0 (zero).
You can run any operating system command from an m4 program by using the built-in syscmd macro. As Table 5-1 notes, the macro itself has no return value: its expansion is null, and any output the command produces is written to standard output without being rescanned for macros. Use the sysval macro to obtain the command's return code. For example:
syscmd(date)
Use the built-in maketemp macro to make a unique file name from a program. If the literal string XXXXX is present in the macro's argument, m4 replaces the XXXXX with the process ID of the current process. For example:
maketemp(myfileXXXXX)
If the current process ID is 23498, this example returns myfile23498. You can use this string to name a temporary file.
The built-in ifelse macro performs conditional testing. The simplest form is the following:
ifelse(a,b,c,d)
This example compares the two strings a and b. If they are identical, ifelse returns string c. If they are not identical, it returns string d. For example, you can define a macro called compare to compare two strings and return yes if they are the same or no if they are different, as follows:
define(compare, `ifelse($1,$2,yes,no)')
The quote characters prevent the evaluation of ifelse from occurring too early. If the fourth argument is missing, it is treated as empty.
The ifelse macro can have any number of arguments, and it therefore provides a limited form of multiple path decision capability. For example:
ifelse(a,b,c,d,e,f,g)
This statement is logically the same as the following fragment:
if (a == b)
    x = c;
else if (d == e)
    x = f;
else
    x = g;
return(x);
If the final argument is omitted, the result is null.
The built-in len macro returns the byte length of the string that makes up its argument. For example, len(abcdef) is 6, and len((a,b)) is 5.
The built-in dlen macro returns the length of the displayable characters in a string. In certain international usages, 2-byte codes are displayed as one character. Thus, if the string contains any 2-byte international character codes, the result of dlen will differ from the result of len.
The built-in substr macro returns the substring (beginning at the character position specified by the second argument) from a specified string (first argument). The third argument specifies the length in bytes of the returned substring. For example:
substr(Krazy Kat,6,5)
This example returns "Kat", which is the 3-character substring beginning at character position 6 of the string "Krazy Kat". The first character in the string is at position 0 (zero). If the third argument is omitted or if the string is not long enough to satisfy the third argument, as in this example, the rest of the string is returned.
The built-in index macro returns the byte position, or index, in a string (first argument) where a substring (second argument) begins. If the substring is not present, index returns -1. As with substr, the origin for strings is 0 (zero). For example:
index(Krazy Kat,Kat)
This example returns 6.
The built-in translit macro performs one-for-one character substitution, or transliteration. The first argument is a string to be processed. The second and third arguments are lists of characters. Each instance of a character from the second argument that is found in the string is replaced by the corresponding character from the third argument. For example:
translit(the quick brown fox jumps over the lazy dog,aeiou,AEIOU)
This example returns the following:
thE qUIck brOwn fOx jUmps OvEr thE lAzy dOg
If the third argument is shorter than the second argument, characters from the second argument that are not in the third argument are deleted. If the third argument is missing, all characters present in the second argument are deleted.
Note
The substr, index, and translit macros do not differentiate between 1- and 2-byte displayable characters and can return unexpected results in some international usages.
The built-in errprint macro writes its arguments to the standard error file. For example:
errprint(`error')
The built-in dumpdef macro dumps the current names and definitions of items named as arguments. Names must be quoted. If you supply no arguments, dumpdef prints all current names and definitions. The dumpdef macro writes to the standard error file.