M4 is a macro processor available on UNIX and GCOS. Its primary use has been as a front end for Ratfor for those cases where parameterless macros are not adequately powerful. It has also been used for languages as disparate as C and Cobol. M4 is particularly suited for functional languages like Fortran, PL/I and C since macros are specified in a functional notation.
M4 provides features seldom found even in much larger macro processors, including arguments, condition testing, arithmetic capabilities, string and substring functions, and file manipulation.
This paper is a user's manual for M4.
A macro processor is a useful way to enhance a programming language, to make it more palatable or more readable, or to tailor it to a particular application. The #define statement in C and the analogous define in Ratfor are examples of the basic facility provided by any macro processor: replacement of text by other text.
The M4 macro processor is an extension of a macro processor called M3 which was written by D. M. Ritchie for the AP-3 minicomputer; M3 was in turn based on a macro processor implemented for [1]. Readers unfamiliar with the basic ideas of macro processing may wish to read some of the discussion there.
M4 is a suitable front end for Ratfor and C, and has also been used successfully with Cobol. Besides the straightforward replacement of one string of text by another, it provides macros with arguments, conditional macro expansion, arithmetic, file manipulation, and some specialized string processing functions.
The basic operation of M4 is to copy its input to its output. As the input is read, however, each alphanumeric ``token'' (that is, string of letters and digits) is checked. If it is the name of a macro, then the name of the macro is replaced by its defining text, and the resulting string is pushed back onto the input to be rescanned. Macros may be called with arguments, in which case the arguments are collected and substituted into the right places in the defining text before it is rescanned.
M4 provides a collection of about twenty built-in macros which perform various useful operations; in addition, the user can define new macros. Built-ins and user-defined macros work exactly the same way, except that some of the built-in macros have side effects on the state of the process.
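On UNIX, use the command m4 [files]. Each argument file is processed in order; if there are no arguments, or if an argument is `-', the standard input is read at that point. The processed text is written on the standard output.
The primary built-in function of M4 is define, which is used to define new macros. The input define(name, stuff) causes the string name to be defined as stuff; all subsequent occurrences of name are replaced by stuff. The name must be alphanumeric and must begin with a letter (the underscore counts as a letter).
Thus, as a typical example, define(N, 100) defines N to be 100, and a later statement such as if (i > N) uses N as a ``symbolic constant''.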
The left parenthesis must immediately follow the word define, to signal that define has arguments. If a macro or built-in name is not followed immediately by `(', it is assumed to have no arguments. This is the situation for N above; it is actually a macro with no arguments, and thus when it is used there need be no (...) following it.
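You should also notice that a macro name is only recognized as such if it appears surrounded by non-alphanumerics. For example, after define(N, 100), the variable NNN in the statement if (NNN > 100) is absolutely unrelated to the defined macro N, even though it contains a lot of N's.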
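The points above can be sketched as a small input file. This is only an illustration: the statements are made up for the purpose, and the sketch assumes a present-day m4 such as GNU m4. The built-in dnl, described later, discards the rest of its line, so it serves here both for comments and to keep blank lines out of the output.

```m4
dnl Define N as a symbolic constant; the trailing dnl swallows the newline.
define(N, 100)dnl
dnl N stands alone here, so it is recognized and replaced.
if (i > N)
dnl NNN is a different token, so only the final N is replaced.
if (NNN > N)
```

Run through m4, this should produce the two lines if (i > 100) and if (NNN > 100).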
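Things may be defined in terms of other things. For example, define(N, 100) followed by define(M, N) defines both M and N to be 100.
What happens if N is redefined? Or, to say it another way, is M defined as N or as 100? In M4, the latter is true: M is 100, so even if N subsequently changes, M does not.
This behavior arises because M4 expands macro names into their defining text as soon as it possibly can. Here, that means that when the string N is seen as the arguments of define are being collected, it is immediately replaced by 100; it's just as if you had said define(M, 100) in the first place.
If this isn't what you really want, there are two ways out of it. The first, which is specific to this situation, is to interchange the order of the definitions, writing define(M, N) before define(N, 100). Now M is defined to be the string N, so when you ask for M later, you always get the value of N at that time (because the M is replaced by N, which is in turn replaced by 100).
The more general solution is to delay the expansion of the arguments of define by quoting them. Any text surrounded by the single quotes ` and ' is not expanded immediately, but has the quotes stripped off. If you say define(N, 100) followed by define(M, `N'), then the quotes around the N are stripped off as the argument is being collected, but they have served their purpose, and M is defined as the string N, not 100. The general rule is that M4 always strips off one level of single quotes whenever it evaluates something.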
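The difference between immediate and delayed expansion can be seen in a short sketch. The names M1 and M2 are invented for the illustration, and the behavior shown assumes a present-day m4 such as GNU m4.

```m4
define(N, 100)dnl
dnl Unquoted: N is expanded while the arguments are collected, so M1 is 100.
define(M1, N)dnl
dnl Quoted: the quotes are stripped but N is not expanded, so M2 is the string N.
define(M2, `N')dnl
dnl Change N; the name is quoted so that it is not expanded here.
define(`N', 200)dnl
M1
M2
```

The output should be 100 followed by 200: M1 froze the old value of N, while M2 is rescanned each time it is used and so picks up the current one.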
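As another instance of the same thing, which is a bit more surprising, consider redefining N with define(N, 100) followed later by define(N, 200). Perhaps regrettably, the N in the second definition is evaluated as soon as it is seen; that is, it is replaced by 100, so it is as if you had written define(100, 200). This obviously doesn't have the effect you wanted. To really redefine N, you must delay the evaluation by quoting the name, as in define(`N', 200). In M4, it is often wise to quote the first argument of a macro.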
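A minimal sketch of the redefinition pitfall follows, again assuming a present-day m4 such as GNU m4. Implementations may differ in whether they ignore the faulty definition or quietly accept a macro named 100, but in either case N is left unchanged.

```m4
define(N, 100)dnl
dnl N is expanded first, so this acts like define(100, 200) and N is unchanged.
define(N, 200)dnl
N
dnl With the name quoted, N itself is redefined.
define(`N', 200)dnl
N
```

The output should be 100 and then 200.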
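If ` and ' are not convenient for some reason, the quote characters can be changed with the built-in changequote: changequote([, ]) makes the new quote characters the left and right brackets. You can restore the original characters with just changequote.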
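As a small illustration of changequote, the following sketch switches to brackets and then uses them exactly as ` and ' were used before. The macro names are only examples, and a present-day m4 such as GNU m4 is assumed.

```m4
changequote([, ])dnl
define(N, 100)dnl
dnl The brackets now delay expansion, so M is defined as the string N.
define(M, [N])dnl
M
dnl Quoted text is copied with one level of brackets stripped.
[N]
```

The output should be 100 (M is replaced by N, which is replaced by 100) and then the literal N.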
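There are two additional built-ins related to define. undefine removes the definition of some macro or built-in: undefine(`N') removes the definition of N. The quotes are necessary, because otherwise N would be replaced by its value before undefine saw it. Built-ins can be removed with undefine too, as in undefine(`define'), but once you remove one, you can never get it back.
The built-in ifdef provides a way to determine if a macro is currently defined. In particular, M4 has pre-defined the names unix and gcos on the corresponding systems, so you can tell which one you're using: ifdef(`unix', `define(wordsize,16)') and ifdef(`gcos', `define(wordsize,36)') together make a definition appropriate for the particular machine. Don't forget the quotes!
ifdef actually permits three arguments; if the name is undefined, the value of ifdef is then the third argument, as in ifdef(`unix', on UNIX, not on UNIX).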
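The two- and three-argument forms of ifdef can be sketched as follows. The names wordsize and pagesize are chosen only for illustration, and the sketch avoids unix and gcos because whether they are pre-defined depends on the m4 in use.

```m4
define(wordsize, 16)dnl
dnl The name is defined, so the second argument is the result.
ifdef(`wordsize', `have it', `missing')
dnl The name is not defined, so the third argument is the result.
ifdef(`pagesize', `have it', `missing')
dnl undefine removes the definition; the quotes keep wordsize from expanding first.
undefine(`wordsize')dnl
ifdef(`wordsize', `have it', `missing')
```

The output should be have it, missing, and missing.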
So far we have discussed the simplest form of macro processing: replacing one string by another (fixed) string. User-defined macros may also have arguments, so different invocations can have different results. Within the replacement text for a macro (the second argument of its define) any occurrence of $n will be replaced by the nth argument when the macro is actually used. Thus, the macro bump, defined as define(bump, $1 = $1 + 1), generates code to increment its argument by 1: bump(x) is x = x + 1.
A macro can have as many arguments as you want, but only the first nine are accessible, through $1 to $9. (The macro name itself is $0, although that is less commonly used.) Arguments that are not supplied are replaced by null strings, so we can define a macro cat which simply concatenates its arguments, like this: define(cat, $1$2$3$4$5$6$7$8$9). Thus cat(x, y, z) is equivalent to xyz; $4 through $9 are null, since no corresponding arguments were provided.
Leading unquoted blanks, tabs, or newlines that occur during argument collection are discarded. All other white space is retained. Thus define(a, b c) defines a to be b c.
Arguments are separated by commas, but parentheses are counted properly, so a comma ``protected'' by parentheses does not terminate an argument. That is, in define(a, (b,c)) there are only two arguments; the second is literally (b,c). A bare comma or parenthesis can also be inserted by quoting it.
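The macros bump and cat described above can be tried with a short input such as the following sketch. The defining text is quoted here as a matter of habit, and a present-day m4 such as GNU m4 is assumed.

```m4
dnl $1 is replaced by the first argument wherever it appears.
define(bump, `$1 = $1 + 1')dnl
dnl Arguments that are not supplied are replaced by null strings.
define(cat, `$1$2$3$4$5$6$7$8$9')dnl
bump(x)
cat(x, y, z)
```

The output should be x = x + 1 and xyz.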
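The rules for white space and for protected commas can be illustrated with a few definitions. The macro names spaced, pair, and second are hypothetical, and the sketch assumes a present-day m4 such as GNU m4.

```m4
dnl Leading blanks of an argument are dropped; the blanks between b and c stay.
define(spaced,    b   c)dnl
spaced
dnl The comma inside the parentheses does not end the argument.
define(pair, (b,c))dnl
pair
dnl second returns its second argument, however the comma is protected.
define(second, `$2')dnl
second(x, (y,z))
second(x, `y,z')
```

The output should be b   c (with the three inner blanks preserved), then (b,c), then (y,z), then y,z.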
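M4 provides two built-in functions for doing arithmetic on integers (only). The simplest is incr, which increments its numeric argument by 1. Thus to handle the common programming situation where you want a variable to be defined as ``one more than N'', write define(N, 100) followed by define(N1, `incr(N)'). Then N1 is defined as one more than the current value of N.
The more general mechanism for arithmetic is a built-in called eval, which is capable of arbitrary arithmetic on integers. It provides the operators (in decreasing order of precedence) unary + and -; ** or ^ (exponentiation); * / % (modulus); + -; == != < <= > >=; ! (not); & or && (logical and); | or || (logical or). Parentheses may be used to group operations where needed. All the operands of an expression given to eval must ultimately be numeric. The numeric value of a true relation (like 1>0) is 1, and false is 0.
As a simple example, suppose we want M to be 2**N+1. Then define(N, 3) followed by define(M, `eval(2**N+1)') does the job. As a matter of principle, it is advisable to quote the defining text for a macro unless it is very simple indeed (say just a number); it usually gives the result you want, and is a good habit to get into.
You can include a new file in the input at any time by the built-in function include: include(filename) inserts the contents of filename in place of the include command.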
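The two arithmetic built-ins can be sketched together. The sketch assumes a present-day m4 such as GNU m4 whose eval accepts ** for exponentiation, as the text describes; the values are chosen only for illustration.

```m4
define(N, 3)dnl
dnl The quotes delay evaluation, so N1 and M follow the current value of N.
define(N1, `incr(N)')dnl
define(M, `eval(2**N+1)')dnl
N1
M
dnl Modulus, and a true relation, whose value is 1.
eval(7 % 4)
eval(1 > 0)
```

The output should be 4, 9, 3, and 1.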
It is a fatal error if the file named in include cannot be accessed. To get some control over this situation, the alternate form sinclude can be used; sinclude (``silent include'') says nothing and continues if it can't access the file.
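It is also possible to divert the output of M4 to temporary files during processing, and output the collected material upon command. M4 maintains nine of these diversions, numbered 1 through 9. If you say divert(n), all subsequent output is put onto the end of a temporary file referred to as n. Diverting to this file is stopped by another divert command; in particular, divert or divert(0) resumes the normal output process.
Diverted text is normally output all at once at the end of processing, with the diversions output in numeric order. It is possible, however, to bring back diversions at any time, that is, to append them to the current diversion. The built-in undivert brings back all diversions in numeric order, and undivert with arguments brings back the selected diversions in the order given.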
The value of undivert is not the diverted stuff. Furthermore, the diverted material is not rescanned for macros.
The built-in divnum returns the number of the currently active diversion. This is zero during normal processing.
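The interplay of divert, undivert, and divnum can be sketched with a few lines of text. The wording of the lines is arbitrary, and a present-day m4 such as GNU m4 is assumed.

```m4
dnl Send what follows to diversion 1 instead of the output.
divert(1)dnl
this line goes into diversion 1
dnl Resume normal output.
divert(0)dnl
this line is written first
current diversion: divnum
dnl Bring diversion 1 back at this point rather than at the end.
undivert(1)dnl
this line is written last
```

The output should be, in order: this line is written first, current diversion: 0, this line goes into diversion 1, and this line is written last.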
You can run any program in the local operating system with the syscmd built-in. For example, syscmd(date) on UNIX runs the date command.
To facilitate making unique file names, the built-in maketemp is provided, with specifications identical to the system function mktemp: a string of XXXXX in the argument is replaced by the process id of the current process.
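There is a built-in called ifelse which enables you to perform arbitrary conditional testing. In the simplest form, ifelse(a, b, c, d) compares the two strings a and b. If these are identical, ifelse returns the string c; otherwise it returns d. Thus a macro compare, defined as define(compare, `ifelse($1, $2, yes, no)'), compares two strings and returns ``yes'' or ``no'' according to whether they are the same or different. Note the quotes, which prevent too-early evaluation of ifelse.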
If the fourth argument is missing, it is treated as empty.
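ifelse can actually have any number of arguments, and thus provides a limited form of multi-way decision capability. In the input ifelse(a, b, c, d, e, f, g), if the string a matches the string b, the result is c. Otherwise, if d is the same as e, the result is f. Otherwise the result is g. If the final argument is omitted, the result is null, so ifelse(a, b, c) is c if a matches b, and null otherwise.
The built-in len returns the length of the string that makes up its argument. Thus len(abcdef) is 6, and len((a,b)) is 5.
The built-in substr can be used to produce substrings of strings. substr(s, i, n) returns the substring of s that starts at the ith position (origin zero), and is n characters long. If n is omitted, the rest of the string is returned, so substr(`now is the time', 1) is ow is the time.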
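Both forms of ifelse can be sketched as follows. The macro compare is the one described above; the macro sign is a hypothetical example of the multi-way form and relies on eval for the numeric test. A present-day m4 such as GNU m4 is assumed.

```m4
dnl Four-argument form: yes if the two strings are identical, otherwise no.
define(compare, `ifelse($1, $2, yes, no)')dnl
compare(abc, abc)
compare(abc, xyz)
dnl Seven-argument form: test for 0, then for a negative value, else positive.
define(sign, `ifelse($1, 0, zero, eval($1 < 0), 1, negative, positive)')dnl
sign(0)
sign(-5)
sign(7)
```

The output should be yes, no, zero, negative, and positive.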
index(s1, s2) returns the index (position) in s1 where the string s2 occurs, or -1 if it doesn't occur. As with substr, the origin for strings is 0.
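The three string built-ins can be exercised with a short sketch; the sample strings are arbitrary, and a present-day m4 such as GNU m4 is assumed.

```m4
dnl The parentheses and the comma count as characters of the argument.
len(abcdef)
len((a,b))
dnl Positions are counted from zero.
substr(`now is the time', 1)
substr(`now is the time', 7, 3)
index(`now is the time', `is')
index(`now is the time', `was')
```

The output should be 6, 5, ow is the time, the, 4, and -1.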
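The built-in translit performs character transliteration. translit(s, f, t) modifies s by replacing any character found in f by the corresponding character of t. That is, translit(s, aeiou, 12345) replaces the vowels by the corresponding digits. If t is shorter than f, characters which don't have an entry in t are deleted; as a limiting case, if t is not present at all, characters from f are deleted from s. So translit(s, aeiou) deletes vowels from s.
There is also a built-in called dnl which deletes all characters that follow it up to and including the next newline; it is useful mainly for throwing away empty lines that otherwise tend to clutter up M4 output. For example, if you say define(N, 100), define(M, 200), and define(L, 300) on three separate lines, the newline at the end of each line is not part of the definition, so it is copied into the output, where it may not be wanted. If you add dnl to each of these lines, the newlines will disappear.
Another way to achieve this, due to J. E. Weythman, is to put divert(-1) before the definitions and divert after them. This works because output diverted to -1 is discarded, so the unwanted newlines vanish while the definitions still take effect.
The built-in errprint writes its arguments out on the standard error file. Thus you can say errprint(`fatal error').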
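A brief sketch of translit with and without its third argument follows; the sample string is arbitrary, and a present-day m4 such as GNU m4 is assumed.

```m4
dnl Each vowel is replaced by the digit in the same position of the third argument.
translit(`macro processor', aeiou, 12345)
dnl With no third argument, the vowels are simply deleted.
translit(`macro processor', aeiou)
```

The output should be m1cr4 pr4c2ss4r and mcr prcssr.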
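Both ways of suppressing unwanted newlines can be sketched in one input. The sketch assumes a present-day m4 such as GNU m4, in which output sent to diversion -1 is discarded; the macro names are only examples.

```m4
dnl First method: end each definition with dnl.
define(N, 100)dnl
define(M, 200)dnl
dnl Second method: discard everything, newlines included, while defining.
divert(-1)
define(L, 300)
define(K, 400)
divert(0)dnl
N M L K
```

The output should be the single line 100 200 300 400, with no blank lines before it.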
dumpdef is a debugging aid which dumps the current definitions of defined terms. If there are no arguments, you get everything; otherwise you get the ones you name as arguments. Don't forget to quote the names!
We are indebted to Rick Becker, John Chambers, Doug McIlroy, and especially Jim Weythman, whose pioneering use of M4 has led to several valuable improvements. We are also deeply grateful to Weythman for several substantial contributions to the code.