Awk is a programming language designed to make many common information retrieval and text manipulation tasks easy to state and to perform.
The basic operation of awk is to scan a set of input lines in order, searching for lines which match any of a set of patterns which the user has specified. For each pattern, an action can be specified; this action will be performed on each line that matches the pattern.
Readers familiar with the UNIX program grep will recognize the approach, although in awk the patterns may be more general than in grep, and the actions allowed are more involved than merely printing the matching line.
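As a minimal sketch of the comparison with grep, the commands below run two one-line awk programs over a small table. The table contents and the shell variable names (table, matched, selected) are hypothetical and chosen only for illustration.

```shell
# Hypothetical three-column table: name, department, amount.
table='alice sales 300
bob eng 450
carol sales 500'

# grep-like use: a pattern with no action prints every matching line.
matched=$(printf '%s\n' "$table" | awk '/sales/')

# Beyond grep: test a single field numerically and print
# selected fields in a different order.
selected=$(printf '%s\n' "$table" | awk '$3 > 400 { print $3, $1 }')

printf '%s\n' "$matched" "$selected"
```

The first program behaves like grep sales; the second selects on the third field alone and rearranges the output, which grep cannot do.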
An awk program is a sequence of statements of the form:

    pattern { action }
Either the pattern or the action may be left out, but not both. If there is no action for a pattern, the matching line is simply copied to the output. (Thus a line which matches several patterns can be printed several times.) If there is no pattern for an action, then the action is performed for every input line. A line which matches no pattern is ignored.
Since patterns and actions are both optional, actions must be enclosed in braces to distinguish them from patterns.
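The rules about omitted patterns and actions can be sketched as follows. The input words and shell variable names are hypothetical; the last program places two pattern-only statements on separate lines so that one input line matches both.

```shell
input='apple
banana
cherry'

# Pattern without action: matching lines are copied to the output.
only_pattern=$(printf '%s\n' "$input" | awk '/an/')

# Action without pattern: the action runs for every input line.
only_action=$(printf '%s\n' "$input" | awk '{ print "saw", $0 }')

# Two patterns that both match "banana": the line is printed twice.
twice=$(printf '%s\n' "$input" | awk '/ban/
/ana/')

printf '%s\n' "$only_pattern" "$only_action" "$twice"
```

Lines such as apple and cherry match neither pattern in the last program and so produce no output there.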
Awk input is divided into ``records'' terminated by a record separator. The default record separator is a newline, so by default awk processes its input a line at a time. The number of the current record is available in a variable named NR.
Each input record is considered to be divided into ``fields.'' Fields are normally separated by white space -- blanks or tabs -- but the input field separator may be changed, as described below. Fields are referred to as $1, $2, and so forth, where $1 is the first field, and $0 is the whole input record itself. Fields may be assigned to. The number of fields in the current record is available in a variable named NF.
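A short sketch of field handling, assuming a single hypothetical input line; the shell variable names are illustrative only.

```shell
line='the quick brown fox'

# NR is the record number, NF the number of fields,
# $1 and $4 the first and fourth fields.
summary=$(printf '%s\n' "$line" | awk '{ print NR, NF, $1, $4 }')

# Assigning to a field changes the record, so $0 reflects the new value.
changed=$(printf '%s\n' "$line" | awk '{ $2 = "slow"; print $0 }')

printf '%s\n' "$summary" "$changed"
```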
The variables FS and RS refer to the input field and record separators; they may be changed at any time to any single character. The optional command-line argument -Fc may also be used to set FS to the character c.
If the record separator is empty, an empty input line is taken as the record separator, and blanks, tabs and newlines are treated as field separators.
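The separator settings can be tried out as sketched below. The sample inputs are hypothetical. The two programs that assign FS and RS do so in a BEGIN pattern, a standard awk feature not described in this passage, so that the new value is in effect before the first record is read.

```shell
# -F: sets FS to ":" from the command line.
by_flag=$(printf 'root:x:0\n' | awk -F: '{ print $1, $3 }')

# FS assigned inside the program, before any input is read.
by_fs=$(printf 'a,b,c\n' | awk 'BEGIN { FS = "," } { print $2 }')

# Empty RS: records are separated by empty lines, and newlines
# inside a record act as field separators alongside blanks and tabs.
paragraphs=$(printf 'a b\nc\n\nd\ne\nf g\n' |
    awk 'BEGIN { RS = "" } { print NR ": " NF }')

printf '%s\n' "$by_flag" "$by_fs" "$paragraphs"
```

In the last command the first record holds the fields a, b and c, and the second holds d, e, f and g.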
The variable FILENAME contains the name of the current input file.
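As an illustration, the sketch below labels each input line with the file it came from. The file names a.txt and b.txt and the temporary directory are hypothetical and are created only for this example.

```shell
# Create two small input files in a scratch directory.
dir=$(mktemp -d)
printf 'one\ntwo\n' > "$dir/a.txt"
printf 'three\n' > "$dir/b.txt"

# FILENAME changes as awk moves from one input file to the next.
names=$(cd "$dir" && awk '{ print FILENAME ": " $0 }' a.txt b.txt)

rm -r "$dir"
printf '%s\n' "$names"
```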
An action may have no pattern, in which case the action is executed for all lines. The simplest action is to print some or all of a record; this is accomplished by the awk command print.
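Three small print programs, sketched on a hypothetical two-field input line, show the common forms: print with no arguments, print with comma-separated items, and print with items written side by side.

```shell
# A bare print copies the whole record to the output.
copy=$(printf 'x y\n' | awk '{ print }')

# Items separated by commas are printed with the output
# field separator (a blank by default) between them.
swapped=$(printf 'x y\n' | awk '{ print $2, $1 }')

# Items written without a comma are concatenated.
joined=$(printf 'x y\n' | awk '{ print $1 $2 }')

printf '%s\n' "$copy" "$swapped" "$joined"
```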
The predefined variables NF and NR can be used in a print statement.
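A minimal sketch, using hypothetical input, that prints each record preceded by its record number and its number of fields:

```shell
numbered=$(printf 'a b c\nd e\n' | awk '{ print NR, NF, $0 }')
printf '%s\n' "$numbered"
```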
Output may be diverted to multiple files.
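Diverting output can be sketched as below: the first field of each line goes to one file and the second field to another. The file names foo1 and foo2, the sample input, and the scratch directory are assumptions made for the example.

```shell
dir=$(mktemp -d)

# Write field 1 to foo1 and field 2 to foo2, inside the scratch directory.
printf 'a 1\nb 2\n' |
    (cd "$dir" && awk '{ print $1 > "foo1"; print $2 > "foo2" }')

first=$(cat "$dir/foo1")
second=$(cat "$dir/foo2")
rm -r "$dir"

printf '%s\n' "$first" "$second"
```

The file names are quoted inside the awk program; an unquoted name would be treated as a variable.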
Naturally there is a limit on the number of output files; currently it is 10.
Similarly, output can be piped into another process (on UNIX only).
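As a sketch, the program below pipes the first field of every line into the standard sort utility. The choice of sort and the sample words are assumptions for illustration; any command string could appear after the bar.

```shell
# Every print sends its output to the same "sort" process,
# which writes its sorted result when awk finishes.
sorted=$(printf 'pear\napple\nfig\n' | awk '{ print $1 | "sort" }')
printf '%s\n' "$sorted"
```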
The variables OFS and ORS may be used to change the current output field separator and output record separator. The output record separator is appended to the output of the print statement.
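A small sketch of both output separators, assuming hypothetical input. The assignments are made in a BEGIN pattern, a standard awk feature not covered in this passage, so that they apply from the first record on.

```shell
# OFS goes between the comma-separated items of print;
# ORS is appended after each print in place of the newline.
joined=$(printf 'a b\nc d\n' |
    awk 'BEGIN { OFS = "-"; ORS = ";" } { print $1, $2 }')
printf '%s\n' "$joined"
```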
Awk also provides the printf statement for output formatting.
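A minimal sketch of printf, assuming a hypothetical input line with a word and a number. The format string here is illustrative: it pads the first field to five characters on the right and prints the second as a floating-point number in a field of width five with one decimal place.

```shell
# Unlike print, printf adds no separators or newline of its own;
# everything comes from the format string.
formatted=$(printf 'ab 3.14159\n' | awk '{ printf "%-5s|%5.1f|\n", $1, $2 }')
printf '%s\n' "$formatted"
```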