awk, section 1.

1. Introduction

Awk is a programming language designed to make many common information retrieval and text manipulation tasks easy to state and to perform.

The basic operation of awk is to scan a set of input lines in order, searching for lines which match any of a set of patterns which the user has specified. For each pattern, an action can be specified; this action will be performed on each line that matches the pattern.

Readers familiar with the UNIX program grep (see the UNIX Programmer's Manual) will recognize the approach, although in awk the patterns may be more general than in grep, and the actions allowed are more involved than merely printing the matching line. For example, the awk program

: {print $3, $2}

prints the third and second columns of a table in that order. The program

: $2 ~ /A|B|C/

prints all input lines with an A, B, or C in the second field. The program

: $1 != prev { print; prev = $1 }

prints all lines in which the first field is different from the previous first field.
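
As a minimal sketch, the three programs above can be tried on a small table supplied on the standard input. The sample data and the shell variable names (data, cols, abc, changes) are made up for illustration and are not part of the original text.

```shell
# Hypothetical three-column sample table.
data='x A 1
x Q 2
y Z 3
y B 4'

# Third and second columns, in that order.
cols=$(printf '%s\n' "$data" | awk '{ print $3, $2 }')

# Lines with an A, B, or C in the second field.
abc=$(printf '%s\n' "$data" | awk '$2 ~ /A|B|C/')

# Lines whose first field differs from the previous first field.
changes=$(printf '%s\n' "$data" | awk '$1 != prev { print; prev = $1 }')

printf '%s\n---\n%s\n---\n%s\n' "$cols" "$abc" "$changes"
```

In the last program, prev starts out empty, so the first input line always counts as a change.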

1.1. Usage

The command

: awk program [files]

executes the awk commands in the string program on the set of named files, or on the standard input if there are no files. The statements can also be placed in a file pfile, and executed by the command

: awk -f pfile [files]
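
The two invocation forms can be sketched as follows. The scratch directory and the file names input.txt and pfile are assumptions for illustration. When the program is given as a string from a shell, it is normally enclosed in single quotes so that the shell does not interpret characters such as $.

```shell
# Hypothetical scratch files; the names are illustrative only.
dir=$(mktemp -d)
printf 'one two\nthree four\n' > "$dir/input.txt"
printf '{ print $2 }\n' > "$dir/pfile"

# Program given as a string on the command line.
from_string=$(awk '{ print $2 }' "$dir/input.txt")

# The same program read from a file with -f.
from_file=$(awk -f "$dir/pfile" "$dir/input.txt")

# With no files named, awk reads the standard input.
from_stdin=$(awk '{ print $2 }' < "$dir/input.txt")

printf '%s\n' "$from_string" "$from_file" "$from_stdin"
rm -rf "$dir"
```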

1.2. Program Structure

An awk program is a sequence of statements of the form:

: pattern { action }
: pattern { action }
: ...

Each line of input is matched against each of the patterns in turn. For each pattern that matches, the associated action is executed. When all the patterns have been tested, the next line is fetched and the matching starts over.

Either the pattern or the action may be left out, but not both. If there is no action for a pattern, the matching line is simply copied to the output. (Thus a line which matches several patterns can be printed several times.) If there is no pattern for an action, then the action is performed for every input line. A line which matches no pattern is ignored.

Since patterns and actions are both optional, actions must be enclosed in braces to distinguish them from patterns.
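
The matching rules above can be illustrated with a short sketch. The three-line input and the particular patterns are assumptions chosen so that one line matches several patterns, one matches a single pattern, and one matches none.

```shell
out=$(printf 'apple\nbanana\ncherry\n' | awk '
/an/                          # no action: the matching line is copied
/^b/                          # also matches banana, so it is copied again
/a/ { print "has a:", $0 }    # pattern with an explicit action
')
printf '%s\n' "$out"
```

Here banana is printed twice by the two action-less patterns and once more by the third statement, while cherry matches no pattern and produces no output.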

1.3. Records and Fields

Awk input is divided into "records" terminated by a record separator. The default record separator is a newline, so by default awk processes its input a line at a time. The number of the current record is available in a variable named NR.

Each input record is considered to be divided into "fields." Fields are normally separated by white space -- blanks or tabs -- but the input field separator may be changed, as described below. Fields are referred to as $1, $2, and so forth, where $1 is the first field, and $0 is the whole input record itself. Fields may be assigned to. The number of fields in the current record is available in a variable named NF.
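
A brief sketch of NR, NF, and assignment to a field follows. The input lines and the variable names info and changed are illustrative assumptions. The rebuilt record shown for the assignment reflects the behavior of current awk implementations, where $0 is reassembled from the fields after a field is changed.

```shell
# Record number, number of fields, and first field of each record.
info=$(printf 'a b c\nd e\n' | awk '{ print NR, NF, $1 }')

# Assigning to a field changes the record that is subsequently printed.
changed=$(printf 'a b c\n' | awk '{ $2 = "X"; print $0 }')

printf '%s\n%s\n' "$info" "$changed"
```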

The variables FS and RS refer to the input field and record separators; they may be changed at any time to any single character. The optional command-line argument -Fc may also be used to set FS to the character c.
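
Both ways of setting the field separator can be sketched on colon-separated input. The sample data is made up. The second form uses a BEGIN pattern, which is not described in this section, to set FS before any input is read. This is an assumption about how one would typically arrange it rather than something stated above.

```shell
# -Fc sets FS to the single character c, here a colon.
with_flag=$(printf 'root:x:0\nbwk:x:12\n' | awk -F: '{ print $1, $3 }')

# Setting FS inside the program, before the first record is read.
in_program=$(printf 'root:x:0\nbwk:x:12\n' | awk 'BEGIN { FS = ":" } { print $1, $3 }')

printf '%s\n%s\n' "$with_flag" "$in_program"
```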

If the record separator is empty, an empty input line is taken as the record separator, and blanks, tabs and newlines are treated as field separators.
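
The empty record separator can be sketched with two groups of lines separated by an empty line. The input is illustrative, and RS is set in a BEGIN pattern (not covered in this section) so that it takes effect before the first record is read.

```shell
# Two multi-line records separated by an empty line.
paragraphs=$(printf 'a b\nc\n\nd\ne f\n' |
  awk 'BEGIN { RS = "" } { print NR, NF, $3 }')

printf '%s\n' "$paragraphs"
```

Each group is one record, and the newline inside a group acts as a field separator, so both records have three fields.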

The variable FILENAME contains the name of the current input file.
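
A small sketch of FILENAME with two input files follows. The scratch directory and the file names a.txt and b.txt are hypothetical.

```shell
dir=$(mktemp -d)
printf 'alpha\n' > "$dir/a.txt"
printf 'beta\n'  > "$dir/b.txt"

# Run inside the scratch directory so FILENAME shows the short names.
names=$(cd "$dir" && awk '{ print FILENAME, NR, $0 }' a.txt b.txt)

printf '%s\n' "$names"
rm -rf "$dir"
```

Note that NR keeps counting across files, while FILENAME changes when the second file is opened.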

1.4. Printing

An action may have no pattern, in which case the action is executed for all lines. The simplest action is to print some or all of a record; this is accomplished by the awk command print. The awk program

: { print }

prints each record, thus copying the input to the output intact. More useful is to print a field or fields from each record. For instance,

: { print $2, $1 }

prints the first two fields in reverse order. Items separated by a comma in the print statement will be separated by the current output field separator when output. Items not separated by commas will be concatenated, so

: { print $1 $2 }

runs the first and second fields together.
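
The difference between the comma and plain juxtaposition can be seen in a two-line sketch. The input line is made up for illustration.

```shell
# Comma: items are separated by the output field separator (a blank by default).
spaced=$(printf 'left right\n' | awk '{ print $2, $1 }')

# No comma: items are concatenated.
joined=$(printf 'left right\n' | awk '{ print $1 $2 }')

printf '%s\n%s\n' "$spaced" "$joined"
```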

The predefined variables NF and NR can be used; for example

: { print NR, NF, $0 }

prints each record preceded by the record number and the number of fields.

Output may be diverted to multiple files; the program

: { print $1 >"foo1"; print $2 >"foo2" }

writes the first field, $1, on the file foo1, and the second field on file foo2. The >> notation can also be used:

: { print $1 >>"foo" }

appends the output to the file foo. (In each case, the output files are created if necessary.) The file name can be a variable or a field as well as a constant; for example,

: { print $1 >$2 }

uses the contents of field 2 as a file name.
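
The three forms of file output can be sketched in a scratch directory. The directory, the input lines, and the file names out1 and out2 are assumptions for illustration. foo1, foo2, and foo are the names used in the text.

```shell
dir=$(mktemp -d)
(
  cd "$dir" || exit 1
  # One output file per field.
  printf 'a b\nc d\n' | awk '{ print $1 >"foo1"; print $2 >"foo2" }'
  # Appending: the second run adds to what the first run wrote.
  printf 'first x\n'  | awk '{ print $1 >>"foo" }'
  printf 'second y\n' | awk '{ print $1 >>"foo" }'
  # File name taken from the second field of each record.
  printf 'hello out1\nworld out2\n' | awk '{ print $1 >$2 }'
)
foo1=$(cat "$dir/foo1")
foo2=$(cat "$dir/foo2")
foo=$(cat "$dir/foo")
out1=$(cat "$dir/out1")
out2=$(cat "$dir/out2")
printf '%s\n' "$foo1" "$foo2" "$foo" "$out1" "$out2"
rm -rf "$dir"
```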

Naturally there is a limit on the number of output files; currently it is 10.

Similarly, output can be piped into another process (on UNIX only); for instance,

: { print | "mail bwk" }

mails the output to bwk.
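
As a sketch that can be run without sending mail, the same construction can pipe output into sort instead. The choice of sort and the input words are assumptions for illustration.

```shell
# Every printed line goes to one sort process, which writes its
# result when awk finishes and the pipe is closed.
sorted=$(printf 'pear\napple\nfig\n' | awk '{ print $1 | "sort" }')

printf '%s\n' "$sorted"
```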

The variables OFS and ORS may be used to change the current output field separator and output record separator. The output record separator is appended to the output of the print statement.
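
The effect of OFS and ORS can be sketched as follows. The separator values and the input are made up, and the variables are set in a BEGIN pattern (not described in this section) so that they apply from the first record on.

```shell
# A comma between items and a semicolon after each print statement.
separated=$(printf 'a b\nc d\n' |
  awk 'BEGIN { OFS = ","; ORS = ";" } { print $1, $2 }')

printf '%s\n' "$separated"
```

Because ORS replaces the newline that print would normally append, the whole output ends up on a single line.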

Awk also provides the printf statement for output formatting:

: printf format, expr, expr, ...

formats the expressions in the list according to the specification in format and prints them. For example,

: printf "%8.2f %10ld\n", $1, $2

prints $1 as a floating point number 8 digits wide, with two after the decimal point, and $2 as a 10-digit long decimal number, followed by a newline. No output separators are produced automatically; you must add them yourself, as in this example. The version of printf is identical to that used with C (Kernighan and Ritchie, The C Programming Language, Prentice-Hall, 1978).
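
A minimal sketch of the printf statement follows. The input values are made up, and the sketch uses %10d rather than the %10ld of the text, on the assumption that not every awk implementation accepts the l length modifier.

```shell
# $1 in a field 8 wide with 2 decimals, $2 right-justified in a field 10 wide.
formatted=$(printf '3.14159 42\n' | awk '{ printf "%8.2f %10d\n", $1, $2 }')

printf '[%s]\n' "$formatted"
```

The brackets in the final printf are only there to make the leading blanks visible.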