3.  Actions

      An awk action is a sequence of action statements terminated by newlines or semicolons. These statements can be used for a variety of bookkeeping and string-manipulation tasks.

3.1.  Built-in Functions

      Awk provides a ``length'' function to compute the length of a string of characters. This program prints each record, preceded by its length:

{print length, $0}
length by itself is a ``pseudo-variable'' which yields the length of the current record; length(argument) is a function which yields the length of its argument, as in the equivalent
{print length($0), $0}
The argument may be any expression.
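The two forms can be compared directly. This is a minimal sketch, assuming a POSIX shell and a modern awk on the PATH; the helper names length_bare and length_call and the two sample lines are hypothetical.

```shell
# Hypothetical helpers: feed two made-up lines to each form of the program.
length_bare() {
    printf 'apple\nkiwi fruit\n' | awk '{ print length, $0 }'
}
length_call() {
    printf 'apple\nkiwi fruit\n' | awk '{ print length($0), $0 }'
}

length_bare    # prints "5 apple" and "10 kiwi fruit"
```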

      Awk also provides the arithmetic functions sqrt, log, exp, and int, for square root, base e logarithm, exponential, and integer part of their respective arguments.
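A small sketch of these functions, assuming a POSIX shell and a modern awk; the helper name math_demo is hypothetical, and the work is done in a BEGIN action so that no input is read.

```shell
# Hypothetical helper; everything runs in BEGIN, so awk reads no input.
math_demo() {
    awk 'BEGIN {
        print sqrt(16)       # square root: 4
        print int(3.9)       # integer part: 3
        print exp(log(10))   # exp undoes the base e logarithm: 10
    }'
}

math_demo
```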

      The name of one of these built-in functions, without argument or parentheses, stands for the value of the function on the whole record. The program

length < 10 || length > 20
prints lines whose length is less than 10 or greater than 20.
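To see the filter at work, the following sketch (assuming a POSIX shell and a modern awk; the helper name and the three input lines are hypothetical) feeds it lines of length 5, 15, and 39.

```shell
# Hypothetical helper with made-up input lines of length 5, 15 and 39.
length_filter() {
    printf '%s\n' 'short' 'exactly fifteen' 'this line is clearly longer than twenty' |
        awk 'length < 10 || length > 20'
}

length_filter    # the 15-character line is dropped
```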

      The function substr(s, m, n) produces the substring of s that begins at position m (origin 1) and is at most n characters long. If n is omitted, the substring goes to the end of s. The function index(s1, s2) returns the position where the string s2 occurs in s1, or zero if it does not.
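The positions are easiest to follow on a concrete string. This sketch assumes a POSIX shell and a modern awk; the helper name substr_index_demo and the string s are hypothetical.

```shell
# Hypothetical helper; s is a made-up sample string.
substr_index_demo() {
    awk 'BEGIN {
        s = "pattern scanning"
        print substr(s, 1, 7)     # at most 7 characters from position 1: pattern
        print substr(s, 9)        # n omitted, so to the end of s: scanning
        print index(s, "scan")    # "scan" begins at position 9
        print index(s, "awk")     # not found: 0
    }'
}

substr_index_demo
```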

      The function sprintf(f, e1, e2, ...) produces a string containing the values of the expressions e1, e2, etc., formatted according to the printf format specified by f. Thus, for example,

x = sprintf("%8.2f %10ld", $1, $2)
sets x to the string produced by formatting the values of $1 and $2.
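A runnable version of this example, assuming a POSIX shell, a modern awk, and a made-up record whose fields $1 and $2 are 3.14159 and 42. For portability the sketch writes %10d rather than %10ld; the brackets only make the padding visible, and the helper name is hypothetical.

```shell
# Hypothetical helper: format two made-up fields, then show the padding inside brackets.
sprintf_demo() {
    printf '3.14159 42\n' | awk '{
        x = sprintf("%8.2f %10d", $1, $2)   # width 8 with 2 decimals, then width 10
        print "[" x "]"
    }'
}

sprintf_demo    # prints "[    3.14         42]"
```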

3.2.  Variables, Expressions, and Assignments

      Awk variables take on numeric (floating point) or string values according to context. For example, in

x = 1
x is clearly a number, while in
x = "smith"
it is clearly a string. Strings are converted to numbers and vice versa whenever context demands it. For instance,
x = "3" + "4"
assigns 7 to x. Strings which cannot be interpreted as numbers in a numerical context will generally have numeric value zero, but it is unwise to count on this behavior.
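Conversion works in both directions, as this sketch shows (assuming a POSIX shell and a modern awk; the helper name conversion_demo is hypothetical). It relies only on strings that look like numbers, not on the zero-value behavior the text warns against.

```shell
# Hypothetical helper showing conversions in both directions.
conversion_demo() {
    awk 'BEGIN {
        x = "3" + "4"           # strings in an arithmetic context: x is 7
        print x
        print length(x * 100)   # number in a string context: "700" has length 3
    }'
}

conversion_demo
```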

      By default, variables (other than built-ins) are initialized to the null string, which has numerical value zero; this eliminates the need for most BEGIN sections. For example, the sums of the first two fields can be computed by

{ s1 += $1; s2 += $2 }
END { print s1, s2 }

      Arithmetic is done internally in floating point. The arithmetic operators are +, -, *, /, and % (mod). The C increment ++ and decrement -- operators are also available, and so are the assignment operators +=, -=, *=, /=, and %=. These operators may all be used in expressions.
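The operators can be exercised in one BEGIN action. This is a sketch assuming a POSIX shell and a modern awk; the helper name operators_demo and the values are hypothetical. Note that / is floating-point division, so 7 / 2 is 3.5 rather than 3.

```shell
# Hypothetical helper exercising the arithmetic and assignment operators.
operators_demo() {
    awk 'BEGIN {
        print 7 / 2      # floating-point division: 3.5
        print 7 % 3      # remainder: 1
        n = 5
        n += 2           # 7
        n *= 3           # 21
        n %= 4           # 1
        print n
        i = 1
        j = i++          # j receives the old value 1; i becomes 2
        print j, i
    }'
}

operators_demo
```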

3.3.  Field Variables

      Fields in awk share essentially all of the properties of variables: they may be used in arithmetic or string operations, and may be assigned to. Thus one can replace the first field with a sequence number like this:

{ $1 = NR; print }
or accumulate two fields into a third, like this:
{ $1 = $2 + $3; print $0 }
or assign a string to a field:
{ if ($3 > 1000)
      $3 = "too big"
  print
}
which replaces the third field by ``too big'' when it exceeds 1000, and in any case prints the record.
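Two of these programs on made-up input, as a sketch assuming a POSIX shell and a modern awk; the helper names are hypothetical. Assigning to a field rebuilds the record with the fields separated by single blanks, which is what print then shows.

```shell
# Hypothetical helpers; the input records are made up.
number_records() {
    printf 'x alpha\ny beta\n' | awk '{ $1 = NR; print }'
}
cap_third() {
    printf 'a 5 2000\nb 7 10\n' | awk '{
        if ($3 > 1000)
            $3 = "too big"
        print
    }'
}

number_records   # "1 alpha", "2 beta"
cap_third        # "a 5 too big", "b 7 10"
```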

      Field references may be numerical expressions, as in

{ print $i, $(i+1), $(i+n) }
Whether a field is deemed numeric or string depends on context; in ambiguous cases like
if ($1 == $2) ...
fields are treated as strings.
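A sketch of computed field references, assuming a POSIX shell and a modern awk; the helper name and the values of i and n are hypothetical. The parentheses matter: $i+1 would mean ($i)+1, not field i+1.

```shell
# Hypothetical helper; i and n are set by hand to show computed field numbers.
field_expr_demo() {
    printf 'a b c d e\n' | awk '{
        i = 2; n = 3
        print $i, $(i+1), $(i+n)   # fields 2, 3 and 5
    }'
}

field_expr_demo    # prints "b c e"
```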

      Each input line is split into fields automatically as necessary. It is also possible to split any variable or string into fields:

n = split(s, array, sep)
splits the string s into array[1], ..., array[n] and returns n, the number of elements found. If the sep argument is provided, it is used as the field separator; otherwise FS is used.
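Both forms of split, as a sketch assuming a POSIX shell and a modern awk; the helper name split_demo and the strings being split are hypothetical. With the default FS, runs of blanks separate fields.

```shell
# Hypothetical helper; the strings being split are made up.
split_demo() {
    awk 'BEGIN {
        n = split("usr:local:bin", part, ":")   # explicit separator
        print n, part[1], part[2], part[3]
        m = split("one two  three", word)       # no sep: FS is used (blanks by default)
        print m, word[3]
    }'
}

split_demo
```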

3.4.  String Concatenation

      Strings may be concatenated. For example

length($1 $2 $3)
returns the length of the first three fields. Or in a print statement,
print $1 " is " $2
prints the two fields separated by `` is ''. Variables and numeric expressions may also appear in concatenations.
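Concatenation with fields, string constants, and a numeric expression, in a sketch assuming a POSIX shell and a modern awk; the helper name and input record are hypothetical. The numeric expression is parenthesized for clarity.

```shell
# Hypothetical helper; the input record is made up.
concat_demo() {
    printf 'Smith 42\n' | awk '{
        print $1 " is " $2            # Smith is 42
        print length($1 $2)           # "Smith42" has 7 characters
        print "next: " ($2 + 1)       # a numeric expression inside a concatenation
    }'
}

concat_demo
```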

3.5.  Arrays

      Array elements are not declared; they spring into existence by being mentioned. Subscripts may have any non-null value, including non-numeric strings. As an example of a conventional numeric subscript, the statement

x[NR] = $0
assigns the current input record to the NR-th element of the array x. In fact, it is possible in principle (though perhaps slow) to process the entire input in a random order with the awk program
{ x[NR] = $0 }
END { ... program ... }
The first action merely records each input line in the array x.
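One possible END program is to print the stored lines in reverse. This sketch assumes a POSIX shell and a modern awk, uses the for statement described in section 3.6, and its helper name and input are hypothetical.

```shell
# Hypothetical helper: store every record, then print them last to first.
reverse_lines() {
    printf 'first\nsecond\nthird\n' | awk '
        { x[NR] = $0 }
        END {
            for (i = NR; i >= 1; i--)   # NR still holds the record count in END
                print x[i]
        }
    '
}

reverse_lines
```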

      Array elements may be named by non-numeric values, which gives awk a capability rather like the associative memory of Snobol tables. Suppose the input contains fields with values like apple, orange, etc. Then the program

/apple/ { x["apple"]++ }
/orange/ { x["orange"]++ }
END { print x["apple"], x["orange"] }
increments counts for the named array elements, and prints them at the end of the input.
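A variant of this program indexes the array by the first field itself, so no pattern is needed per fruit. The sketch assumes a POSIX shell and a modern awk; the helper name and records are hypothetical. Mentioning count["pear"] creates an element with the null value, and adding 0 makes it print as 0.

```shell
# Hypothetical helper; the fruit records are made up.
count_fruit() {
    printf 'apple 3\norange 1\napple 7\n' | awk '
        { count[$1]++ }                # one element per distinct first field
        END { print count["apple"], count["orange"], count["pear"] + 0 }
    '
}

count_fruit    # prints "2 1 0"
```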

3.6.  Flow-of-Control Statements

      Awk provides the basic flow-of-control statements if-else, while, for, and statement grouping with braces, as in C. We showed the if statement in section 3.3 without describing it. The condition in parentheses is evaluated; if it is true, the statement following the if is done. The else part is optional.

      The while statement is exactly like that of C. For example, to print all input fields one per line,

i = 1
while (i <= NF) {
    print $i
    ++i
}

      The for statement is also exactly like that of C:

for (i = 1; i <= NF; i++)
    print $i
does the same job as the while statement above.
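The equivalence can be checked by running both loops on the same record. This is a sketch assuming a POSIX shell and a modern awk; the helper names and input are hypothetical.

```shell
# Hypothetical helpers: the same job written with while and with for.
with_while() {
    printf 'a b c\n' | awk '{
        i = 1
        while (i <= NF) {
            print $i
            ++i
        }
    }'
}
with_for() {
    printf 'a b c\n' | awk '{
        for (i = 1; i <= NF; i++)
            print $i
    }'
}

with_while    # prints a, b and c on separate lines
```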

      There is an alternate form of the for statement which is suited for accessing the elements of an associative array:

for (i in array)
    statement
does statement with i set in turn to each subscript of array. The subscripts are visited in an apparently random order. Chaos will ensue if i is altered, or if any new elements are accessed during the loop.
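Because the visiting order is not fixed, a predictable listing needs an outside sort. This sketch assumes a POSIX shell, a modern awk, and the standard sort utility; the helper name and input are hypothetical.

```shell
# Hypothetical helper; sort imposes an order, since for-in visits subscripts in no fixed order.
list_counts() {
    printf 'apple\norange\napple\n' | awk '
        { n[$1]++ }
        END { for (k in n) print k, n[k] }   # k is each subscript in turn
    ' | sort
}

list_counts
```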

      The expression in the condition part of an if, while or for can include relational operators like <, <=, >, >=, == (``is equal to''), and != (``not equal to''); regular expression matches with the match operators ~ and !~; the logical operators ||, &&, and !; and of course parentheses for grouping.
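A sketch combining match operators, relational operators, and the logical operators in if conditions, assuming a POSIX shell and a modern awk; the helper name and records are hypothetical.

```shell
# Hypothetical helper; the records are made up.
match_demo() {
    printf 'ant 5\nbee 50\ncat 500\n' | awk '{
        if ($1 ~ /^b/ || $2 > 100)       # starts with b, or second field over 100
            print "wide:", $1
        if ($1 !~ /a/ && !($2 == 5))     # contains no a, and second field is not 5
            print "narrow:", $1
    }'
}

match_demo
```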

      The break statement causes an immediate exit from an enclosing while or for; the continue statement causes the next iteration to begin.
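Both statements in one loop: this sketch (assuming a POSIX shell and a modern awk; the helper name and record are hypothetical) sums the positive fields of a record, skipping negative ones and stopping at the first zero.

```shell
# Hypothetical helper; the record is made up.
sum_until_zero() {
    printf '3 -1 4 0 10\n' | awk '{
        sum = 0
        for (i = 1; i <= NF; i++) {
            if ($i < 0) continue     # skip negative fields
            if ($i == 0) break       # stop at the first zero field
            sum += $i
        }
        print sum
    }'
}

sum_until_zero    # prints 7
```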

      The statement next causes awk to skip immediately to the next record and begin scanning the patterns from the top. The statement exit causes the program to behave as if the end of the input had occurred.
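A sketch of both statements, assuming a POSIX shell and a modern awk; the helper name and input are hypothetical. Consistent with exit behaving like the end of input, current awks still run the END action after an exit in an ordinary action.

```shell
# Hypothetical helper; the input lines are made up.
next_exit_demo() {
    printf '# comment\nkeep 1\nSTOP\nnever printed\n' | awk '
        /^#/    { next }          # skip comment lines entirely
        /^STOP/ { exit }          # behave as if input ended here
        { print }
        END { print "done after", NR, "records" }
    '
}

next_exit_demo
```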

      Comments may be placed in awk programs: they begin with the character # and end with the end of the line, as in

print x, y # this is a comment