2.  Patterns

      A pattern in front of an action acts as a selector that determines whether the action is to be executed. A variety of expressions may be used as patterns: regular expressions, arithmetic relational expressions, string-valued expressions, and arbitrary boolean combinations of these.

2.1.  BEGIN and END

      The special pattern BEGIN matches the beginning of the input, before the first record is read. The pattern END matches the end of the input, after the last record has been processed. BEGIN and END thus provide a way to gain control before and after processing, for initialization and wrapup.

      As an example, the field separator can be set to a colon by

BEGIN { FS = ":" }
... rest of program ...
Or the input lines may be counted by
END { print NR }
If BEGIN is present, it must be the first pattern; END must be the last if used.
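
      The two special patterns can be combined in one program. The following is a minimal sketch, assuming a POSIX shell with awk on the path; the colon-separated sample input and the variable name out are invented for illustration.

```shell
# Sketch only: the sample input is hypothetical.
# BEGIN sets the field separator before any record is read;
# END prints the record count after the last record.
out=$(printf 'root:0\ndaemon:1\nbin:2\n' |
  awk 'BEGIN { FS = ":" }
       { print $1 }
       END { print NR }')
printf '%s\n' "$out"
```

      The middle rule has no pattern, so it runs for every record and prints the first colon-delimited field.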

2.2.  Regular Expressions

      The simplest regular expression is a literal string of characters enclosed in slashes, like

/smith/
This is actually a complete awk program that prints every line containing an occurrence of the name ``smith''. A line that contains ``smith'' as part of a larger word is also printed, as in
blacksmithing

      Awk regular expressions include the regular expression forms found in the UNIX text editor ed and in grep (without back-referencing). In addition, awk allows parentheses for grouping, | for alternatives, + for ``one or more'', and ? for ``zero or one'', all as in lex. Character classes may be abbreviated: [a-zA-Z0-9] is the set of all letters and digits. As an example, the awk program

/[Aa]ho|[Ww]einberger|[Kk]ernighan/
will print all lines which contain any of the names ``Aho,'' ``Weinberger'' or ``Kernighan,'' whether capitalized or not.
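
      A quick way to try the alternation pattern is to pipe a few lines through it. This is a sketch with invented input, assuming a POSIX shell; a pattern with no action prints each matching line.

```shell
# Sketch only: the four input lines are hypothetical.
# The second-to-last line contains none of the three names and is dropped.
out=$(printf 'aho wrote it\nBrian Kernighan\nnothing relevant\nP. Weinberger\n' |
  awk '/[Aa]ho|[Ww]einberger|[Kk]ernighan/')
printf '%s\n' "$out"
```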

      Regular expressions (with the extensions listed above) must be enclosed in slashes, just as in ed and sed. Within a regular expression, blanks and the regular expression metacharacters are significant. To turn off the magic meaning of one of the regular expression characters, precede it with a backslash. An example is the pattern

/\/.*\//
which matches any string of characters enclosed in slashes.
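
      The escaped-slash pattern can be checked against a few sample lines. This is a sketch with invented input, assuming a POSIX shell; the awk program is single-quoted so the shell passes the backslashes through unchanged.

```shell
# Sketch only: the input lines are hypothetical.
# Only the first line has two slashes with text between them.
out=$(printf 'see /usr/bin for it\nno slashes\none / only\n' |
  awk '/\/.*\//')
printf '%s\n' "$out"
```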

      One can also specify that any field or variable matches a regular expression (or does not match it) with the operators ~ and !~. The program

$1 ~ /[jJ]ohn/
prints all lines where the first field matches ``john'' or ``John.'' Notice that this will also match ``Johnson'', ``St. Johnsbury'', and so on. To restrict it to exactly [jJ]ohn, use
$1 ~ /^[jJ]ohn$/
The caret ^ refers to the beginning of a line or field; the dollar sign $ refers to the end.
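
      The difference between the unanchored and anchored forms shows up when both are run over the same data. This is a sketch with invented input, assuming a POSIX shell; the variable names loose and exact are illustrative only.

```shell
# Sketch only: the input lines are hypothetical.
data='John 1\nJohnson 2\njohn 3\nmary 4\n'

# Unanchored: any first field containing john or John matches.
loose=$(printf "$data" | awk '$1 ~ /[jJ]ohn/')

# Anchored: the first field must be exactly john or John.
exact=$(printf "$data" | awk '$1 ~ /^[jJ]ohn$/')

printf 'unanchored:\n%s\nanchored:\n%s\n' "$loose" "$exact"
```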

2.3.  Relational Expressions

      An awk pattern can be a relational expression involving the usual relational operators <, <=, ==, !=, >=, and >. An example is

$2 > $1 + 100
which selects lines where the second field is more than 100 greater than the first field. Similarly,
NF % 2 == 0
prints lines with an even number of fields.
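
      Both relational patterns can be exercised on small inputs. This is a sketch with invented data, assuming a POSIX shell; the variable names gap and even are illustrative only.

```shell
# Sketch only: the input lines are hypothetical.
# 200 > 10 + 100 holds; 120 > 50 + 100 and 105 > 5 + 100 do not,
# because > is a strict comparison.
gap=$(printf '10 200\n50 120\n5 105\n' | awk '$2 > $1 + 100')

# Lines with 2 and 4 fields pass; the 3-field line does not.
even=$(printf 'a b\nc d e\nf g h i\n' | awk 'NF % 2 == 0')

printf '%s\n%s\n' "$gap" "$even"
```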

      In relational tests, if neither operand is numeric, a string comparison is made; otherwise it is numeric. Thus,

$1 >= "s"
selects lines that begin with an s, t, u, etc. In the absence of any other information, fields are treated as strings, so the program
$1 > $2
will perform a string comparison.
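
      A string comparison against a constant can be sketched as follows, assuming a POSIX shell and invented lowercase input. LC_ALL=C is set as a precaution so that the ordering is plain byte order. Newer awk implementations may compare fields that look like numbers numerically, so this sketch deliberately uses non-numeric fields and a string constant.

```shell
# Sketch only: the input words are hypothetical.
# Words that sort at or after "s" are selected.
out=$(printf 'smith\nadams\ntom\nzed\n' | LC_ALL=C awk '$1 >= "s"')
printf '%s\n' "$out"
```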

2.4.  Combinations of Patterns

      A pattern can be any boolean combination of patterns, using the operators || (or), && (and), and ! (not). For example,

$1 >= "s" && $1 < "t" && $1 != "smith"
selects lines where the first field begins with ``s'', but is not ``smith''. && and || guarantee that their operands will be evaluated from left to right; evaluation stops as soon as the truth or falsehood is determined.
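
      The combined pattern can be tried on a short list of names. This is a sketch with invented input, assuming a POSIX shell; LC_ALL=C is set as a precaution so that string ordering is plain byte order.

```shell
# Sketch only: the input lines are hypothetical.
# "smith" fails the != test, "taylor" fails < "t", "adams" fails >= "s".
out=$(printf 'smith 1\nsmythe 2\nstone 3\ntaylor 4\nadams 5\n' |
  LC_ALL=C awk '$1 >= "s" && $1 < "t" && $1 != "smith"')
printf '%s\n' "$out"
```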

2.5.  Pattern Ranges

      The ``pattern'' that selects an action may also consist of two patterns separated by a comma, as in

pat1, pat2 { ... }
In this case, the action is performed for each line between an occurrence of pat1 and the next occurrence of pat2 (inclusive). For example,
/start/, /stop/
prints all lines between start and stop, while
NR == 100, NR == 200 { ... }
does the action for lines 100 through 200 of the input.
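
      Both kinds of range can be sketched on short inputs, assuming a POSIX shell; the data and the variable names span and byline are invented, and the line-number range is shortened to 2 through 4 to keep the example small.

```shell
# Sketch only: the input lines are hypothetical.
# The range opens on the line matching start and closes on the
# line matching stop; both end lines are included.
span=$(printf 'a\nstart\nb\nstop\nc\n' | awk '/start/, /stop/')

# A range on record numbers, with an action that labels each line.
byline=$(printf 'one\ntwo\nthree\nfour\nfive\n' |
  awk 'NR == 2, NR == 4 { print NR ": " $0 }')

printf '%s\n%s\n' "$span" "$byline"
```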