awk, section 5.

5. Implementation

The actual implementation of awk uses the language development tools available on the UNIX operating system. The grammar is specified with yacc; the lexical analysis is done by lex; the regular expression recognizers are deterministic finite automata constructed directly from the expressions. An awk program is translated into a parse tree which is then directly executed by a simple interpreter.
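
To make the finite-automaton idea concrete, the following is a minimal sketch of a hand-built deterministic automaton for the single fixed string doug, written as an awk function inside a shell function. The names dfa_grep and matches are hypothetical, and the transition rule is written out by hand for this one pattern; it only illustrates the kind of recognizer described above and should not be read as the construction awk itself uses.

```shell
# Hypothetical illustration: a hand-coded DFA that recognizes lines
# containing "doug". States 0..4 count how many pattern characters
# have been matched so far; state 4 is accepting.
dfa_grep() {
    awk '
    function matches(line,    s, i, c) {
        s = 0
        for (i = 1; i <= length(line) && s < 4; i++) {
            c = substr(line, i, 1)
            if (c == substr("doug", s + 1, 1))
                s++            # next pattern character seen: advance
            else if (c == "d")
                s = 1          # mismatch, but "d" restarts a match
            else
                s = 0          # mismatch: fall back to the start state
        }
        return s == 4
    }
    matches($0)                # no action: print lines the DFA accepts
    '
}

printf '%s\n' doug ddoug dog xdoudoug | dfa_grep
```

Each input character causes exactly one state transition, so the cost per line is proportional to its length. The simple fallback rule is valid here only because the letter d occurs nowhere in doug except at the start.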

Awk was designed for ease of use rather than processing speed; the delayed evaluation of variable types and the necessity to break input into fields make high speed difficult to achieve in any case. Nonetheless, the program has not proven to be unworkably slow.
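
As a small illustration of delayed typing, the sketch below uses the same field value first as a number and then as a string; the interpretation is chosen only when the value is used. The function name show_types is hypothetical, and the input line is the sample ls -l line used for the timing tests.

```shell
# Hypothetical helper: shows one field value used in two contexts.
show_types() {
    printf '%s\n' '-rw-rw-rw- 1 ava 123 Oct 15 17:05 xxx' |
    awk '{
        n = $4          # no declaration: n simply holds the field text
        print n + 1     # arithmetic context: treated as the number 123
        print n "1"     # concatenation context: treated as the string "123"
    }'
}

show_types
```

The first print yields 124 and the second yields 1231. The line must be split into fields before $4 can be examined at all, which is the second cost mentioned above.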

Table I below shows the execution (user + system) time on a PDP-11/70 of the UNIX programs wc, grep, egrep, fgrep, sed, lex, and awk on the following simple tasks:

1. count the number of lines.
2. print all lines containing ``doug''.
3. print all lines containing ``doug'', ``ken'' or ``dmr''.
4. print the third field of each line.
5. print the third and second fields of each line, in that order.
6. append all lines containing ``doug'', ``ken'', and ``dmr'' to files ``jdoug'', ``jken'', and ``jdmr'', respectively.
7. print each line prefixed by ``line-number : ''.
8. sum the fourth column of a table.

The program wc merely counts words, lines and characters in its input; we have already mentioned the others. In all cases the input was a file containing 10,000 lines as created by the command ls -l; each line has the form

: -rw-rw-rw- 1 ava 123 Oct 15 17:05 xxx

The total length of this input is 452,960 characters. Times for lex do not include compile or load.

As might be expected, awk is not as fast as the specialized tools wc, sed, or the programs in the grep family, but is faster than the more general tool lex. In all cases, the tasks were about as easy to express as awk programs as programs in these other languages; tasks involving fields were considerably easier to express as awk programs. Some of the test programs are shown in awk, sed and lex.

                                  Task
Program |   1  |   2   |   3   |   4  |   5  |   6   |   7  |   8  |
--------+------+-------+-------+------+------+-------+------+------+
  wc    |  8.6 |       |       |      |      |       |      |      |
  grep  | 11.7 |  13.1 |       |      |      |       |      |      |
  egrep |  6.2 |  11.5 |  11.6 |      |      |       |      |      |
  fgrep |  7.7 |  13.8 |  16.1 |      |      |       |      |      |
  sed   | 10.2 |  11.6 |  15.8 | 29.0 | 30.5 |  16.1 |      |      |
  lex   | 65.1 | 150.1 | 144.2 | 67.7 | 70.3 | 104.0 | 81.7 | 92.8 |
  awk   | 15.0 |  25.6 |  29.9 | 33.3 | 38.9 |  46.4 | 71.4 | 31.1 |
--------+------+-------+-------+------+------+-------+------+------+

Table I. Execution Times of Programs. (Times are in sec.)
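
The comparison between awk and sed can be made more precise by computing ratios from the table. The sketch below copies the sed and awk times for tasks 1 through 6 from Table I and prints awk's time relative to sed's; the function name awk_vs_sed is hypothetical and the arithmetic is the only thing added.

```shell
# Hypothetical helper: ratio of awk time to sed time for tasks 1-6.
awk_vs_sed() {
    # Columns: task number, sed time, awk time (seconds, from Table I).
    printf '%s\n' \
        '1 10.2 15.0' \
        '2 11.6 25.6' \
        '3 15.8 29.9' \
        '4 29.0 33.3' \
        '5 30.5 38.9' \
        '6 16.1 46.4' |
    awk '{ printf "task %d: awk/sed = %.2f\n", $1, $3 / $2 }'
}

awk_vs_sed
```

On these figures awk takes between about 1.15 times (task 4) and about 2.88 times (task 6) as long as sed; the gap is smallest on the tasks that involve fields.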

The programs for some of these jobs are shown below. The lex programs are generally too long to show.

AWK:

: 1. END {print NR}

: 2. /doug/

: 3. /ken|doug|dmr/

: 4. {print $3}

: 5. {print $3, $2}

: 6. /ken/ {print >"jken"}
:    /doug/ {print >"jdoug"}
:    /dmr/ {print >"jdmr"}

: 7. {print NR ": " $0}

: 8. {sum = sum + $4}
:    END {print sum}
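
As a worked example, the awk programs for tasks 1, 5 and 8 can be run on a few lines of the same form as the test input. In the sketch below the first sample line is the one quoted earlier; the other two lines, and the function names sample, task1, task5 and task8, are made up for illustration.

```shell
# Hypothetical three-line sample in the ls -l form used for the tests.
sample() {
    printf '%s\n' \
        '-rw-rw-rw- 1 ava 123 Oct 15 17:05 xxx' \
        '-rw-r--r-- 1 doug 40 Oct 16 09:12 notes' \
        '-rw-r--r-- 1 ken 7 Oct 17 11:30 todo'
}

task1() { sample | awk 'END {print NR}'; }                   # count lines
task5() { sample | awk '{print $3, $2}'; }                   # third, then second field
task8() { sample | awk '{sum = sum + $4} END {print sum}'; } # sum fourth column

task1
task5
task8
```

Task 1 prints 3, task 5 prints the owner followed by the link count for each line, and task 8 prints 170, the sum of 123, 40 and 7. Note that sum is never initialized; it starts out empty and is treated as zero on first use.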

SED:

: 1. $=

: 2. /doug/p

: 3. /doug/p
:    /doug/d
:    /ken/p
:    /ken/d
:    /dmr/p
:    /dmr/d

: 4. /[^ ]* [ ]*[^ ]* [ ]*\([^ ]*\) .*/s//\1/p

: 5. /[^ ]* [ ]*\([^ ]*\) [ ]*\([^ ]*\) .*/s//\2 \1/p

: 6. /ken/w jken
:    /doug/w jdoug
:    /dmr/w jdmr
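
The sketch below runs a sed program for task 4, written with \( and \) grouping so that \1 refers to the third field, next to the corresponding awk program on the sample line. It assumes that sed is invoked with the -n option so that only lines printed by the p flag appear; that option, and the names line, sed_task4 and awk_task4, are assumptions made for this illustration.

```shell
line='-rw-rw-rw- 1 ava 123 Oct 15 17:05 xxx'

sed_task4() {
    # -n is an assumption: it suppresses the default output so that
    # only lines printed by the p flag appear. The empty pattern in
    # s//\1/ reuses the address pattern, including its \( \) group.
    sed -n '/[^ ]* [ ]*[^ ]* [ ]*\([^ ]*\) .*/s//\1/p'
}

awk_task4() { awk '{print $3}'; }

printf '%s\n' "$line" | sed_task4
printf '%s\n' "$line" | awk_task4
```

Both commands print ava. The sed version has to spell out the field structure in the pattern, which is the sense in which tasks involving fields are easier to express in awk.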

LEX:

: 1. %{
:    int i;
:    %}
:    %%
:    \n i++;
:    . ;
:    %%
:    yywrap() {
:        printf("%d\n", i);
:    }

: 2. %%
:    ^.*doug.*$ printf("%s\n", yytext);
:    . ;
:    \n ;