As a trivial problem, consider copying an input file while
adding 3 to every positive number divisible by 7.
Here is a suitable Lex source program
center;
l l.
%%
int k;
[0-9]+ {
k = atoi(yytext);
if (k%7 == 0)
printf("%d", k+3);
else
printf("%d",k);
}
to do just that.
The rule [0-9]+ recognizes strings of digits;
atoi
converts the digits to binary
and stores the result in
k.
The operator % (remainder) is used to check whether
k
is divisible by 7; if it is,
it is incremented by 3 as it is written out.
It may be objected that this program will alter such
input items as
49.63
or
X7.
Furthermore, it increments the absolute value
of all negative numbers divisible by 7.
To avoid this, just add a few more rules after the active one,
as here:
center;
l l.
%%
int k;
-?[0-9]+ {
k = atoi(yytext);
printf("%d", k%7 == 0 ? k+3 : k);
}
-?[0-9.]+ ECHO;
[A-Za-z][A-Za-z0-9]+ ECHO;
Numerical strings containing
a ``.'' or preceded by a letter will be picked up by
one of the last two rules, and not changed.
The
if-else
has been replaced by
a C conditional expression to save space;
the form
a?b:c
means ``if
a
then
b
else
c''.
For an example of statistics gathering, here
is a program which histograms the lengths
of words, where a word is defined as a string of letters.
center;
l l.
int lengs[100];
%%
[a-z]+ lengs[yyleng]++;
. |
\n ;
%%
l s.
yywrap()
{
int i;
printf("Length No. words\n");
for(i=0; i<100; i++)
if (lengs[i] > 0)
printf("%5d%10d\n",i,lengs[i]);
return(1);
}
This program
accumulates the histogram, while producing no output. At the end
of the input it prints the table.
The final statement
return(1);
indicates that Lex is to perform wrapup. If
yywrap
returns zero (false)
it implies that further input is available
and the program is
to continue reading and processing.
To provide a
yywrap
that never
returns true causes an infinite loop.
As a larger example,
here are some parts of a program written by N. L. Schryer
to convert double precision Fortran to single precision Fortran.
Because Fortran does not distinguish upper and lower case letters,
this routine begins by defining a set of classes including
both cases of each letter:
center;
l l.
a [aA]
b [bB]
c [cC]
...
z [zZ]
An additional class recognizes white space:
center;
l l.
W [ \t]*
The first rule changes
``double precision'' to ``real'', or ``DOUBLE PRECISION'' to ``REAL''.
center;
l.
{d}{o}{u}{b}{l}{e}{W}{p}{r}{e}{c}{i}{s}{i}{o}{n} {
printf(yytext[0]=='d'? "real" : "REAL");
}
Care is taken throughout this program to preserve the case
(upper or lower)
of the original program.
The conditional operator is used to
select the proper form of the keyword.
The next rule copies continuation card indications to
avoid confusing them with constants:
center;
l l.
^" "[^ 0] ECHO;
In the regular expression, the quotes surround the
blanks.
It is interpreted as
``beginning of line, then five blanks, then
anything but blank or zero.''
Note the two different meanings of
^.
There follow some rules to change double precision
constants to ordinary floating constants.
center;
l.
[0-9]+{W}{d}{W}[+-]?{W}[0-9]+ |
[0-9]+{W}"."{W}{d}{W}[+-]?{W}[0-9]+ |
"."{W}[0-9]+{W}{d}{W}[+-]?{W}[0-9]+ {
/* convert constants */
for(p=yytext; *p != 0; p++)
{
if (*p == 'd' || *p == 'D')
*p=+ 'e'- 'd';
ECHO;
}
After the floating point constant is recognized, it is
scanned by the
for
loop
to find the letter
d
or
D.
The program than adds
'e'-'d',
which converts
it to the next letter of the alphabet.
The modified constant, now single-precision,
is written out again.
There follow a series of names which must be respelled to remove
their initial d.
By using the
array
yytext
the same action suffices for all the names (only a sample of
a rather long list is given here).
center;
l l.
{d}{s}{i}{n} |
{d}{c}{o}{s} |
{d}{s}{q}{r}{t} |
{d}{a}{t}{a}{n} |
...
{d}{f}{l}{o}{a}{t} printf("%s",yytext+1);
Another list of names must have initial d changed to initial a:
center;
l l.
{d}{l}{o}{g} |
{d}{l}{o}{g}10 |
{d}{m}{i}{n}1 |
{d}{m}{a}{x}1 {
yytext[0] =+ 'a' - 'd';
ECHO;
}
And one routine
must have initial d changed to initial r:
center;
l l.
{d}1{m}{a}{c}{h} {yytext[0] =+ 'r' - 'd';
ECHO;
}
To avoid such names as dsinx being detected as instances
of dsin, some final rules pick up longer words as identifiers
and copy some surviving characters:
center;
l l.
[A-Za-z][A-Za-z0-9]* |
[0-9]+ |
\n |
. ECHO;
Note that this program is not complete; it
does not deal with the spacing problems in Fortran or
with the use of keywords as identifiers.