3.  Shell control structures and command scripts

3.1.  Introduction

      It is possible to place commands in files and to cause shells to be invoked to read and execute commands from these files, which are called shell scripts. We here detail those features of the shell useful to the writers of such scripts.

3.2.  Make

      It is important to first note what shell scripts are not useful for. There is a program called make which is very useful for maintaining a group of related files or performing sets of operations on related files. For instance a large program consisting of one or more files can have its dependencies described in a makefile which contains definitions of the commands used to create these different files when changes occur. Definitions of the means for printing listings, cleaning up the directory in which the files reside, and installing the resultant programs are easily, and most appropriately, placed in this makefile. This format is superior and preferable to maintaining a group of shell procedures to maintain these files.

      Similarly when working on a document a makefile may be created which defines how different versions of the document are to be created and which options of nroff or troff are appropriate.

3.3.  Invocation and the argv variable

      A csh command script may be interpreted by saying

% csh script ...
where script is the name of the file containing a group of csh commands and `...' is replaced by a sequence of arguments. The shell places these arguments in the variable argv and then begins to read commands from the script. These parameters are then available through the same mechanisms which are used to reference any other shell variables.

      If you make the file `script' executable by doing

chmod 755 script
and place a shell comment at the beginning of the shell script (i.e. begin the file with a `#' character) then a `/bin/csh' will automatically be invoked to execute `script' when you type
script
If the file does not begin with a `#' then the standard shell `/bin/sh' will be used to execute it. This allows you to convert your older shell scripts to use csh at your convenience.

3.4.  Variable substitution

      After each input line is broken into words and history substitutions are done on it, the input line is parsed into distinct commands. Before each command is executed a mechanism known as variable substitution is done on these words. Keyed by the character `$' this substitution replaces the names of variables by their values. Thus

echo $argv
when placed in a command script would cause the current value of the variable argv to be echoed to the output of the shell script. It is an error for argv to be unset at this point.

      A number of notations are provided for accessing components and attributes of variables. The notation

$?name
expands to `1' if name is set or to `0' if name is not set. It is the fundamental mechanism used for checking whether particular variables have been assigned values. All other forms of reference to undefined variables cause errors.

      The notation

$#name
expands to the number of elements in the variable name. Thus
% set argv=(a b c)
% echo $?argv
1
% echo $#argv
3
% unset argv
% echo $?argv
0
% echo $argv
Undefined variable: argv.
%

      It is also possible to access the components of a variable which has several values. Thus

$argv[1]
gives the first component of argv or in the example above `a'. Similarly
$argv[$#argv]
would give `c', and
$argv[1-2]
would give `a b'. Other notations useful in shell scripts are
$n
where n is an integer as a shorthand for
$argv[n]
the nth parameter and
$*
which is a shorthand for
$argv
The form
$$
expands to the process number of the current shell. Since this process number is unique in the system it can be used in generation of unique temporary file names. The form
$<
is quite special and is replaced by the next line of input read from the shell's standard input (not the script it is reading). This is useful for writing shell scripts that are interactive, reading commands from the terminal, or even writing a shell script that acts as a filter, reading lines from its input file. Thus the sequence
echo 'yes or no?\c'
set a=($<)
would write out the prompt `yes or no?' without a newline and then read the answer into the variable `a'. In this case `$#a' would be `0' if either a blank line or end-of-file (^D) was typed.

      One minor difference between `$n' and `$argv[n]' should be noted here. The form `$argv[n]' will yield an error if n is not in the range `1-$#argv' while `$n' will never yield an out of range subscript error. This is for compatibility with the way older shells handled parameters.

      Another important point is that it is never an error to give a subrange of the form `n-'; if there are fewer than n components of the given variable then no words are substituted. A range of the form `m-n' likewise returns an empty vector without giving an error when m exceeds the number of elements of the given variable, provided the subscript n is in range.

3.5.  Expressions

      In order for interesting shell scripts to be constructed it must be possible to evaluate expressions in the shell based on the values of variables. In fact, all the arithmetic operations of the language C are available in the shell with the same precedence that they have in C. In particular, the operations `==' and `!=' compare strings and the operators `&&' and `||' implement the boolean and/or operations. The special operators `=~' and `!~' are similar to `==' and `!=' except that the string on the right side can have pattern matching characters (like *, ? or []) and the test is whether the string on the left matches the pattern on the right.

      The shell also allows file enquiries of the form

-? filename
where `?' is replaced by one of a number of single characters. For instance the expression primitive
-e filename
tells whether the file `filename' exists. Other primitives test for read, write and execute access to the file, whether it is a directory, or has non-zero length.

      It is possible to test whether a command terminates normally, by a primitive of the form `{ command }' which returns true, i.e. `1' if the command succeeds exiting normally with exit status 0, or `0' if the command terminates abnormally or with exit status non-zero. If more detailed information about the execution status of a command is required, it can be executed and the variable `$status' examined in the next command. Since `$status' is set by every command, it is very transient. It can be saved if it is inconvenient to use it only in the single immediately following command.

      For a full list of expression components available see the manual section for the shell.

3.6.  Sample shell script

      A sample shell script which makes use of the expression mechanism of the shell and some of its control structure follows:

% cat copyc
#
# Copyc copies those C programs in the specified list
# to the directory ~/backup if they differ from the files
# already in ~/backup
#
set noglob
foreach i ($argv)

        if ($i !~ *.c) continue  # not a .c file so do nothing

        if (! -r ~/backup/$i:t) then
                echo $i:t not in backup... not cp\'ed
                continue
        endif

        cmp -s $i ~/backup/$i:t # to set $status

        if ($status != 0) then
                echo new backup of $i
                cp $i ~/backup/$i:t
        endif
end

      This script makes use of the foreach command, which causes the shell to execute the commands between the foreach and the matching end for each of the values given between `(' and `)' with the named variable, in this case `i' set to successive values in the list. Within this loop we may use the command break to stop executing the loop and continue to prematurely terminate one iteration and begin the next. After the foreach loop the iteration variable (i in this case) has the value at the last iteration.

      We set the variable noglob here to prevent filename expansion of the members of argv. This is a good idea, in general, if the arguments to a shell script are filenames which have already been expanded or if the arguments may contain filename expansion metacharacters. It is also possible to quote each use of a `$' variable expansion, but this is harder and less reliable.

      The other control construct used here is a statement of the form

if ( expression ) then
	command
	...
endif
The placement of the keywords here is not flexible due to the current implementation of the shell: the `then' must appear on the same line as the `if', and the `endif' must begin a line of its own.

      The shell does have another form of the if statement of the form

if ( expression ) command
which can be written
if ( expression ) \
	command
Here we have escaped the newline for the sake of appearance. The command must not involve `|', `&' or `;' and must not be another control command. The second form requires the final `\' to immediately precede the end-of-line.

      The more general if statements above also admit a sequence of else-if pairs followed by a single else and an endif, e.g.:

if ( expression ) then
	commands
else if ( expression ) then
	commands
...

else
	commands
endif

      Another important mechanism used in shell scripts is the `:' modifier. We can use the modifier `:r' here to extract a root of a filename or `:e' to extract the extension. Thus if the variable i has the value `/mnt/foo.bar' then

% echo $i $i:r $i:e
/mnt/foo.bar /mnt/foo bar
%

shows how the `:r' modifier strips off the trailing `.bar' and the `:e' modifier leaves only the `bar'. The `:h' modifier takes off the last component of a pathname, leaving the head, and `:t' takes off all but the last component, leaving the tail. These modifiers are fully described in the csh manual pages in the User's Reference Manual. It is also possible to use the command substitution mechanism described in the next major section to perform modifications on strings which then reenter the shell's environment. Since each usage of this mechanism involves the creation of a new process, it is much more expensive to use than the `:' modification mechanism.

      Finally, we note that the character `#' lexically introduces a shell comment in shell scripts (but not from the terminal). All subsequent characters on the input line after a `#' are discarded by the shell. This character can be quoted using `'' or `\' to place it in an argument word.

3.7.  Other control structures

      The shell also has control structures while and switch similar to those of C. These take the forms

while ( expression )
	commands
end
and
switch ( word )

case str1:
	commands
	breaksw

 ...

case strn:
	commands
	breaksw

default:
	commands
	breaksw

endsw
For details see the manual section for csh. C programmers should note that we use breaksw to exit from a switch while break exits a while or foreach loop. A common mistake to make in csh scripts is to use break rather than breaksw in switches.

      Finally, csh allows a goto statement, with labels looking like they do in C, i.e.:

loop:
	commands
	goto loop

3.8.  Supplying input to commands

      Commands run from shell scripts receive by default the standard input of the shell which is running the script. This is different from previous shells running under UNIX. It allows shell scripts to fully participate in pipelines, but mandates extra notation for commands which are to take inline data.

      Thus we need a metanotation for supplying inline data to commands in shell scripts. As an example, consider this script which runs the editor to delete leading blanks from the lines in each argument file:

% cat deblank
# deblank -- remove leading blanks
foreach i ($argv)
ed - $i << 'EOF'
1,$s/^[ ]*//
w
q
'EOF'
end
%
The notation `<< 'EOF'' means that the standard input for the ed command is to come from the text in the shell script file up to the next line consisting of exactly `'EOF''. The fact that the `EOF' is enclosed in `'' characters, i.e. quoted, causes the shell to not perform variable substitution on the intervening lines. In general, if any part of the word following the `<<' which the shell uses to terminate the text to be given to the command is quoted then these substitutions will not be performed. In this case since we used the form `1,$' in our editor script we needed to ensure that this `$' was not variable substituted. We could also have ensured this by preceding the `$' here with a `\', i.e.:
1,\$s/^[ ]*//
but quoting the `EOF' terminator is a more reliable way of achieving the same thing.

3.9.  Catching interrupts

      If our shell script creates temporary files, we may wish to catch interruptions of the shell script so that we can clean up these files. We can then do

onintr label
where label is a label in our program. If an interrupt is received the shell will do a `goto label' and we can remove the temporary files and then do an exit command (which is built in to the shell) to exit from the shell script. If we wish to exit with a non-zero status we can do
exit(1)
e.g. to exit with status `1'.

3.10.  What else?

      There are other features of the shell useful to writers of shell procedures. The verbose and echo options and the related -v and -x command line options can be used to help trace the actions of the shell. The -n option causes the shell only to read commands and not to execute them and may sometimes be of use.

      One other thing to note is that csh will not execute shell scripts which do not begin with the character `#', that is shell scripts that do not begin with a comment. Similarly, the `/bin/sh' on your system may well defer to `csh' to interpret shell scripts which begin with `#'. This allows shell scripts for both shells to live in harmony.

      There is also another quotation mechanism using `"' which allows only some of the expansion mechanisms we have so far discussed to occur on the quoted string and serves to make this string into a single word as `'' does.