4.  Information Handling

o SORT
Sort or merge ASCII files line-by-line. No limit on input size.
O Sort up or down.
O Sort lexicographically or on numeric key.
O Multiple keys located by delimiters or by character position.
O May sort upper case together with lower into dictionary order.
O Optionally suppress duplicate data.
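
A minimal sketch of these options, assuming a present-day POSIX sort (the -t, -k, -n, -r, -f and -u flags; older versions may spell key positions differently). The name:score records are hypothetical sample data, and LC_ALL=C is set only to make the ordering predictable.

```shell
# Hypothetical sample data: name:score records, one per line.
data='carol:7
alice:12
Bob:7
alice:12'

# Numeric key in field 2 (fields delimited by ':'), sorted downward;
# ties are broken lexicographically on field 1.
by_score=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k2,2nr -k1,1)

# Fold upper case together with lower (-f) and suppress duplicates (-u).
folded=$(printf '%s\n' "$data" | LC_ALL=C sort -f -u)

printf '%s\n' "$by_score" "--" "$folded"
```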
o TSORT
Topological sort -- converts a partial order into a total order.
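
As an illustration, assuming the usual input convention of whitespace-separated pairs where the first item must precede the second. The task names are hypothetical; they form a single chain so that only one total order is possible.

```shell
# Each input line "X Y" states that X must come before Y.
# The pairs are deliberately given out of order.
order=$(printf '%s\n' 'compile link' 'link test' 'edit compile' | tsort)

printf '%s\n' "$order"
```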
o UNIQ
Collapse successive duplicate lines in a file into one line.
O Print lines that were originally unique, duplicated, or both.
O May give redundancy count for each line.
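
A short sketch of the three modes, assuming the POSIX -u, -d and -c flags. The input is hypothetical. Because the width of the count column varies between implementations, the counted output is normalized through awk here.

```shell
# Hypothetical input with runs of adjacent duplicate lines.
data='a
a
b
c
c
c'

collapsed=$(printf '%s\n' "$data" | uniq)      # one line per run
only_unique=$(printf '%s\n' "$data" | uniq -u) # lines that were not repeated
only_dups=$(printf '%s\n' "$data" | uniq -d)   # lines that were repeated
# Redundancy count per line; awk strips the implementation-specific padding.
counted=$(printf '%s\n' "$data" | uniq -c | awk '{ print $1, $2 }')

printf '%s\n' "$counted"
```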
o TR
Do one-to-one character translation according to an arbitrary code.
O May coalesce selected repeated characters.
O May delete selected characters.
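
For example, assuming the POSIX range syntax and the -s and -d flags (some older versions write ranges differently). The input string is hypothetical.

```shell
line='hello   world'

upper=$(printf '%s\n' "$line" | LC_ALL=C tr 'a-z' 'A-Z')  # one-to-one translation
squeezed=$(printf '%s\n' "$line" | tr -s ' ')             # coalesce repeated blanks
deleted=$(printf '%s\n' "$line" | tr -d 'lo')             # delete every l and o

printf '%s\n' "$upper" "$squeezed" "$deleted"
```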
o DIFF
Report line changes, additions and deletions necessary to bring two files into agreement.
O May produce an editor script to convert one file into another.
O A variant compares two new versions against one old one.
o COMM
Identify common lines in two sorted files. Output in up to 3 columns shows lines present in first file only, present in both, and/or present in second only.
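
A minimal sketch, assuming the POSIX -1, -2 and -3 flags, each of which suppresses the corresponding output column. The two sorted files are hypothetical and are created in a temporary directory.

```shell
dir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$dir/a"
printf '%s\n' banana cherry date  > "$dir/b"

only_a=$(comm -23 "$dir/a" "$dir/b")  # keep column 1: lines in first file only
both=$(comm -12 "$dir/a" "$dir/b")    # keep column 3: lines in both files
only_b=$(comm -13 "$dir/a" "$dir/b")  # keep column 2: lines in second file only

rm -r "$dir"
printf '%s\n' "$only_a" "$both" "$only_b"
```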
o JOIN
Combine two files by joining records that have identical keys.
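
As a sketch, assuming the default behavior of joining on the first blank-separated field of two files already sorted on that field. The file contents are hypothetical.

```shell
dir=$(mktemp -d)
printf '%s\n' '1 alice' '2 bob' '3 carol' > "$dir/names"
printf '%s\n' '1 admin' '3 guest'         > "$dir/roles"

# Only keys present in both files (1 and 3) produce an output record.
joined=$(join "$dir/names" "$dir/roles")

rm -r "$dir"
printf '%s\n' "$joined"
```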
o GREP
Print all lines in a file that satisfy a pattern as used in the editor ED.
O May print all lines that fail to match.
O May print count of hits.
O May print first hit in each file.
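
A brief sketch of matching, inverted matching and counting, assuming the POSIX -v and -c flags. The log lines are hypothetical; the first-hit option is left out because its flag differs between versions.

```shell
log='error: disk full
ok
error: timeout
ok'

hits=$(printf '%s\n' "$log" | grep '^error')      # lines that match
misses=$(printf '%s\n' "$log" | grep -v '^error') # lines that fail to match
count=$(printf '%s\n' "$log" | grep -c '^error')  # number of matching lines

printf '%s\n' "$hits" "$misses" "$count"
```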
o LOOK
Binary search in sorted file for lines with specified prefix.
o WC
Count the lines, ``words'' (blank-separated strings) and characters in a file.
o SED
Stream-oriented version of ED. Can perform a sequence of editing operations on each line of an input stream of unbounded length.
O Lines may be selected by address or range of addresses.
O Control flow and conditional testing.
O Multiple output streams.
O Multi-line capability.
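
A few of these capabilities can be sketched as follows, assuming POSIX sed syntax. The input text is hypothetical.

```shell
text='# comment
alpha
beta
gamma'

# Select by address range: print only lines 2 through 3.
range=$(printf '%s\n' "$text" | sed -n '2,3p')

# Select by pattern: delete lines that begin with '#'.
stripped=$(printf '%s\n' "$text" | sed '/^#/d')

# Edit only within a range: replace a trailing "a" on lines 2 through 3.
edited=$(printf '%s\n' "$text" | sed '2,3s/a$/A/')

# Multi-line: append the next line (N) and join each pair with a blank.
paired=$(printf '%s\n' "$text" | sed '$!N;s/\n/ /')

printf '%s\n' "$paired"
```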
o AWK
Pattern scanning and processing language. Searches input for patterns, and performs actions on each line of input that satisfies the pattern.
O Patterns include regular expressions, arithmetic and lexicographic conditions, boolean combinations and ranges of these.
O Data treated as string or numeric as appropriate.
O Can break input into fields; fields are variables.
O Variables and arrays (with non-numeric subscripts).
O Full set of arithmetic operators and control flow.
O Multiple output streams to files and pipes.
O Output can be formatted as desired.
O Multi-line capabilities.
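
A minimal sketch of the pattern-action model, fields, and arrays with non-numeric subscripts. The name/count records are hypothetical. Because the traversal order of an awk array is unspecified, the summary is piped through sort.

```shell
data='alice 3
bob 5
alice 4
carol 1'

# Arithmetic condition as the pattern; fields $1 and $2 are variables.
big=$(printf '%s\n' "$data" | awk '$2 > 3 { print $1 }')

# Array subscripted by a string: total of field 2 for each name in field 1.
totals=$(printf '%s\n' "$data" |
    awk '{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }' |
    sort)

printf '%s\n' "$big" "--" "$totals"
```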