Table 2.1 outlines the opcode typing convention. The expression ``a above b'' means that `a' is on top of the stack with `b' below it. Table 2.3 describes each of the opcodes. The character `*' at the end of a name specifies that all operations with the root prefix before the `*' are summarized by one entry. Table 2.2 gives the codes used to describe the type of inline data expected by each instruction.
Table 2.1 - Operator Suffixes

Unary operator suffixes
    Suffix   Example   Argument type
    2        NEG2      Short integer (2 bytes)
    4        SQR4      Long integer (4 bytes)
    8        ABS8      Real (8 bytes)

Binary operator suffixes
    Suffix   Example   Argument type
    2        ADD2      Two short integers
    24       MUL24     Short above long integer
    42       REL42     Long above short integer
    4        DIV4      Two long integers
    28       DVD28     Short integer above real
    48       REL48     Long integer above real
    82       SUB82     Real above short integer
    84       MUL84     Real above long integer
    8        ADD8      Two reals

Other suffixes
    Suffix   Example   Argument types
    T        ADDT      Sets
    G        RELG      Strings
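The binary suffix convention of Table 2.1 can be sketched as a small decoder. This is only an illustration: the function name `operand_types` and the `WIDTHS` dictionary are hypothetical and are not part of the interpreter.

```python
# Hypothetical helper, not part of px: decode a binary operator suffix
# according to Table 2.1. The result lists the operand on top of the
# stack first, then the operand below it.
WIDTHS = {"2": "short integer", "4": "long integer", "8": "real"}


def operand_types(suffix):
    if suffix == "T":
        return ("set", "set")
    if suffix == "G":
        return ("string", "string")
    if len(suffix) == 1:
        # One digit, as in ADD2: both operands have the same width.
        return (WIDTHS[suffix], WIDTHS[suffix])
    # Two digits, as in MUL24: the first digit describes the top operand.
    top, below = suffix
    return (WIDTHS[top], WIDTHS[below])
```

For instance, `operand_types("24")` describes MUL24 as a short integer above a long integer.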
Table 2.2 - Inline data type codes

    Code   Description
    a      An address offset is given in the word following the instruction.
    A      An address offset is given in the four bytes following the instruction.
    l      An index into the display is given in the sub-opcode.
    r      A relational operator is encoded in the sub-opcode. (see section 2.3)
    s      A small integer is placed in the sub-opcode, or in the next word if it is zero or too large.
    v      Variable length inline data.
    w      A word value in the following word.
    W      A long value in the following four bytes.
    "      An inline constant string.
    Code   Operation
    0      a = b
    2      a <> b
    4      a < b
    6      a > b
    8      a <= b
    10     a >= b

Each operation does a test to set the condition code appropriately and then does an indexed branch based on the sub-operation code to a test of the condition here specified, pushing a Boolean value on the stack. Consider the statement fragment:
    if a = b then

If a and b are integers this generates the following code:
    RV4:l   a
    RV4:l   b
    REL4    =
    IF      Else part offset
        ... Then part code ...
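The dispatch on the relational sub-opcode can be sketched as follows. The codes 0 through 10 come from the table of relational operators above; the dictionary `REL_OPS` and the function `rel` are illustrative stand-ins for the interpreter's indexed branch, and the stack is modelled as a plain Python list.

```python
import operator

# Sub-opcode values from the relational operator table. The dictionary
# lookup stands in for the indexed branch done by the interpreter.
REL_OPS = {
    0: operator.eq,
    2: operator.ne,
    4: operator.lt,
    6: operator.gt,
    8: operator.le,
    10: operator.ge,
}


def rel(stack, subop):
    # b was pushed last, so it is on top of the stack with a below it.
    b = stack.pop()
    a = stack.pop()
    # Push one for true and zero for false.
    stack.append(1 if REL_OPS[subop](a, b) else 0)
```

With a = 3 and b = 5 on the stack, `rel(stack, 4)` leaves the single value 1, since 3 < 5.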
The Boolean operators AND, OR, and NOT manipulate values on the top of the stack. All Boolean values are kept in single bytes in memory, or in single words on the stack. Zero represents a Boolean false, and one a Boolean true.
_LRV4:
        cvtbl   (lc)+,r0                #r0 has display index
        addl3   _display(r0),(lc)+,r1   #r1 has variable address
        pushl   (r1)                    #put value on the stack
        jmp     (loop)

Here the interpreter places the display level in r0. It then adds the appropriate display value to the inline offset and pushes the value at this location onto the stack. Control then returns to the main interpreter loop. The RV* operators have short inline data that reduces the space required to address the first 32K of stack space in each stack frame. The operators RV14 and RV24 provide explicit conversion to long as the data is pushed. This saves the generation of STOI to align arguments to C subroutines.
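The address computation in the listing above can be modelled in a few lines. This is a sketch under simplifying assumptions: the display is a Python list of frame base addresses, memory is a dictionary keyed by address, and the function name `rv4` and its parameters are hypothetical.

```python
def rv4(display, memory, stack, level, offset):
    """Model of a load-value operator: push the variable's value."""
    # Variable address = display entry for the block + inline offset.
    address = display[level] + offset
    # Push the value found at that address.
    stack.append(memory[address])
```

For example, with `display = [1000, 2000]` and a variable stored 8 bytes below the base of block 1, `rv4(display, memory, stack, 1, -8)` pushes the contents of address 1992.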
_CON1:
        cvtbw   (lc)+,-(sp)
        jmp     (loop)

Here note that little work was required as the required constant was available at (lc)+. For longer constants, lc must be incremented before moving the constant. The operator CON takes a length specification in the sub-opcode and can be used to load strings and other variable length data onto the stack. The operators CON14 and CON24 provide explicit conversion to long as the constant is pushed.
    i := 1

where i is a full-length, 4 byte, integer, will generate the code sequence
    LV:l    i
    CON1:1
    AS24

Here LV loads the address of i, which is given as a block number in the sub-opcode and an offset in the following word, onto the stack, where it occupies a single word. CON1, a single word instruction, then loads the constant 1, held in its sub-opcode, onto the stack. Since there are no one byte constants on the stack, this becomes a 2 byte, single word integer. The interpreter then assigns a length 2 integer to a length 4 integer using AS24. The code sequence for AS24 is given by:
_AS24:
        incl    lc
        cvtwl   (sp)+,*(sp)+
        jmp     (loop)

Thus the interpreter gets the single word off the stack, extends it to be a 4 byte integer, gets the target address off the stack, and finally stores the value in the target. This is a typical use of the constant and assignment operators.
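The three-instruction sequence for the assignment can be traced with a small model. This is a sketch only: the function `run_assignment` is hypothetical, memory is a `bytearray`, the display is a list of base addresses, and the 4 byte store uses little-endian order as an assumption about the target machine.

```python
import struct


def run_assignment(display, memory, level, offset, constant):
    """Trace LV, CON1 and AS24 for an assignment of a small constant."""
    stack = []
    # LV: push the address of the target variable.
    stack.append(display[level] + offset)
    # CON1: push the constant taken from the sub-opcode.
    stack.append(constant)
    # AS24: pop the value, pop the address, store the value as 4 bytes.
    value = stack.pop()
    address = stack.pop()
    struct.pack_into("<i", memory, address, value)
    return stack
```

After the call the stack is empty again and the four bytes at the target address hold the constant, widened to a long integer.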
_LLV:
        cvtbl   (lc)+,r0                        #r0 has display index
        addl3   _display(r0),(lc)+,-(sp)        #push address onto the stack
        jmp     (loop)

It calculates an address in the block specified in the sub-opcode by adding the associated display entry to the offset that appears in the following word. The LV operator has short inline data that reduces the space required to address the first 32K of stack space in each call frame.
    p^.f1

would generate the sequence
    RV:l    p
    OFF     f1

where the RV loads the value of p, given its block in the sub-opcode and offset in the following word, and the interpreter then adds the offset of the field f1 in its record to get the correct address. OFF takes its argument in the sub-opcode if it is small enough.
    RV:l    p
    NIL
    OFF     f1

where the NIL operation checks for a nil pointer and generates the appropriate runtime error if it is.
    a[i] := 2.0

with i an integer and a an ``array [1..1000] of real'' would generate
    LV:l    a
    RV4:l   i
    INX4:8  1,999
    CON8    2.0
    AS8

Here the LV operation takes the address of a and places it on the stack. The value of i is then placed on top of this on the stack. The array address is indexed by the length 4 index (a length 2 index would use INX2) where the individual elements have a size of 8 bytes. The code for INX4 is:
_INX4:
        cvtbl   (lc)+,r0
        bneq    L1
        cvtwl   (lc)+,r0        #r0 has size of records
L1:
        cvtwl   (lc)+,r1        #r1 has lower bound
        movzwl  (lc)+,r2        #r2 has upper-lower bound
        subl3   r1,(sp)+,r3     #r3 has base subscript
        cmpl    r3,r2           #check for out of bounds
        bgtru   esubscr
        mull2   r0,r3           #calculate byte offset
        addl2   r3,(sp)         #calculate actual address
        jmp     (loop)
esubscr:
        movw    $ESUBSCR,_perrno
        jbr     error

Here the lower bound is subtracted, and range checked against the upper minus lower bound. The offset is then scaled to a byte offset into the array and added to the base address on the stack. Multi-dimension subscripts are translated as a sequence of single subscriptings.
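The subscript calculation can be sketched as below. The names `inx4` and `SubscriptError` are hypothetical; the exception stands in for the ESUBSCR runtime error, and the two explicit comparisons stand in for the single unsigned comparison used in the assembly code.

```python
class SubscriptError(Exception):
    """Stand-in for the ESUBSCR runtime error."""


def inx4(stack, size, lower, span):
    """Replace the base address on the stack by the element address.

    size  - element size in bytes
    lower - lower bound of the subscript
    span  - upper bound minus lower bound
    """
    index = stack.pop()
    base_subscript = index - lower
    # The assembly code does this check with one unsigned comparison.
    if base_subscript < 0 or base_subscript > span:
        raise SubscriptError(index)
    # Scale to a byte offset and add it to the array base address.
    stack[-1] += base_subscript * size
```

For the array in the example, `inx4(stack, 8, 1, 999)` with a base address of 4000 and i = 3 leaves 4016 on the stack.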
The interpreter has many arithmetic operators.
All operators produce results long enough to prevent overflow
unless the bounds of the base type are exceeded.
The basic operators available are
    Addition:        ADD*, SUCC*
    Subtraction:     SUB*, PRED*
    Multiplication:  MUL*, SQR*
    Division:        DIV*, DVD*, MOD*
    Unary:           NEG*, ABS*
The interpreter has several range checking operators. The important distinction among these operators is between values whose legal range begins at zero and those that do not begin at zero, for example a subrange variable whose values range from 45 to 70. For those that begin at zero, a simpler ``logical'' comparison against the upper bound suffices. For others, both the lower and upper bounds must be checked independently, requiring two comparisons. On the VAX 11/780 both checks are done using a single index instruction so the only gain is in reducing the inline data.
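The difference between the two kinds of range check can be illustrated as follows. This is a sketch, not the interpreter's code: the function names are hypothetical, and 32-bit values are assumed so that a negative number, viewed as unsigned, becomes larger than any legal upper bound.

```python
MASK32 = 0xFFFFFFFF  # assumption: values are 32 bits wide


def in_range_zero_based(value, upper):
    # One unsigned ("logical") comparison: a negative value wraps to a
    # large unsigned number and so fails the test against upper.
    return (value & MASK32) <= upper


def in_range_general(value, lower, upper):
    # Two independent comparisons, one against each bound.
    return lower <= value <= upper
```

For the subrange 45 to 70, `in_range_general(44, 45, 70)` is false, while for a range starting at zero `in_range_zero_based(-1, 10)` is rejected by the single comparison.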
The interpreter includes three operators for case statements that are used depending on the width of the case label type. For each width, the structure of the case data is the same, and is represented in figure 2.4.
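        +-----------------------------+
        |           CASEOP            |
        +-----------------------------+
        |        No. of cases         |
        +-----------------------------+
        |     Case transfer table     |
        +-----------------------------+
        | Array of case label values  |
        +-----------------------------+

        Figure 2.4 - Case data structure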
The CASEOP case statement operators do a sequential search through the case label values. If they find the label value, they take the corresponding entry from the transfer table and cause the interpreter to branch to the specified statement. If the specified label is not found, an error results.
The
CASE
operators take the number of cases as a sub-opcode
if possible.
Three different operators are needed to handle single byte,
word, and long case transfer table values.
For example, the
CASEOP1
operator has the following code sequence:
_CASEOP1:
        cvtbl   (lc)+,r0
        bneq    L1
        cvtwl   (lc)+,r0        #r0 has length of case table
L1:
        movaw   (lc)[r0],r2     #r2 has pointer to case labels
        movzwl  (sp)+,r3        #r3 has the element to find
        locc    r3,r0,(r2)      #r0 has index of located element
        beql    caserr          #element not found
        mnegl   r0,r0           #calculate new lc
        cvtwl   (r2)[r0],r1     #r1 has lc offset
        addl2   r1,lc
        jmp     (loop)
caserr:
        movw    $ECASE,_perrno
        jbr     error
Here the interpreter first computes the address of the beginning of the case label value area by adding twice the number of case label values to the address of the transfer table, since the transfer table entries are 2 byte address offsets. It then searches through the label values, and generates an ECASE error if the label is not found. If the label is found, the index of the corresponding entry in the transfer table is extracted and that offset is added to the interpreter location counter.
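The search and branch described above can be sketched with the transfer table and the label values held in two parallel lists. The names `caseop` and `CaseError` are hypothetical; the exception stands in for the ECASE error, and `lc` is taken to be whatever value the location counter holds when the offset is added.

```python
class CaseError(Exception):
    """Stand-in for the ECASE runtime error."""


def caseop(lc, transfer_table, labels, selector):
    """Return the new location counter for a case statement."""
    # Sequential search through the case label values.
    for position, label in enumerate(labels):
        if label == selector:
            # Entry `position` of the transfer table holds the offset
            # that is added to the location counter.
            return lc + transfer_table[position]
    raise CaseError(selector)
```

With labels 1, 5 and 9 and offsets 10, 20 and 30, a selector of 5 moves the location counter forward by 20.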
The following operations are defined to do execution profiling.
The set operations: union ADDT, intersection MULT, element removal SUBT, and the set relationals RELT are straightforward. The following operations are more interesting.
    if character in [`+', `-', `*', `/']

or

    if character in [`a'..`z', `$', `_']

These constructs are common in Pascal, and INCT makes them run much faster in the interpreter, as if they were written as an efficient series of if statements.
Other miscellaneous operators that are present in the interpreter are ASRT, which causes the program to end if the Boolean value on the stack is not true, and STOI, STOD, ITOD, and ITOS, which convert between different length arithmetic operands for use in aligning the arguments in procedure and function calls, and with some untyped built-ins, such as SIN and COS.
Finally, if the program is run with the run-time testing disabled, there are special operators for for statements and special indexing operators for arrays that have individual element size that is a power of 2. The code can run significantly faster using these operators.
The transcendental functions SIN, COS, ATAN, EXP, LN, SQRT, SEED, and RANDOM are taken from the standard UNIX mathematical package. These functions take double precision floating point values and return the same.
The functions EXPO, TRUNC, and ROUND take a double precision floating point number. EXPO returns an integer representing the machine representation of its argument's exponent, TRUNC returns the integer part of its argument, and ROUND returns the rounded integer part of its argument.
The other system time procedures are
DATE
and
TIME
that copy an appropriate text string into a Pascal string array.
The function
ARGC
returns the number of command line arguments passed to the program.
The procedure
ARGV
takes an index on the stack and copies the specified
command line argument into a Pascal string array.
The function
CHR*
converts a suitably small integer into an ASCII character.
Its primary purpose is to do a range check.
The function
ODD*
returns
true
if its argument is odd and returns
false
if its argument is even.
The function
UNDEF
always returns the value
false.