WO1988005190A1 - Microprogrammable language emulation system - Google Patents

Microprogrammable language emulation system

Publication number
WO1988005190A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
token
processor
tokens
program
Prior art date
Application number
PCT/US1987/003444
Other languages
French (fr)
Inventor
Arthur E. Speckhard
Joseph M. Thames
Original Assignee
International Meta Systems, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Meta Systems, Inc.
Priority to PCT/US1987/003444 (WO1988005190A1)
Priority to EP19880900933 (EP0343171A4)
Publication of WO1988005190A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/32 Address formation of the next instruction, e.g. by incrementing the instruction counter
    • G06F9/322 Address formation of the next instruction, e.g. by incrementing the instruction counter for non-sequential address
    • G06F9/328 Address formation of the next instruction, e.g. by incrementing the instruction counter for non-sequential address for runtime instruction patching
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/31 Programming languages or programming paradigms
    • G06F9/22 Microcontrol or microprogram arrangements
    • G06F9/226 Microinstruction function, e.g. input/output microinstruction; diagnostic microinstruction; microinstruction format
    • G06F9/24 Loading of the microprogram
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines

Definitions

  • the present invention generally relates to a system for execution of high level computer language programs. More particularly, it relates to a system for emulating high level computer languages and executing programs written in such languages.
  • Computer languages have traditionally been divided into two classes: programming languages and machine languages. As the name implies, machine languages have been used to refer to machine elements of computers and have been directed to each action to be performed by the computers for which they are written.
  • Programming languages are generally considered to be "high level" languages because their operators and constructs correspond more to properties of applications or problems to be solved than to actual machine functions or physical elements. Because high level languages typically do not make reference to actual machine hardware, high level languages have been translated into machine languages in order for computers to execute the statements of high level languages.
  • The traditional method for translating high level languages has been to compile several machine language instructions for each high level language statement. Multiple machine language instructions have been used because the logic of high level languages typically cannot be expressed in one-to-one correspondence with machine language instructions.
  • A disadvantage of compiling several machine language instructions for each high level language statement is that most computers execute only one instruction at a time. Moreover, each instruction must be fetched from a memory location one at a time, presenting a "bottleneck" between memory access time and machine processing speed.
  • Microfiche appendices, which constitute a part of this specification, are as follows: MICROFICHE APPENDIX A is a computer program listing of the encoder program of the preferred embodiment, contained on three microfiche having 201 frames; and
  • MICROFICHE APPENDIX B is a computer program listing of the emulator program of the preferred embodiment, contained on 2 microfiche having 162 frames.
  • the present invention solves the problems associated with the compiling of high level languages by encoding such languages in variable length tokens representative of characteristics intrinsic to such languages.
  • the tokens are then executed by a processor which is microprogrammed to emulate a high level language.
  • the system of the present invention may be used independently or in conjunction with a host computer system, such as an IBM PC AT or equivalent system, to provide high speed processing of application programs written in a multiplicity of high level languages.
  • Because the processor of the present invention is microprogrammable, encoder and emulator programs written for specific languages may be used as microcode for the processor.
  • FIG. 1 is a block diagram of apparatus of the preferred embodiment
  • FIG. 2 is a block diagram of a program procedure
  • FIG. 3 is a logic flow-chart showing the word padding function of the preferred embodiment
  • FIG. 4 is a sample listing of tokens of the preferred embodiment, which are used to represent a high level language statement
  • FIG. 5 is a conceptualized depiction of the branching operation of the preferred embodiment
  • FIGs. 6a and 6b are graphic representations of alternative instruction formats of the processor of the preferred embodiment.
  • FIG. 7 is a block diagram of the processor of the preferred embodiment.
  • An expansion board 10 is added to a host computer 20, such as an IBM PC AT or equivalent system.
  • the expansion board 10 includes a main memory 30, a processor 40, a cache memory 50, an instruction memory 60, an interface 70 with the host computer 20 and bus and control lines 80.
  • the main memory 30 is a dynamic random access memory unit having a storage capacity of approximately 1 megabyte and information is stored in the general purpose memory 30 in 32-bit words.
  • An encoder program is loaded into the instruction memory 60 by the host computer 20 through the interface 70 and bus lines 80.
  • The instruction memory is a loadable read only memory (ROM) unit having a storage capacity of 64K 32-bit words.
  • a listing of a representative encoder program is attached to this disclosure as Microfiche Appendix A and incorporated by reference.
  • The encoder program causes the processor 40 to fetch statements of the high level language program from the main memory 30, encode each statement in a representative stream of variable length bit fields ("tokens") without regard to word boundaries, and store the encoded statements in the main memory 30.
  • an emulator program is loaded into the instruction memory 60 by the host computer 20 through the interface 70 and bus lines 80.
  • a listing of a representative emulator program is attached to this disclosure as Microfiche Appendix B and incorporated by reference.
  • The emulator program causes the processor 40 to fetch the encoded statements from the main memory 30 and interpret the tokens of the encoded statements into microcode instructions, resulting in execution of the high level language. That is, the instruction tokens executed by the emulator program have a syntax and data structure resembling the syntax and data structure of the high level language rather than the physical elements of the processor. Instruction tokens are packed into the main memory 30 and subsequently fetched and executed by the processor 40.
  • Each procedure 100 has a specific function to perform and typically comprises three major parts: a header 110; a code body 120; and a data contour 130.
  • The header 110 contains a table which describes the bit size and storage locations of the code body 120 and the data contour 130.
  • the code body 120 contains the logic statements of the procedure and the data contour 130 contains data or storage locations of data to be used in the procedure.
  • Absolute addresses are not used to provide location references to elements of the code body 120 or data contour 130. Rather, all location references are relative to fixed positions in the data contour 130 or the header 110. Thus, machine references are not directly made. An exception exists, however, for global references which are used to refer to other procedures and to common or global variables.
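  • The relative addressing described above can be pictured with a minimal Python sketch. The field names `contour_base` and `locals_offset`, the helper `resolve_local`, and the use of word units are assumptions made for illustration; the text only states that references are relative to fixed positions in the header 110 or the data contour 130.

```python
from dataclasses import dataclass


@dataclass
class Header:
    """Hypothetical subset of the header 110 table (field names are assumptions)."""
    contour_base: int   # where the data contour 130 starts, chosen at load time
    locals_offset: int  # offset of the local variables table within the contour


def resolve_local(header: Header, index: int) -> int:
    """Map a 1-based relative index to a concrete location at run time."""
    if index < 1:
        raise ValueError("relative indices start at 1 in this sketch")
    return header.contour_base + header.locals_offset + (index - 1)


# The same encoded reference (index 1) resolves correctly wherever the
# procedure is loaded, because only the header contents differ.
first_load = Header(contour_base=0x1000, locals_offset=8)
second_load = Header(contour_base=0x8000, locals_offset=8)
print(hex(resolve_local(first_load, 1)))   # 0x1008
print(hex(resolve_local(second_load, 1)))  # 0x8008
```

  • The point of the sketch is that the encoded program never stores `0x1008` or `0x8008`; it stores only the small relative index.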
  • Each statement of the high level language is represented in a stream of instruction tokens generally without regard to word size, again allowing for independence from machine constraints.
  • Each token is a variable length bit field representable by an integer pair having the following general form.
  • the first integer of the pair is a constant of the emulator program, fixed by the order of tokens in the language representation, which indicates the length of the token in bits.
  • the second integer is the value of the token, representative of an operator or operand corresponding to characteristics that are intrinsic to the execution of the high level language.
  • The integer pair provides a switching context within the emulator program potentially having a branch for each integer in the set of all integers representable by the token of the specified length. That is, during execution, the emulator program interprets each token to cause the processor 40 to branch to locations in analogous fashion to a FORTRAN computed GO TO statement or a PASCAL CASE statement.
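  • The computed-GO-TO style of branching can be sketched in Python as a table lookup. This is an illustration only: the handler names and the 2-bit context are hypothetical, and the real emulator performs the branch in microcode rather than through a list of functions.

```python
def read_token(word: int, bit_pos: int, length: int) -> int:
    """Extract `length` bits from a 32-bit word, starting at `bit_pos` (bit 0 = LSB)."""
    return (word >> bit_pos) & ((1 << length) - 1)


# Hypothetical handlers for a 2-bit context; the names are illustrative only.
def on_first():
    return "branch 1"


def on_second():
    return "branch 2"


def on_third():
    return "branch 3"


# The length (2) is fixed by the context, i.e. by the emulator program,
# so the table has one slot per representable value. Slot 0 is unused here.
CONTEXT_LENGTH = 2
BRANCHES = [None, on_first, on_second, on_third]

word = 0b11  # a word whose low two bits hold the value 3
value = read_token(word, 0, CONTEXT_LENGTH)
result = BRANCHES[value]()
print(value, result)  # 3 branch 3
```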
  • token stream format is presented herein in a left-to-right ordering but in actual implementation, the ordering is right-to-left. That is, token streams fill memory words of the main memory 30 beginning at the least significant bit (bit position 0) of a memory word and ending at the most significant bit (bit position 31).
  • Although the tokens are generated without regard to word size, they must be packed in memory according to the word boundaries of the main memory 30, that is, 32 bits. Accordingly, as shown in FIG. 3, the number of bits remaining in a memory word is calculated (200) for each token stored.
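  • A minimal Python sketch of this packing step follows. It assumes that a token which does not fit in the bits remaining in the current word is moved whole to the next word, leaving zero padding behind; the function name `pack_tokens` and this no-splitting rule are assumptions for illustration, not a transcription of FIG. 3.

```python
WORD_BITS = 32


def pack_tokens(tokens):
    """Pack (length, value) pairs into 32-bit words, least significant bit first.

    A token that does not fit in the bits remaining in the current word is
    moved to the next word; the skipped bits stay zero and act as padding.
    """
    words = []
    current, used = 0, 0
    for length, value in tokens:
        if not 0 < length <= WORD_BITS:
            raise ValueError("token length must be between 1 and 32 bits")
        if value < 0 or value >> length:
            raise ValueError("value does not fit in the stated length")
        remaining = WORD_BITS - used
        if length > remaining:
            words.append(current)      # close the word; unused high bits are zero
            current, used = 0, 0
        current |= value << used
        used += length
        if used == WORD_BITS:
            words.append(current)
            current, used = 0, 0
    if used:
        words.append(current)
    return words


# Tokens of 6, 2 and 4 bits share one word; a 24-bit token then needs more
# than the 20 bits left, so it starts a new word.
packed = pack_tokens([(6, 1), (2, 3), (4, 4), (24, 0xABCDEF)])
print([hex(w) for w in packed])  # ['0x4c1', '0xabcdef']
```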
  • the first integer in each primary operator token representation indicates that the instruction token is six bits in length. This integer is not part of the instruction token stream, but is part of the emulator program which executes the token stream. Its value is predictable according to the ordering of tokens in the language representation.
  • the second integer indicates which statement of the high level language is being encoded. For each of these operators there are suboperations corresponding to the suboperations of the high level language, thus providing isomorphic representation of the high level language. A common suboperation is the
  • an operand which contains a blend of operators and operands.
  • An operand may in turn correspond to syntax names or literals as in the high level language.
  • Operands are token structures corresponding to syntax names that are used to reference the content of data structures. However, operand references are not simply memory addresses. Rather, the mapping of references to data content is a dynamic process. Two kinds of operand references are employed in code body 120 expressions: variable references and name references. Variable references are references to variable data structures, while name references correspond to names of variable data structures. In general, variable references are used to retrieve data, whereas name references are used to store data. A name is considered a literal.
  • The second field of the name reference is actually a type code (length 4, value 3) referring to type "name," since a name is a literal data structure.
  • The reference structure has a four-bit class field, a variable format primary reference and may have a secondary reference relative to the primary one (common variables only).
  • the primary reference is an index to one of the tables in the contour 130 portion of the object program, as designated by the class field.
  • the four-bit class field and associated class designations are listed in TABLE IV below:
  • the primary reference format has a two bit length code and a value subfield.
  • the length of the value subfield is encoded in the length specification shown in TABLE V below:
  • the preceding class code and the length code are contiguous in the same word so that a zero length code may be distinguished from a zero pad, mentioned above.
  • the value represents a relative pointer into one of the tables in the contour 130 as specified by the class field.
  • the length code applies only to the block-value.
  • the offset value is a "word encoded" literal, meaning that its size is determined by the number of remaining bits in the word containing the preceding fields. If enough bits remain to contain the value (which cannot be zero), these bits are used as the offset-value field. Otherwise, the remaining bits are zero, and the next full word is used as the offset field.
  • Literal operands and operand data structures have similar formats. Literal operands exist in the code body 120. Operand data structures, if they exist prior to execution, are stored in the data contour 130. Otherwise, they are created and stored dynamically and are referenced indirectly through the data contour 130 tables. Data structures are prefaced by a four-bit type code, as shown in TABLE VIII below:
  • Such data structures possess great variability in length and may not fit in a single word of memory. If not, the entire value, including the sign bit, is stored in the next memory word.
  • the low order bit of the data structure represents the sign of the value, containing a 0 for a positive sign or a 1 for a negative sign.
  • the literal value of zero has a negative sign bit (1) to distinguish it from a 0 pad field, which is ignored by the emulator program.
  • The bit field 000001 represents a literal value of zero.
  • The bit field 000000 represents a pad field.
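  • The sign-bit convention described above can be checked with a short Python sketch. The function names are hypothetical; the rule itself (sign in the low order bit, zero stored with a negative sign so that it never looks like padding) is taken from the text.

```python
def encode_literal(n: int) -> int:
    """Shift the magnitude left one bit and put the sign in bit 0 (0 = plus, 1 = minus)."""
    if n == 0:
        return 1  # zero carries a negative sign bit so it never looks like padding
    return (abs(n) << 1) | (1 if n < 0 else 0)


def decode_literal(field: int) -> int:
    """Inverse of encode_literal; a field of 0 is padding, not a value."""
    if field == 0:
        raise ValueError("all-zero field is a pad, not a literal")
    magnitude = field >> 1
    return -magnitude if field & 1 else magnitude


print(encode_literal(10), encode_literal(5), encode_literal(0))  # 20 10 1
print(decode_literal(20), decode_literal(1))  # 10 0
```

  • The shift by one position is why a stored literal appears to be twice its true value.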
  • the statement 300 is a FORTRAN assignment statement assigning the sum of 10 + 5 to the variable A.
  • the first token 310 is a primary operator having a length of six bits and indicating an assignment statement, as shown in TABLE I above.
  • the second token 320 is a literal operand discriminator, as shown in TABLE II above, which initiates an expression.
  • the third token 330 indicates that the literal operand is an integer, as shown in TABLE VIII above.
  • the fourth token 340 has a length of 20 bits to fill the remainder of a 32 bit memory word and a value of 20 which indicates that its true value of 10 has been shifted to add a positive sign bit to the zero bit position of the memory word. Such single position shifting has the appearance of multiplying the value by 2 since it is encoded in binary form.
  • the fifth token 350 is another literal operand discriminator to preface a literal value.
  • the sixth token 360 indicates that the literal operand is an integer and the seventh token 370 has a length of 26 bits to fill the remainder of a 32-bit memory word.
  • the literal value of the seventh token 370 is 10 which indicates that a positive sign bit was added to the memory word to shift the true value of 5 by 1 bit.
  • the eighth token 380 is an operator discriminator to indicate that an operation is to be performed in the expression. It follows the literal operand token groups described above to indicate Polish post-fix notation. That is, the operator is applied to operands which precede it. During execution of the encoded statement, the literal operands are loaded into a push-down stack so that operators are applied to them on a last-in-first-out basis.
  • the ninth token 390 indicates an addition operation is to be performed, as shown in TABLE III above.
  • the tenth token 400 is another operator discriminator and the eleventh token 410 indicates an end to the expression, as shown in TABLE III above.
  • the twelfth token 420 initiates a new expression containing a literal operand as shown in TABLE II.
  • the thirteenth token 430 indicates that the literal operand is a name reference as shown in TABLE VIII. (The structure of this reference is shown in TABLE VI, in which its first two tokens correspond to 420 and 430 in FIG. 4.)
  • the fourteenth token 440 is the class code of the name reference, indicating the name of a local variable as shown in TABLE IV.
  • the fifteenth token 450 is the length code of the reference as shown in TABLE V, indicating that the subsequent (sixteenth) value token 460 is 5 bits in length.
  • the value token 460 is a relative pointer to the location of the local variable in the data contour 130, as shown in FIG. 2. Its value, one, indicates the location of the first local variable (local variable A 130G1) in the Local Variables Table 130G of the data contour 130 located by the local variables offset pointer in the Header 110.
  • the seventeenth token 470 is another operator discriminator and the eighteenth token 480 is an Expression-End operator, as shown in TABLE III.
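  • The Polish post-fix evaluation used in the walkthrough above can be sketched with a push-down stack. The symbolic item names (`"literal"`, `"operator"`, `"add"`, `"end"`) stand in for the actual token values, which are defined in tables not reproduced here, so they are assumptions for illustration.

```python
def evaluate_postfix(items):
    """Evaluate a postfix sequence using a push-down (last-in-first-out) stack."""
    stack = []
    for kind, payload in items:
        if kind == "literal":
            stack.append(payload)
        elif kind == "operator" and payload == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif kind == "operator" and payload == "end":
            break
        else:
            raise ValueError(f"unexpected item: {kind} {payload}")
    return stack.pop()


# Symbolic form of the expression tokens for 10 + 5: two literal groups,
# then the add operator, then the expression-end operator.
expression = [("literal", 10), ("literal", 5), ("operator", "add"), ("operator", "end")]
print(evaluate_postfix(expression))  # 15
```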
  • The integer pair symbolizing each token represents a logical switching context in the emulator program containing a branch corresponding to each integer in the set of integers representable by the token of the specified length.
  • the branching operation of the emulator program may be conceptualized as a logical tree system.
  • the token stream defines a path through a hierarchy of integer pairs, which achieves the execution of the statement.
  • The primary operator token (length 6, value 1) is a member of the highest order set, which represents the statement context of the high-level language.
  • the integer one (500), indicated by the value of the token, designates a branch to the Assign (primary) operator, as shown in TABLE I.
  • the Assign operator automatically invokes expression subcontext which begins with discriminator subcontext, as shown in TABLE II.
  • the length of the discriminator token (510) allows three branches (zero values are invalid in most contexts since zeros are used as padding, causing the emulator program to proceed to the next word).
  • the value of the discriminator token (510), three, selects the third branch indicating that a literal data structure follows.
  • the literal data structure begins with a type subcontext as shown in TABLE VIII.
  • the length of the type token (520), four bits, allows up to 15 type branches (excluding the zero value).
  • the value, four, of the type token (520) selects the integer type.
  • the literal data structure is concluded with the value of the integer. If enough bits remain in the word containing the preceding tokens to represent the integer value, then the remainder of the word is used. Otherwise, the remainder of the word is zero-filled (and ignored by the emulator program) and the subsequent full word is used (as a token) to contain the value of the integer.
  • twenty bits remain in the 32-bit word, and are used as the value token.
  • The value itself consists of a zero in the low order bit (the sign bit) indicating a plus sign, and the value 10 in the high-order nineteen bits.
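  • The path just described (6-bit primary operator, 2-bit discriminator, 4-bit type, then the remaining 20 bits as the value) can be reproduced in a few lines of Python. The helper `take` is hypothetical; the field lengths and values are those given in the text.

```python
WORD_BITS = 32


def take(word, pos, length):
    """Return (value, new_pos) for a `length`-bit field starting at bit `pos`."""
    return (word >> pos) & ((1 << length) - 1), pos + length


# First word of the encoded statement: 6-bit primary operator (1 = Assign),
# 2-bit discriminator (3 = literal follows), 4-bit type (4 = integer),
# then the remaining 20 bits holding the sign-shifted value 20.
word = 1 | (3 << 6) | (4 << 8) | (20 << 12)

pos = 0
primary, pos = take(word, pos, 6)
discriminator, pos = take(word, pos, 2)
type_code, pos = take(word, pos, 4)
remaining = WORD_BITS - pos
field, pos = take(word, pos, remaining)

sign = "-" if field & 1 else "+"
magnitude = field >> 1
print(hex(word), primary, discriminator, type_code, remaining, sign, magnitude)
# 0x144c1 1 3 4 20 + 10
```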
  • the characteristics of the high level language are thus directly represented by the content and ordering of token streams. Branches are made isomorphically to the high level language so it is unnecessary to compile multiple machine instructions to interpret the logic of high level language statements. Rather, such statements are executed in direct correspondence with their intrinsic characteristics as represented by the token streams.
  • An emulator program, such as shown in Microfiche Appendix B and incorporated herein, is loaded into the instruction memory 50 and causes the processor 40 to execute token streams.
  • the emulator program is written specifically to support the language of the program being executed. That is, each high level language requires its own emulator program in order to provide direct interpretation of tokens via micro-programmed branching.
  • the emulator program interprets token streams using microcode instructions of the machine language of the processor 40.
  • the processor 40 has an instruction set of 24 hardware operations that may be combined into a composite (dual) instruction format having a left hand side (LHS) and a right hand side (RHS), as shown in FIG. 6a.
  • the composite instruction is 32 bits wide and contains seven fields. Alternatively, the instruction may have only a LHS and contain only five fields, as shown in FIG. 6b.
  • The LHS portion of an instruction contains an arithmetic, logic or shift operation between two operands with the result assigned to a third operand.
  • The RHS portion of an instruction contains a second operation, including external bus instructions, subroutine link/return skips, transfers, or memory indexing.
  • a three address instruction format is used for LHS operations, having the following symbolic form:
  • A 10 + 5
  • the T field of an LHS instruction specifies a register containing the local variable "A"
  • the A field specifies a register containing the value of 10
  • the B field specifies a register containing the value of 5.
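  • The three-address form (a target T receiving the result of an operation on A and B) can be sketched as follows. The operator names, the register numbers and the 32-bit result mask are assumptions for illustration; the 24 hardware operations themselves are not listed in this text.

```python
import operator

# Hypothetical operator names; the processor's actual operations are not listed here.
OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "and": operator.and_,
    "or": operator.or_,
}


def execute_lhs(registers, op, t, a, b):
    """Three-address form: registers[t] = registers[a] <op> registers[b]."""
    # Masking to 32 bits is an assumption about register width.
    registers[t] = OPS[op](registers[a], registers[b]) & 0xFFFFFFFF
    return registers[t]


# Register numbers are arbitrary for this illustration.
regs = {0: 0, 1: 10, 2: 5}   # register 0 stands for the variable A
execute_lhs(regs, "add", t=0, a=1, b=2)
print(regs[0])  # 15
```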
  • The processor 40 of the preferred embodiment is contained on a single very large scale integrated circuit (VLSI) silicon chip 800. It executes instructions in a "pipeline" manner in four phases Ph I, Ph II, Ph III and Ph IV. That is, four sequential instructions are executed concurrently, each in a different one of the four phases, each phase being one clock cycle in duration. As execution of an instruction is completed, it exits the pipeline and a new instruction enters it. The intermediate instructions simultaneously advance to their next phase of execution. During the first phase Ph I, an instruction is fetched from the instruction memory 50 and loaded into an instruction register 810. The address of the instruction is indicated by a location counter 830 which is either incremented sequentially or given values by transfer, return or conditional transfer instructions.
  • the instruction is then decoded in the second phase Ph II by an instruction decoder 820.
  • the LHS is decoded for the Operator, A and B fields but not for the T field, which is passed unaltered to the next phase.
  • the A and B fields designate registers in a general register file 840, or literal registers 850 and 860, whose values are passed to an A Register 870 and a B Register 880.
  • The Operator field is passed to an Opcode Decoder 890 which determines which operation is specified by the Operator field.
  • the RHS is also decoded during the second phase Ph II, but only the Unconditional Transfer, Link, Link Conditional, Return and Load K Register instructions are acted upon during the second phase. All other RHS instructions are passed to the next phase.
  • the address field of the instruction is used to select the next value for the location counter 830, which references the address for the instruction to be fetched immediately after the instruction in Ph I advances.
  • The Link and Link Conditional instructions are executed in similar fashion to the Unconditional Transfer instruction but in addition they "push" the accompanying location counter value onto the Link Stack Register 900 for subsequent use by the Return instruction. That instruction "pops" a value from the Link Stack Register 900 and adds the value of its address field to the popped value. The sum is used as the next value of the location counter 830.
  • the Load K Register instruction loads its address field into the K Register 910, which is a special purpose register used in conjunction with a Memory Address Register 920 to reference the cache memory 60.
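  • The Link and Return behavior described above can be modeled with a small Python class. This is a toy model: the class and method names are hypothetical, and the sketch assumes the value pushed by Link is simply the current location counter, since the text does not say exactly which counter value accompanies the instruction.

```python
class Sequencer:
    """Toy model of the location counter 830 and Link Stack Register 900."""

    def __init__(self):
        self.location_counter = 0
        self.link_stack = []

    def transfer(self, address):
        """Unconditional Transfer: the address field becomes the next location."""
        self.location_counter = address

    def link(self, address):
        """Link: save the current location counter value, then transfer."""
        self.link_stack.append(self.location_counter)
        self.location_counter = address

    def ret(self, address_field=0):
        """Return: pop the saved value and add the instruction's address field."""
        self.location_counter = self.link_stack.pop() + address_field


seq = Sequencer()
seq.transfer(100)
seq.link(500)        # enter a routine at 500, remembering 100
seq.ret(1)           # resume at 100 + 1
print(seq.location_counter)  # 101
```

  • Adding the address field on Return lets a routine choose among several resumption points relative to the saved location.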
  • In the third phase Ph III only the LHS instruction is acted upon.
  • the values selected in the second phase Ph II are operated upon as specified by the Operator field and the result is sent to an X Register 930, which gives the succeeding instruction access to that result when it advances in the next clock cycle.
  • the result of the R3 + R2 operation is stored in the X Register 930 and then added to R1 when the follow up instruction advances into the third phase Ph III.
  • The result of the LHS instruction held in the X Register 930 is stored in a general register or special register as specified by the T field. It is also used for conditional transfer or skip testing by the Test and Skip Logic 940. If a transfer or skip is indicated by the result, the address field of the instruction is used to select the next value of the location counter 830 and causes the instructions in the first three phases to be inhibited.
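  • The four-phase overlap described above can be visualized with a short Python sketch that lists which instruction occupies each phase on each clock cycle. It models only straight-line flow; transfers, skips and the inhibiting of earlier phases are deliberately left out, and the function name is hypothetical.

```python
PHASES = ["Ph I", "Ph II", "Ph III", "Ph IV"]


def pipeline_schedule(instructions, cycles):
    """Return, per clock cycle, which instruction occupies each phase (or None)."""
    schedule = []
    for cycle in range(cycles):
        row = {}
        for depth, phase in enumerate(PHASES):
            index = cycle - depth  # instruction that entered `depth` cycles ago
            row[phase] = instructions[index] if 0 <= index < len(instructions) else None
        schedule.append(row)
    return schedule


program = ["I0", "I1", "I2", "I3", "I4"]
for cycle, row in enumerate(pipeline_schedule(program, 8)):
    print(cycle, [row[p] or "--" for p in PHASES])
```

  • From cycle 3 onward, until the program runs out, all four phases are busy and one instruction completes per clock cycle.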
  • the processor 40 communicates with the cache memory 60 through a Cache Memory Interface 950. It communicates with external systems through External Bus Logic 960 which is connected to a 32-bit-wide bi-directional bus 970.
  • the interface 70 between the processor 40 and the host computer 20 is connected to the external bus 970 and responds to commands under control of the program in the instruction memory 50.
  • Although the external bus 970 is 32 bits wide, only bits 15-0 and parity bits 1-0 are used for communicating with the interface 70.
  • Two read and two write commands are implemented as follows:
  • The letters "ss" refer to the subsystem address of the interface 70.
  • The Write Data function causes 16 bits of data to be sent from the processor 40 to the interface 70. A "data" bit in a status register of the interface 70 is also set to indicate that data has been transmitted. However, if either the data bit or the control bit is already set, the operation is deferred.
  • The Write Control function causes 16 bits to be sent from the processor 40 to the interface 70 and sets a control bit in the interface status register if neither the data bit nor the control bit is already set. If either bit is already set, the operation is deferred.
  • the Read Data Register function causes the contents of a 16-bit data register of the interface 70 to be sent to the processor 40.
  • the contents of the interface data register are loaded by the host computer 20 as either data or control information.
  • The data/control bit of the interface status register is then reset unless neither the data bit nor the control bit is set. In that event, the operation is deferred.
  • the Read Status function causes the interface 70 to send to the processor 40 the contents of the interface status register in a format as shown in TABLE XI:
  • the Read Status function is never deferred.
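  • The deferral rules above can be summarized in a toy Python model. It represents one generic transfer channel with a 16-bit register and its data and control status bits; how the two transfer directions map onto the actual status register format is not reproduced here, so the class `Channel` and its single-channel simplification are assumptions for illustration.

```python
class Channel:
    """Toy model of one interface data register with its data and control status bits."""

    def __init__(self):
        self.register = 0
        self.data_bit = False
        self.control_bit = False

    def write(self, value, control=False):
        """Write Data / Write Control: deferred (returns False) if either bit is set."""
        if self.data_bit or self.control_bit:
            return False
        self.register = value & 0xFFFF  # 16 bits are transferred
        if control:
            self.control_bit = True
        else:
            self.data_bit = True
        return True

    def read(self):
        """Read Data Register: deferred (returns None) if neither bit is set."""
        if not (self.data_bit or self.control_bit):
            return None
        self.data_bit = self.control_bit = False
        return self.register

    def status(self):
        """Read Status: never deferred."""
        return {"data": self.data_bit, "control": self.control_bit}


ch = Channel()
print(ch.read())            # None: nothing to read yet, so the read is deferred
print(ch.write(0x1234))     # True: accepted, data bit now set
print(ch.write(0x5678))     # False: deferred until the first value is consumed
print(hex(ch.read()))       # 0x1234
```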
  • the host computer 20 communicates with the processor 40 as if it were an input/output device. Accordingly, the interface to the host computer 20 is formatted according to standard input/output commands of the host computer.
  • A data processing system executes a high level computer language program by encoding statements of the program into variable length tokens and then executing the tokens.
  • Each token is a variable length bit field having a value representative of a semantic element of a program statement and a length representative of the context of the semantic element.
  • the present invention generally relates to a system for execution of high level computer language programs. More particularly, it relates to a system for emulating high level computer languages and executing programs written in such languages.
  • Computer languages have traditionally been divided into two classes: programming languages and machine languages. As the name implies, machine languages have been used to refer to machine elements of computers and have been directed to each action to be performed by the computers for which they are written.
  • Programming languages are generally considered to be "high level” languages because their operators and constructs correspond more to properties of applications or problems to be solved rather than actual machine functions or physical elements. Because high level languages typically do not make reference to actual machine hardware, high level languages have been translated into machine languages in order for computers to execute the statements of high level languages.
  • the traditional method for translating high level languages has been to compile several machine language instructions for each high level language statement. Multiple machine language instructions have been used because the logic of high level languages typically cannot be expressed in one-to- one correspondence with machine language instructions.
  • a disadvantage of compiling several machine language instructions for each high level language statement is that most computers execute only one instruction at a time. Moreover, each instruction must be fetched from a memory location one at a time, presenting a "bottleneck” between memory access time and machine processing speed.
  • VLSI very large scale integrated
  • MICROFICHE APPENDIX A is a computer program listing of the encoder program of the preferred embodiment, contained on three microfiche having 201 frames;
  • MICROFICHE APPENDIX B is a computer program listing of the emulator program of the preferred embodiment, contained on 2 microfiche having 162 frames.
  • the present invention solves the problems associated with the compiling of high level languages by encoding such languages in variable length tokens representative of characteristics intrinsic to such languages.
  • the tokens are then executed by a processor which is microprogrammed to emulate a high level language.
  • the system of the present invention may be used independently or in conjunction with a host computer system, such as an IBM PC AT or equivalent system, to provide high speed processing of application programs written in a multiplicity of high level languages. Because the processor of the present invention is microprogrammable, encoder and emulator programs written for specific languages may be used as microcode for the processor.
  • FIG. 1 is a block diagram of apparatus of the preferred embodiment
  • FIG. 2 is a block diagram of a program procedure
  • FIG. 3 is a logic flow-chart showing the word padding function of the preferred embodiment
  • FIG. 4 is a sample listing of tokens of the preferred embodiment, which are used to represent a high level language statement
  • FIG. 5 is a conceptualized depiction of the branching operation of the preferred embodiment
  • FIGs. 6a and 6b are graphic representations of alternative instruction formats of the processor of the preferred embodiment.
  • FIG. 7 is a block diagram of the processor of the preferred embodiment.
  • An expansion board 10 is added to a host computer 20, such as an IBM PC AT or equivalent system.
  • the expansion board 10 includes a main memory 30, a processor 40, a cache memory 50, an instruction memory 60, an interface 70 with the host computer 20 and bus and control lines 80.
  • the main memory 30 is a dynamic random access memory unit having a storage capacity of approximately 1 megabyte and information is stored in the general purpose memory 30 in 32-bit words.
  • An encoder program is loaded into the instruction memory 60 by the host computer 20 through the interface 70 and bus lines 80.
  • the instruction memory is a loadable read only memory (ROM) unit having a storage capacity of 64K 32-bit words.
  • a listing of a representative encoder program is attached to this disclosure as Microfiche Appendix A and incorporated by reference.
  • the encoder program causes the processor 40 to fetch statements of the high level language program from the main memory 30, encode each statement in a representative stream of variable length bit fields ("tokens") without regard to word boundaries, and store the encoded statements in the main memory 30.
  • an emulator program is loaded into the instruction memory 60 by the host computer 20 through the interface 70 and bus lines 80.
  • a listing of a representative emulator program is attached to this disclosure as Microfiche Appendix B and incorporated by reference.
  • the emulator program causes the microprocessor to fetch the encoded statements from the main memory 30 and interprets the tokens of the encoded statements into microcode instructions resulting in execution of the high level language. That is, the instruction tokens executed by the emulator program have a syntax and data structure resembling the syntax and data structure of the high level language rather than the physical elements of the processor. Instruction tokens are packed into the main memory 30 and subsequently fetched and executed by the processor 40.
  • Each procedure 100 has a specific function to perform and typically comprises three major parts: a header 110; a code body 120; and a data contour 130.
  • the header 110 contains a table which describes the bit size and storage locations of the code body 120 and the data contour 130.
  • the code body 120 contains the logic statements of the procedure and the data contour 130 contains data or storage locations of data to be used in the procedure.
  • absolute addresses are not used to provide location references to elements of the code body 120 or data contour 130. Rather, all location references are relative to fixed positions in the data contour 130 or the header 110. Thus, machine references are not directly made. An exception exists, however, for global references which are used to refer to other procedures and to common or global variables.
  • Each statement of the high level language is represented in a stream of instruction tokens generally without regard to word size, again allowing for independence from machine constraints.
  • Each token is a variable length bit field representable by an integer pair having the following general form.
  • the first integer of the pair is a constant of the emulator program, fixed by the order of tokens in the language representation, which indicates the length of the token in bits.
  • the second integer is the value of the token, representative of an operator or operand corresponding to characteristics that are intrinsic to the execution of the high level language.
  • the integer pair provides a switching context within the emulator program potentially having a branch for each integer in the set of all integers representable by the token of the specified length. That is, during execution, the emulator program interprets each token to cause the processor 40 to branch to locations in analogous fashion to a FORTRAN computed GO TO statement or a PASCAL CASE statement.
  • token stream format is presented herein in a left-to-right ordering but in actual implementation, the ordering is right-to-left. That is, token streams fill memory words of the main memory 30 beginning at the least significant bit (bit position 0) of a memory word and ending at the most significant bit (bit position 31).
  • Although the tokens are generated without regard to word size, they must be packed in memory according to the word boundaries of the main memory 30, that is, 32 bits. Accordingly, as shown in FIG. 3, the number of bits remaining in a memory word is calculated (200) for each token stored.
  • the first integer in each primary operator token representation indicates that the instruction token is six bits in length. This integer is not part of the instruction token stream, but is part of the emulator program which executes the token stream. Its value is predictable according to the ordering of tokens in the language representation.
  • the second integer indicates which statement of the high level language is being encoded. For each of these operators there are suboperations corresponding to the suboperations of the high level language, thus providing isomorphic representation of the high level language. A common suboperation is the "expression," which contains a blend of operators and operands.
  • An operand may in turn correspond to syntax names or literals as in the high level language.
  • Operands are token structures corresponding to syntax names that are used to reference the content of data structures. However, operand references are not simply memory addresses. Rather, the mapping of references to data content is a dynamic process. Two kinds of operand references are employed in code-body 120 expressions: variable references and name references. Variable references are references to variable data structures while name references correspond to names of variable data structures. In general, variable references are used to retrieve data, whereas name references are used to store data. A name is considered a literal.
  • the second field of the name reference is actually a type code (4:3) referring to type "name," since a name is a literal data structure.
  • the reference structure has a four-bit class field, a variable format primary reference and may have a secondary reference relative to the primary one.
  • the primary reference is an index to one of the tables in the contour 130 portion of the object program, as designated by the class field.
  • the four-bit class field and associated class designations are listed in TABLE IV below:
  • the primary reference format has a two bit length code and a value subfield.
  • the length of the value subfield is encoded in the length specification shown in TABLE V below:
  • the preceding class code and the length code are contiguous in the same word so that a zero length code may be distinguished from a zero pad, mentioned above.
  • the value represents a relative pointer into one of the tables in the contour 130 as specified by the class field.
  • the length code applies only to the block-value.
  • the offset value is a "word encoded" literal, meaning that its size is determined by the number of remaining bits in the word containing the preceding fields. If enough bits remain to contain the value (which cannot be zero), these bits are used as the offset-value field. Otherwise, the remaining bits are zero, and the next full word is used as the offset field.
  • Literal operands and operand data structures have similar formats. Literal operands exist in the code body 120. Operand data structures, if they exist prior to execution, are stored in the data contour 130. Otherwise, they are created and stored dynamically and are referenced indirectly through the data contour 130 tables. Data structures are prefaced by a four-bit type code, as shown in TABLE VIII below:
  • Such data structures possess great variability in length and may not fit in a single word of memory. If not, the entire value, including the sign bit is stored in the next memory word.
  • the low order bit of the data structure represents the sign of the value, containing a 0 for a positive sign or a 1 for a negative sign.
  • the literal value of zero has a negative sign bit (1) to distinguish it from a 0 pad field, which is ignored by the emulator program.
  • 000001 represents a literal value of zero
  • 000000 represents a pad field.
  • the statement 300 is a FORTRAN assignment statement assigning the sum of 10 + 5 to the variable A.
  • the first token 310 is a primary operator having a length of six bits and indicating an assignment statement, as shown in TABLE I above.
  • the second token 320 is a literal operand discriminator, as shown in TABLE II above, which initiates an expression.
  • the third token 330 indicates that the literal operand is an integer, as shown in TABLE VIII above.
  • the fourth token 340 has a length of 20 bits to fill the remainder of a 32 bit memory word and a value of 20 which indicates that its true value of 10 has been shifted to add a positive sign bit to the zero bit position of the memory word. Such single position shifting has the appearance of multiplying the value by 2 since it is encoded in binary form.
  • the fifth token 350 is another literal operand discriminator to preface a literal value.
  • the sixth token 360 indicates that the literal operand is an integer and the seventh token 370 has a length of 26 bits to fill the remainder of a 32-bit memory word.
  • the literal value of the seventh token 370 is 10, which indicates that a positive sign bit was added to the memory word to shift the true value of 5 by 1 bit.
  • the eighth token 380 is an operator discriminator to indicate that an operation is to be performed in the expression. It follows the literal operand token groups described above to indicate Polish post-fix notation. That is, the operator is applied to operands which precede it. During execution of the encoded statement, the literal operands are loaded into a push-down stack so that operators are applied to them on a last-in-first-out basis.
  • the ninth token 390 indicates an addition operation is to be performed as shown in TABLE III above.
  • the tenth token 400 is another operator discriminator and the eleventh token 410 indicates an end to the expression, as shown in TABLE III above.
  • the twelfth token 420 initiates a new expression containing a literal operand as shown in TABLE II.
  • the thirteenth token 430 indicates that the literal operand is a name reference as shown in TABLE VIII. (The structure of this reference is shown in TABLE VI, in which its first two tokens correspond to 420 and 430 in FIG. 4.)
  • the fourteenth token 440 is the class code of the name reference, indicating the name of a local variable as shown in TABLE IV.
  • the fifteenth token 450 is the length code of the reference as shown in TABLE V, indicating that the subsequent (sixteenth) value token 460 is 5 bits in length.
  • the value token 460 is a relative pointer to the location of the local variable in the data contour 130, as shown in FIG. 2. Its value, one, indicates the location of the first local variable (local variable A 130G1) in the Local Variables Table 130G of the data contour 130 located by the local variables offset pointer in the Header 110.
  • the seventeenth token 470 is another operator discriminator and the eighteenth token 480 is an Expression-End operator, as shown in TABLE III.
  • the integer pair symbolizing each token represents a logical switching context in the emulator program containing a branch corresponding to each integer in the set of integers representable by the token of the specified length.
  • the branching operation of the emulator program may be conceptualized as a logical tree system.
  • the token stream defines a path through a hierarchy of integer pairs, which achieves the execution of the statement.
  • the primary operator token (6:1) is a member of the highest order set which represents the statement context of the high-level language.
  • the integer one (500), indicated by the value of the token, designates a branch to the Assign, (primary) operator, as shown in TABLE I.
  • the Assign operator automatically invokes expression subcontext which begins with discriminator subcontext, as shown in TABLE II.
  • the length of the discriminator token (510) allows three branches (zero values are invalid in most contexts since zeros are used as padding, causing the emulator program to proceed to the next word).
  • the value of the discriminator token (510), three, selects the third branch indicating that a literal data structure follows.
  • the literal data structure begins with a type subcontext as shown in TABLE VIII.
  • the length of the type token (520), four bits, allows up to 15 type branches (excluding the zero value).
  • the value, four, of the type token (520) selects the integer type.
  • the literal data structure is concluded with the value of the integer. If enough bits remain in the word containing the preceding tokens to represent the integer value, then the remainder of the word is used. Otherwise, the remainder of the word is zero-filled (and ignored by the emulator program) and the subsequent full word is used (as a token) to contain the value of the integer.
  • twenty bits remain in the 32-bit word, and are used as the value token.
  • the value itself consists of a zero in the low order bit (the sign bit) indicating a plus sign, and the value 10 in the high-order nineteen bits.
  • the characteristics of the high level language are thus directly represented by the content and ordering of token streams. Branches are made isomorphically to the high level language so it is unnecessary to compile multiple machine instructions to interpret the logic of high level language statements. Rather, such statements are executed in direct correspondence with their intrinsic characteristics as represented by the token streams.
  • An emulator program such as shown in Microfiche Appendix B and incorporated herein, is loaded into the instruction memory 50 and causes the processor 40 to execute token streams.
  • the emulator program is written specifically to support the language of the program being executed. That is, each high level language requires its own emulator program in order to provide direct interpretation of tokens via micro-programmed branching.
  • the emulator program interprets token streams using microcode instructions of the machine language of the processor 40.
  • the processor 40 has an instruction set of 24 hardware operations that may be combined into a composite (dual) instruction format having a left hand side (LHS) and a right hand side (RHS), as shown in FIG. 6a.
  • the composite instruction is 32 bits wide and contains seven fields. Alternatively, the instruction may have only a LHS and contain only five fields, as shown in FIG. 6b.
  • the LHS portion of an instruction contains an arithmetic, logic or shift operation between two operands with the result assigned to a third operand.
  • the RHS portion of an instruction contains a second operation, including external bus instructions, subroutine link/return skips, transfers, or memory indexing.
  • a three address instruction format is used for LHS operations, having the following symbolic form:
  • A = 10 + 5
  • the T field of an LHS instruction specifies a register containing the local variable "A"
  • the A field specifies a register containing the value of 10
  • the B field specifies a register containing the value of 5.
  • the processor 40 of the preferred embodiment is contained on a single very large scale integrated circuit (VLSI) silicon chip 800. It executes instructions in a "pipeline" manner in four phases Ph I, Ph II, Ph III and Ph IV. That is, four sequential instructions are concurrently executed in one of the four phases, each phase being one clock cycle in duration. As execution of an instruction is completed, it exits the pipeline and a new instruction enters it. The intermediate instructions simultaneously advance to their next phase of execution. During the first phase Ph I, an instruction is fetched from the instruction memory 50 and loaded into an instruction register 810. The address of the instruction is indicated by a location counter 830 which is either incremented sequentially or given values by transfer, return or conditional transfer instructions.
  • the instruction is then decoded in the second phase Ph II by an instruction decoder 820.
  • the LHS is decoded for the Operator, A and B fields but not for the T field, which is passed unaltered to the next phase.
  • the A and B fields designate registers in a general register file 840, or literal registers 850 and 860, whose values are passed to an A Register 870 and a B Register 880.
  • the Operator field is passed to an Opcode Decoder 890 which determines which operation is specified by the Operator field.
  • the RHS is also decoded during the second phase Ph II, but only the Unconditional Transfer, Link, Link Conditional, Return and Load K Register instructions are acted upon during the second phase. All other RHS instructions are passed to the next phase.
  • the address field of the instruction is used to select the next value for the location counter 830, which references the address for the instruction to be fetched immediately after the instruction in Ph I advances.
  • the Link and Link Conditional instructions are executed in similar fashion to the Unconditional Transfer instruction but in addition they "push" the accompanying location counter value onto the Link Stack Register 900 for subsequent use by the Return instruction. That instruction "pops" a value from the Link Stack Register 900 and adds the value of its address field to the popped value. The sum is used as the next value of the location counter 830.
  • the Load K Register instruction loads its address field into the K Register 910, which is a special purpose register used in conjunction with a Memory Address Register 920 to reference the cache memory 60.
  • In the third phase Ph III, only the LHS instruction is acted upon.
  • the values selected in the second phase Ph II are operated upon as specified by the Operator field and the result is sent to an X Register 930, which gives the succeeding instruction access to that result when it advances in the next clock cycle.
  • the result of the R3 + R2 operation is stored in the X Register 930 and then added to R1 when the follow up instruction advances into the third phase Ph III.
  • the result of the LHS instruction held in the X Register 930 is stored in a general register or special register as specified by the T field. It is also used for conditional transfer or skip testing by the Test and Skip Logic 940. If a transfer or skip is indicated by the result, the address field of the instruction is used to select the next value of the location counter 830 and causes the instructions in the first three phases to be inhibited.
  • the processor 40 communicates with the cache memory 60 through a Cache Memory Interface 950. It communicates with external systems through External Bus Logic 960 which is connected to a 32-bit-wide bi-directional bus 970.
  • the interface 70 between the processor 40 and the host computer 20 is connected to the external bus 970 and responds to commands under control of the program in the instruction memory 50.
  • Although the external bus 970 is 32 bits wide, only bits 15-0 and parity bits 1-0 are used for communicating with the interface 70.
  • Two read and two write commands are implemented as follows:
  • the letters "ss" refer to the subsystem address of the interface 70.
  • the Write Data function causes 16 bits of data to be sent from the processor 40 to the interface 70. A "data" bit in a status register of the interface 70 is also set to indicate that data has been transmitted. However, if either the data bit or control bit is already set the operation is deferred.
  • the Write Control function causes 16 bits to be sent from the processor 40 to interface 70 and sets a control bit in the interface status register if neither the data bit nor control bit is already set. If either bit is already set the operation is deferred.
  • the Read Data Register function causes the contents of a 16-bit data register of the interface 70 to be sent to the processor 40.
  • the contents of the interface data register are loaded by the host computer 20 as either data or control information.
  • the data/control bit of the interface status register is then reset unless neither the data nor control bit is set. In that event, the operation is deferred.
  • the Read Status function causes the interface 70 to send to the processor 40 the contents of the interface status register in a format as shown in TABLE XI:
  • the Read Status function is never deferred.
  • the host computer 20 communicates with the processor 40 as if it were an input/output device. Accordingly, the interface to the host computer 20 is formatted according to standard input/output commands of the host computer.
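The sign-bit convention for literal values noted in the excerpts above (sign in the low-order bit, a literal zero carrying a negative sign so that it differs from an all-zero pad field) can be sketched as follows. This is only an illustrative sketch: the function names `encode_literal` and `decode_literal` are hypothetical and do not come from the encoder or emulator listings in the microfiche appendices.

```python
def encode_literal(value):
    """Return the token field for an integer literal.

    The magnitude is shifted left by one bit and the sign occupies bit 0
    (0 = positive, 1 = negative). Zero is stored with a negative sign bit
    so that its field is never confused with an all-zero pad.
    """
    if value == 0:
        return 1
    sign_bit = 1 if value < 0 else 0
    return (abs(value) << 1) | sign_bit


def decode_literal(field):
    """Recover the integer from a token field; an all-zero field is a pad."""
    if field == 0:
        raise ValueError("all-zero field is a pad, not a literal")
    magnitude = field >> 1
    return -magnitude if field & 1 else magnitude


print(encode_literal(10))  # 20, the fourth token value in the FIG. 4 discussion
print(encode_literal(5))   # 10, the seventh token value in the FIG. 4 discussion
```

Shifting the magnitude by one position is why the stored value appears to be twice the true value for positive literals.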

Abstract

A data processing system executes a high level computer language program by encoding statements of the program into variable length tokens and then executing the tokens. Each token is a variable length bit field having a value representative of a semantic element of a program statement and a length representative of the context of the semantic element. A single very large scale integrated circuit (VLSI) processor (40) is microprogrammed to execute the encoded program in a pipeline manner and may be used in conjunction with a host computer (20), such as an IBM PC AT or equivalent system.

Description

MICROPROGRAMMABLE LANGUAGE EMULATION SYSTEM
BACKGROUND OF THE INVENTION
The present invention generally relates to a system for execution of high level computer language programs. More particularly, it relates to a system for emulating high level computer languages and executing programs written in such languages.
Computer languages have traditionally been divided into two classes: programming languages and machine languages. As the name implies, machine languages have been used to refer to machine elements of computers and have been directed to each action to be performed by the computers for which they are written. Programming languages, on the other hand, are generally considered to be "high level" languages because their operators and constructs correspond more to properties of applications or problems to be solved rather than actual machine functions or physical elements. Because high level languages typically do not make reference to actual machine hardware, high level languages have been translated into machine languages in order for computers to execute the statements of high level languages.
The traditional method for translating high level languages has been to compile several machine language instructions for each high level language statement. Multiple machine language instructions have been used because the logic of high level languages typically cannot be expressed in one-to-one correspondence with machine language instructions. A disadvantage of compiling several machine language instructions for each high level language statement is that most computers execute only one instruction at a time. Moreover, each instruction must be fetched from a memory location one at a time, presenting a "bottleneck" between memory access time and machine processing speed. Recent improvements in very large scale integrated (VLSI) circuit technology have dramatically increased processing speed and illuminated the "bottleneck" problem.
MICROFICHE APPENDICES
Microfiche appendices, which constitute a part of this specification, are as follows:
MICROFICHE APPENDIX A is a computer program listing of the encoder program of the preferred embodiment, contained on three microfiche having 201 frames; and
MICROFICHE APPENDIX B is a computer program listing of the emulator program of the preferred embodiment, contained on 2 microfiche having 162 frames.
SUMMARY OF THE INVENTION
The present invention solves the problems associated with the compiling of high level languages by encoding such languages in variable length tokens representative of characteristics intrinsic to such languages. The tokens are then executed by a processor which is microprogrammed to emulate a high level language.
The system of the present invention may be used independently or in conjunction with a host computer system, such as an IBM PC AT or equivalent system, to provide high speed processing of application programs written in a multiplicity of high level languages. Because the processor of the present invention is microprogrammable, encoder and emulator programs written for specific languages may be used as microcode for the processor.
It is to be understood that the following description of the preferred embodiment is illustrative of the present invention but other embodiments are possible without departing from the spirit and scope of the present invention.
BRIEF DESCRIPTION OF THE DRAWINGS
The drawings, which constitute a part of this specification, are as follows:
FIG. 1 is a block diagram of apparatus of the preferred embodiment;
FIG. 2 is a block diagram of a program procedure;
FIG. 3 is a logic flow-chart showing the word padding function of the preferred embodiment;
FIG. 4 is a sample listing of tokens of the preferred embodiment, which are used to represent a high level language statement;
FIG. 5 is a conceptualized depiction of the branching operation of the preferred embodiment;
FIGs. 6a and 6b are graphic representations of alternative instruction formats of the processor of the preferred embodiment; and
FIG. 7 is a block diagram of the processor of the preferred embodiment.
DETAILED DESCRIPTION OF THE DRAWINGS
Referring initially to FIG. 1, a general overview of the preferred embodiment will now be described. An expansion board 10 is added to a host computer 20, such as an IBM PC AT or equivalent system. The expansion board 10 includes a main memory 30, a processor 40, a cache memory 50, an instruction memory 60, an interface 70 with the host computer 20 and bus and control lines 80.
Electrical signals representative of a high level program are loaded into the main memory 30 by the processor 40 from the host computer 20 through the interface 70 and bus lines 80. The main memory 30 is a dynamic random access memory unit having a storage capacity of approximately 1 megabyte and information is stored in the general purpose memory 30 in 32-bit words.
An encoder program is loaded into the instruction memory 60 by the host computer 20 through the interface 70 and bus lines 80. The instruction memory is a loadable read only memory (ROM) unit having a storage capacity of 64K 32-bit words. A listing of a representative encoder program is attached to this disclosure as Microfiche Appendix A and incorporated by reference. The encoder program causes the processor 40 to fetch statements of the high level language program from the main memory 30, encode each statement in a representative stream of variable length bit fields ("tokens") without regard to word boundaries, and store the encoded statements in the main memory 30.
After the high level language statements have been encoded and stored, an emulator program is loaded into the instruction memory 60 by the host computer 20 through the interface 70 and bus lines 80. A listing of a representative emulator program is attached to this disclosure as Microfiche Appendix B and incorporated by reference. The emulator program causes the microprocessor to fetch the encoded statements from the main memory 30 and interprets the tokens of the encoded statements into microcode instructions resulting in execution of the high level language. That is, the instruction tokens executed by the emulator program have a syntax and data structure resembling the syntax and data structure of the high level language rather than the physical elements of the processor. Instruction tokens are packed into the main memory 30 and subsequently fetched and executed by the processor 40.
Referring now to FIG. 2, the instruction token architecture for encoding a high level language will be described. For purposes of illustration, FORTRAN 77 is used as a representative high level language. A program written in that language is generally a set of programming units known as "procedures." Each procedure 100 has a specific function to perform and typically comprises three major parts: a header 110; a code body 120; and a data contour 130. The header 110 contains a table which describes the bit size and storage locations of the code body 120 and the data contour 130. The code body 120 contains the logic statements of the procedure and the data contour 130 contains data or storage locations of data to be used in the procedure.
In the preferred embodiment, absolute addresses are not used to provide location references to elements of the code body 120 or data contour 130. Rather, all location references are relative to fixed positions in the data contour 130 or the header 110. Thus, machine references are not directly made. An exception exists, however, for global references which are used to refer to other procedures and to common or global variables.
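As a rough illustration of the three-part procedure and its relative references, the following sketch models a header that locates a table inside the data contour and resolves a local variable by relative index. The class names, the field names `code_body_bits` and `local_variables_offset`, and the 1-based indexing are assumptions made for this example; the specification only states that the header tabulates sizes and storage locations and that references are relative rather than absolute.

```python
from dataclasses import dataclass, field


@dataclass
class Header:
    # Hypothetical fields: size of the code body in bits and the position
    # of the local variables table within the data contour.
    code_body_bits: int
    local_variables_offset: int


@dataclass
class Procedure:
    header: Header
    code_body: list = field(default_factory=list)
    data_contour: list = field(default_factory=list)

    def local_variable(self, relative_index):
        """Resolve a relative reference (assumed to start at 1) via the header."""
        if relative_index < 1:
            raise ValueError("relative references are assumed to start at 1")
        slot = self.header.local_variables_offset + relative_index - 1
        return self.data_contour[slot]


proc = Procedure(Header(code_body_bits=64, local_variables_offset=2),
                 data_contour=["g0", "g1", "A", "B"])
print(proc.local_variable(1))  # A
```

Because the code body never holds an absolute address, the data contour can be relocated by changing only the offset kept in the header.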
Each statement of the high level language is represented in a stream of instruction tokens generally without regard to word size, again allowing for independence from machine constraints. Each token is a variable length bit field representable by an integer pair having the following general form. The first integer of the pair is a constant of the emulator program, fixed by the order of tokens in the language representation, which indicates the length of the token in bits. The second integer is the value of the token, representative of an operator or operand corresponding to characteristics that are intrinsic to the execution of the high level language. In combination, the integer pair provides a switching context within the emulator program potentially having a branch for each integer in the set of all integers representable by the token of the specified length. That is, during execution, the emulator program interprets each token to cause the processor 40 to branch to locations in analogous fashion to a FORTRAN computed GO TO statement or a PASCAL CASE statement.
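The switching context formed by each integer pair can be pictured as a table lookup, much like a computed GO TO. The sketch below is a simplified model, not the microcoded emulator of Appendix B: the `CONTEXTS` table and the branch labels are hypothetical, only the branch values mentioned in this specification (statement value 1 for Assign, discriminator value 3 for a literal, type value 4 for an integer) are filled in, and fields are assumed to be read from the least significant bit of a 32-bit word upward.

```python
# Each context supplies the token length (known to the emulator, not stored
# in the token stream) and a branch table indexed by the token value.
CONTEXTS = {
    "statement": (6, {1: "assign"}),
    "discriminator": (2, {3: "literal"}),
    "type": (4, {4: "integer"}),
}


def read_field(word, offset, length):
    """Extract `length` bits of `word` starting at bit position `offset`."""
    return (word >> offset) & ((1 << length) - 1)


def walk(word, context_order):
    """Follow one branch per context and return the path and bits consumed."""
    offset = 0
    path = []
    for name in context_order:
        length, branches = CONTEXTS[name]
        value = read_field(word, offset, length)
        path.append(branches[value])  # KeyError models an undefined branch
        offset += length
    return path, offset


# 0x144C1 packs the tokens (6:1), (2:3), (4:4) and a 20-bit value of 20.
print(walk(0x144C1, ["statement", "discriminator", "type"]))
# (['assign', 'literal', 'integer'], 12)
```

The length never appears in the stream itself; it is implied by which context the emulator is in when it reads the next field.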
For convenience, the token stream format is presented herein in a left-to-right ordering but in actual implementation, the ordering is right-to-left. That is, token streams fill memory words of the main memory 30 beginning at the least significant bit (bit position 0) of a memory word and ending at the most significant bit (bit position 31). Although the tokens are generated without regard to word size, they must be packed in memory according to the word boundaries of the main memory 30, that is, 32 bits. Accordingly, as shown in FIG. 3, the number of bits remaining in a memory word is calculated (200) for each token stored. If a token requires more than the number of bits remaining in a word (210), that word is padded with 0's (220), which are ignored by the emulator program, and the token is packed at the beginning of the next word (230). Such padding allows the generation of tokens to be free from considerations of word size.
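The padding procedure of FIG. 3 can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the encoder of Appendix A: the function name `pack_tokens` and the representation of a token as a `(length, value)` tuple are choices made for this example.

```python
WORD_BITS = 32  # word size of the main memory 30


def pack_tokens(tokens):
    """Pack (length, value) tokens into 32-bit words, least significant bit first.

    A token that does not fit in the bits remaining in the current word is
    placed at the start of the next word; the unused high-order bits of the
    current word stay zero and act as the pad.
    """
    words = []
    current = 0
    used = 0
    for length, value in tokens:
        if not 0 < length <= WORD_BITS:
            raise ValueError("token length must be between 1 and 32 bits")
        if value < 0 or value >> length:
            raise ValueError("token value does not fit in its length")
        remaining = WORD_BITS - used
        if length > remaining:
            words.append(current)  # remaining bits are already zero (pad)
            current, used = 0, 0
        current |= value << used
        used += length
    if used:
        words.append(current)
    return words


# Four tokens that exactly fill one word, followed by one that starts a new word.
print(pack_tokens([(6, 1), (2, 3), (4, 4), (20, 20), (2, 3)]))  # [83137, 3]
```

Because the pad is simply the untouched zero bits above the last token, no separate pad token has to be written.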
The primary operators are shown in TABLE I below:
Figure imgf000008_0001
Figure imgf000009_0001
The first integer in each primary operator token representation indicates that the instruction token is six bits in length. This integer is not part of the instruction token stream, but is part of the emulator program which executes the token stream. Its value is predictable according to the ordering of tokens in the language representation. The second integer (the token value) indicates which statement of the high level language is being encoded. For each of these operators there are suboperations corresponding to the suboperations of the high level language, thus providing isomorphic representation of the high level language. A common suboperation is the "expression," which contains a blend of operators and operands. An operand may in turn correspond to syntax names or literals as in the high level language.
Expressions are initiated by preceding operations which imply the beginning of an expression and concluded with an "expression-end" operator. All operators, variable operands and literal operands within the expression are prefaced with a discriminator token as shown in TABLE II below:
Figure imgf000010_0002
Secondary operators, used in expressions, are shown in TABLE III below, together with an operator discriminator token:
Figure imgf000010_0001
Figure imgf000011_0001
Operands (data references) are token structures corresponding to syntax names that are used to reference the content of data structures. However, operand references are not simply memory addresses. Rather, the mapping of references to data content is a dynamic process. Two kinds of operand references are employed in code-body 120 expressions: variable references and name references. Variable references are references to variable data structures while name references correspond to names of variable data structures. In general, variable references are used to retrieve data, whereas name references are used to store data. A name is considered a literal.
The reference structure of these two types of operands is as follows:
Variable: {2:2}{4:class}{primary}[{secondary}]
Name: {2:3}{4:3}{4:class}{primary}[{secondary}]
The second field of the name reference is actually a type code {4:3} referring to type "name," since a name is a literal data structure. The reference structure has a four-bit class field, a variable format primary reference and may have a secondary reference relative to the primary one (common variables only). The primary reference is an index to one of the tables in the contour 130 portion of the object program, as designated by the class field. The four-bit class field and associated class designations are listed in TABLE IV below:
Figure imgf000012_0003
The primary reference format has a two bit length code and a value subfield. The length of the value subfield is encoded in the length specification shown in TABLE V below:
Figure imgf000012_0002
The preceding class code and the length code are contiguous in the same word so that a zero length code may be distinguished from a zero pad, mentioned above.
In all reference classes except common variables, only a primary reference exists. In these cases, references have the token structure shown in TABLE VI below:
Figure imgf000012_0001
The value represents a relative pointer into one of the tables in the contour 130 as specified by the class field.
Common variable references have the token structure shown in TABLE VII below:
Figure imgf000013_0001
The length code applies only to the block-value. The offset value is a "word encoded" literal, meaning that its size is determined by the number of remaining bits in the word containing the preceding fields. If enough bits remain to contain the value (which cannot be zero), these bits are used as the offset-value field. Otherwise, the remaining bits are zero, and the next full word is used as the offset field.
Literal operands and operand data structures have similar formats. Literal operands exist in the code body 120. Operand data structures, if they exist prior to execution, are stored in the data contour 130. Otherwise, they are created and stored dynamically and are referenced indirectly through the data contour 130 tables. Data structures are prefaced by a four-bit type code, as shown in TABLE VIII below:
Figure imgf000013_0002
Figure imgf000014_0001
Such data structures possess great variability in length and may not fit in a single word of memory. If not, the entire value, including the sign bit, is stored in the next memory word. The low order bit of the data structure represents the sign of the value, containing a 0 for a positive sign or a 1 for a negative sign. The literal value of zero has a negative sign bit (1) to distinguish it from a 0 pad field, which is ignored by the emulator program. Thus, {000001} represents a literal value of zero, while {000000} represents a pad field.
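The sign convention just described can be sketched as a pair of helper functions. This is an illustrative Python sketch, not code from the emulator program; the names encode_integer and decode_integer are hypothetical.

```python
def encode_integer(value):
    """Return the field value: magnitude shifted left one bit, sign in bit 0."""
    if value == 0:
        return 1  # zero carries a negative sign bit so it differs from a pad field
    sign = 1 if value < 0 else 0
    return (abs(value) << 1) | sign


def decode_integer(field):
    """Recover the signed value from a field produced by encode_integer."""
    if field == 0:
        raise ValueError("an all-zero field is padding, not a literal")
    magnitude = field >> 1
    return -magnitude if field & 1 else magnitude
```

Under this convention a positive 10 is stored as 20 and a positive 5 as 10, while a literal zero is stored as 1.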
Referring now to FIG. 4, an example of an encoded high level language statement will be described. The statement 300 is a FORTRAN assignment statement assigning the sum of 10 + 5 to the variable A. The first token 310 is a primary operator having a length of six bits and indicating an assignment statement, as shown in TABLE I above. The second token 320 is a literal operand discriminator, as shown in TABLE II above, which initiates an expression. The third token 330 indicates that the literal operand is an integer, as shown in TABLE VIII above. The fourth token 340 has a length of 20 bits to fill the remainder of a 32 bit memory word and a value of 20 which indicates that its true value of 10 has been shifted to add a positive sign bit to the zero bit position of the memory word. Such single position shifting has the appearance of multiplying the value by 2 since it is encoded in binary form.
The fifth token 350 is another literal operand discriminator to preface a literal value. The sixth token 360 indicates that the literal operand is an integer and the seventh token 370 has a length of 26 bits to fill the remainder of a 32-bit memory word. The literal value of the seventh token 370 is 10 which indicates that a positive sign bit was added to the memory word to shift the true value of 5 by 1 bit.
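The first seven tokens of FIG. 4 fill exactly two memory words, and reading them back can be sketched as follows. The Python is a hedged illustration rather than the emulator program of Microfiche Appendix B: the class TokenReader and its method names are hypothetical, and the two word values were computed for this example from the token values given in the text ({6:1}, {2:3}, {4:4} and the shifted literals 20 and 10).

```python
WORD_BITS = 32


class TokenReader:
    """Reads tokens from 32-bit words, least significant bit first."""

    def __init__(self, words):
        self.words = list(words)
        self.index = 0  # current word
        self.pos = 0    # next unread bit position in the current word

    def _next_word(self):
        self.index += 1
        self.pos = 0

    def read(self, length):
        """Read a fixed-length token; a zero value is treated as padding."""
        while True:
            if self.pos + length > WORD_BITS:
                self._next_word()
            value = (self.words[self.index] >> self.pos) & ((1 << length) - 1)
            if value == 0:
                self._next_word()  # padding: proceed to the next word
                continue
            self.pos += length
            return value

    def read_word_encoded(self):
        """Read a value held in the rest of the word, or in the next full word."""
        rest = self.words[self.index] >> self.pos
        if rest == 0:
            self._next_word()
            rest = self.words[self.index]
        self.pos = WORD_BITS  # the word is now fully consumed
        return rest


reader = TokenReader([0x000144C1, 0x00000293])
tokens = [
    reader.read(6),              # primary operator
    reader.read(2),              # literal operand discriminator
    reader.read(4),              # integer type code
    reader.read_word_encoded(),  # 20-bit value field
    reader.read(2),              # literal operand discriminator
    reader.read(4),              # integer type code
    reader.read_word_encoded(),  # 26-bit value field
]
# tokens is now [1, 3, 4, 20, 3, 4, 10]
```

Shifting each value field right by one bit removes the sign bit and yields the true operands 10 and 5.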
The eighth token 380 is an operator discriminator to indicate that an operation is to be performed in the expression. It follows the literal operand token groups described above to indicate Polish post-fix notation. That is, the operator is applied to operands which precede it. During execution of the encoded statement, the literal operands are loaded into a push-down stack so that operators are applied to them on a last-in-first-out basis. The ninth token 390 indicates an addition operation is to be performed, as shown in TABLE III above.
The tenth token 400 is another operator discriminator and the eleventh token 410 indicates an end to the expression, as shown in TABLE III above. The twelfth token 420 initiates a new expression containing a literal operand as shown in TABLE II. The thirteenth token 430 indicates that the literal operand is a name reference as shown in TABLE VIII. (The structure of this reference is shown in TABLE VI, in which its first two tokens correspond to 420 and 430 in FIG. 4.)
The fourteenth token 440 is the class code of the name reference, indicating the name of a local variable as shown in TABLE IV. The fifteenth token 450 is the length code of the reference as shown in TABLE V, indicating that the subsequent (sixteenth) value token 460 is 5 bits in length. The value token 460 is a relative pointer to the location of the local variable in the data contour 130, as shown in FIG. 2. Its value, one, indicates the location of the first local variable (local variable A 130G1) in the Local Variables Table 130G of the data contour 130 located by the local variables offset pointer in the Header 110. The seventeenth token 470 is another operator discriminator and the eighteenth token 480 is an Expression-End operator, as shown in TABLE III.
Recall that the integer pair symbolizing each token represents a logical switching context in the emulator program containing a branch corresponding to each integer in the set of integers representable by the token of the specified length. As shown in FIG. 5, the branching operation of the emulator program may be conceptualized as a logical tree system. Considering the example assignment statement referred to in FIG. 4, the token stream defines a path through a hierarchy of integer pairs, which achieves the execution of the statement.
The primary operator token, {6:1}, is a member of the highest order set which represents the statement context of the high-level language. The integer one (500), indicated by the value of the token, designates a branch to the Assign (primary) operator, as shown in TABLE I. The Assign operator automatically invokes the expression subcontext, which begins with the discriminator subcontext, as shown in TABLE II.
The length of the discriminator token (510) allows three branches (zero values are invalid in most contexts since zeros are used as padding, causing the emulator program to proceed to the next word). The value of the discriminator token (510), three, selects the third branch indicating that a literal data structure follows.
The literal data structure begins with a type subcontext as shown in TABLE VIII. The length of the type token (520), four bits, allows up to 15 type branches (excluding the zero value). The value, four, of the type token (520) selects the integer type. The literal data structure is concluded with the value of the integer. If enough bits remain in the word containing the preceding tokens to represent the integer value, then the remainder of the word is used. Otherwise, the remainder of the word is zero-filled (and ignored by the emulator program) and the subsequent full word is used (as a token) to contain the value of the integer. In the example (530), twenty bits remain in the 32-bit word, and are used as the value token. The value itself consists of a zero in the low order bit (the sign bit) indicating a plus sign, and the value 10 in the high-order nineteen bits.
A hierarchy of switching contexts exists in the emulator for each primary operator of the high level language. The characteristics of the high level language are thus directly represented by the content and ordering of token streams. Branches are made isomorphically to the high level language so it is unnecessary to compile multiple machine instructions to interpret the logic of high level language statements. Rather, such statements are executed in direct correspondence with their intrinsic characteristics as represented by the token streams.
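The path through the hierarchy of switching contexts may be pictured with a small table-driven sketch. The Python below is an illustration only, since the emulator performs these branches in microcode. The names are hypothetical, and only the branches named in the text are filled in (Assign = 1, variable = 2, literal = 3, integer type = 4); the remaining entries of TABLES I, II and VIII are omitted.

```python
# Each switching context is a (length_in_bits, branches) pair. The length is a
# constant of the emulator program and is not carried in the token stream.
STATEMENT_CONTEXT = (6, {1: "assign"})
DISCRIMINATOR_CONTEXT = (2, {2: "variable", 3: "literal"})
TYPE_CONTEXT = (4, {4: "integer"})


def max_branches(length):
    """Number of usable branches for a token length (zero is reserved for padding)."""
    return (1 << length) - 1


def branch(context, value):
    """Select the branch named by a token value within one switching context."""
    length, branches = context
    if not 0 < value <= max_branches(length):
        raise ValueError("value is not representable as a non-zero token")
    if value not in branches:
        raise ValueError("no branch listed for this value in the sketch")
    return branches[value]


def trace(values):
    """Follow the statement, discriminator and type contexts in order."""
    contexts = (STATEMENT_CONTEXT, DISCRIMINATOR_CONTEXT, TYPE_CONTEXT)
    return [branch(context, value) for context, value in zip(contexts, values)]
```

With the token values of the example, trace([1, 3, 4]) returns ["assign", "literal", "integer"], and max_branches gives 3 for the two-bit discriminator and 15 for the four-bit type token.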
An emulator program, such as shown in Microfiche Appendix B and incorporated herein, is loaded into the instruction memory 50 and causes the processor 40 to execute token streams. The emulator program is written specifically to support the language of the program being executed. That is, each high level language requires its own emulator program in order to provide direct interpretation of tokens via micro-programmed branching.
The emulator program interprets token streams using microcode instructions of the machine language of the processor 40. In the preferred embodiment, the processor 40 has an instruction set of 24 hardware operations that may be combined into a composite (dual) instruction format having a left hand side (LHS) and a right hand side (RHS), as shown in FIG. 6a. The composite instruction is 32 bits wide and contains seven fields. Alternatively, the instruction may have only a LHS and contain only five fields, as shown in FIG. 6b. A composite instruction is designated by F=0 in the F field at bit position 20 and a LHS-only instruction is designated by F=1 in the F field.
The LHS portion of an instruction contains an arithmetic, logic or shift operation between two operands with the result assigned to a third operand. The RHS portion of an instruction contains a second operation, including external bus instructions, subroutine link/return skips, transfers, or memory indexing.
A three address instruction format is used for LHS operations, having the following symbolic form:
T := A op B
wherein a register specified by the "T" field is assigned (:=) the value of some binary operation (op) performed on the contents of the register specified by the "A" field and the contents specified by the "B" field. Considering the example FORTRAN assignment statement of FIG. 4, A = 10 + 5, the T field of an LHS instruction specifies a register containing the local variable "A," the A field specifies a register containing the value of 10 and the B field specifies a register containing the value of 5.
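The three address form can be modelled in a few lines. The Python sketch below is an assumption-laden illustration: the register names, the operator symbols and the function execute_lhs are hypothetical and do not reproduce the encodings of TABLE IX.

```python
import operator

# Hypothetical subset of binary operations, keyed by a symbolic name.
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "and": operator.and_,
    "or": operator.or_,
}


def execute_lhs(registers, t, a, op, b):
    """Perform T := A op B on a register file held as a dictionary."""
    registers[t] = OPERATIONS[op](registers[a], registers[b])
    return registers[t]


registers = {"RA": 0, "R1": 10, "R2": 5}
execute_lhs(registers, "RA", "R1", "+", "R2")  # RA := R1 + R2
# registers["RA"] is now 15
```

Here the register RA stands in for the local variable "A" of the FORTRAN statement A = 10 + 5.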
The 24 operations implemented in the processor are designated by the Operator (Op) and C fields, as shown in TABLE IX below:
Figure imgf000018_0001
Figure imgf000019_0001
The implementation of certain of the operations listed above will be described in connection with a description of the processor 40 of the preferred embodiment below. However, it is to be recognized that the operations listed above are generally understood by those of ordinary skill in the art.
Referring now to FIG. 7, the processor 40 of the preferred embodiment is contained on a single very large scale integrated circuit (VLSI) silicon chip 800. It executes instructions in a "pipeline" manner in four phases: Ph I, Ph II, Ph III and Ph IV. That is, four sequential instructions are executed concurrently, each in one of the four phases, each phase being one clock cycle in duration. As execution of an instruction is completed, it exits the pipeline and a new instruction enters it. The intermediate instructions simultaneously advance to their next phase of execution.
During the first phase Ph I, an instruction is fetched from the instruction memory 50 and loaded into an instruction register 810. The address of the instruction is indicated by a location counter 830 which is either incremented sequentially or given values by transfer, return or conditional transfer instructions.
The instruction is then decoded in the second phase Ph II by an instruction decoder 820. The LHS is decoded for the Operator, A and B fields but not for the T field, which is passed unaltered to the next phase. The A and B fields designate registers in a general register file 840, or literal registers 850 and 860, whose values are passed to an A Register 870 and a B Register 880. The Operator field is passed to an Opcode Decoder 890 which determines which operation is specified by the Operator field.
The RHS is also decoded during the second phase Ph II, but only the Unconditional Transfer, Link, Link Conditional, Return and Load K Register instructions are acted upon during the second phase. All other RHS instructions are passed to the next phase. For the Unconditional Transfer instruction, the address field of the instruction is used to select the next value for the location counter 830, which references the address for the instruction to be fetched immediately after the instruction in Ph I advances. The Link and Link Conditional instructions are executed in similar fashion to the Unconditional Transfer instruction but in addition they "push" the accompanying location counter value onto the Link Stack Register 900 for subsequent use by the Return instruction. That instruction "pops" a value from the Link Stack Register 900 and adds the value of its address field to the popped value. The sum is used as the next value of the location counter 830. The Load K Register instruction loads its address field into the K Register 910, which is a special purpose register used in conjunction with a Memory Address Register 920 to reference the cache memory 60.
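The interplay of the Link and Return instructions with the location counter can be sketched as follows. This Python model is an illustration under stated assumptions: the class Sequencer is hypothetical, and the value pushed by Link is taken to be the location counter value at the moment the Link is acted upon, which the text does not specify exactly.

```python
class Sequencer:
    """Minimal model of the location counter 830 and the Link Stack Register 900."""

    def __init__(self, start=0):
        self.location_counter = start
        self.link_stack = []

    def transfer(self, address):
        """Unconditional Transfer: the address field selects the next location."""
        self.location_counter = address

    def link(self, address):
        """Link: push the current location counter value, then transfer."""
        self.link_stack.append(self.location_counter)
        self.location_counter = address

    def ret(self, address_field=0):
        """Return: pop a value and add the address field to it."""
        self.location_counter = self.link_stack.pop() + address_field
```

A Return whose address field is non-zero resumes at an offset from the saved location, which is one way a subroutine could select among several continuation points.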
During the third phase Ph III, only the LHS instruction is acted upon. The values selected in the second phase Ph II are operated upon as specified by the Operator field and the result is sent to an X Register 930, which gives the succeeding instruction access to that result when it advances in the next clock cycle. To illustrate, the sum R1 + R2 + R3 may be assigned to R5 by using the instruction R4 := R3 + R2 followed by R5 := X + R1. The result of the R3 + R2 operation is stored in the X Register 930 and then added to R1 when the follow up instruction advances into the third phase Ph III.
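The use of the X Register 930 in the two-instruction sum can be followed in a simplified model. The Python below is a sketch that executes instructions one after another and does not model clock-level timing; the function run and the tuple form (T, A, B) of an addition instruction are assumptions made for the example.

```python
def run(program, registers):
    """Execute additions of the form T := A + B, where an operand may be "X"."""
    regs = dict(registers)
    x = 0  # X Register: result of the immediately preceding instruction
    for target, a, b in program:
        value_a = x if a == "X" else regs[a]
        value_b = x if b == "X" else regs[b]
        x = value_a + value_b  # Ph III: the result is latched in the X Register
        regs[target] = x       # Ph IV: the result is stored as the T field specifies
    return regs


result = run(
    [("R4", "R3", "R2"),   # R4 := R3 + R2
     ("R5", "X", "R1")],   # R5 := X + R1
    {"R1": 1, "R2": 2, "R3": 3, "R4": 0, "R5": 0},
)
# result["R4"] is 5 and result["R5"] is 6
```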
During the fourth phase Ph IV, the result of the LHS instruction held in the X Register 930 is stored in a general register or special register as specified by the T field. It is also used for conditional transfer or skip testing by the Test and Skip Logic 940. If a transfer or skip is indicated by the result, the address field of the instruction is used to select the next value of the location counter 830 and causes the instructions in the first three phases to be inhibited.
The processor 40 communicates with the cache memory 60 through a Cache Memory Interface 950. It communicates with external systems through External Bus Logic 960 which is connected to a 32-bit-wide bi-directional bus 970. The interface 70 between the processor 40 and the host computer 20 is connected to the external bus 970 and responds to commands under control of the program in the instruction memory 50. Although the external bus 970 is 32-bits wide, only bits 15-0 and parity bits 1-0 are used for communicating with the interface 70. Two read and two write commands are implemented as follows:
TABLE X
Figure imgf000022_0002
The letters "ss" refer to the subsystem address of the interface 70. The Write Data function causes 16 bits of data to be sent from the processor 40 to the interface 70. A "data" bit in a status register of the interface 70 is also set to indicate that data has been transmitted. However, if either the data bit or the control bit is already set, the operation is deferred. The Write Control function causes 16 bits to be sent from the processor 40 to the interface 70 and sets a control bit in the interface status register if neither the data bit nor the control bit is already set. If either bit is already set, the operation is deferred.
The Read Data Register function causes the contents of a 16-bit data register of the interface 70 to be sent to the processor 40. The contents of the interface data register are loaded by the host computer 20 as either data or control information. The data/control bit of the interface status register is then reset unless neither the data bit nor the control bit is set. In that event, the operation is deferred. The Read Status function causes the interface 70 to send to the processor 40 the contents of the interface status register in a format as shown in TABLE XI:
Figure imgf000022_0001
Figure imgf000023_0001
The Read Status function is never deferred. The host computer 20 communicates with the processor 40 as if it were an input/output device. Accordingly, the interface to the host computer 20 is formatted according to standard input/output commands of the host computer.
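The deferral rules for the write and read functions can be summarized in a small model of the status bits. The Python below is a hedged sketch: the names are hypothetical, the layout of TABLE XI is not reproduced, and treating a single pair of data and control bits as governing both directions of transfer is an assumption of the example, as is the omission of the host computer's side of the exchange.

```python
from dataclasses import dataclass


@dataclass
class InterfaceStatus:
    data: bool = False
    control: bool = False

    def busy(self):
        """True when either the data bit or the control bit is set."""
        return self.data or self.control


def processor_write(status, kind):
    """Attempt a Write Data ("data") or Write Control ("control") function.

    Returns False when the operation is deferred because a bit is already set.
    """
    if kind not in ("data", "control"):
        raise ValueError("kind must be 'data' or 'control'")
    if status.busy():
        return False
    setattr(status, kind, True)
    return True


def processor_read_data(status):
    """Attempt a Read Data Register function; deferred when neither bit is set."""
    if not status.busy():
        return False
    status.data = False
    status.control = False
    return True
```

In this model a second write is deferred until the bit set by the first has been reset, and a read is deferred while nothing is pending.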
International Bureau
Figure imgf000041_0001
INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
(51) International Patent Classification 4: G06F 9/34, 9/44
(11) International Publication Number: WO 88/ 051 A1
(43) International Publication Date: 14 July 1988 (14.07.
(21) International Application Number: PCT/US87/03444
(22) International Filing Date: 28 December 1987 (28.12.87)
(31) Priority Application Number: 000,714
(32) Priority Date: 6 January 1987 (06.01.87)
(33) Priority Country: US
(71) Applicant: INTERNATIONAL META SYSTEMS, INC. [US/US]; 9841 Airport Boulevard, Suite 1508, Los Angeles, CA 90045 (US).
(72) Inventors: SPECKHARD, Arthur, E.; 28205 S. Ride- gefern Ct., Rancho Palos Verdes, CA 90274 (US). THAMES, Joseph, M.; 5406 Via Del Valle, Torrance, CA 90505 (US).
(74) Agent: CARSON, M., John; 201 North Figue Street, 5th Floor, Los Angeles, CA 90012 (US).
(81) Designated States: DE (European patent), FR (European patent), GB (European patent), IT (European patent), JP.
Published: With international search report. With amended claims.
(54) Title: MICROPROGRAMMABLE LANGUAGE EMULATION SYSTEM
Figure imgf000041_0002
(57) Abstract
A data processing system executes a high level computer language program by encoding statements of the program into variable length tokens and then executing the tokens. Each token is a variable length bit field having a value representative of a semantic element of a program statement and a length representative of the context of the semantic element. A single very large scale integrated circuit (VLSI) processor (40) is microprogrammed to execute the encoded program in a pipeline manner and may be used in conjunction with a host computer (20), such as an IBM PC AT or equivalent system.
MICROPROGRAMMABLE LANGUAGE EMULATION SYSTEM
BACKGROUND OF THE INVENTION
The present invention generally relates to a system for execution of high level computer language programs. More particularly, it relates to a system for emulating high level computer languages and executing programs written in such languages.
Computer languages have traditionally been divided into two classes: programming languages and machine languages. As the name implies, machine languages have been used to refer to machine elements of computers and have been directed to each action to be performed by the computers for which they are written. Programming languages, on the other hand, are generally considered to be "high level" languages because their operators and constructs correspond more to properties of applications or problems to be solved rather than actual machine functions or physical elements. Because high level languages typically do not make reference to actual machine hardware, high level languages have been translated into machine languages in order for computers to execute the statements of high level languages.
The traditional method for translating high level languages has been to compile several machine language instructions for each high level language statement. Multiple machine language instructions have been used because the logic of high level languages typically cannot be expressed in one-to-one correspondence with machine language instructions. A disadvantage of compiling several machine language instructions for each high level language statement is that most computers execute only one instruction at a time. Moreover, each instruction must be fetched from a memory location one at a time, presenting a "bottleneck" between memory access time and machine processing speed. Recent improvements in very large scale integrated (VLSI) circuit technology have dramatically increased processing speed and illuminated the "bottleneck" problem.
MICROFICHE APPENDICES
Microfiche appendices, which constitute a part of this specification, are as follows:
MICROFICHE APPENDIX A is a computer program listing of the encoder program of the preferred embodiment, contained on three microfiche having 201 frames; and
MICROFICHE APPENDIX B is a computer program listing of the emulator program of the preferred embodiment, contained on 2 microfiche having 162 frames.
SUMMARY OF THE INVENTION
The present invention solves the problems associated with the compiling of high level languages by encoding such languages in variable length tokens representative of characteristics intrinsic to such languages. The tokens are then executed by a processor which is microprogrammed to emulate a high level language. The system of the present invention may be used independently or in conjunction with a host computer system, such as an IBM PC AT or equivalent system, to provide high speed processing of application programs written in a multiplicity of high level languages. Because the processor of the present invention is microprogrammable, encoder and emulator programs written for specific languages may be used as microcode for the processor.
It is to be understood that the following description of the preferred embodiment is illustrative of the present invention but other embodiments are possible without departing from the spirit and scope of the present invention.
BRIEF DESCRIPTION OF THE DRAWINGS
The drawings, which constitute a part of this specification, are as follows:
FIG. 1 is a block diagram of apparatus of the preferred embodiment;
FIG. 2 is a block diagram of a program procedure;
FIG. 3 is a logic flow-chart showing the word padding function of the preferred embodiment;
FIG. 4 is a sample listing of tokens of the preferred embodiment, which are used to represent a high level language statement;
FIG. 5 is a conceptualized depiction of the branching operation of the preferred embodiment;
FIGs. 6a and 6b are graphic representations of alternative instruction formats of the processor of the preferred embodiment; and
FIG. 7 is a block diagram of the processor of the preferred embodiment.
DETAILED DESCRIPTION OF THE DRAWINGS
Referring initially to FIG. 1, a general overview of the preferred embodiment will now be described. An expansion board 10 is added to a host computer 20, such as an IBM PC AT or equivalent system. The expansion board 10 includes a main memory 30, a processor 40, a cache memory 50, an instruction memory 60, an interface 70 with the host computer 20 and bus and control lines 80.
Electrical signals representative of a high level program are loaded into the main memory 30 by the processor 40 from the host computer 20 through the interface 70 and bus lines 80. The main memory 30 is a dynamic random access memory unit having a storage capacity of approximately 1 megabyte and information is stored in the general purpose memory 30 in 32-bit words.
An encoder program is loaded into the instruction memory 60 by the host computer 20 through the interface 70 and bus lines 80. The instruction memory is a loadable read only memory (ROM) unit having a storage capacity of 64K 32-bit words. A listing of a representative encoder program is attached to this disclosure as Microfiche Appendix A and incorporated by reference. The encoder program causes the processor 40 to fetch statements of the high level language program from the main memory 30, encode each statement in a representative stream of variable length bit fields ("tokens") without regard to word boundaries, and store the encoded statements in the main memory 30.
After the high level language statements have been encoded and stored, an emulator program is loaded into the instruction memory 60 by the host computer 20 through the interface 70 and bus lines 80. A listing of a representative emulator program is attached to this disclosure as Microfiche Appendix B and incorporated by reference. The emulator program causes the microprocessor to fetch the encoded statements from the main memory 30 and interprets the tokens of the encoded statements into microcode instructions resulting in execution of the high level language. That is, the instruction tokens executed by the emulator program have a syntax and data structure resembling the syntax and data structure of the high level language rather than the physical elements of the processor. Instruction tokens are packed into the main memory 30 and subsequently fetched and executed by the processor 40.
Referring now to FIG. 2, the instruction token architecture for encoding a high level language will be described. For purposes of illustration, FORTRAN 77 is used as a representative high level language. A program written in that language is generally a set of programming units known as "procedures." Each procedure 100 has a specific function to perform and typically comprises three major parts: a header 110; a code body 120; and a data contour 130. The header 110 contains a table which describes the bit size and storage locations of the code body 120 and the data contour 130. The code body 120 contains the logic statements of the procedure and the data contour 130 contains data or storage locations of data to be used in the procedure.
In the preferred embodiment, absolute addresses are not used to provide location references to elements of the code body 120 or data contour 130. Rather, all location references are relative to fixed positions in the data contour 130 or the header 110. Thus, machine references are not directly made. An exception exists, however, for global references which are used to refer to other procedures and to common or global variables.
Each statement of the high level language is represented in a stream of instruction tokens generally without regard to word size, again allowing for independence from machine constraints. Each token is a variable length bit field representable by an integer pair having the following general form. The first integer of the pair is a constant of the emulator program, fixed by the order of tokens in the language representation, which indicates the length of the token in bits. The second integer is the value of the token, representative of an operator or operand corresponding to characteristics that are intrinsic to the execution of the high level language. In combination, the integer pair provides a switching context within the emulator program potentially having a branch for each integer in the set of all integers representable by the token of the specified length. That is, during execution, the emulator program interprets each token to cause the processor 40 to branch to locations in analogous fashion to a FORTRAN computed GO TO statement or a PASCAL CASE statement.
For convenience, the token stream format is presented herein in a left-to-right ordering but in actual implementation, the ordering is right-to-left. That is, token streams fill memory words of the main memory 30 beginning at the least significant bit (bit position 0) of a memory word and ending at the most significant bit (bit position 31). Although the tokens are generated without regard to word size, they must be packed in memory according to the word boundaries of the main memory 30, that is, 32 bits. Accordingly, as shown in FIG. 3, the number of bits remaining in a memory word is calculated (200) for each token stored. If a token requires more than the number of bits remaining in a word (210), that word is padded with 0's (220), which are ignored by the emulator program, and the token is packed at the beginning of the next word (230). Such padding allows the generation of tokens to be free from considerations of word size.
The primary operators are shown in TABLE I below:
Figure imgf000048_0001
Figure imgf000049_0001
The first integer in each primary operator token representation indicates that the instruction token is six bits in length. This integer is not part of the instruction token stream, but is part of the emulator program which executes the token stream. Its value is predictable according to the ordering of tokens in the language representation. The second integer (the token value) indicates which statement of the high level language is being encoded. For each of these operators there are suboperations corresponding to the suboperations of the high level language, thus providing isomorphic representation of the high level language. A common suboperation is the "expression," which contains a blend of operators and operands. An operand may in turn correspond to syntax names or literals as in the high level language.
Expressions are initiated by preceding operations which imply the beginning of an expression and are concluded with an "expression-end" operator. All operators, variable operands and literal operands within the expression are prefaced with a discriminator token as shown in TABLE II below:
Figure imgf000050_0002
Secondary operators, used in expressions, are shown in TABLE III below, together with an operator discriminator token:
[TABLE III, Figures imgf000050_0001 and imgf000051_0001: secondary operators and the operator discriminator token; the rows are not recoverable from the extracted text]
Operands (data references) are token structures corresponding to syntax names that are used to reference the content of data structures. However, operand references are not simply memory addresses. Rather, the mapping of references to data content is a dynamic process. Two kinds of operand references are employed in code-body 120 expressions: variable references and name references. Variable references are references to variable data structures while name references correspond to names of variable data structures. In general, variable references are used to retrieve data, whereas name references are used to store data. A name is considered a literal.
The reference structure of these two types of operands is as follows:
Variable: {2:2}{4:class}{primary}[{secondary}]
Name: {2:3}{4:3}{4:class}{primary}[{secondary}]
The second field of the name reference is actually a type code {4:3} referring to type "name," since a name is a literal data structure. The reference structure has a four-bit class field, a variable format primary reference and may have a secondary reference relative to the primary one (common variables only). The primary reference is an index to one of the tables in the contour 130 portion of the object program, as designated by the class field. The four-bit class field and associated class designations are listed in TABLE IV below:
Figure imgf000052_0001
The primary reference format has a two bit length code and a value subfield. The length of the value subfield is encoded in the length specification shown in TABLE V below:
Figure imgf000052_0003
The preceding class code and the length code are contiguous in the same word so that a zero length code may be distinguished from a zero pad, mentioned above.
In all reference classes except common variables, only a primary reference exists. In these cases, references have the token structure shown in TABLE VI below:
Figure imgf000052_0002
The value represents a relative pointer into one of the tables in the contour 130 as specified by the class field.
Common variable references have the token structure shown in TABLE VII below:
Figure imgf000053_0002
The length code applies only to the block-value. The offset value is a "word encoded" literal, meaning that its size is determined by the number of remaining bits in the word containing the preceding fields. If enough bits remain to contain the value (which cannot be zero), these bits are used as the offset-value field. Otherwise, the remaining bits are zero, and the next full word is used as the offset field. Literal operands and operand data structures have similar formats. Literal operands exist in the code body 120. Operand data structures, if they exist prior to execution, are stored in the data contour 130. Otherwise, they are created and stored dynamically and are referenced indirectly through the data contour 130 tables. Data structures are prefaced by a four-bit type code, as shown in TABLE VIII below:
Figure imgf000053_0001
Figure imgf000054_0001
Such data structures possess great variability in length and may not fit in a single word of memory. If not, the entire value, including the sign bit, is stored in the next memory word. The low order bit of the data structure represents the sign of the value, containing a 0 for a positive sign or a 1 for a negative sign. The literal value of zero has a negative sign bit (1) to distinguish it from a 0 pad field, which is ignored by the emulator program. Thus, {000001} represents a literal value of zero, while {000000} represents a pad field.
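The sign-bit convention just described can be sketched in Python. The function names are hypothetical; the encoding rules (sign in the low-order bit, zero stored with a negative sign, an all-zero field reserved for padding) follow the paragraph above.

```python
def encode_literal(n):
    """Return the value field: magnitude in the high bits, sign in bit 0."""
    if n == 0:
        return 1                  # zero carries a negative sign, so it is never all-zero
    sign = 1 if n < 0 else 0      # 0 = positive, 1 = negative
    return (abs(n) << 1) | sign

def decode_literal(field):
    """Invert encode_literal; an all-zero field is padding, not a literal."""
    if field == 0:
        raise ValueError("an all-zero field is padding, not a literal")
    magnitude = field >> 1
    return -magnitude if field & 1 else magnitude

print(encode_literal(10), encode_literal(5), encode_literal(0))
```

Shifting the magnitude left by one position is what makes a stored positive value appear to be twice its true value.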
Referring now to FIG. 4, an example of an encoded high level language statement will be described. The statement 300 is a FORTRAN assignment statement assigning the sum of 10 + 5 to the variable A. The first token 310 is a primary operator having a length of six bits and indicating an assignment statement, as shown in TABLE I above. The second token 320 is a literal operand discriminator, as shown in TABLE II above, which initiates an expression. The third token 330 indicates that the literal operand is an integer, as shown in TABLE VIII above. The fourth token 340 has a length of 20 bits to fill the remainder of a 32 bit memory word and a value of 20 which indicates that its true value of 10 has been shifted to add a positive sign bit to the zero bit position of the memory word. Such single position shifting has the appearance of multiplying the value by 2 since it is encoded in binary form.
The fifth token 350 is another literal operand discriminator to preface a literal value. The sixth token 360 indicates that the literal operand is an integer and the seventh token 370 has a length of 26 bits to fill the remainder of a 32-bit memory word. The literal value of the seventh token 370 is 10, which indicates that a positive sign bit was added to the memory word, shifting the true value of 5 by 1 bit.
The eighth token 380 is an operator discriminator to indicate that an operation is to be performed in the expression. It follows the literal operand token groups described above to indicate Polish post-fix notation. That is, the operator is applied to operands which precede it. During execution of the encoded statement, the literal operands are loaded into a push-down stack so that operators are applied to them on a last-in-first-out basis. The ninth token 390 indicates an addition operation is to be performed as shown in TABLE III above.
The tenth token 400 is another operator discriminator and the eleventh token 410 indicates an end to the expression, as shown in TABLE III above. The twelfth token 420 initiates a new expression containing a literal operand as shown in TABLE II. The thirteenth token 430 indicates that the literal operand is a name reference as shown in TABLE VIII. (The structure of this reference is shown in TABLE VI, in which its first two tokens correspond to 420 and 430 in FIG. 4.)
The fourteenth token 440 is the class code of the name reference, indicating the name of a local variable as shown in TABLE IV. The fifteenth token 450 is the length code of the reference as shown in TABLE V, indicating that the subsequent (sixteenth) value token 460 is 5 bits in length. The value token 460 is a relative pointer to the location of the local variable in the data contour 130, as shown in FIG. 2. Its value, one, indicates the location of the first local variable (local variable A 130G1) in the Local Variables Table 130G of the data contour 130 located by the local variables offset pointer in the Header 110. The seventeenth token 470 is another operator discriminator and the eighteenth token 480 is an Expression-End operator, as shown in TABLE III.
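The Polish post-fix evaluation described for the literal operands and the addition operator of FIG. 4 can be sketched in Python. The item tags ("literal", "operator") and the operator name "add" are hypothetical stand-ins for the discriminator and secondary operator tokens, whose numeric values appear only in the table images.

```python
def eval_postfix(items):
    """Evaluate (kind, payload) items in post-fix order using a push-down stack."""
    stack = []
    for kind, payload in items:
        if kind == "literal":
            stack.append(payload)          # operands are pushed as they arrive
        elif kind == "operator":
            right = stack.pop()            # last in, first out
            left = stack.pop()
            if payload == "add":
                stack.append(left + right)
            else:
                raise ValueError(f"unsupported operator {payload!r}")
        else:
            raise ValueError(f"unknown item kind {kind!r}")
    if len(stack) != 1:
        raise ValueError("expression did not reduce to a single value")
    return stack[0]

# The expression of FIG. 4: literal 10, literal 5, then the addition operator.
print(eval_postfix([("literal", 10), ("literal", 5), ("operator", "add")]))
```

The sketch covers only the expression; storing the result through the name reference to local variable A is left out.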
Recall that the integer pair symbolizing each token represents a logical switching context in the emulator program containing a branch corresponding to each integer in the set of integers representable by the token of the specified length. As shown in FIG. 5, the branching operation of the emulator program may be conceptualized as a logical tree system. Considering the example assignment statement referred to in FIG. 4, the token stream defines a path through a hierarchy of integer pairs, which achieves the execution of the statement.
The primary operator token, (6:1), is a member of the highest order set which represents the statement context of the high-level language. The integer one (500), indicated by the value of the token, designates a branch to the Assign (primary) operator, as shown in TABLE I. The Assign operator automatically invokes expression subcontext which begins with discriminator subcontext, as shown in TABLE II.
The length of the discriminator token (510) allows three branches (zero values are invalid in most contexts since zeros are used as padding, causing the emulator program to proceed to the next word). The value of the discriminator token (510), three, selects the third branch indicating that a literal data structure follows.
The literal data structure begins with a type subcontext as shown in TABLE VIII. The length of the type token (520), four bits, allows up to 15 type branches (excluding the zero value). The value, four, of the type token (520) selects the integer type. The literal data structure is concluded with the value of the integer. If enough bits remain in the word containing the preceding tokens to represent the integer value, then the remainder of the word is used. Otherwise, the remainder of the word is zero-filled (and ignored by the emulator program) and the subsequent full word is used (as a token) to contain the value of the integer. In the example (530), twenty bits remain in the 32-bit word, and are used as the value token. The value itself consists of a zero in the low order bit (the sign bit) indicating a plus sign, and the value 10 in the high-order nineteen bits.
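The path through the first memory word of the example (statement context, discriminator subcontext, type subcontext, then the word-encoded value) can be sketched as a small Python decoder. The class and function names are hypothetical; the field lengths 6, 2 and 4 and the rule that the value takes the remaining bits of the word come from the description. The sketch assumes the value fits in the same word and does not handle the case where it spills into the next one.

```python
WORD_BITS = 32

class BitReader:
    """Reads fields from one 32-bit word, starting at bit position 0."""
    def __init__(self, word):
        self.word = word
        self.pos = 0

    def take(self, length):
        value = (self.word >> self.pos) & ((1 << length) - 1)
        self.pos += length
        return value

    def take_rest(self):
        return self.take(WORD_BITS - self.pos)

def decode_first_word(word):
    """Follow the contexts in order; each context fixes the next token length."""
    reader = BitReader(word)
    operator = reader.take(6)        # statement context: primary operator
    discriminator = reader.take(2)   # discriminator subcontext
    type_code = reader.take(4)       # type subcontext of the literal
    raw = reader.take_rest()         # word-encoded value: the remaining 20 bits
    magnitude = raw >> 1
    value = -magnitude if raw & 1 else magnitude
    return operator, discriminator, type_code, value

# 83137 is the word obtained by packing {6:1}{2:3}{4:4}{20:20} from bit 0 upward.
print(decode_first_word(83137))
```

No length information is read from the word itself: every call to `take` uses a length that the preceding context supplies, which is the point of the integer-pair representation.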
A hierarchy of switching contexts exists in the emulator for each primary operator of the high level language. The characteristics of the high level language are thus directly represented by the content and ordering of token streams. Branches are made isomorphically to the high level language so it is unnecessary to compile multiple machine instructions to interpret the logic of high level language statements. Rather, such statements are executed in direct correspondence with their intrinsic characteristics as represented by the token streams.
An emulator program, such as shown in Microfiche Appendix B and incorporated herein, is loaded into the instruction memory 50 and causes the processor 40 to execute token streams. The emulator program is written specifically to support the language of the program being executed. That is, each high level language requires its own emulator program in order to provide direct interpretation of tokens via micro-programmed branching.
The emulator program interprets token streams using microcode instructions of the machine language of the processor 40. In the preferred embodiment, the processor 40 has an instruction set of 24 hardware operations that may be combined into a composite (dual) instruction format having a left hand side (LHS) and a right hand side (RHS), as shown in FIG. 6a. The composite instruction is 32 bits wide and contains seven fields. Alternatively, the instruction may have only a LHS and contain only five fields, as shown in FIG. 6b. A composite instruction is designated by F=0 in the F field at bit position 20 and a LHS-only instruction is designated by F=1 in the F field.
The LHS portion of an instruction contains an arithmetic, logic or shift operation between two operands with the result assigned to a third operand. The RHS portion of an instruction contains a second operation, including external bus instructions, subroutine link/return skips, transfers, or memory indexing.
A three address instruction format is used for LHS operations, having the following symbolic form:
T := A op B
wherein a register specified by the "T" field is assigned (:=) the value of some binary operation (op) performed on the contents of the register specified by the "A" field and the contents specified by the "B" field. Considering the example FORTRAN assignment statement of FIG. 4, A = 10 + 5, the T field of an LHS instruction specifies a register containing the local variable "A," the A field specifies a register containing the value of 10 and the B field specifies a register containing the value of 5.
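The three address form T := A op B can be illustrated with a short Python sketch. The register names and the operation names ("add", "sub", "and") are hypothetical, since the actual operation codes are given only in TABLE IX; the sketch shows the addressing form, not the instruction encoding.

```python
import operator

# Hypothetical operation names; the real set of 24 operations is in TABLE IX.
OPERATIONS = {
    "add": operator.add,
    "sub": operator.sub,
    "and": operator.and_,
}

def execute_lhs(registers, t, op, a, b):
    """Assign to register t the result of op applied to registers a and b."""
    registers[t] = OPERATIONS[op](registers[a], registers[b])
    return registers[t]

# A = 10 + 5: the T field names the register holding A, the A and B fields
# name the registers holding 10 and 5.
registers = {"RA": 0, "R1": 10, "R2": 5}
execute_lhs(registers, "RA", "add", "R1", "R2")
print(registers["RA"])
```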
The 24 operations implemented in the processor are designated by the Operator (Op) and C fields, as shown in TABLE IX below:
Figure imgf000059_0001
The implementation of certain of the operations listed above will be described in connection with a description of the processor 40 of the preferred embodiment below. However, it is to be recognized that the operations listed above are generally understood by those of ordinary skill in the art.
Referring now to FIG. 7, the processor 40 of the preferred embodiment is contained on a single very large scale integrated circuit (VLSI) silicon chip 800. It executes instructions in a "pipeline" manner in four phases Ph I, Ph II, Ph III and Ph IV. That is, four sequential instructions are executed concurrently, each in a different one of the four phases, each phase being one clock cycle in duration. As execution of an instruction is completed, it exits the pipeline and a new instruction enters it. The intermediate instructions simultaneously advance to their next phase of execution. During the first phase Ph I, an instruction is fetched from the instruction memory 50 and loaded into an instruction register 810. The address of the instruction is indicated by a location counter 830 which is either incremented sequentially or given values by transfer, return or conditional transfer instructions.
The instruction is then decoded in the second phase Ph II by an instruction decoder 820. The LHS is decoded for the Operator, A and B fields but not for the T field, which is passed unaltered to the next phase. The A and B fields designate registers in a general register file 840, or literal registers 850 and 860, whose values are passed to an A Register 870 and a B Register 880. The Operator field is passed to an Opcode Decoder 890 which determines which operation is specified by the Operator field.
The RHS is also decoded during the second phase Ph II, but only the Unconditional Transfer, Link, Link Conditional, Return and Load K Register instructions are acted upon during the second phase. All other RHS instructions are passed to the next phase. For the Unconditional Transfer instruction, the address field of the instruction is used to select the next value for the location counter 830, which references the address for the instruction to be fetched immediately after the instruction in Ph I advances. The Link and Link Conditional instructions are executed in similar fashion to the Unconditional Transfer instruction but in addition they "push" the accompanying location counter value onto the Link Stack Register 900 for subsequent use by the Return instruction. That instruction "pops" a value from the Link Stack Register 900 and adds the value of its address field to the popped value. The sum is used as the next value of the location counter 830. The Load K Register instruction loads its address field into the K Register 910, which is a special purpose register used in conjunction with a Memory Address Register 920 to reference the cache memory 60.
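The Link and Return behavior can be sketched in Python as a stack of saved location counter values. The class and method names are hypothetical, and the sketch assumes that the value pushed by Link is the location counter value current at the time of the link; the description says only that the "accompanying" value is pushed.

```python
class Sequencer:
    """Toy model of the location counter and the Link Stack Register."""
    def __init__(self, start=0):
        self.location_counter = start
        self.link_stack = []

    def transfer(self, address):
        # Unconditional Transfer: the address field selects the next location.
        self.location_counter = address

    def link(self, address):
        # Link: like a transfer, but first push the location counter value
        # (assumed here to be the current one).
        self.link_stack.append(self.location_counter)
        self.location_counter = address

    def ret(self, offset):
        # Return: pop a saved value and add the address field to it.
        self.location_counter = self.link_stack.pop() + offset

seq = Sequencer(start=100)
seq.link(500)        # enter a subroutine at 500
seq.ret(1)           # come back one location past the saved value
print(seq.location_counter)
```

Adding the address field on return lets a subroutine resume at different points after the link, which is one way the return skips mentioned for RHS instructions could be realized.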
During the third phase Ph III, only the LHS instruction is acted upon. The values selected in the second phase Ph II are operated upon as specified by the Operator field and the result is sent to an X Register 930, which gives the succeeding instruction access to that result when it advances in the next clock cycle. To illustrate, the sum R1 + R2 + R3 may be assigned to R5 by using the instruction R4 := R3 + R2 followed by R5 := X + R1. The result of the R3 + R2 operation is stored in the X Register 930 and then added to R1 when the follow up instruction advances into the third phase Ph III.
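The role of the X Register can be illustrated with a simplified Python model. It assumes, as a modeling simplification not stated in the text, that a result reaches the register file exactly one instruction later than it reaches the X Register. The function name and the (T, A, B) tuple form are hypothetical, and only addition is modeled.

```python
def run_adds(program, registers):
    """Run (T, A, B) additions where register write-back lags by one instruction."""
    regs = dict(registers)
    x = None          # X Register: result of the most recent third phase
    pending = None    # (target, value) still waiting for its write-back
    for target, a, b in program:
        # Operands are read before the previous result is written back;
        # an operand named "X" reads the forwarded result instead.
        value_a = x if a == "X" else regs[a]
        value_b = x if b == "X" else regs[b]
        if pending is not None:
            regs[pending[0]] = pending[1]   # previous instruction writes back
        x = value_a + value_b               # this instruction's third phase
        pending = (target, x)
    if pending is not None:
        regs[pending[0]] = pending[1]
    return regs

start = {"R1": 1, "R2": 2, "R3": 3, "R4": 0, "R5": 0}
# R4 := R3 + R2 followed by R5 := X + R1 yields R1 + R2 + R3 in R5.
print(run_adds([("R4", "R3", "R2"), ("R5", "X", "R1")], start)["R5"])
# Reading R4 directly would see its old content under this model.
print(run_adds([("R4", "R3", "R2"), ("R5", "R4", "R1")], start)["R5"])
```

Under this assumption the second program reads the old content of R4, which is why the description's example names X rather than R4 as the operand.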
During the fourth phase Ph IV, the result of the LHS instruction held in the X Register 930 is stored in a general register or special register as specified by the T field. It is also used for conditional transfer or skip testing by the Test and Skip Logic 940. If a transfer or skip is indicated by the result, the address field of the instruction is used to select the next value of the location counter 830 and causes the instructions in the first three phases to be inhibited.
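The overlap of the four phases can be visualized with a small Python sketch that lists which instruction occupies each phase in each clock cycle. It assumes straight-line code with no transfers, skips or inhibited instructions; the function name and the instruction labels are hypothetical.

```python
PHASES = ("Ph I", "Ph II", "Ph III", "Ph IV")

def pipeline_schedule(instructions):
    """Return one dict per clock cycle mapping each phase to its instruction."""
    count = len(instructions)
    schedule = []
    for cycle in range(count + len(PHASES) - 1):
        row = {}
        for depth, phase in enumerate(PHASES):
            index = cycle - depth   # instruction that entered `depth` cycles ago
            row[phase] = instructions[index] if 0 <= index < count else None
        schedule.append(row)
    return schedule

for cycle, row in enumerate(pipeline_schedule(["i0", "i1", "i2", "i3", "i4"])):
    print(cycle, row)
```

From the fourth cycle onward every phase is occupied, so one instruction completes per clock cycle even though each instruction takes four cycles from fetch to write-back.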
The processor 40 communicates with the cache memory 60 through a Cache Memory Interface 950. It communicates with external systems through External Bus Logic 960 which is connected to a 32-bit-wide bi-directional bus 970. The interface 70 between the processor 40 and the host computer 20 is connected to the external bus 970 and responds to commands under control of the program in the instruction memory 50. Although the external bus 970 is 32-bits wide, only bits 15-0 and parity bits 1-0 are used for communicating with the interface 70. Two read and two write commands are implemented as follows:
TABLE X
Figure imgf000062_0002
The letters "ss" refer to the subsystem address of the interface 70. The Write Data function causes 16 bits of data to be sent from the processor 40 to the interface 70. A "data" bit in a status register of the interface 70 is also set to indicate that data has been transmitted. However, if either the data bit or the control bit is already set, the operation is deferred. The Write Control function causes 16 bits to be sent from the processor 40 to the interface 70 and sets a control bit in the interface status register if neither the data bit nor the control bit is already set. If either bit is already set, the operation is deferred.
The Read Data Register function causes the contents of a 16-bit data register of the interface 70 to be sent to the processor 40. The contents of the interface data register are loaded by the host computer 20 as either data or control information. The data/control bit of the interface status register is then reset unless neither the data bit nor the control bit is set. In that event, the operation is deferred. The Read Status function causes the interface 70 to send to the processor 40 the contents of the interface status register in a format as shown in TABLE XI:
Figure imgf000062_0001
Figure imgf000063_0001
The Read Status function is never deferred. The host computer 20 communicates with the processor 40 as if it were an input/output device. Accordingly, the interface to the host computer 20 is formatted according to standard input/output commands of the host computer.
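The deferral rules of the four interface commands can be sketched as a small Python state model. Several details are assumptions made for illustration: the text does not say whether the interface keeps separate registers or status bits for each transfer direction, so the sketch collapses them into one 16-bit register and one pair of bits, and the status is returned as a plain tuple rather than in the TABLE XI format. The class and method names are hypothetical.

```python
class InterfaceModel:
    """Toy model of the interface 70 status handling (single direction assumed)."""
    def __init__(self):
        self.register = 0
        self.data_bit = False
        self.control_bit = False

    def _occupied(self):
        return self.data_bit or self.control_bit

    def write_data(self, value):
        if self._occupied():
            return False                   # deferred: a bit is already set
        self.register = value & 0xFFFF     # 16 bits are transferred
        self.data_bit = True
        return True

    def write_control(self, value):
        if self._occupied():
            return False                   # deferred
        self.register = value & 0xFFFF
        self.control_bit = True
        return True

    def read_data_register(self):
        if not self._occupied():
            return None                    # deferred: nothing to read yet
        value = self.register
        self.data_bit = self.control_bit = False
        return value

    def read_status(self):
        return (self.data_bit, self.control_bit)   # never deferred

iface = InterfaceModel()
print(iface.write_data(0x1234), iface.write_control(1), iface.read_status())
```

Returning `False` or `None` stands in for deferral; how the hardware actually holds a deferred operation is not described in the text.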

Claims

WHAT IS CLAIMED IS:
1. A method for executing at least one high level computer language program, comprising the steps of: encoding at least one statement of said program with at least one token, said token being a variable length bit field having a value representative of at least one semantic element of said statement and a length representative of the context of said semantic element; and executing said token with at least one processor, said executing step comprising branching to at least one instruction or data location according to said value of said token.
2. A method according to claim 1, wherein said program is supplied to said processor by a host computer through an interface.
3. A method according to claim 2, wherein at least one result of executing said program with said processor is supplied to said host computer through said interface.
4. A method for executing at least one high level computer language program, comprising the steps of: encoding at least one statement of said program with a multiplicity of tokens, each of said tokens being a variable length bit field having a value representative of at least one semantic element of said statement and a length representative of the context of said semantic element; and executing said tokens with a single processor, said executing step comprising branching to at least one instruction or data location according to each said value of each of said tokens.
5. A method according to claim 4, further comprising the steps of: determining the length in bits of at least one of said tokens; calculating the number of bit positions remaining in at least one fixed-length word of a memory connected to said processor; and packing said token in said word if said length of said token does not exceed said number of bit positions remaining in said word.
6. A method according to claim 5, further comprising the steps of: packing a zero value in each bit position remaining in said word if said number of bit positions remaining in said word is less than said length of said token; and packing said token in an adjacent word of said memory.
7. A method for executing a high level computer language program, comprising the steps of: encoding said program with a multiplicity of tokens, each of said tokens having a value representative of at least one semantic element of said program; emulating said program, said emulating step comprising branching to at least one instruction according to each said value of each of said tokens; and executing said instruction with a processor, said executing step being performed with said processor in a multiplicity of phases.
8. A method according to claim 7, wherein said instruction is a microcode instruction.
9. A method according to claim 7, wherein said tokens are ordered in a manner representative of the context of each of said tokens.
10. A method according to claim 7, wherein each of said tokens is a variable length bit field having a length representative of the context of said semantic element.
11. A method according to claim 7, further comprising the steps of: packing each of said tokens sequentially in at least one fixed-length word of a memory connected to said processor.
12. A method according to claim 11, further comprising the steps of: determining the length in bits of each of said tokens; calculating the number of bit positions remaining in said word after packing each of said tokens; packing a zero value in each remaining bit position of said word if said length of a token exceeds said number of remaining bit positions in said word; packing said token in an adjacent fixed-length memory word of said memory; and repeating said calculating and packing steps for at least one succeeding token.
13. A method according to claim 7, wherein said program is supplied to said processor by a host computer through an interface.
14. A method according to claim 7, wherein at least one result of executing said encoded program is supplied to said host computer through said interface.
15. An apparatus for executing a high level computer language program, comprising: a multiplicity of processors, wherein at least two processors are connected to supply signals between said processors; at least one instruction memory connected to at least one of said processors to supply signals representative of at least one instruction, said instruction memory being loadable with at least one microprogram; and at least one cache memory connected to at least one of said processors to supply and receive signals representative of data to be used during execution of said at least one instruction.
16. An apparatus according to claim 15, further comprising a main memory connected to at least one processor to supply and receive signals representative of any information to be packed or stored.
17. An apparatus according to claim 15, further comprising at least one interface means for receiving and supplying signals between at least two of said processors.
18. An apparatus according to claim 15, wherein at least one of said processors executes a multiplicity of instructions in a pipeline manner, each of said instructions in said pipeline being partially executed concurrently with each other instruction in said pipeline.
19. An apparatus for executing a high level computer language program, comprising: a single very large scale integrated circuit processor, said processor being contained on a single silicon chip; an instruction memory connected to said processor to supply signals representative of at least one instruction, said instruction memory being loadable with at least one microprogram; and a cache memory connected to said processor to supply and receive signals representative of data generated during execution of said at least one instruction.
20. An apparatus according to claim 19, further comprising a main memory connected to said processor to supply and receive signals representative of any information to be packed or stored.
21. An apparatus according to claim 19, further comprising an interface means for receiving and supplying signals between a host computer and said processor, said interface means being connected to said processor and said instruction memory.
22. An apparatus according to claim 21, wherein said interface means is further connected to said main memory.
23. An apparatus according to claim 19, wherein said processor executes a multiplicity of instructions in a pipeline, each instruction in said pipeline being partially executed concurrently with each other instruction in said pipeline.
PCT/US1987/003444 1987-01-06 1987-12-28 Microprogrammable language emulation system WO1988005190A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/US1987/003444 WO1988005190A1 (en) 1987-01-06 1987-12-28 Microprogrammable language emulation system
EP19880900933 EP0343171A4 (en) 1987-12-28 1987-12-28 Microprogrammable language emulation system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US000,714 1987-01-06
PCT/US1987/003444 WO1988005190A1 (en) 1987-01-06 1987-12-28 Microprogrammable language emulation system

Publications (1)

Publication Number Publication Date
WO1988005190A1 true WO1988005190A1 (en) 1988-07-14

Family

ID=42123138

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1987/003444 WO1988005190A1 (en) 1987-01-06 1987-12-28 Microprogrammable language emulation system

Country Status (2)

Country Link
EP (1) EP0343171A4 (en)
WO (1) WO1988005190A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5930512A (en) * 1996-10-18 1999-07-27 International Business Machines Corporation Method and apparatus for building and running workflow process models using a hypertext markup language

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4390946A (en) * 1980-10-20 1983-06-28 Control Data Corporation Lookahead addressing in a pipeline computer control store with separate memory segments for single and multiple microcode instruction sequences
US4437184A (en) * 1981-07-09 1984-03-13 International Business Machines Corp. Method of testing a data communication system
US4456952A (en) * 1977-03-17 1984-06-26 Honeywell Information Systems Inc. Data processing system having redundant control processors for fault detection
US4499535A (en) * 1981-05-22 1985-02-12 Data General Corporation Digital computer system having descriptors for variable length addressing for a plurality of instruction dialects
US4506325A (en) * 1980-03-24 1985-03-19 Sperry Corporation Reflexive utilization of descriptors to reconstitute computer instructions which are Huffman-like encoded
US4724521A (en) * 1986-01-14 1988-02-09 Veri-Fone, Inc. Method for operating a local terminal to execute a downloaded application program

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP0343171A4 *

Also Published As

Publication number Publication date
EP0343171A4 (en) 1991-01-30
EP0343171A1 (en) 1989-11-29

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): JP

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): DE FR GB IT

COP Corrected version of pamphlet

Free format text: ON PAGE 28;THE DATE OF RECEIPT OF THE AMENDED CLAIMS SHOULD READ "880523"INSTEAD OF "880524"

WWE Wipo information: entry into national phase

Ref document number: 1988900933

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1988900933

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 1988900933

Country of ref document: EP