US7913069B2 - Processor and method for executing a program loop within an instruction word - Google Patents

Processor and method for executing a program loop within an instruction word

Info

Publication number
US7913069B2
Authority
US
United States
Prior art keywords
computer
instructions
instruction
data bus
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
US11/441,812
Other versions
US20070192575A1 (en)
Inventor
Charles H. Moore
Jeffrey Arthur Fox
John W. Rible
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Array Portfolio LLC
Original Assignee
VNS Portfolio LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/355,513 (US7904695B2)
Application filed by VNS Portfolio LLC
Priority to US11/441,812 (US7913069B2)
Priority to EP07250646A (EP1821200B1)
Priority to AT07250649T (ATE495491T1)
Priority to DE602007011841T (DE602007011841D1)
Priority to AT07250646T (ATE512400T1)
Priority to EP07250644A (EP1821198A1)
Priority to EP07250647A (EP1821211A3)
Priority to EP07250649A (EP1821202B1)
Priority to EP07250614A (EP1821199B1)
Priority to PCT/US2007/004029 (WO2007098005A2)
Priority to KR1020077009925A (KR20090016645A)
Priority to JP2008555370A (JP2009527814A)
Priority to PCT/US2007/004081 (WO2007098024A2)
Priority to PCT/US2007/004083 (WO2007098026A2)
Priority to KR1020087022319A (KR20090003217A)
Priority to KR1020077009924A (KR20090004394A)
Priority to TW096106394A (TW200809613A)
Priority to JP2008555353A (JP2009527808A)
Priority to KR1020077009923A (KR20090017390A)
Priority to EP07750884A (EP1984836A4)
Priority to KR1020077009922A (KR20090016644A)
Priority to TW096106397A (TW200809609A)
Priority to JP2008555354A (JP2009527809A)
Priority to TW096106396A (TW200809531A)
Priority to JP2008555372A (JP2009527816A)
Priority to PCT/US2007/004082 (WO2007098025A2)
Priority to PCT/US2007/004030 (WO2007098006A2)
Priority to JP2008555371A (JP2009527815A)
Priority to KR1020087028864A (KR20090019806A)
Priority to JP2009513215A (JP2009538488A)
Priority to PCT/US2007/012539 (WO2007139964A2)
Publication of US20070192575A1
Assigned to TECHNOLOGY PROPERTIES LIMITED. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: MOORE, CHARLES H., FOX, JEFFREY ARTHUR, RIBLE, JOHN W.
Assigned to VNS PORTFOLIO LLC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: TECHNOLOGY PROPERTIES LIMITED
Assigned to TECHNOLOGY PROPERTIES LIMITED LLC. LICENSE (SEE DOCUMENT FOR DETAILS). Assignors: VNS PORTFOLIO LLC
Priority to US13/053,062 (US8468323B2)
Application granted
Publication of US7913069B2
Assigned to ARRAY PORTFOLIO LLC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: GREENARRAYS, INC., MOORE, CHARLES H.
Assigned to ARRAY PORTFOLIO LLC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: VNS PORTFOLIO LLC
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G06F 9/30134: Register stacks; shift registers
    • G06F 15/8023: Two dimensional arrays, e.g. mesh, torus (single instruction multiple data [SIMD] multiprocessors)
    • G06F 9/30043: LOAD or STORE instructions; Clear instruction
    • G06F 9/30054: Unconditional branch instructions
    • G06F 9/30065: Loop control instructions; iterative instructions, e.g. LOOP, REPEAT
    • G06F 9/30079: Pipeline control instructions, e.g. multicycle NOP
    • G06F 9/32: Address formation of the next instruction, e.g. by incrementing the instruction counter
    • G06F 9/325: Address formation of the next instruction for loops, e.g. loop detection or loop counter
    • G06F 9/3885: Concurrent instruction execution using a plurality of independent parallel functional units
    • G06F 9/4486: Formation of subprogram jump address

Definitions

  • the present invention relates to the field of computers and computer processors, and more particularly to a method and means for allowing a computer to execute instructions as they are received from an external source without first storing said instructions, and to an associated method for using that method and means to facilitate communications between computers and to allow a computer to use the available resources of another computer.
  • the predominant current usage of the present inventive direct execution method and apparatus is in the combination of multiple computers on a single microchip, wherein operating efficiency is important not only because of the desire for increased operating speed but also because of the power savings and heat reduction that are a consequence of the greater efficiency.
  • the use of multiple processors tends to create a need for communication between the processors. Indeed, there may well be a great deal of communication between the processors, such that a significant portion of time is spent in transferring instructions and data therebetween. Where the amount of such communication is significant, each additional instruction that must be executed in order to accomplish it places an incremental delay in the process which, cumulatively, can be very significant.
  • the conventional method for communicating instructions or data from one computer to another involves first storing the data or instruction in the receiving computer and then, subsequently, calling it for execution (in the case of an instruction) or for operation thereon (in the case of data).
  • A processor can go about performing its assigned task and then, when an input/output (I/O) Port/Device needs attention, as indicated by the fact that a byte has been received or a status has changed, that Port/Device sends an Interrupt Request (IRQ) to the processor.
  • When the processor receives an Interrupt Request, it finishes its current instruction, places a few things on the stack, and executes the appropriate Interrupt Service Routine (ISR), which can remove the byte from the port and place it in a buffer. Once the ISR has finished, the processor returns to where it left off. Using this method, the processor doesn't have to waste time looking to see if the I/O Device is in need of attention; rather, the device is only serviced when it needs attention.
  • a known embodiment of the present invention is a computer having its own memory such that it is capable of independent computational functions.
  • a plurality of the computers are arranged in an array.
  • In order to accomplish tasks cooperatively, the computers must pass data and/or instructions from one to another. Since all of the computers working simultaneously will typically provide much more computational power than is required by most tasks, and since whatever algorithm or method that is used to distribute the task among the several computers will almost certainly result in an uneven distribution of assignments, it is anticipated that at least some, and perhaps most, of the computers may not be actively participating in the accomplishment of the task at any given time. Therefore, it would be desirable to find a way for under-used computers to be available to assist their busier neighbors by “lending” either computational resources, memory, or both.
  • the present invention provides a means and method for a computer to execute instructions and/or act on data provided directly from another computer, rather than having to receive and then store the data and/or instructions prior to such action. It will be noted that this invention will also be useful where a computer acts as an intermediary, “passing on” instructions or data from one other computer to yet another computer.
  • One aspect of the invention described herein is that instructions and data are treated essentially identically whether their source is the internal memory of the computer or whether such instructions and data are being received from another source, such as another computer, an external communications port, or the like. This is significant because “additional” operations, such as storing the data or instructions and thereafter recalling them from internal memory, become unnecessary, thereby reducing the number of instructions required and increasing the speed of operation of the computers involved.
  • Another aspect of the described embodiment is that very small groups of instructions can be communicated to another computer, generally simultaneously, such that relatively simple operations that require repetitive iterations can be quickly and easily accomplished. This will greatly expedite the process of communication between the computers.
  • Still another aspect of the described embodiment is that, since there are a quantity of computers available to perform various tasks, and since one or more computers can be placed in a dormant state wherein they use essentially no power while awaiting an input, such computers can be assigned the task of awaiting inputs, thereby reducing or eliminating the need to “interrupt” other computers that may be accomplishing other tasks.
  • FIG. 1 is a diagrammatic view of a computer array, according to the present invention.
  • FIG. 2 is a detailed diagram showing a subset of the computers of FIG. 1 and a more detailed view of the interconnecting data buses of FIG. 1 ;
  • FIG. 3 is a block diagram depicting a general layout of one of the computers of FIGS. 1 and 2 ;
  • FIG. 4 is a diagrammatic representation of an instruction word according to the present inventive application.
  • FIG. 5 is a schematic representation of the slot sequencer 42 of FIG. 3 ;
  • FIG. 6 is a flow diagram depicting an example of a micro-loop according to the present invention.
  • FIG. 7 is a flow diagram depicting an example of the inventive method for executing instructions from a port.
  • FIG. 8 is a flow diagram depicting an example of the inventive improved method for alerting a computer.
  • FIG. 9 is a flow diagram depicting another example of an inventive method for alerting a computer.
  • FIG. 10 is a flow diagram depicting an inventive method for one computer to borrow the memory resources of a neighboring computer.
  • a known mode for carrying out the invention is an array of individual computers.
  • the array is depicted in a diagrammatic view in FIG. 1 and is designated therein by the general reference character 10 .
  • the computer array 10 has a plurality (twenty four in the example shown) of computers 12 (sometimes also referred to as “cores” or “nodes” in the example of an array). In the example shown, all of the computers 12 are located on a single die 14 . According to the present invention, each of the computers 12 is a generally independently functioning computer, as will be discussed in more detail hereinafter.
  • the computers 12 are interconnected by a plurality (the quantities of which will be discussed in more detail hereinafter) of interconnecting data buses 16 .
  • the data buses 16 are bidirectional, asynchronous, high-speed, parallel data buses, although it is within the scope of the invention that other interconnecting means might be employed for the purpose.
  • In the present embodiment of the array 10 , not only is data communication between the computers 12 asynchronous, but the individual computers 12 also operate in an internally asynchronous mode. This has been found by the inventor to provide important advantages. For example, since a clock signal does not have to be distributed throughout the computer array 10 , a great deal of power is saved. Furthermore, not having to distribute a clock signal eliminates many timing problems that could limit the size of the array 10 or cause other known difficulties. Also, the fact that the individual computers operate asynchronously saves a great deal of power, since each computer will use essentially no power when it is not executing instructions, because there is no clock running therein.
  • Such additional components include power buses, external connection pads, and other such common aspects of a microprocessor chip.
  • Computer 12 e is an example of one of the computers 12 that is not on the periphery of the array 10 . That is, computer 12 e has four orthogonally adjacent computers 12 a , 12 b , 12 c and 12 d . This grouping of computers 12 a through 12 e will be used, by way of example, hereinafter in relation to a more detailed discussion of the communications between the computers 12 of the array 10 . As can be seen in the view of FIG. 1 , interior computers such as computer 12 e will have four other computers 12 with which they can directly communicate via the buses 16 . In the following discussion, the principles discussed will apply to all of the computers 12 except that the computers 12 on the periphery of the array 10 will be in direct communication with only three or, in the case of the corner computers 12 , only two other of the computers 12 .
  • FIG. 2 is a more detailed view of a portion of FIG. 1 showing only some of the computers 12 and, in particular, computers 12 a through 12 e , inclusive.
  • the view of FIG. 2 also reveals that the data buses 16 each have a read line 18 , a write line 20 and a plurality (eighteen, in this example) of data lines 22 .
  • the data lines 22 are capable of transferring all the bits of one eighteen-bit instruction word generally simultaneously in parallel.
  • some of the computers 12 are mirror images of adjacent computers. However, whether the computers 12 are all oriented identically or as mirror images of adjacent computers is not an aspect of this presently described invention. Therefore, in order to better describe this invention, this potential complication will not be discussed further herein.
  • a computer 12 such as the computer 12 e can set high one, two, three or all four of its read lines 18 such that it is prepared to receive data from the respective one, two, three or all four adjacent computers 12 .
  • It is also possible for a computer 12 to set one, two, three or all four of its write lines 20 high.
  • the receiving computer may try to set the write line 20 low slightly before the sending computer 12 releases (stops pulling high) its write line 20 . In such an instance, as soon as the sending computer 12 releases its write line 20 the write line 20 will be pulled low by the receiving computer 12 e.
  • computer 12 e was described as setting one or more of its read lines 18 high before an adjacent computer (selected from one or more of the computers 12 a , 12 b , 12 c or 12 d ) has set its write line 20 high.
  • However, this process can certainly occur in the opposite order. For example, if the computer 12 e were attempting to write to the computer 12 a , then computer 12 e would set the write line 20 between computer 12 e and computer 12 a to high. If the read line 18 between computer 12 e and computer 12 a has then not already been set to high by computer 12 a , then computer 12 e will simply wait until computer 12 a does set that read line 18 high.
  • In either case, the receiving computer 12 sets both the read line 18 and the write line 20 between the two computers ( 12 e and 12 a in this example) to low as soon as the sending computer 12 e releases the write line 20 .
  • the computers 12 there may be several potential means and/or methods to cause the computers 12 to function as described.
  • the computers 12 so behave simply because they are operating generally asynchronously internally (in addition to transferring data there-between in the asynchronous manner described). That is, instructions are generally completed sequentially. When either a write or read instruction occurs, there can be no further action until that instruction is completed (or, perhaps alternatively, until it is aborted, as by a “reset” or the like). There is no regular clock pulse, in the prior art sense.
  • a pulse is generated to accomplish a next instruction only when the instruction being executed either is not a read or write type instruction (given that a read or write type instruction would require completion, often by another entity) or else when the read or write type operation is, in fact, completed.
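  • By way of illustration only, the following Python sketch models the read/write-line handshake described above for two computers 12 sharing one data bus 16 . The threading construct, the class and the variable names are assumptions made for this sketch and are not taken from the patent; the point illustrated is that the sender and the receiver each simply stop (in the hardware, “go to sleep”) until both lines have been pulled low, which serves as the acknowledge condition.
      import threading

      class Bus:
          """Illustrative model of one interconnecting data bus 16 (names assumed)."""
          def __init__(self):
              self.read_line = False     # read line 18, pulled high by the receiver
              self.write_line = False    # write line 20, pulled high by the sender
              self.data = None           # stands in for the eighteen data lines 22
              self.cond = threading.Condition()

      def write(bus, word):
          """Sender: raise the write line, then 'sleep' until the transfer completes."""
          with bus.cond:
              bus.data = word & 0x3FFFF  # one eighteen-bit word
              bus.write_line = True
              bus.cond.notify_all()
              # wait (using essentially no power, in the hardware analogy) until both lines are low
              bus.cond.wait_for(lambda: not bus.write_line and not bus.read_line)

      def read(bus):
          """Receiver: raise the read line, wait for a writer, then pull both lines low."""
          with bus.cond:
              bus.read_line = True
              bus.cond.notify_all()
              bus.cond.wait_for(lambda: bus.write_line)  # sleep until a neighbor writes
              word = bus.data
              bus.read_line = False                      # acknowledge condition:
              bus.write_line = False                     # both lines pulled low
              bus.cond.notify_all()
          return word

      bus = Bus()
      sender = threading.Thread(target=write, args=(bus, 0x2A5A5))
      sender.start()
      print(hex(read(bus)))   # prints 0x2a5a5 once the handshake completes
      sender.join()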
  • FIG. 3 is a block diagram depicting the general layout of an example of one of the computers 12 of FIGS. 1 and 2 .
  • each of the computers 12 is a generally self contained computer having its own RAM 24 and ROM 26 .
  • the computers 12 are also sometimes referred to as individual “nodes”, given that they are, in the present example, combined on a single chip.
  • Other basic components of the computer 12 include a return stack 28 (including an R register 29 , discussed hereinafter), an instruction area 30 , an arithmetic logic unit (“ALU” or “processor”) 32 , a data stack 34 and a decode logic section 36 for decoding instructions.
  • the computers 12 are dual stack computers having the data stack 34 and the separate return stack 28 .
  • the computer 12 has four communication ports 38 for communicating with adjacent computers 12 .
  • the communication ports 38 are tri-state drivers, having an off status, a receive status (for driving signals into the computer 12 ) and a send status (for driving signals out of the computer 12 ).
  • If the particular computer 12 is not on the interior of the array ( FIG. 1 ), unlike the example of computer 12 e , then one or more of the communication ports 38 will not be used in that particular computer, at least for the purposes described above.
  • those communication ports 38 that do abut the edge of the die 14 can have additional circuitry, either designed into such computer 12 or else external to the computer 12 but associated therewith, to cause such communication port 38 to act as an external I/O port 39 ( FIG. 1 ).
  • Examples of such external I/O ports 39 include, but are not limited to, USB (universal serial bus) ports, RS232 serial bus ports, parallel communications ports, analog to digital and/or digital to analog conversion ports, and many other possible variations. No matter what type of additional or modified circuitry is employed for this purpose, according to the presently described embodiment of the invention the method of operation of the “external” I/O ports 39 regarding the handling of instructions and/or data received therefrom will be similar to that described herein in relation to the “internal” communication ports 38 .
  • an “edge” computer 12 f is depicted with associated interface circuitry 80 (shown in block diagrammatic form) for communicating through an external I/O port 39 with an external device 82 .
  • the instruction area 30 includes a number of registers 40 including, in this example, an A register 40 a , a B register 40 b and a P register 40 c .
  • the A register 40 a is a full eighteen-bit register
  • the B register 40 b and the P register 40 c are nine-bit registers.
  • the present computer 12 is implemented to execute native Forth language instructions.
  • Forth “words” are constructed from the native processor instructions designed into the computer.
  • the collection of Forth words is known as a “dictionary”. In other languages, this might be known as a “library”.
  • the computer 12 reads eighteen bits at a time from RAM 24 , ROM 26 or directly from one of the data buses 16 ( FIG. 2 ).
  • Since in Forth most instructions (known as operand-less instructions) obtain their operands directly from the stacks 28 and 34 , they are generally only five bits in length, such that up to four instructions can be included in a single eighteen-bit instruction word, with the condition that the last instruction in the group is selected from a limited set of instructions that require only three bits. (In the described embodiment, the two least significant bits of an instruction in the last position are assumed to be “01”.) Also depicted in block diagrammatic form in the view of FIG. 3 is a slot sequencer 42 .
  • The data stack 34 is a last-in-first-out stack for parameters to be manipulated by the ALU 32 .
  • The return stack 28 is a last-in-first-out stack for nested return addresses used by CALL and RETURN instructions.
  • the return stack 28 is also used by PUSH, POP and NEXT instructions, as will be discussed in some greater detail, hereinafter.
  • the data stack 34 and the return stack 28 are not arrays in memory accessed by a stack pointer, as in many prior art computers. Rather, the stacks 34 and 28 are an array of registers.
  • the top two registers in the data stack 34 are a T register 44 and an S register 46 .
  • the remainder of the data stack 34 has a circular register array 34 a having eight additional hardware registers therein numbered, in this example S 2 through S 9 .
  • One of the eight registers in the circular register array 34 a will be selected as the register below the S register 46 at any time.
  • the value in the shift register that selects the stack register to be below S cannot be read or written by software.
  • the top position in the return stack 28 is the dedicated R register 29
  • the remainder of the return stack 28 has a circular register array 28 a having twelve additional hardware registers therein (not specifically shown in the drawing) that are numbered, in this example R 1 through R 11 .
  • the software can simply assume that a stack 28 or 34 is ‘empty’ at any time. There is no need to clear old items from the stack as they will be pushed down towards the bottom where they will be lost as the stack fills. So there is nothing to initialize for a program to assume that the stack is empty.
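  • The following Python sketch, offered only as an informal illustration (the class and method names are assumptions, not the patent's design), models the data stack 34 as a T register 44 , an S register 46 and an eight-register circular array 34 a selected by a small pointer; pushing past the bottom silently overwrites the oldest value and popping an “empty” stack merely re-reads stale values, so nothing needs to be initialized or cleared.
      class DataStack:
          """Illustrative model of the data stack 34 (class and names are assumed)."""
          DEPTH = 8                      # circular register array 34a, S2 through S9

          def __init__(self):
              self.T = 0                 # T register 44, top of the stack
              self.S = 0                 # S register 46, second on the stack
              self.ring = [0] * self.DEPTH
              self.ptr = 0               # selects which ring register sits below S

          def push(self, value):
              # Everything moves down one position; the bottom value is simply lost.
              self.ptr = (self.ptr - 1) % self.DEPTH
              self.ring[self.ptr] = self.S
              self.S = self.T
              self.T = value & 0x3FFFF   # eighteen-bit values

          def pop(self):
              # Everything moves up one position; an over-popped stack just returns
              # stale values from the circular array instead of faulting.
              value = self.T
              self.T = self.S
              self.S = self.ring[self.ptr]
              self.ptr = (self.ptr + 1) % self.DEPTH
              return value

      stack = DataStack()
      for n in (1, 2, 3):
          stack.push(n)
      print(stack.pop(), stack.pop(), stack.pop())   # 3 2 1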
  • the instruction area 30 also has an 18 bit instruction register 30 a for storing the instruction word 48 that is presently being used, and an additional 5 bit opcode register 30 b for the instruction presently being executed.
  • FIG. 4 is a diagrammatic representation of an instruction word 48 .
  • the instruction word 48 can actually contain instructions, data, or some combination thereof.
  • the instruction word 48 consists of eighteen bits 50 . This being a binary computer, each of the bits 50 will be a ‘1’ or a ‘0’.
  • the eighteen-bit wide instruction word 48 can contain up to four instructions 52 in four slots 54 called slot zero 54 a , slot one 54 b , slot two 54 c and slot three 54 d .
  • the eighteen-bit instruction words 48 are always read as a whole.
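  • A minimal Python sketch of this slot packing is given below, assuming the layout described above (three five-bit slots followed by a three-bit slot whose two least significant bits are implied to be “01”); the particular bit ordering and the example opcode values are assumptions made purely for illustration.
      def unpack_slots(word):
          """Split an eighteen-bit instruction word 48 into its four slot opcodes."""
          word &= 0x3FFFF
          slot0 = (word >> 13) & 0x1F          # slot zero 54a: full five-bit opcode
          slot1 = (word >> 8) & 0x1F           # slot one 54b
          slot2 = (word >> 3) & 0x1F           # slot two 54c
          slot3 = ((word & 0x7) << 2) | 0b01   # slot three 54d: three bits, low bits implied
          return slot0, slot1, slot2, slot3

      def pack_slots(slot0, slot1, slot2, slot3):
          """Inverse operation: build one instruction word from four opcodes."""
          assert slot3 & 0b11 == 0b01, "slot-three opcodes must end in the implied bits"
          return ((slot0 & 0x1F) << 13) | ((slot1 & 0x1F) << 8) | \
                 ((slot2 & 0x1F) << 3) | ((slot3 >> 2) & 0x7)

      word = pack_slots(0b10101, 0b00010, 0b11000, 0b11101)
      print([bin(s) for s in unpack_slots(word)])   # the same four opcodes back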
  • FIG. 5 is a schematic representation of the slot sequencer 42 of FIG. 3 .
  • the slot sequencer 42 has a plurality (fourteen in this example) of inverters 56 and one NAND gate 58 arranged in a ring, such that a signal is inverted an odd number of times as it travels through the fourteen inverters 56 and the NAND gate 58 .
  • a signal is initiated in the slot sequencer 42 when either of the two inputs to an OR gate 60 goes high.
  • a first OR gate input 62 is derived from a bit i 4 66 ( FIG. 4 ) of the instruction 52 being executed. If bit i 4 is high then that particular instruction 52 is an ALU instruction, and the i4 bit 66 is ‘1’. When the i4 bit is ‘1’, then the first OR gate input 62 is high, and the slot sequencer 42 is triggered to initiate a pulse that will cause the execution of the next instruction 52 .
  • a signal will travel around the slot sequencer 42 twice, producing an output at a slot sequencer output 68 each time.
  • the relatively wide output from the slot sequencer output 68 is provided to a pulse generator 70 (shown in block diagrammatic form) that produces a narrow timing pulse as an output.
  • a pulse generator 70 shown in block diagrammatic form
  • Where the instruction 52 being executed is a read or write type instruction, the i4 bit 66 is ‘0’ (low) and the first OR gate input 62 is, therefore, also low.
  • the timing of events in a device such as the computers 12 is generally quite critical, and this is no exception.
  • the output from the OR gate 60 must remain high until after the signal has circulated past the NAND gate 58 in order to initiate the second “lap” of the ring. Thereafter, the output from the OR gate 60 will go low during that second “lap” in order to prevent unwanted continued oscillation of the circuit.
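  • Functionally, the trigger condition described above for the slot sequencer 42 can be summarized by the small Python sketch below; it models only the decision made at the OR gate 60 (proceed when the i4 bit 66 is ‘1’, or when the acknowledge condition has occurred) and is not a representation of the actual ring-oscillator circuit.
      def next_slot_enabled(i4_bit, acknowledge):
          """Model of the OR gate 60 feeding the slot sequencer 42: a timing pulse
          for the next instruction 52 is produced when the current instruction
          needs no input or output (i4 bit 66 is '1'), or when the awaited read
          or write has completed (acknowledge line 72 is high)."""
          return bool(i4_bit) or bool(acknowledge)

      assert next_slot_enabled(i4_bit=1, acknowledge=0)        # ALU-type: proceed at once
      assert not next_slot_enabled(i4_bit=0, acknowledge=0)    # read/write: stay asleep
      assert next_slot_enabled(i4_bit=0, acknowledge=1)        # transfer done: proceed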
  • The i4 bit 66 of each instruction 52 is set according to whether or not that instruction is a read or write type of instruction, as opposed to that instruction being one that requires no input or output.
  • the remaining bits 50 in the instruction 52 provide the remainder of the particular opcode for that instruction.
  • one or more of the bits may be used to indicate where data is to be read from, or written to, in that particular computer 12 .
  • data to be written always comes from the T register 44 (the top of the data stack 34 ), however data can be selectively read into either the T register 44 or else the instruction area 30 from where it can be executed. That is because, in this particular embodiment of the invention, either data or instructions can be communicated in the manner described herein and instructions can, therefore, be executed directly from the data bus 16 .
  • One or more of the bits 50 will be used to indicate which of the ports 38 , if any, is to be set to read or write. This latter operation is optionally accomplished by using one or more bits to designate a register 40 , such as the A register 40 a , the B register 40 b , or the like.
  • the designated register 40 will be preloaded with data having a bit corresponding to each of the ports 38 (and, also, any other potential entity with which the computer 12 may be attempting to communicate, such as memory (RAM 24 or ROM 26 ), an external communications port 39 , or the like.)
  • each of four bits in the particular register 40 can correspond to each of the up port 38 a , the right port 38 b , the left port 38 c or the down port 38 d . In such case, where there is a ‘1’ at any of those bit locations, communication will be set to proceed through the corresponding port 38 .
  • It is anticipated that a read opcode might set more than one port 38 for communication in a single instruction while, although it is possible, it is not anticipated that a write opcode will set more than one port 38 for communication in a single instruction.
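  • As an illustration of this port-selection scheme, the Python sketch below decodes a preloaded register 40 value into the selected ports 38 ; which bit position corresponds to which port is not specified in the description above, so the assignments used here are purely hypothetical.
      # Hypothetical bit assignments; the description does not state which bit of the
      # preloaded register 40 corresponds to which communication port 38.
      UP, RIGHT, LEFT, DOWN = 0x1, 0x2, 0x4, 0x8
      PORT_NAMES = {UP: "up 38a", RIGHT: "right 38b", LEFT: "left 38c", DOWN: "down 38d"}

      def selected_ports(register_value):
          """Return the ports 38 selected by the bits of a preloaded register 40."""
          return [name for bit, name in PORT_NAMES.items() if register_value & bit]

      print(selected_ports(UP | LEFT))   # a read might select several ports at once
      print(selected_ports(RIGHT))       # a write would normally select just one port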
  • In the case of a read or write type instruction, the opcode of the instruction 52 will have a ‘0’ at bit position i4 66 , and so the first OR gate input 62 of the OR gate 60 is low, and so the slot sequencer 42 is not triggered to generate an enabling pulse.
  • When both the read line 18 and the corresponding write line 20 between computers 12 e and 12 c are high, then both lines 18 and 20 will be released by each of the respective computers 12 that is holding it high.
  • (In this example, the sending computer 12 e will be holding the write line 20 high while the receiving computer 12 c will be holding the read line 18 high.)
  • the receiving computer 12 c will pull both lines 18 and 20 low.
  • The receiving computer 12 c may attempt to pull the lines 18 and 20 low before the sending computer 12 e has released the write line 20 .
  • any attempt to pull a line 18 or 20 low will not actually succeed until that line 18 or 20 is released by the computer 12 that is holding it high.
  • each of the computers 12 e and 12 c will, upon the acknowledge condition, set its own internal acknowledge line 72 high.
  • the acknowledge line 72 provides the second OR gate input 64 . Since an input to either of the OR gate 60 inputs 62 or 64 will cause the output of the OR gate 60 to go high, this will initiate operation of the slot sequencer 42 in the manner previously described herein, such that the instruction 52 in the next slot 54 of the instruction word 48 will be executed.
  • the acknowledge line 72 stays high until the next instruction 52 is decoded, in order to prevent spurious addresses from reaching the address bus.
  • the computer 12 will fetch the next awaiting eighteen-bit instruction word 48 unless, of course, bit i 4 66 is a ‘0’ or, also, unless the instruction in slot three is a “next” instruction, which will be discussed in more detail hereinafter.
  • the present inventive mechanism includes a method and apparatus for “prefetching” instructions such that the fetch can begin before the end of the execution of all instructions 52 in the instruction word 48 .
  • this also is not a necessary aspect of the presently described invention.
  • the inventor believes that a key feature for enabling efficient asynchronous communications between devices is some sort of acknowledge signal or condition.
  • In the prior art, most communication between devices has been clocked and there is no direct way for a sending device to know that the receiving device has properly received the data. Methods such as checksum operations may have been used to attempt to ensure that data is correctly received, but the sending device has no direct indication that the operation is completed.
  • the present inventive method provides the necessary acknowledge condition that allows, or at least makes practical, asynchronous communications between the devices. Furthermore, the acknowledge condition also makes it possible for one or more of the devices to “go to sleep” until the acknowledge condition occurs.
  • an acknowledge condition could be communicated between the computers 12 by a separate signal being sent between the computers 12 (either over the interconnecting data bus 16 or over a separate signal line), and such an acknowledge signal would be within the scope of this aspect of the present invention.
  • the method for acknowledgement does not require any additional signal, clock cycle, timing pulse, or any such resource beyond that described, to actually effect the communication.
  • FIG. 6 is a diagrammatic representation of a micro-loop 100 .
  • The micro-loop 100 , not unlike other prior art loops, has a FOR instruction 102 and a NEXT instruction 104 . Since an instruction word 48 ( FIG. 4 ) contains as many as four instructions 52 , a single instruction word 48 can also include three operation instructions 106 .
  • the operation instructions 106 can be essentially any of the available instructions that a programmer might want to include in the micro-loop 100 .
  • a typical example of a micro-loop 100 that might be transmitted from one computer 12 to another might be a set of instructions for reading from, or writing to the RAM 24 of the second computer 12 , such that the first computer 12 could “borrow” available RAM 24 capacity.
  • the FOR instruction 102 pushes a value onto the return stack 28 representing the number of iterations desired. That is, the value on the T register 44 at the top of the data stack 34 is PUSHed into the R register 29 of the return stack 28 .
  • The FOR instruction 102 , while often located in slot three 54 d of an instruction word 48 , can, in fact, be located in any slot 54 . Where the FOR instruction 102 is not located in slot three 54 d , then the remaining instructions 52 in that instruction word 48 will be executed before going on to the micro-loop 100 , which will generally be the next loaded instruction word 103 that includes a NEXT instruction 104 and three operation instructions 106 .
  • The NEXT instruction 104 depicted in the view of FIG. 6 is a particular type of NEXT instruction 104 . This is because it is located in slot three 54 d ( FIG. 4 ). According to this embodiment of the invention, it is assumed that all of the data in a particular instruction word 48 that follows an “ordinary” NEXT instruction (not shown) is an address (the address where the for/next loop begins). The opcode for the NEXT instruction 104 is the same, no matter which of the four slots 54 it is in (with the obvious exception that the first two digits are assumed if it is in slot three 54 d , rather than being explicitly written, as discussed previously herein).
  • the NEXT instruction 104 in slot three 54 d is a MICRO-NEXT instruction 104 a .
  • the MICRO-NEXT instruction 104 a uses the address of the first instruction 52 , located in slot zero 54 a of the same instruction word 48 in which it is located, as the address to which to return.
  • the MICRO-NEXT instruction 104 a also takes the value from the R register 29 (which was originally PUSHed there by the FOR instruction 102 ), decrements it by 1, and then returns it to the R register 29 .
  • When the value on the R register 29 reaches a predetermined value (such as zero), then the MICRO-NEXT instruction will load the next instruction word 48 and continue on as described previously herein. However, when the MICRO-NEXT instruction 104 a reads a value from the R register 29 that is greater than the predetermined value, it will resume operation at slot zero 54 a of its own instruction word 48 and execute the three instructions 52 located in slots zero through two, inclusive, thereof. That is, a MICRO-NEXT instruction 104 a will always, in this embodiment of the invention, execute three operation instructions 106 . Because, in some instances, it may not be desired to use all three potentially available instructions 52 , a “no-op” instruction is available to fill one or two of the slots 54 , as required.
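  • The behavior of the FOR instruction 102 and the MICRO-NEXT instruction 104 a can be illustrated with the following Python sketch; it is a functional model only (the function and variable names are assumptions), showing the iteration count being pushed onto the return stack, the up-to-three operation instructions 106 being re-executed, and the count being decremented until the predetermined value is reached.
      def run_micro_loop(count, operations):
          """Functional model of a micro-loop 100: the FOR instruction 102 pushes
          'count' from the T register into the R register 29; MICRO-NEXT 104a then
          re-runs the up-to-three operation instructions 106 in slots zero through
          two, decrementing the count, until the predetermined value is reached."""
          return_stack = [count]              # FOR: PUSH the iteration count onto R
          while True:
              for op in operations:           # the operation instructions 106
                  op()
              r = return_stack.pop() - 1      # MICRO-NEXT decrements the R value
              if r <= 0:                      # predetermined value reached:
                  break                       # fall through to the next instruction word
              return_stack.append(r)          # otherwise resume at slot zero

      log = []
      run_micro_loop(4, [lambda: log.append(len(log))])
      print(log)   # [0, 1, 2, 3] -- the loop body ran four times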
  • micro-loops 100 can be used entirely within a single computer 12 . Indeed, the entire set of available machine language instructions is available for use as the operation instructions 106 , and the application and use of micro-loops is limited only by the imagination of the programmer. However, when the ability to execute an entire micro-loop 100 within a single instruction word 48 is combined with the ability to allow a computer 12 to send the instruction word 48 to a neighbor computer 12 to execute the instructions 52 therein essentially directly from the data bus 16 , this provides a powerful tool for allowing a computer 12 to utilize the resources of its neighbors.
  • the small micro-loop 100 can be communicated between computers 12 , as described herein, and it can be executed directly from the communications port 38 of the receiving computer 12 , just like any other set of instructions contained in an instruction word 48 , as described herein. While there are many uses for this sort of “micro-loop” 100 , a typical use would be where one computer 12 wants to store some data onto the memory of a neighbor computer 12 . It could, for example, first send an instruction to that neighbor computer telling it to store an incoming data word to a particular memory address, then increment that address, then repeat for a given number of iterations (the number of data words to be transmitted). To read the data back, the first computer would just instruct the second computer (the one used for storage here) to write the stored data back to the first computer, using a similar micro-loop.
  • a computer 12 can use an otherwise resting neighbor computer 12 for storage of excess data when the data storage need exceeds the relatively small capacity built into each individual computer 12 . While this example has been described in terms of data storage, the same technique can equally be used to allow a computer 12 to have its neighbor share its computational resources—by creating a micro-loop 100 that causes the other computer 12 to perform some operations, store the result, and repeat a given number of times. As can be appreciated, the number of ways in which this inventive micro-loop 100 structure can be used is nearly infinite.
  • either data or instructions can be communicated in the manner described herein and instructions can, therefore, be executed essentially directly from the data bus 16 . That is, there is no need to store instructions to RAM 24 and then recall them before execution. Instead, according to this aspect of the invention, an instruction word 48 that is received on a communications port 38 is not treated essentially differently than it would be were it recalled from RAM 24 or ROM 26 . While this lack of a difference is revealed in the prior discussion, herein, concerning the described operation of the computers 12 , the following more specific discussion of how instruction words 48 are fetched and used will aid in the understanding of the invention.
  • the FETCH instruction uses the address on the A register 40 a to determine from where to fetch an 18 bit word. Of course, the program will have to have already provided for placing the correct address on the A register 40 a .
  • the A register 40 a is an 18 bit register, such that there is a sufficient range of address data available that any of the potential sources from which a fetch can occur can be differentiated. That is, there is a range of addresses assigned to ROM, a different range of addresses assigned to RAM, and there are specific addresses for each of the ports 38 and for the external I/O port 39 .
  • a FETCH instruction always places the 18 bits that it fetches on the T register 44 .
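  • The address decoding implied by this arrangement can be sketched as follows in Python; the specific numeric ranges and port addresses used here are invented for illustration, since the description above states only that ROM 26 , RAM 24 , the ports 38 and the external I/O port 39 occupy distinguishable address ranges.
      # Hypothetical address map: the description states only that ROM 26, RAM 24,
      # the communication ports 38 and the external I/O port 39 occupy distinct,
      # distinguishable ranges within the eighteen-bit A register 40a.
      ROM_BASE, RAM_BASE = 0x00000, 0x00080
      PORT_ADDRESSES = {0x00100: "right 38b", 0x00120: "down 38d",
                        0x00140: "left 38c", 0x00180: "up 38a", 0x001C0: "external I/O 39"}

      def fetch_source(address):
          """Classify the source a FETCH through the A register 40a would read from."""
          if address in PORT_ADDRESSES:
              return "port " + PORT_ADDRESSES[address]
          return "RAM 24" if address >= RAM_BASE else "ROM 26"

      print(fetch_source(0x00010))   # ROM 26
      print(fetch_source(0x000A0))   # RAM 24
      print(fetch_source(0x00140))   # port left 38c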
  • executable instructions are temporarily stored in the instruction register 30 a .
  • the computer will automatically fetch the “next” instruction word 48 .
  • The address from which the next instruction word 48 is to be fetched is determined by the “program counter”, the P register 40 c .
  • the P register 40 c is often automatically incremented, as is the case where a sequence of instruction words 48 is to be fetched from RAM 24 or ROM 26 .
  • a JUMP or CALL instruction will cause the P register 40 c to be loaded with the address designated by the data in the remainder of the presently loaded instruction word 48 after the JUMP or CALL instruction, rather than being incremented.
  • If the P register 40 c is then loaded with an address corresponding to one or more of the ports 38 , then the next instruction word 48 will be loaded into the instruction register 30 a from the ports 38 .
  • the P register 40 c also does not increment when an instruction word 48 has just been retrieved from a port 38 into the instruction register 30 a . Rather, it will continue to retain that same port address until a specific JUMP or CALL instruction is executed to change the P register 40 c .
  • the computer 12 knows that the next eighteen bits fetched is to be placed in the instruction register 30 a when there are no more executable instructions left in the present instruction word 48 .
  • there are no more executable instructions left in the present instruction word 48 after a JUMP or CALL instruction (or also after certain other instructions that will not be specifically discussed here) because, by definition, the remainder of the 18 bit instruction word following a JUMP or CALL instruction is dedicated to the address referred to by the JUMP or CALL instruction.
  • Another way of stating this is that the above described processes are unique in many ways, including but not limited to the fact that a JUMP or CALL instruction can, optionally, be to a port 38 , rather than to just a memory address, or the like.
  • the computer 12 can look for its next instruction from one port 38 or from any of a group of the ports 38 . Therefore, addresses are provided to correspond to various combinations of the ports 38 .
  • If a computer is told to fetch an instruction from a group of ports 38 , then it will accept the first available instruction word 48 from any of the selected ports 38 . If no neighbor computer 12 has already attempted to write to any of those ports 38 , then the computer 12 in question will “go to sleep”, as described in detail above, until a neighbor does write to the selected port 38 .
  • FIG. 7 is a flow diagram depicting an example of the above described direct execution method 120 .
  • a “normal” flow of operations will commence when, as discussed previously herein, there are no more executable instructions left in the instruction register 30 a .
  • the computer 12 will “fetch” another instruction word (note that the term “fetch” is used here in a general sense, in that an actual FETCH instruction is not used), as indicated by a “fetch word” operation 122 . That operation will be accomplished according to the address in the P register 40 c (as indicated by an “address” decision operation 124 in the flow diagram of FIG. 7 ).
  • If the address in the P register 40 c is a memory address ( RAM 24 or ROM 26 ), then the next instruction word 48 will be retrieved from the designated memory location in a “fetch from memory” operation 126 . If, on the other hand, the address in the P register 40 c is that of a port 38 or ports 38 (not a memory address), then the next instruction word 48 will be retrieved from the designated port location in a “fetch from port” operation 128 . In either case, the instruction word 48 being retrieved is placed in the instruction register 30 a in a “retrieve instruction word” operation 130 . In an “execute instruction word” operation 132 , the instructions in the slots 54 of the instruction word 48 are accomplished sequentially, as described previously herein.
  • In a “jump” decision operation 134 , it is determined if one of the operations in the instruction word 48 is a JUMP instruction, or other instruction that would divert operation away from the continued “normal” progression as discussed previously herein. If yes, then the address provided in the instruction word 48 after the JUMP (or other such) instruction is provided to the P register 40 c in a “load P register” operation 136 , and the sequence begins again in the “fetch word” operation 122 , as indicated in the diagram of FIG. 7 . If no, then the next action depends upon whether the last instruction fetch was from a port 38 or from a memory address, as indicated in a “port address” decision operation 138 .
  • If the last instruction fetch was from a port 38 , then no change is made to the P register 40 c and the sequence is repeated starting with the “fetch word” operation 122 . If, on the other hand, the last instruction fetch was from a memory address ( RAM 24 or ROM 26 ), then the address in the P register 40 c is incremented, as indicated by an “increment P register” operation 140 in FIG. 7 , before the “fetch word” operation 122 is accomplished.
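  • For illustration, the FIG. 7 flow can be summarized by the Python sketch below; the machine interface assumed here (the DemoMachine class, its p attribute and its fetch, address-test and execute methods) is hypothetical and merely stands in for the hardware behavior described above.
      class DemoMachine:
          """Hypothetical stand-in for one computer 12: a tiny memory image, a port
          that supplies two words, and an execute() that reports any JUMP target."""
          def __init__(self):
              self.p = 0                                     # the P register 40c
              self.memory = {0: "word A", 1: "word B", 2: "jump-to-port"}
              self.port_words = ["word from neighbor", "halt"]
              self.halted = False

          def is_port_address(self, address):
              return address >= 0x100                        # assumed port address range

          def fetch_memory(self, address):
              return self.memory[address]

          def fetch_port(self, address):
              return self.port_words.pop(0)

          def execute(self, word):
              print("executing:", word)
              if word == "halt":
                  self.halted = True
              return 0x100 if word == "jump-to-port" else None   # JUMP target, if any

      def run(machine):
          """Walk through the direct execution method 120 of FIG. 7."""
          while not machine.halted:
              from_port = machine.is_port_address(machine.p)       # "address" decision 124
              word = (machine.fetch_port(machine.p) if from_port   # "fetch from port" 128
                      else machine.fetch_memory(machine.p))        # "fetch from memory" 126
              target = machine.execute(word)                       # "execute instruction word" 132
              if target is not None:                               # "jump" decision 134
                  machine.p = target                               # "load P register" 136
              elif not from_port:                                  # "port address" decision 138
                  machine.p += 1                                   # "increment P register" 140
              # after a port fetch, the P register keeps the same port address

      run(DemoMachine())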
  • FIG. 8 is a flow diagram depicting an example of the inventive improved method for alerting a computer.
  • the computers 12 of the embodiment described will “go to sleep” while awaiting an input. Such an input can be from a neighboring computer 12 , as in the embodiment described in relation to FIGS. 1 through 5 .
  • the computers 12 that have communication ports 38 that abut the edge of the die 14 can have additional circuitry, either designed into such computer 12 or else external to the computer 12 but associated therewith, to cause such communication port 38 to act as an external I/O port 39 .
  • the inventive combination can provide the additional advantage that the “sleeping” computer 12 can be poised and ready to awaken and spring into some prescribed action when an input is received. Therefore, this invention also provides an alternative to the use of interrupts to handle inputs, whether such inputs come from an external input device, or from another computer 12 in the array 10 .
  • the inventive combination described herein will allow for a computer 12 to be in an “asleep but alert” state, as described above. Therefore, one or more computers 12 can be assigned to receive and act upon certain inputs. While there are numerous ways in which this feature might be used, an example that will serve to illustrate just one such “computer alert method” is illustrated in the view of FIG. 8 and is enumerated therein by the reference character 150 .
  • As can be seen in the view of FIG. 8 , in an “enter alert state” operation 152 , a computer 12 is caused to “go to sleep” such that it is awaiting input from a neighbor computer 12 , or more than one (as many as all four) neighbor computers or, in the case of an “edge” computer 12 , an external input, or some combination of external inputs and/or inputs from a neighbor computer 12 .
  • a computer 12 can “go to sleep” awaiting completion of either a read or a write operation.
  • Where the waiting computer 12 is being used, as described in this example, to await some possible “input”, then it would be natural to assume that the waiting computer has set its read line 18 high awaiting a “write” from the neighbor or outside source. Indeed, it is presently anticipated that this will be the usual condition. However, it is within the scope of the invention that the waiting computer 12 will have set its write line 20 high and, therefore, that it will be awakened when the neighbor or outside source “reads” from it.
  • In an “awaken” operation 154 , the sleeping computer 12 is caused to resume operation because the neighboring computer 12 or external device 82 has completed the transaction being awaited. If the transaction being awaited was the receipt of an instruction word 48 to be executed, then the computer 12 will proceed to execute the instructions therein. If the transaction being awaited was the receipt of data, then the computer 12 will proceed to execute the next instruction in queue, which will be either the instruction in the next slot 54 in the present instruction word 48 , or else the next instruction word 48 will be loaded and the next instruction will be in slot zero 54 a of that next instruction word 48 . In any case, while being used in the described manner, that next instruction will begin a sequence of one or more instructions for handling the input just received.
  • Options for handling such input can include reacting to perform some predefined function internally, communicating with one or more of the other computers 12 in the array 10 , or even ignoring the input (just as conventional prior art interrupts may be ignored under prescribed conditions).
  • the options are depicted in the view of FIG. 8 as an “act on input” operation 156 . It should be noted that, in some instances, the content of the input may not be important. In some cases, for example, it may be only the very fact that an external device has attempted communication that is of interest.
  • Where the computer 12 is assigned the task of acting as an “alert” computer, in the manner depicted in FIG. 8 , then it will generally return to the “asleep but alert” status, as indicated in FIG. 8 .
  • the option is always open to assign the computer 12 some other task, such as when it is no longer necessary to monitor the particular input or inputs there being monitored, or when it is more convenient to transfer that task to some other of the computers 12 in the array.
  • When a computer 12 has one or more of its read lines 18 (or a write line 20 ) set high, it can be said to be in an “alert” condition. In the alert condition, the computer 12 is ready to immediately execute any instruction sent to it on the data bus 16 corresponding to the read line or lines 18 that are set high or, alternatively, to act on data that is transferred over the data bus 16 . Where there is an array of computers 12 available, one or more can be used, at any given time, to be in the above described alert condition such that any of a prescribed set of inputs will trigger it into action.
  • The described alert condition could be embodied in a computer even if it were not “asleep”.
  • the described alert condition can be used in essentially any situation where a conventional prior art interrupt (either a hardware interrupt or a software interrupt) might have otherwise been used.
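  • A rough software analogy of this “asleep but alert” behavior, offered only as an illustration, is sketched below in Python; the queue stands in for a communication port 38 and the handler table for whatever prescribed action the programmer has assigned, neither of which is part of the patent's description.
      import queue, threading

      def alert_node(port, handlers):
          """Illustrative 'asleep but alert' loop: the node blocks on its selected
          port and dispatches whatever arrives, instead of being interrupted. The
          queue stands in for a communication port 38 (an assumption)."""
          while True:
              kind, payload = port.get()          # "go to sleep" until a neighbor writes
              if kind == "halt":
                  break
              handlers.get(kind, lambda p: None)(payload)   # "act on input" 156

      port = queue.Queue()
      node = threading.Thread(target=alert_node,
                              args=(port, {"data": lambda p: print("stored", p)}))
      node.start()
      port.put(("data", 42))        # a neighbor or external device 82 writes to the port
      port.put(("halt", None))      # shut the demo down
      node.join()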
  • FIG. 9 is another example of a computer alert method 150 a .
  • This is but one example wherein interaction between a monitoring computer 12 f ( FIG. 1 ) and another computer 12 g ( FIG. 1 ) that is assigned to some other task may be desirable or necessary.
  • In the example of FIG. 9 , the “enter alert state” operation 152 , the “awaken” operation 154 and the “act on input” operation 156 are each accomplished as described previously herein in relation to the first example of the computer alert method 150 .
  • Thereafter, the computer 12 f enters a “send info?” decision operation 158 wherein, according to its programming, it is determined if the input just received requires the attention of the other computer 12 g . If no, then the computer 12 f returns to alert status, or some other alternative such as was discussed previously herein.
  • If yes, then the computer 12 f initiates communication with the computer 12 g , as described in detail previously herein, in a “send to other” operation 160 .
  • the computer 12 f could be sending instructions such as it may have generated internally in response to the input from the external device 82 or such as it may have received from the external device 82 .
  • the computer 12 f could pass on data to the computer 12 g and such data could be internally generated in computer 12 f or else “passed through” from the external device 82 .
  • the computer 12 f in some situations, might attempt to read from the computer 12 g when it receives an input from the external device 82 . All of these opportunities are available to the programmer.
  • the computer 12 g is generally executing code to accomplish its assigned primary task, whatever that might be, as indicated in an “execute primary function” operation 162 .
  • the programmer will have provided that the computer 12 g occasionally pause to see if one or more of its neighbors has attempted a communication, as indicated in a “look for input” operation 166 .
  • It is determined in an “input?” decision operation 168 whether there is a communication waiting (as, for example, if the computer 12 f has already initiated a write to the computer 12 g).
  • the computer 12 g will complete the communication, as described in detail previously herein, in a “receive from other” operation 170 . If no, then the computer 12 g will return to the execution of its primary function 162 , as shown in FIG. 9 . After the “receive from other” operation 170 , the computer 12 g will act on the input received in an “act on input” operation 172 . As mentioned above, the programmer could have provided that the computer 12 g would be expecting instructions as an input, in which case the computer 12 g would execute the instructions as described previously herein. Alternatively, the computer 12 g might be programmed to be expecting data to act upon.
  • a given computer 12 need not be interrupted while it is performing a task because another computer 12 is assigned the task of monitoring and handling inputs that might otherwise require an interrupt.
  • the computer 12 that is busy handling another task also cannot be disturbed unless and until its programming provides that it look to its ports 38 for input. Therefore, it will sometimes be desirable to cause the computer 12 to pause to look for other inputs. It is important to realize that what is being described here is an example of a paradigm in computing that might be described as “cooperative multi-tasking”, wherein tasks that might formerly have been accomplished by a single processor are divided, in new and interesting ways, among several processors (a behavioral sketch of this pattern follows this list).
  • FIG. 10 shows a flowchart summarizing one method 200 according to the present invention where one computer 12 (e.g., computer 12 e) uses a micro-loop 100, such as that shown in FIG. 6, to borrow the memory resources of another computer 12 (e.g., computer 12 b).
  • computer 12 b pushes a count value onto its return stack 28 .
  • the computer 12 b fetches an instruction word 48 , including a group of instructions 52 (e.g., a micro-loop 100 ), into its instruction register.
  • computer 12 e of the computer array 10 provides the micro-loop to the computer 12 b via a data bus 16 connecting the two computers. Then, in a third step 206, the computer 12 b sequentially executes the instructions 52 in the micro-loop. In one instance, where computer 12 e wants to borrow memory space in the computer 12 b, the instructions cause the computer 12 b to store an incoming data word, asserted onto the data bus 16 by computer 12 e, to its memory, for example, in its RAM 24.
  • Alternatively, the instruction word 48 causes the computer 12 b to assert a data word stored in its memory (e.g., RAM 24) on the data bus 16 between the computer 12 b and the other computer 12 e.
  • Next, the count value on the return stack 28 is changed (e.g., decremented).
  • If the count value is greater than a predetermined value (e.g., 0), then the computer 12 b executes the micro-loop instruction word 48 in the instruction register again. If the count value is not greater than the predetermined value, then method 200 ends.
  • While specific examples of the inventive computer arrays 10, computers 12, micro-loops 100, direct execution method 120 and associated apparatus, and computer alert method 150 have been discussed herein, it is expected that there will be a great many applications for these which have not yet been envisioned. Indeed, it is one of the advantages of the present invention that the inventive method and apparatus may be adapted to a great variety of uses.
  • inventive computer arrays 10 , computers 12 , micro-loops 100 , direct execution method 120 and associated apparatus, and computer alert method 150 are intended to be widely used in a great variety of computer applications. It is expected that they will be particularly useful in applications where significant computing power is required, and yet power consumption and heat production are important considerations.
  • the applicability of the present invention is such that the sharing of information and resources between the computers in an array is greatly enhanced, both in speed and versatility. Also, communications between a computer array and other devices are enhanced according to the described method and means.
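The cooperative multi-tasking pattern described in the items above can be sketched in software. The following Python fragment is a behavioral illustration only, assuming a simple queue standing in for the data bus 16 between a monitoring computer (such as computer 12 f) and a working computer (such as computer 12 g); none of the names correspond to actual hardware or instructions.

    import queue

    port = queue.Queue()          # stand-in for the data bus between the two computers

    def monitor(external_inputs):
        # the monitoring computer: sleeps until an external input arrives, then
        # decides whether to forward it to the worker (the "send info?" decision)
        for value in external_inputs:
            if value is not None:          # "act on input"
                port.put(value)            # "send to other"

    def worker(steps):
        # the working computer: runs its primary function and occasionally pauses
        # to "look for input" instead of being interrupted
        results = []
        for step in range(steps):
            results.append(step * step)    # stand-in for the primary function
            try:
                msg = port.get_nowait()    # "input?" decision
                results.append(("handled", msg))   # "act on input"
            except queue.Empty:
                pass
        return results

    monitor([None, 7, None])
    print(worker(5))

In this sketch the worker is never preempted; it only notices the forwarded value at the points where its own program chooses to check the port, which is the essence of the cooperative arrangement described above.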

Abstract

A computer array (10) has a plurality of computers (12). The computers (12) communicate with each other asynchronously, and the computers (12) themselves operate in a generally asynchronous manner internally. Instruction words (48) can include a micro-loop (100) which is capable of performing a series of operations repeatedly. In a particular example, the series of operations are included in a single instruction word (48). The micro-loop (100) in combination with the ability of the computers (12) to send instruction words (48) to a neighboring computer (12) provides a powerful tool for allowing a computer (12) to utilize the resources of a neighboring computer (12).

Description

RELATED APPLICATIONS
This application is a continuation-in-part of U.S. application Ser. No. 11/355,513 filed Feb. 16, 2006 by at least one common inventor, and claims the benefit of provisional U.S. Application Ser. No. 60/788,265 filed Mar. 31, 2006 by at least one common inventor, and U.S. Application Ser. No. 60/797,345 filed May 3, 2006 by at least one common inventor, all of which are incorporated herein by reference in their entireties.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to the field of computers and computer processors, and more particularly to a method and means for allowing a computer to execute instructions as they are received from an external source without first storing said instruction, and an associated method for using that method and means to facilitate communications between computers and the ability of a computer to use the available resources of another computer. The predominant current usage of the present inventive direct execution method and apparatus is in the combination of multiple computers on a single microchip, wherein operating efficiency is important not only because of the desire for increased operating speed but also because of the power savings and heat reduction that are a consequence of the greater efficiency.
2. Description of the Background Art
In the art of computing, processing speed is a much desired quality, and the quest to create faster computers and processors is ongoing. However, it is generally acknowledged in the industry that the limits for increasing the speed in microprocessors are rapidly being approached, at least using presently known technology. Therefore, there is an increasing interest in the use of multiple processors to increase overall computer speed by sharing computer tasks among the processors.
The use of multiple processors tends to create a need for communication between the processors. Indeed, there may well be a great deal of communication between the processors, such that a significant portion of time is spent in transferring instructions and data there between. Where the amount of such communication is significant, each additional instruction that must be executed in order to accomplish it places an incremental delay in the process which, cumulatively, can be very significant. The conventional method for communicating instructions or data from one computer to another involves first storing the data or instruction in the receiving computer and then, subsequently, calling it for execution (in the case of an instruction) or for operation thereon (in the case of data).
It would be useful to reduce the number of steps required to transmit, receive, and then use information, in the form of data or instructions, between computers. However, to the inventor's knowledge no prior art system has streamlined the above described process in a significant manner.
Also, in the prior art it is known that it is necessary to “get the attention” of a computer from time to time. That is, sometimes even though a computer may be busy with one task, another time sensitive task requirement can occur that may necessitate temporarily diverting the computer away from the first task. Examples include, but are not limited to, instances where a user input device is used to provide input to the computer. In such cases, the computer might need to temporarily acknowledge the input and/or react in accordance with the input. Then, the computer will either continue what it was doing before the input or else change what it was doing based upon the input. Although an external input is used as an example here, the same situation occurs when there is a potential conflict for the attention of the arithmetic logic unit (“ALU”) between internal aspects of the computer, as well.
When receiving data and changes in status from input/output (“I/O”) ports, there have been two methods available in the prior art. One has been to “poll” the port, which involves reading the status of the port at fixed intervals to determine whether any data has been received or a change of status has occurred. However, polling the port consumes considerable time and resources which could usually be better used doing other things. A better alternative has often been the use of “interrupts”. When using interrupts, a processor can go about performing its assigned task and then, when an I/O port/device needs attention, as indicated by the fact that a byte has been received or status has changed, the device sends an Interrupt Request (IRQ) to the processor. Once the processor receives an Interrupt Request, it finishes its current instruction, places a few things on the stack, and executes the appropriate Interrupt Service Routine (ISR), which can remove the byte from the port and place it in a buffer. Once the ISR has finished, the processor returns to where it left off. Using this method, the processor doesn't have to waste time looking to see if the I/O device is in need of attention; rather, it services the device only when the device signals that it needs attention. However, the use of interrupts, itself, is far less than desirable in many cases, since there can be a great deal of overhead associated with the use of interrupts. For example, each time an interrupt occurs, a computer may have to temporarily store certain data relating to the task it was previously trying to accomplish, then load data pertaining to the interrupt, and then reload the data necessary for the prior task once the interrupt is handled. Obviously, it would be desirable to reduce or eliminate all of this time and resource consuming overhead. However, no prior art method has been developed which has alleviated the need for interrupts.
SUMMARY
Accordingly, it is an object of the present invention to provide an apparatus and method for increasing the speed of operation where two or more computers are communicating data and/or instructions there between.
It is still another object of the present invention to provide an apparatus and method for providing substantial computing power inexpensively.
It is still another object of the present invention to provide an apparatus and method for accomplishing computationally intensive tasks in a minimal amount of time.
It is yet another object of the present invention to provide a computer device that produces a great amount of processing capability.
It is still another object of the present invention to increase the efficiency of communications between computers and computer controlled devices.
It is still another object of the present invention to increase the efficiency of communications between computers.
It is yet another object of the present invention to increase the efficiency of the manner in which computers communicate with each other and with the other devices, such as user input devices and the like.
Briefly, a known embodiment of the present invention is a computer having its own memory such that it is capable of independent computational functions. In one embodiment of the invention a plurality of the computers are arranged in an array. In order to accomplish tasks cooperatively, the computers must pass data and/or instructions from one to another. Since all of the computers working simultaneously will typically provide much more computational power than is required by most tasks, and since whatever algorithm or method that is used to distribute the task among the several computers will almost certainly result in an uneven distribution of assignments, it is anticipated that at least some, and perhaps most, of the computers may not be actively participating in the accomplishment of the task at any given time. Therefore, it would be desirable to find a way for under-used computers to be available to assist their busier neighbors by “lending” either computational resources, memory, or both. In order that such a relationship be efficient and useful it would further be desirable that communications and interaction between neighboring computers be as quick and efficient as possible. Therefore, the present invention provides a means and method for a computer to execute instructions and/or act on data provided directly from another computer, rather than having to receive and then store the data and/or instructions prior to such action. It will be noted that this invention will also be useful for instructions that will act as an intermediary to cause a computer to “pass on” instructions or data from one other computer to yet another computer.
In the embodiment described, in order to prevent unnecessary consumption of power and unnecessary production of heat, when a computer attempts to communicate with one or more of its neighbors it will be in a dormant mode consuming essentially no power until the neighbor or one of the neighbors acts to complete the communication. However, this is not a necessary aspect of the present invention. Furthermore, in order to accomplish the desired savings of power and reduced heat production it is desirable that the initiating computer cease, or at least significantly reduce, its power consumption while it is awaiting completion of the communication. It is conceivable that this could be accomplished by any of a number of means. For example, if the computer were timed by either an internal or an external clock, then that clock could be slowed or stopped during that period of time. Indeed, it is contemplated that such an embodiment may be implemented for reasons outside the scope of this invention, although the embodiment presently described is the best and most efficient embodiment now known to the inventor.
One aspect of the invention described herein is that instructions and data are treated essentially identically whether their source is the internal memory of the computer or else whether such instructions and data are being received from another source, such as another computer, an external communications port, or the like. This is significant because “additional” operations, such as storing the data or instructions and thereafter recalling them from internal memory becomes unnecessary, thereby reducing the number of instructions required and increasing the speed of operation of the computers involved.
Another aspect of the described embodiment is that very small groups of instructions can be communicated to another computer, generally simultaneously, such that relatively simple operations that require repetitive iterations can be quickly and easily accomplished. This will greatly expedite the process of communication between the computers.
Still another aspect of the described embodiment is that, since there are a quantity of computers available to perform various tasks, and since one or more computers can be placed in a dormant state wherein they use essentially no power while awaiting an input, such computers can be assigned the task of awaiting inputs, thereby reducing or eliminating the need to “interrupt” other computers that may be accomplishing other tasks.
These and other objects and advantages of the present invention will become clear to those skilled in the art in view of the description of modes of carrying out the invention, and the industrial applicability thereof, as described herein and as illustrated in the several figures of the drawing. The objects and advantages listed are not an exhaustive list of all possible advantages of the invention. Moreover, it will be possible to practice the invention even where one or more of the intended objects and/or advantages might be absent or not required in the application.
Further, those skilled in the art will recognize that various embodiments of the present invention may achieve one or more, but not necessarily all, of the described objects and/or advantages. Accordingly, the objects and/or advantages described herein are not essential elements of the present invention, and should not be construed as limitations.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a diagrammatic view of a computer array, according to the present invention;
FIG. 2 is a detailed diagram showing a subset of the computers of FIG. 1 and a more detailed view of the interconnecting data buses of FIG. 1;
FIG. 3 is a block diagram depicting a general layout of one of the computers of FIGS. 1 and 2;
FIG. 4 is a diagrammatic representation of an instruction word according to the present inventive application;
FIG. 5 is a schematic representation of the slot sequencer 42 of FIG. 3;
FIG. 6 is a flow diagram depicting an example of a micro-loop according to the present invention;
FIG. 7 is a flow diagram depicting an example of the inventive method for executing instructions from a port;
FIG. 8 is a flow diagram depicting an example of the inventive improved method for alerting a computer;
FIG. 9 is a flow diagram depicting another example of an inventive method for alerting a computer; and
FIG. 10 is a flow diagram depicting an inventive method for one computer to borrow the memory resources of a neighboring computer.
DETAILED DESCRIPTION OF THE INVENTION
This invention is described in the following description with reference to the Figures, in which like numbers represent the same or similar elements. While this invention is described in terms of modes for achieving this invention's objectives, it will be appreciated by those skilled in the art that variations may be accomplished in view of these teachings without deviating from the spirit or scope of the present invention.
The embodiments and variations of the invention described herein, and/or shown in the drawings, are presented by way of example only and are not limiting as to the scope of the invention. Unless otherwise specifically stated, individual aspects and components of the invention may be omitted or modified, or may have substituted therefore known equivalents, or as yet unknown substitutes such as may be developed in the future or such as may be found to be acceptable substitutes in the future. The invention may also be modified for a variety of applications while remaining within the spirit and scope of the claimed invention, since the range of potential applications is great, and since it is intended that the present invention be adaptable to many such variations.
A known mode for carrying out the invention is an array of individual computers. The array is depicted in a diagrammatic view in FIG. 1 and is designated therein by the general reference character 10. The computer array 10 has a plurality (twenty four in the example shown) of computers 12 (sometimes also referred to as “cores” or “nodes” in the example of an array). In the example shown, all of the computers 12 are located on a single die 14. According to the present invention, each of the computers 12 is a generally independently functioning computer, as will be discussed in more detail hereinafter. The computers 12 are interconnected by a plurality (the quantities of which will be discussed in more detail hereinafter) of interconnecting data buses 16. In this example, the data buses 16 are bidirectional, asynchronous, high-speed, parallel data buses, although it is within the scope of the invention that other interconnecting means might be employed for the purpose. In the present embodiment of the array 10, not only is data communication between the computers 12 asynchronous, the individual computers 12 also operate in an internally asynchronous mode. This has been found by the inventor to provide important advantages. For example, since a clock signal does not have to be distributed throughout the computer array 10, a great deal of power is saved. Furthermore, not having to distribute a clock signal eliminates many timing problems that could limit the size of the array 10 or cause other known difficulties. Also, the fact that the individual computers operate asynchronously saves a great deal of power, since each computer will use essentially no power when it is not executing instructions, since there is no clock running therein.
One skilled in the art will recognize that there will be additional components on the die 14 that are omitted from the view of FIG. 1 for the sake of clarity. Such additional components include power buses, external connection pads, and other such common aspects of a microprocessor chip.
Computer 12 e is an example of one of the computers 12 that is not on the periphery of the array 10. That is, computer 12 e has four orthogonally adjacent computers 12 a, 12 b, 12 c and 12 d. This grouping of computers 12 a through 12 e will be used, by way of example, hereinafter in relation to a more detailed discussion of the communications between the computers 12 of the array 10. As can be seen in the view of FIG. 1, interior computers such as computer 12 e will have four other computers 12 with which they can directly communicate via the buses 16. In the following discussion, the principles discussed will apply to all of the computers 12 except that the computers 12 on the periphery of the array 10 will be in direct communication with only three or, in the case of the corner computers 12, only two other of the computers 12.
FIG. 2 is a more detailed view of a portion of FIG. 1 showing only some of the computers 12 and, in particular, computers 12 a through 12 e, inclusive. The view of FIG. 2 also reveals that the data buses 16 each have a read line 18, a write line 20 and a plurality (eighteen, in this example) of data lines 22. The data lines 22 are capable of transferring all the bits of one eighteen-bit instruction word generally simultaneously in parallel. It should be noted that, in one embodiment of the invention, some of the computers 12 are mirror images of adjacent computers. However, whether the computers 12 are all oriented identically or as mirror images of adjacent computers is not an aspect of this presently described invention. Therefore, in order to better describe this invention, this potential complication will not be discussed further herein.
According to the present inventive method, a computer 12, such as the computer 12 e, can set high one, two, three or all four of its read lines 18 such that it is prepared to receive data from the respective one, two, three or all four adjacent computers 12. Similarly, it is also possible for a computer 12 to set one, two, three or all four of its write lines 20 high. Although the inventor does not believe that there is presently any practical value to setting more than one of the write lines 20 of a computer 12 high at one time, doing so is not beyond the scope of this invention, as it is conceivable that a use for such an operation may occur in the future.
When one of the adjacent computers 12 a, 12 b, 12 c or 12 d sets a write line 20 between itself and the computer 12 e high, if the computer 12 e has already set the corresponding read line 18 high, then a word is transferred from that computer 12 a, 12 b, 12 c or 12 d to the computer 12 e on the associated data lines 22. Then, the sending computer 12 will release the write line 20 and the receiving computer (12 e in this example) pulls both the write line 20 and the read line 18 low. The latter action will acknowledge to the sending computer 12 that the data has been received. Note that the above description is not intended necessarily to denote the sequence of events in order. In actual practice, the receiving computer may try to set the write line 20 low slightly before the sending computer 12 releases (stops pulling high) its write line 20. In such an instance, as soon as the sending computer 12 releases its write line 20 the write line 20 will be pulled low by the receiving computer 12 e.
In the present example, only a programming error would cause both computers 12 on the opposite ends of one of the buses 16 to try to set high the read line 18 there-between. Also, it would be an error for both computers 12 on the opposite ends of one of the buses 16 to try to set high the write line 20 there-between at the same time. Similarly, as discussed above, it is not currently anticipated that it would be desirable to have a single computer 12 set more than one of its four write lines 20 high. However, it is presently anticipated that there will be occasions wherein it is desirable to set different combinations of the read lines 18 high such that one of the computers 12 can be in a wait state awaiting data from the first one of the chosen computers 12 to set its corresponding write line 20 high.
In the example discussed above, computer 12 e was described as setting one or more of its read lines 18 high before an adjacent computer (selected from one or more of the computers 12 a, 12 b, 12 c or 12 d) has set its write line 20 high. However, this process can certainly occur in the opposite order. For example, if the computer 12 e were attempting to write to the computer 12 a, then computer 12 e would set the write line 20 between computer 12 e and computer 12 a to high. If the read line 18 between computer 12 e and computer 12 a has then not already been set to high by computer 12 a, then computer 12 e will simply wait until computer 12 a does set that read line 18 high. Then, as discussed above, when both of a corresponding pair of write line 20 and read line 18 are high the data awaiting to be transferred on the data lines 22 is transferred. Thereafter, the receiving computer 12 (computer 12 a, in this example) sets both the read line 18 and the write line 20 between the two computers (12 e and 12 a in this example) to low as soon as the sending computer 12 e releases the write line 20.
Whenever a computer 12 such as the computer 12 e has set one of its write lines 20 high in anticipation of writing it will simply wait, using essentially no power, until the data is “requested”, as described above, from the appropriate adjacent computer 12, unless the computer 12 to which the data is to be sent has already set its read line 18 high, in which case the data is transmitted immediately. Similarly, whenever a computer 12 has set one or more of its read lines 18 to high in anticipation of reading it will simply wait, using essentially no power, until the write line 20 connected to a selected computer 12 goes high to transfer an instruction word between the two computers 12.
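The handshake described in the preceding paragraphs can be condensed into a small software model. The following Python sketch is a minimal behavioral illustration only, assuming a simple Bus object with boolean read and write lines; it is not a description of the actual circuitry, and the class and method names are invented for the example.

    # A transfer completes only when both the write line and the read line are
    # high; the receiver then pulls both lines low, which is the "acknowledge"
    # condition that lets the sleeping side resume.

    class Bus:
        def __init__(self):
            self.read_line = False
            self.write_line = False
            self.data_lines = None

        def request_write(self, word):
            self.write_line = True        # sender sets its write line high
            self.data_lines = word        # the word awaits transfer on the data lines

        def request_read(self):
            self.read_line = True         # receiver sets its read line high

        def try_transfer(self):
            if self.read_line and self.write_line:
                word = self.data_lines
                self.read_line = False    # receiver pulls both lines low:
                self.write_line = False   # the acknowledge condition
                return word
            return None                   # one side simply waits ("sleeps")

    bus = Bus()
    bus.request_write(0b101010101010101010)   # an eighteen-bit word from one computer
    print(bus.try_transfer())                 # None: the reader is not ready yet
    bus.request_read()                        # the neighbor sets its read line high
    print(bus.try_transfer())                 # transfer and acknowledge complete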
As discussed above, there may be several potential means and/or methods to cause the computers 12 to function as described. However, in this present example, the computers 12 so behave simply because they are operating generally asynchronously internally (in addition to transferring data there-between in the asynchronous manner described). That is, instructions are generally completed sequentially. When either a write or read instruction occurs, there can be no further action until that instruction is completed (or, perhaps alternatively, until it is aborted, as by a “reset” or the like). There is no regular clock pulse, in the prior art sense. Rather, a pulse is generated to accomplish a next instruction only when the instruction being executed either is not a read or write type instruction (given that a read or write type instruction would require completion, often by another entity) or else when the read or write type operation is, in fact, completed.
FIG. 3 is a block diagram depicting the general layout of an example of one of the computers 12 of FIGS. 1 and 2. As can be seen in the view of FIG. 3, each of the computers 12 is a generally self contained computer having its own RAM 24 and ROM 26. As mentioned previously, the computers 12 are also sometimes referred to as individual “nodes”, given that they are, in the present example, combined on a single chip.
Other basic components of the computer 12 are a return stack 28 (including an R register 29, discussed hereinafter), an instruction area 30, an arithmetic logic unit (“ALU” or “processor”) 32, a data stack 34 and a decode logic section 36 for decoding instructions. One skilled in the art will be generally familiar with the operation of stack based computers such as the computers 12 of this present example. The computers 12 are dual stack computers having the data stack 34 and the separate return stack 28.
In this embodiment of the invention, the computer 12 has four communication ports 38 for communicating with adjacent computers 12. The communication ports 38 are tri-state drivers, having an off status, a receive status (for driving signals into the computer 12) and a send status (for driving signals out of the computer 12). Of course, if the particular computer 12 is not on the interior of the array (FIG. 1) such as the example of computer 12 e, then one or more of the communication ports 38 will not be used in that particular computer, at least for the purposes described above. However, those communication ports 38 that do abut the edge of the die 14 can have additional circuitry, either designed into such computer 12 or else external to the computer 12 but associated therewith, to cause such communication port 38 to act as an external I/O port 39 (FIG. 1). Examples of such external I/O ports 39 include, but are not limited to, USB (universal serial bus) ports, RS232 serial bus ports, parallel communications ports, analog to digital and/or digital to analog conversion ports, and many other possible variations. No matter what type of additional or modified circuitry is employed for this purpose, according to the presently described embodiment of the invention the method of operation of the “external” I/O ports 39 regarding the handling of instructions and/or data received therefrom will be like that described herein in relation to the “internal” communication ports 38. In FIG. 1 an “edge” computer 12 f is depicted with associated interface circuitry 80 (shown in block diagrammatic form) for communicating through an external I/O port 39 with an external device 82.
In the presently described embodiment, the instruction area 30 includes a number of registers 40 including, in this example, an A register 40 a, a B register 40 b and a P register 40 c. In this example, the A register 40 a is a full eighteen-bit register, while the B register 40 b and the P register 40 c are nine-bit registers.
Although the invention is not limited by this example, the present computer 12 is implemented to execute native Forth language instructions. As one familiar with the Forth computer language will appreciate, complicated Forth instructions, known as Forth “words” are constructed from the native processor instructions designed into the computer. The collection of Forth words is known as a “dictionary”. In other languages, this might be known as a “library”. As will be described in greater detail hereinafter, the computer 12 reads eighteen bits at a time from RAM 24, ROM 26 or directly from one of the data buses 16 (FIG. 2). However, since in Forth most instructions (known as operand-less instructions) obtain their operands directly from the stacks 28 and 34, they are generally only five bits in length, such that up to four instructions can be included in a single eighteen-bit instruction word, with the condition that the last instruction in the group is selected from a limited set of instructions that require only three bits. (In the described embodiment, the two least significant bits of an instruction in the last position are assumed to be “01”.) Also depicted in block diagrammatic form in the view of FIG. 3 is a slot sequencer 42.
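The packing arrangement just described can be illustrated with a short software model. The following Python sketch is an assumption-laden illustration only: the high-to-low ordering of the slots within the eighteen-bit word is chosen for the example and is not stated in the text, and the function names are invented.

    # Three full five-bit slots plus a three-bit slot three whose two least
    # significant bits are taken to be the implied "01" described above.

    def pack_word(slot0, slot1, slot2, slot3):
        assert all(0 <= s < 32 for s in (slot0, slot1, slot2))
        assert slot3 & 0b00011 == 0b01      # slot three must end in the implied "01"
        return (slot0 << 13) | (slot1 << 8) | (slot2 << 3) | (slot3 >> 2)

    def unpack_word(word):
        slot3 = ((word & 0b111) << 2) | 0b01   # restore the implied low bits
        return ((word >> 13) & 0x1F, (word >> 8) & 0x1F,
                (word >> 3) & 0x1F, slot3)

    word = pack_word(0b10110, 0b00001, 0b11000, 0b10101)
    print(bin(word), unpack_word(word))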
In this embodiment of the invention, data stack 34 is a last-in-first-out stack for parameters to be manipulated by the ALU 32, and the return stack 28 is a last-in first-out stack for nested return addresses used by CALL and RETURN instructions. The return stack 28 is also used by PUSH, POP and NEXT instructions, as will be discussed in some greater detail, hereinafter. The data stack 34 and the return stack 28 are not arrays in memory accessed by a stack pointer, as in many prior art computers. Rather, the stacks 34 and 28 are an array of registers. The top two registers in the data stack 34 are a T register 44 and an S register 46. The remainder of the data stack 34 has a circular register array 34 a having eight additional hardware registers therein numbered, in this example S2 through S9. One of the eight registers in the circular register array 34 a will be selected as the register below the S register 46 at any time. The value in the shift register that selects the stack register to be below S cannot be read or written by software. Similarly, the top position in the return stack 28 is the dedicated R register 29, while the remainder of the return stack 28 has a circular register array 28 a having twelve additional hardware registers therein (not specifically shown in the drawing) that are numbered, in this example R1 through R11.
In this embodiment of the invention, there is no hardware detection of stack overflow or underflow conditions. Generally, prior art processors use stack pointers and memory management, or the like, such that an error condition is flagged when a stack pointer goes out of the range of memory allocated for the stack. That is because, were the stacks located in memory, an overflow or underflow would overwrite or use as a stack item something that is not intended to be part of the stack. However, because the present invention has the circular arrays 28 a and 34 a at the bottom of the stacks 28 and 34, the stacks 28 and 34 cannot overflow or underflow out of the stack area. Instead, the circular arrays 28 a and 34 a will merely wrap around the circular array of registers. Because the stacks 28 and 34 have finite depth, pushing anything to the top of a stack 28 or 34 means something on the bottom is being overwritten. Pushing more than ten items to the data stack 34, or more than thirteen items to the return stack 28, must be done with the knowledge that doing so will result in the item at the bottom of the stack 28 or 34 being overwritten. It is the responsibility of software to keep track of the number of items on the stacks 28 and 34 and not try to put more items there than the respective stacks 28 and 34 can hold. The hardware will not detect an overwriting of items at the bottom of the stack or flag it as an error. However, it should be noted that the software can take advantage of the circular arrays 28 a and 34 a at the bottom of the stacks 28 and 34 in several ways. As just one example, the software can simply assume that a stack 28 or 34 is ‘empty’ at any time. There is no need to clear old items from the stack as they will be pushed down towards the bottom where they will be lost as the stack fills. So there is nothing to initialize for a program to assume that the stack is empty.
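The wrap-around behavior described above can be shown with a short software model. This Python sketch is illustrative only: the class name, depth parameter and indexing scheme are assumptions and do not correspond to the hardware registers; the point is simply that pushing past the finite depth silently overwrites the oldest item instead of flagging an error.

    class CircularStack:
        def __init__(self, depth):
            self.regs = [0] * depth
            self.top = 0                      # index of the current top register

        def push(self, value):
            self.top = (self.top + 1) % len(self.regs)
            self.regs[self.top] = value       # the bottom item is overwritten when full

        def pop(self):
            value = self.regs[self.top]
            self.top = (self.top - 1) % len(self.regs)
            return value                      # never flags underflow; it just wraps

    data_stack = CircularStack(10)            # T, S and the eight circular registers
    for n in range(12):                       # two more pushes than the depth
        data_stack.push(n)
    print([data_stack.pop() for _ in range(3)])   # 11, 10, 9: the oldest items were lost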
In addition to the registers previously discussed herein, the instruction area 30 also has an 18 bit instruction register 30 a for storing the instruction word 48 that is presently being used, and an additional 5 bit opcode register 30 b for the instruction in the particular slot presently being executed.
FIG. 4 is a diagrammatic representation of an instruction word 48. (It should be noted that the instruction word 48 can actually contain instructions, data, or some combination thereof.) The instruction word 48 consists of eighteen bits 50. This being a binary computer, each of the bits 50 will be a ‘1’ or a ‘0’. As previously discussed herein, the eighteen-bit wide instruction word 48 can contain up to four instructions 52 in four slots 54 called slot zero 54 a, slot one 54 b, slot two 54 c and slot three 54 d. In the present embodiment of the invention, the eighteen-bit instruction words 48 are always read as a whole. Therefore, since there is always a potential of having up to four instructions in the instruction word 48, a no-op (no operation) instruction is included in the instruction set of the computer 12 to provide for instances when using all of the available slots 54 might be unnecessary or even undesirable. It should be noted that, according to one particular embodiment of the invention, the polarity (active high as compared to active low) of bits 50 in alternate slots (specifically, slots one 54 b and three 54 d) is reversed. However, this is not a necessary aspect of the presently described invention and, therefore, in order to better explain this invention this potential complication is avoided in the following discussion.
FIG. 5 is a schematic representation of the slot sequencer 42 of FIG. 3. As can be seen in the view of FIG. 5, the slot sequencer 42 has a plurality (fourteen in this example) of inverters 56 and one NAND gate 58 arranged in a ring, such that a signal is inverted an odd number of times as it travels through the fourteen inverters 56 and the NAND gate 58. A signal is initiated in the slot sequencer 42 when either of the two inputs to an OR gate 60 goes high. A first OR gate input 62 is derived from a bit i4 66 (FIG. 4) of the instruction 52 being executed. If bit i4 is high then that particular instruction 52 is an ALU instruction, and the i4 bit 66 is ‘1’. When the i4 bit is ‘1’, then the first OR gate input 62 is high, and the slot sequencer 42 is triggered to initiate a pulse that will cause the execution of the next instruction 52.
When the slot sequencer 42 is triggered, either by the first OR gate input 62 going high or by the second OR gate input 64 going high (as will be discussed hereinafter), then a signal will travel around the slot sequencer 42 twice, producing an output at a slot sequencer output 68 each time. The first time the signal passes the slot sequencer output 68 it will be low, and the second time the output at the slot sequencer output 68 will be high. The relatively wide output from the slot sequencer output 68 is provided to a pulse generator 70 (shown in block diagrammatic form) that produces a narrow timing pulse as an output. One skilled in the art will recognize that the narrow timing pulse is desirable to accurately initiate the operations of the computer 12.
When the particular instruction 52 being executed is a read or a write instruction, or any other instruction wherein it is not desired that the instruction 52 being executed triggers immediate execution of the next instruction 52 in sequence, then the i4 bit 66 is ‘0’ (low) and the first OR gate input 62 is, therefore, also low. One skilled in the art will recognize that the timing of events in a device such as the computers 12 is generally quite critical, and this is no exception. Upon examination of the slot sequencer 42 one skilled in the art will recognize that the output from the OR gate 60 must remain high until after the signal has circulated past the NAND gate 58 in order to initiate the second “lap” of the ring. Thereafter, the output from the OR gate 60 will go low during that second “lap” in order to prevent unwanted continued oscillation of the circuit.
As can be appreciated in light of the above discussion, when the i4 bit 66 is ‘0’, then the slot sequencer 42 will not be triggered—assuming that the second OR gate input 64, which will be discussed hereinafter, is not high.
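Behaviorally, the trigger condition just described reduces to a simple disjunction. The following Python fragment is only a condensation of that logic for illustration (the function name and arguments are invented); it is not a model of the ring of inverters 56 and the NAND gate 58.

    # A pulse for the next instruction is produced either because the current
    # opcode's i4 bit is '1' (an ALU-type instruction) or because the
    # acknowledge line has gone high after a pending read/write completed.

    def next_pulse(i4_bit, acknowledge):
        return bool(i4_bit) or bool(acknowledge)

    print(next_pulse(1, 0))   # ALU instruction: proceed immediately
    print(next_pulse(0, 0))   # read/write still pending: the computer sleeps
    print(next_pulse(0, 1))   # transfer acknowledged: resume at the next slot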
As discussed, above, the i4 bit 66 of each instruction 52 is set according to whether or not that instruction is a read or write type of instruction, as opposed to that instruction being one that requires no input or output. The remaining bits 50 in the instruction 52 provide the remainder of the particular opcode for that instruction. In the case of a read or write type instruction, one or more of the bits may be used to indicate where data is to be read from, or written to, in that particular computer 12. In the present example of the invention, data to be written always comes from the T register 44 (the top of the data stack 34), however data can be selectively read into either the T register 44 or else the instruction area 30 from where it can be executed. That is because, in this particular embodiment of the invention, either data or instructions can be communicated in the manner described herein and instructions can, therefore, be executed directly from the data bus 16.
One or more of the bits 50 will be used to indicate which of the ports 38, if any, is to be set to read or write. This latter operation is optionally accomplished by using one or more bits to designate a register 40, such as the A register 40 a, the B register 40 b, or the like. In such an example, the designated register 40 will be preloaded with data having a bit corresponding to each of the ports 38 (and, also, any other potential entity with which the computer 12 may be attempting to communicate, such as memory (RAM 24 or ROM 26), an external communications port 39, or the like.) For example, each of four bits in the particular register 40 can correspond to each of the up port 38 a, the right port 38 b, the left port 38 c or the down port 38 d. In such case, where there is a ‘1’ at any of those bit locations, communication will be set to proceed through the corresponding port 38. As previously discussed herein, in the present embodiment of the invention it is anticipated that a read opcode might set more than one port 38 for communication in a single instruction while, although it is possible, it is not anticipated that a write opcode will set more than one port 38 for communication in a single instruction.
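The port-selection scheme described above amounts to a bit mask held in a register 40. The Python sketch below illustrates the decoding; the specific bit positions assigned here to the up, right, left and down ports 38 a through 38 d are assumptions made for the example, not the actual register layout.

    # One bit per port; a '1' selects that port for the pending read or write.
    PORT_BITS = {"up": 0b0001, "right": 0b0010, "left": 0b0100, "down": 0b1000}

    def selected_ports(select_value):
        return [name for name, bit in PORT_BITS.items() if select_value & bit]

    print(selected_ports(0b0101))   # a read armed on the up and left ports
    print(selected_ports(0b0010))   # a write directed to the right port only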
The immediately following example will assume a communication wherein computer 12 e is attempting to write to computer 12 c, although the example is applicable to communication between any adjacent computers 12. When a write instruction is executed in a writing computer 12 e, the selected write line 20 (in this example, the write line 20 between computers 12 e and 12 c) is set high. If the corresponding read line 18 is already high, then data is immediately sent from the selected location through the selected communications port 38. Alternatively, if the corresponding read line 18 is not already high, then computer 12 e will simply stop operation until the corresponding read line 18 does go high. The mechanism for stopping (or, more accurately, not enabling further operations of) the computer 12 e when there is a read or write type instruction has been discussed previously herein. In short, the opcode of the instruction 52 will have a ‘0’ at bit position i4 66, and so the first OR gate input 62 of the OR gate 60 is low, and so the slot sequencer 42 is not triggered to generate an enabling pulse.
As for how the operation of the computer 12 e is resumed when a read or write type instruction is completed, the mechanism for that is as follows: When both the read line 18 and the corresponding write line 20 between computers 12 e and 12 c are high, then both lines 18 and 20 will be released by each of the respective computers 12 that is holding it high. (In this example, the sending computer 12 e will be holding the write line 20 high while the receiving computer 12 c will be holding the read line 18 high). Then the receiving computer 12 c will pull both lines 18 and 20 low. In actual practice, the receiving computer 12 c may attempt to pull the lines 18 and 20 low before the sending computer 12 e has released the write line 20. However, since the lines 18 and 20 are pulled high and only weakly held (latched) low, any attempt to pull a line 18 or 20 low will not actually succeed until that line 18 or 20 is released by the computer 12 that is holding it high.
When both lines 18 and 20 in a data bus 16 are pulled low, this is an “acknowledge” condition. Each of the computers 12 e and 12 c will, upon the acknowledge condition, set its own internal acknowledge line 72 high. As can be seen in the view of FIG. 5, the acknowledge line 72 provides the second OR gate input 64. Since an input to either of the OR gate 60 inputs 62 or 64 will cause the output of the OR gate 60 to go high, this will initiate operation of the slot sequencer 42 in the manner previously described herein, such that the instruction 52 in the next slot 54 of the instruction word 48 will be executed. The acknowledge line 72 stays high until the next instruction 52 is decoded, in order to prevent spurious addresses from reaching the address bus.
In any case when the instruction 52 being executed is in the slot three position of the instruction word 48, the computer 12 will fetch the next awaiting eighteen-bit instruction word 48 unless, of course, bit i4 66 is a ‘0’ or, also, unless the instruction in slot three is a “next” instruction, which will be discussed in more detail hereinafter.
In actual practice, the present inventive mechanism includes a method and apparatus for “prefetching” instructions such that the fetch can begin before the end of the execution of all instructions 52 in the instruction word 48. However, this also is not a necessary aspect of the presently described invention.
The above example wherein computer 12 e is writing to computer 12 c has been described in detail. As can be appreciated in light of the above discussion, the operations are essentially the same whether computer 12 e attempts to write to computer 12 c first, or whether computer 12 c first attempts to read from computer 12 e. The operation cannot be completed until both computers 12 e and 12 c are ready, and whichever computer 12 e or 12 c is ready first simply “goes to sleep” until the other computer 12 e or 12 c completes the transfer. Another way of looking at the above described process is that, actually, both the writing computer 12 e and the receiving computer 12 c go to sleep when they execute the write and read instructions, respectively, but the last one to enter into the transaction reawakens nearly instantaneously when both the read line 18 and the write line 20 are high, whereas the first computer 12 to initiate the transaction can stay asleep nearly indefinitely until the second computer 12 is ready to complete the process.
The inventor believes that a key feature for enabling efficient asynchronous communications between devices is some sort of acknowledge signal or condition. In the prior art, most communication between devices has been clocked and there is no direct way for a sending device to know that the receiving device has properly received the data. Methods such as checksum operations may have been used to attempt to insure that data is correctly received, but the sending device has no direct indication that the operation is completed. The present inventive method, as described herein, provides the necessary acknowledge condition that allows, or at least makes practical, asynchronous communications between the devices. Furthermore, the acknowledge condition also makes it possible for one or more of the devices to “go to sleep” until the acknowledge condition occurs. Of course, an acknowledge condition could be communicated between the computers 12 by a separate signal being sent between the computers 12 (either over the interconnecting data bus 16 or over a separate signal line), and such an acknowledge signal would be within the scope of this aspect of the present invention. However, according to the embodiment of the invention described herein, it can be appreciated that there is even more economy involved here, in that the method for acknowledgement does not require any additional signal, clock cycle, timing pulse, or any such resource beyond that described, to actually effect the communication.
Since four instructions 52 can be included in an instruction word 48 and since, according to the present invention, an entire instruction word 48 can be communicated at one time between computers 12, this presents an ideal opportunity for transmitting a very small program in one operation. For example most of a small “For/Next” loop can be implemented in a single instruction word 48. FIG. 6 is a diagrammatic representation of a micro-loop 100. The micro-loop 100, not unlike other prior art loops, has a FOR instruction 102 and a NEXT instruction 104. Since an instruction word 48 (FIG. 4) contains as many as four instructions 52, an instruction word 48 can include three operation instructions 106 within a single instruction word 48. The operation instructions 106 can be essentially any of the available instructions that a programmer might want to include in the micro-loop 100. A typical example of a micro-loop 100 that might be transmitted from one computer 12 to another might be a set of instructions for reading from, or writing to the RAM 24 of the second computer 12, such that the first computer 12 could “borrow” available RAM 24 capacity.
The FOR instruction 102 pushes a value onto the return stack 28 representing the number of iterations desired. That is, the value on the T register 44 at the top of the data stack 34 is PUSHed into the R register 29 of the return stack 28. The FOR instruction 102, while often located in slot three 54 d of an instruction word 48 can, in fact, be located in any slot 54. Where the FOR instruction 102 is not located in slot three 54 d, then the remaining instructions 52 in that instruction word 48 will be executed before going on to the micro-loop 100, which will generally be the next loaded instruction word 103 that includes a NEXT instruction 104 and three operation instructions 106.
According to the presently described embodiment of the invention, the NEXT instruction 104 depicted in the view of FIG. 6 is a particular type of NEXT instruction 104. This is because it is located in slot three 54 d (FIG. 4). According to this embodiment of the invention, it is assumed that all of the data in a particular instruction word 48 that follows an “ordinary” NEXT instruction (not shown) is an address (the address where the for/next loop begins). The opcode for the NEXT instruction 104 is the same, no matter which of the four slots 54 it is in (with the obvious exception that the first two digits are assumed if it is slot three 54 d, rather than being explicitly written, as discussed previously herein). However, since there can be no address data following the NEXT instruction 104 when it is in slot three 54 d, it can also be assumed that the NEXT instruction 104 in slot three 54 d is a MICRO-NEXT instruction 104 a. The MICRO-NEXT instruction 104 a uses the address of the first instruction 52, located in slot zero 54 a of the same instruction word 48 in which it is located, as the address to which to return. The MICRO-NEXT instruction 104 a also takes the value from the R register 29 (which was originally PUSHed there by the FOR instruction 102), decrements it by 1, and then returns it to the R register 29. When the value on the R register 29 reaches a predetermined value (such as zero), then the MICRO-NEXT instruction will load the next instruction word 48 and continue on as described previously herein. However, when the MICRO-NEXT instruction 104 a reads a value from the R register 29 that is greater than the predetermined value, it will resume operation at slot zero 54 a of its own instruction word 48 and execute the three instructions 52 located in slots zero through two, inclusive, thereof. That is, a MICRO-NEXT instruction 104 a will always, in this embodiment of the invention, execute three operation instructions 106. Because, in some instances, it may not be desired to use all three potentially available instructions 52, a “no-op” instruction is available to fill one or two of the slots 54, as required.
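The FOR/MICRO-NEXT behavior described above can be summarized in a short software model. The following Python sketch is a behavioral illustration only, assuming zero as the predetermined terminal value; the function and variable names are invented and are not part of the instruction set.

    def run_micro_loop(count, operations):
        r_register = count                 # FOR: the count is PUSHed into the R register
        while True:
            for op in operations:          # the operation instructions in slots zero through two
                op()
            r_register -= 1                # MICRO-NEXT decrements the count and returns it to R
            if r_register <= 0:            # once the predetermined value is reached...
                break                      # ...the next instruction word would be fetched

    log = []
    run_micro_loop(3, [lambda: log.append("read"),
                       lambda: log.append("store"),
                       lambda: log.append("increment")])
    print(log)                             # the three operations repeated three times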
It should be noted that micro-loops 100 can be used entirely within a single computer 12. Indeed, the entire set of available machine language instructions is available for use as the operation instructions 106, and the application and use of micro-loops is limited only by the imagination of the programmer. However, when the ability to execute an entire micro-loop 100 within a single instruction word 48 is combined with the ability to allow a computer 12 to send the instruction word 48 to a neighbor computer 12 to execute the instructions 52 therein essentially directly from the data bus 16, this provides a powerful tool for allowing a computer 12 to utilize the resources of its neighbors.
The small micro-loop 100, all contained within the single instruction word 48, can be communicated between computers 12, as described herein, and it can be executed directly from the communications port 38 of the receiving computer 12, just like any other set of instructions contained in an instruction word 48, as described herein. While there are many uses for this sort of “micro-loop” 100, a typical use would be where one computer 12 wants to store some data onto the memory of a neighbor computer 12. It could, for example, first send an instruction to that neighbor computer telling it to store an incoming data word to a particular memory address, then increment that address, then repeat for a given number of iterations (the number of data words to be transmitted). To read the data back, the first computer would just instruct the second computer (the one used for storage here) to write the stored data back to the first computer, using a similar micro-loop.
By using the micro-loop 100 structure in conjunction with the direct execution aspect described herein, a computer 12 can use an otherwise resting neighbor computer 12 for storage of excess data when the data storage need exceeds the relatively small capacity built into each individual computer 12. While this example has been described in terms of data storage, the same technique can equally be used to allow a computer 12 to have its neighbor share its computational resources—by creating a micro-loop 100 that causes the other computer 12 to perform some operations, store the result, and repeat a given number of times. As can be appreciated, the number of ways in which this inventive micro-loop 100 structure can be used is nearly infinite.
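A software sketch of this memory-borrowing pattern may help. The Python fragment below is illustrative only: the function names, the list standing in for the RAM 24 of the lending computer, and the lists standing in for words on the data bus 16 are assumptions, and no actual instruction encoding is shown.

    def store_loop(ram, port_in, base_address, count):
        # micro-loop run by the lending computer: store, increment, repeat
        for offset in range(count):
            ram[base_address + offset] = port_in.pop(0)

    def read_back_loop(ram, port_out, base_address, count):
        # matching micro-loop for retrieval: fetch, write to port, increment, repeat
        for offset in range(count):
            port_out.append(ram[base_address + offset])

    ram = [0] * 64                       # stand-in for the neighbor's RAM
    incoming = [11, 22, 33]              # words asserted on the shared data bus
    store_loop(ram, incoming, 8, 3)
    returned = []
    read_back_loop(ram, returned, 8, 3)
    print(returned)                      # [11, 22, 33]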
As previously mentioned herein, in the presently described embodiment of the invention, either data or instructions can be communicated in the manner described herein and instructions can, therefore, be executed essentially directly from the data bus 16. That is, there is no need to store instructions to RAM 24 and then recall them before execution. Instead, according to this aspect of the invention, an instruction word 48 that is received on a communications port 38 is not treated essentially differently than it would be were it recalled from RAM 24 or ROM 26. While this lack of a difference is revealed in the prior discussion, herein, concerning the described operation of the computers 12, the following more specific discussion of how instruction words 48 are fetched and used will aid in the understanding of the invention.
One of the available machine language instructions is a FETCH instruction. The FETCH instruction uses the address on the A register 40 a to determine from where to fetch an 18 bit word. Of course, the program will have to have already provided for placing the correct address on the A register 40 a. As previously discussed herein, the A register 40 a is an 18 bit register, such that there is a sufficient range of address data available that any of the potential sources from which a fetch can occur can be differentiated. That is, there is a range of addresses assigned to ROM, a different range of addresses assigned to RAM, and there are specific addresses for each of the ports 38 and for the external I/O port 39. A FETCH instruction always places the 18 bits that it fetches on the T register 44.
In contrast, as previously discussed herein, executable instructions (as opposed to data) are temporarily stored in the instruction register 30 a. There is no specific command for “fetching” an 18 bit instruction word 48 into the instruction register 30 a. Instead, when there are no more executable instructions left in the instruction register 30 a, then the computer will automatically fetch the “next” instruction word 48. Where that “next” instruction word is located is determined by the “program counter” (the P register 40 c). The P register 40 c is often automatically incremented, as is the case where a sequence of instruction words 48 is to be fetched from RAM 24 or ROM 26. However, there are a number of exceptions to this general rule. For example, a JUMP or CALL instruction will cause the P register 40 c to be loaded with the address designated by the data in the remainder of the presently loaded instruction word 48 after the JUMP or CALL instruction, rather than being incremented. When the P register 40 c is then loaded with an address corresponding to one or more of the ports 38, then the next instruction word 48 will be loaded into the instruction register 30 a from the ports 38. The P register 40 c also does not increment when an instruction word 48 has just been retrieved from a port 38 into the instruction register 30 a. Rather, it will continue to retain that same port address until a specific JUMP or CALL instruction is executed to change the P register 40 c. That is, once the computer 12 is told to look for its next instruction from a port 38, it will continue to look for instructions from that same port 38 (or ports 38) until it is told to look elsewhere, such as back to the memory (RAM 24 or ROM 26) for its next instruction word 48.
As noted above, the computer 12 knows that the next eighteen bits fetched are to be placed in the instruction register 30 a when there are no more executable instructions left in the present instruction word 48. By default, there are no more executable instructions left in the present instruction word 48 after a JUMP or CALL instruction (or after certain other instructions that will not be specifically discussed here) because, by definition, the remainder of the 18 bit instruction word following a JUMP or CALL instruction is dedicated to the address referred to by the JUMP or CALL instruction. Stated another way, the above described processes are unique in many ways, including, but not limited to, the fact that a JUMP or CALL instruction can optionally be to a port 38, rather than only to a memory address, or the like.
It should be remembered that, as discussed previously herein, the computer 12 can look for its next instruction from one port 38 or from any of a group of the ports 38. Therefore, addresses are provided to correspond to various combinations of the ports 38. When, for example, a computer is told to fetch an instruction from a group of ports 38, then it will accept the first available instruction word 48 from any of the selected ports 38. If no neighbor computer 12 has already attempted to write to any of those ports 38, then the computer 12 in question will “go to sleep”, as described in detail above, until a neighbor does write to the selected port 38.
FIG. 7 is a flow diagram depicting an example of the above described direct execution method 120. A "normal" flow of operations will commence when, as discussed previously herein, there are no more executable instructions left in the instruction register 30 a. At such time, the computer 12 will "fetch" another instruction word (note that the term "fetch" is used here in a general sense, in that an actual FETCH instruction is not used), as indicated by a "fetch word" operation 122. That operation will be accomplished according to the address in the P register 40 c (as indicated by an "address" decision operation 124 in the flow diagram of FIG. 7). If the address in the P register 40 c is a RAM 24 or ROM 26 address, then the next instruction word 48 will be retrieved from the designated memory location in a "fetch from memory" operation 126. If, on the other hand, the address in the P register 40 c is that of a port 38 or ports 38 (not a memory address), then the next instruction word 48 will be retrieved from the designated port location in a "fetch from port" operation 128. In either case, the instruction word 48 being retrieved is placed in the instruction register 30 a in a "retrieve instruction word" operation 130. In an "execute instruction word" operation 132, the instructions in the slots 54 of the instruction word 48 are accomplished sequentially, as described previously herein.
In a "jump" decision operation 134 it is determined whether one of the operations in the instruction word 48 is a JUMP instruction, or other instruction that would divert operation away from the continued "normal" progression as discussed previously herein. If yes, then the address provided in the instruction word 48 after the JUMP (or other such) instruction is provided to the P register 40 c in a "load P register" operation 136, and the sequence begins again with the "fetch word" operation 122, as indicated in the diagram of FIG. 7. If no, then the next action depends upon whether the last instruction fetch was from a port 38 or from a memory address, as indicated in a "port address" decision operation 138. If the last instruction fetch was from a port 38, then no change is made to the P register 40 c and the sequence is repeated starting with the "fetch word" operation 122. If, on the other hand, the last instruction fetch was from a memory address (RAM 24 or ROM 26), then the address in the P register 40 c is incremented, as indicated by an "increment P register" operation 140 in FIG. 7, before the "fetch word" operation 122 is accomplished.
The above description is not intended to represent actual operational steps. Instead, it is a diagram of the various decisions and operations resulting therefrom that are performed according to the described embodiment of the invention. Indeed, this flow diagram should not be understood to mean that each operation described and shown requires a separate, distinct, sequential step. In fact, many of the described operations in the flow diagram of FIG. 7 will, in practice, be accomplished generally simultaneously.
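For purposes of explanation only, the flow of FIG. 7 can be approximated in software as in the Python sketch below. The helper names (is_port_address, read_word, slots, and so on) are illustrative assumptions rather than features of the described hardware, and, as just noted, the real hardware may perform several of these steps generally simultaneously.

    def direct_execution_loop(node):
        while True:
            from_port = node.is_port_address(node.P)        # "address" decision operation 124
            word = node.read_word(node.P)                   # operation 126 or 128 (a port read may sleep)
            node.instruction_register = word                # "retrieve instruction word" operation 130
            jumped = False
            for slot in node.slots(word):                   # "execute instruction word" operation 132
                if node.is_jump_or_call(slot):              # "jump" decision operation 134
                    node.P = node.jump_address(word)        # "load P register" operation 136
                    jumped = True
                    break
                node.execute(slot)
            if not jumped and not from_port:                # "port address" decision operation 138
                node.P += 1                                 # "increment P register" operation 140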
FIG. 8 is a flow diagram depicting an example of the inventive improved method for alerting a computer. As previously discussed herein, the computers 12 of the embodiment described will “go to sleep” while awaiting an input. Such an input can be from a neighboring computer 12, as in the embodiment described in relation to FIGS. 1 through 5. Alternatively, as was also discussed previously herein, the computers 12 that have communication ports 38 that abut the edge of the die 14 can have additional circuitry, either designed into such computer 12 or else external to the computer 12 but associated therewith, to cause such communication port 38 to act as an external I/O port 39. In either case, the inventive combination can provide the additional advantage that the “sleeping” computer 12 can be poised and ready to awaken and spring into some prescribed action when an input is received. Therefore, this invention also provides an alternative to the use of interrupts to handle inputs, whether such inputs come from an external input device, or from another computer 12 in the array 10.
Instead of causing a computer 12 to have to stop (or pause) what it is doing in order to handle an interrupt, the inventive combination described herein will allow a computer 12 to be in an "asleep but alert" state, as described above. Therefore, one or more computers 12 can be assigned to receive and act upon certain inputs. While there are numerous ways in which this feature might be used, an example that will serve to illustrate just one such "computer alert method" is illustrated in the view of FIG. 8 and is enumerated therein by the reference character 150. As can be seen in the view of FIG. 8, in an "enter alert state" operation 152, a computer 12 is caused to "go to sleep" such that it is awaiting input from a neighbor computer 12, or more than one (as many as all four) neighbor computers, or, in the case of an "edge" computer 12, an external input, or some combination of external inputs and/or inputs from a neighbor computer 12. As described previously herein, a computer 12 can "go to sleep" awaiting completion of either a read or a write operation. Where the computer 12 is being used, as described in this example, to await some possible "input", it would be natural to assume that the waiting computer has set its read line 18 high, awaiting a "write" from the neighbor or outside source. Indeed, it is presently anticipated that this will be the usual condition. However, it is within the scope of the invention that the waiting computer 12 will have set its write line 20 high and, therefore, that it will be awakened when the neighbor or outside source "reads" from it.
In an "awaken" operation 154, the sleeping computer 12 is caused to resume operation because the neighboring computer 12 or external device 39 has completed the transaction being awaited. If the transaction being awaited was the receipt of an instruction word 48 to be executed, then the computer 12 will proceed to execute the instructions therein. If the transaction being awaited was the receipt of data, then the computer 12 will proceed to execute the next instruction in queue, which will be either the instruction in the next slot 54 of the present instruction word 48 or, if no instructions remain in that word, the instruction in slot 0 of the next instruction word 48 to be loaded. In any case, while the computer 12 is being used in the described manner, that next instruction will begin a sequence of one or more instructions for handling the input just received. Options for handling such input can include reacting to perform some predefined function internally, communicating with one or more of the other computers 12 in the array 10, or even ignoring the input (just as conventional prior art interrupts may be ignored under prescribed conditions). The options are depicted in the view of FIG. 8 as an "act on input" operation 156. It should be noted that, in some instances, the content of the input may not be important. In some cases, for example, it may be only the very fact that an external device has attempted communication that is of interest.
If the computer 12 is assigned the task of acting as an "alert" computer, in the manner depicted in FIG. 8, then it will generally return to the "asleep but alert" status, as indicated in FIG. 8. However, the option is always open to assign the computer 12 some other task, such as when it is no longer necessary to monitor the particular input or inputs being monitored, or when it is more convenient to transfer that task to some other of the computers 12 in the array.
One skilled in the art will recognize that the above described operating mode will be useful as a more efficient alternative to the conventional use of interrupts. When a computer 12 has one or more of its read lines 18 (or a write line 20) set high, it can be said to be in an "alert" condition. In the alert condition, the computer 12 is ready to immediately execute any instruction sent to it on the data bus 16 corresponding to the read line or lines 18 that are set high or, alternatively, to act on data that is transferred over the data bus 16. Where there is an array of computers 12 available, one or more can be used, at any given time, to be in the above described alert condition such that any of a prescribed set of inputs will trigger it into action. This is preferable to using the conventional interrupt technique to "get the attention" of a computer, because an interrupt will cause a computer to have to store certain data, load certain data, and so on, in response to the interrupt request. According to the present invention, by contrast, a computer can be placed in the alert condition and dedicated to awaiting the input of interest, such that not a single instruction period is wasted in beginning execution of the instructions triggered by such input. Again, note that in the presently described embodiment, computers in the alert condition will actually be "asleep but alert", meaning that they are "asleep" in the sense that they are using essentially no power, but "alert" in that they will be instantly triggered into action by an input. However, it is within the scope of this aspect of the invention that the "alert" condition could be embodied in a computer even if it were not "asleep". The described alert condition can be used in essentially any situation where a conventional prior art interrupt (either a hardware interrupt or a software interrupt) might otherwise have been used.
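As a rough software analogy only (threads and queues standing in for the hardware read and write lines), the following Python sketch shows the character of the "asleep but alert" condition: the waiting node consumes no processing effort while blocked, yet begins handling an input the instant one arrives, with no interrupt-style saving and restoring of state.

    import queue
    import threading
    import time

    port = queue.Queue()                 # stands in for a data bus 16 / port 38

    def alert_node(handle_input):
        while True:
            word = port.get()            # "asleep": blocks here without busy-waiting
            handle_input(word)           # "awaken" and "act on input" with no saved/restored context

    threading.Thread(target=alert_node, args=(print,), daemon=True).start()
    port.put(0x12345)                    # a neighbor or external device "writes" to the port
    time.sleep(0.1)                      # give the alert node a moment to act (demonstration only)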
FIG. 9 is another example of a computer alert method 150 a. This is but one example wherein interaction between a monitoring computer 12 f (FIG. 1) and another computer 12 g (FIG. 1) that is assigned to some other task may be desirable or necessary. As can be seen in the view of FIG. 9, there are two generally independent flow charts, one for each of the computers 12 f and 12 g. This is indicative of the nature of the cooperative coprocessor approach of the present invention, wherein each of the computers 12 has its own assignment which it carries out generally independently, except for occasions when interaction is accomplished as described herein.
Regarding the computer 12 f, the "enter alert state" operation 152, the "awaken" operation 154, and the "act on input" operation 156 are each accomplished as described previously herein in relation to the first example of the computer alert method 150. However, because this example anticipates a possible need for interaction between the computers 12 f and 12 g, following the "act on input" operation 156 the computer 12 f enters a "send info?" decision operation 158 wherein, according to its programming, it is determined whether the input just received requires the attention of the other computer 12 g. If no, then the computer 12 f returns to alert status, or to some other alternative such as was discussed previously herein. If yes, then the computer 12 f initiates communication with the computer 12 g, as described in detail previously herein, in a "send to other" operation 160. It should be noted that, according to the choice of the programmer, the computer 12 f could be sending instructions, such as it may have generated internally in response to the input from the external device 82 or such as it may have received from the external device 82. Alternatively, the computer 12 f could pass on data to the computer 12 g, and such data could be internally generated in computer 12 f or else "passed through" from the external device 82. Still another alternative might be that the computer 12 f, in some situations, might attempt to read from the computer 12 g when it receives an input from the external device 82. All of these opportunities are available to the programmer.
Meanwhile, the computer 12 g is generally executing code to accomplish its assigned primary task, whatever that might be, as indicated in an "execute primary function" operation 162. However, if the programmer has decided that occasional interaction between the computers 12 f and 12 g is desirable, then the programmer will have provided that the computer 12 g occasionally pause to see if one or more of its neighbors has attempted a communication, as indicated in a "look for input" operation 166. In an "input?" decision operation 168, it is determined whether there is a communication waiting (as, for example, where the computer 12 f has already initiated a write to the computer 12 g). If a communication has been initiated (yes), then the computer 12 g will complete the communication, as described in detail previously herein, in a "receive from other" operation 170. If no, then the computer 12 g will return to the execution of its primary function 162, as shown in FIG. 9. After the "receive from other" operation 170, the computer 12 g will act on the input received in an "act on input" operation 172. As mentioned above, the programmer could have provided that the computer 12 g would be expecting instructions as an input, in which case the computer 12 g would execute the instructions as described previously herein. Alternatively, the computer 12 g might be programmed to be expecting data to act upon.
In the example of FIG. 9, it is shown that following the "act on input" operation 172, the computer 12 g returns to the accomplishment of its primary function (that is, it returns to the "execute primary function" operation 162). However, the possibility of even more complicated examples certainly exists. For instance, the programming might be such that certain inputs received from the computer 12 f will cause the computer 12 g to abort its previously assigned primary function and begin a new one, or else it might simply stop temporarily and await further input. As one skilled in the art will recognize, the various possibilities for action here are limited only by the imagination of the programmer.
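The following Python sketch, again using queues as stand-ins for the ports, illustrates the FIG. 9 division of labor in software terms. The functions needs_other, do_primary_work, and act_on_input are placeholders for whatever the programmer provides, as noted above.

    import queue

    external_in = queue.Queue()              # input from the external device 82
    f_to_g = queue.Queue()                   # port between computers 12 f and 12 g

    def needs_other(item):                   # placeholder programming for computer 12 f
        return True

    def do_primary_work():                   # placeholder primary task for computer 12 g
        pass

    def act_on_input(item):                  # placeholder input handling for computer 12 g
        print("handled", item)

    def monitor_step():                      # computer 12 f
        item = external_in.get()             # operations 152/154: asleep until an input arrives
        if needs_other(item):                # "send info?" decision operation 158
            f_to_g.put(item)                 # "send to other" operation 160

    def worker_step():                       # computer 12 g
        do_primary_work()                    # "execute primary function" operation 162
        try:
            item = f_to_g.get_nowait()       # "look for input" 166 / "input?" decision 168
        except queue.Empty:
            return                           # no communication waiting; resume primary function
        act_on_input(item)                   # "receive from other" 170 / "act on input" 172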
It should be noted that, according to the embodiment of the invention described herein, a given computer 12 need not be interrupted while it is performing a task, because another computer 12 is assigned the task of monitoring and handling inputs that might otherwise require an interrupt. However, it is interesting to note also that the computer 12 that is busy handling another task cannot be disturbed unless and until its programming provides that it look to its ports 38 for input. Therefore, it will sometimes be desirable to cause the computer 12 to pause to look for other inputs. It is important to realize that what is being described here is an example of a paradigm in computing that might be described as "cooperative multi-tasking", wherein tasks that might formerly have been accomplished by a single processor are divided, in new and interesting ways, among several processors.
FIG. 10 shows a flowchart summarizing one method 200 according to the present invention wherein one computer 12 (e.g., computer 12 e) uses a micro-loop 100, such as that shown in FIG. 6, to borrow the memory resources of another computer 12 (e.g., computer 12 b). In a first step 202, computer 12 b pushes a count value onto its return stack 28. Then, in a second step 204, the computer 12 b fetches an instruction word 48, including a group of instructions 52 (e.g., a micro-loop 100), into its instruction register. According to a particular method, computer 12 e of the computer array 10 provides the micro-loop to the computer 12 b via a data bus 16 connecting the two computers. Then, in a third step 206, the computer 12 b sequentially executes the instructions 52 in the micro-loop. In one instance, where computer 12 e wants to borrow memory space in the computer 12 b, the instructions cause the computer 12 b to store an incoming data word, asserted onto the data bus 16 by computer 12 e, to its memory, for example, in its RAM 24. According to another option, where the computer 12 e wants to retrieve information that has been previously stored in the computer 12 b, the instruction word 48 causes the computer 12 b to assert a data word stored in its memory (e.g., RAM 24) onto the data bus 16 between the computer 12 b and the other computer 12 e. Next, in a fourth step 208, the count value on the return stack 28 is changed (e.g., decremented). Then, in a fifth step 210, if the count value is greater than a predetermined value (e.g., 0), the computer 12 b executes the micro-loop instruction word 48 in the instruction register again. If the count value is not greater than the predetermined value, then method 200 ends.
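A behavioral Python sketch of method 200, under a simplified model of computer 12 b, follows. The attributes return_stack, ram, addr, and read_port are assumptions made for the example; in the described hardware the count sits on the return stack 28 and the data words arrive over the data bus 16.

    def borrow_memory(node, count):
        node.return_stack.append(count)          # step 202: push the count value onto the return stack
        # step 204: the micro-loop instruction word 48 supplied by computer 12 e is loaded
        # into the instruction register; the loop body below plays that role here.
        while True:
            word = node.read_port()              # step 206: computer 12 e asserts a data word...
            node.ram[node.addr] = word           # ...and computer 12 b stores it in its RAM
            node.addr += 1
            node.return_stack[-1] -= 1           # step 208: the count value is changed (decremented)
            if node.return_stack[-1] <= 0:       # step 210: repeat while the count exceeds zero
                node.return_stack.pop()
                break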
Various modifications may be made to the invention without altering its value or scope. For example, while this invention has been described herein using the example of the particular computers 12, many or all of the inventive aspects are readily adaptable to other computer designs, other sorts of computer arrays, and the like.
Similarly, while the present invention has been described primarily herein in relation to communications between computers 12 in an array 10 on a single die 14, the same principles and methods can be used, or modified for use, to accomplish other inter-device communications, such as communications between a computer 12 and its dedicated memory or between a computer 12 in an array 10 and an external device.
While specific examples of the inventive computer arrays 10, computers 12, micro-loops 100, direct execution method 120 and associated apparatus, and computer alert method 150 have been discussed herein, it is expected that there will be a great many applications for these which have not yet been envisioned. Indeed, it is one of the advantages of the present invention that the inventive method and apparatus may be adapted to a great variety of uses.
All of the above are only some of the examples of available embodiments of the present invention. Those skilled in the art will readily observe that numerous other modifications and alterations may be made without departing from the spirit and scope of the invention. Accordingly, the disclosure herein is not intended as limiting and the appended claims are to be interpreted as encompassing the entire scope of the invention.
INDUSTRIAL APPLICABILITY
The inventive computer arrays 10, computers 12, micro-loops 100, direct execution method 120 and associated apparatus, and computer alert method 150 are intended to be widely used in a great variety of computer applications. It is expected that they will be particularly useful in applications where significant computing power is required, and yet power consumption and heat production are important considerations.
As discussed previously herein, the applicability of the present invention is such that the sharing of information and resources between the computers in an array is greatly enhanced, both in speed and versatility. Also, communications between a computer array and other devices are enhanced according to the described method and means.
Since the computer arrays 10, computers 12, micro-loops 100, direct execution method 120 and associated apparatus, and computer alert method 150 of the present invention may be readily produced and integrated with existing tasks, input/output devices, and the like, and since the advantages as described herein are provided, it is expected that they will be readily accepted in the industry. For these and other reasons, it is expected that the utility and industrial applicability of the invention will be both significant in scope and long-lasting in duration.

Claims (60)

1. In a computer, an improvement comprising:
a plurality of instructions that are read simultaneously into an instruction register of said computer; and wherein
said computer repeats said plurality of instructions a quantity of iterations as indicated by a number on a stack of said computer, said number on said stack indicative of a number of data words to be transferred between said computer and another computer via a data bus;
said computer and said another computer are integrated in a single circuit substrate;
each time said computer executes said plurality of instructions, said computer stores one of said data words asserted on said data bus in a memory of the computer, said data word asserted on said data bus by said another computer; and
repeating said plurality of instructions said number of times causes said number of data words asserted on said data bus to be sequentially stored in said memory.
2. The computer of claim 1, wherein:
the number on the stack is decremented after each iteration.
3. The computer of claim 1, wherein:
a last instruction in said plurality of instructions is a NEXT instruction.
4. The computer of claim 3, wherein:
the NEXT instruction causes the number on the stack to change and further causes operation of the computer to continue at a first instruction in said plurality of instructions until the number on the stack reaches a predetermined value.
5. The computer of claim 1, wherein:
when a NEXT instruction is the last instruction in said plurality of instructions then the computer will continue operation at the first instruction in said plurality of instructions until the number on the stack reaches a predetermined value.
6. The computer of claim 5, wherein:
the predetermined value is zero.
7. The computer of claim 1, and further including:
a FOR instruction preceding said plurality of instructions.
8. The computer of claim 7, wherein:
said FOR instruction causes the number to be placed on the stack.
9. The computer of claim 7, wherein:
said FOR instruction is in a group of instructions immediately preceding said plurality of instructions.
10. A method for causing a computer to execute a loop, the method comprising:
configuring the computer such that when a NEXT instruction occurs as the last instruction in a group of instructions in an instruction register of the computer then the computer will sequentially execute the entire group of instructions in the instruction register after the NEXT instruction for a number of times indicated by a counter;
placing the NEXT instruction in the last position in said group of instructions;
placing a FOR instruction prior to said group of instructions wherein said FOR instruction sets the counter; and
selecting said group of instructions such that the sequential execution of the entire group of instructions said number of times causes a plurality of data words to be sequentially transferred between the computer and another computer via a data bus, the another computer located on a same integrated circuit as the computer; and wherein
the group of instructions is read by the computer simultaneously as a single unit into the instruction register;
each time the computer executes the group of instructions, the computer stores one of the plurality of data words asserted on the data bus in a memory of the computer, the data word asserted on the data bus by the another computer; and
sequentially executing the group of instructions the number of times causes the plurality of data words asserted on the data bus to be sequentially stored in the memory.
11. The method of claim 10, wherein:
said group of instructions includes four instructions.
12. The method of claim 10, wherein:
at least one of the instructions in said group of instructions is a NO-OP instruction, wherein the NO-OP instruction is a place holder that the computer will skip over without executing.
13. The method of claim 10, wherein:
the counter is a value on the top of a stack.
14. The method of claim 10, wherein:
the NEXT instruction causes the counter to change.
15. The method of claim 10, wherein:
the NEXT instruction causes the counter to be decremented.
16. A method for executing instructions in a computer, comprising:
(a) placing a number in a register, the number indicative of a number of data words to be transferred between the computer and another computer via a data bus;
(b) fetching a group of instructions into an instruction register;
(c) executing the instructions in said group of instructions sequentially;
(d) causing the number in the register to change; and
(e) repeating steps (c) and (d) until the number in the register reaches a predetermined value; and wherein
repeating steps (c) and (d) causes the number of data words to be sequentially transferred between a memory of said computer and said another computer, said another computer located on a same integrated circuit as said computer;
each time steps (c) and (d) are repeated, the computer stores one of the data words asserted on the data bus to the memory of the computer, the stored data word being asserted on the data bus by said another computer; and
repeating steps (c) and (d) cause the number of data words asserted on the data bus to be sequentially stored in the memory of the computer.
17. The method of claim 16, and further including:
fetching another group of instructions; and
continuing with the first instruction in said another group of instructions.
18. The method of claim 16, wherein:
in step (c) the instructions are executed from the instruction register.
19. The method of claim 16, wherein:
in step (b) the instructions in the group of instructions are fetched simultaneously.
20. The method of claim 16, wherein:
the register is the top element in a stack.
21. The method of claim 16, wherein:
in step (d) the number decrements.
22. A computer comprising:
means for reading a plurality of instructions generally simultaneously into an instruction register of a processor;
means for repeating said plurality of instructions a quantity of iterations while said plurality of instructions remain in said instruction register, said quantity of iterations indicative of a number of data words to be transferred between said computer and another computer; and
means for storing or asserting one of said data words from or on a data bus coupled between said computer and said another computer; and wherein
said computer is one of a plurality of computers formed in a single integrated circuit substrate;
said quantity of iterations is indicated by a number on a stack;
each time said computer executes said plurality of instructions said computer stores one of said data words asserted on said data bus by said another computer in a memory of said computer or each time said computer executes said plurality of instructions said computer asserts one of said data words stored in said memory on said data bus; and
repeating said plurality of instructions said quantity of iterations causes said number of data words to be sequentially stored in said memory or repeating said plurality of instructions said quantity of iterations causes said number of data words to be sequentially asserted on said data bus from said memory.
23. The computer of claim 1, wherein said plurality of instructions includes a complete program loop that is executed multiple times while said plurality of instructions remains in said instruction register.
24. The method of claim 10, wherein the group of instructions remains in the instruction register while the entire group of instructions is executed said number of times indicated by said counter.
25. The method of claim 16, wherein the group of instructions remain in the instruction register while the steps (c) and (d) are repeated until the number in the register reaches the predetermined value.
26. The computer of claim 1, wherein said another computer sends said plurality of instructions to said computer via said data bus.
27. The computer of claim 1, wherein each time said computer executes said plurality of instructions, said computer stores one of said data words at a previously-defined memory address in said memory and then changes said previously-defined memory address.
28. The computer of claim 1, further comprising:
a second plurality of instructions that are read simultaneously into said instruction register of said computer; and wherein
said second plurality of instructions is repeated a quantity of iterations as indicated by a second number on said stack of said computer, said second number indicative of a second number of data words to be transferred between said computer and said another computer;
said computer repeats said second plurality of instructions a quantity of iterations as indicated by said second number on said stack;
each time said computer executes said second plurality of instructions, said computer asserts one of said second number of data words stored in said memory on said data bus; and
repeating said second plurality of instructions said second number of times causes said second number of data words to be sequentially asserted on said data bus from said memory.
29. The computer of claim 28, wherein each time said computer executes said second plurality of instructions, said computer asserts one of said second number of data words located at a previously-defined memory address in said memory on said data bus and then changes said previously-defined memory address.
30. The computer of claim 28, wherein said number on said stack is equal to said second number on said stack.
31. The method of claim 10, wherein the another computer sends said group of instructions to said computer via the data bus.
32. The method of claim 10, wherein each time the computer executes the group of instructions, the computer stores one of the data words at a previously-defined memory address in the memory and then changes the previously-defined memory address.
33. The method of claim 10, further comprising:
placing the NEXT instruction in the last position in a second group of instructions;
placing a second FOR instruction prior to said second group of instructions wherein said second FOR instruction sets the counter to a second number; and
selecting said second group of instructions such that the sequential execution of the entire second group of instructions said second number of times causes a second plurality of data words to be sequentially transferred between the computer and the another computer over the data bus; and wherein
each time the computer executes the second group of instructions, the computer asserts one of the second plurality of data words stored in the memory on the data bus; and
sequentially executing the second plurality of instructions the second number of times causes the second plurality of data words to be sequentially asserted on the data bus from the memory of the computer.
34. The method of claim 33, wherein each time the computer executes the second group of instructions, the computer asserts one of the second number of data words located at a previously-defined memory address in the memory on the data bus and then changes the previously-defined memory address.
35. The method of claim 33, wherein the number indicated by the counter is equal to the second number indicated by the counter.
36. The method of claim 16, wherein the another computer sends said group of instructions to said computer via the data bus.
37. The method of claim 16, wherein each time the computer executes the group of instructions, the computer stores one of the data words at a previously-defined memory address in the memory and then changes the previously-defined memory address.
38. A method of claim 16, further comprising:
(f) placing a second number in the register, the second number indicative of a second number of data words to be transferred between the computer and the another computer;
(g) fetching a second group of instructions into the instruction register;
(h) executing the instructions in the second group of instructions sequentially;
(i) causing the second number in the register to change; and
(j) repeating steps (h) and (i) until the second number in the register reaches a predetermined value; and wherein
repeating steps (h) and (i) causes the second number of data words to be sequentially transferred between the computer and the another computer;
each time steps (h) and (i) are repeated, the computer asserts one of the second number of data words stored in the memory on the data bus; and
repeating steps (h) and (i) causes the second number of data words to be sequentially asserted on the data bus from the memory of the computer.
39. The method of claim 38, wherein each time the computer executes the second group of instructions, the computer asserts one of the second number of data words located at a previously-defined memory address in the memory on the data bus and then changes the previously-defined memory address.
40. The method of claim 38, wherein the number placed in the register is equal to the second number placed in the register.
41. A computer array comprising:
a first computer including a port and a second computer including a port, the port of the first computer coupled to the port of the second computer via a data bus; and wherein
the first computer is operative to transfer an instruction word to the second computer via the data bus, the instruction word including a plurality of instructions defining a loop operation;
the second computer is operative to load the plurality of instructions simultaneously into an instruction register of the second computer;
the second computer is operative to execute the plurality of instructions a quantity of iterations as indicated by a number on a stack of the second computer; and
each time the second computer executes the plurality of instructions, the second computer performs operations defined by the instruction word provided by the first computer such that the first computer is able to utilize at least one of a computational resource and a memory of the second computer.
42. The computer array of claim 41, wherein:
the number on the stack is decremented after each iteration.
43. The computer array of claim 41, wherein:
a last instruction in said plurality of instructions is a NEXT instruction.
44. The computer array of claim 43, wherein:
the NEXT instruction causes the number on the stack to change and further causes operation of the second computer to continue at a first instruction in said plurality of instructions until the number on the stack reaches a predetermined value.
45. The computer array of claim 41, wherein:
when a NEXT instruction is the last instruction in said plurality of instructions then the second computer will continue operation at the first instruction in said plurality of instructions until the number on the stack reaches a predetermined value.
46. The computer array of claim 45, wherein:
the predetermined value is zero.
47. The computer array of claim 41, and further including:
a FOR instruction preceding said plurality of instructions; and
wherein said FOR instruction causes the number to be placed on the stack.
48. The computer array of claim 47, wherein:
said FOR instruction is in a group of instructions immediately preceding said plurality of instructions, the group of instructions being provided to the second computer by the first computer.
49. The computer array of claim 41, wherein:
the quantity of iterations indicates a predetermined number of data words to be transferred between the first computer and the second computer via the data bus;
each time the second computer executes the plurality of instructions, the second computer either stores one of the predetermined number of data words from the data bus to the memory of the second computer or asserts one of the predetermined number of data words on the data bus from the memory; and
repeating the plurality of instructions for the quantity of iterations by the second computer causes the predetermined number of data words to be transferred between the first computer and the second computer.
50. The computer array of claim 49, wherein:
each time the second computer executes the plurality of instructions, the second computer stores one of the predetermined number of data words from the data bus to the memory;
the first computer is operative to transfer a second instruction word to the second computer via the data bus, the second instruction word including a second plurality of instructions defining a second loop operation;
the second computer is operative to load the second instruction word including the second plurality of instructions simultaneously into the instruction register of the second computer;
the second computer is operative to execute the second plurality of instructions a second quantity of iterations as indicated by a second number on the stack of the second computer;
the second quantity of iterations indicates a second predetermined number of data words to be transferred between the first computer and the second computer via the data bus;
each time the second computer executes the second plurality of instructions, the second computer asserts one of the second predetermined number of data words on the data bus from the memory; and
repeating the second plurality of instructions for the second quantity of iterations by the second computer causes the second predetermined number of data words to be transferred from the second computer to the first computer.
51. The computer array of claim 50, wherein:
the quantity of iterations equals the second quantity of iterations.
52. A method of operating a computer system including first and second independently-functioning computers having respective ports linked by a data bus, the method comprising:
transferring an instruction word from the first computer to the second computer via the ports and the data bus, the instruction word comprising a plurality of instructions defining a loop operation to be executed by the second computer;
reading at the second computer the plurality of instructions of the instruction word in one operation; and
executing at the second computer the plurality of instructions for a predetermined number of iterations; and wherein
each time the second computer executes the plurality of instructions, the second computer performs operations defined by the instruction word provided by the first computer such that the first computer is able to utilize at least one of a computational resource and a memory of the second computer.
53. The method of claim 52, further comprising:
transferring a predetermined number of data words between the first computer and the second computer; and wherein
each time the second computer executes the plurality of instructions, the second computer is operative to store one of the predetermined number of data words from the data bus to the memory of the second computer or to assert one of the predetermined number of data words on the data bus from the memory; and
the predetermined number of iterations indicates the predetermined number of data words to be transferred between the first computer and the second computer.
54. A method of claim 53, wherein:
each time the second computer executes the plurality of instructions, the second computer stores one of the predetermined number of data words from the data bus to the memory, said method further comprising:
reading at the second computer a second instruction word including a second plurality of instructions in one operation;
executing at the second computer the second plurality of instructions a second predetermined number of iterations;
transferring a second predetermined number of data words between the first computer and the second computer; and wherein
each time the second computer executes the second plurality of instructions, the second computer asserts one of the second predetermined number of data words on the data bus from the memory of the second computer; and
repeating the second plurality of instructions the second predetermined number of iterations causes the second computer to sequentially assert the second predetermined number of data words on the data bus from the memory.
55. The method of claim 54, wherein:
the predetermined number of iterations is equal to the second predetermined number of iterations.
56. The method of claim 52, wherein:
the instruction word defines a sequence of operations; and
the last instruction in the sequence is a NEXT instruction.
57. The method of claim 52, wherein:
the number of iterations is set by a FOR instruction in an instruction word occurring before the instruction word defining the loop operation.
58. An electronically-readable non-transitory storage medium having code embodied therein for causing a computer to perform a method for executing a loop, the method comprising:
configuring the computer such that when a NEXT instruction occurs as the last instruction in a group of instructions in an instruction register of the computer then the computer will sequentially execute the entire group of instructions in the instruction register after the NEXT instruction for a number of times indicated by a counter;
placing the NEXT instruction in the last position in said group of instructions;
placing a FOR instruction prior to said group of instructions wherein said FOR instruction sets the counter; and
selecting said group of instructions such that the sequential execution of the entire group of instructions said number of times causes a plurality of data words to be sequentially transferred between the computer and another computer via a data bus, the another computer located on a same integrated circuit as the computer; and wherein
the group of instructions is read by the computer simultaneously as a single unit into the instruction register;
each time the computer executes the group of instructions, the computer stores one of the plurality of data words asserted on the data bus in a memory of the computer, the data word asserted on the data bus by the another computer; and
sequentially executing the plurality of instructions the number of times causes the plurality of data words asserted on the data bus to be sequentially stored in the memory.
59. An electronically-readable non-transitory storage medium having code embodied therein for causing a computer to perform a method for executing instructions in the computer, said method comprising:
(a) placing a number in a register, the number indicative of a number of data words to be transferred between the computer and another computer via a data bus;
(b) fetching a group of instructions into an instruction register;
(c) executing the instructions in said group of instructions sequentially;
(d) causing the number in the register to change; and
(e) repeating steps (c) and (d) until the number in the register reaches a predetermined value; and wherein
repeating steps (c) and (d) causes the number of data words to be sequentially transferred between a memory of said computer and said another computer, said another computer located on a same integrated circuit as said computer;
each time steps (c) and (d) are repeated, the computer stores one of the data words asserted on the data bus to the memory of the computer, the read data word being asserted on the data bus by said another computer; and
repeating steps (c) and (d) cause the number of data words asserted on the data bus to be sequentially stored in the memory of the computer.
60. An electronically-readable non-transitory storage medium having code embodied therein for performing a method of operating a computer system including first and second independently-functioning computers having respective ports linked by a data bus, the method comprising:
transferring an instruction word from the first computer to the second computer via the ports and the data bus, the instruction word comprising a plurality of instructions defining a loop operation to be executed by the second computer;
reading at the second computer the plurality of instructions of the instruction word in one operation; and
executing at the second computer the plurality of instructions for a predetermined number of iterations; and wherein
each time the second computer executes the plurality of instructions, the second computer performs operations defined by the instruction word provided by the first computer such that the first computer is able to utilize at least one of a computational resource and a memory of the second computer.
US11/441,812 2006-02-16 2006-05-26 Processor and method for executing a program loop within an instruction word Expired - Fee Related US7913069B2 (en)

Priority Applications (32)

Application Number Priority Date Filing Date Title
US11/441,812 US7913069B2 (en) 2006-02-16 2006-05-26 Processor and method for executing a program loop within an instruction word
EP07250646A EP1821200B1 (en) 2006-02-16 2007-02-15 Method and apparatus for handling inputs to a single-chip multiprocessor system
AT07250649T ATE495491T1 (en) 2006-02-16 2007-02-15 EXECUTION OF INSTRUCTIONS DIRECTLY FROM THE INPUT SOURCE
DE602007011841T DE602007011841D1 (en) 2006-02-16 2007-02-15 Execution of instructions directly from the input source
AT07250646T ATE512400T1 (en) 2006-02-16 2007-02-15 METHOD AND APPARATUS FOR TREATING INPUTS TO A SINGLE-CHIP MULTI-PROCESSOR SYSTEM
EP07250644A EP1821198A1 (en) 2006-02-16 2007-02-15 Circular register arrays of a computer
EP07250647A EP1821211A3 (en) 2006-02-16 2007-02-15 Cooperative multitasking method in a multiprocessor system
EP07250649A EP1821202B1 (en) 2006-02-16 2007-02-15 Execution of instructions directly from input source
EP07250614A EP1821199B1 (en) 2006-02-16 2007-02-15 Execution of microloop computer instructions received from an external source
KR1020077009922A KR20090016644A (en) 2006-02-16 2007-02-16 Computer system with increased operating efficiency
PCT/US2007/004030 WO2007098006A2 (en) 2006-02-16 2007-02-16 Execution of instructions directly from input source
JP2008555370A JP2009527814A (en) 2006-02-16 2007-02-16 Allocating resources between arrays of computers
PCT/US2007/004081 WO2007098024A2 (en) 2006-02-16 2007-02-16 Allocation of resources among an array of computers
PCT/US2007/004083 WO2007098026A2 (en) 2006-02-16 2007-02-16 Method and apparatus for monitoring inputs to a computer
KR1020087022319A KR20090003217A (en) 2006-02-16 2007-02-16 Allocation of resources among an array of computers
KR1020077009924A KR20090004394A (en) 2006-02-16 2007-02-16 Execution of instructions directly from input source
TW096106394A TW200809613A (en) 2006-02-16 2007-02-16 Execution of instructions directly from input source
JP2008555353A JP2009527808A (en) 2006-02-16 2007-02-16 Micro loop computer instruction
KR1020077009923A KR20090017390A (en) 2006-02-16 2007-02-16 Microloop computer instructions
EP07750884A EP1984836A4 (en) 2006-02-16 2007-02-16 Allocation of resources among an array of computers
PCT/US2007/004029 WO2007098005A2 (en) 2006-02-16 2007-02-16 Microloop computer instructions
TW096106397A TW200809609A (en) 2006-02-16 2007-02-16 Microloop computer instructions
JP2008555354A JP2009527809A (en) 2006-02-16 2007-02-16 Executing instructions directly from the input source
TW096106396A TW200809531A (en) 2006-02-16 2007-02-16 Method and apparatus for monitoring inputs to a computer
JP2008555372A JP2009527816A (en) 2006-02-16 2007-02-16 Method and apparatus for monitoring input to a computer
PCT/US2007/004082 WO2007098025A2 (en) 2006-02-16 2007-02-16 Computer system with increased operating efficiency
KR1020077009925A KR20090016645A (en) 2006-02-16 2007-02-16 Method and apparatus for monitoring inputs to a computer
JP2008555371A JP2009527815A (en) 2006-02-16 2007-02-16 Computer system with increased operating efficiency
PCT/US2007/012539 WO2007139964A2 (en) 2006-05-26 2007-05-25 Circular register arrays of a computer
KR1020087028864A KR20090019806A (en) 2006-05-26 2007-05-25 Circular register arrays of a computer
JP2009513215A JP2009538488A (en) 2006-05-26 2007-05-25 Computer circular register array
US13/053,062 US8468323B2 (en) 2006-02-16 2011-03-21 Clockless computer using a pulse generator that is triggered by an event other than a read or write instruction in place of a clock

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US11/355,513 US7904695B2 (en) 2006-02-16 2006-02-16 Asynchronous power saving computer
US78826506P 2006-03-31 2006-03-31
US79734506P 2006-05-03 2006-05-03
US11/441,812 US7913069B2 (en) 2006-02-16 2006-05-26 Processor and method for executing a program loop within an instruction word

Related Parent Applications (2)

Application Number Title Priority Date Filing Date
US11/355,513 Continuation-In-Part US7904695B2 (en) 2005-05-26 2006-02-16 Asynchronous power saving computer
US13/053,062 Continuation-In-Part US8468323B2 (en) 2006-02-16 2011-03-21 Clockless computer using a pulse generator that is triggered by an event other than a read or write instruction in place of a clock

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US13/053,062 Division US8468323B2 (en) 2006-02-16 2011-03-21 Clockless computer using a pulse generator that is triggered by an event other than a read or write instruction in place of a clock

Publications (2)

Publication Number Publication Date
US20070192575A1 US20070192575A1 (en) 2007-08-16
US7913069B2 true US7913069B2 (en) 2011-03-22

Family

ID=38370136

Family Applications (2)

Application Number Title Priority Date Filing Date
US11/441,812 Expired - Fee Related US7913069B2 (en) 2006-02-16 2006-05-26 Processor and method for executing a program loop within an instruction word
US13/053,062 Expired - Fee Related US8468323B2 (en) 2006-02-16 2011-03-21 Clockless computer using a pulse generator that is triggered by an event other than a read or write instruction in place of a clock

Family Applications After (1)

Application Number Title Priority Date Filing Date
US13/053,062 Expired - Fee Related US8468323B2 (en) 2006-02-16 2011-03-21 Clockless computer using a pulse generator that is triggered by an event other than a read or write instruction in place of a clock

Country Status (1)

Country Link
US (2) US7913069B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160188326A1 (en) * 2012-09-27 2016-06-30 Texas Instruments Deutschland Gmbh Processor with instruction iteration

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7937557B2 (en) * 2004-03-16 2011-05-03 Vns Portfolio Llc System and method for intercommunication between computers in an array
KR100730280B1 (en) * 2005-12-06 2007-06-19 삼성전자주식회사 Apparatus and Method for Optimizing Loop Buffer in Reconfigurable Processor
US7966481B2 (en) 2006-02-16 2011-06-21 Vns Portfolio Llc Computer system and method for executing port communications without interrupting the receiving computer
KR101738641B1 (en) 2010-12-17 2017-05-23 삼성전자주식회사 Apparatus and method for compilation of program on multi core system
WO2020220935A1 (en) * 2019-04-27 2020-11-05 中科寒武纪科技股份有限公司 Operation apparatus

Citations (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4462074A (en) * 1981-11-19 1984-07-24 Codex Corporation Do loop circuit
JPS6412339A (en) 1987-07-06 1989-01-17 Oki Electric Ind Co Ltd Forth machine
US4884193A (en) 1985-09-21 1989-11-28 Lang Hans Werner Wavefront array processor
US4984151A (en) * 1985-03-01 1991-01-08 Advanced Micro Devices, Inc. Flexible, next-address generation microprogram sequencer
JPH03176757A (en) 1989-11-21 1991-07-31 Deutsche Itt Ind Gmbh Array processor
US5375238A (en) * 1990-11-20 1994-12-20 Nec Corporation Nesting management mechanism for use in loop control system
US5386585A (en) 1993-02-03 1995-01-31 Intel Corporation Self-timed data pipeline apparatus using asynchronous stages having toggle flip-flops
US5440749A (en) 1989-08-03 1995-08-08 Nanotronics Corporation High performance, low cost microprocessor architecture
EP0724221A2 (en) 1995-01-26 1996-07-31 International Business Machines Corporation Method and apparatus for executing dissimilar seq. of instructions in the processor of a single-instruction-multiple data (SIMD) computer
WO1997015001A2 (en) 1995-10-06 1997-04-24 Patriot Scientific Corporation Risc microprocessor architecture
US5657485A (en) 1994-08-18 1997-08-12 Mitsubishi Denki Kabushiki Kaisha Program control operation to execute a loop processing not immediately following a loop instruction
US5727194A (en) 1995-06-07 1998-03-10 Hitachi America, Ltd. Repeat-bit based, compact system and method for implementing zero-overhead loops
US5752259A (en) * 1996-03-26 1998-05-12 Advanced Micro Devices, Inc. Instruction cache configured to provide instructions to a microprocessor having a clock cycle time less than a cache access time of said instruction cache
US6003128A (en) * 1997-05-01 1999-12-14 Advanced Micro Devices, Inc. Number of pipeline stages and loop length related counter differential based end-loop prediction
JP2000181878A (en) 1998-12-15 2000-06-30 Nec Corp Common memory type vector processing system, its control method and storage medium stored with control program for vector processing
US6219685B1 (en) 1998-09-04 2001-04-17 Intel Corporation Method to detect IEEE overflow and underflow conditions
US6223282B1 (en) * 1997-12-29 2001-04-24 Samsung Electronics Co., Ltd. Circuit for controlling execution of loop in digital signal processing chip
US6279101B1 (en) * 1992-08-12 2001-08-21 Advanced Micro Devices, Inc. Instruction decoder/dispatch
US6308229B1 (en) 1998-08-28 2001-10-23 Theseus Logic, Inc. System for facilitating interfacing between multiple non-synchronous systems utilizing an asynchronous FIFO that uses asynchronous logic
US6367005B1 (en) 1998-04-21 2002-04-02 Idea Corporation Of Delaware System and method for synchronizing a register stack engine (RSE) and backing memory image with a processor's execution of instructions during a state saving context switch
US6427204B1 (en) * 1999-06-25 2002-07-30 International Business Machines Corporation Method for just in-time delivery of instructions in a data processing system
JP2003044292A (en) 2001-08-02 2003-02-14 Kyushu Univ Multiplex parallel processor, multiplex parallel processing method and multiplex parallel processing program executed by computer
WO2003019356A1 (en) 2001-08-22 2003-03-06 Adelante Technologies B.V. Pipelined processor and instruction loop execution method
US20040003219A1 (en) * 2002-06-26 2004-01-01 Teruaki Uehara Loop control circuit and loop control method
US6732253B1 (en) * 2000-11-13 2004-05-04 Chipwrights Design, Inc. Loop handling for single instruction multiple datapath processor architectures
US6825843B2 (en) * 2002-07-18 2004-11-30 Nvidia Corporation Method and apparatus for loop and branch instructions in a programmable graphics pipeline
US20050223204A1 (en) * 2004-03-30 2005-10-06 Nec Electronics Corporation Data processing apparatus adopting pipeline processing system and data processing method used in the same
US20060101238A1 (en) * 2004-09-17 2006-05-11 Pradip Bose Adaptive fetch gating in multithreaded processors, fetch control and method of controlling fetches
US20060149925A1 (en) * 1991-07-08 2006-07-06 Seiko Epson Corporation High-performance superscalar-based computer system with out-of-order instruction execution and concurrent results distribution
US7136989B2 (en) * 2001-10-01 2006-11-14 Nec Corporation Parallel computation processor, parallel computation control method and program thereof
US20070113058 (en) * 2005-11-14 2007-05-17 Texas Instruments Incorporated Microprocessor with independent SIMD loop buffer
US7386689B2 (en) 2000-08-31 2008-06-10 Micron Technology, Inc. Method and apparatus for connecting a massively parallel processor array to a memory array in a bit serial manner

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4295193A (en) * 1979-06-29 1981-10-13 International Business Machines Corporation Machine for multiple instruction execution
JPH0623954B2 (en) * 1985-03-29 1994-03-30 富士通株式会社 Performance adjustment method for information processing equipment
JPS6282402A (en) * 1985-10-07 1987-04-15 Toshiba Corp Sequence controller
US7409570B2 (en) * 2005-05-10 2008-08-05 Sony Computer Entertainment Inc. Multiprocessor system for decrypting and resuming execution of an executing program after transferring the program code between two processors via a shared main memory upon occurrence of predetermined condition

Patent Citations (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4462074A (en) * 1981-11-19 1984-07-24 Codex Corporation Do loop circuit
US4984151A (en) * 1985-03-01 1991-01-08 Advanced Micro Devices, Inc. Flexible, next-address generation microprogram sequencer
US4884193A (en) 1985-09-21 1989-11-28 Lang Hans Werner Wavefront array processor
JPS6412339A (en) 1987-07-06 1989-01-17 Oki Electric Ind Co Ltd Forth machine
US5440749A (en) 1989-08-03 1995-08-08 Nanotronics Corporation High performance, low cost microprocessor architecture
US6598148B1 (en) 1989-08-03 2003-07-22 Patriot Scientific Corporation High performance microprocessor having variable speed system clock
JPH03176757A (en) 1989-11-21 1991-07-31 Deutsche Itt Ind Gmbh Array processor
US5375238A (en) * 1990-11-20 1994-12-20 Nec Corporation Nesting management mechanism for use in loop control system
US20060149925A1 (en) * 1991-07-08 2006-07-06 Seiko Epson Corporation High-performance superscalar-based computer system with out-of-order instruction execution and concurrent results distribution
US6279101B1 (en) * 1992-08-12 2001-08-21 Advanced Micro Devices, Inc. Instruction decoder/dispatch
US5386585A (en) 1993-02-03 1995-01-31 Intel Corporation Self-timed data pipeline apparatus using asynchronous stages having toggle flip-flops
US5657485A (en) 1994-08-18 1997-08-12 Mitsubishi Denki Kabushiki Kaisha Program control operation to execute a loop processing not immediately following a loop instruction
EP0724221A2 (en) 1995-01-26 1996-07-31 International Business Machines Corporation Method and apparatus for executing dissimilar seq. of instructions in the processor of a single-instruction-multiple data (SIMD) computer
US5727194A (en) 1995-06-07 1998-03-10 Hitachi America, Ltd. Repeat-bit based, compact system and method for implementing zero-overhead loops
WO1997015001A2 (en) 1995-10-06 1997-04-24 Patriot Scientific Corporation Risc microprocessor architecture
US5752259A (en) * 1996-03-26 1998-05-12 Advanced Micro Devices, Inc. Instruction cache configured to provide instructions to a microprocessor having a clock cycle time less than a cache access time of said instruction cache
US6003128A (en) * 1997-05-01 1999-12-14 Advanced Micro Devices, Inc. Number of pipeline stages and loop length related counter differential based end-loop prediction
US6223282B1 (en) * 1997-12-29 2001-04-24 Samsung Electronics Co., Ltd. Circuit for controlling execution of loop in digital signal processing chip
US6367005B1 (en) 1998-04-21 2002-04-02 Idea Corporation Of Delaware System and method for synchronizing a register stack engine (RSE) and backing memory image with a processor's execution of instructions during a state saving context switch
US6308229B1 (en) 1998-08-28 2001-10-23 Theseus Logic, Inc. System for facilitating interfacing between multiple non-synchronous systems utilizing an asynchronous FIFO that uses asynchronous logic
US6219685B1 (en) 1998-09-04 2001-04-17 Intel Corporation Method to detect IEEE overflow and underflow conditions
JP2000181878A (en) 1998-12-15 2000-06-30 Nec Corp Common memory type vector processing system, its control method and storage medium stored with control program for vector processing
US6427204B1 (en) * 1999-06-25 2002-07-30 International Business Machines Corporation Method for just in-time delivery of instructions in a data processing system
US7386689B2 (en) 2000-08-31 2008-06-10 Micron Technology, Inc. Method and apparatus for connecting a massively parallel processor array to a memory array in a bit serial manner
US6732253B1 (en) * 2000-11-13 2004-05-04 Chipwrights Design, Inc. Loop handling for single instruction multiple datapath processor architectures
JP2003044292A (en) 2001-08-02 2003-02-14 Kyushu Univ Multiplex parallel processor, multiplex parallel processing method and multiplex parallel processing program executed by computer
WO2003019356A1 (en) 2001-08-22 2003-03-06 Adelante Technologies B.V. Pipelined processor and instruction loop execution method
US7136989B2 (en) * 2001-10-01 2006-11-14 Nec Corporation Parallel computation processor, parallel computation control method and program thereof
US20040003219A1 (en) * 2002-06-26 2004-01-01 Teruaki Uehara Loop control circuit and loop control method
US6825843B2 (en) * 2002-07-18 2004-11-30 Nvidia Corporation Method and apparatus for loop and branch instructions in a programmable graphics pipeline
US20050223204A1 (en) * 2004-03-30 2005-10-06 Nec Electronics Corporation Data processing apparatus adopting pipeline processing system and data processing method used in the same
US20060101238A1 (en) * 2004-09-17 2006-05-11 Pradip Bose Adaptive fetch gating in multithreaded processors, fetch control and method of controlling fetches
US20070113058 (en) * 2005-11-14 2007-05-17 Texas Instruments Incorporated Microprocessor with independent SIMD loop buffer

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
CN Application No. 200780000015.6, Office Action dated Dec. 18, 2009.
EP Application No. 07250614.0, Office Action dated Jun. 13, 2007.
European Application No. 07250614.0, European Search Report dated Jun. 13, 2007.
PCT Application No. PCT/US2007/004029, International Preliminary Report on Patentability dated Sep. 18, 2008.
PCT Application No. PCT/US2007/004029, International Search Report and Written Opinion dated Aug. 25, 2008.
Schmidt et al.; "Datawave: A Single-Chip Multiprocessor for Video Applications"; 1991; IEEE Micro; pp. 22-25 and 88-94. *
Koopman, Philip Jr.; "Stack Computers: The New Wave"; 1989; Mountain View Press, La Honda, CA; pp. 15-23 & pp. 32-48 (Chap. 1 & Chap. 3).
W3Schools, "VBScript Looping Statements", Jul. 2000, http://www.w3schools.com/vbscript/vbscript-looping.asp. *
W3Schools, "VBScript Looping Statements", Jul. 2000, http://www.w3schools.com/vbscript/vbscript—looping.asp. *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160188326A1 (en) * 2012-09-27 2016-06-30 Texas Instruments Deutschland Gmbh Processor with instruction iteration
US11520580B2 (en) * 2012-09-27 2022-12-06 Texas Instruments Incorporated Processor with instruction iteration

Also Published As

Publication number Publication date
US8468323B2 (en) 2013-06-18
US20110179251A1 (en) 2011-07-21
US20070192575A1 (en) 2007-08-16

Similar Documents

Publication Publication Date Title
US20100281238A1 (en) Execution of instructions directly from input source
US5752071A (en) Function coprocessor
US8468323B2 (en) Clockless computer using a pulse generator that is triggered by an event other than a read or write instruction in place of a clock
US20080282062A1 (en) Method and apparatus for loading data and instructions into a computer
EP1840742A2 (en) Method and apparatus for operating a computer processor array
US7904615B2 (en) Asynchronous computer communication
US7904695B2 (en) Asynchronous power saving computer
EP1821211A2 (en) Cooperative multitasking method in a multiprocessor system
US7966481B2 (en) Computer system and method for executing port communications without interrupting the receiving computer
US7934075B2 (en) Method and apparatus for monitoring inputs to an asynchronous, homogenous, reconfigurable computer array
JP2009009550A (en) Communication for data
EP1821202B1 (en) Execution of instructions directly from input source
JP2009009549A (en) System and method for processing data by series of computers
US20100325389A1 (en) Microprocessor communications system
US20090300334A1 (en) Method and Apparatus for Loading Data and Instructions Into a Computer
US20040123073A1 (en) Data processing system having a cartesian controller
JP2002278753A (en) Data processing system
JPH04156645A (en) Semiconductor integrated circuit

Legal Events

Date Code Title Description
AS Assignment

Owner name: TECHNOLOGY PROPERTIES LIMITED, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MOORE, CHARLES H.;FOX, JEFFREY ARTHUR;RIBLE, JOHN W.;REEL/FRAME:020894/0503;SIGNING DATES FROM 20061130 TO 20070504

AS Assignment

Owner name: VNS PORTFOLIO LLC, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:TECHNOLOGY PROPERTIES LIMITED;REEL/FRAME:020988/0234

Effective date: 20080423

AS Assignment

Owner name: VNS PORTFOLIO LLC, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:TECHNOLOGY PROPERTIES LIMITED;REEL/FRAME:021839/0420

Effective date: 20081114

AS Assignment

Owner name: TECHNOLOGY PROPERTIES LIMITED LLC, CALIFORNIA

Free format text: LICENSE;ASSIGNOR:VNS PORTFOLIO LLC;REEL/FRAME:022353/0124

Effective date: 20060419

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: ARRAY PORTFOLIO LLC, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MOORE, CHARLES H.;GREENARRAYS, INC.;REEL/FRAME:030289/0279

Effective date: 20130127

AS Assignment

Owner name: ARRAY PORTFOLIO LLC, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:VNS PORTFOLIO LLC;REEL/FRAME:030935/0747

Effective date: 20130123

REMI Maintenance fee reminder mailed
FEPP Fee payment procedure

Free format text: PAT HOLDER CLAIMS SMALL ENTITY STATUS, ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: LTOS); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

REFU Refund

Free format text: REFUND - SURCHARGE FOR LATE PAYMENT, LARGE ENTITY (ORIGINAL EVENT CODE: R1554); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

Free format text: REFUND - PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: R1551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

FPAY Fee payment

Year of fee payment: 4

SULP Surcharge for late payment
FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20190322