WO2009158370A2 - Loop control system and method - Google Patents
Loop control system and method
- Publication number
- WO2009158370A2 (PCT/US2009/048370)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- loop
- predicate
- instructions
- value
- instruction
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/54—Link editing before load time
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/445—Exploiting fine grain parallelism, i.e. parallelism at instruction level
- G06F8/4452—Software pipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30072—Arrangements for executing specific machine instructions to perform conditional operations, e.g. using predicates or guards
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/32—Address formation of the next instruction, e.g. by incrementing the instruction counter
- G06F9/322—Address formation of the next instruction, e.g. by incrementing the instruction counter for non-sequential address
- G06F9/325—Address formation of the next instruction, e.g. by incrementing the instruction counter for non-sequential address for loops, e.g. loop detection or loop counter
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3853—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution of compound instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
Definitions
- the present disclosure is generally related to loop control systems and methods.
- wireless computing devices such as portable wireless telephones, personal digital assistants (PDAs), and paging devices that are small, lightweight, and easily carried by users.
- portable computing devices such as cellular telephones and IP telephones
- portable wireless devices also incorporate other types of devices.
- a wireless telephone can also include a digital still camera, a digital video camera, a digital recorder, and an audio file player.
- wireless telephones can process executable instructions, including software applications, such as a web browser application that can be used to access the Internet. As such, these wireless telephones can include significant computing capabilities.
- Executable instructions repeated within a software application may be executed by a processor as a software pipelined loop.
- Software pipelining is a method for scheduling non-dependent instructions from different logical iterations of a program loop to execute concurrently. Overlapping instructions from different logical iterations of the loop increases an amount of parallelism for efficient processing. For example, a first loop instruction and a second loop instruction may be executed in parallel at separate execution units of a processor in a computing device such as a wireless mobile device, the first instruction corresponding to a first loop iteration while the second instruction corresponds to a second loop iteration.
- additional instructions to prevent data hazards due to data dependencies between instructions when filling the pipeline (e.g., prolog instructions)
- to prevent memory access hazards when emptying the pipeline (e.g., epilog instructions)
- additional memory may not be readily available at a wireless computing device.
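The overlap of loop iterations described above can be pictured as a schedule in which, during cycle c, stage s of the loop body works on iteration c - s. The following is a minimal sketch in Python and is not taken from the application; the function name `pipeline_schedule` and the use of `None` for a slot with no valid iteration are illustrative assumptions.

```python
def pipeline_schedule(num_stages, num_iterations):
    """Return, for each cycle, the iteration handled by each pipeline stage.

    Stage s (0-based) handles iteration (cycle - s) during a 1-based cycle.
    None marks a slot with no valid iteration (pipeline filling or draining).
    """
    total_cycles = num_iterations + num_stages - 1
    schedule = []
    for cycle in range(1, total_cycles + 1):
        row = []
        for stage in range(num_stages):
            iteration = cycle - stage
            valid = 1 <= iteration <= num_iterations
            row.append(iteration if valid else None)
        schedule.append(row)
    return schedule
```

Calling `pipeline_schedule(2, 3)` gives four rows: the first and last rows each contain one `None`, which are the slots that prolog and epilog instructions would otherwise have to cover.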
- in a particular embodiment, a system includes a hardware loop control logic circuit.
- the hardware loop control logic circuit includes a detection unit to detect an end of loop indicator of a program loop, a decrement unit to decrement a loop count and to decrement a predicate trigger counter, and a comparison unit to compare the predicate trigger counter to a reference to determine when to set a predicate value.
- the system also includes a processor that executes a special instruction that triggers execution of the hardware loop control logic circuit. Use of the system with the hardware loop control logic circuit enables software pipeline loops to be executed without prolog instructions, thereby using reduced memory.
- in another particular embodiment, an apparatus includes a predicate count register to store a predicate trigger count.
- the apparatus also includes an initialization logic circuit to initialize loop parameters of a program loop.
- the apparatus includes a processor to execute loop instructions of the program loop and to execute a packet including an end of loop indicator.
- the apparatus also includes a logic circuit to modify the predicate trigger count and to modify a loop count of the program loop.
- the apparatus also includes a comparison logic circuit to compare the predicate trigger count to a reference value.
- the apparatus further includes a logic circuit to change a value of a predicate that affects at least one instruction in the program loop based on a result of the comparison.
- a method of processing loop instructions includes initializing loop parameters in special registers where the special registers include a predicate trigger count.
- the method also includes executing the loop instructions and executing a packet having an end of loop indicator.
- the method further includes modifying the predicate trigger count and modifying a loop count.
- the method includes changing a value of a predicate that affects at least one of the loop instructions.
- the method includes automatically initializing a predicate trigger counter to indicate a number of iterations of the loop to execute before setting a predicate value upon execution of a particular type of loop instruction.
- the method also includes executing the set of instructions during a loop iteration and, upon detecting an end of loop indicator of the loop, automatically triggering loop control hardware to modify the predicate trigger counter and to compare the predicate trigger counter to a reference to determine when to set the predicate value. At least one of the instructions in the set of instructions is conditionally executed based on the predicate value.
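The method summarized above (initialize a predicate trigger counter, execute the loop body, and on each end of loop indicator modify the counters and compare against a reference) can be sketched in software as follows. This is a behavioral model under stated assumptions, not the claimed hardware: the function name `run_predicated_loop`, the count-down convention, and the default reference of zero are choices made for illustration.

```python
def run_predicated_loop(loop_count, predicate_trigger_count, reference=0):
    """Return the predicate value observed by each loop iteration.

    Assumption: the counter counts down and the predicate is set when the
    counter first reaches the reference value.
    """
    predicate = False                    # predicate starts cleared
    trigger = predicate_trigger_count    # iterations to run before setting it
    seen = []
    while loop_count > 0:
        # Loop body executes; conditional instructions read the predicate.
        seen.append(predicate)
        # End of loop indicator detected: both counters are modified.
        loop_count -= 1
        if trigger > reference:
            trigger -= 1
            if trigger == reference:
                predicate = True         # comparison result sets the predicate
    return seen
```

With a loop count of six and a trigger count of three, the first three iterations observe a false predicate and the remaining three observe a true one.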
- One particular advantage provided by at least one of the disclosed embodiments is reduced code size, lower power operation, and higher speed processing of instructions that are executed as pipelined software loops.
- Other aspects, advantages, and features of the present disclosure will become apparent after review of the entire application, including the following sections: Brief Description of the Drawings, Detailed Description, and the Claims.
- FIG. 1 is a block diagram of a first illustrative embodiment of a loop control system
- FIG. 2 is a block diagram of a second illustrative embodiment of a loop control system
- FIG. 3 is a general diagram that illustrates processing of a software pipelined loop
- FIG. 4 is a flow chart of a first illustrative embodiment of a loop control method that may be performed by the loop control system of FIG. 1 or FIG. 2;
- FIG. 5 is a flow chart of a second illustrative embodiment of a loop control method that may be performed by the loop control system of FIG. 1 or FIG. 2; and
- FIG. 6 is a block diagram of a particular illustrative embodiment of a wireless processing device including a software pipelined loop hardware control logic circuit with a predicate counter.
- a first illustrative embodiment of a loop control system is depicted and generally designated 100.
- the system 100 may be part of a computer, a portable wireless device, a wireless telephone, or any other device that executes software instructions.
- the system 100 includes a processor 102 with a hardware loop control logic circuit 104.
- the processor 102 is configured to execute loop instructions 120.
- the loop instructions 120 include at least one conditionally executed loop instruction 122 that uses a predicate logic circuit 110.
- the conditionally executed loop instruction 122 may be executed by the processor 102 when the predicate logic circuit 110 stores a predicate value evaluating to true, and the conditionally executed loop instruction 122 may not be executed when the predicate logic circuit 110 stores a predicate value evaluating to false. Executing the loop instructions 120 enables the processor 102 to efficiently perform repeated operations, such as for a multimedia software application or a digital signal processing operation, for example.
- Loop control values 106 are accessible to the hardware loop control logic circuit
- the predicate set logic circuit 108 is coupled to the predicate logic circuit 110.
- the predicate logic circuit 110 may include a latch or other storage device adapted to store a data bit having a false value (e.g., a logical "0" value) or a true value (e.g., a logical "1" value).
- the hardware loop control logic circuit 104 includes circuitry that is adapted to recognize a beginning of a software loop corresponding to the loop instructions 120.
- the hardware loop control logic circuit 104 may be adapted to initially set and to modify the loop control values 106 to initialize and control execution of the loop instructions 120 by the processor 102.
- the hardware loop control logic circuit 104 is adapted to initialize a predicate counter 124 of the loop control values 106.
- the loop control values 106 may include other values to control an operation of a loop, such as a loop start address and a number of loop iterations, as illustrative examples.
- the predicate counter 124 may be initialized by the hardware loop control logic circuit 104 to a value corresponding to a number of processing cycles to fill a software pipelined loop that includes the loop instructions 120 at the processor 102.
- the processor 102 may execute each loop iteration in a pipelined manner as multiple successive pipeline stages that may be performed concurrently at multiple execution units (not shown).
- the predicate counter 124 may be initialized to a value of three.
- An example of a software pipelined loop having a depth of three pipeline stages is depicted in FIG. 3.
- the hardware loop control logic circuit 104 may further be adapted to detect a loop iteration condition at the processor 102 and to modify the predicate counter 124 for each iteration of the loop.
- the predicate counter 124 may hold an initialization value that is successively decremented in response to the hardware loop control logic circuit 104 until the value of the predicate counter 124 reaches a reference value.
- the reference value may correspond to a value of the predicate counter 124 when the software pipelined loop is fully pipelined, where all instructions of the software loop process data that is not invalid due to pipeline dependencies.
- a later operation uses data produced by an earlier operation within a loop iteration
- the loop is pipelined so that the earlier operation is performed at a first pipeline stage and the later operation is performed at a later pipeline stage that is executed concurrently with the first pipeline stage
- the dependency of the later pipeline stage on data produced at the first pipeline stage will cause the later pipeline stage to process invalid data until the data produced at the first pipeline stage is received at the later pipeline stage.
- the predicate set logic circuit 108 may be adapted to set a predicate value stored at the predicate logic circuit 110 in response to detecting a value of the predicate counter 124.
- the predicate set logic circuit 108 includes comparison logic circuitry (not shown) to perform a comparison between a value at the predicate counter 124 and the reference value.
- the predicate set logic circuit 108 may be configured to automatically set a predicate value stored at the predicate logic circuit 110.
- the predicate logic circuit 110 may be initialized to store a false condition
- the predicate counter 124 may be initialized to a number of pipeline stages of the software pipelined loop
- the reference value may be zero.
- the predicate set logic circuit 108 may automatically change the predicate value at the predicate logic circuit 110 to a true condition.
- the true condition of the predicate value stored at the predicate logic circuit 110 may be provided to the conditionally executed loop instruction 122 to affect processing of the loop instructions 120 at the processor 102.
- the predicate counter 124 may be set to zero and the reference value may be set to the number of pipeline stages of the software pipelined loop, and the predicate counter 124 may be incremented in response to each loop iteration.
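The two counting conventions described above (count down to a reference of zero, or count up from zero to a reference equal to the fill count) set the predicate after the same number of loop iterations. The sketch below illustrates that equivalence; the function names and the parameter `fill_count` are hypothetical and stand in for the initialization value discussed in the text.

```python
def iterations_until_set_countdown(fill_count):
    """Counter starts at fill_count and is decremented toward reference 0."""
    counter, reference = fill_count, 0
    iterations = 0
    while counter != reference:
        counter -= 1          # one decrement per detected loop iteration
        iterations += 1
    return iterations


def iterations_until_set_countup(fill_count):
    """Counter starts at 0 and is incremented toward reference fill_count."""
    counter, reference = 0, fill_count
    iterations = 0
    while counter != reference:
        counter += 1          # one increment per detected loop iteration
        iterations += 1
    return iterations
```

Either convention only changes what the comparison logic checks for; the iteration at which the predicate becomes true is the same in this model.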
- loop initialization and loop control at the processor 102 may be performed by hardware elements including the hardware loop control logic circuit 104, latches or other devices to store the loop control values 106, and circuitry to implement the predicate set logic circuit 108.
- by using hardware to implement loop control logic for a software pipelined loop, software loops may be encoded in a compact form in which the pipelined loop stages used to initialize the software pipeline, referred to as the prolog, are replaced with one or more conditionally executed loop instructions 122 working in conjunction with the hardware of the system 100.
- Referring to FIG. 2, a second illustrative embodiment of a loop control system is depicted and generally designated 200.
- the system 200 includes a processor 202, a hardware loop control logic circuit 204, and a loop parameter control register 206.
- the hardware loop control logic circuit 204 may correspond to the hardware loop control logic circuit 104 depicted in FIG. 1.
- Data stored at the loop parameter control register 206 may correspond to the loop control values 106 depicted in FIG. 1
- the processor 202 may correspond to the processor 102 depicted in FIG. 1.
- the loop parameter control register 206 includes a start address register 212 storing data representing a starting address of a software pipelined loop to be executed at the processor 202.
- the loop parameter control register 206 also includes a loop count register 214 that stores a loop count value corresponding to the software pipelined loop.
- the loop parameter control register 206 further includes a predicate trigger count register 216 storing a predicate trigger count value associated with the software pipelined loop to be executed at the processor 202.
- the loop parameter control register 206 is responsive to control inputs received from the hardware loop control logic circuit 204.
- the hardware loop control logic circuit 204 includes an initialization unit 220, a decrement unit 222, a comparison unit 230, a detection unit 228, and a predicate change unit 234.
- the initialization unit 220 may be responsive to a special instruction 240 executed at the processor 202.
- the initialization unit 220 may be adapted to determine a starting address and to set a value at the start address register 212.
- the initialization unit 220 may further be adapted to set an initial value of a loop counter 224 of the decrement unit 222.
- the initialization unit 220 may also be adapted to set an initial value of a predicate trigger counter 226 of the decrement unit 222.
- the decrement unit 222 is responsive to the detection unit 228 to decrement a value of the loop counter 224 and the predicate trigger counter 226 in response to a control input from the detection unit 228 indicating a completion of a loop iteration at the processor 202.
- the loop counter 224 may be initialized to a total number of iterations of loop instructions 250 to be performed at the processor 202, and may be decremented in response to each loop iteration detected at the detection unit 228.
- the predicate trigger counter 226 may be initialized to a value corresponding to a number of execution cycles required to completely fill a pipeline of a software pipelined loop so that sequential pipeline stages are executed using valid data from previous stages.
- the predicate trigger counter 226 may be decremented in response to loop iterations detected by the detection unit 228.
- the loop counter 224 and the predicate trigger counter 226 may write values to the loop count register 214 and the predicate trigger count register 216, respectively, and may update the respective values in response to an operation of the decrement unit 222.
- the detection unit 228 is configured to detect an end-of-loop condition at the processor 202.
- the detection unit 228 may include a parsing logic circuit to parse a very long instruction word (VLIW) packet 254 with an end-of-loop indicator at the processor 202.
- the end-of-loop indicator includes a predetermined bit field having a specified value within the VLIW packet 254.
- the detection unit 228 provides a control input to the decrement unit 222 to decrement one or both of the counters 224 and 226.
- the comparison unit 230 is responsive to a value stored at the predicate trigger count register 216.
- the comparison unit 230 may include a comparator 232 that is adapted to compare a value of the predicate trigger count register 216 to a reference value and to provide an output of the comparison to the predicate change unit 234.
- the reference value may be zero
- the comparison unit 230 may be configured to provide a zero value output to the predicate change unit 234 until the predicate trigger count register 216 has a zero or negative value.
- the comparator 232 is adapted to automatically identify a transition from a one value to a zero value of the predicate trigger counter 226, such as via the predicate trigger count register 216.
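The comparator behavior described above, with a reference value of zero, can be sketched as two small functions: one producing a zero output until the count is zero or negative, and one identifying the one-to-zero transition. The function names are illustrative assumptions, and integer outputs are used only to mirror the "zero value output" wording.

```python
def comparator_output(trigger_count, reference=0):
    """Output 0 until the count is at or below the reference, then 1."""
    return 1 if trigger_count <= reference else 0


def is_one_to_zero_transition(previous_count, current_count):
    """Identify the step at which the trigger count goes from one to zero."""
    return previous_count == 1 and current_count == 0
```

Tracing a count that steps through 3, 2, 1, 0 gives comparator outputs 0, 0, 0, 1, and only the last step is reported as the transition.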
- the predicate change unit 234 is responsive to the control signal received from the comparison unit 230 to set or to reset a predicate value stored at the predicate logic circuit 210.
- the predicate change unit 234 may be configured to initialize a predicate value stored at the predicate logic circuit 210 to a false condition.
- the predicate change unit 234 may set the predicate value at the predicate logic circuit 210 to a true value.
- the predicate change unit 234 may also be responsive to the initialization unit 220 to clear the predicate value stored at the predicate logic circuit 210 prior to an execution of loop instructions.
- the predicate logic circuit 210 may include one or more hardware components configured to store a logical true or false value.
- the predicate logic circuit 210 may be accessible to the processor 202 to be used in conjunction with executing the loop instructions 250.
- the processor 202 is configured to receive and to execute instructions associated with the software pipelined loop.
- the processor 202 is configured to execute a special instruction 240 that may designate initialization values and control values associated with a subsequent software pipelined loop.
- the initialization values and control values of the special instruction 240 may be detected by or provided to the hardware loop control logic circuit 204.
- the processor 202 is configured to receive and to execute the loop instructions 250 as a software pipelined loop.
- the processor 202 may be adapted to execute one or more of the loop instructions 250 in parallel, such as at multiple parallel execution units of the processor 202.
- the processor 202 may execute the loop instructions 250 as software pipelined instructions, such that a single iteration of the loop instructions 250 may be performed in various sequential pipeline stages at the processor 202.
- the loop instructions 250 include at least one conditionally executed loop instruction 252.
- the at least one conditionally executed loop instruction 252 is responsive to a predicate value stored at the predicate logic circuit 210 to determine a condition of execution.
- the conditionally executed loop instruction 252 conditionally stores data based on a predicate value at the predicate logic circuit 210 so that values calculated before the predicate value is set to "true" are not stored.
- the conditionally executed loop instruction 252 may include a write command to write data to a memory, such as to an output register (not shown), based on computations performed earlier in a current iteration of the loop instructions 250.
- Executing the conditionally executed loop instruction 252 before the loop is fully pipelined would write invalid data to the memory when the write is performed before the data generated by the earlier computations is received. Therefore, execution of the conditionally executed loop instruction 252 may be conditioned on a predicate value stored at the predicate logic circuit 210, where the predicate value stored at the predicate logic circuit 210 indicates a condition of the software pipelined loop corresponding to the loop instructions 250.
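The effect of conditioning the write on the predicate can be illustrated with a small data-flow model of a four-instruction packet. Everything concrete here is an assumption made for illustration: the operations (load, add one, double, store), the use of `None` for not-yet-valid data, the name `run_pipelined`, and the simplification that the pipeline is drained by re-running the same packet with the load producing no value.

```python
def run_pipelined(inputs, fill_count=3, use_predicate=True):
    """Simulate a packet {A, B, C, if (P) D} and return the values D stores."""
    a_out = b_out = c_out = None     # inter-stage values; None means invalid
    predicate = False
    trigger = fill_count
    stored = []
    for cycle in range(len(inputs) + fill_count):
        # All instructions in the packet read values from earlier cycles.
        if predicate or not use_predicate:
            stored.append(c_out)                                 # D: store
        new_c = None if b_out is None else b_out * 2             # C: double
        new_b = None if a_out is None else a_out + 1             # B: add one
        new_a = inputs[cycle] if cycle < len(inputs) else None   # A: load
        a_out, b_out, c_out = new_a, new_b, new_c
        # End of packet: count down and set the predicate at zero.
        if trigger > 0:
            trigger -= 1
            if trigger == 0:
                predicate = True
    return stored
```

With the predicate in use, only results computed from valid inputs are stored; with `use_predicate=False`, the first three stores write invalid (`None`) values, which is the hazard the predicate avoids.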
- Referring to FIG. 3, a particular illustrative embodiment of processing a software pipelined loop is depicted and generally designated 300.
- Representative instruction pipeline stages 302, 304, 306, and 308 represent pipeline stages of a software pipelined loop.
- a predicate value 310 indicates a value at a predicate that is designated "P3" and that may be accessed by one or more of the instructions executed at the instruction pipeline stages 302-308.
- a hardware predicate loop counter 312 indicates a countdown value corresponding to the software pipelined loop.
- each of the instruction pipeline stages 302-308, the predicate value 310, and the hardware predicate loop counter 312 are depicted for consecutive clock cycles, beginning with clock cycle 1 at a loop beginning time period, and proceeding to clock cycle 23 at a later time period.
- each clock cycle corresponds to an execution cycle at a pipelined processor.
- the system 300 represents execution of the loop instructions 120 at the processor 102 depicted in FIG. 1, with the predicate value 310 reflecting the predicate value stored at the predicate logic circuit 110, and with the hardware predicate loop counter 312 corresponding to the predicate counter 124.
- the system 300 represents execution of the loop instructions 250 at the processor 202 depicted in FIG. 2, with the predicate value 310 corresponding to a predicate value stored at the predicate logic circuit 210, and with the hardware predicate loop counter 312 corresponding to an output of the predicate trigger counter 226 stored at the predicate trigger count register 216.
- the software pipelined loop is initiated via a special instruction illustrated in an exploded view as a loop initialization instruction 330.
- the loop initialization instruction 330 includes an instruction name 334 having the form spNLoop, where "N" has a value of three.
- the loop initialization instruction 330 includes data fields that include program loop setup information. For example, the loop initialization instruction 330 includes a first data field 336 corresponding to a start address of a software loop.
- the loop initialization instruction 330 also has a second data field 338 corresponding to a loop count that indicates a number of iterations of the loop to be performed.
- the loop initialization instruction 330, when executed by a processor, may return an initial value corresponding to an initialization of a predicate, such as a value of the predicate P3 332, which corresponds to the predicate value 310.
- the loop initialization instruction 330 may indicate a start address of an instruction of the loop, a number of iterations of the loop, and may further indicate, by a value of "N" in the name 334, an initial value of a hardware predicate loop counter 312.
- the name sp3Loop indicates an initial value of three at the hardware predicate loop counter 312.
- Other values of "N" may be used to indicate other initial values of the hardware predicate loop counter 312.
- "sp1Loop" may indicate an initial value of one
- "sp2Loop" may indicate an initial value of two.
- the initial value of the hardware predicate loop counter 312 may be set to prevent execution of conditional operations until the loop is sufficiently pipelined.
- "N" may be a positive integer less than four and may indicate a prolog count or a number of loops of a program loop to execute before changing the predicate value 310.
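The relationship between the instruction name of the form spNLoop and the initial value of the hardware predicate loop counter can be sketched as a small decoder. This is only an illustration of the naming convention in the text, where "N" is a positive integer less than four; the function name and the strict pattern are assumptions, and the operand syntax of the instruction (start address and loop count) is not modeled.

```python
import re

# Matches the name form spNLoop with N restricted to 1, 2, or 3.
_SP_LOOP_NAME = re.compile(r"^sp([1-3])Loop$")


def initial_predicate_count(instruction_name):
    """Return the initial predicate loop counter value encoded in the name."""
    match = _SP_LOOP_NAME.match(instruction_name)
    if match is None:
        raise ValueError(f"not an spNLoop name: {instruction_name!r}")
    return int(match.group(1))
```

For example, `initial_predicate_count("sp3Loop")` returns 3, while a name outside the described range raises `ValueError` in this sketch.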
- the software pipelined loop begins, illustrated as including a VLIW packet having instructions labeled A, B, C, and D.
- the instructions A, B, C, and D may each be performed in parallel at the processor, such as at multiple execution units of a single processor.
- the instructions A, B, C, and D may be sequential in that instruction B may use data that is generated by instruction A.
- instruction C may use data that is generated by instruction A, instruction B, or any combination thereof.
- instruction D may use data generated by any of instructions A, B, C, or any combination thereof.
- Instruction D may write data indicative of an output of each particular loop iteration to a memory.
- instruction D may perform a computation using results from each of the operations A, B, and C, and may store a resulting value to an output register. Therefore, instruction D should not be executed until the software pipelined loop is fully pipelined so that each of instructions A, B, and C is sequentially executed before instruction D to ensure that the input to instruction D consists of valid values.
- Instructions that may be sequentially executed before the software pipeline is completely filled are generally designated as the prolog 320.
- the portion of the execution of the software pipeline loop where the pipeline is full is designated as the kernel 322.
- the portion of the software pipeline loop where a final execution of the first instruction has completed but other pipeline instructions have yet to be executed, is generally referred to as the epilog 324.
- an initial value of "three" is stored at the hardware predicate loop counter 312.
- the predicate value P3 310 is initialized to a value of false.
- the software loop begins with execution of instruction A for the first iteration of the loop. Instructions B, C, and D may also be executed in parallel with
- instruction D includes a write instruction to store data at a memory, instruction D should not be executed until the data to be written is valid data indicating an output of instructions A,
- instruction D may be conditionally executed based on the predicate value 310, illustrated as a shading of the pipeline stage indicating non-execution of the instruction in a particular clock cycle. Because the predicate value 310 is false, the conditional write instruction D in the fourth instruction pipeline stage 308 is not performed.
- instruction B receives output from instruction A and is executed for the first iteration of the loop, indicated as B(1).
- instruction A is executed using data associated with the second iteration of the loop, indicated as A(2).
- Instructions C and D may be executed; however, an input value and consequently an output of each of instructions C and D may be undefined due to a data dependency on prior instructions.
- the hardware predicate loop counter 312 is decremented from a value of "three" to a value of "two," and the predicate value 310 remains false. Because the predicate value 310 is false, the conditional write instruction D in the fourth instruction pipeline stage 308 is not performed.
- instruction C at the third instruction pipeline stage 306 is executed corresponding to the first iteration of the loop.
- Instruction B is executed at the second instruction pipeline stage 304 corresponding to the second iteration of the loop, and instruction A is executed at the first instruction pipeline stage 302 corresponding to a third iteration of the loop.
- the hardware predicate loop counter 312 is decremented from a value of "two" to a value of "one," and the predicate value 310 remains false. Because the predicate value 310 is false, the conditional write instruction D is not performed.
- the prolog portion 320 of the software loop has ended and the kernel portion 322 has begun.
- the software pipeline has been filled and each of the instruction pipeline stages 302-308 operates on valid data.
- the hardware predicate loop counter 312 is decremented to the value "zero," indicating that the pipeline is full and that the prolog portion 320 is finished.
- the predicate value 310 is set to a true condition.
- the predicate value 310 is set by hardware logic circuitry that is configured to compare a value of the predicate loop counter 312 to a reference value, such as the comparator 232 depicted in FIG. 2.
- the loop remains in the kernel portion 322 of execution where the pipeline remains full and all pipeline stages 302-308 execute instructions in sequential order to accommodate data dependencies between the instructions. Because the predicate value 310 evaluates to true, all instructions including instruction D are performed during clock cycle four and continuing through clock cycle twenty.
- the epilog portion 324 begins where the first pipeline stage 302 has completed executing instruction A for all twenty loop iterations, but the remaining pipeline stages 304, 306, and 308 continue processing instructions associated with prior iterations of the software loop. For example, at clock cycle twenty-one, execution of instruction B corresponds to iteration 20, instruction C corresponds to iteration 19, and instruction D corresponds to iteration 18.
- the prolog portion 320 and the kernel portion 322 may be performed using a single VLIW packet including instructions A, B, C, and D, where execution of instruction D is conditional based on the predicate value P3 310, and including an end of loop indicator, denoted as "{A, B, C, if (P3) D}:endloop."
- the kernel code (i.e., the VLIW packet including the instructions A, B, C, and D) is executed in both the prolog portion 320 and the kernel portion 322.
- epilog VLIW packets may be used, such as: {NOP, B, C, D}, {NOP, NOP, C, D}, and {NOP, NOP, NOP, D}, where NOP indicates no operation at a particular execution unit.
- Such epilog instructions ensure that earlier pipeline instructions do not access unauthorized portions of memory when executed beyond the last loop iteration.
- the epilog portion 324 may instead perform the kernel instructions when one or more input data sources may be safely accessed outside loop boundaries, such as additional memory read operations that may be safely performed by the instruction A at clock cycles 21, 22, and 23.
- the predicate value 310 may be set to restrict execution of conditionally executed data dependent instructions, such as instruction D, to the kernel when the pipeline is full.
- Such software pipelined loop processing may be performed using the predicate logic circuit 110 and the predicate counter 124 in conjunction with the processor 102 of FIG. 1, or by using the predicate logic circuit 210, the predicate trigger counter 226, and the predicate trigger count register 216 in conjunction with the processor 202 depicted in FIG. 2.
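The cycle-by-cycle walkthrough of FIG. 3 can be reproduced with simple index arithmetic. The following is a minimal sketch, not the patented hardware: it assumes a four-stage pipeline (A, B, C, D), twenty iterations, one packet per clock cycle, and at least as many iterations as stages. The function name `pipeline_schedule` and the row layout are hypothetical, chosen only for illustration.

```python
def pipeline_schedule(iterations=20, stages=("A", "B", "C", "D")):
    """Return one (cycle, phase, slots) row per clock cycle.

    Stage k (0-based) works on iteration `cycle - k` whenever that
    iteration exists; otherwise its slot is None (no valid data).
    Assumes iterations >= len(stages).
    """
    depth = len(stages)
    rows = []
    # The last iteration leaves the last stage at cycle iterations + depth - 1.
    for cycle in range(1, iterations + depth):
        slots = {}
        for k, name in enumerate(stages):
            it = cycle - k
            slots[name] = it if 1 <= it <= iterations else None
        if cycle < depth:
            phase = "prolog"   # pipeline is still filling
        elif cycle <= iterations:
            phase = "kernel"   # every stage holds valid data
        else:
            phase = "epilog"   # pipeline is draining
        rows.append((cycle, phase, slots))
    return rows
```

Under these assumptions, the row for clock cycle 21 places instruction B on iteration 20, C on iteration 19, and D on iteration 18, which agrees with the epilog description above; the kernel phase spans clock cycles 4 through 20.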
- a flow chart of a first illustrative embodiment of a loop control method is depicted and generally designated 400.
- the method of processing a set of instructions in a loop 400 may be performed using one or more of the systems depicted in FIGs. 1 and 2.
- a predicate trigger counter is automatically initialized to indicate a number of iterations of the loop before setting a predicate value upon execution of a particular type of loop instruction.
- the set of instructions may be executed as a software pipelined loop, and the predicate trigger counter may be based on a number of pipeline stages of the software pipelined loop.
- the particular type of loop instruction is the loop initialization instruction 330 depicted in FIG. 3.
- the set of instructions is executed during a loop iteration. At least one of the instructions in the set of instructions is conditionally executed based on the predicate value. For example, at least one of the instructions in the set of instructions that is conditionally executed may conditionally write data to an output register based on a predicate value.
- loop control hardware modifying the predicate trigger counter is automatically triggered.
- the loop control hardware may decrement the predicate trigger counter in response to detecting the end of loop indicator.
- the predicate trigger counter is compared to a reference to determine when to set the predicate value. In a particular embodiment, the reference is a zero value.
- execution of the conditionally executed instruction may be controlled by initializing the predicate trigger counter and setting the predicate in response to a comparison of the predicate trigger counter to the reference.
- Execution of a software pipelined loop without separate prolog and kernel instructions, such as depicted in FIG. 3, is therefore enabled and may be performed using the systems depicted in FIG. 1 and FIG. 2.
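The counter behavior described for the method 400 (initialize on a loop instruction, modify on each end of loop indicator, compare to a reference) can be modeled in software. This is a hedged sketch only: the class `PredicateTriggerCounter`, its method names, and the helper `cycles_with_write` are hypothetical, and the timing assumption (the decrement takes effect for the next packet) is inferred from the FIG. 3 walkthrough rather than stated by the source. A counter initialized directly to the reference value is not modeled.

```python
class PredicateTriggerCounter:
    """Software model of a predicate trigger counter with a zero reference."""

    REFERENCE = 0  # reference value the counter is compared against

    def __init__(self):
        self.count = 0
        self.predicate = False

    def loop_init(self, trigger_count):
        """Loop initialization instruction: load the counter, clear the predicate."""
        self.count = trigger_count
        self.predicate = False

    def end_of_loop(self):
        """End of loop indicator detected: decrement, then compare to the reference."""
        if self.count > self.REFERENCE:
            self.count -= 1
            if self.count == self.REFERENCE:
                self.predicate = True


def cycles_with_write(cycles, trigger_count):
    """Return the clock cycles on which the predicated instruction executes."""
    ctl = PredicateTriggerCounter()
    ctl.loop_init(trigger_count)
    written = []
    for cycle in range(1, cycles + 1):
        if ctl.predicate:      # conditionally executed instruction
            written.append(cycle)
        ctl.end_of_loop()      # each packet carries the end of loop indicator
    return written
```

With a trigger count of three, `cycles_with_write(6, 3)` yields cycles 4, 5, and 6: the conditional instruction is suppressed for the first three packets and enabled afterwards.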
- loop parameters are initialized in special registers including a predicate trigger count.
- a predicate value is initialized to a false condition, and the predicate trigger count corresponds to a pipeline depth of a software pipelined loop.
- the loop instructions are executed.
- the loop instructions include kernel code but do not include prolog instructions.
- the kernel code may include a set of instructions of a software pipelined loop.
- an instruction having an end of loop indicator is executed.
- the predicate trigger count is modified and a loop count is modified.
- when the predicate trigger count equals a reference value, a value of a predicate that affects at least one of the loop instructions is changed.
- the loop instructions may include at least one instruction that is conditionally executed based on the predicate.
- When the predicate trigger counter is initialized to a software pipeline depth, decrementing the predicate trigger count to equal the reference value "zero" may indicate an end of a prolog portion of the software pipelined loop and a beginning of a kernel portion of the loop when the pipeline is filled.
- a conditionally executed instruction that is executed based on the predicate may therefore not be executed until the pipeline is filled.
- kernel instructions may also be executed in the prolog when the predicate is used to prevent execution of instructions that may generate harmful results before the pipeline is sufficiently filled.
- the systems depicted in FIG. 1 and FIG. 2 provide examples of systems on which the method 500 may be performed.
- the loop parameters may be initialized in the loop parameter control register 206 of FIG. 2.
- the loop count and the predicate trigger count may be decremented by the decrement unit 222.
- the value of the predicate 210 may be changed by the predicate change unit 234 of FIG. 2.
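To illustrate why suppressing the conditional write during the prolog matters, the sketch below runs kernel code without separate prolog instructions on a toy data flow. Everything specific here is an assumption made for illustration: the function `run_pipelined_loop`, the computation `(x + 1) * 2`, the inter-stage registers `r1` to `r3`, and the trigger count of three (taken from the FIG. 3 walkthrough of a four-stage pipeline). Instructions within one packet are assumed to read register values from before the packet executes.

```python
def run_pipelined_loop(xs):
    """Compute [(x + 1) * 2 for x in xs] with a 4-stage software pipeline."""
    if len(xs) < 3:
        raise ValueError("sketch assumes at least 3 iterations")
    loop_count = len(xs)   # hypothetical loop count register
    trigger_count = 3      # packets needed before stage D sees valid data
    predicate = False      # P3 in the text, initialized to false
    r1 = r2 = r3 = None    # inter-stage registers; None means undefined
    load_idx = 0
    out = []

    def packet(run_a, run_b, run_c, run_d):
        nonlocal r1, r2, r3, load_idx
        old1, old2, old3 = r1, r2, r3   # values from before this packet
        if run_d:
            out.append(old3)            # D: write the result
        if run_c:
            r3 = old2 * 2 if old2 is not None else None   # C
        if run_b:
            r2 = old1 + 1 if old1 is not None else None   # B
        if run_a:
            r1 = xs[load_idx]           # A: load the next input
            load_idx += 1

    # Prolog and kernel share one packet: {A, B, C, if (P3) D}:endloop
    while loop_count > 0:
        packet(True, True, True, predicate)
        # End of loop indicator: loop control modifies both counts.
        loop_count -= 1
        if trigger_count > 0:
            trigger_count -= 1
            if trigger_count == 0:
                predicate = True
    # Epilog packets: {NOP, B, C, D}, {NOP, NOP, C, D}, {NOP, NOP, NOP, D}
    packet(False, True, True, True)
    packet(False, False, True, True)
    packet(False, False, False, True)
    return out
```

In this model, running the same packet with D unpredicated would append three undefined (`None`) results before the first valid one; gating D on the predicate yields exactly one write per iteration.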
- Referring to FIG. 6, a block diagram of a particular illustrative embodiment of a wireless processing device including a software pipelined loop hardware control logic circuit with a predicate counter 664 is depicted and generally designated 600.
- the device 600 includes a processor, such as a digital signal processor (DSP) 610, coupled to a memory 632.
- the software pipelined loop hardware control logic circuit with the predicate counter 664 may include one or more of the systems depicted in FIG. 1 and FIG. 2 and may operate in accordance with one or more of FIGs. 3-5, or any combination thereof.
- the system 600 is a wireless phone.
- FIG. 6 also shows a display controller 626 that is coupled to the digital signal processor 610 and to a display 628.
- a coder/decoder (CODEC) 634 can also be coupled to the digital signal processor 610.
- a speaker 636 and a microphone 638 can be coupled to the CODEC 634.
- a modem 640 can be coupled to the digital signal processor 610 and further coupled to a wireless antenna 642.
- the DSP 610, the display controller 626, the memory 632, the CODEC 634, and the modem 640 are included in a system-in-package or system-on-chip device 622.
- an input device 630 and a power supply 644 are coupled to the on-chip system 622.
- the display 628, the input device 630, the speaker 636, the microphone 638, the wireless antenna 642, and the power supply 644 are external to the system-on-chip device 622.
- each can be coupled to a component of the system-on-chip device 622, such as an interface or a controller.
- the software pipelined loop hardware control logic with the predicate counter 664 may be used to enable efficient software pipelined loop processing at the digital signal processor 610.
- the software pipelined loop hardware control logic circuit with the predicate counter 664 may include circuits or devices to detect a loop initialization instruction, an end of loop instruction, or both, at the digital signal processor 610, and may be operative to control a loop operation at the digital signal processor 610 by controlling values of one or more loop counters, such as a prolog counter, one or more predicates, or any combination thereof.
- the software pipelined loop hardware control logic circuit with the predicate counter 664 may be separate from one or more processors, such as at a control portion of the system-on-chip device 622.
- the software pipelined loop may be implemented in any processor that has one or more parallel pipelines that enable instructions in the same software loop to be executed across the one or more parallel pipelines.
- the device 600 may be any wireless processing device, such as a personal digital assistant (PDA), an audio player, an internet protocol (IP) phone, a cellular phone, a mobile phone, a laptop computer, a notebook computer, a tablet computer, any other system that may process a software pipelined loop, or any combination thereof.
- a software module may reside in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, hard disk, a removable disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium known in the art.
- An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
- the storage medium may be integral to the processor.
- the processor and the storage medium may reside in an application-specific integrated circuit (ASIC).
- the ASIC may reside in a computing device or a user terminal.
- the processor and the storage medium may reside as discrete components in a computing device or user terminal.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200980123763.2A CN102067087B (en) | 2008-06-27 | 2009-06-24 | Loop control system and method |
JP2011516552A JP5536052B2 (en) | 2008-06-27 | 2009-06-24 | Loop control system and method |
EP09770903A EP2304557A2 (en) | 2008-06-27 | 2009-06-24 | Loop control system and method |
KR1020117002173A KR101334863B1 (en) | 2008-06-27 | 2009-06-24 | Loop control system and method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/147,893 | 2008-06-27 | ||
US12/147,893 US20090327674A1 (en) | 2008-06-27 | 2008-06-27 | Loop Control System and Method |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2009158370A2 (en) | 2009-12-30 |
WO2009158370A3 (en) | 2010-02-25 |
Family
ID=41306021
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2009/048370 WO2009158370A2 (en) | 2008-06-27 | 2009-06-24 | Loop control system and method |
Country Status (7)
Country | Link |
---|---|
US (1) | US20090327674A1 (en) |
EP (1) | EP2304557A2 (en) |
JP (3) | JP5536052B2 (en) |
KR (1) | KR101334863B1 (en) |
CN (1) | CN102067087B (en) |
TW (1) | TW201015431A (en) |
WO (1) | WO2009158370A2 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2725483A3 (en) * | 2012-10-23 | 2015-06-17 | Analog Devices Global | Predicate counter |
US9201828B2 (en) | 2012-10-23 | 2015-12-01 | Analog Devices, Inc. | Memory interconnect network architecture for vector processor |
EP2680132A3 (en) * | 2012-06-29 | 2016-01-06 | Analog Devices, Inc. | Staged loop instructions |
US9342306B2 (en) | 2012-10-23 | 2016-05-17 | Analog Devices Global | Predicate counter |
CN109643270A (en) * | 2016-08-24 | 2019-04-16 | 谷歌有限责任公司 | Multi-layer testing external member generates |
Families Citing this family (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7987347B2 (en) * | 2006-12-22 | 2011-07-26 | Broadcom Corporation | System and method for implementing a zero overhead loop |
US7991985B2 (en) * | 2006-12-22 | 2011-08-02 | Broadcom Corporation | System and method for implementing and utilizing a zero overhead loop |
JP5300294B2 (en) * | 2008-03-25 | 2013-09-25 | パナソニック株式会社 | Processing device, obfuscation device, program and integrated circuit |
KR101645001B1 (en) | 2009-02-18 | 2016-08-02 | 삼성전자주식회사 | Apparatus and method for generating VLIW instruction and VLIW processor and method for processing VLIW instruction |
EP2367102B1 (en) * | 2010-02-11 | 2013-04-10 | Nxp B.V. | Computer processor and method with increased security properties |
CN104115113B (en) * | 2011-12-14 | 2018-06-05 | 英特尔公司 | For cycling the systems, devices and methods of remaining mask instruction |
US10083032B2 (en) * | 2011-12-14 | 2018-09-25 | Intel Corporation | System, apparatus and method for generating a loop alignment count or a loop alignment mask |
US9632779B2 (en) * | 2011-12-19 | 2017-04-25 | International Business Machines Corporation | Instruction predication using instruction filtering |
KR101991680B1 (en) | 2012-01-25 | 2019-06-21 | 삼성전자 주식회사 | Hardware debugging apparatus and method of software pipelined program |
US9280344B2 (en) * | 2012-09-27 | 2016-03-08 | Texas Instruments Incorporated | Repeated execution of instruction with field indicating trigger event, additional instruction, or trigger signal destination |
CN103777922B (en) * | 2012-10-23 | 2018-05-22 | 亚德诺半导体集团 | Count of predictions device |
US9830164B2 (en) * | 2013-01-29 | 2017-11-28 | Advanced Micro Devices, Inc. | Hardware and software solutions to divergent branches in a parallel pipeline |
US9633409B2 (en) * | 2013-08-26 | 2017-04-25 | Apple Inc. | GPU predication |
US20160019061A1 (en) * | 2014-07-21 | 2016-01-21 | Qualcomm Incorporated | MANAGING DATAFLOW EXECUTION OF LOOP INSTRUCTIONS BY OUT-OF-ORDER PROCESSORS (OOPs), AND RELATED CIRCUITS, METHODS, AND COMPUTER-READABLE MEDIA |
US9348595B1 (en) | 2014-12-22 | 2016-05-24 | Centipede Semi Ltd. | Run-time code parallelization with continuous monitoring of repetitive instruction sequences |
US9135015B1 (en) | 2014-12-25 | 2015-09-15 | Centipede Semi Ltd. | Run-time code parallelization with monitoring of repetitive instruction sequences during branch mis-prediction |
US9208066B1 (en) | 2015-03-04 | 2015-12-08 | Centipede Semi Ltd. | Run-time code parallelization with approximate monitoring of instruction sequences |
US10296346B2 (en) | 2015-03-31 | 2019-05-21 | Centipede Semi Ltd. | Parallelized execution of instruction sequences based on pre-monitoring |
US10296350B2 (en) | 2015-03-31 | 2019-05-21 | Centipede Semi Ltd. | Parallelized execution of instruction sequences |
US9715390B2 (en) | 2015-04-19 | 2017-07-25 | Centipede Semi Ltd. | Run-time parallelization of code execution based on an approximate register-access specification |
GB2548603B (en) * | 2016-03-23 | 2018-09-26 | Advanced Risc Mach Ltd | Program loop control |
US10248908B2 (en) * | 2017-06-19 | 2019-04-02 | Google Llc | Alternative loop limits for accessing data in multi-dimensional tensors |
US11614941B2 (en) * | 2018-03-30 | 2023-03-28 | Qualcomm Incorporated | System and method for decoupling operations to accelerate processing of loop structures |
US11520570B1 (en) * | 2021-06-10 | 2022-12-06 | Xilinx, Inc. | Application-specific hardware pipeline implemented in an integrated circuit |
US11693666B2 (en) * | 2021-10-20 | 2023-07-04 | Arm Limited | Responding to branch misprediction for predicated-loop-terminating branch instruction |
CN117250480B (en) * | 2023-11-08 | 2024-02-23 | 英诺达(成都)电子科技有限公司 | Loop detection method, device, equipment and storage medium of combinational logic circuit |
Family Cites Families (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5452425A (en) * | 1989-10-13 | 1995-09-19 | Texas Instruments Incorporated | Sequential constant generator system for indicating the last data word by using the end of loop bit having opposite digital state than other data words |
JPH0863355A (en) * | 1994-08-18 | 1996-03-08 | Mitsubishi Electric Corp | Program controller and program control method |
US5958048A (en) * | 1996-08-07 | 1999-09-28 | Elbrus International Ltd. | Architectural support for software pipelining of nested loops |
WO1998006038A1 (en) * | 1996-08-07 | 1998-02-12 | Sun Microsystems, Inc. | Architectural support for software pipelining of loops |
US6289443B1 (en) * | 1998-01-28 | 2001-09-11 | Texas Instruments Incorporated | Self-priming loop execution for loop prolog instruction |
US6192515B1 (en) * | 1998-07-17 | 2001-02-20 | Intel Corporation | Method for software pipelining nested loops |
US6598155B1 (en) * | 2000-01-31 | 2003-07-22 | Intel Corporation | Method and apparatus for loop buffering digital signal processing instructions |
US7302557B1 (en) * | 1999-12-27 | 2007-11-27 | Impact Technologies, Inc. | Method and apparatus for modulo scheduled loop execution in a processor architecture |
US6754893B2 (en) * | 1999-12-29 | 2004-06-22 | Texas Instruments Incorporated | Method for collapsing the prolog and epilog of software pipelined loops |
US6629238B1 (en) * | 1999-12-29 | 2003-09-30 | Intel Corporation | Predicate controlled software pipelined loop processing with prediction of predicate writing and value prediction for use in subsequent iteration |
US6892380B2 (en) * | 1999-12-30 | 2005-05-10 | Texas Instruments Incorporated | Method for software pipelining of irregular conditional control loops |
US6567895B2 (en) * | 2000-05-31 | 2003-05-20 | Texas Instruments Incorporated | Loop cache memory and cache controller for pipelined microprocessors |
GB2363480B (en) * | 2000-06-13 | 2002-05-08 | Siroyan Ltd | Predicated execution of instructions in processors |
US6615403B1 (en) * | 2000-06-30 | 2003-09-02 | Intel Corporation | Compare speculation in software-pipelined loops |
US6912709B2 (en) * | 2000-12-29 | 2005-06-28 | Intel Corporation | Mechanism to avoid explicit prologs in software pipelined do-while loops |
US6986131B2 (en) * | 2002-06-18 | 2006-01-10 | Hewlett-Packard Development Company, L.P. | Method and apparatus for efficient code generation for modulo scheduled uncounted loops |
US7269719B2 (en) * | 2002-10-30 | 2007-09-11 | Stmicroelectronics, Inc. | Predicated execution using operand predicates |
US20040221283A1 (en) * | 2003-04-30 | 2004-11-04 | Worley Christopher S. | Enhanced, modulo-scheduled-loop extensions |
US7020769B2 (en) * | 2003-09-30 | 2006-03-28 | Starcore, Llc | Method and system for processing a loop of instructions |
US7406590B2 (en) * | 2004-02-25 | 2008-07-29 | Analog Devices, Inc. | Methods and apparatus for early loop bottom detection in digital signal processors |
US7673294B2 (en) * | 2005-01-18 | 2010-03-02 | Texas Instruments Incorporated | Mechanism for pipelining loops with irregular loop control |
US7991984B2 (en) * | 2005-02-17 | 2011-08-02 | Samsung Electronics Co., Ltd. | System and method for executing loops in a processor |
US20060190710A1 (en) * | 2005-02-24 | 2006-08-24 | Bohuslav Rychlik | Suppressing update of a branch history register by loop-ending branches |
US7526633B2 (en) * | 2005-03-23 | 2009-04-28 | Qualcomm Incorporated | Method and system for encoding variable length packets with variable instruction sizes |
GB0524720D0 (en) * | 2005-12-05 | 2006-01-11 | Imec Inter Uni Micro Electr | Ultra low power ASIP architecture II |
US20070266229A1 (en) * | 2006-05-10 | 2007-11-15 | Erich Plondke | Encoding hardware end loop information onto an instruction |
US20080040591A1 (en) * | 2006-08-11 | 2008-02-14 | Moyer William C | Method for determining branch target buffer (btb) allocation for branch instructions |
- 2008
  - 2008-06-27 US US12/147,893 patent/US20090327674A1/en not_active Abandoned
- 2009
  - 2009-06-24 WO PCT/US2009/048370 patent/WO2009158370A2/en active Application Filing
  - 2009-06-24 CN CN200980123763.2A patent/CN102067087B/en not_active Expired - Fee Related
  - 2009-06-24 JP JP2011516552A patent/JP5536052B2/en not_active Expired - Fee Related
  - 2009-06-24 KR KR1020117002173A patent/KR101334863B1/en not_active IP Right Cessation
  - 2009-06-24 EP EP09770903A patent/EP2304557A2/en not_active Ceased
  - 2009-06-26 TW TW098121712A patent/TW201015431A/en unknown
- 2014
  - 2014-04-24 JP JP2014090336A patent/JP5917592B2/en not_active Expired - Fee Related
- 2016
  - 2016-04-06 JP JP2016076753A patent/JP2016157463A/en active Pending
Non-Patent Citations (1)
Title |
---|
None |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2680132A3 (en) * | 2012-06-29 | 2016-01-06 | Analog Devices, Inc. | Staged loop instructions |
EP2725483A3 (en) * | 2012-10-23 | 2015-06-17 | Analog Devices Global | Predicate counter |
US9201828B2 (en) | 2012-10-23 | 2015-12-01 | Analog Devices, Inc. | Memory interconnect network architecture for vector processor |
US9342306B2 (en) | 2012-10-23 | 2016-05-17 | Analog Devices Global | Predicate counter |
CN109643270A (en) * | 2016-08-24 | 2019-04-16 | 谷歌有限责任公司 | Multi-layer testing external member generates |
CN109643270B (en) * | 2016-08-24 | 2022-03-11 | 谷歌有限责任公司 | Method and system for multi-layer test suite generation |
Also Published As
Publication number | Publication date |
---|---|
TW201015431A (en) | 2010-04-16 |
JP2016157463A (en) | 2016-09-01 |
US20090327674A1 (en) | 2009-12-31 |
CN102067087A (en) | 2011-05-18 |
KR101334863B1 (en) | 2013-12-02 |
WO2009158370A3 (en) | 2010-02-25 |
JP2011526045A (en) | 2011-09-29 |
KR20110034656A (en) | 2011-04-05 |
EP2304557A2 (en) | 2011-04-06 |
JP5536052B2 (en) | 2014-07-02 |
JP2014170571A (en) | 2014-09-18 |
JP5917592B2 (en) | 2016-05-18 |
CN102067087B (en) | 2014-04-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20090327674A1 (en) | Loop Control System and Method | |
JP2011526045A5 (en) | ||
JP6345623B2 (en) | Method and apparatus for predicting non-execution of conditional non-branching instructions | |
US5822602A (en) | Pipelined processor for executing repeated string instructions by halting dispatch after comparision to pipeline capacity | |
CN107450888B (en) | Zero overhead loop in embedded digital signal processor | |
EP0965910A2 (en) | Data processor system having branch control and method thereof | |
US8843730B2 (en) | Executing instruction packet with multiple instructions with same destination by performing logical operation on results of instructions and storing the result to the destination | |
JP2008535065A (en) | Indirect register read and write operations | |
US9361109B2 (en) | System and method to evaluate a data value as an instruction | |
KR100551544B1 (en) | Hardware loops | |
JP3738253B2 (en) | Method and apparatus for processing program loops in parallel | |
JP2004513427A (en) | Hardware loop | |
KR100536018B1 (en) | Hardware loops | |
US20110219212A1 (en) | System and Method of Processing Hierarchical Very Long Instruction Packets | |
KR100576560B1 (en) | Speculative register adjustment | |
US20110296143A1 (en) | Pipeline processor and an equal model conservation method | |
US20170052782A1 (en) | Delayed zero-overhead loop instruction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | WWE | Wipo information: entry into national phase | Ref document number: 200980123763.2; Country of ref document: CN |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 09770903; Country of ref document: EP; Kind code of ref document: A2 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2607/MUMNP/2010; Country of ref document: IN |
| | ENP | Entry into the national phase | Ref document number: 2011516552; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWE | Wipo information: entry into national phase | Ref document number: 2009770903; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 20117002173; Country of ref document: KR; Kind code of ref document: A |