`Integrated Predicated and Speculative Execution in the IMPACT EPIC
`Architecture
`
`David I. August Daniel A. Connors Scott A. Mahlke†
`John W. Sias Kevin M. Crozier Ben-Chung Cheng
`Patrick R. Eaton Qudus B. Olaniran Wen-mei W. Hwu
`
`Center for Reliable and High-Performance Computing
`University of Illinois
`Urbana-Champaign, IL 61801
`{august, dconnors, sias, crozier, bccheng, eaton, mrq, hwu }@crhc.uiuc.edu
`
`†Hewlett-Packard Laboratories
`Hewlett-Packard
`Palo Alto, CA 94304
`mahlke@hpl.hp.com
`
`Abstract
`
`Explicitly Parallel Instruction Computing (EPIC) architectures
`require the compiler to express program instruction level
`parallelism directly to the hardware. EPIC techniques which enable
`the compiler to represent control speculation, data dependence
`speculation, and predication have individually been shown to be very
`effective. However, these techniques have not been studied in
`combination with each other. This paper presents the IMPACT EPIC
`Architecture to address the issues involved in designing processors
`based on these EPIC concepts. In particular, we focus on new
`execution and recovery models in which microarchitectural support
`for predicated execution is also used to enable efficient recovery
`from exceptions caused by speculatively executed instructions.
`This paper demonstrates that a coherent framework to integrate
`the three techniques can be elegantly designed to achieve much
`better performance than each individual technique could alone
`provide.
`
`1. Introduction
`
`The performance of modern processors is increasingly dependent
`on their ability to execute multiple instructions per cycle.
`While mainstream microprocessors in 1990 executed at most one
`instruction per cycle [5][7], those in 1995 had the ability to execute
`up to four instructions per cycle [6]. By the year 2000, hardware
`technology will be capable of producing microprocessors that
`execute up to sixteen instructions per clock cycle. Such rapid,
`dramatic increases in hardware parallelism have placed tremendous
`pressure on compiler technology. Without appropriate instruction
`set architecture support, it can be very costly in terms of code size
`and compile time for the compiler to expose sufficient amounts
`of Instruction Level Parallelism (ILP) to the hardware. As a
`result, an increasingly important aspect of computer architecture is
`to provide the compiler with means to control compile-time and
`run-time costs while enhancing the amount of ILP visible to the
`hardware.
`The term Explicitly Parallel Instruction Computing (EPIC) was
`coined recently by Hewlett Packard and Intel in their joint
`announcement of the IA-64 instruction set [10]. It refers to
`architectures in which features are provided to facilitate compiler
`enhancements of ILP in all programs. It is natural to expect that
`the coming generation of EPIC architectures will have features
`to overcome the worst impediments to a compiler's ability to
`enhance ILP: frequent control transfers and ambiguous memory
`dependences. Three such features have been proposed and studied
`in the literature. Predication allows the compiler to overlap the
`execution of independent control constructs without code
`explosion [12]. It also enables the compiler to reduce the frequency of
`branch instructions, to reduce branch mispredictions, and to
`perform sophisticated control flow optimizations [16][19][23].
`Predication does this at the cost of increased fetch utilization. Control
`speculation allows the compiler to judiciously eliminate control
`dependences at the cost of increased register consumption and
`instruction overhead [14][21]. Data dependence speculation enables
`the compiler to overcome ambiguous memory dependences, also
`at the cost of increased register consumption and instruction
`overhead [8][12].
`
`Although these three techniques have been studied individually,
`issues involved in synthesizing a coherent architecture that
`supports all of them have not been addressed in the literature.
`In [16], the benefit of predication support was studied with a
`predication compiler. However, the accompanying control speculation
`model, based on silent instructions, did not precisely detect all
`exceptions. Sentinel speculation was introduced in [14] to provide
`accurate detection of and recovery from exceptions; however, the
`sentinel speculation model was not developed in the context of a
`predicated architecture. [8] presented a compiler-directed data
`dependence speculation model based on the Memory Conflict Buffer
`(MCB). However, the model was not defined in the context of a
`predicated architecture. Furthermore, it used silent instructions to
`eliminate spurious exceptions caused by data speculative memory
`loads and their dependent instructions, preventing accurate
`detection of and recovery from all exceptions.
`
`The primary contribution of this paper is the new IMPACT
`EPIC Architecture framework that elegantly supports all three
`features. A machine based on the IMPACT EPIC Architecture
`framework will allow the compiler to achieve several key improvements
`surpassing the current state of the art. First, the compiler can
`speculate both control and data flow in predicated code without
`introducing spurious exceptions, data page faults, Translation
`Lookaside Buffer (TLB) misses, or long latency cache misses. Second,
`the microarchitectural support required by predicated instructions
`can also be used to support inline recovery for both control and
`data speculation. Third, a single recovery model can be used for
`both control and data speculation, simplifying the compiler code
`generation scheme.
`
`Valtrus Ex 2011-p. 1
`Google v Valtrus
`IPR2022-01197
`
`The secondary contribution of this paper is to present some
`preliminary experimental results based on a prototype compiler for
`the IMPACT EPIC Architecture and initial insights into the
`performance characteristics of the architecture. These results will show
`that combining control speculation, data dependence speculation,
`and predicated execution into a coherent architecture provides a
`significantly greater performance potential than any one of these
`techniques alone could provide, and that an efficient mechanism
`can be designed for detection of and recovery from speculative
`exceptions in such an architecture.
`
`2. Background and motivation
`
`The three enabling features of the IMPACT EPIC Architecture
`(control speculation, data dependence speculation, and predicated
`execution) are examined in this section.
`First, the individual merits of each feature are presented. Then,
`the potential benefits of combining the features into a coherent
`architecture are described. A running example consisting of the
`if-then-else C statement shown in Figure 1a is used to focus the
`discussion. In the example, a conjunction of three conditions
`is evaluated to increment either the variable val5
`or the variable val6. Note that the second condition evaluation
`also has the side effect of updating the location pointed to by
`ptr2. The corresponding, scheduled assembly code is presented
`in Figure 1b. The processor model assumed for illustration
`purposes is a 6-issue processor capable of executing one branch
`per cycle, with no further restrictions on the combination of
`operations that may be concurrently issued. Conditional branches
`require separate comparison and control transfer operations. All
`operations are assumed to have a latency of one cycle, with the
`exception of memory loads which have a latency of two cycles.
`Figure 1b shows that the schedule for this code segment is
`rather sparse. The exact execution time through this code is
`dependent on the fraction of time each branch is taken. Thus, two
`measures of execution time will be used for explanation: the longest
`path length and the average schedule length given that each
`conditional branch is taken 25% of the time. In this example, the longest
`execution path is 13 cycles, and the average schedule length is
`10.25 cycles.
`
`2.1. Speculation
`
`Compiler-controlled speculation refers to breaking inherent
`programmatic dependences by guessing the outcome of a run-time
`event at compile time. As a result, the available ILP in the program
`is increased by reducing the height of long dependence chains and
`by increasing the scheduling freedom amongst the operations.
`Control speculation breaks control dependences which occur
`between branches and other operations [4][14][22]. An operation
`is control dependent on a branch if the branch determines whether
`control flow will actually reach the operation during the
`execution of the program. A control dependence is broken by guessing
`a branch will go in a particular direction, thereby making an
`operation's execution independent of the branch. By breaking
`control dependences, the compiler is able to aggressively move
`operations across branches and systematically reduce control
`dependence height, which often results in a more compact schedule.
`Data dependence speculation, to which we will refer as "data
`speculation" throughout the remainder of this work, breaks data
`flow dependences between memory operations. Two memory
`operations are flow dependent on one another if the first operation
`writes a value to an address and the second operation potentially
`reads from the same address. Thus, the original ordering of the
`memory operations must be maintained to ensure proper value
`flow. Note that for a dependence to exist the operation need only
`potentially read from the same address. Thus, if two memory
`operations are not provably independent, they are dependent by
`definition. Such memory dependences in which the dependence
`condition is not certain are referred to as ambiguous memory
`dependences. A memory dependence is broken by guessing that the two
`memory operations will access different locations, thereby making
`the operations independent of one another.
`Data speculation techniques can be classified in two major
`categories. The first category contains mechanisms that assist
`hardware schedulers or hardware data prefetch techniques with
`reordering memory operations [9][17]. The second category
`contains mechanisms that assist compiler schedulers with reordering
`memory operations [8][12]. This work focuses on the second
`category. With data speculation support, the compiler is able to
`aggressively reorder memory operations and effectively reduce memory
`dependence height, which again results in a more compact
`schedule.
`Applying speculation to the code in Figure 1 results in the
`tighter schedule shown in Figure 1c, in which <CS> and <DS>
`denote operations which have been speculated with regard to
`control or data. The resultant increase in ILP is achieved primarily
`by applying speculation to two of the loads (operations 4 and 8).
`In the original code segment, operation 4 is control dependent on
`operation 3. However, control speculation enables the compiler to
`break that control dependence and move load operation 4 to the top
`of the block. Operation 8 is control dependent on both operations
`3 and 7, as well as memory dependent on operation 5. The
`memory dependence is an ambiguous memory dependence because the
`compiler cannot prove that ptr2 does not point to the same location
`as ptr4. By applying both control and data speculation to operation
`8, all three dependences are broken, allowing it to move to the top
`of the block as well. The net result is that the dependence height
`of the code segment is cut nearly in half. Thus, the longest path
`length is reduced from 13 to 7 cycles and the average schedule
`length is reduced from 10.25 to 6.31 cycles.
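`
The average schedule lengths quoted above can be reproduced with a short calculation. The sketch below is only illustrative: the branch issue cycles (3, 7, 11 for the original schedule and 3, 4, 5 for the speculated one) are assumptions read off Figure 1 rather than values stated in the text, and it further assumes that a taken branch reaches the one-cycle ELSE block in the following cycle. The function name `expected_schedule_length` is hypothetical.

```python
def expected_schedule_length(branch_cycles, fallthrough_cycles,
                             p_taken=0.25, else_cycles=1):
    """Average cycles through the region, weighting each exit path.

    branch_cycles: assumed issue cycle (0-based) of each conditional
        branch, in program order.
    fallthrough_cycles: schedule length when no branch is taken.
    """
    expected = 0.0
    p_reach = 1.0  # probability that control reaches the current branch
    for cycle in branch_cycles:
        # Cycles elapsed through the branch, plus the ELSE block.
        taken_path = (cycle + 1) + else_cycles
        expected += p_reach * p_taken * taken_path
        p_reach *= 1.0 - p_taken
    # Remaining probability mass falls through every branch.
    return expected + p_reach * fallthrough_cycles


# Assumed branch placements (read off Figure 1, not stated in the text).
original = expected_schedule_length([3, 7, 11], fallthrough_cycles=13)
speculated = expected_schedule_length([3, 4, 5], fallthrough_cycles=7)
print(original, speculated)
```

Under these assumptions the two results are 10.25 and 6.3125 cycles, matching the 10.25 and 6.31 figures in the text.
`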
`Due to the breaking of control dependences, speculated
`operations execute more frequently than their non-speculated
`counterparts in the original code. For this reason, exceptions generated by
`speculated operations can either be genuine, reflecting exception
`conditions present in the original code, or spurious, resulting from
`unnecessary execution of speculative operations.
`Suppression of spurious exceptions is required for both correct
`program execution and high performance. Speculative operations,
`like ordinary operations, may cause non-terminal exceptions that
`are time consuming to repair. Page faults, TLB misses, long
`latency cache misses, and other such exceptions could cost hundreds
`of cycles to service. While it would be possible to handle such an
`exception immediately on execution of the speculative operation,
`when the speculative operation is not necessary, time is wasted
`repairing a spurious exception. The performance effects of spurious
`speculative exceptions are quantified in Section 4.
`
`(a) C source
`    if ((*ptr1 == 0) && ((*ptr2 = *ptr3) == 1) && (*ptr4 > 2))
`        val5++;
`    else
`        val6++;
`
`(b) Initial schedule, operations in program order
`    (1)   r11 = MEM[r1]
`    (2)   c1 = (r11 != 0)
`    (3)   jump c1, ELSE
`    (4)   r13 = MEM[r3]
`    (5)   MEM[r2] = r13
`    (6)   c2 = (r13 != 1)
`    (7)   jump c2, ELSE
`    (8)   r14 = MEM[r4]
`    (9)   c3 = (r14 <= 2)
`    (10)  jump c3, ELSE
`    (11)  r5 = r5 + 1
`    (12)  jump CONTINUE
`  ELSE:
`    (13)  r6 = r6 + 1
`  CONTINUE:
`
`(c) With speculation alone, operations in schedule order
`    (1)   r11 = MEM[r1]
`    (4)   r13 = MEM[r3] <CS>
`    (8)   r14 = MEM[r4] <CS,DS>
`    (2)   c1 = (r11 != 0)
`    (6)   c2 = (r13 != 1) <CS>
`    (9)   c3 = (r14 <= 2) <CS>
`    (3)   jump c1, ELSE
`    (4')  check r13
`    (5)   MEM[r2] = r13
`    (7)   jump c2, ELSE
`    (8')  check r14
`    (10)  jump c3, ELSE
`    (11)  r5 = r5 + 1
`    (12)  jump CONTINUE
`  ELSE:
`    (13)  r6 = r6 + 1
`  CONTINUE:
`
`(d) With predication alone
`    (1)   r11 = MEM[r1]
`    (14)  p4 = 0
`    (15)  p2 = 1
`    (16)  p3 = 1
`    (2)   p4of, p1ut = (r11 == 0)
`    (2')  p2at, p3at = (r11 == 0)
`    (4)   r13 = MEM[r3]              <p1>
`    (5)   MEM[r2] = r13              <p1>
`    (6)   p4of, p2at = (r13 == 1)
`    (6')  p3at = (r13 == 1)
`    (8)   r14 = MEM[r4]              <p2>
`    (9)   p4of, p3at = (r14 > 2)
`    (11)  r5 = r5 + 1                <p3>
`    (13)  r6 = r6 + 1                <p4>
`
`Figure 1. C-source code (a), its initial schedule (b), with speculation alone (c), and with predication alone (d).
`
`To eliminate spurious exceptions, delayed exception handling
`is required [14]. This can be accomplished by taking exceptions
`only when the results of speculative operations are used
`non-speculatively, indicating that the speculated code would have
`executed in the original program. A symbolic operation, called a
`check, is responsible for detecting any problems that occurred in
`previous speculative execution. When an error is detected by a
`check instruction, either an exception is reported or repair is
`initiated. By positioning the check at the point of the original
`operation, the error detection and repair is guaranteed only to occur
`when the original operation would have been executed by a
`non-speculated version of the program.
`
`For data speculation, repair is necessary when an actual data
`dependence existed between the speculated load and one or more
`stores presumed to be independent at compile time. The check
`queries the hardware to detect if a dependence actually existed for
`this execution and initiates repair if required.
`
`In Figure 1c, operations 4' and 8' are the previously discussed
`symbolic check operations. There are two important points worth
`making regarding check operations. First, the presence of a
`symbolic check does not necessarily indicate the presence of a real
`check operation. This is dependent on the speculation model and
`will be addressed in the next section. Second, speculative
`operations that the compiler can prove will cause no undesirable side
`effects do not require a symbolic check. For this example,
`operations 6 and 9 are control-speculative, but are certain to cause no
`exceptions, so no check is provided. In general, all data-speculative
`and all potentially excepting control-speculative operations require
`checks.
`
`2.2. Predication
`
`Predicated execution is a mechanism that supports
`conditional execution of individual operations based on Boolean guards,
`which are implemented as predicate register values [11][20]. With
`predication, the representation of programmatic control flow can
`be inherently changed. A conventional processor requires that all
`control flow be explicitly represented in the form of branches
`because that is the only mechanism available to conditionally execute
`operations. However, a processor with support for predicated
`execution can support conditional execution either with conventional
`branches or with conditional operations. As a result, the compiler
`has the opportunity to physically restructure the program control
`flow into a more efficient form for execution on a wide-issue
`processor.
`A compiler converts control flow into predicates by applying
`if-conversion. If-conversion translates conditional branches into
`predicate defining operations and guards operations along
`alternative paths of control under the computed predicates [1][16][18].
`A predicated operation is fetched regardless of its predicate value.
`An operation whose predicate is TRUE is executed normally.
`Conversely, an operation whose predicate is FALSE is prevented from
`modifying the processor state. With if-conversion, complex nets
`of branching code can be replaced by a straight-line sequence
`of predicated code. There are many benefits associated with
`applying if-conversion. First, a compiler can eliminate problematic
`branches from the program. In doing so, all overhead associated
`with these branches, including misprediction penalties, penalties
`for redirecting sequential instruction fetch, and branch resource
`contention, is removed [15][23]. In addition, predication increases
`ILP by allowing separate control flow paths to be overlapped and
`simultaneously executed in a single thread of control.
`
`    (1)   r11 = MEM[r1]
`    (4)   r13 = MEM[r3] <CS>
`    (8)   r14 = MEM[r4] <CS,DS>
`    (14)  p4 = 0
`    (15)  p2 = 1
`    (16)  p3 = 1
`    (2)   p4of, p1ut = (r11 == 0)
`    (2')  p2at, p3at = (r11 == 0)
`    (6)   p4of, p2at = (r13 == 1) <CS>
`    (6')  p3at = (r13 == 1) <CS>
`    (9)   p4of, p3at = (r14 > 2) <CS>
`    (4')  check r13                  <p1>
`    (5)   MEM[r2] = r13              <p1>
`    (8')  check r14                  <p2>
`    (11)  r5 = r5 + 1                <p3>
`    (13)  r6 = r6 + 1                <p4>
`
`Figure 2. Scheduled code example with predication, control speculation, and data speculation applied.
`
`Figure 1d illustrates an if-conversion of the code segment from
`Figure 1b. The conjunction of the three conditions results in a
`relatively complex control structure that can be restructured
`effectively using predication. The predicate for each operation is shown
`within angle brackets. For example, operation 4 is predicated on
`p1. The absence of a predicate indicates that the operation is
`always executed.
`Predicates are computed using predicate define operations,
`such as operation 2. The semantics for the predicate defines are
`described in the IMPACT EPIC 1.0 Architecture and Instruction Set
`Reference Manual [2]. Predicate define operations compute one
`or two predicates. The letters after a destination predicate indicate
`the type of predicate assignment being performed. In this example,
`three predicate types are utilized: unconditional-true (ut), or-false
`(of), and and-true (at). For operation 2, p1 is a ut predicate and
`is set to TRUE if r11 == 0 evaluates to TRUE. Otherwise, it is
`set to FALSE. The other destination for operation 2, p4, is an of
`predicate which is set to TRUE if r11 == 0 evaluates to FALSE.
`Otherwise, the value of p4 is not modified. Note that operations 6
`and 9 also possibly set p4, making it TRUE if any of the operations
`write a TRUE value. Hence, p4 is the logical OR of the conditions
`specified by operations 2, 6, and 9. And-type predicates through a
`similar behavior compute the logical AND of multiple conditions.
`A requirement of using or-type and and-type predicates is that they
`are explicitly initialized to 0 (operation 14), and 1 (operations 15
`and 16), respectively.
`The predicated code is significantly different from the original
`code. All four branches are removed by applying if-conversion,
`resulting in a single sequential stream of predicated operations.
`With the branches removed, all mispredictions and other run-time
`branch penalties are eliminated. Furthermore, ILP is increased by
`overlapping the execution of the "then" and "else" paths of the
`original code. Operations 11 and 13 are executed concurrently
`with the appropriate one taking effect based on the predicate
`values. After full if-conversion, all instructions are fetched, yielding a
`schedule length independent of branch conditions. Therefore, the
`longest and expected paths, respectively 13 and 10.25 cycles in
`the original code, are both reduced to 10 cycles. This effect,
`combined with elimination of runtime overhead, shows some benefits
`of predication.
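`
The predicate define semantics described above can be modeled in a few lines. The following is a minimal sketch, not the definition from the reference manual [2]: it assumes that an and-true (at) destination is cleared when the condition is FALSE and left unchanged otherwise, mirroring the or-false rule given in the text, and it ignores guards on the defines themselves. The names `pred_define` and `run_figure_1d` are hypothetical, and memory is modeled as a Python dict.

```python
def pred_define(preds, cond, dests):
    """Apply one predicate define to the predicate file `preds`.

    dests is a list of (predicate_name, type) pairs, where type is
    "ut" (unconditional-true), "of" (or-false), or "at" (and-true).
    """
    for name, kind in dests:
        if kind == "ut":
            preds[name] = cond        # always written
        elif kind == "of":
            if not cond:
                preds[name] = True    # accumulates OR of negated conditions
        elif kind == "at":
            if not cond:
                preds[name] = False   # assumed: accumulates AND of conditions
        else:
            raise ValueError(f"unknown predicate type: {kind}")


def run_figure_1d(mem, r1, r2, r3, r4):
    """Execute the predicated code of Figure 1d; returns (p3, p4, memory)."""
    mem = dict(mem)
    # Operations 14, 15, 16 initialize p4, p2, p3; p1 is written by op 2.
    p = {"p1": False, "p2": True, "p3": True, "p4": False}
    r13 = r14 = 0                                            # stale register contents
    r11 = mem[r1]                                            # (1)
    pred_define(p, r11 == 0, [("p4", "of"), ("p1", "ut")])   # (2)
    pred_define(p, r11 == 0, [("p2", "at"), ("p3", "at")])   # (2')
    if p["p1"]:
        r13 = mem[r3]                                        # (4) <p1>
        mem[r2] = r13                                        # (5) <p1>
    pred_define(p, r13 == 1, [("p4", "of"), ("p2", "at")])   # (6)
    pred_define(p, r13 == 1, [("p3", "at")])                 # (6')
    if p["p2"]:
        r14 = mem[r4]                                        # (8) <p2>
    pred_define(p, r14 > 2, [("p4", "of"), ("p3", "at")])    # (9)
    return p["p3"], p["p4"], mem
```

In this model p3 guards the increment of val5 (operation 11) and p4 guards the increment of val6 (operation 13), so exactly one of the two is TRUE after operation 9.
`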
`
`2.3. Combining speculation and predication
`
`Up to this point, speculation and predication have been
`examined in isolation. Each technique on its own provides an
`effective opportunity to increase ILP. However, the previous examples
`show that their means of improving performance are
`fundamentally different. Speculation allows the compiler to break control
`and memory dependences, while predication allows the compiler
`to restructure program control flow and to overlap separate
`execution paths. The problems attacked by both techniques often occur
`in conjunction; the techniques can, therefore, be mutually
`beneficial.
`
`To illustrate the use of speculation and predication in
`combination, the previous example is continued in Figure 2. As one would
`expect, the resultant code exhibits characteristics of both
`previous examples. If-conversion removes all four branches, resulting
`in a sequential stream of predicated operations. As before, data
`speculation breaks the dependence between operations 5 and 8.
`Even though no branches remain in the code, control speculation
`is still useful to break dependences between predicate definitions
`and guarded instructions. In this example, the control dependences
`between operations 2 and 4, operations 2' and 8, and operations 6
`and 8 are eliminated by removing the predicates on operations 4
`and 8. These instructions as a result execute more frequently and,
`thus, are in effect speculative. This form of control speculation in
`predicated code is called promotion. As a result of this
`speculation, the compiler can hoist operations 4 and 8 to the top of the
`block to achieve a more compact schedule. The result is that the
`maximum and expected schedule lengths through the code
`segment are reduced to 4 cycles without any branch-related overhead.
`Ignoring branch-related overhead while considering expected path
`performance, predication is only 1.03 times faster and speculation
`is only 1.63 times faster than the original code segment. However,
`the final code segment is 2.56 times faster than the original code.
`This is much more than the speed improvement one would
`expect from multiplying together the speedups obtained by applying
`predication and speculation separately.
`One very common misconception is that there are fewer
`opportunities for control speculation after if-conversion because many
`of the branches are eliminated. However, this is not true.
`If-conversion merely converts control dependences to data flow
`dependences. Therefore, operations are no longer sequentialized
`with branches, but are dependent on the results of predicate define
`operations. Speculation in the form of promotion overcomes these
`predicate flow dependences. As shown in this example,
`speculation, in the form of promotion, can have a greater positive effect
`on performance after if-conversion than before.
`The synergistic relationship of speculation and predication
`makes combining them into a single architecture very attractive.
`However, several issues must be addressed in designing an
`efficient architecture based on these EPIC techniques.
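`
The speedup comparison above follows directly from the expected schedule lengths given earlier (10.25 cycles originally, 10 with predication, 6.31 with speculation, and 4 with both). The short calculation below simply divides those quoted lengths; the variable names are illustrative, and the ratios are approximate because the inputs are rounded.

```python
# Expected schedule lengths quoted in the text, in cycles.
original = 10.25    # Figure 1b
predicated = 10.0   # Figure 1d
speculated = 6.31   # Figure 1c
combined = 4.0      # Figure 2

speedup_predicated = original / predicated
speedup_speculated = original / speculated
speedup_combined = original / combined

# What one would expect if the two techniques composed multiplicatively.
product = speedup_predicated * speedup_speculated

print(f"predication alone:   {speedup_predicated:.3f}")
print(f"speculation alone:   {speedup_speculated:.3f}")
print(f"product of the two:  {product:.3f}")
print(f"combined (Figure 2): {speedup_combined:.3f}")
```

The product of the two individual speedups stays well below the combined speedup, which is the point the paragraph makes.
`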
`
`3. The IMPACT EPIC execution model
`
`The IMPACT EPIC Architecture exposes instruction-level
`parallelism through predicated execution and compiler-directed
`control and data dependence speculation. This section of the paper
`presents the architectural features and semantics that enable these
`technologies. As discussed in Section 2.1, an architecture which
`supports speculation must provide mechanisms to detect
`potential exceptions on control-speculative operations as they occur,
`to record information about data-speculative memory accesses as
`they occur, and then to check at an appropriate time whether an
`exception should be taken or data-speculative repair should be
`initiated. These functions are supported in the IMPACT EPIC
`Architecture by additions to operation encodings and to the register file
`and by addition of the Memory Conflict Buffer, a device which
`checks speculated loads for conflicts with subsequent stores.
`First, it is important to distinguish speculative operations from
`non-speculative operations, since operations which have not been
`control speculated should report exceptions immediately and loads
`which have not been speculated with regard to data dependence
`need not interact with memory conflict detection hardware. This is
`accomplished by the addition of a bit to each operation which can
`be speculated, called the S-bit, and an additional bit to each load,
`the DS-bit. The S-bit is set on operations which are either
`control-speculated or are data dependent on a data-speculative load. The
`DS-bit is set only on data-speculative loads.
`Second, a mechanism must exist to record an exception on a
`control-speculative operation until a check, located at the
`operation's original location in the control flow, examines the result of
`the speculative execution. This is accomplished by the addition of
`a single bit to each register, which is forwarded with its associated
`register. This bit is called the E-tag and, when set, indicates that
`an exception occurred in the generation of the value stored in its
`register. By appropriately generating and propagating E-tags, the
`machine maintains sufficient information about pending
`speculative exceptions to report them if necessary. The delayed
`exception model presented for use in the IMPACT EPIC Architecture,
`including the E-tags and S-bits, is an extension of the original
`Sentinel scheduling model proposed in [14].
`Third, a mechanism must be provided to store the source
`addresses of data-speculative loads until their independence with
`respect to intervening stores can be established. This functionality is
`provided by the Memory Conflict Buffer [8][13]. The MCB
`temporarily associates the destination register number of a speculative
`load with the address from which the value in the register was
`speculatively loaded. Destination addresses of subsequent stores
`are checked against the addresses in the buffer to detect memory
`conflicts. The MCB is queried by explicit data speculation check
`instructions, which initiate recovery if a conflict is discovered to
`have occurred.
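`
The interaction between a data-speculative load, an intervening store, and the check can be illustrated with a small behavioral model. This is a sketch under simplifying assumptions, not a description of the hardware in [8][13]: the buffer here is unbounded, addresses are compared exactly, access widths are ignored, and recovery is modeled as simply re-executing the load. The class and function names are hypothetical.

```python
class MemoryConflictBuffer:
    """Behavioral model: maps a load's destination register to its address."""

    def __init__(self):
        self._load_address = {}   # destination register -> speculative load address
        self._conflicted = set()  # registers whose address was later stored to

    def speculative_load(self, dest_reg, address):
        """Record a data-speculative load (an operation with its DS-bit set)."""
        self._load_address[dest_reg] = address
        self._conflicted.discard(dest_reg)

    def store(self, address):
        """Compare a store's destination address with the recorded loads."""
        for reg, load_address in self._load_address.items():
            if load_address == address:
                self._conflicted.add(reg)

    def check(self, dest_reg):
        """Return True if recovery is needed, then release the entry."""
        conflict = dest_reg in self._conflicted
        self._load_address.pop(dest_reg, None)
        self._conflicted.discard(dest_reg)
        return conflict


def load_above_store(mem, ptr2, ptr4, stored_value):
    """Hoist the load of *ptr4 above the store to *ptr2, as for operation 8."""
    mem = dict(mem)
    mcb = MemoryConflictBuffer()
    r14 = mem[ptr4]                  # (8) data-speculative load
    mcb.speculative_load("r14", ptr4)
    mem[ptr2] = stored_value         # (5) store presumed independent
    mcb.store(ptr2)
    if mcb.check("r14"):             # (8') check
        r14 = mem[ptr4]              # assumed recovery: re-execute the load
    return r14
```

When ptr2 and ptr4 differ, the check finds no conflict and the speculatively loaded value is used. When they alias, the check reports the conflict and the reloaded value reflects the store.
`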
`Finally, the architecture must be able to recover efficiently
`from exceptions on control-speculative operations as well as from
`conflicts encountered in data dependence speculation. Two
`earlier approaches, write-back suppression [3] and instruction
`boosting [22], both provide accurate recovery from excepting speculated
`instructions, but at a limiting hardware cost. Write-back
`suppression requires the addition of fields to each instruction to identify
`the instruction's home block and an additional recovery Program
`Counter (PC) stack. Instruction boosting requires multiple shadow
`register files in addition to similar instruction fields. Both models
`are limited in the number of branches above which an instruction
`can be speculated by these hardware cost considerations. Both
`models are also limited to speculating along a single path of
`control. The IMPACT EPIC recovery model, based on the Sentinel
`model, does not suffer from these limitations. In addition, in an
`improvement to the original Sentinel recovery model, the IMPACT
`EPIC Architecture adds an additional bit to each register, the
`R-tag, which is used to selectively execute only data flow successors
`of excepting speculative operations during recovery. One should
`note that the benefits of this recovery model come at the cost of
`increased register pressure and heightened compiler complexity as
`compared to the write-back suppression and instruction boosting
`methods.
`Given these architectural elements (one or two additional bits
`in operation encodings, two bits added to each register, and the
`Memory Conflict Buffer), the IMPACT EPIC Architecture can
`accurately detect, report, and recover from exactly those exceptions
`that would have occurred in non-speculated code, and can recover
`from memory conflicts in data-speculative code.
`
`3.1. Exception detection for control speculation
`
`A control-speculative operation is executed more frequently
`than its non-speculated counterpart in the original code. When
`such an operation generates an exception condition, it is not known
`whether or not the operation would have executed in the
`original code, and therefore, whether or not the exception should be
`reported. Thus it is necessary to suppress the exception while
`recording sufficient information to report the exception if it is later
`discovered to be genuine. This is accomplished using register
`E-tags. When a speculative operation completes without exception,
`its result is deposited into its destination register and the register's
`E-tag is cleared to indicate successful completion. If, however,
`an exception condition occurs, the destination register's E-tag is
`set. If the destination is a non-predicate register, the excepting
`operation's PC value is also deposited into the register for potential
`use in recovery. Since the exception detection and recovery model
`used in the IMPACT EPIC Architecture uses bits in the register
`file to maintain pending exceptions, potentially excepting
`operations which do not write their results into a destination register
`may not be speculated.
`Results of excepting speculative operations are thus labeled by
`set E-tags in the register file. If these tagged registers are used as
`sources to other speculative operations, the E-tag and PC of the
`originally excepting operation are copied to the destination
`register. In this manner, a speculatively generated exception is
`propagated via data flow through other speculated operations until a
`non-speculative use is reached. A non-speculative use of
`speculatively generated data constitutes a check, which in the case of
`control speculation can be an explicit check operation or simply
`any non-speculative operation that sources a non-predicate
`register, called an implicit check. If the check sources a register with
`a clear E-tag, indicating speculative execution completed without
`exception, execution may continue normally. If, however, a source
`E-tag is set, an exception occurred during speculative execution
`and repair is required. Since the check executes only when the
`speculated operation would have executed in the original code, this
`guarantees that exceptions are taken only when they are a correct result
`of program execution and not merely a side-effect of speculation.
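`
The E-tag bookkeeping described above can be sketched as a small register-file model. The code below is illustrative only: it covers non-predicate registers, represents an exception condition as a Python exception raised by the operation, and models the required repair as raising a `DeferredException` that carries the PC of the originally excepting operation. The names `Reg`, `DeferredException`, and `execute` are hypothetical, and the R-tag and recovery sequence are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Reg:
    value: int = 0       # result, or the excepting operation's PC when etag is set
    etag: bool = False   # set when an exception occurred producing this value


class DeferredException(Exception):
    """Raised when a non-speculative use sources a register with a set E-tag."""

    def __init__(self, pc):
        super().__init__(f"deferred exception from operation at PC {pc}")
        self.pc = pc


def execute(regs, pc, dest, fn, srcs, speculative):
    """Execute one operation; `speculative` models the operation's S-bit."""
    tagged = [s for s in srcs if regs[s].etag]
    if tagged:
        origin_pc = regs[tagged[0]].value
        if not speculative:
            # Implicit check: a non-speculative use of a tagged register.
            raise DeferredException(origin_pc)
        # Propagate the E-tag and original PC through data flow.
        regs[dest] = Reg(origin_pc, True)
        return
    try:
        result = fn(*(regs[s].value for s in srcs))
    except Exception:
        if not speculative:
            raise                      # non-speculative: report immediately
        regs[dest] = Reg(pc, True)     # suppress, but record the exception
        return
    regs[dest] = Reg(result, False)    # normal completion clears the E-tag
```

For example, a speculative load from an unmapped address tags its destination, a dependent speculative operation inherits the tag and PC, and only a later non-speculative use raises `DeferredException` naming the original load.
`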
`Predicate registers and predicate define operations require
`some extra consideration. Obviously, a predicate register cannot
`accommodat