3:14-cv-00062-wmc
Integrated Predicated and Speculative Execution in the IMPACT EPIC
Architecture

David I. August    Daniel A. Connors    Scott A. Mahlke†
John W. Sias       Kevin M. Crozier     Ben-Chung Cheng
Patrick R. Eaton   Qudus B. Olaniran    Wen-mei W. Hwu

Center for Reliable and High-Performance Computing
University of Illinois
Urbana-Champaign, IL 61801
{august, dconnors, sias, crozier, bccheng, eaton, mrq, hwu}@crhc.uiuc.edu

†Hewlett-Packard Laboratories
Hewlett-Packard
Palo Alto, CA 94304
mahlke@hpl.hp.com

Abstract

Explicitly Parallel Instruction Computing (EPIC) architectures
require the compiler to express program instruction level parallelism
directly to the hardware. EPIC techniques which enable the compiler to
represent control speculation, data dependence speculation, and
predication have individually been shown to be very effective.
However, these techniques have not been studied in combination with
each other. This paper presents the IMPACT EPIC Architecture to
address the issues involved in designing processors based on these
EPIC concepts. In particular, we focus on new execution and recovery
models in which microarchitectural support for predicated execution is
also used to enable efficient recovery from exceptions caused by
speculatively executed instructions. This paper demonstrates that a
coherent framework to integrate the three techniques can be elegantly
designed to achieve much better performance than each individual
technique could alone provide.

1. Introduction

The performance of modern processors is increasingly dependent on
their ability to execute multiple instructions per cycle. While
mainstream microprocessors in 1990 executed at most one instruction
per cycle [5][7], those in 1995 had the ability to execute up to four
instructions per cycle [6]. By the year 2000, hardware technology will
be capable of producing microprocessors that execute up to sixteen
instructions per clock cycle. Such rapid, dramatic increases in
hardware parallelism have placed tremendous pressure on compiler
technology. Without appropriate instruction set architecture support,
it can be very costly in terms of code size and compile time for the
compiler to expose sufficient amounts of Instruction Level Parallelism
(ILP) to the hardware. As a result, an increasingly important aspect
of computer architecture is to provide the compiler with means to
control compile-time and run-time costs while enhancing the amount of
ILP visible to the hardware.

The term Explicitly Parallel Instruction Computing (EPIC) was coined
recently by Hewlett Packard and Intel in their joint announcement of
the IA-64 instruction set [10]. It refers to architectures in which
features are provided to facilitate compiler enhancements of ILP in
all programs. It is natural to expect that the coming generation of
EPIC architectures will have features to overcome the worst
impediments to a compiler's ability to enhance ILP: frequent control
transfers and ambiguous memory dependences. Three such features have
been proposed and studied in the literature. Predication allows the
compiler to overlap the execution of independent control constructs
without code explosion [12]. It also enables the compiler to reduce
the frequency of branch instructions, to reduce branch mispredictions,
and to perform sophisticated control flow optimizations [16][19][23].
Predication does this at the cost of increased fetch utilization.
Control speculation allows the compiler to judiciously eliminate
control dependences at the cost of increased register consumption and
instruction overhead [14][21]. Data dependence speculation enables the
compiler to overcome ambiguous memory dependences, also at the cost of
increased register consumption and instruction overhead [8][12].

Although these three techniques have been studied individually, issues
involved in synthesizing a coherent architecture that supports all of
them have not been addressed in the literature. In [16], the benefit
of predication support was studied with a predication compiler.
However, the accompanying control speculation model, based on silent
instructions, did not precisely detect all exceptions. Sentinel
speculation was introduced in [14] to provide accurate detection of
and recovery from exceptions; however, the sentinel speculation model
was not developed in the context of a predicated architecture. [8]
presented a compiler-directed data dependence speculation model based
on the Memory Conflict Buffer (MCB). However, the model was not
defined in the context of a predicated architecture. Furthermore, it
used silent instructions to eliminate spurious exceptions caused by
data speculative memory loads and their dependent instructions,
preventing accurate detection of and recovery from all exceptions.

The primary contribution of this paper is the new IMPACT EPIC
Architecture framework that elegantly supports all three features. A
machine based on the IMPACT EPIC Architecture framework will allow the
compiler to achieve several key improvements surpassing the current
state of the art. First, the compiler can speculate both control and
data flow in predicated code without introducing spurious exceptions,
data page faults, Translation Lookaside Buffer (TLB) misses, or long
latency cache misses. Second, the microarchitectural support required
by predicated instructions can also be used to support inline recovery
for both control and data speculation. Third, a single recovery model
can be used for both control and data speculation, simplifying the
compiler code generation scheme.

W0072790
WARF0072790

PX0416.0001

Valtrus Ex 2011-p. 1
Google v Valtrus
IPR2022-01197

The secondary contribution of this paper is to present some
preliminary experimental results based on a prototype compiler for the
IMPACT EPIC Architecture and initial insights into the performance
characteristics of the architecture. These results will show that
combining control speculation, data dependence speculation, and
predicated execution into a coherent architecture provides a
significantly greater performance potential than any one of these
techniques alone could provide, and that an efficient mechanism can be
designed for detection of and recovery from speculative exceptions in
such an architecture.

2. Background and motivation

The three enabling features of the IMPACT EPIC Architecture (control
speculation, data dependence speculation, and predicated execution)
are examined in this section. First, the individual merits of each
feature are presented. Then, the potential benefits of combining the
features into a coherent architecture are described. A running example
consisting of the if-then-else C statement shown in Figure 1a is used
to focus the discussion. In the example, a conjunction of three
conditions is evaluated to alternatively increment either the variable
val5 or the variable val6. Note that the second condition evaluation
also has the side effect of updating the location pointed to by ptr2.
The corresponding, scheduled assembly code is presented in Figure 1b.
The processor model assumed for illustration purposes is a 6-issue
processor capable of executing one branch per cycle, with no further
restrictions on the combination of operations that may be concurrently
issued. Conditional branches require separate comparison and control
transfer operations. All operations are assumed to have a latency of
one cycle, with the exception of memory loads which have a latency of
two cycles.

Figure 1b shows that the schedule for this code segment is rather
sparse. The exact execution time through this code is dependent on the
fraction of time each branch is taken. Thus, two measures of execution
time will be used for explanation: the longest path length and the
average schedule length given that each conditional branch is taken
25% of the time. In this example, the longest execution path is 13
cycles, and the average schedule length is 10.25 cycles.
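
The 10.25-cycle average can be reproduced with a short calculation.
The sketch below is illustrative only: the function name and the
per-exit path lengths (5, 9, and 13 cycles when the first, second, or
third conditional branch is the first one taken, and 13 cycles when
none is taken) are assumptions inferred from the schedule in Figure
1b, not values stated in the text.

```python
def expected_schedule_length(exit_lengths, fallthrough_length, p_taken=0.25):
    """Average cycle count through a chain of conditional branches.

    exit_lengths[i] is the total cycle count when branch i is the first
    branch taken; fallthrough_length applies when no branch is taken.
    """
    expected = 0.0
    p_reach = 1.0  # probability that control reaches the current branch
    for length in exit_lengths:
        expected += p_reach * p_taken * length
        p_reach *= 1.0 - p_taken
    return expected + p_reach * fallthrough_length


# Assumed path lengths for the initial schedule (Figure 1b): each taken
# branch adds one cycle for the ELSE block.
initial_avg = expected_schedule_length([5, 9, 13], fallthrough_length=13)
print(initial_avg)
```

Under these assumed path lengths the weighted sum matches the average
quoted above, which suggests the 25% figure is applied independently
to each of the three conditional branches.
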

2.1. Speculation

Compiler-controlled speculation refers to breaking inherent
programmatic dependences by guessing the outcome of a run-time event
at compile time. As a result, the available ILP in the program is
increased by reducing the height of long dependence chains and by
increasing the scheduling freedom amongst the operations.

Control speculation breaks control dependences which occur between
branches and other operations [4][14][22]. An operation is control
dependent on a branch if the branch determines whether control flow
will actually reach the operation during the execution of the program.
A control dependence is broken by guessing a branch will go in a
particular direction, thereby making an operation's execution
independent of the branch. By breaking control dependences, the
compiler is able to aggressively move operations across branches and
systematically reduce control dependence height, which often results
in a more compact schedule.

Data dependence speculation, to which we will refer as "data
speculation" throughout the remainder of this work, breaks data flow
dependences between memory operations. Two memory operations are flow
dependent on one another if the first operation writes a value to an
address and the second operation potentially reads from the same
address. Thus, the original ordering of the memory operations must be
maintained to ensure proper value flow. Note that for a dependence to
exist the operation need only potentially read from the same address.
Thus, if two memory operations are not provably independent, they are
dependent by definition. Such memory dependences in which the
dependence condition is not certain are referred to as ambiguous
memory dependences. A memory dependence is broken by guessing that the
two memory operations will access different locations, thereby making
the operations independent of one another.
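
A small experiment makes the risk of breaking an ambiguous memory
dependence concrete. The following is a minimal sketch, not part of
the architecture: the function names, the dictionary used as memory,
and the addresses are all hypothetical. It compares the original
store-then-load order with a schedule in which the load has been
hoisted above the store.

```python
def run_original(memory, store_addr, load_addr, value):
    """Program order: the store executes before the load."""
    mem = dict(memory)
    mem[store_addr] = value
    return mem[load_addr]


def run_load_hoisted(memory, store_addr, load_addr, value):
    """Speculated order: the load is moved above the store."""
    mem = dict(memory)
    loaded = mem[load_addr]
    mem[store_addr] = value
    return loaded


memory = {0x100: 7, 0x200: 9}

# The guess "different locations" holds: both orders load the same value.
same_when_independent = (
    run_original(memory, 0x100, 0x200, 1) == run_load_hoisted(memory, 0x100, 0x200, 1)
)

# The guess is wrong (both accesses hit 0x100): the hoisted load is stale.
stale_value = run_load_hoisted(memory, 0x100, 0x100, 1)
fresh_value = run_original(memory, 0x100, 0x100, 1)
```

When the two addresses coincide, the hoisted load returns the value
from before the store, which is why a data-speculated load needs a
later check and a way to repair the result.
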

Data speculation techniques can be classified in two major categories.
The first category contains mechanisms that assist hardware schedulers
or hardware data prefetch techniques with reordering memory operations
[9][17]. The second category contains mechanisms that assist compiler
schedulers with reordering memory operations [8][12]. This work
focuses on the second category. With data speculation support, the
compiler is able to aggressively reorder memory operations and
effectively reduce memory dependence height, which again results in a
more compact schedule.

Applying speculation to the code in Figure 1 results in the tighter
schedule shown in Figure 1c, in which <CS> and <DS> denote operations
which have been speculated with regard to control or data. The
resultant increase in ILP is achieved primarily by applying
speculation to two of the loads (operations 4 and 8). In the original
code segment, operation 4 is control dependent on operation 3.
However, control speculation enables the compiler to break that
control dependence and move load operation 4 to the top of the block.
Operation 8 is control dependent on both operations 3 and 7, as well
as memory dependent on operation 5. The memory dependence is an
ambiguous memory dependence because the compiler cannot prove that
ptr2 does not point to the same location as ptr4. By applying both
control and data speculation to operation 8, all three dependences are
broken, allowing it to move to the top of the block as well. The net
result is that the dependence height of the code segment is cut nearly
in half. Thus, the longest path length is reduced from 13 to 7 cycles
and the average schedule length is reduced from 10.25 to 6.31 cycles.

Due to the breaking of control dependences, speculated operations
execute more frequently than their non-speculated counterparts in the
original code. For this reason, exceptions generated by speculated
operations can either be genuine, reflecting exception conditions
present in the original code, or spurious, resulting from unnecessary
execution of speculative operations.

Suppression of spurious exceptions is required for both correct
program execution and high performance. Speculative operations, like
ordinary operations, may cause non-terminal exceptions that are time
consuming to repair. Page faults, TLB misses, long latency cache
misses, and other such exceptions could cost hundreds of cycles to
service. While it would be possible to handle such an exception
immediately on execution of the speculative operation, when the
speculative operation is not necessary, time is wasted repairing a
spurious exception. The performance effects of spurious speculative
exceptions are quantified in Section 4.

To eliminate spurious exceptions, delayed exception handling is
required [14]. This can be accomplished by taking exceptions only when
the results of speculative operations are used non-speculatively,
indicating that the speculated code would have executed in the
original program. A symbolic operation, called a check, is responsible
for detecting any problems that occurred in previous speculative
execution. When an error is detected by a check instruction, either an
exception is reported or repair is initiated. By positioning the check
at the point of the original operation, the error detection and repair
is guaranteed only to occur when the original operation would have
been executed by a non-speculated version of the program.

For data speculation, repair is necessary when an actual data
dependence existed between the speculated load and one or more stores
presumed to be independent at compile time. The check queries the
hardware to detect if a dependence actually existed for this execution
and initiates repair if required.

In Figure 1c, operations 4' and 8' are the previously discussed
symbolic check operations. There are two important points worth making
regarding check operations. First, the presence of a symbolic check
does not necessarily indicate the presence of a real check operation.
This is dependent on the speculation model and will be addressed in
the next section. Second, speculative operations that the compiler can
prove will cause no undesirable side effects do not require a symbolic
check. For this example, operations 6 and 9 are control-speculative,
but are certain to cause no exceptions, so no check is provided. In
general, all data-speculative and all potentially excepting
control-speculative operations require checks.

(a) C source code

    if ((*ptr1 == 0) && ((*ptr2 = *ptr3) == 1) && (*ptr4 > 2))
        val5++;
    else
        val6++;

(b) Initial schedule (operations listed by number)

    (1)  r11 = MEM[r1]
    (3)  jump c1, ELSE
    (4)  r13 = MEM[r3]
    (5)  MEM[r2] = r13
    (6)  c2 = (r13 != 1)
    (7)  jump c2, ELSE
    (8)  r14 = MEM[r4]
    (9)  c3 = (r14 <= 2)
    (10) jump c3, ELSE
    (11) r5 = r5 + 1
    (12) jump CONTINUE
    ELSE:
    (13) r6 = r6 + 1
    CONTINUE:

(c) With speculation alone (operations listed by number)

    (1)  r11 = MEM[r1]
    (3)  jump c1, ELSE
    (4)  r13 = MEM[r3]          <CS>
    (4') Check r13
    (5)  MEM[r2] = r13
    (6)  c2 = (r13 != 1)        <CS>
    (7)  jump c2, ELSE
    (8)  r14 = MEM[r4]          <CS,DS>
    (8') Check r14
    (9)  c3 = (r14 <= 2)        <CS>
    (10) jump c3, ELSE
    (11) r5 = r5 + 1
    (12) jump CONTINUE
    ELSE:
    (13) r6 = r6 + 1
    CONTINUE:

(d) With predication alone (operations listed by number)

    (1)  r11 = MEM[r1]
    (2)  p4_of, p1_ut = (r11 == 0)
    (2') p2_at, p3_at = (r11 == 0)
    (4)  r13 = MEM[r3]                  <p1>
    (5)  MEM[r2] = r13                  <p1>
    (6)  p4_of, p2_at = (r13 == 1)
    (6') p3_at = (r13 == 1)
    (8)  r14 = MEM[r4]                  <p2>
    (9)  p4_of, p3_at = (r14 > 2)
    (11) r5 = r5 + 1                    <p3>
    (13) r6 = r6 + 1                    <p4>
    (14) p4 = 0
    (15) p2 = 1
    (16) p3 = 1

Figure 1. C-source code (a), its initial schedule (b), with speculation alone (c), and with predication alone (d).

2.2. Predication

Predicated execution is a mechanism that supports conditional
execution of individual operations based on Boolean guards, which are
implemented as predicate register values [11][20]. With predication,
the representation of programmatic control flow can be inherently
changed. A conventional processor requires that all control flow be
explicitly represented in the form of branches because that is the
only mechanism available to conditionally execute operations. However,
a processor with support for predicated execution can support
conditional execution either with conventional branches or with
conditional operations. As a result, the compiler has the opportunity
to physically restructure the program control flow into a more
efficient form for execution on a wide-issue processor.

A compiler converts control flow into predicates by applying
if-conversion. If-conversion translates conditional branches into
predicate defining operations and guards operations along alternative
paths of control under the computed predicates [1][16][18]. A
predicated operation is fetched regardless of its predicate value. An
operation whose predicate is TRUE is executed normally. Conversely, an
operation whose predicate is FALSE is prevented from modifying the
processor state. With if-conversion, complex nets of branching code
can be replaced by a straight-line sequence of predicated code. There
are many benefits associated with applying if-conversion. First, a
compiler can eliminate problematic branches from the program. In doing
so, all overhead associated with these branches, including
misprediction penalties, penalties for redirecting sequential
instruction fetch, and branch resource contention, is removed
[15][23]. In addition, predication increases ILP by allowing separate
control flow paths to be overlapped and simultaneously executed in a
single thread of control.
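
The fetch-everything, nullify-on-FALSE behaviour can be illustrated
with a toy interpreter. This is a minimal sketch under stated
assumptions: the tuple encoding of operations, the names
`run_predicated` and `then_else`, and the predicate names `p_then` and
`p_else` are hypothetical and do not come from the IMPACT EPIC
instruction set.

```python
def run_predicated(ops, regs, preds):
    """Run straight-line predicated code.

    Each op is (guard, dest, fn). guard names a predicate, or is None
    for an operation that always executes. Every op is fetched; an op
    whose guard is False is nullified and leaves regs untouched.
    Returns the number of operations fetched.
    """
    fetched = 0
    for guard, dest, fn in ops:
        fetched += 1
        if guard is None or preds[guard]:
            regs[dest] = fn(regs)
    return fetched


def then_else(cond, r5, r6):
    """If-converted form of: if (cond) r5++; else r6++;"""
    regs = {"r5": r5, "r6": r6}
    preds = {"p_then": cond, "p_else": not cond}
    ops = [
        ("p_then", "r5", lambda r: r["r5"] + 1),
        ("p_else", "r6", lambda r: r["r6"] + 1),
    ]
    fetched = run_predicated(ops, regs, preds)
    return regs["r5"], regs["r6"], fetched
```

Both increments are fetched on every run, but only the one whose guard
is TRUE changes a register, so the number of fetched operations does
not depend on the condition.
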

    (1)  r11 = MEM[r1]
    (2)  p4_of, p1_ut = (r11 == 0)
    (2') p2_at, p3_at = (r11 == 0)
    (4)  r13 = MEM[r3]                  <CS>
    (4') Check r13                      <p1>
    (5)  MEM[r2] = r13                  <p1>
    (6)  p4_of, p2_at = (r13 == 1)      <CS>
    (6') p3_at = (r13 == 1)             <CS>
    (8)  r14 = MEM[r4]                  <CS,DS>
    (8') Check r14                      <p2>
    (9)  p4_of, p3_at = (r14 > 2)       <CS>
    (11) r5 = r5 + 1                    <p3>
    (13) r6 = r6 + 1                    <p4>
    (14) p4 = 0
    (15) p2 = 1
    (16) p3 = 1

Figure 2. Scheduled code example with predication, control speculation, and data speculation applied.

Figure 1d illustrates an if-conversion of the code segment from Figure
1b. The conjunction of the three conditions results in a relatively
complex control structure that can be restructured effectively using
predication. The predicate for each operation is shown within angle
brackets. For example, operation 4 is predicated on p1. The absence of
a predicate indicates that the operation is always executed.

Predicates are computed using predicate define operations, such as
operation 2. The semantics for the predicate defines are described in
the IMPACT EPIC 1.0 Architecture and Instruction Set Reference Manual
[2]. Predicate define operations compute one or two predicates. The
letters after a destination predicate indicate the type of predicate
assignment being performed. In this example, three predicate types are
utilized: unconditional-true (ut), or-false (of), and and-true (at).
For operation 2, p1 is a ut predicate and is set to TRUE if r11 == 0
evaluates to TRUE. Otherwise, it is set to FALSE. The other
destination for operation 2, p4, is an of predicate which is set to
TRUE if r11 == 0 evaluates to FALSE. Otherwise, the value of p4 is not
modified. Note that operations 6 and 9 also possibly set p4, making it
TRUE if any of the operations write a TRUE value. Hence, p4 is the
logical OR of the conditions specified by operations 2, 6, and 9.
And-type predicates through a similar behavior compute the logical AND
of multiple conditions. A requirement of using or-type and and-type
predicates is that they are explicitly initialized to 0 (operation
14), and 1 (operations 15 and 16), respectively.
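
The three predicate types can be modelled in a few lines. The sketch
below is an illustration rather than the reference semantics from [2]:
the helper names are hypothetical, the and-true rule (clear the
predicate when the condition is FALSE, otherwise leave it unchanged)
is an assumption by analogy with the or-false rule described above,
and guard predicates on the define operations themselves are ignored.

```python
def pred_define(preds, dests, cond):
    """Apply one predicate define. dests is a list of (name, type) pairs."""
    for name, ptype in dests:
        if ptype == "ut":      # unconditional-true: always written
            preds[name] = cond
        elif ptype == "of":    # or-false: set when the condition is FALSE
            if not cond:
                preds[name] = True
        elif ptype == "at":    # and-true (assumed): cleared when FALSE
            if not cond:
                preds[name] = False
        else:
            raise ValueError(f"unknown predicate type: {ptype}")


def figure_1d_predicates(r11, r13, r14):
    """Evaluate the predicate defines listed in Figure 1d."""
    preds = {"p4": False, "p2": True, "p3": True}                  # ops 14, 15, 16
    pred_define(preds, [("p4", "of"), ("p1", "ut")], r11 == 0)     # op 2
    pred_define(preds, [("p2", "at"), ("p3", "at")], r11 == 0)     # op 2'
    pred_define(preds, [("p4", "of"), ("p2", "at")], r13 == 1)     # op 6
    pred_define(preds, [("p3", "at")], r13 == 1)                   # op 6'
    pred_define(preds, [("p4", "of"), ("p3", "at")], r14 > 2)      # op 9
    return preds
```

With all three conditions TRUE, p3 stays TRUE and p4 stays FALSE, so
only the increment guarded by p3 takes effect; if any condition is
FALSE, p3 is cleared and p4 is set instead.
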

The predicated code is significantly different from the original code.
All four branches are removed by applying if-conversion, resulting in
a single sequential stream of predicated operations. With the branches
removed, all mispredictions and other run-time branch penalties are
eliminated. Furthermore, ILP is increased by overlapping the execution
of the "then" and "else" paths of the original code. Operations 11 and
13 are executed concurrently with the appropriate one taking effect
based on the predicate values. After full if-conversion, all
instructions are fetched, yielding a schedule length independent of
branch conditions. Therefore, the longest and expected paths,
respectively 13 and 10.25 cycles in the original code, are both
reduced to 10 cycles. This effect, combined with elimination of
runtime overhead, shows some benefits of predication.

2.3. Combining speculation and predication

Up to this point, speculation and predication have been examined in
isolation. Each technique on its own provides an effective opportunity
to increase ILP. However, the previous examples show that their means
of improving performance are fundamentally different. Speculation
allows the compiler to break control and memory dependences, while
predication allows the compiler to restructure program control flow
and to overlap separate execution paths. The problems attacked by both
techniques often occur in conjunction; the techniques can, therefore,
be mutually beneficial.

To illustrate the use of speculation and predication in combination,
the previous example is continued in Figure 2. As one would expect,
the resultant code exhibits characteristics of both previous examples.
If-conversion removes all four branches, resulting in a sequential
stream of predicated operations. As before, data speculation breaks
the dependence between operations 5 and 8. Even though no branches
remain in the code, control speculation is still useful to break
dependences between predicate definitions and guarded instructions. In
this example, the control dependences between operations 2 and 4,
operations 2' and 8, and operations 6 and 8 are eliminated by removing
the predicates on operations 4 and 8. These instructions as a result
execute more frequently and, thus, are in effect speculative. This
form of control speculation in predicated code is called promotion. As
a result of this speculation, the compiler can hoist operations 4 and
8 to the top of the block to achieve a more compact schedule. The
result is that the maximum and expected schedule lengths through the
code segment are reduced to 4 cycles without any branch-related
overhead. Ignoring branch-related overhead while considering expected
path performance, predication is only 1.03 times faster and
speculation is only 1.63 times faster than the original code segment.
However, the final code segment is 2.56 times faster than the original
code. This is much more than the speed improvement one would expect
from multiplying together the speedups obtained by applying
predication and speculation separately.
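
The size of that gap can be checked directly from the three speedups
quoted above. The snippet is only an arithmetic illustration; the
variable names are hypothetical.

```python
# Expected-path speedups over the original schedule, as quoted in the text.
speedup_predication = 1.03
speedup_speculation = 1.63
speedup_combined = 2.56

# What one would expect if the two techniques simply compounded.
product_of_separate = speedup_predication * speedup_speculation

# How much further the combined code goes beyond that expectation.
extra_factor = speedup_combined / product_of_separate
print(round(product_of_separate, 2), round(extra_factor, 2))
```

The product of the separate speedups is well under the combined
figure, so roughly half again as much performance appears only when
the two techniques are applied together.
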

One very common misconception is that there are fewer opportunities
for control speculation after if-conversion because many of the
branches are eliminated. However, this is not true. If-conversion
merely converts control dependences to data flow dependences.
Therefore, operations are no longer sequentialized with branches, but
are dependent on the results of predicate define operations.
Speculation in the form of promotion overcomes these predicate flow
dependences. As shown in this example, speculation, in the form of
promotion, can have a greater positive effect on performance after
if-conversion than before.

The synergistic relationship of speculation and predication makes
combining them into a single architecture very attractive. However,
several issues must be addressed in designing an efficient
architecture based on these EPIC techniques.

3. The IMPACT EPIC execution model

The IMPACT EPIC Architecture exposes instruction-level parallelism
through predicated execution and compiler-directed control and data
dependence speculation. This section of the paper presents the
architectural features and semantics that enable these technologies.
As discussed in Section 2.1, an architecture which supports
speculation must provide mechanisms to detect potential exceptions on
control-speculative operations as they occur, to record information
about data-speculative memory accesses as they occur, and then to
check at an appropriate time whether an exception should be taken or
data-speculative repair should be initiated. These functions are
supported in the IMPACT EPIC Architecture by additions to operation
encodings and to the register file and by addition of the Memory
Conflict Buffer, a device which checks speculated loads for conflicts
with subsequent stores.

First, it is important to distinguish speculative operations from
non-speculative operations, since operations which have not been
control speculated should report exceptions immediately and loads
which have not been speculated with regard to data dependence need not
interact with memory conflict detection hardware. This is accomplished
by the addition of a bit to each operation which can be speculated,
called the S-bit, and an additional bit to each load, the DS-bit. The
S-bit is set on operations which are either control-speculated or are
data dependent on a data-speculative load. The DS-bit is set only on
data-speculative loads.

Second, a mechanism must exist to record an exception on a
control-speculative operation until a check, located at the
operation's original location in the control flow, examines the result
of the speculative execution. This is accomplished by the addition of
a single bit to each register, which is forwarded with its associated
register. This bit is called the E-tag and, when set, indicates that
an exception occurred in the generation of the value stored in its
register. By appropriately generating and propagating E-tags, the
machine maintains sufficient information about pending speculative
exceptions to report them if necessary. The delayed exception model
presented for use in the IMPACT EPIC Architecture, including the
E-tags and S-bits, is an extension of the original Sentinel scheduling
model proposed in [14].

Third, a mechanism must be provided to store the source addresses of
data-speculative loads until their independence with respect to
intervening stores can be established. This functionality is provided
by the Memory Conflict Buffer [8][13]. The MCB temporarily associates
the destination register number of a speculative load with the address
from which the value in the register was speculatively loaded.
Destination addresses of subsequent stores are checked against the
addresses in the buffer to detect memory conflicts. The MCB is queried
by explicit data speculation check instructions, which initiate
recovery if a conflict is discovered to have occurred.
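
The bookkeeping described for the MCB can be sketched as a small
class. This is an idealized model, not the hardware organization from
[8][13]: the class and method names are hypothetical, the table is
unbounded, addresses are compared exactly with no notion of access
width, and freeing an entry when it is checked is an assumption.

```python
class MemoryConflictBuffer:
    """Idealized MCB: tracks data-speculative loads by destination register."""

    def __init__(self):
        self._addr_of = {}       # destination register -> loaded address
        self._conflict = set()   # registers whose load was hit by a later store

    def speculative_load(self, reg, addr):
        """Record a data-speculative load (DS-bit set) into reg from addr."""
        self._addr_of[reg] = addr
        self._conflict.discard(reg)

    def store(self, addr):
        """Compare a store's destination address against every tracked load."""
        for reg, loaded_addr in self._addr_of.items():
            if loaded_addr == addr:
                self._conflict.add(reg)

    def check(self, reg):
        """Return True if recovery is needed for reg, then free its entry."""
        conflicted = reg in self._conflict
        self._addr_of.pop(reg, None)
        self._conflict.discard(reg)
        return conflicted


mcb = MemoryConflictBuffer()

# Operation 8 hoisted above store 5, with ptr2 and ptr4 pointing apart.
mcb.speculative_load("r14", addr=0x40)
mcb.store(addr=0x80)
no_alias_needs_recovery = mcb.check("r14")

# Same schedule, but this time the store hits the speculatively loaded address.
mcb.speculative_load("r14", addr=0x40)
mcb.store(addr=0x40)
alias_needs_recovery = mcb.check("r14")
```

Only the second run reports a conflict, which is the case in which the
check would initiate recovery and the load would be repeated after the
store.
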

Finally, the architecture must be able to recover efficiently from
exceptions on control-speculative operations as well as from conflicts
encountered in data dependence speculation. Two earlier approaches,
write-back suppression [3] and instruction boosting [22], both provide
accurate recovery from excepting speculated instructions, but at a
limiting hardware cost. Write-back suppression requires the addition
of fields to each instruction to identify the instruction's home block
and an additional recovery Program Counter (PC) stack. Instruction
boosting requires multiple shadow register files in addition to
similar instruction fields. Both models are limited in the number of
branches above which an instruction can be speculated by these
hardware cost considerations. Both models are also limited to
speculating along a single path of control. The IMPACT EPIC recovery
model, based on the Sentinel model, does not suffer from these
limitations. In addition, in an improvement to the original Sentinel
recovery model, the IMPACT EPIC Architecture adds an additional bit to
each register, the R-tag, which is used to selectively execute only
data flow successors of excepting speculative operations during
recovery. One should note that the benefits of this recovery model
come at the cost of increased register pressure and heightened
compiler complexity as compared to the write-back suppression and
instruction boosting methods.

Given these architectural elements (one or two additional bits in
operation encodings, two bits added to each register, and the Memory
Conflict Buffer), the IMPACT EPIC Architecture can accurately detect,
report, and recover from exactly those exceptions that would have
occurred in non-speculated code, and can recover from memory conflicts
in data-speculative code.

3.1. Exception detection for control speculation

A control-speculative operation is executed more frequently than its
non-speculated counterpart in the original code. When such an
operation generates an exception condition, it is not known whether or
not the operation would have executed in the original code, and
therefore, whether or not the exception should be reported. Thus it is
necessary to suppress the exception while recording sufficient
information to report the exception if it is later discovered to be
genuine. This is accomplished using register E-tags. When a
speculative operation completes without exception, its result is
deposited into its destination register and the register's E-tag is
cleared to indicate successful completion. If, however, an exception
condition occurs, the destination register's E-tag is set. If the
destination is a non-predicate register, the excepting operation's PC
value is also deposited into the register for potential use in
recovery. Since the exception detection and recovery model used in the
IMPACT EPIC Architecture uses bits in the register file to maintain
pending exceptions, potentially excepting operations which do not
write their results into a destination register may not be speculated.

Results of excepting speculative operations are thus labeled by set
E-tags in the register file. If these tagged registers are used as
sources to other speculative operations, the E-tag and PC of the
originally excepting operation are copied to the destination register.
In this manner, a speculatively generated exception is propagated via
data flow through other speculated operations until a non-speculative
use is reached. A non-speculative use of speculatively generated data
constitutes a check, which in the case of control speculation can be
an explicit check operation or simply any non-speculative operation
that sources a non-predicate register, called an implicit check. If
the check sources a register with a clear E-tag, indicating
speculative execution completed without exception, execution may
continue normally. If, however, a source E-tag is set, an exception
occurred during speculative execution and repair is required. Since
the check executes only when the speculated operation would have
executed in the original code, this guarantees that exceptions are
taken only when they are a correct result of program execution and not
merely a side effect of speculation.
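
The E-tag rules for non-predicate registers can be sketched as a
single execute step. This is a minimal model under stated assumptions:
`Reg`, `Fault`, `ReportedException`, `execute`, and `load` are
hypothetical names, predicate registers are not modelled, and the
addresses and PC values are arbitrary.

```python
from dataclasses import dataclass


class Fault(Exception):
    """Stands in for any exception condition raised by an operation."""


class ReportedException(Exception):
    """Raised when a check finds a pending speculative exception."""

    def __init__(self, pc):
        super().__init__(f"exception from operation at PC {pc}")
        self.pc = pc


@dataclass
class Reg:
    value: int = 0
    etag: bool = False  # set when the value came from an excepting operation


def execute(regs, pc, dest, srcs, fn, speculative):
    """Run one operation; speculative reflects the operation's S-bit."""
    tagged = [s for s in srcs if regs[s].etag]
    if tagged:
        origin_pc = regs[tagged[0]].value  # a tagged register holds the faulting PC
        if not speculative:
            raise ReportedException(origin_pc)  # non-speculative use acts as a check
        regs[dest] = Reg(origin_pc, True)  # propagate E-tag and PC along data flow
        return
    try:
        result = fn(*(regs[s].value for s in srcs))
    except Fault:
        if not speculative:
            raise  # non-speculative operations report immediately
        regs[dest] = Reg(pc, True)  # defer: record the PC and set the E-tag
        return
    regs[dest] = Reg(result, False)


memory = {0x10: 1}


def load(addr):
    if addr not in memory:
        raise Fault(addr)
    return memory[addr]


# A speculative load from an unmapped address, then a speculative compare.
regs = {"r3": Reg(0x99), "r13": Reg(), "c2": Reg()}
execute(regs, 4, "r13", ["r3"], load, speculative=True)
execute(regs, 6, "c2", ["r13"], lambda v: int(v != 1), speculative=True)
```

Neither speculative operation raises anything here; both destinations
simply carry the E-tag and the PC of the load. The exception surfaces
only if a non-speculative operation later sources one of those
registers, which mirrors the implicit check described above.
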

Predicate registers and predicate define operations require
some extra consideration. Obviously, a predicate register cannot
accommodat
