`Moshovos et al.
`
`IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
`
`[11] Patent Number:
`[45] Date of Patent:
`
`5,781,752
`Jul. 14, 1998
`
`[54] TABLE BASED DATA SPECULATION
`CIRCUIT FOR PARALLEL PROCESSING
`COMPUTER
`
`5,666,506
`
`9/1997 Hesson et a1. ........................ .. 395/392
`
`OTHER PUBLICATIONS
`
`[75] Inventors; Andreas L Moshovos; Scott E
`Breach2 Terani N_ Vijaykumar;
`Gurindar S. Sohi. all of Madison. Wis.
`
`Gurinda Sohi et al.. Instruction Issue Logic for High-Per
`formance Interruptable Pipelined Processors; ACM 1987.
`PP- 27-34
`
`[73] Assignee: Wisconsin Alumni Research
`Foundation. Madison. Wis.
`
`Primary Examiner-Krisna Lim
`Attorney; Agent, or Firm-Quarles & Brady
`
`[21] Appl. No.: 773,992
`-
`_
`Dec‘ 26’ 19%
`[22] Fllcd'
`[51] Int. Cl.6 ...................................................... .. G06F 9/38
`395/392
`............................................. .. 395/392
`
`
`
`[52] US. Cl. [5 8] Field of Search
`
`[56]
`
`Refewnces Cited
`
`US. PATENT DOCUMENTS
`
`ABSTRACT
`[57]
`A predictor circuit permits advanced execution of instruc
`tions depending for their data on previous instructions by
`predicting such dependencies based on previous mis
`speculations detected at the ?nal stages of processing. Syn
`chronilatio? of dependent instructions is Provided by a table
`creating entries for each instance of potential dependency.
`Table entries are created and deleted dynamically to limit
`total memory requirements.
`
`5,664,138
`
`9/1997 Yoshida ................................. .. 395/395
`
`9 Claims, 7 Drawing Sheets
`
`4O
`
`INSTRUCTION FROM
`RETIREMENT CIRCUIT
`
`66
`
`WILL
`THIS BE A
`DATA SPECULATIVE
`LOAD
`
`HANDLE f 70
`READY To LDAD
`
`72
`No
`
`YES f 80
`WAIT UNTIL ANY OF THE
`FOLLOWING EVENTS OCCURS:
`@wAKE UP
`CONSUMER NO LONGER
`DATA SPECULATIVE
`@ sDuAsHED
`
`57
`\ HANDLE
`MIS-SPECULATION
`64
`\ HslilégLEE
`
`6D
`
`‘
`:5
`THERE A
`SUBSEQUENT LDAD
`THAT HAS NoT BEEN
`SQUASHED
`
`52
`\ SQUASH
`LDAD
`
`I
`74\ ISSUE MD
`REQUEST
`I
`76 ‘L
`WAIT UNTIL ANY
`OF THE FOLLOWING
`EvENTs OCCURS:
`@souAsHED
`@LDAD No
`LONGER DATA
`SPECULATIVE
`
`4e
`f
`HANDLE
`SQUASH
`I
`
`I
`
`NO
`
`6B
`\ HANDLE
`L0AD
`
`I
`86
`\ ISSUE
`LOAD
`REQUEST
`
`r
`
`v
`
`Valtrus Ex 2007-p. 1
`Google v Valtrus
`IPR2022-01197
`
`
`
`Valtrus Ex 2007-p. 2
`Google v Valtrus
`IPR2022-01197
`
`
`
`U.S. Patent
`
`Jul. 14, 1998
`
`Sheet 2 0f 7
`
`5,781,752
`
`40
`
`INSTRUCTION FROM
`RETIREMENT CIRCUIT
`
`No
`
`A
`
`WM
`THIS BE A
`
`66
`
`DATA SPECULATIVE
`LOAD
`
`IS THIS
`A sTORE OR A
`LOAD
`
`HANDLE
`READY TO LOAD
`
`/70
`
`72
`NO
`
`YES [80
`wAlT UNTIL ANY OF THE
`FOLLOWING EvENTs OCCURS:
`@WAKE UP
`@cONsuMER NO LONGER
`DATA SPECULATIVE
`@sQuAsHED
`
`——-->
`
`57 \
`
`HANDLE
`MlS-SPECULATION
`
`sOuAsHED
`?
`
`HANDLE
`STORE
`
`THERE A
`SUBSEQUENT LOAD
`THAT HAS NOT BEEN
`SQUAEHED
`
`YES
`62
`\ SQUASH
`
`v
`74
`\ ISSUE LOAD
`REOuEsT
`76 T
`T
`WAIT UNTIL ANY
`OF THE FOLLOWING
`EVENTS OCCURS:
`@SQUASHED
`@LDAD NO
`LONGER DATA
`SPECULATIVE
`
`REQUEST
`
`-’
`
`¢
`
`f
`HANDLE
`SQUASH
`
`Valtrus Ex 2007-p. 3
`Google v Valtrus
`IPR2022-01197
`
`
`
`US. Patent
`
`Jul. 14, 1998
`
`Sheet 3 0f 7
`
`5,781,752
`
`HANDLE READY TO LOAD A70
`
`NO
`
`LOAD IN
`THE PRED. TABLE
`?
`
`FIG. 4
`
`SYNCHRONIZATION
`REQUIRED
`?
`
`106
`
`SYNC.
`TABLE ENTRY
`EXISTS FOR THE
`PARTICULAR LOAD / STORE
`PAIR AND W /THE GIVEN
`DATA ADDRESS
`?
`
`THE SYNC.
`FLAG SET TO 1
`7
`
`[120
`
`YES
`UPDATE THE
`PREDICTION
`TOWARDS "DO NOT
`SYNCHRONIZE"
`122
`I
`FREE SYNC. TABLE ENTRY /
`
`flog
`SELECT A SYNC. TABLE ENTRY
`
`[110
`I
`—FILL IN STORE & LOAD
`INSTRUCTION ADDRESSES
`
`—FILL IN THE DATA ADDRESS
`ACCESSED BY THE CONSUMER
`—FILL IN LDAD ID
`—SET SYNC. FLAG TO D
`
`I
`WA|T_1 /-116
`
`'
`
`WAIT=O / 102
`I
`
`DONE
`
`Valtrus Ex 2007-p. 4
`Google v Valtrus
`IPR2022-01197
`
`
`
`US. Patent
`
`Jul. 14, 1998
`
`Sheet 4 0f 7
`
`5,781,752
`
`PREDICTION TABLE
`LD 8
`ST 10
`1
`
`SYNCHRONIZATION TABLE
`
`AC109
`FIG. 5
`
`V
`
`\
`
`\
`
`/
`
`/
`
`SYNCHRONIZATION TABLE
`PREDICTION TABLE
`LD s[xx1 ST 10
`0‘
`xx‘
`u) 8
`ST 10
`1
`.f
`ii» ___,__
`1
`\
`_
`109
`\112
`\114
`
`\F
`
`FIG. 6
`
`PREDLCTION TABLE
`LD
`LD 8
`ST 10
`1
`.f
`A<-- “if
`\
`109
`
`SYNCHRONIZATION TABLE
`ST [XX]
`1+
`\
`\
`\209
`‘112
`
`XX
`it»
`210/
`
`Valtrus Ex 2007-p. 5
`Google v Valtrus
`IPR2022-01197
`
`
`
`U.S. Patent
`
`Jul. 14, 1998
`
`Sheet 5 of 7
`
`5,781,752
`
`64
`\ HANDLE STORE
`
`201
`
`IS
`STORE
`IN THE PREDICTION
`TABLE
`7
`
`SYNCHRONIZATION
`REQUIRED
`?
`
`204
`
`SYNC.
`TABLE ENTRY
`EXISTS FOR THE
`LOAD / STORE PAIR AND
`W / THE SAME
`DATA ADDR.
`
`212
`
`THE SYNC.
`FLAG OF THE
`ENTRY 1
`7
`
`21
`NO
`4\
`UPDATE PRED|CT|ON
`TOWARDS "SYNCHRONIZE"
`I
`
`21s\ WAKE UP LOAD
`21s\
`I
`FREE SYNC. TABLE ENTRY
`
`DONE
`
`FIG. 7
`
`NO
`
`206
`f
`v
`SELECT A SYNC. TABLE ENTRY
`
`208
`I
`y
`—FILL IN STORE & LOAD
`INSTRUCTION ADDRESSES
`-FILL IN THE DATA ADDRESS
`ACCESSED BY THE STORE
`—SET THE SYNC. FLAG TO 1
`—FILL IN STORE ID
`
`Valtrus Ex 2007-p. 6
`Google v Valtrus
`IPR2022-01197
`
`
`
`US. Patent
`
`Jul. 14, 1998
`
`Sheet 6 of 7
`
`5,781,752
`
`HANDLE MlS-SPECULATION ~55
`
`301
`
`FIG. 9
`
`YES
`
`A PRED. TABLE ENTRY
`FOR THIS LOAD /STORE
`PAIR
`
`_
`REPLACE-1
`
`30s
`/_I
`UPDATE ENTRY’S PREDICTOR
`TOWARDS "DO NoT SYNCHRONIZE"
`
`YES NO
`
`306
`
`'3
`THERE A
`PRED. TABLE
`ENTRY FOR THE
`LOAD
`312
`?
`/
`No
`REPLACE=0
`
`309
`
`IS THE
`PREDICTOR
`BELOW THE REPLACE
`LIMIT
`3’
`
`K310
`FREE PRED. TABLE ENTRY
`REPLACE=1
`
`IS
`THERE A
`PRED. TABLE
`ENTRY FOR THE
`STORE
`7'
`
`314
`
`YES
`
`316
`/
`UPDATE ENTRY’S PREDICTOR
`TowARDs "DO NoT SYNCHRONIZE"
`
`r’
`Y
`UPDATE DATE
`PREDTCToR
`TOWARDS
`"SYNCHRONIZE"
`
`NO
`
`NO
`
`322
`
`REPLACE=0
`
`[8 THE PREDICTOR
`
`BELOW THE REPLACE
`LIMIT
`’
`
`YES
`
`/, 320
`
`24
`
`YES
`
`FREE PRED. TABLE ENTRY
`REPLACE=1
`
`[326
`ALLocATE A PREDICTION TABLE ENTRY I
`J,
`/328
`-FILL IN THE ADDRESSES OF THE LoAD AND sTDRE
`-SET PREDICTOR TO THE DEFAULT VALUE
`
`REPLACE
`:
`2
`N0
`
`DONE
`
`Valtrus Ex 2007-p. 7
`Google v Valtrus
`IPR2022-01197
`
`
`
`US. Patent
`
`Jul. 14, 1998
`
`Sheet 7 of 7
`
`5,781,752
`
`HANDLE LOAD / 68
`
`402
`
`is
`THERE A
`SYNC. TABLE ENTRY
`FOR THIS LOAD
`(LOAD ID)
`7
`
`NO
`
`FIG. 10
`
`YES
`
`404
`FREE THAT ENTRY /
`
`406
`
`IS
`THERE A
`PREDICTION TABLE
`ENTRY FOR THIS
`LOAD
`?
`YES I 408
`UPDATE PREDICTOR
`TOWARDS "DO NOT
`SYNCHRONIZE"
`
`——»1
`
`DONE
`
`HANDLE SQUASH
`
`/46
`
`FIG. 11
`
`502
`
`IS
`THERE A
`SYNC. TABLE ENTRY
`FOR THIS INST.
`7
`
`504 /
`FREE SYNC. TABLE ENTRY
`
`—__’I
`
`DONE
`
`Valtrus Ex 2007-p. 8
`Google v Valtrus
`IPR2022-01197
`
`
`
`1
`TABLE BASED DATA SPECULATION
`CIRCUIT FOR PARALLEL PROCESSING
`COMPUTER
`
`This invention was made with United States government
`support awarded by the following agencies:
`ARPA Grant No. DABT63-95-C-O127;
`ONR. Grant No. NOOO14-93-l-0465; and
`NSF. Grant Nos.: CCR-9303030 and MIP-9505853.
`The United States has certain rights in this invention.
`
`FIELD OF THE INVENTION
`
`The invention relates generally to architectures of elec
`tronic computers and speci?cally to electronic computers for
`parallel processing.
`
`BACKGROUND OF THE INVENTION
`
`5.781.752
`
`2
`instructions that use data on earlier instructions that change
`the data. These latter instructions may correctly execute only
`if the earlier instructions using the same data do not change
`the common data or have completed the change of the
`common data. A dependency is “unambiguous” if it neces
`sarily produces an error when the dependent instruction is
`executed before the instruction on which it is dependent. The
`dependence of an instruction may remain ambiguous until
`both instructions involved are executed. A dependency
`between two instructions is ambiguous if it cannot be
`determined whether a dependency really exists without
`executing the instructions.
`Usually. instructions that are data dependent must be
`executed in the program order.
`
`Control and Data Speculation
`
`25
`
`Since dependencies are often ambiguous. as the ILP
`processing unit prepares to execute an instruction. it cannot
`always determine if the instruction will in fact be dependent
`20
`‘on earlier instructions that have not yet completed their
`execution. In the case of ambiguous dependencies and
`unless special circuitry is provided as will be explained in
`the next paragraph. the ILP processing unit is forced to
`assume dependencies exist.
`However. it is quite often the case that an ambiguous
`dependency is resolved as no dependency at all. For this
`reason. some ILP processors may provide for “speculation”.
`that is. execution of an instruction that has ambiguous
`dependency as if it had no dependency at all. One may
`speculate on control dependencies and on data dependen
`cies. Control speculation. for example. might involve
`executing an instruction that follows a branch instruction
`without knowing the outcome of the branch (and thus
`whether the following instruction should have been executed
`or was branched around). Data speculation. for example.
`might involve reading from memory to obtain data for a later
`instruction. even though there are earlier STORES to that
`memory location that have not yet been completed and that
`may change the value of the memory location.
`
`General Computer Architecture and Instruction
`Level Parallel (ILP) Processing
`In an electronic computer with a single processing unit.
`the processing unit may communicate with a memory hold
`ing the data and program instructions. The processing unit
`may also hold data in internal registers. The program
`instructions are executed by the processing unit which
`operates on the data according to the instructions.
`Typically. the processing unit repeats a series of fetch/
`execute cycles in which each instruction is ?rst fetched from
`memory and then executed. The order in which the instruc
`tions are executed is determined by the value of a program
`counter within the processing unit. After each execution of
`an instruction. the program counter normally increases in
`value by one so that the next instruction memory is fetched.
`The order of the instructions in the stored program will be
`termed “memory order”.
`Some instructions. when executed. cause data to be
`loaded into the processing unit from memory or stored from
`the processing unit to memory. Other instructions may
`perform their operation on data that are stored in registers
`without loading or storing data from or to memory. Still
`other instructions change the value of the program counter
`permitting the processing unit to jump or branch through the
`program instructions according to a “program order” that
`normally dilfers from the memory order. The branch or
`jumps in a program may be conditional on values of data
`used by the program that may change as the program is
`executed.
`One method of increasing the speed of electronic com
`puters involves using multiple processing and/or functional
`units to execute multiple instructions at the same time or in
`an “execution order” differing from the program order.
`Computers using this technique are termed “parallel pro
`cessing units”. An instruction level parallel (“ILP processing
`unit”) is one where individual instructions of a single
`program are separated to be run on different processing units
`in contrast to systems in which independent programs may
`be assigned different processing units. for example.
`
`35
`
`45
`
`55
`
`Squashing
`Control and data dependencies are important in an ILP
`processing unit which. in the course of execution of
`instructions. may execute some dependent instructions
`before the instructions on which they are dependent. If the
`dependency is unambiguous. then the results of the prema
`turely executed dependent instructions must be discarded
`(“squashed”) and the instructions of the correct branch
`executed.
`Squashing instructions is a time consuming process that to
`some extent defeats the advantages to be gained from
`parallel processing. In order to avoid the problems associ
`ated with unambiguous dependencies. it is necessary that
`when the ILP processing unit speculates. that it is mostly
`correct.
`
`Speculation in an ILP Processor
`
`Control and Data Dependencies
`There are two types of dependencies exhibited by instruc
`tions. “Control dependency" is a dependency of instructions
`after a conditional branch or jump instruction on whether the
`branch or jump was taken. For example. instructions imme
`diately after a branch are properly executed only if the
`branch is not taken. “Data dependency” is a dependency of
`
`65
`
`In an ILP processor. the processor may fetch multiple
`instructions at a single time and an allocation circuit allo
`cates those instructions to separate processing units. The
`separate processing units may read data from memory and
`perform arithmetic or logical operations on that data. A data
`speculation circuit may exist to detect data dependencies and
`report any mis-speculation to a retirement circuit. The
`retirement circuit collects results generated by the indepen
`
`Valtrus Ex 2007-p. 9
`Google v Valtrus
`IPR2022-01197
`
`
`
`5,781,752
`
`3
`dent processing units and “retires” the instructions executed
`by those processing units by writing ?nal results to memory.
`The retirement circuitry also resolves rnis-speculation
`detected by the data speculation circuit. that is. instructions
`that were executed out of program order but were in fact
`dependent on an earlier instruction in the execution order
`and have produced an erroneous result. These instructions
`are squashed as is understood in the art. Data speculation
`circuitry currently does not decide when to do data specu
`lation. Either all memory accesses are speculated or none at
`all.
`For either type of speculation to be successful (control or
`data speculation). the performance cost associated with the
`speculation must be low. The performance cost is a function
`of the frequency that speculation is required. the time
`required to perform the speculation. the probability of mis
`speculation and the time required to recover from a mis
`speculation. A cumbersome or inaccurate speculation system
`may hurt overall system performance. especially when many
`dependencies must be evaluated.
`Because the ]LP processing is relevant primarily for high
`speed processing units. the speculation process must be
`implemented in circuitry rather than software. The process
`of speculation must be capable of rapid execution using
`limited amounts of high speed memory.
`Control speculation is well understood in the art. The
`points were control speculation is needed are clearly iden
`ti?ed by conditional branch and jump instructions (we will
`refer to these instructions as “control transfer instructions”).
`Typically all control transfer instructions are speculated
`since: ( 1) control transfer instructions are relatively
`frequent. (2) relatively few instructions from those that
`appear between two consecutive control transfer instructions
`can be executed in parallel. (3) typically the performance of
`a processor that always mis_speculates on control is virtually
`the same as the performance of a processor that never
`speculates on control.
`In contrast the points where data speculation is needed are
`not clear since any instruction loading data from memory
`can be data dependent on any previous instructions that
`writes to memory. Consequently . predicting and tracking
`data dependencies. “data dependence speculation” can eas
`ily become overwhelming. Furthermore. the cost of a data
`this-speculation typically cannot be neglected. that is the
`performance of a processor that always mis-speculates on
`data dependencies is far less than that of a processor that
`never speculates on data.
`
`50
`
`55
`
`BRIEF SUMMARY OF THE INVENTION
`The present inventors have recognized that most data
`dependence this-speculations can be attributed to a few
`static STORE/LOAD instruction pairs. Furthermore. these
`instruction pairs exhibit “temporal locality” that is. if one
`LOAD/STORE pair causes a data mis-speculation at a given
`point in time. it is highly likely that a later instance of the
`same pair will soon cause another mis-speculation. For this
`reason. a table based approach may be used in which data
`dependent instructions likely to be a source of mis
`speculation. are stored in a small. high speed memory. These
`particular instruction pairs can be identi?ed based on pre
`vious mis-speculations.
`Very low overhead in a table based data dependence
`speculation is obtained by a three-tiered approach. If there is
`no history of data mis-speculation. an instruction is executed
`without further inquiry. This will be the case for most data
`independent instructions. If there has been a mis-speculation
`
`20
`
`25
`
`30
`
`35
`
`4
`with a given LOAD instruction. a predictor based on the past
`history of mis-speculations for that LOAD instruction is
`employed to determine whether the instruction should be
`executed or delayed Thus. instructions that are typically not
`dependent may be executed immediately. If the instruction
`is to be delayed. a synchronization table is used to determine
`when the instruction is to be performed.
`Speci?cally. the present invention provides a speculation
`decision circuit for use in a processor capable of executing
`program instructions in an execution order ditfering from the
`program order of the instructions. the processing unit further
`having a data speculation circuit for detecting data depen
`dence between instructions and detecting a mis-speculation
`where a data consuming instruction dependent for its data on
`a data producing instruction of earlier program order is. in
`fact. executed before the data consuming instruction. The
`speculation decision circuit includes a predictor circuit
`receiving the mis-speculation signal from the data specula
`tion circuit to produce a prediction associated with the
`particular data producing/consuming instruction pair and
`based on the mis-speculation indication. A prediction thresh
`old detector prevents data speculation for instructions hav
`ing a prediction value within a predetermined range. This
`prediction threshold detector may include an instruction
`synchronizing circuit instructing a processing unit to delay
`a later execution of the particular data consuming instruction
`until after the execution of the particular data producing
`instruction when the prediction associated with the data
`producing/consuming instruction pair is within a predeter
`mined range.
`Thus. it is one object of the invention to provide a
`predictor circuit that may identify data dependencies on an
`on-going or dynamic basis. Recognizing that there are
`relatively few instructions which will cause data mis
`speculations. these instructions are identi?ed by reference to
`historical mis- speculation associated with the instructions as
`stored in a prediction.
`The instruction synchronization circuit may include a
`prediction table listing certain data consuming instructions
`and certain data producing instructions each associated with
`a prediction. The instruction synchronization circuit will
`delay the particular data consuming instruction only when
`the prediction associated with the data consuming instruc
`tion is within a predetermined range of predictions and when
`the particular data consuming instruction is in the prediction
`table.
`Thus. it is another object of the invention to provide a
`prediction of data dependence which adds very little over
`head to the execution of instructions that historically have
`not resulted in mis-speculation. If the particular data con
`suming instruction is not in the prediction table. no further
`inquiry into the prediction table is required.
`The instruction synchronization circuit may also include
`a synchronization table associating certain data consuming
`instructions and certain data producing instructions. each
`with a ?ag indicating whether the respective data producing
`instruction has been executed. The instruction synchroniza
`tion circuit delays the subsequent instances of the certain
`data consuming instruction only when the prediction asso
`ciated with the data consuming instruction is within a
`predetermined range and when the particular data consum
`ing instmction is in the prediction table and when the ?ag
`indicates that particular data producing instruction has not
`been executed.
`Thus. it is another object of the invention to provide
`reduced instruction overhead even for instructions that have
`
`Valtrus Ex 2007-p. 10
`Google v Valtrus
`IPR2022-01197
`
`
`
`5.781.752
`
`6
`FIG. 8 is a ?gure similar to that of FIGS. 5 and 6 showing
`the prediction and synchronization tables at yet a later point;
`FIG. 9 is a ?ow chart showing the steps performed by the
`predictor circuit of FIG. 1 upon receiving a HANDLE
`MIS-SPECULA'IION message from the data speculation
`circuit;
`FIG. 10 is a ?ow chart showing the steps performed by the
`predictor circuit of FIG. 1 upon receiving a HANDLE
`LOAD message from the data speculation circuit; and
`FIG. 11 is a ?ow chart showing the steps performed by the
`predictor circuit of FIG. 1 upon receiving a HANDLE
`SQUASH message from the data speculation circuit.
`
`DETAILED DESCRIPTION OF THE
`INVENTION
`An Example Program with Data Dependency Referring
`now to the following Table I. an example object code
`program may have three instructions: 11. I2 and I3.
`
`TABLE I
`
`5
`a history of this-speculation when it is unlikely that. in the
`given instance. nus-speculation will occur. If the prediction
`indicates that this-speculation is unlikely. no further syn
`chronization steps need be taken.
`It is yet another object of the invention to allow data
`consuming instructions that are historically dependent on
`preceding data producing instructions to be executed rapidly
`if the preceding data producing instruction has already been
`executed. Thus. if the ?ag indicates that the data producing
`instruction has been executed. the data consuming instruc
`tion may be immediately executed without further waiting.
`The predictor circuit may create an entry in this synchro
`nization table only after mis-speculation has occurred for a
`previous instance of the particular data consuming instruc
`tion and particular data producing instruction of the entry.
`After synchronization has occurred. this entry may be
`removed.
`Thus. it is another object of the invention to provide a
`predictor circuit that minimizes the need for storage. Syn
`chronizing table entries. which may be more numerous than
`the prediction table entries as a result of possible multiple
`instances of each instruction. have entries in the synchroni
`zation table that are dynamically created and released to
`minimize storage requirements. This also minimizes search
`overhead in identifying instructions within this table.
`The foregoing and other objects and advantages of the
`invention will appear from the following description. In this
`description. references made to the accompanying drawings
`which form a part hereof and in which there is shown by way
`of illustration a preferred embodiment of the invention. Such
`embodiment does not necessarily represent the full scope of
`the invention. however. and reference must be made there
`fore to the claims for interpreting the scope of the invention.
`
`20
`
`25
`
`30
`
`These instructions provide for a storage of data from a
`processor to a memory location and the loading of data from
`the memory location to the processor.
`The ?rst instruction 11 stores data from a processor
`register (implied in the instruction) to memory at a speci?c
`address determined by the contents of register R1 of the
`processing unit. Generally. the contents of register R1 will
`have been set by an earlier instruction and cannot be
`deduced simply by looking at instruction I1.
`The following instructions I2 and 13 each load data from
`a speci?c physical address of memory to the processor
`register. Again. the registers R2 and R3 have been set by
`earlier instructions and the relevant addresses cannot be
`determined from these instructions alone.
`If a parallel processor were to load the three instructions
`11-13 for parallel processing. there would be two possible
`data dependencies. I2 may be dependent on 11 insofar that
`the contents of register R1 and R2 may be the same and
`therefore the information stored in M(R1) may be the same
`that is loaded from memory location M(R2). For instruction
`I2 to be executed correctly. it may be necessary that instruc
`tion 12 wait for instruction 11. Likewise. instruction I3 may
`be dependent on instruction I1. That is. the address M(R3)
`may be the same as the physical address M(Rl).
`Each of these dependencies is ambiguous. That is. it
`cannot be determined whether there is in fact a dependency
`without knowing the contents of registers R1. R2 and R3
`which cannot be deduced from the instructions alone.
`In a parallel processor that seeks to load blocks of
`instructions and execute them in parallel. possibly out of
`their memory or program order. it is important to identify
`and address ambiguous data dependencies to eliminate mis
`speculation and time consuming instruction squashing.
`
`General Processor Architecture
`Referring now to FIG. 1. an ILP processor 10 suitable for
`use with the present invention includes a memory 12 having
`a portion 14 holding a stored program 16 at a plurality of
`physical addresses 19 here depicted as xxl-xx6 where the
`values xx indicate some higher ordered address bits that may
`be ignored in this example.
`
`35
`
`45
`
`55
`
`BRIEF DESCRIPTION OF THE SEVERAL
`VIEWS OF THE DRAWINGS
`FIG. 1 is a block diagram showing the architecture of an
`instruction level parallel processor having multiple process
`ing units. an allocation unit allocating instructions of a
`program in memory to the processing units. and a retirement
`unit retiring the parallel processed instructions. the latter
`unit incorporating the predictor circuit of the present inven
`tion;
`FIG. 2 is a fragmentary listing of a source code program
`and its corresponding object code instructions stored in
`memory order at physical addresses in memory and as
`unwound in program order producing multiple instances of
`each object code instruction in execution;
`FIG. 3 is a ?ow chart of the operation of a typical data
`speculation circuit of the processor of FIG. 1 as modi?ed to
`work with the predictor circuit of the present invention to
`provide READY TO LOAD. HANDLE STORE. HANDLE
`MIS-SPECULATION. HANDLE LOAD. and HANDLE
`SQUASH messages to the predictor circuit of the present
`invention;
`FIG. 4 is a flow chart showing the steps performed by the
`predictor circuit of FIG. 1 upon receiving a READY TO
`LOAD message from the data speculation circuit;
`FIG. 5 is fragmentary view of prediction and synchroni
`zation tables used by the predictor circuit of FIG. 1;
`FIG. 6 is a ?gure similar to that of FIG. 5 showing the
`prediction and synchronization tables at a later point in the
`operation of the predictor circuit;
`FIG. 7 is a ?ow chart showing the steps performed by the
`65
`predictor circuit of FIG. 1 upon receiving a HANDLE
`STORE message from the data speculation circuit;
`
`Valtrus Ex 2007-p. 11
`Google v Valtrus
`IPR2022-01197
`
`
`
`5.781.752
`
`8
`dictor circuit 33 provides a dynamic indication to the data
`speculation circuit 30 as to whether data speculation should
`be performed. The data speculation circuit 30 may then.
`based on the indication of the predictor circuit 33 stall the
`execution of a memory operation at the processing units 24
`in order to avoid mis-speculation.
`The prediction provided by the predictor circuit 33. as will
`be described. is updated based on historical this-speculations
`detected by the data speculation circuit 30. For this reason.
`the data speculation circuit 30 must communicate with the
`predictor circuit 33 on an ongoing basis. Generally. as will
`be described below. the data speculation circuit 30 provides
`?ve signals to the predictor circuit 33 as listed in the
`following Table 11.
`
`Signal Name
`
`TABLE II
`
`Description
`
`HANDLE STORE
`
`READY TO LOAD
`
`HANDLE M'lS-SPECLLA'I'ION The data speculation circuit has
`detected a nus-speculation.
`The data speculation circuit has
`decided to issue a STORE operation.
`The data speculation circuit is
`about to perform a speculative LOAD
`operation and needs information
`from the predictor as to whether
`the LOAD should wait.
`The data speculation circuit has
`decided to perform a non
`speculative LOAD operation without
`data dependency.
`The data speculation circuit has
`issued a squash for a particular
`instruction either as the result of
`data or control this-speculation.
`
`HANDLE LOAD
`
`HANDLE SQUASH
`
`10
`
`15
`
`20
`
`25
`
`7
`The data speculation circuit 30 receives signals from the
`allocation circuit 20 that notify it of the program order of any
`instructions that are allocated to the processing units 24 and
`that will access memory. The data speculation circuit is
`responsible of keeping track of order of the memory opera
`tions as they are performed by the processing units so that
`it can detect any mis-speculations.
`The ILP processor 10 includes an allocation circuit 20
`which may read memory 12 and in particular program
`memory portion 14 to fetch a subset of the program 16
`encompassing multiple instructions of an instruction win
`dow 22. Generally. as is understood in the art. the allocation
`circuit 20 sends dilferent ones of these instructions to
`different independent processing units 24 for execution.
`The processing units 24 may communicate with memory
`12 for the purpose of obtaining data. but generally do not
`modify portions of the memory 12 that may be also read by
`other processing units 24. Thus. for example. an instruction
`which requires data to be obtained from memory 12 and
`added to a constant may be completely executed. However.
`an instruction. which stores a register to memory 12. may
`read the register but stop before the store operation which
`may modify memory 12 used by other processing units 24.
`An instruction that has been executed as far as possible is
`considered “ready to commit the operation”. Prior to reading
`data from memory or requesting a store operation the
`processing units 24 notify the data speculation unit 30 of the
`operation so that the latter can. in conjunction with the
`allocation unit. keep track of the program and execution
`order of the operations.
`A retirement circuit 26 receives signals from the process
`ing units 24 indicating that their instructions are ready to
`perform the operation. The retirement circuit 26 then retires
`the instructions by writing any computed values to memory
`12.
`Prior to the retirement circuit 26 writing values to
`memory. a data speculation circuit 30 communicating with
`the allocation circuit 20 and the retirement circuit 26 detects
`mis-speculation. As described above. mis-speculation
`occurs in an instruction that has been executed prematurely
`and erroneously. Whenever a store instruction is ready to
`commit and write its data to a memory address. the data
`speculation circuit 30. checks to see if any subsequent in the
`instruction window load instructions have accessed the same
`memory address. and if so instructs the allocation circuit 20
`and retirement circuit 26 that these load instructions are to
`be squashed and ire-allocated by the allocation circuit 20 at
`a later time. Thus. for example. in the program Table L if
`instructions I1 through I3 represent the instruction window
`22 and instruction B has accessed memory prior to II
`writing to memory. the data speculation circuit 30 and at the
`time I1 is ready to commit checks if I3 has accessed the same
`memory address as 11. and if so it instructions the allocation
`circuit 20 and the retirement circuit 26 that B is to be
`squashed and reallocated at a later time. If so. the data
`speculation circuit 30 instructs the allocation circuit and
`retirement circuit 26 that instruction I3 is to be squashed and
`must be reallocated by the allocation circuit 20 at a later
`time. The writing of results from instructions that have not
`been squashed is then done by the retirement circuit 26 as
`indicated by arrow 28.
`The retirement circuit 26. the data speculation circuit 30.
`and the allocation circuit 20 communicate by a number of
`conn'ol signals 35. Each of these elements described above
`are well lmown in the art.
`The data speculation circuit 30 also communicates with
`the predictor circuit 33 of the present invention. The pre
`
`35
`
`45
`
`55
`
`65
`
`Generally. prior to each instruction being retired. the
`instruction is provided to the data speculation circuit 30
`which detects nus-speculations. The retirement circuit 26
`also instructs the data speculation circuit 30 when a squash
`instruction i