throbber
Umted States Patent [191
`Moshovos et al.
`
`IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
`
`[11] Patent Number:
`[45] Date of Patent:
`
`5,781,752
`Jul. 14, 1998
`
`[54] TABLE BASED DATA SPECULATION
`CIRCUIT FOR PARALLEL PROCESSING
`COMPUTER
`
`5,666,506
`
`9/1997 Hesson et a1. ........................ .. 395/392
`
`OTHER PUBLICATIONS
`
`[75] Inventors; Andreas L Moshovos; Scott E
`Breach2 Terani N_ Vijaykumar;
`Gurindar S. Sohi. all of Madison. Wis.
`
`Gurinda Sohi et al.. Instruction Issue Logic for High-Per
`formance Interruptable Pipelined Processors; ACM 1987.
`PP- 27-34
`
`[73] Assignee: Wisconsin Alumni Research
`Foundation. Madison. Wis.
`
`Primary Examiner-Krisna Lim
`Attorney; Agent, or Firm-Quarles & Brady
`
`[21] Appl. No.: 773,992
`-
`_
`Dec‘ 26’ 19%
`[22] Fllcd'
`[51] Int. Cl.6 ...................................................... .. G06F 9/38
`395/392
`............................................. .. 395/392
`
`
`
`[52] US. Cl. [5 8] Field of Search
`
`[56]
`
`Refewnces Cited
`
`US. PATENT DOCUMENTS
`
`ABSTRACT
`[57]
`A predictor circuit permits advanced execution of instruc
`tions depending for their data on previous instructions by
`predicting such dependencies based on previous mis
`speculations detected at the ?nal stages of processing. Syn
`chronilatio? of dependent instructions is Provided by a table
`creating entries for each instance of potential dependency.
`Table entries are created and deleted dynamically to limit
`total memory requirements.
`
`5,664,138
`
`9/1997 Yoshida ................................. .. 395/395
`
`9 Claims, 7 Drawing Sheets
`
`4O
`
`INSTRUCTION FROM
`RETIREMENT CIRCUIT
`
`66
`
`WILL
`THIS BE A
`DATA SPECULATIVE
`LOAD
`
`HANDLE f 70
`READY To LDAD
`
`72
`No
`
`YES f 80
`WAIT UNTIL ANY OF THE
`FOLLOWING EVENTS OCCURS:
`@wAKE UP
`CONSUMER NO LONGER
`DATA SPECULATIVE
`@ sDuAsHED
`
`57
`\ HANDLE
`MIS-SPECULATION
`64
`\ HslilégLEE
`
`6D
`
`‘
`:5
`THERE A
`SUBSEQUENT LDAD
`THAT HAS NoT BEEN
`SQUASHED
`
`52
`\ SQUASH
`LDAD
`
`I
`74\ ISSUE MD
`REQUEST
`I
`76 ‘L
`WAIT UNTIL ANY
`OF THE FOLLOWING
`EvENTs OCCURS:
`@souAsHED
`@LDAD No
`LONGER DATA
`SPECULATIVE
`
`4e
`f
`HANDLE
`SQUASH
`I
`
`I
`
`NO
`
`6B
`\ HANDLE
`L0AD
`
`I
`86
`\ ISSUE
`LOAD
`REQUEST
`
`r
`
`v
`
`Valtrus Ex 2007-p. 1
`Google v Valtrus
`IPR2022-01197
`
`

`

`Valtrus Ex 2007-p. 2
`Google v Valtrus
`IPR2022-01197
`
`

`

`U.S. Patent
`
`Jul. 14, 1998
`
`Sheet 2 0f 7
`
`5,781,752
`
`40
`
`INSTRUCTION FROM
`RETIREMENT CIRCUIT
`
`No
`
`A
`
`WM
`THIS BE A
`
`66
`
`DATA SPECULATIVE
`LOAD
`
`IS THIS
`A sTORE OR A
`LOAD
`
`HANDLE
`READY TO LOAD
`
`/70
`
`72
`NO
`
`YES [80
`wAlT UNTIL ANY OF THE
`FOLLOWING EvENTs OCCURS:
`@WAKE UP
`@cONsuMER NO LONGER
`DATA SPECULATIVE
`@sQuAsHED
`
`——-->
`
`57 \
`
`HANDLE
`MlS-SPECULATION
`
`sOuAsHED
`?
`
`HANDLE
`STORE
`
`THERE A
`SUBSEQUENT LOAD
`THAT HAS NOT BEEN
`SQUAEHED
`
`YES
`62
`\ SQUASH
`
`v
`74
`\ ISSUE LOAD
`REOuEsT
`76 T
`T
`WAIT UNTIL ANY
`OF THE FOLLOWING
`EVENTS OCCURS:
`@SQUASHED
`@LDAD NO
`LONGER DATA
`SPECULATIVE
`
`REQUEST
`
`-’
`

`
`f
`HANDLE
`SQUASH
`
`Valtrus Ex 2007-p. 3
`Google v Valtrus
`IPR2022-01197
`
`

`

`US. Patent
`
`Jul. 14, 1998
`
`Sheet 3 0f 7
`
`5,781,752
`
`HANDLE READY TO LOAD A70
`
`NO
`
`LOAD IN
`THE PRED. TABLE
`?
`
`FIG. 4
`
`SYNCHRONIZATION
`REQUIRED
`?
`
`106
`
`SYNC.
`TABLE ENTRY
`EXISTS FOR THE
`PARTICULAR LOAD / STORE
`PAIR AND W /THE GIVEN
`DATA ADDRESS
`?
`
`THE SYNC.
`FLAG SET TO 1
`7
`
`[120
`
`YES
`UPDATE THE
`PREDICTION
`TOWARDS "DO NOT
`SYNCHRONIZE"
`122
`I
`FREE SYNC. TABLE ENTRY /
`
`flog
`SELECT A SYNC. TABLE ENTRY
`
`[110
`I
`—FILL IN STORE & LOAD
`INSTRUCTION ADDRESSES
`
`—FILL IN THE DATA ADDRESS
`ACCESSED BY THE CONSUMER
`—FILL IN LDAD ID
`—SET SYNC. FLAG TO D
`
`I
`WA|T_1 /-116
`
`'
`
`WAIT=O / 102
`I
`
`DONE
`
`Valtrus Ex 2007-p. 4
`Google v Valtrus
`IPR2022-01197
`
`

`

`US. Patent
`
`Jul. 14, 1998
`
`Sheet 4 0f 7
`
`5,781,752
`
`PREDICTION TABLE
`LD 8
`ST 10
`1
`
`SYNCHRONIZATION TABLE
`
`AC109
`FIG. 5
`
`V
`
`\
`
`\
`
`/
`
`/
`
`SYNCHRONIZATION TABLE
`PREDICTION TABLE
`LD s[xx1 ST 10
`0‘
`xx‘
`u) 8
`ST 10
`1
`.f
`ii» ___,__
`1
`\
`_
`109
`\112
`\114
`
`\F
`
`FIG. 6
`
`PREDLCTION TABLE
`LD
`LD 8
`ST 10
`1
`.f
`A<-- “if
`\
`109
`
`SYNCHRONIZATION TABLE
`ST [XX]
`1+
`\
`\
`\209
`‘112
`
`XX
`it»
`210/
`
`Valtrus Ex 2007-p. 5
`Google v Valtrus
`IPR2022-01197
`
`

`

`U.S. Patent
`
`Jul. 14, 1998
`
`Sheet 5 of 7
`
`5,781,752
`
`64
`\ HANDLE STORE
`
`201
`
`IS
`STORE
`IN THE PREDICTION
`TABLE
`7
`
`SYNCHRONIZATION
`REQUIRED
`?
`
`204
`
`SYNC.
`TABLE ENTRY
`EXISTS FOR THE
`LOAD / STORE PAIR AND
`W / THE SAME
`DATA ADDR.
`
`212
`
`THE SYNC.
`FLAG OF THE
`ENTRY 1
`7
`
`21
`NO
`4\
`UPDATE PRED|CT|ON
`TOWARDS "SYNCHRONIZE"
`I
`
`21s\ WAKE UP LOAD
`21s\
`I
`FREE SYNC. TABLE ENTRY
`
`DONE
`
`FIG. 7
`
`NO
`
`206
`f
`v
`SELECT A SYNC. TABLE ENTRY
`
`208
`I
`y
`—FILL IN STORE & LOAD
`INSTRUCTION ADDRESSES
`-FILL IN THE DATA ADDRESS
`ACCESSED BY THE STORE
`—SET THE SYNC. FLAG TO 1
`—FILL IN STORE ID
`
`Valtrus Ex 2007-p. 6
`Google v Valtrus
`IPR2022-01197
`
`

`

`US. Patent
`
`Jul. 14, 1998
`
`Sheet 6 of 7
`
`5,781,752
`
`HANDLE MlS-SPECULATION ~55
`
`301
`
`FIG. 9
`
`YES
`
`A PRED. TABLE ENTRY
`FOR THIS LOAD /STORE
`PAIR
`
`_
`REPLACE-1
`
`30s
`/_I
`UPDATE ENTRY’S PREDICTOR
`TOWARDS "DO NoT SYNCHRONIZE"
`
`YES NO
`
`306
`
`'3
`THERE A
`PRED. TABLE
`ENTRY FOR THE
`LOAD
`312
`?
`/
`No
`REPLACE=0
`
`309
`
`IS THE
`PREDICTOR
`BELOW THE REPLACE
`LIMIT
`3’
`
`K310
`FREE PRED. TABLE ENTRY
`REPLACE=1
`
`IS
`THERE A
`PRED. TABLE
`ENTRY FOR THE
`STORE
`7'
`
`314
`
`YES
`
`316
`/
`UPDATE ENTRY’S PREDICTOR
`TowARDs "DO NoT SYNCHRONIZE"
`
`r’
`Y
`UPDATE DATE
`PREDTCToR
`TOWARDS
`"SYNCHRONIZE"
`
`NO
`
`NO
`
`322
`
`REPLACE=0
`
`[8 THE PREDICTOR
`
`BELOW THE REPLACE
`LIMIT
`’
`
`YES
`
`/, 320
`
`24
`
`YES
`
`FREE PRED. TABLE ENTRY
`REPLACE=1
`
`[326
`ALLocATE A PREDICTION TABLE ENTRY I
`J,
`/328
`-FILL IN THE ADDRESSES OF THE LoAD AND sTDRE
`-SET PREDICTOR TO THE DEFAULT VALUE
`
`REPLACE
`:
`2
`N0
`
`DONE
`
`Valtrus Ex 2007-p. 7
`Google v Valtrus
`IPR2022-01197
`
`

`

`US. Patent
`
`Jul. 14, 1998
`
`Sheet 7 of 7
`
`5,781,752
`
`HANDLE LOAD / 68
`
`402
`
`is
`THERE A
`SYNC. TABLE ENTRY
`FOR THIS LOAD
`(LOAD ID)
`7
`
`NO
`
`FIG. 10
`
`YES
`
`404
`FREE THAT ENTRY /
`
`406
`
`IS
`THERE A
`PREDICTION TABLE
`ENTRY FOR THIS
`LOAD
`?
`YES I 408
`UPDATE PREDICTOR
`TOWARDS "DO NOT
`SYNCHRONIZE"
`
`——»1
`
`DONE
`
`HANDLE SQUASH
`
`/46
`
`FIG. 11
`
`502
`
`IS
`THERE A
`SYNC. TABLE ENTRY
`FOR THIS INST.
`7
`
`504 /
`FREE SYNC. TABLE ENTRY
`
`—__’I
`
`DONE
`
`Valtrus Ex 2007-p. 8
`Google v Valtrus
`IPR2022-01197
`
`

`

`1
`TABLE BASED DATA SPECULATION
`CIRCUIT FOR PARALLEL PROCESSING
`COMPUTER
`
`This invention was made with United States government
`support awarded by the following agencies:
`ARPA Grant No. DABT63-95-C-O127;
`ONR. Grant No. NOOO14-93-l-0465; and
`NSF. Grant Nos.: CCR-9303030 and MIP-9505853.
`The United States has certain rights in this invention.
`
`FIELD OF THE INVENTION
`
`The invention relates generally to architectures of elec
`tronic computers and speci?cally to electronic computers for
`parallel processing.
`
`BACKGROUND OF THE INVENTION
`
`5.781.752
`
`2
`instructions that use data on earlier instructions that change
`the data. These latter instructions may correctly execute only
`if the earlier instructions using the same data do not change
`the common data or have completed the change of the
`common data. A dependency is “unambiguous” if it neces
`sarily produces an error when the dependent instruction is
`executed before the instruction on which it is dependent. The
`dependence of an instruction may remain ambiguous until
`both instructions involved are executed. A dependency
`between two instructions is ambiguous if it cannot be
`determined whether a dependency really exists without
`executing the instructions.
`Usually. instructions that are data dependent must be
`executed in the program order.
`
`Control and Data Speculation
`
`25
`
`Since dependencies are often ambiguous. as the ILP
`processing unit prepares to execute an instruction. it cannot
`always determine if the instruction will in fact be dependent
`20
`‘on earlier instructions that have not yet completed their
`execution. In the case of ambiguous dependencies and
`unless special circuitry is provided as will be explained in
`the next paragraph. the ILP processing unit is forced to
`assume dependencies exist.
`However. it is quite often the case that an ambiguous
`dependency is resolved as no dependency at all. For this
`reason. some ILP processors may provide for “speculation”.
`that is. execution of an instruction that has ambiguous
`dependency as if it had no dependency at all. One may
`speculate on control dependencies and on data dependen
`cies. Control speculation. for example. might involve
`executing an instruction that follows a branch instruction
`without knowing the outcome of the branch (and thus
`whether the following instruction should have been executed
`or was branched around). Data speculation. for example.
`might involve reading from memory to obtain data for a later
`instruction. even though there are earlier STORES to that
`memory location that have not yet been completed and that
`may change the value of the memory location.
`
`General Computer Architecture and Instruction
`Level Parallel (ILP) Processing
`In an electronic computer with a single processing unit.
`the processing unit may communicate with a memory hold
`ing the data and program instructions. The processing unit
`may also hold data in internal registers. The program
`instructions are executed by the processing unit which
`operates on the data according to the instructions.
`Typically. the processing unit repeats a series of fetch/
`execute cycles in which each instruction is ?rst fetched from
`memory and then executed. The order in which the instruc
`tions are executed is determined by the value of a program
`counter within the processing unit. After each execution of
`an instruction. the program counter normally increases in
`value by one so that the next instruction memory is fetched.
`The order of the instructions in the stored program will be
`termed “memory order”.
`Some instructions. when executed. cause data to be
`loaded into the processing unit from memory or stored from
`the processing unit to memory. Other instructions may
`perform their operation on data that are stored in registers
`without loading or storing data from or to memory. Still
`other instructions change the value of the program counter
`permitting the processing unit to jump or branch through the
`program instructions according to a “program order” that
`normally dilfers from the memory order. The branch or
`jumps in a program may be conditional on values of data
`used by the program that may change as the program is
`executed.
`One method of increasing the speed of electronic com
`puters involves using multiple processing and/or functional
`units to execute multiple instructions at the same time or in
`an “execution order” differing from the program order.
`Computers using this technique are termed “parallel pro
`cessing units”. An instruction level parallel (“ILP processing
`unit”) is one where individual instructions of a single
`program are separated to be run on different processing units
`in contrast to systems in which independent programs may
`be assigned different processing units. for example.
`
`35
`
`45
`
`55
`
`Squashing
`Control and data dependencies are important in an ILP
`processing unit which. in the course of execution of
`instructions. may execute some dependent instructions
`before the instructions on which they are dependent. If the
`dependency is unambiguous. then the results of the prema
`turely executed dependent instructions must be discarded
`(“squashed”) and the instructions of the correct branch
`executed.
`Squashing instructions is a time consuming process that to
`some extent defeats the advantages to be gained from
`parallel processing. In order to avoid the problems associ
`ated with unambiguous dependencies. it is necessary that
`when the ILP processing unit speculates. that it is mostly
`correct.
`
`Speculation in an ILP Processor
`
`Control and Data Dependencies
`There are two types of dependencies exhibited by instruc
`tions. “Control dependency" is a dependency of instructions
`after a conditional branch or jump instruction on whether the
`branch or jump was taken. For example. instructions imme
`diately after a branch are properly executed only if the
`branch is not taken. “Data dependency” is a dependency of
`
`65
`
`In an ILP processor. the processor may fetch multiple
`instructions at a single time and an allocation circuit allo
`cates those instructions to separate processing units. The
`separate processing units may read data from memory and
`perform arithmetic or logical operations on that data. A data
`speculation circuit may exist to detect data dependencies and
`report any mis-speculation to a retirement circuit. The
`retirement circuit collects results generated by the indepen
`
`Valtrus Ex 2007-p. 9
`Google v Valtrus
`IPR2022-01197
`
`

`

`5,781,752
`
`3
`dent processing units and “retires” the instructions executed
`by those processing units by writing ?nal results to memory.
`The retirement circuitry also resolves rnis-speculation
`detected by the data speculation circuit. that is. instructions
`that were executed out of program order but were in fact
`dependent on an earlier instruction in the execution order
`and have produced an erroneous result. These instructions
`are squashed as is understood in the art. Data speculation
`circuitry currently does not decide when to do data specu
`lation. Either all memory accesses are speculated or none at
`all.
`For either type of speculation to be successful (control or
`data speculation). the performance cost associated with the
`speculation must be low. The performance cost is a function
`of the frequency that speculation is required. the time
`required to perform the speculation. the probability of mis
`speculation and the time required to recover from a mis
`speculation. A cumbersome or inaccurate speculation system
`may hurt overall system performance. especially when many
`dependencies must be evaluated.
`Because the ]LP processing is relevant primarily for high
`speed processing units. the speculation process must be
`implemented in circuitry rather than software. The process
`of speculation must be capable of rapid execution using
`limited amounts of high speed memory.
`Control speculation is well understood in the art. The
`points were control speculation is needed are clearly iden
`ti?ed by conditional branch and jump instructions (we will
`refer to these instructions as “control transfer instructions”).
`Typically all control transfer instructions are speculated
`since: ( 1) control transfer instructions are relatively
`frequent. (2) relatively few instructions from those that
`appear between two consecutive control transfer instructions
`can be executed in parallel. (3) typically the performance of
`a processor that always mis_speculates on control is virtually
`the same as the performance of a processor that never
`speculates on control.
`In contrast the points where data speculation is needed are
`not clear since any instruction loading data from memory
`can be data dependent on any previous instructions that
`writes to memory. Consequently . predicting and tracking
`data dependencies. “data dependence speculation” can eas
`ily become overwhelming. Furthermore. the cost of a data
`this-speculation typically cannot be neglected. that is the
`performance of a processor that always mis-speculates on
`data dependencies is far less than that of a processor that
`never speculates on data.
`
`50
`
`55
`
`BRIEF SUMMARY OF THE INVENTION
`The present inventors have recognized that most data
`dependence this-speculations can be attributed to a few
`static STORE/LOAD instruction pairs. Furthermore. these
`instruction pairs exhibit “temporal locality” that is. if one
`LOAD/STORE pair causes a data mis-speculation at a given
`point in time. it is highly likely that a later instance of the
`same pair will soon cause another mis-speculation. For this
`reason. a table based approach may be used in which data
`dependent instructions likely to be a source of mis
`speculation. are stored in a small. high speed memory. These
`particular instruction pairs can be identi?ed based on pre
`vious mis-speculations.
`Very low overhead in a table based data dependence
`speculation is obtained by a three-tiered approach. If there is
`no history of data mis-speculation. an instruction is executed
`without further inquiry. This will be the case for most data
`independent instructions. If there has been a mis-speculation
`
`20
`
`25
`
`30
`
`35
`
`4
`with a given LOAD instruction. a predictor based on the past
`history of mis-speculations for that LOAD instruction is
`employed to determine whether the instruction should be
`executed or delayed Thus. instructions that are typically not
`dependent may be executed immediately. If the instruction
`is to be delayed. a synchronization table is used to determine
`when the instruction is to be performed.
`Speci?cally. the present invention provides a speculation
`decision circuit for use in a processor capable of executing
`program instructions in an execution order ditfering from the
`program order of the instructions. the processing unit further
`having a data speculation circuit for detecting data depen
`dence between instructions and detecting a mis-speculation
`where a data consuming instruction dependent for its data on
`a data producing instruction of earlier program order is. in
`fact. executed before the data consuming instruction. The
`speculation decision circuit includes a predictor circuit
`receiving the mis-speculation signal from the data specula
`tion circuit to produce a prediction associated with the
`particular data producing/consuming instruction pair and
`based on the mis-speculation indication. A prediction thresh
`old detector prevents data speculation for instructions hav
`ing a prediction value within a predetermined range. This
`prediction threshold detector may include an instruction
`synchronizing circuit instructing a processing unit to delay
`a later execution of the particular data consuming instruction
`until after the execution of the particular data producing
`instruction when the prediction associated with the data
`producing/consuming instruction pair is within a predeter
`mined range.
`Thus. it is one object of the invention to provide a
`predictor circuit that may identify data dependencies on an
`on-going or dynamic basis. Recognizing that there are
`relatively few instructions which will cause data mis
`speculations. these instructions are identi?ed by reference to
`historical mis- speculation associated with the instructions as
`stored in a prediction.
`The instruction synchronization circuit may include a
`prediction table listing certain data consuming instructions
`and certain data producing instructions each associated with
`a prediction. The instruction synchronization circuit will
`delay the particular data consuming instruction only when
`the prediction associated with the data consuming instruc
`tion is within a predetermined range of predictions and when
`the particular data consuming instruction is in the prediction
`table.
`Thus. it is another object of the invention to provide a
`prediction of data dependence which adds very little over
`head to the execution of instructions that historically have
`not resulted in mis-speculation. If the particular data con
`suming instruction is not in the prediction table. no further
`inquiry into the prediction table is required.
`The instruction synchronization circuit may also include
`a synchronization table associating certain data consuming
`instructions and certain data producing instructions. each
`with a ?ag indicating whether the respective data producing
`instruction has been executed. The instruction synchroniza
`tion circuit delays the subsequent instances of the certain
`data consuming instruction only when the prediction asso
`ciated with the data consuming instruction is within a
`predetermined range and when the particular data consum
`ing instmction is in the prediction table and when the ?ag
`indicates that particular data producing instruction has not
`been executed.
`Thus. it is another object of the invention to provide
`reduced instruction overhead even for instructions that have
`
`Valtrus Ex 2007-p. 10
`Google v Valtrus
`IPR2022-01197
`
`

`

`5.781.752
`
`6
`FIG. 8 is a ?gure similar to that of FIGS. 5 and 6 showing
`the prediction and synchronization tables at yet a later point;
`FIG. 9 is a ?ow chart showing the steps performed by the
`predictor circuit of FIG. 1 upon receiving a HANDLE
`MIS-SPECULA'IION message from the data speculation
`circuit;
`FIG. 10 is a ?ow chart showing the steps performed by the
`predictor circuit of FIG. 1 upon receiving a HANDLE
`LOAD message from the data speculation circuit; and
`FIG. 11 is a ?ow chart showing the steps performed by the
`predictor circuit of FIG. 1 upon receiving a HANDLE
`SQUASH message from the data speculation circuit.
`
`DETAILED DESCRIPTION OF THE
`INVENTION
`An Example Program with Data Dependency Referring
`now to the following Table I. an example object code
`program may have three instructions: 11. I2 and I3.
`
`TABLE I
`
`5
`a history of this-speculation when it is unlikely that. in the
`given instance. nus-speculation will occur. If the prediction
`indicates that this-speculation is unlikely. no further syn
`chronization steps need be taken.
`It is yet another object of the invention to allow data
`consuming instructions that are historically dependent on
`preceding data producing instructions to be executed rapidly
`if the preceding data producing instruction has already been
`executed. Thus. if the ?ag indicates that the data producing
`instruction has been executed. the data consuming instruc
`tion may be immediately executed without further waiting.
`The predictor circuit may create an entry in this synchro
`nization table only after mis-speculation has occurred for a
`previous instance of the particular data consuming instruc
`tion and particular data producing instruction of the entry.
`After synchronization has occurred. this entry may be
`removed.
`Thus. it is another object of the invention to provide a
`predictor circuit that minimizes the need for storage. Syn
`chronizing table entries. which may be more numerous than
`the prediction table entries as a result of possible multiple
`instances of each instruction. have entries in the synchroni
`zation table that are dynamically created and released to
`minimize storage requirements. This also minimizes search
`overhead in identifying instructions within this table.
`The foregoing and other objects and advantages of the
`invention will appear from the following description. In this
`description. references made to the accompanying drawings
`which form a part hereof and in which there is shown by way
`of illustration a preferred embodiment of the invention. Such
`embodiment does not necessarily represent the full scope of
`the invention. however. and reference must be made there
`fore to the claims for interpreting the scope of the invention.
`
`20
`
`25
`
`30
`
`These instructions provide for a storage of data from a
`processor to a memory location and the loading of data from
`the memory location to the processor.
`The ?rst instruction 11 stores data from a processor
`register (implied in the instruction) to memory at a speci?c
`address determined by the contents of register R1 of the
`processing unit. Generally. the contents of register R1 will
`have been set by an earlier instruction and cannot be
`deduced simply by looking at instruction I1.
`The following instructions I2 and 13 each load data from
`a speci?c physical address of memory to the processor
`register. Again. the registers R2 and R3 have been set by
`earlier instructions and the relevant addresses cannot be
`determined from these instructions alone.
`If a parallel processor were to load the three instructions
`11-13 for parallel processing. there would be two possible
`data dependencies. I2 may be dependent on 11 insofar that
`the contents of register R1 and R2 may be the same and
`therefore the information stored in M(R1) may be the same
`that is loaded from memory location M(R2). For instruction
`I2 to be executed correctly. it may be necessary that instruc
`tion 12 wait for instruction 11. Likewise. instruction I3 may
`be dependent on instruction I1. That is. the address M(R3)
`may be the same as the physical address M(Rl).
`Each of these dependencies is ambiguous. That is. it
`cannot be determined whether there is in fact a dependency
`without knowing the contents of registers R1. R2 and R3
`which cannot be deduced from the instructions alone.
`In a parallel processor that seeks to load blocks of
`instructions and execute them in parallel. possibly out of
`their memory or program order. it is important to identify
`and address ambiguous data dependencies to eliminate mis
`speculation and time consuming instruction squashing.
`
`General Processor Architecture
`Referring now to FIG. 1. an ILP processor 10 suitable for
`use with the present invention includes a memory 12 having
`a portion 14 holding a stored program 16 at a plurality of
`physical addresses 19 here depicted as xxl-xx6 where the
`values xx indicate some higher ordered address bits that may
`be ignored in this example.
`
`35
`
`45
`
`55
`
`BRIEF DESCRIPTION OF THE SEVERAL
`VIEWS OF THE DRAWINGS
`FIG. 1 is a block diagram showing the architecture of an
`instruction level parallel processor having multiple process
`ing units. an allocation unit allocating instructions of a
`program in memory to the processing units. and a retirement
`unit retiring the parallel processed instructions. the latter
`unit incorporating the predictor circuit of the present inven
`tion;
`FIG. 2 is a fragmentary listing of a source code program
`and its corresponding object code instructions stored in
`memory order at physical addresses in memory and as
`unwound in program order producing multiple instances of
`each object code instruction in execution;
`FIG. 3 is a ?ow chart of the operation of a typical data
`speculation circuit of the processor of FIG. 1 as modi?ed to
`work with the predictor circuit of the present invention to
`provide READY TO LOAD. HANDLE STORE. HANDLE
`MIS-SPECULATION. HANDLE LOAD. and HANDLE
`SQUASH messages to the predictor circuit of the present
`invention;
`FIG. 4 is a flow chart showing the steps performed by the
`predictor circuit of FIG. 1 upon receiving a READY TO
`LOAD message from the data speculation circuit;
`FIG. 5 is fragmentary view of prediction and synchroni
`zation tables used by the predictor circuit of FIG. 1;
`FIG. 6 is a ?gure similar to that of FIG. 5 showing the
`prediction and synchronization tables at a later point in the
`operation of the predictor circuit;
`FIG. 7 is a ?ow chart showing the steps performed by the
`65
`predictor circuit of FIG. 1 upon receiving a HANDLE
`STORE message from the data speculation circuit;
`
`Valtrus Ex 2007-p. 11
`Google v Valtrus
`IPR2022-01197
`
`

`

`5.781.752
`
`8
`dictor circuit 33 provides a dynamic indication to the data
`speculation circuit 30 as to whether data speculation should
`be performed. The data speculation circuit 30 may then.
`based on the indication of the predictor circuit 33 stall the
`execution of a memory operation at the processing units 24
`in order to avoid mis-speculation.
`The prediction provided by the predictor circuit 33. as will
`be described. is updated based on historical this-speculations
`detected by the data speculation circuit 30. For this reason.
`the data speculation circuit 30 must communicate with the
`predictor circuit 33 on an ongoing basis. Generally. as will
`be described below. the data speculation circuit 30 provides
`?ve signals to the predictor circuit 33 as listed in the
`following Table 11.
`
`Signal Name
`
`TABLE II
`
`Description
`
`HANDLE STORE
`
`READY TO LOAD
`
`HANDLE M'lS-SPECLLA'I'ION The data speculation circuit has
`detected a nus-speculation.
`The data speculation circuit has
`decided to issue a STORE operation.
`The data speculation circuit is
`about to perform a speculative LOAD
`operation and needs information
`from the predictor as to whether
`the LOAD should wait.
`The data speculation circuit has
`decided to perform a non
`speculative LOAD operation without
`data dependency.
`The data speculation circuit has
`issued a squash for a particular
`instruction either as the result of
`data or control this-speculation.
`
`HANDLE LOAD
`
`HANDLE SQUASH
`
`10
`
`15
`
`20
`
`25
`
`7
`The data speculation circuit 30 receives signals from the
`allocation circuit 20 that notify it of the program order of any
`instructions that are allocated to the processing units 24 and
`that will access memory. The data speculation circuit is
`responsible of keeping track of order of the memory opera
`tions as they are performed by the processing units so that
`it can detect any mis-speculations.
`The ILP processor 10 includes an allocation circuit 20
`which may read memory 12 and in particular program
`memory portion 14 to fetch a subset of the program 16
`encompassing multiple instructions of an instruction win
`dow 22. Generally. as is understood in the art. the allocation
`circuit 20 sends dilferent ones of these instructions to
`different independent processing units 24 for execution.
`The processing units 24 may communicate with memory
`12 for the purpose of obtaining data. but generally do not
`modify portions of the memory 12 that may be also read by
`other processing units 24. Thus. for example. an instruction
`which requires data to be obtained from memory 12 and
`added to a constant may be completely executed. However.
`an instruction. which stores a register to memory 12. may
`read the register but stop before the store operation which
`may modify memory 12 used by other processing units 24.
`An instruction that has been executed as far as possible is
`considered “ready to commit the operation”. Prior to reading
`data from memory or requesting a store operation the
`processing units 24 notify the data speculation unit 30 of the
`operation so that the latter can. in conjunction with the
`allocation unit. keep track of the program and execution
`order of the operations.
`A retirement circuit 26 receives signals from the process
`ing units 24 indicating that their instructions are ready to
`perform the operation. The retirement circuit 26 then retires
`the instructions by writing any computed values to memory
`12.
`Prior to the retirement circuit 26 writing values to
`memory. a data speculation circuit 30 communicating with
`the allocation circuit 20 and the retirement circuit 26 detects
`mis-speculation. As described above. mis-speculation
`occurs in an instruction that has been executed prematurely
`and erroneously. Whenever a store instruction is ready to
`commit and write its data to a memory address. the data
`speculation circuit 30. checks to see if any subsequent in the
`instruction window load instructions have accessed the same
`memory address. and if so instructs the allocation circuit 20
`and retirement circuit 26 that these load instructions are to
`be squashed and ire-allocated by the allocation circuit 20 at
`a later time. Thus. for example. in the program Table L if
`instructions I1 through I3 represent the instruction window
`22 and instruction B has accessed memory prior to II
`writing to memory. the data speculation circuit 30 and at the
`time I1 is ready to commit checks if I3 has accessed the same
`memory address as 11. and if so it instructions the allocation
`circuit 20 and the retirement circuit 26 that B is to be
`squashed and reallocated at a later time. If so. the data
`speculation circuit 30 instructs the allocation circuit and
`retirement circuit 26 that instruction I3 is to be squashed and
`must be reallocated by the allocation circuit 20 at a later
`time. The writing of results from instructions that have not
`been squashed is then done by the retirement circuit 26 as
`indicated by arrow 28.
`The retirement circuit 26. the data speculation circuit 30.
`and the allocation circuit 20 communicate by a number of
`conn'ol signals 35. Each of these elements described above
`are well lmown in the art.
`The data speculation circuit 30 also communicates with
`the predictor circuit 33 of the present invention. The pre
`
`35
`
`45
`
`55
`
`65
`
`Generally. prior to each instruction being retired. the
`instruction is provided to the data speculation circuit 30
`which detects nus-speculations. The retirement circuit 26
`also instructs the data speculation circuit 30 when a squash
`instruction i

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket