throbber
US 6,385,757 Bl
`
`41
`To optimize an iformat design for a particular application
`program, !he iforrnat system selects custom templates from
`operation issue statistics obtained from scheduling the pro(cid:173)
`gram. The iformat system then generates an iformat based
`on a combination of the custom templates and an abstract
`ISA specification.
`Tbe system uses a re-targetablc compiler to generate the
`operation issues statistics for a particular processor design.
`A:;; shown in FIG. 12, a module called the MOES extractor
`560 generates a machine description in a format called JO
`MOES.
`Tb is machine description re targets the compiler 564 to the
`processor design based on its abstract ISA specification 510
`and datapath specification 514. The compiler 564 then
`schedules a given application program 566 and generates
`operation issue statistics 568 regarding the usage of the
`operation groups in the instruction format templates. '01e
`system thea uses tbe frequency of use of the operations in
`each template by the application program lo compute cus(cid:173)
`tomized templates as shown in step 569. The customization
`process is automated in that it selects custom templates by
`minimizing a cost function that quantifies the static or
`dynamic code size and the decode cost (e.g., measured in
`chip area).
`The process of selecting instruction templates in the
`iformat based on scheduling statistics may be conducted as
`a stand-alone proce&s, or may be conducted in conjunction
`with the automated iformat design process. In the latter case,
`it may be used to provide an initial input specification of the
`desired ILP constraints to the automated iformat design JO
`process. Additionally, it·may be used lo optimize an existing
`iformat design.
`The system may also perform additional oplimizatioa by
`using variable-length field encodings to further reduce the
`instruction size. These optimized designs can lead lo dra- 35
`matic reductions in code size, as showa in the detailed
`description below.
`7.2 !mplementation of the Input Specification
`The principal input of the iformal design process is an
`Abstract Instruction Set Architecture (ISA) specification 40
`510. In the current implementation, the user or another
`program module may provide this specification as an Arch(cid:173)
`Spec in texrual form.
`An ArchSpec reader module converts the textual form of
`the ArchSpec to an abstract ISA spec data structure, which 45
`contains a machine-readable set of tabular parameters and
`constraints, including register file entries, operation groups,
`and exclusion/concurrency relationships.
`7.3 Instruction Syntax
`VLIW processors issue instructions having multiple so
`instruction fields. An instruction field is a set of bit positions
`intended lo be interpreted as an atomic unit within some
`instruction context. Familiar examples are opcode fields,
`source and destination register specifier fields, and literal
`fields. Bits from each of these fields flow from the instruc- 55
`lion register lo control ports in the data path. For example,
`opcode bits flow to functional uoits, and source register bits
`flow lo register file read address ports. Another common
`type of instruction field is a select field. Select fields encode
`a choice between disjoint alternatives and communicate this 60
`context to the decoder. For example, a select bit may indicate
`whether an operand field is to be interpreted as a register
`specifier or as a short literal value.
`An operation is the smallest unit of execution; ii com(cid:173)
`prises an opcode, source operaods, and destination operands. 65
`Each operand may support one or more operand types. A set
`of possible operand types is called an io-set. A list of io-sets,
`
`42
`one per operand, form an operation's io-formal. For
`example, suppose an add operation permits its left source
`operand to be either an integer register or a short literal
`value, and suppose iL' right source and destination operands
`source and sink from integer registers. 'foe corresponding
`io-sets are (gpr, s}, {gpr}, {gpr}. The io-fonnal is simply
`this list of io-sets, which are abbreviated in shorthand
`notation as follows:
`gpr s, gpr . gpr
`Closely related operations such as add and subtract often
`have the same io-formal. One reason for this is that related
`operations may be implemented by a single, multi-function
`unit (macro-cell). As discussed above, to simplify tbe
`instruction formal design process, related operations are
`15 grouped into operation groups.
`111e instruction format assigns sets of op groups (called
`super groups) to slots of an instruction. The processor issues
`operations within an instruction from these slots concur(cid:173)
`rently. To fully specify an operation, the instruction format
`20 specifies both an op-group and an opcode (specific lo that
`opgroup ). In effect, tbis organization factors a fiat opcode
`name space into a multi-tier encoding. In rare cases, this
`factorization may increase the encoding length by one bit
`per level. However, it should be noted that this approach
`25 does not preclude a fiat encoding space: placing each
`operation in its own op-group eliminates the factorization.
`More importantly, hierarchical encoding often gives the
`same benefits as variable-length field encoding, but is sim-
`pler lo implement.
`7 ,4 The Instruction Format Tree
`In a flat, horizontal instruction formal, all instruction
`fields are encoded in disjomt position.s within a single, wide
`instruction. A hierarchical instruction format allows exclu(cid:173)
`sive instruction fields (those that are not used simultaneously
`in any instruction) lo be encoded in overlapping bit
`positions, thereby reducing the overall instruction width, In
`the instruction formal design system shown in FIG. 12, the
`hierarchical relationship between instruction fields is repre(cid:173)
`sented by an instruction formal tree (if-tree). The leaves of
`an if-tree are instruction fields; where each leaf points to a
`control port in the data path, such as a register file address
`port, or an opcode input of a FU.
`FIG. 13 illustrates the structure of an if-tree used in the
`current implementation. 1be overall structure of the tree
`defines how each instruction is built. Each part of the tree
`represents a node, with the lowest nodes (the cut-off-box(cid:173)
`shaped nodes) forming the tree's leaves. The oval-shaped
`nodes are "OR" nodes, while the boxed-shaped nodes are
`"AND" nodes. The OR nodes denote a selection between the
`children of the node such that only one choice (one branch)
`extends lo the next level. Conversely, an AND node allows
`all of the components of the node to form new branches.
`Stated another way, each level of the tree is either a
`conjunction (AND) or disjunction (OR) of the subtrees al the
`lower level.
`The root node 632 of tbe tree is the overall machine
`instruction. This is an OR node representing a choice of
`instruction templates. A template select field (template ID) is
`used lo identify the particular template. This select field is
`illustrated as the leaf node labeled "steer" connected lo the
`instruction node 632.
`[ndividual instructions are based on instruction templates,
`which are the AND-type child nodes of the root node (See,
`e.g., template.s 634 and 636). 1be templates each encode the
`sets of operations that issue concurrently. Since the 1mmber
`of combinatioos of operations that may issue concurrently is
`astronomical, it is necessary to impose some structure oo the
`
`0181
`
`Volkswagen 1008 - Part 2 of 3
`
`

`
`US 6,385,757 Bl
`
`43
`encoding within each lemplale. Hence, each template is
`parlitiorled into one or more operation issue sloL>. Every
`combination of operations assigned lo th"''e slots may be
`issued concmrently.
`In addition, each template has a consume to end-of-packet s
`bit field (CEP) that indicates whether the next instruction
`directly follows the current instruction or ii starts at lhe next
`packet boundary. This capability is used to align certain
`inslrnclions (e.g. branch targets) lo known address bound(cid:173)
`aries. Each template also specifies the number of spare bits 10
`that may be used to encode the number of no-op cycle to
`follow the current instruction. These spare bits may arise due
`to a need for packet alignment or quantized allocation.
`1he next level of the tree defines each of the concurrent
`issue sloL~. Each slot is an OR node supporting a set of 15
`operation groups, called a super group (i.e., nodes 638, 640,
`642), that are all mutually exclusive and have tne same
`concurreoey pattern. A select field chooses among the vari(cid:173)
`ous operation groups within a super group. Again, this select
`field is illustrated as the leaf node labeled "steer'' connected 20
`to super group 640.
`Below each super group lie operation groups as defined in
`the input specification as described above. Each operation
`group (e.g., operation group 643) is an OR node that has a
`select field ("steer") to choose among tbe various operation
`formats supported by operation group. PIG. 13 shows this
`situation where one operation format allows a literal field on
`the left port, while the other allows it on the right port.
`Each operation format (e.g., IO fonnat descriptors 644,
`646) :is an AND node consisting of the opcode field 654, the 30
`predicate field (if any) 656, and a sequence of source and
`destination field types(shown as IO sets 648, 660, 652). The
`traditional three-address operation encoding is defined at
`this level.
`Each IO set is an OR node consisting of a singleton or a
`set of instruction fields that identify the exact kind and
`location of the operaod. IO sets with multiple choices (e.g.,
`650) have a select field to identify which instruction field is
`intended. For example, one of the IO set nodes 650 repre(cid:173)
`sents a selection between instruction fields 660, 662, which
`is controlled via a multiplexor select field 664. The other 10
`sets each have only one kind of field, and thus, have a single
`child node representing that field (nodes 658, 666). The
`instruction fields point to the datapath control ports 668.
`In implementing an instruction format, one principal 45
`design choice is whether lo use a single, fixed-length instruc(cid:173)
`tion format, or allow variable-length instructions. 'Il1e ifor(cid:173)
`mat design system supports both fixed and variable length
`instructions. The use of variable-length instructions pro(cid:173)
`duces more-compact code but increases decode complexity. 50
`The trade-off between code size and instruction decode
`complexity is a primary design consideration. A single,
`fixed-lengtb instruction format simplifies decode logic and
`of operations to functional units,
`the data path for
`but it often results
`poor code density, since the single 55
`format must accommodate the worst-case (longest) instruc(cid:173)
`tion. For example, if the longest instruction in a fixed-length
`instruction formal is 128 bil5 long, then all of the instruc(cid:173)
`tions in the instruction set must be 128 hits long. In order lo
`maintain a constant instruction length, many instructions 60
`will require the use of wasted bits whose sole purpose is to
`fill in unused space in the instructions. These wasted bits
`lead to increased code size. Conversely, variable-length
`instruclions can accommodate both wide and compact,
`restricted instruction formats without wasting bits, which 65
`results in a reduction in code size. By using variable-length
`instructions, the instruction format can accommodate the
`
`44
`wid"'st instructions where necessary, and make use of
`compact, restricted instruction formals, such as instructions
`that do not encode long literals.
`FIG. 14 shows tbe format of an instruction and its
`building blocks. Al the heart of the instruction is an instruc(cid:173)
`tion template 670. An instruction template encodes sets of
`operations that issue concurrently. Each template includes
`multiple concurrent slots 672, where each slot comprises a
`set of exclusive operation groups 674. Since all of the
`operations in an operation group are exclusive, all of the
`operations in each slot are also exclusive. Each template
`encodes the cross-product of the operations in each of its
`slors.
`11ie length of each template is variable, depending in part
`on the length and number of the slots in the template. For
`example, some templates might have two slots, while other
`templates
`have three or four slots. Furthermore, the
`width of
`slot will depend on the width of the widest
`operation group within that slot, plus overhead, as shown in
`the lower portion of FIG. 14. There :is considerable similarity
`and overlap among the opcodes within an operation group
`by construction, so very little encoding space is wasted
`within the operation group. But the opcode field now must
`be split into an operation group selection field 676 and an
`opcode selection field 678 within the operation group. With
`logarithmic encoding, this requires at most one additional bit
`for encoding the opcode. For example, 15 opcodes may be
`encoded in 4 bits, while splitting them into 3 operation
`groups of 5 opcodes each requires (log,(3))+(1og,(5)]+5
`bits. In addition, every slot has a reserved no-op encoding.
`In cases where an op group has alternative operation
`formats, there is yet another select field to select the opera(cid:173)
`tion format.
`Each instruction also includes a consume to end-of-packet
`35 bit 680, and a template specifier 682. The template specifier
`identifies the template. An instruction format having t tem(cid:173)
`plates will need [log2(t) J bits to encode the template speci(cid:173)
`fier. This template specifier is in a fixed position within every
`instruction, and from its value, the instruction sequencer in
`40 the proces.sor's control path determines the overall instnic(cid:173)
`tion length, and thus the address of the subsequent instmc(cid:173)
`tioo.
`In the current implementation, the length of the instruc(cid:173)
`tion is variable, but each length is a multiple of a pre(cid:173)
`determined number of bits called a quantum. For instance, if
`the quantum is 8 bits, the length of the instruction could pe
`any number equal to or above some minimum value (say 32
`bit.s) that is divisible by 8, such as 64 bits, 72 bits, 80 biL<;,
`etc. One or more dnmmy bits may be placed as appropriate.
`within the instruction to ensure that the length of the
`instruction falls on a quantum boundary.
`The iformat system builds the level-, of the if-tree in an
`incremental fashion. It construct-, the top three levels, con(cid:173)
`sisting of the iostrnction, the templates, and the super groups
`from the abstract !SA specification, and optionally, custom
`templates. It constrncls the middle layers, including the
`operation groups, the operation formats, and the field types
`from the abstract !SA specification. Finally, it coastmcts the
`instruction fields from the contents oft.be various field types
`in tbe abstract ISA specification and the individual control
`ports in the datapath that each field is supposed to control.
`7.5 Instruction Templates
`A primary objective of tbe instruction format desigo
`system is to produce a set of instruction templates that
`support the encoding of all of the sets of operation groups
`th<tl can be issued concurrently. 1b initiate lhe template
`design process, the instruction format design system starts
`
`0182
`
`

`
`US 6,385,757 Bl
`
`46
`cess. The process starts witb the !LP constraints 681, wbicb
`define a set of exclusion rdationsbips 683 between operation
`groups 684. From these exclusion relationships, the iformat
`design system builds a boolean exclusion matrix 686. In the
`exclusion matrix 686, the rows and columns are matched up
`with respective operation groups, e.g., "A" corresponds to
`the operation group A, "B" corresponds to the operation
`gmup B, etc. The 1 's in the matrix indicate an exclusion
`relationship, while a blank indicates that the corresponding
`operation groups may be issued concurrently. (1be blanks
`are actually O's in the real matrix--blanks are used here for
`clarity). The system then builds a concurrency matrix 688
`from the exclusion matrix 686. The concurrency matrix 688
`is the complement of the exclusion matrix 686. The "?''s
`along the diagonal of the concurrency matrix 688 can be
`interpreted as either a 1 or 0.
`The rows in the concurrency matrix determine a set of
`concurrency neigbbors for each operation group.Agraphical
`representation of the relationships defined by the ccncur(cid:173)
`rency matrix 688 is shown in concunency graph 692. Eacb
`node represents an operation group, while each connecting
`"edge" represents a concurrency relation. A clique is a set of
`nodes from a graph where every pair of nodes is connected
`by an edge. For instance, there are 16 cliques in lbe
`concurrency graph 692.
`After the concurrency matrix is generated, the system
`compares the rows in the concurrency matrix to identify
`equivalent operation groups. The super groups are formed
`from the equivalent operation groups. Two operation groups
`are said to be equivalent if they have the same set of
`concurrency neighbors. Note that two mutually exclusive
`nperation groups tbat have the same set of concurrency
`neighbors can replace each other in any template without
`violating any exclusion constraint and therefore can be
`treated equivalently. Similarly, two concurrent operation
`groups that have tbe same set of concurrency neighbors
`(other than themselves) can always be placed together in a
`template without violating any exclusion constraints and
`therefore can be treated equivalently.
`An example of psendocode for performing equivalence
`checking and partitioning into super groups is illustrated
`below.
`
`25
`
`45
`out with the architecture specification, which defines the
`exclusion and concurrency constraints for a particular
`design. ln one implementation, the architecture specification
`directly provides the exclusion relationships between
`to s
`tion groups. However, the iformal design process
`know which opcodes can be issued concurrently, i.e., the
`concurrency relationship, rather tlran which opcodes must be
`exclusive.
`In such an implementation, the concurrency relationship
`is taken to be the complement of the exclusion relationship. rn
`One way of determining the concurrency relation is lo take
`the complement of the exclusion relations among opcodes
`implied by the architecture specification and treat each set of
`concurrent opcodes as a potential candidate for becoming an
`instruction template. While this provides an excellent start- 15
`ing point, it unfortunately does not lead to a practical
`solution, since the number of combinations of operations
`that may issue concurrently quickly becomes intractable.
`For example, a typical VLJW machiue specification may
`include 2 integer ALUs, 1 floating point ALU and 1 memory 20
`u.nit, with 50 opcodes each. In such a machine tbe total
`number of distinct 4-issue instructions is 502 x50x50=6,250,
`000. Specializing instructions to 1, 2, and 3-issue templates
`would add many more. It is therefore necessary lo impose
`some structure on the encoding within each template.
`Our current implementation uses several mechanisms to
`reduce the complexity of the problem. These mechanisms
`represent iformat design decisions and affect the final
`instruction fonnat layout and size. In most cases there may
`al.so be a tradeoff between the simplicity and orthogonality 30
`of the field layout (and hence the decode hardware) and the
`size of the instruction template. These tradeoffs will be
`described as the design process is detailed below.
`AF. a first axiom, all templates must satisfy an exclusion
`constraint between two opcodes, i.e. these opcodes must 35
`never occupy separate slots in any template. 'D1is is because
`tbese opcodes may share hardware resources during
`execution, and therefore, the scheduler should never put
`these opcodes together within the same instruction. Oo the
`other hand, a concurrency constraint between two opcodes 40
`implies that the scheduler is free lo issue these opcodes
`together in a single instruction and therefore there should be
`some template in which these two opcodes are allowed to
`occur together. In particular, that template may contain
`additional slots that can be filled with noops, if necessary.
`Tberefore, it is unnecessary to generate a special template
`for each concurrency constraint, but rather all that is needed
`is a set of templates that can effectively cover all possible
`sets of concurrently scheduled opcodes_
`The problem becomes greatly simplified when the con(cid:173)
`currency of operation groups is considered instead of indi(cid:173)
`vidual opcodes. As introduced above, operation groups are
`defined as sets of opcode instances that are generally similar
`in nature in terms of their latency and connectivity to
`physical register files and are expected to be mutually
`exclusive with respect to operation issue. All opcodes within
`an operation group must be mutually exclusive by definition.
`Furthermore, the instruction format is designed so that all
`opcodes within an operation group sbare the same instruc(cid:173)
`tion fields. Tbus, the operation group is an obvious choice
`for the primary building block for creating templates.
`Another simplification involves classifying mutually(cid:173)
`exclusive operation groups into equivalence classes called
`super groups based on the constraints provided in the
`architecture specification. FIG. 15 illustrates an example that 65
`shows how the opera lion groups (sbown as letters) ancl
`exclusion relations arc used in the template selection pro-
`
`Proceduref'iodSuperGroups (BitM:atri.x con.cur)
`45 1:
`// "concuf' is a (numNodes x nurn.Nodes) boolean matrix
`{/First, initialize supergroup hash table and id counter
`2:
`3:
`Hash.Map;Bi.tVector, inll SGmap
`int Sf;COUOt ... O;
`4:
`for (i = 0 to numNodes-1) do
`5:
`//cxt:acl C8ch node's vector of n.eighbo1s w/ an.d w/o self
`6:
`SO 7:
`Bit Vector AND-group - ooncur.row(i) .aet_bit(i);
`8:
`BilVectcr OR~group - c.oncur.row(i) .reset_bit(i);
`9:
`//Check for existing AND-style st1pergroup for this node
`if (SGruap(AND·group) is already bound) then
`10:
`SGkind(i) = SO-AND;
`11:
`SGid(i) = SGtnap(AND·group);
`12:
`55 13:
`//Check for existing OR-style suporgroup for this node
`else if (SGmap(OR·gruup) is already bound) then
`14:
`15:
`SGkind(i) - SG·OR
`16:
`SGid(i) - SGmap(OR-group);
`//If neither neighbor relation is present, start a new
`17:
`18;
`f/supergroup with the ne:w neighbor re[ations
`GO ~~:
`else SGid(i) = S<Ycount;
`21:
`SGmap(AND-group) SG01ap(OR-group) SGcount;
`SGcount = SGcount + 1;
`22:
`cndif
`23:
`24:
`e1td[or
`
`The equivalence check and the partitioning can be per(cid:173)
`formed quickly by employing the pigeon-hole principle. The
`
`0183
`
`

`
`US 6,385,757 Bl
`
`47
`algorithm hashes each operation group using its set of
`neighbors in the concurrency matnx as the key, 11ie neigh(cid:173)
`bor relations (neighbor keys) for each operation group (each
`row) are converted 1D bilvectors. The algorithm hashes in
`two ways: once by treating each operation group as concur(cid:173)
`rent with itself (AND-style) thereby finding equivalent con(cid:173)
`current operation groups, and the second time by treating
`each operation group as exclusive with itself (OR-s!yle)
`thereby finding equivalent exclusive operation groups, Thi~
`hashing approach results io two bitvectors for each operation
`grou!'-"ne with the"?" entry changed to a 1 (AND-style),
`and one with the "?" entry changed to a 0 (OR-style).
`Bitvectors (operation groups) that hash to the same bucket
`necessarily have the same concurrency neighbors and there(cid:173)
`fore become part of the same super group. For example in
`FIG. 15, operation groups A, B, and C have the same 15
`concurrency neighbors and thus form the super group {A, B,
`C}. The other super groups, {P, Q}, {X, Y}, and {M, N}, are
`similarly formed. The set of all distioct super groups is
`defined by all the distinct neighbor keys. This partitioning
`leads to a reduced-concurrency (super group) graph 694, 20
`comprisiog the super groups and their concurrency relations.
`Instruction templates 696 are obtained from the reduced
`concurrency graph, as described below.
`Each operation group identifies whether it is anAND·lYPe
`or OR-type super grou]J. This information is used in the final 25
`template expansion, where each operation group from an
`AND-type super group is given a separate slot, while all
`operation groups from an OR-type super group are put into
`the same slot.
`In the concurrency matrix 690 shown in FIG. 15, the"?" 30
`entries o( the "A", "B", and "C" operation group bitvectors
`have been changed to O's so that their corresponding bitvec(cid:173)
`tors are identical. 'Thus, "A:', "B", and "C" form an OR-type
`super group {A, B, C}, and each operation group is placed
`in the same slot.
`FIG. 16 shows a case with an AND-type and an OR-type
`super group. In order to obtain identical bitvectors, the "A",
`"B", and "C" operation groups are treated as being concur(cid:173)
`rent with tbernselves. As a result, they form an AND-tYPe
`super group and are placed in separate template slots. In 40
`contrast, the "M", "N", "X", and 11Y" operation groups are
`treated as exclusive with them.selves and form two different
`sets of OR-type super groups {M,N} and {X,Y}, which each
`occupy a single slot
`For a homogenous VLIW-style machine with multiple, 45
`orthogonal functional units thL'i process yields tremendous
`savings by reducing the complexity of the problem to just a
`few iodependent super groups. Tbe resulting instruction
`templates closely match super groups to independent issue
`slots for each functional unit. For a more heterogeneous 50
`machine with shared resources, the resulting number of
`templates may be larger and the decoding is more complex
`but partitioning the operation groups into super groups still
`reduces the complexity of the problem significantly.
`7.6 Concurrency Cliques and Templates
`Once the super groups have been determined, each clique
`in the reduced concurrency graph is a candidate for an
`instruction template since it denotes a set of super groups
`tbat may be issued in parallel by the scheduler. A clique is
`a subgraph in which every node is a neighbor of every other 60
`node. Clearly, enumerating all cliques would lead to a large
`number of templates. On the other hand, unless the concur(cid:173)
`rency among super groups is restricted in some other way,
`it is necessary lo choose a set of templates that cover all
`possible cliques of the super group graph to ensure that !lie 65
`scheduler is not restricted in any way other than that
`specified in the Ard1Spec.
`
`48
`As an example, suppose super groups A, B and C only
`have pairwise concurrency constraints, i.e., {AB}. {AC},
`and {BC}. These pairwise concurrencies can be covered in
`one of two ways. First, the pairwise concurrency constraints
`can be treated as three independent templates AB, AC, and
`BC, each requiring lwo issne slots. A second possibility is lo
`treat the pairwise concurrencies as being simultaneously
`concurrent, thereby requiring only one template (ABC) with
`three issue slois. Strictly speaking, this allows more paral(cid:173)
`lelism than what was intended, lf the compiler never sched(cid:173)
`uled all three operations simultaneously, the second design
`would end up carrying one ooop in every instruction 111ereby
`wasting one-third of the program space, On the other hand,
`the first design requires additional decoding logic lo select
`among the three templates and more complex dispersal of
`the instruction bits to the various functional units,
`In the present scheme, this tradeoff is made tow•rds
`initially choosing a reduced number of possibly longer
`templates, 11iis is partly due to the fact that the ArchSpec
`does not directly specify concurrency in most instances, but
`rather specifies exclusion relations among operation groups
`that are then complemented to obtain concuneocy relations.
`During the initial template design phase, choosiog the maxi·
`mally concurrent templates covers all possible concurrency
`relations with as few templates as possible.
`The maximally concurrent !emplates may be determined
`by finding tbe cliques of the super group graph. An example
`of a simple reduced super group concurrency graph fo shown
`in FIG. 17. The graph comprises super groups 1-7, and their
`interconnecting edges. The maximal cliques for such a
`simple graph ca.n be determined by hand by siniply identi-
`fying sets of nodes that are completely connected-that is
`each node in a clique must connect to the remaining nodes
`in the clique, For instance, { 1, 3, 7} is a clique, while {2, 4,
`35 5, 6} is not (nodes 5 and 6 are not connected). In the
`supergraph of FIG. 6, there are seven maximal cliques, and
`thus seven maximally concurrent templates,
`It is necessary to use computational means to calculate the
`cliques for more complex super group graphs, The instruc(cid:173)
`tion format designer uses the same approach for finding
`cliques as the datapath synthesizer described above.
`7.7 Set-Up of Bit Allocation Problem
`Once the templates are selected, the iforrnat system con(cid:173)
`structs the lower levels of the IF tree, 'The tern plates form the
`upper level of the tree. For each of the operation groups in
`a template, the system extracts the inputs and outputs for
`each operation based on their f/0 formats in the .abstract ISA
`specification and adds this information to the IF tree, Using
`the extracted 1/0 fom1ats, the system enumerates the instruc·
`tion fields for each of the operation groups associated with
`the templates. Next, it builds field conflicts, partitions
`instruction fields into superfields, and extracts bit widtb
`requirements.
`7 .7 .1 Instruction Fields
`As shown in FIG, 13, the instructi.on fields form the leaves
`of the if-tree. Each instruction field corresponds to a data(cid:173)
`path control port such as register file read/write address
`ports, predicate and opcode port> of functional units, and
`selector port5 o[ multiplexors, Each field reserves a certain
`number of instruction bits to control the corresponding
`control port.
`1ne iforrnat designer assigns each field to a control port
`by traversing the if tree lo find the operation group associ(cid:173)
`ated with the field, and then extracting the fonctional unit
`assigned to the operation group in the datapath specification.
`The following sub-sections describe various kinds of
`instruction fields. FIG. 20 is annotated with letters S, A, L,
`
`10
`
`55
`
`0184
`
`

`
`us 6,385,757 Bl
`
`50
`49
`7.7 .2 Computing Field Conflicts
`op and C to illustrate examples of lhe information flowing
`Before performing graph coloring, tbe system computes
`from these fields in the instruction register to the control
`the pairwise conflict relation between instruction fields,
`ports in the data path.
`which are represented as an undirected conflict graph.
`Select fields (S). At each level of the if-tree that is an OR
`two leaf nodes (instruction fields) conflict if
`In the
`node, there is a select field that chooses among the various
`Least-common ancestor is an AND node.
`and only if
`alternatives. The number of alternatives is given by the
`The system computes pairwise conflict relations using a
`number of children, n, of the OR node in the if-tree.
`bottom-up data flow analysis of the if-tree. TI1e procedure in
`Assuming a simple binary encoding, the bit requirement of
`the implementation maintains a field set, F, and a conflict
`the select field is then log2(n) bits.
`Different select fields are used lo control different aspects 10 relation, R. Set F" is lbe set of instruction fields in the subtree
`rooted al node n. Relation R" is the conflict relation for the
`of the datapalh. 'Die root of the if-tree has a template select
`subtree rooted at node n.
`field that is routed directly to the instruction unit control
`The procedure proces.ses nodes in bottom-up order as
`logic in order lo determine the template width. ll also
`follows:
`specifies where the supergroup select fields are positioned. 15
`Leaf node: Al a leaf node, /, the field set is initialized lo
`Therefore, lb.is field must be allocated at a fixed position
`contain !he leaf node, and the conflict relation is empty.
`within tbe instruction. Together with the template select
`or-node: At an OR-node, the field set is the union of field
`fields, the select fields al super group and operatio

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket