`
`OPTIMIZATION OF MULTI-STAGE HIERARCHICAL NETWORKS FOR
`
`PRACTICAL ROUTING APPLICATIONS
`
`Venkat Konda
`
`CROSS REFERENCE TO RELATED APPLICATIONS
`
`This application is Continuation In Part Application to and claimspriority of the
`
`U.S. Provisional Patent Application Serial No. 61/531,615 entitled "OPTIMIZATION
`
`OF MULTI-STAGE HIERARCHICAL NETWORKSFOR PRACTICAL ROUTING
`
`APPLICATIONS"by Venkat Konda assigned to the same assignee as the current
`
`10
`
`application, filed September 7, 2011.
`
`This application is related to and incorporates by reference in its entirety the US
`
`Application Serial No. 12/530,207 entitled "FULLY CONNECTED GENERALIZED
`
`MULTI-STAGE NETWORKS"by Venkat Konda assigned to the same assigneeas the
`
`current application, filed September 6, 2009, the U.S. Provisional Patent Application
`
`15
`
`Serial No. 60/905,526 entitled "LARGE SCALE CROSSPOINT REDUCTION WITH
`
`NONBLOCKING UNICAST & MULTICAST IN ARBITRARILY LARGE MULTI-
`
`STAGE NETWORKS"by Venkat Kondaassigned to the same assignee as the current
`
`application, filed March 6, 2007, and the U.S. Provisional Patent Application Serial No.
`
`60/940, 383 entitled "FULLY CONNECTED GENERALIZED MULTI-STAGE
`
`20
`
`NETWORKS"by Venkat Kondaassigned to the same assignee as the current application,
`
`filed May 25, 2007.
`
`This application is related to and incorporates by reference in its entirety the US
`
`Application Serial No. 12/601,273 entitled "FULLY CONNECTED GENERALIZED
`
`BUTTERFLY FAT TREE NETWORKS"by Venkat Kondaassigned to the same
`
`25
`
`assignee as the current application, filed November 22, 2009, the U.S. Provisional Patent
`
`Application Serial No. 60/940, 387 entitled "FULLY CONNECTED GENERALIZED
`
`BUTTERFLY FAT TREE NETWORKS"by Venkat Kondaassigned to the same
`
`assignee as the current application, filed May 25, 2007, and the U.S. Provisional Patent
`
`-]-
`
`Page 1 of 102
`
`FLEX LOGIX EXHIBIT 1006
`
`Page 1 of 102
`
`FLEX LOGIX EXHIBIT 1006
`
`
`
`S-0050 PCT
`
`Application Serial No. 60/940, 390 entitled "FULLY CONNECTED GENERALIZED
`
`MULTI-LINK BUTTERFLY FAT TREE NETWORKS"by Venkat Kondaassignedto
`
`the same assignee as the current application, filed May 25, 2007
`
`This applicationis related to and incorporates by reference inits entirety the US
`
`Application Serial No. 12/601,274 entitled "FULLY CONNECTED GENERALIZED
`
`MULTI-LINK MULTI-STAGE NETWORKS"by Venkat Kondaassigned to the same
`
`assignee as the current application, filed November22, 2009, the U.S. Provisional Patent
`
`Application Serial No. 60/940, 389 entitled "FULLY CONNECTED GENERALIZED
`
`REARRANGEABLY NONBLOCKING MULTI-LINK MULTI-STAGE NETWORKS"
`
`10
`
`by Venkat Konda assigned to the same assignee as the current application, filed May 25,
`
`2007, the U.S. Provisional Patent Application Serial No. 60/940, 391 entitled "FULLY
`
`CONNECTED GENERALIZED FOLDED MULTI-STAGE NETWORKS"by Venkat
`
`Konda assigned to the same assignee as the current application, filed May 25, 2007 and
`
`the U.S. Provisional Patent Application Serial No. 60/940, 392 entitled "FULLY
`
`15
`
`CONNECTED GENERALIZED STRICTLY NONBLOCKING MULTI-LINK MULTI-
`
`STAGE NETWORKS"by Venkat Kondaassigned to the same assignee as the current
`
`application, filed May 25, 2007.
`
`This application is related to and incorporates by reference in its entirety the US
`
`Application Serial No. 12/601,275 entitled "VLSI LAYOUTS OF FULLY
`
`20
`
`CONNECTED GENERALIZED NETWORKS"by Venkat Kondaassigned to the same
`
`assignee as the current application, filed November 22, 2009, and the U.S. Provisional
`
`Patent Application Serial No. 60/940, 394 entitled "VLSI LAYOUTS OF FULLY
`
`CONNECTED GENERALIZED NETWORKS"by Venkat Konda assigned to the same
`
`assignee as the current application, filed May 25, 2007.
`
`25
`
`This application is related to and incorporates by reference in its entirety the PCT
`
`Application Serial No. PCT/US10/52984 entitled "VLSI LAYOUTS OF FULLY
`
`CONNECTED GENERALIZED AND PYRAMID NETWORKSWITH LOCALITY
`
`EXPLOITATION"by Venkat Konda assigned to the same assignee as the current
`
`application, filed October 16, 2010, the U.S. Provisional Patent Application Serial No.
`
`-2-
`
`Page 2 of 102
`
`Page 2 of 102
`
`
`
`S-0050 PCT
`
`61/252, 603 entitled "VLSI LAYOUTS OF FULLY CONNECTED NETWORKS
`
`WITH LOCALITY EXPLOITATION"by Venkat Konda assigned to the same assignee
`
`as the current application, filed October 16, 2009, and the U.S. Provisional Patent
`
`Application Serial No. 61/252, 609 entitled "VLSI LAYOUTS OF FULLY
`
`CONNECTED GENERALIZED AND PYRAMID NETWORKS"by Venkat Konda
`
`assigned to the same assignee as the current application, filed October 16, 2009.
`
`BACKGROUNDOF INVENTION
`
`Multi-stage interconnection networks such as Benes networks and butterfly fat
`
`10
`
`tree networks are widely useful in telecommunications, parallel and distributed
`
`computing. However VLSI layouts, known in the prior art, of these interconnection
`
`networks in an integrated circuit are inefficient and complicated.
`
`Other multi-stage interconnection networks including butterfly fat tree networks,
`
`Banyan networks, Batcher-Banyan networks, Baseline networks, Delta networks, Omega
`
`15
`
`networks and Flip networks have been widely studied particularly for self-routing packet
`
`switching applications. Also Benes Networks with radix of two have been widely studied
`
`and it is knownthat Benes Networks of radix two are shownto be built with back to back
`
`baseline networks which are rearrangeably nonblocking for unicast connections.
`
`The most commonly used VLSI layout in an integrated circuit is based on a two-
`
`20
`
`dimensional grid model comprising only horizontal and vertical tracks. An intuitive
`
`interconnection network that utilizes two-dimensional grid model is 2D Mesh Network
`
`and its variations such as segmented mesh networks. Hence routing networks used in
`
`VLSI layouts are typically 2D mesh networksandits variations. However Mesh
`
`Networks require large scale cross points typically with a growth rate of O(N’) where N
`
`25
`
`is the number of computing elements, ports, or logic elements depending on the
`
`application.
`
`Multi-stage interconnection network with a growth rate of O(N xlog N) requires
`
`significantly small numberof cross points. U.S. Patent 6,185,220 entitled “Grid Layouts
`
`-3-
`
`Page 3 of 102
`
`Page 3 of 102
`
`
`
`S-0050 PCT
`
`of Switching and Sorting Networks” granted to Muthukrishnanet al. describes a VLSI
`
`layout using existing VLSI grid model for Benes and Butterfly networks. U.S. Patent
`
`6,940,308 entitled “Interconnection Network for a Field Programmable Gate Array”
`
`granted to Wong describes a VLSI layout where switches belonging to lowerstage of
`
`Benes Network are laid out close to the logic cells and switches belonging to higher
`
`stages are laid out towards the center of the layout.
`
`Dueto the inefficient and in some cases impractical VLSI layout of Benes and
`
`butterfly fat tree networks on a semiconductor chip, today mesh networks and segmented
`
`mesh networks are widely used in the practical applications such as field programmable
`
`gate arrays (FPGAs), programmable logic devices (PLDs), and parallel computing
`
`interconnects. The prior art VLSI layouts of Benes and butterfly fat tree networks and
`
`VLSI layouts of mesh networks and segmented mesh networks require large area to
`
`implement the switches on the chip, large number of wires, longer wires, with increased
`
`power consumption, increased latency of the signals which effect the maximum clock
`
`15
`
`speed of operation. Some networks may not even be implementedpractically on a chip
`
`due to the lack of efficient layouts.
`
`Fully connected Benes and butterfly fat tree networks are an over kill for certain
`
`practical routing applications and need to be optimized to significantly improvearea,
`
`powerand performance of the routing network.
`
`20
`
`SUMMARYOF INVENTION
`
`Significantly optimized multi-stage networks, useful in wide target applications,
`
`with VLSI layouts (or floor plans) using only horizontal and vertical links to route large
`
`scale sub-integrated circuit blocks having inlet and outlet links, and laid out in an
`
`25
`
`integrated circuit device in a two-dimensional grid arrangement of blocks, (for example
`
`in an FPGA wherethe sub-integrated circuit blocks are Lookup Tables, or memory
`
`blocks, or DSP blocks) are presented. The optimized multi-stage networksin cach block
`
`employ several rings of stages of switches with inlet and outlet links of sub-integrated
`
`Page 4 of 102
`
`Page 4 of 102
`
`
`
`S-0050 PCT
`
`circuit blocks connecting to rings from either left-hand side only, or from right-hand side
`
`only, or from both left-hand side and right-handside.
`
`The optimized multi-stage networks with their VLSI layouts employ shuffle
`
`exchangelinks where outlet links of cross links from switches in a stage of a ring in one
`
`sub-integrated circuit block are connectedto either inlet links of switches in the another
`
`stage of a ring in another sub-integrated circuit block or inlet links of switches in the
`
`another stage of a ring in the same sub-integrated circuit block so that said cross links are
`
`either vertical links or horizontal and vice versa.
`
`The VLSI layouts exploit spatial locality so that different sub-integrated circuit
`
`10
`
`blocks that are spatially nearer are connected with shorter shuffle exchange links
`
`compared to the shuffle exchange links between spatially farther sub-integrated circuit
`
`blocks. The optimized multi-stage networks provide high routability for broadcast,
`
`unicast and multicast connections, yet with the benefits of significantly lower cross points
`
`hence smaller area, lower signal latency, lower power and with significant fast
`
`15
`
`compilation or routing time.
`
`The optimized multi-stage networks V.,,,(N,,N.,d,5) & Vo_com(N1,N>,4,5)
`
`according to the current invention inherit the properties of one or more, in addition to
`
`additional properties, generalized multi-stage and pyramid networks V(N,,N,,d,5) &
`
`V,(N,,N,,d,s), generalized folded multi-stage and pyramid networks
`
`20
`
`Ving (NisNy,d,8) & Vioig_p(Ni,N>,d,8), generalized butterfly fat tree and butterfly fat
`
`pyramid networks V,,(N,,N,,d,5) & Vy,(N,,N>,.d,8), generalized multi-link multi-
`
`stage and pyramid networks V_,,,(N,,N>,d,5) & Vin
`
`link -p
`
`,(N,,N>,d,5), generalized
`
`folded multi-link multi-stage and pyramid networks Vijinn Ni,N2,d,8) &
`
`Vfold-miink-p (N,N, d,s), generalized multi-link butterfly fat tree and butterfly fat
`
`25
`
`pyramid networks Vijin (Ni.N2.4.8) & Vintenop (Ni,N>,d,5), generalized hypercube
`
`networks V,,.,.(N,,N,,d,5), and generalized cube connected cycles networks
`
`Veeco (N,,N,,d,8) for s = 1,2,3 or any numberin general.
`
`Page 5 of 102
`
`Page 5 of 102
`
`
`
`8-0050 PCT
`
`BRIEF DESCRIPTION OF DRAWINGS
`
`FIG. 1A is a diagram 100A of an exemplary partial multi-stage hierarchical
`
`network corresponding to one block with 4 inputs and 2 outputs of a computational block
`
`connecting only from left-hand side, to route practical applications such as FPGArouting
`
`of hardware designs in accordance with the invention.
`
`FIG. 1B is a diagram 100B of an exemplary partial multi-stage hierarchical
`
`network corresponding to one block with 8 inputs and 4 outputs of a computational block
`
`connecting from both left-hand side and right-hand side, to route practical applications
`
`such as FPGA routing of hardware designs in accordance with the invention.
`
`10
`
`FIG. 2A is a diagram 200A, in an embodimentof, a stage in a ring of multi-stage
`
`hierarchical network correspondingto one block.
`
`FIG. 2B is a diagram 200B, in an embodimentof, a stage in a ring of multi-stage
`
`hierarchical network correspondingto one block.
`
`FIG. 2C is a diagram 200C, in an embodimentof, a stage in a ring of multi-stage
`
`15
`
`hierarchical network correspondingto one block.
`
`FIG. 2D is a diagram 200D,in an embodimentof, a stage in a ring of multi-stage
`
`hierarchical network correspondingto one block.
`
`FIG. 2E is a diagram 200E, in an embodimentof, a stage in a ring of multi-stage
`
`hierarchical network correspondingto one block.
`
`20
`
`FIG. 3A is a diagram 300A, in an embodimentof, all the connections between
`
`two successive stages of two different rings in the same block or in two different blocks
`
`of a multi-stage hicrarchical network.
`
`FIG.3B is a diagram 300B, in an embodimentof, all the connections between two
`
`successive stages of two different rings in the same block or in two different blocks of a
`
`25
`
`multi-stage hierarchical network.
`
`Page 6 of 102
`
`Page 6 of 102
`
`
`
`S-0050 PCT
`
`FIG.4 is a diagram 400, in an embodimentof, all the connections between two
`
`successive stages of two different rings in the same block or in two different blocks of a
`
`multi-stage hierarchical network.
`
`FIG. 5 is a diagram 500, in an embodiment of, all the connections between two
`
`successive stages of two different rings in the same block or in two different blocks of a
`
`multi-stage hierarchical network.
`
`FIG. 6 is a diagram 600, in an embodiment of, all the connections between two
`
`successive stages of two different rings in the same block or in two different blocks ofa
`
`multi-stage hierarchical network.
`
`10
`
`FIG. 7 is a diagram 700,
`
`is an embodiment of hop wire connection chart
`
`corresponding to a block of multi-stage hierarchical network.
`
`FIG. 8 is a diagram 800, is an embodiment of 2D-grid of blocks with each block
`
`corresponding to a partial multi-stage network to implement an exemplary multi-stage
`
`hierarchical network, in accordance with the invention.
`
`15
`
`FIG. 9A is a diagram 900A, in an embodimentof, a stage in a ring of multi-stage
`
`hierarchical network corresponding to one block, with delay optimizations.
`
`FIG. 9B is a diagram 900B,in an embodimentof, a stage in a ring of multi-stage
`
`hierarchical network correspondingto one block, with delay optimizations.
`
`FIG. 9C is a diagram 900C, in an embodimentof, a stage in a ring of multi-stage
`
`20
`
`hierarchical network correspondingto one block, with delay optimizations.
`
`FIG. 9D is a diagram 900D,in an embodiment of, a stage in a ring of multi-stage
`
`hicrarchical network corresponding to onc block, with dclay optimizations.
`
`FIG. 9E is a diagram 900E, in an embodimentof, a stage in a ring of multi-stage
`
`hierarchical network corresponding to one block, with delay optimizations.
`
`25
`
`FIG. 10A is a diagram 1000A, in an embodiment of, a stage in a ring of multi-
`
`stage hierarchical network corresponding to one block, with delay optimizations.
`
`-7-
`
`Page 7 of 102
`
`Page 7 of 102
`
`
`
`S-0050 PCT
`
`FIG. 10B is a diagram 1000B, in an embodiment of, a stage in a ring of multi-
`
`stage hierarchical network correspondingto one block, with delay optimizations.
`
`FIG. 10C is a diagram 1000C, in an embodiment of, a stage in a ring of multi-
`
`stage hierarchical network correspondingto one block, with delay optimizations.
`
`FIG. 10D is a diagram 1000D, in an embodimentof, a stage in a ring of multi-
`
`stage hierarchical network corresponding to one block, with delay optimizations.
`
`FIG. 10E is a diagram 1000E, in an embodiment of, a stage in a ring of multi-
`
`stage hierarchical network corresponding to one block, with delay optimizations.
`
`FIG. 10F is a diagram 1000F, in an embodimentof, a stage in a ring of multi-
`
`10
`
`stage hierarchical network corresponding to one block, with delay optimizations.
`
`FIG. 11A is a diagram 1100A, in an embodiment of, a stage in a ring of multi-
`
`stage hierarchical network corresponding to one block, with delay optimizations.
`
`FIG. 11B is a diagram 1100B, in an embodiment of, a stage in a ring of multi-
`
`stage hierarchical network corresponding to one block, with delay optimizations.
`
`15
`
`FIG. 11C is a diagram 1100C, in an cmbodiment of, a stage in a ring of multi-
`
`stage hierarchical network correspondingto one block, with delay optimizations.
`
`FIG. 12 is a diagram 1200, in an embodiment, all the connections between two
`
`successive stages of two different rings in the same block or in two different blocks of a
`
`multi-stage hierarchical network with delay optimizations.
`
`20
`
`FIG. 13 is a diagram 1300, in one embodiment, all the connections between two
`
`successive stages of two different rings in the same block or in two different blocks of a
`
`multi-stage hierarchical network with delay optimizations.
`
`FIG. 14 is a diagram 1400, in an embodimentof, all the connections between two
`
`successive stages of two different rings in the same block or in two different blocks of a
`
`25
`
`multi-stage hierarchical network with delay optimizations.
`
`Page 8 of 102
`
`Page 8 of 102
`
`
`
`S-0050 PCT
`
`FIG. 15 is a diagram 1500, in an embodimentof, all the connections between two
`
`successive stages of two different rings in the same block or in two different blocks of a
`
`multi-stage hierarchical network with delay optimizations.
`
`FIG. 16A1 is a diagram 1600A1 of an exemplary prior art implementation of a
`
`two by two switch; FIG. 16A2 is a diagram 1600A2 for programmable integrated circuit
`
`prior art implementation of the diagram 1600A1 of FIG. 16A1; FIG. 16A3 is a diagram
`
`1600A3 for one-time programmable integrated circuit prior art implementation of the
`
`diagram 1600A1 of FIG. 16A1; FIG. 16A4 is a diagram 1600A4 for intcgrated circuit
`
`placement and route implementation of the diagram 1600A1 of FIG. 16A1.
`
`10
`
`DETAILED DESCRIPTION OF THE INVENTION
`
`Fully connected multi-stage hierarchical networksare an overkill in every
`
`dimension such as area, power, and performancefor certain practical routing applications
`
`and need to be optimized to significantly improve savings in area, power and
`
`performanceofthe routing network. The present invention discloses several
`
`15
`
`embodiments ofthe optimized multi-stage hierarchical networks for practical routing
`
`applications along with their VLSI layout (floor plan) feasibility and simplicity.
`
`The multi-stage hierarchical networks considered for optimization in the current
`
`invention include: generalized multi-stage networks V(N,,N,,d,s), generalized folded
`
`multi-stage networks V’,,,(N,,N,,d,8), generalized butterfly fat tree networks
`
`20
`
`lin
`Vien (N,,N,,d,8), generalized multi-link multi-stage networks V_.,(V,,N,4,5),
`
`generalized folded multi-link multi-stage networks Vigmine (Ni, N2,d,5), generalized
`
`multi-link butterfly fat tree networks Viving5¢(Ni,N>,d,5) , generalized hypercube
`
`networks /,.,,.(N,,N,,d,s), and generalized cube connected cycles networks
`
`V_.(N,,N,,d,8) for s = 1,2,3 or any numberin gencral. Altcrnatively the optimized
`
`25
`
`multi-stage hierarchical networks disclosed in this invention inherit the properties of one
`
`or more of these networks, in addition to additional properties that may not be exhibited
`
`these networks.
`
`Page 9 of 102
`
`Page 9 of 102
`
`
`
`S-0050 PCT
`
`The optimized multi-stage hierarchical networks disclosed are applicable for
`
`practical routing applications, with several goals such as: 1) all the signals in the design
`
`starting from an inlet link of the network to an outlet link of the network need to be setup
`
`without blocking. These signals may consist of broadcast, unicast and multicast
`
`connections; Each routing resource may needto be used by only one signal or
`
`connection; 2) physical area consumed bythe routing network to setup all the signals
`
`needs to be small; 3) power consumption of the network needs to be small, after the
`
`signals are setup. Power may be both static power and dynamic power; 4) Delay of the
`
`signal or a connection needs to be small afterit is setup through a path using several
`
`10
`
`routing resources in the path. The smaller the delay of the connections will lead to faster
`
`performanceofthe design. Typically delay of the critical connections determines the
`
`performanceof the design on a given network; 5) Designs need to be not only routed
`
`through the network (1.e., all the signals need to be setup from inlet links of the network
`
`to the outlct links of the network.), but also the routing necds to be in faster time using
`
`15
`
`efficient routing algorithms; 6) Efficient VLSI layout of the networkis also critical and
`
`can greatly influence all the other parameters including the area taken up by the network
`
`on the chip, total number of wires, length of the wires, delay through the signal paths and
`
`hence the maximum clock speed of operation.
`
`The different varieties of multi-stage networks described in various embodiments
`
`20
`
`in the current invention have not been implemented previously on the semiconductor
`
`chips. The practical application of these networks includes Field Programmable Gate
`
`Array (FPGA) chips. Current commercial FPGA products such as Xilinx’s Vertex,
`
`Altera’s Stratix, Lattice’s ECPx implement island-style architecture using mesh and
`
`segmented mesh routing interconnects using either full crossbars or sparse crossbars.
`
`25
`
`These routing interconnects consumelargesilicon area for crosspoints, long wires, large
`
`signal propagation delay and hence consumelot of power.
`
`The current invention discloses the optimization of multi-stage hierarchical
`
`networks for practical routing applications of numerous types of multi-stage networks.
`
`The optimizations disclosed in the current invention are applicable to including the
`
`-10-
`
`Page 10 of 102
`
`Page 10 of 102
`
`
`
`S-0050 PCT
`
`numerous generalized multi-stage networks disclosed in the following patent
`
`applications:
`
`1) Strictly and rearrangeably nonblocking for arbitrary fan-out multicast and
`
`unicast for generalized multi-stage networks V(N,,N.,,d,s) with numerous connection
`
`topologies and the scheduling methods are described in detail in the US Application
`
`Serial No. 12/530,207 that is incorporated by reference above.
`
`2) Strictly and rearrangeably nonblocking for arbitrary fan-out multicast and
`
`unicast for generalized butterfly fat tree networks Vien (N,,N,.d,8) with numerous
`
`connection topologies and the scheduling methodsare described in detail in the US
`
`Application Serial No. 12/601,273 that is incorporated by reference above.
`
`3) Rearrangeably nonblocking for arbitrary fan-out multicast and unicast, and
`
`strictly nonblocking for unicast for generalized multi-link multi-stage networks
`
`Vting (N|,N,,d,8) and generalized folded multi-link multi-stage networks
`
`Ve‘01id_mline AY1, N,,d,5) with numerous connection topologies and the scheduling methods
`
`15
`
`are described in detail in the US Application Serial No. 12/ 601,274 that is incorporated
`
`by reference above.
`
`4) Strictly and rearrangeably nonblocking for arbitrary fan-out multicast and
`
`unicast for generalized multi-link butterfly fat tree networks V45¢(N,,N2,d,5) with
`
`numcrous conncction topologics and the scheduling methodsare described in detail in the
`
`20
`
`US Application Serial No. 12/601,273 that is incorporated by reference above.
`
`5) Strictly and rearrangeably nonblocking for arbitrary fan-out multicast and
`
`unicast for generalized folded multi-stage networks V,,,(N,,N,.,d,5) with numerous
`
`connection topologies and the scheduling methods are described in detail in the US
`
`Application Serial No. 12/601,274 that is incorporated by reference above.
`
`25
`
`6) Strictly nonblocking for arbitrary fan-out multicast and unicast for gencralized
`
`multi-link multi-stage networks V,,,,,(N,,N>,d,s) and generalized folded multi-link
`
`-11-
`
`Page 11 of 102
`
`Page 11 of 102
`
`
`
`S-0050 PCT
`
`multi-stage networks Viai4mine (N;,N>,d,5) with numerous connection topologies and
`
`the scheduling methods are described in detail in the US Application Serial No.
`
`12/601,274 that is incorporated by reference above.
`
`7) VLSI layouts of numerous types of multi-stage networks are described in the
`
`US Application Serial No. 12/601,275 entitled "VLSI LAYOUTS OF FULLY
`
`CONNECTED NETWORKS"that is incorporated by reference above.
`
`8) VLSI layouts of numerous types of multi-stage networks are described in the
`
`PCT Application Serial No. PCT/US10/52984 entitled "VLSI LAYOUTS OF FULLY
`
`CONNECTED GENERALIZED AND PYRAMID NETWORKSWITH LOCALITY
`
`10
`
`EXPLOITATION"that is incorporated by reference above.
`
`In addition the optimization with the VLSI layouts disclosed in the current
`
`invention are also applicable to generalized multi-stage pyramid networks
`
`V(N,,N,,d,5), generalized folded multi-stage pyramid networksV,old—p (N,,N,,d,8),
`
`gencralized butterfly fat pyramid networks V,,(N,,N,,d,5), generalized multi-link
`
`15
`
`multi-stage pyramid networks V,,
`
`link—p
`
`(N,,N,,d,8), generalized folded multi-link multi-
`
`stage pyramid networks Vii.mintp(Ni,N2,d,8), generalized multi-link butterfly fat
`
`pyramid networks V,,,..-5,(N,,N2.d.5), generalized hypercube networks
`
`VcubeN,,N,,d,8) and generalized cube connected cycles networks V.¢-(N,,N,,d,8)
`
`for s = 1,2,3 or any numberin general.
`
`20
`
`Finally the current invention discloses the optimizations and VLSI layouts of
`
`multi-stage hierarchical networks V,,,,(N,,.N.,,d,s8) and the optimizations and VLSI
`
`layouts of multi-stage hierarchical networks V,_.5,,(N,,N.,d, 8) for practical routing
`
`applications (particularly to set up broadcast, unicast and multicast connections), where
`
`“Comb”denotes the combination of and ‘“D-Comb”denotes the delay optimized
`
`25
`
`combination of any of the generalized multi-stage networks ’(N,,N,,d,s), generalized
`
`folded multi-stage networks V’,,,(N,,N.,,d,5), generalized butterfly fat tree networks
`
`-12-
`
`Page 12 of 102
`
`Page 12 of 102
`
`
`
`S-0050 PCT
`
`Vin (N,,N>,d,8), generalized multi-link multi-stage networks V,,,4(N,,N 4,5),
`
`generalized folded multi-link multi-stage networks Vi4_tine (N,,N>,d4,8), generalized
`
`multi-link butterfly fat tree networks Vj.,44,(N,,N>,d,5), generalized multi-stage
`
`pyramid networks V,(N,,N,,d,8), generalized folded multi-stage pyramid networks
`
`Vtap (N\,N>,4.8), goncralized butterfly fat pyramid networks V,, (N,,N,,d,5),
`
`generalized multi-link multi-stage pyramid networksV,,,,._,(N,,N»,d,8), generalized
`
`folded multi-link multi-stage pyramid networks Vigmuine—p(N1,No,4,5), generalized
`
`multi-link butterfly fat pyramid networks V,,,,.45, (Ni,N.,d,5), generalized hypercube
`
`networks V,.,,,.(N,,N,,d,5), and generalized cube connected cycles networks
`
`10
`
`V...CN,,N,,d,5) for s = 1,2,3 or any numberin general.
`
`Multi-stage hierarchical network V,.,,,,(N,,N,,d,8):
`
`Referring to diagram 100A in FIG. 1A, in one embodiment, an exemplary partial
`
`multi-stage hierarchical network V.,,(N,,N,,d,s) where N; = 200; No = 400; d = 2;
`
`and s = 1 corresponding to one computational block, with each computational block
`
`15
`
`having 4 inlet links namely I1, I2, 13, and 14; and 2 outlet links namely O1 and 02. And
`
`for each computational block the corresponding partial multi-stage hierarchical network
`
`Veom(N,,N,,d,8) 100A consists of two rings 110 and 120, where ring 110 consists of
`
`“m+1” stages namely (ring 1, stage 0), (ring 1, stage 1), ... (ring 1, stage “m-1”), and
`
`(ring 1, stage “m”), and ring 120 consists of “n+1” stages namely (ring 2, stage 0), (ring
`
`20
`
`2, stage 1), ... (ring 2, stage “n-1”?), and (ring 2, stage “n’”’), where “m”and “‘n”are
`
`positive integers.
`
`Ring 110 hasinlet links Ri(1,1) and Ri(1,2), and has outlet links Bo(1,1) and
`
`Bo(1,2). Ring 120 has inlet links Fi(2,1) and Fi(2,2), and outlet links Bo(2,1) and
`
`Bo(2,2). And hencethe partial multi-stage hierarchical network V,,(N,,N>,d,5) 100A
`
`25
`
`consists of 4 inlet links and 4 outlet links corresponding to the two rings 110 and 120.
`
`Outlet link O1 of the computational block is connected to inlet link Ri(1,1) of ring 110
`
`and also inlet link of Fi(2,1) of rmg 120. Similarly outlet link O2 of the computational
`
`-13-
`
`Page 13 of 102
`
`Page 13 of 102
`
`
`
`S-0050 PCT
`
`block is connectedto inlet link Ri(1,2) of Ring 110 andalso inlet link of Fi(2,2) of Ring
`
`120. And outlet link Bo(1,1) of Ring 110 is connected to inlet link I] of the
`
`computational block. Outlet link Bo(1,2) of Ring 110 is connected to inlet link 12 of the
`
`computational block. Similarly outlet link Bo(2,1) of Ring 120 is connectedto inlet link
`
`13 of the computational block. Outlet link Bo(2,2) of Ring 120 is connectedto inletlink
`
`14 of the computational block. Since in this embodiment outlet link O1 of the
`
`computational block is connected to both inlet link Ri(1,1) of ring 110 and inlet link
`
`Fi(2,1) of ring 120; and outlet link O2 of the computational block is connected to both
`
`inlet link Ri(1,2) of ring 110 andinlet link Fi(2,2) of ring 120, the partial multi-stage
`
`10
`
`hierarchical network V,,,,(N,,N>,d,5) LOOA consists of 2 inlet links and 4 outlet links.
`
`The two dimensional grid 800 in FIG.8 illustrates an exemplary arrangement of
`
`100 blocks arranged in 10 rows and 10 columns, in an embodiment. Each row of 2D-grid
`
`consisting of 10 block numbers namely the first row consists of the blocks (1,1), (1,2),
`
`(1,3), ..., 1,9), and (1,10). The second row consists of the blocks (2,1), (2,2), (2,3), ...,
`
`15
`
`(2,9), and (2,10). Similarly 2D-grid 800 consists of 10 rows of each with 10 blocks and
`
`finally the tenth row consists of the blocks (10,1), (10,2), (10,3), ... , (10,9), and (10,10).
`
`Each block of 2D-grid 800, in one embodiment, is part of the die area of a semiconductor
`
`integrated circuit, so that the complete 2D-grid 800 of 100 blocks represents the complete
`
`die of the semiconductor integrated circuit. In one embodiment, each block of 2D-grid
`
`20
`
`800 consists of one of the partial multi-stage hierarchical network
`
`Veo(N,,N>,d,8) 100A with 2 inlet links and 4 outlet links and the corresponding
`
`computational block with 4 inlet links and 2 outlet links. For example block (1,1) of 2D-
`
`grid 800 consists of one of the partial multi-stage hierarchical network
`
`Vounp(',,N,,d,8) 100A with 2 inlet links and 4 outlet links and the corresponding
`
`25
`
`computational block with 4 inlet links and 2 outlet links. Similarly each of the 100 blocks
`
`of 2D-grid 800 has a separate partial multi-stage hierarchical network
`
`Veome (N,,N>,d,8) 100A with 2 inlet links and 4 outlet links and the corresponding
`
`computational block with 4 inlet links and 2 outlet links. Hence the complete multi-stage
`
`hierarchical network ,,,,CN,,N,,d,8) corresponding to 2D-grid 800 has N; = 200 inlet
`
`30
`
`links and N> = 400 outlet links. And there are 100 computational blocks each one
`
`-|4-
`
`Page 14 of 102
`
`Page 14 of 102
`
`
`
`S-0050 PCT
`
`corresponding to one of the blocks with each computational block having4inlet links and
`
`2 outlet links. Also the 2D-grid 800 is organized in the fourth quadrant of the 2D-Plane.
`
`In other embodiments the 2D-grid 800 may be organizedas eitherfirst quadrant, or
`
`second quadrantor third quadrant of the 2D-Plane.
`
`Referring to partial multi-stage hierarchical network V.,,,(N,,N>,d,5) 100A in
`
`FIG. 1A,the stage (ring 1, stage 0) consists of 4 inputs namely Ri(1,1), Ri(1,2), Ui(1,1),
`
`and Ui(1,2); and 4 outputs Bo(1,1), Bo(1,2), Fo(1,1), and Fo(1,2). The stage (ring 1, stage
`
`0) also consists of eight 2:1 multiplexers (A multiplexer is hereinafter called a “mux’”)
`
`namely R(1,1), RC,2), F,1), F.,2), 01,1), 00,2), BU, 1), and B,2). The 2:1 Mux
`
`10
`
`R(1,1) has two inputs namely Ri(1,1) and Bo(1,1) and has one output Ro(1,1). The 2:1
`
`Mux R(1,2) has two inputs namely Ri(1,2) and Bo(1,2) and has one output Ro(1,2). The
`
`2:1 Mux F(1,1) has two inputs namely Ro(1,1) and Ro(1,2) and has one output Fo(1,1).
`
`The 2:1 Mux F(1,2) has two inputs namely Ro(1,1) and Ro(1,2) and has one output
`
`Fo(1,2).
`
`The 2:1 Mux U(1,1) has two inputs namely Ui(1,1) and Fo(1,1) and has one
`
`output Uo(1,1). The 2:1 Mux U(1,2) has two inputs namely Ui(1,2) and Fo(1,2) and has
`
`one output Uo(1,2). The 2:1 Mux B(1,1) has two inputs namely Uo(1,1) and Uo(1,2) and
`
`has one output Bo(1,1). The 2:1 Mux B(1,2) has two inputs namely Uo(1,1) and Uo(1,2)
`
`and has one output Bo(1,2).
`
`20
`
`The stage (ring 1, stage 1) consists of 4 inputs namcly Ri(1,3), Ri(1,4), Uid1,3),
`
`and Ui(1,4); and 4 outputs Bo(1,3), Bo(1,4), Fo(1,3), and Fo(1,4). The stage (ring 1, stage
`
`1) also consists of eight 2:1 Muxes namely R(1,3), RC,4), FC,3), FU.4), Ud,3), Ud,4),
`
`BC1,3), and B(1,4). The 2:1 Mux R(1,3) has two inputs namely Ri(1,3) and Bo(1,3) and
`
`has one output Ro(1,3). The 2:1 Mux R(1,4) has two inputs namely Ri(1,4) and Bo(1,4)
`
`25
`
`and has one output Ro(1,4). The 2:1 Mux F(1,3) has two inputs namely Ro(1,3) and
`
`Ro(1,4) and has one output Fo(1,3). The 2:1 Mux F(1,4) has two inputs namely Ro(1,3)
`
`and Ro(1,4) and has one output Fo(1,4).
`
`The 2:1 Mux U(1,3) has two inputs namely Ui(1,3) and Fo(1,3) and has one
`
`output Uo(1,3). The 2:1 Mux UC1,4) has two inputs namely Ui(1,4) and Fo(1,4) and has
`
`-15-
`
`Page 15 of 102
`
`Page 15 of 102
`
`
`
`S-0050 PCT
`
`one output Uo(1,4). The 2:1 Mux B(1,3) has two inputs namely Uo(1,3) and Uo(1,4) and
`
`has one output Bo(1,3). The 2:1 Mux B(1,4) has two inputs namely Uo(1,3) and Uo(1,4)
`
`and has one output Bo(1,4).
`
`The output Fo(1,1) of the stage (ring 1, stage 0) is connected to the input Ri(1,3)
`
`of the stage (ring 1, stage 1) whichis call