`
`OPTIMIZATION OF MULTI-STAGE HIERARCHICAL NETWORKS FOR
`
`PRACTICAL ROUTING APPLICATIONS
`
`Venkat Konda
`
`5
`
`CROSS REFERENCE TO RELATED APPLICATIONS
`
`This application is Continuation In Part Application to and claims priority of the
`
`US. Provisional Patent Application Serial No. 61/ 531,615 entitled "OPTIMIZATION
`
`OF MULTI—STAGE HIERARCHICAL NETWORKS FOR PRACTICAL ROUTING
`
`APPLICATIONS" by Venkat Konda assigned to the same assignee as the current
`
`10
`
`application, filed September 7, 2011.
`
`This application is related to and incorporates by reference in its entirety the US
`
`Application Serial No. 12/ 530,207 entitled "FULLY CONNECTED GENERALIZED
`
`MULTI—STAGE NETWORKS" by Venkat Konda assigned to the same assignee as the
`
`current application, filed September 6, 2009, the US. Provisional Patent Application
`
`15
`
`Serial No. 60/905,526 entitled "LARGE SCALE CROSSPOINT REDUCTION WITH
`
`NONBLOCKING UNICAST & MULTICAST IN ARBITRARILY LARGE MULTI-
`
`STAGE NETWORKS" by Venkat Konda assigned to the same assignee as the current
`
`application, filed March 6, 2007, and the US. Provisional Patent Application Serial No.
`
`60/940, 383 entitled "FULLY CONNECTED GENERALIZED MULTl-STAGE
`
`20
`
`NETWORKS" by Venkat Konda assigned to the same assignee as the current application,
`
`filed May 25, 2007.
`
`This application is related to and incorporates by reference in its entirety the US
`
`Application Serial No. 12/601,273 entitled "FULLY CONNECTED GENERALIZED
`
`BUTTERFLY FAT TREE NETWORKS" by Venkat Konda assigned to the same
`
`25
`
`assignee as the current application, filed November 22, 2009, the US. Provisional Patent
`
`Application Serial No. 60/940, 387 entitled "FULLY CONNECTED GENERALIZED
`
`BUTTERFLY FAT TREE NETWORKS" by Venkat Konda assigned to the same
`
`assignee as the current application, filed May 25, 2007, and the US Provisional Patent
`
`-1-
`
`Page 1 of 102
`
`FLEX LOGIX EXHIBIT 1006
`
`Page 1 of 102
`
`FLEX LOGIX EXHIBIT 1006
`
`
`
`S—OOSO PCT
`
`Application Serial No. 60 / 940, 390 entitled "FULLY CONNECTED GENERALIZED
`
`MULTI—LINK BUTTERFLY FAT TREE NETWORKS" by Venkat Konda assigned to
`
`the same assignee as the current application, filed May 25, 2007
`
`This application is related to and incorporates by reference in its entirety the US
`
`Application Serial No. 12/ 601,274 entitled "FULLY CONNECTED GENERALIZED
`
`MULTI—LINK MULTI—STAGE NETWORKS" by Venkat Konda assigned to the same
`
`assignee as the current application, filed November 22, 2009, the US. Provisional Patent
`
`Application Serial No. 60 / 940, 389 entitled "FULLY CONNECTED GENERALIZED
`
`REARRANGEABLY NONBLOCKING MULTI-LINK MULTI-STAGE NETWORKS"
`
`10
`
`by Venkat Konda assigned to the same assignee as the current application, filed May 25,
`
`2007, the US. Provisional Patent Application Serial No. 60/ 940, 391 entitled "FULLY
`
`CONNECTED GENERALIZED FOLDED MULTI—STAGE NETWORKS" by Venkat
`
`Konda assigned to the same assignee as the current application, filed May 25, 2007 and
`
`the US. Provisional Patent Application Serial No. 60/940, 392 entitled "FULLY
`
`15
`
`CONNECTED GENERALIZED STRICTLY NONBLOCKING MULTI—LINK MULTI-
`
`STAGE NETWORKS" by Venkat Konda assigned to the same assignee as the current
`
`application, filed May 25, 2007.
`
`20
`
`25
`
`This application is related to and incorporates by reference in its entirety the US
`
`Application Serial No. 12/ 601,275 entitled "VLSI LAYOUTS OF FULLY
`
`CONNECTED GENERALIZED NETWORKS" by Venkat Konda assigned to the same
`
`assignee as the current application, filed November 22, 2009, and the US. Provisional
`
`Patent Application Serial No. 60 / 940, 394 entitled "VLSI LAYOUTS OF FULLY
`
`CONNECTED GENERALIZED NETWORKS" by Venkat Konda assigned to the same
`
`assignee as the current application, filed May 25, 2007.
`
`This application is related to and incorporates by reference in its entirety the PCT
`
`Application Serial No. PCT/ U810 / 52984 entitled "VLSI LAYOUTS OF FULLY
`
`CONNECTED GENERALIZED AND PYRAMID NETWORKS WITH LOCALITY
`
`EXPLOITATION" by Venkat Konda assigned to the same assignee as the current
`
`application, filed October 16, 2010, the US. Provisional Patent Application Serial No.
`
`-2-
`
`Page 2 of 102
`
`Page 2 of 102
`
`
`
`S—OOSO PCT
`
`61/ 252, 603 entitled "VLSI LAYOUTS OF FULLY CONNECTED NETWORKS
`
`WITH LOCALITY EXPLOITATION" by Vcnkat Konda assigncd to thc samc assigncc
`
`as the current application, filed October 16, 2009, and the US. Provisional Patent
`
`Application Serial No. 61 / 252, 609 entitled "VLSI LAYOUTS OF FULLY
`
`CONNECTED GENERALIZED AND PYRAMID NETWORKS" by Venkat Konda
`
`assigned to the same assignee as the current application, filed October 16, 2009.
`
`BACKGROUND OF INVENTION
`
`10
`
`15
`
`20
`
`25
`
`Multi-stage interconnection networks such as Benes networks and butterfly fat
`
`tree networks are widely useful in telecommunications, parallel and distributed
`
`computing. However VLSI layouts, known in the prior art, of these interconnection
`
`networks in an integrated circuit are inefficient and complicated.
`
`Other multi—stage interconnection networks including butterfly fat tree networks,
`
`Banyan networks, Batcher—Banyan networks, Baseline networks, Delta networks, Omega
`
`networks and Flip networks have been widely studied particularly for self—routing packet
`
`switching applications. Also Benes Networks with radix of two have been widely studied
`
`and it is known that Benes Networks of radix two are shown to be built with back to back
`
`baseline networks which are rearrangeably nonblocking for unicast connections.
`
`The most commonly used VLSI layout in an integrated circuit is based on a two—
`
`dimensional grid model comprising only horizontal and vertical tracks. An intuitive
`
`interconnection network that utilizes two—dimensional grid model is 2D Mesh Network
`
`and its variations such as segmented mesh networks. Hence routing networks used in
`
`VLSI layouts are typically 2D mesh networks and its variations. However Mesh
`
`Networks require large scale cross points typically with a growth rate of 0(N2) where N
`
`is the number of computing elements, ports, or logic elements depending on the
`
`application.
`
`Multi—stage interconnection network with a growth rate of 0(N >< log N) requires
`
`significantly small number of cross points. US. Patent 6,185,220 entitled “Grid Layouts
`
`-3-
`
`Page 3 of 102
`
`Page 3 of 102
`
`
`
`S—OOSO PCT
`
`of Switching and Sorting Networks” granted to Muthukrishnan et al. describes a VLSI
`
`layout using existing VLSI grid model for Benes and Butterfly networks. US. Patent
`
`6,940,308 entitled “Interconnection Network for a Field Programmable Gate Array”
`
`granted to Wong describes a VLSI layout where switches belonging to lower stage of
`
`Benes Network are laid out close to the logic cells and switches belonging to higher
`
`stages are laid out towards the center of the layout.
`
`Due to the inefficient and in some cases impractical VLSI layout of Benes and
`
`butterfly fat tree networks on a semiconductor chip, today mesh networks and segmented
`
`mesh networks are widely used in the practical applications such as field programmable
`
`gate arrays (FPGAs), programmable logic devices (PLDs), and parallel computing
`
`interconnects. The prior art VLSI layouts of Benes and butterfly fat tree networks and
`
`VLSI layouts of mesh networks and segmented mesh networks require large area to
`
`implement the switches on the chip, large number of wires, longer wires, with increased
`
`power consumption, increased latency of the signals which effect the maximum clock
`
`speed of operation. Some networks may not even be implemented practically on a chip
`
`due to the lack of efficient layouts.
`
`Fully connected Benes and butterfly fat tree networks are an over kill for certain
`
`practical routing applications and need to be optimized to significantly improve area,
`
`power and performance of the routing network.
`
`SUMMARY OF INVENTION
`
`Significantly optimized multi-stage networks, useful in wide target applications,
`
`with VLSI layouts (or floor plans) using only horizontal and vertical links to route large
`
`scale sub-integrated circuit blocks having inlet and outlet links, and laid out in an
`
`integrated circuit device in a two-dimensional grid arrangement of blocks, (for example
`
`in an FPGA where the sub-integrated circuit blocks are Lookup Tables, or memory
`
`blocks, or DSP blocks) arc prcscntcd. The optimized multi-stagc networks in cach block
`
`employ several rings of stages of switches with inlet and outlet links of sub—integrated
`
`10
`
`15
`
`20
`
`25
`
`Page 4 of 102
`
`Page 4 of 102
`
`
`
`S—OOSO PCT
`
`circuit blocks connecting to rings from either left—hand side only, or from right—hand side
`
`only, or from both left—hand side and right—hand side.
`
`The optimized multi-stage networks with their VLSI layouts employ shuffle
`
`exchange links where outlet links of cross links from switches in a stage of a ring in one
`
`sub—integrated circuit block are connected to either inlet links of switches in the another
`
`stage of a ring in another sub-integrated circuit block or inlet links of switches in the
`
`another stage of a ring in the same sub-integrated circuit block so that said cross links are
`
`either vertical links or horizontal and vice versa.
`
`The VLSI layouts exploit spatial locality so that different sub-integrated circuit
`
`10
`
`blocks that are spatially nearer are connected with shorter shuffle exchange links
`
`compared to the shuffle exchange links between spatially farther sub—integrated circuit
`
`blocks. The optimized multi—stage networks provide high routability for broadcast,
`
`unicast and multicast connections, yet with the benefits of significantly lower cross points
`
`hence smaller area, lower signal latency, lower power and with significant fast
`
`15
`
`compilation or routing time.
`
`The optimized multi-stage networks VComb (N1,N2,d,s) & VD_Comb(N1,N2 ,d,S)
`
`according to the current invention inherit the properties of one or more, in addition to
`
`additional properties, generalized multi—stage and pyramid networks V (N1, N 2 , d, s) &
`
`VP (N1 , N2 , d , s) , generalized folded multi—stage and pyramid networks
`
`Vfald (N1 ,N2 , d, S) & VWW (N1 ,N2 , d ,S), generalized butterfly fat tree and butterfly fat
`
`pyramid networks Vbfl (N1, N2 , d,s) & bep (N1, N2,d, s) , generalized multi—link multi—
`
`stage and pyramid networks lemk (N1,N2,d,s) & Vm
`
`link—p
`
`(N1,N2,61, s), generalized
`
`folded multi-link multi-stage and pyramid networks VMHW (N1 , N2 , d, s) &
`
`VMd_mlmk_p (N1 , N2 , d , S), generalized multi-link butterfly fat tree and butterfly fat
`
`pyramid networks le,,k_bfl (N1 ,N2,61 ,S) & lefn,_bfp (N1 ,N2,d, S), generalized hypercube
`
`networks Vmbe (N1 , N2 , d, s) , and generalized cube connected cycles networks
`
`20
`
`25
`
`VCCC (N1 , N2, d, S) for s = 1,2,3 or any number in general.
`
`Page 5 of 102
`
`Page 5 of 102
`
`
`
`S—OOSO PCT
`
`BRIEF DESCRIPTION OF DRAWINGS
`
`FIG. 1A is a diagram 100A of an exemplary partial multi-stage hierarchical
`
`network corresponding to one block with 4 inputs and 2 outputs of a computational block
`
`connecting only from left—hand side, to route practical applications such as FPGA routing
`
`of hardware designs in accordance with the invention.
`
`FIG. 1B is a diagram 100B of an exemplary partial multi—stage hierarchical
`
`network corresponding to one block with 8 inputs and 4 outputs of a computational block
`
`connecting from both left-hand side and right—hand side, to route practical applications
`
`such as FPGA routing of hardware designs in accordance with the invention.
`
`10
`
`15
`
`20
`
`FIG. 2A is a diagram 200A, in an embodiment of, a stage in a ring of multi-stage
`
`hierarchical network corresponding to one block.
`
`FIG. 2B is a diagram 200B, in an embodiment of, a stage in a ring of multi—stage
`
`hierarchical network corresponding to one block.
`
`FIG. 2C is a diagram 200C, in an cmbodimcnt of, a stagc in a ring of multi-stagc
`
`hierarchical network corresponding to one block.
`
`FIG. 2D is a diagram 200D, in an embodiment of, a stage in a ring of multi-stage
`
`hierarchical network corresponding to one block.
`
`FIG. 2E is a diagram 200E, in an embodiment of, a stage in a ring of multi-stage
`
`hierarchical network corresponding to one block.
`
`FIG. 3A is a diagram 300A, in an embodiment of, all the connections between
`
`two successive stages of two different rings in the same block or in two different blocks
`
`of a multi—stagc hicrarchical nctwork.
`
`FIG. 3B is a diagram 300B, in an embodiment of, all the connections between two
`
`successive stages of two different rings in the same block or in two different blocks of a
`
`25
`
`multi—stage hierarchical network.
`
`Page 6 of 102
`
`Page 6 of 102
`
`
`
`S—OOSO PCT
`
`FIG. 4 is a diagram 400, in an embodiment of, all the connections between two
`
`successive stages of two different rings in the same block or in two different blocks of a
`
`multi—stage hierarchical network.
`
`FIG. 5 is a diagram 500, in an embodiment of, all the connections between two
`
`successive stages of two different rings in the same block or in two different blocks of a
`
`multi-stage hierarchical network.
`
`FIG. 6 is a diagram 600, in an embodiment of, all the connections between two
`
`successive stages of two different rings in the same block or in two different blocks of a
`
`multi-stage hierarchical network.
`
`FIG. 7 is a diagram 700,
`
`is an embodiment of hop wire connection chart
`
`corresponding to a block of multi-stage hierarchical network.
`
`FIG. 8 is a diagram 800, is an embodiment of 2D—grid of blocks with each block
`
`corresponding to a partial multi—stage network to implement an exemplary multi-stage
`
`hierarchical network, in accordance with the invention.
`
`FIG. 9A is a diagram 900A, in an embodiment of, a stage in a ring of multi-stage
`
`hierarchical network corresponding to one block, with delay optimizations.
`
`FIG. 9B is a diagram 900B, in an embodiment of, a stage in a ring of multi—stage
`
`hierarchical network corresponding to one block, with delay optimizations.
`
`FIG. 9C is a diagram 900C, in an embodiment of, a stage in a ring of multi-stage
`
`hierarchical network corresponding to one block, with delay optimizations.
`
`FIG. 9D is a diagram 900D, in an embodiment of, a stage in a ring of multi-stage
`
`hicrarchical nctwork corrcsponding to onc block, with dclay optimizations.
`
`FIG. 9E is a diagram 900E. in an embodiment of, a stage in a ring of multi-stage
`
`hierarchical network corresponding to one block, with delay optimizations.
`
`FIG. 10A is a diagram 1000A, in an embodiment of, a stage in a ring of multi-
`
`stage hierarchical network corresponding to one block, with delay optimizations.
`
`-7-
`
`10
`
`15
`
`20
`
`25
`
`Page 7 of 102
`
`Page 7 of 102
`
`
`
`S—OOSO PCT
`
`FIG. 10B is a diagram 1000B, in an embodiment of, a stage in a ring of multi—
`
`stage hierarchical network corresponding to one block, with delay optimizations.
`
`FIG. 10C is a diagram 1000C, in an embodiment of, a stage in a ring of multi—
`
`stage hierarchical network corresponding to one block, with delay optimizations.
`
`FIG. 10D is a diagram 1000D, in an embodiment of, a stage in a ring of multi-
`
`stage hierarchical network corresponding to one block, with delay optimizations.
`
`FIG. 10E is a diagram 1000B, in an embodiment of, a stage in a ring of multi-
`
`stage hierarchical network corresponding to one block, with delay optimizations.
`
`FIG. 10F is a diagram 1000F, in an embodiment of, a stage in a ring of multi—
`
`stage hierarchical network corresponding to one block, with delay optimizations.
`
`FIG. 11A is a diagram 1100A, in an embodiment of, a stage in a ring of multi—
`
`stage hierarchical network corresponding to one block, with delay optimizations.
`
`FIG. 11B is a diagram 1100B, in an embodiment of, a stage in a ring of multi—
`
`stage hierarchical network corresponding to one block, with delay optimizations.
`
`FIG. 11C is a diagram 1100C, in an cmbodimcnt of, a stagc in a ring of multi—
`
`stage hierarchical network corresponding to one block, with delay optimizations.
`
`FIG. 12 is a diagram 1200, in an embodiment, all the connections between two
`
`successive stages of two different rings in the same block or in two different blocks of a
`
`multi—stage hierarchical network with delay optimizations.
`
`FIG. 13 is a diagram 1300, in one embodiment, all the connections between two
`
`successive stages of two different rings in the same block or in two different blocks of a
`
`multi—stage hierarchical network with delay optimizations.
`
`FIG. 14 is a diagram 1400, in an embodiment of, all the connections between two
`
`successive stages of two different rings in the same block or in two different blocks of a
`
`multi-stage hierarchical network with delay optimizations.
`
`10
`
`15
`
`20
`
`25
`
`Page 8 of 102
`
`Page 8 of 102
`
`
`
`S—OOSO PCT
`
`FIG. 15 is a diagram 1500, in an embodiment of, all the connections between two
`
`successive stages of two different rings in the same block or in two different blocks of a
`
`multi—stage hierarchical network with delay optimizations.
`
`FIG. 16A1 is a diagram 1600Al of an exemplary prior art implementation of a
`
`two by two switch; FIG. 16A2 is a diagram 1600A2 for programmable integrated circuit
`
`prior art implementation of the diagram 1600Al of FIG. 16Al; FIG. 16A3 is a diagram
`
`1600A3 for one-time programmable integrated circuit prior art implementation of the
`
`diagram 1600A1 of FIG. 16A1; FIG. 16A4 is a diagram 1600A4 for intcgratcd circuit
`
`placement and route implementation of the diagram 1600A1 of FIG. 16A1.
`
`10
`
`DETAILED DESCRIPTION OF THE INVENTION
`
`Fully connected multi-stage hierarchical networks are an over kill in every
`
`dimension such as area, power, and performance for certain practical routing applications
`
`and need to be optimized to significantly improve savings in area, power and
`
`performance of the routing network. The present invention discloses several
`
`15
`
`embodiments of the optimized multi—stage hierarchical networks for practical routing
`
`applications along with their VLSI layout (floor plan) feasibility and simplicity.
`
`The multi—stage hierarchical networks considered for optimization in the current
`
`invention include: generalized multi—stage networks V(N1, N2,64] ,s) , generalized folded
`
`multi—stage networks Vfold (N1 , N2 , d, s) , generalized butterfly fat tree networks
`
`20
`
`[in
`Vbfl(N1,N2,d,s) , generalized multi-link multi-stage networks Vm k (N1,N2 , d, S) ,
`
`generalized folded multi-link multi—stage networks ijldwlmk (N1 , N2 , d , S) , generalized
`
`multi-link butterfly fat tree networks Vm,1.nk_bfl (N1 , N2, d, s) , generalized hypercube
`
`networks Vmbe (N1 , N2 , d, s) , and generalized cube connected cycles networks
`
`V (N1 , N2 , d, S) for s = 1,2,3 or any numbcr in gcncral. Altcrnativcly the optimized
`
`25
`
`multi—stage hierarchical networks disclosed in this invention inherit the properties of one
`
`or more of these networks, in addition to additional properties that may not be exhibited
`
`these networks.
`
`Page 9 of 102
`
`Page 9 of 102
`
`
`
`S—OOSO PCT
`
`The optimized multi-stage hierarchical networks disclosed are applicable for
`
`practical routing applications, with several goals such as: 1) all the signals in the design
`
`starting from an inlet link of the network to an outlet link of the network need to be setup
`
`without blocking. These signals may consist of broadcast, unicast and multicast
`
`connections; Each routing resource may need to be used by only one signal or
`
`connection; 2) physical area consumed by the routing network to setup all the signals
`
`needs to be small; 3) power consumption of the network needs to be small, after the
`
`signals are setup. Power may be both static power and dynamic power; 4) Delay of the
`
`signal or a connection needs to be small after it is setup through a path using several
`
`routing resources in the path. The smaller the delay of the connections will lead to faster
`
`performance of the design. Typically delay of the critical connections determines the
`
`performance of the design on a given network; 5) Designs need to be not only routed
`
`through the network (i.e., all the signals need to be setup from inlet links of the network
`
`to the outlet links of the network), but also the routing needs to be in faster time using
`
`efficient routing algorithms; 6) Efficient VLSI layout of the network is also critical and
`
`can greatly influence all the other parameters including the area taken up by the network
`
`on the chip, total number of wires, length of the wires, delay through the signal paths and
`
`hence the maximum clock speed of operation.
`
`The different varieties of multi—stage networks described in various embodiments
`
`in the current invention have not been implemented previously on the semiconductor
`
`chips. The practical application of these networks includes Field Programmable Gate
`
`Array (FPGA) chips. Current commercial FPGA products such as Xilinx’s Vertex,
`
`Altera’s Stratix, Lattice’s ECPX implement island-style architecture using mesh and
`
`segmented mesh routing interconnects using either full crossbars or sparse crossbars.
`
`These routing interconnects consume large silicon area for crosspoints, long wires, large
`
`signal propagation delay and hence consume lot of power.
`
`The cu1re11t invention discloses the optimization of multi—stage hierarchical
`
`networks for practical routing applications of numerous types of multi-stage networks.
`
`The optimizations disclosed in the current invention are applicable to including the
`
`10
`
`15
`
`20
`
`25
`
`-10-
`
`Page 10 of 102
`
`Page 10 of 102
`
`
`
`S—OOSO PCT
`
`numerous generalized multi-stage networks disclosed in the following patent
`
`applications:
`
`1) Strictly and rearrangeably nonblocking for arbitrary fan—out multicast and
`
`unicast for generalized multi-stage networks V(N1, N2 , d, s) with numerous connection
`
`topologies and the scheduling methods are described in detail in the US Application
`
`Serial No. 12/ 530,207 that is incorporated by reference above.
`
`2) Strictly and rearrangeably nonblocking for arbitrary fan-out multicast and
`
`unicast for generalized butterfly fat tree networks be, (N1 , N2 , d, S) with numerous
`
`connection topologies and the scheduling methods are described in detail in the US
`
`Application Serial No. 12/ 601,273 that is incorporated by reference above.
`
`3) Rearrangeably nonblocking for arbitrary fan-out multicast and unicast, and
`
`strictly nonblocking for unicast for generalized multi-link multi-stage networks
`
`mek (N1 , N2 , d, s) and generalized folded multi-link multi-stage networks
`
`0
`Vf
`
`,d_m,mk (N1, N2 , d, S) with numerous connection topologies and the scheduling methods
`
`are described in detail in the US Application Serial No. 12/ 601,274 that is incorporated
`
`by reference above.
`
`4) Strictly and rearrangeably nonblocking for arbitrary fan—out multicast and
`
`unicast for generalized multi—link butterfly fat tree networks lemk_bfl (N1 ,N2 , d ,s) with
`
`numerous conncction topologics and thc schcduling methods are dcscribcd in detail in the
`
`US Application Serial No. 12/ 601,273 that is incorporated by reference above.
`
`5) Strictly and rearrangeably nonblocking for arbitrary fan-out multicast and
`
`unicast for generalized folded multi-stage networks VfoM (N1 , N2 ,d , S) with numerous
`
`connection topologies and the scheduling methods are described in detail in the US
`
`Application Serial No. 12/ 601,274 that is incorporated by reference above.
`
`6) Strictly nonblocking for arbitrary fan—out multicast and unicast for gcncralizcd
`
`multi—link multi—stage networks szmk(N12N22 51,3) and generalized folded multi—link
`
`10
`
`15
`
`20
`
`25
`
`-11-
`
`Page 11 of 102
`
`Page 11 of 102
`
`
`
`S—OOSO PCT
`
`multi—stage networks anldww (N1 , N2 , d, s) with numerous connection topologies and
`
`the scheduling methods are described in detail in the US Application Serial No.
`
`12/ 601,274 that is incorporated by reference above.
`
`7) VLSI layouts of numerous types of multi—stage networks are described in the
`
`US Application Serial No. 12/ 601,275 entitled "VLSI LAYOUTS OF FULLY
`
`CONNECTED NETWORKS" that is incorporated by reference above.
`
`8) VLSI layouts of numerous types of multi-stage networks are described in the
`
`PCT Application Serial No. PCT/ U810 / 52984 entitled "VLSI LAYOUTS OF FULLY
`
`CONNECTED GENERALIZED AND PYRAMID NETWORKS WITH LOCALITY
`
`EXPLOITATION" that is incorporated by reference above.
`
`In addition the optimization with the VLSI layouts disclosed in the current
`
`invention are also applicable to generalized multi-stage pyramid networks
`
`VP (N1, N2 , d, s) , generalized folded multi-stage pyramid networks V/old—p (N1,1V2,d,S),
`
`gcncralizcd buttcrfly fat pyramid nctworks VW (N1 , N2 , d, s) , gcncralizcd multi—link
`
`multi-stage pyramid networks Vm
`
`link—p
`
`(N1 , N2 , d, s) , generalized folded multi-link multi-
`
`stage pyramid networks Vfoli,,_,,m.nk_p (N1, N2 , d, s) , generalized multi—link butterfly fat
`
`pyramid networks lemmfl, (N1 , N2 , d, s), generalized hypercube networks
`
`Vmbe (N1 , N2 , d, s) and generalized cube connected cycles networks VCCC(N1, N2 , (1,3)
`
`for s = 1,2,3 or any number in general.
`
`Finally the current invention discloses the optimizations and VLSI layouts of
`
`multi-stage hierarchical networks VComb (N1 ,N2 , d, s) and the optimizations and VLSI
`
`layouts of multi—stage hierarchical networks V040,“, (N1, N2 , d, s) for practical routing
`
`applications (particularly to set up broadcast, unicast and multicast connections), where
`
`“Comb” denotes the combination of and “D-Comb” denotes the delay optimized
`
`combination of any of the generalized multi-stage networks V(N1, N2 , d ,s) , generalized
`
`folded multi-stage networks Vfold(Z\/'1,N2 , d, s) , generalized butterfly fat tree networks
`
`10
`
`15
`
`20
`
`25
`
`-12-
`
`Page 12 of 102
`
`Page 12 of 102
`
`
`
`S—OOSO PCT
`
`Vbfl (N1 ,N2 , d , S) , generalized multi-link multi-stage networks me (N1 , N2 , d , S) ,
`
`generalized folded multi-link multi—stage networks Vfoldwlmk (N1 , N2 , d , S) , generalized
`
`multi—link butterfly fat tree networks lemk_bfl (N1 , N2, d, S) , generalized multi-stage
`
`pyramid networks VP (N1 , N2 , d ,S) , generalized folded multi—stage pyramid networks
`
`Vfold—p (N1 , N2 , d, S), gcncralizcd buttcrfly fat pyramid nctworks Vbfi, (Nl , N2 , d, S),
`
`generalized multi-link multi-stage pyramid networks leinkfi, (N1 , N2, d, S) , generalized
`
`folded multi—link multi—stage pyramid networks Vfo,d—mlz'nk—p (N1 , N2, d, S), generalized
`
`multi—link butterfly fat pyramid networks szmabfp (N1, N2 , d, S), generalized hypercube
`
`networks Vmbe (N1 , N2 , d, S), and generalized cube connected cycles networks
`
`10
`
`(‘z‘t‘
`V (N1,N2,d,S) for s = 1,2,3 or any number in general.
`
`Multi—stage hierarchical network chmb (N1 , N2 , d, S) :
`
`15
`
`20
`
`25
`
`Referring to diagram 100A in FIG. 1A, in one embodiment, an exemplary partial
`
`multi—stage hierarchical network VComb (N1 , N2 ,d, S) where N1 = 200; N2 = 400; d = 2;
`
`and s = 1 corresponding to one computational block, with each computational block
`
`having 4 inlet links namely 11, 12, I3, and I4; and 2 outlet links namely 01 and 02. And
`
`for each computational block the corresponding partial multi-stage hierarchical network
`
`VComb (N1 , N2,61 ,S) 100A consists of two rings 110 and 120, where ring 110 consists of
`
`“m+l” stages namely (ring 1, stage 0), (ring 1, stage 1),
`
`(ring 1, stage “m-l”), and
`
`(ring 1, stage “m”), and ring 120 consists of “n+1” stages namely (ring 2, stage 0), (ring
`
`2, stage 1),
`
`(ring 2, stage “n-1”), and (ring 2, stage “n”), where “m” and n are
`
`positive integers.
`
`Ring 110 has inlet links Ri(1,1) and Ri(1,2), and has outlet links Bo(1,1) and
`
`Bo(1,2). Ring 120 has inlet links Fi(2,1) and Fi(2,2), and outlet links Bo(2,1) and
`
`Bo(2,2). And hence the partial multi—stage hierarchical network VComb (N1 , N2 , d , S) 100A
`
`consists of 4 inlet links and 4 outlet links corresponding to the two rings 110 and 120.
`
`Outlet link 01 of the computational block is connected to inlet link Ri(1, 1) of ring 110
`
`and also inlet link of Fi(2, 1) of ring 120. Similarly outlet link 02 of the computational
`
`-13-
`
`Page 13 of 102
`
`Page 13 of 102
`
`
`
`S—OOSO PCT
`
`block is connected to inlet link Ri(1,2) of Ring 110 and also inlet link of Fi(2,2) of Ring
`
`120. And outlet link Bo(1,1) of Ring 110 is connected to inlet link 11 of the
`
`computational block. Outlet link Bo(l ,2) ofRing l 10 is connected to inlet link T2 ofthe
`
`computational block. Similarly outlet link Bo(2, 1) of Ring 120 is connected to inlet link
`
`13 of the computational block. Outlet link Bo(2,2) of Ring 120 is connected to inlet link
`
`14 of the computational block. Since in this embodiment outlet link 01 of the
`
`computational block is connected to both inlet link Ri(1,1) of ring 110 and inlet link
`
`Fi(2,1) of ring 120; and outlet link 02 of the computational block is connected to both
`
`inlet link Ri(1,2) of ring 110 and inlet link Fi(2,2) of ring 120, the partial multi—stage
`
`hierarchical network VComb(N1,N2, d,s) 100A consists of 2 inlet links and 4 outlet links.
`
`The two dimensional grid 800 in FIG. 8 illustrates an exemplary arrangement of
`
`100 blocks arranged in 10 rows and 10 columns, in an embodiment. Each row of 2D—grid
`
`consisting of 10 block numbers namely the first row consists of the blocks (1,1), (1,2),
`
`(1,3),
`
`, (1,9), and (1,10). The second row consists ofthe blocks (2,1), (2,2), (2,3),
`
`,
`
`(2,9), and (2,10). Similarly 2D—grid 800 consists of 10 rows of each with 10 blocks and
`
`finally the tenth row consists ofthe blocks (10,1), (10,2), (10,3),
`
`, (10,9), and (10,10).
`
`Each block of 2D-grid 800, in one embodiment, is part of the die area of a semiconductor
`
`integrated circuit, so that the complete 2D-grid 800 of 100 blocks represents the complete
`
`die of the semiconductor integrated circuit. In one embodiment, each block of 2D-grid
`
`800 consists of one of the partial multi-stage hierarchical network
`
`VCDmb (N1 , N2 ,d, s) 100A with 2 inlet links and 4 outlet links and the corresponding
`
`computational block with 4 inlet links and 2 outlet links. For example block (1,1) of 2D—
`
`grid 800 consists of one of the partial multi-stage hierarchical network
`
`chmb (N1 , N2 ,d, s) 100A with 2 inlet links and 4 outlet links and the corresponding
`
`computational block with 4 inlet links and 2 outlet links. Similarly each of the 100 blocks
`
`of 2D-grid 800 has a separate partial multi-stage hierarchical network
`
`VComb (N1 , N2 ,d, s) 100A with 2 inlet links and 4 outlet links and the corresponding
`
`computational block with 4 inlet links and 2 outlet links. Hence the complete multi—stage
`
`hierarchical network VCmb(N1,N2, d, s) corresponding to 2D-grid 800 has N1 = 200 inlet
`
`links and N2 = 400 outlet links. And there are 100 computational blocks each one
`
`-14-
`
`10
`
`15
`
`20
`
`25
`
`30
`
`Page 14 of 102
`
`Page 14 of 102
`
`
`
`S—OOSO PCT
`
`corresponding to one of the blocks with each computational block having 4 inlet links and
`
`2 outlet links. Also the 2D-grid 800 is organized in the fourth quadrant of the 2D-Plane.
`
`In other embodiments the 2D-grid 800 may be organized as either first quadrant, or
`
`second quadrant or third quadrant of the 2D—P1ane.
`
`Referring to partial multi-stage hierarchical network VComb (N1 ,N2, d, s) 100A in
`
`FIG. 1A, the stage (ring 1, stage 0) consists of 4 inputs namely Ri(l,1), Ri(1,2), Ui(1,l),
`
`and Ui(l,2); and 4 outputs Bo(l,1), Bo(1,2), Fo(1,1), and Fo(1,2). The stage (ring 1, stage
`
`0) also consists of eight 2:1 multiplexers (A multiplexer is hereinafter called a “mux”)
`
`namely R(l,1), R(l,2), F(1,1), F(1,2), U(1,1), U(1,2), B(1,1), and B(1,2). The 2:1 Mux
`
`R(1,1) has two inputs namely Ri(1,1) and Bo(1,l) and has one output Ro(1,1). The 2:1
`
`Mux R(1,2) has two inputs namely Ri(1,2) and Bo(1,2) and has one output Ro(l,2). The
`
`2:1 Mux F(1,1) has two inputs namely Ro(1,l) and Ro(1,2) and has one output Fo(1,1).
`
`The 2:1 Mux F(1,2) has two inputs namely Ro(1,1) and Ro(1,2) and has one output
`
`Fo(1,2).
`
`The 2:1 Mux U(1,1) has two inputs namely Ui(1,1) and Fo(1,1) and has one
`
`output Uo( 1,1). The 2:1 Mux U( 1,2) has two inputs namely Ui(1,2) and Fo(1,2) and has
`
`one output Uo(l,2). The 2:1 Mux B(1,l) has two inputs namely Uo(l,1) and Uo(l,2) and
`
`has one output Bo(1,1). The 2:1 Mux B(1,2) has two inputs namely Uo(1, 1) and Uo(1,2)
`
`and has one output Bo(1,2).
`
`Thc stagc (ring 1, stagc 1) consists of 4 inputs namcly Ri(1,3), Ri(1,4), Ui(l,3),
`
`and Ui(l,4); and 4 outputs Bo(1,3), Bo(1,4), Fo(1,3), and Fo(1,4). The stage (ring 1, stage
`
`1) also consists of eight 2:1 Muxes namely R(l,3), R(l,4), F(l,3), F(l,4), U(l,3), U(l,4),
`
`B(1,3), and B(1,4). The 2:1 Mux R(1,3) has two inputs namely Ri(1,3) and Bo(1,3) and
`
`has one output Ro(1,3). The 2:1 Mux R(1,4) has two inputs namely Ri(l,4) and Bo(l,4)
`
`and has one output Ro(1,4). The 2:1 Mux F(1,3) has two inputs namely Ro(1,3) and
`
`Ro(1,4) and has one output Fo(1,3). The 2:1 Mux F(1,4) has two inputs namely Ro(l,3)
`
`and Ro(1,4) and has one output Fo(1,4).
`
`The 2:1 Mux U(1,3) has two inputs namely Ui(1,3) and Fo(1,3) and has one
`
`output Uo(l,3). The