`
`(12) United States Patent
`US 9,374,322 B2
`(10) Patent N0.:
`Konda
`
`(45) Date of Patent: Jun. 21, 2016
`
`(54)
`
`OPTIMIZATION OF MULTI-STAGE
`HIERARCHICAL NETWORKS FOR
`PRACTICAL ROUTING APPLICATIONS
`
`(5 6)
`
`References Cited
`U.S. PATENT DOCUMENTS
`
`(71)
`
`Applicant: Konda Technologies Inc., San Jose, CA
`(US)
`
`(72)
`
`Inventor: Venkat Konda, San Jose, CA (US)
`
`(73)
`
`Assignee: Konda Technologies Inc., San Jose, CA
`(US)
`
`( * )
`
`Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. 154(b) by 125 days.
`
`(21)
`
`Appl. N0.: 14/199,168
`
`(22)
`
`Filed:
`
`Mar. 6, 2014
`
`(65)
`
`(51)
`
`(52)
`
`(58)
`
`Prior Publication Data
`
`US 2014/0313930 A1
`
`Oct. 23, 2014
`
`Int. Cl.
`
`H04L 12/933
`US. Cl.
`
`(2013.01)
`
`CPC .................................. H04L 49/1515 (2013.01)
`Field of Classification Search
`CPC ................................................... H04L 49/1515
`USPC .......................................................... 370/254
`See application file for complete search history.
`
`Ring ‘x”, Stage ”p"
`
`5,654.695 A *
`
`6,091,723 A *
`
`6,335,930 B1*
`
`8/1997 Olnowich .......... H04Q11/0478
`340/223
`7/2000 Even ................... 1104L 49/1507
`340/221
`1/2002 Lee ....................... H04L 49/101
`370/387
`6,469,540 B2* 10/2002 Nakaya ............ H03K19/17728
`326/38
`7,440,449 B2* 10/2008 Carson ...................... G06T 7/20
`257/499
`6/2003 Fontana ................ H04L 12/437
`370/216
`2/2011 Konda ................ G06F 17/5077
`326/41
`2012/0269190 A1* 10/2012 Konda .................. G06F17/509
`370/388
`
`2003/0117946 A1*
`
`2011/0037498 A1*
`
`* cited by examiner
`
`Primary Examiner 7 Rasheed Gidado
`
`ABSTRACT
`(57)
`Significantly optimized multi-stage networks, useful in wide
`target applications, with VLSI layouts using only horizontal
`and vertical links to route large scale sub-integrated circuit
`blocks having inlet and outlet links, and laid out in an inte-
`grated circuit device in a two-dimensional grid arrangement
`ofblocks are presented. The optimized multi-stage networks
`in cach block cmploy scvcral rings of stagcs of switchcs with
`inlet and outlet links of sub-integrated circuit blocks connect-
`ing to rings from either left—hand side only, or from right—hand
`side only, or from both left-hand side and right-hand side; and
`employ shuffle exchange links where outlet links of cross
`links from switches in a stage of a ring in one sub-integrated
`circuit block are connected to either inlet links of switches in
`the another stage of a ring in the same or another sub-inte-
`grated circuit block.
`20 Claims, 19 Drawing Sheets
`
`Pilil
`
`W's-wig“
`
`Ring .yi.‘ Stags “q“
`
`Page 1 of 48
`
`FLEX LOGIX EXHIBIT 1035
`
`Page 1 of 48
`
`FLEX LOGIX EXHIBIT 1035
`
`
`
`US. Patent
`
`Jun. 21, 2016
`
`Sheet 1 0f 19
`
`US 9,374,322 B2
`
`(i-uz‘im'i
`
`Ring1.Stage“m-i"
`
`100 (Luz'mi
`
`
`
`
`
` Ring1,Stage1
`Ring2,Stage1
`
`
`
`
` Ring1.Stage0
`
`
`(ram 1
`
`(inin ’1
`
` 5012.2)
`
`Computational
`Block
`
`FIG.1A
`
`Page 2 of 48
`
`Page 2 of 48
`
`
`
`US. Patent
`
`Jun. 21, 2016
`
`Sheet 2 0f 19
`
`US 9,374,322 B2
`
`
`L+UZ‘Z)!H\
`
`
`
`
`“321%WE A
`ux—I
`
`
`Ring1,Stage“m-1"
`
`
` Rxng1,Stage1 Ring2,Stage1
`
`
`
`
`
`
`
`
`
`FIG.13
`
`Page 3 of 48
`
`Page 3 of 48
`
`
`
`US. Patent
`
`Jun. 21, 2016
`
`Sheet 3 0f 19
`
`US 9,374,322 B2
`
`FIG. 2A y
`
`FIG. 213
`
`2° B
`
`F0(k,2m+1)
`
`Ri(k,2m+1)
`
` Ui(k,2m+2) 80(k,2mf2)
`
`u.(l§.2m+2)
`
`FoQk,2m+2)
`
`20 C
`
`Fo\(k,2m+1)
`,
`
`Fq(k,2m+2)
`)
`
`20 D
`
`F0(k,2m+1)
`J
`
`Page 4 of 48
`
`Page 4 of 48
`
`
`
`US. Patent
`
`Jun. 21, 2016
`
`Sheet 4 0f 19
`
`US 9,374,322 B2
`
`FIG. 2E
`
`Ri(k,2m+1)\
`
`Page 5 of 48
`
`Page 5 of 48
`
`
`
`US. Patent
`
`Jun. 21, 2016
`
`Sheet 5 0f 19
`
`US 9,374,322 B2
`
`
`i(x,2p+4)I
`I
`l
`
`7 ‘éfi'g‘l F292”)
`
`Ui(x,2p+1)
`Ui(x,2p+3)
`Eo(x.2p+3) a
`E
`,3
`J
`
`II
`
`Ui(x.2p+4|
`5
`
` \ R
`
`\’
`
`Ui(x.2p+2)
`<
`[J
`
`\
`Eo(x.2p+4)
`
`r
`
`Page 6 of 48
`
`Page 6 of 48
`
`
`
`US. Patent
`
`Jun. 21, 2016
`
`Sheet 6 0f 19
`
`US 9,374,322 B2
`
`FIG. 3B
`
`11
`11
`Ring x , Stage “p"
`
`)
`
`BOW-2W4)
`
`Ring “”,y Stage “q”
`
`Ring “".y Stage “W"
`
`\jop(1.2)
`H3p(212)
`
`8+o-
`N.30mi
`
`\
`
`Page 7 of 48
`
`Page 7 of 48
`
`
`
`US. Patent
`
`Jun. 21, 2016
`
`Sheet 7 0f 19
`
`US 9,374,322 B2
`
`FIG. 4
`
`Ring “"x Stage “p”
`
`Ring .u‘x Stage “p+1"
`
` 11
`
`a:
`Ring y , Stage “q+1"
`
`Page 8 of 48
`
`Page 8 of 48
`
`
`
`US. Patent
`
`Jun. 21, 2016
`
`Sheet 8 0f 19
`
`US 9,374,322 B2
`
`FIG. 5
`
`Ring “x", Stage “p”
`
`Bo(x,2p+1)
`
`Bo(x,2p+2)
`
`)
`
`Uo(x,2p:1)
`
`NW
`N Em\
`
`Page 9 of 48
`
`Page 9 of 48
`
`
`
`US. Patent
`
`Jun. 21, 2016
`
`Sheet 9 0f 19
`
`US 9,374,322 B2
`
`FIG. 6
`
`Rlng “XH‘ Stage up”
`
`6
`
`
`
`
`
`
`
`E
`I g
`F II‘ 3
`>
`DI
`
`
`FE
`h
`17+bzA
`
`
`Fo(y!2q+4)
`
`Ui(y.2q+3)
`
`BO(y,2q+3)
`\
`Uo(y,2q+3
`
`Page 10 0f48
`
`Page 10 of 48
`
`
`
`US. Patent
`
`Jun. 21, 2016
`
`Sheet 10 of 19
`
`US 9,374,322 B2
`
`FIG. 7
`
`A)”
`
`L1 V1 U1 H1 K1 L2
`
`AMWE
`
`Stages in 1St Ring
`
`Stages in 2nd Ring
`
`L1 V3 U3 L2 H3 K3 V5
`
`NWM
`
`Page 11 of 48
`
`Page 11 of 48
`
`
`
`US. Patent
`
`Jun. 21, 2016
`
`Sheet 11 0f 19
`
`US 9,374,322 B2
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`w.Urm
`
`8:8.88.8
`
`
`
`8888.88.88.88.88.88.88.88.88.8
`
`:88
`
`Page 12 of 48
`
`
`
`8888.88.88.88.88.88.88.88.88.8
`
`
`
`838.88.88.88.88.88.88.88.88.8
`
`
`
`88.88.88.88.88.88.88.88.88.88.8
`
`88.88.88.88.88.88.88.88.88.88.8
`
`
`
`838.88.88.88.88.88.88.88.88.8
`
`8888.88.88.88.88.88.88.88.88.8
`
`
`
`8888.88.88.88.88.88.88.88.88.8
`
`
`
`88.0:8.0:8.0:8.088.088.088888888.088.08
`
`
`
`
`
`
`
`Page 12 of 48
`
`
`
`
`
`
`
`
`
`
`
`
`US. Patent
`
`Jun. 21, 2016
`
`Sheet 12 of 19
`
`US 9,374,322 B2
`
`FIG. 9A
`
`90
`
`Fi(k,gm+1)
`*
`
`F0(k,2m+1)
`
`
`
`
`
`
`
`Fi(k,2m+1)A)
`YFi(k,2m+1),
`
`Fi(k,2m+2) \_
`
`Bo(k,2m+1)\
`
`Bo(k,2m+2)_\
`
`Page 13 of 48
`
`YUT(k,2m+1)
`
`Bo(k,2m+2)
`
`J
`Fu‘(k‘2m+1)
`
`,
`F"k'2m+13
`YFi(k,2ug_+1 )
`
`Ui(k,2m+1)
`5
`('
`
`Bo(k,2mf 1)
`jy
`
`YU‘izk,2m+1)
`
`,)
`Ui(k,2m+2)
`
`Bo(k,2mf2)
`I
`
`Page 13 of 48
`
`
`
`US. Patent
`
`Jun. 21, 2016
`
`Sheet 13 of 19
`
`US 9,374,322 B2
`
`FIG. 10A
`
`100 A
`
`FIG. 10B
`
`100 B
`
`YRi(k,2mf-1) a /
`,\
`Ri(k,2m+2)/>
`
`Ri(k,2m+1)
`\2
`
`BO(k,2m+1)r
`
`<
`'
`
`“ka23., MIN)
`
`b"
`
`Fogk,2m+1)
`
`‘7
`Fo k,2m+2
`(-3
`
`)
`
`Ui(k,2m+1)
`J
`
`U|(k 2m+2)
`
`Ri(k,2ir1+1)
`,
`RYi(k,2’m+1)
`
`fl
`Ri(k,2m+2fi
`
`B0(k,2mf1)
`
`Fo_gk.2m+1)
`
`7
`Fo£)k,2m+2)
`
`U|(k,2m+1)
`#3
`
`Ui(k_,2m+2)
`
`Bu(k,2m+2)
`
`Ri(k,2m+\1)
`
`FIG. 10D 100 D
`FIG. 10C yc
`
`
`Ri(k,2m+g)
`
`
`
`
`Bo(k,2m+2)
`
`
`Bo(k,2m+1\)
`
`Ri(k,2m+1)
`
`Fo(k,2m+1)
`
`Ri(k,2m:1)
`?
`
`
`
`RYi(k,2m’l1 )
`.
`
`Ri(k,2m+2):
`Fo\(k,2m+2)
`
`Uj(k,2m+1)Rl(k,2m+2)
`~’ 2»
`Bu(k,2m+1 )
`YUi(k.2m+1)
`Bo(k,2m+1),}
`
`Ul(k,2m+2)
`F,
`
`,1
`Bo(k,2m+2).
`-.
`
`{AV
`
`‘
`
`Bo(k,2m+g\)
`
`Page 14 of 48
`
`Page 14 of 48
`
`
`
`US. Patent
`
`Jun. 21, 2016
`
`Sheet 14 of 19
`
`US 9,374,322 B2
`
`FIG. 11A
`
`110 A
`
`FYi(k,2m+2)
`
`Ri(k,2r‘n¥1)
`
`Ri(k,2m+2);
`
`Bo{k,2mf1)
`
`Bo(k,2m+_2)
`FIG. 11C
`(\4
`
`FIG. 11B
`
`19%
`
`Page 15 of 48
`
`Page 15 of 48
`
`
`
`US. Patent
`
`Jun. 21, 2016
`
`Sheet 15 of 19
`
`US 9,374,322 B2
`
`V
`
`TEN59EW
`dam?@”maméE
`
`a
`
`..woflm..x.
`
`..U:wmflm
`
`4..mam
`
`Em
`
`
`
`..$9mama.....x95.
`
`Amaméi
`
`aa2.0E
`
`Page 16 of 48
`
`Page 16 of 48
`
`
`
`US. Patent
`
`Jun. 21, 2016
`
`Sheet 16 of 19
`
`US 9,374,322 B2
`
`SENS;_\
`
`$3ng.$525)ATNm
`wmflm.>9:22:
`,a.50
`
`,v
`
`33..
`
`A\
`
`.2
`
`83‘2.0E
`
`Awamém
`
`
`
`=9mama.,.x95.
`
`:améom
`
`Page 17 of 48
`
`Page 17 of 48
`
`
`
`US. Patent
`
`Jun. 21, 2016
`
`Sheet 17 of 19
`
`US 9,374,322 B2
`
`
`
`
`
`.1...mmgm.....>Em
`
`Anmeéo:
`
`Yam/3.5
`
`3.3.35
`
`..Em
`
`marmxvi
`
`
`
`__$9mmSm....x9E
`
`Q
`
`wmfim
`
`osE.05
`
`Page 18 of 48
`
`A
`
`Camioa
`
`Page 18 of 48
`
`
`
`US. Patent
`
`Jun. 21, 2016
`
`Sheet 18 of 19
`
`US 9,374,322 B2
`
`_.is.womuw>9.5.
`
`>,\_f.2%:
`
`,.332..
`
`a+aw,.:_>m
`
`Tau...5SENXHQE
`
`.VAx
`
`v /
`
`ma.Umm
`
`
`
`h.mme....x95.
`
`7..
`
`RENEQ
`
`C+nwéom
`
`Page 19 of 48
`
`Page 19 of 48
`
`
`
`S.U
`
`eta
`
`09
`
`9
`
`US 9,374,322 B2
`
`P.«Eda
`
`m2.82
`J\1N:_\Gina
`
`2.310
`
`
`
`m3233%m,So:0nANNE/0m.:_,/N:
`
`943?:
`
`n«<82
`
`m94.82.:mN436:
`
`
`
`3.31003>Fl:\3.300
`
`\_N.__
`
`So:0
`
`N._O_\._0
`
`Page 20 of 48
`
`
`
`
`
`v<c~.Uumm4:.0?—
`
`c232.:c33?:
`
`Page 20 of 48
`
`
`
`
`US 9,374,322 B2
`
`
`
`1
`OPTIMIZATION OF MULTI-STAGE
`HIERARCHICAL NETWORKS FOR
`PRACTICAL ROUTING APPLICATIONS
`
`
`CROSS REFERENCE TO RELATED
`APPLICATIONS
`
`
`
`This application is related to and claims priority ofthe PCT
`Application Serial No. PCT/US12/53814 entit ed “OPTIMI—
`
`ZATION OF IVTULTI-STAGE HIERARCHICAL NET-
`WORKS FOR PRACTICAL ROUTING APPLICATIONS”
`3y Venkat Konda assigned to the same assignee as the current
`application, filed Sep. 6, 2012 and the US. Provisional Patent
`Application Ser. No. 61/531,615 entitled “OPTIMIZATION
`
`OF MULTI-STAGE HIERARCHICAL NETWORKS FOR
`JRAC'TICAL ROUTING APPLICATIONS” by Venkat
`{onda assigned to the same assignee as the current applica-
`ion, filed Sep. 7, 2011.
`This application is related to and incorporates by reference
`'n its entirety the US. Pat. No. 8,270,400 entitled “FULLY
`CONNECTED GENERALIZED MULTI-STAGE NET-
`
`WORKS” by Venkat Konda ass' gned to the same assignee as
`he current application, issued Sep. 18, 2012, the PCT Appli-
`cation Serial No. PCT/U08/56064 entitled “FUI IY CON-
`\I3CTED GENERALIZED MULTI- STAGE NETWORKS”
`
`
`
`3yVenkat Konda assigned to the same ass1gnee as the current
`application, filed Mar. 6, 2008, the US. Provisional Patent
`Aflplication Ser. No. 60/905,526 entitled “LARGE SCALE
`C§OSSPOINT REDUCTION WITH NONBLOCKING
`NICAST & MULTICAST IN ARBITRARILY LARGE
`
`VIULTI- STAG3 NETWORKS’ byVenkat Konda assigned to
`he same assignee as the current application, filed Mar. 6,
`2007, and the US. Provisional Patent Application Ser. No.
`60/940,383 entitled “FULLY CONNECTED GENERAL-
`IZED MULTI-STAGE NETWORKS” by Venkat Konda
`assigned to the same assignee as the current application, filed
`May 25, 2007.
`This applicationis related to and incorporates by reference
`in its entirety the U.S. Pat. No. 8, 170,040 entitled “FUL3Y
`CONNECTED GENERALIZED BUTT3RFLY FAT TREE
`
`N3TWORKS” by Venkat Konda assigned to the sane
`assignee as the current application, issued May 1, 2012, he
`PCT Application Serial No. PCT/T, 08/64603 entitled
`“FULLY CONNECTED GENERALIZED BUTTERF3Y
`FAT TREE NETWORKS” by Venkat Konda assigned to he
`same assignee as the current application, filed May 22, 2008,
`the US. Provisional Patent Application Ser. No. 60/940,387
`entitled “FULLY CONNECTED GENERALIZED BL T—
`TERFLY FAT TREE NETWORKS” by Venkat Konda
`assigned to the same assignee as the current application, filed
`VIay 25, 2007, and the U. S. Provisional Patent Application
`Ser. No. 60/940,390 entitled “FULLY CONNECTED GEN—
`ERALIZED MULTI-LINK BUTTERFLY FAT TREE NET-
`
`
`
`
`
`
`
`WORKS” by Venkat Konda assigned to the same assignee as
`he current application, filed May 25, 2007.
`This applicationis related to and incorporates by reference
`'n its entirety the US. Pat. No. 8,363,649 entitled “FULLY
`CONNECTED GENERALIZED MULTI-LINK MULTI-
`STAGE NETWORKS” by Venkat Konda assigned to the
`same assignee as the current application, issued Jan. 29, 2013,
`he PCT Application Serial No. PCT/U08/64604 entitled
`“FULLY CONNECTED GENERALIZED MULTI-LINK
`
`VIULTI- STAGE NETWORKS” by Venkat Konda assigned to
`he same assignee as the current application, filed May 22,
`2008,
`the US. Provisional Patent Application Ser. No.
`60/940,389 entitled “FULLY CONNECTED GENERAL-
`IZED REARRANGEABIY NONBLOCKING MULTI-
`
`Page 21 of 48
`
`20
`
`25
`
`30
`
`40
`
`60
`
`2
`
`LINK MULTI-STAGE NETWORKS” by Venkat Konda
`assigned to the same assignee as the current application, filed
`May 25, 2007, the US. Provisional Patent Application Ser.
`No. 60/940,391 entitled “FULLY CONNECTED GENER-
`ALIZED FOLDED MULTI-STAGE NETWORKS” by Ven-
`kat Ko 1da assigned to the same assignee as the current appli-
`cation, filed May 25, 2007 and the US. Provisional Patent
`Application Ser. No. 60/940, 392 entitled “FULLY CON-
`NECT3D GENERALIZED STRICTLY NONBLOCKING
`
`MULTI-LINK MULTI- STAG3 NETWORKS” by Venkat
`Konda assigned to the same assignee as the current applica-
`tion, fi ed May 25, 2007.
`This application is related to and incorporates by reference
`in its entirety the US. Pat. I\o. 8,269,523 entitled “VLSI
`LAYO JTS OF FULLY CONNECTED GENERALIZED
`
`
`
`NETWORKS” by Venkat Konda assigned to the same
`assignee as the current application, issued Sep. 18, 2012, the
`PCT Application Serial No, PCT/U08/64605 entitled “VLSI
`LAYO JTS OF FULLY CONNECTED GENERALIZED
`
`
`
`NETWORKS” by Venkat Konda assigned to the same
`assignee as the current application, filed May 22, 2008, and
`the US. Provisional Patent Application Ser. No. 60/940,394
`entitled “VLSI LAYOUTS OF FULLY CONNECTED GEN-
`ERALIZED NETWORKS” by Venkat Konda assigned to the
`same assignee as the current application, filed May 25, 2007.
`This application is related to and incorporates by reference
`in its entirety the US. application Ser. No. 13/502,207
`entitled “VLSI LAYOUTS OE FULLY CONNECTED GEN-
`ERALIZED AND PYRAMID NETWORKS WITH
`
`
`
`LOCALITY EXP3OITATIO\I” by Venkat Konda as signed to
`the same assignee as the current application, filed Apr. 16,
`2012,
`the PCT Application Serial No. PCT/US10/52984
`entitled “VLSI LAYOUTS OE FULLY CONNECTED GEN-
`ERALIZED AI\D PYRAMID NETWORKS WITH
`
`LOCALITY EXP3OITATIO\I” by Venkat Konda as signed to
`the same assignee as the current application, filed Oct. 16,
`2010,
`the US. 3rovisional Patent Application Ser. No.
`61/252,603 entitled “VLSI ,3AYOUTS OF FULLY CON—
`NECTED NETWORKS WITH LOCALITY EXPLOITA-
`
`TION” by Venkat Konda assigned to the same assignee as the
`current application, filed Oct. 16, 2009, and the US. Provi-
`sional Patent Application Ser. No. 61/252,609 entitled “VLSI
`LAYO J'TS OF FULLY CONNECTED GENERALIZED
`
`
`
`AND PYRAMID NETWORKS” by Venkat Konda assigned
`to the same assignee as the current application, filed Oct. 16,
`2009.
`
`BACKGROUND OF INVENTION
`
`Multi—stage interconnection networks such as Benes net—
`works and butterfly fat tree networks are widely useful in
`telecommunications, parallel and distributed computing.
`However VLSI layouts, known in the prior art. of these inter-
`connection networks in an integrated circuit are inefficient
`and complicated.
`Other multi-stage interconnection networks including but-
`terfly fat tree networks, Banyan networks, Batcher-Banyan
`networks, Baseline networks, Delta networks, Omega net-
`works and Flip networks have been widely studied particu-
`larly for self-routing packet switching applications. Also
`Benes Networks with radix of two have been widely studied
`and it is known that Benes Networks of radix two are shown
`to be built with back to back baseline networks which are
`rearrangeably nonblocking for unicast connections.
`The most commonly used VISI layout in an integrated
`circuit is based on a two-dimensional grid model comprising
`only horizontal and vertical tracks. An intuitive interconnec-
`
`
`
`Page 21 of 48
`
`
`
`US 9,374,322 B2
`
`4
`connections, yet with the benefits of significantly lower cross
`points hence smaller area, lower signal latency, lower power
`and with significant fast compilation or routing time.
`The optimized multi-stage networksVComb (N1, N2, d, s) &
`VDComb (N1, N2, d, 5) according to the current invention
`inherit the properties of one or more, in addition to additional
`properties, generalized multi-stage and pyramid networks V
`(N1, N2, d, s) & VP (N1, N2, d, s), generalized folded multi—
`stage and pyramid networks VfoM (N1, N2, d, s) & Vfozfl, (N1,
`N2, d, s), generalized butterfly fat tree and butterfly fat pyra-
`mid networks bez (N1, N2, d, s) &bep m1, N2, d, s), gener-
`alized multi-link multi-stage and pyramid networks thnk
`(N1, N2, d, s) & thnk_P (N1, N2, d, s), generalized folded
`multi-link multi-stage and pyramid networks VfoZdimW (N1,
`N2, d, s) & \IfoZd-mlink-p (N1, N2, d, s), generalized multi-link
`butterfly fat tree and butterfly fat pyramid networks lemk_bfl
`(N1, N2, d, s) & thnMfi, 011, N2, (1, s), generalized hyper-
`cube networks thube (N1, N2, d, s), and generalized cube
`connected cycles networks VCCC (N1, N2, (1, s) for F1, 2, 3 or
`any number in general.
`
`BRIEF DESCRIPTION OF DRAWINGS
`
`FIG. 1A is a diagram 100A of an exemplary partial multi-
`stage hierarchical network corresponding to one block with 4
`inputs and 2 outputs of a computational block connecting
`only from left-hand side, to route practical applications such
`as FPGA routing of hardware designs in accordance with the
`invention.
`
`FIG. 1B is a diagram 100B ofan exemplary partial multi-
`stage hierarchical network corresponding to one block with 8
`inputs and 4 outputs of a computational block connecting
`from both left-hand side and right-hand side, to route practi-
`cal applications such as FPGA routing ofhardware designs 'n
`accordance with the invention.
`
`3
`tion network that utilizes two-dimensional grid model is 2D
`Mesh Network and its variations such as segmented mesh
`networks. Hence routing networks used in VLSI layouts are
`typically 2D mesh networks and its variations. However
`MeshNetworks require large scale cross points typically with
`a growth rate of O(N2) where N is the number of computing
`elements, ports, or logic elements depending on the applica-
`tion.
`Multi-stage interconnection network with a growth rate of
`O(N><log N) requires significantly small number of cross
`points. US. Pat. No. 6,185,220 entitled “Grid Layouts of
`Switching and Sorting Networks” granted to Muthukrishnan
`et al. describes aVLSI layout using existing VLSI grid model
`for Benes and Butterfly networks. US. Pat. No. 6,940,308
`entitled “Interconnection Network for a Field Programmable
`Gate Array” granted to Wong describes a VLSI layout where
`switches belonging to lower stage of Benes Network are laid
`out close to the logic cells and switches belonging to higher
`stages are laid out towards the center of the layout.
`Due to the inefiicient and in some cases impractical VLSI
`layout of Benes and butterfly fat tree networks on a semicon-
`ductor chip, today mesh networks and segmented mesh net-
`works are widely used in the practical applications such as
`field programmable gate arrays (FPGAs), programmable
`logic devices (PLDs), and parallel computing interconnects.
`The prior art VLSI layouts of Benes and butterfly fat tree
`networks and VLSI layouts ofmesh networks and segmented
`mesh networks require large area to implement the switches
`on the chip,
`large number of wires,
`longer wires, with
`increased power consumption, increased latency of the sig-
`nals which effect the maximum clock speed of operation.
`Some networks may not even be implemented practically on
`a chip due to the lack of efiicient layouts.
`Fully c01mected Benes and butterfly fat tree networks are
`an over kill for certain practical routing applications and need
`to be optimized to significantly improve area, power and
`performance of the routing network.
`
`SUMMARY OF INVENTION
`
`20
`
`25
`
`30
`
`40
`
`Significantly optimized multi-stage networks, useful in
`wide target applications, with VLSI layouts (or floor plans)
`using only horizontal and vertical links to route large scale
`sub—integrated circuit blocks having inlet and outlet links, and
`laid out in an integrated circuit device in a two-dimensional
`grid arrangement of blocks, (for example in an FPGA where
`the sub-integrated circuit blocks are Lookup Tables, or
`memory blocks, or DSP blocks) are presented. The optimized
`multi-stage networks in each block employ several rings of
`stages of switches with inlet and outlet links of sub—integrated
`circuit blocks connecting to rings from either left-hand side
`only, or from right—hand side only, or from both left—hand side
`and right-hand side.
`The optimized multi—stage networks with their VLSI lay—
`outs employ shuflie exchange links where outlet links ofcross
`links from switches in a stage of a ring in one sub-intcgratcd
`circuit block are connected to either inlet links of switches in
`
`the another stage of a ring in another sub-integrated circuit
`block or inlet links of switches in the another stage ofa ring in
`the same sub-integrated circuit block so that said cross links
`are either vertical links or horizontal and vice versa.
`
`60
`
`The VLSI layouts exploit spatial locality so that different
`sub-integrated circuit blocks that are spatially nearer are con-
`nected with shorter shuffle exchange links compared to the
`shuffle exchange links between spatially farther sub-inte-
`grated circuit blocks. The optimized multi-stage networks
`provide high routability for broadcast, unicast and multicast
`
`Page 22 of 48
`
`FIG. 2A is a diagram 200A, in an embodiment of, a stage 'n
`a ring of multi-stage hierarchical network corresponding 0
`one bloc <.
`FIG. 2B is a diagram 200B, in an embodiment of, a stage in
`a ring of multi-stage hierarchical network corresponding 0
`one bloc (.
`FIG. 2C is a diagram 200C, in an embodiment of, a stage 'n
`a ring of multi—stage hierarchical network corresponding 0
`one bloc <.
`
`
`
`
`
`FIG. 2D is a diagram 200D, in an embodiment of, a stage in
`a ring of multi-stage hierarchical network corresponding 0
`one bloc <.
`FIG. 2E is a diagram 200E, in an embodiment of, a stage 'n
`a ring of multi—stage hierarchical network corresponding 0
`one bloc <.
`FIG. 3A is a diagram 300A, in an embodiment of, all the
`connections between two successive stages of two dif ere 1t
`rings in the same block or in two dif erent blocks of a multi—
`stage hierarchical network.
`FIG. 3B is a diagram 300B, in an cmbodimcnt of, all the
`connections between two successive stages of two diferent
`rings in the same block or in two dif erent blocks of a multi-
`stage hierarchical network.
`FIG. 4 is a diagram 400, in an embodiment of, all the
`connections between two successive stages of two diferent
`rings in the same block or in two dif erent blocks of a multi-
`stage hierarchical network.
`FIG. 5 is a diagram 500, in an embodiment of, all the
`connections between two successive stages of two dif erent
`rings in the same block or in two dif erent blocks of a multi-
`stage hierarchical network.
`
`
`
`
`
`Page 22 of 48
`
`
`
`5
`FIG. 6 is a diagram 600, in an embodiment of, all the
`connections between two successive stages of two different
`rings in the same block or in two different blocks of a multi—
`stage hierarchical network.
`FIG. 7 is a diagram 700, is an embodiment of hop wire
`connection chart corresponding to a block of multi-stage
`hierarchical network.
`FIG. 8 is a diagram 800, is an embodiment of 2D-grid of
`blocks with each block corresponding to a partial multi—stage
`network to implement an exemplary multi-stage hierarchical
`nctwork, in accordance with the invcntion.
`FIG. 9A is a diagram 900A, in an embodiment of, a stage in
`a ring of mu ti-s agc hierarchical nctwork corrcsponding to
`one b ock, w'th delay optimizations.
`FIG. 9B is a diagram 900B, in an embodiment of, a stage in
`a ring of mu ti-s age hierarchical network corresponding to
`one b ock, with delay optimizations.
`FIG. 9C is a diagram 900C, in an embodiment of, a stage in
`a ring of mu ti-s age hierarchical network corresponding to
`one b ock, w'th delay optimizations.
`FIG. 9D is a diagram 900D, in an embodiment of, a stage in
`a ring of mu ti-s age hierarchical network corresponding to
`one b ock, with delay optimizations.
`FIG. 9E is a diagram 900E, in an embodiment of, a stage in
`a ring of mu ti-s age hierarchical network corresponding to
`one b ock, with delay optimi7ations.
`FIG. 10A is a c iagram 1000A, in an embodiment of, a stage
`in a ring ofmul i-stage hierarc iical network corresponding to
`one b ock, with delay optimizations.
`FIG. 10B is a ciagram 1000 3, in an embodiment of, a stage
`in a ring ofmul i-slage hierarc iical network corresponding to
`one b ock, with delay optimizations.
`FIG. 10C is a ciagram 1000C, in an embodiment of, a stage
`in a ring ofmul i-stage hierarc 1ical network corresponding to
`one b ock, with delay optimizations.
`FIG. 10D is a c iagram 1000 D, in an embodiment of, a stage
`in a ri 1g ofmul i-stage hierarc 1ical network corresponding to
`one b ock, with delay optimizations.
`FIG. 10E is a c iagram 10003, in an embodiment of, a stage
`in a ring ofmul i-stage hierarc 1ical network corresponding to
`one b ock, with delay optimizations.
`FIG. 10F is a ciagram 1000F, in an embodiment of, a stage
`in a ri 1g ofmul i-stage hierarc 1ical network corresponding to
`one b ock, with delay optimizations.
`FIG. 11A is a c iagram 1100A, in an embodiment of, a stage
`in a ring ofmul i—stage hierarc 1ical network corresponding to
`one b ock, with delay optimizations.
`FIG. 11B is a ciagram 1100 3, in an embodiment of, a stage
`in a ri 1g ofmul i-stage hierarc 1ical network corresponding to
`one b ock, with delay optimizations.
`FIG. 11C is a ciagram 1100C. in an embodiment of, a stage
`in a ring ofmul i—stage hierarc 1ical network corresponding to
`one b ock, with delay optimizations.
`FIG. 12 is a diagram 1200, in an embodiment, all the
`connections between two successive stages of two different
`rings in thc same block or in two diffcrcnt blocks of a multi-
`stage hierarchical network with delay optimizations.
`FIG. 13 is a diagram 1300, in one embodiment, all the
`connections between two successive stages of two different
`rings in the same block or in two different blocks of a multi-
`stage hierarchical network with delay optimizations.
`FIG. 14 is a diagram 1400, in an embodiment of, all the
`connections between two successive stages of two different
`rings in the same block or in two different blocks of a multi-
`stage hierarchical network with delay optimizations.
`FIG. 15 is a diagram 1500, in an embodiment of, all the
`connections between two successive stages of two different
`
`
`
`
`
`
`
`US 9,374,322 B2
`
`6
`rings in the same block or in two different blocks of a multi-
`stage hierarchical network with delay optimizations.
`FIG. 16A1 is a diagram 1600A1 of an exemplary prior art
`implementation of a two by two switch; FIG. 16A2 is a
`diagram 1600A2 for programmable integrated circuit prior
`art implementation of the diagram 1600A1 of FIG. 16A1;
`FIG. 16A3 is a diagram 1600A3 for one-time programmable
`integrated circuit prior art implementation of the diagram
`1600A1 of FIG. 16A1; FIG. 16A4 is a diagram 1600A4 for
`integrated circuit placement and route implementation of the
`diagram 1600A1 of FIG. 16A1.
`
`DETAILED DESCRIPTION OF THE INVENTION
`
`
`
`Fully connected multi-stage hierarchical networks are an
`over kill in every dimension such as area, powe‘, and perfor-
`mance for certain practical routing applications and need to
`be optimized to significantly improve savings in area, power
`and performance of the routing network. The present inven-
`tion discloses several embodiments of the optimized multi-
`stage hierarchical networks for practical routing applications
`along with their VLSI layout (floor plan) feasibility and sim-
`plicity.
`The multi-stage hierarchical networks considered for opti-
`mization in the current invention include: generalized multi-
`stage networks V (N1, N2, d, s), generalized folded multi-
`stage networks Vfold (N1, N2, d, s), generalized butterfly fat
`tree networks Vbfl (N1, N2, (1, s), generalized multi-link multi-
`stage networks Vmh.”k (N1, N2, d, s), generalized folded multi-
`link multi-stage networks Vfoldwlmk (N1, N2, d, s), general-
`izedmulti-link butterfly fat tree networks VImZink—bft (N1, N2, d,
`s), generalized hypercube networks thube (N1, N2, d, s), and
`generalized cube connected cycles networks Vcm (N1, N2, d,
`s) for s:l, 2, 3 or any number in general. Alternatively the
`optimized multi-stage hierarchical networks disclosed in this
`invention inherit the properties of one or more of these net-
`works, in addition to additional properties that may not be
`exhibited these networks.
`
`
`
`The optimized multi-stage hierarchical networks disclosed
`are applicable for practical routing applications, with several
`goals such as: 1) all the signals in the design sta ”ting from an
`inlet link of the network to an outlet link of the network need
`to be setup without blocking. These signals may consist of
`broadcast, unicast and multicast c01mections; Each routing
`resource may need to be used by only one signal or connec-
`tion; 2) physical area consumed by the routing network to
`setup all the signals needs to be small; 3) power consumption
`of the network needs to be small, after the signals are setup.
`Power may be both static power and dynamic power; 4) Delay
`ofthe signal or a connection needs to be small after it is setup
`through a pathusing several routing resources in the path. The
`smaller the delay of the connections will lead to faster per—
`formance of the design. Typically delay of the critical con-
`nections determines the performance ofthe design on a given
`network; 5) Designs need to be not only routed through the
`nctwork (i.c., all the signals nccd to bc sctup from inlct links
`of the network to the outlet links of the network), but also the
`routing needs to be in faster time using efficient routing algo-
`rithms; 6) Eflicient VLSI layout ofthe network is also critical
`and can greatly influence all the other parameters including
`the area taken up by the network on the chip, total number of
`wires, length of the wires, delay through the signal paths and
`hence the maximum clock speed of operation.
`The different varieties ofmulti-stage networks described in
`various embodiments in the current invention have not been
`
`implemented previously on the semiconductor chips. The
`practical application of these networks includes Field Pro-
`
`20
`
`25
`
`30
`
`40
`
`60
`
`
`
`
`
`Page 23 of 48
`
`Page 23 of 48
`
`
`
`US 9,374,322 B2
`
`7
`grammable Gate Array (FPGA) chips. Current commercial
`FPGA products such as Xilinx’ s Vertex, Altera’s Stratix, Lat-
`tice’s ECPX implement island—style architecture using mesh
`and segmented mesh routing interconnects using either full
`crossbars or sparse crossbars. These routing interconnects
`consume large silicon area for crosspoints, long wires, large
`signal propagation delay and hence consume lot of power.
`The current invention discloses the optimization of multi-
`stage hierarchical networks for practical routing applications
`of numerous types of multi-stage networks. The optimiza-
`tions disclosed in the current invention are applicable to
`including the numerous generalized multi-stage networks
`disclosed in the following patent applications:
`1) Strictly and rearrangeably nonblocking for arbitrary
`fan-out multicast and unicast for generalized multi-stage net-
`works V (N 1, N 2, d, s) with numerous connection topologies
`and the scheduling methods are described in detail in the U.S.
`Pat. No. 8,270,400 that is incorporated by reference above.
`2) Strictly and rearrangeably nonblocking for arbitrary
`fan-out multicast and unicast for generalized butterfly fat tree
`networks Vbfl (N1 , N2, d, s) with numerous connection topolo-
`gies and the scheduling methods are described in detail in the
`US. Pat. No. 8,170,040 that is incorporated by reference
`above.
`
`3) Rearrangeably nonblocking for arbitrary fan-out multi-
`cast and unicast, and strictly nonblocking for unicast for
`generalized multi-link multi-stage networks Vmh.”k (N1, N2, d,
`s) and generalized folded multi-link multi-stage networks
`VfoMm1W, (N1 , N2, d, s) with numerous connection topologies
`and the scheduling methods are described in detail in the US.
`Pat. No. 8,363,649 that is incorporated by reference above.
`4) Strictly and rearrangeably nonblocking for arbitrary
`fan-out multicast and unicast for generalized multi-link but-
`terfly fat tree networks szmk_bfl (N1, N2, d, s) with numerous
`connection topologies and the scheduling methods are
`described in detail in the US. Pat. No. 8,170,040 that is
`incorporated by reference above.
`5) Strictly and rearrangeably nonblocking for arbitrary
`fan-out multicast and unicast for generalized folded multi-
`stage networks Vfold (N1, N2, (1, s) with numerous connection
`topologies and the scheduling methods are described in detail
`in the US. Pat. No. 8,363,649 that is incorporated by refer-
`ence above.
`
`6) Strictly nonblocking for arbitrary fan—out multicast and
`unicast
`for generalized multi-link multi-stage networks
`th-nk (N1, N2, d, s) and generalized folded multi—link multi—
`stage networks V’fOM_mhnk (N1, N2, d, s) with numerous con-
`nection topologies and the scheduling methods are described
`in detail in the US. Pat. No. 8,363,649 that is incorporated by
`reference above.
`7) VLSI layouts ofnumerous types ofmulti-stage networks
`are described in the US. Pat. No. 8,269,523 entitled “VLSI
`LAYOUTS OF FULLY CONNECTED NETWORKS” that
`is incorporated by reference above.
`8) VLSI layouts ofnumerous types ofmulti-stage networks
`are described in the US. application Ser. No. 13/502,207
`entitled“VLS1 LAYOUTS OF FULLY CONNECTED GEN-
`ERALIZED AND PYRAMID NETWORKS WITH
`
`,SOCALITY EXPLOITATION” that is incorporated by ref-
`erence above.
`In addition the optimization with the VLSI layouts dis-
`closed in the current invention are also applicable to general-
`ized multi-stage pyramid networks VP (N1, N2, d, s), gener-
`alized folded multi-stage pyramidnetworks Vfozd? (N1, N2, d,
`s), generalized butterfly fat pyramid networks bep (N1, N2, d,
`s), generalized multi-link multi-stage pyramid networks
`V,nh,,k_P (N1, N2, d, s), generalized folded multi-link multi-
`
`20
`
`25
`
`30
`
`40
`
`60
`
`8
`
`stage pyramid networks Vf0,d_m,mk_P (N1, N2, d, s), general-
`ized multi-link butterfly fat pyramid networks VmIMHfi (N1,
`N2, d, s), generalized hypercube networks thube (N1, N2, d,
`s) and generalized cube connected