`
`a2) United States Patent
`US 9,374,322 B2
`(10) Patent No.:
`Konda
`Jun. 21, 2016
`(45) Date of Patent:
`
`(54)
`
`OPTIMIZATION OF MULTI-STAGE
`HIERARCHICAL NETWORKS FOR
`PRACTICAL ROUTING APPLICATIONS
`
`(56)
`
`References Cited
`U.S. PATENT DOCUMENTS
`
`(71)
`
`Applicant: Konda Technologies Inc., San Jose, CA
`(US)
`
`(72)
`
`Inventor: Venkat Konda, San Jose, CA (US)
`
`(73)
`
`Assignee: Konda Technologies Inc., San Jose, CA
`(US)
`
`(*)
`
`Notice:
`
`Subject to any disclaimer, the term ofthis
`patent is extended or adjusted under 35
`US.C. 154(b) by 125 days.
`
`(21)
`
`Appl. No.: 14/199,168
`
`(22)
`
`Filed:
`
`Mar.6, 2014
`
`(65)
`
`(31)
`
`(52)
`
`(58)
`
`Prior Publication Data
`
`US 2014/0313930 Al
`
`Oct. 23, 2014
`
`Int. Cl.
`
`HOAL 12/933
`US. Cl.
`
`(2013.01)
`
`CPO ieee teeters HO4L 49/1515 (2013.01)
`Field of Classification Search
`CPC vicecceeceeetesteetersesseesessssssneceecneones HO4L 49/1515
`USPC liecccccccceeseeteetenseseesesseesessesansetsensenees 370/254
`See application file for complete search history.
`
`5,654,695 A *
`
`6,091,723 A *
`
`6,335,930 BI*
`
`8/1997 Olnowich .......... H04Q 11/0478
`340/2.23
`7/2000 Even woe II04L 49/1507
`340/2.21
`1/2002 Lee wo. HO4L 49/101
`370/387
`6,469,540 B2* 10/2002 Nakaya ............ HO3K 19/17728
`326/38
`7,440,449 B2* 10/2008 Carson .....cc GO6T 7/20
`257/499
`6/2003 Fontana ou... HO4L 12/437
`370/216
`2/2011 Konda ou... GO6F 17/5077
`326/41
`2012/0269190 A1* 10/2012 Konda wc. GO6F 17/509
`370/388
`
`2003/0117946 A1*
`
`2011/0037498 A1*
`
`* cited by examiner
`
`Primary Examiner — Rasheed Gidado
`
`ABSTRACT
`(57)
`Significantly optimized multi-stage networks, useful in wide
`target applications, with VLSI layouts using only horizontal
`and vertical links to route large scale sub-integrated circuit
`blocks having inlet and outlet links, and laid out in an inte-
`grated circuit device in a two-dimensional grid arrangement
`ofblocks are presented. The optimized multi-stage networks
`in cach block employ several rings of stages of switches with
`inlet and outlet links of sub-integrated circuit blocks connect-
`ing to rings from eitherleft-hand side only, or from right-hand
`sideonly, or from bothleft-hand side and right-handside; and
`employ shuffle exchange links where outlet links of cross
`links from switches in a stage of a ring in one sub-integrated
`circuit block are connectedto eitherinlet links of switches in
`the another stage of a ring in the same or another sub-inte-
`grated circuit block.
`20 Claims, 19 Drawing Sheets
`
`Botyitar2)
`
`Ring ‘x", Stage “p”
`
`Page 1 of 48
`
`FLEX LOGIX EXHIBIT 1035
`
`Page 1 of 48
`
`FLEX LOGIX EXHIBIT 1035
`
`
`
`U.S. Patent
`
`Jun. 21, 2016
`
`Sheet 1 of 19
`
`US 9,374,322 B2
`
`s
`
`(Wed
`
`AU-LZ" DIAL
`
`FIG.1A
`
`
`
` Ring1,Stage
`“m”
` NET XEY
`100
`
` Ring1,Stage“m-1”
`
`
` Ring1,Stage1 Ring2,Stage1
`
`
` Ring1,Stage0 Ring2,Stage0
` B0t2.2)
`
` (@ze4 a Wein 1
`
`Computational
`Block
`
`Page 2 of 48
`
`Page 2 of 48
`
`
`
`U.S. Patent
`
`Jun. 21, 2016
`
`Sheet 2 of 19
`
`US 9,374,322 B2
`
`|
`
`
`
`oz
`
`
`
`(4uz‘1)o4~ fl
`
`
`
`Ring1,Stage“m”
`
`E©a
`£no
`D>
`&oe
`
`Ring2,Stage1
`
`Ring1,Stage1
`
`Ring1,Stage0
`
`2
`
`FIG.1B
`
`
`
`
`Page 3 of 48
`
`Page 3 of 48
`
`
`
`U.S. Patent
`
`Jun. 21, 2016
`
`Sheet 3 of 19
`
`US 9,374,322 B2
`
`FIG.2A 24
`
`20
`
`FIG. 2B
`
`Fotk2mn )
`ra
`
`Fo(k,2m+2)
`a
`
`Ri(k,2m+1)
`5
`i
`
`Ri(k,2m+2)
`:
`_
`
`Uitk.2m+1) Botk,2m+1
`_?
`‘)
`
`UKk2rm2) Bo(k,2m+2)
`
`200B
`
`Fo(k,2m+1)
`C “4
`
`Fotk,2m+z)
`
`Ui(k,2m+1)
`—
`
`Uilk.2me2)
`
`
`5
`
`FIG. 2¢
`
`200
`
`FIG. 2D
`
`200D
`
`Fo(k,2m+1)
`-)
`
`Fo(k,2m+2)
`
`Page 4 of 48
`
`Page 4 of 48
`
`
`
`U.S. Patent
`
`Jun. 21, 2016
`
`Sheet 4 of 19
`
`US 9,374,322 B2
`
`Ri(k,2m+1)
`
`FIG. 2E
`
`~~
`
`Page 5 of 48
`
`Page 5 of 48
`
`
`
`US.
`
`Patent
`
`Jun. 21, 2016
`
`Sheet 5 of 19
`
`US 9,374,322 B2
`
`FIG. 3A
`
` Ring “x”, Stage “p”
`
`Rix.2p+1)
`Ri(x.2p+3)
`
`
`
`
`=
`
`
`
`
`
`
`Fo(x.2p+3)
`a“
`
`[- Al
`
`
`coovaneC
`
`
`
`
`Eo(x,2p+2)
`
`
`
`Rifx,2p+4)
`
`,
`
`Fo(x,2p+4)
`y-
`
`Ui(x,2p+3)
`
`Ui(x.2p+4)
`y
`
`dq
`y+dz'x
`
`
`
`2Flees
`
`
`
`
`l
`rn
`
`
`
`
`
`
`
`.S>
`
`
`
`
`
`
`
`—_Hop(1,2)
`Hop(2.2)
`
`
`
`Hop(t.1y
`Hop(2,1)7 >
`
`Page 6 of 48
`
`Page 6 of 48
`
`
`
`U.S. Patent
`
`Jun. 21, 2016
`
`Sheet 6 of 19
`
`US 9,374,322 B2
`
`FIG, 3B
`
`an
`Ring “x”, Stage “p”
`
`Ri(x.2p+1)
`
`Hop(1.1f~
`Hop(2,1)7~
`
`.Hop(1.2)
`Hop(2.2)
`
`Ring "y”, Stage “q’
`
`Ring “y’, Stage “q+1"
`
` pRily.2q+3)
`
`Roly,2q+3
`,
`(y,2q+3)
`
`Fo(y.2q+3)
`¢
`
`Uily,2q+3)
`
`Uily,2q4+4)
`>
`
`Rity.2g+4if
`
`Boly,2q+3)
`._Boty.2q+4)
`
`Page 7 of 48
`
`Page 7 of 48
`
`
`
`U.S. Patent
`
`Jun. 21, 2016
`
`Sheet 7 of 19
`
`US 9,374,322 B2
`
`FIG. 4
`
`Ring “x”, Stage “p”
`
`Foly,2q+4)
`
`Ring “x”, Stage “p+1"
`
`
`
`
`Uily.2q+3)
`
`\
`Uily,2q+4)
`
`Page 8 of 48
`
`Page 8 of 48
`
`
`
`U.S. Patent
`
`Jun. 21, 2016
`
`Sheet 8 of 19
`
`US 9,374,322 B2
`
`-
`
`FIG. 5
`
`Ring “x”, Stage “p’
`
`Ring “x”, Stage “p+1"
`
`Page 9 of 48
`
`Page 9 of 48
`
`
`
`U.S. Patent
`
`Jun. 21, 2016
`
`Sheet 9 of 19
`
`US 9,374,322 B2
`
`FIG.6
`
`Ring “x”, Stage “p”
`
`Page 10 of 48
`
`Page 10 of 48
`
`
`
`U.S. Patent
`
`Jun. 21, 2016
`
`Sheet 10 of 19
`
`US 9,374,322 B2
`
`FIG. 7
`
`_
`
`L1 V1 U1H1 K1L2
`
`saasinareTES ES SS
`
`L1V3 U3 L2 H3 K3 V5
`
`“cones $A
`
`Page 11 of 48
`
`Page 11 of 48
`
`
`
`U.S. Patent
`
`Jun. 21, 2016
`
`Sheet 11 of 19
`
`US 9,374,322 B2
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`(OL‘oL)|(6'0L)@ol)|(or)|(9'or)|(Gor)|(rol)|(e'or)=or)|Cod)
`
`
`
`
`
`
`
`
`
`(oL'z(62)(g'z)(2'2)(9°)(o°2)(v2)(eZ)(22)(L°2)
`
`
`
`
`
`(ol'e(6'e)(g'e)(2'€)(9€)(g‘€)(re)(e'e)(2'€)(L'€)
`
`
`
`
`
`
`
`
`
`
`(oL'r(6'v)(8‘v)(2'v)(9'r)(g‘v)(v'v)(e'r)(Z'v)(Lv)
`
`
`
`
`
`
`
`
`
`(o's(6's)(8's)(2'g)(9°¢)(g°¢)(v's)(e's)(27'S)(L°S)
`
`
`
`
`
`
`
`
`
`(oL'9(6'9)(g'9)(2'9)(9°9)(g‘9)(r'9)(e'9)(2'9)(L'9)
`
`
`
`
`
`
`
`
`(org(62)(8°)(2'2)(9°2)(a°2)(PL)(e'Z)(22)(L°2)
`
`
`
`
`
`
`
`
`(o's(6'8)(s'8)(2'8)(98)(gg)(v'8)(e'g)(Z'8)(L'8)
`
`
`
`
`
`
`
`
`
`(or'6(6'6)(8'6)(2'6)(9°6)(g°6)(v6)(€'6)(2'6)(L°6)
`
`
`(OVE(e'L)(ZL)
`
`
`
`
`
`
`
`
`
`
`
`8Old
`
`
`
`
`
`yoolg
`
`Page 12 of 48
`
`Page 12 of 48
`
`
`
`
`
`
`
`
`
`
`
`
`U.S. Patent
`
`Jun. 21, 2016
`
`Sheet 12 of 19
`
`US 9,374,322 B2
`
`FIG, 9A
`
`90
`
`Fi(k,2m+1)
`.
`}
`YFi(k,2m+1)
`
`Fi(k,2m+2)
`
`!
`Fo(k,2m+1)
`
`FIG.9B
`
`9008
`
`FIG, 9C Bee
`
`_S
`
`Fi(k,2m+1)
`
`Fo(k,2m+2)
`5
`
`Uik,2m+1)
`: a
`
`YUitk,.2m#1)
`ot
`Lilk,2m+2)
`
`Fi(k,2m+2)
`
`Bo(k,2m+1)
`
`Botk,2mt2)
`a
`
`900D
`
`Fo(k2mey
`5
`—
`
` Plk2m*t)
`2
`7
`YFi(k,2m+1)
`
`Fo(k,2m+1)
`3
`
`Fo(k,2m+2)
`Fe
`
`Uitk:2met)
`
`UYitk,2m-+1)
`3
`Ui(k,2m+2)
`
`Fock.2met)
`:
`=
`
`Fitk,2m+1)_,
`YFi(k,2m+1).,
`
`Fo(k,2m+2)
`900E
` Fo(k,2m+1)
`Uitk,2m+2)
`
`
`Fi(k,2m+2). Fotk,2me2)—Fi(k.2ma2) Fo(k,2m+2)
`
`
`eo
`~
`oo
`
`Bo(k,2m+1).
`
`Bo(k,2m+2
`o(k,2m+2).
`a
`
`Ui(k,2m+1)
`5
`
`Bojk,2m+1)
`
`YUitk,2m+1)
`
`=)
`Ui(k,2m#2)
`
`Bo(k,2m+2)
`‘2
`
`ik.2met)
`at
`
`UYitk,2m+1)
`
`I
`
`Page 13 of 48
`
`Page 13 of 48
`
`
`
`U.S. Patent
`
`Jun. 21, 2016
`
`Sheet 13 of 19
`
`US 9,374,322 B2
`
`FIG. 10A
`
`1000A
`
`FIG. 10B
`
`1000B
`
`Ri(k,.2m+1)}
`t
`YRi(k,.2m+1)
`
`YRok2m]
`
`Ri(k.2m+2)?rise
`Bo(k.2met)Bi;
`Botk.2m+2)S|
`
`RYi(k,271+1)
`
`FIG. 10D=1099D
`
`FIG. 10C 1099
`
`Fo(k,2m+1)
`
`Fo(k,2m+2)
`
`Uitk,2m+1)
`Yui2m)
`Ui(k,2m+2)
`
`Ri(k.2m+2};
`Ui(k,2m+2)
`
`
`sal
`Bo(k,2m+2)
`C
`
`Ri(k,2m+1)
`2
`
`Ri(k,2m+2)
`
`Botk.2m+)
`
`Ri(k,2m+1)
`YRI(k,2m+1)_
`
`Ri(k,2m+2)—
`
`Bo(k,2m+1),
`
`Bo(k,2m+2).
`
`}
`
`Page 14 of 48
`
`rothioms' Rileamet)
`~
`RYi(k,2m+1)
`Fo(k.2m+2)
`
`i
`Uitkamet}e2)
`—
`Bo(k,2m-+1)
`Titk,2met)
`"
`Ui(k,2m+2)
`-
`
`Botk,2m+2)
`
`Page 14 of 48
`
`
`
`U.S. Patent
`
`Jun. 21, 2016
`
`Sheet 14 of 19
`
`US 9,374,322 B2
`
`FIG. 11A
`
`FYi(k,2m+2)
`
`Ri(k,2m+1)
`
`Ri(k,2m+2)7
`
`Bo(k,2m+1)
`
`Bo(k,2m+2)
`
`FIG. 11C
`
`FYi(k,2m+2)
`
`Ri(k,2m#1)
`
`(~
`Ri(k,2m+2)
`
`Bo(k,2m+1)
`
`\
`Fo(k,2m+#1)
`
`FYo(k,2m+2)
`
`2
`
`Ui,2me2)
`
`Ri(k,2m+1)
`
`FIG. 11B
`
`ae
`
`‘ Ui(k,2m+1)
`Bo(k,2m+1)
`BYo(k,2m+2)~
`
`Page 15 of 48
`
`Page 15 of 48
`
`
`
`U.S. Patent
`
`\©=SoNn
`
`MN
`
`°o
`
`US 9,374,322 B2
`
`N—_(esbz'Ajog
`
` mM=—x__ALZ)d0H=I~(1°b)IOH
`
`a=nN===+920CfesdzNAM
`»BIS(1X, Bury
`
`
`
`at+d,eBeyg“x,Bury
`
`}
`
`e4dz'X)OAYL
`
`(esdz0K
`
`00etZIOd
`
`Page 16 of 48
`
`(.+be‘MInA
`
`d
`
`6,oBeg
`‘A,Bury
`
`Page 16 of 48
`
`
`
`U.S. Patent
`
`Jun. 21, 2016
`
`Sheet 16 of 19
`
`US 9,374,322 B2
`
`
`
`oerlO
`
`«(e+dz50ry
`
` ud,
`
`abeys‘x,Bury
`
`C4201 lesdzWINA}C(.+d7*x)og
`
`'
`
`L—._(LL)
`(2+dz°x)IN
`
`Cc
`
`(z+dz'"0g
`
`
`
`~(Ledeisa
`
`(dey
`
`Page 17 of 48
`
`
`
`ab+b,a6ers‘4,Bury<b,a6e1g‘A,Bury
`
`
`
`
`
`Cp+bz"A)ry
`
`(ezbzAjog
`
`e+bz*A)OAY
`
`(3
`
`
`
`(.\:(exbe'A)l4<!5.pabzAaa
`
`>
`
`
`
`iC(rsbe'Aogiz+bz"Aog
`
`Page 17 of 48
`
`
`
`
`
`
`
`
`
`
`
`U.S. Patent
`
`Jun. 21, 2016
`
`Sheet 17 of 19
`
`US 9,374,322 B2
`
`
`
`
`
`bth,ebeig‘A,Bury
`
`ZAON
`
`bebZAVIAN
`
`(Z+bz‘AIin
`
`
`(pede'xjog(dex)
`(erde'xio4(dex
`(Loas{LE6Or 3
`+dz‘xon,KCc.
`-:c
`
`ul+d,eBeigx,Buryi,Bey‘x,Bury
`
`(e+d2x)I4
`
`(2doHLLe)doH
`
`(\
`
`(pidzxin
`
`\
`
`(e+dz*)IN)
`
`(+dz‘xjog
`
`0077rl‘91d
`
`Page 18 of 48
`
`Page 18 of 48
`
`
`
`U.S. Patent
`
`Jun. 21, 2016
`
`Sheet 18 of 19
`
`US 9,374,322 B2
`
`(pebz“A)IN(Z+bz‘A)OAR
`
`(“xJINogz4deK)INdexOAg >c(L+ez"x)0g
`
`onsSl‘Old
`
`Page 19 of 48
`
`.d,BeisUx,Bury
`
`(L+d2"x)oy
`
`oS
`
`\
`
`'x)OH
`
`(Z+dz“0ry
`
`
`
`
`
`ab+b,abe1g‘A,Bury
`
`(erbz‘Aag
`
`C(.+b2'Kog
`
`Page 19 of 48
`
`
`
`U.S. Patent
`
`Jun. 21, 2016
`
`Sheet 19 of 19
`
`US 9,374,322 B2
`
`LeLVOO9L
`
`
`
`IhHlJ(Z'Wdo(tL)
`
`pv009levo091
`
`
`=Zl/°Od04doa‘JvaGWA
`
`
`
`
`PV9OTDLAEVOL‘OL
`
`
`
`(JAY101.1d)(JAW1O11d)
`
`
`
`Z10.(110
`
`Z10110
`
`CV9OTOM
`
`QAV10114)
`
`
`
`(ZAd—_-(Z'2)d9
`IZI
`
`yZ10(110
`
`
`
`r1IV9l‘Od
`
`(IV10114)
`
`zZWO009L
`
`VOLOld
`
`Page 20 of 48
`
`Page 20 of 48
`
`
`
`
`US 9,374,322 B2
`
`1
`OPTIMIZATION OF MULTI-STAGE
`HIERARCHICAL NETWORKS FOR
`PRACTICAL ROUTING APPLICATIONS
`
`
`CROSS REFERENCE TO RELATED
`APPLICATIONS
`
`
`
`This application is related to and claimspriority ofthe PCT
`Application Serial No. PCT/US12/53814 entitled “OPTIMI-
`
`ZATION OF MULTI-STAGE HIERARCHICAL NET-
`WORKSFOR PRACTICAL ROUTING APPLICATIONS”
`by Venkat Konda assigned to the sameassigneeas the current
`application,filed Sep. 6, 2012 and the U.S. Provisional Patent
`Application Ser. No. 61/531,615 entitled “OPTIMIZATION
`
`OF MULTI-STAGE HIERARCHICAL NETWORKS FOR
`PRACTICAL ROUTING APPLICATIONS” by Venkat
`Konda assigned to the same assignee as the current applica-
`ion, filed Sep. 7, 2011.
`This applicationis related to and incorporates byreference
`in its entirety the U.S. Pat. No. 8,270,400 entitled “FULLY
`CONNECTED GENERALIZED MULTI-STAGE NET-
`
`WORKS?”by Venkat Konda assignedto the same assignee as
`he current application, issued Sep. 18, 2012, the PCT Appli-
`cation Serial No. PCT/U08/56064 entitled “FUTLTY CON-
`NECTED GENERALIZED MULTI-STAGE NETWORKS”
`
`
`
`
`
`by Venkat Konda assigned to the sameassignee as the current
`application, filed Mar. 6, 2008, the U.S. Provisional Patent
`Application Ser. No. 60/905,526 entitled “LARGE SCALE
`CROSSPOINT REDUCTION WITH NONBLOCKING
`UNICAST & MULTICAST IN ARBITRARILY LARGE
`
`MULTI-STAGE NETWORKS”by Venkat Kondaassigned to
`he same assignee as the current application, filed Mar. 6,
`2007, and the U.S. Provisional Patent Application Ser. No.
`60/940,383 entitled “FULLY CONNECTED GENERAL-
`IZED MULTI-STAGE NETWORKS”by Venkat Konda
`assigned to the same assignee as the current application,filed
`May 25, 2007.
`This applicationis related to and incorporates byreference
`in its entirety the U.S. Pat. No. 8,170,040 entitled “FULLY
`CONNECTED GENERALIZED BUTTERFLY FAT TREE
`
`NETWORKS” by Venkat Konda assigned to the same
`assignee as the current application, issued May 1, 2012,
`the
`PCT Application Serial No. PCT/U08/64603 entitled
`“FULLY CONNECTED GENERALIZED BUTTERFLY
`FAT TREE NETWORKS?”by Venkat Konda assigned to the
`sameassigneeas the current application, filed May22, 2008,
`the U.S. Provisional Patent Application Ser. No. 60/940,387
`entitled “FULLY CONNECTED GENERALIZED BUT-
`TERFLY FAT TREE NETWORKS”by Venkat Konda
`assigned to the same assignee as the current application,filed
`May 25, 2007, and the U.S. Provisional Patent Application
`Ser. No. 60/940,390 entitled “FULLY CONNECTED GEN-
`ERALIZED MULTI-LINK BUTTERFLY FAT TREE NET-
`
`
`
`
`
`
`
`WORKS”by Venkat Konda assignedto the same assignee as
`he current application, filed May 25, 2007.
`This applicationis related to and incorporates byreference
`in its entirety the U.S. Pat. No. 8,363,649 entitled “FULLY
`CONNECTED GENERALIZED MULTI-LINK MULTI-
`STAGE NETWORKS”by Venkat Konda assigned to the
`sameassignee as the current application, issued Jan. 29, 2013,
`he PCT Application Serial No. PCT/U08/64604 entitled
`‘TULLY CONNECTED GENERALIZED MULTI-LINK
`
`MULTI-STAGE NETWORKS”by Venkat Kondaassigned to
`he same assignee as the current application, filed May 22,
`2008,
`the U.S. Provisional Patent Application Ser. No.
`60/940,389 entitled “FULLY CONNECTED GENERAL-
`IZED REARRANGEABLY NONBILOCKING MULTT-
`
`Page 21 of 48
`
`10
`
`20
`
`25
`
`30
`
`40
`
`50
`
`60
`
`2
`
`LINK MULTI-STAGE NETWORKS” by Venkat Konda
`assigned to the same assigneeas the current application,filed
`May 25, 2007, the U.S. Provisional Patent Application Ser.
`No. 60/940,391 entitled “FULLY CONNECTED GENER-
`ALIZED FOLDED MULTI-STAGE NETWORKS”by Ven-
`kat Konda assigned to the same assignee asthe current appli-
`cation, filed May 25, 2007 and the US. Provisional Patent
`Application Ser. No. 60/940,392 entitled “FULLY CON-
`NECTED GENERALIZED STRICTLY NONBLOCKING
`
`MULTI-LINK MULTI-STAGE NETWORKS”by Venkat
`Kondaassigned to the same assignee as the current applica-
`tion, filed May 25, 2007.
`This applicationis related to and incorporates by reference
`in its entirety the U.S. Pat. No. 8,269,523 entitled “VISTI
`LAYOUTS OF FULLY CONNECTED GENERALIZED
`
`
`
`NETWORKS” by Venkat Konda assigned to the same
`assignee as the current application, issued Sep. 18, 2012, the
`PCT Application Serial No. PCT/U08/64605entitled “VLSI
`LAYOUTS OF FULLY CONNECTED GENERALIZED
`
`
`
`NETWORKS” by Venkat Konda assigned to the same
`assignee as the current application, filed May 22, 2008, and
`the U.S. Provisional Patent Application Ser. No. 60/940,394
`entitled “VLSI LAYOUTS OF FULLY CONNECTED GEN-
`ERALIZED NETWORKS?”by Venkat Kondaassignedto the
`same assignee as the current application,filed May 25, 2007.
`This applicationis related to and incorporates by reference
`in its entirety the U.S. application Ser. No. 13/502,207
`entitled “VLSI LAYOUTS OF FULLY CONNECTED GEN-
`ERALIZED AND PYRAMID NETWORKS WITH
`
`LOCALITY EXPLOITATION”by Venkat Konda assigned to
`the same assignee as the current application, filed Apr. 16,
`2012,
`the PCT Application Serial No. PCT/US10/52984
`entitled “VLSI LAYOUTS OF FULLY CONNECTED GEN-
`ERALIZED AND PYRAMID NETWORKS WITH
`
`
`
`
`
`LOCALITY EXPLOITATION”by Venkat Konda assigned to
`the same assignee as the current application, filed Oct. 16,
`2010,
`the U.S. Provisional Patent Application Ser. No.
`61/252,603 entitled “VLSI LAYOUTS OF FULLY CON-
`NECTED NETWORKS WITH LOCALITY EXPLOITA-
`
`
`
`TION”by Venkat Konda assigned to the same assignee asthe
`current application,filed Oct. 16, 2009, and the US. Provi-
`sional Patent Application Ser. No. 61/252,609 entitled “VLSI
`LAYOUTS OF FULLY CONNECTED GENERALIZED
`
`AND PYRAMID NETWORKS”by Venkat Kondaassigned
`to the sameassignee as the current application,filed Oct. 16,
`2009.
`
`BACKGROUNDOF INVENTION
`
`Multi-stage interconnection networks such as Benesnet-
`works and butterfly fat tree networks are widely useful in
`telecommunications, parallel and distributed computing.
`However VLSI layouts, knownin the priorart, of these inter-
`connection networks in an integrated circuit are inefficient
`and complicated.
`Other multi-stage interconnection networksincluding but-
`terfly fat tree networks, Banyan networks, Batcher-Banyan
`networks, Baseline networks, Delta networks, Omega net-
`works and Flip networks have been widely studied particu-
`larly for self-routing packet switching applications. Also
`Benes Networks with radix of two have been widely studied
`and it is known that Benes Networksof radix two are shown
`to be built with back to back baseline networks which are
`rearrangeably nonblocking for unicast connections.
`The most commonly used VIST layout in an integrated
`circuit is based on a two-dimensional grid model comprising
`only horizontal and vertical tracks. An intuitive interconnec-
`
`Page 21 of 48
`
`
`
`US 9,374,322 B2
`
`3
`tion network that utilizes two-dimensional grid model is 2D
`Mesh Network and its variations such as segmented mesh
`networks. Hence routing networks used in VLSI layouts are
`typically 2D mesh networks and its variations. However
`Mesh Networksrequire large scale cross points typically with
`a growth rate of O(N) where N is the number of computing
`elements, ports, or logic elements depending on the applica-
`tion.
`Multi-stage interconnection network with a growthrate of
`O(Nxlog N) requires significantly small number of cross
`points. U.S. Pat. No. 6,185,220 entitled “Grid Layouts of
`Switching and Sorting Networks” granted to Muthukrishnan
`etal. describes a VLSI layout using existing VLSI grid model
`for Benes and Butterfly networks. U.S. Pat. No. 6,940,308
`entitled “Interconnection Network for a Field Programmable
`Gate Array” granted to Wong describes a VLSI layout where
`switches belonging to lower stage of Benes Networkare laid
`out close to the logic cells and switches belonging to higher
`stages are laid out towards the center of the layout.
`Dueto the inefficient and in some cases impractical VLSI
`layout of Benes and butterfly fat tree networks on a semicon-
`ductor chip, today mesh networks and segmented mesh net-
`works are widely used in the practical applications such as
`field programmable gate arrays (FPGAs), programmable
`logic devices (PLDs), and parallel computing interconnects.
`The prior art VLSI layouts of Benes and butterfly fat tree
`networks and VLSIlayouts ofmesh networks and segmented
`mesh networks require large area to implement the switches
`on the chip,
`large number of wires,
`longer wires, with
`increased power consumption, increased latency of the sig-
`nals which effect the maximum clock speed of operation.
`Some networks may not even be implementedpractically on
`a chip dueto the lack ofefficient layouts.
`Fully connected Benes and butterfly fat tree networks are
`an overkill for certain practical routing applications and need
`to be optimized to significantly improve area, power and
`performanceof the routing network.
`
`SUMMARYOF INVENTION
`
`Significantly optimized multi-stage networks, useful in
`wide target applications, with VLSI layouts (or floor plans)
`using only horizontal and vertical links to route large scale
`sub-integrated circuit blocks having inlet and outlet links, and
`laid out in an integrated circuit device in a two-dimensional
`grid arrangementof blocks, (for example in an FPGA where
`the sub-integrated circuit blocks are Lookup Tables, or
`memory blocks, or DSP blocks) are presented. The optimized
`multi-stage networks in each block employ several rings of
`stages of switches with inlet andoutlet links of sub-integrated
`circuit blocks connecting to rings from either left-hand side
`only, or from right-hand sideonly, or from both left-hand side
`and right-handside.
`The optimized multi-stage networks with their VLSI lay-
`outs employ shuffle exchangelinks where outletlinks ofcross
`links from switches in a stage of a ring in one sub-integrated
`circuit block are connectedto either inlet links of switches in
`
`2oS
`
`25
`
`3°
`
`40
`
`45
`
`the another stage of a ring in another sub-integrated circuit
`block orinlet links of switchesin the anotherstage ofa ring in
`the same sub-integrated circuit block so that said cross links
`are either vertical links or horizontal andvice versa.
`
`60
`
`The VLSI layouts exploit spatial locality so that different
`sub-integrated circuit blocksthatare spatially nearer are con-
`nected with shorter shuffle exchange links compared to the
`shuffle exchange links between spatially farther sub-inte-
`grated circuit blocks. The optimized multi-stage networks
`provide high routability for broadcast, unicast and multicast
`
`Page 22 of 48
`
`4
`connections, yet with the benefits of significantly lower cross
`points hence smaller area, lower signal latency, lower power
`and with significant fast compilation or routing time.
`The optimized multi-stage networks Voonz (N,, No, d, 8) &
`Vcomp (N,, No. d, s) according to the current invention
`inherit the properties of one or more, in addition to additional
`properties, generalized multi-stage and pyramid networks V
`(N,, N,, d,s) & V, (N,, N3, d,s), generalized folded multi-
`stage and pyramid networks V,27(N,, No, d,s) & Vier, Ni;
`N,, d,s), generalized butterfly fat tree and butterfly fat pyra-
`mid networks Von (N,, N,, d. s) Vin (N,,N,, d, s). gener-
`alized multi-link multi-stage and pyramid networks V,,;.4
`(N,, No, d, 8) & Vintine-p (Ni, No, 4, 8), generalized folded
`multi-link multi-stage and pyramid networks Ve22nine (Ni;
`No, d, 8) & Vysia-mint-p (Ni, No, d,s), generalized multi-link
`butterfly fat tree and butterfly fat pyramid networksV,,,:,.4-4
`(N,, Nz, d, 8) & Vivineagy (Ni, No, d, s), generalized hyper-
`cube networks V,.,,¢ (N,, N», d, s), and generalized cube
`connected cycles networks Vocc (N,, N;, d, s) for s=1, 2, 3 or
`any numberin general.
`
`BRIEF DESCRIPTION OF DRAWINGS
`
`FIG. 1A is a diagram 100A of an exemplary partial multi-
`stage hierarchical network corresponding to one block with 4
`inputs and 2 outputs of a computational block connecting
`only from left-hand side, to route practical applications such
`as FPGArouting of hardware designs in accordance withthe
`invention.
`
`FIG. 1B is a diagram 100B ofan exemplary partial multi-
`stage hierarchical network corresponding to one block with8
`inpuls and 4 outpuls of a computational block connecting
`from both left-hand side and right-handside, to route practi-
`cal applications such as FPGA routing ofhardware designs in
`accordance with the invention.
`
`FIG. 2A is a diagram 200A, in an embodimentof, a stage
`a ring of multi-stage hierarchical network corresponding
`one block.
`FIG.2B is a diagram 200B,in an embodimentof, a stage in
`a ring of multi-stage hierarchical network corresponding to
`one block.
`FIG. 2C is a diagram 200C, in an embodimentof, a stage
`a ring of multi-stage hierarchical network corresponding
`one block.
`
`in
`°
`
`in
`°
`
`
`
`
`
`FIG. 2D is a diagram 200D, in an embodimentof, a stage in
`a ring of multi-stage hierarchical network corresponding to
`one block.
`FIG. 2E isa diagram 200E, in an embodimentof, a stage
`a ring of multi-stage hierarchical network corresponding
`one block.
`FIG.3A is a diagram 300A, in an embodimentof, all the
`connections between two successive stages of two different
`rings in the sameblockor in two different blocks of a multi-
`stage hierarchical network.
`FIG.3B is a diagram 300B, in an embodimentof, all the
`connections between two successive stages of two different
`rings in the sameblockorin two different blocks of a multi-
`stage hierarchical network.
`VIG. 4 is a diagram 400, in an embodimentof, all the
`connections between two successive stages of two different
`rings in the sameblockor in two different blocks of a multi-
`stage hierarchical network.
`FIG. 5 is a diagram 500, in an embodimentof, all the
`connections between two successive stages of two different
`rings in the sameblockor in two different blocks of a multi-
`stage hierarchical network.
`
`in
`°
`
`
`
`
`
`Page 22 of 48
`
`
`
`
`
`
`
`
`
`
`
`
`
`5
`FIG. 6 is a diagram 600, in an embodimentof, all the
`connections between two successive stages of two different
`rings in the same block or in two different blocks of a multi-
`stage hierarchical network.
`FIG. 7 is a diagram 700, is an embodiment of hop wire
`connection chart corresponding to a block of multi-stage
`hierarchical network.
`FIG.8 is a diagram 800, is an embodiment of 2D-grid of
`blocks with each block correspondingto a partial multi-stage
`network to implement an exemplary multi-stage hierarchical
`network, in accordance with the invention.
`FIG. 9A isa diagram 900A,in an embodimentof, a stage in
`a ring of multi-stage hicrarchical network corresponding to
`one block, with delay optimizations.
`FIG.9B is a diagram 900B, in an embodimentof, a stage in
`a ring of multi-stage hierarchical network corresponding to
`one block, with delay optimizations.
`FIG.9C is a diagram 900C,in an embodimentof, a stage in
`a ring of multi-stage hierarchical network corresponding to
`one block, with delay optimizations.
`TIG. 9D isa diagram 900D,in an embodimentof, a stage in
`a ring of multi-stage hierarchical network corresponding to
`one block, with delay optimizations.
`FIG. 9F is a diagram 900K, in an embodiment of, a stage in
`a ring of multi-stage hierarchical network corresponding to
`one block, with delay optimizations.
`FIG.10A is a diagram1000A,in an embodimentof, a stage
`inaring ofmulti-stage hierarchical network corresponding to
`one block, with delay optimizations.
`FIG. 10B is a diagram 1000B, in an embodimentof, a stage 30
`inaring ofmulti-stage hierarchical network corresponding to
`one block, with delay optimizations.
`FIG. 10C is a diagram 1000C,in an embodimentof, a stage
`in aring ofmulti-stage hierarchical network corresponding to
`one block, with delay optimizations.
`FIG. 10D is a diagram 1000D,in an embodimentof, a stage
`in aring ofmulti-stage hierarchical network corresponding to
`one block, with delay optimizations.
`FIG. 10Eis a diagram 1000E, in an embodimentof, a stage
`in aring ofmulti-stage hierarchical network corresponding to
`one block, with delay optimizations.
`FIG. 10F is a diagram 1000F,in an embodimentof, a stage
`in aring ofmulti-stage hierarchical network corresponding to
`one block, with delay optimizations.
`FIG. 11A isa diagram 1100A, in anembodimentof, a stage
`inaring ofmulti-stage hierarchical network corresponding to
`one block, with delay optimizations.
`FIG. 11B is a diagram 1100B, in an embodimentof, a stage
`in aring ofmulti-stage hierarchical network corresponding to
`one block, with delay optimizations.
`FIG. 11C is a diagram 1100C,in an embodimentof, a stage
`inaring ofmulti-stage hierarchical network corresponding to
`one block, with delay optimizations.
`FIG. 12 is a diagram 1200, in an embodiment, all the
`connections between two successive stages of two difterent
`rings in the same block or in two different blocks of a multi-
`stage hierarchical network with delay optimizations.
`FIG. 13 is a diagram 1300, in one embodiment, all the
`connections between two successive stages of two different
`rings in the same block or in two different blocks of a multi-
`stage hierarchical network with delay optimizations.
`TIG. 14 is a diagram 1400, in an embodimentof, all the
`connections between two successive stages of two different
`rings in the same block or in two different blocks of a multi-
`stage hierarchical network with delay optimizations.
`FIG. 15 is a diagram 1500, in an embodimentof, all the
`connections between two successive stages of two different
`
`US 9,374,322 B2
`
`
`
`6
`rings in the sameblockorin two different blocks of a multi-
`stage hierarchical network with delay optimizations.
`FIG. 16A1 is a diagram 1600A1 of an exemplaryprior art
`implementation of a two by two switch; FIG. 16A2 is a
`diagram 1600A2 for programmable integrated circuit prior
`art implementation of the diagram 1600A1 of FIG. 16A1;
`VIG. 16A3is a diagram 1600A3 for one-time programmable
`integrated circuit prior art implementation of the diagram
`1600A1 of FIG. 16A1; FIG. 16A4is a diagram 1600A4 for
`integrated circuit placement and route implementation of the
`diagram 1600A1 of FIG. 16A1.
`
`DETAILED DESCRIPTION OF THE INVENTION
`
`Fully connected multi-stage hierarchical networks are an
`over kill in every dimension such as area, power, and pertfor-
`mancefor certain practical routing applications and need to
`be optimized to significantly improve savings in area, power
`and performance of the routing network. The present inven-
`tion discloses several embodiments of the optimized multi-
`stage hierarchical networksfor practical routing applications
`along with their VLSIlayout (floor plan) feasibility and sim-
`plicity.
`The multi-stage hierarchical networks considered for opti-
`mization in the current invention include: generalized multi-
`stage networks V (N,, N3, d, s), generalized folded multi-
`stage networks V7 (N,, No, d,s), generalized butterfly fat
`tree networks Vin (N,,N.,, d, s), generalized multi-link multi-
`stage networks Vrin (N,, No, d,s), generalized folded multi-
`link multi-stage networks Vjj2¢-miimz (Ni, No, d, 8), general-
`ized multi-link butterfly fat tree networks V,ring-og (Ni, Nod,
`s), generalized hypercube networksV,,.,,-(N,, N>, d,s), and
`generalized cube connected cycles networks V.. (N,, N., d,
`s) for s=1, 2, 3 or any numberin general. Alternatively the
`optimized multi-stage hierarchical networks disclosed in this
`invention inherit the properties of one or more of these net-
`works, in addition to additional properties that may not be
`exhibited these networks.
`
`
`
`The optimized multi-stage hierarchical networks disclosed
`are applicable for practical routing applications, with several
`goals such as: 1) all the signals in the design starting from an
`inlet link of the network to an outlet link of the network need
`to be setup without blocking. These signals may consist of
`broadcast, unicast and multicast connections; Each routing
`resource may need to be used by only one signal or connec-
`tion; 2) physical area consumed by the routing network to
`setup all the signals needs to be small; 3) power consumption
`of the network needs to be small, after the signals are setup.
`Power maybe bothstatic power and dynamic power; 4) Delay
`ofthe signal or a connection needsto be smallafter it is setup
`through a path using several routing resourcesin the path. The
`smaller the delay of the connections will lead to faster per-
`formance of the design. Typically delay of the critical con-
`nections determines the performanceofthe design on a given
`network; 5) Designs need to be not only routed through the
`network(i.c., all the signals need to be sctup from inlct links
`ofthe networkto the outlet links of the network), but also the
`routing needsto bein faster time using efficient routing algo-
`rithms; 6) Efficient VLSI layout ofthe networkis also critical
`and can greatly influence all the other parameters including
`the area taken up by the networkon the chip, total numberof
`wires, length of the wires, delay through the signal paths and
`hence the maximum clock speed of operation.
`Thedifferent varieties ofmulti-stage networks described in
`various embodiments in the current invention have not been
`
`implemented previously on the semiconductor chips. The
`practical application of these networks includes Field Pro-
`
`20
`
`25
`
`40
`
`45
`
`60
`
`Page 23 of 48
`
`Page 23 of 48
`
`
`
`US 9,374,322 B2
`
`10
`
`20
`
`25
`
`7
`grammable Gate Array (FPGA) chips. Current commercial
`FPGAproducts suchas Xilinx’s Vertex, Altera’s Stratix, Lat-
`tice’s ECPx implementisland-style architecture using mesh
`and segmented mesh routing interconnects using either full
`crossbars or sparse crossbars. These routing interconnects
`consumelarge silicon area for crosspoints, long wires, large
`signal propagation delay and hence consumelot of power.
`The current invention discloses the optimization of multi-
`stage hierarchical networks for practical routing applications
`of numerous types of multi-stage networks. The optimiza-
`tions disclosed in the current invention are applicable to
`including the numerous generalized multi-stage networks
`disclosed in the following patent applications:
`1) Strictly and rearrangeably nonblocking for arbitrary
`fan-out multicast and unicast for generalized multi-stage net-
`works V(N,, N,, d, 5) with numerous connection topologies
`and the scheduling methodsare described in detail inthe U.S.
`Pat. No. 8,270,400 that is incorporated by reference above.
`2) Strictly and rearrangeably nonblocking for arbitrary
`fan-out multicast and unicast for generalized butterfly fat tree
`networks V,,,(N,, N., d,s) with numerous connection topolo-
`gies and the scheduling methodsare described in detail in the
`US. Pat. No. 8,170,040 that is incorporated by reference
`above.
`
`8
`stage pyramid networks Vyai¢municp (Ni, No, d, 8), general-
`ized multi-link butterfly fat pyramid networksV,,,7,...0 (Ni,
`N,, d,s), generalized hypercube networks V,,.,,,¢ (Ni, No, d,
`s) and generalized cube connected cycles networks Vcoc (N,,
`N,, d, s) for s=1, 2, 3 or any numberin general.
`Finally the current invention discloses the optimizations
`and VLSIlayouts ofmulti-stage hierarchical networks V,.4
`(N,, N2, d, s) and the optimizations and VLSI layouts of
`multi-stage hierarchical networks Vcomp (N,, N>, d, s) for
`practical routing applications (particularly to set up broad-
`cast, unicast and multicast connections), where “Comb”
`denotes the combination of and “D-Comb”denotesthe delay
`optimized combination of any of the gencralized multi-stage
`networks V (N,, N,, d, s), generalized folded multi-stage
`networks Vy,77 (N;, No, d, s), generalized butterfly fat tree
`networks Vig (Ni, No, d, s), generalized multi-link multi-
`stage networksV,,1in(N,, N>, d,s), generalized folded multi-
`link multi-stage networks Vj.74mim (Ni, No, d,s), general-
`ized multi-link butterfly fat tree networksV,,rins-o7 (Ni No, d,
`s), generalized multi-stage pyramid networksV,, (N,, Nz, d,
`s), generalized folded multi-stage pyramid networks V1,
`(N,,N;, d,s), generalized butterfly fat pyramid networks V,4,
`(N,, N,, d, s), generalized multi-link multi-stage pyramid
`networksV,,jnt-p (Ni; No,d,s), generalized folded multi-link
`3) Rearrangeably nonblocking for arbitrary fan-out multi-
`multi-stage pyramid networks V14.miink-p (Ni, Nz, d,s), gen-
`cast and unicast, and strictly nonblocking for unicas