McCubbrey

(10) Patent No.: US 7,587,699 B2
(45) Date of Patent: Sep. 8, 2009

US007587699B2

(54) AUTOMATED SYSTEM FOR DESIGNING
AND DEVELOPING FIELD
PROGRAMMABLE GATE ARRAYS

(75) Inventor: David L. McCubbrey, Ann Arbor, MI (US)

(73) Assignee: Pixel Velocity, Inc., Ann Arbor, MI (US)

( * ) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted
under 35 U.S.C. 154(b) by 282 days.

(21) Appl. No.: 11/432,186

(22) Filed: May 10, 2006

(65) Prior Publication Data
US 2006/0206850 A1    Sep. 14, 2006

Related U.S. Application Data
(62) Division of application No. 10/441,581, filed on May 19, 2003, now Pat. No. 7,073,158.
(60) Provisional application No. 60/381,295, filed on May 17, 2002.

(51) Int. Cl.
G06F 17/50 (2006.01)
(52) U.S. Cl. .................... 716/17; 716/6; 716/9; 716/12
(58) Field of Classification Search .................... 716/6, 716/9, 12, 17
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

6,086,629 A          7/2000   McGettigan et al.
6,301,695 B1        10/2001   Burnham et al.
6,370,677 B1         4/2002   Carruthers et al.
6,457,164 B1         9/2002   Hwang et al.
6,526,563 B1         2/2003   Baxter
6,557,156 B1         4/2003   Guccione
2003/0086300 A1      5/2003   Noyes et al.
2005/0165995 A1 *    7/2005   Gemelli et al. .............. 710/305

* Cited by examiner

Primary Examiner: Thuan Do
(74) Attorney, Agent, or Firm: Jeffrey Schox
`(57)
`
`ABSTRACT
`
An automated system and method for programming field programmable gate arrays (FPGAs) is
disclosed for implementing user-defined algorithms specified in a high level language. The
system is particularly suited for use with image processing algorithms and can speed up the
process of implementing and testing a fully written high-level user-defined algorithm to a
matter of a few minutes, rather than the days, weeks or even months presently required using
conventional software tools. The automated system includes an analyzer module and a mapper
module. The analyzer determines what logic components are required and their
interrelationships, and observes the relative timing between the required components and
their partial products. The mapper module utilizes the output from the analyzer module and
determines where the required logic components must be placed on a given target FPGA in order
to reliably route, without interference, the required interconnections between various
components and I/O.
`
5,841,439 A *       11/1998   Pose et al. .............. 345/418
`
`10 Claims, 13 Drawing Sheets
`
[Representative drawing: blocks labeled Source Code, Analyze, Operation Graph, Map, Hardware Spec, Generate, and Bitstream, with System Constraints and Target Platform as inputs]
`
`XILINX, EX. 1002
`Page 1 of 24
`
`
`
U.S. Patent    Sep. 8, 2009    Sheet 1 of 13    US 7,587,699 B2
`
[FIG. 1: block diagram of FPGA 60 showing IOBs around the perimeter, an array of CLBs, PSMs of programmable interconnect 65, configuration access port (CAP) 75, and configuration port 70]
`
`
[FIGS. 2-4: FPGA layout with labels Configurable Logic, Programmable I/O, Block RAM, and Multiplier; a CLB with a Switch Matrix and four Slices with carry in (cin) and carry out (cout); a SLICE with RAM/Shift/LUT elements and registers (Reg)]
`
`
[FIG. 5: block diagram: Analyzer (26) → Mapper (28) → FPGA]
[FIG. 6: flowchart: Consider Application Constraints; Select Architecture; Automatically Identify Order and Dependents in Source Code; Map Out FPGA Using Selected Architecture and Identified Order + Dependencies]
[FIG. 7: block diagram: Source Code, Analyze, Operation Graph, Map, Hardware Spec, Generate, Bitstream; inputs: System Constraints, Target Platform]
`
`
[FIG. 12]
`
`
[FIG. 13: block diagram: Master Control issuing Instructions to Stage 1, Stage 2, ... Stage k; Serial Image Input; Image Buffers; Image Combiner; Serial Image Output; Raster Sub-Array Processor]
[FIG. 15: block diagram: Master Control issuing Reconfiguration Instructions; Stage 1 → Stage 2 → ... → Stage k; Serial Image Input; Serial Image Output; Raster Sub-Array Processor]
`
`
[FIG. 14: block diagram: Serial Image Load & Unload; Master Control; Instructions To All PEs]
`
`
[Figure text: Edge = DCyl(snsr,2) - ECyl(snsr,2); Out = edge - snsr]
`
AUTOMATED SYSTEM FOR DESIGNING
AND DEVELOPING FIELD
PROGRAMMABLE GATE ARRAYS

CROSS-REFERENCE TO RELATED
APPLICATIONS

This application is a divisional of U.S. patent application Ser. No. 10/441,581, filed
May 19, 2003, now U.S. Pat. No. 7,073,158, entitled “Automated System for Designing and
Developing Field Programmable Gate Arrays”, which is hereby incorporated in its entirety by
this reference.
This application claims the benefit of U.S. provisional patent application
Ser. No. 60/381,295, filed May 17, 2002, entitled “Automated System for Designing and
Developing Field Programmable Gate Arrays”, which is hereby incorporated in its entirety by
this reference.
`
TECHNICAL FIELD

This invention relates in general to systems and methods for designing, developing and
programming field programmable gate arrays (FPGAs), and in particular to automated systems
and methods for designing, developing and programming FPGAs to implement a user-written
algorithm specified in a high-level language for processing data vectors with one, two or
more dimensions, such as often are found in image processing and other computationally
intense applications.
`
BACKGROUND

There are known benefits of using FPGAs for embedded machine vision or other image
processing applications. These include processing image data at high frame rates, converting
and mapping the data and performing image segmentation functions that were all previously
handled by dedicated, proprietary processors. FPGAs are well-known for having a much greater
power to process images, on the order of 10 to 100 times that of conventional advanced
microprocessors of comparable size. This is in part a function of the fully programmed FPGA
being set up as a dedicated circuit designed to perform specific tasks and essentially
nothing else.
Another benefit of FPGAs is their low power consumption and low weight. FPGAs are very
suitable for embedded avionic applications, in-the-field mobile vision applications and
severe-duty applications, such as mobile vehicles, including those which are off-road, where
severe bumps and jolts are commonplace. These applications are very demanding in that they
have severe space, weight, and power constraints. Modern FPGAs now have the processing
capacity on a par with dedicated application-specific integrated circuits (ASICs), and are or
can be made very rugged.
FPGAs have grown in popularity because they can be programmed to implement particular logic
operations and reprogrammed easily as opposed to an application specific integrated circuit
(hereafter ASIC) where the functionality is fixed in silicon. But this very generic nature of
FPGAs, deliberately made so they can be used in many different applications, is also a
drawback due to the many difficulties associated with efficiently and quickly taking a high
level design specified by a user, and translating it into a practical hardware design that
meets all applicable timing, floor plan and power requirements so that it will run
successfully upon the target FPGA. As is well-known, a high level user-generated design is
typically specified by a sequence of matrix array or mathematic operations, including local
pixel neighborhood operations (such as erosion, dilation, edge detection, determination
of medial axis, etc.) and other forms of arithmetic or Boolean operations (e.g., addition,
multiplication, accumulation, exclusive-OR, etc.), lookup table and shift register functions,
and other functions like convolution, autocorrelation, and the like. In order to be able to
handle all of this diverse logic, the individual logic blocks used in the FPGAs are made to
be fairly generic.
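To make the local pixel neighborhood operations named above more concrete, the following Python sketch shows grayscale erosion and dilation over a 3x3 window, with an edge image formed as their difference. It is only an illustration: the function names (`window`, `erode`, `dilate`, `edge`), the 3x3 square neighborhood, and the border handling (positions outside the image are simply skipped) are assumptions made here and are not taken from the disclosure.

```python
def window(img, r, c):
    """Yield pixel values in the 3x3 window centered on (r, c), skipping positions outside the image."""
    for rr in range(max(r - 1, 0), min(r + 2, len(img))):
        for cc in range(max(c - 1, 0), min(c + 2, len(img[0]))):
            yield img[rr][cc]


def erode(img):
    """Erosion: each output pixel is the minimum over its neighborhood."""
    return [[min(window(img, r, c)) for c in range(len(img[0]))] for r in range(len(img))]


def dilate(img):
    """Dilation: each output pixel is the maximum over its neighborhood."""
    return [[max(window(img, r, c)) for c in range(len(img[0]))] for r in range(len(img))]


def edge(img):
    """Edge strength as dilation minus erosion (a morphological gradient)."""
    d, e = dilate(img), erode(img)
    return [[dv - ev for dv, ev in zip(drow, erow)] for drow, erow in zip(d, e)]


# A 5x5 test image containing a 3x3 square of ones.
square = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
for row in edge(square):
    print(row)
```

Every output pixel depends only on a small, fixed neighborhood of input pixels, which is one reason such operations are natural candidates for a dedicated hardware pipeline.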
The problem in supporting all these applications and functions is how to design
reconfigurable hardware resources that provide the most effective use of general purpose FPGA
silicon for the specific image processing tasks to which a given FPGA is put to use. FPGAs
are by their very nature general purpose circuits that can be programmed to perform many
different functions, such as digital signal processing used in wireless communication,
encryption and decryption for communications over the Internet, etc.
One expected benefit of FPGAs, since they are reprogrammable, is that they would help
eliminate the cost/risk of ASIC development. One of the few things really holding back the
larger use of FPGAs in vision applications has been the difficulty in translating desired
user-defined image processing algorithms into hardware, and the difficulty of updating those
algorithms once they are in hardware. If there were a development system for the design and
programming of FPGAs that greatly simplified the development of an image processing algorithm
or other sequence of desired operations into the bitstream coding required to program FPGAs,
this might well open up opportunities for wider use of FPGAs in such applications as medical,
automotive collision avoidance and commercial video.
For example, in the medical area, many medical imaging techniques have extremely high
processing requirements. FPGAs, assuming that they can be programmed with the desired
sequence of complex image processing steps, should produce smaller, faster and less expensive
versions of existing image processing devices that presently require ASIC devices to be
developed. In addition, many new applications will become possible for the first time,
because FPGAs can give speedups of one, two and even three orders of magnitude over PCs, at a
reasonable price. Automotive vision applications that are on the horizon include proposals to
help enhance driver situational awareness. Possible automotive vision applications include
systems to assist with lane-changes, to provide backup obstacle warnings, and to provide
forward collision warnings.
Commercial video FPGAs, if they were much easier to design, program and test, would likely
find much wider use in video transcoders, compression, encryption and standards support,
particularly in areas like MPEG-4. Many video applications are already being done with FPGAs,
but the design, development and testing of such FPGAs is at present very labor-intensive in
terms of designer and engineering services, which drives up unit costs and slows down the
transfer of proposed designs into actual commercial embodiments.
`
SUMMARY

In light of the foregoing limitations and needs, the present invention provides an FPGA-based
image processing platform architecture that is capable of dramatically speeding up the
development of user-defined algorithms, such as those found in imaging applications. As a
convenient shorthand reference, since the present invention is assigned to Pixel Velocity,
Inc. of Ann Arbor, Mich. (“PVI”), the system of the present invention will at times be
referred to as the PVI system, and the
methods of the present invention discussed therein will at times be referred to as the PVI
methods.
Generally, the present invention pertains to an automated system for programming field
programmable gate arrays (FPGAs) to implement a desired algorithm for processing data vectors
with one, two or more dimensions. The PVI system automates the process of determining what
logic components are necessary and produces an optimized placement and routing of the logic
on the FPGA. With this invention, FPGA programming development work that used to take weeks
or months, in terms of trying to implement and test a previously-created user-defined
algorithm, such as a sequence of steps to be carried out as part of an image processing
application in a machine vision system, can now be completed in less than one day.
As is well-known, Verilog and VHDL are languages for describing hardware structures in
development systems for writing and programming FPGAs. In the methods and systems of the
present invention, Verilog is used to develop what PVI refers to as “gateware”, which provides
specific hardware-level interfaces to things like image sensors and other I/O. The end user
invokes this functionality in much the way predefined library functions are used in software
today. The PVI system focuses solely on the image processing domain. At the application
level, a user’s image processing algorithm is developed and verified in C++ on a PC. An image
class library and overloaded operators are preferably provided as part of the PVI system of
the present invention to give users a way of expressing algorithms at a high level. The PVI
system uses that high level representation to infer a “correct-by-construction” FPGA hardware
image dataflow processor automatically.
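The idea of expressing an algorithm through an image class with overloaded operators can be sketched as follows. This is only an illustrative sketch, written in Python for brevity even though the disclosure describes a C++ class library. The `Image` class, the `dilate` and `erode` helpers, and the `operations` function are hypothetical names chosen here, not the PVI library's API. Each operator returns a new symbolic node instead of computing pixels, so evaluating the user's expression leaves behind an operation graph that a later analysis step could walk.

```python
class Image:
    """Symbolic image: operators record graph nodes rather than computing pixels."""

    def __init__(self, op, inputs=()):
        self.op = op                 # name of the operation that produces this image
        self.inputs = tuple(inputs)  # upstream Image nodes this one depends on

    def __add__(self, other):
        return Image("add", (self, other))

    def __sub__(self, other):
        return Image("sub", (self, other))


def dilate(img, size):
    """Hypothetical neighborhood operator; only the graph node is recorded."""
    return Image(f"dilate{size}", (img,))


def erode(img, size):
    """Hypothetical neighborhood operator; only the graph node is recorded."""
    return Image(f"erode{size}", (img,))


def operations(root):
    """List operation names in dependency order, visiting each node once."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for upstream in node.inputs:
            visit(upstream)
        order.append(node.op)

    visit(root)
    return order


snsr = Image("input")
edge = dilate(snsr, 2) - erode(snsr, 2)
out = edge - snsr
print(operations(out))  # ['input', 'dilate2', 'erode2', 'sub', 'sub']
```

Because the graph is built as a side effect of ordinary expressions, the same source text could in principle be compiled against a pixel-computing class for verification on a PC and against a graph-recording class for hardware generation; that split is an assumption of this sketch rather than a statement about how the PVI system is built.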
In the method and systems of the present invention, the dedicated image processor is derived
from the user’s source code and merged with prebuilt “gateware” automatically, as part of the
process of producing one or more low-level files that may be referred to as
hardware-gate-programming files (or HGP files for short) for programming the FPGA(s) using
known low-level software tools available from each FPGA manufacturer. The user thus ends up
with a machine that powers up and runs their algorithm on a continuous stream of images. A key
advantage is that algorithm developers can write and verify algorithms in a familiar and
appropriate way, then produce a “push-button” dedicated machine in only minutes, fabricated to
do just that algorithm. In other words, the PVI system of the present invention analyzes the
imaging algorithm code specified by the end user, that is the algorithm developer, and, by
applying a sequence of steps, which are further described below, generates a
hardware-gate-programming file composed entirely of conventional commands and instructions
that can be interpreted by low-level FPGA programming tools to produce bitstreams. These HGP
files are used as a low-level input file containing the code that specifies, to conventional
low-level programming (LLP) software tools available from the FPGA manufacturer (that is, the
bitstream generators used to hard code the FPGAs), the required connections to be programmed
into the target FPGA. These LLP software tools are capable of reading and acting upon the
commands represented by the HGP files in order to field-program the FPGA using conventional
techniques. The method and systems of the present invention are preferably arranged to
automatically apply, upon user command, the
HGP file output they produce to these LLP software tools, thus completing the programming of
the FPGA in a fully automatic manner.
`
BRIEF DESCRIPTION OF THE DRAWINGS

The drawings form an integral part of the description of the preferred embodiments and are to
be read in conjunction therewith. Like reference numerals designate the same or similar
components or features in the various Figures, where:
FIG. 1 is a simplified block diagram of a known FPGA.
FIGS. 2, 3 and 4 are further simplified block diagrams illustrating a known style of FPGA,
where FIG. 2 shows the overall layout of the FPGA, and also shows one of its specific sections
enlarged to reveal the arrangement details of CLBs, block RAM and multiplier logic therein,
FIG. 3 is an enlargement of a single CLB unit showing its switch matrix and its associated
slices, which contain still further units of configurable logic therein, and
FIG. 4 is an enlarged view of one of the slices, showing its RAM, registers, shift registers
and lookup tables, all of which are programmable.
FIG. 5 is a simplified block diagram showing the sequence of operations used by the system and
methods of the present invention, starting with a user-defined algorithm on the left, whose
content is entered into an analyzer module, whose output in turn is entered into a mapper
module, whose output is a low level source code that can be used to program an FPGA.
FIG. 6 is a flowchart illustrating the overall method of the present invention.
FIG. 7 is another simplified block diagram like that shown in FIG. 5 which represents the
major steps utilized in methods of the present invention.
FIG. 8 is a simplified layout showing a preferred serpentine arrangement for a succession of
image processing operations which have been mapped onto a portion of the overall FPGA shown in
FIG. 2.
FIG. 9 is a more detailed view of the simplified layout of FIG. 8 showing how the individual
operations of the user-defined sequence may be mapped onto CLBs typically between two separate
sections of RAM which are used as delay lines in order to ensure that proper timing is
maintained between partial products of the image processing sequence.
FIG. 10 is a simplified perspective view of a presently preferred arrangement of printed
circuit boards (PCBs), called a multi-processor stack, wherein each of the PCBs preferably
contains at least one FPGA, and also may typically have associated therewith driver circuits,
input/output circuits, power circuits and the like in order to ensure proper operation of the
FPGA, and also has in-line connectors represented by the elongated blocks for interconnecting
the PCBs together, and for receiving input/output signals at the top and bottom of the stack,
and also showing, on the top PCB, an image sensor and a miniature focusing lens in the center
of the top board.
FIG. 11 is a block diagram showing the interrelationship and wiring connections between the
four PCBs in the stack of FIG. 10, which illustrates the signal flow paths between the
individual PCBs and also illustrates a workstation being connected to the microcontroller PCB,
which workstation passes the bitstream from the low level programming tool located on the
workstation to the FPGA/program flash/RAM microcontroller, which thereafter handles the
loading of the bitstream after power up to the individual FPGAs.
FIG. 12 is a simplified perspective view of a digital camera with its generally rectangular
enclosure, having an external
lens on its left surface, which external lens is used to project a visual image onto the image
sensor located on the top PCB of the FIG. 10 stack shown located within the camera enclosure.
FIG. 13 is a simplified block diagram of a first possible target architecture for the system
of the present invention, namely a multi-pipeline raster sub-array.
FIG. 14 is a simplified block diagram of a second possible target architecture for the system
of the present invention, namely a parallel array processor.
FIG. 15 is a simplified block diagram of a third possible target architecture for the system
of the present invention, namely a pipeline raster sub-array processor.
FIG. 16 is a more detailed diagram showing some of the details of the FIG. 15 target
architecture.
FIG. 17 illustrates on the bottom half thereof a Sobel operation dataflow produced by the
analyzer module of the system of the present invention, and on the top half thereof
illustrates the mapping of that Sobel operation dataflow onto a multi-pipeline sub-array
processor.
FIG. 18 is an illustration of high-level source code, defined by an end user, and its
translation into an associated operation dataflow diagram.
FIG. 19 is an illustration of the simplification of the FIG. 18 operation dataflow diagram by
the removal of unnecessary operations.
FIG. 20 is an illustration of pipeline compensation being added to the resulting product in
FIG. 19 in order to equalize the timing between alternate data paths.
FIG. 21 is an illustration of operator elaboration modifying the graph when the operator is
built from more than one primitive component, as would be carried out by the mapper when
presented with an image processing sequence of the type shown on the left side of FIG. 21.
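The pipeline compensation mentioned in the description of FIG. 20 (equalizing the timing between alternate data paths) can be illustrated with a small Python sketch. The graph, the node names, and the latency values below are invented for illustration, and the `compensate` function is a hypothetical helper, not the method claimed in the patent. The sketch computes when each operation's output becomes valid and reports how many cycles of delay an early-arriving input would need so that all inputs of an operation line up.

```python
def compensate(graph, latency):
    """Balance path delays in a dataflow graph.

    graph:   node -> list of input nodes, listed so that inputs appear before their users
    latency: node -> cycles the node itself takes
    Returns (ready, delays): ready[n] is the cycle at which n's output is valid, and
    delays[(src, dst)] is the extra delay to insert on the edge src -> dst.
    """
    ready, delays = {}, {}
    for node, inputs in graph.items():  # relies on dict insertion order (Python 3.7+)
        arrival = max((ready[i] for i in inputs), default=0)
        for i in inputs:
            if ready[i] < arrival:
                delays[(i, node)] = arrival - ready[i]  # pad the faster path
        ready[node] = arrival + latency[node]
    return ready, delays


# Hypothetical graph: edge = dilate(snsr) - erode(snsr); out = edge - snsr
graph = {
    "snsr": [],
    "dilate": ["snsr"],
    "erode": ["snsr"],
    "edge": ["dilate", "erode"],
    "out": ["edge", "snsr"],
}
latency = {"snsr": 0, "dilate": 2, "erode": 2, "edge": 1, "out": 1}  # made-up cycle counts

ready, delays = compensate(graph, latency)
print(delays)        # {('snsr', 'out'): 3}
print(ready["out"])  # 4
```

In this example the raw sensor value reaches the final subtraction three cycles before the edge value does, so a three-cycle delay on that direct path keeps the two operands aligned.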
`
DETAILED DESCRIPTION OF THE PREFERRED
EMBODIMENTS

The present invention is illustrated and described herein in connection with preferred
embodiments, with the understanding that the present disclosure is to be considered as an
exemplification of the principles of the invention and the associated functional
specifications required for its implementation. However, it should be appreciated that the
systems and methods of the present invention may be implemented in still different
configurations and forms, and that other variations within the scope of the present invention
are possible based on the teachings herein.
Prior to discussing the embodiments of the present invention, it is useful to look more
closely at some of the known characteristics of existing design, development and programming
systems used to provide hardware programming bitstreams to program FPGAs. Typically, such
design and development systems are implemented on workstations operating under any suitable
operating system, such as UNIX, Windows, Macintosh or Linux. Such development systems
typically will have suitable applications software such as ISE development system from
Xilinx, and C++ or Java programming compilers, to allow programs written by users to run
thereon.
Due to advancing semiconductor processing technology, integrated circuits have greatly
increased in functionality and complexity. For example, programmable devices such as field
programmable gate arrays (FPGAs) and programmable logic devices (PLDs), can incorporate
ever-increasing numbers of functional blocks and more flexible interconnect structures to
provide greater functionality and flexibility.
A typical FPGA comprises a large plurality of configurable logic blocks (CLBs) surrounded by
input-output blocks and interconnectable through a routing structure. The first FPGA is
described in U.S. Reissue Pat. No. Re. 34,363 to Freeman, and is incorporated herein by
reference. The CLBs and routing structure of the FPGA are arranged in an array or in a
plurality of sub-arrays wherein respective CLBs and associated portions of the routing
structure are placed edge to edge in what is commonly referred to as a tiled arrangement. Such
a tiled arrangement is described in U.S. Pat. No. 5,682,107 to Tavana et al., the disclosure
of which is hereby incorporated by reference herein. The CLB portion of a tile comprises a
plurality of primitive cells which may be interconnected in a variety of ways to perform a
desired logic function. For example, a CLB may comprise a plurality of lookup tables (LUTs),
multiplexers and registers. As used herein, the term “primitive cell” normally means the
lowest level of user-accessible component.
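As a rough illustration of how a lookup table can realize an arbitrary logic function, the Python sketch below precomputes a truth table and then evaluates the function by indexing into it. The helper names `make_lut` and `lut_eval` and the choice of four inputs are assumptions for this example only; the text above does not specify a LUT size.

```python
def make_lut(func, n_inputs=4):
    """Precompute the truth table of `func` over all 2**n_inputs input combinations."""
    table = []
    for index in range(2 ** n_inputs):
        bits = [(index >> k) & 1 for k in range(n_inputs)]  # bit k of index is input k
        table.append(func(*bits) & 1)
    return table


def lut_eval(table, *bits):
    """Evaluate the LUT: the input bits form an address into the stored table."""
    index = sum(bit << k for k, bit in enumerate(bits))
    return table[index]


# "Configure" one LUT as a 4-input XOR (parity) function.
xor4 = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d)
print(len(xor4))                    # 16
print(lut_eval(xor4, 1, 0, 0, 0))   # 1
print(lut_eval(xor4, 1, 1, 0, 0))   # 0
```

Changing the stored table changes the function without changing the circuit around it, which is the sense in which such a primitive cell is configurable.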
FIG. 1 is a simplified schematic diagram of a conventional FPGA 60. FPGA 60 includes user
logic circuits such as input/output blocks (IOBs), configurable logic blocks (CLBs), and
programmable interconnect 65, which contains programmable switch matrices (PSMs). Each IOB and
CLB can be configured through configuration port 70 to perform a variety of functions.
Programmable interconnect 65 can be configured to provide electrical connections between the
various CLBs and IOBs by configuring the PSMs and other programmable interconnection points
(PIPs, not shown) through configuration port 70. Typically, the IOBs can be configured to
drive output signals or to receive input signals from various pins (not shown) of FPGA 60.
FPGA 60 also includes dedicated internal logic. Dedicated internal logic performs specific
functions and can only be minimally configured by a user. For example, configuration port 70
is one example of dedicated internal logic. Other examples may include dedicated clock nets
(not shown), power distribution grids (not shown), and boundary scan logic (i.e. IEEE Boundary
Scan Standard 1149.1, not shown).
FPGA 60 is illustrated with 16 CLBs, 16 IOBs, and 9 PSMs for clarity only. Actual FPGAs may
contain thousands of CLBs, thousands of IOBs, and thousands of PSMs. The ratio of the number
of CLBs, IOBs, and PSMs can also vary.
FPGA 60 also includes dedicated configuration logic circuits to program the user logic
circuits. Specifically, each CLB, IOB, PSM, and PIP contains a configuration memory (not
shown) which must be configured before each CLB, IOB, PSM, or PIP can perform a specified
function. Typically the configuration memories within an FPGA use static random access memory
(SRAM) cells. The configuration memories of FPGA 60 are connected by a configuration structure
(not shown) to configuration port 70 through a configuration access port (CAP) 75. A
configuration port (a set of pins used during the configuration process) provides an interface
for external configuration devices to program the FPGA. The configuration memories are
typically arranged in rows and columns. The columns are loaded from a frame register which is
in turn sequentially loaded from one or more sequential bitstreams. (The frame register is
part of the configuration structure referenced above.) In FPGA 60, configuration access port
75 is essentially a bus access point that provides access from configuration port 70 to the
configuration structure of FPGA 60.
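The column-by-column loading described above can be pictured with the following Python sketch. It is a simplified model under stated assumptions: the function name `load_bitstream`, the one-frame-per-column layout, and the absence of any headers, addressing, or error checking are all illustrative choices and do not describe the bitstream format of any actual device.

```python
def load_bitstream(bits, rows, cols):
    """Shift a serial bitstream into a frame register and write each full frame to a column."""
    if len(bits) != rows * cols:
        raise ValueError("bitstream length must equal rows * cols in this simplified model")
    memory = [[0] * cols for _ in range(rows)]  # configuration memory, rows x columns
    frame = []                                   # frame register, one bit per row
    col = 0
    for bit in bits:
        frame.append(bit)                        # sequentially load the frame register
        if len(frame) == rows:                   # frame full: load it into the current column
            for r in range(rows):
                memory[r][col] = frame[r]
            frame = []
            col += 1
    return memory


# Two frames of three bits each fill a 3-row, 2-column memory.
memory = load_bitstream([1, 0, 0, 0, 1, 1], rows=3, cols=2)
for row in memory:
    print(row)
```

The first three bits end up in column 0 and the next three in column 1, so the printed rows are `[1, 0]`, `[0, 1]`, and `[0, 1]`.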
FIG. 1A illustrates a conventional method used to configure FPGA 60. Specifically, FPGA 60 is
coupled to a configuration device 230 such as a serial programmable read only memory (SPROM),
an electrically programmable read only memory (EPROM), or a microprocessor. Configuration port
70 receives configuration data, usually in the form of a configuration bitstream, from
configuration device 230. Typically, configuration port 70 contains a set of mode pins, a
clock pin and a configuration data input pin. Configuration data from configuration device 230
is transferred serially to FPGA 60 through the configuration data input pin. In some
embodiments of FPGA 60, configuration port 70 comprises a set of configuration data input pins
to increase the data transfer rate between configuration device 230 and FPGA 60 by
transferring data in parallel. However, due to the limited number of dedicated function pins
available on an FPGA, configuration port 70 usually has no more than eight configuration data
input pins. Further, some FPGAs allow configuration through a boundary scan chain. Specific
examples for configuring various FPGAs can be found on pages 4-46 to 4-59 of “The Programmable
Logic Data Book”, published in January, 1998 by Xilinx, Inc., and available from Xilinx, Inc.,
2100 Logic Drive, San Jose, Calif. 95124, which pages are incorporated herein by reference.
Additional methods to program FPGAs are described in U.S. Pat. No. 6,028,445 to Lawman issued
Feb. 22, 2000, assigned to Xilinx, Inc. and entitled “Decoder Structure and Method for FPGA
Configuration,” the disclosure of which is hereby incorporated by reference herein.
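The effect of using several configuration data input pins instead of one can be estimated with simple arithmetic, sketched below in Python. The function name `config_clock_cycles` and the example bitstream size are placeholders chosen for illustration, and the model assumes exactly one bit per data pin per clock cycle with no protocol overhead, which real devices may not match.

```python
def config_clock_cycles(bitstream_bits, data_pins=1):
    """Clock cycles needed to shift in a bitstream, moving `data_pins` bits per cycle."""
    if data_pins < 1:
        raise ValueError("at least one configuration data pin is required")
    return (bitstream_bits + data_pins - 1) // data_pins  # ceiling division


bits = 1_000_000  # placeholder size, not a figure from the disclosure
print(config_clock_cycles(bits, 1))  # 1000000 cycles with a single serial data pin
print(config_clock_cycles(bits, 8))  # 125000 cycles with eight parallel data pins
```

Under these assumptions, going from one data pin to eight cuts the number of configuration clock cycles by a factor of eight.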
U.S. Pat. No. 6,086,629 to McGettigan et al. issued Jul. 11, 2000, is entitled “Method for
Design Implementation of Routing in an FPGA Using Placement Directives Such as Local Outputs
and Virtual Buffers” (the ’629 patent), and is assigned to Xilinx, Inc. As explained therein,
when an FPGA comprises thousands of CLBs in large arrays of tiles, the task of establishing
the required multitude of interconnections between primitive cells inside a CLB and between
the CLBs becomes so onerous that it requires software tool implementation. Accordingly, the
manufacturers of FPGAs including Xilinx, Inc., have developed place and route software tools
which may be used by their customers to implement their respective designs. Place and route
tools not only provide the means of implementing users’ designs, but can also provide an
accurate and final analysis of static timing and dynamic power consumption for an implemented
design scheme. In fact, better place and route software provides iterative processes to
minimize timing and power consumption as a final design implementation is approached.
Iterative steps are usually necessary to reach a final design primarily because of the unknown
impact of the placement step on routing resources (wires and connectors) available to
interconnect the logic of a user’s design. Iterative place and route procedures can be time
consuming. A typical design implementation procedure can take many hours of computer time
using conventional place and route software tools. Thus, as previously noted, there is an
ongoing need to provide a method for reducing design implementation time by increasing the
accuracy of static timing and dynamic power analysis during computer-aided design procedures
for FPGAs. The ’629 patent addresses these issues of accuracy of static timing and dynamic
power analyses. However, it does not provide a streamlined method for translating user-created
algorithms into bitstreams.
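The iterative character of placement can be illustrated with a toy Python sketch. It is deliberately simplistic and is not the algorithm of the ’629 patent or of any vendor tool: blocks sit in a single row of slots, the cost is the total distance spanned by two-pin nets, and the `improve` function (a hypothetical name) keeps trying pairwise swaps until a full pass yields no improvement.

```python
from itertools import combinations


def wirelength(placement, nets):
    """Total distance between the slots of each pair of connected blocks."""
    slot = {block: i for i, block in enumerate(placement)}
    return sum(abs(slot[a] - slot[b]) for a, b in nets)


def improve(placement, nets):
    """Repeat passes of pairwise swaps, keeping only swaps that reduce wirelength."""
    placement = list(placement)
    best = wirelength(placement, nets)
    improved = True
    while improved:                      # iterate until a whole pass changes nothing
        improved = False
        for i, j in combinations(range(len(placement)), 2):
            placement[i], placement[j] = placement[j], placement[i]
            cost = wirelength(placement, nets)
            if cost < best:
                best, improved = cost, True
            else:                        # undo a swap that did not help
                placement[i], placement[j] = placement[j], placement[i]
    return placement, best


nets = [("A", "B"), ("B", "C"), ("C", "D")]   # a simple chain of connections
start = ["A", "C", "B", "D"]
print(wirelength(start, nets))   # 5
print(improve(start, nets)[1])   # 3
```

Even in this tiny case the cost of a placement is only known after evaluating the connections it implies, which hints at why real place and route flows iterate and why they can consume hours on large designs.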
The ’629 patent also discusses the challenge presented to software tools used to place a
user’s design into a coarse-grained FPGA, which is to make optimum use of the features other
than lookup tables and registers that are available in the FPGA architecture. These can
include fast carry chains, XOR gates for generating sums, multiplexers for generating
five-input functions, and possibly other features available in the architecture. In order to
achieve maximum density and maximum performance of user logic in an FPGA, the software must
make use of these dedicated features where possible. The
’629 patent also states that there is a need to densely pack the user’s design into the
architecture that will implement the design.
The ’629 patent also discusses that it is well-known to specify or provide library elements
which reflect features of the FPGA architecture in the typical development system provided to
end-users. Several architectural features and associated timing and power parameters can be
represented by variable parameters for one library element. For example, a lookup table
library element has one variation in which the lookup table output signal is applied to a
routing line external to the configurable logic block (CLB), and another variation in which
the lookup table output signal is applied to another internal element of the CLB such as a
five-input function multiplexer or a carry chain control input. These two variations have
different timing parameters associated with them because the time delay for driving an element
internal to the CLB is less than the time delay for driving an interconnect line external to
the CLB.
If the FPGA user is using VHDL or schematic capture for design entry, the VHDL or schematic
capture design entry tool will auto-select the library elements, but the user must still
control the design entry tool so it selects and connects the library elements properly.
Alternatively, the user may design at a higher level using macros that incorporate the library
elements. These macros will have b