throbber
APPLYING POINTER ANALYSIS TO THE SYNTHESIS
`OF HARDWARE FROM C
`
`A DISSERTATION
`
`SUBMITTED TO THE DEPARTMENT OF
`
`ELECTRICAL ENGINEERING
`
`OF STANFORD UNIVERSITY
`
`IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
`
`FOR THE DEGREE OF
`
`DOCTOR OF PHILOSOPHY
`
`Luc Séméria
`
`June 2001
`
`XILINX, EX. 1024
`Page 1 of 146
`
`

`
`
`
`
`
` Copyright by Luc Séméria 2001
`All Rights Reserved
`
`XILINX, EX. 1024
`Page 2 of 146
`
`

`
`
`
`
`
`ii
`
`I certify that I have read this dissertation and that, in my opinion, it
`is fully adequate in scope and quality as a dissertation for the degree
`of Doctor of Philosophy.
`
`______________________________________
`Giovanni De Micheli, Principal Advisor
`
`I certify that I have read this dissertation and that, in my opinion, it
`is fully adequate in scope and quality as a dissertation for the degree
`of Doctor of Philosophy.
`
`______________________________________
`Monica Lam
`
`I certify that I have read this dissertation and that, in my opinion, it
`is fully adequate in scope and quality as a dissertation for the degree
`of Doctor of Philosophy.
`
`______________________________________
`Michael Flynn
`
`Approved for the University Committee on Graduate Studies:
`
`______________________________________
`
`XILINX, EX. 1024
`Page 3 of 146
`
`

`
`
`
`
`
`
`
`iii
`
`ABSTRACT
`
`With the advances in Computer Aided Design (CAD) technology, the design of digital
`
`circuits is becoming more and more as the development of software. Hardware is modeled
`
`using Hardware Description Languages (HDLs) very much as software is described using
`
`programming languages. Synthesis tools are used like compilers to map these HDL mod-
`
`els into hardware. However, modern systems, which consist of mixed software/hardware
`
`modules, are often initially modeled using programming languages instead of HDLs. Spe-
`
`cifically, C/C++-based languages with hardware support are used to quickly verify the
`
`functionality and estimate the performances at the system level.
`
`One of the greatest challenges in C/C++-based design methodology is to efficiently
`
`map C/C++ models into hardware. Many of the networking and multimedia applications
`
`implemented in hardware or mixed hardware/software systems are making use of complex
`
`data structures stored in one or multiple memories. As a result, many of the C/C++ fea-
`
`XILINX, EX. 1024
`Page 4 of 146
`
`

`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`iv
`tures that were originally designed for software applications are now making their way
`
`into hardware. Such features include dynamic memory allocation and pointers used to
`
`manage data.
`
`In this thesis, I present a solution for efficiently mapping arbitrary C code with point-
`
`ers and
`malloc
`
`/
`
`free
`
` into hardware. In hardware, a pointer is not only the address of data
`
`in memory, but it may also reference data mapped to registers, ports or wires. Pointer anal-
`
`ysis is used to find the set of locations each pointer may reference in a program at compile
`
`time. The values of the pointers are then encoded, and branching statements are used to
`
`dynamically access data referenced by pointers. Dynamic memory allocation and deallo-
`
`cation are supported by instantiating hardware memory allocators tailored to an applica-
`
`tion and a memory architecture. Several optimizations may also be performed. A heuristic
`
`algorithm is presented to efficiently encode the values of the pointers. Compiler tech-
`
`niques may also be used to reduce storage before loads and stores.
`
`An implementation using the
`SUIF
`
` compiler framework is presented, followed by some
`
`examples of implementations taken from multimedia and networking applications.
`
`XILINX, EX. 1024
`Page 5 of 146
`
`

`
`
`
`
`
`
`
`
`
`
`
`
`
`
`v
`
`ACKNOWLEDGMENTS
`
` I would like thank my advisor Giovanni De Micheli for giving me the opportunity of
`
`working on this thesis, for his support and his directions. This work couldn’t have been
`
`possible without the advises of several other people. First, I would like to acknowledge the
`
`support of Synopsys Inc. that initiated this research and financed it. Specifically, I would
`
`like to thank Raul Camposano, Abhijit Ghosh, Joachim Kunkel and Don Mc Millan for
`
`their support. I would also like to thank many of my friends met at Synopsys whose com-
`
`ments helped me at the different stages of this research: Olivier Coudert, Maurizio Dami-
`
`ani, Stan Liao, Preeti Panda, Srinivas Raghvendra, Mike Sharp. This research couldn’t
`
`have been possible without the help of the
`
`SUIF
`
` group at Stanford and more specifically
`
`Monica Lam and Dave Heine. Finally, thank you to Koichi Sato from
`
`NEC
`
` corp., who was
`
`a visiting researcher at Stanford in 1999 and worked with me on the synthesis of dynamic
`
`memory allocation and deallocation.
`
`XILINX, EX. 1024
`Page 6 of 146
`
`

`
`
`
`
`
`vi
`A thesis is more than its technical contributions, it is also a personal achievement. For
`
`this I would warmly thank my family and friends. First my wife Cécile who joined me in
`
`this adventure. Thank you for your loving trust and your support. I would also like to
`
`thank my family here, Bernard and Joan, and in France, my parents, my brother and sisters
`
`whose encouragements throughout my studies have always been the greatest help.
`
`This is also a good opportunity to thank some of the people who really made a differ-
`
`ence in my student life. Jean-Marie Fluchaire, high-school mathematics professor at the
`
`Lycée du Grésivaudan Meylan, and Henri Koen, professor of mathematics in Classes Pré-
`
`paratoires at Lycée Masséna Nice.
`
`XILINX, EX. 1024
`Page 7 of 146
`
`

`
`
`
`
`
`
`
`
`
`
`
`
`
`
`1.
`
`2.
`
`3.
`
`4.
`
`5.
`
`6.
`
`vii
`
`TABLE OF CONTENTS
`
`Introduction
`1.1 Motivations
`1.2
`Design methodology
`1.3
`Objectives and Contributions
`Related Work
`2.1 Modeling and synthesizing hardware from C/C++
`2.2
`Software compilation of C and C++ onto parallel architectures
`2.3
`Application-specific memory management methodology
`Background
`3.1 Memory space representation
`3.2
`Pointer analysis
`3.3
`Definition of the subset
`Synthesis of Hardware From C
`4.1 Memory Refinement
`4.2
`Pointer Synthesis
`4.3
`Synthesis of
`malloc
`4.4
`Summary
`Implementation
`5.1
`Toolflow
`5.2
`Results
`Optimization of Loads and Stores
`6.1
`Optimization of loads
`
` and
`
`free
`
`1
`2
`4
`7
`10
`11
`18
`19
`23
`27
`30
`32
`35
`36
`40
`48
`51
`53
`53
`57
`62
`63
`
`XILINX, EX. 1024
`Page 8 of 146
`
`

`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`viii
`
`Optimization of stores
`6.2
`Results
`6.3
`Summary
`6.4
`Encoding of Pointers
`7.1
`Definition of the problem
`7.2
`Problem formulation
`7.3
`Simplified problem
`7.4
`Encoding with folding
`7.5
`Encoding with splitting
`7.6
`Encoding algorithm
`7.7
`Results
`7.8
`Summary
`Library of Allocators and Optimization of
`malloc
`8.1
`Optimized general purpose allocator
`8.2
`Specific-purpose allocator
` and
`8.3
`Optimizing sequences of
`free
`malloc
`8.4
`Experimental results and discussion
`8.5
`Summary
`Conclusion
`
`7.
`
`8.
`
`9.
`
` and
`
`free
`
` calls
`
`66
`70
`71
`72
`73
`76
`79
`83
`86
`90
`96
`98
`99
`100
`102
`102
`104
`107
`108
`
`XILINX, EX. 1024
`Page 9 of 146
`
`

`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`ix
`
`LIST OF TABLES
`
`3.1
`
`5.1
`
`6.1
`
`6.2
`
`7.1
`
`8.1
`
`8.2
`
`
`
`Location set examples (f
`
`
`
`=offset of field F), (s
`
`=stride or array element size) 28
`
`Result of the synthesis using target library tsmc.35u (area in library units). 61
`
`Area after synthesis and optimization using target library lsi_10k (in library
`units).
`70
`
`Timing after synthesis and optimization using target library lsi_10k (in ns). 70
`
`Area after synthesis and optimization using tsmc.35 library (in library units).
`For each example, P represents the number of pointers and N the number of
`variables.
`97
`
`Implementation of the different allocators (area in library units using the
`. and
`. represent respectively the area of
`tsmc.35 target library;
`comb
`non-comb
`combinational logic and non-combinational logic (i.e. registers, etc.) at
`100MHz)
`
`105
`
`Results for the different examples and optimizations (size in library units using
`the tsmc.35 target library; frequency 100MHz for test1, test2 and ATM,
`50MHz for Video Filter; CPU time for synthesis measured on Sun Ultra2 does
`not include high-level synthesis)
`106
`
`XILINX, EX. 1024
`Page 10 of 146
`
`

`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`x
`
`LIST OF ILLUSTRATIONS
`
`1.1
`
`2.1
`
`3.1
`
`4.1
`
`4.2
`
`4.3
`
`4.4
`
`4.5
`
`5.1
`
`5.2
`
`5.3
`
`6.1
`
`Top-down design flow of a system
`
`5
`
`Interface of the allocator block implementing
` and
` functions. 21
`malloc
`free
`
`Representation of
`; the offset and stride
`struct{int a; int b;} r[]
`correspond to the locations
` where
` is integer.
`28
`r[i].b
`i
`
`Memory refinement from a continuous memory space to a set of memories,
`registers and wires
`36
`
`Memory layout of
`struct {char c1; char c2;
`.
`short s; int i;} csi
`
`Encoding of pointers in an array.
`
`38
`
`41
`
`Implementation of
`, where
` may point to
` or
` and
` may point to
`*q=*p+1
`p
`a
`b
`q
` or an element of
`.
`45
`c
`table[]
`
`Architecture for multiple memories and allocators.
`
`Position of SpC between high-level synthesis and architectural mapping.
`
`SystemC and traditional C-to-HDL methodology using SpC.
`
`Architecture of the 2D IDCT.
`
`Optimization of a load.
`
`51
`
`54
`
`56
`
`60
`
`64
`
`XILINX, EX. 1024
`Page 11 of 146
`
`

`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`xi
`Example of code segment before and after optimizing load.
`
`CDFG for
` with
`*p=*p+1
`
`.
`p->{a,b}
`
`65
`
`68
`
`(a) Example of pointer-dependence graph and (b) definitions of the points-to
`set of each pointer.
`74
`
`Example of (a) non-optimal and (b) optimal encodings; codes that are changed
`in the optimal encoding are shown in bold.
`76
`
`Global encoding and selection of the relevant bits for each pointer.
`
`Example of (a) relation matrix and (b) corresponding affinity graph.
`
`Example of optimal encoding.
`
`Pointer-dependence graph and definitions of the points-to sets.
`
`Relation matrix and corresponding affinity graph before folding.
`
`79
`
`82
`
`83
`
`84
`
`85
`
`Relation matrix and corresponding affinity graph after folding
` and
`a
`
`e.
`
`85
`
`Result of the encoding after folding
` and
`a
`
`.
`e
`
`Pointer-dependence graph and definitions of the points-to sets.
`
`Relation matrix and corresponding affinity graph before splitting.
`
`Result of the encoding without splitting.
`
`86
`
`87
`
`87
`
`87
`
`Relation matrix and corresponding affinity graph after splitting symbol
`. 88
`a
`
`Result of the encoding after splitting symbol
`.
`a
`
`Graph embedding algorithm with splitting and folding.
`
`Pointer-dependence graph and definitions of the points-to sets.
`
`88
`
`91
`
`93
`
`(a) Relation matrix and (b) affinity graph at the beginning of iteration 1; (c) af-
`finity graph at the beginning of iteration 2.
`93
`
`
`
`Result of the encoding after splitting and folding (where ‘-’ is don’t care). 96
`
`
`
`Architecture of an optimized general-purpose allocator.
`
`101
`
`Encoding of pointers in an array for optimized general-purpose allocator. 101
`
`6.2
`
`6.3
`
`7.1
`
`7.2
`
`7.3
`
`7.4
`
`7.5
`
`7.6
`
`7.7
`
`7.8
`
`7.9
`
`7.10
`
`7.11
`
`7.12
`
`7.13
`
`7.14
`
`7.15
`
`7.16
`
`7.17
`
`7.18
`
`8.1
`
`8.2
`
`XILINX, EX. 1024
`Page 12 of 146
`
`

`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`CHAPTER 1. Introduction
`
`Definition 1.0.
`
`Example 1.0.
`
`Proposition 1.0.
`
`Figure 1.0
`
`Table 1.0:
`
`Page 1
`
`(1.0)
`
`CHAPTER 1. INTRODUCTION
`
`In the last decades, the complexity of digital systems implemented on silicon has
`
`grown at the rate of 2x every 16 months according to Moore’s law. Within 5 years, inte-
`
`grated circuits (IC) will include as many as 190 million transistors [82]. However, the pro-
`
`ductivity of designers does not improve at the same rate. Studies have shown that
`
`designers or programmers write in average 10 lines of code per day. In order to bridge this
`
`productivity gap, Computer Aided Design (CAD) tools and methodologies have been
`
`developed to facilitate the task of designing systems on silicon. CAD tools enable more
`
`efficient system modeling by abstracting some of the implementation details that can be
`
`automatically derived from the specification.
`
`The most dramatic achievements are for the design of digital circuits. Since the seven-
`
`ties, the placement and the connections of standard logic cells from a netlist onto silicon
`
`have been automated in
`
`
`
`physical design (i.e. Place & Route)
`
`
`
`tools. The eighties have seen
`
`the development of
`
`logic synthesis
`
` and the release of tools that automatically generate a
`
`netlist from a cycle-accurate behavioral model expressed using
`
`Hardware Description
`
`
`
`Languages (HDLs). Examples of HDLs include Verilog HDL [44] and VHDL [48]. In the
`
`XILINX, EX. 1024
`Page 13 of 146
`
`

`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`CHAPTER 1. Introduction
`last decade
`high-level synthesis
`
` (HLS)
`
`tools
`
` (aka
`
`behavioral
`
` or
`
`architectural
`
`
`
`Page 2
`synthesis
`
`tools
`
`) have been developed to automate the mapping of sequential behavioral descriptions
`
`into cycle-accurate representations. Such tools perform both scheduling of operations
`
`(arithmetic and logic operations as well as a memory and register accesses) and resource
`
`binding [14].
`
`1.1 Motivations
`
`Different languages have been used as input to high-level synthesis. HDLs are the
`
`most commonly used. However, designers often write system-level models using pro-
`
`gramming languages, such as C or C++, to estimate the system performances and verify
`
`the functional correctness of the design. C and C++ offer many advantages. The first moti-
`
`vation for using programming languages, as opposed to HDLs at the system level, is the
`
`growing amount of software running on any system. Many of today’s chips incorporate
`
`processor cores running instruction codes compiled from programming languages. More-
`
`over, among programming languages, C and C++ have been the most widely used in the
`
`last two decades. As a result there is a vast amount of legacy code and libraries that can be
`
`reused to quickly model systems.
`
`Even for hardware applications, C and C++ have been often used to accelerate the
`
`design process. C/C++ can be efficiently compiled onto today’s architectures and thus are
`
`used to develop fast simulation models (e.g. to model microprocessors and microcontrol-
`
`ers). Besides, describing both the software and hardware in the same language facilitates
`
`the integration and the verification of hardware components within a software environ-
`
`XILINX, EX. 1024
`Page 14 of 146
`
`

`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`CHAPTER 1. Introduction
`Page 3
`ment. Recent initiatives [71,79,86,89] attempt to standardize a C/C++-based language for
`
`both hardware and software design.
`
`As a result, today, C/C++ or C/C++-based languages can be used to model systems for
`
`both software and hardware components. However, in order to physically map functional-
`
`ity onto hardware, one usually needed to manually translate C/C++ code into HDLs. This
`
`task is well-known for being both time consuming and error prone. The synthesis of hard-
`
`ware directly from C/C++ would then accelerate the design process. A coding style and a
`
`set of restrictions on the language are often defined to simplify the mapping of such mod-
`
`els onto hardware.
`
` Some of the components originally implemented in software may be also be mapped
`
`to specific hardware components for better performances (increased throughput or reduced
`
`latency), power savings, and/or smaller silicon area. To facilitate such software/hardware
`
`migration, synthesis tools would need to support all C/C++ language constructs.
`
`In order to help designers refine their code from a simulation model to a synthesizable
`
`behavioral description, we are trying to efficiently synthesize the full A
`
`NSI
`
` C standard
`
`[32,48]. C was originally designed to develop the UNIX operating system. It provides con-
`
`structs to directly access memory (through pointers) and to manage memories and I/O
`
`using the standard C library (
`,
`,...). These constructs are widely used in soft-
`malloc
`free
`
`ware. Nevertheless, many of the networking and multimedia applications implemented in
`
`hardware or mixed hardware/software systems are also using complex data structures
`
`stored in one or multiple memory banks. As a result, many of the C/C++ features that were
`
`originally designed for software applications are now making their way into hardware.
`
`XILINX, EX. 1024
`Page 15 of 146
`
`

`
`
`
`
`
`
`
`
`
`
`
`
`
`CHAPTER 1. Introduction
`Page 4
`The synthesis of such constructs as dynamic memory allocation, function calls, recursions,
`
`’s, type castings and pointers, turns out to be particularly challenging.
`goto
`
`1.2 Design methodology
`
`The general goal of this research is to automate the generation of digital circuits from
`
`an algorithmic or behavioral description written in C. The realization of this research is a
`
`prototyping tool which fits today’s design methodology by taking a C function as an input
`
`and generating hardware automatically. To better understand where this research fits, it is
`
`important at this point to review the different steps involved in the design of a system. In
`
`the top-down approach shown on Figure 1.1, the system is first modeled at the system
`
`level using programming languages or more application-specific descriptions (such as
`
`Matlab, Mathematica, etc.). The models used at this level are designed to verify the func-
`
`tionality and estimate the performances of the system. Several optimizations are usually
`
`performed. They do not depend on whether the final implementation is hardware or soft-
`
`ware. Algorithmic optimizations are performed, for example to reduce the number of oper-
`
`ations in a given computation. In addition, for a given algorithm, data formats are refined,
`
`for example from floating point to fixed point. Data structures are also refined. In particu-
`
`lar, different implementations of queues or of hash tables could be evaluated.
`
`At the architectural level, the system usually consists of a set of communicating pro-
`
`cesses or threads. These processes can either be mapped to hardware or software. The
`
`communication between the different processes is refined at this point. On the software
`
`side, drivers are generated. On the hardware side, controllers are synthesized to implement
`
`communication protocols. Internal and shared storages are defined for each process.
`
`XILINX, EX. 1024
`Page 16 of 146
`
`

`
`
`
`
`
`
`
`CHAPTER 1. Introduction
`
`Page 5
`
`System-level
`Optimizations
`C/C++
`
`C/C++
`Architecture
`Mapping
`
`C/C++
`
`Compiler
`
`Assembly Code
`Software
`
`System Level
`
`Architectural Level
`
`Register-Transfer Level
`
`Netlist Level
`
`Layout Level
`
`C/C++
`
`HDL
`High-level
`Synthesis
`HDL
`
`HDL
`Logic
`Synthesis
`
`HDL
`Physical
`Design
`
`Polygone Representation
`Transistors
`
`Figure 1.1: top-down design flow of a system
`
`Shared storage is used for inter-process communication whereas internal storage stores
`
`local variables and arrays. After architectural mapping and interfaces definitions, the sys-
`
`tem consists of a set of processes to be mapped to software or hardware. On the software
`
`side, compilers can be used to generate assembly code to be executed on a processor. On
`
`the hardware side, high-level synthesis may be used to generate a cycle-accurate represen-
`
`tation of the design.
`
`XILINX, EX. 1024
`Page 17 of 146
`
`

`
`
`
`
`
`
`
`CHAPTER 1. Introduction
`Page 6
`High-level synthesis takes a sequential description of an algorithm and automatically
`
`implements it as data path and control logic. The data path consists of operators, storage,
`
`and steering logic. Operators, which can be shared, usually implement algorithmic or logic
`
`operations (e.g. add, subtract, multiply, etc.). Storage is usually implemented as a set of
`
`registers or memories. Steering logic represents the connection network between storage
`
`and operators. In addition, the control logic implemented as a state-machine or as micro-
`
`code controls the internal flow of data (i.e. multiplexer select signals, register update sig-
`
`nals, tri-state buffer set signals, etc.). The tasks performed during high-level synthesis
`
`consist of scheduling and resource binding according to latency, throughput and resource
`
`constraints. Scheduling and binding correspond to the tasks of mapping operations to time
`
`slots and operators respectively. The cycle-accurate design generated after high-level syn-
`
`thesis can then be mapped to hardware using a traditional CAD flow consisting of logic
`
`synthesis and physical design.
`
`The research presented here fits as a front end to high-level synthesis tools. It can be
`
`also seen as a back end to system-level tools performing hardware/software partitioning
`
`and interface synthesis. The objective is to take C code instead of HDL as an input to high-
`
`level synthesis. From a system-level perspective, the goal is to automate the transition
`
`from a architectural description written in C to an input understandable by current high-
`
`level synthesis tools.
`
`More specifically, this research in on automating the synthesis of hardware from a C
`
`code with such constructs as pointers (including pointer arithmetic and type casting), com-
`
`plex data structures and dynamic memory allocations. These language features are not part
`
`of any synthesizable HDL subsets. To make this research possible, the code is assumed
`
`XILINX, EX. 1024
`Page 18 of 146
`
`

`
`
`
`
`
`
`
`
`
`
`CHAPTER 1. Introduction
`Page 7
`correct and fully visible (complete specification) at compile time. Existing developments
`
`both for high-level synthesis and compilers are also leveraged
`
`1.3 Objectives and Contributions
`
`The first goal for this research is to be applicable to a large subset of the C language.
`
`As a result, the front-end and initial analysis passes used can deal with any kind of C input
`
`without any assumption of a coding style or of a synthesizable subset. Better quality of
`
`results may be achieved by restricting the input to a limited subset of C or a given coding
`
`style. However, this would defeat our original goal of supporting the full ANSI C. To sup-
`
`port all C constructs, my implementation shares the same general-purpose front end and
`
`pointer analysis passes as advanced optimizing C compilers. The contribution here is on
`
`applying such advanced compiler techniques to the synthesis of hardware from C.
`
`The overall intention of this research is to support full synthesis of C. On the other
`
`hand, to make this research possible, existing tools are leveraged when it is possible. As a
`
`back end, commercial tools are used to perform high-level synthesis. The flip side of using
`
`commercial tools is that they cannot be modified and their internal data structures cannot
`
`be accessed in a research environment. In this research, HLS tools are used as black boxes.
`
`This leads to some limitations both on the architecture generated and the synthesizable
`
`subset. The main limitation is on function calls. First, the implementation of such feature
`
`as recursion in general remains a fundamental problem. Tail recursion or limited recursion
`
`could be implemented. Without recursion, one way of synthesizing functions is to inline
`
`them. However designers often want to map functions to separate components. Today’s
`
`commercial synthesis tools have many restrictions on functions mapped to components.
`
`XILINX, EX. 1024
`Page 19 of 146
`
`

`
`
`
`
`
`
`
`
`
`
`
`
`
`CHAPTER 1. Introduction
`Page 8
`Moreover, the synthesis of C functions mapped to components would require some inter-
`
`action with the HLS tool. For example sharing information would be useful to efficiently
`
`synthesize functions with parameters passed by reference. As a result, the synthesis of
`
`functions is beyond the scope of this thesis and is left as an Appendix.
`
`The contributions of this thesis are the following.
`
`1) This thesis defines the necessary steps to automate the mapping of ANSI C code
`
`into hardware. This work represents the first attempt to efficiently synthesize such
`
`constructs as complex data structures, pointers, pointer arithmetic and type casting.
`
`The resulting tool bridges the gap between high-level synthesis and system-level
`
`tools.
`
`2) A methodology for efficiently supporting dynamic memory allocation and deallo-
`
`cation is also presented. It fits into the general methodology described in Section
`
`1.2 and enables the implementation of recursive data structures into hardware. This
`
`thesis also presents how C code with
`
` and
`malloc
`
` may be automatically
`free
`
`mapped to hardware after system-level optimizations (e.g. refining data structures)
`
`and architectural mapping (defining internal and shared storage).
`
`3) This thesis also describes a set of techniques to optimize the resulting hardware.
`
`- Compiler techniques are used to reduced storage before loads and stores. The
`
`motivation for these optimizations is to reduce the number of registers neces-
`
`sary in a design.
`
`- Encoding algorithm is developed to reduce both the bit-width of pointers and
`
`the size of the circuits translating pointers’ values in assignments and compari-
`
`sons. Assignments of pointers are especially common at function calls.
`
`XILINX, EX. 1024
`Page 20 of 146
`
`

`
`
`
`
`
`
`
`CHAPTER 1. Introduction
`Page 9
`- Novel architecture of hardware memory allocator/deallocator is also intro-
`
`duced. The idea is to encode information about the memory block allocated
`
`inside of the pointers’ value to speed up deallocation.
`
`The outline of the rest of this thesis is the following. In Chapter 2, related work on the
`
`different ways of mapping C code onto different target architectures is presented. Chapter
`
`3 is a background chapter in which compiler techniques, namely pointer analysis, and
`
`their underlaying memory representations are reviewed in the light of hardware synthesis.
`
`The different steps involved in the synthesis of hardware from C code are then presented
`
`in Chapter 4. The outcome of this research, presented in Chapter 5, is the tool SpC that
`
`automates the translation of C models with pointers, complex data structures, etc. into a
`
`code synthesizable by commercially available HLS tools. Results are presented for differ-
`
`ent application domains. Different optimizations can be applied for efficiently mapping C
`
`code onto hardware. In Chapter 6, a set of compiler optimizations to reduce storage before
`
`loads and stores are presented. Chapter 7 describes an encoding algorithm that reduces
`
`both the bit-width of the pointer variables and the complexity of the circuits implementing
`
`comparisons and assignments of pointers. It is important to note at this point that the opti-
`
`mizations presented in Chapters 6 and 7 are also motivated by the synthesis of functions in
`
`C as shown in Appendix A. Finally, Chapter 8 presents how memory allocation and deal-
`
`location can be optimized by selecting different allocator architectures and by applying
`
`compiler optimization techniques. Results are also presented at the end of Chapters 6, 7
`
`and 8 to highlight the benefits of each optimization.
`
`XILINX, EX. 1024
`Page 21 of 146
`
`

`
`
`
`
`
`
`
`
`
`CHAPTER 2. Related Work
`
`Definition 2.0.
`
`Example 2.0.
`
`Proposition 2.0.
`
`Page 10
`
`(2.0)
`
`Figure 2.0
`
`Table 2.0:
`
`CHAPTER 2. RELATED WORK
`
`C/C++ are two of the most common programming languages. C and C++ are both pro-
`
`cedural imperative languages. Their semantics relies on an implicit Von Neuman architec-
`
`ture. In order to be executed, C/C++ programs must be compiled for a given target
`
`architecture. C was originally designed to target a generic software architecture consisting
`
`of a continuous memory space in which all variables are stored and a microprocessor exe-
`
`cuting a sequence of instructions. C/C+ code is therefore sequential by definition.
`
`In the last decade, the implementation of compilers evolved significantly to take
`
`advantage of new target architectures such as multi-processor systems or very large
`
`instruction word (VLIW) machines. At the same time, the features of the language were
`
`exploited, for example using object-oriented features in C++, to support new models of
`
`computation to model mixed hardware-software systems and higher lever of abstractions.
`
`These new constructs may be used to express coarse-grain parallelism, as communicating
`
`sequential processes. However finer-grain parallelism at the operation level cannot easily
`
`be expressed. High-level synthesis as well as compiler analysis may be used to find such
`
`operation-level parallelism.
`
`XILINX, EX. 1024
`Page 22 of 146
`
`

`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`CHAPTER 2. Related Work
`Page 11
`High-level synthesis (HLS) is defined as the set of transformations necessary to auto-
`
`mate the mapping of sequential behavioral descriptions into cycle-accurate descriptions.
`
`Current HLS tools input behavioral HDL descriptions. However, HLS tools supporting
`
`also C/C++ or C/C++-based languages are starting to appear.
`
`This chapter reviews first related work in the domain of high-level synthesis and mod-
`
`eling using C/C++. A selection of recent works in advanced compilers is then presented,
`
`followed by a presentation of current application-specific memory management methodol-
`
`ogy targeted at mixed software/hardware applications.
`
`2.1 Modeling and synthesizing hardware from C/C++
`
`2.1.1 High-level synthesis
`
`High-level synthesis (aka architectural or behavioral synthesis) consists of generating
`
`a structural view of a sequential architectural-level model [14]. The sequential architec-
`
`tural-level model typically consists of a set of parallel processes or tasks. HLS generates a
`
`circuit (i.e., mapping to a structural view) consisting of a data path composed of hardware
`
`resources and a control unit. The architectural-level model can be abstracted as a set of
`
`operations and dependencies. High-level synthesis then performs the following tasks:
`
`-
`
`-
`
`-
`
`
`
`identify the hardware resources (aka operators) implementing each operation
`
`
`
`
`
`schedule the execution time of the operations
`
`bind these scheduled operations to hardware resources (i.e.
`
`resource alloca-
`
`
`
`tion)
`
`In doing so the synthesis tool defines a structural model consisting of a data path, as an
`
`interconnection of hardware resources and a cycle-accurate model of a control unit that
`
`XILINX, EX. 1024
`Page 23 of 146
`
`

`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`CHAPTER 2. Related Work
`Page 12
`issues the control signals to the data path according to the schedule. Resources in the data
`
`path represent not only the operators implementing operations in the original model, but
`
`also storage (e.g. registers) and steering logic which connects operators, storage elements
`
`and Input/Output (I/O) ports.
`
`There have been numerous research projects on high-level synthesis in the past decade
`
`[8,10,11,22,23,29,35,36]. Commercial tools are now available. They include Synopsys
`
`Behavioral Compiler [91], Mentor Graphics Monet [85], Get2chip Volare [83] and Arexys
`
`ArchiMate [72]. These tools take HDL (VHDL or Verilog HDL) as an input. High-level
`
`synthesis that take C/C++ as an input can be derived from these tools. Synopsys CoCentric
`
`SystemC compiler [92] is such an example.
`
`2.1.2 Hardware description languages and C/C++
`
`C/C++ are software programming languages and have little support for describing
`
`hardware efficiently. To model hardware in C/C++, we need the following language fea-
`
`tures present in HDL but not present in C/C++.
`
`-
`
`
`
`Concurrency: hardware is inherently parallel, while C/C++ programs are
`
`inherently sequential. The notion of processes (aka
`
` blocks in Verilog
`always
`
`HDL), which encapsulates programs that execute concurrently, is introduced.
`
`A system is described as a network of processes.
`
`-
`
`-
`
`
`
`Signals: hardware processes need to use signals (akin to wires or buffered
`
`channels) to communicate with one another.
`
`
`
`Reactivity: hardware systems are in continuous interaction with their environ-
`
`ment, i.e. they are reactive. The notion of reactivity is essential to describing
`
`XILINX, EX. 1024
`Page 24 of 146
`
`

`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`CHAPTER 2. Related Work
`hardware systems at all levels of abstraction.
`
`Page 13
`
`-
`
`
`
`Data abstraction: C/C++ supports data abstractions that are useful for soft-
`
`ware programming. However, for hardware, one needs arbitrary precision
`
`signed and unsigned integers, bit vectors and fixed point types.
`
`With such a set of features added, C/C++ can efficiently model hardware/software sys-
`
`tems [58]. C/C++ contains many features which are not present in synthesizable subsets of
`
`HDLs.
`
`Different subsets of C/C++ and C-like HDLs have been defined and used for synthesis.
`
`I mention first those developed in the eighties. H
`ARDWARE
`
`C [36] is a language with a C-
`
`like syntax and a cycle-based semantics. It models hardware components at the architec-
`
`tural level using C constructs. H
`ARDWARE
`
`C can be fully synthesized. However, it makes
`
`both restrictions and additions to ANSI C. Constructs are added to define bit-widths, mod-
`
`ules and ports. The language includes a quite extended subset of the C constructs includ-
`
`ing unbounded loops and some form of f

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket