`
`Ricardo E. Gonzalez
`
`Technical Report No. CSL-TR-97-726
`
`June 1997
`
This research has been supported by ARPA contract J-FBI-92-194.
`
`
`
`LOW-POWER PROCESSOR DESIGN
`
`Ricardo E. Gonzalez
`
`Technical Report: CSL-TR-97-726
`
`June 1997
`
`Computer Systems Laboratory
`Departments of Electrical Engineering and Computer Science
`Stanford University
`William Gates Computer Science Building, A-408
`Stanford, Ca 94305-9040
`<pubs@shasta.stanford.edu>
`
`Abstract
`
`Power has become an important aspect in the design of general purpose processors.
`This thesis explores how design tradeoffs affect the power and performance of the
`processor. Scaling the technology is an attractive way to improve the energy efficiency
`of the processor. In a scaled technology a processor would dissipate less power for the
same performance or higher performance for the same power. Some micro-architectural
changes, such as pipelining and caching, can significantly improve efficiency.
Unfortunately many other architectural tradeoffs leave efficiency unchanged. This is
because a large fraction of the energy is dissipated in essential functions and is
unaffected by the internal organization of the processor.
`
`Another attractive technique for reducing power dissipation is scaling the supply and
`threshold voltages. Unfortunately this makes the processor more sensitive to variations
`in process and operating conditions. Design margins must increase to guarantee
operation, which reduces the efficiency of the processor. One way to shrink these
design margins is to use feedback control to regulate the supply and threshold
voltages. Adaptive techniques can also be used to dynamically
`trade excess performance for lower power. This results in lower average power and
`therefore longer battery life. Improvements are limited, however, by the energy
`dissipation of the rest of the system.
`
`Key Words and Phrases: processor design, processor architecture, low-power
`CMOS circuits, supply and threshold scaling.
`
`
`
`
`Copyright © 1997
`
`by
`
`Ricardo E. Gonzalez
`
`
`
`
`Acknowledgments
`
`I would never have completed this document without the support, encouragement and
`guidance of many people. I cannot possibly list everyone that helped make my stay at
`Stanford a fulfilling experience; I will mention just a few.
`
`First and foremost I would like to thank Mark Horowitz, my principal advisor, for his
`support, guidance, and encouragement. Working with him for 7 years was a privilege. His
`ability to understand my ideas before I did and his continuous pursuit of knowledge were
`always a source of inspiration.
`
I would also like to thank Kunle Olukotun and Simon Wong for their support and guidance
`as members of my reading committee. I am also thankful to Bruce Wooley and
`Arogyaswami Paulraj for participating in my oral examination committee. Noe Lozano
`did much to convince me to pursue a Ph.D. and arranged the financial support for me to
`get started.
`
`My family also gave their unwavering support to this enterprise. My parents never
`stopped asking how my research was progressing. And, more important, never accepted
`“It’s going OK” for an answer. This dissertation is dedicated to them.
`
`My friends and colleagues made my life at Stanford very enjoyable. This document now
`before you is due to them. They encouraged me, helped me spend time away from my
`workstation, and were an unending source of interesting conversations.
`
`Finally, I would like to thank all the members of the Stanford Cycling Club especially our
`coach, Art Walker, for making the past few years of my life so painfully memorable.
`
`This research was supported in part by the Advanced Research Projects Agency under
contract J-FBI-92-194, by the School of Engineering at Stanford, and by the Intel
`Foundation.
`
`
`
`
`To my parents.
`
`
`
`Table of Contents
`
Chapter 1 Introduction

Chapter 2 Energy-Delay Product
2.1 Energy Dissipation in CMOS Circuits
2.2 Low-Power Metrics
2.3 Low-Power Design Techniques

Chapter 3 Micro-architectural Tradeoffs
3.1 Lower Bound on Energy and Energy-Delay
3.1.1 Simulation Methodology
3.1.2 Machine Models
3.1.3 Comparison of Energy and Energy-Delay Product
3.2 Processor Energy-Delay Product
3.2.1 Simulation Methodology
3.2.2 Machine Models
3.2.3 Energy Optimizations
3.2.4 Energy and Energy-Delay Results
3.3 Energy Breakdown
3.4 Memory Hierarchy Design
3.4.1 System architecture
3.4.2 Simulation Methodology
3.4.3 Energy and Performance Tradeoffs
3.5 Future Directions
3.6 Summary

Chapter 4 Supply and Threshold Scaling
4.1 Energy and Delay Model
4.2 Sleep Mode
4.3 Process and Operating Point Variations
4.4 Adaptive Techniques
4.5 Adaptive Power Management
4.6 Summary

Chapter 5 Conclusions
5.1 Future Work

Chapter 6 Bibliography

Appendix A Capacitance Estimation

Appendix B Memory Power Model
`
`
`
`List of Tables
`
Table 2.1: Current processors.
Table 3.1: Summary of SRAM characteristics.
Table 3.2: Summary of DRAM characteristics.
Table 4.1: Circuit element description.
Table 4.2: Process and circuit parameters for a 0.25μm technology.
Table 4.3: Additional process parameters for 0.25μm technology.
Table 4.4: Operating modes.
Table 4.5: Power breakdown categories.
Table B.1: Cache model parameters.
`
`
`
`List of Figures
`
Figure 1.1: Evolution of processor performance.
Figure 1.2: Evolution of processor power.
Figure 1.3: Performance and energy of processors.
Figure 2.1: CMOS inverter.
Figure 2.2: Performance-energy plane.
Figure 2.3: Variation in performance and energy with supply voltage.
Figure 2.4: Variation in performance and energy with transistor sizing.
Figure 2.5: EDP contours versus transistor size and supply voltage.
Figure 2.6: Scalar and super-scalar processors pipeline diagrams.
Figure 3.1: Basic processor operation.
Figure 3.2: Normalized performance and energy of idealized machines.
Figure 3.3: Reduction in energy from simple optimizations.
Figure 3.4: Normalized energy and performance for RISC and TORCH.
Figure 3.5: Energy breakdown for RISC and TORCH processors.
Figure 3.6: Architecture of processor system.
Figure 3.7: Energy breakdown for single level hierarchy.
Figure 3.8: Energy-delay product for single level hierarchy.
Figure 3.9: Energy-delay versus associativity for single level hierarchy.
Figure 3.10: Energy-delay versus line size for single level hierarchy.
Figure 3.11: Energy-delay for two-level on-chip cache hierarchy.
Figure 3.12: Energy breakdown for two-level hierarchy.
Figure 3.13: Comparison of three memory hierarchies.
Figure 4.1: Delay of circuit blocks divided by the delay of standard inverter.
Figure 4.2: EDP contours without velocity saturation.
Figure 4.3: EDP contours with velocity saturation.
Figure 4.4: EDP and performance contours with velocity saturation.
Figure 4.5: Ratio of leakage to total power.
Figure 4.6: Minimum time for threshold adjustment.
Figure 4.7: Variation in energy and delay.
Figure 4.8: EDP contours with uncertainty.
Figure 4.9: Ratio of EDP without and with uncertainty.
Figure 4.10: EDP contours using HSPICE models.
Figure 4.11: Energy and delay variations with operating conditions.
Figure 4.12: Power versus performance for fixed and variable supply.
Figure 4.13: Power breakdown for laptop system.
Figure B.1: Cache power model.
`
`
`
`Chapter 1
`
`Introduction
`
`In the past five years there has been an explosive growth in the demand for portable
`computation and communication devices, from portable telephones to sophisticated
`portable multimedia terminals [1]. This interest in portable devices has fueled the
`development of low-power signal processors and algorithms, as well as the development
`of low-power general purpose processors. In the digital signal processing area, the results
`of this attention to power are quite remarkable. Designers have been able to reduce the
`energy requirements of particular functions, such as video compression, by several orders
`of magnitude [2], [3]. This reduction has come as a result of focusing on the power
`dissipation at all levels of the design process, from algorithm design to the detailed
`implementation. In the general purpose processor area, however, there has been little work
`done to understand how to design energy efficient processors. This thesis is a start at
`bridging this gap and explores power and performance tradeoffs in the design and
`implementation of energy-efficient processors.
`
`Performance of processors has been growing at an exponential rate, doubling every 18 to
`24 months, as is shown in Figure 1.1. The bad news is that the power dissipated by these
`processors has also been growing exponentially, as is shown in Figure 1.2. Although the
`rate of growth of power is perhaps not quite as fast as the performance curve, it still has
led to processors which dissipate more than 50 W [4]. Such high power levels make
`cooling these processors difficult and expensive. If this trend continues processors will
`soon dissipate hundreds of watts, which is unacceptable in most systems. Thus there is
`great interest in understanding how to continue increasing performance without also
`increasing power dissipation.
`
Figure 1.1: Evolution of processor performance. [Plot: performance (SPECavg92), axis from 30 to 400, versus year, 1986 to 1998.]

Figure 1.2: Evolution of processor power. [Plot: power (watts), axis from 3 to 50, versus year, 1986 to 1998.]

For portable applications the problem is even more severe since battery life depends on
the power dissipation. Lithium-ion batteries have an energy density of approximately
100 Wh/kg, the highest available today [5]. Operating a 50 W processor for 4 hours
requires a 2 kg battery, hardly a portable device. To address this problem processor
manufacturers have introduced a variety of low-power chips. The problem with these
processors is that they tend to have poor performance, as is shown in Figure 1.3. This
figure plots performance on the Y-axis, measured as the average of SPECint92 and
SPECfp92 [6], and energy on the X-axis, measured in watts/SPEC.
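
The battery estimate above is simple arithmetic, and a minimal sketch makes it easy to repeat for other operating points. The function name and parameters are illustrative and not from the thesis; only the 50 W, 4 hour, and 100 Wh/kg figures come from the text.

```python
def battery_mass_kg(power_w: float, hours: float, density_wh_per_kg: float) -> float:
    """Battery mass needed to supply power_w for the given number of hours."""
    energy_wh = power_w * hours            # total energy drawn from the battery
    return energy_wh / density_wh_per_kg   # mass at the given energy density


# Numbers quoted in the text: 50 W processor, 4 hours, 100 Wh/kg lithium-ion.
mass = battery_mass_kg(power_w=50.0, hours=4.0, density_wh_per_kg=100.0)
print(f"{mass:.1f} kg")  # 2.0 kg
```

The estimate covers the processor alone; the rest of the system would add to the required battery mass.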

Figure 1.3: Performance and energy of processors. [Plot: performance (SPECavg92), 0 to 450, versus energy (W/SPEC), 0.00 to 0.12, for the 21164, UltraSPARC, P6, R4600, R4200, and Power 603.]
`
`In order to compare processor designs that have different performance and power one
`needs a measure of “goodness”. If two processors have the same performance or the same
`power, then it is trivial to choose which is better—users prefer higher performance for the
`same power level or the lower power one if they have the same performance. But
`processor designs rarely have the same performance. In particular when determining
`whether to add a particular feature designers need to know whether it will make the
`processor more desirable. Chapter 2 introduces the energy-delay product, or EDP for
`short, as a measure of “goodness” for low-power designs. This chapter also describes the
`most common low-power techniques and explores how they affect the energy-delay
`product of CMOS circuits.
`
Chapter 2 will show that exploiting parallelism is one important technique enabling the
reduction of the energy-delay of a circuit. Thus Chapter 3 explores how micro-architectural
choices, which change the amount of parallelism the processor exploits,
affect the efficiency of the processor. Since both the performance and energy dissipation
of modern processors depend heavily on the design of the memory hierarchy, one must
look not only at the processor itself, but also explore how the design of the memory
hierarchy affects the overall efficiency of the system. Using three idealized processor
models Chapter 3 shows micro-architectural changes do not significantly improve the
efficiency of the processor system. The processor’s efficiency is set by a few circuit
elements: memories and clocks.
`
`Since memories and clocking circuits are critical components of every digital system,
`much work already has been done to reduce the energy requirements. A different approach
`to reduce the energy dissipation of clocks and memories is to change the technology by
`scaling the supply voltage and the threshold voltage of transistors. Chapter 4 explores
`tradeoffs in scaling the supply and threshold voltage of CMOS circuits. A simple
`mathematical model of the EDP predicts large gains in efficiency by scaling the supply
`and threshold voltages, especially if transistors are velocity saturated. If there is
`uncertainty in the supply and threshold, however, the gains in EDP are much smaller.
`Furthermore, to achieve these modest gains it may be necessary to give up large (3X)
`factors of performance. These tradeoffs are discussed in more detail in Chapter 4 along
`with a promising method to reduce the effect of variations by using adaptive techniques to
`control the supply and threshold.
`
`Finally, Chapter 5 summarizes the contributions of this dissertation and proposes areas for
`future research.
`
`
`
`
`Chapter 2
`
`Energy-Delay Product
`
`During the past few years many different techniques for reducing the power dissipation of
`CMOS circuits have been proposed, but relatively little work has been done to compare
`the benefits and costs of these different techniques. This chapter provides a review of these
`techniques and compares the effect they have on power and speed of the circuit. The rest
`of this thesis will investigate how the most promising of these techniques affect general
`purpose processors in particular.
`
`This chapter begins by giving a brief description of the sources of energy dissipation in
`CMOS circuits, since it is important to understand this topic before addressing the
`question of how to reduce the energy dissipation. It then describes different metrics that
`can be used to compare designs. An attractive possibility is to represent every design as a
`point in the performance-energy plane. For CMOS, since energy and performance are
`highly correlated, it is often enough to compare the energy-delay product, or EDP for
`short.
`
`2.1 Energy Dissipation in CMOS Circuits
`
There are three sources of energy dissipation in CMOS circuits: dynamic energy, static
`energy, and short-circuit energy. A simple CMOS gate consists of two transistors,
`represented as a resistor and a switch, connected to a fixed output load capacitance and a
`constant voltage source, as shown in Figure 2.1. Dynamic energy is due to the charging
`and discharging of the load capacitance. If the output node is originally at ground and
assuming that it swings full rail, then an amount of energy equal to $CV^2$ is drawn from the
voltage source on a low to high transition. Of this amount, $\frac{1}{2}CV^2$ is dissipated in the
p-transistor to charge the load capacitance and $\frac{1}{2}CV^2$ is stored in the capacitor itself. The
stored energy is dissipated in the n-transistor to discharge the load. Thus $\frac{1}{2}CV^2$ is
dissipated on each transition. The circuit only dissipates dynamic energy when it is active
`or switching. If the output node remains at a fixed voltage level, then no energy is
`dissipated. Most nodes in CMOS circuits transition only infrequently; therefore the energy
`per cycle is usually written as,
`
$$E = \frac{nCV^2}{2} \tag{2.1}$$
`
`where n is the number of transitions during the period of interest. If the circuit is
`synchronous and clocked at a frequency f then the average power can be written as,
`
$$P = \alpha C V^2 f \tag{2.2}$$

where α is the probability of a transition at the output node divided by 2. If a node
transitions every cycle then α=0.5.
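
As a quick numerical check of Equations (2.1) and (2.2), the following sketch evaluates both for a single node. The capacitance, supply voltage, and clock frequency are made-up illustrative values, not figures from the thesis, and the function names are hypothetical.

```python
def dynamic_energy(n_transitions: float, c_farads: float, v_volts: float) -> float:
    """Equation (2.1): energy dissipated over n output transitions."""
    return n_transitions * c_farads * v_volts ** 2 / 2.0


def average_power(alpha: float, c_farads: float, v_volts: float, f_hertz: float) -> float:
    """Equation (2.2): average dynamic power of a clocked node."""
    return alpha * c_farads * v_volts ** 2 * f_hertz


# Assumed example values (not from the thesis): 50 fF load, 3.3 V supply, 100 MHz clock.
C, V, F = 50e-15, 3.3, 100e6

# A node that transitions every cycle has alpha = 0.5, i.e. one transition per cycle.
p_from_alpha = average_power(0.5, C, V, F)
p_from_energy = dynamic_energy(1, C, V) * F
print(f"{p_from_alpha * 1e6:.2f} uW, {p_from_energy * 1e6:.2f} uW")
```

Both expressions give the same power, which is the sense in which α is the transition probability divided by 2.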
`
Figure 2.1: CMOS inverter. [Schematic: inverter driving a load capacitance $C_L$.]
`
`Static energy is due to resistive paths between the supply and ground. The two main
`sources of static energy are analog or analog-like circuits which require constant current
`sources, and leakage current. Although there is some leakage current through the reverse
`biased diode between the source/drain and the bulk, the more important component is
`leakage through the channel when the transistor is nominally off [7]. The leakage current
density (current per μm of gate width) is proportional to $e^{-V_{th}/(\gamma V_t)}$, where $V_{th}$ is the
threshold voltage of the transistor, $V_t$ is the thermal voltage, and γ is a constant slightly
larger than 1. Static energy is important because it can limit the energy dissipation when
the circuit is idle or in standby mode and there is no dynamic energy dissipation.
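
To give a feel for how strongly leakage depends on the threshold voltage, here is a minimal sketch of the exponential relation above. The value γ = 1.3 and the 300 K temperature are assumptions chosen for illustration (the text only says γ is slightly larger than 1), and the function names are hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23            # J/K
ELECTRON_CHARGE = 1.602176634e-19   # C


def thermal_voltage(temp_kelvin: float = 300.0) -> float:
    """Thermal voltage V_t = kT/q."""
    return BOLTZMANN * temp_kelvin / ELECTRON_CHARGE


def leakage_ratio(vth_old: float, vth_new: float,
                  gamma: float = 1.3, temp_kelvin: float = 300.0) -> float:
    """Factor by which leakage current density grows when Vth moves from old to new.

    Follows from I_leak proportional to exp(-Vth / (gamma * V_t)); gamma is assumed.
    """
    return math.exp((vth_old - vth_new) / (gamma * thermal_voltage(temp_kelvin)))


def volts_per_decade(gamma: float = 1.3, temp_kelvin: float = 300.0) -> float:
    """Threshold reduction that multiplies leakage by 10 under the same model."""
    return gamma * thermal_voltage(temp_kelvin) * math.log(10.0)


print(f"{volts_per_decade() * 1e3:.0f} mV of threshold reduction per decade of leakage")
print(f"Leakage grows {leakage_ratio(0.9, 0.8):.1f}x when Vth drops from 0.9 V to 0.8 V")
```

Under these assumptions each threshold reduction of a few tens of millivolts costs roughly a decade of leakage, which is why lowering the threshold makes static energy matter.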
`
`Short-circuit energy is due to both transistors being on simultaneously while the gate
`switches. Troutman [8] and Chatterjee [9] provide good descriptions of short-circuit
`current in CMOS circuits. As these papers show this component is usually small and
`therefore will be ignored for the remainder of this thesis.
`
`For most CMOS circuits in today’s technologies dynamic energy dissipation dominates.
`For example, in a 0.6μm technology with Vth=0.9V leakage current is 4-5 orders of
`magnitude smaller than dynamic current (for one inverter in a 31 element ring oscillator).
`That is, only when the circuit is idle for 99.99% of the time does leakage current become
`an important consideration. As the amount of on-chip transistor width increases or the
`threshold voltage of the transistors decreases, leakage current becomes more significant.
`Chapter 4 explores in more detail how the energy efficiency of CMOS circuits changes as
`the threshold voltage changes.
`
`In order to reduce the energy dissipation it is necessary to reduce one or more of α, C, or
`V. The next section describes and compares how some often proposed low-power design
techniques attempt to reduce these quantities and how they affect the power and
`performance of CMOS circuits.
`
`2.2 Low-Power Metrics
`
`When optimizing a design for low power it is necessary to have a metric that can be used
`to compare different alternatives. The most obvious choice is power, measured in watts.
Power is the rate of energy use, or $P = dE/dt$. A more useful definition, however, is average
power, or the energy spent to perform a particular operation divided by the time taken to
perform the operation, $P_{avg} = E_{op}/T_{op}$. How to define the operation of interest is arbitrary
`and depends on what is being compared. In the case of a processor, it could be the energy
`to run a benchmark to completion, or the energy to execute an instruction—as long as all
`processors compared execute the same instructions.
`
Power is important for two reasons. The first is that it determines what kind of package
can be used for the chip. For example, a small plastic package, the cheapest form of
packaging, can only dissipate a few watts. A processor which dissipates more than that
will have to be sold in a more expensive package. The second reason power is important is
that it limits how long the system battery will last. But power as a metric of
“goodness” of low-power designs has some drawbacks. The most important drawback is
that power is proportional to the operation rate, so one can reduce the power by slowing
down the system. In CMOS circuits this is very easy to do: one simply reduces the clock
frequency.
`
`Regardless of what definition of an operation one uses, the basic problem with power
`remains, that power decreases simply by extending the time required to complete an
`operation. Power, therefore, is only a good metric to compare processors that have similar
`performance levels. If two processors can perform computation at the same rate, then
`clearly whichever dissipates less power is more desirable. If the processors run at different
`rates the slower processor will almost always be lower power.
`
An alternative metric is the energy per operation, measured in joules/op, or its inverse,
measured in SPEC/watt or MIPS/watt. This metric does not depend on the time taken to
perform the operation, since running the processor at half the frequency halves the power
but means accumulating it for twice as long. The problem with this metric is that, from
Equation (2.1), the energy per operation can be made smaller by lowering the supply
`voltage. However, the supply voltage also affects the speed or performance of the basic
`CMOS gates, with lower supplies increasing the delay per operation. Thus low energy
`solutions might (and often do) run very slowly.
`
`Another alternative is to use both metrics, energy and speed. Rather than representing
`designs by a single number they are represented as a point in the performance-energy
`plane, as in Figure 1.3 and Figure 2.2. Given some requirements, such as minimum
`performance or maximum energy, one can determine which is the best solution available.
`The problem with this representation is how to compare designs which have different
`performance or energy levels. Without additional requirements, such as area or cost, there
`is no way to decide which solution is better.
`
From an optimization standpoint one possible metric is the product of energy and delay,
measured in joule-seconds, or its inverse, measured in SPEC$^2$/watt. Optimizing the
energy-delay product will prevent the designer from trading off a large amount of performance for
a small savings in energy, or vice versa. As will be described later the energy-delay is also
`an attractive metric for other reasons. In Figure 2.2 the EDP corresponds to the inverse of
`the slope of a line that connects a design point to the origin. Thus finding a solution with a
`low EDP corresponds to finding a solution which lies on a steeper line.
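
The three metrics discussed in this section can be compared side by side with a small sketch. The `Design` class and the numbers below are hypothetical, chosen only to illustrate the argument: halving the clock frequency halves the power, leaves the energy per operation unchanged, and doubles the energy-delay product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """A design point described by its energy and delay per operation (illustrative)."""
    energy_per_op: float  # joules
    delay_per_op: float   # seconds

    @property
    def power(self) -> float:
        """Average power, P_avg = E_op / T_op."""
        return self.energy_per_op / self.delay_per_op

    @property
    def edp(self) -> float:
        """Energy-delay product in joule-seconds."""
        return self.energy_per_op * self.delay_per_op


# Assumed numbers: 1 nJ per operation at 10 ns per operation.
base = Design(energy_per_op=1e-9, delay_per_op=10e-9)
# Same circuit clocked at half the frequency: same energy, twice the delay.
slow = Design(energy_per_op=base.energy_per_op, delay_per_op=2 * base.delay_per_op)

for name, d in (("base", base), ("slow", slow)):
    print(f"{name}: P={d.power:.3f} W  E={d.energy_per_op:.1e} J  EDP={d.edp:.1e} J*s")
```

Power alone favors the slower design and energy alone cannot tell the two apart, while the EDP penalizes the lost performance.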
`
Figure 2.2: Performance-energy plane. [Plot: normalized performance, 0.0 to 2.1, versus normalized energy, 0.0 to 2.0.]
`
`2.3 Low-Power Design Techniques
`
`From Equation (2.1) one simple way to reduce the energy per operation is to lower the
`power-supply voltage. However, since both capacitance and threshold voltage are
`constant, the speed of the basic gates will also decrease with this voltage scaling. The
`delay of a CMOS gate can be modeled as the time required to discharge the output
capacitance by the transistor current, $T_g = CV/I$. Using the current model presented by [10]
this gives,

$$T_g = K \frac{V}{(V - V_{th})^{\alpha}} \tag{2.3}$$
`
`where α is the velocity saturation coefficient and K is a technology specific constant.
`When transistors are not velocity saturated α=2.0 and the equation reduces to the
`quadratic model for transistor current. As transistors become more velocity saturated α
`decreases towards one. For typical 0.25μm technologies α=1.3-1.5.
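
A minimal sketch of Equation (2.3) shows how the velocity saturation coefficient changes the delay penalty of lowering the supply. The supply and threshold voltages used here are assumed example values, K is set to 1 because only ratios are compared, and the function name is hypothetical.

```python
def gate_delay(v: float, vth: float, alpha: float, k: float = 1.0) -> float:
    """Equation (2.3): T_g = K * V / (V - Vth)**alpha, valid only for V > Vth."""
    if v <= vth:
        raise ValueError("the model only applies for supply voltages above threshold")
    return k * v / (v - vth) ** alpha


# Assumed example: Vth = 0.5 V, supply halved from 2.5 V to 1.25 V.
VTH = 0.5
for alpha in (2.0, 1.3):
    slowdown = gate_delay(1.25, VTH, alpha) / gate_delay(2.5, VTH, alpha)
    print(f"alpha={alpha}: delay grows by {slowdown:.2f}x when the supply is halved")
```

With these numbers the velocity-saturated case (α=1.3) slows down noticeably less than the quadratic case (α=2.0) for the same reduction in supply.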
`
`
`Figure 2.3 plots the normalized speed of operation versus the energy per operation of a
`CMOS gate as the supply voltage is scaled. The speed and energy were found by
`simulating an inverter in a 0.6μm technology using HSPICE. The threshold voltage is held
`constant in this example. At large voltages reducing the supply reduces the energy for a
`modest change in delay. This causes the curve to bend over. At voltages near the device
`threshold, small supply changes cause a large change in delay for a modest change in
energy. But from $V=1.5V_{th}$ to $V=6V_{th}$ changes in energy and delay cancel each other and
`the curve approaches a straight line, which corresponds to a constant energy-delay
`product. Over this region the EDP remains within a factor of 2. In this case scaling the
`supply voltage reduces the power dissipation but at the expense of the speed of the gates.
`Looking at Equation (2.3), we see that one way to gain back the performance lost is to
`scale the threshold voltage. But this increases the leakage current. Chapter 4 explores the
`tradeoff between power and performance from scaling the supply and threshold voltages.
`It will be shown, however, that when leakage power is a very small fraction of the total
`power, as is the case in most technologies today, scaling the supply voltage does not
`significantly affect the EDP.
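
The flat region described above can be checked with a back-of-the-envelope derivation. It is a sketch under assumptions that do not come from the HSPICE data: energy per operation scales as $V^2$ (Equation (2.1)), delay follows Equation (2.3), the threshold is fixed, and leakage is negligible.

```latex
\begin{align*}
\mathrm{EDP}(V) &\propto V^{2}\cdot\frac{V}{(V-V_{th})^{\alpha}}
                 = \frac{V^{3}}{(V-V_{th})^{\alpha}} \\[4pt]
\frac{d\,\mathrm{EDP}}{dV} = 0
  &\;\Rightarrow\; 3\,(V-V_{th}) = \alpha V
  \;\Rightarrow\; V_{opt} = \frac{3\,V_{th}}{3-\alpha} \qquad (\alpha < 3) \\[4pt]
\alpha = 2:\quad V_{opt} &= 3V_{th}, \qquad
  \frac{\mathrm{EDP}(1.5V_{th})}{\mathrm{EDP}(3V_{th})} = \frac{13.5}{6.75} = 2, \qquad
  \frac{\mathrm{EDP}(6V_{th})}{\mathrm{EDP}(3V_{th})} = \frac{8.64}{6.75} = 1.28
\end{align*}
```

Under these assumptions and without velocity saturation, the EDP stays within a factor of 2 between $1.5V_{th}$ and $6V_{th}$, in line with the simulated curve. For smaller α the optimum moves closer to the threshold, and real devices will deviate from this idealized model.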

Figure 2.3: Variation in performance and energy with supply voltage. [Plot: normalized performance, 0.0 to 1.0, versus normalized energy, 0.0 to 15.0, with reference lines EDP=X and EDP=2X.]
`
Another technique to reduce the energy per operation is to reduce the size of all transistors
in the gate. This reduces the capacitance that needs to be switched when one of the inputs
switches. Unfortunately it also decreases the current drive of the gate, making it slower.
This can be partly compensated by making the next gate smaller. At some point, however,
the load of the gate will no longer be dominated by the input capacitance of the following
gates, but rather by the capacitance of the interconnect between gates.
`
`Figure 2.4 graphs the normalized energy per operation versus the speed of operation of a
`CMOS gate as the percentage of loading that is due to gate capacitance varies from 20% to
`80%. The diffusion capacitance also depends on transistor width and therefore the percent
`of loading that depends linearly on the transistor width is larger than shown in the figure.
`The dotted line indicates the point at which 80% of loading is proportional to transistor
width. The load will be mostly wire capacitance for small transistors, and will be mostly
`gate capacitance for large devices. Continuing to increase the transistor sizes gives very
`small gains in performance for a large energy cost. This causes the curve to be almost flat.
`But for the points shown in the figure the difference in EDP is a factor of 2.5.
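
The shape of this tradeoff can be reproduced with a first-order toy model. It is a sketch, not the model used for the figure: it assumes a fixed wire capacitance, a gate capacitance and drive current that both scale linearly with transistor width, and it ignores diffusion capacitance, so it should not be expected to reproduce the numbers in the figure. All names are hypothetical.

```python
def sizing_point(gate_fraction: float, c_wire: float = 1.0, c_gate_per_width: float = 1.0):
    """Return (energy, delay, edp) in arbitrary units for a given gate-load fraction.

    gate_fraction is the share of the total load due to gate capacitance.
    Toy assumptions: energy ~ total load, delay ~ total load / transistor width.
    """
    if not 0.0 < gate_fraction < 1.0:
        raise ValueError("gate_fraction must lie strictly between 0 and 1")
    # Width at which the gate capacitance makes up the requested fraction of the load.
    width = gate_fraction * c_wire / ((1.0 - gate_fraction) * c_gate_per_width)
    load = c_wire + width * c_gate_per_width
    energy = load            # switched capacitance at fixed supply voltage
    delay = load / width     # drive current scales with width
    return energy, delay, energy * delay


for fraction in (0.2, 0.35, 0.5, 0.65, 0.8):
    energy, delay, edp = sizing_point(fraction)
    print(f"gate load {fraction:.0%}: energy={energy:.2f} delay={delay:.2f} EDP={edp:.2f}")
```

In this toy model the EDP is lowest when wire and gate capacitance are balanced and grows toward either extreme, which matches the qualitative behavior described above.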

Figure 2.4: Variation in performance and energy with transistor sizing. [Plot: normalized performance, 0.0 to 1.0, versus normalized energy, 0.0 to 8.0, with reference lines EDP=X and EDP=2.5X.]
`
Clearly, real circuits are more complex. The gate and wire capacitance is different for
every gate, nodes transition at different frequencies, and not all gates are on the critical
path. While this problem is difficult to solve precisely, the basic tradeoff remains the same.
Sizing the transistors allows the designer to trade off speed for power. At either extreme
(very large or very small transistors) the tradeoffs become poor.
`
`Often one wants to optimize two, or perhaps more, variables simultaneously. For example
`one may want to find the optimal supply voltage and transistor size. One way to visualize
`the data is to plot contours of energy, delay, and perhaps energy-delay, on the supply
`voltage vs transistor size plane. Figure 2.5 is an example of such a plot, and gives contours
`of inverse relative EDP versus transistor size and supply voltage. The Y-axis shows the
`percent of the total load that is due to gate capacitance, and the X-axis shows the supply
`voltage. For the range of supply voltage and gate loading shown in this figure there is a
`local minimum in the EDP at V=1.7V and 40% gate loading. The relative EDP is the EDP
`normalized to the minimum value. The figure plots contours of the inverse of this metric.
Thus the value at the minimum is 1 and decreases as one moves away from the minimum. The
`contour labeled 0.5 has twice the minimum energy-delay. The advantage of representing
`the data this way is that it is much easier to understand how the variables of interest
`(energy, delay, energy-delay) change with the optimization variables (supply voltage,
`transistor sizing). If the data were plotted in the energy vs. perform