`
THIRTY-NINTH IEEE COMPUTER SOCIETY INTERNATIONAL CONFERENCE
`
`digest of papers
`
IEEE Computer Society Press

© The Institute of Electrical and Electronics Engineers, Inc.
`
`
`The PowerPC™ 603 Microprocessor:
`A Low-Power Design for Portable Applications
`
Sonya Gary†, Carl Dietz*, Jim Eno†,
Gianfranco Gerosa†, Sung Park*, Hector Sanchez†

†Motorola, Incorporated, *International Business Machines Corporation
Somerset Design Center, 9737 Great Hills Trail, Austin TX 78759
`
`Abstract
The PowerPC 603™ microprocessor is a low-power implementation of the
PowerPC Architecture™. The superscalar organization includes dynamic
localized shutdown of execution units to reduce normal-mode power
consumption. Three levels of static low-power operation are software
programmable for system power management. The 603* PLL (Phase Lock
Loop) is capable of generating an internal processor clock at 1X, 2X,
3X or 4X the system clock speed to allow control of system power while
maintaining processor performance. Various design features optimize
the 603 for both power and performance, creating an ideal
microprocessor solution for portable applications.
`
`1: Introduction
`
The market for a portable computer is dependent on its performance
versus its portability and physical features. The role of the
microprocessor in a portable system is important for both of these
conflicting demands. The microprocessor must provide the performance
and flexibility to support a range of applications while promoting
the extension of battery life by minimizing power dissipation during
both normal operation and standby.

The PowerPC 603 microprocessor was designed with the performance and
low-power requirements of portable systems in mind. The 603 combines
an efficient 32-bit implementation of the PowerPC RISC architecture
and effective static and dynamic power management features. The 3.3V
CMOS design can achieve a peak instruction rate of 3 per cycle and
keep power dissipation under 3 watts at 80 MHz.
`
*In this document, the terms "PowerPC 603 Microprocessor" and "603"
are used to denote the second microprocessor from the PowerPC
Architecture family. PowerPC, PowerPC Architecture and PowerPC 603
are trademarks of International Business Machines Corporation. SPEC,
SPECint92 and SPECfp92 are trademarks of Standard Performance
Evaluation Corporation.
`
`1063-6390/94 $3.00 © 1994 IEEE
`
`2: PowerPC 603 microarchitecture overview
`
The architectural organization of the 603 focuses on high
performance, low cost and power management. Figure 1 shows a
high-level block diagram of the 603.
`
Figure 1: 603 Block Diagram
[Diagram not reproduced. Recoverable labels: Instruction Fetch and
Branch Units; D-MMU; 8KB Data Cache; Bus Interface Unit with 64-bit
data path and Address, Data and Control connections.]
`
The instruction fetch unit fetches two instructions per cycle from
the instruction cache into a six-entry instruction queue. Branches
are folded out of this queue by the branch unit, which can then
initiate fetching down either a sequential or target stream.
Programmable static branch prediction is used for unresolved
branches.

The dispatch unit may issue up to two instructions per cycle to four
independent execution units. These units are the fixed-point unit,
the floating-point unit, the load/store unit and the system register
unit. With branch folding, a maximum issue rate of three instructions
per cycle can be achieved. Execution unit access to general purpose
or floating-point registers is handled with rename registers managed
by the dispatch unit.

The completion/exception unit tracks all dispatched instructions in a
five-entry queue. A maximum of two instructions per cycle are retired
in order from this queue. As instructions are retired, architectural
register values are committed from the appropriate rename registers.
With branch folding, up to three instructions can be completed per
cycle.

The 603 contains separate 8KB, 2-way set associative instruction and
data caches, each with a 32-byte line size. Both caches are blocking
and allow one access per cycle. The data cache supports writeback
operation and snoops, while the instruction cache does not support
either. Memory management features include dual 64-entry instruction
and data translation lookaside buffers, dual 4-entry instruction and
data block address translation registers, and sixteen segment
registers.
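
The cache parameters above imply a particular split of an address into tag, set index and line offset. The following minimal sketch works that arithmetic out, assuming a 32-bit address and power-of-two sizes; the function names `cache_geometry` and `split_address` are illustrative only and do not come from the 603 design.

```python
def cache_geometry(size_bytes=8 * 1024, ways=2, line_bytes=32, addr_bits=32):
    """Derive the set count and address field widths of a set-associative cache.

    Defaults follow the figures quoted in the text (8KB, 2-way, 32-byte
    lines); the 32-bit address width is an assumption of this sketch.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2, valid for powers of two
    index_bits = sets.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    return {
        "sets": sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": tag_bits,
    }


def split_address(addr, geom):
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr & ((1 << geom["offset_bits"]) - 1)
    index = (addr >> geom["offset_bits"]) & ((1 << geom["index_bits"]) - 1)
    tag = addr >> (geom["offset_bits"] + geom["index_bits"])
    return tag, index, offset


if __name__ == "__main__":
    geom = cache_geometry()
    print(geom)
    print(split_address(0x12345678, geom))
```

Under these assumptions each cache has 128 sets, a 5-bit line offset, a 7-bit set index and a 20-bit tag.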
`
`3: Dynamic power management
`
Dynamic power management is a software-enabled mode on the 603 which
turns on power-saving logic for the execution units, caches and
memory management units (MMUs) during normal operation. Once the mode
is enabled, no other software intervention is necessary. Dynamic
power management logic automatically manipulates clock regenerators
to reduce average power consumption. Power is reduced by eliminating
clock switching and inhibiting change of registered values. In a
static design, if registered values do not change, the logic those
values feed does not switch. Clocks to a particular logic block are
disabled or enabled on a cycle-by-cycle basis once it is determined
whether that block is needed for instruction execution.
`
3.1: Dynamic power management implementation using clock regenerators
`
The 603 clock regenerators produce two clocks, C1 and C2, which feed
master and slave latches, L1 and L2, respectively. Each clock
regenerator features two 'freeze' inputs which are used for dynamic
power management control. The assertion of these inputs forces C1 and
C2 low. The logic dedicated to support these freeze inputs in the
clock regenerators accounts for under 0.3% of the total chip area.

Timing constraints on the assertion of freeze inputs are fairly
strict. C1 freeze must be defined by outputs of an L2 latch to ensure
freezing the correct C1 pulse and not that of the previous cycle. It
must also be early enough to fully block the next C1 pulse. If the
assertion of C1 (C2) freeze is late, a small C1 (C2) pulse may be
generated. Functionally this is not a problem since the 603 is a
fully static design, and latched data is a don't care during the time
C1 and C2 are frozen. However, late freeze assertions do not allow
maximum power savings. Power is consumed by the extra clock switching
and potentially by spurious activity due to data changes. Therefore,
it must be determined early in a cycle if a logic block must be
clocked at the end of that cycle. This determination produces the C1
freeze. This freeze is fed through a transparent L1 latch to generate
the C2 freeze. This is done to ensure a freeze of the correct C2
pulse and not the one of the previous cycle.

Timing constraints on the negation of freeze signals are based on
minimum clock pulse width requirements for L1 and L2 and maintaining
sufficient usable cycle time. Negation of both C1 and C2 freezes must
be early enough to allow at least the minimum pulse width to latch
new values so that the 603 functions correctly. The negation of C2
freeze should also be early enough to allow a full C2 pulse so that
usable cycle time is not sacrificed by delaying L2 outputs.

C1 freeze and C2 freeze serve to disable or enable a clock
regenerator on a cycle-by-cycle basis so that there is no performance
sacrifice to the affected logic. Figure 2 shows the required timing
for turning the C1 and C2 clocks off and on using the C1 and C2
freeze signals.
Figure 2: Freeze timing for C1 and C2
[Waveforms not reproduced. Two panels, "Turning the clocks off" and
"Turning the clocks on", each showing C2, C1, C1 FREEZE and C2
FREEZE.]
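
The relationship between the two freeze signals can be illustrated with a small cycle-level model. This is only a sketch of one reading of the text: it assumes each cycle begins with a C2 pulse and ends with a C1 pulse, that the C1 freeze for a cycle is simply the inverse of a per-cycle "block needed" decision, and that the C2 freeze is the same decision delayed through a latch so that it gates the C2 pulse that follows. The function name and the initial running state are assumptions, not details of the 603 clock regenerator.

```python
def gated_clock_pulses(needed):
    """Return (c1_pulses, c2_pulses), one boolean per cycle.

    needed[n] is True when the block must be clocked at the end of cycle n.
    Modeling assumption: C2 pulses at the start of a cycle and C1 at the end.
    """
    c1_pulses = []
    c2_pulses = []
    c2_freeze = False  # assumed initial state: clocks running
    for need in needed:
        c1_freeze = not need
        # C2 at the start of this cycle is gated by the freeze decision
        # captured during the previous cycle.
        c2_pulses.append(not c2_freeze)
        # C1 at the end of this cycle is gated by this cycle's decision.
        c1_pulses.append(not c1_freeze)
        # The transparent L1 latch carries the decision to the next C2 pulse.
        c2_freeze = c1_freeze
    return c1_pulses, c2_pulses


if __name__ == "__main__":
    c1, c2 = gated_clock_pulses([True, False, False, True])
    print("C1:", c1)
    print("C2:", c2)
```

In this model every suppressed C1 pulse is followed by a suppressed C2 pulse, so master and slave latches are frozen as a pair.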
`
`3.2: Execution unit dynamic power management
`
The separation between execution units in the superscalar
organization of the 603 allows for independent dynamic power
management in each of the four units. The dynamic power management
control in each unit is based on logic already present for
instruction dispatch and execution. Only 0.08% of the total chip area
of the 603 is attributable to dynamic power management in the
execution units.

The 603 execution units employ dynamic power management in a simple
fashion. This involves distributed logic for clock freezing based on
pipeline stages or the particular instruction dispatched to a unit.
The clocks feeding the front-end instruction buffer for each unit are
enabled if a valid instruction is being executed in that unit or if a
valid instruction assigned to that unit is present in the instruction
dispatch buffer. Whether the instruction is actually dispatched is
determined too late in the cycle to be part of the C1 freeze
equation. Other stages of the execution unit pipelines are frozen
separately based on activity each cycle.

The load/store unit is a good example of the efficiency of this
method. The last stage of the store pipe is a completed store queue
which holds a store until the cache is available or there is a load
dependency. Thus a store may remain in the completed store queue for
some time. The 603 may freeze the other stages of the store pipeline
despite stores remaining in the completed store queue.

603 execution units also freeze clocks based on the particular
instruction dispatched to a unit. The system register unit of the
603, which operates on architectural control registers, is a prime
candidate for this method. This unit has only one single-cycle
execution stage but manages many registers. Clocks to each register
are enabled only when that register is being modified, while clocks
to the other registers in this unit remain frozen.
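
The two enable conditions described above can be written as simple predicates. The sketch below is illustrative only: the unit labels ("FXU", "FPU", "LSU", "SRU") and the register names are placeholders chosen for the example, and the functions model the stated conditions rather than the actual 603 logic.

```python
def front_end_clock_enable(unit, executing_units, dispatch_buffer_units):
    """Clock enable for a unit's front-end instruction buffer.

    Enabled if a valid instruction is executing in the unit, or if a valid
    instruction assigned to the unit sits in the dispatch buffer (whether
    or not it is actually dispatched this cycle).
    """
    return unit in executing_units or unit in dispatch_buffer_units


def register_clock_enables(register_names, written_register):
    """Per-register clock enables for a unit that manages many registers.

    Only the register being modified is clocked; all others stay frozen.
    """
    return {name: name == written_register for name in register_names}


if __name__ == "__main__":
    # Placeholder unit labels; integer-only code leaves the FPU frozen.
    print(front_end_clock_enable("FPU", {"FXU"}, {"LSU"}))
    print(front_end_clock_enable("LSU", {"FXU"}, {"LSU"}))
    # Placeholder register names.
    print(register_clock_enables(["CTRL_A", "CTRL_B", "CTRL_C"], "CTRL_B"))
```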
`
`3.2.1: Estimating effectiveness of dynamic power
`management in the execution units
The effectiveness of using dynamic power management in the execution
units varies with the type of code run. For example, the
floating-point unit clocks will be frozen continuously if
integer-only code is run. If code is scheduled such that all of the
units remain busy continuously, few clocks will be frozen.

In evaluating the power saving potential of dynamic power management
in the execution units, the percentage idle time of each execution
unit was estimated while running different types of code. This was
done using an architectural modeling tool, the Basic RISC
Architecture Timer (BRAT) [1], which collected statistics while
running SPEC92 benchmark traces. Cycles when the dispatch buffer
contained an instruction for a unit were not included as idle time
for that unit. Though idle time figures indicate the time an entire
execution unit may be frozen, they do not account for the time during
instruction execution when some portions of that unit may remain
frozen.

The BRAT results, shown in Figure 3, illustrate that for some
applications, each functional unit may be idle during a large
percentage of the run time. Since dynamic power management freezes
execution unit clocks during this idle time, it can be effective in
reducing the average power consumption of the execution units.
`
Figure 3: Execution Unit Idle Time for SPEC92
[Chart not reproduced. Axis: % Time Idle, 0 to 100.]
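
The idle-time accounting described in this section can be sketched as follows. The trace format and unit labels here are hypothetical, invented for the example; this is not the BRAT tool or its data, only the counting rule stated in the text (a cycle counts as idle for a unit when the unit is neither executing nor has an instruction waiting in the dispatch buffer).

```python
def idle_percentage(trace, unit):
    """Percentage of cycles in which `unit` could have its clocks frozen.

    `trace` is a list of per-cycle records, each a dict with two sets:
    "executing" (units with a valid instruction in execution) and
    "dispatch_buffer" (units with an instruction waiting for dispatch).
    """
    if not trace:
        raise ValueError("trace must contain at least one cycle")
    idle = sum(
        1
        for cycle in trace
        if unit not in cycle["executing"] and unit not in cycle["dispatch_buffer"]
    )
    return 100.0 * idle / len(trace)


if __name__ == "__main__":
    # Hypothetical four-cycle trace, for illustration only.
    trace = [
        {"executing": {"FXU"}, "dispatch_buffer": {"LSU"}},
        {"executing": {"LSU"}, "dispatch_buffer": set()},
        {"executing": set(), "dispatch_buffer": {"FPU"}},
        {"executing": {"FXU"}, "dispatch_buffer": set()},
    ]
    for unit in ("FXU", "FPU", "LSU"):
        print(unit, idle_percentage(trace, unit))
```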
`
`3.3: Cache and MMU dynamic power
`management
`
The 603 has blocking caches. This ensures that when a cache miss
occurs, all other accesses are held off until miss data is returned
from memory. During this time the caches and MMUs are idle and can
have their clocks disabled by dynamic power management logic while
waiting for data. Depending on the memory latency and the processor
to bus clock ratio, this idle time could be many cycles. Blocking
caches allowed a straightforward implementation of dynamic power
management in the caches and MMUs with little impact on full chip
area. The entire dynamic power management logic for the caches and
MMUs is approximately 0.20% of the total chip area.
`
`3.3.1: Data cache dynamic power management
Dynamic power management logic freezes all clock regenerators to the
data cache while waiting for miss data. This is true for all single
beat or burst reads from memory. If a miss in the cache requires a
writeback to memory, or castout, the clocks will be frozen after the
castout is complete. If a miss in the cache does not require a
castout, the clocks are frozen immediately after the miss address is
sent to the bus interface unit. The clocks remain frozen until data
is returned from memory. Clocks are automatically enabled for each
beat of data of a burst read. If there are multiple processor clocks
between data beats, the clocks will be frozen between each beat.
Figure 4 shows the data cache power management state machine.
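
As a rough illustration of freezing between beats, the sketch below counts enabled and frozen processor clocks for one burst read miss. It assumes beat latencies are given in bus clocks measured from the point where the miss address is sent, that exactly one processor clock is enabled per data beat, and that no castout or snoop occurs; the 6-1-2-1 pattern and 2:1 clock ratio in the example are simply sample inputs, and the function name is hypothetical.

```python
from itertools import accumulate


def burst_clock_activity(beat_latencies_bus, cpu_clocks_per_bus_clock):
    """Count data cache clocks enabled and frozen during a burst read miss.

    beat_latencies_bus: bus clocks to the first beat, then between beats.
    Assumes one enabled processor clock per beat, no castout, no snoops.
    """
    beat_cycles = [
        t * cpu_clocks_per_bus_clock for t in accumulate(beat_latencies_bus)
    ]
    total = beat_cycles[-1]
    enabled = len(beat_cycles)
    return {
        "beat_cycles": beat_cycles,
        "total_cycles": total,
        "enabled_cycles": enabled,
        "frozen_cycles": total - enabled,
    }


if __name__ == "__main__":
    print(burst_clock_activity([6, 1, 2, 1], 2))
```

This simplified count is not the estimate reported by the authors; it only shows how the frozen fraction grows with memory latency and with the processor to bus clock ratio.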
`
Figure 4: Data Cache Power Management State Machine
[State diagram not reproduced. Recoverable transition label:
^castout_done.]

3.3.2: Support for bus snooping
The 603 supports bus snooping to maintain memory coherency. If the
data cache clocks are frozen due to a miss in the cache, the dynamic
power management logic is required to unfreeze the data cache clocks
to service snoop requests. If a snoop hits to a modified line in the
cache, the data cache clocks will remain enabled for the snoop
castout. If the snoop misses or hits to an unmodified line, the data
cache clocks can be frozen until it receives a data beat from memory
or another snoop request. Figure 5 shows the state machine used to
control the unfreezing of clocks for snoop lookups and snoop
castouts.

The data cache power management state machine and the snoop power up
state machine do not interact. They are totally free-running and
operate independently, but together they define the C1 freeze signal
for the data cache:

C1_FREEZE = ((POWER_DOWN_SBR & ^data_beat) | (POWER_DOWN_BURST &
^data_beat)) & SNOOP_IDLE & ^snoop_request

This freeze equation allows the data cache to be powered down due to
a read miss, then to power up for a snoop request and snoop castout,
and then power back down if read data has not been returned from the
bus. This approach was found to be much simpler than using one state
machine to handle the coordination of cache misses with snoop
accesses.

Figure 5: Snoop Power Up State Machine

3.3.3: Instruction cache dynamic power management
The dynamic power management logic for the instruction cache will
freeze the clocks on all burst or single beat read misses. The state
machine which controls this is identical to the data cache state
machine except that the castout section of the state machine is
omitted because the instruction cache does not include a modified
state. Also, the instruction cache is not snooped so there is no need
to power up the instruction cache while waiting for data from the
bus.

3.3.4: MMU dynamic power management
Both instruction and data memory management unit clocks are frozen
for any burst or single beat read. The MMUs only need to be
operational for initial lookups and not during snooping or for data
beats from memory, so they are frozen for the entire miss. However,
system register unit accesses to the MMUs may occur while the clocks
are frozen. If this situation is detected, the dynamic power
management logic will unfreeze the clocks until the next power down
condition is met. Figure 6 shows the state machine which controls the
dynamic power management of the MMUs.

Figure 6: MMU Power Management State Machine

3.3.5: Estimating effectiveness of dynamic power
management in the caches and MMUs
To determine the potential power savings of the dynamic power
management logic in the caches and MMUs, the percentage of time that
the cache would be idle waiting for data from memory was estimated.
It was found that the cache would be idle 60% of the time waiting for
data, assuming the following: a 6-1-2-1 memory system, a 2:1
processor to bus clock ratio, an 80% cache hit rate, a cache access
every cycle, and no bus snooping. Therefore, dynamic power management
logic could freeze the clocks to the caches approximately 60% of the
time. The MMUs can be frozen for a greater percentage of the time
because they are only used for initial lookups and their clocks do
not need to be enabled for each data beat from memory.
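
The data cache C1 freeze equation given in Section 3.3.2 can be expressed directly as a boolean function. The sketch below reads the negation marks in the equation as logical NOT, which is an assumption about the printed notation; the signal names follow the equation, while the Python function itself is only illustrative.

```python
def c1_freeze(power_down_sbr, power_down_burst, data_beat,
              snoop_idle, snoop_request):
    """Evaluate the data cache C1 freeze condition for one cycle.

    The clocks are frozen while a single-beat or burst read miss is
    waiting for data (no data beat this cycle), provided the snoop state
    machine is idle and no snoop request is arriving.
    """
    waiting_for_data = (power_down_sbr and not data_beat) or (
        power_down_burst and not data_beat
    )
    return bool(waiting_for_data and snoop_idle and not snoop_request)


if __name__ == "__main__":
    # Burst miss outstanding, no beat, no snoop activity: clocks frozen.
    print(c1_freeze(False, True, False, True, False))
    # Same miss, but a snoop request arrives: clocks must run.
    print(c1_freeze(False, True, False, True, True))
    # A data beat arrives: clocks run for that beat.
    print(c1_freeze(False, True, True, True, False))
```

Because the two state machines contribute independent terms, a snoop can wake the cache without the miss state machine needing to know about it.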
`
`3.4: Dynamic power management results
`
Table 1 shows total power measured (instantaneous current measured
manually) for various applications with 603 silicon using dynamic
power management. Power figures include both internal and external
power consumed; internal power is measured for internal Vdd supplied
to the 603 only, while external power is measured for external Vdd
supplied to 603 I/Os plus I/O external pull-ups and terminators.
Table 2 shows the percentage decrease in internal power dissipation
for the same applications if dynamic power management is used. All of
the dynamic power management logic accounts for approximately 0.6% of
the total 603 chip area.
`
Table 1: Dynamic Power Management Results

Freq                         Power (W)*
(MHz)    Clinpack   dhrystone   hanoi   heapsort   nsieve
25         0.96       0.85      0.86      0.83      0.90
33         1.16       1.06      1.07      1.02      1.13
50         1.58       1.49      1.48      1.45      1.58
66         1.97       1.86      1.84      1.80      1.98
80         2.21       2.20      2.17      2.12      2.33

*3.3V nominal, room temperature, 80MHz in 2:1 bus mode
`
Table 2: Percentage Decrease in Internal Power
Dissipation Using Dynamic Power Management

Clinpack   dhrystone   hanoi   heapsort   nsieve
  8.5%       14.0%     13.8%     14.2%     16.4%
`
`4: Static power management
`
The 603 provides three static power management modes, Doze, Nap and
Sleep, which are programmable through a hardware implementation
register. Static power management allows power management software or
an operating system to reduce average power consumption when the 603
is idle for any extended period of time. The names Doze, Nap and
Sleep are indicative of the progressive increase in power savings
obtainable. Once any one of the static power management modes is
enabled, the 603 completes execution of all outstanding instructions
and achieves a quiescent state. Once the quiescent state is reached,
the 603 disables all clocks to units that are not required to be
functional during a particular power management mode. Once clocks are
stopped and the processor is in one of the power-saving modes, an
external event asserting one of the wake-up signals, such as the
external interrupt input pin, will bring the 603 out of that mode.
The 603 will resume instruction execution by jumping to the address
of the appropriate interrupt vector. This operation is common to all
three modes. A more detailed description of the individual modes
appears in the following sections.
`
`4.1: Doze mode
`
Doze mode allows the 603 to maintain cache coherency while in a
power-saving mode. The snoop logic and the data cache are kept active
to service snoops as they occur on the bus. If a snoop hit occurs,
the necessary cache update and a snoop copyback bus operation will
occur while the 603 remains in Doze mode. However, if an address
parity error occurs during a snoop, or a bus error during a snoop
copyback causes a machine check condition, then the 603 will exit the
Doze mode and take a machine check interrupt.

Along with the snoop logic, the time base/decrementer logic is kept
active in Doze mode. This provides uninterrupted timer functionality
during Doze. A decrementer interrupt will cause the 603 to exit Doze
mode. Asserting any of the pins INT_, SMI_, MCP_, SRESET_ or HRESET_
will also cause the 603 to exit Doze mode in ten system clocks
(SYSCLKs) or less. Using a 1:1 processor to bus clock ratio, this is
the worst case logic delay from the external pin assertion to all
clocks being enabled.
`
`
`
`4.2: Nap mode
`
In Nap mode, all logic is disabled except the time base/decrementer
logic. Since cache coherency cannot be maintained in Nap mode, the
system program must flush the data cache before entering Nap mode if
system memory is expected to be altered while in Nap mode. Otherwise,
a data cache flush is not necessary since the 603 keeps the cache
contents unaltered while in Nap mode. A pair of handshake signals,
QREQ_ and QACK_, are provided for the system to allow the 603 to go
into Nap only when cache coherency will no longer be a problem. When
the 603 is ready to enter the Nap mode, QREQ_ is asserted. The system
logic responds with an active QACK_ to allow the 603 to proceed with
entering Nap mode.

Compared to Doze mode, further power savings are achieved with the
data cache and bus snooping logic disabled. Furthermore, the
receivers of most input and bidirectional pins are disabled for added
power savings. Outputs maintain their normal idle state. A
decrementer interrupt or assertion of any one of the pins mentioned
previously with regard to Doze mode will wake the 603 from Nap mode.
As in the Doze mode, the wake-up latency is ten SYSCLKs or less.
`
`4.3: Sleep mode
`
Sleep mode allows a maximum power savings by disabling the clocks to
all units. The same QREQ_/QACK_ handshake protocol exists for Sleep
mode as for Nap. Unlike the Doze and Nap modes, more power savings
can be achieved in Sleep mode by disabling the PLL and SYSCLK. In
Doze and Nap modes, the SYSCLK and PLL configurations must remain in
the same state as they were prior to the commencement of the
power-saving mode. Sleep mode provides the ability to dynamically
manage system power by allowing system logic to disable the PLL or
SYSCLK, change the SYSCLK frequency, or change the processor to bus
clock ratio through the PLL configuration pins.

Since Sleep mode does not automatically disable the PLL, the system
logic can implement several different levels of power savings
depending on the wake-up response time requirements. For example, if
a quick wake-up from sleep is required, then by leaving the SYSCLK
input and the PLL configuration unaltered, the 603 will wake from
Sleep within ten SYSCLK cycles or less. However, if maximum power
savings is a requirement, then the PLL and SYSCLK may be disabled
completely, reducing power dissipation to leakage levels. However,
when coming out of Sleep mode, a maximum of 200 usec is required for
the PLL to relock to the new SYSCLK frequency. After this relock
time, any of the external pins mentioned in the Doze section may be
applied to wake up from Sleep.
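
The three static modes and their wake-up bounds can be summarized in a small data structure. This is a sketch under stated assumptions: the class, field and function names are illustrative, the latency model simply adds the 200 usec PLL relock bound to the ten-SYSCLK bound quoted in the text, and system-level delays (such as restarting SYSCLK itself) are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticMode:
    name: str
    snooping_active: bool       # snoop logic and data cache kept running
    timebase_active: bool       # time base/decrementer kept running
    pll_may_be_disabled: bool   # system may stop the PLL/SYSCLK in this mode


MODES = {
    "Doze": StaticMode("Doze", True, True, False),
    "Nap": StaticMode("Nap", False, True, False),
    "Sleep": StaticMode("Sleep", False, False, True),
}

WAKE_SYSCLKS = 10        # "ten SYSCLKs or less" from the text
PLL_RELOCK_US = 200.0    # maximum PLL relock time from the text


def worst_case_wake_us(mode, sysclk_mhz, pll_disabled=False):
    """Upper bound on wake-up time in microseconds under this sketch."""
    if pll_disabled and not mode.pll_may_be_disabled:
        raise ValueError(f"PLL must stay enabled in {mode.name} mode")
    wake = WAKE_SYSCLKS / sysclk_mhz  # cycles / MHz gives microseconds
    if pll_disabled:
        wake += PLL_RELOCK_US
    return wake


if __name__ == "__main__":
    print(worst_case_wake_us(MODES["Nap"], 40.0))
    print(worst_case_wake_us(MODES["Sleep"], 40.0, pll_disabled=True))
```

With a 40 MHz SYSCLK the quick wake-up bound is a fraction of a microsecond, so the PLL relock time dominates whenever the PLL has been shut off.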
`
4.4: Static power management results

Table 3 shows total power measured with 603 silicon using the static
power management modes. Sleep figures are listed for configurations
with the PLL and SYSCLK enabled, with the PLL disabled and SYSCLK
enabled, and with both the PLL and SYSCLK disabled.
`
Table 3: Static Power Management Results

                             Power (mW)*
Freq                                  Sleep
(MHz)    Doze     Nap      PLL on    PLL off    PLL off
                           CLK on    CLK on     CLK off
25      133.2     49.4      38.8      12.8        4.7
33      168.0     62.0      47.8      13.5        4.7
50      241.7     88.8      66.2      15.1        4.7
66      307.1    113.0      88.5      17.7        4.7
80      366.1    135.1     105.5      19.3        4.7

*3.3V nominal, room temperature, 80MHz in 2:1 bus mode
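
To see what these figures mean for a system that is mostly idle, the sketch below computes a duty-cycle weighted average power. The idle values are the 80 MHz entries of Table 3; the active figure of 2.20 W is one of the 80 MHz entries of Table 1; the 10% duty cycle is a hypothetical workload, and the energy spent entering and leaving a mode is ignored.

```python
def average_power_mw(active_mw, idle_mw, duty):
    """Average power for a processor active a fraction `duty` of the time."""
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty must be between 0 and 1")
    return duty * active_mw + (1.0 - duty) * idle_mw


# Illustrative inputs: one 80 MHz active figure (Table 1) and the
# 80 MHz idle figures (Table 3), all in milliwatts.
ACTIVE_MW = 2200.0
IDLE_MW_80MHZ = {
    "Doze": 366.1,
    "Nap": 135.1,
    "Sleep, PLL on, CLK on": 105.5,
    "Sleep, PLL off, CLK on": 19.3,
    "Sleep, PLL off, CLK off": 4.7,
}

if __name__ == "__main__":
    duty = 0.10  # hypothetical: processor busy 10% of the time
    for mode, idle_mw in IDLE_MW_80MHZ.items():
        print(f"{mode}: {average_power_mw(ACTIVE_MW, idle_mw, duty):.1f} mW")
```

At low duty cycles the idle-mode figure dominates the average, which is why the choice between Doze, Nap and Sleep matters more for battery life than small differences in active power.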
`
5: Other low-power design features

Several other areas of the 603 were targeted for low-power design.
These include the caches and memory management units, the phase lock
loop, clock distribution and clock regenerators, standard cell and
datapath libraries and the bus interface unit.
`
`5.1: Cache and memory management design
`
The data portion of each cache is divided into eight 1KB subarrays,
which were specifically designed for low-power operation. Each
subarray holds one word of the cache line for both sets of data. This
organization was chosen to minimize power: since each cache employs
double-word datapaths, there is never a need to access more than two
subarrays during any given cycle. If only one word of data needs to
be read or written, only one subarray is accessed.

Each subarray is held in a constant state of precharge if the array
is not being accessed. That is, each bitline and output driver of
each subarray is constantly being precharged when the subarray is
idle. To save power, the precharge signals are not clocked and only
turn off while the subarray is being accessed. The precharge of the
output drivers is disabled only if the access is a read access.

Special attention was given to the timing of the precharge signals
and the word lines in each subarray to guarantee that no bitcell is
ever enabled while the bitline precharge devices are enabled. To
further save power, each subarray employs pulsed word lines to
eliminate unnecessary bitline discharge once the sense amplifiers
have been strobed. The sense amplifiers within each subarray function
as data latches once they have been strobed, eliminating the need for
clocked output data latches. For speed reasons, the sense amplifiers
for both sets of data within each subarray are strobed for load hits.
For castouts, however, only the sense amplifiers for the data set
being cast out are strobed. Each subarray contains a way-select mux,
allowing both sets of data to share the same output drivers.

The tag portion of each cache consists of a single array. To save
power, each tag array is precharged when idle. Due to the speed
requirements of supporting a read-modify-write operation each cycle,
the precharge signals within each tag array are clocked. Like the
data subarrays, each tag array employs pulsed wordlines and strobed
sense amplifiers. Unlike the data subarrays, however, each tag
array's wordline pulsing logic is designed to limit the bitline
discharge to approximately one volt, regardless of frequency.
Limiting bitline discharge to one volt saves power at lower
frequencies, where one of each pair of bitlines would discharge
completely.

The 603's separate instruction and data MMUs each consist of three
arrays: a sixteen-entry Segment Register array, a 4-entry
Block-Address-Translation (BAT) array, and a 64-entry, 2-way set
associative Translation Lookaside Buffer (TLB) array. Like the cache
tag arrays, each of these arrays is precharged when idle, but the
precharge signal is clocked. Each of these arrays is so small and
fast that sense amplifiers are not required and thus pulsing the word
lines of these arrays is unnecessary. This provides a further savings
in DC power. The TLB array contains a way-select mux, allowing both
sets of TLB data to share the same output drivers.
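
The subarray organization described at the start of this section can be checked with a little arithmetic. The sketch below assumes that subarray k holds word k of every 32-byte line, that accesses are naturally aligned, and that the largest access is a double word; the function name and this numbering of subarrays are illustrative assumptions, not details given in the text.

```python
WORD_BYTES = 4
LINE_BYTES = 32
SETS = 128   # 8KB / (2 ways * 32-byte lines)
WAYS = 2


def subarrays_accessed(addr, size_bytes):
    """Return the indices of the data subarrays touched by one access.

    Assumes subarray k stores word k of each cache line (for both ways)
    and that the access is naturally aligned and at most a double word.
    """
    if size_bytes not in (1, 2, 4, 8):
        raise ValueError("size must be 1, 2, 4 or 8 bytes")
    if addr % size_bytes:
        raise ValueError("this sketch assumes naturally aligned accesses")
    offset = addr % LINE_BYTES
    first = offset // WORD_BYTES
    last = (offset + size_bytes - 1) // WORD_BYTES
    return list(range(first, last + 1))


if __name__ == "__main__":
    # One word per line, both ways, all sets: the capacity of one subarray.
    print(SETS * WAYS * WORD_BYTES)
    print(subarrays_accessed(0x1008, 8))
    print(subarrays_accessed(0x101C, 4))
```

Under these assumptions each subarray holds 128 x 2 x 4 = 1024 bytes, matching the 1KB figure, and a double-word access activates two of the eight subarrays while a word access activates one.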
`
5.2: Phase Lock Loop

The 603 integrates a synthesizing analog PLL which maintains proper
edge alignment between the 603's internal clock and SYSCLK. In
addition, the 603 PLL can be configured during power up for a
processor clock frequency that is 1X, 2X, 3X, or 4X the bus clock
frequency. This allows reduced system power for a given level of
processor performance. The PLL is designed to lock to a wide range of
SYSCLK frequencies from 16 MHz up to 66 MHz. If lower frequencies are
desired, the PLL can be bypassed altogether. During Sleep mode, all
analog circuits used in the PLL design are completely shut down in
order to attain milliwatt performance.

5.3: Clock distribution

A passive clock distribution network was chosen over a set of
distributed clock buffers in order to minimize active power. An HTREE
style distribution network (see Figure 7) using metal3 and metal4
produced the best rise/fall time performance while maintaining a low
total routing capacitance. In addition, skew was minimized among all
branches. Grid-like layouts were discounted due to their high routing
capacitance. Global clock skew was kept under +/- 100 pS with this
scheme; rise and fall times were under 1.2 nS while average power due
to the HTREE driver and the HTREE itself was kept under 100 mW at 80
MHz.

Figure 7: HTREE Clock Distribution Network
[Diagram not reproduced. Recoverable labels: TEST_CLK, pll_cfg[3:0],
MASTER LOGIC, LOCAL GATING, C1/C2 clock, B1/B2 BIU clock.]
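
The PLL multiplier options of Section 5.2 determine which bus clocks can deliver a given processor clock. The sketch below enumerates them, using the 16 to 66 MHz SYSCLK lock range from the text and the 80 MHz upper processor frequency listed in Table 4. The function names are illustrative, and no attempt is made to model the encoding of the PLL configuration pins, which the text does not give.

```python
SYSCLK_RANGE_MHZ = (16.0, 66.0)   # PLL lock range quoted in the text
MAX_PROCESSOR_MHZ = 80.0          # upper frequency listed in Table 4
MULTIPLIERS = (1, 2, 3, 4)


def processor_clock_mhz(sysclk_mhz, multiplier):
    """Processor clock produced by the PLL for a given SYSCLK and ratio."""
    if multiplier not in MULTIPLIERS:
        raise ValueError("multiplier must be 1, 2, 3 or 4")
    low, high = SYSCLK_RANGE_MHZ
    if not low <= sysclk_mhz <= high:
        raise ValueError("SYSCLK outside the PLL lock range")
    cpu = sysclk_mhz * multiplier
    if cpu > MAX_PROCESSOR_MHZ:
        raise ValueError("processor clock above the listed maximum")
    return cpu


def bus_options(target_cpu_mhz):
    """List (multiplier, SYSCLK MHz) pairs that yield the target clock."""
    low, high = SYSCLK_RANGE_MHZ
    options = []
    for m in MULTIPLIERS:
        sysclk = target_cpu_mhz / m
        if low <= sysclk <= high:
            options.append((m, sysclk))
    return options


if __name__ == "__main__":
    for m, sysclk in bus_options(80.0):
        print(f"{m}X at SYSCLK {sysclk:.2f} MHz")
```

For an 80 MHz processor clock, 1X is excluded because it would need an 80 MHz SYSCLK, leaving 2X, 3X and 4X; the higher ratios let the system bus run slower for the same processor speed.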
`
`
`
5.4: Clock regenerator design

The output drivers of all 603 clock regenerators are designed with
enough granularity to cover a wide capacitive load range from 0.2 to
2.2 pF. As a result, lower average power is obtained as compared to
other schemes which use 'dummy' loads in order to maintain constant
clock latencies.

5.5: Standard cell and datapath libraries

Static logic design is used throughout the standard cell and datapath
element libraries where the (speed X power) product is optimized.
Special care was taken during logic synthesis in repowering standard
cells for greater drive capability. Likewise, datapath elements were
only repowered in order to meet certain timing constraints.

5.6: Bus Interface Unit

In normal operation, the bus interface unit (BIU) reduces power
consumption by turning off the 64 bit data bus receivers when the 603
is not reading from the bus. In this way power is saved whenever the
data bus is used by other masters for operations such as DMA
transfers or when the 603 is performing write operations. Dynamic
power management logic is not implemented in the BIU due to bus
snooping requirements. Static power management modes do affect the
BIU. In Nap and Sleep modes, clocks to the BIU are disabled along
with any power-consuming circuitry, such as a voltage reference
generator. Also, when waking up from Nap or Sleep, the BIU resumes
normal operation only after the bus clock and processor clock have
been synchronized.

6: CMOS Technology

The 603 is fabricated with a 3.3 volt 0.5 micron CMOS technology
using four layers of metal. Split instruction and data caches account
for over half of the 1.6 million transistors used within an 85 mm2
die. The design is fully static and LSSD compliant. 603 physical
characteristics are summarized in Table 4.

7: 603 performance

The 603 design is optimized for both power and performance. Table 5
provides estimated SPEC92 performance for an 80MHz 603-based system
using a 2:1 processor to bus clock ratio and a second level external
cache [2].

8: Conclusion

The PowerPC 603 microprocessor is a low-power implementation of the
PowerPC Architecture with power management capability and competitive
performance. The area-conscious design also makes the 603 a
cost-effective alternative. The 603 combines all of the necessary
features to make it the ideal microprocessor solution for portable
applications.
`
Table 4: 603 Physical Characteristics

Technology:    0.5 um N-well CMOS, four-layer metal
Die Size:      7.39 mm X 11.47 mm
Transistors:   1.6 million
Memory:        split 8KB caches
Bus:           split 32b address, 64b data, 16 - 66 MHz
Voltage:       3.3 V nominal
Power:         3.0 Watts Max @ 80 MHz
Package:       240 pin wire-bond CQFP, C4 CQFP
Frequency:     16 - 80 MHz
Processor to bus clock ratios: 1:1, 2:1, 3:1, 4:1
I/Os:          165 signal, CMOS/TTL, 5V tolerant
`
Table 5: 603 Performance Estimate

Frequency   SPECint92   SPECfp92
80MHz          75*         85*

*estimated based on simulated results
`
`Acknowledgments
The authors gratefully acknowledge the 603 design team for their
support in the design, implementation and verification of the
power-related features of this part. In particular, Peter Ippolito
provided the clock regenerator design which made the implementation
of the power management features possible. Also, the support and
promotion of the 603 as a power-conscious microprocessor by the 603
project managers, Arturo Arizpe and Jim Kahle, was invaluable.
`
`
`
`References
`
[1] Poursepanj, A., et al., "The PowerPC™ 603 Microprocessor:
Performance Analysis and Design Trade-Offs," Proceedings of COMPCON
1994, February 1994.

[2] Burgess, B., et al., "The PowerPC™ 603 Microprocessor: A High
Performance, Low Power, Superscalar RISC Microprocessor," Proceedings
of COMPCON 1994, February 1994.

[3] Arizpe, A., and Kahle, J., "The Future Direction of PowerPC™,"
The Sixth Annual Microprocessor Forum, October 18, 1993.
`
`