THIRTY-NINTH IEEE COMPUTER SOCIETY INTERNATIONAL CONFERENCE
Spring 1994, February 28 - March 4, 1994

Digest of Papers

IEEE Computer Society Press
© The Institute of Electrical and Electronics Engineers, Inc.

EXHIBIT 1017
MICROSOFT CORP.

The PowerPC™ 603 Microprocessor:
A Low-Power Design for Portable Applications

Sonya Gary†, Carl Dietz*, Jim Eno†,
Gianfranco Gerosa†, Sung Park*, Hector Sanchez†

†Motorola, Incorporated, *International Business Machines Corporation
Somerset Design Center, 9737 Great Hills Trail, Austin TX 78759

Abstract
The PowerPC 603™ microprocessor is a low-power implementation of the PowerPC Architecture. The superscalar organization includes dynamic localized shutdown of execution units to reduce normal-mode power consumption. Three levels of static low-power operation are software programmable for system power management. The 603* PLL (Phase Lock Loop) is capable of generating an internal processor clock at 1X, 2X, 3X or 4X the system clock speed to allow control of system power while maintaining processor performance. Various design features optimize the 603 for both power and performance, creating an ideal microprocessor solution for portable applications.

1: Introduction

The market for a portable computer depends on its performance versus its portability and physical features. The role of the microprocessor in a portable system is important for both of these conflicting demands. The microprocessor must provide the performance and flexibility to support a range of applications while promoting the extension of battery life by minimizing power dissipation during both normal operation and standby.

The PowerPC 603 microprocessor was designed with the performance and low-power requirements of portable systems in mind. The 603 combines an efficient 32-bit implementation of the PowerPC RISC architecture and effective static and dynamic power management features. The 3.3V CMOS design can achieve a peak instruction rate of three instructions per cycle and keep power dissipation under 3 watts at 80 MHz.

*In this document, the terms "PowerPC 603 Microprocessor" and "603" are used to denote the second microprocessor from the PowerPC Architecture family. PowerPC, PowerPC Architecture and PowerPC 603 are trademarks of International Business Machines Corporation. SPEC, SPECint92 and SPECfp92 are trademarks of Standard Performance Evaluation Corporation.

1063-6390/94 $3.00 © 1994 IEEE

2: PowerPC 603 microarchitecture overview

The architectural organization of the 603 focuses on high performance, low cost and power management. Figure 1 shows a high-level block diagram of the 603.

Figure 1: 603 Block Diagram (labels: Instruction Fetch and Branch Units; D-MMU; 8KB Data Cache; Bus Interface Unit with 64-bit paths and Address, Data and Control signals)

The instruction fetch unit fetches two instructions per cycle from the instruction cache into a six-entry instruction queue. Branches are folded out of this queue by the branch unit, which can then initiate fetching down either a sequential or target stream. Programmable static branch prediction is used for unresolved branches.

The dispatch unit may issue up to two instructions per cycle to four independent execution units. These units are the fixed-point unit, the floating-point unit, the load/store unit and the system register unit. With branch folding, a maximum issue rate of three instructions per cycle can be achieved. Execution unit access to general purpose or floating-point registers is handled with rename registers managed by the dispatch unit.

The completion/exception unit tracks all dispatched instructions in a five-entry queue. A maximum of two instructions per cycle are retired in order from this queue. As instructions are retired, architectural register values are committed from the appropriate rename registers. With branch folding, up to three instructions can be completed per cycle.
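
To make the in-order retirement rule concrete, here is a minimal Python sketch of a five-entry completion queue that retires at most two finished instructions per cycle, oldest first. It is an illustrative model under stated assumptions, not the 603's logic: the `CompletionQueue` class and its methods are hypothetical, the commit from rename registers is reduced to appending a tag to a list, and branch folding (which lets a third instruction complete without occupying the queue) is not modeled.

```python
from collections import deque


class CompletionQueue:
    """Toy in-order completion queue (hypothetical model, not the 603 design)."""

    def __init__(self, entries=5, retire_width=2):
        self.entries = entries            # queue depth described in the text
        self.retire_width = retire_width  # maximum retirements per cycle
        self.queue = deque()              # items are [tag, finished] in program order
        self.committed = []               # tags whose results have been committed

    def dispatch(self, tag):
        """Enter a dispatched instruction; return False (dispatch stalls) if full."""
        if len(self.queue) >= self.entries:
            return False
        self.queue.append([tag, False])
        return True

    def finish(self, tag):
        """Mark an instruction as finished; execution may finish out of order."""
        for entry in self.queue:
            if entry[0] == tag:
                entry[1] = True
                return

    def retire_cycle(self):
        """Retire up to retire_width finished instructions, oldest first."""
        retired = []
        while (self.queue and len(retired) < self.retire_width
               and self.queue[0][1]):
            tag, _ = self.queue.popleft()
            self.committed.append(tag)  # stands in for the rename-register commit
            retired.append(tag)
        return retired
```

In this model an unfinished instruction at the head blocks retirement of younger finished ones, which is what keeps architectural register updates in program order.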

The 603 contains separate 8KB, 2-way set associative instruction and data caches, each with a 32-byte line size. Both caches are blocking and allow one access per cycle. The data cache supports writeback operation and snooping, while the instruction cache supports neither. Memory management features include dual 64-entry instruction and data translation lookaside buffers, dual 4-entry instruction and data block address translation registers, and sixteen segment registers.
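
The cache parameters above fix the address breakdown. With 8 KB split over 2 ways of 32-byte lines there are 8192 / (2 × 32) = 128 sets, so 5 bits select the byte within a line and 7 bits select the set. The sketch below derives these widths; the 32-bit address (matching the 32-bit address bus in Table 4) and the helper names are assumptions for illustration.

```python
def cache_geometry(size_bytes=8 * 1024, ways=2, line_bytes=32, addr_bits=32):
    """Return (sets, offset_bits, index_bits, tag_bits) for a set-associative cache.

    Assumes power-of-two sizes, so bit_length() - 1 equals log2.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    return sets, offset_bits, index_bits, tag_bits


def split_address(addr, offset_bits=5, index_bits=7):
    """Split an address into (tag, set_index, byte_offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


print(cache_geometry())           # (128, 5, 7, 20)
print(split_address(0x12345678))  # (74565, 51, 24)
```

Offset and index together cover 12 bits, the same span as the byte offset within a 4 KB page.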

3: Dynamic power management

Dynamic power management is a software-enabled mode on the 603 which turns on power-saving logic for the execution units, caches and memory management units (MMUs) during normal operation. Once the mode is enabled, no other software intervention is necessary. Dynamic power management logic automatically manipulates clock regenerators to reduce average power consumption. Power is reduced by eliminating clock switching and inhibiting change of registered values. In a static design, if registered values do not change, the logic those values feed does not switch. Clocks to a particular logic block are disabled or enabled on a cycle-by-cycle basis once it is determined whether that block is needed for instruction execution.

3.1: Dynamic power management implementation using clock regenerators

The 603 clock regenerators produce two clocks, C1 and C2, which feed master and slave latches, L1 and L2, respectively. Each clock regenerator features two 'freeze' inputs which are used for dynamic power management control. Asserting these inputs forces C1 and C2 low. The logic dedicated to support these freeze inputs in the clock regenerators accounts for under 0.3% of the total chip area.

Timing constraints on the assertion of freeze inputs are fairly strict. C1 freeze must be defined by outputs of an L2 latch to ensure freezing the correct C1 pulse and not that of the previous cycle. It must also be early enough to fully block the next C1 pulse. If the assertion of C1 (C2) freeze is late, a small C1 (C2) pulse may be generated. Functionally this is not a problem since the 603 is a fully static design, and latched data is a don't care during the time C1 and C2 are frozen. However, late freeze assertions do not allow maximum power savings. Power is consumed by the extra clock switching and potentially by spurious activity due to data changes. Therefore, it must be determined early in a cycle if a logic block must be clocked at the end of that cycle. This determination produces the C1 freeze. This freeze is fed through a transparent L1 latch to generate the C2 freeze. This is done to ensure a freeze of the correct C2 pulse and not the one of the previous cycle.
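
The reason for routing the C2 freeze through an L1 latch can be seen in a deliberately simplified, phase-level model. It assumes that, because the C1 freeze is produced from L2 outputs, the raw freeze net already carries the decision for cycle n + 1 while the C2 pulse of cycle n is being generated. The function `gated_pulses` is hypothetical; it compares gating C2 with the latched freeze against gating it with the raw net.

```python
def gated_pulses(c1_freeze, latched=True):
    """Return one (c1_fires, c2_fires) pair per cycle (simplified model).

    c1_freeze[n] is the C1 freeze decision for cycle n. Because it comes from
    L2 outputs, this model assumes the raw freeze net already shows the
    decision for cycle n + 1 while the C2 pulse of cycle n is produced.
    """
    pulses = []
    for n, freeze_now in enumerate(c1_freeze):
        c1_fires = not freeze_now
        raw_during_c2 = c1_freeze[n + 1] if n + 1 < len(c1_freeze) else False
        if latched:
            # L1 latch, transparent during C1, holds cycle n's decision through C2.
            c2_freeze = freeze_now
        else:
            # Wiring the raw net straight to the C2 freeze input uses the next decision.
            c2_freeze = raw_during_c2
        pulses.append((c1_fires, not c2_freeze))
    return pulses
```

With `gated_pulses([False, True, False])` the latched version freezes both pulses of the middle cycle only; with `latched=False` the C2 pulse of the first cycle is frozen instead, which is the "previous cycle" error the text describes.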

Timing constraints on the negation of freeze signals are based on minimum clock pulse width requirements for L1 and L2 and on maintaining sufficient usable cycle time. Negation of both C1 and C2 freezes must be early enough to allow at least the minimum pulse width to latch new values so that the 603 functions correctly. The negation of C2 freeze should also be early enough to allow a full C2 pulse so that usable cycle time is not sacrificed by delaying L2 outputs.

C1 freeze and C2 freeze serve to disable or enable a clock regenerator on a cycle-by-cycle basis so that there is no performance sacrifice to the affected logic. Figure 2 shows the required timing for turning the C1 and C2 clocks off and on using the C1 and C2 freeze signals.

Figure 2: Freeze timing for C1 and C2 (panels: "Turning the clocks off" and "Turning the clocks on"; traces: C2, C1, C1 FREEZE, C2 FREEZE)

3.2: Execution unit dynamic power management

The separation between execution units in the superscalar organization of the 603 allows for independent dynamic power management in each of the four units. The dynamic power management control in each unit is based on logic already present for instruction dispatch and execution. Only 0.08% of the total chip area of the 603 is attributable to dynamic power management in the execution units.

The 603 execution units employ dynamic power management in a simple fashion. This involves distributed logic for clock freezing based on pipeline stages or the particular instruction dispatched to a unit. The clocks feeding the front-end instruction buffer for each unit are enabled if a valid instruction is being executed in that unit or if a valid instruction assigned to that unit is present in the instruction dispatch buffer. Whether the instruction is actually dispatched is determined too late in the cycle to be part of the C1 freeze equation. Other stages of the execution unit pipelines are frozen separately based on activity each cycle.
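
The front-end enable rule above reduces to a simple OR per unit. In the sketch below the unit labels and the set/list inputs are assumptions made for illustration; only the rule itself (enable if the unit is executing a valid instruction or a valid instruction for it sits in the dispatch buffer) comes from the text.

```python
# Shorthand labels for the four execution units (chosen for this sketch).
UNITS = ("fixed_point", "floating_point", "load_store", "system_register")


def front_end_clock_enables(executing, dispatch_buffer):
    """Per-unit clock enable for each unit's front-end instruction buffer.

    executing:       set of units currently holding a valid instruction.
    dispatch_buffer: target units of the valid instructions waiting to dispatch.

    A waiting instruction enables its unit even if it is not actually
    dispatched this cycle, because that decision arrives too late for the
    C1 freeze equation.
    """
    waiting = set(dispatch_buffer)
    return {unit: unit in executing or unit in waiting for unit in UNITS}
```

With integer-only code, for example, `front_end_clock_enables({"fixed_point"}, ["fixed_point", "load_store"])` leaves the floating-point and system register front ends frozen.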

The load/store unit is a good example of the efficiency of this method. The last stage of the store pipe is a completed store queue which holds a store until the cache is available or there is a load dependency. Thus a store may remain in the completed store queue for some time. The 603 may freeze the other stages of the store pipeline despite stores remaining in the completed store queue.

The 603 execution units also freeze clocks based on the particular instruction dispatched to a unit. The system register unit of the 603, which operates on architectural control registers, is a prime candidate for this method. This unit has only one single-cycle execution stage but manages many registers. Clocks to each register are enabled only when that register is being modified, while clocks to the other registers in this unit remain frozen.

3.2.1: Estimating effectiveness of dynamic power management in the execution units

The effectiveness of using dynamic power management in the execution units varies with the type of code run. For example, the floating-point unit clocks will be frozen continuously if integer-only code is run. If code is scheduled such that all of the units remain busy continuously, few clocks will be frozen.

In evaluating the power saving potential of dynamic power management in the execution units, the percentage idle time of each execution unit was estimated while running different types of code. This was done using an architectural modeling tool, the Basic RISC Architecture Timer (BRAT) [1], which collected statistics while running SPEC92 benchmark traces. Cycles when the dispatch buffer contained an instruction for a unit were not included as idle time for that unit. Though idle time figures indicate the time an entire execution unit may be frozen, they do not account for the time during instruction execution when some portions of that unit may remain frozen.
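
The idle-time metric can be stated as a small counting procedure. The trace format below (one pair of busy and waiting unit sets per cycle) is a hypothetical stand-in, since the paper does not describe BRAT's internal data; the one rule taken from the text is that a cycle with an instruction for the unit in the dispatch buffer is not counted as idle.

```python
def idle_fractions(trace, units):
    """Fraction of cycles in which each unit could have been fully frozen.

    trace: sequence of (busy_units, waiting_units) pairs, one per cycle, where
    waiting_units are the units with an instruction in the dispatch buffer.
    A cycle is idle for a unit only if the unit is neither busy nor waited on.
    """
    idle = dict.fromkeys(units, 0)
    cycles = 0
    for busy, waiting in trace:
        cycles += 1
        for unit in units:
            if unit not in busy and unit not in waiting:
                idle[unit] += 1
    if cycles == 0:
        return dict.fromkeys(units, 0.0)
    return {unit: count / cycles for unit, count in idle.items()}
```

As the text notes, such a figure is a lower bound on freezing: it counts only cycles where the whole unit is idle, not partial freezing of pipeline stages while the unit is busy.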

The BRAT results, shown in Figure 3, illustrate that for some applications, each functional unit may be idle during a large percentage of the run time. Since dynamic power management freezes execution unit clocks during this idle time, it can be effective in reducing the average power consumption of the execution units.

Figure 3: Execution Unit Idle Time for SPEC92 (horizontal axis: % Time Idle, 0 to 100)

3.3: Cache and MMU dynamic power management

The 603 has blocking caches. This ensures that when a cache miss occurs, all other accesses are held off until miss data is returned from memory. During this time the caches and MMUs are idle and can have their clocks disabled by dynamic power management logic while waiting for data. Depending on the memory latency and the processor to bus clock ratio, this idle time could be many cycles. Blocking caches allowed a straightforward implementation of dynamic power management in the caches and MMUs with little impact on full chip area. The entire dynamic power management logic for the caches and MMUs is approximately 0.20% of the total chip area.

3.3.1: Data cache dynamic power management

Dynamic power management logic freezes all clock regenerators to the data cache while waiting for miss data. This is true for all single beat or burst reads from memory. If a miss in the cache requires a writeback to memory, or castout, the clocks will be frozen after the castout is complete. If a miss in the cache does not require a castout, the clocks are frozen immediately after the miss address is sent to the bus interface unit. The clocks remain frozen until data is returned from memory. Clocks are automatically enabled for each beat of data of a burst read. If there are multiple processor clocks between data beats, the clocks will be frozen between each beat. Figure 4 shows the data cache power management state machine.
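
To illustrate how the clocks are re-enabled per beat and frozen in between, the sketch below lays out a burst read using the 6-1-2-1 memory timing and 2:1 processor to bus clock ratio assumed in Section 3.3.5. Assumptions: 6-1-2-1 means 6 bus clocks to the first beat and 1, 2 and 1 bus clocks to each following beat, cycles are counted from the miss request, and writing a beat takes exactly one processor cycle. It is not intended to reproduce the 60% estimate of Section 3.3.5, and the function names are hypothetical.

```python
from itertools import accumulate


def beat_arrival_cycles(bus_timing=(6, 1, 2, 1), ratio=2):
    """Processor cycle, counted from the miss request, at which each beat arrives."""
    return [bus_clocks * ratio for bus_clocks in accumulate(bus_timing)]


def cache_clock_enables(bus_timing=(6, 1, 2, 1), ratio=2):
    """One flag per processor cycle of the miss: True where a beat is written."""
    arrivals = set(beat_arrival_cycles(bus_timing, ratio))
    return [cycle in arrivals for cycle in range(1, max(arrivals) + 1)]


enables = cache_clock_enables()
print(beat_arrival_cycles())       # [12, 14, 18, 20]
print(len(enables), sum(enables))  # 20 4
```

Of the 20 processor cycles the burst spans in this model, the data cache would be clocked on only 4.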

Figure 4: Data Cache Power Management State Machine

3.3.2: Support for bus snooping

The 603 supports bus snooping to maintain memory coherency. If the data cache clocks are frozen due to a miss in the cache, the dynamic power management logic is required to unfreeze the data cache clocks to service snoop requests. If a snoop hits to a modified line in the cache, the data cache clocks will remain enabled for the snoop castout. If the snoop misses or hits to an unmodified line, the data cache clocks can be frozen until the cache receives a data beat from memory or another snoop request. Figure 5 shows the state machine used to control the unfreezing of clocks for snoop lookups and snoop castouts.

The data cache power management state machine and the snoop power up state machine do not interact. They are totally free-running and operate independently, but together they define the C1 freeze signal for the data cache:

C1_FREEZE = ((POWER_DOWN_SBR & ~data_beat) | (POWER_DOWN_BURST & ~data_beat)) & SNOOP_IDLE & ~snoop_request

This freeze equation allows the data cache to be powered down due to a read miss, then to power up for a snoop request and snoop castout, and then power back down if read data has not been returned from the bus. This approach was found to be much simpler than using one state machine to handle the coordination of cache misses with snoop accesses.

Figure 5: Snoop Power Up State Machine

3.3.3: Instruction cache dynamic power management

The dynamic power management logic for the instruction cache will freeze the clocks on all burst or single beat read misses. The state machine which controls this is identical to the data cache state machine except that the castout section of the state machine is omitted because the instruction cache does not include a modified state. Also, the instruction cache is not snooped so there is no need to power up the instruction cache while waiting for data from the bus.

3.3.4: MMU dynamic power management

Both instruction and data memory management unit clocks are frozen for any burst or single beat read. The MMUs only need to be operational for initial lookups and not during snooping or for data beats from memory, so they are frozen for the entire miss. However, system register unit accesses to the MMUs may occur while the clocks are frozen. If this situation is detected, the dynamic power management logic will unfreeze the clocks until the next power down condition is met. Figure 6 shows the state machine which controls the dynamic power management of the MMUs.

Figure 6: MMU Power Management State Machine

3.3.5: Estimating effectiveness of dynamic power management in the caches and MMUs

To determine the potential power savings of the dynamic power management logic in the caches and MMUs, the percentage of time that the cache would be idle waiting for data from memory was estimated. It was found that the cache would be idle 60% of the time waiting for data, assuming the following: a 6-1-2-1 memory system, a 2:1 processor to bus clock ratio, an 80% cache hit rate, a cache access every cycle, and no bus snooping. Therefore, dynamic power management logic could freeze the clocks to the caches approximately 60% of the time. The MMUs can be frozen for a greater percentage of the time because they are only used for initial lookups and their clocks do not need to be enabled for each data beat from memory.
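
The data cache freeze equation of Section 3.3.2 can be checked directly as Boolean logic. The sketch treats every term as a Python bool and reads `~` as NOT; reading POWER_DOWN_SBR and POWER_DOWN_BURST as the single-beat-read and burst power-down states of the data cache state machine, and SNOOP_IDLE as the idle state of the snoop power up state machine, is an assumption based on the surrounding text. The factored form shows that both product terms share ~data_beat.

```python
from itertools import product


def c1_freeze(power_down_sbr, power_down_burst, data_beat, snoop_idle, snoop_request):
    """Data cache C1 freeze, term for term as written in Section 3.3.2."""
    return (((power_down_sbr and not data_beat) or
             (power_down_burst and not data_beat))
            and snoop_idle and not snoop_request)


def c1_freeze_factored(power_down_sbr, power_down_burst, data_beat, snoop_idle,
                       snoop_request):
    """Equivalent form with the shared ~data_beat factored out."""
    return ((power_down_sbr or power_down_burst) and not data_beat
            and snoop_idle and not snoop_request)


# Exhaustively confirm the two forms agree on all 32 input combinations.
assert all(c1_freeze(*bits) == c1_freeze_factored(*bits)
           for bits in product((False, True), repeat=5))
```

For a burst miss with no snoop activity, `c1_freeze(False, True, False, True, False)` is True; a returning beat or a new snoop request makes it False, matching the power-up behavior described above.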

3.4: Dynamic power management results

Table 1 shows total power measured (instantaneous current measured manually) for various applications with 603 silicon using dynamic power management. Power figures include both internal and external power consumed; internal power is measured for internal Vdd supplied to the 603 only, while external power is measured for external Vdd supplied to 603 I/Os plus I/O external pull-ups and terminators. Table 2 shows the percentage decrease in internal power dissipation for the same applications if dynamic power management is used. All of the dynamic power management logic accounts for approximately 0.6% of the total 603 chip area.

Table 1: Dynamic Power Management Results

Freq.    Power (W)*
(MHz)    Clinpack   dhrystone   hanoi   heapsort   nsieve
25       0.96       0.85        0.86    0.83       0.90
33       1.16       1.06        1.07    1.02       1.13
50       1.58       1.49        1.48    1.45       1.58
66       1.97       1.86        1.84    1.80       1.98
80       2.21       2.20        2.17    2.12       2.33

*3.3V nominal, room temperature, 80MHz in 2:1 bus mode

Table 2: Percentage Decrease in Internal Power Dissipation Using Dynamic Power Management

Clinpack   dhrystone   hanoi   heapsort   nsieve
8.5%       14.0%       13.8%   14.2%      16.4%

4: Static power management

The 603 provides three static power management modes, Doze, Nap and Sleep, which are programmable through a hardware implementation register. Static power management allows power management software or an operating system to reduce average power consumption when the 603 is idle for any extended period of time. The names Doze, Nap and Sleep are indicative of the progressive increase in power savings obtainable. Once any one of the static power management modes is enabled, the 603 completes execution of all outstanding instructions and achieves a quiescent state. Once the quiescent state is reached, the 603 disables all clocks to units that are not required to be functional during a particular power management mode. Once clocks are stopped and the processor is in one of the power-saving modes, an external event asserting one of the wake-up signals, such as the external interrupt input pin, will bring the 603 out of that mode. The 603 will resume instruction execution by jumping to the address of the appropriate interrupt vector. This operation is common to all three modes. A more detailed description of the individual modes appears in the following sections.

4.1: Doze mode

Doze mode allows the 603 to maintain cache coherency while in a power-saving mode. The snoop logic and the data cache are kept active to service snoops as they occur on the bus. If a snoop hit occurs, the necessary cache update and a snoop copyback bus operation will occur while the 603 remains in Doze mode. However, if an address parity error occurs during a snoop, or a bus error during a snoop copyback causes a machine check condition, then the 603 will exit the Doze mode and take a machine check interrupt.

Along with the snoop logic, the time base/decrementer logic is kept active in Doze mode. This provides uninterrupted timer functionality during Doze. A decrementer interrupt will cause the 603 to exit Doze mode. Asserting any of the pins INT_, SMI_, MCP_, SRESET_ or HRESET_ will also cause the 603 to exit Doze mode in ten system clocks (SYSCLKs) or less. Using a 1:1 processor to bus clock ratio, this is the worst case logic delay from the external pin assertion to all clocks being enabled.

4.2: Nap mode

In Nap mode, all logic is disabled except the time base/decrementer logic. Since cache coherency cannot be maintained in Nap mode, the system program must flush the data cache before entering Nap mode if system memory is expected to be altered while in Nap mode. Otherwise, a data cache flush is not necessary since the 603 keeps the cache contents unaltered while in Nap mode. A pair of handshake signals, QREQ_ and QACK_, are provided for the system to allow the 603 to go into Nap only when cache coherency will no longer be a problem. When the 603 is ready to enter the Nap mode, QREQ_ is asserted. The system logic responds with an active QACK_ to allow the 603 to proceed with entering Nap mode.
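
The QREQ_/QACK_ handshake can be pictured as a small sequencer. The states and the `step` function below are hypothetical: they follow only the order given in Sections 4 and 4.2 (finish outstanding instructions, assert QREQ_, wait for QACK_, then stop clocks), treat the active-low pins as simple "asserted" flags, and do not model wake-up or what QREQ_ does after Nap is entered.

```python
from enum import Enum, auto


class NapEntry(Enum):
    RUNNING = auto()
    DRAINING = auto()   # completing all outstanding instructions
    WAIT_QACK = auto()  # quiescent; QREQ_ asserted, waiting for the system
    NAPPING = auto()    # clocks to units not needed in Nap are stopped


def step(phase, nap_enabled, outstanding, qack_asserted):
    """Advance the toy sequencer by one step and return the next phase."""
    if phase is NapEntry.RUNNING:
        return NapEntry.DRAINING if nap_enabled else NapEntry.RUNNING
    if phase is NapEntry.DRAINING:
        return NapEntry.WAIT_QACK if outstanding == 0 else NapEntry.DRAINING
    if phase is NapEntry.WAIT_QACK:
        return NapEntry.NAPPING if qack_asserted else NapEntry.WAIT_QACK
    return phase  # wake-up is not modeled in this sketch


def qreq_asserted(phase):
    """This sketch drives QREQ_ only while waiting for QACK_."""
    return phase is NapEntry.WAIT_QACK
```

The key property is that the system logic, not the processor, decides when Nap actually begins: the sequencer holds in WAIT_QACK for as long as QACK_ is withheld.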

Compared to Doze mode, further power savings are achieved with the data cache and bus snooping logic disabled. Furthermore, the receivers of most input and bidirectional pins are disabled for added power savings. Outputs maintain their normal idle state. A decrementer interrupt or assertion of any one of the pins mentioned previously with regard to Doze mode will wake the 603 from Nap mode. As in the Doze mode, the wake-up latency is ten SYSCLKs or less.

4.3: Sleep mode

Sleep mode allows a maximum power savings by disabling the clocks to all units. The same QREQ_/QACK_ handshake protocol exists for Sleep mode as for Nap. Unlike in Doze and Nap modes, additional power savings can be achieved in Sleep mode by disabling the PLL and SYSCLK. In Doze and Nap modes, the SYSCLK and PLL configurations must remain in the same state as they were prior to the commencement of the power-saving mode.

Sleep mode provides the ability to dynamically manage system power by allowing system logic to disable the PLL or SYSCLK, change the SYSCLK frequency, or change the processor to bus clock ratio through the PLL configuration pins.
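
Because the processor clock is 1X to 4X SYSCLK and the PLL locks to SYSCLK between 16 and 66 MHz (Section 5.2), only some ratio and SYSCLK pairs reach a given processor frequency. The sketch enumerates them; it ignores the PLL bypass option below 16 MHz and any board-level constraints, and the function name is hypothetical.

```python
SYSCLK_RANGE_MHZ = (16.0, 66.0)  # PLL lock range given in Section 5.2
RATIOS = (1, 2, 3, 4)            # processor-to-bus clock ratios


def sysclk_options(cpu_mhz):
    """(ratio, SYSCLK MHz) pairs that yield cpu_mhz with SYSCLK in the lock range."""
    low, high = SYSCLK_RANGE_MHZ
    options = []
    for ratio in RATIOS:
        sysclk = cpu_mhz / ratio
        if low <= sysclk <= high:
            options.append((ratio, round(sysclk, 2)))
    return options


print(sysclk_options(80))  # [(2, 40.0), (3, 26.67), (4, 20.0)]
```

An 80 MHz processor clock, for example, can come from a 40, 26.67 or 20 MHz SYSCLK, but not from 1:1 operation, since 80 MHz is above the PLL lock range.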

Since Sleep mode does not automatically disable the PLL, the system logic can implement several different levels of power savings depending on the wake-up response time requirements. For example, if a quick wake-up from Sleep is required, then by leaving the SYSCLK input and the PLL configuration unaltered, the 603 will wake from Sleep within ten SYSCLK cycles. However, if maximum power savings is a requirement, then the PLL and SYSCLK may be disabled completely, reducing power dissipation to leakage levels. In that case, when coming out of Sleep mode, a maximum of 200 µs is required for the PLL to relock to the new SYSCLK frequency. After this relock time, any of the external pins mentioned in the Doze section may be applied to wake up from Sleep.
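
The three static modes can be summarized as data: which blocks keep running, and what the worst-case wake-up latency is. The block sets come from Sections 4.1 to 4.3 (the bus interface unit is listed for Doze only because Section 5.6 says its clocks are disabled in Nap and Sleep). The latency function is a rough model that assumes the ten-SYSCLK bound still applies after any PLL relock; the names are hypothetical.

```python
# Blocks left clocked in each static mode, per Sections 4.1-4.3 and 5.6.
ACTIVE_BLOCKS = {
    "doze": {"snoop logic", "data cache", "time base/decrementer", "bus interface unit"},
    "nap": {"time base/decrementer"},
    "sleep": set(),
}

WAKE_SYSCLKS = 10      # worst case with SYSCLK and PLL configuration unchanged
PLL_RELOCK_US = 200.0  # maximum PLL relock time after it was disabled in Sleep


def worst_case_wake_us(mode, sysclk_mhz, pll_was_disabled=False):
    """Rough worst-case wake-up latency in microseconds (simplified model)."""
    if mode not in ACTIVE_BLOCKS:
        raise ValueError(f"unknown mode: {mode}")
    if pll_was_disabled and mode != "sleep":
        raise ValueError("only Sleep mode allows the PLL to be disabled")
    latency_us = WAKE_SYSCLKS / sysclk_mhz  # cycles / (cycles per microsecond)
    if pll_was_disabled:
        # Assumption: the wake-up pin is applied only after relock completes.
        latency_us += PLL_RELOCK_US
    return latency_us
```

At a 40 MHz SYSCLK this gives 0.25 µs for a quick wake-up, versus about 200 µs when the PLL was switched off during Sleep.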

4.4: Static power management results

Table 3 shows total power measured with 603 silicon using the static power management modes. Sleep figures are listed for configurations with the PLL and SYSCLK enabled, with the PLL disabled and SYSCLK enabled, and with both the PLL and SYSCLK disabled.

Table 3: Static Power Management Results

                        Power (mW)*
Freq.                   Sleep           Sleep            Sleep
(MHz)   Doze    Nap     PLL on/CLK on   PLL off/CLK on   PLL off/CLK off
25      133.2   49.4    38.8            12.8             4.7
33      168.0   62.0    47.8            13.5             4.7
50      241.7   88.8    66.2            15.1             4.7
66      307.1   113.0   88.5            17.7             4.7
80      366.1   135.1   105.5           19.3             4.7

*3.3V nominal, room temperature, 80MHz in 2:1 bus mode

5: Other low-power design features

Several other areas of the 603 were targeted for low-power design. These include the caches and memory management units, the phase lock loop, clock distribution and clock regenerators, standard cell and datapath libraries and the bus interface unit.

5.1: Cache and memory management design

The data portion of each cache is divided into eight 1KB subarrays, which were specifically designed for low-power operation. Each subarray holds one word of the cache line for both sets of data. This organization was chosen to minimize power: since each cache employs double-word datapaths, there is never a need to access more than two subarrays during any given cycle. If only one word of data needs to be read or written, only one subarray is accessed.
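
The subarray organization can be expressed as an address mapping. With a 32-byte line and 4-byte words, each of the eight subarrays holds one word position of every line for both ways, which is consistent with the 1KB subarray size: 128 sets × 2 ways × 4 bytes = 1024 bytes. The sketch assumes that subarray k holds word k of the line and that accesses are naturally aligned; the helper name is hypothetical.

```python
WORD_BYTES = 4                        # 32-bit word (assumption for this sketch)
LINE_BYTES = 32                       # cache line size from Section 2
SUBARRAYS = LINE_BYTES // WORD_BYTES  # 8 subarrays, one word position each


def subarrays_accessed(addr, size):
    """Subarray numbers touched by a naturally aligned access of 'size' bytes.

    Assumes subarray k holds word k of every line, for both ways, so the word
    offset within the line selects the subarray.
    """
    if size not in (1, 2, 4, 8):
        raise ValueError("size must be 1, 2, 4 or 8 bytes")
    if addr % size:
        raise ValueError("access must be naturally aligned")
    first_word = (addr % LINE_BYTES) // WORD_BYTES
    words = max(1, size // WORD_BYTES)
    return list(range(first_word, first_word + words))
```

Because an aligned doubleword always starts at an even word, `subarrays_accessed` never returns more than two subarrays, and a word or smaller access returns exactly one.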

Each subarray is held in a constant state of precharge if the array is not being accessed. That is, each bitline and output driver of each subarray is constantly being precharged when the subarray is idle. To save power, the precharge signals are not clocked and only turn off while the subarray is being accessed. The precharge of the output drivers is disabled only if the access is a read access.

Special attention was given to the timing of the precharge signals and the word lines in each subarray to guarantee that no bitcell is ever enabled while the bitline precharge devices are enabled. To further save power, each subarray employs pulsed word lines to eliminate unnecessary bitline discharge once the sense amplifiers have been strobed. The sense amplifiers within each subarray function as data latches once they have been strobed, eliminating the need for clocked output data latches. For speed reasons, the sense amplifiers for both sets of data within each subarray are strobed for load hits. For castouts, however, only the sense amplifiers for the data set being cast out are strobed. Each subarray contains a way-select mux, allowing both sets of data to share the same output drivers.

The tag portion of each cache consists of a single array. To save power, each tag array is precharged when idle. Due to the speed requirements of supporting a read-modify-write operation each cycle, the precharge signals within each tag array are clocked. Like the data subarrays, each tag array employs pulsed wordlines and strobed sense amplifiers. Unlike the data subarrays, however, each tag array's wordline pulsing logic is designed to limit the bitline discharge to approximately one volt, regardless of frequency. Limiting bitline discharge to one volt saves power at lower frequencies, where one of each pair of bitlines would otherwise discharge completely.

The 603's separate instruction and data MMUs each consist of three arrays: a sixteen-entry Segment Register array, a 4-entry Block-Address-Translation (BAT) array, and a 64-entry, 2-way set associative Translation Lookaside Buffer (TLB) array. Like the cache tag arrays, each of these arrays is precharged when idle, but the precharge signal is clocked. Each of these arrays is so small and fast that sense amplifiers are not required and thus pulsing the word lines of these arrays is unnecessary. This provides a further savings in DC power. The TLB array contains a way-select mux, allowing both sets of TLB data to share the same output drivers.

5.2: Phase Lock Loop

The 603 integrates a synthesizing analog PLL which maintains proper edge alignment between the 603's internal clock and SYSCLK. In addition, the 603 PLL can be configured during power up for a processor clock frequency that is 1X, 2X, 3X, or 4X the bus clock frequency. This allows reduced system power for a given level of processor performance. The PLL is designed to lock to a wide range of SYSCLK frequencies from 16 MHz up to 66 MHz. If lower frequencies are desired, the PLL can be bypassed altogether. During Sleep mode, all analog circuits used in the PLL design are completely shut down in order to attain milliwatt-level power dissipation.

5.3: Clock distribution

A passive clock distribution network was chosen over a set of distributed clock buffers in order to minimize active power. An HTREE style distribution network (see Figure 7) using metal3 and metal4 produced the best rise/fall time performance while maintaining a low total routing capacitance. In addition, skew was minimized among all branches. Grid-like layouts were discounted due to their high routing capacitance. Global clock skew was kept under +/- 100 ps with this scheme; rise and fall times were under 1.2 ns while average power due to the HTREE driver and the HTREE itself was kept under 100 mW at 80 MHz.

Figure 7: HTREE Clock Distribution Network (labels: TEST_CLK, pll_cfg[3:0], master logic, local gating, C1/C2 clock, B1/B2 BIU clock)
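
An idealized H-tree shows why this topology keeps skew low: every leaf is reached through the same total wire length. The recursive sketch below generates the leaf points of such a tree on a unit grid; it ignores the metal3/metal4 layer assignment, loading and RC effects, and its function name and coordinates are illustrative assumptions rather than the 603 layout.

```python
def htree_leaves(x, y, half_w, half_h, depth):
    """Leaf points of an idealized H-tree and the wire length to reach each one.

    Each level drives four children at (x +/- half_w, y +/- half_h) through a
    horizontal run of half_w and a vertical run of half_h, then halves both
    spans for the next level. Returns a list of (x, y, wire_length) tuples.
    """
    if depth == 0:
        return [(x, y, 0.0)]
    leaves = []
    for dx in (-half_w, half_w):
        for dy in (-half_h, half_h):
            segment = abs(dx) + abs(dy)  # wire from this node to the child
            for lx, ly, length in htree_leaves(x + dx, y + dy,
                                               half_w / 2, half_h / 2, depth - 1):
                leaves.append((lx, ly, length + segment))
    return leaves


leaves = htree_leaves(0.0, 0.0, 1.0, 1.0, depth=2)
print(len(leaves))                           # 16
print({length for _, _, length in leaves})   # {3.0}
```

With two levels, the 16 leaves form a uniform 4 × 4 grid and each is 3 units of wire from the root; a grid-style network would instead have unequal paths and, as the text notes, higher routing capacitance.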

5.4: Clock regenerator design

The output drivers of all 603 clock regenerators are designed with enough granularity to cover a wide capacitive load range from 0.2 to 2.2 pF. As a result, lower average power is obtained as compared to other schemes which use 'dummy' loads in order to maintain constant clock latencies.

5.5: Standard cell and datapath libraries

Static logic design is used throughout the standard cell and datapath element libraries where the (speed × power) product is optimized. Special care was taken during logic synthesis when repowering standard cells for greater drive capability. Likewise, datapath elements were only repowered in order to meet certain timing constraints.

5.6: Bus Interface Unit

In normal operation, the bus interface unit (BIU) reduces power consumption by turning off the 64-bit data bus receivers when the 603 is not reading from the bus. In this way power is saved whenever the data bus is used by other masters for operations such as DMA transfers or when the 603 is performing write operations. Dynamic power management logic is not implemented in the BIU due to bus snooping requirements. Static power management modes do affect the BIU. In Nap and Sleep modes, clocks to the BIU are disabled along with any power-consuming circuitry, such as a voltage reference generator. Also, when waking up from Nap or Sleep, the BIU resumes normal operation only after the bus clock and processor clock have been synchronized.

6: CMOS Technology

The 603 is fabricated with a 3.3 volt 0.5 micron CMOS technology using four layers of metal. Split instruction and data caches account for over half of the 1.6 million transistors used within an 85 mm² die. The design is fully static and LSSD compliant. The 603's physical characteristics are summarized in Table 4.

7: 603 performance

The 603 design is optimized for both power and performance. Table 5 provides estimated SPEC92 performance for an 80MHz 603-based system using a 2:1 processor to bus clock ratio and a second level external cache [2].

8: Conclusion

The PowerPC 603 microprocessor is a low-power implementation of the PowerPC Architecture with power management capability and competitive performance. The area-conscious design also makes the 603 a cost-effective alternative. The 603 combines all of the necessary features to make it the ideal microprocessor solution for portable applications.

Table 4: 603 Physical Characteristics

Technology:      0.5 µm N-well CMOS, four-layer metal
Die Size:        7.39 mm × 11.47 mm
Transistors:     1.6 million
Memory:          split 8KB caches
Bus:             split 32b address, 64b data, 16 - 66 MHz
Voltage:         3.3 V nominal
Power:           3.0 Watts Max @ 80 MHz
Package:         240 pin wire-bond CQFP, C4 CQFP
Frequency:       16 - 80 MHz
Processor to bus clock ratios: 1:1, 2:1, 3:1, 4:1
I/Os:            165 signal, CMOS/TTL, 5V tolerant

Table 5: 603 Performance Estimate

Frequency   SPECint92   SPECfp92
80MHz       75*         85*

*estimated based on simulated results

Acknowledgments
The authors gratefully acknowledge the 603 design team for their support in the design, implementation and verification of the power-related features of this part. In particular, Peter Ippolito provided the clock regenerator design which made the implementation of the power management features possible. Also, the support and promotion of the 603 as a power-conscious microprocessor by the 603 project managers, Arturo Arizpe and Jim Kahle, was invaluable.

References

[1] Poursepanj, A., et al., "The PowerPC™ 603 Microprocessor: Performance Analysis and Design Tradeoffs," Proceedings of COMPCON 1994, February 1994.

[2] Burgess, B., et al., "The PowerPC™ 603 Microprocessor: A High Performance, Low Power, Superscalar RISC Microprocessor," Proceedings of COMPCON 1994, February 1994.

[3] Arizpe, A., and Kahle, J., "The Future Direction of PowerPC™," The Sixth Annual Microprocessor Forum, October 18, 1993.