`
`Qualcomm, Ex. 1016, Page 1
`
`
`
`This book is in the Addison-Wesley VLSI Systems Series.
`
`Lynn Conway and Charles Seitz, Consulting Editors
`
`The VLSI Systems Series
`
`Circuits, Interconnections, and Packaging for VLSI by H.B. Bakoglu
`
Analog VLSI and Neural Systems by Carver Mead
`
`The CMOS3 Cell Library edited by Dennis Heinbuch
`
`Computer Aids for VLSI Design by Steven Rubin
`
`The Design and Analysis of VLSI Circuits by Lance Glasser and Daniel Dobberpuhl
`
`Principles of CMOS VLSI Design by Neil H. E. Weste and Kamran Eshraghian
`
`Also from Addison-Wesley:
`An Introduction to VLSI Systems by Carver Mead and Lynn Conway
`
`
`
`
`PRINCIPLES OF
`CMOS VLSI
`DESIGN
`A Systems Perspective
`
`Second Edition
`
`Neil H. E. Weste
`TLW, Inc.
`
`Kamran Eshraghian
`University of Adelaide
`
`• ••
`
`ADDISON-WESLEY PUBLISHING COMPANY
Reading, Massachusetts • Menlo Park, California • New York
`Don Mills, Ontario • Wokingham, England • Amsterdam • Bonn
`Sydney • Singapore • Tokyo • Madrid • San Juan • Milan • Paris
`
`
`
`
`Sponsoring Editor: Peter S. Gordon
`Production Supervisor: Peggy McMahon
`Marketing Manager: Bob Donegan
`Manufacturing Supervisor: Roy Logan
`Cover Designer: Eileen Hoff
`Composition Services: Mike Wile
`Technical Art Supervisor: Joseph K. Vetere
`Technical Art Consultant: Loretta Bailey
`Technical Art Coordinator: Alena B. Konecny
`
Library of Congress Cataloging-in-Publication Data
Weste, Neil H. E.
Principles of CMOS VLSI design : a systems perspective / Neil
Weste, Kamran Eshraghian -- 2nd ed.
p. cm.
Includes bibliographical references and index.
ISBN 0-201-53376-6
1. Integrated circuits--Very large scale integration--Design and
construction. 2. Metal oxide semiconductors, Complementary.
I. Eshraghian, Kamran. II. Title.
TK7874.W46 1992
621.3'95--dc20    92-16564
`
`Cover Photo: Dick Morton
Cover Art: Neil Weste
`Photo Credit: Plates 5, 12, and 13, Melgar Photography, Inc., Santa Clara, CA
`
`AT&T
Copyright © 1993 by AT&T
All rights reserved. No part of this publication may be reproduced, stored in a
retrieval system, or transmitted, in any form or by any means, electronic, mechanical,
photocopying, recording, or otherwise, without the prior written permission of
the publisher. Printed in the United States of America.
1 2 3 4 5 6 7 8 9 10-MA-96959493
`
`
`
`
`5.4 CMOS LOGIC STRUCTURES
`
`313
`
When the inputs switch, nodes Q and -Q are pulled either high or low. Positive
feedback applied to the p pull-ups causes the gate to switch. The logic
trees may be further minimized from the full differential form using logic
minimization algorithms. This version, which might be termed a "static"
CVSL gate, is slower than a conventional complementary gate employing a
p-tree and n-tree. This is because during the switching action, the p pull-ups
have to "fight" the n pull-down trees. Figure 5.39(b) shows the implementation
of the example gate. In isolation, this is not a very efficient implementation
of this gate; however, in certain cases, such as multiple-input XOR
gates, the implementation is quite reasonable.
Further refinement leads to a clocked version of the CVSL gate
(Fig. 5.39c). This is really just two "domino" gates operating on true and
complement inputs with a minimized logic tree. The advantage of this style
of logic over domino logic is the ability to generate any logic expression,
making it a complete logic family (as noted in Section 5.4.7, domino logic
can only generate noninverted forms of logic). This is achieved at the
expense of the extra routing, active area, and complexity associated with
dealing with double-rail logic. However, the ability to generate any logic
function is of advantage where automated logic synthesis is required. A four-way
XOR gate is shown in Fig. 5.39(d) [42]. The performance of the dynamic
CVSL gate may be improved with the addition of a latching sense amplifier
as shown in Fig. 5.40 [43]. This variation is called Sample-Set Differential
Logic (SSDL). It works slightly differently from dynamic CVSL. When
clk = 0, P1, P2, and N1 are turned on. One output will be at VDD and the other
will be slightly below VDD because a path exists to VSS through one of the n
trees. When clk = 1, the latching sense amplifier forces the lower output
`
FIGURE 5.40 A latching sense amplifier for use with dynamic CVSL (SSDL logic)
[blocks: latching sense amplifier; differential inputs; nMOS combinatorial network]
`
`
`
`8.3 MEMORY ELEMENTS
`
`563
`
`
shifts right and one of which shifts left. The fill values can be set by appropriate
connections at the ends of the shifter ranks. The output of the two
shifters is muxed to form a final result. The value of SHIFT<2:0> gives the
amount of the shift with SHIFT<3> = 1 producing a left shift, while
SHIFT<3> = 0 produces an arithmetic right shift. Left and right rotates may
be implemented by wrapping the end connections conditionally to the opposite
end bits.
Shifters implemented with transmission gates are notorious for fooling
timing analyzers unless the directionality of the pass transistors is somehow
communicated to the timing analyzer. The multiplexer shifter may use buffered
(inverting, if need be) multiplexers, which can aid in speeding up the long
lines in large shifters. The multiplexer version directly takes the shift amount
as control, while the array version requires an n:m decoder (2:4 for the one
shown in Fig. 8.46a). For these reasons the multiplexer version may be
favored in CMOS although the version shown in Fig. 8.46 can be compact.
Other shift options are frequently required, for instance, shuffles, bit-reversals,
and interchanges. One can use either complementary transmission
gates or static single-pass transistors (usually n-channel). Precharged versions
of single-pass transistor shifter circuits are generally cumbersome.
Large capacitances can be associated with the intermediate mux nodes and
these must all be precharged to prevent charge-sharing problems. The delay
of an n-bit shifter is proportional to log(n), so combined with the fast speed
of transmission gates, shifting can be a fast operation.
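
As a rough behavioral illustration of the log(n) property, the multiplexer
shifter can be modeled as log2(n) ranks, each rank shifting by a power of two
under control of one SHIFT bit. The sketch below assumes an 8-bit word, zero
fill on left shifts, and sign fill on arithmetic right shifts. The names
shift_ranks and shifter are hypothetical and do not come from the text.

```python
WIDTH = 8                      # assumed word width (not specified in the text)
MASK = (1 << WIDTH) - 1
RANKS = 3                      # log2(WIDTH) multiplexer ranks


def shift_ranks(value: int, amount: int, left: bool, fill_bit: int) -> int:
    """Pass value through RANKS mux ranks; rank i shifts by 2**i if amount bit i is set."""
    for rank in range(RANKS):
        step = 1 << rank
        if (amount >> rank) & 1:
            if left:
                # Left shift: zeros enter at the least significant end.
                value = (value << step) & MASK
            else:
                # Right shift: fill_bit enters at the most significant end.
                fill = (((1 << step) - 1) << (WIDTH - step)) if fill_bit else 0
                value = (value >> step) | fill
    return value


def shifter(value: int, shift: int) -> int:
    """SHIFT<2:0> is the amount; SHIFT<3> = 1 selects left, 0 selects arithmetic right."""
    value &= MASK
    amount = shift & 0b111
    sign = (value >> (WIDTH - 1)) & 1
    right = shift_ranks(value, amount, left=False, fill_bit=sign)
    left = shift_ranks(value, amount, left=True, fill_bit=0)
    # The outputs of the two shifters are muxed to form the final result.
    return left if (shift >> 3) & 1 else right
```

Only three ranks are traversed for any shift amount from 0 to 7, which is the
software analogue of the log(n) delay noted above.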
`
`8.3 Memory Elements
`
Memory elements form critical components in the implementation of CMOS
systems. While off-the-shelf memories are limited by the number of I/O
pins, the speed of driving into the chip, and large off-chip output nodes,
on-chip memories can be engineered to be very fast and to have unique access
paths. In general, CMOS ASIC processes will not compete with the density
of state-of-the-art DRAM memory, but may be very competitive with high-speed
static memories. Memory elements may be divided into the following
categories:

• Random access memory.
• Serial access memory.
• Content-addressable memory.

Random access memory at the chip level is classed as memory that has
an access time independent of the physical location of the data. This is contrasted
with serial-access memories, which have some latency associated
`
`
`
`564
`
`CHAPTER 8 SUBSYSTEM DESIGN
`
FIGURE 8.48 Memory-chip architecture
[labels: n bit address; row decoder (n-k bits); memory cells, 2^(n-k) words by
2^(m+k) bits; column decoder (k bits); column mux, sense amp, write buffers]
`
with the reading or writing of a particular datum and with content-addressable
memories. Within the general classification of random access memory,
we can consider read only memory (ROM) or read/write memory (commonly
called RAM). ROMs usually have a write time much greater than
their read time (programmable ROMs have write times of the order of milliseconds),
while RAMs have very similar read and write times. Both types of
memory may be further divided into static-load, synchronous, and asynchronous
categories. Static-load memories require no clock. Synchronous RAMs
or ROMs require a clock edge to enable memory operation. The address to a
synchronous memory only needs to be valid for a certain setup time after the
clock edge. Asynchronous RAMs recognize address changes and output new
data after any such change. Static-load and synchronous memories are easier
to design and usually form the best choice for a system-level building block,
because they can generally be clocked by the system clock.
The memory cells used in RAMs can further be divided into static structures
and dynamic structures. Static cells use some form of latched storage,
while dynamic cells use dynamic storage of charge on a capacitor. We will
concentrate on static RAMs because they are easier to design and potentially
less troublesome than dynamic RAMs. Static RAMs tend to be faster (but
much larger) than dynamic RAMs.
A typical memory-chip architecture is shown in Fig. 8.48. Central to the
design is a memory array consisting of 2^n by 2^m bits of storage (actually
2^(n-k) by 2^(m+k)). A row (or word) decoder addresses one word of 2^m bits out
of 2^(n-k) words. The column (or bit) decoder addresses 2^k of 2^m bits of the
accessed row. This column decoder accesses a multiplexer, which routes the addressed
data to and from interfaces to the external world.
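
The folding of a 2^n-word memory into a 2^(n-k) by 2^(m+k) array can be made
concrete with a small sketch. It assumes that the upper n-k address bits drive
the row decoder and the lower k bits drive the column decoder (the bit ranges
n-1:k and k-1:0 labeled in Fig. 8.49), and that the column multiplexer picks
one 2^m-bit group out of the 2^k groups in a row; that last point is one
reading of the text, not a statement from it. The function names are
hypothetical.

```python
from typing import Tuple


def array_shape(n: int, m: int, k: int) -> Tuple[int, int]:
    """Physical array size as (rows, bits per row) = (2**(n-k), 2**(m+k))."""
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    return 1 << (n - k), 1 << (m + k)


def split_address(addr: int, n: int, k: int) -> Tuple[int, int]:
    """Split an n-bit address into (row select, column select).

    Assumption: address bits n-1..k select the row, bits k-1..0 the column.
    """
    if not 0 <= addr < (1 << n):
        raise ValueError("address out of range")
    row = addr >> k
    column = addr & ((1 << k) - 1)
    return row, column


if __name__ == "__main__":
    rows, bits_per_row = array_shape(n=10, m=3, k=2)
    # Folding changes the aspect ratio but not the total capacity.
    print(rows, bits_per_row, rows * bits_per_row == (1 << 10) * (1 << 3))
    print(split_address(0b1011_01, n=6, k=2))
```

Increasing k makes the array shorter and wider, which is the usual reason for
introducing a column decoder in the first place.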
`
`8.3.1 Read/Write Memory
`8.3.1.1 RAM
`
Figure 8.49 shows one row and one column of a generic RAM architecture with
the support circuits required by the RAM cell. The row decoder is a 1 of n-k
decoder which may generally be thought of as an AND gate. One of the 2^(n-k)
`
FIGURE 8.49 Generic RAM circuit
[blocks: bit-line conditioning; RAM cell; row decoder on address bits n-1:k;
column decoder on address bits k-1:0; sense amp, column mux, write buffers;
signals: clocks, write clocks, address, write-data, read-data]
`
row lines is accessed at one time. The bit-line-conditioning circuitry, the RAM
cell, the sense amplifiers, column multiplexers, and the write buffers form a
tightly coupled circuit that provides for the hazard-free reading and writing of
the memory cell. The bit lines are normally run as complementary signals.
There are many variations of these circuits to achieve varying density/speed/
noise-margin requirements. We shall look at a variety of schemes for implementing
static RAMs. The column decoder is similar to the row decoder but is
a 1 of k decoder. k is normally less than n and the decoder drives a multiplexer
(rather than a selector). Frequently, the column decoder may be merged with
the column multiplexer.
Starting with the RAM cell itself, various circuits are shown in
Fig. 8.50 [19]. The most commonly used in ASIC memories is the 6-transistor,
cross-coupled inverter circuit shown in Fig. 8.50(a). A typical mask-level
layout for a 6-transistor circuit is shown in Fig. 8.51 (also Plate 8). The
p-transistors may be replaced with high-value polysilicon resistors if the
process supports this option (Fig. 8.50b). The value of the resistor has to be
such that it prevents leakage from changing any value stored in the RAM
cell. Generally the resistors are in the 100s to 1000s of megaohms. Deleting
one of the bit-line pass transistors results in a 5-transistor RAM cell.
Writing such a cell has to be considered carefully (see later in this section). A
4-transistor dynamic RAM cell may be achieved by deleting the p loads of
the static cell, as shown in Fig. 8.52(a). This cell and the other dynamic cells
have to be refreshed to retain the contents of the memory. A 3-transistor cell
is shown in Fig. 8.52(b). The cell stores data on the gate of the storage transistor.
Separate read and write control lines are used. Multiple read-ports
may be added easily, by adding read transistors. In addition, separate or
`
Figure 8.50 Static RAM cell circuits
[panels (a) and (b); bit lines labeled -bit and bit]
`
merged read and write data busses may be used. A 1-transistor cell is shown
in Fig. 8.52(c) [20]. The memory value is again stored on a capacitor. The
capacitor can be implemented as a transistor as shown in Figs. 8.52(d) and
8.52(e). Sense amplifiers sense the small change in voltage that results when
a particular cell is switched onto the bit line. This type of cell (Fig. 8.52c)
forms the basis for most high-density DRAMs [21-28]. The cell shown in Fig.
8.52(d) can be implemented in a conventional two metal, single poly process.
The dominant problem that arises with this type of memory when used
in an ASIC process is the loss of the stored charge due to leakage or stray
substrate currents created by surrounding digital logic.
`As far as the average CMOS-system design is concerned, the static
`6-transistor cell should be used since it involves the least amount of detailed
`circuit design and process knowledge and is the safest with respect to noise
`and other effects that may be hard to estimate before silicon is available. In
`addition, current processes are dense enough to allow large static RAM
`arrays. As a general system-design principle, large amounts of memory
`should only be included in a design if the performance of the system is
`affected. Commercial RAM manufacturers are much better at designing
`RAMs than the average system designer. If dense memory can be partitioned
`
FIGURE 8.51 Mask layout for 6-transistor static RAM
`
`off-chip with no performance degradation or cost impact, then this is a good
`approach to take.
`
`8.3.1.1.1 Static RAM-read
`
We will begin our examination of CMOS static RAMs by considering a read
operation. Imagine that the bit lines of the circuit shown in Fig. 8.49 are at
some value and that the word line is asserted. The one node on the memory
cell will attempt to pull the bit line up through the access transistor and the p
load. The zero node will attempt to pull the bit line down through the access
transistor and the n-channel pull-down. As an n-channel transistor is poor at
passing a one and the p-channel transistors in the RAM cell are generally
small (or in the case of a resistive load, the resistors are very large), design of
`
Figure 8.52 Dynamic RAM circuits: (a) 4-transistor; (b) 3-transistor;
(c) 1-transistor with capacitor; (d) 1-transistor with transistor capacitor;
(e) representative layout for (d)
[signals labeled: write, read, word, bit, -bit, write-data, read-data;
capacitor plate tied to VDD or VDD/2]
`
`the RAM circuit concentrates on pulling the bit line from high to low. Thus
`one method of reading a RAM cell would be to precharge the bit lines high
`and then enable the word-line decoder. For a given pair of bit lines, one
`RAM cell will attempt to pull down either the bit or -bit line depending on
`the stored data. The bit-line pull-up circuit may use p-channel transistors to
`precharge each bit line (Fig. 8.53a). In this example, the sense amplifier is an
`inverter that forms a single-ended sense amplifier. The sense time is roughly
`
FIGURE 8.53 RAM read options: (a) VDD precharge; (b) VDD - Vtn precharge
[each panel shows the circuit and waveforms for precharge, bit/-bit, word, and data]
`
the time it takes one RAM cell pull-down and access transistor to reach the
inverter threshold. To optimize speed, one might set the inverter threshold
above the VDD midpoint, but below an adequate noise margin down from the
VDD rail. Alternatively, one can precharge the bit lines with n-channel transistors,
which results in the bit lines being precharged to an n threshold down
from VDD (Fig. 8.53b). This can dramatically improve the speed of the RAM
cell access. In addition, it reduces power dissipation because the bit lines do
not swing through the full supply voltage. The key aspect of the precharged RAM
read cycle is the timing relationship between the RAM addresses, the precharge
pulse, and the enabling of the row decoder. If the word-line assertion
precedes the end of the precharge cycle, the RAM cells on the active word
line will see both bit lines pulled high and the RAM cells may flip state. If
`
`
the addresses change after the precharge cycle has finished, more than one
word line will be accessed and more than one RAM cell will have the chance
to pull the bit lines down, leading to erroneous READ data. Normally, RAM
designers generate a carefully designed timing chain that ensures the correct
temporal relationships between precharge, row access, and sense operations.
A RAM access method that does not require precharge is shown in
Fig. 8.54(a). Here n-channel load transistors pull up the bit lines statically.
When the word line is asserted, the bit line being pulled down by the RAM
cell falls to a value that is a function of the pull-up size, the pass-transistor
size, and the RAM inverter pull-down size. At the same time, the pull-up
must not be able to flip the RAM cell. A differential amplifier is used to
amplify the bit-line difference. Figure 8.54(b) shows the equivalent circuit
of the pull-down circuit during a read operation. Voltage V1 must safely clear
the input threshold of the RAM cell inverters. A value of 0.5-1 V is appropriate.
Voltage V2 yields the bit-line difference voltage, which must be amplified
to detect a transition on the bit line. The size of the bit-line load
determines how fast the bit line can recover (to prevent false writes) after a
write operation where the bit line may have been driven to VSS. The sense
amplifier is designed in conjunction with the bit-line pull-up and RAM cell
to amplify this bit-line change. Design margins must be valid over all process,
temperature, and voltage extremes. Figure 8.55 shows the zero bit voltage
(Vbit(0)) and the pull-down voltage (Vpulldown) for various ratios of pull-up
beta to pull-down betas. As the pull-up becomes weaker, the Vbit(0) voltage
approaches VSS and the differential voltage between a high and a low on
the bit lines increases. However, as the pull-down transistors are limited in
size by the desire to keep the RAM cell small, a design trade-off has to be
made between speed and the differential bit voltage, which affects the noise
Figure 8.54 RAM read operation model
[panels (a) and (b); waveforms shown for bit/-bit, word, and data]

FIGURE 8.55 RAM bit-line voltage levels versus transistor size for static pull-up RAM
[axes: V (volts), 0 to 5, versus pull-up beta / sum of pull-down betas; Vbit(1) level marked]
`
immunity of the cell and the write characteristics. To a first order, the bit-line
voltage (V2) is given by

[equation not recoverable from the extraction; the surviving fragment shows V2
written in terms of the ratio β_pullup / β_driver-eff]

where β_pullup is the gain of the load and β_driver-eff is the gain of the combination
of the pass and pull-down transistor in series. When the gain of the pull-up
is high compared with the pull-down path, the bit-line voltage rises
towards VDD - Vtn. When the gain of the pull-up is very small, the bit-line
voltage approaches zero. The pull-down voltage V1 is a result of resistive
divider action between the word-access transistor and the RAM-cell pull-down.
While these transistors are in the linear region, V1 is roughly given by

$$V_1 = V_2\,\frac{\beta_{pass}}{\beta_{pass} + \beta_{pulldown}}.$$
`
`The RAM cell and the sense amplifier draw static current, which affects power
`dissipation. Figure 8.56 shows typical SPICE waveforms for the word line, bit
`lines, and sense amplifier. In this design the bit line pulls down to about
`2 volts, while the bit-line high level is about 4 volts. During access, the RAM
`cell low value is pulled up to about 1 volt, leaving about 1 volt of margin to the
`switching point of the RAM cell inverter. The sense amplifier can be seen
starting to switch just as the bit lines start diverging. The period between
word-line deassertion and bit nearing -bit is the recovery time (during which no other
word line should be asserted in order to prevent false writes).
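
The resistive-divider estimate for V1 is easy to check numerically. The sketch
below treats the word-access (pass) transistor and the RAM-cell pull-down as
linear-region resistances proportional to 1/beta, so that V1 is V2 scaled by
beta_pass / (beta_pass + beta_pulldown). The beta ratios used are purely
illustrative assumptions, not values from the design simulated in Fig. 8.56,
and the function name is hypothetical.

```python
def cell_low_voltage(v_bit: float, beta_pass: float, beta_pulldown: float) -> float:
    """First-order V1: divider between the pass transistor and the cell pull-down."""
    if beta_pass <= 0 or beta_pulldown <= 0:
        raise ValueError("betas must be positive")
    return v_bit * beta_pass / (beta_pass + beta_pulldown)


if __name__ == "__main__":
    v2 = 2.0  # bit-line low level of about 2 V, as quoted for Fig. 8.56
    # Assumed pull-down-to-pass beta ratios (hypothetical sweep).
    for ratio in (1.0, 2.0, 3.0):
        v1 = cell_low_voltage(v2, beta_pass=1.0, beta_pulldown=ratio)
        print(f"beta_pulldown/beta_pass = {ratio:.0f}: V1 = {v1:.2f} V")
```

A stronger pull-down relative to the pass transistor lowers V1 and so widens
the margin to the cell inverter's switching point, at the cost of a larger cell.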
`
Figure 8.56 Static RAM-read waveforms
[axes: V (volts) versus time; traces for word, -bit, and sense common]
`
Current mode sensing may also be used [29, 30, 31]. In this technique, the
current change in the bit lines is detected using special circuits. The theory is
that by using low-impedance circuits, the RC delay inherent in driving the
bit lines may be decreased.
`
`8.3.1.1.2 Static RAM-write
The objective of the RAM write operation is to apply voltages to the RAM
cell such that it will flip state (a condition we do not desire during the read
operation). Figure 8.57(a) shows a straightforward write circuit. In this circuit,
the write-enable transistors (N1, N2) are enabled to allow the data and
complement to move to the bit lines. The word line is then asserted (actually
the turn-on order is not important). Either the bit or -bit line is driven to VSS,
while the other bit line is driven to a threshold down from VDD. Figure
8.57(b) shows a more detailed view of the situation. The figure shows a zero
stored in the cell. During a WRITE cycle where a one is to be written, node
-Cell has to be pulled below the RAM-cell inverter threshold and at the same
time node Cell has to be pulled above the RAM-cell inverter threshold. In the
former case, n-transistors ND (the driver n-transistor), N1 (the write-access
transistor), and N3 (the word-access transistor) have to pull Pbit (the RAM
inverter pull-up) below the inverter threshold. In addition N5 (the bit-line
pull-up) has to be pulled low by N1 and ND. On the other bit-line side, PD, N2
and N4 have to pull Nbit as high as possible. To augment the write operations
it may be necessary to use complementary write-access transistors, as shown
in Fig. 8.57(c). Correct WRITE operation must be verified over all process,
temperature, and voltage extremes. Figure 8.58 shows a plot of the waveforms
during a WRITE operation. The SPICE circuit used to model the RAM
write operation is shown at the top of the figure. write-data and -write-data
were driven antiphase into the write transistors N2 and N6. The cell switches
when -write-data = 3 V and write-data = 2 V.
`
FIGURE 8.57 Static RAM-write circuits: (a) n-channel pass transistors;
(b) circuit model during write; (c) complementary transmission gate version
[signals labeled: write-data, write, word, bit, -bit, cell, -cell]

Figure 8.58 Static RAM-write waveforms and circuit model: (a) circuit model; (b) waveforms
[axes: V (volts), 0 to 5, versus Vin (-write-data), 0 to 5; traces for bit, -bit,
write-data, and -write-data]
`
`8.3.1.1.3 Row decoders
`
The simplest row decoder is an AND gate. Figure 8.59 shows two straightforward
implementations. The first in Fig. 8.59(a) is a static complementary
NAND gate followed by an inverter. This structure is useful for up to 5-6
inputs or more if speed is not critical. The NAND transistors are usually
made minimum size to reduce the load on the buffered address lines because
there are 2^(n-k) (N_load + P_load)'s on each address line. The second implementation,
shown in Fig. 8.59(b), uses a pseudo-nMOS NOR gate buffered with
two inverters. The NOR gate transistors can be made minimum size, and the
inverters can be scaled appropriately to drive the word line. Large fan-in
AND gates can also be constructed from smaller NAND and NOR gates, as
shown in Fig. 8.59(c). Figure 8.60 shows two possible layout styles (in symbolic
form) for the row decoders. One passes the address lines over the
decode gates, while the other uses a more standard cell style. Choice would
depend on the size of the decoder in relation to the size of the RAM cell.
Often, speed requirements or size restrict the use of single-level decoding,
such as that shown in Fig. 8.59. The alternative is a predecoding scheme,
which is illustrated in Fig. 8.61(a). Here the (n-k) row address lines are split
into a p-bit predecode field and a q-bit direct decode field. The q-bit decode
field requires a gate per word line, so q is chosen to suit the pitch of the RAM
cell. The p-bit predecode field generates 2^p predecode lines (4 in this example),
each of which is fed vertically to 2^(n-k) row decode gates (8 in this example).
Figure 8.61(b) shows a possible implementation of a predecode
scheme, where the predecode gate is a NAND gate and the word-decode gate
is a NOR gate. An additional input (-clk) has been included in the NOR gate
to allow the enabling of the gate, which is necessary to ensure correct timing
of the word signal. A slow rise time and fast fall time on a word-decode gate
might be advantageous because it ensures that any RAM cells on a word line
transitioning low are isolated before RAM cells on a high-transitioning word
line are accessed. Figure 8.61(c) shows a pseudo-nMOS AND row decode
gate. Finally, Fig. 8.62 shows a few more row decoder circuits. Figure
8.62(a) shows some obvious ways of building large fan-in AND gates from
smaller fan-in gates. Figure 8.62(b) is a pseudo-nMOS decoder that minimizes
static power draw. Figure 8.62(c) shows a predecode scheme where
the predecode gates power the word-line driver [32]. Figure 8.62(d) shows a
domino dynamic AND gate implementation.

FIGURE 8.59 Row-decoder circuits: (a) complementary AND gate;
(b) pseudo-nMOS gate; (c) cascaded NAND, NOR gates
[inputs a0, a1; outputs word<0> through word<3>]

Figure 8.60 Typical symbolic layouts of row decoders
[panels (a) and (b); each shows a NAND gate and buffer inverter; annotations:
address lines in poly (could be strapped with metal 2); programming via
transistor placement]
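
The predecoding scheme can be summarized behaviorally: the upper p row-address
bits are decoded once into 2^p shared predecode lines, and each word-line gate
then combines one predecode line with its own match on the lower q bits and
the enable. The sketch below is a logic-level model only. Which address bits
form the predecode field (here the upper bits, suggested by the a<1>, a<2>
labels on the predecode gates in Fig. 8.61) and the function name are
assumptions.

```python
from typing import List


def predecode_word_lines(row_addr: int, p: int, q: int, enable: bool = True) -> List[int]:
    """Return the 2**(p+q) word-line values (one-hot when enabled)."""
    if not 0 <= row_addr < (1 << (p + q)):
        raise ValueError("row address out of range")
    low_mask = (1 << q) - 1
    direct = row_addr & low_mask          # q-bit direct decode field
    pre = row_addr >> q                   # p-bit predecode field

    # Stage 1: shared predecode gates, one output line per value of the p-bit field.
    predecode_lines = [int(i == pre) for i in range(1 << p)]

    # Stage 2: one small gate per word line, fed by one predecode line,
    # the q direct-decode bits, and the enable (the -clk input in Fig. 8.61b).
    words = []
    for w in range(1 << (p + q)):
        line = predecode_lines[w >> q]
        match = int((w & low_mask) == direct)
        words.append(int(bool(enable) and line and match))
    return words


if __name__ == "__main__":
    # Three row-address bits as in the example: p = 2 (4 predecode lines), q = 1.
    print(predecode_word_lines(5, p=2, q=1))
    print(predecode_word_lines(5, p=2, q=1, enable=False))
```

Each word-line gate sees only q + 2 inputs regardless of the total number of
row-address bits, which is what lets the gate fit the pitch of the RAM cell.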
`
`8.3.1.1.4 Column decoders
The column decoder is responsible for selecting 2^k out of 2^m bits of the
accessed row. A tree decoder is shown in Fig. 8.63. Here the data is routed
`
FIGURE 8.61 Predecode circuits: (a) basic approach; (b) actual implementation;
(c) pseudo-nMOS example
[labels: predecode gates on a<1>, a<2>, en; row decode gates on a<0>, -a<0>, -clk,
driving word<0> through word<7>]
`
`via pass gates enabled by the column-address lines. The address decoding is
`in essence distributed. Decoders for bit and -bit lines are shown, although
`one of these may be omitted for single-ended read operations. The read (and,
`usually of lesser importance, write) operations are somewhat delayed by the
`series-transmission gates. However, in comparison with gate delays these
`
`
usually are small for a low number of series transistors (2 to 4). Complementary
transmission gates may also be used, if required, by either the read
operation or write operation.
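
The distributed decoding performed by the tree can be modeled as successive
2:1 selections, one rank of pass gates per column-address bit. The sketch
below is a behavioral model only: it assumes that a<0> controls the rank
nearest the bit lines and that the number of bit lines is a power of two. The
function name is hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def tree_select(bit_lines: Sequence[T], col_addr: int) -> Tuple[T, int]:
    """Route one bit line to the output; also return the number of series pass gates."""
    lines: List[T] = list(bit_lines)
    count = len(lines)
    if count == 0 or count & (count - 1):
        raise ValueError("number of bit lines must be a power of two")
    if not 0 <= col_addr < count:
        raise ValueError("column address out of range")

    rank = 0
    while len(lines) > 1:
        a = (col_addr >> rank) & 1
        # In each pair, the gate driven by a passes the odd line and the gate
        # driven by -a passes the even line; the pair shares one output node.
        lines = [lines[i + a] for i in range(0, len(lines), 2)]
        rank += 1
    return lines[0], rank


if __name__ == "__main__":
    names = [f"bit<{i}>" for i in range(8)]
    print(tree_select(names, 5))
```

For eight bit lines the selected datum passes through three series
transistors, which falls in the 2 to 4 range quoted above.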
`
Figure 8.62 Various other row decoder circuits: (a) methods of building large
fan-in AND gates; (b) power saving pseudo-nMOS gate; (c) decoder powered;
(d) domino
`
`
`
FIGURE 8.63 Tree-style column decoder
[pass gates controlled by a<2:0> and -a<2:0> route the bit and -bit lines
(bit<7>, -bit<7> shown) to selected-data and -selected-data, which go to the
sense amps and write circuits]
`
If the delay of the series-pass gates was troublesome, the decoder shown
in Fig. 8.64 could be used. Here a NAND decoder is employed on a bit-by-bit
basis to enable complementary transmission gates (single transistors may
be used where possible) onto a common pair of data lines. These are then
routed to a sense amplifier and write circuitry.
`
`8.3.1.1.5 Sense amplifiers
`
Many sense amplifiers have been invented to provide faster sensing, smaller
layouts, and lower power-dissipation sensing [33]. The simple inverter sense
amplifier provides for low power sensing at the expense of speed. The differential
sense amplifier can consume a significant amount of DC power
(Fig. 8.54). Alternatively, one can employ clocked sense amplifiers similar
to the SSDL gate shown in Fig. 5.40.
`
`8.3.1.1.6 RAM timing budget
`
`The critical path in a static RAM read cycle includes the clock to address
`delay time, the row address driver time, row decode time, bit-line sense time,
`and the setup time to any data register. The column decode is usually not in
`
Figure 8.64 Decoded column decoder
[bit and -bit line pairs gated by decoders on a<1>, a<0> onto common DATA and
-DATA lines, which go to the sense amp and write circuits]
`
the critical path because the decoder is usually smaller and the decoder has
the row access time and bit-line sense time to operate. The write operation is
usually faster than the read cycle because the bit lines are being actively
driven by larger transistors than the memory cell transistors. However, the
bit lines may have to be allowed to recover to their quiescent values before
any more access cycles take place. In the static load RAM, this speed
depends on the size of the static pull-up. Apart from carefully sizing transistors,
the RAM speed may be increased by pipelining the row decode signal.
`
8.3.1.2 Register Files
`Register files are generally fast RAMs with multiple read and write ports.
`Conventional RAM cells may be made multiported by adding pass transis-
`
`
`
`
`PRINCIPLES OF CMOS VLSI DESIGN:
`A Systems Perspective
`SECOND EDITION
Neil H. E. Weste and Kamran Eshraghian

This popular introduction to CMOS VLSI design has been revised extensively to reflect
changes in the technology and trends in the industry. Covering CMOS design from a digital
systems level to the circuit level, and providing a background in CMOS processing technology,
the book includes both an explanation of basic theory and a guide to good engineering practice.
The material is of use to designers employing gate array, standard cell, or custom design
approaches.
Since the first edition appeared, CMOS technology has assumed a central position in modern
electronic system design. Processes have grown denser, and automated design tools have
become common, leading to far more complex chips operating at much higher speeds. With
these advances, CMOS design approaches have changed, reflected here in greater emphasis
on clocking, power distribution, design margining, and testing.

FEATURES
• New chapter devoted to testing;
• New sections cover emerging technologies, such as BiCMOS, logic synthesis, and parallel
scan testing;
• Numerous and detailed examples, from basic gates to subsystems to chips, illustrate design
concepts and methods;
• Extensive artwork, completely revised and expanded, depicting CMOS schematics, simulation
waveforms, and layouts.

Whether the reader is first learning CMOS system design or looking for a comprehensive ref-