HIGH-PERFORMANCE, OPEN-STANDARD MEMORY

Synchronous-link DRAM, the highest rung yet on DRAM’s evolutionary ladder, is a high-bandwidth interface standard meeting memory requirements into the 21st century.

Peter Gillingham
MOSAID Technologies

Bill Vogley
Texas Instruments

The primary objective of DRAM (dynamic random access memory) is to offer the largest memory capacity at the lowest possible cost. Designers achieve this by two means. First, they optimize the process and the design to minimize die area. Second, they ensure that the device serves high-volume markets and can be mass-produced to achieve the greatest economies of scale.

SLDRAM (synchronous-link DRAM) is a new memory interface specification developed through the cooperative efforts of leading semiconductor memory manufacturers and high-end computer architects and system designers. SLDRAM meets the high data bandwidth requirements of emerging processor architectures and retains the low cost of earlier DRAM interface standards. These and other benefits suggest that SLDRAM will become the mainstream commodity memory of the early 21st century.

Benefits
In developing SLDRAM (see the Origins of SLDRAM box for details), we had several goals beyond high bandwidth and low cost. It was also important that system designers could adopt the new interface smoothly, that it would work for a wide variety of applications, and that its manufacturing yields would be high. Finally, it was important for SLDRAM to be an open standard.

Evolutionary solution. Over more than 20 years encompassing nine generations of DRAMs (from 1 Kbit to 64 Mbits), the market has always embraced evolutionary solutions. With evolutionary changes, system designers can adapt existing technology and migrate to higher performance solutions. As shown in Figure 1, SLDRAM represents the next step in DRAM’s evolution, after EDO, SDRAM, and DDR (extended data out, synchronous, and double-data-rate DRAM).

Figure 1. Evolution from EDO to SLDRAM, by way of SDRAM and DDR. (SDRAM adds clocked inputs, internal banks, and burst I/O; DDR adds both clock edges and a return clock; SLDRAM adds a packet protocol, I/O and timing calibration, and full backward compatibility.)

0272-1732/97/$10.00 © 1997 IEEE
November/December 1997 29
Petitioner Lenovo (United States) Inc. - Ex. 1014
From the companion CD-ROM to the IEEE CS Press book, "The Anatomy of a Microprocessor: A Systems Perspective," by Shriver & Smith
1 of 11

SLDRAM

Origins of SLDRAM
Only recently did the computer industry make the transition from fast-page-mode DRAM to EDO (extended data out) DRAM. Now it has accepted SDRAM (synchronous DRAM) as the standard for mainstream computer applications.

In the early days of SDRAM, as the standards were being developed, both manufacturers and users were concerned about cost. They did not know how much silicon the new functions would consume and how much additional complexity it would take to test and verify them. With SDRAM now fully standardized, these issues have been resolved, and the technology has achieved volume production and commodity-level pricing.

During the development of SDRAM in 1989 and 1990, the SCI working group (IEEE Std. P1596, Scalable Coherent Interface) was developing a memory interface as a subgroup. RamLink, which became IEEE Std. 1596.4 for memory interfaces, is a point-to-point interface using the SCI protocol but with a reduced command set that makes it memory specific. During RamLink’s development, the working group realized that the interface would have to support large memory configurations. The protocol allowed this, and so did the architecture. The drawback was the serial delay from one device to the next in a point-to-point system, which added 4 to 6 ns of latency per device on the interface. If a configuration incorporated 64 devices, the additional latency would be intolerable: around 300 to 400 ns.

It made sense to stick with the RamLink protocol, but we needed a parallel interface. Because the new devices would fundamentally be synchronous DRAMs linked in a different manner, the group chose the name SyncLink. A little over a year ago, we discovered that Synclink had already been registered and used as a trademark. Thus, we instead adopted the name SLDRAM, for synchronous link DRAM. We retained the RamLink protocol but made the interface parallel.

To determine the most efficient method of exchanging address, control, and read and write data between the controller and memory devices, we simulated various bus configurations with actual traces of processor memory requests. In real systems, main memory address requests are virtually random, and the read/write ratio averages about 4:1.

We rejected several configurations, including one with a single, bidirectional bus for address, control, and data. Because data flow had to be interrupted to issue a command in this configuration, effective data bandwidth was a small fraction of theoretical peak bandwidth. The single bus also sacrificed the benefits of hidden bank activation and deactivation.

We also rejected a configuration having one unidirectional bus from controller to memory for command, address, and write data, and another from memory to controller for read data. Although this was an improvement over the single bus, effective bandwidth still suffered from the collision of commands with write data.

The configuration we chose consists of one unidirectional bus for command and address and a bidirectional bus for read and write data. In this optimal configuration, the data bus bandwidth could be fully utilized. Placing the command and address in a packet reduces pin count, allows future address expansion without additional pins, and enables deeper pipelining.

SDRAM includes several important architectural features over standard EDO, including multiple internal banks, a clocked synchronous interface, terminated small-swing signaling, and programmable data bursts. These changes decouple internal DRAM address and control paths from the data interface to achieve higher bandwidth. The emerging standard for DDR adds data clocking on both clock edges and a return clock, allowing higher speed operation with an improved system timing margin.

SLDRAM builds on the features of SDRAM and DDR by adding an address/control packet protocol, in-system timing and signaling optimization, and full compatibility from generation to generation. Command packets in the SLDRAM protocol include spare bits to accommodate addressing for the 4-Gbit generation and beyond.

The first SLDRAM samples will be 64-Mbit devices operating at 400 Mbps/pin. Interfaces with data rates of 600 and 800 Mbps/pin, and eventually greater than 1 Gbps/pin, will come on the market when they become cost-effective. The protocol allows us to mix interfaces of different speeds. For example, an 800-Mbps/pin device plugged into a 400-Mbps/pin system will operate correctly at 400 Mbps/pin.

Application variety. The standards working group specified SLDRAM as a general-purpose, high-performance DRAM for a wide variety of applications. As computer main memory (in desktop and mobile systems and in high-end servers and workstations), SLDRAM offers high sustainable bandwidth, low latency, low power, user upgradability, and support for large hierarchical memory configurations. For video, graphics, and telecommunications applications, SLDRAM provides multiple independent banks, fast read/write bus turnaround, and the capability for small, fully pipelined bursts. SLDRAM addresses the requirements of all major high-volume DRAM applications.
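The speed-mixing rule and the per-pin data rates quoted above support a quick back-of-the-envelope calculation. The Python sketch below is illustrative only: it assumes 16 data bits per device (the 16-bit I/O the article uses in its cost arguments, ignoring the two extra lines of DQ[17:0]) and models the stated rule that a faster device simply runs at the system rate. Rejecting a device slower than the bus is an assumption, and both function names are hypothetical.

```python
def operating_rate(system_mbps: int, device_max_mbps: int) -> int:
    """Per-pin rate a device runs at on an SLDRAM bus (hypothetical helper).

    Models the rule that a faster device runs correctly at the slower
    system rate. Rejecting a device slower than the bus is an assumption.
    """
    if device_max_mbps < system_mbps:
        raise ValueError("device cannot run at this bus rate")
    return system_mbps


def peak_bandwidth_mbytes(rate_mbps: float, data_pins: int = 16) -> float:
    """Peak data bandwidth of one device in Mbytes/s (assumes 16 data pins)."""
    return rate_mbps * data_pins / 8


rate = operating_rate(system_mbps=400, device_max_mbps=800)
print(rate, peak_bandwidth_mbytes(rate))  # 400 800.0
```

Under these assumptions an 800-Mbps/pin part on a 400-Mbps/pin bus delivers 800 Mbytes/s, the same as a native 400-Mbps/pin part.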
Cost-effectiveness. As was the case with SDRAM, industry will adopt a new, high-performance DRAM interface once its cost premium over the conventional alternative falls below 5% (see Figure 2). Designers achieve high performance by improving the interface while leaving the DRAM core relatively unchanged. We could easily have improved core performance with reduced bit-line, word-line, and data-bus loading. However, this would have generated a significant die area penalty because of increased array fragmentation, grossly exceeding the 5% cost target.
SDRAM, DDR, and SLDRAM all use a DRAM core that has a page-mode cycle time of roughly 10 ns. To maintain an efficient die layout and obtain an interface rate higher than the core cycle rate, the device must fetch several words in parallel. For example, a 16-bit I/O, 200-Mbps/pin DDR device transfers 2 bits per pin in each 10-ns core cycle (200 Mbps × 10 ns), so it must read or write two I/O words over a 32-bit internal data path in one core cycle.
`The need to widen the internal data path also leads to a
`die cost penalty. For the 16-Mbit generation of SDRAM (100-
`Mbps/pin, 16-bit I/O), this penalty has recently fallen below
`5%. DDR running at 200 Mbps/pin will become cost-effective
`for 64-Mbit devices, which have a sufficient number of active
`memory subarrays to support a 32-bit data path without sub-
`stantial area penalty. SLDRAM devices will first become avail-
`able with a 400-Mbps/pin, 16-bit I/O interface employing a
`64-bit internal data path. These devices will be cost-effec-
`tive in the 256-Mbit density. Later SLDRAM devices will run
`at 800 Mbps/pin and employ 128-bit internal data paths.
`SLDRAM or any other memory technology using a 128-bit
`internal data path will not be cost-effective until the 1-Gbit
`generation.
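The widths quoted above follow from one relationship: the number of bits each pin must deliver per core cycle (data rate times the roughly 10-ns page-mode cycle) sets how many I/O words the core fetches in parallel. The Python sketch below restates that arithmetic; the helper name is hypothetical, and the 10-ns cycle is the article’s round figure rather than a device specification.

```python
import math

CORE_CYCLE_NS = 10.0  # approximate page-mode core cycle time quoted in the text


def internal_path_bits(io_bits: int, rate_mbps_per_pin: float,
                       core_cycle_ns: float = CORE_CYCLE_NS) -> int:
    """Internal data path width needed to keep the interface busy.

    Each pin must supply rate * cycle bits per core cycle, so the core
    fetches that many I/O words in parallel (rounded up to a whole word).
    """
    bits_per_pin = rate_mbps_per_pin * core_cycle_ns / 1000.0  # Mbps x ns -> bits
    return io_bits * math.ceil(bits_per_pin)


for name, rate in [("SDRAM", 100), ("DDR", 200), ("SLDRAM", 400), ("SLDRAM", 800)]:
    print(f"{name} at {rate} Mbps/pin: {internal_path_bits(16, rate)}-bit internal path")
```

For 16-bit I/O this gives 16, 32, 64, and 128 bits, matching the SDRAM, DDR, first-generation SLDRAM, and 800-Mbps/pin SLDRAM figures in the text.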
Manufacturability. DRAM’s low cost is due to high manufacturing yields, but the tight timing specifications required for high-frequency operation are incompatible with high yield. Therefore, we defined SLDRAM so that virtually any functional part will meet the specification in one speed grade or another. On power-up, a system can ascertain the speed capabilities of all SLDRAMs present and then make the appropriate adjustments. The SLDRAM packet protocol permits the system to adjust and match setup and hold time, data delay, and output drive levels of individual memory devices for consistent system operation. Periodically during operation, the system recalibrates the devices to account for system drift. (We will discuss this in more detail later in the article.) The flexibility afforded by the SLDRAM packet protocol ensures high yield and low cost.

30 IEEE Micro

Figure 2. DRAM bandwidth cost penalty for 16-bit I/O. (Horizontal axes: year, 1995 to 2007; dominant DRAM generation, 16 Mbits, 64 Mbits, 256 Mbits, and 1 Gbit; internal data path, 16, 32, 64, 128, and 256 bits. Vertical axis: cost adder, with a 5% threshold. Curves: SDRAM at 100 Mbps/pin, DDR at 200 Mbps/pin, SLDRAM at 400 Mbps/pin, SLDRAM (RDRAM) at 800 Mbps/pin, and SLDRAM above 1 Gbps/pin.)

Low system cost. SLDRAM achieves low system cost through conventional packaging and printed-circuit-board technology. The SLDRAM devices themselves are packaged in standard, 0.5-mm pitch TSOPs (thin, small-outline packages) or in 0.8-mm, staggered-pitch vertically mounted packages (VSMPs). With buffered modules, an SLDRAM controller must support only 33 high-speed signals to accommodate gigabyte memory configurations. To increase memory bandwidth, wide modules can add 16-bit data channels without additional control overhead. For the SLDRAM interface, we recommend conventional low-cost PCB material with 5-mm tracks, using two of four layers for interconnect.
Open standard. SLDRAM is an open standard that will be formalized by IEEE (Std P1596.7) and JEDEC specifications. Open standards allow manufacturers to develop differentiated products that address emerging applications and niche opportunities. Open competition will ensure rapid development of DRAM technology at the lowest possible cost.

Architectural and functional overview
In this section we describe how SLDRAM works in a system, covering the basics of SLDRAM protocol, timing, and signals. For a full description of SLDRAM functionality, we encourage readers to refer to the SLDRAM data sheet, which is available on the SLDRAM Consortium Web page (www.sldram.com).
Bus topology. The SLDRAM multidrop bus has one memory controller and up to eight loads. A load can be either a single SLDRAM device or a buffered module with many SLDRAM devices. Command, address, and control information from the memory controller flows to the SLDRAMs on the unidirectional CommandLink. Read and write data flow between controller and SLDRAM on the bidirectional DataLink. Both CommandLink and DataLink operate at the same rate (400 Mbps/pin, 600 Mbps/pin, 800 Mbps/pin, and so on). Figure 3 illustrates SLDRAM signals and data flow. In actual SLDRAM modules, all connections are on one side.

Figure 3. SLDRAM bus topology. (The memory controller drives the CommandLink signals RESET*, LINKON, LISTEN, CCLK (free running, 2 lines), FLAG, and CA[9:0] (10 lines) to SLDRAMs or SL modules 1 through 8. The bidirectional, intermittent DataLink carries DQ[17:0] (18 lines), DCLK0 (2 lines), and DCLK1 (2 lines). A serial SO-to-SI chain runs from the controller through each SLDRAM or module and back to the controller.)

Signal names and definitions. The CommandLink comprises signals CCLK, CCLK*, FLAG, CA[9:0], LISTEN, LINKON, and RESET. (See Table 1 for descriptions.) Commands consist of four consecutive 10-bit words on CA[9:0]. A 1 on the FLAG bit indicates the first word of a command. The SLDRAMs use both edges of the differential free-running clock (CCLK/CCLK*) to latch command words. For a 400-Mbps/pin SLDRAM, the clock frequency is 200 MHz, and bit period N is equal to 2.5 ns, half the clock period. While the LISTEN pin is high, the SLDRAMs monitor the CommandLink for commands. When LISTEN is low, there can be no commands on the CommandLink, so the SLDRAMs enter a power-saving standby mode. When LINKON is low, the SLDRAMs enter a shutdown mode, in which CCLK can be turned off to achieve zero power on the link. A RESET signal puts the SLDRAMs in a known state on power-up.

Table 1. SLDRAM signal names and descriptions. MC: memory controller; SSTL_2: series stub terminated logic for 2.x-V generation.
Bus | Signal | Description | Direction | Level
CommandLink | CCLK/CCLK* | Command clock | MC → SLDRAM | SLIO
CommandLink | CA[9:0] | Command address bus | MC → SLDRAM | SLIO
CommandLink | LISTEN | Standby mode | MC → SLDRAM | SLIO
CommandLink | LINKON | Shutdown mode | MC → SLDRAM | LVTTL
CommandLink | RESET | Hard reset | MC → SLDRAM | LVTTL
DataLink | DCLK0/DCLK0* | Data clock 0 | MC ↔ SLDRAM | SLIO
DataLink | DCLK1/DCLK1* | Data clock 1 | MC ↔ SLDRAM | SLIO
DataLink | DQ[17:0] | Data bus | MC ↔ SLDRAM | SLIO
Serial | SI | Serial input | MC → SLDRAM, SLDRAM → SLDRAM | LVTTL
Serial | SO | Serial output | SLDRAM → SLDRAM, SLDRAM → MC | LVTTL

The DataLink is a bidirectional bus for the transmission of write data from controller to the SLDRAMs and read data from the SLDRAMs back to the controller. It consists of DQ[17:0], DCLK0, DCLK0*, DCLK1, and DCLK1*. Read and write data packets, with a minimum burst length of four, are accompanied by either differential clock, DCLK0/DCLK0* or DCLK1/DCLK1*. The two sets of clocks allow control of the DataLink to pass from one device to another with the minimum gap. On power-up, a daisy-chained serial bus with input SI and output SO on each device synchronizes the SLDRAMs and assigns unique IDs to each one.
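The CommandLink timing described above (double-edge clocking, four-word command packets, and the LISTEN and LINKON power modes) can be captured in a few lines. The Python sketch below is a model, not part of the specification: the function names are hypothetical, and treating a low LINKON as overriding LISTEN is an assumption.

```python
def link_timing(rate_mbps_per_pin: float) -> dict:
    """Derived CommandLink timing for a given per-pin rate (times in ns)."""
    n_ns = 1000.0 / rate_mbps_per_pin       # bit period N
    return {
        "N": n_ns,
        "cclk_mhz": rate_mbps_per_pin / 2,  # one bit per CCLK edge, two edges per cycle
        "command_packet": 4 * n_ns,         # four 10-bit words, one word per N
        "burst4": 4 * n_ns,
        "burst8": 8 * n_ns,
    }


def link_state(listen: bool, linkon: bool) -> str:
    """Power state implied by the LISTEN and LINKON pins (assumed priority)."""
    if not linkon:
        return "shutdown"   # CCLK may be stopped
    if not listen:
        return "standby"    # no commands can appear on the CommandLink
    return "active"


print(link_timing(400))
print(link_state(listen=False, linkon=True))
```

At 400 Mbps/pin this reproduces the figures in the text: a 200-MHz CCLK, N = 2.5 ns, and a 10-ns command packet.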
Pipelined transactions. The timing diagram in Figure 4 shows a series of page read and page write commands issued by the memory controller to the SLDRAMs. For purposes of illustration, all burst lengths are 4N, although the controller can dynamically mix 4N and 8N bursts. The read access time to an open bank, also known as page read latency, is shown here as 12N (30 ns). The first two commands are page reads to SLDRAM 0, to either the same or different banks. SLDRAM 0 drives the read data on the data bus along with DCLK0 to provide the memory controller the necessary edges to strobe in read data. Since the first two page read commands are for the same SLDRAM, it is not necessary to insert a gap between the two 4N data bursts. The SLDRAM itself ensures that DCLK0 is driven continuously without glitches.
The data burst for the next page read, to SLDRAM 1, must be separated by a 2N gap. This allows for settling of the DataLink bus and for timing uncertainty between SLDRAM 0 and SLDRAM 1. A 2N gap is necessary any time control of the DataLink passes from one device to another. This occurs in reads to different SLDRAMs and in read-to-write and write-to-read transitions between SLDRAMs and the memory controller. The memory controller creates the gap between data bursts by inserting a 2N gap between commands. SLDRAM 1 begins driving the DCLK lines well in advance of the actual data burst.
The next command is a write command in which the controller drives DCLK0 to strobe write data into SLDRAM 2. The page write latency of the SLDRAM is programmed to equal page read latency minus 2N. To create a 2N gap between Read1 and Write2 data on the DataLink, the Write2 command must be delayed 4N after the Read1 command (2N to offset the shorter write latency plus the 2N gap itself). Programming write latency in this manner creates an open 4N command slot on the CommandLink, which could be used for nondata commands such as row open or close, register write, or refresh, without affecting DataLink utilization. The subsequent read command to SLDRAM 3 does not require any additional delay to achieve the 2N gap on the DataLink. The final burst of three consecutive write commands shows that the 2N gap between data bursts is not necessary when the system is writing to different SLDRAM devices. This is because all write data originates from the memory controller.
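The gap rules above can be checked with a small scheduler that issues each command as early as possible and delays it only when the DataLink needs a 2N turnaround gap. This Python sketch assumes the example’s parameters (4N command packets and bursts, a 12N page read latency, and a write latency 2N shorter) and models only who drives the DataLink, not banks, DCLK selection, or refresh; all names are hypothetical.

```python
from dataclasses import dataclass

CMD_LEN = 4                 # a command packet is four words, i.e. 4N
BURST = 4                   # all bursts are 4N in this example
READ_LAT = 12               # page read latency in N (30 ns at 400 Mbps/pin)
WRITE_LAT = READ_LAT - 2    # page write latency = read latency minus 2N
TURN_GAP = 2                # gap whenever a different driver takes the DataLink


@dataclass
class Slot:
    op: str          # "R" or "W"
    device: int
    command_at: int  # start of the command packet, in N
    data_at: int     # start of the data burst, in N


def schedule(transactions):
    """Issue each (op, device) as early as possible, honoring the 2N rule."""
    slots, next_cmd, data_free, last_driver = [], 0, 0, None
    for op, device in transactions:
        latency = READ_LAT if op == "R" else WRITE_LAT
        driver = device if op == "R" else "controller"  # who drives DQ and DCLK
        earliest_data = data_free
        if last_driver is not None and driver != last_driver:
            earliest_data += TURN_GAP
        command_at = max(next_cmd, earliest_data - latency)
        slots.append(Slot(op, device, command_at, command_at + latency))
        next_cmd = command_at + CMD_LEN
        data_free = command_at + latency + BURST
        last_driver = driver
    return slots


figure4 = [("R", 0), ("R", 0), ("R", 1), ("W", 2),
           ("R", 3), ("W", 4), ("W", 5), ("W", 6)]
for s in schedule(figure4):
    print(f"{s.op}{s.device}: command at {s.command_at}N, "
          f"data {s.data_at}N to {s.data_at + BURST}N")
```

For the Figure 4 sequence this places Write2 at 18N, 4N after the Read1 packet ends at 14N; Read3 needs no extra delay, the three writes run back to back, and the last write burst ends at 52N.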
Figure 4. Timing diagram, SLDRAM bus transactions. (Traces: CCLK, FLAG, CommandLink, DataLink, DCLK0, and DCLK1 over 0N to 52N; commands Read0, Read0, Read1, Write2, Read3, Write4, Write5, and Write6, with the DCLK preambles marked.)

Data clocks. When control of the DataLink passes from one device to another, the bus remains at a midpoint level for nominally 2N. This results in indeterminate data and possibly multiple transitions at the input buffers. This is acceptable for the data lines themselves, but not for the data clocks, which strobe data. To solve this problem, the data clocks have a 00010 preamble before the transition associated with the first bit of data occurs. The device receiving data can enable the DCLK input buffer at any time during the initial 000 period. The preamble also includes a dummy 10 transition to remove pulse-width-dependent skew (also known as intersymbol interference, or ISI) from the DCLK signal. The receiving device ignores the first rising and falling edges of DCLK and begins clocking data on the second rising edge. There are two data clocks, so the device can accommodate gapless 4N write bursts to different SLDRAMs and 4N read bursts from different SLDRAMs. The controller indicates in each command packet which DCLK is to be used.
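The preamble rule can be modeled by sampling DCLK once per bit interval N. In the Python sketch below, which is an illustrative model with hypothetical names rather than a circuit description, DCLK sits at 0 for three intervals, makes the dummy 0-to-1-to-0 excursion, and then toggles once per data bit; the receiver discards the first rising and first falling edge and strobes data on every edge from the second rising edge on.

```python
PREAMBLE = [0, 0, 0, 1, 0]  # DCLK level in each N interval before the data (00010)


def dclk_levels(n_bits: int) -> list[int]:
    """DCLK sampled once per N: preamble, then one transition per data bit."""
    levels = list(PREAMBLE)
    for _ in range(n_bits):
        levels.append(levels[-1] ^ 1)
    return levels


def capture_edges(levels: list[int], enable_at: int = 0) -> list[int]:
    """Intervals at which the receiver strobes data.

    enable_at models turning on the DCLK input buffer somewhere in the
    initial 000 period (intervals 0 to 2).
    """
    edges = [i for i in range(max(enable_at, 1), len(levels))
             if levels[i] != levels[i - 1]]
    return edges[2:]  # skip the dummy rising and falling edges


wave = dclk_levels(4)
print(wave, capture_edges(wave))
```

Because the buffer may be enabled at any point in the 000 period, the result is the same for enable_at = 0, 1, or 2: four strobe edges, the first of which is the second rising edge.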
`The controller transmits CCLK
`edges coincidentally with edges of
`CA[9:0] and FLAG data. DCLK edges
`originating from the controller are
`also coincident with DQ[17:0] data.
`The SLDRAMs add fractional delay
`to incoming CCLK and DCLKs to
`sample commands and write data at
`the optimum time. (We discuss this
`in greater detail in the following sec-
`tion and in the section on synchro-
`nization.) The controller can
`program the SLDRAMs to add frac-
`tional delay to outgoing DCLKs dur-
`ing read operations. This allows the
`controller read data input registers to
`directly strobe in read data using the
`received DCLK without internal
`delay adjustments.
`Timing adjustment. The controller programs each SLDRAM
`with four timing latency parameters: page read, page write,
`bank read, and bank write. We define latency as the time
`between the command burst and the start of the associated
`data burst. For consistent operation of the memory subsystem,
`each SLDRAM should be programmed with the same values.
`On power-up, the memory controller polls the status regis-
`ters in each SLDRAM to determine minimum latencies, which
`may vary by manufacturer and speed grade. The controller
`then programs each SLDRAM with the worst-case values.
`Read latency is adjustable in coarse increments of unit bit
`intervals and fine increments of fractional bit intervals. The
`controller programs the coarse and fine read latency of each
`SLDRAM so that read data bursts from different devices, at
`different electrical distances from the controller, all arrive
`back at the controller with equal delay from the command
`packet. Write latency is only adjustable in coarse increments.
`The write latency values determine when the SLDRAM begins
looking for transitions on DCLK to strobe in write data. Since this can occur any time during the 000 portion of the DCLK preamble, it does not require fine adjustment.

Figure 5. SLDRAM bus timing adjustment system block diagram (a); waveforms (b).
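The power-up policy described above, polling each device’s minimum latencies and programming every SLDRAM with the worst case, reduces to a per-parameter maximum. The Python sketch below assumes the polled values arrive as a dictionary keyed by chip ID; the dictionary layout, the names, and the sample numbers are all hypothetical.

```python
PARAMS = ("page_read", "page_write", "bank_read", "bank_write")


def common_latencies(reported: dict[int, dict[str, int]]) -> dict[str, int]:
    """Worst-case (largest) minimum latency per parameter, in units of N."""
    return {p: max(regs[p] for regs in reported.values()) for p in PARAMS}


# Hypothetical status-register contents for two devices (values in N).
polled = {
    0: {"page_read": 12, "page_write": 10, "bank_read": 20, "bank_write": 18},
    1: {"page_read": 13, "page_write": 10, "bank_read": 19, "bank_write": 18},
}
print(common_latencies(polled))
# {'page_read': 13, 'page_write': 10, 'bank_read': 20, 'bank_write': 18}
```

Each parameter is maximized independently, so the programmed set may combine limits from different devices, as the bank_read and page_read values do here.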
`Figure 5a shows a conceptual block diagram of the
`SLDRAM system with major timing components. A DLL
`(delay-locked loop) in both the controller and SLDRAM locks
`to the free-running clock to provide a stable reference for
`the input sampling delay lines. Input and output latches act-
`ing on both edges of the clock are labeled T.
`Figure 5b shows the timing of two consecutive page read
`commands, the first addressed to the SLDRAM nearest the
`controller and the second to the SLDRAM at the far end of
`the bus. Total latency observed by the controller on the first
`read includes the programmed page read latency (LPR1), the
`clock-to-data propagation delay through the SLDRAM itself
`(td1), and wire delay to the SLDRAM and back (2tw1). The
`clock-to-data propagation delay is composed of intrinsic
`delay (ti1) and the programmed fine vernier delay (tv1). When
an increment vernier command issued by the controller causes the fine read vernier to overflow past 1N delay, digital latency LPR is incremented by one and fine vernier setting tv is reduced to 0. If a decrement vernier command causes the fine read vernier to underflow below 0 delay, digital latency LPR is decremented by one and fine vernier setting tv is set to 1N.

Table 2. SLDRAM command format.
Flag | CA9 | CA8 | CA7 | CA6 | CA5 | CA4 | CA3 | CA2 | CA1 | CA0
1 | ID8 | ID7 | ID6 | ID5 | ID4 | ID3 | ID2 | ID1 | ID0 | CMD5
0 | CMD4 | CMD3 | CMD2 | CMD1 | CMD0 | Bank 2 | Bank 1 | Bank 0 | Row 9 | Row 8
0 | Row 7 | Row 6 | Row 5 | Row 4 | Row 3 | Row 2 | Row 1 | Row 0 | 0 | 0
0 | 0 | 0 | 0 | Col 6 | Col 5 | Col 4 | Col 3 | Col 2 | Col 1 | Col 0
`The wire delay to the SLDRAM at the far end of the bus is
`significantly greater, and the intrinsic delays of the two
`devices may differ considerably. However, we can equalize
`the timing seen by the controller by enforcing the following
`relationship between the verniers on each device:
`
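$$t_{v8} = t_{v1} - t_{i8} + t_{i1} - N\,(L_{PR8} - L_{PR1}) - 2\,(t_{w8} - t_{w1})$$

This follows from requiring equal total read latency at the controller for the two devices:

$$L_{PR1}N + t_{i1} + t_{v1} + 2t_{w1} = L_{PR8}N + t_{i8} + t_{v8} + 2t_{w8}$$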
`
In this way the separation between bursts originating from different devices can be controlled, minimizing bus turnaround delay.
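Numerically, matching a far device to a near one means solving the equal-latency condition for the far device’s total coarse-plus-fine delay and then splitting it into whole bit intervals (LPR8) and a fractional vernier (tv8). The Python sketch below does this; the function name and all sample delays are hypothetical, and keeping the vernier between 0 and just under 1N is an assumption that mirrors the overflow and underflow rule described earlier.

```python
import math


def match_far_device(n_ns, lpr1, ti1, tv1, tw1, ti8, tw8):
    """Coarse latency LPR8 and fine vernier tv8 (ns) that give device 8 the
    same total read delay at the controller as device 1."""
    target = lpr1 * n_ns + ti1 + tv1 + 2 * tw1   # total delay seen for device 1
    needed = target - ti8 - 2 * tw8              # what LPR8*N + tv8 must supply
    lpr8 = math.floor(needed / n_ns)             # whole bit intervals
    tv8 = needed - lpr8 * n_ns                   # remainder, 0 <= tv8 < N
    return lpr8, tv8


# Hypothetical delays in ns for a 400-Mbps/pin bus (N = 2.5 ns).
lpr8, tv8 = match_far_device(n_ns=2.5, lpr1=12, ti1=3.0, tv1=0.5, tw1=0.5,
                             ti8=3.6, tw8=2.0)
print(lpr8, round(tv8, 3))  # 10 1.9
```

The result agrees with the closed-form vernier relation: 0.5 − 3.6 + 3.0 − 2.5(10 − 12) − 2(2.0 − 0.5) = 1.9 ns.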
`SLDRAM command format. Each SLDRAM command
`packet contains four 10-bit words. The command packet
`shown in Table 2 is for a 64-Mbit SLDRAM with eight banks,
`1,024 row addresses, and 128 column addresses. The same
`40 bits can accommodate many other organizations and den-
`sities. On power-up, the memory controller polls the
`SLDRAMs to determine how many banks, rows, and columns
`each device has. The controller can then include the appro-
`priate number of address bits in the command packet for
`each SLDRAM individually.
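Table 2’s layout can be expressed as bit packing. The Python sketch below packs the fields of a command for the 64-Mbit organization (nine ID bits, six command bits, three bank bits, ten row bits, and seven column bits) into four 10-bit words and unpacks them again, leaving unused positions at 0. It illustrates the table only; the function names are hypothetical.

```python
def pack_command(chip_id, cmd, bank, row, col):
    """Return four (FLAG, CA[9:0]) words laid out as in Table 2."""
    limits = {"chip_id": (chip_id, 512), "cmd": (cmd, 64), "bank": (bank, 8),
              "row": (row, 1024), "col": (col, 128)}
    for name, (value, limit) in limits.items():
        if not 0 <= value < limit:
            raise ValueError(f"{name} out of range")
    w1 = (chip_id << 1) | (cmd >> 5)                     # ID8..ID0, CMD5
    w2 = ((cmd & 0x1F) << 5) | (bank << 2) | (row >> 8)  # CMD4..CMD0, Bank 2..0, Row 9..8
    w3 = (row & 0xFF) << 2                               # Row 7..Row 0, two unused bits
    w4 = col                                             # three unused bits, Col 6..Col 0
    return [(1, w1), (0, w2), (0, w3), (0, w4)]


def unpack_command(words):
    """Inverse of pack_command: recover (chip_id, cmd, bank, row, col)."""
    (flag, w1), (_, w2), (_, w3), (_, w4) = words
    if flag != 1:
        raise ValueError("first word must have FLAG = 1")
    chip_id = w1 >> 1
    cmd = ((w1 & 1) << 5) | (w2 >> 5)
    bank = (w2 >> 2) & 0x7
    row = ((w2 & 0x3) << 8) | (w3 >> 2)
    col = w4 & 0x7F
    return chip_id, cmd, bank, row, col


words = pack_command(chip_id=0b100000001, cmd=0b010110, bank=5, row=683, col=85)
print(words)  # [(1, 514), (0, 726), (0, 684), (0, 85)]
```

Only the first word carries FLAG = 1, which is how the SLDRAMs find the start of each packet.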
`Chip ID and multicasting. The first word of the command
`packet contains the chip ID bits. An SLDRAM ignores any
`command that does not match the local ID. On power-up,
`the controller uses the SI/SO signals to assign chip IDs. (We
`describe this in greater detail in the section on initialization.)
`This allows the controller to uniquely address every SLDRAM
`in the system without separate chip-enable signals or glue
logic. The chip ID consists of nine bits. When ID8 is 0, the remaining eight bits select one of up to 256 SLDRAMs on a single hierarchical DataLink; when ID8 is 1, the ID selects a multicast group. Multicasting allows the controller to address any group of two (or four, eight, sixteen, and so on) SLDRAMs with a single command (see Table 3). For example, chip ID 100000001
`addresses devices 0, 1, 2, and 3. This feature is useful for ini-
`tialization, refresh, and multiple DataLink configurations.
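The rows of Table 3 follow a regular pattern, which the Python sketch below reproduces: when ID8 is 1, the number of trailing 1s in ID7 to ID0 sets the group size (2, 4, 8, and so on), and the remaining upper bits select which aligned group. This rule is inferred from the listed rows rather than quoted from the specification, the function names are hypothetical, and the pattern 1 1111 1111, which the rows shown do not cover, is rejected.

```python
def destinations(chip_id: int) -> range:
    """Devices addressed by a nine-bit chip ID, following the rows of Table 3."""
    if not 0 <= chip_id < 512:
        raise ValueError("chip ID is nine bits")
    if (chip_id >> 8) == 0:          # ID8 = 0: one device
        return range(chip_id, chip_id + 1)
    low = chip_id & 0xFF             # ID7..ID0
    ones = 0
    while ones < 8 and (low >> ones) & 1:
        ones += 1                    # count trailing 1s
    if ones == 8:
        raise ValueError("1 1111 1111 is not covered by the rows shown")
    size = 1 << (ones + 1)           # group of 2, 4, 8, ... 256 devices
    base = low & ~(size - 1)         # aligned start of the group
    return range(base, base + size)


def accepts(device_id: int, chip_id: int) -> bool:
    """Would an SLDRAM with this ID act on a command carrying chip_id?"""
    return device_id in destinations(chip_id)


print(list(destinations(0b100000001)))  # [0, 1, 2, 3]
```

The printed result matches the text’s example: chip ID 100000001 addresses devices 0 through 3.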
`SLDRAM commands. The command field consists of six
`bits, as shown in Table 4. When most significant bit CMD5
`is 0, the device executes normal read or write commands.
The selections of page or bank access, burst length, read or write, autoprecharge, and DCLK are independent of one another (each is set by its own bit, CMD4 through CMD0). Bank
`access commands provide bank, row, and column address-
`es to the SLDRAM, which then schedules internal row open
`and page access activity. Page access requires the row to be
`previously opened. Setting CMD5 to 1 selects row opera-
`tions, register accesses, events, or special synchronization
`commands.
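Table 4 can be read as a small decoder. With CMD5 = 0, each of CMD4 to CMD0 independently selects one option of a read or write. With CMD5 = 1, the full six-bit code names a nondata operation. The Python sketch below is a hypothetical decoder built from the table’s rows. It leaves out Drive DCLKs = 1 because the code printed for it (100011) is identical to Register write’s, and it reports any other code as not listed.

```python
NONDATA = {
    0b100001: "Open row",
    0b100010: "Close row",
    0b100011: "Register write",
    0b100111: "Event",
    0b101000: "Read sync",
    0b101010: "Drive DCLKs = 0",
    0b101100: "Write sync",
    0b101110: "Disable DCLKs",
    0b101111: "Toggle DCLKs",
    0b111111: "Command/write sync",
}


def decode_command(cmd: int):
    """Decode the six-bit command field CMD5..CMD0 per Table 4."""
    if not 0 <= cmd < 64:
        raise ValueError("command field is six bits")
    if (cmd >> 5) == 0:  # CMD5 = 0: read or write, option bits independent
        return {
            "access": "bank" if (cmd >> 4) & 1 else "page",
            "burst_length": 8 if (cmd >> 3) & 1 else 4,
            "operation": "write" if (cmd >> 2) & 1 else "read",
            "autoprecharge": bool((cmd >> 1) & 1),
            "dclk": cmd & 1,
        }
    if (cmd >> 1) == 0b10010:  # 1 0 0 1 0 x: register read, CMD0 don't care
        return "Register read"
    return NONDATA.get(cmd, "not listed in Table 4")


print(decode_command(0b010110))
print(decode_command(0b100111))
```

Because the five option bits are independent, all 32 codes with CMD5 = 0 are valid read or write commands, while only specific codes with CMD5 = 1 are defined.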
`
`
`For register operations, the bank, row, and column
`addresses are not required. These are replaced in the com-
`mand packet by register address and register write data. Since
`register writes do not use the DataLink, they can occur before
`write synchronization has completed. Register read data
`appears on the DataLink with DCLK exactly like a normal
`read.
`An event includes commands such as autorefresh, self-
`refresh, reset, and close all rows. Event codes are located in
`the unused address fields of the command packet. SLDRAM
`also implements as events calibration commands such as
`read data fine vernier adjustment, read data DCLK offset
`adjustment, and Voh/Vol output-level adjustment. These
`event commands, along with the special synchronization
`commands, allow the memory controller to initialize and
`maintain the very tight signal timing and voltage level
`requirements of an SLDRAM memory subsystem.
`SLDRAM calibration. System-level calibration of individ-
`ual SLDRAM timing and output drive levels is key to the com-
`ponents’ high manufacturing yields and low cost. Individual
`devices are not required to meet tight ac and dc parametric
`specifications. Rather, these will be calibrated at the system
`level to compensate for wide variation in individual device
`parameters. Calibrated parameters include Voh and Vol levels,
`input setup and hold time, and read and write data latency.
`Synchronization. In normal operation the controller
`drives CA[9:0], FLAG, and CCLK with simultaneous transi-
`tions. The SLDRAM must internally delay CCLK approxi-
`mately N/2 relative to CA[9:0] and FLAG to properly latch
`the command packet. There are many sources of timing
`error in the system, which we can compensate for with
`appropriate adjustment of the input sampling clock. These
`include static errors such as variations in bus impedance,
`loading, or track length, and dynamic errors such as cross
`talk, power supply noise, and intersymbol interference (ISI).
`We can compensate for static errors with a one-time adjust-
`ment of the sampling clock. Cross talk and power supply
`noise symmetrically advance or delay an edge depending on
`the coupling, so they cannot be ameliorated using a static
`timing adjustment. We must instead minimize these effects
through design. ISI, by contrast, produces a significant asymmetric skew between clock and data, so we can compensate for it by appropriately adjusting the internal sampling delay.
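During synchronization the controller sends a 15-bit pseudorandom SYNC sequence, 111101011001000, which is said to contain every 4-bit pattern except 0000. The Python sketch below checks that property. It also shows that the sequence can be produced by a four-stage shift register obeying s[n] = s[n-1] XOR s[n-4] from a 1111 seed. That recurrence is inferred from the printed bits, not taken from the specification, and the function names are hypothetical.

```python
SYNC = "111101011001000"  # SYNC sequence as given in the text


def lfsr_sequence(seed=(1, 1, 1, 1), length=15) -> str:
    """Bits from s[n] = s[n-1] XOR s[n-4] (inferred from the printed bits)."""
    bits = list(seed)
    while len(bits) < length:
        bits.append(bits[-1] ^ bits[-4])
    return "".join(str(b) for b in bits[:length])


def four_bit_windows(seq: str) -> set:
    """Every 4-bit pattern seen while the sequence repeats continuously."""
    wrapped = seq + seq[:3]
    return {wrapped[i:i + 4] for i in range(len(seq))}


def invert(seq: str) -> str:
    return seq.translate(str.maketrans("01", "10"))


patterns = four_bit_windows(SYNC)
print(lfsr_sequence() == SYNC)                            # True
print(len(patterns), "0000" in patterns)                  # 15 False
print(len(patterns | four_bit_windows(invert(SYNC))))     # 16
```

Sending both the inverted and noninverted versions, as the text describes, together covers all 16 four-bit patterns, since the inverted sequence lacks only 1111.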
As bus speeds approach the limit of the transmission media, ISI becomes a significant fraction of a bit interval. When the device transmits a signal with the highest edge rate (01010101), individual 0 and 1 levels will not reach their final steady-state values owing to the filtering effect of the limited-bandwidth channel. The subsequent transition will have a lower amplitude starting point and will therefore cross the midpoint reference sooner than it would in a 00110011 data pattern. Since CCLK makes a transition in every bit interval, but data can have long periods with no transitions, data transitions range from no delay to significant delay with respect to clock transitions. Bus reflections complicate this simple explanation of ISI.
To properly center the input sampling clock in the received data eye during synchronization, the controller transmits inverted and noninverted versions of a 15-bit repeating pseudorandom SYNC sequence, 111101011001000. This pattern contains every possible 4-bit data sequence with the exception of 0000, allowing for long impulse responses. An SLDRAM recognizes this sequence by two consecutive 1s on the FLAG bit. The SLDRAM then determines the optimum internal delay for CCLK to sample incoming bits of the known SYNC pattern at the center of the received eye.

Table 3. Multicasting definition.
Chip ID8 to ID0 | Destination device(s)
0 0 0 0 0 0 0 0 0 | 0
0 0 0 0 0 0 0 0 1 | 1
... | ...
0 1 1 1 1 1 1 1 1 | 255
1 0 0 0 0 0 0 0 0 | 0 through 1
1 0 0 0 0 0 0 0 1 | 0 through 3
1 0 0 0 0 0 0 1 0 | 2 through 3
1 0 0 0 0 0 0 1 1 | 0 through 7
1 0 0 0 0 0 1 0 0 | 4 through 5
1 0 0 0 0 0 1 0 1 | 4 through 7
... | ...
1 0 0 0 0 0 1 1 1 | 0 through 15
... | ...
1 0 0 0 0 1 1 1 1 | 0 through 31
... | ...
1 0 1 1 1 1 1 1 1 | 0 through 255
... | ...

Table 4. SLDRAM command format.
CMD5 to CMD0 | Command
0 0 x* x x x | Page access
0 1 x x x x | Bank access
0 x 0 x x x | Burst length = 4
0 x 1 x x x | Burst length = 8
0 x x 0 x x | Read
0 x x 1 x x | Write
0 x x x 0 x | Leave page open
0 x x x 1 x | Autoprecharge
0 x x x x 0 | Use DCLK0
0 x x x x 1 | Use DCLK1
1 0 0 0 0 1 | Open row
1 0 0 0 1 0 | Close row
1 0 0 0 1 1 | Register write
1 0 0 1 0 x | Register read
1 0 0 1 1 1 | Event
1 0 1 0 0 0 | Read sync
1 0 1 0 1 0 | Drive DCLKs = 0
1 0 0 0 1 1 | Drive DCLKs = 1
1 0 1 1 0 0 | Write sync
1 0 1 1 1 0 | Disable DCLKs
1 0 1 1 1 1 | Toggle DCLKs
1 1 1 1 1 1 | Command/write sync
* x indicates don’t care.

Power-up initialization. When the SLDRAM subsystem is powered up, the controller must take the following steps before normal memory operations can begin:

1. Power up: The controller applies Vcc, Vref, and Vccq. It applies Vterm, the 1.25-V CommandLink and DataLink termination supply, later, to avoid latch-up.
2. Reset: The controller holds the RESET* pin on each SLDRAM low. This clears the SLDRAM’s internal synchronization indication, programs the internal device ID to 255, and sets all read and write latency registers to their minimum values.
3. Synchronization: The controller begins transmitting CCLK, drives both DCLKs with continuous transitions, and sets SO to 1. On DQ[17:0], CA[9:0], and FLAG, the controller repeatedly transmits the 15-bit SYNC sequence. Before the SLDRAM has synchronized, it sets SO to 0. Once the SLDRAM has synchronized and has programmed the appropriate delays for CCLK, DCLK0, and DCLK1, it sets SO equal to SI. The controller stops sending the SYNC pattern when its own SI input is 1, that is, when every SLDRAM in the chain has synchronized. It then resets SO to 0, which ripples through all SLDRAMs.
4. ID assignment: The controller sets SO to 1 once again and sends an ID register write command with write data 0. Only the SLDRAM with SI set to 1 and ID set to 255 responds to this command. This SLDRAM overwrites its ID register with the value 0 and sets SO to 1. The controller repeats the ID register write with write data 1, and so on, until it observes a high logic level on its own SI input.
5. Voh, Vol calibration: To calibrate each SLDRAM’s I/O levels, the controller sends each device Drive DCLKs = 0 or Drive DCLKs = 1 commands. It then issues increment/decrement Voh/Vol event commands until the levels match the controller’s own reference.
6. Read synchronization: The controller issues a read sync command to each SLDRAM, which then transmits continuous transitions on both DCLKs and the 15-bit SYNC pattern on DQ[17:0]. The controller then adjusts the delay of DCLK edges relative to DQ edges, using the increment/decrement DCLK offset event command, to sample the synchronization pattern at the optimum point.
7. Read latency calibration: After RESET, the controller sets SLDRAM read and write latencies to their minimum values. For each SLDRAM, the controller issues a Drive DCLKs = 0 command followed by a read command. To measure latency, the controller monitors the specified DCLK for the transitions that accompany the data burst. Once the controller has measured minimum latencies for