US007603490B2

(12) United States Patent
Biran et al.

(10) Patent No.: US 7,603,490 B2
(45) Date of Patent: Oct. 13, 2009

(54) BARRIER AND INTERRUPT MECHANISM FOR HIGH LATENCY AND OUT OF ORDER DMA DEVICE

(75) Inventors: Giora Biran, Zichron-Yaakov (IL); Luis E. De la Torre, Austin, TX (US); Bernard C. Drerup, Austin, TX (US); Jyoti Gupta, Austin, TX (US); Richard Nicholas, Pflugerville, TX (US)

(73) Assignee: International Business Machines Corporation, Armonk, NY (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 256 days.

(21) Appl. No.: 11/621,776

(22) Filed: Jan. 10, 2007

(65) Prior Publication Data: US 2008/0168191 A1, Jul. 10, 2008

(51) Int. Cl.: G06F 13/28 (2006.01); G06F 13/32 (2006.01)

(52) U.S. Cl.: 710/23; 710/24

(58) Field of Classification Search: 710/23, 710/24. See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

6,848,029 B2       1/2005   Coldewey
6,981,074 B2      12/2005   Oner et al.
7,076,578 B2       7/2006   Poisner et al.
7,218,566 B1       5/2007   Totolos, Jr. et al.
2003/0172208 A1    9/2003   Fidler
2004/0034718 A1    2/2004   Goldenberg et al.
2004/0187122 A1    9/2004   Gosalia et al. ... 718/100
2005/0027902 A1    2/2005   King et al. ... 710/24
2005/0108446 A1    5/2005   Inogai
2006/0206635 A1    9/2006   Alexander et al.
2007/0073915 A1    3/2007   Go et al.

(Continued)

FOREIGN PATENT DOCUMENTS

CN 1794214 A       6/2006

OTHER PUBLICATIONS

U.S. Appl. No. 11/532,562, filed Sep. 18, 2006, Biran et al.

(Continued)

Primary Examiner: Henry W. H. Tsai
Assistant Examiner: Hyun Nam
(74) Attorney, Agent, or Firm: Stephen R. Tkacs; Stephen J. Walder, Jr.; Matthew B. Talpis

(57) ABSTRACT

A direct memory access (DMA) device includes a barrier and interrupt mechanism that allows interrupt and mailbox operations to occur in such a way that ensures correct operation, but still allows for high performance out-of-order data moves to occur whenever possible. Certain descriptors are defined to be "barrier descriptors." When the DMA device encounters a barrier descriptor, it ensures that all of the previous descriptors complete before the barrier descriptor completes. The DMA device further ensures that any interrupt generated by a barrier descriptor will not assert until the data move associated with the barrier descriptor completes. The DMA controller only permits interrupts to be generated by barrier descriptors. The barrier descriptor concept also allows software to embed mailbox completion messages into the scatter/gather linked list of descriptors.

20 Claims, 8 Drawing Sheets
[Representative drawing, FIG. 4: a scatter/gather list of descriptors (401-405) feeds a DMA device; the DMA device issues DMA requests to a bus engine, which performs reads and writes on the bus (440) and returns barrier clear signals.]
U.S. PATENT DOCUMENTS (Continued)

2007/0074091 A1    3/2007   Go et al.
2007/0079185 A1    4/2007   Totolos, Jr.
2007/0162652 A1    7/2007   Go et al.
2007/0204091 A1    8/2007   Hofmann et al.

OTHER PUBLICATIONS (Continued)

U.S. Appl. No. 11/621,789, filed Jan. 10, 2007, Biran et al.

* cited by examiner
[Sheet 1, FIG. 1: block diagram of the data processing system (100), an example of the Cell Broadband Engine, with connections to external buses/devices.]
[Sheet 2, FIG. 2: block diagram of data processing system 200, showing processing unit (202), memory (208), audio adapter (216), SIO (236), buses (238, 240), keyboard and mouse adapter (220), modem (222), disk (226), CD-ROM (230), USB and other ports (232), PCI/PCIe devices (234), and network adapter (212).]

[FIG. 3: block diagram of south bridge 300, showing a processing unit and system memory, local memory with a local memory controller, and a DMA device containing a DMA engine (DE, 312) and a read/write bus engine (BE, 314) attached to bus unit devices.]
[Sheet 3, FIG. 4: a scatter/gather list of descriptors (401-405) is processed by the DMA device; each descriptor becomes one or more DMA requests to the bus engine, which issues reads and writes on the bus and returns barrier clear signals to the DMA engine.]
[Sheet 4, FIGS. 5A and 5B: an example descriptor and example DMA request attributes.]
[Sheet 5, FIG. 6: bus engine queue structure, showing the read queue (602), write queue (606), registers (610), and interface to the DE (615) carrying DE request/attribute and DE acknowledge signals, all connected to the bus interface unit (630).]
[Sheet 6, FIG. 7: flowchart in which the DMA engine waits for an available descriptor, converts the descriptor into one or more DMA requests (704), and, for a barrier descriptor, sets the barrier attribute for the last request of the barrier descriptor (708).]

[FIG. 8: flowchart in which a DMA channel ready to issue a request checks whether the barrier attribute is set and whether a barrier is pending for the channel; if so, it waits for a barrier clear from the BE (808) before issuing the DMA request to the BE and toggling the barrier tag (810).]
[Sheet 7, FIG. 9A: flowchart of the bus engine with barrier tag = 0. The BE receives DMA requests from the queue; a request with barrier tag = 0 and barrier attribute = 1 is placed in a holding register until all barrier tag = 0, barrier attribute = 0 requests have been sent. The BE then issues the held DMA transaction to the bus, sends a barrier clear signal to the DE, and sends an interrupt to software if the interrupt bit is set (continues in FIG. 9B).]
[Sheet 8, FIG. 9B: flowchart of the mirror-image state with barrier tag = 1. A request with barrier tag = 1 and barrier attribute = 1 is held until all barrier tag = 1, barrier attribute = 0 requests have been sent; the BE then issues the held DMA transaction to the bus, sends a barrier clear signal to the DE, sends an interrupt to software if the interrupt bit is set, sets the barrier flag back to 0, and returns to FIG. 9A.]
BARRIER AND INTERRUPT MECHANISM FOR HIGH LATENCY AND OUT OF ORDER DMA DEVICE

BACKGROUND
1. Technical Field

The present application relates generally to an improved data processing system and method. More specifically, the present application is directed to a direct memory access controller with a barrier and interrupt mechanism for a high latency and out of order direct memory access device.

2. Description of Related Art

Many system-on-a-chip (SOC) designs contain a device called a direct memory access (DMA) controller. The purpose of DMA is to efficiently move blocks of data from one location in memory to another. DMA controllers are usually used to move data between system memory and an input/output (I/O) device, but are also used to move data between one region in system memory and another. A DMA controller is called "direct" because a processor is not involved in moving the data.

Without a DMA controller, data blocks may be moved by having a processor copy data piece-by-piece from one memory space to another under software control. This usually is not preferable for large blocks of data. When a processor copies large blocks of data piece-by-piece, it is slow because the processor does not have large memory buffers and must move data in small, inefficient sizes, such as 32 bits at a time. Also, while the processor is doing the copy, it is not free to do other work. Therefore, the processor is tied up until the move is completed. It is more efficient to offload these data block moves to a DMA controller, which can do them much faster and in parallel with other work.

DMA controllers usually have multiple "channels." As used herein, a "channel" is an independent stream of data to be moved by the DMA controller. Thus, DMA controllers may be programmed to perform several block moves on different channels simultaneously, allowing the DMA device to transfer data to or from several I/O devices at the same time.

Another feature that is typical of DMA controllers is a scatter/gather operation. A scatter/gather operation is one in which the DMA controller does not need to be programmed by the processor for each block of data to be moved from some source to some destination. Rather, the processor sets up a descriptor table or descriptor linked list in system memory. A descriptor table or linked list is a set of descriptors. Each descriptor describes a data block move, including source address, destination address, and number of bytes to transfer. Non-scatter/gather block moves, which are programmed via the DMA registers directly, are referred to as "single programming" DMA block moves.
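In rough C, a table-style descriptor of this kind might look like the following sketch. The structure layout and field names are illustrative assumptions, not the format of any particular DMA controller.

#include <stdint.h>

/* One entry of a descriptor table, describing a single data block
 * move. Field names and widths are illustrative assumptions. */
struct dma_descriptor {
    uint64_t src_addr;    /* source address of the block      */
    uint64_t dst_addr;    /* destination address of the block */
    uint32_t byte_count;  /* number of bytes to transfer      */
};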
A linked list architecture of a DMA controller is more flexible and dynamic than the table architecture. In the linked list architecture, the processor refers one of the DMA channels to the first descriptor in the chain, and each descriptor in the linked list contains a pointer to the next descriptor in memory. The descriptors may be anywhere in system memory, and the processor may add onto the list dynamically as the transfers occur. The DMA controller automatically traverses the table or list and executes the data block moves described by each descriptor until the end of the table or list is reached.
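A minimal sketch of this traversal, extending the hypothetical descriptor above with a next pointer; the helper dma_block_move stands in for the hardware block move and is modeled here as a CPU copy:

#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Linked-list descriptor: the table entry above extended with a
 * pointer to the next descriptor in system memory. */
struct dma_list_descriptor {
    uint64_t src_addr;
    uint64_t dst_addr;
    uint32_t byte_count;
    struct dma_list_descriptor *next;  /* NULL marks the end of the list */
};

/* Stand-in for the engine's block move, modeled as a CPU copy. */
static void dma_block_move(uint64_t src, uint64_t dst, uint32_t n)
{
    memcpy((void *)(uintptr_t)dst, (const void *)(uintptr_t)src, n);
}

/* Model of the controller walking the list and executing each
 * described block move in turn. */
static void dma_walk_list(struct dma_list_descriptor *d)
{
    for (; d != NULL; d = d->next)
        dma_block_move(d->src_addr, d->dst_addr, d->byte_count);
}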
Modern DMA devices may be connected to busses that allow read data to be returned out of order. That is, the DMA controller may issue several read transactions to the bus that are all part of the same or different data block moves, and the data may be returned by the target devices in a different order than the order in which the reads were issued. Typically, each read transaction is assigned a "tag" number by the initiator so that when read data comes back from the bus, the initiator will know, based on the tag, to which transaction the data belongs. The transactions queued can be completed in any order. This allows the DMA device to achieve the best performance by queuing many transactions to the bus at once, including queuing different transactions to different devices. The read transactions can complete in any order and their associated writes started immediately when the read data arrives. Allowing the reads and their associated writes to complete in any order achieves the best performance possible on a given bus, but can cause certain problems.
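The tag mechanism can be pictured as a small bookkeeping table on the initiator side. The following is only a sketch under assumed names; bus_issue_read, bus_issue_write, and the 16-tag limit are all hypothetical:

#include <stdint.h>
#include <stdbool.h>

#define MAX_TAGS 16  /* assumed limit on outstanding reads */

/* Hypothetical bus primitives; not a real bus API. */
extern void bus_issue_read(int tag, uint64_t src, uint32_t n);
extern void bus_issue_write(uint64_t dst, const void *data, uint32_t n);

/* Bookkeeping for one outstanding read transaction. */
struct read_slot {
    bool     in_use;
    uint64_t write_addr;   /* where the data goes when it returns */
    uint32_t byte_count;
};

static struct read_slot outstanding[MAX_TAGS];

/* Issue a read: claim a free tag so returning data can be matched
 * to its transaction regardless of completion order. */
static int issue_read(uint64_t src, uint64_t dst, uint32_t n)
{
    for (int tag = 0; tag < MAX_TAGS; tag++) {
        if (!outstanding[tag].in_use) {
            outstanding[tag] = (struct read_slot){ true, dst, n };
            bus_issue_read(tag, src, n);
            return tag;
        }
    }
    return -1;  /* all tags in flight: caller waits for a completion */
}

/* Read data returned with 'tag': start the associated write
 * immediately, in whatever order completions arrive. */
static void on_read_data(int tag, const void *data)
{
    struct read_slot *s = &outstanding[tag];
    bus_issue_write(s->write_addr, data, s->byte_count);
    s->in_use = false;
}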
When system software sets up a large block of memory to be moved between an I/O device and memory or from one region in memory to another, the software will want to know when that block of data has been completely moved so that it can act on the data. Because the processor or some other device may act on the data when the transfer is complete, it is imperative that the interrupt not be generated until all of the data associated with the move has been transferred; otherwise, the processor may try to act on data that is not yet transferred and will, thus, read incorrect data. With out of order execution, a DMA device cannot simply generate an interrupt when the last transaction in a block completes.
Some systems work by having "completion codes" moved to "mailboxes" when a series of data moves have been completed. A mailbox is a messaging device that acts as a first-in-first-out (FIFO) for messages. When the DMA controller delivers messages to the mailbox by writing to the mailbox address, the DMA controller may deliver messages to the processor in order. Messages are typically small, on the order of eight or sixteen bytes. When software sets up a series of block moves in a scatter/gather list, the software can input the completion messages in the descriptor linked list so that the DMA device may move both the data blocks and the completion code messages via the same list of scatter/gather descriptors.
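Reusing the hypothetical descriptor structure sketched earlier, embedding a completion message amounts to appending one more small descriptor whose destination is the mailbox address. The mailbox address and completion code below are made-up values for illustration:

#include <stdint.h>
#include <stddef.h>

#define MAILBOX_ADDR 0xFFFF8000ull   /* assumed mailbox FIFO address */

static uint64_t completion_code = 0xC0DE0001ull;  /* example message */

/* Append a descriptor that writes an 8-byte completion code to the
 * mailbox, so the same scatter/gather list moves the data blocks and
 * then delivers the notification. Uses struct dma_list_descriptor
 * from the earlier sketch. */
static void append_completion(struct dma_list_descriptor *last)
{
    static struct dma_list_descriptor msg;
    msg.src_addr   = (uint64_t)(uintptr_t)&completion_code;
    msg.dst_addr   = MAILBOX_ADDR;
    msg.byte_count = sizeof completion_code;  /* 8 bytes */
    msg.next       = NULL;
    last->next     = &msg;  /* message follows all the data moves */
}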
However, in order for software to work correctly, when the DMA controller writes a completion message to the mailbox, it is imperative that all descriptors prior to the descriptor writing to the mailbox have completed, because the mailbox, like an interrupt, tells the processor that a certain amount of data has been moved. Because all transactions can complete out of order for performance, the DMA device can write a completion message to the mailbox prior to some of the other transactions from previous descriptors having completed, unless there is a mechanism to prevent it.

SUMMARY
In one illustrative embodiment, a method is provided in a direct memory access engine in a direct memory access device for performing a direct memory access block move. The method comprises receiving a direct memory access block move descriptor. The direct memory access block move descriptor indicates a source and a target. The direct memory access block move descriptor is identified as a barrier descriptor. The method further comprises converting the direct memory access block move descriptor into one or more direct memory access requests for the direct memory access block move descriptor, identifying a last direct memory access request within the one or more direct memory access requests, and setting a barrier attribute associated with the last direct memory access request. For each given direct memory access request in the one or more direct memory access requests, the method determines whether the barrier attribute is set for the given direct memory access request, determines whether a barrier is pending for a channel associated with the given direct memory access request if the barrier attribute is set, and issues the given direct memory access request to a bus engine in the direct memory access device if a barrier is not pending for the channel associated with the given direct memory access request.
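Condensed into C, the engine-side flow reads roughly as follows. This is a sketch under assumed names only; the request structure, NUM_CHANNELS, wait_for_barrier_clear, and be_enqueue are all illustrative, not the patent's implementation:

#include <stdbool.h>
#include <stdint.h>

#define NUM_CHANNELS 4  /* assumed channel count */

struct dma_request {
    uint64_t src, dst;
    uint32_t bytes;
    int      channel;
    bool     barrier_attr;   /* set on the last request of a barrier descriptor   */
    bool     interrupt_bit;  /* interrupts are only allowed on barrier descriptors */
};

/* Hypothetical primitives: block until the BE signals barrier clear,
 * and place a request in the BE queue. */
extern void wait_for_barrier_clear(int channel);
extern void be_enqueue(struct dma_request *r);

static bool barrier_pending[NUM_CHANNELS];

/* Handler invoked when the BE's barrier clear signal arrives. */
static void on_barrier_clear(int channel)
{
    barrier_pending[channel] = false;
}

/* DE side: a barrier-attributed request may not issue while a
 * barrier is still pending on its channel. */
static void de_issue(struct dma_request *r)
{
    if (r->barrier_attr) {
        while (barrier_pending[r->channel])
            wait_for_barrier_clear(r->channel);
        barrier_pending[r->channel] = true;  /* a new barrier is now outstanding */
    }
    be_enqueue(r);
}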
In another illustrative embodiment, a method is provided in a bus engine in a direct memory access device for performing a direct memory access block move. The method comprises receiving a direct memory access request from a direct memory access queue. A direct memory access engine in the direct memory access device converts a direct memory access block move descriptor into one or more direct memory access requests, sets a barrier attribute for a last direct memory access request within the one or more direct memory access requests to mark a barrier, and stores the one or more direct memory access requests in the direct memory access queue. The method further comprises determining whether the direct memory access request has a barrier attribute set, determining whether all direct memory access requests before the barrier have completed if the direct memory access request has a barrier attribute set, and holding the direct memory access request from completing if all direct memory access requests before the barrier have not completed.
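The bus-engine side can be sketched the same way, continuing the DE sketch above. This condensed model tracks predecessors with a simple in-flight counter rather than the barrier-tag toggling of FIGS. 9A-9B; issue_to_bus, send_barrier_clear_to_de, and raise_interrupt are placeholder names:

#include <stdbool.h>
#include <stddef.h>

/* Hypothetical primitives for the sketch. */
extern void issue_to_bus(struct dma_request *r);
extern void send_barrier_clear_to_de(int channel);
extern void raise_interrupt(void);

static int in_flight;                    /* requests issued, not yet completed */
static struct dma_request *held = NULL;  /* one-deep holding register          */

/* BE side: a request carrying the barrier attribute is held until
 * every earlier request has completed. */
static void be_receive(struct dma_request *r)
{
    if (r->barrier_attr && in_flight > 0) {
        held = r;            /* park it in the holding register */
        return;
    }
    issue_to_bus(r);
    in_flight++;
}

/* Called once per completed bus transaction. */
static void be_on_completion(void)
{
    if (--in_flight == 0 && held != NULL) {
        struct dma_request *r = held;
        held = NULL;
        issue_to_bus(r);     /* the barrier request may now complete */
        in_flight++;
        send_barrier_clear_to_de(r->channel);
        if (r->interrupt_bit)          /* only barrier descriptors interrupt */
            raise_interrupt();
    }
}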
In yet another illustrative embodiment, a direct memory access device comprises a direct memory access engine and a bus engine. The direct memory access engine is configured to receive a direct memory access block move descriptor. The direct memory access block move descriptor indicates a source and a target. The direct memory access block move descriptor is identified as a barrier descriptor. The direct memory access engine is further configured to convert the direct memory access block move descriptor into one or more direct memory access requests for the direct memory access block move descriptor, identify a last direct memory access request within the one or more direct memory access requests, set a barrier attribute for the last direct memory access request to mark a barrier, and issue the one or more direct memory access requests to a direct memory access queue. The bus engine is configured to receive a direct memory access request from the direct memory access queue, determine whether the received direct memory access request has a barrier attribute set, determine whether all direct memory access requests before the barrier have completed if the received direct memory access request has a barrier attribute set, and hold the received direct memory access request from completing if all direct memory access requests before the barrier have not completed.
These and other features and advantages of the present invention will be described in, or will become apparent to those of ordinary skill in the art in view of, the following detailed description of the exemplary embodiments of the present invention.
BRIEF DESCRIPTION OF THE DRAWINGS

The invention, as well as a preferred mode of use and further objectives and advantages thereof, will best be understood by reference to the following detailed description of illustrative embodiments when read in conjunction with the accompanying drawings, wherein:

FIG. 1 is an exemplary block diagram of a data processing system in which aspects of the illustrative embodiments may be implemented;

FIG. 2 is a block diagram of an exemplary data processing system in which aspects of the illustrative embodiments may be implemented;

FIG. 3 is a block diagram illustrating a south bridge in accordance with an illustrative embodiment;

FIG. 4 illustrates overall operation of a direct memory access device with barrier descriptors in accordance with an illustrative embodiment;

FIG. 5A depicts an example descriptor in accordance with an illustrative embodiment;

FIG. 5B depicts an example of DMA request attributes in accordance with an illustrative embodiment;

FIG. 6 illustrates an overall bus engine queue structure in accordance with one illustrative embodiment;

FIG. 7 is a flowchart illustrating the operation of a direct memory access engine processing descriptors in accordance with an illustrative embodiment;

FIG. 8 is a flowchart illustrating the operation of a direct memory access engine issuing requests to a bus engine in accordance with an illustrative embodiment; and

FIGS. 9A and 9B are flowcharts illustrating the operation of a bus engine enforcing a barrier in accordance with an illustrative embodiment.
DETAILED DESCRIPTION OF THE ILLUSTRATIVE EMBODIMENTS

With reference now to the figures and in particular with reference to FIGS. 1-2, exemplary diagrams of data processing environments are provided in which illustrative embodiments of the present invention may be implemented. It should be appreciated that FIGS. 1-2 are only exemplary and are not intended to assert or imply any limitation with regard to the environments in which aspects or embodiments of the present invention may be implemented. Many modifications to the depicted environments may be made without departing from the spirit and scope of the present invention.

FIG. 1 is an exemplary block diagram of a data processing system in which aspects of the illustrative embodiments may be implemented. The exemplary data processing system shown in FIG. 1 is an example of the Cell Broadband Engine (CBE) data processing system. While the CBE will be used in the description of the preferred embodiments of the present invention, the present invention is not limited to such, as will be readily apparent to those of ordinary skill in the art upon reading the following description.
As shown in FIG. 1, the CBE 100 includes a power processor element (PPE) 110 having a processor (PPU) 116 and its L1 and L2 caches 112 and 114, and multiple synergistic processor elements (SPEs) 120-134 that each has its own synergistic processor unit (SPU) 140-154, memory flow control 155-162, local memory or store (LS) 163-170, and bus interface unit (BIU unit) 180-194, which may be, for example, a combination direct memory access (DMA), memory management unit (MMU), and bus interface unit. A high bandwidth internal element interconnect bus (EIB) 196, a bus interface controller (BIC) 197, and a memory interface controller (MIC) 198 are also provided.

The local memory or local store (LS) 163-170 is a non-coherent addressable portion of a large memory map which, physically, may be provided as small memories coupled to the SPUs 140-154. The local stores 163-170 may be mapped to different address spaces. These address regions are continuous in a non-aliased configuration. A local store 163-170 is associated with its corresponding SPU 140-154 and SPE 120-134 by its address location, such as via the SPU Identification Register, described in greater detail hereafter. Any resource in the system has the ability to read/write from/to the local store 163-170 as long as the local store is not placed in a secure mode of operation, in which case only its associated SPU may access the local store 163-170 or a designated secured portion of the local store 163-170.
The CBE 100 may be a system-on-a-chip such that each of the elements depicted in FIG. 1 may be provided on a single microprocessor chip. Moreover, the CBE 100 is a heterogeneous processing environment in which each of the SPUs may receive different instructions from each of the other SPUs in the system. Moreover, the instruction set for the SPUs is different from that of the PPU, e.g., the PPU may execute Reduced Instruction Set Computer (RISC) based instructions while the SPUs execute vectorized instructions.

The SPEs 120-134 are coupled to each other and to the L2 cache 114 via the EIB 196. In addition, the SPEs 120-134 are coupled to MIC 198 and BIC 197 via the EIB 196. The MIC 198 provides a communication interface to shared memory 199. The BIC 197 provides a communication interface between the CBE 100 and other external buses and devices.

The PPE 110 is a dual threaded PPE 110. The combination of this dual threaded PPE 110 and the eight SPEs 120-134 makes the CBE 100 capable of handling 10 simultaneous threads and over 128 outstanding memory requests. The PPE 110 acts as a controller for the other eight SPEs 120-134, which handle most of the computational workload. The PPE 110 may be used to run conventional operating systems while the SPEs 120-134 perform vectorized floating point code execution, for example.

The SPEs 120-134 comprise a synergistic processing unit (SPU) 140-154, memory flow control units 155-162, local memory or store 163-170, and an interface unit 180-194. The local memory or store 163-170, in one exemplary embodiment, comprises a 256 KB instruction and data memory which is visible to the PPE 110 and can be addressed directly by software.

The PPE 110 may load the SPEs 120-134 with small programs or threads, chaining the SPEs together to handle each step in a complex operation. For example, a set-top box incorporating the CBE 100 may load programs for reading a DVD, video and audio decoding, and display, and the data would be passed off from SPE to SPE until it finally ended up on the output display. At 4 GHz, each SPE 120-134 gives a theoretical 32 GFLOPS of performance, with the PPE 110 having a similar level of performance.

The memory flow control units (MFCs) 155-162 serve as an interface for an SPU to the rest of the system and other elements. The MFCs 155-162 provide the primary mechanism for data transfer, protection, and synchronization between main storage and the local storages 163-170. There is logically an MFC for each SPU in a processor. Some implementations can share resources of a single MFC between multiple SPUs. In such a case, all the facilities and commands defined for the MFC must appear independent to software for each SPU. The effects of sharing an MFC are limited to implementation-dependent facilities and commands.

With reference now to FIG. 2, a block diagram of an exemplary data processing system is shown in which aspects of the illustrative embodiments may be implemented. In the depicted example, data processing system 200 employs a hub architecture including south bridge and input/output (I/O) controller hub (SB/ICH) 204. Processing unit 202 is connected to system memory 208 via memory interface controller (MIC) 210. Processing unit 202 is connected to SB/ICH 204 through bus interface controller (BIC) 206.
In the depicted example, local area network (LAN) adapter 212 connects to SB/ICH 204. Audio adapter 216, keyboard and mouse adapter 220, modem 222, read only memory (ROM) 224, hard disk drive (HDD) 226, CD-ROM drive 230, universal serial bus (USB) ports and other communication ports 232, and PCI/PCIe devices 234 connect to SB/ICH 204 through bus 238 and bus 240. PCI/PCIe devices may include, for example, Ethernet adapters, add-in cards, and PC cards for notebook computers. PCI uses a card bus controller, while PCIe does not. ROM 224 may be, for example, a flash binary input/output system (BIOS).

HDD 226 and CD-ROM drive 230 connect to SB/ICH 204 through bus 240. HDD 226 and CD-ROM drive 230 may use, for example, an integrated drive electronics (IDE) or serial advanced technology attachment (SATA) interface. Super I/O (SIO) device 236 may be connected to SB/ICH 204.

An operating system runs on processing unit 202. The operating system coordinates and provides control of various components within the data processing system 200 in FIG. 2. As a client, the operating system may be a commercially available operating system. An object-oriented programming system, such as the Java programming system, may run in conjunction with the operating system and provides calls to the operating system from Java™ programs or applications executing on data processing system 200 (Java is a trademark of Sun Microsystems, Inc. in the United States, other countries, or both).

As a server, data processing system 200 may be, for example, an IBM® eServer™ pSeries® computer system, running the Advanced Interactive Executive (AIX®) operating system or the LINUX® operating system (eServer, pSeries and AIX are trademarks of International Business Machines Corporation in the United States, other countries, or both, while LINUX is a trademark of Linus Torvalds in the United States, other countries, or both). Data processing system 200 may include a plurality of processors in processing unit 202. Alternatively, a single processor system may be employed.

Instructions for the operating system, the object-oriented programming system, and applications or programs are located on storage devices, such as HDD 226, and may be loaded into main memory 208 for execution by processing unit 202. The processes for illustrative embodiments of the present invention may be performed by processing unit 202 using computer usable program code, which may be located in a memory such as, for example, main memory 208, ROM 224, or in one or more peripheral devices 226 and 230, for example.

A bus system, such as bus 238 or bus 240 as shown in FIG. 2, may be comprised of one or more buses. Of course, the bus system may be implemented using any type of communication fabric or architecture that provides for a transfer of data between different components or devices attached to the fabric or architecture. A communication unit, such as modem 222 or network adapter 212 of FIG. 2, may include one or more devices used to transmit and receive data. A memory may be, for example, main memory 208, ROM 224, or a cache such as found in NB/MCH 202 in FIG. 2.
Those of ordinary skill in the art will appreciate that the hardware in FIGS. 1-2 may vary depending on the implementation. Other internal hardware or peripheral devices, such as flash memory, equivalent non-volatile memory, or optical disk drives and the like, may be used in addition to or in place of the hardware depicted in FIGS. 1-2. Also, the processes of the illustrative embodiments may be applied to a multiprocessor data processing system, other than the SMP system mentioned previously, without departing from the spirit and scope of the present invention.
Moreover, the data processing system 200 may take the form of any of a number of different data processing systems including client computing devices, server computing devices, a tablet computer, laptop computer, telephone or other communication device, a personal digital assistant (PDA), video game console, or the like. In some illustrative examples, data processing system 200 may be a portable computing device which is configured with flash memory to provide non-volatile memory for storing operating system files and/or user-generated data, for example. Essentially, data processing system 200 may be any known or later developed data processing system without architectural limitation.
South bridge 204 may include a direct memory access (DMA) controller. DMA controllers are usually used to move data between system memory and an input/output (I/O) device, but are also used to move data between one region in system memory and another. High latency devices present unique challenges if high bus utilization is desired. When talking to a high latency device, there must be enough simultaneous transactions outstanding so that the time it takes to receive data from the high latency device is less than or equal to the amount of time it takes to transfer the data from all of the other outstanding transactions queued ahead of it. If this criterion is met, then there seldom will be gaps or stalls on the bus where the DMA is waiting for data and does not have any other data available to transfer.
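As a rough worked example of that criterion (all numbers here are made up for illustration): with a 2 microsecond device latency, 128-byte transactions, and a 4 GB/s bus, each transaction occupies 32 ns of bus time, so about 63 transactions must be kept outstanding to cover the latency:

#include <stdio.h>

int main(void)
{
    /* Illustrative, assumed figures; not taken from the patent. */
    double latency_ns       = 2000.0;  /* device read latency         */
    double bytes_per_txn    = 128.0;   /* size of one bus transaction */
    double bus_bytes_per_ns = 4.0;     /* bus bandwidth, i.e. 4 GB/s  */

    double txn_time_ns = bytes_per_txn / bus_bytes_per_ns;  /* 32 ns */

    /* The bus never stalls if latency <= outstanding * txn_time,
     * so round the ratio up. */
    int min_outstanding = (int)((latency_ns + txn_time_ns - 1.0) / txn_time_ns);
    printf("minimum outstanding transactions: %d\n", min_outstanding);  /* 63 */
    return 0;
}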
With trends towards further integration, particularly with systems-on-a-chip, many devices in FIG. 2 may be integrated within south bridge 204. For example, a single bus may be integrated within south bridge 204. Also, controllers and interfaces, such as a USB controller, PCI and PCIe controllers, memory controllers, and the like, may be integrated within south bridge 204 and attached to the internal bus. Furthermore, south bridge 204 may include a memory controller to which a memory module may be connected for local memory. Also note that processing unit 202 may include an internal bus, such as EIB 196 in FIG. 1, through which the DMA device may access system memory 208.
FIG. 3 is a block diagram illustrating a south bridge in accordance with an illustrative embodiment. Processing unit 302, for example, issues DMA commands to bus 320 in south bridge 300. DMA device 310 within south bridge 300 may then execute the DMA commands by performing read operations from source devices, such as bus unit device 322, and write operations to target devices, such as bus unit device 324. In an alternative example, a DMA command may request to move a block of data from bus unit device 322 to system memory 304, or according to yet another example, a DMA command may request to move a block of data from memory 304 to bus unit device 324. Bus unit device 322 and bus unit device 324 may be, for example, memory controllers, USB controllers, PCI controllers, storage device controllers, and the like, or combinations thereof.
The source devices and target devices may include low latency devices, such as memory, and high latency devices, such as hard disk drives. Note, however, that devices that are generally low latency, such as memory devices, may also be high latency in some instances depending upon their locations within the bus and bridge hierarchy. Many of the components of south bridge 300 are not shown for simplicity. A person of ordinary skill in the art will recognize that south bridge 300 will include
