United States Patent [19]
Ekanadham et al.

[11] Patent Number: 6,085,295
[45] Date of Patent: *Jul. 4, 2000

[54] METHOD OF MAINTAINING DATA COHERENCY IN A COMPUTER SYSTEM HAVING A PLURALITY OF INTERCONNECTED NODES

[75] Inventors: Kattamuri Ekanadham, Mohegan Lake; Beng-Hong Lim, Wappingers Falls; Pratap Chandra Pattnaik, Ossining; Marc Snir, Briarcliff Manor, all of N.Y.

[73] Assignee: International Business Machines Corporation, Armonk, N.Y.

[*] Notice: This patent issued on a continued prosecution application filed under 37 CFR 1.53(d), and is subject to the twenty year patent term provisions of 35 U.S.C. 154(a)(2).

[21] Appl. No.: 08/954,496

[22] Filed: Oct. 20, 1997

[51] Int. Cl.7 ........................ G06F 12/16
[52] U.S. Cl. ........................ 711/145; 711/121; 711/143; 711/144; 711/153; 711/156; 711/173; 709/214; 709/215; 709/218; 709/219; 710/129; 710/131
[58] Field of Search ................ 711/121, 141, 144, 145, 148, 143, 153, 156, 173; 709/214, 218, 219, 215; 710/129, 131

[56] References Cited

U.S. PATENT DOCUMENTS

5,535,365   7/1996  Barriuso et al. ............ 711/155
5,749,095   5/1998  Hagersten .................. 711/141
5,787,476   7/1998  Laudon et al. .............. 711/141
5,829,035  10/1998  James et al. ............... 711/141

Primary Examiner—Hiep T. Nguyen
Attorney, Agent, or Firm—Douglas W. Cameron

[57] ABSTRACT

A method of providing coherent shared memory access among a plurality of shared memory multiprocessor nodes. For each line of data in each of the nodes, a list of those processors of the node that have copies of the line in their caches is maintained. If a memory command is issued from a processor of one node, and if the command is directed to a line of memory of another node, then the memory command is sent directly to an adapter of the one node. When the adapter receives the command, it forwards the command from the one adapter to another adapter of the other node. When the other adapter receives the command, the command is forwarded to the local memory of the other node. The list of processors is then updated in the local memory of the other node to include or exclude the other adapter depending on the command. If the memory command is issued from one of the processors of one of the nodes, and if the command is directed to a line of memory of the one node, then the command is sent directly to local memory. When the local memory receives the command, and if the adapter of the node is in the list of processors for a line associated with the command and if the command is a write command, then the command is forwarded to the adapter of the one node. When the adapter receives the command, the command is forwarded to remote adapters in each of the remote nodes which have processors which have cache copies of the line. Finally, when the latter remote adapters receive the command, the command is forwarded to the processors having the cache copies of the line.

7 Claims, 8 Drawing Sheets

[Representative drawing: NODE 1 and NODE 2, with the adapter roles PROXY PROCESSOR and PROXY MEMORY]
[Drawing Sheet 1 of 8: FIG. 1 (PRIOR ART)]
[Drawing Sheet 2 of 8: FIG. 2 (PRIOR ART): memory M with directory 26 and per-line lists of local processors (24-x)]
[Drawing Sheet 3 of 8: FIGS. 3A-3C: memory access scenarios between NODE 1 and NODE 2, with the adapter A acting as a proxy processor or a proxy memory]
[Drawing Sheet 4 of 8: FIG. 4: adapter internals, with local processor lists 42, node lists 41, line states 43, finite state machines 40-1 and 40-2, and connections to the local switch and to the network]
[Drawing Sheet 5 of 8: FIG. 5: CC-NUMA state diagrams for the adapter acting as a proxy processor (home, NODE 1) and as a proxy memory (client, NODE 2); directory samples 24-1 through 24-6, with each entry holding no pointer, a pointer in shared state, or a pointer in exclusive state]
[Drawing Sheet 6 of 8: FIG. 6: S-COMA state diagrams, with the adapters at the home node (NODE 1) and the client node (NODE 2) both acting as proxy processors; the home adapter starts in state I, the client adapter in state E]
[Drawing Sheet 7 of 8: FIG. 7: directory organization in the memory controller of a switch-based SMP, with directory entries dynamically allocated for each line cached by some processor]
[Drawing Sheet 8 of 8: FIG. 8: modified memory module with 2 VA bits per memory line, reducing the directory space requirements]
METHOD OF MAINTAINING DATA COHERENCY IN A COMPUTER SYSTEM HAVING A PLURALITY OF INTERCONNECTED NODES

DESCRIPTION

1. Technical Field

This invention relates to a method of providing cache coherence in a shared memory system composed of a network of multiprocessor nodes.

2. Description of the Prior Art

A shared-memory multiprocessor system, comprised of a plurality of processing nodes with memory and caches, provides system-wide access to the memory in the system. In the near future, each node of such parallel systems will likely be a small cache-coherent multiprocessor, e.g., a symmetric multiprocessor (SMP), that consists of a small number (8 to 16) of slots connected by a bus or switch. Each slot can be occupied by a processor or a memory module. Each processor in the node can access any memory location in the node.

Technology considerations limit the size of an SMP node to a small number of processors. A method for building a shared-memory multiprocessor with a larger number of processors is to connect a number of SMP nodes with a network, and provide an adapter to extend the SMP's memory across the SMP nodes (see FIG. 1). Existing adapter designs plug into the memory bus of bus-based SMP nodes and collectively provide shared memory across the system, so that any processor in any node can access any location in any memory module in the system. Resources within a node are termed local and resources on other nodes are termed remote.

The adapter maintains a directory of all nodes sharing a line and monitors local accesses to the line in order to ensure coherence across nodes. On bus-based SMPs, the monitoring is straightforward. All address transactions appear on the bus, and the adapter can snoop and respond to them. Thus, it is possible to design the adapter without having to make any changes to the SMP hardware, provided that the adapter can be connected to the bus as a master/slave device.

However, as the size and speed of an SMP node grow, technology limitations force the transition from bus-based to switch-based interconnects for both address and data transactions within the node. The design of an adapter for switch-based systems is complicated by the fact that a switch-based system uses a point-to-point interconnection that, unlike a bus, does not allow an adapter to observe all address transactions. In a switch-based SMP, the memory M maintains a directory 26. For each line 25 the directory keeps a list 24-x of the processors within the node that have cached copies of the line (see FIG. 2), where x is an integer between 1 and n, and n is the number of lines 25. See for example one of the lines 25 and its list 24-4 at the bottom of FIG. 2. It is understood that memory M can be any one of the memories M1 through MN.

Each processor communicates directly with the memory via the switch. In turn, the memory sends appropriate messages only to processors that need to be involved in the cache coherence protocol, and the memory has no knowledge of the adapter.

There is, therefore, a need for an adapter which extends shared memory access across multiple switch-based multiprocessor nodes. Such an adapter must not rely on a broadcast of memory commands within a multiprocessor node, and must not require changes to the existing memory controller of a multiprocessor node.

SUMMARY OF THE INVENTION

It is therefore an objective of this invention to provide a solution to the problem of providing shared-memory access across multiple switch-based SMP nodes.

The invention comprises an adapter to extend cache-coherent memory access across SMP nodes, and a method for using the memory system of a switch-based SMP node to interface to the adapter.

The key concepts in the adapter design are:

All communications between the processors, the memories and the adapters are made point to point, without the need for broadcasts within the SMP node.

In a node where a line is mapped onto the node's memory, the adapter acts as a proxy processor representing all the processors outside the node that share the line. More specifically, when a remote processor issues a memory command to a local memory, the remote adapter at the remote processor's node forwards the memory command to the local adapter, which is responsible for ensuring that the command is executed at the local memory.

In a node where a remote line is brought into the cache of a processor, but not into the node's memory, the adapter acts as a proxy memory representing the remote memory that the line is mapped onto. More specifically, when a memory command is issued from a local processor to a remote memory, the memory command is directed to the adapter, which is responsible for ensuring that the command is executed at that remote memory.

The adapter is versatile enough to be used for either CC-NUMA (Cache Coherent Non-Uniform Memory Access) or S-COMA (Simple Cache Only Memory Architecture) systems.

By appearing as either a local processor or a local memory, the adapter uses the local SMP coherence protocol within a node to accomplish the above tasks, without any changes to the memory controllers.

In situations where the memory controller is limited in the amount of storage it can use for the directory and must employ a dynamic allocation scheme for the directory entries, an extension of this invention involves a modification to the memory controller that overcomes the storage limitation of the memory controller.

Accordingly, this invention provides coherent shared memory access across a number of interconnected multiprocessor nodes. With this invention each line of data in each memory of the node maintains a list of processors of the node that have copies of the line in their caches. When a memory command is issued from one of the processors of a node to a memory of another node, the command is directed to an adapter of the issuing node. The adapter of the issuing node then receives the command and forwards the command to the adapter at the remote node. When the adapter at the remote node receives the command, it then forwards the command to its local memory, which then updates its list of processors to include or exclude the adapter. However, if the memory command is issued from a processor to a local memory, then the command is simply forwarded directly to that local memory. When that local memory receives the command, and if an adapter is in the list of processors for the line in the memory command, then the command is forwarded to that adapter. The adapter then forwards the command to the other adapters of remote nodes
which have cache copies of the line corresponding to the command. If the adapter is not found in the list, then the memory proceeds in accordance with the standard SMP protocol.
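To make the routing rule above concrete, the following is a minimal C sketch of the per-line sharer list and the two forwarding decisions. All names, sizes, and helper structure here are illustrative assumptions made for this edition, not part of the patented design.

/*
 * Illustrative sketch only: the per-line sharer list kept by each memory
 * and the routing rule described in the summary above.
 */
#include <stdbool.h>
#include <stdio.h>

#define NUM_PROCS  4            /* assumed processors per node            */
#define ADAPTER_ID NUM_PROCS    /* the adapter occupies one extra slot    */

typedef struct {
    bool sharer[NUM_PROCS + 1]; /* true => holds a cached copy of the line */
} line_entry;

typedef enum { ROUTE_TO_LOCAL_MEMORY, ROUTE_TO_LOCAL_ADAPTER } route_t;

/* A command to a line homed on another node goes to the local adapter,
 * which forwards it to the adapter of the home node; a command to a line
 * homed locally goes straight to the local memory. */
static route_t route_command(int issuing_node, int home_node)
{
    return (home_node == issuing_node) ? ROUTE_TO_LOCAL_MEMORY
                                       : ROUTE_TO_LOCAL_ADAPTER;
}

/* At the home memory: the adapter is involved only if it appears in the
 * sharer list for the line (here, only write commands are forwarded). */
static bool memory_must_notify_adapter(const line_entry *e, bool is_write)
{
    return is_write && e->sharer[ADAPTER_ID];
}

int main(void)
{
    line_entry e = { .sharer = { false, false, false, false, true } };
    printf("route: %d\n", route_command(0, 1));                  /* remote line */
    printf("notify adapter: %d\n", memory_must_notify_adapter(&e, true));
    return 0;
}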
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 schematically illustrates the shared memory multiprocessor system composed of multiple SMP nodes connected via a network.

FIG. 2 schematically illustrates a memory module in a switch-based SMP, composed of a set of lines and a directory that maintains a list of processors sharing the line within the SMP.

FIGS. 3A-3C schematically illustrate several memory access scenarios in the system where the adapter acts either as a proxy processor or a proxy memory.

FIG. 4 schematically illustrates the major components of the adapter that are pertinent to this invention.

FIG. 5 schematically illustrates the possible directory states for a memory line and the state transitions followed by the finite state machine in the adapter.

FIG. 6 schematically illustrates the state transitions followed by the finite state machine in the adapter for implementing an S-COMA system.

FIG. 7 schematically illustrates a directory organization in a memory module in a switch-based SMP where directory entries are dynamically allocated for each line that is cached by some processor.

FIG. 8 schematically illustrates a modification to a memory module in a switch-based SMP that reduces the directory space requirements.
DESCRIPTION OF THE PREFERRED EMBODIMENT
The preferred embodiment of our system is based on a network of switch-based SMP nodes with an adapter attached to each node. FIG. 1 illustrates a high-level diagram of such a multiprocessing system. Each node has a plurality of processors P1, P2, . . . , PN interconnected to each other by a switch. The switch also interconnects the memory modules M1, M2, . . . , MN and adapters A. The nodes, in turn, are connected to each other through a network as shown.

The processors and memories maintain cache coherence of data accesses within the SMP node.

The adapter connects to the switch and plays the role of either a memory or a processor. The behavior of the adapter is different for different memory lines. When a line is homed at the local memory of the node, the adapter behaves as a proxy processor for that line. When a line is homed at the memory of a remote node, the adapter behaves as a proxy memory for that line. These roles are illustrated in FIGS. 3A-3C and are elaborated further below.

FIG. 4 illustrates the internal structure of the adapter that enables it to extend cache-coherent shared memory across multiple nodes. It comprises a set of node lists 41 and local processor lists 42. List 41 is maintained for each line of local memory that is cached at a remote node, and list 42 is maintained for each line of remote memory that is cached by a local processor. The adapter also maintains a 2-bit line state directory 43 for lines that are cached by the local processors. The finite state machines (FSM) 40-1 and 40-2 run the cache coherence protocol to keep the copies of the lines coherent. When the adapter acts as a proxy processor for a line, it uses the node
list associated with the line to determine the remote nodes that need to be notified for coherence actions. When the adapter acts as a proxy memory for a line, it uses the local processor list associated with the line to determine the local processors that need to be notified for coherence actions.
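A plausible C rendering of these adapter structures is sketched below; the field names, bitmap widths, and table sizes are assumptions made for illustration and are not drawn from the patent.

/* Hypothetical C rendering of the adapter structures of FIG. 4. */
#include <stdint.h>

#define MAX_NODES      8     /* assumed number of nodes in the system   */
#define MAX_LOCAL_PROC 16    /* assumed processors per node             */

typedef enum { LINE_I = 0, LINE_S = 1, LINE_E = 2 } line_state;  /* fits the 2-bit state 43 */

/* Node list 41: one per line of LOCAL memory that is cached at a remote node. */
typedef struct {
    uint64_t line_addr;
    uint8_t  node_bitmap;    /* bit n set => node n has a cached copy */
} node_list_entry;

/* Local processor list 42: one per line of REMOTE memory cached by a local processor. */
typedef struct {
    uint64_t   line_addr;
    uint16_t   proc_bitmap;  /* bit p set => local processor p has a cached copy */
    line_state state;        /* 2-bit line state directory 43 */
} local_proc_entry;

typedef struct {
    node_list_entry  node_lists[1024];   /* used when acting as proxy processor (FSM 40-1) */
    local_proc_entry local_lists[1024];  /* used when acting as proxy memory   (FSM 40-2) */
} adapter_state;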
Proxy Processor

In a node in which a line is homed in the local memory, the adapter plays the role of a proxy processor representing the accesses to the line made by the processors in other nodes of the system. In this role, the adapter maintains a state for the line and the list of all nodes sharing that line. The state can be I (indicating that no other node has this line), E (indicating that some other node has an exclusive copy of this line) or S (indicating that this line is shared by this and other nodes). As a proxy processor, the adapter receives requests from other adapters and performs the reads and writes in this node on their behalf. Whenever a local processor requires exclusive access to the line while it is in shared state, the adapter communicates with other adapters and invalidates the line in all other nodes. When another node requests an exclusive copy of the line, the adapter not only invalidates the copies in all other nodes, but also requests the local memory to grant the exclusive access. The memory controller treats the adapter as another processor.
Proxy Memory

In a node in which a line is homed at a remote memory, the adapter acts as a proxy memory. It captures all the transactions for the corresponding address and runs the memory protocol. In this role, the adapter maintains a state for the line and the list of local caches sharing the line. The state can be I (indicating that no local cache has this line), E (indicating that some local cache has an exclusive copy of this line) or S (indicating that this line is shared by this and other nodes). As a proxy memory, the adapter responds to all requests to the line, obtains the contents of the line from the remote node (where that line is backed by memory), and supplies the contents to the local caches. It performs the usual coherence control operations in the node and coordinates with other adapters. In order to maintain global coherence, it may have to issue some bus transactions as a master, as illustrated later.
FIGS. 3A-3C present examples of memory accesses that involve the adapter as a proxy processor or memory. In FIG. 3(a), a processor issues a memory command to a line that is mapped to a remote memory. The adapter in the issuing processor's node acts as a proxy memory and retrieves the line from the remote memory. It sends a message to the adapter at the remote memory's node requesting that the memory command be performed there. The adapter at the remote memory's node acts as a proxy processor and issues the original memory command at the remote node.

In FIG. 3(b), a processor issues a memory command to a line that is mapped to local memory. The directory in the local memory indicates that the line is cached at a processor, which is in fact the adapter playing the role of a proxy processor. The line needs to be invalidated, hence the local memory sends a command directly to the adapter. The adapter consults its node list for the line to determine which remote nodes to forward the invalidations to, and sends a message to all such nodes. The adapter at each remote node acts as a proxy processor, consults its local processor list, and issues invalidation commands over the switch to each of the processors in the list.

In FIG. 3(c), a processor issues a memory command that can be satisfied locally without involving remote nodes. In this case, the directory in the local memory does not point to the adapter so that the adapter is not involved in the
memory command. Contrast this situation with that of a bus-based SMP system where the adapter still has to participate in the memory command even if the command can be satisfied locally.
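The invalidation fan-out of FIG. 3(b) can be traced with a small, hypothetical C sketch; the bitmap representation and the function names are invented for illustration and follow the structures sketched earlier.

/* Illustrative trace of the FIG. 3(b) invalidation fan-out. */
#include <stdint.h>
#include <stdio.h>

static void invalidate_processor(int node, int proc) {
    printf("node %d: invalidate processor P%d\n", node, proc);
}

/* Remote adapter, acting as proxy processor: invalidate every local sharer. */
static void remote_adapter_invalidate(int node, uint16_t local_proc_bitmap) {
    for (int p = 0; p < 16; p++)
        if (local_proc_bitmap & (1u << p))
            invalidate_processor(node, p);
}

/* Home adapter: consult the node list for the line and notify each sharing node. */
static void home_adapter_invalidate(uint8_t node_bitmap, const uint16_t proc_bitmaps[8]) {
    for (int n = 0; n < 8; n++)
        if (node_bitmap & (1u << n))
            remote_adapter_invalidate(n, proc_bitmaps[n]);
}

int main(void) {
    uint16_t proc_bitmaps[8] = { 0 };
    proc_bitmaps[2] = 0x0005;                        /* example: node 2, P0 and P2 hold copies */
    home_adapter_invalidate(1u << 2, proc_bitmaps);  /* only node 2 shares the line */
    return 0;
}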
We now elaborate on the two roles of the adapter for CC-NUMA and S-COMA systems.

CC-NUMA

In a CC-NUMA system, the adapters behave as shown in FIG. 5. The figure shows the adapter in node 1 acting as a proxy processor and the adapter in node 2 acting as a proxy memory for a line that is backed up by a memory in node 1. The state diagrams show how each adapter keeps track of the state of the line when handling local requests (shown by solid arcs) and remote requests (shown by dashed arcs).

Each line has a single node, called its home node, where it is backed up in the memory. All other nodes, called client nodes, can only have copies of it in their caches, but not in their memories. The home node adapter for a line acts as a proxy processor for that line, and a client node adapter acts as a proxy memory for that line. The actions of the adapters in the two nodes are summarized below.
In the FSM 40-1 at the top left of FIG. 5 at the home node (NODE 1):

1. The memory controller maintains a directory for the line, containing the list of processors within that node that share (or have a cache copy of) the line. The adapter is treated as just another processor sharing the line. FIG. 5 shows different samples of the directory 24, indicated by 24-1 through 24-6, of a four-entry list of processors. Sample 24-1 illustrates the case where the list of processors is empty. An empty list indicates that no processors have cached the line. In sample 24-2, two local processors have copies of the line in their caches, as indicated by the boxes having x's inside. Sample 24-3 indicates that only a single processor has an exclusive copy of the line in its cache. See the solid black box in the list of processors 24-3. In sample 24-4, the list of processors includes the local adapter, which means that the adapter is acting as a proxy processor for that line. Since the adapter is acting as a proxy processor, at least one remote processor must have a copy of the line in its cache. In sample 24-5 the list of processors includes both the adapter and a local processor. Thus, list 24-5 indicates that a local processor and at least one remote processor have copies of the line in their caches. Finally, sample 24-6 indicates that the local adapter has an exclusive copy of the line, meaning that a remote processor has an exclusive copy of the line in its cache. (A minimal encoding sketch for such a directory entry follows this list.)
2. The adapter maintains the state of the line (see top left side of FIG. 5). The state I indicates that no other node has this line, state E indicates that some other node has an exclusive copy of this line, and state S indicates that this line is shared by this and other nodes.

3. The adapter maintains the list of nodes that share the line.
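As noted above, here is one possible, hypothetical encoding of such a four-entry directory sample in C, using two bits per entry to distinguish the no-pointer, shared-pointer, and exclusive-pointer cases shown in FIG. 5. The layout is an assumption for illustration, not the patent's own encoding.

/* Illustrative encoding of directory samples 24-1 through 24-6. */
typedef enum { PTR_NONE = 0, PTR_SHARED = 1, PTR_EXCLUSIVE = 2 } ptr_state;

typedef struct {
    ptr_state entry[4];   /* entries P, P, P, A: three processors plus the adapter */
} dir_sample;

/* Sample 24-4: some remote caches have a shared copy, so the adapter (last
 * entry) is recorded as a sharer while the local processors hold nothing. */
const dir_sample sample_24_4 = {
    { PTR_NONE, PTR_NONE, PTR_NONE, PTR_SHARED }
};

/* Sample 24-6: a remote cache holds the line exclusively, so the adapter is
 * recorded with an exclusive pointer. */
const dir_sample sample_24_6 = {
    { PTR_NONE, PTR_NONE, PTR_NONE, PTR_EXCLUSIVE }
};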
The following description of state transitions uses the format

current state → [memory command / action response] → next state

where 'rwitm' stands for 'read with intent to modify'. The "current owner" refers to the processor with the exclusive cache copy of the line.

Transitions (see state diagram on top left of FIG. 5):

I → [remote read / read locally and send data] → S
I → [remote write / rwitm locally and send data] → E
S → [remote read / read locally and send data] → S
S → [remote write / rwitm locally, invalidate in other nodes and send data] → E
S → [local write / invalidate in all other nodes] → I
E → [remote read / send read to current owner, send data] → S
E → [remote write / send rwitm to current owner, send data] → E
E → [local read / send read to current owner, supply data to local] → S
E → [local write / send rwitm to current owner, supply data to local] → I
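These transitions map directly onto a small finite state machine. The C sketch below is an illustrative rendering of FSM 40-1 at the home node; the enum and function names are invented, and the action strings simply echo the transition list above.

/* Illustrative sketch of the home-node (proxy processor) FSM 40-1. */
#include <stdio.h>

typedef enum { ST_I, ST_S, ST_E } state_t;
typedef enum { LOCAL_READ, LOCAL_WRITE, REMOTE_READ, REMOTE_WRITE } event_t;

static state_t fsm_40_1(state_t s, event_t ev) {
    switch (s) {
    case ST_I:
        if (ev == REMOTE_READ)  { puts("read locally and send data");  return ST_S; }
        if (ev == REMOTE_WRITE) { puts("rwitm locally and send data"); return ST_E; }
        return s;   /* purely local accesses leave the adapter uninvolved */
    case ST_S:
        if (ev == REMOTE_READ)  { puts("read locally and send data");  return ST_S; }
        if (ev == REMOTE_WRITE) { puts("rwitm locally, invalidate in other nodes, send data"); return ST_E; }
        if (ev == LOCAL_WRITE)  { puts("invalidate in all other nodes"); return ST_I; }
        return s;
    case ST_E:
        if (ev == REMOTE_READ)  { puts("send read to current owner, send data");  return ST_S; }
        if (ev == REMOTE_WRITE) { puts("send rwitm to current owner, send data"); return ST_E; }
        if (ev == LOCAL_READ)   { puts("send read to current owner, supply data to local");  return ST_S; }
        if (ev == LOCAL_WRITE)  { puts("send rwitm to current owner, supply data to local"); return ST_I; }
        return s;
    }
    return s;
}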
In the FSM 40-2 at the top right of FIG. 5 at the client node (NODE 2):

1. The adapter maintains the state of the line: state I indicates that no local cache has this line, state E indicates that some local cache has an exclusive copy of this line, and state S indicates that this line is shared by this and other nodes.

2. The adapter maintains the list of local caches that share the line. (Bus-based systems can eliminate this.)
Transitions (see state diagram on top right of FIG. 5):

I → [local read / get data from home and supply to local] → S
I → [local write / get excl access from home and supply data to local] → E
S → [local read / get data from home or use shared intervention] → S
S → [local write / get excl access from home and supply data] → E
S → [remote write / issue rwitm locally and send data] → I
E → [local read / supply data or use cache-to-cache transfer] → S
E → [local write / supply data or use cache-to-cache transfer] → E
E → [remote read / issue local read and send data] → S
E → [remote write / issue rwitm locally and send data] → I
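For contrast with the switch-based sketch of FSM 40-1 above, the client-side (proxy memory) transitions can be written in a table-driven style; again, the type and field names are illustrative only.

/* Illustrative table-driven sketch of the client-node (proxy memory) FSM 40-2. */
typedef enum { C_I, C_S, C_E } cstate_t;
typedef enum { C_LOCAL_READ, C_LOCAL_WRITE, C_REMOTE_READ, C_REMOTE_WRITE } cevent_t;

typedef struct { cstate_t from; cevent_t ev; const char *action; cstate_t to; } ctrans_t;

static const ctrans_t fsm_40_2[] = {
    { C_I, C_LOCAL_READ,   "get data from home and supply to local",             C_S },
    { C_I, C_LOCAL_WRITE,  "get excl access from home and supply data to local", C_E },
    { C_S, C_LOCAL_READ,   "get data from home or use shared intervention",      C_S },
    { C_S, C_LOCAL_WRITE,  "get excl access from home and supply data",          C_E },
    { C_S, C_REMOTE_WRITE, "issue rwitm locally and send data",                  C_I },
    { C_E, C_LOCAL_READ,   "supply data or use cache-to-cache transfer",         C_S },
    { C_E, C_LOCAL_WRITE,  "supply data or use cache-to-cache transfer",         C_E },
    { C_E, C_REMOTE_READ,  "issue local read and send data",                     C_S },
    { C_E, C_REMOTE_WRITE, "issue rwitm locally and send data",                  C_I },
};

static cstate_t fsm_40_2_step(cstate_t s, cevent_t ev) {
    for (unsigned i = 0; i < sizeof fsm_40_2 / sizeof fsm_40_2[0]; i++)
        if (fsm_40_2[i].from == s && fsm_40_2[i].ev == ev)
            return fsm_40_2[i].to;
    return s;   /* events not listed leave the state unchanged */
}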
S-COMA

Referring to FIG. 6, in an S-COMA system also, each line has a single home node which is responsible for keeping a consistent copy of the line. However, every client node that shares a copy of the line backs it up into its local memory. Thus, when a line is present in many nodes, the memory in each of those nodes maintains a directory entry for that line. The memory in each node maintains a local directory for a line shared by processors within that node. The adapter at the home node maintains the list of nodes sharing that line.

The adapter in each of these nodes acts as a proxy processor for that line. To invalidate a line from the node, the adapter issues an RWITM, thereby preventing the local memory from giving the line out to any local caches. The adapter for the home node starts in the I state, indicating that the local memory has the valid copy and that no other node has a valid copy. A client adapter starts in the E state, indicating that the local memory (or caches) have no valid copy and that only one other node has a valid copy. The actions of the adapters in the two nodes are summarized below.
In Each Node:

1. The memory controller maintains a directory for the line, containing the list of processors within that node that share the line. The adapter is treated as just another processor sharing the line.

2. The adapter maintains the state of the line: state I indicates that no other node has this line, state E indicates that some other node has an exclusive copy of this line, and state S indicates that this line is shared by this and other nodes.

3. FSM 40 transitions are the same as in the CC-NUMA home node.
In the FSM 40-1 at the top left of FIG. 6 at the Home Node (NODE 1):

1. The adapter initializes the state of the line to I.

2. The adapter maintains the list of nodes that share the line.

In the FSM 40-2 at the top right of FIG. 6 at the Client Node (NODE 2):

1. The adapter initializes the state of the line to E.
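A minimal initialization sketch, using the same illustrative naming conventions as the earlier FSM sketches, captures the S-COMA-specific difference noted above: the home adapter starts a line in I, a client adapter starts it in E.

/* Illustrative only: S-COMA per-line initial state of the adapter. */
typedef enum { SC_I, SC_S, SC_E } scoma_state;

static scoma_state scoma_initial_state(int is_home_node) {
    /* Home: local memory holds the only valid copy, so no other node has it (I).
     * Client: the local memory has no valid copy yet, so the line starts in E. */
    return is_home_node ? SC_I : SC_E;
}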
Directory Size in Memory Controller

In an SMP node, the directory in the memory controller needs to have an entry for a line only when the line is present in the cache of at least one processor. Lines that are not present in any cache are stored in memory, and hence no directory entry is needed for those lines. Thus, the size of the directory at any time does not exceed the total size of all caches in the node. Usually this maximum (i.e., the total size of all caches in the node) is much smaller than the total memory at that node.

However, with the adapter design described above for a CC-NUMA system, the memory controller sees the adapter as another processor, and if a line is present in any cache of any other node, the memory controller acts as if the line is present in the (imaginary) cache of the adapter. Thus, the maximum directory size needed is the sum of the sizes of all caches in all nodes of the system. Thus the demand for directory storage increases with the size of the system.

In the S-COMA system described above, the directory storage required is even larger. The memory controller may have to keep the directory entry for a line even when the line is not present in any cache of any node in the system. This is because the line may be stored in the local memories of the nodes (treated as L3 caches). Thus the maximum directory size required is proportional to the total size of memory in a node. This can be quite large.
Consider a memory controller (as depicted in FIG. 7) that maintains the directory in a limited amount of storage. It can work as follows: it allocates storage for an entry on demand. When the capacity is exceeded, some line is evicted from all the caches and stored in the memory. Since that line need not have a directory entry, its storage is used for some other entry.

Memory controllers can manage with a small directory since they need to keep an entry only for lines that are out in some cache. The addresses for such lines are hashed and HIT in the directory. Having no entry in the directory is interpreted to mean that memory has the only copy of the line. The addresses of these lines are hashed and MISS in the directory. Thus, the size of the directory is proportional to the total amount of cache supported for this memory.

The above strategy presents a problem when used in conjunction with the proposed adapter. To evict a line, the memory controller must issue an invalidate to the adapter also. But evicting the line from the adapter implies that the line must be invalidated in all other nodes in the system. Thus each time the memory controller evicts a line (because of the limited capacity of directory storage), the line is brought into that node in an exclusive manner, invalidating all other outside copies. This is highly undesirable. We can alleviate this problem with the following modifications to the memory controller.
Additional State in the Memory Controller

When the directory entry for a line is to be evicted, there is only a small amount of state information the memory must remember to reconstruct the entry. It must know (a) whether the contents of the memory location are valid or not, and (b) whether the adapter has any exclusive/shared copies of this line. If this information (perhaps using 2 bits per line) can be
kept by a memory controller for each line, then the finite directory storage strategy outlined above can be made to work. The strategy is illustrated in FIG. 8. By adding an additional information structure that comprises 2 bits (VA) per memory line, it is possible to use a small directory. The following actions are taken when a line is accessed:

Directory entry for the line is present: Normal protocol takes place without any change.

Directory entry for the line is not present: Create a new directory entry for this line. Read the VA bits from the bit vector and set the state bits according to the VA bits (3 possibilities: VA=10 means the adapter does not have it, i.e., no remote nodes have cached the line; VA=11 means the adapter has it in shared mode; and VA=01 means the adapter has it in exclusive mode). Now do the normal processing using the directory entry.

Directory entry is to be evicted: When a directory entry is to be evicted (because of low capacity), invalidate commands are issued to all the caches within the node and the data is stored in the memory location. The VA bits are set accordingly (VA is set to 10 if the adapter currently has no copy, i.e., no other node has a copy; VA is set to 11 if the adapter currently has a shared copy; and VA is set to 01 if the adapter currently has an exclusive copy). Thus, maintenance of the VA bits enables the elimination of the directory entries 28 shown in FIG. 8. See entries 28 in the directory of the memory controller of FIG. 7.
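The VA-bit bookkeeping described above can be sketched in a few lines of illustrative C. The encodings (10, 11, 01) follow the text, while the type names and helper functions are assumptions made for clarity.

/* Illustrative sketch of the 2-bit VA state per memory line. */
#include <stdint.h>

enum { VA_NO_COPY = 0x2 /* 10 */, VA_SHARED = 0x3 /* 11 */, VA_EXCLUSIVE = 0x1 /* 01 */ };

typedef enum { ADPT_NONE, ADPT_SHARED, ADPT_EXCLUSIVE } adapter_copy_state;

/* On a directory miss: rebuild the adapter's standing for the new entry from VA. */
static adapter_copy_state state_from_va(uint8_t va) {
    switch (va) {
    case VA_SHARED:    return ADPT_SHARED;
    case VA_EXCLUSIVE: return ADPT_EXCLUSIVE;
    default:           return ADPT_NONE;   /* VA = 10: no remote node caches the line */
    }
}

/* On eviction of a directory entry: local caches are invalidated, the data is
 * written back, and the adapter's standing is preserved in the VA bits. */
static uint8_t va_on_evict(adapter_copy_state a) {
    switch (a) {
    case ADPT_SHARED:    return VA_SHARED;
    case ADPT_EXCLUSIVE: return VA_EXCLUSIVE;
    default:             return VA_NO_COPY;
    }
}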
Having thus described our invention, what we claim as new and desire to secure by Letters Patent is:

1. In a computer system having a plurality of interconnected shared memory multiprocessor nodes, each of said nodes having at least one processor and a memory, a method of providing coherent shared memory access from any of said processors to any of said memories of said system, said method comprising:

a. for each line of data in each memory of each node, maintaining a list of those of said processors of said node that have said line in their caches
