Ekanadham et al.

US006085295A
[11] Patent Number: 6,085,295
[45] Date of Patent: *Jul. 4, 2000

[54] METHOD OF MAINTAINING DATA COHERENCY IN A COMPUTER SYSTEM HAVING A PLURALITY OF INTERCONNECTED NODES

[75] Inventors: Kattamuri Ekanadham, Mohegan Lake; Beng-Hong Lim, Wappingers Falls; Pratap Chandra Pattnaik, Ossining; Marc Snir, Briarcliff Manor, all of N.Y.

[73] Assignee: International Business Machines Corporation, Armonk, N.Y.

[*] Notice: This patent issued on a continued prosecution application filed under 37 CFR 1.53(d), and is subject to the twenty year patent term provisions of 35 U.S.C. 154(a)(2).

[21] Appl. No.: 08/954,496

[22] Filed: Oct. 20, 1997

[51] Int. Cl.7 ............................ G06F 12/16
[52] U.S. Cl. .............. 711/145; 711/121; 711/143; 711/144; 711/153; 711/156; 711/173; 709/214; 709/215; 709/218; 709/219; 710/129; 710/131
[58] Field of Search .............. 711/121, 141, 144, 145, 148, 143, 153, 156, 173; 709/214, 218, 219, 215; 710/129, 131

[56] References Cited

U.S. PATENT DOCUMENTS

5,535,365  7/1996  Barriuso et al. .............. 711/155
5,749,095  5/1998  Hagersten .................... 711/141
5,787,476  7/1998  Laudon et al. ................ 711/141
5,829,035 10/1998  James et al. ................. 711/141

Primary Examiner—Hiep T. Nguyen
Attorney, Agent, or Firm—Douglas W. Cameron

[57] ABSTRACT

A method of providing coherent shared memory access among a plurality of shared memory multiprocessor nodes. For each line of data in each of the nodes, a list of those processors of the node that have copies of the line in their caches is maintained. If a memory command is issued from a processor of one node, and if the command is directed to a line of memory of another node, then the memory command is sent directly to an adapter of the one node. When the adapter receives the command, it forwards the command from the one adapter to another adapter of the other node. When the other adapter receives the command, the command is forwarded to the local memory of the other node. The list of processors is then updated in the local memory of the other node to include or exclude the other adapter depending on the command. If the memory command is issued from one of the processors of one of the nodes, and if the command is directed to a line of memory of the one node, then the command is sent directly to local memory. When the local memory receives the command, and if the adapter of the node is in the list of processors for a line associated with the command and if the command is a write command, then the command is forwarded to the adapter of the one node. When the adapter receives the command, the command is forwarded to remote adapters in each of the remote nodes which have processors which have cache copies of the line. Finally, when the latter remote adapters receive the command, the command is forwarded to the processors having the cache copies of the line.

7 Claims, 8 Drawing Sheets

[Representative drawing: NODE 1 and NODE 2 connected through adapters acting as a PROXY PROCESSOR and a PROXY MEMORY]
[Sheet 1 of 8: FIG. 1 (Prior Art)]
[Sheet 2 of 8: FIG. 2 (Prior Art): memory M with directory 26 keeping, for each line, a list of the local processors caching it]
[Sheet 3 of 8: FIGS. 3A-3C: NODE 1 and NODE 2, each with processors P1 ... Pn, memory M, a switch, and an adapter A acting as a proxy processor or a proxy memory]
[Sheet 4 of 8: FIG. 4: adapter internals, connected to the local switch and to the network; node lists 41, local processor lists 42, line states 43, and finite state machines 40-1 and 40-2]
[Sheet 5 of 8: FIG. 5: CC-NUMA state diagrams for the adapter acting as a proxy processor at NODE 1 (home) and as a proxy memory at NODE 2 (client); states I (no other node has this line), S (some other node has this line), and E (some other node or local cache has this line exclusive). The directory in memory maintains state bits for each processor and for the adapter; samples 24-1 through 24-6 show: only memory has the line, local caches have a shared copy, a local cache has an exclusive copy, some remote caches have a shared copy, local and remote caches have shared copies, and some remote cache has an exclusive copy (no pointer, pointer in shared state, or pointer in exclusive state)]
[Sheet 6 of 8: FIG. 6: S-COMA state diagrams; the adapters at NODE 1 (home, starting state I) and NODE 2 (client, starting state E) both act as proxy processors, and the directory in memory maintains state bits for each processor and for the adapter]
[Sheet 7 of 8: FIG. 7: directory organization in the memory controller of a switch-based SMP with dynamically allocated entries]
[Sheet 8 of 8: FIG. 8: memory controller modified with VA bits per memory line to reduce directory space]
METHOD OF MAINTAINING DATA COHERENCY IN A COMPUTER SYSTEM HAVING A PLURALITY OF INTERCONNECTED NODES

DESCRIPTION

1. Technical Field

This invention relates to a method of providing cache coherence in a shared memory system composed of a network of multiprocessor nodes.

2. Description of the Prior Art

A shared-memory multiprocessor system, comprised of a plurality of processing nodes with memory and caches, provides system-wide access to the memory in the system. In the near future, each node of such parallel systems is expected to be a small cache-coherent multiprocessor, e.g., a symmetric multiprocessor (SMP), that consists of a small number (8 to 16) of slots connected by a bus or switch. Each slot can be occupied by a processor or a memory module. Each processor in the node can access any memory location in the node.

Technology considerations limit the size of an SMP node to a small number of processors. A method for building a shared-memory multiprocessor with a larger number of processors is to connect a number of SMP nodes with a network, and provide an adapter to extend the SMP's memory across the SMP nodes (see FIG. 1). Existing adapter designs plug into the memory bus of bus-based SMP nodes and collectively provide shared memory across the system, so that any processor in any node can access any location in any memory module in the system. Resources within a node are termed local and resources on other nodes are termed remote.

The adapter maintains a directory of all nodes sharing a line and monitors local accesses to the line in order to ensure coherence across nodes. On bus-based SMPs, the monitoring is straightforward. All address transactions appear on the bus and the adapter can snoop and respond to them. Thus, it is possible to design the adapter without having to make any changes to the SMP hardware, provided that the adapter can be connected to the bus as a master/slave device.

However, as the size and speed of an SMP node grows, technology limitations force the transition from bus-based to switch-based interconnects for both address and data transactions within the node. The design of an adapter for switch-based systems is complicated by the fact that a switch-based system uses a point-to-point interconnection that, unlike a bus, does not allow an adapter to observe all address transactions. In a switch-based SMP, the memory M maintains a directory 26. For each line 25 the directory keeps a list 24-x of the processors within the node that have cached copies of the line (see FIG. 2), where x is an integer between 1 and n, where n is the number of lines 25. See for example one of the lines 25 and its list 24-4 at the bottom of FIG. 2. It is understood that memory M can be any one of the memories M1 through MN.

Each processor communicates directly with the memory via the switch. In turn, the memory sends appropriate messages only to processors that need to be involved in the cache coherence protocol, and the memory has no knowledge of the adapter.

There is, therefore, a need for an adapter which extends shared memory access across multiple switch-based multiprocessor nodes. Such an adapter must not rely on a broadcast of memory commands within a multiprocessor node, and must not require changes to the existing memory controller of a multiprocessor node.

SUMMARY OF THE INVENTION

It is therefore an objective of this invention to provide a solution to the problem of providing shared-memory access across multiple switch-based SMP nodes.

The invention comprises an adapter to extend cache-coherent memory access across SMP nodes, and a method for using the memory system of a switch-based SMP node to interface to the adapter.

The key concepts in the adapter design are:

All communications between the processors, the memories and the adapters are made point to point, without the need for broadcasts within the SMP node.

In a node where a line is mapped onto the node's memory, the adapter acts as a proxy processor representing all the processors outside the node that share the line. More specifically, when a remote processor issues a memory command to a local memory, the remote adapter at the remote processor's node forwards the memory command to the local adapter, which is responsible for ensuring that the command is executed at the local memory.

In a node where a remote line is brought into the cache of a processor, but not into the node's memory, the adapter acts as a proxy memory representing the remote memory that the line is mapped onto. More specifically, when a memory command is issued from a local processor to a remote memory, the memory command is directed to the adapter, which is responsible for ensuring that the command is executed at that remote memory.

The adapter is versatile enough to be used for either CC-NUMA (Cache Coherent Non-Uniform Memory Access) or S-COMA (Simple Cache Only Memory Architecture) systems.

By appearing as either a local processor or a local memory, the adapter uses the local SMP coherence protocol within a node to accomplish the above tasks, without any changes to the memory controllers.

In situations where the memory controller is limited in the amount of storage it can use for the directory and must employ a dynamic allocation scheme for the directory entries, an extension of this invention involves a modification to the memory controller that overcomes the storage limitation of the memory controller.

Accordingly, this invention provides coherent shared memory access across a number of interconnected multiprocessor nodes. With this invention, each line of data in each memory of the node maintains a list of processors of the node that have copies of the line in their caches. When a memory command is issued from one of the processors of a node to a memory of another node, the command is directed to an adapter of the issuing node. The adapter of the issuing node then receives the command and forwards the command to the adapter at the remote node. When the adapter at the remote node receives the command, it then forwards the command to its local memory, which then updates its list of processors to include or exclude the adapter. However, if the memory command is issued from a processor to a local memory, then the command is simply forwarded directly to that local memory. When that local memory receives the command, and if an adapter is in the list of processors for the line in the memory command, then the command is forwarded to that adapter. The adapter then forwards the command to the other adapters of remote nodes which have cache copies of the line corresponding to the command. If the adapter is not found in the list, then the memory proceeds in accordance with the standard SMP protocol.
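As a purely illustrative aside (not part of the specification), the routing rule of the preceding paragraph can be restated as a small C sketch; the names first_hop and memory_forwards_to_adapter are hypothetical and merely encode the two decisions described above.

#include <stdbool.h>

/* Hypothetical helpers restating the forwarding rules of the summary;
   none of these names appear in the patent.                            */

typedef enum { TO_LOCAL_MEMORY, TO_LOCAL_ADAPTER } first_hop_t;

/* Processor side: a command for a line homed at another node goes to the
   local adapter (acting as proxy memory); otherwise straight to local memory. */
first_hop_t first_hop(int line_home_node, int this_node)
{
    return (line_home_node == this_node) ? TO_LOCAL_MEMORY : TO_LOCAL_ADAPTER;
}

/* Memory side: on a write, if the adapter appears in the line's processor
   list, the command is forwarded to the adapter so that remote copies can be
   invalidated; otherwise the standard SMP protocol proceeds unchanged.        */
bool memory_forwards_to_adapter(bool is_write, bool adapter_in_line_list)
{
    return is_write && adapter_in_line_list;
}

The second function captures the point that it is the memory, not the issuing processor, that decides whether the adapter must be involved.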
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 schematically illustrates the shared memory multiprocessor system composed of multiple SMP nodes connected via a network.

FIG. 2 schematically illustrates a memory module in a switch-based SMP, composed of a set of lines and a directory that maintains a list of processors sharing the line within the SMP.

FIGS. 3A-3C schematically illustrate several memory access scenarios in the system where the adapter acts either as a proxy processor or a proxy memory.

FIG. 4 schematically illustrates the major components of the adapter that are pertinent to this invention.

FIG. 5 schematically illustrates the possible directory states for a memory line and the state transitions followed by the finite state machine in the adapter.

FIG. 6 schematically illustrates the state transitions followed by the finite state machine in the adapter for implementing an S-COMA system.

FIG. 7 schematically illustrates a directory organization in a memory module in a switch-based SMP where directory entries are dynamically allocated for each line that is cached by some processor.

FIG. 8 schematically illustrates a modification to a memory module in a switch-based SMP that reduces the directory space requirements.
DESCRIPTION OF THE PREFERRED EMBODIMENT

The preferred embodiment of our system is based on a network of switch-based SMP nodes with an adapter attached to each node. FIG. 1 illustrates a high-level diagram of such a multiprocessing system. Each node has a plurality of processors P1, P2, . . . , PN interconnected to each other by a switch. The switch also interconnects the memory modules M1, M2, . . . , MN and adapters A. The nodes, in turn, are connected to each other through a network as shown.

The processors and memories maintain cache coherence of data accesses within the SMP node.

The adapter connects to the switch and plays the role of either a memory or a processor. The behavior of the adapter is different for different memory lines. When a line is homed at the local memory of the node, the adapter behaves as a proxy processor for that line. When a line is homed at the memory of a remote node, the adapter behaves as a proxy memory for that line. These roles are illustrated in FIGS. 3A-3C and are elaborated further below.

FIG. 4 illustrates the internal structure of the adapter that enables it to extend cache-coherent shared memory across multiple nodes. It comprises a set of node lists 41 and local processor lists 42. List 41 is maintained for each line of local memory that is cached at a remote node, and list 42 is maintained for each line of remote memory that is cached by a local processor. The adapter also maintains a 2-bit line state directory 43 for lines that are cached by the local processors. The finite state machines (FSM) 40-1 and 40-2 run the cache coherence protocol to keep the copies of the lines coherent. When the adapter acts as a proxy processor for a line, it uses the node list associated with the line to determine the remote nodes that need to be notified for coherence actions. When the adapter acts as a proxy memory for a line, it uses the local processor list associated with the line to determine the local processors that need to be notified for coherence actions.
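As a rough, non-authoritative sketch of the bookkeeping just described, the adapter's per-line structures of FIG. 4 might be represented as follows; the field names and the MAX_NODES/MAX_PROCS sizes are assumptions made for illustration only.

#include <stdint.h>

#define MAX_NODES 16   /* assumed system size   */
#define MAX_PROCS 16   /* assumed SMP node size */

/* 2-bit line state kept by the adapter (directory 43 in FIG. 4). */
typedef enum { LINE_I = 0, LINE_S = 1, LINE_E = 2 } line_state;

/* Entry kept while the adapter acts as a proxy processor:
   which remote nodes share a line homed in the local memory (node lists 41). */
typedef struct {
    uint32_t   line_addr;
    uint16_t   sharing_nodes;   /* bitmap over MAX_NODES */
    line_state state;
} node_list_entry;

/* Entry kept while the adapter acts as a proxy memory:
   which local processors cache a line homed at a remote node (lists 42). */
typedef struct {
    uint32_t   line_addr;
    uint16_t   sharing_procs;   /* bitmap over MAX_PROCS */
    line_state state;
} local_proc_list_entry;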
Proxy Processor

In a node in which a line is homed in the local memory, the adapter plays the role of a proxy processor representing the accesses to the line made by the processors in other nodes of the system. In this role, the adapter maintains a state for the line and the list of all nodes sharing that line. The state can be I (indicating that no other node has this line), E (indicating that some other node has an exclusive copy of this line) or S (indicating that this line is shared by this and other nodes). As a proxy processor, the adapter receives requests from other adapters and performs the reads and writes in this node on their behalf. Whenever a local processor requires exclusive access to the line while it is in shared state, the adapter communicates with other adapters and invalidates the line in all other nodes. When another node requests an exclusive copy of the line, the adapter not only invalidates the copies in all other nodes, but also requests the local memory to grant the exclusive access. The memory controller treats the adapter as another processor.
Proxy Memory

In a node in which a line is homed at a remote memory, the adapter acts as a proxy memory. It captures all the transactions for the corresponding address and runs the memory protocol. In this role, the adapter maintains a state for the line and the list of local caches sharing the line. The state can be I (indicating that no local cache has this line), E (indicating that some local cache has an exclusive copy of this line) or S (indicating that this line is shared by this and other nodes). As a proxy memory, the adapter responds to all requests to the line, obtains the contents of the line from the remote node (where that line is backed by memory) and supplies the contents to the local caches. It performs the usual coherence control operations in the node and coordinates with other adapters. In order to maintain global coherence, it may have to issue some bus transactions as a master, as illustrated later.
FIGS. 3A-3C present examples of memory accesses that involve the adapter as a proxy processor or memory. In FIG. 3(a), a processor issues a memory command to a line that is mapped to a remote memory. The adapter in the issuing processor's node acts as a proxy memory and retrieves the line from the remote memory. It sends a message to the adapter at the remote memory's node requesting that the memory command be performed there. The adapter at the remote memory's node acts as a proxy processor and issues the original memory command at the remote node.

In FIG. 3(b), a processor issues a memory command to a line that is mapped to local memory. The directory in the local memory indicates that the line is cached at a processor, which is in fact the adapter playing the role of a proxy processor. The line needs to be invalidated, hence the local memory sends a command directly to the adapter. The adapter consults its node list for the line to determine which remote nodes to forward the invalidations to, and sends a message to all such nodes. The adapter at each remote node acts as a proxy processor, consults its local processor list and issues invalidation commands over the switch to each of the processors in the list.

In FIG. 3(c), a processor issues a memory command that can be satisfied locally without involving remote nodes. In this case, the directory in the local memory does not point to the adapter, so the adapter is not involved in the memory command. Contrast this situation with that of a bus-based SMP system, where the adapter still has to participate in the memory command even if the command can be satisfied locally.

We now elaborate on the two roles of the adapter for CC-NUMA and S-COMA systems.
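For illustration only, the invalidation fan-out of FIG. 3(b) amounts to walking a sharing list and sending one message per target; the bitmap representation and the function name below are assumptions for this sketch and are not taken from the patent.

#include <stdint.h>

/* Collect the targets (node or processor numbers) named by a sharing bitmap.
   The home adapter would apply this to its node list and send one invalidate
   per remote node; a client adapter would apply it to its local processor
   list and invalidate each local cache over the switch.                      */
int collect_invalidation_targets(uint16_t sharing_bitmap, int targets[16])
{
    int count = 0;
    for (int i = 0; i < 16; i++)
        if (sharing_bitmap & (1u << i))
            targets[count++] = i;
    return count;
}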
CC-NUMA

In a CC-NUMA system, the adapters behave as shown in FIG. 5. The figure shows the adapter in node 1 acting as a proxy processor and the adapter in node 2 acting as a proxy memory for a line that is backed up by a memory in node 1. The state diagrams show how each adapter keeps track of the state of the line when handling local requests (shown by solid arcs) and remote requests (shown by dashed arcs).

Each line has a single node, called its home node, where it is backed up in the memory. All other nodes, called client nodes, can only have copies of it in their caches, but not in their memories. The home node adapter for a line acts as a proxy processor for that line, and a client node adapter acts as a proxy memory for that line. The actions of the adapters in the two nodes are summarized below.

In the FSM 40-1 at the top left of FIG. 5 at the home node (NODE 1):

1. The memory controller maintains a directory for the line, containing the list of processors within that node that share (or have a cache copy of) the line. The adapter is treated as just another processor sharing the line. FIG. 5 shows different samples of the directory 24, indicated by 24-1 through 24-6, of a four-entry list of processors. Sample 24-1 illustrates the case where the list of processors is empty. An empty list indicates that no processors have cached the line. In sample 24-2, two local processors have copies of the line in their caches, as indicated by the boxes having x's inside. Sample 24-3 indicates that only a single processor has an exclusive copy of the line in its cache; see the solid black box in the list of processors 24-3. In sample 24-4, the list of processors includes the local adapter, which means that the adapter is acting as a proxy processor for that line. Since the adapter is acting as a proxy processor, at least one remote processor must have a copy of the line in its cache. In sample 24-5 the list of processors includes both the adapter and a local processor. Thus, list 24-5 indicates that a local processor and at least one remote processor have copies of the line in their caches. Finally, sample 24-6 indicates that the local adapter has an exclusive copy of the line, which means that a remote processor has an exclusive copy of the line in its cache.

2. The adapter maintains the state of the line (see top left side of FIG. 5). The state I indicates that no other node has this line, state E indicates that some other node has an exclusive copy of this line, and state S indicates that this line is shared by this and other nodes.

3. The adapter maintains the list of nodes that share the line.

The following description of state transitions uses the format: current state → [memory command / action response] → next state, where 'rwitm' stands for 'read with intent to modify'. The "current owner" refers to the processor with the exclusive cache copy of the line.
Transitions (see state diagram on top left of FIG. 5):

I → [remote read / read locally and send data] → S
I → [remote write / rwitm locally and send data] → E
S → [remote read / read locally and send data] → S
S → [remote write / rwitm locally, invalidate in other nodes and send data] → E
S → [local write / invalidate in all other nodes] → I
E → [remote read / send read to current owner, send data] → S
E → [remote write / send rwitm to current owner, send data] → E
E → [local read / send read to current owner, supply data to local] → S
E → [local write / send rwitm to current owner, supply data to local] → I
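A minimal sketch of the transitions above, assuming a C encoding of states and events that does not appear in the patent; the action strings are copied from the transition list.

typedef enum { ST_I, ST_S, ST_E } state_t;
typedef enum { REMOTE_READ, REMOTE_WRITE, LOCAL_READ, LOCAL_WRITE } event_t;

typedef struct {
    state_t     next;
    const char *action;   /* the "action response" part of the arc */
} transition_t;

/* Home-node adapter (proxy processor) transitions, FIG. 5, top left. */
transition_t home_fsm(state_t s, event_t e)
{
    switch (s) {
    case ST_I:
        if (e == REMOTE_READ)  return (transition_t){ ST_S, "read locally and send data" };
        if (e == REMOTE_WRITE) return (transition_t){ ST_E, "rwitm locally and send data" };
        break;
    case ST_S:
        if (e == REMOTE_READ)  return (transition_t){ ST_S, "read locally and send data" };
        if (e == REMOTE_WRITE) return (transition_t){ ST_E, "rwitm locally, invalidate in other nodes and send data" };
        if (e == LOCAL_WRITE)  return (transition_t){ ST_I, "invalidate in all other nodes" };
        break;
    case ST_E:
        if (e == REMOTE_READ)  return (transition_t){ ST_S, "send read to current owner, send data" };
        if (e == REMOTE_WRITE) return (transition_t){ ST_E, "send rwitm to current owner, send data" };
        if (e == LOCAL_READ)   return (transition_t){ ST_S, "send read to current owner, supply data to local" };
        if (e == LOCAL_WRITE)  return (transition_t){ ST_I, "send rwitm to current owner, supply data to local" };
        break;
    }
    /* Events not listed do not involve the adapter; the state is unchanged. */
    return (transition_t){ s, "no adapter action" };
}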
In the FSM 40-2 at the top right of FIG. 5 at the client node (NODE 2):

1. The adapter maintains the state of the line: state I indicates that no local cache has this line, state E indicates that some local cache has an exclusive copy of this line, and state S indicates that this line is shared by this and other nodes.

2. The adapter maintains the list of local caches that share the line. (Bus-based systems can eliminate this.)
Transitions (see state diagram on top right of FIG. 5):

I → [local read / get data from home and supply to local] → S
I → [local write / get excl access from home and supply data to local] → E
S → [local read / get data from home or use shared intervention] → S
S → [local write / get excl access from home and supply data] → E
S → [remote write / issue rwitm locally and send data] → I
E → [local read / supply data or use cache to cache transfer] → S
E → [local write / supply data or use cache to cache transfer] → E
E → [remote read / issue local read and send data] → S
E → [remote write / issue rwitm locally and send data] → I
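For comparison, the client-node arcs can also be written table-driven; the types and encoding below are illustrative assumptions, not part of the specification.

typedef enum { C_I, C_S, C_E } cstate_t;
typedef enum { C_LOCAL_READ, C_LOCAL_WRITE, C_REMOTE_READ, C_REMOTE_WRITE } cevent_t;

/* One row per arc of the client-node (proxy memory) state diagram, FIG. 5, top right. */
static const struct { cstate_t from; cevent_t ev; const char *action; cstate_t to; } client_fsm[] = {
    { C_I, C_LOCAL_READ,   "get data from home and supply to local",             C_S },
    { C_I, C_LOCAL_WRITE,  "get excl access from home and supply data to local", C_E },
    { C_S, C_LOCAL_READ,   "get data from home or use shared intervention",      C_S },
    { C_S, C_LOCAL_WRITE,  "get excl access from home and supply data",          C_E },
    { C_S, C_REMOTE_WRITE, "issue rwitm locally and send data",                  C_I },
    { C_E, C_LOCAL_READ,   "supply data or use cache to cache transfer",         C_S },
    { C_E, C_LOCAL_WRITE,  "supply data or use cache to cache transfer",         C_E },
    { C_E, C_REMOTE_READ,  "issue local read and send data",                     C_S },
    { C_E, C_REMOTE_WRITE, "issue rwitm locally and send data",                  C_I },
};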
S-COMA

Referring to FIG. 6, in an S-COMA system also, each line has a single home node which is responsible for keeping a consistent copy of the line. However, every client node that shares a copy of the line backs it up into its local memory. Thus, when a line is present in many nodes, the memory in each of those nodes maintains a directory entry for that line. The memory in each node maintains a local directory for a line shared by processors within that node. The adapter at the home node maintains the list of nodes sharing that line.

The adapter in each of these nodes acts as a proxy processor for that line. To invalidate a line from the node, the adapter issues an RWITM, thereby preventing the local memory from giving the line out to any local caches. The adapter for the home node starts in the I state, indicating that the local memory has the valid copy and that no other node has a valid copy. A client adapter starts in the E state, indicating that the local memory (or caches) have no valid copy and that only one other node has a valid copy. The actions of the adapters in the two nodes are summarized below.

In Each Node:

1. The memory controller maintains a directory for the line, containing the list of processors within that node that share the line. The adapter is treated as just another processor sharing the line.

2. The adapter maintains the state of the line: the state I indicates that no other node has this line, state E indicates that some other node has an exclusive copy of this line, and state S indicates that this line is shared by this and other nodes.

3. FSM 40 transitions are the same as in the CC-NUMA home node.
In the FSM 40-1 at the top left of FIG. 6 at the home node (NODE 1):

1. The adapter initializes the state of the line to I.

2. The adapter maintains the list of nodes that share the line.

In the FSM 40-2 at the top right of FIG. 6 at the client node (NODE 2):

1. The adapter initializes the state of the line to E.
Directory Size in Memory Controller

In an SMP node, the directory in the memory controller needs to have an entry for a line only when the line is present in the cache of at least one processor. Lines that are not present in any cache are stored in memory, and hence no directory entry is needed for those lines. Thus, the size of the directory at any time does not exceed the total size of all caches in the node. Usually this maximum (i.e., the total size of all caches in the node) is much smaller than the total memory at that node.

However, with the adapter design described above for a CC-NUMA system, the memory controller sees the adapter as another processor, and if a line is present in any cache of any other node, the memory controller acts as if the line is present in the (imaginary) cache of the adapter. Thus, the maximum directory size needed is the sum of the sizes of all caches in all nodes of the system, and the demand for directory storage increases with the size of the system.

In the S-COMA system described above, the directory storage required is even larger. The memory controller may have to keep the directory entry for a line even when the line is not present in any cache of any node in the system. This is because the line may be stored in the local memories of the nodes (treated as L3 caches). Thus the maximum directory size required is proportional to the total size of memory in a node. This can be quite large.
Consider a memory controller (as depicted in FIG. 7) that maintains the directory in a limited amount of storage. It can work as follows: it allocates storage for an entry on demand. When the capacity is exceeded, some line is evicted from all the caches and stored in the memory. Since that line no longer needs a directory entry, its storage is used for some other entry.

Memory controllers can manage with a small directory since they need to keep an entry only for lines that are out in some cache. The addresses of such lines are hashed and hit in the directory. Having no entry in the directory is interpreted to mean that memory has the only copy of the line; the addresses of these lines are hashed and miss in the directory. Thus, the size of the directory is proportional to the total amount of cache supported for this memory.
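The hashed, cache-sized directory described here might look roughly like the following sketch; the table size and the trivial modulo hash are placeholders chosen only for illustration (a real controller would also handle collisions and eviction).

#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

#define DIR_ENTRIES 1024   /* assumed capacity, sized to the caches rather than to memory */

typedef struct {
    bool     valid;
    uint32_t line_addr;    /* which line this entry tracks                   */
    uint16_t sharers;      /* presence bits for processors and the adapter   */
} dir_entry;

static dir_entry directory[DIR_ENTRIES];

/* A hit means some cache (or the adapter) holds the line; a miss is read as
   "memory has the only copy", so no entry is needed for such lines.          */
dir_entry *dir_lookup(uint32_t line_addr)
{
    dir_entry *e = &directory[line_addr % DIR_ENTRIES];   /* placeholder hash */
    if (e->valid && e->line_addr == line_addr)
        return e;
    return NULL;
}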
The above strategy presents a problem when used in conjunction with the proposed adapter. To evict a line, the memory controller must issue an invalidate to the adapter as well. But evicting the line from the adapter implies that the line must be invalidated in all other nodes in the system. Thus each time the memory controller evicts a line (because of the limited capacity of the directory storage), the line is brought into that node in an exclusive manner, invalidating all other outside copies. This is highly undesirable. We can alleviate this problem with the following modifications to the memory controller.
Additional State in the Memory Controller

When the directory entry for a line is to be evicted, there is only a small amount of state information the memory must remember to reconstruct the entry. It must know (a) whether the contents of the memory location are valid or not and (b) whether the adapter has any exclusive/shared copies of this line. If this information (perhaps using 2 bits per line) can be kept by the memory controller for each line, then the finite directory storage strategy outlined above can be made to work. The strategy is illustrated in FIG. 8. By adding an additional information structure that comprises 2 bits (VA) per memory line, it is possible to use a small directory. The following actions are taken when a line is accessed:
Directory entry for the line is present: normal protocol takes place without any change.

Directory entry for the line is not present: create a new directory entry for this line. Read the VA bits from the bit vector and set the state bits according to the VA bits (three possibilities: VA=10 means the adapter does not have the line, i.e., no remote nodes have cached it; VA=11 means the adapter has it in shared mode; and VA=01 means the adapter has it in exclusive mode). Now do the normal processing using the directory entry.

Directory entry is to be evicted: when a directory entry is to be evicted (because of low capacity), invalidate commands are issued to all the caches within the node and the data is stored in the memory location. The VA bits are set accordingly (VA is set to 10 if the adapter currently has no copy, i.e., no other node has a copy; VA is set to 11 if the adapter currently has a shared copy; and VA is set to 01 if the adapter currently has an exclusive copy). Thus, maintenance of the VA bits enables the elimination of the directory entries 28 shown in FIG. 8. See entries 28 in the directory of the memory controller of FIG. 7.
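A sketch of how the two VA bits could be consulted when a directory entry has to be recreated; the bit encoding follows the three cases above, while the struct and function names are assumptions made for illustration and do not appear in the patent.

#include <stdint.h>

/* Two VA bits kept per memory line, encoded as in the text:
     10 - adapter has no copy (no remote node caches the line)
     11 - adapter has a shared copy
     01 - adapter has an exclusive copy                          */
enum { VA_NO_COPY = 0x2, VA_SHARED = 0x3, VA_EXCLUSIVE = 0x1 };

enum { ADAPTER_I, ADAPTER_S, ADAPTER_E };

typedef struct {
    uint16_t proc_bits;      /* per-processor presence bits      */
    uint8_t  adapter_state;  /* I, S or E for the adapter's slot */
} dir_entry_state;

/* Recreate the directory entry for a line whose entry was evicted,
   using only that line's VA bits.                                  */
dir_entry_state rebuild_from_va(uint8_t va)
{
    dir_entry_state d = { 0, ADAPTER_I };
    if (va == VA_SHARED)
        d.adapter_state = ADAPTER_S;     /* some remote node holds a shared copy   */
    else if (va == VA_EXCLUSIVE)
        d.adapter_state = ADAPTER_E;     /* some remote node holds it exclusively  */
    /* VA_NO_COPY: adapter not involved; local caches were invalidated on eviction. */
    return d;
}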
Having thus described our invention, what we claim as new and desire to secure by Letters Patent is:

1. In a computer system having a plurality of interconnected shared memory multiprocessor nodes, each of said nodes having at least one processor and a memory, a method of providing coherent shared memory access from any of said processors to any of said memories of said system, said method comprising:

a. for each line of data in each memory of each node, maintaining a list of those of said processors of said node that have said line in their c



