United States Patent [19]
Attanasio et al.

[11] Patent Number: 5,371,852
[45] Date of Patent: Dec. 6, 1994

[54] METHOD AND APPARATUS FOR MAKING
     A CLUSTER OF COMPUTERS APPEAR AS A
     SINGLE HOST ON A NETWORK

[75] Inventors: Clement R. Attanasio, Peekskill;
     Stephen E. Smith, Mahopac, both of N.Y.

[73] Assignee: International Business Machines
     Corporation, Armonk, N.Y.

[21] Appl. No.: 960,742

[22] Filed: Oct. 14, 1992

[51] Int. Cl.5 ........................... G06F 13/00
[52] U.S. Cl. ...................... 395/200; 370/85.13
[58] Field of Search ........... 395/200, 500; 370/60,
     370/92, 93, 94.1, 54, 85.13, 85.1, 85.6, 85.8,
     60.1, 110.1, 85.11, 95.1; 364/284.4, 242.94,
     284.3, 284
`
`[56]
`
`References Cited
`U.S. PATENT DOCUMENTS
`4,276,643 6/1981 Laprie et al.
`4,667,287 5/1987 Allen et al.
`4,719,621 1/1988 May
`4,958,273 9/1990 Anderson et al.
`5,023,780 6/1991 Brearley
`5,079,765 1/1992 Nakamura
`5,088,032 2/1992 Bosack
`5,093,920 3/1992 Agrawal et al.
`5,109,515 4/1992 Laggis et al.
`5,125,081 6/1992 Chiba
`5,166,931 11/1992 Riddle
`5,185,860 2/1993 Wu
`5,224,205 6/1993 Dinkin et al.
`
` 371/8
` 364/200
`370/85
` 364/200
` 364/200
` 370/85.13
` 395/200
` 395/800
` 395/725
` 395/325
` 370/94.1
` 395/200
` 395/200
`
`FOREIGN PATENT DOCUMENTS
`1002342 1/1991 Belgium .
`59-117842 7/1984 Japan .
`63-193739 8/1988 Japan .
`
OTHER PUBLICATIONS

D. E. Comer, "Internetworking with TCP/IP, Principles, Protocols and Architecture", Prentice Hall, US, Chapter 7, pp. 91-97; Chapter 11, pp. 159-169; and Chapter 12, pp. 171-203.
"Network Services", Network Programming, Sun Microsystems, Inc., 2550 Garcia Ave., Mountain View, Calif., Chapter 1, pp. 3-30, May 1988.
OSF DCE Release 1.0 S3, DCE Release Notes, Open Software Foundation, 11 Cambridge Center, Cambridge, Mass., Mar. 25, 1991, pp. i-v, 1-1 thru B-2.
J. K. Ousterhout et al., "The Sprite Network Operating System", IEEE Computer, Feb. 1988, US, pp. 23-36.
D. R. Cheriton, "The V Distributed System", Communications of the ACM, Mar. 1988, vol. 31, No. 3, pp. 314-333.
A. Bhide et al., "A Highly Available Network File Server", USENIX Conference, Winter 1991, Dallas, Tex., p. 199.
S. J. Mullender et al., "Amoeba: A Distributed Operating System for the 1990s", IEEE Computer, May 1990, US, pp. 44-53.
L. Peterson et al., "The x-kernel: A Platform for Accessing Internet Resources", IEEE Computer, May 1990, US, pp. 23-33.
A. Litman, "The DUNIX Distributed Operating System", Operating Systems Review, ACM Press, NY, vol. 22, No. 1, Jan. 1988, pp. 42-51.

Primary Examiner—Dale M. Shaw
Assistant Examiner—Moustafa M. Meky
Attorney, Agent, or Firm—Louis J. Percello

[57] ABSTRACT
`The present invention provides a method and apparatus
`for enabling a cluster of computers to appear as a single
`computer to host computers outside the cluster. A host
`computer communicates only with a gateway to access
`destination nodes and processes within the cluster. The
`gateway has at least one message switch which pro-
`cesses incoming and outgoing port type messages cross-
`ing the cluster boundary. This processing comprises
`examining certain information on the message headers
`and then changing some of this header information
either to route an incoming message to the proper com-
puter node, port and process or to make an outgoing
message appear as if it originated at the gateway node.
`The message switch uses a table to match incoming
`messages to a particular routing function which can be
`run to perform the changes necessary to correctly route
`different kinds of messages.
`
`35 Claims, 13 Drawing Sheets
`
`200
`
`106
`
`NODE
`
`220
`
`110
`INTERCONNECT
`
`107
`
`NODE
`
`210
`
`N00E
`
`NENN3R0 120
`
`ENC40S
`
`.1E0 cLusla
`
`105
`
`N00E
`
`125
`
`GATENAY 109
`
`PO0TS
`230
`
`230
`230
`
`210
`
`MESSAGE
`SVATI911
`
`240
`
`ROUTING
`0060 ON
`250
`
`REMOTE
`HOSTS
`
`130
`
`10
`
`27
`
`130
`
`130
`
`HEWLETT PACKARD ENTERPRISE CO.
`EXHIBIT 1007 - PAGE 1
`
`
`
`U.S. Patent
`
`Dec. 6, 1994
`
`Sheet 1 of 13
`
`5,371,852
`
`100
`
`NODE 2
`106
`
`/
`
`NODE 1
`105
`
`125
`
`NETWORK
`... H
`
`me
`
`130
`
`127
`
`142
`
`120
`
`us
`
`... H
`
`130
`
` ... H
`
`130
`
`NODE N
`108
`
`FIG. 1 A
`PRIOR ART
`
`NODE 3
`107
`
`NODE 2
`106
`
`NODE 3
`107
`
`100
`
`NODE 1
`105
`
`NODE N
`108
`
`127
`---,
`G
`109
`
`FIG.1 13
`PRIOR ART
`
`130
`120
`1
`
`NETWORK 2
`i127
`.1.
`1
`120
`H
`i
`1
`J
`125
`
`127
`
`NETWORK 1
`H
`
`130
`
`1 .1. 20
`
`.i H
`
`130
`NETWORK q
`
`HEWLETT PACKARD ENTERPRISE CO.
`EXHIBIT 1007 - PAGE 2
`
`
`
`U.S. Patent
`
`Dec. 6, 1994
`
`Sheet 2 of 13
`
`5,371,852
`
`H
`
`1 30
`120
`
`120
`
`120
`
`127
`
`G1
`
`109
`
`NODE 2
`106
`
`NODE 3
`107
`
`NODE 1
`105
`
`127
`
`G2
`109
`
`NODE N
`108
`
`127
`
`GP
`
`109
`
`!127
`120
`
`IL
`
`127
`
`
`
`1 20
`
`T727)127
`
`130
`
`FIG. 1 C
`PRIOR ART
`
`H
`130
`
`120
`
`120
`
`q
`120
`
`HEWLETT PACKARD ENTERPRISE CO.
`EXHIBIT 1007 - PAGE 3
`
`
`
`U.S. Patent
`
`Dec. 6, 1994
`
`Sheet 3 of 13
`
`5,371,852
`
`EXTERNAL INTERNET
`
`REMOTE
`HOSTS
`
`130
`
`—210
`
`—127
`
`130
`
`\220
`
`130
`
`PORTS
`230
`
`230
`230
`
`210
`
`MESSAGE
`SWITCH
`
`240
`
`ROUTING
`FUNCTION
`250
`
`FIG.2
`
`NETWORK 120
`
`ENCAPS
`
`-TED CLUSTER
`
`(
`
`105
`
`NODE
`
`125
`
`GATEWAY 109
`
`200
`
`7 106
`
`NODE
`
`220
`
`110
`INTERCONNECT
`
`107
`
`NODE
`
`210
`
`108
`
`\
`NODE
`
`HEWLETT PACKARD ENTERPRISE CO.
`EXHIBIT 1007 - PAGE 4
`
`
`
`U.S. Patent
`
`Dec. 6, 1994
`
`Sheet 4 of 13
`
`5,371,852
`
`CONCEPTUAL LAYERING
`APPLICATION
`PORT TO PORT (PP)
`MACHINE TO MACHINE (MM)
`NETWORK INTERFACE
`
`308
`306
`FIG.3A
`304
`PRIOR ART—PROTOCOL LAYERS
`
`302
`
`301-'"
`
`FIG.3B
`PRIOR ART —PROTOCOL HEADERS
`
`330
`
`337
`
`APPLICATION DATA
`
`336
`
`PP HEADER PP DATAGRAM DATA AREA
`
`324
`
`312
`
`FRAME HEADER
`
`MM HEADER
`
`/
`
`MM DATA AREA
`
`320
`
`341
`
`FRAME DATA AREA
`310
`
`V
`
`342
`
`24
`16 19
`4
`VERS HLEN SERVICETYPE
`TOTAL LENGTH
`IDENTIFICATION
`RAGS FRAGMENT OFFSET
`HEADER CHECKSUM
`TIME TO LIVE PROTOCOL
`SOURCE IP ADDRESS
`DESTINATION IP ADDRESS
`IP OPTIONS (IF ANY)
`DATA
`•••
`
`31.)
`
`348
`
`349
`
`PADDING
`
`F1G.3C PRIOR ART—INTERNET PROTOCOL (IP) HEADER
`
`0
`
`,-----
`347
`
`340
`\
`
`344 .<
`
`346
`
`HEWLETT PACKARD ENTERPRISE CO.
`EXHIBIT 1007 - PAGE 5
`
`
`
`U.S. Patent
`
`Dec. 6, 1994
`
`Sheet 5 of 13
`
`5,371,852
`
`356
`
`357
`
`353
`
`350
`
`352/
`
`N1/4 UDP SOURCE PORT
`UDP MESSAGE LENGTH
`
`16
`
`311
`UDP DESTINATION PORT ,
`
`UDP CHECKSUM
`
`354
`
`DATA
`
`•••
`
`FIG.3D PRIOR ART-UDP DATAGRAM
`
`363
`
`/
`
`24
`10
`\ SOURCE PORT
`DESTINATION PORT
`SEQUENCE NUMBER
`
`16
`
`ACKNOWLEDGEMENT NUMBER
`
`HLEN RESERVED CODE BITS
`
`WINDOW
`
`CHECKSUM
`
`URGENT POINTER
`
`360
`
`362
`310/
`"IP
`
`364
`
`OPTiONS(IF ANY)
`
`PADDING
`
`DATA
`
`•••
`
`FIG.3E PRIOR ART-TCP DATAGRAM
`
`3664
`
`367
`{
`
`HEWLETT PACKARD ENTERPRISE CO.
`EXHIBIT 1007 - PAGE 6
`
`
`
`U.S. Patent
`
`Dec. 6, 1994
`
`Sheet 6 of 13
`
`5,371,852
`
`200
`
`CLUSTER GATEWAY
`
`109
`
`PORT
`
`418
`415
`
`REMOTE
`HOSTS
`130
`
`H
`
`130
`
`120
`
`MESSAGE SWITCH
`400
`412
`414 416 410
`MESSAGE SWITCH TABLE
`PORT PROTO NODE -.IUNCTION
`P1
`PR1
`0
`f_l
`P2
`PR2
`0
`f_2 —
`P3
`PR1
`N1
`0
`...-
`...,
`•...
`-
`PN
`PR N
`ROUTING FUNCTIONS
`f_l
`f_2
`
`...,
`
`....,
`
`0
`
`f_N
`
`f_N
`
`110
`INTERCONNECT
`
`F1G.4
`
`NODE 1
`PORT
`105
`
`NODE 2
`PORT
`106
`
`HEWLETT PACKARD ENTERPRISE CO.
`EXHIBIT 1007 - PAGE 7
`
`
`
`U.S. Patent
`
`Dec. 6, 1994
`
`Sheet 7 of 13
`
`5,371,852
`
`FIG.5A
`
`WAIT FOR MM MESSAGE
`
`TOP
`
`505
`
`READ DESTINATION ADDRESS (DADDR)
`IN MM HEADER
`510
`
`(SEE FIGURE 6
`
`NO
`
`IS DADDR =
`CLUSTER ADDRESS
`515
`
`YES
`
`READ PROTOCOL TYPE (PROTO)
`520
`
`NO
`
`IS PROTO A
`PORT TYPE PROTOCOL
`525
`
`YES
`
`PROCESS MESSAGE
`IN GATEWAY IN
`*NORMAL" MANNER
`
`LOCATE AND READ THE DESTINATION PORT
`(DPORT) IN PP HEADER
`530
`
`(GO TO TOP)
`
`SEARCH MESSAGE SWITCH TABLE FOR
`ENTRY MATCHING DPORT,PROTO PAIR
`535
`
`HEWLETT PACKARD ENTERPRISE CO.
`EXHIBIT 1007 - PAGE 8
`
`
`
`U.S. Patent
`
`Dec. 6, 1994
`
`Sheet 8 of 13
`
`5,371,852
`
`FIG.5B
`
`NO
`
`FOUND MATCH ?
`540
`
`NO
`
`NON-SPECIFIC
`PORT ?
`545
`
`YES
`
`YES
`
`NODLADDR=GATEWAY
`NODE PORT=DPORT
`555
`
`ROUTING FUNCTION ?
`555
`NO
`
`YES
`
`COMPUTE NODE ADDR
`FROM DPORT
`NODE PORT=DPORT
`550
`
`INVOKE ROUTING
`FUNCTION WHICH
`COMPUTES NODE ADDR
`AND NODE PORT 565
`
`NODE ADDR=NODE FIELD"
`NODE PORT=DPORT
`560
`
`IWO
`
`MODIFY DATAGRAM DESTINATION
`PORT FIELD TO NODE PORT
`570
`
`MODIFY DATAGRAM DESTINATION
`ADDRESS FIELD TO NODE ADDR
`575
`4
`SEND DATAGRAM TO NODE
`USING INTERCONNECT
`580
`
`(GO TO TOP)
`
`FIG.
`5A
`FIG.
`5B
`
`FIG.5
`
`HEWLETT PACKARD ENTERPRISE CO.
`EXHIBIT 1007 - PAGE 9
`
`
`
`U.S. Patent
`
`Dec. 6, 1994
`
`Sheet 9 of 13
`
`5,371,852
`
`TOP: WAIT FOR MM MESSAGE
`605
`
`FIG.6
`
`READ DESTINATION ADDRESS (DADDR)
`IN MM HEADER
`
`610
`
`READ SOURCE ADDRESS(SADDR)
`IN MM HEADER
`
`620
`
`NO
`
`IS DADDR=
`CLUSTER ADDRESS
`615
`
`YES
`
`(SEE F1G.5
`
`NO
`
`IS DADDR
`OUTSIDE CLUSTER
`AND SADDR
`INSIDE CLUSTER ?
`625
`
`YES (OUTBOUND)
`
`CHANGE SOURCE ADDRESS
`IN HEADER OF OUTBOUND
`MESSAGE TO CLUSTER ADDRESS
`630
`
`FOWARD DATAGRAM TO DESTINATION
`640
`
`(co TO TOP)
`
`HEWLETT PACKARD ENTERPRISE CO.
`EXHIBIT 1007 - PAGE 10
`
`
`
`U.S. Patent
`
`Dec. 6, 1994
`
`Sheet 10 of 13
`
`5,371,852
`
`700
`
`1--... 337
`
`IP
`
`UDP
`
`/PROJECTS
`
`• • •
`
`324 I
`
`336
`
`715
`
`CLUSTER EXPORT TABLE
`FILESYSTEM NAME
`NODE
`
`x720
`
`25
`
`/CONTRACTS
`
`/PROJECTS
`
`/PROPOSAL
`
`2
`
`3
`
`2
`
`722 ---------v
`
`730
`
`MOUNT PORT TABLE
`
`NODE
`
`PORT NO.
`
`1
`
`2
`
`3
`
`4
`
`722
`
`820
`
`640
`
`710
`
`....
`
`735
`
`FIG.7
`
`HEWLETT PACKARD ENTERPRISE CO.
`EXHIBIT 1007 - PAGE 11
`
`
`
`U.S. Patent
`
`Dec. 6, 1994
`
`Sheet 11 of 13
`
`5,371,852
`
`LOCATE AND READ THE FILESYSTEM NAME,
`FSN,IN THE MOUNT REQUEST MESSAGE
`805
`
`SEARCH CLUSTER EXPORT TABLE FOR ENTRY
`WHICH MATCHES FSN OR CONTAINS FSN
`810
`
`( RETURN RETCODE=NOT OK
`820
`
`NO
`
`FOUND MATCHING
`ENTRY ?
`815
`
`YES
`
`READ NODE,N,FROM EXPORT TABLE
`825
`
`V
`SEARCH MOUNT PORT TABLE FOR ENTRY MATCHING
`NODE AND READ MOUNT PORT NUMBER,P
`
`830
`
`FIG.8
`
`SET FUNCTION RETURN VARIABLES
`NODE,_ADDR=N
`NODEJ'ORT=P
`840
`
`850
`(RETURN RETCODE=OK)
`
`HEWLETT PACKARD ENTERPRISE CO.
`EXHIBIT 1007 - PAGE 12
`
`
`
`U.S. Patent
`
`Dec. 6, 1994
`
`Sheet 12 of 13
`
`5,371,852
`
`LOCATE AND READ NFS FILEHANDLE,FH,
`IN THE NFS REQUEST MESSAGE
`920
`
`LOCATE AND READ NODE_ID,N,IN NFS FILEHANDLE
`935
`
`SET FUNCTION RETURN VARIABLES
`NODLADDR=N
`NODE2ORT=2049
`
`940
`
`(RETURN RETCODE=OK
`
`FIG.9A
`
`900
`
`IP
`
`UDP
`
`H337
`
`NFS FILEHANDLE
`
`• • •
`
`324
`
`336
`
`915
`
`NFS FILEHANDLE DATA
`
`925
`FIG.9B
`
`NODE ID N
`NL__
`920
`
`HEWLETT PACKARD ENTERPRISE CO.
`EXHIBIT 1007 - PAGE 13
`
`
`
`U.S. Patent
`
`Dec. 6, 1994
`
`Sheet 13 of 13
`
`5,371,852
`
`200
`
`CLUSTER GATEWAY
`
`109
`
`MESSAGE SWITCH
`
`400
`
`125
`
` \I
`1010
`513 „1.__
`
`REMOTE
`HOSTS
`...] 9.2.43.5
`1019
`130
`
`412
`414 416 410
`MESSAGE SWITCH TABLE
`..----
`—...-----
`1/4 PORT
`PROTO
`ON DE`~FUNCTION.
`FUNCTION.
`1006
`1004 ---\.
`/
`513
`TCP
`-f inconn
`
`0
`
`
`
`418
`1000
`
`-1002
`2049
`UDP
`
`0
`
`f NFS
`
`..19.2.43.8
`
`1022
`130
`
`I
`REMOTE LOGIN ROUTING FUNCTION
`CLUSTER CONNECTION TAB
`. s_addr
`t_port
`node./
`•%, ..
`
`1022
`
`9.2.43.8 1022
`9.2.43.5 1019
`
`2
`1
`
`120
`
`1020
`--
`1026
`1024
`
`NODE 1 Ci 10
`
`)
`
`NODE 2
`
`513
`
`r login
`105
`
`"1030 1°4°°
`
`513 r login
`106
`
`FIG.10
`
`HEWLETT PACKARD ENTERPRISE CO.
`EXHIBIT 1007 - PAGE 14
`
`
`
`METHOD AND APPARATUS FOR MAKING A
`CLUSTER OF COMPUTERS APPEAR AS A
`SINGLE HOST ON A NETWORK
`
`25
`
`30
`
`35
`
`5
`
`10
`
`15
`
`20
`
`BACKGROUND OF THE INVENTION
`1. Field of the Invention
`This invention relates to the field of clustering com-
`puters. More specifically, the invention relates to a
`computer cluster which appears to be a single host
`computer when viewed from outside the cluster, e.g.
`from a network of computers.
`2. Description of the Prior Art
`The prior art discloses many ways of increasing com-
`puting power. Two ways are improving hardware per-
`formance and building tightly coupled multiprocessor
`systems. Hardware technology improvements have
`provided an approximately 100% increase in computing
`power every two years. Tightly coupled systems, i.e.,
`systems with multiple processors that all use a single
`real main storage and input/output configuration, in-
`crease computing power by making several processors
`available for computation.
`However, there are limits to these two approaches.
`Future increases in hardware performance may not be
`as dramatic as in the past. Tightly-coupled multiproces-
`sor versions of modern, pipelined and cached proces-
`sors are difficult to design and implement, particularly
`as the number of processors in the system increases.
`Sometimes a new operating system has to be provided
`to make the tightly-coupled systems operate. In addi-
`tion, overhead costs of multi-processor systems often
`reduce the performance of these systems as compared
`to that of a uniprocessor system.
`An alternative way of increasing computer power
`uses loosely-coupled uniprocessor systems. Loosely-
`coupled systems typically are independent and com-
`plete systems which communicate with one another in
`some way. Often the loosely-coupled systems are linked
together on a network, within a cluster, and/or within a
`cluster which is on a network. In loosely coupled sys-
`tems in a cluster, at least one of the systems is connected
`to the network and performs communication functions
`between the cluster and the network.
In the prior art and also shown in FIG. 1A, clusters
`100 comprise two or more computers (also called nodes
`or computer nodes 105 through 109) connected to-
`gether by a communication means 110 in order to ex-
`change information. Nodes (105 through 109) may
share common resources and cooperate in doing work.
`The communication means 110 connecting the comput-
`ers in the cluster together can be any type of high speed
`communication link known in the art, including: 1. a
`network link like a token ring, ethernet, or fiber optic
connection or 2. a computer bus like a memory or sys-
`tem bus. A cluster, for our purposes, also includes two
`or more computers connected together on a network
`120.
`Often, clusters of computers 100 can be connected by
various known communications links 120, i.e., net-
`works, to other computers or clusters. The point at
`which the cluster is connected to the outside network is
`called a boundary or cluster boundary 125. The connec-
`tion 127 at the boundary is bi-directional, i.e., there are
incoming and outgoing messages at the boundary. In-
`formation which originates from a computer (also
`called a host or host computer) 130 that is on the net-
work 120 outside the cluster, which then crosses the
`boundary 127, and which finally enters the cluster 100
`destined for one node (called a destination node) within
`the cluster 100, is called an incoming message. Like-
`wise, a message which originates from a node (called a
`source node) within the cluster 100 and crosses the
`boundary 125 destined for a host 130 on the network
`outside the cluster is called an outgoing message. A
`message from a source node within the cluster 100 to a
`destination also within the cluster 100 is called an inter-
`nal message.
`The prior art includes clusters 100 which connect to
`a network 120 through one of the computer nodes in the
`cluster. This computer, which connects the cluster to
the network at the boundary 125, is called a gateway
`109. In loosely-coupled systems, gateways 109 process
`the incoming and outgoing messages. A gateway 109
`directs or routes messages to (or from) the correct node
`in the cluster. Internal messages do not interact with the
`gateway as such.
`FIG. 1B shows a prior art cluster 100, as shown in
`FIG. 1A, with the gateway 109 connected to a plurality
`(of number q) of networks 120. In this configuration,
`each network 120 has a connection 127 to the gateway
`109. A cluster boundary 125 is therefore created where
`the gateway 109 connects to each network 120.
`FIG. 1C goes on to show another embodiment of the
`prior art. In this embodiment, the cluster 100 has more
`than one computer node (105 through 109) performing
`the function of a gateway 109. The plurality of gate-
`ways 109, designated as G1 through Gp each connect to
`one or more networks 120. In FIG. 1C, gateway G1
`connects to a number r of networks 120, gateway G2
`connects to a number q of networks 120, and gateway
`Gp connects to a number s of networks 120. Using this
`configuration, the prior art nodes within the cluster 100
`are able to communicate with a large number of hosts
`130 on a large number of different networks 120.
`All the prior art known to the inventors uses gate-
`ways 109 to enable external hosts to individually com-
`municate with each node (105 through 109) in the clus-
`ter 100. In other words, the hosts 130 external to the
`cluster 100 on the network 120 have to provide informa-
`tion about any node (105 through 109) within the clus-
`ter 100 before communication can begin with that node.
`The hosts 120 external to the cluster also have to pro-
`vide information about the function running on the
`node which will be accessed or used during the commu-
`nication. Since communication with each node (105
`through 109) must be done individually between any
`external host 130 and any node within the cluster 100,
`the cluster 100 appears as multiple, individual computer
`nodes to hosts outside the cluster. These prior art clus-
`ters do not have an image of a single computer when
`accessed by outside hosts. Examples of prior art which
`lacks this single computer image follow.
`DUNIX is a restructured UNIX kernel which makes
`the several computer nodes within a cluster appear as a
`single machine to other nodes within the cluster. Sys-
`tem calls entered by nodes inside the cluster enter an
`"upper kernel" which runs on each node. At this level
`there is an explicit call to the "switch" component,
`functionally a conventional Remote Procedure Call
`(RPC), which routes the message (on the basis of the
`referred to object) to the proper node. The RPC calls a
`program which is compiled and run. The RPC is used to
`set up the communication links necessary to communi-
cate with a second node in the cluster. A "lower kernel"
`5,371,852
`
`10
`
`4
`maintained, or has failed, the communication will fail. If
`a new node(s) is added to the cluster, i.e., the cluster is
`horizontally expanded, the new node will be unavail-
`able to communicate with other host computers outside
`5 the cluster without adding the proper access codes,
`protocols, and other required information to the outside
`hosts.
`Accordingly, there has been a long felt need for a
`cluster of computers which presents a single computer
`image, i.e., looks like a single computer, to computers
`external to the cluster (gateway) boundary. A single
`computer image cluster would have the capability of
`adding or deleting computers within the cluster; chang-
`15 ing and/or moving processes, operating systems, and
`data among computers within the cluster; changing the
`configuration of cluster resources; redistributing tasks
`among the computer within the cluster; and redirecting
`communications from a failed cluster node to an operat-
`20 ing node, without having to modify or notify any com-
`puter outside the cluster. Further, computers outside
`the cluster, would be able to access information or run
`processes within the cluster without changing the envi-
`ronment where they are operating.
`Systems like DUNIX, Amoeba, Sprite, and V pro-
`vide some degree of a single system image from within
`the cluster (i.e., within the gateway boundaries 125) by
`writing new kernels (in the case of Amoeba, a totally
`new operating system.) This requires extensive system
`30 design effort. In addition, all the nodes of the cluster
`must run the system's modified kernel and communicate
`with servers inside the system using new software and
`protocols.
`LOCUS, TCF and DCE provide single system im-
`ages only for computers which are part of their clusters
`and only with respect to file name spaces and process
`name spaces. In other aspects, the identities of the indi-
`vidual nodes are visible.
`
`3
`running on the second node then processes the message.
`DUNIX is essentially a method for making computers
`within the cluster compatible; there is no facility for
`making the cluster appear as a single computer image
`from outside the cluster.
`Amoeba is another system which provides single
`computer imaging of the multiple nodes within the
`cluster only if viewed from within the cluster. To ac-
`complish this, Amoeba runs an entirely new base oper-
`ating system which has to identify and establish commu-
`nication links with every node within the cluster.
`Amoeba cannot provide a single computer image of the
`cluster to a host computer outside the cluster. Amoeba
`also has to provide an emulator to communicate with
`nodes running UNIX operating systems.
`Sprite is a system which works in an explicitly distrib-
`uted environment, i.e., the operating system is aware of
`every node in the cluster. Sprite provides mechanisms
`for process migration, i.e., moving a partially completed
`program from one node to another. To do this, Sprite
`has to execute RPCs each time a new node is accessed.
`There is no single computer image of the cluster pres-
`ented to the network hosts outside these systems.
`V is a distributed operating system which is able to
`communicate only with nodes (and other clusters)
`which are also running V. UNIX does not run on V.
`Other techniques for managing distributed system
`clusters, include LOCUS, TCF, and DCE. These sys-
`tems require that the operating system know of and
`establish communication with each individual node in a
`cluster before files or processes can be accessed. How-
`ever, once the nodes in the cluster are communicating,
`processes or files can be accessed from any connected
`node in a transparent way. Thus, the file or process is
`accessed as if there were only one computer. These
`systems provide a single system image only for the file
`name space and process name space in these systems. In
`these systems, files and processes can not be accessed by
`host computers outside the cluster unless the host has
`established communication with a specific node within 40
`OBJECTIVES
`the cluster which contains the files and/or processes.
`An objective of this invention is an improved method
`3. Statement of Problems with the Prior Art
`Prior art computer clusters fail to appear as one entity
`and apparatus for routing messages across the boundary
`to any system on the network communicating with
`of a cluster of computers to make the cluster of comput-
`them, i.e., the prior art does not offer the network out-
`45 ers on a network appear as a single computer image to
`side its boundary a single computer image. Because of
`host computers on the network outside the cluster.
`this, i.e., because computers outside the boundary of the
`Also an objective of this invention is an improved
`cluster (meaning outside the boundary 125 of any gate-
`method and apparatus for routing messages across the
`way 109 of the cluster 100) have to communicate indi-
`boundary of a cluster of computers to enable outside
`vidually with each computer within the cluster, com-
`50 host computers on a network to use the same software
`munications with the cluster can be complicated. For
`and network protocols to access functions and informa-
`example, computers outside the boundary of the cluster
`tion within the computer cluster as they would use to
`(hosts) have to know the location of and processes run-
`access those functions and information on a single re-
`ning on each computer within the cluster with which
`mote host.
`they are communicating. The host computers need to
`Also an objective of this invention is an improved
`have the proper communication protocols and access
`method and apparatus for routing messages across the
`authorization for each node within the cluster in order
`boundary of a cluster of computers so that computer
`to establish communication. If a node within the cluster
`nodes within the cluster can communicate with outside
`changes its location, adds or deletes a program, changes
`60 hosts on networks such that, from the viewpoint of the
`communication protocol, or changes access authoriza-
`outside host, the communication is with a single remote
`tion, every host computer external to the cluster for
`host, i.e., the cluster, rather than with the individual
`which the change is relevant has to be informed and
`cluster nodes.
`modified in order reestablish communication with the
`A further objective of this invention is an improved
`altered node within the cluster.
`5 method and apparatus for routing messages across the
`The prior art lack of a single computer image to 6
`boundary of a cluster of computers so that work re-
`outside host computers also limits cluster modification
`quests from outside the cluster can be evenly distributed
`and reliability. If hosts try to communicate with a node
`within the cluster which has been removed, is being
`among the computer nodes in the cluster.
`
`25
`
`35
`
`55
`
`SUMMARY OF THE INVENTION
`This invention, called an encapsulated cluster, is a
`method and apparatus for routing information that
`crosses the boundary of a computer cluster. The infor-
`mation is in the form of port type messages. Both in-
`coming and outgoing messages are routed so that the
`cluster appears as a single computer image to the exter-
`nal host. The encapsulated cluster appears as a single
`host to hosts on the network which are outside the
`cluster.
`The apparatus comprises two or more computer
`nodes connected together by a communication link,
`called an interconnect, to form a cluster. (Note that in
`one embodiment of the invention, the interconnect can
`be a network.) One of the computers in the cluster,
`serving as a gateway, is connected to one or more exter-
`nal computers and/or clusters (hosts) through another
`communication link called a network. A gateway can
`be connected to more than one network and more than
`one node in the cluster can be a gateway. Each gateway
`connection to a network, i.e., boundary, has an address
`on the network. Each gateway has a message switch
`which routes incoming and outgoing messages by
`changing information on the message header based on
`running a specific routing function that is selected using
`port and protocol information in port type messages.
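
The patent supplies no source code; purely as an illustration of the message switch table and port-specific routing functions described above (compare FIG. 4), the following C sketch uses hypothetical names and types:

    /*
     * Illustrative sketch only, not the patented implementation.
     * Each table entry pairs a destination port and protocol with either
     * a fixed cluster node or a routing function that computes the node
     * and port from the message itself.
     */
    #include <stddef.h>
    #include <stdint.h>

    struct route_result {
        uint32_t node_addr;        /* interconnect address of the chosen node */
        uint16_t node_port;        /* port on that node                       */
    };

    /* A port-specific routing function; returns 0 on success, -1 on failure. */
    typedef int (*routing_fn)(const uint8_t *datagram, size_t len,
                              struct route_result *out);

    struct switch_entry {
        uint16_t   port;           /* destination port from the PP header     */
        uint8_t    protocol;       /* protocol field from the MM header       */
        uint32_t   node_addr;      /* fixed node, used when fn is NULL        */
        routing_fn fn;             /* optional routing function (f_1 ... f_N) */
    };

    /* Scan the message switch table for an entry matching the
     * (destination port, protocol) pair of an incoming message. */
    const struct switch_entry *
    lookup_switch_entry(const struct switch_entry *table, size_t n,
                        uint16_t port, uint8_t protocol)
    {
        for (size_t i = 0; i < n; i++)
            if (table[i].port == port && table[i].protocol == protocol)
                return &table[i];
        return NULL;               /* no match: handle in the gateway as usual */
    }

In this reading, a non-NULL routing function models entries such as the NFS or remote-login cases of FIGS. 8-10, while an entry with only a node address models a port that always maps to one node.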
`Since all incoming messages are addressed to the
`gateway, the cluster appears as a single computer to
`hosts outside the cluster that are sending incoming mes-
`sages to nodes within the cluster. When processing
`incoming messages, the gateway first reads a protocol
`field in the message header and analyzes the message to
`determine if it is a port type message originating from a
`location outside the cluster. If the message is of port
`type, the location of the port number on the message is
`found. This port number and protocol type is used to
`search for a match to a port specific routing function in
`a table residing in memory within the message switch. If
`a table entry is matched, a routing function associated
`with the entry is selected and run. The routing function
`routes the message to the proper computer node within
`the cluster by altering information on the incoming
`message so that the message is addressed to the proper
`node within the cluster.
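
For illustration only (the names, addresses and the single table rule below are hypothetical, and checksum handling is omitted), the incoming path just described, and charted in FIGS. 5A-5B, reduces to roughly the following C sketch:

    /*
     * Illustration only: an incoming datagram addressed to the cluster
     * address and carrying a port-type protocol is re-addressed to an
     * internal node and forwarded over the interconnect; anything else
     * is processed in the gateway in the normal manner.
     */
    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    struct parsed_msg {            /* fields the message switch examines */
        uint32_t dst_addr;         /* MM (IP-like) destination address   */
        uint8_t  protocol;         /* MM protocol field                  */
        uint16_t dst_port;         /* PP (TCP/UDP-like) destination port */
    };

    static const uint32_t CLUSTER_ADDR = 0x09022B01;   /* placeholder address */

    static bool is_port_type(uint8_t proto)
    {
        return proto == 6 || proto == 17;      /* TCP or UDP protocol numbers */
    }

    /* Stand-in for the table lookup plus optional routing function: here a
     * single made-up rule maps UDP port 2049 (NFS) to node 2, port 2049. */
    static bool route_incoming(const struct parsed_msg *m,
                               uint32_t *node_addr, uint16_t *node_port)
    {
        if (m->protocol == 17 && m->dst_port == 2049) {
            *node_addr = 2;
            *node_port = 2049;
            return true;
        }
        return false;
    }

    static void handle_incoming(struct parsed_msg *m)
    {
        uint32_t node_addr;
        uint16_t node_port;

        if (m->dst_addr != CLUSTER_ADDR || !is_port_type(m->protocol) ||
            !route_incoming(m, &node_addr, &node_port)) {
            printf("process in gateway in the normal manner\n");
            return;
        }
        m->dst_addr = node_addr;   /* rewrite destination address ...       */
        m->dst_port = node_port;   /* ... and destination port, then        */
        printf("forward to node %u port %u over the interconnect\n",
               (unsigned)node_addr, (unsigned)node_port);
    }

    int main(void)
    {
        struct parsed_msg m = { CLUSTER_ADDR, 17, 2049 };  /* sample NFS request */
        handle_incoming(&m);
        return 0;
    }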
`For outgoing messages, originating from a source
`node within the cluster, the message switch first recog-
`nizes that the message is a port type message that will
`cross the cluster boundary. The message switch then
`alters the message so that the source address is the gate-
`way address rather than the address of the source node.
`In this way, computers external to the cluster perceive
`the message as coming from the gateway computer on
`the network rather than the sending node within the
`cluster.
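
A corresponding sketch of the outgoing case (again with hypothetical names; compare FIG. 6):

    /*
     * Illustration only. When a datagram from a node inside the cluster is
     * about to cross the cluster boundary, the gateway rewrites its source
     * address to the cluster (gateway) address, so the external host sees
     * the message as coming from a single remote host.
     */
    #include <stdint.h>
    #include <stdbool.h>

    /* Hypothetical membership test: here cluster nodes share an address
     * prefix; a real gateway would use its own configuration. */
    static bool addr_in_cluster(uint32_t addr, uint32_t prefix, uint32_t mask)
    {
        return (addr & mask) == prefix;
    }

    /* Returns true if the message is outbound and its source was rewritten. */
    bool rewrite_outgoing(uint32_t *src_addr, uint32_t dst_addr,
                          uint32_t cluster_addr, uint32_t prefix, uint32_t mask)
    {
        if (dst_addr == cluster_addr)              /* inbound: FIG. 5 path     */
            return false;
        if (!addr_in_cluster(dst_addr, prefix, mask) &&
            addr_in_cluster(*src_addr, prefix, mask)) {
            *src_addr = cluster_addr;              /* originate at the gateway */
            return true;
        }
        return false;                              /* internal message: unchanged */
    }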
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`FIGS. 1A-1C show three embodiments of prior art
`computer clusters that are attached to external commu-
`nication links like networks.
`FIG. 2 shows an embodiment of the present inven-
`tion.
FIGS. 3a-3e show the general structure of an incom-
`ing and outgoing message and a more specific message
`structure using the internet communication protocol.
`FIG. 4 shows a preferred embodiment of a message
switch.
`FIG. 5 is a flow chart showing the steps performed
`by the present invention to route an incoming message.
`FIG. 6 is a flow chart showing the steps performed
`by the present invention to route an outgoing message.
`FIG. 7 shows data structures used by a function in the
`message switch which processes a MOUNT request.
`FIG. 8 is a flow chart of the computer program per-
`formed by the function in the message switch which
`processes a MOUNT request.
`FIGS. 9a-9b show data structures and a flow chart
`used by a function in the message switch which pro-
`cesses NFS requests.
`FIG. 10 shows data structures used by functions in
`the message switch which process TCP connection
service requests, in particular rlogin.
`
`DETAILED DESCRIPTION OF THE
`INVENTION
`FIG. 2 shows one embodiment of an encapsulated
cluster 200, the present invention. The cluster com-
`prises a plurality of computer nodes (105 through 109)
`one of which is a gateway 109. The nodes are connected
`together by a high speed interconnect 110, e.g., a net-
`work or any other link that is commonly used in the art.
`The gateway is connected with a bidirectional commu-
`nication link 127 to a network 120. A boundary 125 is
`defined at the connection point between the network
`120 and the gateway 109. Computers, called hosts 130,
connect to the network 120 and can communicate with
`nodes within the cluster by passing messages through
`the gateway 109. An incoming message 210 is shown as
`being sent from a host 130, passing through the cluster
`boundary 125, a gateway port 230, a gateway message
switch 240, a gateway routing function 250, the inter-
`connect 110, and ultimately to the destination, the desti-
`nation node 107 in the cluster 200. In a similar manner,
`an outgoing message 220, is shown originating at a
`source node 105 within the cluster 200; passing through
the interconnect 110, gateway message switch 240,
`gateway port 230, cluster boundary 125, and ultimately
`to the destination host 130.
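
From the point of view of a remote host 130, none of this internal structure is visible; the host simply addresses the gateway. As a purely illustrative sketch (the address and port below are placeholders, not values taken from the patent), an external UNIX host could reach a service inside the cluster with ordinary sockets:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    int main(void)
    {
        struct sockaddr_in gw;
        int s = socket(AF_INET, SOCK_STREAM, 0);
        if (s < 0) { perror("socket"); return 1; }

        memset(&gw, 0, sizeof gw);
        gw.sin_family = AF_INET;
        gw.sin_port   = htons(513);                     /* e.g., the rlogin port of FIG. 10 */
        inet_pton(AF_INET, "192.0.2.1", &gw.sin_addr);  /* placeholder cluster address      */

        /* The host neither knows nor cares which cluster node will serve it;
         * the gateway's message switch routes the connection internally.    */
        if (connect(s, (struct sockaddr *)&gw, sizeof gw) != 0)
            perror("connect");
        close(s);
        return 0;
    }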
`Although FIG. 2 represents a single cluster 200 with
`a single gateway 109, it is readily appreciated that one
skilled in the art given this disclosure could produce
`multiple embodiments using this invention. For exam-
`ple, the cluster 200 might have multiple gateways 109
`each connected to one or more networks or single host
`computers. A single gateway 109 may also have a plu-
rality of network connections, each of which is ca-
`pable of communicating with one or more external
`hosts or one or more external networks. All these em-
`bodiments are within the contemplation of the inven-
`tion.
The encapsulated cluster 200 connects 127 to a high
`speed communication link 120, here called a network
`120. Host computers 130, also connected to the network
`120, communicate with the encapsulated cluster 200,
`and the nodes (105 through 109) within the cluster, over
the network 120. The host computers 130 used in the
`invention include any general purpose computer or
`processor that can be connected together by the net-
`work 120 in any of the many ways known in the art.
`The preferred network 120 is token-ring or ethernet.
However, this high speed communication link 120
`might also include other connections known in the art
`like computer system buses. A host computer 130 could
also be an encapsulated cluster of computers 200, i.e.,
`the present invention, which gives the image of a single
`computer to the network 120.
`Nodes (105 through 109) in the encapsulated cluster
`200 can also comprise any general purpose computer or
`processor. An IBM RISC SYSTEM/6000 was the
`hardware used in the preferred embodiment and is de-
`scribed in the book SA23-2619, "IBM RISC SYS-
`TEM/6000 Technology." (RISC SYSTEM/6000 is a
`trademark of the IBM corporation.) These nodes may
`be independent uniprocessors with their own memory
`and input/output devices. Nodes can also be conven-
`tional multiprocessors whose processors share memory
`and input/output resources.
`Nodes (105 through 109) within the cluster are con-
`nected together by a high speed communications link
`called an interconnect 110. This interconnect includes
`any of the many known high speed methods of connect-
`ing general purpose computers or processors together.
`These interconnects include networks like ethernet,
`token ring