US 20020156612A1

(19) United States
(12) Patent Application Publication     (10) Pub. No.: US 2002/0156612 A1
     Schulter et al.                    (43) Pub. Date: Oct. 24, 2002
(54) ADDRESS RESOLUTION PROTOCOL SYSTEM AND METHOD IN A VIRTUAL NETWORK

(76) Inventors: Peter Schulter, Hampstead, NH (US); Scott Geng, Westboro, MA (US); Pete Manca, Sterling, MA (US); Paul Curtis, Sudbury, MA (US); Ewan Milne, Stow, MA (US); Max Smith, Natick, MA (US); Alan Greenspan, Northboro, MA (US); Edward Duffy, Arlington, MA (US)

Correspondence Address:
Peter M. Dichiara
Hale and Dorr LLP
60 State Street
Boston, MA 02109 (US)

(21) Appl. No.: 10/038,354

(22) Filed: Jan. 4, 2002

Related U.S. Application Data

(60) Provisional application No. 60/285,296, filed on Apr. 20, 2001.

Publication Classification

(51) Int. Cl.7 .............................. G06F 9/455
(52) U.S. Cl. ...............................

(57) ABSTRACT
A virtual networking system and method are disclosed. Switched Ethernet local area network semantics are provided over an underlying point-to-point mesh. Computer processor nodes may directly communicate via virtual interfaces over a switch fabric or they may communicate via an Ethernet switch emulation. Address resolution protocol logic helps associate IP addresses with virtual interfaces while allowing computer processors to reply to ARP requests with MAC addresses.
[Cover figure, FIG. 1: platform 100 with processing nodes 105a-105n (processors, memory, NICs 107), interconnects 110a/110b, switch fabrics 115a/115b, control nodes 120a/120b (management logic, local storage), SAN 130]
Google Exhibit 1019
Google v. VirtaMove
[Sheet 1 of 14, FIG. 1: system diagram of platform 100: processing nodes 105a-105n with processors and memory, interconnects, switch fabrics 115a/115b, control nodes 120a/120b with management logic and local storage, SAN storage]
[Sheet 2 of 14, FIG. 2A: exemplary network arrangement; subnets of processing nodes joined by switches 206 and 208]
[Sheet 3 of 14, FIG. 2B: software communication paths; processor logic 210 instances connected by virtual interfaces to control node-side virtual switch logic 214]
[Sheet 4 of 14, FIG. 2C: physical connections; processor logic 210 and virtual switch logic 214 instances connected over point-to-point links to the switch fabric]
[Sheet 5 of 14, FIG. 3A: processor-side networking; applications, operating system, IP stack, virtual network driver, ARP logic 350, Giganet drivers 320a/320b, switch fabric 115a]
[Sheet 6 of 14, FIG. 3B: control node-side networking; virtual cluster proxy 360, virtual LAN server 335, ARP server logic 355, virtual LAN proxy 340, RCLAN layer 330, Giganet drivers, physical LAN drivers 345]
[Sheet 7 of 14, FIGS. 4A-4B (part): driver transmit flow; dequeue outgoing datagram (405); if the MAC is known, send datagram using MAC address (420)]
[Sheet 8 of 14, FIG. 4B (continued): ARP request flow; driver creates ARP request packet (425); driver prepends TLV and sends to control node for broadcast (430); control node server logic receives ARP request and updates source info in TLV header (435); control node broadcasts ARP request to members (440); driver logic on receipt of ARP request (445); filter ARP packet on local IP (450); create local MAC from packet TLV (460); update ARP table and create ARP reply (465); unicast ARP reply (470)]
[Sheet 9 of 14, FIG. 4C: ARP reply flow; control node receives ARP reply from internal node and updates source node info in TLV of the reply (473); control node unicasts packet to appropriate destination node (475); ARP replier (or load balancer) receives ARP reply (480); update ARP table (485); dequeue datagram from ARP queue (487); select RVI for unicast (493); prepend header TLV and unicast datagram directly on RVI (495)]
[Sheet 10 of 14, FIG. 5: service clusters; Service A and Service B within cluster 505]
[Sheet 11 of 14, FIG. 6: storage software architecture; processor-side storage logic, control-node-side storage logic, configuration logic 605, management interface 610, storage data structure 635, storage 645]
[Sheet 12 of 14, FIG. 7: processor-side storage logic]
[Sheet 13 of 14, FIG. 8: storage address mapping logic; 4-dimensional matrix [H,C,T,L], host channel]
[Sheet 14 of 14, FIG. 9: cluster management logic 905 with free resources, data structure 910, storage data structure 915, networking data structure 920]
ADDRESS RESOLUTION PROTOCOL SYSTEM AND METHOD IN A VIRTUAL NETWORK

BACKGROUND

[0001] 1. Field of the Invention

[0002] The present invention relates to computing systems for enterprises and application service providers and, more specifically, to processing systems having virtualized communication networks.

[0003] 2. Discussion of Related Art
[0004] In current enterprise computing and application service provider environments, personnel from multiple information technology (IT) functions (electrical, networking, etc.) must participate to deploy processing and networking resources. Consequently, because of scheduling and other difficulties in coordinating activities from multiple departments, it can take weeks or months to deploy a new computer server. This lengthy, manual process increases both human and equipment costs, and delays the launch of applications.

[0005] Moreover, because it is difficult to anticipate how much processing power applications will require, managers typically over-provision the amount of computational power. As a result, data-center computing resources often go unutilized or under-utilized.

[0006] If more processing power is eventually needed than originally provisioned, the various IT functions will again need to coordinate activities to deploy more or improved servers, connect them to the communication and storage networks and so forth. This task gets increasingly difficult as the systems become larger.

[0007] Deployment is also problematic. For example, when deploying 24 conventional servers, more than 100 discrete connections may be required to configure the overall system. Managing these cables is an ongoing challenge, and each represents a failure point. Attempting to mitigate the risk of failure by adding redundancy can double the cabling, exacerbating the problem while increasing complexity and costs.

[0008] Provisioning for high availability with today's technology is a difficult and costly proposition. Generally, a failover server must be deployed for every primary server. In addition, complex management software and professional services are usually required.

[0009] Generally, it is not possible to adjust the processing power or upgrade the CPUs on a legacy server. Instead, scaling processor capacity and/or migrating to a vendor's next-generation architecture often requires a "forklift upgrade," meaning more hardware/software systems are added, needing new connections and the like.

[0010] Consequently, there is a need for a system and method of providing a platform for enterprise and ASP computing that addresses the above shortcomings.
SUMMARY

[0012] According to one aspect of the invention, a method and system of implementing an address resolution protocol (ARP) are provided. A computing platform has a plurality of processors connected by an underlying physical network. Logic, executable on one of the processors, defines a topology of an Ethernet network to be emulated on the computing platform. The topology includes processor nodes and a switch node. Logic, executable on one of the processors, assigns a set of processors from the plurality of processors to act as the processor nodes. Logic, executable on one of the processors, assigns virtual MAC addresses to each processor node of the emulated Ethernet network. Logic, executable on one of the processors, allocates virtual interfaces over the underlying physical network to provide direct software communication from each processor node to each other processor node. Each virtual interface has a corresponding identification. Each processor node has ARP request logic to communicate an ARP request to the switch node, in which the ARP request includes an IP address. The switch node includes ARP request broadcast logic to communicate the ARP request to all other processor nodes in the emulated Ethernet network. Each processor node has ARP reply logic to determine whether it is the processor node associated with the IP address in an ARP request and, if so, to issue to the switch node an ARP reply, wherein the ARP reply contains the virtual MAC address of the processor node associated with the IP address. The switch node includes ARP reply logic to receive the ARP reply and to modify the ARP reply to include a virtual interface identification for the ARP requesting node.
BRIEF DESCRIPTION OF THE DRAWINGS

[0013] In the Drawing,

[0014] FIG. 1 is a system diagram illustrating one embodiment of the invention;

[0015] FIGS. 2A-C are diagrams illustrating the communication links established according to one embodiment of the invention;

[0016] FIGS. 3A-B are diagrams illustrating the networking software architecture of certain embodiments of the invention;

[0017] FIGS. 4A-C are flowcharts illustrating driver logic according to certain embodiments of the invention;

[0018] FIG. 5 illustrates service clusters according to certain embodiments of the invention;

[0019] FIG. 6 illustrates the storage software architecture of certain embodiments of the invention;

[0020] FIG. 7 illustrates the processor-side storage logic of certain embodiments of the invention;

[0021] FIG. 8 illustrates the storage address mapping logic of certain embodiments of the invention; and

[0022] FIG. 9 illustrates the cluster management logic of certain embodiments of the invention.
DETAILED DESCRIPTION

[0011] The present invention features a platform and method for computer processing in which virtual processing area networks may be configured and deployed.

[0023] Preferred embodiments of the invention provide a processing platform from which virtual systems may be deployed through configuration commands. The platform provides a large pool of processors from which a subset may be selected and configured through software commands to form a virtualized network of computers ("processing area network" or "processor clusters") that may be deployed to serve a given set of applications or customer. The virtualized processing area network (PAN) may then be used to execute customer-specific applications, such as web-based server applications. The virtualization may include virtualization of local area networks (LANs) or the virtualization of I/O storage. By providing such a platform, processing resources may be deployed rapidly and easily through software via configuration commands, e.g., from an administrator, rather than through physically providing servers, cabling network and storage connections, providing power to each server and so forth.

[0024] Overview of the Platform and its Behavior

[0025] As shown in FIG. 1, a preferred hardware platform 100 includes a set of processing nodes 105a-n connected to switch fabrics 115a,b via high-speed interconnects 110a,b. The switch fabric 115a,b is also connected to at least one control node 120a,b that is in communication with an external IP network 125 (or other data communication network), and with a storage area network (SAN) 130. A management application 135, for example, executing remotely, may access one or more of the control nodes via the IP network 125 to assist in configuring the platform 100 and deploying virtualized PANs.

[0026] Under certain embodiments, about 24 processing nodes 105a-n, two control nodes 120, and two switch fabrics 115a,b are contained in a single chassis and interconnected with a fixed, pre-wired mesh of point-to-point (PtP) links. Each processing node 105 is a board that includes one or more (e.g., 4) processors 106j-l, one or more network interface cards (NICs) 107, and local memory (e.g., greater than 4 Gbytes) that, among other things, includes some BIOS firmware for booting and initialization. There is no local disk for the processors 106; instead all storage, including storage needed for paging, is handled by SAN storage devices 130.

[0027] Each control node 120 is a single board that includes one or more (e.g., 4) processors, local memory, and local disk storage for holding independent copies of the boot image and initial file system that is used to boot operating system software for the processing nodes 105 and for the control nodes 106. Each control node communicates with SAN 130 via 100 megabyte/second fibre channel adapter cards 128 connected to fibre channel links 122, 124 and communicates with the Internet (or any other external network) 125 via an external network interface 129 having one or more Gigabit Ethernet NICs connected to Gigabit Ethernet links 121, 123. (Many other techniques and hardware may be used for SAN and external network connectivity.) Each control node includes a low speed Ethernet port (not shown) as a dedicated management port, which may be used instead of remote, web-based management via management application 135.

[0028] The switch fabric is composed of one or more 30-port Giganet switches 115, such as the NIC-CLAN 1000 and clan 5300 switch, and the various processing and control nodes use corresponding NICs for communication with such a fabric module. Giganet switch fabrics have the semantics of a Non-Broadcast Multiple Access (NBMA) network. All inter-node communication is via a switch fabric. Each link is formed as a serial connection between a NIC 107 and a port in the switch fabric 115. Each link operates at 112 megabytes/second.

[0029] In some embodiments, multiple cabinets or chassises may be connected together to form larger platforms. And in other embodiments the configuration may differ; for example, redundant connections, switches and control nodes may be eliminated.

[0030] Under software control, the platform supports multiple, simultaneous and independent processing area networks (PANs). Each PAN, through software commands, is configured to have a corresponding subset of processors 106 that may communicate via a virtual local area network that is emulated over the PtP mesh. Each PAN is also configured to have a corresponding virtual I/O subsystem. No physical deployment or cabling is needed to establish a PAN. Under certain preferred embodiments, software logic executing on the processor nodes and/or the control nodes emulates switched Ethernet semantics; other software logic executing on the processor nodes and/or the control nodes provides virtual storage subsystem functionality that follows SCSI semantics and that provides independent I/O address spaces for each PAN.

Network Architecture

Certain preferred embodiments allow an administrator to build virtual, emulated LANs using virtual components, interfaces, and connections. Each of the virtual LANs can be internal and private to the platform 100, or multiple processors may be formed into a processor cluster externally visible as a single IP address.

[0031] Under certain embodiments, the virtual networks so created emulate a switched Ethernet network, though the physical, underlying network is a PtP mesh. The virtual network utilizes IEEE MAC addresses, and the processing nodes support IETF ARP processing to identify and associate IP addresses with MAC addresses. Consequently, a given processor node replies to an ARP request consistently whether the ARP request came from a node internal or external to the platform.

[0032] FIG. 2A shows an exemplary network arrangement that may be modeled or emulated. A first subnet 202 is formed by processing nodes PNa, PNb and PNc that may communicate with one another via switch 206. A second subnet 204 is formed by processing nodes PNm and PNn that may communicate with one another via switch 208. Under switched Ethernet semantics, one node on a subnet may communicate directly with another node on the subnet; for example, PNa may send a message to PNb. The semantics also allow one node to communicate with a set of the other nodes; for example PNa may send a broadcast message to other nodes. The processing nodes PNa and PNb cannot directly communicate with PNm, because PNm is on a different subnet. For PNa and PNb to communicate with PNm, higher layer networking software would need to be utilized, which software would have a fuller understanding of both subnets. Though not shown in the figure, a given switch may communicate via an "uplink" to another switch or the like. As will be appreciated given the description below, the need for such uplinks is different than their need when the switches are physical. Specifically, since the switches are virtual and modeled in software they may scale horizontally as wide as needed. (In contrast, physical switches have a fixed number of physical ports and sometimes the uplinks are needed to provide horizontal scalability.)
[0033] FIG. 2B shows exemplary software communication paths and logic used under certain embodiments to model the subnets 202 and 204 of FIG. 2A. The communication paths 212 connect processing nodes PNa, PNc, PNm, and PNn, specifically their corresponding processor-side network communication logic 210, and they also connect processing nodes to control nodes. (Though drawn as a single instance of logic for the purpose of clarity, PNa may have multiple instances of the corresponding processor logic, one per subnet, for example.) Under preferred embodiments, management logic and the control node logic are responsible for establishing, managing and destroying the communication paths. The individual processing nodes are not permitted to establish such paths.
[0034] As will be explained in detail below, the processor logic and the control node logic together emulate switched Ethernet semantics over such communication paths. For example, the control nodes have control node-side virtual switch logic 214 to emulate some (but not necessarily all) of the semantics of an Ethernet switch, and the processor logic includes logic to emulate some (but not necessarily all) of the semantics of an Ethernet driver.
[0035] Within a subnet, one processor node may communicate directly with another via a corresponding virtual interface 212. Likewise, a processor node may communicate with the control node logic via a separate virtual interface. Under certain embodiments, the underlying switch fabric and associated logic (e.g., switch fabric manager logic, not shown) provides the ability to establish and manage such virtual interfaces (VIs) over the point to point mesh. Moreover, these virtual interfaces may be established in a reliable, redundant fashion and are referred to herein as RVIs. At points in this description, the terms virtual interface (VI) and reliable virtual interface (RVI) are used interchangeably, as the choice between a VI versus an RVI largely depends on the amount of reliability desired by the system at the expense of system resources.
[0036] Referring conjointly to FIGS. 2A-B, if node PNa is to communicate with node PNc, it does so ordinarily by virtual interface 212a-c. However, preferred embodiments allow communication between PNa and PNc to occur via switch emulation logic, if for example VI 212a-c is not operating satisfactorily. In this case a message may be sent via VI 212a-switch206 and via VI 212switch206-c. If PNa is to broadcast or multicast a message to other nodes in the subnet 202 it does so by sending the message to control node-side logic 214 via virtual interface 212a-switch206. Control node-side logic 214 then emulates the broadcast or multicast functionality by cloning and sending the message to the other relevant nodes using the relevant VIs. The same or analogous VIs may be used to convey other messages requiring control node-side logic. For example, as will be described below, control node-side logic includes logic to support the address resolution protocol (ARP), and VIs are used to communicate ARP replies and requests to the control node. Though the above description suggests just one VI between processor logic and control logic, many embodiments employ several such connections. Moreover, though the figures suggest symmetry in the software communication paths, the architecture actually allows asymmetric communication. For example, as will be discussed below, for communication with clustered services the packets would be routed via the control node. However, return communication may be direct between nodes.
[0037] Notice that like the network of FIG. 2A, there is no mechanism for communication between node PNa and PNm. Moreover, by having communication paths managed and created centrally (instead of via the processing nodes) such a path is not creatable by the processing nodes, and the defined subnet connectivity cannot be violated by a processor.
[0038] FIG. 2C shows the exemplary physical connections of certain embodiments to realize the subnets of FIGS. 2A and B. Specifically, each instance of processing network logic 210 communicates with the switch fabric 115 via PtP links 216 of interconnect 110. Likewise, the control node has multiple instances of switch logic 214 and each communicates over a PtP connection 216 to the switch fabric. The virtual interfaces of FIG. 2B include the logic to convey information over these physical links, as will be described further below.
[0039] To create and configure such networks, an administrator defines the network topology of a PAN and specifies (e.g., via a utility within the management software 135) MAC address assignments of the various nodes. The MAC address is virtual, identifying a virtual interface, and not tied to any specific physical node. Under certain embodiments, MAC addresses follow the IEEE 48 bit address format, but in which the contents include a "locally administered" bit (set to 1), the serial number of the control node 120 on which the virtual interface was originally defined (more below), and a count value from a persistent sequence counter on the control node that is kept in NVRAM in the control node. These MACs will be used to identify the nodes (as is conventional) at a layer 2 level. For example, in replying to ARP requests (whether from a node internal to the PAN or on an external network) these MACs will be included in the ARP reply.
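The address layout just described can be made concrete with a short sketch. This is a minimal illustration, not the patent's implementation: the exact field widths and byte positions of the serial number and counter are not specified, so the packing below is an assumption.

```c
#include <stdint.h>

/* Illustrative packing of the virtual MAC described in [0039]: the
 * locally administered bit is set in byte 0, and the remaining bytes
 * carry the defining control node's serial number and a value from its
 * persistent NVRAM sequence counter. Field widths are assumptions. */
static void make_virtual_mac(uint8_t mac[6],
                             uint16_t control_node_serial,
                             uint32_t nvram_count)
{
    mac[0] = 0x02;                          /* locally administered, unicast */
    mac[1] = (uint8_t)(control_node_serial >> 8);
    mac[2] = (uint8_t)(control_node_serial);
    mac[3] = (uint8_t)(nvram_count >> 16);
    mac[4] = (uint8_t)(nvram_count >> 8);
    mac[5] = (uint8_t)(nvram_count);
}
```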
[0040] The control node-side networking logic maintains data structures that contain information reflecting the connectivity of the LAN (e.g., which nodes may communicate with which other nodes). The control node logic also allocates and assigns VI (or RVI) mappings to the defined MAC addresses and allocates and assigns VIs (or RVIs) between the control nodes and between the control nodes and the processing nodes. In the example of FIG. 2A, the logic would allocate and assign VIs 212 of FIG. 2B. (The naming of the VIs and RVIs in some embodiments is a consequence of the switching fabric and the switch fabric manager logic employed.)
[0041] As each processor boots, BIOS-based boot logic initializes each processor 106 of the node 105 and, among other things, establishes a (or discovers the) VI 212 to the control node logic. The processor node then obtains from the control node relevant data link information, such as the processor node's MAC address, and the MAC identities of other devices within the same data link configuration. Each processor then registers its IP address with the control node, which then binds the IP address to the node and an RVI (e.g., the RVI on which the registration arrived). In this fashion, the control node will be able to bind IP addresses for each virtual MAC for each node on a subnet. In addition to the above, the processor node also obtains the RVI or VI-related information for its connections to other nodes or to control node networking logic.
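The registration step just described implies a per-subnet table on the control node mapping each registered IP address to a virtual MAC and the RVI the registration arrived on. A minimal sketch follows; the structure and names are hypothetical, since the patent does not specify data layouts.

```c
#include <stdint.h>

/* Hypothetical control-node binding table for [0041]: each registered
 * IP address is bound to the node's virtual MAC and the RVI on which
 * the registration arrived. Sizes are illustrative only. */
struct ip_binding {
    uint32_t ip;        /* registered IP address */
    uint8_t  mac[6];    /* virtual MAC assigned to the node */
    int      rvi;       /* RVI handle the registration arrived on */
};

struct binding_table {
    struct ip_binding entries[256];
    int               count;
};

/* Record a registration; later ARP processing consults this table. */
static void bind_ip(struct binding_table *t, uint32_t ip,
                    const uint8_t mac[6], int rvi)
{
    struct ip_binding *b = &t->entries[t->count++];
    b->ip = ip;
    b->rvi = rvi;
    for (int i = 0; i < 6; i++)
        b->mac[i] = mac[i];
}
```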
[0042] Thus, after boot and initialization, the various processor nodes should understand their layer 2, data link connectivity. As will be explained below, layer 3 (IP) connectivity and specifically layer 3 to layer 2 associations are determined during normal processing of the processors as a consequence of the address resolution protocol.
[0043] FIG. 3A details the processor-side networking logic 210 and FIG. 3B details the control node-side networking logic 310 of certain embodiments. The processor-side logic 210 includes IP stack 305, virtual network driver 310, ARP logic 350, RCLAN layer 315, and redundant Giganet drivers 320a,b. The control node-side logic 310 includes redundant Giganet drivers 325a,b, RCLAN layer 330, virtual cluster proxy logic 360, virtual LAN server 335, ARP server logic 355, virtual LAN proxy 340, and physical LAN drivers 345.
IP Stack

[0044] The IP stack 305 is the communication protocol stack provided with the operating system (e.g., Linux) used by the processing nodes 106. The IP stack provides a layer 3 interface for the applications and operating system executing on a processor 106 to communicate with the simulated Ethernet network. The IP stack provides packets of information to the virtual Ethernet layer 310 in conjunction with providing a layer 3, IP address as a destination for that packet. The IP stack logic is conventional except that certain embodiments avoid checksum calculations and logic.
Virtual Ethernet Driver

[0045] The virtual Ethernet driver 310 will appear to the IP stack 305 like a "real" Ethernet driver. In this regard, the virtual Ethernet driver 310 receives IP packets or datagrams from the IP stack for subsequent transmission on the network, and it receives packet information from the network to be delivered to the stack as an IP packet.
[0046] The stack builds the MAC header. The "normal" Ethernet code in the stack may be used. The virtual Ethernet driver receives the packet with the MAC header already built and the correct MAC address already in the header.
[0047] In material part and with reference to FIGS. 4A-C, the virtual Ethernet driver 310 dequeues 405 outgoing IP datagrams so that the packet may be sent on the network. The standard IP stack ARP logic is used. The driver, as will be explained below, intercepts all ARP packets entering and leaving the system to modify them so that the proper information ends up in each node's ARP tables. The normal ARP logic places the correct MAC address in the link layer header of the outgoing packet before the packet is queued to the Ethernet driver. The driver then just examines the link layer header and destination MAC to determine how to send the packet. The driver does not directly manipulate the ARP table (except for the occasional invalidation of ARP entries).
[0048] The driver 310 determines 415 whether ARP logic 350 has MAC address information (more below) associated with the IP address in the dequeued packet. If the ARP logic 350 has the information, the information is used to send 420 the packet accordingly. If the ARP logic 350 does not have the information, the driver needs to determine such information, and in certain preferred embodiments, this information is obtained as a result of an implementation of the ARP protocol as discussed in connection with FIGS. 4B-C.
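The decision in steps 415 and 420 can be summarized in a short transmit skeleton. This is a sketch of the flow only; arp_lookup(), send_with_mac(), enqueue_pending() and start_arp() are hypothetical stand-ins for logic the patent describes functionally.

```c
#include <stdbool.h>
#include <stdint.h>

struct datagram { uint32_t dest_ip; /* payload omitted for brevity */ };

/* Hypothetical stand-ins for logic the patent describes functionally. */
bool arp_lookup(uint32_t ip, uint8_t mac[6]);        /* ARP logic 350 */
void send_with_mac(struct datagram *dg, const uint8_t mac[6]);
void enqueue_pending(struct datagram *dg);           /* hold until resolved */
void start_arp(uint32_t ip);                         /* FIGS. 4B-C flow */

/* Transmit path of [0047]-[0048]: dequeue an outgoing datagram, ask the
 * ARP logic for the destination MAC, and either send it (420) or hold
 * the packet and begin ARP resolution. */
void driver_transmit(struct datagram *dg)
{
    uint8_t mac[6];

    if (arp_lookup(dg->dest_ip, mac)) {  /* step 415: mapping known? */
        send_with_mac(dg, mac);          /* step 420 */
    } else {
        enqueue_pending(dg);
        start_arp(dg->dest_ip);
    }
}
```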
[0049] If the ARP logic 350 has the MAC address information, the driver analyzes the information returned from the ARP logic 350 to determine where and how to send the packet. Specifically, the driver looks at the address to determine whether the MAC address is in a valid format or in a particular invalid format. For example, in one embodiment, internal nodes (i.e., PAN nodes internal to the platform) are signaled through a combination of setting the locally administered bit, the multicast bit, and another predefined bit pattern in the first byte of the MAC address. The overarching pattern is one which is highly improbable of being a valid pattern.
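A check of this kind might look like the following sketch. The locally administered and multicast bits have their standard IEEE values in byte 0; the additional predefined pattern is not disclosed, so MAC_INTERNAL_MARK below is purely illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAC_LOCALLY_ADMIN  0x02   /* IEEE locally administered bit */
#define MAC_MULTICAST      0x01   /* IEEE group (multicast) bit */
#define MAC_INTERNAL_MARK  0xF0   /* hypothetical predefined pattern */

/* Paragraph [0049]: an internal PAN node is signaled by a first-byte
 * combination that is highly improbable for a real unicast MAC. */
static bool mac_is_internal_node(const uint8_t mac[6])
{
    uint8_t mark = MAC_LOCALLY_ADMIN | MAC_MULTICAST | MAC_INTERNAL_MARK;
    return (mac[0] & mark) == mark;
}
```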
[0050] If the MAC address returned from the ARP logic is in a valid format, the IP address associated with that MAC address is for a node external at least to the relevant subnet and in preferred embodiments is external to the platform. To deliver such a packet, the driver prepends the packet with a TLV (type-length-value) header. The logic then sends the packet to the control node over a pre-established VI. The control node then handles the rest of the transmission as appropriate.
[0051] If the MAC address information returned from the ARP logic 350 is in a particular invalid format, the invalid format signals that the IP-addressed node is an internal node, and the information in the MAC address information is used to help identify the VI (or RVI) directly connecting the two processing nodes. For example, the ARP table entry may hold information identifying the RVI 212 to use to send the packet, e.g., 212a-c, to another processing node. The driver prepends the packet with a TLV header. It then places address information into the header as well as information identifying the Ethernet protocol type. The logic then selects the appropriate VI (or RVI) on which to send the encapsulated packet. If that VI (or RVI) is operating satisfactorily it is used to carry the packet; if it is operating unsatisfactorily the packet is sent to the control node switch logic (more below) so that the switch logic can send it to the appropriate node. Though the ARP table may contain information to actually specify the RVI to use, many other techniques may be employed. For example, the information in the table may indirectly provide such information, e.g., by pointing to the information of interest or otherwise identifying the information of interest though not containing it.
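The TLV encapsulation of paragraphs [0050] and [0051] can be sketched as below. The field sizes and ordering are assumptions; the patent says only that the header carries address information and the Ethernet protocol type.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative TLV (type-length-value) header; the packing shown is an
 * assumption, not the patent's wire format. */
struct tlv_header {
    uint16_t type;        /* message class, e.g. unicast vs. broadcast */
    uint16_t length;      /* payload length in bytes */
    uint16_t ether_type;  /* Ethernet protocol type of the payload */
    uint8_t  dest_mac[6]; /* addressing information */
};

/* Prepend the header into a buffer reserved ahead of the payload; the
 * result is then queued on the selected VI/RVI, or sent to the control
 * node switch logic if that VI is not operating satisfactorily. */
static void tlv_prepend(uint8_t *out, const struct tlv_header *h,
                        const uint8_t *payload, uint16_t len)
{
    memcpy(out, h, sizeof(*h));            /* 12 bytes on common ABIs */
    memcpy(out + sizeof(*h), payload, len);
}
```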
[0052] For any multicast or broadcast type messages, the driver sends the message to the control node on a defined VI. The control node then clones the packet and sends it to all nodes (excluding the sending node) and the uplink accordingly.
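On the control node side, this broadcast emulation reduces to a clone-and-forward loop over the LAN's membership. A hedged sketch, with a hypothetical member table and vi_send() primitive:

```c
#include <stdint.h>

struct lan {
    int nmembers;        /* nodes on this emulated LAN */
    int member_vi[32];   /* VI handle per member (hypothetical) */
    int uplink_vi;       /* -1 if the LAN has no uplink */
};

void vi_send(int vi, const uint8_t *pkt, uint16_t len);  /* stand-in */

/* Broadcast emulation of [0052]: clone the packet to every member of
 * the LAN except the sender, then to the uplink if one exists. */
void control_node_broadcast(const struct lan *lan, int sender,
                            const uint8_t *pkt, uint16_t len)
{
    for (int i = 0; i < lan->nmembers; i++) {
        if (i == sender)
            continue;                    /* exclude the sending node */
        vi_send(lan->member_vi[i], pkt, len);
    }
    if (lan->uplink_vi >= 0)
        vi_send(lan->uplink_vi, pkt, len);
}
```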
[0053] If there is no ARP mapping then the upper layers would never have sent the packet to the driver. If there is no datalink layer mapping available, the packet is put aside until ARP resolution is completed. Once the ARP layer has finished ARPing, the packets held back pending ARP get their datalink headers built and the packets are then sent to the driver.
[0054] If the ARP logic has no mapping for an IP address of an IP packet from the IP stack and, consequently, the driver 310 is unable to determine the associated addressing information (i.e., MAC address or RVI-related information), the driver obtains such information by following the ARP protocol. Referring to FIGS. 4B-C, the driver builds 425 an ARP request packet containing the relevant IP address for which there is no MAC mapping in the local ARP table. The node then prepends 430 the ARP packet with a TLV-type header. The ARP request is then sent via a dedicated RVI to the control node-side networking logic—specifically, the virtual LAN server 335.
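Steps 425 and 430, restated here from FIG. 4B, amount to building the ARP request behind a reserved TLV header and pushing it down the dedicated RVI. The sketch below assumes an abbreviated TLV header and hypothetical build_arp_request() and rvi_send() helpers; the type code and RVI handle are illustrative.

```c
#include <stdint.h>
#include <string.h>

#define TLV_ARP              0x0001  /* hypothetical type code */
#define CONTROL_NODE_ARP_RVI 0       /* hypothetical dedicated RVI handle */

struct tlv_hdr { uint16_t type, length; };  /* abbreviated for this sketch */

uint16_t build_arp_request(uint8_t *out, uint32_t ip);   /* step 425 */
void rvi_send(int rvi, const uint8_t *pkt, uint16_t len);

/* Build the ARP request behind a reserved TLV header (step 430) and
 * send it on the dedicated RVI to the virtual LAN server 335. */
void driver_resolve(uint32_t unresolved_ip)
{
    uint8_t pkt[128];
    uint16_t len = build_arp_request(pkt + sizeof(struct tlv_hdr),
                                     unresolved_ip);
    struct tlv_hdr h = { .type = TLV_ARP, .length = len };
    memcpy(pkt, &h, sizeof(h));
    rvi_send(CONTROL_NODE_ARP_RVI, pkt, (uint16_t)(sizeof(h) + len));
}
```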