Schmidt et al.

(10) Patent No.: US 6,931,449 B2
(45) Date of Patent: Aug. 16, 2005

(54) METHOD MIGRATING OPEN NETWORK CONNECTIONS

(75) Inventors: Brian K. Schmidt, Mountain View, CA (US);
     James G. Hanko, Redwood City, CA (US)

(73) Assignee: Sun Microsystems, Inc., Santa Clara, CA (US)

(*) Notice: Subject to any disclaimer, the term of this
     patent is extended or adjusted under 35 U.S.C. 154(b)
     by 641 days.

(21) Appl. No.: 09/816,996

(22) Filed: Mar. 22, 2001

(65) Prior Publication Data
     US 2002/0138629 A1    Sep. 26, 2002

(51) Int. Cl.⁷: G06F 15/177; G06F 15/16
(52) U.S. Cl.: 709/228; 709/220; 709/221; 709/248
(58) Field of Search: 709/220, 221, 228, 248

(56) References Cited

U.S. PATENT DOCUMENTS

5,825,649 A *      10/1998   Yoshimura         700/82
5,938,722 A         8/1999   Johnson
6,151,590 A *      11/2000   Cordery et al.    705/60
6,587,874 B1 *      7/2003   Golla et al.      709/220
2002/0085549 A1 *   7/2002   Reza et al.       370/389
2003/0172080 A1 *   9/2003   Talanis et al.    707/100

FOREIGN PATENT DOCUMENTS

WO   WO 200122743 A2 *   3/2001   H04L/12/56

OTHER PUBLICATIONS

“Technical White Paper” VMware, Inc., Feb. 1999, pp. 1-7
(XP-002241055).
“Nomadic Computing Environment Employing Wired and
Wireless Networks” by Toshiaki Tanaka, Masahiro
Morikura and Hitoshi Takanashi, IEICE Trans. Commun.,
vol. E81-B No. 8, Aug. 8, 1998, pp. 1565-1573
(XP-000788462).

* cited by examiner

Primary Examiner: Jeffrey Gaffin
Assistant Examiner: Angel L. Casiano
(74) Attorney, Agent, or Firm: Martine, Penilla &
Gencarella, LLP
(57) ABSTRACT

A mechanism for the migration of open network connections
is described herein. According to one or more embodiments
of the present invention, an active computing environment
called a compute capsule is provided. Each capsule has a
unique locator. Packets are used to send information
between capsules using the locators. When a capsule
migrates, any open network connections that existed before
the migration may continue when the capsule finishes the
migration.
`
`16 Claims, 11 Drawing Sheets
`
[Representative drawing: CAPSULE 400 on a machine with MACHINE IP
ADDRESS 450, and CAPSULE DIRECTORY SERVICE 402. Labeled steps:
1. REQUEST FOR LOCATOR; 2. CAPSULE LOCATOR; 3. RECORD ASSIGNED
IP ADDRESS.]
`
`Google Exhibit 1007
`Google v. VirtaMove
`
`
`
`
U.S. Patent    Aug. 16, 2005    Sheet 1 of 11    US 6,931,449 B2

[The drawing on this sheet is not recoverable from the extracted text.]
`
`
`
`
`
`
`
Sheet 2 of 11

[FIGURE 2A: protocol stack with APPLICATION LAYER 206, PRESENTATION
LAYER 205, SESSION LAYER 210, TRANSPORT LAYER 215, NETWORK LAYER 220,
DATALINK LAYER 225, and PHYSICAL LAYER 230. Address labels: IP ADDRESS
FOR EACH MACHINE 240; element 245.]

[FIGURE 2B: the same seven layers (206, 205, 210, 215, 220, 225, 230).
Address labels: IP ADDRESS FOR EACH CAPSULE 255; IP ADDRESS FOR EACH
MACHINE 260; IP ADDRESS FOR EACH NETWORK 270.]
`
`
`
Sheet 3 of 11

[FIGURE 3: flowchart.
300 ASSIGN UNIQUE LOCATOR TO CAPSULE
301 OBTAIN ALL PROCESSES OF USER
302 CAPTURE THE STATE OF EACH PROCESS
303 ENCAPSULATE ALL PROCESSES IN A CAPSULE]
`
`
`
Sheet 4 of 11

[FIGURE 4: CAPSULE 400 on a machine with MACHINE IP ADDRESS 450, and
CAPSULE DIRECTORY SERVICE 402. Labeled steps: 1. REQUEST FOR LOCATOR;
2. CAPSULE LOCATOR; 3. RECORD ASSIGNED IP ADDRESS.]
`
`
`
Sheet 5 of 11

[FIGURE 5A: flowchart.
500 FIRST CAPSULE RECEIVES PACKETS FROM SECOND CAPSULE
501 FIRST CAPSULE DECIDES TO MIGRATE
502 FIRST AND SECOND CAPSULES SYNCHRONIZE
503 FIRST CAPSULE EXITS THE SYSTEM]
`
`
`
Sheet 6 of 11

[FIGURE 5B: flowchart.
550 TWO HOST NETWORKS ESTABLISH COMMUNICATION AT THE KERNEL LEVEL
555 DETERMINE NUMBER AND KIND OF PACKETS ALREADY TRANSFERRED
    BETWEEN A FIRST AND SECOND CAPSULE
560 FIRST AND SECOND CAPSULES AGREE WHEN TO STOP SENDING PACKETS
565 DETERMINE PORT NUMBERS FOR FIRST AND SECOND CAPSULES
570 FIRST AND SECOND CAPSULES AGREE TO STATE OF CAPSULE JUST
    BEFORE MIGRATION]
`
`
`
Sheet 7 of 11

[FIGURE 6: flowchart.
600 CAPSULE MIGRATES
610 APPLICATIONS COMMUNICATING WITH CAPSULE CONTINUE TO WRITE DATA
    TO THE SOCKET AND IT IS PLACED IN A BUFFER
620 BUFFER FULL? (one branch labeled NO)
630 APPLICATION SPECIFIC ERROR HANDLING ROUTINES INITIATED
640 IS MIGRATION COMPLETE?
650 SEND BUFFERED PACKETS]
`
`
`
Sheet 8 of 11

[FIGURE 7: CAPSULE 700 and CAPSULE DIRECTORY SERVICE 702. Labeled
steps: 1. LOOKUP LOCATOR FOR ANOTHER CAPSULE(2); 2. RETURN
CAPSULE(2)'S LOCATOR.]
`
`
`
Sheet 9 of 11

[FIGURE 8: capsule-level packet with fields CAP1 IP (SRC), CAP2 IP
(DST), PAYLOAD; MACHINE 2 801; SYSTEM; TRANSLATION TABLE 804; NETWORK
DRIVER 803; machine-level packet with fields MACHINE 1 IP (SRC),
MACHINE 2 IP (DST), PAYLOAD.]
`
`
`
Sheet 10 of 11

[FIGURE 9: CAPSULE (CAPSULE IP) in a NON-LEGACY SYSTEM; LEGACY SYSTEM
(MACHINE2 IP) on a STANDARD NETWORK; EDGE OF LEGACY AND NON-LEGACY
NETWORKS; SPECIAL ROUTER 900 with CAPSULE TRANSLATION TABLE 901.
Packets shown: PKT with fields MACHINE2 IP (SRC), CAP IP (DST),
PAYLOAD; PKT with fields MACHINE1 IP (SRC), MACHINE2 IP (DST),
PAYLOAD.]
`
`
`
Sheet 11 of 11

[FIGURE 10: block diagram, mostly unrecoverable from the extracted
text. Legible labels: PROCESSOR, SERVER, MAIN MEMORY, NETWORK LINK,
LOCAL NETWORK. Legible reference numerals: 1012, 1013, 1015, 1021,
1022, 1024.]
`
`
`
`METHOD MIGRATING OPEN NETWORK
`CONNECTIONS
`
`BACKGROUND OF THE INVENTION
`
`1. Field of the Invention
`
`The present invention relates primarily to the field of
`computer networks, and in particular to migrating open
`network connections.
`
`2. Background Art
`Sometimes a person’s interaction with a computer
`involves using one or more computer programs
`(applications) that initiate connections to other computers
`over a computer network (open network connections).
`Sometimes it is desirable for this person to stop the inter-
`action with the computer, to move to a new computer, and
`to begin interacting with the new computer at precisely the
`point where the user stopped interacting with the first
`computer. Using current schemes, however,
`this is not
`possible because the user’s computing environment cannot
`be represented in a form that can be understood by both
`computers and moved between these computers.
However, in co-pending U.S. patent application entitled
“Method and Apparatus for Representing and Encapsulating
Active Computing Environments,” Application No. 09/764,771,
filed on Jan. 16, 2001, assigned to the assignee of the
present application, and hereby fully incorporated into the
present application by reference, it was described how a
group of active processes and their associated state could be
represented in a form that made it possible to halt the active
processes, to move them to a different binary compatible
machine, or to suspend them on disk for later revival on the
same or a different machine.
`
Still, however, it is not possible to move active computing
environments and still maintain the open network connections.
Before further discussing the drawbacks of current schemes,
it is instructive to discuss how the nature of computing is
changing.
`The Nature of Computing
`The nature of computing is changing. Until recently,
`modern computing was mostly “machine-centric”, where a
`user accessed a dedicated computer at a single location. The
`dedicated computer had all the data and computer programs
`necessary for the user to operate the computer, and ideally,
`it had large amounts of hardware, such as disk drives,
memory, processors, and the like. With the advent of computer
networks, however, different computers have become
more desirable and the focus of computing has become
“service-oriented”. In particular, computer networks allow a
user to access data and computer programs that exist elsewhere
in the network. When the user accesses such data or
computer programs, the remote computer is said to be
providing a service to the user. With the improvement in
services available to users, the need to have a dedicated
computer following the machine-centric paradigm is greatly
reduced. The machine-centric paradigm also becomes much
less practical in this environment because distributing services
is much more cost-effective.
`
In particular, computers in a service-oriented environment
have little need for powerful hardware. For instance, the
remote computer processes the instructions before providing
the service, so a powerful processor is not needed on the
local access hardware. Similarly, since the service is providing
the data, there is little need to have large capacity disk
drives on the local access hardware. In such an environment,
one advantage is that computer systems have been implemented
that allow a user to access any computer in the
system and still use the computer in the same manner (i.e.,
have access to the same data and computer programs).
`For instance, a user may be in location A and running a
`word processor, a web browser, and an interactive multime-
`dia simulation. In a service-oriented environment, the user
`might stop using the server computer in location A and move
`to location B where the user could resume these computer
`programs on a different machine at the exact point where the
user stopped using the machine at location A, as long as both
`computers had access via the computer network to the
`servers where the programs were being executed. The run-
`ning programs themselves in this example, however, cannot
`be moved between computers because of the design of
`current operating systems.
`Migration
Moving (or migrating) the programs between servers is
desirable, for instance, when the remote computer performing
the data processing and running the computer programs
becomes busy or is off-line for repair or upgrades. In other
instances it is desirable for a user to suspend the programs
as they are in progress, for instance, using a disk, and to
resume the programs later on a different machine. Often
some of the processes to be suspended or moved may have
open network connections. If the user is using a database
program, it may be connected to a remote server where the
database resides, for example. Currently when the user
migrates to another machine, this connection is lost and has
to be re-established.
One scheme leaves behind routers to act as forwarding
agents for the new machine. These agents add to the
overhead cost of the network, and slow down the communications
process if there are many of them in the network.
Furthermore, leaving behind forwarding agents also means
that the user’s session may not operate properly if any of the
computers holding a forwarding agent for the session fails.
Leaving behind forwarding agents increases the dependency
of the computing environment on remote machines, which is
an unbounded problem.
One problem with current packet based schemes to route
information across a network is that the packets of information
sent back and forth between machines have some
information that is unique to each machine. In particular,
each packet contains two parts, the header and the payload.
The header contains routing information and the payload
contains the actual data. Part of the routing information is
the Internet Protocol address (IP address) of the machine on
which the process is running. When the user migrates to
another machine the IP address changes. Not only are all
packets received prior to the migration lost, but any packets
remaining in the transfer will not reach the user who has
migrated to another machine because the IP address of that
machine is different. All packets sent to and from this new
machine will now have a different IP address as part of the
header section.
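
The addressing problem described above can be illustrated with a minimal sketch. The `Packet` type, the `is_deliverable` helper, and all addresses below are hypothetical names chosen for illustration; they are not taken from any real network stack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """A packet as described above: a routing header plus a payload."""
    src_ip: str      # header: address of the sending machine
    dst_ip: str      # header: address of the receiving machine
    payload: bytes   # the actual data


def is_deliverable(packet: Packet, current_machine_ip: str) -> bool:
    """A machine-addressed packet arrives only if its destination address
    still matches the machine on which the process is running."""
    return packet.dst_ip == current_machine_ip


# Hypothetical addresses, used only for illustration.
in_flight = Packet(src_ip="10.0.0.7", dst_ip="10.0.0.1", payload=b"row 42")

before_migration = is_deliverable(in_flight, "10.0.0.1")   # same machine
after_migration = is_deliverable(in_flight, "10.0.1.9")    # user has moved
```

Because the header names a machine rather than the user's computing environment, the same in-flight packet stops matching as soon as the environment runs on a machine with a different address.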
`
SUMMARY OF THE INVENTION
`
The present invention is directed to a method and apparatus
for transparent migration of open network connections.
According to one or more embodiments of the present
invention a compute capsule is provided. The capsule has a
unique locator, such as an IP address, assigned to it. Using
the unique locator, the capsule may be moved to a different
machine having potentially a different operating system or
on a different network and maintain the open network connections
it had prior to the migration.
All users are assigned their own capsule at the time of log
in, i.e., session creation. The capsule communicates with
other capsules in the form of packets. In one embodiment,
the outgoing packet uses the unique locator (i.e., IP address)
of the target capsule as the final destination. This target
capsule may lie on the same host network or on a different
one. The underlying system knows the location of the target
capsule by looking at the locator (e.g., IP address) of the
target capsule, which may be encapsulated as part of the
header information of the outgoing packet. In another
embodiment, the underlying system wraps the original
packet in another packet that uses the IP address of the host
network on which the target capsule currently lies as the
final destination instead of the locator of the target capsule.
This new packet is routed using the standard network
infrastructure to the host network on which the target
capsule lies.
In another embodiment, the present invention allows the
host network on which the target capsule currently lies to
remove the wrapper around the outgoing packet, and deliver
it to the rightful owner (target capsule). In this way all
applications address incoming packets by their network
address regardless of the location of the target capsule. The
target capsules are then mapped to the respective machine
hosts using, for example, naming services like Lightweight
Directory Access Protocol (LDAP) or others.
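
The wrap-and-unwrap behavior summarized above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the `Packet` type, the `wrap` and `unwrap` helpers, the capsule names, and the addresses are all hypothetical, and a plain dictionary stands in for the naming service (such as LDAP) that maps capsules to hosts.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Packet:
    src: str                          # source locator or host address
    dst: str                          # destination locator or host address
    payload: Union[bytes, "Packet"]   # data, or a wrapped inner packet


def wrap(inner: Packet, host_of: Dict[str, str]) -> Packet:
    """Wrap a capsule-addressed packet in a packet addressed to the hosts
    on which the source and target capsules currently reside."""
    return Packet(src=host_of[inner.src], dst=host_of[inner.dst], payload=inner)


def unwrap(outer: Packet) -> Packet:
    """Remove the wrapper at the receiving host so the inner packet can be
    delivered to the target capsule."""
    if not isinstance(outer.payload, Packet):
        raise ValueError("packet is not wrapped")
    return outer.payload


# Hypothetical capsule locators and host addresses.
host_of = {"cap-1": "192.0.2.10", "cap-2": "192.0.2.20"}
inner = Packet(src="cap-1", dst="cap-2", payload=b"hello")

outer_before = wrap(inner, host_of)

# The target capsule migrates: only the mapping changes, not the inner packet.
host_of["cap-2"] = "198.51.100.5"
outer_after = wrap(inner, host_of)
delivered = unwrap(outer_after)
```

The design point this illustrates is that the application-visible packet keeps the capsule locator as its destination, while only the outer wrapper tracks where the capsule currently resides.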
`BRIEF DESCRIPTION OF THE DRAWINGS
`
These and other features, aspects and advantages of the
present invention will become better understood with regard
to the following description, appended claims and accompanying
drawings where:
FIG. 1 shows how the traditional operating system is
re-partitioned according to one embodiment of the present
invention.
FIG. 2A shows the protocol stack layer for a typical
system that uses the International Standards Organization
(ISO) model.
FIG. 2B shows the protocol stack layer for a system
according to an embodiment of the present invention.
FIG. 3 shows the creation of a compute capsule which is
capable of migrating open network connections transparently
according to one embodiment of the present invention.
FIG. 4 shows another embodiment of the present invention
where capsules receive a unique locator.
FIG. 5A shows the steps a capsule takes just prior to
migration according to an embodiment of the present invention.
FIG. 5B shows the synchronization between capsules
according to an embodiment of the present invention.
FIG. 6 shows a buffering scheme during migration
according to an embodiment of the present invention.
FIG. 7 shows how capsules communicate according to an
embodiment of the present invention.
FIG. 8 shows how capsules communicate according to
another embodiment of the present invention.
FIG. 9 is an illustration of the use of special routers
according to one embodiment of the present invention.
FIG. 10 is an illustration of an embodiment of a computer
execution environment.
`
`DETAILED DESCRIPTION OF THE
`INVENTION
`
Embodiments of the present invention are directed to the
migration of open network connections. In the following
description, numerous specific details are set forth to provide
a more thorough description of embodiments of the
invention. It will be apparent, however, to one skilled in the
art, that the invention may be practiced without these
specific details. In other instances, well known features have
not been described in detail so as not to obscure the
invention.
`
To migrate open network connections, a compute capsule
structure is implemented. All of the state and data related to
the open network connections of the member processes in
the compute capsule form a portion of the compute capsule.
When the capsule is moved, the state and data relating to
these open network connections allow the connections to
resume.
`
`Compute Capsules
`A compute capsule comprises one or more processes and
`their associated system environment. A compute capsule is
`configured to provide an encapsulated form that is capable
`of being moved between computers or stored off-line, for
`instance on a disk drive or other non-volatile storage
medium. The system environment in a capsule comprises
state information relating to exactly what the processes are
doing at any given time in a form that is understandable by
any binary compatible machine. System environment information
may include, for instance, privileges, configuration
settings, working directories and files, assigned resources,
open devices, installed software, and internal program state.
Processes in the same capsule may communicate with
each other and share data via standard Interprocess Communication
(IPC) mechanisms, for instance using pipes,
shared memory, or signals. Communication with processes
outside the capsule, on the other hand, is restricted to
Internet sockets and globally shared files. This ensures that
capsules can move without restriction. For example, a
conventional IPC pipe between processes in different capsules
would force both capsules to reside on the same
machine, but a socket can be redirected. The use of compute
capsules is completely transparent, and applications need
not take any special measures, such as source code
modification, re-compilation, or linking with special libraries.
In addition, a system using compute capsules can
seamlessly inter-operate with systems that do not.
Re-Partitioning the Operating System
To provide such functionality, the traditional operating
system is re-partitioned as shown in FIG. 1 so that all
host-dependent and personalized elements of the computing
environment are moved into the capsule 100, while leveraging
policies and management of the shared underlying
system 105. The computing environment comprises CPU
110, file system 115, devices 120, virtual memory 125, and IPC
130. Each of these components of the computing environment
has been partitioned as indicated by the curved line
135.
`
The state of the CPU scheduler 140 is left in the operating
system 105. This state comprises information that the operating
system maintains so that it knows which processes
may run, where they are, what priority they have, how much
time they will be granted processor attention, etc. Process
state 145, which is moved to the compute capsule 100, has
process-specific information, such as the values in the
registers, the signal handlers registered, parent/child
relationships, access rights, and file tables. The file system
115 leaves local files 150 that are identically available on all
machines (e.g., /usr/bin or /man on a UNIX system) in the
operating system 105. The file system 115 further leaves
disk blocks 152 outside the capsule, which are caches of disk
blocks that are read into the system and can be later used
when needed to be read again. The disk structure 154 is also
left outside the capsule. The disk structure is specific to an
operating system and serves as a cache of where files are
located on the disk (i.e., a mapping of pathnames to file
locations). Network file system (NFS) is a protocol for
accessing files on remote systems. The operating system
maintains information 156 with respect to the NFS and a
cache 158, which is a cache of files the operating system has
retrieved from remote servers and stored locally. Similar
state is maintained for other network based file systems.
What has been partitioned away from the operating system
is the file state 160. The file state 160 is moved to the
capsule 100. The file state 160 is the state of a file that some
process in the capsule has opened. File state 160 includes,
for instance, the name of the file and where the process is
currently accessing the file. If the file is not globally accessible
via the network (e.g., stored on a local disk), then its
contents are placed in the capsule.
Devices 120 are components that are attached to the
computer. For each device there is a driver that maintains the
state of the device. The disk state 165 remains in the
operating system 105. The other device components are
specific to a log-in session and are moved to the capsule 100.
The other devices include a graphics controller state 170,
which is the content that is being displayed on the screen, for
instance the contents of a frame buffer that holds color
values for each pixel on a display device, such as a monitor.
Keyboard state 172 and mouse state 175 include the state
associated with the user’s current interaction with the
keyboard, for instance whether caps lock is on or off, and
with the screen, for instance where the pointer is currently
located. Tty state 174 includes information associated with
the terminals the user is accessing, for instance if a user
opens an Xwindow on a UNIX system or if a user uses telnet
or performs an rlogin (remote login). Tty state 174 also
includes information about what the cursor looks like, what
types of fonts are displayed in the terminals, and what filters
should be applied to make the text appear a certain way, for
instance.
`
Virtual memory 125 has state associated with it. The
capsule tracks the state associated with changes made from
within the capsule which are termed read/write pages 176.
Read-only pages 178 remain outside the capsule. However,
in one embodiment read-only pages 178 are moved to the
capsule as well, which is useful in some scenarios. For
instance, certain commands one would expect to find on a
new machine when their capsule migrates there may not be
available. Take, for instance, a command such as ls or more
on a UNIX system. Those read-only pages may not be
necessary to bring into the capsule when it is migrating
between UNIX machines, because those pages exist on
every UNIX machine. If, however, a user is moving to a
machine that does not use those commands, it is useful to
move those read-only pages into the capsule as well. The
swap table 180, which records what virtual memory pages
have been replaced and moved to disk, remains outside the
capsule as do the free list 182 (which is a list of empty
virtual memory pages) and the page table 184.
All IPC 130 is moved into the capsule. This includes
shared memory 186, which comprises a portion of memory
that multiple processes may be using, pipes 188, fifos 190,
signals 192, including handler lists and the state needed to
know what handler the process was using and to find the
handler. Virtual interface and access control 194 is useful for
separating the capsule from host-dependent information that
is specific to a machine, such as the structure of internal
program state or the IDs for its resources. The interface 194
refers generally to the virtualized naming of resources and
translations between virtual resource names and physical
resources, as well as lists that control access to processes
trying to access capsules. Virtualization facilitates the
remapping of resource names to a new computer when a
process is migrated. Network portion 199 comprises the
information necessary for data to be transferred across a
network. For instance, it includes the location of the source
of a packet and the location of the destination for a packet.
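
The virtualized naming described above, in which capsule-visible names stay fixed while the physical resources behind them are remapped on a new computer, can be sketched as a small translation table. The class and method names below are hypothetical and the integer identifiers are arbitrary; the source does not specify a concrete data structure.

```python
from typing import Dict


class NameTranslationTable:
    """Maps capsule-visible (virtual) resource IDs to host (physical) IDs."""

    def __init__(self) -> None:
        self._virtual_to_physical: Dict[int, int] = {}

    def bind(self, virtual_id: int, physical_id: int) -> None:
        """Associate a virtual ID with a physical resource on this host."""
        self._virtual_to_physical[virtual_id] = physical_id

    def resolve(self, virtual_id: int) -> int:
        """Translate a virtual ID used inside the capsule to a physical ID."""
        return self._virtual_to_physical[virtual_id]

    def rebind_all(self, new_physical_ids: Dict[int, int]) -> None:
        """After migration, point every existing virtual ID at the
        corresponding resource on the new host."""
        missing = set(self._virtual_to_physical) - set(new_physical_ids)
        if missing:
            raise KeyError(f"no new binding for virtual ids {sorted(missing)}")
        self._virtual_to_physical = {
            v: new_physical_ids[v] for v in self._virtual_to_physical
        }


table = NameTranslationTable()
table.bind(virtual_id=1, physical_id=4711)   # e.g. a process ID on the old host
old_physical = table.resolve(1)

table.rebind_all({1: 82})                    # the same process on the new host
new_physical = table.resolve(1)
```

Processes inside the capsule keep using virtual ID 1 throughout; only the table contents change when the capsule moves.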
Thus, capsule state includes data that are host-specific,
cached on the local machine to which the capsule is bound,
or not otherwise globally accessible. This includes the
following information:

Capsule State: Name translation tables, access control list,
  owner ID, capsule name, etc.;

Processes: Tree structure, process control block, machine
  context, thread contexts, scheduling parameters, etc.;

Address Space Contents: Read/write pages of virtual
  memory; because they are available in the file system,
  contents of read-only files mapped into the address
  space (e.g., the application binary and libraries) are not
  included unless explicitly requested;

Open File State: Only file names, permissions, offsets, etc.
  are required for objects available in the global file
  system. However, the contents of personal files in local
  storage (e.g., /tmp) must be included. Because the
  pathname of a file is discarded after it is opened, for
  each process one embodiment of the invention maintains
  a hash table that maps file descriptors to their
  corresponding pathnames. In addition, some open files
  have no pathname (i.e., if an unlink operation has been
  performed). The contents of such files are included in
  the capsule as well;

IPC Channels: IPC state has been problematic in most
  prior systems. The present invention adds a new interface
  to the kernel modules for each form of IPC. This
  interface includes two complementary elements: export
  current state, and import state to re-create channel. For
  example, the pipe/fifo module is modified to export the
  list of processes attached to a pipe, its current mode, the
  list of filter modules it employs, file system mount
  points, and in-flight data. When given this state data,
  the system can re-establish an identical pipe;

Open Devices: By adding a state import/export interface
  similar to that used for IPC, the invention supports the
  most commonly used devices: keyboard, mouse, graphics
  controller, and pseudo-terminals. The mouse and
  keyboard have very little state, mostly the location of
  the cursor and the state of the LEDs (e.g., caps lock).
  The graphics controller is more complex. The video
  mode (e.g., resolution and refresh rate) and the contents
  of the frame buffer must be recorded, along with any
  color tables or other specialized hardware settings.
  Supporting migration between machines with different
  graphics controllers is troublesome, but a standard
  remote display interface can address that issue. Pseudo-terminal
  state includes the controlling process, control
  settings, a list of streams modules that have been
  pushed onto it, and any unprocessed data.
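
The export/import pair described under IPC Channels can be sketched in user space as follows. This is only an illustration of the two complementary operations: the source describes a kernel module interface, whereas `Pipe`, `PipeState`, and their fields here are hypothetical names, and file system mount points are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PipeState:
    """Exported snapshot of a pipe (a subset of the state listed above)."""
    attached_processes: List[int]
    mode: str
    filter_modules: List[str] = field(default_factory=list)
    in_flight_data: bytes = b""


class Pipe:
    """A toy in-memory pipe with export and import of its state."""

    def __init__(self, mode: str) -> None:
        self.mode = mode
        self.attached: List[int] = []
        self.filters: List[str] = []
        self.buffer = bytearray()

    def write(self, data: bytes) -> None:
        self.buffer.extend(data)

    def read(self) -> bytes:
        data = bytes(self.buffer)
        self.buffer.clear()
        return data

    def export_state(self) -> PipeState:
        """Export current state, including data written but not yet read."""
        return PipeState(list(self.attached), self.mode,
                         list(self.filters), bytes(self.buffer))

    @classmethod
    def import_state(cls, state: PipeState) -> "Pipe":
        """Re-create an equivalent pipe from previously exported state."""
        pipe = cls(state.mode)
        pipe.attached = list(state.attached_processes)
        pipe.filters = list(state.filter_modules)
        pipe.buffer = bytearray(state.in_flight_data)
        return pipe


original = Pipe(mode="stream")
original.attached = [101, 102]          # hypothetical process IDs
original.write(b"unread bytes")

snapshot = original.export_state()      # taken on the old machine
restored = Pipe.import_state(snapshot)  # re-created on the new machine
```

Reading from `restored` yields the bytes that were in flight when the snapshot was taken, which is the property the import element is meant to provide.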

Capsules do not include shared resources or the state
necessary to manage them (e.g., the processor scheduler,
page tables), state for kernel optimizations (e.g., disk
caches), local file system, physical resources (e.g., the
network), etc.
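
Returning to the open file state listed above, the per-process table that maps file descriptors to pathnames could look like the sketch below. It is written against Python's standard `os` module rather than a kernel; `DescriptorPathTable` and its methods are hypothetical names, and the scratch file exists only for the demonstration.

```python
import os
import tempfile
from typing import Dict


class DescriptorPathTable:
    """Remembers the pathname each file descriptor was opened with."""

    def __init__(self) -> None:
        self._paths: Dict[int, str] = {}

    def open(self, path: str, flags: int = os.O_RDONLY) -> int:
        fd = os.open(path, flags)
        self._paths[fd] = os.path.abspath(path)
        return fd

    def close(self, fd: int) -> None:
        os.close(fd)
        self._paths.pop(fd, None)

    def snapshot(self) -> Dict[int, dict]:
        """Record the name and current offset of every open descriptor."""
        return {
            fd: {"path": path, "offset": os.lseek(fd, 0, os.SEEK_CUR)}
            for fd, path in self._paths.items()
        }


# Usage on a throwaway file (the file name is arbitrary).
with tempfile.TemporaryDirectory() as scratch:
    path = os.path.join(scratch, "notes.txt")
    with open(path, "wb") as handle:
        handle.write(b"hello world")

    table = DescriptorPathTable()
    fd = table.open(path)
    os.read(fd, 5)                 # advance the file offset
    saved = table.snapshot()       # what a capsule would record for this file
    table.close(fd)
```

The snapshot holds exactly the items the text names for globally available files: the file name and the position at which the process was accessing it.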
`Network Layer
Network layer 199 of FIG. 1 is further described in FIGS.
2A and 2B. FIG. 2A shows the protocol stack layer for a
typical system that uses the International Standards Organization
(ISO) model (i.e., the Internet). This networking
model includes an application layer 206, a presentation layer
205, a session layer 210, a transport layer 215, a network
layer 220, a datalink layer 225, and a physical layer 230.
Network layer 220 in this environment assigns every
machine a unique locator address 240, commonly known as
an IP (Internet Protocol) address. It also assigns the network to
which these machines are connected a unique network
address 245, commonly known as a network IP address. These
unique addresses are used to send and receive messages, as
well as find the location of any machine on any network.
Messages can include email messages, transfer of different
kinds of data, etc.
FIG. 2B when contrasted with FIG. 2A illustrates the
difference between the current art in system design and a
modification to the protocol stack layer of the ISO model
that the present invention uses. FIG. 2B includes the same
seven layer approach, but modifies network layer 250 where
not only are unique locator addresses for machines 255 and
networks 260 assigned, but also every compute capsule in
the network is assigned a unique locator address 270.
`Capsule Creation
FIG. 3 shows the creation of a compute capsule which is
capable of migrating open network connections transparently
according to one embodiment of the present invention.
At step 300 a unique locator, such as an IP address and a
unique network locator address, is given to the capsule. At
step 301, all the processes of the user are obtained. Next, at
step 302, the state of each process is captured. For example,
if a user is transferring data from an open network
connection, the state of this process will include the number
of packets of data transferred until the point where the user
decides to migrate. Step 303 encapsulates all of the processes
of the user along with identification information
about the user (e.g., log-in session) into a capsule. The user
can now migrate to another machine on the same or different
network, and the processes that were open before the migration
will now be open displaying the same information on
the new machine. This smooth transition between machines
is possible due to the process states captured and stored in
the capsule before the migration.
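
Steps 300 through 303 above can be sketched as a single function. The data types, field names, and the in-memory `process_table` are assumptions made for illustration; real process state is far richer than the single packet counter used here, which mirrors the example given in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProcessState:
    pid: int
    packets_transferred: int = 0   # progress of an open network connection


@dataclass
class Capsule:
    locator: str                   # unique locator, e.g. an IP address
    session_id: str                # identification information about the user
    processes: List[ProcessState] = field(default_factory=list)


def create_capsule(user: str, session_id: str, locator: str,
                   process_table: Dict[str, List[ProcessState]]) -> Capsule:
    """Create a capsule following steps 300 to 303."""
    # Step 300: give the capsule its unique locator.
    capsule = Capsule(locator=locator, session_id=session_id)
    # Step 301: obtain all processes of the user.
    user_processes = process_table.get(user, [])
    # Step 302: capture (copy) the state of each process.
    captured = [ProcessState(p.pid, p.packets_transferred)
                for p in user_processes]
    # Step 303: encapsulate the processes and the session identity.
    capsule.processes.extend(captured)
    return capsule


# Hypothetical user, session, locator, and process IDs.
live = {"alice": [ProcessState(101, packets_transferred=12), ProcessState(102)]}
capsule = create_capsule("alice", "session-1", "10.200.0.1", live)

# Later activity on the live process does not alter the captured state.
live["alice"][0].packets_transferred = 99
```

Copying the state in step 302, rather than holding references to the live processes, is what lets the capsule describe the moment of capture.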
FIG. 4 shows another embodiment of the present invention
where capsules receive a unique locator. In this
embodiment, a unique locator, such as an IP address and
unique network locator address, is assigned to a capsule.
Capsule 400 is created in system 401, which has its own
unique network locator address (machine IP address) 450.
System 401 requests a unique locator for capsule 400 from
a capsule directory service 402. The directory service may be a
lightweight directory access protocol (LDAP) service or
another well-known service, and it may use Dynamic Host
Configuration Protocol (DHCP), although it is not required.
Capsule directory service 402 may send back the locator for
capsule 400 and also keep track of its location.
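
The exchange with the capsule directory service can be sketched with an in-memory stand-in. This sketch does not use LDAP or DHCP; `CapsuleDirectoryService`, its method names, the address pool, and the machine addresses are all hypothetical. Only Python's standard `ipaddress` module is used to hand out locators.

```python
import ipaddress
from typing import Dict, Iterator


class CapsuleDirectoryService:
    """Hands out capsule locators and tracks where each capsule resides."""

    def __init__(self, pool: str) -> None:
        # Usable host addresses in the pool, handed out in order.
        self._free: Iterator = iter(ipaddress.ip_network(pool).hosts())
        self._location: Dict[str, str] = {}   # capsule locator -> machine IP

    def request_locator(self, machine_ip: str) -> str:
        """Assign a fresh locator to a capsule created on machine_ip."""
        locator = str(next(self._free))
        self._location[locator] = machine_ip
        return locator

    def update_location(self, locator: str, machine_ip: str) -> None:
        """Record that a capsule now resides on a different machine."""
        if locator not in self._location:
            raise KeyError(locator)
        self._location[locator] = machine_ip

    def lookup(self, locator: str) -> str:
        """Return the machine on which the capsule currently resides."""
        return self._location[locator]


service = CapsuleDirectoryService("10.200.0.0/24")
locator = service.request_locator(machine_ip="192.0.2.10")
first_home = service.lookup(locator)

service.update_location(locator, "198.51.100.5")   # capsule has migrated
second_home = service.lookup(locator)
```

The locator returned to the capsule never changes; only the machine recorded against it does, which is what allows peers to keep addressing the capsule after it moves.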
`Preparation for Migration
One embodiment of the present invention seen in FIG. 5A
shows the steps taken just before the migration of an open
network connection is initiated, for instance when the system
identifies that the user is suspending the session in
preparation for migration. This example is shown with
respect to a single open network connection, but it should be
realized that in most situations any given capsule might have
multiple member processes with one or more open network
connections in each process. In the case of multiple
connections, each is handled in a way described below. In
the simplified scenario of FIG. 5A, a first capsule is receiving
packets from a second capsule, at step 500. At step 501 the
first capsule decides to migrate. At step 502, the first and
second capsules synchronize with each other.
Synchronizing step 502 may be performed at the kernel
level where the two host networks exchange messages.
These messages might include the number and kind of
packets already exchanged between the two capsules, the
port number of the first and second capsules, among other
things. The two host networks also agree on the state of the
capsules just before the first capsule migrates. The capsules
as well as processes are not only unaware of this synchronization
step, but are unaware of the migration of the first
capsule. Since a capsule often has multiple open network
connections with more than one capsule, synchronization
step 502 has to be performed individually with each capsule.
At step 503, the first capsule exits the system, for instance
to migrate and join back later when the user logs back in at
a later time on a different machine.
One embodiment of the synchronization steps between
capsules that have member processes with open network
connections is shown in FIG. 5B. The synchronization steps
between the capsules tell them where and when to stop
sending packets. At step 550, the two host networks establish
communication at the kernel level. At step 555 the
number and kind of packets already exchanged between the
two capsules is determined and at step 560, the capsules
agree when to stop sending packets. Then, at step 565 the
port number of the first and second capsules is determined.
The two host networks also agree on the state of the capsules
just before the first capsule migrates at step 570.
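
The synchronization steps of FIG. 5B can be sketched as a function over two endpoints. The sketch models only packet counts, not packet kinds, and it is not the kernel-level message exchange the text describes. `Endpoint`, `SyncRecord`, and `synchronize` are hypothetical names, and the check that no packets remain in flight is an added assumption about how the two sides might agree on a stopping point.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    port: int
    packets_sent: int = 0
    packets_received: int = 0
    sending: bool = True


@dataclass(frozen=True)
class SyncRecord:
    """The state both sides agree on just before migration (step 570)."""
    first_sent: int
    second_sent: int
    first_port: int
    second_port: int


def synchronize(first: Endpoint, second: Endpoint) -> SyncRecord:
    # Step 550 (establishing kernel-level communication) is represented
    # here simply by both endpoints being passed to this function.
    # Step 555: determine how many packets have already been exchanged.
    if (first.packets_sent != second.packets_received
            or second.packets_sent != first.packets_received):
        # Assumption: the counts must match before sending can stop.
        raise RuntimeError("packets still in flight; retry after they drain")
    # Step 560: both sides stop sending.
    first.sending = False
    second.sending = False
    # Steps 565 and 570: record port numbers and the agreed state.
    return SyncRecord(first.packets_sent, second.packets_sent,
                      first.port, second.port)


# Hypothetical ports and counters.
first = Endpoint("first", port=5001, packets_sent=3, packets_received=7)
second = Endpoint("second", port=80, packets_sent=7, packets_received=3)
record = synchronize(first, second)
```

A capsule with connections to several peers would run this once per peer, consistent with the statement that synchronization is performed individually with each capsule.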
`Buffering Scheme During Migration
`When a capsule