`
`
`
`
`PATENT
`
`5760-00400
`
`VRTS 0064
`
`"EXPRESS MAIL" MAILING LABEL NUMBER
`EL849600979US
`
`DATE OF DEPOSIT MARCH 28 2002
`I HEREBY CERTIFY THAT THIS PAPER OR
`FEE IS BEING DEPOSITED WITH THE
`UNITED STATES POSTAL SERVICE
`"EXPRESS MAIL POST OFFICE TO
`ADDRESSEE" SERVICE UNDER 37 C.F.R.
`§1.10 ON THE DATE INDICATED ABOVE
`AND IS ADDRESSED TO THE ASSISTANT
`COMMISSIONER FOR PATENTS,
WASHINGTON, D.C. 20231
`
`Paul Kennedy
`
`Disaster Recovery and Backup Using Virtual Machines
`
`By:
`
`Hans F. van Rietschote
`
`Symantec 2018
`
`Veaam v. Symantec
IPR2013-00150
`
`
`
`
`BACKGROUND OF THE INVENTION
`
1. Field of the Invention
`
This invention is related to the field of computer systems and, more particularly, to backup and disaster recovery mechanisms in computer systems.
`
2. Description of the Related Art
`
Computer systems, and their components, are subject to various failures which may result in the loss of data. For example, a storage device used in or by the computer system may experience a failure (e.g. mechanical, electrical, magnetic, etc.) which may make any data stored on that storage device unreadable. Erroneous software or hardware operation may corrupt the data stored on a storage device, destroying the data stored on an otherwise properly functioning storage device. Any component in the storage chain between (and including) the storage device and the computer system may experience failure (e.g. the storage device, connectors (e.g. cables) between the storage device and other circuitry, the network between the storage device and the accessing computer system (in some cases), etc.).
`
To mitigate the risk of losing data, computer system users typically make backup copies of data stored on various storage devices. Typically, backup software is installed on a computer system and the backup may be scheduled to occur periodically and automatically. In many cases, an application or applications may be in use when the backup is to occur. The application may have one or more files open, preventing access by the backup software to such files.
`
Some backup software may include custom code for each application (referred to as a "backup agent"). The backup agent may attempt to communicate with the application or otherwise cause the application to commit its data to files so that the files can be backed up. Often, such backup agents make use of various undocumented features of the applications to successfully back up files. As the corresponding applications change (e.g. new versions are released), the backup agents may also require change. Additionally, some files (such as the Windows registry) are always open and thus difficult to back up.
`
Disaster recovery configurations are used in some cases to provide additional protection against loss of data due to failures, not only in the computer systems themselves but in the surrounding environment (e.g. loss of electrical power, acts of nature, fire, etc.). In disaster recovery configurations, the state of data may periodically be checkpointed from a first computer system to a second computer system. In some cases, the second computer system may be located physically distant from the first computer system. If a problem occurs that causes the first computer system to go down, the data is safely stored on the second computer system. In some cases, applications previously running on the first computer system may be restarted on the second computer system to allow continued access to the preserved data. The disaster recovery software may experience issues similar to those of the backup software with regard to applications which are running when a checkpoint is attempted and the files that the applications may have open at the time of the checkpoint. Additionally, replicating all the state needed to restart the application on the second computer system (e.g. the operating system and its configuration settings, the application and its configuration settings, etc.) is complicated.
`
`SUMMARY OF THE INVENTION
`
One or more computer systems, a carrier medium, and a method are provided for backing up virtual machines. The backup may occur, e.g., to a backup medium or to a disaster recovery site, in various embodiments. In one embodiment, an apparatus includes a computer system configured to execute at least a first virtual machine, wherein the computer system is configured to: (i) capture a state of the first virtual machine, the state corresponding to a point in time in the execution of the first virtual machine; and (ii) copy at least a portion of the state to a destination separate from a storage device to which the first virtual machine is suspendable. A carrier medium may include instructions which, when executed, cause the above operation on the computer system. The method may comprise the above highlighted operations.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`
`
The following detailed description makes reference to the accompanying drawings, which are now briefly described.

Fig. 1 is a block diagram of one embodiment of a computer system.

Fig. 2 is a flowchart illustrating operation of one embodiment of a backup program shown in Fig. 1.

Fig. 3 is a block diagram of one embodiment of a pair of computer systems, wherein one of the computer systems is a disaster recovery site for the other computer system.

Fig. 4 is a flowchart illustrating operation of one embodiment of a checkpoint program shown in Fig. 3.

Fig. 5 is a flowchart illustrating operation of one embodiment of a recovery program shown in Fig. 3.

Fig. 6 is a block diagram of a second embodiment of a computer system.

Fig. 7 is a flowchart illustrating operation of a second embodiment of a backup program shown in Fig. 6.

Fig. 8 is a flowchart illustrating operation of a portion of one embodiment of a VM kernel.
`
`
`
While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.
`
`DETAILED DESCRIPTION OF EMBODIMENTS
`
A computer system executes one or more virtual machines, each of which may include one or more applications. To create a backup, the computer system may capture a state of each virtual machine and back up the state. In one embodiment, the computer system may capture the state in cooperation with a virtual machine kernel which controls execution of the virtual machines, while the virtual machines continue to execute. The state may include the information in a virtual machine image created in response to a suspension of the virtual machine. In another embodiment, the computer system may capture the state by suspending each virtual machine to an image and backing up the image of the virtual machine. In this manner, the files used by the application are backed up, even if the application has the files open while the virtual machine is active in the computer system. Furthermore, updates to the files which are not yet committed (e.g. they are still in memory in the virtual machine) may be backed up as well. In some cases, only a portion of the state or image need be backed up at a given time (e.g. non-persistent virtual disks may be backed up by copying the COW files corresponding to those disks, if an initial copy of the disk file has been made).
`
Similarly, for disaster recovery configurations, the computer system may periodically capture the state of the virtual machines as a checkpoint. The checkpoints may be copied to a second computer system, which may retain one or more checkpoints for each virtual machine. In the event of a "disaster" at the original computer system, the virtual machines may be resumed from one of the checkpoints on the second computer system. The loss of data may be limited to the data created between the selected checkpoint and the point at which the disaster occurred. The checkpoints may be created by capturing state while the virtual machines continue to execute, or by suspending the virtual machines and copying the suspended image. As mentioned above, in some cases, only a portion of the state or image may be copied. Since the virtual machine state includes all of the state used by the application (the operating system and its configuration settings, the application and its configuration settings, etc.), the application may be restarted correctly on the second computer system.
`
Turning now to Fig. 1, a block diagram is shown illustrating one embodiment of a computer system 10 for performing a backup. Other embodiments are possible and contemplated. The computer system 10 includes one or more virtual machines (e.g. virtual machines 16A-16C as illustrated in Fig. 1). The virtual machines are controlled by a virtual machine (VM) kernel 18. The virtual machines 16A-16C and the VM kernel 18 may comprise software and/or data structures. The software may be executed on the underlying hardware in the computer system 10 (e.g. the hardware 20). The hardware may include any desired circuitry. For example, the hardware may include one or more processors, or central processing units (CPUs), storage, and input/output (I/O) circuitry. In the embodiment of Fig. 1, the computer system 10 includes a storage device 22 and a backup medium 24.
`
As shown in Fig. 1, each application executing on the computer system 10 executes within a virtual machine 16A-16C. Generally, a virtual machine comprises any combination of software, one or more data structures in memory, and/or one or more files stored on a storage device (such as the storage device 22). The virtual machine mimics the hardware used during execution of a given application. For example, in the virtual machine 16A, an application 28 is shown. The application 28 is designed to execute within the operating system (O/S) 30. Both the application 28 and the O/S 30 are coded with instructions executed by the virtual CPU 32. Additionally, the application 28 and/or the O/S 30 may make use of various virtual storage devices 34 and virtual I/O devices 36. The virtual storage may include any type of storage, such as memory, disk storage, tape storage, etc. The disk storage may be any type of disk (e.g. fixed disk, removable disk, compact disc read-only memory (CD-ROM), rewriteable or read/write CD, digital versatile disk (DVD) ROM, etc.). Each disk storage in the virtual machine may be mapped to a file on a storage device such as the storage device 22A. Alternatively, each disk storage may be mapped directly to a storage device, or a combination of direct mappings and file mappings may be used. The virtual I/O devices may include any type of I/O devices, including modems, audio devices, video devices, network interface cards (NICs), universal serial bus (USB) ports, firewire (IEEE 1394) ports, serial ports, parallel ports, etc. Generally, each virtual I/O device may be mapped to a corresponding I/O device in the underlying hardware or may be emulated in software if no corresponding I/O device is included in the underlying hardware.
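The mapping rule just described (use a matching host device where one exists, otherwise emulate it in software) can be illustrated with a short sketch. The function name `map_virtual_devices`, the device-type strings, and the `"mapped"`/`"emulated"` labels are hypothetical and do not correspond to any particular VM kernel's configuration format.

```python
def map_virtual_devices(virtual_devices, host_devices):
    """Decide how each virtual I/O device of a VM is backed (hypothetical model).

    virtual_devices: iterable of device types configured in the VM
                     (illustrative strings such as "nic" or "usb").
    host_devices:    dict mapping a device type to a host device identifier.
    Returns a dict: device type -> ("mapped", host_id) when the underlying
    hardware has a matching device, or ("emulated", None) when it does not.
    """
    mapping = {}
    for dev in virtual_devices:
        host_id = host_devices.get(dev)
        if host_id is not None:
            mapping[dev] = ("mapped", host_id)
        else:
            # No corresponding hardware: fall back to a software emulation.
            mapping[dev] = ("emulated", None)
    return mapping


if __name__ == "__main__":
    print(map_virtual_devices(["nic", "usb", "serial"],
                              {"nic": "eth0", "usb": "usb1"}))
```

In this example the serial port has no host counterpart, so it is the one device left to software emulation.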
`
The virtual machine in which an application is executing encompasses the entire system state associated with an application. Generally, when a virtual machine is active (i.e. the application within the virtual machine is executing), the virtual machine may be stored in the memory of the computer system on which the virtual machine is executing (although the VM kernel may support a paging system in which various pages of the memory storing the virtual machine may be paged out to local storage in the computer system) and in the files which are mapped to the virtual storage devices in the virtual machine. The VM kernel may support a command to suspend the virtual machine. In response to the command, the VM kernel may write an image of the virtual machine to the storage device 22 (e.g. the image 40 shown in Fig. 1), thus capturing the current state of the virtual machine and thus implicitly capturing the current state of the executing application. The image may include one or more files written in response to the suspend command, capturing the state of the virtual machine that was in memory in the computer system, as well as the files representing the virtual storage in the virtual machine. The state may include not only files written by the application, but also uncommitted changes to files which may still be in the memory within the virtual machine, the state of the hardware (including the virtual CPU 32, the memory in the virtual machine, etc.) within the virtual machine, etc. Thus, the image may be a snapshot of the state of the executing application.
`
`
`
A suspended virtual machine may be resumed using a resume command supported by the VM kernel. In response to the resume command, the VM kernel may read the image of the suspended virtual machine from the storage device and may activate the virtual machine in the computer system.
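The suspend and resume commands can be modeled with a toy kernel to show why suspending captures even uncommitted data: whatever is in the VM's memory goes into the image. This is a minimal sketch under stated assumptions. The class `ToyVMKernel` and its methods are hypothetical, the VM's in-memory state is simplified to a JSON-serializable dict written to a single image file, and nothing here reflects the ESX/GSX command interface.

```python
import json
import os
import tempfile


class ToyVMKernel:
    """Hypothetical stand-in for a VM kernel's suspend/resume commands.

    A VM's in-memory state is modeled as a JSON-serializable dict; a real
    image would also include the files backing the virtual disks.
    """

    def __init__(self, image_dir):
        self.image_dir = image_dir
        self.active = {}  # VM name -> in-memory state of an active VM

    def image_path(self, name):
        return os.path.join(self.image_dir, name + ".image.json")

    def suspend(self, name):
        """Write the VM's state to an image file and deactivate the VM."""
        state = self.active.pop(name)
        path = self.image_path(name)
        with open(path, "w") as f:
            json.dump(state, f)
        return path

    def resume(self, name):
        """Read the VM's image from storage and reactivate the VM."""
        with open(self.image_path(name)) as f:
            self.active[name] = json.load(f)
        return self.active[name]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as image_dir:
        kernel = ToyVMKernel(image_dir)
        # Uncommitted application data still in VM memory is part of the state.
        kernel.active["16A"] = {"vcpu": {"pc": 4096},
                                "memory": {"page7": "unsaved edit"}}
        kernel.suspend("16A")
        restored = kernel.resume("16A")
        print(restored["memory"]["page7"])
```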
`
The computer system 10 may be configured to back up the virtual machines executing thereon. For example, in the illustrated embodiment, a backup program 42 may execute in the virtual machine 16C (and may also be stored on the storage device 22). The virtual machine 16C may be a console virtual machine as illustrated in Fig. 1 (a virtual machine which also has direct access to the hardware 20 in the computer system 10). Alternatively, the backup program 42 may execute on a non-console virtual machine or outside of a virtual machine.
`
The backup program 42 may suspend the virtual machines executing on the computer system 10 (e.g. the virtual machines 16A-16B as shown in Fig. 1) and back up the image of each virtual machine (e.g. the image 40 of the virtual machine 16A) onto the backup medium 24 (or send the image files to a backup server, if the backup server is serving as the backup medium 24). Once the backup has been made, the backup program 42 may resume the virtual machines to allow their execution to continue.
`
Since a given virtual machine is suspended during the backup operation for that virtual machine, the files used by the application(s) within the virtual machine may be backed up even if the files are in use by the application(s) at the time the virtual machine is suspended. Each virtual machine may be suspended and backed up in the same fashion. Thus, the backup program 42 need not include any specialized backup agents for the different applications that may be included in the various virtual machines.
`
`
`
In the embodiment of Fig. 1, the backup medium 24 may be used to store the images of the virtual machines. Generally, the backup medium 24 may be any medium capable of storing data. For example, the backup medium 24 may be a storage device similar to the storage device 22. The backup medium 24 may be a removable storage device, to allow the backup medium to be separated from the computer system 10 after the backup is complete. Storing the backup medium physically separated from the computer system that is backed up thereon may increase the reliability of the backup, since an event which causes problems on the computer system may not affect the backup medium. For example, the backup medium 24 may comprise a removable disk or disk drive, a tape backup, writeable compact disk storage, etc. Alternatively, the backup medium 24 may comprise another computer system (e.g. a backup server) coupled to receive the backup data from the computer system 10 (e.g. via a network coupling the two computer systems), a storage device attached to a network to which the computer system is attached (e.g. NAS or SAN technologies), etc.
`
The virtual hardware in the virtual machine 16A (and other virtual machines such as the virtual machines 16B-16C) may be similar to the hardware 20 included in the computer system 10. For example, the virtual CPU 32 may implement the same instruction set architecture as the processor(s) in the hardware 20. In such cases, the virtual CPU 32 may be one or more data structures storing the processor state for the virtual machine 16A. The application and O/S software instructions may execute on the CPU(s) in the hardware 20 when the virtual machine 16A is scheduled for execution by the VM kernel 18. When the VM kernel 18 schedules another virtual machine for execution (e.g. the virtual machine 16B), the VM kernel 18 may write the state of the processor into the virtual CPU 32 data structure. Alternatively, the virtual CPU 32 may be different from the CPU(s) in the hardware 20. For example, the virtual CPU 32 may comprise software coded using instructions from the instruction set supported by the underlying CPU to emulate instruction execution according to the instruction set architecture of the virtual CPU 32. Alternatively, the VM kernel 18 may emulate the operation of the hardware in the virtual machine. Similarly, any virtual hardware in a virtual machine may be emulated in software if there is no matching hardware in the hardware 20.
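For the case where the virtual CPU shares the host's instruction set, the save and restore of processor state on a switch between virtual machines can be sketched as follows. This is a minimal model under stated assumptions: `VirtualCPU`, `ToyScheduler`, and the register names `pc`, `sp`, and `r0` are hypothetical, and a real VM kernel would typically save considerably more state, in native code rather than Python.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualCPU:
    """Hypothetical per-VM structure holding saved processor state."""
    registers: dict = field(default_factory=lambda: {"pc": 0, "sp": 0, "r0": 0})


class ToyScheduler:
    """Models a VM kernel multiplexing one physical CPU among VMs whose
    virtual CPUs implement the host instruction set architecture."""

    def __init__(self, vcpus):
        self.vcpus = vcpus    # VM name -> VirtualCPU
        self.physical = {}    # live register file of the physical CPU
        self.current = None   # name of the VM currently executing

    def switch_to(self, name):
        if self.current is not None:
            # Save the outgoing VM's processor state into its vCPU structure.
            self.vcpus[self.current].registers = dict(self.physical)
        # Load the incoming VM's saved state onto the physical CPU.
        self.physical = dict(self.vcpus[name].registers)
        self.current = name


if __name__ == "__main__":
    sched = ToyScheduler({"16A": VirtualCPU(), "16B": VirtualCPU()})
    sched.switch_to("16A")
    sched.physical["pc"] = 0x40   # 16A runs natively for a while
    sched.switch_to("16B")        # 16A's registers are saved to its vCPU
    print(sched.vcpus["16A"].registers["pc"])  # 64
```

Because the saved registers live in an ordinary data structure, they are also exactly what a suspend operation would need to write into the memory portion of a virtual machine image.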
`
Different virtual machines which execute on the same computer system 10 may differ. For example, the O/S 30 included in each virtual machine may differ. Different virtual machines may employ different versions of the same O/S (e.g. Microsoft Windows NT with different service packs installed), different versions of the same O/S family (e.g. Microsoft Windows NT and Microsoft Windows 2000), or different O/Ss (e.g. Microsoft Windows NT, Linux, Sun Solaris, etc.).
`
Generally, the VM kernel may be responsible for managing the virtual machines on a given computer system. The VM kernel may schedule virtual machines for execution on the underlying hardware, using any scheduling scheme. For example, a time division multiplexed scheme may be used to assign time slots to each virtual machine. Additionally, the VM kernel may handle the suspending and resuming of virtual machines responsive to suspend and resume commands. The commands may be received from a virtual machine, or may be communicated from another computer system. In one embodiment, the VM kernel may be the ESX product available from VMWare, Inc. (Palo Alto, CA).
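A time division multiplexed scheme can be as simple as handing out equal time slots to virtual machines in round-robin order. The sketch below assumes equal-length slots and, as an illustrative assumption not stated in the text, that a suspended virtual machine receives no slots until it is resumed; the function names are hypothetical.

```python
from itertools import cycle, islice


def tdm_schedule(vm_names, num_slots):
    """Assign consecutive equal-length time slots to VMs in round-robin
    order. Returns a list of (slot_index, vm_name) pairs."""
    if not vm_names:
        return []
    return list(enumerate(islice(cycle(vm_names), num_slots)))


def tdm_schedule_runnable(vm_names, suspended, num_slots):
    """Variant in which suspended VMs (an assumption for illustration)
    are skipped until they are resumed."""
    runnable = [vm for vm in vm_names if vm not in suspended]
    return tdm_schedule(runnable, num_slots)


if __name__ == "__main__":
    print(tdm_schedule(["16A", "16B", "16C"], 5))
    print(tdm_schedule_runnable(["16A", "16B", "16C"], {"16B"}, 4))
```

Weighted or priority-based schemes would fit the text equally well ("any scheduling scheme"); round-robin is chosen here only because it is the simplest instance.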
`
`
`
In the illustrated embodiment, the VM kernel may execute directly on the underlying hardware (i.e. without an underlying operating system). In other embodiments, the VM kernel may be designed to execute within an operating system. For example, the GSX product available from VMWare, Inc. may execute under various versions of Microsoft’s Windows operating system and/or the Linux operating system.
`
The storage device 22 may be any type of storage device to which the computer systems 10 may be coupled. For example, the storage device 22 may comprise one or more fixed disk drives such as integrated drive electronics (IDE) drives, small computer system interface (SCSI) drives, etc. The fixed disk drives may be incorporated as peripherals of the computer systems 10 through a peripheral bus in the computer systems 10 such as the peripheral component interconnect (PCI) bus, USB, firewire, etc. Alternatively, the storage device 22 may couple to a network (e.g. network attached storage (NAS) or storage area network (SAN) technologies may be used). The storage device 22 may be included in file servers to which the computer systems 10 have access. The storage device 22 may also be removable disk drives, memory, etc. Generally, a storage device is any device which is capable of storing data.
`
It is noted that, while each virtual machine illustrated in Fig. 1 includes one application, generally a virtual machine may include one or more applications. For example, in one embodiment a user may execute all applications which execute on the same underlying O/S 30 in the same virtual machine.
`
It is noted that the term "program", as used herein, refers to a set of instructions which, when executed, perform the function described for that program. The instructions may be machine level instructions executed by a CPU, or may be higher level instructions defined in a given higher level language (e.g. shell scripts, interpretive languages, etc.). The term "software" may be synonymous with "program".
`
Turning next to Fig. 2, a flowchart is shown illustrating operation of one embodiment of the backup program 42. Other embodiments are possible and contemplated. The blocks shown in Fig. 2 may represent the operation of instructions forming the backup program 42, when executed.
`
The backup program 42 suspends a virtual machine (block 50). As mentioned above, the VM kernel supports a suspend command. The backup program 42 transmits the suspend command to the VM kernel to suspend the virtual machine. The command may include a virtual machine "name", assigned by the VM kernel or the user, which uniquely identifies the virtual machine to the VM kernel.
`
The virtual machines may have one or more virtual disks which are defined to be "non-persistent". Generally, a non-persistent virtual disk is one in which writes to the virtual disk are not committed until a separate "commit" command is executed for the virtual disk. By way of contrast, writes to a "persistent" virtual disk are committed at the time of the individual writes. In one embodiment, the non-persistent disks may be implemented as two files: a virtual disk file and a copy-on-write (COW) file for each disk. In embodiments using the ESX/GSX products from VMWare, Inc., the COW file may be the file with the extension ".REDO". The virtual disk file may be a file the size of the virtual disk. The virtual disk file may be organized as a set of disk blocks, in a fashion similar to physical disk drives. The COW file stores updated copies of disk blocks in a log form. Thus, the virtual disk file may contain the disk blocks prior to any uncommitted updates being made. If more than one write has been made to a given block, the COW file may store multiple copies of the block, one for each write that has occurred. To commit the writes, the blocks from the COW file may be written to the corresponding block locations in the virtual disk file, beginning at the start of the COW file and proceeding, in order, to the end. Both the virtual disk files and the corresponding COW files may be included in the virtual machine image 40.
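The commit rule above (replay the COW log from start to end into the virtual disk file, so the latest copy of each block wins) can be sketched with a disk modeled as a list of blocks and the COW file as an ordered list of `(block_number, data)` entries. The in-memory representation and the `read_block` helper, which shows how a guest read might see uncommitted data, are illustrative assumptions; the actual on-disk layout of ".REDO" files is not described here.

```python
def commit_cow(disk_blocks, cow_log):
    """Apply a copy-on-write log to a virtual disk.

    disk_blocks: list of block contents (the virtual disk file).
    cow_log: list of (block_number, data) entries in the order the writes
             occurred; a block may appear more than once.
    Entries are replayed from the start of the log to the end, so the last
    write to each block wins. Returns a new list; the input is not modified.
    """
    committed = list(disk_blocks)
    for block_number, data in cow_log:
        committed[block_number] = data
    return committed


def read_block(disk_blocks, cow_log, block_number):
    """Hypothetical pre-commit read: the newest COW copy of the block if one
    exists, otherwise the block from the virtual disk file."""
    for logged_block, data in reversed(cow_log):
        if logged_block == block_number:
            return data
    return disk_blocks[block_number]


if __name__ == "__main__":
    disk = ["A0", "B0", "C0", "D0"]              # virtual disk file
    cow = [(1, "B1"), (3, "D1"), (1, "B2")]      # COW log, oldest entry first
    print(read_block(disk, cow, 1))              # B2
    print(commit_cow(disk, cow))                 # ['A0', 'B2', 'C0', 'D1']
```

Replaying in log order is what makes the two entries for block 1 harmless: the later `"B2"` overwrites the earlier `"B1"`.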
`
If the suspended virtual machine includes one or more non-persistent virtual disks (decision block 52), the backup program 42 may commit the changes in the COW files to the corresponding virtual disks prior to making the backup (block 54). Alternatively, the backup program 42 may back up the virtual disk and COW files. In such an embodiment, the backup program 42 may optionally commit the changes after copying the virtual machine image to the backup medium, if desired. In yet another alternative, only the COW files may be copied for non-persistent virtual disks after an initial copy of the virtual disk file is made.
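The three alternatives for a non-persistent disk differ only in which files a backup pass copies. The sketch below encodes that choice; the enum `NonPersistentPolicy`, the function `files_to_copy`, and the file names are hypothetical, and for the commit-first case the caller is assumed to have already performed the commit of block 54.

```python
from enum import Enum


class NonPersistentPolicy(Enum):
    """The three alternatives described for non-persistent virtual disks."""
    COMMIT_FIRST = 1  # commit the COW file into the disk file, then copy the disk file
    COPY_BOTH = 2     # copy the disk file and the COW file as they are
    COW_ONLY = 3      # after an initial full copy, copy only the COW file


def files_to_copy(disk_file, cow_file, policy, base_already_copied=False):
    """Return the files a backup pass should copy for one non-persistent disk.

    For COMMIT_FIRST the caller is assumed to have already committed
    `cow_file` into `disk_file` (block 54) before copying.
    """
    if policy is NonPersistentPolicy.COMMIT_FIRST:
        return [disk_file]
    if policy is NonPersistentPolicy.COPY_BOTH:
        return [disk_file, cow_file]
    # COW_ONLY: the first backup still needs the full virtual disk file.
    return [cow_file] if base_already_copied else [disk_file, cow_file]


if __name__ == "__main__":
    disk, cow = "disk0.dsk", "disk0.dsk.REDO"  # hypothetical file names
    for p in NonPersistentPolicy:
        print(p.name, files_to_copy(disk, cow, p, base_already_copied=True))
```

The COW-only option trades restore simplicity for smaller backups: restoring requires the earlier full disk copy plus the later COW file.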
`
The backup program 42 copies the virtual machine image 40 to the backup medium 24 (block 56) and resumes the virtual machine on the computer system 10 (block 58). If additional virtual machines remain to be backed up (decision block 60), the backup program 42 selects the next virtual machine (block 62) and repeats blocks 50-60 for that virtual machine.
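The Fig. 2 loop can be sketched as a driver that talks to any object exposing suspend, commit, and resume operations. The method names on `kernel` (`suspend`, `has_nonpersistent_disks`, `commit`, `image_files`, `resume`) are hypothetical rather than a real VM kernel API, and `FakeKernel` exists only to make the sketch runnable. Resuming inside `finally` is a design choice added here so that a failed copy does not leave a virtual machine suspended.

```python
import shutil
import tempfile
from pathlib import Path


def backup_all(kernel, vm_names, backup_dir):
    """Back up each VM in turn, following the blocks of Fig. 2.

    `kernel` is any object with suspend(name), has_nonpersistent_disks(name),
    commit(name), image_files(name) and resume(name); these method names are
    hypothetical, not a real VM kernel API.
    """
    backup_dir = Path(backup_dir)
    for name in vm_names:                              # blocks 60/62: next VM
        kernel.suspend(name)                           # block 50
        try:
            if kernel.has_nonpersistent_disks(name):   # decision block 52
                kernel.commit(name)                    # block 54
            dest = backup_dir / name
            dest.mkdir(parents=True, exist_ok=True)
            for path in kernel.image_files(name):      # block 56
                shutil.copy2(path, dest)
        finally:
            kernel.resume(name)                        # block 58


class FakeKernel:
    """Hypothetical stand-in that records the calls it receives."""

    def __init__(self, image_root, vms):
        self.image_root = Path(image_root)
        self.vms = vms      # VM name -> True if it has non-persistent disks
        self.calls = []
        for name in vms:
            vm_dir = self.image_root / name
            vm_dir.mkdir(parents=True)
            (vm_dir / "memory.img").write_text("mem-" + name)
            (vm_dir / "disk0.dsk").write_text("disk-" + name)

    def suspend(self, name):
        self.calls.append(("suspend", name))

    def has_nonpersistent_disks(self, name):
        return self.vms[name]

    def commit(self, name):
        self.calls.append(("commit", name))

    def image_files(self, name):
        return sorted((self.image_root / name).iterdir())

    def resume(self, name):
        self.calls.append(("resume", name))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        kernel = FakeKernel(src, {"16A": True, "16B": False})
        backup_all(kernel, ["16A", "16B"], dst)
        print(kernel.calls)
```

Note that no per-application backup agent appears anywhere in the loop; every virtual machine is handled identically.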
`
While the flowchart shown in Fig. 2 illustrates backing up one virtual machine at a time, other embodiments may suspend all the virtual machines, copy the images to the backup medium 24, and resume all the virtual machines.
`
It is noted that, while the present embodiment may include non-persistent virtual disks with COW files, other embodiments may have only persistent virtual disks, and the disk files may be backed up as a whole each time a backup occurs.
`
Turning next to Fig. 3, a block diagram illustrating a pair of computer systems 10A and 10B arranged in a disaster recovery configuration is shown. Other embodiments are possible and contemplated. In the embodiment of Fig. 3, the computer system 10A may be the primary system (e.g. the one located at the user’s site) and the computer system 10B may be the disaster recovery system (e.g. the one located physically remote from the user’s site). The computer systems 10A and 10B may be coupled via a network 12. Similar to the computer system 10 shown in Fig. 1, the computer system 10A may include a VM kernel 18A, hardware 20A, and a storage device 22A. Similarly, the computer system 10B may include a VM kernel 18B, hardware 20B, and a storage device 22B. The computer system 10A is shown executing the virtual machines 16A-16C. Each virtual machine may include one or more applications, O/S, virtual storage, virtual CPU, virtual I/O, etc. (not shown in Fig. 3), similar to the illustration of the virtual machine 16A in Fig. 1. The computer system 10B is shown executing the virtual machine 16D (and may execute the virtual machine 16A, if a "disaster" event occurs with the computer system 10A).
`
`
`
The image 40 of the virtual machine 16A is illustrated in greater detail in Fig. 3 for one embodiment. In the illustrated embodiment, the image 40 includes a memory file 70, a disk file 72, and a COW file 74. The memory file 70 may include the state of the memory in the virtual machine 16A as well as any virtual hardware state that may be saved (e.g. the state of the virtual CPU 32, etc.). The disk file 72 may be the virtual disk file. A disk file 72 may be provided for each virtual disk in the virtual machine. The COW file 74 may be the COW file for a non-persistent virtual disk. A COW file 74 may be provided for each non-persistent virtual disk in the virtual machine.
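The composition of the image 40 (one memory file, one disk file per virtual disk, and one COW file per non-persistent virtual disk) maps naturally onto a small data structure. The class names and file names below are hypothetical, and treating "has a COW file" as the definition of "non-persistent" is a simplifying assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VirtualDiskFiles:
    disk_file: str                  # disk file 72: one per virtual disk
    cow_file: Optional[str] = None  # COW file 74: only for a non-persistent disk

    @property
    def non_persistent(self) -> bool:
        return self.cow_file is not None


@dataclass
class VMImage:
    memory_file: str  # memory file 70: guest memory plus saved virtual hardware state
    disks: List[VirtualDiskFiles] = field(default_factory=list)

    def files(self) -> List[str]:
        """Every file making up the image, memory file first."""
        out = [self.memory_file]
        for disk in self.disks:
            out.append(disk.disk_file)
            if disk.non_persistent:
                out.append(disk.cow_file)
        return out


if __name__ == "__main__":
    image_40 = VMImage("vm16A.mem", [
        VirtualDiskFiles("vm16A-0.dsk", "vm16A-0.dsk.REDO"),  # non-persistent
        VirtualDiskFiles("vm16A-1.dsk"),                      # persistent
    ])
    print(image_40.files())
```

A checkpoint or backup program can then copy `files()` as a unit, which keeps the memory state and the disk state of a single point in time together.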
`
A checkpoint program 76 may be executing in the virtual machine 16C (and may be stored on the storage device 22A as shown in Fig. 3). Similar to Fig. 1, the virtual machine 16C may be a console virtual machine. Alternatively, the checkpoint program 76 may execute on a non-console virtual machine or outside of a virtual machine. Generally, the checkpoint program 76 may periodically suspend the virtual machines which are to be replicated on the disaster recovery system, thus creating virtual machine images that may serve as checkpoints of the virtual machines on the disaster recovery system. The checkpoint program 76 copies the images to the computer system 10B over the network 12, and then resumes the virtual machines on the computer system 10A.
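The checkpoint cycle (suspend, copy the image to the disaster recovery system, resume, wait, repeat) can be sketched with the platform-specific steps passed in as callables. The function names and parameters are hypothetical; `copy_image` stands in for whatever transfer over the network 12 a real system would use, and `sleep` is injectable so the loop can be exercised without waiting. Suspending and resuming one virtual machine at a time is one possible reading of the text, chosen here to keep each machine's pause short.

```python
import time
from typing import Callable, Iterable, List, Tuple


def checkpoint_round(vm_names: Iterable[str],
                     suspend: Callable[[str], None],
                     copy_image: Callable[[str, int], None],
                     resume: Callable[[str], None],
                     seq: int) -> List[Tuple[int, str]]:
    """Take one checkpoint: suspend each replicated VM, copy its image to the
    disaster recovery system tagged with sequence number `seq`, resume it."""
    taken = []
    for name in vm_names:
        suspend(name)
        try:
            copy_image(name, seq)   # e.g. over network 12 to computer system 10B
            taken.append((seq, name))
        finally:
            resume(name)            # VM continues on the primary system
    return taken


def run_checkpoints(vm_names, suspend, copy_image, resume,
                    interval_s: float, rounds: int,
                    sleep: Callable[[float], None] = time.sleep):
    """Repeat checkpoint_round `rounds` times, waiting `interval_s` seconds
    between consecutive checkpoints."""
    history = []
    for seq in range(rounds):
        history.extend(checkpoint_round(vm_names, suspend, copy_image, resume, seq))
        if seq + 1 < rounds:
            sleep(interval_s)
    return history


if __name__ == "__main__":
    log = []
    run_checkpoints(["16A"],
                    suspend=lambda n: log.append(("suspend", n)),
                    copy_image=lambda n, s: log.append(("copy", n, s)),
                    resume=lambda n: log.append(("resume", n)),
                    interval_s=600, rounds=2,
                    sleep=lambda s: log.append(("wait", s)))
    print(log)
```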
`
If a disaster event occurs (e.g. the computer system 10A crashes, is corrupted, or the environment the computer system 10A is executing in experiences a problem such as a loss of power, an act of God, etc.), the computer system 10B may recover the virtual machine or machines from any of the checkpoints that have been provided by the computer system 10A. For example, in Fig. 3, the recovery program 78 (executing in the virtual machine 16D and also stored on the storage device 22B) may be used to recover from one of the checkpoints. While the virtual machine 16D in which the recovery program 78 is executing is the console virtual machine, other embodiments may execute the recovery program 78 in a non-console virtual machine or outside of a virtual machine.
`
In Fig. 3, two checkpoints are shown stored on the storage device 22B (although in general any number of checkpoints may be stored). The first checkpoint includes the memory file 70A, the disk file 72A, and the COW file 74A. The second checkpoint (made later in time than the first checkpoint) includes the memory file 70B, the disk file 72B, and the COW file 74B. Either checkpoint may be used to recover the virtual machine 16A on the computer system 10B. To recover, the recovery program 78 resumes the virtual machine using one of the checkpoints. Thus, the virtual machine 16A is shown (in dashed lines) executing on the computer system 10B. The resumed virtual machine 16A would have the same state as the original virtual machine 16A on the computer system 10A at the time the checkpoint was made.
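Choosing which stored checkpoint to resume, and what that choice costs, can be sketched as below. The `Checkpoint` record, the timestamps, and the `not_after` parameter (for stepping back past a checkpoint suspected of carrying corruption) are hypothetical. The loss computation reflects the earlier statement that data loss is limited to data created between the selected checkpoint and the disaster.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Checkpoint:
    vm: str
    taken_at: float          # time the image was captured, in seconds
    files: Tuple[str, ...]   # e.g. ("70A", "72A", "74A")


def select_checkpoint(checkpoints: List[Checkpoint], vm: str,
                      not_after: Optional[float] = None) -> Checkpoint:
    """Pick the newest checkpoint of `vm`, ignoring any taken after
    `not_after` (e.g. to step back past a suspect checkpoint)."""
    candidates = [c for c in checkpoints
                  if c.vm == vm and (not_after is None or c.taken_at <= not_after)]
    if not candidates:
        raise LookupError("no usable checkpoint for " + vm)
    return max(candidates, key=lambda c: c.taken_at)


def data_loss_window(chosen: Checkpoint, disaster_at: float) -> float:
    """Work done between the chosen checkpoint and the disaster is lost."""
    return disaster_at - chosen.taken_at


if __name__ == "__main__":
    stored = [Checkpoint("16A", 0.0, ("70A", "72A", "74A")),
              Checkpoint("16A", 600.0, ("70B", "72B", "74B"))]
    latest = select_checkpoint(stored, "16A")
    print(latest.files, data_loss_window(latest, 900.0))  # ('70B', '72B', '74B') 300.0
```

The recovery program would then hand the selected checkpoint's files to the VM kernel's resume command; that step is not modeled here.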
`
The virtual disk file 72A and the COW file 74A are shown in dotted lines in Fig. 3. These files may actually be deleted, in some embodiments, when the second checkpoint (comprising the memory file 70B, the virtual disk file 72B, and the COW file 74B) is written to the storage device 22B. In such embodiments, the virtual disk file 72B may be the same as the combination of the virtual disk file 72A and the COW file 74A. That is, in such embodiments, the checkpoint program 76 may commit the changes in the COW file 74 to the virtual disk file 72 after copying the image 40 to the computer system 10B, and may create a new COW file to collect updates which occur after the checkpoint is made. Thus, the virtual disk file 72B at the next checkpoint may be the combination of the preceding virtual disk file 72A and the preceding COW file 74A. In other embodiments, the virtual disk file 72A and the COW file 74A may be retained even after subsequent checkpoints are made.
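The relationship "72B equals 72A combined with 74A" follows from committing the COW log on the primary system right after each copy and starting a fresh log. The sketch below models one non-persistent disk as a list of blocks; `CheckpointedDisk` and `roll_forward` are hypothetical names, and the in-memory block lists stand in for the actual disk and COW files.

```python
def roll_forward(disk_blocks, cow_log):
    """Commit a COW log into a disk: replay in order, last write wins."""
    merged = list(disk_blocks)
    for block, data in cow_log:
        merged[block] = data
    return merged


class CheckpointedDisk:
    """Hypothetical model of one non-persistent virtual disk on the primary
    system. After each checkpoint is copied, the COW log is committed locally
    and a new, empty COW log collects subsequent updates."""

    def __init__(self, blocks):
        self.disk = list(blocks)   # virtual disk file 72
        self.cow = []              # COW file 74

    def write(self, block, data):
        self.cow.append((block, data))

    def checkpoint(self):
        """Return the (disk, cow) pair copied to the recovery system, then
        commit the COW log and start a new one."""
        copied = (list(self.disk), list(self.cow))
        self.disk = roll_forward(self.disk, self.cow)
        self.cow = []
        return copied


if __name__ == "__main__":
    vdisk = CheckpointedDisk(["a", "b", "c"])
    vdisk.write(0, "a1")
    first = vdisk.checkpoint()      # (72A, 74A)
    vdisk.write(2, "c1")
    second = vdisk.checkpoint()     # (72B, 74B)
    # 72B already equals 72A with 74A applied.
    print(second[0] == roll_forward(*first))  # True
```

Because 72B already contains everything 72A and 74A described, deleting the older pair (as some embodiments do) discards no disk contents; retaining it simply keeps the older checkpoint self-contained.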
`
The network 12 may comprise any network technology in various embodiments. The network 12 may be a local area network, wide area network, intranet network, Internet network, or any other type of network. The network 12 may be designed to be continuously available (although network outages may occur), or may be intermittent (e.g. a modem connection made between a computer system in a user’s home and a computer system in a user’s workplace). Any network protocol may be used. For example, the network 12 may be an Ethernet network. Alternatively, the network may be a token ring network, etc. The network 12 may also represent shared storage between the computer systems 10A-10B.
`
While Fig. 3 illustrates checkpointing the virtual machine 16A, other embodiments may also checkpoint the virtual machine 16B. In some embodiments, all virtual machines on a given computer system may be checkpointed. In other embodiments, a subset of the virtual machines (e.g. those executing so-called "mission critical" applications) may be checkpointed and other virtual machines executing less critical applications may not be checkpointed.
`
Turning now to Fig. 4, a flowchart is shown illustrating operation of one embodiment of the checkpoint program 76. Other embodiments are possible and contemplated. The blocks shown in Fig. 4 may represent the operation of instructions forming the checkpoint program 76, when executed.
`
The checkpoint program 76 may wait for the checkpoint interval to expire (decision block 80). The operation of decision block 80 may represent the checkpoint program 76 being scheduled for execution (e.g. by the VM kernel 18A) at the expiration of the checkpoint interval, or may represent the checkpoint program 76 itself maintaining the interval. The checkpoint interval may be selected to be any desired interval between consecutive checkpoints. For example, a checkpoint interval of about 10 minutes may be used. A checkpoint interval in the range of, for example, 5 to 15 minutes may be used. Alternatively, the interval may be 30 minutes, one hour, or any other desired interval. The shorter the checkpoint interval, the more network bandwidth may be used to transfer the image 40 to the computer system 10B and the more frequently the applications executing in the virtual machines may be interrupted. The longer the checkpoint interval, the higher the risk of data loss may be (e.g. up to 10 minutes' worth of data may be lost if the checkpoint interval is 10 minutes, while up to one hour's worth of data may be lost if the checkpoint interval is one hour). The checkpoint interval may also be defined based on a quantity other than time. For example, the checkpoint interval may be based on the amount of activity occurring in a virtual machine. The number of file accesses, file writes, etc. could be counted to gauge the activity level, for instance. As another example, the return of an application to a certain p