`
`
`
`
`ImageElves: Rapid and Reliable System Updates in
`the Cloud
`
`Deepak Jeswani, Akshat Verma, Praveen Jayachandran, Kamal Bhattacharya
`IBM Research - India.
`
`Abstract—Virtualization has significantly reduced the cost of
`creating a new virtual machine and cheap storage allows VMs
`to be turned down when unused. This has led to a rapid
`proliferation of virtual machine images, both active and dormant,
`in the data center. System management technologies have not
`been able to keep pace with this growth and the management cost
of keeping all virtual machine images, active as well as dormant,
`updated is significant. In this work, we present ImageElves, a
`system to rapidly, reliably and automatically propagate updates
`(e.g., patches, software installs, compliance checks) in a data
`center. ImageElves analyses all target images and creates reliable
`image patches using a very small number of online updates.
`Traditionally, updates are applied by taking the application
`offline, applying updates, and then restoring the application, a
`process that is unreliable and has an unpredictable downtime.
With ImageElves, we propose a two-phase process. In the first
`phase, images are analyzed to create an update signature and
`update manifest. In the second phase, downtime is taken and
`the manifest is applied offline on virtual images in a parallel,
reliable and automated manner. This has two main advantages: (i) updates can be applied to dormant VMs without bringing them online, and (ii) all updates following this process are guaranteed to work reliably, leading to reduced and predictable downtimes. ImageElves uses
`three key ideas: (i) a novel per-update profiling mechanism to
`divide VMs into equivalence classes, (ii) a background logging
`mechanism to convert updates on live instances into patches
`for dormant images, and (iii) a cross-difference mechanism to
`filter system-specific or random information (e.g., host name,
IP address), while creating equivalence classes. We evaluated the ability of ImageElves to speed up a mix of popular system management activities and observed up to 80% smaller update times for active instances and up to 90% reduction in update times for dormant instances.
`
`I. INTRODUCTION
`Virtualization has made it increasingly easy and less expen-
`sive to create new virtual machine instances. There is evidence
`that instead of simply consolidating existing workloads into
`fewer servers, virtualization has led to an increase in the
`number of systems, now running as VMs [10]. There is
a proliferation of barely used VMs, as developers forget to return the VMs they no longer use to the resource pool at the end of a project (or intentionally hold on to them, anticipating reuse). Unfortunately, system management technologies have
`been unable to keep pace with this rapid proliferation, and
`the management cost of keeping all virtual machine images
`updated, both dormant and active, is significant. In a recent
`study, it was noted that more than 58% of virtual machine
`images in a university data center had not been used for over
`1.5 months. Nearly a quarter of the images in the university’s
`data center as well as in EC2 were not updated for 2-3
months [21]. This can pose a significant security threat to the data center, as well as increase the time for updating these dormant images and getting them ‘ready’ when they are needed again.
`Due to resource sharing and I/O indirection, virtualization
`has also exacerbated the problem of system management,
`making it more expensive and time consuming to update
`systems reliably. Anti-virus updates and operating system
`patches can result in increases in CPU utilization of the order
`of 40% on a virtual machine – changes that would have a
`negligible impact on a physical system [10]. As any system
`update is deemed unreliable, in order to deal with unforeseen
`problems, traditionally, any changes or updates to a running
`system are applied by taking the application offline for a
`large enough change window, applying the updates while
`system administrators are on stand-by, and then restoring the
`application. Often, the actual change window is significantly
`larger than the time required for the update [16]. In this work,
`we propose ImageElves to rapidly, reliably, and automatically
`propagate system updates (e.g., patches, software installs,
`compliance checks) to dormant as well as live virtual machine
`images in a data center.
`There are several tools available today that automate ap-
`plication of patches and compliance checks for online virtual
`machines [6], [15], [8], [7], [16]. While these tools address
`the scalability challenge, the patches themselves remain unre-
`liable, and administrators need to fix any failed updates manu-
`ally. Once a failed update is fixed manually, the administrator
`has to then ensure that the patch itself is fixed or create yet
`another patch that can then be fed to the tools for automatic
`application on the VMs. Hence, these tools do not help reduce
`the total application downtime by much, as a large chunk
`of the downtime is needed to manually fix any unforeseen
`problems during the update process. Furthermore, these tools
`do not patch dormant VM images. Recently, a novel tool
called Nüwa was presented in [21] to patch dormant VMs.
`The tool automatically rewrites a patch that is intended to be
`applied on online VMs, to exclude or replace statements that
`require a running VM. This modified patch is then applied on
the dormant VMs. Apart from being unable to handle online VMs and the cost of rewriting patches, this tool has the further limitation that the rewritten patches are not guaranteed to succeed.
Contribution: In this paper, we design a system called ImageElves that (i) works for both active and dormant instances, (ii) guarantees the reliability and duration of an update, (iii) handles all kinds of updates (without requiring source code), and (iv) scales with new types of updates.
`ImageElves works by first identifying equivalent images and
`then applying offline update manifests on equivalent images in
`
`
`
`1063-6927/13 $26.00 © 2013 IEEE1063-6927/13 $26.00 © 2013 IEEE1063-6927/13 $26.00 © 2013 IEEE
`
`
`DOI 10.1109/ICDCS.2013.33DOI 10.1109/ICDCS.2013.33DOI 10.1109/ICDCS.2013.33
`
`
`
`
`
`269390390
`
`WIZ, Inc. EXHIBIT - 1067
`WIZ, Inc. v. Orca Security LTD.
`
`
`
`a reliable fashion. ImageElves uses three key ideas: (i) a novel
`per-update profiling mechanism that partitions the VMs into
`equivalence classes, (ii) a background light-weight logging
`mechanism to convert updates on live instances into update
`manifests for dormant instances, and (iii) a cross-difference
`mechanism to identify random or system-specific information
`across VMs (such as hostname or IP address) while creat-
`ing equivalence classes. Instances that belong to a common
`equivalence class are guaranteed to perform identically for the
`specified update. This reliability allows us to perform updates
`fully automatically and in parallel on all equivalent instances,
`leading to significant reduction in update time and labor cost.
Further, it avoids expensive operating system and application testing after an update is applied, leading to shorter change windows. We implemented ImageElves on a target system with 37 virtual machines and observed up to 80% reduction in update times for active instances and up to 90% reduction in update times for dormant instances.
`
`II. BACKGROUND AND MOTIVATION
`We first present a background on how servers are updated
`in production data centers.
`
`Fig. 1. Update Flow in Production Data Centers
`
`A. Update Process on Production Servers
`Any update in a production environment is a change with the
`potential to break the application in the environment. Hence,
`updates are dealt with very carefully, using a change manage-
`ment process. Huang et al. present the change management
`process for patching in [6]. Even though the exact change
`management process differs across data centers as well as with
`the nature of updates, the essential steps remain the same.
`We abstract a generic change management process for
`various kinds of updates in Fig. 1. One may note the structural
`similarity of this process with various documented change
`management processes (e.g., [6]). The process starts with a
`request for a change window to perform the change. Once
`the change window is granted, the application is shutdown at
`the start of the window. The update is then applied on the
`target server. Once the update completes, testing is performed
`to ensure that the update did not break the application. Testing
`consists of operating system health checks followed by health
`checks for the application. If testing succeeds, the change
`window is closed. Otherwise, manual remediation is performed
`followed by re-testing.
`
`270391391
`
`Fig. 2. Timeline of an update
`
`We capture the steps on a timeline in Fig. 2. The most
`important thing to observe here is that, even if the actual time
`taken for the update to be applied is small, the overall time
`taken from the request for a change window to a change close
`is fairly large. Approval for a change window is manual and
`takes time. More importantly, the actual change window is
`often orders of magnitude larger than the update time [16].
This is because the inherent unreliability of this change process requires expensive operating system and application tests. Coupled with a buffer time kept for manual remediation,
`the actual planned downtime is typically a few hours, even
`if the actual update may take less than 10 mins. Further,
every update has a huge associated labour cost due to the testing
`performed after the update is completed.
`The above process captures how an update is applied on
`one system. Frequently, updates need to be applied on almost
`all instances in a data center (e.g., an OS security patch)
or on a very large number of instances (e.g., database or web server updates). Due to the inherent risk in the update process, administrators typically update only a few systems to start with. Once the first few updates succeed, they try to update more systems. Updating all relevant systems in the data center often happens months after the update becomes available [4].
`In this work, we focus on system updates, which are
`common across a data center. Examples of such updates are
`security patches, compliance updates, installation and configu-
`ration of system management software, upgrades of operating
systems and common middleware. The common denominator for all these updates is that they are applied on a large number of systems. The core idea behind our work is to profile these updates on very few instances and use that information to update the majority of instances in a reliable and low-
`cost manner. Application-specific updates like code upgrades,
`which touch few instances only, are outside the scope of this
`work and should be handled using their regular update process.
`B. VM Sprawl
`Server virtualization leads to a reduction in the hardware
`footprint of the data center by consolidating multiple work-
`loads as virtual machines on a shared physical server. This
leads to a reduction in data center infrastructure cost. However, system administration cost is proportional to the number of distinct managed servers (physical or virtual) in a data
`center. Every virtual server runs an operating system and an
`application stack, which need to be administered. In fact, it
`has been observed that virtualization leads to an increase in
`the number of running systems in a data center [10].
`An increase in the number of managed VMs in a virtualized
`data center is fairly easy to explain. Virtualization allows
`virtual machines to be provisioned automatically in the order
`
`
`
`of a few minutes. In comparison, provisioning a physical
`server was a time-consuming activity, requiring multiple levels
`of approval in a traditional data center. Further, since vir-
`tual machines can be created with fairly small sizes, users
`request virtual machines indiscriminately. Finally, one can
shut down virtual machines and free up any CPU or memory resources used by the virtual machine. Since storage is fairly inexpensive, this has led to users creating virtual machines liberally and shutting them down when not required.
In a recent survey, it was observed that more than 58% of the VMs in a cloud had not been used in the last 1.5 months [21].
`The explosion of virtual machines (both active and dormant)
`in a virtualized data center has led to the problem of VM
Sprawl. VM sprawl poses a significant challenge for system
`administrators in a data center. Firstly, it increases the labour
`cost of managing and updating the software components due
`to an increase in the number of managed instances. Secondly,
`since a large number of instances are dormant at a given point
in time, scheduled updates miss the dormant instances. Since the resources consumed by dormant instances are released to other active instances, it is not even possible to bring up all dormant instances and update them periodically. This, combined with the inherent unreliability of system updates, exacerbates the situation for system administrators.
`In this work, we pursue the idea that VM sprawl can also be
`leveraged to increase the inherent reliability of system updates.
`The ease of cloning instances and the use of golden masters to
`provision instances in a virtualized data center often leads to
`a large number of virtual machines with the same system and
`middleware footprint (with possibly different applications and
`data). Given an update, our key idea is to automatically profile
`the update and identify all instances that are semantically
`equivalent for the update. Equivalent instances are guaranteed
`to respond identically to an update, allowing updates to be
`applied reliably, automatically, and rapidly.
`
`III. ImageElves DESIGN
`
`We first present the goals of ImageElves and then describe
`the techniques that help us achieve these goals.
`
`A. Design Goals
`∙ Reliable Update Process: One of the primary issues in
`the current system update process is the lack of reliability.
`Since it is not clear if an update will succeed, many
`system management activities are still performed under
manual supervision. In case an update fails, the system administrator decides whether to roll back the update cleanly or to perform remediation actions and retry the update. A
`reliable update process is a pre-requisite for automation.
∙ Deterministic Update Duration: The lack of reliability of the system update process is also reflected in long change windows. Even if an update completes in 5 mins in most
`cases, the change window requested by a system admin-
`istrator is proportional to the worst case update duration.
`Change windows in production systems often involve
`application downtime or reduced application availability
`(e.g., reduced cluster size). Hence, an important goal
`
`for ImageElves is to ensure deterministic update times,
`leading to short change windows.
`∙ Reduced Update Duration: ImageElves also attempts to
`reduce the update duration, leading to even shorter change
`windows.
`∙ Update both Dormant and Active Instances: Virtual-
`ized data centers have a large number of dormant VMs
`along with the active instances. ImageElves attempts to
`ensure that both active and dormant instances are updated.
∙ Reduced Labour Cost: One of the most important business goals of ImageElves is to reduce the overall labour cost for managing systems in large data centers.
`∙ Content-Oblivious Updates: Automation of the update
`process should be oblivious to the specific commands
`executed by the updates so as to be generic and applicable
`to all updates.
`
`B. Design Overview
`ImageElves introduces the following novel ideas to ensure
`a reliable, automated and low cost system update process in
`data centers.
`∙ Per-Update Equivalence Class Identification: ImageElves
`creates a signature for each update and partitions all
`relevant instances (online and offline) in a data center
into equivalence classes based on the signature. The signature consists of all files which may have a dependency on the update. A successful update on one member of an
`equivalence class guarantees successful update on other
`members, ensuring that the update process is reliable.
`∙ Cross-Difference Signature Filter: Signature files may
typically contain environmental parameters (e.g., IP addresses, timestamps), leading to semantically identical images being classified as not equivalent. We employ
`a cross-difference signature filter to eliminate environ-
`mental or random noise from signature files allowing for
`creation of larger equivalence classes.
`∙ Offline Manifest Creation: We log all file system changes
`during an update to create an offline manifest that can
`be applied on other equivalent images with determin-
`istic update durations. Further, the offline manifest is
`created in a content-oblivious manner, i.e., oblivious to
`the actual actions performed by the update or any human
`remediation actions performed for the update to work.
The offline manifest also allows us to meet the design goal of handling dormant instances.
`∙ Parallel Automated Offline Updates: Once the offline
`manifests are created, the reliability and content-oblivious
`nature of manifests allow us to automatically update mul-
`tiple systems in parallel, leading to significant reduction
`in update time.
`
`C. Identifying Equivalent Images
One of the central ideas behind ImageElves is to identify sets of VM images that are equivalent. A pair of images can be considered equivalent if the result of any update applied on the two images would be identical. It is obvious that there would be very few images that are universally equivalent (i.e., they would behave identically for any set of updates).
`
`271392392
`
`
`
`Clearly, only clones are universally equivalent. We conjecture
`that for a particular update, there may be many images which
`are equivalent to each other with respect to that update. We
`first define a few notations that help us formalize the notion
`of equivalence.
`Definition 1: Signature: A signature 𝑆𝑖,𝑗 for an update 𝑈𝑖
`and an image 𝐼𝑗 is defined as the set of all image files that
`impact the application of 𝑈𝑖 on image 𝐼𝑗. It includes the list
of all files which are read by the update 𝑈𝑖, along with their attributes and a hash of each file's content after the cross-difference filter has been applied.
`Definition 2: Manifest: A manifest 𝑀𝑖,𝑗 for an Update 𝑈𝑖
`and an instance 𝐼𝑗 is defined as all files and properties that are
`modified or created by the update 𝑈𝑖 on image 𝐼𝑗. A manifest
`captures all filesystem changes and applying the manifest 𝑀𝑖,𝑗
`offline is equivalent to applying the update 𝑈𝑖 on instance 𝐼𝑗.
`Definition 3: Equivalent Images: Two instances 𝐼𝑗 and 𝐼𝑘
are said to be equivalent for a given update 𝑈𝑖, i.e., 𝐼𝑗 ≡𝑈𝑖 𝐼𝑘, iff their manifests are identical: 𝑀𝑖,𝑗 = 𝑀𝑖,𝑘.
It is easy to understand that if an update is applied successfully on an instance 𝐼𝑗, the update will also be successful on any instances that are equivalent to 𝐼𝑗 for the given update. Consider also the case where an update 𝑈𝑖 was not automatically successful on an instance. However, a system administrator performed additional operations on the instance to make the update successful. Consider a new update 𝑈′𝑖, which includes the update 𝑈𝑖 as well as the additional manual remediation actions performed to make the update successful. Any instances that are equivalent to 𝐼𝑗 for this modified update 𝑈′𝑖 can also be successfully updated by applying the update 𝑈𝑖 along with the manual actions performed in 𝑈′𝑖.
`The notion of per-update equivalence is very powerful and
`we leverage it to ensure reliability. Given a set 𝑆 of instances
`on which an update 𝑈𝑖 needs to be applied, we only need to
`identify the distinct equivalence classes for 𝑈𝑖 in 𝑆. Then, if
`the update is successful for any one image in an equivalence
`class, it will necessarily be successful on all other images in
`that class. Further, if an update failed on one image in an
`equivalence class, we can reliably identify that the update will
`fail on all other images in that class, without having to apply
`the update on any of them. In order to identify equivalence
`classes for an update 𝑈𝑖, we use the files that constitute the
`signature of the update. Given a successful update of 𝑈𝑖 on
`an instance 𝐼𝑗, we identify the files in the update’s signature
`𝑆𝑖,𝑗. All images that have an identical content in the signature
files are marked as equivalent. We use a SHA1 hash to test whether the content is identical. The details of this test are discussed in a subsequent section.
`
It is important to note that images with identical content in their signature files are guaranteed to be equivalent. However, images with different signatures can also have the same manifest and may be equivalent. Hence, our approximation may create more equivalence classes than actually exist in a set of instances. The key property of this approximation is
`that it is safe, i.e., if we identify an update as reliable, it is
`indeed reliable.
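To make this classification step concrete, the following is a minimal Python sketch that groups mounted images into per-update equivalence classes by the SHA1 digests of their signature files. It assumes each candidate image is already mounted as a directory tree and that the cross-difference filter of the next subsection has already removed noise from the compared content; the helper names (file_digest, image_signature, equivalence_classes) are illustrative, not the actual ImageElves code.

import hashlib
import os
from collections import defaultdict

def file_digest(path):
    # SHA1 digest of a file's content; noise removal (Section III-D) is
    # assumed to have been applied already.
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def image_signature(mount_root, signature_files):
    # Map each signature file to its digest for one mounted image.
    # Missing files are recorded explicitly, since their absence also
    # affects how the update would behave.
    sig = {}
    for rel in signature_files:
        path = os.path.join(mount_root, rel.lstrip("/"))
        sig[rel] = file_digest(path) if os.path.exists(path) else None
    return sig

def equivalence_classes(mounted_images, signature_files):
    # Group images whose signature files have identical digests.
    # mounted_images maps an image id to its mount point.
    classes = defaultdict(list)
    for image_id, root in mounted_images.items():
        key = tuple(sorted(image_signature(root, signature_files).items()))
        classes[key].append(image_id)
    return list(classes.values())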
`
`D. Cross-Difference Mechanism
`We investigated files that constitute the signature for many
updates. We observed that files in the signature are often configuration files for the operating system, OS components, or application packages. Updates often look for the presence or
`configuration of various components or applications they de-
`pend on and access the configuration files associated with these
`applications. Configuration files often contain environmental
`parameters or Points of Variability (PoV), e.g., IP addresses,
hostnames. These PoVs differ across instances and, as a result, even if a configuration file in two different instances is semantically equivalent, a simple diff between the two files
`will show differences. Hence, ImageElves would not be able
`to classify two such images as equivalent (even if they indeed
`are equivalent for an update).
`We also observed another common issue with such files.
`Many applications often use XML files for configuration and
`tag XML nodes with randomly generated numbers. These
`random numbers show up as differences between two files,
`which are identical in all other respects. Since ImageElves
`is designed to be content-oblivious (i.e., does not use expert
`knowledge about the actual actions performed by updates),
`discarding the PoVs or random attributes while classifying
`files is challenging. ImageElves instead uses a slightly different
`approach to identify noise or environmental parameters.
`ImageElves uses clones of one instance to create the sig-
`nature and manifest. Since the two instances are clones, their
`signature and manifest should be identical modulo any noise
or environment parameters. Any difference in content between the signatures of the two instances is annotated as noise.
`This cross-difference information is passed to a file difference
`method, which ignores any differences at locations annotated
`as noise between two files it compares. We observed that this
cross-difference mechanism was able to dramatically increase the number of images in an equivalence class and thereby decrease the number of equivalence classes identified.
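As an illustration, the following is a minimal Python sketch of a line-level cross-difference filter, under the assumption that noise can be localized to whole lines (the actual mechanism may operate at a finer granularity); the function names are illustrative. The noise positions learned from the leader/clone pair are reused whenever the same file is compared across other candidate images.

import hashlib

def noisy_lines(leader_lines, clone_lines):
    # Line positions where the leader and its clone disagree are treated
    # as environmental parameters (PoVs) or random noise.
    noise = {i for i, (a, b) in enumerate(zip(leader_lines, clone_lines)) if a != b}
    # Trailing lines present in only one of the two files are also noise.
    shorter = min(len(leader_lines), len(clone_lines))
    longer = max(len(leader_lines), len(clone_lines))
    noise.update(range(shorter, longer))
    return noise

def filtered_digest(lines, noise):
    # SHA1 digest of a file with noise positions masked out, so that two
    # semantically identical files hash to the same value.
    h = hashlib.sha1()
    for i, line in enumerate(lines):
        h.update(b"<noise>" if i in noise else line.encode("utf-8", "replace"))
    return h.hexdigest()

def same_modulo_noise(lines_a, lines_b, noise):
    # Compare two versions of a signature file while ignoring noise.
    return filtered_digest(lines_a, noise) == filtered_digest(lines_b, noise)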
`
`E. Offline Update Manifest Creation
`ImageElves allows updates to be applied offline on a
`dormant VM instead of a running instance. This allows us
`to update dormant VMs without any need to bring them
online. Dormant VMs often do not have compute or memory resources assigned, and bringing them online for an update is not feasible. Further, dormant images often lie in backup application instances or in libraries, which are brought up only when the primary instance has failed. In such cases, updating the dormant instances may delay recovery, leading to extensive downtimes. Offline updates can also lead to faster updates in many cases, as we will show experimentally. Further, a large
`number of images can be updated in parallel in an offline
`fashion.
An offline update to an image is simply the application of the update manifest to an equivalent image. It is important to note that a manifest created using one instance cannot be applied on another instance unless the two instances are guaranteed to be equivalent. The manifest is created on one online instance for each equivalence class in a straightforward manner. Before the update is applied, a snapshot is taken of the
`
`272393393
`
`
`
`instance. Once the update completes successfully, an image diff
`is performed between the snapshot and the updated instance.
`For every file that is different, a patch is created to transform
`the file in the snapshot to the file in the updated instance.
`The collection of all these patches constitute the manifest for
`the update, which can subsequently be used to update the
`remaining instances in the equivalence class offline.
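A minimal Python sketch of this image-diff step is shown below, assuming the pre-update snapshot and the updated leader are both mounted as directory trees and that the files recorded in the update's signature are the candidates to compare. Python's difflib stands in for whatever patch format the real system emits, and the dictionary layout of the manifest entries is purely illustrative.

import difflib
import os
import stat

def _read_lines(path):
    with open(path, encoding="utf-8", errors="replace") as f:
        return f.readlines()

def build_manifest(snapshot_root, updated_root, candidate_files):
    # Collect per-file entries describing how the update changed the
    # filesystem; replaying these entries offline reproduces the update.
    manifest = []
    for rel in candidate_files:
        before = os.path.join(snapshot_root, rel.lstrip("/"))
        after = os.path.join(updated_root, rel.lstrip("/"))
        had, has = os.path.exists(before), os.path.exists(after)
        if not had and has:
            with open(after, "rb") as f:
                data = f.read()
            manifest.append({"path": rel, "op": "create", "data": data,
                             "mode": stat.S_IMODE(os.stat(after).st_mode)})
        elif had and not has:
            manifest.append({"path": rel, "op": "delete"})
        elif had and has:
            patch = list(difflib.unified_diff(_read_lines(before), _read_lines(after),
                                              fromfile=rel, tofile=rel))
            if patch:
                manifest.append({"path": rel, "op": "modify", "patch": patch})
            old_mode = stat.S_IMODE(os.stat(before).st_mode)
            new_mode = stat.S_IMODE(os.stat(after).st_mode)
            if old_mode != new_mode:
                manifest.append({"path": rel, "op": "chmod", "mode": new_mode})
    return manifest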
`
`F. ImageElves Automated System Update Process
`We now describe the overall update process enabled by
`ImageElves. Given a new update 𝑈𝑖 that needs to be applied on
`a target set of instances 𝑆, ImageElves follows the two-phase
`iterative process described in Fig. 3.
`
`Input: Set 𝑆, Update 𝑈𝑖
`While 𝑆 is not empty
`Phase 1: Select a target instance 𝐼𝑗 at random
`Apply the update 𝑈𝑖 on 𝐼𝑗
`Create the Signature 𝑆𝑖,𝑗 and manifest 𝑀𝑖,𝑗
`Update 𝑆𝑖,𝑗 and 𝑀𝑖,𝑗 using cross-difference
Phase 2 (Parallel): Compute 𝑆𝑖,𝑘 for all images 𝐼𝑘 and create equivalence classes
For all 𝐼𝑘 ≡𝑈𝑖 𝐼𝑗 (Parallel Step)
`If 𝑈𝑖 was successful on 𝐼𝑗
`Apply 𝑀𝑖,𝑗 or 𝑈𝑖 on 𝐼𝑘
`Else
`Mark 𝐼𝑘 as failed
`End-If
`Remove 𝐼𝑘 from 𝑆
`End-For
`End-While
`
Fig. 3. ImageElves Update Flow
`
`The update process takes an update along with possible
`target instances 𝑆 as input. It first finds a leader in the set
`𝑆. On the leader, a regular update process is applied, which
involves taking a conservative downtime. The leader can be elected randomly or based on some criterion, for example an instance that is not running any critical application. The update process may succeed quickly, succeed with human intervention, or fail. In any case, a signature of the final update
`is created, which is then used to identify equivalence classes.
`All instances that are equivalent to the leader are either marked
`as failed (in case the original update had failed) or updated
`reliably (the update may either be applied offline using the
`manifest or online using the original update).
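The following minimal Python sketch shows how this second phase might be driven in parallel once the leader's outcome is known. The two apply_* functions are placeholder stubs, image identifiers are assumed to be plain strings, and the offline/online choice discussed next is reduced to a single flag; the real ImageElves implementation may differ.

from concurrent.futures import ThreadPoolExecutor

def apply_manifest_offline(image, manifest):
    # Placeholder for the real offline manifest application step.
    print(f"applying manifest offline to {image}")

def apply_update_online(image, update):
    # Placeholder for re-running the original update on a live instance.
    print(f"applying update online to {image}")

def propagate(leader_succeeded, equivalent_images, manifest, update,
              dormant, prefer_offline=True):
    # Phase 2 of Fig. 3: replicate the leader's outcome on every image in
    # its equivalence class, in parallel.
    if not leader_succeeded:
        # The update is guaranteed to fail on equivalent images as well,
        # so they are marked as failed without being touched.
        return {img: "failed" for img in equivalent_images}

    def update_one(img):
        if prefer_offline or img in dormant:
            apply_manifest_offline(img, manifest)
        else:
            apply_update_online(img, update)
        return img, "updated"

    with ThreadPoolExecutor() as pool:
        return dict(pool.map(update_one, equivalent_images))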
`The actual update can be applied either online or offline.
If the update is applied using the offline manifest, all relevant images are updated in parallel. We recommend the following rules of thumb for selecting between the online and offline mechanisms for applying an update to an equivalence
`class. For dormant instances, the offline manifest is likely to
`be the preferred mechanism. If the update required human
`intervention on the leader, the offline manifest is again likely
`the preferred mechanism, since the offline manifest captures
`both automated and manual actions and can be replicated
`on the remaining VMs in the equivalence class without any
human intervention. For error-free updates on active instances, the choice between offline and online may depend on the duration of the online update and the number of equivalent images in the class. If the online update took a long time or required reboots, the offline manifest may speed up the process deterministically. If the online update was short and applicable to only a few images, it may make sense to apply the update online. The entire process can be optimized further by pipelining manifest creation and update application across multiple equivalence classes. Our current implementation of ImageElves is captured in Fig. 3, and optimizations to further speed up the process are left as future work.

IV. IMPLEMENTATION DETAILS

Fig. 4. ImageElves Component Diagram
`
We have implemented ImageElves in Python to update virtual machine images. Our current implementation works
`only for Linux and Unix systems. It does not handle changes
`to the Windows registry, but we are planning to extend
`our implementation for Windows systems as well. Figure 4
`captures the key modules implemented in ImageElves.
`An Orchestrator component drives the overall update pro-
`cess. It takes from the administrator an Update and a Leader
`VM image to apply the update. It invokes a Virtualization
`Engine to take a snapshot and then asks the administrator to
`perform the update using the Signature Creator module. Once
`the update is completed, the Signature Creator module returns
`the signature 𝑆𝑖,𝑗 associated with the update 𝑈𝑖 on instance
`𝐼𝑗. The Orchestrator then invokes the Manifest Creator, which
`leverages the snapshot to create a Manifest for the update.
`The Orchestrator passes the signature of the update to the
`Equivalence Engine. The Equivalence Engine leverages a
`File Difference Engine, which implements the cross-difference
`mechanism, and returns the equivalence classes formed. Both
`the Equivalence Engine and the Manifest Creator use a Mount
`Utility to mount images during equivalence class and manifest
`creation. We next describe details of each individual module.
`
`A. Mount Utility
The proposed system currently uses VMware VDDK to mount images. Since multiple images can have the same
`volume attributes, the Logical Volume ID, Physical Volume ID
`and Volume Group ID of the images are changed temporarily.
`The renaming is necessary to avoid namespace conflicts on
`the system. The IDs are restored to their original values when
`images are unmounted.
`
`273394394
`
`
`
`B. Signature Creator
`
`As mentioned earlier, the current implementation of Im-
`ageElves works for UNIX based operating systems. To create
`the signature for an update, we make use of UNIX’s strace
`tool [11]. Strace is a diagnostic, instructional, and debugging
`tool, which intercepts and records all system calls invoked by
`a process. The administrator applies the update on the leader
`within the operational context of strace. Thus, when the update
`runs, all files touched by it or its child threads are captured. A
`conservative signature is created by ImageElves that includes
all files read by the update on the leader. We filter out a few files containing low-level, system-specific information from
`this conservative list. The files which are filtered out in our
current implementation are /etc/mtab and /etc/fstab (both contain text information about mounted filesystems), /etc/resolv.conf (contains DNS information), and /etc/passwd.
`The Signature Creator module keeps a record of all file
`names and the operations performed on them as part of the sig-
`nature of the update. The operations that are recorded by strace
consist of open (read/write/modify), create, delete (rename, unlink), and permission modification operations (chmod, lstat, statfs). Once this signature is ready, we leverage the Difference
`Engine to apply the cross-difference operation between the
`leader and its clone to prune out random and environmental
`parameters from these files. The environmental parameters
`can be identified using a Points-of-Variability (PoV) mining
`system (e.g., [3], [9]). The PoVs and random values identified
`are removed from the signature files and a SHA1 digest of
`each file is created. This digest is used as the file signature to
`create equivalence classes.
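To illustrate this step, here is a minimal Python sketch that runs an update command under strace's file-class tracing and extracts the touched files from the trace. It assumes the update can be launched as a command from the host performing the profiling; the command shown in the usage comment is hypothetical, and the parsing is deliberately simplistic compared to a production implementation.

import re
import subprocess
import tempfile

# Host-specific files excluded from the signature, following the paper's list.
EXCLUDED = {"/etc/mtab", "/etc/fstab", "/etc/resolv.conf", "/etc/passwd"}

FILE_CALL = re.compile(
    r'\b(open|openat|creat|rename|unlink|chmod|lstat|statfs)\([^"]*"([^"]+)"')

def trace_update(update_cmd):
    # Run the update under strace, recording every file-related system
    # call made by the update process and its children (-f).
    trace = tempfile.NamedTemporaryFile(suffix=".strace", delete=False)
    result = subprocess.run(["strace", "-f", "-e", "trace=file",
                             "-o", trace.name] + list(update_cmd))
    return trace.name, result.returncode == 0

def signature_from_trace(trace_path):
    # Collect the set of files touched by the update from the strace log;
    # only the first quoted path argument of each call is considered.
    touched = set()
    with open(trace_path) as f:
        for line in f:
            m = FILE_CALL.search(line)
            if m and m.group(2) not in EXCLUDED:
                touched.add(m.group(2))
    return touched

# Hypothetical usage:
#   trace, ok = trace_update(["rpm", "-Uvh", "security-fix.rpm"])
#   signature_files = signature_from_trace(trace)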
`
`C. Manifest Creator
`
The Manifest Creator module mounts the updated leader and its pre-update snapshot to find the changes made to the file system by the update. It
`looks for changes only within files captured as part of the
`signature as identified by the Signature Creator module. The
`manifest for the update contains all files that were modified
`or created by the update, their meta information (such as
`path, permissions, and ownership), as well as instructions
`regarding the nature of operations performed on each file.
`This information describing the contents of the manifest is
`stored in a meta manifest file. The instructions are different
`for different types of operations performed by the update. For
`a file create operation, the Manifest Creator makes a copy of
`the file (data) after removing PoVs. It adds the location of the
`copied file in the meta manifest file, along with instructions
`to copy and replace PoV values when applying the manifest
`on another image. For files which are modified, the Manifest
`Creator module uses the Linux diff utility to create the patch.
`It adds the location of the patch along with instructions to
`apply the patch in the meta manifest file. Similarly, for files
`for which permissions are modified, the manifest contains the
`new permission as data, and instructions to change permission
`of the file to the new value. Files that were only read are
`excluded from the manifest, and for files that are marked for
`deletion, a delete instruction is added in the manifest.
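For completeness, the sketch below shows how meta-manifest entries of this kind might later be replayed on a mounted image from the same equivalence class. The JSON field names, and the convention that created files and diff patches are stored as separate payload files next to the meta manifest, are assumptions made for this example rather than the actual ImageElves format; patch(1) is used to replay the diffs.

import json
import os
import shutil
import subprocess

def apply_manifest(meta_manifest_path, target_root):
    # Replay a manifest on another mounted image; the target is assumed to
    # belong to the same equivalence class as the leader.
    with open(meta_manifest_path) as f:
        entries = json.load(f)
    for e in entries:
        target = os.path.join(target_root, e["path"].lstrip("/"))
        if e["op"] == "create":
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.copyfile(e["data_file"], target)  # payload stored beside the manifest
            os.chmod(target, e["mode"])
        elif e["op"] == "modify":
            # Replay the patch produced by diff on the leader.
            subprocess.run(["patch", target, e["patch_file"]], check=True)
        elif e["op"] == "chmod":
            os.chmod(target, e["mode"])
        elif e["op"] == "delete":
            os.remove(target)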
`
`D. Equivalence Engine
`G