`Images in a Compute Cloud
`
`Wu Zhou Peng Ning
`North Carolina State University
`{wzhou2, pning}@ncsu.edu
`Ruowen Wang
`North Carolina State University
`rwang9@ncsu.edu
`
`Xiaolan Zhang Glenn Ammons
`IBM T.J. Watson Research Center
`{cxzhang, ammons}@us.ibm.com
`Vasanth Bala
`IBM T.J. Watson Research Center
`vbala@us.ibm.com
`
`failure to provide regular system services. Unfortunately, apply-
`ing security patches is a notoriously tedious task, due to the large
`number of patches and the high rate at which they are released —
`it is estimated that, in an average week, vendors and security orga-
`nizations release about 150 vulnerabilities and associated patching
`information [15]. As a result, most software runs with outdated
`patches [11, 12].
`The problem is exacerbated by the IT industry’s recent shift to
`virtualization and cloud computing. Virtualization allows a com-
`plete system state to be conveniently encapsulated in a virtual ma-
`chine (VM) image, which can run on any compatible hypervisor.
`Based on virtualization, cloud computing services (e.g., Amazon
`Elastic Compute Cloud (EC2) [2], NCSU Virtual Computing Lab
`(VCL) [20]) provide on-demand computing resources to customers’
`workloads, usually encapsulated in VM images. Because VM im-
`ages are normal files, they can be easily copied to create new VM
`images. This has led to a new “VM image sprawl” problem, where
`a large number of VM images are created by the users and left
`unattended. A direct result of the VM image sprawl problem is the
`significantly increased management cost of maintaining these VM
`images, including regularly applying security patches to both active
`VMs and dormant VM images.
`
`ABSTRACT
`Patching is a critical security service that keeps computer systems
`up to date and defends against security threats. Existing patching
`systems all require running systems. With the increasing adoption
`of virtualization and cloud computing services, there is a growing
`number of dormant virtual machine (VM) images. Such VM im-
`ages cannot benefit from existing patching systems, and thus are
often left vulnerable to emerging security threats. It is possible
to bring VM images online, apply patches, and capture the VMs
back to dormant images. However, such approaches suffer from
unpredictability, performance challenges, and high operational
costs, particularly in large-scale compute clouds where there could
be thousands of dormant VM images.
`This paper presents a novel tool named Nüwa that enables effi-
`cient and scalable offline patching of dormant VM images. Nüwa
`analyzes patches and, when possible, converts them into patches
`that can be applied offline by rewriting the patching scripts. Nüwa
`also leverages the VM image manipulation technologies offered
`by the Mirage image library to provide an efficient and scalable
`way to patch VM images in batch. Nüwa has been evaluated on
`freshly built images and on real-world images from the IBM Re-
`search Compute Cloud (RC2), a compute cloud used by IBM re-
`searchers worldwide. When applying security patches to a fresh
`installation of Ubuntu-8.04, Nüwa successfully applies 402 of 406
`patches. It speeds up the patching process by more than 4 times
`compared to the online approach and by another 2–10 times when
`integrated with Mirage. Nüwa also successfully applies the 10 lat-
`est security updates to all VM images in RC2.
`
`1.
`
`INTRODUCTION
`
`Patching is a basic and effective mechanism for computer sys-
`tems to defend against most, although not all, security threats, such
`as viruses, rootkits, and worms [13, 19, 21]. Failing to promptly
`patch physical machines can subject the systems to huge risks, such
`as loss of confidential data, compromise of system integrity, and
`
`Permission to make digital or hard copies of all or part of this work for
`personal or classroom use is granted without fee provided that copies are
`not made or distributed for profit or commercial advantage and that copies
`bear this notice and the full citation on the first page. To copy otherwise, to
`republish, to post on servers or to redistribute to lists, requires prior specific
`permission and/or a fee.
`ACSAC ’10 Dec. 6-10, 2010, Austin, Texas USA
`Copyright 2010 ACM 978-1-4503-0133-6/10/12 ...$10.00.
`
`
`
`VM images in VCL and 575 public VM images posted at EC2’s
`AMI page. However, more than 91% of the VCL images and more
`than 96% of the EC2 images have not been updated for at least
`1.5 months. Moreover, more than 58% of the VCL images have
not been used in the last 1.5 months. Note that these inactive
images may still be needed in the future. Indeed,
`based on the VCL log, VCL purged 776 VM images marked by the
`users as “deleted” in the past; all of the remaining 831 images were
`explicitly marked as needed by their owners.
`Our investigation of EC2 and VCL leads to two observations:
`• Most VM images in compute clouds are not properly patched.
`The longer a VM image remains unpatched, particularly af-
`ter a major vulnerability is discovered, the more likely it is
`to threaten other machines in the compute cloud or in the In-
`ternet. Also, unpatched images owned by organizations or
`companies may not be compliant with the organizations’ se-
`curity policies.
`• A significant portion of the VM images are mostly offline and
`infrequently booted. Thus, any attempt to start these VMs
`and install patches will be an extra cost to the image owners.
`The cloud service providers may certainly offer patching as a
`free service; however, they will have to sacrifice CPU cycles
that could potentially bring in revenue.
`
`Inadequacy of Existing Patching Utilities: Traditional patch-
`ing utilities, originally designed for running systems, require the
`VM images to be online before the patch can be applied. There are
`a few recent attempts to patch offline VM images using traditional
`patching utilities. For example, the Microsoft Offline Virtual Ma-
`chine Servicing Tool [10] brings the VM image online, applies the
`patches, and captures the VM back to a dormant image. Realizing
`that not all VM images are needed immediately by their users, a
`lazy patching approach was developed in [27], which injects the
`patch installer and patch data into the VM image in such a way
`that the patch process is triggered at the next booting time. This
`optimization can yield significant savings in the total time spent
`in patching in the case where only a small percentage of dormant
`images will ever be used. However, the tradeoff is that users will
`now see delays in image startup, which can be significant for im-
`ages that have accumulated a long list of yet-to-be-applied patches.
`Our own experiences show that update time can be fairly long (in
`the order of 10s of minutes) for stale systems (e.g., dormant for 1
`month). In modern clouds where VM instances are dynamically
`provisioned to meet varying demands, this delay is unacceptable.
Additionally, for enterprise systems, it is often required that all IT
`assets (physical or virtual, dormant or online) be up to date with
`regard to patches for security or compliance reasons. This will ap-
`ply to cloud providers as enterprises embrace the cloud comput-
`ing model. Finally, in a cloud environment where customers are
`charged for resources used during patching, this approach imposes
`costs that customers might not accept.
`In general, patching approaches that require VMs to be online
`are a poor fit for VM images in compute clouds. Note that it takes
`on the order of minutes just to power up and shut down a VM im-
`age. With the large number of dormant VM images that are infre-
`quently used, these approaches add significant extra costs either for
`customers or for cloud service providers. In addition to these costs,
`bringing a VM image online necessarily runs code that has nothing
`to do with patching, which makes patching less predictable.
`
There are indeed more public AMIs in EC2 (more than 7,000 in US
East, US West, and EU West EC2 sites in mid April 2010) than those
in this list. Amazon does not publish usage data.
`
`Our Solution–Nüwa Offline Patching Tool: We propose an ap-
`proach that is fundamentally different from the traditional online
`model. We argue that the only way to make the patching process
`scalable in a cloud environment, where the number of images can
`potentially reach millions 2, is to do it offline. A closer look into the
`patching process reveals that it can be decomposed into a sequence
`of actions, not all of which require a running system. In fact, most
`of the patching actions only depend on and have an impact on file
`system objects, which are already encapsulated in the VM image
`itself. Among the actions that do depend on or have impacts on a
`running system, we find that many are unnecessary when patching
`offline, and some can be safely replaced by other actions that do not
`need the running system. Based on these findings, we design and
`implement Nüwa 3, a scalable offline patching tool for VM images.
`By patching offline, Nüwa avoids the expensive VM start and stop
`time, and, for the majority of cases, ensures that, when a VM image
`is ready to be started, it has the latest patches installed.
Because Nüwa is an offline patching tool, it can leverage novel
VM image manipulation technologies to further improve scalability.
In particular, Nüwa is integrated with the Mirage image
library [24], which stores identical files once and treats images as
`logical views on this collection of files. By exploiting Mirage,
`Nüwa can patch all images that contain a file by patching that sin-
`gle file and updating each image’s view, thus providing efficient
`and scalable offline patching in batch.
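The idea can be sketched as a content-addressed store plus per-image manifests. The class below is our own toy illustration of the concept, not Mirage's actual API: patching one shared file updates every image whose manifest references it.

```python
import hashlib

class ImageStore:
    """Toy content-addressed image library in the spirit of Mirage:
    each distinct file is stored once, keyed by its content hash, and
    an image is just a manifest mapping paths to hashes."""

    def __init__(self):
        self.blobs = {}      # content hash -> file bytes
        self.manifests = {}  # image name -> {path: content hash}

    def _put(self, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blobs[digest] = data
        return digest

    def add_image(self, name, files):
        self.manifests[name] = {p: self._put(d) for p, d in files.items()}

    def patch_file(self, path, old, new):
        """Patch one shared file once, then update every manifest that
        still references the old version: batch offline patching."""
        old_digest = hashlib.sha256(old).hexdigest()
        new_digest = self._put(new)
        patched = []
        for name, manifest in self.manifests.items():
            if manifest.get(path) == old_digest:
                manifest[path] = new_digest
                patched.append(name)
        return patched

store = ImageStore()
store.add_image("img-a", {"/usr/bin/openssl": b"vulnerable build"})
store.add_image("img-b", {"/usr/bin/openssl": b"vulnerable build"})
patched = store.patch_file("/usr/bin/openssl",
                           b"vulnerable build", b"fixed build")
```

Because the vulnerable file is stored once, the patch work is done once regardless of how many images share it; only the cheap manifest updates scale with the number of images.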
`Our implementation of Nüwa supports the Debian package man-
`ager [5] and the RPM package manager [8]. We evaluated Nüwa
`with 406 patches to a freshly installed Ubuntu-8.04. Our evaluation
`shows that Nüwa applies 402 of the 406 patches offline and speeds
`up the patching process by more than 4 times compared to the on-
`line approach. This can be further improved by another 2–10 times
`when the tool is integrated with Mirage, making Nüwa an order of
`magnitude more efficient than the online approach. We also eval-
`uated Nüwa on real-world images from the IBM Research Com-
`pute Cloud (RC2) [25], a compute cloud used by IBM researchers
`worldwide. Nüwa successfully applies the 10 latest security up-
`dates to all VM images in RC2.
`This paper is organized as follows. Section 2 gives background
`information on patching and describes our design choices and tech-
`nical challenges. Section 3 presents an overview of our approach.
`Section 4 describes the mechanisms we use to convert an online
`patch into one that can be safely applied offline. Section 5 de-
`scribes how we leverage efficient image manipulation mechanisms
`to further improve scalability. Section 6 presents our experimen-
`tal evaluation results. Section 7 discusses related work. Section 8
`concludes this paper with an outlook to the future.
`
`2. PROBLEM STATEMENT
`
`2.1 Background
`Software patches, or simply patches, are often distributed in the
`form of software update packages (e.g., .deb or .rpm files), which
`are installed using a package installer, such as dpkg and rpm. In
`this section, we give background information on the format of soft-
`ware packages and the package installation process. We use the
`Debian package management tool dpkg as an example. Most soft-
`ware package management tools follow the same general style with
`only slight differences.
`
`2Amazon EC2 already contains over 7,000 public VM images as
`of April 2010, without including private images that users choose
`not to share with others [18].
`3Named after the Chinese Goddess who patches the sky.
`
`
`
`
`Packages are distribution units of specific software. A package
`usually includes files for different purposes and associated meta-
`data, such as the name, version, dependences, description and con-
`crete instructions on how to install and uninstall this specific soft-
`ware. Different platforms may use different package formats to dis-
`tribute software to their users. But the contents are mostly the same.
`A Debian package, for example, is a standard Unix ar archive,
`composed of two compressed tar archives, one for the filesystem
tree data and the other for associated metadata used for control
purposes. Inside the metadata, a Debian package includes a list of
`configuration files, md5 sums for each file in the first archive, name
`and version information, and shell scripts that the package installer
`runs at specific points in the package lifecycle.
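Concretely, the ar container of a .deb can be inspected with a few lines of code. The sketch below parses the 60-byte ar member headers; the member names (debian-binary, control.tar.gz, data.tar.gz) follow the conventional .deb layout, and a synthetic archive is built in-line so the example is self-contained.

```python
def list_ar_members(blob):
    """Minimal parser for the Unix ar format used by .deb files:
    a global magic string followed by 60-byte member headers."""
    assert blob[:8] == b"!<arch>\n"
    members, off = [], 8
    while off + 60 <= len(blob):
        header = blob[off:off + 60]
        name = header[:16].decode().rstrip()   # name field, space-padded
        size = int(header[48:58])              # decimal size field
        members.append((name, size))
        off += 60 + size + (size % 2)          # members are 2-byte aligned
    return members

def ar_member(name, data):
    """Build one ar member: 60-byte header (name, mtime, uid, gid,
    mode, size, terminator) followed by the data, padded to even length."""
    hdr = "{:<16}{:<12}{:<6}{:<6}{:<8}{:<10}`\n".format(
        name, 0, 0, 0, "100644", len(data)).encode()
    return hdr + data + (b"\n" if len(data) % 2 else b"")

# A tiny synthetic archive in the same layout a .deb uses.
deb = (b"!<arch>\n"
       + ar_member("debian-binary", b"2.0\n")
       + ar_member("control.tar.gz", b"\x1f\x8b")
       + ar_member("data.tar.gz", b"\x1f\x8b"))
names = [n for n, _ in list_ar_members(deb)]
```

In a real package, the two tar members would then be unpacked to reach the filesystem tree and the control metadata (including the hook scripts).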
`The main action in patching is to replace the old buggy filesys-
`tem data with the updated counterparts. Moreover, the package
`installer also needs to perform additional operations to ensure the
`updated software will work well in the target environment. For ex-
`ample, dependences and conflicts must be resolved, a new user or
`group might have to be added, configuration modifications by the
`user should be kept, other software packages dependent on this one
`may need to be notified, and running instances of this software may
`need to be restarted. Most of these actions are specified in scripts
`provided by the package developers. Because these scripts are in-
`tended to be invoked at certain points during the patching process,
`they are called hook scripts. The hook scripts that are invoked be-
`fore (or after) file replacement operations are called pre-installation
`(or post-installation) scripts. There are also scripts intended to be
`invoked when relevant packages (e.g., dependent software) are in-
`stalled or removed.
`More details about Debian package management tools can be
`found in the Debian Policy Manual [6].
`
`2.2 Design Choices and Technical Challenges
`Our goal is to build a patching tool that can take existing patches
`intended for online systems and apply them offline to a large collec-
`tion of dormant VM images in a manner that is safe and scalable.
`By safety we mean that applying the patch offline achieves the same
`effect on the persistent file systems in the images as applying it on-
`line. By scalability we mean that the tool has to scale to thousands,
`if not millions of VM images. In this paper we only consider dor-
mant VM images that are completely shut down; VM images that
`contain suspended VMs are out of the scope of this paper.
`We made a conscious design decision to be backward compati-
`ble with an existing patch format. It is tempting to go with a “clean
`slate” approach, where we define a new VM-friendly patch format
`and associated tools that do not make the assumption of a running
`system at the time of patch application. While this is indeed our
`long-term research goal, we think its adoption will likely take a
`long time, given the long history of the traditional online patching
`model and the fact that it is an entrenched part of today’s IT prac-
`tices, ranging from software development and distribution to sys-
`tem administration. Thus, we believe that an interim solution that is
`backward compatible with existing patch format, and yet works in
`an offline manner and provides much improved scalability, would
`be desirable.
`Several technical challenges arise in developing such a scalable
`offline patching tool, as discussed below:
`Identifying Runtime Dependences: The current software in-
`dustry is centered around running systems and so are the available
`patching solutions. A running system provides a convenient en-
`vironment to execute the installation scripts in the patch. The in-
`stallation scripts query the configuration of the running system to
`customize the patch appropriately for the system. Some scripts also
`
`restart the patched software at the end of the patching process to en-
`sure its effect takes place. Some patches require running daemons.
`For example, some software stores configuration data in a database.
`A patch that changes the configuration requires the database server
`to be running in order to perform schema updates.
`The challenge is to separate runtime dependences that can be
`safely emulated (such as information discovery that only depends
`on the file system state) or removed (such as restarting the patched
`software) from the ones that cannot (such as starting a database
`server to do schema updates). We address this challenge by a
`combination of manual inspection of commands commonly used
`in scripts (performed only once before any offline patching) and
`static analysis of the scripts.
Removing Runtime Dependences: Once we identify runtime
`dependences that can be safely emulated or removed, the next chal-
`lenge is to safely remove these dependences so that the patch can be
`applied to a VM image offline and in a manner that does not break
`backward compatibility. Our solution uses a script rewriting ap-
`proach that preserves the patch format and allows a patch intended
`for an online system to be applied safely offline in an emulated
`environment.
`Patching at a Massive Scale: As the adoption of virtualization
`and cloud computing accelerates, it is a matter of time before a
`cloud administrator is confronted with a collection of thousands,
`if not millions of VM images. Just moving from online to offline
`patching is not sufficient to scale to image libraries of that magni-
`tude. We address this challenge by leveraging Mirage’s capabilities
`in efficient storage and manipulation of VM images [24].
`
`3. APPROACH
It seems plausible that patching VM images offline would work,
given that the goal of patching is mainly to replace old software
components, represented as files in the file system, with new ones.
Indeed, to patch an offline VM image, we only care about the
changes made to the file system in the VM image; many changes
intended for a running system do not contribute to the VM image
directly.
`Simple Emulation-based Patching: One straightforward ap-
`proach is to perform the file replacement actions from another host,
`referred to as the patching host. The patching host can mount and
`access an offline VM image as a part of its own file system. Using
`the chroot system call to change the root file system to the mount
`point, the patching host can emulate an environment required by the
`patching process on a running VM and perform the file system ac-
`tions originally developed for patching a running VM. We call this
`approach simple emulation-based patching and the environment set
`up by the above procedure the emulated environment.
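A minimal sketch of the emulated environment follows. The mount and chroot steps appear as comments because they require root privileges; the path-translation helper is our own illustration of the core idea, not part of any real tool.

```python
import os

# In the real workflow, the patching host would do roughly:
#   mount -o loop image.img /mnt/image      (expose the image's filesystem)
#   chroot /mnt/image dpkg -i patch.deb     (run the installer against it)
# The helper below captures the essential idea: every absolute path the
# patch touches is resolved against the image's mount point, not the host.

def resolve_in_image(mount_point, path):
    """Translate an absolute guest path into a host path under the
    mounted image, refusing attempts to escape the image root via '..'."""
    joined = os.path.normpath(os.path.join(mount_point, path.lstrip("/")))
    root = os.path.normpath(mount_point)
    if not (joined == root or joined.startswith(root + os.sep)):
        raise ValueError("path escapes image root: " + path)
    return joined

p = resolve_in_image("/mnt/image", "/etc/init.d/dbus")
```

The escape check matters in practice: a hook script that manipulates paths could otherwise read or write host files instead of image files.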
`Failures and Observations: Unfortunately, our investigation
`shows that the installation scripts used by the patching process pose
`a great challenge to simple emulation-based patching. For example,
`Figure 2 shows two segments of code from dbus.postinst,
`the post-installation script in the dbus package. The first segment
`(lines 1 to 7) detects possibly running dbus processes and sends a
`reboot notification to the system if there exists one. The second seg-
`ment (lines 9 to 16) restarts the patched dbus daemon so that the
`system begins to use the updated software. Both segments depend
on a running VM to work correctly. Simple emulation-based
patching therefore fails when it encounters this script.
`We looked into the internals of patching scripts. After analyz-
`ing patching scripts in more than one thousand patching instances,
`we made some important observations. First, most commands used
`in the patching scripts are safe to execute in the emulated envi-
`ronment, in the sense that they do not generate undesirable side
`
`
`
`
1  if [ "$1" = "configure" ]; then
2    if [ -e /var/run/dbus/pid ] &&
3       ps -p $(cat /var/run/dbus/pid); then
4      /usr/share/update-notifier/notify-reboot-required
5      ...
6    fi
7  fi
8  ...
9  if [ -x "/etc/init.d/dbus" ]; then
10   update-rc.d dbus start 12 2 3 4 5 . stop 88 1 .
11   if [ -x "`which invoke-rc.d`" ]; then
12     invoke-rc.d dbus start
13   else
14     /etc/init.d/dbus start
15   fi
16 fi

Figure 2: Excerpts of the dbus.postinst script
`
`effects on the persistent file system that would make the patched
`VM image different from one patched online except for log files
`and timestamps. Examples of such commands include the test
`commands in lines 2, 9 and 11, cat in line 3, /usr/share/
`update-notifier/notify-reboot-required in line 4,
`update-rc.d in line 10, and which in line 11. Second, some
`command executions have no impact on the offline patching and
`thus can be skipped. For example, invoke-rc.d in line 12 of
`Figure 2 is supposed to start up a running daemon, and its execu-
`tion has no impact on the persistent file system. Thus, we can just
skip it. We call such code unnecessary code. Third, there is usually
more than one way to achieve the same purpose. Thus, it is
`possible to replace an unsafe command with a safe one to achieve
`the same effect. For example, many scripts use uname -m to get
`the machine architecture; unfortunately, uname -m returns the ar-
`chitecture of the patching host, which is not necessarily the archi-
`tecture for which the VM image is intended. We can achieve the
`same purpose by looking at the file system data, for example, the
`architecture information in the ELF header of a binary file.
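For instance, the target architecture can be read from the header bytes of any ELF binary in the image. The sketch below decodes the 16-bit e_machine field; the machine-code table shows only a few common values and is illustrative rather than exhaustive.

```python
import struct

ELF_MACHINES = {0x03: "i386", 0x3e: "x86_64", 0x28: "arm"}  # small subset

def elf_architecture(header):
    """Read the architecture of an ELF binary from its header bytes:
    magic at offset 0, byte order at offset 5, e_machine at offset 18."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    endian = "<" if header[5] == 1 else ">"   # EI_DATA: 1 = little-endian
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return ELF_MACHINES.get(machine, "unknown(0x%x)" % machine)

# First 20 bytes of a little-endian x86-64 ELF header (synthetic):
# magic, class/data/version/osabi, 8 padding bytes, e_type, e_machine.
fake = (b"\x7fELF" + bytes([2, 1, 1, 0]) + b"\x00" * 8
        + b"\x02\x00" + b"\x3e\x00")
arch = elf_architecture(fake)
```

Unlike uname -m, this consults only filesystem data, so it reports the architecture the image was built for rather than that of the patching host.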
`Safety Analysis and Script Rewriting: Motivated by the above
`observations, in this paper, we propose a systematic approach that
`combines safety analysis and script rewriting techniques to address
`the challenge posed by scripts. The safety analysis examines whether
`it is safe to execute a script in the emulated environment, while the
rewriting techniques modify unsafe scripts to either eliminate
unsafe and unnecessary code or replace unsafe code with safe code
`that achieves the same purpose. Our experience in this research in-
`dicates that the majority of unsafe scripts can be rewritten into safe
`ones, and thus enable patches to be applied to offline VM images
`in the emulated environment.
`However, not all scripts can be handled successfully in this way.
We find that some patching instances remain unsafe in the emulated
environment even after safety analysis and rewriting. Some patches
`have requirements that can only be handled in a running environ-
`ment. For example, the post-installation script in a patch for MySQL
`may need to start a transaction to update the administrative tables of
`the patched server. As another example, mono, the open source im-
`plementation of C# and the Common Language Runtime, depends
`on a running environment to apply the update to itself.
`The Nüwa Approach: To address this problem, we adopt a
`hybrid approach in the development of Nüwa. When presented
`with a patch, Nüwa first performs safety analysis on the patching
`scripts included in the original patch. If all scripts are safe, Nüwa
`uses simple emulation-based patching directly to perform offline
`patching. If some scripts are unsafe, Nüwa applies various rewrit-
`ing techniques, which will be discussed in detail in Section 4, to
`
`these scripts, and performs safety analysis on the rewritten scripts.
`If these rewriting techniques can successfully convert the unsafe
`scripts to safe ones, Nüwa will use simple emulation-based patch-
`ing with the rewritten patch to finish offline patching. However,
`in the worst case, Nüwa may fail to derive safe scripts through
`rewriting, and will resort to online patching. In reality, we have
`found such cases to be rare – our results show that less than 1% of
`the packages tested in our experiments fall into this category (Sec-
`tion 6.1).
`In addition to patching individual VM images, Nüwa also lever-
`ages VM image manipulation technologies to further improve scal-
`ability. In particular, Nüwa uses features of the Mirage image li-
`brary [24] to enable scalable patching of a large number of VM
`images in batch.
`To distinguish between the two variations of Nüwa, we refer to
the former as standalone Nüwa and the latter, which leverages
Mirage, as Mirage-based Nüwa. In the following, we describe the
`novel techniques developed for offline patching in the context of
`both standalone and Mirage-based Nüwa.
`
`4. SCRIPT ANALYSIS AND REWRITING
`This section explains how safe patch scripts are identified and,
`when possible, unsafe scripts are transformed into safe scripts. The
`analysis is based on three concepts — impact, dependence, and
`command classification, which are defined in Section 4.1. Sec-
`tion 4.2 presents rewriting techniques that, using information from
`safety analyses, convert many unsafe scripts into safe scripts.
`In our implementation, safety analysis and script-rewriting run
immediately before the package manager (i.e., dpkg or rpm) ex-
`ecutes a patch script. As a result, analyses and transformations have
`access to the script’s actual environment and arguments and to the
`image’s filesystem state.
`Patch scripts are in general shell scripts. For example, patch
`scripts in Debian are SUSv3 Shell Command Language scripts [17]
`with three additional features mandated by the Debian Policy Man-
`ual [6]. Patch scripts are executed by an interpreter that repeatedly
`reads a command line, expands it according to a number of expan-
`sion and quoting rules into a command and arguments, executes the
`command on the arguments, and collects the execution’s output and
`exit status. The language is very dynamic (for example, command-
`lines are constructed and parsed dynamically), which forces our
`analyses and transformations to be conservative. Nonetheless, sim-
`ple, syntax-directed analyses and rewritings suffice to convert un-
`safe scripts to safe versions for 99% of the packages we considered.
`4.1
`Impact, Dependence, and Command Clas-
`sification
`The goal of command classification is to divide a script’s com-
`mand lines into three categories: (1) safe to execute offline, (2) un-
`safe to execute offline, and (3) unnecessary to execute offline. To
`classify command lines, we divide a running system into a “mem-
`ory” part and a “filesystem” part, and determine which parts may
`influence or be influenced by a given command line. The intuition
`is that the “filesystem” part is available offline but the “memory”
`part requires a running instance of the image that is being patched.
Table 1: Commands w/ FS-only impacts

  Command Type                Example Commands
  File attribute mod.         chown, chmod, chgrp, touch
  Explicit file content mod.  cp, mv, mknod, mktemp
  Implicit file content mod.  adduser, addgrp, remove-shell
`
`We say that a command-line execution depends on the filesys-
`tem if it reads data from the filesystem or if any of its arguments
`or inputs flow from executions that depend on the filesystem. An
`
`
`
`
`execution impacts the filesystem if it writes data to the filesystem or
`if its output or exit status flow to executions that impact the filesys-
`tem. Table 1 lists some commands whose executions impact the
filesystem.
`We say that a command-line execution depends on memory if it
`inspects any of a number of volatile components of the system’s
`state (perhaps by listing running processes, opening a device, con-
`necting to a daemon or network service, or reading a file under
`/proc that exposes kernel state) or any of its arguments or inputs
`flow from executions that depend on memory. An execution im-
`pacts memory if it makes a change to a volatile component of the
`system’s state that outlives the execution itself, or if its output or
`exit status flow to executions that impact the memory.
`Note that all executions have transient effects on volatile state:
`they allocate memory, create processes, cause the operating system
`to buffer filesystem data, and so forth. For the purposes of classifi-
`cation, we do not consider these effects to be impacts on memory;
`we assume that other command-line executions do not depend on
`these sorts of effects. Table 2 lists some commands that impact or
`depend on memory.
`
Table 2: Commands w/ memory impact/dependence

  Command Type          Example Commands
  Daemon start/stop     invoke-rc.d, /etc/init.d/
  Process status        ps, pidof, pgrep, lsof, kill
  System info. inquiry  uname, lspci, laptop-detect
  Kernel module         lsmod, modprobe
  Others                Database update, mono gac-install
`
`The definitions for command-line executions are extended to def-
`initions for static command lines. A command line depends on
`memory (or the filesystem) if any of its executions depend on mem-
`ory (or the filesystem). A command line impacts memory (or the
`filesystem) if any of its executions impact memory (or the filesys-
`tem).
`To seed impact and dependence analysis, we manually inspected
`all commands used in patch scripts to determine their intrinsic mem-
`ory and filesystem impacts and dependences. This might seem to
`be an overwhelming task but, in practice, scripts use very few dis-
`tinct commands; we found only about 200 distinct commands used
`by more than 1,000 packages. It may be possible to derive this in-
`formation by instrumenting command executions. In practice, we
`expect that it would be provided by package maintainers.
`
Table 3: Command classification

  Depend   Depend     Impact     Impact   Safety
  on FS    on Memory  on Memory  on FS
  Yes/No   No         No         Yes/No   Safe
  Yes/No   No         Yes        Yes      Unsafe
  Yes/No   Yes        No         Yes      Unsafe
  Yes/No   Yes        Yes        Yes      Unsafe
  Yes/No   No         Yes        No       Unnecessary
  Yes/No   Yes        No         No       Unnecessary
  Yes/No   Yes        Yes        No       Unnecessary
`
`Our analysis concludes that a static command-line depends on
`memory if one of the following holds: (1) The command is un-
`known; (2) the command has an intrinsic memory dependence; (3)
`one or more of the arguments is a variable substitution; (4) the input
`is piped from a command that depends on memory; or (5) the input
`is redirected from a device, a file under /proc, or from a variable
`substitution.
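These rules translate almost directly into code. The sketch below applies them to a crudely tokenized command line; the intrinsic-dependence set is a small stand-in for the manually built table described above, not the real Nüwa data.

```python
# Stand-in for the manually curated table of intrinsic dependences
# (illustrative subset only).
INTRINSIC_MEM_DEP = {"ps", "pidof", "uname", "lsmod", "invoke-rc.d"}
KNOWN_COMMANDS = INTRINSIC_MEM_DEP | {"cp", "mv", "chmod", "cat", "echo"}

def depends_on_memory(tokens, piped_from_memory=False):
    """Conservative 'may depend on memory' check mirroring rules (1)-(5):
    unknown command, intrinsic memory dependence, variable substitution
    in an argument, input piped from a memory-dependent command, or
    input redirected from a device, /proc, or a variable substitution."""
    cmd, args = tokens[0], tokens[1:]
    if cmd not in KNOWN_COMMANDS:                      # rule (1)
        return True
    if cmd in INTRINSIC_MEM_DEP:                       # rule (2)
        return True
    if any("$" in arg for arg in args):                # rules (3) and (5)
        return True
    if piped_from_memory:                              # rule (4)
        return True
    for i, arg in enumerate(args):                     # rule (5)
        if arg == "<" and i + 1 < len(args):
            src = args[i + 1]
            if src.startswith(("/dev/", "/proc/")):
                return True
    return False
```

Note how the rules err on the safe side: any variable substitution taints the command line, even if the variable happens to hold filesystem-derived data.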
`The rules for filesystem dependences and for impacts are simi-
`lar. Note that the analysis errs on the side of finding spurious de-
`pendences and impacts. That is, these analyses are simple “may-
depend/may-impact” analyses, which are both flow- and context-
insensitive.
`
`Table 3 shows how each command line’s classification as safe,
`unsafe, or unnecessary is determined from its filesystem and mem-
`ory impacts and dependences. Safe command lines do not de-
`pend on or impact memory. These are the commands that can and
`should be executed offline. Script rewriting preserves these com-
`mands. Unnecessary command lines have no impact on the filesys-
`tem. There is no reason to execute them offline because they do
`not change the image. In fact, if they depend on or impact memory,
`then they must be removed because they might fail without a run-
`ning instance. Script rewriting removes these commands. Unsafe
`command lines may execute incorrectly offline because they de-
`pend on or impact memory and also impact the filesystem. In some
`cases, script rewriting cannot remove these command lines because
`their filesystem impacts are required. If any unsafe command line
`cannot be removed, then the patch cannot be executed offline.
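This three-way classification is a pure function of the impact and dependence bits, so it can be written down directly. The sketch below mirrors Table 3; the function name and encoding are our own.

```python
def classify(dep_mem, imp_mem, imp_fs):
    """Three-way classification of a command line, mirroring Table 3.
    Filesystem *dependence* never changes the outcome, so it is omitted."""
    if not dep_mem and not imp_mem:
        return "safe"          # no memory involvement: execute offline
    if imp_fs:
        return "unsafe"        # memory-entangled, but changes the image
    return "unnecessary"       # memory-only effect: skip offline

verdict = classify(dep_mem=True, imp_mem=False, imp_fs=False)
```

For example, a command that only queries running processes (memory dependence, no filesystem impact) comes out "unnecessary" and is simply removed by rewriting.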
`
Figure 3: Flow of script analysis and rewriting

  /etc/init.d/acpid
  /etc/init.d/cupsys
  killall

Figure 4: Examples of command lines that are removed by
unnecessary command elimination
`
`4.2 Rewriting Techniques
`Figure 3 shows the rewriting techniques that Nüwa applies be-
`fore executing each patch script. Rewriting a script can change the
`results of safety analysis, so Nüwa reruns safety analysis after ap-
`plying these techniques. If safety analysis proves that all command
`lines in the script are safe, then the rewritten script is executed of-
`fline. Otherwise, Nüwa resorts to online patching.
`Nüwa currently applies five rewriting techniques, which are de-
`scribed below. For clarity, the presentation does not follow the
`order in which the techniques are applied (that order is shown in
`Figure 3). The first two techniques consider command-lines, anno-
`tated by safety analysis, in isolation; the last three analyze larger
`scopes.
Unnecessary Command Elimination: This technique removes
unnecessary commands, which, by definition, have neither direct
nor indirect impact on the filesystem. Figure 4 shows an example.
Command Replacement: Some command lines that depend on
memory can be replaced with command lines that depend only on
the filesystem. This often happens with commands that need
information about the system, in particular when the information is
available both in the filesystem and, if there is a running instance,
in memory.

  uname -m  ->  dpkg --print-architecture
  uname -s  ->  echo "Linux"

Figure 5: Memory-dependent command lines and their replacements
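A sketch of the replacement step: a small table maps memory-dependent command lines to filesystem-only equivalents, as in Figure 5. The rewrite function and its whole-line matching are our own simplification of the technique; a real rewriter must also handle command lines embedded in substitutions and pipelines.

```python
# Replacements as in Figure 5: each memory-dependent command line maps
# to one that consults only the filesystem (or a known constant).
REPLACEMENTS = {
    "uname -m": "dpkg --print-architecture",
    "uname -s": 'echo "Linux"',
}

def replace_commands(script):
    """Rewrite each line of a patch script, substituting safe
    equivalents for known memory-dependent command lines."""
    out = []
    for line in script.splitlines():
        stripped = line.strip()
        out.append(line.replace(stripped, REPLACEMENTS[stripped])
                   if stripped in REPLACEMENTS else line)
    return "\n".join(out)

script = 'uname -m\n  uname -s\nchmod 755 /usr/bin/foo'
rewritten = replace_commands(script)
```

After this pass, the rewritten script is fed back into safety analysis; it is accepted for offline execution only if no unsafe command lines remain.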
`
`
`
`
`For example, the uname command prints system information;
`depending on its arguments, it will print the hostname, the machine
`hardware name, the operating system name, or other fields. uname
`gets its information from the kernel through the uname system call.
`Without a running instance, information from the kernel cannot be
`trusted. However, certain fields are statically known constants or
`available through commands that depend only on the filesystem;
`Figure 5 shows two examples.
`Note that the command replacement tech