`
`
`
`Application Performance Management in
`Virtualized Server Environments
`
`
`
`
`
Gunjan Khanna*
Dept. of Electrical and Computer Engineering,
Purdue University, West Lafayette, IN, USA
gkhanna@purdue.edu

Kirk Beaty, Gautam Kar, Andrzej Kochut
IBM T.J. Watson Research Center,
Hawthorne, NY, USA
(kirkbeaty, gkar, akochut)@us.ibm.com

*The work was done while the author was a summer intern at the IBM T. J. Watson Research Center.
`
`
`Abstract — As businesses have grown, so has the need to
`deploy I/T applications rapidly to support the expanding
`business processes. Often, this growth was achieved in an
`unplanned way: each time a new application was needed a new
`server along with the application software was deployed and
`new storage elements were purchased. In many cases this has
`led to what is often referred to as “server sprawl”, resulting in
`low server utilization and high system management costs. An
`architectural approach that is becoming increasingly popular
`to address this problem is known as server virtualization. In
`this paper we introduce the concept of server consolidation
`using virtualization and point out associated issues that arise
`in the area of application performance. We show how some of
`these problems can be solved by monitoring key performance
`metrics and using the data to trigger migration of Virtual
`Machines within physical servers. The algorithms we present
`attempt to minimize the cost of migration and maintain
`acceptable application performance levels.
`
`Index Terms: Application performance management, virtual
`machine migration, virtual server.
`
`
`I. INTRODUCTION
`
`The guiding principles of distributed computing and
`client-server architectures have shaped the typical I/T
`environment for the last three decades. Many vendors have
`thrived by selling elements of a distributed infrastructure,
`such as servers, desktops, storage and networking elements
`and, of course, distributed applications, such as email,
`CRM, etc.
`As businesses have grown, so has the need to deploy I/T
`applications rapidly to support the expanding business
`processes. Often, this growth was achieved in an unplanned
`way: each time a new application was needed a new server
`along with the application software was deployed and new
`storage elements were purchased. In many cases this has
led to what is often referred to as “server and storage sprawl”, i.e., many underutilized servers with heterogeneous storage elements. A critical problem associated with “server sprawl” is the difficulty of managing such an environment. For example, average use of server capacity is often only 10-35%, wasting resources, and a larger staff is required to manage the large number of heterogeneous servers, thereby increasing the total cost of ownership (TCO).
`An architectural approach that is becoming increasingly
`popular to address this problem is known as virtualization.
`Virtualization occurs both at the server and the storage
`levels and several papers have recently been published on
`this topic [5], [7], [8], [12], [14]. Most of these publications
`deal with the design of operating system hypervisors that
`enable multiple virtual machines, often heterogeneous guest
`operating systems, to exist and operate on one physical
`server. In this paper we start with the concept of server
`level virtualization and explore how it can be used to
`address some of the typical management issues in a small to
`medium size data center.
`The first system management problem we will look at in
`this paper is in the category of configuration management
`and is called server consolidation where the goal is to
`reduce the number of servers in a data center by grouping
`together multiple applications in one server. A very
`effective way to do this is by using the concept of server
virtualization as shown in Figure 1. Each application is packaged to run on its own virtual machine; these are in turn mapped to physical machines, with storage provided by a storage area network (SAN). Details of this approach are given in Section II.
Each application in an I/T environment is usually associated with a service level agreement (SLA), which, in the simplest case, consists of response time and throughput requirements. At run time, if the SLA of an application is violated, it is often because of factors such as high CPU utilization or high memory usage on the server where it is hosted. This leads to the main issue that we address in this
`paper: how to detect and resolve application performance
`problems in a virtual server based data center. Our
`approach is based on a novel algorithm for migrating virtual
`machines (VMs) within a pool of physical machines (PMs)
`when performance problems are detected.
`In Section II, we outline a simple procedure to perform
`server consolidation using the concept of virtualization.
`Related work is presented in Section III. Section IV
`describes our algorithm (DMA) for dynamic migration of
`virtual machines and Section V presents some experimental
`results. We conclude the paper and outline our future
`research plans in Section VI.
`
`II. BACKGROUND
`
`The following process, very commonly used by I/T
`service organizations [22], provides a simple algorithm for
`server consolidation, as represented pictorially in Figure 1:
`
`
`
`
`
`
`
`
[Figure 1: A Typical Virtualized I/T Environment. Top: a heterogeneous, underutilized server environment with one application per server (Server 1 ... Server n, each application on its own OS, with utilizations between 25% and 50%). Bottom: after the consolidation process, a homogeneous server environment with virtual machines and high utilization, in which each application runs on a guest OS inside its own VM (VM 1 ... VM n) and multiple VMs share each physical server through a hypervisor.]
`
`
1) For each server to be consolidated, collect measurements that can be used to compute the average CPU usage, memory requirements, disk I/O and network bandwidth usage over a period of time (e.g., several weeks). Let us assume there are X servers.
2) Choose a target server type with compatible architecture, associated memory, access to shared disk storage and network communications.
3) Take each of the X servers, one at a time, and construct a virtual machine image of it. For instance, if server 1 is an email application on Windows, create a Windows virtual machine (e.g., using VMWare [5], [8]). The resource requirements of the virtual machine will be approximately the same as those of the original server being virtualized. At the end of this step we will have X virtual machines.
4) Map the first virtual machine to the first server of the type selected in step 2. Map the second virtual machine to the same server if it can accommodate the resource requirements; if not, introduce a new physical machine (PM) and map the VM to it. Continue until each of the VMs has been mapped to a PM, introducing a new PM when required.
5) The set of PMs at the end of step 4, each with possibly multiple associated VMs, comprises the new, consolidated server farm. Readers will recognize this process as static bin packing, which yields a sub-optimal mapping of VMs to physical servers; a sketch of the first-fit mapping of steps 3 and 4 is given below.
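As an illustration, the first-fit mapping of steps 3 and 4 can be sketched as follows. This is a minimal sketch under simplifying assumptions of our own (each VM reduced to a single scalar demand, identical target servers of unit capacity); the name first_fit_consolidate is illustrative, not from any product.

    # First-fit consolidation sketch (steps 3 and 4): one scalar resource
    # demand per VM and identical target-server capacity (assumptions ours).
    def first_fit_consolidate(vm_demands, capacity=1.0):
        """Map each VM index to a PM index, opening new PMs as needed."""
        residuals = []   # residual capacity of each PM opened so far
        placement = {}   # VM index -> PM index
        for vm, demand in enumerate(vm_demands):
            for pm, residual in enumerate(residuals):
                if residual >= demand:      # first PM that can host this VM
                    residuals[pm] -= demand
                    placement[vm] = pm
                    break
            else:                           # no existing PM fits: add a new PM
                residuals.append(capacity - demand)
                placement[vm] = len(residuals) - 1
        return placement, len(residuals)

    # Utilizations taken from Figure 1 (30%, 40%, 25%, 30%, 35%, 28%, 50%):
    placement, pms = first_fit_consolidate([.30, .40, .25, .30, .35, .28, .50])
    print(placement, pms)   # the seven applications fit on 3 PMs

Like any first-fit packing, the result depends on the order in which the VMs are considered, which is one reason the resulting mapping is sub-optimal.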
`
`
In this paper we start with such a mapping of VMs to physical servers and propose techniques that provide two system management functions:
a) Observe the key performance metrics of the operational virtualized systems (VMs) and, as necessary (because of increased workload) or periodically (to optimize the consolidation when demand is less), move the virtual machines to maximize performance while using the minimum number of physical servers.
b) If any of the VMs reports an SLA violation (e.g., high response time), perform dynamic re-allocation of the VM(s) from the hosting PM to another physical machine such that the SLA is restored.
`
`III. RELATED WORK
`
`Related work is considered in two parts: the first looks at
`schemes for virtualization and server management, the
`second at the relevant algorithmic approaches of packing
`objects into bins, i.e., bin packing.
Virtualization has provided a new dimension for today’s systems. Classic virtual machine managers (VMMs) like VM/370 [20] have existed for quite some time, but VMWare [8] and dynamic logical partitioning (DLPARs) [19] have provided an impetus for using virtual machines in new ways. There have also been efforts to build open source virtual machine managers like Xen [6], which provides an open source framework for building custom solutions. In addition, virtualization has opened new avenues for research, such as Trusted Virtual Domains [23] and Grid computing using VMs [12]. Virtualization at the application level is well addressed by products from Meiosys [18].
The VMWare ESX Server (hypervisor) runs on the bare hardware and provides the ability to create VMs and move them from one PM to another using VMotion [8]; it requires the PMs to have shared storage such as SAN or NAS. The cited paper [5] provides an overview of the memory management scheme employed in the ESX Server. VMWare also offers the Virtual Center, which provides a management interface to the virtual farm. Although some metrics are provided by the
`
`
`
`
`
`
`
Virtual Center, they are not fine-grained and require extensive human interaction to be used for management. Resource management of these virtual machines still needs to be addressed in a cost-effective manner, whether in a virtual server farm or in a grid computing environment, as pointed out in [12]. We provide a solution to this problem through a dynamic management infrastructure that manages the virtual machines while adhering to the SLA. A wide variety of approaches exist in the literature to perform load management, but most of them concentrate on how to re-direct incoming customer requests to provide a balanced workload distribution [1] (identical to a front-end sprayer which selectively directs incoming requests to servers). An example is WebSphere XD [4], which uses a dynamic algorithm to manage workload and allocate customer requests. The ODR component of XD is responsible for load management and for efficiently directing customer requests to back-end replicas, similar to a sprayer. The use of replication to tackle high workload and provide fair allocation with fault tolerance is common practice, but for a virtual farm where each server runs a different application it is not applicable.
The problem of allocating virtual machines to physical machines falls in the category of vector-packing problems in theoretical computer science (see the survey [16]). It is known that finding optimal solutions to vector-packing (or its supersets, the bin-packing and class-constrained packing problems) is NP-hard. Several authors, including [9] and [11], have proposed polynomial-time approximation schemes (PTAS) for these problems with a low approximation ratio. The cited paper [10] gives an algorithm for a restricted job allocation problem with a minimum migration constraint, but that problem does not allow multiple jobs to be assigned to a single machine; it is similar to the sprayer approach, where a system sits at the front end and decides where to forward incoming requests. These approaches also assume that the sizes of the vectors and bins are fixed, i.e., deterministic values are considered. In a virtual server environment a VM’s utilization may change, making static allocation techniques unfit and instead requiring accurate modeling and dynamic re-allocation. In order to model the changing workload precisely, the authors in [13] propose stochastic load balancing, in which probabilistic bounds on the resources are provided. Stochastic and deterministic packing solutions have largely looked at a static initial allocation that is close to the optimal.
It is, however, significantly more challenging to design a dynamic re-allocation mechanism that performs allocation of VMs (vectors) at discrete time steps, making the system self-adjusting to workload without violating the SLA. It is also important to note that minimizing migrations among bins (VMs to PMs in our case) during re-allocation is still an open research problem. Specifically, in this paper we address the issue of dynamic re-allocation of VMs while minimizing the migration cost, where cost is defined in terms of metrics such as CPU and memory usage.
`
`IV. PROBLEM FORMULATION
`
`We start with the environment described in the previous
`section, namely, an I/T environment consisting of a set of
`
`physical servers, each of which hosts one or more virtual
machines (VMs). In the interest of simplicity of presentation, we assume that the physical server environment is homogeneous. Heterogeneous environments, where PMs may have different resource capacities (e.g., CPU, memory), can be handled by appropriate scaling of the migration cost matrix; scaling is a widely used approach (see [11], [16]) that simplifies the problem formulation without changing the proposed solution (or algorithm). Typically each virtual machine implements one customer application. Due to workload changes, the resources used by the VMs (CPU, memory, disk and network I/O) will vary, possibly leading to SLA violations. The objective of our research is to design algorithms that resolve SLA violations by reallocating VMs to PMs as needed. Metrics representing CPU and memory utilization, disk usage, etc. are collected from both the VMs and the PMs hosting them, using standard resource monitoring modules. Thus, from a resource usage viewpoint, each VM can be represented as a d-dimensional vector in which each dimension represents one of the monitored resources. We model the resource utilization of a virtual machine VMi as a random process represented by a d-dimensional utilization vector Ui(t) at time t. For a physical machine PMk the combined system utilization is represented by Lk(t). Each physical machine, say PMj, has a fixed capacity Cj in d-dimensional space. Assume that there are a total of n VMs residing on m physical machines. The number of PMs (m) may change dynamically, as the algorithm proceeds, to meet increasing workload requirements. At the initial state (t=0) the system starts with some predefined allocation (obtained through server consolidation as outlined in Section II). As the states of the VMs change (due to changes in utilization), utilization can exceed the thresholds of the pre-defined allocation, leading to possible SLA violations. We propose dynamic re-allocation of these stochastic vectors (Ui(t)) over the PMs to meet the required SLA. The dynamic algorithm runs at discrete time instances t0, t1, ..., tk to perform re-allocation when triggered by a resource threshold violation alert. In our model we assume a mapping of SLA to system resource utilization; hence thresholds are placed on utilization, and exceeding them triggers the re-allocation procedure. Below, we explain the nature of the inputs to the algorithm and the objective function that we attempt to optimize.
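As a concrete rendering of this formulation, the sketch below models VMs as d-dimensional utilization vectors and PMs as capacity vectors with per-dimension knee thresholds. The class names and the vector-sum approximation of Lk(t) are our illustrative assumptions (the next paragraph notes where a plain vector sum falls short).

    # Minimal data model for the formulation above (illustrative only).
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class VM:
        name: str
        util: List[float]          # Ui(t): one entry per monitored resource

    @dataclass
    class PM:
        name: str
        capacity: List[float]      # Cj, fixed d-dimensional capacity
        knees: List[float]         # per-dimension thresholds ni
        vms: List[VM] = field(default_factory=list)

        def load(self) -> List[float]:
            """Lk(t), approximated here as a plain vector sum of VM utilizations."""
            return [sum(vm.util[d] for vm in self.vms)
                    for d in range(len(self.capacity))]

        def violates_threshold(self) -> bool:
            """True if any dimension's load exceeds its knee (triggers re-allocation)."""
            return any(l > n for l, n in zip(self.load(), self.knees))

    pm = PM("PM1", capacity=[1.0, 1.0], knees=[0.7, 0.8],
            vms=[VM("VM1", [0.40, 0.30]), VM("VM2", [0.35, 0.30])])
    print(pm.load(), pm.violates_threshold())   # [0.75, 0.6] -> True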
The input includes a function which maps the individual resource utilizations to the combined utilization of the physical machine, i.e., Lk(t) = f(U1(t), U2(t), ...) for all VMs located on machine PMk. The combined utilization is usually considered a vector sum in the traditional vector-packing literature, but this is not generally true for several shared system resources, like SAN and CPU, because of the overhead associated with resource sharing among VMs. The latency of SAN access grows non-linearly with respect to the applied load. If we look at the average response time Ravg(t) for all the VMs on the same PM, it grows non-linearly as a function of the load on the physical machine (Figure 2).

[Figure 2: Response time vs. applied load on a system dimension.]

Let VMj’s resource utilization at time t be denoted by the vector:
Uj(t) = [uj1(t), uj2(t), ..., ujd(t)]
We assume that a VM’s resource utilization for a specific resource is equivalent to the fraction of that resource used by this VM on the associated PM. If A denotes the set of VMs allocated to a physical machine PMk, then the load on PMk in the ith dimension (i.e., the ith resource) is given by:

Li(t) = Σj∈A uji(t)

In general, it is hard to relate an SLA parameter, e.g., the response time of a customer application, quantitatively to the utilization of the ith resource. Equation (1) approximates the non-linear behavior of the response time Ravg(t) as it relates to the load in the ith dimension on the physical machine. Here ni is the knee of the system, beyond which the response time rises exponentially and approaches infinity asymptotically, and k is a tuning parameter to adjust the initial slope of the graph. The authors in [1] use a similar function for the customer utility associated with a given allocation of resources to a specific customer; their function yields a linear increase below the knee. In real systems, in order to model the graph depicted in Figure 2, we need an asymptotic increase as utilization moves close to 1, which is yielded by equation (1):

Ravg(t) = [ (Li(t) − ni) + sqrt( (Li(t) − ni)^2 + k ) ] / ( 2 (1 − Li(t)) )        (1)
`
`
`
`
`
`
`
`
Equation (1) is a hyperbola which closely approximates the graph in Figure 2. One can obtain similar equations for multiple dimensions. To meet the SLA requirements in terms of response time, a system should (preferably) operate below the knee. The knee can occur at a different point in each resource dimension, so the set of knees can be represented as a vector. This serves as a threshold vector for triggering incremental re-allocation to lower the utilizations. The utilizations are constantly monitored, and the algorithm ensures, through dynamic re-allocation of VMs to physical servers, that they stay below the threshold (knee). Since the Li(t) are modeled as random processes, checking for a threshold violation is done as a probabilistic guarantee P(Li(t) < ni) > ξ, which means that with probability ξ the utilization remains below ni; this forms a constraint.
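A small numeric sketch of this model is given below. Note that it relies on our reconstruction of equation (1) from a garbled original, so the exact functional form should be treated as an assumption; knee and k play the roles of ni and the slope-tuning parameter.

    import math

    def r_avg(load, knee, k=0.01):
        """Reconstructed equation (1): nearly flat below the knee, rising
        sharply beyond it and diverging as the load approaches 1."""
        assert 0.0 <= load < 1.0
        hinge = (load - knee) + math.sqrt((load - knee) ** 2 + k)
        return hinge / (2.0 * (1.0 - load))

    for load in (0.2, 0.5, 0.7, 0.9, 0.99):
        print(f"load={load:.2f}  Ravg={r_avg(load, knee=0.7):.3f}")

The probabilistic guarantee P(Li(t) < ni) > ξ can likewise be checked empirically, e.g., as the fraction of recent utilization samples that fall below the knee.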
`
The re-allocation procedure must consider the costs associated with performing the migration of VMs. These VMs are logical servers and may be serving real-time requests; therefore, any delay resulting from the migration needs to be considered as a cost. Use of a cost function also helps in designing an algorithm which is stable and does not cause frequent migration of machines. Let the cost of migrating one unit vector in d dimensions be denoted by the row vector Mc, which consists of migration cost coefficients for each dimension. These cost coefficients depend on the implementation of the virtual server migration; in this model we assume that the coefficients of Mc remain the same for all migrations. Thus the cost of migrating VMj is given by Mc · Uj(t). The cost of bringing in a new PM during migration is denoted by NBc, which is assumed to be orders of magnitude larger than the migration cost Mc because introducing a new machine incurs hardware, software, and provisioning costs. Let the matrix R(t) denote the residual capacity of the system:

R(t) = [r1(t), r2(t), r3(t), ..., rm(t)]

where ri(t) is the residual capacity vector of the ith physical machine at time t. Residual capacity for a resource, such as CPU or memory, in a given machine denotes the unused portion of that resource that could be allocated to an incoming VM. To keep the response time within acceptable bounds, it is desirable that each physical machine’s utilization be below the threshold (knee). In some cases, such as batch applications, throughput rather than response time is the more critical SLA parameter; in such situations, thresholds can be set appropriately (from Figure 2) by specifying a higher value for the acceptable response time. We would like to achieve the maximum possible utilization for a given set of machines and to avoid adding new physical machines unless necessary.
The aim is to achieve a new allocation of the VMs, given a previous allocation, which minimizes the cost of migration and provides the same throughput. System performance is monitored continuously for SLA violations; re-allocation is triggered when a violation occurs and is performed at discrete time instances t1, t2, ..., tk. The system metrics can be collected using standard monitoring tools like IBM Director [24]. Because of the costs associated with migration and with the use of new physical machines, the residual capacity of a machine should be as low as possible and migrations should be minimized, thus keeping the new solution close to the previous one. It is important to note, however, that low values of ri(t) might not be sufficient to accommodate an incoming VM during migration. Thus the goal of our algorithm is to keep the variance of the vector R(t) as high as possible. We illustrate the concept behind this principle using an example. Let each ci(t) be simply the CPU utilization (0 ≤ ci(t) ≤ 100) of the ith machine, and consider the following two vectors C(t): [40, 50, 30] and [90, 0, 30]. The latter vector has a higher variance of the residual vector ([10, 100, 70]), with fewer machines having high utilization. Thus, this state is more likely to be able to accommodate a migrating VM or, in some cases, a new VM when a new customer application is introduced. Alternatively, since PM2’s resource usage, represented by the second number, is 0, it can effectively be removed. This provides us with one of the
`
`
`
`
`
`
`
`
members of the objective function that we have formulated, i.e., maximizing the variance of the residual vector.
When a resource threshold is violated and the migration algorithm is set in motion, there are three decisions it needs to make, namely:
a. Which physical machine (PM) to remove a VM from (i.e., migrate)?
b. Which VM to migrate from the chosen PM (from a)?
c. Which new PM to migrate the chosen VM (from b) to?
Since thresholds are set on the utilization of each physical machine, violation of a threshold triggers the algorithm to determine which (one or more) of the VMs on the physical machine at which the violation took place need to be migrated to another physical machine.
More formally, let Xt be an n x m allocation matrix containing allocation variables xij, equal to 1 if virtual machine i is allocated to physical machine j. Given an allocation at time t, denoted by Xt, we want to compute another allocation at time t + Δ, i.e., Xt+Δ. The migrations performed by the re-allocation are given by the migration matrix ZM (n x 1), obtained by taking the difference Xt+Δ − Xt and setting the rows with a positive difference to 1. The expected migration cost incurred by the new allocation is then the scalar value E[Mc · U(t)^T · ZM]. The problem in its most general form can be represented as follows:

Max { w1·Var(R(t)) − w2·E[Mc · U(t)^T · ZM] − w3·n·NBc }

subject to   P( Σi=1..n uki · xij < njk ) > ξ,   1 ≤ j ≤ m        (2)

where n is the number of new physical machines brought in to accommodate the migrating VMs, if necessary. For a matrix M, Var(M) is defined as the L2 norm of the variance vector obtained by computing the sample variance of each row. For example, if M1 is a sample row with values [10 10 20], then the variance of this row is 33.33. For an n x m matrix we first obtain a variance vector (say A) of size n x 1 such that element i of the vector A is the sample variance of the values in the ith row of M; the L2 norm of the vector A then gives the required scalar. Equation (2) expresses the SLA constraint, which forces a physical machine’s total utilization in each dimension, i.e., Σi=1..n uki · xij, to stay below the knee (njk). Here ξ is the probability confidence with which the utilization is below the input capacity knee njk; this equation holds for all physical machines. Thus the optimization function in this formulation consists of the costs/gains associated with each of the previously discussed metrics. The maximization function can be divided into three components, each with a configurable coefficient wi. The first sub-component reflects the gain due to Var(R(t)), i.e., how tightly packed the allocation is. The second term is the migration cost relative to the previous allocation. The last term is the cost incurred by adding n new servers. Each wi represents the amount of weight its sub-component carries in the objective function; in other words, these weights are used to normalize each component to an equal scale. They can be specified by the system administrator and fine-tuned. Weights also reflect the importance of each cost/gain function. For example, in a system where it is relatively cheap to perform migration of VMs across the physical servers and more expensive to add a new PM, w2 would be much lower than w3. If an administrator would like a fair utilization across physical machines and would not like to reclaim a physical machine when utilization wanes, then s/he can reduce the weight w1. The constraint in Equation (2) represents the amount of load which each PM can hold without going over the threshold (njk) and hence not violating the SLA.
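To make the objective concrete, the sketch below scores a candidate allocation using the three terms of the maximization function. The specific weights, the scalar migration-cost model, and the helper names are our illustrative assumptions.

    import statistics

    def var_norm(residual_rows):
        """Var(R(t)): L2 norm of the vector of per-row sample variances;
        each row holds one resource dimension's residuals across the PMs."""
        return sum(statistics.variance(row) ** 2 for row in residual_rows) ** 0.5

    def score(residual_rows, migrated_utils, new_pms,
              w=(1.0, 10.0, 1000.0), mc=1.0, nbc=1.0):
        """w1*Var(R(t)) - w2*(migration cost) - w3*n*NBc (illustrative)."""
        migration_cost = sum(mc * u for u in migrated_utils)
        return (w[0] * var_norm(residual_rows)
                - w[1] * migration_cost
                - w[2] * new_pms * nbc)

    # Residual vectors for the CPU-only example from the previous section:
    balanced = [[60, 50, 70]]    # residuals of C(t) = [40, 50, 30]
    skewed   = [[10, 100, 70]]   # residuals of C(t) = [90,  0, 30]
    print(var_norm(balanced), var_norm(skewed))   # 100.0 vs. 2100.0
    print(score(skewed, migrated_utils=[0.4], new_pms=0))

As the example shows, the skewed allocation scores higher: it concentrates load and leaves one PM reclaimable, which is exactly what the Var(R(t)) term rewards.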
A. Design Challenges
The general scenario of allocating VMs to physical machines is conceptually close to the classical vector-packing problem [11], making it NP-hard. As is evident from the problem formulation, the number of migrations needs to be minimized, and since the number of physical machines is not fixed, techniques based on relaxation to Linear Programming are not suitable: a solution involving LP would require re-solving the LP at each discrete time instance, and a new solution might require a whole new re-allocation, leading to a high migration cost. Minimizing the number of moves across physical machines (analogous to bins) is still an open research problem. The solution approach must handle dynamic inclusion and exclusion of physical machines to satisfy the constraints. Cost to the customers can then be calculated on the basis of their usage of these PMs, providing greater flexibility at the user level.
The general problem, as represented by Equation (2), is NP-hard. In a practical setting, such as the consolidated server environment introduced in Section II, we would like to implement a heuristic (PTAS) algorithm that can be executed online to address SLA problems arising from over-utilization of resources. The section below presents such an algorithm, which outlines actions that can be performed to optimize each of the components separately.
`
B. Algorithm
Assume that PM1, PM2, ..., PMm are the physical machines and VMij is the jth virtual machine on PMi. An initial allocation is already provided, and the proposed dynamic management algorithm (DMA) focuses on the dynamic part. For each VM and its host physical machine the utilizations are monitored, where utilization covers the observed metrics like CPU, memory, etc. For each physical machine PMi we maintain a list of all the virtual machines allocated to it in non-decreasing utilization order, i.e., the first virtual machine, VMi1, has the lowest utilization. Since migration cost is calculated directly from utilization, another way to look at this order is that “virtual machines are ordered according to their migration costs within each physical machine”. For each physical machine we calculate and store its residual capacity ri(t); additionally, we maintain the list of residual capacities in non-decreasing order of l2 norm (i.e., the magnitude of the vector). Without loss of generality we can represent the VMs as shown in Figure 3.

[Figure 3: The VMs on each PM are ordered with respect to their migration costs, Mc(VMi1) < Mc(VMi2) < ... ; the height of a VM represents its utilization.]

The constraints, as indicated in Equation (2), are constantly monitored and any violation of these constraints triggers the algorithm to
`
`
`
`
`
`
`
`
`
perform re-allocation. Because the resource variations are not predictable, the times at which the algorithm runs are not pre-determined. Since our problem setting dictates minimum residual space, we keep lower bounds on utilization as well: if the utilization of one or more physical machines falls below a low-water mark, the same algorithm is triggered, but for garbage collection, i.e., reclaiming the physical machine by emptying it of VMs. Assume that for physical machine PMk the constraints are violated; hence a VM from PMk must be migrated. We use the residual capacity list to decide the destination physical machine.
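The selection loop described next can be sketched as follows: the lowest-utilization (cheapest-to-migrate) VM is moved to the PM with the least residual capacity that can still hold it, repeating until enough load has been shed. Scalar utilizations, unit PM capacity, and the function name rebalance are our simplifying assumptions.

    import bisect

    def rebalance(source_vms, residuals, overload):
        """Shed `overload` utilization from a violating PM.

        source_vms: VM utilizations on that PM, ascending (migration-cost order).
        residuals:  residual capacities of the other PMs, ascending (the l2-norm
                    order collapses to plain magnitude in one dimension).
        """
        moves = []
        while overload > 0 and source_vms:
            vm = source_vms.pop(0)                   # cheapest VM to migrate
            i = bisect.bisect_left(residuals, vm)    # O(log m): smallest fit
            if i < len(residuals):
                dest = residuals.pop(i)
                bisect.insort(residuals, dest - vm)  # re-order updated residual
                moves.append((vm, "existing PM"))
            else:                                    # nothing fits: new PM
                bisect.insort(residuals, 1.0 - vm)   # assumes unit capacity
                moves.append((vm, "new PM"))
            overload -= vm
        return moves

    print(rebalance(source_vms=[0.10, 0.25, 0.40],
                    residuals=[0.15, 0.30], overload=0.20))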
We select the VM from PMk with the lowest utilization (i.e., the least migration cost) and move it to the physical machine with the least residual capacity big enough to hold this VM. The destination physical machine is chosen by searching through the ordered residual capacity list (which requires O(log m) time). After moving VMk1, the residual capacities are re-calculated and the list is re-ordered. Moving VMk1 might not yet satisfy the SLA constraints on PMk, so we repeat this process with the next-lowest-utilization VM, i.e., VMk2, until we satisfy the constraints. Since the destination PM (say PMj) is chosen only if it has enough residual capacity to hold the VM (VMk1), allocating VMk1 to PMj does not violate the SLA constraints on that machine. If the algorithm is unable to find a physical machine with residual capacity big enough to hold the VM, it instantiates a new physical machine and allocates the VM to that machine. As a pre-condition to performing the migration, we compare the sum of residual capacities with the extra utilization needed by physical machine PMk; if the residual capacities are not enough to meet the required extra need of machine PMk, then we introduce a new physical machine and re-check. This process a