`of Dynamic, Distributed Real-time Systems
`
`Lonnie R. Welch, Binoy Ravindran, Behrooz A. Shirazi and Carl Bruggeman
`Computer Science and Engineering Dept.
`The University of Texas at Arlington
`Box 19015, Arlington, TX 76019-0015
`{welch|binoy|shirazi|bruggema}@cse.uta.edu
`
`Abstract
`Time-constrained systems which operate in dynamic envi-
`ronments may have unknown worst-case scenarios, may
`have large variances in the sizes of the data and event sets
`that they process (and thus, have large variances in exe-
`cution latencies and resource requirements), and may not
`be statically characterizable, even by time-invariant sta-
`tistical distributions. This paper presents a specification
`language for describing environment-dependent features.
`Also presented is an abstract model that is constructed
`(statically) from the specifications, and is augmented (dy-
`namically) with the state of environment-dependent fea-
`tures. The model is used to define techniques for QoS
`(quality-of-service) monitoring, QoS diagnosis, and re-
`source allocation analysis. Experimental results show the
`effectiveness of the approach for specification of real-time
`QoS, detection and diagnosis of QoS failures, and restora-
`tion of acceptable QoS via reallocation of distributed
`computer and network resources.
`
`1. Introduction
`
`Many difficulties confront those who must engineer the
`emerging generation of real-time control systems. Such
`systems have rigorous Quality of Service (QoS) objectives.
`They must behave in a dependable manner, must respond
`to threats in a timely fashion and must provide continuous
`availability, even within hazardous and unknown envi-
`ronments. Furthermore, resources should be utilized in an
`efficient manner, and scalability must be provided to ad-
`dress the ever-increasing complexity of scenarios that con-
`front such systems. The difficulties in engineering such
`systems arise from several phenomena, one of the most
`perplexing being the dynamic environments in which they
`must function. Systems which operate in dynamic envi-
`ronments may have unknown worst-case scenarios, may
`have large variances in the sizes of the data and event sets
`that they process (and thus, have large variances in execu-
`tion latencies and resource requirements), and cannot be
`
`characterized (accurately) by constants, by intervals or
`even by time-invariant statistical distributions.
`This paper presents techniques for specification and
`modeling of real-time Quality-of-Service for distributed
`systems that operate in dynamic environments. The tech-
`niques are based on a programming-language-independent
`meta-language for describing real-time QoS in terms of
`end-to-end paths through application programs. Constructs
`are provided for the specification and modeling of deter-
`ministic, stochastic, and dynamic characteristics of envi-
`ronment-dependent paths. The provision for description of
`periodic, transient and hybrid (transient-periodic) paths is
`also made. Another novel feature of the language is that it
`allows the description of multi-level timing constraints
`through (1) simple, cycle deadlines (which apply to a sin-
`gle cycle of a periodic, transient or transient-periodic path)
`and (2) super-period deadlines (which apply to multiple
`cycles of a transient-periodic path). The language and
`model also consider the scalability and fault tolerance
`features of the end-to-end paths and their application pro-
`gram constituents.
`This is significantly different from other real-time ap-
`plication development languages and other real-time meta-
`languages. Real-time application development languages
`(e.g., Tomal [12], Pearl [17], Real-Time Euclid [13],
`RTC++ [9], Real-Time Concurrent C [7], Dicon [14],
`Chaos [3], Flex [16], TCEL [5], Ada95 [2], and MPL [19])
`include a wide variety of features that allow assertion
`checking and code transformation to ensure adherence to
`timing constraints. Specification languages or “meta-
`languages” (such as ACSR [4], GCSR [1], [23], CaRT-
`Spec [27] and RTL [10]) formalize the expression of tim-
`ing constraints and in some cases allow proofs of proper-
`ties of programs based on these constraints. In [6], the
`features of [10] have been folded into an application de-
`velopment language. Our work is a specification meta-
`language that is application-language-independent. Rather
`than providing real-time support within a particular appli-
`cation language, it provides support for expressing timing
`
`Authorized licensed use limited to: Riva Laughlin. Downloaded on July 01,2024 at 18:48:34 UTC from IEEE Xplore. Restrictions apply.
`
`Ex.1006 / Page 1 of 10
`TESLA, INC.
`
`
`
`constraints for systems of application programs, written in
`a melange of programming languages. Unlike previous
`work, in which timing constraints are associated at a rela-
`tively small granularity (e.g., task-level), our language
`allows timing constraints to be expressed the granularity of
`end-to-end paths which span many communicating pro-
`grams.
`Another way of categorizing real-time languages and
`models is in the way they permit characterization of a sys-
`tem’s behavior when interacting with the physical envi-
`ronment. Prior work has typically assumed that the effects
`of the environment on the system can be modeled determi-
`nistically. Our work expands this to include systems that
`interact with environments that are either deterministic,
`stochastic, or dynamic. This is done by modeling interac-
`tions with the environment as data and event streams, that
`may have deterministic, stochastic or dynamic properties.
`In order to handle dynamic environments, it is useful if
`the language includes features that can be related to dy-
`namic mechanisms (such as those described in [20] [8]
`[21] [22] [24]) for monitoring, diagnosis and recovery.
`Language support for run-time monitoring of real-time
`programs has been addressed in [20], [6], [11], [13], and
`[2]. However, limited support was provided for diagnosis
`and recovery actions. Our work extends the language fea-
`tures pertaining to diagnosis of timing problems, and the
`migration or replication of software components to handle
`dynamic data stream or event stream loads (scalability).
`The specification language also provides support for fault-
`tolerance, an issue that has not been addressed in most
`real-time languages.
`Previous real-time languages allow the description of
`behaviors that are purely periodic or aperiodic. Our work
`extends language support to describe hybrid behaviors
`such as the transient-periodic described in [26]. We also
`allow for multi-dimensional timing constraints, which, to
`our knowledge, cannot be described in any existing real-
`time language.
`The remainder of the paper presents the specification
`language and shows how it is used for adaptive QoS man-
`agement. Section 2 explains the software paradigm used in
`the specification language. In Section 3, we present the
`grammar of the specification language and illustrate its use
`with examples. Section 4 presents a static system model
`that is constructed from specifications. Section 5 defines a
`dynamic model, which is built at run-time by augmenting
`the static model with environment-dependent state. Use of
`the dynamic and static system models for adaptive re-
`source and QoS management is illustrated in Section 6.
`Section 7 presents experimental results of applying the
`language, the models and the resource and QoS manage-
`ment techniques to a distributed real-time application.
`
`2. The dynamic real-time path paradigm
`
`This section presents the dynamic real-time path para-
`digm. A path-based real-time subsystem (see Figure 1)
`typically consists of a situation assessment path, an action
`initiation path and an action guidance path. The paths in-
`teract with the environment via evaluating streams of data
`from sensors, and by causing actuators to respond (in a
`timely manner) to events detected during evaluation of
`sensor data streams. The system operates in an environ-
`ment that is either deterministic, stochastic or dynamic. A
`deterministic environment exhibits behavior that can be
`characterized by a constant value (see [19]). A stochastic
`environment behaves in a manner that can be characterized
`by a statistical distribution (see [15]). A dynamic environ-
`ment (e.g., a war-fighting environment) depends on condi-
`tions which cannot be known in advance (see [22], [29]).
`A (partial) air defense subsystem can be modeled using
`three dynamic paths: threat detection, engagement, and
`missile guidance. The threat detection path examines radar
`sensor data (radar tracks) and detects potential threats. The
`path consists of a radar sensor, a sensor data stream, a fil-
`tering program and an evaluation program. When a threat
`is detected and confirmed, the engagement path is acti-
`vated, resulting in the firing of a missile to engage the
`threat. After a missile is in flight, the missile guidance path
`uses sensor data to track the threat, and issues guidance
`commands to the missile. The missile guidance path in-
`volves sensor hardware, software for filtering, software for
`evaluating & deciding, software for acting, and actuator
`hardware.
`The remainder of this section describes the features of
`dynamic real-time paths. Recall that a path may be one of
`three types: situation assessment, action initiation, or ac-
`
`Data streams
`
`real-time
`subsystem
`
` sensors
`
`assessment
`
`environment
`
`actuators
`
`initiation
`
`guidance
`
`event
`
`event
`
`Event streams
`
`Figure 1. A real-time subsystem.
`
`tion guidance. The first path type, situation assessment,
`continuously evaluates the elements of a sensor data
`stream to determine if environmental conditions are such
`that an action should be taken (if so, the action initiation
`path is notified). Thus, this type of path is called continu-
`
`Authorized licensed use limited to: Riva Laughlin. Downloaded on July 01,2024 at 18:48:34 UTC from IEEE Xplore. Restrictions apply.
`
`Ex.1006 / Page 2 of 10
`TESLA, INC.
`
`
`
`ous. Typically, there is a timeliness objective associated
`with completion of one review cycle of a continuous path,
`i.e., on the time to review all of the elements of one in-
`stance of a data stream. (Note: A data stream is produced
`by sampling the environment. One set of samples is called
`a data stream instance.)
`The threat detection path of the air defense system is an
`example of a continuous path. It is a sensor-data-stream-
`driven path, with desired end-to-end cycle latencies for
`evaluation of radar track data. If it fails to meet the desired
`timeliness Quality of Service in a particular cycle, the path
`must continue to process track data, even though desired
`end-to-end latencies cannot be achieved. Peak loads cannot
`be known in advance for the threat detection path, since
`the maximum number of radar tracks can never be known.
`Furthermore, average loading of the path is a meaningless
`notion, since the variability in the sensor data stream size
`is very large (it may consist of zero tracks, or it may con-
`sist of thousands of tracks).
`The second path type, action initiation, is driven by a
`stream of events sent by a continuous (situation assess-
`ment) path. It uses inputs from sensors to determine which
`actions should be taken and how the actions should be
`performed, notifies actuators to start performing the ac-
`tions, and informs the action guidance path that an action
`has been initiated. We call this type of path transient, since
`it performs work in response to events. Typically, a timing
`objective is associated with the completion of the initiation
`sequence. The importance of the timing objective for a
`transient path is often very high, since performance of an
`action may be mission-critical or safety-critical.
`For example, the engagement path of the air defense
`example is a transient path. It is activated by an event from
`the threat detection path, and has a QoS objective of end-
`to-end timeliness. The real-time QoS of this path has a
`higher priority than the real-time QoS of the continuous
`threat detection path.
`The third path type, action guidance, is activated by an
`action initiation event, and is deactivated upon completion
`of the action. Action guidance repeatedly uses sensor in-
`puts to monitor the progress of an actuator, to plan correc-
`tive actions needed to guide the actuator to its goal, and to
`issue commands to the actuator. This type of path is called
`quasi-continuous, since it behaves like a continuous path
`when it is active. A quasi-continuous path has two timeli-
`ness objectives: (1) cycle completion time: the duration of
`one iteration of the “monitor, plan, command” loop, and
`(2) action completion time (or deactivation time): the time
`by which the action must complete in order for success.
`Note that it is more critical to perform the required proc-
`essing before the action completion deadline than it is to
`meet the completion time requirement for every cycle (al-
`though the two deadlines are certainly related). Thus, it is
`acceptable for the completion times of some cycles to
`violate the cycle deadline requirement, as long as the de-
`
`sired actions are successfully completed by the deactiva-
`tion deadline.
`The missile guidance path of the air defense example is
`a quasi-continuous path. It is activated by the missile
`launch event. Once activated, the path continuously issues
`guidance commands to the missile until it detonates (the
`deactivation event). The required completion time for one
`iteration is dynamically determined, based on characteris-
`tics of the threat. If multiple threat engagements are active
`simultaneously, the threat engagement path is responsible
`for issuing guidance commands to all missiles that have
`been launched.
`
`3. Grammar of the specification language
`
`This section presents a specification language for de-
`scribing the characteristics and requirements of dynamic,
`path-based real-time systems. The language provides ab-
`stractions to describe the properties of the software, such
`as hierarchical structure, inter-connectivity relationships,
`and run-time execution constraints. It also allows descrip-
`tion of the physical structure or composition of the hard-
`ware such as LANs, hosts, interconnecting devices or ICs
`(such as bridges, hubs, and routers), and their statically
`known properties (e.g., peak capacities). Further, the
`Quality-of-Service requirements on various system com-
`ponents can be described. In this paper, we illustrate the
`concrete syntax to describe only the software and the
`Quality-of-Service objectives.
`As shown in Figure 2, a subsystem is specified by de-
`scribing its priority, sets of constituent applications and
`devices, a set of end-to-end real-time path definitions, and
`a graph representing the communication connectivity of
`the applications and devices. The non-terminals of the sub-
`system grammar are explained in a depth-first manner in
`the remainder of this section.
`As seen in the grammar of Figure 2 and the example of
`Figure 3, the attributes of an application include boolean
`variables that indicate whether the application (1) can be
`replicated for survivability and (2) can support scalability
`via load sharing among replicas. Indications of whether an
`application combines its input stream (which may be re-
`ceived from different predecessor applications and/or de-
`vices), and splits its output stream (which may be distrib-
`uted to different successor applications and/or devices) are
`also specified. Application attributes also include all in-
`formation necessary to startup and shutdown applications
`(not elaborated in this paper). The connectivity of the sub-
`system describes the flow of data and events between the
`applications and devices of the subsystem, and is described
`as a sequence of ordered application and/or device pair
`names within parentheses. Alternately, one may use the
`square bracket set notation, which is a short hand to repre-
`sent a complete graph.
`
`Authorized licensed use limited to: Riva Laughlin. Downloaded on July 01,2024 at 18:48:34 UTC from IEEE Xplore. Restrictions apply.
`
`Ex.1006 / Page 3 of 10
`TESLA, INC.
`
`
`
`The definition of a path (see Figure 4) includes a set of
`constituent applications, various attributes, QoS require-
`ments and data/event stream definitions. The attributes of a
`path include priority, type, and importance. Path type,
`which defines the execution behavior of the path, is either
`continuous, transient, or quasi-continuous. The importance
`attribute (a string) is interpreted as the name of a dynami-
`cally linked library procedure that may be passed argu-
`ments such as priority and the current time, and returns an
`integer value that represents the importance of the path.
`A real-time QoS specification includes timing con-
`
`<sw-sub-system>::
`
`<appln-defn>::
`
`<device-defn>
`<connectivity-descr>::
`<graph-defn>::
`
`<pair-wise-descr>::
`<complete-graph-descr>::
`
`Subsystem ID "{" Priority INT_LITERAL
`";" <appln-defn>+ <device-defn> { <path-
`defn> }* <connectivity-descr> "}"
`Application ID "{" Scalable STRING ";"
`Survivable STRING ";" Combining
`STRING " ;"Splitting [NONE | EQUAL |
`STRING]";" <startup-block>+ <shutdown-
`block>+ "}"
`Device { ID ","}* ID ";" | l
`Connectivity "{" <graph-defn>+ "}"
`<pair-wise-descr> | <complete-graph-descr>
`| ID
`"(" ID "," ID ")"
`"[" { ID }+ "]"
`
`Figure 2. Grammar for subsystems.
`
`Application FilterManager { Scalable NO; Survivable YES; Combining NO;
` Splitting EQUAL; Startup{ . . . }; Shutdown { . . . } }
`Device Sensor, Actuator;
`Connectivity { (Sensor, FilterManager) (FilterManager, Filter)
` (Filter, EvalDecideManager)
` (EvalDecideManager, EvalDecide) (EvalDecide, ActionManager)
` (EvalDecide, MonitorGuideManager) (ActionManager, Action)
` (Action, Actuator) (MonitorGuideManager, MonitorGuide) }
`
`Figure 3. Application, device and
`connectivity specifications.
`
`straints such as simple deadlines, inter-processing times,
`throughputs, and super-period deadlines. A simple dead-
`line is defined as the maximum end-to-end path latency
`during a cycle of a continuous or quasi-continuous path, or
`during an activation of a transient. Inter-processing time is
`defined as a maximum allowable time between processing
`of a particular element of a continuous or quasi-continuous
`path’s data stream in successive cycles. The throughput
`requirement is defined as the minimum number of data
`items that the path must process during a unit period of
`time. A super-period deadline is defined as the maximum
`allowed latency for all cycles of a quasi-continuous path.
`A super-period deadline is specified as the name of a dy-
`namically linked library procedure that is called dynami-
`cally to determine the estimated super-period deadline.
`Each timing constraint specification may also include
`items that relate to the dynamic monitoring of the con-
`straint. These include minimum and maximum slack val-
`ues (that must be maintained at run-time), the size of a
`moving window of measured samples that should be ob-
`
`<path-type>::
`<Real-TimeQoS>::
`<RTQoS-metric>::
`
`<threshold>::
`
`
`Path ID "{" <appn-set> <path-attribs>
`<path-defn>::
` <Real-TimeQoS><SurvivabilityQoS><stream-defns> "}"
`<appn-set>::
`Contains "{" { <app-name> ","}* <app-name> "}"
`<app-name>::
`ID
`<path-attribs>::
`Priority INT_LITERAL";" Type <path-type> ";"
` Importance STRING";"
`Continuous | Transient | Quasi-Continuous
`Real-TimeQoS "{" <RTQoS-metric>+ "}"
`SimpleDeadline FLOAT_LITERAL ";" <threshold>
`| Inter-ProcessingTime FLOAT_LITERAL ";"
` <threshold>
`| Throughput INT_LITERAL ";"<threshold>
`| Super-PeriodDeadline STRING ";" <threshold>
`MaxSlack INT_LITERAL";"
`MinSlack INT_LITERAL ";"
`MonitorWindowSize INT_LITERAL ";"
`Violations INT_LITERAL ";" | l
`SurvivabilityQoS "{" Survivable STRING;
`MinCopies INT_LITERAL "}"
`<DataStream-defn> | <EventStream-defn> |
`<DataStream-defn> <EventStream-defn>
`
`<SurvivabilityQoS>::
`
`<stream-defns>::
`
`Figure 4. Grammar for paths.
`
`Path Sensing {
` Contains {Sensor, FilterManager, Filter, EvalDecideManager, EvalDecide}
` Type Continuous; Priority 2; Importance "CalcImport";
` Real-TimeQoS {SimpleDeadline 4; //secs
` MaxSlack 80; MinSlack 20; //PERCENT
`
` MonitorWindowSize 20; Violations 15;
`
` Inter-ProcessingTime 7; // secs
`
` Throughput 200; // data elements per second }
` SurvivabilityQoS { Survivable YES; MinCopies 2; }
`
`Figure 5. A path specification.
`
`served, and the maximum tolerable number of violations
`(within the window).
`Figure 4 also contains the grammar for describing sur-
`vivability QoS. A survivability QoS specification includes
`a boolean variable that indicates (1) whether the path
`should be managed to ensure survivability and (2) the
`minimum required level of redundancy. Note that repli-
`cating a path entails replicating all of the applications that
`make up the path.
`An example path specification is given in Figure 5. The
`“Sensing” path is continuous and has a priority of 2. It has
`a Simple Deadline of 4 seconds, which means that each
`review cycle must not have a latency that exceeds this
`amount. The MaxSlack and MinSlack definitions further
`constrain this requirement to the interval [0.8seconds,
`3.2seconds]. If 15 out of the most recent 20 cycles violate
`this requirement, then corrective action by a resource man-
`ager is required. If the upper bound is exceeded, the cor-
`rective action would be to allocate more resources for the
`path. In the case where the lower bound is exceeded, the
`action would be to take away some of the resources of the
`path. The path inter-processing time is 7 seconds, meaning
`that no more than this amount of time should elapse be-
`tween reviews of a particular data stream element in suc-
`cessive cycles. The path throughput must be at least 200
`data stream elements per second. The path also requires
`
`Authorized licensed use limited to: Riva Laughlin. Downloaded on July 01,2024 at 18:48:34 UTC from IEEE Xplore. Restrictions apply.
`
`Ex.1006 / Page 4 of 10
`TESLA, INC.
`
`
`
`DataStream {
` Type Deterministic;
` Size 40;}
`
`EventStream {
` Type Dynamic; }
`
`
`DataStream {
`
`EventStream {
`
`Type Stochastic;
`Size "/home/ uta/MGDataStream.data";}
`
`Type Stochastic;
`Rate "Exponential 0.5" ; }
`
`Figure 7. Specifications of streams.
`
`<DataStream-defn>:: DataStream "{" Type [Deterministic | Stochastic | Dynamic ] ";"
` Size <environ-descr> ";" "}"
`<EventStream-defn>:: EventStream "{" Type [Deterministic | Stochastic | Dynamic ] ";"
` Rate <environ-descr> ";" "}"
`<environ-descr>:: INT_LITERAL
` | "(" INT_LITERAL "," INT_LITERAL ")" | <pdf-descr> | l
`<pdf-descr>:: STRING | FILENAME
`Figure 6. Grammar for streams.
`
`the existence of at least 2 copies of each of its member
`applications at all times.
`Figure 6 shows the grammar for describing stream
`properties. A corresponding specification is illustrated in
`Figure 7. The stream type can be deterministic, stochastic,
`or dynamic. The data stream size or event arrival rate of a
`deterministic stream is a constant (scalar or interval). A
`stochastic stream has a data stream size or an event arrival
`rate that is characterized by a probability distribution
`function. The distribution is described as a string contain-
`ing the name of a distribution and its parameters, or is de-
`scribed as the name of a data file containing a data set that
`characterizes the stream’s behavior. The data stream size
`or event arrival rate of a dynamic stream is not described
`in the specification, since it must be observed at run time.
`
`4. Static system model
`
`This section presents an overview of a static system
`model (which is constructed by a specification compiler)
`for a single subsystem.
`A software subsystem, SS, consists of a set of applica-
`tions SS.A = { a1, a2,…..}, a set of devices (sensors and
`actuators) SS.D = { d1, d2,…..}, a communication graph for
` P
`((SS.D ¨
` SS.A) ·
`applications and devices G (SS) ˛
`(SS.D ¨
` SS.A)), and a set of paths SS.P = {P1, P2, P3, …}.
`(Note: P denotes power set).
`Each path Pi is represented as a type t (Pi) ˛
` {c, qc, t}
`(note that ‘c’, ‘qc’ and ‘t’ denote ‘continuous’, ‘quasi-
`continuous’ and ‘transient’, respectively), a set of applica-
`tions Pi.A = {ai,1, ai,2,…..} (where Pi.A ˝
` SS(Pi).A), a set of
`devices Pi.D = {di,1, di,2,…..} (where Pi.D ˝
` SS(Pi).D), a
`((Pi.D ¨
` Pi.A) ·
` (Pi.D ¨
`communication graph g (Pi) ˛
` P
`Pi.A)) (note that g (Pi) ˝
` G (SS(Pi))), a data stream Pi.DS
`(defined if t (Pi) ˛
` {c, qc}), and an event stream Pi.ES
`(defined if t (Pi) ˛
` {t, qc}). SS(Pi) denotes the subsystem
`in which path Pi is contained, ROOT(G(Pi)) is used to
`denote the root application node of g (Pi) (the node which
`
`receives a data stream from a sensor or an event stream
`from an application) and SINK(G(Pi)) is used to represent
`the sink application of g (Pi) (the application which com-
`municates with an actuator or sends an event to the root
`application of another path). The type of Pi’s datastream is
`defined as t (Pi.DS)˛ {dynamic, stochastic, deterministic}.
`s (Pi.DS) denotes the size characteristics of Pi’s data-
`stream.
`The real-time QoS requirements of a path are: required
`l REQ(Pi) seconds, required throughput of
`latency of
`q REQ(Pi) data stream elements/time, and required data in-
`ter-processing time of d REQ(Pi) (a maximum allowable
`time between processing of a particular element of Pi.DS
`in successive cycles). To mask momentary QoS-spikes
`during QoS monitoring, a specification may define a sam-
`pling window and a maximum number of QoS violations
`to be tolerated within the window. w (Pi) models the sam-
`pling window size and u (Pi) represents the maximum al-
`lowable number of violations (within sampling window
`w (Pi)). y
`(Pi) = [y min(Pi), y max(Pi)] is the required slack
`interval for each QoS requirement; i.e., it is required that
`the ratio (required QoS - actual QoS) : (required QoS) be
`no less than y min(Pi) and no greater than y max(Pi).
`REPLICABLE(ai,j) --- has a true or false value to indi-
`cate if application ai,j can be replicated for the purpose of
`load sharing. COMBINING(ai,j) --- has the value of true
`or false, indicating if application ai,j combines the inputs
`from its predecessors before passing the values to succes-
`sor(s). SPLITTING(ai,j) ˛
` (none, equal, non-equal) indi-
`cates whether an application splits it outputs among its
`successors, and if so, whether the outputs equally.
`
`5. Dynamic system model
`
`This section presents a model of such dynamic features
`as the current environment, its effect on QoS, a mapping of
`software to computation and communication hardware,
`and the number of replicas of each application.
`The set of replicas of ai,j during cycle ‘c’ of Pi is defined
`as REPLICAS (ai,j, c) = {ai,j,1, ai,j,2, …}. The host to which
`ai,j,k is assigned during cycle ‘c’ of Pi is defined as HOST
`(ai,j,k, c, Pi), and the communication path (set of LANs and
`interconnection devices (ICs)) used during cycle ‘c’ of Pi
`for messages between applications ai,j,x and ai,j,y is repre-
`sented as COMMPATH (ai,j,x, ai,j,y, c, Pi).
`The set of elements that constitutes a data stream can
`vary dynamically. Pi.DS(c)={Pi.DS(c)1, Pi.DS(c)2,…} rep-
`resents the set of elements in Pi.DS during cycle ‘c’ of Pi.
`The tactical load (in number of data stream elements
`processed) of a (quasi)-continuous path Pi during it’s cth
`cycle is |Pi.DS(c)|. The processing of elements of a data
`stream may be divided among replicas of an application to
`exploit concurrency as a means of decreasing execution
`latency of a path. In successive stages of a path that has
`
`Authorized licensed use limited to: Riva Laughlin. Downloaded on July 01,2024 at 18:48:34 UTC from IEEE Xplore. Restrictions apply.
`
`Ex.1006 / Page 5 of 10
`TESLA, INC.
`
`
`
`non-combining applications (applications which, after
`processing data received from a single predecessor, simply
`divide the data among their successors), data will arrive in
`batches to applications; hence, each application may proc-
`ess several batches of data during a single cycle. Thus, the
`model represents the batches of data processed by applica-
`tion/replica ‘a’ during cycle ‘c’.as Pi.DS(c, a)={Pi.DS(c,
`a)1, Pi.DS(c, a)2,…} The cardinality |Pi.DS(c).b(a)| is
`called the tactical load of ‘a’, and can be calculated by
`considering the REPLICAS, SPLITTING, and COMBIN-
`ING attributes of the system model. The data stream ele-
`ments contained in the jth batch of ‘a’ are denoted by
`Pi.DS(c, a, j)={Pi.DS(c, a, j)1, Pi.DS(c, a, j)2,…}.
`
`6. Dynamic QoS analysis techniques
`
`This section illustrates the use of the dynamic and static
`system models for adaptive QoS management.
`Monitoring of real-time QoS involves the collection of
`time stamped events sent from applications. The times
`when application/replica ‘a’ starts and ends processing
`batch ‘j’ of data during cycle ‘c’ of Pi are denoted by
`s(Pi.DS(c, a, j)) and e(Pi.DS(c, a, j)), respectively. The
`times when ‘a’ starts and ends processing of data stream
`element Pi.DS(c, a)k are represented as s(Pi.DS(c, a)k) and
`e(Pi.DS(c, a)k), respectively.
`Observed real-time QoS metrics are defined in terms of
`these basic events as follows: (1) latency of path Pi during
`cycle ‘c’ = l OBS(Pi, c) = {e(Pi.DS(c, ai,m,n, j)) – s(Pi.DS(c,
`ai,x,1, 1)) | ai,m = SINK(g (Pi)), ai,x = ROOT(g (Pi)}; (2)
`throughput of Pi during cycle ‘c’ = q OBS(Pi, c) = |Pi.DS(c)| /
`max(l OBS(Pi, c)); (3) data-inter-processing time of applica-
`tion ‘a’ for datum Pi.DS(c, a)k during cycle ‘c’ =
`d OBS(Pi.DS(c, a)k) = s(Pi.DS(c, a)k) - s(Pi.DS(c-1, a)k), for c
`> 1; (4) data-inter-processing time of application ‘a’ during
`cycle ‘c’ = d OBS(Pi.DS(c, a)) = f(d OBS(Pi.DS(c, a)k)), for all
`‘k’ in the range [1, |Pi.DS(c, a)|]. (E.g., f can be defined as
`‘average’ or as ‘max’.); (5) data-inter-processing time of Pi
`during cycle ‘c’ = d OBS(Pi.DS(c)) = f(d OBS(Pi.DS(c, a))), for
`all ‘a’ in Pi.A. (E.g., f can be defined as ‘average’ or as
`‘max’.)
`Analysis of a time series of the real-time QoS metrics
`enables detection of QoS violations. An overload of a path
`occurs in any cycle ‘c’ wherein u (Pi) £
` |{d: c-d+1 < w (Pi)
` [ [l REQ(Pi) - max(l OBS(Pi,c))] / l REQ(Pi) < y min(Pi) ] }|.
`Similarly, an underload of a path occurs in any cycle ‘c’
`wherein u (Pi) £
` |{d: c-d+1 < w (Pi) (cid:217)
` [ [l REQ(Pi) -
`max(l OBS(Pi,c))] / l REQ(Pi) > y max(Pi) ] }|.
`When a path-level (end-to-end) real-time QoS violation
`occurs, diagnosis determine the cause(s) of the violation
`(i.e., identifies subpaths (application programs) that are
`experiencing significant slowdown). One diagnosis tech-
`nique declares an application/replica ‘a’ to be unhealthy
`during cycle ‘c’ of path Pi if there exists another cycle ‘d’
`
`such that the following conditions hold. Condition 1: d <
`c. Condition 2: HOST (a, c, Pi) = HOST (a, d, Pi). Condi-
`tion 3: |Pi.DS(c, a)| = |Pi.DS(d, a)|. Condition 4: "
`f: (f <
` [HOST (a, c, Pi) = HOST (a, f, Pi)] (cid:217)
`c) (cid:217)
` [|Pi.DS(c, a)| =
`|Pi.DS(f, a)|] (cid:217)
` max(l OBS(Pi,f)) > max(l OBS(Pi,d)). Condi-
`tion 5: max(l OBS(Pi,d)) < max(l OBS(Pi,c)) - e . Note: e is the
`minimal difference between cycle latencies that is consid-
`ered significant.
`Following diagnosis, allocation analysis identifies and
`ranks recovery actions for the applications/replicas diag-
`nosed as unhealthy. The first step of allocation analysis is
`to determine if the tactical load of an unhealthy application
`has significantly increased in the recent past. The tactical
`load of an application/replica ‘a’ has increased in the a
`cycles immediately prior to cycle ‘c’ by at least a signifi-
`cant amount ‘D ’ when: $ d: c>d (cid:217)
` c-d<a +1 (cid:217)
` (|Pi.DS(c, a)|
`> | Pi.DS(d, a)| + D ). The tactical load change analysis is
`used to determine the appropriate allocation action for an
`unhealthy application as follows. Replicate an unhealthy
`application/replica ‘a’ when its tactical load has increased
`recently by a significant amount. We denote such an action
`as: act(a) = (a, replicate). Move an ‘a’ when its tactical has
`not increased recently by a significant amount. We denote
`such an action as: act(a) = (a, move). We define
`ACT={act1, act2,…} as the set of actions selected, where
`acti is a pair (a,action) and denotes that ‘action’ is recom-
`mended for ‘a’.
`The second step of allocation analysis is to group ac-
`tions which address the same cause of poor path QoS. All
`actions that move applications/replicas from a particular
`host are grouped, since any one of those actions will re-
`duce the contention experienced by applications on the
`host. All actions that involve replication of a particular
`application ‘a’ are also grouped, since the addition of an-
`other replica of ‘a’ will cause a



