Distributed, Scalable, Dependable Real-Time Systems:
Middleware Services and Applications

Lonnie R. Welch1, Binoy Ravindran2, Paul V. Werme3, Michael W. Masters3,
Behrooz A. Shirazi1, Prashant A. Shirolkar1, Robert D. Harrison3, Wayne Mills3, Tuy Do3,
Judy Lafratta3, Shafqat M. Anwar1, Steve Sharp4, Terry Sergeant5, George Bilowus4,
Mark Swick4, Jim Hoppel4, Joe Caruso4

1 Computer Science & Engineering Dept.; University of Texas at Arlington; Box 19015, Arlington, TX 76019; {welch|shirazi}@cse.uta.edu
2 The Bradley Dept. of Electrical and Computer Engineering; Virginia Polytechnic Institute and State University; Blacksburg, VA 24061; binoy@vt.edu
3 Code B35; Naval Surface Warfare Center, Dahlgren, VA 22448; {WermePV|MastersMW}@nswc.navy.mil
4 Computer Sciences Corporation, Dahlgren, VA 22448
5 Ouachita Baptist University, 410 Ouachita Street, Arkadelphia, AR 71998-0001; sergeantt@alpha.obu.edu

Abstract

Some classes of real-time systems function in environments which cannot be modeled with static approaches. In such environments, the arrival rates of events which drive transient computations may be unknown. Also, the periodic computations may be required to process varying numbers of data elements per period, but the number of data elements to be processed in an arbitrary period cannot be known at the time of system engineering, nor can an upper bound be determined for the number of data items; thus, a worst case execution time cannot be obtained for such periodics. This paper presents middleware services that support such dynamic real-time systems through adaptive resource management. The middleware services have been implemented and employed for components of the experimental Navy system described in [10]. Experimental characterizations show that the services provide timely responses, that they have a low degree of intrusiveness on hardware resources, and that they are scalable.

1. Introduction

Many real-time systems have rigorous, multi-dimensional Quality-of-Service (QoS) objectives. They must behave in a dependable manner, respond to threats in a timely fashion and provide continuous availability within hostile environments. Furthermore, resources should be utilized in an efficient manner, and scalability must be provided to address the ever-increasing complexity of scenarios that confront such systems, even though the worst case scenarios of the environment may be unknown (e.g., see [6]). This paper describes innovative QoS and resource management technology for such systems.
Our approach is based on the dynamic path paradigm. A path-based real-time subsystem (see [11]) typically consists of a detection & assessment path, an action initiation path and an action guidance path. The paths interact with the environment by evaluating streams of data from sensors, and by causing actuators to respond (in a timely manner) to events detected during evaluation of sensor data streams.
Most previous work in distributed real-time systems has focused on a lower level of abstraction than the path and has assumed that all system behavior follows a statically known pattern [8, 9]. When applying the previous work to some applications (such as those described in [WRHM97]), problems arise with respect to scalability of the analysis and modeling techniques; furthermore, it is sometimes impossible to obtain some of the parameters required by the models. The work described in this paper addresses these problems.
A major difference between the traditional load balancing techniques [7] and dynamic QoS-based resource management services lies in the overall goals. While load balancing systems (see Load-Leveler [5], LSF [12], NQE [2], PBS [4], Globus [3], and Condor [1]) attempt to achieve system performance goals such as minimized response time or maximized throughput, dynamic QoS-based resource managers strive to meet the QoS requirements of each application they manage. Another major difference between these systems is their workload models. Traditional load balancing systems assume independent jobs with known resource requirements.
In a dynamic resource management system, the workload requirements of applications can vary, based on environmental conditions; additionally, applications are dependent (they communicate with each other).

The rest of the paper is organized as follows. Section 2 provides an overview of a middleware architecture for dynamic QoS management of path-based systems, and describes the adaptive resource allocation approach employed by the middleware. In Section 3 we present our experiences with the QoS management middleware services. This includes a description of the Navy testbed in which the techniques were employed, and experimental results characterizing response times of the middleware services.

2. Dynamic resource and QoS management

[Figure 1. Logical architecture of the resource and QoS management software: real-time paths (sensor → filter → eval → act → actuator) feed a management loop comprising metrics calculation, the QoS monitor, QoS diagnosis, action selection, resource discovery, allocation analysis and allocation enactment, which draws on a specification file and hardware metrics from the distributed hardware.]

The logical architecture of the QoS management software is shown in Figure 1. It behaves as follows. The application programs of real-time control paths send time-stamped events to the metrics calculation component, which calculates path-level QoS metrics and sends them to the QoS monitor. The monitor checks for conformance of observed QoS to required QoS, and notifies the QoS diagnosis component when a QoS violation occurs. QoS diagnosis notifies the action selection component of the cause(s) of poor QoS and recommends actions (e.g., move a program to a different host or LAN, shed a program, or replicate a program) to improve QoS. Action selection ranks the recommended actions, identifies redundant actions, and forwards the results to the allocation analysis component; this component consults resource discovery for host and LAN load index metrics, determines a good way to allocate the hardware resources in order to perform the actions, and requests the actions be performed by the allocation enactment component.
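To illustrate the flow just described, the following minimal sketch wires the monitor, diagnosis, action selection and allocation analysis stages together; the class and function names (PathMetrics, diagnose, select_action, allocation_analysis) are invented for illustration and are not the middleware's actual API.

# A minimal sketch of the Figure 1 feedback loop; names are illustrative only.
from dataclasses import dataclass

@dataclass
class PathMetrics:
    path_id: str
    latency: float      # observed end-to-end path latency, in seconds
    deadline: float     # required latency from the specification file

def monitor(m: PathMetrics) -> bool:
    """QoS monitor: conformance of observed QoS to required QoS."""
    return m.latency <= m.deadline

def diagnose(m: PathMetrics) -> list[str]:
    """QoS diagnosis: recommend candidate actions for a violated path."""
    return ["replicate", "move"]     # e.g., replicate or move a program

def select_action(candidates: list[str]) -> str:
    """Action selection: rank recommendations and drop redundant ones."""
    return candidates[0]

def allocation_analysis(action: str, load_indices: dict[str, float]) -> str:
    """Allocation analysis: choose the least-loaded host for the action."""
    return min(load_indices, key=load_indices.get)

def manage(paths: list[PathMetrics], load_indices: dict[str, float]) -> None:
    for m in paths:
        if not monitor(m):                                  # QoS violation
            action = select_action(diagnose(m))             # diagnose, select
            host = allocation_analysis(action, load_indices)
            print(f"enact: {action} for path {m.path_id} on {host}")

In the real middleware each stage is a separate distributed component; the single loop above is only meant to show the order in which they are consulted.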
[Figure 2. Physical architecture of the resource and QoS management software: the Resource Manager exchanges program status, path information and start/kill commands with Program Control (PC) and the RTCS; the System Data Repository collects user specifications (via a specification-file Parser), host and network data (CPU, memory, throughput, latency) from Hardware Monitors, and path/sub-path latency and profile information from Software Monitors; a Human Computer Interface, an RTCS console and a hardware console display current system status and accept user commands.]

The physical QoS management architecture is shown in Figure 2. The core component of the middleware is the resource manager. It is activated when programs die and when time-constrained control paths miss their deadlines. In response to these events, it takes appropriate measures to improve the quality of service delivered to the applications. The reallocations made by the resource manager make use of information provided by the hardware and software monitors, as well as from a specification file that describes QoS requirements and the structures of the software system and the hardware system. The system data repository component is responsible for collecting and
maintaining all system information. The program control (PC) component consists of a central control program and a set of startup daemons. When the resource manager needs to start a program on a particular host, it informs the control program, which notifies the startup daemon on that host. The HCI provides information to the user regarding the system configuration, application status, and reallocation decisions. It also allows the operator to dynamically modify the behavioral characteristics of the resource discovery and the resource manager components.
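To make the division of labor between the central control program and the startup daemons concrete, here is a minimal sketch assuming programs are launched as local processes; the class names and the use of subprocess are illustrative assumptions, not the actual design.

# Sketch of Program Control: a central control program routes start requests
# to a per-host startup daemon; names and mechanisms are hypothetical.
import subprocess

class StartupDaemon:
    """Runs on one host; starts programs on request and tracks their status."""
    def __init__(self, host: str):
        self.host = host
        self.children = {}

    def start_program(self, program: str, args: list[str]) -> bool:
        proc = subprocess.Popen([program, *args])
        self.children[program] = proc
        return proc.poll() is None        # still running implies a clean start

class ProgramControl:
    """Central control program: forwards start requests to the right daemon."""
    def __init__(self, daemons: dict[str, StartupDaemon]):
        self.daemons = daemons

    def start_on(self, host: str, program: str, args: list[str]) -> bool:
        started = self.daemons[host].start_program(program, args)
        # A real implementation would also report program status back to the
        # resource manager and the RM HCI here.
        return started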
[Figure 3. Adaptive resource management scenarios: the Spec File and Spec Libraries feed the System Data Repository (step 0), which updates the RM HCI (step 1); HW Monitors report to the Hardware Data Repository (step 2), which feeds the System Data Repository (step 3); the Resource Manager receives system information (step 4), directs Program Control (step 5) and updates the RM HCI (step 6); the RTCS reports profile information to the QoS Monitors (step 7), which notify the Resource Manager (step 8); the Hardware Analyzer supplies ranked load indices (step 9).]

Figure 3 depicts the overall architecture of the adaptive resource management system. In its current implementation, resource management is activated in three modes: during the initial system start-up process (to start application programs), when a path becomes unhealthy (i.e., a path latency exceeds the required deadline), and when an application program is terminated (due to hardware/software faults). A description of each of these resource management modes follows.

The actions performed in start-up mode are as follows (a code sketch of steps 4-6 follows the list):
1. The System Data Repository loads the user spec file via the Spec Libraries, which consist of a compiler and data structures to store the compiled real-time system and application QoS specifications (step 0 of Figure 3).
2. The System Data Repository sends the system information to the Resource Management Human Computer Interface (RM HCI) for display purposes (step 1 of Figure 3).
3. Hardware (HW) Monitors continuously observe a resource's load index and pass this information to the Hardware Data Repository (step 2 of Figure 3), which in turn passes such information to the System Data Repository (step 3 of Figure 3).
4. The Resource Manager receives the initial startup information from the System Data Repository, as specified by the spec file (step 4 of Figure 3).
5. The Resource Manager informs the Program Control to start the Real-Time Control System (RTCS) application programs on the specified hosts (step 5 of Figure 3).
6. The Program Control starts the programs and informs the Resource Manager accordingly (step 5 of Figure 3).
7. The Resource Manager sends startup information to the RM HCI for display purposes (step 6 of Figure 3).
8. The RTCS continuously sends the application profile (time stamps or program/path latencies) information to the QoS Monitors (step 7 of Figure 3). Global time is made available to the RTCS via the Network Time Protocol (NTP) package.
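The following fragment sketches steps 4-6 of the start-up sequence, assuming the spec file has been compiled into a list of per-program entries; SpecEntry and its fields are hypothetical names, and program_control stands for a Program Control front end such as the one sketched earlier.

# Illustrative start-up driver for steps 4-6; SpecEntry and its field names
# are assumptions, not the spec-file compiler's actual schema.
from dataclasses import dataclass

@dataclass
class SpecEntry:
    program: str      # RTCS application program to start
    host: str         # host named for it in the specification file
    path_id: str      # real-time path the program belongs to
    deadline: float   # required path latency, in seconds

def start_up(spec: list[SpecEntry], program_control) -> None:
    """Start-up mode: ask Program Control to start each program on its host."""
    for entry in spec:
        ok = program_control.start_on(entry.host, entry.program, [])
        print(f"start {entry.program} on {entry.host}: {'ok' if ok else 'failed'}")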
The sequence performed during QoS monitoring and enforcement mode is as follows (a host-selection sketch follows the list):
1. If a path becomes unhealthy (misses its deadline), the QoS Monitors detect such a condition, diagnose the cause of poor health, and suggest an action (such as moving or replicating an application program) to the Resource Manager (step 8 of Figure 3).
2. The Resource Manager needs to decide on which host(s) and LAN(s) the unhealthy sub-path needs to be replicated or moved. This decision is made by choosing the host(s) and LAN(s) with the smallest load indices. The Hardware Analyzer ranks the hosts and LANs in ascending order of their load index and passes this information to the Resource Manager (step 9 of Figure 3).
3. Once a host is selected, the Resource Manager notifies the Program Control to make the change (step 5 of Figure 3) and updates the RM HCI accordingly (step 6 of Figure 3).
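Step 2's selection rule (rank hosts and LANs by ascending load index and take the least loaded) can be written down directly; the lines below are a sketch of that rule only, with hypothetical host names and load-index values.

# Sketch of the load-index ranking used in step 2; smaller index = less loaded.
def rank_hosts(load_indices: dict[str, float]) -> list[str]:
    """Hosts in ascending order of load index, as the Hardware Analyzer reports."""
    return sorted(load_indices, key=load_indices.get)

def choose_host(load_indices: dict[str, float]) -> str:
    """Resource Manager: pick the least-loaded host from the ranking."""
    return rank_hosts(load_indices)[0]

# Example with hypothetical values:
#   choose_host({"alpha1": 0.82, "tac4a": 0.35, "sparc10b": 0.61}) returns "tac4a"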

The actions taken in program recovery mode are described below (a sketch of the bounded-restart logic follows the list):
1. If a sub-path (RTCS program) is terminated due to some hardware/software failure, Program Control detects such a condition and informs the Resource Manager accordingly (step 5 of Figure 3).
2. The Resource Manager finds the host(s) and LAN(s) with the smallest load indices by querying the Hardware Analyzer (step 9 of Figure 3).
3. It then restarts the terminated program on that host by informing the Program Control (step 5 of Figure 3) and the RM HCI (step 6 of Figure 3) accordingly. In order to avoid thrashing (restarting a faulty piece of software), this step is only repeated an operator-determined fixed number of times.
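A minimal sketch of this recovery logic, including the operator-determined restart limit, might look like the following; RecoveryManager, the reuse of choose_host from the previous sketch, and the default limit of 3 are illustrative assumptions.

# Sketch of program recovery with a thrashing guard: a terminated program is
# restarted at most max_restarts times, on the least-loaded host available.
from collections import defaultdict

class RecoveryManager:
    def __init__(self, program_control, choose_host, max_restarts: int = 3):
        self.program_control = program_control   # Program Control front end
        self.choose_host = choose_host           # e.g., choose_host() above
        self.max_restarts = max_restarts         # operator-determined limit
        self.restart_counts = defaultdict(int)

    def on_program_death(self, program: str, load_indices: dict) -> bool:
        if self.restart_counts[program] >= self.max_restarts:
            return False                         # give up: likely a faulty program
        self.restart_counts[program] += 1
        host = self.choose_host(load_indices)    # host with smallest load index
        return self.program_control.start_on(host, program, [])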
3. Experimental results

The technology described in this paper was evaluated within the Naval Surface Warfare Center High Performance Computing (NSWC HiPer-D) Testbed, which contains the experimental Navy system described in [10]. The implementation includes the following capabilities: (1) a simulated track source, (2) track correlation and filtering algorithms, (3) track data distribution services, (4) a doctrine server and three types of doctrine processing, (5) an engagement server, (6) a display subsystem including X-windows based tactical displays, submode mediation, and alert routing surface operations, (7) a simulated weapons control system, and (8) identification upgrade capabilities.
The software runs on a heterogeneous network configuration that includes Myrinet, ATM, FDDI, and Ethernet, on multiple heterogeneous host platforms, including DEC Alphas with OSF-1, a DEC Sable with OSF-1, TAC-4s with HP-UX, Sun SPARC 10s with Solaris, and Pentiums with OSF-1RT.
We performed experiments to determine the responsiveness of our QoS and resource management middleware for survivability and scalability services.
The total Survivability response time (Tu) calculation is divided into four major phases: (1) Program Death Detection time (t1) is the time taken by the Startup Daemon to inform the Program Manager of a dead program, (2) Resource Manager Notification time (t2) is the time taken by the Program Manager to inform the Resource Manager of the dead program, (3) Resource Manager Processing time (t3) is the time taken by the Resource Manager to select a good host, and (4) Restart time (Tr) is the time taken by the Program Manager and Startup Daemons to actually restart the program.
The processing time at the resource manager, t3, is further decomposed. Preprocessing time (t31) is the time interval after receipt of a dead program message from the program manager and before network discovery begins. This time interval is internal to the resource manager. Network Resource Discovery (t32) is the time interval required to obtain network-level metrics from the network controller. Host Resource Discovery (t33) is the time interval required to obtain host-level metrics from all eligible host monitors. Allocation Decision time (t34) is the time interval required to choose a good host. This interval is internal to the resource manager. Post-processing time (t35) is the time interval after finding the best host and before sending a program restart instruction to the program manager. This interval is internal to the resource manager.
The Restart time, Tr, consists of three phases. Program Notification time (t2') is the time required to inform the Program Manager to restart a particular program on a particular host. Program Manager to Startup daemon data transfer time (t4) is the time required to transfer the restart data to the appropriate Startup daemon from the Program Manager. Program Start Detection time (t1') is the time required by the startup daemon to detect the start of the program.
We repeatedly measured the response times of the survivability services, and observed that the total average response time, Tu, is 3.45 seconds, with a standard deviation of 0.021435. The data also indicate that the maximum time is spent during host resource discovery, followed by the time taken by the startup daemons to actually start the program.
Response time measurements were also made for path overload detection and overload recovery via automatic scalability. The total Scalability response time (Tc) calculation is divided into phases as follows: (1) Path Overload data transfer time (t2) is the time taken for the overloaded path information to reach the resource manager; this interval occurs immediately after the overload detection heuristic in the subsystem manager detects the overloaded condition of the path (t1). (2) Resource Manager processing time (t3) is the same as in Tu. (3) Scale time (Ts) is the time taken by the Program Manager and Startup Daemons to actually start a new copy of the program; it consists of three phases, identical to the restart component of Tu. In repeated experiments, the average total response time, Tc, for scalability services was 4.51 seconds, with a standard deviation of 0.020731.
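The decompositions above can be summarized in the paper's own notation as follows (a restatement inferred from the text, not formulas the paper gives explicitly):

Tu = t1 + t2 + t3 + Tr
t3 = t31 + t32 + t33 + t34 + t35
Tr = t2' + t4 + t1'
Tc = t2 + t3 + Ts   (the enumerated scalability phases; overload detection at t1 precedes this interval)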

4. Conclusions and future work

This paper describes adaptive resource management middleware that provides integrated services for fault tolerance, distributed computing, and real-time computing. The underlying system model differs significantly from that used in related work. Furthermore, the services have been applied to an experimental Navy system prototype. Experiments show that the services provide bounded response times, scalable services, and low intrusiveness.

5. Acknowledgements

This work was sponsored in part by DARPA/NCCOSC contract N66001-97-C-8250, and by NSWC/NCEE contracts NCEE/A303/41E-96 and NCEE/A303/50A-98.
6. References

[1] "Condor Project," http://www.cs.wisc.edu/condor/, 1999.
[2] Cray Research, Document in-2153 2/97, Technical report, Cray Research, 1997.
[3] I. Foster and C. Kesselman, "Globus Project," http://www.globus.org/, 1999.
[4] R. Henderson and D. Tweten, "Portable Batch Systems: External Reference Specification," Technical report, NASA Ames Research Center, 1996.
[5] IBM Corporation, "IBM Load Leveler: User's Guide," Sept. 1993.
[6] G. Koob, "Quorum," Proceedings of the DARPA ITO General PI Meeting, pages A-59 to A-87, October 1996.
[7] B. Shirazi, A. R. Hurson, and K. Kavi, "Scheduling and Load Balancing in Parallel and Distributed Systems," IEEE Press, 1995.
[8] S. Son, "Advances in Real-Time Systems," Prentice Hall, 1995.
[9] J. Stankovic and K. Ramamritham, "Advances in Real-Time Systems," IEEE Computer Society Press, April 1992.
[10] L. R. Welch, B. Ravindran, R. Harrison, L. Madden, M. Masters and W. Mills, "Challenges in Engineering Distributed Shipboard Control Systems," The IEEE Real-Time Systems Symposium, December 1996.
[11] L. R. Welch, B. Ravindran, B. Shirazi and C. Bruggeman, "Specification and analysis of dynamic, distributed real-time systems," in Proceedings of the 19th IEEE Real-Time Systems Symposium, 72-81, IEEE Computer Society Press, 1998.
[12] S. Zhou, "LSF: Load Sharing in Large-scale Heterogeneous Distributed Systems," Proc. Workshop on Cluster Computing, 1992.