Short Notes

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 5, NO. 8, AUGUST 1994

Low-Latency, Concurrent Checkpointing for Parallel Programs

Kai Li, Jeffrey F. Naughton, and James S. Plank
Abstract-This short note presents the results of an implementation of several algorithms for checkpointing and restarting parallel programs on shared-memory multiprocessors. The algorithms are compared according to the metrics of overall checkpointing time, overhead imposed by the checkpointer on the target program, and amount of time during which the checkpointer interrupts the target program. The best algorithm measured achieves its efficiency through a variation of copy-on-write, which allows the most time-consuming operations of the checkpoint to be overlapped with the running of the program being checkpointed.

Index Terms-Checkpointing, fault tolerance, copy-on-write, multiprocessing, backward error recovery.

I. INTRODUCTION

This short note presents four algorithms for checkpointing and restarting parallel programs running on shared-memory multiprocessors. To test the efficiency of these algorithms, we implemented them on the DEC Firefly multiprocessor [29] and profiled their performance on five benchmark programs. The algorithms range from very simple to more complex, using techniques such as copy-on-write [9], [30] and buffering [26] to realize good performance on the multiprocessor. In the best of these algorithms, most of the checkpoint is taken in parallel with the target program's execution, and when it does interrupt the target, the interrupts are for small, fixed periods of time (under 0.1 s in our implementation).

The main use of checkpointing is to provide the mechanism for performing backward error recovery, a general means of fault tolerance defined in [1]. The strength of backward error recovery is its ability to provide fault tolerance in the presence of unanticipated faults, that is, faults that were not envisioned in the design of the system. No other means of fault tolerance has this property.

Checkpointing can also be used as a means of process migration or coarse-grained job swapping. This is the intended use of the Condor checkpointer [19]. In fact, one can view backward error recovery as merely process migration to the same machine at a different point in time.

All of the checkpointing algorithms presented take a "full checkpoint"; they checkpoint the entire state of the target program. The alternative would be to take "incremental" checkpoints, which save only the portion of the state that has changed since the last checkpoint. We have concentrated on full checkpoints to test the worst-case behavior of these algorithms. The work involved in taking an incremental checkpoint is a subset of the work involved in taking a full checkpoint; therefore, if the algorithms exhibit good efficiency, high concurrency, and low latency in taking a full checkpoint, taking an incremental checkpoint can only exhibit better performance in such measures.
Manuscript received July 7, 1992; revised July 9, 1993. This work was supported in part by the National Science Foundation under Grants CCR-8814265 and IRI-8909795, and in part by the Digital Equipment External Research Program and Systems Research Center.
K. Li is with the Department of Computer Science, Princeton University, Princeton, NJ 08544 USA; e-mail: li@princeton.edu.
J. Naughton is with the Department of Computer Science, University of Wisconsin, Madison, WI 53706 USA.
J. Plank is with the Department of Computer Science, University of Tennessee, Knoxville, TN 37966 USA.
IEEE Log Number 9401208.

II. THE CHECKPOINTING ALGORITHMS

The goal of a checkpointing algorithm is to establish a recovery point in the execution of the program, and to save enough information to reconstruct the state of the program at this recovery point in the event of a failure. In the case of a uniprocessor, a checkpoint can be taken by freezing the processor and saving the state of the central processing unit (CPU) (i.e., the registers) and the state of memory to disk. Note that in saving the state of memory, one need not save the executable code itself, as this can be reconstructed from the executable file. To restore this recovery point, one merely reads the state of memory from disk and then restores the state of the CPU. Thus, checkpointing is exactly like generating a core file (which is, of course, a type of checkpoint). This is the approach taken by Condor [19].

To generate a checkpoint for a shared-memory multiprocessor, one can do the analogous thing: freeze all the processors, then save the states of all of the CPU's and the state of memory to disk. We implemented this simple algorithm and called it sequential, because it performs none of its work in parallel with the program that it is checkpointing.

We set three goals for a good checkpointing algorithm on a multiprocessor.

1) It must be reasonably efficient.
2) It must impose little overhead on the target program. In other words, it should attempt to be maximally concurrent.
3) What overhead it does impose must be of low latency; that is, it should interrupt the target program for only small periods of time.

We believe that reasonable values for these goals are as follows. The overall checkpoint time should be no more than twice the optimal checkpoint time. The checkpointing should add no more than 20% to the running time of the program during the time that it is checkpointing. The latency of interrupts should be kept below 0.1 s. We chose this value because anything longer might be noticeable to a user watching the program's execution.

The sequential checkpointing algorithm is clearly optimal in terms of overall checkpoint time, because it is limited solely by the duration of the disk writes. However, it has zero concurrency, and its latency is the entire checkpoint time. The second algorithm that we implemented attempts to improve the concurrency of checkpointing. We call it main memory checkpointing, because it freezes the processors, saves the checkpoint into a separate address space in main memory, and then restarts the processors. After restarting the processors, the algorithm forks a new thread in the new address space that writes the checkpoint to disk. In this algorithm, the concurrency of checkpointing should be improved over the sequential algorithm, because the execution of the target program is overlapped with the writing of the checkpoint to disk.
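For concreteness, here is a sketch in C (not the authors' Modula-2+) of the main memory algorithm's structure. The routines freeze_all_threads(), unfreeze_all_threads(), and save_cpu_states() are assumed stand-ins for the Taos freeze primitive and thread-state capture described in Section IV, and error handling is omitted.

    /* Sketch of main memory checkpointing (illustration only, not the
     * authors' code). The target is stopped only for the in-memory copy;
     * the disk write overlaps the target's execution. */
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    extern void freeze_all_threads(void);     /* assumed primitives */
    extern void unfreeze_all_threads(void);
    extern void save_cpu_states(void *buf);   /* captures per-thread register state */

    struct shadow { char *mem; size_t len; char cpu[4096]; };

    static void *writer(void *arg)            /* runs concurrently with the target */
    {
        struct shadow *s = arg;
        int fd = open("checkpoint", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        write(fd, s->cpu, sizeof s->cpu);
        write(fd, s->mem, s->len);
        close(fd);
        free(s->mem);
        free(s);
        return NULL;
    }

    void main_memory_checkpoint(char *region, size_t len)
    {
        struct shadow *s = malloc(sizeof *s);
        s->mem = malloc(len);                 /* the "second address space" */
        s->len = len;

        freeze_all_threads();                 /* target stopped only for this block */
        save_cpu_states(s->cpu);
        memcpy(s->mem, region, len);          /* copy the entire state into memory */
        unfreeze_all_threads();

        pthread_t t;                          /* disk writing overlaps the target */
        pthread_create(&t, NULL, writer, s);
        pthread_detach(t);
    }

The sequential algorithm corresponds to keeping the threads frozen and writing the state directly to disk instead of spawning the writer thread.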
To improve the latency of the main memory algorithm, we implemented a third algorithm, which uses copy-on-write [9], [30] to make the main memory checkpoint. Copy-on-write is a technique that uses a processor's virtual memory page protection hardware to make a memory-to-memory copy with low latency. The copy-on-write checkpointing algorithm works as follows. First, it freezes the processors and copies their CPU states to the separate address space; it also sets the page protection bits of all pages in main memory to "read-only." Next, it unfreezes the processors and starts a separate copier thread that copies pages to the new address space and resets the pages' protection to "read-write." If a thread of the target program generates a page access violation, then it must write that page to the new address space before setting its protection to "read-write" and restarting. When the copier thread is done copying, the main memory checkpoint is complete, and the copier thread writes the checkpoint to disk.
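A minimal user-level sketch of that copy-on-write copy, again in C rather than the paper's Modula-2+: the SIGSEGV handler plays the role of the page-fault handler, the region and shadow buffer are assumed page-aligned, and taking a lock or calling memcpy() and mprotect() from a signal handler is not strictly async-signal-safe, so this is illustrative only.

    /* Copy-on-write memory-to-memory copy (illustrative sketch). */
    #include <pthread.h>
    #include <signal.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static char *region, *shadow_buf;
    static size_t npages, pagesz;
    static char *copied;                       /* one "already copied" flag per page */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void copy_page(size_t i)            /* copy page i exactly once */
    {
        pthread_mutex_lock(&lock);
        if (!copied[i]) {
            memcpy(shadow_buf + i * pagesz, region + i * pagesz, pagesz);
            mprotect(region + i * pagesz, pagesz, PROT_READ | PROT_WRITE);
            copied[i] = 1;
        }
        pthread_mutex_unlock(&lock);
    }

    static void on_fault(int sig, siginfo_t *si, void *ctx)
    {
        (void)sig; (void)ctx;
        copy_page(((char *)si->si_addr - region) / pagesz);
        /* the faulting write is retried automatically on return */
    }

    static void *copier(void *arg)             /* background copier thread */
    {
        (void)arg;
        for (size_t i = 0; i < npages; i++)
            copy_page(i);
        /* main-memory checkpoint complete; shadow_buf can now go to disk */
        return NULL;
    }

    /* Called while the target threads are frozen; they are unfrozen afterward. */
    void cow_checkpoint_start(char *base, size_t nbytes, char *shadow)
    {
        pagesz = (size_t)sysconf(_SC_PAGESIZE);
        region = base; shadow_buf = shadow; npages = nbytes / pagesz;
        copied = calloc(npages, 1);

        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = on_fault;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);

        mprotect(region, nbytes, PROT_READ);   /* every page becomes read-only */

        pthread_t t;                           /* copier runs after the unfreeze */
        pthread_create(&t, NULL, copier, NULL);
        pthread_detach(t);
    }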
Finally, we implemented a fourth algorithm, which we call concurrent, low-latency (CLL) checkpointing. It improves upon the copy-on-write algorithm by adding buffering, a standard operating systems technique usually used to hide disk latency during file system writes. Here, the buffering allows the checkpoint to be written to disk at the same time as it is being copied from memory. Specifically, the CLL algorithm allocates a fixed pool of pages (the buffer) in the second address space that the copier thread and page fault handlers fill, and that a new thread, called the writer thread, empties by writing the pages to disk. The writer and copier threads are both started immediately after the processors are unfrozen. All four algorithms are shown graphically in Fig. 1.

[Fig. 1. The four checkpointing algorithms. One panel per algorithm (Sequential, Main Memory, Copy-on-write, CLL), showing the target address space, the saved CPU states, and, where present, the new address space or page pool. The legend distinguishes bytes copied or written while all processors are frozen, bytes copied or written after the processors are restarted, and bytes written after the main memory checkpoint is complete.]
The improvements of the CLL algorithm over the copy-on-write algorithm should be twofold. First, the extra memory requirement of this scheme is fixed: it is the size of the page pool. The copy-on-write scheme needs extra space the same size as the target program's address space, and thus is more likely to cause thrashing. Second, the overall checkpoint time of the CLL algorithm should be less than that of the copy-on-write algorithm, because the CLL algorithm starts writing to disk as soon as the processors are restarted, instead of waiting until a complete main-memory checkpoint has been made.
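The page pool is essentially a bounded producer/consumer buffer. The following C sketch shows that piece in isolation (our illustration; the names pool_put and writer_thread are hypothetical, and the checkpoint file layout is simplified).

    /* The fixed page pool of the CLL algorithm as a bounded buffer. */
    #include <fcntl.h>
    #include <pthread.h>
    #include <string.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define PAGE_SIZE 8192               /* emulated page size used in Section V */
    #define POOL_PAGES 128               /* 1-megabyte pool, as in Section V */

    struct page { size_t index; char data[PAGE_SIZE]; };

    static struct page pool[POOL_PAGES]; /* circular buffer */
    static int head, tail, count;
    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

    /* Producer side: called by the copier thread or by a page-fault handler.
     * When the pool is full this blocks, so pages are then freed only as fast
     * as the writer can push them to disk. */
    void pool_put(size_t index, const char *src)
    {
        pthread_mutex_lock(&m);
        while (count == POOL_PAGES)
            pthread_cond_wait(&not_full, &m);
        pool[tail].index = index;
        memcpy(pool[tail].data, src, PAGE_SIZE);
        tail = (tail + 1) % POOL_PAGES;
        count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&m);
    }

    /* Consumer side: the writer thread drains the pool into the checkpoint
     * file; it is started immediately after the processors are unfrozen. */
    void *writer_thread(void *arg)
    {
        (void)arg;
        int fd = open("checkpoint", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        for (;;) {
            pthread_mutex_lock(&m);
            while (count == 0)
                pthread_cond_wait(&not_empty, &m);
            struct page p = pool[head];
            head = (head + 1) % POOL_PAGES;
            count--;
            pthread_cond_signal(&not_full);
            pthread_mutex_unlock(&m);
            pwrite(fd, p.data, PAGE_SIZE, (off_t)p.index * PAGE_SIZE);
        }
        return NULL;
    }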
A possible concern with the CLL algorithm is what happens when the page pool fills up: pages in the pool are then freed only as fast as they can be written to disk. If the size of the page pool is chosen to be the amount of available physical memory, then the CLL algorithm should still outperform the copy-on-write algorithm, for the following reason. In the copy-on-write algorithm, pages might be swapped to the swap area on disk so that the checkpoint can fit into main memory. If these pages are part of the checkpoint, then they must be swapped back into memory to be written to the checkpoint file. If they are part of the target program, then they will eventually have to be swapped back into memory when the program needs them. In either case, the copy-on-write algorithm performs an extra disk write and read for each swapped page, whereas the CLL algorithm needs no swapping and therefore performs no extra disk writes. This extreme case for both algorithms is tested in our implementation.
III. RECOVERY

Recovering from a checkpoint is straightforward, and is the same for all four algorithms. The processors are frozen, and the contents of main memory are replaced with the contents saved in the checkpoint. The states of the CPU's are restored to their states at the recovery point, which are also saved in the checkpoint. When the processors are restarted, execution of the target program continues from the recovery point.

IV. IMPLEMENTATION

We have implemented all four checkpointing algorithms, as well as recovery, on a DEC Firefly multiprocessor. The Firefly is an experimental shared-memory multiprocessor developed at the DEC Systems Research Center [29]. A Firefly consists of four CVAX processors, each with a floating point unit and a direct-mapped 64-kilobyte cache. The caches are coherent, so that all processors within a single Firefly see a consistent view of shared memory. The operating system for the Firefly is Taos [20], an Ultrix with threads and cheap thread synchronization.

The Firefly on which we tested our algorithms has 16 megabytes of physical memory and a separate input/output (I/O) processor that shares memory with the other four processors, but also has a separate bus connecting the Firefly to I/O devices and the outside world. In our system, the I/O processor is connected to an RD54 disk drive.
The implementation is written in Modula-2+ [24], an extension of Modula-2. The user interface to the checkpointing routines is a single call to setup_checkpoint() at the beginning of the user's program. This call is used either to restore the program's execution to a saved recovery point or to specify how long the checkpointer should wait before interrupting the program to take a checkpoint. Freezing the processors is provided by Taos through a system call. In the absence of such a system call, one could freeze the processors either through interprocessor interrupts, or by protecting all the pages in memory to be "no access" and gaining control of the processors in the page fault handlers.
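As a usage illustration only, the call below shows how such an interface might look from C; the argument list (a checkpoint interval in seconds and a checkpoint file name) is our assumption, since the short note gives the routine's name but not its signature.

    /* Hypothetical C-style rendering of the checkpointer's user interface.
     * The argument list is assumed, not taken from the paper. */
    extern void setup_checkpoint(int interval_seconds, const char *checkpoint_file);

    int main(void)
    {
        /* Either restores a saved recovery point (if one exists) or arranges
         * for a checkpoint to be taken after the program has run for 10 s,
         * matching the experimental setup in Section V. */
        setup_checkpoint(10, "mergesort.ckpt");

        /* ... the target parallel program runs unchanged from here on ... */
        return 0;
    }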
In taking a checkpoint, only the user's address space is saved. The states of the kernel and the file system are not saved. The ramification of this decision is that constructs that rely on kernel and external state, such as remote procedure calls and open file pointers, are not guaranteed to be recoverable. Thus, the programs that we tested were free of such constructs. We view these restrictions as tolerable for two reasons. First, the goal of our implementation is to examine the performance of checkpointing algorithms with regard to the metrics of speed, concurrency, and latency; the goal is not to write a production-level checkpointing system for the Firefly. Second, recoverable kernels have been studied and implemented [6], [21], as have checkpointers for uniprocessors that either provide recovery for read-only and sequential read-write files [19] or rewrite the UNIX file system to be completely recoverable [28]. We cannot justify taking the time to duplicate this work on Taos when the result is so tangential to our experiments.
One of the variables in our implementation is the page size. Although the actual page size on the Firefly's memory management unit is 512 bytes, we emulate different page sizes by varying the number of bytes that are copied to the second address space by the copier thread and by the page fault handlers. Larger pages increase the time required to handle a fault, but they also decrease the number of faults, and, because of locality of reference, they may also decrease the rate at which faults occur.

V. EXPERIMENTS

For our initial experiments, we tested all four checkpointing algorithms on a parallel implementation of merge sort. We also checkpointed four other parallel programs: traveling salesman, matrix multiplication, pattern matching, and bubble sort. Since the results from experiments with these programs were so similar to the results with merge sort, we do not present them here. The merge sort program sorts 250,000 indexed records, where the record size can be changed to modify the size of the program's address space.

In all experiments, the four processors of the Firefly are partitioned so that the target program uses three and the checkpointer uses one. This is to measure the maximal concurrency of our checkpointing methods. The checkpointer waits for the target to run for 10 s, and then it takes one complete snapshot. For the results presented here, the page size is 8 kilobytes, and the page pool size is 1 megabyte. All times represent wall-clock time when the target program and checkpointer have exclusive use of the system. All times are averages of five or more runs.

[Graph 1. Checkpoint time vs. address space size (MBytes), for the Sequential, Main Memory, Copy-on-write, and CLL algorithms.]

[Graph 2. Checkpoint time for large address spaces.]

[Graph 3. Checkpoint overhead.]

Graphs 1 through 4 display the overhead imposed by the four checkpointing algorithms as a function of the size of merge sort's address space. The total checkpoint time displayed in Graph 1 measures the elapsed time from the start of the checkpoint to its conclusion. Graph 2 extends Graph 1 to include the checkpoint time when the address space approaches the size of physical memory. The checkpoint overhead in Graph 3 is the amount of time by which the checkpoint increases the target's running time. Graph 4 displays the overhead as a percentage of the checkpoint's running time; this is equal to the checkpoint overhead divided by the checkpoint time. Concurrency can be calculated as follows:

    concurrency = 100% - overhead.
[Graph 4. Checkpoint overhead percentage vs. address space size.]

[Graph 5. Latency data: initial stop time and maximum fault time vs. address space size. The average trap time is 0.015 seconds.]

[Graph 6. Frequency of page faults, plotted per 0.1-second interval of the checkpoint.]

[Graph 7. Number of page faults vs. page size (KBytes).]

It comes as no surprise that in all four algorithms, checkpoint time is proportional to the size of the address space. Also as expected, the sequential algorithm records the fastest checkpointing time, followed by the CLL algorithm. The other two algorithms take longer to record a checkpoint, because they wait for a complete copy of the checkpoint to exist in main memory before writing the checkpoint to disk. Of these two algorithms, the copy-on-write algorithm takes the longest, because of the extra work it spends processing page faults, and because it copies a page at a time.

As shown in Graph 2, when the address space approaches the size of physical memory, the main memory and copy-on-write algorithms exhibit severe thrashing, because their memory needs far exceed the size of physical memory. The other two algorithms keep their memory needs below the size of physical memory, and therefore do not suffer such harsh penalties. It is worth noting that for all but the smallest address space tested, the pool of pages in the CLL algorithm became completely filled; therefore, some worst-case data is included in the graphs.

Graphs 3 and 4 show that the two algorithms based on copy-on-write display the smallest overhead and therefore the greatest concurrency. This is because these algorithms freeze the processors for the smallest amount of time. Taken as a whole, Graphs 1 through 4 show that the CLL algorithm is clearly the best of the four with regard to the combination of checkpoint time and concurrency. The results that follow pertain only to this algorithm.
Graphs 5 and 6 show latency data for the CLL algorithm. The overhead of checkpointing is divided into two parts: the time that all the threads are stopped initially to protect the address space and save the threads' states, and the time that the target threads are trapped, waiting to process page faults. The first curve in Graph 5 represents the initial stop time as a function of address space size, and the second represents the maximum time that any thread waits as a result of a page fault.

For address spaces up to 3 megabytes, the initial stop time is kept below 0.1 s. Moreover, for all address space sizes, the maximum page fault time is well below our low-latency goal of 0.1 s. Graph 6 displays the frequency of page faults over time for a run with a 4-megabyte address space. In this graph, the checkpoint is broken into 0.1 s intervals, and the number of page faults in each interval is plotted. The purpose of the graph is to show that work is indeed being accomplished by the target threads during the initial phases of the checkpoint.
Note that after an initial burst of nine page faults, the trapping frequency steadies at six faults per 0.1 s for the first second. Then it slows to about four traps per 0.1 s, until there are no more page faults. The average time to process a page fault is 0.015 s. Thus, during the first second of the checkpoint, the threads spend about 0.09 s of each 0.1 s interval processing page faults. Since there are three target threads, this means that the threads spend only one-third of their time processing page faults in the first second; the rest of the time is devoted to completing the merge. Thus, even at the beginning of the checkpoint, when one expects the highest frequency of page faults, the target still performs an adequate amount of work.
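Restating that accounting as arithmetic (our own restatement of the paper's figures):

    6 faults per 0.1 s interval x 0.015 s per fault = 0.09 s of fault handling per interval;
    three target threads provide 3 x 0.1 s = 0.3 s of compute per interval,
    so the fraction lost to faults is 0.09 / 0.3 = 30%, roughly one-third.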
Graphs 7 and 8 display the results of altering the page size, again for a merge sort example with a 4-megabyte address space. As would be expected, the total number of page faults is proportional to the inverse of the page size, whereas the maximum time to process a trap increases almost linearly with the page size. Therefore, the ideal page size is one that significantly decreases the number of page faults while not significantly increasing the maximum page fault time.
[Graph 8. Maximum fault time vs. page size (KBytes).]

These data show that a page size of 16 kilobytes is ideal. The number of page faults is kept relatively small (around 120 faults), and the maximum page fault time is still below 0.1 s. Note that this large page size has one other advantage: if the hardware were built with an actual page size of 16 kilobytes, then upon protecting the address space, it would have to change only 1/32 of the number of page table entries that it currently has to change. This should reduce the initial stop time (the first curve in Graph 5) by the same factor, which would bring it to well under 0.1 s for all address space sizes.

We omit the data for the results of checkpointing the other four benchmark programs (traveling salesman, matrix multiplication, pattern matching, and bubble sort), except to say that their performance in all cases was the same as or better than that of merge sort examples with similar address space sizes.

VI. RELATED WORK

The bulk of the work on, and implementations of, backward error recovery and fault tolerance in parallel and distributed systems has been in database and transaction-processing systems [5], [10], [17], [22], [25]. These schemes benefit from the fact that database computations can be viewed as consisting of atomic transactions. Since we concentrate on general-purpose parallel programs, no such computational model can be assumed.

General-purpose checkpointing has been studied and implemented on both uniprocessors and distributed systems. Proposals and overviews for uniprocessor checkpointing have been provided by [1], [15], [23]. In [28], a backward recovery implementation is described that focuses on a file system for UNIX that is fully recoverable. Reference [19] describes a portable checkpointing system called "Condor," which runs on any commercial uniprocessor and successfully checkpoints a majority of UNIX applications for the purpose of process migration. Neither implementation attempts to provide concurrency or low latency.

There has been much work on designing checkpointers for distributed systems [1], [3], [11]-[14] and for multicomputers [4], [18]. Here the focus of the work is on establishing a consistent recovery point; that is, either synchronizing the processors to define a global recovery point or postprocessing the processors' checkpoints to rebuild a plausible recovery point of the system. This is not a problem in shared-memory multiprocessor checkpointing, because the memory bus provides a simple place to enforce processor synchronization.

Staknis proposed a new memory design called sheaved memory [27] for supporting checkpointing in paged systems. In a sheaved memory, physical page frames can be bundled together, so that data written to one frame in the bundle is simultaneously written to all frames in the bundle. Removing a frame from its bundle would provide a snapshot of that memory page. Building such a memory would be quite costly, and it probably would be used only in special-purpose machines.

Of special note is a recent implementation by Elnozahy, Johnson, and Zwaenepoel [7]. They implement distributed checkpointing and recovery on a network of sixteen Sun 3/60's with a centralized file server. They implement the sequential and CLL algorithms, as well as both incremental and nonincremental checkpointing. They show that incremental checkpointing can reduce the amount of data being checkpointed by up to 97%. More relevantly to this short note, they corroborate our results by showing that the CLL algorithm significantly lowers overhead for incremental checkpointing. They do not show any results concerning the CLL algorithm for nonincremental checkpointing. The idea of using virtual-memory access protection hardware to achieve synchronization for concurrent checkpointing was motivated by both shared virtual memory [16] and real-time, concurrent garbage collection [2].

VII. CONCLUSION

We have presented and implemented a low-latency, concurrent algorithm for checkpointing parallel programs on stock shared-memory multiprocessors. The algorithm requires a constant amount of extra space, no change to the target parallel programs, and no special hardware assistance. Our experiments show that this algorithm meets our performance goals on all five benchmarks: 80% to 97% of its checkpointing executes concurrently with the target programs, checkpoint time is always within 50% of optimal, and the latency is kept under 0.1 s. These goals were met by applying the techniques of copy-on-write [9], [30] and buffering [26] to the checkpointer.

Our algorithms are concerned solely with taking one snapshot with no prior history of the target's execution. For programs with large virtual address spaces, recording the changes between snapshots should be much more efficient than taking each snapshot separately. In the future, our scheme can be combined with [8] to use dirty-page information and calculate snapshots incrementally. Such a method would not impose a large initial stop time. Moreover, the checkpointing time would be reduced because pages that have not been changed since the last snapshot would not be brought into physical memory and written out to disk. Results of distributed checkpointing from [7] support these assertions.

ACKNOWLEDGMENT

We would like to thank G. Swart for his help with modifying Taos, and M. Theimer for his helpful comments.

REFERENCES

[1] T. Anderson and P. A. Lee, Fault Tolerance: Principles and Practice. Englewood Cliffs, NJ: Prentice-Hall International, 1981.
[2] A. W. Appel, J. R. Ellis, and K. Li, "Real-time concurrent collection on stock multiprocessors," in Proc. ACM SIGPLAN '88 Conf. Programming Language Design and Implementation, 1988, pp. 11-20.
[3] K. M. Chandy and L. Lamport, "Distributed snapshots: Determining global states of distributed systems," ACM Trans. Comput. Syst., vol. 3, no. 1, pp. 63-75, Feb. 1985.
[4] F. Cristian and F. Jahanian, "A timestamp-based checkpointing protocol for long-lived distributed computations," in Proc. 10th Symp. Reliable Distrib. Syst., 1991, pp. 12-20.
[5] D. J. DeWitt, R. H. Katz, F. Olken, L. D. Shapiro, M. R. Stonebraker, and D. Wood, "Implementation techniques for main memory database systems," in Proc. ACM SIGMOD Int. Conf. Management of Data, 1984, pp. 1-8.
[6] F. Douglis and J. Ousterhout, "Process migration in the Sprite operating system," in Proc. 7th Int. Conf. Distrib. Computing Syst., 1987, pp. 18-25.
[7] E. N. Elnozahy, D. B. Johnson, and W. Zwaenepoel, "The performance of consistent checkpointing," in Proc. 11th Symp. Reliable Distrib. Syst., 1992.
[8] S. Feldman and C. Brown, "A system for program debugging via reversible execution," ACM SIGPLAN Notices, Workshop on Parallel and Distributed Debugging, vol. 24, pp. 112-123, Jan. 1989.
[9] R. Fitzgerald and R. F. Rashid, "The integration of virtual memory management and interprocess communication in Accent," ACM Trans. Comput. Syst., vol. 4, no. 2, pp. 147-177, May 1986.
[10] R. B. Hagmann, "A crash recovery scheme for a memory-resident database system," IEEE Trans. Comput., vol. C-35, no. 9, pp. 839-843, Sept. 1986.
[11] D. B. Johnson and W. Zwaenepoel, "Recovery in distributed systems using optimistic message logging and checkpointing," J. Algorithms, vol. 11, pp. 462-491, Sept. 1990.
[12] M. F. Kaashoek, R. Michiels, H. E. Bal, and A. S. Tanenbaum, "Transparent fault-tolerance in parallel Orca programs," Tech. Rep. IR-258, Vrije Univ., Amsterdam, The Netherlands, Oct. 1991.
[13] R. Koo and S. Toueg, "Checkpointing and rollback-recovery for distributed systems," IEEE Trans. Software Eng., vol. 13, no. 1, pp. 23-31, Jan. 1987.
[14] T. H. Lai and T. H. Yang, "On distributed snapshots," Inform. Processing Lett., vol. 25, pp. 153-158, May 1987.
[15] B. W. Lampson, "Atomic transactions," in Distributed Systems: Architecture and Implementation, B. W. Lampson, M. Paul, and H. J. Siegert, Eds. New York: Springer-Verlag, 1981, pp. 246-264.
[16] K. Li and P. Hudak, "Memory coherence in shared virtual memory systems," ACM Trans. Comput. Syst., vol. 7, pp. 321-359, Nov. 1989.
[17] K. Li and J. F. Naughton, "Multiprocessor main memory transaction processing," in Proc. Int. Symp. Databases in Parallel and Distributed Systems, 1988, pp. 177-187.
[18] K. Li, J. F. Naughton, and J. S. Plank, "An efficient checkpointing method for multicomputers with wormhole routing," Int. J. Parallel Processing, vol. 20, no. 3, June 1992.
[19] M. Litzkow and M. Solomon, "Supporting checkpointing and process migration outside the UNIX kernel," in Proc. Usenix Winter 1992 Tech. Conf., 1992, pp. 283-290.
[20] P. R. McJones and G. F. Swart, "Evolving the UNIX system interface to support multithreaded programs," Tech. Rep. 21, DEC Syst. Res. Center, Sept. 1987.
[21] J. K. Ousterhout, A. Cherenson, F. Douglis, M. Nelson, and B. Welch, "The Sprite network operating system," IEEE Comput., vol. 21, pp. 23-36, Feb. 1988.
[22] C. Pu, "On-the-fly, incremental, consistent reading of entire databases," in Proc. 11th Int. Conf. Very Large Databases, 1985, pp. 369-375.
[23] B. Randell, "System structure for software fault tolerance," IEEE Trans. Software Eng., vol. SE-1, no. 2, pp. 220-232, 1975.
[24] P. Rovner, R. Levin, and J. Wick, "On extending Modula-2 for building large, integrated systems," Res. Rep. 3, DEC Syst. Res. Center, 1985.
[25] K. Salem and H. Garcia-Molina, "Checkpointing memory-resident databases," Tech. Rep. CS-TR-126-87, Dept. of Comput. Sci., Princeton Univ., 1987.
[26] A. Silberschatz and J. L. Peterson, Operating System Concepts. Reading, MA: Addison-Wesley, 1988.
[27] M. E. Staknis, "Sheaved memory: Architectural support for state saving and restoration in paged systems," in Proc. 3rd Int. Conf. Architectural Support for Programming Languages and Operating Syst., 1989, pp. 96-103.
[28] D. J. Taylor and M. L. Wright, "Backward error recovery in a UNIX environment," in Proc. 16th Ann. Int. Symp. Fault-Tolerant Computing, 1986, pp. 118-123.
[29] C. P. Thacker and L. C. Stewart, "Firefly: A multiprocessor workstation," in Proc. 2nd Int. Conf. Architectural Support for Programming Languages and Operating Syst., 1987, pp. 164-172.
[30] M. M. Theimer, K. A. Lantz, and D. R. Cheriton, "Preemptable remote execution facilities for the V-System," in Proc. 10th ACM Symp. Operating Syst. Principles, 1985, pp. 2-11.