A Survey of Some Theoretical Aspects of Multiprocessing

J. L. BAER
University of Washington, Seattle, Washington 98195

In this paper, several theoretical aspects of multiprocessing are surveyed. First, we look at the language features that help in exploiting parallelism. The additional instructions needed for a multiprocessor architecture; problems, such as mutual exclusion, raised by the concurrent processing of parts of a program; and the extensions to existing high-level languages are examined. The methods for automatic detection of parallelism in current high-level languages are then reviewed, both at the inter- and intra-statement levels. The following part of the paper deals with more theoretical aspects of multiprocessing. Different models for parallel computation, such as graph models, Petri nets, parallel flowcharts, and flow graph schemata, are introduced. Finally, prediction of performance of multiprocessors, either through analysis of models or by simulation, is examined. In an appendix, an attempt is made toward the classification of existing multiprocessors.

Key Words and Phrases: multiprocessing, mutual exclusion, semaphores, automatic detection of parallelism, graph models, Petri nets, flow graph schemata, scheduling, array processors, pipe-line computers.

CR Categories: 6.0, 8.1, 4.32, 5.24
CONTENTS

I. Introduction
II. Language Features to Exploit Multiprocessing
   A. Explicit Indication of Parallelism
   B. The Lock-Out Problem
   C. New Programming Language Concepts
III. Automatic Detection of Parallelism
   A. Detection of Parallelism Within an Arithmetic Expression
   B. Detection of Parallelism Between Statements
      Conditions for Statements Independency
      Automatic Program Analysis
      Discovering the Hidden Parallelism
      Loop Replication
      Experimental Results
IV. Models of Parallel Computations
   A. Generalities Relative to the Modeling of Parallel Computations
   B. A First Look at Directed Graph Models
      Directed Acyclic Bilogic Graphs
      Petri Nets and Extensions
   C. Graph Models With Extensive Control
   D. Models Introducing Data Queueing on the Arcs
   E. Parallel Program and Flow Graph Schemata
   F. Other Models and Current Research
V. Prediction of Performance
   A. Analysis
      Queueing Theory
      Graph Model Analysis
   B. Gross Level Simulation
   C. Fine Level Simulation
VI. Conclusion
Acknowledgment
Bibliography
Appendix: Hardware Systems
   A. Homogeneous Multiprocessors
      Crossbar Switch System
      Multiple-Bus Connected Systems
      Time-Shared Bus System
   B. Nonhomogeneous Multiprocessing Systems
   C. Array Processors
   D. Pipe-Line Computers

Copyright © 1973, Association for Computing Machinery, Inc. General permission to republish, but not for profit, all or part of this material is granted, provided that ACM's copyright notice is given and that reference is made to this publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery.

INTRODUCTION

In this paper we plan to survey some theoretical aspects of multiprocessing. By definition, multiprocessing is the simultaneous processing of two (or more) portions of the same program by two (or more) processing units. Among the latter, the I/O processors are excluded; that is, the overlapping of I/O operations with arithmetic or logical instructions is not considered as multiprocessing in this paper. Multiprocessing must not be confused with multiprogramming, which is the time and resource sharing of a computer system by two (or more) programs resident simultaneously in primary memory. Sometimes we shall use the term parallel processing for either of the above or a combination of both.

If one considers a von Neumann machine as the model for computer systems, there is no chance of parallel processing: when one module in the machine takes over control, all others stand idle. However, the concept of the von Neumann machine has evolved, and the first breaches were an overlap of I/O operations with CPU computations (which led to multiprogramming) and look-ahead features. Then, keen interest in systems with multiple CPU's developed. An increasing number of applications calls for more computing power. A classical example is the weather prediction problem with its stringent deadline requirement. In fact, some computations that should be thought of as parallel (e.g., addition of two vectors) have been transformed into sequential algorithms only because of the restrictions imposed by the architecture of the machines. Thus, multiprocessors seem to be a natural expansion of uniprocessors. Some multiprocessors can be viewed simply as extensions of uniprocessors; that is, different jobs are assigned to different CPU's, but no job may use more than one CPU. Obvious advantages of this design are: speed-up of global throughput; improved reliability with the possibility of graceful degradation; and decreased overhead in switching among jobs in a multiprogramming environment.

Also, one has to note that, while in the early days of computing a large percentage of the dollar value of a system resided in the circuits of the arithmetic and logical units, this is no longer true with the advent of integrated circuits and LSI; furthermore, replication of CPU's does not increase the price of a system as significantly as before. These technological advances allow more easily and economically the building of special-purpose computers--or at least units--which lend themselves to (particular) cases of multiprocessing.

We shall start our survey (Section II) by looking first at the necessary changes in the instruction set of multiprocessor machines, and then at the modifications or extensions induced in the high-level languages for programming multiprocessors. Problems of concurrent control and new language developments will be examined. Section III will deal with automatic recognition of parallelism in a program written for a conventional machine. Both the instruction level and the statement level of languages such as FORTRAN and ALGOL-60 will be examined. The next section will be devoted to modeling of computations for parallel processing. Graph models and program schemata have been used widely, and they will be classified according to their structural properties, such as logic, interpretation, queueing possibilities, etc. Section V will survey the uses of models for prediction of performance; we shall distinguish between analytical prediction through queueing or graph theory, a priori scheduling, and full-fledged simulations. Finally, we shall conclude with some remarks on the future of multiprocessing.

In the appendix we have classified multiprocessors into four categories:

1) homogeneous multiprocessors; i.e., systems composed of a set of identical CPU's.
2) non-homogeneous multiprocessors; i.e., systems for which the arithmetic-logical unit is a set of special-purpose functional units.
3) array processors; i.e., systems composed of a set of identical CPU's (called processing elements), acting synchronously under the control of a common broadcasting unit.
4) pipe-line processors; i.e., systems in which the CPU has the capability of performing operations on streams (vectors) of operands.

These terms are used throughout the paper. Our survey covers the literature up to the first half of 1971, with a few references to later papers. An attempt has been made to keep the bibliography up-to-date.

II. LANGUAGE FEATURES TO EXPLOIT MULTIPROCESSING

Since multiprocessing adds a new dimension to the power of a computer system, it must be reflected in the instruction set of the machine. Furthermore, assembly language macros and extensions to high-level languages must exist to allow programmers to ask explicitly for parallel paths if they choose to do so. It is also reasonable to investigate whether totally new programming language concepts ought not be defined to take advantage of the multiprocessing design. In this section we first review proposals and existing features of assembly and high-level languages that permit explicit indication of parallelism. This leads necessarily to the "lock-out" problem; that is, how parallel paths avoid destroying their shared data. We end this section by looking at proposals for new programming languages geared to parallel processing.

A. Explicit Indication of Parallelism

In some array and pipe-line systems, such as ILLIAC IV, VAMPS, STAR-100, and ASC*, the instruction repertoire possesses vector instructions which allow the treatment, in parallel, of ordered sets of data. Here, we are mainly interested in path parallelism, that is, the concurrent execution of two different parts of the program, or of the same part with different data; this has been called "the problem of flowcharting and programming the actual links" [GILL 58] for a multiprocessor with asynchronous events.

* See Appendix.

The basic concept at the assembly level is that of FORK-JOIN [CONW 63, DENN 66], which is also called SIMU [DREI 58A], SPLIT-ASSEMBLE [LEHM 66], and START-HALT [WIRT 69]. Conway [CONW 63] defines five possible uses of the FORK-JOIN combination. They are:

FORK A:     1. Initiate a process at address A. 2. Continue the current process at the next instruction.
FORK A,J:   1 and 2 as above. 3. Increment the counter at address J.
FORK A,J,N: 1 and 2 as above. 3. Set the counter at address J to N.
JOIN J:     1. Decrement the counter at address J. If zero, initiate processing at address J+1; else 2. Release the processor executing the JOIN.
JOIN J,B:   1. As above. 2. The original value of J is kept and a branch to B (i.e., a wait) is initiated.

Application of these instructions for the control of three concurrent processes is shown in Figure 1. These instructions do not allow a path to terminate per se, that is, without coming to a junction point. QUIT in [DENN 66] or TERMINATE in [LEHM 66] plays this role. Equivalent extensions to ALGOL-like languages have been proposed: DO TOGETHER-HOLD [OPLE 65]; FORK-JOIN-TERMINATE [ANDE 65]; PARBEGIN-PAREND [DIJK 68]; and [WIRT 66] (for optional forking). All of these work on a local block basis.
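In a present-day notation, the FORK-JOIN pattern maps naturally onto thread creation and thread joining. The following is a minimal sketch (not part of the original survey) using Python's threading module; the three process bodies are hypothetical placeholders for the processes of Figure 1:

import threading

# Hypothetical bodies for three concurrent processes (cf. Figure 1).
def process_a(): print("process A done")
def process_b(): print("process B done")
def process_c(): print("process C done")

# FORK A,J,N: set the join counter to N = 3 and initiate the paths;
# each start() initiates a process and control continues here.
paths = [threading.Thread(target=f)
         for f in (process_a, process_b, process_c)]
for p in paths:
    p.start()

# JOIN J: wait until the counter of unfinished paths reaches zero,
# then continue at the instruction past the join.
for p in paths:
    p.join()
print("all paths joined")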
Gosden [GOSD 66] states that most parallelism is to be found in loops and, thus, defines five primitives, some of which allow him to implement easily PARALLEL FOR statements. They are:

PREP:    A Parallel Path Counter, PPC, is initialized (PPC ← 1). (A stack of PPC's is kept in case of nested loops.)
AND(L):  Two-way fork. PPC ← PPC + 1. The process at address L is initiated and the current process is continued at the next instruction.
ALSO(L): As above, but without incrementing the PPC.
JOIN:    PPC ← PPC - 1. If PPC = 0, the PPC is "popped" and processing continues at the next instruction; else the processor executing the JOIN is released.
IDLE:    Terminates a path and releases the processor that was executing it.

Figure 2 shows the realization of a PARALLEL FOR using these instructions. It is to be noted that this scheme is independent of the number of processors available, and that there is no need to state explicitly the relationship between AND and JOIN. Similar ideas can be found in PFORTRAN [DRAU 67], an extension of FORTRAN IV with the additional possibility of limiting the number of processors executing a loop.

[Figure 1. Conway's FORK-JOIN concept.]

[Figure 2. PARALLEL FOR I = 1,N,1 DO BEGIN ... END.]
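A PARALLEL FOR in the sense of Figure 2 corresponds closely to mapping a loop body over a pool of processors. The sketch below is a modern analogue using a thread pool, not Gosden's instruction set; the loop body is a hypothetical stand-in:

from concurrent.futures import ThreadPoolExecutor

N = 10

def body(i):
    # Hypothetical loop body: each iteration may run on any processor.
    return i * i

# PARALLEL FOR I = 1,N DO body(I): the loop body is replicated across
# the pool; the implicit join gathers results once every iteration
# has terminated.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(body, range(1, N + 1)))
print(results)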
Two general-purpose high-level languages, PL/I [IBMC 69] and ALGOL-68 [VANW 69], have incorporated these concepts in their definitions. In PL/I, forking, called asynchronous processing, is done at the procedure level. A procedure called with a TASK option can be performed in parallel with its calling path, and is said to be a subtask attached to the latter. Associated with the parallel subtask is an EVENT option. The joining operation can be programmed in different ways: first, by a RETURN or END statement, which gives back control to the calling task; second, by an EXIT (or STOP in the case of the main procedure), which stops the current task and all its attached subtasks. Finally, a WAIT statement allows suspension of operations in a calling task until all, or only a limited number, of the events of attached subtasks (defined through the EVENT option) have terminated.

In ALGOL-68, parallelism can be present both at the procedure level, by using a par symbol, and at the statement/declaration level, through the concept of collateral clauses. A simplifying explanation of the latter is to say that whenever a ";" is replaced by a "," when separating syntactical units, the actions implied by these syntactical units can be done in parallel instead of sequentially. For example, in:

real x := 1.1, int y := 2, z;
(x := x + 4.3, z := y);

the declarations and initializations of x, y, and z can be performed in parallel; and, after their termination, the two assignments can in turn be done in parallel. This collateral clause concept was already present in APL's original definition [IVER 62]. Several assignment statements could be written on the same line, the respective left and right parts being separated by commas. Because of APL's orientation toward on-line computing, this idea has been abandoned. Parallelism is still present in APL in vector and matrix operations, which makes the language particularly attractive for pipe-line machines.

B. The Lock-Out Problem

When splitting a task into parallel paths, it may happen that two (or more) of these paths want to access the same data. To be consistent, there must be some means of preventing disorderly changes in the shared data base if it is to be updated by at least one of the concurrent processes. This has been referred to as the "lock-out" or "mutual exclusion" problem. The portion of code--in a path--that accesses the shared data is called the "critical section" of that path [DIJK 68].

One way to provide the necessary protection is by having instructions of the form TEST AND WAIT AND SET [LEHM 66] or the equivalent pair LOCK w/UNLOCK w [DENN 66], where w is a one-bit switch. The interpretation is as follows: if one attempts to execute LOCK w when w is already set, the LOCK w instruction is repeated until w is reset; when w is (or becomes) reset, LOCK w sets w and the program path containing the instruction may continue. When UNLOCK w is reached, the switch w is again set to off.

For high-level languages, similar schemes have been proposed. OBTAIN and RELEASE statements [ANDE 65] provide the same functions as the LOCK/UNLOCK pair; a new procedure type, called SHARED, would surround all blocks representing the portions of parallel programs which should not be run concurrently [WIRT 66]. In PFORTRAN [DRAU 67] variables can be either PUBLIC or PRIVATE, therefore allowing the programmer to protect the integrity of his program.

The synchronization of concurrent processes cycling through their critical sections has been thoroughly studied by Dijkstra [DIJK 65, DIJK 68]. The requirements he sets forth are:
1) At any given time only one process is in its critical section.
2) Stopping one process outside its critical section has no effect on other processes.
3) The decision as to which process enters its critical section cannot be postponed indefinitely.
The general solution to the problem can be found in [DIJK 68], where a number of faulty examples are worked out until a satisfactory result is obtained. Only logical and integer variables are used; there is no extension to the usual statements of high-level languages.
However, the solution for n processors has a flaw; namely, while requirement 3) above assumes that some processor will eventually enter its critical section, it may happen that a particular processor is locked out forever. Knuth [KNUT 66] restates that requirement as:
3) Every process wanting to enter its critical section will eventually be allowed to do so.
Furthermore, he gives an algorithm to solve this modified problem.
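To fix ideas, the busy-waiting behavior of the LOCK w/UNLOCK w pair can be sketched as a spin lock. This is an illustrative reconstruction, not code from [DENN 66]; Python's lock.acquire(blocking=False) stands in for an indivisible test-and-set instruction:

import threading

w = threading.Lock()        # the one-bit switch w; acquire(False)
                            # acts as an indivisible test-and-set
shared_counter = 0

def LOCK():
    # Repeat the test until w is reset, then set it and proceed.
    while not w.acquire(blocking=False):
        pass                # spin: the processor stays busy

def UNLOCK():
    w.release()             # set the switch back to off

def path():
    global shared_counter
    for _ in range(10000):
        LOCK()
        shared_counter += 1     # critical section
        UNLOCK()

threads = [threading.Thread(target=path) for _ in range(3)]
for t in threads: t.start()
for t in threads: t.join()
print(shared_counter)           # 30000: no update is lost

The spinning while loop makes visible exactly the waste of processor time criticized next.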
Aside from the difficulty of the programmed solution (a more formal solution is now available [GILB 72]), the main drawback to the above approaches is that processes waiting to enter their critical sections are busy setting flags and accessing common variables, thereby slowing the whole system. Primitive operations were, therefore, defined to circumvent this waste of processor time. Elementary primitive operations similar in nature to the LOCK/UNLOCK pair, but without the requirement of processor looping, were defined by Saltzer [SALT 66] and Lampson [LAMP 68], namely:

wakeup (p): If processor p is logically blocked (i.e., dormant), change its state to active; else set a wake-up waiting switch (wws) to remember the wake-up call.
block (p):  If p's wws is set, reset it; else change p's state to dormant.

It is to be noted that many processes may be blocked waiting for the same condition; therefore, a wakeup list may have to be kept and updated, forcing processes to be cognizant of each other. To allow programs to be more independent, Dijkstra defines a new type of variable, called the semaphore (which can take only non-negative integer values), and two operations on it, V and P, defined as follows:

V(S): S := S + 1;
P(S): L: if S = 0 then goto L else S := S - 1;

with the understanding that these ALGOL-like statements are indivisible and may not be executed simultaneously. In fact, it can be shown [DIJK 68] that binary semaphores are sufficient, at the expense of more complexity in the code. Figure 3 shows an example of the use of semaphores for a simple buffering routine.

begin integer number of queueing portions;
  number of queueing portions := 0;
  parbegin
    producer: begin
      again 1: produce the next portion;
        add portion to buffer;
        V(number of queueing portions);
        goto again 1
      end;
    consumer: begin
      again 2: P(number of queueing portions);
        take portion from buffer;
        process portion taken;
        goto again 2
      end
  parend
end

Fig. 3. Example of semaphore usage.

This concept has now been widely accepted. For example, semaphores have been incorporated as a feature of ALGOL-68, and there have been proposals [WIRT 69] for hardware P and V. However, it is to be noted that the indivisibility of these operations has been criticized. Using a model similar to that used in sequential circuit theory, Bredt [BRED 70A] gives procedures to design a hardware lock-out mechanism free of any assumption about this indivisibility.
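Figure 3 carries over almost verbatim to modern libraries, since Python's threading.Semaphore implements P and V as acquire and release. The sketch below replaces the paper's endless goto loops with a bounded run of ten portions (an assumption made only so that the example terminates):

import threading
from collections import deque

number_of_queueing_portions = threading.Semaphore(0)  # semaphore of Fig. 3
buffer = deque()
PORTIONS = 10                       # bounded run instead of 'goto again'

def producer():
    for i in range(PORTIONS):
        portion = f"portion {i}"    # produce the next portion
        buffer.append(portion)      # add portion to buffer
        number_of_queueing_portions.release()   # V(S): S := S + 1

def consumer():
    for _ in range(PORTIONS):
        number_of_queueing_portions.acquire()   # P(S): wait until S > 0
        portion = buffer.popleft()  # take portion from buffer
        print("processed", portion) # process portion taken

threads = [threading.Thread(target=producer),
           threading.Thread(target=consumer)]
for t in threads: t.start()
for t in threads: t.join()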
C. New Programming Language Concepts

As early as 1962, foreseeing the advent and generalization of multiprocessors, Brown [BROW 62] proposed a method "free, at least above a certain minimal level, from the kind of overspecification of sequence constraints that characterize present programming." This idea has recently been reconsidered. We here report two different approaches to it. In the first [TESL 68], a class of "single assignment" languages is presented and depicted with a particular example, COMPEL (for Compute Parallel). The main characteristics of this class of languages are that no variables are assigned values by more than one statement, and that there is no possibility of side effects. This allows the compiler to detect easily which variables are dependent on previously assigned ones and, thus, to find the parallelism inherent in the program. (We shall return to this problem in the next section.) COMPEL processes only three types of quantities: numbers, lists, and functions. COMPEL also has only one type of statement: the assignment statement. Because of this latter requirement, there must exist a sizable number of built-in functions. For example, an ALGOL for-loop such as:
s := 0; r := r0;
for i := 1 step 1 until n do
begin s := s + a[i] × r; r := r + 2 end;
r1 := r;

will be replaced in COMPEL by

r: each [r0 by 2 for n];
s: + list of (each a) × r;
r1: last (list of r);

The replacement may look rather artificial and clumsy to a programmer; but, as the authors of the paper point out, it may look as natural as FORTRAN to a novice. Another advantage of such an approach is that incremental compilation is readily feasible, since the ordering of the statements is immaterial. Unhappily, this aspect of the language makes it unreadable and undocumentable. Further development and extensions of the single assignment concept can be found in [CHAM 71].

A less revolutionary approach has been taken by Volansky [VOLA 70]. His proposal for a new language is an outgrowth of a graph model of computations used at UCLA, which will be discussed in detail in Section IV. In ordinary high-level languages, data is defined separately from computation and sequencing. Volansky's claim is that the latter should also be separated, since languages for parallel processing involve much more control than those for sequential machines. This would simplify programming for parallel processing, as well as enhance modular compiler design. Postulating a data sublanguage and a computation sublanguage, as in FORTRAN or ALGOL-60, Volansky [VOLA 70] defines a control sublanguage with the following capabilities:
1) Upon completion of a computation, one or more computations can be initiated (FORK).
2) More than one computation may precede another's initiation (JOIN).
3) A set of computations can be looped sequentially (DO-loop).
4) A set of computations can be looped with each set executing in parallel (replication).
5) Back and forth communication must exist between control and computation sublanguages (e.g., for tests and looping).
To satisfy these requirements, each statement has lists of immediate predecessor and successor statements, as well as a set of computations to be activated either in parallel, sequentially looped, or replicated, depending on the type of statement. Branching statements (IF-type) are also present. As with COMPEL, this language is hard to read and, further, hard to write. The amount of work involved in writing a program in such a language makes its practicality dubious. Nevertheless, the idea of separating control and computations can be pursued. By defining a few default options, the read and write abilities will be greatly enhanced. Figure 4a gives an example of a FORTRAN program; Figure 4b, a control sublanguage for its parallel execution [VOLA 70]; and Figure 4c, a middle-of-the-road approach with the description of computational blocks and their parallel control.

Before closing this section on new languages, we should note a now defunct language, TRANQUIL, which was specifically designed for array processors. TRANQUIL [ABEL 69] was an ALGOL-like language with capabilities reflecting the specific hardware. For example, the arrays could be skewed or straight (see Appendix); there were two types of for statements, one for sequential execution and one for replication; and new tests were devised for conditional branching, such as:

If any x ≥ y then ... else ..., or
If all x < y then ... else ...

Loops could be performed either sequentially or in parallel, with a wide range of possibilities for the indexing. TRANQUIL was very much hardware dependent, and could not be considered a general-purpose language.
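Returning to the COMPEL fragment above, the single-assignment discipline can be imitated in any language by binding every name exactly once. The following Python sketch (the sample data and names are hypothetical, and the syntax is of course not COMPEL's) computes the same s and r1 as the ALGOL loop:

# Single-assignment version of the ALGOL loop: every name is bound
# exactly once, so data dependences are explicit and a compiler could
# evaluate independent bindings in parallel.
a = [3.0, 1.0, 4.0, 1.0, 5.0]             # sample data (hypothetical)
r0, n = 2.0, len(a)

r = [r0 + 2 * i for i in range(n)]        # 'r: each [r0 by 2 for n]'
s = sum(ai * ri for ai, ri in zip(a, r))  # 's: + list of (each a) × r'
r1 = r0 + 2 * n                           # 'r1: last (list of r)'
print(s, r1)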
How easily existing languages, with adequate extensions, can be used for some classes of multiprocessors is an open question. In fact, the approach considered in the next section cannot be applied to special-purpose computers or to array and pipe-line processors.
   DIMENSION A(10), B(10), Y(10), X(10)
 1 READ (N,A,B)
 2 SA = 0
 3 SB = 0
   DO 5 I = 1,N
 4 SA = SA + A(I)
 5 SB = SB + B(I)
   DO 7 I = 1,N
 6 X(I) = A(I) + B(I)
 7 Y(I) = X(I)**2
   DO 10 I = 1,N
 8 IF (X(I).LT.SA) GO TO 11
 9 A(I) = X(I) - SA
10 B(I) = Y(I) - SB
11 WRITE (N,A,B)
   END

Fig. 4(a). Sample FORTRAN program.

V1(N) : 'A1,A5';
A1: V2,V3 : A2;
A2: <I=1,N : A3;
A3: V4,V5 : A4;
A4: END> : A9;
A5: <I(1/N) : A6;
A6: V6 : A7;
A7: V7 : A8;
A8: END> : A10;
'A9,A10': <I=1,N : A11;
A11: V8(LV) : ?LV(A14,A12);
A12: V9,V10 : A13;
A13: END> : A14;
A14: V11

Fig. 4(b). Control program for the sample FORTRAN program.

DEC 1 REAL A(10),B(10),X(10),Y(10)
B1 READ (N,A,B), SA = 0, SB = 0,
B2 SA = SA + A(I), SB = SB + B(I),
B3 X(I) = A(I) + B(I), Y(I) = X(I)**2,
B4 IF (X(I).LT.SA) END, A(I) = X(I) - SA, B(I) = Y(I) - SB,
B5 WRITE (N,A,B), END DEC 1,
(i) Computational blocks

SEQ B1, L1, L2, B5, END,
L1 FORK L3, L4,
L3 DO I = 1,N B2, JOIN L5,
L4 DO PAR I = 1,N B3, JOIN L5,
L5 END L1 (2),
L2 DO I = 1,N B4,
(ii) Flow of control

Fig. 4(c). Two-dimensional programs.

III. AUTOMATIC DETECTION OF PARALLELISM

In the previous section we have seen how parallelism could be explicitly stated by the programmer. This section is devoted to the problem of detecting parallelism automatically at the compiler level. We shall restrict the discussion to procedure-oriented languages such as FORTRAN and ALGOL-60. Studies in this area fall into two groups: detection of parallelism within an expression, and detection of parallelism between consecutive statements. The former may find its best use in generating code for non-homogeneous multiprocessors with distinct functional units (e.g., the CDC 6600), while the latter is geared toward homogeneous multiprocessors.

At this point it is worth noting that hardware mechanisms exist to detect possible concurrent execution/decoding of instructions at run time, given a sequential stream of instructions. Tjaden and Flynn [TJAD 70] delineate the three types of dependency that can prevent two instructions from being executed/decoded in parallel; that is:
1) procedural: depending on the flow of control;
2) operational: depending on the allocation of resources; and
3) data: depending on the utilization of the result of some operation for operands of others.
They propose a pre-decoding stack, an extension of a look-ahead unit, which can be used to decode n instructions (with extended opcodes) in such a way that all independent instructions can be initiated in parallel. In a more conventional look-ahead machine, such as the IBM 360/91, a hardware mechanism is included for dispatching floating-point instructions to their respective functional units. This scheme allows maximal concurrency by a tagging process which does not significantly increase the execution time of the instructions [TOMA 67].
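Of these, data dependency reduces to a comparison of read and write sets, essentially the conditions for statement independency treated in Section III.B. The checker below is a hypothetical sketch, not an algorithm from [TJAD 70]:

def independent(stmt1, stmt2):
    """Two statements may run in parallel (data-wise) when neither
    writes a variable that the other reads or writes."""
    r1, w1 = stmt1   # (read set, write set) of each statement
    r2, w2 = stmt2
    return not (w1 & r2) and not (w2 & r1) and not (w1 & w2)

# S1: X = A + B   reads {A, B}, writes {X}
# S2: Y = A * 2   reads {A},    writes {Y}
# S3: A = X + 1   reads {X},    writes {A}
s1 = ({"A", "B"}, {"X"})
s2 = ({"A"}, {"Y"})
s3 = ({"X"}, {"A"})
print(independent(s1, s2))   # True:  only a shared read of A
print(independent(s1, s3))   # False: S3 writes A (read by S1) and reads X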
We now turn our attention to the software aspects of the problem.

A. Detection of Parallelism Within an Arithmetic Expression

Transforming an arithmetic expression into a binary tree representation is a well-known process. Here we are interested in a transformation that not only yields a binary tree but also displays the operations which can be performed concurrently, that is, a transformation which makes use of the associative law of some operators. These operations will be said to be at the same level, where the level of a variable is 0 and the level of the result of an operation involving operands of levels i and j is max(i, j) + 1. The published algorithms (in chronological order: [SQUI 63A, HELL 66, STON 67, BAER 68, RAMA 69]) that perform this transformation can be classified according to their characteristics, namely:

• The type of machine for which they are intended to generate code. All but one of the algorithms cited above are directed toward multi-register CPU's with multiple functional units. The exception is Stone's [STON 67], which is designed for a pipe-line push-down stack computer (see Appendix). From [RAMA 69] one can also deduce the maximum number of registers required to evaluate the expression without extraneous STORE and FETCH operations.

• The possibility of reordering the operands, making use of the commutative law of some operators. Two algorithms [BAER 68, SQUI 63A] take this into account when transforming expressions such as (a + b × c + d) into (a + d) + (b × c).

• The reduction of the number of levels. This depends on the possible reordering of operands and the treatment of unary operators. It seems that Baer and Bovet's algorithm [BAER 68] performs best when applied to a single arithmetic expression. In [SQUI 63A] "blocks" of assignment statements, without procedural dependency, are treated, possibly leading to a better overall tree structure.

• The number of passes over the input string. Stone's algorithm requires only one pass for the parsing; however, comparing it with the other algorithms is difficult since the code generated does not have the same structure. Squire's algorithm is the most complex, and requires scanning the input string back and forth many times. The other methods require either as many passes as there are levels [BAER 68], or this number plus an initial pass to transform the input string into Polish notation.

• The simplicity of the code generated. Hellerman's, and Baer and Bovet's algorithms produce a 3-address code, or quadruple, of the form:

temporary result = operand 1 operator operand 2

This seems to be an appropriate form of intermediate language.

Summaries of these algorithms have already been presented [BAER 68, REIG 69, RAMA 69]. Therefore, we limit ourselves to the illustration of one of them [BAER 68] for the simplified case where the only operators allowed are "+" and "×". As stated earlier, the algorithm uses multiple passes, one per level of the resulting binary tree. During each pass, the input string is scanned once from left to right, the result of a call to the scanning procedure being the triplet LSCOP (operator to the left of the operand), ITEM (the operand), and SCOP (operator to the right of the operand). A double-word stack is used (STACK(1,IS) storing operands, STACK(2,IS) storing operators) for entities which cannot yet be part of a temporary result, either because of precedence constraints or because a complete quadruple cannot yet be generated.
Table 1 shows the parsing of the expression A + B × C × D + E + F, and Figure 5 the resulting binary tree.

It is to be noted that in the above algorithms there is no mention of operational dependency; i.e., it is implied that as many functional units and as many registers as necessary are available. (Procedural dependency is taken care of by forbidding side effects and allowing reordering of the operands. Extra parentheses could be added to the expression if one wanted to prevent some of this reordering.) Resource allocation has been studied in connection with compilation of FORTRAN expressions for the CDC 6600. In a first paper [ALLA 64],
TABLE 1. COMPILATION OF A + B*C*D + E + F FOR PARALLEL COMPUTATION

Pass 1 (input string: A+B*C*D+E+F;)
    temporary results: T1 = B*C, T2 = A+E
    output string: T1*D+T2+F;
Pass 2 (input string ← output string = T1*D+T2+F;)
    temporary results: T3 = T1*D, T4 = T2+F
    output string: T3+T4;
Pass 3 (input string ← output string = T3+T4;)
    temporary result: T5 = T3+T4
    output string: T5;

[Figure 5. Binary tree for the parallel computation of A + B*C*D + E + F.]
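To make the pass-by-pass grouping of Table 1 concrete, here is a sketch of tree-height reduction for the restricted case of "+" and "*" only. It is not the [BAER 68] scanning algorithm itself; it greedily combines the two lowest-level operands of each associative chain, but on A + B*C*D + E + F it reproduces the same pairings as Table 1, up to naming and operand order, and the same final height of three levels:

import heapq
import itertools
from typing import List, Tuple

def reduce_chain(operands, op, quads, levels, counter):
    # Combine the two lowest-level operands first (associativity plus
    # commutativity), so that independent quadruples share a level.
    heap = [(levels[x], next(counter), x) for x in operands]
    heapq.heapify(heap)
    while len(heap) > 1:
        la, _, a = heapq.heappop(heap)
        lb, _, b = heapq.heappop(heap)
        t = f"T{len(quads) + 1}"
        quads.append((t, a, op, b))              # 3-address quadruple
        levels[t] = max(la, lb) + 1
        heapq.heappush(heap, (levels[t], next(counter), t))
    return heap[0][2]

def compile_parallel(expr: str):
    quads: List[Tuple[str, str, str, str]] = []
    levels, counter = {}, itertools.count()
    terms = []
    for term in expr.replace(" ", "").split("+"):
        factors = term.split("*")
        for f in factors:
            levels.setdefault(f, 0)              # variables are level 0
        terms.append(reduce_chain(factors, "*", quads, levels, counter))
    reduce_chain(terms, "+", quads, levels, counter)
    return quads, levels

quads, levels = compile_parallel("A + B*C*D + E + F")
for t, a, op, b in quads:
    print(f"{t} = {a} {op} {b}   (level {levels[t]})")
# Prints: T1 = B * C (level 1), T2 = D * T1 (level 2),
#         T3 = A + E (level 1), T4 = F + T3 (level 2),
#         T5 = T2 + T4 (level 3)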
