`
Multiprocessor Organization—A Survey
`Philip H. Enslow Jr.
`
CONTENTS

INTRODUCTION
  Motivations
  Multiple-Computer Systems
  Definition of a Multiprocessor
MULTIPROCESSOR HARDWARE SYSTEM ORGANIZATIONS
  Time-Shared/Common-Bus Systems
  Crossbar Switch Systems
  Multiport Memory Systems
  Comparison of the Three Basic System Organizations
MULTIPROCESSOR OPERATING SYSTEMS
  Operating System Facilities Provided
  Organization of Multiprocessor Operating Systems
PAST, PRESENT, AND FUTURE OF MULTIPROCESSING
  Development of Multiprocessing
  Current Multiprocessors
  System Performance
  Cost Effectiveness
  Future Trends
FURTHER READINGS
REFERENCES
`
`+
`
This paper is concerned with improvements at the level of system organization; it deals specifically with a special class of system organizations—systems known as multiprocessors.
`
`Multiple-Computer Systems
`
Of course, not all multiple-computer systems are multiprocessors. An obvious example of a multiple-computer system that is not a multiprocessor is a system with a stand-alone peripheral or satellite processor. Perhaps less obvious examples are the various forms of coupled systems (both loosely and closely coupled) such as the IBM ASP (Attached Support Processor) System and others having direct electrical connections. Specific examples and a complete discussion of the evolution of multiple-computer systems, as well as an introduction to parallelism in the basic uniprocessor, are given in Enslow [9].
Naturally, there are many similarities between multiple-computer systems and multiprocessors since both are motivated by the same basic goal—the support of simultaneous operations in the system; in fact the distinctions are often not clear-cut, as is exemplified by the frequent use of the term "multiprocessing" in instances where it is not appropriate. However, there is an important difference between multiple-computer systems and multiprocessors, and it is based on the extent and degree of sharing [8]: A multiple-computer system consists of several separate and discrete computers (even though there may be direct communication between them), whereas a multiprocessor is a single computer with multiple processing units.
`
`Definition of a Multiprocessor
`
A multiprocessor is defined in the American National Standard Vocabulary for Information Processing as "a computer employing two or more processing units under integrated control." That definition is good as far as it goes, but it does not go far enough. Certainly the requirement that a multiprocessor have "integrated control" is extremely important, for a multiprocessor must have a single integrated operating system; however, the concepts of sharing and interaction, which are at the core of the techniques of multiprocessing, are not included in the ANSI definition.
`
With respect to the hardware, a multiprocessor must have the capability for the direct sharing of main memory by all processors (arithmetic/logic unit and control unit only) and the sharing of input/output devices by all memory and processor combinations (Figure 1). Although there may be some qualifications on the sharing of all resources of a particular type (one example—private memory—is discussed below), the basic concept of total sharing is still valid.

FIGURE 1. Basic multiprocessor organization.

The important aspect of "interaction" is the level at which it occurs. In multiple-computer systems the physical unit of interaction is usually the complete file or data set. In a true multiprocessor the level of interaction allowed must be more flexible and in fact must be allowed to descend to even the smallest physical unit. Interaction must be possible with files, data sets, and even data elements. From the control point of view, interaction must be possible between complete jobs and tasks as well as between individual job steps.
It is the combination of these expanded concepts of sharing and of interaction at all levels that completely characterizes the hardware and software required to provide a "true" multiprocessor, which can now be defined by the following system characteristics (restated as a programmatic checklist after the list):

• A multiprocessor contains two or more processors of approximately comparable capabilities.
• All processors share access to common memory.
• All processors share access to input/output channels, control units, and devices.
• The entire system is controlled by one operating system providing interaction between processors and their programs at the job, task, step, data set, and data element levels.
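As a compact restatement, the following Python sketch encodes these four characteristics as a checklist. It is purely illustrative: the type, its field names, and the set of interaction levels are hypothetical conveniences, not part of the ANSI definition or of any standard.

from dataclasses import dataclass

@dataclass
class SystemDescription:
    comparable_processors: int     # processors of approximately comparable capability
    shared_common_memory: bool     # all processors share access to common memory
    shared_io: bool                # shared I/O channels, control units, and devices
    single_operating_system: bool  # one integrated operating system controls all units
    interaction_levels: set        # levels at which interaction is possible

def is_true_multiprocessor(s: SystemDescription) -> bool:
    # all five interaction levels named in the text must be supported
    required = {"job", "task", "step", "data set", "data element"}
    return (s.comparable_processors >= 2
            and s.shared_common_memory
            and s.shared_io
            and s.single_operating_system
            and required <= s.interaction_levels)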
`
`MULTIPROCESSOR HARDWARE SYSTEM
`ORGANIZATIONS
`
The key to the classification of multiprocessor systems is the interconnection subsystem, and the factors that are most important are the topology and the operation of this central portion of the system. There have been a number of very good taxonomies prepared which focus on interconnection networks [1, 5, 18]; however, these reviews have been concerned primarily with the details of interconnection itself rather than with the characterization of the systems in which particular interconnection subsystems are embedded. Examining the nature of the processor-to-memory switch in a manner similar to Conway [6], this author has identified three fundamentally different system organizations used in multiprocessors:

• Time-shared or common bus
• Crossbar switch matrix
• Multiport memories

Although the entire scope of interconnection schemes is much larger and certainly much more complex than the coverage presented here, these categories nonetheless form a useful base for a discussion of the organization of multiprocessor systems, and each of these interconnection techniques is presented below in the context of providing the central portion of such systems.
It should also be noted that there are several other system organizations that have been utilized to achieve parallelism. Some of these are: asymmetrical or nonhomogeneous systems (e.g., CDC 6000 series); array and vector processors; pipeline processors; and associative processors.
The first of these other systems may be quite close to a "true" multiprocessor if the operating system supports the proper levels of interaction. The latter three examples of system organization are discussed in other papers in this special issue. In addition, there is one other class of system organization that exhibits many of the characteristics of a multiprocessor, and that is fault-tolerant systems; however, the motivation for the development of such systems and the goals of their design are quite different from those of a true multiprocessor system.
`
Time-Shared/Common-Bus Systems
`
FIGURE 2. Time-shared/common bus system organization—single bus.

The simplest interconnection system for either single or multiple processors is a common communication path connecting all of the functional units. This technique has been used to assemble some simple multiprocessors (Figure 2)—"simple" in that the interconnection subsystem can be merely a multiconductor cable. Such an interconnection system is often a totally passive unit having no active components such as switches or amplifiers. Transfer operations are controlled completely by the bus interfaces of the sending and receiving units. The unit wishing to initiate a transfer, a processor or an I/O unit, must first determine the availability status of the bus, then address the destination unit, determine its availability and capability to receive the transfer, tell the destination what to do with the data being transferred, and finally initiate the transfer. A receiving unit only has to recognize its address and respond to the control signals from the sender. These are the basic concepts, although the entire operation is not actually that simple. (The single bus in the PDP-11, the UNIBUS, has 56 lines to provide the control lines and data paths necessary to transfer words of only 16 bits.) It is possible to simplify this process somewhat by the use of a centralized bus controller/arbiter, but such an approach does have negative effects on system reliability and flexibility.
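The transfer sequence just described can be made concrete with a small simulation. The following Python sketch is a didactic model only, with hypothetical names throughout; it is not a description of the UNIBUS or of any actual bus interface logic.

class Bus:
    def __init__(self):
        self.busy = False          # only one transfer may proceed at a time

class Unit:
    def __init__(self, address):
        self.address = address
        self.ready = True
        self.buffer = {}

    def receive(self, command, data):
        # a receiving unit only recognizes its address and responds
        # to the control signals from the sender
        self.buffer[command] = data

def transfer(bus, units, dest_address, command, data):
    if bus.busy:                   # 1. determine the availability of the bus
        return False
    bus.busy = True
    dest = next((u for u in units if u.address == dest_address), None)
    if dest is None or not dest.ready:
        bus.busy = False           # 2-3. address the destination and check that
        return False               #      it can accept the transfer
    dest.receive(command, data)    # 4-5. say what to do with the data, then transfer
    bus.busy = False
    return True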
The hardware changes required to add or remove functional units are usually quite minimal. Often all that is required is to physically attach or detach the unit. The units in the system are required to know what other units are present and to know their unit and internal location addresses, but that requirement is basically a software problem. The quantity and types of functional units are transparent to the interconnection subsystem. This type of interconnection subsystem is, by its very nature, quite reliable, and its cost is relatively low; however, it does introduce a single critical component in the system that can cause a system failure as a result of a malfunction in any of the bus interface circuits.
Of course these benefits of simplicity and low cost do not accrue without entailing other limitations—in particular the serious limitation on overall system performance that results from having only one path for all transfers—since the total overall transfer rate within the system is limited by the bandwidth and speed of this single path. Interconnection techniques that overcome this weakness add to the complexity of the system.
The first step in solving this problem might be to provide two one-way paths (Figure 3), since this addition does not appreciably increase system complexity or diminish reliability. On the other hand, a single transfer operation in such a system usually requires the use of both buses, so not much is actually gained.

FIGURE 3. Time-shared/common bus system organization—unidirectional buses.

FIGURE 4. Time-shared/common bus system organization—multiple two-way buses.
The next step would be to provide multiple two-way buses (Figure 4) to allow multiple simultaneous transfers; however, the complexity of the system would be greatly increased. No longer would the interconnection subsystem be a totally passive unit; logic, switching, and other control functions would now have to be associated with each point at which functional units were attached to the transfer buses.
`
An example of a system utilizing separate time-shared buses for memory access and for input/output transfers is the MIT/IL ACGN computer, a fault-tolerant system designed for space applications (Figure 5). Another system utilizing a set of multiple transfer buses is the Plessy System 250, which was developed for communications applications¹ (Figure 6). The 250 has one bus per processor in the system. Other examples of systems employing the time-shared bus technique for interconnection are the IBM STRETCH, the Univac LARC, the CDC 6600 (for transfers between main memory and the peripheral processors), and several minicomputer systems such as the Lockheed SUE.

¹ The System 250 is also well known for the hardware included in the system to support the direct implementation of protection through the use of capabilities.

FIGURE 5. MIT/IL ACGN computer.

FIGURE 6. Plessy System 250—medium system configuration [25].
`
A recently developed system utilizing a series of multiple and separate buses is the PLURIBUS minicomputer multiprocessor [16, 22]. The basic processor is the Lockheed SUE minicomputer. There are three types of buses—processor, memory, and input/output. There are seven processor buses with two processors and two 4K memories attached to each (Figure 7(a)); there are also two memory buses, each with two 8K memory units (Figure 7(b)); and, finally, there is one input/output bus plus an input/output bus extension (Figure 7(c)). These buses are all interconnected to form a single system of 14 processors (Figure 8).

FIGURE 7. PLURIBUS bus structures [16].

FIGURE 8. PLURIBUS prototype system [16].
`
`Crossbar Switch Systems
`
If the number of buses in a time-shared bus system is increased, a point is reached at which there is a separate path available for each memory unit (Figure 9). The interconnection subsystem is then a "nonblocking" crossbar. The adjective "nonblocking" is usually omitted since one characteristic of the crossbar switches used in multiprocessor systems is that they are "complete" with respect to the memory units (i.e., there is a separate bus associated with each memory, and the maximum number of transfers that can take place simultaneously is limited by the number of memory boxes and the bandwidth-speed product of the buses rather than by the number of paths available).

FIGURE 9. Crossbar (nonblocking) switch system organization.

The important characteristics of a system utilizing a crossbar interconnection matrix are the extreme simplicity of the switch-to-functional-unit interfaces and the ability to support simultaneous transfers for all memory units. To provide these features requires major hardware capabilities in the switch. Not only must each cross-point be capable of switching parallel transmissions, but it must also be capable of resolving multiple requests for access to the same memory module occurring during a single memory cycle. These conflicting requests are usually handled on a predetermined priority basis, e.g., input/output has highest priority in all conflicts, processor number 2 has primary access priority to memory 2, etc. The result of the inclusion of such a capability is that the hardware required to implement the switch can become quite large and complex. An example that has been cited is a system with twenty-four 32-bit processors and 32 memory units (Lehman, cited in [2]). The number of circuits required in the switch matrix for this system would be two to three times the number required for an IBM System 360 Model 75. Although large scale integration (LSI) can reduce the size of the switch, it will have little effect on its complexity.
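The predetermined-priority resolution just described can be illustrated with a short sketch. The following Python fragment is a toy model assuming the example rule quoted in the text (input/output wins all conflicts, and processor n has primary access priority to memory n); the names and data layout are hypothetical and do not describe any actual switch's logic.

def arbitrate(requests, memory_id):
    """requests: (kind, number) pairs contending for one memory module
    during a single memory cycle; returns the winning request."""
    def priority(request):
        kind, number = request
        if kind == "io":                           # I/O has highest priority
            return (0, number)
        if kind == "cpu" and number == memory_id:  # processor n has primary
            return (1, number)                     # access to memory n
        return (2, number)                         # remaining processors by number
    return min(requests, key=priority) if requests else None

# Each module arbitrates independently, so requests addressed to distinct
# modules all proceed in the same cycle: the "nonblocking" property.
cycle_requests = {0: [("cpu", 1), ("io", 3)],
                  1: [("cpu", 0)],
                  2: [("cpu", 2), ("cpu", 3)]}
granted = {m: arbitrate(r, m) for m, r in cycle_requests.items()}
# granted == {0: ("io", 3), 1: ("cpu", 0), 2: ("cpu", 2)}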
A characteristic of somewhat lesser general importance, but one which can be significant in specific instances, is the capability to expand the size of the system merely by increasing the capacity of the switch. Usually there are no changes required in any of the functional units because of the very simple interfaces utilized, and the switch may be designed so that its capacity can be increased simply by adding additional modules of crosspoints. One effect of LSI on the crossbar interconnection system is the feasibility of designing crossbar matrices for a larger capacity than initially required, equipping them only for the present requirements. Expansion would then be facilitated, requiring only the addition of the missing crosspoints. This discussion of "easy" expansion has addressed only the hardware. The modification of the operating system to support the larger system may prove to be difficult if it was not properly designed for expansion; however, this is true for all multiprocessor system organizations regardless of the interconnection technique employed.
In order to provide the flexibility required in access to the input/output devices, a natural extension of the crossbar switch concept is to use a similar switch on the device side of the I/O processor or channel (Figure 10). The hardware required for the implementation is quite different and not nearly so complex because controllers and devices are normally designed to recognize their own unique addresses. The effect is the same as if there were a primary bus associated with each I/O channel and crossbuses for each controller/device.

FIGURE 10. Crossbar switch system organization with I/O crossbar switch matrix.
A system utilizing a variation of the crossbar is the Burroughs Multi-Interpreter System [7, 23]. For the purpose of this discussion, which is concerned primarily with interconnection systems, it is sufficient to note that the basic building block of the system is a microprogrammed "interpreter" (Figure 11). The microprograms in these units can be changed dynamically so that a single interpreter can function as a FORTRAN translation machine at one time, later change to an ALGOL execution machine, then change to function as an input/output processor, etc. It is obvious that the interconnecting switch for this system must be extremely flexible. As can be seen in the figure, the switch resembles the earlier diagram of a multiple-shared-bus system; however, it can also be considered as a crossbar if there are enough independent paths for all memories to be accessed simultaneously. This is an excellent example of the blending together of these two concepts and the major characteristic that differentiates them: the crossbar provides nonblocking simultaneous memory access, and the multiple-shared bus provides flexibility in the routing of interconnection paths.

FIGURE 11. Burroughs multiple interpreter system organization.
`
There are a number of examples of systems utilizing crossbar interconnection systems. The first true multiprocessor, the Burroughs D-825 [AN/GYK-3(V)], had the switch distributed among the memory modules with five cables entering each switch module (three of these were normally used for processors with the other two for input/output). The switch module interfaces had the logic circuitry necessary to accommodate and queue simultaneous memory access requests. The D-825 also utilized a crossbar switch matrix for connection to selected input/output devices. The organization of the D-825 then is quite similar to that shown in Figure 10. A system employing a classic separately identifiable crossbar matrix is the RCA 215 (Figure 12). In this particular system the switch is designated the signal distribution unit (SDU). Figure 12(b) illustrates the assignment of memory access priorities within the SDU.

FIGURE 12. RCA 215 multiprocessor. (b) Signal distribution unit.

A major research project involving a crossbar interconnection system is the C.mmp, the Carnegie-Mellon multi-miniprocessor (Figure 13). The scope of this research project encompasses the investigation of economical techniques for interconnection as well as in-depth studies of the operating system and of overall system performance [27-29]. The processor units utilized in the system are various models of the DEC PDP-11. The specific features of the implementation of this processor exhibit some deviations from the classic design for a multiprocessor. The first of these is that each processor has associated with it a block of dedicated private memory. This block is used to support the dedicated memory locations used in the PDP-11 interrupts and traps; however, it would have been possible to send the traps through the crossbar. Another feature is a separate unit functioning as the address translator for all accesses to the shared memory, since the address space of the main memory greatly exceeds that of the PDP-11 itself. Finally, each input/output device is associated with a single processor and cannot be shared. This again is an accommodation to the UNIBUS structure of the basic PDP-11. The UNIBUS is also used for access to a special bus that supports interprocessor communication.

FIGURE 13. C.mmp—the Carnegie-Mellon multi-miniprocessor.
`
A short historical digression: The earliest known system employing a crossbar-type interconnection switch was the Ramo-Wooldridge RW-400, the "Polymorphic Computer" system, developed for the U.S. Air Force for large command and control installations (Figure 14). The primary emphasis in this system design was on attaining very high system availability, particularly as seen from each control position. Perhaps the most important feature of the system was the real-time interaction of users through large display and control consoles; the purpose of the "central exchange" shown in the figure was to permit the consoles to be connected to any computer to change functions and to provide backup. It can be seen that the system does not fit the definition of a multiprocessor offered earlier, for it is not possible to share memory. The RW-400 was thus only another multiple-computer system; however, it served an important role in the development of the crossbar concept.

FIGURE 14. Ramo-Wooldridge RW-400 system—the "polymorphic computer."
`
Multiport Memory Systems
`
If the control, switching, and priority arbitration logic that is distributed throughout the crossbar switch matrix is concentrated at the interface to the memory units, a multiport memory system is the result (Figure 15). This system organization is well suited to both uni- and multiprocessor system organizations and is used in both. The method often utilized to resolve memory access conflicts is to assign permanently designated priorities to each memory port; the system can then be configured as necessary at each installation to provide the appropriate priority access to various memory boxes for each functional unit (Figure 16). Except for the priority associated with each, all of the ports are usually electrically and operationally identical. In fact, the ports are often merely a row of identical cable connectors, and electrically it makes no difference whether an I/O or central processor is attached. Specifically, a system utilizing 8-port memory units may have any mixture of processor and I/O units subject to the restrictions that there must be at least one of each and the total is eight or less. The priority for memory access associated with each processor or input/output channel is then established by the selection of the connector used for cabling that unit to the memory.

FIGURE 16. Multiport-memory system organization—assignment of memory port priorities.
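A sketch of this port-priority scheme may help. The following Python fragment assumes an 8-port unit in which the lowest-numbered connector carries the highest priority; all names are hypothetical, and no particular vendor's design is implied.

class MultiportMemory:
    def __init__(self, num_ports=8):
        # port 0 is the highest-priority connector, port 7 the lowest
        self.ports = [None] * num_ports    # attached unit per port

    def attach(self, port, unit_name):
        # "cabling" a unit to a connector fixes its access priority
        self.ports[port] = unit_name

    def grant(self, requesting_ports):
        """Resolve one cycle's conflicting requests: the lowest-numbered
        (highest-priority) requesting port wins."""
        return min(requesting_ports) if requesting_ports else None

mem = MultiportMemory()
mem.attach(0, "I/O channel 1")   # I/O typically cabled to a high-priority port
mem.attach(1, "processor 1")
mem.attach(2, "processor 2")
winner = mem.grant([1, 2])       # both processors request: port 1 wins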
The flexibility possible in configuring the system also makes it possible to designate portions of memory as "private" to certain processors, I/O units, or combinations thereof (Figure 17). This type of system organization can have definite advantages in increasing security against unauthorized access and may also permit the storage of recovery routines in memory areas that are not susceptible to modification by other processors; however, there are also serious disadvantages in system recovery if the other processors are not able to access control and status information in a memory block associated with a failed processor. One system that utilizes the private memory concept is the PRIME system, designed at the University of California at Berkeley [4].

FIGURE 17. Multiport-memory system organization—including private memories.
The multiport memory system organization also can support nonblocking access to the memory if a "fully-connected" topology is utilized. Since each word access is a separate operation, it also permits the exploitation of interleaved memory addresses for access by a single processor; however, for multiple processors, interleaving may actually degrade memory performance by increasing the number of memory access conflicts that occur as all processors cycle through all memory following a sequence of consecutive addresses. Interleaving also results in the effective loss of more than one module of memory when there is a failure. With multiple processors it is often preferable to utilize the property of "locality of reference" and not attempt to increase the effective memory speed by interleaving.
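The two address-to-module mappings at issue can be stated precisely. The following Python sketch contrasts interleaved assignment with the "blocked" assignment that exploits locality of reference; the module count and size are arbitrary assumptions for illustration only.

NUM_MODULES = 4
MODULE_SIZE = 16384   # words per module (assumed)

def interleaved_module(addr):
    # consecutive addresses fall in consecutive modules
    return addr % NUM_MODULES

def blocked_module(addr):
    # consecutive addresses stay within a single module
    return addr // MODULE_SIZE

# One processor streaming through addresses 0,1,2,3 touches all four
# modules under interleaving (cycles can overlap) but only one module
# under blocking; conversely, several processors each working in their
# own region conflict repeatedly under interleaving but rarely under
# blocking, and a module failure under interleaving punches holes
# throughout the address space rather than removing one block.
assert [interleaved_module(a) for a in range(4)] == [0, 1, 2, 3]
assert [blocked_module(a) for a in range(4)] == [0, 0, 0, 0]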
There are a number of examples of multiport memory systems. Because of its flexibility and low cost for uniprocessor organizations, it is the most commonly found organization in large systems. The larger Honeywell systems utilize system controllers, each with eight ports for connection to processors, input/output multiplexers, front-end processors, and bulk storage subsystems (Figure 18). The UNIVAC 1108 has a Multiple Modular Access (MMA) unit associated with each memory bank of 65K words (Figure 19). The MMA has five priority-ordered ports. Another example of a multiported memory system utilized to interconnect multiple processors is the IBM System 360 Model 67 (Figure 20).

FIGURE 18. Honeywell multiprocessor system organization.

FIGURE 19. UNIVAC 1108 multiprocessor system.

FIGURE 20. IBM System 360 Model 67.
`
Comparison of the Three Basic System Organizations

A number of factors can be considered in comparing the three basic organizations described above or evaluating their use in specific applications. The most obvious are cost, reliability, flexibility, growth potential, and system throughput and transfer capacity.

Time-shared bus:

• Lowest overall system cost for hardware.
• Least complex (the interconnection bus may be totally passive).
• Very easy to physically modify the hardware system configuration by adding or removing functional units.
• Overall system capacity limited by the bus transfer rate (this may be a severe restriction on overall system performance).
• Failure of the bus is a catastrophic system failure.
• Expanding the system by the addition of functional units may degrade overall system performance (throughput).
• The system efficiency attainable (based on the simultaneous use of all available units) is the lowest of all three basic interconnection systems.
• This organization is usually appropriate for smaller systems only.

Crossbar:

• This is the most complex interconnection system.
• The functional units are the simplest and cheapest since the control and switching logic is in the switch. (The interfaces to the switch are simple and usually require no bus couplers.)
• Because a basic switching matrix is required to assemble any functional units into a working configuration, this organization is usually cost-effective for multiprocessors only.
• There is a potential for the highest total transfer rate.
• System expansion (addition of functional units) usually improves overall performance.
• There is the highest potential for system efficiency.
• There is a potential for system expansion without reprogramming of the operating system.
• Theoretically, expansion of the system is limited only by the size of the switch matrix, which can often be modularly expanded within initial design or other engineering limitations.
• The reliability of the switch, and therefore the system, can be improved by segmentation and/or redundancy within the switch.
• It is usually quite easy to partition the system to remove malfunctioning units or to establish separate systems.
`
Multiport memory:

• Requires the most expensive memory units since most of the control and switching circuitry is included in the memory unit.
• The characteristics of the functional units permit a relatively low-cost uniprocessor to be assembled from them.
• There is a potential for a very high total transfer rate in the overall system.
• The size and configuration options possible are determined (limited) by the number and type of memory ports available; this design decision is made quite early in the overall design process and is difficult to modify.
• A large number of cables and connectors are required.
All of the above characteristics are self-explanatory except those dealing with the overall system transfer capacity and the resulting system throughput. The time-shared bus organization obviously places the most severe limitations on the total transfer capability of the system, since it is limited by the bandwidth and speed of the single transfer bus. Consideration should be given to the number of units or modules attached to the bus, for each attachment loads the bus and reduces its speed while increasing the number of control signals on the bus. For example, since there can be only one memory transfer proceeding at any one time, there are advantages to using large memory modules rather than small ones. Also, as additional active functional units such as processors and input/output controllers are added to the bus, the number of requests for bus access increases, resulting in an increase in the number of conflicts that must be resolved; the overall effect is a smaller number of effective transfers. (A similar result is obtained with a heavily loaded telephone exchange.) Many of these problems are avoided in the crossbar because of the nature of the distribution of the conflict resolution circuitry. In the crossbar system there is also the extremely important advantage of having separate paths to each memory unit so that all memory modules are available for simultaneous use.
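The effect of adding active units to a single time-shared bus can be illustrated with a crude Monte Carlo sketch. The model below is an assumption-laden toy (one request served per bus cycle, a fixed request probability per unit), not a measurement of any real system; it merely shows throughput saturating at one transfer per cycle no matter how many units contend.

import random

def effective_transfers_per_cycle(num_units, request_prob=0.5, cycles=10000):
    served = 0
    for _ in range(cycles):
        requests = sum(1 for _ in range(num_units)
                       if random.random() < request_prob)
        if requests:
            served += 1   # the single bus serves at most one request per cycle
    return served / cycles

# As units are added, offered load rises but service is capped at 1 per
# cycle, so the excess requests become conflicts; a crossbar, with a
# separate path per memory unit, does not share this ceiling.
for n in (1, 2, 4, 8):
    print(n, round(effective_transfers_per_cycle(n), 3))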
`
The multiport memory organization centralizes the conflict resolution function at the interface to the memory, and, although it is possible to configure a fully connected topology with a multiport system, the problem