`
`US 20030041173A1
`
`(76)
`
`(57)
`
`ABSTRACT
`
`as) United States
`a2) Patent Application Publication co Pub. No.: US 2003/0041173 Al
` Hoyle (43) Pub. Date: Feb. 27, 2003
`
`
`(54) SYNCHRONIZATION OBJECTS FOR
`(52) US. Cle seeeeeseeeeeseeeesessnsretttnesnnnnnnerneenenerenreceeet 709/248
`MULTI-COMPUTER SYSTEMS
`Inventor: Stephen L. Hoyle, Mountain View, CA
`(US)
`
`Correspondence Address:
`HEWLETT-PACKARD COMPANY
`Intellectual Property Administration
`P.O. Box 272400
`Fort Collins, CO 80527-2400 (US)
`
`(21) Appl. No.:
`(22)
`Filed:
`
`09/928,115
`Aug. 10, 2001
`Publication Classification
`
`(SL)
`
`Tint, C1 oeseeeceesseeeseeetteeeeseeeeeeesessssseesentens GO6F 15/16
`
`Several multiprocessor computer systems, each having its
`own copyofan operating system,are interconnected to form
`a multi-computer system having global memory accessible
`by any processor on any node and including provision for
`spinlock accesscontrol. In this environment, a global mutex,
`and other like synchronization objects, are realized that can
`control
`the coordination of multiple threads running on
`;
`|
`multiple processors and on multiple nodes or platforms.
`Each global mutex is supported by a local operating system
`shadow mutex on each nodeor platform where threads have
`opened access to the global mutex. Global nutex function-
`ality is thus achieved that reflects and utilizes the local
`operating system’s mutex system.
`
`
`LOC.
`MEMORY
`
`
`
`
`YO
`
`CONTROLLER
`
`
`MEMORY AND
`
`21
`
`
`
`
`
`
`cemeeeeeeeeeeee
`
`4
`
`GLOBAL
`
`GLOBAL
`
`
`MEMORY
`MEMORY
`43
`
`MUTEX
`
`51
`
`42
`
`SPINLOCK
`
`Google Exhibit 1030
`Google v. VirtaMove
`
`Google Exhibit 1030
`Google v. VirtaMove
`
`
`
`Patent Application Publication
`
`Feb. 27, 2003 Sheet 1 of 13
`
`US 2003/0041173 Al
`
`Le
`
`WOOT
`
`AYOWSAN
`ONVAYOWSWN
`
`YATIONLNOD
`
`Ol
`
`£aunbjJ
`p2oTNss)
`Launn|ev
`AYOWSAN
`WEOTD
`
`SY
`
`
`
`
`
`
`Patent Application Publication
`
`Feb. 27,2003 Sheet 2 of 13
`
`US 2003/0041173 Al
`
`SEGMENT HEADER
`
`SPIN LOCK
`
`GLOBAL MEMORY SEGMENT
`
`RECORD
`
`GLOBAL DATA
`
`Figure 2
`
`
`
`Patent Application Publication
`
`Feb. 27, 2003 Sheet 3 of 13
`
`US 2003/0041173 Al
`
`
`
`LOCAL MEMORY [THREAD|
`
`NODE 20
`
`THREAD
`
`70
`
`SYNC-SFTW
`
`NODE 10
`LOCAL MEMORY
`
`SYNCRHRONIZATION
`SOFTWARE(FIGS.7-17
`
`; LOCALDATA
`
`RECORD
`
`LOCAL OS
`
`MUTEX
`
`64
`
`68
`
`66 72
`
`Figure 3
`
`
`
`US 2003/0041173 Al
`
`
`
`
`||
`|
`|
`ve
`
`|
`
`|
`
`|
`|
`
`43
`
`|
`|
`|
`||
`||
`:
`|
`|
`|
`|
`|
`|
`|
`
`l
`
`| |
`
`
`
`410-7
`
`GLOBAL
`
`WAST QUEUE
`
`
`
`NOTIFIED
`
`
`
`
`
`
`
`
`1|NODEW NODER NODE a |
`
`
`|
`3
`_
`1
`2
`
`
`||THREADS THREAD THREADS |
`
`
`
`WAITING
`WAITING
`WAITING
`|
`
`
`
`418
`414 |
`|
`Pee 3
`Figure 4
`
`Patent Application Publication
`
`Feb. 27,2003 Sheet 4 of 13
`
`
`GLOBAL MVTEX DATA RECORD
`
`GLOBAL
`403
`OBJECTID
`
`
`
`404
`
`GLOBAL
`STATE
`
`
`
`
`
`
`OWNER
`406
`NODEID
`
`408
`REFERENCES
`
`NODE
`
`:
`|
`|
`!
`
`|
`|
`|
`|
`|
`|
`!
`|
`|
`|
`|
`|
`|
`:
`|
`|
`
`rc
`
`
`
`Patent Application Publication
`
`Feb. 27,2003 Sheet 5 of 13
`
`US 2003/0041173 Al
`
`502
`
`500
`
`GLOBAL
`MEMORY ADDRESS
`
`WAIT COUNT
`
`LOCAL DATA RECORD
`
`GLOBAL
`OBJECTID
`
`504
`
`LOCAL
`REF. COUNT
`
`LOCAL THREAD
`
`Figure 5
`
`
`
`Patent Application Publication
`
`Feb. 27, 2003 Sheet 6 of 13
`
`US 2003/0041173 Al
`
`ei ee Ce C-,
`
`(p14)
`
`WaEO0T9
`
`
`
`ANANDLIVM
`
`Wwa0T9
`
`X3.LNW
`
`ssa00v
`
`(ELOld)
`
`9anol
`
`(2'9td)
`
`XB.LNW
`
`(oL’D14)
`
`f~~"536ANOWSNTvEOTO|
`
`
`
`SSS00V||avao1o||,XS.LNW
`qwao19i!
`
` oo¢hZLOL1(eros)iIINI
`Nao||
`XSLNWWEOTD
`qYuoosYWLvd
`
`Zbbb|“LNIWW8O19JTTTTT7777SEays07)|3LVYINAO
`
`MAHLONY3Sva1au
`
`INSATSISOS
`
`
`(bt“O14)SGON
`XGLNNAYOWAN1¥9077VaLvauO|\99)OFSCON|
`VvasvaTayquooay
`
`
`
`GVaYHLGNadSNSNn
`
`XSLAWW90799)|89
`09olak
`OOLL anrenee|
`LN3A4W907
`
`3SVI13yXALNW
`
`“LNITVWEO1D
`VivdWo0)]
`
`cl
`||||||||||||||||||||||||
`
`bbSls
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Patent Application Publication
`
`Feb. 27,2003 Sheet 7 of 13
`
`US 2003/0041173 Al
`
`CREATE A MUTEX
`
`
`
`702——~
`
`CREATE A NEW GLOBAL
`OBJECT1.D.
`
`
`
`704
`
`ALLOCATE AND INITIALIZE
`A NEW GLOBAL RECORD
`
`706
`
`
`
`
`
`ON THE NODE WHERE THE CALLING
`
`
`THREAD IS RUNNING, CREATE A LOCAL
`MUTEX 66 AND ALOCAL EVENT 70 (BY
`CALLS TO THE LOCAL OPERATING
`
`SYSTEM 62)
`
`
`
`
`508
`
`ALLOCATE AND INITIALIZE A
`NEW LOCAL DATA RECORD,
`AND ASSOCIATE IT WITH
`THE NEW GLOBAL RECORD.
`
`RETURN
`
`Figure 7
`
`
`
`Patent Application Publication
`
`Feb. 27,2003 Sheet 8 of 13
`
`US 2003/0041173 Al
`
`OPEN AMUTEX
`
`IF THE CALLING THREADIS THE FIRST TO
`USE THE GLOBAL MUTEX ON IT'S NODE,
`CARRY OUT STEPS 706 AND 708IN FIG. 7
`TO CREATE ON THE NODE ALOCAL
`MUTEX AND EVENT
`
`INCREMENT THE LOCAL REFERENCE
`COUNT510 (FIG.5) IN THE LOCAL DATA
`
`800
`
`NY
`
`802
`
`804
`
`RECORD 500 OR 502 FOR THE NODE
`
`RETURN
`
`Figure 8
`
`
`
`Patent Application Publication
`
`Feb. 27, 2003 Sheet 9 of 13
`
`US 2003/0041173 Al
`
`900
`
`916
`
`902
`
`
`
`ACQUIRE A MUTEX
`
`IS THE MUTEX
`AVAILABLE ?
`
`IS THE MUTEX IN
`TRANSITION 7?
`
`
` 918
`
`
`
` TAKE
`
`
`
`OWNERSHIP BY
`
`CHANGING THE
`YES
`
`
`GLOBAL STATE
`
`404 TO "OWNED"
`AND PLACING
`
`
`DECREMENT
`THE ACQUIRING
`
`
`
`THREAD'S NODE
`THE
`
`
`
`ID LOCATION 406
`SUSPENDED
`
`
`
`IS THIS THREAD ONE THREAD COUNT!|OF THE GLOBAL
`
`
`
`
`
`512 FOR THE
`MUTEX DATA
`
`
`THAT IS COMING OUT
`NODE FROM
`RECORD400
`
`
`
`
`OF SUSPENSION ?
`THE QUEUE 410
`
`
`NO
`
`
`
`
`MAKE SURE THAT THE NODE
`1.0. OF THE CALLING
`
`
`THREAD'S NODE IS
`
`
`ENTEREDINTO THE GLOBAL
`
`
`WAIT QUEUE 41
`
`
`
`INCREMENT THE
`
`LOCAL THREAD WAIT
`COUNT512
`
`914
`
`
`
`
`HAVE THE LOCAL0.P.
`
`
`THREAD EXECUTION
`
`62 CAUSE THE THREAD
`
`SUSPENDED AWAITING
`TO WAIT ON BOTH THE
`
`
`
`
`LOCAL EVENT AND
`LOCAL MUTEX
`LOCAL MUTEX 66 AND
`
`EVENT 70
`
`
`
`CALL THE LOCALO.S.
`62 TO TAKE
`OWNERSHIP OF THE
`LOCAL MUTEX 66
`
`920
`
`908
`
`CONTINUE THREAD
`EXECUTION
`
`
`
`
`
`RESUME
`
`Figure 9
`
`
`
`Patent Application Publication
`
`Feb. 27,2003 Sheet 10 of 13.
`
`US 2003/0041173 Al
`
`1000
`
`RELEASE A MUTEX
`
`ZERO THE NODEID 406
`WITHIN THE GLOBAL MUTEX
`
`1002
`
`
`
`NO IS THE GLOBAL WAIT|YES
`
`DATA RECORD 403
`QUEUE EMPTY ?
`MUTEX IS AVAILABLE
`ACTIVATED
`
`1004
`
`SET THE GLOBAL STATE
`404 TO INDICATE THE
`MUTEX IS IN TRANSITION
`
`1005
`
`4006
`
`SET THE GLOBAL STATE
`404 TO INDICATE THE
`
`1010
`
`1008
`
`DETERMINE WHICH
`NODEIS TO BE
`
`SET THE NODE'S BIT IN
`THE BIT TABLE THAT
`IDENTIFIES THE NODES
`NOTIFIED 412.
`
`1012
`
`
`
`
`GENERATE A GLOBAL
`
`
`INTERRUPTION TO CAUSE THE
`OPERATING SYSTEM AT THE
`NOTIFIED NODE TO SET THE
`
`TRIGGER FOR THE LOCAL EVENT
`70, ETC. (FIG.11)
`
`Figure 70
`
`
`
`Patent Application Publication
`
`Feb. 27,2003 Sheet 11 of 13.
`
`US 2003/0041173 Al
`
`1100
`
`MUTEX RELEASE GLOBAL INTERRUPT
`
`
`
`
`
`THREAD HAS
`YES
`INTERRUPT
`ACQUIRED LOCAL
`RETURN
`MUTEX ?
`
`
`
`
`TRIGGER AND THEN RELEASE THE LOCAL
`EVENT 70. DETERMINE WHETHER A THREAD
`
`ON THE LOCAL NODE 10 HAS ACQUIRED THE
`LOCAL MUTEX.
`
`
`1106
`
`
`RELEASE ANOTHER
`NODE(FIG. 14)
`
`
`Figure 17
`
`
`
`Patent Application Publication
`
`Feb. 27,2003 Sheet 12 of 13.
`
`US 2003/0041173 Al
`
`1200
`
`CLOSE A MUTEX
`
`
`
`
`IF THIS IS NOT THE LAST THREAD ON THE
`NODE THAT HAS OPENED THIS MUTEX,
`THEN DECREMENT THE LOCAL
`REFERENCE COUNT510.
`
`
`
`
`OTHERWISE DEALLOCATE THE LOCAL
`DATA RECORD 500 AND HAVE THE LOCAL
`
`
`OPERATING SYSTEM 62 CLOSE THE LOCAL
`
`MUTEX 66 AND THE LOCAL EVENT 70
`
`1204
`
`Figure 12
`
`
`
`Patent Application Publication
`
`Feb. 27,2003 Sheet 13 of 13.
`
`US 2003/0041173 Al
`
`1300
`
`N\
`
`GLOBAL MUTEXACCESS
`
`1302
`
`1304
`
`1306
`
`1308
`
`FIND GLOBAL
`MEMORY SEGMENT
`
`ACQUIRE HEADER
`WITH SPIN LOCK
`
`RELEASE THE SPIN LOCK
`
`ACCESS/MODIFY THE
`GLOBAL DATA RECORD
`
`Figure 13
`
`
`
`US 2003/0041173 Al
`
`Feb. 27, 2003
`
`SYNCHRONIZATION OBJECTS FOR
`MULTI-COMPUTER SYSTEMS
`
`BACKGROUND OF THE INVENTION
`
`[0001] The present invention relates generally to the shar-
`ing of resources by multitasking computer systems, and
`more particularly to arrangements for controlling access to
`computing resources that should only be used by onetask at
`a time in a multi-computer environment.
`
`they
`[0002] When computers first came into existence,
`were operated using single instructions that were executed
`one instruction at a time. As computers became more
`powerful, they grew more efficient and eventually were able
`to do many things at once. Today’s computers have the
`ability to perform multitasking. Multitasking is the ability to
`execute more than one task at the same time. A “process”is
`a program that
`is being executed plus the bookkeeping
`information that is used by the operating system to control
`that process. A “task” is also a process, but a “task” may be
`several processes. Whenever a program is executed,
`the
`operating system creates a new task or process for the
`program. Thetask or processis analogous to an envelope for
`the program. It identifies the program with a task or process
`number, and it attaches other bookkeeping information to
`the program.
`
`[0003] Originally, and for a number of years, every com-
`puter contained only one processor or CPU,and there was
`only one way to deliver a set of different
`tasks to the
`processor of the computer—onetask at a time. First task 1
`is processed, then task 2 is processed, and so on. Work on
`task 2 can begin before task 1 is completed, but only by
`stopping the work on task 1 whenever work on task 2 is
`being done, and vice versa.
`
`[0004] Now computers have become more sophisticated,
`and multiple processors are taking the place of single
`processors. On such a multiple processor computer, called a
`“multiprocessor system”(or just “multiprocessor”), any task
`can be assigned to any one of the processors, and work can
`nowactually be done simultaneously upon multiple tasks.
`Since more tasks can be completed in less time this way, a
`multiprocessor system delivers better performance than does
`a computer having only one processor.
`
`[0005] A task or an individual computer program can
`sometimes be viewed as a collection of “subtasks.” If these
`
`subtasks can be organized so that a multiprocessor system
`can execute someof them at the same time without changing
`the results computed by the task or program, then the overall
`task or program can be completed in less time, even though
`the time required to complete each subtask may not have
`changed. Thus, multiprocessor systems enable some indi-
`vidual computer tasks and programsto run faster. Construct-
`ing a task or program asa collection of subtasks that can be
`processed simultaneously is called “parallel programming.”
`Running a task or program as separate subtasks that are
`actually processed simultaneously is called “parallel pro-
`cessing.”
`
`[0006] Originally, parallel programming and parallel pro-
`cessing required that the subtasks of a program or task
`actually be tasks that can run as entirely separate, indepen-
`dent processes. More recently, computer technology has
`been developed that allows tasks, processes, or programsto
`
`be divided into distinct subtasks or subprocesses or subpro-
`grams, processing units that may be called “threads.” Each
`“thread” is a subtask or subprocess that can be delivered
`independently to a different processor. Computer programs
`organized as multiple threads are called “multithreaded
`programs.” Although there is a significant technical differ-
`ence betweentasks or processes on the one hand andthreads
`on the other, the difference is not an important one in the
`context of the invention described below. No formal dis-
`
`tinction will be made between a task or process on the one
`hand and a subtask or thread on the other hand. All such
`entities will be referred to as “threads” in the discussion
`which follows.
`
`“Multi-computer systems” provide an extension
`[0007]
`beyond multiprocessor systems as to how multiple proces-
`sors can be organized for use by multi-threaded tasks. A
`“multi-computer system”(or just mulli-computer) is a group
`of computers, each running its own copy of the operating
`system, that work together to achieve a particular goal. That
`goal is to present their collective computing resources, so
`that they appear to belong as muchaspossible to a single
`operating system running on a single computer, both to
`programs that use the computer’s resources, and also to
`humanbeings that make use of the multi-computer system
`in some way. Typically, there are also hardware resources
`(memory, for example), which are shared and are directly
`accessible by all the computers in the multi-computer sys-
`tem. Just as multiprocessor systems can deliver better per-
`formance than single processor systems, multi-computer
`systems can often deliver better performance than multipro-
`cessor systems. However, constructing programs that run
`well on a multi-computer system can be especially difficult
`unless the multi-computer system itself does a very good job
`of presenting itself to programs as if it were a single
`computer. Mostof the time, this means the multi-computer
`system must hide the fact that there are actually multiple
`operating systems running on the separate computers which
`make up the multi-computer system.
`
`[0008] A multi-threaded task operates in a way similar to
`the way in which a small companyoperates. As an example,
`consider a small company with three departments: manu-
`facturing, sales, and accounting. For the company to run
`efficiently,
`the tasks of each department need to be per-
`formed concurrently. Typically, manufacturing operations
`are not shut down until the items in a previously manufac-
`tured batch have all been sold. Thus, manufacturing and
`sales proceed at the same time. Although invoices cannot be
`prepared for items not yet sold,
`they can and should be
`prepared and processed for previouslysold items even while
`newsales are being negotiated and while a newbatch of
`items is being manufactured. Although the three tasks have
`interdependencies requiring them to coordinate their activi-
`ties, none can be shut down completely while one of the
`other tasks is executed from beginning to end.
`
`[0009] Many software tasks operate under the same con-
`ditions as this company example. They have multiple tasks
`or subtasks that can be executed at the same time as separate
`threads or sets of threads. However, these tasks or subtasks
`also have interdependencies that require coordination: por-
`tionsof onc task that cannot proceed until portions of one or
`more other tasks have been completed. Programminga set
`of such tasks so their work can be properly coordinated
`while they all run simultancously is called “synchroniza-
`
`
`
`US 2003/0041173 Al
`
`Feb. 27, 2003
`
`tion.” Specific programming constructs are used to imple-
`ment synchronization. These are called “synchronization
`objects.”
`
`[0010] A very simple case requiring coordination occurs
`whenseveral tasks need to share a single resource, but the
`resource is such that it can only be used by onetask at a time.
`Avery small business, for example, may have only a single
`phone line that needs to be used for different purposes at
`different times by the two or three people who run the
`business.
`
`in multithreaded computer programs,
`{0011] Likewise,
`frequently need to share computing
`multiple threads
`resources such as data, files, communication channels,etc.
`that can only be used by one thread at a time. To control this
`resource sharing, “synchronization objects” are required that
`allow each thread to take a turn accessing a given resource
`and to prevent other threads from accessing the resource
`while one thread takes its turn.
`
`[0012] Mechanisms that satisfy this property in some
`manner are called “locks.” A particular type of lock often
`used is called a “mutex”, which is a nickname for the words
`“mutual exclusion.” Typically, an operating system, working
`in conjunction with certain hardware features of a processor,
`provides mutex functions that allow threads to acquire,
`release, and wait for mutexes. Once a thread has acquired a
`mutex,other threads cannot acquire the same mutex until the
`first thread releases it. A given mutex is normally associated
`with a particular computing resource, perhaps a specific
`record in a data file. By programming convention, no thread
`is allowed to access the given specific record unless it has
`first “acquired” the associated mutex. In this manner, mul-
`tiple threads can access the given specific record, and each
`thread excludes the other threads from access while it takes
`its turn.
`
`[0013] The present invention is directed towards achiev-
`ing a mutex that is operative in a multi-computer environ-
`ment where each separate computer has its own separate
`copy of the operating system.
`
`ating system code for thread synchronization. Secondly,this
`approach requires access to, and the legal right to modify,
`the operating system source code. Thirdly, because the base
`operating system’s code would have to be modified, the new
`replacement code would have to be thoroughly tested in all
`of the numerous environments that utilize the operating
`system, including multi-and single-processor system envi-
`ronments that gain no benefit from this new code. Changes
`implemented solely to support multi-computer systems thus
`must be tested extensively in non-multi-computer environ-
`ments. Typically, for modern operating systems,this testing
`effort creates a very substantial amount of work that
`is
`difficult to cost justify.
`
`BRIEF SUMMARY OF THE INVENTION
`
`[0016] The present invention provides an effective method
`for extending operating system mutex functionality across
`multiple copies of an operating system where each computer
`is running a separate copy of the operating system butall are
`working together as a multi-computer system. Mutexes
`supported by the present invention are thus usable by any
`thread running on any computer within the multi-computer
`system, but the mutexes present themselves through pro-
`gramming interfaces to the threads just as though each
`mutex was supported only by a single instance of the
`operating system running on a single computer.
`
`invention is a
`the present
`{0017] Briefly summarized,
`multi-computer system having provision for global synchro-
`nization objects which comprises a plurality of multi-pro-
`cessor nodes each having provision for local memory,
`threads, and an operating system having the ability to
`manage local
`synchronization objects, global memory
`accessible to the processors on all the nodes and having at
`least one spinlock; a data structure in memory accessible by
`all the processors wherein one or more records for global
`synchronization objects may be established, said data struc-
`ture including provision for recording in a queuethe identity
`of nodes having threads awaiting access to the synchroni-
`zation object; and a synchronization software system of
`programs established in all the nodes which, at the request
`of a thread running on a node, can create, open, request,
`release, and close a global synchronization object, using the
`above spinlock and data structure and queue of node iden-
`tities to resolve requests for the synchronization object as
`between threads residing on different nodes, and using local
`synchronization objects created by the local operating sys-
`tems on nodes having threads awaiting access to resolve
`requests for the synchronization object between threads
`residing on the same node.
`
`[0014] One wayin which one might create synchroniza-
`tion objects for multi-computer systems and cause these
`synchronization objects to have essentially the same func-
`tionality and the same programming interfaces as do syn-
`chronization objects within a multiprocessing environment
`(which employs only a single copy of an operating system)
`would be to rewrite completely the operating system code
`that manages thread synchronization. New code would be
`addedto the operating system that determines when a mutex
`function is called and whether cach call refers to a local
`[0018] The queue in which is recorded the identity of the
`mutex (accessible only by threads running onasingle local
`nodes having threads awaiting access to the global synchro-
`computer) or to a global mutex (accessible by threads
`nization object may be organized as a FIFO arrangement of
`running on any computer within a multi-computer system).
`the node identifiers ordered in the same order in which
`New code would also be inserted into the operating system
`to support function calls that refer to the global mutex. In
`addition,the different running copies of the operating system
`would need to be modified so that they communicate with
`and knowabout each other and to make sure that threads
`from all the computers receive a chance to acquire a global
`mutex, while also enforcing the required mutex rules of
`sharing for all threads on all platforms.
`
`requests for the global synchronization object are received
`from the threads. And the node identifiers may be moved
`from the front to the back of the queue each time the threads
`on the correspondingly identified node are given an oppor-
`tunity to gain ownership of the local and global synchroni-
`zation objects. Additionally, counts may be maintained for
`cach node of the number of threads awaiting a synchroni-
`zation object, and those counts may be decremented when a
`thread on the corresponding node is granted the synchroni-
`zation object, and the reference to the name of the corre-
`
`[0015] This approachhas several disadvantages. First, this
`approach docs not leverage the valuc of the cxisting oper-
`
`
`
`US 2003/0041173 Al
`
`Feb. 27, 2003
`
`sponding node in the data structure may be removed when
`the count reaches zero. The global synchronization objects
`may be semaphores or mutexes.
`
`invention may also be found in a
`[0019] The present
`method for granting threads running on various multi-
`processor nodes within a multi-computer system ownership
`of a global synchronization object comprising the steps of
`maintaining a record of the state of the global synchroniza-
`tion object as free, owned, or in transition; when a thread
`seeks ownership of the global synchronization object, grant-
`ing the thread, through a spinlock mechanism,access to the
`status of the global synchronization object, and granting the
`thread ownershipif the object is free; but if the object is not
`free (ownedorin transition), adding the thread’s node to a
`queue of nodes having threads awaiting ownership of the
`global synchronization object and permitting the thread to
`seek ownership of a local synchronization object established
`on the thread’s node by a local operating system, but
`temporarily blocking threads on the thread’s node from
`seeking ownership of the local synchronization object and
`forcing them into suspension; and when the global synchro-
`nization object ownershipis released bya thread, placing the
`global synchronization object into its transition state, and
`then arranging for each node in the queue, in turn, to stop
`blocking threads on its node from seeking ownership of the
`local synchronization object, and permitting any thread that
`then gains ownership ofits local synchronization object to
`resume execution and ta gain ownership of the global
`synchronization object if the object is not owned(free or in
`transition), this process continuing until the global synchro-
`nization object is owned or until no more threads seek its
`ownership, at which point the global synchronization object
`enters its free state. Again, the synchronization objects may
`be semaphores or mutexes.
`
`FIG.6 is a diagram illustrating the memory con-
`[0026]
`tents of the local memories and of one global memoryof a
`multi-computer system, indicating with arrows subroutine
`calls and data accesses, and illustrating the use of the
`synchronization software (shown in the followingfigures) to
`create and manage a global mutex.
`
`[0027] FIG. 7 is a flow diagram of the program that
`creates a new global mutex.
`
`[0028] FIG. 8 is a flow diagram of the program that
`permits a thread to open and utilize a global mutex.
`
`[0029] FIG. 9 is a flow diagram of the program that
`permits a thread to wait for and acquire a global mutex.
`
`[0030] FIG. 10 is a flow diagram of a program that
`permits a thread to release a global mutex.
`
`FIG.11 is a flow diagram of a program, launched
`{0031]
`by a global
`interrupt directed to a particular node
`that
`attempts to grant a suspended thread access to an available
`global mutex.
`
`[0032] FIG. 12 is a flow diagram of a programthat
`permits a thread to close and stop using a global mutex.
`
`[0033] FIG. 13 is a flow diagram of a program that
`controls access to global mutex data records using a spinlock
`to coordinate access by multiple threads.
`
`[0034] FIG. 14 is a flow diagram ofa routine, called by
`the global interrupt program (FIG. 11) whena thread is not
`immediately unsuspended,
`that triggers the release of a
`thread on another node to access an available global mutex.
`
`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENTS
`
`
`
`[0035] A. Introduction
`[0020] Andfinally, the invention may be foundinaset of
`
`synchronization software computer programs designed for
`[0036] Before describing the invention, a brief explanation
`use in conjunction with a multi-computer system where
`of the way in which mutexes work on multi-processors will
`individual nodes have their own copies of an operating
`be helpful to provide a reference context for the description
`which follows.
`system with local node synchronization software included in
`the operating system,
`the synchronization software com-
`puter programs being capable of carrying outthe steps listed
`above.
`
`BRIEF DESCRIPTION OF THE SEVERAL
`VIEWS OF THE DRAWINGS
`
`{0021] FIG. 1 is a logical diagram of two or more multi-
`processor computer systems or “nodes” connected in paral-
`lel to form a multi-computer system, with each node having
`both local and global memory, the multi-computer having at
`least one global mutex.
`
`[0022] FIG. 2 illustrates in part the content of a global
`memory segment of the multi-computer system shown in
`FIG.1.
`
`[0023] FIG. 3 illustrates in part the contents of local
`memory on two nodes of the multi-computer system shown
`in FIG. 1.
`
`FIG.4 illustrates the data structure of a global data
`[0024]
`record that resides within the global memory shownin FIG.
`2 andthat is associated with a global mutex.
`
`[0025] FIG. 5 illustrates the data structures ofa local data
`record that resides within the local memory of a node and
`that is associated with a global mutex.
`
`[0037] Suppose three threads have requested ownership of
`a given mutex that is already owned by someother thread,
`and supposeeachthread is willing to stop processing further
`instructions until it acquires the mutex. Suppose, in addition,
`that
`the threads request ownership of the mutcx in the
`following chronological order.
`
`[0038] Thread Al.
`
`[0039] Thread B1.
`
`[0040] Thread A2.
`
`[0041] Normally, Al would be expected to gain ownership
`of the mutex when the current ownerreleases it. Later, Bl
`would be expected to gain ownership when A1 releases the
`mutex, and then A2 when BI releases it. This behavior
`would be the result of a First-In-First-Out or FIFO policy on
`the part of the operating system for managing the outstand-
`ing thread acquisition requests for a given mutex. An appro-
`priate experiment with a given operating system would
`typically demonstrate this behavior, but with exceptions
`under certain circumstances. For example, if B1 or A2 were
`running with a higher scheduling “priority” than A1, then the
`operating system would normally give ownership of the
`mutex to one of them rather than to Al when the current
`
`
`
`US 2003/0041173 Al
`
`Feb. 27, 2003
`
`ownerreleases the mutex. On the other hand, the operating
`system mightnot do this if Al has been waiting in the queue
`for a very long time. Otherwise, it would be possible that A1
`might never acquire ownership of the mutex, regardless of
`howlongit waits in the queue. Thus, operating systems are
`usually designed to give priority to some threads but to
`insure that every thread is eventually given ownership ofthe
`mutex.
`
`[0042] Althoughanygiven operating system will probably
`have a precise deterministic sct of rulcs defining how it
`departs from a pure FIFO policy for managing mutex wait
`queues,
`the details of such rules normally will not be
`exposed and will not be guaranteed to remain unchanged
`from one version of the operating system to the next.
`Without access to the source code of the operating system,
`it would be extremely difficult to know these rulesprecisely,
`and even more difficult to duplicate their effect in new code.
`Accordingly, applications cannot be programmed to depend
`heavily upon the precise workings of any givenset ofrules.
`Applications can only expect the following: Mutex acqui-
`sition requests will mostly be granted using FIFO queue
`ordering, with occasional variations to accountfor different
`thread scheduling priorities; but any thread willing to wait
`for a long time will eventually be guaranteed ownership of
`a mutex. The invention described in this disclosure supports
`this behavior for multi-computer extensions of the mutexes
`provided by a given operating system. It also provides an
`architecture that captures the variations in mutex behavior
`which are provided by different operating systems or by
`different versions of the same operating system, and it
`replicates such behavior as accurately as is feasible and
`logical with respect to global mutexes in a multi-computer
`environment.
`
`In the discussion which follows, each of the com-
`[0043]
`puters forming part of a multi-computer system will be
`referred to as a “node.”
`
`[0044] The following assumptions (set forth in the next
`three paragraphs)
`are made regarding synchronization
`objects (mutexes, semaphores, etc.) provided by the oper-
`ating system running upon each node of the multi-computer
`system:
`
`“Mutex objects” are supported, and they include
`[0045]
`“wait for acquisition” functionality. This means the follow-
`ing behavior can be specified to the operating system when
`a call is made by a thread to acquire a mutex: If the mutex
`is already owned,
`the calling thread is placed in a wait
`queue, and it is blocked from executing any further instruc-
`tions until the mutex is available and can be given to the
`thread as requested.
`
`“Event objects” for signaling to threads are also
`[0046]
`supported. An “event” has one of two possible states:
`signaled or non-signaled. Functions are provided to switch
`betweenthe two states. A “wait for an event” function for an
`eventis also provided with the following property: When the
`function is called referring to a given event, if the event is
`in the “non-signaled state,” the operating system will block
`the calling thread from executing anyfurther instructions
`until a call is made (by some other thread) that switches the
`event to the “signaled state.”
`
`mutexes. When this functionis called, the operating system
`blocksthe calling thread from executing any further instruc-
`tions until both the event has been signaled and the mutex
`can be givento the thread as requested. (The combination of
`functionality set forth in this and the preceding two para-
`graphs are not requirements of the invention. They simplify
`the following description, and they were available on the
`operating system where the prototype version of this inven-
`tion wasoriginally implemented.)
`
`[0048] Now, with reference to FIG. 1, a multi-computer
`system 1 is shown that includes two or more nodes 10 and
`20. Each nodeis a multiprocessor that contains two or more
`processors or CPUs 12, 14, 16, and 18. Each node also has
`a memoryand I/O controller 34 and 36, local memory 38
`and 40 accessible by a node’s CPUs only, and someprovi-
`sion for input and output or I/O 46 and 48 that provides
`access to hard disk drives (not shown),
`to internets and
`intranets (not shown), to printers (not shown), and to other
`external devices, computers, and locations. One or more
`data and command busses 50 and 52 interconnect
`the
`
`memoryand I/O controllers 34 and 36 suchthatall the nodes
`10 and 20 mayaccess the same shared global memory
`segments 42 and 44,
`
`[0049] The following assumptions are made regarding the
`resources that are shared by the nodes 10 and 20 of the
`multi-computer system 1 shown in FIG.1:
`
`[0050] There is at least some global memoryor its equiva-
`lent, such as at least one of the global memory segments 42
`and 44, that is accessible from any node 10 or 20 in the
`multi-computer system 1;
`
`[0051] There is at least one primitive global lock such as
`the global spinlocks 45 and 51 or their equivalent that are
`usable from any node 10 or 20 in the multi-computer system
`1 to lock the global memory segment 42 or 44 during, for
`example, the “read-modify-write” or “read, test, and modify
`if necessary” CPU hardware memory access commandsthat
`are used to implement a mutex, a semaphore, or someother
`synchronization object;
`
`[0052] There is a global interrupt mechanism orits equiva-
`lent, such as a task scheduler. From any node in the
`multi-computer system 1, a processorinterrupt or its equiva-
`lent can be generated and supplied to at least one processor
`on any other node or in the same node. For example, the
`nodes 10 and 20 are shownhaving the global interrupts 47
`and 49; and
`
`resources described in the above
`[0053] The global
`assumptions are supported b