`4,742,450
`Duvall et al.
`(45} Date of Patent:
`May3, 1988
`
`(11] Patent Number:
`
`[54] METHOD TO SHARE COPY ON WRITE
`SEGMENT FOR MAPPED FILES
`
`[75]
`
`Inventors:
`
`[73] Assignee:
`
`Keith E. Duvall, Georgetown;
`Anthony D. Hooten, Round Rock;
`Larry K. Loucks, Austin, all of Tex.
`International Business Machines
`Corporation, Armonk, N.Y.
`
`UNIX* file between two UNIX processes being run
`concurrently on two virtual machines in a page seg-
`mented virtual memory virtual machine type data pro-
`cessing system. A Shared Copy-On-Write (SCOW)
`commandis created for the UNIX type operating sys-
`tem which when executed in response to a system call
`from one processes causes the specified UNIX file to be
`mapped to a unique segmentof the virtual memory. A
`[21] Appl. No.: 819,455
`map node data structureis established for storing the ID
`of the unique segmentand for maintaining a count value
`of the number of user sharing the unique segment. A
`[22] Filed:
`Jan. 16, 1986
`system call to the SCOW commandbythe second pro-
`[51]
`Tint. Chat ooessseteeseererseecrteeeeess GOGE 12/08
`cess involving the same UNIX file checks the map node
`[52] USL C1. ooececcececcccsssetessecetteccesereessesseceneeee 364/200
`(58] Field of Search ... 364/200 MS File, 900 MS File—_data structuretoseeif thefile is currently mapped for
`[56]
`References Cited
`the SCOW mode. Sobeequent instructions inthe apple
`cation programs which
`are run concurrently on the
`U.S. PATENT DOCUMENTS
`virtual machines operate on the copyofthefile in the
`4,135,240
`1/1979) Ritchie uc...secseeenes 364/200
`unique segment so that any data that is changed,
`i.e.
`
`“+ 364/200
`4,435,752
`3/1984 Winkelman
`written by one process, is available to be read by the
`4,625,081 11/1986 Lotito et ab. we 379/196 X
`second process.
`Primary Examiner—Archie E. Williams, Jr.
`UNIX is a Trademark of AT&T.
`Assistant Examiner—Florin Munteanu
`Attorney, Agent, or Firm—Richard E. Cummins
`
`ABSTRACT
`[57]
`A method for facilitating the interchange of data in a
`
`APPLICATION
`x
`
`6 Claims, 9 Drawing Sheets
`
`101
`102
`<—__>
`
`PEN FILE
`PROCESS
`READ TNOOE
`
`
`READ/WRITE
`
`DIRECTORY
`INTO MAIN
`
`READ ONLY
`
`REFERENCE
`MEMORY,
`WRITE ONLY
`
`
`
`
`
`
`
`120. SCHMAT(2)[|104
`READ
`-
`
`
`READ
`108
`°
`OPERATION
`ns aE
`
`
`YES
`GET}
`109
`ARTTE
`
`CURRENTLY
`QPERAT
`SEGMENT
`
`
`JON
`APPED? 4 COUNT
`1DLOAD
`
`
`;
`REGISTER],
`Is FILE
`
`
`
`
`NO
`
`122
`[DO STANDARD] no/
`READ/WRITE
`OPERATION
`124
`
`FILE
`CURRENTLY
`MAPPED?
`CONVERT FILE
`OFFSET AND
`LENGTH TO
`SEG + OFFSET
`FOR READ/WRITE
`OR COPY ON
`WRITE SEGMENT
`oer SEG REG
`
`1
`
`\no
`
`PROCESS
`Ar
`@ 130,
`.
`
`SHMAT
`
`(LOGICAL PAGE TABLE ENTRY OFFSET}
`MEMORY MAP —
`Wage|
`PHYSICAL FILE)
`iC
`BLOCK
`(0-2056)| ADDRESS
`a
`
`No
`
`t
`VIRTUAL PAGE
`“NY
`
`12?[atcocate 132
`
`
`
`FILE
`CURRENTLY
`APPED? (1)
`
`131
`FILE
`(TERMINATE)E/mapeco cory
`ON WRITE?
`15 THIS A
`Noeln
`°
`
`134
`GET
`
`NEW BLOCK
`SEGMENT
`FOR PAGE
`
`|
`1D
`(IF_REQUIRED)
`
`DO MEMORY
`COPY INTO/OUT
`OF BUFFER
`PROCESS ‘A’
`
`133
`
`CREATE
`MEMORY
`SEGMENT
`
`YES
`
`(Che-{
`
`LOAD
`SEGMENT
`REGISTER
`15 CALE
`CURRENTLY
`APPED? (214
`MAP_PAGE
`RANGE [5]
`svc
`
`135
`136
`
`145
`
`Google Exhibit 1050
`Google v. VirtaMove
`
`Google Exhibit 1050
`Google v. VirtaMove
`
`
`
`US. Patent
`
`May 3, 1988
`
`Sheet 1 of 9
`
`4,742,450
`
`FIG.
`
`1
`
`—
`
`
`
`APPLICATIONS
`
`PpKERNELas
`
`
`
`APPLICATION DEVELOPMENT PRODUCTS
`
`13
`14
`
`19
`
`
`
`DISPLAY
`
`ia
`
`CAT
`COPROCESSOR
`
`OTHER I/0
`
`20
`
`
`
`
`
`US. Patent
`
`May3, 1988
`
`Sheet 2 of 9
`
`4,742,450
`
`APPLICATION PROGRAM
`— VMI
`
`1S
`
`TONDROGEaMIE
`
`VIRTUAL RESOURCE
`35
`!
`MANAGER
`
`
`__ SARDWARE.
`INTERFACE —————
`[PRocEssoR] [vewoRy] [170CCT
`le
`———
`
`iy
`
`10
`
`
`
`US. Patent
`
`May3, 1988
`
`Sheet 3 of 9
`
`4,742,450
`
`SALAG(heeSALA(Oh)2SLIGcl=GISSILA(ee)e
`
`AMOWAWWauAWNLYLAOooqe.AWALYIA
`
`
`oRTeocoo0|t443344°000qlIN3W94S|aa45d5450INqWogs
`
`
`OLdAlgo0000|SN9S¢j}od00000*0000GN9Se1Q0000000
`
`
`
`11g-OhWYHOI-9NO71t1d-2ewvyd0ud
`
`ONISSSYGGV.ONISSaNGaV
`INaW945|3445444°005444444555
`
`
`aWn9Se19000000"344OLcadvn“
`dW9S2_|Qo000000°20000000002
`
`INaqWwo3s|44s4543°444
`
`LNAWOAS444i4444°354
`gN9Se}o000000°434O/I
`
`0000000°£004GN9ISe100000004
`
`4444444°054SONA
`
`.Gaddvw
`~HOW444454
`
`GNISc|QO000000"1005 om
`
`{NAWDAS4455444°100
`
`€Sls
`
`
`
`INGwods|dsas3s5s541
`
`
`
`GN9S2190000001
`
`
`
`
`
`
`US. Patent—May3, 1988 Sheet4 of 9 4,742,450
`
`
`
`VIRTUAL
`ADDRESS
`
`' |
`
`|
`
`CHASH)
`
`HASH ANCHOR
`TABLE
`
`-
`_
`INVERTED PAGE TABLE
`VIRTUAL ADDRESS
`
`FIG. 4
`
`
`
`
`
`US. Patent—May 3, 1988 Sheet5 of 9 4,742,450
`
`
`
`SEGMENT
`TABLE
`
`PAGE O
`1|XPT FOR
`SEGMENT
`N
`
`XPT FOR
`SEGMENT
`M
`
`
`
`SOFTWARE|
`HASH
`Lo. —— —
`
`PAGE FRAME
`FOR SEGMENT N
`
`
`
`
`
`
`
`
`FIG. 5
`
`
`
`US. Patent
`
`May3, 1988
`
`Sheet6 of 9
`
`4,742,450
`
`FILE
`DISCRIPTOR
`
`TO 1
`
`OF 4096
`SEGMENTS
`PROCESS 'N’
`
`
`CONTEXT
`
`(ADDRESS SPACE)
`120
`
`a
`[| (OYNAMIC)
`PL Map Node
`[sec10-7"
`[wapcount}“?
`COPY ON
`WRITE SEG ID"
`SHARED
`COPY WRITE|75
`SEGMENT
`MAP COUNT
`ID TABLE
`
`
`
`
`
`
`SEGMENT
`REG|OFFSET
`
`
`
`
`SEG ID’s
`
`DATA
`
`
`ADDRESS
`
`
`
`VRM
`MAIN
`FAULT
`HANDLER KA MEMORY
`{VMM)
`MANAGER
`xet peo. 27
`
`SEG ‘X’ PAGE ‘N’
`
`MAIN
`MEMORY
`
`SEG ‘X' PAGE 3
`
`°AOEDRe
`SEG 'X’ PAGE 2
`ae[SESPASE] paceFARE)
`f PROTECT| fn ; PAGE| 83
`
`g!
`
`82
`
`ADDRESS
`
`FIG. 6
`
`DISK
`FILE
`
`|
`| DISK
`|
`PAGE
`| ADDRESS
`a|
`
`Z1
`
`
`
`US. Patent
`
`May3, 1988
`
`Sheet 7 of 9
`
`4,742,450
`
`
`
`(2)€Q3dd¥W
`
`
`
`4DVddVW
`
`
`
`(S)FONVY
`
`JAS
`
`ATLINAUNS
`
`dil4SI
`SAA
`ONA
`
`601LA9
`
`LINAW9AS
`
`ai
`
`qvo?}
`
`LNANDAS
`
`Y31S194uY
`
`LNNOD
`
`Avaya
`
`INSNDIS
`
`801
`
`INaWaYINT]«SSA\andSt
`
`
`ERENEEE:AUNZUUND
`
`
`col
`
`hOL
`
`AYOLI3YHIG
`J3ON35Y3434
`NIVWOLNI
`SLIYM/OV3Y
`AINOavau
`
`AXONAW
`
`ANNO3iL1YM
`
`aviy
`
`
`
`NOILVdsdO0
`
`ALTIUM
`
`NOILVY3dO
`
`O¢l
`
`tcl
`
`(ot
`
`F114N3dOVi
`
`$S330ud
`
`
`
`NOTLVIIIddv¥
`
`iN,
`
`
`
`
`
`
`
`
`
`
`
`US. Patent
`
`May3, 1988
`
`Sheet 8 of 9
`
`4,742,450
`
`<>iV,
`
`
`
`(L3S440AYINSS3VVL39VdWI1907)
`430144ovd®KT]WISH|ayaa
`dVWAXMOWSWSS490ud
`
`FOVdWALYIAJs43LTYM/avae
`
`ceiavonJV3dI|cey
`helZl
`octJaaistoauMENGES
`
`
`(t)20addVW\Cie|iNyONAINSNuND
`'LVWHS
`(Ss)JONveaz"IJ
`
`INSW93SAYOWSN
`
`
`e
`
`13S440+93SOlHLONST
`
`GNV1554350he|A3MW4aLYSANO3
`
`SLIYM/QV3YYOs
`
`NOAdOJYO
`
`
`
`INSND3SSSLIUM
`
`53Yu94S1359
`
`O/TOs
`
`ATLNAYYNI
`3LIMM/GV3Y
`
`Aalld
`
`ONfagavaNvLsSOG
`
`éQaddVW
`
`
`
`NOILVUad0
`
`ccl
`
`ON\VSIHLSI
`
`SLTYM
`
`
`
`NOILVdadd
`
`ALVIOTIW
`
`430168MIN
`
`J9VdYOS
`
`(Q3YINDAY41)
`
`9d|
`
`lel
`
`AYOWANOd
`
`ING/OLNIAdd]
`
`gd4ssnd40
`
`HILVdSTG-3a
`
`
`
`iV,SSIIOUd
`
`6c1
`
`139
`
`LNAWOAS
`
`ai
`
`
`LEL(2)t03ddvW
`ATUINFUUAD(3)
`yast/S3A
`
`Jai
`
`23L1YMNO
`
`AdQ]3GaddVW
`
`VvarisNIWYSL
`(SS330UdANY)
`NOTLVY3d0
`
`
`
`
`
`
`
`
`
`
`
`
`
`US. Patent—May3, 1988 Sheet 9 of 9 4,742,450
`
`
`
`1e1
`
`INODE ENTRY |.
`LBLOCK # %
`BLOCK # Y
`BLOCK # 2
`
`_MAP PAGE RANGE STRUCTURE
`SEGMENT
`MODE
`IDENTIFIER!
`1=WN, 2=R/W,
`"SID BS
`3=CW
`RESERVED
`FIRST PAGE
`BYTE
`|(SEGMENT OFFSET)
`NUMBER
`
`STARTING |
`BLOCK
`
`SC |
`
`NUMBER
`OF BLOCKS
`
`NUMBER
`
`OF BLOCKS
`
`(END OF FILE)
`
`183
`
`184
`
`MAP A FILE TO
`AN EXISTING
`SEGMENT
`CALCULATE OFFSET
`OF FIRST BLOCK
`IN SEGMENT
`FEALCULATE TOTAL
`NUMBER OF PAGES
`
`|
`
`°
`
`
`
`
`
`192
`
`XPT
`
`SID B, BLK Z
`
`SID BIBLK Y
`
`SID B! BLK x
`
`SID A
`SID A]
`SID Al
`
`FIG. 8
`FLOW CHART CF
`MAP PAGE
`RANGE SERVICE
`
`186
`139
`
`188
`
`189
`
`DETERMINE NEXT
`START BLOCK
`ADDRESS-PLACE
`VALUE IN MAP
`PAGE STRUCT
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`SUM THE CONTIGUOUS
`BLOCKS AND PLACE
`SUM IN MAP PAGE
`RANGE STRUCT
`FILE INODE
`
`STRUCTURE SCAN
`COMPLETE?
`NO
`ISSUE MAP PAGE
`RANGE SVC
`
`190
`
`PLACE BLOCK
`ADDRESS FOR EACH
`
`
`PAGE ENTRY INTO
`
`
`EXTERNAL PAGE
`TABLE (INVERTED)
`
`19]
`
`
`
`Ree”
`
`
`
`4,742,450
`
`1
`
`METHOD TO SHARE COPY ON WRITE
`SEGMENT FOR MAPPEDFILES
`
`TECHNICAL FIELD
`
`The invention relates in general to methods for con-
`trolling access to data stored in a virtual memoryof a
`multi-user information handling system which is being
`run under a UNIX* type operating system. The inven-
`tion relates, in particular, to a method which permits a
`user to gain access to a file stored in a virtual memory
`segmentin order to update it, even though another user
`has previously requested access to the same virtual
`Memory segment of the file and is in the process of
`currently updating
`that segment.
`*Unix is a trademark of A.T. & T.
`
`CROSS-REFERENCED APPLICATIONS
`
`U.S. application Ser. No. 06/819,458 filed concur-
`rently herewith in the name of Duvall et al, entitled
`“Method to Control I/O Access in a Multi-Tasking,
`Virtual Memory, Virtual Machine Type Data Process-
`ing System”is directed to a method for use in a multi-
`user paged segmented virtual memory data processing
`system in which a mapped file data structure is selec-
`tively created to permit all I/O operations to the sec-
`ondary storage devices to be executed by simple load
`andstore instructions under the controlof the page fault
`handler.
`
`BACKGROUND ART
`
`The prior art discloses various multi-user virtual
`memory information handling systems. In general, a
`virtual memory system implies a system having a main
`memorythatis relatively fast, but somewhatlimited in
`capacity, because ofits cost, and a backing store device
`whichis relatively slow, but is rather large, since the
`cost of storageperbit is relatively inexpensive. Implicit
`also in a virtual memory system is a paging system
`which functions to control the transfer of data between
`the main memory and the backing store. In practice, the
`main memory is generally a semiconductor memory
`array, while the backing store is generally one or more
`disk drives or files, some of which may even allow the
`media to be replaced by an operator.
`The main memoryhasits own arrangementfor defin-
`ing real address storage locations, as does the disk stor-
`age subsystem. The system, therefore, employs a virtual
`address when requesting data from storage. The Virtual
`Memory Manager (VMM) has the responsibility to
`check that the data at the virtual address is in main
`memory andif not, to transfer the data to main memory
`from the backing store. The specific manner in which
`the Virtual Memory Manager accomplishes the transfer
`varies significantly among the prior art systems, primar-
`ily because of the inherent characteristics of the specific
`hardware, including the conventions adopted for defin-
`ing real addresses of the storage devices and also be-
`cause ofthe differences in the operating systems under
`which the hardwareis being run.
`The motivation for creating a virtual memory type
`system is based primarily on the realization that the cost
`of providing real memory for the system ofa size that
`would support either one complex program, or a num-
`ber of smaller programs which could be run concur-
`rently by one or more users,
`is prohibitive. Further,
`since generally there is no real reason for having the
`entire program resident in main memory, it would be
`
`15
`
`20
`
`25
`
`30
`
`40
`
`45
`
`55
`
`65
`
`2
`more cost effective to store the program data on less
`expensive disk file backing stores and “page” portions
`of the data and program into main memory,as required.
`The paging process, when conducted by the Virtual
`Memory Manager, does not significantly impact the
`overall system performance, since the main processor
`can switch to another task or process which has previ-
`ously been paged into main memory.
`Theprior art virtual memory systems employ various
`operating systems since an operating system is generally
`designed to take advantage of the architecture of the
`processing unit and a particular application or environ-
`ment. Someoperating systems, such as PC DOS,for the
`family of IBM Personal Computers (PCs) and compati-
`bles, is designed primarily for a single user environment.
`On the other hand, the UNIX operating system is de-
`signed primarily for a multi-user environment. The use
`of the UNIX operation system has, for a number of
`technical and non-technical reasons, been somewhat
`restricted to particular systems. As a result, the number
`of application programs that are run under a UNIX
`operating system have, until recently, been also rather
`limited. Multi-user UNIX systems employing virtual
`memory have even been morelimited.
`The manner in which UNIX implements System
`Calls, particularly to storage, is in many respects quite
`advantageous to system performance. In UNIX,
`the
`System Call
`is the interface between UNIX and an
`application program. A System Call by the application
`program requests the “kernel” portion of the UNIX
`operating system to perform one particular task or ser-
`vice on behalf of the operating system. The “kernel”
`portion of UNIX includes approximately 60 System
`Calls which are not changed between different hard-
`ware systems, and are the standard interface to UNIX.
`Other programs in UNIX adoptthe kernel to thepartic-
`ular hardware environment.
`UNIX has a unique file system for managing data
`stored on the systems’ external storage devices, e.g.,
`disk files. While UNIX allowsa file to be accessed by
`many different concurrent users, if the file is to be up-
`dated, additional System Calls are required in order to
`insure that the updating occurs in a serial fashion. These
`additional System Calls function to lock portions of the
`file temporarily, reserving that area for the exclusive
`use of the calling program that is to do the updating.
`This does require involvement by the “kernel” in the
`locking and unlocking tasks and, hence, has an adverse
`effect on overall system performance. The prior art
`non-virtual UNIX systems do, nevertheless, permit the
`concurrent use of the samefile by different users. The
`ability to share a portion of the samefile among various
`users is advantageous for interprogram or interprocess
`communication, in that once the portion of thefile is
`updated by one program,the data is immediately avail-
`able to all the other programs or processes that are
`sharing that segment. The term “process,” in UNIX
`terminology, means simply a program that it
`is cur-
`rently executing.
`function of a typical
`The memory management
`UNIX operating system is a part of the UNIX kernel
`and generally is unique for each different Central Pro-
`cessing Unit. Some processing units require the total
`program to be in memory before any portion of the
`program can be run. Other CPUscan begin execution of
`a program while only a small portion is in active mem-
`ory. The first memory management
`technique is re-
`
`
`
`3
`ferred to as “swapping,” in that different processes or
`programs are run for a given period oftime and then the
`entire program is “swapped” out for another program.
`The second techniqueis the Virtual Memory technique,
`which implies that provision must be made for the mem-
`ory managementfunction to handle page faults, so that
`defined portions or pages of the program can be
`brought into main memory as needed and returned to
`the back-up store when the pages are no longer re-
`quired.
`If the Virtual Memory Managementfunction is left
`with the kernel of the UNIX operating system, the page
`fault mechanism will consumea considerable portion of
`the CPU operating time. As a result, prior art virtual
`memory systems generally prefer to establish a Virtual
`Memory Management function as a separate level of
`programming on a device whose primary function is
`memory management. The page fault mechanism is
`then a part of the memory manager, and the CPUisfree
`from time-consuming tasks of controlling the paging
`operation.
`In the cross-referenced application Ser. No. 819,458,
`a virtual memory data processing system is disclosed in
`which virtual machines are established by a Virtual
`Resource Manager which provides each virtual ma-
`chine with a large virtual memory. In that system, to
`avoid the potential conflicts that arise in some virtual
`memory systems between the operating system’s re-
`quest for I/O disk storage operations and I/O disk stor-
`age operations controlled by the page fault handler, the
`responsibility for performing all 1/O disk storage opera-
`tions was assigned solely to the page fault handling
`mechanism. In addition, the normal UNIX interface to
`the application program by System Calls was supple-
`mented by a mapped page technique. This latter tech-
`nique permitted the application program to employ
`simple load and store type instructions to address mem-
`ory, rather than tie up the system processorin executing
`UNIX System Calls to the disk storage. Anyfile stored
`in a defined segment of virtual memory could be
`mapped ai
`the request of the application program
`which, in effect, established a table of virtual addresses
`and assigned disk block addresses for each page of data
`that was in the defined segment of virtual memory as-
`signed to that file. The table or map was stored in a
`separate “segment” of the virtual memory.
`The “kernel” of the UNIX operating system was
`enhanced to provide a new System Call designated
`“SHMAT_MAP.” The conventional UNIX operating
`system includes a variety of “SHMAT”System Cails,
`each with a slightly different function, such as (1) read
`only,
`(2) read/write,
`(3) copy—on—write, etc. The
`SHMAT_MAP command was also provided with the
`corresponding functions.
`Since the system described in the cross-referenced
`application was designed to operate with applications
`previously written for a conventional UNIX operating
`system, all UNIX System Calls had to be supported.
`The support is transparent to the user, in that any con-
`ventional UNIX System Call from an application pro-
`gram to the UNIX kernelis effectively intercepted by
`the Memory Manager, which then assigns the tasks to
`the page fault mechanism. Thus,
`in that system,
`the
`SHMAT_MAP command further specified whether
`the file was to be mapped, read/write (R/W), read only
`(RO), or copy_on—write (CW). The copy—on—write
`function in UNIX allowsa file in system memory to be
`changed. When the CW file is paged out of real mem-
`
`10
`
`-_ 5
`
`40
`
`45
`
`60
`
`65
`
`4,742,450
`
`4
`ory, it does not replace the permanentfile. A separate
`System Call is required for the copy—on—write file,
`whichis usually in a disk cache, to replace the perma-
`nent copy ofthe file in the secondary storage device.
`Two users who concurrently map a file read/write or
`read only share the same mapped segment. However,
`each user who requests
`to map the same file,
`copy—on_write, at the same time, create their own
`private copy_.on—write segment. The term segment
`implies a section of the virtual address space. Each user
`is permitted to have only one CW segment for a given
`file at one time. The system of the cross-referenced
`application, therefore, is fully compatible with the prior
`art UNIX approachfor shared files.
`This aspect of the commondesign, however, perpetu-
`ates the problem which exists with UNIX files, in that
`the sharing of a mapped file CW segment by multiple
`users is prohibited. The capability of multiple users
`sharing the same mapped file copy_on—write segment
`is highly desirable, and a method of achieving that func-
`tion in systems of the type described in the cross-
`referenced application is the subject of the present in-
`vention.
`
`SUMMARY OF THE INVENTION
`
`In accordance with the method ofthe present inven-
`tion, an additional System Call flag is created for the
`“SHMAT”type System Calls. When this flag is speci-
`fied by the user in combination with the System Call for
`a copy—on—write segment, a common copy_on__write
`segment is created for the mappedfile.
`Thefirst user to request the shared copy__on_—write
`segment for the file causes creation of a common
`mapped file copy—_on_write segment. The segment ID
`for this segment would then be saved in a data structure
`such as the inode data structure for the UNIX file, so
`that any future request for the shared copy_on_write
`segment
`for
`the mapped file causes the common
`copy—on_write segment to be used.
`Also saved in the inode structure is a reference
`counter, used to indicate how many users currently
`have access to the shared segment (CW). Each request
`for the shared copy_on—write segment for the file
`causes the counter to be incremented and each closing
`of the file descriptor by a user accessing the file refer-
`ence by the file descriptor via the copy_on_—write seg-
`ment causes the counter to be decremented. Every time
`the counter is decremented, a check is madeto see if the
`counter has become zero, and if so,
`the shared
`copy—on_write segment is destroyed so that a future
`request for a shared copy—on_write segment for the
`file causes a new shared copy_on_write segmentto be
`traded (and a new segmentID placed in the inode struc-
`ture for thefile).
`All existing mapped file features continue to be sup-
`ported, as described in the cross-referenced application;
`(1) whenevera file is mapped there exists a read/write
`segment for the mappedfile, so that read or write Sys-
`tem Calls reference the file by the mapped file read/-
`write segment; (2) the support of private copy—on_.
`write segments is maintained so that a user can still
`continue to request a private copy_on_write version of
`thefile.
`It is therefore an object of the present invention to
`provide an improved method for a number of data pro-
`cessing system users who are concurrently running
`separate UNIX processes in a page segment virtual
`
`
`
`4,742,450
`
`5
`memory environment to share a copy ofthefile in the
`same segment of virtual memory.
`A further object of the present inventionis to provide
`an improved methodforusers in a virtual memory data
`processing system running a UNIX type operating sys-
`tem to concurrently sharea file that has been designated
`copy—on__write by a SHMATtype UNIX System Call.
`A further object of the present invention is to provide
`anew method for permitting users of a UNIX operating
`system to concurrently share a file that has been opened
`by a shared copy—on—write UNIX System Call by
`employing the same mapped copy—on__write segment
`of the virtual memory.
`Objects and advantages other than those mentioned
`above will become apparent from the following de-
`scription, when read in connection with the drawing.
`BRIEF DESCRIPTION OF THE DRAWING
`
`6
`which is one of the main reasons for requiring a large
`virtual memory type storage system.
`FIG.2 illustrates the relationship of the Virtual Re-
`source Manager 25 to the other components of the
`system. As shownin FIG.2,a virtual machine includes
`one or more application programs such as 22a—22c and
`at least one operating system 30. A virtual machine
`interface 31 is established between the virtual machine
`and the VRM 25. A hardwareinterface 32 is also estab-
`lished between the VRM 25 and the hardware section
`10. The VRM 25 supports virtual memory. It can be
`assumed, for purposes of explanation, that the memory
`capabilities of the hardware shown in FIG.1 includes a
`24 bit address space for system memory 14, which
`equates to a capacity of 16 megabytes for memory 14,
`and a 40 bit address space for virtual memory, which
`equates to | terrabyte of memory. A paged segmenta-
`tion techniqueis implemented for the Memory Manage-
`ment Unit 13, so that the total virtual address space is
`FIG.1 is a schematic illustration of a virtual memory
`divided into 4,096 memory segments, with each mem-
`system in which the method of the present invention
`ory segment occupying 256 megabytes. FIG. 3 illus-
`may be advantageously employed.
`trates the virtual storage model. The processor 12 pro-
`FIG.2 illustrates the interrelationship of the Virtual
`vides a 32 bit effective address which is specified, for
`Resource Manager shownin FIG.1 to the data process-
`example, by the application program. The high order 4
`ing system and a virtual machine.
`bits of the 32 bit address functions to select
`1 of 16
`FIG,3 illustrates the virtual storage model for the
`segment registers which are located in the Memory
`system shownin FIG. 1.
`Management Unit (MMU) 13. Each segmentregister
`FIG. 4 illustrates conceptually, the address transla-
`contains a 12 bit segment ID section, along with other
`tion function of the system shown in FIG,1.
`special control-type bits. The 12 bit segment ID is con-
`FIG.5 illustrates the interrelationships of some of the
`catenated with the remaining 28 bits ofthe initial effec-
`data structures employed in the system of FIG.1.
`tive address to provide the 40 bit virtual address for the
`FIG.6 illustrates the interrelationship of a number of
`system. The 40 bit virtual addressis subsequently trans-
`data structures to the Virtual Resource Manager, the
`lated to a 24 bit real address, which is used to address
`virtual memory, and real memory.
`the system memory 14.
`FIGS. 7A and 7B showaflow chart, illustrating the
`The MMU 13 utilizes a Translation Look-aside
`operation of mapping a file copy—on—write.
`Buffer (TLB) to contain translations of the most re-
`FIG.8 is a flow chart,illustrating the steps involved
`cently used virtual addresses. Hardware is used to auto-
`in completing the data structures shown in FIG.6 by a
`matically update TLB entries from main storage page
`Map page rangeservice.
`tables as new virtual addresses are presented to the
`TLBsfor translation. FIG. 4 illustrates conceptually,
`DESCRIPTION OF THE PREFERRED
`the TLBreload function.
`EMBODIMENT
`The 40 bit virtual addresses are loaded into the TLB
`by looking them up in an Inverted Page Table (IPT), as
`shown in FIG. 4. The table is “inverted” because it
`contains one entry for each real memory page, rather
`than one pervirtual page. Thus, a fixed portion of real
`memory is required for the IPT, regardless of the num-
`ber of processes or virtual segments supported. To
`translate an address, a hashing function is applied to the
`virtual page number (high order part of the 40 bit vir-
`tual address, less the page offset) to obtain an index to
`the Hash Anchor Table (HAT). Each HAT entry
`points to a chain of IPT entries with the same hash
`value. A linear search of the hash chain yields the IPT
`entry and,
`thus, the real page number which corre-
`spondsto the original 40 bit virtual address. If no such
`entry is found,
`then the virtual page has not been
`mapped into the system, and a page fault interrupt is
`taken.
`The function of the Page Fault Handler (PFH)is to
`assign real memory to the referenced virtual page and
`to perform the necessary I/O to transfer the requested
`data into the real memory. The system is, thus, a de-
`mand paging type system.
`When real memory becomes full, the PFH is also
`responsible for selecting which page of data is paged
`out. The selection is done by a suitable algorithm such
`as a clock page replacementalgorithm, where pages are
`
`System Overview: FIG. 1 is a schematic illustration
`of a virtual memory system in which the method of the
`present invention is employed. As shown in FIG. L., the
`system comprises a hardware section 10 and a software
`or programming section 11. Hardware section 10, as
`shown, comprises a processor function 12, a memory
`management function 13, a system memory function or
`RAM 14,system bus 15, an Input/Output Channel Con-
`troller (IOCC) 16, and an Input/Output bus 21. The
`hardware section further includes a group of I/O de-
`vices attached to the I/O bus 21 through the IOCC 16,
`including a disk storage function 17, a display function
`18, a co-processor function 19, and block 20, represent-
`ing other I/O devices such as a keyboard or mouse-type
`device.
`The program section of the system includesthe appli-
`cation program 22 that is to be run on the system, a
`group of application development programs23, or tools
`to assist in developing new applications, an operating
`system kernel 24, which, for example, may be an exten-
`sion of the UNIX system V kernel, and a Virtual Re-
`source Manager program 25, which functions to permit
`a numberof virtual machines to be created, each of
`whichis running a different operating system, but shar-
`ing the system resources. The system may operate,
`therefore,
`in a multi-tasking, multi-user environment
`
`60
`
`65
`
`
`
`7
`replaced based on when the page was last used or refer-
`enced. Pages are transferred out to disk storage.
`The details of the other data structures employed by
`the system shown in FIGS. 1 and 2 are set forth in the
`cross-referenced application, particularly Ser. No.
`819,458, which is herein incorporated by reference.
`Similarly, the data structures which were unique to the
`map file service function of that application are also
`employedin the method of the present invention. Refer-
`ence should be made to FIG.6, specifically to the map
`node data structures 70 and 71. These two structures are
`described in detail in the cross-referenced application.
`The copy_on_write segment
`field 74 and the
`copy—on_write map countfield 75 are the two specific
`fields of the map node data structure employed in the
`method of the present invention to permit concurrent
`use of a copy—on_write segment.
`FIGS.7A and 7B showa flow chart, illustrating the
`operation of the mappingofthe file copy_on_write by
`an application. The application initiates a process that
`issues an SHMAT_COPY_ON_WRITEinstruction
`as indicated by block 100.
`Block 101 determinesif the file is currently mapped
`read/write, by checking the inode data structure. If the
`file is currently mapped, the process is terminated at
`block 102, since protocol does not permit a file to be
`both mapped copy_on_write and read/write.
`If the file is not currently mapped, block 103 tests to
`determine if the segment exists by checking the inode
`data structure. If the segment exists, the block 104 tests
`the map node data structure 70 to determine if a
`copy_on_write segment exists, block 105 then incre-
`ments the reference countfield 75 in map node 70 by 1
`and obtains the segment ID from the map nodein block
`106. Block 107 loads the segmentregister with the ob-
`tained ID and block 108 tests if the file is currently
`mapped. Block 109 represents the mapped page range
`service function which is called to map the file from
`block 108. If block 108 indicates the segment is mapped
`copy—on_write, the process ends at block 110. Ifblock
`103 indicates that the segment does not exist, block 111
`creates the segment by issuing a call to the create seg-
`mentservice of the system. Thetest in block 104 is then
`madeandif a copy_on_write segment does not exist, a
`call to the create copy_on—write segment in block 112
`is made. The count in the map nodefield 75 is incre-
`mented and the process flow continues, as previously
`described.
`Whenthe process issues a UNIX read system or load
`instruction in block 115, or a UNIX write System Call
`or a store instruction in block 116, the operation per-
`forms a basic memory reference process, as indicated in
`block 117. Block 118 tests the Inverted Page Table to
`determineif a page is in system memory.If not, block
`119 allocates a page frame in main memory. This re-
`quires an 1/O operation in block 120, which halts the
`process until the page frameis allocated. If block 119
`indicates the page is in memory, block 121 tests to see if
`a read (or load) operation is involved. If so, a requestis
`placed in the I/O queue by block 122.
`If a write or store operation is involved, block 123
`prepares the page and blocks 124 and 125 prepare the
`system to receive the copy_.on_write page in a paging
`space allocation on the disk file for copy—on—write
`pages. These operations require I/O to the disk file and,
`therefore, they are queued by block 122.
`FIG. 8 is a flow chart, illustrating the steps involved
`by page range service in completing the map node data
`
`20
`
`25
`
`40
`
`55
`
`60
`
`65
`
`4,742,450
`
`8
`structure 70 and the mapped file data structure 71,
`shownin FIG.6.
`After a segment has been created the file must be
`mapped into the segment. This is a dynamic operation,
`since the primary storage allocation is virtual, and the
`segmentassignmentis transient. Asillustrated in FIG. 8
`the inode structure 181 is read for the block address of
`each page to be allocated for the file. Each group of
`contiguously allocated blocks is summed, and the count
`recorded in the field adjacent to the starting block num-
`ber 2 entry in the map page rangestructure. Discontigu-
`ous blocks are reflected in discrete entries in the map
`page range structure. Whenthe entire file inode struc-
`ture has been scanned, the map page range SVCis is-
`sued and the external page table slot entries for the
`appropriate segment are updated with the block ad-
`dresses for each pageofthefile.
`While the invention has been shown and described
`with reference to a particular embodiment, it should be
`appreciated by those persons skilled in the art that
`changes and modifications may be made without de-
`parting from the spirit of the invention or the scope of
`the appended claims.
`Whatis claimedis:
`1. A method for facilitating the interchange of data
`stored in a Unix file between two UNIX processes being
`run concurrently on two virtual machines in a page
`segmented virtual memory virtual machine type data
`processing system having,
`(1) a main memoryincluding,
`a first plurality of byte addressable storage loca-
`tions each of which functionsto store one byte of
`data,
`(2) a secondary storage device including,
`a second plurality of block addressable storage
`locations each of which functionsto store at least
`one virtual page of data,
`(3) a virtual resource manager for creating at least
`first and second virtual machines having a UNIX
`type Operating System {UOS} program which
`includes,
`(a) co