(12) United States Patent
Bhanjois et al.

(10) Patent No.: US 7,140,015 B1
(45) Date of Patent: Nov. 21, 2006

(54) MICROKERNEL FOR REAL TIME APPLICATIONS

(75) Inventors: Bhimsen Bhanjois, Santa Clara, CA (US); Lakshman Narayanaswamy, Santa Clara, CA (US); Srinivas Pothapragada, Hyderabad (IN); Naidu Bollineni, Sunnyvale, CA (US); Pradeep Kankipati, San Jose, CA (US)

(73) Assignee: Network Appliance, Inc., Sunnyvale, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/408,149

(22) Filed: Sep. 29, 1999

(51) Int. Cl.
        G06F 9/44 (2006.01)
(52) U.S. Cl. ................. 718/100; 718/103; 718/107
(58) Field of Classification Search ........ 709/100-104, 709/106-108, 310, 313, 314; 718/100-104, 718/106-108; 719/310, 313, 314
        See application file for complete search history.
`
(56) References Cited

U.S. PATENT DOCUMENTS

4,993,017 A *   2/1991  Bachinger et al. ........ 370/360
5,485,579 A *   1/1996  Hitz et al. ............. 709/221
5,526,521 A *   6/1996  Fitch et al. ............ 709/108
5,721,922 A *   2/1998  Dingwall ................ 718/103
5,729,710 A *   3/1998  Magee et al. ............ 711/203
5,742,825 A *   4/1998  Mathur et al. ........... 719/329
5,745,759 A     4/1998  Hayden et al. ........... 395/680
5,764,984 A     6/1998  Loucks .................. 395/682
5,771,383 A *   6/1998  Magee et al. ............ 709/312
5,835,764 A *  11/1998  Platt et al. ............ 709/101
5,838,968 A    11/1998  Culbert ................. 395/674
5,903,752 A     5/1999  Dingwall et al. ......... 395/673
5,995,745 A *  11/1999  Yodaiken ................ 703/26
6,092,095 A *   7/2000  Maytal .................. 718/100
6,167,425 A *  12/2000  Beckhoff ................ 718/103
6,192,514 B1 *  2/2001  Lurndal ................. 717/149
6,424,988 B1 *  7/2002  Lurndal ................. 709/107
6,466,962 B1 * 10/2002  Bollella ................ 709/106
6,507,861 B1 *  1/2003  Nelson et al. ........... 718/104

OTHER PUBLICATIONS

Ramamritham et al., "Scheduling Algorithms and Operating Systems Support for Real-Time Systems", Proceedings of the IEEE, vol. 82, No. 1, Jan. 1994, pp. 55-67.*
Ritchie et al., "User Level IPC and Device Management in the Raven Kernel", Proceedings of the USENIX Microkernels and Other Kernel Architectures Symposium, Sep. 20-23, 1993, pp. 1-15.*
Ford et al., "Microkernels Meet Recursive Virtual Machines", Department of Computer Science, University of Utah, May 10, 1996, pp. 1-15.*
Jochen Liedtke, "On u-Kernel Construction", German National Research Center for Information Technology, 1995, pp. 237-250.*
Theory of Operation, Auspex, Version 1.0.1, Module-4, pp. 165-247.
Auspex, 4Front NS2000, product information, 1999.
NetServer Theory of Operation, Version 1.0.2, Auspex, Module-4, pp. 166-258, 1997.
NetServer Overview, Version 1.0.2, Auspex, Module-2, pp. 32-92, 1997.

* cited by examiner

Primary Examiner—Chun Cao
(74) Attorney, Agent, or Firm—Fish & Richardson P.C.
`
`(57)
`
`ABSTRACT
`
`An operating system includes a non-preemptive microkernel
`executing one or more processes in accordance with a
`predetermined priority; and one or more kernels adapted to
`be executed as one or more processes by the non-preemptive
`microkernel.
`
`26 Claims, 12 Drawing Sheets
`
[Representative drawing: computer environment 100 — application 108, OS commands/libraries 110, system call interface 106, microkernel 104, hardware 102]

Google Exhibit 1064
Google v. VirtaMove
[Drawing sheets 1-12 (U.S. Patent, Nov. 21, 2006, US 7,140,015 B1):
Sheet 1, FIG. 1: computer environment 100 — application 108, commands/libraries 110, system call interface, microkernel, hardware.
Sheet 2, FIG. 2: tasks 120, 122, 124, 126 — OS Task A at priority 1 through priority N.
Sheet 3, FIG. 3: microkernel task executing an operating system; parent processes 130, 140 fork child processes that are executed and become zombie processes; tasks 1 through N of system 104.
Sheet 4, FIG. 4: M16 standard mailbox — requests 1 through N (150, 152, 154, 156) feeding a mailbox.
Sheet 5, FIG. 5: message layout — message type 200, contents of message, and IPC system area with sender PID 210, forwarder PID, destination PID, msg address 216, 218, UNIX msg type, local values 222.
Sheet 6, FIG. 6: message send between Process Send on CPU-1 and Process Receive on CPU-2 (steps 1-14).
Sheet 7, FIG. 7: message reply from Process Receive on CPU-2 back to Process Send on CPU-1 (steps 14-22).
Sheet 8, FIG. 8: flowchart — workload initiated by network 300; FMP-service test; schedule FMP processes 304; application-service test 306; output workload to another functional processor 308.
Sheet 9, FIG. 9: computer 330 executing an Auspex kernel / NAS management module 332 and a switch OS with switch/router management module, over FDDI, Ethernet and ATM ports (336, 338, 340).
Sheet 10, FIG. 10: first computer system 400.
Sheet 11, FIG. 11: second computer system.
Sheet 12, FIG. 12: n-way processor board — PCI-B slots #1-#4, serial ports, parallel port, floppy drive.]
`
`MICROKERNEL FOR REAL TIME
`APPLICATIONS
`
The invention relates to operating systems with real-time capability.

System administration convenience and ease of accessibility have been driving the growth of computer networks. In a computer network, individual user workstations are referred to as clients, and shared resources for filing, printing, data storage and wide-area communications are referred to as servers. Clients and servers are all considered nodes of a network. Client nodes use standard communications protocols to exchange service requests and responses with the servers. The servers in turn execute various processes, as controlled by the servers' operating systems.

The operating systems for these servers face a growing need to deliver higher data availability, faster access to shared data, and reduced administrative costs through network data consolidation. Additionally, certain tasks dealing with communications and natural data types such as audio/video streaming require real-time responses. In these applications, if a delay exists in the capture or playback of audio or video data, a user at the client nodes may hear clicks and pops in the audio output and see modulating or jerky video output. Furthermore, a natural playback of audio and video data requires that the audio/video data transmission be synchronized. Hence, in addition to handling requests efficiently, the operating system also needs to provide real-time capabilities.
Additionally, the operating system needs to support multitasking. Multitasking and real-time processing are attributes of an operating system that are closely related to scheduling. Multitasking is a scheduling scheme that allows the processor to work on more than one process or task at a time. Real-time processing refers to the scheduling constraint that a process must be scheduled and executed within a predictable period of time because of some external, "real world" timing requirement. Real-time processing is important for application programs that must execute in a predictable period of time. In a multitasking operating system, the operating system implements a scheduling scheme so that real-time applications are scheduled to run in a predictable period of time. To support real-time processing, an operating system needs to have some form of preemptive scheduling, that is, the ability to interrupt a currently running process to run a higher-priority process, such as a real-time application program. To ensure that the real-time application is processed in a predictable period of time, the operating system needs to be able to gain control of the processor, possibly preempting the currently running process, and schedule the real-time process regardless of other processes in the system.
A traditional operating system is logically layered and divided into two main portions: the kernel and user programs. The kernel interfaces with and controls the hardware, and also provides the user programs with a set of abstract system services called system calls. The kernel runs at a kernel level, where it can execute privileged operations that give it full control over the hardware as well as the user-level programs. This centralization provides an environment where all programs share the underlying hardware in a coordinated fashion.

Traditional kernels have been implemented as a monolithic program. More recently, the monolithic kernel has been partitioned into independent modules to enhance flexibility in implementing the operating system as well as to modify various services associated with the kernel. In the
`
microkernel, certain services are migrated outside of the kernel and run at a user level in special server processes. Typically, the microkernel performs only inter-process communication (IPC) and process scheduling. External processes then use these core services to implement the remainder of the operating system functionality. The removal of complexity from the kernel allows a more efficient IPC implementation, which reduces the performance penalty incurred (from communicating with external service-providing processes) such that the microkernel can be comparable in performance to the monolithic kernel.

When a user requests a program to be executed, a new process is created to encompass its execution. The process is a combination of the program plus the current state of its execution, which normally includes the values of all variables, as well as the conditions of the hardware (the program counter, registers and condition code, among others) and the contents of the address space. The process exists within the system until it terminates, either by itself as designed, by the kernel, or by the request of the user. The process itself is an abstraction. The management of program execution can be controlled by modifying the scheduling priority of processes.
`
In traditional operating systems, including the Unix operating system, the kernel schedules only processes for execution since all system activities, whether at user or kernel level, occur within the context of some process. When using traditional time-sharing scheduling policies, processes executing at the user level may be timesliced at any time in order to share the processing resources fairly among all processes. Processes operating at the kernel level are exempt from time slicing. A switch to a different process while executing at the kernel level is typically performed only when the current kernel process explicitly allows it to occur.

As discussed above, there are often times when certain applications demand a different scheduling algorithm than what the operating system provides. Typically, the vendors of the operating systems modify a scheduler to provide a real-time-like response, rather than give the flexibility to the user. Other vendors run real-time kernels as processes under the operating system. However, in such an approach, the scheduler of the time-sliced operating system can preempt the real-time kernel at will and defeat the purpose of the real-time nature of the kernels that are running as processes.
`
`
`
`SUMMARY OF THE INVENTION
`
An operating system includes a non-preemptive microkernel executing one or more processes in accordance with a predetermined priority; and one or more kernels adapted to be executed as one or more processes by the non-preemptive microkernel.

Implementations of the invention include one or more of the following. One of the kernels can execute an operating system. The operating system can be a time-sliced operating system such as Unix. Each process has its own stack, and the processes can communicate using one or more messages. Each process also has a unique process identifier (PID). A mailbox connected to a plurality of processes can service messages sent to a single PID. The processes executed by the system never terminate. The kernel executed as a process can be a monolithic kernel or can be a microkernel.

Advantages of the invention include one or more of the following. Real-time applications such as multimedia streaming, voice/audio processing and applications operating with natural data types are supported without allowing
`
`
`
other operations to disrupt the capture, delivery or playback of data. No modification to the operating system's scheduling algorithm is needed. Moreover, the operating system applications that are running as processes are protected without degrading the real-time response capability of the operating system.

The non-preemptive microkernel can run other microkernels or operating systems as processes and protect the nature of the kernel that has piggybacked onto it. For example, a user can run the Unix operating system as a process and schedule Unix to run so as to protect the nature of all the applications that are running on Unix. When Unix gets control of the computer, it can run applications such as Web CGI scripts to generate network, file system or storage tasks.

The microkernel offers scalability: simply by including or excluding additional microkernel processes, the functionality (and resource requirements) of the operating system can be scaled to address different application needs requiring different operating systems using the same microkernel. The microkernel also offers extensibility, achieved by adding specific operating system microkernels running as processes. Moreover, these functionality enhancements can be readily accomplished by the users, rather than requiring (or waiting for) the hardware vendor to implement them. The microkernel also offers a high degree of concurrence: since one operating system microkernel can run as several concurrent processes, it can provide greater concurrence than a single microkernel.

The microkernel may also manage an adaptive file system that is tuned for specific applications. The tuning process is simple, and only requires the user or suitable software to select from a list of options as to the characterization of the processing load.

The resulting server with the microkernel is powerful, scalable and reliable enough to allow users to consolidate their data onto one high-performance system instead of scores of smaller, less reliable systems. This consolidation of data resources onto a powerful server brings a number of advantages to the client-server environment. The consolidation of data reduces the need to replicate data and to manage the consistency of the replicated data. Data is available more quickly and reliably than in a conventional client-server architecture.
`
`
`Other features and advantages will be apparent from the
`following description and the claims.
`
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
The invention will be described with respect to particular embodiments thereof, and reference will be made to the drawings, in which:

FIG. 1 is a diagram illustrating a microkernel that manages one or more additional microkernels as processes.
FIG. 2 is a diagram of tasks executed by the microkernel of FIG. 1.
FIG. 3 is a diagram illustrating the microkernel of FIG. 1 executing another kernel.
FIG. 4 shows communication pathways between requestors and workers.
FIG. 5 is a diagram illustrating the components of a message.
FIG. 6 is a diagram illustrating a message send process.
FIG. 7 is a diagram illustrating a message reply process.
FIG. 8 is a flowchart of a process for executing processes using the microkernel.
FIG. 9 is a diagram illustrating an exemplary application of the microkernel in accordance with the invention.
`
FIG. 10 is a block diagram of a first computer system that loosely couples a plurality of tightly-coupled processors.
FIG. 11 is a block diagram of a second computer system that loosely couples a plurality of tightly-coupled processors.
FIG. 12 is a block diagram of an n-way processor computer system.
`
`DESCRIPTION OF THE PREFERRED
`EMBODIMENTS
`
Referring now to FIG. 1, various layers executing in a computer environment 100 are shown. The computer environment 100 has a hardware layer 102, a microkernel layer 104, a system call interface layer 106, an application portion 108 and an operating system command/library portion 110. The microkernel layer 104 interfaces with the hardware layer 102. The microkernel layer 104 runs at a kernel level, where the microkernel layer can execute privileged operations that allow the kernel to have full control over the hardware and user-level programs. The application portion 108 and the OS command/library portion 110 run at a user level. The user level interacts with the kernel level through various system call interfaces. The user level executes at an unprivileged execution state of the hardware and thus runs in a restricted environment, controlled by the microkernel layer. Hence, the microkernel layer prevents simultaneously executed programs from interfering with one another, either intentionally or maliciously. The microkernel layer 104 executes a non-preemptive microkernel that can run other kernels or microkernels as processes. As such, the microkernel layer 104 can protect the nature of kernels that have "piggybacked" onto the microkernel layer 104. For instance, the user can run the Unix operating system as a process managed by the microkernel layer 104.
A process performs a sequence of actions. A processor can perform only one action (of one process) at a time. Each processor can be executing a different process, concurrently. All processes running on a given processor share the same address space, including external variables and program code. Each process has its own stack, i.e., automatic variables and function arguments are local to each process.

A process in this abstraction is very "light weight" compared to a regular Unix process. A process in this abstraction requires as little as a few hundred bytes of memory for the process descriptor and a stack. Many processes can run on each processor. Process switching time is very small because only the stack pointer and caller-saved registers need to be saved and restored. All local processes run in the same address space, so there are no memory maps to switch. The microkernel layer contributes negligible CPU overhead for single-process applications as well as most multiprocessing applications.
The microkernel layer 104 is a task-based operating system and executes a plurality of tasks, as shown in FIG. 2. Tasks 120, 122, 124 and 126 are executed in a predetermined priority sequence that is shown in FIG. 2 as priority level 1 through priority level N. Any of the tasks can be used to execute an operating system kernel or microkernel. Thus, in the exemplary embodiment of FIG. 2, tasks 122 and 124 are processes associated with different operating systems.

The microkernel layer 104 in effect becomes a "dominant" operating system that loads "sub-dominant" operating systems for execution. Each sub-dominant operating system environment is set up as a process that depends from the dominant operating system. The dominant operating system can run one or more functional processing operations such
`
`
as networking, file system, and storage. The sub-dominant operating systems in turn run application-specific processes such as SNMP decoding, Java bytecode execution or CGI scripting operations.

The relationship between the microkernel layer 104 and the operating system tasks it supports is shown in more detail in FIG. 3. In this exemplary configuration, the microkernel layer 104 executes task 1 120, the operating system microkernel task 122 and task N 126. The operating system microkernel task 122 in turn runs an operating system such as Unix and a plurality of processes executed by the operating system of task 122. For instance, the task 122 can execute a parent process 1 130 that can fork and generate one or more child processes 132 that become executed child processes 134. Upon completion of the executed child processes 134, the executed child processes become zombie processes 136 and control is returned to the parent process 130. Similarly, the task 122 can execute a parent process 2 140 that can fork and generate one or more child processes 142 that become executed child processes 144. Upon completion of the executed child processes 144, the executed child processes become zombie processes 146 and control is returned to the parent process 140.

Referring now to FIG. 4, communications to and from a mailbox 160 are shown. Some applications require several "worker" threads of control to serve messages from other processes, but need a single process identifier (PID) to which those other processes would send the message. For this purpose, a mailbox is used. The mailbox 160 receives incoming messages from a plurality of requests 150, 152, 154, 156, and 158. The mailbox 160 in turn communicates with a plurality of workers 162, 164, 166, 168 and 170.

In a standard mailbox, a client sends a work request to the manager; the manager then sends the work request to an idle worker. In an inverted mailbox, a client asks the manager for the name of an idle worker and the manager responds to the client with the name of the idle worker; the client then sends the work request directly to the worker, not through the manager.

Mailboxes are created by a process converting itself to a mailbox, and messages sent to that PID will queue up at that mailbox. A destination to which a message can be sent, and a sender to which a reply to a message is returned, is identified by a unique 32-bit PID. Definition of one exemplary PID type is shown below:

    /*
     * To the user, PIDs are just integers. To the kernel, they
     * consist of 8 bits of virtual IPC slot, and 20 bits identifying the
     * local process.
     */
    typedef long K_PID;           /* Pid as seen by user. */
    typedef struct m16_pid_t {    /* Pid as seen by m16 kernel */
        unsigned pid_slot : 8;
        unsigned : 4;
        unsigned pid_proc : 20;
    } M16_PID_T;

The process states include ready, blocked, or waiting. A process is said to be ready if it is able to perform the next action of its program. A process that is not ready is said to be blocked. A process is blocked when it does any one of the following:
    sends a message to another process
    waits to receive a message from another process or from the kernel timer service
    waits for a hardware interrupt to occur

When a process requests certain system services, such as registering or looking up process names, the process sends a request to a kernel process; this causes the requesting process to become blocked. When a process becomes ready, it is placed in its processor's ready queue. Processes in the ready queue are ordered from highest to lowest priority. Within each priority level, processes execute in the order in which they became ready.

When a process is blocked, it relinquishes the processor, and the processor is allocated to the process at the head of the ready queue. If the ready queue becomes empty, the processor is allocated to the next process to become ready. The microkernel is non-preemptive, so processes give up the CPU only when they block, or when they explicitly request to be preempted.

For each way that a process can be blocked, there is a corresponding way that it can become ready:
    A process blocked after sending a message becomes ready when some other process replies to the message.
    A process blocked waiting for a message to arrive becomes ready when one does. The message may be from another process, or it may be an alarm message delivered by the kernel.
    A process that is blocked waiting for a hardware interrupt becomes ready when the interrupt occurs.

FIG. 5 shows components associated with a message. A message is a fixed-sized vector containing K_MSG_SIZE bytes of data. The first section of every message is a message type field 200 that identifies a function requested by the message. Data is stored in a message content area 202. Messages also include private information and may need to obey special alignment required by hardware. An IPC system area 204 is used for local message administration purposes and is not retrieved by a receiving processor. Only the message type and the data stored in the message content area of a message are retrieved by the receiving processor. In one implementation, the IPC system area 204 includes a sender PID 210, a forwarder PID 212, and a destination PID. The IPC system area 204 also includes a message address 218 and a message type (e.g., UNIX) 220. The IPC system area can have local values 222.

All communication between boards is through primitives to send, receive, and reply, with their options. When a process sends a message, it goes to sleep. When a reply is sent for that message, the kernel locates the message (via the address contained in the reply message descriptor) and within that message it finds the processing structure. It then links the reply message to the processing structure and places the process on the run queue. Once the process is running, it finds the reply message on its own processing structure.

Once a process is created it lives forever. The function executed by a process must never return; however, it may call other functions that return.

A process can create a new process on the same processor by calling

    pid = k_create(func, stack_size, priority, arg);

where func is a pointer to the function that the process will execute, and arg is an argument, of type long, that is passed to func; stack_size specifies the size in bytes of the process' stack; and priority specifies the process' scheduling priority level.

FIG. 6 illustrates in more detail the use of a message in sending data to a process. FIG. 6 depicts an example message communication between a process running on CPU-1 (Process Send) and a process running on CPU-2 (Process Receive). A user-level process space 230, a microkernel layer 232, and a FIFO 234 are associated with CPU-1. A user-level process space 240, a microkernel layer 242, and a FIFO 244 are associated with CPU-2. As shown therein, the steps of sending data to a process include:
1. The sending process (Process Send) implements k_alloc_msg( ) to obtain a message.
2. A message is obtained from a free message list. The microkernel layer 232 allocates space for additional messages if there are no messages in the free message list. If a message vector cannot be allocated due to insufficient memory, k_alloc_msg( ) will panic the board. All processes running on a processor share the same address space.
3. The sending process creates the message.
4. Messages are sent to a PID. Therefore, the sending process obtains the PID of the receiving process (Process Receive). In general, this occurs only at boot-up, since processes never terminate.
5. The sending process sends the message to the PID of the receiving process; the sending process blocks waiting for the reply.
6. The microkernel layer 232 on CPU-1 pokes a message descriptor (the address of the message) into FIFO 244 of CPU-2.
7. Poking of the message descriptor into the FIFO 244 of CPU-2 causes an interrupt on CPU-2.
8. The microkernel layer 242 on CPU-2 fetches the message descriptor from its FIFO 244.
9. The microkernel layer 242 on CPU-2 implements a k_alloc_msg( ) to allocate local space for the message.
10. The microkernel layer 242 on CPU-2 DMAs the message from the VME (Versa Module Europe bus standard) space address included in the message descriptor. The message includes the PID of the sender so that the receiving process knows where to reply.
11. The microkernel layer 242 on CPU-2 passes the local-space address of the message to the receiving process in the user-level process space 240.
12. The receiving process has been sleeping at k_receive( ) (i.e., blocked) waiting for a message; receipt of the message causes the receiving process to become ready.
13. The receiving process does the processing requested in the message.
14. The receiving process sends a reply (this does not cause the receiving process to block). In this example, the receiving process returns to sleeping at k_receive( ), blocked until the next message arrives.

FIG. 7 shows a corresponding message reply process. As shown in FIG. 7, the steps are:
14. The receiving process sends a reply (this does not cause the receiving process to block). In this example, the receiving process returns to sleeping at k_receive( ), blocked until the next message arrives.
15. The microkernel layer 242 on CPU-2 DMAs the message back to the original message space of CPU-1 and does a k_free_msg( ) to free the local message space.
16. The microkernel layer 242 on CPU-2 pokes an ACK message descriptor into FIFO 234 of CPU-1.
17. Poking of the message into FIFO 234 causes an interrupt on CPU-1.
18. CPU-1 fetches the message descriptor from its FIFO 234.
19. The microkernel layer 232 on CPU-1 notifies the sending process of the reply.
`
`
20. The sending process has been blocked waiting for the reply; receipt of the reply causes the sending process to become ready.
21. The sending process runs, using information in the reply message.
22. The message can be freed.
FIG. 8 shows a flowchart for executing processes using the microkernel. First, a workload such as a system trap or error is initiated over a network (step 300). The process of FIG. 8 determines whether one or more functional multiprocessing (FMP) services are needed (step 302). FMP services include services that handle NFS, CIFS, FTP or HTTP, among others. If one or more FMP services are required, the process of FIG. 8 schedules the FMP processes as required (step 304). From step 304, the process determines whether an application service is needed (step 306). If not, the resulting workload is sent to another functional processor for handling (step 308).

From step 302, if an FMP service is not needed, the process of FIG. 8 transitions to step 310, where it schedules the request as a dependent operating system process. Similarly, from step 306, if an application service is needed, the process of FIG. 8 schedules the request as a dependent operating system process (step 310). From step 310, the process of FIG. 8 proceeds to step 308 to output the workload to another processor.
FIG. 9 shows an exemplary microkernel configuration that is optimized for a telecommunications application. In FIG. 9, a computer 330 executes a microkernel 332, which in this case is a network attached storage (NAS) management module. The microkernel 332 in turn executes one or more dependent operating systems 334, which in this case is a switching operating system. The switch operating system provides switch/router management services. The switch operating system in turn supervises ports that can be Fiber Distributed Data Interface (FDDI) ports 336, Ethernet ports 338 or Asynchronous Transfer Mode (ATM) ports 340.
FIG. 10 shows a computer system 400 that loosely couples a plurality of tightly-coupled processors in collectively providing a high-performance server. The system 400 has a plurality of processors 402-