(12) Patent Application Publication    (10) Pub. No.: US 2004/0153709 A1
Burton-Krahn                           (43) Pub. Date: Aug. 5, 2004

US 20040153709A1

(54) METHOD AND APPARATUS FOR PROVIDING TRANSPARENT FAULT
TOLERANCE WITHIN AN APPLICATION SERVER ENVIRONMENT

(76) Inventor: Noel Morgen Burton-Krahn, Victoria (CA)

Correspondence Address:
Noel Burton-Krahn
919 Dunsmuir Road
Victoria, BC V9A 5C4 (CA)

(21) Appl. No.: 10/611,930

(22) Filed: Jul. 3, 2003

Related U.S. Application Data

(60) Provisional application No. 60/393,630, filed on Jul. 3, 2002.

Publication Classification

(51) Int. Cl.7 ........ H04L 1/22
(52) U.S. Cl. ........ 714/4

(57) ABSTRACT
`
Disclosed is an apparatus for providing transparent fault
protection for redundant server systems comprising a plurality
of servers connected to a plurality of clients over a network.
One or more servers are configured in a master and back-up
configuration. Each server operates independently from the
other and each server is connected to the network using an
identical address so that each master and back-up server
receives the same client communications. Each server runs the
same copy of operating system, server application system and
fail over protection system programs. The invention provides
for a method of transparent fail over protection between the
master and the back-up servers by synchronizing the operation
of the master with the back-up. Synchronization is accomplished
by synchronizing the initial state of the operating system by
ensuring that the respective master and back-up operating
systems are using the same file systems. Synchronization of the
servers also necessitates synchronization of the application
states of the respective master and back-up server application
programs and synchronization of the respective network
connection states between the master and back-up servers and
the network respectively. Once synchronization is achieved, the
fail over between master and back-up servers will be
transparent to the client.
`
`Google Exhibit 1097
`Google v. VirtaMove
`
`
`
`
Patent Application Publication    Aug. 5, 2004    Sheet 1 of 8    US 2004/0153709 A1

[FIG. 1: a client connected to a single non-replicated server; drawing not reproduced]
`
`
`
Patent Application Publication    Aug. 5, 2004    Sheet 2 of 8    US 2004/0153709 A1

[FIG. 2: a client connected to replicated servers embodying the present invention; drawing not reproduced]
`
`
`
Patent Application Publication    Aug. 5, 2004    Sheet 3 of 8    US 2004/0153709 A1

[FIG. 3: relationship between the present invention and the other operating programs within the servers; drawing not reproduced]
`
`
`
`
`
`
`
Patent Application Publication    Aug. 5, 2004    Sheet 4 of 8    US 2004/0153709 A1

[FIG. 4: schematic of the synchronizing of system calls; drawing not reproduced]
`
`
`
Patent Application Publication    Aug. 5, 2004    Sheet 5 of 8    US 2004/0153709 A1

[FIG. 5: process for the interception of system calls; drawing not reproduced]
`
`
`
Patent Application Publication    Aug. 5, 2004    Sheet 6 of 8    US 2004/0153709 A1

[FIG. 6: schematic of how the network connection states between the Master and Back-up servers are synchronized; drawing not reproduced]
`
`
`
`
`
`
`
`
Patent Application Publication    Aug. 5, 2004    Sheet 7 of 8    US 2004/0153709 A1

[FIG. 7: schematic of the synchronization of TCP packets from client to servers; drawing not reproduced]
`
`
`
`
`
`
`
Patent Application Publication    Aug. 5, 2004    Sheet 8 of 8    US 2004/0153709 A1

[FIG. 8: schematic of the synchronization of TCP packets from servers to client; drawing not reproduced]
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`METHOD AND APPARATUS FOR PROVIDING
`TRANSPARENT FAULT TOLERANCE WITHIN AN
`APPLICATION SERVER ENVIRONMENT
`
`CROSS-REFERENCE TO RELATED
`APPLICATIONS
`
`[0001] This application is entitled to the benefit of Provi-
`sional Patent Application 60/393,630 filed on Jul. 3, 2002.
`
`REFERENCE TO MICROFICHE APPENDIX
`
`[0002] Not applicable.
`
`FIELD OF THE INVENTION
`
[0003] This invention pertains to providing fault protection
for server systems and more particularly to a method and
apparatus for providing transparent fault tolerance within an
application server environment.
`
`BACKGROUND OF THE INVENTION
`
[0004] Computer network server applications must support many
simultaneous client connections at all times. They need to be
scalable to many users, available at any time, and each
connection must be completely reliable. These features are
critical in the long term, but are usually only considered
after initial development. Most server applications are
developed with inexpensive components that do not support high
availability or scalability. After initial development, they
must be altered to deal with hardware faults and high
connection loads.
`
[0005] Servers may become unavailable for many reasons such as
hardware failure, software failure, maintenance outages,
network infrastructure failure and physical damage due to
unforeseen events such as fires or floods. Each failure mode
has a unique duration and potential to corrupt or lose data.
Adding fault tolerance to an existing system can be difficult
and expensive, and may not be possible for some kinds of server
applications. Many computer network server applications are
developed using freely available tools like Linux™, Apache™,
PHP™ and MySQL™. However, none of these tools has built-in
fault tolerance.
`
[0006] Computer network server applications vary between web
servers (HTTP), web applications (HTML), databases (e.g. MySQL™
and Oracle™), streaming media (e.g. RealAudio™) and
teleconferencing (e.g. NetMeeting™ and Roger Wilco™).
Understandably, servers must be continuously available despite
server failures. Since each application has a different client
connection characteristic (such as duration of connection and
internal state of the server), different server failure modes
are encountered, necessitating various strategies for fault
tolerance. For example, redundant servers or server clustering
provides good fault tolerance for HTTP and HTML applications.
However, if the active server fails the client’s connection
will be broken and data can be lost. Databases are particularly
vulnerable to failures because they must support many
concurrent read/write transactions. Databases generally rely
solely on periodic back-up. Therefore, database failure can
result in lost information between the time of the last back-up
and the time of failure. Commercial redundant database
solutions like Oracle™ and Solid™ provide better reliability
but they are expensive. Many applications are made with freely
available databases like MySQL™ and PostgreSQL™ that have
excellent performance, but no built-in fault tolerance. Server
redundancy does not necessarily increase the reliability of
streaming media over the Internet. For example, a broken
connection during a movie may result in having to restart the
movie from the beginning. Alternatively, the server may have to
support an ability to restart a broken data stream, resulting
in additional costs to the user.
`
[0007] One example of a known art device for fault tolerance is
described in U.S. Pat. No. 6,097,882 “Method and apparatus of
improving network performance and network availability in a
client-server network by transparently replicating a network
service” issued to Mogul on Aug. 1, 2000. Mogul describes a
server cluster where a “replicator” transparently distributes
requests from clients to servers. However, there is no effort
to preserve a connection if the server fails or to transfer
server state from a failed server. Another example of a known
art fault tolerance device is described in U.S. Pat. No.
6,256,641 “Client transparency system and method therefore”
issued to Kasi on Jul. 3, 2001. Kasi teaches a programming
scheme which adds a middle component between a client and a
server. The middle component will retry a request if the server
fails, without the client knowing. This only works for
transaction-based applications. The state from the failed
server is not preserved.
`
[0008] It is apparent that the known art methods of providing
higher server availability such as server clusters, periodic
back-up and redundant hardware have limitations. They allow
users to reconnect to a new server if one fails but connections
and state at the failed server will be lost. These solutions
often rely on client connections being short and repeatable.
They are not suitable for real-time teleconferencing, gaming
applications or databases because redundant database servers
must maintain a consistent state. They can be very expensive to
implement, requiring additional programming labor and hardware.
`
[0009] There is still no general way to provide inexpensive and
transparent failover for off-the-shelf servers. Therefore,
there is still a requirement to provide a method and apparatus
that permits any existing server to fail over transparently to
a back-up server without breaking client connections.
`
SUMMARY OF THE INVENTION
`
[0010] The present invention provides a redundant server system
for providing transparent fault tolerance within an application
server environment comprising a network of computers. The
preferred embodiment of the present invention comprises one
server designated as a master server for storing and operating
a first operating system program and a first server application
program. The master server is connected to a computer network
and has a network address. The invention also includes a second
server designated as a back-up server. The back-up server
stores and operates a second operating system program and a
second server application program. The second operating system
program and second server application program are identical to
the first operating system program and the first server
application program. The back-up server is also operatively
connected to the same computer network.
`
[0011] The master server is operatively connected to the
back-up server and the two servers are in continuous
communication with each other. One novel feature of my
invention is that the operation of the master server and
back-up server is synchronized. Included are means for
monitoring synchronicity between the master server and the
back-up server and means for detecting non-synchronicity
between the two servers. In the failure modes contemplated by
my invention, the master server may fail to operate, resulting
in a non-synchronicity between it and the back-up. In this
case, the master server will terminate its operation and all
functions of the master server will be transferred to the
back-up server without the client knowing the transfer has
taken place and without any loss of data, in other words,
transparently. The other failure mode of the system is when the
back-up server fails to operate in a synchronized manner with
the master. In this scenario, the back-up server will terminate
and all functions will remain with the operating master. Within
each server there is embedded automatic fail-over protection.
The fail over protection will, upon a detection of
non-synchronicity between the two servers, invoke a transfer of
server operations from the failed server to the non-failed
server.
`
[0012] My invention also discloses a method for providing
transparent fault tolerance within an application server
environment comprising a network of computers. The method
comprises the steps of:

[0013] a. providing a first server for storing and operating a
first operating system program and a first server application
program;

[0014] b. providing a second server for storing and operating a
second operating system program and a second server application
program;

[0015] c. placing said first server in communication with said
second server;

[0016] d. selecting from the first server and the second server
a master server and a back-up server;

[0017] e. synchronizing the operation of the master server and
the back-up server;

[0018] f. providing from the network an identical client data
stream input simultaneously to the master server and the
back-up server wherein:

[0019] i. the master server and back-up server have the same
network address;

[0020] ii. the master server and back-up server simultaneously
process said identical client data stream; and wherein,

[0021] iii. the master server and the back-up server
simultaneously produce respective first and second output data
streams; and wherein,
`
[0022] iv. said first and said second output data streams are
identical if the master server and the back-up server are
operating correctly;

[0023] g. comparing said first output data stream with said
second output data stream for divergence from identicality of
the first output data stream from the second output data
stream;

[0024] h. detecting no divergence from identicality of the
first output data stream from the second output data stream;

[0025] In the event that the invention detects
non-synchronicity, the invention will execute the following
steps:

[0026] a. receive an indication of divergence from identicality
of the first output data stream from the second output data
stream;

[0027] b. initiate fail over protection wherein the backup
assumes the duty of the master without breaking any network
connections.

OBJECTS AND ADVANTAGES OF THE
INVENTION

[0028] My invention has as its objects and advantages the
following:

[0029] to provide transparent fail over for commercial servers
which do not have inherent fail over protection;

[0030] to protect against faults that cause a host to become
unresponsive such as hardware failures, network failures, power
failures, or natural disasters;

[0031] to make a server highly available even though it runs on
unreliable hardware; and,

[0032] to replicate the application state of a master server on
a back-up server by running an identical copy of the server
application program on the back-up server and feeding the
back-up server the same input as the master server.

[0033] The above and additional advantages of the present
invention will become apparent to those skilled in the art from
a reading of the following detailed description when taken in
conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0034] FIG. 1 shows a client connected to a single
non-replicated server.

[0035] FIG. 2 shows a client connected to replicated servers
embodying the present invention.

[0036] FIG. 3 shows the relationship between the present
invention and the other operating programs within the servers.

[0037] FIG. 4 schematically portrays the synchronizing of
system calls.

[0038] FIG. 5 shows a process for the interception of system
calls.

[0039] FIG. 6 shows schematically how the network connection
states between the Master and Back-up servers are synchronized.

[0040] FIG. 7 shows schematically the synchronization of TCP
packets from client to servers.

[0041] FIG. 8 shows schematically the synchronization of TCP
packets from servers to client.
`
`
`
`DETAILED DESCRIPTION OF THE
`INVENTION
`
`[0042] Definitions
`
`[0043] The following terms are defined for additional
`clarity.
`
[0044] Client—a program that connects to a server.

[0045] Server—a collection of processes on a single device that
accept and process connections from clients.

[0046] Master—the primary server responsible for handling a
client connection.

[0047] Back-up—an identical server to the Master that can take
over the client connection if the Master fails.

[0048] Failover—The ability for a client connection to be
relocated from Master to Back-up without interruption or loss
of information. Failover should be transparent to clients. The
client’s connection should not be broken or need to be manually
restarted. The difficult part of transparent fail over is
transferring the state from the failed Master to the Back-up.
`
`[0049] Application State—As the client and server com-
`municate, the Master server application program changes
`state. The Master server application program may advance a
`file pointer, update files on disk, or change its internal
`memory state. This is known as the Application State. The
`present invention runs the Master and the Back-up servers in
`such a way as to synchronize Application State efficiently.
`
[0050] Network Connection State—The operating system uses a
network protocol to connect the Master with the Client. This
network protocol uses a set of state variables. For example,
the TCP protocol includes sequence numbers (SEQ),
acknowledgements (ACK), and timers for timeouts and
retransmits. This set of state variables is known as the
Network Connection State. The Back-up must replicate the
Network Connection State for transparent fail over.
`
`[0051] System Call—Application programs interact with
`operating systems by System Calls. A System Call occurs
`when an application program invokes a function that is
`implemented by its operating system, for example, open or
`read a file or get the current time.
`
[0052] System State—The state of the operating system in which
a server application program runs.

[0053] That includes the current time, the available files and
process identifications, etc.
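
To make the Network Connection State definition above more concrete, the following Python sketch models it as a small record. The field names (`seq`, `ack`, `retransmit_timeout_s`) and the `in_sync` helper are illustrative assumptions and do not come from the disclosure; a real TCP implementation tracks many more variables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionState:
    """Hypothetical snapshot of per-connection protocol state variables."""
    seq: int                     # next sequence number this side will send
    ack: int                     # next sequence number expected from the peer
    retransmit_timeout_s: float  # current retransmission timer, in seconds


def in_sync(master: ConnectionState, backup: ConnectionState) -> bool:
    """Return True when the Back-up could continue the Master's connection.

    Assumption for this sketch: sequence and acknowledgement numbers must
    match exactly; timers are treated as local and are not compared.
    """
    return master.seq == backup.seq and master.ack == backup.ack


master = ConnectionState(seq=1001, ack=501, retransmit_timeout_s=0.2)
backup = ConnectionState(seq=1001, ack=501, retransmit_timeout_s=0.3)
stale = ConnectionState(seq=901, ack=501, retransmit_timeout_s=0.2)
```

Here `in_sync(master, backup)` holds while `in_sync(master, stale)` does not, which is the condition a Back-up would have to meet before it could take over a connection without the client noticing.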
`
`DESCRIPTION OF THE PREFERRED
`EMBODIMENT
`
[0054] The preferred embodiment of the present invention
provides a method and an apparatus for providing fault
tolerance through transparent fail over protection to existing
off-the-shelf servers with little or no modification or
rewriting of the existing server software. For ease of
reference, throughout this disclosure, I will be making
reference to my invention as HOTSWAP. HOTSWAP applies to web
servers, mail servers, teleconferencing servers and any server
that supports a process that accepts connections from a client
and includes a program that initiates connections to a server.
`
[0055] Referring to FIG. 1 there is shown in schematic form a
single client (10) connected to a single server (12) through
the Internet (14) in a non-redundant fashion. In this
configuration, failure of the single server will result in
failure of the client connection and loss of data.
`
[0056] HOTSWAP also provides for a method for controlling two
different servers that cooperate to run two independent copies
of a server application program in sync. One of these computers
is called the “Master” and the other is the “Back-up”. FIG. 2
shows a typical redundant server system in which HOTSWAP would
be used. The client (10) transmits data packets over the
Internet (14) to be received simultaneously by a Master server
(20) and a Back-up server (22). The client is not aware of the
redundancy. The system may be operating with either of the two
servers being designated as the Master or the Back-up server.
`
[0057] While the manner of operation of HOTSWAP is described in
the context of a single Master server and a single Back-up
server, it will be understood by persons skilled in the art
that the present invention may be adapted to support multiple
Master servers with multiple Back-up servers.
`
[0058] HOTSWAP operates on both the Master and the Back-up
servers. The same server application program also runs on the
Master and the Back-up. Both the Master and the Back-up servers
receive the same input from the network. The Master and Back-up
server application programs will be able to maintain the same
Application State if they receive the same sequence of inputs
from the client commencing at the time of server start-up.
`
`[0059] Both Master and Back-up servers receive the same
`input from the client. The Back-up sends its output to the
`Master. The Master receives and verifies the Back-up’s
`output and forwards it on to the client. The Back-up pro-
`duces the same output as the Master so that the Back-up is
`able to replace the Master at any time without the client’s
`intervention or knowledge.
`
[0060] FIG. 3 shows a detailed view of the present invention
controlling a Master (20) and Backup server (22) and their
connection (14) to a client (10). The two independent servers
(20) and (22) that share network connection (14) are configured
to run identical HOTSWAP programs (24) and (26). One computer,
shown here as (20), will become the Master and one, shown here
as (22), becomes the Backup. Each computer starts its own copy
of the HOTSWAP program. The two HOTSWAP programs establish a
connection (27) with each other. HOTSWAP negotiates the roles
of Master (20) and Backup (22), and the unique network address
they will share. The Backup synchronizes its file system (28)
with the Master’s file system (30). Master server and Backup
server start their own server application programs (32) and
(34) respectively and begin accepting network connections (14)
from the client (10).
`
[0061] Client (10) establishes a connection to the Master and
the Backup servers using their identical shared network
address. The Master and Backup HOTSWAP programs (24) and (26)
respectively accept the new connection and forward the
connection to their local server application programs (32) and
(34). Both copies of the server application program process the
client’s requests but only the Master’s output (15) is sent to
the client. The Backup HOTSWAP program (26) discards its output
(36) as long as it observes the Master producing the same
output as the Back-up. The Master and Backup HOTSWAP programs
maintain their connection (27) with each other. If one detects
an internal error, such as failure to respond to a client
request, or if their output disagrees, or if a System Call
fails on one computer but succeeds on the other, then it will
invoke fail over. Fail over is when the faulty server
terminates and the other non-faulty server continues. The
surviving server becomes the Master (if it wasn’t already) and
continues processing client requests.
`
[0062] Each HOTSWAP program monitors its respective server and
the network traffic between that server and its clients to
ensure both Master and Backup servers are receiving the same
input from the client and producing the same output. HOTSWAP
maintains server synchronization by controlling the inputs to
its respective server. If two servers start in the same initial
state and receive the same input, they should produce the same
output. HOTSWAP controls the inputs to its respective server by
controlling that server’s System Calls and the Network
Connection State. Transparent fault protection requires
synchronizing both Network Connection State and Application
State between Master and Back-up. Synchronizing state between
two running applications is difficult. The overhead of
communication between the Master and Backup programs can be
prohibitive, degrading the performance of the application so
much that it is not usable. HOTSWAP takes the novel approach of
synchronizing only the initial state of the application server
programs and inputs to independent servers. This approach uses
less communications overhead. HOTSWAP requires that if both the
Master and Back-up receive the same input, they will produce
the same output. The process of controlling the input of Master
and Backup servers is referred to as synchronizing their
Application State.
`
[0063] To synchronize Application State, the Master records its
output and then verifies that the Back-up produces the same
output. HOTSWAP assumes that if the Master and Back-up receive
the same client input, and have been started in the same
initial state, they will naturally maintain the same
Application State and produce the same output.
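
The record-and-verify step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name `OutputVerifier` and its methods are hypothetical, and the outputs are compared as byte streams rather than chunk by chunk because the two servers may emit the same bytes in differently sized pieces.

```python
class OutputVerifier:
    """Compare two output byte streams that may arrive in different chunk sizes."""

    def __init__(self) -> None:
        self._master = bytearray()  # Master output not yet matched
        self._backup = bytearray()  # Back-up output not yet matched
        self.diverged = False       # set once the streams disagree

    def _compare(self) -> None:
        # Compare only the prefix that both sides have produced so far.
        n = min(len(self._master), len(self._backup))
        if self._master[:n] != self._backup[:n]:
            self.diverged = True
        # Drop the compared prefix; keep the surplus of whichever side is ahead.
        del self._master[:n]
        del self._backup[:n]

    def master_wrote(self, chunk: bytes) -> None:
        self._master += chunk
        self._compare()

    def backup_wrote(self, chunk: bytes) -> None:
        self._backup += chunk
        self._compare()


verifier = OutputVerifier()
verifier.master_wrote(b"HTTP/1.1 200 OK\r\n")
verifier.backup_wrote(b"HTTP/1.1 ")   # same bytes, split differently
verifier.backup_wrote(b"200 OK\r\n")
```

After these three calls `verifier.diverged` is still false. In a fuller implementation, a true value would be the point at which fail over is considered.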
`
`[0064] However, the Master may receive input from non-
`deterministic outside events. For example:
`
[0065] All programs run under multitasking operating systems
which rely on hardware interrupts to schedule tasks. The order
in which each task gets the processor, and for how long, is not
deterministic;

[0066] The operating system may deliver asynchronous signals to
a process at different points in execution. Two programs will
not receive the same signal at the same stage of processing;

[0067] Different scheduling and event handling can cause the
operating system to process network traffic in different order.
In particular:

[0068] New connections may be accepted in any order;

[0069] Packets may be lost at one host but not on another;

[0070] Outgoing packets will be assembled in different sized
chunks due to buffering, timing, and retries;

[0071] The clocks on two hosts can never be completely
synchronized, and scheduling will never guarantee that two
programs read the clock at the same moment;

[0072] Operating systems supply arbitrary ids for system
objects, for example the process IDs returned by fork( ),
wait( ), and getpid( ). The Master and Back-up processes will
have different process ids;

[0073] Programs may access hardware-specific files such as:

[0074] /dev/urandom: the system hardware random device;

[0075] /proc/*: a Linux file system which represents the
kernel’s view of processes by process ID;

[0076] Some programs may depend on uninitialized memory for
input (intentionally or not).
`
[0077] Many of these sources of nondeterminism come from the
operating system (38) and (40) itself through system calls like
time( ), fork( ), getpid( ), read( ), etc. HOTSWAP reduces
nondeterminism by synchronizing network traffic and system
calls.
`
[0078] Encrypted network connections provide an example of the
problem of replicating non-deterministic system calls to
synchronize Application State. When the client connects, the
Master server computes a random encryption key by using
pseudo-random inputs like the current time, the server’s
process ID, and possibly a hardware random number generator. If
any of these inputs are different, the Backup server will
compute a different encryption key and fail to establish the
same connection to the client. HOTSWAP captures and replicates
system calls to get the current time, process ID, and random
number devices so the Backup will have the same inputs to its
random key generator as the Master, and thus both will compute
the same encryption key.
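
The encryption-key example can be illustrated with a short Python sketch. The disclosure does not say how a server derives its key, so `derive_key` below (a SHA-256 hash over the three inputs named in the text) and all of the input values are assumptions made purely for illustration.

```python
import hashlib


def derive_key(now: float, pid: int, random_bytes: bytes) -> bytes:
    """Hypothetical key derivation from the three inputs named in the text."""
    material = f"{now!r}:{pid}:".encode() + random_bytes
    return hashlib.sha256(material).digest()


# Values the Master obtained from its own operating system (made up here).
master_inputs = (1057190400.25, 4312, bytes.fromhex("9f1c2ab4"))

# Unsynchronized case: the Back-up reads its own clock, PID and random device.
backup_local_inputs = (1057190400.31, 977, bytes.fromhex("03d5e610"))

# Synchronized case: the Back-up is handed the Master's results instead.
backup_replayed_inputs = master_inputs

master_key = derive_key(*master_inputs)
backup_key_unsynced = derive_key(*backup_local_inputs)
backup_key_synced = derive_key(*backup_replayed_inputs)
```

With its own local inputs the Back-up arrives at a different key. With the replayed inputs, `backup_key_synced` equals `master_key`, which is the property the paragraph relies on.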
`
`[0079] Synchronizing the Application State
`
[0080] HOTSWAP overcomes inherent non-determinism by ensuring
that the file systems of the Master and Back-up are identical
before starting the servers. Non-deterministic System Calls are
intercepted by HOTSWAP and synchronized on the Master and the
Back-up. This ensures that the Master and the Back-up receive
the same results from otherwise non-deterministic System Calls
and thus maintains the same Application State on both servers.
`
[0081] Synchronizing the initial states of the Master and
Back-up is accomplished by ensuring that the Master and Back-up
are relying upon the same executables, configuration files, and
data files. This is done by copying files from the Master to
the Back-up before starting the servers. When the Application
State of the Master and Back-up are synchronized, they will act
in an identical manner and reproduce writes to local files and
maintain exact duplicates of data files. In this manner, the
Back-up operating system (40) is able to maintain synchronicity
with the Master operating system (38) without using such
devices as a shared file server or similar back-up strategies.
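
One way to check that the copy described above has left both servers with the same files is to compare a digest of each directory tree. The Python sketch below is an assumption-laden illustration, not the disclosed mechanism: the function names are hypothetical, both trees are assumed to be reachable as local paths, and each file is read fully into memory for simplicity.

```python
import hashlib
import os


def tree_digest(root: str) -> str:
    """Hash relative paths and file contents under root in a stable order."""
    h = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            h.update(rel.encode() + b"\0")   # file name relative to root
            with open(path, "rb") as f:
                h.update(f.read())           # file contents
            h.update(b"\0")
    return h.hexdigest()


def same_initial_state(master_root: str, backup_root: str) -> bool:
    """True when both trees hold the same files with the same contents."""
    return tree_digest(master_root) == tree_digest(backup_root)
```

A startup script might call `same_initial_state` on the Master's and Back-up's copies of the executables, configuration files and data files, and refuse to start the servers if it returns false.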
`
[0082] The System Call is a function call that is processed
ultimately by the operating system program. For example, on a
UNIX-based system programmed using C, all System Calls are made
available by “libc.so”, the shared system library. Different
operating systems provide different mechanisms for implementing
system calls. System Calls may be intercepted so that one
program can divert the course of a system call before it gets
into the operating system. There are several techniques for
intercepting system calls depending on the operating system.
For example, system calls may be intercepted within the
operating system, just before they get to the operating system,
before they get to libc, or just before the application invokes
the system call.
`
[0083] FIG. 4 shows the details of how HOTSWAP synchronizes a
server’s application state by capturing local system calls.
Server application programs (32) and (34) gain input from the
local system by executing system calls (41) and (43) to open
and read files, get the current date, etc. When a server
invokes a local system call, HOTSWAP’s synchronization library
HOTSHIM (44) and (46) catches the call and ensures the Master
and Backup server application programs receive the same result.
`
[0084] The Master HOTSWAP (24) invokes the system call (50) on
its local operating system (52) and sends (54) the result to
HOTSWAP (25) on the Backup. The Backup waits for the Master’s
result. Both Master and Backup servers receive the Master’s
result and send it (56) and (58) to their respective server
application programs (32) and (34).
`
[0085] If a system call fails on the Master but succeeds on the
Backup, the Backup may invoke fail over.
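
The exchange described in the preceding paragraphs, where the Master executes the call and the Backup waits for and reuses the Master's result, can be sketched in Python. The `SyscallChannel` class is hypothetical: it stands in for the link between the two HOTSWAP programs with an in-process queue, whereas the real servers run on separate machines.

```python
import queue
import time


class SyscallChannel:
    """Hypothetical in-process stand-in for the Master-to-Backup link."""

    def __init__(self) -> None:
        self._results = queue.Queue()

    def master_call(self, name, func, *args):
        # The Master really executes the call, then forwards the result.
        result = func(*args)
        self._results.put((name, result))
        return result

    def backup_call(self, name, timeout=1.0):
        # The Backup does not execute the call; it waits for the Master.
        try:
            sent_name, result = self._results.get(timeout=timeout)
        except queue.Empty:
            raise TimeoutError(f"no result from Master for {name}()")
        if sent_name != name:
            # The two servers issued different calls: they have diverged.
            raise RuntimeError(f"expected {name}(), Master ran {sent_name}()")
        return result


channel = SyscallChannel()
t_master = channel.master_call("time", time.time)
t_backup = channel.backup_call("time")
```

Both application copies now see the same clock value. In this sketch the two exceptions mark the places where an implementation might decide to invoke fail over; that mapping is an assumption, since the text only states that fail over may follow a mismatch.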
`
[0086] The method for intercepting system calls depends on the
specific mechanism that the operating system uses for
implementing system calls. The present invention may use any
appropriate mechanism for intercepting system calls. Current
techniques for intercepting system calls are: (a) inserting a
library between the server and system libraries, (b)
redirecting function calls within the running server, or (c)
modifying the system itself.
`
[0087] The synchronization of System Calls can be effected by a
variety of means such as modifying the operating system call
entry point, utilizing an external debugger, dynamic code
patching, and LD_PRELOAD. HOTSWAP uses LD_PRELOAD in a LINUX
operating system as shown in FIG. 5.
`
[0088] FIG. 5 shows the details of how HOTSWAP (24) on the
Master server uses LD_PRELOAD to achieve system call capture on
the Linux operating system. A server application program (32)
consists of code modules (62) which make system calls (41),
such as the time( ) function (66). The Linux operating system
provides a dynamic linker (68) that connects the system call
from the server application program (32) to the system library
(70). The system library (70) passes the call to the operating
system (72), also known as the kernel. The Linux dynamic linker
(68) provides a mechanism known as LD_PRELOAD (74) which allows
the insertion of a “shim” library (76) between the server
module (32) and the system library (70). HOTSWAP commands the
LD_PRELOAD mechanism to intercept system calls for running
servers before they get to the system library. Once the System
Call is intercepted the Master and Back-up exchange the System
Call information as shown in FIG. 4.
`
`[0089] Synchronizing the Network Connection State
`
[0090] A master computer may fail while clients are actively
connected to its server application program. Transparent fail
over requires that the backup computer must continue the client
connection without interruption. Other systems for fault
tolerance have limited ability to continue client connections
on failover. Continuing client connections requires
synchronizing the state of the conversation between client and
server as well as synchronizing the state of its network
connection. HOTSWAP’s ability to preserve network connections
makes it suitable for both transaction-oriented and continuous
connections. This is one advantage of the present invention.
`
`[0091] A client establishes a network connection to a
`server by executing network system calls to the client’s
`operating system. The client’s and server’s operating sys-
`tems provide a network layer which encapsulates their
`conversation within a network protocol. A network protocol
`breaks a conversation into a sequence of network packets,
`which are routed and reassembled. The network protocol
`uses state variables in each packet to reassemble packets into
`the original conversation. The network layers within the
`client and server operating systems negotiate the state of the
`network protocol when the connection is e