(10) Patent No.: US 6,321,323 B1
(45) Date of Patent: Nov. 20, 2001

Nugroho et al.

US006321323B1

`(54) SYSTEM AND METHOD FOR EXECUTING
`PLATFORM-INDEPENDENT CODE ON A
`CO-PROCESSOR
`
(75) Inventors: Sofyan I. Nugroho, Sunnyvale; Anil K. Srivastava;
Rohit Valia, both of Redwood City, all of CA (US)
`
`(73) Assignee: Sun Microsystems, Inc., Palo Alto, CA
`(US)
`
(*) Notice: Subject to any disclaimer, the term of this patent is
extended or adjusted under 35 U.S.C. 154(b) by 0 days.
`
`(21) Appl. No.: 08/884,186
`
(22) Filed: Jun. 27, 1997

(51) Int. Cl.7 ......... G06F 15/82
(52) U.S. Cl. .......... 712/34; 711/103
(58) Field of Search ... 709/1, 100-108, 709/400; 710/14, 22;
     712/1, 28, 30, 31, 34, 25, 26, 27; 717/4, 5; 711/5, 103;
     713/1, 2, 100
`
(56) References Cited

U.S. PATENT DOCUMENTS

5,313,586 *   5/1994  Rutman ............. 712/34
5,655,131 *   8/1997  Davies ............. 712/34
5,784,553 *   7/1998  Kolawa et al. ...... 714/38
5,875,336 *   2/1999  Dickol et al. ...... 717/5
5,920,720 *   7/1999  Toutonghi et al. ... 717/5
5,923,878 *   7/1999  Marsland ........... 717/4
5,937,193 *   8/1999  Evoy ............... 717/5
6,126,328 *  10/2000  Mallory et al. ..... 717/4

OTHER PUBLICATIONS

“Not Just Java”, by Peter van der Linden, SunSoft Press 1997,
pp. 92-271.

“Just Java”, by Peter van der Linden, SunSoft Press 1997,
pp. 340-350.

“The JIT Compiler API”, by Frank Yellin, Oct. 4, 1996, pp. 1-23.

“Inside Windows NT”, by Helen Custer, Microsoft Press, 1992,
pp. 15-30.

“The Java Virtual Machine Specification”, by Tim Lindholm et al.,
Sep. 1996, pp. 57-82.

“Rockwell Unveils a New Chip Created for Java Applications”, by
Frederick Rose, The Wall Street Journal Interactive Edition,
Sep. 22, 1997, 2 pages,
http://www.wsj.com/edition/current/articles/SB874891257730788000.html.

“Remote Queues: Exposing Message Queues for Optimization and
Atomicity” by Eric A. Brewer et al. 1995.*
`
`* cited by examiner
`
`Primary Examiner—St. John Courtenay, II
`(74) Attorney, Agent, or Firm—Park, Vaughan & Fleming
`LLP
`
(57) ABSTRACT

A system and method for executing platform-independent code on a
co-processor is described. The system includes a processor, a main
memory and the co-processor, each interconnected with each other.
The processor and the co-processor operate under control of an
operating system. A memory manager operatively coupled to the
operating system initializes a runtime environment including an
address space in the main memory for the platform-independent code.
A runtime shim operatively coupled to the operating system provides
the initialized runtime environment to the co-processor through the
operating system. The co-processor executes the platform-independent
code responsive to the runtime shim with reference to the address
space in the main memory.
`
`32 Claims, 9 Drawing Sheets
`
[Front-page drawing; recoverable labels: CO-PROCESSOR 34, TRANSLOGIC,
BUFFER, JAVA CLASS EEPROM, BUS INTERFACE]
`
`Google Exhibit 1057
`Google v. VirtaMove
`
`
`
`
U.S. Patent    Nov. 20, 2001    Sheet 1 of 9    US 6,321,323 B1

[FIG. 1: schematic of the system; recoverable labels: NETWORK,
DAUGHTER BOARD, MAIN MEMORY, JAVA APPLICATION, SUPPORT SERVICES,
OPERATING SYSTEM, VIDEO BOARD, BUS CONTROLLER, CONTROLLER,
SECONDARY STORAGE, I/O BOARD]
`
`
`
U.S. Patent    Nov. 20, 2001    Sheet 2 of 9    US 6,321,323 B1

[FIG. 2: schematic of the daughter board; recoverable labels:
BUS INTERFACE, BUS, ROM, JAVA CLASS EEPROM, DVMA, BUFFER,
TRANSLOGIC, CO-PROCESSOR]
`
`
`
`
`
`
U.S. Patent    Nov. 20, 2001    Sheet 3 of 9    US 6,321,323 B1

[FIG. 3: functional block diagram of the operational components;
recoverable labels: JAVA RUNTIME SHIM, CLASS LOADER, MEMORY MANAGER,
ALLOCATOR, CODE LOADER, BYTE CODES, JAVA APPLICATION, SYSTEM
SERVICES, DEVICE DRIVER, CO-PROCESSOR, CORE CLASS, APPLICATION
SEGMENT, JVM]
`
`
`
`
U.S. Patent    Nov. 20, 2001    Sheet 4 of 9    US 6,321,323 B1

FIG. 4

START
60  JAVA APPLICATION INVOKED
61  JAVA RUNTIME SHIM INITIALIZES JAVA APPLICATION INVOCATION
62  JAVA RUNTIME SHIM NOTIFIES CO-PROCESSOR
63  CO-PROCESSOR EXECUTES JAVA APPLICATION BYTE CODE
END
`
`
`
U.S. Patent    Nov. 20, 2001    Sheet 5 of 9    US 6,321,323 B1

FIG. 5

61  RECEIVING INVOCATION
70  LOAD NECESSARY OBJECT CLASSES EXCEPT CORE OBJECT CLASSES
71  ALLOCATE JAVA APPLICATION ADDRESS SPACE IN MAIN MEMORY
72  LOAD JAVA APPLICATION INTO ALLOCATED ADDRESS SPACE
73  LOCK ALLOCATED ADDRESS SPACE
`
`
`
U.S. Patent    Nov. 20, 2001    Sheet 6 of 9    US 6,321,323 B1

FIG. 6

62  NOTIFYING CO-PROCESSOR
80  OPEN JAVA CO-PROCESSOR DEVICE DRIVER AND SEND RUNTIME INFO,
    INCLUDING JAVA APPLICATION ADDRESS SPACE, FROM JAVA RUNTIME SHIM
81  INTERRUPT AND CONTEXT SWITCH CO-PROCESSOR VIA JAVA CO-PROCESSOR
    DEVICE DRIVER
82  SEND RUNTIME INFO, INCLUDING JAVA APPLICATION ADDRESS SPACE,
    TO CO-PROCESSOR
RETURN
`
`
`
U.S. Patent    Nov. 20, 2001    Sheet 7 of 9    US 6,321,323 B1

FIG. 7

63  EXECUTING JAVA APPLICATION BYTECODE
90  SET UP DVMA POINTER TO ALLOCATED ADDRESS SPACE
91  PERFORM BYTE CODE VERIFICATION
92  JAVA-TYPE CO-PROCESSOR? (decision)
      NO:  93  PERFORM BYTE CODE TRANSLATION
      YES: continue to 94
94  FETCH AND EXECUTE BYTE CODE USING DVMA POINTER
RETURN
`
`
`
U.S. Patent    Nov. 20, 2001    Sheet 8 of 9    US 6,321,323 B1

FIG. 8

94   FETCH AND EXECUTE
100  RETRIEVE A BYTE CODE INSTRUCTION
101  REFERENCE TO CORE CLASS? (decision)
102  PROCESS REFERENCE USING DYNAMIC LINKING TO CORE CLASS LIBRARY
103  SYSTEM SERVICE CALL? (decision)
104  PROCESS SYSTEM SERVICE CALL INTERRUPT
105  EXECUTE BYTE CODE INSTRUCTION
RETURN
`
`
`
U.S. Patent    Nov. 20, 2001    Sheet 9 of 9    US 6,321,323 B1

FIG. 9

104  PROCESS INTERRUPT
110  CO-PROCESSOR SENDS SYSTEM SERVICE CALL INTERRUPT TO JAVA
     CO-PROCESSOR DEVICE DRIVER
111  SYSTEM SERVICE CALL SENT TO JAVA RUNTIME SHIM FROM JAVA
     CO-PROCESSOR DEVICE DRIVER
112  JAVA RUNTIME SHIM MAKES SYSTEM SERVICE CALL TO SYSTEM SERVICES
113  PERFORM SYSTEM SERVICE CALL
114  JAVA RUNTIME SHIM NOTIFIES JAVA CO-PROCESSOR DEVICE DRIVER UPON
     SYSTEM SERVICE CALL COMPLETION
115  JAVA CO-PROCESSOR DEVICE DRIVER NOTIFIES CO-PROCESSOR OF SYSTEM
     SERVICE CALL INTERRUPT COMPLETION
RETURN
`
`
`
`SYSTEM AND METHOD FOR EXECUTING
`PLATFORM-INDEPENDENT CODE ON A
`CO-PROCESSOR
`
`FIELD OF THE INVENTION
`
The present invention relates in general to platform-independent
code and, in particular, to a system and method for executing
platform-independent code on a co-processor.

BACKGROUND OF THE INVENTION
`
Software developers often strive to tailor or “port” their
applications to a variety of computing platforms to achieve a wider
user base and increased product acceptance. However,
system-dependent variables, such as microprocessor type and
operating system, make porting a difficult task. Moreover, ported
applications must thereafter be supported in each computing
platform-specific environment. Consequently, the overall product
cost, including porting and support, must be weighed against the
potential gains in the marketplace.

An increasingly preferred alternative to porting customized
applications is to write software in a platform-independent
programming language, such as the Java™ programming language
(hereinafter “Java”). Java™ is a trademark of Sun Microsystems,
Inc., Mountain View, Calif. Writing in Java enables developers to
create programs for diverse computing platforms independent of the
particular microprocessors or operating systems used. Applications
written in Java (hereinafter “Java programs”) can be utilized over a
wide spectrum of computers, both as applications embedded within web
pages, called “applets,” and as applications which run stand-alone
or over a distributed environment.
`
The Java program code is first “compiled” into platform-independent
bytecode. During runtime, the bytecode is “executed.” Presently, two
forms of interpreters for executing bytecode are used. The first
form of interpreter is a software interpreter for executing bytecode
on a line-by-line basis, such as the Java virtual machine (JVM)
described in T. Lindholm & F. Yellin, “The Java Virtual Machine
Specification,” Addison-Wesley (1997), the disclosure of which is
incorporated herein by reference. The JVM is an application program
functionally interposed as a layer between the Java program and the
native operating system and hardware. However, the JVM results in a
significant performance degradation, potentially causing a slow-down
of up to fifty times that of a comparable C or C++ programming
language application.

The other form of bytecode interpreter is a native instruction
translator, such as the Just-In-Time (JIT) compiler described in
F. Yellin, “The JIT Compiler API,”
ftp://ftp.javasoft.com/docs/jit_interface.pdf, Oct. 4, 1996, the
disclosure of which is incorporated herein by reference. The JIT
compiler translates the bytecode into native machine instructions to
achieve near native code execution speeds. However, a one-time
computation cost is incurred each time an application is run,
thereby causing overall slower execution than applications compiled
directly into native machine instructions.
`
`Therefore, there is a need for a system and method for
`accelerating execution of platform-independent code which
`avoids the slower performance of a JVM and JIT compiler.
`Preferably, such a system and method would operate con-
`currently and independently of the main processor using a
`co-processor.
`SUMMARY OF THE INVENTION
`
The present invention enables the above problems to be substantially
overcome by providing a system and method for executing
platform-independent code using a co-processor. Platform-independent
code is intercepted at an application layer, an interrupt for a
co-processor is generated and the platform-independent program code
is executed by the co-processor.
An embodiment of the present invention is a system and method for
executing platform-independent code on a co-processor. The system
includes a processor, a main memory and the co-processor, each
interconnected with each other. The processor and the co-processor
operate under control of an operating system. A memory manager
operatively coupled to the operating system initializes a runtime
environment including an address space in the main memory for the
platform-independent code. A runtime shim operatively coupled to the
operating system provides the initialized runtime environment to the
co-processor through the operating system. The co-processor executes
the platform-independent code responsive to the runtime shim with
reference to the address space in the main memory.
A further embodiment of the present invention is an apparatus for
efficiently executing platform-independent code in a computer
system. The computer system includes a processor and a main memory
with each interconnected with each other. Interfacing logic
interconnects the apparatus with the processor and the main memory
and includes channels for exchanging control, data and address
signals with the processor and the main memory. A co-processor
executes the platform-independent code in coordination with but
independently from the processor. A buffer is interconnected with
the co-processor and includes a plurality of storage locations in
which are staged segments of the platform-independent code prior to
execution by the co-processor. A direct memory access (DMA)
controller is interconnected with the buffer and interfaces directly
to the main memory through the interfacing logic. The DMA controller
stages the segments of the platform-independent code into the buffer
from the main memory. A bus internal to the apparatus interconnects
the interfacing logic, the co-processor, the direct memory access
controller, the programmable read only memory and the read only
memory. The interfacing logic provides the control, data and address
signals over the internal bus.
A further embodiment of the present invention is a method using a
computer for facilitating execution of platform-independent program
code on a co-processor. The computer includes a processor, a main
memory and the co-processor with each interconnected with each
other. A runtime environment including an address space in the main
memory in which is stored the platform-independent program code is
initialized. The co-processor is notified to begin execution of the
platform-independent program code including being provided the
address space in the runtime environment to the co-processor.
Execution of the platform-independent program code by the
co-processor with independent execution of other program code by the
processor is coordinated and the main memory between the address
space in the runtime environment and the main memory used by the
processor is managed.
Still other embodiments of the present invention will become readily
apparent to those skilled in the art from the following detailed
description, wherein is shown and described only the embodiments of
the invention by way of illustration of the best modes contemplated
for carrying out the invention. As will be realized, the invention
is capable of other and different embodiments and several of its
details are capable of modification in various obvious respects, all
without departing from the spirit and scope of the present
invention. Accordingly, the drawings and detailed description are to
be regarded as illustrative in nature and not as restrictive.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIG. 1 is a schematic diagram of a system for accelerating
`execution of platform-independent program code on a
`co-processor according to the present invention.
`FIG. 2 is a schematic diagram of a daughter board
`including the co-processor for use in the system of FIG. 1.
`FIG. 3 is a functional block diagram of the operational
`components used by the system of FIG. 1.
`FIG. 4 is a flow diagram of a method for accelerating
`execution of platform-independent program code on a
`co-processor according to the present invention.
`FIG. 5 is a flow diagram of a routine for initializing an
`application invocation for use in the method of FIG. 4.
`FIG. 6 is a flow diagram of a routine for notifying the
`co-processor of a new pending application for use in the
`method of FIG. 4.
`
FIG. 7 is a flow diagram of a routine for executing the
application on the co-processor for use in the method of FIG. 4.

`FIG. 8 is a flow diagram of a routine for fetching and
`executing a bytecode instruction on the co-processor for use
`in the routine of FIG. 7.
`
`FIG. 9 is a flow diagram of a routine for processing an
`interrupt for use in the routine of FIG. 8.
`DETAILED DESCRIPTION
`
`I. System for Accelerating Execution of Platform-
`Independent Program Code
`
FIG. 1 is a schematic diagram of a system 10 for accelerating
execution of platform-independent program code, such as bytecodes 51
for a Java application 28 (described below), on a co-processor 34
(shown in FIG. 2 and included as part of a daughter board 29 shown
in FIG. 1) according to the present invention. The system 10, with
the exception of the daughter board 29, is a conventional programmed
digital computer. The individual components implementing the system
10 are interconnected with a central system bus 11 used for
exchanging addresses, data and control signals. Other forms of
component interconnections are possible. Access requests to the
system bus 11 are coordinated by a bus controller 12. A central
processing unit (CPU) 13 interconnected with the system bus 11
controls the execution of the system 10. A main memory 14 also
interconnected with the system bus 11 stores data and instructions
for execution by the CPU 13.
A plurality of peripheral components can be interconnected via
system bus 11, including a network interface controller (NIC) 8 for
interconnecting the system 10 with a network 9 for exchanging data
and control signals transmitted as a data signal in a carrier wave;
a video board 15 for displaying program output via a monitor 16; an
input/output (I/O) board 17 for providing user input devices, such
as a keyboard 18 and mouse 19; and a controller 20 connected to
secondary storage device 21, such as a hard disk or tape drive unit.
The system 10 can also include devices for accepting
computer-readable storage medium (not shown). Finally, expansion
cards can be plugged into the system bus 11 for providing additional
functionality to the system 10, such as a daughter board 29 with a
co-processor for executing bytecode at substantially near native
instruction execution speed. The daughter board 29 is further
described hereinbelow with reference to FIG. 2.
`
Upon boot-up of the system 10, the operating system 24 and support
services 25, such as device drivers and related interfaces, are
loaded into main memory 14. The main memory area occupied by the
operating system 24 and support services 25 is generally referred to
as kernel space. Thereafter, the system 10 under the control of the
CPU 13 runs application programs, such as a Java virtual machine
(JVM) 26, Just-In-Time (JIT) compiler 27 and Java application 28.
The main memory area occupied by the application programs is
generally referred to as user space. The program code for each
application program is first retrieved from the secondary storage 21
and stored into main memory 14 for execution by the CPU 13.
In the described embodiment, the system 10 is an IBM-PC compatible
microcomputer running the Windows NT operating system environment.
However, use of the methods described and suggested herein is not
limited to a particular computer configuration. The system bus 11 is
a Peripheral Component Interconnect (PCI) bus, although other types
of system buses, such as industry standard architecture (ISA), NuBus
and other buses, can be used. The system bus 11 is a 32-bit bus
operating at a speed determined by the system board. The bus
controller 12 is a standard bus interface, such as an Intel bus
controller for a PCI bus. Finally, the CPU 13 is an Intel i86 or
compatible microprocessor, such as a Pentium microprocessor. Windows
95, Intel and Pentium are trademarks or registered trademarks of
their respective owners.
`
FIG. 2 is a schematic diagram of the daughter board 29, including a
co-processor 34, for use in the system of FIG. 1. The daughter board
is removably interconnected with the system bus 11 via an expansion
slot (not shown). The individual components on the daughter board 29
are internally interconnected with a bus 30. Bus interface logic 31
interfaces the bus 30 of the daughter board 29 and the system bus 11
by providing channels for exchanging control, data and address
signals with the CPU 13 and the main memory 14. The bus interface
logic 31 is a standard component for interfacing an expansion card
to the system bus 11, such as the SIS85C50X PCI chipset,
manufactured by Intel Corporation, Santa Clara, Calif.
The co-processor 34 is interconnected with the bus 30 for executing
the Java application 28 in coordination with but independently from
the CPU 13, as further described hereinbelow with reference to
FIG. 3 et seq. In one embodiment of the present invention, the
co-processor 34 is a microprocessor for directly executing Java
programs using bytecodes as its native instruction set, such as the
picoJava microprocessor manufactured and licensed by Sun
Microsystems, Inc., Mountain View, Calif. The picoJava
microprocessor is described in P. van der Linden, “Not Just Java,”
p. 271, Sun Microsystems Press (1997), the disclosure of which is
incorporated herein by reference. In a further embodiment of the
present invention, the co-processor 34 is a non-native Java
microprocessor 32, such as an Intel i86 microprocessor or compatible
or MicroSPARC™ microprocessor, coupled to translation logic 33 for
translating Java bytecodes into the instruction set specific to the
CPU 32. MicroSPARC™ is a trademark of Sun Microsystems, Inc.,
Mountain View, Calif. The two embodiments of microprocessor logic
will be referred to hereinafter generally as co-processor 34 and
include both native and non-native Java bytecode instruction set
microprocessors.
Several additional components make up the daughter board 29. First,
direct virtual memory access (DVMA) logic 37 is interconnected with
a buffer 38 and the bus 30 for directly accessing the main memory 14
via the system bus 11. DVMA logic 37 could also be conventional
direct memory access (DMA) logic. In turn, the buffer 38 is
interconnected with the co-processor 34 and is used for caching
segments of the Java application 28 prior to execution. The buffer
38 includes a plurality of storage locations (not shown) in which
are staged by the DVMA logic 37 segments of the Java application 28.
In addition to staging program segments, the DVMA logic 37 frees the
CPU 13 (shown in FIG. 1) from performing memory accesses for the
co-processor 34 and enables the co-processor 34 to avoid memory
contention with the CPU 13 while allocating and locking the main
memory 14 via the operating system 24. In the described embodiment,
the buffer 38 includes a level two cache and a cache controller that
is conventional in the art. In a further embodiment of the present
invention, the DVMA logic 37 could be replaced by conventional DMA
logic for providing direct memory access.
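
The staging behavior described above can be pictured with a small
model. The following is only a sketch under stated assumptions: the
class name StagingBuffer, the segment size, the slot count and the
oldest-first eviction rule are all hypothetical and are not taken from
the described embodiment, which does not specify how the buffer 38 is
managed.

```python
from collections import OrderedDict


class StagingBuffer:
    """Toy model of a buffer that holds a few fixed-size code segments."""

    def __init__(self, main_memory: bytes, segment_size: int = 4, slots: int = 2):
        self.main_memory = main_memory    # stands in for the main memory
        self.segment_size = segment_size  # illustrative value (assumption)
        self.slots = slots                # number of storage locations (assumption)
        self.staged = OrderedDict()       # segment index -> bytes, oldest first
        self.transfers = 0                # segments copied by the modeled DMA logic

    def _stage(self, index: int) -> bytes:
        # Copy one segment from main memory into the buffer.
        start = index * self.segment_size
        segment = self.main_memory[start:start + self.segment_size]
        if len(self.staged) >= self.slots:
            self.staged.popitem(last=False)  # evict the oldest segment (assumption)
        self.staged[index] = segment
        self.transfers += 1
        return segment

    def fetch(self, address: int) -> int:
        # Return the byte at `address`, staging its segment on a miss.
        index, offset = divmod(address, self.segment_size)
        segment = self.staged.get(index)
        if segment is None:
            segment = self._stage(index)
        return segment[offset]


code = bytes(range(10, 22))                 # 12 bytes of stand-in "bytecode"
buf = StagingBuffer(code)
fetched = [buf.fetch(a) for a in range(6)]  # touches segments 0 and 1 only
```

In this model, six sequential fetches cost two segment transfers
rather than six separate main memory accesses, which is the benefit
the passage attributes to staging.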
Also, a Java class electronically-erasable, programmable read only
memory (EEPROM) 36 is interconnected with the bus 30 for storing
Java core classes for use by the co-processor 34. The EEPROM 36
includes a plurality of storage locations (not shown) in which are
stored executable core program segments for Java core classes. These
core Java classes include those classes specified in the core
application programming interface (API) which must be supported by
every Java system, such as described in P. van der Linden, “Just
Java,” pp. 340-350, 2d ed., Sun Microsystems Press (1997), the
disclosure of which is incorporated herein by reference. The Java
class EEPROM 36 stores bytecodes or pre-compiled native object code
for Java core classes, depending upon whether a native or non-native
Java co-processor 34 is used.
Using the EEPROM 36 has several benefits. First, the Java core
classes are staged in the EEPROM 36, thereby avoiding the need to
fetch each core class member from the main memory 14 or secondary
storage 21. Thus, core class accesses are faster. Second, the EEPROM
36 allows upgrading of the Java core classes via a download of new
firmware codes. Finally, the EEPROM 36 creates a more secure
computing environment by preventing spoofing of the core classes.
The Java core classes are limited to those classes stored on the
Java class EEPROM 36 and thus are immune from unauthorized
replacement. In the described embodiment, the Java class EEPROM 36
is a two- or four-megabyte memory device.
Finally, a read-only memory (ROM) 35 is interconnected with the bus
30 for specifying the behavior of the co-processor 34 via microcode
instructions. The ROM 35 includes a plurality of storage locations
(not shown) in which are stored the microcode instructions. In the
described embodiment, the behavior is based on a hardware
implementation of the JVM 26 and the ROM 35 is a one-megabyte memory
device.
`
`II. System Operational Components
`
FIG. 3 is a functional block diagram of the operational components
40 used by the system of FIG. 1. Each operational component 40
represents a sequence of process steps embodied preferably in
software or firmware which lead to a desired result presented
largely in functional terms of methods and symbolic representations
of operations on data bits within a programmed digital computer and
similar devices. The arrows interconnecting each operational
component 40 generally indicate a flow of data or control
information between the respective operational components 40. As
would be clear to one skilled in the art, the process steps can be
embodied as code for a computer program for operation on a
conventional programmed digital computer, such as system 10 (shown
in FIG. 1). The program code can be embodied as a computer program
on a computer-readable storage medium or as a data signal in a
carrier wave transmitted over network 9.
`
Briefly, the Java application 28 (shown in FIG. 1) is made up of
bytecodes 51, preferably for Java, but could also be any form of
executable, platform-independent program code. Conventionally, the
bytecodes 51 are interpreted by the JVM 26 or JIT 27 interfacing the
operating system 24 using the CPU 13 directly (shown in FIG. 1). An
embodiment of the present invention replaces the JVM 26 and JIT 27
with a Java runtime shim (“shim”) 41 which enables the bytecodes to
be interpreted by the co-processor modules 43 using the co-processor
34 (shown in FIG. 2). The individual operational components 40 will
now be described, starting with the memory space used by the CPU 13
(shown in FIG. 1).
`
The memory space of main memory 14 is functionally divided into two
types of space: user space and kernel space. The separation of user
space and kernel space is indicated by dotted line 44. Other
functional divisions of the memory space are possible. The bytecodes
51 for the Java application 28, the JVM 26, the JIT 27 and a Java
runtime shim 41 reside in the user space. The operating system 24,
including a Java co-processor device driver 42 (described below),
the system services 25 and a hardware abstraction layer 45, reside
in the kernel space.
Within the operating system 24, the hardware abstraction layer (HAL)
45 provides an optional interface layer between the individual
device drivers, such as the Java co-processor device driver 42, and
the physical hardware components of the system 10 (shown in FIG. 1).
In the described embodiment, the HAL 45 is part of the Windows NT
operating system environment, such as described in H. Custer,
“Inside Windows NT,” Microsoft Press (1992), the disclosure of which
is incorporated herein by reference. In an alternate embodiment, the
HAL 45 is replaced by a device driver architecture, such as used in
the Windows 95 operating system environment. Windows NT and Windows
95 are trademarks of their respective holders. The present
discussion assumes the functionality of the HAL 45 is transparent to
the Java co-processor device driver 42.
As conventional in the art, each Java application 28 is initially
implemented as Java source code (not shown) which is compiled into
bytecodes 51 using a compiler (not shown). Bytecodes 51 are a form
of platform-independent program code for operation on a plurality of
microprocessors in an architecturally neutral fashion. Unlike
conventional object code which is generated for a particular
processor, bytecodes 51 are executed at a level slightly higher than
object code. However, bytecodes 51 can be executed without further
compilation or modification conventionally using either the JVM 26
or JIT compiler 27. Moreover, bytecodes 51 are not limited to Java
applications 28 and can include applications written in other
programming languages compilable into valid bytecodes 51, such as
described in P. van der Linden, “Not Just Java,” p. 92, Sun
Microsystems Press (1997), the disclosure of which is incorporated
herein by reference. Upon invocation of a Java program, the CPU 13
loads the JVM 26 or JIT 27 into the main memory 14 (shown in FIG. 1)
for execution.
Currently, the JVM 26 is an interpreter executed at runtime for
operating on the bytecodes 51 in a line-by-line manner. The JVM 26
implements a simple stack machine (not shown) for translating the
bytecodes 51 into the native instruction set of the CPU 13. Since
the JVM 26 is itself an application program, the JVM 26 operates in
user space as a non-privileged process and does not receive the
higher execution priority given to an operating system 24 routine
executing in kernel space. As a result, a Java application 28 can
run as much as fifty times slower than an application program
written in native object code for the CPU 13. Thus, execution is
dependent on the speed of translation of the bytecodes 51 by the JVM
26.
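
To make the notion of a simple stack machine concrete, the following
is a minimal sketch of a line-by-line interpreter loop. It is an
illustration only: the four opcodes and their encodings are
hypothetical and are not real Java bytecodes, and the loop is far
simpler than an actual JVM.

```python
# Hypothetical opcodes for illustration; these are NOT real JVM opcodes.
PUSH, ADD, MUL, HALT = range(4)


def interpret(bytecode):
    """Execute a tiny stack-machine program one instruction at a time.

    Returns the value left on top of the stack and the number of
    instructions dispatched.
    """
    stack = []
    pc = 0      # program counter into the bytecode list
    steps = 0   # count of decoded instructions
    while True:
        op = bytecode[pc]
        steps += 1
        if op == PUSH:
            stack.append(bytecode[pc + 1])  # operand follows the opcode
            pc += 2
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            pc += 1
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
            pc += 1
        elif op == HALT:
            return stack.pop(), steps
        else:
            raise ValueError(f"unknown opcode {op} at {pc}")


# (2 + 3) * 4
program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]
result, steps = interpret(program)
```

Every instruction passes through the decode-and-dispatch branch in
software, which suggests where per-instruction interpretation overhead
of the kind discussed above comes from.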
Like the JVM 26, the JIT 27 is also an application program running
in user space. However, the JIT 27 attempts to avoid the overall
performance penalty of line-by-line interpretation of the JVM 26 by
pre-compiling the Java application 28 into native machine
instructions prior to execution by the CPU 13. The actual execution
of the Java application 28 approaches near native execution speeds.
However, compilation and class library linking costs are incurred
each time the Java bytecodes 51 are executed, thereby resulting in
slower overall execution times.
According to an embodiment of the present invention, the system 10
(shown in FIG. 1) is modified by the following operational
components 40. First, the JVM 26 and JIT 27 are replaced by or, in a
further embodiment, augmented with the shim 41 for accelerating
execution of the bytecodes 51 on the co-processor 34. The purpose of
the shim 41 is to trap the bytecodes 51 for the Java application 28
under execution and coordinate their execution with the co-processor
34. The shim 41 does not execute the bytecodes 51 for the Java
application 28. Rather, the shim 41 sets up a runtime environment
for the co-processor 34 to execute the bytecodes 51 in parallel with
the CPU 13.
Functionally, the shim 41 includes a class loader 49 and a memory
manager 50. The class loader 49 loads and links any missing runtime
libraries and Java non-core object classes. The memory manager 50
initializes the runtime environment for the Java application 28. An
address space allocator 52 in the memory manager 50 sets up an
address space for the bytecodes 51 for the Java application 28 and
non-core class instances (not shown) in the main memory 14 (shown in
FIG. 1) while a code loader 53 loads the bytecodes 51 and the
non-core class instances into the address space using the operating
system 24. The memory manager 50 also ensures critical bytecode
segments are locked into place in the main memory 14. The
operational steps performed by the shim 41 and its related system
components, including the Java co-processor device driver 42 and
co-processor components 43, are further described hereinbelow with
reference to FIG. 4 et seq.
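
The division of labor inside the shim can be sketched as follows,
following the order of steps 70 through 73 shown in FIG. 5. This is a
simplified illustration, not the described implementation: the names
RuntimeShim, AddressSpace and initialize, the starting base address
and the boolean lock flag are all hypothetical, and a real shim would
allocate and lock memory through the operating system.

```python
from dataclasses import dataclass, field


@dataclass
class AddressSpace:
    """Hypothetical record of a region set aside for one application."""
    base: int
    size: int
    contents: bytearray = field(default_factory=bytearray)
    locked: bool = False


class RuntimeShim:
    """Toy shim: prepares a runtime environment but executes no bytecode."""

    def __init__(self, core_classes):
        self.core_classes = set(core_classes)  # classes assumed to live off-host
        self.next_base = 0x1000                # arbitrary starting address (assumption)

    def load_classes(self, needed):
        # Step 70: load the needed classes except the core classes.
        return sorted(c for c in needed if c not in self.core_classes)

    def allocate(self, size):
        # Step 71: set up an address space for the application.
        space = AddressSpace(base=self.next_base, size=size)
        self.next_base += size
        return space

    def load_code(self, space, bytecodes):
        # Step 72: load the bytecodes into the allocated address space.
        if len(bytecodes) > space.size:
            raise MemoryError("bytecodes do not fit in the address space")
        space.contents[:] = bytecodes

    def initialize(self, bytecodes, needed_classes):
        loaded = self.load_classes(needed_classes)
        space = self.allocate(len(bytecodes))
        self.load_code(space, bytecodes)
        space.locked = True  # Step 73: lock the allocated address space.
        return space, loaded


shim = RuntimeShim({"java.lang.Object", "java.lang.String"})
space, loaded = shim.initialize(b"\x01\x02\x03", ["java.lang.String", "app.Main"])
```

The returned address space is the piece of runtime information that
would subsequently be handed to the co-processor; the sketch stops
before that hand-off.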
`Second, the operating system 24 is augmented with a Java
`co-processor device driver 42. The purpose of the Java
`co-processor device driver 42 is to coordinate the processing
`of system service requests received from the co-processor
`modules 43 with the requested device in the system 10 via
`the system services 25 component of the operating system
`24 and to interact with the shim 41.
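
The round trip of a system service request can be sketched with three
small classes that mirror steps 110 through 115 of FIG. 9. This is a
hedged illustration only: the class and method names, the single
"write" service and the use of a shared list as a trace are
assumptions made for the example, and ordinary method calls stand in
for interrupts and kernel transitions.

```python
class SystemServices:
    """Stands in for the system services of the operating system."""

    def __init__(self, log):
        self.log = log

    def perform(self, name, payload):
        self.log.append(113)  # the system service call is performed
        if name == "write":   # hypothetical service: report bytes written
            return len(payload)
        raise NotImplementedError(name)


class RuntimeShim:
    def __init__(self, services, log):
        self.services = services
        self.log = log

    def handle(self, name, payload):
        self.log.append(111)  # the call arrives at the shim from the driver
        self.log.append(112)  # the shim makes the call to system services
        result = self.services.perform(name, payload)
        self.log.append(114)  # the shim notifies the driver of completion
        return result


class CoProcessorDriver:
    def __init__(self, shim, log):
        self.shim = shim
        self.log = log

    def service_interrupt(self, name, payload):
        self.log.append(110)  # the co-processor raises the interrupt
        result = self.shim.handle(name, payload)
        self.log.append(115)  # the driver notifies the co-processor
        return result


log = []
driver = CoProcessorDriver(RuntimeShim(SystemServices(log), log), log)
result = driver.service_interrupt("write", b"hello")
```

The trace records the numbered steps in the order the figure lists
them, with the driver acting purely as a relay between the
co-processor and the shim.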
`
Finally, co-processor components 43 are introduced for actually
executing the bytecodes 51 on the co-processor 34. The co-processor
components 43 include a microcoded JVM 46 interconnected with a Java
core class library 48 and a Java application segment 47. The
microcoded JVM 46 is executed using the microcode stored in the ROM
35 (shown in FIG. 2) for specifying a firmware implementation of the
functionality of a conventional JVM 26, such as described in
T. Lindholm & F. Yellin, “The Java Virtual Machine Specification,”
cited hereinabove, the disclosure of which is incorporated herein by
reference. The Java core class library 48 is stored in the Java
class EEPROM 36 (shown in FIG. 2) as either Java bytecodes or native
object code, depending on the type of CPU 32 (shown in FIG. 2)
employed as the co-processor 34. During execution of the bytecodes
51 of a Java application 28, object references to members of a core
class in the Java core class library 48 are preferably dynamically
linked. The dynamic linking combined with low memory access latency
results in improved execution speed. The Java application segment 47
stores a segment of the bytecodes 51 for the Java application 28
presently staged in the buffer 38 (shown in FIG. 2). The step-wise
operation of the operational components 40 will now be described.
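
One way to picture dynamic linking against a core class library is
lazy resolution: a symbolic reference is looked up the first time it
is used and the result is remembered. The sketch below is an
assumption-laden illustration; the names CoreClassLibrary and Linker,
the dictionary standing in for the EEPROM contents and the caching of
resolved references are hypothetical and are not described in the
text.

```python
class CoreClassLibrary:
    """Toy stand-in for a core class library held in a separate memory."""

    def __init__(self, contents):
        self.contents = dict(contents)  # class name -> {member name: callable}
        self.reads = 0                  # number of library lookups performed

    def lookup(self, class_name, member):
        self.reads += 1
        return self.contents[class_name][member]


class Linker:
    """Resolves symbolic references on first use and caches the result."""

    def __init__(self, library):
        self.library = library
        self.resolved = {}              # (class, member) -> resolved target

    def resolve(self, class_name, member):
        key = (class_name, member)
        if key not in self.resolved:    # link only when first referenced
            self.resolved[key] = self.library.lookup(class_name, member)
        return self.resolved[key]


library = CoreClassLibrary({"java.lang.Math": {"abs": abs, "max": max}})
linker = Linker(library)
first = linker.resolve("java.lang.Math", "abs")(-3)
second = linker.resolve("java.lang.Math", "abs")(-7)
```

Only the first reference to a member touches the library; later
references reuse the resolved target.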
`
III. Method for Accelerating Execution of Platform-
Independent Program Code

FIG. 4 is a flow diagram of a method for accelerating execution of
platform-independent program code, such as bytecodes 51 for the Java
application 28, on a co-processor 34 (shown in FIG. 2) according to
the present invention. In the described embodiment, the method
operates on the system 10 of FIG. 1, but also can operate on a
functionally-equivalent system implementing co-processor modules 43
`which execute independently of the CPU 13. T



