US 20060031842 A1

(19) United States
(12) Patent Application Publication          (10) Pub. No.: US 2006/0031842 A1
     Neiman et al.                           (43) Pub. Date: Feb. 9, 2006

(54) SYSTEM AND METHOD FOR DIVIDING COMPUTATIONS

(76) Inventors: Steven Neiman, Staten Island, NY (US); Roman Sulzhyk, New York, NY (US)

    Correspondence Address:
    MILBANK, TWEED, HADLEY & MCCLOY
    1 CHASE MANHATTAN PLAZA
    NEW YORK, NY 10005-1413 (US)

(21) Appl. No.: 11/222,470

(22) Filed: Sep. 8, 2005

    Related U.S. Application Data

(62) Division of application No. 10/177,597, filed on Jun. 20, 2002.

(60) Provisional application No. 60/355,274, filed on Feb. 8, 2002.

    Publication Classification

(51) Int. Cl.
    G06F 9/46 (2006.01)
(52) U.S. Cl. .............................................................. 718/103
`
(57) ABSTRACT

In certain aspects, the invention features a system and method for receiving a parent job configured to produce one or more descendant jobs, and scheduling computation of the parent job on a node computing device that is one of a plurality of node computing devices of a distributed computing system. In such an aspect, the distributed computing system further includes a scheduler server configured to selectively reschedule computation of a job other than a parent job from any one of the plurality of node computing devices to another of the node computing devices. Such an aspect further includes preventing rescheduling of the parent job unless each of the descendant jobs is completed or terminated. In other aspects, the invention features a system and method for receiving, for computation by a node computing device, a parent job configured to produce a descendant job, wherein the node computing device is one of a plurality of node computing devices of a distributed computing system that also includes a scheduler server. In such aspects, the distributed computing system creates the descendant job, and the parent and descendant jobs are scheduled for computation on different node computing devices.
`
[Representative drawing] Transaction manager 400: CPU 405, input device 410, data storage devices 415 (databases 420-1, 420-2), I/O communications ports 425, and software 430 (operating system 432, database management program 434); coupled to service manager 700.
`
`
`
`
FIG. 1 (Sheet 1 of 13): The seven functional layers of the OSI model, top to bottom:

APPLICATION LAYER
PRESENTATION LAYER
SESSION LAYER
TRANSPORT LAYER
NETWORK LAYER
LINK LAYER
PHYSICAL LAYER
`
`
`
`
FIG. 2 (Sheet 2 of 13): System 10, in which local computers 100-1 to 100-N, each with API 190, communicate with compute backbone 300, which comprises transaction manager 400, queue 500, scheduler 600, service manager 700, node computers 800-1 to 800-N, global cache 900, and administrative GUI 1000.
`
`
`
`
FIG. 3 (Sheet 3 of 13): Local computer 100, including data storage devices 110, input devices 130, output devices 140, input/output communications ports 150, and software 160 (operating system 170, application 180, API 190); used by user 20 and applications developer 30.
`
`
`
`
FIG. 4 (Sheet 4 of 13): Transaction manager 400, including CPU 405, input device 410, data storage devices 415 (databases 420-1, 420-2), I/O communications ports 425, and software 430 (operating system 432, database management program 434); coupled to service manager 700.
`
`
`
`
FIG. 5 (Sheet 5 of 13): Scheduler 600, including CPU 605, input device 610, data storage devices 615 (databases 620-1, 620-2), I/O communications ports 625, and software 630 (operating system 632, database management program 634); coupled to service manager 700.
`
`
`
`
FIG. 6 (Sheet 6 of 13): Service manager 700, including CPU 705, input device 710, data storage devices 715 (databases 720-1, 720-2), I/O communications ports 725, and software 730 (operating system 732, database management program 734); coupled to transaction manager 400, scheduler 600, administrative GUI 1000, and node computers 800-1 to 800-N.
`
`
`
`
FIG. 7 (Sheet 7 of 13): Node computer 800, including data storage devices 810, input devices 830, output devices 840, input/output communications ports 850, and software 860 (operating system 870, launcher 880); coupled to queue 500, service manager 700, and global cache 900; used by user 20 and applications developer 30.
`
`
`
`
FIG. 8a (Sheet 8 of 13): Method of executing a computing application:

1610: Obtain worker module having a compute function therein, and deploy the compute function on compute backbone 300.
1620: Obtain request from a calling application 180 to create a job (computation session) with the compute backbone 300.
1625: Queue job 182 in persistent storage queue 500.
1630: Determine the availability of node computers 800-1 to 800-N.
1640: Schedule the job 182 on an available node computer 800 in accordance with node allocation requirements provided in meta-information of the job.
1650: Send job 182 to the proper node computer 800 and initiate job 182 on that node 800.
`
`
`
`
FIG. 8b (Sheet 9 of 13): Method of executing a computing application (continued):

1660: Determine if the worker module to be invoked by the job 182 is loaded into memory 820 of node computer 800.
1670: If not, node computer 800 accesses the worker module and loads it into the memory 820 of the node computer.
1680: Job 182 receives a task 186 and global data, and node computer 800 invokes the worker module to get a result.
1690: Node computer 800 makes the result available on the compute backbone 300 for retrieval by the calling application 180.
`
`
`
`
FIG. 10b (Sheet 10 of 13): Method of caching results (continued):

1855: (YES branch) Return partial or intermediate result.
1860: (NO branch) Compute the compute function on the node computer to determine the partial or intermediate result.
1865: Block other node computers from computing the compute function while it is being computed to obtain the partial or intermediate result.
`
`
`
`
FIG. 9 (Sheet 11 of 13): Method of distributing computations:

1710: Parent job 182 of a calling application 180 is sent to a node computer 800 according to a scheduling instruction by the scheduler 600.
1720: Parent job 182 uses meta-information to instruct node computer 800 to divide out descendant jobs 182-1 to 182-N, to send descendant jobs 182-1 to 182-N to the scheduler 600 for prioritization, and not to terminate parent job 182 until all descendant jobs 182-1 to 182-N are completed.
1730: Scheduler 600 prioritizes and sends each descendant job 182-1 to 182-N to an available node computer 800 for processing.
1740: Node computer 800 processes descendant job according to the compute function specified by meta-information.
1750: Results of each descendant job 182-1 to 182-N are stored in queue 500 and made available to the parent job 182.
1760: Results of any one descendant job 182-1 are stored in global cache 900 for use by other descendant jobs or parent job 182.
1770: Parent job 182 uses the results from the descendant jobs 182-1 to 182-N to create a parent job result.
1780: Parent job result is made available for retrieval by the calling application 180.

Termination branch: 1752: Parent job 182 terminates. 1754: All descendant jobs 182-1 to 182-N of parent job 182 terminate upon termination of parent job 182.
`
`
`
`
FIG. 10a (Sheet 12 of 13): Method of caching results:

1810: Compute backbone 300 receives a job 182 from calling application 180.
1815: Scheduler 600 sends the job 182 to an available node computer 800-1.
1820: Node computer 800-1 processes tasks 186-1 to 186-N of the job 182 to create a result previously identified as a partial or intermediate result to be cached.
1825: Node computer 800-1 sends the previously identified partial or intermediate result to global cache 900 (e.g., SAN or RAID) for storage therein by assigning a key/result pair to the partial or intermediate result.
1835: (YES branch) Scheduler 600 sends job 182 to another available node 800-2.
1840: Node computer 800-2 accesses global cache 900 to retrieve the partial or intermediate result.
1845: A subsequent job 182 running on any node computer 800 accesses global cache 900 to retrieve a partial or intermediate result from a prior job 182 by presenting to global cache 900 an atomic lookup function with a key and compute function for each intermediate result.
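The atomic lookup of step 1845 pairs a key with a compute function, so the cache either returns the stored result or computes, stores, and returns it exactly once. The following Java sketch illustrates that pattern only; the GlobalCache class and its method names are assumptions made for illustration, not the disclosed implementation (the disclosed global cache 900 may be, e.g., a SAN or RAID).

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    // Illustrative sketch of an atomic key/compute-function lookup (step 1845).
    public final class GlobalCache<K, V> {
        private final Map<K, V> store = new ConcurrentHashMap<>();

        // Atomically return the cached result for key, or invoke the compute
        // function, cache its result, and return it. computeIfAbsent also blocks
        // rival computations of the same key (cf. step 1865 of FIG. 10b).
        public V lookup(K key, Function<K, V> computeFunction) {
            return store.computeIfAbsent(key, computeFunction);
        }
    }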
`
`
`
`
FIG. 11 (Sheet 13 of 13): Method of debugging:

1910: Applications developer 30 creates job 182 and worker module 195 on local computer 100.
1920: Calling application 180 is initialized and linked to API 190 "local" mode file.
1930: API 190 loads worker module 195 into process space of local computer 100.
1940: API 190 loads functional replica of compute backbone 300 into process space of local computer 100.
1950: Workers 155-1 to 155-N of worker module 195 are processed by CPU 120 of local computer 100.
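In effect, "local" mode substitutes an in-process replica for the real backbone so that workers can be debugged entirely on the local computer. A hedged sketch of that substitution follows; every name in it is hypothetical:

    // Hypothetical sketch of API 190's "local" mode: the worker module 195 and a
    // functional replica of compute backbone 300 run in the local process space.
    Backbone backbone = config.isLocalMode()
            ? new InProcessBackboneReplica()   // loads worker module 195 locally
            : new RemoteBackbone();            // connects to the real backbone 300
    Job job = backbone.createJob(metaInfo);    // jobs now run under the local debugger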
`
`
`
`
`
`SYSTEM AND METHOD FOR DIVIDING
`COMPUTATIONS
`
`BACKGROUND
[0001] I. Field of the Invention

[0002] The present invention relates to the structure and operation of computing systems, and more particularly, to distributed computing systems and methods of operating such systems.

[0003] II. Description of the Related Art
`
[0004] Certain organizations have a need for high performance computing resources. For example, a financial institution may use such resources to perform risk management modeling of the valuations for particular instruments and portfolios at specified states of the world. As another example, a pharmaceutical manufacturer may use high performance computing resources to model the effects, efficacy and/or interactions of new drugs it is developing. As a further example, an oil exploration company may evaluate seismic information using high performance computing resources.
`
[0005] One conventional computing system includes a mainframe computer attached to an individual user terminal by a network connection. Using the terminal, a user may instruct the mainframe computer to execute a command. In this conventional system, almost all data storage and processing functionality resides on the mainframe computer, while relatively little memory or processing capability exists at the terminal. This terminal/mainframe architecture may not, however, allow computations requested by a user to be computed rapidly or automatically.
`
[0006] The open systems interconnection (OSI) model describes one conceptual network architecture represented by seven functional layers. In this model, the functions of a networking system in a data communications network are reflected as a set of seven layers, including a physical layer, data link layer, network layer, transport layer, session layer, presentation layer and application layer. One or more entities within each layer implement the functionality of the layer. Each entity provides facilities for use only by the layer above it, and interacts directly only with the layer below it. FIG. 1 depicts the seven functional layers of the OSI model.
`
[0007] The physical layer describes the physical characteristics of hardware components used to form a network. For example, the size of cable, the type of connector, and the method of termination are defined in the physical layer.
`
[0008] The data link layer describes the organization of the data to be transmitted over the particular mechanical/electrical/optical devices described in the physical layer. For example, the framing, addressing and checksumming of Ethernet packets is defined in the data link layer.
`
`[0009] The network layer describes how data is physically
`routed and exchanged along a path for delivery from one
`node of a network to another. For example, the addressing
`and routing structure of the network is defined in this layer.
`
[0010] The transport layer describes means used to ensure that data is delivered from place to place in a sequential, error-free, and robust (i.e., no losses or duplications) condition. The complexity of the transport protocol is defined by the transport layer.
`
[0011] The session layer involves the organization of data generated by processes running on multiple nodes of a network in order to establish, use and terminate a connection between those nodes. For example, the session layer describes how security, name recognition and logging functions are to take place to allow a connection to be established, used and terminated.
`
[0012] The presentation layer describes the format that data presented to the application layer must possess. This layer translates data from the format it possesses at the sending/receiving station of the network node to the format it must embody to be used by the application layer.
`
[0013] The application layer describes the service made available to the user of the network node in order to perform a particular function the user wants to have performed. For example, the application layer implements electronic messaging (such as "e-mail") or remote file access.
`
[0014] In certain conventional high performance computing systems designed using the OSI model, the hardware used for computation-intensive processing may be dedicated to only one long-running program and, accordingly, may not be accessible by other long-running programs. Moreover, it may be difficult to easily and dynamically reallocate the computation-intensive processing from one long-running program to another. In the event processing resources are to be reallocated, a program currently running on a conventional high performance computer system typically must be terminated and re-run in its entirety at a later time.
`
SUMMARY OF THE INVENTION
`
[0015] In one aspect, the invention features a method including receiving, for computation by a node computing device of a distributed computing system, a parent job configured to produce one or more descendant jobs, wherein the node computing device is one of a plurality of node computing devices of the distributed computing system. Such a method also includes scheduling computation of the parent job on the node computing device. In accordance with such an aspect, the distributed computing system further includes a scheduler server configured to selectively reschedule computation of a job other than the parent job from any one of said plurality of node computing devices to another of the node computing devices, and to receive data descriptive of an indication that the parent job is not to be rescheduled unless each of the descendant jobs is completed or terminated. According to such an aspect, the method further includes preventing rescheduling of the parent job unless each of the descendant jobs is completed or terminated.
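Purely as an illustrative sketch of this scheduling constraint (the Job type and every method name below are hypothetical, not the disclosed implementation), a scheduler server might guard rescheduling as follows:

    // Hypothetical sketch: a parent job may not be rescheduled unless each of
    // its descendant jobs is completed or terminated.
    boolean mayReschedule(Job job) {
        if (!job.isParent()) {
            return true;  // non-parent jobs may be freely moved between nodes
        }
        for (Job descendant : job.descendants()) {
            if (!descendant.isCompleted() && !descendant.isTerminated()) {
                return false;  // a descendant is still live; hold the parent in place
            }
        }
        return true;
    }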
`
[0016] In another aspect, the invention features a distributed computing system including a plurality of node computing devices, and means for receiving, for computation by at least one of the node computing devices, a parent job configured to produce one or more descendant jobs. Such a system also includes means for scheduling computation of the parent job on the node computing device. In accordance with such an aspect, the means for scheduling is further configured to selectively reschedule computation of a job other than the parent job from any one of the plurality of node computing devices to another of the node computing devices, and to receive data descriptive of an indication that the parent job is not to be rescheduled unless each of the descendant jobs is completed or terminated. According to such an aspect, the distributed computing system further includes means for preventing rescheduling of the parent job unless each of the descendant jobs is completed or terminated.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`[0017] Features and other aspects of the invention are
`explained in the following description taken in conjunction
`with the accompanying drawings, wherein:
`[0018] FIG. 1 depicts the seven functional layers of the
`open systems interconnection (OSI) model;
`[0019] FIG. 2 illustrates a system 10 including a compute
`backbone 300 according to one embodiment of the present
`invention;
`
`[0020] FIG. 3 illustrates certain components of one
`embodiment of a local computer 100 of the system 10 shown
`in FIG. 2;
`[0021] FIG. 4 illustrates certain components of one
`embodiment of a transaction manager 400 of the system 10
`shown in FIG. 2;
`
`[0022] FIG. 5 illustrates certain components of one
`embodiment of a scheduler 600 of the system 10 shown in
`FIG. 2;
`
`[0023] FIG. 6 illustrates certain components of one
`embodiment of a service manager 700 of the system 10
`shown in FIG. 2;
`[0024] FIG. 7 illustrates certain components of one
`embodiment of a node computer 800 of the system 10 shown
`in FIG. 2;
[0025] FIGS. 8a and 8b illustrate one embodiment of a method of executing a computing application using the system shown in FIG. 2;
`
`[0026] FIG. 9 illustrates one embodiment of a method of
`distributing computations using the system 10 shown in
`FIG. 2;
`
`[0027] FIGS. 10a and 10b illustrate one embodiment of a
`method of caching results using the system 10 shown in
`FIG. 2; and
`
`[0028] FIG. 11 illustrates one embodiment of a method of
`debugging using the system 10 shown in FIG. 2.
`
[0029] It is to be understood that the drawings are exemplary, and are not limiting.
`
`DETAILED DESCRIPTION OF PREFERRED
`EMBODIMENTS
`
`[0030] Various embodiments of the present invention will
`now be described in greater detail with reference to the
`drawings.
`
`I. System Embodiments of the Invention
`
`[0031] FIG. 2 illustrates certain components of one
`embodiment of a system 10 of the present invention, which
`may generally include a number of local computers 100-1 to
`100-N in communication, via a network 200, with a compute
`backbone 300.
`
[0032] A function of this embodiment of the system 10 is to service parametric computation requests of various users 20 or groups of users. In particular, such a system 10 may allow each user 20 access to a service on a common infrastructure for performing compute dense calculations by dynamically allocating a portion of the compute backbone 300 infrastructure to the user 20 for processing of each user's 20 distinct application. A system 10 of one embodiment may include software that allows compute intensive applications to queue, schedule and prioritize their calculations on the infrastructure. In addition, the infrastructure and software of such an embodiment may operate to manage resource allocation, authentication, job distribution, data flow and fault tolerance. In accordance with this system 10, distinct applications may each connect to the compute backbone 300 infrastructure, which may perform several operations including prioritizing compute requests from the applications according to a policy (predetermined or otherwise), allocating hardware and software resources, assigning compute requests to a proper computation resource, and returning results to the applications.
`
`[0033] A Local Computer 100
`
[0034] In the embodiment depicted in FIGS. 2 and 3, each local computer 100 may generally include one or more data storage devices 110, a central processing unit (CPU) 120, one or more input devices 130, one or more output devices 140, input/output (I/O) communications ports 150, and other hardware components (not shown) which facilitate performance of the functions of the local computer 100 and/or the system 10 as described herein. In one embodiment, the hardware devices of a local computer 100 may be in communication with one another by a shared data bus and/or by dedicated connections (not shown). In addition, a number of software components 160 may run on each local computer 100.
`
[0035] A local computer 100-1 of one embodiment may be, for example, a shared memory multiprocessor machine made by Sun Microsystems configured to run programs created using the Smalltalk programming language. Another embodiment of a local computer 100-2 may be an IBM machine running programs created using the C programming language. Yet another embodiment of a local computer 100-3 may be an SGI machine running programs using the C++ and/or Java programming languages. A further embodiment of a local computer 100-4 may include a composition of a number of separate devices.
`
[0036] The data storage devices 110 of one embodiment may include one or more hard disk drives. However, it is to be understood that data storage devices 110 such as RAM, ROM, CD-ROM, DVD-ROM, solid state drive, floppy disk-drive or combinations thereof may also be included in the embodiment shown in FIG. 3, or in certain other appropriate embodiments. One embodiment of a local computer 100-1 may include input device(s) 130 (e.g., keyboard, pointing/selecting device such as a mouse or track ball, floppy disk-drive, scanner and/or touch screen interface) that may enable a user 20 and/or applications developer 30 of the system 10 to provide information and instructions for storage in the local computer 100 and use in operation of the system 10. An embodiment of a local computer 100-1 may also include output devices 140 (e.g., printer, display device, floppy disk-drive and/or computer monitor) that may enable a user 20 and/or applications developer 30 to receive, for further manipulation and/or storage, information generated using the local computer 100 and/or the system 10. The I/O communications ports 150 of a local computer 100-1 of one embodiment may be serial and parallel, and may be configured to include multiple communications channels for simultaneous connections. The software components 160 may include an operating system 170 (e.g., Linux, Unix, Microsoft Windows NT), one or more user interface tools 175, calling applications 180, and an application program interface (API) 190. One embodiment of the system 10 may include ten or more local computers 100-1 to 100-N.
`
[0037] i. Calling Application 180

[0038] In one embodiment, a calling application 180 may be a computer program that contains logic to achieve or produce an outcome for a user 20. The software architecture of certain applications may conceptually consist of four layers: user interface and ad hoc calculation tools; logic; persistence; and high performance computing. The user 20 may send certain computation intensive portions of a particular calling application 180 (i.e., the high performance computing layer) to the compute backbone 300 for processing rather than have the local computer 100 process those computation intensive portions. In accordance with one embodiment, the user 20 may do so by (i) creating one or more worker modules 195-1 to 195-N (e.g., shared libraries, executable files compliant with a compute backbone 300, Java archive files and/or other archive files), each of which contains one or more compute functions or engines called "workers" 155-1 to 155-N, (ii) deploying the worker modules 195-1 to 195-N on the compute backbone 300, and (iii) sending to the compute backbone 300 a job 182 that requests the compute backbone 300 to perform a computation using a worker 155 contained in a worker module 195 that has been deployed on the compute backbone 300. A worker 155 may be constructed to conform to and operate with the API 190, and may conceptually "plug" into the infrastructure of the compute backbone 300 (in particular, to the launcher 880 as described below in section v.). A compute function may be implemented in a number of ways including, without limitation, as a function, as a class method or as an executable constructed to be compatible with the compute backbone 300. In accordance with one embodiment, a worker 155 may be capable of staying initialized after completing a computation in order to handle additional compute requests should the scheduler 600 send such requests to the node computer 800 on which the worker 155 is invoked.
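As an illustrative sketch only, a worker conforming to such an API might look like the following in Java. The Worker interface and every signature shown are assumptions made for this sketch; they are not the API 190 as actually disclosed.

    // Hypothetical sketch of a worker 155 packaged in a worker module 195.
    public final class PricingWorker implements Worker {
        // The compute backbone would invoke this compute function with a task's
        // input; the worker returns a task output or signals an error.
        public TaskOutput compute(TaskInput input, GlobalData global) throws WorkerException {
            double value = price(input, global);
            return new TaskOutput(value);
        }

        private double price(TaskInput input, GlobalData global) {
            // ... computation-intensive valuation logic would go here ...
            return 0.0;  // placeholder
        }
    }

Because a worker may stay initialized between computations, a class like this could field repeated compute requests without reloading its module.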
`
`[0039] According to one embodiment, a worker 155 may
`be capable of computing tasks 186-1 to 186-N once loaded
`onto the compute backbone 300. For example, a worker 155
`may be a function that takes task inputs and returns a task
`output or an error indication. Furthermore, a worker 155
`may itself create a job 182 and schedule tasks 186-1 to
`186-N with the compute backbone 300, thereby further
`subdividing computations to be performed.
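A worker that further subdivides its computation might, again purely as a hedged sketch with hypothetical names, do something like this:

    // Hypothetical sketch: a worker 155 that itself creates a job 182 and
    // schedules tasks 186-1 to 186-N, further subdividing the computation.
    public final class SubdividingWorker implements Worker {
        public TaskOutput compute(TaskInput input, GlobalData global) {
            Job childJob = Job.create(global);      // open a new computation session
            for (TaskInput piece : input.split()) {
                childJob.addTask(new Task(piece));  // one task per piece of work
            }
            childJob.submit();                      // hand the tasks to the scheduler
            return TaskOutput.combine(childJob.awaitOutputs());
        }
    }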
`
[0040] A job 182 may be conceptualized as a means for opening or establishing a computation session with the infrastructure of the compute backbone 300. In one embodiment, a job 182 may include and supply to the compute backbone 300 certain defining requirements or parameters for a computation session. In particular, one embodiment of a job 182 may include meta-information, such as an identification of a particular worker 155 to be used with the job. In one embodiment, meta-information supplied by a job 182 identifies only one worker 155 such that all jobs 182-1 to 182-N on the compute backbone may have a generally homogeneous format. In another embodiment, meta-information supplied by a job 182 may identify more than one worker 155-1 to 155-N.
`
[0041] Other optional meta-information may include information about the priority of the job 182 in relation to other jobs, a specification of minimal hardware requirements (e.g., minimum RAM and/or CPU power) for the job 182, a specification of a minimum number of nodes to be allocated in order for the particular job 182 to be run properly or efficiently, the amount of debugging information the job 182 is to provide while it is running, and task logic to control sequencing of task computation (e.g., fail all tasks if one task fails, one task is dependent upon another task).
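Illustratively, such meta-information might be gathered into a simple structure. The field names below are hypothetical and merely mirror the options just listed:

    // Hypothetical sketch of job meta-information (cf. [0040]-[0042]).
    public final class JobMetaInfo {
        String workerName;             // identification of the worker 155 to invoke
        int priority;                  // priority relative to other jobs; adjustable while running
        int minRamMegabytes;           // minimal hardware requirement (e.g., RAM)
        int minNodes;                  // minimum number of nodes to allocate
        int debugLevel;                // amount of debugging information to emit
        boolean failAllOnTaskFailure;  // task logic controlling sequencing/failure
    }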
`
[0042] According to one embodiment, certain meta-information may be changed while a job 182 is running. For example, the priority of the job 182 may be adjusted by a user 20 without terminating or suspending the job 182. As another example, a user 20 may modify the amount of debugging information the job is to provide while it is running.
[0043] In one embodiment, a job 182 may also contain one or more tasks 186 and inputs which collectively represent a unit of computational work to be performed by a processor. Such inputs may include optional global data. A particular worker 155 of a worker module 195 deployed on the compute backbone 300 may perform each task 186-1 to 186-N. Global data and task inputs 187-1 to 187-N may combine to represent the inputs to a particular computation. For example, a job 182 may be defined to compute the value of a number of financial instruments based on the market conditions at closing time on a particular trading day. A user 20 may configure the job 182 such that the global data for the job 182 defines the market conditions at closing, and each instrument may be represented by a separate task 186. In such a case, the task inputs 187-1 to 187-N and global data would be supplied to generate task output 189. However, inputs (e.g., global data and/or task inputs 187-1 to 187-N) need not be provided to a job 182 at the time the job 182 is created. In addition, tasks 186-1 to 186-N need not be supplied at the time of job 182 creation. A job 182 also may have a dynamic collection of one or more tasks 186-1 to 186-N.
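Continuing the financial-instrument example as a hedged sketch (the Job and Task API shown is hypothetical):

    // Hypothetical sketch: one job 182 whose global data is the closing market
    // conditions and whose tasks 186 are the individual instruments to value.
    Job job = Job.create(new GlobalData(closingMarketConditions));
    for (Instrument instrument : portfolio) {
        job.addTask(new Task(new TaskInput(instrument)));  // one task per instrument
    }
    job.submit();  // tasks and inputs may also be added after creation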
`
[0044] A task 186 may be an encapsulation of a single computation to be performed by the compute backbone 300. A task 186 has an input object 187 (i.e., the input needed for a calculation), and upon completion it will have an output object or an error indication 189. At any point in time a task 186 also has a state 188, such as an indication of whether the task 186 has been completed or not (e.g., queued, running, completed, rescheduled, suspended, or error), and may produce log data as generated by the worker 155. In accordance with one embodiment, a worker 155 on the compute backbone 300 loads a worker module 195, performs a requested computation, and creates task output 189.
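A minimal sketch of such a task encapsulation, with hypothetical names:

    // Hypothetical sketch of a task 186 with input 187, state 188, and output 189.
    public final class Task {
        enum State { QUEUED, RUNNING, COMPLETED, RESCHEDULED, SUSPENDED, ERROR }

        final TaskInput input;                // input object 187: what the calculation needs
        volatile State state = State.QUEUED;  // state 188 at any point in time
        volatile TaskOutput output;           // output object or error indication 189
        final StringBuilder log = new StringBuilder();  // log data from the worker 155

        Task(TaskInput input) { this.input = input; }
    }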
`
[0045] In one embodiment, calling applications 180-1 to 180-N running on the local computers 100-1 to 100-N are programmed to interface with the compute backbone 300. In particular, a calling application 180 running on a particular local computer 100 is compatible with the API 190 also running on that local computer 100. For example, a calling application 180 created in C programming language may be compatible with the C language API 190 running on a particular local computer 100. In such an example, a portion of the API 190 may communicate with both the calling application 180 and the compute backbone 300 in the following manner. First, a calling application 180 may send a request, in C langua



