`Dimitroff et al.
`
US006742020B1
`(10) Patent No.:
`US 6,742,020 B1
`(45) Date of Patent:
`May 25, 2004
`
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by ... days.
`(54) SYSTEM AND METHOD FOR MANAGING
`DATA FLOW AND MEASURING SERVICE IN
`A STORAGE NETWORK
`(75) Inventors: John Dimitroff, Houston, TX (US);
`Minh Chau Alvin Nguyen, Cypress,
`TX (US)
`(73) Assignee: Hewlett-Packard Development
`Company, L.P., Houston, TX (US)
`(21) Appl. No.: 09/589,778
(22) Filed: Jun. 8, 2000
`7
`(51) Int. Cl." ................................................ G06F 15/16
`(52) U.S. Cl. ....................... 709/217; 709/213; 709/224;
`707/10
`(58) Field of Search ................................. 709/226, 224,
`709/201, 213, 217; 707/201, 10, 8, 5, 2
`
`(56)
`
`References Cited
`U.S. PATENT DOCUMENTS
`
5,412,772 A     5/1995  Monson ................... 395/155
5,655,154 A     8/1997  Jain et al. .............. 395/899
5,675,802 A    10/1997  Allen et al. ............. 395/703
5,724,575 A     3/1998  Hoover et al. ............ 395/610
5,758,333 A     5/1998  Bauer et al. ............. 707/1
5,920,700 A  *  7/1999  Gordon et al. ............ 709/226
5,987,506 A  * 11/1999  Carter et al. ............ 709/213
6,151,624 A    11/2000  Teare et al. ............. 709/217
6,173,293 B1 *  1/2001  Thekkath et al. .......... 707/201
6,389,420 B1    5/2002  Vahalia et al. ........... 707/8
6,505,311 B1 *  1/2003  Ichinohe et al. .......... 714/56
6,512,745 B1    1/2003  Abe et al. ............... 370/232
`OTHER PUBLICATIONS
“The Compaq Enterprise Network Storage Architecture White Paper: An Overview”; 1998, Compaq Computer Corp., pp. 1-14.
“Federated Management Architecture (FMA) Specification”, Version 1, Revision 0.4; 1999, Sun Microsystems, Inc., pp. 1-184.
`* cited by examiner
`Primary Examiner—David Wiley
Assistant Examiner—Phuoc Nguyen
`(57)
`ABSTRACT
A computer network has several machines, including machines having storage systems, and communication resources for communicating between the machines. A metadata registry having information about data stored on the network is distributed on two or more machines of the network, and local copies of part of this metadata registry reside on a compute machine of the network. The metadata registry has a command object that comprises network address information about at least some of the machines of the computer network that participate in a first communication. An agent monitors communications between the machines of the computer network for communications relevant to the command object; the agent modifies the command object by adding network address information of additional machines of the computer network that should participate in the first communication between said machines, to maintain coherency of the metadata registry and local copies thereof.
`
`10 Claims, 12 Drawing Sheets
`
`10
`11
`WORKSTATION WORKSTATION
`MACHINES
`MACHINES
`
`
`
`
`
`COMPUTE
`MACHINES
`
`108
`COMPUTE
`MACHINES
`
`108
`COMPUTE
`MACHINES
`
`
`
`112
`ROUTERS OR
`SWITCHES
`
`NETWORK
`
`106
`STORAGE
`MACHINES
`
`122
`
`SCS
`BUSS
`
`FIBERCHANNEL
`CONNECTION
`
`
`
`OPTICAL
`STORAGE
`CEWICES
`
`TAPE
`BACKUP
`DRIVES
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Ex.1018 / Page 1 of 19Ex.1018 / Page 1 of 19
`
`TESLA, INC.TESLA, INC.
`
`
`
[Drawing sheets 1 through 12 (U.S. Patent, May 25, 2004, US 6,742,020 B1): FIGS. 1 through 8. Sheet 12 shows the FIG. 8 load-balancing flowchart: for each problem data, call allocator; if a better resource is found, mirror data to new resource (804) and deallocate old resource (806).]
`
`
SYSTEM AND METHOD FOR MANAGING
`DATA FLOW AND MEASURING SERVICE IN
`A STORAGE NETWORK
`
`TECHNICAL FIELD OF THE INVENTION
This invention relates in general to distributed computing systems, including storage networks, and, more specifically, to systems and methods for measuring and managing the flow of data in distributed computing systems and storage networks.
`
`BACKGROUND OF THE INVENTION
Distributed computing systems generally include two or more computers that share data over a decentralized network. Examples of such systems include Automatic Teller Machine (ATM) transaction-processing networks, airline reservation networks, and credit card transaction-processing networks.
Storage networks are a subset of distributed computing systems in which the network resources used for communication between storage resources and computation resources are separate from the network resources used for general network communications. The storage area network is an attempt at providing increased storage performance through the separation of storage-related traffic from general network communications. This allows general communications to go on without being blocked by data/storage traffic, and conversely allows data/storage traffic to occur without interruptions from general network communications.
In distributed storage systems, there is a need to store both data and metadata. Metadata is information about the data being stored, such as its size, location, and ownership. Even simple filesystems store metadata in their directory structures.
In a distributed computing system or storage network, it is desirable that data be accessible even if a single machine of the system fails. It is also desirable that data be stored and delivered to machines of the system in ways that overcome the differences between various machines and operating systems. It is desirable that system performance be high and, further, that adding machines to the system increase both performance and storage capacity; that is, the system should be scalable. Data should be maintained in a secure manner, with strict control over device access, and the system should be easily managed.
In distributed computing systems and storage networks, there may be multiple copies of at least some data and metadata existing in multiple machines simultaneously. Coherency requires that these copies be identical, or interlocked such that only the most recently altered version is subject to further modification. It is desirable that data, and especially metadata, be stored and access-interlocked to enforce coherency in the multiple machines of the system, to prevent such faux pas as assigning a single airline seat to two different passengers.
In particular, it is desirable that users, and user-level applications, not need to track and select storage devices and partitions thereon. Users or application programs should be able to specify storage and performance requirements for data to be stored, allowing the storage subsystem to select the physical device. These performance requirements for specific data are quality-of-service (QOS) metrics. Further, the system should ensure that QOS requirements are met insofar as possible.
Various methods have been devised for managing data flow in distributed computing systems. For example, U.S. Pat. No. 5,724,575 to Hoover et al. describes a system for managing data flow in a distributed, heterogeneous health-care computing system. In the Hoover system, individual heterogeneous computers make their data homogeneous by mapping their heterogeneous data fields to a homogeneous object model through an interface server. The flow of homogeneous data to and from the individual computers is then managed by a central object broker computer capable of managing data presented in accordance with the object model. This approach may suffer if the central object broker or interface server becomes overloaded or fails; data is then inaccessible even if physically located on a functioning machine.
Other methods for enabling data flow in distributed computing systems are disclosed in U.S. Pat. Nos. 5,758,333, 5,675,802, 5,655,154, and 5,412,772. In U.S. Pat. No. 5,675,802, a system for managing distributed software development projects has multiple, “weakly consistent” copies of a source code database at multiple locations. A “mastership enforcer” at each location enforces access-for-change limitation rules for files of the source-code database. Periodically the “weakly consistent” copies are synchronized, such that updated copies replace outdated copies in other locations.
In U.S. Pat. No. 5,412,772, an object format for operation under several different operating system environments is described. This object format includes view format information for each object that incorporates information that may be accessible under only one or another of the operating system environments. The object format of U.S. Pat. No. 5,412,772 is described without reference to a locking or other data coherency enforcement mechanism.
U.S. Pat. No. 5,758,333 describes an application-independent database system having a central access control system, a central storage system, and a central ...
These methods appear to be limited to application in a homogeneous distributed computing system, are limited to point-to-point data transactions, or fail to provide the high level of data coherency required for such applications as air travel reservation transaction processing.
There are systems on the market that provide at least partial solutions to the problems of managing and measuring data flow in distributed computing systems and storage network systems. These systems include Sun Jiro, documented on the Internet at www.jiro.com, Sun's device and data management services. The data services portion of Jiro does not manage the network interconnect to control data flow, which could limit performance and the ability to operate in a truly heterogeneous environment with non-StoreX devices.
Accordingly, there is a need in the art for an improved system and method for managing data flow in a distributed computing system.
SOLUTION TO THE PROBLEM
A distributed computing system implements a shared-memory model of storage on a network. The network may be a storage area network. The shared-memory model contains a distributed metadata database, or registry, that provides a coherent and consistent image of the state of data activity, including data storage, movement, and execution, across the storage network. Upon the same network, but not necessarily in the same machines of the network, is a distributed data database controlled and indexed by the distributed metadata registry.
The metadata registry is implemented to provide data availability, reliability, scalability of the system, compatibility, and simplicity of management. The metadata registry also contains information about available resources of the system, including quality-of-service (QOS) metrics of each available resource, and information about transactions in progress over the network. The information about transactions in progress is stored in command objects of the metadata registry.
The shared-memory model is implemented on top of a network infrastructure and under the file system and operating system; it acts as an abstraction layer that masks out the differences between the hardware and software platforms. This allows incorporation of new technologies without redesign of an entire system.
In order to ensure coherency between multiple copies of sections of the distributed metadata registry, an agent may be injected onto a switch of a storage network, onto a router of a general network, or maintained on a system on which a section of the metadata registry is stored. This agent monitors communications between machines that write metadata and machines on which a section of the metadata registry is stored for creation of write command objects. When a write command object is created, the agent adds additional destination machines to the write command object such that those machines will be updated when the write command executes.
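By way of illustration only, the following Python sketch models such an agent: a write command object carries a destination list, and the agent appends every machine known to hold a copy of the metadata section being written, so that all copies are updated when the write executes. The names used here (WriteCommandObject, RegistryAgent, replica_map) are hypothetical, not drawn from the patent.

```python
# Illustrative sketch only: an agent that watches write command objects and
# appends replica destinations so registry copies stay coherent.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class WriteCommandObject:
    data_id: str                      # ID of the metadata section being written
    source: str                       # network address of the writing machine
    destinations: List[str] = field(default_factory=list)

class RegistryAgent:
    """Monitors write command objects; adds machines holding local copies."""
    def __init__(self, replica_map: Dict[str, List[str]]):
        # replica_map: data_id -> addresses of machines holding a local copy
        self.replica_map = replica_map

    def intercept(self, cmd: WriteCommandObject) -> WriteCommandObject:
        for addr in self.replica_map.get(cmd.data_id, []):
            # Add every machine holding a copy, except those already listed,
            # so the write updates all copies when it executes.
            if addr != cmd.source and addr not in cmd.destinations:
                cmd.destinations.append(addr)
        return cmd

agent = RegistryAgent({"registry/sec7": ["10.0.0.5", "10.0.0.9"]})
cmd = agent.intercept(WriteCommandObject("registry/sec7", "10.0.0.2", ["10.0.0.5"]))
print(cmd.destinations)   # ['10.0.0.5', '10.0.0.9']
```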
Command objects ready for execution are evaluated for potential dependencies and order-specific execution requirements. Those without potential conflicts or order-specific flags, as well as those whose potential conflicts have cleared or that are the earliest pending in order-specific sequences, are executed. Command objects having incomplete potential dependencies may, but need not, be executed speculatively.
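The following sketch illustrates one plausible reading of this selection rule; the field names and the completed-set bookkeeping are assumptions made for illustration.

```python
# Illustrative sketch only: choosing which pending command objects may execute.
# A command may run if it has no unresolved dependencies and, when flagged
# order-specific, is the earliest pending command in its sequence.
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class PendingCommand:
    cmd_id: int
    depends_on: Set[int] = field(default_factory=set)  # potential conflicts
    order_specific: bool = False
    sequence: int = 0          # position within an order-specific sequence

def runnable(pending: List[PendingCommand]) -> List[PendingCommand]:
    completed: Set[int] = set()          # tracked elsewhere; empty here
    earliest = min((c.sequence for c in pending if c.order_specific), default=None)
    ready = []
    for c in pending:
        if c.depends_on - completed:
            continue                      # potential conflict not yet cleared
        if c.order_specific and c.sequence != earliest:
            continue                      # must wait for earlier sequence entries
        ready.append(c)
    return ready

queue = [PendingCommand(1), PendingCommand(2, order_specific=True, sequence=1),
         PendingCommand(3, order_specific=True, sequence=2),
         PendingCommand(4, depends_on={3})]
print([c.cmd_id for c in runnable(queue)])   # [1, 2]
```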
A machine of a network in accordance with this invention includes an application programming interface (API) that receives generic input/output (I/O) commands and database commands from a software thread of an application. Code associated with the API converts the generic I/O commands received from the software thread to I/O commands specific to the metadata registry and data operations. The converted I/O commands are forwarded to the machines storing the metadata and data for translation to I/O commands for individual storage devices. An object registry provides database services in response to the database commands received from the software thread.
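A minimal sketch of such a conversion layer follows, assuming a hypothetical mapping of generic verbs onto the registry I/O commands Store, Retrieve, Erase, and Execute described later in this document; the locator callable stands in for the forwarding step.

```python
# Illustrative sketch only: an API layer converting a generic I/O request from
# an application thread into a registry-specific command that is forwarded to
# the machine storing the metadata and data. Names are hypothetical.
from dataclasses import dataclass

@dataclass
class RegistryCommand:
    verb: str        # one of the registry I/O commands: Store, Retrieve, ...
    data_id: str
    target: str      # machine holding the metadata/data section

class StorageAPI:
    GENERIC_TO_REGISTRY = {"read": "Retrieve", "write": "Store",
                           "delete": "Erase", "run": "Execute"}

    def __init__(self, locator):
        self.locator = locator   # callable: data_id -> machine address

    def submit(self, generic_verb: str, data_id: str) -> RegistryCommand:
        # Translate the generic command, then address it to the machine that
        # stores the data; that machine translates further for its devices.
        verb = self.GENERIC_TO_REGISTRY[generic_verb]
        return RegistryCommand(verb, data_id, self.locator(data_id))

api = StorageAPI(lambda data_id: "storage-a")
print(api.submit("read", "orders/2004"))
# RegistryCommand(verb='Retrieve', data_id='orders/2004', target='storage-a')
```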
In another embodiment of this invention, a data transaction in a distributed computing system is managed by generating a data structure, or command object, for the data transaction within the metadata registry maintained in the shared-memory model. The data structure includes a plurality of objects that describe the parameters of the data transaction. IDs unique within the distributed computing system are generated for at least some of the objects, and the data transaction is added to a list of pending transactions within the registry. The transaction-related objects are then associated with one another through links, and a command is generated to initiate the transaction.
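These steps might be rendered as follows; the object layout and the ID scheme (UUIDs here) are illustrative assumptions only.

```python
# Illustrative sketch only: build the transaction's constituent objects, give
# them system-unique IDs, add the transaction to the pending list, link the
# objects, and produce the command that initiates the transaction.
import uuid

_pending = []                      # registry's list of pending transactions

def new_id() -> str:
    return uuid.uuid4().hex        # unique within the (simulated) system

def create_transaction(command: str, source: str, destination: str, data_id: str):
    # 1. Generate objects describing the transaction, each with a unique ID.
    objs = {
        "command": {"id": new_id(), "command_field": command, "links": []},
        "src_dir": {"id": new_id(), "position": "Source"},
        "dst_dir": {"id": new_id(), "position": "Destination"},
        "data":    {"id": new_id(), "data_id": data_id},
        "txn":     {"id": new_id()},
    }
    # 2. Add the transaction to the pending list in the registry.
    _pending.append(objs["txn"]["id"])
    # 3. Associate the objects with one another through links.
    objs["command"]["links"] = [objs["src_dir"]["id"], objs["dst_dir"]["id"],
                                objs["txn"]["id"]]
    # 4. Return the command that initiates the transaction.
    return objs["command"]

cmd = create_transaction("Store", "compute-1", "storage-a", "orders/2004")
print(cmd["command_field"], len(_pending))   # Store 1
```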
In an embodiment of the invention, an allocation process handles requests for allocation of storage resources to a software thread of an application or to data to be stored in the system. The requester may specify a specific resource, or may specify desired QOS attributes of the resource. If the requester specifies a specific resource, the allocation process allocates that resource if it is available. If the requester specifies desired QOS attributes, the allocation process searches the metadata registry for suitable resources meeting the QOS requirements. The QOS requirements may include specifications of search engines or other processing power of a machine to which the resource is attached, as well as interface specifications of the resource to that machine. The QOS requirements are stored in the metadata registry along with information regarding the data, including the actual physical location of the data.
Each machine having data monitors the actual quality of service it provides to user and application programs running in the system. A dynamic load-balancing process periodically examines data to identify data for which the QOS requirements have not been met, for which the QOS requirements have been greatly exceeded, or that may soon overflow a resource. This process then moves that data to more optimum locations in the network. Movement of data to more optimum locations may involve movement to resources having a different QOS, or closer proximity to the machines on which the user and application programs are running.
System management utilities may also examine and alter the QOS requirements for specific data of the system so as to permit optimization of problem user and application programs run in the system.
The combination of the automatic resource allocation process and the load-balancing process together provides a utility-style operation, where storage requests need not specify physical locations in the network, and where problem data migrates to the best available resource in the system.
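The load-balancing loop, as charted in FIG. 8, reduces to a short sketch; the callback names below are hypothetical stand-ins for the allocator, mirroring, and deallocation machinery.

```python
# Illustrative sketch only, following the flow of FIG. 8: for each "problem"
# data item (QOS unmet, greatly exceeded, or near overflow), call the
# allocator; if a better resource is found, mirror the data to it and then
# deallocate the old resource.
def rebalance(problem_data, allocator, mirror, deallocate):
    for item in problem_data:          # FOR EACH PROBLEM DATA
        better = allocator(item)       # CALL ALLOCATOR
        if better is None:             # BETTER RESOURCE FOUND?
            continue
        mirror(item, better)           # MIRROR DATA TO NEW RESOURCE (804)
        deallocate(item)               # DEALLOCATE OLD RESOURCE (806)

# Minimal demonstration with stub callbacks:
rebalance(
    problem_data=["hot/table1"],
    allocator=lambda item: "storage-b",
    mirror=lambda item, dst: print(f"mirroring {item} -> {dst}"),
    deallocate=lambda item: print(f"deallocating old copy of {item}"),
)
```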
In another embodiment of the invention, when data or metadata of the system is searched for desired information, an inquiry is made of the machines on which the information may be located to determine whether those machines have application-specific executable code, such as search code for locating desired information. If a machine lacks the application-specific executable code, the requesting application transmits the application-specific executable code to the machines on which the information may be located. The application-specific executable code is then executed on the machines on which the information may be located, and those machines return appropriate responses. Operation in this manner minimizes data transfer over the network during search operations.
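A toy model of this code-shipping search follows, with RemoteMachine standing in for a storage machine reachable over the network; the inquire/install/run protocol shown is an assumption made for illustration.

```python
# Illustrative sketch only: query each candidate machine for the
# application-specific search code and ship it when absent, so the search
# executes next to the data and only results cross the network.
from typing import Callable, Dict, List

class RemoteMachine:
    """Stand-in for a storage machine reachable over the network."""
    def __init__(self, records: List[str]):
        self.records = records
        self.code: Dict[str, Callable] = {}     # installed search routines

    def has_code(self, name: str) -> bool:
        return name in self.code

    def install(self, name: str, fn: Callable):
        self.code[name] = fn                     # "transmit" the executable

    def run(self, name: str, arg: str) -> List[str]:
        return self.code[name](self.records, arg)

def distributed_search(machines, name, fn, term):
    results = []
    for m in machines:
        if not m.has_code(name):     # inquire; ship the code only if missing
            m.install(name, fn)
        results += m.run(name, term) # execute where the data lives
    return results

grep = lambda records, term: [r for r in records if term in r]
ms = [RemoteMachine(["alpha", "beta"]), RemoteMachine(["gamma", "alphabet"])]
print(distributed_search(ms, "grep", grep, "alpha"))   # ['alpha', 'alphabet']
```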
`BRIEF DESCRIPTION OF THE FIGURES
FIG. 1 is a block diagram showing a distributed computing system in accordance with this invention;
FIG. 2 is a block diagram illustrating processes and data on a compute machine from the distributed computing system of FIG. 1;
FIG. 2A is a block diagram illustrating processes and data on a storage machine from the distributed computing system of FIG. 1;
FIG. 3 is an illustration of the structure of the registry, showing its command object, data location, and resource availability sections, and how these may be duplicated in part on multiple machines of the system;
FIG. 4 is an illustration of a command object of the registry;
FIGS. 5, 5A, 5B, 5C, 5D, 5E, 5F, and 5G are illustrations of operations involving a write command object operative between machines of the system;
FIG. 6 is an illustration of how an agent may intercept construction of a write command object and add additional destinations thereto;
FIG. 7 is a block diagram of resource allocation; and
FIG. 8 is a flowchart of load balancing.
`
`
`DETAILED DESCRIPTION
As shown in FIG. 1, a distributed computing system in accordance with this invention includes multiple computing machines interconnected via a network 104. Some machines of the system are storage machines 106 that primarily function to serve storage resources to other machines of the network. Other machines of the system are compute machines 108 that primarily serve as computation resources that use storage served by storage machines 106. There may also be workstation machines 110 that interact with the compute machines 108. It will be understood by those having skill in the technical field of this invention that the invention is not limited to any particular distributed computing system, computing device, or network. The storage machines 106 may, but need not, be RAID storage systems. The network may, and usually does, incorporate one or more routers or switches 112 as well as communications links of the kinds known in the art. Machines of the distributed computing system may be single- or multiple-processor machines, of general purpose or of dedicated purpose as known in the art.
In the art of networking, a device for communicating packets between machines, including compute machines 108 and storage machines 106, over communication resources that may vary with the identity of the machines and the availability of communication resources is known as a router. In the art of storage area networks, a similar device for communicating packets between machines is known as a switch. For consistency, both devices are called switches herein.
The network may have the form of a storage area network, where separate physical network hardware is utilized for most storage-related communications, including metadata communications. This technique avoids conflict between storage-related communications and workstation-compute machine communications.
Attached to the storage machines 106 of the system are storage resources, such as disk drives 114, optical storage devices 116, tape backup drives 118, RAID sets 120, or other storage devices as known in the art of computing. Interconnections between storage resources and storage machines may have the form of a SCSI bus 122, a Fibre Channel connection 124, an IDE cable 126, or such other interfaces as are or may become known in the art of computing.
A compute machine 108 of the distributed computing system of FIG. 1 has one or more threads 200 of an application program, which may be a program for accessing a database or any other program that requires access to data stored on the distributed computing system. This thread communicates with an application programming interface (API) module 202, also executing on the compute machine. When the API starts, it locates a root, or a copy thereof, of a metadata registry (see below) that exists on the system, copies it, and maintains a local copy 204 of at least some parts of the metadata registry. The metadata registry has a tree structure, so once the root is located, links of the tree can be followed to locate all other elements of the registry.
The local copy 204 of portions of the metadata registry contains information 206 about data stored on the system that the compute machine 108 needs to access, information 208 on pending and executing commands involving data relevant to the compute machine 108, and any information 210 on resources 212 served to other machines of the system by the compute machine 108. The API 202 communicates through a network interface module 214 onto the network of the system.
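A sketch of this arrangement follows, assuming a simplified tree of RegistryNode objects; the section names and the shape of the local copy 204 are hypothetical simplifications.

```python
# Illustrative sketch only: the registry is a tree, so once the root (or a
# copy of it) is located, following links reaches every element; the API
# keeps a local copy of just the sections it needs.
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class RegistryNode:
    name: str
    children: List["RegistryNode"] = field(default_factory=list)

def find(root: RegistryNode, name: str) -> Optional[RegistryNode]:
    """Follow links of the tree from the root to locate a registry element."""
    if root.name == name:
        return root
    for child in root.children:
        hit = find(child, name)
        if hit is not None:
            return hit
    return None

# A root with the three sections traced from it (cf. FIG. 3: 302, 304, 306).
root = RegistryNode("root", children=[
    RegistryNode("pending-commands"),
    RegistryNode("data-descriptors", children=[RegistryNode("orders/2004")]),
    RegistryNode("resource-availability"),
])

# The API's local copy 204 holds only the parts this machine needs:
# data it accesses (206) and commands relevant to it (208).
local_copy: Dict[str, Optional[RegistryNode]] = {
    "data": find(root, "orders/2004"),
    "commands": find(root, "pending-commands"),
}
print(local_copy["data"].name)   # orders/2004
```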
`
`65
`
`6
The compute machines 108 communicate through a network interface module 240 (FIG. 2A), executing on a storage machine 106 of the system, to a device interface module 242. The compute machines also communicate with metadata registry modules 244 that exist on those of the storage machines having local copies 245 of portions of the metadata registry. Each storage machine also has a registration process that permits it to register its resources as available when the system is initialized. Storage machines having registry metadata modules 244 may have an intelligent registry-operation interceptor 246 that recognizes operations that may require updates to local copies 204 maintained on machines other than those initiating a transfer.
Switches of the system may have an intelligent registry-operation interceptor that recognizes operations that may require updates to local copies of the metadata registry maintained on machines other than those initiating and initially addressed by a transfer.
The metadata registry of the system has a root 300 (FIG. 3) that is accessible in at least one location in the system. Links from the root may be traced to a pending command tree 302, a data descriptor tree 304, and a resource availability tree 306. Within the command tree 302 may be one or more command objects.
A command object of the metadata registry, as illustrated in FIG. 4, has a command object root 400. This command object root 400 links to direction objects 402 and transaction objects (not shown). The direction objects 402 link to transport objects 404, which may link through media objects 406 to data objects 408.
The metadata registry command object portion stores eleven or more types of objects, each having its own data structure with at least the following fields:
Command Object
Command Field
Link To Direction Object(s)
Link To Transaction Object(s)
where “Command Field” is a string comprising one of the I/O commands Store, Retrieve, Erase, and Execute. A Command Object is typically linked to at least two Direction Objects (a source and a destination) and at least one Transaction Object (multiple transactions may use the same Command Object).
`Direction Object
`Position Field (Source, Intermediary, Destination)
`Link To Transport Object(s)
`Link To Transaction Object(s)
`where “Intermediary” is used if the Direction Object is an
`intermediate object through which the data must travel. A
`Direction Object typically has links to at least one Transport
`Object and at least one Transaction Object.
`Transport Object
`ID Field
`Protocol Field
`Link To Location Object(s)
`Link To QOS Object(s)
`Link To Media Object(s)
`Link To Transaction Object(s)
`where “ID Field” is the ID of the Transport Object (e.g.,
`LAN 1), and “Protocol Field” specifies the protocol to be
`used (e.g., TCP/IP).
`Media Object
`ID Field
`Size Field
`Link To Location Object(s)
`Link To QOS Object(s)
`
`
`Link To Management Object(s)
`Link To Data Object(s)
`Link To Lock Object(s)
`Link To Transaction Object(s)
where “ID Field” is the ID of the Media Object (e.g., SEAGATE DRV 01), and “Size Field” is the size of the Media Object (e.g., 18.1 GB). Note that the Media Object includes information about the machine on which the resource is located.
`Data Object
`ID Field
`Sequence Number Field
`Size Field
`Position Field
`Link To Location Object(s)
`Link To QOS Object(s)
`Link To Management Object(s)
`Link To Lock Object(s)
`Link To Transaction Object(s)
`Link To Timing Object(s)
where “ID Field” is the ID of the Data Object; “Sequence Number Field” is a sequence number for the Data Object if it is, for example, one sequential portion of a larger data file; “Size Field” is the size of the Data Object in bytes; and “Position Field” is the position of the Data Object in memory or storage.
`Management Object
`Ownership Field
`Access Field
`Security Field
where “Ownership Field” identifies who owns a Media Object or Data Object, “Access Field” identifies who should have access, and “Security Field” identifies whether data to or from the Media Object or Data Object should be encrypted, compressed, etc.
`Quality Of Service (QOS) Object
`Bandwidth Field
`Latency Field
Sequencing Properties Field (In-Order/Linear, Out-Of-Order/Random-Access)
where “Bandwidth Field” identifies the bandwidth required for a transaction, “Latency Field” identifies the permissible latency, and “Sequencing Properties Field” identifies whether data must be accessed sequentially or can be accessed randomly.
`Location Object
`ID Field
`Level Of Operations (Local, Wide Area, Regional,
`Global, etc.)
`Lock Object
`ID Field
`Access Mask
`Link To Timing Object(s)
`where “Access Mask” is a field identifying whether an
`object is locked.
`Timing Object
`ID Field
`Duration Field
`Transaction Object
`ID Field
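For concreteness, a few of these object types might be rendered as Python dataclasses, with the “Link To” fields expressed as object references; only five of the eleven types are shown, and all rendering choices are illustrative, not the patent's own encoding.

```python
# Illustrative sketch only: object types from the lists above as dataclasses.
from dataclasses import dataclass, field
from typing import List

@dataclass
class DataObject:
    id: str
    sequence_number: int      # position within a larger file, if segmented
    size: int                 # size in bytes
    position: int             # position in memory or storage

@dataclass
class MediaObject:
    id: str                   # e.g. "SEAGATE DRV 01"
    size_gb: float            # e.g. 18.1
    data: List[DataObject] = field(default_factory=list)

@dataclass
class TransportObject:
    id: str                   # e.g. "LAN 1"
    protocol: str             # e.g. "TCP/IP"
    media: List[MediaObject] = field(default_factory=list)

@dataclass
class DirectionObject:
    position: str             # "Source", "Intermediary", or "Destination"
    transports: List[TransportObject] = field(default_factory=list)

@dataclass
class CommandObject:
    command: str              # "Store", "Retrieve", "Erase", or "Execute"
    directions: List[DirectionObject] = field(default_factory=list)

# A Store command linked to a source and a destination (cf. FIG. 4:
# 400 -> 402 -> 404 -> 406 -> 408). Linking two media objects to one data
# object would likewise model a data set that spans drives.
blk = DataObject("orders/2004#0", sequence_number=0, size=4096, position=0)
cmd = CommandObject("Store", directions=[
    DirectionObject("Source", [TransportObject("LAN 1", "TCP/IP")]),
    DirectionObject("Destination", [TransportObject("LAN 1", "TCP/IP",
        media=[MediaObject("SEAGATE DRV 01", 18.1, data=[blk])])]),
])
print(cmd.command, [d.position for d in cmd.directions])
```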
A data transfer command object is initially created by the API 202, running in a compute machine 108, when the API 202 is requested to initiate an operation by an application program. This object is created with a command object 400, direction objects 402, data objects 408, and other permissible object fields as needed to specify the operation as known to the application. Data movement properties are determined by the data structure. Destinations and sources for data can be explicitly specified or matched to specified parameters. The data object-media object connections can also serve to define the data element to be transferred. This allows for the use of multiple levels of operation to suit an application's or device's needs. Leaving out the data object connection to a media object (on the source side), for instance, would allow for the copying of an entire device. Similarly, multiple media objects may connect to a single data object; this would allow for data sets that span drives.
A QOS template 600 (FIG. 6) may be placed by an application, or by the API, into the command object of FIG. 4 as it is created. Before the command object is executed, the QOS template is replaced with detailed destination information 601 by a resource allocator 700 (FIG. 7). In generating the destination information 601 (FIG. 6), which includes allocated storage device information 702; network port and switch preferred routing information 704; and, if requested, allocated processing device information 706, the resource allocator considers resource availability lists 708 and observed QOS data 710, as well as desired QOS information in the QOS template 712. The needed storage 713, including capacity and latency, is compared to that of the available devices 716 to determine those devices, or combinations of devices, that meet or exceed the requirements. Needed bandwidth 714, as specified in the QOS template 712, is compared to that of the available devices 716 and to the available network ports 717 and network device and link capacity 718 of the network interconnect that connects the available devices to the machine originating the command object. Observed network device 730, port 732, storage device 734, and processing device 736 QOS information is also considered, to avoid placement of high-bandwidth data on resources that may have plenty of space but already have much of their bandwidth consumed by other tasks. The resource allocator 700 also considers the availability of processing devices 720 as compared to the processing needs 722 requested in the QOS template. The resource allocator may also set up alternate destination paths for overflow data should a device become full.
Users and application programs can fully specify destination and intermediary devices, such as switches, or they may let the system determine the optimal network path for the transfer.
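A sketch of such an allocator follows, in the spirit of FIG. 7, assuming hypothetical device records that combine an availability list with observed load; the thresholds and field names are invented for illustration.

```python
# Illustrative sketch only: filter the availability list by the capacity and
# latency required in the QOS template, then reject devices whose observed
# load leaves too little bandwidth, so roomy-but-saturated devices lose out.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Device:
    name: str
    free_gb: float
    latency_ms: float
    bandwidth_mbs: float       # link/device capacity
    observed_load_mbs: float   # bandwidth already consumed by other tasks

@dataclass
class QosTemplate:
    need_gb: float
    max_latency_ms: float
    need_bandwidth_mbs: float

def allocate(template: QosTemplate, available: List[Device]) -> Optional[Device]:
    candidates = [
        d for d in available
        if d.free_gb >= template.need_gb              # capacity requirement
        and d.latency_ms <= template.max_latency_ms   # latency requirement
        # Observed QOS: a device with plenty of space may still be rejected
        # if other tasks already consume most of its bandwidth.
        and d.bandwidth_mbs - d.observed_load_mbs >= template.need_bandwidth_mbs
    ]
    # Prefer the least-loaded qualifying device.
    return min(candidates, key=lambda d: d.observed_load_mbs, default=None)

devices = [Device("raid-1", 500, 4.0, 100, 95),     # roomy but saturated
           Device("disk-7", 120, 6.0, 100, 10)]
print(allocate(QosTemplate(50, 8.0, 40), devices))  # disk-7 wins
```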
FIGS. 5, 5A, 5B, 5C, 5D, 5E, 5F, and 5G illustrate operations involving a command object. Referring to these figures, as well as FIG. 2, the command object 500 is initially created by the API 202 (FIG. 2), upon request 502 of the application program 200, in the local copy 204 of portions of the metadata registry of the machine 110 originating the transfer.
This command object is forwarded 510 (FIG. 5A) from the compute machine 108 upon which it originated



