US006779082B2

(12) United States Patent — Burger et al.
(10) Patent No.: US 6,779,082 B2
(45) Date of Patent: Aug. 17, 2004

(54) NETWORK-BASED DISK REDUNDANCY STORAGE SYSTEM AND METHOD

(75) Inventors: Eric William Burger, McClean, VA (US); Walt… …, … NH (US);
     Andy Spitzer, North Andover, MA (US); Barry David Wessler, Potomac, MD (US)

(73) Assignee: Ulysses ESD, Inc., San Jose, CA (US)

(*)  Notice: Subject to any disclaimer, the term of this patent is extended or
     adjusted under 35 U.S.C. 154(b) by 476 days.

(21) Appl. No.: 09/777,776

(22) Filed: Feb. 5, 2001

(65) Prior Publication Data: US 2002/0144058 A1, Oct. 3, 2002

(51) Int. Cl. ............... G06F 12/00
(52) U.S. Cl. ............... 711/114; 711/162; 711/4
(58) Field of Search ........ 711/114, 162, 112, 4

(56) References Cited

     U.S. PATENT DOCUMENTS

     5,511,177 A  *  4/1996  Kagimasa et al. ......... 711/114
     5,673,381 A  *  9/1997  Huai et al.
     5,751,883 A     5/1998  Ottesen et al.
     5,819,310 A  * 10/1998  Vishlitzky et al. ....... 711/114
     6,138,139 A  * 10/2000  Beck et al. ............. 709/202
     6,167,494 A  * 12/2000  Cheston et al. .......... 711/162
     6,298,356 B1   10/2001  Jawahar et al. .......... 707/201
     6,467,034 B1 * 10/2002  Yanaka .................. 711/162
     6,493,825 B1 * 12/2002  Blumenau et al. ......... 713/168

     FOREIGN PATENT DOCUMENTS

     WO  PCT/US02/03315   5/2002

     OTHER PUBLICATIONS

     Udo Kelter, "Discretionary Access Controls in a High-Performance Object
     Management System", IEEE 1991, pp. 288-299.

     * cited by examiner

     Primary Examiner—Donald Sparks
     Assistant Examiner—Brian R. Peugh
(74) Attorney, Agent, or Firm—Morgan, Lewis & Bockius LLP

(57) ABSTRACT

An embodiment of the invention described in the specification and drawings is a
distributed and highly available data storage system. In one embodiment, the
distributed data storage system includes a plurality of data storage units that
are controlled by an object management system. The object management system
preferentially selects the distributed data storage units for performing the file
access requests according to the external inputs/outputs with which the file
access requests are associated. In response to a file creation request that is
associated with an external input of one distributed data storage unit, the object
management system preferentially creates a data file in that distributed data
storage unit. In response to a file retrieval request that is associated with a
data file and an external output of a distributed data storage unit, the object
management system preferentially returns a hostname and pathname of a copy of the
data file that is stored within that distributed data storage unit. The object
management system also makes redundant copies of the data files in different units
to provide high availability of data.

15 Claims, 8 Drawing Sheets
[Representative drawing: flow diagram 400 of FIG. 4 — creating a new file (steps 410-450); see Sheet 4 of 8.]
[FIG. 1 (Sheet 1 of 8): block diagram of the data storage system 100 — distributed data storage units 130a, 130b, 130c, ..., 130n with external I/O lines, each joined by a network connection to network switch 105; primary OMS manager unit 110a; secondary OMS manager unit 110b; and an application server 150.]
[FIG. 2 (Sheet 2 of 8): components of distributed data storage unit 130a — CPU 202; memory 206 holding operating system 232, networking module 234, external I/O module 236, and object management system 240 with file naming service 242 and file copying service 244; mass storage subsystem 208; external I/O subsystem 210 with external I/O lines; network interface 204 to network switch 105; interconnected by bus 212.]
[FIG. 3 (Sheet 3 of 8): components of OMS manager unit 110a — CPU 302; memory 306 holding operating system 232, networking module 234, external I/O module 236, and object management system 240 with file naming service 242, file copying service 244, OMS work queue 246, unit selector 248, OMS file mapping table 250, OMS file state table 252, OMS unit state table 254, file creation module 260, file replication module 270, and file retrieval module 280; mass storage subsystem 308; optional external I/O subsystem 310 with external I/O lines; network interface 304 to network switch 105; interconnected by bus 312.]
[FIG. 4 (Sheet 4 of 8): flow diagram 400 — creating a new file. 410: application sends a new file request (associated with an external I/O connection) to the OMS. 420: OMS identifies and preferentially selects the distributed data storage unit that is associated with the external I/O connection. 430: OMS calls the name service on the selected data storage unit and asks for a unique file name to be allocated. 440: the selected data storage unit assigns a file name that is unique within the data storage unit. 450: application records information in the selected data storage unit, using the assigned file name.]
[FIG. 5 (Sheet 5 of 8): flow diagram 500 — making a redundant copy of a file. 510: application sends a replication request to the OMS. 520: OMS queues the replication request. 530: OMS selects a target data storage unit. 540: OMS stores the source file information and notes that the file is not redundant. 550: OMS contacts the target unit's name service, requesting a new file allocation. 560: OMS contacts the target unit's file copy service, requesting a copy from source file to target file. 570: upon completion, OMS stores the target file information in the OMS and marks the target file as a redundant copy.]
[FIG. 6 (Sheet 6 of 8): flow diagram 600 — retrieving a file. 610: application contacts the OMS with the name of the source file in the application name-space ("handle"). 620: OMS queues the file retrieval request. 630: OMS looks up the "handle," preferentially selects the file stored in the data storage unit with the most idle capacity, and returns the hostname and pathname of the file to the application. 640: application retrieves the file by passing the hostname and pathname of the file to the appropriate data storage unit.]
[FIG. 7 (Sheet 7 of 8): flow diagram 700 — copying a file. 710: application sends a file copy request to the OMS. 720: OMS queues the file copy request. 730: OMS increases the link count of the file and updates the OMS file mapping table with any new handle.]
[FIG. 8 (Sheet 8 of 8): flow diagram 800 — deleting a file. 810: application sends a file delete request to the OMS. 820: OMS queues the file delete request. 830: OMS removes any application name to OMS name-space mapping and decrements the link count. 840: OMS determines if the link count for that file has reached zero. 850: if the link count has reached zero, the OMS calls the name service of all the distributed data storage units that contain copies of the file, requesting the service to remove them.]
NETWORK-BASED DISK REDUNDANCY STORAGE SYSTEM AND METHOD

BRIEF DESCRIPTION OF THE INVENTION

The present invention relates generally to computer data storage. More
specifically, the present invention relates to a high-availability data storage
methodology for a computer network.
BACKGROUND OF THE INVENTION
RAID (Redundant Array of Inexpensive Disks) technology, which uses multiple disk
drives attached to a host computer, is a way of making a data store highly
available. The RAID controller or host software makes a redundant copy of the
data, either by duplicating the writes (RAID 1), establishing a parity disk
(RAID 3), or establishing a parity disk with striped writes (RAID 5). Greater
levels of redundancy can be achieved by increasing the number of redundant copies.
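As a concrete illustration of the parity idea just mentioned (not part of the
patent itself), the following short Python sketch shows how a RAID 3/5 style
parity block, computed as the bytewise XOR of the data blocks, lets any single
lost block be reconstructed:

    # Illustrative sketch only: XOR parity as used by RAID 3/5.
    def xor_blocks(blocks):
        """Bytewise XOR of equal-length blocks."""
        out = bytearray(len(blocks[0]))
        for block in blocks:
            for i, b in enumerate(block):
                out[i] ^= b
        return bytes(out)

    data_blocks = [b"AAAA", b"BBBB", b"CCCC"]   # stripes on the data disks
    parity = xor_blocks(data_blocks)            # stored on the parity disk

    # The disk holding the second block fails; rebuild it from the survivors
    # plus the parity block.
    rebuilt = xor_blocks([data_blocks[0], data_blocks[2], parity])
    assert rebuilt == data_blocks[1]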
Although RAID technology provides a highly available disk array, data availability
is not guaranteed. For instance, if the host computer fails, data becomes
unavailable regardless of how many redundant disk arrays are used. In order to
provide an even higher level of data availability, dual-ported arrays, which are
accessible by two host computers, are used. The two host computers establish a
protocol between them so that only one writes to a given disk segment at a time.
If one host computer fails, the surviving host computer can take over the work of
the failed computer. This type of configuration is typical in network file servers
or database servers.
A disadvantage of dual-ported disk arrays, however, is that they use a number of
rather expensive components. Dual-ported RAID controllers are expensive. Moreover,
a complex protocol is used by the hosts to determine which is allowed to write to
each disk and when they are allowed to do so. Often, host manufacturers charge a
substantial premium for clustering software.
Besides the high costs of system components, another disadvantage of dual-ported
disk array systems is that the number of host computers that can simultaneously
access the disk array is restricted. In dual-ported disk array systems, data must
be accessed via one or the other host computer. Thus, the number of data access
requests that can be serviced by a disk array system is limited by the processing
capability of the host computers.
Yet another disadvantage with multi-ported disk arrays is that expanding the
storage requires upgrading the disk array (for storage) or the host computer (for
processing). There are RAID arrays that expand by adding disks on carrier racks.
However, once a carrier rack is full, the only way to expand the array is to get a
new, larger one. The same situation holds for the host computer. Some host
computers, such as the Sun 6500 from Sun Microsystems of Mountain View, Calif.,
may be expanded by adding more processors and network interfaces. However, once
the computer is full of expansion cards, one needs to buy a new computer to
expand.
SUMMARY OF THE INVENTION
An embodiment of the present invention is a distributed and highly available data
storage system. In one embodiment, the distributed data storage system includes a
network of data storage units that are controlled by an object management system.
Significantly, whenever data is written to one data storage unit, the object
management system makes a redundant copy of that data in another data storage
unit. The object management system preferentially selects the distributed data
storage units for performing the file access requests according to the external
inputs/outputs with which the file access requests are associated. In response to
a file creation request that is associated with an external input of one
distributed data storage unit, the object management system will preferentially
create a data file in that distributed data storage unit. In response to a file
retrieval request that is associated with a data file and an external output of
another distributed data storage unit, the object management system will
preferentially return a hostname and pathname of a copy of the data file that is
stored within that distributed data storage unit. The object management system
also makes redundant copies of the data files in different units to provide high
availability of data.
An aspect of the present invention is that it is not necessary to use expensive
RAID servers. Rather, inexpensive, commodity disk servers can be used. The
distributed and highly available data storage system is also highly scalable, as
additional disk servers can be added according to the storage requirement of the
network.
Another aspect of this invention is that dedicated servers for the disk service
functions are not required. Disk service functions can be integrated into each
data storage unit. The data storage units may be implemented using relatively low
cost, general-purpose computers, such as so-called desktop computers or personal
computers (PCs). This aspect is of importance to applications where I/O, CPU, and
storage resources follow a proportional relationship.
Yet another aspect of the present invention is that users of the system are not
tied to any specific one of the data storage units. Thus, individual users may
exceed the storage capacity of any given data storage unit. Yet another aspect of
the present invention is that an expensive TDM (Time Domain Multiplexed) switching
infrastructure is not required. An inexpensive high-speed Ethernet network is
sufficient to provide for the necessary interconnection. Yet another aspect of the
present invention is that the data storage system is scalable to the number of its
external I/Os.
BRIEF DESCRIPTION OF THE DRAWINGS

For a better understanding of the invention, reference should be made to the
following detailed description taken in conjunction with the accompanying
drawings, in which:

FIG. 1 is a block diagram illustrating a data storage system according to an
embodiment of the present invention.

FIG. 2 is a block diagram illustrating the components of a distributed data
storage unit in accordance with an embodiment of the present invention.

FIG. 3 is a block diagram illustrating the components of an OMS manager unit in
accordance with an embodiment of the present invention.

FIG. 4 is a flow diagram illustrating the operations of the data storage system of
FIG. 1 when creating a new file.

FIG. 5 is a flow diagram illustrating the operations of the data storage system of
FIG. 1 when making a redundant copy of a file.

FIG. 6 is a flow diagram illustrating the operations of the data storage system of
FIG. 1 when an application is retrieving a file.

FIG. 7 is a flow diagram illustrating the operations of the data storage system of
FIG. 1 when an application copies a file.
FIG. 8 is a flow diagram illustrating the operations of the data storage system of
FIG. 1 when an application deletes a file.

Like reference numerals refer to corresponding parts throughout the drawings.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Reference will now be made in detail to the preferred embodiments of the
invention, examples of which are illustrated in the accompanying drawings. In the
following detailed description, numerous specific details are set forth in order
to provide a thorough understanding of the present invention. However, it will be
apparent to one of ordinary skill in the art that the present invention may be
practiced without these specific details. In other instances, well-known methods,
procedures, components, and circuits have not been described in detail so as not
to unnecessarily obscure aspects of the present invention.

System Components of the Data Storage System of the Present Invention

FIG. 1 is a block diagram illustrating a data storage system 100 according to an
embodiment of the present invention. As illustrated, the data storage system 100
includes a network switch 105 coupled to distributed data storage units 130a-130n
and OMS (Object Management System) managers 110a-110b. One embodiment of the
present invention is implemented using a 100BaseTX Ethernet network, and thus, the
network switch 105 is a high-speed Ethernet switch, such as the Nortel Networks
Accelar 1200. In other embodiments of the invention, other types of networks, such
as an ATM network, may be used to interconnect the distributed data storage units
130a-130n and the OMS managers 110a-110b. Also illustrated is an application
server 150 that may be coupled to access the data storage system 100 via the
network switch 105. Application programs, such as voice message application
programs, may reside on the application server 150.

The distributed data storage units 130a-130n are the units of storage and disk
redundancy. In the present embodiment, each of the distributed data storage units
130a-130n has a plurality of external input/output (I/O) lines for coupling to an
external system, such as a public exchange (PBX) system. Each of the distributed
data storage units 130a-130n also has its own processing resources. In one
embodiment, each distributed data storage unit is implemented using a low cost
general purpose computer.

The object management system (OMS) of the data storage system 100 resides on the
distributed data storage units 130a-130n and two OMS managers 110a-110b. The OMS
provides name translation, object location, and redundancy management for the
system 100. The OMS uses a closely-coupled redundancy scheme to provide a
highly-available object management system service.

In the present embodiment, the OMS manager resides on two computer systems to
provide high-availability and fault tolerance capability. That is, if the primary
OMS manager 110a crashes or otherwise becomes unavailable, the secondary OMS
manager 110b may be used. In other embodiments, the object management system may
run on specialized data processing hardware, or on a single fault-tolerant
computer.

FIG. 2 is a block diagram illustrating the components of the distributed data
storage unit 130a in accordance with an embodiment of the present invention.
Components of the distributed data storage units 130b-130n are similar to those of
the illustrated unit. As shown, data storage unit 130a
includes a central processing unit (CPU) 202, a network interface 204 for coupling
to network switch 105, a memory 206 (which may include random access memory as
well as disk storage and other storage media), a mass-storage subsystem 208 (which
may include a disk subsystem for storing voice mail messages), an external I/O
subsystem 210 (which may include one or more voice cards for communicating with a
public service telephone network), and one or more buses 212 for interconnecting
the aforementioned elements of system 130a.

The network interface 204 provides the appropriate hardware and software layers to
implement networking of the distributed data storage units. In the preferred
embodiment, the network interface 204 is a 100BaseTX Ethernet network interface,
running the TCP/IP network stack.

The external I/O subsystem 210 provides the appropriate hardware and software
layers to implement the interface to the outside world for the server. It may be
another Ethernet interface to serve web pages, for example. It may be a Natural
Microsystems AG4000c to interface with the Public Switched Telephony Network. In
the preferred embodiment, it is a Natural Microsystems CG6000c to interface with
the packet telephony network. It can be a combination of these or other external
interfaces. Alternately, the external I/O subsystem 210 may be a virtual
interface: one can serve TCP/IP-based services over the network interface 204. It
should be noted that the external I/O subsystem is optional. For example, the
distributed data storage unit 130a can simply be a file server for the network,
using the network interface 204 for service access.

The mass storage subsystem 208 provides file service to the CPU 202. In the
present embodiment, the mass storage subsystem 208 runs the VxFS file system from
Veritas. Alternate embodiments include the Unix File System (UFS) or the Windows
NT File System (NTFS).

Operations of the distributed data storage unit 130a are controlled primarily by
control programs that are executed by the unit's central processing unit 202. In a
typical implementation, the programs and data structures stored in the system
memory 206 will include:

an operating system 232 (such as Solaris, Linux, or Windows NT) that includes
procedures for handling various basic system services and for performing
hardware-dependent tasks;

networking software 234, which is a component of Solaris, Linux, and Windows 2000;

applications 236 related to the external I/O subsystem (e.g., an inbound voice
message storage module for storing voice messages in user voice mailboxes, a voice
message playback module, etc.); and

necessary components of the object management system 240.

The components of the object management system 240 that reside in memory 206 of
the distributed data storage unit 130a preferably include the following:

a file naming service 242; and

a file copying service 244.

FIG. 3 illustrates the components of an OMS manager unit 110a in accordance with
an embodiment of the present invention. Components of the secondary OMS manager
unit 110b are similar to those of the illustrated unit 110a. As shown, OMS manager
unit 110a includes a central processing unit (CPU) 302, a network interface 304
for coupling to network switch 105, a memory 306 (which may include random access
memory as well as disk storage and other storage media), a mass-storage subsystem
308 (which may
include a disk subsystem for storing voice mail messages), and one or more buses
312 for interconnecting the aforementioned elements of system 110a. The OMS
manager unit 110a may also include an optional external I/O subsystem 310.
The OMS manager unit 110a may include components similar to those of the
distributed data storage unit 130a. Operations of the OMS manager unit 110a are
controlled primarily by control programs that are executed by the system's central
processing unit 302. The software running on the OMS manager unit 110a, however,
may be different. Particularly, as shown in FIG. 3, the programs and data
structures stored in the system memory 306 may include:

an operating system 232 (such as Solaris, Linux, or Windows NT) that includes
procedures for handling various basic system services and for performing
hardware-dependent tasks;

networking software 234, which is a component of Solaris, Linux, and Windows 2000;

applications 236 related to the external I/O subsystem (e.g., an inbound voice
message storage module for storing voice messages in user voice mailboxes, a voice
message playback module, etc.); and

necessary components of the object management system 240.
The components of the object management system 240 that reside on the OMS manager
unit 110a include the following:

a file naming service 242;
a file copying service 244;
an OMS work queue 246;
a unit selector module 248;
an OMS file mapping table 250;
an OMS file state table 252; and
an OMS unit state table 254.
According to the present embodiment, the file naming service 242 is for obtaining
a unique file name in the OMS manager unit 110a. The file copying service 244 is
for copying files to and from the OMS manager unit 110a. The OMS work queue 246 is
for storing file access requests from the applications. The unit selector module
248 is for selecting one of the distributed data storage units 130a-130n for
carrying out the file access or duplication requests. The OMS file mapping table
250 stores the correlation between a file's name in the application name-space (or
"handle") and the actual location of the file. The OMS file state table 252 stores
the status of the files stored by the data storage system 100. The OMS file state
table 252 also keeps track of a "link count" for each of the files stored by the
data storage system 100. The OMS unit state table 254 stores the status of the
distributed data storage units 130a-130n.
The secondary OMS manager unit can take over when the primary OMS manager unit is
down.

Tables 1-4 below illustrate an exemplary OMS work queue 246, OMS file mapping
table 250, OMS file state table 252, and OMS unit state table 254, and their
respective contents.
TABLE 1

OMS Work Queue

handle          hostname    pathname                      command
MyFileName      Unit3       /infiles/V00,1/infile.tif     new
MyOtherName     Unit2       /infiles/V00,1/voice.vox      copy
DeleteThis                                                delete

TABLE 2

OMS File Mapping Table

handle          hostname    pathname
MyOtherName     Unit2       /infiles/V00,1/voice.vox
MyOtherName     Unit5       /u2/V99,7/f19283.vox
DeleteThis      Unit7       /u1/V23,44/2308tasd.tif
DeleteThis      Unit1       /infiles/V21,8/3q49-7n.tif

TABLE 3

OMS File State Table

handle          state    link count
MyFileName      New      1
MyOtherName     OK       2
AnotherFile     OK       1

TABLE 4

OMS Unit State Table

hostname    state
Unit1       UP
Unit2       MAINT
Unit3       UP
Unit4       DOWN
Unit5       UP
Unit6       UP
Unit7       UP
Unit8       MAINT
Operations of the OMS 240 will be discussed in greater detail below.

Operations of the Object Management System
FIG. 4 is a flow diagram 400 illustrating the operations of the data storage
system 100 when creating a new file. As shown, in step 410, when an application
(e.g., a voice message application program running on application server 150)
needs to create a new data file, the application sends a request to the object
management system (OMS) 240 of the data storage system 100. Preferably, the
request for a new file has an association with an external I/O connection. The
request is preferably sent to the primary OMS manager unit 110a. Then, in step
420, the file creation module 260 of the OMS 240 identifies and preferentially
selects the distributed data storage unit that is associated with the external I/O
connection. But if the data storage unit that is associated with the external I/O
connection is unavailable, the OMS selects an available data storage unit. The
physical I/O stream from the external I/O connection is then converted into data
packets, which are transmitted across the network and stored at the selected data
storage unit.
With reference still to FIG. 4, in step 430, the file creation module 260 then
calls the name service of the selected distributed data storage unit, asking for a
unique file name to be allocated. In step 440, the name service of the selected
data storage unit then assigns a file name that is unique within the particular
distributed data storage unit. In step 450, after the distributed data storage
unit creates the file, the application then records information into the file.
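A minimal Python sketch of the creation flow of steps 410-450 follows. It is an
illustration only: the StorageUnit class, its allocate_name() call, and the table
shapes are assumptions, not interfaces taken from the patent.

    # Sketch of the FIG. 4 creation flow (steps 410-450); names are hypothetical.
    import itertools

    class StorageUnit:
        """Stand-in for a distributed data storage unit's file naming service 242."""
        def __init__(self, hostname):
            self.hostname = hostname
            self._seq = itertools.count(1)

        def allocate_name(self):
            # Step 440: a file name that is unique within this unit.
            return f"/infiles/V00,1/file{next(self._seq)}.vox"

    class OMS:
        def __init__(self, units, unit_state):
            self.units = units              # hostname -> StorageUnit
            self.unit_state = unit_state    # hostname -> "UP" / "DOWN" / "MAINT"
            self.file_mapping = {}          # handle -> [(hostname, pathname), ...]
            self.file_state = {}            # handle -> {"state": ..., "link_count": ...}

        def create_file(self, handle, external_io_unit):
            # Step 420: prefer the unit associated with the external I/O connection;
            # if it is unavailable, fall back to any unit that is up.
            if self.unit_state.get(external_io_unit) == "UP":
                hostname = external_io_unit
            else:
                hostname = next(u for u, s in self.unit_state.items() if s == "UP")
            # Steps 430-440: ask the selected unit's name service for a unique name.
            pathname = self.units[hostname].allocate_name()
            self.file_mapping.setdefault(handle, []).append((hostname, pathname))
            self.file_state[handle] = {"state": "New", "link_count": 1}
            return hostname, pathname       # step 450: the application records data here

    oms = OMS({"Unit3": StorageUnit("Unit3")}, {"Unit3": "UP"})
    print(oms.create_file("MyFileName", "Unit3"))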
According to one particular embodiment of the present invention, the data storage
system 100 may be implemented as part of a voice message system. In this
embodiment, a new file needs to be created for recording a new message
when a call comes in on an external I/O connection. A voice message application,
detecting that a call is coming in, will preferentially create a new file for
recording the voice stream of the call. In the present example, the request for
the new file is sent to the distributed data storage unit associated with the
incoming call. Thus, the same data storage unit receiving the physical I/O stream
will be used for recording the I/O stream.
FIG. 5 is a flow diagram 500 illustrating the operations of the data storage
system 100 when committing a file to redundant storage. As shown, in step 510,
when the application is ready to commit the file to redundant storage, the
application makes a replication request to the OMS 240. The replication request
includes the source hostname, the name of the file to be replicated, and the name
of the replicated file. In step 520, the OMS queues the replication request in the
OMS work queue 246. If the application needs to know immediately when replication
is complete, the OMS 240 may perform the replication immediately and may
synchronously inform the application through synchronous remote procedure call
mechanisms.
With reference still to FIG. 5, in step 530, when the OMS 240 works through the
OMS work queue 246 and finds a replication request, the file replication module
270 of the OMS 240 selects a target data storage unit for copying the file. In one
embodiment, the replication module 270 uses the selector module 248, which has
knowledge of the current state of each distributed data storage unit 130a-130n.
The selector module 248 selects a target unit based on current disk, CPU, and I/O
utilization. The selector module 248 may also allow a newly installed distributed
data storage unit to get the bulk of copies without overwhelming it. Alternately,
the selector module 248 may use less sophisticated algorithms. For instance, the
selector module 248 may always pick the distributed data storage unit to the
"left" of the source data storage unit. The selector module 248 may also randomly
pick one of the distributed data storage units 130a-130n for storing the
replicated file.
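The selection policies described above (utilization-based, "left" neighbor, or
random) might look like the following; the metric names and function signatures
are illustrative assumptions, not interfaces defined by the patent.

    # Hypothetical sketches of unit selector module 248 policies.
    import random

    def select_by_utilization(units, exclude):
        """Pick the up unit with the lowest combined disk, CPU, and I/O utilization."""
        candidates = {u: m for u, m in units.items()
                      if u != exclude and m["state"] == "UP"}
        return min(candidates,
                   key=lambda u: (candidates[u]["disk"]
                                  + candidates[u]["cpu"]
                                  + candidates[u]["io"]))

    def select_left_neighbor(ordered_units, source):
        """Always pick the unit to the 'left' of the source unit."""
        return ordered_units[ordered_units.index(source) - 1]

    def select_random(units, exclude):
        return random.choice([u for u in units if u != exclude])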
In step 540, the file replication module 270 stores the source file information,
noting that the file is not redundant. Prior to replication, the source file is
initially denoted as not redundant to protect against a system failure while the
file is being replicated. In step 550, the file replication module 270 contacts
the target data storage unit's name service, requesting a new file name
allocation. In step 560, upon successfully obtaining a new file name from the
target data storage unit, the file replication module 270 contacts the target data
storage unit's file copy service, requesting a copy from the source file to the
target file. In step 570, when the copy is complete, the file replication module
270 stores the destination file information. After successfully replicating the
file, the file replication module 270 marks the file as being redundant. At this
point, the OMS 240 has a relationship between the file's name in the application
name-space and the OMS name-space.
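Steps 540-570 amount to a mark, allocate, copy, and confirm sequence. A minimal
sketch follows, assuming hypothetical allocate_name() and copy_from() calls on the
target unit; the patent does not specify these interfaces.

    # Sketch of the FIG. 5 replication flow (steps 540-570); unit calls are assumed.
    def replicate(oms, handle, source_host, source_path, target_unit):
        # Step 540: note that the file is not yet redundant.
        oms.file_state[handle]["state"] = "New"
        # Step 550: ask the target unit's name service for a new file name.
        target_path = target_unit.allocate_name()
        # Step 560: ask the target unit's file copy service to copy the source file.
        target_unit.copy_from(source_host, source_path, target_path)
        # Step 570: record the destination and mark the file as redundant.
        oms.file_mapping[handle].append((target_unit.hostname, target_path))
        oms.file_state[handle]["state"] = "OK"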
According to one embodiment of the invention, the OMS 240 also stores a link count
for each file in the OMS file state table 252. The link count is the number of
unique application references to the given file. When the application creates a
file in the OMS 240, the OMS 240 sets the link count to one. When the application
copies the file in the OMS 240, the OMS 240 increments the link count. Likewise,
when the application deletes the file, the OMS 240 decrements the link count.
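Together with the copy flow of FIG. 7 and the delete flow of FIG. 8, the link
count behaves like a simple reference count. A hedged sketch, in which FileRecord
and the unit-side remove() call are assumptions made for illustration:

    # Reference-count sketch of the link count kept in the OMS file state table 252.
    from dataclasses import dataclass

    @dataclass
    class FileRecord:
        copies: list             # [(hostname, pathname), ...] of redundant copies
        link_count: int = 1      # set to one when the file is created

    files_by_handle = {}         # handle -> FileRecord

    def copy_file(old_handle, new_handle):
        # FIG. 7: a new handle references the same file; the link count goes up.
        record = files_by_handle[old_handle]
        record.link_count += 1
        files_by_handle[new_handle] = record

    def delete_file(handle, units):
        # FIG. 8: remove the handle mapping and decrement the link count; once it
        # reaches zero, ask every unit that holds a copy to remove its copy.
        record = files_by_handle.pop(handle)
        record.link_count -= 1
        if record.link_count == 0:
            for hostname, pathname in record.copies:
                units[hostname].remove(pathname)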
FIG. 6 is a flow diagram 600 illustrating the operations of the data storage
system 100 when an application retrieves a file. As shown, in step 610, the
application contacts the OMS 240 with the name of the source file in the
application name-space (or "handle"). In step 620, the OMS 240 queues the request
in the OMS work queue 246. In step 630, when the OMS 240 works through the OMS
work queue 246 and finds the file retrieval request, the file retrieval module 280
of the OMS 240 then looks up the "handle" from the OMS file mapping table 250.
Assuming that multiple copies of the file are stored in the data storage system
100, the OMS 240 will preferentially select a copy that is stored within the data
storage unit with the most idle capacity. The OMS 240 then returns the hostname
and pathname of the file to the application. In the present embodiment, the file
retrieval module 280 may use the unit selector module 248 to choose the preferred
distributed data storage unit. To provide a highly-available service, the file
retrieval module 280 will not return a file stored on an unreachable node. Since
multiple copies of every file (except the most recently created files that have
not yet been replicated) are stored in the system 100, the OMS 240 should be able
to find a copy of any specified file on a running unit, even when one of the data
storage units has failed. In an alternate embodiment, the file retrieval module
280 returns information on all copies of the file to allow the application to
choose the best file copy to use.
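A short sketch of the selection rule in step 630, choosing among a handle's copies
the one held by a reachable unit with the most idle capacity; the idle-capacity
metric and the table shapes are assumptions for illustration.

    # Sketch of FIG. 6 step 630: prefer the reachable unit with the most idle capacity.
    def locate_file(handle, file_mapping, unit_state, idle_capacity):
        copies = [(host, path) for host, path in file_mapping[handle]
                  if unit_state.get(host) == "UP"]    # never return an unreachable node
        if not copies:
            raise FileNotFoundError(f"no reachable copy of {handle!r}")
        return max(copies, key=lambda c: idle_capacity.get(c[0], 0.0))

    # Example: MyOtherName has copies on Unit2 (in maintenance) and Unit5 (up).
    mapping = {"MyOtherName": [("Unit2", "/infiles/V00,1/voice.vox"),
                               ("Unit5", "/u2/V99,7/f19283.vox")]}
    state = {"Unit2": "MAINT", "Unit5": "UP"}
    print(locate_file("MyOtherName", mapping, state, {"Unit5": 0.8}))
    # -> ('Unit5', '/u2/V99,7/f19283.vox')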
With reference still to FIG. 6, in step 640, after obtaining the hostname and
pathname of the file from the OMS 240, the application retrieves the file by
passing the hostname and pathname to the appropriate distributed data storage
unit. In the present embodiment, a host-to-host binary copy protocol, such as
CacheFS from Sun Microsystems, may be used to send the file to the requesting
application.
