(12) United States Patent
Cochran

(10) Patent No.: US 7,058,850 B2
(45) Date of Patent: Jun. 6, 2006
(54) METHOD AND SYSTEM FOR PREVENTING DATA LOSS WITHIN DISK-ARRAY PAIRS SUPPORTING MIRRORED LOGICAL UNITS

(75) Inventor: Robert A. Cochran, Rocklin, CA (US)

(73) Assignee: Hewlett-Packard Development Company, L.P., Houston, TX (US)
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b).

(21) Appl. No.: 10/210,368

(22) Filed: Jul. 31, 2002
(65) Prior Publication Data
US 2004/0078638 A1    Apr. 22, 2004

(51) Int. Cl.
G06F 11/00 (2006.01)
(52) U.S. Cl. .......................... 714/6; 714/4; 714/43
(58) Field of Classification Search .......... 714/4, 6, 42, 43
See application file for complete search history.
(56) References Cited

U.S. PATENT DOCUMENTS

6,543,001 B1 *  4/2003  LeCrone et al.
6,587,970 B1 *  7/2003  Wang et al. ................ 714/47
6,691,245 B1 *  2/2004  DeKoning .................. 714/6
6,728,898 B1 *  4/2004  Tremblay et al. ........... 714/6
6,785,678 B1 *  8/2004  Price ...................... 707/8
6,816,951 B1 * 11/2004  Kimura et al. ............. 711/162
2002/0099916 A1 *  7/2002  Ohran et al. ............. 711/162

* cited by examiner

Primary Examiner - Robert Beausoliel
Assistant Examiner - Christopher McCarthy
(57) ABSTRACT

An additional communications link between two mass-storage devices containing LUNs of a mirrored-LUN pair, as well as incorporation of a fail-safe, mass-storage-device-implemented retry protocol to facilitate non-drastic recovery from communications-link failures within the controllers of the two mass-storage devices, prevents build-up of WRITE requests in cache and subsequent data loss due to multiple communications-link and host-computer failures. The combination of the additional link and the retry protocol together ameliorates a deficiency in current LUN-mirroring implementations that often leads to data loss and inconsistent and unrecoverable databases.

12 Claims, 22 Drawing Sheets
[Representative drawing: two host computers, interconnected by a LAN or WAN, are each coupled to a mass-storage device; the two mass-storage devices are directly interconnected by an ESCON, ATM, T3, or other communications link.]
[Drawing Sheets 1 through 22: FIGS. 1A, 1B, 2, 3, 4, 5, 6, 7, 8A-C, 9A-9B, 10A-10E, 11, 12A-C, 13, 14, 15, and 16A-D, as described in the Brief Description of the Drawings.]

METHOD AND SYSTEM FOR PREVENTING DATA LOSS WITHIN DISK-ARRAY PAIRS SUPPORTING MIRRORED LOGICAL UNITS

TECHNICAL FIELD

The present invention relates to the mirroring of logical units provided by disk arrays and other multi-logical-unit mass-storage devices and, in particular, to a method and system for preventing data loss resulting from host-computer and communications-link failures that interrupt data flow between a primary, or dominant, logical unit on a first mass-storage device and a secondary, remote-mirror logical unit on a second mass-storage device.

BACKGROUND OF THE INVENTION

The present invention is related to mirroring of data contained in a dominant logical unit of a first mass-storage device to a remote-mirror logical unit provided by a second mass-storage device. An embodiment of the present invention, discussed below, involves disk-array mass-storage devices. To facilitate that discussion, a general description of disk drives and disk arrays is first provided.

The most commonly used non-volatile mass-storage device in the computer industry is the magnetic disk drive. In the magnetic disk drive, data is stored in tiny magnetized regions within an iron-oxide coating on the surface of the disk platter. A modern disk drive comprises a number of platters horizontally stacked within an enclosure. The data within a disk drive is hierarchically organized within various logical units of data. The surface of a disk platter is logically divided into tiny, annular tracks nested one within another. FIG. 1A illustrates tracks on the surface of a disk platter. Note that, although only a few tracks are shown in FIG. 1A, such as track 101, an actual disk platter may contain many thousands of tracks. Each track is divided into radial sectors. FIG. 1B illustrates sectors within a single track on the surface of the disk platter. Again, a given disk track on an actual magnetic disk platter may contain many tens or hundreds of sectors. Each sector generally contains a fixed number of bytes. The number of bytes within a sector is generally operating-system dependent, and normally ranges from 512 bytes per sector to 4096 bytes per sector. The data normally retrieved from, and stored to, a hard disk drive is in units of sectors.

The modern disk drive generally contains a number of magnetic disk platters aligned in parallel along a spindle passed through the center of each platter. FIG. 2 illustrates a number of stacked disk platters aligned within a modern magnetic disk drive. In general, both surfaces of each platter are employed for data storage. The magnetic disk drive generally contains a comb-like array with mechanical READ/WRITE heads 201 that can be moved along a radial line from the outer edge of the disk platters toward the spindle of the disk platters. Each discrete position along the radial line defines a set of tracks on both surfaces of each disk platter. The set of tracks within which ganged READ/WRITE heads are positioned at some point along the radial line is referred to as a cylinder. In FIG. 2, the tracks 202-210 beneath the READ/WRITE heads together comprise a cylinder, which is graphically represented in FIG. 2 by the dashed-out lines of a cylinder 212.

FIG. 3 is a block diagram of a standard disk drive. The disk drive 301 receives input/output ("I/O") requests from remote computers via a communications medium 302, such as a computer bus, fibre channel, or other such electronic communications medium. For many types of storage devices, including the disk drive 301 illustrated in FIG. 3, the vast majority of I/O requests are either READ or WRITE requests. A READ request requests that the storage device return to the requesting remote computer some requested amount of electronic data stored within the storage device. A WRITE request requests that the storage device store electronic data furnished by the remote computer within the storage device. Thus, as a result of a READ operation carried out by the storage device, data is returned via communications medium 302 to a remote computer, and as a result of a WRITE operation, data is received from a remote computer by the storage device via communications medium 302 and stored within the storage device.

The disk drive storage device illustrated in FIG. 3 includes controller hardware and logic 303, including electronic memory, one or more processors or processing circuits, and controller firmware, and also includes a number of disk platters 304 coated with a magnetic medium for storing electronic data. The disk drive contains many other components not shown in FIG. 3, including READ/WRITE heads, a high-speed electronic motor, a drive shaft, and other electronic, mechanical, and electromechanical components. The memory within the disk drive includes a request/reply buffer 305, which stores I/O requests received from remote computers, and an I/O queue 306 that stores internal I/O commands corresponding to the I/O requests stored within the request/reply buffer 305. Communication between remote computers and the disk drive, translation of I/O requests into internal I/O commands, and management of the I/O queue, among other things, are carried out by the disk drive I/O controller as specified by disk drive I/O controller firmware 307. Translation of internal I/O commands into electromechanical disk operations, in which data is stored onto, or retrieved from, the disk platters 304, is carried out by the disk drive I/O controller as specified by disk media read/write management firmware 308. Thus, the disk drive I/O control firmware 307 and the disk media read/write management firmware 308, along with the processors and memory that enable execution of the firmware, compose the disk drive controller.
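The cooperation among the request/reply buffer 305, the I/O queue 306, and the two firmware layers 307 and 308 can be sketched, very loosely, in software. The following minimal Python sketch is purely illustrative: the class name, method names, and the three-field request format are invented for this illustration, and an actual controller operates on hardware registers and interrupts rather than Python objects.

    from collections import deque

    class DiskDriveSketch:
        """Hypothetical model of the controller pictured in FIG. 3."""

        def __init__(self, sector_size=512):
            self.request_reply_buffer = deque()   # element 305: raw I/O requests from remote computers
            self.io_queue = deque()               # element 306: translated internal I/O commands
            self.platters = {}                    # stand-in for the magnetic media (sector -> bytes)
            self.sector_size = sector_size

        def receive(self, request):
            """Controller firmware (307): accept a request and queue the internal command."""
            self.request_reply_buffer.append(request)
            op, sector, payload = request          # e.g. ("WRITE", 7, b"...") or ("READ", 7, None)
            self.io_queue.append((op, sector, payload))

        def service_one(self):
            """Read/write management firmware (308): execute one internal command against the media."""
            op, sector, payload = self.io_queue.popleft()
            if op == "WRITE":
                self.platters[sector] = payload
                return None                        # a real drive would return a status reply
            return self.platters.get(sector, bytes(self.sector_size))

    drive = DiskDriveSketch()
    drive.receive(("WRITE", 7, b"hello"))
    drive.receive(("READ", 7, None))
    drive.service_one()
    print(drive.service_one())                     # b'hello'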
Individual disk drives, such as the disk drive illustrated in FIG. 3, are normally connected to, and used by, a single remote computer, although it has been common to provide dual-ported disk drives for concurrent use by two computers and multi-host-accessible disk drives that can be accessed by numerous remote computers via a communications medium such as a fibre channel. However, the amount of electronic data that can be stored in a single disk drive is limited. In order to provide much larger-capacity electronic data-storage devices that can be efficiently accessed by numerous remote computers, disk manufacturers commonly combine many different individual disk drives, such as the disk drive illustrated in FIG. 3, into a disk array device, increasing both the storage capacity as well as increasing the capacity for parallel I/O request servicing by concurrent operation of the multiple disk drives contained within the disk array.

FIG. 4 is a simple block diagram of a disk array. The disk array 402 includes a number of disk drive devices 403, 404, and 405. In FIG. 4, for simplicity of illustration, only three individual disk drives are shown within the disk array, but disk arrays may contain many tens or hundreds of individual disk drives. A disk array contains a disk array controller 406 and cache memory 407. Generally, data retrieved from disk drives in response to READ requests may be stored within the cache memory 407 so that subsequent requests for the same data can be more quickly satisfied by reading the data
from the quickly accessible cache memory rather than from the much slower electromechanical disk drives. Various elaborate mechanisms are employed to maintain, within the cache memory 407, data that has the greatest chance of being subsequently re-requested within a reasonable amount of time. Data contained in WRITE requests may also be stored, in cache memory 407, in the event that the data may be subsequently requested via READ requests, or in order to defer slower writing of the data to the physical storage medium.

Electronic data is stored within a disk array at specific addressable locations. Because a disk array may contain many different individual disk drives, the address space represented by a disk array is immense, generally many thousands of gigabytes. The overall address space is normally partitioned among a number of abstract data-storage resources called logical units ("LUNs"). A LUN includes a defined amount of electronic data storage space, mapped to the data storage space of one or more disk drives within the disk array, and may be associated with various logical parameters including access privileges, backup frequencies, and mirror coordination with one or more LUNs. LUNs may also be based on random access memory ("RAM"), mass-storage devices other than hard disks, or combinations of memory, hard disks, and/or other types of mass-storage devices. Remote computers generally access data within a disk array through one of the many abstract LUNs 408-415 provided by the disk array via internal disk drives 403-405 and the disk array controller 406. Thus, a remote computer may specify a particular unit quantity of data, such as a byte, word, or block, using a bus communications-media address corresponding to a disk array, a LUN specifier, normally a 64-bit integer, and a 32-bit, 64-bit, or 128-bit data address that specifies a LUN and a data address within the logical data-address partition allocated to the LUN. The disk array controller translates such a data specification into an indication of a particular disk drive within the disk array and a logical data address within the disk drive. A disk drive controller within the disk drive finally translates the logical address to a physical medium address. Normally, electronic data is read and written as one or more blocks of contiguous 32-bit or 64-bit computer words, the exact details of the granularity of access depending on the hardware and firmware capabilities within the disk array and individual disk drives, as well as the operating system of the remote computers generating I/O requests and characteristics of the communication medium interconnecting the disk array with the remote computers.
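A rough, hypothetical sketch of the two-stage translation just described follows. The LUN map, the round-robin striping policy, and the per-drive capacity are invented assumptions used only to make the addressing concrete; an actual disk-array controller performs this mapping in firmware with far more elaborate layouts.

    # Hypothetical LUN-to-drive address translation, loosely modeled on the text above.
    # Each LUN maps to an ordered list of member drives; blocks are striped round-robin.

    LUN_MAP = {
        0: ["drive-403", "drive-404"],     # LUN 0 striped over two drives (names invented)
        1: ["drive-405"],                  # LUN 1 on a single drive
    }

    BLOCKS_PER_DRIVE = 1_000_000           # assumed capacity, in blocks, of each member drive

    def translate(lun, block_address):
        """Return (drive, logical block on that drive) for a block address within a LUN."""
        drives = LUN_MAP[lun]
        drive = drives[block_address % len(drives)]      # which member drive holds the block
        logical_block = block_address // len(drives)      # logical address within that drive
        if logical_block >= BLOCKS_PER_DRIVE:
            raise ValueError("address outside the LUN's allocated partition")
        return drive, logical_block

    print(translate(0, 12345))   # ('drive-404', 6172)

The drive-level controller would then perform the final translation of the returned logical block to a physical medium address, as described above.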
In many computer applications and systems that need to reliably store and retrieve data from a mass-storage device, such as a disk array, a primary data object, such as a file or database, is normally backed up to backup copies of the primary data object on physically discrete mass-storage devices or media so that if, during operation of the application or system, the primary data object becomes corrupted, inaccessible, or is overwritten or deleted, the primary data object can be restored by copying a backup copy of the primary data object from the mass-storage device. Many different techniques and methodologies for maintaining backup copies have been developed. In one well-known technique, a primary data object is mirrored.

FIG. 5 illustrates object-level mirroring. In FIG. 5, a primary data object "O" 501 is stored on LUN A 502. The mirror object, or backup copy, "O'" 503 is stored on LUN B 504. The arrows in FIG. 5, such as arrow 505, indicate I/O write operations directed to various objects stored on a LUN. I/O write operations directed to object "O" are represented by arrow 506. When object-level mirroring is enabled, the disk array controller providing LUNs A and B automatically generates a second I/O write operation from each I/O write operation 506 directed to LUN A, and directs the second generated I/O write operation via path 507, switch "S" 508, and path 509 to the mirror object "O'" 503 stored on LUN B 504. In FIG. 5, enablement of mirroring is logically represented by switch "S" 508 being on. Thus, when object-level mirroring is enabled, any I/O write operation, or any other type of I/O operation that changes the representation of object "O" 501 on LUN A, is automatically mirrored by the disk array controller to identically change the mirror object "O'" 503. Mirroring can be disabled, represented in FIG. 5 by switch "S" 508 being in an off position. In that case, changes to the primary data object "O" 501 are no longer automatically reflected in the mirror object "O'" 503. Thus, at the point that mirroring is disabled, the stored representation, or state, of the primary data object "O" 501 may diverge from the stored representation, or state, of the mirror object "O'" 503. Once the primary and mirror copies of an object have diverged, the two copies can be brought back to identical representations, or states, by a resync operation, represented in FIG. 5 by switch "S'" 510 being in an on position. In the normal mirroring operation, switch "S'" 510 is in the off position. During the resync operation, any I/O operations that occurred after mirroring was disabled are logically issued by the disk array controller to the mirror copy of the object via path 511, switch "S'", and path 509. During resync, switch "S" is in the off position. Once the resync operation is complete, logical switch "S'" is disabled and logical switch "S" 508 can be turned on in order to re-enable mirroring, so that subsequent I/O write operations or other I/O operations that change the storage state of primary data object "O" are automatically reflected to the mirror object "O'" 503.
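The behavior of switches "S" 508 and "S'" 510 can be mimicked with a small, hypothetical state machine. In the sketch below, the dictionaries standing in for LUN A and LUN B, the pending-write list, and all names are illustrative inventions; only the enable/disable/resync logic tracks the description above.

    class MirroredObjectSketch:
        """Hypothetical model of switches "S" (508) and "S'" (510) of FIG. 5."""

        def __init__(self):
            self.primary = {}                # state of object "O" on LUN A
            self.mirror = {}                 # state of mirror object "O'" on LUN B
            self.mirroring_enabled = True    # switch "S"
            self.pending = []                # writes accumulated while "S" is off

        def write(self, key, value):
            """An I/O write directed to object "O" on LUN A (arrow 506)."""
            self.primary[key] = value
            if self.mirroring_enabled:       # switch "S" on: duplicate write via paths 507 and 509
                self.mirror[key] = value
            else:                            # switch "S" off: remember the divergence
                self.pending.append((key, value))

        def resync(self):
            """Switch "S'" on: replay the deferred writes to the mirror, then re-enable "S"."""
            for key, value in self.pending:
                self.mirror[key] = value
            self.pending.clear()
            self.mirroring_enabled = True

    m = MirroredObjectSketch()
    m.write("a", 1)                  # mirrored immediately
    m.mirroring_enabled = False      # switch "S" turned off
    m.write("b", 2)                  # primary and mirror now diverge
    m.resync()                       # mirror catches up; mirroring re-enabled
    assert m.primary == m.mirror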
FIG. 6 illustrates a dominant LUN coupled to a remote-mirror LUN. In FIG. 6, a number of computers and computer servers 601-608 are interconnected by various communications media 610-612 that are themselves interconnected by additional communications media 613-614. In order to provide fault tolerance and high availability for a large data set stored within a dominant LUN on a disk array 616 coupled to server computer 604, the dominant LUN 616 is mirrored to a remote-mirror LUN provided by a remote disk array 618. The two disk arrays are separately interconnected by a dedicated communications medium 620. Note that the disk arrays may be linked to server computers, as with disk arrays 616 and 618, or may be directly linked to communications medium 610. The dominant LUN 616 is the target for READ, WRITE, and other disk requests. All WRITE requests directed to the dominant LUN 616 are transmitted by the dominant LUN 616 to the remote-mirror LUN 618, so that the remote-mirror LUN faithfully mirrors the data stored within the dominant LUN. If the dominant LUN fails, the requests that would have been directed to the dominant LUN can be redirected to the mirror LUN without a perceptible interruption in request servicing. When operation of the dominant LUN 616 is restored, the dominant LUN 616 may become the remote-mirror LUN for the previous remote-mirror LUN 618, which becomes the new dominant LUN, and may be resynchronized to become a faithful copy of the new dominant LUN 618. Alternatively, the restored dominant LUN 616 may be brought up to the same data state as the remote-mirror LUN 618 via data copies from the remote-mirror LUN and then resume operating as the dominant LUN. Various types of dominant-LUN/remote-mirror-LUN pairs have been devised. Some operate entirely synchronously, while others allow for asynchronous operation and reasonably slight discrepancies between the data states of the dominant LUN and mirror LUN.
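A minimal, hypothetical sketch of a synchronously operating dominant-LUN/remote-mirror-LUN pair appears below. The class and attribute names are invented, and the sketch deliberately ignores acknowledgments, buffering, and failure handling, which are the concerns addressed in the remainder of this description.

    class LUNSketch:
        """Hypothetical synchronous dominant/remote-mirror pair in the style of FIG. 6."""

        def __init__(self, name):
            self.name = name
            self.blocks = {}
            self.remote_mirror = None          # set only on the dominant LUN

        def write(self, block, data):
            self.blocks[block] = data
            if self.remote_mirror is not None:
                # Synchronous operation: every WRITE directed to the dominant LUN is
                # forwarded to the remote-mirror LUN, which therefore tracks it faithfully.
                self.remote_mirror.blocks[block] = data

        def read(self, block):
            return self.blocks.get(block)

    dominant = LUNSketch("disk array 616")
    mirror = LUNSketch("disk array 618")
    dominant.remote_mirror = mirror

    dominant.write(0, b"record-1")
    assert mirror.read(0) == b"record-1"
    # If the dominant LUN fails, READ and WRITE requests can simply be redirected to
    # "mirror", which already holds the same data; roles may later be swapped back.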
Unfortunately, interruptions in the direct communications between disk arrays containing a dominant LUN and a remote-mirror LUN of a mirrored-LUN pair occur relatively frequently. Currently, when communications are interrupted or suffer certain types of failures, data may end up languishing in cache-memory buffers and, in the worst cases, may be purged from cache-memory buffers or lost due to system failures. Designers and manufacturers of mass-storage devices, such as disk arrays, and users of mass-storage devices and of high-availability and fault-tolerant systems that employ mass-storage devices, have recognized the need for a more reliable LUN-mirroring technique and system that can weather communications failures and host-computer failures.

SUMMARY OF THE INVENTION
One embodiment of the present invention provides an additional communications link between two mass-storage devices containing LUNs of a mirrored-LUN pair, as well as incorporating a fail-safe, mass-storage-device-implemented retry protocol to facilitate non-drastic recovery from communications-link failures. The additional communications link between the two mass-storage devices greatly reduces the likelihood of the loss of buffered data within the mass-storage device containing the dominant LUN of a mirrored-LUN pair, and the retry protocol prevents unnecessary build-up of data within cache-memory buffers of the mass-storage device containing the remote-mirror LUN. The combination of the additional communications link and retry protocol together ameliorates a deficiency in current LUN-mirroring implementations that leads to data loss and inconsistent and unrecoverable databases. The additional communications link provided by the present invention is physically distinct, and differently implemented, from the direct communications link between the two mass-storage devices, to provide greater robustness in the event of major hardware failure.

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1A illustrates tracks on the surface of a disk platter.
FIG. 1B illustrates sectors within a single track on the surface of the disk platter.
FIG. 2 illustrates a number of disk platters aligned within a modern magnetic disk drive.
FIG. 3 is a block diagram of a standard disk drive.
FIG. 4 is a simple block diagram of a disk array.
FIG. 5 illustrates object-level mirroring.
FIG. 6 illustrates a dominant logical unit coupled to a remote-mirror logical unit.
FIG. 7 shows an abstract representation of the communications-link topography currently employed for interconnecting mass-storage devices containing the dominant and remote-mirror logical units of a mirrored-logical-unit pair.
FIGS. 8A-C illustrate a communications-link failure that results in purging of the cache memory within the mass-storage device containing a remote-mirror logical unit.
FIGS. 9A and 9B illustrate a normal WRITE-request buffer, such as the input queue 826 of the second mass-storage device in FIG. 8C, and a bit-map buffer, such as the bit map 846 in FIG. 8C.
FIGS. 10A-E illustrate an example of a detrimental, out-of-order WRITE request applied to a mass-storage device.
FIG. 11 shows the final stage in recovery from the missing-WRITE-request problem illustrated in FIGS. 8A-C.
FIGS. 12A-C illustrate an error-recovery technique employed to handle communications-link failures.
FIGS. 13 and 14 illustrate the occurrence of multiple failures, leading to data loss within the mass-storage devices of FIGS. 8A-C, 11, and 12A-C.
FIG. 15 illustrates an enhanced communications topology that represents a portion of one embodiment of the present invention.
FIGS. 16A-D illustrate operation of the exemplary mass-storage devices using the techniques provided by one embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION

One embodiment of the present invention provides a more communications-fault-tolerant mirroring technique that prevents loss of data stored in electronic cache memory for relatively long periods of time due to host-computer failures and communications failures. In the discussion below, the data-loss problems are described, in detail, followed by a description of an enhanced mass-storage-device pair and an enhanced high-level communications protocol implemented in the controllers of the mass-storage devices.
FIG. 7 shows an abstract representation of the communications-link topography currently employed for interconnecting mass-storage devices containing the dominant and remote-mirror LUNs of a mirrored-LUN pair. A first mass-storage device 702 is interconnected with a first host computer 704 via a small-computer-systems-interface ("SCSI"), fiber-channel ("FC"), or other type of communications link 706. A second mass-storage device 708 is interconnected with a second host computer 710 via a second SCSI or FC communications link 712. The two host computers are interconnected via a local-area network ("LAN") or wide-area network ("WAN") 714. The two mass-storage devices 702 and 708 are directly interconnected, for purposes of mirroring, by one or more dedicated enterprise-systems-connection ("ESCON"), asynchronous-transfer-mode ("ATM"), FC, T3, or other types of links 716. The first mass-storage device 702 contains a dominant LUN of a mirrored-LUN pair, while the second mass-storage device 708 contains the remote-mirror LUN of the mirrored-LUN pair.
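For reference, the topology of FIG. 7 can be written down as a small data structure. The sketch below is only a restatement of the figure in code form; the Link record and the device labels are invented conveniences, not part of any disclosed interface.

    from dataclasses import dataclass

    @dataclass
    class Link:
        kind: str          # "SCSI or FC", "LAN or WAN", "ESCON/ATM/FC/T3", ...
        endpoints: tuple   # the two devices joined by this link

    # The four devices of FIG. 7 and the links between them.
    topology = [
        Link("SCSI or FC", ("host computer 704", "mass-storage device 702")),              # link 706
        Link("SCSI or FC", ("host computer 710", "mass-storage device 708")),              # link 712
        Link("LAN or WAN", ("host computer 704", "host computer 710")),                    # link 714
        Link("ESCON/ATM/FC/T3", ("mass-storage device 702", "mass-storage device 708")),   # link(s) 716
    ]

    # Device 702 holds the dominant LUN; device 708 holds its remote mirror.
    roles = {
        "mass-storage device 702": "dominant LUN",
        "mass-storage device 708": "remote-mirror LUN",
    }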
FIGS. 8A-C illustrate a communications-link failure that leads to a purge of cache memory within the mass-storage device containing a remote-mirror LUN. In FIG. 8A, data to be written to physical-data-storage devices within a first mass-storage device 802 is transmitted by a host computer 804 through a SCSI, FC, or other type of link 806 to the mass-storage device 802. In FIGS. 8A-C, and in FIGS. 11A-D, 12, 13, and 15A-D, which employ similar illustration conventions as employed in FIGS. 8A-C, incoming WRITE commands are illustrated as small square objects, such as incoming WRITE command 808, within a communications path such as the SCSI, FC, or other type of link 806. Each WRITE request contains a volume or LUN number followed by a "slash", followed, in turn, by a sequence number. WRITE requests are generally sequenced by high-level protocols so that WRITE requests can be applied, in order, to the database contained within volumes or LUNs stored on one or more physical data-storage devices. For example, both in FIGS. 8A-C and in the subsequent figures identified above, LUN "0" is mirrored to a remote mirror stored within physical data-storage devices of a second mass-storage device 810, interconnected with the first mass-storage device 802 by one or more ESCON, ATM, FC, T3, or other types of communications links 812.

The controller 814 of the first mass-storage device 802 detects WRITE requests directed to dominant LUN "0" and directs copies of the WRITE requests to the second mass-storage device 810 via an output buffer 816 stored within cache memory 818 of the mass-storage device 802. The WRITE requests directed to the dominant LUN, and to other LUNs or volumes provided by the first mass-storage device, are also directed to an input buffer 820 from which the WRITE requests are subsequently extracted and executed to store data on physical data-storage devices 822 within the first mass-storage device. Similarly, the duplicate WRITE requests transmitted by the first mass-storage device through the ESCON, ATM, FC, T3, or other type of link or links 812 are directed by the controller 824 of the second mass-storage device 810 to an input buffer 826 within a cache memory 822 of the second mass-storage device for eventual execution and storage of data on the physical data-storage devices 830 within the second mass-storage device 810.
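A hypothetical sketch of this buffering scheme follows. The class, the "LUN/sequence" tuple format, and the acknowledgment method are invented for illustration; the sketch only mirrors the routing described above, in which WRITE requests destined for the dominant LUN are copied both to a local input buffer and to an output buffer that retains entries until the remote mass-storage device acknowledges them.

    from collections import deque

    class MirroringControllerSketch:
        """Hypothetical model of controller 814 and its cache buffers (816, 820) in FIG. 8A."""

        def __init__(self, mirrored_lun=0):
            self.mirrored_lun = mirrored_lun
            self.input_buffer = deque()    # element 820: WRITEs awaiting local execution
            self.output_buffer = deque()   # element 816: copies awaiting transmission/acknowledgment
            self.local_disks = {}

        def receive_write(self, lun, sequence, data):
            """A WRITE request, labeled "lun/sequence", arriving over link 806."""
            request = (lun, sequence, data)
            self.input_buffer.append(request)
            if lun == self.mirrored_lun:
                # Duplicate the request toward the remote-mirror LUN via the output buffer.
                self.output_buffer.append(request)

        def execute_one_local(self):
            lun, sequence, data = self.input_buffer.popleft()
            self.local_disks[(lun, sequence)] = data

        def acknowledge(self, sequence):
            """The remote device acknowledged a transmitted WRITE; its entry may now be reused."""
            self.output_buffer = deque(
                entry for entry in self.output_buffer if entry[1] != sequence
            )

    controller = MirroringControllerSketch()
    controller.receive_write(0, 1, b"a")   # mirrored: copied to the output buffer
    controller.receive_write(3, 1, b"b")   # not mirrored: local only
    controller.execute_one_local()
    controller.acknowledge(1)              # frees the output-buffer entry for WRITE "0/1"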
In general, the output buffer 816 within the first mass-storage device is used both as a transmission queue as well as a storage buffer for holding already-transmitted WRITE requests until an acknowledgment for the already-transmitted WRITE requests is received from the second mass-storage device. Thus, for example, in FIG. 8A, the next WRITE request to be transmitted 832 appears in the middle of the output buffer, above already-transmitted WRITE requests 834-839. When an acknowledgement for a transmitted WRITE request is received from the second mass-storage device, the output-buffer 818 entry corresponding to the acknowledged, transmitted WRITE request can be overwritten by a new incoming WRITE request. In general, output buffers are implemented as circular queues with dynamic head and tail pointers. Also note that, in FIGS. 8A-C, and in the subsequent, related figures identified above, the cache-memory buffers are shown to be rather small, containing only a handful of messages. In actual mass-storage devices, by contrast, electronic cache memories may provide as much as 64 gigabytes of data storage. Therefore, output a
