UNIFIED PATENTS
EXHIBIT 1011
PAGE 1
`
`
`
United States Patent [19]
Stallmo et al.

[11] Patent Number: 5,617,530
[45] Date of Patent: *Apr. 1, 1997

US005617530A

[54] STORAGE DEVICE ARRAY ARCHITECTURE WITH COPYBACK CACHE

[75] Inventors: David C. Stallmo; William A. Brant, both of Boulder, Colo.

[73] Assignee: EMC Corporation, Hopkinton, Mass.

[*] Notice: The term of this patent shall not extend beyond the expiration date of Pat. No. 5,526,482.

[21] Appl. No.: 579,545

[22] Filed: Dec. 27, 1995

Related U.S. Application Data

[63] Continuation of Ser. No. 112,791, Aug. 26, 1993, Pat. No. 5,526,482, which is a continuation-in-part of Ser. No. 638,167, Jan. 4, 1991, Pat. No. 5,274,794.

[51] Int. Cl.6 .......... G06F 11/10
[52] U.S. Cl. .......... 395/182.04; 395/441
[58] Field of Search .......... 395/182.04, 182.03, 182.01, 181, 182.13; 371/40.1

[56] References Cited

U.S. PATENT DOCUMENTS

3,893,178   7/1975  Sordello .......... 360/73.02
4,092,732   5/1978  Ouchi .......... 395/182.03
4,430,701   2/1984  Christian et al. .......... 395/446
4,467,421   8/1984  White .......... 395/445
4,562,576  12/1985  Ratcliffe .......... 395/185.02
4,593,354   6/1986  Ushiro .......... 395/185.01
4,604,750   8/1986  Manton .......... 371/40.2
4,667,326   5/1987  Young et al. .......... 371/39.1

(List continued on next page.)

FOREIGN PATENT DOCUMENTS

0416968A2   8/1990  European Pat. Off.
WO89/09468  10/1989  WIPO

OTHER PUBLICATIONS

Patterson, D.A., Gibson, G., and Katz, R.H.; A Case For Redundant Arrays of Inexpensive Disks (RAID) (Jun. 1, 1988).

Lee, E.K.; Software and Performance Issues in the Implementation of a RAID Prototype (May 1990).

Chen, P., Gibson, G., Katz, R.H., Patterson, D.A., and Schulze, M.; Introduction to Redundant Arrays of Inexpensive Disks (RAID) (Dec. 1988).

(List continued on next page.)

Primary Examiner: Robert W. Beausoliel, Jr.
Assistant Examiner: Ly V. Hua
Attorney, Agent, or Firm: William J. Kubida; Richard A. Bachand; Holland & Hart LLP

[57] ABSTRACT

A fault-tolerant storage device array using a copyback cache storage unit for temporary storage. When a Write occurs to the RAID system, the data is immediately written to the first available location in the copyback cache storage unit. Upon completion of the Write to the copyback cache storage unit, the host CPU is immediately informed that the Write was successful. Thereafter, further storage unit accesses by the CPU can continue without waiting for an error-correction block update for the data just written. In a first embodiment of the invention, Read-Modify-Write operations are performed during idle time. In a second embodiment of the invention, normal Read-Modify-Write operation by the RAID system controller continues using Write data in the controller's buffer memory. In a third embodiment, at least two controllers, each associated with one copyback cache storage unit, copy Write data from controller buffers to the associated copyback cache storage unit. If a copyback cache storage unit fails, more than one controller can share a single copyback storage unit. In a fourth embodiment, Write data is copied from a controller buffer to a reserved area of each storage unit comprising the array.

32 Claims, 10 Drawing Sheets

[Representative front-page drawing omitted; legible labels include CONTROLLER #2 and reference numbers 403, 405, and 407.]
`
`
`
`
5,617,530
Page 2

U.S. PATENT DOCUMENTS

4,722,085   1/1988  Flora et al. .......... 371/40.1
4,733,396   3/1988  Baldwin .......... 371/40.2
4,754,397   6/1988  Varaiya et al. .......... 361/685
4,761,785   8/1988  Clark et al. .......... 371/51.1
4,768,193   8/1988  Takemae .......... 371/10.3
4,775,978  10/1988  Hartness .......... 371/40.1
4,791,642  12/1988  Taylor .......... 371/40.2
4,817,035   3/1989  Timsit .......... 395/185.07
4,849,929   7/1989  Timsit .......... 395/182.03
4,870,643   9/1989  Bultman et al. .......... 395/182.05
4,899,342   2/1990  Potter et al. .......... 395/182.04
4,914,656   4/1990  Dunphy, Jr. et al. .......... 395/182.05
4,920,539   4/1990  Albonesi .......... 371/40.2
4,958,351   9/1990  Flora .......... 371/40.1
4,993,030   2/1991  Krakauer et al. .......... 371/40.1
4,995,041   2/1991  Hetherington .......... 371/40.1
5,058,116  10/1991  Chao .......... 371/40.2
5,185,876   2/1993  Nguyen et al. .......... 395/841
5,274,799  12/1993  Brant et al. .......... 395/182.04
5,402,428   3/1995  Kakuta et al. .......... 395/182.04
5,463,765  10/1995  Kakuta et al. .......... 395/182.04
5,499,337   3/1996  Gordon .......... 395/182.04

OTHER PUBLICATIONS

Chen, P., Gibson, G., Katz, R.H., Patterson, D.A., and Schulze, M.; Evolution of the RAID 2 Architecture (Jun. 12, 1990).

Chen, P., Gibson, G., Katz, R.H., Patterson, D.A., and Schulze, M.; How Reliable is RAID? (Feb. 1988).

Maximum Strategy, Inc., San Jose, CA; Strategy 2 Disk Array Controller Operation Manual (Nov. 2, 1988).

Maximum Strategy, Inc., San Jose, CA; Strategy 1 Disk Array Controller Operation Manual (Date Unknown).

Gibson, G.A.; Performance and Reliability in Redundant Arrays of Inexpensive Disks (Date Unknown).

Chen, P.; An Evaluation of Redundant Arrays of Disks Using an Amdahl 5890 (May 1989).

Katz, R.H., Gibson, G.A., and Patterson, D.A.; Disk System Architectures for High Performance Computing (Mar. 1989).

Gray, J., Horst, B., and Walker, M.; Parity Striping and Disc Arrays: Low-Cost Reliable Storage with Acceptable Throughput (Jan. 1990).

Schultz, M.E.; Considerations in the Design of Raid Prototype (Aug. 1988).

Clark and Corrigan; IBM Systems Journal, vol. 28, No. 3, 1989.

Schulze, M., et al.; "How Reliable is RAID?"; Compcon, Spring 1989, pp. 1-12.

Ousterhout, J., and Douglis, F.; Beating the I/O Bottleneck: A Case for Log-Structured File Systems (Aug. 4, 1988).
`
`
`
`
U.S. Patent          Apr. 1, 1997          Sheet 1 of 10          5,617,530

[FIG. 1: block diagram of a copyback cache RAID system; legible reference numbers are 1 (CPU), 2 (bus), and 3 (controller).]
`
`
`
U.S. Patent          Apr. 1, 1997          Sheet 2 of 10          5,617,530

[FIG. 2A: flow chart of Write operation for the first embodiment. Legible boxes include: START (20); has CPU issued Write command? (21); write block to cache (22); update directory to point to cache and flag array location (23); acknowledge Write operation to CPU (24); has CPU issued Read command? (25); any pending blocks? (33); read a pending block (35); read old data and parity blocks on flagged stripe (36); XOR old data and parity blocks with pending block for new parity block (37); continued at connectors (A) and (B).]
`
`
`
U.S. Patent          Apr. 1, 1997          Sheet 3 of 10          5,617,530

[FIG. 2B: continuation of FIG. 2A from connectors (A) and (B). Legible boxes include: store new parity and pending block at flagged array location (38); update directory entry to point to array (39); return to start; read directory for pointer to data block (27); is data block in array? (28); read data block from array and transfer to CPU (29); return to start (30); read data block from cache and transfer to CPU (31); return to start (32).]
`
`
`
U.S. Patent          Apr. 1, 1997          Sheet 4 of 10          5,617,530

[FIG. 3A: flow chart of Write operation for the second embodiment. Legible boxes include: START (50); has CPU issued Write command? (51); write block to cache (52); update directory to point to buffer and flag array location (53); acknowledge Write operation to CPU (54); return to start (55); has CPU issued Read command? (56); any pending blocks? (63); access a pending block (64); read old data and parity blocks on flagged stripe (66); XOR old data and parity blocks with pending block for new parity block (67); continued at connectors (A) and (B).]
`
`
`
U.S. Patent          Apr. 1, 1997          Sheet 5 of 10          5,617,530

[FIG. 3B: continuation of FIG. 3A from connectors (A) and (B). Legible boxes include: store new parity block and pending block at flagged array location; update directory entry to point to array (68); return to start; read directory for pointer to data block (57); is data block in array? (58); read data block from array and transfer to CPU (59); read data block from buffer and transfer to CPU (61); return to start (60, 62).]
`
`
`
U.S. Patent          Apr. 1, 1997          Sheet 6 of 10          5,617,530

[FIG. 4: block diagram of a copyback cache RAID system with multiple controllers; legible labels include CONTROLLER #1 and #2 and reference numbers 407 and 408 (CPU).]
`
`
`
U.S. Patent          Apr. 1, 1997          Sheet 7 of 10          5,617,530

[FIG. 5A: flow chart of Read and Write operations for the multiple-controller embodiment. Legible boxes include: Write request received? (STEP 501); Read request received? (STEP 510); write pending data block to controller buffer; determine location of data block to be read (STEP 512); append header and trailer to pending data block (STEP 506); copy pending data block, header, and trailer to storage unit (STEP 507); read data from array and transfer to CPU (STEP 516); acknowledge Write operation to CPU; return to start; continued at connector (A).]
`
`
`
U.S. Patent          Apr. 1, 1997          Sheet 8 of 10          5,617,530

[FIG. 5B: continuation of FIG. 5A from connector (A). Legible boxes include: any pending data blocks in controller buffer? (STEP 520); return to start; access pending data block from buffer (STEP 522); read old data block from array (STEP 524); write pending data block to array (STEP 526); invalidate bit map in controller buffer (STEP 528); note sequence number last used (STEP 530); read old parity block (STEP 532); XOR old data and parity block with pending data block (STEP 534); write new parity block in array (STEP 536); read sequence number list used (STEP 538); same as last time read? (STEP 540); invalidate bit map in copyback cache storage unit (STEP 542); return to start.]
`
`
`
U.S. Patent          Apr. 1, 1997          Sheet 9 of 10          5,617,530

[FIG. 6: flow chart of the steps taken when a copyback cache storage unit becomes unavailable. Legible boxes include: copyback cache storage unavailable; any pending data blocks? (STEP 603); write pending data blocks to array while holding up subsequent Write operations (STEP 604); rebuild failed storage unit (STEP 605); acknowledge Write complete only after RMW sequence is complete (STEP 607); any unused storage units? (STEP 609); use unused storage unit as copyback cache storage unit; if at least one remains, share that storage unit (STEPs 613, 615); END.]
`
`
`
U.S. Patent          Apr. 1, 1997          Sheet 10 of 10          5,617,530

[FIG. 7: block diagram of a RAID system in which pending data blocks are spread over the storage units of the array; legible reference numbers are 408 (CPU) and 501.]
`
`
`
5,617,530

1

STORAGE DEVICE ARRAY ARCHITECTURE WITH COPYBACK CACHE

RELATED APPLICATION

This is a continuation of application Ser. No. 08/112,791 to David C. Stallmo and William A. Brant, filed Aug. 26, 1993 for Storage Device Array Architecture with Copyback Cache, now U.S. Pat. No. 5,526,482, which issued Jun. 11, 1996, and which is a continuation-in-part of application Ser. No. 07/638,167 of William A. Brant, filed Jan. 4, 1991, entitled Storage Device Array Architecture With Copyback Cache, now U.S. Pat. No. 5,274,799, which issued Dec. 28, 1993.
`
BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to computer system data storage, and more particularly to a fault-tolerant storage device array using a copyback cache storage unit for temporary storage.

2. Description of Related Art

A typical data processing system generally involves one or more storage units which are connected to a Central Processor Unit (CPU) either directly or through a control unit and a channel. The function of the storage units is to store data and programs which the CPU uses in performing particular data processing tasks.

Various types of storage units are used in current data processing systems. A typical system may include one or more large capacity tape units and/or disk drives (magnetic, optical, or semiconductor) connected to the system through respective control units for storing data.

However, a problem exists if one of the large capacity storage units fails such that information contained in that unit is no longer available to the system. Generally, such a failure will shut down the entire computer system.

The prior art has suggested several ways of solving the problem of providing reliable data storage. In systems where records are relatively small, it is possible to use error correcting codes which generate ECC syndrome bits that are appended to each data record within a storage unit. With such codes, it is possible to correct a small amount of data that may be read erroneously. However, such codes are generally not suitable for correcting or recreating long records which are in error, and provide no remedy at all if a complete storage unit fails. Therefore, a need exists for providing data reliability external to individual storage units.

Other approaches to such "external" reliability have been described in the art. A research group at the University of California, Berkeley, in a paper entitled "A Case for Redundant Arrays of Inexpensive Disks (RAID)", Patterson, et al., Proc. ACM SIGMOD, June 1988, has catalogued a number of different approaches for providing such reliability when using disk drives as storage units. Arrays of disk drives are characterized in one of five architectures, under the acronym "RAID" (for Redundant Arrays of Inexpensive Disks).

A RAID 1 architecture involves providing a duplicate set of "mirror" storage units and keeping a duplicate copy of all data on each pair of storage units. While such a solution solves the reliability problem, it doubles the cost of storage. A number of implementations of RAID 1 architectures have been made, in particular by Tandem Corporation.

A RAID 2 architecture stores each bit of each word of data, plus Error Detection and Correction (EDC) bits for each word, on separate disk drives (this is also known as "bit striping"). For example, U.S. Pat. No. 4,722,085 to Flora et al. discloses a disk drive memory using a plurality of relatively small, independently operating disk subsystems to function as a large, high capacity disk drive having an unusually high fault tolerance and a very high data transfer bandwidth. A data organizer adds 7 EDC bits (determined using the well-known Hamming code) to each 32-bit data word to provide error detection and error correction capability. The resultant 39-bit word is written, one bit per disk drive, on to 39 disk drives. If one of the 39 disk drives fails, the remaining 38 bits of each stored 39-bit word can be used to reconstruct each 32-bit data word on a word-by-word basis as each data word is read from the disk drives, thereby obtaining fault tolerance.
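The 7-EDC-bit scheme just described can be sketched as follows. This is a minimal illustration of one conventional SECDED construction (6 Hamming check bits plus one overall parity bit), not the Flora et al. implementation itself; the bit layout chosen here is an assumption made for the example:

```python
from functools import reduce
from operator import xor

def encode39(word32):
    """Encode a 32-bit word as a 39-bit SECDED codeword: 6 Hamming check
    bits at positions 1, 2, 4, 8, 16, 32, plus one overall parity bit in
    code[0]. Each bit of the result would be written to a separate drive."""
    code = [0] * 39
    data_positions = [p for p in range(1, 39) if p & (p - 1)]  # non-powers of 2
    for j, p in enumerate(data_positions):
        code[p] = (word32 >> j) & 1
    for k in range(6):  # check bit at 2^k covers positions with bit k set
        code[1 << k] = reduce(xor, (code[p] for p in range(1, 39) if p & (1 << k)))
    code[0] = reduce(xor, code[1:])  # overall parity for double-error detection
    return code

def correct(code):
    """Locate and flip a single bad bit (e.g., the bit from one failed drive).
    The syndrome is the XOR of the positions of all set bits; it is zero for
    a valid codeword and equals the error position otherwise."""
    syndrome = reduce(xor, (p for p in range(1, 39) if code[p]), 0)
    if syndrome:
        code[syndrome] ^= 1
    return code

def decode39(code):
    """Gather the 32 data bits back out of the codeword."""
    data_positions = [p for p in range(1, 39) if p & (p - 1)]
    return sum(code[p] << j for j, p in enumerate(data_positions))

word = 0xDEADBEEF
drives = encode39(word)          # one bit per "drive"
drives[17] ^= 1                  # simulate one drive returning a bad bit
assert decode39(correct(drives)) == word
```

As in the patent's description, losing any one of the 39 drives leaves 38 good bits per word, which suffice to recover the original 32-bit data word.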
An obvious drawback of such a system is the large number of disk drives required for a minimum system (since most large computers use a 32-bit word), and the relatively high ratio of drives required to store the EDC bits (7 drives out of 39). A further limitation of a RAID 2 disk drive memory system is that the individual disk actuators are operated in unison to write each data block, the bits of which are distributed over all of the disk drives. This arrangement has a high data transfer bandwidth, since each individual disk transfers part of a block of data, the net effect being that the entire block is available to the computer system much faster than if a single drive were accessing the block. This is advantageous for large data blocks. However, this arrangement also effectively provides only a single read/write head actuator for the entire storage unit. This adversely affects the random access performance of the drive array when data files are small, since only one data file at a time can be accessed by the "single" actuator. Thus, RAID 2 systems are generally not considered to be suitable for computer systems designed for On-Line Transaction Processing (OLTP), such as in banking, financial, and reservation systems, where a large number of random accesses to many small data files comprises the bulk of data storage and transfer operations.
A RAID 3 architecture is based on the concept that each disk drive storage unit has internal means for detecting a fault or data error. Therefore, it is not necessary to store extra information to detect the location of an error; a simpler form of parity-based error correction can thus be used. In this approach, the contents of all storage units subject to failure are "Exclusive OR'd" (XOR'd) to generate parity information. The resulting parity information is stored in a single redundant storage unit. If a storage unit fails, the data on that unit can be reconstructed on to a replacement storage unit by XOR'ing the data from the remaining storage units with the parity information. Such an arrangement has the advantage over the mirrored disk RAID 1 architecture in that only one additional storage unit is required for "N" storage units. A further aspect of the RAID 3 architecture is that the disk drives are operated in a coupled manner, similar to a RAID 2 system, and a single disk drive is designated as the parity unit.
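The XOR generation and reconstruction just described can be sketched as follows. This is a minimal illustration of the parity principle only, not any particular product's implementation; the three-unit array and block contents are invented for the example:

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR the corresponding bytes of several equal-sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# Parity generation: XOR the contents of all data units into the parity unit.
data_units = [bytes([1, 2, 3]), bytes([4, 5, 6]), bytes([7, 8, 9])]
parity_unit = xor_blocks(data_units)

# Reconstruction: if unit 1 fails, XOR the surviving units with the parity.
rebuilt = xor_blocks([data_units[0], data_units[2], parity_unit])
assert rebuilt == data_units[1]
```

Because XOR is its own inverse, the same routine serves both to generate the parity and to rebuild any single failed unit, which is why only one redundant unit is needed for "N" data units.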
`
One implementation of a RAID 3 architecture is the Micropolis Corporation Parallel Drive Array, Model 1804 SCSI, that uses four parallel, synchronized disk drives and one redundant parity drive. The failure of one of the four data disk drives can be remedied by the use of the parity bits stored on the parity disk drive. Another example of a RAID 3 system is described in U.S. Pat. No. 4,092,732 to Ouchi.

A RAID 3 disk drive memory system has a much lower ratio of redundancy units to data units than a RAID 2 system. However, a RAID 3 system has the same performance limitation as a RAID 2 system, in that the individual disk actuators are coupled, operating in unison. This adversely affects the random access performance of the drive array when data files are small, since only one data file at a time can be accessed by the "single" actuator. Thus, RAID 3 systems are generally not considered to be suitable for computer systems designed for OLTP purposes.
A RAID 4 architecture uses the same parity error correction concept of the RAID 3 architecture, but improves on the performance of a RAID 3 system with respect to random reading of small files by "uncoupling" the operation of the individual disk drive actuators, and reading and writing a larger minimum amount of data (typically, a disk sector) to each disk (this is also known as block striping). A further aspect of the RAID 4 architecture is that a single storage unit is designated as the parity unit.
A limitation of a RAID 4 system is that Writing a data block on any of the independently operating data storage units also requires writing a new parity block on the parity unit. The parity information stored on the parity unit must be read and XOR'd with the old data (to "remove" the information content of the old data), and the resulting sum must then be XOR'd with the new data (to provide new parity information). Both the data and the parity records then must be rewritten to the disk drives. This process is commonly referred to as a "Read-Modify-Write" sequence.
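The Read-Modify-Write sequence can be sketched as follows. This is a schematic illustration only: the two Reads and two Writes model the disk accesses described above, and the dictionary-backed "disks" and block values are invented for the example:

```python
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def read_modify_write(read_block, write_block, unit, parity_unit, addr, new_data):
    """One RAID 4/5 Read-Modify-Write sequence: two Reads, two Writes."""
    old_data = read_block(unit, addr)           # Read 1: old data block
    old_parity = read_block(parity_unit, addr)  # Read 2: old parity block
    # "Remove" the old data's contribution, then add the new data's.
    new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_data)
    write_block(unit, addr, new_data)           # Write 1: new data block
    write_block(parity_unit, addr, new_parity)  # Write 2: new parity block

# Two data units D0, D1 and a parity unit P, each holding one 2-byte block.
disks = {("D0", 0): bytes([1, 2]), ("D1", 0): bytes([4, 5]), ("P", 0): bytes([5, 7])}
read_modify_write(lambda u, a: disks[(u, a)],
                  lambda u, a, b: disks.__setitem__((u, a), b),
                  "D0", "P", 0, bytes([9, 9]))
# The parity still equals the XOR of all data blocks in the stripe.
assert disks[("P", 0)] == xor_bytes(disks[("D0", 0)], disks[("D1", 0)])
```

Note that the parity is updated without touching the other data units in the stripe, which is exactly what makes the four-access sequence (rather than a full stripe rewrite) attractive for small Writes.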
Thus, a Read and a Write on the single parity unit occurs each time a record is changed on any of the data storage units covered by the parity record on the parity unit. The parity unit becomes a bottle-neck to data writing operations since the number of changes to records which can be made per unit of time is a function of the access rate of the parity unit, as opposed to the faster access rate provided by parallel operation of the multiple data storage units. Because of this limitation, a RAID 4 system is generally not considered to be suitable for computer systems designed for OLTP purposes. Indeed, it appears that a RAID 4 system has not been implemented for any commercial purpose.
A RAID 5 architecture uses the same parity error correction concept of the RAID 4 architecture and independent actuators, but improves on the writing performance of a RAID 4 system by distributing the data and parity information across all of the available disk drives. Typically, "N+1" storage units in a set (also known as a "redundancy group") are divided into a plurality of equally sized address areas referred to as blocks. Each storage unit generally contains the same number of blocks. Blocks from each storage unit in a redundancy group having the same unit address ranges are referred to as "stripes". Each stripe has N blocks of data, plus one parity block on one storage unit containing parity for the remainder of the stripe. Further stripes each have a parity block, the parity blocks being distributed on different storage units. Parity updating activity associated with every modification of data in a redundancy group is therefore distributed over the different storage units. No single unit is burdened with all of the parity update activity.

For example, in a RAID 5 system comprising 5 disk drives, the parity information for the first stripe of blocks may be written to the fifth drive; the parity information for the second stripe of blocks may be written to the fourth drive; the parity information for the third stripe of blocks may be written to the third drive; etc. The parity block for succeeding stripes typically "precesses" around the disk drives in a helical pattern (although other patterns may be used).
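The rotation in the 5-drive example above can be sketched with a small placement function. This shows one possible pattern; as the text notes, other patterns may be used:

```python
def parity_drive(stripe, n_drives=5):
    """Drive index (0-based) holding the parity block for a given stripe:
    stripe 0 on the last drive, precessing backward one drive per stripe."""
    return (n_drives - 1 - stripe) % n_drives

# First stripe -> fifth drive, second -> fourth, third -> third, and so on,
# wrapping around so no single drive holds all the parity.
assert [parity_drive(s) for s in range(6)] == [4, 3, 2, 1, 0, 4]
```

Because the mapping cycles through all the drives, each drive carries roughly 1/(N+1) of the parity update traffic, which is the point of the RAID 5 distribution.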
`
Thus, no single disk drive is used for storing the parity information, and the bottle-neck of the RAID 4 architecture is eliminated. An example of a RAID 5 system is described in U.S. Pat. No. 4,761,785 to Clark et al.
`
`
As in a RAID 4 system, a limitation of a RAID 5 system is that a change in a data block requires a Read-Modify-Write sequence comprising two Read and two Write operations: the old parity block and old data block must be read and XOR'd, and the resulting sum must then be XOR'd with the new data. Both the data and the parity blocks then must be rewritten to the disk drives. While the two Read operations may be done in parallel, as can the two Write operations, modification of a block of data in a RAID 4 or a RAID 5 system still takes substantially longer than the same operation on a conventional disk. A conventional disk does not require the preliminary Read operation, and thus does not have to wait for the disk drives to rotate back to the previous position in order to perform the Write operation. The rotational latency time alone can amount to about 50% of the time required for a typical data modification operation. Further, two disk storage units are involved for the duration of each data modification operation, limiting the throughput of the system as a whole.

Despite the Write performance penalty, RAID 5 type systems have become increasingly popular, since they provide high data reliability with a low overhead cost for redundancy, good Read performance, and fair Write performance. However, it would be desirable to have the benefits of a RAID 5 system without the Write performance penalty resulting from the rotational latency time imposed by the parity update operation.

The present invention provides such a system.

SUMMARY OF THE INVENTION
`
The present invention solves the error-correction block bottleneck inherent in a RAID 5 architecture by recognition that storage unit accesses are intermittent. That is, at various times one or more of the storage units in a RAID 5 system are idle in terms of access requests by the CPU. This characteristic can be exploited by providing a "copyback cache" storage unit as an adjunct to a standard RAID system. The present invention provides four alternative methods of operating such a system.

In each embodiment, when a Write occurs to the RAID system, the data is immediately written to the first available location in the copyback cache storage unit. Upon completion of the Write to the copyback cache storage unit, the host CPU is immediately informed that the Write was successful. Thereafter, further storage unit accesses by the CPU can continue without waiting for an error-correction block update for the data just written.
In the first embodiment of the invention, during idle time for relevant storage units of the storage system, an error-correction block (e.g., XOR parity) is computed for each "pending" data block on the copyback cache storage unit, and the data block and corresponding error-correction block are copied to their proper location in the RAID system. Optionally, if a number of pending data blocks are to be written to the same stripe, an error-correction block can be calculated from all data blocks in the stripe at one time, thus achieving some economy of time. In this embodiment, the copyback cache storage unit in effect stores "peak load" Write data and then completes the actual Read-Modify-Write operations to the RAID system during relatively quiescent periods of I/O accesses by the CPU.
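The first embodiment's Write path and idle-time drain can be sketched as follows. This is a schematic model only: the in-memory lists standing in for the cache unit, directory, and array are invented for the illustration, and the parity work performed against the flagged stripe is elided to a comment:

```python
class CopybackCacheRaid:
    """Schematic model of the first embodiment: Writes are acknowledged as
    soon as the data reaches the copyback cache; the Read-Modify-Write to
    the array is deferred to idle time."""

    def __init__(self):
        self.cache = []        # copyback cache storage unit: pending blocks
        self.array = {}        # the RAID array proper: addr -> block
        self.directory = {}    # addr -> ("cache", index) or ("array",)

    def write(self, addr, block):
        self.cache.append((addr, block))   # first available cache location
        self.directory[addr] = ("cache", len(self.cache) - 1)
        return "ACK"                       # CPU informed immediately

    def read(self, addr):
        where = self.directory[addr]       # directory says where the data is
        if where[0] == "cache":
            return self.cache[where[1]][1]
        return self.array[addr]

    def idle(self):
        """During idle time, complete the deferred Read-Modify-Writes."""
        for addr, block in self.cache:
            # (here the real system reads the old data and parity blocks on
            # the flagged stripe and XORs them with the pending block)
            self.array[addr] = block
            self.directory[addr] = ("array",)
        self.cache.clear()

raid = CopybackCacheRaid()
assert raid.write(100, b"new data") == "ACK"  # returns before any parity work
assert raid.read(100) == b"new data"          # served from the copyback cache
raid.idle()
assert raid.read(100) == b"new data"          # now served from the array
```

The directory lookup in `read` corresponds to the directory steps shown in FIGS. 2A and 2B: a Read is satisfied from the cache while the block is pending and from the array once the idle-time copy completes.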
In the second embodiment of the invention, after Write data is logged to the copyback cache storage unit, normal Read-Modify-Write operation by the RAID system controller continues in overlapped fashion with other CPU I/O accesses, using Write data in the array controller's buffer memory. Thus, there are at least two redundant copies of each pending data block. Performance is enhanced because the CPU can continue processing as soon as the simple Write operation to the copyback cache storage unit completes, thus eliminating the delay caused by a normal Read-Modify-Write RAID system. In this embodiment, the copyback cache storage unit acts more as a running "log" of Write data. Data integrity and redundancy is preserved since the Write data is saved to the copyback cache storage unit and thus accessible even if the Read-Modify-Write operation from the controller buffer to the RAID system never completes.
In the third embodiment of the invention, one or more logical arrays of storage units are defined. Each logical array of storage units is associated with at least one controller. When Write data is presented to a controller, the data is immediately stored in a controller buffer. Each controller buffer is associated with a unique copyback cache storage unit. A copy of the data stored in the controller buffer is written to its associated copyback cache storage unit. Thus, there are at least two redundant copies of each pending data block. The CPU is informed that the Write operation was successfully completed immediately after the data is written to the copyback cache storage unit. Data blocks are only read from a copyback cache storage unit upon a failure of the associated controller buffer. In one alternative embodiment, each copyback cache unit is logically divided into at least two logical areas, so that if a copyback cache storage unit fails, another copyback cache storage unit associated with another logical array can be shared by two logical arrays.
In the fourth embodiment of the invention, no physical storage unit is assigned as a copyback cache unit. Rather, at least one logical stripe within each of the storage units that comprise the array is reserved for storing a copy of the pending blocks of data in the controller buffer. As is the case in the third embodiment, pending data blocks are read only from the controller buffer during Read-Modify-Write operations unless the controller buffer has failed.
`
The copyback cache storage unit is preferably non-volatile, so that data will not be lost on a power failure. If the copyback cache storage unit is a disk drive, it may be paired with a "mirror" storage unit for additional fault tolerance. Optionally, the copyback cache storage unit may be a solid-state storage unit, which can achieve substantially faster Write and error-correction block update times than a disk drive.
`
The details of the preferred embodiments of the present invention are set forth in the accompanying drawings and the description below. Once the details of the invention are known, numerous additional innovations and changes will become obvious to one skilled in the art.
`
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a copyback cache RAID system in accordance with the present invention.

FIGS. 2A and 2B are a flow chart of Read and Write operation in accordance with a first embodiment of the present invention.

FIGS. 3A and 3B are a flow chart of Read and Write operation in accordance with a second embodiment of the present invention.

FIG. 4 is a block diagram of a copyback cache RAID system in accordance with an alternative embodiment of the present invention having multiple controllers.
`
FIGS. 5A and 5B are a high-level flow chart representing the steps of Read and Write operations for the embodiment of the invention in which a RAID system has multiple controllers.

FIG. 6 is a high-level flow chart of the steps taken in accordance with the present invention when a copyback cache storage unit becomes unavailable.

FIG. 7 is a block diagram of a RAID system in accordance with another embodiment of the present invention, in which pending data blocks are spread over a number of storage units that comprise the array.

Like reference numbers and designations in the drawings refer to like elements.
`
DETAILED DESCRIPTION OF THE INVENTION

Throughout this description, the preferred embodiments and examples shown should be considered as exemplars, rather than as limitations on the present invention.
FIG. 1 is a block diagram of a copyback cache RAID system in accordance with the present invention. Shown are a CPU 1 coupled by a bus 2 to an array controller 3, which in the preferred embodiment is a fault-tolerant controller. The array controller 3 is coupled to each of the plurality of storage units S1-S5 (five being shown by way of example only) by an I/O bus (e.g., a SCSI bus). The array controller 3 preferably includes a separately programmable processor (for example, the MIPS R3000 RISC processor, made by MIPS of Sunnyvale, Calif.) which can act independently of the CPU 1 to control the storage units.

The storage units S1-S5 are failure independent, meaning that the failure of one unit does not affect the physical operation of other units. Optionally, the storage units S1-S5 are disk drive units including solid state storage unit buffers 7. Such solid state storage unit buffers 7 allow data to be written by a controller to a disk drive without encountering the delay associated with disk rotational latency of a disk drive. Data that is written to the solid state storage unit buffer 7 is preferably written from the solid state storage unit buffer 7 to the rotating medium of the disk drive unit as soon as possible.
Also attached to the controller 3 is a copyback cache storage unit CC, which in the preferred embodiment is coupled to the common I/O bus (e.g., a SCSI bus) so that data can be transferred between the copyback cache storage unit CC and the storage units S1-S5. The copyback cache storage unit CC is preferably non-volatile, so that data will not be lost on a power failure. If the copyback cache storage unit CC is a disk drive, it may be paired with a "mirror" storage unit CC' for additional fault tolerance. The mirror storage unit CC' is coupled to the controller 3 such that all data written to the copyback cache storage unit CC is also written essentially simultaneously to the mirror storage unit CC', in known fashion. Optionally, the copyback cache storage unit CC may be a solid-state storage unit, which can achieve substantially faster Write and error-correction block update times than a disk drive. In such a case, the solid-state storage unit preferably includes error-detection and correction circuitry, and is either non-volatile or has a battery backup on the power supply



