`
United States Patent
Fair et al.

(10) Patent No.: US 7,334,095 B1
(45) Date of Patent: *Feb. 19, 2008
(54) WRITABLE CLONE OF READ-ONLY VOLUME

(75) Inventors: Robert L. Fair, Cary, NC (US); John K. Edwards, Sunnyvale, CA (US)

(73) Assignee: Network Appliance, Inc., Sunnyvale, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 395 days.

    This patent is subject to a terminal disclaimer.

(21) Appl. No.: 10/836,112

(22) Filed: Apr. 30, 2004
(51) Int. Cl.
    G06F 12/16 (2006.01)
(52) U.S. Cl. ................................................ 711/161
(58) Field of Classification Search ................ 711/112, 711/161; 707/202, 203, 10, 204; 709/217
    See application file for complete search history.
(56) References Cited

U.S. PATENT DOCUMENTS

4,156,907 A 5/1979 Rawlings et al.
4,399,503 A 8/1983 Hawley
4,570,217 A 2/1986 Allen et al.
4,598,357 A 7/1986 Swenson et al.
4,688,221 A 8/1987 Nakamura et al.
4,761,785 A 8/1988 Clark et al.
4,805,090 A 2/1989 Coogan
4,837,675 A 6/1989 Bean et al.
4,864,497 A 9/1989 Lowry et al.
4,896,259 A 1/1990 Jacobs et al.
4,899,342 A 2/1990 Potter et al.
4,989,206 A 1/1991 Dunphy, Jr. et al.
5,124,987 A 6/1992 Milligan et al.
RE34,100 E 10/1992 Hartness
5,155,835 A 10/1992 Belsan
5,163,131 A 11/1992 Row et al.
5,202,979 A 4/1993 Hillis et al.
5,278,979 A 1/1994 Foster et al.
5,426,747 A 6/1995 Weinreb et al.
5,581,724 A 12/1996 Belsan et al.
5,819,292 A * 10/1998 Hitz et al. .................. 707/203
5,963,962 A * 10/1999 Hitz et al. .................. 707/202
6,636,879 B1 10/2003 Doucette et al.

FOREIGN PATENT DOCUMENTS

WO WO 89/10594 11/1989
OTHER PUBLICATIONS

Administration Guide found at http://www.openafs.org/pages/doc/AdminGuide/auagd010.htm, visited on Mar. 2, 2005.

Primary Examiner-Pierre Bataille
Assistant Examiner-Paul Schlie
(74) Attorney, Agent, or Firm-Cesari and McKenna, LLP

(57)
ABSTRACT
`(57)
`ABSTRACT
`
`A system and method creates a writable clone of a read-only
`Volume. A base Snapshot is generated on a source Volume on
`a source storage system and is duplicated as a read-only base
`Snapshot replica on a target Volume on a destination storage
`system. A copy ("clone) is then Substantially instantaneously
`created from the read-only base Snap-shot replica, thereby
`creating a writable clone of a read-only Volume.
`
`29 Claims, 14 Drawing Sheets
`
[Front-page figure: aggregate 1200 containing a parent vvol 1205 and a base snapshot 1235, with parent and snapshot container maps 1245, level 1 indirect blocks, level 0 data blocks, and a snapshot volinfo block.]
`
`
`
U.S. PATENT DOCUMENTS

6,721,764 B2 * 4/2004 Hitz et al. .................. 707/202
6,868,417 B2 * 3/2005 Kazar et al. ................ 707/10
7,035,881 B2 * 4/2006 Tummala et al. ............. 707/204
7,085,785 B2 * 8/2006 Sawdon et al. .............. 707/204
2002/0112022 A1 * 8/2002 Kazar et al. ............... 709/217
OTHER PUBLICATIONS

Basilico, et al., Error Correction System Using "Shadow Memory," IBM Technical Disclosure Bulletin, May 1984, pp. 5792-5793.
Bitton, Dina, Disk Shadowing, Proceedings of the 14th VLDB Conference, LA, CA (1988).
Blasgen, M.W. et al., System R: An Architectural Overview, Reprinted from IBM Systems Journal vol. 20, No. 1, 1981, © 1981, 1999.
Borenstein, Nathaniel S., CMU's Andrew project: a retrospective, Communications of ACM, (39)12, Dec. 1996.
Brown, Mark R. et al., The Alpine file system, ACM Transactions on Computing Systems, 3(4):261-293, Nov. 1985.
Chen, Peter M., et al., An Evaluation of Redundant Arrays of Disks Using an Amdahl 5890, Performance Evaluation, pp. 74-85, 1990.
Chutani, Sailesh, et al., The Episode File System, In Proceedings of the USENIX Winter 1992.
Clark, B.E., et al., Application System/400 Performance Characteristics, IBM Systems Journal, 28(3):407-423, 1989.
Data Sheet for the Check Point Software Technologies product FloodGate-1 (1997).
Dibble, Peter C., et al., Beyond Striping: The Bridge Multiprocessor File System, Computer Science Department, University of Rochester, Aug. 11, 1989.
Douglis, Fred, et al., A comparison of two distributed systems: Amoeba and Sprite, Computing Systems, 4(4), Fall 1991, pp. 353-385.
Gait, Jason, Phoenix: A Safe In-Memory File System, Communications of the ACM, 33(1):81-86, Jan. 1990.
Hartman, John H. et al., Performance Measurements of a Multiprocessor Sprite Kernel, Proceedings of the USENIX Conference, 1990.
Hitz, Dave et al., File System Design for an NFS File Server Appliance, Technical Report 3002, Rev. C395, presented Jan. 19, 1994.
Howard, John H. et al., Scale and Performance in a Distributed File System, Carnegie Mellon University, CMU-ITC-87-068, Aug. 5, 1987.
Howard, John H., et al., Scale and performance in a distributed file system, ACM Trans. Computer System, 6(1), Feb. 1988, pp. 51-81.
Howard, John H., An Overview of the Andrew File System, Carnegie Mellon University, CMU-ITC-88-062.
The IBM System/38, Chapter 8, pp. 137-157.
Isomaki, Markus, Differentiated Service for the Internet, Department of Technical Physics and Mathematics, May 9, 1998.
Kazar, Michael L., et al., DEcorum File System Architectural Overview, USENIX Summer Conference, Anaheim, California, 1990.
Lomet, David, et al., The performance of a multiversion access method, ACM SIGMOD International Conference on Management of Data, 19:353-363.
Lorie, Raymond, A., Physical integrity in a large segmented database, ACM Trans. Database Systems, (2)1:91-104, Mar. 1977.
Lorie, RA, Shadow Page Mechanism, IBM Technical Disclosure Bulletin, Jun. 1986, pp. 340-342.
McKusick, Marshall Kirk, et al., A Fast File System for UNIX, Computer Science Division, Department of Electrical Engineering and Computer Sciences, Univ. of CA, Berkeley, Feb. 18, 1994.
Miller, Ethan L., et al., RAMA: A File System for Massively Parallel Computers, 12th IEEE Symposium on Mass Storage Systems, Monterey CA, Apr. 1993, pp. 163-168.
Moons, Herman et al., Location-Independent Object Invocation in Open Distributed Systems, Autumn 1991 EurOpen Technical Conference and Exhibition, pp. 287-300 (Sep. 16-20, 1991).
`
Morris, James H., et al., Andrew: A Distributed Personal Computing Environment, Comm. of the ACM, vol. 29, Mar. 1986, pp. 184-201.
Mullender, Sape J., et al., A distributed file service based on optimistic concurrency control, ACM Symposium on Operating System Principles (Orcas Island, Washington), Published as Operating Systems Review, 19(5):51-62, Dec. 1985.
Muller, Keith, et al., A High Performance Multi-Structured File System Design, In Proceedings of the 13th ACM Symposium on Operating Systems Principles, Oct. 1991, pp. 56-67.
Ousterhout, John K. et al., The Sprite Network Operating System, Computer Science Division, Department of Electrical Engineering and Computer Sciences, Univ. of CA, Berkeley, Nov. 19, 1987.
Ousterhout, John et al., Beating the I/O Bottleneck: A Case for Log-Structured File Systems, Technical Report, Computer Science Division, Electrical Engineering and Computer Sciences, University of California at Berkeley, Oct. 30, 1988.
Ousterhout, John, Why Aren't Operating Systems Getting Faster as Fast as Hardware?, Digital WRL Technical Note TN-11, Oct. 1989.
Ousterhout, John, A Brief Retrospective On The Sprite Network Operating System, found at http://www.cs.berkeley.edu/projects/sprite/retrospective.html, visited on Mar. 11, 2005.
Patterson, D., et al., A Case for Redundant Arrays of Inexpensive Disks (RAID), Technical Report CSD-87-391, Computer Science Division, Electrical Engineering and Computer Sciences, University of California at Berkeley (1987).
Patterson, D., et al., A Case for Redundant Arrays of Inexpensive Disks (RAID), SIGMOD International Conference on Management of Data, Chicago, IL, USA, Jun. 1-3, 1988, Sigmod Record (17)3:109-16 (Sep. 1988).
Peterson, Zachary Nathaniel Joseph, Data Placement for Copy-On-Write Using Virtual Contiguity, University of CA, Santa Cruz, Master of Science in Computer Science Thesis, Sep. 2002.
Quinlan, Sean, A Cached WORM File System, Software-Practice and Experience, 21(12):1289-1299 (1991).
Redundant Array of Independent Disks, from Wikipedia, the free encyclopedia, found at http://en.wikipedia.org/wiki/RAID, visited on Mar. 9, 2005.
Rosenberg, J., et al., Stability in a Persistent Store Based on a Large Virtual Memory, In Security and Persistence, Rosenberg, J. & Keedy, J.L. (ed), Springer-Verlag (1990) pp. 229-245.
Rosenblum, Mendel, et al., The LFS Storage Manager, Computer Science Division, Electrical Engineering and Computer Sciences, Univ. of CA, presented at Summer '90 USENIX Technical Conference, Anaheim, CA, Jun. 1990.
Rosenblum, Mendel, et al., The Design and Implementation of a Log-Structured File System, Jul. 24, 1991, pp. 1-15.
Rosenblum, Mendel, et al., The Design and Implementation of a Log-Structured File System, In Proceedings of ACM Transactions on Computer Systems, (10)1:26-52, Feb. 1992.
Sandberg, Russel et al., Design and implementation of the Sun Network Filesystem, In Proc. Summer 1985 USENIX Conf., pp. 119-130, Portland OR (USA), Jun. 1985.
Santry, Douglas S., et al., Deciding When to Forget in the Elephant File System, Operating Systems Review, 34(5), (Dec. 1999) pp. 110-123.
Satyanarayanan, M., et al., The ITC Distributed File System: Principles and Design, In Proceedings of the 10th ACM Symposium on Operating Systems Principles, (19)5:56-67, Dec. 1985.
Satyanarayanan, M., A survey of distributed file-systems, Annual Review of Computing Science, 4(73-104), 1989.
Satyanarayanan, M., et al., Coda: A highly available file system for a distributed workstation environment, Carnegie Mellon University, CMU-ITC.
Satyanarayanan, M., et al., Coda: A highly available file system for a distributed workstation environment, IEEE Transactions on Computers, 39(4):447-459, 1990.
Satyanarayanan, Mahadev, Scalable, Secure, and Highly Available Distributed File Access, Computer, May 1990:9-21.
Sidebotham, Bob, Volumes: The Andrew File System Data Structuring Primitive, EEUG Conference Proceedings, Manchester, UK, Autumn 1986.
User Guide found at http://www.openafs.org/pages/doc/UserGuide/auusg004.htm, visited on Mar. 2, 2005.
`
Welch, Brent B., et al., Pseudo Devices: User-Level Extensions to the Sprite File System, Computer Science Division, Department of Electrical Engineering and Computer Sciences, Univ. of CA, Berkeley, Apr. 1988.
Welch, Brent B., et al., Pseudo-File-Systems, Computer Science Division, Department of Electrical Engineering and Computer Sciences, Univ. of CA, Berkeley, Oct. 1989.
Wittle, Mark, et al., LADDIS: The next generation in NFS file server benchmarking, USENIX Association Conference Proceedings, Apr. 1993.
Akyurek, Sedat, Placing Replicated Data to Reduce Seek Delays, Department of Computer Science, University of Maryland, UMIACS-TR-91-121, CS-TR-2746, Aug. 1991.
Bitton, Dina, Disk Shadowing, Proceedings of the 14th VLDB Conference, LA, CA, 1988.
Chaudhuri, Surajit, et al., Self-Tuning Technology in Microsoft SQL Server, Data Engineering Journal 22, Feb. 1999, pp. 20-27.
Coyne, Robert A., et al., Storage Systems for National Information Assets, Proc. Supercomputing 92, Minneapolis, Nov. 1992, pp. 626-633.
Finlayson, Ross S., et al., Log Files: An Extended File Service Exploiting Write-Once Storage, Department of Computer Science, Stanford University, Report No. STAN-CS-87-1177, Sep. 1987.
Gray, Jim, et al., The Recovery Manager of the System R Database Manager, ACM Computing Surveys, (13)2:223-242, 1981.
Hecht, Matthew S., et al., Shadowed Management of Free Disk Pages with a Linked List, ACM Transactions on Database Systems, vol. 8, No. 4, Dec. 1983, pp. 503-514.
Howard, John H., et al., Scale and Performance in a Distributed File System, Carnegie Mellon University, CMU-ITC-87-068, Aug. 1987.
Howard, John H., An Overview of the Andrew File System, Carnegie Mellon University, CMU-ITC-88-062, 1988.
Kazar, Michael Leon, Synchronization and Caching Issues in the Andrew File System, Carnegie Mellon University, CMU-ITC-88-063.
Kazar, Michael L., et al., DEcorum File System Architectural Overview, USENIX Summer Conference, Anaheim, California, 1990.
Kemper, Alfons, et al., Performance Tuning for SAP R/3, Data Engineering Journal 22, Feb. 1999, pp. 33-40.
Kent, Jack et al., Optimizing Shadow Recovery Algorithms, IEEE Transactions on Software Engineering, 14(2):155-168, Feb. 1988.
Kistler, et al., Disconnected Operation in the Coda File System, ACM Transactions on Computer Systems, vol. 10, No. 1, Feb. 1992, pp. 3-25.
Lorie, Raymond, A., Physical Integrity in a Large Segmented Database, ACM Trans. Database Syst., vol. 2, Mar. 1977, pp. 91-104.
Patterson, D., et al., A Case for Redundant Arrays of Inexpensive Disks (RAID), Technical Report CSD-87-391, Computer Science Division, Electrical Engineering and Computer Sciences, University of California at Berkeley, 1987.
Patterson, D., et al., A Case for Redundant Arrays of Inexpensive Disks (RAID), SIGMOD International Conference on Management of Data, Chicago, IL, USA, Jun. 1-3, 1988, Sigmod Record (17)3:109-16, Sep. 1988.
Peterson, Zachary Nathaniel Joseph, Data Placement for Copy-On-Write Using Virtual Contiguity, University of CA, Santa Cruz, Master of Science in Computer Science Thesis, Sep. 2002.
Quinlan, Sean, A Cached WORM File System, Software-Practice and Experience, 21(12):1289-1299, 1991.
Rosenblum, Mendel, et al., The LFS Storage Manager, Computer Science Division, Electrical Engineering and Computer Sciences, Univ. of CA, presented at Summer '90 USENIX Technical Conference, Anaheim, CA, Jun. 1990.
Rosenblum, Mendel, et al., The Design and Implementation of a Log-Structured File System, Jul. 24, 1991, pp. 1-15.
Rosenblum, Mendel, The Design and Implementation of a Log-Structured File System, 1992, pp. 1-93.
Rosenblum, Mendel, et al., The Design and Implementation of a Log-Structured File System, In Proceedings of ACM Transactions on Computer Systems, (10)1:26-52, Feb. 1992.
Schiefer, Berni, et al., DB2 Universal Database Performance Tuning, Data Engineering Journal 22, Feb. 1999, pp. 12-19.
Seltzer, Margo I., et al., Journaling Versus Soft Updates: Asynchronous Meta-Data Protection in File Systems, Proceedings of 2000 USENIX Annual Technical Conference, Jun. 18-23, 2000.
Shasha, Dennis, Tuning Time Series Queries in Finance: Case Studies and Recommendations, Data Engineering Journal 22, Feb. 1999, pp. 41-47.
Subramanian, Muralidhar, et al., Performance Challenges in Object-Relational DBMSs, Data Engineering Journal 22, Feb. 1999, pp. 28-32.
Weikum, Gerhard, et al., Towards Self-Tuning Memory Management for Data Servers, Data Engineering Journal 22, Feb. 1999, pp. 3-11.
West, Michael, et al., The ITC Distributed File System: Prototype and Experience, Carnegie-Mellon University, Technical Report CMU-ITC-040, Mar. 1985.
Zayas, Edward R., AFS-3 Programmer's Reference: Architectural Overview, Transarc Corporation, Pittsburgh, PA, 1.0 edition, 1991.
`* cited by examiner
`
`
`
`
[Sheet 1 of 14, FIG. 1 (prior art): buffer tree 100 of a file, showing the inode for the inode file 105, an inode file indirect block 110, and indirect blocks 119 that reference the file's data blocks.]
`
`
`
[Sheet 2 of 14, FIG. 2 (prior art): the buffer tree of FIG. 1 after creation of a snapshot inode 205, which shares the inode file indirect block 110 and indirect block 119 with the inode for the inode file 105.]
`
`
`
[Sheet 3 of 14, FIG. 3 (prior art): file system 300 after a write; the snapshot inode 205 still references the old inode file indirect block 110, inode 117 and indirect block 119, while the new inode for the inode file 305 references a new inode file indirect block 310, inode 317 and indirect block 319.]
`
`
`
[Sheet 4 of 14, FIG. 4: storage system environment 400, in which a source storage system 420S (processor 422, memory 424 holding the storage operating system 500 and buffer cache 470, network adapter 426, storage adapter 428) and a destination storage system 420D, each with a disk array (460S, 460D) of disks 430, are coupled over a computer network 440 to clients 410 running applications 412.]
`
`
`
[Sheet 5 of 14, FIG. 5: schematic diagram of the storage operating system 500; internal labels are illegible in this copy.]
`
`
`
[Sheet 6 of 14, FIG. 6: on-disk inode 600 with a metadata section 610 (type 612, size 614, time stamps 616, UID 618, GID 620), a data section 650 and a dirty bit 660. FIG. 14: format of the clone softlock exchanged between the source and destination storage systems, identifying the destination storage system, snapshot and clone (e.g., "DEST:V2-CLONE:1155").]
`
`
`
[Sheet 7 of 14, FIG. 7: buffer tree of a file, with level 1 indirect blocks 704 holding pointers 705 that reference level 0 data blocks.]
`
`
`
[Sheet 8 of 14, FIG. 8: figure labels are illegible in this copy.]
`
`
`
[Sheet 9 of 14, FIG. 9: on-disk layout of an aggregate 900, showing a volinfo block 902, fsinfo blocks 904, inode file 906, active map, summary map, space map and owner map 1100, a hidden metadata root directory 920, a filesystem file 940, a vvol 950 with its own volinfo/fsinfo blocks and root directory, and a storage label file 990 recording the vvol name 992, online/offline status 994, and identity and status information 996.]
`
`
`
`
`
`
`
[Sheet 10 of 14, FIG. 10: container file 1000 of a vvol with inode number 113; level 1 indirect blocks 1004 point to level 0 data blocks 1006, including the block at fbn 2000. FIG. 11: owner map 1100, whose entries 1110 translate a physical volume block number into a vvol identifier and virtual volume block number, pvbn -> (vvid, vvbn), e.g., pvbn -> (113, 2000).]
`
`
`
[Sheet 11 of 14, FIG. 12: aggregate 1200 containing a parent vvol 1205 and its base snapshot 1235 created by the snapshot mechanism, with a parent container file 1210 and snapshot container file 1240, container maps, level 1 indirect blocks 1214, and a volinfo block 1248.]
`
`
`
`
`
[Sheet 12 of 14: figure content illegible in this copy.]
`
`
`
[Sheet 13 of 14: figure content illegible in this copy.]
`
`
`
[Sheet 14 of 14, FIG. 15: flow chart of the procedure for creating a writable clone of a read-only volume:
Start (1500);
Generate base snapshot of parent vvol on source storage system (1502);
Duplicate base snapshot as read-only snapshot replica on destination storage system (1504);
Create new vvol "clone", including container file (1506);
Create new storage label file and new subdirectory in aggregate for clone (1508);
Create modified volinfo block (1510);
Write modified volinfo block to container file (1512);
Propagate clone softlock from destination storage system to source storage system (1514);
Instantiate clone (1516);
Service storage operations directed to writable clone of read-only volume (1518);
End (1520).]
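For orientation, the flow chart above reads as straight-line code. The sketch below is a hypothetical rendering of steps 1502-1518 only; every function name is a stand-in for the step of the same number, not an interface the patent supplies.

```c
/*
 * Sketch of the FIG. 15 procedure (steps 1502-1518). Each function is a
 * hypothetical stand-in for the correspondingly numbered step.
 */
#include <stdio.h>

static int generate_base_snapshot(void)     { puts("1502: snapshot parent vvol");          return 0; }
static int duplicate_snapshot_replica(void) { puts("1504: replicate snapshot read-only");  return 0; }
static int create_clone_vvol(void)          { puts("1506: create clone + container file"); return 0; }
static int create_label_and_subdir(void)    { puts("1508: storage label file, subdir");    return 0; }
static int create_modified_volinfo(void)    { puts("1510: modified volinfo block");        return 0; }
static int write_volinfo_to_container(void) { puts("1512: write volinfo to container");    return 0; }
static int propagate_clone_softlock(void)   { puts("1514: softlock to source system");     return 0; }
static int instantiate_clone(void)          { puts("1516: instantiate clone");             return 0; }

int main(void)
{
    int (*steps[])(void) = {
        generate_base_snapshot, duplicate_snapshot_replica,
        create_clone_vvol, create_label_and_subdir,
        create_modified_volinfo, write_volinfo_to_container,
        propagate_clone_softlock, instantiate_clone,
    };

    for (unsigned i = 0; i < sizeof steps / sizeof steps[0]; i++)
        if (steps[i]() != 0)
            return 1;               /* abort the clone on any failed step */

    puts("1518: clone is writable; service storage operations");
    return 0;
}
```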
`
`
`
`WRITABLE CLONE OF READ-ONLY
`VOLUME
`
`CROSS-REFERENCE TO RELATED
`APPLICATIONS
`
The present invention is related to the following commonly assigned U.S. patent application Ser. No. 10/837,254 titled, Cloning Technique for Efficiently Creating a Copy of a Volume in a Storage System, filed herewith.
`
`10
`
`FIELD OF THE INVENTION
`
The present invention relates to storage systems and, more specifically, to a technique that enables efficient copying of a read-only volume of a storage system.
`
`15
`
`BACKGROUND OF THE INVENTION
`
`2
`assigns sequences offbns on a per-file basis, whereas vbns
`are assigned over a larger Volume address space. The file
`system organizes the data blocks within the vbn Space as a
`“logical volume'; each logical volume may be, although is
`not necessarily, associated with its own file system. The file
`system typically consists of a contiguous range of vbns from
`Zero to n, for a file system of size n-1 blocks.
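As a concrete illustration of the two address spaces (the structure below is a sketch, not one of the patent's data structures), a file's block map can be pictured as an array indexed by fbn whose entries are vbns: fbns are dense and start at zero for every file, while the vbns they map to are drawn from the single volume-wide space.

```c
/*
 * Minimal sketch (illustrative only): a per-file map from file block
 * numbers (fbns), which start at zero for every file, to volume block
 * numbers (vbns), which index one shared volume-wide address space.
 */
#include <stdio.h>

#define FILE_BLOCKS 4

struct file_map {
    unsigned vbn[FILE_BLOCKS];  /* vbn[fbn] = volume block holding that fbn */
};

int main(void)
{
    /* A 4-block file whose fbns 0..3 happen to land on scattered vbns. */
    struct file_map f = { .vbn = { 1037, 1038, 9214, 77 } };

    for (unsigned fbn = 0; fbn < FILE_BLOCKS; fbn++)
        printf("fbn %u -> vbn %u\n", fbn, f.vbn[fbn]);
    return 0;
}
```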
A known type of file system is a write-anywhere file system that does not overwrite data on disks. If a data block is retrieved (read) from disk into a memory of the storage system and "dirtied" (i.e., updated or modified) with new data, the data block is thereafter stored (written) to a new location on disk to optimize write performance. A write-anywhere file system may initially assume an optimal layout such that the data is substantially contiguously arranged on disks. The optimal disk layout results in efficient access operations, particularly for sequential read operations, directed to the disks. An example of a write-anywhere file system that is configured to operate on a storage system is the Write Anywhere File Layout (WAFL™) file system available from Network Appliance, Inc., Sunnyvale, Calif.
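A minimal sketch of that write-anywhere policy follows, assuming a toy first-fit allocator; none of these names are WAFL's, and the on-disk I/O is elided.

```c
/*
 * Sketch of write-anywhere allocation (illustrative only): a dirtied
 * block is never rewritten in place; it is written to a freshly
 * allocated vbn and the file's map is updated to point there.
 */
#include <stdio.h>

#define NVBNS 16

static unsigned char in_use[NVBNS];     /* toy active map: 1 = allocated */

static int alloc_vbn(void)              /* first-fit free block search */
{
    for (int vbn = 0; vbn < NVBNS; vbn++)
        if (!in_use[vbn]) { in_use[vbn] = 1; return vbn; }
    return -1;                          /* volume full */
}

/* Write a dirty block: allocate a new vbn, leave the old block intact
 * (a snapshot may still reference it), and repoint the file map. */
static int write_dirty(unsigned *map_entry)
{
    int new_vbn = alloc_vbn();
    if (new_vbn < 0)
        return -1;
    /* ... the new data would be written to new_vbn on disk here ... */
    *map_entry = (unsigned)new_vbn;     /* the old vbn is not overwritten */
    return 0;
}

int main(void)
{
    unsigned file_map[1] = { 0 };
    in_use[0] = 1;                      /* fbn 0 currently lives at vbn 0 */

    write_dirty(&file_map[0]);
    printf("fbn 0 moved to vbn %u\n", file_map[0]);   /* prints vbn 1 */
    return 0;
}
```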
The storage operating system may further implement a storage module, such as a RAID system, that manages the storage and retrieval of the information to and from the disks in accordance with input/output (I/O) operations. The RAID system is also responsible for parity operations in the storage system. Note that the file system only "sees" the data disks within its vbn space; the parity disks are "hidden" from the file system and, thus, are only visible to the RAID system. The RAID system typically organizes the RAID groups into one large "physical" disk (i.e., a physical volume), such that the disk blocks are concatenated across all disks of all RAID groups. The logical volume maintained by the file system is then "disposed over" (spread over) the physical volume maintained by the RAID system.
The storage system may be configured to operate according to a client/server model of information delivery to thereby allow many clients to access the directories, files and blocks stored on the system. In this model, the client may comprise an application, such as a database application, executing on a computer that "connects" to the storage system over a computer network, such as a point-to-point link, shared local area network, wide area network or virtual private network implemented over a public network, such as the Internet. Each client may request the services of the file system by issuing file system protocol messages (in the form of packets) to the storage system over the network. By supporting a plurality of file system protocols, such as the conventional Common Internet File System (CIFS) and the Network File System (NFS) protocols, the utility of the storage system is enhanced.
When accessing a block of a file in response to servicing a client request, the file system specifies a vbn that is translated at the file system/RAID system boundary into a disk block number (dbn) location on a particular disk (disk, dbn) within a RAID group of the physical volume. Each block in the vbn space and in the dbn space is typically fixed, e.g., 4 kbytes (kB), in size; accordingly, there is typically a one-to-one mapping between the information stored on the disks in the dbn space and the information organized by the file system in the vbn space. The (disk, dbn) location specified by the RAID system is further translated by a disk driver system of the storage operating system into a plurality of sectors (e.g., a 4 kB block with a RAID header translates to 8 or 9 disk sectors of 512 or 520 bytes) on the specified disk.
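The two-stage translation can be sketched as arithmetic. The round-robin striping below is an assumption for illustration only; the patent leaves the actual layout to the RAID system.

```c
/*
 * Sketch of the two translations described above, under an assumed
 * round-robin striping across the data disks: vbn -> (disk, dbn) at
 * the file system/RAID boundary, then (disk, dbn) -> a run of
 * 512-byte sectors in the disk driver.
 */
#include <stdio.h>

#define BLOCK_SIZE   4096u
#define SECTOR_SIZE  512u
#define DATA_DISKS   4u

int main(void)
{
    unsigned vbn = 10237;

    /* Assumed layout: blocks striped round-robin over the data disks. */
    unsigned disk = vbn % DATA_DISKS;
    unsigned dbn  = vbn / DATA_DISKS;

    /* A 4 kB block spans 8 sectors of 512 bytes (9 if a RAID header
     * pushes it over, or with 520-byte formatted sectors). */
    unsigned first_sector = dbn * (BLOCK_SIZE / SECTOR_SIZE);

    printf("vbn %u -> disk %u, dbn %u, sectors %u..%u\n",
           vbn, disk, dbn, first_sector,
           first_sector + BLOCK_SIZE / SECTOR_SIZE - 1);
    return 0;
}
```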
`
`
`
`
The requested block is then retrieved from disk and stored in a buffer cache of the memory as part of a buffer tree of the file. The buffer tree is an internal representation of blocks for a file stored in the buffer cache and maintained by the file system. Broadly stated, the buffer tree has an inode at the root (top-level) of the file. An inode is a data structure used to store information, such as metadata, about a file, whereas the data blocks are structures used to store the actual data for the file. The information contained in an inode may include, e.g., ownership of the file, access permission for the file, size of the file, file type and references to locations on disk of the data blocks for the file. The references to the locations of the file data are provided by pointers, which may further reference indirect blocks that, in turn, reference the data blocks, depending upon the quantity of data in the file. Each pointer may be embodied as a vbn to facilitate efficiency among the file system and the RAID system when accessing the data on disks.
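A rough C sketch of these relationships follows; the field names and fan-out constants are illustrative, not WAFL's actual on-disk format.

```c
/*
 * Sketch of a buffer tree's shape (illustrative field names): an inode
 * holds file metadata plus vbn pointers; for larger files those
 * pointers reference indirect blocks whose own pointers reference the
 * level 0 data blocks.
 */
#include <stdint.h>
#include <stdio.h>

#define PTRS_PER_INODE    16
#define PTRS_PER_INDIRECT 1024

struct indirect_block {
    uint32_t vbn[PTRS_PER_INDIRECT];   /* each entry names a data block */
};

struct inode {
    uint32_t owner_uid;                /* ownership of the file */
    uint32_t mode;                     /* access permissions */
    uint64_t size;                     /* file size in bytes */
    uint16_t type;                     /* regular file, directory, ... */
    /*
     * For a small file these vbns reference data blocks directly; for a
     * large file they reference indirect blocks, adding a tree level.
     */
    uint32_t vbn[PTRS_PER_INODE];
};

int main(void)
{
    /* With one level of indirection, a file can span
     * PTRS_PER_INODE * PTRS_PER_INDIRECT data blocks. */
    printf("max blocks with one indirect level: %d\n",
           PTRS_PER_INODE * PTRS_PER_INDIRECT);
    return 0;
}
```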
The RAID system maintains information about the geometry of the underlying physical disks (e.g., the number of blocks in each disk) in raid labels stored on the disks. The RAID system provides the disk geometry information to the file system for use when creating and maintaining the vbn-to-(disk, dbn) mappings used to perform write allocation operations and to translate vbns to disk locations for read operations. Block allocation data structures, such as an active map, a snapmap, a space map and a summary map, are data structures that describe block usage within the file system, such as the write-anywhere file system. These mapping data structures are independent of the geometry and are used by a write allocator of the file system as existing infrastructure for the logical volume.
Specifically, the snapmap denotes a file including a bitmap associated with the vacancy of blocks of a snapshot. The write-anywhere file system (such as the WAFL file system) has the capability to generate a snapshot of its active file system. An "active file system" is a file system to which data can be both written and read, or, more generally, an active store that responds to both read and write I/O operations. It should be noted that "Snapshot" is a trademark of Network Appliance, Inc. and is used for purposes of this patent to designate a persistent consistency point (CP) image. A persistent consistency point image (PCPI) is a space conservative, point-in-time read-only image of data accessible by name that provides a consistent image of that data (such as a storage system) at some previous time. More particularly, a PCPI is a point-in-time representation of a storage element, such as an active file system, file or database, stored on a storage device (e.g., on disk) or other persistent memory and having a name or other identifier that distinguishes it from other PCPIs taken at other points in time. In the case of the WAFL file system, a PCPI is always an active file system image that contains complete information about the file system, including all metadata. A PCPI can also include other information (metadata) about the active file system at the particular point in time for which the image is taken. The terms "PCPI" and "snapshot" may be used interchangeably throughout this patent without derogation of Network Appliance's trademark rights.
The active map denotes a file including a bitmap associated with a free status of the active file system. As noted, a logical volume may be associated with a file system; the term "active file system" thus also refers to a consistent state of a current file system. The summary map denotes a file including an inclusive logical OR bitmap of all snapmaps. By examining the active and summary maps, the file system can determine whether a block is in use by either the active file system or any snapshot. The space map denotes a file including an array of numbers that describe the number of storage blocks used in a block allocation area. In other words, the space map is essentially a logical OR bitmap between the active and summary maps to provide a condensed version of available "free block" areas within the vbn space. Examples of snapshot and block allocation data structures, such as the active map, space map and summary map, are described in U.S. Patent Application Publication No. US2002/0083037 A1, titled Instant Snapshot, by Blake Lewis et al. and published on Jun. 27, 2002, which application is hereby incorporated by reference.
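A toy sketch of the in-use test implied by the active and summary maps follows; the word-array bitmaps and their sizes are assumptions for illustration, not the patent's structures.

```c
/*
 * Sketch of the in-use test described above (structures simplified to
 * word arrays): a block is free only if its bit is clear in both the
 * active map and the summary map (the OR of all snapmaps).
 */
#include <stdint.h>
#include <stdio.h>

#define MAP_WORDS 4                       /* 4 * 32 = 128 vbns in this toy */

static const uint32_t active_map[MAP_WORDS]  = { 0x0000000Fu, 0, 0, 0 };
static const uint32_t summary_map[MAP_WORDS] = { 0x00000030u, 0, 0, 0 };

/* In use if referenced by the active file system or by any snapshot. */
static int block_in_use(unsigned vbn)
{
    uint32_t bit = 1u << (vbn % 32);
    return ((active_map[vbn / 32] | summary_map[vbn / 32]) & bit) != 0;
}

int main(void)
{
    for (unsigned vbn = 0; vbn < 8; vbn++)
        printf("vbn %u: %s\n", vbn, block_in_use(vbn) ? "in use" : "free");
    return 0;                     /* vbns 0-5 are in use, 6-7 are free */
}
```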
The write-anywhere file system typically performs write allocation of blocks in a logical volume in response to an event in the file system (e.g., dirtying of the blocks in a file). When write allocating, the file system uses the block allocation data structures to select free blocks within its vbn space to which to write the dirty blocks. The selected blocks are generally in the same positions



