US009881040B2

(12) United States Patent
     Rawat et al.

(10) Patent No.:      US 9,881,040 B2
(45) Date of Patent:  Jan. 30, 2018
(54) TRACKING DATA OF VIRTUAL DISK SNAPSHOTS USING TREE DATA STRUCTURES

(71) Applicant: VMware, Inc., Palo Alto, CA (US)

(72) Inventors: Mayank Rawat, Sunnyvale, CA (US); Ritesh Shukla, Saratoga, CA (US); Li Ding, Cupertino, CA (US); Serge Pashenkov, Los Altos, CA (US); Raveesh Ahuja, San Jose, CA (US)

(73) Assignee: VMware, Inc., Palo Alto, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 336 days.

(21) Appl. No.: 14/831,808

(22) Filed: Aug. 20, 2015

(65) Prior Publication Data
     US 2017/0052717 A1    Feb. 23, 2017

(51) Int. Cl.
     G06F 3/06      (2006.01)
     G06F 17/30     (2006.01)

(52) U.S. Cl.
     CPC ..... G06F 17/30327 (2013.01); G06F 3/067 (2013.01); G06F 3/0608 (2013.01); G06F 3/0641 (2013.01); G06F 17/30088 (2013.01)

(58) Field of Classification Search
     CPC ..... G06F 17/30327; G06F 3/0608; G06F 3/0641; G06F 3/067; G06F 17/30088
     See application file for complete search history.

(56) References Cited

     U.S. PATENT DOCUMENTS

     8,775,773 B2        7/2014   Acharya et al.
     9,720,947 B2 *      8/2017   Aron ............... G06F 17/30327
     9,740,632 B1 *      8/2017   Love ............... G06F 12/1018
     2015/0058863 A1     2/2015   Karamanolis et al.
     2016/0210302 A1 *   7/2016   Xia ................ G06F 3/0619

     * cited by examiner

Primary Examiner - Eric S Cardwell
(74) Attorney, Agent, or Firm - Patterson & Sheridan, LLP
(57) ABSTRACT

User data of different snapshots for the same virtual disk are stored in the same storage object. Similarly, metadata of different snapshots for the same virtual disk are stored in the same storage object, and log data of different snapshots for the same virtual disk are stored in the same storage object. As a result, the number of different storage objects that are managed for snapshots does not increase proportionally with the number of snapshots taken. In addition, any one of the multitude of persistent storage back-ends can be selected as the storage back-end for the storage objects according to user preference, system requirement, snapshot policy, or any other criteria. Another advantage is that the storage location of the read data can be obtained with a single read of the metadata storage object, instead of traversing metadata files of multiple snapshots.

20 Claims, 4 Drawing Sheets
[Front-page drawing (reproduced as FIG. 2): virtual disk 210 with file descriptor 211 (geometry, size, data region = PTR), Snapshot Management Data Structure (snapshot_data = OID1, snapshot_metadata = OID2, snapshot_log = OID3; OID1 = PTR1, OID2 = PTR2, OID3 = PTR3; entries for snapshots SS1, SS2, SS3 and a running point RP), and Storage Objects 1, 2, 3 held in VMFS 230 on Storage Device 162 or on Storage Device 161.]
[FIG. 1 (Sheet 1 of 4): Host Computer System 100 running VMs 112₁-112ₙ (Applications 118 on guest OS 116) over VMMs 122₁-122ₙ and Hypervisor 108, whose IO stack comprises SCSI Virtualization Layer 131, Filesystem Device Switch 132, Snapshot Module 133, HFS/VVOL/VSAN Driver 134, and Data Access Layer 136; HW Platform 102 includes CPU(s) 103, Memory 104, NIC(s) 105, and HBA(s) 106, connected over networks 151 and 152 to Storage Devices 161 and 162.]

[FIG. 2 (Sheet 2 of 4): virtual disk 210 with file descriptor 211 (geometry, size, data region = PTR pointing to the base data region), Snapshot Management Data Structure 220 (snapshot_data = OID1, snapshot_metadata = OID2, snapshot_log = OID3; OID1 = PTR1, OID2 = PTR2, OID3 = PTR3; SS1 = tag1: OID2, offset x0; SS2 = tag2: OID2, offset x2; SS3 = tag3; RP = OID2 plus an offset), and Storage Objects 1, 2, 3 shown as files of VMFS 230 on Storage Device 162, as objects in Storage Device 162, and as objects in Storage Device 161.]

[FIG. 3 (Sheet 3 of 4): a timeline of events SS1, WR1, WR2, SS2, WR3, SS3, WR4 against the LBA space of virtual disk 210, showing after each event the contents of the snapshot data storage object (OID1), the nodes held in the snapshot metadata storage object (OID2, with entries keyed by LBA), the SMDS annotations (e.g., SS1 = OID2, offset x0; SS2 = OID2, offset x2; RP = OID2, offset x8 and later offset xc), and the B+ trees whose nodes point either to the base data region (base, offset) or into the snapshot data storage object (OID1, offset).]
[FIGS. 4A-4D (Sheet 4 of 4), flow diagrams:
FIG. 4A, VM power on: read SMDS (402); open storage objects (404); establish running point (RP).
FIG. 4B, VM snapshot: set running point as root node of previous snapshot (412); create node for new running point (414); copy contents of root node of previous snapshot into the new running point node and mark all pointers as pointing to shared nodes (416).
FIG. 4C, read IO: access snapshot metadata at RP (422); traverse tree beginning at RP to locate data (424); issue read command to read data from the location (426).
FIG. 4D, write IO: access snapshot metadata at RP (432); traverse tree beginning at RP to find write location and update tree (434); issue write command to write data at the location (436).]

US 9,881,040 B2

TRACKING DATA OF VIRTUAL DISK SNAPSHOTS USING TREE DATA STRUCTURES

BACKGROUND

In a virtualized computing environment, virtual disks of virtual machines (VMs) running in a host computer system ("host") are typically represented as files in the host's file system. To back up the VM data and to support linked VM clones, snapshots of the virtual disks are taken to preserve the VM data at a specific point in time. Frequent backup of VM data increases the reliability of the VMs. The cost of frequent backup, i.e., taking frequent snapshots, is high because of the increase in associated storage costs and the adverse impact on performance, in particular read performance, because each read will have to potentially traverse each snapshot level to find the location of the read data.

Solutions have been developed to reduce the amount of storage consumed by snapshots. For example, snapshots can be backed up incrementally by comparing blocks from one version to another, and only the blocks that have changed from the previous version are saved. Deduplication has also been used to identify content duplicates among snapshots to remove redundant storage content.

Although these solutions have reduced the storage requirements of snapshots, further enhancements are needed for effective deployment in cloud computing environments, where the number of VMs and snapshots that are managed is quite large, often several orders of magnitude greater than in conventional data centers. In addition, storage technology has advanced to provide a multitude of persistent storage back-ends, but snapshot technology has yet to fully exploit the benefits provided by the different persistent storage back-ends.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a virtualized host computer system that implements a snapshot module according to embodiments.

FIG. 2 is a schematic diagram that illustrates data structures for managing virtual disk snapshots according to an embodiment.

FIG. 3 is a schematic diagram that illustrates additional data structures, including B+ trees, for managing virtual disk snapshots according to an embodiment.

FIG. 4A depicts a flow diagram of method steps that are carried out in connection with opening storage objects that are needed to manage snapshots according to an embodiment.

FIG. 4B depicts a flow diagram of method steps that are carried out in connection with taking snapshots according to an embodiment.

FIG. 4C depicts a flow diagram of method steps that are carried out to process a read IO on a virtual disk having one or more snapshots that have been taken according to an embodiment.

FIG. 4D depicts a flow diagram of method steps that are carried out to process a write IO on a virtual disk having one or more snapshots that have been taken according to an embodiment.

DETAILED DESCRIPTION

According to embodiments, user data of different snapshots for the same virtual disk are stored in the same storage object, which may take the form of a file in a host file system, a file in a network file system, an object storage provisioned as a virtual storage area network (SAN) object, a virtual volume object, or a cloud storage object. Similarly, metadata of different snapshots for the same virtual disk are stored in the same storage object, and log data of different snapshots for the same virtual disk are stored in the same storage object. As a result, the number of different storage objects that are managed for snapshots does not increase proportionally with the number of snapshots taken. In addition, any one of the multitude of persistent storage back-ends can be selected as the storage back-end for the storage objects containing data for the snapshots. As a result, the form of the storage objects containing data for the snapshots may be selected according to user preference, system requirement, snapshot policy, or any other criteria. Another advantage is that the storage location of the read data can be obtained with a single read of the metadata storage object, instead of traversing metadata files of multiple snapshots.

FIG. 1 is a computer system, shown as host computer system 100, having a hypervisor 108 installed on top of hardware platform 102 to support the execution of virtual machines (VMs) 112₁-112ₙ through corresponding virtual machine monitors (VMMs) 122₁-122ₙ. Host computer system 100 may be constructed on a conventional, typically server-class, hardware platform 102, and includes one or more central processing units (CPUs) 103, system memory 104, one or more network interface controllers (NICs) 105, and one or more host bus adapters (HBAs) 106. Persistent storage for host computer system 100 may be provided locally, by a storage device 161 (e.g., network-attached storage or cloud storage) connected to NIC 105 over a network 151, or by a storage device 162 connected to HBA 106 over a network 152.

Each VM 112 implements a virtual hardware platform in the corresponding VMM 122 that supports the installation of a guest operating system (OS) which is capable of executing applications. In the example illustrated in FIG. 1, the virtual hardware platform for VM 112₁ supports the installation of a guest OS 116 which is capable of executing applications 118 within VM 112₁. Guest OS 116 may be any of the well-known commodity operating systems, such as Microsoft Windows®, Linux®, and the like, and includes a native file system layer, for example, either an NTFS or an ext3FS type file system layer. Input-output operations (IOs) issued by guest OS 116 through the native file system layer appear to guest OS 116 as being routed to one or more virtual disks provisioned for VM 112₁ for final execution, but such IOs are, in reality, reprocessed by IO stack 130 of hypervisor 108, and the reprocessed IOs are issued through NIC 105 to storage device 161 or through HBA 106 to storage device 162.

At the top of IO stack 130 is a SCSI virtualization layer 131, which receives IOs directed at the issuing VM's virtual disk and translates them into IOs directed at one or more storage objects managed by hypervisor 108, e.g., virtual disk storage objects representing the issuing VM's virtual disk. A file system device switch (FDS) driver 132 examines the translated IOs from SCSI virtualization layer 131, and in situations where one or more snapshots have been taken of the virtual disk storage objects, the IOs are processed by a snapshot module 133, as described below in conjunction with FIGS. 4C and 4D.
`

The remaining layers of IO stack 130 are additional layers managed by hypervisor 108. HFS/VVOL/VSAN driver 134 represents one of the following depending on the particular implementation: (1) a host file system (HFS) driver in cases where the virtual disk and/or data structures relied on by snapshot module 133 are represented as a file in a file system, (2) a virtual volume (VVOL) driver in cases where the virtual disk and/or data structures relied on by snapshot module 133 are represented as a virtual volume as described in U.S. Pat. No. 8,775,773, which is incorporated by reference herein in its entirety, and (3) a virtual storage area network (VSAN) driver in cases where the virtual disk and/or data structures relied on by snapshot module 133 are represented as a VSAN object as described in U.S. patent application Ser. No. 14/010,275, which is incorporated by reference herein in its entirety. In each case, driver 134 receives the IOs passed through filter driver 132 and translates them to IOs issued to one or more storage objects, and provides them to data access layer 136, which transmits the IOs to either storage device 161 through NIC 105 or storage device 162 through HBA 106.

It should be recognized that the various terms, layers and categorizations used to describe the virtualization components in FIG. 1 may be referred to differently without departing from their functionality or the spirit or scope of the invention. For example, VMMs 122 may be considered separate virtualization components between VMs 112 and hypervisor 108 (which, in such a conception, may itself be considered a virtualization "kernel" component) since there exists a separate VMM for each instantiated VM. Alternatively, each VMM may be considered to be a component of its corresponding virtual machine since such VMM includes the hardware emulation components for the virtual machine. It should also be recognized that the techniques described herein are also applicable to hosted virtualized computer systems. Furthermore, although benefits that are achieved may be different, the techniques described herein may be applied to certain non-virtualized computer systems.

FIG. 2 is a schematic diagram that illustrates data structures for managing virtual disk snapshots according to an embodiment. In the embodiment illustrated herein, the virtual disk for a VM (shown in FIG. 2 as virtual disk 210) is assumed to be a file that is described by a file descriptor in the host file system (shown in FIG. 2 as file descriptor 211). Each file descriptor of a virtual disk contains a pointer to a data region of the virtual disk in storage. In the example of FIG. 2, file descriptor 211 contains the pointer PTR, which points to a base data region in storage device 162. In the description that follows, this base data region in storage device 162 is referred to as "base" and locations within this base data region are specified with an offset. In other embodiments, the virtual disk may be represented as a VVOL object, a VSAN object, or other types of object stores known in the art, and described using associated descriptor objects.

In addition to file descriptor 211, the data structures for managing snapshots include a snapshot management data structure (SMDS) 220, storage object 1 which contains actual data written to virtual disk 210 after a snapshot has been taken for virtual disk 210 (hereinafter referred to as "the snapshot data storage object"), storage object 2 which contains metadata about the snapshots taken for virtual disk 210 (hereinafter referred to as "the snapshot metadata storage object"), and storage object 3 which is used to record snapshot metadata operations for crash consistency (hereinafter referred to as "the snapshot log storage object"). Storage objects 1, 2, 3 are depicted herein as object stores within storage device 162, but may be files of HFS 230 or a network file system in storage device 161. Storage objects 1, 2, 3 may also be object stores in a cloud storage device. Regardless of the type of storage backing storage objects 1, 2, 3, storage objects 1, 2, 3 are identified by their object identifiers (OIDs) in the embodiments. SMDS provides a mapping of each OID to a location in storage. In SMDS 220, OID1 is mapped to PTR1, OID2 mapped to PTR2, and OID3 mapped to PTR3. Each of PTR1, PTR2, and PTR3 may be a path to a file in HFS 230 or a uniform resource identifier (URI) of a storage object.
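A minimal sketch of how this OID-to-location mapping might be consumed at VM power on follows (the flow of FIG. 4A: read the SMDS, open the storage objects, establish the running point). The dictionary layout, the example paths, and the function name are illustrative assumptions, not the on-disk format of the SMDS.

    # Hypothetical in-memory rendering of SMDS 220; layout and paths are illustrative.
    smds = {
        "snapshot_data": "OID1",
        "snapshot_metadata": "OID2",
        "snapshot_log": "OID3",
        "locations": {                       # OID -> PTR: a file path in HFS 230 or a URI
            "OID1": "file:///hfs230/disk210-snapdata",
            "OID2": "file:///hfs230/disk210-snapmeta",
            "OID3": "file:///hfs230/disk210-snaplog",
        },
        "running_point": ("OID2", 0x0),      # root node of the B+ tree used for new IOs
    }

    def power_on(smds):
        """FIG. 4A flow: read the SMDS, open the storage objects, establish RP."""
        handles = {}
        for oid in ("OID1", "OID2", "OID3"):
            # A real implementation would open a file, VVOL, VSAN, or cloud object
            # behind the resolved location; the sketch only records the mapping.
            handles[oid] = smds["locations"][oid]
        return handles, smds["running_point"]

    handles, running_point = power_on(smds)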
SMDS is created per virtual disk and snapshot module 133 maintains the entire snapshot hierarchy for a single virtual disk in the SMDS. Whenever a new snapshot of a virtual disk is taken, snapshot module 133 adds an entry in the SMDS of that virtual disk. SMDS 220 shows an entry for each of snapshots SS1, SS2, SS3. Snapshot SS1 is the first snapshot taken for virtual disk 210 and its entry includes a tag (tag1) that contains searchable information about snapshot SS1 and a pointer to a root node of a B+ tree that records locations of the snapshot data for snapshot SS1. Snapshot SS2 is the second snapshot taken for virtual disk 210 and its entry includes a tag (tag2) that contains searchable information about snapshot SS2 and a pointer to a root node of a B+ tree that records locations of the snapshot data for snapshot SS2. Snapshot SS3 is the third snapshot taken for virtual disk 210 and its entry includes a tag (tag3) that contains searchable information about snapshot SS3. The pointer to a root node of a B+ tree that records locations of the snapshot data for snapshot SS3 is added to the entry for snapshot SS3 when the next snapshot is taken and the contents of snapshot SS3 are frozen. The contents of the nodes of all B+ trees are stored in the snapshot metadata storage object. Accordingly, the pointer in the entry for snapshot SS1 indicates OID2 as the storage object containing the B+ tree for snapshot SS1 and offset x0 as the location of the root node. Similarly, the pointer in the entry for snapshot SS2 indicates OID2 as the storage object containing the B+ tree for snapshot SS2 and offset x2 as the location of the root node.

SMDS also specifies a running point RP, which is a pointer to a root node of a B+ tree that is traversed for reads and writes that occur after the most recent snapshot was taken. Each time snapshot module 133 takes a snapshot, snapshot module 133 adds the running point to the entry of the immediately prior snapshot as the pointer to the root node of the B+ tree thereof, and creates a new running point in the manner further described below.
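The per-snapshot entries and the running point lend themselves to a compact sketch. The class and field names below are assumed for illustration only; what the sketch preserves from the description above (and from FIG. 4B) is the bookkeeping: taking a snapshot records the current running point into the entry of the immediately prior snapshot as its B+ tree root and establishes a new running point.

    # Rough, illustrative model of SMDS snapshot entries and the running point.
    class Smds:
        def __init__(self, metadata_oid="OID2"):
            self.metadata_oid = metadata_oid
            self.entries = []             # one per snapshot: {"name", "tag", "root"}
            self.running_point = None     # (storage object, offset of B+ tree root node)

        def take_snapshot(self, name, tag, new_root_offset):
            # The current running point is frozen into the entry of the immediately
            # prior snapshot; a new entry (tag only, root filled in later) and a new
            # running point node are then created, per FIG. 4B.
            if self.entries:
                self.entries[-1]["root"] = self.running_point
            self.entries.append({"name": name, "tag": tag, "root": None})
            self.running_point = (self.metadata_oid, new_root_offset)

    smds = Smds()
    smds.take_snapshot("SS1", "tag1", new_root_offset=0x0)   # RP now (OID2, x0)
    smds.take_snapshot("SS2", "tag2", new_root_offset=0x8)   # SS1's root frozen at (OID2, x0)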
FIG. 3 is a schematic diagram that illustrates additional data structures, including B+ trees, for managing virtual disk snapshots according to an embodiment. FIG. 3 depicts the logical block address (LBA) space of virtual disk 210, the snapshot data storage object (OID1), and the snapshot metadata storage object (OID2), in linear arrays beginning at offset 0. FIG. 3 also schematically illustrates B+ trees associated with each of SS1 and SS2, the first having root node 0 and the second having root node 8. A timeline is depicted along the left side of FIG. 3, and various events useful for illustrating the embodiments, such as snapshots (e.g., SS1, SS2, SS3) and writes (WR1, WR2, WR3, WR4), are depicted along this timeline. Alongside each of these events, FIG. 3 also illustrates the changes to the contents of the snapshot data storage object (OID1) and the snapshot metadata storage object (OID2), and the B+ trees.

The first event is a snapshot of virtual disk 210, SS1. In the example described herein, this snapshot is the very first snapshot of virtual disk 210, and so snapshot module 133 creates SMDS 220, which specifies the storage locations for the snapshot data storage object (OID1), the snapshot metadata storage object (OID2), and the snapshot log storage object (OID3).
Snapshot module 133 also sets the running point RP to be at node 0 (whose contents are stored at storage location=OID2, offset x0), and updates node 0 to include a single pointer to the base data region of virtual disk 210. Thus, initially, subsequent to the event SS1, snapshot module 133 directs all read IOs (regardless of the LBA range targeted by the read IO) to the base data region of virtual disk 210.

The second event is a write IO to virtual disk 210, WR1. In the example of FIG. 3, WR1 is a write IO into the virtual disk at LBA=3500 and has a size that spans 300 LBAs. According to embodiments, instead of overwriting data in the base data region of virtual disk 210, the write data of WR1 is written into the snapshot data storage object through the following steps.

First, snapshot module 133 allocates an unused region in the snapshot data storage object. The size of this allocation is based on a unit of allocation that has been configured for the snapshot storage object. The unit of allocation is 4 MB in this example, but may be changed by the snapshot administrator. For example, the snapshot administrator may set the unit of allocation to be larger (>4 MB) if the snapshot data storage object is backed by a rotating disk array or to be smaller (<4 MB) if the snapshot data storage object is backed by solid state memory such as flash memory. In addition, in order to preserve the spatial locality of the data, snapshot module 133 allocates each region in the snapshot data storage object to span a contiguous range of LBAs (hereinafter referred to as the "LBA chunk") of the virtual disk beginning at one of the alignment boundaries of the virtual disk, for example, at integer multiples of (unit of allocation)/(size of one LBA). In the example of FIG. 3, the size of one LBA is assumed to be 4 KB. Accordingly, the very first allocated region in the snapshot data storage object spans 1000 LBAs and the alignment boundary is at 3000, because WR1 is a write IO into the LBA range beginning at offset 3500.

Second, snapshot module 133 issues a write command to the snapshot data storage object to store the write data of WR1 in the allocated region at an offset equal to an offset from an alignment boundary of the LBA chunk spanned by the allocated region. In the example of FIG. 3, the allocated region spans LBA range 3000-3999, and so snapshot module 133 issues a write command to the snapshot data storage object to store the write data (having a size equal to 1.2 MB=300x4 KB) in the allocated region at an offset equal to 500 from the beginning of the allocated region. The offset from the beginning of the snapshot data storage object is also 500 (shown in the figure as y1) because the allocated region is the very first allocated region of the snapshot data storage object.
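The chunk and offset arithmetic in these two steps can be checked with a few lines. This is only a worked restatement of the example's numbers (a 4 MB unit of allocation over 4 KB LBAs, which the example treats as chunks of 1000 LBAs), and the helper names are assumptions for the sketch.

    # Worked restatement of the WR1 allocation example; helper names are illustrative.
    LBAS_PER_CHUNK = 1000   # the example treats (4 MB unit of allocation)/(4 KB LBA)
                            # as a chunk of 1000 LBAs, aligned at multiples of 1000

    def chunk_for(lba):
        """LBA range (the "LBA chunk") covered by the region allocated for this LBA."""
        start = (lba // LBAS_PER_CHUNK) * LBAS_PER_CHUNK
        return start, start + LBAS_PER_CHUNK - 1

    def offset_in_chunk(lba):
        """Offset of the write within the allocated region, in LBAs."""
        return lba - chunk_for(lba)[0]

    # WR1: a write at LBA 3500 spanning 300 LBAs.
    assert chunk_for(3500) == (3000, 3999)   # first allocated region spans LBAs 3000-3999
    assert offset_in_chunk(3500) == 500      # data lands 500 LBAs into the region (y1 = 500)
    write_size_kb = 300 * 4                  # 1200 KB, the 1.2 MB of write data in the text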
Third, snapshot module 133 updates the snapshot metadata of virtual disk 210 (in particular, the snapshot metadata storage object, OID2) by creating three additional nodes, nodes 1, 2, 3, and overwrites the contents of node 0 to convert node 0 from a leaf node (which points to data) to an index node (which points to one or more other nodes), so that node 0 includes the following information: (i) pointers to nodes 1, 2, 3, (ii) a beginning LBA for each pointer, and (iii) a private/shared flag for each pointer. More specifically, node 0 has three entries, one entry for each pointer. The first entry identifies storage location=OID2 and offset=x1 as the pointer to node 1, a beginning LBA of 0, and a P flag indicating that it points to a private node. The second entry identifies storage location=OID2 and offset=x2 as the pointer to node 2, a beginning LBA of 3500, and a P flag indicating that it points to a private node. The third entry identifies storage location=OID2 and offset=x3 as the pointer to node 3, a beginning LBA of 3800, and a P flag indicating that it points to a private node. Private nodes are those nodes whose contents may be overwritten without preserving the original contents. On the other hand, when a write IO targets an LBA and a shared node is traversed to find the data location corresponding to the targeted LBA, the contents of the shared node need to be preserved and a new node created. The handling of shared nodes is described below in conjunction with the write IO, WR4.

The B+ tree on the right side of FIG. 3 schematically illustrates the relationship of the nodes that are maintained in the snapshot metadata storage object after each event depicted in FIG. 3. The B+ tree to the right of WR1 shows that node 0 now points to nodes 1, 2, 3, and nodes 1, 2, 3 point to data regions that together span the entire LBA range spanned by the base data region of virtual disk 210. Node 1 includes a pointer to the base data region of virtual disk 210 at an offset equal to 0. Node 2 includes a pointer to the snapshot data storage object at an offset equal to y1 (=500). Node 3 includes a pointer to the base data region of virtual disk 210 at an offset equal to 3800.
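Node 0's three entries after WR1 can be written out explicitly. The tuple layout below (pointer to child, beginning LBA, private/shared flag) is only an illustrative rendering of the information listed above, not an on-disk node format.

    # Node 0 after WR1, rendered as (pointer-to-child, beginning LBA, flag);
    # "P" marks a private node. Layout is illustrative, not an on-disk format.
    node0 = [
        (("OID2", "x1"), 0,    "P"),   # node 1
        (("OID2", "x2"), 3500, "P"),   # node 2
        (("OID2", "x3"), 3800, "P"),   # node 3
    ]

    # What each child node points at after WR1: the base data region or the
    # snapshot data storage object (OID1), with offsets in LBAs.
    children = {
        ("OID2", "x1"): ("base", 0),
        ("OID2", "x2"): ("OID1", 500),   # y1 = 500, where WR1's data was written
        ("OID2", "x3"): ("base", 3800),
    }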
The third event is a write IO to virtual disk 210, WR2. In the example of FIG. 3, WR2 is a write IO into the virtual disk at LBA=3000 and has a size that spans 200 LBAs. As with WR1, instead of overwriting data in the base data region of virtual disk 210, the write data of WR2 is written into the snapshot data storage object through the following steps.

First, snapshot module 133 detects that the LBA at offset 3000 has been allocated already. Therefore, snapshot module 133 issues a write command to the snapshot data storage object to store the write data of WR2 in the allocated region at an offset equal to 0. The offset is 0 because the LBA 3000 falls on an alignment boundary. Then, snapshot module 133 creates two additional nodes, nodes 4, 5, and adds two pointers to these two nodes in node 0. More specifically, a first new entry in node 0 identifies storage location=OID2 and offset=x4 as the pointer to node 4, a beginning LBA of 0, and a P flag indicating that it points to a private node, and a second new entry in node 0 identifies storage location=OID2 and offset=x5 as the pointer to node 5, a beginning LBA of 3000, and a P flag indicating that it points to a private node. Snapshot module 133 also modifies the beginning LBA for the pointer to node 1 from 0 to 3200.

The B+ tree to the right of WR2 shows that node 0 now points to nodes 4, 5, 1, 2, 3, and nodes 4, 5, 1, 2, 3 point to data regions that together span the entire LBA range spanned by the base data region of virtual disk 210. Node 4 includes a pointer to the base data region of virtual disk 210 at an offset equal to 0. Node 5 includes a pointer to the snapshot data storage object at an offset equal to 0. Node 1 includes a pointer to the base data region of virtual disk 210 at an offset equal to 3200. Node 2 includes a pointer to the snapshot data storage object at an offset equal to y1 (=500). Node 3 includes a pointer to the base data region of virtual disk 210 at an offset equal to 3800.
`indicating that it points to a private node . The third entry
`in the snapshot data storage object because the previously
`
`

`

First, snapshot module 133 allocates a new unused region in the snapshot data storage object because the previously allocated region does not span the LBA targeted by WR3. In the example of FIG. 3, the size of the newly allocated region is again 4 MB.

Second, snapshot module 133 [...] spanned by the newly allocated region. In the example of FIG. 3, the newly allocated region spans LBA range 7000-7999, and so snapshot module 133 issues a write command [...]
