A few weeks back I met a relatively new NetApp customer to discuss best practices around vSphere and NetApp. While we were going through the details, he brought up a point about VSC Backup & Recovery (SMVI): when he restored a complete datastore from a NetApp Snapshot backup, it was much faster than restoring a single VM from the same Snapshot. I explained that a datastore restore uses SnapRestore, which simply reverts the pointers of the volume to a previous point in time and is therefore near-instantaneous, whereas a VM restore uses Single File SnapRestore (SFSR), which copies the files back from the Snapshot copy to the Active File System.
So the time taken to restore a single VM depends on the size of the VM. I also shared with him a great workaround for instant VM restores: mount the backup through VSC, add the VM in the mounted backup to the inventory, power it on, and use Storage vMotion to move it wherever you want. My colleague Keith Aasen (also a fellow Canadian) has documented the process here: https://communities.netapp.com/docs/DOC-10862
While the above process is great for instant restores, wouldn’t it be nice if the SFSR process itself was faster?
Clustered ONTAP did not have SFSR capability until Data ONTAP 8.1, so we had a chance to rewrite the SFSR code. And guess what we did? We used our file/sparse-file-level cloning functionality (which we also use in RCU and VAAI offloads) as the backend for SFSR. What this means is that with SFSR in Clustered ONTAP, we create clones of files within the Snapshot for quick file-level restores that are significantly faster than SFSR in 7-Mode. These SFSR operations have a great use case not only in VMware environments, where VMDKs are large, but also in a lot of other cases (think VHDs, DB files, etc.). The change is transparent to the user or application using SFSR, so there is no need to modify anything in the application or in the way one performs an SFSR.
I took a few screenshots from a Clustered ONTAP environment to demonstrate this. The environment was built on the Clustered ONTAP simulator with vSphere 5.1 and VSC 4.1. In it I had a VM with two partitions, one 4GB and the other approx. 30GB in size. The 30GB partition was backed by an eager-zeroed thick VMDK with approx. 28GB of data in it.
For demonstration purposes, I browsed the datastore and deleted both VMDKs to simulate a corruption.
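To make the difference between the restore paths concrete, here is a minimal toy model in Python. It is only a sketch and not how Data ONTAP is actually implemented: the ToyVolume class, its method names, and the idea of representing files as lists of block IDs are all assumptions made for illustration. It counts how many data blocks each kind of restore has to physically rewrite.

```python
from collections import Counter


class ToyVolume:
    """Hypothetical toy model: files and snapshots are just lists of block IDs."""

    def __init__(self):
        self.active = {}            # file name -> list of block IDs
        self.snap = {}              # frozen file map from the last snapshot
        self.refcount = Counter()   # how many file maps point at each block
        self.next_block = 0         # next free block ID
        self.blocks_copied = 0      # data blocks physically rewritten

    def write_file(self, name, n_blocks):
        blocks = list(range(self.next_block, self.next_block + n_blocks))
        self.next_block += n_blocks
        self.active[name] = blocks
        self.refcount.update(blocks)

    def take_snapshot(self):
        # A snapshot records pointers to existing blocks, not the data itself.
        self.snap = {n: tuple(b) for n, b in self.active.items()}
        for blocks in self.snap.values():
            self.refcount.update(blocks)

    def delete_file(self, name):
        for block in self.active.pop(name):
            self.refcount[block] -= 1

    def volume_revert(self):
        # Volume-level restore: point the whole active map back at the snapshot.
        for blocks in self.active.values():
            self.refcount.subtract(blocks)
        self.active = {n: list(b) for n, b in self.snap.items()}
        for blocks in self.active.values():
            self.refcount.update(blocks)

    def copy_restore(self, name):
        # Copy-based single-file restore: every block is read and rewritten.
        new_blocks = []
        for _ in self.snap[name]:
            new_blocks.append(self.next_block)
            self.next_block += 1
            self.blocks_copied += 1
        self.active[name] = new_blocks
        self.refcount.update(new_blocks)

    def clone_restore(self, name):
        # Clone-based single-file restore: share the snapshot's blocks.
        blocks = self.snap[name]
        self.active[name] = list(blocks)
        self.refcount.update(blocks)


def demo(restore_method, n_blocks=1000):
    """Delete a file after a snapshot, restore it, return blocks rewritten."""
    vol = ToyVolume()
    vol.write_file("vm.vmdk", n_blocks)
    vol.take_snapshot()
    vol.delete_file("vm.vmdk")
    getattr(vol, restore_method)("vm.vmdk")
    return vol.blocks_copied


print("blocks rewritten by copy restore: ", demo("copy_restore"))
print("blocks rewritten by clone restore:", demo("clone_restore"))
```

In this model the copy-based restore does work proportional to the file size, while the clone-based restore and the volume revert only update pointers and reference counts, which is the intuition behind restore time no longer tracking VM size.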
As you can see in the screenshot above, it took around 70 seconds to restore a 34GB VM with approx. 31.4GB data in it. And this is on a SIMULATOR!
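As a quick sanity check on those numbers, the arithmetic below works out the effective rate the restore would imply if it were a plain data copy. The 31.4GB and 70 second figures come from the test above; the 100 MB/s copy rate is purely an assumed number for comparison, not something I measured.

```python
# Figures from the restore test described above.
data_gb = 31.4        # approx. data in the restored VM
elapsed_s = 70        # approx. restore time in seconds

# Effective rate if the restore had been a byte-for-byte copy (1 GB = 1000 MB).
effective_mb_s = data_gb * 1000 / elapsed_s

# Hypothetical comparison: how long a real copy would take at an ASSUMED rate.
assumed_copy_mb_s = 100
copy_seconds = data_gb * 1000 / assumed_copy_mb_s

print(f"effective rate: {effective_mb_s:.0f} MB/s")
print(f"copy at {assumed_copy_mb_s} MB/s would take about {copy_seconds:.0f} s")
```

That works out to roughly 449 MB/s, and a copy at the assumed 100 MB/s would need about 314 seconds, which is consistent with the restore being a clone operation rather than a copy.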
As part of my testing, I also tried restoring a smaller VM, and it took about the same time. So it’s quite possible that the restore would take around the same time for most VMs regardless of size. The time may increase by a few seconds if a VMware Snapshot was involved, since that adds an extra step of reverting the VMware Snapshot as part of the recovery.
This change in SFSR capability allows customers to use either the restore button built into VSC or the instant restore method described above with almost the same results.
For me this is a welcome change, and it also indicates continuous innovation from NetApp. A lot has changed in Data ONTAP since its inception in 1992, and it’ll continue to change, bringing in more innovation. What will not change is the Data ONTAP DNA and NetApp’s strategy to put Data Center Efficiency (including Operational Efficiency) first. I would like to caution people who think that NetApp has stopped innovating just because it still uses Data ONTAP and still uses FAS controllers. I would choose innovation over changing product names any day.
Which would you choose? A company that innovates or a company that just changes the marketing collateral?
Satinder Sharma (@storarch)