For many, VVols is ‘THE’ solution to VM awareness. Yes, we have been waiting for it for a long time now, and we are still unsure when it will actually arrive.
For those of you who want to understand why there is a need for VM awareness, there are a lot of blogs on this topic by some of the best in the industry. Stephen Foskett covered it in three parts – Part 1, Part 2, Part 3. Tintri has a great infographic explaining it on a page here and in a blog post here.
VVols brings in a new model that helps one define policies and data services at a VM level, getting more granular than the current model used by traditional storage devices, which operates at a Volume/LUN level, while at the same time mitigating the IO Blender effect to an extent.
In my opinion, adding VVols to vSphere is a great step by VMware but it is definitely only a small part of the solution. In fact, I think it is just an enabler and there is a lot that is needed at the underlying storage level to make it an ideal VM aware storage. Let’s dig more into this.
A VM aware storage should ideally bring in an object-based storage model where the VM is the object. It should not involve any Pool, RAID Group, or Volume/LUN creation and management. All that the IT team should care about is creating and managing VMs, automating provisioning as per policies, defining service levels and enabling the business.
On the contrary, traditional storage concentrates on LUN/Volume/RAID constructs and doesn’t treat the VM as an object. VVols won’t change that from a storage architecture perspective; doing so would need a lot of work on traditional architectures. The fact that storage vendors already have deployments where the data is not stored in VMs forces them to use the same architecture, one that doesn’t distinguish between I/O going to a file/LUN vs. a VM. The end result is general-purpose storage being used for virtualized workloads, and VVols is not going to change that.
The problem of management scalability…..
If we go by the reports on how VVols will be implemented for SAN, every VM would require at least 3 LUNs to be created (could be 5-10 per deployment, excluding Snapshots etc.). That means 1,000 VMs would need 3,000 LUNs and 10,000 VMs would need 30,000 LUNs, so we are going in the reverse direction as far as VM consolidation ratios per LUN are concerned. The only way around this is running vSphere over NFS. Even then, the credible NFS implementations still have no awareness beyond volumes, to the extent of requiring at least ‘x’ number of volumes to perform to the potential of a storage controller. Even if they enable data services (snapshots/clones etc.) at a VM level, the I/O would have no VM awareness; VMs would be treated just as files.
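The scaling math above is simple multiplication, but it is worth making explicit. A minimal sketch (the function name and the per-VM LUN breakdown are my own illustration, not from any VVols spec):

```python
def vvol_lun_count(num_vms, luns_per_vm=3, extra_luns_per_vm=0):
    """Total LUNs an array must manage for a given VM count.

    luns_per_vm: base VVols created per VM (3 is the minimum cited above).
    extra_luns_per_vm: additional VVols per VM for snapshots etc.,
    which is how the 5-10 per-VM figure can be reached.
    """
    return num_vms * (luns_per_vm + extra_luns_per_vm)

if __name__ == "__main__":
    print(vvol_lun_count(1000))    # 3000 LUNs for 1,000 VMs
    print(vvol_lun_count(10000))   # 30000 LUNs for 10,000 VMs
```

The linear growth is the point: the array's management plane now has to track tens of thousands of objects where it previously tracked dozens of volumes.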
VM visibility at the storage level…..
Not having VM visibility at the storage level is one of the biggest reasons why every virtualization performance problem ends up pointing towards the Storage Guy. The challenge here is that the Storage team has no idea of what is going on at the VM level. The result is an investment in expensive monitoring software that needs a lot of effort to pinpoint the problem. VVols is not going to make this simpler. Although with VVols every VM will map to a group of LUNs or files (in the case of NFS), the storage products have no visibility whatsoever into the overall IO path (Host, Network, Storage) and what is causing the latency (Bottleneck Visualization). I think a VM aware storage should ideally be able to help with the first level of troubleshooting, including (but not limited to) which layer is slowing down the IO to a particular VM – Host, Network or Storage – or which VM is contending with which VM.
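The first-level troubleshooting described above boils down to attributing a VM's end-to-end latency to the layer contributing the most of it. A minimal sketch, assuming the storage could somehow collect per-layer latency measurements for each VM (the function and the sample numbers are hypothetical):

```python
def slowest_layer(per_layer_latency_ms):
    """Given measured latency contributions along one VM's IO path,
    return the layer (host, network or storage) adding the most delay.

    per_layer_latency_ms is a hypothetical measurement dict, e.g.
    {"host": 0.4, "network": 0.2, "storage": 5.1}.
    """
    return max(per_layer_latency_ms, key=per_layer_latency_ms.get)

# Example: 5.1 ms of a 5.7 ms round trip sits in the array itself.
culprit = slowest_layer({"host": 0.4, "network": 0.2, "storage": 5.1})
```

The hard part, of course, is not this comparison but instrumenting all three layers per VM in the first place, which is exactly the visibility VVols alone doesn't provide.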
The Headroom Challenge….
How many times have you seen a performance problem resulting from the fact that you didn’t know how much headroom the system had? VVols are in no way going to enable customers to understand headroom. A VM Aware Storage product should ideally show how much each VM is consuming out of the overall resources, thereby enabling customers to understand the headroom. It is up to the storage vendor to implement something like that and then also expose it higher up in the stack.
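As a toy illustration of the idea, headroom is just capacity minus the sum of per-VM consumption, for whichever resource you measure (IOPS here; the function and figures are my own example, not any vendor's API):

```python
def iops_headroom(per_vm_iops, controller_capacity_iops):
    """Remaining IOPS capacity after accounting for every VM's share,
    clamped at zero when the controller is oversubscribed."""
    consumed = sum(per_vm_iops.values())
    return max(controller_capacity_iops - consumed, 0)

# A controller rated at 10,000 IOPS with two known VM consumers:
remaining = iops_headroom({"vm-sql": 3000, "vm-exchange": 2500}, 10000)
```

The formula is trivial; the value is in the per-VM breakdown, which only a storage product that attributes IO to VMs can supply.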
Who sent that IO?
From a Performance/IO standpoint, a VM Aware storage should ideally understand which IO originates from which VM and how it is doing on the path to and from the storage, with every VM/vdisk getting its own queue to the various resources (Network, NVRAM, Memory/Cache, CPU, Flash/Disk). It is the second part (independent queuing) which is difficult to implement or retrofit in a traditional storage. Giving every VM its own queue to the resources brings in fair share and QoS. It means that a VM doing big/bursty IOs is not holding up a VM that does small IOs but is sensitive to latency. It allows storage products to implement better limit-based and guaranteed-performance QoS while making it easier to troubleshoot performance problems.
VVols implementation would require the storage to implement Protocol Endpoints, which will make the storage aware of which IO goes to which LUN/File, but the queuing and VM/Network awareness would require a big change at the storage level.
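The fair-share idea from the previous paragraphs can be sketched as per-VM queues drained round-robin, so a bursty VM cannot starve a latency-sensitive one. This is a deliberately simplified model (real arrays weight shares per resource; the class and names here are purely illustrative):

```python
from collections import deque

class FairShareScheduler:
    """One queue per VM; dispatch cycles round-robin across VMs so each
    gets a turn regardless of how deep any single VM's queue is."""

    def __init__(self):
        self.queues = {}        # vm name -> deque of pending IOs
        self.order = deque()    # round-robin order of VMs

    def submit(self, vm, io):
        if vm not in self.queues:
            self.queues[vm] = deque()
            self.order.append(vm)
        self.queues[vm].append(io)

    def dispatch(self):
        """Pop the next IO in fair-share order, or None if all idle."""
        for _ in range(len(self.order)):
            vm = self.order[0]
            self.order.rotate(-1)   # advance the round-robin pointer
            if self.queues[vm]:
                return vm, self.queues[vm].popleft()
        return None
```

With a bursty VM holding three queued IOs and a small VM holding one, dispatch order interleaves them (bursty, small, bursty, bursty) instead of serving the burst first, which is the whole point of independent queuing.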
Working Set Size…
The other important functionality that VM awareness really drives is calculation of the working set at a VM level and allocation of Flash based on that. There may be VMs that need 5GB of Flash to achieve <1ms latency, and there could be another VM that needs 40GB of Flash to achieve the same latency. Calculating that with accuracy needs an understanding of IO at a VM level. There is external software like Cloud Physics that enables that, but having it built into the storage gives a big advantage in maximizing the use of Flash. A lot of storage companies are using Flash to stretch the life of traditional storage architectures, but they use either reactive tiering or traditional caching algorithms (based on recency or FIFO) with no awareness as far as VMs are concerned. The result is that when using Flash with traditional caching algorithms, only around 30-50% of the IOs hit the Flash media on average. With reactive tiering, the percentage is even lower. Some vendors just throw an all-flash solution at the problem. It works, but it doesn’t have the same economics (without risking running out of capacity) as a Flash-based solution that has HDDs built into it with true VM awareness driving 99% of the IO from Flash.
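At its simplest, a per-VM working set over an observation window is the count of distinct blocks the VM touched, times the block size. A minimal sketch of that idea (my own illustration; a real implementation would use a cardinality sketch such as HyperLogLog rather than an exact set, and would track the window continuously):

```python
def working_set_mb(accessed_block_ids, block_size_kb=4):
    """Approximate one VM's working set for an observation window.

    accessed_block_ids: iterable of block addresses the VM read/wrote
    during the window; duplicates represent re-accesses and are
    deliberately collapsed, since re-hits don't grow the working set.
    """
    distinct_blocks = len(set(accessed_block_ids))
    return distinct_blocks * block_size_kb / 1024
```

A VM that hammers the same 256 blocks all day has a 1 MB working set no matter how many IOs it issues; sizing its Flash allocation by IO count instead of distinct blocks is exactly the mistake recency/FIFO caching makes.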
The reality is that virtualized datacenters still have a lot of data that doesn’t need Flash. At this stage, when Flash is still 10x the cost of HDD, it is important that the storage solution uses Flash capacity intelligently. Even going all-flash for certain workloads doesn’t discount the need for a VM Aware Storage, since raw performance is only a part of the VM awareness challenge.
I think the real benefit that majority of the vendors may be able to deliver with VVols is the ability to enable Data Services at a VM level. Since every VM will mean multiple LUNs/Files, the storage vendors will not have to implement anything different except the ability to Snap, Clone, Replicate multiple LUNs/Files representing the same VM at the same time. This will all depend on which vendor you are talking to.
I think VVols is a step in the right direction by VMware, and it validates the approach taken by Tintri from day one. It will finally enable some VM-granular operations that traditional storage customers have been missing, provided the traditional storage vendors bring the object-level functionality to the storage level.
As we have experienced with VAAI, every vendor will have its own implementation of VVols. Some will implement it just as a tick box, others as something that brings real benefit. Even in the latter case, it’ll be hard for a traditional storage product (All Flash or Hybrid) to deliver a true VM aware storage experience. A traditional or multipurpose storage can’t bring the focus that a VM aware storage needs.
How about my other Hypervisors?….
While VVols may bring some sort of awareness at the VM level for vSphere workloads, it won’t enable the same in a multi-hypervisor setting. Ideally you would want the storage to be VM aware regardless of the hypervisor being used (hypervisor neutrality) – in fact, to use the same repository for multiple hypervisors, with the storage totally aware of which VM on which hypervisor is generating which IO.
Why limit it to that? I would ideally want the storage to easily transition VM types from one hypervisor to another while sitting in the same repository, or even replicate from one datacenter to another, changing the hypervisor type in transit, to an external cloud in a hybrid cloud environment.
In the end I would say: just as Flash needs storage designed from the ground up to take advantage of the power of Flash, a VM Aware Storage needs a ground-up design as well. It needs that design to deliver not only the functionality and data services at a VM-granular level, but also the infrastructure insight required for IO awareness, per-VM resource allocation, sub-millisecond latencies (even with spinning disks), true QoS with fair share, and auto-alignment of workloads across hypervisors, all without requiring the IT team to learn storage, in a world where they manage only the VMs, Applications/Workloads and Service levels, not the Storage.