Improving Storage QoS Accuracy and Performance-Based Chargeback/Showback

A few weeks back, I wrote a series of blog posts (Part 1, Part 2, Part 3) on how Tintri simplifies chargeback/showback for service providers (SPs). With the release of manual quality of service (QoS) per virtual machine (VM) and the introduction of normalized IOPS, Tintri has made that value proposition for SPs even better.

Tintri Storage QoS

As we all know, Tintri is the only storage platform with an always-on, dynamic QoS service that enforces QoS at the individual vDisk level. With this new functionality, Tintri customers can also configure QoS manually at the VM level.

QoS on Tintri systems is implemented using normalized IOPS (more on this below), and customers can configure minimum and/or maximum settings for individual VMs. The minimum setting guarantees performance when the system is under contention (i.e., when the Performance Reserves bar is above 100%), and the maximum setting enforces an upper limit on a VM's performance. The latency visualization has been enhanced as well, with support for contention-latency and throttle-latency views, so you can always see whether QoS itself is adding latency and ensure it doesn't become a liability.
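To make the min/max semantics concrete, here is a minimal Python sketch of the mental model (my illustration with made-up numbers, not Tintri's actual scheduler): the max caps a VM even when headroom exists, while the min is what the system protects for the VM under contention.

```python
def effective_iops(demand, available, qos_min=None, qos_max=None):
    """Toy model of per-VM min/max QoS in normalized IOPS."""
    granted = min(demand, available)          # can't exceed demand or headroom
    if qos_max is not None:
        granted = min(granted, qos_max)       # max limit: throttle latency
    if qos_min is not None:
        granted = max(granted, min(qos_min, demand))  # min guarantee under contention
    return granted

print(effective_iops(demand=5000, available=2000, qos_min=3000))  # 3000
print(effective_iops(demand=5000, available=8000, qos_max=4000))  # 4000
```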


If you want to read more about QoS, head over to the blog post here. There is also a great video posted here.

Normalized IOPS

Normalized IOPS are measured at an 8K granularity by a reporting mechanism that translates raw IOPS into 8K-equivalent IOPS. This creates a single scale on which to measure the performance of different VMs and applications. So, in addition to reporting the standard IOPS per VM/vDisk, the VMstore also reports normalized IOPS for each VM. How does the Tintri VMstore report normalized IOPS? Here are a few examples –

If an application/VM is doing 1000 IOPS @ 8K block size, the VMstore reports it as 1000 normalized IOPS. Similarly, an application doing 1000 IOPS @ 16K block size is reported as 2000 normalized IOPS. Taking a few more examples –

1000 IOPS @ 12K is also reported as 2000 normalized IOPS (the block size is rounded up to the nearest 8K multiple)

and 1000 IOPS @ 32K is reported as 4000 normalized IOPS
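In code, the translation is simple. Here is a minimal Python sketch of the arithmetic as described above (my reading of the rounding behavior, not Tintri's actual reporting code):

```python
import math

BASE_KB = 8  # normalization granularity: 8K

def normalized_iops(iops, block_kb):
    """Translate raw IOPS at a given block size into 8K-normalized IOPS.

    The block size is rounded up to the nearest 8K multiple, so a 12K IO
    counts the same as a 16K IO (i.e., as two 8K IOs).
    """
    return iops * math.ceil(block_kb / BASE_KB)

assert normalized_iops(1000, 8) == 1000
assert normalized_iops(1000, 16) == 2000
assert normalized_iops(1000, 12) == 2000  # 12K rounds up to 16K
assert normalized_iops(1000, 32) == 4000
```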

Why use normalized IOPS?

  • As we all know, different applications use different block sizes. Normalized IOPS reveal the real load generated by each application and allow an apples-to-apples comparison between applications.
  • They also make QoS predictable. When QoS is set using normalized IOPS, we know exactly what the result will be, instead of getting a result skewed by the application's block size.
  • They give SPs a single parameter on which to implement performance-based chargeback/showback. Instead of juggling IOPS, block size, and throughput, and then resorting to manual, inconsistent reporting, SPs get the measurement out of the box.

Let’s use an example to see how SPs can take advantage of the new functionality.

[Screenshot: per-VM IOPS, normalized IOPS, and Reserve% for three VMs]

In the screenshot above, we have three VMs, and we can see both the IOPS and the normalized IOPS for each. Looking at raw IOPS alone, we would conclude that the VM SatSha_tingle puts the highest load on the system, about 2.7x that of SatSha_tingle-02. But the normalized IOPS tell the real story: SatSha_tingle-02 is actually generating almost 1.5x the load of SatSha_tingle. This is also reflected in the reserves the system has allocated to each VM under Reserve%.

In an SP environment without normalized IOPS, the SP would either end up undercharging for SatSha_tingle-02 or would have to look at block sizes and do manual calculations to understand the real cost of running the VM. With normalized IOPS, the SP can standardize on one parameter for performance-based charging and make its chargeback/showback more accurate and more predictable.
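As a toy illustration, here is what a normalized-IOPS rate card could look like in Python. The block sizes and the dollar rate are hypothetical, chosen only so the ratios match the screenshot (2.7x apart on raw IOPS, roughly 1.5x apart the other way on normalized IOPS):

```python
import math

def normalized_iops(iops, block_kb, base_kb=8):
    return iops * math.ceil(block_kb / base_kb)

RATE = 0.002  # $ per average normalized IOP per month (made-up rate)

vms = {
    "SatSha_tingle":    {"iops": 2700, "block_kb": 8},   # hypothetical block sizes
    "SatSha_tingle-02": {"iops": 1000, "block_kb": 32},
}

for name, vm in vms.items():
    n = normalized_iops(vm["iops"], vm["block_kb"])
    print(f"{name}: {vm['iops']} raw IOPS -> {n} normalized IOPS, "
          f"${n * RATE:.2f}/month")

# Billing on raw IOPS would charge SatSha_tingle 2.7x more, even though
# SatSha_tingle-02 actually puts ~1.5x the load on the array.
```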

Since normalized IOPS are also used for setting up QoS, SPs can now guarantee predictable performance to their customers by implementing min/max normalized IOPS-based QoS. With normalized IOPS, SPs have four different ways to chargeback/showback on Tintri systems: provisioned space, used space, reserves, and min/max normalized IOPS. Each of these brings more accuracy and predictability to an SP's chargeback/showback model, which directly affects profitability.

Cheers,

@storarch

Why Are Modern Storage QoS Implementations Not Good Enough?

Storage QoS is becoming a key capability today, driven mainly by the push for higher resource utilization and, therefore, resource sharing. In the past, a storage system would dedicate a RAID group to each LUN, and each application would have its own LUNs. This worked really well in terms of guaranteed IOPS and isolation. As disk sizes increased and storage technology matured, we started sharing drives amongst LUNs, and those LUNs were no longer isolated from each other. Virtualization made the situation even worse: not only were the disks shared amongst LUNs, the LUNs themselves were shared by multiple workloads (VMs). The result is the noisy-neighbor problem that plagues LUN- and volume-based storage systems.

A few storage vendors have some form of manual QoS functionality built into the storage OS. Tintri, for example, built an architecture from the beginning that enables always-on, fully automatic, dynamic QoS at the individual vDisk level (think VMDKs, VHDs, etc.). It automatically reserves storage resources per vDisk, based on our built-in IO analytics engine, so that every vDisk gets the performance it needs at sub-ms response times. The architecture is designed such that a new vDisk gets its performance from the free reserves available in the system, so at no point does a vDisk that needs more performance impact an existing vDisk. This approach differs from the traditional one, where QoS is manually configured at the LUN/volume level, and it is highly effective for IT organizations that don't want to hand-hold their storage. Tintri is the only storage product out there with always-on, dynamic QoS enabled in all of its storage appliances.
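A minimal sketch of that reservation idea (a mental model only; Tintri's actual analytics-driven allocation is far more sophisticated): new demand is satisfied from the free pool, and an existing vDisk's reservation is never cannibalized.

```python
class ReservePool:
    """Toy model: performance reserves carved from a free pool."""

    def __init__(self, total_pct=100.0):
        self.free = total_pct   # free performance reserves, in percent
        self.reserved = {}      # vDisk name -> reserved percent

    def reserve(self, vdisk, pct):
        granted = min(pct, self.free)  # draw only from the free reserves
        self.free -= granted
        self.reserved[vdisk] = self.reserved.get(vdisk, 0.0) + granted
        return granted

pool = ReservePool()
pool.reserve("vm1.vmdk", 60)
print(pool.reserve("vm2.vmdk", 60))  # 40.0: vm1's reservation is untouched
```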


Having said that, manually configured QoS does have its place in the service provider (SP) space, as well as in some private cloud implementations where the provider (public or private) doesn't want to give everyone unlimited performance. These SPs want to be able to sell, say, a Platinum service tier to their customers, and to change a customer's tier dynamically, on the fly, without even moving the workload. So, coming back to my original point about the QoS implemented by storage vendors today, here are the reasons why I say they are not good enough –

The Granularity Challenge

In today's datacenters, workloads are virtual, and clouds are not implemented without virtualization. In these virtualized datacenters, dealing with LUNs and volumes is a pain. LUNs were introduced 30-40 years back, when workloads were physical, and we kept using LUNs and volumes with virtualized workloads because that's what storage systems knew. In a virtual environment, a LUN has multiple workloads running in it, so implementing QoS on a LUN offers no benefit to the individual virtual workloads, whether it is implemented for isolation or for chargeback. VVols will change this (only for vSphere), but there is still a long way to go: VVols don't yet support all vSphere features, and not all vendors have a practically deployable implementation.

The result is that the VMs in a LUN/volume end up sharing the IOPS limits set at the LUN/volume level and therefore end up interfering with each other.

The IOPS Dilemma

Storage QoS is implemented in terms of IOPS. One can combine that with MB/s, but only a few vendors allow it; usually, it is just one or the other.

Now here is the problem: IOPS can mean very different things depending on the block size. If I limit a LUN/volume to 1000 IOPS, here is what that could mean –

4K block size means 4 MB/s

8K block size means 8 MB/s

64K block size means 64 MB/s

The same 1000 IOPS can mean 16x more load on the system at a 64K block size vs. 4K. That is a lot of difference for a service provider to account for when pricing a service. (In some cases, a large number of small-block IOs may even stress the storage more than large-block IOs.) Some vendors can combine the IOPS limit with a throughput limit to get around this to some extent, but ideally service providers want one unit to bill against and a single scale on which to measure everyone. Microsoft's implementation of normalized IOPS is a great example of such a metric.
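The arithmetic behind that 16x figure, as a quick sketch (using 1 MB = 1024 KB, and the same 8K normalization described earlier):

```python
import math

def throughput_mb_s(iops, block_kb):
    return iops * block_kb / 1024.0

def normalized_iops(iops, block_kb, base_kb=8):
    return iops * math.ceil(block_kb / base_kb)

for block_kb in (4, 8, 64):
    mb = throughput_mb_s(1000, block_kb)
    n = normalized_iops(1000, block_kb)
    print(f"1000 IOPS @ {block_kb}K = {mb:.0f} MB/s = {n} normalized IOPS")

# A flat "1000 IOPS" limit hides a 16x spread in throughput (4 vs 64 MB/s);
# the normalized scale exposes the heavier load directly (1000 vs 8000).
```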

The Throttling Effect

Storage systems that implement QoS on LUNs have a particular problem when a host has more than one LUN from the same array mapped through an HBA. When one of those LUNs tries to go above its QoS limit, the array throttles it, and its IOs queue up at the HBA. At that point, the host starts throttling its IO, and it does so not just for the LUN in question but for all the LUNs coming from that array, thinking that the storage system cannot take the load being sent to it. This makes it practically impossible to implement QoS on an individual LUN without impacting the other LUNs.
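A minimal sketch of why that happens (hypothetical queue depth, not modeled on any particular HBA driver): once a throttled LUN's slow IOs fill the shared HBA queue, the host has no slots left for any other LUN behind that HBA.

```python
HBA_QUEUE_DEPTH = 32  # hypothetical shared per-HBA queue depth

def can_issue_io(inflight_per_lun):
    """Which LUNs can the host still send IO to right now?"""
    slots_left = HBA_QUEUE_DEPTH - sum(inflight_per_lun.values())
    return {lun: slots_left > 0 for lun in inflight_per_lun}

# LUN A is being throttled by the array, so 32 of its IOs are stuck in
# flight; LUN B has no backlog of its own, yet it is blocked as well.
print(can_issue_io({"lun_a_throttled": 32, "lun_b": 0}))
# -> {'lun_a_throttled': False, 'lun_b': False}
```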

[Diagram: QoS throttling and fair-share]

The Visibility and Analytics Challenge

Most storage vendors treat QoS as little more than a checkbox, with very few real-world deployments. The reason is that QoS is genuinely complex to implement, and there are more ways to get it wrong than right. QoS has to be implemented as a strategy, across all workloads. The challenge is that once someone gets it wrong, it is not easy to fix, and determining the cause usually requires involving the vendor's support team. Some vendors sell professional services around this, which makes QoS a really expensive feature to implement.

The other point is that QoS itself can become a source of latency, either because of the max limits set on a workload, or because the cumulative minimum guaranteed IOPS configured across workloads exceed the overall performance capability of the storage system. Ideally, storage systems should give more insight into QoS and its impact on each workload, so that if someone complains of latency or a drop in performance, the IT team can quickly pinpoint the reason. No storage vendor provides advanced, user-friendly analytics for QoS today, and that is one of the biggest inhibitors to real-world adoption of QoS.

To summarize: the QoS offered by storage vendors today is not granular enough; it lacks a single scale on which to measure and apply guarantees and limits; it doesn't ensure performance fair-share; it doesn't guarantee isolation; and it doesn't come with the analytics needed to implement it easily and then troubleshoot QoS-related issues. I think it's time to address these challenges so that QoS can be widely accepted and implemented in the datacenter.

Cheers..

@storarch

The industry is validating Tintri – Another one comes through

The last few weeks have been great in terms of industry recognition of how Tintri has been approaching the storage problem for virtualized workloads.

First, VVols went GA, validating the approach Tintri took seven years back with VMstore: remove the boundaries imposed by LUNs and volumes in virtualized environments, and deliver a VM-centric storage platform, which we did around four years back. The result is four years of product maturity (and a four-year lead) based on real-world deployments.

Now we have Pure Storage announcing an integration with VMTurbo that allows customers to use VMTurbo in combination with Pure Storage to automate the movement of VMs from one LUN to another based on various conditions, including performance and latency.

What does this tell us?

Continue reading

FY16, VVols and Tintri's Financial Differentiation

vSphere 6.0 is GA, and at PEX this week Tintri announced support for vSphere 6.0, VVols, and VMware Integrated OpenStack, along with a plugin for vRealize Operations (vROps). We also finished our fiscal year in January and will be off to our sales kickoff next week. FY15 was a great year, with tremendous growth and record quarters for Tintri. Tintri continues to lead the way with a product designed from the ground up for both flash and virtualization. With vSphere 6.0, we will bring all the goodness that customers love about VMstore to VVols, including some key differentiators that separate us from the pack –

  • 99-100% IO from flash
  • VM granular operations
  • VM granular visibility and latency visualization
  • Per VM/VVol analytics
  • Automatic per-VM partitioning of storage resources (Performance Reserves) based on the analytics
  • QoS and performance fair-share at a VM level (there are a lot more exciting things coming in this space. Stay tuned for an update on this….)
  • Latency visualization across the infrastructure (host, network, storage). We are going to add more to this in the coming weeks… stay tuned
  • 1M VVols per VMstore
    • With the VVols implementation, a VM may need as few as 5 VVols or as many as hundreds (with snapshots, clones, etc.), so a 1,000-VM install would require tens of thousands to hundreds of thousands of VVols.
  • VM granular automation
  • Ability to manage, monitor, and analyze up to 112,000 VMs from a single pane of glass

The VVol race that storage vendors are starting now was won by Tintri four years back. If you will be evaluating VVols in the coming weeks or months, you should definitely read this blog here to understand what to ask of your storage vendor when it comes to VVols.

Tintri has focused from the beginning on using software to drive innovation, and one of the key differentiators of the technology is its ability to deliver 99-100% IO from flash. That capability is driven by our software, unlike all-flash vendors that use brute force to deliver performance. The advantage is that Tintri can address a broad spectrum of workloads at a very low cost.

[Chart: Workload Breakup]

What this means is that unlike an all-flash solution, where $/IOP is low and $/GB is high, or a hybrid solution, where $/IOP is high and $/GB is low, Tintri brings both $/GB and $/IOP down to low levels without over-depending on space savings (dedup and compression), therefore delivering a better $/workload at a very high density.

[Chart: $/GB]

Our focus on virtualization continues to help us differentiate and to bring new virtualization- and cloud-centric functionality to market faster. The result is a platform that is 5-10x cheaper on CAPEX, 60x cheaper on OPEX, and highly scalable.

Cheers..

@storarch

Is VVol the Solution to VM Awareness?

VVols is ‘THE’ solution to VM awareness for many.

Yes, we have been waiting for it for a long time now and we are still unsure about its whereabouts.

For those of you who want to understand why there is a need for VM awareness, there are a lot of blog posts on the topic by some of the best in the industry. Stephen Foskett covered it in three parts – Part 1, Part 2, Part 3. Tintri has a great infographic explaining it on a page here and in a blog post here.

VVols brings in a new model that basically lets one define policies and data services at the VM level, getting more granular than the volume/LUN model used by traditional storage devices, while at the same time preventing the IO blender situation to an extent.

In my opinion, adding VVols to vSphere is a great step by VMware, but it is definitely only a small part of the solution. In fact, I think it is just an enabler: a lot more is needed at the underlying storage level to make an ideal VM-aware storage system. Let's dig more into this.

Continue reading

My first step into the Blogger’s World

Here I am, finally taking a step into the blogging world. I have resisted the itch, but I thought it's time now to begin this journey. I have been fairly active on Twitter with micro-blogs, sharing my thoughts and links to other blogs that I like. The decision to blog was driven mainly by the fact that I sometimes found Twitter's 140-ish character limit too short to share my thoughts, and I sometimes felt there was a lack of good blogs covering various topics. So I thought, rather than complaining, why not start one of my own?

You can expect topics covering Technology, Storage, Virtualization, Backup etc.

As far as the name of the blog goes, I wanted something that relates to what I will cover here. Being late to the party, it was tough to grab a name that was my first or second choice, but Virtual Data Blocks is not bad either. I have started to like it more since the first time I put it in.

So there it is. I hope you like it and follow along. I'll try my best to post regularly.

See you soon..

Cheers

-Satinder (@storarch)