The Dilemma of Evaluating Storage for VMware

As a Consulting Systems Engineer focused on Virtualization, I get to meet a lot of customers and prospects who are evaluating storage solutions for Virtualization, Cloud, Business Critical Applications etc. With so many options available in the storage industry, often with confusingly similar messages, this is how a Technology Evaluator frequently ends up looking at things.

Is this how you see it too? Believe it or not, this is the situation that some vendors work towards!

As we know, the devil is in the details. When Technical Evaluators find themselves in this situation, there are specific questions they can ask each vendor to get a clearer picture. I have put together a list below, in no particular order.

Unified Storage

The most abused term today, even more than Cloud. Everyone has a product that they call ’Unified’ even when there is nothing unified about it. It’s like naming my dog Lion and saying I have a lion as a pet. Questions to ask each vendor –

  1. Does your storage run a single OS?
  2. Are all the features common across SAN & NAS?
  3. Do I get 5 9s availability across all protocols?
  4. Can I create processes in the datacenter that are common across all protocols?
  5. If I move from a low-end to a mid-range system, or from mid-range to high-end, do I have to change my processes and retrain my people?
  6. Does the storage offer Scale Up as well as Scale Out without adding operational burden?
  7. Since the storage is the backbone of the datacenter, can I prevent both planned and unplanned downtime?
  8. Does your storage allow creating secure logical containers, similar to VMs on the compute side, for cases where my customer or a department asks for separate storage?
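
To put question 3 in perspective, it helps to translate "nines" into an actual downtime budget. The snippet below is only a back-of-the-envelope sketch: the function name is mine, and it assumes a 365-day year and that availability is measured over the whole year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored (assumption)


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget, in minutes, for an availability of N nines.

    Example: 5 nines means 99.999% availability, i.e. 0.001% unavailability.
    """
    unavailability = 10 ** -nines
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nines -> {allowed_downtime_minutes(n):.2f} minutes/year")
```

Five 9s works out to a little over five minutes a year, so it is worth asking whether that figure covers every protocol and includes planned maintenance, or only unplanned outages.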

Deduplication

Deduplication is a great feature to have as part of the overall Efficiency Strategy, especially in Virtualized environments where the storage fills up with redundant data. Questions to ask –

  1. Can I enable deduplication in production?
  2. Is the deduplication done at the block level or file level?
  3. What impact does deduplication have on the performance of the solution?
  4. Can I enable it across both SAN & NAS?
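
Why question 2 matters is easiest to see with a toy example. The sketch below is not how any particular array implements deduplication; the 4 KB block size, the function names and the two "VM images" are all made up for illustration. It simply counts how many bytes must be stored when duplicates are detected per file versus per fixed-size block.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size in bytes, purely illustrative


def file_level_unique_bytes(files):
    """Bytes stored if only whole, identical files are deduplicated."""
    seen, total = set(), 0
    for data in files:
        digest = hashlib.sha256(data).digest()
        if digest not in seen:
            seen.add(digest)
            total += len(data)
    return total


def block_level_unique_bytes(files, block_size=BLOCK_SIZE):
    """Bytes stored if identical fixed-size blocks are deduplicated."""
    seen, total = set(), 0
    for data in files:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            digest = hashlib.sha256(block).digest()
            if digest not in seen:
                seen.add(digest)
                total += len(block)
    return total


# Two hypothetical VM images: eight shared "OS" blocks plus one unique block each.
os_blocks = b"".join(bytes([i]) * BLOCK_SIZE for i in range(8))
vm_a = os_blocks + b"A" * BLOCK_SIZE
vm_b = os_blocks + b"B" * BLOCK_SIZE

if __name__ == "__main__":
    print("file level :", file_level_unique_bytes([vm_a, vm_b]), "bytes")
    print("block level:", block_level_unique_bytes([vm_a, vm_b]), "bytes")
```

Because the two images differ by a single block, file-level deduplication saves nothing here, while block-level deduplication stores the shared blocks only once. Real virtual disks behave the same way in spirit, which is why the granularity question is worth asking.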

Cloning

Cloning is another feature that can save not only space but also time and replication bandwidth (when implemented on primary storage). Depending on how it is implemented, it could be very useful or completely useless. Questions to ask –

  1. Can I do cloning at a Volume, file and block level?
  2. Can I do sub-LUN clones?
  3. How much time does a clone operation take?
  4. What impact do the clones have on the overall performance of the solution?
  5. Can I create clones on a Disaster Recovery (DR) destination for Test & Dev, Reporting and DR testing without breaking the Replication relationship (and putting the business at risk)?
  6. Does the cloning integrate with Applications like SQL, Oracle, SAP etc.?

Tiering

With the advent of Flash, Tiering has been the most effective way of getting its benefits while keeping costs under control. The idea is to have the required data/blocks in the right place/tier at the right time, and every vendor has implemented a tiering solution based on what works best for them. While tiering can really help in improving performance and latency, over-reliance on it or being too aggressive when sizing can be disastrous. Here are some key questions to ask your vendor –

  1. How easy is it to manage your tiering solution? Does it require complex Policies?
  2. Is it Real Time?
  3. What is the percentage of Flash & Spinning Media in the proposed solution?
  4. What is my worst case performance in the event the application working set goes beyond the size of Flash or if the Flash Media stops working?
  5. What is the granularity of the tiering solution (this can have a big impact on 3 above)?
  6. Has your company done any Public Benchmarks with Tiering turned on?
  7. Is the Tiering Functionality available across all protocols?
  8. Does the Tiering solution work with other features like Snapshot copies, Replication etc.?
  9. Is your Tiering solution Deduplication and Clone aware for both Space and Performance Amplification?
  10. What is your Tiering Strategy? Does it cover Server, Storage Controller and Disks?
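
Question 4 can be roughed out with simple arithmetic before the vendor even answers. The sketch below is a deliberately crude model, not a sizing tool: it assumes reads are spread evenly over the working set, that Flash holds as much of the working set as fits, and the 0.5 ms and 8 ms latencies are placeholder numbers I picked, not measurements.

```python
def effective_latency_ms(working_set_gb, flash_gb, flash_ms=0.5, disk_ms=8.0):
    """Blended read latency for a two-tier system (simplified model).

    Assumptions (illustrative only): accesses are uniform across the working
    set, and the flash tier caches as much of the working set as it can hold.
    """
    if working_set_gb <= 0:
        raise ValueError("working_set_gb must be positive")
    hit_ratio = min(1.0, flash_gb / working_set_gb)
    return hit_ratio * flash_ms + (1.0 - hit_ratio) * disk_ms


if __name__ == "__main__":
    # Working set fits in Flash, outgrows Flash, and Flash is lost entirely.
    for working_set, flash in ((500, 1000), (2000, 1000), (2000, 0)):
        latency = effective_latency_ms(working_set, flash)
        print(f"working set {working_set} GB, flash {flash} GB -> {latency:.2f} ms")
```

Even with these made-up numbers the shape of the problem is clear: once the working set outgrows Flash, or Flash is unavailable, latency moves towards that of the spinning media, so the spinning tier has to be sized for that case too.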

Snapshot Copies

Snapshot copies have come a long way since they were introduced by NetApp way back in 1992 and have their own place in the datacenter. Consider this – around 80% of storage failures are logical in nature (DB corruption, virus attack, user error etc.) and 20% are physical failures (Controller, Disk, Power Supply etc.). In most cases logical failures result in more downtime than physical failures, and for these logical failures Snapshot copies can be used to recover a volume, datastore, VM or file almost instantaneously. But only if the vendor has implemented Snapshot copies in a particular way. Questions to ask –

  1. How granular are the Snapshot copies?
  2. How many can be created and is there a performance penalty associated with them?
  3. How much space do they use? Can the Snapshot copies be vaulted to cheaper disks to save space on the more expensive primary storage?
  4. Is there any difference in implementation of Snapshot feature between SAN & NAS? Why?
  5. Do the Snapshot copies integrate with the replication solution to offer multiple recovery points on both Primary and DR?
  6. How much time does it take for one to restore a Snapshot copy?
  7. If my reserved Snapshot space is 20%, can I still recover data if my changes go beyond 20%?
  8. Does it integrate well with various applications for application consistent backups?
  9. Does it integrate with VMware & vCloud Director?

Replication

Virtualization, Cloud or any sort of consolidation demands another copy of the data. As the storage becomes the central repository with everything served out of it, any downtime at the storage level can bring down the whole datacenter. DR sites are often considered a liability since they are just an investment waiting for a failure to happen, doing nothing in the meantime except consuming Bandwidth, Power, Space & Cooling. What a Technical Evaluator should look at is how to make deploying DR more efficient. Questions to ask –

  1. How granular is the replication solution in terms of data sent for a single block change?
  2. How is it licensed? Is there a different license for different types of replication solutions?
  3. Does it integrate with applications for application consistent replication?
  4. Does it integrate with Deduplication and Clones on the Primary to send only unique blocks to the DR site?
  5. Does the solution offer native compression before sending the data to the DR site?
  6. Can I establish the baseline using Tape instead of doing it over the network?
  7. Rather than having an idle DR site, can I use the DR site actively for things like Test & Dev, Reporting etc.?
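
Question 1 has a direct impact on the bandwidth you pay for. Here is a rough sketch of the arithmetic; the function name and the 4 KB and 1 MB granularities are examples I chose, not figures from any vendor, and it assumes the change sits inside aligned replication units.

```python
def bytes_replicated(changed_bytes, granularity_bytes):
    """Bytes sent over the wire when replication works in fixed-size units.

    Simplifying assumption: the change is aligned, so it touches the minimum
    possible number of replication units.
    """
    if changed_bytes <= 0 or granularity_bytes <= 0:
        raise ValueError("sizes must be positive")
    units = -(-changed_bytes // granularity_bytes)  # ceiling division
    return units * granularity_bytes


if __name__ == "__main__":
    change = 4 * 1024  # a single 4 KB block change
    for granularity in (4 * 1024, 1024 * 1024):
        sent = bytes_replicated(change, granularity)
        print(f"granularity {granularity} B -> {sent} B sent "
              f"({sent // change}x the changed data)")
```

With these example values a 4 KB change costs 4 KB on the wire at 4 KB granularity, but a full 1 MB at 1 MB granularity. Multiply that by the daily change rate and the difference shows up directly in the WAN link you need.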

VMware API Integration

Starting with vSphere 4.1, VMware started introducing APIs that let storage arrays do the jobs they are good at, which in turn allowed the hypervisor to concentrate on what it is good at. Since then various storage vendors have implemented these APIs. The vendor implementations were very similar to start with, but with the launch of vSphere 5 this started to change, and some vendors began to offer more advanced VMware API integration, which has a big impact on overall TCO and performance. Questions to ask –

  1. Which primitives within VAAI does your storage support for SAN & NAS?
  2. How do you differentiate yourself from other vendors in terms of VMware API integration?
  3. Does vStorage APIs for Array Integration (VAAI) invoke features like sub-LUN clones/file-level clones for saving space and time?
  4. Does the storage integrate with features like View Composer Array Integration (VCAI) for View?
  5. Does the storage integrate with vStorage APIs for Storage Awareness (VASA)?

SRM Integration

Site Recovery Manager (SRM) is a VMware tool that helps automate failover at the time of a disaster. It also offers some cool functionality to test the DR site for functionality and consistency with a single click. Here are some questions to ask –

  1. Does the storage integrate with SRM for both SAN & NAS?
  2. Does it integrate with the SRM test functionality to perform DR consistency tests?
  3. Does SRM Test functionality require any additional capacity for doing DR tests without breaking the replication relationship? How much?
  4. Does the integration support both failover and failback?

vCenter Plugin

A vCenter Plugin is a way of making storage functionality natively available in vCenter. It empowers both VMware & Storage admins, making overall management easier. Again, almost all vendors have a vCenter plugin, but there is a lot of difference between what one vendor's plugin can do versus another's. I usually suggest that customers get a demo and see the plugins in action for comparison, but the questions below should also give you an idea.

  1. What functionalities does the vCenter plugin provide?
  2. Does it cover a VM's lifecycle in terms of Setup, Deploy, Maintain, Protect and Decommission?
  3. Which workflows does the vCenter plugin support?
  4. Does it enforce best practices?
  5. Does it allow me to fix any best practice violations?

The questions for any other hypervisor will be very similar. I would just replace VAAI integration with, for example, ODX integration in case of Hyper-V. And replace vCenter Plugin with SCOM/SC Orchestrator plugin in case of Hyper-V, XenCenter plugin for Citrix and oVirt plugin in case of KVM.

This is definitely not an exhaustive list of questions but can be a good starting point. Once you have answers to these, the table above will definitely start to look different.

Let me know what you think.

Cheers..

Satinder Sharma (@storarch)
