The best part of Clustered ONTAP – No Striped Volumes/LUNs

I had a great week last week, and the highlight was a customer saying “This is too good to be true” after a Clustered ONTAP presentation, and then the same thought crossing my own mind during an ONTAP roadmap session. There are great things coming in the not-so-distant future that I can’t share just yet. You can surely expect NetApp to further raise the bar in the newly created Scale-out Unified Storage market, just as it did with Unified Storage.

Coming back to what I wanted to discuss. One of the questions that some customers/partners raise about Clustered ONTAP is whether NetApp LUNs/volumes are limited to a single node or striped across the nodes.

Why would that matter? Here are some points I could come up with –

  • A striped volume/LUN could potentially help maximize performance
  • The nodes balance the load amongst themselves, so one may not have to worry about a single node becoming a bottleneck

These are valid points. What impact do they have? Can the same objectives be achieved in other ways? Let’s take a look.

Starting with the first one – a LUN/volume striped across multiple nodes can deliver higher aggregate performance. This is an important consideration, but if you think about it, an enterprise application running on shared infrastructure has multiple LUNs/volumes, and the performance required by any one of them will rarely outgrow what a single node can deliver. With older nodes/engines and CPU architectures this was a real concern, but today’s nodes/engines deliver very high performance, so this is not really a ‘show stopper’ unless a vendor uses fairly low-powered nodes.

The second point above is, in my opinion, more important than the first. It brings ease of management and removes guesswork. But is striping the only way to achieve this? If it were, VMs in a virtualized environment would not be confined to a single physical server; they would be striped across the physical hosts. So how did hypervisor vendors like VMware achieve the same objective? They started by enabling VM mobility, building the intelligence into the hypervisor, and then automating the movement of VMs based on CPU, memory, I/O and capacity utilization. NetApp has taken a similar approach with Clustered ONTAP. With DataMotion, customers can move LUNs/volumes non-disruptively from one node to another and from one aggregate to another. This can be done for load balancing based on controller resources or capacity, or simply to promote a whole application to faster disks for a sustained acceleration in performance that standard data tiering can’t provide. The intelligence on when to do this is provided by OnCommand Unified Manager and/or OnCommand Balance. Lisa Crewe from NetApp has covered the capabilities of OnCommand Balance here. The good thing about DataMotion is that it takes all the policies (like Snapshot, SnapMirror, dedupe, etc.) with it, so one doesn’t have to reconfigure anything.
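In clustered ONTAP, the DataMotion workflow described above is driven by the `volume move` command. A minimal sketch, assuming a hypothetical Vserver `vs1`, volume `app_vol`, and destination aggregate `aggr_ssd_01` (the names are illustrative, not from any real cluster):

```
::> volume move start -vserver vs1 -volume app_vol -destination-aggregate aggr_ssd_01

::> volume move show -vserver vs1 -volume app_vol
```

The move runs in the background while the volume stays online and serving data; `volume move show` reports the state and percent complete. Because the volume’s identity is preserved, its Snapshot, SnapMirror and dedupe policies travel with it and nothing needs to be reconfigured on the host side.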

Now you may ask why take this approach instead of the traditional striping-based one. There are many reasons for that; I have listed a few of them.

  • Not striping the volumes allows the use of different types of controllers in a single cluster, so one is not limited to a single node type or size. This allows one to start small and then add nodes as requirements grow. These nodes could be current generation or next generation.
  • It allows isolation of applications to different nodes in the cluster. One can even dedicate a whole node to an application. Some nodes may be designated as high-performance nodes with SSD drives and used for running performance-hungry apps, or for hosting apps moved there on request by a department/customer in a private/public cloud setup.
  • It doesn’t force one to upgrade the software on all the nodes at the same time, nor does it prevent reverting to the previous software release.
  • It doesn’t force one to upgrade the complete hardware in one shot. Just as nodes can be added in a phased manner, they can be upgraded or removed in a phased manner too. This solves one of the biggest pain points for customers running traditional frame-based arrays: it allows one to eliminate not only unplanned outages but planned outages too.
  • Failure of multiple nodes for any reason doesn’t impact all the volumes/LUNs.
  • It allows one to evacuate nodes for any maintenance. Think Maintenance Mode in VMware vSphere. The automation can be achieved by using NetApp Workflow Automator that is available to customers at no cost.
  • It also allows one to implement secure multi-tenancy based on hardware if Vserver-based isolation is not enough.

I could go on, but I think I have covered the major ones. At this point you may ask about the capacity use case: what if someone needs one big, self-managed container to store files? This use case is driven by the explosion in unstructured data, which requires a large file system that can handle billions of files/objects. Clustered ONTAP covers it through Infinite Volume. Infinite Volume can scale to PBs, is a self-managed volume that distributes content on ingest, and has all the NetApp data management qualities (Snapshot, deduplication, SnapMirror, compression, etc.). You can get more details on Infinite Volume here.

So as you can see, the decision to not include striped LUNs/volumes for enterprise applications was a very calculated one. I love the fact that Clustered ONTAP solves the same problems in a different way, at the same time bringing in advantages that are unique but not new for people using Virtualized Servers.

Satinder (@storarch)
