Flash has changed the way storage is architected in the datacenter. The fear of using higher-capacity drives for high-performance applications is a thing of the past. I remember conversations with customers who were up for a tech refresh and their concerns about refreshing the hard drives supporting their applications. The use of Flash in its different forms has taken care of those concerns. What we are striving for now is the use of just two tiers in the datacenter: Flash (or equivalent) and SATA.
Let’s briefly look back at the history of Flash in the enterprise space. The use of Flash started with dedicated SSDs for applications. Since this was a very expensive way to use Flash, vendors came out with ways to use it as a tier through policy-based automated tiering. Automated tiering based on policies for data movement (think HSM, or Hierarchical Storage Management) brought some benefits but had its own challenges, like the right data not reaching the right place at the right time. NetApp took a completely different approach and introduced the Performance Acceleration Module (PAM-1, using DRAM), which offered real-time performance acceleration based on a technology with a long history of success: caching. With PAM-2, NetApp put Flash instead of DRAM on the cards and later renamed the product FlashCache. Around the same time, Fusion-io released its server-side PCI-based Flash cards. Both approaches were innovative and made the use of Flash affordable. With Flash used as a cache, data got accelerated in real time. And by making this tier deduplication- and clone-aware, NetApp ensured that Flash amounting to only 1–5% of a large dataset was enough to accelerate it. This was in 2008.
Fast forward to today, and NetApp has one of the most complete stories around Flash: FlashCache (PCI cards) for acceleration at the storage controller level, Flash Pool for disk-level acceleration, Flash Accel as a server-side cache, and an all-Flash array for dedicated workloads. Storage-side caching/tiering reduces latency and improves the performance you can get out of a given number of spinning drives, but it doesn’t raise the overall performance the controller can deliver; its advantage is that it helps everything hosted on the storage. Server-side cache, on the other hand, not only accelerates but also adds to overall performance, since the server starts contributing to the total IOPS delivered; the trade-off is that it helps only the applications running on that server.
Let’s talk about where I think these technologies are headed. I personally think that all-Flash arrays will not find wide use in the datacenter for a long time; they will be limited to some dedicated applications. The majority of applications in the datacenter will use hybrid arrays and server-side cache to get the right performance and latencies. Server-side cache will have the most important role to play in the datacenter, with the hybrid arrays acting as a capacity tier full of SATA accelerated by Flash.
But we are still some way from that state, and a lot of innovation is still required on the server-side cache front. Today, most server-side caching software is designed as read-only. That definitely helps some workloads, since it accelerates reads and offloads the backend array so it can concentrate on writes. But the real benefit will come once this cache becomes read-write: server-side cache that not only enables a true write-back cache but also ensures that only unique data goes to storage, through global (server-level to start with) inline deduplication and compression. This will allow the storage to be sized for a far smaller number of IOPS, and therefore make it easier to use the backend shared storage purely as a capacity tier.
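To make that idea concrete, here is a minimal sketch in Python (all class and method names are hypothetical, not any vendor’s API) of a server-side write-back cache with inline deduplication: writes are acknowledged from the cache, identical content is collapsed to a single copy, and only blocks the backend has never seen are ever flushed to the array.

```python
import hashlib

def _digest(data):
    return hashlib.sha256(data).hexdigest()

class Backend:
    """Stand-in for the shared array (capacity tier): it stores each
    unique block exactly once, plus an LBA -> content-hash map."""
    def __init__(self):
        self.blocks = {}   # content hash -> block data
        self.map = {}      # LBA -> content hash

class WriteBackDedupCache:
    """Hypothetical server-side write-back cache with inline dedup:
    writes complete in the cache, and flush() pushes only unique,
    not-yet-persisted blocks down to the backend."""
    def __init__(self, backend):
        self.backend = backend
        self.map = {}      # dirty LBA -> content hash mappings
        self.store = {}    # content hash -> data held in the cache

    def write(self, lba, data):
        # Inline dedup: identical content always hashes to the same key,
        # so a duplicate write costs no cache space and no backend I/O.
        h = _digest(data)
        self.map[lba] = h
        if h not in self.store and h not in self.backend.blocks:
            self.store[h] = data          # unique content: keep until flush

    def read(self, lba):
        h = self.map.get(lba, self.backend.map.get(lba))
        if h is None:
            return None                   # never written
        if h in self.store:
            return self.store[h]          # served from the cache
        return self.backend.blocks.get(h) # fall through to the array

    def flush(self):
        # Only unique blocks reach the array, shrinking the IOPS
        # the shared backend has to be sized for.
        self.backend.blocks.update(self.store)
        self.backend.map.update(self.map)
        self.store.clear()
        self.map.clear()
```

In this sketch, writing the same block to two LBAs leaves only one block to flush; the backend absorbs one write instead of two, which is exactly the sizing benefit described above.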
I really think this innovation has to be led and driven by the storage vendors, for the primary reason of ensuring data coherency. Third-party solutions could be quicker to innovate, but their lack of backend storage awareness carries the risk of serving stale data if, for example, someone restores a snapshot on the backend storage. The server-side software could be configured to flush all its data when it detects something like this, but that results in a performance impact, exposing the capacity tier and breaking the architecture. At that point, the caching/tiering on the storage side will save the day, so even with server-side cache, caching/tiering on the backend storage will play a key role. The storage vendors are in a better position to make the software not only aware of the backend but also work with the storage-side tiering, making the overall solution more efficient.
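The coherency problem can be sketched in a few lines (again with hypothetical names, not any real product’s interface). Suppose the array exposes a generation counter that it bumps whenever its contents change underneath the cache, such as on a snapshot restore. A cache without deeper integration can only watch that counter and drop everything when it moves:

```python
class Array:
    """Stand-in for the backend array. The generation counter is an
    assumed signal that increments when the array's contents change
    behind the cache's back (e.g. a snapshot restore)."""
    def __init__(self, data):
        self.data = dict(data)
        self.generation = 0

    def restore_snapshot(self, snapshot):
        self.data = dict(snapshot)
        self.generation += 1   # any cached copies may now be stale

class ServerSideReadCache:
    """A read cache that, lacking real backend integration, can only
    invalidate itself wholesale when the generation moves, taking the
    performance hit of refilling cold from the capacity tier."""
    def __init__(self, array):
        self.array = array
        self.entries = {}
        self.seen = array.generation

    def read(self, key):
        if self.array.generation != self.seen:
            self.entries.clear()           # flush everything: stale-data risk
            self.seen = self.array.generation
        if key not in self.entries:
            self.entries[key] = self.array.data[key]   # cold read from array
        return self.entries[key]
```

The wholesale `clear()` is precisely the blunt, performance-breaking response described above; a vendor-integrated cache could instead invalidate only the blocks the restore actually changed.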
It has been a long post, so to wrap up: I think innovation in server-side cache is what will bring us to a stage where we have only two tiers (Flash and SATA) in the datacenter. And that innovation has to come from the storage vendors, unless the third-party server cache providers can work with them to integrate their caches deeply with the storage solutions.