Last year I wrote many times about Object Storage, Flash memory, Caching, and various other interesting technologies in the storage industry. I also coined the term “Flash&Trash” to describe a trend that sees the adoption of a two-tier storage strategy built on latency-sensitive Flash-based arrays on one side and capacity-driven scale-out systems on the other.
At times, I used a slide you can find here on the right (where the flash tier is collapsed into the compute layer) to talk about possible scenarios with a huge distributed cache at the compute layer and object storage at the backend. At SC15 I got further confirmation that some vendors are looking into this kind of architecture, but this will take time… Meanwhile, I think there are some interesting vendors that, coupled together, could help to implement this model very easily.
In this blog I’m just wondering about possible alternative solutions. I don’t really know if it would work in the real world but, at the same time, I’d be rather curious to see the results of such an experiment.
Why?
In large-scale deployments, file systems are becoming a real pain in the neck for several reasons, and even scale-out NAS systems have their limits. Furthermore, now that data is accessed from everywhere and on any device, NAS is not a technology that can be relied on.
At the same time, a file system is just a layer that adds complexity without bringing any benefit if your primary goal is to access data as fast as possible.
Putting a cache in front of object storage could solve many problems and give tremendous benefits.
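Just to make the idea more concrete, here is a minimal sketch (in Python, against a purely hypothetical backend.get()/backend.put() object-store interface, not any specific vendor’s API) of what a read-through, write-back cache in front of an object store does:

```python
from collections import OrderedDict

class CachedObjectStore:
    """Minimal read-through / write-back cache in front of a generic object store.
    `backend` is anything exposing get(key) and put(key, data) -- a hypothetical
    interface used here only for illustration."""

    def __init__(self, backend, max_objects=1024):
        self.backend = backend
        self.max_objects = max_objects
        self.cache = OrderedDict()   # key -> (data, dirty_flag), kept in LRU order

    def read(self, key):
        if key in self.cache:                    # hit: served from the fast tier
            self.cache.move_to_end(key)
            return self.cache[key][0]
        data = self.backend.get(key)             # miss: falls back to the object store
        self._insert(key, data, dirty=False)
        return data

    def write(self, key, data):
        # Acknowledged as soon as it lands in the cache; the backend only sees
        # the write later, on flush() or eviction (write-back behaviour).
        self._insert(key, data, dirty=True)

    def flush(self):
        # Destage all dirty entries to the backend in one pass.
        for key, (data, dirty) in list(self.cache.items()):
            if dirty:
                self.backend.put(key, data)
                self.cache[key] = (data, False)

    def _insert(self, key, data, dirty):
        self.cache[key] = (data, dirty)
        self.cache.move_to_end(key)
        while len(self.cache) > self.max_objects:
            old_key, (old_data, old_dirty) = self.cache.popitem(last=False)
            if old_dirty:                        # evicted dirty data must be written back
                self.backend.put(old_key, old_data)
```

Reads that hit the fast tier never touch the object store, while writes are acknowledged immediately and only reach the backend on flush or eviction.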
OK, but why in the enterprise? Well, even if enterprises don’t have these kinds of very large infrastructures yet (above I was talking about Big Data and HPC), you can already see the first hints of a similar trend. For example, VVOLs target the limits of VMFS, while organizations of any size are experiencing exponential data growth that is hard to manage with traditional solutions.
An example
I want to mention just one example (there are more, but this is the first that comes to mind): Cohesity coupled with caching software.
If you are a VMware customer, this solution could be really compelling. On one side you have Cohesity: scale-out storage, data footprint reduction of all sorts, integrated backup functionality, great analytics features and ease of use (you can get an idea of what Cohesity does from the SF8 videos if you are interested in knowing more). You can also use it as a VM store, but this is a secondary storage solution. I’m not saying the performance is bad per se, but the system could be running many different workloads (and internal jobs/apps) at the same time, and IOPS and latency could fall far short of your expectations. It doesn’t have QoS functionality either which, again, could mess up your primary workloads.
If it weren’t for the fact that Cohesity is anything but primary storage, it would have the potential to be the “ultimate storage solution” (I’m exaggerating a bit here!). Well, maybe it already covers 80% of your workloads? Anyway, you could fill the gap with a caching solution like PernixData or SanDisk FlashSoft (or Datagres if you are more of a Linux/KVM shop). Most of these caching solutions are very good, and they actually do much more than caching now!
Reducing complexity and costs
For a mid-size company this could be a great solution from a simplification perspective: a total separation between latency-sensitive and capacity-driven workloads/applications. The former would be managed by the caching layer, while the latter would be “all the rest”!
I’d also like to do a cost comparison (both TCA and TCO) between a Cohesity+Pernix bundle and a storage infrastructure built out of individual components…
Other interesting alternatives
If you don’t like having too many components from different vendors, you could look at Hedvig as an alternative. It doesn’t have the same integrated backup features as Cohesity, but it is an end-to-end solution from a single vendor. In fact, if you look at its architecture, the Hedvig Storage Proxy can run on the hypervisor/OS (also enabling a distributed caching mechanism), while the storage layer is managed on standard commodity x86 servers through the Hedvig Storage Service. This is an interesting solution with great configurability for both high-IOPS and capacity-driven workloads… To be honest, I haven’t checked whether it has a QoS mechanism to manage both at the same time, but I’m sure it is worth a look.
And, of course, any object store with a decent NFS interface could be on the list of possible solutions, as well as other caching solutions…
Closing the circle
As I said previously… this is just an idea, but I’d like to see someone test it for real. Coupling modern secondary storage prices (and features) with the incredible performance of server-based flash memory could be a very interesting exercise. In some cases, as with Cohesity for example, it could also help to collapse many other parts of the infrastructure into fewer components, aiming towards a more simplified infrastructure.
If you want to know more about this topic, I’ll be presenting at the next TECHunplugged conference in Austin on 2/2/16: a one-day event focused on cloud computing and IT infrastructure, with an innovative formula that combines a group of independent, insightful and well-recognized bloggers with disruptive technology vendors and end users who manage rich technology environments. Join us!
Enrico, the idea of having only “trash” storage and an effective cache is an interesting one. It’s similar to the message of storage virtualisation – use a virtual layer and some of your underlying storage can be cheap. One problem that has to be considered is the change in workload profile. If caching is installed and working effectively, 99% of the I/O reaching the external storage will be writes, so it now has to deal with a very different workload. Imagine having to handle almost 100% write I/O on your array, potentially in large bursts as the cache is flushed.
Then there’s the issue of how big the cache is; once the cache is exhausted, I/O requests will still fall back to the external storage – at that point you’re back to your original scenario and potentially need plenty of IOPS. So there’s a risk/cost associated with having too little cache – you may find the cost benefit overwhelmed if there’s a lot of active data.
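To put some rough, purely hypothetical numbers on that concern:

```python
# Purely hypothetical figures, just to illustrate the sizing question.
aggregate_write_rate_mb_s = 500       # what the hosts push into the write-back cache
destage_interval_s = 600              # the cache flushes every 10 minutes
backend_write_throughput_mb_s = 300   # what the capacity tier can sustain

burst_size_gb = aggregate_write_rate_mb_s * destage_interval_s / 1024
drain_time_min = burst_size_gb * 1024 / backend_write_throughput_mb_s / 60

print(f"each flush: ~{burst_size_gb:.0f} GB, taking ~{drain_time_min:.0f} minutes to drain")
# If the drain time exceeds the flush interval (10 minutes here), the cache
# fills up and I/O falls back to the speed of the external storage.
```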
On the positive side, this could of course work as a great solution for many companies, perhaps the smaller ones or branch offices.
Chris
Hi Chris,
Great points, and this is why I wrote that it should be tested. In any case, I agree with you: cache sizing and cache flushing could be issues.
See you soon at TECHunplugged and TFD10! 😉
Hi Chris,
I believe you are missing a very important point about the cache layer. It should not only be used to accelerate reads or writes, but also to prepare IO before it goes to the ‘trash’ storage. The ‘trash’ storage, typically large SATA drives, is pretty good at writing large data sets consecutively but horrible at small writes (a typical block size of 4k, for example). Open vStorage, for example, uses a log-structured approach on the SSDs where 4k (random) writes are transformed into 64MB files (all the 4k writes to a volume laid out consecutively) which get stored on the ‘trash’ backend. By using that cache layer you essentially get better performance from the ‘trash’ layer. Especially with the new Shingled Magnetic Recording (SMR) drives, having a layer which streamlines the IO is essential for getting reasonable performance.
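A toy sketch of that coalescing idea (with hypothetical sizes and a made-up backend.put() interface, not the actual Open vStorage implementation) could look like this:

```python
import io
import uuid

SCO_SIZE = 64 * 1024 * 1024   # roll over to a new 64MB object, as described above

class LogStructuredWriteBuffer:
    """Toy sketch: append small (e.g. 4k) random writes to a log on fast media
    and ship them to the capacity ('trash') backend as large sequential objects.
    backend.put(name, data) is a hypothetical interface."""

    def __init__(self, backend):
        self.backend = backend
        self.log = io.BytesIO()

    def write(self, offset, data):
        # Record (offset, length, data) sequentially instead of issuing
        # a small random write straight to the SATA/SMR drives.
        self.log.write(offset.to_bytes(8, "big"))
        self.log.write(len(data).to_bytes(4, "big"))
        self.log.write(data)
        if self.log.tell() >= SCO_SIZE:
            self._roll()

    def _roll(self):
        # One large, purely sequential write to the backend per 64MB of data.
        self.backend.put(f"sco-{uuid.uuid4().hex}", self.log.getvalue())
        self.log = io.BytesIO()
```

The capacity tier only ever sees large sequential objects, which is the kind of workload SATA and SMR drives handle well.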
The cache layer thus does more than just speed up IO. If anyone is interested, have a look at http://blog.openvstorage.com/2014/09/location-time-based-or-magical-storage/
Wim
Wim, that is a good point and one I didn’t address, as the discussion covered (in this instance) Cohesity. Now I see what you’re implying; however, the risk there is that an external array may be unable to manage large-block I/O (or at least may behave differently than expected). Having “cache on cache” could definitely be a problem for both performance and data integrity. However, I guess in this instance, if you’re referring to writing to SMR drives directly, then yes, I agree that would be another additional benefit.
Did you already have a look at Open vStorage? It also supports caching at the compute host (with dedupe). It is open source.
Kind regards,
Wim Provoost
Product Manager Open vStorage
Wim,
I read about Open vStorage, but I admit I never had the opportunity to look at it seriously.