A couple of weeks ago I published an article about high performance object storage. Reactions have been quite diverse. Some think that object stores can only be huge and slow, while others think quite the opposite. In fact, they can also be fast and small.
In the last year I’ve had a lot of interesting conversations with end users and vendors concerning this topic. Having just covered the part about “fast object stores”, again I’d like to point out that by fast I mean faster and with better latency than traditional object stores, but not as fast as block storage. This time round I’d like to talk about smaller object stores.
Some of my colleagues (both analysts and bloggers) say that object storage makes no sense under one petabyte or so… But my point of view is that they are dead wrong! It all depends on the applications and on the strategy your organization is adopting. Let me work with examples here.
It depends on the application
HDS was one of the first in the market to think about object storage as an enabler for cloud and data-driven applications and not just as a more affordable form of storage for cold data. They invested in building an ecosystem which is now very robust and seems quite successful with their customers.
Two pieces of this ecosystem are the remote NAS gateway and Sync&Share (HDI and HCP Anywhere in HDS nomenclature). HDS claims that more than 1500 customers are running HCP now with, IIRC, 400+ PB of on-premises storage under management. Just by doing the simple math (400PB/1500), this works out to roughly 267TB per customer on average… without considering that some of these customers are really huge and use HCP for the traditional archiving/content management use cases…
I’m wondering how big HDS customers would be on average if I were to remove the first 10 installations in terms of capacity from the equation… and also how many of those 10 customers are actually using HCP for enterprise applications like Sync&Share. I would bet that those 10 are more in the xSP field, video content distribution, archiving, big data and so on. But this is merely speculation on my part… and I invite HDS to leave a comment if they want to add more to this.
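For anyone who wants to redo the simple math above, here is a minimal sketch. It takes the 400PB and 1500 customer figures at face value (they are vendor claims quoted from memory) and assumes decimal units, 1PB = 1000TB. The function name is mine, purely for illustration.

```python
TB_PER_PB = 1000  # assumption: decimal units, 1 PB = 1000 TB


def average_tb_per_customer(capacity_pb, customers):
    """Return the mean capacity per customer in TB."""
    return capacity_pb * TB_PER_PB / customers


# Figures quoted in the text; "400+" and "more than 1500" are treated as exact here.
avg_tb = average_tb_per_customer(400, 1500)
print(f"about {avg_tb:.0f}TB per customer")
```

Keep in mind that this is a plain mean, so a handful of very large installations pulls it up. The typical customer is likely to sit well below that number.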
Other vendors, like Cloudian for example, have a license that starts as low as 10TB! And I personally met some of their (happy) customers in the range of 100/300TB. These end users have embraced object storage for NAS gateways, file distribution, and lately backup. For each new application they add more capacity and more cluster nodes.
Caringo is another good example. They’ve always worked with ISVs and many of their customers are quite small. And now, thanks to FileFly they have a compelling solution for file server consolidation/remotization. This kind of solution is good for small and large customers and they are doing rather well with it. I was having a talk with them a few months ago and they were thinking about bundling the whole solution (Swarm+FileFly) in a package for the smaller customers (starting at around 40/50TB) because they’ve recorded a lot of interest in that range of capacity.
And I’m not saying that these vendors can’t scale or that they don’t have large installations. Large installations are the case studies you can find on their websites, the kind of installation that is much easier to publicize because it demonstrates the potential of your product. Need another example? Small and specialized vendors like Object Matrix have customers that start under 200TB (many of them actually)… but on their homepage you’ll find one of the biggest!
Nick Pearce, one of the founders of Object Matrix, told me that most of his customers start very small (in the past the average deal was 60TB and lately, because of large disks I suppose, they start at 300TB), and they grow from there… his explanation is simple: less risk while taking advantage of scale-out architecture.
A customer of mine started working with Ceph a while ago and they are now implementing it in production on a 3-node cluster… 100TB usable. I’ve spoken to others in the last 6/9 months who are doing the same with clusters in the order of 100/500TB built out of decommissioned servers. Many of them use it just as third-tier storage for log archiving, secondary backup and so on. But it’s ridiculously cheap and reliable for them…
But you don’t have to trust me… So I asked for a comment from someone who works with all the object storage (and NAS) vendors: Jeff Denworth, SVP Marketing at CTERA. CTERA, as you probably already know, is a cool vendor that provides some really interesting solutions such as Sync & Share and NAS gateways, among other fancy cloud-based backup solutions. They have 100s of customers (some of them are ISPs with several thousands of end users, but they are doing pretty well in the enterprise market too). When I asked Jeff for his opinion on the matter, he told me: “I would say, of our customer pool, the large majority of them have under 200TB. But we’re also not the only use case they consider object for… so we become the first use case (gateways, sync and share, etc.) and then the customer immediately starts thinking about new use cases (backup, then DevOps, are most commonly the next to consider).” And even though CTERA has global deduplication and compression functionalities, they are still in the range I’m talking about.
And again, I asked SwiftStack the same question last week and they told me that the first installation for the majority of their customers is in the order of 300TB now. A capacity that grows quickly over time but still, they usually start small…
Did I mention the majority of object storage vendors here? Well, if not it’s because the article can’t be as long as a novel… but I think I gave you enough to think about small object stores, didn’t I?
But there is more
Some startups are working on smaller object storage systems intentionally. They want to build small object storage systems by design! (or better still, small footprint object storage systems)
Minio is working hard on an S3-compatible object store that can run in a single virtual machine or a container. The product is open source and is designed for developers. I think about it as the MySQL of object stores. And they are not alone: Open.IO has a similar approach to building an object storage system that can serve single applications. The right back-end for developers of the cloud era.
The idea behind these object storage systems is that developers are asking for S3-compatible storage to build their applications. The small footprint is necessary to embed it within a container and distribute the application in the easiest possible way. But this also means that the S3 engine is very small and fast (yes, again, fast!), security is simplified and multitenancy is no longer a problem since you have an S3 repository dedicated to your application. For better or worse, the developer takes control of the overall “micro-infrastructure”.
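To make the idea a bit more concrete, here is a rough sketch of what an application with its own dedicated S3 repository could look like from the developer's side. It is only an illustration under several assumptions: it uses the third-party boto3 library, it assumes an S3-compatible server is listening on http://localhost:9000, and the environment variable names, default credentials and helper names are all made up for this example. It is not taken from any vendor's documentation.

```python
import os


def s3_settings(env=None):
    """Collect connection settings for the application's private S3 endpoint.

    The APP_S3_* variable names and the defaults are hypothetical.
    """
    env = os.environ if env is None else env
    return {
        "endpoint_url": env.get("APP_S3_ENDPOINT", "http://localhost:9000"),
        "aws_access_key_id": env.get("APP_S3_ACCESS_KEY", "example-access-key"),
        "aws_secret_access_key": env.get("APP_S3_SECRET_KEY", "example-secret-key"),
    }


def make_client(settings):
    """Build an S3 client pointed at the embedded store instead of a public cloud."""
    import boto3  # third-party dependency, imported lazily

    return boto3.client("s3", **settings)


def roundtrip(client, bucket, key, payload):
    """Write one object and read it back through the S3 API."""
    # On a real server, creating a bucket that already exists may raise an error.
    client.create_bucket(Bucket=bucket)
    client.put_object(Bucket=bucket, Key=key, Body=payload)
    return client.get_object(Bucket=bucket, Key=key)["Body"].read()
```

With a server running, `roundtrip(make_client(s3_settings()), "app-data", "hello.txt", b"hi")` should hand back the same bytes. The interesting part is that only the endpoint changes: the same application code could later be pointed at a larger cluster or at a public cloud.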
You might think I’m out of my mind here… but in a few weeks’ time we’ll also be seeing Scality, an object storage vendor usually mentioned in very large-scale installations, announcing an interesting component that can also fit this use case. But I can’t say more at the moment; I think this piece of information is still under some sort of embargo.
Once again, we are talking about object storage systems which are intended for small data sets and single applications, with the ability to grow if needed.
Closing the circle
Thinking about object storage only for huge multi-petabyte installations is passé. Examples which support this are everywhere and most enterprises are choosing object storage not for its characteristics of durability or scalability but because they want to implement cloud storage systems with applications that take advantage of protocols like S3.
Even though I do agree that for smaller end users public cloud is a good option, for many of them there are good reasons for adopting an on-premises solution as well.
Storage is no longer about saving data safely and efficiently, which is now taken for granted, but it’s all about distributing and sharing it quickly and securely. This is a major issue if the organization is widely distributed and is leveraging mobile devices for its business activity. I realise I’m repeating myself here but, from this point of view, object storage can be considered a NAS 2.0.
And last but not least, with more and more developers adopting S3 and Swift protocols, we’ll be seeing a great deal more small (and embedded?) object stores around…
[Disclaimer: Yes, I did some work for many of the vendors mentioned in this post…]
If you want to know more about this topic, I’ll be presenting at the next TECHunplugged conference in London on 12/5/16. A one day event focused on cloud computing and IT infrastructure, with an innovative formula that combines a group of independent, insightful and well-recognized bloggers with disruptive technology vendors and end users who manage rich technology environments. Join us!