Object storage: why, when, where… and but.

In one of my latest posts I wrote that private object storage isn’t for everyone… especially if you don’t have the scale that makes it viable. On the other hand, we are all piling up boatloads of data, and users need to access it from many different locations, applications and devices at any time.

Object storage characteristics are ideal for building a horizontal platform for this kind of job, and sometimes it makes a lot of sense to implement an on-premises infrastructure even when dealing with smaller capacities (in this case, small means on the order of a hundred terabytes).

If object storage isn’t for you today… chances are it will be tomorrow! In this post I would like to recap some of the benefits of object storage and share some ideas about where and how to start thinking about it… and more.

Why Object Storage

You can find many object storage solutions on the market now. Some of them can start out quite small, while others make sense only when dealing with multi-petabyte capacities. Different architectures are better suited to smaller or bigger objects, or to specific workloads, but at the same time they now all support a similar set of APIs for accessing data (with the S3 API clearly winning over Swift at the moment).

In fact, support for the S3 API is the first feature end users usually look for, because it greatly simplifies the search for front-end solutions (appliances and applications). An object storage infrastructure has to be considered a common horizontal platform that provides different services. In some cases the object store can provide some of these services itself (scale-out NAS, for example), but most of them are implemented through external appliances or applications that leverage its APIs.
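Part of what makes the S3 API such a convenient common front door is its simple data model: a bucket holds a flat set of keys, and "folders" are only simulated at listing time with a prefix and a delimiter. The sketch below illustrates that listing behaviour in memory, assuming a plain Python dict in place of a real object store; the `Bucket` class and its method names are hypothetical and do not reflect any vendor's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    """Hypothetical in-memory bucket: a flat mapping of key -> bytes."""
    objects: dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        return self.objects[key]

    def list_objects(self, prefix: str = "", delimiter: str = "") -> tuple[list[str], list[str]]:
        """Mimic S3-style listing and return (keys, common_prefixes).

        Keys that still contain the delimiter after the prefix are rolled up
        into a single "common prefix", which is how clients render folders.
        """
        keys: list[str] = []
        prefixes: set[str] = set()
        for key in sorted(self.objects):
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            if delimiter and delimiter in rest:
                # Roll up everything up to and including the first delimiter.
                prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
            else:
                keys.append(key)
        return keys, sorted(prefixes)
```

For example, after storing `photos/2015/a.jpg`, `photos/readme.txt` and `index.html`, listing with `prefix="photos/"` and `delimiter="/"` returns the single key `photos/readme.txt` plus the common prefix `photos/2015/`. This is why a NAS-style gateway or a sync client can present a directory tree on top of a store that has no directories at all.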

Object storage systems commonly share some basic characteristics like multi-tenancy, security, geo-distribution, automatic data replication, policy-based data protection, very high resiliency and reliability, as well as high availability. Performance is not usually at the top of the list but, depending on the use case or the specific implementation, it can be on it as well.
These infrastructures are built on top of commodity hardware, and the architecture is usually distributed.
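Many distributed object stores decide where the copies of an object live through some form of deterministic hashing, so that any node can compute an object's location without a central lookup table. The exact scheme varies by product; the sketch below uses rendezvous (highest-random-weight) hashing as one well-known example, with hypothetical node names and a default of 3 replicas chosen only for illustration.

```python
import hashlib


def _score(node: str, key: str) -> int:
    # Stable pseudo-random weight for a (node, key) pair.
    digest = hashlib.sha256(f"{node}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(key: str, nodes: list[str], replicas: int = 3) -> list[str]:
    """Pick `replicas` distinct nodes for `key` using rendezvous hashing."""
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")
    # Every node is ranked by its score for this key; the top N hold the copies.
    ranked = sorted(nodes, key=lambda n: _score(n, key), reverse=True)
    return ranked[:replicas]
```

The useful property for commodity scale-out clusters is that removing a node which does not hold a given object leaves that object's placement unchanged, and adding a node only moves the objects for which the new node ranks in the top N.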

All these characteristics combined considerably drive down both TCO and TCA, to the order of cents per GB. And this is another reason why you could be interested in it!
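Part of how the cost lands in the cents-per-GB range is the data protection overhead, which multiplies raw capacity and which many object stores let you choose per policy (n-way replication or erasure coding). As a rough worked sketch, assuming a hypothetical $0.03 per raw GB and a 10+4 erasure code (neither figure comes from any vendor), the helpers below compare raw-to-usable overhead:

```python
def replication_overhead(copies: int) -> float:
    """Raw bytes stored per usable byte with n full copies."""
    return float(copies)


def erasure_overhead(data: int, parity: int) -> float:
    """Raw bytes stored per usable byte with a k+m erasure code."""
    return (data + parity) / data


def cost_per_usable_gb(raw_cost_per_gb: float, overhead: float) -> float:
    """Scale a raw $/GB figure by the protection overhead."""
    return raw_cost_per_gb * overhead


if __name__ == "__main__":
    raw = 0.03  # hypothetical $/raw GB for commodity hardware
    for name, ovh in [("3x replication", replication_overhead(3)),
                      ("10+4 erasure code", erasure_overhead(10, 4))]:
        print(f"{name}: {ovh:.2f}x raw, ${cost_per_usable_gb(raw, ovh):.3f}/usable GB")
```

With these inputs, 3x replication stores 3.00 raw bytes per usable byte and tolerates the loss of 2 copies, while a 10+4 code (assuming an MDS code such as Reed-Solomon) stores 1.40 and tolerates the loss of any 4 fragments. Real TCO also includes power, space, networking and operations, which this sketch ignores.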

When and Where to Use Object Storage

The use cases where object storage can be a good fit are many, especially if your organization is developing new applications capable of leveraging it. In this post, though, I want to focus just on infrastructure and mostly on off-the-shelf solutions. In fact, many end users start adopting object storage through traditional protocols or applications and then add more over time.

If a similar strategy is not adopted (starting out small and growing over time, along with the number of applications and the capacity), object storage will remain a small, isolated storage island and won’t be worth the initial effort in the long term. In that case it could become more of a problem than a solution.

Back to possible adoption scenarios: NAS comes before anything else. It might sound weird, but this is actually what many end users ask for. Every form of NAS! Traditional NAS, distributed NAS, scale-out NAS. And I’m not talking just about capacity: by decoupling capacity (the object store) from the front-end (an external appliance with caching and efficiency features), it is possible to serve any kind of high-performance workload without worrying about the traditional issues of classic NAS, like backup, DR, capacity management and so on. (A vivid example comes from Avere Systems, which can serve even HPC workloads with just a handful of its appliances in front of any object storage system, even a remotely deployed, high-latency one!)
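The decoupling described above works because the front-end keeps the hot working set close to clients and only goes to the object store, which may be remote and high-latency, on a cache miss. A minimal read-through LRU cache illustrates the idea; `fetch` is a hypothetical stand-in for an object-store GET, and this is a generic sketch, not how Avere or any other product actually implements its caching.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Minimal LRU read-through cache in front of a slower backend."""

    def __init__(self, fetch: Callable[[str], bytes], capacity: int) -> None:
        self._fetch = fetch          # e.g. an object-store GET (hypothetical)
        self._capacity = capacity    # max number of cached objects
        self._items: OrderedDict[str, bytes] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._items:
            self._items.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        data = self._fetch(key)               # slow path: go to the backend
        self._items[key] = data
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)   # evict the least recently used
        return data
```

A real NAS front-end also has to handle writes, coherency and file system metadata, which this sketch deliberately leaves out; it only shows why repeated reads stop paying the back-end latency.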

Sync & share. S&S is one of the most common applications with an object store at the back-end (Dropbox and all the others are based on object storage, for example). This solution has many benefits, especially when the organization has a high number of remote offices and mobile workers. In this particular case, it’s not about the quantity of data; it is much more about keeping control over your data while giving end users the best in terms of data mobility (aka a good user experience in a walled garden!)
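A sync & share client has to decide cheaply which files actually changed before moving data to the object store. One common approach is to compare content hashes: for a plain single-part upload, AWS S3 typically reports the hex MD5 of the content as the object's ETag, but that does not hold for multipart or some encrypted uploads, and S3-compatible stores are not guaranteed to behave the same way, so treat the equivalence as an assumption. In the sketch below, `remote_etags` is a hypothetical dict standing in for a bucket listing, and the plan describes a simple one-way mirror.

```python
import hashlib


def etag_of(data: bytes) -> str:
    """Hex MD5 of the content (what S3 typically reports for single-part uploads)."""
    return hashlib.md5(data).hexdigest()


def plan_sync(local: dict[str, bytes], remote_etags: dict[str, str]) -> dict[str, list[str]]:
    """Decide which keys to upload, skip or delete to mirror `local` remotely."""
    plan: dict[str, list[str]] = {"upload": [], "skip": [], "delete": []}
    for key, data in sorted(local.items()):
        if remote_etags.get(key) == etag_of(data):
            plan["skip"].append(key)      # same content already stored
        else:
            plan["upload"].append(key)    # new or changed file
    for key in sorted(remote_etags):
        if key not in local:
            plan["delete"].append(key)    # removed locally
    return plan
```

Only changed files cross the WAN, which matters most for the remote offices and mobile workers mentioned above.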

All the rest of your data (including trash). In this category you can find many different applications, ranging from active archiving to backup in various forms. In fact, a growing number of primary storage vendors support the S3 API to make clones or copies of data volumes, and more and more backup vendors now support object storage as a target. In the same category, even though the application is totally different, some analytics applications are starting to take advantage of these repositories to store data or, in the most advanced implementations, to run in-place data analytics too.
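Backup images and archives are often very large, so backup software commonly writes them to S3 targets with multipart upload. AWS documents a 5 MiB minimum part size (except for the last part), a 5 GiB maximum part size, at most 10,000 parts per upload and a 5 TiB maximum object size; other S3-compatible stores may use different limits, so the constants below are assumptions to check against your target. The helper is a hypothetical sketch that picks the smallest MiB-aligned part size that fits.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

# AWS S3 limits; other S3-compatible stores may differ.
MIN_PART = 5 * MIB      # minimum size of every part except the last
MAX_PART = 5 * GIB      # maximum size of a single part
MAX_PARTS = 10_000      # maximum number of parts per upload
MAX_OBJECT = 5 * TIB    # maximum object size


def _ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def choose_part_size(object_size: int, align: int = MIB) -> int:
    """Smallest part size (a multiple of `align`) that stays within MAX_PARTS."""
    if not 0 <= object_size <= MAX_OBJECT:
        raise ValueError("object size out of range for a single S3 object")
    needed = _ceil_div(object_size, MAX_PARTS)
    size = max(MIN_PART, _ceil_div(needed, align) * align)
    if size > MAX_PART:  # only reachable if the constants above are changed
        raise ValueError("required part size exceeds MAX_PART")
    return size


def part_ranges(object_size: int, part_size: int) -> list[tuple[int, int]]:
    """(offset, length) of each part; the last part may be shorter."""
    return [(off, min(part_size, object_size - off))
            for off in range(0, object_size, part_size)]
```

For a 100 GiB backup image this chooses 11 MiB parts, giving 9,310 parts, comfortably under the 10,000-part cap; each range can then be uploaded in parallel and retried independently.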

BUT

BUT… isn’t it too late for object storage?
If you take a look at the market, it’s easy to see that something is happening. A number of new startups are building next-generation storage systems that can be considered an evolution of traditional object storage, especially for enterprise organizations… Look at Hedvig, for example.

They have most of the characteristics discussed earlier in this post, but they provide block, file and object interfaces as well as integrated, advanced data services in the same product. Sometimes, as in the case of Cohesity, these data services are particularly advanced and are intended to replace traditional backup products with equivalent embedded functionality.
We are talking about products that include some of the services discussed above and are ready to use without the integration work you usually need with traditional object storage products.

In some cases, when object storage is part of a larger ecosystem (as with HDS or DDN, for example), we already have this situation, but it is not really common at the moment.

Closing the circle

A horizontal platform serving many (secondary) storage needs is what I have been talking about for years now, and I’m glad it is finally happening. API-based object storage was the first building block, but now it seems it is going to be integrated into more sophisticated products capable of doing what was previously possible only with external appliances or software. And this means less complexity and greater ease of use which, again, will drive more adoption in smaller environments (not only hyper-scale!).

This new scenario is not ideal for many of the “traditional” object storage startups around: some of them are very specialized, their file/block protocols are not as good as their core object layer, and their data services are non-existent!
In the past year we have already seen some of them acquired, but for the others, finding an exit strategy will become harder and harder. All these new startups coming up with more sophisticated products (and vendors like Nutanix with scale-out storage) will give object storage vendors a hard time. The only way for them to keep up is to build better file (and block?) protocol interfaces and innovative data services as well, and the question is: will they be able to do that before it’s too late?

If you want to know more about this topic, I’ll be presenting at the next TECH.unplugged event in Amsterdam on 24/9: a one-day event focused on cloud computing and IT infrastructure, with an innovative formula that combines a group of independent, insightful and well-recognized bloggers with disruptive technology vendors and end users who manage rich technology environments. Join us!

3 Comments

Tim Wessels on 21/07/2015 at 4:03 pm

Well, considering that Object-Based Storage (OBS) is just gaining some traction in the SMB and enterprise market, it seems a bit premature to pronounce the incumbent OBS vendors lacking in vision. Newly funded entries like Hedvig and Cohesity, despite the DNA of their founders, have yet to prove themselves in the market. So, let’s not declare them the “winners” and all of the other OBS vendors the “losers” in the marketplace for object-based, capacity storage.

Things that matter in choosing an OBS vendor include the completeness of their S3-compatible API. Any OBS vendor can claim S3-compatibility if they implement only basic S3 commands. Some go beyond the basics and implement the intermediate S3 commands and just one, Cloudian, actually implements the advanced S3 commands, which means you can use any hardware-based or software-based AWS S3 solution and/or develop your own applications using the AWS S3 SDK.

Another distinction worth mentioning is whether or not the OBS vendor can tier data from their cluster to AWS S3 or Glacier. Note that the APIs for S3 and Glacier are not the same, and only one OBS vendor, Cloudian, can actually do this.

One last important criterion is support for both object replication and erasure codes for data protection. Almost every OBS vendor does replication and erasure codes, but several of the more specialized OBS vendors, like Amplidata (HGST) and Cleversafe, only support erasure codes.

OBS is capacity storage that should be easy to manage and use. The value added to OBS comes from the solutions that make it useful, like backup, archiving, file sync-and-share, unstructured data storage from humans and machines, and big data. The largest “ecosystem” of third-party value-added solutions is based on AWS S3 compatibility, not Swift or CDMI. Not all OBS vendors are created equal.
- Enrico Signoretti on 21/07/2015 at 5:31 pm
  
  Tim,
  thank you for commenting.
  I agree with most of what you say about OBS and S3, and that architecture design/implementation counts. But I never talked about winners or losers; I tried to say that many of the competitors face new challenges now that these new products are coming around.
  - Tim Wessels on 21/07/2015 at 6:14 pm
    
    Point taken, Enrico… I think one of the challenges will be for OBS vendors to natively support legacy storage protocols while also fully supporting the popular cloud storage APIs like S3, Swift and perhaps CDMI, and to do it all without external gateways sitting outside the OBS cluster. Today and for the foreseeable future, the cloud storage world revolves around AWS S3, and that’s where an OBS vendor needs to be good right now.