This is the second article of a short series about Object Storage (here is the first one), with the intent of giving a brief introduction to each Object Storage player. As I mentioned earlier, this is not a comparison between products but rather an attempt to give the reader an idea of the number of solutions available and how they are positioned.
In the first article I talked about Amplidata, Caringo, Cleversafe and Cloudian. Today I’m going to write about Data Direct Networks WOS, EMC Atmos, Hitachi Data Systems HCP, Inktank Ceph, NetApp StorageGRID, OpenStack Swift and SwiftStack, and Scality RING.
Data Direct Networks WOS
I was briefed on WOS (Web Object Scaler) a few months ago and it strongly reflects its company’s DNA. In fact, DDN was born in 1998 and is very focused on high-performance storage solutions for applications like HPC, Oil & Gas, media & entertainment, genome sequencing and so on.
The WOS architecture is very flexible and allows the cluster to be configured in different ways with regard to data protection and replication. At the same time, contrary to many other object storage systems, data is written directly to disk, without the limits and constraints introduced by filesystems. WOS is usually sold as a hardware solution based on very high-density 4U appliances connected via 10Gb Ethernet or InfiniBand. WOS provides access via APIs as well as file gateways, alongside integration with other DDN products. On the flip side, the partner ecosystem is not one of the most crowded.
This solution is geared toward the very high end: object storage infrastructures that require scalability and very high performance.
EMC Atmos
EMC, the worldwide leader in the storage industry, practically started the Object Storage market many years ago with Centera: despite all its limitations, that product was adopted by many ISVs. Atmos, the current product, is much more open and has a modern design.
EMC has built an interesting ecosystem of partners and solutions around Atmos that makes it very suitable for most enterprise needs.
Atmos can be viewed as a very general-purpose Object Storage system that doesn’t deliver the best performance or scalability but, thanks to a certain level of flexibility, can be easily adopted and managed by medium to large-sized companies. In fact, the architecture allows deploying different kinds of configurations and has features like erasure coding; on the other hand, it is still a long way from the scalability and performance of high-end solutions.
It can be ordered as a hardware product or as a VSA; the latter is the best option for starting with limited quantities of data and concurrent connections.
Hitachi Data Systems HCP
HCP (Hitachi Content Platform) comes from the acquisition of Archivas in 2007. Fortunately, HDS has left the development team almost intact and HCP is now an important part of HDS’ strategy for content and file solutions.
The product shows a conservative approach in many respects, but it also has some interesting functionality well suited for enterprise usage (single file instancing is one of the features I like best). HDS has been working hard to extend the solution portfolio and the partner ecosystem. At the end of the day, this is the only product on the market that can offer a complete set of end-to-end solutions, from the object store to the gateways, all directly supported by a primary storage vendor. The management tools, as usually happens with HDS, should be revised and simplified.
HCP is available in three different configurations: VSA, physical appliances with local storage, and physical appliances on top of HDS block storage. HDS also offers a sort of “HCP as a service” in some countries (directly or through certified partners), giving end users freedom of choice in how to implement their object storage infrastructure (private or hybrid).
Inktank Ceph
Ceph is an open source distributed Object Store and File System available on Linux platforms, and Inktank is the company that provides commercial services and support.
The Ceph architecture is brilliantly designed and has an interesting roadmap (although I should check the latest updates). The implementation is still immature from an “enterprise” point of view, but it is rapidly improving.
Theoretically speaking, this is something very far from ordinary enterprise environments, but in the last year I have spotted Ceph clusters in a growing number of data centers for basically three reasons: it’s cheap, it’s resilient and it scales (no small thing, indeed!). Almost 100% of the installations are small clusters (fewer than 10 nodes, hundreds of TBs) used to store the kind of secondary data that doesn’t have enough value to be kept on primary storage systems but that no one wants to (or can) delete: logs, sensor readings, secondary backups, and so on. Sometimes the cluster was moved from development/test to production just because it works! At the same time, these clusters remain available to internal developers, and some of them are figuring out how to use them more intelligently.
Ceph is compatible with OpenStack (Swift+Cinder) and Amazon S3, but you then need third-party solutions to complete the infrastructure.
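In practice, “S3-compatible” means the store speaks the same HTTP API and request-signing scheme as Amazon S3, so existing S3 client code can be pointed at a Ceph cluster just by changing the endpoint. As a rough illustration (not Ceph-specific code; the key and resource are placeholders), this is the AWS Signature Version 2 computation that S3-style requests carry in their `Authorization` header:

```python
import base64
import hashlib
import hmac


def sign_s3_request_v2(secret_key: str, method: str, resource: str,
                       date: str, content_type: str = "",
                       content_md5: str = "") -> str:
    """Compute an AWS Signature Version 2 for an S3-style request.

    The signature is a base64-encoded HMAC-SHA1 over a canonical
    "string to sign" built from the request method, headers and
    resource path. S3-compatible stores accept the same scheme.
    """
    string_to_sign = "\n".join(
        [method, content_md5, content_type, date, resource])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


# Example: sign a GET for an object. The client would then send
# "Authorization: AWS <access_key>:<signature>" with the request.
sig = sign_s3_request_v2("my-secret", "GET", "/bucket/object",
                         "Tue, 27 Mar 2007 19:36:42 +0000")
```

Because the scheme is deterministic, any client library that implements it (boto and friends) works unchanged against a compatible store.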
NetApp StorageGRID
StorageGRID comes from the acquisition of Bycast in 2010. After the acquisition, probably due to the lack of a precise strategy, the product almost disappeared for a couple of years, until a new version (9.0) suddenly came out in 2012 with S3/CDMI support and a few new features.
StorageGRID is a software product, but NetApp can offer a complete solution bundled with servers and its storage systems. The partner ecosystem is limited (most of the partners were present before the acquisition) and there are no end-to-end out-of-the-box solutions like the ones you can find from other major vendors. StorageGRID also offers basic NFS/SMB services to ingest files, but this feature is not intended to replace traditional NAS filers.
NetApp doesn’t seem to be pushing it much, which is the biggest problem with this product and, consequently, the biggest risk for end users.
OpenStack Swift (SwiftStack)
Swift is an open source project that is part of the OpenStack cloud management platform. SwiftStack was founded in 2011 by some of the Swift project leaders to develop, deliver and support a commercial version of Swift.
The architecture design is similar to what I described previously: front-end nodes that manage access and I/O operations, coupled with a back-end of high-capacity nodes that provide space. The product is also receiving continuous updates and feature improvements.
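Under the hood, Swift decides which back-end nodes hold an object through its “ring”: the object path is hashed, the top bits of the hash select a partition, and the ring maps each partition to a set of devices. The following is only a toy sketch of that idea (the real ring also balances by device weight and spreads replicas across zones; the partition power, device names and placement rule here are made up):

```python
import hashlib

PART_POWER = 8   # 2**8 = 256 partitions (real clusters use far more)
REPLICAS = 3     # Swift's default replica count


def partition_for(account: str, container: str, obj: str) -> int:
    """Map an object path to a partition using the top bits of an
    MD5 hash of the path -- the same basic idea Swift's ring uses."""
    path = "/{}/{}/{}".format(account, container, obj).encode("utf-8")
    top32 = int.from_bytes(hashlib.md5(path).digest()[:4], "big")
    return top32 >> (32 - PART_POWER)


def devices_for(partition: int, devices: list) -> list:
    """Toy replica placement: pick REPLICAS distinct devices for a
    partition. (Illustrative only; not Swift's actual assignment.)"""
    n = len(devices)
    return [devices[(partition + i) % n] for i in range(REPLICAS)]


devs = ["node{}/disk0".format(i) for i in range(5)]
part = partition_for("AUTH_demo", "photos", "cat.jpg")
print(part, devices_for(part, devs))
```

The point of the scheme is that any proxy node can compute an object’s location independently, with no central metadata lookup on the data path.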
Maturity issues aside, SwiftStack is gaining a lot of attention (also thanks to OpenStack) and its partner ecosystem is quickly growing.
Scality RING
Scality is a startup, founded in 2009, that is developing a next generation software-only object storage solution called RING.
The modern architecture of the product allows end users to configure the cluster with different data protection mechanisms (multiple copies or erasure coding) as well as different data replication policies and geo distribution.
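The choice between multiple copies and erasure coding mentioned above is essentially a trade-off between raw-capacity overhead and rebuild behavior. A back-of-the-envelope sketch (generic math, not Scality’s actual parameters):

```python
def replication_overhead(copies: int) -> float:
    """Raw-to-usable capacity ratio for n-way replication."""
    return float(copies)


def erasure_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Raw-to-usable capacity ratio for a k+m erasure code: each object
    is split into k data fragments plus m parity fragments, and any k
    surviving fragments suffice to rebuild it."""
    return (data_fragments + parity_fragments) / data_fragments


# 3 copies: 3x raw capacity, survives the loss of any 2 copies.
print(replication_overhead(3))
# A 9+3 erasure code: ~1.33x raw capacity, survives the loss of any
# 3 fragments -- far cheaper protection at petabyte scale.
print(erasure_overhead(9, 3))
```

This is why large installations lean on erasure coding for cold data while replication remains attractive for small, hot objects, where the fragment-reassembly cost matters more than capacity.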
Scality has already demonstrated very high-speed configurations (based on SSDs) and policy-based automatic tiering capabilities. Together, these two allow the customer to have a single solution capable of serving both frequently accessed data and long-term archives. The architecture is very flexible and scalable, targeting large multi-petabyte installations.
Scality RING can be accessed via proprietary and standard APIs (CDMI and S3, for example) but also through many connectors and gateways: a scale-out FS, NAS protocols, OpenStack Cinder for block storage, HDFS; it can even be used directly as mail server storage.
Scality has a very clear strategy and a growing list of partners who are building a good ecosystem of hardware and software solutions for large ISPs and Enterprises.
Why it matters
Object storage brings several benefits to the enterprise and can easily help with TCO when the numbers get big. The bigger the amount of unstructured data you have to manage, the more Object Storage makes sense.
The foremost issue is dealing with data growth, but storing data without having a clue about what’s in it makes the data management problem even bigger. Object Storage takes a totally different approach and addresses the problem by adding value and new opportunities:
TCO: A well-designed Object Storage platform and its gateways could be much more cost effective when compared to a traditional storage system.
Private cloud enabler: Centralization, consolidation, self/auto provisioning, scaling, abstraction, access methods (and so on) are all at the base of a private cloud infrastructure.
Dark data and Big Data: As a consequence of the first two points, the enterprise object store becomes a huge repository. It’s only a matter of finding the tools (and the ideas!) that suit you best to bring out that value!
As mentioned in the previous article, I’ll be attending Next Generation Object Storage Summit next March in Los Angeles. Please, don’t hesitate to drop me an email or a DM if you are going to be there and want to chat about objects!
And, as always, every comment is warmly welcomed!
The maturity and breadth of Hitachi’s HCP along with the recent Verizon deal have made it the solution we are most seriously considering.
I am interested in how you came to the conclusion that Hitachi’s HCP is the solution. How did you compare HCP to the Scality RING, for instance?
RING looks like it is going to be a viable platform someday. Perhaps even someday soon, but certainly not today.
Undertaking a transition to object storage will generally be a high-profile, high-risk IT endeavor. Once RING has years of proof in the field delivering not only enterprise-class reliability but also enterprise-class global support, THEN they will be worthy of consideration against products with such track records today.
It took over 7 years of vault-solid real-world reliability in production environments for the best architecture in enterprise-class block storage to finally start getting serious consideration for tier 1 jobs. If Scality can provide a similar track record in the future, they will deserve the same success in object storage.
Thank you for your explanation. However, looking at their very flexible erasure coding technology while HCP still uses RAID 6 (if I am not mistaken), I believe that enterprise-class reliability should not be a concern. RAID 6 was not designed for petabyte scale, since it has a significant vulnerability to data loss due to its lengthy rebuild process. As far as I know, Scality customers have never experienced data loss or downtime, even when their underlying storage servers were totally replaced (non-disruptive hardware and software updates).
Regarding the performance of tier one storage, tri-replication combined with SSDs provides high performance and scalability without the drawback of RAID 6’s lengthy rebuild process. GigaOM highlighted Scality object storage’s performance, scalability and cost benefits (much less maintenance).
Regarding enterprise-class global support, their technology requires much less support compared to the more traditional HCP approach. Moreover, IDC placed Scality in the Object Storage leader group, and customer service was one of the selection criteria.
Regarding success in object storage, Scality just grew 500% in 2013. SGI and Penguin Computing are bundling Scality technologies with their storage servers. Hundreds of millions of users are using their technologies now.
All in all, I believe that Scality also deserves serious consideration when looking at object storage, in addition to the more traditional object storage players. Serious technical and financial appraisal should be undertaken to help make the right decision. There is no one-size-fits-all for object storage.
As I stated RING looks good; really good even. Also as I stated once they prove global field support is enterprise class vault solid that will get them in a LOT more doors. Nobody proves that overnight, but they are on the path certainly.
HCP is not necessarily RAID6 or any other implementation. It consists of dual replication at a minimum and can be deployed with up to 4 separate physical copies all on completely different underlying storage one of which can even be tape. As much or little of that can also be flash from whichever vendor you prefer.
None of this is a knock on Scality; they just need a few more miles behind them to start landing the big tier 1 accounts is all.
Also, I do not work for nor am I affiliated with HDS in any way. If such is not the case for you with Scality you should state such.
HCP may replicate two copies, but it still uses RAID as its recommended underlying data protection. Here’s a snippet from the Hitachi HCP whitepaper published in May 2013.
“Typically, the external SAN-attached storage uses RAID-6. Best protection and high availability of an HCP 500 system is achieved by giving each node its own RAID group or Hitachi Dynamic Provisioning (HDP) pool containing 1 RAID group.”
If you use their node-based solution, they use RAID-5, which is also a poor solution for obvious reasons. I mention this because most object-based storage solutions are designed to scale to hundreds of petabytes.
Whitepaper link: http://www.hds.com/assets/pdf/hitachi-white-paper-introduction-to-object-storage-and-hcp.pdf