Over the last couple of months I have done some work for Red Hat on Ceph Storage. It was a good opportunity to get up to date on Ceph and do a reality check on how end users are actually deploying it, and why they are adopting it. Some of these stories are really compelling and say a lot about what is happening to storage infrastructures in general… but that is a story for another post.
Ceph is now highly successful in many fields, and the number of use cases has been growing rapidly… and with the introduction of Red Hat Ceph Storage 2.0 the potential is even greater.
One of the fastest-growing applications for Ceph is object storage, and this is where most of my work has been focused. By analyzing three different workloads and related data, I wrote three short papers (and one that summarizes them all) about the potential Red Hat Ceph Storage has in these areas. I hope you enjoy the read and find them useful.
Most of this work is an evolution of what I have been working on over the last two years, with the goal of exploring all the possibilities offered by technologies like object storage.
Object storage and Big Data
Organizations of all sizes are experiencing a substantial diversification in the workloads served by their storage infrastructures. Consequently, requirements are no longer just about performance or capacity, but about how storage resources are being consumed. Mobile devices, Big Data applications, and IoT are the most visible examples; they are drastically changing access patterns to large-scale storage systems.
This is why Big Data analytics and, more precisely, in-memory applications are posing another big challenge to traditional storage infrastructures where performance, throughput, and scalability are involved. It is a tough problem to solve, especially since HDFS, a file system specifically designed for massive scale, is not suitable for all kinds of workloads and file sizes (small files in particular are a well-known weak spot).
Radical new infrastructure designs are surfacing and becoming more common, with memory-based ephemeral storage or sophisticated caching mechanisms used as the first tier, coupled with massive capacity-driven data stores at the back end. In this scenario, object storage (which in the past was considered suitable only for cheap, durable, cold data archives) is in the right position to become the perfect foundation for this new class of storage. Today, throughput at scale is no longer an issue – S3/Swift APIs are supported by an increasing number of software vendors, and HDFS/NFS/SMB gateways make it very easy to ingest data while remaining compatible with legacy environments.
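Because Ceph's RADOS Gateway speaks the same S3 protocol as the public cloud, clients authenticate against it with the standard AWS Signature Version 4 scheme. As a small illustration of why that compatibility matters, here is a minimal sketch of the SigV4 signing-key derivation step in pure Python – the same chain of HMACs that any S3 client library performs before it can talk to an S3-compatible endpoint. (The function and variable names are my own; this is an illustrative sketch, not a complete request signer.)

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    """HMAC-SHA256 of msg, keyed with key."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str,
                       region: str, service: str) -> bytes:
    """Derive the AWS Signature V4 signing key.

    The secret key is never sent on the wire; instead it seeds a chain
    of HMACs scoped to a date, region, and service. S3-compatible
    gateways (such as Ceph RGW) accept requests signed this way.
    """
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


if __name__ == "__main__":
    # Hypothetical credentials and scope, for illustration only.
    key = derive_signing_key("my-secret-key", "20161006", "us-east-1", "s3")
    print(key.hex())
```

In practice an S3 client library handles this for you; the point is that the exact same client code and credentials work whether the endpoint is Amazon S3 or a Ceph RGW in your own data center.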
For object storage systems, the process of streaming data to the compute cluster is seamless, while storing original data sets or new results is less costly and more reliable than on any other kind of storage… and even more so now with the flexibility provided by cloud tiering – a mechanism implemented by most modern platforms.
A free download of the full paper is available here. Enjoy the read!
Closing the circle
As mentioned earlier, lately I’ve been talking to end users about their experiences with Ceph. It’s interesting to note that, more than once, Ceph storage clusters started out on decommissioned servers and grew in capacity and importance over time… this is really cool and I’m very curious to learn more (you’ll see me talking a lot about “transparent storage” in the coming months 😉 ). So, take this as an invitation, and please leave a comment (or send me a message on a social network – LinkedIn, Twitter) with your story: why and how you started, and whether you like it. Thanks!
If you are interested in these topics, I’ll be presenting at the next TECHunplugged conferences, in Amsterdam on 6/10/16 and in Chicago on 27/10/16. It’s a one-day event focused on cloud computing and IT infrastructure, with an innovative formula that combines a group of independent, insightful, and well-recognized bloggers with disruptive technology vendors and end users who manage rich technology environments. Join us!