Big data lakes? Too many ponds, that’s the problem

In recent chats with end users and vendors, I found a common pattern that made me think about Big Data analytics and how data is collected, organized and analyzed in many organizations. This is also, I think, an explanation for the slow growth of some Big Data companies and the slower-than-expected ROI of some Big Data investments. I eventually discovered some interesting things that I’d like to share with you in this post.

The rise of data ponds (and broom closet IT)

Instead of building a single huge data lake, many organizations, or departments/organizational units, prefer to...

Juku.beats: Cassandra, NoSQL and Datastax Enterprise 3.0 (w/ Jonathan Ellis, CTO of Datastax)

In this episode I’m with Jonathan Ellis (co-founder and CTO of Datastax) and we talk about:
– Datastax
– NoSQL and Cassandra
– What’s new in Datastax Enterprise v.3.0

Here is the full transcript of the episode:

Enrico: Hi everyone and welcome to a new episode of Juku.beats. Today, I am here with Jonathan Ellis, co-founder and CTO of Datastax. Hi Jonathan, how are you?
Jonathan: I’m doing well. Thanks for having me on the program.
Enrico: Thank you for being here with me. Jonathan, the first question is about you and DataStax. I would like you to introduce...

Super Mega Hyperconvergence

I love the concept of Hyperconvergence. Who doesn’t? An IT infrastructure built out of relatively balanced (and small) nodes that together contribute to a large pool of computing and storage resources, which can scale linearly just by adding more nodes. This kind of infrastructure, thanks to the latest advancements in software, has become very easy to manage and could be the answer to many different types of workloads… but, as I’ve already written in the past, not all of them. Sometimes, due to particular compute or storage needs, it just doesn’t work out! Also market leaders, like...

Flash, Trash and data-driven infrastructures!

I’ve been talking about two-tier storage infrastructures for a while now. End users are targeting this kind of approach to cope with capacity growth and performance needs. The basic idea is to leverage Flash memory characteristics (All-flash, Hybrid, hyperconvergence) on one side and implement huge storage repositories, where they can safely store all the rest (including pure Trash) at the lowest possible cost, on the other. The latter is lately also referred to as a data lake. We are finally getting there but there is something more to consider. And it’s about the characteristics of these storage systems. In...


The views expressed on these pages are mine alone and not (necessarily) those of any current, future or former client or employer. As I reserve the right to review my position based on future evidence, they may not even reflect my own views by the time you read them. Protip: If in doubt, ask.