A year ago (maybe more) I was in a meeting with a storage startup. We were discussing their product architecture when I expressed my concern that the product, designed to manage multi-Petabyte infrastructures, would not suit medium-sized environments. The answer I got, “who doesn’t have a Petabyte today?!?”, left me somewhat puzzled, although I made light of it. 😉
I know, if you live in Silicon Valley Petabytes are peanuts, but for the rest of the world a Petabyte sounds like a lot!

And I was wrong

Just a few days ago I visited a client of mine and we tried to count all the storage he has under management. We found more than 500TB of managed storage, and this number takes into account only the primary (local) sites.
We counted almost everything: primary and secondary storage, backup repositories, the DR site and so on, but we left out tapes and the remote offices in foreign countries (there aren't many, but each one has a small infrastructure of its own). This end user doesn’t have an MDM platform at the moment, so mobile devices (and notebooks) are out too (not a huge amount of data today, but you know…)

1TB per employee and counting

This organization, in the manufacturing sector, has more than 500 employees. That works out to roughly 1TB per employee under management, with much more to come! (I was there to discuss a new storage project.)
This enterprise has experienced most of its data growth in the last three years and it is not seeing a slowdown at all. On the contrary, thanks to the new generation of connected machines they are producing right now, data collection issues and future Big Data projects are just around the corner.

There’s more! A shrinking budget (the Italian economic recession doesn’t help) and a new market landscape require performance, agility (or, at least, faster provisioning) and cheaper solutions.

About the medium enterprise

A medium enterprise is “a company with less than 250 employees”. Taking 1TB/employee as today's baseline, and given the exponential data growth we are experiencing, it’s not difficult to predict a rapid increase of the TB/employee ratio in the next few years. It may take four or five years, but 4TB/employee is just around the corner. Even with a conservative increase of 30% year over year, you get close to 4TB/employee by 2019.

And there you go! 4TB/employee means 1PB for a company of 250 people.
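To make the arithmetic easy to check, here is a minimal Python sketch of that projection. It assumes a starting point of 1TB/employee, a constant 30% yearly growth and 250 employees, as in the text; the function names and the decimal conversion (1PB = 1000TB) are my own choices for illustration, not part of any study.

```python
import math


def tb_per_employee(start_tb: float, annual_growth: float, years: int) -> float:
    """Compound a starting TB/employee figure over a number of years."""
    return start_tb * (1 + annual_growth) ** years


def years_to_reach(start_tb: float, target_tb: float, annual_growth: float) -> float:
    """Years of compound growth needed to go from start_tb to target_tb."""
    return math.log(target_tb / start_tb) / math.log(1 + annual_growth)


# Assumptions taken from the text above.
START_TB = 1.0    # TB per employee today
GROWTH = 0.30     # 30% year over year
EMPLOYEES = 250   # upper bound for a medium enterprise

for year in range(0, 7):
    per_employee = tb_per_employee(START_TB, GROWTH, year)
    total_pb = per_employee * EMPLOYEES / 1000  # decimal units: 1 PB = 1000 TB
    print(f"year {year}: {per_employee:.2f} TB/employee, {total_pb:.2f} PB total")

print(f"years to reach 4 TB/employee: {years_to_reach(START_TB, 4.0, GROWTH):.1f}")
```

With these numbers the ratio is about 3.7TB/employee after five years and passes 4TB during the sixth, so the Petabyte shows up a little over five years out. A faster growth rate only brings that date closer.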

About the Petabyte

As I mentioned above, that Petabyte is not all for production purposes; it’s the grand total of “somehow managed storage”. It’s made up of a small amount of production data (probably around 10%, which includes primary and secondary storage) and repositories of every kind, like backups, DR replicas, long-term archives, and so on. Most of that data is inactive; some of it has value, some doesn't.

Why it is important

This article is not the result of a study or of scientific research, but I’m pretty sure that the organization mentioned here is not an isolated case.
From my point of view one PB is still a lot, but you know, 1TB of data was a lot just a few years ago…

Once again, I think it’s time to rethink how to manage storage, even for smaller enterprises. Traditional approaches are no longer viable, at least not from the point of view of cost. On the flip side I’m not sure that end users (and vendors) are ready for this change…

Last but not least, the value of that PB.
It’s fairly simple to calculate the cost of that PB but I can’t say the same about its value.
Storing data like a packrat without extracting any value from it does not make any sense to me (Gartner calls it Dark Data). That’s it for now, but I have an idea about that, and I’m going to share it with you soon (in the meantime take a look at this). Stay tuned!