What are you looking for in a new storage system? Performance, capacity, efficiency, data services? Well, from my point of view you should be looking for analytics first!
The right analytics tool can easily save you money and time, and make your job easier.
Jack of all trades, master of the infrastructure
It’s not only about storage. No matter what your job is, when it comes to IT infrastructure nowadays you’ll likely need some sort of analytics to help you manage more, and manage better. There are many reasons why:
– Most SysAdmins are Jacks of all trades now, and they need to have all the components of their infrastructure under control without spending too much time on it.
– Business requires IT to be proactive and give more answers, faster.
– Doing any form of capacity planning is becoming really complicated: measuring trends, calculating capacity growth and sizing new applications correctly. The old Excel file we used for years has to take into account too many variables to be realistic.
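To make the capacity-planning point concrete, here is a minimal sketch of the kind of trend calculation that used to live in that Excel file: fit a straight line to monthly used-capacity samples and project when a pool runs out. All numbers and names are invented for illustration.

```python
# Hypothetical illustration: project when a storage pool fills up,
# given monthly used-capacity samples (TB). Names and numbers are invented.

def project_full_month(used_tb, pool_size_tb):
    """Fit a least-squares line to the samples and return the month index
    at which usage is projected to hit pool_size_tb (None if not growing)."""
    n = len(used_tb)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(used_tb) / n
    # Ordinary least-squares slope and intercept
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, used_tb))
    den = sum((x - mean_x) ** 2 for x in xs)
    slope = num / den
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None  # flat or shrinking usage: never fills at this trend
    return (pool_size_tb - intercept) / slope

# Example: steady growth of ~2 TB/month on a 100 TB pool
samples = [40, 42, 44, 46, 48, 50]
print(project_full_month(samples, 100))  # → 30.0 (months from first sample)
```

Real capacity planning has to deal with seasonality, thin provisioning and data reduction, which is exactly why a spreadsheet like this stops being realistic.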
Infrastructures differ but, regardless of size, getting a correct view of what is going on inside them is a common problem… and it is fundamental to understanding whether the entire stack works as expected.
It’s not only about troubleshooting
Statements such as “I don’t actually know what I can move to a secondary storage system”, “You’ll never guess what developers have built around a network shared volume!” or “I don’t exactly know what happens at 11 p.m., but that particular VM goes nuts!” are just some examples of how difficult it is to get the right information when you have to make a decision. And these are all decisions that usually involve spending more money or time!
Not all analytics are born equal
IT infrastructure analytics is not new; some form of it has always been available. But it has evolved considerably, and in many ways.
First of all, we now have many more resources to perform much more detailed analysis. The number of collected data points (logs, sensors, etc.) is much greater than in the past, which improves accuracy, increases the number of metrics that can be monitored, and makes it possible to cover longer timeframes. But there is much more.
If you look in detail at all the various implementations you can easily see that there are two big areas of interest: storage infrastructure analytics and stored-data analytics. In practice, it depends on what you are more interested in: the content or the container!
I’m not saying that there are no points of contact between the two but, usually, if one vendor does something really good on one end you can’t expect a state-of-the-art analytics implementation on the other… and there are reasons for that.
Different types of analytics mean different approaches
The easiest way to explain why it is difficult to analyze content and container at the same time is to use two startups, Nimble Storage and DataGravity, as examples. The first has one of the best implementations of infrastructure analytics, while the second is doing an amazing job with stored data.
InfoSight (the name of Nimble’s product) collects 70-100 million sensor data points per day from each array. Another key point is that InfoSight also has hooks into VMware vCenter and can get information on everything from the physical disk up to the metrics of each single VM.
And, here is the important part, it is all sent to the cloud. In fact, all the magic happens there: your data (metadata, actually) is stored together with data coming from all other Nimble customers, building a huge big data repository.
This means that InfoSight can show detailed graphs of your system compared with the rest of the installed base. Comparison is fundamental for analyzing trends or, for example, getting tips about best practices.
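The comparison idea can be sketched in a few lines: given one metric from your array and the same metric across the installed base, compute where your array sits in the fleet. This is a toy illustration with invented numbers, not Nimble’s actual pipeline.

```python
# Hypothetical sketch of fleet-wide comparison: where does this array's
# average read latency sit relative to the whole installed base?

def percentile_rank(value, fleet_values):
    """Fraction of the fleet with a value at or below `value` (0.0-1.0)."""
    at_or_below = sum(1 for v in fleet_values if v <= value)
    return at_or_below / len(fleet_values)

# Invented numbers: average read latency (ms) reported by ten arrays
fleet_latency_ms = [0.4, 0.5, 0.5, 0.6, 0.7, 0.8, 0.9, 1.1, 1.4, 2.0]
my_array_ms = 1.1

rank = percentile_rank(my_array_ms, fleet_latency_ms)
print(f"{rank:.0%} of the fleet is at or below this array's latency")
```

The value of the cloud repository is exactly this: the baseline you compare against is built from every customer’s telemetry, not just your own history.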
At the same time, a cloud-based approach makes proactive support much more reliable and efficient, thanks to all the conditions that can be discovered before they become dangerous.
Nimble’s approach has been followed by many others (practically every startup that is building a primary storage system). There are some important reasons for building this kind of analytics engine:
– It improves the quality of support services. Failure prediction is good for the end user, but it is even better for a startup with a small support team!
– It does not impact storage system performance. We are talking about primary storage here: the system only sends metadata to the cloud, and everything is computed without touching your data volumes.
– The startup gets amazing real-time feedback, which is particularly useful for quicker product improvement. The amount of collected data makes it easier to develop features by looking at what is measured in the field and at user behavior (even when users can’t tell you exactly what they do with their systems!).
DataGravity takes the opposite approach. It crawls your data (locally) to find information that is not easy to access in any other way. No matter whether data is stored in VMs, files or blocks, it can find patterns and single strings, discover user behavior, tell you who did what and when, and much more. There is a terrific video with a demo of the latest features, recorded during Tech Field Day at VMworld, that explains a lot of the potential of this solution.
In this particular case the system reserves part of its resources (a second controller and a particular disk organization) to do all its magic. In fact, doing all this activity on active data could affect front-end performance and, at the same time, sending all your data to a cloud-based analytics engine makes no sense on many levels!
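Stripped to its essence, the crawling idea looks something like this: walk a tree of files, build an inverted index mapping words to the files that contain them, and answer “which files mention this?” queries. This is only a toy illustration of content analytics, not DataGravity’s implementation.

```python
# Toy content crawler: index every word in a directory tree, then search.
import os
import re
from collections import defaultdict

def build_index(root):
    """Map each lowercase word to the set of file paths that contain it."""
    index = defaultdict(set)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, errors="ignore") as f:
                    text = f.read()
            except OSError:
                continue  # unreadable file: skip (a real crawler would log it)
            for word in re.findall(r"[a-z0-9]+", text.lower()):
                index[word].add(path)
    return index

def search(index, query):
    """Return the files containing every word in the query (AND semantics)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = index[words[0]].copy()
    for w in words[1:]:
        result &= index[w]
    return result
```

Doing this kind of full-content scan inline on a busy primary array is exactly the front-end performance problem the dedicated second controller is there to avoid.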
Here too, DataGravity is leading but it is not alone in this space, and this approach has been adopted primarily by startups that propose large scale-out NAS/object systems. The reasons behind this type of design include:
– Having more control over data and information for security, auditing, regulation or policy reasons.
– Providing users in your organization with modern tools that resemble things they already use, like a Google-like search engine. Searching your storage the way you search Google is intriguing, but some vendors are also developing APIs and specific query languages that will make it possible to build interesting integrations with other applications.
– Building new data protection schemes based on the fact that some (or all) data is already copied into a safe part of the array for analytics… and you also have information on how that data is actually used and how important it is!
– And, also in this case, building new features by analyzing how data is stored and moved around.
Closing the circle
Taking for granted the quality of basic features, data protection and basic data services, the most important thing to look for in a new storage system is the analytics. Implementations are very different and, even though there are some overlaps, today it is quite difficult (if not impossible) to find a product that gives you best-in-class infrastructure and stored-data analytics at the same time. Consider your needs very carefully first, and only then will you be able to find the right solution.
Cloud-based solutions like Nimble’s help to improve the overall TCO of your infrastructure (not only storage), while content analytics brings efficiency and cost savings more on the business side of your organization.
[Wishful thinking here!] I have to say that, for both of the startups mentioned, I’d like to see the analytics part separated from the storage system and able to collect/analyze data from other storage vendors too. I know it’s the major feature for both of them, but if I were a user, I’d want such powerful analytics tools for my entire infrastructure!
[Disclaimer: I used Nimble Storage and DataGravity as examples because I think they are doing a great job with their analytics tools, but you also need to know that I did some work for Nimble this year (a paper about InfoSight) and my expenses to attend Tech Field Day Extra (sponsored by DataGravity among others) were covered by the event organizer]