This is the first article of a short series about Data-aware storage (followed by a paper that I’ll be publishing later in March, sponsored by DataGravity).
The term data-aware storage is fairly new to our industry and, as often happens, its definition is not very clear: each vendor has its own view of the world.
In my personal opinion:
Data-aware storage means being able to analyze infrastructure and workloads as well as the data involved, giving a complete picture of what is really happening to your data while empowering several business, organizational and security processes.
I’m also convinced that the concept of data awareness can be applied to other infrastructure components.
A summary of the challenges
Capacity growth. Not only are we dealing with larger data sets, but data retention policies extend much longer than in the past and many organizations are now adopting never-delete policies for most of their data.
Data and workload diversity. The number of applications and access methods has changed radically in the last few years. Now we have many more data types stored in a single storage system, accessed by a larger number of people and devices.
These problems are relatively easy to solve when they are treated individually, but the sum of the two introduces a new level of complexity. It becomes much harder to fully understand and control what is actually stored, and to exploit the value of the data and its hidden insights. Furthermore, primary storage has to continue to deliver consistent performance while crawling through stored data for these insights.
The growing number of applications, different data sources, lifetime-long retention periods, and users creating and accessing data from anywhere and any device are heavily impacting the effectiveness of traditional data management and auditing mechanisms while increasing infrastructure TCO and all sorts of security risks!
More than just an infrastructure problem
However, infrastructure is only the tip of the iceberg. Some of the most critical problems are usually seen at the business level. In fact, managers are often not aware of the real state of security, user behaviors, compliance and so on. And when they do have a clue about what is really happening, it’s because complex and expensive external solutions are in place. These software solutions usually offer a siloed view of data and limited access to it (through external agents deployed on servers and virtual machines). They add even more complexity, negatively impact the performance of the production environment, and solve only part of the problem.
Having full access rights to data without the ability to take advantage of it is another concern in many organizations. Providing the ability to search across the entire data domain would improve business efficiency and competitiveness while enhancing many other organizational aspects. But if it means building a specialized infrastructure, the cost could outweigh the benefit.
More than just saving data
Next-generation data-aware storage systems can do more than just save data safely. In fact, they can be the answer to analyzing infrastructure and workloads as well as the data involved, giving a complete picture of what is really happening to your data while empowering several business, organizational and security processes. Data security is a big concern for any organization now, and it has already been proven that traditional security mechanisms are no longer effective against modern attacks or for data-leak prevention. Data-aware storage systems can help mitigate this issue by enhancing security policies through automated search and discovery capabilities and by detecting and reporting unusual user behaviors.
In order to be effective, data-aware storage should have some basic characteristics:
The analytics engine should be seamlessly integrated with the infrastructure and easy to use, and it shouldn’t impact the overall performance of the production environment.
Data insights and visualizations should be accessible to anyone in the organization who needs to analyze and leverage information coming from stored data.
The system should be based on a no-compromise modern design with all the software features and integrations we have come to expect in traditional storage systems (like snapshots, remote replication, VMware integration, etc.).
Closing the circle
When well implemented, data-aware storage improves the efficiency of the entire infrastructure by consolidating data analytics and offloading it from the application level, while enabling advanced and insightful information discovery that can dramatically improve both TCO and business intelligence.
In the next post I’ll be writing about different “levels of awareness”, talking about their scope and showing practical examples.