Most enterprises are finally realizing that secondary storage is not that secondary after all. We’ve been talking for years about the exponential growth of unstructured and secondary data, and now that everybody is experiencing it firsthand, it looks like many are unprepared and struggling to find solutions that contain costs while still offering good service, to survive what looks like a data flood.
The commoditization of primary storage
Primary storage, usually associated with structured data, has been at center stage (and at the center of budgets) for many years, but things are changing quickly. Now that data is recognized as the most important asset for almost every type of organization, retention periods are getting longer and unstructured data capacities are growing exponentially. Infrastructure budgets are following suit, and spending is shifting across storage types and services. In fact, even though primary storage remains more expensive, secondary storage is substantially bigger and is expanding even faster. A few years ago, most analyses were fairly unanimous in reporting a ratio between structured and unstructured data in the order of 20/80, but now it is not difficult to find infrastructures where unstructured data accounts for 90-95% of the total!
At the same time, primary storage has been commoditized by new technologies like flash memory and by software solutions which hide the complexity in the back-end (i.e. VMware and, more generally, hyper-converged infrastructures). Flash in particular made performance consistency and low latency available to almost everybody, and all vendors took advantage of powerful commodity hardware to implement features which a few years ago were available only on high-end, expensive arrays.
The unsustainability of secondary storage
For years, secondary storage had the role of storing data of less critical value; I have often referred to the primary/secondary relationship as “Flash & Trash”, for example. These were large-capacity repositories, but relatively poor in the number and quality of features, as in the case of traditional object stores.
Lately, partly thanks to laws and regulations such as GDPR, these repositories have needed to become much more flexible, smart and efficient. Moreover, data is accessed differently by devices and applications compared to the past, complicating things even more. New reporting, analytics and data management features, inconceivable a few years back, are now in high demand from end users.
After the digital transformation experienced by organizations of all types and sizes, most processes have changed and we now rely on unstructured data much more than in the past. For example, if files are not properly organized, indexed and searchable, their value quickly diminishes over time. Other risks are associated with the lack of data control, including security breaches, privacy violations, leaks and so on.
At the end of the day, it’s not only about safely storing the sheer amount of data we create, but also about what can be done with it while keeping costs at bay.
Making secondary storage smarter
$/GB remains one of the key concerns with secondary storage, but there are now several solutions designed to make it smarter and more efficient, and hence sustainable over time.
Cloud integration, multi-protocol access (with file/object parity) and metadata indexing/search are all examples of the basic capabilities that modern storage systems should include.
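As a simple illustration of what metadata indexing and search mean in practice, here is a minimal sketch of mine (a generic example, not taken from any specific product) that crawls a file share, stores per-file metadata in a small SQLite index, and then queries it to find the largest files that haven’t been accessed in a year. The share path and database name are placeholders.

```python
import os
import sqlite3
import time

# Hypothetical local mount point of an SMB/NFS share and index location.
SHARE_ROOT = "/mnt/share"
INDEX_DB = "file_index.db"

def build_index(root: str, db_path: str) -> None:
    """Walk the share and record basic metadata for every file."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(path TEXT PRIMARY KEY, size INTEGER, mtime REAL, atime REAL)"
    )
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                continue  # skip files that disappear or are unreadable
            conn.execute(
                "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
                (full, st.st_size, st.st_mtime, st.st_atime),
            )
    conn.commit()
    conn.close()

def coldest_files(db_path: str, days: int = 365, limit: int = 20):
    """Return the largest files not accessed in the given number of days."""
    cutoff = time.time() - days * 86400
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT path, size FROM files WHERE atime < ? "
        "ORDER BY size DESC LIMIT ?",
        (cutoff, limit),
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    build_index(SHARE_ROOT, INDEX_DB)
    for path, size in coldest_files(INDEX_DB):
        print(f"{size:>12}  {path}")
```

Real products obviously do this at much larger scale, continuously and with far richer metadata, but the basic idea is the same.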
A good example of a solution aimed at these kinds of challenges is the one from Komprise. It is a hybrid-SaaS solution, with the management and analytics engine in the cloud and a relatively small set of VMs installed locally to manage data movement across storage systems.
The solution has a dual goal. On one hand, Komprise looks at all the file storage systems and analyzes access patterns, data composition, and many other aspects. On the other hand, it uses this information, alongside user-defined policies, to:
- move data to the right place (cheaper object stores, for example; a rough sketch of this kind of policy appears right after this list),
- give users a better view of their data and enable strategic planning accordingly,
- manage data migrations,
- improve remote data replication for disaster recovery,
- make data copies to the cloud for analytics workloads.
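To make the first point a bit more concrete, here is a minimal, hand-written sketch of what such a policy might boil down to: it walks a locally mounted share and copies files that have not been accessed for a configurable number of days to an S3-compatible object store. The share path, bucket, endpoint and threshold are all hypothetical, and this is my simplification of the general technique, not Komprise’s actual implementation.

```python
import os
import time
import boto3  # AWS SDK; works with any S3-compatible endpoint

# Hypothetical values: a locally mounted share, a target bucket and endpoint.
SHARE_ROOT = "/mnt/share"
BUCKET = "cold-data"
ENDPOINT = "https://objectstore.example.com"
COLD_AFTER_DAYS = 180

s3 = boto3.client("s3", endpoint_url=ENDPOINT)

def tier_cold_files(root: str, days: int) -> None:
    """Copy files not accessed for `days` days to the object store."""
    cutoff = time.time() - days * 86400
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # skip files that disappear or are unreadable
            if st.st_atime < cutoff:
                # Use the path relative to the share as the object key.
                key = os.path.relpath(path, root)
                s3.upload_file(path, BUCKET, key)
                print(f"tiered {path} -> s3://{BUCKET}/{key}")

if __name__ == "__main__":
    tier_cold_files(SHARE_ROOT, COLD_AFTER_DAYS)
```

The real value of a commercial solution lies in the analytics, the policy engine and transparent access to moved data, but the underlying mechanics come down to operations like these.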
Once again, this kind of solution addresses the challenges of data control and management in large-scale environments while reducing infrastructure TCO.
Komprise installations range from a few hundred terabytes up to several petabytes, and one of the aspects I like most is that it doesn’t require local agents installed on individual filers or PCs. It’s simple, neat and efficient.
Here are a couple of videos from the latest Storage Field Day event where you can find more info about Komprise’s internal design and overall architecture. They are worth a watch.
Closing the Circle
Komprise is just an example of a smart solution aimed at transforming a huge problem into an opportunity. It is one of many, of course, but in this case there is no need to buy a new storage system or change the way users and applications store and access data. All the magic happens in the back-end, seamlessly, which makes it easier to test and adopt in production.
Secondary storage is growing quickly, becoming a very important part of all IT budgets, and there is no reason to think that it will slow down any time soon. Most organizations are moving from hundreds of terabytes to petabytes under management, and they need solutions that address these new challenges and can be introduced without causing disruption.
Disclaimer: I was invited to Storage Field Day 17 by GestaltIT and they paid for travel and accommodation; I have not been compensated for my time and am not obliged to blog. Furthermore, the content is not reviewed, approved or edited by anyone other than the Juku team.
Well, unstructured data management is now unavoidable, and knowing where to keep data, for how long and at what cost is the new storage imperative. I began a beta test of Komprise almost three years ago with the assistance of Mohit Dhawan and Krishna Subramanian from Komprise. They could not have been more helpful in working with me to connect Komprise with my Cloudian HyperStore demo cluster. My Komprise beta test moved more than a million files from a Windows Server 2012 R2 server to a Cloudian cluster, and this was before Cloudian made a commitment to engage with Komprise as a solution partner and certify Komprise for use with Cloudian.

Komprise is easy to set up and deploy. Running an analysis and creating a plan for managing your data produces almost immediate results; plans can be tested and modified, and you can model how much you will save in storage costs. The Komprise control plane in AWS never stores your data, so your data is always under your control. There is no need to “retrain” users on where their data files are located, and Komprise can scale to do its job in organizations of any size that make use of SMB and NFS shares. Komprise has what every organization can use to do a better job of data management.
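As a side note, for anyone running a similar migration test, a basic sanity check is to compare the number of files on the source share with the number of objects that landed in the target bucket. The sketch below is a generic example of mine (not part of Komprise or Cloudian); the endpoint, bucket and path are placeholders, and it relies only on standard S3-compatible listing calls.

```python
import os
import boto3  # talks to any S3-compatible endpoint, including Cloudian HyperStore

# Placeholder values for the source share and the target bucket/endpoint.
SOURCE_ROOT = "/mnt/source-share"
BUCKET = "migrated-data"
ENDPOINT = "https://hyperstore.example.com"

def count_source_files(root: str) -> int:
    """Count regular files under the source share."""
    return sum(len(files) for _dirpath, _dirs, files in os.walk(root))

def count_bucket_objects(bucket: str, endpoint: str) -> int:
    """Count objects in the target bucket using paginated listing."""
    s3 = boto3.client("s3", endpoint_url=endpoint)
    paginator = s3.get_paginator("list_objects_v2")
    total = 0
    for page in paginator.paginate(Bucket=bucket):
        total += page.get("KeyCount", 0)
    return total

if __name__ == "__main__":
    src = count_source_files(SOURCE_ROOT)
    dst = count_bucket_objects(BUCKET, ENDPOINT)
    print(f"source files: {src}, bucket objects: {dst}")
    print("match" if src == dst else "mismatch - investigate before cutover")
```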