I’m not sure whether “data proliferation” is the right term to use here, but I think it conveys the right idea about something that has been happening since the inception of data storage: copying and reusing data. This practice is not bad per se, but lately it has become an issue because of the sheer quantity of data and the number of copies we make!

The severity of the problem

Multiple copies of primary data sets are made to fulfill many different user and business requests, ranging from application development to analytics and sometimes even backup or disaster recovery. Copying, and then maintaining, data has high costs both in terms of infrastructure and management.
It’s not unusual to see production data copied many times (sometimes 8 or more), and the number of copies grows every time we discover a new way to extract more information from that data!
In practice this means that for each TB of primary data (ERP, DB and so on) we have another 7 to 8 TB to manage. In many cases the data is fully cloned rather than snapshotted, for various reasons: performance issues, data protection, storage system limitations and so on.
At the same time, in an effort to save money, users copy data to secondary, cheaper storage systems, which contributes to overall TCO growth because of the hidden costs of the related management tasks: it’s like a dog chasing its own tail…

Big and bigger

The bigger the infrastructure, the bigger the issue. In fact, a large enterprise has many different applications, with different software/hardware stacks and, consequently, many different underlying storage systems. Managing copies across distinct storage systems can be stressful and time-consuming. And I’m only talking about the basic problems you face at the infrastructure level: sometimes, just thinking about the organizational and process issues involved in copying mission-critical data is a job in itself!

The solution

A few weeks ago I got the chance to talk with Actifio. This American startup, which has already secured $107.5M in funding, produces a storage appliance specifically targeted at solving these kinds of issues.
The appliance makes a first copy of the data directly from the source using standard, widely available tools (Oracle RMAN, Microsoft VSS, VMware APIs…), then it tracks changes to the primary data to keep the copy aligned and consistent with the original.
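To make the mechanism more concrete, here is a minimal Python sketch of the general “baseline copy, then incremental change tracking” pattern described above. It is purely illustrative: the fixed block size, SHA-256 fingerprinting and helper functions are assumptions of mine, not a description of how Actifio’s appliance works internally or of its API.

```python
# Illustrative sketch only: baseline copy followed by incremental updates.
# Block size, hashing scheme and function names are assumptions, not Actifio's design.
import hashlib

BLOCK_SIZE = 64 * 1024  # hypothetical fixed block size


def read_blocks(path):
    """Yield (index, bytes) for each fixed-size block of a source file/image."""
    with open(path, "rb") as f:
        index = 0
        while True:
            data = f.read(BLOCK_SIZE)
            if not data:
                break
            yield index, data
            index += 1


def baseline_copy(source_path):
    """First full copy: store every block and remember its fingerprint."""
    copy, fingerprints = {}, {}
    for i, block in read_blocks(source_path):
        copy[i] = block
        fingerprints[i] = hashlib.sha256(block).hexdigest()
    return copy, fingerprints


def incremental_update(source_path, copy, fingerprints):
    """Later passes: transfer only the blocks whose fingerprint has changed."""
    changed = 0
    for i, block in read_blocks(source_path):
        digest = hashlib.sha256(block).hexdigest()
        if fingerprints.get(i) != digest:
            copy[i] = block
            fingerprints[i] = digest
            changed += 1
    return changed  # number of blocks actually moved
```

In a real deployment the change information would come from the source itself (RMAN incremental backups, VMware changed-block tracking and so on) rather than from re-reading and re-hashing everything, which is exactly why the appliance leans on those standard tools.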
Once the appliance has copied the data, it starts a continuous optimization activity based on deduplication and compression. From this point on the user can take full advantage of Actifio by creating “virtualized views” of the data (read/write snapshots) and defining tiering, replication and retention policies.
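To give an idea of what a “virtualized view” looks like conceptually, here is another small Python sketch: a content-addressed store that deduplicates identical blocks, plus a copy-on-write view that can be read and written without ever touching the golden copy. Again, class and method names are my own assumptions for illustration, not Actifio’s actual implementation.

```python
# Conceptual sketch: deduplicated golden copy plus read/write "virtualized views".
# All names and structures here are illustrative assumptions, not Actifio's API.
import hashlib


class DedupStore:
    """Content-addressed block store: identical blocks are kept only once."""

    def __init__(self):
        self.blocks = {}  # sha256 digest -> block bytes

    def put(self, block):
        digest = hashlib.sha256(block).hexdigest()
        self.blocks.setdefault(digest, block)
        return digest

    def get(self, digest):
        return self.blocks[digest]


class VirtualView:
    """Read/write snapshot: reads fall through to the golden image,
    writes stay private to the view (copy-on-write)."""

    def __init__(self, store, golden):
        self.store = store
        self.golden = golden   # list of digests describing the golden copy
        self.overrides = {}    # block index -> digest of a privately written block

    def read(self, index):
        digest = self.overrides.get(index, self.golden[index])
        return self.store.get(digest)

    def write(self, index, block):
        self.overrides[index] = self.store.put(block)


# Usage: a dev/test view sees the production data but never modifies it.
store = DedupStore()
golden = [store.put(b"block-%d" % i) for i in range(4)]
view = VirtualView(store, golden)
view.write(2, b"patched block")
assert view.read(2) == b"patched block"    # the view sees its private change
assert store.get(golden[2]) == b"block-2"  # the golden copy is untouched
```

The point of the example is the ratio: many views cost almost nothing beyond the blocks they actually change, which is where the savings over full physical clones come from.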

Why it matters

In practice, Actifio rationalizes and improves data copying activities, enabling an efficient, horizontal, enterprise-wide platform capable of virtualizing data, attaching SLAs to it, and backing it up or replicating it without the limits and constraints imposed by any single storage vendor.
Actifio is not the only solution trying to solve this problem; in many ways it reminds me of Delphix, another startup with a similar approach but focused only on Oracle and Microsoft databases.
More generally, these kinds of solutions can have a huge impact in terms of efficiency and related savings, especially in large, complex environments with legacy storage systems.