Yesterday DataDirect Networks announced a new product, the GS7K Parallel File System Appliance. This is a GPFS-based storage appliance aimed at storing huge amounts of data for organizations that want an easy-to-deploy, simple-to-manage solution with enterprise-grade support. Similar infrastructures can also be built with open source products and standard x86 hardware, but the skills needed to design and maintain these environments properly are not available everywhere (and they aren’t cheap!). With this product DDN addresses a problem that is already present in larger organizations and that will soon be faced by large and medium-sized ones as well.
New problems need new solutions
As I mentioned in a recent article, most traditional vendors don’t have the right products to compete at petabyte scale, especially when $/GB and massive throughput are at the top of the list of requirements.
These requirements are becoming ever more common in all kinds of enterprises. Above a certain level of capacity, problems can’t be solved with traditional architectures because of their limits in scalability, cost, performance and longevity.
At the same time, data growth is massive while CPU needs aren’t growing at the same pace. This means that Hadoop-like (or hyperconverged-like?) clusters won’t be enough to store huge amounts of data and keep it always ready for computation: because these clusters couple storage and compute, scaling capacity means paying for CPUs that sit idle. It doesn’t make sense from either a technical or an economic perspective.
Moreover, most organizations are storing data (sometimes just because everyone wants to keep everything!), but only a few have actually planned a strategy to use all of it!
New solutions need new architectures
One of the things that I like most about the GS7K is its tiering functionality to object storage. It is very similar to what I described in my presentation at the last NGSS: primary storage is the front-end and cloud (object) storage is the back-end. This kind of multi-tier architecture will become more relevant in the future, enabling new and interesting scenarios.
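To make the idea more concrete, here is a minimal Python sketch of what such a front-end/back-end tiering policy could look like: cold files are copied from the primary tier to an object-storage back-end and replaced by a small stub. The 30-day threshold, the bucket name and the S3-style client are my own illustrative assumptions, not a description of DDN’s implementation.

```python
import time
from pathlib import Path

# Hypothetical policy: files untouched for 30 days become candidates for the
# object-storage back-end. Real appliances express this with policy rules;
# this is only an illustration of the concept.
COLD_AFTER_SECONDS = 30 * 24 * 3600


def find_cold_files(primary_tier: Path, now=None):
    """Yield files on the primary tier whose last access is older than the threshold."""
    now = now or time.time()
    for path in primary_tier.rglob("*"):
        if path.is_file() and (now - path.stat().st_atime) > COLD_AFTER_SECONDS:
            yield path


def migrate(path: Path, object_store, bucket: str, primary_tier: Path) -> None:
    """Copy a cold file to the object back-end and leave a stub on the primary tier.

    `object_store` is assumed to expose an S3-style upload_file(filename, bucket, key)
    method (a boto3 S3 client does); the stub keeps the namespace intact so the file
    can be recalled later.
    """
    key = str(path.relative_to(primary_tier))
    object_store.upload_file(str(path), bucket, key)
    path.write_text(f"stub:{bucket}/{key}\n")  # placeholder pointing at the back-end copy
```

In a real system the recall path (reading a stub triggers a copy back from the object store) matters just as much as the migration path, but the principle is the same: capacity lives on the cheap back-end while the primary tier only holds what is hot.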
New architectures need new forms of computing
Tiered storage architectures are good for solving the capacity problem, but having data locally is the key to speeding up computation.
New (1U) servers from the major vendors are ready to support next-generation flash memory options (smaller form-factor disks, flash DIMMs, and so on), and the next server chipsets will support huge amounts of memory (up to 6TB!).
Once again, from my point of view, moving data fast (from a huge storage repository to a hyper-converged infrastructure) and a strong caching layer will be the keys to obtaining the best next-generation architecture. Just consider that you may end up running different data analytics applications (for example, different departments can choose different solutions to manage their businesses): in that case, a complete separation between data and CPU is much better than building a Hadoop cluster to both store and analyze data.
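To give an idea of how this separation can work in practice, here is a minimal Python sketch of a read-through cache: compute nodes pull data on demand from the shared object-storage repository into a local flash cache, so any analytics stack can work on fast local copies without owning the capacity tier. The cache directory, bucket and S3-style client are illustrative assumptions, not part of any specific product.

```python
import hashlib
from pathlib import Path


class ReadThroughCache:
    """Minimal read-through cache: compute nodes read through a local (flash) cache
    that is filled on demand from a shared object-storage repository.

    `object_store` is assumed to expose an S3-style get_object(Bucket=..., Key=...)
    call returning a dict with a "Body" stream (as boto3 does); everything else is
    simplified for illustration.
    """

    def __init__(self, object_store, bucket: str, cache_dir: str = "/flash/cache"):
        self.object_store = object_store
        self.bucket = bucket
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _local_path(self, key: str) -> Path:
        # Hash the object key so nested names map to flat cache file names.
        return self.cache_dir / hashlib.sha256(key.encode()).hexdigest()

    def read(self, key: str) -> bytes:
        local = self._local_path(key)
        if local.exists():                 # cache hit: the data is already local
            return local.read_bytes()
        obj = self.object_store.get_object(Bucket=self.bucket, Key=key)
        data = obj["Body"].read()          # cache miss: pull from the back-end
        local.write_bytes(data)            # keep a local copy for the next job
        return data
```

Whichever analytics application sits on top, it only sees fast local reads after the first access, while the single source of truth stays in the shared repository.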
Why it is important
Many organizations are storing data today and will probably compute on it tomorrow. Storing data in a cheap, reliable, durable and accessible repository is a must now, but throughput and reusable CPU resources will be indispensable tomorrow.
Other vendors are working on (or already have) similar solutions: object (and cloud) storage as the back-end, some sort of high-speed primary storage as the front-end, and flash/RAM-based hyper-converged platforms providing the CPU power.
I’m sure we’ll be seeing more solutions like this sooner rather than later…