A few days ago Isilon (now an EMC company) announced a new feature for its storage solutions, in addition to the existing options: HDFS support. This is a great move that I like very much, though with one small reservation.
Storage vendors are working hard to take a step forward by bringing data closer to the computational power. At the moment EMC is leading the pack (the other announcement they made last week is Project Lightning: a flash cache on servers, tightly integrated with their storage), but it’s clear that it is only a matter of time: I’m sure that the other primary vendors will release similar functionality soon.
Who is Isilon
Isilon develops a scale-out NAS solution based on standard x86 hardware and its OneFS operating system. The company’s products have gained a lot of attention in the last couple of years, especially in HPC and other highly demanding storage environments.
Scale-out systems are relatively new in the enterprise storage space, but many end users are beginning to look at them as a viable solution for their growing (exploding) storage needs. In fact, we can sum up the advantages of scale-out storage in two characteristics: truly modular and scalable. So, from the enterprise point of view, you can start small and grow in line with your needs.
What is HDFS
HDFS stands for Hadoop Distributed File System. It is the primary choice when you want to develop and deploy a Hadoop solution.
The HDFS architecture was developed on top of commodity x86 servers with hardware failures in mind. The result of this approach is very good resiliency (on common x86 nodes), but with a big drawback in capacity usage, because that resiliency comes from keeping multiple full copies of the data. HDFS was also designed to meet the requirements of huge data sets, such as high sequential throughput, coherency and automated replication mechanisms.
What Isilon did
Isilon has simply integrated HDFS on top of its scale-out NAS. The coupling is perfect because Isilon already has the experience (and the foundation technology) to do this. In my view, deploying HDFS on OneFS has many advantages:
- Ease of use: you avoid having to design a dedicated HDFS cluster, which is one of the problems of current Hadoop deployments.
- General purpose: Isilon is a multiprotocol solution that can help solve not only Big Data problems but also the other storage needs your company is facing.
- Efficiency: Isilon’s HDFS interface offers the same efficiency as OneFS (less wasted space, with very good resiliency and performance).
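To give a rough feel for the "less wasted space" point, here is a small back-of-the-envelope sketch in Python. It compares the raw capacity needed to store the same amount of data with full-copy replication (3 copies is the stock HDFS default) and with a parity-style layout. The 8+2 layout and the 100 TB figure are purely my own illustrative assumptions, not numbers published by Isilon, and the function names are made up for this example.

```python
def replication_raw_tb(usable_tb: float, replicas: int = 3) -> float:
    """Raw capacity needed when every block is stored `replicas` times.

    3 is the default HDFS replication factor; clusters can change it.
    """
    return usable_tb * replicas


def parity_raw_tb(usable_tb: float, data_units: int, parity_units: int) -> float:
    """Raw capacity needed when each stripe holds data_units of data
    plus parity_units of protection (hypothetical layout, for illustration).
    """
    return usable_tb * (data_units + parity_units) / data_units


if __name__ == "__main__":
    usable = 100.0  # TB of data to store (illustrative figure)
    print(f"3x replication: {replication_raw_tb(usable):.0f} TB raw")
    print(f"8+2 parity:     {parity_raw_tb(usable, 8, 2):.0f} TB raw")
```

With these assumed numbers, replication needs 300 TB of raw disk for 100 TB of data, while the parity layout needs 125 TB. The real gap depends on the protection level you actually configure, so take this only as an order-of-magnitude argument.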
I can’t say anything about the speed of this solution. I don’t know if it is similar to a native HDFS cluster but I’m pretty sure Isilon will have time to prove the quality of its HDFS soon.
It’s clear that EMC’s Isilon could be a very interesting choice for enterprises that are evaluating big data solutions: it’s less complex than traditional ones, and it could solve big data issues and more!
Bottom line (and my reservation)
It’s clear that Isilon has hit the mark by responding to this new request from the enterprise market: enterprises, and not only the biggest ones, are looking at new ways to collect and analyze data, but they are scared by the complexity of traditional Hadoop clusters.
I started this post saying that Isilon has done a great job but that I don’t like it 100%. The reason is that they are supporting HDFS only with Greenplum HD appliances!
I can understand why they are granting their cousins a competitive advantage (delivering this interesting feature exclusively to them at the beginning), but I hope that they will open it up soon to other players.
I’m sure that the lock-in that the Isilon+Greenplum pairing puts in place could hurt many end users.