Last week we attended a briefing with Nimbus Data about their next-generation flash storage system. The new array is capable of running at the astonishing speed of 1 million IOPS. This isn’t the first time I’ve heard “1 million IOPS” from a vendor (TMS and Violin have been publicizing it for a while too), but the interesting point is that we are moving very fast from what we could call “science experiments” to real production products. There is an interesting new market segment of high-end flash-based systems, and many customers are evaluating them for their T1 workloads. At the opposite end, last week Amazon announced Glacier: a secure, distributed object storage service optimized for archiving, with 11 9s of durability, priced at $0.01 per gigabyte per month!
This storage service is positioned as an alternative to tape archiving for customers who need infrequent access to huge amounts of data stored for many years (or forever).

The trend is clear: big space and big performance requirements sit at the antipodes (and that’s not news). But what is happening in the middle?

Big speed

End users are asking for low-latency, high-speed access to their active data. This is often a small subset of their whole data space, but they need to process it at a much higher speed than in the past. You need information to be competitive, and you need it as fast as possible or, at least, faster than your direct competitors. The easiest way is to store and process data on flash storage. MLC (multi-level cell) enterprise storage is no longer a chimera: its $/GB is going down very fast, while its $/IOPS is already more competitive than that of ordinary, under-utilized HDD-based storage systems.
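A back-of-the-envelope comparison shows why; all prices and per-device figures below are rough, illustrative assumptions, not quotes from any specific vendor:

```python
# Rough $/IOPS comparison between a 15K RPM enterprise HDD and an MLC SSD.
# All figures are illustrative assumptions, not vendor pricing.
hdd_price, hdd_iops = 300.0, 180        # 15K RPM enterprise HDD (ballpark)
ssd_price, ssd_iops = 1200.0, 40000     # enterprise MLC SSD (ballpark)

print(f"HDD: {hdd_price / hdd_iops:.2f} $/IOPS")    # ~1.67 $/IOPS
print(f"SSD: {ssd_price / ssd_iops:.4f} $/IOPS")    # ~0.03 $/IOPS
```

Even with conservative SSD numbers, the $/IOPS gap is an order of magnitude or more; it’s on the $/GB side that spinning disks still win.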
While it’s true that 1M IOPS systems aren’t for everyone, we can also expect that many vendors playing in the range of hundreds of thousands of IOPS will do very well in the near future. These kinds of systems can satisfy the performance (and space) needs of most enterprises at a reasonable price (especially when compared with old-generation T1 arrays).

Big space

All data is important and all data needs to be kept for a longer time (actually, “a longer time” is often a synonym for “forever”)! If you keep stacking new data on top of old data, with long retention policies, the problem becomes huge: medical records are just one example of what I’m talking about.
Amazon’s move to offer an archiving service is very clever. Tape technology is hated by the majority of end users, who are constantly looking for alternatives. Actually, at $0.01/GB/month, Glacier doesn’t look like a cheaper alternative, but it is a first step. (BTW, I’m not sure the comparison is entirely fair: Glacier is probably more comparable to services offered by companies like Iron Mountain.)
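To put the price tag in perspective, here is a quick calculation based on the announced rate; the 100 TB archive and 5-year retention are just illustrative assumptions (retrieval and transfer fees excluded):

```python
# Glacier storage cost at the announced $0.01/GB/month (storage only).
# The 100 TB archive size and 5-year retention are illustrative assumptions.
price_per_gb_month = 0.01
archive_gb = 100 * 1024          # 100 TB expressed in GB
months = 12 * 5

monthly_cost = archive_gb * price_per_gb_month
print(f"{monthly_cost:,.0f} USD/month, {monthly_cost * months:,.0f} USD over 5 years")
# -> 1,024 USD/month, 61,440 USD over 5 years
```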
If backup software vendors are smart enough to integrate Glacier as a backup target (indeed, some of them already have S3 integration), customers will be able to easily migrate tape archives into the cloud. Of course, it’s not for everyone, but I’m sure many end users are eager to evaluate an opportunity like this.
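To give an idea of how thin such an integration layer could be, here is a minimal sketch using the AWS SDK for Python (boto3); the vault name and backup file path are placeholders, and a real backup product would add its own cataloguing, retries and multipart uploads for large images:

```python
import boto3

# Minimal sketch: push one backup image into a Glacier vault.
# "my-backup-vault" and the file path are placeholders, not real resources.
glacier = boto3.client("glacier")

glacier.create_vault(vaultName="my-backup-vault")

with open("/backups/weekly-full.tar.gz", "rb") as archive:
    response = glacier.upload_archive(
        vaultName="my-backup-vault",
        archiveDescription="weekly full backup",
        body=archive,
    )

# The archive ID is needed to retrieve (or delete) the data later,
# so a backup application would store it in its own catalogue.
print(response["archiveId"])
```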

Big data and SMB are in the middle

When you want to process big amounts of data in a relatively short time, you need throughput (more than IOPS) and space. This remains a sweet spot for HDDs and HDD-based subsystems. In practice, a huge number of (SATA) drives can deliver a sufficient amount of IOPS and plenty of throughput while the disk space is utilized at its best (wasted disk space is the biggest issue when you buy spindles for IOPS!). Big data scale-out architectures are also very well suited to adopting a small amount of flash on each node (e.g., PCIe flash cards) to improve performance when processing local data subsets.
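A quick aggregate calculation shows why spindles still make sense here; the cluster layout and per-drive figures are rough, illustrative assumptions for 7.2K RPM SATA disks:

```python
# Aggregate numbers for a hypothetical scale-out cluster built on SATA spindles.
# Per-drive figures (~75 random IOPS, ~120 MB/s sequential) are ballpark assumptions.
drives = 240                      # e.g. 12 nodes x 20 disks (hypothetical layout)
iops_per_drive = 75
mbps_per_drive = 120

print(f"~{drives * iops_per_drive:,} random IOPS aggregate")                   # ~18,000 IOPS
print(f"~{drives * mbps_per_drive / 1024:.1f} GB/s of streaming throughput")   # ~28 GB/s
```

Modest IOPS by flash standards, but an enormous streaming bandwidth plus all the capacity of those disks, which is exactly what sequential big data workloads need.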

On the same trail we find SMBs: the adoption of hybrid storage systems is massive here. A bunch of new startups are doing very well in this segment, and technologies like SSD caching and automated tiering are much appreciated by end users.

Bottom line

I think there are a few points worth noting here:
1) SSD popularity is growing faster than analysts predicted in recent years.
2) Affordable enterprise (MLC) flash storage is already more competitive than HDD-based systems for many applications.
3) Hybrid storage systems (monolithic or distributed) are attracting SMBs and big data users because they are a good compromise between performance, space and price.
4) Tape is still alive, but the cloud is getting a little bit closer… And if it is true that Glacier uses custom low-power/low-RPM SATA drives, will disks finally become cheaper than tape in the future?