Lately I have been writing a lot about the role of flash memory and disk drives in the future of storage infrastructures (here and here are a couple of examples). And the latest news simply confirms that it will be all about Flash for primary workloads and Object Storage (and disks) for the rest.
The reality is that Flash is quickly becoming a no-brainer when it comes to IOPS per GB in primary data scenarios, but pure capacity is a totally different thing and it will continue to be deployed on disks for a long time yet. Even if you find a good deal, and even though flash prices will continue to shrink, the difference in $/GB will still be huge over the next few years.
Your mileage may vary
$/GB is/was the parameter used in the past to calculate the cost of storage with traditional arrays. But things have become a lot fuzzier with all the modern data services and compression techniques.
$/GB could vary substantially from user to user and it often depends on the type of data you are storing, its compressibility and other factors strictly related to overall efficiency of the storage system and to how it is used.
Last week SanDisk launched a new box, a sort of “JBOF” array (Just a Bunch Of Flash). It’s amazingly dense and cheap (less than $2 per GB, they say) but if you add data protection (a RAID or a file system, like Ceph or ZFS for example), things could change. Just think about mirroring it: the $2/GB price tag suddenly becomes $4 per GB!
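To make the mirroring arithmetic explicit, here is a minimal Python sketch. The function name and the 4+1 parity layout are my own illustrative assumptions, not anything SanDisk publishes; only the $2/GB figure and the mirroring case come from the text above.

```python
def effective_cost_per_gb(raw_cost_per_gb: float,
                          data_units: int,
                          redundancy_units: int) -> float:
    """Cost per usable GB when every `data_units` of user data
    needs `redundancy_units` of extra raw capacity for protection."""
    if data_units <= 0 or redundancy_units < 0:
        raise ValueError("data_units must be > 0 and redundancy_units >= 0")
    overhead = (data_units + redundancy_units) / data_units
    return raw_cost_per_gb * overhead


# Mirroring: one extra copy for every copy of data (the case in the text).
mirrored = effective_cost_per_gb(2.0, data_units=1, redundancy_units=1)

# Hypothetical 4+1 parity layout, shown only for comparison.
parity_4_1 = effective_cost_per_gb(2.0, data_units=4, redundancy_units=1)

print(f"mirrored: ${mirrored:.2f}/GB, 4+1 parity: ${parity_4_1:.2f}/GB")
```

The same multiplier works in the other direction too: compression or deduplication would divide the result by whatever reduction ratio your data actually achieves, which is why your mileage may vary.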
Just to make it easier for everyone, I’m going to lay out the facts about $/GB and $/IOPS in the simplest possible way. I will compare prices that I found in a recent HGST price list. These are list prices for a single disk purchase (which means that the final price could be a lot lower if you are an OEM or you buy in quantity). One of the prices comes from Amazon (and this is a consumer-grade product which means, in this case too, that the price can be much lower for the vendor/reseller).
These are the 4 products:
First of all, it is important to understand that we are comparing simple components (raw material), and any operation you perform on the front-end of an array produces several operations in the backend. RAID, block-size optimization, compression/recompression, deduplication, bus or CPU limits, and so on, limit the number of IOPS that are effectively seen by the hosts.
Let’s do the math:
[table id=1 /]
* considering a 50000 IOPS 4k random workload (which, again, aren’t the IOPS you obtain from your array)
It’s plain to see that SSD wins hands down if the comparison is about $/IOPS and IOPS/GB. But on the other hand, there is a great abyss between the 0.037-0.049 $/GB registered by SATA drives and the others.
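For anyone who wants to redo the comparison with their own quotes, the three metrics are simple ratios. The sketch below is a rough Python illustration; the `Drive` class and the numbers in `example` are placeholders I made up for the demonstration, not the values from the table.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    name: str
    price_usd: float    # price for a single drive
    capacity_gb: float  # raw capacity
    iops: float         # raw device IOPS, not what an array delivers

    @property
    def cost_per_gb(self) -> float:
        return self.price_usd / self.capacity_gb

    @property
    def cost_per_iops(self) -> float:
        return self.price_usd / self.iops

    @property
    def iops_per_gb(self) -> float:
        return self.iops / self.capacity_gb


# Placeholder values, invented for illustration only.
example = Drive("hypothetical drive", price_usd=400.0,
                capacity_gb=8000.0, iops=100.0)

print(f"{example.name}: {example.cost_per_gb:.3f} $/GB, "
      f"{example.cost_per_iops:.2f} $/IOPS, "
      f"{example.iops_per_gb:.4f} IOPS/GB")
```

Plug in the price, capacity and IOPS of each product you are evaluating and the ranking falls out directly.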
And the loser is…
It’s also quite clear that the big loser here is the 10K SAS drive. It doesn’t come out the winner in any comparison, and since the technology won’t be developing much in the future, its results can only get worse over time.
At the same time, the consumer-grade SATA $/GB is almost 100 times better than Flash. Yes, 100!
To be fair, a consumer-grade flash drive (with a SATA interface) is only 10x the $/GB of a SATA hard disk, but even with a 40% year-over-year price cut it will take 5 to 6 years to reach comparable prices, and that is without considering that we already have much denser hard disks in the pipeline.
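The time-to-parity estimate is just compound decline. Here is a small Python sketch under one simplifying assumption of mine: hard disk $/GB stays flat while flash drops by a fixed percentage every year. The function name is hypothetical.

```python
import math


def years_to_parity(price_ratio: float, yearly_decline: float) -> float:
    """Years until a price that is `price_ratio` times higher reaches parity,
    if it falls by `yearly_decline` (e.g. 0.40) each year and the cheaper
    price does not move."""
    if not 0 < yearly_decline < 1:
        raise ValueError("yearly_decline must be between 0 and 1")
    if price_ratio <= 1:
        return 0.0
    # Solve price_ratio * (1 - yearly_decline) ** n == 1 for n.
    return math.log(price_ratio) / -math.log(1 - yearly_decline)


years = years_to_parity(price_ratio=10, yearly_decline=0.40)
print(f"{years:.1f} years")
```

With a 10x gap and a 40% yearly cut this works out to roughly four and a half years, so five full years before flash is at or below parity. Because disks keep getting cheaper and denser as well, the flat-price assumption is generous to flash and the real figure will be longer.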
In the near future we will be seeing many more (petabyte-scale) storage systems using consumer-grade (low cost) SATA drives coupled with modern data protection techniques (like distributed Erasure Coding, for example) which help to manage multiple disk faults and longer rebuild times. For example HDS, with its HCP S10, is already doing it.
Closing the circle (…and stating the obvious)
Flash wins in IOPS, HDD wins in Capacity, and this won’t be changing anytime soon.
All-Flash storage systems are cost effective because flash makes it possible to use in-line deduplication and compression at virtually no cost. Their predictability is very high, which makes them a good fit for demanding primary workloads.
At the same time, hybrid systems are still better than All-Flash when the demand for IOPS is not too high. In this case, the “IOPS/GB” shown in the above table/chart is an important metric. If you want to build a balanced IOPS/TB system you still need Flash+SATA. This is why most SMB customers are still going hybrid and companies like Tegile and Nimble are doing so well.
Last but not least, high capacity systems will continue to be nearly All-Disk for a long time. And I say nearly because today most object storage solutions use RAM for metadata and cache. Tomorrow, with very high capacity disks, it could be cheaper to store some information on relatively small SSDs.