Last month I took a couple of trips to the US to attend two different events, and I picked up some interesting news.
1) HP, during its Discover event, launched a new SSD array and a new StoreOnce VSA. They also launched a hardware tape library capable of 560 slots in its maximum configuration (which means 3.5 PB of compressed capacity with LTO-6).
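As a quick sanity check on that figure: 3.5 PB only works out if you use LTO-6's compressed capacity (6.25 TB per cartridge at the 2.5:1 ratio vendors quote); native capacity is 2.5 TB per cartridge.

```python
# Capacity of a 560-slot library filled with LTO-6 cartridges.
SLOTS = 560
LTO6_NATIVE_TB = 2.5        # native capacity per cartridge
LTO6_COMPRESSED_TB = 6.25   # assumes the usual 2.5:1 compression ratio

native_pb = SLOTS * LTO6_NATIVE_TB / 1000
compressed_pb = SLOTS * LTO6_COMPRESSED_TB / 1000
print(f"native: {native_pb} PB, compressed: {compressed_pb} PB")
# native: 1.4 PB, compressed: 3.5 PB
```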
On that occasion, I also had the chance to talk to some WD representatives about products they are working on (slower but more durable hard drives!).
2) On the second trip I met Pure Storage, a new all-flash array manufacturer with a very interesting strategy: they are not targeting the very high-end market with their feature-rich product; instead, they are addressing the more general-purpose, traditional external storage market. In fact, thanks to the efficiency Pure has built into its array, they claim a very good street price of $5 per usable GB.
3) Small tape libraries are struggling while huge libraries are doing pretty well. This is due to many factors but, from my point of view, two matter more than the others:
– SMBs find tapes sluggish and complicated. These companies are almost 100% virtualized, they do not need long retention, and they want something easy to manage.
– Right now, at the largest scale, tape's TCO is unmatched.
Finally, an interesting discussion about Open Compute with Scality's people helped me connect the dots.
Will disk become the new tape?
The Hard Disk dilemma
How fast flash will replace disks in general-purpose enterprise storage arrays is not easy to predict, but it will happen.
Disks, as we know them today, will become less relevant, and they need to evolve to meet new needs.
If disks are no longer a performance solution, they need to become more effective at managing space at a lower cost (indeed, the industry is aiming to build 12TB hard drives by 2016!).
At the moment, tape has the lowest TCO in terms of $/GB: when a cartridge is stored in its slot it consumes 0 watts and has no moving parts! In fact, tape is perfect for cold data, as in the case of long-term archiving.
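To see why the zero-watt point matters at scale, here is a rough back-of-the-envelope comparison of the annual power bill for a petabyte of cold data on always-spinning disks versus tape. Every figure (idle wattage, drive size, electricity price) is an illustrative assumption, not a vendor number:

```python
# Rough annual power cost of keeping 1 PB of cold data on spinning
# disks vs. on tape. All inputs are illustrative assumptions.
IDLE_DISK_WATTS = 6.0       # assumed idle power of one spinning drive
DISK_TB = 4.0               # assumed capacity per drive
KWH_PRICE = 0.12            # assumed electricity price, $/kWh
HOURS_PER_YEAR = 24 * 365

disks_per_pb = 1000 / DISK_TB
disk_cost = disks_per_pb * IDLE_DISK_WATTS / 1000 * HOURS_PER_YEAR * KWH_PRICE
tape_cost = 0.0             # a cartridge sitting in its slot draws nothing

print(f"disk: ${disk_cost:,.0f}/PB/year, tape: ${tape_cost:,.0f}/PB/year")
# disk: $1,577/PB/year, tape: $0/PB/year
```

Power is only one line of the TCO, but it is the one that spinning media can never get to zero without spinning down.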
But tape also has limitations: you need to reread cartridges every now and then to be sure the data are still readable, and migrations from one tape generation to the next come at a very high cost. Tape is also not well liked by object storage vendors: dispersed metadata management and space-efficiency techniques (like erasure coding) are not easy to implement on tape.
So, if my assumptions are correct, the first issues to address are power consumption and usability. It may sound weird, but this is the real problem: it's hard to achieve good power consumption and usability at the same time.
Even power consumption alone is not easy to address: for example, you can slow a disk down, but not too much, because of the risk of scratching the platter surfaces. The only real option is to completely stop the disk and park the heads; the power consumed by the electronics alone is minimal.
On the other hand, current hard disks can't be accessed while stopped, and they were not designed to sustain many start-stop cycles.
The (expected) next step
The first approach is primarily software. If you take a look at the Open Compute cold storage hardware project, you'll find a proposal to build a rack with a relatively minimal CPU and tons of disks. In the biggest environments you could think of a single powered-up disk in each rack/node! If you have 100 racks, you have 100 active disks at a time (instead of tens of thousands); the software spreads data across those disks, and when they fill up you can stop them and turn on other disks.
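The write path of that scheme can be sketched in a few lines: each rack keeps exactly one disk powered, appends to it until it is full, then spins it down and wakes the next one. All class and method names here are hypothetical, just to illustrate the idea:

```python
# Minimal sketch of the "one active disk per rack" write path.
# Names and power-control hooks are hypothetical.
class Rack:
    def __init__(self, rack_id, disks, disk_capacity):
        self.rack_id = rack_id
        self.disks = list(range(disks))   # disk indices, all spun down
        self.active = self.disks[0]       # only one disk powered at a time
        self.used = 0
        self.capacity = disk_capacity

    def write(self, size):
        """Append to the active disk; when it fills up, spin it down
        and power up the next cold disk in the rack."""
        if self.used + size > self.capacity:
            self.spin_down(self.active)
            self.active += 1
            self.spin_up(self.active)
            self.used = 0
        self.used += size
        return (self.rack_id, self.active)  # where the data landed

    def spin_up(self, disk):
        pass   # platform-specific power control would go here

    def spin_down(self, disk):
        pass
```

With 100 such racks, only 100 spindles ever draw power, yet every rack keeps accepting writes.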
The drawback here could be metadata management. You also need to maintain a separate metadata repository (like a small SSD on each node) to guarantee immediate access to metadata when needed. Obviously, when you need an object/file you have to spin up the relevant disks. Potentially this is very easy to implement: it's just clever software. A further improvement could be building slower (but energy-efficient) hard disks capable of sustaining as many start-stop cycles as possible…
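The read path then hinges on that always-on metadata repository: look up the object's location on flash without touching any spindle, and wake only the one disk that holds it. Again, a minimal sketch with hypothetical names:

```python
# Sketch of an always-on metadata index (e.g. kept on a small SSD)
# that tells the read path which spun-down disk to wake.
class MetadataIndex:
    def __init__(self):
        self._index = {}                  # object name -> (rack, disk)

    def record(self, name, location):
        self._index[name] = location      # served from flash, disks stay off

    def locate(self, name):
        return self._index.get(name)

def read_object(index, name, spin_up):
    """Resolve the object's location without spinning anything,
    then power up only the disk that holds it."""
    location = index.locate(name)
    if location is None:
        raise KeyError(name)
    spin_up(location)                     # wake one disk out of thousands
    return location
```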
[disclaimer: take the following paragraph with a grain of salt; it is based on second-hand rumors] The second approach is a smart (hybrid) hard drive. If a traditional hard drive is equipped with a small amount of flash, you can imagine having data and metadata close to each other without needing a separate metadata repository. You can spin down, and then turn off, the mechanical part while retaining access to the flash. The SSD and the spinning media could be accessed as two different devices, or could even change their behavior via APIs if needed (for example, the SSD could also be used as a big cache or for automatic tiering).
The latter option could result in a simpler software layer and could open the door to many other applications, but at the moment it's more a wish than a real scenario.
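Since no such device exists yet, the interface below is purely speculative, but it shows why the hybrid idea would simplify the software layer: metadata lookups are served from the drive's own flash, and the spindle only powers up when the bulk data is actually read.

```python
# Speculative sketch of a hybrid drive: metadata lives on on-board
# flash, the spindle wakes only for bulk data. Interface is invented.
class HybridDrive:
    def __init__(self):
        self.flash = {}           # metadata, always accessible
        self.spindle = {}         # bulk data on the mechanical part
        self.spindle_on = False

    def put(self, name, data, meta):
        self.flash[name] = meta   # metadata always lands on flash
        self._wake()
        self.spindle[name] = data

    def stat(self, name):
        return self.flash[name]   # no spin-up needed for metadata

    def get(self, name):
        self._wake()              # spindle powers up only for data reads
        return self.spindle[name]

    def spin_down(self):
        self.spindle_on = False   # flash stays available

    def _wake(self):
        self.spindle_on = True
```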
In either of the described scenarios, power consumption would drop. And disks don't suffer from tape's issues: migrations, periodic rereads, or the inapplicability of data-efficiency techniques.
This is not a viable scenario today; the technology isn't ready… but, given the rumors and information I've heard in recent months, I think that sooner or later something will happen.