A few days ago I had an interesting chat with Andy Warfield at Coho Data, and the relationship between network and storage came up several times. (Quick disclaimer: I’m currently doing some work for Coho.)
In a couple of my latest articles (here and here) I talked about why many large IT organizations prefer PODs over other datacenter topologies, but I completely forgot to talk about networking (I also have to admit that networking is not my field at all). So, this article could be the right follow-up to those posts.
Network is Storage
The problem is simple. In the past, primary enterprise networked storage was FC only. Topology, connectivity, protocol management and everything else were easy, if not non-issues altogether. Yes, at scale everything becomes more complex, but FC was still more manageable than any alternative.
Things got more and more complicated with the introduction of Ethernet fabrics. At first it was just secondary storage, NAS repositories and other non-critical applications… but, year after year, Ethernet-based protocols matured, switches became faster and faster, and hardware was commoditized. Now most modern storage systems are going Ethernet first… long story short? Ethernet won hands down. NVMe over Fabrics? It runs on Ethernet too!
Operations have changed too, reflecting the opportunities offered by this powerful commodity hardware and the continuous search for simplicity. Today there are many more less specialized SysAdmins than in the past: “jacks of all trades” who understand and manage all the infrastructure components without being true specialists… and, here too, consolidating on a single protocol/technology/wire has been much more convenient for everybody!
Storage is Network
But storage has evolved a lot too, and has changed drastically in the last few years! Until a few years ago the biggest problem in storage was performance, wasn’t it?
The fastest (15K RPM) hard disks were (and still are!) capable of about 180-200 IOPS with 4-7 ms of latency, and any expedient was welcome to squeeze a little more out of them… remember short stroking, for example?
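Where does that figure come from? A minimal back-of-the-envelope sketch, assuming one outstanding random I/O, negligible transfer time for small blocks, and a hypothetical average seek time of 3.5 ms: at 15K RPM one revolution takes 4 ms, so on average the head waits half of that for the sector. The 2.0 ms seek used for the short-stroked case is also an assumption, only meant to show why the trick helped.

```python
# Back-of-the-envelope random IOPS for a single spinning disk.
# Assumptions (hypothetical, not from any datasheet): one outstanding I/O,
# transfer time of a small block ignored, seek times chosen for illustration.

def avg_rotational_latency_ms(rpm: float) -> float:
    """On average the platter turns half a revolution to reach the sector."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2


def random_iops(rpm: float, avg_seek_ms: float) -> float:
    """IOPS = 1 / (seek + rotational latency), with times in ms."""
    service_time_ms = avg_seek_ms + avg_rotational_latency_ms(rpm)
    return 1000.0 / service_time_ms


full_stroke = random_iops(15_000, avg_seek_ms=3.5)   # assumed full-disk seek
# Short stroking uses only the outer tracks, so the head travels less.
short_stroke = random_iops(15_000, avg_seek_ms=2.0)  # assumed reduced seek

print(f"rotational latency: {avg_rotational_latency_ms(15_000):.1f} ms")
print(f"random IOPS: {full_stroke:.0f} (short-stroked: {short_stroke:.0f})")
# rotational latency: 2.0 ms
# random IOPS: 182 (short-stroked: 250)
```

The resulting 5.5 ms service time lands inside the 4-7 ms range mentioned above: mechanics, not electronics, set the ceiling.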
CPUs and networks were already much faster, and many software and hardware mechanisms were in place to mitigate what was the real bottleneck of the entire infrastructure.
Now you have all sorts of flash memory, and there is more coming in the form of Storage Class Memory (Intel 3D XPoint, for example). Storage is no longer the bottleneck: it can be outrageously fast now and will be even faster tomorrow! And it is not only about 3D XPoint: within a very few years, 3D TLC (QLC?) flash will have a $/GB comparable to SAS/SATA HDDs, but with IOPS rates as high as today’s MLC (for reads, at least). In practice, this means that capacity and performance could easily come from the same primary storage system… leading to massive throughput and IOPS from a single system.
Is networking the next bottleneck for storage?
It has already happened in large-scale infrastructures (Big Data, for example), and it is already happening in other contexts.
If you consolidate capacity and performance in a single system, throughput can exceed what the network can sustain, throwing the whole infrastructure off balance.
A traditional array, connected to a limited number of Ethernet ports concentrated in a few switches, could become a major problem. Even a scale-out storage array, if concentrated in a single rack and accessed through top-of-rack (ToR) switches, would have the same problem.
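To put a rough number on it, here is a minimal sketch comparing what an all-flash system could push with what its front-end ports can carry. All figures are hypothetical (1,000,000 IOPS at 8 KiB, 10GbE ports, a 70% utilization cap), and protocol overhead is ignored, so a real port count would be higher.

```python
import math


def array_throughput_gbps(iops: float, block_size_bytes: int) -> float:
    """Aggregate throughput in gigabits per second (1 Gb = 10**9 bits)."""
    return iops * block_size_bytes * 8 / 1e9


def ports_needed(throughput_gbps: float, port_speed_gbps: float,
                 max_utilization: float = 1.0) -> int:
    """Minimum number of ports to carry the load without exceeding the
    utilization cap. Ethernet/IP/TCP and storage-protocol overhead is ignored."""
    return math.ceil(throughput_gbps / (port_speed_gbps * max_utilization))


# Hypothetical all-flash array: 1M random IOPS with 8 KiB blocks.
load = array_throughput_gbps(1_000_000, 8 * 1024)
print(f"{load:.1f} Gb/s")                            # 65.5 Gb/s
print(ports_needed(load, 10))                        # 7 ports at line rate
print(ports_needed(load, 10, max_utilization=0.7))   # 10 ports at a 70% cap
```

Even under these generous assumptions, one system needs about ten 10GbE links to absorb bursts, and all of them land on a few switches.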
With traffic consolidated on a single wire, traditional three-tier network topologies are becoming more and more difficult to manage. Each compute node can now host tens or hundreds of VMs (and thousands of containers per server in the future is not that far-fetched), and once storage traffic is added, each server can easily move huge amounts of data per second. VM and workload mobility make things even worse, and relying on a single point of access for storage (a single switch in a single rack) is just crazy!
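A similar sketch, again with made-up numbers, shows why concentrating storage behind one rack’s switches hurts: multiply workloads per server by their average traffic, add up the rack, and compare the result with the uplink capacity toward the rest of the fabric. The workload count, per-workload bandwidth and uplink configuration below are all hypothetical.

```python
def server_demand_gbps(workloads: int, avg_mbps_per_workload: float) -> float:
    """Average traffic generated by one server, in Gb/s."""
    return workloads * avg_mbps_per_workload / 1000


def oversubscription_ratio(servers: int, per_server_gbps: float,
                           uplinks: int, uplink_speed_gbps: float) -> float:
    """Offered load divided by uplink capacity; above 1.0 means contention
    whenever all servers need to leave the rack at the same time."""
    return (servers * per_server_gbps) / (uplinks * uplink_speed_gbps)


# Hypothetical rack: 40 servers, 200 VMs each averaging 50 Mb/s,
# and 4 x 40GbE uplinks from the ToR switch.
per_server = server_demand_gbps(200, 50)
ratio = oversubscription_ratio(40, per_server, 4, 40)
print(f"{per_server:.0f} Gb/s per server, {ratio:.1f}:1 oversubscription")
# 10 Gb/s per server, 2.5:1 oversubscription
```

If that traffic has to reach storage sitting behind a single switch in another rack, all of it crosses these uplinks; with containers instead of VMs, the workload count (and the ratio) only goes up.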
Some scale-out storage solutions (HDFS, for example) have already addressed this problem with data localization, putting chunks of data close to the compute resources that will probably use them. At the same time, new flatter network topologies, like leaf-spine, are preferable because they minimize the number of hops. In some cases, as with Coho, SDN is another key technology that can boost overall infrastructure efficiency: storage nodes can be placed in different racks and traffic distributed transparently, while keeping the advantages of data localization and a single logical point of access.
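Data localization can be illustrated with a small sketch, loosely in the spirit of how HDFS prefers the closest replica when a client reads a block. The distance values (0 for the same node, 2 for the same rack, 4 for another rack), the Location type and the rack/node names are simplifications chosen for illustration, not HDFS’s actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    rack: str
    node: str


def distance(a: Location, b: Location) -> int:
    """Simplified network distance: 0 same node, 2 same rack, 4 other rack."""
    if a == b:
        return 0
    if a.rack == b.rack:
        return 2
    return 4


def pick_replica(reader: Location, replicas: list[Location]) -> Location:
    """Return the closest replica; min() keeps the first one on ties."""
    return min(replicas, key=lambda r: distance(reader, r))


# Hypothetical cluster: the reader runs on rack1/node3.
reader = Location("rack1", "node3")
replicas = [Location("rack2", "node7"), Location("rack1", "node5"),
            Location("rack3", "node1")]
print(pick_replica(reader, replicas))
# Location(rack='rack1', node='node5')
```

A same-rack read never leaves the ToR switch; when a cross-rack read is unavoidable, a leaf-spine fabric keeps it to a leaf-spine-leaf path, which is why locality and flatter topologies complement each other.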
Closing the circle
Is it history repeating? In the ’90s, NUMA (Non-Uniform Memory Access) got a lot of attention in computer design. Now, at the infrastructure level, it looks like we are facing similar problems with very similar solutions!
It’s always the same problem: how to design efficient and balanced systems (infrastructures, in this case). Storage is no longer the bottleneck and, thanks to different types of non-volatile memory, CPUs and networks no longer waste time waiting for data… in fact, we are now starting to experience the opposite problem!
Here you can find a really interesting article that analyzes the problem in much more depth and sketches possible future scenarios… enjoy the read!
If you want to know more about this topic, I’ll be presenting at the next TECHunplugged conference in Austin on 2/2/16: a one-day event focused on cloud computing and IT infrastructure, with an innovative formula that combines a group of independent, insightful and well-recognized bloggers with disruptive technology vendors and end users who manage rich technology environments. Join us!