Yesterday I was in a briefing with Kevin Brown (CEO of Coraid). We were talking about the latest news from his company when he said something like this: “in the past people managed small data in big boxes, now they are managing big data in [many] small boxes”. I couldn’t agree more. I dug into this statement, and this post is the result of my thoughts.

The trend looks solid: vendors working on next-generation storage architectures are building their products on a few fundamental pillars. These are commodity hardware, strong parallelization/scale-out, flash, Ethernet, software and APIs. After the SDN hype, the “software-defined” buzzword is gaining traction in the storage landscape too and, perhaps, it’s not all that bad…

I’m not here to give you the right definition of software-defined storage; I’m sure that you can buy it from well-known super-analysts. I have already written in the past about enterprise storage commoditization (here too) and the future of storage in virtualized environments. Now that VMware and a bunch of startups are officially heading in this direction, it’s time to elaborate on this concept.

Commodity

Nothing new here. All the new startups and almost all tier 1 vendors are working on commodity x86 hardware. Some of them have been building traditional architectures on x86 hardware (two or more servers acting as controllers connected to disk trays) since the beginning of the last decade. Now that scale-out architectures are gaining a lot of traction, commodity x86 servers are a must! They are cheap and fast, and the reliability comes from the software.

Scale-out

It looks like the only way to go, especially when we speak about football-field-sized data centers. All the next-generation architectures, especially virtualized and cloud infrastructures, are targeting horizontal scaling. When we talk about storage alone it’s the same story and, even more so, the concept of “no-SAN” architectures is growing and storage is beginning to be encapsulated in the compute nodes. These kinds of architectures aren’t designed to obtain 1 million IOPS from a single node, but it’s not hard to obtain more than that from many nodes stacked in a single rack.
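To put some numbers on that last point, here is a minimal back-of-the-envelope sketch. It assumes ideal linear scaling (no network or coordination overhead), which real clusters only approximate, and the per-node figure used in the example is a made-up number, not a measurement of any product. The function names are mine, chosen just for illustration.

```python
def aggregate_iops(node_count: int, iops_per_node: int) -> int:
    """Combined IOPS of identical nodes, assuming ideal linear scaling."""
    if node_count < 0 or iops_per_node < 0:
        raise ValueError("node_count and iops_per_node must be non-negative")
    return node_count * iops_per_node


def nodes_needed(target_iops: int, iops_per_node: int) -> int:
    """Smallest number of identical nodes whose combined IOPS reaches the target."""
    if target_iops <= 0 or iops_per_node <= 0:
        raise ValueError("target_iops and iops_per_node must be positive")
    # Integer ceiling division, avoids floating point rounding.
    return -(-target_iops // iops_per_node)


if __name__ == "__main__":
    # Hypothetical node delivering 50,000 IOPS (an assumption for illustration).
    per_node = 50_000
    count = nodes_needed(1_000_000, per_node)
    print(count, aggregate_iops(count, per_node))
```

With small 1 or 2 rack unit nodes, that count fits comfortably in one rack, which is the whole point of the scale-out approach.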

Flash

SSD technology is becoming a key component in every modern storage architecture. You can find it in different form factors and implemented in different ways, but it’s always present. End users are seeing the performance benefits and the ROI, and they are the first to ask for it. Scale-out design strongly encourages the use of this technology: when we analyze the single node, or storage brick, we can easily find a two-tier configuration based on flash coupled with SATA drives to obtain both performance and capacity in 1 or 2 rack units.

Ethernet

Well, you know, the effort to push converged Ethernet into storage is overwhelming, and it’s obvious that, at the moment, iSCSI and NFS are the clear winners. The reasons are easy to find: you can look at protocols like FCoE as an attempt to mix water and oil in the same pipe. It works pretty well, but you need a lot of machinery at the ends to separate them. In the end, iSCSI and NFS are easy to use, well known and widely supported and, let me say, they inherited some of the benefits that FCoE brought to Ethernet.
Another reason for their success is linked to the nature of scale-out design: commoditization (again). In fact, iSCSI and NFS work on commodity switches, and the often-mentioned “software-defined networking” is moving the networking intelligence up to the server layer.

Software

How can you talk about software-defined storage without mentioning software? In fact, software glues all the pieces together. Scale-out filesystems, synchronization and data replication between nodes, reliability and availability, features and integration with the upper layers are all done in software. This is the key point and the value. When you look at storage today you are already looking at the software that manages your data: deduplication/compression, snapshots, plug-ins, thin provisioning and so on are all software features, and they are the features that describe the capabilities of your storage system.
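Just to give an idea of what “reliability from software” can look like, here is a minimal sketch of replica placement across commodity nodes. I picked rendezvous (highest random weight) hashing as the illustration; this is my choice for the example, not a description of how any particular vendor does it, and the node names and the `place_replicas` function are hypothetical.

```python
import hashlib
from typing import List, Set


def place_replicas(block_id: str, nodes: List[str], replicas: int = 2) -> List[str]:
    """Pick `replicas` distinct nodes for a block using rendezvous hashing.

    Every node gets a deterministic score for the block; the highest scores win.
    """
    if replicas < 1 or replicas > len(nodes):
        raise ValueError("replicas must be between 1 and the number of nodes")

    def score(node: str) -> int:
        digest = hashlib.sha256(f"{node}:{block_id}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    return sorted(nodes, key=score, reverse=True)[:replicas]


def surviving_replicas(placement: List[str], failed: Set[str]) -> List[str]:
    """Nodes that still hold a copy of the block after some nodes fail."""
    return [node for node in placement if node not in failed]


if __name__ == "__main__":
    cluster = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical names
    placement = place_replicas("block-42", cluster, replicas=2)
    # Losing one of the two holders still leaves one copy readable.
    print(placement, surviving_replicas(placement, {placement[0]}))
```

The nice property of this approach is that removing a node that does not hold a given block leaves that block’s placement untouched, so a failure only moves the data that actually lived on the failed box.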

APIs

This is the step forward in storage management, and probably in the whole infrastructure stack. The only way to manage huge software installations in complex environments is with APIs. This is even more true for cloud infrastructures, where you need to automate and manage provisioning, SLAs and security. A good set of APIs could mean shrinking provisioning time from days to minutes.
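Here is a minimal sketch of why this matters for automation. The `StoragePool` class below is a hypothetical in-memory stand-in for a storage management API (it is not any real product’s interface); the point is only that once provisioning is a function call, a script can serve many requests in one loop instead of one ticket at a time.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StoragePool:
    """Hypothetical in-memory stand-in for a storage system's management API."""

    capacity_gb: int
    volumes: Dict[str, int] = field(default_factory=dict)

    @property
    def free_gb(self) -> int:
        """Capacity not yet assigned to any volume."""
        return self.capacity_gb - sum(self.volumes.values())

    def provision_volume(self, name: str, size_gb: int) -> None:
        """Create a volume, refusing duplicates and requests that do not fit."""
        if size_gb <= 0:
            raise ValueError("size_gb must be positive")
        if name in self.volumes:
            raise ValueError(f"volume {name!r} already exists")
        if size_gb > self.free_gb:
            raise RuntimeError(f"not enough free capacity for {name!r}")
        self.volumes[name] = size_gb


def provision_all(pool: StoragePool, requests: Dict[str, int]) -> None:
    """Serve a batch of provisioning requests (volume name -> size in GB)."""
    for name, size_gb in requests.items():
        pool.provision_volume(name, size_gb)


if __name__ == "__main__":
    pool = StoragePool(capacity_gb=1000)  # made-up capacity for the example
    provision_all(pool, {"tenant-a": 200, "tenant-b": 300})
    print(pool.volumes, pool.free_gb)
```

In a real environment the method body would be a call to the vendor’s API, and the same loop could also apply SLA and security policies per volume.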

Bottom line

“Storage is software” is not news, but software-defined storage is a further step toward the realization of the software-defined datacenter. We will see a lot of movement here in the near future…