By now everyone and their dog has written a post about VAAI, so I won’t bother you with an extensive explanation of what VAAI is and why it’s crucial to virtualization. I will simply refer you to a couple of posts that explain its current implementation in detail:

My focus will be on how I think VAAI could be accelerated even further by enhancing its storage side.

To explain my point of view I will draw an analogy with a common feature found in storage arrays today: Point-in-Time copies.

Point-in-Time (PIT) copies, sometimes referred to as snapshots, are a really valuable feature. They provide a consistent point-in-time image of a specified data set that can then be used for various tasks such as backups, environment duplication and so on.

Traditionally, PIT copies were made using a technique called Copy-on-Write. It is suitable for a small number of PIT copies of a single LUN, but its performance penalty kicks in as soon as the first PIT copy is created, because the first overwrite of any block has to preserve the old data before the new data can land. The PIT copy concept was pioneered by IBM with its FlashCopy functionality.

NetApp innovated the approach to PIT copies with a different, pointer-based snapshot technique. This almost completely eliminated the performance issue and made it possible to keep a massive number of snapshots per LUN, unlocking the full potential of the PIT concept. This post explains in detail how the Compellent Storage Center pointer-based snapshots work; however, the technique is not specific to Compellent. Almost all the next-generation storage arrays (like IBM XIV, NetApp FAS, 3PAR InServ, Dell EqualLogic, HP LeftHand and many others) use the same approach.
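
To make the difference concrete, here is a minimal Python sketch of the two techniques. It is a toy model under my own assumptions, not any vendor's implementation: the class names, the `io_count` counter and the rule "one read plus one write per snapshot that still needs the old block" are all hypothetical simplifications chosen for illustration.

```python
class CopyOnWriteVolume:
    """Toy copy-on-write volume: each snapshot stores preserved old blocks."""

    def __init__(self, nblocks):
        self.blocks = [b"\0"] * nblocks   # live data, overwritten in place
        self.snapshots = []               # one dict per snapshot: index -> old data
        self.io_count = 0                 # hypothetical counter of block I/Os

    def snapshot(self):
        self.snapshots.append({})         # metadata only, no block copied yet
        return len(self.snapshots) - 1

    def write(self, idx, data):
        # Snapshots that have not yet preserved this block need the old data.
        pending = [s for s in self.snapshots if idx not in s]
        if pending:
            old = self.blocks[idx]
            self.io_count += 1            # read the old block
            for snap in pending:
                snap[idx] = old
                self.io_count += 1        # write it to the snapshot area
        self.blocks[idx] = data
        self.io_count += 1                # finally write the new data

    def read_snapshot(self, snap_id, idx):
        return self.snapshots[snap_id].get(idx, self.blocks[idx])


class PointerVolume:
    """Toy pointer-based volume: a snapshot is a frozen copy of the pointer map."""

    def __init__(self, nblocks):
        self.store = [b"\0"] * nblocks    # physical blocks, never overwritten here
        self.live = list(range(nblocks))  # logical index -> physical index
        self.snapshots = []
        self.io_count = 0

    def snapshot(self):
        self.snapshots.append(list(self.live))  # copy pointers, not data
        return len(self.snapshots) - 1

    def write(self, idx, data):
        self.store.append(data)           # new data goes to a fresh block
        self.live[idx] = len(self.store) - 1
        self.io_count += 1                # a single block write

    def read_snapshot(self, snap_id, idx):
        return self.store[self.snapshots[snap_id][idx]]
```

In this model, the first overwrite of a block after one snapshot costs three block I/Os on the copy-on-write volume and one on the pointer-based volume, and the copy-on-write cost grows with every additional snapshot that still needs the old block.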

So basically we have a great concept (PIT copies) with most of its potential still locked up by its implementation (Copy-on-Write), and then an innovator that unlocks its full potential with a clever implementation. I’m pretty sure that VAAI is still in its “Copy-on-Write” stage of life :-).

As you already know, VAAI is implemented using an extended SCSI command set. Let’s take as an example the most sought-after feature: the hardware offloaded copy.
In my opinion the hardware offloaded copy could be accelerated by something like 100,000x, making every cloning task a matter of a few seconds. Here’s how:

Keep in mind how a pointer-based snapshot works and bear with me while I explain:

A 16GB VM sitting in a 128GB datastore is currently being accessed by an ESX host.

Then a VAAI-enabled clone request is issued by the host. Instead of doing a real block-to-block copy, the storage array simply creates a “map” of pointers for the cloned VM on another portion of the datastore, locking its space but without copying a single block. This operation should take the same time as a normal snapshot: a few seconds.

Then the host starts to write to the new cloned VM, and the delta differences are stored in the blocks locked by the “map” previously created.
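
The flow above can be sketched in a few lines of Python. This is only a toy model of the idea, assuming a datastore made of 1GB “blocks”; the `Datastore` class and its `create`, `pointer_clone`, `write` and `read` methods are hypothetical names I made up for illustration and do not correspond to any real VAAI primitive or array API.

```python
class Datastore:
    """Toy datastore where a clone shares blocks with its source via pointers."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.reserved = 0        # blocks locked for files, whether written or not
        self.physical = {}       # physical block id -> data
        self.refs = {}           # physical block id -> number of pointers to it
        self.files = {}          # file name -> pointer map (list of block ids)
        self.next_id = 0
        self.blocks_copied = 0   # stays at 0: nothing here ever copies a block

    def _reserve(self, nblocks):
        if self.reserved + nblocks > self.capacity:
            raise RuntimeError("datastore full")
        self.reserved += nblocks

    def _alloc(self, data):
        pid = self.next_id
        self.next_id += 1
        self.physical[pid] = data
        self.refs[pid] = 1
        return pid

    def create(self, name, blocks):
        self._reserve(len(blocks))
        self.files[name] = [self._alloc(b) for b in blocks]

    def pointer_clone(self, src, dst):
        """Clone by building a pointer map and locking space; no data moves."""
        pointers = list(self.files[src])
        self._reserve(len(pointers))     # lock the clone's full size up front
        for pid in pointers:
            self.refs[pid] += 1          # both files now point at the block
        self.files[dst] = pointers

    def write(self, name, idx, data):
        """Store the delta in a fresh block so the other file is unaffected."""
        old = self.files[name][idx]
        self.refs[old] -= 1
        if self.refs[old] == 0:          # nobody points at the old block anymore
            del self.physical[old]
            del self.refs[old]
        self.files[name][idx] = self._alloc(data)

    def read(self, name, idx):
        return self.physical[self.files[name][idx]]


if __name__ == "__main__":
    ds = Datastore(capacity_blocks=128)          # 128GB datastore
    ds.create("vm", [b"base"] * 16)              # 16GB VM
    ds.pointer_clone("vm", "vm-clone")           # pointers only
    ds.write("vm-clone", 3, b"delta")            # first write after the clone
    print(ds.blocks_copied, ds.reserved, len(ds.physical))
```

The clone reserves its full 16 blocks immediately, so it can never run out of space later, but the number of physical blocks in use only grows as the clone (or the source) is actually written to.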

A similar task can already be done today using snapshots, but it immediately becomes cumbersome because every clone needs to reside on its own LUN and datastore. The approach described here, instead, can be applied “inside” a datastore, streamlining deployments. Just imagine a VDI infrastructure relying on such a cloning technique! :-).

I’m sure that storage vendors will try to integrate and innovate in their respective VAAI implementations. I hope this post made you realize how powerful the still-evolving VAAI approach can be.