On Saturday I was wandering around the storage blogosphere and stopped by Nigel’s blog for a really nice article on Sub-LUN Tiering design considerations. In that post he made the point that SSDs are not that reliable for writes (in a scenario where SSDs are treated almost as a “cache”) because of an issue called “write amplification”. That sparked my interest in writing a post that explains, in a technical-to-layman fashion, how an SSD works and what kind of magic trickery it does inside its case in order to go that fast :-).

First things first: there are a number of different devices called SSDs today, but I will focus only on NAND-based SSDs that are drop-in replacements for the traditional hard drives commonly found in storage systems and even consumer devices.

What is NAND?

NAND is a nonvolatile solid-state memory. Nonvolatile memory holds its data even when the power is turned off. NAND stores data in a large array of transistors, called cells: SLC (single-level cell) devices store one bit of data in each cell, while MLC (multi-level cell) devices store two bits per cell.

Compared to traditional NOR Flash memory, NAND Flash memory can pack a greater number of storage cells in a given area of silicon. This gives NAND Flash density and cost advantages over other nonvolatile memory. NAND achieves these advantages by sharing some of the common areas of the storage transistor, which creates strings of serially connected transistors (in NOR devices, each transistor stands alone). This serial cell architecture explains the device name: NAND (not AND) is the boolean logic reference to how information is read out of these cells.

SLC vs. MLC

Traditional, single-level cell (SLC) NAND Flash memory stores one bit of information per memory cell. This basic technology enables faster transfer speeds, lower power consumption, and increased endurance. Multiple-level cell (MLC) NAND, by comparison, stores two bits of information per memory cell, effectively doubling the amount of data that can be stored in a similar-size NAND Flash device, but that comes at a cost.

As the table below shows, MLC cells have a shorter lifespan and worse performance than SLC cells:

Features                 MLC            SLC
Density                  32 / 64 Mbit   16 Mbit
Bits per Cell            2              1
Voltage                  3.3 V          3.3 V, 1.8 V
Page Size                2 / 4 KB       2 KB
Erase / Program Cycles   < 10,000       < 100,000
Read Performance         50 µs          25 µs
Write Performance        600 / 900 µs   200 / 300 µs
Erase Performance        3 ms           1.5 / 2 ms

Until last year, SLC drives were the only enterprise-grade SSDs available on the market (notably the STEC ZeusIOPS), but thanks to technology advancements, MLC drives have recently made their appearance in the enterprise market in a new variant called E-MLC (enterprise-grade multi-level cell).

How SSDs operate internally

SSDs inherit many quirks and nuances from their NAND cells. For instance, NAND must be erased an entire block at a time (an operation that takes nearly 2,000 µs), and a write (or program) can only go to pages that have already been erased.
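To make the erase-before-write rule concrete, here is a minimal Python sketch of a single NAND block. The class name NandBlock, the four-page block size and the use of None for an erased page are all assumptions made for illustration; real blocks hold many more pages and no actual controller is modeled here.

```python
class NandBlock:
    """Toy model of one NAND block: pages are programmed one by one, erased together."""

    def __init__(self, pages_per_block=4):
        # None marks an erased page that is ready to be programmed.
        self.pages = [None] * pages_per_block
        self.erase_count = 0

    def program(self, page_index, data):
        # A page can only be programmed while it is in the erased state.
        if self.pages[page_index] is not None:
            raise ValueError("page must be erased before it can be programmed")
        self.pages[page_index] = data

    def erase(self):
        # Erasing always affects the whole block, never a single page.
        self.pages = [None] * len(self.pages)
        self.erase_count += 1


block = NandBlock()
block.program(0, b"old")
try:
    block.program(0, b"new")  # in-place overwrite is rejected
    overwrite_ok = True
except ValueError:
    overwrite_ok = False

block.erase()                 # wipes every page in the block
block.program(0, b"new")      # now the page can be written again
```

Note that the erase also throws away whatever was stored in the other pages of the block, which is exactly why the controller has to move valid data around before erasing.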

This leads to a phenomenon called Write Amplification. Because Flash memory must be erased before it can be rewritten, performing these operations results in moving (or rewriting) user data and metadata more than once. This multiplying effect increases the number of writes required over the life of the SSD, which shortens the time it can reliably operate. The extra writes also consume bandwidth to the Flash memory, which mainly reduces the random write performance of the SSD. Many factors affect the write amplification of an SSD: some can be controlled by the user, and some are a direct result of the data written to the SSD and of how it is used.
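Write amplification is usually expressed as a simple ratio: the amount of data physically written to the Flash divided by the amount of data the host asked to write. The sketch below computes it for a made-up scenario; the function name and the numbers are hypothetical and only meant to show the arithmetic.

```python
def write_amplification(host_bytes_written, flash_bytes_written):
    """Ratio of data physically written to flash versus data the host asked to write."""
    if host_bytes_written <= 0:
        raise ValueError("host_bytes_written must be positive")
    return flash_bytes_written / host_bytes_written


# Hypothetical scenario: the host rewrites one 4 KB page, but the controller
# also has to relocate three other valid 4 KB pages before erasing the block.
PAGE = 4 * 1024
host_bytes = 1 * PAGE
flash_bytes = (1 + 3) * PAGE

waf = write_amplification(host_bytes, flash_bytes)
print(waf)  # 4.0
```

A value of 1.0 would mean that every host write costs exactly one Flash write; anything above that is wear and bandwidth the user never asked for.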

When data is first written to an SSD, the cells all start in an erased state, so data can be written directly, one page at a time (pages are often 4-8 KB in size). The SSD controller, which manages the Flash memory and interfaces with the host system, maps the logical block addresses (LBAs) used by the host to physical locations in the Flash memory; this logical-to-physical mapping is part of the Flash Abstraction Layer (more on that later). When new data comes in to replace older data already written, the SSD controller writes the new data to a new location and updates the logical mapping to point to the new physical location. The old location no longer holds valid data, and it will eventually need to be erased before it can be written again.
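The remapping described above can be sketched in a few lines of Python. This is a toy model under stated assumptions: MappingTable, its fields and the "take the next free page" policy are invented for the example, and a real controller keeps far more state than this.

```python
class MappingTable:
    """Toy logical-to-physical map with out-of-place updates."""

    def __init__(self, total_pages):
        self.l2p = {}                               # logical address -> physical page
        self.free_pages = list(range(total_pages))  # erased pages, ready to program
        self.invalid_pages = set()                  # stale pages waiting for an erase
        self.flash = {}                             # physical page -> stored data

    def write(self, lba, data):
        if not self.free_pages:
            raise RuntimeError("no erased pages left; garbage collection needed")
        new_page = self.free_pages.pop(0)
        self.flash[new_page] = data
        old_page = self.l2p.get(lba)
        if old_page is not None:
            # The old copy is not erased now, it is only marked as stale.
            self.invalid_pages.add(old_page)
        self.l2p[lba] = new_page

    def read(self, lba):
        return self.flash[self.l2p[lba]]


table = MappingTable(total_pages=8)
table.write(lba=10, data="v1")  # lands on physical page 0
table.write(lba=10, data="v2")  # lands on physical page 1, page 0 becomes stale
```

After the second write the host still reads LBA 10 and gets "v2", while physical page 0 sits there holding stale data until something erases its block.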

FAL to the rescue!

The Flash Abstraction Layer (FAL) provides a high-level abstraction of the physical organization of NAND Flash memory devices. It emulates the rewriting of sectors in hard disks by remapping new data to another location in the memory array and marking the previous sector invalid.

To better wrap your head around this concept, think of it as something that resembles a database log or the NetApp WAFL (Write Anywhere File Layout).

Typically the SSD is overprovisioned by the manufacturer, meaning that there is a difference between the physical capacity of the Flash memory and the logical capacity available to the user. This additional space is used by the FAL modules, which are normally included in every SSD controller:
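Overprovisioning is commonly quoted as a percentage of the user-visible capacity. The helper below shows that arithmetic; the function name and the 128/100 capacities are hypothetical figures chosen to keep the numbers round, not the specs of any particular drive.

```python
def overprovisioning_percent(physical_capacity, user_capacity):
    """Spare capacity expressed as a percentage of the user-visible capacity."""
    if user_capacity <= 0 or physical_capacity < user_capacity:
        raise ValueError("capacities must satisfy 0 < user <= physical")
    return (physical_capacity - user_capacity) * 100 / user_capacity


# Hypothetical drive: 128 units of raw Flash, 100 units exposed to the host.
op = overprovisioning_percent(physical_capacity=128, user_capacity=100)
print(op)  # 28.0
```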

Translation Module

The Translation module, which is the primary interface in the FAL, provides the translation from virtual to physical addresses and converts the logical operations into physical operations on the Flash memory device. It also handles the exporting of all operations available on storage media (for example: write sector, read sector and format partition).

Wear Leveling Module

The Wear Leveling module ensures that the memory array is used uniformly by monitoring and evenly distributing the number of erase cycles per block. Each time a block is requested by the Translation module, the Wear Leveling module allocates the least used block. The number of “Program/Erase cycles” is the number of write/erase operations a block can endure; for a single-level cell (SLC) NAND device it is up to 100,000 cycles.
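The "allocate the least used block" rule boils down to picking the free block with the lowest erase counter. Here is a minimal sketch, assuming the erase counts live in a plain dictionary; the function name and the sample counters are made up for the example.

```python
def allocate_block(erase_counts, free_blocks):
    """Return the free block that has been erased the fewest times."""
    if not free_blocks:
        raise RuntimeError("no free blocks available")
    return min(free_blocks, key=lambda block: erase_counts[block])


# Hypothetical erase counters for four blocks; block 1 is currently in use.
erase_counts = {0: 120, 1: 15, 2: 98, 3: 15}
free_blocks = [0, 2, 3]

chosen = allocate_block(erase_counts, free_blocks)
print(chosen)  # 3
```

Block 1 has an equally low counter but is not free, so the least worn candidate is block 3.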

Garbage Collection Module

As the FAL emulates rewriting sectors in hard disks by remapping new data to another location of the memory array and marking the previous sector invalid, eventually it may be necessary to free some of the invalid memory space to allow further data to be written. To do this, the FAL implements the Garbage Collection module, where the valid sectors are copied into a new free area and the old area is erased.
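A single garbage collection pass can be sketched like this: copy whatever is still valid out of a victim block into a spare one, then erase the victim. The page representation (None for erased, a tuple of state and data otherwise) and the function name are assumptions for illustration only.

```python
VALID, INVALID = "valid", "invalid"


def collect(victim, spare):
    """Copy valid pages from victim into spare, then erase victim.

    Each block is a list of pages. A page is None when erased,
    otherwise a (state, data) tuple. Returns the number of pages copied.
    """
    copied = 0
    for page in victim:
        if page is not None and page[0] == VALID:
            free_slot = spare.index(None)  # raises ValueError if spare is full
            spare[free_slot] = page
            copied += 1
    victim[:] = [None] * len(victim)       # whole-block erase
    return copied


victim = [(VALID, "a"), (INVALID, "b"), (INVALID, "c"), (VALID, "d")]
spare = [None, None, None, None]
copied = collect(victim, spare)
```

The two copied pages are writes the host never requested, which is where the write amplification discussed earlier comes from.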

Bad Block Management Module

The Bad Block Management module determines when a block should be marked as bad. Bad Blocks are blocks that contain one or more invalid bits and whose reliability is therefore not guaranteed. Bad Blocks may be present when the device is shipped, or may develop during the lifetime of the device. The Bad Block Management module hides the bad blocks from the rest of the FAL, preventing the other modules from accessing them.
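One simple way to picture this is a table of known bad blocks that gets filled at the factory and grows over time, with the other modules only ever seeing what is left. The BadBlockTable class below is a hypothetical sketch of that idea, not how any specific controller stores its bad block list.

```python
class BadBlockTable:
    """Toy bad block table: tracks bad blocks and exposes only the usable ones."""

    def __init__(self, total_blocks, factory_bad=()):
        self.total_blocks = total_blocks
        self.bad = set(factory_bad)  # blocks already bad when the device shipped

    def mark_bad(self, block):
        # Called when a block develops a fault during the device's lifetime.
        self.bad.add(block)

    def usable_blocks(self):
        # Other modules would only ever be handed blocks from this list.
        return [b for b in range(self.total_blocks) if b not in self.bad]


bbt = BadBlockTable(total_blocks=6, factory_bad=[2])
bbt.mark_bad(4)
usable = bbt.usable_blocks()
print(usable)  # [0, 1, 3, 5]
```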

Final Words

As you can see, SSD technology is pretty complicated stuff. I have just scratched the surface with this analysis, but I hope it gives you enough to make up your own mind on the subject.