SSD Demystified

On Saturday I was wandering around the storage blogosphere and stopped by Nigel’s blog for a really nice article on sub-LUN tiering design considerations. In that post he made the point that SSDs are not that reliable for writes (in a scenario where SSDs are treated almost as a “cache”) because of an issue called “write amplification”. That sparked my interest in writing a post that explains, in a technical-to-layman fashion, how an SSD works and what kind of magic trickery it performs inside its case in order to go that fast :-).

First things first: there are a number of different devices called SSDs today, but I will focus only on NAND-based SSDs that are drop-in replacements for the traditional hard drives commonly found in storage systems and even consumer devices.

What is NAND?

NAND is a nonvolatile solid-state memory. Nonvolatile memory can hold data even when the power is turned off; NAND stores that data in a large array of transistors (cells). Single-level cell (SLC) transistors store one bit of data each, while multi-level cell (MLC) transistors store two bits per cell.

Compared to traditional NOR Flash memory, NAND Flash memory can pack a greater number of storage cells into a given area of silicon. This gives NAND Flash density and cost advantages over other nonvolatile memory. NAND achieves these advantages by sharing some of the common areas of the storage transistor, which creates strings of serially connected transistors (in NOR devices, each transistor stands alone). This serial cell architecture explains the device name: NAND (not AND) is the Boolean logic reference to how information is read out of these cells.
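To make the “not AND” naming concrete, here is a deliberately simplified Python model (an illustration, not a circuit simulation): a series string conducts only when every transistor in it conducts, so a precharged bit line that gets pulled low by a conducting string behaves like a NAND gate. The read function assumes the common convention that an erased cell reads as 1 and that unselected cells in the string are driven with a pass voltage so they always conduct; all function names are hypothetical.

```python
def string_conducts(cells_conducting: list[bool]) -> bool:
    """Transistors in a NAND string are wired in series: current flows only if all conduct."""
    return all(cells_conducting)


def bit_line_level(cells_conducting: list[bool]) -> bool:
    """A precharged bit line is pulled low when the string conducts, so the
    output is NOT(AND(inputs)), i.e. a NAND of the transistor states."""
    return not string_conducts(cells_conducting)


def read_cell(programmed: list[bool], index: int) -> int:
    """Read one cell of a string (simplified model).

    Unselected cells get a pass voltage and always conduct; the selected cell
    conducts only if it is erased (low threshold voltage), which reads as 1.
    """
    conducting = [True] * len(programmed)
    conducting[index] = not programmed[index]
    return 1 if string_conducts(conducting) else 0


if __name__ == "__main__":
    for a in (False, True):
        for b in (False, True):
            print(a, b, bit_line_level([a, b]))
    string = [False, True, False]  # only the middle cell is programmed
    print([read_cell(string, i) for i in range(len(string))])  # [1, 0, 1]
```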

SLC vs. MLC

Traditional single-level cell (SLC) NAND Flash memory stores one bit of information per memory cell. This simpler technology enables faster transfer speeds, lower power consumption and higher endurance. Multi-level cell (MLC) NAND, by comparison, stores two bits of information per memory cell, effectively doubling the amount of data that can be stored in a similar-size NAND Flash device, but that comes at a cost.
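The density gain comes from the fact that an n-bit cell must be programmed to one of 2^n distinguishable threshold-voltage levels: two for SLC, four for MLC. Squeezing more levels into the same voltage window leaves smaller margins between them, which is one reason MLC is slower and wears out sooner. The small Python sketch below only does the arithmetic; the helper names are illustrative.

```python
def threshold_levels(bits_per_cell: int) -> int:
    """Number of distinct threshold-voltage states an n-bit cell must hold."""
    return 2 ** bits_per_cell


def cells_needed(total_bits: int, bits_per_cell: int) -> int:
    """Cells required to store total_bits (ceiling division)."""
    return -(-total_bits // bits_per_cell)


for name, bits in (("SLC", 1), ("MLC", 2)):
    print(f"{name}: {threshold_levels(bits)} levels, "
          f"{cells_needed(8 * 1024, bits)} cells per KiB")
```

Halving the number of cells for the same amount of data is where the “effectively doubling” above comes from.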

As the table below shows, MLC cells have a shorter lifespan and lower performance than SLC cells:

Feature                    MLC             SLC
Density                    32/64 Mbit      16 Mbit
Bits per cell              2               1
Voltage                    3.3 V           3.3 V / 1.8 V
Page size                  2 / 4 KB        2 KB
Program/Erase cycles       < 10,000        < 100,000
Read time (per page)       50 µs           25 µs
Program time (per page)    600 / 900 µs    200 / 300 µs
Erase time (per block)     3 ms            1.5 / 2 ms
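The latencies in the table translate into rough single-page throughput figures. The Python sketch below divides page size by operation time. It assumes a single die working on one page at a time, pairs the 4 KB MLC page with the 900 µs program time (my reading of the “2 / 4 KB” and “600 / 900 µs” cells), and ignores bus transfer and controller overhead, so treat the results as back-of-the-envelope numbers only.

```python
def page_throughput_mb_s(page_bytes: int, latency_us: float) -> float:
    """Bytes per microsecond equals (decimal) megabytes per second."""
    return page_bytes / latency_us


# (page size in bytes, read time in µs, program time in µs), taken from the table above
devices = {
    "SLC": (2048, 25, 200),
    "MLC": (4096, 50, 900),
}

for name, (page, read_us, program_us) in devices.items():
    print(f"{name}: read ~{page_throughput_mb_s(page, read_us):.1f} MB/s, "
          f"program ~{page_throughput_mb_s(page, program_us):.1f} MB/s")
```

Under these assumptions both cell types read at roughly 80 MB/s per die, while programming drops to about 10 MB/s for SLC and under 5 MB/s for MLC.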

Until last year, SLC drives were the only enterprise-grade SSDs available on the market (notably the STEC ZeusIOPS), but thanks to technology advancements, MLC drives have recently made their appearance in the enterprise market as a new class of devices called E-MLC (enterprise-grade multi-level cell).

How SSDs operate internally

SSDs inherit many quirks and nuances from their NAND cells. For instance, NAND must be erased an entire block at a time (an operation that takes nearly 2,000 µs), and a write (or program) can only go to a page that has been erased.
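A toy Python model of a single NAND block captures both constraints: a page can only be programmed while it is erased, and erasing always works on the whole block at once. The class and its methods are hypothetical, and real parts add further rules that are not modelled here.

```python
from typing import Optional


class NandBlock:
    """Toy model of one NAND erase block; None marks an erased page."""

    def __init__(self, pages_per_block: int) -> None:
        self.pages: list[Optional[bytes]] = [None] * pages_per_block
        self.erase_count = 0

    def program(self, page_index: int, data: bytes) -> None:
        # A page can only be programmed while it is in the erased state.
        if self.pages[page_index] is not None:
            raise ValueError(f"page {page_index} must be erased before it is programmed")
        self.pages[page_index] = data

    def erase(self) -> None:
        # Erase always applies to the whole block, never to a single page.
        self.pages = [None] * len(self.pages)
        self.erase_count += 1


block = NandBlock(pages_per_block=4)
block.program(0, b"old")
try:
    block.program(0, b"new")   # rejected: page 0 is not erased
except ValueError as err:
    print(err)
block.erase()                  # wipes all four pages, not just page 0
block.program(0, b"new")
```

Rewriting a single page in place would therefore mean erasing, and then reprogramming, every page in its block.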

This leads to a phenomenon called write amplification. Because Flash memory must be erased before it can be rewritten, performing these operations results in user data and metadata being moved (or rewritten) more than once. This multiplying effect increases the number of writes required over the life of the SSD, which shortens the time it can reliably operate. The extra writes also consume bandwidth to the Flash memory, which mainly reduces random write performance. Many factors affect the write amplification of an SSD: some can be controlled by the user, and some are a direct result of the data written to the SSD and how it is used.
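Write amplification is usually expressed as a ratio: bytes actually written to Flash divided by bytes the host asked to write. A rough way to see its impact on endurance is to divide the drive's total P/E budget by that ratio. The Python sketch below assumes perfect wear leveling (every block reaches exactly its rated P/E count) and uses a hypothetical 100 GB drive with the 10,000-cycle MLC figure from the table; it is an illustration, not a vendor endurance formula.

```python
def write_amplification(flash_bytes_written: float, host_bytes_written: float) -> float:
    """WA = data physically written to Flash / data written by the host."""
    return flash_bytes_written / host_bytes_written


def host_write_budget(capacity_bytes: float, pe_cycles: int, wa: float) -> float:
    """Idealised host-writable volume: assumes perfect wear leveling, so every
    block is erased exactly pe_cycles times before the drive wears out."""
    return capacity_bytes * pe_cycles / wa


capacity = 100e9   # hypothetical 100 GB drive
cycles = 10_000    # MLC P/E figure from the table above

# Hypothetical case: the host wrote 1 GB but the SSD ended up writing 3 GB to Flash.
wa = write_amplification(3e9, 1e9)
for factor in (1.0, wa):
    tb = host_write_budget(capacity, cycles, factor) / 1e12
    print(f"WA {factor:.1f}: ~{tb:,.0f} TB of host writes")
```

With a write amplification of 3, the same Flash absorbs only a third of the host data it could take at a ratio of 1 (about 333 TB instead of 1,000 TB here).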

When data is first written to an SSD, the cells all start in an erased state, so data can be written directly, one page at a time (pages are often 4-8 KB in size). The SSD controller, which manages the Flash memory and interfaces with the host system, keeps a logical-to-physical mapping that translates the logical block addresses (LBAs) used by the host into physical Flash locations; this mapping is part of the Flash Abstraction Layer (more on that later). When new data comes in to replace older data already written, the controller writes the new data to a new location and updates the mapping to point to the new physical location. The old location no longer holds valid data, but it will eventually need to be erased before it can be written again.
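The out-of-place update can be sketched in Python as a page-level mapping table. This is a minimal model under stated assumptions: one dictionary maps logical page numbers to physical page numbers, free pages are handed out append-only, and there is no garbage collection, so the toy drive simply fails once it runs out of erased pages. All names are hypothetical.

```python
class PageMappedSSD:
    """Toy page-mapped SSD: every update goes to a fresh physical page."""

    def __init__(self, total_pages: int) -> None:
        self.flash: list = [None] * total_pages   # physical pages (None = erased)
        self.valid = [False] * total_pages        # does the page hold live data?
        self.l2p: dict[int, int] = {}             # logical page -> physical page
        self.next_free = 0                        # append-only allocator, no GC

    def write(self, lpn: int, data: bytes) -> None:
        if self.next_free == len(self.flash):
            raise RuntimeError("no erased pages left: garbage collection needed")
        ppn = self.next_free
        self.next_free += 1
        self.flash[ppn] = data
        self.valid[ppn] = True
        old = self.l2p.get(lpn)
        if old is not None:
            # The old copy is stale but stays on Flash until its block is erased.
            self.valid[old] = False
        self.l2p[lpn] = ppn

    def read(self, lpn: int) -> bytes:
        return self.flash[self.l2p[lpn]]


ssd = PageMappedSSD(total_pages=8)
ssd.write(5, b"v1")
ssd.write(5, b"v2")   # same logical page, new physical location
print(ssd.read(5), ssd.l2p[5], ssd.valid[:2])   # b'v2' 1 [False, True]
```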

FAL to the rescue!

The Flash Abstraction Layer (FAL) provides a high-level abstraction of the physical organization of NAND Flash memory devices. It emulates the rewriting of sectors in hard disks by remapping new data to another location in the memory array and marking the previous sector invalid.

To wrap your head around this concept, think of it as something resembling a database log or NetApp's WAFL (Write Anywhere File Layout).

Typically the SSD is overprovisioned by the manufacturer, meaning that the physical capacity of the Flash memory is larger than the logical capacity available to the user. This additional space is used by the FAL modules, which are normally included in every SSD controller:
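Overprovisioning is commonly quoted as spare capacity relative to user capacity. The one-liner below uses that definition (some vendors quote it relative to physical capacity instead) with purely hypothetical capacities.

```python
def overprovisioning(physical_gb: float, user_gb: float) -> float:
    """Spare capacity expressed as a fraction of the user-visible capacity."""
    return (physical_gb - user_gb) / user_gb


# Hypothetical drive: 64 GB of raw Flash exposed to the host as 50 GB.
print(f"{overprovisioning(64, 50):.0%} spare")   # 28% spare
```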

Translation Module

The Translation module, which is the primary interface in the FAL, provides the translation from virtual to physical addresses and converts the logical operations into physical operations on the Flash memory device. It also handles the exporting of all operations available on storage media (for example: write sector, read sector and format partition).
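On the address side, translation can be sketched as a chain of integer divisions: a logical sector number becomes a logical page plus an offset, the mapping table turns the logical page into a physical page, and the physical page number splits into a block and a page within that block. The 512-byte sector size, the flat page numbering and all names are assumptions made for this Python example.

```python
from typing import NamedTuple

SECTOR_SIZE = 512  # assumed host sector size in bytes


class PhysicalAddress(NamedTuple):
    block: int
    page: int     # page index inside the block
    sector: int   # sector offset inside the page


def translate(lba: int, l2p: dict[int, int], page_size: int,
              pages_per_block: int) -> PhysicalAddress:
    sectors_per_page = page_size // SECTOR_SIZE
    lpn, sector_in_page = divmod(lba, sectors_per_page)   # logical page + offset
    ppn = l2p[lpn]                                        # mapping-table lookup
    block, page = divmod(ppn, pages_per_block)            # physical coordinates
    return PhysicalAddress(block, page, sector_in_page)


# 2 KB pages (4 sectors each), 64 pages per block; logical page 2 lives at physical page 70.
print(translate(9, {2: 70}, page_size=2048, pages_per_block=64))
# PhysicalAddress(block=1, page=6, sector=1)
```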

Wear Leveling Module

The Wear Leveling module ensures that the memory array is used uniformly by monitoring and evenly distributing the number of erase cycles per block. Each time the Translation module requests a block, the Wear Leveling module allocates the least used one. The Program/Erase (P/E) cycle rating is the number of write/erase operations a block can sustain; for single-level cell (SLC) NAND it is around 100,000 cycles.
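The “allocate the least used block” policy maps naturally onto a min-heap keyed by erase count. The Python sketch below is an illustration under assumptions: it models only this allocation policy for free blocks (it does not relocate cold, rarely rewritten data), ties are broken by block number, and the class name is hypothetical.

```python
import heapq


class WearLeveler:
    """Hands out the free block with the lowest erase count (min-heap)."""

    def __init__(self, num_blocks: int) -> None:
        self.erase_counts = [0] * num_blocks
        self._free = [(0, block) for block in range(num_blocks)]
        heapq.heapify(self._free)

    def allocate(self) -> int:
        # Least-erased free block; ties go to the lowest block number.
        _, block = heapq.heappop(self._free)
        return block

    def release(self, block: int) -> None:
        # Called once the block has been erased: count the cycle, return it to the pool.
        self.erase_counts[block] += 1
        heapq.heappush(self._free, (self.erase_counts[block], block))


wl = WearLeveler(3)
first = wl.allocate()   # block 0
wl.release(first)       # block 0 now has 1 erase cycle
print([wl.allocate() for _ in range(3)])   # [1, 2, 0]
```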

Garbage Collection Module

Because the FAL emulates rewriting sectors by remapping new data to another location in the memory array and marking the previous sector invalid, it eventually becomes necessary to reclaim the invalid space so that further data can be written. To do this, the FAL implements the Garbage Collection module, which copies the valid sectors of a block into a new free area and then erases the old block.
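A common way to choose which block to reclaim is a greedy policy: pick the block with the fewest valid pages, since those are the pages that must be copied, and every copy is an extra Flash write (write amplification again). The post does not say which policy real controllers use; greedy selection, the data layout and the names below are assumptions for this Python sketch, which also assumes one erased spare block is always kept in reserve.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Block:
    # page index -> (logical page number, data) for pages that still hold live data
    valid: dict[int, tuple[int, bytes]] = field(default_factory=dict)
    invalid_pages: int = 0
    erase_count: int = 0

    def erase(self) -> None:
        self.valid.clear()
        self.invalid_pages = 0
        self.erase_count += 1


def garbage_collect(blocks: list[Block], spare: int,
                    l2p: dict[int, tuple[int, int]]) -> Optional[tuple[int, int]]:
    """Reclaim one block into the erased spare block.

    Returns (victim index, pages copied); the erased victim becomes the next spare.
    """
    candidates = [i for i, b in enumerate(blocks) if i != spare and b.invalid_pages > 0]
    if not candidates:
        return None
    victim = min(candidates, key=lambda i: len(blocks[i].valid))  # greedy choice
    copied = 0
    for new_page, (lpn, data) in enumerate(blocks[victim].valid.values()):
        blocks[spare].valid[new_page] = (lpn, data)  # relocate live data
        l2p[lpn] = (spare, new_page)                 # repoint the mapping
        copied += 1
    blocks[victim].erase()  # whole-block erase frees the invalid pages
    return victim, copied


blocks = [
    Block(valid={0: (10, b"a")}, invalid_pages=3),
    Block(valid={0: (11, b"b"), 1: (12, b"c")}, invalid_pages=2),
    Block(),  # erased spare
]
l2p = {10: (0, 0), 11: (1, 0), 12: (1, 1)}
print(garbage_collect(blocks, spare=2, l2p=l2p))   # (0, 1)
print(l2p[10])                                     # (2, 0)
```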

Bad Block Management Module

The Bad Block Management module decides when a block must be marked as bad. Bad blocks are blocks that contain one or more invalid bits whose reliability is not guaranteed. They may be present when the device is shipped, or may develop during the lifetime of the device. The Bad Block Management module hides the bad blocks from the rest of the FAL, preventing it from accessing them.
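One way to “hide” bad blocks is to give the rest of the FAL a contiguous range of logical block numbers backed only by good physical blocks, keeping a small reserve to swap in when a block goes bad in the field. The Python sketch below assumes that remapping scheme and uses hypothetical names; it does not model copying the data off the failing block.

```python
class BadBlockManager:
    """Presents contiguous logical blocks backed only by good physical blocks."""

    def __init__(self, num_physical: int, num_logical: int, factory_bad=()) -> None:
        self.bad = set(factory_bad)
        good = [b for b in range(num_physical) if b not in self.bad]
        if len(good) < num_logical:
            raise ValueError("not enough good blocks for the requested capacity")
        self.map = good[:num_logical]     # logical block -> physical block
        self.spares = good[num_logical:]  # reserve used to replace grown bad blocks

    def physical(self, logical_block: int) -> int:
        return self.map[logical_block]

    def retire(self, logical_block: int) -> int:
        """Mark the block behind logical_block as bad and substitute a spare."""
        if not self.spares:
            raise RuntimeError("spare blocks exhausted")
        self.bad.add(self.map[logical_block])
        self.map[logical_block] = self.spares.pop(0)
        return self.map[logical_block]


bbm = BadBlockManager(num_physical=6, num_logical=4, factory_bad=[1])
print(bbm.map, bbm.spares)     # [0, 2, 3, 4] [5]
print(bbm.retire(2), bbm.map)  # 5 [0, 2, 5, 4]
```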

Final Words

As you can see, SSD technology is pretty complicated stuff. I have only scratched the surface with this analysis, but I hope it helps you make up your own mind on the subject.

6 Comments

Rsnell on 19/12/2010 at 3:47 pm

Overall good stuff! On a consumer level, laptop, and desktop level, SSD is off and running and MLC is the standard.

On an enterprise level, WhipTail solves write amplification via buffering to align the writes to the native write block size of the flash cell. So MLC can write as fast or faster than SLC and 150X faster than HDD.

The problem with SSDs that “drop in” to traditional arrays is that the performance bottlenecks at the storage processor, controller, and/or NAS head. So while the drives may be fast individually, it doesn’t add up.

MLC is going to be the primary flash storage medium for all use cases. Intel and Toshiba continue to invest in MLC-based drive and appliance companies, and IBM just signed a huge OEM contract with STEC for their MLC drives to use in their enterprise storage arrays.

WhipTail compensates for the limited wear of MLC by linearizing the writes across the entire appliance so that no single drive or cell within a drive gets pounded by an enterprise workload. This enables us to guarantee our drives will last 7 years without any performance degradation.

Fabio Rapposelli on 19/12/2010 at 6:23 pm

I agree that using multi-level cells is probably the way to go, and I know that there are already some triple-level cell chips on the market today.

So basically everything is going to be played out in the optimization layer: optimizing filesystems to write to flash and optimizing arrays to do optimized writes.

Didier Pironet on 27/12/2010 at 6:23 pm

Just a quick comment here about TRIMing. While enterprise-grade SSDs trim natively, consumer-grade SSDs just don’t and expect the OS to tell them what to trim. Unfortunately, not all OSes support the TRIM function. And even if TRIM hardware is available, such as in the Intel X25-M G2, it is useless and inappropriate.

Another one about the microcode. That’s the masterpiece of enterprise-grade SSDs, and even though STEC, for instance, does excellent microcode, companies like EMC have to enhance it to fit their storage product lines, adding their own ingredients and secret sauce to the recipe. IMO real improvements will come from SSD microcode and how storage companies handle it…

Cheers,
Didier

About The Author

Fabio Rapposelli

6 Comments

Leave a Reply Cancel reply

Search

Disclaimer

Voices in Data Storage

SSD Demystified

What is NAND?

SLC vs. MLC

How SSDs operate internally

FAL to the rescue!

Translation Module

Wear Leveling Module

Garbage Collection module

Bad Block management

Final Words

Share:

Related

About The Author

Fabio Rapposelli

6 Comments

Leave a Reply Cancel reply

Search

Disclaimer

Voices in Data Storage