How SSDs work

Everybody thinks SSDs are lightning fast. Of course, there are no spinning platters and no head positioning; everything is done in flash memory, so they should be fast. But if you use an SSD for your database, you'll start with amazing performance that quickly diminishes. Here is why.

Unlike spinning disks, SSDs cannot overwrite data in place. The blocks that will receive the data must be erased before they can be written. Writing takes about 100 µs, but erasing a block takes about 2 ms, so erasing is 20 times slower. To make things more complex, data can be written in small 4KB increments (pages) but can only be erased in 512KB blocks. Each block can only be erased 3,000 to 5,000 times, which means the SSD firmware has to carefully manage where and how data is written to avoid premature wear. This is called wear leveling.
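The numbers above can be checked with a little arithmetic. The sketch below also compares a hypothetical naive in-place update of one 4KB page (erase the whole 512KB block, then rewrite all of its pages) with an out-of-place write to an already-erased page. The in-place model is an assumption made for illustration only, and it ignores page reads.

```python
PAGE_KB = 4        # smallest unit that can be written
BLOCK_KB = 512     # smallest unit that can be erased
WRITE_US = 100     # approximate page write latency
ERASE_US = 2000    # approximate block erase latency (2 ms)

pages_per_block = BLOCK_KB // PAGE_KB   # 128 pages per erase block
erase_vs_write = ERASE_US / WRITE_US    # erase is 20x slower than a write

# Hypothetical in-place update of one page: erase the block, then rewrite
# every page in it (page reads are ignored in this sketch).
in_place_us = ERASE_US + pages_per_block * WRITE_US

# Out-of-place update: write the new page to an already-erased location.
out_of_place_us = WRITE_US

print(pages_per_block, erase_vs_write, in_place_us, out_of_place_us)
# prints: 128 20.0 14800 100
```

Even in this simplified model, updating in place would cost about 148 times more than writing the page somewhere else, which is why SSDs never overwrite data directly.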

Wear leveling

The main challenge for SSD manufacturers is to prevent premature wear. For example, without wear leveling, if you write a 1MB data file at the beginning of the disk and then update that file over and over, your disk would be dead after about 5,000 writes. To prolong the lifespan of the SSD, manufacturers use a technique called wear leveling, which distributes erases and rewrites evenly across the medium to avoid failures caused by a high concentration of write cycles.
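To see why spreading the writes helps, here is a toy simulation, assuming a hypothetical disk of only 8 erase blocks, a 1MB file spanning 2 blocks of 512KB, and 5,000 erase cycles per block. It assumes the rest of the disk is empty and that each rewrite erases every block it lands on once. The function name is made up for this sketch, and real wear-leveling algorithms are more elaborate.

```python
ENDURANCE = 5000  # erase cycles per block (upper end of the 3,000-5,000 range)


def updates_until_worn_out(num_blocks, file_blocks, wear_leveling, endurance=ENDURANCE):
    """Count how many times a file spanning `file_blocks` erase blocks can be
    rewritten before it can no longer be placed on blocks with cycles left.
    Toy model: the rest of the disk is empty and every rewrite erases each
    block it lands on exactly once."""
    erase_counts = [0] * num_blocks
    updates = 0
    while True:
        if wear_leveling:
            # Spread the wear: rewrite the file onto the least-erased blocks.
            targets = sorted(range(num_blocks), key=lambda b: erase_counts[b])[:file_blocks]
        else:
            # No wear leveling: always rewrite the same physical blocks.
            targets = list(range(file_blocks))
        if any(erase_counts[b] >= endurance for b in targets):
            return updates
        for b in targets:
            erase_counts[b] += 1
        updates += 1


print(updates_until_worn_out(8, 2, wear_leveling=False))  # 5000
print(updates_until_worn_out(8, 2, wear_leveling=True))   # 20000
```

Without wear leveling the same two blocks absorb every erase. With it, the wear is spread over all eight blocks, so the file survives four times as many updates. On a real disk the gain is much larger because there are many more blocks to spread the wear over.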

Why is my previously fast SSD now so slow?

As long as you are writing new data, everything is fine: the data is written sequentially, one block after another. When you modify the content of an existing file, the SSD does not overwrite it. Instead, it marks the old data as stale and writes the new data sequentially at the end. This is fast as long as there is enough free space on the disk.
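This out-of-place update scheme can be sketched with a toy mapping from logical pages to physical pages. The class and its names are hypothetical and do not reflect any real firmware. Each write goes to the next free physical page, and the previous copy is only marked stale, not erased. When the write pointer reaches the end, no free pages remain until something reclaims the stale ones.

```python
PAGES_PER_BLOCK = 128  # 512KB erase block / 4KB page


class AppendOnlyFlash:
    """Toy model of out-of-place updates: a logical page is never overwritten;
    each write lands on the next free physical page and the old copy goes stale."""

    def __init__(self, num_blocks):
        self.num_pages = num_blocks * PAGES_PER_BLOCK
        self.next_free = 0   # write pointer: next never-written physical page
        self.mapping = {}    # logical page -> current physical page
        self.stale = set()   # physical pages holding outdated data

    def write(self, logical_page):
        if self.next_free >= self.num_pages:
            raise RuntimeError("no free pages left: garbage collection needed")
        old = self.mapping.get(logical_page)
        if old is not None:
            # Only mark the old copy; erasing is slow and works on whole blocks.
            self.stale.add(old)
        self.mapping[logical_page] = self.next_free
        self.next_free += 1


flash = AppendOnlyFlash(num_blocks=2)
for _ in range(10):
    flash.write(0)  # update the same logical page ten times
print(flash.mapping, len(flash.stale))  # {0: 9} 9
```

Updating one logical page ten times consumes ten physical pages, nine of which now hold stale data that still occupies space.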

However, over time the disk runs out of sequential free space. As you can see in the picture, this leads to fragmentation, and fragmentation is bad. Do you remember when people ran defrag on their MS-DOS boxes to improve performance?

On SSDs, fragmentation is especially bad because reclaiming free space means erasing an entire erase block, and a block can only be erased once it holds no valid data. Any valid data still in the block must first be copied elsewhere. This is the job of the SSD's garbage collection mechanism. The garbage collector must be careful not to move too much data, because every page it copies is an extra write that contributes to premature wear of the medium.
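A minimal sketch of one garbage-collection step, assuming a greedy policy that reclaims the block with the fewest valid pages. The policy, the `Block` type, and the `spare` block are assumptions made for illustration, and real firmware policies vary. The valid pages are copied into an already-erased spare block before the victim is erased, and the number of pages copied is the extra write cost.

```python
from dataclasses import dataclass

PAGES_PER_BLOCK = 128  # 512KB erase block / 4KB page


@dataclass
class Block:
    valid: int = 0   # pages holding live data
    stale: int = 0   # outdated pages, reclaimable only by erasing the block
    erases: int = 0  # erase cycles used so far

    @property
    def free(self) -> int:
        return PAGES_PER_BLOCK - self.valid - self.stale


def garbage_collect(blocks, spare):
    """Reclaim one block (greedy policy: fewest valid pages). Its valid pages
    are copied into `spare`, an already-erased block, then the victim is
    erased. Returns (victim, pages_copied)."""
    candidates = [b for b in blocks if b is not spare and b.stale > 0]
    if not candidates:
        raise RuntimeError("no stale pages to reclaim")
    victim = min(candidates, key=lambda b: b.valid)
    if victim.valid > spare.free:
        raise RuntimeError("spare block too small for the valid pages")
    copied = victim.valid
    spare.valid += copied            # each copy is an extra write that wears the flash
    victim.valid = victim.stale = 0  # the block no longer holds valid data...
    victim.erases += 1               # ...so it can be erased and reused
    return victim, copied


blocks = [Block(valid=100, stale=28), Block(valid=10, stale=118), Block()]
victim, copied = garbage_collect(blocks, spare=blocks[2])
print(blocks.index(victim), copied)  # 1 10
```

Here reclaiming 118 stale pages costs only 10 copies. On a fragmented disk where every block is still mostly valid, the collector has to copy far more data to free the same amount of space.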

Garbage collection usually runs in the background, when the disk is idle. On a busy database server with heavy I/O this is not always possible, and after a while some writes can take a long time because the SSD has to free space before the write can complete. On a very fragmented data set, the SSD has to move and reorganize data to reclaim free space. These operations dramatically cut the performance of the SSD and reduce the lifespan of the drive.
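To get a feel for how much a write can stall when garbage collection has to run first, here is a simplified latency model using the 100 µs write and 2 ms erase figures quoted earlier. The model is an assumption: it ignores page reads and controller overhead, and the function name and the 64-valid-page victim are hypothetical.

```python
WRITE_US = 100   # page write latency quoted above
ERASE_US = 2000  # block erase latency quoted above


def write_latency_us(valid_pages_to_copy=0, needs_gc=False):
    """Latency of one 4KB write in a simplified model: if no erased page is
    available, the write first waits for foreground garbage collection, which
    rewrites the victim block's valid pages and then erases it."""
    latency = WRITE_US
    if needs_gc:
        latency += valid_pages_to_copy * WRITE_US + ERASE_US
    return latency


print(write_latency_us())                                # 100
print(write_latency_us(valid_pages_to_copy=64, needs_gc=True))  # 8500
```

In this model, a write that has to wait for a victim block that is half full of valid data (64 of 128 pages) takes 85 times longer than a normal write.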


If your application mostly does reads and sequential writes, SSDs are a good choice for high throughput. However, if your application is update-intensive, SSDs will perform well when they are new, but performance will degrade over time. Also keep in mind that the lifespan of a disk depends on the number of times its blocks are erased. Most SSD manufacturers' manuals claim an average lifespan of 5 years, but this average is based on laptop or desktop usage. An update-intensive application will wear the SSD down faster: on a busy database server, the lifespan can drop from 5 years to 1.5 to 2 years.
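A rough, endurance-only estimate makes the link between update rate and lifespan explicit. The formula below is a simplifying assumption: it considers only erase cycles and assumes perfect wear leveling. The `write_amplification` factor (at least 1) stands for the extra copies garbage collection makes. The function name and any inputs you pass it are hypothetical, not vendor figures.

```python
def estimated_lifespan_years(capacity_gb, endurance_cycles, host_writes_gb_per_day,
                             write_amplification=1.0):
    """Endurance-only estimate: the total data the flash can absorb
    (capacity x erase cycles) divided by what is actually written to the
    flash per day. Assumes perfect wear leveling; ignores all other failures."""
    total_writable_gb = capacity_gb * endurance_cycles
    flash_writes_per_day = host_writes_gb_per_day * write_amplification
    return total_writable_gb / (flash_writes_per_day * 365)
```

Lifespan in this model is inversely proportional to daily writes times write amplification. For example, tripling the update rate while garbage collection doubles the extra copies cuts the estimate by a factor of six.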
