Anatomy of Disk Drives

Courtesy: Linux Mag

The first company to release a 3TB drive was Western Digital, which announced on Oct. 19, 2010, that it was actually shipping 3TB SATA hard drives. The Western Digital Green 3TB (Model: WD30EZRSDTL) is a very nice 3.5″ SATA hard drive. It uses five 600GB platters to achieve 3TB and has a SATA 2.0 (3 Gbps) interface with a very large 64MB cache. To keep the drive’s power draw down to the level Western Digital wanted, this new drive has a rotational speed somewhere between 5,400 and 6,000 rpm. So it’s a capacity-oriented drive rather than a performance-oriented one.

As of the writing of this article, the price of the drive was $209.99 (be sure to check current pricing, since this was the price only when I checked and only at one retailer). The drive also comes with a HighPoint RocketRAID 620 PCIe SATA card, which is a bit unusual for SATA hard drives. Why would a standard SATA hard drive come with a SATA card when systems typically have lots of SATA ports? The answer is that it is used in breaking the 2.199TB limit for some systems.

What 2.199TB limit?

I suppose a number of readers are asking about this magic 2.199TB limit and why we are only seeing this limit now. So let’s start by reviewing why we have this limit.

Virtually every storage device used in the consumer world, as well as most in the enterprise world, uses a scheme called LBA (Logical Block Addressing) to address the blocks on the device. The approach is really simple – it is a linear addressing scheme where blocks are located by an integer index. The first block is LBA 0, the second is LBA 1, and so on. When your system boots, it looks at LBA 0 of the first boot device for the Master Boot Record (MBR).
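To make the linear addressing concrete, here is a minimal Python sketch. It assumes 512-byte blocks and uses an in-memory byte string as a stand-in for a real device. The names `lba_to_offset` and `read_block` are made up for illustration and are not part of any real tool.

```python
SECTOR_SIZE = 512  # bytes per block (assumed; the common case)

def lba_to_offset(lba, sector_size=SECTOR_SIZE):
    """Byte offset of a block: blocks are simply numbered 0, 1, 2, ..."""
    return lba * sector_size

def read_block(image, lba, sector_size=SECTOR_SIZE):
    """Return the bytes of block `lba` from an in-memory disk image."""
    start = lba_to_offset(lba, sector_size)
    return image[start:start + sector_size]

# A made-up 4-block "disk": block n is filled with the byte value n.
disk = b"".join(bytes([n]) * SECTOR_SIZE for n in range(4))

print(lba_to_offset(0))         # 0, which is where the MBR lives
print(lba_to_offset(3))         # 1536
print(read_block(disk, 2)[:4])  # b'\x02\x02\x02\x02'
```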

MBR partition tables store LBA values as 32-bit numbers. This means that the maximum number of addressable blocks is,

Maximum Number of Blocks = 2^32 = 4,294,967,296

Since most drives use 512-byte sectors this means the largest partition is limited to the following size.

Largest Partition = 4,294,967,296 * 512 bytes = 2,199,023,255,552 Bytes

In other words, we can’t address more than 2.199TB on a single drive with an MBR partition table.
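The limit is easy to reproduce. The sketch below assumes 512-byte sectors and uses Python's `struct` module to mimic a 32-bit little-endian field, which is the width an MBR partition entry uses for its sector counts. The helper names are mine and purely illustrative.

```python
import struct

SECTOR_SIZE = 512  # bytes (assumed)

def pack_sector_count(count):
    """Pack a sector count into a 32-bit little-endian unsigned field."""
    return struct.pack("<I", count)

def fits_in_mbr(drive_bytes, sector_size=SECTOR_SIZE):
    """True if the drive's sector count fits in a 32-bit field."""
    sectors = drive_bytes // sector_size
    try:
        pack_sector_count(sectors)
        return True
    except struct.error:  # value does not fit in 32 bits
        return False

limit_bytes = 2**32 * SECTOR_SIZE
print(limit_bytes)              # 2199023255552
print(fits_in_mbr(2 * 10**12))  # True  (a 2TB drive)
print(fits_in_mbr(3 * 10**12))  # False (a 3TB drive)
```

Strictly speaking, the largest value a 32-bit field can hold is 2^32 - 1, so the real ceiling is one sector shy of the round number above.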

There are many ways to solve this problem. One way is to support a larger LBA address space. In particular, 64-bit LBA addressing allows up to 2^64 blocks. Coupled with 512-byte sectors, this gets us to about 9.4ZB (zettabytes). You will be very pleased to note that Linux has 64-bit LBA addressing, particularly for SATA. From what I understand, 64-bit LBA was already in place for SCSI drives, so it was easily adapted for SATA drives.
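A quick check of that figure, as a small Python sketch. It assumes 512-byte sectors and decimal zettabytes (10^21 bytes), and `max_addressable_bytes` is just an illustrative name.

```python
SECTOR_SIZE = 512  # bytes (assumed)

def max_addressable_bytes(lba_bits, sector_size=SECTOR_SIZE):
    """Total bytes reachable with an LBA field of the given width."""
    return 2**lba_bits * sector_size

cap64 = max_addressable_bytes(64)
print(cap64)                     # 9444732965739290427392
print(round(cap64 / 10**21, 2))  # 9.44 (ZB)
```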

Another approach to solving the problem is to increase the size of the sectors from 512-bytes to something bigger, such as 4KB. Let’s take a quick look at this.

4KB Sectors

As we previously saw, using really small sectors, such as 512 bytes, limits the amount of space that can be addressed with 32-bit LBA addressing. The 512 byte sector size was originally used to get the maximum usable capacity from storage. Such a small sector size almost guaranteed that sectors would not be wasted, since it was extremely unlikely that a file would be smaller than 512 bytes (it’s possible but very unlikely). But now we are facing the problem that we can’t use hard drives larger than 2.199TB, which has become an obstacle for us storage capacity junkies. One apparently easy option is to increase the sector size to 4KB.
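To see what the bigger sector buys us, here is a small Python sketch comparing the two sector sizes while keeping 32-bit LBA addressing. It reports decimal terabytes (10^12 bytes), and `addressable_tb` is an illustrative name, nothing more.

```python
LBA_BITS = 32  # the MBR-era address width

def addressable_tb(sector_size, lba_bits=LBA_BITS):
    """Addressable capacity in decimal TB for a given sector size."""
    return 2**lba_bits * sector_size / 10**12

for sector_size in (512, 4096):
    print(sector_size, round(addressable_tb(sector_size), 3))
# 512 2.199
# 4096 17.592
```

So with the same 32-bit addresses, 4KB sectors move the ceiling from about 2.2TB to about 17.6TB, simply because each address now covers eight times as many bytes.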

Why 4KB? The answer is surprisingly simple – it lines up with several other parts of the system. Specifically, the NTFS and HFS+ file systems both default to 4KB clusters. Ext3 also defaults to 4KB blocks. Just as important is the fact that the normal page size of memory on an x86 processor is 4KB. So memory and several common file systems are all based on 4KB (by the way, that’s not a coincidence; it’s all based on the x86 memory page size). Using 4KB means that we are wasting as little space as possible since we’ve matched the memory page size and the file system cluster size. However, there will be times when a file is smaller than 4KB, so it will effectively waste space. But this is the trade that must be made in exchange for getting much larger capacities. So 4KB it is.
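The wasted space is easy to put numbers on. The sketch below assumes a file always occupies a whole number of allocation units, and compares 512-byte and 4KB units for a few made-up file sizes. The function names are illustrative.

```python
def allocated_bytes(file_size, unit):
    """Space consumed when a file must occupy whole allocation units."""
    units = -(-file_size // unit)  # integer ceiling division
    return units * unit

def slack_bytes(file_size, unit):
    """Bytes allocated to the file but not holding any of its data."""
    return allocated_bytes(file_size, unit) - file_size

for size in (100, 700, 4096, 5000):
    print(size, slack_bytes(size, 512), slack_bytes(size, 4096))
# 100 412 3996
# 700 324 3396
# 4096 0 0
# 5000 120 3192
```

Small files pay the most, while a file that is an exact multiple of 4KB wastes nothing under either scheme.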

It turns out that 4KB sectors also have other benefits that impact the drives themselves. As the areal density (the density of data on the platter) increases, reading the data back becomes more difficult. You either need much more sensitive drive heads (adding cost and complexity) or you need to improve your ability to read the data correctly. Drives use ECC (Error Correcting Code) algorithms inside the drive itself to make sure they are reading the sectors correctly. In current 512 byte sector drives, when the data is written, a 40 byte block of ECC data is also written. When the particular 512 byte sector is read, the ECC data is read as well. The drive then checks the data against the ECC data. If they do not agree (ouch) then the drive re-reads the data and repeats the process until the data checks out or until a time-out is reached, at which point it throws an ECC error. The drive can use CRC codes and Reed-Solomon codes to recover data from sectors that have bad (corrupt) data and to move that data to spare sectors. This is what you commonly see in SATA drives when a sector goes bad and the drive remaps the data from that sector to a new spare sector on the drive.

Using the 512 byte sectors, we need 8 sectors to reach a total of 4KB of data. This means we have 8*40 = 320 bytes of ECC data for every 4KB of real data. On the other hand, if we have 4KB sectors, Western Digital is saying we only need 100 bytes of ECC data. This means we can save about 220 bytes per 4KB in space that can be used by the user (and us storage capacity junkies rejoice). This gives about 5.4% more capacity per drive.
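The ECC arithmetic can be checked with a few lines of Python. This sketch assumes 40 bytes of ECC per 512 byte sector and 100 bytes per 4KB sector. Those sizes are an assumption on my part, chosen because they give a capacity gain of a little over 5%, in line with the figure quoted above. `ecc_saving` is an illustrative helper.

```python
SECTORS_PER_4KB = 4096 // 512  # 8 legacy sectors per 4KB of data

def ecc_saving(ecc_per_512, ecc_per_4k, data_bytes=4096):
    """Return (ECC bytes saved per 4KB, saving as a fraction of the data)."""
    legacy_total = SECTORS_PER_4KB * ecc_per_512
    saved = legacy_total - ecc_per_4k
    return saved, saved / data_bytes

# Assumed ECC sizes in bytes; swap in other values to test other claims.
saved, fraction = ecc_saving(ecc_per_512=40, ecc_per_4k=100)
print(saved)                     # 220
print(round(fraction * 100, 1))  # 5.4
```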

In addition to capacity, the switch to 4KB sectors also means that the drive does not have to do as much work per read when performing the ECC check, improving drive performance. Also, it means that the 4KB drive can correct sectors faster than the 512 byte sector drives. Both aspects help performance.


