Sat, 16 Dec 2023 UTC

# Recovering a btrfs pool after drive failure

## My NAS

My NAS is cobbled together from an old workstation and a bunch of drives cannibalised from other machines over the years. Not only are the drives second-hand, consumer grade, and of varying ages and brands, but they are also all different sizes - ranging from 250 GB up to 2 TB in size. Any competent sysadmin would take one look and run away screaming, but it has held up surprisingly well for years.

Anyway, the filesystem I use to stitch the storage hardware together into a single logical device is btrfs. It gets a bad rap in some circles, but in my experience it is actually very reliable and has many nice features. In particular, it allows you to throw a bunch of drives of different sizes into a storage pool, and then tell the filesystem to keep a copy of every block on n different devices, where n is 2, 3, or 4. btrfs calls these profiles RAID1, RAID1C3, and RAID1C4, as the scheme has some similarities with old-school RAID1.

The main thing I wanted from this setup was enough resilience to lose a single drive without losing the whole storage pool. So I went with RAID1.
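For reference, here is a minimal sketch of setting up a pool with that profile from scratch - the device names and mount point are placeholders, and this is not necessarily how my own pool was originally put together:

    # Create a pool that keeps two copies of both data and metadata,
    # each copy on a different device (use raid1c3/raid1c4 for 3 or 4 copies)
    sudo mkfs.btrfs -d raid1 -m raid1 /dev/sdX /dev/sdY /dev/sdZ

    # An existing filesystem can also be converted to RAID1 with a balance
    sudo btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/pool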

## Boot failure

Today the NAS was offline, and as soon as I opened the door to the cupboard where it is stored I knew why - one drive was clunking away in an ominous manner.

The btrfs filesystem was failing to mount, and since it is listed in /etc/fstab, the machine was failing to boot. I couldn’t even get into recovery mode via Debian’s default GRUB menu entry because this machine has a locked root account.
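For context, the fstab entry in question looks something like this (the UUID is a placeholder, not my real one). Without an option like `nofail`, a failed mount of a local filesystem drops a systemd machine into emergency mode instead of completing the boot:

    # /etc/fstab - illustrative only; the UUID is a placeholder
    # Any member device's UUID identifies the whole btrfs pool
    UUID=<pool-uuid>  /srv/pool  btrfs  defaults  0  0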

## Recovery

Since it was impossible to get the machine to the point that it could tell me which drive had problems, I resorted to the highly technical method of feeling each drive with a fingertip until I located the one emitting the clunks. Then I pulled the SATA data cable and waited for the clunks to stop.

At this point I booted GRML from a USB stick, and took a look at the remaining drives.
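That just meant the usual poking around - something along these lines:

    # What block devices can the live system see?
    lsblk -o NAME,SIZE,MODEL,SERIAL

    # Which btrfs filesystems exist, and is btrfs reporting any missing devices?
    sudo btrfs filesystem show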

Now, because I had been running in RAID1 mode, in theory the storage pool should still be complete even while missing a drive. So I mounted it in degraded mode[^1]:

    sudo mkdir /mnt/pool
    sudo mount -o degraded /dev/sda /mnt/pool

That worked, so the data was safe!

Then I removed the missing device from the pool:

    sudo btrfs device remove missing /mnt/pool

This command:

  1. works out which device(s) are missing from the degraded mount.
  2. removes the missing device from the storage pool.
  3. copies data around to meet the requirements of the storage profile. In the case of RAID1 it needs to ensure that all the data that was on the missing device is duplicated among the remaining devices.[^2]

This took around 90 minutes (the missing drive was 250 GB). You can monitor the removal by inspecting device usage - the allocation on the device being removed gradually drops to zero:

    sudo btrfs device usage /mnt/pool/
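To avoid re-running that by hand, the same report can be wrapped in `watch` (the interval below is arbitrary):

    # Refresh the usage report every 60 seconds
    sudo watch -n 60 btrfs device usage /mnt/pool/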

At the end of this process I booted back into the main OS, and the storage pool was back - just 250 GB smaller.

## Adding another device

Later, I found another 250 GB drive sitting in a drawer and threw it into the NAS to replace the one that had failed.

    sudo btrfs device add -Kf /dev/sdb /srv/pool

This command completed almost instantly, because there was no data to copy: everything in the pool was already duplicated across the existing devices.
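One caveat worth noting: adding a device does not move any existing data - only new writes will start landing on the new drive. If you do want existing data spread onto it as well, a balance is the usual tool. A rough sketch (it rewrites a lot of data, so it can take a while):

    # Rewrite existing block groups so data is redistributed across all
    # devices, including the newly added one
    sudo btrfs balance start /srv/pool

    # Check progress from another shell
    sudo btrfs balance status /srv/pool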

## Conclusions

This experience confirms my existing impression of btrfs: it may get a bad rap, but in practice it has been reliable, and the flexible RAID1 profile did exactly what I wanted it to - the pool survived a drive failure without losing any data.


[^1]: A couple of notes on the mount command:

    - While mounting in read-only mode might seem prudent for data preservation, it also means that you can’t actually manipulate the storage pool. So mount read-write (the default) if you want to e.g. remove devices.
    - Reference any device which is part of the pool when mounting; it doesn’t matter which one.

[^2]: Device removal requires that there is enough free space in the storage pool to duplicate the data that was stored on the missing device. If not, you may need to either perform the `btrfs device add` first, or look into `btrfs-replace`.
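
    A rough sketch of that route, with `/dev/sdX` standing in for the replacement drive; the numeric devid of the missing device can be read from `btrfs filesystem show`:

        # Replace the missing device (addressed by its devid) with the new drive
        sudo btrfs replace start <devid> /dev/sdX /mnt/pool
        # Check progress
        sudo btrfs replace status /mnt/pool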

tags : btrfs sysadmin