Linux MD Devices in a ZFS Pool

Can you use MD devices in a ZPool? (tl;dr)

Yes. Yes you can and it works like you would expect. Create the underlying MDs first, then create the zpool containing them. Seems to work pretty well. The real question is why would you do this?

The situation that got me here:

I was building a media server combining purchased equipment with leftovers in the spare parts bin. The main goal was to create a large filesystem to share out, with at least some resiliency. None of the data was irreplaceable but we didn’t want to trash it all with a single failed drive. One giant raid 0 stripe was out. The challenge was to efficiently use a pile of different size hard drives in that one large filesystem. The drives in question were one 6TB, three 4TB and three 2TB.

The failed approaches:

I am a big Ceph fan, and one of the guys suggested a single host Ceph cluster. I found some references on the topic so figured it was worth a try. That ended up being a bit tricky (crushmap changes are necessary before creating any pools) and it performed poorly. Maybe a topic for a later post if anyone is that much of a glutton for punishment (or needs to lab test something).
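For anyone who does want to try the single-host Ceph route, the wrinkle is that the default CRUSH rule replicates across hosts, so with only one host the placement groups never go healthy. The usual fix, and it has to go in before any pools are created, is to have CRUSH choose leaves at the OSD level rather than the host level. Roughly:

osd crush chooseleaf type = 0    <-- in ceph.conf, [global] section, before cluster creation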

I am also a big ZFS fan and assumed it would allow you to assemble several stripes into a raidz1. You certainly can assemble mirrors into a stripe pool (raid 10); I have a few systems running that arrangement. Alas, you cannot do the same with stripes inside a raidz.
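For reference, that mirror arrangement looks something like this (the disk names here are made up for illustration):

# zpool create tank mirror /dev/sdw /dev/sdx mirror /dev/sdy /dev/sdz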

So, MD with ZFS is an answer (if not the answer).

My next idea was to create three stripes out of the 4TB and 2TB drives, giving us three 6TB MDs, then build a raidz1 from those plus the 6TB drive. (I did try creating the stripes in ZFS first, but as noted above they can't be added to a raidz1; mirrors can be striped together, but this application valued space efficiency over performance.) In hindsight it seems so obvious. First, we created the underlying MDs. The disk assortment was:

[0:0:0:0] disk ATA Hitachi HDS72302 A5C0 /dev/sda  <-- 2TB
[0:0:1:0] disk ATA WDC WD40EZRZ-00W 0A80 /dev/sdb  <-- 4TB
[0:0:2:0] disk ATA ST32000641AS CC13 /dev/sdc      <-- 2TB
[0:0:3:0] disk ATA WDC WD40EZRZ-00W 0A80 /dev/sdd  <-- 4TB
[0:0:4:0] disk ATA WDC WD40EZRZ-00G 0A80 /dev/sde  <-- 4TB
[0:0:5:0] disk ATA ST2000DM001-1CH1 CC27 /dev/sdf  <-- 2TB
[0:0:6:0] disk ATA HGST HDN726060AL T517 /dev/sdg  <-- 6TB
[1:0:0:0] disk ATA SSD2SC120G1SA754 4B /dev/sdh  <-- Boot Drive
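If you want to produce an inventory like that yourself, lsscsi is handy; the -s flag appends device sizes if you don't feel like annotating by hand:

# lsscsi -s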

So creating the first MD is as simple as:

# mdadm --create /dev/md0 --level=stripe --raid-devices=2 /dev/sda /dev/sdb
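If you want to confirm the array before layering ZFS on top of it, the usual checks apply:

# cat /proc/mdstat
# mdadm --detail /dev/md0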

Repeat with the other 2TB and 4TB drive pairs to create /dev/md1 and /dev/md2. Then zpool them:

# zpool create tank raidz /dev/md0 /dev/md1 /dev/md2 /dev/sdg
# zpool status
  pool: tank
 state: ONLINE
  scan: none requested
config:
 NAME        STATE     READ WRITE CKSUM
 tank        ONLINE       0     0     0
   raidz1-0  ONLINE       0     0     0
     md0     ONLINE       0     0     0    
     md1     ONLINE       0     0     0
     md2     ONLINE       0     0     0
     sdg     ONLINE       0     0     0
 
errors: No known data errors
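If you want to see the capacity numbers for yourself:

# zpool list tank
# zfs list tank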

And there we have it, a raidz1 pool with ~16TB of usable space. So far it has performed well, or at least as well as we have demanded of it. The MDs themselves carry an increased risk of failure, being plain stripes, but the data wouldn't be put in jeopardy by any single disk loss.
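One bit of housekeeping worth mentioning, with the caveat that the details vary by distro: record the arrays in mdadm.conf so they assemble at boot before ZFS tries to import the pool. On a Debian-ish system that looks roughly like:

# mdadm --detail --scan >> /etc/mdadm/mdadm.conf
# update-initramfs -u    <-- Debian/Ubuntu; adjust for your distro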
