Some time in November 2012, I decided that keeping my media (photos, videos and music) on my main storage server was a little annoying, as my family would complain whenever I was making changes and the server was unavailable. So I decided I wanted a small box that would just sit quietly in the corner and reliably serve out disk space with very little intervention from me.
After a little browsing on Ebuyer, I found the HP ProLiant MicroServer N40L. It’s a small box with four disk bays and one optical drive bay. It officially takes up to 8GB of RAM, but people have fitted 16GB without any issues. At the time, HP had a £100 cash-back deal as well, so I bought it without delay.
The machine has sat quietly on top of my rack, running OpenFiler to serve files from a 4TB RAID-5 array for months, and it’s been great!
I have been using Linux RAID with LVM on top, and I’m a big fan of this combination, BUT I have some issues with it:
- Running a consistency check on the RAID-5 array takes a very long time. Linux RAID has to check every block of the array, even blocks that have never been allocated. This is a long, slow process, and the extra work the disks have to do increases the chances of one of them dying!
- The same is true when replacing a failed disk. The resync is long and slow: all the remaining disks are read in full to recalculate the data and parity for the replacement disk, and with large disks and a long rebuild time, the chances of a second disk failing during the rebuild are actually quite high. There are quite a few interesting articles about this around.
- I haven’t done any proper tests, but the performance wasn’t as good as I had hoped; I was seeing quite bad performance when dealing with lots of small files. (I know it’s a little silly, but this was probably the biggest problem I was having, causing me to research ZFS!)
So, in light of these issues (mainly the performance one!), and from hearing a lot of good things about it from friends (Michael and Jamie), I decided to look into ZFS.
Although I have heard a lot of good things about ZFS in the past, I always avoided it because it meant using either Solaris or a BSD variant: due to licensing issues, ZFS couldn’t be included with Linux. While there was a FUSE module for ZFS, its performance was quite bad, so I never really considered using it for a NAS.
Recently, there was an article on The Register about ZFS on Linux being “production ready”, so I decided to take the leap and move from OpenFiler to RHEL6 with ZFS on Linux!
Here is how I did it, and my experiences so far.
I will be creating a single RAIDZ pool with the four drives, with the SSD as an L2ARC cache drive.
The N40L has an internal USB port, so I will be using a SanDisk 16GB Flash drive for the OS.
I don’t plan on putting an optical drive into my N40L, so I decided to use the SATA port which HP have designated the ODD port for my SSD. In order to put the ODD port into AHCI mode, and bring the port up at 3Gb/s instead of 1.5Gb/s, I had to apply a BIOS hack which can be easily found on Google.
As I put in the note above, the two Seagate drives are terrible and have a pretty high failure rate. I’ve had these for a few years, and they have failed and been replaced by Seagate many times. I’m only using them temporarily, and I’m planning to replace all the drives with 2TB drives soon and keep a backup on my main storage server.
The SSD will also be replaced later on with something a little newer, that can offer more IOPS than the current SSD I am using.
As the N40L has an internal USB port, I decided to use a USB flash drive for the OS.
I don’t think I had to do anything special during the installation of RHEL. I used my PXE boot environment and my kickstart scripts to do the base RHEL installation, but it’s nothing really fancy, so I won’t go into the installation process.
Once I had a clean RHEL environment, I added the EPEL and ZFS on Linux repositories:
# yum localinstall --nogpgcheck http://mirror.us.leaseweb.net/epel/6/i386/epel-release-6-7.noarch.rpm http://archive.zfsonlinux.org/epel/zfs-release-1-2.el6.noarch.rpm
Next, we install ZFS:
# yum install zfs
The ZFS on Linux documentation recommends using the vdev_id.conf file to give disks easy-to-remember aliases. Basically, this creates a symlink in /dev/disk/by-vdev/ to your real disk.
I created my vdev_id.conf file as follows:
# cat /etc/zfs/vdev_id.conf
alias HDD-0 pci-0000:00:11.0-scsi-0:0:0:0
alias HDD-1 pci-0000:00:11.0-scsi-1:0:0:0
alias HDD-2 pci-0000:00:11.0-scsi-2:0:0:0
alias HDD-3 pci-0000:00:11.0-scsi-3:0:0:0
alias SSD-0 pci-0000:00:11.0-scsi-5:0:0:0
Once we have made the changes to the vdev_id.conf file, we must trigger udev so that it creates our symlinks:
# udevadm trigger
# ls -l /dev/disk/by-vdev/
lrwxrwxrwx 1 root root 9 Apr 27 16:56 HDD-0 -> ../../sda
lrwxrwxrwx 1 root root 9 Apr 27 16:56 HDD-1 -> ../../sdb
lrwxrwxrwx 1 root root 9 Apr 27 16:56 HDD-2 -> ../../sdc
lrwxrwxrwx 1 root root 9 Apr 27 16:56 HDD-3 -> ../../sdd
lrwxrwxrwx 1 root root 10 Apr 27 16:56 SSD-0 -> ../../sde
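One way to double-check this step is to confirm that every alias in vdev_id.conf ended up with a symlink. The following is a minimal sketch and not part of the ZFS on Linux tooling: `check_vdev_aliases` is a hypothetical helper name, and it assumes the simple `alias <name> <path>` format shown above.

```shell
# Hypothetical helper: report aliases from vdev_id.conf that have no symlink.
# Usage: check_vdev_aliases [conf_file] [by_vdev_dir]
# Returns 0 if every alias has a symlink, 1 otherwise.
check_vdev_aliases() {
    conf=${1:-/etc/zfs/vdev_id.conf}
    dir=${2:-/dev/disk/by-vdev}
    missing=0
    # Each relevant line looks like: alias <name> <device-path>
    while read -r keyword name _rest || [ -n "$keyword" ]; do
        [ "$keyword" = "alias" ] || continue
        if [ ! -L "$dir/$name" ]; then
            echo "missing: $name"
            missing=1
        fi
    done < "$conf"
    return "$missing"
}

# Example (run as root after "udevadm trigger"):
#   check_vdev_aliases && echo "all aliases present"
```

A check like this would flag an alias whose symlink is absent or spelled differently from the name in the configuration file.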
Now we can create our pool!
I decided to go with RAIDZ1, which is effectively RAID-5. I regret this decision now and should have gone with RAIDZ2 (effectively RAID-6), but it’s too late now. :/
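To put rough numbers on the RAIDZ1 versus RAIDZ2 trade-off, the usable space of a single RAIDZ vdev is approximately the number of disks minus the parity disks, multiplied by the disk size. The sketch below uses a hypothetical `raidz_usable` helper and assumes four 2000GB drives (the planned upgrade size). It ignores metadata and padding overhead, so real figures will be somewhat lower.

```shell
# Rough usable capacity of one RAIDZ vdev: (disks - parity) * disk size.
# Usage: raidz_usable <disks> <parity_disks> <disk_size>
# The result is in whatever unit the disk size was given in.
raidz_usable() {
    echo $(( ($1 - $2) * $3 ))
}

raidz_usable 4 1 2000   # RAIDZ1 with four 2000GB disks: prints 6000
raidz_usable 4 2 2000   # RAIDZ2 with four 2000GB disks: prints 4000
```

With only four bays, moving from one parity disk to two costs a third of the usable space in exchange for surviving a second disk failure.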
Although my drives use 2^9 (512) byte sectors, I decided to tell ZFS to align the pool for Advanced Format (AF) disks, which use 2^12 (4k) byte sectors. The reason is that, once the pool has been created, the alignment cannot be changed without destroying and recreating the pool. I’d prefer not to destroy the pool when upgrading the disks, and a pool aligned for 512-byte sectors would suffer degraded performance from bad sector alignment if I later upgraded to AF drives. As far as I know, the only disadvantage of aligning for AF drives on 512-byte sector drives is some overhead that costs a little usable disk space, but I think that’s better than the alternative of having to destroy the pool to upgrade the drives!
# zpool create -o ashift=12 DiskArray raidz HDD-0 HDD-1 HDD-2 HDD-3 cache SSD-0
# zpool status
  scan: none requested
        NAME          STATE     READ WRITE CKSUM
        DiskArray     ONLINE       0     0     0
          raidz1-0    ONLINE       0     0     0
            HDD-0     ONLINE       0     0     0
            HDD-1     ONLINE       0     0     0
            HDD-2     ONLINE       0     0     0
            HDD-3     ONLINE       0     0     0
          SSD-0       ONLINE       0     0     0
errors: No known data errors
Magic! Our pool has been created! 😀
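The ashift value passed to zpool create is the base-2 logarithm of the sector size, so ashift=12 corresponds to 4096-byte sectors. A quick sketch of that arithmetic follows; `ashift_to_bytes` is a hypothetical helper name used only for illustration.

```shell
# Sector size implied by an ashift value: 2^ashift bytes.
# Usage: ashift_to_bytes <ashift>
ashift_to_bytes() {
    echo $(( 1 << $1 ))
}

ashift_to_bytes 9    # 512-byte sectors: prints 512
ashift_to_bytes 12   # Advanced Format (4k) sectors: prints 4096

# Assumption: on a live system, "zdb -C DiskArray | grep ashift" should
# show the value the pool was actually created with.
```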
Now we can create a few data sets:
# zfs create DiskArray/home
# zfs create DiskArray/photos
# zfs create DiskArray/scratch
Now you can fill it up! 🙂
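If there are more than a handful of data sets to create, the same commands can be wrapped in a small loop. This is only a sketch: `create_datasets` and the `ZFS_CMD` override are hypothetical names, with the override there so the loop can be dry-run using echo before touching the pool.

```shell
# Create several data sets under one pool.
# Usage: create_datasets <pool> <name>...
# ZFS_CMD is a hypothetical override; set ZFS_CMD=echo for a dry run.
create_datasets() {
    pool=$1
    shift
    for name in "$@"; do
        "${ZFS_CMD:-zfs}" create "$pool/$name"
    done
}

# Dry run (prints the arguments that would be passed to zfs):
#   ZFS_CMD=echo; create_datasets DiskArray home photos scratch
```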
I went further and set up NFS exports and Samba. I opted to share my data stores the normal way, using /etc/exports and smb.conf, but for this to work, Samba and NFS have to be started after ZFS has mounted the pool. ZFS does have the sharesmb and sharenfs options, which basically add the export/share to Samba and NFS as soon as it is available, but I prefer the traditional way as I am used to it. 🙂
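One way to handle the ordering requirement is to wait for the ZFS mountpoint to appear in the mount table before starting the file-sharing services. The following is a rough sketch rather than the setup actually used here: `wait_for_mount` is a hypothetical helper, and the service names in the usage comment are assumptions based on a stock RHEL6 install.

```shell
# Wait until a mountpoint shows up in the mount table, or give up.
# Usage: wait_for_mount <mountpoint> [timeout_seconds] [mounts_file]
# Returns 0 once the mountpoint is listed, 1 if the timeout expires.
wait_for_mount() {
    target=$1
    timeout=${2:-30}
    mounts=${3:-/proc/mounts}
    waited=0
    while [ "$waited" -le "$timeout" ]; do
        # Field 2 of each mount table line is the mountpoint.
        if awk -v m="$target" '$2 == m { found = 1 } END { exit !found }' "$mounts"; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}

# Example (assumed RHEL6 service names):
#   wait_for_mount /DiskArray/photos 60 && service nfs start && service smb start
```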
I haven’t really done too many tests, but using spew, I get the following results:
# spew -b 20m --write 20g /DiskArray/scratch/test.bin
WTR: 63186.63 KiB/s Transfer time: 00:05:31 IOPS: 3.09
# spew -b 20m --read 20g /DiskArray/scratch/test.bin
RTR: 190787.30 KiB/s Transfer time: 00:01:49 IOPS: 9.32
It’s not the greatest performance, and I’m not 100% sure if this is what should be expected. I wish the IOPS were higher, but compared to a stand-alone Seagate Barracuda 7200.12 500 GB (ST3500418AS) drive with an ext4 file system (I realise this isn’t really a good or accurate comparison!), I don’t think it’s too bad:
# spew -b 20m --write 20g /mnt/data/spew.bin
WTR: 125559.47 KiB/s Transfer time: 00:02:47 IOPS: 6.13
# spew -b 20m --read 20g /mnt/data/spew.bin
RTR: 131002.84 KiB/s Transfer time: 00:02:40 IOPS: 6.40
The write speed of my ZFS RAIDZ pool seems to be about half that of the stand-alone disk, which I’d expect as it’s calculating parity and writing to multiple disks at the same time, while the read speed actually seems to be faster on my RAIDZ pool!
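The IOPS figures that spew prints appear to be simply the transfer rate divided by the block size. As a sketch of that arithmetic, with `iops_from_rate` as a hypothetical helper name and assuming that `-b 20m` means 20 MiB blocks:

```shell
# IOPS for a sequential test: throughput divided by the block size.
# Usage: iops_from_rate <rate_KiB_per_s> <block_size_MiB>
iops_from_rate() {
    awk -v rate="$1" -v block="$2" \
        'BEGIN { printf "%.2f\n", rate / (block * 1024) }'
}

iops_from_rate 63186.63 20    # RAIDZ write: prints 3.09
iops_from_rate 190787.30 20   # RAIDZ read: prints 9.32
```

These reproduce the 3.09 and 9.32 figures reported above, which suggests the low IOPS numbers are mostly a consequence of the 20 MiB block size rather than a problem with the pool. A test with small blocks would say more about small-file performance.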
Also, as I am only on 100 Mbit Ethernet at the moment, I am able to fully saturate the pipe when transferring large files. I have also noticed that things feel a lot more responsive now with ZFS than they did with Linux RAID + LVM + XFS/EXT4, but I haven’t got any numbers to prove that. 🙂
As for what’s next: as I’m using 100 Mbit switches at home, not much. I’m planning on buying a SAS/SATA controller so I can add a few more drives and maybe a ZIL drive. As mentioned above, I’m also thinking about upgrading the drives to 2TB drives and replacing the SSD with something better, as the current one has terrible read/write speeds and doesn’t even offer a very good number of IOPS.
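For a sense of scale, the theoretical ceiling of a network link can be converted into the same KiB/s unit that spew reports. This is a back-of-the-envelope sketch with a hypothetical `link_kib_per_s` helper. It ignores Ethernet, TCP and NFS/SMB overhead, so real transfers will be slower.

```shell
# Theoretical ceiling of a network link in KiB/s, ignoring protocol overhead.
# Usage: link_kib_per_s <megabits_per_second>
link_kib_per_s() {
    awk -v mbit="$1" 'BEGIN { printf "%.2f\n", mbit * 1000000 / 8 / 1024 }'
}

link_kib_per_s 100    # 100 Mbit Ethernet: prints 12207.03
```

That ceiling is well below the 63186.63 KiB/s write rate measured locally, which is consistent with the network, not the pool, being the bottleneck for large transfers.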
HP currently has a £50 cash-back deal on the N54L, so I’m also really tempted to buy one for backups, but we’ll see about that! 🙂
If you decide to go down the ZFS road (on Linux, BSD or Solaris) on your N40L, I’d be very interested in hearing your experiences, hardware specs, and performance so I can figure out if I’m getting expected performance or terrible performance, so please leave a comment!