ZFS, BTRFS, XFS, EXT4 and LVM with KVM – a storage performance comparison

Written by Gionatan Danti. Posted in Virtualization


Database prepare time

Let's see how fast (or slow) it can be to prepare a PostgreSQL database via the sysbench prepare command. Please note that sysbench prepare is much heavier than a “simple” raw SQL import, as it issues many synchronized writes (fsync) rather than a single sync at the end.
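For reference, the exact sysbench invocation is not reproduced here; a typical sysbench 0.4-style prepare against PostgreSQL looks roughly like the following (host, credentials and table size are placeholders, not the values used in the tests):

$ sysbench --test=oltp --db-driver=pgsql \
    --pgsql-host=127.0.0.1 --pgsql-db=sbtest \
    --pgsql-user=sbtest --pgsql-password=sbtest \
    --oltp-table-size=10000000 prepare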

The fastest performer was LVM with preallocated space, with preallocated Qcow2 slightly behind. ZFS is again very fast, with ThinLVM+nozeroing at its heels.

But the real winner is the ThinLVM+EXT4 combo: it had performance very similar to a preallocated LVM volume, with the added flexibility of being a thin volume. And while you can argue that using a direct-attached ThinLVM volume without zeroing is a security hazard (at least in untrusted environments), it is relatively safe to use it as the backing device of a host-side filesystem installation.
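As an illustration only (volume group and volume names are made up, and the exact lvcreate flags used in the tests are not listed here), a ThinLVM+EXT4 combo with zeroing disabled can be built along these lines:

# thin pool with zeroing of newly provisioned blocks disabled (-Z n)
$ lvcreate --type thin-pool -L 500G -Z n -n tpool vg_data
# thin volume carved from the pool, formatted and mounted on the host
$ lvcreate --thin -V 800G -n thinvol vg_data/tpool
$ mkfs.ext4 /dev/vg_data/thinvol
$ mount /dev/vg_data/thinvol /var/lib/libvirt/images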

What about the large XFS vs EXT4 gap? It is probably related to a more efficient journal commit, but maybe it is another problem entirely: I noticed that with the default 512KB chunks, the XFS log buffer size is (by default) not optimally sized. Technical explanation: the XFS log buffer size is MAX(32768, log_sunit). At filesystem creation mkfs.xfs tries to optimize the sunit value for the current RAID setup, but a 512KB stripe value is too much, so mkfs.xfs reverts to a sunit value of only 32KB. In fact, during filesystem creation you can see something similar to:

log stripe unit (524288 bytes) is too large (maximum is 256KiB)
log stripe unit adjusted to 32KiB

This, in turn, causes a low default log buffer size. Please note that while logbsize is a tunable mount parameter (and we could indeed also tune the mkfs options), for this comparison I decided to stick with default options, so I left XFS with its default settings. In a future article I'll dissect this (and other) behavior, but that is work for another time...
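For reference, these are the kinds of knobs involved, not used in the tests above (device names and stripe width are purely illustrative):

# raise the log buffer size at mount time (logbsize accepts values up to 256k)
$ mount -o logbsize=256k /dev/vg_data/xfsvol /var/lib/libvirt/images
# or keep the log stripe unit within the 256KiB limit at mkfs time
$ mkfs.xfs -d su=512k,sw=2 -l su=32k /dev/vg_data/xfsvol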

Another possible reason is XFS log placement, directly at the middle of the volume: with a mostly-empty filesystem (as in this case: of our 800 GB volume, only a maximum of 32 GB was allocated) it is not the best placement. On the other side, as usage grows, it surely becomes a good choice.
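Incidentally, one way to take internal log placement out of the picture entirely is an external log device; again just a sketch with made-up device names, not something used in these tests:

$ mkfs.xfs -l logdev=/dev/sdb1,size=128m /dev/vg_data/xfsvol
$ mount -o logdev=/dev/sdb1 /dev/vg_data/xfsvol /var/lib/libvirt/images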

What about BTRFS? It surprised me: it is fast when using CoW + preallocation or NoCoW + thin, and slow in the other cases. The first result puzzles me: for a CoW filesystem, preallocation should not matter much, as new blocks are never rewritten in place.

On the other side, the fast NoCoW + thin result can be explained, as thin provisioning gives the filesystem a chance to turn sparse writes into contiguous ones. But hey – the DB inserts performed by sysbench should already be contiguous, right? True, but don't forget the XFS layer inside the guest image: as explained, by default XFS stores its journal at the middle of the volume. Thin provisioning means that the host can remap the log to another location (the beginning of the volume, as guest filesystem creation is one of the first things the installer accomplishes), and so total seek time can go down (even with the small 8 GB images, the guest filesystem was under 20% full, so the middle-of-disk placement was not optimal). In other words, we are probably observing an interesting interaction between the host and guest filesystems, and their respective methods of allocating data/metadata blocks.
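For the curious, the two BTRFS-side knobs discussed above boil down to something like this on the host (paths are illustrative; the NoCoW flag must be set on the directory before the image files are created so they inherit it, and preallocation=full requires a reasonably recent qemu-img):

# NoCoW: files created inside the directory inherit the flag
$ chattr +C /var/lib/libvirt/images
# fully preallocated 8 GB Qcow2 image
$ qemu-img create -f qcow2 -o preallocation=full /var/lib/libvirt/images/vm1.qcow2 8G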

Comments   

 
#11 capsicum 2016-02-14 03:42
What are the structural details of the thin LVM arrangement? The KVM information I have gives a warning that thin provisioning is not possible with LVM pools. I am new to KVM and VMs, but I do know the traditional LVM structure (PV, VG, LV or thin-LV, fs)
 
 
#12 Albert Henriksen 2016-02-15 21:40
In my own tests, BTRFS performance is more than 180 times faster if you do the following:

- Disable COW on the folder containing VM image files (to reduce write amplification)
- Disable QCOW2 and use sparse RAW for VM image files (to reduce fragmentation of extents apparently caused by QCOW2 block mapping algorithm)

Both tests were on a Linux 4.2 kernel. The QCOW2 cluster size was 64K in the test using QCOW2. I only tested with COW disabled. The performance difference is likely even greater with NOCOW + RAW versus COW + QCOW2.

To convert VM images, the following commands are useful:
$ chattr +C new_images/
$ truncate -s 100G new_images/vm1.raw
$ qemu-nbd -c /dev/nbd0 old_images/vm1.qcow2
$ dd conv=notrunc,sparse bs=4M if=/dev/nbd0 of=new_images/vm1.raw

Shut down virtual machines before conversion, change XML to point to new files and restart virtual machines when done.
 
 
#13 mt 2016-03-03 11:17
Quoting Albert Henriksen:
In my own tests, BTRFS performance is more than 180 times faster if you do the following:

- Disable COW on the folder containing VM image files (to reduce write amplification)
- Disable QCOW2 and use sparse RAW for VM image files (to reduce fragmentation of extents apparently caused by QCOW2 block mapping algorithm)


But that makes btrfs useless. No snapshots, no checksumming. It's fair to test with CoW - do you have any numbers for that?
 
 
#14 Sam 2016-05-23 00:54
Hello,

I take it you forgot to mount BTRFS with compression enabled (which really should be the default)?

Can you please test BTRFS and make sure you're mounting with the compress=lzo option?
 
 
#15 Sam 2016-05-23 00:58
Also, just saw your note about kernel 3.10! We run many hundreds of VMs and not a single production server is running a kernel this old; we run between 4.4 and 4.6 on CentOS 7.

QCOW2 is also very suboptimal for modern VMs; in reality you'd always use raw devices or logical volumes.

It would be interesting to see you re-run these tests using a modern kernel, say at least 4.4, and either raw block devices or logical volumes, along with mounting BTRFS properly with the compress=lzo option.
 
 
#16 Luca 2016-05-23 23:28
Great article, but pagination makes it painful to read
 
 
#17 Gionatan Danti 2016-05-24 15:22
@Sam

No, I did not use any compression (which, by the way, was disabled by default).

I stick to distribution-provided kernels when possible, and 3.10.x is the current kernel for RHEL7/CentOS7.

Finally, I agree that RAW images are marginally faster than preallocated QCOW2 files, and when possible I used them. However, for the block layer/filesystem combos which do not support snapshots, I used QCOW2 to have at least partial feature parity with the more flexible alternatives.
 
 
#18 Yonsy Solis 2016-05-30 16:28
OK, you try to use distribution-provided kernels when possible, but when you compare a filesystem built from an external module (ZFS from ZFS on Linux) against a filesystem from the provided kernel (BTRFS), with the old characteristics that come with it, your comparison becomes invalid.

ZFS gets updates without upgrading the kernel. This is not the case with BTRFS, which needs an updated kernel. The kernel version is important to know in this case (and would need to be updated for a comparison relevant to enterprise distributions; Ubuntu 16.04 LTS, for example, ships a 4.4 kernel now).
 
 
#19 Brian Candler 2016-12-15 09:37
For "raw images ZFS", do you mean you created a zvol block device, or a raw .img file sitting in a zfs dataset (filesystem)?
 
 
#20 Gionatan Danti 2016-12-15 09:52
Quoting Brian Candler:
For "raw images ZFS", do you mean you created a zvol block device, or a raw .img file sitting in a zfs dataset (filesystem)?


The latter: raw images on a ZFS filesystem
 
