ZFS, BTRFS, XFS, EXT4 and LVM with KVM – a storage performance comparison

Written by Gionatan Danti. Posted in Virtualization


Testbed and methods

Benchmarks were performed on a system equipped with:
- PhenomII 940 CPU (4 cores @ 3.0 GHz, 1.8 GHz Northbridge and 6 MB L3 cache)
- 8 GB DDR2-800 DRAM (in unganged mode)
- Asus M4A78 Pro motherboard (AMD 780G + SB700 chipset)
- 4x 500 GB, 7200 RPM hard disks (1x WD RE, 3x Seagate Barracuda) in AHCI mode, configured in software RAID10 "near" layout (default 512K chunks)
- OS: CentOS 7.0 x64, kernel version 3.10.0-123.13.2.el7.x86_64

All disks had three partitions, used to construct three MD arrays: a first RAID1 “boot” array, a RAID10 “system” array (/ + swap, via LVM) and a final RAID10 “data” array (~800 GB usable) for hosting and testing the VMs. BTRFS and ZFS natively support mirroring + striping, so in these cases I went ahead without the “data” MD array and used their integrated facilities to create a mirrored + striped dataset.
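For reference, a minimal sketch of how the three “data” layouts can be created; the chunk size and layout follow the description above, while device names, partition numbers and pool names are hypothetical:

# MD RAID10, "near" layout, 512K chunks, on the four data partitions
$ mdadm --create /dev/md2 --level=10 --layout=n2 --chunk=512 --raid-devices=4 /dev/sd[abcd]3

# ZFS striped mirror (RAID10 equivalent), built directly on the partitions
$ zpool create tank mirror /dev/sda3 /dev/sdb3 mirror /dev/sdc3 /dev/sdd3

# BTRFS RAID10 profile for both data and metadata
$ mkfs.btrfs -d raid10 -m raid10 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3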

ZFS was used with the kernel-level driver provided by the ZFS on Linux (ZoL) project, version 0.6.3-1.2. A slight change to the default settings was needed for 100% reliable operation: I had to enable the “xattr=sa” option. You can read more here: https://github.com/zfsonlinux/zfs/issues/1548
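As a sketch, the option can be set with a single command on the dataset hosting the VM images (the pool/dataset name here is hypothetical):

# store extended attributes as system attributes in the dnode, instead of hidden xattr directories
$ zfs set xattr=sa tank/vmimages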

A benchmark run consisted of:
1) concurrently installing the 4 VMs (using PXE on the host side + kickstart for unattended installation). The virtual machines used default configuration values, with LVM+XFS for the guest systems. The four VM images were 8 GB each;
2) preparing a PostgreSQL database on one VM, using sysbench on the host side with default parameters (a sketch of the command follows this list);
3) concurrently running a different benchmark on each VM for ten minutes (600 seconds), three times in a row; the results were averaged.
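A minimal sketch of the database preparation step, assuming sysbench 0.4.x syntax; the host name and credentials below are hypothetical, not the exact values used:

# populate the test table on the PostgreSQL VM from the host
$ sysbench --test=oltp --db-driver=pgsql --pgsql-host=vm1 --pgsql-user=sbtest \
  --pgsql-password=sbtest --pgsql-db=sbtest prepare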

Point 3 needs some more explanation:
- the first machine ran PostgreSQL and was benchmarked by a sysbench instance running on the host. I used the “complex” test with default options; it is a mixed read/write, transactional test;
- the second machine ran filebench with the fileserver personality, to simulate fileserver-type activity: a mixed read/write, mixed sequential/random test;
- the third machine ran filebench with the varmail personality, issuing many small synchronized writes as many mailservers do;
- the fourth machine ran filebench with the webserver personality, issuing many small reads as webservers typically do (a sketch of both invocations follows this list).
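As a rough sketch of the two invocations, assuming sysbench 0.4.x and a stock filebench installation; the mount point, host name and credentials are hypothetical:

# sysbench "complex" OLTP run from the host against the PostgreSQL VM, 600 seconds
$ sysbench --test=oltp --oltp-test-mode=complex --db-driver=pgsql --pgsql-host=vm1 \
  --pgsql-user=sbtest --pgsql-password=sbtest --pgsql-db=sbtest \
  --max-time=600 --max-requests=0 run

# filebench inside a guest, loading one of the stock personalities
$ filebench
filebench> load fileserver        # or: varmail, webserver
filebench> set $dir=/benchmark
filebench> run 600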

You can consider any synthetic or semi-synthetic test flawed, and in some measure you would be right. At the same time, the above tests are both easily reproducible (I used default options, both for configurations and benchmarks) and I/O heavy. Moreover, they use a more-or-less credible I/O pattern.

You may wonder why I installed the 4 virtual machines concurrently, and why I ran these very stressful tests all at the same time. The fact is that thin provisioning has a hidden threat: when issuing many concurrent writes, data can be scattered all over the disk, leading to considerable fragmentation. When using a single VM at a time this behavior is usually not noticeable, as the system has no disk bandwidth contention in that case. The only exception was the database creation/population: it is a very specific operation that needs to be issued to a single VM, and in the real world you will rarely load an entire DB during periods of high I/O usage.
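As a side note, a quick way to check this kind of fragmentation on filesystems that implement the FIEMAP ioctl (EXT4, XFS, BTRFS) is to count the extents of an image file; the path below is hypothetical:

# the higher the extent count, the more fragmented the image
$ filefrag /var/lib/libvirt/images/vm1.qcow2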

One last thing: I did not specifically optimize the guests' configuration for maximum throughput. For example, the PostgreSQL service ran with the default buffer size. For this benchmark round I was interested in creating a high I/O load, not in fine-tuning the guests' configuration.

Comments   

 
#11 capsicum 2016-02-14 03:42
What are the structural details of the thin LVM arrangement? The KVM information I have gives a warning that thin provisioning is not possible with LVM pools. I am new to KVM and VMs, but I do know the traditional LVM structure (PV, VG, LV or thin-LV, FS)
 
 
#12 Albert Henriksen 2016-02-15 21:40
In my own tests, BTRFS performance is more than 180 times faster if you do the following:

- Disable COW on the folder containing VM image files (to reduce write amplification)
- Disable QCOW2 and use sparse RAW for VM image files (to reduce fragmentation of extents apparently caused by QCOW2 block mapping algorithm)

Both tests were on a Linux 4.2 kernel. The QCOW2 cluster size was 64K in the test using QCOW2. I only tested with COW disabled. The performance difference is likely even greater with NOCOW + RAW versus COW + QCOW2.

To convert VM images, the following commands are useful:
$ chattr +C new_images/
$ truncate -s 100G new_images/vm1.raw
$ qemu-nbd -c /dev/nbd0 old_images/vm1.qcow2
$ dd conv=notrunc,sparse bs=4M if=/dev/nbd0 of=new_images/vm1.raw

Shut down virtual machines before conversion, change XML to point to new files and restart virtual machines when done.
 
 
#13 mt 2016-03-03 11:17
Quoting Albert Henriksen:
In my own tests, BTRFS performance is more than 180 times faster if you do the following:

- Disable COW on the folder containing VM image files (to reduce write amplification)
- Disable QCOW2 and use sparse RAW for VM image files (to reduce fragmentation of extents apparently caused by QCOW2 block mapping algorithm)


But that makes btrfs useless. No snapshots, no checksumming. It's fair to test with CoW - do you have any numbers for that?
 
 
#14 Sam 2016-05-23 00:54
Hello,

I'm taking it you forgot to mount BTRFS with compression enabled (which really should be the default)?

Can you please test BTRFS and make sure you're mounting with the compress=lzo option?
 
 
#15 Sam 2016-05-23 00:58
Also just saw your note about Kernel 3.10! - we run many hundreds of VMs and not a single production server is running a kernel this old, we run between 4.4 and 4.6 on CentOS 7.

QCOW2 is also very suboptimal for modern VMs; in reality you'd always use raw devices or logical volumes.

It would be interesting to see you re-run these tests using a modern kernel, say at least 4.4, and either raw block devices or logical volumes, along with mounting BTRFS properly with the compress=lzo option.
 
 
#16 Luca 2016-05-23 23:28
Great article, but pagination makes it painful to read
 
 
#17 Gionatan Danti 2016-05-24 15:22
@Sam

No, I did not use any compression (which, by the way, was disabled by default).

I stick to distribution-provided kernels when possible, and 3.10.x is the current kernel for RHEL7/CentOS7.

Finally, I agree that RAW images are marginally faster than preallocated QCOW2 files, and when possible I used them. However, for the block layer/filesystem combos which do not support snapshots, I used QCOW2 to have at least partial feature parity with the more flexible alternatives.
 
 
#18 Yonsy Solis 2016-05-30 16:28
OK, so you try to use distribution-provided kernels when possible, but when you compare a filesystem from an external module (ZFS from ZFS on Linux) against a filesystem from the provided kernel (BTRFS), with the old characteristics of the latter, your comparison becomes invalid.

ZFS gets updates without upgrading the kernel. This is not the case with BTRFS, which needs an updated kernel. The kernel version is important to know in this case (and would need to be updated for a comparison relevant to enterprise distributions; Ubuntu 16.04 LTS, for example, now ships the 4.4 kernel).
 
 
#19 Brian Candler 2016-12-15 09:37
For "raw images ZFS", do you mean you created a zvol block device, or a raw .img file sitting in a zfs dataset (filesystem)?
 
 
#20 Gionatan Danti 2016-12-15 09:52
Quoting Brian Candler:
For "raw images ZFS", do you mean you created a zvol block device, or a raw .img file sitting in a zfs dataset (filesystem)?


The latter: raw images on a ZFS filesystem
 
