ZFS, BTRFS, XFS, EXT4 and LVM with KVM – a storage performance comparison
Testbed and methods
Benchmarks were performed on a system equipped with:
- Phenom II 940 CPU (4 cores @ 3.0 GHz, 1.8 GHz Northbridge and 6 MB L3 cache)
- 8 GB DDR2-800 DRAM (in unganged mode)
- Asus M4A78 Pro motherboard (AMD 780G + SB700 chipset)
- 4x 500 GB, 7200 RPM hard disks (1x WD RE, 3x Seagate Barracuda) in AHCI mode, configured in software RAID10 "near" layout (default 512K chunks)
- OS: CentOS 7.0 x64, kernel version 3.10.0-123.13.2.el7.x86_64
All disks had 3 partitions, used to construct three MD arrays: a first RAID1 “boot” array, a RAID10 “system” array (/ + swap, via LVM) and a final “data” RAID10 array (~800 GB usable) for VM hosting and testing. BTRFS and ZFS natively support mirroring + striping, so in those cases I skipped the “data” MD array and used the integrated facilities to create a mirrored+striped dataset.
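For reference, the “data” array described above could be created with something like the following sketch (device and partition names are illustrative, not necessarily the exact ones used on this testbed):

```shell
# Create the "data" RAID10 array with the "near" layout and the default 512K chunk.
# /dev/sd[abcd]3 are hypothetical third partitions on the four disks.
mdadm --create /dev/md2 --level=10 --layout=n2 --chunk=512 \
      --raid-devices=4 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
```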
ZFS was used with the kernel-level driver provided by the ZFS on Linux (ZoL) project, version 0.6.3-1.2. A slight change to the default settings was needed for 100% reliable operation: I had to enable the “xattr=sa” option. You can read more here: https://github.com/zfsonlinux/zfs/issues/1548
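A mirror+striped pool roughly equivalent to the RAID10 setup, plus the xattr tweak, can be sketched as follows (pool and device names are made up for illustration):

```shell
# Two mirrored vdevs striped together, ZFS's equivalent of RAID10.
zpool create tank mirror /dev/sda3 /dev/sdb3 mirror /dev/sdc3 /dev/sdd3

# Store extended attributes as system attributes in the inode rather than in
# hidden directories; see the linked ZoL issue for why this was needed.
zfs set xattr=sa tank
```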
A benchmark run consisted of:
1) concurrently installing the 4 VMs (using PXE on the host side + kickstart for unattended installation). The virtual machines used default configuration values, with LVM+XFS for the guest systems. The four VM images were 8 GB each;
2) preparing a PostgreSQL database on one VM, using sysbench on the host side (with default parameters);
3) concurrently running a different benchmark on each VM for ten minutes (600 seconds), three times in a row; the results were averaged.
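As a sketch, steps 2 and 3 for the database VM looked roughly like this (hostname and credentials are placeholders, and the exact flags depend on the sysbench version, 0.4.x here):

```shell
# Prepare the test database on the first VM (placeholder host/credentials)
sysbench --test=oltp --db-driver=pgsql --pgsql-host=vm1 \
         --pgsql-user=sbtest --pgsql-password=sbtest prepare

# Run the mixed read/write "complex" test for 600 seconds
sysbench --test=oltp --oltp-test-mode=complex --db-driver=pgsql \
         --pgsql-host=vm1 --pgsql-user=sbtest --pgsql-password=sbtest \
         --max-time=600 --max-requests=0 run
```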
Point 3 needs some more explanation:
- the first machine ran PostgreSQL and was benchmarked by a sysbench instance running on the host. I used the “complex” test with default options; it is a mixed read/write, transactional test;
- the second machine ran filebench with the fileserver personality, to simulate fileserver-type activity. It is a mixed read/write, mixed sequential/random test;
- the third machine ran filebench with the varmail personality, issuing many small synchronized writes as many mailservers do;
- the fourth machine ran filebench with the webserver personality, issuing many small reads as webservers typically do.
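Each filebench personality was run from its stock workload definition; a minimal invocation, assuming filebench's interactive shell, looks like this:

```shell
# Run the fileserver personality for 600 seconds; the varmail and webserver
# workloads are invoked the same way by loading their respective definitions.
filebench <<'EOF'
load fileserver
run 600
EOF
```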
You can consider any synthetic or semi-synthetic test flawed, and to some extent you would be right. At the same time, the above tests are both easily reproducible (I used default options, for both configurations and benchmarks) and I/O heavy. Moreover, they represent and use a more-or-less credible I/O pattern.
You may wonder why I installed the 4 virtual machines concurrently, and why I ran these very stressful tests all at the same time. The fact is that thin provisioning has a hidden threat: when issuing many I/O writes concurrently, data can be scattered all over the disk, leading to considerable fragmentation. When using a single VM at a time this behavior is usually not noticeable, as the system has no disk bandwidth contention. The only exception was the database creation/population: it is a very specific operation that needs to be issued to a single VM, and in the real world you will rarely load an entire DB during periods of high I/O usage.
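The fragmentation caused by concurrent thin-provisioned writes can be inspected after the fact; a quick check on extent-based filesystems is (path is illustrative):

```shell
# Print the number of extents per VM image: the higher the count,
# the more scattered the file is on disk.
filefrag /var/lib/libvirt/images/*.img
```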
One last thing: I did not specifically optimize guest configurations for maximum throughput. For example, the PostgreSQL service ran with default buffer sizes. For this benchmark round I was interested in creating a high I/O load, not in fine-tuning guest configurations.
Comments
- Disable COW on the folder containing VM image files (to reduce write amplification)
- Disable QCOW2 and use sparse RAW for VM image files (to reduce fragmentation of extents apparently caused by QCOW2 block mapping algorithm)
Both tests were on a Linux 4.2 kernel. The QCOW2 cluster size was 64K in the test using QCOW2. I only tested with COW disabled. The performance difference is likely even greater with NOCOW + RAW versus COW + QCOW2.
To convert VM images, the following commands are useful:
$ chattr +C new_images/
$ truncate -s 100G new_images/vm1.raw
$ qemu-nbd -c /dev/nbd0 old_images/vm1.qcow2
$ dd conv=notrunc,sparse bs=4M if=/dev/nbd0 of=new_images/vm1.raw
$ qemu-nbd -d /dev/nbd0
Shut down virtual machines before conversion, change XML to point to new files and restart virtual machines when done.
But that makes btrfs useless. No snapshots, no checksumming. It's fair to test with CoW - do you have any numbers for that?
I take it you forgot to mount BTRFS with compression enabled (which really should be the default)?
Can you please test BTRFS and make sure you're mounting with the compress=lzo option?
QCOW2 is also very suboptimal for modern VMs; in reality you'd always use raw devices or logical volumes.
It would be interesting to see you re-run these tests using a modern kernel (say, at least 4.4) and either raw block devices or logical volumes, along with mounting BTRFS properly with the compress=lzo option.
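For completeness, enabling LZO compression on an already-mounted BTRFS filesystem is a one-liner (mount point is illustrative; only data written after the remount is compressed):

```shell
# Remount an existing BTRFS filesystem with transparent LZO compression
mount -o remount,compress=lzo /var/lib/libvirt/images
```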
No, I did not use any compression (which, by the way, was disabled by default).
I stick to distribution-provided kernels when possible, and 3.10.x is the current kernel for RHEL7/CentOS7.
Finally, I agree that RAW images are marginally faster than preallocated QCOW2 files, and when possible I used them. However, for the block layer/filesystem combos which do not support snapshots, I used QCOW2 to have at least partial feature parity with the more flexible alternatives.
ZFS gets updates without upgrading the kernel. This is not the case with BTRFS, which needs an updated kernel. The kernel version is important to know in this case (and will need to be updated for a comparison of Enterprise distributions; Ubuntu 16.04 LTS, for example, now ships a 4.4 kernel).
The latter: raw images on a ZFS filesystem