What we are looking for, and when using a cache is not appropriate

In this article we will evaluate whether, and by how much, different caching settings influence KVM performance, to the point of affecting the achievable consolidation ratio. To answer this question, we will collect performance data from both the guests and the host machine.

In a previous article, I explained why using a write-back cache is now quite safe. Basically, Qemu/KVM honors any flush operation issued by the guest, so if a guest writes sensitive data and issues a flush, it can be certain that the data has reached the physical disk platters.

However, let me state very clearly that in some circumstances you should not use a write-back cache.

The three most common reasons not to use a write-back cache are:

  • one or more guests don't support write barriers (which the host uses to decide when to flush its cache)
  • you need to live-migrate VMs between multiple hosts (currently libvirt warns you not to use live migration together with caching, or data corruption may happen)
  • your workload is so cache-unfriendly that the nocache option is the better performing configuration
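
Before weighing these three points, it helps to know which cache mode each guest disk currently uses. The following is a minimal sketch in Python, assuming the guest definition is available as libvirt domain XML (for example, as printed by `virsh dumpxml`), where the cache mode is the `cache` attribute of each disk's `<driver>` element. The sample guest, its file paths, and the helper name `disk_cache_modes` are hypothetical and only serve the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical domain definition, shaped like the output of `virsh dumpxml`.
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <name>guest1</name>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source file='/var/lib/libvirt/images/guest1.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/var/lib/libvirt/images/guest1-data.img'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""


def disk_cache_modes(domain_xml):
    """Return {target device: cache mode} for every disk in the domain XML."""
    root = ET.fromstring(domain_xml)
    modes = {}
    for disk in root.findall("./devices/disk"):
        target = disk.find("target")
        driver = disk.find("driver")
        if target is None:
            continue  # a disk without a target device cannot be named
        # When no cache attribute is set, the hypervisor default applies;
        # this sketch reports that case as "default".
        cache = driver.get("cache", "default") if driver is not None else "default"
        modes[target.get("dev")] = cache
    return modes


print(disk_cache_modes(SAMPLE_DOMAIN_XML))
```

For the sample above, `vda` is reported as `writeback` and `vdb` as `none`, so only `vda` would be affected by the caveats in the list.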

Point 1 can be verified simply by looking at your guests: modern operating systems support write barriers and often enable them by default. For example, Win2000 and later automatically issue a cache flush operation each second, while EXT3-based Linux distributions often need barriers enabled explicitly with the “barrier=1” mount option.
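
As a rough illustration of that check on a Linux guest, the sketch below scans mount information in the `/proc/mounts` layout and lists EXT3 filesystems that carry no explicit barrier option. It rests on an assumption: that the guest kernel lists the barrier option among the mount options when barriers are active. Some kernels may omit options that match their defaults, so treat the result as a hint, not a proof. The function name and the sample data are hypothetical.

```python
def ext3_mounts_without_barriers(mounts_text):
    """Return mount points of ext3 filesystems with no explicit barrier option.

    mounts_text follows the /proc/mounts layout:
    device  mountpoint  fstype  options  dump  pass
    """
    flagged = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not ext3.
        if len(fields) < 4 or fields[2] != "ext3":
            continue
        options = fields[3].split(",")
        # Assumption: an active barrier shows up as "barrier=1" or "barrier".
        if "barrier=1" not in options and "barrier" not in options:
            flagged.append(fields[1])
    return flagged


# Hypothetical sample data in /proc/mounts format.
SAMPLE_MOUNTS = """\
/dev/vda1 / ext3 rw,relatime,errors=continue,barrier=1,data=ordered 0 0
/dev/vda2 /home ext3 rw,relatime,data=ordered 0 0
/dev/vdb1 /data ext4 rw,relatime 0 0
"""

print(ext3_mounts_without_barriers(SAMPLE_MOUNTS))  # prints ['/home']
```

Inside a real guest, the same function could be fed with `open("/proc/mounts").read()`.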

Regarding live migration, it comes down to your requirements; in most environments, it is not used.

Point 3 can be verified only through extensive testing; however, caching benefits a wide range of workloads, so it is reasonable to assume that it will improve performance rather than degrade it.

I want to stress that I am not advocating using caching always and everywhere. As stated above, there are reasonable use cases where you should not use the OS cache. However, if caching brings consistent and noticeable performance improvements in the general case, it may be worth using.