KVM scalability and consolidation ratio: cache none vs cache writeback

Written by Gionatan Danti. Posted in Virtualization

Over the last ten years, full-virtualization technologies have gained much traction. While this has sometimes led to an excessive proliferation of virtual machines, the key concept is very appealing: as CPU performance and memory capacity relentlessly grow over time, why not use this ever-increasing power to consolidate multiple operating system instances on one single, powerful server?

If done correctly (i.e. without unnecessary growth in the total number of OS instances), this consolidation process brings considerably lower operating costs, from both an electricity and a maintenance/administration standpoint.

However, in order to extract good performance from virtual machines, it is imperative to correctly size the host virtualizer: the CPU, disk, memory and network subsystems should all be capable of sustaining the expected average workload, plus something more for the inevitable usage peaks.
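The disk cache policy in the title is one such sizing knob, and it is set per disk. As a minimal sketch (the `disk.img` image path and memory/CPU sizes are placeholders, not from the benchmarks), the two cache modes can be selected on the qemu-kvm command line like this:

```shell
# cache=none: the host page cache is bypassed (O_DIRECT),
# so guest writes go straight to the backing storage.
qemu-kvm -m 2048 -smp 2 \
    -drive file=disk.img,if=virtio,cache=none

# cache=writeback: writes are acknowledged as soon as they
# reach the host page cache, trading durability for speed.
qemu-kvm -m 2048 -smp 2 \
    -drive file=disk.img,if=virtio,cache=writeback
```

These are configuration fragments rather than runnable examples: they assume a KVM-capable host and an existing guest image.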

KVM VirtIO paravirtualized drivers: why they matter

Written by Gionatan Danti. Posted in Virtualization

As you probably already know, there are basically two different schools in the virtualization camp:

  • the para-virtualization one, where a modified guest OS uses specific host-side syscalls (hypercalls) to do its “dirty work” with physical devices
  • the full hardware virtualization one (HVM), where the guest OS runs unmodified and the host system “traps” when the guest tries to access a physical device

The two approaches are vastly different: the former requires extensive kernel modifications on both the guest and host OSes but gives you maximum performance, as both kernels are virtualization-aware and so are optimized for the typical workload they experience. The latter approach is totally transparent to the guest OS and often does not require many kernel-level changes on the host side but, as the guest OS is not virtualization-aware, it generally delivers lower performance.
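VirtIO gives KVM a hybrid of the two: the guest runs fully virtualized, but paravirtualized VirtIO drivers replace the slow emulated devices. As a hedged sketch, from inside a Linux guest one can check whether VirtIO drivers are actually in use (module and device names vary with the guest configuration):

```shell
# Loaded VirtIO modules: a paravirtualized guest typically
# shows virtio_blk, virtio_net, virtio_pci, etc.
lsmod | grep virtio

# PCI view: VirtIO devices appear as "Virtio" entries, while a
# fully emulated setup shows e.g. an Intel e1000 NIC instead.
lspci | grep -i virtio

# Block devices backed by virtio_blk are named vdX, not sdX:
ls /dev/vd* 2>/dev/null
```

If these commands return nothing, the guest is most likely running on fully emulated hardware.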

A look at how NCQ, software queue and I/O schedulers impact disk performance

Written by Gionatan Danti. Posted in Linux & Unix

While SSDs are increasingly used in both enterprise and consumer machines, classical mechanical HDDs are here to stay for at least 5-10 more years: their sheer size (and accompanying low cost per GB) means that they will remain the primary storage backend inside most computers. For example, even where SSDs are used, a classic HDD is often used to store big and/or compressed data.

This also means that any improvement in HDD performance should be taken seriously: as HDDs are (by far) the slowest component found inside modern servers and PCs, any improvement in I/O speed can have a direct positive effect on the performance of the entire setup.

Understanding this fact, enterprise-class drives and controllers long ago acquired a capability called TCQ: a hardware-managed I/O queue that, through careful, smart request reordering, can noticeably improve HDD performance under high queue depth (QD) scenarios. Even on the software side every piece was in place, as any UNIX/Linux variant traditionally has a well-written, high-performing I/O stack with an additional software I/O queue that contributes to an even faster disk subsystem.
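On Linux, both the hardware queue and the software queue are visible through sysfs. A quick sketch of the knobs involved (here `sda` is just an example device name, and the last command requires root):

```shell
# Active I/O scheduler (shown in brackets) for each block device:
for dev in /sys/block/*/queue/scheduler; do
    printf '%s: %s\n' "$dev" "$(cat "$dev")"
done

# Hardware (NCQ/TCQ) queue depth for a SATA/SCSI disk:
cat /sys/block/sda/device/queue_depth

# Size of the kernel's software request queue:
cat /sys/block/sda/queue/nr_requests

# Switch the scheduler at runtime, e.g. to deadline:
echo deadline > /sys/block/sda/queue/scheduler
```

These are host-dependent configuration commands, not a portable script: the available schedulers and queue depths depend on the kernel version and the underlying hardware.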

EXT3 vs EXT4 vs XFS vs BTRFS filesystem comparison on Fedora 18

Written by Gionatan Danti. Posted in Linux & Unix

As always, we want to check the current state and performance of Linux filesystems. This time it is the turn of Fedora 18 x86_64, with kernel version 3.9.4-200.

The benchmarked filesystems are:

  • ext3, the classic Linux filesystem
  • ext4, the latest of the ext-based filesystems and the default choice for many Linux distributions
  • xfs, a high-performance filesystem designed with scalability in mind
  • btrfs, the new, actively developed, feature-rich filesystem

Note that this article focuses on performance. For an in-depth, feature-based comparison, see the relevant Wikipedia page.
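For reference, a comparison like this is typically set up by recreating each filesystem on the same dedicated test partition between runs. The sketch below assumes a hypothetical spare partition `/dev/sdb1` and a mount point `/mnt/test` (neither is taken from the article), with default mkfs options:

```shell
# Create each candidate filesystem, one at a time, then benchmark:
mkfs.ext3 /dev/sdb1
mkfs.ext4 /dev/sdb1
mkfs.xfs -f /dev/sdb1     # -f overwrites an existing filesystem
mkfs.btrfs -f /dev/sdb1

# Mount with default options before each benchmark run:
mount /dev/sdb1 /mnt/test
```

Reusing the same partition keeps the drive's geometry (and thus raw transfer rates) constant across all four filesystems.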

Sandy Bridge, Ivy Bridge, Haswell and the mess of Intel processor feature lists

Written by Gionatan Danti. Posted in Hardware analysis

Current microprocessors are very complex beasts. To develop a high-performance CPU architecture, you need not only many very smart engineers, but also much time (3-5 years) and money (on the order of billions of dollars). Moreover, bleeding-edge fabrication plants are incredibly expensive, and they must be continuously upgraded to newer process technologies.

So, it is perfectly understandable that both AMD and Intel (the two main x86 players) try to differentiate their offerings, selling processors that span from $50 to ~$1000, a range of about 20X. While they want to sell you the highest-priced (and highest-margin) processors, they also realize that, as the market is very cost-sensitive, the bulk of R&D and production costs must be spread over a very large, low-profit product base. On top of that, all their processors must perform at least decently, or users will loudly complain (hello, Atom users!).