Linux compressors comparison on CentOS 6.5 x86-64: lzo vs lz4 vs gzip vs bzip2 vs lzma

Written by Gionatan Danti on . Posted in Linux & Unix

File compression is an old trick: one of the first (if not the very first) programs capable of compressing files was “SQ”, in the early 1980s, but the first widespread, mass-known compressor was probably ZIP (released in 1989).

In other words, compressing a file to save space is nothing new. While today's multi-TB, low-cost disks provide plenty of space, compression is sometimes still desirable: not only does it reduce the space needed to store data, it can even increase I/O performance, because fewer bytes have to be written to or read from the storage subsystem. This is especially true given the ever-increasing CPU speed compared to the more-or-less stagnant performance of mechanical disks (SSDs are another matter, of course).
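A back-of-the-envelope sketch of why compression can speed up reads. It assumes disk reads and decompression overlap (are pipelined), so the slower of the two stages sets the pace. The function name and all figures (disk speed, compression ratio, decompressor speed) are hypothetical illustrations, not results from this comparison.

```python
def effective_read_mb_s(disk_mb_s: float, ratio: float, decompress_mb_s: float) -> float:
    """Logical (uncompressed) MB/s delivered when reading compressed data.

    disk_mb_s:       raw disk throughput, in compressed MB per second
    ratio:           compression ratio, original size / compressed size
    decompress_mb_s: decompressor output speed, in uncompressed MB per second

    Simplified model: disk reads and decompression run in parallel,
    so the slower of the two stages is the bottleneck.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return min(disk_mb_s * ratio, decompress_mb_s)


# Hypothetical figures: a 100 MB/s mechanical disk and data that compresses 2.5:1.
disk = 100.0
print("uncompressed:", disk, "MB/s")
print("fast codec  :", effective_read_mb_s(disk, 2.5, 500.0), "MB/s")  # disk-bound
print("slow codec  :", effective_read_mb_s(disk, 2.5, 150.0), "MB/s")  # CPU-bound
```

With a fast decompressor the disk stays the bottleneck and the effective throughput grows with the compression ratio (250 MB/s here); with a slow one the CPU caps it (150 MB/s). This is the trade-off that separates fast codecs such as lzo/lz4 from stronger but slower ones such as bzip2/lzma.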

While compression algorithms and programs vary, we can basically distinguish two main categories: generic lossless compressors and specialized lossy compressors.

While the latter category includes compressors with quite spectacular compression factors, they can typically be used only when you want to preserve the general information as a whole and are not interested in a true bit-for-bit representation of the original data. In other words, you can use a lossy compressor to store a high-resolution photo or a song, but not to store a compressed executable on your disk (executables must be stored perfectly, bit for bit) or text log files (we don't want to lose information in text files, right?).
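To make "lossless" concrete, here is a minimal sketch using the Python standard library: `zlib` implements DEFLATE (the algorithm behind gzip), `bz2` matches bzip2, and `lzma` produces LZMA2 data in an .xz container by default. The sample log text and the helper name `compressed_sizes` are hypothetical; the sizes it prints depend on the input and are not this article's benchmark results.

```python
import bz2
import lzma
import zlib

# Lossless codecs from the standard library, keyed by a display name.
CODECS = {
    "deflate (zlib)": (zlib.compress, zlib.decompress),
    "bzip2 (bz2)": (bz2.compress, bz2.decompress),
    "lzma (xz)": (lzma.compress, lzma.decompress),
}


def compressed_sizes(data: bytes) -> dict:
    """Compress `data` with each codec, verify a bit-exact round trip, return sizes."""
    sizes = {}
    for name, (compress, decompress) in CODECS.items():
        packed = compress(data)
        # Lossless means the original must come back byte for byte.
        if decompress(packed) != data:
            raise RuntimeError(f"{name} round trip failed")
        sizes[name] = len(packed)
    return sizes


# Synthetic, highly repetitive log text (hypothetical sample data).
log = b"".join(
    b"Jan  1 12:00:%02d host sshd[1234]: session opened for user root\n" % (i % 60)
    for i in range(2000)
)

for name, size in compressed_sizes(log).items():
    print(f"{name:15} {len(log)} -> {size} bytes ({len(log) / size:.1f}x)")
```

A lossy codec could never pass the round-trip check above, which is exactly why it is unsuitable for executables or log files.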

KVM scalability and consolidation ratio: cache none vs cache writeback

Written by Gionatan Danti on . Posted in Virtualization

In the last ten years, full-virtualization technologies have gained much traction. While this has sometimes led to an excessive proliferation of virtual machines, the key concept is very appealing: as CPU performance and memory capacity grow relentlessly over time, why not use this ever-increasing power to consolidate multiple operating system instances on a single, powerful server?

If done correctly (i.e., without an unnecessary growth in the total number of OS instances), this consolidation process brings considerably lower operating costs, from both the electricity and the maintenance/administration standpoints.

However, to extract good performance from virtual machines, it is imperative to size the virtualization host correctly: the CPU, disk, memory and network subsystems should all be capable of sustaining the expected average workload, plus some headroom for the inevitable usage peaks.
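The sizing rule above (sum the expected average load, then add margin for peaks) can be sketched as simple arithmetic. This is a deliberately naive model under stated assumptions: `VmLoad`, `host_requirements`, the per-guest figures and the `headroom` factor are all hypothetical, and real sizing should start from measured workloads.

```python
from dataclasses import dataclass


@dataclass
class VmLoad:
    """Average resource demand of one guest (hypothetical figures)."""
    cpu_cores: float   # average number of busy cores
    ram_gb: float
    disk_iops: float
    net_mbit_s: float


def host_requirements(vms, headroom: float = 0.3) -> dict:
    """Sum average demand over all guests, then add headroom for usage peaks.

    `headroom` is an assumed safety margin (0.3 = 30% above the summed average);
    the right value depends on how bursty the actual workload is.
    """
    totals = {
        "cpu_cores": sum(vm.cpu_cores for vm in vms),
        "ram_gb": sum(vm.ram_gb for vm in vms),
        "disk_iops": sum(vm.disk_iops for vm in vms),
        "net_mbit_s": sum(vm.net_mbit_s for vm in vms),
    }
    return {name: value * (1 + headroom) for name, value in totals.items()}


guests = [
    VmLoad(cpu_cores=2, ram_gb=4, disk_iops=100, net_mbit_s=50),  # e.g. a web server
    VmLoad(cpu_cores=1, ram_gb=8, disk_iops=200, net_mbit_s=20),  # e.g. a database
]
for name, value in host_requirements(guests, headroom=0.5).items():
    print(f"{name}: {value:g}")
```

Applying one uniform margin to every subsystem is a simplification; in practice memory is usually provisioned more conservatively than CPU, which can be shared among guests more easily.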

KVM VirtIO paravirtualized drivers: why they matter

Written by Gionatan Danti on . Posted in Virtualization

As you probably already know, there are basically two different schools in the virtualization field:

  • the para-virtualization one, where a modified guest OS uses specific host-side calls (hypercalls) to do its “dirty work” with physical devices
  • the full hardware virtualization one (HVM), where the guest OS runs unmodified and the host system “traps” when the guest tries to access a physical device

The two approaches are vastly different: the former requires extensive kernel modifications on both guest and host OSes but gives you maximum performance, as both kernels are virtualization-aware and therefore optimized for the typical workload they experience. The latter approach is totally transparent to the guest OS and often does not require many kernel-level changes on the host side but, as the guest OS is not virtualization-aware, it generally delivers lower performance.

A look at how NCQ, software queue and I/O schedulers impact disk performance

Written by Gionatan Danti on . Posted in Linux & Unix

While SSDs are increasingly used in both enterprise and consumer machines, classic mechanical HDDs are here to stay for at least 5-10 more years: their sheer capacity (and the accompanying low cost per GB) means that they will remain the primary storage backend inside most computers. For example, even where SSDs are used, a classic HDD is still used to store big and/or compressed data.

This also means that any improvement in HDD performance should be taken seriously: as HDDs are (by far) the slowest component found inside modern servers and PCs, any improvement in I/O speed can have a direct positive effect on the performance of the entire system.

Understanding this, enterprise-class drives and controllers long ago acquired a capability called TCQ: a hardware-managed I/O queue that, through careful and smart request reordering, can noticeably improve HDD performance in high queue depth (QD) scenarios. On the software side, too, every piece was in place, as UNIX/Linux variants traditionally have a well-written, high-performance I/O stack with an additional software I/O queue that contributes to an even faster disk subsystem.
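Why does reordering help? A minimal sketch, assuming a simplified model where the only cost is head movement measured in tracks: it compares serving a queue in arrival order with a LOOK-style elevator sweep (go up through pending requests, then back down). Real TCQ/NCQ firmware also weighs rotational position, and software schedulers add rules against request starvation, so this illustrates the principle, not how any drive or kernel actually schedules. The track numbers and function names are hypothetical.

```python
def total_seek_distance(head: int, requests: list) -> int:
    """Sum of head movements (in tracks) when serving requests in the given order."""
    distance = 0
    position = head
    for track in requests:
        distance += abs(track - position)
        position = track
    return distance


def elevator_order(head: int, requests: list) -> list:
    """LOOK-style elevator: sweep upward through pending requests, then back down."""
    upward = sorted(t for t in requests if t >= head)
    downward = sorted((t for t in requests if t < head), reverse=True)
    return upward + downward


# Textbook-style example queue (hypothetical track numbers), head at track 53.
head = 53
queue = [98, 183, 37, 122, 14, 124, 65, 67]

fifo = total_seek_distance(head, queue)
reordered = total_seek_distance(head, elevator_order(head, queue))
print("arrival order :", fifo)       # 640 tracks
print("elevator order:", reordered)  # 299 tracks
```

The reordered queue needs less than half the head travel. The deeper the queue, the more choices the scheduler has, which is why these gains show up mainly at high queue depth.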

EXT3 vs EXT4 vs XFS vs BTRFS filesystem comparison on Fedora 18

Written by Gionatan Danti on . Posted in Linux & Unix

As always, we want to check the current state and performance of Linux filesystems. This time it is the turn of Fedora 18 x86_64, with kernel version 3.9.4-200.

The benchmarked filesystems are:

  • ext3, the classic Linux filesystem
  • ext4, the latest of the ext-based filesystems and the default choice for many Linux distributions
  • xfs, a high-performance filesystem designed with scalability in mind
  • btrfs, the new, actively developed, feature-rich filesystem

Note that this article focuses on performance. For an in-depth, feature-based comparison, see the relevant Wikipedia page.