Programs, implementations, libraries and algorithms

Before moving to the raw numbers, let's first clarify the terminology.

A lossless compression algorithm is a mathematical procedure that defines how to reduce (compress) a specific dataset into a smaller one without losing information. In other words, it encodes information using fewer bits than the original version, with no information loss. To be useful, a compression algorithm must be reversible: it should enable us to re-expand the compressed dataset, obtaining an exact copy of the original source. It's easy to see how the fundamental capabilities (compression ratio and speed) are rooted in the algorithm itself, and different algorithms can differ strongly in results and applicable scopes.
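As a minimal sketch of the reversibility requirement, the snippet below compresses and re-expands the same buffer with three algorithm families using Python's standard-library bindings. The sample text and the `codecs` and `results` names are assumptions made for illustration only; they are not part of our benchmark, and the ratios you see will depend entirely on the input.

```python
import bz2
import lzma
import zlib

# Hypothetical sample data: highly repetitive text, which compresses well.
original = b"lossless compression keeps every byte intact. " * 200

# One (compress, decompress) pair per algorithm family.
codecs = {
    "zlib (LZ77-based)": (zlib.compress, zlib.decompress),
    "bz2 (Burrows-Wheeler)": (bz2.compress, bz2.decompress),
    "lzma (LZMA)": (lzma.compress, lzma.decompress),
}

results = {}
for name, (compress, decompress) in codecs.items():
    packed = compress(original)
    restored = decompress(packed)
    # Reversibility: the restored data must be byte-for-byte identical.
    assert restored == original
    # Compression ratio: original size divided by compressed size.
    results[name] = len(original) / len(packed)
    print(f"{name}: {len(original)} -> {len(packed)} bytes, "
          f"ratio {results[name]:.1f}x")
```

The same input gives a different compressed size for each algorithm, which is the kind of difference the benchmarks measure on real data.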

The next step is the algorithm implementation: in short, the real code used to express the mathematical behavior of the compression algorithm. This step is just as critical; for example, vectorized or multithreaded code is usually much faster than plain, single-threaded code.
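To illustrate how one algorithm can have different implementations, here is a rough sketch of a single-threaded and a chunked, multithreaded way of producing gzip data in Python. The `CHUNK_SIZE` value, the function names and the sample data are assumptions chosen for illustration. This is a simplified scheme and not a description of how pigz or any other benchmarked program works internally. It also assumes that the underlying zlib calls release the interpreter lock, so that the threads can actually run in parallel.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 128 * 1024  # hypothetical chunk size, chosen only for illustration


def compress_single(data: bytes) -> bytes:
    """Plain, single-threaded compression of the whole input."""
    return gzip.compress(data)


def compress_parallel(data: bytes, workers: int = 4) -> bytes:
    """Compress fixed-size chunks in a thread pool and concatenate the results.

    Each chunk becomes an independent gzip member; concatenated members
    still form a valid gzip stream.
    """
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        members = list(pool.map(gzip.compress, chunks))  # map preserves order
    return b"".join(members)


sample = bytes(range(256)) * 4096  # about 1 MiB of illustrative data

# Both implementations must decompress back to the same original bytes.
assert gzip.decompress(compress_single(sample)) == sample
assert gzip.decompress(compress_parallel(sample)) == sample
```

Both functions rely on the same algorithm and produce output that the same decompressor can read. Only the implementation strategy differs, and because each chunk is compressed independently, the parallel variant may give a slightly worse ratio.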

When a code implementation is considered good enough, it is often packaged in a standalone manner, creating a compression library. The advantage of spinning off the algorithm implementation into a standalone library is that you can write many different compression programs without reimplementing the basic algorithm multiple times.

Finally, we have the compression program itself. It is the part that, by providing a CLI or a GUI, “glues” together the user and the compression library.
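The split between library and program can be sketched with a toy command-line front end. All of the compression work is delegated to Python's `gzip` module, which plays the role of the library, while the program only parses arguments and moves data between files. The `main` function and its `source`, `target` and `-d` options are hypothetical names invented for this example; they do not mirror the interface of any program in our benchmark.

```python
import argparse
import gzip
import shutil


def main(argv=None) -> int:
    """Toy front end: parse the command line, then let the library do the work."""
    parser = argparse.ArgumentParser(description="Toy gzip-like front end")
    parser.add_argument("source", help="file to read")
    parser.add_argument("target", help="file to write")
    parser.add_argument("-d", "--decompress", action="store_true",
                        help="decompress instead of compressing")
    args = parser.parse_args(argv)

    if args.decompress:
        # Library call: read compressed data, write plain bytes.
        with gzip.open(args.source, "rb") as src, open(args.target, "wb") as dst:
            shutil.copyfileobj(src, dst)
    else:
        # Library call: read plain bytes, write compressed data.
        with open(args.source, "rb") as src, gzip.open(args.target, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return 0
```

Calling `main(["notes.txt", "notes.txt.gz"])` would compress a file, and adding `-d` would reverse the operation. A real script would invoke `main()` from the usual `if __name__ == "__main__":` guard. A different front end, for example a GUI, could reuse the same library calls unchanged.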

Sometimes the algorithm, library and program have the same name (e.g. zip). Other times, there is no standalone library, and the implementation is built right into the compression program. While this is slightly confusing, what is written above still applies.

To summarize, our benchmarks will cover the algorithms, libraries and programs listed below:

| Program | Library | Algorithm | Comp. Ratio | Comp. Speed | Decomp. Speed |
|---|---|---|---|---|---|
| Lz4, version r110 | built-in | Lz4 (an LZ77 variant) | Low | Very High | Very High |
| Lzop, version 1.02rc1 | Lzo, version 2.03 | Lzo (an LZ77 variant) | Low | Very High | Very High |
| Gzip, version 1.3.12 | built-in | LZ77 | Medium | Medium | High |
| Pigz, version 2.2.5 | Zlib, version 1.2.3 | LZ77 | Medium | High (multithread) | High |
| Bzip2, version 1.0.5 | Libbz2, 1.0.5 | Burrows–Wheeler | High | Low | Low |
| Pbzip2, version 1.1.6 | Libbz2, 1.0.5 | Burrows–Wheeler | High | Medium (multithread) | Medium (multithread) |
| 7-zip | built-in | LZMA | Very High | Very Low (multithread) | Medium |
| Xz, version 4.999.9 beta | Liblzma, ver 4.999.9beta | LZMA | Very High | Very Low | Medium |
| Pxz, version 4.999.9 beta | Liblzma, ver 4.999.9beta | LZMA | Very High | Medium (multithread) | Medium |