Programs, implementations, libraries and algorithms
Before moving to the raw numbers, let's first clarify the terminology.
A lossless compression algorithm is a mathematical procedure that defines how to reduce (compress) a specific dataset into a smaller one without losing information. In other words, it encodes information using fewer bits than the original version, with no information loss. To be useful, a compression algorithm must be reversible: it should let us re-expand the compressed dataset and obtain an exact copy of the original source. It's easy to see how the fundamental capabilities (compression ratio and speed) are rooted in the algorithm itself, and different algorithms can differ strongly in results and applicable scopes.
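The reversibility requirement can be checked with a short round trip. The sketch below uses Python's standard `zlib` module purely as a convenient stand-in; the sample input is made up for illustration and is far more repetitive than typical real-world data.

```python
import zlib

# Hypothetical sample input: highly repetitive, so it compresses well.
original = b"the quick brown fox jumps over the lazy dog\n" * 200

compressed = zlib.compress(original)    # encode using fewer bytes
restored = zlib.decompress(compressed)  # re-expand the compressed data

print(len(original), "->", len(compressed), "bytes")
print("exact copy of the source:", restored == original)
```

If the last line ever printed `False`, the scheme would not be lossless, no matter how small the compressed output was.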
The next step is the algorithm implementation: in short, the real code used to express the mathematical behavior of the compression algorithm. This is another critical step: for example, vectorized or multithreaded code can be much faster than plain, single-threaded code.
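To illustrate how the same algorithm can be driven by different implementations, here is a minimal sketch that compresses independent chunks either one after another or in a thread pool. The helper names and the 1 MiB chunk size are assumptions made for this example; this is not how any specific program stores its output, and whether the threaded version is actually faster depends on the interpreter, the data and the machine.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 << 20  # 1 MiB per chunk; an arbitrary choice for this sketch


def split_chunks(data, size=CHUNK_SIZE):
    """Cut the input into fixed-size pieces that can be compressed independently."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def compress_single(data):
    """Plain single-threaded path: one chunk after another."""
    return [zlib.compress(chunk) for chunk in split_chunks(data)]


def compress_threaded(data, workers=4):
    """Same algorithm, but chunks are handed to a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so chunk order is preserved.
        return list(pool.map(zlib.compress, split_chunks(data)))


def decompress_chunks(chunks):
    """Re-expand every chunk and join the pieces back together."""
    return b"".join(zlib.decompress(c) for c in chunks)


data = bytes(range(256)) * 16384  # 4 MiB of made-up sample data
assert decompress_chunks(compress_threaded(data)) == data
```

Both paths produce the same compressed chunks; only the way the work is scheduled changes. The price of chunking is that each piece is compressed without knowledge of the others, which can cost some compression ratio.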
When a code implementation is considered good enough, it is often packaged in a standalone manner, creating a compression library. The advantage of spinning off the algorithm implementation into a standalone library is that you can write many different compression programs without reimplementing the basic algorithm multiple times.
Finally, we have the compression program itself. It is the part that, by providing a CLI or a GUI, “glues” together the user and the compression library.
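The split between library and program can be sketched in a few lines. In the toy example below, Python's `zlib` module plays the role of the library, and a hypothetical `main` function plays the role of the program: it only parses arguments, reads and writes files, and delegates the actual work. The option names and file names are invented for illustration and do not match any real tool.

```python
import argparse
import os
import tempfile
import zlib


def main(argv):
    """Toy command-line front end: all compression work is done by the library."""
    parser = argparse.ArgumentParser(description="Toy compressor built on zlib")
    parser.add_argument("source", help="file to read")
    parser.add_argument("target", help="file to write")
    parser.add_argument("-d", "--decompress", action="store_true",
                        help="re-expand instead of compressing")
    args = parser.parse_args(argv)

    with open(args.source, "rb") as f:
        data = f.read()
    result = zlib.decompress(data) if args.decompress else zlib.compress(data)
    with open(args.target, "wb") as f:
        f.write(result)
    return 0


# Demo: drive the "program" the way a user would, with explicit argument lists.
with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "notes.txt")
    packed = os.path.join(tmp, "notes.txt.z")
    unpacked = os.path.join(tmp, "notes.copy.txt")
    with open(plain, "wb") as f:
        f.write(b"some text to compress\n" * 100)

    main([plain, packed])
    main(["-d", packed, unpacked])

    with open(plain, "rb") as a, open(unpacked, "rb") as b:
        round_trip_ok = a.read() == b.read()

print("round trip ok:", round_trip_ok)
```

A second program, for example one with a GUI, could reuse the same library calls without touching the compression code at all.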
Sometimes the algorithm, library and program have the same name (e.g. zip). Other times there is no standalone library, and the implementation is built right into the compression program. While this is slightly confusing, what is written above still applies.
To summarize, our benchmarks will cover the algorithms, libraries and programs listed below:
Program | Library | Algorithm | Comp. Ratio | Comp. Speed | Decomp. Speed
Lz4, version r110 | built-in | Lz4 (an LZ77 variant) | Low | Very High | Very High
Lzop, version 1.02rc1 | Lzo, version 2.03 | Lzo (an LZ77 variant) | Low | Very High | Very High
Gzip, version 1.3.12 | built-in | LZ77 | Medium | Medium | High
Pigz, version 2.2.5 | Zlib, version 1.2.3 | LZ77 | Medium | High (multithread) | High
Bzip2, version 1.0.5 | Libbz2, version 1.0.5 | Burrows–Wheeler | High | Low | Low
Pbzip2, version 1.1.6 | Libbz2, version 1.0.5 | Burrows–Wheeler | High | Medium (multithread) | Medium (multithread)
7-zip | built-in | LZMA | Very High | Very Low (multithread) | Medium
Xz, version 4.999.9 beta | Liblzma, version 4.999.9 beta | LZMA | Very High | Very Low | Medium
Pxz, version 4.999.9 beta | Liblzma, version 4.999.9 beta | LZMA | Very High | Medium (multithread) | Medium
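For a rough feel of how the three algorithm families in the table differ, the sketch below runs the same input through Python's standard `zlib`, `bz2` and `lzma` modules and prints the compression ratio of each. These modules are only stand-ins for the programs and library versions listed above, the sample data is synthetic, and the numbers it prints are not benchmark results.

```python
import bz2
import lzma
import zlib

# Synthetic, text-like sample; real datasets will behave differently.
sample = b"".join(
    b"line %d: lossless compression benchmark sample\n" % i
    for i in range(20000)
)

# One stdlib codec per algorithm family from the table.
codecs = {
    "zlib (LZ77-based)": (zlib.compress, zlib.decompress),
    "bz2 (Burrows-Wheeler)": (bz2.compress, bz2.decompress),
    "lzma (LZMA)": (lzma.compress, lzma.decompress),
}

results = {}
for name, (compress, decompress) in codecs.items():
    packed = compress(sample)
    # Every codec must give back an exact copy of the input.
    assert decompress(packed) == sample
    results[name] = len(sample) / len(packed)
    print(f"{name:24s} ratio {results[name]:.1f}:1")
```

Each codec runs with its default settings here; compression levels and other options shift both ratio and speed considerably.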