A real-world test: John the Ripper
So, we are in a situation where the only general-purpose advantages of X86-64 and SSE2-enabled processors are in the range of 5-15%, due to the lower register pressure (X86-64 has 16 general-purpose registers instead of 8). To exploit the additional capabilities of these extensions, we need software that actively uses 64-bit data types and arranges them in a way that lets SSE2 do the job. Unfortunately, this is not typical of office-style applications; on top of that, while compilers improve over time, they are often unable to fully exploit SSE2 capabilities.
So, must we give up on testing software that can effectively use these advanced capabilities? Fortunately, no! A historic Unix program, John the Ripper, can help us. Because it is used to crack users' passwords, which is a very demanding job, it has been optimized to make proper use of the X86-64 and SSE2 extensions. With John the Ripper, we can quantify the performance advantage that these extensions bring to the table and, equally interesting, compare different processor generations.
For this purpose, I used the following systems:
- an X86 + SSE2 capable system, in the form of a Pentium 4 Prescott @ 3.0 GHz (1 MB L2 cache) with 1 GB of DDR-400 (2 x 512 MB);
- an X86-64 + SSE2 capable system of the previous generation, in the form of a Core2 T7200 @ 2.0 GHz (4 MB L2 cache) with 4 GB of DDR2-667 (2 x 2 GB);
- a modern X86-64 + SSE2 capable system, with a Core i7 860 @ 2.8 GHz (8 MB L3 cache) and 4 GB of DDR3-1600 (2 x 2 GB).
For all systems, the operating system was Ubuntu 10.04 LTS. On the 64-bit capable processors, I used both the 32-bit and the 64-bit version of the OS.
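The resulting test matrix is small: every system runs the 32-bit OS, and only the 64-bit capable ones also run the 64-bit OS. A minimal Python sketch that enumerates it, assuming (as described above) that the Pentium 4 is treated as a 32-bit-only system; the `System` class and `build_run_list` helper are hypothetical names used only for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    cpu: str
    x86_64: bool  # True if the processor is treated as 64-bit (long mode) capable


# The three test machines, as described in the text.
SYSTEMS = [
    System("Pentium 4 Prescott", x86_64=False),
    System("Core2 T7200", x86_64=True),
    System("Core i7 860", x86_64=True),
]


def build_run_list(systems: list[System]) -> list[tuple[str, str]]:
    """Pair each CPU with every Ubuntu 10.04 build it was tested on.

    The 32-bit OS runs everywhere; the 64-bit OS only on 64-bit capable CPUs.
    """
    runs = []
    for s in systems:
        runs.append((s.cpu, "32-bit"))
        if s.x86_64:
            runs.append((s.cpu, "64-bit"))
    return runs


for cpu, os_bits in build_run_list(SYSTEMS):
    print(f"{cpu}: Ubuntu 10.04 {os_bits}")
```

This yields five configurations in total: one for the Pentium 4 and two each for the Core2 and the Core i7.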
Please note that the Pentium 4 system is very old and has its own share of issues: for example, the mainboard uses an old SiS chipset which supports only single-channel memory mode; on top of that, the mainboard has some stability problems that forced me to lower the bus frequency. To obtain a more or less accurate measurement, I fixed the FSB at 100 MHz, effectively forcing the P4 processor to run at 1.5 GHz; then, to estimate the values at 3.0 GHz, I simply doubled the benchmark scores. I know that this is a very rough way to predict performance, but I think it is better than nothing.
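Doubling the score is a linear clock-scaling estimate. A minimal Python sketch of that arithmetic, assuming throughput is directly proportional to core frequency and ignoring any memory or chipset effects; the function name `scale_score` and the sample score are hypothetical placeholders, not real benchmark results:

```python
def scale_score(measured_score: float, measured_ghz: float, target_ghz: float) -> float:
    """Estimate a benchmark score at target_ghz from one measured at measured_ghz.

    Assumes the score scales linearly with the core clock, which is only
    a rough approximation (memory and chipset effects are ignored).
    """
    if measured_ghz <= 0 or target_ghz <= 0:
        raise ValueError("clock frequencies must be positive")
    return measured_score * (target_ghz / measured_ghz)


# The P4 ran at 1.5 GHz (100 MHz FSB); the target is its nominal 3.0 GHz.
# 1000.0 is a made-up placeholder score, used only to show the arithmetic.
estimated = scale_score(measured_score=1000.0, measured_ghz=1.5, target_ghz=3.0)
print(estimated)  # 2000.0
```

With a 1.5 GHz to 3.0 GHz ratio, the factor is exactly 2, which is why the estimate reduces to doubling the measured score.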
Another important thing to note is that I had to download and manually compile John's sources, because I discovered that the Ubuntu binary package was not using any SSE2 code. While this is understandable in the 32-bit OS version (as not all 32-bit processors support the SSE2 extensions), it is totally wrong in the 64-bit Ubuntu version, as SSE2 is part of the baseline X86-64 instruction set and is therefore supported by every 64-bit capable X86 processor.
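On Linux, you can check what the CPU itself supports by reading the feature flags in /proc/cpuinfo: `lm` (long mode) marks an X86-64 capable processor, and `sse2` marks SSE2 support. A minimal Python sketch, with hypothetical helper names `cpu_flags` and `sse2_status`. Note that it only reports CPU capabilities (the flags are the same under a 32-bit or 64-bit OS); it cannot tell whether a given binary, such as a packaged John build, was actually compiled to use SSE2:

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def sse2_status(cpuinfo_text: str) -> dict[str, bool]:
    flags = cpu_flags(cpuinfo_text)
    return {
        "x86_64": "lm" in flags,  # 'lm' = long mode, i.e. X86-64 capable CPU
        "sse2": "sse2" in flags,
    }


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(sse2_status(f.read()))
    except OSError:
        print("/proc/cpuinfo not available (not a Linux system?)")
```

On any machine that reports `lm`, `sse2` should be present as well, which is exactly why a 64-bit package has no reason to avoid SSE2 code paths.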
So, it's time for some numbers...