X86-64 and SSE2 performance on John the ripper

Written by Gionatan Danti. Posted in Hardware analysis


X86-64 – a recap

Why speak about X86-64 first? The reason is simple: it is a "simple" extension of the familiar x86 execution environment. But wait – what do I mean by "execution environment"? The short answer is that this environment is the collection of machine registers that your program can use (read or write) or that directly influence code execution (for example, the instruction pointer, which points to the next instruction). This simple diagram shows, in simplified form, the differences between the X86 and X86-64 environments (it focuses on general purpose registers only):

X86 vs X86-64 basic execution environment

As you can see, while the old X86 paradigm uses only 8 x 32-bit wide registers, X86-64 uses 16 x 64-bit wide registers, effectively quadrupling the on-chip, user-addressable register file space.

In practical terms, what does this mean? The X86-64 register configuration brings three main advantages:

  • the doubled number of registers (16 vs 8) tends to alleviate the pressure on the available registers. For example, imagine a piece of code that actively operates on 10 variables. On X86 you only have 8 registers, so your program cannot load all the variables at once into fast hardware registers, but has to load/unload some of them from system RAM. This problem is called "register pressure", and alleviating it with 16 registers brings a significant performance advantage, often in the range of 5-15% on general purpose code. To benefit from this speedup, you simply need to recompile your software for X86-64 – in other words, no code modifications are required;

  • the added width of each register is very useful when you have to operate on very large numbers. For example, think of adding two very big numbers that do not fit in a 32-bit register. On X86, you have to break each number into two parts, add those parts separately and check for the carry of the first addition. On a 64-bit system, you can simply issue a single 64-bit add. This sort of advantage is not so obvious to exploit: after all, you need an application that requires 64-bit integer capability, while most of today's software is perfectly happy with 32-bit registers. However, on applications that really require 64-bit integers, you can expect great performance advantages, perhaps in the range of 200% or 300%;

  • the added width of each register means that you can use more than 4 GB of system RAM without the performance handicap of complex workarounds such as PAE. While in past years this seemed like a theoretical advantage (as 4 GB of RAM was very, very expensive), today, with cheap RAM prices, it is not so strange to see systems with 4+ GB of RAM.

So, to recap: in the general purpose case (the most common one for the majority of users), X86-64 can bring a 5-15% speed boost over X86. In very specific cases (applications that require 64-bit capability or more than 4 GB of RAM) the speedup can be very high, but it is hard to predict without discussing a specific application. An important thing to note: using a 32-bit variable on a 64-bit processor does not automatically translate into higher performance, as the functional units (e.g. the adder) always work on the full register size and simply discard the higher-order bits when they are not required.
