Hardware analysis

X86-64 and SSE2 performance on John the Ripper

Written by Gionatan Danti. Posted in Hardware analysis

I'm sure you have heard something like “software is always behind processor features”, in the sense that it usually takes many years before a new hardware feature is actively used by common, widespread software.

We have had many examples of this trend. For instance, while the i386 processor brought 32-bit computing and other advanced capabilities to the x86 world in 1985, the first real 32-bit operating system from Microsoft was Windows NT 3.1, released in 1993: a full 8-year gap. Obviously, some time must pass between hardware support and software support, simply because you need time to write your software, and you can do that only when you have working hardware. So a certain gap is not only understandable, but often inevitable.

The Phenom / Phenom II memory controller: ganged vs unganged mode benchmarked

Written by Gionatan Danti. Posted in Hardware analysis

HINT: if you are interested in the quick & dirty benchmarks only, go to page #4

It is no secret that processor performance grows at a very fast rate, faster than that of any other PC / server component. This disparity challenges CPU designers, as they have to create faster processors that are affected as little as possible by the slower system components.

One of these system components, and one that can have a great influence on processor speed, is the Random Access Memory, or RAM for short. In past years, a lot of effort went into raising RAM speed: in less than a decade, we went from 133 MHz SDR DIMMs to 1333 MHz DDR3 DIMMs, effectively increasing bandwidth by a factor of 10X. If you consider that modern PC and server platforms use two or more memory channels, you can quickly appreciate how much memory speed has improved over the last ten years.
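The tenfold figure can be checked with some quick arithmetic. The sketch below is only an illustration: it assumes a standard 64-bit (8-byte) DIMM data bus and treats the quoted "MHz" values as effective transfer rates, and the function name is made up for this example.

```python
# Peak theoretical bandwidth of a DIMM: bus width x effective transfer rate.
# Assumption: a standard 64-bit (8-byte) wide DIMM data bus.
DIMM_BUS_BYTES = 8


def peak_bandwidth_gbs(mega_transfers_per_s, channels=1):
    """Return the peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return DIMM_BUS_BYTES * mega_transfers_per_s * 1e6 * channels / 1e9


sdr_133 = peak_bandwidth_gbs(133)                      # single SDR-133 DIMM
ddr3_1333 = peak_bandwidth_gbs(1333)                   # single DDR3-1333 DIMM
ddr3_1333_dual = peak_bandwidth_gbs(1333, channels=2)  # dual-channel setup

print(f"SDR 133 MHz:            {sdr_133:.2f} GB/s")
print(f"DDR3 1333 MHz:          {ddr3_1333:.2f} GB/s")
print(f"DDR3 1333 MHz, 2 chan.: {ddr3_1333_dual:.2f} GB/s")
print(f"Single-DIMM speedup:    {ddr3_1333 / sdr_133:.1f}x")
```

With two channels the theoretical peak doubles again, which is why multi-channel platforms widen the gap to the old SDR figures even further.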

However, CPU performance goes up at an even faster rate. Also, while memory bandwidth has improved tremendously, memory latency has improved by a factor of 2X or 3X at most. So, while today's RAM is quite fast at moving relatively large data chunks (a DIMM module has a burst speed in the range of 6.4 – 12.8 GB/s), its effective access latency remains at around 40-50 ns. As a result, RAM speed can seriously influence CPU speed.

For example, consider the FSTORE unit on Phenom / Phenom II CPUs: it can output a canonical 64-bit-wide x87 register each clock, and it is clocked at around 3.0 GHz. Simple math (8 bytes per clock × 3.0 GHz) reveals that under optimal conditions, a single core of a 3.0 GHz Phenom / Phenom II processor can store floating point data at around 24 GB/s. Considering that the Phenom II X4 940 has four cores, a single processor can write floating point data at a peak of 96 GB/s! And this is only part of the story, as the integer input/output rates are almost double. Compare these values to the peak bandwidth delivered by a single memory module and you can see that today's processors can be really limited by memory bandwidth.
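The same back-of-the-envelope calculation can be written down as a short script. It is a rough sketch using only the figures quoted in this article (64 bits stored per clock, 3.0 GHz, four cores, 6.4 – 12.8 GB/s per DIMM); the variable names are illustrative, and real-world throughput will be lower than these theoretical peaks.

```python
# Theoretical x87 store rate vs. single-DIMM bandwidth (figures from the text).
REGISTER_BYTES = 8           # one 64-bit x87 value stored per clock
CLOCK_HZ = 3.0e9             # roughly 3.0 GHz core clock
CORES = 4                    # Phenom II X4 940
DIMM_PEAK_GBS = (6.4, 12.8)  # burst speed range of a single DIMM, in GB/s

# Bytes per clock x clocks per second, expressed in GB/s (1 GB = 10**9 bytes).
per_core_gbs = REGISTER_BYTES * CLOCK_HZ / 1e9
per_cpu_gbs = per_core_gbs * CORES

print(f"Per-core store rate: {per_core_gbs:.1f} GB/s")
print(f"Whole-CPU peak:      {per_cpu_gbs:.1f} GB/s")

# How many times the CPU's peak store rate exceeds what one DIMM can deliver.
for dimm_gbs in DIMM_PEAK_GBS:
    ratio = per_cpu_gbs / dimm_gbs
    print(f"vs. {dimm_gbs:4.1f} GB/s DIMM: {ratio:.1f}x more than memory can absorb")
```

Even against the fastest DIMM in that range, the CPU can generate floating point data several times faster than a single module can accept it.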