HINT: if you are only interested in the quick & dirty benchmarks, go to page #4.
It is no secret that processor performance grows at a very fast rate, faster than that of any other PC / server component. This disparity has challenged CPU designers, as they have had to create faster processors that are held back by the slower system components as little as possible.
One of these system components, and one that can have a great influence on processor speed, is the Random Access Memory, or RAM for short. In the past years, a lot of effort has gone into raising RAM speed: in less than a decade, we went from 133 MHz SDR DIMMs to DDR3 DIMMs running at an effective 1333 MHz, increasing peak bandwidth per module by a factor of 10X. If you consider that modern PC and server platforms use two or more memory channels, you can quickly appreciate how much memory speed has improved over the last ten years.
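The factor of ten quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes a standard 64-bit (8-byte) DIMM data bus and one transfer per effective clock, and the function name `peak_bandwidth_gbs` is made up for this example.

```python
# Theoretical peak bandwidth of a single DIMM.
# Assumption: a standard non-ECC DIMM with a 64-bit (8-byte) data bus.
BUS_WIDTH_BYTES = 8


def peak_bandwidth_gbs(effective_mhz: float) -> float:
    """Peak bandwidth in GB/s (1 GB = 10**9 bytes) for a given effective clock."""
    transfers_per_second = effective_mhz * 1e6
    return transfers_per_second * BUS_WIDTH_BYTES / 1e9


sdr_133 = peak_bandwidth_gbs(133)     # 133 MHz SDR
ddr3_1333 = peak_bandwidth_gbs(1333)  # DDR3 at an effective 1333 MHz

print(f"SDR 133:   {sdr_133:.2f} GB/s")
print(f"DDR3 1333: {ddr3_1333:.2f} GB/s")
print(f"Ratio:     {ddr3_1333 / sdr_133:.1f}x")
```

These are theoretical peaks for one module on one channel; real-world throughput is lower, and multi-channel platforms multiply the peak accordingly.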
However, CPU performance has gone up at an even faster rate. Also, while memory bandwidth has improved tremendously, memory latency has improved by a factor of 2X or 3X at most. So, while today's RAM is quite fast at moving relatively large chunks of data (a single DIMM module has a burst speed in the range of 6.4 to 12.8 GB/s), its effective access latency remains at around 40 to 50 ns. As a result, RAM speed can seriously influence CPU speed.
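To get a feeling for what a 40 to 50 ns latency means from the processor's point of view, it helps to express it in clock cycles. The snippet below is a rough sketch: the 3.0 GHz clock is an assumption chosen to match the kind of CPU discussed in this article, and `latency_in_cycles` is a hypothetical helper name.

```python
# Convert a memory access latency into CPU clock cycles.
# Assumption: a core clocked at 3.0 GHz, i.e. 3 cycles per nanosecond.
CLOCK_GHZ = 3.0


def latency_in_cycles(latency_ns: float, clock_ghz: float = CLOCK_GHZ) -> float:
    """Number of core clock cycles that elapse during latency_ns nanoseconds."""
    return latency_ns * clock_ghz


for ns in (40, 50):
    cycles = latency_in_cycles(ns)
    print(f"{ns} ns at {CLOCK_GHZ} GHz = {cycles:.0f} cycles")
```

In other words, under these assumptions a single access that goes all the way to RAM costs the core well over a hundred cycles in which it may have nothing to do.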
For example, consider the FSTORE unit on Phenom / Phenom II CPUs: it can output a canonical 64-bit-wide x87 register each clock, and it is clocked at around 3.0 GHz. Simple math reveals that, under optimal conditions, a single core of a 3.0 GHz Phenom / Phenom II processor can store floating point data at around 24 GB/s (8 bytes per clock times 3.0 billion clocks per second). Considering that the Phenom II X4 940 has four cores, a single processor can write floating point data at a peak of 96 GB/s! And this is only part of the story, as the integer input/output rates are almost double. Compare these values to the peak bandwidth delivered by a single memory module and you will realize that today's processors can be severely limited by memory bandwidth.
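The arithmetic behind these figures, and the gap with respect to a single memory module, can be reproduced as follows. This is a back-of-the-envelope sketch using only the numbers quoted in the text (one 64-bit store per clock, 3.0 GHz, four cores, 6.4 to 12.8 GB/s per DIMM); the variable and function names are invented for illustration.

```python
# Peak x87 store rate of a 3.0 GHz quad-core CPU vs. one DIMM's burst bandwidth.
CLOCK_HZ = 3.0e9        # core clock quoted in the text
STORE_WIDTH_BYTES = 8   # one 64-bit x87 register stored per clock
CORES = 4               # Phenom II X4 940


def store_rate_gbs(clock_hz: float, bytes_per_clock: int, cores: int = 1) -> float:
    """Theoretical peak store rate in GB/s (1 GB = 10**9 bytes)."""
    return clock_hz * bytes_per_clock * cores / 1e9


per_core_gbs = store_rate_gbs(CLOCK_HZ, STORE_WIDTH_BYTES)
chip_gbs = store_rate_gbs(CLOCK_HZ, STORE_WIDTH_BYTES, CORES)
print(f"Per core: {per_core_gbs:.0f} GB/s")
print(f"Per chip: {chip_gbs:.0f} GB/s")

# Compare against the burst bandwidth range of a single DIMM module.
for module_gbs in (6.4, 12.8):
    ratio = chip_gbs / module_gbs
    print(f"vs {module_gbs} GB/s module: CPU can store {ratio:.1f}x faster")
```

Even against the fastest module in that range, the theoretical store rate of the four cores exceeds what the memory can absorb by several times, which is the imbalance this article is about.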