AMD Bulldozer and Intel Hyperthreading: two different roads for the same destination

Written by Gionatan Danti. Posted in Hardware analysis


Intel Hyperthreading and AMD dual-core module approaches comparison

From the previous pages you should agree that, while they differ in some important respects, the Intel Hyperthreading and AMD dual-core module technologies are very similar in many key areas.

The following tables summarize the situation: first from a hardware standpoint...

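| Core resource                                | Intel approach                             | AMD approach |
|----------------------------------------------|--------------------------------------------|--------------|
| L2                                           | shared                                     | shared       |
| L1                                           | shared                                     | dedicated    |
| Front-end (fetch, decoder, etc.)             | time-shared                                | time-shared  |
| Integer execution units                      | time-shared                                | dedicated    |
| Floating point units                         | time-shared                                | time-shared  |
| Back-end (register write, retire unit, etc.) | mixed (some time-shared, some partitioned) | dedicated    |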

...and then from a software standpoint:

| Execution mode                | Intel approach – X86 width | AMD approach – X86 width |
|-------------------------------|----------------------------|--------------------------|
| Integer Single-thread         | 4-way 3-way (mostly)       | 2-way (mostly)           |
| Integer Multi-thread          | 4-way 3-way (mostly)       | 2x 2-way (mostly)        |
| Aggregate integer performance | 4-way 3-way (mostly)       | 4-way (mostly)           |

From the last table, it seems that the Intel architecture should be far superior for single-threaded code, but do not forget that the ILP available inside a single instruction stream is very often in the range of 2 X86 instructions. On the other hand, while more often than not an AMD core can execute only 2 X86 instructions at once, in the right situation it can execute up to 4.

I used the word "mostly" liberally because, with the right instruction stream, both the Intel and AMD designs can issue up to 4 integer X86 operations per clock (e.g. when executing separate memory and compute operations). However, many common X86 instructions refer directly to a memory location and so are decomposed into two micro-operations, which limits concurrent execution to two X86 integer instructions at most. In any case, both the Intel and AMD designs rely on a single 4-way decoder, so the maximum sustained machine width is 4-way X86 mixed integer and floating point instructions for both designs (excluding X86 fusions).
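To make the "2-way mostly, 4-way at best" reasoning concrete, here is a minimal toy model in Python. It is only a sketch: the function name `peak_x86_per_clock`, the instruction kinds, the in-order greedy issue rule, and the example pipe counts (two compute pipes and two memory pipes behind a 4-way decoder) are illustrative assumptions, not a description of how either real design schedules instructions.

```python
def peak_x86_per_clock(alu_pipes, mem_pipes, decode_width, stream):
    """Toy upper bound on X86 instructions issued in a single clock.

    stream is a list of instruction kinds (hypothetical labels):
      "alu"     - register-only compute, needs one compute pipe
      "mem"     - pure load/store, needs one memory pipe
      "alu+mem" - compute with a memory operand, split into two
                  micro-ops: one compute pipe AND one memory pipe
    """
    issued = 0
    # The decoder never delivers more than decode_width instructions.
    for kind in stream[:decode_width]:
        need_alu = 1 if kind in ("alu", "alu+mem") else 0
        need_mem = 1 if kind in ("mem", "alu+mem") else 0
        if need_alu > alu_pipes or need_mem > mem_pipes:
            break  # simplification: stop at the first instruction that does not fit
        alu_pipes -= need_alu
        mem_pipes -= need_mem
        issued += 1
    return issued


# Instructions with memory operands: each one consumes both pipe types.
memory_operand_stream = ["alu+mem"] * 4
# Separate memory and compute operations: each one consumes a single pipe.
separate_stream = ["alu", "mem", "alu", "mem"]

print(peak_x86_per_clock(2, 2, 4, memory_operand_stream))  # 2
print(peak_x86_per_clock(2, 2, 4, separate_stream))        # 4
```

Under these assumptions, the memory-operand stream saturates both pipe types after two instructions, while the stream of separate memory and compute operations fills all four decoder slots.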

Note that I left the FPU-width analysis out of the above table: AMD's FPU is very different from Intel's, as the former has 2x 128-bit FMAC pipes while the latter has 2x 256-bit FADD/FMUL pipes.
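As a rough illustration of why the two FPUs are hard to compare by width alone, the following Python sketch works out a theoretical peak from the pipe figures quoted above. It assumes single-precision (32-bit) elements, counts a fused multiply-accumulate as two floating point operations per lane, and reads "2x 256-bit FADD/FMUL" as one add pipe plus one multiply pipe, each doing one operation per lane. The helper `peak_flops_per_clock` and its parameters are hypothetical names used only for this example.

```python
def peak_flops_per_clock(pipes, pipe_bits, element_bits, flops_per_lane):
    """Theoretical peak floating point operations per clock (toy estimate)."""
    lanes = pipe_bits // element_bits  # elements processed per pipe per clock
    return pipes * lanes * flops_per_lane


# AMD module: 2x 128-bit FMAC pipes, a multiply-accumulate counted as 2 operations.
amd_module_sp = peak_flops_per_clock(pipes=2, pipe_bits=128,
                                     element_bits=32, flops_per_lane=2)

# Intel core: one 256-bit FADD pipe plus one 256-bit FMUL pipe, 1 operation each.
intel_core_sp = peak_flops_per_clock(pipes=2, pipe_bits=256,
                                     element_bits=32, flops_per_lane=1)

print(amd_module_sp, intel_core_sp)  # 16 16
```

The two peaks coincide on paper, but they are reached under different conditions: the AMD figure requires every operation to be a multiply-accumulate, while the Intel figure requires a perfectly balanced mix of additions and multiplications. Neither number says anything about sustained performance on real code.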

Comments   

 
#1 roger 2012-10-02 20:36
no benchmark mentions on AMD's approach?
 
 
#2 kmq 2014-03-31 01:15
The Intel approach is better because not all programs are designed for multi core execution!
 
