Comparing the Intel Hyperthreading and AMD dual-core module approaches
From the previous pages you should agree that, while they differ in some important aspects, Intel Hyperthreading and AMD dual-core technologies are broadly similar in many key areas.
The following tables summarize the situation: first from a hardware standpoint...
Core resource | Intel approach | AMD approach |
L2 | shared | shared |
L1 | shared | dedicated |
Front-end (fetch, decoder, etc.) | time-shared | time-shared |
Integer execution units | time-shared | dedicated |
Floating point units | time-shared | time-shared |
Back-end (register write, retire unit, etc.) | mixed (some time-shared, some partitioned) | dedicated |
...and then from a software standpoint:
Execution mode | Intel approach – X86 width | AMD approach – X86 width |
Integer Single-thread | 4-way 3-way (mostly) | 2-way (mostly) |
Integer Multi-thread | 4-way 3-way (mostly) | 2x 2-way (mostly) |
Aggregate integer performance | 4-way 3-way (mostly) | 4-way (mostly) |
From the last table, it seems that the Intel architecture should be far superior for single-thread execution, but do not forget that the ILP inside a single instruction stream is very often in the range of 2 X86 instructions. On the other hand, while more often than not an AMD core can execute only 2 X86 instructions at once, in the right situation it can execute up to 4 X86 instructions. I overused the word "mostly" because, with the right instruction stream, both the Intel and AMD designs can issue up to 4 integer X86 operations per clock (e.g. when executing separate memory and compute operations). However, many common X86 instructions directly refer to a memory location, so they are decomposed into two micro-operations, resulting in the concurrent execution of two X86 integer instructions at most. In any case, both the Intel and AMD designs rely on a single 4-way decoder, so the maximum sustained machine width is 4-way X86 mixed integer and floating point instructions for both designs (excluding X86 fusions).
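As a rough illustration of why the instruction mix matters, here is a toy issue model in Python. It is only a sketch under stated assumptions: the machine is given 2 compute (ALU) slots and 2 memory slots per clock, instructions are packed in program order, and data dependencies, decode limits and fusion are ignored. The slot counts, the instruction classes and the function names are hypothetical and are not taken from either vendor's documentation.

```python
# Toy issue model. Assumption: 2 ALU slots + 2 memory slots per clock.
# Each X86 instruction class consumes (alu_slots, mem_slots).
SLOT_COST = {
    "alu": (1, 0),      # register-only compute instruction
    "mem": (0, 1),      # pure load or store
    "alu_mem": (1, 1),  # compute with a memory operand: two micro-operations
}


def clocks_needed(stream, alu_slots=2, mem_slots=2):
    """Pack instructions in program order, ignoring data dependencies."""
    clocks = 0
    alu_free = mem_free = 0
    for kind in stream:
        alu, mem = SLOT_COST[kind]
        if alu > alu_free or mem > mem_free:
            # Not enough free slots in this clock: start a new one.
            clocks += 1
            alu_free, mem_free = alu_slots, mem_slots
        alu_free -= alu
        mem_free -= mem
    return clocks


def x86_per_clock(stream, **slots):
    """Average X86 instructions issued per clock for a non-empty stream."""
    return len(stream) / clocks_needed(stream, **slots)


separate = ["alu", "mem"] * 4   # separate memory and compute operations
combined = ["alu_mem"] * 8      # every instruction touches memory

print(x86_per_clock(separate))  # 4.0
print(x86_per_clock(combined))  # 2.0
```

In this simplified model, a stream of separate memory and compute operations fills all four slots, while a stream of memory-operand instructions uses one slot of each kind per instruction and tops out at two X86 instructions per clock.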
Note that I left out the FPU-width analysis from the above table: AMD's FPU is very different from Intel's, as the former has 2x 128-bit FMAC pipes while the latter has 2x 256-bit FADD/FMUL pipes.
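To give a feel for what those pipe widths mean, the short Python sketch below turns them into a theoretical peak of floating point operations per clock. It rests on several assumptions that are not stated above: elements are 32-bit single precision, an FMAC pipe is counted as two operations per lane (multiply plus add, reachable only with fused instructions), and the two Intel pipes are read as one FADD pipe plus one FMUL pipe, each counted as one operation per lane. The function name and its parameters are hypothetical.

```python
def peak_flops_per_clock(pipes, pipe_bits, element_bits=32, flops_per_lane=1):
    """Theoretical peak floating point operations per clock for a set of pipes."""
    lanes = pipe_bits // element_bits  # SIMD elements processed by one pipe
    return pipes * lanes * flops_per_lane


# Assumption: each FMAC lane counts as 2 operations (fused multiply-add).
amd_peak = peak_flops_per_clock(pipes=2, pipe_bits=128, flops_per_lane=2)

# Assumption: one FADD pipe + one FMUL pipe, 1 operation per lane each.
intel_peak = peak_flops_per_clock(pipes=2, pipe_bits=256)

print(amd_peak, intel_peak)  # 16 16
```

Under these assumptions the two theoretical peaks coincide, but they are reached in very different ways: the AMD figure requires fused multiply-add instructions, while the Intel figure requires 256-bit operations with a balanced mix of additions and multiplications.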