Dissecting GK104
The new GK104 is a significant departure from Nvidia's previous products: for example, it provides FP64 capability by means of a small number of dedicated 64-bit ALUs, while previous Fermi-based GPUs ganged together two 32-bit ALUs for this purpose.
This and other design choices mean that current Kepler-based chips are great for gaming but not as well suited to general-purpose computing. So, in this game of hypotheses, we must remember that, while future GK100 / GK110 chips will surely follow Nvidia's new design paradigm (e.g., dropping the “fast clock”), in other respects they will be noticeably different from GK104.
Anyway, let's start our analysis with the GK104 die:
GK104 die shot
This 294 mm2, 28 nm chip includes the following basic blocks (with estimated sizes):
- a 256-bit memory bus (60 mm2);
- a gen-3.0 PCI-E bus (44 mm2);
- 4 GPC (147 mm2 total, 37 mm2 each);
- a GigaThread engine (6 mm2);
- 512 KB L2 cache + 32 ROPs (37 mm2).
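As a quick sanity check, the estimates in the list can be added up and compared with the 294 mm2 die size. The snippet below is only a sketch of that arithmetic: the figures are the estimates given above, and the variable names are mine.

```python
# Estimated block areas in mm2, copied from the list above.
blocks = {
    "256-bit memory bus": 60,
    "PCI-E 3.0 bus": 44,
    "4 GPCs": 147,
    "GigaThread engine": 6,
    "L2 cache + ROPs": 37,
}
DIE_AREA = 294  # mm2, total die size quoted in the text

# Sum of the estimates and each block's share of the die.
total = sum(blocks.values())
shares = {name: 100 * area / DIE_AREA for name, area in blocks.items()}

for name, area in blocks.items():
    print(f"{name:20s} {area:4d} mm2  {shares[name]:5.1f}%")
print(f"{'total':20s} {total:4d} mm2")
```

The estimates account for the whole die, and the four GPCs alone take up half of it.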
Each GPC is internally divided into 2 SMX units (16 mm2 each), each composed of 192 SPs (the ALU cluster weighs in at about 5 mm2). On the ROP side, the 32 units are divided into four 8-wide banks, each with 128 KB of L2 cache (each L2/ROP bank is about 10 mm2). Please note that these sizes are estimates: as Nvidia did not publish a die shot with detailed descriptions, I may be off in some measurements. However, the general sizes should be more or less accurate.
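The hierarchy described above can be tallied in a few lines. This is a rough sketch using only the counts and estimated areas from the text. The names are mine, and the "non-SMX" remainder is simply what is left of a GPC after subtracting its two SMX units, not a measured figure.

```python
# Unit counts as described in the text.
GPCS = 4
SMX_PER_GPC = 2
SP_PER_SMX = 192
ROP_BANKS = 4
ROPS_PER_BANK = 8
L2_PER_BANK_KB = 128

# Estimated areas in mm2, as given in the text.
GPC_AREA_TOTAL = 147   # all four GPCs together
SMX_AREA = 16          # one SMX
L2_ROP_AREA_TOTAL = 37 # all four L2/ROP banks together

# Totals implied by the hierarchy.
total_smx = GPCS * SMX_PER_GPC
total_sp = total_smx * SP_PER_SMX
total_rops = ROP_BANKS * ROPS_PER_BANK
total_l2_kb = ROP_BANKS * L2_PER_BANK_KB

# Derived areas (my own arithmetic, not measurements).
gpc_area = GPC_AREA_TOTAL / GPCS
gpc_non_smx_area = gpc_area - SMX_PER_GPC * SMX_AREA
bank_area = L2_ROP_AREA_TOTAL / ROP_BANKS

print(f"SMX units: {total_smx}, SPs: {total_sp}")
print(f"ROPs: {total_rops}, L2 cache: {total_l2_kb} KB")
print(f"Area per GPC: {gpc_area:.2f} mm2 (non-SMX part: {gpc_non_smx_area:.2f} mm2)")
print(f"Area per L2/ROP bank: {bank_area:.2f} mm2")
```

The per-bank result is consistent with the "about 10 mm2" estimate, and the totals match the 512 KB of L2 cache and 32 ROPs listed earlier.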
With these figures in mind, we can start to elaborate on possible GK100 / GK110 products.