After this analysis, we can conclude that R600 was not a bad chip design; its actual implementation, however, was far from ideal. The “broken” RBEs/ROPs, the high power consumption (also caused by the very wide ring bus), the resulting lower frequencies and other factors all contributed to a not very successful Radeon HD 2900 XT.
When, with RV770, ATI / AMD corrected almost all of these issues (save for the very high compute-to-texture ratio), the result was a very capable, fast chip with great compute density. This strategy was repeated almost unchanged with the Cypress ASIC (320 5-way VLIW execution units, for a total of 1600 ALUs) and, with some major modifications, with the Cayman ASIC (384 4-way VLIW execution units, for a total of 1536 ALUs).
So, if this VLIW core was so good, why will AMD's recently announced next-generation architecture abandon VLIW entirely and rely, instead, on a combination of scalar and vector cores? The point is that, while R600 and its descendants perform very well from a graphics standpoint, from a general GPGPU compute standpoint they rarely come close to their respective peak performance. Remember that, to fully exploit a VLIW core, the compiler has to extract a good amount of parallelism from the instruction and data streams. While this is not so difficult with graphics data, where shaders mostly operate on independent vector components, it is much harder for a general computational kernel.
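To make the utilization argument concrete, here is a minimal sketch of a greedy bundle packer. It is a toy model, not AMD's shader compiler: the function names (`pack_vliw`, `utilization`) are hypothetical, every operation is assumed to take one cycle, and any slot is assumed able to run any operation, which is not true of the real hardware.

```python
def pack_vliw(deps, width=5):
    """Greedily pack operations into VLIW bundles of at most `width` slots.

    `deps` maps each op index to the set of op indices it depends on.
    An op may issue only after all of its dependencies have issued
    in an earlier bundle (toy assumption: one cycle per op).
    """
    done = set()
    remaining = sorted(deps)
    bundles = []
    while remaining:
        # Ops whose inputs are all available from previous bundles.
        ready = [op for op in remaining if deps[op] <= done]
        if not ready:
            raise ValueError("dependency cycle")
        bundle = ready[:width]
        bundles.append(bundle)
        done.update(bundle)
        remaining = [op for op in remaining if op not in done]
    return bundles


def utilization(deps, width=5):
    """Fraction of issue slots actually filled by the packer."""
    bundles = pack_vliw(deps, width)
    return len(deps) / (len(bundles) * width)


# Graphics-like case: five independent operations
# (for example, the components of a vector plus one scalar op).
independent = {i: set() for i in range(5)}

# Compute-like case: a serial chain where each op needs the previous result.
chain = {0: set(), 1: {0}, 2: {1}, 3: {2}, 4: {3}}

print(utilization(independent))  # one full bundle
print(utilization(chain))        # five bundles with one op each
```

In this model the independent operations fill a single 5-wide bundle, while the serial chain needs five bundles with four empty slots each, so only one fifth of the peak rate is usable.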
In light of AMD's “fusion” strategy, the company has to rapidly bring CPU and GPU resources and programming models together: for this objective, a VLIW core such as the one implemented inside R600-class GPUs is probably not the best choice. A scalar core (or, to be more precise, a vector core presented as a scalar one) will be noticeably more effective in this case.
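The idea of a vector core presented as a scalar one can be sketched as follows. This is only an illustration under stated assumptions: `run_wavefront`, `serial_kernel` and the lane count of 16 are hypothetical and do not describe the actual hardware. The kernel is written as plain scalar code, and the parallelism comes from running it on many work-items at once rather than from packing independent instructions side by side.

```python
def serial_kernel(x):
    # Every step depends on the previous one, so there is no
    # instruction-level parallelism for a VLIW compiler to extract.
    x = x + 1
    x = x * 2
    x = x - 3
    return x


def run_wavefront(kernel, inputs):
    """Apply a scalar-looking kernel to every lane of a (hypothetical) wavefront.

    Each lane holds one work-item; conceptually all lanes execute the
    same instruction in lockstep, so every lane stays busy even though
    the kernel itself is a strictly serial chain.
    """
    return [kernel(x) for x in inputs]


lanes = 16  # assumed lane count, chosen only for the example
results = run_wavefront(serial_kernel, list(range(lanes)))
print(results[:4])
```

The same serial chain that would leave most VLIW slots empty keeps every lane occupied here, as long as there are enough work-items to fill the lanes.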
Does this mean the end of VLIW for graphics? It's hard to say, but I would not be too surprised if some VLIW concepts were used in future graphics chips.
Have a nice day!