Whatever the case, it seems to me that, in the general desktop and even workstation market, AMD's gamble of pairing two integer cores with one shared FP unit hasn't paid off. In typical desktop or workstation use, either a benchmark runs many threads that all want the FP units, or current OS schedulers don't yet handle well the asymmetric scheduling of threads across cores that share an FP unit.
Also, Bulldozer's per-core execution parallelism, in terms of instructions per cycle, doesn't seem to be much ahead of the previous-generation Phenom cores either, and that is where the other problem, matching Intel's performance core for core, arises.
OK, enough complaints. What can AMD do with the Bulldozer core as it is now? First, the core seems to do very well at lower clocks, power consumption included. So the mobile versions of Bulldozer, like the one combined with a GPU in 'Trinity', the successor to Llano, could do just fine. Second, the same benefits apply to enterprise servers, especially those in virtualised or cloud environments, or even to 'throughput' HPC use, where many small threads run at the same time, a lot of them without any FP use.
That's why AMD is launching 'Interlagos', a dual-die chip with 16 cores in total, as the first in its Bulldozer server line-up. Yes, the core speeds will be lower, below 3 GHz, but power usage will be lower too, the core density per chip will be the highest in the x86 world, and there is enough shared memory bandwidth, four DDR3-1600 channels per socket, to feed the cores well.
So you can have four of those chips, 64 cores in total, on a single mainboard with half a terabyte of RAM, and run large database lookups or even massive web site serving. Since most of the time is spent handling memory and I/O and switching between threads, core speed matters less; it still matters, of course, but not to the usual extent. And, again, not too many threads are FP-bound, so FP unit sharing causes fewer issues, as long as the OS thread schedulers handle things well. Therefore, AMD could still have a run at this market, depending on the price it sets.
Either way, a rework of the architecture has to happen. Three key things come to mind right now. First, more parallelism inside each core, for higher instructions per cycle. Second, a return to the standard core design with one FP unit per core. Third, more memory channels per die: four DDR3 / DDR4 channels instead of two DDR3 channels would obviously do a much better job of feeding an 8-core chip. Let's see what AMD can do about this, as soon as it can.
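To put rough numbers on the memory channel point, here is a back-of-the-envelope sketch in Python. It assumes the standard 64-bit (8-byte) DDR3 channel and uses peak theoretical rates only, so real sustained bandwidth will be lower. The function names and the three configurations are my own illustration, not AMD figures.

```python
# Peak theoretical bandwidth of one DDR3 channel:
# mega-transfers per second times bytes per transfer (64-bit bus = 8 bytes).
def channel_gb_per_s(mega_transfers: int, bus_bytes: int = 8) -> float:
    return mega_transfers * bus_bytes / 1000.0


# Peak bandwidth available per core if all cores share all channels evenly.
def per_core_gb_per_s(channels: int, cores: int, mega_transfers: int = 1600) -> float:
    return channels * channel_gb_per_s(mega_transfers) / cores


# Illustrative configurations (channels, cores), all at DDR3-1600.
configs = {
    "8-core die, 2 channels": (2, 8),
    "8-core die, 4 channels": (4, 8),
    "16-core Interlagos socket, 4 channels": (4, 16),
}

for name, (channels, cores) in configs.items():
    print(f"{name}: {per_core_gb_per_s(channels, cores):.1f} GB/s per core")
```

On these assumptions, doubling the channels on an 8-core die doubles the peak bandwidth per core, while a 16-core Interlagos socket with four channels ends up with the same peak per core as an 8-core die with two.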