Processor Whispers – About Chancellors and 3D Chips
by Andreas Stiller
3D is popular in cinemas, on TVs and, increasingly, in chips too. There are already 3D image sensors, and 3D stacks for memories and processors will soon follow. AMD's processors are also getting new names: Sempron, Athlon and Phenom are set to become extinct.
At CeBIT, IBM boss Palmisano tactfully covered up the slip-up of the German chancellor Angela Merkel who, during the presentation of IBM's 3D prototype chip, had asked him, "Do you take that from Intel?" His reply, "No, ours are better", was almost drowned out by the audience's laughter. Source: Deutsche Messe AG
Still, Ms. Merkel – whose doctoral degree, by the way, has withstood intense scrutiny – wasn't completely wrong with her comment, as Intel and other competitors have been working to conquer the third dimension for chips for quite some time. The first goal is to place the memory directly above or beneath the CPU, as in Intel's Teraflops Research Chips. While these are only test chips, IBM intends to produce the Power8 processor, planned for 2013 (in 28 or 22 nm), using 3D stack technology, with directly attached memory and probably also a layer of small, specialised computing cores adapted to specific uses. It is here that the successors of the Synergistic Processing Units of the Cell processors might turn up; the 3D stacks offer lots of room for modularity. Future plans envision up to 100,000 connections per mm² through the silicon (TSV: Through-Silicon Via).
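To get a feel for a density of 100,000 through-silicon vias per mm², the figure can be converted into a spacing between neighbouring vias. The short sketch below does the arithmetic; the uniform square grid is an assumption made purely for illustration, not something the roadmaps specify.

```python
import math

tsv_per_mm2 = 100_000  # density quoted in the text

# Assumption: vias sit on a uniform square grid, so each one
# occupies 1/density of the available area.
area_per_tsv_um2 = 1_000_000 / tsv_per_mm2  # 1 mm² = 1,000,000 µm²
pitch_um = math.sqrt(area_per_tsv_um2)      # centre-to-centre spacing

print(f"{pitch_um:.2f} µm pitch")
```

Under that assumption, the vias would be spaced only a few microns apart.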
Maybe Intel will surprise us with 3D technology for its Haswell processor scheduled for 2013, perhaps with a large "stacked" L3 cache. In any case, the processor's internal code name already includes a 3: H3. 3D stack technologies also dominate Globalfoundries' roadmap. Currently, 3D CMOS image sensors (3D CIS) and so-called system-in-packages with analogue, RF and power components (3D RF-SIP) are available. Around 2013, 3D IC packages for processors and similar chips are supposed to follow, combining analogue and RF technology, DRAM, processor cores and also microelectromechanical systems (MEMS). Whether the 3D technology will then be destined for AMD or for other clients – presently, only 30 per cent of Globalfoundries' turnover comes from AMD – remains to be seen.
The principal issue with 3D ICs is cooling. IBM is researching this together with the École Polytechnique Fédérale de Lausanne and ETH Zurich within the scope of the CMOSAIC project. The scientists have presented the first test chips in which cooling water is passed through small capillary tubes with a diameter of no more than 50 microns. It will be a few more years before this technology is ready for production, though.
The surrounding infrastructure, the water cooling system on the board, is being tested with specially designed blades for the Aquasar supercomputer at ETH Zurich. In contrast to the very expensive Blue Waters supercomputer with its water-cooled memory modules, the new concept integrates off-the-shelf DIMMs into the water cooling system, together with standard processors, chipsets and voltage regulators.
This technology is fully developed and is currently being transferred to iDataPlex servers. From the end of this year, these servers, equipped with Intel's next Xeon generation Sandy Bridge-EP, are supposed to allow the SuperMUC at the Leibniz Supercomputing Center (LRZ) to achieve a new top performance.
The IBM architect responsible for SuperMUC, Klaus Gottschalk, hopes that the hot-water-cooled computer with 3 petaflops of theoretical computing power will make it to the top of the Top500 list of supercomputers at the ISC in June 2012. However, it's said that LRZ director Professor Bode intends to retire the power-gulping Itanium dinosaurs much earlier, replacing them with much more efficient Westmere systems.
FX and A
Possibly, though, AMD's Bulldozer chips will be even more efficient. At CeBIT, AMD already announced the rather plain official names of the upcoming AMD generations. The Bulldozer family's names will begin with FX, for example, FX8000 for the 8-core version. It seems the release date "end of June", as the intersection of "second quarter" and "early summer", remains unchanged. The first Bulldozer-compatible motherboards, with the AM3+ socket still empty, were already on show at CeBIT.
And marketing director John Fruehe emailed us an explanation concerning the slightly unclear "90 per cent" performance statement by Michael Gordon at the ISSCC.
If, on the Bulldozer, you set the performance of a single thread to 100 per cent, then – on average – two integer threads (probably SPECint_rate2006) in the same module achieve a throughput of 180 per cent. That's only slightly less than the 200 per cent that two completely separate integer cores with their own front ends would be capable of. Distributing the two threads across two Bulldozer modules results in a throughput of 195 per cent – almost the maximum value. In the floating point benchmark, two threads in one module have to share a common FPU. As with Hyper-Threading, they profit from the improved utilisation of the functional units, which is why the throughput reaches 120 per cent here.
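One way to read these figures is as a fraction of ideal linear scaling, that is, of the 200 per cent that two fully independent cores would deliver. The following sketch works through that arithmetic with the numbers quoted above. The function name and the interpretation of "90 per cent" as 180 out of 200 are our assumptions for illustration, not something AMD has confirmed in this form.

```python
# Throughput relative to one thread running alone (= 100), as quoted
# in the text; these are AMD's stated averages, not our measurements.
SINGLE_THREAD = 100

scenarios = {
    "2 integer threads, one module": 180,
    "2 integer threads, two modules": 195,
    "2 FP threads, one module (shared FPU)": 120,
}

def scaling_efficiency(throughput, threads=2, single=SINGLE_THREAD):
    """Fraction of ideal linear scaling (threads * single) that is reached."""
    return throughput / (threads * single)

for name, throughput in scenarios.items():
    print(f"{name}: {scaling_efficiency(throughput):.1%} of ideal")
```

Read this way, two integer threads sharing a module reach nine tenths of what two separate cores would, which may be what the "90 per cent" statement refers to.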
So, the scaling numbers can't be used for a direct comparison with the K10, but – according to Fruehe – thanks to its Turbo Core feature, such a comparison wouldn't make the Bulldozer look bad, especially because it features further optimisations, including macro-op fusion, just like Nehalem and Sandy Bridge.
The Zacate and Ontario APUs that are already available are now also being pressed into the plain one-letter naming scheme with "E" and "C", and "G" for embedded. As for the Llano Fusion processor with integrated graphics, which will join the race under the name Axxx, marketing boss Leslie Sobon talked at CeBIT about a launch in the third quarter. Most voices on the internet consider American Independence Day, 4 July, a likely date. Behind the curtain, AMD had a Llano prototype running at 1.8 GHz compete against a Sandy Bridge Core i7-2630QM (2 GHz) in a few selected benchmarks – the company logos of both laptops had been taped over. In a CPU benchmark, the Llano was narrowly defeated, but it clearly dominated in the graphics benchmark Furmark – mostly by a factor of two or more. Additionally, its graphics quality – measured with the 3DCenter Filter Tester – was better and its power consumption was lower. Naturally, no permission was given to run other benchmarks.
And something else:
After some bobbing and weaving, Intel has finally managed to get an age-old, trivial patent – on shared access by the CPU and the graphics chip to main memory – granted by the German patent office (DE19681745 B4). What Intel intends to do with it is anyone's guess. Unfortunately for Intel, we reported on the corresponding UMA technology of the SiS 551x chipset in the February 1996 issue of c't, and Intel's PCT application dates back to 1998. The patent should therefore be invalid.