Processor Whispers: Of Milky Ways and milkmaids
by Andreas Stiller
AMD offers the first x86 processor with a 5GHz clock speed but loses its top position among the supercomputers. Intel's Xeon Phi takes over – though not in the USA, but in China.
AMD announced that, with the FX-9590, it is offering the "first commercially available 5GHz CPU processor". A feast for the enthusiast, even if 5GHz is the "max turbo" value rather than the base clock. Disregarding the pleonasm "CPU processor", the statement is what we in Germany would call a "Milchmädchenrechnung" (milkmaid's reckoning), a kind of naive fallacy – I'm sorry if this expression is politically incorrect.
I say so because, apparently, AMD hasn't looked beyond the four walls of the x86 space. For years now, IBM's Power6+ at 5GHz and the zEnterprise 196 at 5.2GHz have been juggling ones and zeros. Last year, they were joined by the six-core zEnterprise EC12 chip at up to 5.5GHz. All of these chips are indeed commercially available, at least if your pockets are deep enough. IBM puts up to six of these chips on a multi-chip module (MCM), including 192MB of L4 cache, for the price of a nice single-family home in an expensive neighbourhood.
For a little more, namely $1.75 million, a research institution can instead purchase a Blue Gene/Q Express rack – but careful: in this case, "express" means half empty. Such a rack delivers about 100 teraflops. Lots of fully equipped Blue Gene/Q racks are at work at the Lawrence Livermore National Lab in sunny California. Under the name Sequoia, they took second place with 16 petaflops in last year's Top500 list of supercomputers. How they will rank in the new Top500 list will be revealed at the opening of the International Supercomputing Conference ISC'13 in Leipzig on 17 June – unless a flood gets in the way.
Milky Way with Xeon Phi
As it was two and a half years ago, the new number one will probably be a Chinese computer. It was supposed to be a surprise and is still under NDA at Intel, but Linpack creator and Top500 co-organiser Jack Dongarra could not contain his delight at his visit to the National University of Defense Technology (NUDT) and has already spilled the beans in a report about the new Tianhe 2. Unlike the Tianhe (Milky Way) 1A, which relies on NVIDIA Tesla cards, the Tianhe 2 draws on 48,000 Xeon Phi cards, reaching a theoretical peak performance of 55 petaflops. In Linpack, it achieves 33.8 petaflops in practice; the 384,000 CPU cores, supplied by the still officially unreleased 12-core Ivy Bridge EP (Xeon E5-2692v2 at 2.2GHz), contribute to that as well.
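The gap between the two Tianhe 2 figures is usually expressed as Linpack efficiency, i.e. the measured result (Rmax) divided by the theoretical peak (Rpeak). A minimal sketch using only the two numbers quoted above; the constant and function names are illustrative, not from any Top500 tool.

```python
# Tianhe 2 figures as quoted in the article, in petaflops.
RPEAK_PFLOPS = 55.0   # theoretical peak performance (Rpeak)
RMAX_PFLOPS = 33.8    # measured Linpack result (Rmax)


def linpack_efficiency(rmax: float, rpeak: float) -> float:
    """Return the fraction of theoretical peak reached in Linpack (Rmax / Rpeak)."""
    if rpeak <= 0:
        raise ValueError("rpeak must be positive")
    return rmax / rpeak


efficiency = linpack_efficiency(RMAX_PFLOPS, RPEAK_PFLOPS)
print(f"Linpack efficiency: {efficiency:.1%}")  # prints "Linpack efficiency: 61.5%"
```

So, on these figures, roughly 60 percent of the theoretical peak shows up in the benchmark.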
Image: courtesy of Jack Dongarra
Another interesting component is the Galaxy FT-1500 front-end processor, developed in-house by NUDT and based on SPARC v9. Yes, that's right: it doesn't rely on the MIPS architecture you would usually expect in a Chinese processor. The FT-1500 is a 16-core processor that supposedly delivers roughly 144 gigaflops at a clock speed of 1.8GHz while consuming 65 watts.
The Tianhe 2 is supposed to have 4096 of these galactic FT-1500 chips, which alone would account for well over half a petaflop of theoretical peak performance.
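The FT-1500 figures can be cross-checked with a little arithmetic. This sketch assumes the quoted 144 gigaflops is a peak value to which all 16 cores contribute equally at 1.8GHz, and treats the 65 watts as the chip's power draw; if the reported numbers are off, so are the results.

```python
# Galaxy FT-1500 figures as quoted in the article ("supposedly").
CORES_PER_CHIP = 16
CLOCK_GHZ = 1.8
PEAK_GFLOPS_PER_CHIP = 144.0
POWER_WATTS = 65.0
CHIPS_IN_TIANHE2 = 4096

# Implied floating-point operations per clock cycle per core.
flops_per_cycle = PEAK_GFLOPS_PER_CHIP / (CORES_PER_CHIP * CLOCK_GHZ)

# Peak gigaflops per watt for a single chip.
gflops_per_watt = PEAK_GFLOPS_PER_CHIP / POWER_WATTS

# Aggregate theoretical peak of all front-end chips in petaflops (1 PF = 1e6 GF).
total_pflops = CHIPS_IN_TIANHE2 * PEAK_GFLOPS_PER_CHIP / 1e6

print(f"{flops_per_cycle:.1f} flops per cycle per core")    # 5.0
print(f"{gflops_per_watt:.2f} GFLOPS/W per chip")            # 2.22
print(f"{total_pflops:.2f} PFLOPS for {CHIPS_IN_TIANHE2} chips")  # 0.59
```

The 4096 chips thus add up to about 0.59 petaflops, small change next to the Xeon Phi cards.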
The computer does suck up a lot of power, though: 17.8MW excluding cooling. In terms of efficiency, that doesn't earn it a bad ranking, but it is some way behind the top-ranked system on the current Top500 list, Titan at Oak Ridge National Lab (Cray XK7 with AMD Interlagos and NVIDIA Tesla K20x), with 2143 megaflops/watt. Last November, in a surprise move shortly before the last Top500 list was released, Intel pulled two smaller systems out of the hat that were aimed specifically at energy efficiency. They came with Xeon Phi and delivered up to 2450 megaflops/watt.
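Dividing the Linpack result by the power draw gives a rough efficiency figure for the Tianhe 2. This is only an estimate under a stated assumption: that the 17.8MW quoted above is the power drawn during the Linpack run, which may not match how the Green500 measures it. The comparison values are the ones quoted in the text.

```python
# Quoted figures: Tianhe 2 Linpack result and power draw (excluding cooling).
RMAX_PFLOPS = 33.8
POWER_MW = 17.8


def mflops_per_watt(pflops: float, megawatts: float) -> float:
    """Convert petaflops and megawatts into megaflops per watt."""
    # 1 PFLOPS = 1e9 MFLOPS and 1 MW = 1e6 W
    return pflops * 1e9 / (megawatts * 1e6)


tianhe2 = mflops_per_watt(RMAX_PFLOPS, POWER_MW)

# Reference values quoted in the article, in MFLOPS/W.
comparison = {
    "Tianhe 2 (rough estimate)": tianhe2,
    "Titan (Cray XK7)": 2143.0,
    "Intel Xeon Phi systems (best)": 2450.0,
}
for name, value in sorted(comparison.items(), key=lambda kv: kv[1]):
    print(f"{name:32s} {value:7.0f} MFLOPS/W")
```

At roughly 1900 megaflops/watt, the estimate lands below Titan, consistent with "not a bad ranking, but some way behind".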
Apparently, the Tianhe 2 is equipped with the Xeon Phi 3120P, which has only 57 enabled cores and runs at 1.1GHz; it is not yet available on the open market. However, it has since appeared in Intel's database and should soon roll out alongside other versions.
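With the 57-core, 1.1GHz configuration, the quoted 55 petaflops can be roughly reconstructed from its parts. The sketch rests on assumptions not stated in the article: all 48,000 cards are this part, each Xeon Phi (Knights Corner) core does 16 double-precision flops per cycle (8-lane 512-bit vector FMA), each Ivy Bridge EP core does 8 (256-bit AVX add plus multiply), and the 384,000 CPU cores come from 32,000 twelve-core chips.

```python
def peak_pflops(units: int, cores_per_unit: int, ghz: float, flops_per_cycle: int) -> float:
    """Theoretical DP peak: units x cores x clock (GHz) x flops/cycle gives GFLOPS; /1e6 gives PF."""
    return units * cores_per_unit * ghz * flops_per_cycle / 1e6


# Assumption: 48,000 cards with 57 cores at 1.1GHz, 16 DP flops per core per cycle.
phi = peak_pflops(48_000, 57, 1.1, 16)

# Assumption: 384,000 cores = 32,000 twelve-core Ivy Bridge EP chips at 2.2GHz,
# 8 DP flops per core per cycle.
cpu = peak_pflops(384_000 // 12, 12, 2.2, 8)

print(f"Xeon Phi: {phi:.2f} PF, Ivy Bridge EP: {cpu:.2f} PF, total: {phi + cpu:.2f} PF")
# prints "Xeon Phi: 48.15 PF, Ivy Bridge EP: 6.76 PF, total: 54.91 PF"
```

If these assumptions hold, the result lands within rounding of the 55 petaflops quoted for the whole machine, with the Xeon Phi cards supplying close to 90 percent of it.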
Given its presumably higher yield, the 3120A/P should be much cheaper than chips with 60 or even 61 cores; the die has a total of 62 cores.
It has also been reported (by cpuworld.com) that it has less memory (6GB with a narrower 384-bit interface) and a slightly lower memory clock speed of 2.5GHz, but a hungry TDP of 300 watts. That seems very high compared with the 225 watts of the 60-core Xeon Phi 5110P. There is probably some truth to the semi-accurate rumours about the chip's thermal problems. Besides the 5110P, for example, there is now supposed to be a 5110D with a tolerance increased by 20 watts.
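One crude way to see why 300 watts looks high is to spread each card's TDP evenly over its enabled cores. This normalisation is an illustrative assumption: it ignores memory, uncore and clock differences, so it only hints at the scale of the gap.

```python
def watts_per_core(tdp_watts: float, cores: int) -> float:
    """TDP divided evenly across the enabled cores (a crude normalisation)."""
    return tdp_watts / cores


phi_3120 = watts_per_core(300, 57)  # reported TDP of the 57-core 3120A/P
phi_5110 = watts_per_core(225, 60)  # TDP of the 60-core 5110P

print(f"3120A/P: {phi_3120:.2f} W/core, 5110P: {phi_5110:.2f} W/core, "
      f"ratio {phi_3120 / phi_5110:.2f}")
# prints "3120A/P: 5.26 W/core, 5110P: 3.75 W/core, ratio 1.40"
```

By this yardstick, the cheaper chip would need about 40 percent more power per core.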
But with Intel, TDP is a tricky topic, much like the turbo modes. With the single-socket Haswell Xeon E3-1200v3 processors presented at Computex, there was considerable confusion due to conflicting specifications in Intel's ARK database, Intel's Platform Brief and press reports: a TDP of 95 watts for the platform's thermal solution here, a TDP of up to 84 watts per processor there; max turbo values per core here (ARK), turbo values for all cores combined there – you don't have to be a milkmaid to get lost. To mix things up a little more, Intel has started using scenario design power (SDP) for low-power chips. But that, at best, leaves the milk shaken, not stirred, because Intel now plans to sell the new Silvermont-based Atoms, labelled with SDP, under the Celeron and Pentium names. At Computex, the first boards equipped with the Celeron J1750 were showcased. Who knows, maybe we will soon also see an Itanium Z3370 with Silvermont ...
In any case, the Haswell Xeons will be available with and without GPU and, apparently, all with vPro enabled as well as the Transactional Synchronization Extensions (TSX), which we have already tested successfully on the largely comparable Core i7-4770.
So far, Intel's Haswell portfolio is limited to quad-core processors. Consequently, Apple's new single-cylinder machine, the Mac Pro 2013, which was announced with 12 cores, probably uses a single Ivy Bridge EP rather than a Haswell. This means it will have to make do without Haswell's two FMA pipelines, but in return it has 12 cores instead of 4. Well, at least a single processor means that Mac OS X's inability to properly handle the now widespread non-uniform memory access (NUMA) doesn't matter – but still, besides the system design, Apple might also want to consider a contemporary update to its operating system.
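The trade-off between two FMA pipelines and three times the core count can be put in numbers per clock cycle. The per-core figures are assumptions based on the vector units: Ivy Bridge issues one 256-bit AVX add and one multiply per cycle (8 double-precision flops), Haswell two 256-bit FMAs (16). The comparison holds the clock equal and ignores turbo and memory bandwidth, so it is a sketch of peak arithmetic only.

```python
# Assumed double-precision flops per core per clock cycle:
#   Ivy Bridge: one 256-bit AVX add + one 256-bit AVX multiply = 4 + 4 = 8
#   Haswell: two 256-bit FMA pipelines, 4 lanes x 2 ops each x 2 pipelines = 16
IVY_BRIDGE_DP_FLOPS_PER_CYCLE = 8
HASWELL_DP_FLOPS_PER_CYCLE = 16


def peak_dp_flops_per_cycle(cores: int, flops_per_core_per_cycle: int) -> int:
    """Peak double-precision flops per clock cycle for a whole chip."""
    return cores * flops_per_core_per_cycle


ivb_ep_12c = peak_dp_flops_per_cycle(12, IVY_BRIDGE_DP_FLOPS_PER_CYCLE)
haswell_4c = peak_dp_flops_per_cycle(4, HASWELL_DP_FLOPS_PER_CYCLE)

print(f"12-core Ivy Bridge EP: {ivb_ep_12c} DP flops/cycle")  # 96
print(f"4-core Haswell:        {haswell_4c} DP flops/cycle")  # 64
```

At equal clock speed, the 12-core Ivy Bridge EP would come out ahead on paper despite lacking FMA.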