Processor Whispers: About big brothers and strong children
by Andreas Stiller
Intel launches new Xeon processors, NVIDIA presents Kepler 2, and AMD rolls out Trinity – and loses its chief Bulldozer architect.
The new Xeon E5 families are meant, above all, to enable inexpensive servers with AVX and PCI Express 3.0. That's particularly true for the E5-2400 line, which isn't very thrilling technologically: it's just a more economical version of the E5-2600 line released back in March. It has three instead of four memory channels and one instead of two QPI links. Together with the smaller socket (LGA1356), this reduces both the complexity and the price of the system design.
The processors of the 2400 line aren't that much cheaper than their bigger brothers, however. The lowest-priced member of that family, the E5-2403, with four cores, a nominal clock speed of 1.8GHz and QPI at 6.4GT/s, costs $188, while its E5-2603 brother costs only $10 more. At least the E5-2403 is one dollar cheaper than the most economical member of the E3-1200v2 family, which was also newly presented. The latter consists of single-processor Xeons with the recently introduced Ivy Bridge architecture, which, thanks to the 22nm process, are supposed to consume less energy. According to Intel's internal tests, a slimmed-down server with the E3-1265Lv2 reaches a SPECpower rating of 4291 ssj_ops/watt, 39 per cent better than the rating of its predecessor, the Xeon E3-1260L. But it seems that Intel should have asked Fujitsu's specialists for advice – they managed to get 4697 ssj_ops/watt from their E3-1260L server.
Ivy Bridge's significantly improved processor graphics are only activated in one of the six new versions, the E3-1265v2, which is aimed at entry-level workstations. Hopefully, the fast random number generator, code-named Bull Mountain, will still work when the processor graphics are disabled. It's located outside of the cores in the so-called uncore area – similar to the Quick Sync encoder, which only functions with enabled processor graphics.
Here and there, a server will also benefit from one of the other architectural improvements, such as support for the Float16 data format, faster string instructions, and the enhanced security provided by Supervisor Mode Execution Protection (SMEP).
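To give a feel for what the Float16 (half-precision) format trades away, here is a minimal Python sketch. It uses the standard library's `struct` format code `e` (IEEE 754 binary16) purely as a software stand-in; it says nothing about how the conversion is invoked on Ivy Bridge hardware, and the helper name `to_half_and_back` is made up for this illustration.

```python
import struct

def to_half_and_back(x: float) -> float:
    """Round x to IEEE 754 half precision (16 bits) and widen it again."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# 1.0 survives unchanged; 0.1 and 2049.0 get rounded to the nearest
# value that fits into the 10-bit mantissa of a half-precision number.
for value in (1.0, 0.1, 2049.0):
    print(value, "->", to_half_and_back(value))
```

Half precision halves the storage of a single-precision value, at the price of roughly three decimal digits of accuracy, which is why it tends to be used for storage and transfer rather than for arithmetic.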
The E5-4600 family for four-socket servers should draw even more interest, although it's still based on Sandy Bridge EP and doesn't come cheap, with prices of up to $3616. Apparently, the pricing includes a bit of "protection money" to limit how far this family encroaches on the noble EX server class. The EX processors of the E7 family shine with two more cores and four QPI links per processor. The newcomers have only two QPI links, so that, with four sockets, some processors can't be linked directly. Instead, an intermediate processor serves as an additional "hop" for communication between them. Then again, the EX versions' maximum clock speed is a bit lower, their QPI and memory transfers are a bit slower, and they support neither AVX nor PCI Express 3.0.
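The extra "hop" can be made concrete with a small sketch. It assumes (this is not spelled out above) that the two QPI links per socket are wired as a ring, and compares that with a hypothetical layout in which every socket is linked directly to the other three. The socket numbering and the function name `hop_counts` are illustrative only.

```python
from collections import deque

def hop_counts(links, start):
    """Breadth-first search: number of QPI hops from start to every socket."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in links[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

# Assumed wiring: two links per socket, so the four sockets form a ring.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
# Assumed wiring: each socket linked directly to the three others.
full = {s: [t for t in range(4) if t != s] for s in range(4)}

print("ring:", hop_counts(ring, 0))
print("full:", hop_counts(full, 0))
```

In the ring, the socket diagonally opposite the starting one is two hops away, while in the fully connected layout every socket is reached in a single hop.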
And it's thanks to AVX that an E5-4650 system can achieve 602 gigaflops in the Linpack benchmark, easily leaving behind our EX test system with four E7-4870s, which manages only 352 gigaflops. It's a different tale with SPECjbb2005, though. Tests by Cisco show the EX system in front, even if not by much, with 2.77 million bops; the new four-socket competition manages 2.66 million bops. Sandy Bridge processors with four QPI links are not planned; this is supposed to remain reserved for Ivy Bridge. It will probably be quite a while, though, before Ivy Bridge EX makes an appearance.
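A rough back-of-the-envelope check shows why AVX makes such a difference in Linpack. The sketch below compares the two measured values with theoretical peaks. The core counts, nominal clock speeds and flops-per-cycle figures are assumptions added here, not taken from the text above, and turbo clocks are ignored.

```python
def peak_gflops(sockets, cores, ghz, flops_per_cycle):
    """Theoretical double-precision peak of a multi-socket system in gigaflops."""
    return sockets * cores * ghz * flops_per_cycle

# Assumed: E5-4650 with 8 cores at 2.7 GHz, AVX giving 8 DP flops per cycle per core.
e5_peak = peak_gflops(4, 8, 2.7, 8)
# Assumed: E7-4870 with 10 cores at 2.4 GHz, SSE giving 4 DP flops per cycle per core.
e7_peak = peak_gflops(4, 10, 2.4, 4)

# Measured Linpack results quoted in the column.
e5_measured, e7_measured = 602.0, 352.0

print(f"E5-4650: {e5_measured / e5_peak:.0%} of {e5_peak:.1f} GFlops peak")
print(f"E7-4870: {e7_measured / e7_peak:.0%} of {e7_peak:.1f} GFlops peak")
print(f"Speed-up of E5 over E7: {e5_measured / e7_measured:.2f}x")
```

Under these assumptions, the E5 system wins despite having fewer cores, because each core can issue twice as many double-precision operations per cycle.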
With 7 Billion Transistors
By then, NVIDIA will probably have long since released the eagerly awaited Kepler 2 chip – that giant chip with more than 7 billion transistors and up to 2880 computing cores. Before presenting it at NVIDIA's developer conference GTC in mid-May, NVIDIA boss Jen-Hsun Huang first had to explain the not-so-great quarterly figures. Compared with the year-ago quarter, sales dropped by $38 million to $924 million, while profits went down from $135 million to $60 million.
In the Kepler seminar, senior architect Lars Nyland and CUDA specialist Stephen Jones talked about the architectural intricacies of Kepler 2, like the new instructions and computing capabilities and the completely overhauled memory system, which is claimed to provide a Linpack efficiency of more than 90 per cent. But with double precision the theoretical peak performance is only one third of the value with single precision.
Another topic is the fusion of Tegra with a CUDA-capable graphics core, under the project name Denver. The developer platform Carma, with Tegra 3 and a (still) external Quadro-1000M GPU, is reportedly ready for delivery now. NVIDIA intends to heavily advertise it, especially at the ISC12 in Hamburg.
Processor and GPU under one (APU) roof? Old hat for AMD. The company has finally launched Trinity. Its graphics performance is quite good ("outstanding" in AMD's marketing language), but the computing performance of its face-lifted Bulldozer cores fails to impress – competitor Ivy Bridge clearly plays in a different league here. Still, with appropriate prices, below those of the rather expensive Intel Ultrabooks, Trinity should be able to compete in the notebook market. Whether it will become the big driving force, as the eloquent head of product, Lisa Su, predicts, remains to be seen.
AMD also suffered a big loss in personnel terms when, at the end of April, AMD Fellow and head architect Chuck Moore died at the age of 51 of pancreatic cancer – the same cancer that took Steve Jobs. Chuck Moore – not to be confused with the Forth inventor of the same name – was well known and well liked in the processor scene. Over many years at IBM, various PowerPC processors (from the PPC601 onwards) and Power processors (Power 1, 2 and 4) were designed with his collaboration or under his direction. In 2000, he presented the Power 4 at the Microprocessor Forum in San José – which is also where we met for the first time. At that time, Moore had already been working on the early concepts for the Power 6 and the Cell processor, but he soon left IBM and then, quite a while later, went to AMD. There, together with Mike Butler, he was responsible for the design of the Bulldozer cores.
Let's hope that his latest child, the Bulldozer, will grow up to be a strong steamroller.