Processor Whispers: About momenta and stories
by Andreas Stiller
NVIDIA cranks it up – first with the Tesla M2090 card, which finally allows all Fermi cores to participate in calculations. Later this year, the much more efficient Kepler is supposed to follow. And the quad-core chip Tegra 3 might hit the market inside an Amazon tablet this summer. Meanwhile, AMD and Intel are pushing OpenCL.
At NVIDIA, in the sunny Californian city of Santa Clara, some people evidently read Processor Whispers carefully. That would explain the immediate protest against the assumption made in the last issue: that one reason the GPU Technology Conference might have been postponed to next year is that the next GPU chip, Kepler, won't be ready in time for October 2011. All the better if this concern turns out to be completely unfounded and the chip, manufactured by TSMC in the 28 nm process, can be expected to launch without delay at the end of the year. After all, its power efficiency is supposed to be about three times that of the current Fermi chip. Looking at NVIDIA's chart (about 1.5 Gflops/watt for Fermi and 5 Gflops/watt for Kepler in double precision calculations), it seems that NVIDIA is referring to effective Linpack Gflops rather than the theoretical peak values that are usually quoted. The peak value for Fermi should be around 2.3 Gflops/watt (M2050 with 515 DP Gflops at 225 watts). As NVIDIA's chief scientist Bill Dally already disclosed in an interview at SC2010, Kepler is supposed to deliver a strongly improved Linpack efficiency of about 90 per cent thanks to its new memory architecture. So far, this efficiency has stagnated at a comparatively weak 50 per cent.
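The efficiency arithmetic above is easy to check. The following is a minimal sketch using only the figures quoted in the text; the variable and function names are illustrative, and the chart values are approximate readings, so the derived ratios are rough as well.

```python
# Figures quoted in the text; all names here are illustrative.
M2050_PEAK_DP_GFLOPS = 515.0        # theoretical DP peak of the Tesla M2050
M2050_TDP_WATTS = 225.0             # board power used for the comparison
CHART_FERMI_GFLOPS_PER_WATT = 1.5   # approximate value read off NVIDIA's chart
CHART_KEPLER_GFLOPS_PER_WATT = 5.0  # approximate value read off NVIDIA's chart


def gflops_per_watt(gflops: float, watts: float) -> float:
    """Power efficiency as throughput divided by power draw."""
    return gflops / watts


# Theoretical peak efficiency of Fermi (M2050).
fermi_peak_efficiency = gflops_per_watt(M2050_PEAK_DP_GFLOPS, M2050_TDP_WATTS)

# Fraction of the peak value that the chart figure for Fermi represents.
chart_fraction_of_peak = CHART_FERMI_GFLOPS_PER_WATT / fermi_peak_efficiency

# Improvement factor from Fermi to Kepler according to the chart.
kepler_gain = CHART_KEPLER_GFLOPS_PER_WATT / CHART_FERMI_GFLOPS_PER_WATT

print(f"Fermi peak efficiency:  {fermi_peak_efficiency:.2f} Gflops/watt")
print(f"Chart value vs. peak:   {chart_fraction_of_peak:.0%}")
print(f"Kepler gain over Fermi: {kepler_gain:.1f}x")
```

The chart value for Fermi lies well below the computed peak of roughly 2.3 Gflops/watt, which is consistent with the reading that the chart shows effective rather than theoretical figures.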
For now, NVIDIA intends to strengthen its Fermi lineup with the M2090, which will actually have all 512 CUDA cores (shaders) unlocked. As a result, the card is theoretically capable of 665 double precision Gflops at 1.3 GHz and has, for instance, a Linpack performance advantage of 25 per cent over the M2070. Interestingly, though, it is supposed to stay within the 225 watt TDP envelope.
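The peak figure can be reproduced with a back-of-the-envelope calculation. The sketch below makes an assumption that goes beyond the text: each Fermi CUDA core retires one fused multiply-add (two flops) per clock in single precision, and the Tesla cards run double precision at half that rate, which gives one DP flop per core per clock. The function and parameter names are illustrative.

```python
def peak_dp_gflops(cores: int, clock_ghz: float,
                   dp_flops_per_core_per_clock: float = 1.0) -> float:
    """Theoretical peak = cores x clock x flops per core per clock.

    The default of 1.0 DP flop per core per clock is an assumption
    (FMA counted as two flops, double precision at half the SP rate).
    """
    return cores * clock_ghz * dp_flops_per_core_per_clock


# M2090 as described in the text: 512 cores at 1.3 GHz, 225 watt TDP.
m2090_peak = peak_dp_gflops(cores=512, clock_ghz=1.3)
m2090_gflops_per_watt = m2090_peak / 225.0

print(f"M2090 theoretical peak: {m2090_peak:.1f} DP Gflops")
print(f"M2090 peak efficiency:  {m2090_gflops_per_watt:.2f} Gflops/watt")
```

Under this assumption the product comes to about 665.6 Gflops, which matches the quoted 665 Gflops after truncation.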
In cooperation with NVIDIA, Hewlett-Packard has designed a GPU server for these cards, the ProLiant SL390s G7, which packs eight of the GPUs into a dual-processor node. With 5.3 Tflops of DP computing power, it will probably be one of the highlights showcased at the International Supercomputing Conference (ISC2011) in Hamburg in mid-June.
HP is asking $4049 for the M2090, which is close to the steep former price of its predecessor, the M2070, now reduced to $3099. Both cards come with 6 GB of memory. At the press conference about NVIDIA's quarterly figures, NVIDIA boss Jen-Hsun Huang expressed his disappointment concerning the growth of the HPC and workstation market: profit dropped by close to two per cent to $135 million, and turnover fell by four per cent to $932 million. Most likely, the pricing of the HPC GPU cards is not entirely innocent in this matter.
However, it was others who were criticised for prices being too high: Huang was disappointed about the so far meagre success of Android 3.0 Honeycomb, which is supposed to give NVIDIA's Tegra 2 processor a boost. According to Huang, the first devices that have been launched (starting with Motorola's Xoom) are just too expensive. Also, there are still comparatively few really good Android apps for tablets. Maybe the new Acer Iconia Tab A500 will provide fresh momentum in this scene. In any case, Huang is confident; he sees Android ahead of iOS in two and a half years at the most. And to make sure that NVIDIA is well positioned for mobile phone chips, it has bought the European company Icera. Around the middle of the year, according to the rumours, Amazon might make a resounding entry into the Android tablet market, first with the Tegra 2 inside the entry-level model "Coyote" and then with the quad-core chip Tegra 3 inside the "Hollywood".
AMD also intends to bring new momentum to its GPU sales. Last year, AMD was at least able to gain some ground on NVIDIA in discrete graphics chips for desktop PCs. In the HPC sector, however, AMD almost completely left the field to NVIDIA; this is supposed to change soon. With Cray, AMD has an HPC partner that is marketing Opterons in the upper class of supercomputers with some success, although, lately, Cray was $1.5 million in the red. Cray boss Ungaro cited government budget cuts as the reason, in the US but most of all in crisis-ridden Japan. In Europe, on the other hand, in spite of Greece, Portugal and Ireland, the situation is much better, and Cray has already nailed down numerous deals for upgrades to AMD's upcoming Interlagos processor, for instance with the University of Edinburgh and with the Swiss National Supercomputing Centre (CSCS). The latter plans to upgrade its current Cray XT5 to a Cray XE6 with 400 teraflops, all in the hope that AMD will roll out the Interlagos on schedule. As for the GPUs, Cray doesn't rely on AMD, but on NVIDIA's aforementioned Kepler chip.
AMD just has too little to offer in this regard. For a year now, its top product has been the FireStream 9370 with 528 DP Gflops and 4 GB of GDDR5 at 225 watts, which is available for around $2400. But the FireStream doesn't have ECC, so the HPC scene apparently prefers the much cheaper Radeon. One example is the American company RenderStream, which now equips servers and workstations with eight Radeon HD 6970 or four Radeon HD 6990 cards. So equipped, the systems have a theoretical DP computing power of 5.4 Tflops, roughly what HP's new GPU server achieves with eight M2090 cards (and the HP server will probably cost about eight times as much).
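To put the quoted numbers side by side, here is a small sketch that derives throughput per watt and per dollar for the two professional cards and the aggregate figures for the eight-card systems. It uses only values stated in the column (the M2090 price is HP's asking price); the class and variable names are illustrative, and street prices may of course differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    """One GPU card with the figures quoted in the text."""
    name: str
    dp_gflops: float   # theoretical double precision peak
    tdp_watts: float
    price_usd: float

    @property
    def gflops_per_watt(self) -> float:
        return self.dp_gflops / self.tdp_watts

    @property
    def gflops_per_dollar(self) -> float:
        return self.dp_gflops / self.price_usd


CARDS = [
    Card("Tesla M2090", 665.0, 225.0, 4049.0),      # price: HP's asking price
    Card("FireStream 9370", 528.0, 225.0, 2400.0),  # price: "around $2400"
]

# Aggregate peak of HP's node with eight M2090 cards, in Tflops.
hp_node_tflops = 8 * 665.0 / 1000.0

# Per-card peak implied by the 5.4 Tflops quoted for eight Radeon HD 6970.
radeon_per_card_gflops = 5400.0 / 8

for card in CARDS:
    print(f"{card.name}: {card.gflops_per_watt:.2f} Gflops/watt, "
          f"{card.gflops_per_dollar:.3f} Gflops/$")
print(f"HP node with eight M2090: {hp_node_tflops:.2f} Tflops")
print(f"Implied peak per Radeon HD 6970: {radeon_per_card_gflops:.0f} Gflops")
```

On these theoretical figures the FireStream offers more peak throughput per dollar than the M2090, which makes the missing ECC the more plausible obstacle in the HPC market.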
Of course, integer performance is important, too. When cracking passwords (oclHashcat-lite), for instance, the four HD 6990 cards manage 45.7 billion MD5 checks per second, while eight NVIDIA GTX580 cards clearly lag behind with 18.3 billion.
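Broken down per card, the gap is even more striking. The sketch below divides the quoted totals by the number of cards and, purely as an illustration, estimates how long an exhaustive search would take for a hypothetical keyspace of eight-character passwords drawn from lowercase letters and digits. That keyspace is an assumption chosen for the example, not something from the benchmark, and the variable names are illustrative.

```python
# Totals quoted in the text (MD5 checks per second).
RADEON_TOTAL_RATE = 45.7e9    # four Radeon HD 6990 cards
GEFORCE_TOTAL_RATE = 18.3e9   # eight GTX580 cards

radeon_per_card = RADEON_TOTAL_RATE / 4
geforce_per_card = GEFORCE_TOTAL_RATE / 8
per_card_ratio = radeon_per_card / geforce_per_card

# Hypothetical keyspace: 8 characters from a-z and 0-9 (36 symbols).
keyspace = 36 ** 8

# Worst-case time to try every candidate at the quoted aggregate rates.
seconds_radeon = keyspace / RADEON_TOTAL_RATE
seconds_geforce = keyspace / GEFORCE_TOTAL_RATE

print(f"Per HD 6990: {radeon_per_card / 1e9:.2f} billion checks/s")
print(f"Per GTX580:  {geforce_per_card / 1e9:.2f} billion checks/s")
print(f"Per-card ratio: {per_card_ratio:.1f}x")
print(f"Exhaustive search, Radeon system:  {seconds_radeon:.0f} s")
print(f"Exhaustive search, GeForce system: {seconds_geforce:.0f} s")
```

Note that this is a per-card comparison; it says nothing about the rate per GPU chip.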
In any case, HPC and OpenCL are going to play an important role at the coming AMD Fusion Developer Summit. Up to now, compared to NVIDIA's dominant CUDA, little has been seen of OpenCL in the wild, but now Intel plans to step in as well. Intel's OpenCL SDK, which is currently in the beta phase, is supposed to be released in the summer. However, it will at first be limited to current processors and the vector extensions SSE4.1 and AVX. Whether and when OpenCL for the Sandy Bridge graphics processor will follow is still unclear.
AMD needs help to establish OpenCL on a large scale. At ISC2011, two specialists from AMD and Intel will amicably hold a joint four-hour tutorial titled "OpenCL: An Introduction for HPC Programmers"; then again, the CUDA tutorial on the same day will last nine hours.