Processor Whispers - About Waves and Cores
by Andreas Stiller
While a surprise wave of resignations washed Germany's CDU politicians out of office, oily waves reached the beaches of the Gulf of Mexico, and German schlager fans celebrated with Mexican waves, Taipei was hit by a wave of tablets. Meanwhile, Intel announced processors with more than 50 cores for 2012.
The strong ground motion detected by the GEO600 detector at the Albert Einstein Institute for Gravitational Physics in Hanover on the last weekend of May did not originate from a far-away supernova. It was caused by thousands of foot-stomping fans of Lena, the German singer who won the Eurovision Song Contest, cheering in front of the New Town Hall. Germans had been waiting decades for those first-place shockwaves, and almost a century for the gravitational waves. The Atlas supercomputer is supposed to locate the latter in the detector’s data with the support of the more than 250,000 participants in the Einstein@home internet project. On the Top500 supercomputer list, Atlas only reaches 255th place with 32.5 teraflops – but only because the recently heavily upgraded computer has not been linpacked yet. The roughly 300 teraflops of average performance delivered by Einstein@home doesn’t count either.
Some internet projects are even more powerful: for example, Folding@home's participants deliver on average more than 6 petaflops of performance, most of which comes from AMD and Nvidia graphics cards as well as the Cell processors in PlayStation 3 consoles. However, we are talking about single precision floating point here, and the scientists responsible for Folding@home had to take some questionable steps to convert GPU flops into x86 flops. With matrix operations or with the Linpack benchmark this wouldn’t be necessary, but Folding@home employs more complex operations such as exponentiation and logarithms. Current GPUs can manage such calculations very quickly in single precision, but in double precision they only deliver a fraction of their theoretical potential.
On the new Top500 list, there are three GPU-accelerated supercomputers. Two of them use Nvidia’s new Tesla C2050 card with the Fermi chip, while the third is equipped with the dual-GPU card AMD Radeon HD 4870 X2. With a little effort it is possible to estimate the CPUs’ contribution to the results and factor it out, which allows a rough comparison of the Tesla C2050 and the Radeon HD 4870 X2: both achieve around 140 to 160 gigaflops per card with the highly optimised Linpack benchmark in double precision. That’s only about 30 per cent of the theoretical maximum performance.
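The 30 per cent figure can be checked with a little arithmetic. The following Python sketch assumes per-card Linpack results of 140 to 160 gigaflops and takes the double-precision peak ratings from the vendors' spec sheets (515 gigaflops for the Tesla C2050, 2 x 240 gigaflops for the Radeon HD 4870 X2); those peak values are not given in the article and are assumptions here.

```python
# Assumed double-precision peak ratings in gigaflops per card.
# These come from vendor spec sheets, not from the article.
PEAK_DP_GFLOPS = {
    "Tesla C2050": 515.0,
    "Radeon HD 4870 X2": 480.0,  # two GPUs at 240 gigaflops each
}


def linpack_efficiency(measured_gflops: float, peak_gflops: float) -> float:
    """Return measured performance as a fraction of theoretical peak."""
    if peak_gflops <= 0:
        raise ValueError("peak_gflops must be positive")
    return measured_gflops / peak_gflops


if __name__ == "__main__":
    # Evaluate both ends of the estimated 140 to 160 gigaflops range.
    for card, peak in PEAK_DP_GFLOPS.items():
        for measured in (140.0, 160.0):
            eff = linpack_efficiency(measured, peak)
            print(f"{card}: {measured:.0f} of {peak:.0f} gigaflops = {eff:.0%}")
```

Under these assumptions every combination lands between roughly 27 and 33 per cent of peak, which matches the "about 30 per cent" estimate.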
Intel plans to beat AMD’s and Nvidia’s graphics chips at High Performance Computing (HPC) with Larrabee, but under a new name. At the International Supercomputing Conference ISC’10, server boss Kirk Skaugen explained that the ex-Larrabee is now sailing under the flag of the “Many Integrated Core” (MIC) architecture. A 32-nm version named Aubrey Isle will be released as a developer sample with 32 cores, 8 MB of shared cache and a clock speed of 1.2 GHz. Thanks to four-way hyperthreading, each chip handles 128 threads quasi-parallel. Apart from wafers with Aubrey Isle chips, Skaugen also presented the coprocessor card Knights Ferry, which Intel is already delivering to selected developers such as those at CERN. He also had first benchmark results, but only for single precision. For the important part of the Linpack benchmark, the LU factorisation, Skaugen showily pushed the value to 517 gigaflops; the competition supposedly gets up to 360 gigaflops. Mass production of a 22-nm MIC chip named Knights Corner with more than 50 cores is planned for some time in 2011/2012.
At the Forschungszentrum Jülich (a research centre in Jülich, Germany), Intel has worked with the middleware company ParTec to found the ExaCluster Laboratory, which is to develop cluster technologies that scale up to exascale systems. Bull is building the supercomputer TERA 100 with 18,000 Nehalem-EX processors, which is supposed to deliver more than 1 petaflop. HP and the Tokyo Institute of Technology plan to equip the Tsubame 2.0 with really big nodes by October: two Westmere Xeons each, plus three Nvidia Tesla C2050 cards. The 1400 nodes are estimated to amount to a theoretical 2.4 petaflops – they are hoping for 1.5 petaflops Linpack performance.
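The Tsubame 2.0 estimate can be roughly reconstructed from the node configuration. The sketch below is a back-of-the-envelope calculation, not the operators' own: the node and card counts come from the text, while the C2050 peak of 515 gigaflops, the six-core 2.93 GHz Westmere model and the 4 double-precision flops per core per cycle are assumptions that the article does not state.

```python
# Figures taken from the article.
NODES = 1400
GPUS_PER_NODE = 3   # Nvidia Tesla C2050 cards
CPUS_PER_NODE = 2   # Westmere Xeons

# Assumptions (not stated in the article).
GPU_PEAK_GFLOPS = 515.0  # C2050 double-precision spec value
CPU_CORES = 6            # assumed six-core Westmere
CPU_CLOCK_GHZ = 2.93     # assumed clock speed
FLOPS_PER_CYCLE = 4      # double-precision flops per core per cycle (SSE)


def node_peak_gflops() -> float:
    """Theoretical double-precision peak of one node in gigaflops."""
    cpu_peak = CPU_CORES * CPU_CLOCK_GHZ * FLOPS_PER_CYCLE
    return GPUS_PER_NODE * GPU_PEAK_GFLOPS + CPUS_PER_NODE * cpu_peak


def system_peak_pflops(nodes: int = NODES) -> float:
    """Theoretical peak of the whole machine in petaflops."""
    return nodes * node_peak_gflops() / 1e6


def efficiency(linpack_pflops: float, peak_pflops: float) -> float:
    """Linpack result as a fraction of theoretical peak."""
    return linpack_pflops / peak_pflops


if __name__ == "__main__":
    peak = system_peak_pflops()
    print(f"Estimated peak: {peak:.2f} petaflops")
    print(f"Hoped-for Linpack efficiency: {efficiency(1.5, peak):.0%}")
```

With these assumed values the peak comes out at about 2.36 petaflops, in line with the quoted 2.4, and the hoped-for 1.5 petaflops would correspond to a Linpack efficiency of a little over 60 per cent.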
A Wave of Tablets
The monster wave of tablets triggered by the launch of the iPad reached Taipei at the beginning of June at the Computex computer fair. On the day before the start of Computex, Asus (Eee Pad, Eee Tablet) and MSI (WindPad 100 with Intel Atom, 110 with Nvidia Tegra 2) started the tablet tide, and VIA showed off Chinese low-budget tablets with Android 1.6 and the rather slow ARM SoC of its subsidiary WonderMedia. Amtek has designed tablets with the following processors: Freescale/ARM CPU, Nvidia Tegra 2, Atom and CULV Celeron. The One Laptop Per Child Association (OLPC) intends to launch its first tablet before the new XO-3 is released in 2012. This tablet, called Moby, is slated for early 2011 and will be based on the Marvell reference design with a 1 GHz ARM CPU. Meanwhile, Qualcomm has announced the first dual-core Snapdragon, and Intel has countered by announcing Oak Trail, a Moorestown variant designed specifically for tablets, for 2011. Intel will roll out Atoms with DDR3 support and dual-core Atoms before then.
Many ARM tablets use Android, but developers will probably have to wait for Android 3 before they can write suitable software for devices with displays bigger than those of smartphones. Nvidia boss Jen-Hsun Huang expects Android 3 to support the wider range of tablet applications of his Tegra 2 – unfortunately he didn’t say whether Android 3 is the “Gingerbread” release expected toward the end of 2010 or whether we’ll have to wait longer.
The Intel chipsets P67 and H67 (Cougar Point) for LGA1155 motherboards were also on display at Computex. Asrock, Biostar and other companies' exhibits confirmed our speculation from our last issue: no integrated USB 3.0 adapter, but PCI Express 2.0 – and slowly but surely it’s “farewell PCI”.