Processor Whispers: About counting, dividing and scaling down
by Andreas Stiller
NVIDIA reclaims the top position in the size matters discipline, AMD revises its counting methods and finds important former employees at its competitors. Intel invests in new factories – but due to a lack of satisfactory government grants, it has chosen Ireland instead of Israel.
With 7.1 billion transistors, NVIDIA's Kepler 2 is supposed to be the mightiest chip on earth right now. That is, if NVIDIA has counted properly, because it doesn't have much of a head start: the gargantuan Virtex 7 from Xilinx closely follows with 6.8 billion transistors. After a bit of a gap, next in line is the AMD Tahiti with 4.3 billion transistors. As for CPUs, the 10-core Westmere-EX with 2.6 billion transistors is currently still the biggest, but it should soon be surpassed by the Itanium Poulson with 3.1 billion transistors.
Counting transistors is easier said than done. What counts and what doesn't? Inside the chips, there are many test structures and redundant circuits, and these passive transistors don't necessarily have to be counted. In the past, Intel has often implemented extensions for testing purposes that were not enabled in the final product, such as Hyper-Threading in the first Pentium 4. Besides, the published numbers always refer to the maximum configuration of a processor generation. With Hyper-Threading disabled and reduced caches, a chip often houses masses of transistors that are condemned to idleness.
The processor caches often also have redundant lines for subsequent repairs. The more or less obsolete Itanium can even repair the caches during operation. And IBM's BlueGene/Q even has a complete core as a reserve.
In particular, however, the designs include large numbers of decoupling capacitors (decaps) whose purpose is to avoid crosstalk via the power supply lines. And these capacitors are formed by special transistors. AMD has informed the news site Brightsideofnews that the company counted these "decap cells" until the end of 2011, but has no longer done so since 2012. This means that its Llano chip, which was originally specified with 1.45 billion transistors at launch, is now left with only 1.18 billion. Previously, there had already been similar confusion as to the number of transistors in the Bulldozer architecture. And who knows, maybe some test hardware is still hidden away here too, waiting to be enabled by a secret MSR instruction and the not-so-secret-anymore AMD key (0x9c5a203a). It could be that the Radix-8 hardware divider, which is officially scheduled to make its debut inside the Steamroller, is already slumbering inside the Piledriver chips.
Mathematician David M. Russinoff, who had been responsible for the formal hardware verification of the FPUs at AMD since the K5 days, was one of the people involved in designing the Radix-8 divider. Like many others, Russinoff left AMD at the end of last year. Intel was pleased to hire him immediately as a principal engineer. He didn't even have to relocate, because Intel operates a rather large development and research centre in Austin, Texas. Among other things, this centre is responsible for the Atom chips, which could really do with significantly more power for floating point calculations. The latency and throughput figures of the Atom processors for divisions (IDIV, FDIV, DIVPD ...) – be it the old Bonnell or the current Saltwell architecture – seem very poor indeed: they are much, much higher than with other x86 processors.
Generally, Intel has been using hardware dividers based on the SRT algorithm for quite a while now, starting with Radix-2 (one bit per clock cycle) in the Pentium. It initially suffered from a small bug though, which caused quite a stir back then. We are still hoarding a few faulty Pentium units around here ourselves, in order to test the promised "lifelong" guarantee someday. In any case, we did get a replacement without any trouble a few years ago.
Many years after the Pentium was launched, the development team in Haifa presented the Merom with Radix-4 division (2 bits per clock cycle), which was almost twice as fast. Then, in 2008, the Core 2 (Penryn) was released with Radix-16 (4 bits per clock cycle), which again doubled its predecessor's performance. Additionally, it comes with an "early out" recognition, which, at best, can deliver a result after only 6 clocks. Even when the early out doesn't apply, the Penryn will still easily beat the Atom. For example, the Penryn requires 21 clocks for DIVPD – with the Atom, it's 122 ...
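To make the "bits per clock cycle" figures a little more tangible, here is a minimal sketch of a digit-recurrence divider that retires a configurable number of quotient bits per iteration. Note the assumptions: this is a plain restoring scheme on unsigned integers, not the SRT algorithm with its redundant digit set and lookup table, and the function name `digit_recurrence_divide` and its parameters are purely illustrative. It says nothing about real clock counts, which also include setup, rounding and pipeline effects.

```python
def digit_recurrence_divide(dividend: int, divisor: int, width: int, bits_per_step: int):
    """Divide unsigned integers, producing bits_per_step quotient bits per iteration.

    Illustrative restoring digit recurrence (NOT Intel's SRT implementation).
    Assumes 0 <= dividend < 2**width. Returns (quotient, remainder, steps).
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    radix = 1 << bits_per_step                 # radix 2, 4 or 16 for 1, 2 or 4 bits
    steps = -(-width // bits_per_step)         # ceil(width / bits_per_step)
    total_bits = steps * bits_per_step         # dividend padded with leading zeros

    quotient = 0
    remainder = 0
    for step in range(steps):
        # Shift the next group of dividend bits into the partial remainder.
        shift = total_bits - (step + 1) * bits_per_step
        next_bits = (dividend >> shift) & (radix - 1)
        remainder = (remainder << bits_per_step) | next_bits

        # Digit selection: the largest digit d with d * divisor <= remainder.
        # Hardware does this with comparators or a table, not with a loop.
        digit = 0
        while digit + 1 < radix and (digit + 1) * divisor <= remainder:
            digit += 1

        remainder -= digit * divisor
        quotient = (quotient << bits_per_step) | digit
    return quotient, remainder, steps


if __name__ == "__main__":
    # 53 bits corresponds to the mantissa width of a double-precision value.
    for bits in (1, 2, 4):
        q, r, steps = digit_recurrence_divide(1000003, 97, 53, bits)
        print(f"radix-{1 << bits}: quotient={q}, remainder={r}, iterations={steps}")
```

For a 53-bit operand, the loop runs 53 times with Radix-2, 27 times with Radix-4 and 14 times with Radix-16, which is roughly the halving per generation described above.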
Shortly after David Russinoff left AMD, chief technology officer Eric Demers departed the company as well. Demers had originally come to AMD from ATI as a graphics architect. It has become known that he has found a new position at Qualcomm, which acquired AMD's smartphone graphics branch (and the Imageon processors) in 2009. Qualcomm also owns the GPU design Adreno, which is merged with ARM cores inside the Snapdragon processor. As Demers is quite familiar with AMD's HSA (Heterogeneous System Architecture), it's not unlikely that the competitors Qualcomm and AMD might become partners within the scope of the HSA initiative.
At the Fusion Developer Summit 2012 in Bellevue, near Seattle, in mid-June, AMD is going to release the HSA specifications and will probably name some partners as well.
Intel plans to build its new foreign 14 nm fab in the Irish town of Leixlip, not in Israel. Intel boss Otellini announced this bit of news at an investor meeting in mid-May. This is Intel's reaction to the refusal by the Israeli government to subsidize its $4.8 billion project with a $600 million grant. The responsible Investment Promotion Center of the Ministry of Industry and Trade had offered less than half of what Intel had asked for, and even that only on the condition that Intel would invest in an additional facility in the town of Beit She’an. Nobody knows what the Irish offered in terms of subsidies – Guinness, whiskey, salmon ... It must have been good, though, as Intel not only intends to upgrade its existing factories – originally, there was simply a mention of a $500 million update for the old fab 14, which has been mothballed for three years now – but has also submitted an application for authorization to build a new factory covering 162,000 m² of ground. Alongside Intel's production facilities in Oregon and Arizona, the new factory in Leixlip is also destined to produce the Broadwell processors (which are Haswell processors scaled down to 14 nm structures), starting at the end of 2013.
Israel, however, where Ivy Bridge processors are manufactured in Kiryat Gat in one of the most technically advanced factories for 22 nm technology, gets a second chance. Fab manager Maxine Fassberg and the head of Intel Israel, Mooly Eden, are negotiating with the government concerning subsidies for the construction of a factory for 10 nm technology, which is supposed to be ready for production by 2015.