Processor Whispers: About New Topologies and Old Sins - Update
by Andreas Stiller
The 32nm processors with which Intel is ushering in the era of x86 CPUs with integrated graphics don't quite keep the company's old promises – and yesterday's sins are now catching up with Intel as well.
Just one and a half years ago, Intel published “Detecting Multi-Core Processor Topology in an IA-32 Platform”, which explained how to determine the CPU topology correctly and in a future-proof way, so that software could be distributed optimally across any system. Anyone who counted on that will now be disabused: at least in the systems we have tested so far, the Arrandale and Clarkdale processors behave like quad-core processors with two cores crossed out, so the “robust algorithm” Intel presented back then now returns “holes” in the core IDs. And although the newest processor generation features a nice extension of the CPUID instruction that can alternatively provide information about the topology, old software may fall flat on its face for now.
The advertised new crypto instructions weren't all that convincing either. They may make the code look more elegant but, so far, we couldn't detect any performance advantages worth mentioning in “real” applications such as WinZip 14. Still, these are all minor quibbles; otherwise the new dual-core processors with their integrated graphics chips perform well enough and should conquer the market for low-priced home and office PCs rather quickly. Their technical specifications and pricing alone should see to that, and the further-reaching sales promotion measures Intel obviously deemed necessary in the past should therefore be water under the bridge.
In any case, even after the reconciliation with AMD, the past is catching up with Intel. After all, Christine Varney, Obama's appointee as Assistant Attorney General for Antitrust at the United States Department of Justice, plans to swing her broom the European way. At least the commissioners of the Federal Trade Commission see enough reason to drag Intel into court again. Their statement of claim is a cannonade of severe accusations, worse than those formerly put forth by AMD and now amplified by Nvidia with regard to GPUs. However, the case won't be tried in a federal court but, under the rarely applied Section 5 of the Federal Trade Commission Act, before an FTC Administrative Law Judge. In the last 30 years, though, the FTC has never had any success with this kind of lawsuit – all its judgements were quashed by regular federal courts in later appeals. Nonetheless, Intel boss Otellini doesn't plan to wait it out and has called the FTC complaint a misguided case.
He probably isn't entirely wrong: some of the measures demanded by the FTC far exceed the regulations common elsewhere. Among other things, the commissioners want to oblige Intel to change its licensing policy and release x86 technology, such as CPU interfaces, to third parties – Nvidia boss Jen-Hsun Huang is already rubbing his hands with glee.
As an example of the tenor of the statement of claim, I'd like to revisit an issue that c't (The H's sister publication in Germany) is indirectly involved in. At the beginning of 2005, we openly complained about the unfairness of Intel's compilers, which, at that time, prevented competitors' processors from operating at optimal performance. With a simple patch, we were able to circumvent this lock. The result: one of the more than 20 benchmarks of the SPEC CPU2000 suite, 181.mcf, ran 31 percent faster on an Opteron.
The FTC now talks about defective compilers in this context. According to the FTC, many design changes to the compiler had the sole goal of making the competition look bad. Furthermore, Intel made misleading or false statements about its processors' benchmark results without indicating that the performance advantage was mainly or solely caused by its compilers. The FTC now wants Intel to replace the defective compilers with operational ones free of charge, compensate companies for the effort of recompilation, and inform those companies' clients that, where applicable, they should exchange the inferior software. And, of course, Intel may not bring such discriminatory products to market in the future.
Intel has always argued that, because of product liability laws, it could only enable features that had been sufficiently validated. Special optimisations for competing AMD processors, and the ample validation effort that goes with them, would have been very demanding – Intel just couldn't bring itself to do it.
However, in our tests, the code produced by Intel's “defective” compiler was mostly faster even on Opteron systems than that produced by the “operational” compilers from Microsoft, PGI or GNU. Unlike Intel – which, of course, like IBM, Sun and all the others, wanted to show its processors in the best light – we carried out our tests with different compilers and, more importantly, with compatible code free of special optimisations.
Commercial software – the kind the FTC most likely means when it talks about “defective” – very rarely employs selective optimisations anyway; after all, who wants his or her software to perform well only on a limited range of CPUs? Unlike with its CPUs, Intel has been and remains a long way from a leading market position with its compilers; their use is mostly restricted to high-performance computing. Apart from that, C/C++ is dominated by Microsoft's Visual C under Windows and by GNU C/C++ under Linux.
Also, the aforementioned feature the Opteron was deprived of – a cache optimisation for badly organised data structures – only very rarely had any effect (basically just in the cited example from the SPEC benchmark). With the c't patch, performance-minded developers were able to circumvent the restriction anyway. In the end, there won't be much “defectiveness” to be found in the wild, so the FTC is out hunting with pretty big cannons and overshooting the mark a little. Still, Intel might consider splitting off its compiler development into a separate company, or at least handling it with a little more neutrality. Maybe then, some day, one will also find the flag /Bulldozer in Intel's compilers.
And Something Else
Hewlett-Packard plans to update its highly available NonStop server family for its 35th anniversary – except that, for now, HP still has to use the ancient and extremely slow Itanium 9100 Montvale processors. Only sometime in the first half of 2010, after numerous delays, is the next-generation Itanium, Tukwila – also called Godot – finally supposed to make an appearance. Just in time for the NonStop anniversary, Red Hat has announced that it's abandoning the Itanium: RHEL 5 for Itanium will still be supported for a few years, but development of version 6 has been suspended owing to a lack of demand.
Intel states that the white paper "Detecting Multi-Core Processor Topology in an IA-32 Platform" mentioned above was first published early in 2006 and has been obsolete since the summer of 2008, when it was superseded by a new paper, "Intel® 64 Architecture Processor Topology Enumeration". This was made necessary because the Nehalem design required a small change in the CPUID instruction architecture in terms of how it reports the raw data used for multi-core topology enumeration.
Comment from the author
Since this Processor Whispers was published, Intel has removed the old paper from its website and linked to the new one. But the issue at hand was not the publishing date; it was that even recently developed software – that is, software written since August 2008 that relied on the then-current algorithms – will fail to correctly detect the CPU topology of some modern architectures. There's no apparent reason why Intel chose this incompatible path, and even software being developed now may suffer if it uses older Intel compilers with OpenMP and KMP_AFFINITY_MASK settings. It's true that the impact is only a possible performance degradation and, at least for now, is limited to the Nehalem dual-core processors Clarkdale and Arrandale, where OpenMP software is rarely run. But we will have to see how the six-core-per-chip Westmere-EP and Gulftown processors behave.