Processor Whispers: About Haskell and Haswell
by Andreas Stiller
It's the “worst-kept secret of the industry” – so it was said at Supercomputing 2011 (SC11) – that Intel's Haswell processor will feature transactional memory. Other leaked bits of news concern Intel's Ivy Bridge and AMD's Trinity.
About two years ago, in August 2009, Intel, IBM and Sun founded a “Drafting Group” in order to devise a common specification for transactional memory (TM). All three of them were planning to incorporate this feature into their next processor generations. Sun intended to do so with its Rock processor, but that was dumped after the acquisition by Oracle one year later. IBM, on the other hand, was more successful with its Blue Gene/Q. At SC11 in Seattle, the first processor with hardware transactional memory (HTM) was presented under the official name PowerPC A2.
At the beginning of 2013, Intel will also offer HTM, with its Haswell processor. Everyone from Intel openly admitted as much when asked directly at SC11. Supposedly, Intel will soon announce the new TM instructions that will be added to the already released AVX2 extension. It's about time, as the continually increasing number of processor cores makes the need for technologies for faster thread synchronisation more and more urgent. Without such technologies, the processor will eventually be so busy with itself that it won't be able to get any real work done.
With transactional memory, the idea is to avoid the time lost when threads lock each successive access to shared memory areas. Instead, the accesses are first bundled into an atomic transaction, for instance in the L1 cache, and then executed all at once during the commit. This happens under the optimistic assumption that no other thread will stick its oar in and access the shared memory in the meantime. If that does happen, though, that's bad luck, and a rollback mechanism is required to abort the intended – but by then invalid – transaction. In that case, the transaction is re-executed, if necessary with new source data.
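The optimistic scheme can be sketched in a few lines of software. The following Python toy is only an illustration under stated assumptions, not how Haswell, Blue Gene/Q or any real STM library works: the names TVar, Transaction, Conflict, atomically and run_demo are all made up for this example, and a version counter per memory cell stands in for the conflict detection that hardware would do in the cache.

```python
import threading


class TVar:
    """A shared memory cell with a version counter (hypothetical name)."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Conflict(Exception):
    """Raised at commit time if a cell we read was changed by someone else."""


# Only the short commit step is serialised; the transaction body runs unlocked.
_commit_lock = threading.Lock()


class Transaction:
    def __init__(self):
        self.read_versions = {}  # TVar -> version seen at first read
        self.write_buffer = {}   # TVar -> value to publish at commit

    def read(self, tvar):
        if tvar in self.write_buffer:
            return self.write_buffer[tvar]
        # Record the version BEFORE reading the value, so that a concurrent
        # commit is always noticed during validation.
        self.read_versions.setdefault(tvar, tvar.version)
        return tvar.value

    def write(self, tvar, value):
        self.write_buffer[tvar] = value  # buffered, not yet visible to others

    def commit(self):
        with _commit_lock:
            # Validate: nobody may have committed to a cell we read.
            for tvar, seen in self.read_versions.items():
                if tvar.version != seen:
                    raise Conflict()
            # Publish all buffered writes at once.
            for tvar, value in self.write_buffer.items():
                tvar.value = value
                tvar.version += 1


def atomically(body):
    """Run body(tx) until it commits without a conflict."""
    while True:
        tx = Transaction()
        try:
            result = body(tx)
            tx.commit()
            return result
        except Conflict:
            continue  # rollback: drop the buffers and re-execute with new data


def run_demo(threads=4, transfers=250):
    a, b = TVar(1000), TVar(0)

    def transfer(tx):
        tx.write(a, tx.read(a) - 1)
        tx.write(b, tx.read(b) + 1)

    def worker():
        for _ in range(transfers):
            atomically(transfer)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return a.value, b.value


if __name__ == "__main__":
    print(run_demo())
```

Each transfer either takes effect completely or is thrown away and retried, so the sum of both cells stays at 1000 regardless of how the threads interleave. How often the retry loop spins depends on the conflict rate.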
Software-wise, Intel has been strongly dedicated to software transactional memory (STM) for years and it has been refining the Intel C++ STM Compiler, Prototype Edition. In this compiler, __TM_atomic{} can be used to mark passages that are to be treated as atomic transactions.
Some language implementations, such as those for the functional programming language Haskell, also committed themselves to STM at an early stage, and rather aggressively. The Glasgow Haskell Compiler has implemented STM since version 6.4, and some applications based on it (like some BitTorrent clients) actually make ample use of it. Equivalent implementations for Java and Python are being worked on eagerly as well.
However, the whole thing stands or falls with the respective conflict rate and with the time required for conflict detection and rollback. Implemented purely in software, TM generally isn't efficient enough, but it can be supported, complemented or replaced by various hardware mechanisms that significantly increase its efficiency.
Before Haswell's planned release in 2013, Intel will first roll out the Xeon E5 (Sandy Bridge EP). For desktop PCs and notebooks, Ivy Bridge in 22 nm technology is expected. According to Intel's leaked NDA Desktop Platform Roadmap WW46, though, the wait won't be over until the second quarter of 2012. Also according to this roadmap, at 77 watts TDP, the new 22 nm desktop processors of the normal energy class will consume about 20 per cent less power, but there are no versions with a higher nominal clock rate than the current highest one.
Leaks
The Core i5 models range from 3.0/3.2 GHz (i5-3300) to 3.4/3.9 GHz (i5-3570); they have four cores without HT, 4 MB of cache and two DDR3-1333/1600 memory channels, and come with integrated DirectX 11 capable HD 2500 or HD 4000 graphics. The top processor, the Core i7-3770, features HD 4000, 8 MB of cache, Hyper-Threading and a clock rate of 3.4/3.9 GHz – a bit more in the overclockable K version. There are also stripped-down low-power versions running from 65 down to 35 watts.
Intel added some benchmark results for the i7-3770 in comparison with the Sandy Bridge i7-2600, which has a nominally equal clock speed. The increase in graphics performance, by a factor of 2.7 to 3 in 3DMark, looks impressive, but Intel pitted an HD 4000 against a hardly comparable HD 2000. Compared to an HD 3000, the performance will probably be no more than twice as high.
Thanks to small architectural improvements and an optimised turbo-boost, and probably faster memory, the CPU benchmarks improved only a little, between 7 per cent (Sysmark 2012) and 25 per cent (Excel 2010) in the best case.
Does AMD keep some secret plans for transactional memory in this safe drawer? Margaret Lewis (Product Marketing Director) and Pat Patla (General Manager of Server Products) at SC11.
AMD hasn't announced anything about transactional memory so far, but, years ago, it presented a possible architecture extension called Advanced Synchronisation Facility (ASF), which is able to lock complete cache lines and thus offers a strongly improved basis for STM. However, so far there's only a simulator for PTLsim.
That is not the only thing that is unclear at AMD, though; so are many others, as the new boss, Rory Read, appears to be sweeping through the company with an iron broom. The mass lay-offs heavily affected Germany in particular: 20 out of 80 employees working at the office in Munich have been fired, among them almost the complete PR crew. AMD has also been quarrelling with its manufacturer, Globalfoundries, for quite some time now. Manufacturing problems concerning the Llano at the beginning of the year are supposed to have caused the planned McAir deal with Apple to fall apart. And things don't look good for the Bobcat successors Wichita and Krishna. Word is that AMD has cancelled them or is planning to take the 28 nm APU production away from Globalfoundries completely and put it in the hands of TSMC. In any case, Globalfoundries has put the plans for another fab in Abu Dhabi on ice for now.
The first leaked benchmark results for the Llano successor Trinity with the new Piledriver core can't really impress, with a performance increase of 23 to 35 per cent in graphics (3DMark Vantage) and 7 to 17 per cent in general (PCMark Vantage), but at least the graphics performance should be more than sufficient to keep the Ivy Bridge in check in this context.
(djwm)