Processor Whispers - About MIPS and MIPS
by Andreas Stiller
When two quarrel, the third rejoices: while ARM and Atom were slinging mud at each other, MIPS could advance unhurriedly. And the abbreviation MIPS - with a different meaning - plays an important role in chip manufacturing, too.
MIPS has already quietly conquered some markets in the embedded sector: DVD recorders, set-top boxes, cable modems and so on. Now the small Californian processor company is creating a bit of a stir with its quad-core 1074K. Android is an interesting alternative to Linux and Windows CE (BSP 6.04), especially for the currently very popular smaller devices. The first set-top boxes with Android and processors from the MIPS licensee Sigma Designs are on the market, and smartphones are also planned.
A team of researchers at the University of California, San Diego, is currently working on a particularly power-efficient design - called Greendroid - as an environment for a MIPS core. And who knows, maybe Microsoft will release Windows Phone 7 for MIPS later on. In any case, Microsoft and MIPS go back a long way: in the 90s, when MIPS was still part of SGI, Windows NT 3.1 to 4.0 was even available for the MIPS architecture, which was originally designed by John Hennessy, RISC guru and president of Stanford University. The legendary MIPS R4000 made microprocessor history as the first 64-bit processor, almost a year before Digital’s Alpha and a long time before UltraSPARC, PA-RISC 2.0, POWER2, Itanium, PowerPC or AMD64 appeared.
The transition to 64 bits for ARM’s future high-end designs is still pending, even if the new extensions for ARMv7-A provide a "Large Physical Address Extension" to 40 bits as well as further virtualisation options. With two address windows, the new address extension is somewhat more sophisticated than the now obsolete PAE of the x86 processors, but it can’t really keep up with true 64-bit operation. While ARM hasn’t released a roadmap that includes 64-bit processors, it did announce simultaneous multi-threading (SMT) at the "Linley Tech Processor Conference", although not yet for the next Cortex-A15 "Eagle". For MIPS, on the other hand, SMT is already an old trick, and thanks to MIPS64 it will be well prepared if the next generation of tablet PCs starts to cross the 4 GB RAM limit, or if the chips begin to play a more important role in the server market.
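To put the address widths mentioned above into numbers, here is a minimal Python sketch. It assumes plain byte addressing, and the function name `addressable_bytes` is merely illustrative:

```python
# Illustrative arithmetic only: how much memory a flat byte address
# of a given width can reach. The helper name is hypothetical.
GIB = 2 ** 30  # one gibibyte in bytes


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for the given address width."""
    return 2 ** address_bits


for label, bits in [
    ("32-bit addresses", 32),
    ("40-bit physical addresses (LPAE)", 40),
    ("64-bit addresses", 64),
]:
    print(f"{label}: {addressable_bytes(bits) // GIB:,} GiB")
```

A 32-bit address reaches exactly 4 GiB, which is the RAM limit referred to above. The 40-bit physical extension raises that to 1,024 GiB (1 TiB) of physical memory, though it does not, by itself, widen the 32-bit virtual addresses that a single process sees.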
MIPS also intends to gain ground again in supercomputers, namely with the Loongson (code-named Godson) from the Chinese Institute of Computing Technology (ICT) in Beijing. In the beginning there was a bit of animosity over MIPS’s licences, but that was all straightened out by an extensive architecture licence in the summer of 2009. MIPS now benefits greatly from the many companies and engineering offices in China that are increasingly orienting themselves towards the "Chinese" architecture.
STMicroelectronics manufactures the Loongson and sells the Loongson 2F under the name STLS2F01: a single-core, 4-issue superscalar design with a 900 MHz clock rate and 4 watts TDP. The newer Loongson 3A, with four 1 GHz cores (GS464), two DDR2/3 memory channels, two HyperTransport 1.0 controllers and 16 gigaflops at 15 watts, is already in production: the Chinese system manufacturer Dawning sells it inside its blade servers. In November, the Dawning 5000L - possibly equipped with 80,000 Loongson processors - might make its way to the top of the upcoming Top500 list of supercomputers as the next petaflops system.
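The Loongson 3A figures quoted above can be cross-checked with a few lines of Python. This is only a back-of-the-envelope sketch: all inputs are the numbers from the text, the variable names are made up for illustration, and the 80,000-chip count is the "possible" configuration mentioned above, not a confirmed one.

```python
# Back-of-the-envelope check of the Loongson 3A numbers quoted in the text.
CORES = 4             # GS464 cores per chip
CLOCK_GHZ = 1.0       # clock rate per core
PEAK_GFLOPS = 16.0    # quoted peak per chip
TDP_WATTS = 15.0      # quoted power per chip
CHIPS = 80_000        # "possibly" installed in the Dawning system

# Floating-point operations each core must complete per clock cycle
# for the quoted peak to hold.
flops_per_cycle_per_core = PEAK_GFLOPS / (CORES * CLOCK_GHZ)

# Peak efficiency of a single chip.
gflops_per_watt = PEAK_GFLOPS / TDP_WATTS

# Theoretical peak of the whole machine in petaflops (1 PFLOPS = 1e6 GFLOPS).
system_peak_pflops = CHIPS * PEAK_GFLOPS / 1e6

print(f"{flops_per_cycle_per_core:.0f} flops per cycle and core")
print(f"{gflops_per_watt:.2f} gigaflops per watt")
print(f"{system_peak_pflops:.2f} petaflops theoretical peak")
```

The result, 1.28 petaflops of theoretical peak, is what would make such a system a petaflops candidate. A measured Linpack value would come in below this theoretical figure.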
At the Hot Chips conference in August, Professor Weiwu Hu presented the next core, the GS464V, whose highlight is its 256-bit-wide vector units à la Intel AVX. Unlike Intel’s next processor generation, Sandy Bridge, it offers extensive support for "Fused Multiply-Add" (FMA) and - counting both units - can execute eight FMA operations on double-precision floating-point values in parallel. Next year, this core is scheduled to make its debut inside the 8-core Loongson 3B, which - at a clock rate of only 1 GHz and 40 watts TDP - theoretically reaches 128 gigaflops. In practice, according to Weiwu Hu, about 93 per cent of this value can be achieved in matrix multiplication: close to 120 gigaflops in total, or 3 gigaflops per watt. This roughly equals the DGEMM performance of two current Intel Xeon X5680 processors (Westmere-EP) with a total of 12 cores and a 3.33 GHz clock rate. However, at 260 watts, the two Xeons eat more than six times as much power.
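The peak figure for the Loongson 3B follows directly from the numbers in the paragraph above, as this short Python sketch shows. It assumes that each FMA counts as two floating-point operations (one multiply, one add) and that the two 256-bit units per core each process four 64-bit doubles; the function and variable names are illustrative only.

```python
# Reconstructing the quoted Loongson 3B figures from the text.
def peak_gflops(cores: int, clock_ghz: float, fma_lanes_per_core: int) -> float:
    """Theoretical peak, counting each FMA as 2 flops (multiply + add)."""
    return cores * clock_ghz * fma_lanes_per_core * 2


VECTOR_UNITS = 2     # vector units per GS464V core
VECTOR_BITS = 256    # width of each vector unit
DOUBLE_BITS = 64     # size of a double-precision value

# Doubles processed per cycle and core: 2 units * 4 doubles = 8 FMA lanes.
fma_lanes = VECTOR_UNITS * VECTOR_BITS // DOUBLE_BITS

peak = peak_gflops(cores=8, clock_ghz=1.0, fma_lanes_per_core=fma_lanes)
dgemm = 0.93 * peak            # about 93 per cent achieved in matrix multiplication
dgemm_per_watt = dgemm / 40    # Loongson 3B at 40 watts TDP
power_ratio = 260 / 40         # two Xeon X5680 versus one Loongson 3B

print(f"peak: {peak:.0f} gigaflops")
print(f"DGEMM: {dgemm:.1f} gigaflops, {dgemm_per_watt:.2f} gigaflops per watt")
print(f"Xeon pair draws {power_ratio:.1f} times the power")
```

The sketch reproduces the 128 gigaflops peak, roughly 119 gigaflops in DGEMM and just under 3 gigaflops per watt, and puts the power draw of the Xeon pair at 6.5 times that of the Loongson 3B.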
Another important characteristic of the chip is its ability to emulate x86 code quickly. The ICT has provided special hardware and instructions to ensure that x86 software executes very efficiently in conjunction with the software emulator QEMU. If this works as well as promised, the Loongson computers would immediately be able to draw on a flood of software.
The 8-core Loongson 3B is still designed for the "old" STM 65-nm process, but - according to Weiwu Hu - an upgrade to 28 nm is planned for the 16-core Loongson 3C in 2012, to meet the international standard. He didn’t say who the manufacturing partner will be. TSMC in Taiwan is close, but GLOBALFOUNDRIES is also a possible candidate - just like TSMC it’s a partner of STM.
Fittingly, GLOBALFOUNDRIES will be holding the "Global Technology Conference" GTC2010 in Taiwan and Shanghai in mid-October. The main topics will be 28-nm HKMG technology and the road map towards 22/20 nm. Together with AMD and STMicroelectronics, GLOBALFOUNDRIES intends to point out the advantages of the "Gate-First" approach over the "Gate-Last" technology from Intel and TSMC - the latter actually switched from First to Last. The discussions about the advantages and disadvantages of the different process sequences have been going on for a long time now.
Placing the gate first (MIPS, Metal Inserted Poly-Silicon) - as IBM, Infineon, GLOBALFOUNDRIES, Renesas and Samsung do - makes it possible to position the other electrodes, source and drain, easily, exactly and space-efficiently. But this way, the gate has to go through the later high-temperature manufacturing steps, something that is problematic for some metal gates. Intel and TSMC prefer the Gate-Last process (RMG, Replacement Metal Gate). Although RMG is a much more complex process that requires more space, the gate is only placed after the high-temperature phase is over. Recently, a Barclays Bank analyst caused quite a stir when he reported on problems with thermal instabilities and shifts in the transistor threshold voltage in connection with the Gate-First procedure in the 32-nm and 28-nm HKMG process. Naturally, GLOBALFOUNDRIES and Samsung have denied that this is a problem.