GCC benefited from the activities of Cygnus but, inevitably, not all of the code found its way back into the upstream releases of GCC. As David A. Wheeler tells it: "In 1997, there were disagreements over the development approach and slow development speed of GCC. In particular, many were dissatisfied with the FSF-appointed GCC maintainer, who was very slow to accept changes. Cygnus (headed by Michael Tiemann) decided to create a fork of the project named EGCS, and invited others to join."
A Cygnus developer, D.V. Henkel-Wallace, sent out an email announcing that "A bunch of us (including Fortran, Linux, Intel and RTEMS hackers) have decided to start a more experimental development project, just like Cygnus and the FSF started the GCC2 project about 6 years ago. Only this time the net community with which we are working is larger! We are calling this project 'EGCS' (pronounced 'eggs')." EGCS was an acronym for "Experimental/Enhanced GNU Compiler System".
The complaint was that the FSF had taken a 'cathedral' approach to software development. "GCC is part of the FSF's publicity for the GNU project, as well as being the GNU system's compiler, so stability is paramount for them. On the other hand, Cygnus, the Linux folks, the PGCC folks, the Fortran folks and many others have done development work which has not yet gone into the GCC2 tree despite years of efforts to make it possible."
EGCS immediately absorbed several branches of code that had failed to make it into GCC, including PGCC (Pentium-optimized GCC), G77 (Fortran), and many other enhancements and features for specific architectures.
The split had a parallel in the 1994 fork of the GNU C library, glibc, which was led by the Linux kernel developers and came about for similar reasons. In the case of the glibc fork, the FSF project outpaced the fork over time. "Over the next few years the original glibc increasingly offered far better standards conformance, multi-threading and higher performance" and the fork, known as libc, was abandoned in 1997. The EGCS fork took a different path. As Wheeler recounts, "EGCS worked at an accelerated pace, and soon surpassed the original GCC project. In April 1999 the rift was healed; the FSF agreed to switch to using the EGCS code for GCC, and the EGCS project agreed to dissolve itself and take over the original GCC project."
The reintegration of GCC and EGCS was completed with the release of GCC 2.95 in July 1999, when the GNU C Compiler was renamed the GNU Compiler Collection. The project adopted what may be termed a 'bazaar' approach, overseen by a steering committee of long-term developers inherited from EGCS, who have taken a collective approach to reconciling technical and spiritual differences.
Front to back
Compilation of code by GCC can be seen as a three-part process which, to simplify, consists of a 'front end', a 'middle end' and a 'back end'.
A front end is the part of the compiler which parses and validates the syntax of a particular language and produces error messages. The code is translated into a Syntax Tree, which is unique to each language, and is then converted into a language independent Intermediate Representation (IR) known as GENERIC.
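As a loose analogy for the front end's job, the sketch below uses Python's standard ast module rather than GCC itself. It parses a line of source into a syntax tree and turns a syntax error into a message. The function name front_end is made up for this illustration and does not correspond to anything in the GCC sources.

```python
import ast


def front_end(source):
    """Parse source text into a syntax tree, or return an error message.

    Illustrative analogy only: this uses Python's own parser, not GCC.
    Returns a (tree, error) pair in which exactly one element is None.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return None, f"error: {err.msg} (line {err.lineno})"
    return tree, None


# Valid input: show the tree built for the right-hand side of the assignment.
tree, error = front_end("x = a + b * c")
print(ast.dump(tree.body[0].value))

# Invalid input: the parser rejects it and we report why.
tree, error = front_end("x = a +")
print(error)
```

The tree produced here is specific to one language, which is why GCC converts each front end's tree into the shared GENERIC form before going any further.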
The middle end takes the GENERIC produced by the front end and lowers it into another IR known as GIMPLE. It then performs control flow analysis and rewrites the code into Static Single Assignment (SSA) form, in which each variable is assigned exactly once. At this stage more than 20 different optimisations are performed before the code is converted into a format known as Register Transfer Language or RTL.
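To make the idea of lowering concrete, here is a toy model, not GCC's actual GIMPLE. It flattens a nested expression into statements that each perform a single operation, giving every temporary exactly one assignment (the defining property of SSA form). The optional reuse_common flag applies one very simple optimisation, reusing a temporary when the same operation on the same operands has already been computed. The function name, the temporary names t1, t2 and so on, and the flag are all assumptions made for this illustration.

```python
import ast
import itertools


def lower_to_three_address(source, reuse_common=False):
    """Flatten one assignment into single-operation statements.

    Each temporary is assigned exactly once. Toy model only: real GIMPLE
    also covers control flow, types, memory accesses and much more.
    """
    ops = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}
    counter = itertools.count(1)
    statements = []
    seen = {}  # (left, operator, right) -> temporary already holding it

    def visit(node):
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.BinOp):
            key = (visit(node.left), ops[type(node.op)], visit(node.right))
            if reuse_common and key in seen:
                return seen[key]
            temp = f"t{next(counter)}"
            statements.append(f"{temp} = {key[0]} {key[1]} {key[2]}")
            seen[key] = temp
            return temp
        raise ValueError(f"unsupported node: {type(node).__name__}")

    assign = ast.parse(source).body[0]
    result = visit(assign.value)
    statements.append(f"{assign.targets[0].id} = {result}")
    return statements


for line in lower_to_three_address("x = (a + b) * (a + b) - c"):
    print(line)
print("-- with reuse of common subexpressions --")
for line in lower_to_three_address("x = (a + b) * (a + b) - c", reuse_common=True):
    print(line)
```

Without reuse the sketch emits four temporaries and computes a + b twice. With reuse it emits three and the multiplication becomes t2 = t1 * t1. Reuse is safe here precisely because no name is ever assigned twice, which is one reason optimisers favour SSA form.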
A back end takes the RTL and performs register allocation and code scheduling optimisations before producing the machine code for a specific target architecture.
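Register allocation can be sketched in a similarly simplified way. The example below assumes straight-line code, a hypothetical target with a handful of registers, and an invented three-operand assembly syntax. A register is released as soon as the value it holds has been used for the last time. Spilling to memory and instruction scheduling are not modelled, and none of the names come from GCC.

```python
def allocate_registers(instructions, registers):
    """Assign a register to every value in straight-line three-address code.

    instructions: list of (dest, left, op, right) tuples of names.
    registers: register names available on the hypothetical target.
    Raises RuntimeError instead of spilling when registers run out.
    """
    # Record the index of the last instruction that reads each value.
    last_use = {}
    for index, (_dest, left, _op, right) in enumerate(instructions):
        for name in (left, right):
            last_use[name] = index

    free = list(registers)
    location = {}  # value name -> register currently holding it
    assembly = []

    def claim(name):
        if not free:
            raise RuntimeError("out of registers (spilling is not modelled)")
        location[name] = free.pop(0)
        return location[name]

    for index, (dest, left, op, right) in enumerate(instructions):
        # Load any operand that is not yet in a register.
        for name in (left, right):
            if name not in location:
                assembly.append(f"load {claim(name)}, {name}")
        sources = (location[left], location[right])
        # Release operands that are dead after this instruction.
        for name in dict.fromkeys((left, right)):
            if last_use[name] == index:
                free.insert(0, location.pop(name))
        target = claim(dest)
        assembly.append(f"{op} {target}, {sources[0]}, {sources[1]}")
    return assembly


code = [
    ("t1", "a", "add", "b"),
    ("t2", "t1", "mul", "t1"),
    ("t3", "t2", "sub", "c"),
]
for line in allocate_registers(code, ["r0", "r1", "r2"]):
    print(line)
```

Because dead values give up their registers immediately, this input needs only two registers even though it names six distinct values. A real back end must also decide what to do when the registers run out, which this sketch simply reports as an error.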
GCC includes a variety of 'front ends' (for each language supported) and 'back ends' (for each of the targeted architectures), and consists of well over two million lines of code.
Between 2001 and 2004, the framework of GCC underwent a major rewrite, which introduced the GENERIC, GIMPLE and SSA phases of compilation, and enhanced and simplified the architecture for handling code optimisation. The changes were sponsored by Red Hat and involved the contributions of more than 30 developers from the GCC community. An article in Red Hat magazine describing these changes gives a more comprehensive description of GCC and its compilation process.
A matter of policy
It is sometimes argued that the GCC code lacks clarity (and that the intermediate semantics are opaque) as a matter of policy, and that this has worked against the greater design goals of GCC.
Joe Buck has said as much on the GCC mailing list: "RMS would tell you that we only have a GNU C++ compiler because Mike Tiemann's employer could not make it proprietary, and we only have a GNU Objective-C compiler because Steve Jobs could not make it proprietary. Had the equivalent of dump formats existed at the time, we'd only have C. (Of course, that's the inverse issue: proprietary front ends connecting to GNU back ends). By making just building onto GCC and GPLing the code the path of least resistance, RMS hopes to motivate more people to produce free software."
While front ends and back ends are, in principle, distinct stages on the path to the compiled object code, it isn't easy to separate their functionality from the body of the code, and code that is built into GCC must be released back to the community under the terms of the GPL.
Over the years the tendency of compiler writers and chip manufacturers to follow "the path of least resistance" has contributed to the usefulness and universal compatibility of GCC, to the greater benefit of users and developers. GCC is a flagship of the free software movement and the goal of the free software movement is to make software free. Inevitably this hasn't suited everybody.