In association with heise online

21 October 2008, 14:50

Kernel Log: More than 10 million lines of Linux source files

After the release of Linux 2.6.27, kernel developers are currently busy merging patches for the next kernel version into the main development branch of Linux. This usually involves discarding some old code and adding new code; on balance, though, more lines are added than removed, so the kernel keeps growing.

In this process, the kernel developers have now passed the 10 million line mark, counting blank lines, comments and text files in a current Git checkout of the Linux source code (find . -type f -not -regex '\./\.git.*' | xargs cat | wc -l). Also worth noting: the number of lines in the C source, header and assembler files alone (find . -name '*.[hcS]' -not -regex '\./\.git.*' | xargs cat | wc -l) recently passed 9 million.
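For readers who prefer a script to a shell pipeline, both counts can be approximated in Python. This is a minimal sketch, assuming that "lines" means newline characters as counted by wc -l; the helper name count_lines is hypothetical, and it mimics the -not -regex '\./\.git.*' filter by skipping every path that starts with .git at the top of the tree.

```python
import os

# File suffixes matched by the shell pattern '*.[hcS]'
SOURCE_SUFFIXES = (".c", ".h", ".S")


def count_lines(root, source_only=False):
    """Hypothetical helper approximating `find ... | xargs cat | wc -l`.

    Like the article's -not -regex filter, any top-level path starting
    with ".git" (the .git directory, .gitignore, ...) is skipped.
    """
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune top-level .git* directories so os.walk never enters them
        if dirpath == root:
            dirnames[:] = [d for d in dirnames if not d.startswith(".git")]
        for name in filenames:
            if dirpath == root and name.startswith(".git"):
                continue
            if source_only and not name.endswith(SOURCE_SUFFIXES):
                continue
            path = os.path.join(dirpath, name)
            # Rough equivalent of -type f: regular files only, no symlinks
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                # wc -l counts newline bytes; an unterminated last line is not counted
                total += f.read().count(b"\n")
    return total
```

Calling count_lines(".") from the top of a kernel checkout corresponds to the first command, and count_lines(".", source_only=True) to the second.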

Programs like SLOCCount can be used to inspect the Linux kernel's source code in more detail. According to this tool, the source code amounts not to 9 million but to exactly 6,399,191 Source Lines of Code (SLOC), because the program ignores blank lines, comments and several other kinds of input. More than half of these lines belong to hardware drivers; the second largest chunk is the arch/ directory, which contains the architecture-specific code for the various processor architectures supported by Linux.
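The difference between raw lines and SLOC can be illustrated with a much simpler counter. The sketch below is an approximation under stated assumptions, not SLOCCount's own algorithm: the hypothetical count_sloc function counts a physical line only if some non-whitespace character remains after removing /* ... */ and // comments, and it ignores comment markers that appear inside string literals.

```python
def count_sloc(source: str) -> int:
    """Hypothetical simplified SLOC counter for C source text.

    A line counts if it has at least one non-blank character outside
    comments. String and character literals are NOT handled.
    """
    sloc = 0
    in_block = False  # currently inside a /* ... */ comment
    for line in source.splitlines():
        has_code = False
        i = 0
        while i < len(line):
            if in_block:
                end = line.find("*/", i)
                if end == -1:
                    i = len(line)  # comment continues on the next line
                else:
                    in_block = False
                    i = end + 2
            elif line.startswith("//", i):
                break  # rest of the line is a comment
            elif line.startswith("/*", i):
                in_block = True
                i += 2
            else:
                if not line[i].isspace():
                    has_code = True
                i += 1
        if has_code:
            sloc += 1
    return sloc


sample = """/* header comment
 * spanning lines */
#include <stdio.h>

int main(void) /* trailing */
{
    // line comment
    return 0; // done
}
"""
print(len(sample.splitlines()), count_sloc(sample))
```

Of the nine physical lines in the sample, only the five that carry code are counted; the same effect, applied to the whole tree, separates the 9 million raw lines from the 6.4 million SLOC.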

SLOC     Directory      SLOC-by-Language (Sorted)
3301081  drivers        ansic=3296641,yacc=1680,asm=1136,perl=829,lex=778,
1258638  arch           ansic=1047549,asm=209655,sh=617,yacc=307,lex=300,
544871   fs             ansic=544871
376716   net            ansic=376716
356180   sound          ansic=355997,asm=183
320078   include        ansic=318367,cpp=1511,asm=125,pascal=75
74503    kernel         ansic=74198,perl=305
36312    mm             ansic=36312
32729    crypto         ansic=32729
25303    security       ansic=25303
24111    scripts        ansic=14424,perl=4653,cpp=1791,sh=1155,yacc=967,
17065    lib            ansic=17065
10723    block          ansic=10723
7616     Documentation  ansic=5615,sh=926,perl=857,lisp=218
5227     ipc            ansic=5227
2622     virt           ansic=2622
2287     init           ansic=2287
1803     firmware       asm=1598,ansic=205
833      samples        ansic=833
493      usr            ansic=491,asm=2
0        top_dir        (none)
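Each row of the table pairs a directory's SLOC total with a comma-separated list of language=count pairs. The following sketch (parse_row and row_summary are hypothetical helpers) parses a row, computes the directory's share of the 6,399,191 total and reports how many lines the listed breakdown does not account for. In this listing, rows that end in a comma, such as drivers, do not add up to their totals, which suggests those breakdowns were cut off.

```python
GRAND_TOTAL = 6_399_191  # total SLOC reported by SLOCCount


def parse_row(row):
    """Hypothetical parser: '356180 sound ansic=355997,asm=183' -> parts."""
    total, directory, breakdown = row.split(maxsplit=2)
    languages = {}
    for item in breakdown.split(","):
        # Skip '(none)' and the empty item left by a trailing comma
        if "=" in item:
            name, count = item.split("=")
            languages[name] = int(count)
    return int(total), directory, languages


def row_summary(row):
    """Return (directory, percent of all SLOC, lines missing from breakdown)."""
    total, directory, languages = parse_row(row)
    share = 100 * total / GRAND_TOTAL
    missing = total - sum(languages.values())
    return directory, share, missing


print(row_summary("3301081 drivers ansic=3296641,yacc=1680,asm=1136,perl=829,lex=778,"))
```

For the drivers row this gives a share of about 51.6 percent, in line with the "more than half" figure in the text, while 17 of its lines are missing from the listed breakdown.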

According to SLOCCount, 96.4 percent of the code is written in C and 3.3 percent in assembler. Other programming languages play only a marginal role: Perl, for example, is used for some helper scripts during kernel development and accounts for a mere 0.1 percent. In the assembler-heavy architecture directory, SLOCCount also claims to have found 116 lines of Pascal code, but that may well be a misinterpretation on SLOCCount's part.

Totals grouped by language (dominant language first):
ansic: 6168175 (96.39%)
asm: 212699 (3.32%)
perl: 6672 (0.10%)
cpp: 3302 (0.05%)
yacc: 2954 (0.05%)
sh: 2715 (0.04%)
lex: 1820 (0.03%)
python: 424 (0.01%)
lisp: 218 (0.00%)
pascal: 116 (0.00%)
awk: 96 (0.00%)
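The percentages are each language's line count divided by the overall total. As a quick consistency check, the following sketch (language_shares is a hypothetical helper) sums the per-language figures, which add up exactly to the 6,399,191 SLOC reported, and reproduces the two-decimal shares listed.

```python
# Per-language SLOC totals as reported by SLOCCount
TOTALS = {
    "ansic": 6168175, "asm": 212699, "perl": 6672, "cpp": 3302,
    "yacc": 2954, "sh": 2715, "lex": 1820, "python": 424,
    "lisp": 218, "pascal": 116, "awk": 96,
}


def language_shares(totals):
    """Hypothetical helper: overall total plus each share as a 2-decimal string."""
    grand_total = sum(totals.values())
    shares = {
        lang: f"{100 * lines / grand_total:.2f}" for lang, lines in totals.items()
    }
    return grand_total, shares


grand_total, shares = language_shares(TOTALS)
print(grand_total, shares["ansic"], shares["asm"])
```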

SLOCCount also attempts a rough estimate of the source code's value: according to its calculations, it would take more than 200 developers about nine and a half years, at a cost of $267 million, to rewrite the code from scratch. Given that the program has not been updated for four years, the accuracy of this calculation is debatable; in particular, the assumed cost per developer would surely need to be raised by now.

Total Physical Source Lines of Code (SLOC)		  = 6,399,191
Development Effort Estimate, Person-Years (Person-Months) = 1,983.63
(Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months) = 9.59 (115.10)
(Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule) = 206.81
Total Estimated Cost to Develop = $ 267,961,839
(average salary = $56,286/year, overhead = 2.40).

Generated using David A. Wheeler's 'SLOCCount'
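The estimate follows directly from the formulas printed in the output. The sketch below, using the hypothetical helper name basic_cocomo, reproduces the figures with the Basic COCOMO coefficients and the default salary and overhead shown above; it assumes the cost is person-years multiplied by salary and overhead, which is consistent with the printed total.

```python
def basic_cocomo(sloc, salary=56_286, overhead=2.40):
    """Hypothetical helper: Basic COCOMO estimate as printed by SLOCCount."""
    ksloc = sloc / 1000
    person_months = 2.4 * ksloc ** 1.05            # effort
    schedule_months = 2.5 * person_months ** 0.38  # calendar time
    developers = person_months / schedule_months   # effort / schedule
    person_years = person_months / 12
    cost = person_years * salary * overhead        # assumed cost formula
    return person_years, schedule_months, developers, cost


person_years, months, developers, cost = basic_cocomo(6_399_191)
print(f"{person_years:,.2f} person-years, {months / 12:.2f} years, "
      f"{developers:.2f} developers, ${cost:,.0f}")
```

Because the cost is linear in the salary, replacing the $56,286 average with a current figure scales the total cost by the same factor, while the effort and schedule estimates stay unchanged.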

There is no end in sight to the kernel's growth, which has continued throughout the Linux 2.6 series for several years: with every new version, the kernel hackers extend the Linux kernel with new functions and drivers, improving hardware support or making it more flexible, better or faster. The figures for the latest kernel versions also show that it is not only the number of lines of source code that is continually increasing, but also the number of changes per kernel version.

Further background and information about developments in the Linux kernel and its environment can also be found in previous issues of the Kernel Log at heise open.

Older Kernel logs can be found in the archives or by using the search function at heise open. (thl/c't)