GitHub's Linguist open sourced
GitHub has released its Linguist library, designed to identify the programming language in a file, as open source. GitHub, a commercial project hosting service which handles files of numerous types, uses the library to detect which syntax highlighter to use for a file, to work out when to ignore binary files and generated files, and to generate graph data for projects by language.
For example, when faced with a new file, GitHub's software uses Linguist to recognise the file's language, and then passes the file on to Albino, a GitHub-developed Ruby wrapper for the Python-based Pygments syntax highlighter library. Linguist first looks at the file extension and, if that fails to identify the language, examines the file contents.
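The two-stage approach described above, extension first and contents as a fallback, can be illustrated with a minimal Python sketch. This is not Linguist's code (Linguist itself is a Ruby library), and the function name `detect_language`, the extension table and the shebang patterns are all hypothetical, chosen only to show the shape of the idea.

```python
import os
import re

# Hypothetical extension table for illustration; not Linguist's actual data.
EXTENSION_MAP = {
    ".rb": "Ruby",
    ".py": "Python",
    ".js": "JavaScript",
}

# Hypothetical content heuristics, tried in order when the extension
# does not identify the language. Here they only inspect a shebang line.
CONTENT_PATTERNS = [
    (re.compile(r"^#!.*\bruby"), "Ruby"),
    (re.compile(r"^#!.*\bpython"), "Python"),
    (re.compile(r"^#!.*\bnode"), "JavaScript"),
]


def detect_language(filename, contents):
    """Return a language name, or None if neither stage recognises the file."""
    # Stage 1: look up the file extension (case-insensitively).
    _, ext = os.path.splitext(filename)
    language = EXTENSION_MAP.get(ext.lower())
    if language is not None:
        return language

    # Stage 2: fall back to examining the first line of the contents.
    lines = contents.splitlines()
    first_line = lines[0] if lines else ""
    for pattern, candidate in CONTENT_PATTERNS:
        if pattern.search(first_line):
            return candidate

    return None
```

Under these assumptions, `detect_language("app.rb", "")` is resolved by the extension alone, while an extensionless script that begins with `#!/usr/bin/env python3` is only identified in the second stage. A real detector would need far richer content heuristics than a shebang check.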
GitHub hopes that open sourcing the library will let users extend it with support for new languages, which GitHub can then incorporate back into its own system. Its usefulness outside GitHub is debatable, though: the library is configured for production use on GitHub and for testing elsewhere, so a third party would have to modify it substantially before using it in production. Linguist is available under an MIT licence from its GitHub repository.