In association with heise online

29 January 2010, 15:24

Unicode dominates web

Language-specific encoding has become less popular on the web.
Over the past two years, the use of language-specific and character-set-specific web page encodings has declined rapidly in favour of the universally applicable UTF-8 Unicode format. For instance, pure ASCII and Latin-1 pages now each have a share of less than 20%. These figures apply, at least, to the web pages indexed by Google.

Unicode allows several scripts to be used in one document, which isn't possible with either the standard ISO encodings or those for Asian languages. In addition to the usual character sets, the Unicode standard also includes codes for ligatures, special mathematical characters and symbols of purely academic interest such as hieroglyphs. The UTF-8 format (RFC) used on the web encodes each character in between 1 and 4 bytes; the 1-byte characters correspond to the ASCII codes.
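
The variable-length scheme can be illustrated with a minimal Python sketch. The sample characters and the helper name utf8_length are chosen here purely for illustration and do not come from the article; the sketch simply encodes one character from each length class and checks that the 1-byte case matches ASCII.

```python
# Illustrative sample characters (an assumption, not taken from the article):
# a Latin letter, an accented letter, the euro sign and an Egyptian hieroglyph.
samples = ["A", "\u00e9", "\u20ac", "\U00013000"]


def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 uses to encode a single character."""
    return len(ch.encode("utf-8"))


for ch in samples:
    encoded = ch.encode("utf-8")
    # Show the code point, the encoded length and the raw bytes in hex.
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s) [{encoded.hex()}]")

# A 1-byte UTF-8 sequence is byte-for-byte identical to its ASCII encoding.
assert "A".encode("utf-8") == "A".encode("ascii")
```

With these samples the encoded lengths come out as 1, 2, 3 and 4 bytes respectively, which is why a pure ASCII page is already a valid UTF-8 page.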

