International domain names get a bit closer and more European-friendly
The Internet Engineering Task Force (IETF) plans to adopt, in early 2009, the update for internationalised domain names (IDN) that has been under discussion since the beginning of this year. This became evident in talks at the developers' meeting in Minneapolis this week. One of the new entries on the list of characters allowed in domains that go beyond the ASCII character set will be the German "eszett" or "scharfes S" character ("ß"), which the IDN standards (RFC 3490, RFC 3492) have excluded until now. It won't make an immediate difference for German internet users, though: for the time being, an eszett in a domain name will continue to be replaced with "ss". Marcos Sanz, who represents the German domain registry DeNIC in the IETF's IDN working groups, said in Minneapolis that DeNIC welcomes the additional possibility. He said, however, that DeNIC has not yet decided how and when to make use of this new registration option. Sanz said it is important to consider that many users rely on the current mapping rules and state their contact addresses with the ß character accordingly.
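The current mapping can be observed with Python's built-in "idna" codec, which implements the older IDNA 2003 rules (RFC 3490 with Punycode from RFC 3492) rather than the planned update. The following is a minimal sketch; the helper name `to_ascii` is an assumption made for illustration and does not come from any standard.

```python
def to_ascii(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII form under IDNA 2003.

    Uses Python's standard "idna" codec, which applies the IDNA 2003
    mapping step before Punycode encoding.
    """
    return domain.encode("idna").decode("ascii")


if __name__ == "__main__":
    # The eszett is mapped to "ss", so no Punycode label is produced.
    print(to_ascii("straße.de"))
    # Other non-ASCII letters survive the mapping and are Punycode-encoded.
    print(to_ascii("münchen.de"))
```

Under these rules, "straße.de" and "strasse.de" resolve to the same ASCII name, which is why a registry cannot currently offer the two as separate registrations.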
Up until the last moment, there were discussions within the IETF about how much freedom registries should have to define rules for the special characteristics of their own languages. The authors of the voluminous new series of standards on internationalised domains, to be adopted as IDNA 2008, advocated stricter rules within the standard documents themselves.
Vinton Cerf, on the other hand, said that "in terms of the numerous special requests we strongly depend on the registries". The IETF's board had asked Cerf, co-author of TCP/IP and winner of the 2004 Turing Award, to lead the hot-headed working group. Cerf said that the registries know best about language-specific problems. In Minneapolis, he strongly advised holding a final consultation with representatives of the Arabic countries, whose alphabet-related problems make Germany's single "eszett" character seem rather unimportant. A separate standard document called BIDI, for example, allows domain names to be written from right to left. The current "consultation" relates to the various sets of numerals used within the Arabic language community.
Apart from the Western numerals 1, 2, 3, ..., which came to Europe from India via the Arabic countries, Arabic languages also use the classical Eastern Arabic numerals, also called Arabic-Indic numerals. Things get complicated because the numerals four, five and six are written differently in the Eastern and Western Arabic countries. The problem was further complicated by the Unicode Consortium, the organisation that catalogues the characters of the world's writing systems and assigns them codes. Instead of giving separate character codes only to the three numerals that differ, Unicode gave separate character codes to both complete sets of numerals. As a result, the Arabic numeral one corresponds to two different Unicode character codes, depending on whether the Unicode character set for Western Arabic or Eastern Arabic is used.
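The duplication can be inspected with Python's standard `unicodedata` module. Unicode encodes one set as the "Arabic-Indic digits" (U+0660 to U+0669) and the other as the "Extended Arabic-Indic digits" (U+06F0 to U+06F9). The sketch below lists both sets side by side; the names `ARABIC_INDIC`, `EXTENDED_ARABIC_INDIC` and `describe` are chosen here for illustration only.

```python
import unicodedata

# The two complete digit sets that Unicode encodes separately.
ARABIC_INDIC = [chr(0x0660 + n) for n in range(10)]
EXTENDED_ARABIC_INDIC = [chr(0x06F0 + n) for n in range(10)]


def describe(ch: str) -> str:
    """Return the code point, Unicode name and numeric value of a digit."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} (value {unicodedata.digit(ch)})"


if __name__ == "__main__":
    for first, second in zip(ARABIC_INDIC, EXTENDED_ARABIC_INDIC):
        # Same numeric value on each row, but two distinct code points.
        print(describe(first), "|", describe(second))
```

Every row shows two distinct code points with the same numeric value, including those digits whose written shapes do not differ between the two sets.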
If the two character sets are adopted and used in parallel, the overlap will at best cause confusion; at worst, it will be exploited by phishers. Arabic language experts have now been asked to help decide whether to adopt only one of the character sets, whether to restrict each domain to one specific character set, or whether to stipulate only that numerals from the two sets can't appear right next to each other. Some observers think that the deadline of two weeks is much too short, and warn that the whole standardisation process is driven far too much by Western experts. Indeed, it sometimes seems rather peculiar that Americans and Europeans should argue about the finer aspects of Persian, Urdu or Dhivehi, the language spoken in the Maldives.
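Two of the options under discussion, restricting a domain to one digit set or forbidding digits from different sets next to each other, can be sketched as simple checks on a domain label. This is an illustration under stated assumptions, not the rule text of any standard: the function names are hypothetical, and the code point ranges are the Unicode blocks for Arabic-Indic digits (U+0660 to U+0669) and Extended Arabic-Indic digits (U+06F0 to U+06F9).

```python
from typing import Optional


def digit_set_of(ch: str) -> Optional[str]:
    """Name the Arabic digit set a character belongs to, or None."""
    cp = ord(ch)
    if 0x0660 <= cp <= 0x0669:
        return "arabic-indic"
    if 0x06F0 <= cp <= 0x06F9:
        return "extended-arabic-indic"
    return None


def mixes_digit_sets(label: str) -> bool:
    """Option: a label may use digits from only one of the two sets."""
    used = {digit_set_of(ch) for ch in label} - {None}
    return len(used) > 1


def has_adjacent_mixed_digits(label: str) -> bool:
    """Option: digits from different sets must not be direct neighbours."""
    for left, right in zip(label, label[1:]):
        a, b = digit_set_of(left), digit_set_of(right)
        if a is not None and b is not None and a != b:
            return True
    return False


if __name__ == "__main__":
    sample = "\u0661\u06F1"  # digit one from each set, side by side
    print(mixes_digit_sets(sample), has_adjacent_mixed_digits(sample))
```

The first check is the stricter of the two: a label such as "\u0661a\u06F1" passes the adjacency test but still mixes both sets.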