In association with heise online

04 December 2008, 09:34

Python 3.0 Arrives

The serpent sheds its compatibility skin

Dj Walker-Morgan

The Python language first appeared in 1991 and has carved itself a place in the world of scripting languages as a focussed, minimalist language with an extensive standard library. Python is very much defined by the C implementation of the language, CPython, though the language is also available on the Java and CLR virtual machines in the form of Jython and IronPython. Its most recognisable feature is the use of white space to replace curly braces or begin/end statements; Python coders say the reduction of syntactic noise and the visual structuring of the code help them to be more productive.

Python 3.0 is a new version of the Python language which breaks backwards compatibility with older Python programs, but another recent release, version 2.6, acts as a bridge between the two generations. Python 2.6 has a -3 switch which makes the interpreter warn where code will break under 3.0. But what kinds of changes to the language require this kind of bridging version?

One of the first programs people write is "Hello World"; in Python 2.x this is simply

print "Hello World"

In Python 3.0, this is a syntax error. Python 3.0's "Hello World" looks like this

print("Hello World")

The Python developers took the special-case print statement and made it into a function, removing that special case. There are a number of benefits to the new handling; for example, it is easier to override print to redirect output to a logging system, which is useful as Python is more and more often deployed as part of enterprise applications. It also makes print behave as what it is: a library function rather than something core to the language.
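As a rough illustration of that redirection, the sketch below temporarily swaps the built-in print for a function that hands its arguments to the standard logging module. The logger name and the logging_print helper are hypothetical, invented for this example; a real application would pick its own handler and format.

```python
import builtins
import io
import logging

# Hypothetical logger that writes to an in-memory stream, so the
# example can inspect what was logged.
log_stream = io.StringIO()
logger = logging.getLogger("print_redirect_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.StreamHandler(log_stream))

original_print = builtins.print


def logging_print(*args, sep=" ", **kwargs):
    """Hypothetical stand-in for print: log the message instead."""
    logger.info(sep.join(str(arg) for arg in args))


# Because print is now an ordinary function, it can simply be rebound.
builtins.print = logging_print
try:
    print("Hello World")
finally:
    # Always restore the real print, even if the code above fails.
    builtins.print = original_print

captured = log_stream.getvalue()
```

With the print statement of Python 2.x, the same effect needed either a search and replace across the code base or a replacement for sys.stdout.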

This is a large part of the reason why Python 3.0 isn't backwards compatible: the changes the developers believed necessary could not be made while preserving compatibility. Many of these changes go deep into the nature of the language, in contrast to, for example, the many generations of Java, where libraries and syntax were added but old APIs persisted. It is not as radical a strategy as starting with a completely blank sheet, but a thorough, no-holds-barred reworking of the Python language, carried out while simultaneously developing a bridging release, should mean that, for developers at least, the migration will be much easier.

Python 3.0 takes the opportunity to remove a whole set of clutter. The "classic classes" that predate Python 2.2 have been taken out, leaving only the new-style classes that were introduced in 2.2 and matured in later releases. Outdated library modules, which left coders facing a number of similar modules, have been removed.

One long-awaited change is that Python 3.0 now uses Unicode for its core string type; in Python 2.x, Unicode text was handled by a separate, special unicode string type. This change has an impact on any Python program that manipulates text and assumes that all text is ASCII, or which uses strings as a proxy for an array of bytes. For the latter case, there are now immutable and mutable byte types, bytes and bytearray, depending on what is needed.
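The split between text and bytes can be sketched as follows. The sample string is an arbitrary choice for illustration; the point is that a str holds Unicode characters, while bytes and bytearray hold raw values produced by an explicit encoding step.

```python
# A str is a sequence of Unicode characters, not of bytes.
text = "na\u00efve caf\u00e9"

# Turning text into bytes is explicit and needs a named encoding.
encoded = text.encode("utf-8")

# Two of the characters need two bytes each in UTF-8, so the
# lengths differ.
char_count = len(text)
byte_count = len(encoded)

# Indexing bytes yields integers rather than one-character strings.
first_byte = encoded[0]

# bytes is immutable; bytearray is the mutable counterpart.
buffer = bytearray(encoded)
buffer[0] = ord("N")
decoded = buffer.decode("utf-8")

# Text and bytes no longer mix implicitly.
try:
    "abc" + b"def"
except TypeError:
    mixing_rejected = True
else:
    mixing_rejected = False
```

Code that relied on Python 2.x silently converting between the two kinds of string is the code most likely to need attention.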

To go with the new strings, a new formatting system has been introduced, with the aim of providing one unified way of formatting text; the older % operator is still present in 3.0 but is slated for eventual deprecation. The new formatter not only offers pre-built format directives but also allows classes to customise and extend formatting capabilities, so that, for example, datetime, rather than having its own formatting code, extends the formatter to support the presentation of dates and times. The idea of a unified formatter was borrowed from C#.
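A minimal sketch of the formatter follows. The date example uses the standard datetime module; the Temperature class and its "F" format code are hypothetical, made up here to show how a class can hook into formatting through the __format__ method.

```python
from datetime import date

# datetime objects interpret the format spec as strftime directives.
published = date(2008, 12, 4)
dated = "Article dated {0:%Y-%m-%d}".format(published)


class Temperature:
    """Hypothetical class that defines its own format codes."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, spec):
        # "F" is an invented code meaning "show as Fahrenheit";
        # anything else falls back to Celsius.
        if spec == "F":
            return "{0:.1f} F".format(self.celsius * 9 / 5 + 32)
        return "{0:.1f} C".format(self.celsius)


room = Temperature(20)
in_celsius = "{0}".format(room)
in_fahrenheit = "{0:F}".format(room)
```

The text after the colon in each replacement field is handed to the object being formatted, which is what lets individual classes decide what their format codes mean.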

Other languages also provided inspiration for Python 3.0 features. Inspired by Java, Zope interfaces and DictMixin, Python has added Abstract Base Classes, or ABCs, which allow a programmer to partially define how a class will work, ready for other classes to extend it. ABCs can be used as types in the Python language for more expressiveness. Java was also the inspiration for the addition of dictionary views. In Python 3.0, dictionary objects return view objects when queried for keys, values or items. These views have the advantage of updating dynamically when some other part of the code manipulates the dictionary object, and of avoiding situations where a large amount of memory would be used returning values from a large dictionary. In a similar style, map and filter operations return iterators rather than lists, allowing for more memory-efficient coding, though some would suggest that it makes Python harder to use interactively.
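The three features described above can be sketched briefly. The Shape and Square classes and the sample dictionary are hypothetical, chosen only to keep the example short.

```python
from abc import ABCMeta, abstractmethod


class Shape(metaclass=ABCMeta):
    """Hypothetical ABC: partly defined, to be completed by subclasses."""

    @abstractmethod
    def area(self):
        """Subclasses must supply an area."""

    def describe(self):
        # Shared behaviour that relies on the abstract method.
        return "{0} with area {1}".format(type(self).__name__, self.area())


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


# An ABC with unimplemented abstract methods cannot be instantiated.
try:
    Shape()
except TypeError:
    abstract_blocked = True
else:
    abstract_blocked = False

description = Square(3).describe()

# Dictionary views track later changes to the dictionary.
versions = {"2.6": "bridge"}
keys = versions.keys()
versions["3.0"] = "new generation"
view_sees_update = "3.0" in keys

# map returns an iterator: values are produced only on demand.
squares = map(lambda n: n * n, range(4))
first = next(squares)
rest = list(squares)
```

Wrapping the result in list(), as in the last line, is the usual way to see all the values at once at the interactive prompt.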

What comes out of the design process is just like Python, only everywhere a little different. As well as Python 2.6's bridge between Python 2.x and 3.0, there is also a tool, 2to3, which attempts to translate and update existing Python code to the new, cleaner Python style.

There is one important reason why people should not immediately be upgrading to Python 3.0, even if they can change their code to run: performance. Python 3.0 is not yet field optimised; although the developers have put a lot of effort into maintaining performance, many of the structural changes within the language have increased resource demands. Early benchmarking with the beta showed a 33 per cent slowdown and, despite best efforts, it is reasonable to assume that there will be a tangible performance penalty, at least until Python 3.1 and later.

In many ways, this is why Python 2.6 is as important as 3.0. As well as being able to flag 3.0-incompatible code, 2.6 is able to import or use many 3.0 features. Python programmers will be able to evolve their code iteratively towards 3.0, using only the features they are comfortable with. Rather than face a "big bang" migration, Python coders can expect to drift naturally towards 3.0, on their own time scale, and watch the development of Python 3.x as its performance improves.
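As a small sketch of that iterative approach, the code below sticks to 3.0-style constructs that Python 2.6 also accepts: the "except ... as" form, bytes literals and the format method. The parse_port function is hypothetical. Python 2.6 can additionally enable the print function with "from __future__ import print_function", which must be the first statement in a module and so is left out of this fragment.

```python
def parse_port(raw):
    """Hypothetical helper: return (port, error) for a port string."""
    try:
        return int(raw), None
    except ValueError as err:  # "as" form, accepted by 2.6 and 3.0
        # str.format is the new formatter, also available in 2.6.
        message = "bad port {0}: {1}".format(raw, type(err).__name__)
        return None, message


# Bytes literal: valid in 2.6, where it is an ordinary 8-bit string,
# and in 3.0, where it is a distinct bytes object.
header = b"PY"

good = parse_port("8080")
bad = parse_port("eighty")
```

Code written this way can be run under 2.6 with the -3 switch to pick up remaining warnings before any move to 3.0 itself.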

Radically reworking a language is always a risky proposition; some redevelopment projects see languages disappear down the memory hole for years, while others fail when delivered because the migration risk is too high. The Python developers appear to have balanced all the classic concerns with the process, and laid a foundation for the next generation Python, which should, in time, become the preferred incarnation.
