The “What’s New in Python” series of essays takes tours through the most
important changes between major Python versions. They are a “must read” for
anyone wishing to stay up-to-date after a new release.
This article explains the new features in Python 3.2 as compared to 3.1. It
focuses on a few highlights and gives a few examples. For full details, see the
Misc/NEWS file.
In the past, extension modules built for one Python version were often
not usable with other Python versions. Particularly on Windows, every
feature release of Python required rebuilding all extension modules that
one wanted to use. This requirement was the result of the free access to
Python interpreter internals that extension modules could use.
With Python 3.2, an alternative approach becomes available: extension
modules which restrict themselves to a limited API (by defining
Py_LIMITED_API) cannot use many of the internals, but are constrained
to a set of API functions that are promised to be stable for several
releases. As a consequence, extension modules built for 3.2 in that
mode will also work with 3.3, 3.4, and so on. Extension modules that
make use of details of memory structures can still be built, but will
need to be recompiled for every feature release.
A new module for command line parsing, argparse, was introduced to
overcome the limitations of optparse which did not provide support for
positional arguments (not just options), subcommands, required options and other
common patterns of specifying and validating options.
This module has already had widespread success in the community as a
third-party module. Being more fully featured than its predecessor, the
argparse module is now the preferred module for command-line processing.
The older module is still being kept available because of the substantial amount
of legacy code that depends on it.
Here’s an annotated example parser showing features like limiting results to a
set of choices, specifying a metavar in the help screen, validating that one
or more positional arguments is present, and making a required option:
import argparse
parser = argparse.ArgumentParser(
            description='Manage servers',             # main description for help
            epilog='Tested on Solaris and Linux')     # displayed after help
parser.add_argument('action',                         # argument name
            choices=['deploy', 'start', 'stop'],      # three allowed values
            help='action on each target')             # help msg
parser.add_argument('targets',
            metavar='HOSTNAME',                       # var name used in help msg
            nargs='+',                                # require one or more targets
            help='url for target machines')           # help msg explanation
parser.add_argument('-u', '--user',                   # -u or --user option
            required=True,                            # make it a required argument
            help='login as user')
Example of calling the parser on a command string:
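For instance, the parser defined above might be exercised like this (the
hostnames are made up for illustration):

>>> cmd = 'deploy sneezy.example.com sleepy.example.com -u skycaptain'
>>> result = parser.parse_args(cmd.split())
>>> result.action
'deploy'
>>> result.targets
['sneezy.example.com', 'sleepy.example.com']
>>> result.user
'skycaptain'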
Example of the parser’s automatically generated help:
>>> parser.parse_args('-h'.split())
usage: manage_cloud.py [-h] -u USER
                       {deploy,start,stop} HOSTNAME [HOSTNAME ...]

Manage servers

positional arguments:
  {deploy,start,stop}   action on each target
  HOSTNAME              url for target machines

optional arguments:
  -h, --help            show this help message and exit
  -u USER, --user USER  login as user

Tested on Solaris and Linux
An especially nice argparse feature is the ability to define subparsers,
each with their own argument patterns and help displays:
import argparse
parser = argparse.ArgumentParser(prog='HELM')
subparsers = parser.add_subparsers()
parser_l = subparsers.add_parser('launch', help='Launch Control') # first subgroup
parser_l.add_argument('-m', '--missiles', action='store_true')
parser_l.add_argument('-t', '--torpedos', action='store_true')
parser_m = subparsers.add_parser('move', help='Move Vessel', # second subgroup
aliases=('steer', 'turn')) # equivalent names
parser_m.add_argument('-c', '--course', type=int, required=True)
parser_m.add_argument('-s', '--speed', type=int, default=0)
$ ./helm.py --help # top level help (launch and move)
$ ./helm.py launch --help # help for launch options
$ ./helm.py launch --missiles # set missiles=True and torpedos=False
$ ./helm.py steer --course 180 --speed 5 # set movement parameters
PEP 391: Dictionary Based Configuration for Logging
The logging module provided two kinds of configuration: one style with
function calls for each option, and another style driven by an external file
saved in a ConfigParser format. Those options did not provide the flexibility
to create configurations from JSON or YAML files, nor did they support
incremental configuration, which is needed for specifying logger options from a
command line.
To support a more flexible style, the module now offers
logging.config.dictConfig() for specifying logging configuration with
plain Python dictionaries. The configuration options include formatters,
handlers, filters, and loggers. Here’s a working example of a configuration
dictionary:
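A minimal sketch of such a dictionary (the formatter and handler names here
are illustrative; real configurations are usually larger):

import logging, logging.config

DICT_CONFIG = {
    'version': 1,
    'formatters': {
        'brief': {'format': '%(levelname)-8s %(name)s: %(message)s'},
    },
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
            'formatter': 'brief',
            'level': 'INFO',
        },
    },
    'root': {
        'handlers': ['console'],
        'level': 'DEBUG',
    },
}

logging.config.dictConfig(DICT_CONFIG)      # apply the configuration
logging.info('Configured from a plain dictionary')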
Code for creating and managing concurrency is being collected in a new top-level
namespace, concurrent. Its first member is a futures package which provides
a uniform high-level interface for managing threads and processes.
The design for concurrent.futures was inspired by the
java.util.concurrent package. In that model, a running call and its result
are represented by a Future object that abstracts
features common to threads, processes, and remote procedure calls. That object
supports status checks (running or done), timeouts, cancellations, adding
callbacks, and access to results or exceptions.
The primary offering of the new module is a pair of executor classes for
launching and managing calls. The goal of the executors is to make it easier to
use existing tools for making parallel calls. They save the effort needed to
setup a pool of resources, launch the calls, create a results queue, add
time-out handling, and limit the total number of threads, processes, or remote
procedure calls.
Ideally, each application should share a single executor across multiple
components so that process and thread limits can be centrally managed. This
solves the design challenge that arises when each component has its own
competing strategy for resource management.
Both classes share a common interface with three methods:
submit() for scheduling a callable and
returning a Future object;
map() for scheduling many asynchronous calls
at a time, and shutdown() for freeing
resources. The class is a context manager and can be used in a
with statement to assure that resources are automatically released
when currently pending futures are done executing.
A simple example of ThreadPoolExecutor is the
launch of four parallel threads for copying files:
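A sketch along those lines (the file names are illustrative):

import concurrent.futures, shutil

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as e:
    e.submit(shutil.copy, 'src1.txt', 'dest1.txt')
    e.submit(shutil.copy, 'src2.txt', 'dest2.txt')
    e.submit(shutil.copy, 'src3.txt', 'dest3.txt')
    e.submit(shutil.copy, 'src4.txt', 'dest4.txt')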
Python’s scheme for caching bytecode in .pyc files did not work well in
environments with multiple Python interpreters. If one interpreter encountered
a cached file created by another interpreter, it would recompile the source and
overwrite the cached file, thus losing the benefits of caching.
The issue of “pyc fights” has become more pronounced as it has become
commonplace for Linux distributions to ship with multiple versions of Python.
These conflicts also arise with CPython alternatives such as Unladen Swallow.
To solve this problem, Python’s import machinery has been extended to use
distinct filenames for each interpreter. Instead of Python 3.2 and Python 3.3 and
Unladen Swallow each competing for a file called “mymodule.pyc”, they will now
look for “mymodule.cpython-32.pyc”, “mymodule.cpython-33.pyc”, and
“mymodule.unladen10.pyc”. And to prevent all of these new files from
cluttering source directories, the pyc files are now collected in a
“__pycache__” directory stored under the package directory.
Aside from the filenames and target directories, the new scheme has a few
aspects that are visible to the programmer:
Imported modules now have a __cached__ attribute which stores the name
of the actual file that was imported:
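For example (the path shown is from one illustrative installation):

>>> import collections
>>> collections.__cached__
'c:/py32/lib/__pycache__/collections.cpython-32.pyc'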
The tag that is unique to each interpreter is accessible from the imp
module:
>>> import imp
>>> imp.get_tag()
'cpython-32'
Scripts that try to deduce source filename from the imported file now need to
be smarter. It is no longer sufficient to simply strip the “c” from a ”.pyc”
filename. Instead, use the new functions in the imp module:
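For example, imp.cache_from_source() and imp.source_from_cache() convert
between the two naming conventions (paths are illustrative):

>>> import imp
>>> imp.cache_from_source('c:/py32/lib/collections.py')
'c:/py32/lib/__pycache__/collections.cpython-32.pyc'
>>> imp.source_from_cache('c:/py32/lib/__pycache__/collections.cpython-32.pyc')
'c:/py32/lib/collections.py'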
The py_compile and compileall modules have been updated to
reflect the new naming convention and target directory. The command-line
invocation of compileall has new options: -i for
specifying a list of files and directories to compile and -b which causes
bytecode files to be written to their legacy location rather than
__pycache__.
The importlib.abc module has been updated with new abstract base
classes for loading bytecode files. The obsolete
ABCs, PyLoader and
PyPycLoader, have been deprecated (instructions on how
to stay Python 3.1 compatible are included with the documentation).
The PYC repository directory allows multiple bytecode cache files to be
co-located. This PEP implements a similar mechanism for shared object files by
giving them a common directory and distinct names for each version.
The common directory is “pyshared” and the file names are made distinct by
identifying the Python implementation (such as CPython, PyPy, Jython, etc.), the
major and minor version numbers, and optional build flags (such as “d” for
debug, “m” for pymalloc, “u” for wide-unicode). For an arbitrary package “foo”,
you may see these files when the distribution package is installed:
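An illustrative listing (the exact tags depend on which interpreters and
build options are present):

foo.cpython-32m.so      # CPython 3.2, pymalloc build
foo.cpython-32mu.so     # CPython 3.2, pymalloc, wide-unicode build
foo.cpython-33m.so      # CPython 3.3, pymalloc build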
In Python itself, the tags are accessible from functions in the sysconfig
module:
>>> import sysconfig
>>> sysconfig.get_config_var('SOABI')    # find the version tag
'cpython-32mu'
>>> sysconfig.get_config_var('SO')       # find the full filename extension
'.cpython-32mu.so'
PEP 3333: Python Web Server Gateway Interface v1.0.1
This informational PEP clarifies how bytes/text issues are to be handled by the
WSGI protocol. The challenge is that string handling in Python 3 is most
conveniently handled with the str type even though the HTTP protocol
is itself bytes oriented.
The PEP differentiates so-called native strings that are used for
request/response headers and metadata versus byte strings which are used for
the bodies of requests and responses.
The native strings are always of type str but are restricted to code
points from U+0000 through U+00FF which are translatable to bytes using
Latin-1 encoding. These strings are used for the keys and values in the
environment dictionary and for response headers and statuses in the
start_response() function. They must follow RFC 2616 with respect to
encoding. That is, they must either be ISO-8859-1 characters or use
RFC 2047 MIME encoding.
For developers porting WSGI applications from Python 2, here are the salient
points:
If the app already used strings for headers in Python 2, no change is needed.
If instead, the app encoded output headers or decoded input headers, then the
headers will need to be re-encoded to Latin-1. For example, an output header
that was encoded to utf-8 using h.encode('utf-8') now needs to be converted
from bytes to native strings using h.encode('utf-8').decode('latin-1').
Values yielded by an application or sent using the write() method
must be byte strings. The start_response() function and environ
must use native strings. The two cannot be mixed.
For server implementers writing CGI-to-WSGI pathways or other CGI-style
protocols, users must be able to access the environment using native strings
even though the underlying platform may have a different convention. To bridge
this gap, the wsgiref module has a new function,
wsgiref.handlers.read_environ() for transcoding CGI variables from
os.environ into native strings and returning a new dictionary.
See also
PEP 3333 - Python Web Server Gateway Interface v1.0.1
Some smaller changes made to the core Python language are:
String formatting for format() and str.format() gained new
capabilities for the format character #. Previously, for integers in
binary, octal, or hexadecimal, it caused the output to be prefixed with ‘0b’,
‘0o’, or ‘0x’ respectively. Now it can also handle floats, complex, and
Decimal, causing the output to always have a decimal point even when no digits
follow it.
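For example:

>>> format(20, '#x')       # integers: the base prefix is added
'0x14'
>>> format(5.0, '#.0f')    # floats: the decimal point is retained
'5.'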
(Suggested by Mark Dickinson and implemented by Eric Smith in issue 7094.)
There is also a new str.format_map() method that extends the
capabilities of the existing str.format() method by accepting arbitrary
mapping objects. This new method makes it possible to use string
formatting with any of Python’s many dictionary-like objects such as
defaultdict, Shelf,
ConfigParser, or dbm. It is also useful with
custom dict subclasses that normalize keys before look-up or that
supply a __missing__() method for unknown keys:
>>> import shelve
>>> d = shelve.open('tmp.shl')
>>> 'The {project_name} status is {status} as of {date}'.format_map(d)
'The testing project status is green as of February 15, 2011'

>>> class LowerCasedDict(dict):
...     def __getitem__(self, key):
...         return dict.__getitem__(self, key.lower())
...
>>> lcd = LowerCasedDict(part='widgets', quantity=10)
>>> 'There are {QUANTITY} {Part} in stock'.format_map(lcd)
'There are 10 widgets in stock'

>>> class PlaceholderDict(dict):
...     def __missing__(self, key):
...         return '<{}>'.format(key)
...
>>> 'Hello {name}, welcome to {location}'.format_map(PlaceholderDict())
'Hello <name>, welcome to <location>'
(Suggested by Raymond Hettinger and implemented by Eric Smith in
issue 6081.)
The interpreter can now be started with a quiet option, -q, to prevent
the copyright and version information from being displayed in the interactive
mode. The option can be introspected using the sys.flags attribute:
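For example:

$ python -q
>>> import sys
>>> sys.flags.quiet
1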
The hasattr() function works by calling getattr() and detecting
whether an exception is raised. This technique allows it to detect methods
created dynamically by __getattr__() or __getattribute__() which
would otherwise be absent from the class dictionary. Formerly, hasattr
would catch any exception, possibly masking genuine errors. Now, hasattr
has been tightened to only catch AttributeError and let other
exceptions pass through:
>>> class A:
...     @property
...     def f(self):
...         return 1 // 0
...
>>> a = A()
>>> hasattr(a, 'f')
Traceback (most recent call last):
  ...
ZeroDivisionError: integer division or modulo by zero
(Discovered by Yury Selivanov and fixed by Benjamin Peterson; issue 9666.)
The str() of a float or complex number is now the same as its
repr(). Previously, the str() form was shorter but that just
caused confusion and is no longer needed now that the shortest possible
repr() is displayed by default:
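For example:

>>> import math
>>> repr(math.pi)
'3.141592653589793'
>>> str(math.pi)
'3.141592653589793'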
(Proposed and implemented by Mark Dickinson; issue 9337.)
memoryview objects now have a release() method
and they also now support the context manager protocol. This allows timely
release of any resources that were acquired when requesting a buffer from the
original object.
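For example:

>>> with memoryview(b'abcdefgh') as v:
...     print(v.tolist())
...
[97, 98, 99, 100, 101, 102, 103, 104]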
Previously it was illegal to delete a name from the local namespace if it
occurs as a free variable in a nested block:
def outer(x):
    def inner():
        return x
    inner()
    del x
This is now allowed. Remember that the target of an except clause
is cleared, so this code, which used to work with Python 2.6 and raised a
SyntaxError with Python 3.1, now works again:
def f():
    def print_error():
        print(e)
    try:
        something
    except Exception as e:
        print_error()
        # implicit "del e" here
The internal structsequence tool now creates subclasses of tuple.
This means that C structures like those returned by os.stat(),
time.gmtime(), and sys.version_info now work like a
named tuple and now work with functions and methods that
expect a tuple as an argument. This is a big step forward in making the C
structures as flexible as their pure Python counterparts:
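For example (version details are build-specific):

>>> import sys
>>> isinstance(sys.version_info, tuple)
True
>>> 'Version %d.%d.%d %s(%d)' % sys.version_info
'Version 3.2.0 final(0)'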
(Suggested by Barry Warsaw and implemented by Philip Jenvey in issue 7301.)
A new warning category, ResourceWarning, has been added. It is
emitted when potential issues with resource consumption or cleanup
are detected. It is silenced by default in normal release builds but
can be enabled through the means provided by the warnings
module, or on the command line.
A ResourceWarning is issued at interpreter shutdown if the
gc.garbage list isn’t empty, and if gc.DEBUG_UNCOLLECTABLE is
set, all uncollectable objects are printed. This is meant to make the
programmer aware that their code contains object finalization issues.
A ResourceWarning is also issued when a file object is destroyed
without having been explicitly closed. While the deallocator for such an
object ensures it closes the underlying operating system resource
(usually, a file descriptor), the delay in deallocating the object could
produce various issues, especially under Windows. Here is an example
of enabling the warning from the command line:
$ python -q -Wdefault
>>> f = open("foo", "wb")
>>> del f
__main__:1: ResourceWarning: unclosed file <_io.BufferedWriter name='foo'>
range objects now support index and count methods. This is part
of an effort to make more objects fully implement the
collections.Sequence abstract base class. As a result, the
language will have a more uniform API. In addition, range objects
now support slicing and negative indices, even with values larger than
sys.maxsize. This makes range more interoperable with lists:
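For example:

>>> range(0, 100, 2).count(10)
1
>>> range(0, 100, 2).index(10)
5
>>> range(0, 100, 2)[5]
10
>>> range(0, 100, 2)[0:5]
range(0, 10, 2)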
The callable() builtin function from Py2.x was resurrected. It provides
a concise, readable alternative to using an abstract base class in an
expression like isinstance(x, collections.Callable):
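For example:

>>> callable(max)
True
>>> callable(20)
False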
Python’s import mechanism can now load modules installed in directories with
non-ASCII characters in the path name. This solved an aggravating problem
with home directories for users with non-ASCII characters in their usernames.
(Required extensive work by Victor Stinner in issue 9425.)
Python’s standard library has undergone significant maintenance efforts and
quality improvements.
The biggest news for Python 3.2 is that the email package and the
mailbox and nntplib modules now work correctly with the bytes/text model
in Python 3. For the first time, there is correct handling of messages with
mixed encodings.
Throughout the standard library, there has been more careful attention to
encodings and text versus bytes issues. In particular, interactions with the
operating system are now better able to exchange non-ASCII data using the
Windows MBCS encoding, locale-aware encodings, or UTF-8.
Another significant win is the addition of substantially better support for
SSL connections and security certificates.
In addition, more classes now implement a context manager to support
convenient and reliable resource clean-up using a with statement.
The usability of the email package in Python 3 has been mostly fixed by
the extensive efforts of R. David Murray. The problem was that emails are
typically read and stored in the form of bytes rather than str
text, and they may contain multiple encodings within a single email. So, the
email package had to be extended to parse and generate email messages in bytes
format.
Given bytes input to the model, get_payload()
will by default decode a message body that has a
Content-Transfer-Encoding of 8bit using the charset
specified in the MIME headers and return the resulting string.
Given bytes input to the model, Generator will
convert message bodies that have a Content-Transfer-Encoding of
8bit to instead have a 7bit Content-Transfer-Encoding.
Headers with unencoded non-ASCII bytes are deemed to be RFC 2047-encoded
using the unknown-8bit character set.
A new class BytesGenerator produces bytes as output,
preserving any unchanged non-ASCII data that was present in the input used to
build the model, including message bodies with a
Content-Transfer-Encoding of 8bit.
The smtplib SMTP class now accepts a byte string
for the msg argument to the sendmail() method,
and a new method, send_message() accepts a
Message object and can optionally obtain the
from_addr and to_addrs addresses directly from the object.
The functools module includes a new decorator for caching function
calls. functools.lru_cache() can save repeated queries to an external
resource whenever the results are expected to be the same.
For example, adding a caching decorator to a database query function can save
database accesses for popular searches:
>>> import functools
>>> @functools.lru_cache(maxsize=300)
... def get_phone_number(name):
...     c = conn.cursor()
...     c.execute('SELECT phonenumber FROM phonelist WHERE name=?', (name,))
...     return c.fetchone()[0]
The functools.wraps() decorator now adds a __wrapped__ attribute
pointing to the original callable function. This allows wrapped functions to
be introspected. It also copies __annotations__ if defined. And now
it also gracefully skips over missing attributes such as __doc__ which
might not be defined for the wrapped callable.
In the above example, the cache can be removed by recovering the original
function:
>>> get_phone_number = get_phone_number.__wrapped__    # uncached function
To help write classes with rich comparison methods, a new decorator
functools.total_ordering() will use existing equality and inequality
methods to fill in the remaining methods.
For example, supplying __eq__ and __lt__ will enable
total_ordering() to fill-in __le__, __gt__ and __ge__:
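A sketch of such a class (the comparison key is illustrative):

from functools import total_ordering

@total_ordering
class Student:
    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname
    def __eq__(self, other):
        return ((self.lastname.lower(), self.firstname.lower()) ==
                (other.lastname.lower(), other.firstname.lower()))
    def __lt__(self, other):
        return ((self.lastname.lower(), self.firstname.lower()) <
                (other.lastname.lower(), other.firstname.lower()))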
The collections.Counter class now has two forms of in-place
subtraction, the existing -= operator for saturating subtraction and the new
subtract() method for regular subtraction. The
former is suitable for multisets
which only have positive counts, and the latter is more suitable for use cases
that allow negative counts:
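For example:

>>> from collections import Counter
>>> tally = Counter(dogs=5, cats=3)
>>> tally -= Counter(dogs=2, cats=8)    # saturating subtraction
>>> tally
Counter({'dogs': 3})

>>> tally = Counter(dogs=5, cats=3)
>>> tally.subtract(dogs=2, cats=8)      # regular subtraction
>>> tally
Counter({'dogs': 3, 'cats': -5})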
The collections.OrderedDict class has a new method
move_to_end() which takes an existing key and
moves it to either the first or last position in the ordered sequence.
The default is to move an item to the last position. This is equivalent to
renewing an entry with od[k] = od.pop(k).
A fast move-to-end operation is useful for resequencing entries. For example,
an ordered dictionary can be used to track order of access by aging entries
from the oldest to the most recently accessed.
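For example:

>>> from collections import OrderedDict
>>> d = OrderedDict.fromkeys('abcde')
>>> d.move_to_end('b')                  # same effect as d['b'] = d.pop('b')
>>> ''.join(d)
'acdeb'
>>> d.move_to_end('b', last=False)      # move 'b' to the front
>>> ''.join(d)
'bacde'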
The threading module has a new Barrier
synchronization class for making multiple threads wait until all of them have
reached a common barrier point. Barriers are useful for making sure that a task
with multiple preconditions does not run until all of the predecessor tasks are
complete.
Barriers can work with an arbitrary number of threads. This is a generalization
of a Rendezvous which
is defined for only two threads.
Implemented as a two-phase cyclic barrier, Barrier objects
are suitable for use in loops. The separate filling and draining phases
assure that all threads get released (drained) before any one of them can loop
back and re-enter the barrier. The barrier fully resets after each cycle.
Example of using barriers:
from threading import Barrier, Thread

def get_votes(site):
    ballots = conduct_election(site)
    all_polls_closed.wait()        # do not count until all polls are closed
    totals = summarize(ballots)
    publish(site, totals)

all_polls_closed = Barrier(len(sites))
for site in sites:
    Thread(target=get_votes, args=(site,)).start()
In this example, the barrier enforces a rule that votes cannot be counted at any
polling site until all polls are closed. Notice how a solution with a barrier
is similar to one with threading.Thread.join(), but the threads stay alive
and continue to do work (summarizing ballots) after the barrier point is
crossed.
If any of the predecessor tasks can hang or be delayed, a barrier can be created
with an optional timeout parameter. Then if the timeout period elapses before
all the predecessor tasks reach the barrier point, all waiting threads are
released and a BrokenBarrierError exception is raised:
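A sketch extending the election example (conduct_election, summarize,
publish, lockbox, and the timeout computation are placeholders):

from threading import Barrier, BrokenBarrierError, Thread

def get_votes(site):
    ballots = conduct_election(site)
    try:
        all_polls_closed.wait()
    except BrokenBarrierError:
        lockbox.put(ballots)            # seal the ballots for later handling
    else:
        totals = summarize(ballots)
        publish(site, totals)

all_polls_closed = Barrier(len(sites), timeout=seconds_until_midnight)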
In this example, the barrier enforces a more robust rule. If some election
sites do not finish before midnight, the barrier times out and the ballots are
sealed and deposited in a queue for later handling.
The datetime module has a new type timezone that
implements the tzinfo interface by returning a fixed UTC
offset and timezone name. This makes it easier to create timezone-aware
datetime objects:
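For example (timestamps are illustrative):

>>> from datetime import datetime, timezone
>>> datetime.now(timezone.utc)
datetime.datetime(2010, 12, 8, 21, 4, 2, 923754, tzinfo=datetime.timezone.utc)
>>> datetime.strptime("01/01/2000 12:00 +0000", "%d/%m/%Y %H:%M %z")
datetime.datetime(2000, 1, 1, 12, 0, tzinfo=datetime.timezone.utc)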
Also, timedelta objects can now be multiplied by
float and divided by float and int objects.
And timedelta objects can now divide one another.
The datetime.date.strftime() method is no longer restricted to years
after 1900. The new supported year range is from 1000 to 9999 inclusive.
Whenever a two-digit year is used in a time tuple, the interpretation has been
governed by time.accept2dyear. The default is True which means that
for a two-digit year, the century is guessed according to the POSIX rules
governing the %y strptime format.
Starting with Py3.2, use of the century guessing heuristic will emit a
DeprecationWarning. Instead, it is recommended that
time.accept2dyear be set to False so that large date ranges
can be used without guesswork:
>>> import time, warnings
>>> warnings.resetwarnings()     # remove the default warning filters
>>> time.accept2dyear = True     # guess whether 11 means 11 or 2011
>>> time.asctime((11, 1, 1, 12, 34, 56, 4, 1, 0))
Warning (from warnings module):
  ...
DeprecationWarning: Century info guessed for a 2-digit year.
'Fri Jan  1 12:34:56 2011'
>>> time.accept2dyear = False    # use the full range of allowable dates
>>> time.asctime((11, 1, 1, 12, 34, 56, 4, 1, 0))
'Fri Jan  1 12:34:56 11'
Several functions now have significantly expanded date ranges. When
time.accept2dyear is false, the time.asctime() function will
accept any year that fits in a C int, while the time.mktime() and
time.strftime() functions will accept the full range supported by the
corresponding operating system functions.
The math module's new expm1() function computes e**x-1 for small values of x
without incurring the loss of precision that usually accompanies the subtraction
of nearly equal quantities:
>>> from math import expm1
>>> expm1(0.013671875)    # more accurate way to compute e**x-1 for a small x
0.013765762467652909
The new erf() and erfc() functions compute the error function and its
complement:

>>> from math import erf, erfc, sqrt
>>> erf(1.0/sqrt(2.0))    # portion of normal distribution within 1 standard deviation
0.682689492137086
>>> erfc(1.0/sqrt(2.0))   # portion of normal distribution outside 1 standard deviation
0.31731050786291404
>>> erf(1.0/sqrt(2.0)) + erfc(1.0/sqrt(2.0))
1.0
The gamma() function is a continuous extension of the factorial
function. See http://en.wikipedia.org/wiki/Gamma_function for details. Because
the function is related to factorials, it grows large even for small values of
x, so there is also a lgamma() function for computing the natural
logarithm of the gamma function:
>>> from math import gamma, lgamma
>>> gamma(7.0)        # six factorial
720.0
>>> lgamma(801.0)     # log(800 factorial)
4551.950730698041
The io.BytesIO class has a new method, getbuffer(), which
provides functionality similar to memoryview(). It creates an editable
view of the data without making a copy. The buffer’s random access and support
for slice notation are well-suited to in-place editing:
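A small sketch of the idea, combining getbuffer() with the new release()
method:

>>> import io
>>> stream = io.BytesIO(b'hello world')
>>> view = stream.getbuffer()      # editable view of the data, no copy made
>>> view[0:5] = b'HELLO'           # in-place slice assignment
>>> view.release()                 # free the buffer before reading the stream
>>> stream.getvalue()
b'HELLO world'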
When writing a __repr__() method for a custom container, it is easy to
forget to handle the case where a member refers back to the container itself.
Python’s builtin objects such as list and set handle
self-reference by displaying ”...” in the recursive part of the representation
string.
To help write such __repr__() methods, the reprlib module has a new
decorator, recursive_repr(), for detecting recursive calls to
__repr__() and substituting a placeholder string instead:
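For example:

>>> from reprlib import recursive_repr
>>> class MyList(list):
...     @recursive_repr()
...     def __repr__(self):
...         return '<' + '|'.join(map(repr, self)) + '>'
...
>>> m = MyList('abc')
>>> m.append(m)
>>> m.append('x')
>>> print(m)
<'a'|'b'|'c'|...|'x'>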
In addition to dictionary-based configuration described above, the
logging package has many other improvements.
The logging documentation has been augmented by a basic tutorial, an advanced tutorial, and a cookbook of
logging recipes. These documents are the fastest way to learn about logging.
The logging.basicConfig() set-up function gained a style argument to
support three different types of string formatting. It defaults to “%” for
traditional %-formatting, can be set to “{” for the new str.format() style, or
can be set to “$” for the shell-style formatting provided by
string.Template. The following three configurations are equivalent:
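The three styles might look like this (each call is shown as an alternative;
in a real program only the first basicConfig() call takes effect):

>>> import logging
>>> logging.basicConfig(style='%', format="%(name)s -> %(levelname)s: %(message)s")
>>> logging.basicConfig(style='{', format="{name} -> {levelname} {message}")
>>> logging.basicConfig(style='$', format="$name -> $levelname: $message")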
If no configuration is set-up before a logging event occurs, there is now a
default configuration using a StreamHandler directed to
sys.stderr for events of WARNING level or higher. Formerly, an
event occurring before a configuration was set-up would either raise an
exception or silently drop the event depending on the value of
logging.raiseExceptions. The new default handler is stored in
logging.lastResort.
The use of filters has been simplified. Instead of creating a
Filter object, the predicate can be any Python callable that
returns True or False.
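For example (the logger name and predicate are illustrative):

import logging

logger = logging.getLogger('my.app')
# Any callable taking a LogRecord and returning True/False now works as a filter.
logger.addFilter(lambda record: 'noisy' not in record.getMessage())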
There were a number of other improvements that add flexibility and simplify
configuration. See the module documentation for a full listing of changes in
Python 3.2.
The csv module now supports a new dialect, unix_dialect,
which applies quoting for all fields and a traditional Unix style with '\n' as
the line terminator. The registered dialect name is unix.
The csv.DictWriter has a new method,
writeheader() for writing-out an initial row to document
the field names:
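For example:

>>> import csv, sys
>>> w = csv.DictWriter(sys.stdout, ['name', 'dept'])
>>> w.writeheader()
name,dept
>>> w.writerows([
...     {'name': 'tom', 'dept': 'accounting'},
...     {'name': 'susan', 'dept': 'sales'}])
tom,accounting
susan,sales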
There is a new and slightly mind-blowing tool
ContextDecorator that is helpful for creating a
context manager that does double duty as a function decorator.
As a convenience, this new functionality is used by
contextmanager() so that no extra effort is needed to support
both roles.
The basic idea is that both context managers and function decorators can be used
for pre-action and post-action wrappers. Context managers wrap a group of
statements using a with statement, and function decorators wrap a
group of statements enclosed in a function. So, occasionally there is a need to
write a pre-action or post-action wrapper that can be used in either role.
For example, it is sometimes useful to wrap functions or groups of statements
with a logger that can track the time of entry and time of exit. Rather than
writing both a function decorator and a context manager for the task, the
contextmanager() provides both capabilities in a single
definition:
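For example, a definition along these lines (a sketch that logs at INFO
level):

from contextlib import contextmanager
import logging

logging.basicConfig(level=logging.INFO)

@contextmanager
def track_entry_and_exit(name):
    logging.info('Entering: %s', name)
    yield
    logging.info('Exiting: %s', name)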
Formerly, this would have only been usable as a context manager:
with track_entry_and_exit('widget loader'):
    print('Some time consuming activity goes here')
    load_widget()
Now, it can be used as a decorator as well:
@track_entry_and_exit('widget loader')
def activity():
    print('Some time consuming activity goes here')
    load_widget()
Trying to fulfill two roles at once places some limitations on the technique.
Context managers normally have the flexibility to return an argument usable by
a with statement, but there is no parallel for function decorators.
In the above example, there is not a clean way for the track_entry_and_exit
context manager to return a logging instance for use in the body of enclosed
statements.
Mark Dickinson crafted an elegant and efficient scheme for assuring that
different numeric datatypes will have the same hash value whenever their actual
values are equal (issue 8188):
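For example:

>>> from fractions import Fraction
>>> from decimal import Decimal
>>> assert hash(Fraction(3, 2)) == hash(1.5) == \
...        hash(Decimal("1.5")) == hash(complex(1.5, 0))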
Some of the hashing details are exposed through a new attribute,
sys.hash_info, which describes the bit width of the hash value, the
prime modulus, the hash values for infinity and nan, and the multiplier
used for the imaginary part of a number:
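For example, on one illustrative 64-bit build:

>>> import sys
>>> sys.hash_info
sys.hash_info(width=64, modulus=2305843009213693951, inf=314159, nan=0, imag=1000003)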
An early decision to limit the inter-operability of various numeric types has
been relaxed. It is still unsupported (and ill-advised) to have implicit
mixing in arithmetic expressions such as Decimal('1.1')+float('1.1')
because the latter loses information in the process of constructing the binary
float. However, since an existing floating-point value can be converted losslessly
to either a decimal or rational representation, it makes sense to add them to
the constructor and to support mixed-type comparisons.
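For example:

>>> from decimal import Decimal
>>> from fractions import Fraction
>>> Decimal(1.1)               # the constructors now accept floats losslessly
Decimal('1.100000000000000088817841970012523233890533447265625')
>>> Fraction(1.1)
Fraction(2476979795053773, 2251799813685248)
>>> Decimal(1.1) == 1.1        # mixed-type comparisons are now supported
True
>>> Decimal('1.1') == 1.1      # ... and correctly detect the inexact float
False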
Another useful change for the decimal module is that the
Context.clamp attribute is now public. This is useful in creating
contexts that correspond to the decimal interchange formats specified in IEEE
754 (see issue 8540).
(Contributed by Mark Dickinson and Raymond Hettinger.)
The ftplib.FTP class now supports the context manager protocol to
unconditionally consume socket.error exceptions and to close the FTP
connection when done:
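A sketch (hostname illustrative; directory listing output elided):

from ftplib import FTP

with FTP('ftp.example.org') as ftp:
    ftp.login()
    ftp.dir()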
The FTP_TLS class now accepts a context parameter, which is a
ssl.SSLContext object allowing bundling SSL configuration options,
certificates and private keys into a single (potentially long-lived) structure.
The select module now exposes a new, constant attribute,
PIPE_BUF, which gives the minimum number of bytes which are
guaranteed not to block when select.select() says a pipe is ready
for writing.
>>> import select
>>> select.PIPE_BUF
512
(Available on Unix systems. Patch by Sébastien Sablé in issue 9862)
The gzip module also gains the compress() and
decompress() functions for easier in-memory compression and
decompression. Keep in mind that text needs to be encoded as bytes
before compressing and decompressing:
>>> import gzip
>>> s = 'Three shall be the number thou shalt count, '
>>> s += 'and the number of the counting shall be three'
>>> b = s.encode()                        # convert to utf-8
>>> len(b)
89
>>> c = gzip.compress(b)
>>> len(c)
77
>>> gzip.decompress(c).decode()[:42]      # decompress and convert to text
'Three shall be the number thou shalt count'
Also, the zipfile.ZipExtFile class was reworked internally to represent
files stored inside an archive. The new implementation is significantly faster
and can be wrapped in a io.BufferedReader object for more speedups. It
also solves an issue where interleaved calls to read and readline gave the
wrong results.
The TarFile class can now be used as a context manager. In
addition, its add() method has a new option, filter,
that controls which files are added to the archive and allows the file metadata
to be edited.
The new filter option replaces the older, less flexible exclude parameter
which is now deprecated. If specified, the optional filter parameter needs to
be a keyword argument. The user-supplied filter function accepts a
TarInfo object and returns an updated
TarInfo object, or if it wants the file to be excluded, the
function can return None:
>>> import tarfile, glob
>>> def myfilter(tarinfo):
...     if tarinfo.isfile():             # only save real files
...         tarinfo.uname = 'monty'      # redact the user name
...         return tarinfo
...
>>> with tarfile.open(name='myarchive.tar.gz', mode='w:gz') as tf:
...     for filename in glob.glob('*.txt'):
...         tf.add(filename, filter=myfilter)
...     tf.list()
-rw-r--r-- monty/501        902 2011-01-26 17:59:11 annotations.txt
-rw-r--r-- monty/501        123 2011-01-26 17:59:11 general_questions.txt
-rw-r--r-- monty/501       3514 2011-01-26 17:59:11 prion.txt
-rw-r--r-- monty/501        124 2011-01-26 17:59:11 py_todo.txt
-rw-r--r-- monty/501       1399 2011-01-26 17:59:11 semaphore_notes.txt
(Proposed by Tarek Ziadé and implemented by Lars Gustäbel in issue 6856.)
The hashlib module has two new constant attributes listing the hashing
algorithms guaranteed to be present in all implementations and those available
on the current implementation:
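For example:

>>> import hashlib
>>> sorted(hashlib.algorithms_guaranteed)
['md5', 'sha1', 'sha224', 'sha256', 'sha384', 'sha512']
>>> 'sha256' in hashlib.algorithms_available    # a larger, platform-dependent set
True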
The ast module has a general-purpose tool for safely
evaluating expression strings using the Python literal
syntax. The ast.literal_eval() function serves as a secure alternative to
the builtin eval() function which is easily abused. Python 3.2 adds
bytes and set literals to the list of supported types:
strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None.
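For example (the dictionary display order may vary):

>>> from ast import literal_eval
>>> request = "{'req': 3, 'func': 'pow', 'args': (2, 0.5)}"
>>> literal_eval(request)
{'req': 3, 'func': 'pow', 'args': (2, 0.5)}
>>> request = "os.system('do something harmful')"
>>> literal_eval(request)
Traceback (most recent call last):
  ...
ValueError: malformed node or string: <_ast.Call object at 0x...>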
Different operating systems use various encodings for filenames and environment
variables. The os module provides two new functions,
fsencode() and fsdecode(), for encoding and decoding
filenames:
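For example, on a machine whose filesystem encoding is UTF-8:

>>> import os
>>> filename = 'Sehenswürdigkeiten'
>>> os.fsencode(filename)
b'Sehensw\xc3\xbcrdigkeiten'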
Some operating systems allow direct access to encoded bytes in the
environment. If so, the os.supports_bytes_environ constant will be
true.
For direct access to encoded environment variables (if available),
use the new os.getenvb() function or use os.environb
which is a bytes version of os.environ.
The shutil.copytree() function has two new options:

ignore_dangling_symlinks: when symlinks=False (meaning that the function
copies the file pointed to by a symlink, not the symlink itself), this
option silences the error raised if the file doesn't exist.

copy_function: a callable that will be used to copy files;
shutil.copy2() is used by default.
(Contributed by Tarek Ziadé.)
In addition, the shutil module now supports archiving operations for zipfiles, uncompressed tarfiles, gzipped tarfiles,
and bzipped tarfiles. And there are functions for registering additional
archiving file formats (such as xz compressed tarfiles or custom formats).
The principal functions are make_archive() and
unpack_archive(). By default, both operate on the current
directory (which can be set by os.chdir()) and on any sub-directories.
The archive filename needs to be specified with a full pathname. The archiving
step is non-destructive (the original files are left unchanged).
>>> import os, shutil, pprint
>>> os.chdir('mydata')                                    # change to the source directory
>>> f = shutil.make_archive('/var/backup/mydata', 'zip')  # archive the current directory
>>> f                                                     # show the name of archive
'/var/backup/mydata.zip'
>>> os.chdir('tmp')                                       # change to an unpacking directory
>>> shutil.unpack_archive('/var/backup/mydata.zip')       # recover the data
>>> pprint.pprint(shutil.get_archive_formats())           # display known formats
[('bztar', "bzip2'ed tar-file"),
 ('gztar', "gzip'ed tar-file"),
 ('tar', 'uncompressed tar file'),
 ('zip', 'ZIP file')]
>>> shutil.register_archive_format(                       # register a new archive format
...     name='xz',
...     function=xz.compress,                             # callable archiving function
...     extra_args=[('level', 8)],                        # arguments to the function
...     description='xz compression')
Socket objects now have a detach() method which puts
the socket into closed state without actually closing the underlying file
descriptor. The latter can then be reused for other purposes.
(Added by Antoine Pitrou; issue 8524.)
socket.create_connection() now supports the context manager protocol
to unconditionally consume socket.error exceptions and to close the
socket when done.
(Contributed by Giampaolo Rodolà; issue 9794.)
The ssl module added a number of features to satisfy common requirements
for secure (encrypted, authenticated) internet connections:
A new class, SSLContext, serves as a container for persistent
SSL data, such as protocol settings, certificates, private keys, and various
other options. It includes a wrap_socket() method for creating
an SSL socket from an SSL context.
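A client-side sketch (file and host names are illustrative):

import socket, ssl

context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
context.load_verify_locations('ca-bundle.pem')   # hypothetical CA file
context.verify_mode = ssl.CERT_REQUIRED

sock = socket.create_connection(('www.example.com', 443))
secure_sock = context.wrap_socket(sock)          # the context can be reused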
A new function, ssl.match_hostname(), supports server identity
verification for higher-level protocols by implementing the rules of HTTPS
(from RFC 2818) which are also suitable for other protocols.
The ssl.wrap_socket() constructor function now takes a ciphers
argument. The ciphers string lists the allowed encryption algorithms using
the format described in the OpenSSL documentation.
When linked against recent versions of OpenSSL, the ssl module now
supports the Server Name Indication extension to the TLS protocol, allowing
multiple “virtual hosts” using different certificates on a single IP port.
This extension is only supported in client mode, and is activated by passing
the server_hostname argument to ssl.SSLContext.wrap_socket().
Various options have been added to the ssl module, such as
OP_NO_SSLv2 which disables the insecure and obsolete SSLv2
protocol.
The extension now loads all the OpenSSL ciphers and digest algorithms. If
some SSL certificates cannot be verified, they are reported as an “unknown
algorithm” error.
The nntplib module has a revamped implementation with better bytes and
text semantics as well as more practical APIs. These improvements break
compatibility with the nntplib version in Python 3.1, which was partly
dysfunctional in itself.
There were a number of small API improvements in the http.client module.
The old-style HTTP 0.9 simple responses are no longer supported and the strict
parameter is deprecated in all classes.
The HTTPConnection and
HTTPSConnection classes now have a source_address
parameter for a (host, port) tuple indicating where the HTTP connection is made
from.
Support for certificate checking and HTTPS virtual hosts were added to
HTTPSConnection.
The request() method on connection objects
allowed an optional body argument so that a file object could be used
to supply the content of the request. Conveniently, the body argument now
also accepts an iterable object so long as it includes an explicit
Content-Length header. This extended interface is much more flexible than
before.
To establish an HTTPS connection through a proxy server, there is a new
set_tunnel() method that sets the host and
port for HTTP Connect tunneling.
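A sketch (hostnames are illustrative):

import http.client

conn = http.client.HTTPSConnection('proxy.example.com', 8080)
conn.set_tunnel('www.python.org', 443)    # CONNECT through the proxy
conn.request('HEAD', '/')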
To match the behavior of http.server, the HTTP client library now also
encodes headers with ISO-8859-1 (Latin-1) encoding. It was already doing that
for incoming headers, so now the behavior is consistent for both incoming and
outgoing traffic. (See work by Armin Ronacher in issue 10980.)
The unittest module has a number of improvements supporting test discovery for
packages, easier experimentation at the interactive prompt, new testcase
methods, improved diagnostic messages for test failures, and better method
names.
The command-line call python -m unittest can now accept file paths
instead of module names for running specific tests (issue 10620). The new
test discovery can find tests within packages, locating any test importable
from the top-level directory. The top-level directory can be specified with
the -t option, a pattern for matching files with -p, and a directory to
start discovery with -s:
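For example (the directory and pattern are illustrative):

$ python -m unittest discover -s my_proj_dir -p 'test_*.py' -t .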
Another new method, assertCountEqual() is used to
compare two iterables to determine if their element counts are equal (whether
the same elements are present with the same number of occurrences regardless
of order):
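For example:

import unittest

class OrderIndifferentTest(unittest.TestCase):
    def test_anagram(self):
        # Passes: same letters with the same multiplicities, order ignored.
        self.assertCountEqual('algorithm', 'logarithm')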
A principal feature of the unittest module is an effort to produce meaningful
diagnostics when a test fails. When possible, the failure is recorded along
with a diff of the output. This is especially helpful for analyzing log files
of failed test runs. However, since diffs can sometimes be voluminous, there is
a new maxDiff attribute that sets maximum length of
diffs displayed.
In addition, the method names in the module have undergone a number of clean-ups.
For example, assertRegex() is the new name for
assertRegexpMatches() which was misnamed because the
test uses re.search(), not re.match(). Other methods using
regular expressions are now named using short form “Regex” in preference to
“Regexp” – this matches the names used in other unittest implementations,
matches Python’s old name for the re module, and it has unambiguous
camel-casing.
(Contributed by Raymond Hettinger and implemented by Ezio Melotti.)
To improve consistency, some long-standing method aliases are being
deprecated in favor of the preferred names:
Likewise, the TestCase.fail* methods deprecated in Python 3.1 are expected
to be removed in Python 3.3. Also see the Deprecated aliases section in
the unittest documentation.
The assertDictContainsSubset() method was deprecated
because it was misimplemented with the arguments in the wrong order. This
created hard-to-debug optical illusions where tests like
TestCase().assertDictContainsSubset({'a': 1, 'b': 2}, {'a': 1}) would fail.
The integer methods in the random module now do a better job of producing
uniform distributions. Previously, they computed selections with
int(n*random()) which had a slight bias whenever n was not a power of two.
Now, multiple selections are made from a range up to the next power of two and a
selection is kept only when it falls within the range 0 <= x < n. The
functions and methods affected are randrange(),
randint(), choice(), shuffle() and
sample().
The poplib.POP3_SSL class now accepts a context parameter, which is a
ssl.SSLContext object allowing bundling SSL configuration options,
certificates and private keys into a single (potentially long-lived)
structure.
asyncore.dispatcher now provides a handle_accepted(sock, addr) method that
is called when a connection has actually been established with a new remote
endpoint. It is meant as a replacement for the old handle_accept() method
and saves the user from having to call accept() directly.
(Contributed by Rodolpho Eckhardt and Nick Coghlan, issue 10220.)
To support lookups without the possibility of activating a dynamic attribute,
the inspect module has a new function, getattr_static().
Unlike hasattr(), this is a true read-only search, guaranteed not to
change state while it is searching:
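A sketch (do_expensive_lookup is a placeholder that never runs here):

>>> import inspect
>>> class Lazy:
...     @property
...     def attr(self):
...         return do_expensive_lookup()    # would run on a normal lookup
...
>>> inspect.getattr_static(Lazy(), 'attr')  # returns the descriptor untouched
<property object at 0x...>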
The pydoc module now provides a much-improved Web server interface, as
well as a new command-line option -b to automatically open a browser window
to display that server:
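For example:

$ python -m pydoc -b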
The dis module gained two new functions for inspecting code,
code_info() and show_code(). Both provide detailed code
object information for the supplied function, method, source code string or code
object. The former returns a string and the latter prints it:
>>> import dis, random
>>> dis.show_code(random.choice)
Name:              choice
Filename:          /Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/random.py
Argument count:    2
Kw-only arguments: 0
Number of locals:  3
Stack size:        11
Flags:             OPTIMIZED, NEWLOCALS, NOFREE
Constants:
   0: 'Choose a random element from a non-empty sequence.'
   1: 'Cannot choose from an empty sequence'
Names:
   0: _randbelow
   1: len
   2: ValueError
   3: IndexError
Variable names:
   0: self
   1: seq
   2: i
In addition, the dis() function now accepts string arguments
so that the common idiom dis(compile(s, '', 'eval')) can be shortened
to dis(s):
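A quick illustration (the disassembly output is elided here):

>>> from dis import dis
>>> dis('x + 1')    # disassembles the expression directly from a string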
Taken together, these improvements make it easier to explore how CPython is
implemented and to see for yourself what the language syntax does
under-the-hood.
The new sysconfig module makes it straightforward to discover
installation paths and configuration variables that vary across platforms and
installations.
The module offers simple access functions for platform and version
information:
get_platform() returning values like linux-i586 or
macosx-10.6-ppc.
It also provides access to the paths and variables corresponding to one of
seven named schemes used by distutils. Those include posix_prefix,
posix_home, posix_user, nt, nt_user, os2, os2_home:
get_paths() makes a dictionary containing installation paths
for the current installation scheme.
get_config_vars() returns a dictionary of platform specific
variables.
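For example (values shown are from one illustrative installation):

>>> import sysconfig
>>> sysconfig.get_platform()
'linux-i586'
>>> sysconfig.get_python_version()
'3.2'
>>> sysconfig.get_paths('posix_prefix')['stdlib']
'/usr/local/lib/python3.2'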
There is also a convenient command-line interface:
C:\Python32>python -m sysconfig
Platform: "win32"
Python version: "3.2"
Current installation scheme: "nt"

Paths:
        data = "C:\Python32"
        include = "C:\Python32\Include"
        platinclude = "C:\Python32\Include"
        platlib = "C:\Python32\Lib\site-packages"
        platstdlib = "C:\Python32\Lib"
        purelib = "C:\Python32\Lib\site-packages"
        scripts = "C:\Python32\Scripts"
        stdlib = "C:\Python32\Lib"

Variables:
        BINDIR = "C:\Python32"
        BINLIBDEST = "C:\Python32\Lib"
        EXE = ".exe"
        INCLUDEPY = "C:\Python32\Include"
        LIBDEST = "C:\Python32\Lib"
        SO = ".pyd"
        VERSION = "32"
        abiflags = ""
        base = "C:\Python32"
        exec_prefix = "C:\Python32"
        platbase = "C:\Python32"
        prefix = "C:\Python32"
        projectbase = "C:\Python32"
        py_version = "3.2"
        py_version_nodot = "32"
        py_version_short = "3.2"
        srcdir = "C:\Python32"
        userbase = "C:\Documents and Settings\Raymond\Application Data\Python"
The configparser module was modified to improve usability and
predictability of the default parser and its supported INI syntax. The old
ConfigParser class was removed in favor of SafeConfigParser
which has in turn been renamed to ConfigParser. Support
for inline comments is now turned off by default and section or option
duplicates are not allowed in a single configuration source.
Config parsers gained a new API based on the mapping protocol:
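A sketch of the mapping-style access (section and option names are
illustrative):

>>> from configparser import ConfigParser
>>> parser = ConfigParser()
>>> parser.read_string("""
... [DEFAULT]
... monty = python
...
... [phrases]
... the = best
... """)
>>> parser['phrases']['monty']    # mapping access, with DEFAULT fallback
'python'
>>> 'phrases' in parser
True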
The new API is implemented on top of the classical API, so custom parser
subclasses should be able to use it without modifications.
The INI file structure accepted by config parsers can now be customized. Users
can specify alternative option/value delimiters and comment prefixes, change the
name of the DEFAULT section or switch the interpolation syntax.
There is support for pluggable interpolation including an additional interpolation
handler ExtendedInterpolation:
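A sketch of the extended ${section:option} syntax (names are illustrative):

>>> from configparser import ConfigParser, ExtendedInterpolation
>>> parser = ConfigParser(interpolation=ExtendedInterpolation())
>>> parser.read_string("""
... [common]
... home = /home/monty
...
... [paths]
... data = ${common:home}/data
... """)
>>> parser['paths']['data']
'/home/monty/data'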
A number of smaller features were also introduced, like support for specifying
encoding in read operations, specifying fallback values for get-functions, or
reading directly from dictionaries and strings.
Also, the urlencode() function is now much more flexible,
accepting either a string or bytes type for the query argument. If it is a
string, then the safe, encoding, and error parameters are sent to
quote_plus() for encoding:
>>> import urllib.parse
>>> urllib.parse.urlencode([
...     ('type', 'telenovela'),
...     ('name', '¿Dónde Está Elisa?')],
...     encoding='latin-1')
'type=telenovela&name=%BFD%F3nde+Est%E1+Elisa%3F'
As detailed in Parsing ASCII Encoded Bytes, all the urllib.parse
functions now accept ASCII-encoded byte strings as input, so long as they are
not mixed with regular strings. If ASCII-encoded byte strings are given as
parameters, the return types will also be ASCII-encoded byte strings:
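For example (the result type and field values are shown schematically):

>>> import urllib.parse
>>> urllib.parse.urlsplit(b'http://www.python.org:80/about/')
SplitResultBytes(scheme=b'http', netloc=b'www.python.org:80', path=b'/about/', query=b'', fragment=b'')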
Thanks to a concerted effort by R. David Murray, the mailbox module has
been fixed for Python 3.2. The challenge was that mailbox had been originally
designed with a text interface, but email messages are best represented with
bytes because various parts of a message may have different encodings.
The solution harnessed the email package’s binary support for parsing
arbitrary email messages. In addition, the solution required a number of API
changes.
As expected, the add() method for
mailbox.Mailbox objects now accepts binary input.
StringIO and text file input are deprecated. Also, string input
will fail early if non-ASCII characters are used. Previously it would fail when
the email was processed in a later step.
There is also support for binary output. The get_file()
method now returns a file in the binary mode (where it used to incorrectly set
the file to text-mode). There is also a new get_bytes()
method that returns a bytes representation of a message corresponding
to a given key.
It is still possible to get non-binary output using the old API’s
get_string() method, but that approach
is not very useful. Instead, it is best to extract messages from
a Message object or to load them from binary input.
(Contributed by R. David Murray, with efforts from Steffen Daode Nurpmeso and an
initial patch by Victor Stinner in issue 9124.)
The demonstration code for the turtle module was moved from the Demo
directory to the main library. It includes over a dozen sample scripts with
lively displays. Being on sys.path, it can now be run directly
from the command-line:
$ python -m turtledemo
(Moved from the Demo directory by Alexander Belopolsky in issue 10199.)
The mechanism for serializing execution of concurrently running Python threads
(generally known as the GIL or Global Interpreter Lock) has
been rewritten. Among the objectives were more predictable switching
intervals and reduced overhead due to lock contention and the number of
ensuing system calls. The notion of a “check interval” to allow thread
switches has been abandoned and replaced by an absolute duration expressed in
seconds. This parameter is tunable through sys.setswitchinterval().
It currently defaults to 5 milliseconds.
Additional details about the implementation can be read from a python-dev
mailing-list message
(however, “priority requests” as exposed in this message have not been kept
for inclusion).
(Contributed by Antoine Pitrou.)
Regular and recursive locks now accept an optional timeout argument to their
acquire() method. (Contributed by Antoine Pitrou;
issue 7316.)
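A sketch of the timeout form:

import threading

lock = threading.Lock()
# Wait up to half a second; acquire() returns False if the timeout elapses.
if lock.acquire(timeout=0.5):
    try:
        pass    # ... critical section ...
    finally:
        lock.release()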
Regular and recursive lock acquisitions can now be interrupted by signals on
platforms using Pthreads. This means that Python programs that deadlock while
acquiring locks can be successfully killed by repeatedly sending SIGINT to the
process (by pressing Ctrl+C in most shells).
(Contributed by Reid Kleckner; issue 8844.)
A number of small performance enhancements have been added:
Python’s peephole optimizer now recognizes patterns such as x in {1, 2, 3} as
being a test for membership in a set of constants. The optimizer recasts the
set as a frozenset and stores the pre-built constant.
Now that the speed penalty is gone, it is practical to start writing
membership tests using set-notation. This style is both semantically clear
and operationally fast:
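For example (handle and name are placeholders):

extension = name.rpartition('.')[2]
if extension in {'xml', 'html', 'xhtml', 'ws'}:
    handle(name)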
(Patch and additional tests contributed by Dave Malcolm; issue 6690).
Serializing and unserializing data using the pickle module is now
several times faster.
(Contributed by Alexandre Vassalotti, Antoine Pitrou
and the Unladen Swallow team in issue 9410 and issue 3873.)
The Timsort algorithm used in
list.sort() and sorted() now runs faster and uses less memory
when called with a key function. Previously, every element of
a list was wrapped with a temporary object that remembered the key value
associated with each element. Now, two arrays of keys and values are
sorted in parallel. This saves the memory consumed by the sort wrappers,
and it saves time lost to delegating comparisons.
JSON decoding performance is improved and memory consumption is reduced
whenever the same string is repeated for multiple keys. Also, JSON encoding
now uses the C speedups when the sort_keys argument is true.
(Contributed by Antoine Pitrou in issue 7451 and by Raymond Hettinger and
Antoine Pitrou in issue 10314.)
Recursive locks (created with the threading.RLock() API) now benefit
from a C implementation which makes them as fast as regular locks, and between
10x and 15x faster than their previous pure Python implementation.
The fast-search algorithm in stringlib is now used by the split(),
splitlines() and replace() methods on
bytes, bytearray and str objects. Likewise, the
algorithm is also used by rfind(), rindex(), rsplit() and
rpartition().
String to integer conversions now work two “digits” at a time, reducing the
number of division and modulo operations.
(issue 6713 by Gawain Bolton, Mark Dickinson, and Victor Stinner.)
There were several other minor optimizations. Set differencing now runs faster
when one operand is much larger than the other (patch by Andress Bennetts in
issue 8685). The array.repeat() method has a faster implementation
(issue 1569291 by Alexander Belopolsky). The BaseHTTPRequestHandler
has more efficient buffering (issue 3709 by Andrew Schaaf). The
operator.attrgetter() function has been sped-up (issue 10160 by
Christos Georgiou). And ConfigParser loads multi-line arguments a bit
faster (issue 7113 by Łukasz Langa).
Python has been updated to Unicode 6.0.0. The update to the standard adds
over 2,000 new characters including emoji
symbols which are important for mobile phones.
In addition, the updated standard has altered the character properties for two
Kannada characters (U+0CF1, U+0CF2) and one New Tai Lue numeric character
(U+19DA), making the former eligible for use in identifiers while disqualifying
the latter. For more information, see Unicode Character Database Changes.
Support was added for cp720 Arabic DOS encoding (issue 1616979).
MBCS encoding no longer ignores the error handler argument. In the default
strict mode, it raises a UnicodeDecodeError when it encounters an
undecodable byte sequence and a UnicodeEncodeError for an unencodable
character.
The MBCS codec supports 'strict' and 'ignore' error handlers for
decoding, and 'strict' and 'replace' for encoding.
To emulate Python 3.1 MBCS encoding, select the 'ignore' handler for decoding
and the 'replace' handler for encoding.
On Mac OS X, Python decodes command line arguments with 'utf-8' rather than
the locale encoding.
By default, tarfile uses 'utf-8' encoding on Windows (instead of
'mbcs') and the 'surrogateescape' error handler on all operating
systems.
A table of quick links has been added to the top of lengthy sections such as
Built-in Functions. In the case of itertools, the links are
accompanied by tables of cheatsheet-style summaries to provide an overview and
memory jog without having to read all of the docs.
In some cases, the pure Python source code can be a helpful adjunct to the
documentation, so many modules now feature quick links to the latest
version of the source code. For example, the functools module
documentation has a quick link at the top labeled:
The datetime module now has an auxiliary implementation in pure Python.
No functionality was changed. This just provides an easier-to-read alternate
implementation.
(Contributed by Alexander Belopolsky in issue 9528.)
The unmaintained Demo directory has been removed. Some demos were
integrated into the documentation, some were moved to the Tools/demo
directory, and others were removed altogether.
After the 3.2 release, there are plans to switch to Mercurial as the primary
repository. This distributed version control system should make it easier for
members of the community to create and share external changesets. See
PEP 385 for details.
Changes to Python’s build process and to the C API include:
The idle, pydoc and 2to3 scripts are now installed with a
version-specific suffix on make altinstall (issue 10679).
The C functions that access the Unicode Database now accept and return
characters from the full Unicode range, even on narrow unicode builds
(Py_UNICODE_TOLOWER, Py_UNICODE_ISDECIMAL, and others). A visible difference
in Python is that unicodedata.numeric() now returns the correct value
for large code points, and repr() may consider more characters as
printable.
(Reported by Bupjoe Lee and fixed by Amaury Forgeot D’Arc; issue 5127.)
Computed gotos are now enabled by default on supported compilers (which are
detected by the configure script). They can still be disabled selectively by
specifying --without-computed-gotos.
The option --with-wctype-functions was removed. The built-in unicode
database is now used for all functions.
(Contributed by Amaury Forgeot D’Arc; issue 9210.)
Hash values are now values of a new type, Py_hash_t, which is
defined to be the same size as a pointer. Previously they were of type long,
which on some 64-bit operating systems is still only 32 bits long. As a
result of this fix, set and dict can now hold more than
2**32 entries on builds with 64-bit pointers (previously, they could grow
to that size but their performance degraded catastrophically).
(Suggested by Raymond Hettinger and implemented by Benjamin Peterson;
issue 9778.)
A new macro Py_VA_COPY copies the state of the variable argument
list. It is equivalent to C99 va_copy but available on all Python platforms
(issue 2443).
PyEval_CallObject is now only available in macro form. The
function declaration, which was kept for backwards compatibility reasons, is
now removed – the macro was introduced in 1997 (issue 8276).
There is a new function PyErr_NewExceptionWithDoc() that is
like PyErr_NewException() but allows a docstring to be specified.
This lets C exceptions have the same self-documenting capabilities as
their pure Python counterparts (issue 7033).
When compiled with the --with-valgrind option, the pymalloc
allocator will be automatically disabled when running under Valgrind. This
gives improved memory leak detection when running under Valgrind, while taking
advantage of pymalloc at other times (issue 2422).
Removed the O? format from the PyArg_Parse functions. The format is no
longer used and it had never been documented (issue 8837).
There were a number of other small changes to the C-API. See the
Misc/NEWS file for a complete list.
Also, there were a number of updates to the Mac OS X build, see
Mac/BuildScript/README.txt for details. For users running a 32/64-bit
build, there is a known problem with the default Tcl/Tk on Mac OS X 10.6.
Accordingly, we recommend installing an updated alternative such as
ActiveState Tcl/Tk 8.5.9.
See http://www.python.org/download/mac/tcltk/ for additional details.
This section lists previously described changes and other bugfixes that may
require changes to your code:
The configparser module has a number of clean-ups. The major change is
to replace the old ConfigParser class with the long-standing preferred
alternative, SafeConfigParser. In addition there are a number of
smaller incompatibilities:
The interpolation syntax is now validated on
get() and
set() operations. In the default
interpolation scheme, only two tokens with percent signs are valid: %(name)s
and %%, the latter being an escaped percent sign.
The set() and
add_section() methods now verify that
values are actual strings. Formerly, unsupported types could be introduced
unintentionally.
Duplicate sections or options from a single source now raise either
DuplicateSectionError or
DuplicateOptionError. Formerly, duplicates would
silently overwrite a previous entry.
Inline comments are now disabled by default so that the ; character
can be safely used in values.
Comments now can be indented. Consequently, for ; or # to appear at
the start of a line in multiline values, it has to be interpolated. This
keeps comment prefix characters in values from being mistaken as comments.
"" is now a valid value and is no longer automatically converted to an
empty string. For empty strings, use "option=" in a line.
The nntplib module was reworked extensively, meaning that its APIs
are often incompatible with the 3.1 APIs.
bytearray objects can no longer be used as filenames; instead,
they should be converted to bytes.
The array.tostring() and array.fromstring() methods have been renamed to
array.tobytes() and array.frombytes() for clarity. The old names
have been deprecated. (See issue 8990.)
PyArg_Parse*() functions:
“t#” format has been removed: use “s#” or “s*” instead
“w” and “w#” formats have been removed: use “w*” instead
The PyCObject type, deprecated in 3.1, has been removed. To wrap
opaque C pointers in Python objects, the PyCapsule API should be used
instead; the new type has a well-defined interface for passing type-safety
information and a less complicated signature for calling a destructor.
The sys.setfilesystemencoding() function was removed because
it had a flawed design.
The random.seed() function and method now salt string seeds with an
sha512 hash function. To access the previous version of seed in order to
reproduce Python 3.1 sequences, set the version argument to 1, as in
random.seed(s, version=1).
The previously deprecated string.maketrans() function has been removed
in favor of the static methods bytes.maketrans() and
bytearray.maketrans(). This change solves the confusion around which
types were supported by the string module. Now, str,
bytes, and bytearray each have their own maketrans and
translate methods with intermediate translation tables of the appropriate
type.
The previously deprecated contextlib.nested() function has been removed
in favor of a plain with statement which can accept multiple
context managers. The latter technique is faster (because it is built-in),
and it does a better job finalizing multiple context managers when one of them
raises an exception:
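For instance, two files can be managed in a single with statement (a
minimal sketch; the filenames are hypothetical):

with open('mylog.txt') as infile, open('filtered.txt', 'w') as outfile:
    for line in infile:
        if '<critical>' in line:
            outfile.write(line)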
struct.pack() now only allows bytes for the s string pack code.
Formerly, it would accept text arguments and implicitly encode them to bytes
using UTF-8. This was problematic because it made assumptions about the
correct encoding and because a variable-length encoding can fail when writing
to a fixed-length segment of a structure.
Code such as struct.pack('<6sHHBBB', 'GIF87a', x, y) should be rewritten
to use bytes instead of text: struct.pack('<6sHHBBB', b'GIF87a', x, y).
(Discovered by David Beazley and fixed by Victor Stinner; issue 10783.)
The new, longer str() value on floats may break doctests which rely on
the old output format.
In subprocess.Popen, the default value for close_fds is now
True under Unix; under Windows, it is True if the three standard
streams are set to None, False otherwise. Previously, close_fds
was always False by default, which produced difficult-to-solve bugs
or race conditions when open file descriptors leaked into the child
process.
Support for legacy HTTP 0.9 has been removed from urllib.request
and http.client. Such support is still present on the server side
(in http.server).
Regular Python dictionaries iterate over key/value pairs in arbitrary order.
Over the years, a number of authors have written alternative implementations
that remember the order that the keys were originally inserted. Based on
the experiences from those implementations, a new
collections.OrderedDict class has been introduced.
The OrderedDict API is substantially the same as regular dictionaries
but will iterate over keys and values in a guaranteed order depending on
when a key was first inserted. If a new entry overwrites an existing entry,
the original insertion position is left unchanged. Deleting an entry and
reinserting it will move it to the end.
The standard library now supports use of ordered dictionaries in several
modules. The configparser module uses them by default. This lets
configuration files be read, modified, and then written back in their original
order. The _asdict() method for collections.namedtuple() now
returns an ordered dictionary with the values appearing in the same order as
the underlying tuple indices. The json module is being built-out with
an object_pairs_hook to allow OrderedDicts to be built by the decoder.
Support was also added for third-party tools like PyYAML.
PEP written by Armin Ronacher and Raymond Hettinger. Implementation
written by Raymond Hettinger.
PEP 378: Format Specifier for Thousands Separator
The built-in format() function and the str.format() method use
a mini-language that now includes a simple, non-locale aware way to format
a number with a thousands separator. That provides a way to humanize a
program’s output, improving its professional appearance and readability:
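For example (an illustrative session):

>>> format(1234567, ',d')
'1,234,567'
>>> '{:,.2f}'.format(1234567.89)
'1,234,567.89'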
Discussions are underway about how to specify alternative separators
like dots, spaces, apostrophes, or underscores. Locale-aware applications
should use the existing n format specifier which already has some support
for thousands separators.
See also
PEP 378 - Format Specifier for Thousands Separator
PEP written by Raymond Hettinger and implemented by Eric Smith and
Mark Dickinson.
Some smaller changes made to the core Python language are:
Directories and zip archives containing a __main__.py
file can now be executed directly by passing their name to the
interpreter. The directory/zipfile is automatically inserted as the
first entry in sys.path. (Suggestion and initial patch by Andy Chu;
revised patch by Phillip J. Eby and Nick Coghlan; issue 1739468.)
The int() type gained a bit_length method that returns the
number of bits necessary to represent its argument in binary:
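For example (an illustrative session):

>>> n = 37
>>> bin(n)
'0b100101'
>>> n.bit_length()
6
>>> (2**8).bit_length()
9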
The string.maketrans() function is deprecated and is replaced by new
static methods, bytes.maketrans() and bytearray.maketrans().
This change solves the confusion around which types were supported by the
string module. Now, str, bytes, and
bytearray each have their own maketrans and translate
methods with intermediate translation tables of the appropriate type.
Python now uses David Gay’s algorithm for finding the shortest floating
point representation that doesn’t change its value. This should help
mitigate some of the confusion surrounding binary floating point
numbers.
The significance is easily seen with a number like 1.1 which does not
have an exact equivalent in binary floating point. Since there is no exact
equivalent, an expression like float('1.1') evaluates to the nearest
representable value which is 0x1.199999999999ap+0 in hex or
1.100000000000000088817841970012523233890533447265625 in decimal. That
nearest value was and still is used in subsequent floating point
calculations.
What is new is how the number gets displayed. Formerly, Python used a
simple approach. The value of repr(1.1) was computed as
format(1.1, '.17g'), which evaluated to '1.1000000000000001'. The advantage of
using 17 digits was that it relied on IEEE-754 guarantees to assure that
eval(repr(1.1)) would round-trip exactly to its original value. The
disadvantage is that many people found the output to be confusing (mistaking
intrinsic limitations of binary floating point representation as being a
problem with Python itself).
The new algorithm for repr(1.1) is smarter and returns '1.1'.
Effectively, it searches all equivalent string representations (ones that
get stored with the same underlying float value) and returns the shortest
representation.
The new algorithm tends to emit cleaner representations when possible, but
it does not change the underlying values. So, it is still the case that
1.1 + 2.2 != 3.3 even though the representations may suggest otherwise.
The new algorithm depends on certain features in the underlying floating
point implementation. If the required features are not found, the old
algorithm will continue to be used. Also, the text pickle protocols
assure cross-platform portability by using the old algorithm.
(Contributed by Eric Smith and Mark Dickinson; issue 1580.)
Added a new module, tkinter.ttk for access to the Tk themed widget set.
The basic idea of ttk is to separate, to the extent possible, the code
implementing a widget’s behavior from the code implementing its appearance.
The long decimal result shows the actual binary fraction being
stored for 1.1. The fraction has many digits because 1.1 cannot
be exactly represented in binary.
(Contributed by Raymond Hettinger and Mark Dickinson.)
collections.namedtuple() now supports a keyword argument
rename which lets invalid fieldnames be automatically converted to
positional names in the form _0, _1, etc. This is useful when
the field names are being created by an external source such as a
CSV header, SQL field list, or user input:
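A sketch of the behavior (the field list here is made up; 'class' is a
keyword and the second 'id' is a duplicate, so both get renamed):

>>> from collections import namedtuple
>>> Row = namedtuple('Row', ['id', 'class', 'id'], rename=True)
>>> Row._fields
('id', '_1', '_2')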
The logging module now implements a simple logging.NullHandler
class for applications that are not using logging but are calling
library code that does. Setting up a null handler will suppress
spurious warnings such as “No handlers could be found for logger foo”:
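For example, a library can attach the null handler to its top-level
logger (the logger name 'foo' is a placeholder):

>>> import logging
>>> h = logging.NullHandler()
>>> logging.getLogger('foo').addHandler(h)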
The runpy module which supports the -m command line switch
now supports the execution of packages by looking for and executing
a __main__ submodule when a package name is supplied.
The unittest module now supports skipping individual tests or classes
of tests. It also supports marking a test as an expected failure, a test that
is known to be broken, but shouldn’t be counted as a failure on a
TestResult:
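A minimal sketch of the new decorators (the test names and the platform
condition are illustrative):

import sys
import unittest

class TestGizmo(unittest.TestCase):

    @unittest.skipUnless(sys.platform.startswith('win'), 'requires Windows')
    def test_gizmo_on_windows(self):
        ...

    @unittest.expectedFailure
    def test_gizmo_without_required_library(self):
        ...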
In addition, several new assertion methods were added including
assertSetEqual(), assertDictEqual(),
assertDictContainsSubset(), assertListEqual(),
assertTupleEqual(), assertSequenceEqual(),
assertRaisesRegexp(), assertIsNone(),
and assertIsNotNone().
(Contributed by Benjamin Peterson and Antoine Pitrou.)
The io module has three new constants for the seek()
method: SEEK_SET, SEEK_CUR, and SEEK_END.
The pickle module has been adapted for better interoperability with
Python 2.x when used with protocol 2 or lower. The reorganization of the
standard library changed the formal reference for many objects. For
example, __builtin__.set in Python 2 is called builtins.set in Python
3. This change confounded efforts to share data between different versions of
Python. But now when protocol 2 or lower is selected, the pickler will
automatically use the old Python 2 names for both loading and dumping. This
remapping is turned-on by default but can be disabled with the fix_imports
option:
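A sketch of both settings:

>>> import pickle
>>> s = {1, 2}
>>> # By default, protocol 2 writes the class under its Python 2 name,
>>> # __builtin__.set, so the result can be loaded by Python 2.x:
>>> data = pickle.dumps(s, protocol=2)
>>> # Disabling the remapping keeps the Python 3 name, builtins.set:
>>> data = pickle.dumps(s, protocol=2, fix_imports=False)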
An unfortunate but unavoidable side-effect of this change is that protocol 2
pickles produced by Python 3.1 won’t be readable with Python 3.0. The latest
pickle protocol, protocol 3, should be used when migrating data between
Python 3.x implementations, as it doesn’t attempt to remain compatible with
Python 2.x.
(Contributed by Alexandre Vassalotti and Antoine Pitrou, issue 6137.)
A new module, importlib was added. It provides a complete, portable,
pure Python reference implementation of the import statement and its
counterpart, the __import__() function. It represents a substantial
step forward in documenting and defining the actions that take place during
imports.
The new I/O library (as defined in PEP 3116) was mostly written in
Python and quickly proved to be a problematic bottleneck in Python 3.0.
In Python 3.1, the I/O library has been entirely rewritten in C and is
2 to 20 times faster depending on the task at hand. The pure Python
version is still available for experimentation purposes through
the _pyio module.
(Contributed by Amaury Forgeot d’Arc and Antoine Pitrou.)
Added a heuristic so that tuples and dicts containing only untrackable objects
are not tracked by the garbage collector. This can reduce the size of
collections and therefore the garbage collection overhead on long-running
programs, depending on their particular use of datatypes.
When the new configure option --with-computed-gotos is enabled
on compilers that support it (notably gcc, SunPro, and icc), the bytecode
evaluation loop is compiled with a new dispatch mechanism which gives
speedups of up to 20%, depending on the system, the compiler, and
the benchmark.
(Contributed by Antoine Pitrou along with a number of other participants,
issue 4753).
The decoding of UTF-8, UTF-16 and LATIN-1 is now two to four times
faster.
(Contributed by Antoine Pitrou and Amaury Forgeot d’Arc, issue 4868.)
The json module now has a C extension to substantially improve
its performance. In addition, the API was modified so that json works
only with str, not with bytes. That change makes the
module closely match the JSON specification
which is defined in terms of Unicode.
(Contributed by Bob Ippolito and converted to Py3.1 by Antoine Pitrou
and Benjamin Peterson; issue 4136.)
Unpickling now interns the attribute names of pickled objects. This saves
memory and allows pickles to be smaller.
(Contributed by Jake McGuire and Antoine Pitrou; issue 5084.)
Changes to Python’s build process and to the C API include:
Integers are now stored internally either in base 2**15 or in base
2**30, the base being determined at build time. Previously, they
were always stored in base 2**15. Using base 2**30 gives
significant performance improvements on 64-bit machines, but
benchmark results on 32-bit machines have been mixed. Therefore,
the default is to use base 2**30 on 64-bit machines and base 2**15
on 32-bit machines; on Unix, there’s a new configure option
--enable-big-digits that can be used to override this default.
Apart from the performance improvements this change should be invisible to
end users, with one exception: for testing and debugging purposes there’s a
new sys.int_info that provides information about the
internal format, giving the number of bits per digit and the size in bytes
of the C type used to store each digit:
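For example, on a typical 64-bit build:

>>> import sys
>>> sys.int_info
sys.int_info(bits_per_digit=30, sizeof_digit=4)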
Added PyCapsule as a replacement for the PyCObject API.
The principal difference is that the new type has a well defined interface
for passing type-safety information and a less complicated signature
for calling a destructor. The old type had a problematic API and is now
deprecated.
This section lists previously described changes and other bugfixes
that may require changes to your code:
The new floating point string representations can break existing doctests.
For example:
def e():
    '''Compute the base of natural logarithms.

    >>> e()
    2.7182818284590451
    '''
    return sum(1/math.factorial(x) for x in reversed(range(30)))

doctest.testmod()

**********************************************************************
Failed example:
    e()
Expected:
    2.7182818284590451
Got:
    2.718281828459045
**********************************************************************
The automatic name remapping in the pickle module for protocol 2 or lower can
make Python 3.1 pickles unreadable in Python 3.0. One solution is to use
protocol 3. Another solution is to set the fix_imports option to False.
See the discussion above for more details.
This article explains the new features in Python 3.0, compared to 2.6.
Python 3.0, also known as “Python 3000” or “Py3K”, is the first ever
intentionally backwards incompatible Python release. There are more
changes than in a typical release, and more that are important for all
Python users. Nevertheless, after digesting the changes, you’ll find
that Python really hasn’t changed all that much – by and large, we’re
mostly fixing well-known annoyances and warts, and removing a lot of
old cruft.
This article doesn’t attempt to provide a complete specification of
all new features, but instead tries to give a convenient overview.
For full details, you should refer to the documentation for Python
3.0, and/or the many PEPs referenced in the text. If you want to
understand the complete implementation and design rationale for a
particular feature, PEPs usually have more details than the regular
documentation; but note that PEPs usually are not kept up-to-date once
a feature has been fully implemented.
Due to time constraints this document is not as complete as it should
have been. As always for a new release, the Misc/NEWS file in the
source distribution contains a wealth of detailed information about
every small thing that was changed.
The print statement has been replaced with a print()
function, with keyword arguments to replace most of the special syntax
of the old print statement (PEP 3105). Examples:
Old:print"The answer is",2*2New:print("The answer is",2*2)Old:printx,# Trailing comma suppresses newlineNew:print(x,end=" ")# Appends a space instead of a newlineOld:print# Prints a newlineNew:print()# You must call the function!Old:print>>sys.stderr,"fatal error"New:print("fatal error",file=sys.stderr)Old:print(x,y)# prints repr((x, y))New:print((x,y))# Not the same as print(x, y)!
You can also customize the separator between items, e.g.:
print("There are <",2**32,"> possibilities!",sep="")
which produces:
There are <4294967296> possibilities!
Note:
The print() function doesn’t support the “softspace” feature of
the old print statement. For example, in Python 2.x,
print"A\n","B" would write "A\nB\n"; but in Python 3.0,
print("A\n","B") writes "A\nB\n".
Initially, you’ll find yourself typing the old print x
a lot in interactive mode. Time to retrain your fingers to type
print(x) instead!
When using the 2to3 source-to-source conversion tool, all
print statements are automatically converted to
print() function calls, so this is mostly a non-issue for
larger projects.
dict methods dict.keys(), dict.items() and
dict.values() return “views” instead of lists. For example,
this no longer works: k = d.keys(); k.sort(). Use k = sorted(d) instead (this works in Python 2.5 too and is just
as efficient).
Also, the dict.iterkeys(), dict.iteritems() and
dict.itervalues() methods are no longer supported.
map() and filter() return iterators. If you really need
a list, a quick fix is e.g. list(map(...)), but a better fix is
often to use a list comprehension (especially when the original code
uses lambda), or rewriting the code so it doesn’t need a
list at all. Particularly tricky is map() invoked for the
side effects of the function; the correct transformation is to use a
regular for loop (since creating a list would just be
wasteful).
range() now behaves like xrange() used to behave, except
it works with values of arbitrary size. The latter no longer
exists.
Python 3.0 has simplified the rules for ordering comparisons:
The ordering comparison operators (<, <=, >=, >)
raise a TypeError exception when the operands don’t have a
meaningful natural ordering. Thus, expressions like 1 < '', 0 > None
or len <= len are no longer valid, and e.g. None < None raises
TypeError instead of returning False. A corollary is that sorting a heterogeneous list
no longer makes sense – all the elements must be comparable to each
other. Note that this does not apply to the == and !=
operators: objects of different incomparable types always compare
unequal to each other.
builtin.sorted() and list.sort() no longer accept the
cmp argument providing a comparison function. Use the key
argument instead. N.B. the key and reverse arguments are now
“keyword-only”.
The cmp() function should be treated as gone, and the __cmp__()
special method is no longer supported. Use __lt__() for sorting,
__eq__() with __hash__(), and other rich comparisons as needed.
(If you really need the cmp() functionality, you could use the
expression (a > b) - (a < b) as the equivalent for cmp(a, b).)
PEP 0237: Essentially, long renamed to int.
That is, there is only one built-in integral type, named
int; but it behaves mostly like the old long type.
PEP 0238: An expression like 1/2 returns a float. Use
1//2 to get the truncating behavior. (The latter syntax has
existed for years, at least since Python 2.2.)
The sys.maxint constant was removed, since there is no
longer a limit to the value of integers. However, sys.maxsize
can be used as an integer larger than any practical list or string
index. It conforms to the implementation’s “natural” integer size
and is typically the same as sys.maxint in previous releases
on the same platform (assuming the same build options).
The repr() of a long integer doesn’t include the trailing L
anymore, so code that unconditionally strips that character will
chop off the last digit instead. (Use str() instead.)
Octal literals are no longer of the form 0720; use 0o720
instead.
Everything you thought you knew about binary data and Unicode has
changed.
Python 3.0 uses the concepts of text and (binary) data instead
of Unicode strings and 8-bit strings. All text is Unicode; however
encoded Unicode is represented as binary data. The type used to
hold text is str, the type used to hold data is
bytes. The biggest difference with the 2.x situation is
that any attempt to mix text and data in Python 3.0 raises
TypeError, whereas if you were to mix Unicode and 8-bit
strings in Python 2.x, it would work if the 8-bit string happened to
contain only 7-bit (ASCII) bytes, but you would get
UnicodeDecodeError if it contained non-ASCII values. This
value-specific behavior has caused numerous sad faces over the
years.
As a consequence of this change in philosophy, pretty much all code
that uses Unicode, encodings or binary data most likely has to
change. The change is for the better, as in the 2.x world there
were numerous bugs having to do with mixing encoded and unencoded
text. To be prepared in Python 2.x, start using unicode
for all unencoded text, and str for binary or encoded data
only. Then the 2to3 tool will do most of the work for you.
You can no longer use u"..." literals for Unicode text.
However, you must use b"..." literals for binary data.
As the str and bytes types cannot be mixed, you
must always explicitly convert between them. Use str.encode()
to go from str to bytes, and bytes.decode()
to go from bytes to str. You can also use
bytes(s, encoding=...) and str(b, encoding=...),
respectively.
All backslashes in raw string literals are interpreted literally.
This means that '\U' and '\u' escapes in raw strings are not
treated specially. For example, r'\u20ac' is a string of 6
characters in Python 3.0, whereas in 2.6, ur'\u20ac' was the
single “euro” character. (Of course, this change only affects raw
string literals; the euro character is '\u20ac' in Python 3.0.)
The built-in basestring abstract type was removed. Use
str instead. The str and bytes types
don’t have functionality enough in common to warrant a shared base
class. The 2to3 tool (see below) replaces every occurrence of
basestring with str.
Files opened as text files (still the default mode for open())
always use an encoding to map between strings (in memory) and bytes
(on disk). Binary files (opened with a b in the mode argument)
always use bytes in memory. This means that if a file is opened
using an incorrect mode or encoding, I/O will likely fail loudly,
instead of silently producing incorrect data. It also means that
even Unix users will have to specify the correct mode (text or
binary) when opening a file. There is a platform-dependent default
encoding, which on Unixy platforms can be set with the LANG
environment variable (and sometimes also with some other
platform-specific locale-related environment variables). In many
cases, but not all, the system default is UTF-8; you should never
count on this default. Any application reading or writing more than
pure ASCII text should probably have a way to override the encoding.
There is no longer any need for using the encoding-aware streams
in the codecs module.
Filenames are passed to and returned from APIs as (Unicode) strings.
This can present platform-specific problems because on some
platforms filenames are arbitrary byte strings. (On the other hand,
on Windows filenames are natively stored as Unicode.) As a
work-around, most APIs (e.g. open() and many functions in the
os module) that take filenames accept bytes objects
as well as strings, and a few APIs have a way to ask for a
bytes return value. Thus, os.listdir() returns a
list of bytes instances if the argument is a bytes
instance, and os.getcwdb() returns the current working
directory as a bytes instance. Note that when
os.listdir() returns a list of strings, filenames that
cannot be decoded properly are omitted rather than raising
UnicodeError.
Some system APIs like os.environ and sys.argv can
also present problems when the bytes made available by the system is
not interpretable using the default encoding. Setting the LANG
variable and rerunning the program is probably the best approach.
PEP 3138: The repr() of a string no longer escapes
non-ASCII characters. It still escapes control characters and code
points with non-printable status in the Unicode standard, however.
PEP 3120: The default source encoding is now UTF-8.
PEP 3131: Non-ASCII letters are now allowed in identifiers.
(However, the standard library remains ASCII-only with the exception
of contributor names in comments.)
The StringIO and cStringIO modules are gone. Instead,
import the io module and use io.StringIO or
io.BytesIO for text and data respectively.
See also the Unicode HOWTO, which was updated for Python 3.0.
PEP 3107: Function argument and return value annotations. This
provides a standardized way of annotating a function’s parameters
and return value. There are no semantics attached to such
annotations except that they can be introspected at runtime using
the __annotations__ attribute. The intent is to encourage
experimentation through metaclasses, decorators or frameworks.
PEP 3102: Keyword-only arguments. Named parameters occurring
after *args in the parameter list must be specified using
keyword syntax in the call. You can also use a bare * in the
parameter list to indicate that you don’t accept a variable-length
argument list, but you do have keyword-only arguments.
Keyword arguments are allowed after the list of base classes in a
class definition. This is used by the new convention for specifying
a metaclass (see next section), but can be used for other purposes
as well, as long as the metaclass supports it.
PEP 3104: nonlocal statement. Using nonlocal x
you can now assign directly to a variable in an outer (but
non-global) scope. nonlocal is a new reserved word.
PEP 3132: Extended Iterable Unpacking. You can now write things
like a, b, *rest = some_sequence. And even *rest, a = stuff. The
rest object is always a (possibly empty) list; the
right-hand side may be any iterable. Example:
(a, *rest, b) = range(5)
This sets a to 0, b to 4, and rest to [1,2,3].
Dictionary comprehensions: {k: v for k, v in stuff} means the
same thing as dict(stuff) but is more flexible. (This is
PEP 0274 vindicated. :-)
Set literals, e.g. {1, 2}. Note that {} is an empty
dictionary; use set() for an empty set. Set comprehensions are
also supported; e.g., {x for x in stuff} means the same thing as
set(stuff) but is more flexible.
New octal literals, e.g. 0o720 (already in 2.6). The old octal
literals (0720) are gone.
New binary literals, e.g. 0b1010 (already in 2.6), and
there is a new corresponding built-in function, bin().
Bytes literals are introduced with a leading b or B, and
there is a new corresponding built-in function, bytes().
The module-global __metaclass__ variable is no longer
supported. (It was a crutch to make it easier to default to
new-style classes without deriving every class from
object.)
List comprehensions no longer support the syntactic form
[... for var in item1, item2, ...]. Use
[... for var in (item1, item2, ...)] instead.
Also note that list comprehensions have different semantics: they
are closer to syntactic sugar for a generator expression inside a
list() constructor, and in particular the loop control
variables are no longer leaked into the surrounding scope.
The ellipsis (...) can be used as an atomic expression
anywhere. (Previously it was only allowed in slices.) Also, it
must now be spelled as .... (Previously it could also be
spelled as ..., by a mere accident of the grammar.)
Removed keyword: exec() is no longer a keyword; it remains as
a function. (Fortunately the function syntax was also accepted in
2.x.) Also note that exec() no longer takes a stream argument;
instead of exec(f) you can use exec(f.read()).
Integer literals no longer support a trailing l or L.
String literals no longer support a leading u or U.
The from module import * syntax is only
allowed at the module level, no longer inside functions.
The only acceptable syntax for relative imports is from .[module]
import name. All import forms not starting with . are
interpreted as absolute imports. (PEP 0328)
Since many users presumably make the jump straight from Python 2.5 to
Python 3.0, this section reminds the reader of new features that were
originally designed for Python 3.0 but that were back-ported to Python
2.6. The corresponding sections in What’s New in Python 2.6 should be
consulted for longer descriptions.
PEP 3101: Advanced String Formatting. Note: the 2.6 description mentions the
format() method for both 8-bit and Unicode strings. In 3.0,
only the str type (text strings with Unicode support)
supports this method; the bytes type does not. The plan is
to eventually make this the only API for string formatting, and to
start deprecating the % operator in Python 3.1.
PEP 3112: Byte Literals. The b"..." string literal notation (and its
variants like b'...', b"""...""", and br"...") now
produces a literal of type bytes.
PEP 3116: New I/O Library. The io module is now the standard way of
doing file I/O, and the initial values of sys.stdin,
sys.stdout and sys.stderr are now instances of
io.TextIOBase. The built-in open() function is now an
alias for io.open() and has additional keyword arguments
encoding, errors, newline and closefd. Also note that an
invalid mode argument now raises ValueError, not
IOError. The binary file object underlying a text file
object can be accessed as f.buffer (but beware that the
text object maintains a buffer of itself in order to speed up
the encoding and decoding operations).
Due to time constraints, this document does not exhaustively cover the
very extensive changes to the standard library. PEP 3108 is the
reference for the major changes to the library. Here’s a capsule
review:
Many old modules were removed. Some, like gopherlib (no
longer used) and md5 (replaced by hashlib), were
already deprecated by PEP 0004. Others were removed as a result
of the removal of support for various platforms such as Irix, BeOS
and Mac OS 9 (see PEP 0011). Some modules were also selected for
removal in Python 3.0 due to lack of use or because a better
replacement exists. See PEP 3108 for an exhaustive list.
The bsddb3 package was removed because its presence in the
core standard library has proved over time to be a particular burden
for the core developers due to testing instability and Berkeley DB’s
release schedule. However, the package is alive and well,
externally maintained at http://www.jcea.es/programacion/pybsddb.htm.
Some modules were renamed because their old name disobeyed
PEP 0008, or for various other reasons. Here’s the list:
Old Name             New Name
_winreg              winreg
ConfigParser         configparser
copy_reg             copyreg
Queue                queue
SocketServer         socketserver
markupbase           _markupbase
repr                 reprlib
test.test_support    test.support
A common pattern in Python 2.x is to have one version of a module
implemented in pure Python, with an optional accelerated version
implemented as a C extension; for example, pickle and
cPickle. This places the burden of importing the accelerated
version and falling back on the pure Python version on each user of
these modules. In Python 3.0, the accelerated versions are
considered implementation details of the pure Python versions.
Users should always import the standard version, which attempts to
import the accelerated version and falls back to the pure Python
version. The pickle / cPickle pair received this
treatment. The profile module is on the list for 3.1. The
StringIO module has been turned into a class in the io
module.
Some related modules have been grouped into packages, and usually
the submodule names have been simplified. The resulting new
packages are:
tkinter (all Tkinter-related modules except
turtle). The target audience of turtle doesn’t
really care about tkinter. Also note that as of Python
2.6, the functionality of turtle has been greatly enhanced.
Cleanup of the sys module: removed sys.exitfunc(),
sys.exc_clear(), sys.exc_type, sys.exc_value,
sys.exc_traceback. (Note that sys.last_type
etc. remain.)
Cleanup of the array.array type: the read() and
write() methods are gone; use fromfile() and
tofile() instead. Also, the 'c' typecode for array is
gone – use either 'b' for bytes or 'u' for Unicode
characters.
Cleanup of the operator module: removed
sequenceIncludes() and isCallable().
Cleanup of the thread module: acquire_lock() and
release_lock() are gone; use acquire() and
release() instead.
Cleanup of the random module: removed the jumpahead() API.
The new module is gone.
The functions os.tmpnam(), os.tempnam() and
os.tmpfile() have been removed in favor of the tempfile
module.
The tokenize module has been changed to work with bytes. The
main entry point is now tokenize.tokenize(), instead of
generate_tokens.
string.letters and its friends (string.lowercase and
string.uppercase) are gone. Use
string.ascii_letters etc. instead. (The reason for the
removal is that string.letters and friends had
locale-specific behavior, which is a bad idea for such
attractively-named global “constants”.)
Renamed module __builtin__ to builtins (removing the
underscores, adding an ‘s’). The __builtins__ variable
found in most global namespaces is unchanged. To modify a builtin,
you should use builtins, not __builtins__!
A new system for built-in string formatting operations replaces the
% string formatting operator. (However, the % operator is
still supported; it will be deprecated in Python 3.1 and removed
from the language at some later time.) Read PEP 3101 for the full
scoop.
The APIs for raising and catching exception have been cleaned up and
new powerful features added:
PEP 0352: All exceptions must be derived (directly or indirectly)
from BaseException. This is the root of the exception
hierarchy. This is not new as a recommendation, but the
requirement to inherit from BaseException is new. (Python
2.6 still allowed classic classes to be raised, and placed no
restriction on what you can catch.) As a consequence, string
exceptions are finally truly and utterly dead.
Almost all exceptions should actually derive from Exception;
BaseException should only be used as a base class for
exceptions that should only be handled at the top level, such as
SystemExit or KeyboardInterrupt. The recommended
idiom for handling all exceptions except for this latter category is
to use except Exception.
StandardError was removed.
Exceptions no longer behave as sequences. Use the args
attribute instead.
PEP 3109: Raising exceptions. You must now use raise Exception(args)
instead of raise Exception, args.
Additionally, you can no longer explicitly specify a traceback;
instead, if you have to do this, you can assign directly to the
__traceback__ attribute (see below).
PEP 3110: Catching exceptions. You must now use
except SomeException as variable instead
of except SomeException, variable. Moreover, the
variable is explicitly deleted when the except block
is left.
PEP 3134: Exception chaining. There are two cases: implicit
chaining and explicit chaining. Implicit chaining happens when an
exception is raised in an except or finally
handler block. This usually happens due to a bug in the handler
block; we call this a secondary exception. In this case, the
original exception (that was being handled) is saved as the
__context__ attribute of the secondary exception.
Explicit chaining is invoked with this syntax:
raise SecondaryException() from primary_exception
(where primary_exception is any expression that produces an
exception object, probably an exception that was previously caught).
In this case, the primary exception is stored on the
__cause__ attribute of the secondary exception. The
traceback printed when an unhandled exception occurs walks the chain
of __cause__ and __context__ attributes and prints a
separate traceback for each component of the chain, with the primary
exception at the top. (Java users may recognize this behavior.)
PEP 3134: Exception objects now store their traceback as the
__traceback__ attribute. This means that an exception
object now contains all the information pertaining to an exception,
and there are fewer reasons to use sys.exc_info() (though the
latter is not removed).
A few exception messages are improved when Windows fails to load an
extension module. For example, error code 193 is now %1 is not a
valid Win32 application. Strings now deal with non-English
locales.
!= now returns the opposite of ==, unless == returns
NotImplemented.
The concept of “unbound methods” has been removed from the language.
When referencing a method as a class attribute, you now get a plain
function object.
__getslice__(), __setslice__() and __delslice__()
were killed. The syntax a[i:j] now translates to
a.__getitem__(slice(i, j)) (or __setitem__() or
__delitem__(), when used as an assignment or deletion target,
respectively).
PEP 3114: the standard next() method has been renamed to
__next__().
The __oct__() and __hex__() special methods are removed
– oct() and hex() use __index__() now to convert
the argument to an integer.
Removed support for __members__ and __methods__.
The function attributes named func_X have been renamed to
use the __X__ form, freeing up these names in the function
attribute namespace for user-defined attributes. To wit,
func_closure, func_code, func_defaults,
func_dict, func_doc, func_globals,
func_name were renamed to __closure__,
__code__, __defaults__, __dict__,
__doc__, __globals__, __name__,
respectively.
PEP 3135: New super(). You can now invoke super()
without arguments and (assuming this is in a regular instance method
defined inside a class statement) the right class and
instance will automatically be chosen. With arguments, the behavior
of super() is unchanged.
PEP 3111: raw_input() was renamed to input(). That
is, the new input() function reads a line from
sys.stdin and returns it with the trailing newline stripped.
It raises EOFError if the input is terminated prematurely.
To get the old behavior of input(), use eval(input()).
A new built-in function next() was added to call the
__next__() method on an object.
The round() function rounding strategy and return type have
changed. Exact halfway cases are now rounded to the nearest even
result instead of away from zero. (For example, round(2.5) now
returns 2 rather than 3.) round(x[, n]) now
delegates to x.__round__([n]) instead of always returning a
float. It generally returns an integer when called with a single
argument and a value of the same type as x when called with two
arguments.
The net result of the 3.0 generalizations is that Python 3.0 runs the
pystone benchmark around 10% slower than Python 2.5. Most likely the
biggest cause is the removal of special-casing for small integers.
There’s room for improvement, but it will happen after 3.0 is
released!
For porting existing Python 2.5 or 2.6 source code to Python 3.0, the
best strategy is the following:
(Prerequisite:) Start with excellent test coverage.
Port to Python 2.6. This should be no more work than the average
port from Python 2.x to Python 2.(x+1). Make sure all your tests
pass.
(Still using 2.6:) Turn on the -3 command line switch.
This enables warnings about features that will be removed (or
change) in 3.0. Run your test suite again, and fix code that you
get warnings about until there are no warnings left, and all your
tests still pass.
Run the 2to3 source-to-source translator over your source code
tree. (See 2to3 - Automated Python 2 to 3 code translation for more on this tool.) Run the
result of the translation under Python 3.0. Manually fix up any
remaining issues, fixing problems until all tests pass again.
It is not recommended to try to write source code that runs unchanged
under both Python 2.6 and 3.0; you’d have to use a very contorted
coding style, e.g. avoiding print statements, metaclasses,
and much more. If you are maintaining a library that needs to support
both Python 2.6 and Python 3.0, the best approach is to modify step 3
above by editing the 2.6 version of the source code and running the
2to3 translator again, rather than editing the 3.0 version of the
source code.
This article explains the new features in Python 2.7. The final
release of 2.7 is currently scheduled for July 2010; the detailed
schedule is described in PEP 373.
Numeric handling has been improved in many ways, for both
floating-point numbers and for the Decimal class. There are
some useful additions to the standard library, such as a greatly
enhanced unittest module, the argparse module for
parsing command-line options, convenient ordered-dictionary and
Counter classes in the collections module, and many
other improvements.
Python 2.7 is planned to be the last of the 2.x releases, so we worked
on making it a good release for the long term. To help with porting
to Python 3, several new features from the Python 3.x series have been
included in 2.7.
This article doesn’t attempt to provide a complete specification of
the new features, but instead provides a convenient overview. For
full details, you should refer to the documentation for Python 2.7 at
http://docs.python.org. If you want to understand the rationale for
the design and implementation, refer to the PEP for a particular new
feature or the issue on http://bugs.python.org in which a change was
discussed. Whenever possible, “What’s New in Python” links to the
bug/patch item for each change.
Python 2.7 is intended to be the last major release in the 2.x series.
The Python maintainers are planning to focus their future efforts on
the Python 3.x series.
This means that 2.7 will remain in place for a long time, running
production systems that have not been ported to Python 3.x.
Two consequences of the long-term significance of 2.7 are:
It’s very likely the 2.7 release will have a longer period of
maintenance compared to earlier 2.x versions. Python 2.7 will
continue to be maintained while the transition to 3.x continues, and
the developers are planning to support Python 2.7 with bug-fix
releases beyond the typical two years.
A policy decision was made to silence warnings only of interest to
developers. DeprecationWarning and its
descendants are now ignored unless otherwise requested, preventing
users from seeing warnings triggered by an application. This change
was also made in the branch that will become Python 3.2. (Discussed
on stdlib-sig and carried out in issue 7319.)
In previous releases, DeprecationWarning messages were
enabled by default, providing Python developers with a clear
indication of where their code may break in a future major version
of Python.
However, there are increasingly many users of Python-based
applications who are not directly involved in the development of
those applications. DeprecationWarning messages are
irrelevant to such users, making them worry about an application
that’s actually working correctly and burdening application developers
with responding to these concerns.
You can re-enable display of DeprecationWarning messages by
running Python with the -Wdefault (short form:
-Wd) switch, or by setting the PYTHONWARNINGS
environment variable to "default" (or "d") before running
Python. Python code can also re-enable them
by calling warnings.simplefilter('default').
Much as Python 2.6 incorporated features from Python 3.0,
version 2.7 incorporates some of the new features
in Python 3.1. The 2.x series continues to provide tools
for migrating to the 3.x series.
A partial list of 3.1 features that were backported to 2.7:
The syntax for set literals ({1, 2, 3} is a mutable set).
Dictionary and set comprehensions ({i: i*2 for i in range(3)}).
Multiple context managers in a single with statement.
A new version of the io library, rewritten in C for performance.
The repr() of a float x is shorter in many cases: it’s now
based on the shortest decimal string that’s guaranteed to round back
to x. As in previous versions of Python, it’s guaranteed that
float(repr(x)) recovers x.
Float-to-string and string-to-float conversions are correctly rounded.
The round() function is also now correctly rounded.
The PyCapsule type, used to provide a C API for extension modules.
operator.isCallable() and operator.sequenceIncludes(),
which are not supported in 3.x, now trigger warnings.
The -3 switch now automatically
enables the -Qwarn switch that causes warnings
about using classic division with integers and long integers.
PEP 372: Adding an Ordered Dictionary to collections
Regular Python dictionaries iterate over key/value pairs in arbitrary order.
Over the years, a number of authors have written alternative implementations
that remember the order that the keys were originally inserted. Based on
the experiences from those implementations, 2.7 introduces a new
OrderedDict class in the collections module.
The OrderedDict API provides the same interface as regular
dictionaries but iterates over keys and values in a guaranteed order
depending on when a key was first inserted:
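For example (an illustrative session):

>>> from collections import OrderedDict
>>> d = OrderedDict([('first', 1), ('second', 2), ('third', 3)])
>>> d.items()
[('first', 1), ('second', 2), ('third', 3)]
>>> d['second'] = 4        # overwriting a key keeps its position
>>> d.items()
[('first', 1), ('second', 4), ('third', 3)]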
The popitem() method has an optional last
argument that defaults to True. If last is True, the most recently
added key is returned and removed; if it’s False, the
oldest key is selected:
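For example (continuing the kind of session shown above):

>>> od = OrderedDict([(x, None) for x in range(4)])
>>> od.popitem()             # last=True by default: LIFO
(3, None)
>>> od.popitem(last=False)   # the oldest entry
(0, None)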
Comparing two ordered dictionaries checks both the keys and values,
and requires that the insertion order was the same:
>>> od1 = OrderedDict([('first', 1),
...                    ('second', 2),
...                    ('third', 3)])
>>> od2 = OrderedDict([('third', 3),
...                    ('first', 1),
...                    ('second', 2)])
>>> od1 == od2
False
>>> # Move 'third' key to the end
>>> del od2['third']; od2['third'] = 3
>>> od1 == od2
True
Comparing an OrderedDict with a regular dictionary
ignores the insertion order and just compares the keys and values.
How does the OrderedDict work? It maintains a
doubly-linked list of keys, appending new keys to the list as they’re inserted.
A secondary dictionary maps keys to their corresponding list node, so
deletion doesn’t have to traverse the entire linked list and therefore
remains O(1).
The standard library now supports use of ordered dictionaries in several
modules.
The ConfigParser module uses them by default, meaning that
configuration files can now be read, modified, and then written back
in their original order.
The _asdict() method for
collections.namedtuple() now returns an ordered dictionary with the
values appearing in the same order as the underlying tuple indices.
The json module’s JSONDecoder class
constructor was extended with an object_pairs_hook parameter to
allow OrderedDict instances to be built by the decoder.
Support was also added for third-party tools like
PyYAML.
See also
PEP 372 - Adding an ordered dictionary to collections
PEP written by Armin Ronacher and Raymond Hettinger;
implemented by Raymond Hettinger.
PEP 378: Format Specifier for Thousands Separator
To make program output more readable, it can be useful to add
separators to large numbers, rendering them as
18,446,744,073,709,551,616 instead of 18446744073709551616.
The fully general solution for doing this is the locale module,
which can use different separators ("," in North America, "." in
Europe) and different grouping sizes, but locale is complicated
to use and unsuitable for multi-threaded applications where different
threads are producing output for different locales.
Therefore, a simple comma-grouping mechanism has been added to the
mini-language used by the str.format() method. When
formatting a floating-point number, simply include a comma between the
width and the precision:
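For example:

>>> '{:20,.2f}'.format(18446744073709551616.0)
'18,446,744,073,709,551,616.00'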
This mechanism is not adaptable at all; commas are always used as the
separator and the grouping is always into three-digit groups. The
comma-formatting mechanism isn’t as general as the locale
module, but it’s easier to use.
See also
PEP 378 - Format Specifier for Thousands Separator
PEP written by Raymond Hettinger; implemented by Eric Smith.
PEP 389: The argparse Module for Parsing Command Lines
The argparse module for parsing command-line arguments was
added as a more powerful replacement for the
optparse module.
This means Python now supports three different modules for parsing
command-line arguments: getopt, optparse, and
argparse. The getopt module closely resembles the C
library’s getopt() function, so it remains useful if you’re writing a
Python prototype that will eventually be rewritten in C.
optparse becomes redundant, but there are no plans to remove it
because there are many scripts still using it, and there’s no
automated way to update these scripts. (Making the argparse
API consistent with optparse’s interface was discussed but
rejected as too messy and difficult.)
In short, if you’re writing a new script and don’t need to worry
about compatibility with earlier versions of Python, use
argparse instead of optparse.
Here’s an example:
import argparse

parser = argparse.ArgumentParser(description='Command-line example.')

# Add optional switches
parser.add_argument('-v', action='store_true', dest='is_verbose',
                    help='produce verbose output')
parser.add_argument('-o', action='store', dest='output',
                    metavar='FILE',
                    help='direct output to FILE instead of stdout')
parser.add_argument('-C', action='store', type=int, dest='context',
                    metavar='NUM', default=0,
                    help='display NUM lines of added context')

# Allow any number of additional arguments.
parser.add_argument(nargs='*', action='store', dest='inputs',
                    help='input filenames (default is stdin)')

args = parser.parse_args()
print args.__dict__
Unless you override it, -h and --help switches
are automatically added, and produce neatly formatted output:
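For the parser above, the help text looks roughly like this (the script
name example.py is a placeholder, and exact spacing may differ):

$ python example.py -h
usage: example.py [-h] [-v] [-o FILE] [-C NUM] [inputs [inputs ...]]

Command-line example.

positional arguments:
  inputs      input filenames (default is stdin)

optional arguments:
  -h, --help  show this help message and exit
  -v          produce verbose output
  -o FILE     direct output to FILE instead of stdout
  -C NUM      display NUM lines of added context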
argparse has much fancier validation than optparse; you
can specify an exact number of arguments as an integer, 0 or more
arguments by passing '*', 1 or more by passing '+', or an
optional argument with '?'. A top-level parser can contain
sub-parsers to define subcommands that have different sets of
switches, as in svn commit, svn checkout, etc. You can
specify an argument’s type as FileType, which will
automatically open files for you and understands that '-' means
standard input or output.
See also
Upgrading optparse code - Part of the Python documentation, describing how to convert
code that uses optparse.
PEP 389 - argparse - New Command Line Parsing Module
PEP written and implemented by Steven Bethard.
PEP 391: Dictionary-Based Configuration For Logging
The logging module is very flexible; applications can define
a tree of logging subsystems, and each logger in this tree can filter
out certain messages, format them differently, and direct messages to
a varying number of handlers.
All this flexibility can require a lot of configuration. You can
write Python statements to create objects and set their properties,
but a complex set-up requires verbose but boring code.
logging also supports a fileConfig()
function that parses a file, but the file format doesn’t support
configuring filters, and it’s messier to generate programmatically.
Python 2.7 adds a dictConfig() function that
uses a dictionary to configure logging. There are many ways to
produce a dictionary from different sources: construct one with code;
parse a file containing JSON; or use a YAML parsing library if one is
installed.
The following example configures two loggers, the root logger and a
logger named “network”. Messages sent to the root logger will be
sent to the system log using the syslog protocol, and messages
to the “network” logger will be written to a network.log file
that will be rotated once the log reaches 1 MB.
import logging
import logging.config

configdict = {
    'version': 1,    # Configuration schema in use; must be 1 for now
    'formatters': {
        'standard': {
            'format': ('%(asctime)s %(name)-15s '
                       '%(levelname)-8s %(message)s')}},

    'handlers': {'netlog': {'backupCount': 10,
                            'class': 'logging.handlers.RotatingFileHandler',
                            'filename': '/logs/network.log',
                            'formatter': 'standard',
                            'level': 'INFO',
                            'maxBytes': 1024*1024},
                 'syslog': {'class': 'logging.handlers.SysLogHandler',
                            'formatter': 'standard',
                            'level': 'ERROR'}},

    # Specify all the subordinate loggers
    'loggers': {
        'network': {
            'handlers': ['netlog']}},

    # Specify properties of the root logger
    'root': {
        'handlers': ['syslog']},
}

# Set up configuration
logging.config.dictConfig(configdict)

# As an example, log two error messages
logger = logging.getLogger('/')
logger.error('Database not found')

netlogger = logging.getLogger('network')
netlogger.error('Connection failed')
Three smaller enhancements to the logging module, all
implemented by Vinay Sajip, are:
The SysLogHandler class now supports
syslogging over TCP. The constructor has a socktype parameter
giving the type of socket to use, either socket.SOCK_DGRAM
for UDP or socket.SOCK_STREAM for TCP. The default
protocol remains UDP.
Logger instances gained a getChild() method that retrieves a
descendant logger using a relative path. For example,
once you retrieve a logger by doing log = getLogger('app'),
calling log.getChild('network.listen') is equivalent to
getLogger('app.network.listen').
The LoggerAdapter class gained an isEnabledFor() method
that takes a level and returns whether the underlying logger would
process a message of that level of importance.
See also
PEP 391 - Dictionary-Based Configuration For Logging
The dictionary methods keys(), values(), and items()
are different in Python 3.x. They return an object called a view
instead of a fully materialized list.
It’s not possible to change the return values of keys(),
values(), and items() in Python 2.7 because too much code
would break. Instead the 3.x versions were added under the new names
viewkeys(), viewvalues(), and viewitems().
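An illustrative session (the iteration order shown assumes small
integer keys, which hash to themselves):

>>> d = dict((i, i*10) for i in range(3))
>>> d.viewkeys()
dict_keys([0, 1, 2])
>>> d.viewitems()
dict_items([(0, 0), (1, 10), (2, 20)])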
memoryview objects allow modifying the underlying object if
it’s a mutable object.
>>> m2[0] = 75          # m2 is a read-only view of an immutable str
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot modify read-only memory
>>> b = bytearray(string.letters)  # Creating a mutable object
>>> b
bytearray(b'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ')
>>> mb = memoryview(b)
>>> mb[0] = '*'         # Assign to view, changing the bytearray.
>>> b[0:5]              # The bytearray has been changed.
bytearray(b'*bcde')
Some smaller changes made to the core Python language are:
The syntax for set literals has been backported from Python 3.x.
Curly brackets are used to surround the contents of the resulting
mutable set; set literals are
distinguished from dictionaries by not containing colons and values.
{} continues to represent an empty dictionary; use
set() for an empty set.
Dictionary and set comprehensions are another feature backported from
3.x, generalizing list/generator comprehensions to use
the literal syntax for sets and dictionaries.
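A short illustrative session showing both backported features (hypothetical example):

>>> {1, 2, 3}                      # a set literal
set([1, 2, 3])
>>> {x % 3 for x in range(10)}     # a set comprehension
set([0, 1, 2])
>>> {x: x ** 2 for x in (2, 4)}    # a dictionary comprehension
{2: 4, 4: 16}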
The with statement can now use multiple context managers
in one statement. Context managers are processed from left to right
and each one is treated as beginning a new with statement.
This means that:
with A() as a, B() as b:
    ... suite of statements ...
is equivalent to:
with A() as a:
    with B() as b:
        ... suite of statements ...
The contextlib.nested() function provides very similar
functionality, so it’s no longer necessary and has been deprecated.
Conversions between floating-point numbers and strings are
now correctly rounded on most platforms. These conversions occur
in many different places: str() on
floats and complex numbers; the float and complex
constructors;
numeric formatting; serializing and
deserializing floats and complex numbers using the
marshal, pickle
and json modules;
parsing of float and imaginary literals in Python code;
and Decimal-to-float conversion.
Related to this, the repr() of a floating-point number x
now returns a result based on the shortest decimal string that’s
guaranteed to round back to x under correct rounding (with
round-half-to-even rounding mode). Previously it gave a string
based on rounding x to 17 decimal digits.
The rounding library responsible for this improvement works on
Windows and on Unix platforms using the gcc, icc, or suncc
compilers. There may be a small number of platforms where correct
operation of this code cannot be guaranteed, so the code is not
used on such systems. You can find out which code is being used
by checking sys.float_repr_style, which will be short
if the new code is in use and legacy if it isn’t.
Implemented by Eric Smith and Mark Dickinson, using David Gay’s
dtoa.c library; issue 7117.
Conversions from long integers and regular integers to floating
point now round differently, returning the floating-point number
closest to the number. This doesn’t matter for small integers that
can be converted exactly, but for large numbers that will
unavoidably lose precision, Python 2.7 now approximates more
closely; previously the result could differ from the correctly
rounded value.
Integer division is also more accurate in its rounding behaviours. (Also
implemented by Mark Dickinson; issue 1811.)
Implicit coercion for complex numbers has been removed; the interpreter
will no longer ever attempt to call a __coerce__() method on complex
objects. (Removed by Meador Inge and Mark Dickinson; issue 5211.)
The str.format() method now supports automatic numbering of the replacement
fields. This makes using str.format() more closely resemble using
%s formatting:
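An illustrative pair of calls (hypothetical example; the second line mixes auto-numbering with a named field):

>>> '{}:{}:{}'.format(2009, 4, 'Sunday')
'2009:4:Sunday'
>>> '{}:{}:{day}'.format(2009, 4, day='Sunday')
'2009:4:Sunday'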
The auto-numbering takes the fields from left to right, so the first {...}
specifier will use the first argument to str.format(), the next
specifier will use the next argument, and so on. You can’t mix auto-numbering
and explicit numbering – either number all of your specifier fields or none
of them – but you can mix auto-numbering and named fields, as in the second
example above. (Contributed by Eric Smith; issue 5237.)
Complex numbers now correctly support usage with format(),
and default to being right-aligned.
Specifying a precision or comma-separation applies to both the real
and imaginary parts of the number, but a specified field width and
alignment is applied to the whole of the resulting 1.5+3j
output. (Contributed by Eric Smith; issue 1588 and issue 7988.)
The ‘F’ format code now always formats its output using uppercase characters,
so it will now produce ‘INF’ and ‘NAN’.
(Contributed by Eric Smith; issue 3382.)
A low-level change: the object.__format__() method now triggers
a PendingDeprecationWarning if it’s passed a format string,
because the __format__() method for object converts
the object to a string representation and formats that. Previously
the method silently applied the format string to the string
representation, but that could hide mistakes in Python code. If
you’re supplying formatting information such as an alignment or
precision, presumably you’re expecting the formatting to be applied
in some object-specific way. (Fixed by Eric Smith; issue 7994.)
The int() and long() types gained a bit_length
method that returns the number of bits necessary to represent
its argument in binary:
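For instance (illustrative session):

>>> n = 37
>>> bin(n)
'0b100101'
>>> n.bit_length()
6
>>> (2 ** 123).bit_length()
124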
(Contributed by Fredrik Johansson and Victor Stinner; issue 3439.)
The import statement will no longer try an absolute import
if a relative import (e.g. from .os import sep) fails. This
fixes a bug, but could possibly break certain import
statements that were only working by accident. (Fixed by Meador Inge;
issue 7902.)
It’s now possible for a subclass of the built-in unicode type
to override the __unicode__() method. (Implemented by
Victor Stinner; issue 1583863.)
When using @classmethod and @staticmethod to wrap
methods as class or static methods, the wrapper object now
exposes the wrapped function as its __func__ attribute.
(Contributed by Amaury Forgeot d’Arc, after a suggestion by
George Sakkis; issue 5982.)
When a restricted set of attributes was declared using __slots__,
deleting an unset attribute would not raise AttributeError
as you would expect. (Fixed by Benjamin Peterson; issue 7604.)
Two new encodings are now supported: “cp720”, used primarily for
Arabic text; and “cp858”, a variant of CP 850 that adds the euro
symbol. (CP720 contributed by Alexander Belchenko and Amaury
Forgeot d’Arc in issue 1616979; CP858 contributed by Tim Hatch in
issue 8016.)
The file object will now set the filename attribute
on the IOError exception when trying to open a directory
on POSIX platforms (noted by Jan Kaliszewski; issue 4764), and
now explicitly checks for and forbids writing to read-only file objects
instead of trusting the C library to catch and report the error
(fixed by Stefan Krah; issue 5677).
The Python tokenizer now translates line endings itself, so the
compile() built-in function now accepts code using any
line-ending convention. Additionally, it no longer requires that the
code end in a newline.
Extra parentheses in function definitions are illegal in Python 3.x,
meaning that you get a syntax error from def f((x)): pass. In
Python 3 warning mode (the -3 switch), Python 2.7 will now warn about this odd usage.
(Noted by James Lingard; issue 7362.)
It’s now possible to create weak references to old-style class
objects. New-style classes were always weak-referenceable. (Fixed
by Antoine Pitrou; issue 8268.)
When a module object is garbage-collected, the module’s dictionary is
now only cleared if no one else is holding a reference to the
dictionary (issue 7140).
A new environment variable, PYTHONWARNINGS,
allows controlling warnings. It should be set to a string
containing warning settings, equivalent to those
used with the -W switch, separated by commas.
(Contributed by Brian Curtin; issue 7301.)
For example, the following setting will print warnings every time
they occur, but turn warnings from the Cookie module into an
error. (The exact syntax for setting an environment variable varies
across operating systems and shells.)
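A sketch of such a setting under a Bourne-style shell (each comma-separated entry follows the -W option’s action:message:category:module:lineno format):

export PYTHONWARNINGS=all,error:::Cookie:0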
A new opcode was added to perform the initial setup for
with statements, looking up the __enter__() and
__exit__() methods. (Contributed by Benjamin Peterson.)
The garbage collector now performs better for one common usage
pattern: when many objects are being allocated without deallocating
any of them. This would previously take quadratic
time for garbage collection, but now the number of full garbage collections
is reduced as the number of objects on the heap grows.
The new logic only performs a full garbage collection pass when
the middle generation has been collected 10 times and when the
number of survivor objects from the middle generation exceeds 10% of
the number of objects in the oldest generation. (Suggested by Martin
von Löwis and implemented by Antoine Pitrou; issue 4074.)
The garbage collector tries to avoid tracking simple containers
which can’t be part of a cycle. In Python 2.7, this is now true for
tuples and dicts containing atomic types (such as ints, strings,
etc.). Transitively, a dict containing tuples of atomic types won’t
be tracked either. This helps reduce the cost of each
garbage collection by decreasing the number of objects to be
considered and traversed by the collector.
(Contributed by Antoine Pitrou; issue 4688.)
Long integers are now stored internally either in base 2**15 or in base
2**30, the base being determined at build time. Previously, they
were always stored in base 2**15. Using base 2**30 gives
significant performance improvements on 64-bit machines, but
benchmark results on 32-bit machines have been mixed. Therefore,
the default is to use base 2**30 on 64-bit machines and base 2**15
on 32-bit machines; on Unix, there’s a new configure option
--enable-big-digits that can be used to override this default.
Apart from the performance improvements this change should be
invisible to end users, with one exception: for testing and
debugging purposes there’s a new structseq sys.long_info that
provides information about the internal format, giving the number of
bits per digit and the size in bytes of the C type used to store
each digit:
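On a 64-bit build, for example, this typically shows:

>>> import sys
>>> sys.long_info
sys.long_info(bits_per_digit=30, sizeof_digit=4)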
Another set of changes made long objects a few bytes smaller: 2 bytes
smaller on 32-bit systems and 6 bytes on 64-bit.
(Contributed by Mark Dickinson; issue 5260.)
The division algorithm for long integers has been made faster
by tightening the inner loop, doing shifts instead of multiplications,
and fixing an unnecessary extra iteration.
Various benchmarks show speedups of between 50% and 150% for long
integer divisions and modulo operations.
(Contributed by Mark Dickinson; issue 5512.)
Bitwise operations are also significantly faster (initial patch by
Gregory Smith; issue 1087418).
The implementation of % checks for the left-side operand being
a Python string and special-cases it; this results in a 1-3%
performance increase for applications that frequently use %
with strings, such as templating libraries.
(Implemented by Collin Winter; issue 5176.)
List comprehensions with an if condition are compiled into
faster bytecode. (Patch by Antoine Pitrou, back-ported to 2.7
by Jeffrey Yasskin; issue 4715.)
Converting an integer or long integer to a decimal string was made
faster by special-casing base 10 instead of using a generalized
conversion function that supports arbitrary bases.
(Patch by Gawain Bolton; issue 6713.)
The split(), replace(), rindex(),
rpartition(), and rsplit() methods of string-like types
(strings, Unicode strings, and bytearray objects) now use a
fast reverse-search algorithm instead of a character-by-character
scan. This is sometimes faster by a factor of 10. (Added by
Florent Xicluna; issue 7462 and issue 7622.)
The pickle and cPickle modules now automatically
intern the strings used for attribute names, reducing memory usage
of the objects resulting from unpickling. (Contributed by Jake
McGuire; issue 5084.)
The cPickle module now special-cases dictionaries,
nearly halving the time required to pickle them.
(Contributed by Collin Winter; issue 5670.)
As in every release, Python’s standard library received a number of
enhancements and bug fixes. Here’s a partial list of the most notable
changes, sorted alphabetically by module name. Consult the
Misc/NEWS file in the source tree for a more complete list of
changes, or look through the Subversion logs for all the details.
The bdb module’s base debugging class Bdb
gained a feature for skipping modules. The constructor
now takes an iterable containing glob-style patterns such as
django.*; the debugger will not step into stack frames
from a module that matches one of these patterns.
(Contributed by Maru Newby after a suggestion by
Senthil Kumaran; issue 5142.)
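A minimal sketch using pdb.Pdb, which passes the patterns through to Bdb (the pattern list here is hypothetical):

import pdb

# The debugger will not step into frames from any module whose
# name matches one of these glob-style patterns.
debugger = pdb.Pdb(skip=['django.*'])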
The binascii module now supports the buffer API, so it can be
used with memoryview instances and other similar buffer objects.
(Backported from 3.x by Florent Xicluna; issue 7703.)
Updated module: the bsddb module has been updated from 4.7.2devel9
to version 4.8.4 of
the pybsddb package.
The new version features better Python 3.x compatibility, various bug fixes,
and adds several new BerkeleyDB flags and methods.
(Updated by Jesús Cea Avión; issue 8156. The pybsddb
changelog can be read at http://hg.jcea.es/pybsddb/file/tip/ChangeLog.)
The bz2 module’s BZ2File now supports the context
management protocol, so you can write with bz2.BZ2File(...) as f:.
(Contributed by Hagen Fürstenau; issue 3860.)
New class: the Counter class in the collections
module is useful for tallying data. Counter instances
behave mostly like dictionaries but return zero for missing keys instead of
raising a KeyError:
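A brief hypothetical session:

>>> from collections import Counter
>>> c = Counter('abracadabra')    # tally the letters
>>> c['a']
5
>>> c['z']                        # a missing key returns zero, not KeyError
0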
There are three additional Counter methods.
most_common() returns the N most common
elements and their counts. elements()
returns an iterator over the contained elements, repeating each
element as many times as its count.
subtract() takes an iterable and
subtracts one for each element instead of adding; if the argument is
a dictionary or another Counter, the counts are
subtracted.
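Continuing the hypothetical session above:

>>> c.most_common(1)
[('a', 5)]
>>> sorted(c.elements())[:6]      # each element repeated by its count
['a', 'a', 'a', 'a', 'a', 'b']
>>> c.subtract('aaa')
>>> c['a']
2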
New method: The deque data type now has a
count() method that returns the number of
contained elements equal to the supplied argument x, and a
reverse() method that reverses the elements
of the deque in-place. deque also exposes its maximum
length as the read-only maxlen attribute.
(Both features added by Raymond Hettinger.)
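A small illustrative session:

>>> from collections import deque
>>> d = deque('abcab', maxlen=10)
>>> d.count('a')
2
>>> d.reverse()
>>> d
deque(['b', 'a', 'c', 'b', 'a'], maxlen=10)
>>> d.maxlen
10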
The namedtuple class now has an optional rename parameter.
If rename is true, field names that are invalid because they’ve
been repeated or aren’t legal Python identifiers will be
renamed to legal names that are derived from the field’s
position within the list of fields:
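For instance (hypothetical field names):

>>> from collections import namedtuple
>>> T = namedtuple('T', ['abc', 'def', 'ghi', 'abc'], rename=True)
>>> T._fields     # 'def' is a keyword and the second 'abc' is a duplicate
('abc', '_1', 'ghi', '_3')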
Finally, the Mapping abstract base class now
returns NotImplemented if a mapping is compared to
another type that isn’t a Mapping.
(Fixed by Daniel Stutzbach; issue 8729.)
Constructors for the parsing classes in the ConfigParser module now
take an allow_no_value parameter, defaulting to false; if true,
options without values will be allowed. For example:
>>> import ConfigParser, StringIO
>>> sample_config = """
... [mysqld]
... user = mysql
... pid-file = /var/run/mysqld/mysqld.pid
... skip-bdb
... """
>>> config = ConfigParser.RawConfigParser(allow_no_value=True)
>>> config.readfp(StringIO.StringIO(sample_config))
>>> config.get('mysqld', 'user')
'mysql'
>>> print config.get('mysqld', 'skip-bdb')
None
>>> print config.get('mysqld', 'unknown')
Traceback (most recent call last):
  ...
NoOptionError: No option 'unknown' in section: 'mysqld'
Deprecated function: contextlib.nested(), which allows
handling more than one context manager with a single with
statement, has been deprecated, because the with statement
now supports multiple context managers.
The cookielib module now ignores cookies that have an invalid
version field, one that doesn’t contain an integer value. (Fixed by
John J. Lee; issue 3924.)
The copy module’s deepcopy() function will now
correctly copy bound instance methods. (Implemented by
Robert Collins; issue 1515.)
The ctypes module now always converts None to a C NULL
pointer for arguments declared as pointers. (Changed by Thomas
Heller; issue 4606.) The underlying libffi library has been updated to version
3.0.9, containing various fixes for different platforms. (Updated
by Matthias Klose; issue 8142.)
New method: the datetime module’s timedelta class
gained a total_seconds() method that returns the
number of seconds in the duration. (Contributed by Brian Quinlan; issue 5788.)
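For example:

>>> from datetime import timedelta
>>> timedelta(days=1, seconds=30).total_seconds()
86430.0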
New method: the Decimal class gained a
from_float() class method that performs an exact
conversion of a floating-point number to a Decimal.
This exact conversion strives for the
closest decimal approximation to the floating-point representation’s value;
the resulting decimal value will therefore still include the inaccuracy,
if any.
For example, Decimal.from_float(0.1) returns
Decimal('0.1000000000000000055511151231257827021181583404541015625').
(Implemented by Raymond Hettinger; issue 4796.)
Comparing instances of Decimal with floating-point
numbers now produces sensible results based on the numeric values
of the operands. Previously such comparisons would fall back to
Python’s default rules for comparing objects, which produced arbitrary
results based on their type. Note that you still cannot combine
Decimal and floating-point in other operations such as addition,
since you should be explicitly choosing how to convert between float and
Decimal.
(Fixed by Mark Dickinson; issue 2531.)
The constructor for Decimal now accepts
floating-point numbers (added by Raymond Hettinger; issue 8257)
and non-European Unicode characters such as Arabic-Indic digits
(contributed by Mark Dickinson; issue 6595).
Most of the methods of the Context class now accept integers
as well as Decimal instances; the only exceptions are the
canonical() and is_canonical()
methods. (Patch by Juan José Conti; issue 7633.)
When using Decimal instances with a string’s
format() method, the default alignment was previously
left-alignment. This has been changed to right-alignment, which is
more sensible for numeric types. (Changed by Mark Dickinson; issue 6857.)
Comparisons involving a signaling NaN value (or sNAN) now signal
InvalidOperation instead of silently returning a true or
false value depending on the comparison operator. Quiet NaN values
(or NaN) are now hashable. (Fixed by Mark Dickinson;
issue 7279.)
The difflib module now produces output that is more
compatible with modern diff/patch tools
through one small change, using a tab character instead of spaces as
a separator in the header giving the filename. (Fixed by Anatoly
Techtonik; issue 7585.)
The Distutils sdist command now always regenerates the
MANIFEST file, since even if the MANIFEST.in or
setup.py files haven’t been modified, the user might have
created some new files that should be included.
(Fixed by Tarek Ziadé; issue 8688.)
The doctest module’s IGNORE_EXCEPTION_DETAIL flag
will now ignore the name of the module containing the exception
being tested. (Patch by Lennart Regebro; issue 7490.)
The email module’s Message class will
now accept a Unicode-valued payload, automatically converting the
payload to the encoding specified by output_charset.
(Added by R. David Murray; issue 1368247.)
The Fraction class now accepts a single float or
Decimal instance, or two rational numbers, as
arguments to its constructor. (Implemented by Mark Dickinson;
rationals added in issue 5812, and float/decimal in
issue 8294.)
Ordering comparisons (<, <=, >, >=) between
fractions and complex numbers now raise a TypeError.
This fixes an oversight, making the Fraction match the other
numeric types.
New class: FTP_TLS in
the ftplib module provides secure FTP
connections using TLS encapsulation of authentication as well as
subsequent control and data transfers.
(Contributed by Giampaolo Rodola; issue 2054.)
The storbinary() method for binary uploads can now restart
uploads thanks to an added rest parameter (patch by Pablo Mouzo;
issue 6845.)
New class decorator: total_ordering() in the functools
module takes a class that defines an __eq__() method and one of
__lt__(), __le__(), __gt__(), or __ge__(),
and generates the missing comparison methods. Since the
__cmp__() method is being deprecated in Python 3.x,
this decorator makes it easier to define ordered classes.
(Added by Raymond Hettinger; issue 5479.)
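A minimal sketch, using a hypothetical Version class:

from functools import total_ordering

@total_ordering
class Version(object):
    # Only __eq__ and __lt__ are written out; the decorator
    # generates __le__, __gt__, and __ge__ from them.
    def __init__(self, number):
        self.number = number
    def __eq__(self, other):
        return self.number == other.number
    def __lt__(self, other):
        return self.number < other.number

print Version(1) <= Version(2)   # True, via the generated __le__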
New function: cmp_to_key() will take an old-style comparison
function that expects two arguments and return a new callable that
can be used as the key parameter to functions such as
sorted(), min() and max(), etc. The primary
intended use is to help with making code compatible with Python 3.x.
(Added by Raymond Hettinger.)
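An illustrative call, wrapping the built-in cmp() on string lengths:

>>> from functools import cmp_to_key
>>> sorted(['three', 'ten', 'one'],
...        key=cmp_to_key(lambda a, b: cmp(len(a), len(b))))
['ten', 'one', 'three']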
New function: the gc module’s is_tracked() returns
true if a given instance is tracked by the garbage collector, false
otherwise. (Contributed by Antoine Pitrou; issue 4688.)
The gzip module’s GzipFile now supports the context
management protocol, so you can write with gzip.GzipFile(...) as f:
(contributed by Hagen Fürstenau; issue 3860), and it now implements
the io.BufferedIOBase ABC, so you can wrap it with
io.BufferedReader for faster processing
(contributed by Nir Aides; issue 7471).
It’s also now possible to override the modification time
recorded in a gzipped file by providing an optional timestamp to
the constructor. (Contributed by Jacques Frechet; issue 4272.)
Files in gzip format can be padded with trailing zero bytes; the
gzip module will now consume these trailing bytes. (Fixed by
Tadek Pietraszek and Brian Curtin; issue 2846.)
New attribute: the hashlib module now has an algorithms
attribute containing a tuple naming the supported algorithms.
In Python 2.7, hashlib.algorithms contains
('md5', 'sha1', 'sha224', 'sha256', 'sha384', 'sha512').
(Contributed by Carl Chenet; issue 7418.)
The default HTTPResponse class used by the httplib module now
supports buffering, resulting in much faster reading of HTTP responses.
(Contributed by Kristján Valur Jónsson; issue 4879.)
The HTTPConnection and HTTPSConnection classes
now support a source_address parameter, a (host,port) 2-tuple
giving the source address that will be used for the connection.
(Contributed by Eldon Ziegler; issue 3972.)
The ihooks module now supports relative imports. Note that
ihooks is an older module for customizing imports,
superseded by the imputil module added in Python 2.0.
(Relative import support added by Neil Schemenauer.)
The imaplib module now supports IPv6 addresses.
(Contributed by Derek Morr; issue 1655.)
New function: the inspect module’s getcallargs()
takes a callable and its positional and keyword arguments,
and figures out which of the callable’s parameters will receive each argument,
returning a dictionary mapping argument names to their values. For example:
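A hypothetical session (items sorted for a deterministic display):

>>> from inspect import getcallargs
>>> def f(a, b=1, *pos, **named):
...     pass
...
>>> sorted(getcallargs(f, 1, 2, 3).items())
[('a', 1), ('b', 2), ('named', {}), ('pos', (3,))]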
Updated module: The io library has been upgraded to the version shipped with
Python 3.1. For 3.1, the I/O library was entirely rewritten in C
and is 2 to 20 times faster depending on the task being performed. The
original Python version was renamed to the _pyio module.
One minor resulting change: the io.TextIOBase class now
has an errors attribute giving the error setting
used for encoding and decoding errors (one of 'strict', 'replace',
'ignore').
The io.FileIO class now raises an OSError when passed
an invalid file descriptor. (Implemented by Benjamin Peterson;
issue 4991.) The truncate() method now preserves the
file position; previously it would change the file position to the
end of the new file. (Fixed by Pascal Chambon; issue 6939.)
New function: itertools.compress(data,selectors) takes two
iterators. Elements of data are returned if the corresponding
value in selectors is true:
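For instance:

>>> from itertools import compress
>>> list(compress('ABCDEF', [1, 0, 1, 0, 1, 1]))
['A', 'C', 'E', 'F']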
New function: itertools.combinations_with_replacement(iter,r)
returns all the possible r-length combinations of elements from the
iterable iter. Unlike combinations(), individual elements
can be repeated in the generated combinations:
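For instance:

>>> from itertools import combinations_with_replacement
>>> list(combinations_with_replacement('abc', 2))
[('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'b'), ('b', 'c'), ('c', 'c')]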
Note that elements are treated as unique depending on their position
in the input, not their actual values.
The itertools.count() function now has a step argument that
allows incrementing by values other than 1. count() also
now allows keyword arguments, and using non-integer values such as
floats or Decimal instances. (Implemented by Raymond
Hettinger; issue 5032.)
Updated module: The json module was upgraded to version 2.0.9 of the
simplejson package, which includes a C extension that makes
encoding and decoding faster.
(Contributed by Bob Ippolito; issue 4136.)
To support the new collections.OrderedDict type, json.load()
now has an optional object_pairs_hook parameter that will be called
with any object literal that decodes to a list of pairs.
(Contributed by Raymond Hettinger; issue 5381.)
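A brief illustrative use, preserving key order while decoding (json.loads() accepts the same parameter):

>>> import json
>>> from collections import OrderedDict
>>> json.loads('{"b": 1, "a": 2}', object_pairs_hook=OrderedDict)
OrderedDict([(u'b', 1), (u'a', 2)])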
The mailbox module’s Maildir class now records the
timestamp on the directories it reads, and only re-reads them if the
modification time has subsequently changed. This improves
performance by avoiding unneeded directory scans. (Fixed by
A.M. Kuchling and Antoine Pitrou; issue 1607951, issue 6896.)
New functions: the math module gained
erf() and erfc() for the error function and the complementary error function,
expm1() which computes e**x-1 with more precision than
using exp() and subtracting 1,
gamma() for the Gamma function, and
lgamma() for the natural log of the Gamma function.
(Contributed by Mark Dickinson and nirinA raseliarison; issue 3366.)
The multiprocessing module’s Manager* classes
can now be passed a callable that will be called whenever
a subprocess is started, along with a set of arguments that will be
passed to the callable.
(Contributed by lekma; issue 5585.)
The Pool class, which controls a pool of worker processes,
now has an optional maxtasksperchild parameter. Worker processes
will perform the specified number of tasks and then exit, causing the
Pool to start a new worker. This is useful if tasks may leak
memory or other resources, or if some tasks will cause the worker to
become very large.
(Contributed by Charles Cazabon; issue 6963.)
The nntplib module now supports IPv6 addresses.
(Contributed by Derek Morr; issue 1664.)
New functions: the os module wraps the following POSIX system
calls: getresgid() and getresuid(), which return the
real, effective, and saved GIDs and UIDs;
setresgid() and setresuid(), which set
real, effective, and saved GIDs and UIDs to new values;
initgroups(), which initializes the group access list
for the current process. (GID/UID functions
contributed by Travis H.; issue 6508. Support for initgroups added
by Jean-Paul Calderone; issue 7333.)
The os.fork() function now re-initializes the import lock in
the child process; this fixes problems on Solaris when fork()
is called from a thread. (Fixed by Zsolt Cserna; issue 7242.)
In the os.path module, the normpath() and
abspath() functions now preserve Unicode; if their input path
is a Unicode string, the return value is also a Unicode string.
(normpath() fixed by Matt Giuca in issue 5827;
abspath() fixed by Ezio Melotti in issue 3426.)
The pydoc module now has help for the various symbols that Python
uses. You can now do help('<<') or help('@'), for example.
(Contributed by David Laban; issue 4739.)
The re module’s split(), sub(), and subn()
now accept an optional flags argument, for consistency with the
other functions in the module. (Added by Gregory P. Smith.)
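For example (illustrative):

>>> import re
>>> re.split('[a-z]+', 'AbcDef123GhI456', flags=re.IGNORECASE)
['', '123', '456']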
New function: run_path() in the runpy module
will execute the code at a provided path argument. path can be
the path of a Python source file (example.py), a compiled
bytecode file (example.pyc), a directory
(./package/), or a zip archive (example.zip). If a
directory or zip path is provided, it will be added to the front of
sys.path and the module __main__ will be imported. It’s
expected that the directory or zip contains a __main__.py;
if it doesn’t, some other __main__.py might be imported from
a location later in sys.path. This makes more of the machinery
of runpy available to scripts that want to mimic the way
Python’s command line processes an explicit path name.
(Added by Nick Coghlan; issue 6816.)
New function: in the shutil module, make_archive()
takes a filename, archive type (zip or tar-format), and a directory
path, and creates an archive containing the directory’s contents.
(Added by Tarek Ziadé.)
shutil‘s copyfile() and copytree()
functions now raise a SpecialFileError exception when
asked to copy a named pipe. Previously the code would treat
named pipes like a regular file by opening them for reading, and
this would block indefinitely. (Fixed by Antoine Pitrou; issue 3002.)
The signal module no longer re-installs the signal handler
unless this is truly necessary, which fixes a bug that could make it
impossible to catch the EINTR signal robustly. (Fixed by
Charles-Francois Natali; issue 8354.)
New functions: in the site module, three new functions
return various site- and user-specific paths.
getsitepackages() returns a list containing all
global site-packages directories,
getusersitepackages() returns the path of the user’s
site-packages directory, and
getuserbase() returns the value of the USER_BASE
environment variable, giving the path to a directory that can be used
to store data.
(Contributed by Tarek Ziadé; issue 6693.)
The site module now reports exceptions occurring
when the sitecustomize module is imported, and will no longer
catch and swallow the KeyboardInterrupt exception. (Fixed by
Victor Stinner; issue 3137.)
The socket module’s create_connection() function
gained a source_address parameter, a (host,port) 2-tuple
giving the source address that will be used for the connection.
(Contributed by Eldon Ziegler; issue 3972.)
The SocketServer module’s TCPServer class now
supports socket timeouts and disabling the Nagle algorithm.
The disable_nagle_algorithm class attribute
defaults to False; if overridden to be True,
new request connections will have the TCP_NODELAY option set to
prevent buffering many small sends into a single TCP packet.
The timeout class attribute can hold
a timeout in seconds that will be applied to the request socket; if
no request is received within that time, handle_timeout()
will be called and handle_request() will return.
(Contributed by Kristján Valur Jónsson; issue 6192 and issue 6267.)
Updated module: the sqlite3 module has been updated to
version 2.6.0 of the pysqlite package. Version 2.6.0 includes a number of bugfixes, and adds
the ability to load SQLite extensions from shared libraries.
Call the enable_load_extension(True) method to enable extensions,
and then call load_extension() to load a particular shared library.
(Updated by Gerhard Häring.)
The ssl module’s ssl.SSLSocket objects now support the
buffer API, which fixed a test suite failure (fix by Antoine Pitrou;
issue 7133) and automatically set
OpenSSL’s SSL_MODE_AUTO_RETRY, which will prevent an error
code being returned from recv() operations that trigger an SSL
renegotiation (fix by Antoine Pitrou; issue 8222).
The ssl.wrap_socket() constructor function now takes a
ciphers argument that’s a string listing the encryption algorithms
to be allowed; the format of the string is described
in the OpenSSL documentation.
(Added by Antoine Pitrou; issue 8322.)
Another change makes the extension load all of OpenSSL’s ciphers and
digest algorithms so that they’re all available. Some SSL
certificates couldn’t be verified, reporting an “unknown algorithm”
error. (Reported by Beda Kosata, and fixed by Antoine Pitrou;
issue 8484.)
The struct module will no longer silently ignore overflow
errors when a value is too large for a particular integer format
code (one of bBhHiIlLqQ); it now always raises a
struct.error exception. (Changed by Mark Dickinson;
issue 1523.) The pack() function will also
attempt to use __index__() to convert and pack non-integers
before trying the __int__() method or reporting an error.
(Changed by Mark Dickinson; issue 8300.)
New function: the subprocess module’s
check_output() runs a command with a specified set of arguments
and returns the command’s output as a string when the command runs without
error, or raises a CalledProcessError exception otherwise.
>>> subprocess.check_output(['df', '-h', '.'])
'Filesystem     Size   Used  Avail Capacity  Mounted on\n/dev/disk0s2    52G    49G   3.0G    94%    /\n'

>>> subprocess.check_output(['df', '-h', '/bogus'])
  ...
subprocess.CalledProcessError: Command '['df', '-h', '/bogus']' returned non-zero exit status 1
(Contributed by Gregory P. Smith.)
The subprocess module will now retry its internal system calls
on receiving an EINTR signal. (Reported by several people; final
patch by Gregory P. Smith in issue 1068268.)
New function: is_declared_global() in the symtable module
returns true for variables that are explicitly declared to be global,
false for ones that are implicitly global.
(Contributed by Jeremy Hylton.)
The syslog module will now use the value of sys.argv[0] as the
identifier instead of the previous default value of 'python'.
(Changed by Sean Reifschneider; issue 8451.)
The sys.version_info value is now a named tuple, with attributes
named major, minor, micro,
releaselevel, and serial. (Contributed by Ross
Light; issue 4285.)
sys.getwindowsversion() also returns a named tuple,
with attributes named major, minor, build,
platform, service_pack, service_pack_major,
service_pack_minor, suite_mask, and
product_type. (Contributed by Brian Curtin; issue 7766.)
The tarfile module’s default error handling has changed, to
no longer suppress fatal errors. The default error level was previously 0,
which meant that errors would only result in a message being written to the
debug log, but because the debug log is not activated by default,
these errors went unnoticed. The default error level is now 1,
which raises an exception if there’s an error.
(Changed by Lars Gustäbel; issue 7357.)
tarfile now supports filtering the TarInfo
objects being added to a tar file. When you call add(),
you may supply an optional filter argument
that’s a callable. The filter callable will be passed the
TarInfo for every file being added, and can modify and return it.
If the callable returns None, the file will be excluded from the
resulting archive. This is more powerful than the existing
exclude argument, which has therefore been deprecated.
(Added by Lars Gustäbel; issue 6856.)
The TarFile class also now supports the context manager protocol.
(Added by Lars Gustäbel; issue 7232.)
The wait() method of the threading.Event class
now returns the internal flag on exit. This means the method will usually
return true because wait() is supposed to block until the
internal flag becomes true. The return value will only be false if
a timeout was provided and the operation timed out.
(Contributed by Tim Lesher; issue 1674032.)
The Unicode database provided by the unicodedata module is
now used internally to determine which characters are numeric,
whitespace, or represent line breaks. The database also
includes information from the Unihan.txt data file (patch
by Anders Chrigström and Amaury Forgeot d’Arc; issue 1571184)
and has been updated to version 5.2.0 (updated by
Florent Xicluna; issue 8024).
The urlparse module’s urlsplit() now handles
unknown URL schemes in a fashion compliant with RFC 3986: if the
URL is of the form "<something>://...", the text before the
:// is treated as the scheme, even if it’s a made-up scheme that
the module doesn’t know about. This change may break code that
worked around the old behaviour. For example, Python 2.6.4 or 2.5
will return the following:
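A sketch of the difference (results coerced to plain tuples for display; reconstructed from the description above, so the exact output is an assumption):

>>> import urlparse
>>> tuple(urlparse.urlsplit('invented://host/filename?query'))   # 2.5 / 2.6.4
('invented', '', '//host/filename?query', '', '')
>>> tuple(urlparse.urlsplit('invented://host/filename?query'))   # 2.7
('invented', 'host', '/filename?query', '', '')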
New class: the WeakSet class in the weakref
module is a set that only holds weak references to its elements; elements
will be removed once there are no references pointing to them.
(Originally implemented in Python 3.x by Raymond Hettinger, and backported
to 2.7 by Michael Foord.)
The ElementTree library, xml.etree, no longer escapes
ampersands and angle brackets when outputting an XML processing
instruction (which looks like <?xml-stylesheet href="#style1"?>)
or comment (which looks like <!--comment-->).
(Patch by Neil Muller; issue 2746.)
The XML-RPC client and server, provided by the xmlrpclib and
SimpleXMLRPCServer modules, have improved performance by
supporting HTTP/1.1 keep-alive and by optionally using gzip encoding
to compress the XML being exchanged. The gzip compression is
controlled by the encode_threshold attribute of
SimpleXMLRPCRequestHandler, which contains a size in bytes;
responses larger than this will be compressed.
(Contributed by Kristján Valur Jónsson; issue 6267.)
The zipfile module’s ZipFile now supports the context
management protocol, so you can write with zipfile.ZipFile(...) as f:.
(Contributed by Brian Curtin; issue 5511.)
zipfile now also supports archiving empty directories and
extracts them correctly. (Fixed by Kuba Wieczorek; issue 4710.)
Reading files out of an archive is faster, and interleaving
read() and readline() now works correctly.
(Contributed by Nir Aides; issue 7610.)
The is_zipfile() function now
accepts a file object, in addition to the path names accepted in earlier
versions. (Contributed by Gabriel Genellina; issue 4756.)
The writestr() method now has an optional compress_type parameter
that lets you override the default compression method specified in the
ZipFile constructor. (Contributed by Ronald Oussoren;
issue 6003.)
Python 3.1 includes the importlib package, a re-implementation
of the logic underlying Python’s import statement.
importlib is useful for implementors of Python interpreters and
to users who wish to write new importers that can participate in the
import process. Python 2.7 doesn’t contain the complete
importlib package, but instead has a tiny subset that contains
a single function, import_module().
import_module(name,package=None) imports a module. name is
a string containing the module or package’s name. It’s possible to do
relative imports by providing a string that begins with a .
character, such as ..utils.errors. For relative imports, the
package argument must be provided and is the name of the package that
will be used as the anchor for
the relative import. import_module() both inserts the imported
module into sys.modules and returns the module object.
Here are some examples:
>>> from importlib import import_module
>>> anydbm = import_module('anydbm')            # Standard absolute import
>>> anydbm
<module 'anydbm' from '/p/python/Lib/anydbm.py'>
>>> # Relative import
>>> file_util = import_module('..file_util', 'distutils.command')
>>> file_util
<module 'distutils.file_util' from '/python/Lib/distutils/file_util.pyc'>
importlib was implemented by Brett Cannon and introduced in
Python 3.1.
The sysconfig module has been pulled out of the Distutils
package, becoming a new top-level module in its own right.
sysconfig provides functions for getting information about
Python’s build process: compiler switches, installation paths, the
platform name, and whether Python is running from its source
directory.
Some of the functions in the module are:
get_config_var() returns variables from Python’s
Makefile and the pyconfig.h file.
get_config_vars() returns a dictionary containing
all of the configuration variables.
get_path() returns the configured path for
a particular type of module: the standard library,
site-specific modules, platform-specific modules, etc.
is_python_build() returns true if you’re running a
binary from a Python source tree, and false otherwise.
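A hypothetical session (the returned values vary by platform and build; these are typical for an installed Linux Python):

>>> import sysconfig
>>> sysconfig.get_config_var('SO')
'.so'
>>> sysconfig.get_path('stdlib')
'/usr/lib/python2.7'
>>> sysconfig.is_python_build()
False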
Consult the sysconfig documentation for more details and for
a complete list of functions.
The Distutils package and sysconfig are now maintained by Tarek
Ziadé, who has also started a Distutils2 package (source repository at
http://hg.python.org/distutils2/) for developing a next-generation
version of Distutils.
Tcl/Tk 8.5 includes a set of themed widgets that re-implement basic Tk
widgets but have a more customizable appearance and can therefore more
closely resemble the native platform’s widgets. This widget
set was originally called Tile, but was renamed to Ttk (for “themed Tk”)
on being added to Tcl/Tk release 8.5.
The ttk module was written by Guilherme Polo and added in
issue 2983. An alternate version called Tile.py, written by
Martin Franklin and maintained by Kevin Walzer, was proposed for
inclusion in issue 2618, but the authors argued that Guilherme
Polo’s work was more comprehensive.
The unittest module was greatly enhanced; many
new features were added. Most of these features were implemented
by Michael Foord, unless otherwise noted. The enhanced version of
the module is downloadable separately for use with Python versions 2.4 to 2.6,
packaged as the unittest2 package, from
http://pypi.python.org/pypi/unittest2.
When used from the command line, the module can automatically discover
tests. It’s not as fancy as py.test or
nose, but provides a simple way
to run tests kept within a set of package directories. For example,
the following command will search the test/ subdirectory for
any importable test files named test*.py:
python -m unittest discover -s test
Consult the unittest module documentation for more details.
(Developed in issue 6001.)
The main() function supports some other new options:
-b or --buffer will buffer the standard output
and standard error streams during each test. If the test passes,
any resulting output will be discarded; on failure, the buffered
output will be displayed.
-c or --catch will cause the control-C interrupt
to be handled more gracefully. Instead of interrupting the test
process immediately, the currently running test will be completed
and then the partial results up to the interruption will be reported.
If you’re impatient, a second press of control-C will cause an immediate
interruption.
This control-C handler tries to avoid causing problems when the code
being tested or the tests being run have defined a signal handler of
their own, by noticing that a signal handler was already set and
calling it. If this doesn’t work for you, there’s a
removeHandler() decorator that can be used to mark tests that
should have the control-C handling disabled.
-f or --failfast makes
test execution stop immediately when a test fails instead of
continuing to execute further tests. (Suggested by Cliff Dyer and
implemented by Michael Foord; issue 8074.)
The progress messages now show ‘x’ for expected failures
and ‘u’ for unexpected successes when run in verbose mode.
(Contributed by Benjamin Peterson.)
Test cases can raise the SkipTest exception to skip a
test (issue 1034053).
The error messages for assertEqual(),
assertTrue(), and assertFalse()
failures now provide more information. If you set the
longMessage attribute of your TestCase classes to
True, both the standard error message and any additional message you
provide will be printed for failures. (Added by Michael Foord; issue 5663.)
The assertRaises() method now
returns a context handler when called without providing a callable
object to run. For example, you can write this:
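A minimal sketch of the context-manager form (hypothetical test case):

import unittest

class DictTests(unittest.TestCase):
    def test_missing_key(self):
        # The enclosed block must raise KeyError for the test to pass.
        with self.assertRaises(KeyError):
            {}['foo']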
Module- and class-level setup and teardown fixtures are now supported.
Modules can contain setUpModule() and tearDownModule()
functions. Classes can have setUpClass() and
tearDownClass() methods that must be defined as class methods
(using @classmethod or equivalent). These functions and
methods are invoked when the test runner switches to a test case in a
different module or class.
The methods addCleanup() and
doCleanups() were added.
addCleanup() lets you add cleanup functions that
will be called unconditionally (after setUp() if
setUp() fails, otherwise after tearDown()). This allows
for much simpler resource allocation and deallocation during tests
(issue 5679).
A number of new methods were added that provide more specialized
tests. Many of these methods were written by Google engineers
for use in their test suites; Gregory P. Smith, Michael Foord, and
GvR worked on merging them into Python’s version of unittest.
assertIs() and assertIsNot()
take two values and check whether the two values evaluate to the same object or not.
(Added by Michael Foord; issue 2578.)
assertMultiLineEqual() compares two strings, and if they’re
not equal, displays a helpful comparison that highlights the
differences in the two strings. This comparison is now used by
default when Unicode strings are compared with assertEqual().
assertRegexpMatches() and
assertNotRegexpMatches() check whether the
first argument is a string matching or not matching the regular
expression provided as the second argument (issue 8038).
assertRaisesRegexp() checks whether a particular exception
is raised, and then also checks that the string representation of
the exception matches the provided regular expression.
assertItemsEqual() tests whether two provided sequences
contain the same elements.
assertSetEqual() compares whether two sets are equal, and
only reports the differences between the sets in case of error.
Similarly, assertListEqual() and assertTupleEqual()
compare the specified types and explain any differences without necessarily
printing their full values; these methods are now used by default
when comparing lists and tuples using assertEqual().
More generally, assertSequenceEqual() compares two sequences
and can optionally check whether both sequences are of a
particular type.
assertDictEqual() compares two dictionaries and reports the
differences; it’s now used by default when you compare two dictionaries
using assertEqual(). assertDictContainsSubset() checks whether
all of the key/value pairs in first are found in second.
assertAlmostEqual() and assertNotAlmostEqual() test
whether first and second are approximately equal. This method
can either round their difference to an optionally-specified number
of places (the default is 7) and compare it to zero, or require
the difference to be smaller than a supplied delta value.
A new hook lets you extend the assertEqual() method to handle
new data types. The addTypeEqualityFunc() method takes a type
object and a function. The function will be used when both of the
objects being compared are of the specified type. This function
should compare the two objects and raise an exception if they don’t
match; it’s a good idea for the function to provide additional
information about why the two objects aren’t matching, much as the new
sequence comparison methods do.
unittest.main() now takes an optional exit argument. If
False, main() doesn’t call sys.exit(), allowing
main() to be used from the interactive interpreter.
(Contributed by J. Pablo Fernández; issue 3379.)
With all these changes, the unittest.py was becoming awkwardly
large, so the module was turned into a package and the code split into
several files (by Benjamin Peterson). This doesn’t affect how the
module is imported or used.
The version of the ElementTree library included with Python was updated to
version 1.3. Some of the new features are:
The various parsing functions now take a parser keyword argument
giving an XMLParser instance that will
be used. This makes it possible to override the file’s internal encoding:
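A brief sketch (hypothetical document; the encoding argument overrides whatever the XML declaration claims):

from xml.etree import ElementTree as ET

p = ET.XMLParser(encoding='utf-8')
t = ET.XML("<root><child/></root>", parser=p)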
Errors in parsing XML now raise a ParseError exception, whose
instances have a position attribute
containing a (line, column) tuple giving the location of the problem.
ElementTree’s code for converting trees to a string has been
significantly reworked, making it roughly twice as fast in many
cases. The ElementTree.write() and Element.write() methods now have a method parameter that can be
“xml” (the default), “html”, or “text”. HTML mode will output empty
elements as <empty></empty> instead of <empty/>, and text
mode will skip over elements and only output the text chunks. If
you set the tag attribute of an element to None but
leave its children in place, the element will be omitted when the
tree is written out, so you don’t need to do more extensive rearrangement
to remove a single element.
Namespace handling has also been improved. All xmlns:<whatever>
declarations are now output on the root element, not scattered throughout
the resulting XML. You can set the default namespace for a tree
by setting the default_namespace attribute and can
register new prefixes with register_namespace(). In XML mode,
you can use the true/false xml_declaration parameter to suppress the
XML declaration.
New Element method: extend() appends the items from a
sequence to the element’s children. Elements themselves behave like
sequences, so it’s easy to move children from one element to
another:
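For instance (illustrative session):

>>> from xml.etree import ElementTree as ET
>>> t = ET.XML('<list><item>1</item><item>2</item><item>3</item></list>')
>>> new = ET.XML('<root/>')
>>> new.extend(t)           # append t's children to the new element
>>> ET.tostring(new)
'<root><item>1</item><item>2</item><item>3</item></root>'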
New Element method: iter() yields the children of the
element as a generator. It’s also possible to write for child in elem: to loop over an element’s children. The existing method
getiterator() is now deprecated, as is getchildren()
which constructs and returns a list of children.
New Element method: itertext() yields all chunks of
text that are descendants of the element. For example:
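An illustrative session:

>>> from xml.etree import ElementTree as ET
>>> t = ET.XML('<list><item>1</item><item>2</item><item>3</item></list>')
>>> list(t.itertext())
['1', '2', '3']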
Deprecated: using an element as a Boolean (i.e., if elem:) would
return true if the element had any children, or false if there were
no children. This behaviour is confusing – None is false, but
so is a childless element? – so it will now trigger a
FutureWarning. In your code, you should be explicit: write
len(elem) != 0 if you’re interested in the number of children,
or elem is not None.
Fredrik Lundh develops ElementTree and produced the 1.3 version;
you can read his article describing 1.3 at
http://effbot.org/zone/elementtree-13-intro.htm.
(Florent Xicluna updated the version included with
Python, after discussions on python-dev and in issue 6472.)
Changes to Python’s build process and to the C API include:
The latest release of the GNU Debugger, GDB 7, can be scripted
using Python.
When you begin debugging an executable program P, GDB will look for
a file named P-gdb.py and automatically read it. Dave Malcolm
contributed a python-gdb.py that adds a number of
commands useful when debugging Python itself. For example,
py-up and py-down go up or down one Python stack frame,
which usually corresponds to several C stack frames. py-print
prints the value of a Python variable, and py-bt prints the
Python stack trace. (Added as a result of issue 8032.)
If you use the .gdbinit file provided with Python,
the “pyo” macro in the 2.7 version now works correctly when the thread being
debugged doesn’t hold the GIL; the macro now acquires it before printing.
(Contributed by Victor Stinner; issue 3632.)
Py_AddPendingCall() is now thread-safe, letting any
worker thread submit notifications to the main Python thread. This
is particularly useful for asynchronous IO operations.
(Contributed by Kristján Valur Jónsson; issue 4293.)
New function: PyCode_NewEmpty() creates an empty code object;
only the filename, function name, and first line number are required.
This is useful for extension modules that are attempting to
construct a more useful traceback stack. Previously such
extensions needed to call PyCode_New(), which had many
more arguments. (Added by Jeffrey Yasskin.)
New function: PyErr_NewExceptionWithDoc() creates a new
exception class, just as the existing PyErr_NewException() does,
but takes an extra char* argument containing the docstring for the
new exception class. (Added by ‘lekma’ on the Python bug tracker;
issue 7033.)
New function: PyFrame_GetLineNumber() takes a frame object
and returns the line number that the frame is currently executing.
Previously code would need to get the index of the bytecode
instruction currently executing, and then look up the line number
corresponding to that address. (Added by Jeffrey Yasskin.)
New functions: PyLong_AsLongAndOverflow() and
PyLong_AsLongLongAndOverflow() approximate a Python long
integer as a C long or long long.
If the number is too large to fit into
the output type, an overflow flag is set and returned to the caller.
(Contributed by Case Van Horsen; issue 7528 and issue 7767.)
New function: stemming from the rewrite of string-to-float conversion,
a new PyOS_string_to_double() function was added. The old
PyOS_ascii_strtod() and PyOS_ascii_atof() functions
are now deprecated.
New function: PySys_SetArgvEx() sets the value of
sys.argv and can optionally update sys.path to include the
directory containing the script named by sys.argv[0] depending
on the value of an updatepath parameter.
This function was added to close a security hole for applications
that embed Python. The old function, PySys_SetArgv(), would
always update sys.path, and sometimes it would add the current
directory. This meant that, if you ran an application embedding
Python in a directory controlled by someone else, attackers could
put a Trojan-horse module in the directory (say, a file named
os.py) that your application would then import and run.
If you maintain a C/C++ application that embeds Python, check
whether you’re calling PySys_SetArgv() and carefully consider
whether the application should be using PySys_SetArgvEx()
with updatepath set to false.
Security issue reported as CVE-2008-5983;
discussed in issue 5753, and fixed by Antoine Pitrou.
New macros: the Python header files now define the following macros:
Py_ISALNUM,
Py_ISALPHA,
Py_ISDIGIT,
Py_ISLOWER,
Py_ISSPACE,
Py_ISUPPER,
Py_ISXDIGIT,
and Py_TOLOWER, Py_TOUPPER.
All of these are analogous to the C
standard macros for classifying characters, but ignore the current
locale setting, because in
several places Python needs to analyze characters in a
locale-independent way. (Added by Eric Smith;
issue 5793.)
Removed function: PyEval_CallObject is now only available
as a macro. A function version was being kept around to preserve
ABI linking compatibility, but that was in 1997; it can certainly be
deleted by now. (Removed by Antoine Pitrou; issue 8276.)
New format codes: the PyString_FromFormat(),
PyString_FromFormatV(), and PyErr_Format() functions now
accept %lld and %llu format codes for displaying
C’s long long types.
(Contributed by Mark Dickinson; issue 7228.)
The complicated interaction between threads and process forking has
been changed. Previously, the child process created by
os.fork() might fail because the child is created with only a
single thread running, the thread performing the os.fork().
If other threads were holding a lock, such as Python’s import lock,
when the fork was performed, the lock would still be marked as
“held” in the new process. But in the child process nothing would
ever release the lock, since the other threads weren’t replicated,
and the child process would no longer be able to perform imports.
Python 2.7 acquires the import lock before performing an
os.fork(), and will also clean up any locks created using the
threading module. C extension modules that have internal
locks, or that call fork() themselves, will not benefit
from this clean-up.
The Py_Finalize() function now calls the internal
threading._shutdown() function; this prevents some exceptions from
being raised when an interpreter shuts down.
(Patch by Adam Olsen; issue 1722344.)
When using the PyMemberDef structure to define attributes
of a type, Python will no longer let you try to delete or set a
T_STRING_INPLACE attribute.
Global symbols defined by the ctypes module are now prefixed
with Py, or with _ctypes. (Implemented by Thomas
Heller; issue 3102.)
New configure option: the --with-system-expat switch allows
building the pyexpat module to use the system Expat library.
(Contributed by Arfrever Frehtes Taifersar Arahesis; issue 7609.)
New configure option: the
--with-valgrind option will now disable the pymalloc
allocator, which is difficult for the Valgrind memory-error detector
to analyze correctly.
Valgrind will therefore be better at detecting memory leaks and
overruns. (Contributed by James Henstridge; issue 2422.)
New configure option: you can now supply an empty string to
--with-dbmliborder= in order to disable all of the various
DBM modules. (Added by Arfrever Frehtes Taifersar Arahesis;
issue 6491.)
The configure script now checks for floating-point rounding bugs
on certain 32-bit Intel chips and defines a X87_DOUBLE_ROUNDING
preprocessor definition. No code currently uses this definition,
but it’s available if anyone wishes to use it.
(Added by Mark Dickinson; issue 2937.)
configure also now sets a LDCXXSHARED Makefile
variable for supporting C++ linking. (Contributed by Arfrever
Frehtes Taifersar Arahesis; issue 1222585.)
The build process now creates the necessary files for pkg-config
support. (Contributed by Clinton Roy; issue 3585.)
The build process now supports Subversion 1.7. (Contributed by
Arfrever Frehtes Taifersar Arahesis; issue 6094.)
Python 3.1 adds a new C datatype, PyCapsule, for providing a
C API to an extension module. A capsule is essentially the holder of
a C void* pointer, and is made available as a module attribute; for
example, the socket module’s API is exposed as socket.CAPI,
and unicodedata exposes ucnhash_CAPI. Other extensions
can import the module, access its dictionary to get the capsule
object, and then get the void* pointer, which will usually point
to an array of pointers to the module’s various API functions.
There is an existing data type already used for this,
PyCObject, but it doesn’t provide type safety. Evil code
written in pure Python could cause a segmentation fault by taking a
PyCObject from module A and somehow substituting it for the
PyCObject in module B. Capsules know their own name,
and getting the pointer requires providing the name:
void *vtable;
if (!PyCapsule_IsValid(capsule, "mymodule.CAPI")) {
PyErr_SetString(PyExc_ValueError, "argument type invalid");
return NULL;
}
vtable = PyCapsule_GetPointer(capsule, "mymodule.CAPI");
You are assured that vtable points to whatever you’re expecting.
If a different capsule was passed in, PyCapsule_IsValid() would
detect the mismatched name and return false. Refer to
Providing a C API for an Extension Module for more information on using these objects.
Python 2.7 now uses capsules internally to provide various
extension-module APIs, but the PyCObject_AsVoidPtr() function was
modified to handle capsules, preserving compile-time compatibility
with the CObject interface. Use of
PyCObject_AsVoidPtr() will signal a
PendingDeprecationWarning, which is silent by default.
Implemented in Python 3.1 and backported to 2.7 by Larry Hastings;
discussed in issue 5630.
The msvcrt module now contains some constants from
the crtassem.h header file:
CRT_ASSEMBLY_VERSION,
VC_ASSEMBLY_PUBLICKEYTOKEN,
and LIBRARIES_ASSEMBLY_NAME_PREFIX.
(Contributed by David Cournapeau; issue 4365.)
The _winreg module for accessing the registry now implements
the CreateKeyEx() and DeleteKeyEx() functions, extended
versions of previously-supported functions that take several extra
arguments. The DisableReflectionKey(),
EnableReflectionKey(), and QueryReflectionKey() functions were also
tested and documented.
(Implemented by Brian Curtin: issue 7347.)
The new _beginthreadex() API is used to start threads, and
the native thread-local storage functions are now used.
(Contributed by Kristján Valur Jónsson; issue 3582.)
The os.kill() function now works on Windows. The signal value
can be the constants CTRL_C_EVENT,
CTRL_BREAK_EVENT, or any integer. The first two constants
will send Control-C and Control-Break keystroke events to
subprocesses; any other value will use the TerminateProcess()
API. (Contributed by Miki Tebeka; issue 1220212.)
The os.listdir() function now correctly fails
for an empty path. (Fixed by Hirokazu Yamamoto; issue 5913.)
The mimetypes module will now read the MIME database from
the Windows registry when initializing.
(Patch by Gabriel Genellina; issue 4969.)
The path /Library/Python/2.7/site-packages is now appended to
sys.path, in order to share added packages between the system
installation and a user-installed copy of the same version.
(Changed by Ronald Oussoren; issue 4865.)
FreeBSD 7.1’s SO_SETFIB constant, used with
getsockopt()/setsockopt() to select an
alternate routing table, is now available in the socket
module. (Added by Kyle VanderBeek; issue 8235.)
Two benchmark scripts, iobench and ccbench, were
added to the Tools directory. iobench measures the
speed of the built-in file I/O objects returned by open()
while performing various operations, and ccbench is a
concurrency benchmark that tries to measure computing throughput,
thread switching latency, and IO processing bandwidth when
performing several tasks using a varying number of threads.
The Tools/i18n/msgfmt.py script now understands plural
forms in .po files. (Fixed by Martin von Löwis;
issue 5464.)
When importing a module from a .pyc or .pyo file
with an existing .py counterpart, the co_filename
attributes of the resulting code objects are overwritten when the
original filename is obsolete. This can happen if the file has been
renamed, moved, or is accessed through different paths. (Patch by
Ziga Seilnacht and Jean-Paul Calderone; issue 1180193.)
The regrtest.py script now takes a --randseed=
switch that takes an integer to be used as the random seed
for the -r option that executes tests in random order.
The -r option also reports the seed that was used.
(Added by Collin Winter.)
Another regrtest.py switch is -j, which
takes an integer specifying how many tests run in parallel. This
allows reducing the total runtime on multi-core machines.
This option is compatible with several other options, including the
-R switch which is known to produce long runtimes.
(Added by Antoine Pitrou, issue 6152.) This can also be used
with a new -F switch that runs selected tests in a loop
until they fail. (Added by Antoine Pitrou; issue 7312.)
When executed as a script, the py_compile.py module now
accepts '-' as an argument, which will read standard input for
the list of filenames to be compiled. (Contributed by Piotr
Ożarowski; issue 8233.)
This section lists previously described changes and other bugfixes
that may require changes to your code:
The range() function processes its arguments more
consistently; it will now call __int__() on non-float,
non-integer arguments that are supplied to it. (Fixed by Alexander
Belopolsky; issue 1533.)
The string format() method changed the default precision used
for floating-point and complex numbers from 6 decimal
places to 12, which matches the precision used by str().
(Changed by Eric Smith; issue 5920.)
Because of an optimization for the with statement, the special
methods __enter__() and __exit__() must belong to the object’s
type, and cannot be directly attached to the object’s instance. This
affects new-style classes (derived from object) and C extension
types. (issue 6101.)
Due to a bug in Python 2.6, the exc_value parameter to
__exit__() methods was often the string representation of the
exception, not an instance. This was fixed in 2.7, so exc_value
will be an instance as expected. (Fixed by Florent Xicluna;
issue 7853.)
When a restricted set of attributes were set using __slots__,
deleting an unset attribute would not raise AttributeError
as you would expect. (Fixed by Benjamin Peterson; issue 7604.)
In the standard library:
Operations with datetime instances that resulted in a year
falling outside the supported range didn’t always raise
OverflowError. Such errors are now checked more carefully
and will now raise the exception. (Reported by Mark Leander, patch
by Anand B. Pillai and Alexander Belopolsky; issue 7150.)
When using Decimal instances with a string’s
format() method, the default alignment was previously
left-alignment. This has been changed to right-alignment, which might
change the output of your programs.
(Changed by Mark Dickinson; issue 6857.)
Comparisons involving a signaling NaN value (or sNaN) now signal
InvalidOperation instead of silently returning a true or
false value depending on the comparison operator. Quiet NaN values
(or NaN) are now hashable. (Fixed by Mark Dickinson;
issue 7279.)
The ElementTree library, xml.etree, no longer escapes
ampersands and angle brackets when outputting an XML processing
instruction (which looks like <?xml-stylesheet href="#style1"?>)
or comment (which looks like <!-- comment -->).
(Patch by Neil Muller; issue 2746.)
The readline() method of StringIO objects now does
nothing when a negative length is requested, as other file-like
objects do. (issue 7348).
The syslog module will now use the value of sys.argv[0] as the
identifier instead of the previous default value of 'python'.
(Changed by Sean Reifschneider; issue 8451.)
The tarfile module’s default error handling has changed, to
no longer suppress fatal errors. The default error level was previously 0,
which meant that errors would only result in a message being written to the
debug log, but because the debug log is not activated by default,
these errors went unnoticed. The default error level is now 1,
which raises an exception if there’s an error.
(Changed by Lars Gustäbel; issue 7357.)
The urlparse module’s urlsplit() now handles
unknown URL schemes in a fashion compliant with RFC 3986: if the
URL is of the form "<something>://...", the text before the
:// is treated as the scheme, even if it’s a made-up scheme that
the module doesn’t know about. This change may break code that
worked around the old behaviour. For example, Python 2.6.4 or 2.5
will return the following:
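>>> import urlparse
>>> urlparse.urlsplit('invented://host/filename?query')
('invented', '', '//host/filename?query', '', '')

Python 2.7 will instead return:

>>> urlparse.urlsplit('invented://host/filename?query')
('invented', 'host', '/filename?query', '', '')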
(Python 2.7 actually produces slightly different output, since it
returns a named tuple instead of a standard tuple.)
For C extensions:
C extensions that use integer format codes with the PyArg_Parse*
family of functions will now raise a TypeError exception
instead of triggering a DeprecationWarning (issue 5080).
Use the new PyOS_string_to_double() function instead of the old
PyOS_ascii_strtod() and PyOS_ascii_atof() functions,
which are now deprecated.
For applications that embed Python:
The PySys_SetArgvEx() function was added, letting
applications close a security hole when the existing
PySys_SetArgv() function was used. Check whether you’re
calling PySys_SetArgv() and carefully consider whether the
application should be using PySys_SetArgvEx() with
updatepath set to false.
The author would like to thank the following people for offering
suggestions, corrections and assistance with various drafts of this
article: Nick Coghlan, Philip Jenvey, Ryan Lovett, R. David Murray,
Hugh Secker-Walker.
This article explains the new features in Python 2.6, released on
October 1, 2008. The release schedule is described in PEP 361.
The major theme of Python 2.6 is preparing the migration path to
Python 3.0, a major redesign of the language. Whenever possible,
Python 2.6 incorporates new features and syntax from 3.0 while
remaining compatible with existing code by not removing older features
or syntax. When it’s not possible to do that, Python 2.6 tries to do
what it can, adding compatibility functions in a
future_builtins module and a -3 switch to warn about
usages that will become unsupported in 3.0.
Some significant new packages have been added to the standard library,
such as the multiprocessing and json modules, but
there aren’t many new features that aren’t related to Python 3.0 in
some way.
Python 2.6 also sees a number of improvements and bugfixes throughout
the source. A search through the change logs finds there were 259
patches applied and 612 bugs fixed between Python 2.5 and 2.6. Both
figures are likely to be underestimates.
This article doesn’t attempt to provide a complete specification of
the new features, but instead provides a convenient overview. For
full details, you should refer to the documentation for Python 2.6. If
you want to understand the rationale for the design and
implementation, refer to the PEP for a particular new feature.
Whenever possible, “What’s New in Python” links to the bug/patch item
for each change.
The development cycle for Python versions 2.6 and 3.0 was
synchronized, with the alpha and beta releases for both versions being
made on the same days. The development of 3.0 has influenced many
features in 2.6.
Python 3.0 is a far-ranging redesign of Python that breaks
compatibility with the 2.x series. This means that existing Python
code will need some conversion in order to run on
Python 3.0. However, not all the changes in 3.0 necessarily break
compatibility. In cases where new features won’t cause existing code
to break, they’ve been backported to 2.6 and are described in this
document in the appropriate place. Some of the 3.0-derived features
are:
A __complex__() method for converting objects to a complex number.
Alternate syntax for catching exceptions: except TypeError as exc.
The addition of functools.reduce() as a synonym for the built-in
reduce() function.
Python 3.0 adds several new built-in functions and changes the
semantics of some existing builtins. Functions that are new in 3.0
such as bin() have simply been added to Python 2.6, but existing
builtins haven’t been changed; instead, the future_builtins
module has versions with the new 3.0 semantics. Code written to be
compatible with 3.0 can do from future_builtins import hex, map as
necessary.
A new command-line switch, -3, enables warnings
about features that will be removed in Python 3.0. You can run code
with this switch to see how much work will be necessary to port
code to 3.0. The value of this switch is available
to Python code as the boolean variable sys.py3kwarning,
and to C extension code as Py_Py3kWarningFlag.
See also
The 3xxx series of PEPs, which contains proposals for Python 3.0.
PEP 3000 describes the development process for Python 3.0.
Start with PEP 3100 that describes the general goals for Python
3.0, and then explore the higher-numbered PEPs that propose
specific features.
While 2.6 was being developed, the Python development process
underwent two significant changes: we switched from SourceForge’s
issue tracker to a customized Roundup installation, and the
documentation was converted from LaTeX to reStructuredText.
For a long time, the Python developers had been growing increasingly
annoyed by SourceForge’s bug tracker. SourceForge’s hosted solution
doesn’t permit much customization; for example, it wasn’t possible to
customize the life cycle of issues.
The infrastructure committee of the Python Software Foundation
therefore posted a call for issue trackers, asking volunteers to set
up different products and import some of the bugs and patches from
SourceForge. Four different trackers were examined: Jira,
Launchpad,
Roundup, and
Trac.
The committee eventually settled on Jira
and Roundup as the two candidates. Jira is a commercial product that
offers no-cost hosted instances to free-software projects; Roundup
is an open-source project that requires volunteers
to administer it and a server to host it.
After posting a call for volunteers, a new Roundup installation was
set up at http://bugs.python.org. One installation of Roundup can
host multiple trackers, and this server now also hosts issue trackers
for Jython and for the Python web site. It will surely find
other uses in the future. Where possible,
this edition of “What’s New in Python” links to the bug/patch
item for each change.
Hosting of the Python bug tracker is kindly provided by
Upfront Systems
of Stellenbosch, South Africa. Martin von Loewis put a
lot of effort into importing existing bugs and patches from
SourceForge; his scripts for this import operation are at
http://svn.python.org/view/tracker/importer/ and may be useful to
other projects wishing to move from SourceForge to Roundup.
New Documentation Format: reStructuredText Using Sphinx
The Python documentation was written using LaTeX since the project
started around 1989. In the 1980s and early 1990s, most documentation
was printed out for later study, not viewed online. LaTeX was widely
used because it provided attractive printed output while remaining
straightforward to write once the basic rules of the markup were
learned.
Today LaTeX is still used for writing publications destined for
printing, but the landscape for programming tools has shifted. We no
longer print out reams of documentation; instead, we browse through it
online and HTML has become the most important format to support.
Unfortunately, converting LaTeX to HTML is fairly complicated and Fred
L. Drake Jr., the long-time Python documentation editor, spent a lot
of time maintaining the conversion process. Occasionally people would
suggest converting the documentation into SGML and later XML, but
performing a good conversion is a major task and no one ever committed
the time required to finish the job.
During the 2.6 development cycle, Georg Brandl put a lot of effort
into building a new toolchain for processing the documentation. The
resulting package is called Sphinx, and is available from
http://sphinx.pocoo.org/.
Sphinx concentrates on HTML output, producing attractively styled and
modern HTML; printed output is still supported through conversion to
LaTeX. The input format is reStructuredText, a markup syntax
supporting custom extensions and directives that is commonly used in
the Python community.
Sphinx is a standalone package that can be used for writing, and
almost two dozen other projects
(listed on the Sphinx web site)
have adopted Sphinx as their documentation tool.
The previous version, Python 2.5, added the ‘with‘
statement as an optional feature, to be enabled by a from __future__ import with_statement directive. In 2.6 the statement no longer needs to
be specially enabled; this means that with is now always a
keyword. The rest of this section is a copy of the corresponding
section from the “What’s New in Python 2.5” document; if you’re
familiar with the ‘with‘ statement
from Python 2.5, you can skip this section.
The ‘with‘ statement clarifies code that previously would use
try...finally blocks to ensure that clean-up code is executed. In this
section, I’ll discuss the statement as it will commonly be used. In the next
section, I’ll examine the implementation details and show how to write objects
for use with this statement.
The ‘with‘ statement is a control-flow structure whose basic
structure is:
with expression [as variable]:
    with-block
The expression is evaluated, and it should result in an object that supports the
context management protocol (that is, has __enter__() and __exit__()
methods).
The object’s __enter__() is called before with-block is executed and
therefore can run set-up code. It also may return a value that is bound to the
name variable, if given. (Note carefully that variable is not assigned
the result of expression.)
After execution of the with-block is finished, the object’s __exit__()
method is called, even if the block raised an exception, and can therefore run
clean-up code.
Some standard Python objects now support the context management protocol and can
be used with the ‘with‘ statement. File objects are one example:
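with open('/etc/passwd', 'r') as f:
    for line in f:
        print line
        # ... more processing code ...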
After this statement has executed, the file object in f will have been
automatically closed, even if the for loop raised an exception part-
way through the block.
Note
In this case, f is the same object created by open(), because
file.__enter__() returns self.
The threading module’s locks and condition variables also support the
‘with‘ statement:
lock = threading.Lock()
with lock:
    # Critical section of code
    ...
The lock is acquired before the block is executed and always released once the
block is complete.
The localcontext() function in the decimal module makes it easy
to save and restore the current decimal context, which encapsulates the desired
precision and rounding characteristics for computations:
from decimal import Decimal, Context, localcontext

# Displays with default precision of 28 digits
v = Decimal('578')
print v.sqrt()

with localcontext(Context(prec=16)):
    # All code in this block uses a precision of 16 digits.
    # The original context is restored on exiting the block.
    print v.sqrt()
Under the hood, the ‘with‘ statement is fairly complicated. Most
people will only use ‘with‘ in company with existing objects and
don’t need to know these details, so you can skip the rest of this section if
you like. Authors of new objects will need to understand the details of the
underlying implementation and should keep reading.
A high-level explanation of the context management protocol is:
The expression is evaluated and should result in an object called a “context
manager”. The context manager must have __enter__() and __exit__()
methods.
The context manager’s __enter__() method is called. The value returned
is assigned to VAR. If no as VAR clause is present, the value is simply
discarded.
The code in BLOCK is executed.
If BLOCK raises an exception, the context manager’s __exit__() method
is called with three arguments, the exception details (type, value, traceback,
the same values returned by sys.exc_info(), which can also be None
if no exception occurred). The method’s return value controls whether an exception
is re-raised: any false value re-raises the exception, and True will result
in suppressing it. You’ll only rarely want to suppress the exception, because
if you do the author of the code containing the ‘with‘ statement will
never realize anything went wrong.
If BLOCK didn’t raise an exception, the __exit__() method is still
called, but type, value, and traceback are all None.
Let’s think through an example. I won’t present detailed code but will only
sketch the methods necessary for a database that supports transactions.
(For people unfamiliar with database terminology: a set of changes to the
database are grouped into a transaction. Transactions can be either committed,
meaning that all the changes are written into the database, or rolled back,
meaning that the changes are all discarded and the database is unchanged. See
any database textbook for more information.)
Let’s assume there’s an object representing a database connection. Our goal will
be to let the user write code like this:
db_connection = DatabaseConnection()
with db_connection as cursor:
    cursor.execute('insert into ...')
    cursor.execute('delete from ...')
    # ... more operations ...
The transaction should be committed if the code in the block runs flawlessly or
rolled back if there’s an exception. Here’s the basic interface for
DatabaseConnection that I’ll assume:
class DatabaseConnection:
    # Database interface
    def cursor(self):
        "Returns a cursor object and starts a new transaction"
    def commit(self):
        "Commits current transaction"
    def rollback(self):
        "Rolls back current transaction"
The __enter__() method is pretty easy, having only to start a new
transaction. For this application the resulting cursor object would be a useful
result, so the method will return it. The user can then add as cursor to
their ‘with‘ statement to bind the cursor to a variable name.
class DatabaseConnection:
    ...
    def __enter__(self):
        # Code to start a new transaction
        cursor = self.cursor()
        return cursor
The __exit__() method is the most complicated because it’s where most of
the work has to be done. The method has to check if an exception occurred. If
there was no exception, the transaction is committed. The transaction is rolled
back if there was an exception.
In the code below, execution will just fall off the end of the function,
returning the default value of None. None is false, so the exception
will be re-raised automatically. If you wished, you could be more explicit and
add a return statement at the marked location.
class DatabaseConnection:
    ...
    def __exit__(self, type, value, tb):
        if tb is None:
            # No exception, so commit
            self.commit()
        else:
            # Exception occurred, so rollback.
            self.rollback()
            # return False
The contextlib module provides some functions and a decorator that
are useful when writing objects for use with the ‘with‘ statement.
The decorator is called contextmanager(), and lets you write a single
generator function instead of defining a new class. The generator should yield
exactly one value. The code up to the yield will be executed as the
__enter__() method, and the value yielded will be the method’s return
value that will get bound to the variable in the ‘with‘ statement’s
as clause, if any. The code after the yield will be
executed in the __exit__() method. Any exception raised in the block will
be raised by the yield statement.
Using this decorator, our database example from the previous section
could be written as:
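from contextlib import contextmanager

@contextmanager
def db_transaction(connection):
    cursor = connection.cursor()
    try:
        yield cursor
    except:
        # The block raised an exception: roll back and re-raise.
        connection.rollback()
        raise
    else:
        # No exception: commit the transaction.
        connection.commit()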
The contextlib module also has a nested(mgr1, mgr2, ...) function
that combines a number of context managers so you don’t need to write nested
‘with‘ statements. In this example, the single ‘with‘
statement both starts a database transaction and acquires a thread lock:
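import threading
from contextlib import nested

lock = threading.Lock()
with nested(db_transaction(db_connection), lock) as (cursor, locked):
    # Both context managers are active inside this block.
    cursor.execute('insert into ...')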
Finally, the closing() function returns its argument so that it can be
bound to a variable, and calls the argument’s .close() method at the end
of the block.
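For example:

import urllib, sys
from contextlib import closing

# The URL here is only illustrative.
with closing(urllib.urlopen('http://www.python.org')) as f:
    for line in f:
        sys.stdout.write(line)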
PEP written by Guido van Rossum and Nick Coghlan; implemented by Mike Bland,
Guido van Rossum, and Neal Norwitz. The PEP shows the code generated for a
‘with‘ statement, which can be helpful in learning how the statement
works.
PEP 366: Explicit Relative Imports From a Main Module
Python’s -m switch allows running a module as a script.
When you ran a module that was located inside a package, relative
imports didn’t work correctly.
The fix for Python 2.6 adds a __package__ attribute to
modules. When this attribute is present, relative imports will be
relative to the value of this attribute instead of the
__name__ attribute.
PEP 302-style importers can then set __package__ as necessary.
The runpy module that implements the -m switch now
does this, so relative imports will now work correctly in scripts
running from inside a package.
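For example, given a hypothetical layout (the package and module names are invented for illustration):

# pkg/__init__.py   (empty)
# pkg/helper.py
# pkg/main.py, which contains an explicit relative import:
from . import helper

# Running "python -m pkg.main" from the directory containing pkg/
# now works, because runpy sets __package__ to 'pkg' before the
# relative import executes.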
When you run Python, the module search path sys.path usually
includes a directory whose path ends in "site-packages". This
directory is intended to hold locally-installed packages available to
all users using a machine or a particular site installation.
Python 2.6 introduces a convention for user-specific site directories.
The directory varies depending on the platform:
Unix and Mac OS X: ~/.local/
Windows: %APPDATA%/Python
Within this directory, there will be version-specific subdirectories,
such as lib/python2.6/site-packages on Unix/Mac OS and
Python26/site-packages on Windows.
If you don’t like the default directory, it can be overridden by an
environment variable. PYTHONUSERBASE sets the root
directory used for all Python versions supporting this feature. On
Windows, the directory for application-specific data can be changed by
setting the APPDATA environment variable. You can also
modify the site.py file for your Python installation.
The feature can be disabled entirely by running Python with the
-s option or setting the PYTHONNOUSERSITE
environment variable.
The new multiprocessing package lets Python programs create new
processes that will perform a computation and return a result to the
parent. The parent and child processes can communicate using queues
and pipes, synchronize their operations using locks and semaphores,
and can share simple arrays of data.
The multiprocessing module started out as an exact emulation of
the threading module using processes instead of threads. That
goal was discarded along the path to Python 2.6, but the general
approach of the module is still similar. The fundamental class
is the Process, which is passed a callable object and
a collection of arguments. The start() method
sets the callable running in a subprocess, after which you can call
the is_alive() method to check whether the subprocess is still running
and the join() method to wait for the process to exit.
Here’s a simple example where the subprocess will calculate a
factorial. The function doing the calculation is written strangely so
that it takes significantly longer when the input argument is a
multiple of 4.
import time
from multiprocessing import Process, Queue

def factorial(queue, N):
    "Compute a factorial."
    # If N is a multiple of 4, this function will take much longer.
    if (N % 4) == 0:
        time.sleep(.05 * N/4)

    # Calculate the result
    fact = 1L
    for i in range(1, N+1):
        fact = fact * i

    # Put the result on the queue
    queue.put(fact)

if __name__ == '__main__':
    queue = Queue()

    N = 5
    p = Process(target=factorial, args=(queue, N))
    p.start()
    p.join()

    result = queue.get()
    print 'Factorial', N, '=', result
A Queue is used to communicate the input parameter N and
the result. The Queue object is stored in a global variable.
The child process inherits the value the variable had when the child
was created; because it's a Queue, parent and child can use
the object to communicate. (If the parent were to change the value of
the global variable, the child’s value would be unaffected, and vice
versa.)
Two other classes, Pool and Manager, provide
higher-level interfaces. Pool will create a fixed number of
worker processes, and requests can then be distributed to the workers
by calling apply() or apply_async() to add a single request,
and map() or map_async() to add a number of
requests. The following code uses a Pool to spread requests
across 5 worker processes and retrieve a list of results:
from multiprocessing import Pool

def factorial(N):
    "Compute a factorial."
    fact = 1L
    for i in range(1, N+1):
        fact = fact * i
    return fact

if __name__ == '__main__':
    p = Pool(5)
    result = p.map(factorial, range(1, 1000, 10))
    for v in result:
        print v
The other high-level interface, the Manager class, creates a
separate server process that can hold master copies of Python data
structures. Other processes can then access and modify these data
structures using proxy objects. The following example creates a
shared dictionary by calling the dict() method; the worker
processes then insert values into the dictionary. (Locking is not
done for you automatically, which doesn’t matter in this example.
Manager‘s methods also include Lock(), RLock(),
and Semaphore() to create shared locks.)
import time
from multiprocessing import Pool, Manager

def factorial(N, dictionary):
    "Compute a factorial."
    # Calculate the result
    fact = 1L
    for i in range(1, N+1):
        fact = fact * i

    # Store result in dictionary
    dictionary[N] = fact

if __name__ == '__main__':
    p = Pool(5)
    mgr = Manager()
    d = mgr.dict()          # Create shared dictionary

    # Run tasks using the pool
    for N in range(1, 1000, 10):
        p.apply_async(factorial, (N, d))

    # Mark pool as closed -- no more tasks can be added.
    p.close()

    # Wait for tasks to exit
    p.join()

    # Output results
    for k, v in sorted(d.items()):
        print k, v
In Python 3.0, the % operator is supplemented by a more powerful string
formatting method, format(). Support for the str.format() method
has been backported to Python 2.6.
In 2.6, both 8-bit and Unicode strings have a .format() method that
treats the string as a template and takes the arguments to be formatted.
The formatting template uses curly brackets ({, }) as special characters:
>>> # Substitute positional argument 0 into the string.
>>> "User ID: {0}".format("root")
'User ID: root'
>>> # Use the named keyword arguments
>>> "User ID: {uid} Last seen: {last_login}".format(
...     uid="root",
...     last_login="5 Mar 2008 07:20")
'User ID: root Last seen: 5 Mar 2008 07:20'
Curly brackets can be escaped by doubling them:
>>> "Empty dict: {{}}".format()"Empty dict: {}"
Field names can be integers indicating positional arguments, such as
{0}, {1}, etc. or names of keyword arguments. You can also
supply compound field names that read attributes or access dictionary keys:
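>>> import sys
>>> 'Platform: {0.platform}'.format(sys)   # attribute access; value varies by platform
'Platform: linux2'
>>> import mimetypes
>>> 'Content-type: {0[.mp4]}'.format(mimetypes.types_map)   # key access
'Content-type: video/mp4'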
Note that when using dictionary-style notation such as [.mp4], you
don’t need to put any quotation marks around the string; it will look
up the value using .mp4 as the key. Strings beginning with a
number will be converted to an integer. You can’t write more
complicated expressions inside a format string.
So far we’ve shown how to specify which field to substitute into the
resulting string. The precise formatting used is also controllable by
adding a colon followed by a format specifier. For example:
>>> # Field 0: left justify, pad to 15 characters
>>> # Field 1: right justify, pad to 6 characters
>>> fmt = '{0:15} ${1:>6}'
>>> fmt.format('Registration', 35)
'Registration    $    35'
>>> fmt.format('Tutorial', 50)
'Tutorial        $    50'
>>> fmt.format('Banquet', 125)
'Banquet         $   125'
Format specifiers can reference other fields through nesting:
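>>> fmt = '{0:{1}}'
>>> width = 15
>>> fmt.format('Invoice #1234', width)
'Invoice #1234  '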
The alignment of a field within the desired width can be specified:
Character      Effect
<  (default)   Left-align
>              Right-align
^              Center
=              (For numeric types only) Pad after the sign.
Format specifiers can also include a presentation type, which
controls how the value is formatted. For example, floating-point numbers
can be formatted as a general number or in exponential notation:
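>>> '{0:g}'.format(3.75)
'3.75'
>>> '{0:e}'.format(3.75)
'3.750000e+00'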
A variety of presentation types are available. Consult the 2.6
documentation for a complete list; here’s a sample:
'b'   Binary. Outputs the number in base 2.
'c'   Character. Converts the integer to the corresponding Unicode character before printing.
'd'   Decimal Integer. Outputs the number in base 10.
'o'   Octal format. Outputs the number in base 8.
'x'   Hex format. Outputs the number in base 16, using lower-case letters for the digits above 9.
'e'   Exponent notation. Prints the number in scientific notation using the letter ‘e’ to indicate the exponent.
'g'   General format. This prints the number as a fixed-point number, unless the number is too large, in which case it switches to ‘e’ exponent notation.
'n'   Number. This is the same as ‘g’ (for floats) or ‘d’ (for integers), except that it uses the current locale setting to insert the appropriate number separator characters.
'%'   Percentage. Multiplies the number by 100 and displays in fixed (‘f’) format, followed by a percent sign.
Classes and types can define a __format__() method to control how they’re
formatted. It receives a single argument, the format specifier:
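# A minimal sketch; the Currency class is invented for illustration.
class Currency(object):
    def __init__(self, amount):
        self.amount = amount
    def __format__(self, format_spec):
        # Format the amount with the given spec, then prepend a dollar sign.
        return '$' + format(self.amount, format_spec)

print '{0:.2f}'.format(Currency(5.5))   # prints $5.50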
The print statement becomes the print() function in Python 3.0.
Making print() a function makes it possible to replace the function
by doing def print(...) or importing a new function from somewhere else.
Python 2.6 has a __future__ import that removes print as language
syntax, letting you use the functional form instead. For example:
>>> from __future__ import print_function
>>> print('# of entries', len(dictionary), file=sys.stderr)
The signature of the new function is:
def print(*args, sep=' ', end='\n', file=None)
The parameters are:
args: positional arguments whose values will be printed out.
sep: the separator, which will be printed between arguments.
end: the ending text, which will be printed after all of the
arguments have been output.
file: the file object to which the output will be sent.
One error that Python programmers occasionally make
is writing the following code:
try:
    ...
except TypeError, ValueError:  # Wrong!
    ...
The author is probably trying to catch both TypeError and
ValueError exceptions, but this code actually does something
different: it will catch TypeError and bind the resulting
exception object to the local name "ValueError". The
ValueError exception will not be caught at all. The correct
code specifies a tuple of exceptions:
try:
    ...
except (TypeError, ValueError):
    ...
This error happens because the use of the comma here is ambiguous:
does it indicate two different nodes in the parse tree, or a single
node that’s a tuple?
Python 3.0 makes this unambiguous by replacing the comma with the word
“as”. To catch an exception and store the exception object in the
variable exc, you must write:
try:
    ...
except TypeError as exc:
    ...
Python 3.0 will only support the use of “as”, and therefore interprets
the first example as catching two different exceptions. Python 2.6
supports both the comma and “as”, so existing code will continue to
work. We therefore suggest using “as” when writing new Python code
that will only be executed with 2.6.
Python 3.0 adopts Unicode as the language’s fundamental string type and
denotes 8-bit literals differently, either as b'string'
or using a bytes constructor. For future compatibility,
Python 2.6 adds bytes as a synonym for the str type,
and it also supports the b'' notation.
The 2.6 str differs from 3.0’s bytes type in various
ways; most notably, the constructor is completely different. In 3.0,
bytes([65,66,67]) is 3 elements long, containing the bytes
representing ABC; in 2.6, bytes([65,66,67]) returns the
12-byte string representing the str() of the list.
The primary use of bytes in 2.6 will be to write tests of
object type such as isinstance(x,bytes). This will help the 2to3
converter, which can’t tell whether 2.x code intends strings to
contain either characters or 8-bit bytes; you can now
use either bytes or str to represent your intention
exactly, and the resulting code will also be correct in Python 3.0.
There’s also a __future__ import that causes all string literals
to become Unicode strings. This means that \u escape sequences
can be used to include Unicode characters:
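>>> from __future__ import unicode_literals
>>> s = '\u00e9tude'
>>> type(s), len(s)
(<type 'unicode'>, 5)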
At the C level, Python 3.0 will rename the existing 8-bit
string type, called PyStringObject in Python 2.x,
to PyBytesObject. Python 2.6 uses #define
to support using the names PyBytesObject(),
PyBytes_Check(), PyBytes_FromStringAndSize(),
and all the other functions and macros used with strings.
Instances of the bytes type are immutable just
as strings are. A new bytearray type stores a mutable
sequence of bytes:
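>>> b = bytearray(b'hello')
>>> b[0] = ord('H')
>>> b
bytearray(b'Hello')
>>> b.append(ord('!'))
>>> b
bytearray(b'Hello!')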
Byte arrays support most of the methods of string types, such as
startswith()/endswith(), find()/rfind(),
and some of the methods of lists, such as append(),
pop(), and reverse().
Python’s built-in file objects support a number of methods, but
file-like objects don’t necessarily support all of them. Objects that
imitate files usually support read() and write(), but they
may not support readline(), for example. Python 3.0 introduces
a layered I/O library in the io module that separates buffering
and text-handling features from the fundamental read and write
operations.
There are three levels of abstract base classes provided by
the io module:
RawIOBase defines raw I/O operations: read(),
readinto(),
write(), seek(), tell(), truncate(),
and close().
Most of the methods of this class will often map to a single system call.
There are also readable(), writable(), and seekable()
methods for determining what operations a given object will allow.
Python 3.0 has concrete implementations of this class for files and
sockets, but Python 2.6 hasn’t restructured its file and socket objects
in this way.
BufferedIOBase is an abstract base class that
buffers data in memory to reduce the number of
system calls used, making I/O processing more efficient.
It supports all of the methods of RawIOBase,
and adds a raw attribute holding the underlying raw object.
There are five concrete classes implementing this ABC.
BufferedWriter and BufferedReader are for objects
that support write-only or read-only usage that have a seek()
method for random access. BufferedRandom objects support
read and write access upon the same underlying stream, and
BufferedRWPair is for objects such as TTYs that have both
read and write operations acting upon unconnected streams of data.
The BytesIO class supports reading, writing, and seeking
over an in-memory buffer.
TextIOBase: Provides functions for reading and writing
strings (remember, strings will be Unicode in Python 3.0),
and supporting universal newlines. TextIOBase defines
the readline() method and supports iteration upon
objects.
There are two concrete implementations. TextIOWrapper
wraps a buffered I/O object, supporting all of the methods for
text I/O and adding a buffer attribute for access
to the underlying object. StringIO simply buffers
everything in memory without ever writing anything to disk.
(In Python 2.6, io.StringIO is implemented in
pure Python, so it’s pretty slow. You should therefore stick with the
existing StringIO module or cStringIO for now. At some
point Python 3.0’s io module will be rewritten into C for speed,
and perhaps the C implementation will be backported to the 2.x releases.)
In Python 2.6, the underlying implementations haven’t been
restructured to build on top of the io module’s classes. The
module is being provided to make it easier to write code that’s
forward-compatible with 3.0, and to save developers the effort of writing
their own implementations of buffering and text I/O.
PEP written by Daniel Stutzbach, Mike Verdone, and Guido van Rossum.
Code by Guido van Rossum, Georg Brandl, Walter Doerwald,
Jeremy Hylton, Martin von Loewis, Tony Lownds, and others.
The buffer protocol is a C-level API that lets Python types
exchange pointers into their internal representations. A
memory-mapped file can be viewed as a buffer of characters, for
example, and this lets another module such as re
treat memory-mapped files as a string of characters to be searched.
The primary users of the buffer protocol are numeric-processing
packages such as NumPy, which expose the internal representation
of arrays so that callers can write data directly into an array instead
of going through a slower API. This PEP updates the buffer protocol in light of experience
from NumPy development, adding a number of new features
such as indicating the shape of an array or locking a memory region.
The most important new C API function is
PyObject_GetBuffer(PyObject *obj, Py_buffer *view, int flags), which
takes an object and a set of flags, and fills in the
Py_buffer structure with information
about the object’s memory representation. Objects
can use this operation to lock memory in place
while an external caller could be modifying the contents,
so there’s a corresponding PyBuffer_Release(Py_buffer *view) to
indicate that the external caller is done.
The flags argument to PyObject_GetBuffer() specifies
constraints upon the memory returned. Some examples are:
PyBUF_WRITABLE indicates that the memory must be writable.
PyBUF_LOCK requests a read-only or exclusive lock on the memory.
PyBUF_C_CONTIGUOUS and PyBUF_F_CONTIGUOUS
request a C-contiguous (last dimension varies the fastest) or
Fortran-contiguous (first dimension varies the fastest) array layout.
Two new argument codes for PyArg_ParseTuple(),
s* and z*, return locked buffer objects for a parameter.
Some object-oriented languages such as Java support interfaces,
declaring that a class has a given set of methods or supports a given
access protocol. Abstract Base Classes (or ABCs) are an equivalent
feature for Python. The ABC support consists of an abc module
containing a metaclass called ABCMeta, special handling of
this metaclass by the isinstance() and issubclass()
builtins, and a collection of basic ABCs that the Python developers
think will be widely useful. Future versions of Python will probably
add more ABCs.
Let’s say you have a particular class and wish to know whether it supports
dictionary-style access. The phrase “dictionary-style” is vague, however.
It probably means that accessing items with obj[1] works.
Does it imply that setting items with obj[2]=value works?
Or that the object will have keys(), values(), and items()
methods? What about the iterative variants such as iterkeys()? copy()
and update()? Iterating over the object with iter()?
The Python 2.6 collections module includes a number of
different ABCs that represent these distinctions. Iterable
indicates that a class defines __iter__(), and
Container means the class defines a __contains__()
method and therefore supports x in y expressions. The basic
dictionary interface of getting items, setting items, and
keys(), values(), and items(), is defined by the
MutableMapping ABC.
You can derive your own classes from a particular ABC
to indicate they support that ABC’s interface:
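import collections

# An illustrative class; supplying the five abstract methods lets the
# MutableMapping ABC fill in the rest of the dictionary interface.
class Storage(collections.MutableMapping):
    def __init__(self):
        self._data = {}
    def __getitem__(self, key):
        return self._data[key]
    def __setitem__(self, key, value):
        self._data[key] = value
    def __delitem__(self, key):
        del self._data[key]
    def __iter__(self):
        return iter(self._data)
    def __len__(self):
        return len(self._data)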
For classes that you write, deriving from the ABC is probably clearer.
The register() method is useful when you’ve written a new
ABC that can describe an existing type or class, or if you want
to declare that some third-party class implements an ABC.
For example, if you defined a PrintableType ABC,
it’s legal to do:
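# Register Python's built-in types as supporting PrintableType.
PrintableType.register(int)
PrintableType.register(float)
PrintableType.register(str)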
Classes should obey the semantics specified by an ABC, but
Python can’t check this; it’s up to the class author to
understand the ABC’s requirements and to implement the code accordingly.
To check whether an object supports a particular interface, you can
now write:
def func(d):
    if not isinstance(d, collections.MutableMapping):
        raise ValueError("Mapping object expected, not %r" % d)
Don’t feel that you must now begin writing lots of checks as in the
above example. Python has a strong tradition of duck-typing, where
explicit type-checking is never done and code simply calls methods on
an object, trusting that those methods will be there and raising an
exception if they aren’t. Be judicious in checking for ABCs and only
do it where it’s absolutely necessary.
You can write your own ABCs by using abc.ABCMeta as the
metaclass in a class definition:
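from abc import ABCMeta, abstractmethod

class Drawable(object):
    __metaclass__ = ABCMeta

    @abstractmethod
    def draw(self, x, y, scale=1.0):
        pass

    def draw_doubled(self, x, y):
        self.draw(x, y, scale=2.0)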
In the Drawable ABC above, the draw_doubled() method
renders the object at twice its size and can be implemented in terms
of other methods described in Drawable. Classes implementing
this ABC therefore don’t need to provide their own implementation
of draw_doubled(), though they can do so. An implementation
of draw() is necessary, though; the ABC can’t provide
a useful generic implementation.
You can apply the @abstractmethod decorator to methods such as
draw() that must be implemented; Python will then raise an
exception for classes that don’t define the method.
Note that the exception is only raised when you actually
try to create an instance of a subclass lacking the method:
>>> class Circle(Drawable):
...     pass
...
>>> c = Circle()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class Circle with abstract methods draw
Abstract data attributes can be declared using the
@abstractproperty decorator:
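from abc import ABCMeta, abstractproperty

# An illustrative sketch; the Shape class and its 'area' property are invented.
class Shape(object):
    __metaclass__ = ABCMeta

    @abstractproperty
    def area(self):
        "Subclasses must provide an 'area' property."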
Python 3.0 changes the syntax for octal (base-8) integer literals,
prefixing them with “0o” or “0O” instead of a leading zero, and adds
support for binary (base-2) integer literals, signalled by a “0b” or
“0B” prefix.
Python 2.6 doesn’t drop support for a leading 0 signalling
an octal number, but it does add support for “0o” and “0b”:
>>> 0o21, 2*8 + 1
(17, 17)
>>> 0b101111
47
The oct() builtin still returns numbers
prefixed with a leading zero, and a new bin()
builtin returns the binary representation for a number:
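>>> oct(42)
'052'
>>> bin(173)
'0b10101101'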
The int() and long() builtins will now accept the “0o”
and “0b” prefixes when base-8 or base-2 are requested, or when the
base argument is zero (signalling that the base used should be
determined from the string):
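>>> int('0o52', 8)
42
>>> int('0b1101', 2)
13
>>> int('0o52', 0)
42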
Python 3.0 adds several abstract base classes for numeric types
inspired by Scheme’s numeric tower. These classes were backported to
2.6 as the numbers module.
The most general ABC is Number. It defines no operations at
all, and only exists to allow checking if an object is a number by
doing isinstance(obj,Number).
Complex is a subclass of Number. Complex numbers
can undergo the basic operations of addition, subtraction,
multiplication, division, and exponentiation, and you can retrieve the
real and imaginary parts and obtain a number’s conjugate. Python’s built-in
complex type is an implementation of Complex.
Real further derives from Complex, and adds
operations that only work on real numbers: floor(), trunc(),
rounding, taking the remainder mod N, floor division,
and comparisons.
Rational numbers derive from Real, have
numerator and denominator properties, and can be
converted to floats. Python 2.6 adds a simple rational-number class,
Fraction, in the fractions module. (It’s called
Fraction instead of Rational to avoid
a name clash with numbers.Rational.)
Integral numbers derive from Rational, and
can be shifted left and right with << and >>,
combined using bitwise operations such as & and |,
and can be used as array indexes and slice boundaries.
In Python 3.0, the PEP slightly redefines the existing builtins
round(), math.floor(), math.ceil(), and adds a new
one, math.trunc(), that’s been backported to Python 2.6.
math.trunc() rounds toward zero, returning the closest
Integral that’s between the function’s argument and zero.
To fill out the hierarchy of numeric types, the fractions
module provides a rational-number class. Rational numbers store their
values as a numerator and denominator forming a fraction, and can
exactly represent numbers such as 2/3 that floating-point numbers
can only approximate.
The Fraction constructor takes two Integral values
that will be the numerator and denominator of the resulting fraction.
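For example:

>>> from fractions import Fraction
>>> a = Fraction(2, 3)
>>> b = Fraction(2, 5)
>>> a + b
Fraction(16, 15)
>>> a / b
Fraction(5, 3)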
For converting floating-point numbers to rationals,
the float type now has an as_integer_ratio() method that returns
the numerator and denominator for a fraction that evaluates to the same
floating-point value:
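>>> (2.5).as_integer_ratio()
(5, 2)
>>> # (on 32-bit builds the large integers below print with a trailing L)
>>> (1./3).as_integer_ratio()
(6004799503160661, 18014398509481984)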
Note that values that can only be approximated by floating-point
numbers, such as 1./3, are not simplified to the number being
approximated; the fraction attempts to match the floating-point value
exactly.
The fractions module is based upon an implementation by Sjoerd
Mullender that was in Python’s Demo/classes/ directory for a
long time. This implementation was significantly updated by Jeffrey
Yasskin.
Some smaller changes made to the core Python language are:
Directories and zip archives containing a __main__.py file
can now be executed directly by passing their name to the
interpreter. The directory or zip archive is automatically inserted
as the first entry in sys.path. (Suggestion and initial patch by
Andy Chu, subsequently revised by Phillip J. Eby and Nick Coghlan;
issue 1739468.)
The hasattr() function was catching and ignoring all errors,
under the assumption that they meant a __getattr__() method
was failing somehow and the return value of hasattr() would
therefore be False. This logic shouldn’t be applied to
KeyboardInterrupt and SystemExit, however; Python 2.6
will no longer discard such exceptions when hasattr()
encounters them. (Fixed by Benjamin Peterson; issue 2196.)
When calling a function using the ** syntax to provide keyword
arguments, you are no longer required to use a Python dictionary;
any mapping will now work:
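>>> import UserDict
>>> def f(**kw):
...     print sorted(kw)
...
>>> ud = UserDict.UserDict()
>>> ud['a'] = 1
>>> ud['b'] = 'string'
>>> f(**ud)
['a', 'b']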
It’s also become legal to provide keyword arguments after a *args
argument to a function call; previously this would have been a syntax
error. (Contributed by Amaury Forgeot d’Arc; issue 3473.)
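For example:

>>> def f(*args, **kw):
...     print args, kw
...
>>> f(1, 2, 3, *(4, 5, 6), keyword=13)
(1, 2, 3, 4, 5, 6) {'keyword': 13}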
A new builtin, next(iterator,[default]) returns the next item
from the specified iterator. If the default argument is supplied,
it will be returned if iterator has been exhausted; otherwise,
the StopIteration exception will be raised. (Backported
in issue 2719.)
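For example:

>>> it = iter([1, 2])
>>> next(it)
1
>>> next(it)
2
>>> next(it, 'done')
'done'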
Tuples now have index() and count() methods matching the
list type’s index() and count() methods:
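>>> t = (0, 1, 2, 3, 4, 0, 1, 2)
>>> t.index(3)
3
>>> t.count(0)
2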
The built-in types now have improved support for extended slicing syntax,
accepting various combinations of (start,stop,step).
Previously, the support was partial and certain corner cases wouldn’t work.
(Implemented by Thomas Wouters.)
Properties now have three attributes, getter, setter
and deleter, that are decorators providing useful shortcuts
for adding a getter, setter or deleter function to an existing
property. You would use them like this:
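class C(object):
    @property
    def x(self):
        "The 'x' property."
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @x.deleter
    def x(self):
        del self._x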
Several methods of the built-in set types now accept multiple iterables:
intersection(),
intersection_update(),
union(), update(),
difference() and difference_update().
>>> s = set('1234567890')
>>> s.intersection('abc123', 'cdf246')  # Intersection between all inputs
set(['2'])
>>> s.difference('246', '789')
set(['1', '0', '3', '5'])
(Contributed by Raymond Hettinger.)
Many floating-point features were added. The float() function
will now turn the string nan into an
IEEE 754 Not A Number value, and +inf and -inf into
positive or negative infinity. This works on any platform with
IEEE 754 semantics. (Contributed by Christian Heimes; issue 1635.)
Other functions in the math module, isinf() and
isnan(), return true if their floating-point argument is
infinite or Not A Number. (issue 1640)
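For example:

>>> float('nan')
nan
>>> float('+inf'), float('-inf')
(inf, -inf)
>>> import math
>>> math.isinf(float('inf')), math.isnan(float('nan'))
(True, True)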
Conversion functions were added to convert floating-point numbers
into hexadecimal strings (issue 3008). These functions
convert floats to and from a string representation without
introducing rounding errors from the conversion between decimal and
binary. Floats have a hex() method that returns a string
representation, and the float.fromhex() method converts a string
back into a number:
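>>> a = 3.75
>>> a.hex()
'0x1.e000000000000p+1'
>>> float.fromhex('0x1.e000000000000p+1')
3.75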
A numerical nicety: when creating a complex number from two floats
on systems that support signed zeros (-0 and +0), the
complex() constructor will now preserve the sign
of the zero. (Fixed by Mark T. Dickinson; issue 1507.)
Classes that inherit a __hash__() method from a parent class
can set __hash__=None to indicate that the class isn’t
hashable. This will make hash(obj) raise a TypeError
and the class will not be indicated as implementing the
Hashable ABC.
You should do this when you’ve defined a __cmp__() or
__eq__() method that compares objects by their value rather
than by identity. All objects have a default hash method that uses
id(obj) as the hash value. There’s no tidy way to remove the
__hash__() method inherited from a parent class, so
assigning None was implemented as an override. At the
C level, extensions can set tp_hash to
PyObject_HashNotImplemented().
(Fixed by Nick Coghlan and Amaury Forgeot d’Arc; issue 2235.)
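For example (a sketch; the Point class is invented for illustration):

class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)
    __hash__ = None   # value-based equality, so opt out of hashing

# hash(Point(1, 2)) now raises TypeError, and
# isinstance(Point(1, 2), collections.Hashable) is False.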
Generator objects now have a gi_code attribute that refers to
the original code object backing the generator.
(Contributed by Collin Winter; issue 1473257.)
The compile() built-in function now accepts keyword arguments
as well as positional parameters. (Contributed by Thomas Wouters;
issue 1444529.)
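For example (a sketch assuming the documented parameter names source, filename, and mode):

>>> code = compile(source='print 6 * 7', filename='<string>', mode='exec')
>>> exec code
42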
The complex() constructor now accepts strings containing
parenthesized complex numbers, meaning that complex(repr(cplx))
will now round-trip values. For example, complex('(3+4j)')
now returns the value (3+4j). (issue 1491866)
The string translate() method now accepts None as the
translation table parameter, which is treated as the identity
transformation. This makes it easier to carry out operations
that only delete characters. (Contributed by Bengt Richter and
implemented by Raymond Hettinger; issue 1193128.)
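For example, deleting all the vowels from a string:

>>> 'read this short text'.translate(None, 'aeiou')
'rd ths shrt txt'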
The built-in dir() function now checks for a __dir__()
method on the objects it receives. This method must return a list
of strings containing the names of valid attributes for the object,
and lets the object control the value that dir() produces.
Objects that have __getattr__() or __getattribute__()
methods can use this to advertise pseudo-attributes they will honor.
(issue 1591665)
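For example (a sketch; the Plugin class and its 'version' pseudo-attribute are invented):

class Plugin(object):
    def __getattr__(self, name):
        # Dynamically provide a 'version' pseudo-attribute.
        if name == 'version':
            return '1.0'
        raise AttributeError(name)
    def __dir__(self):
        # Advertise the pseudo-attribute so dir() reports it.
        return dir(type(self)) + ['version']

print 'version' in dir(Plugin())   # True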
Instance method objects have new attributes for the object and function
comprising the method; the new synonym for im_self is
__self__, and im_func is also available as __func__.
The old names are still supported in Python 2.6, but are gone in 3.0.
An obscure change: when you use the locals() function inside a
class statement, the resulting dictionary no longer returns free
variables. (Free variables, in this case, are variables referenced in the
class statement that aren’t attributes of the class.)
The warnings module has been rewritten in C. This makes
it possible to invoke warnings from the parser, and may also
make the interpreter’s startup faster.
(Contributed by Neal Norwitz and Brett Cannon; issue 1631171.)
Type objects now have a cache of methods that can reduce
the work required to find the correct method implementation
for a particular class; once cached, the interpreter doesn’t need to
traverse base classes to figure out the right method to call.
The cache is cleared if a base class or the class itself is modified,
so the cache should remain correct even in the face of Python’s dynamic
nature.
(Original optimization implemented by Armin Rigo, updated for
Python 2.6 by Kevin Jacobs; issue 1700288.)
By default, this change is only applied to types that are included with
the Python core. Extension modules may not necessarily be compatible with
this cache,
so they must explicitly add Py_TPFLAGS_HAVE_VERSION_TAG
to the module’s tp_flags field to enable the method cache.
(To be compatible with the method cache, the extension module’s code
must not directly access and modify the tp_dict member of
any of the types it implements. Most modules don’t do this,
but it’s impossible for the Python interpreter to determine that.
See issue 1878 for some discussion.)
Function calls that use keyword arguments are significantly faster
by doing a quick pointer comparison, usually saving the time of a
full string comparison. (Contributed by Raymond Hettinger, after an
initial implementation by Antoine Pitrou; issue 1819.)
All of the functions in the struct module have been rewritten in
C, thanks to work at the Need For Speed sprint.
(Contributed by Raymond Hettinger.)
Some of the standard built-in types now set a bit in their type
objects. This speeds up checking whether an object is a subclass of
one of these types. (Contributed by Neal Norwitz.)
Unicode strings now use faster code for detecting
whitespace and line breaks; this speeds up the split() method
by about 25% and splitlines() by 35%.
(Contributed by Antoine Pitrou.) Memory usage is reduced
by using pymalloc for the Unicode string’s data.
The with statement now stores the __exit__() method on the stack,
producing a small speedup. (Implemented by Jeffrey Yasskin.)
To reduce memory usage, the garbage collector will now clear internal
free lists when garbage-collecting the highest generation of objects.
This may return memory to the operating system sooner.
Two command-line options have been reserved for use by other Python
implementations. The -J switch has been reserved for use by
Jython for Jython-specific options, such as switches that are passed to
the underlying JVM. -X has been reserved for options
specific to a particular implementation of Python such as CPython,
Jython, or IronPython. If either option is used with Python 2.6, the
interpreter will report that the option isn’t currently used.
Python can now be prevented from writing .pyc or .pyo
files by supplying the -B switch to the Python interpreter,
or by setting the PYTHONDONTWRITEBYTECODE environment
variable before running the interpreter. This setting is available to
Python programs as the sys.dont_write_bytecode variable, and
Python code can change the value to modify the interpreter’s
behaviour. (Contributed by Neal Norwitz and Georg Brandl.)
The encoding used for standard input, output, and standard error can
be specified by setting the PYTHONIOENCODING environment
variable before running the interpreter. The value should be a string
in the form <encoding> or <encoding>:<errorhandler>.
The encoding part specifies the encoding’s name, e.g. utf-8 or
latin-1; the optional errorhandler part specifies
what to do with characters that can’t be handled by the encoding,
and should be one of “error”, “ignore”, or “replace”. (Contributed
by Martin von Loewis.)
As in every release, Python’s standard library received a number of
enhancements and bug fixes. Here’s a partial list of the most notable
changes, sorted alphabetically by module name. Consult the
Misc/NEWS file in the source tree for a more complete list of
changes, or look through the Subversion logs for all the details.
The asyncore and asynchat modules are
being actively maintained again, and a number of patches and bugfixes
were applied. (Maintained by Josiah Carlson; see issue 1736190 for
one patch.)
The bsddb module also has a new maintainer, Jesús Cea Avion, and the package
is now available as a standalone package. The web page for the package is
www.jcea.es/programacion/pybsddb.htm.
The plan is to remove the package from the standard library
in Python 3.0, because its pace of releases is much more frequent than
Python’s.
The bsddb.dbshelve module now uses the highest pickling protocol
available, instead of restricting itself to protocol 1.
(Contributed by W. Barnes.)
The cgi module will now read variables from the query string
of an HTTP POST request. This makes it possible to use form actions
with URLs that include query strings such as
“/cgi-bin/add.py?category=1”. (Contributed by Alexandre Fiori and
Nubis; issue 1817.)
The parse_qs() and parse_qsl() functions have been
relocated from the cgi module to the urlparse module.
The versions still available in the cgi module will
trigger PendingDeprecationWarning messages in 2.6
(issue 600362).
The cmath module underwent extensive revision,
contributed by Mark Dickinson and Christian Heimes.
Five new functions were added:
polar() converts a complex number to polar form, returning
the modulus and argument of the complex number.
rect() does the opposite, turning a modulus, argument pair
back into the corresponding complex number.
phase() returns the argument (also called the angle) of a complex
number.
isnan() returns True if either
the real or imaginary part of its argument is a NaN.
isinf() returns True if either the real or imaginary part of
its argument is infinite.
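Here's a quick sketch of the new functions in action:
import cmath
r, phi = cmath.polar(1 + 1j)       # modulus sqrt(2), argument pi/4
print cmath.rect(r, phi)           # approximately (1+1j) again
print cmath.phase(-1.0)            # pi
print cmath.isnan(complex(float('nan'), 0.0))   # True
print cmath.isinf(complex(0.0, float('inf')))   # True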
The revisions also improved the numerical soundness of the
cmath module. For all functions, the real and imaginary
parts of the results are accurate to within a few units of least
precision (ulps) whenever possible. See issue 1381 for the
details. The branch cuts for asinh(), atanh(), and
atan() have also been corrected.
The tests for the module have been greatly expanded; nearly 2000 new
test cases exercise the algebraic functions.
On IEEE 754 platforms, the cmath module now handles IEEE 754
special values and floating-point exceptions in a manner consistent
with Annex ‘G’ of the C99 standard.
A new data type in the collections module: namedtuple(typename, fieldnames) is a factory function that creates subclasses of the standard tuple
whose fields are accessible by name as well as index. For example:
>>> var_type = collections.namedtuple('variable',
...                                   'id name type size')
>>> # Names are separated by spaces or commas.
>>> # 'id, name, type, size' would also work.
>>> var_type._fields
('id', 'name', 'type', 'size')
>>> var = var_type(1, 'frequency', 'int', 4)
>>> print var[0], var.id    # Equivalent
1 1
>>> print var[2], var.type  # Equivalent
int int
>>> var._asdict()
{'size': 4, 'type': 'int', 'id': 1, 'name': 'frequency'}
>>> v2 = var._replace(name='amplitude')
>>> v2
variable(id=1, name='amplitude', type='int', size=4)
Several places in the standard library that returned tuples have
been modified to return namedtuple instances. For example,
the Decimal.as_tuple() method now returns a named tuple with
sign, digits, and exponent fields.
(Contributed by Raymond Hettinger.)
Another change to the collections module is that the
deque type now supports an optional maxlen parameter;
if supplied, the deque’s size will be restricted to no more
than maxlen items. Adding more items to a full deque causes
old items to be discarded.
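For instance, a full deque silently discards items from the opposite end:
>>> from collections import deque
>>> dq = deque(maxlen=3)
>>> for i in range(5):
...     dq.append(i)
...
>>> dq
deque([2, 3, 4], maxlen=3)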
The Cookie module’s Morsel objects now support an
httponly attribute. In some browsers, cookies with this attribute
set cannot be accessed or manipulated by JavaScript code.
(Contributed by Arvin Schnell; issue 1638033.)
A new window method in the curses module,
chgat(), changes the display attributes for a certain number of
characters on a single line. (Contributed by Fabian Kreutz.)
# Boldface text starting at y=0,x=21
# and affecting the rest of the line.
stdscr.chgat(0, 21, curses.A_BOLD)
The Textbox class in the curses.textpad module
now supports editing in insert mode as well as overwrite mode.
Insert mode is enabled by supplying a true value for the insert_mode
parameter when creating the Textbox instance.
The datetime module’s strftime() methods now support a
%f format code that expands to the number of microseconds in the
object, zero-padded on
the left to six places. (Contributed by Skip Montanaro; issue 1158.)
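For example:
>>> from datetime import datetime
>>> datetime(2008, 10, 1, 12, 30, 0, 4500).strftime('%H:%M:%S.%f')
'12:30:00.004500'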
The decimal module was updated to version 1.66 of
the General Decimal Specification. New features
include methods for basic mathematical functions such as
exp() and log10():
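>>> from decimal import Decimal
>>> Decimal(1).exp()        # with the default context precision of 28 digits
Decimal('2.718281828459045235360287471')
>>> Decimal(100).log10()
Decimal('2')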
The as_tuple() method of Decimal objects now returns a
named tuple with sign, digits, and exponent fields.
(Implemented by Facundo Batista and Mark Dickinson. Named tuple
support added by Raymond Hettinger.)
The difflib module’s SequenceMatcher class
now returns named tuples representing matches,
with a, b, and size attributes.
(Contributed by Raymond Hettinger.)
An optional timeout parameter, specifying a timeout measured in
seconds, was added to the ftplib.FTP class constructor as
well as the connect() method. (Added by Facundo Batista.)
Also, the FTP class’s storbinary() and
storlines() now take an optional callback parameter that
will be called with each block of data after the data has been sent.
(Contributed by Phil Schwartz; issue 1221598.)
The reduce() built-in function is also available in the
functools module. In Python 3.0, the builtin has been
dropped and reduce() is only available from functools;
currently there are no plans to drop the builtin in the 2.x series.
(Patched by Christian Heimes; issue 1739906.)
When possible, the getpass module will now use
/dev/tty to print a prompt message and read the password,
falling back to standard error and standard input. If the
password may be echoed to the terminal, a warning is printed before
the prompt is displayed. (Contributed by Gregory P. Smith.)
The glob.glob() function can now return Unicode filenames if
a Unicode path was used and Unicode filenames are matched within the
directory. (issue 1001604)
A new function in the heapq module, merge(iter1, iter2, ...),
takes any number of iterables returning data in sorted
order, and returns a new generator that returns the contents of all
the iterators, also in sorted order. For example:
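>>> import heapq
>>> list(heapq.merge([1, 3, 5, 9], [2, 8, 16]))
[1, 2, 3, 5, 8, 9, 16]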
Another new function, heappushpop(heap, item),
pushes item onto heap, then pops off and returns the smallest item.
This is more efficient than making a call to heappush() and then
heappop().
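For example:
>>> import heapq
>>> heap = [2, 4, 6]
>>> heapq.heappushpop(heap, 5)
2
>>> heap
[4, 5, 6]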
heapq is now implemented to only use less-than comparison,
instead of the less-than-or-equal comparison it previously used.
This makes heapq‘s usage of a type match the
list.sort() method.
(Contributed by Raymond Hettinger.)
An optional timeout parameter, specifying a timeout measured in
seconds, was added to the httplib.HTTPConnection and
HTTPSConnection class constructors. (Added by Facundo
Batista.)
Most of the inspect module’s functions, such as
getmoduleinfo() and getargs(), now return named tuples.
In addition to behaving like tuples, the elements of the return value
can also be accessed as attributes.
(Contributed by Raymond Hettinger.)
Some new functions in the module include
isgenerator(), isgeneratorfunction(),
and isabstract().
The itertools module gained several new functions.
izip_longest(iter1, iter2, ... [, fillvalue]) makes tuples from
each of the elements; if some of the iterables are shorter than
others, the missing values are set to fillvalue. For example:
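>>> import itertools
>>> list(itertools.izip_longest([1, 2, 3], [9, 10], fillvalue=0))
[(1, 9), (2, 10), (3, 0)]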
product(iter1, iter2, ... [, repeat=N]) returns the Cartesian product
of the supplied iterables, a set of tuples containing
every possible combination of the elements returned from each iterable.
The optional repeat keyword argument is used for taking the
product of an iterable or a set of iterables with themselves,
repeated N times. With a single iterable argument, N-tuples
are returned:
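>>> import itertools
>>> list(itertools.product([1, 2], ['a', 'b']))
[(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
>>> list(itertools.product([1, 2], repeat=2))
[(1, 1), (1, 2), (2, 1), (2, 2)]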
permutations(iter[, r]) returns all the permutations of length r of
the iterable’s elements. If r is not specified, it will default to the
number of elements produced by the iterable.
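For example:
>>> import itertools
>>> list(itertools.permutations('abc', 2))
[('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]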
itertools.chain(*iterables) is an existing function in
itertools that gained a new constructor in Python 2.6.
itertools.chain.from_iterable(iterable) takes a single
iterable that should return other iterables. chain() will
then return all the elements of the first iterable, then
all the elements of the second, and so on.
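For example:
>>> import itertools
>>> list(itertools.chain.from_iterable([[1, 2], [3, 4], [5]]))
[1, 2, 3, 4, 5]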
The logging module’s FileHandler class
and its subclasses WatchedFileHandler, RotatingFileHandler,
and TimedRotatingFileHandler now
have an optional delay parameter to their constructors. If delay
is true, opening of the log file is deferred until the first
emit() call is made. (Contributed by Vinay Sajip.)
TimedRotatingFileHandler also has a utc constructor
parameter. If the argument is true, UTC time will be used
in determining when midnight occurs and in generating filenames;
otherwise local time will be used.
Several new functions were added to the math module:
isinf() and isnan() determine whether a given float
is a (positive or negative) infinity or a NaN (Not a Number), respectively.
copysign() copies the sign bit of an IEEE 754 number,
returning the absolute value of x combined with the sign bit of
y. For example, math.copysign(1,-0.0) returns -1.0.
(Contributed by Christian Heimes.)
factorial() computes the factorial of a number.
(Contributed by Raymond Hettinger; issue 2138.)
fsum() adds up the stream of numbers from an iterable,
and is careful to avoid loss of precision through using partial sums.
(Contributed by Jean Brouwers, Raymond Hettinger, and Mark Dickinson;
issue 2819.)
log1p() returns the natural logarithm of 1+x
(base e).
trunc() rounds a number toward zero, returning the closest
Integral that’s between the function’s argument and zero.
Added as part of the backport of
PEP 3141’s type hierarchy for numbers.
The math module has been improved to give more consistent
behaviour across platforms, especially with respect to handling of
floating-point exceptions and IEEE 754 special values.
Whenever possible, the module follows the recommendations of the C99
standard about 754’s special values. For example, sqrt(-1.)
should now give a ValueError across almost all platforms,
while sqrt(float('NaN')) should return a NaN on all IEEE 754
platforms. Where Annex ‘F’ of the C99 standard recommends signaling
‘divide-by-zero’ or ‘invalid’, Python will raise ValueError.
Where Annex ‘F’ of the C99 standard recommends signaling ‘overflow’,
Python will raise OverflowError. (See issue 711019 and
issue 1640.)
(Contributed by Christian Heimes and Mark Dickinson.)
mmap objects now have a rfind() method that searches for a
substring beginning at the end of the string and searching
backwards. The find() method also gained an end parameter
giving an index at which to stop searching.
(Contributed by John Lenton.)
The operator module gained a
methodcaller() function that takes a name and an optional
set of arguments, returning a callable that will call
the named function on any arguments passed to it. For example:
>>> # Equivalent to lambda s: s.replace('old', 'new')
>>> replacer = operator.methodcaller('replace', 'old', 'new')
>>> replacer('old wine in old bottles')
'new wine in new bottles'
(Contributed by Georg Brandl, after a suggestion by Gregory Petrosyan.)
The attrgetter() function now accepts dotted names and performs
the corresponding attribute lookups:
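>>> import operator
>>> inst_name = operator.attrgetter('__class__.__name__')
>>> inst_name('')
'str'
>>> inst_name(5)
'int'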
(Contributed by Georg Brandl, after a suggestion by Barry Warsaw.)
The os module now wraps several new system calls.
fchmod(fd, mode) and fchown(fd, uid, gid) change the mode
and ownership of an opened file, and lchmod(path, mode) changes
the mode of a symlink. (Contributed by Georg Brandl and Christian
Heimes.)
chflags() and lchflags() are wrappers for the
corresponding system calls (where they’re available), changing the
flags set on a file. Constants for the flag values are defined in
the stat module; some possible values include
UF_IMMUTABLE to signal the file may not be changed and
UF_APPEND to indicate that data can only be appended to the
file. (Contributed by M. Levinson.)
os.closerange(low, high) efficiently closes all file descriptors
from low to high, ignoring any errors and not including high itself.
This function is now used by the subprocess module to make starting
processes faster. (Contributed by Georg Brandl; issue 1663329.)
The os.environ object’s clear() method will now unset the
environment variables using os.unsetenv() in addition to clearing
the object’s keys. (Contributed by Martin Horcicka; issue 1181.)
The os.walk() function now has a followlinks parameter. If
set to True, it will follow symlinks pointing to directories and
visit the directory’s contents. For backward compatibility, the
parameter’s default value is false. Note that the function can fall
into an infinite recursion if there’s a symlink that points to a
parent directory. (issue 1273829)
In the os.path module, the splitext() function
has been changed to not split on leading period characters.
This produces better results when operating on Unix’s dot-files.
For example, os.path.splitext('.ipython')
now returns ('.ipython', '') instead of ('', '.ipython').
(issue 1115886)
A new function, os.path.relpath(path, start='.'), returns a relative path
from the start path, if it’s supplied, or from the current
working directory to the destination path. (Contributed by
Richard Barran; issue 1339796.)
On Windows, os.path.expandvars() will now expand environment variables
given in the form “%var%”, and “~user” will be expanded into the
user’s home directory path. (Contributed by Josiah Carlson;
issue 957650.)
The Python debugger provided by the pdb module
gained a new command: “run” restarts the Python program being debugged
and can optionally take new command-line arguments for the program.
(Contributed by Rocky Bernstein; issue 1393667.)
The pdb.post_mortem() function, used to begin debugging a
traceback, will now use the traceback returned by sys.exc_info()
if no traceback is supplied. (Contributed by Facundo Batista;
issue 1106316.)
The pickletools module now has an optimize() function
that takes a string containing a pickle and removes some unused
opcodes, returning a shorter pickle that contains the same data structure.
(Contributed by Raymond Hettinger.)
A get_data() function was added to the pkgutil
module that returns the contents of resource files included
with an installed Python package. For example:
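import pkgutil

# 'mypackage' and 'data/config.txt' are hypothetical names; get_data()
# returns the resource's contents as a string.
config_text = pkgutil.get_data('mypackage', 'data/config.txt')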
The pyexpat module’s Parser objects now allow setting
their buffer_size attribute to change the size of the buffer
used to hold character data.
(Contributed by Achim Gaedke; issue 1137.)
The Queue module now provides queue variants that retrieve entries
in different orders. The PriorityQueue class stores
queued items in a heap and retrieves them in priority order,
and LifoQueue retrieves the most recently added entries first,
meaning that it behaves like a stack.
(Contributed by Raymond Hettinger.)
The random module’s Random objects can
now be pickled on a 32-bit system and unpickled on a 64-bit
system, and vice versa. Unfortunately, this change also means
that Python 2.6’s Random objects can’t be unpickled correctly
on earlier versions of Python.
(Contributed by Shawn Ligocki; issue 1727780.)
The new triangular(low, high, mode) function returns random
numbers following a triangular distribution. The returned values
are between low and high, not including high itself, and
with mode as the most frequently occurring value
in the distribution. (Contributed by Wladmir van der Laan and
Raymond Hettinger; issue 1681432.)
Long regular expression searches carried out by the re
module will check for signals being delivered, so
time-consuming searches can now be interrupted.
(Contributed by Josh Hoyt and Ralf Schmitt; issue 846388.)
The regular expression module is implemented by compiling bytecodes
for a tiny regex-specific virtual machine. Untrusted code
could create malicious strings of bytecode directly and cause crashes,
so Python 2.6 includes a verifier for the regex bytecode.
(Contributed by Guido van Rossum from work for Google App Engine;
issue 3487.)
The rlcompleter module’s Completer.complete() method
will now ignore exceptions triggered while evaluating a name.
(Fixed by Lorenz Quack; issue 2250.)
The sched module’s scheduler instances now
have a read-only queue attribute that returns the
contents of the scheduler’s queue, represented as a list of
named tuples with the fields (time, priority, action, argument).
(Contributed by Raymond Hettinger; issue 1861.)
The select module now has wrapper functions
for the Linux epoll() and BSD kqueue() system calls.
A modify() method was added to the existing poll
objects; pollobj.modify(fd, eventmask) takes a file descriptor
or file object and an event mask, modifying the recorded event mask
for that file.
(Contributed by Christian Heimes; issue 1657.)
The shutil.copytree() function now has an optional ignore argument
that takes a callable object. This callable will receive each directory path
and a list of the directory’s contents, and returns a list of names that
will be ignored, not copied.
The shutil module also provides an ignore_patterns()
function for use with this new parameter. ignore_patterns()
takes an arbitrary number of glob-style patterns and returns a
callable that will ignore any files and directories that match any
of these patterns. The following example copies a directory tree,
but skips both .svn directories and Emacs backup files,
which have names ending with ‘~’:
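import shutil

# The source and destination paths are illustrative.
shutil.copytree('Doc/library', '/tmp/library',
                ignore=shutil.ignore_patterns('*~', '.svn'))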
Integrating signal handling with GUI handling event loops
like those used by Tkinter or GTk+ has long been a problem; most
software ends up polling, waking up every fraction of a second to check
if any GUI events have occurred.
The signal module can now make this more efficient.
Calling signal.set_wakeup_fd(fd) sets a file descriptor
to be used; when a signal is received, a byte is written to that
file descriptor. There’s also a C-level function,
PySignal_SetWakeupFd(), for setting the descriptor.
Event loops will use this by opening a pipe to create two descriptors,
one for reading and one for writing. The writable descriptor
will be passed to set_wakeup_fd(), and the readable descriptor
will be added to the list of descriptors monitored by the event loop via
select() or poll().
On receiving a signal, a byte will be written and the main event loop
will be woken up, avoiding the need to poll.
The siginterrupt() function is now available from Python code,
and allows changing whether signals can interrupt system calls or not.
(Contributed by Ralf Schmitt.)
The setitimer() and getitimer() functions have also been
added (where they’re available). setitimer()
allows setting interval timers that will cause a signal to be
delivered to the process after a specified time, measured in
wall-clock time, consumed process time, or combined process+system
time. (Contributed by Guilherme Polo; issue 2240.)
The smtplib module now supports SMTP over SSL thanks to the
addition of the SMTP_SSL class. This class supports an
interface identical to the existing SMTP class.
(Contributed by Monty Taylor.) Both class constructors also have an
optional timeout parameter that specifies a timeout for the
initial connection attempt, measured in seconds. (Contributed by
Facundo Batista.)
An implementation of the LMTP protocol (RFC 2033) was also added
to the module. LMTP is used in place of SMTP when transferring
e-mail between agents that don’t manage a mail queue. (LMTP
implemented by Leif Hedstrom; issue 957003.)
SMTP.starttls() now complies with RFC 3207 and forgets any
knowledge obtained from the server not obtained from the TLS
negotiation itself. (Patch contributed by Bill Fenner;
issue 829951.)
The socket module now supports TIPC (http://tipc.sf.net),
a high-performance non-IP-based protocol designed for use in clustered
environments. TIPC addresses are 4- or 5-tuples.
(Contributed by Alberto Bertogli; issue 1646.)
A new function, create_connection(), takes an address and
connects to it using an optional timeout value, returning the
connected socket object. This function also looks up the address’s
type and connects to it using IPv4 or IPv6 as appropriate. Changing
your code to use create_connection() instead of
socket(socket.AF_INET, ...) may be all that's required to make
your code work with IPv6.
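A minimal sketch (the host name is illustrative):
import socket

conn = socket.create_connection(('www.python.org', 80), 5)  # 5-second timeout
conn.close()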
The base classes in the SocketServer module now support
calling a handle_timeout() method after a span of inactivity
specified by the server’s timeout attribute. (Contributed
by Michael Pomraning.) The serve_forever() method
now takes an optional poll interval measured in seconds,
controlling how often the server will check for a shutdown request.
(Contributed by Pedro Werneck and Jeffrey Yasskin;
issue 742598, issue 1193577.)
The sqlite3 module, maintained by Gerhard Haering,
has been updated from version 2.3.2 in Python 2.5 to
version 2.4.1.
The struct module now supports the C99 _Bool type,
using the format character '?'.
(Contributed by David Remahl.)
The Popen objects provided by the subprocess module
now have terminate(), kill(), and send_signal() methods.
On Windows, send_signal() only supports the SIGTERM
signal, and all these methods are aliases for the Win32 API function
TerminateProcess().
(Contributed by Christian Heimes.)
A new variable in the sys module, float_info, is an
object containing information derived from the float.h file
about the platform’s floating-point support. Attributes of this
object include mant_dig (number of digits in the mantissa),
epsilon (smallest difference between 1.0 and the next
largest value representable), and several others. (Contributed by
Christian Heimes; issue 1534.)
Another new variable, dont_write_bytecode, controls whether Python
writes any .pyc or .pyo files on importing a module.
If this variable is true, the compiled files are not written. The
variable is initially set on start-up by supplying the -B
switch to the Python interpreter, or by setting the
PYTHONDONTWRITEBYTECODE environment variable before
running the interpreter. Python code can subsequently
change the value of this variable to control whether bytecode files
are written or not.
(Contributed by Neal Norwitz and Georg Brandl.)
Information about the command-line arguments supplied to the Python
interpreter is available by reading attributes of a named
tuple available as sys.flags. For example, the verbose
attribute is true if Python
was executed in verbose mode, debug is true in debugging mode, etc.
These attributes are all read-only.
(Contributed by Christian Heimes.)
A new function, getsizeof(), takes a Python object and returns
the amount of memory used by the object, measured in bytes. Built-in
objects return correct results; third-party extensions may not,
but can define a __sizeof__() method to return the
object’s size.
(Contributed by Robert Schuppenies; issue 2898.)
It’s now possible to determine the current profiler and tracer functions
by calling sys.getprofile() and sys.gettrace().
(Contributed by Georg Brandl; issue 1648.)
The tarfile module now supports POSIX.1-2001 (pax) tarfiles in
addition to the POSIX.1-1988 (ustar) and GNU tar formats that were
already supported. The default format is GNU tar; specify the
format parameter to open a file using a different format:
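import tarfile

# The file name is illustrative.
tar = tarfile.open("output.tar", "w", format=tarfile.PAX_FORMAT)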
The new encoding and errors parameters specify an encoding and
an error handling scheme for character conversions. 'strict',
'ignore', and 'replace' are the three standard ways Python can
handle errors;
'utf-8' is a special value that replaces bad characters with
their UTF-8 representation. (Character conversions occur because the
PAX format supports Unicode filenames, defaulting to UTF-8 encoding.)
The TarFile.add() method now accepts an exclude argument that’s
a function that can be used to exclude certain filenames from
an archive.
The function must take a filename and return true if the file
should be excluded or false if it should be archived.
The function is applied to both the name initially passed to add()
and to the names of files in recursively-added directories.
(All changes contributed by Lars Gustäbel).
An optional timeout parameter was added to the
telnetlib.Telnet class constructor, specifying a timeout
measured in seconds. (Added by Facundo Batista.)
The tempfile.NamedTemporaryFile class usually deletes
the temporary file it created when the file is closed. This
behaviour can now be changed by passing delete=False to the
constructor. (Contributed by Damien Miller; issue 1537850.)
A new class, SpooledTemporaryFile, behaves like
a temporary file but stores its data in memory until a maximum size is
exceeded. On reaching that limit, the contents will be written to
an on-disk temporary file. (Contributed by Dustin J. Mitchell.)
The NamedTemporaryFile and SpooledTemporaryFile classes
both work as context managers, so you can write
with tempfile.NamedTemporaryFile() as tmp: ....
(Contributed by Alexander Belopolsky; issue 2021.)
The test.test_support module gained a number
of context managers useful for writing tests.
EnvironmentVarGuard() is a
context manager that temporarily changes environment variables and
automatically restores them to their old values.
Another context manager, TransientResource, can surround calls
to resources that may or may not be available; it will catch and
ignore a specified list of exceptions. For example,
a network test may ignore certain failures when connecting to an
external web site:
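import errno
import urllib
from test import test_support

# The URL is illustrative; an IOError whose errno is ETIMEDOUT raised
# inside the block is caught instead of being reported as a test failure.
with test_support.TransientResource(IOError, errno=errno.ETIMEDOUT):
    f = urllib.urlopen('http://www.example.com/')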
Finally, check_warnings() resets the warning module’s
warning filters and returns an object that will record all warning
messages triggered (issue 3781):
with test_support.check_warnings() as wrec:
    warnings.simplefilter("always")
    # ... code that triggers a warning ...
    assert str(wrec.message) == "function is outdated"
    assert len(wrec.warnings) == 1, "Multiple warnings raised"
(Contributed by Brett Cannon.)
The textwrap module can now preserve existing whitespace
at the beginnings and ends of the newly-created lines
by specifying drop_whitespace=False
as an argument:
>>> S="""This sentence has a bunch of... extra whitespace.""">>> printtextwrap.fill(S,width=15)This sentencehas a bunchof extrawhitespace.>>> printtextwrap.fill(S,drop_whitespace=False,width=15)This sentence has a bunch of extra whitespace.>>>
The threading module API is being changed to use properties
such as daemon instead of setDaemon() and
isDaemon() methods, and some methods have been renamed to use
underscores instead of camel-case; for example, the
activeCount() method is renamed to active_count(). Both
the 2.6 and 3.0 versions of the module support the same properties
and renamed methods, but don’t remove the old methods. No date has been set
for the deprecation of the old APIs in Python 3.x; the old APIs won’t
be removed in any 2.x version.
(Carried out by several people, most notably Benjamin Peterson.)
The threading module’s Thread objects
gained an ident property that returns the thread’s
identifier, a nonzero integer. (Contributed by Gregory P. Smith;
issue 2871.)
The timeit module now accepts callables as well as strings
for the statement being timed and for the setup code.
Two convenience functions were added for creating
Timer instances:
repeat(stmt, setup, timer, repeat, number) and
timeit(stmt, setup, timer, number) create an instance and call
the corresponding method. (Contributed by Erik Demaine;
issue 1533909.)
The Tkinter module now accepts lists and tuples for options,
separating the elements by spaces before passing the resulting value to
Tcl/Tk.
(Contributed by Guilherme Polo; issue 2906.)
The turtle module for turtle graphics was greatly enhanced by
Gregor Lingl. New features in the module include:
Better animation of turtle movement and rotation.
Control over turtle movement using the new delay(),
tracer(), and speed() methods.
The ability to set new shapes for the turtle, and to
define a new coordinate system.
Turtles now have an undo() method that can roll back actions.
Simple support for reacting to input events such as mouse and keyboard
activity, making it possible to write simple games.
A turtle.cfg file can be used to customize the starting appearance
of the turtle’s screen.
The module’s docstrings can be replaced by new docstrings that have been
translated into another language.
An optional timeout parameter was added to the
urllib.urlopen() function and the
urllib.ftpwrapper class constructor, as well as the
urllib2.urlopen() function. The parameter specifies a timeout
measured in seconds. For example:
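>>> import urllib2
>>> # The URL is illustrative; the timeout is measured in seconds.
>>> response = urllib2.urlopen("http://www.python.org", timeout=10)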
The Unicode database provided by the unicodedata module
has been updated to version 5.1.0. (Updated by
Martin von Loewis; issue 3811.)
The warnings module’s formatwarning() and showwarning()
gained an optional line argument that can be used to supply the
line of source code. (Added as part of issue 1631171, which re-implemented
part of the warnings module in C code.)
A new function, catch_warnings(), is a context manager
intended for testing purposes that lets you temporarily modify the
warning filters and then restore their original values (issue 3781).
The XML-RPC SimpleXMLRPCServer and DocXMLRPCServer
classes can now be prevented from immediately opening and binding to
their socket by passing False as the bind_and_activate
constructor parameter. This can be used to modify the instance’s
allow_reuse_address attribute before calling the
server_bind() and server_activate() methods to
open the socket and begin listening for connections.
(Contributed by Peter Parente; issue 1599845.)
SimpleXMLRPCServer also has a _send_traceback_header
attribute; if true, the exception and formatted traceback are returned
as HTTP headers “X-Exception” and “X-Traceback”. This feature is
for debugging purposes only and should not be used on production servers
because the tracebacks might reveal passwords or other sensitive
information. (Contributed by Alan McIntyre as part of his
project for Google’s Summer of Code 2007.)
The xmlrpclib module no longer automatically converts
datetime.date and datetime.time to the
xmlrpclib.DateTime type; the conversion semantics were
not necessarily correct for all applications. Code using
xmlrpclib should convert date and time
instances. (issue 1330538) The code can also handle
dates before 1900 (contributed by Ralf Schmitt; issue 2014)
and 64-bit integers represented by using <i8> in XML-RPC responses
(contributed by Riku Lindblad; issue 2985).
The zipfile module’s ZipFile class now has
extract() and extractall() methods that will unpack
a single file or all the files in the archive to the current directory, or
to a specified directory:
z = zipfile.ZipFile('python-251.zip')

# Unpack a single file, writing it relative
# to the /tmp directory.
z.extract('Python/sysmodule.c', '/tmp')

# Unpack all the files in the archive.
z.extractall()
The open(), read() and extract() methods can now
take either a filename or a ZipInfo object. This is useful when an
archive accidentally contains a duplicated filename.
(Contributed by Graham Horler; issue 1775025.)
Finally, zipfile now supports using Unicode filenames
for archived files. (Contributed by Alexey Borzenkov; issue 1734346.)
The ast module provides an Abstract Syntax Tree
representation of Python code, and Armin Ronacher
contributed a set of helper functions that perform a variety of
common tasks. These will be useful for HTML templating
packages, code analyzers, and similar tools that process
Python code.
The parse() function takes an expression and returns an AST.
The dump() function outputs a representation of a tree, suitable
for debugging:
import ast

t = ast.parse("""
d = {}
for i in 'abcdefghijklm':
    d[i + i] = ord(i) - ord('a') + 1
print d
""")
print ast.dump(t)
The literal_eval() method takes a string or an AST
representing a literal expression, parses and evaluates it, and
returns the resulting value. A literal expression is a Python
expression containing only strings, numbers, dictionaries,
etc. but no statements or function calls. If you need to
evaluate an expression but cannot accept the security risk of using an
eval() call, literal_eval() will handle it safely:
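>>> import ast
>>> literal = '("a", "b", {2:4, 3:8, 1:2})'
>>> print ast.literal_eval(literal)
('a', 'b', {1: 2, 2: 4, 3: 8})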
The module also includes NodeVisitor and
NodeTransformer classes for traversing and modifying an AST,
and functions for common transformations such as changing line
numbers.
Python 3.0 makes many changes to the repertoire of built-in
functions, and most of the changes can’t be introduced in the Python
2.x series because they would break compatibility.
The future_builtins module provides versions
of these built-in functions that can be imported when writing
3.0-compatible code.
The functions in this module currently include:
ascii(obj): equivalent to repr(). In Python 3.0,
repr() will return a Unicode string, while ascii() will
return a pure ASCII bytestring.
filter(predicate,iterable),
map(func,iterable1,...): the 3.0 versions
return iterators, unlike the 2.x builtins which return lists.
hex(value), oct(value): instead of calling the
__hex__() or __oct__() methods, these versions will
call the __index__() method and convert the result to hexadecimal
or octal. oct() will use the new 0o notation for its
result.
The new json module supports the encoding and decoding of Python types in
JSON (Javascript Object Notation). JSON is a lightweight interchange format
often used in web applications. For more information about JSON, see
http://www.json.org.
json comes with support for decoding and encoding most built-in Python
types. The following example encodes and decodes a dictionary:
>>> import json
>>> data = {"spam": "foo", "parrot": 42}
>>> in_json = json.dumps(data)   # Encode the data
>>> in_json
'{"parrot": 42, "spam": "foo"}'
>>> json.loads(in_json)          # Decode into a Python object
{u'parrot': 42, u'spam': u'foo'}
It’s also possible to write your own decoders and encoders to support
more types. Pretty-printing of the JSON strings is also supported.
json (originally called simplejson) was written by Bob
Ippolito.
The .plist format is commonly used on Mac OS X to
store basic data types (numbers, strings, lists,
and dictionaries) by serializing them into an XML-based format.
It resembles the XML-RPC serialization of data types.
Despite being primarily used on Mac OS X, the format
has nothing Mac-specific about it and the Python implementation works
on any platform that Python supports, so the plistlib module
has been promoted to the standard library.
Using the module is simple:
import sys
import plistlib
import datetime

# Create data structure
data_struct = dict(lastAccessed=datetime.datetime.now(),
                   version=1,
                   categories=('Personal', 'Shared', 'Private'))

# Create string containing XML.
plist_str = plistlib.writePlistToString(data_struct)
new_struct = plistlib.readPlistFromString(plist_str)
print data_struct
print new_struct

# Write data structure to a file and read it back.
plistlib.writePlist(data_struct, '/tmp/customizations.plist')
new_struct = plistlib.readPlist('/tmp/customizations.plist')

# read/writePlist accepts file-like objects as well as paths.
plistlib.writePlist(data_struct, sys.stdout)
Thomas Heller continued to maintain and enhance the
ctypes module.
ctypes now supports a c_bool datatype
that represents the C99 bool type. (Contributed by David Remahl;
issue 1649190.)
The ctypes string, buffer and array types have improved
support for extended slicing syntax,
where various combinations of (start,stop,step) are supplied.
(Implemented by Thomas Wouters.)
All ctypes data types now support
from_buffer() and from_buffer_copy()
methods that create a ctypes instance based on a
provided buffer object. from_buffer_copy() copies
the contents of the object,
while from_buffer() will share the same memory area.
A new calling convention tells ctypes to clear the errno or
Win32 LastError variables at the outset of each wrapped call.
(Implemented by Thomas Heller; issue 1798.)
You can now retrieve the Unix errno variable after a function
call. When creating a wrapped function, you can supply
use_errno=True as a keyword parameter to the DLL() function
and then call the module-level methods set_errno() and
get_errno() to set and retrieve the error value.
The Win32 LastError variable is similarly supported by
the DLL(), OleDLL(), and WinDLL() functions.
You supply use_last_error=True as a keyword parameter
and then call the module-level methods set_last_error()
and get_last_error().
The byref() function, used to retrieve a pointer to a ctypes
instance, now has an optional offset parameter that is a byte
count that will be added to the returned pointer.
Bill Janssen made extensive improvements to Python 2.6’s support for
the Secure Sockets Layer by adding a new module, ssl, that’s
built atop the OpenSSL library.
This new module provides more control over the protocol negotiated,
the X.509 certificates used, and has better support for writing SSL
servers (as opposed to clients) in Python. The existing SSL support
in the socket module hasn’t been removed and continues to work,
though it will be removed in Python 3.0.
To use the new module, you must first create a TCP connection in the
usual way and then pass it to the ssl.wrap_socket() function.
It’s possible to specify whether a certificate is required, and to
obtain certificate info by calling the getpeercert() method.
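Here's a minimal client-side sketch; the host name is illustrative, and
no certificate validation is requested:
import socket
import ssl

sock = socket.create_connection(('www.python.org', 443))
ssl_sock = ssl.wrap_socket(sock)   # cert_reqs defaults to ssl.CERT_NONE
print ssl_sock.cipher()            # (cipher name, protocol version, secret bits)
ssl_sock.close()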
String exceptions have been removed. Attempting to use them raises a
TypeError.
Changes to the Exception interface
as dictated by PEP 352 continue to be made. For 2.6,
the message attribute is being deprecated in favor of the
args attribute.
(3.0-warning mode) Python 3.0 will feature a reorganized standard
library that will drop many outdated modules and rename others.
Python 2.6 running in 3.0-warning mode will warn about these modules
when they are imported.
The list of deprecated modules is:
audiodev,
bgenlocations,
buildtools,
bundlebuilder,
Canvas,
compiler,
dircache,
dl,
fpformat,
gensuitemodule,
ihooks,
imageop,
imgfile,
linuxaudiodev,
mhlib,
mimetools,
multifile,
new,
pure,
statvfs,
sunaudiodev,
test.testall, and
toaiff.
The gopherlib module has been removed.
The MimeWriter module and mimify module
have been deprecated; use the email
package instead.
The md5 module has been deprecated; use the hashlib module
instead.
The posixfile module has been deprecated; fcntl.lockf()
provides better locking.
The popen2 module has been deprecated; use the subprocess
module.
The rgbimg module has been removed.
The sets module has been deprecated; it’s better to
use the built-in set and frozenset types.
The sha module has been deprecated; use the hashlib module
instead.
Changes to Python’s build process and to the C API include:
Python now must be compiled with C89 compilers (after 19
years!). This means that the Python source tree has dropped its
own implementations of memmove() and strerror(), which
are in the C89 standard library.
Python 2.6 can be built with Microsoft Visual Studio 2008 (version
9.0), and this is the new default compiler. See the
PCbuild directory for the build files. (Implemented by
Christian Heimes.)
On Mac OS X, Python 2.6 can be compiled as a 4-way universal build.
The configure script
can take a --with-universal-archs=[32-bit|64-bit|all]
switch, controlling whether the binaries are built for 32-bit
architectures (x86, PowerPC), 64-bit (x86-64 and PPC-64), or both.
(Contributed by Ronald Oussoren.)
The BerkeleyDB module now has a C API object, available as
bsddb.db.api. This object can be used by other C extensions
that wish to use the bsddb module for their own purposes.
(Contributed by Duncan Grisby.)
Python’s use of the C stdio library is now thread-safe, or at least
as thread-safe as the underlying library is. A long-standing potential
bug occurred if one thread closed a file object while another thread
was reading from or writing to the object. In 2.6 file objects
have a reference count, manipulated by the
PyFile_IncUseCount() and PyFile_DecUseCount()
functions. File objects can’t be closed unless the reference count
is zero. PyFile_IncUseCount() should be called while the GIL
is still held, before carrying out an I/O operation using the
FILE* pointer, and PyFile_DecUseCount() should be called
immediately after the GIL is re-acquired.
(Contributed by Antoine Pitrou and Gregory P. Smith.)
Importing modules simultaneously in two different threads no longer
deadlocks; it will now raise an ImportError. A new API
function, PyImport_ImportModuleNoBlock(), will look for a
module in sys.modules first, then try to import it after
acquiring an import lock. If the import lock is held by another
thread, an ImportError is raised.
(Contributed by Christian Heimes.)
Several functions return information about the platform’s
floating-point support. PyFloat_GetMax() returns
the maximum representable floating point value,
and PyFloat_GetMin() returns the minimum
positive value. PyFloat_GetInfo() returns an object
containing more information from the float.h file, such as
"mant_dig" (number of digits in the mantissa), "epsilon"
(smallest difference between 1.0 and the next largest value
representable), and several others.
(Contributed by Christian Heimes; issue 1534.)
C functions and methods that use
PyComplex_AsCComplex() will now accept arguments that
have a __complex__() method. In particular, the functions in the
cmath module will now accept objects with this method.
This is a backport of a Python 3.0 change.
(Contributed by Mark Dickinson; issue 1675423.)
Python’s C API now includes two functions for case-insensitive string
comparisons, PyOS_stricmp(char*, char*)
and PyOS_strnicmp(char*, char*, Py_ssize_t).
(Contributed by Christian Heimes; issue 1635.)
Many C extensions define their own little macro for adding
integers and strings to the module’s dictionary in the
init* function. Python 2.6 finally defines standard macros
for adding values to a module, PyModule_AddStringMacro()
and PyModule_AddIntMacro(). (Contributed by
Christian Heimes.)
Some macros were renamed in both 3.0 and 2.6 to make it clearer that
they are macros,
not functions. Py_Size() became Py_SIZE(),
Py_Type() became Py_TYPE(), and
Py_Refcnt() became Py_REFCNT().
The mixed-case macros are still available
in Python 2.6 for backward compatibility.
(issue 1629)
Distutils now places C extensions it builds in a
different directory when running on a debug version of Python.
(Contributed by Collin Winter; issue 1530959.)
Several basic data types, such as integers and strings, maintain
internal free lists of objects that can be re-used. The data
structures for these free lists now follow a naming convention: the
variable is always named free_list, the counter is always named
numfree, and a macro Py<typename>_MAXFREELIST is
always defined.
A new Makefile target, “make patchcheck”, prepares the Python source tree
for making a patch: it fixes trailing whitespace in all modified
.py files, checks whether the documentation has been changed,
and reports whether the Misc/ACKS and Misc/NEWS files
have been updated.
(Contributed by Brett Cannon.)
Another new target, “make profile-opt”, compiles a Python binary
using GCC’s profile-guided optimization. It compiles Python with
profiling enabled, runs the test suite to obtain a set of profiling
results, and then compiles using these results for optimization.
(Contributed by Gregory P. Smith.)
The support for Windows 95, 98, ME and NT4 has been dropped.
Python 2.6 requires at least Windows 2000 SP4.
The new default compiler on Windows is Visual Studio 2008 (version
9.0). The build directories for Visual Studio 2003 (version 7.1) and
2005 (version 8.0) were moved into the PC/ directory. The new
PCbuild directory supports cross compilation for X64, debug
builds and Profile Guided Optimization (PGO). PGO builds are roughly
10% faster than normal builds. (Contributed by Christian Heimes
with help from Amaury Forgeot d’Arc and Martin von Loewis.)
The msvcrt module now supports
both the normal and wide char variants of the console I/O
API. The getwch() function reads a keypress and returns a Unicode
value, as does the getwche() function. The putwch() function
takes a Unicode character and writes it to the console.
(Contributed by Christian Heimes.)
os.path.expandvars() will now expand environment variables in
the form “%var%”, and “~user” will be expanded into the user’s home
directory path. (Contributed by Josiah Carlson; issue 957650.)
The socket module’s socket objects now have an
ioctl() method that provides a limited interface to the
WSAIoctl() system interface.
The _winreg module now has a function,
ExpandEnvironmentStrings(),
that expands environment variable references such as %NAME%
in an input string. The handle objects provided by this
module now support the context protocol, so they can be used
in with statements. (Contributed by Christian Heimes.)
_winreg also has better support for x64 systems,
exposing the DisableReflectionKey(), EnableReflectionKey(),
and QueryReflectionKey() functions, which enable and disable
registry reflection for 32-bit processes running on 64-bit systems.
(issue 1753245)
The msilib module’s Record object
gained GetInteger() and GetString() methods that
return field values as an integer or a string.
(Contributed by Floris Bruynooghe; issue 2125.)
When compiling a framework build of Python, you can now specify the
framework name to be used by providing the
--with-framework-name= option to the
configure script.
The macfs module has been removed. This in turn required the
macostools.touched() function to be removed because it depended on the
macfs module. (issue 1490190)
Many other Mac OS modules have been deprecated and will be removed in
Python 3.0:
_builtinSuites,
aepack,
aetools,
aetypes,
applesingle,
appletrawmain,
appletrunner,
argvemulator,
Audio_mac,
autoGIL,
Carbon,
cfmfile,
CodeWarrior,
ColorPicker,
EasyDialogs,
Explorer,
Finder,
FrameWork,
findertools,
ic,
icglue,
icopen,
macerrors,
MacOS,
macfs,
macostools,
macresource,
MiniAEFrame,
Nav,
Netscape,
OSATerminology,
pimp,
PixMapWrapper,
StdSuites,
SystemEvents,
Terminal, and
terminalcommand.
A number of old IRIX-specific modules were deprecated and will
be removed in Python 3.0:
al and AL,
cd,
cddb,
cdplayer,
CL and cl,
DEVICE,
ERRNO,
FILE,
FL and fl,
flp,
fm,
GET,
GLWS,
GL and gl,
IN,
IOCTL,
jpeg,
panelparser,
readcd,
SV and sv,
torgb,
videoreader, and
WAIT.
This section lists previously described changes and other bugfixes
that may require changes to your code:
Classes that aren’t supposed to be hashable should
set __hash__=None in their definitions to indicate
the fact.
String exceptions have been removed. Attempting to use them raises a
TypeError.
The __init__() method of collections.deque
now clears any existing contents of the deque
before adding elements from the iterable. This change makes the
behavior match list.__init__().
object.__init__() previously accepted arbitrary arguments and
keyword arguments, ignoring them. In Python 2.6, this is no longer
allowed and will result in a TypeError. This will affect
__init__() methods that end up calling the corresponding
method on object (perhaps through using super()).
See issue 1683368 for discussion.
The Decimal constructor now accepts leading and trailing
whitespace when passed a string. Previously it would raise an
InvalidOperation exception. On the other hand, the
create_decimal() method of Context objects now
explicitly disallows extra whitespace, raising a
ConversionSyntax exception.
Due to an implementation accident, if you passed a file path to
the built-in __import__() function, it would actually import
the specified file. This was never intended to work, however, and
the implementation now explicitly checks for this case and raises
an ImportError.
C API: the PyImport_Import() and PyImport_ImportModule()
functions now default to absolute imports, not relative imports.
This will affect C extensions that import other modules.
C API: extension data types that shouldn’t be hashable
should define their tp_hash slot to
PyObject_HashNotImplemented().
The socket module exception socket.error now inherits
from IOError. Previously it wasn’t a subclass of
StandardError but now it is, through IOError.
(Implemented by Gregory P. Smith; issue 1706815.)
The xmlrpclib module no longer automatically converts
datetime.date and datetime.time to the
xmlrpclib.DateTime type; the conversion semantics were
not necessarily correct for all applications. Code using
xmlrpclib should convert date and time
instances. (issue 1330538)
(3.0-warning mode) The Exception class now warns
when accessed using slicing or index access; having
Exception behave like a tuple is being phased out.
(3.0-warning mode) inequality comparisons between two dictionaries
or two objects that don’t implement comparison methods are reported
as warnings. dict1==dict2 still works, but dict1<dict2
is being phased out.
Comparisons between cells, which are an implementation detail of Python’s
scoping rules, also cause warnings because such comparisons are forbidden
entirely in 3.0.
The author would like to thank the following people for offering
suggestions, corrections and assistance with various drafts of this
article: Georg Brandl, Steve Brown, Nick Coghlan, Ralph Corderoy,
Jim Jewett, Kent Johnson, Chris Lambacher, Martin Michlmayr,
Antoine Pitrou, Brian Warner.
This article explains the new features in Python 2.5. The final release of
Python 2.5 is scheduled for August 2006; PEP 356 describes the planned
release schedule.
The changes in Python 2.5 are an interesting mix of language and library
improvements. The library enhancements will be more important to Python’s user
community, I think, because several widely-useful packages were added. New
modules include ElementTree for XML processing (xml.etree),
the SQLite database module (sqlite), and the ctypes
module for calling C functions.
The language changes are of middling significance. Some pleasant new features
were added, but most of them aren’t features that you’ll use every day.
Conditional expressions were finally added to the language using a novel syntax;
see section PEP 308: Conditional Expressions. The new ‘with‘ statement will make
writing cleanup code easier (section PEP 343: The ‘with’ statement). Values can now be passed
into generators (section PEP 342: New Generator Features). Imports are now visible as either
absolute or relative (section PEP 328: Absolute and Relative Imports). Some corner cases of exception
handling are handled better (section PEP 341: Unified try/except/finally). All these improvements
are worthwhile, but they’re improvements to one specific language feature or
another; none of them are broad modifications to Python’s semantics.
As well as the language and library additions, other improvements and bugfixes
were made throughout the source tree. A search through the SVN change logs
finds there were 353 patches applied and 458 bugs fixed between Python 2.4 and
2.5. (Both figures are likely to be underestimates.)
This article doesn’t try to be a complete specification of the new features;
instead changes are briefly introduced using helpful examples. For full
details, you should always refer to the documentation for Python 2.5 at
http://docs.python.org. If you want to understand the complete implementation
and design rationale, refer to the PEP for a particular new feature.
Comments, suggestions, and error reports for this document are welcome; please
e-mail them to the author or open a bug in the Python bug tracker.
For a long time, people have been requesting a way to write conditional
expressions, which are expressions that return value A or value B depending on
whether a Boolean value is true or false. A conditional expression lets you
write a single assignment statement that has the same effect as the following:
if condition:
    x = true_value
else:
    x = false_value
There have been endless tedious discussions of syntax on both python-dev and
comp.lang.python. A vote was even held that found the majority of voters wanted
conditional expressions in some form, but there was no syntax that was preferred
by a clear majority. Candidates included C's cond ? true_v : false_v, if cond then true_v else false_v, and 16 other variations.
Guido van Rossum eventually chose a surprising syntax:
x = true_value if condition else false_value
Evaluation is still lazy as in existing Boolean expressions, so the order of
evaluation jumps around a bit. The condition expression in the middle is
evaluated first, and the true_value expression is evaluated only if the
condition was true. Similarly, the false_value expression is only evaluated
when the condition is false.
This syntax may seem strange and backwards; why does the condition go in the
middle of the expression, and not in the front as in C’s c?x:y? The
decision was checked by applying the new syntax to the modules in the standard
library and seeing how the resulting code read. In many cases where a
conditional expression is used, one value seems to be the ‘common case’ and one
value is an ‘exceptional case’, used only on rarer occasions when the condition
isn’t met. The conditional syntax makes this pattern a bit more obvious:
contents = ((doc + '\n') if doc else '')
I read the above statement as meaning “here contents is usually assigned a
value of doc+'\n'; sometimes doc is empty, in which special case an empty
string is returned.” I doubt I will use conditional expressions very often
where there isn’t a clear common and uncommon case.
There was some discussion of whether the language should require surrounding
conditional expressions with parentheses. The decision was made to not
require parentheses in the Python language’s grammar, but as a matter of style I
think you should always use them. Consider these two statements:
# First version -- no parens
level = 1 if logging else 0

# Second version -- with parens
level = (1 if logging else 0)
In the first version, I think a reader’s eye might group the statement into
‘level = 1’, ‘if logging’, ‘else 0’, and think that the condition decides
whether the assignment to level is performed. The second version reads
better, in my opinion, because it makes it clear that the assignment is always
performed and the choice is being made between two values.
Another reason for including the brackets: a few odd combinations of list
comprehensions and lambdas could look like incorrect conditional expressions.
See PEP 308 for some examples. If you put parentheses around your
conditional expressions, you won’t run into this case.
The functools module is intended to contain tools for functional-style
programming.
One useful tool in this module is the partial() function. For programs
written in a functional style, you’ll sometimes want to construct variants of
existing functions that have some of the parameters filled in. Consider a
Python function f(a,b,c); you could create a new function g(b,c) that
was equivalent to f(1,b,c). This is called “partial function
application”.
partial() takes the arguments (function, arg1, arg2, ..., kwarg1=value1, kwarg2=value2). The resulting object is callable, so you can just call it to
invoke function with the filled-in arguments.
Here’s a small but realistic example:
import functools

def log(message, subsystem):
    "Write the contents of 'message' to the specified subsystem."
    print '%s: %s' % (subsystem, message)
    ...

server_log = functools.partial(log, subsystem='server')
server_log('Unable to open socket')
Here’s another example, from a program that uses PyGTK. Here a context-
sensitive pop-up menu is being constructed dynamically. The callback provided
for the menu option is a partially applied version of the open_item()
method, where the first argument has been provided.
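A condensed sketch of the pattern; item_path and popup_menu are
placeholders standing in for objects supplied by the surrounding GUI code:
import functools

item_path = '/home/user/file.txt'   # hypothetical menu context
popup_menu = []                     # stands in for a PyGTK menu

class Application:
    def open_item(self, path):
        pass   # open the item identified by 'path'
    def init(self):
        # Fill in open_item()'s first argument now; the menu callback
        # can then be invoked without any arguments.
        open_func = functools.partial(self.open_item, item_path)
        popup_menu.append(("Open", open_func, 1))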
Another function in the functools module is the
update_wrapper(wrapper, wrapped) function, which helps you write
well-behaved decorators. update_wrapper() copies the name, module, and
docstring attributes to a wrapper function so that tracebacks inside the
wrapped function are easier to understand. For example, you might write:
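import functools

def my_decorator(f):
    def wrapper(*args, **kwds):
        print 'Calling decorated function'
        return f(*args, **kwds)
    functools.update_wrapper(wrapper, f)
    return wrapper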
wraps() is a decorator that can be used inside your own decorators to copy
the wrapped function’s information. An alternate version of the previous
example would be:
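import functools

def my_decorator(f):
    @functools.wraps(f)
    def wrapper(*args, **kwds):
        print 'Calling decorated function'
        return f(*args, **kwds)
    return wrapper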
PEP proposed and written by Peter Harris; implemented by Hye-Shik Chang and Nick
Coghlan, with adaptations by Raymond Hettinger.
PEP 314: Metadata for Python Software Packages v1.1
Some simple dependency support was added to Distutils. The setup()
function now has requires, provides, and obsoletes keyword
parameters. When you build a source distribution using the sdist command,
the dependency information will be recorded in the PKG-INFO file.
Another new keyword parameter is download_url, which should be set to a URL
for the package’s source code. This means it’s now possible to look up an entry
in the package index, determine the dependencies for a package, and download the
required packages.
Another new enhancement to the Python package index at
http://cheeseshop.python.org is storing source and binary archives for a
package. The new upload Distutils command will upload a package to
the repository.
Before a package can be uploaded, you must be able to build a distribution using
the sdist Distutils command. Once that works, you can run
python setup.py upload to add your package to the PyPI archive.
Optionally you can
GPG-sign the package by supplying the --sign and --identity
options.
Package uploading was implemented by Martin von Löwis and Richard Jones.
See also
PEP 314 - Metadata for Python Software Packages v1.1
PEP proposed and written by A.M. Kuchling, Richard Jones, and Fred Drake;
implemented by Richard Jones and Fred Drake.
The simpler part of PEP 328 was implemented in Python 2.4: parentheses could now
be used to enclose the names imported from a module using the from ... import ... statement, making it easier to import many different names.
The more complicated part has been implemented in Python 2.5: importing a module
can be specified to use absolute or package-relative imports. The plan is to
move toward making absolute imports the default in future versions of Python.
Let’s say you have a package directory like this:
pkg/
pkg/__init__.py
pkg/main.py
pkg/string.py
This defines a package named pkg containing the pkg.main and
pkg.string submodules.
Consider the code in the main.py module. What happens if it executes
the statement import string? In Python 2.4 and earlier, Python first looks
in the package’s directory to perform a relative import, finds
pkg/string.py, imports the contents of that file as the
pkg.string module, and binds that module to the name string in the
pkg.main module’s namespace.
That’s fine if pkg.string was what you wanted. But what if you wanted
Python’s standard string module? There’s no clean way to ignore
pkg.string and look for the standard module; generally you had to look at
the contents of sys.modules, which is slightly unclean. Holger Krekel’s
py.std package provides a tidier way to perform imports from the standard
library, import py; py.std.string.join(), but that package isn’t available
on all Python installations.
Reading code which relies on relative imports is also less clear, because a
reader may be confused about which module, string or pkg.string,
is intended to be used. Python users soon learned not to duplicate the names of
standard library modules in the names of their packages’ submodules, but you
can’t protect against having your submodule’s name being used for a new module
added in a future version of Python.
In Python 2.5, you can switch import’s behaviour to absolute imports
using a from __future__ import absolute_import directive. This absolute-
import behaviour will become the default in a future version (probably Python
2.7). Once absolute imports are the default, import string will always
find the standard library’s version. It’s suggested that users should begin
using absolute imports as much as possible, so it’s preferable to begin writing
from pkg import string in your code.
Relative imports are still possible by adding a leading period to the module
name when using the from...import form:
# Import names from pkg.string
from .string import name1, name2

# Import pkg.string
from . import string
This imports the string module relative to the current package, so in
pkg.main this will import name1 and name2 from pkg.string.
Additional leading periods perform the relative import starting from the parent
of the current package. For example, code in the A.B.C module can do:
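# Import A.B.D
from . import D

# Import A.E
from .. import E

# Import A.F.G
from ..F import G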
The -m switch added in Python 2.4 to execute a module as a script
gained a few more abilities. Instead of being implemented in C code inside the
Python interpreter, the switch now uses an implementation in a new module,
runpy.
The runpy module implements a more sophisticated import mechanism so that
it’s now possible to run modules in a package such as pychecker.checker.
The module also supports alternative import mechanisms such as the
zipimport module. This means you can add a .zip archive’s path to
sys.path and then use the -m switch to execute code from the
archive.
Until Python 2.5, the try statement came in two flavours. You could
use a finally block to ensure that code is always executed, or one or
more except blocks to catch specific exceptions. You couldn’t
combine both except blocks and a finally block, because
generating the right bytecode for the combined version was complicated and it
wasn’t clear what the semantics of the combined statement should be.
Guido van Rossum spent some time working with Java, which does support the
equivalent of combining except blocks and a finally block,
and this clarified what the statement should mean. In Python 2.5, you can now
write:
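try:
    block-1 ...
except Exception1:
    handler-1 ...
except Exception2:
    handler-2 ...
else:
    else-block
finally:
    final-block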
The code in block-1 is executed. If the code raises an exception, the various
except blocks are tested: if the exception is of class
Exception1, handler-1 is executed; otherwise if it’s of class
Exception2, handler-2 is executed, and so forth. If no exception is
raised, the else-block is executed.
No matter what happened previously, the final-block is executed once the code
block is complete and any raised exceptions handled. Even if there’s an error in
an exception handler or the else-block and a new exception is raised, the code
in the final-block is still run.
Python 2.5 adds a simple way to pass values into a generator. As introduced in
Python 2.3, generators only produce output; once a generator’s code was invoked
to create an iterator, there was no way to pass any new information into the
function when its execution is resumed. Sometimes the ability to pass in some
information would be useful. Hackish solutions to this include making the
generator’s code look at a global variable and then changing the global
variable’s value, or passing in some mutable object that callers then modify.
To refresh your memory of basic generators, here’s a simple example:
def counter(maximum):
    i = 0
    while i < maximum:
        yield i
        i += 1
When you call counter(10), the result is an iterator that returns the values
from 0 up to 9. On encountering the yield statement, the iterator
returns the provided value and suspends the function’s execution, preserving the
local variables. Execution resumes on the following call to the iterator’s
next() method, picking up after the yield statement.
In Python 2.3, yield was a statement; it didn’t return any value. In
2.5, yield is now an expression, returning a value that can be
assigned to a variable or otherwise operated on:
val = (yield i)
I recommend that you always put parentheses around a yield expression
when you’re doing something with the returned value, as in the above example.
The parentheses aren’t always necessary, but it’s easier to always add them
instead of having to remember when they’re needed.
(PEP 342 explains the exact rules, which are that a yield-expression must
always be parenthesized except when it occurs as the top-level
expression on the right-hand side of an assignment. This means you can write
val = yield i but have to use parentheses when there’s an operation, as in
val = (yield i) + 12.)
Values are sent into a generator by calling its send(value) method. The
generator’s code is then resumed and the yield expression returns the
specified value. If the regular next() method is called, the
yield returns None.
Here’s the previous example, modified to allow changing the value of the
internal counter.
def counter(maximum):
    i = 0
    while i < maximum:
        val = (yield i)
        # If value provided, change counter
        if val is not None:
            i = val
        else:
            i += 1
And here’s an example of changing the counter:
>>> it = counter(10)
>>> print it.next()
0
>>> print it.next()
1
>>> print it.send(8)
8
>>> print it.next()
9
>>> print it.next()
Traceback (most recent call last):
  File "t.py", line 15, in ?
    print it.next()
StopIteration
yield will usually return None, so you should always check
for this case. Don’t just use its value in expressions unless you’re sure that
the send() method will be the only method used to resume your generator
function.
In addition to send(), there are two other new methods on generators:
throw(type, value=None, traceback=None) is used to raise an exception
inside the generator; the exception is raised by the yield expression
where the generator’s execution is paused.
close() raises a new GeneratorExit exception inside the generator
to terminate the iteration. On receiving this exception, the generator’s code
must either raise GeneratorExit or StopIteration. Catching the
GeneratorExit exception and returning a value is illegal and will trigger
a RuntimeError; if the function raises some other exception, that
exception is propagated to the caller. close() will also be called by
Python’s garbage collector when the generator is garbage-collected.
If you need to run cleanup code when a GeneratorExit occurs, I suggest
using a try:...finally: suite instead of catching GeneratorExit.
The cumulative effect of these changes is to turn generators from one-way
producers of information into both producers and consumers.
Generators also become coroutines, a more generalized form of subroutines.
Subroutines are entered at one point and exited at another point (the top of the
function, and a return statement), but coroutines can be entered,
exited, and resumed at many different points (the yield statements).
We’ll have to figure out patterns for using coroutines effectively in Python.
The addition of the close() method has one side effect that isn’t obvious.
close() is called when a generator is garbage-collected, so this means the
generator’s code gets one last chance to run before the generator is destroyed.
This last chance means that try...finally statements in generators can now
be guaranteed to work; the finally clause will now always get a
chance to run. The syntactic restriction that you couldn’t mix yield
statements with a try...finally suite has therefore been removed. This
seems like a minor bit of language trivia, but using generators and
try...finally is actually necessary in order to implement the
with statement described by PEP 343. I’ll look at this new statement
in the following section.
Another even more esoteric effect of this change: previously, the
gi_frame attribute of a generator was always a frame object. It’s now
possible for gi_frame to be None once the generator has been
exhausted.
The ‘with‘ statement clarifies code that previously would use
try...finally blocks to ensure that clean-up code is executed. In this
section, I’ll discuss the statement as it will commonly be used. In the next
section, I’ll examine the implementation details and show how to write objects
for use with this statement.
The ‘with‘ statement is a new control-flow structure whose basic
structure is:
with expression [as variable]:
    with-block
The expression is evaluated, and it should result in an object that supports the
context management protocol (that is, has __enter__() and __exit__()
methods).
The object’s __enter__() is called before with-block is executed and
therefore can run set-up code. It also may return a value that is bound to the
name variable, if given. (Note carefully that variable is not assigned
the result of expression.)
After execution of the with-block is finished, the object’s __exit__()
method is called, even if the block raised an exception, and can therefore run
clean-up code.
To enable the statement in Python 2.5, you need to add the following directive
to your module:
from __future__ import with_statement
The statement will always be enabled in Python 2.6.
Some standard Python objects now support the context management protocol and can
be used with the ‘with‘ statement. File objects are one example:
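with open('/etc/passwd', 'r') as f:
    for line in f:
        print line
        # ... more processing code ...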
After this statement has executed, the file object in f will have been
automatically closed, even if the for loop raised an exception part-
way through the block.
Note
In this case, f is the same object created by open(), because
file.__enter__() returns self.
The threading module’s locks and condition variables also support the
‘with‘ statement:
import threading

lock = threading.Lock()
with lock:
    # Critical section of code
    ...
The lock is acquired before the block is executed and always released once the
block is complete.
The new localcontext() function in the decimal module makes it easy
to save and restore the current decimal context, which encapsulates the desired
precision and rounding characteristics for computations:
from decimal import Decimal, Context, localcontext

# Displays with default precision of 28 digits
v = Decimal('578')
print v.sqrt()

with localcontext(Context(prec=16)):
    # All code in this block uses a precision of 16 digits.
    # The original context is restored on exiting the block.
    print v.sqrt()
Under the hood, the ‘with‘ statement is fairly complicated. Most
people will only use ‘with‘ in company with existing objects and
don’t need to know these details, so you can skip the rest of this section if
you like. Authors of new objects will need to understand the details of the
underlying implementation and should keep reading.
A high-level explanation of the context management protocol is:

1. The expression is evaluated and should result in an object called a “context
   manager”. The context manager must have __enter__() and __exit__()
   methods.
2. The context manager’s __enter__() method is called. The value returned
   is assigned to VAR. If no 'as VAR' clause is present, the value is simply
   discarded.
3. The code in BLOCK is executed.
4. If BLOCK raises an exception, the __exit__(type, value, traceback)
   method is called with the exception details, the same values returned by
   sys.exc_info(). The method’s return value controls whether the exception
   is re-raised: any false value re-raises the exception, and True will result
   in suppressing it. You’ll only rarely want to suppress the exception, because
   if you do the author of the code containing the ‘with‘ statement will
   never realize anything went wrong.
5. If BLOCK didn’t raise an exception, the __exit__() method is still
   called, but type, value, and traceback are all None.
Let’s think through an example. I won’t present detailed code but will only
sketch the methods necessary for a database that supports transactions.
(For people unfamiliar with database terminology: a set of changes to the
database are grouped into a transaction. Transactions can be either committed,
meaning that all the changes are written into the database, or rolled back,
meaning that the changes are all discarded and the database is unchanged. See
any database textbook for more information.)
Let’s assume there’s an object representing a database connection. Our goal will
be to let the user write code like this:
db_connection = DatabaseConnection()
with db_connection as cursor:
    cursor.execute('insert into ...')
    cursor.execute('delete from ...')
    # ... more operations ...
The transaction should be committed if the code in the block runs flawlessly or
rolled back if there’s an exception. Here’s the basic interface for
DatabaseConnection that I’ll assume:
class DatabaseConnection:
    # Database interface
    def cursor(self):
        "Returns a cursor object and starts a new transaction"
    def commit(self):
        "Commits current transaction"
    def rollback(self):
        "Rolls back current transaction"
The __enter__() method is pretty easy, having only to start a new
transaction. For this application the resulting cursor object would be a useful
result, so the method will return it. The user can then add as cursor to
their ‘with‘ statement to bind the cursor to a variable name.
class DatabaseConnection:
    ...
    def __enter__(self):
        # Code to start a new transaction
        cursor = self.cursor()
        return cursor
The __exit__() method is the most complicated because it’s where most of
the work has to be done. The method has to check if an exception occurred. If
there was no exception, the transaction is committed. The transaction is rolled
back if there was an exception.
In the code below, execution will just fall off the end of the function,
returning the default value of None. None is false, so the exception
will be re-raised automatically. If you wished, you could be more explicit and
add a return statement at the marked location.
class DatabaseConnection:
    ...
    def __exit__(self, type, value, tb):
        if tb is None:
            # No exception, so commit
            self.commit()
        else:
            # Exception occurred, so rollback.
            self.rollback()
            # return False
The new contextlib module provides some functions and a decorator that
are useful for writing objects for use with the ‘with‘ statement.
The decorator is called contextmanager(), and lets you write a single
generator function instead of defining a new class. The generator should yield
exactly one value. The code up to the yield will be executed as the
__enter__() method, and the value yielded will be the method’s return
value that will get bound to the variable in the ‘with‘ statement’s
as clause, if any. The code after the yield will be
executed in the __exit__() method. Any exception raised in the block will
be raised by the yield statement.
Our database example from the previous section could be written using this
decorator as:
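from contextlib import contextmanager

@contextmanager
def db_transaction(connection):
    # Runs as the __enter__ half: start a transaction, hand back the cursor.
    cursor = connection.cursor()
    try:
        yield cursor
    except:
        # An exception in the block is re-raised at the yield;
        # roll back and propagate it.
        connection.rollback()
        raise
    else:
        # The block finished without an exception, so commit.
        connection.commit()

db = DatabaseConnection()
with db_transaction(db) as cursor:
    # ... database operations ...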
The contextlib module also has a nested(mgr1, mgr2, ...) function
that combines a number of context managers so you don’t need to write nested
‘with‘ statements. In this example, the single ‘with‘
statement both starts a database transaction and acquires a thread lock:
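import threading
from contextlib import nested

# Reuses the db_transaction() context manager sketched above.
lock = threading.Lock()
with nested(db_transaction(db), lock) as (cursor, locked):
    # ... database operations guarded by the lock ...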
PEP written by Guido van Rossum and Nick Coghlan; implemented by Mike Bland,
Guido van Rossum, and Neal Norwitz. The PEP shows the code generated for a
‘with‘ statement, which can be helpful in learning how the statement
works.
Exception classes can now be new-style classes, not just classic classes, and
the built-in Exception class and all the standard built-in exceptions
(NameError, ValueError, etc.) are now new-style classes.
The inheritance hierarchy for exceptions has been rearranged a bit. In 2.5, the
inheritance relationships are:
BaseException       # New in Python 2.5
|- KeyboardInterrupt
|- SystemExit
|- Exception
   |- (all other current built-in exceptions)
This rearrangement was done because people often want to catch all exceptions
that indicate program errors. KeyboardInterrupt and SystemExit
aren’t errors, though, and usually represent an explicit action such as the user
hitting Control-C or code calling sys.exit(). A bare except: will
catch all exceptions, so you commonly need to list KeyboardInterrupt and
SystemExit in order to re-raise them. The usual pattern is:
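try:
    ...
except (KeyboardInterrupt, SystemExit):
    raise
except:
    # Log error...
    # Continue running program...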
In Python 2.5, you can now write except Exception to achieve the same
result, catching all the exceptions that usually indicate errors but leaving
KeyboardInterrupt and SystemExit alone. As in previous versions,
a bare except: still catches all exceptions.
The goal for Python 3.0 is to require any class raised as an exception to derive
from BaseException or some descendant of BaseException, and future
releases in the Python 2.x series may begin to enforce this constraint.
Therefore, I suggest you begin making all your exception classes derive from
Exception now. It’s been suggested that the bare except: form should
be removed in Python 3.0, but Guido van Rossum hasn’t decided whether to do this
or not.
Raising of strings as exceptions, as in the statement raise "Error occurred", is deprecated in Python 2.5 and will trigger a warning. The aim is
to be able to remove the string-exception feature in a few releases.
A wide-ranging change to Python’s C API, using a new Py_ssize_t type
definition instead of int, will permit the interpreter to handle more
data on 64-bit platforms. This change doesn’t affect Python’s capacity on 32-bit
platforms.
Various pieces of the Python interpreter used C’s int type to store
sizes or counts; for example, the number of items in a list or tuple were stored
in an int. The C compilers for most 64-bit platforms still define
int as a 32-bit type, so that meant that lists could only hold up to
2**31-1 = 2147483647 items. (There are actually a few different
programming models that 64-bit C compilers can use – see
http://www.unix.org/version2/whatsnew/lp64_wp.html for a discussion – but the
most commonly available model leaves int as 32 bits.)
A limit of 2147483647 items doesn’t really matter on a 32-bit platform because
you’ll run out of memory before hitting the length limit. Each list item
requires space for a pointer, which is 4 bytes, plus space for a
PyObject representing the item. 2147483647*4 is already more bytes
than a 32-bit address space can contain.
It’s possible to address that much memory on a 64-bit platform, however. The
pointers for a list that size would only require 16 GiB of space, so it’s not
unreasonable that Python programmers might construct lists that large.
Therefore, the Python interpreter had to be changed to use some type other than
int, and this will be a 64-bit type on 64-bit platforms. The change
will cause incompatibilities on 64-bit machines, so it was deemed worth making
the transition now, while the number of 64-bit users is still relatively small.
(In 5 or 10 years, we may all be on 64-bit machines, and the transition would
be more painful then.)
This change most strongly affects authors of C extension modules. Python
strings and container types such as lists and tuples now use
Py_ssize_t to store their size. Functions such as
PyList_Size() now return Py_ssize_t. Code in extension modules
may therefore need to have some variables changed to Py_ssize_t.
The PyArg_ParseTuple() and Py_BuildValue() functions have a new
conversion code, n, for Py_ssize_t. PyArg_ParseTuple()’s
s# and t# still output int by default, but you can define the
macro PY_SSIZE_T_CLEAN before including Python.h to make
them return Py_ssize_t.
PEP 353 has a section on conversion guidelines that extension authors should
read to learn about supporting 64-bit platforms.
The NumPy developers had a problem that could only be solved by adding a new
special method, __index__(). When using slice notation, as in
[start:stop:step], the values of the start, stop, and step indexes
must all be either integers or long integers. NumPy defines a variety of
specialized integer types corresponding to unsigned and signed integers of 8,
16, 32, and 64 bits, but there was no way to signal that these types could be
used as slice indexes.
Slicing can’t just use the existing __int__() method because that method
is also used to implement coercion to integers. If slicing used
__int__(), floating-point numbers would also become legal slice indexes
and that’s clearly an undesirable behaviour.
Instead, a new special method called __index__() was added. It takes no
arguments and returns an integer giving the slice index to use. For example:
class C:
    def __index__(self):
        return self.value
The return value must be either a Python integer or long integer. The
interpreter will check that the type returned is correct, and raises a
TypeError if this requirement isn’t met.
A corresponding nb_index slot was added to the C-level
PyNumberMethods structure to let C extensions implement this protocol.
PyNumber_Index(obj) can be used in extension code to call the
__index__() function and retrieve its result.
See also
PEP 357 - Allowing Any Object to be Used for Slicing
Here are all of the changes that Python 2.5 makes to the core Python language.
The dict type has a new hook for letting subclasses provide a default
value when a key isn’t contained in the dictionary. When a key isn’t found, the
dictionary’s __missing__(key) method will be called. This hook is used
to implement the new defaultdict class in the collections
module. The following example defines a dictionary that returns zero for any
missing key:
class zerodict(dict):
    def __missing__(self, key):
        return 0

d = zerodict({1:1, 2:2})
print d[1], d[2]   # Prints 1, 2
print d[3], d[4]   # Prints 0, 0
Both 8-bit and Unicode strings have new partition(sep) and
rpartition(sep) methods that simplify a common use case.
The find(S) method is often used to get an index which is then used to
slice the string and obtain the pieces that are before and after the separator.
partition(sep) condenses this pattern into a single method call that
returns a 3-tuple containing the substring before the separator, the separator
itself, and the substring after the separator. If the separator isn’t found,
the first element of the tuple is the entire string and the other two elements
are empty. rpartition(sep) also returns a 3-tuple but starts searching
from the end of the string; the r stands for ‘reverse’.
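For example:

>>> 'http://www.python.org'.partition('://')
('http', '://', 'www.python.org')
>>> 'www.python.org'.rpartition('.')
('www.python', '.', 'org')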
(Implemented by Georg Brandl following a suggestion by Tom Lynn.)
The min() and max() built-in functions gained a key keyword
parameter analogous to the key argument for sort(). This parameter
supplies a function that takes a single argument and is called for every value
in the list; min()/max() will return the element with the
smallest/largest return value from this function. For example, to find the
longest string in a list, you can do:
L = ['medium', 'longest', 'short']

# Prints 'longest'
print max(L, key=len)

# Prints 'short', because lexicographically 'short' has the largest value
print max(L)
(Contributed by Steven Bethard and Raymond Hettinger.)
Two new built-in functions, any() and all(), evaluate whether an
iterator contains any true or false values. any() returns True
if any value returned by the iterator is true; otherwise it will return
False. all() returns True only if all of the values
returned by the iterator evaluate as true. (Suggested by Guido van Rossum, and
implemented by Raymond Hettinger.)
The result of a class’s __hash__() method can now be either a long
integer or a regular integer. If a long integer is returned, the hash of that
value is taken. In earlier versions the hash value was required to be a
regular integer, but in 2.5 the id() built-in was changed to always
return non-negative numbers, and users often seem to use id(self) in
__hash__() methods (though this is discouraged).
ASCII is now the default encoding for modules. It’s now a syntax error if a
module contains string literals with 8-bit characters but doesn’t have an
encoding declaration. In Python 2.4 this triggered a warning, not a syntax
error. See PEP 263 for how to declare a module’s encoding; for example, you
might add a line like this near the top of the source file:
# -*- coding: latin1 -*-
A new warning, UnicodeWarning, is triggered when you attempt to
compare a Unicode string and an 8-bit string that can’t be converted to Unicode
using the default ASCII encoding. The result of the comparison is false:
>>> chr(128) == unichr(128)   # Can't convert chr(128) to Unicode
__main__:1: UnicodeWarning: Unicode equal comparison failed
  to convert both arguments to Unicode - interpreting them
  as being unequal
False
>>> chr(127) == unichr(127)   # chr(127) can be converted
True
Previously this would raise a UnicodeDecodeError exception, but in 2.5
this could result in puzzling problems when accessing a dictionary. If you
looked up unichr(128) and chr(128) was being used as a key, you’d get a
UnicodeDecodeError exception. Other changes in 2.5 resulted in this
exception being raised instead of suppressed by the code in dictobject.c
that implements dictionaries.
Raising an exception for such a comparison is strictly correct, but the change
might have broken code, so instead UnicodeWarning was introduced.
(Implemented by Marc-André Lemburg.)
One error that Python programmers sometimes make is forgetting to include an
__init__.py module in a package directory. Debugging this mistake can be
confusing, and usually requires running Python with the -v switch to
log all the paths searched. In Python 2.5, a new ImportWarning warning is
triggered when an import would have picked up a directory as a package but no
__init__.py was found. This warning is silently ignored by default;
provide the -Wd option when running the Python executable to display
the warning message. (Implemented by Thomas Wouters.)
The list of base classes in a class definition can now be empty. As an
example, this is now legal:
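# Empty parentheses: no base classes.
class C():
    pass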
In the interactive interpreter, quit and exit have long been strings so
that new users get a somewhat helpful message when they try to quit:
>>> quit
'Use Ctrl-D (i.e. EOF) to exit.'
In Python 2.5, quit and exit are now objects that still produce string
representations of themselves, but are also callable. Newbies who try quit()
or exit() will now exit the interpreter as they expect. (Implemented by
Georg Brandl.)
The Python executable now accepts the standard long options --help
and --version; on Windows, it also accepts the /? option
for displaying a help message. (Implemented by Georg Brandl.)
Several of the optimizations were developed at the NeedForSpeed sprint, an event
held in Reykjavik, Iceland, from May 21–28 2006. The sprint focused on speed
enhancements to the CPython implementation and was funded by EWT LLC with local
support from CCP Games. Those optimizations added at this sprint are specially
marked in the following list.
When they were introduced in Python 2.4, the built-in set and
frozenset types were built on top of Python’s dictionary type. In 2.5
the internal data structure has been customized for implementing sets, and as a
result sets will use a third less memory and are somewhat faster. (Implemented
by Raymond Hettinger.)
The speed of some Unicode operations, such as finding substrings, string
splitting, and character map encoding and decoding, has been improved.
(Substring search and splitting improvements were added by Fredrik Lundh and
Andrew Dalke at the NeedForSpeed sprint. Character maps were improved by Walter
Dörwald and Martin von Löwis.)
The long(str, base) function is now faster on long digit strings
because fewer intermediate results are calculated. The peak is for strings of
around 800–1000 digits where the function is 6 times faster. (Contributed by
Alan McIntyre and committed at the NeedForSpeed sprint.)
It’s now illegal to mix iterating over a file with for line in file and
calling the file object’s read()/readline()/readlines()
methods. Iteration uses an internal buffer and the read*() methods
don’t use that buffer. Instead they would return the data following the
buffer, causing the data to appear out of order. Mixing iteration and these
methods will now trigger a ValueError from the read*() method.
(Implemented by Thomas Wouters.)
The struct module now compiles structure format strings into an
internal representation and caches this representation, yielding a 20% speedup.
(Contributed by Bob Ippolito at the NeedForSpeed sprint.)
The re module got a 1 or 2% speedup by switching to Python’s allocator
functions instead of the system’s malloc() and free().
(Contributed by Jack Diederich at the NeedForSpeed sprint.)
The code generator’s peephole optimizer now performs simple constant folding
in expressions. If you write something like a=2+3, the code generator
will do the arithmetic and produce code corresponding to a=5. (Proposed
and implemented by Raymond Hettinger.)
Function calls are now faster because code objects now keep the most recently
finished frame (a “zombie frame”) in an internal field of the code object,
reusing it the next time the code object is invoked. (Original patch by Michael
Hudson, modified by Armin Rigo and Richard Jones; committed at the NeedForSpeed
sprint.) Frame objects are also slightly smaller, which may improve cache
locality and reduce memory usage a bit. (Contributed by Neal Norwitz.)
Python’s built-in exceptions are now new-style classes, a change that speeds
up instantiation considerably. Exception handling in Python 2.5 is therefore
about 30% faster than in 2.4. (Contributed by Richard Jones, Georg Brandl and
Sean Reifschneider at the NeedForSpeed sprint.)
Importing now caches the paths tried, recording whether they exist or not so
that the interpreter makes fewer open() and stat() calls on
startup. (Contributed by Martin von Löwis and Georg Brandl.)
The standard library received many enhancements and bug fixes in Python 2.5.
Here’s a partial list of the most notable changes, sorted alphabetically by
module name. Consult the Misc/NEWS file in the source tree for a more
complete list of changes, or look through the SVN logs for all the details.
The audioop module now supports the a-LAW encoding, and the code for
u-LAW encoding has been improved. (Contributed by Lars Immisch.)
The codecs module gained support for incremental codecs. The
codecs.lookup() function now returns a CodecInfo instance instead
of a tuple. CodecInfo instances behave like a 4-tuple to preserve
backward compatibility but also have the attributes encode,
decode, incrementalencoder, incrementaldecoder,
streamwriter, and streamreader. Incremental codecs can receive
input and produce output in multiple chunks; the output is the same as if the
entire input was fed to the non-incremental codec. See the codecs module
documentation for details. (Designed and implemented by Walter Dörwald.)
The collections module gained a new type, defaultdict, that
subclasses the standard dict type. The new type mostly behaves like a
dictionary but constructs a default value when a key isn’t present,
automatically adding it to the dictionary for the requested key value.
The first argument to defaultdict‘s constructor is a factory function
that gets called whenever a key is requested but not found. This factory
function receives no arguments, so you can use built-in type constructors such
as list() or int(). For example, you can make an index of words
based on their initial letter like this:
words="""Nel mezzo del cammin di nostra vitami ritrovai per una selva oscurache la diritta via era smarrita""".lower().split()index=defaultdict(list)forwinwords:init_letter=w[0]index[init_letter].append(w)
The deque double-ended queue type supplied by the collections
module now has a remove(value) method that removes the first occurrence
of value in the queue, raising ValueError if the value isn’t found.
(Contributed by Raymond Hettinger.)
New module: The contextlib module contains helper functions for use
with the new ‘with‘ statement. See section The contextlib module
for more about this module.
New module: The cProfile module is a C implementation of the existing
profile module that has much lower overhead. The module’s interface is
the same as profile: you run cProfile.run('main()') to profile a
function, can save profile data to a file, etc. It’s not yet known if the
Hotshot profiler, which is also written in C but doesn’t match the
profile module’s interface, will continue to be maintained in future
versions of Python. (Contributed by Armin Rigo.)
Also, the pstats module for analyzing the data measured by the profiler
now supports directing the output to any file object by supplying a stream
argument to the Stats constructor. (Contributed by Skip Montanaro.)
The csv module, which parses files in comma-separated value format,
received several enhancements and a number of bugfixes. You can now set the
maximum size in bytes of a field by calling the
csv.field_size_limit(new_limit) function; omitting the new_limit
argument will return the currently-set limit. The reader class now has
a line_num attribute that counts the number of physical lines read from
the source; records can span multiple physical lines, so line_num is not
the same as the number of records read.
The CSV parser is now stricter about multi-line quoted fields. Previously, if a
line ended within a quoted field without a terminating newline character, a
newline would be inserted into the returned field. This behavior caused problems
when reading files that contained carriage return characters within fields, so
the code was changed to return the field without inserting newlines. As a
consequence, if newlines embedded within fields are important, the input should
be split into lines in a manner that preserves the newline characters.
(Contributed by Skip Montanaro and Andrew McNamara.)
The datetime class in the datetime module now has a
strptime(string, format) method for parsing date strings, contributed
by Josh Spoerri. It uses the same format characters as time.strptime() and
time.strftime():
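from datetime import datetime

ts = datetime.strptime('10:13:15 2006-03-07',
                       '%H:%M:%S %Y-%m-%d')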
The SequenceMatcher.get_matching_blocks() method in the difflib
module now guarantees to return a minimal list of blocks describing matching
subsequences. Previously, the algorithm would occasionally break a block of
matching elements into two list entries. (Enhancement by Tim Peters.)
The doctest module gained a SKIP option that keeps an example from
being executed at all. This is intended for code snippets that are usage
examples intended for the reader and aren’t actually test cases.
An encoding parameter was added to the testfile() function and the
DocFileSuite class to specify the file’s encoding. This makes it
easier to use non-ASCII characters in tests contained within a docstring.
(Contributed by Bjorn Tillenius.)
The email package has been updated to version 4.0. (Contributed by
Barry Warsaw.)
The fileinput module was made more flexible. Unicode filenames are now
supported, and a mode parameter that defaults to "r" was added to the
input() function to allow opening files in binary or universal-newline
mode. Another new parameter, openhook, lets you use a function other than
open() to open the input files. Once you’re iterating over the set of
files, the FileInput object’s new fileno() returns the file
descriptor for the currently opened file. (Contributed by Georg Brandl.)
In the gc module, the new get_count() function returns a 3-tuple
containing the current collection counts for the three GC generations. This is
accounting information for the garbage collector; when these counts reach a
specified threshold, a garbage collection sweep will be made. The existing
gc.collect() function now takes an optional generation argument of 0, 1,
or 2 to specify which generation to collect. (Contributed by Barry Warsaw.)
The nsmallest() and nlargest() functions in the heapq
module now support a key keyword parameter similar to the one provided by
the min()/max() functions and the sort() methods. For
example:
>>> import heapq
>>> L = ["short", 'medium', 'longest', 'longer still']
>>> heapq.nsmallest(2, L)          # Return two lowest elements, lexicographically
['longer still', 'longest']
>>> heapq.nsmallest(2, L, key=len) # Return two shortest elements
['short', 'medium']
(Contributed by Raymond Hettinger.)
The itertools.islice() function now accepts None for the start and
step arguments. This makes it more compatible with the attributes of slice
objects, so that you can now write the following:
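import itertools

s = slice(5)                   # Create a slice object
# 'iterable' here stands for any iterable object you want to slice.
itertools.islice(iterable, s.start, s.stop, s.step)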
The format() function in the locale module has been modified and
two new functions were added, format_string() and currency().
The format() function’s val parameter could previously be a string as
long as no more than one %char specifier appeared; now the parameter must be
exactly one %char specifier with no surrounding text. An optional monetary
parameter was also added which, if True, will use the locale’s rules for
formatting currency in placing a separator between groups of three digits.
To format strings with multiple %char specifiers, use the new
format_string() function that works like format() but also supports
mixing %char specifiers with arbitrary text.
A new currency() function was also added that formats a number according
to the current locale’s settings.
(Contributed by Georg Brandl.)
The mailbox module underwent a massive rewrite to add the capability to
modify mailboxes in addition to reading them. A new set of classes that include
mbox, MH, and Maildir are used to read mailboxes, and
have an add(message) method to add messages, remove(key) to
remove messages, and lock()/unlock() to lock/unlock the mailbox.
The following example converts a maildir-format mailbox into an mbox-format
one:
import mailbox

# 'factory=None' uses email.Message.Message as the class representing
# individual messages.
src = mailbox.Maildir('maildir', factory=None)
dest = mailbox.mbox('/tmp/mbox')

for msg in src:
    dest.add(msg)
(Contributed by Gregory K. Johnson. Funding was provided by Google’s 2005
Summer of Code.)
New module: the msilib module allows creating Microsoft Installer
.msi files and CAB files. Some support for reading the .msi
database is also included. (Contributed by Martin von Löwis.)
The nis module now supports accessing domains other than the system
default domain by supplying a domain argument to the nis.match() and
nis.maps() functions. (Contributed by Ben Bell.)
The operator module’s itemgetter() and attrgetter()
functions now support multiple fields. A call such as
operator.attrgetter('a', 'b') will return a function that retrieves the
a and b attributes. Combining this new feature with the
sort() method’s key parameter lets you easily sort lists using
multiple fields. (Contributed by Raymond Hettinger.)
The optparse module was updated to version 1.5.1 of the Optik library.
The OptionParser class gained an epilog attribute, a string
that will be printed after the help message, and a destroy() method to
break reference cycles created by the object. (Contributed by Greg Ward.)
The os module underwent several changes. The stat_float_times
variable now defaults to true, meaning that os.stat() will now return time
values as floats. (This doesn’t necessarily mean that os.stat() will
return times that are precise to fractions of a second; not all systems support
such precision.)
Two new functions, wait3() and wait4(), were added. They’re similar
to the waitpid() function, which waits for a child process to exit and returns
a tuple of the process ID and its exit status, but wait3() and
wait4() return additional information. wait3() doesn’t take a
process ID as input, so it waits for any child process to exit and returns a
3-tuple of process-id, exit-status, resource-usage as returned from the
resource.getrusage() function. wait4(pid) does take a process ID.
(Contributed by Chad J. Schroeder.)
On FreeBSD, the os.stat() function now returns times with nanosecond
resolution, and the returned object now has st_gen and
st_birthtime. The st_flags attribute is also available, if the
platform supports it. (Contributed by Antti Louko and Diego Pettenò.)
The Python debugger provided by the pdb module can now store lists of
commands to execute when a breakpoint is reached and execution stops. Once
breakpoint #1 has been created, enter commands 1 and enter a series of
commands to be executed, finishing the list with end. The command list can
include commands that resume execution, such as continue or next.
(Contributed by Grégoire Dooms.)
The pickle and cPickle modules no longer accept a return value
of None from the __reduce__() method; the method must return a tuple
of arguments instead. The ability to return None was deprecated in Python
2.4, so this completes the removal of the feature.
The pkgutil module, containing various utility functions for finding
packages, was enhanced to support PEP 302’s import hooks and now also works for
packages stored in ZIP-format archives. (Contributed by Phillip J. Eby.)
The pybench benchmark suite by Marc-André Lemburg is now included in the
Tools/pybench directory. The pybench suite is an improvement on the
commonly used pystone.py program because pybench provides a more
detailed measurement of the interpreter’s speed. It times particular operations
such as function calls, tuple slicing, method lookups, and numeric operations,
instead of performing many different operations and reducing the result to a
single number as pystone.py does.
The pyexpat module now uses version 2.0 of the Expat parser.
(Contributed by Trent Mick.)
The Queue class provided by the Queue module gained two new
methods. join() blocks until all items in the queue have been retrieved
and all processing work on the items has been completed. Worker threads call
the other new method, task_done(), to signal that processing for an item
has been completed. (Contributed by Raymond Hettinger.)
The old regex and regsub modules, which have been deprecated
ever since Python 2.0, have finally been deleted. Other deleted modules:
statcache, tzparse, whrandom.
The lib-old directory, which includes ancient modules
such as dircmp and ni, was also removed. lib-old wasn’t on the
default sys.path, so unless your programs explicitly added the directory to
sys.path, this removal shouldn’t affect your code.
The rlcompleter module is no longer dependent on importing the
readline module and therefore now works on non-Unix platforms. (Patch
from Robert Kiendl.)
The SimpleXMLRPCServer and DocXMLRPCServer classes now have a
rpc_paths attribute that constrains XML-RPC operations to a limited set
of URL paths; the default is to allow only '/' and '/RPC2'. Setting
rpc_paths to None or an empty tuple disables this path checking.
The socket module now supports AF_NETLINK sockets on Linux,
thanks to a patch from Philippe Biondi. Netlink sockets are a Linux-specific
mechanism for communications between a user-space process and kernel code; an
introductory article about them is at http://www.linuxjournal.com/article/7356.
In Python code, netlink addresses are represented as a tuple of 2 integers,
(pid, group_mask).
Two new methods on socket objects, recv_into(buffer) and
recvfrom_into(buffer), store the received data in an object that
supports the buffer protocol instead of returning the data as a string. This
means you can put the data directly into an array or a memory-mapped file.
Socket objects also gained getfamily(), gettype(), and
getproto() accessor methods to retrieve the family, type, and protocol
values for the socket.
New module: the spwd module provides functions for accessing the shadow
password database on systems that support shadow passwords.
The struct module is now faster because it compiles format strings into
Struct objects with pack() and unpack() methods. This is
similar to how the re module lets you create compiled regular expression
objects. You can still use the module-level pack() and unpack()
functions; they’ll create Struct objects and cache them. Or you can
use Struct instances directly:
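import struct

# Compile the format string once...
s = struct.Struct('ih3s')

# ...then reuse the compiled object.
data = s.pack(1972, 187, 'abc')
year, number, name = s.unpack(data)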
You can also pack and unpack data to and from buffer objects directly using the
pack_into(buffer, offset, v1, v2, ...) and unpack_from(buffer, offset) methods. This lets you store data directly into an array or a memory-
mapped file.
(Struct objects were implemented by Bob Ippolito at the NeedForSpeed
sprint. Support for buffer objects was added by Martin Blais, also at the
NeedForSpeed sprint.)
The Python developers switched from CVS to Subversion during the 2.5
development process. Information about the exact build version is available as
the sys.subversion variable, a 3-tuple of (interpreter-name, branch-name,
revision-range). For example, at the time of writing my copy of 2.5 was
reporting ('CPython', 'trunk', '45313:45315').
This information is also available to C extensions via the
Py_GetBuildInfo() function that returns a string of build information
like this: "trunk:45355:45356M, Apr 13 2006, 07:42:19". (Contributed by
Barry Warsaw.)
Another new function, sys._current_frames(), returns the current stack
frames for all running threads as a dictionary mapping thread identifiers to the
topmost stack frame currently active in that thread at the time the function is
called. (Contributed by Tim Peters.)
The TarFile class in the tarfile module now has an
extractall() method that extracts all members from the archive into the
current working directory. It’s also possible to set a different directory as
the extraction target, and to unpack only a subset of the archive’s members.
The compression used for a tarfile opened in stream mode can now be autodetected
using the mode 'r|*'. (Contributed by Lars Gustäbel.)
The threading module now lets you set the stack size used when new
threads are created. The stack_size([size]) function returns the
currently configured stack size, and supplying the optional size parameter
sets a new value. Not all platforms support changing the stack size, but
Windows, POSIX threading, and OS/2 all do. (Contributed by Andrew MacIntyre.)
The unicodedata module has been updated to use version 4.1.0 of the
Unicode character database. Version 3.2.0 is required by some specifications,
so it’s still available as unicodedata.ucd_3_2_0.
New module: the uuid module generates universally unique identifiers
(UUIDs) according to RFC 4122. The RFC defines several different UUID
versions that are generated from a starting string, from system properties, or
purely randomly. This module contains a UUID class and functions
named uuid1(), uuid3(), uuid4(), and uuid5() to
generate different versions of UUID. (Version 2 UUIDs are not specified in
RFC 4122 and are not supported by this module.)
>>> import uuid

>>> # make a UUID based on the host ID and current time
>>> uuid.uuid1()
UUID('a8098c1a-f86e-11da-bd1a-00112444be1e')

>>> # make a UUID using an MD5 hash of a namespace UUID and a name
>>> uuid.uuid3(uuid.NAMESPACE_DNS, 'python.org')
UUID('6fa459ea-ee8a-3ca4-894e-db77e160355e')

>>> # make a random UUID
>>> uuid.uuid4()
UUID('16fd2706-8baf-433b-82eb-8c7fada847da')

>>> # make a UUID using a SHA-1 hash of a namespace UUID and a name
>>> uuid.uuid5(uuid.NAMESPACE_DNS, 'python.org')
UUID('886313e1-3b8a-5372-9b90-0c9aee199e5d')
(Contributed by Ka-Ping Yee.)
The weakref module’s WeakKeyDictionary and
WeakValueDictionary types gained new methods for iterating over the
weak references contained in the dictionary. iterkeyrefs() and
keyrefs() methods were added to WeakKeyDictionary, and
itervaluerefs() and valuerefs() were added to
WeakValueDictionary. (Contributed by Fred L. Drake, Jr.)
The webbrowser module received a number of enhancements. It’s now
usable as a script with python -m webbrowser, taking a URL as the argument;
there are a number of switches to control the behaviour (-n for a new
browser window, -t for a new tab). New module-level functions,
open_new() and open_new_tab(), were added to support this. The
module’s open() function supports an additional feature, an autoraise
parameter that signals whether to raise the open window when possible. A number
of additional browsers were added to the supported list such as Firefox, Opera,
Konqueror, and elinks. (Contributed by Oleg Broytmann and Georg Brandl.)
The xmlrpclib module now supports returning datetime objects
for the XML-RPC date type. Supply use_datetime=True to the loads()
function or the Unmarshaller class to enable this feature. (Contributed
by Skip Montanaro.)
The zipfile module now supports the ZIP64 version of the format,
meaning that a .zip archive can now be larger than 4 GiB and can contain
individual files larger than 4 GiB. (Contributed by Ronald Oussoren.)
The zlib module’s Compress and Decompress objects now
support a copy() method that makes a copy of the object’s internal state
and returns a new Compress or Decompress object.
(Contributed by Chris AtLee.)
The ctypes package, written by Thomas Heller, has been added to the
standard library. ctypes lets you call arbitrary functions in shared
libraries or DLLs. Long-time users may remember the dl module, which
provides functions for loading shared libraries and calling functions in them.
The ctypes package is much fancier.
To load a shared library or DLL, you must create an instance of the
CDLL class and provide the name or path of the shared library or DLL.
Once that’s done, you can call arbitrary functions by accessing them as
attributes of the CDLL object.
import ctypes

libc = ctypes.CDLL('libc.so.6')
result = libc.printf("Line of output\n")
Type constructors for the various C types are provided: c_int(),
c_float(), c_double(), c_char_p() (equivalent to char *),
and so forth. Unlike Python’s types, the C versions are all mutable; you
can assign to their value attribute to change the wrapped value. Python
integers and strings will be automatically converted to the corresponding C
types, but for other types you must call the correct type constructor. (And I
mean must; getting it wrong will often result in the interpreter crashing
with a segmentation fault.)
You shouldn’t use c_char_p() with a Python string when the C function will
be modifying the memory area, because Python strings are supposed to be
immutable; breaking this rule will cause puzzling bugs. When you need a
modifiable memory area, use create_string_buffer():
s="this is a string"buf=ctypes.create_string_buffer(s)libc.strfry(buf)
C functions are assumed to return integers, but you can set the restype
attribute of the function object to change this:
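libc.atof.restype = ctypes.c_double
print libc.atof('2.71828')     # prints 2.71828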
ctypes also provides a wrapper for Python’s C API as the
ctypes.pythonapi object. This object does not release the global
interpreter lock before calling a function, because the lock must be held when
calling into the interpreter’s code. There’s a py_object() type
constructor that will create a PyObject* pointer. A simple usage:
import ctypes

d = {}
ctypes.pythonapi.PyObject_SetItem(ctypes.py_object(d),
                                  ctypes.py_object("abc"),
                                  ctypes.py_object(1))
# d is now {'abc': 1}.
Don’t forget to use py_object(); if it’s omitted you end up with a
segmentation fault.
ctypes has been around for a while, but people still write and
distribute hand-coded extension modules because you can’t rely on
ctypes being present. Perhaps developers will begin to write Python
wrappers atop a library accessed through ctypes instead of extension
modules, now that ctypes is included with core Python.
A subset of Fredrik Lundh’s ElementTree library for processing XML has been
added to the standard library as xml.etree. The available modules are
ElementTree, ElementPath, and ElementInclude from
ElementTree 1.2.6. The cElementTree accelerator module is also
included.
The rest of this section will provide a brief overview of using ElementTree.
Full documentation for ElementTree is available at
http://effbot.org/zone/element-index.htm.
ElementTree represents an XML document as a tree of element nodes. The text
content of the document is stored as the text and tail
attributes of the element nodes. (This is one of the major differences between
ElementTree and the Document Object Model; in the DOM there are many different
types of node, including TextNode.)
The most commonly used parsing function is parse(), which takes either a
string (assumed to contain a filename) or a file-like object and returns an
ElementTree instance:
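from xml.etree import ElementTree as ET
import urllib

tree = ET.parse('ex-1.xml')    # parse from a filename (an example path)

# parse from a file-like object; the URL is just an example feed
feed = urllib.urlopen('http://planet.python.org/rss10.xml')
tree = ET.parse(feed)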
Once you have an ElementTree instance, you can call its getroot()
method to get the root Element node.
There’s also an XML() function that takes a string literal and returns an
Element node (not an ElementTree). This function provides a
tidy way to incorporate XML fragments, approaching the convenience of an XML
literal:
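svg = ET.XML("""<svg width="10px" version="1.0">
             </svg>""")
svg.set('height', '320px')
svg.append(elem1)              # elem1 is some previously created Element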
Each XML element supports some dictionary-like and some list-like access
methods. Dictionary-like operations are used to access attribute values, and
list-like operations are used to access child nodes.
Operation                     Result
elem[n]                       Returns n’th child element.
elem[m:n]                     Returns list of m’th through n’th child elements.
len(elem)                     Returns number of child elements.
list(elem)                    Returns list of child elements.
elem.append(elem2)            Adds elem2 as a child.
elem.insert(index, elem2)     Inserts elem2 at the specified location.
del elem[n]                   Deletes n’th child element.
elem.keys()                   Returns list of attribute names.
elem.get(name)                Returns value of attribute name.
elem.set(name, value)         Sets new value for attribute name.
elem.attrib                   Retrieves the dictionary containing attributes.
del elem.attrib[name]         Deletes attribute name.
Comments and processing instructions are also represented as Element
nodes. To check if a node is a comment or a processing instruction:
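if elem.tag is ET.Comment:
    ...
elif elem.tag is ET.ProcessingInstruction:
    ...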
To generate XML output, you should call the ElementTree.write() method.
Like parse(), it can take either a string or a file-like object:
# Encoding is US-ASCII
tree.write('output.xml')

# Encoding is UTF-8
f = open('output.xml', 'w')
tree.write(f, encoding='utf-8')
(Caution: the default encoding used for output is ASCII. For general XML work,
where an element’s name may contain arbitrary Unicode characters, ASCII isn’t a
very useful encoding because it will raise an exception if an element’s name
contains any characters with values greater than 127. Therefore, it’s best to
specify a different encoding such as UTF-8 that can handle any Unicode
character.)
This section is only a partial description of the ElementTree interfaces. Please
read the package’s official documentation for more details.
A new hashlib module, written by Gregory P. Smith, has been added to
replace the md5 and sha modules. hashlib adds support for
additional secure hashes (SHA-224, SHA-256, SHA-384, and SHA-512). When
available, the module uses OpenSSL for fast platform optimized implementations
of algorithms.
The old md5 and sha modules still exist as wrappers around hashlib
to preserve backwards compatibility. The new module’s interface is very close
to that of the old modules, but not identical. The most significant difference
is that the constructor functions for creating new hashing objects are named
differently.
# Old versions
h = md5.md5()
h = md5.new()

# New version
h = hashlib.md5()

# Old versions
h = sha.sha()
h = sha.new()

# New version
h = hashlib.sha1()

# Hashes that weren't previously available
h = hashlib.sha224()
h = hashlib.sha256()
h = hashlib.sha384()
h = hashlib.sha512()

# Alternative form
h = hashlib.new('md5')      # Provide algorithm as a string
Once a hash object has been created, its methods are the same as before:
update(string) hashes the specified string into the current digest
state, digest() and hexdigest() return the digest value as a binary
string or a string of hex digits, and copy() returns a new hashing object
with the same digest state.
The pysqlite module (http://www.pysqlite.org), a wrapper for the SQLite embedded
database, has been added to the standard library under the package name
sqlite3.
SQLite is a C library that provides a lightweight disk-based database that
doesn’t require a separate server process and allows accessing the database
using a nonstandard variant of the SQL query language. Some applications can use
SQLite for internal data storage. It’s also possible to prototype an
application using SQLite and then port the code to a larger database such as
PostgreSQL or Oracle.
pysqlite was written by Gerhard Häring and provides a SQL interface compliant
with the DB-API 2.0 specification described by PEP 249.
If you’re compiling the Python source yourself, note that the source tree
doesn’t include the SQLite code, only the wrapper module. You’ll need to have
the SQLite libraries and headers installed before compiling Python, and the
build process will compile the module when the necessary headers are available.
To use the module, you must first create a Connection object that
represents the database. Here the data will be stored in the
/tmp/example file:
import sqlite3

conn = sqlite3.connect('/tmp/example')
You can also supply the special name :memory: to create a database in RAM.
Once you have a Connection, you can create a Cursor object
and call its execute() method to perform SQL commands:
c = conn.cursor()

# Create table
c.execute('''create table stocks
(date text, trans text, symbol text,
 qty real, price real)''')

# Insert a row of data
c.execute("""insert into stocks
          values ('2006-01-05','BUY','RHAT',100,35.14)""")
Usually your SQL operations will need to use values from Python variables. You
shouldn’t assemble your query using Python’s string operations because doing so
is insecure; it makes your program vulnerable to an SQL injection attack.
Instead, use the DB-API’s parameter substitution. Put ? as a placeholder
wherever you want to use a value, and then provide a tuple of values as the
second argument to the cursor’s execute() method. (Other database modules
may use a different placeholder, such as %s or :1.) For example:
# Never do this -- insecure!
symbol = 'IBM'
c.execute("... where symbol = '%s'" % symbol)

# Do this instead
t = (symbol,)
c.execute('select * from stocks where symbol=?', t)

# Larger example
for t in (('2006-03-28', 'BUY', 'IBM', 1000, 45.00),
          ('2006-04-05', 'BUY', 'MSOFT', 1000, 72.00),
          ('2006-04-06', 'SELL', 'IBM', 500, 53.00),
         ):
    c.execute('insert into stocks values (?,?,?,?,?)', t)
To retrieve data after executing a SELECT statement, you can either treat the
cursor as an iterator, call the cursor’s fetchone() method to retrieve a
single matching row, or call fetchall() to get a list of the matching
rows.
This example uses the iterator form:
>>> c = conn.cursor()
>>> c.execute('select * from stocks order by price')
>>> for row in c:
...     print row
...
(u'2006-01-05', u'BUY', u'RHAT', 100, 35.140000000000001)
(u'2006-03-28', u'BUY', u'IBM', 1000, 45.0)
(u'2006-04-06', u'SELL', u'IBM', 500, 53.0)
(u'2006-04-05', u'BUY', u'MSOFT', 1000, 72.0)
>>>
For more information about the SQL dialect supported by SQLite, see
http://www.sqlite.org.
The Web Server Gateway Interface (WSGI) v1.0 defines a standard interface
between web servers and Python web applications and is described in PEP 333.
The wsgiref package is a reference implementation of the WSGI
specification.
The package includes a basic HTTP server that will run a WSGI application; this
server is useful for debugging but isn’t intended for production use. Setting
up a server takes only a few lines of code:
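A minimal sketch using the demo application bundled with the package:

from wsgiref import simple_server

# Serve the sample demo_app on port 8000 until interrupted
httpd = simple_server.make_server('', 8000, simple_server.demo_app)
httpd.serve_forever()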
Changes to Python’s build process and to the C API include:
The Python source tree was converted from CVS to Subversion, in a complex
migration procedure that was supervised and flawlessly carried out by Martin von
Löwis. The procedure was developed as PEP 347.
Coverity, a company that markets a source code analysis tool called Prevent,
provided the results of their examination of the Python source code. The
analysis found about 60 bugs that were quickly fixed. Many of the bugs were
refcounting problems, often occurring in error-handling code. See
http://scan.coverity.com for the statistics.
The largest change to the C API came from PEP 353, which modifies the
interpreter to use a Py_ssize_t type definition instead of
int. See the earlier section PEP 353: Using ssize_t as the index type for a discussion of this
change.
The design of the bytecode compiler has changed a great deal, no longer
generating bytecode by traversing the parse tree. Instead the parse tree is
converted to an abstract syntax tree (or AST), and it is the abstract syntax
tree that’s traversed to produce the bytecode.
It’s possible for Python code to obtain AST objects by using the
compile() built-in and specifying _ast.PyCF_ONLY_AST as the value of
the flags parameter:
from _ast import PyCF_ONLY_AST
ast = compile("""a=0
for i in range(10):
    a += i
""", "<string>", 'exec', PyCF_ONLY_AST)

assignment = ast.body[0]
for_loop = ast.body[1]
No official documentation has been written for the AST code yet, but PEP 339
discusses the design. To start learning about the code, read the definition of
the various AST nodes in Parser/Python.asdl. A Python script reads this
file and generates a set of C structure definitions in
Include/Python-ast.h. The PyParser_ASTFromString() and
PyParser_ASTFromFile(), defined in Include/pythonrun.h, take
Python source as input and return the root of an AST representing the contents.
This AST can then be turned into a code object by PyAST_Compile(). For
more information, read the source code, and then ask questions on python-dev.
The AST code was developed under Jeremy Hylton’s management, and implemented by
(in alphabetical order) Brett Cannon, Nick Coghlan, Grant Edwards, John
Ehresman, Kurt Kaiser, Neal Norwitz, Tim Peters, Armin Rigo, and Neil
Schemenauer, plus the participants in a number of AST sprints at conferences
such as PyCon.
Evan Jones’s patch to obmalloc, first described in a talk at PyCon DC 2005,
was applied. Python 2.4 allocated small objects in 256K-sized arenas, but never
freed arenas. With this patch, Python will free arenas when they’re empty. The
net effect is that on some platforms, when you allocate many objects, Python’s
memory usage may actually drop when you delete them and the memory may be
returned to the operating system. (Implemented by Evan Jones, and reworked by
Tim Peters.)
Note that this change means extension modules must be more careful when
allocating memory. Python’s API has many different functions for allocating
memory that are grouped into families. For example, PyMem_Malloc(),
PyMem_Realloc(), and PyMem_Free() are one family that allocates
raw memory, while PyObject_Malloc(), PyObject_Realloc(), and
PyObject_Free() are another family that’s supposed to be used for
creating Python objects.
Previously these different families all reduced to the platform’s
malloc() and free() functions. This meant it didn’t matter if
you got things wrong and allocated memory with the PyMem() function but
freed it with the PyObject() function. With 2.5’s changes to obmalloc,
these families now do different things and mismatches will probably result in a
segfault. You should carefully test your C extension modules with Python 2.5.
C code can now obtain information about the exact revision of the Python
interpreter by calling the Py_GetBuildInfo() function that returns a
string of build information like this: "trunk:45355:45356M, Apr 13 2006, 07:42:19". (Contributed by Barry Warsaw.)
Two new macros can be used to indicate C functions that are local to the
current file so that a faster calling convention can be used.
Py_LOCAL(type) declares the function as returning a value of the
specified type and uses a fast-calling qualifier.
Py_LOCAL_INLINE(type) does the same thing and also requests the
function be inlined. If PY_LOCAL_AGGRESSIVE is defined before
Python.h is included, a set of more aggressive optimizations are enabled
for the module; you should benchmark the results to find out if these
optimizations actually make the code faster. (Contributed by Fredrik Lundh at
the NeedForSpeed sprint.)
PyErr_NewException(name, base, dict) can now accept a tuple of base
classes as its base argument. (Contributed by Georg Brandl.)
The PyErr_Warn() function for issuing warnings is now deprecated in
favour of PyErr_WarnEx(category, message, stacklevel) which lets you
specify the number of stack frames separating this function and the caller. A
stacklevel of 1 is the function calling PyErr_WarnEx(), 2 is the
function above that, and so forth. (Added by Neal Norwitz.)
The CPython interpreter is still written in C, but the code can now be
compiled with a C++ compiler without errors. (Implemented by Anthony Baxter,
Martin von Löwis, Skip Montanaro.)
The PyRange_New() function was removed. It was never documented, never
used in the core code, and had dangerously lax error checking. In the unlikely
case that your extensions were using it, you can replace it by something like
the following:
MacOS X (10.3 and higher): dynamic loading of modules now uses the
dlopen() function instead of MacOS-specific functions.
MacOS X: an --enable-universalsdk switch was added to the
configure script that compiles the interpreter as a universal binary
able to run on both PowerPC and Intel processors. (Contributed by Ronald
Oussoren; issue 2573.)
Windows: .dll is no longer supported as a filename extension for
extension modules. .pyd is now the only filename extension that will be
searched for.
This section lists previously described changes that may require changes to your
code:
ASCII is now the default encoding for modules. It’s now a syntax error if a
module contains string literals with 8-bit characters but doesn’t have an
encoding declaration. In Python 2.4 this triggered a warning, not a syntax
error.
Previously, the gi_frame attribute of a generator was always a frame
object. Because of the PEP 342 changes described in section PEP 342: New Generator Features,
it’s now possible for gi_frame to be None.
A new warning, UnicodeWarning, is triggered when you attempt to
compare a Unicode string and an 8-bit string that can’t be converted to Unicode
using the default ASCII encoding. Previously such comparisons would raise a
UnicodeDecodeError exception.
Library: the csv module is now stricter about multi-line quoted fields.
If your files contain newlines embedded within fields, the input should be split
into lines in a manner which preserves the newline characters.
Library: the locale module’s format() function would
previously accept any string as long as no more than one %char specifier
appeared. In Python 2.5, the argument must be exactly one %char specifier with
no surrounding text.
Library: The pickle and cPickle modules no longer accept a
return value of None from the __reduce__() method; the method must
return a tuple of arguments instead. The modules also no longer accept the
deprecated bin keyword parameter.
Library: The SimpleXMLRPCServer and DocXMLRPCServer classes now
have a rpc_paths attribute that constrains XML-RPC operations to a
limited set of URL paths; the default is to allow only '/' and '/RPC2'.
Setting rpc_paths to None or an empty tuple disables this path
checking.
C API: Many functions now use Py_ssize_t instead of int to
allow processing more data on 64-bit machines. Extension code may need to make
the same change to avoid warnings and to support 64-bit machines. See the
earlier section PEP 353: Using ssize_t as the index type for a discussion of this change.
C API: The obmalloc changes mean that you must be careful to not mix usage
of the PyMem_*() and PyObject_*() families of functions. Memory
allocated with one family’s *_Malloc() must be freed with the
corresponding family’s *_Free() function.
The author would like to thank the following people for offering suggestions,
corrections and assistance with various drafts of this article: Georg Brandl,
Nick Coghlan, Phillip J. Eby, Lars Gustäbel, Raymond Hettinger, Ralf W. Grosse-
Kunstleve, Kent Johnson, Iain Lowe, Martin von Löwis, Fredrik Lundh, Andrew
McNamara, Skip Montanaro, Gustavo Niemeyer, Paul Prescod, James Pryor, Mike
Rovner, Scott Weikart, Barry Warsaw, Thomas Wouters.
This article explains the new features in Python 2.4.1, released on March 30,
2005.
Python 2.4 is a medium-sized release. It doesn’t introduce as many changes as
the radical Python 2.2, but introduces more features than the conservative 2.3
release. The most significant new language features are function decorators and
generator expressions; most other changes are to the standard library.
According to the CVS change logs, there were 481 patches applied and 502 bugs
fixed between Python 2.3 and 2.4. Both figures are likely to be underestimates.
This article doesn’t attempt to provide a complete specification of every single
new feature, but instead provides a brief introduction to each feature. For
full details, you should refer to the documentation for Python 2.4, such as the
Python Library Reference and the Python Reference Manual. Often you will be
referred to the PEP for a particular new feature for explanations of the
implementation and design rationale.
Python 2.3 introduced the sets module. C implementations of set data
types have now been added to the Python core as two new built-in types,
set(iterable) and frozenset(iterable). They provide high speed
operations for membership testing, for eliminating duplicates from sequences,
and for mathematical operations like unions, intersections, differences, and
symmetric differences.
>>> a = set('abracadabra')              # form a set from a string
>>> 'z' in a                            # fast membership testing
False
>>> a                                   # unique letters in a
set(['a', 'r', 'b', 'c', 'd'])
>>> ''.join(a)                          # convert back into a string
'arbcd'
>>> b = set('alacazam')                 # form a second set
>>> a - b                               # letters in a but not in b
set(['r', 'd', 'b'])
>>> a | b                               # letters in either a or b
set(['a', 'c', 'r', 'd', 'b', 'm', 'z', 'l'])
>>> a & b                               # letters in both a and b
set(['a', 'c'])
>>> a ^ b                               # letters in a or b but not both
set(['r', 'd', 'b', 'm', 'z', 'l'])
>>> a.add('z')                          # add a new element
>>> a.update('wxy')                     # add multiple new elements
>>> a
set(['a', 'c', 'b', 'd', 'r', 'w', 'y', 'x', 'z'])
>>> a.remove('x')                       # take one element out
>>> a
set(['a', 'c', 'b', 'd', 'r', 'w', 'y', 'z'])
The frozenset() type is an immutable version of set(). Since it is
immutable and hashable, it may be used as a dictionary key or as a member of
another set.
The sets module remains in the standard library, and may be useful if you
wish to subclass the Set or ImmutableSet classes. There are
currently no plans to deprecate the module.
The lengthy transition process for this PEP, begun in Python 2.2, takes another
step forward in Python 2.4. In 2.3, certain integer operations that would
behave differently after int/long unification triggered FutureWarning
warnings and returned values limited to 32 or 64 bits (depending on your
platform). In 2.4, these expressions no longer produce a warning and instead
produce a different result that’s usually a long integer.
The problematic expressions are primarily left shifts and lengthy hexadecimal
and octal constants. For example, 2<<32 results in a warning in 2.3,
evaluating to 0 on 32-bit platforms. In Python 2.4, this expression now returns
the correct answer, 8589934592.
The iterator feature introduced in Python 2.2 and the itertools module
make it easier to write programs that loop through large data sets without
having the entire data set in memory at one time. List comprehensions don’t fit
into this picture very well because they produce a Python list object containing
all of the items. This unavoidably pulls all of the objects into memory, which
can be a problem if your data set is very large. When trying to write a
functionally-styled program, it would be natural to write something like:
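For instance, the two forms might look like this (get_all_links() is a
hypothetical helper):

# First form: a list comprehension builds the whole list at once
links = [link for link in get_all_links() if not link.followed]

# Second form: an explicit loop appends one element at a time
links = []
for link in get_all_links():
    if not link.followed:
        links.append(link)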
The first form is more concise and perhaps more readable, but if you’re dealing
with a large number of link objects you’d have to write the second form to avoid
having all link objects in memory at the same time.
Generator expressions work similarly to list comprehensions but don’t
materialize the entire list; instead they create a generator that will return
elements one by one. The above example could be written as:
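Using the same hypothetical get_all_links():

links = (link for link in get_all_links() if not link.followed)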
Generator expressions always have to be written inside parentheses, as in the
above example. The parentheses signalling a function call also count, so if you
want to create an iterator that will be immediately passed to a function you
could write:
print sum(obj.count for obj in list_all_objects())
Generator expressions differ from list comprehensions in various small ways.
Most notably, the loop variable (obj in the above example) is not accessible
outside of the generator expression. List comprehensions leave the variable
assigned to its last value; future versions of Python will change this, making
list comprehensions match generator expressions in this respect.
Some new classes in the standard library provide an alternative mechanism for
substituting variables into strings; this style of substitution may be better
for applications where untrained users need to edit templates.
The usual way of substituting variables by name is the % operator:
>>> '%(page)i: %(title)s' % {'page': 2, 'title': 'The Best of Times'}
'2: The Best of Times'
When writing the template string, it can be easy to forget the i or s
after the closing parenthesis. This isn’t a big problem if the template is in a
Python module, because you run the code, get an “Unsupported format character”
ValueError, and fix the problem. However, consider an application such
as Mailman where template strings or translations are being edited by users who
aren’t aware of the Python language. The format string’s syntax is complicated
to explain to such users, and if they make a mistake, it’s difficult to provide
helpful feedback to them.
PEP 292 adds a Template class to the string module that uses
$ to indicate a substitution:
>>> import string
>>> t = string.Template('$page: $title')
>>> t.substitute({'page': 2, 'title': 'The Best of Times'})
'2: The Best of Times'
If a key is missing from the dictionary, the substitute() method will
raise a KeyError. There’s also a safe_substitute() method that
ignores missing keys:
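For example:

>>> t = string.Template('$page: $title')
>>> t.safe_substitute({'page': 3})
'3: $title'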
Python 2.2 extended Python’s object model by adding static methods and class
methods, but it didn’t extend Python’s syntax to provide any new way of defining
static or class methods. Instead, you had to write a def statement
in the usual way, and pass the resulting method to a staticmethod() or
classmethod() function that would wrap up the function as a method of the
new type. Your code would look like this:
class C:
    def meth(cls):
        ...
    meth = classmethod(meth)   # Rebind name to wrapped-up class method
If the method was very long, it would be easy to miss or forget the
classmethod() invocation after the function body.
The intention was always to add some syntax to make such definitions more
readable, but at the time of 2.2’s release a good syntax was not obvious. Today
a good syntax still isn’t obvious but users are asking for easier access to
the feature; a new syntactic feature has been added to meet this need.
The new feature is called “function decorators”. The name comes from the idea
that classmethod(), staticmethod(), and friends are storing
additional information on a function object; they’re decorating functions with
more details.
The notation borrows from Java and uses the '@' character as an indicator.
Using the new syntax, the example above would be written:
class C:
    @classmethod
    def meth(cls):
        ...
The @classmethod is shorthand for the meth=classmethod(meth) assignment.
More generally, if you have the following:
@A
@B
@C
def f():
    ...
It’s equivalent to the following pre-decorator code:
def f():
    ...
f = A(B(C(f)))
Decorators must come on the line before a function definition, one decorator per
line, and can’t be on the same line as the def statement, meaning that @A def f(): ... is illegal. You can only decorate function definitions, either at
the module level or inside a class; you can’t decorate class definitions.
A decorator is just a function that takes the function to be decorated as an
argument and returns either the same function or some new object. The return
value of the decorator need not be callable (though it typically is), unless
further decorators will be applied to the result. It’s easy to write your own
decorators. The following simple example just sets an attribute on the function
object:
>>> def deco(func):
...     func.attr = 'decorated'
...     return func
...
>>> @deco
... def f(): pass
...
>>> f
<function f at 0x402ef0d4>
>>> f.attr
'decorated'
>>>
As a slightly more realistic example, the following decorator checks that the
supplied argument is an integer:
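One possible sketch (the names are made up):

def require_int(func):
    def wrapper(arg):
        # Reject anything that isn't an integer
        assert isinstance(arg, int)
        return func(arg)
    return wrapper

@require_int
def double(arg):
    print arg * 2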
An example in PEP 318 contains a fancier version of this idea that lets you
both specify the required type and check the returned type.
Decorator functions can take arguments. If arguments are supplied, your
decorator function is called with only those arguments and must return a new
decorator function; this function must take a single function and return a
function, as previously described. In other words, @A @B @C(args) becomes:
def f():
    ...
_deco = C(args)
f = A(B(_deco(f)))
Getting this right can be slightly brain-bending, but it’s not too difficult.
A small related change makes the func_name attribute of functions
writable. This attribute is used to display function names in tracebacks, so
decorators should change the name of any new function that’s constructed and
returned.
See also
PEP 318 - Decorators for Functions, Methods and Classes
Written by Kevin D. Smith, Jim Jewett, and Skip Montanaro. Several people
wrote patches implementing function decorators, but the one that was actually
checked in was patch #979728, written by Mark Russell.
The standard library provides a number of ways to execute a subprocess, offering
different features and different levels of complexity.
os.system(command) is easy to use, but slow (it runs a shell process
which executes the command) and dangerous (you have to be careful about escaping
the shell’s metacharacters). The popen2 module offers classes that can
capture standard output and standard error from the subprocess, but the naming
is confusing. The subprocess module cleans this up, providing a unified
interface that offers all the features you might need.
Instead of popen2‘s collection of classes, subprocess contains a
single class called Popen whose constructor supports a number of
different keyword arguments.
args is commonly a sequence of strings that will be the arguments to the
program executed as the subprocess. (If the shell argument is true, args
can be a string which will then be passed on to the shell for interpretation,
just as os.system() does.)
stdin, stdout, and stderr specify what the subprocess’s input, output, and
error streams will be. You can provide a file object or a file descriptor, or
you can use the constant subprocess.PIPE to create a pipe between the
subprocess and the parent.
The constructor has a number of handy options:
close_fds requests that all file descriptors be closed before running the
subprocess.
cwd specifies the working directory in which the subprocess will be executed
(defaulting to whatever the parent’s working directory is).
env is a dictionary specifying environment variables.
preexec_fn is a function that gets called before the child is started.
universal_newlines opens the child’s input and output using Python’s
universal newline feature.
Once you’ve created the Popen instance, you can call its wait()
method to pause until the subprocess has exited, poll() to check if it’s
exited without pausing, or communicate(data) to send the string data
to the subprocess’s standard input. communicate(data) then reads any
data that the subprocess has sent to its standard output or standard error,
returning a tuple (stdout_data, stderr_data).
call() is a shortcut that passes its arguments along to the Popen
constructor, waits for the command to complete, and returns the status code of
the subprocess. It can serve as a safer analog to os.system():
sts = subprocess.call(['dpkg', '-i', '/tmp/new-package.deb'])
if sts == 0:
    # Success
    ...
else:
    # dpkg returned an error
    ...
The command is invoked without use of the shell. If you really do want to use
the shell, you can add shell=True as a keyword argument and provide a string
instead of a sequence:
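For example:

sts = subprocess.call('dpkg -i /tmp/new-package.deb', shell=True)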
The PEP takes various examples of shell and Python code and shows how they’d be
translated into Python code that uses subprocess. Reading this section
of the PEP is highly recommended.
Python has always supported floating-point (FP) numbers, based on the underlying
C double type, as a data type. However, while most programming
languages provide a floating-point type, many people (even programmers) are
unaware that floating-point numbers don’t represent certain decimal fractions
accurately. The new Decimal type can represent these fractions
accurately, up to a user-specified precision limit.
The limitations arise from the representation used for floating-point numbers.
FP numbers are made up of three components:
The sign, which is positive or negative.
The mantissa, which is a single-digit binary number followed by a fractional
part. For example, 1.01 in base-2 notation is 1+0/2+1/4, or 1.25 in
decimal notation.
The exponent, which tells where the decimal point is located in the number
represented.
For example, the number 1.25 has positive sign, a mantissa value of 1.01 (in
binary), and an exponent of 0 (the decimal point doesn’t need to be shifted).
The number 5 has the same sign and mantissa, but the exponent is 2 because the
mantissa is multiplied by 4 (2 to the power of the exponent 2); 1.25 * 4 equals
5.
Modern systems usually provide floating-point support that conforms to a
standard called IEEE 754. C’s double type is usually implemented as a
64-bit IEEE 754 number, which uses 52 bits of space for the mantissa. This
means that numbers can only be specified to 52 bits of precision. If you’re
trying to represent numbers whose expansion repeats endlessly, the expansion is
cut off after 52 bits. Unfortunately, most software needs to produce output in
base 10, and common fractions in base 10 are often repeating decimals in binary.
For example, 1.1 decimal is binary 1.0001100110011...; .1 = 1/16 + 1/32 +
1/256 plus an infinite number of additional terms. IEEE 754 has to chop off
that infinitely repeated decimal after 52 digits, so the representation is
slightly inaccurate.
Sometimes you can see this inaccuracy when the number is printed:
>>> 1.1
1.1000000000000001
The inaccuracy isn’t always visible when you print the number because the FP-to-
decimal-string conversion is provided by the C library, and most C libraries try
to produce sensible output. Even if it’s not displayed, however, the inaccuracy
is still there and subsequent operations can magnify the error.
For many applications this doesn’t matter. If I’m plotting points and
displaying them on my monitor, the difference between 1.1 and 1.1000000000000001
is too small to be visible. Reports often limit output to a certain number of
decimal places, and if you round the number to two or three or even eight
decimal places, the error is never apparent. However, for applications where it
does matter, it’s a lot of work to implement your own custom arithmetic
routines.
A new module, decimal, was added to Python’s standard library. It
contains two classes, Decimal and Context. Decimal
instances represent numbers, and Context instances are used to wrap up
various settings such as the precision and default rounding mode.
Decimal instances are immutable, like regular Python integers and FP
numbers; once it’s been created, you can’t change the value an instance
represents. Decimal instances can be created from integers or
strings:
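A sketch of the possibilities; a tuple of (sign, digits, exponent) also
works, which is what the cautionary note below refers to:

>>> import decimal
>>> decimal.Decimal(1972)
Decimal("1972")
>>> decimal.Decimal('1.1')
Decimal("1.1")
>>> decimal.Decimal((1, (1, 4, 7, 5), -2))   # (sign, digits, exponent)
Decimal("-14.75")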
Cautionary note: the sign bit is a Boolean value, so 0 is positive and 1 is
negative.
Converting from floating-point numbers poses a bit of a problem: should the FP
number representing 1.1 turn into the decimal number for exactly 1.1, or for 1.1
plus whatever inaccuracies are introduced? The decision was to dodge the issue
and leave such a conversion out of the API. Instead, you should convert the
floating-point number into a string using the desired precision and pass the
string to the Decimal constructor:
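For example:

>>> f = 1.1
>>> decimal.Decimal(str(f))
Decimal("1.1")
>>> decimal.Decimal('%.12f' % f)
Decimal("1.100000000000")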
Once you have Decimal instances, you can perform the usual mathematical
operations on them. One limitation: exponentiation requires an integer
exponent:
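A sketch (the exact wording of the exception may vary):

>>> a = decimal.Decimal('35.72')
>>> b = decimal.Decimal('1.73')
>>> a + b
Decimal("37.45")
>>> a * b
Decimal("61.7956")
>>> a ** 2
Decimal("1275.9184")
>>> a ** b
Traceback (most recent call last):
  ...
InvalidOperation: x ** (non-integer)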
You can combine Decimal instances with integers, but not with floating-
point numbers:
>>> a + 4
Decimal("39.72")
>>> a + 4.5
Traceback (most recent call last):
  ...
TypeError: You can interact Decimal only with int, long or Decimal data types.
>>>
Decimal numbers can be used with the math and cmath
modules, but note that they’ll be immediately converted to floating-point
numbers before the operation is performed, resulting in a possible loss of
precision and accuracy. You’ll also get back a regular floating-point number
and not a Decimal.
Decimal instances have a sqrt() method that returns a
Decimal, but if you need other things such as trigonometric functions
you’ll have to implement them.
Instances of the Context class encapsulate several settings for
decimal operations:
prec is the precision, the number of decimal places.
rounding specifies the rounding mode. The decimal module has
constants for the various possibilities: ROUND_DOWN,
ROUND_CEILING, ROUND_HALF_EVEN, and various others.
traps is a dictionary specifying what happens on encountering certain
error conditions: either an exception is raised or a value is returned. Some
examples of error conditions are division by zero, loss of precision, and
overflow.
There’s a thread-local default context available by calling getcontext();
you can change the properties of this context to alter the default precision,
rounding, or trap handling. The following example shows the effect of changing
the precision of the default context:
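For example (28 digits is the default precision):

>>> import decimal
>>> decimal.Decimal(1) / decimal.Decimal(7)
Decimal("0.1428571428571428571428571429")
>>> decimal.getcontext().prec = 9
>>> decimal.Decimal(1) / decimal.Decimal(7)
Decimal("0.142857143")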
The default action for error conditions is selectable; the module can either
return a special value such as infinity or not-a-number, or exceptions can be
raised:
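A sketch of both behaviours, toggling the DivisionByZero trap:

>>> decimal.Decimal(1) / decimal.Decimal(0)
Traceback (most recent call last):
  ...
DivisionByZero: x / 0
>>> decimal.getcontext().traps[decimal.DivisionByZero] = False
>>> decimal.Decimal(1) / decimal.Decimal(0)
Decimal("Infinity")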
See also
A description of a decimal-based representation. This representation is being
proposed as a standard, and underlies the new Python decimal type. Much of this
material was written by Mike Cowlishaw, designer of the Rexx language.
One language change is a small syntactic tweak aimed at making it easier to
import many names from a module. In a from module import names statement,
names is a sequence of names separated by commas. If the sequence is very
long, you can either write multiple imports from the same module, or you can use
backslashes to escape the line endings like this:
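For instance (the module and names are purely illustrative):

from SimpleXMLRPCServer import SimpleXMLRPCServer, \
     SimpleXMLRPCRequestHandler, \
     CGIXMLRPCRequestHandler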
The syntactic change in Python 2.4 simply allows putting the names within
parentheses. Python ignores newlines within a parenthesized expression, so the
backslashes are no longer needed:
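The same illustrative import, parenthesized:

from SimpleXMLRPCServer import (SimpleXMLRPCServer,
                                SimpleXMLRPCRequestHandler,
                                CGIXMLRPCRequestHandler)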
The PEP also proposes that all import statements be absolute imports,
with a leading . character to indicate a relative import. This part of the
PEP was not implemented for Python 2.4, but was completed for Python 2.5.
See also
PEP 328 - Imports: Multi-Line and Absolute/Relative
Written by Aahz. Multi-line imports were implemented by Dima Dorfman.
PEP 331: Locale-Independent Float/String Conversions
The locale module lets Python software select various conversions and
display conventions that are localized to a particular country or language.
However, the module was careful to not change the numeric locale because various
functions in Python’s implementation required that the numeric locale remain set
to the 'C' locale. Often this was because the code was using the C
library’s atof() function.
Not setting the numeric locale caused trouble for extensions that used third-
party C libraries, however, because they wouldn’t have the correct locale set.
The motivating example was GTK+, whose user interface widgets weren’t displaying
numbers in the current locale.
The solution described in the PEP is to add three new functions to the Python
API that perform ASCII-only conversions, ignoring the locale setting:
PyOS_ascii_strtod(str, ptr) and PyOS_ascii_atof(str, ptr)
both convert a string to a C double.
PyOS_ascii_formatd(buffer, buf_len, format, d) converts a
double to an ASCII string.
The code for these functions came from the GLib library
(http://library.gnome.org/devel/glib/stable/), whose developers kindly
relicensed the relevant functions and donated them to the Python Software
Foundation. The locale module can now change the numeric locale,
letting extensions such as GTK+ produce the correct results.
See also
PEP 331 - Locale-Independent Float/String Conversions
Written by Christian R. Reis, and implemented by Gustavo Carneiro.
Certain numeric expressions no longer return values restricted to 32 or 64
bits (PEP 237).
You can now put parentheses around the list of names in a from module import names statement (PEP 328).
The dict.update() method now accepts the same argument forms as the
dict constructor. This includes any mapping, any iterable of key/value
pairs, and keyword arguments. (Contributed by Raymond Hettinger.)
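For example:

>>> d = {}
>>> d.update([('a', 1), ('b', 2)], c=3)   # iterable of pairs plus keywords
>>> sorted(d.items())
[('a', 1), ('b', 2), ('c', 3)]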
The string methods ljust(), rjust(), and center() now take
an optional argument for specifying a fill character other than a space.
(Contributed by Raymond Hettinger.)
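For example:

>>> 'abc'.center(9, '*')
'***abc***'
>>> '3.14'.rjust(8, '0')
'00003.14'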
Strings also gained an rsplit() method that works like the split()
method but splits from the end of the string. (Contributed by Sean
Reifschneider.)
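For example:

>>> 'www.python.org'.rsplit('.', 1)
['www.python', 'org']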
Three keyword parameters, cmp, key, and reverse, were added to the
sort() method of lists. These parameters make some common usages of
sort() simpler. All of these parameters are optional.
For the cmp parameter, the value should be a comparison function that takes
two parameters and returns -1, 0, or +1 depending on how the parameters compare.
This function will then be used to sort the list. Previously this was the only
parameter that could be provided to sort().
key should be a single-parameter function that takes a list element and
returns a comparison key for the element. The list is then sorted using the
comparison keys. The following example sorts a list case-insensitively:
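A sketch of both the key and cmp approaches:

>>> L = ['D', 'a', 'C', 'b']
>>> L.sort(key=lambda x: x.lower())                      # using key
>>> L
['a', 'b', 'C', 'D']
>>> L.sort(cmp=lambda x, y: cmp(x.lower(), y.lower()))   # using cmp
>>> L
['a', 'b', 'C', 'D']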
The last example, which uses the cmp parameter, is the old way to perform a
case-insensitive sort. It works but is slower than using a key parameter.
Using key calls the lower() method once for each element in the list, while
using cmp will call it twice for each comparison, so using key saves on
invocations of the lower() method.
For simple key functions and comparison functions, it is often possible to avoid
a lambda expression by using an unbound method instead. For example,
the above case-insensitive sort is best written as:
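For instance, assuming the elements are plain (non-Unicode) strings:

L.sort(key=str.lower)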
Finally, the reverse parameter takes a Boolean value. If the value is true,
the list will be sorted into reverse order. Instead of L.sort(); L.reverse(), you can now write L.sort(reverse=True).
The results of sorting are now guaranteed to be stable. This means that two
entries with equal keys will be returned in the same order as they were input.
For example, you can sort a list of people by name, and then sort the list by
age, resulting in a list sorted by age where people with the same age are in
name-sorted order.
(All changes to sort() contributed by Raymond Hettinger.)
There is a new built-in function sorted(iterable) that works like the
in-place list.sort() method but can be used in expressions. The
differences are:
the input may be any iterable;
a newly formed copy is sorted, leaving the original intact; and
the expression returns the new sorted copy.
>>> L = [9, 7, 8, 3, 2, 4, 1, 6, 5]
>>> [10+i for i in sorted(L)]       # usable in a list comprehension
[11, 12, 13, 14, 15, 16, 17, 18, 19]
>>> L                               # original is left unchanged
[9, 7, 8, 3, 2, 4, 1, 6, 5]
>>> sorted('Monty Python')          # any iterable may be an input
[' ', 'M', 'P', 'h', 'n', 'n', 'o', 'o', 't', 't', 'y', 'y']

>>> # List the contents of a dict sorted by key values
>>> colormap = dict(red=1, blue=2, green=3, black=4, yellow=5)
>>> for k, v in sorted(colormap.iteritems()):
...     print k, v
...
black 4
blue 2
green 3
red 1
yellow 5
(Contributed by Raymond Hettinger.)
Integer operations will no longer trigger an OverflowWarning. The
OverflowWarning warning will disappear in Python 2.5.
The interpreter gained a new switch, -m, that takes a name, searches
for the corresponding module on sys.path, and runs the module as a script.
For example, you can now run the Python profiler with python -m profile.
(Contributed by Nick Coghlan.)
The eval(expr, globals, locals) and execfile(filename, globals, locals) functions and the exec statement now accept any mapping type
for the locals parameter. Previously this had to be a regular Python
dictionary. (Contributed by Raymond Hettinger.)
The zip() built-in function and itertools.izip() now return an
empty list if called with no arguments. Previously they raised a
TypeError exception. This makes them more suitable for use with variable
length argument lists:
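A sketch of the variable-length case:

>>> def transpose(array):
...     return zip(*array)
...
>>> transpose([(1, 2, 3), (4, 5, 6)])
[(1, 4), (2, 5), (3, 6)]
>>> transpose([])
[]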
Encountering a failure while importing a module no longer leaves a partially-
initialized module object in sys.modules. The incomplete module object left
behind would fool further imports of the same module into succeeding, leading to
confusing errors. (Fixed by Tim Peters.)
None is now a constant; code that binds a new value to the name
None is now a syntax error. (Contributed by Raymond Hettinger.)
The inner loops for list and tuple slicing were optimized and now run about
one-third faster. The inner loops for dictionaries were also optimized,
resulting in performance boosts for keys(), values(), items(),
iterkeys(), itervalues(), and iteritems(). (Contributed by
Raymond Hettinger.)
The machinery for growing and shrinking lists was optimized for speed and for
space efficiency. Appending and popping from lists now runs faster due to more
efficient code paths and less frequent use of the underlying system
realloc(). List comprehensions also benefit. list.extend() was
also optimized and no longer converts its argument into a temporary list before
extending the base list. (Contributed by Raymond Hettinger.)
list(), tuple(), map(), filter(), and zip() now
run several times faster with non-sequence arguments that supply a
__len__() method. (Contributed by Raymond Hettinger.)
The methods list.__getitem__(), dict.__getitem__(), and
dict.__contains__() are now implemented as method_descriptor
objects rather than wrapper_descriptor objects. This form of access
doubles their performance and makes them more suitable for use as arguments to
functionals: map(mydict.__getitem__,keylist). (Contributed by Raymond
Hettinger.)
Added a new opcode, LIST_APPEND, that simplifies the generated bytecode
for list comprehensions and speeds them up by about a third. (Contributed by
Raymond Hettinger.)
The peephole bytecode optimizer has been improved to produce shorter, faster
bytecode; remarkably, the resulting bytecode is more readable. (Enhanced by
Raymond Hettinger.)
String concatenations in statements of the form s = s + "abc" and s += "abc" are now performed more efficiently in certain circumstances. This
optimization won’t be present in other Python implementations such as Jython, so
you shouldn’t rely on it; using the join() method of strings is still
recommended when you want to efficiently glue a large number of strings
together. (Contributed by Armin Rigo.)
The net result of the 2.4 optimizations is that Python 2.4 runs the pystone
benchmark around 5% faster than Python 2.3 and 35% faster than Python 2.2.
(pystone is not a particularly good benchmark, but it’s the most commonly used
measurement of Python’s performance. Your own applications may show greater or
smaller benefits from Python 2.4.)
As usual, Python’s standard library received a number of enhancements and bug
fixes. Here’s a partial list of the most notable changes, sorted alphabetically
by module name. Consult the Misc/NEWS file in the source tree for a more
complete list of changes, or look through the CVS logs for all the details.
The asyncore module’s loop() function now has a count parameter
that lets you perform a limited number of passes through the polling loop. The
default is still to loop forever.
The base64 module now has more complete RFC 3548 support for Base64,
Base32, and Base16 encoding and decoding, including optional case folding and
optional alternative alphabets. (Contributed by Barry Warsaw.)
The bisect module now has an underlying C implementation for improved
performance. (Contributed by Dmitry Vasiliev.)
The CJKCodecs collections of East Asian codecs, maintained by Hye-Shik Chang,
was integrated into 2.4. The new encodings are:
Chinese (PRC): gb2312, gbk, gb18030, big5hkscs, hz
Some other new encodings were added: HP Roman8, ISO_8859-11, ISO_8859-16,
PTCP-154, and TIS-620.
The UTF-8 and UTF-16 codecs now cope better with receiving partial input.
Previously the StreamReader class would try to read more data, making
it impossible to resume decoding from the stream. The read() method will
now return as much data as it can and future calls will resume decoding where
previous ones left off. (Implemented by Walter Dörwald.)
There is a new collections module for various specialized collection
datatypes. Currently it contains just one type, deque, a double-
ended queue that supports efficiently adding and removing elements from either
end:
>>> from collections import deque
>>> d = deque('ghi')        # make a new deque with three items
>>> d.append('j')           # add a new entry to the right side
>>> d.appendleft('f')       # add a new entry to the left side
>>> d                       # show the representation of the deque
deque(['f', 'g', 'h', 'i', 'j'])
>>> d.pop()                 # return and remove the rightmost item
'j'
>>> d.popleft()             # return and remove the leftmost item
'f'
>>> list(d)                 # list the contents of the deque
['g', 'h', 'i']
>>> 'h' in d                # search the deque
True
Several modules, such as the Queue and threading modules, now take
advantage of collections.deque for improved performance. (Contributed
by Raymond Hettinger.)
The ConfigParser classes have been enhanced slightly. The read()
method now returns a list of the files that were successfully parsed, and the
set() method raises TypeError if passed a value argument that
isn’t a string. (Contributed by John Belmonte and David Goodger.)
The curses module now supports the ncurses extension
use_default_colors(). On platforms where the terminal supports
transparency, this makes it possible to use a transparent background.
(Contributed by Jörg Lehmann.)
The difflib module now includes an HtmlDiff class that creates
an HTML table showing a side by side comparison of two versions of a text.
(Contributed by Dan Gass.)
The email package was updated to version 3.0, which dropped various
deprecated APIs and removed support for Python versions earlier than 2.3. The
3.0 version of the package uses a new incremental parser for MIME messages,
available in the email.FeedParser module. The new parser doesn’t require
reading the entire message into memory, and doesn’t raise exceptions if a
message is malformed; instead it records any problems in the defect
attribute of the message. (Developed by Anthony Baxter, Barry Warsaw, Thomas
Wouters, and others.)
The heapq module has been converted to C. The resulting tenfold
improvement in speed makes the module suitable for handling high volumes of
data. In addition, the module has two new functions nlargest() and
nsmallest() that use heaps to find the N largest or smallest values in a
dataset without the expense of a full sort. (Contributed by Raymond Hettinger.)
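For example:

>>> import heapq
>>> data = [97, 3, 54, 41, 88, 12]
>>> heapq.nlargest(2, data)
[97, 88]
>>> heapq.nsmallest(3, data)
[3, 12, 41]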
The httplib module now contains constants for HTTP status codes defined
in various HTTP-related RFC documents. Constants have names such as
OK, CREATED, CONTINUE, and
MOVED_PERMANENTLY; use pydoc to get a full list. (Contributed by
Andrew Eland.)
The imaplib module now supports IMAP’s THREAD command (contributed by
Yves Dionne) and new deleteacl() and myrights() methods (contributed
by Arnaud Mazin).
The itertools module gained a groupby(iterable[, func])
function. iterable is something that can be iterated over to return a stream
of elements, and the optional func parameter is a function that takes an
element and returns a key value; if omitted, the key is simply the element
itself. groupby() then groups the elements into subsequences which have
matching values of the key, and returns a series of 2-tuples containing the key
value and an iterator over the subsequence.
Here’s an example to make this clearer. The key function simply returns
whether a number is even or odd, so the result of groupby() is to return
consecutive runs of odd or even numbers.
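A sketch:

>>> import itertools
>>> L = [2, 4, 6, 7, 9, 8, 10, 11]
>>> for key, group in itertools.groupby(L, lambda x: x % 2):
...     print key, list(group)
...
0 [2, 4, 6]
1 [7, 9]
0 [8, 10]
1 [11]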
groupby() is typically used with sorted input. The logic for
groupby() is similar to the Unix uniq filter which makes it handy for
eliminating, counting, or identifying duplicate elements:
>>> import itertools
>>> word = 'abracadabra'
>>> letters = sorted(word)   # Turn string into a sorted list of letters
>>> letters
['a', 'a', 'a', 'a', 'a', 'b', 'b', 'c', 'd', 'r', 'r']
>>> for k, g in itertools.groupby(letters):
...     print k, list(g)
...
a ['a', 'a', 'a', 'a', 'a']
b ['b', 'b']
c ['c']
d ['d']
r ['r', 'r']
>>> # List unique letters
>>> [k for k, g in itertools.groupby(letters)]
['a', 'b', 'c', 'd', 'r']
>>> # Count letter occurrences
>>> [(k, len(list(g))) for k, g in itertools.groupby(letters)]
[('a', 5), ('b', 2), ('c', 1), ('d', 1), ('r', 2)]
(Contributed by Hye-Shik Chang.)
itertools also gained a function named tee(iterator, N) that
returns N independent iterators that replicate iterator. If N is omitted,
the default is 2.
>>> L = [1, 2, 3]
>>> i1, i2 = itertools.tee(L)
>>> i1, i2
(<itertools.tee object at 0x402c2080>, <itertools.tee object at 0x402c2090>)
>>> list(i1)        # Run the first iterator to exhaustion
[1, 2, 3]
>>> list(i2)        # Run the second iterator to exhaustion
[1, 2, 3]
Note that tee() has to keep copies of the values returned by the
iterator; in the worst case, it may need to keep all of them. This should
therefore be used carefully if the leading iterator can run far ahead of the
trailing iterator in a long stream of inputs. If the separation is large, then
you might as well use list() instead. When the iterators track closely
with one another, tee() is ideal. Possible applications include
bookmarking, windowing, or lookahead iterators. (Contributed by Raymond
Hettinger.)
A number of functions were added to the locale module, such as
bind_textdomain_codeset() to specify a particular encoding and a family of
l*gettext() functions that return messages in the chosen encoding.
(Contributed by Gustavo Niemeyer.)
Some keyword arguments were added to the logging package’s
basicConfig() function to simplify log configuration. The default
behavior is to log messages to standard error, but various keyword arguments can
be specified to log to a particular file, change the logging format, or set the
logging level. For example:
import logging

logging.basicConfig(filename='/var/log/application.log',
                    level=0,    # Log all messages
                    format='%(levelname)s:%(process)d:%(thread)d:%(message)s')
Other additions to the logging package include a log(level, msg)
convenience method, as well as a TimedRotatingFileHandler class that
rotates its log files at a timed interval. The module already had
RotatingFileHandler, which rotated logs once the file exceeded a
certain size. Both classes derive from a new BaseRotatingHandler class
that can be used to implement other rotating handlers.
(Changes implemented by Vinay Sajip.)
The marshal module now shares interned strings on unpacking a data
structure. This may shrink the size of certain pickle strings, but the primary
effect is to make .pyc files significantly smaller. (Contributed by
Martin von Löwis.)
The nntplib module’s NNTP class gained description() and
descriptions() methods to retrieve newsgroup descriptions for a single
group or for a range of groups. (Contributed by Jürgen A. Erhard.)
Two new functions were added to the operator module,
attrgetter(attr) and itemgetter(index). Both functions return
callables that take a single argument and return the corresponding attribute or
item; these callables make excellent data extractors when used with map()
or sorted(). For example:
>>> import operator
>>> L = [('c', 2), ('d', 1), ('a', 4), ('b', 3)]
>>> map(operator.itemgetter(0), L)
['c', 'd', 'a', 'b']
>>> map(operator.itemgetter(1), L)
[2, 1, 4, 3]
>>> sorted(L, key=operator.itemgetter(1))   # Sort list by second tuple item
[('d', 1), ('c', 2), ('b', 3), ('a', 4)]
(Contributed by Raymond Hettinger.)
The optparse module was updated in various ways. The module now passes
its messages through gettext.gettext(), making it possible to
internationalize Optik’s help and error messages. Help messages for options can
now include the string '%default', which will be replaced by the option’s
default value. (Contributed by Greg Ward.)
The long-term plan is to deprecate the rfc822 module in some future
Python release in favor of the email package. To this end, the
email.Utils.formatdate() function has been changed to make it usable as a
replacement for rfc822.formatdate(). You may want to write new e-mail
processing code with this in mind. (Change implemented by Anthony Baxter.)
A new urandom(n) function was added to the os module, returning
a string containing n bytes of random data. This function provides access to
platform-specific sources of randomness such as /dev/urandom on Linux or
the Windows CryptoAPI. (Contributed by Trevor Perrin.)
Another new function: os.path.lexists(path) returns true if the file
specified by path exists, whether or not it’s a symbolic link. This differs
from the existing os.path.exists(path) function, which returns false if
path is a symlink that points to a destination that doesn’t exist.
(Contributed by Beni Cherniavsky.)
A new getsid() function was added to the posix module that
underlies the os module. (Contributed by J. Raynor.)
The poplib module now supports POP over SSL. (Contributed by Hector
Urtubia.)
The profile module can now profile C extension functions. (Contributed
by Nick Bastin.)
The random module has a new method called getrandbits(N) that
returns a long integer N bits in length. The existing randrange()
method now uses getrandbits() where appropriate, making generation of
arbitrarily large random numbers more efficient. (Contributed by Raymond
Hettinger.)
The regular expression language accepted by the re module was extended
with simple conditional expressions, written as (?(group)A|B). group is
either a numeric group ID or a group name defined with (?P<group>...)
earlier in the expression. If the specified group matched, the regular
expression pattern A will be tested against the string; if the group didn’t
match, the pattern B will be used instead. (Contributed by Gustavo Niemeyer.)
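For instance, a sketch that matches a number with optional, balanced
parentheses:

>>> import re
>>> pat = re.compile(r'^(\()?\d+(?(1)\))$')
>>> [bool(pat.match(s)) for s in ['(123)', '123', '(123', '123)']]
[True, True, False, False]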
The re module is also no longer recursive, thanks to a massive amount
of work by Gustavo Niemeyer. In a recursive regular expression engine, certain
patterns result in a large amount of C stack space being consumed, and it was
possible to overflow the stack. For example, if you matched a 30000-byte string
of a characters against the expression (a|b)+, one stack frame was
consumed per character. Python 2.3 tried to check for stack overflow and raise
a RuntimeError exception, but certain patterns could sidestep the
checking and if you were unlucky Python could segfault. Python 2.4’s regular
expression engine can match this pattern without problems.
The signal module now performs tighter error-checking on the parameters
to the signal.signal() function. For example, you can’t set a handler on
the SIGKILL signal; previous versions of Python would quietly accept
this, but 2.4 will raise a RuntimeError exception.
Two new functions were added to the socket module. socketpair()
returns a pair of connected sockets and getservbyport(port)() looks up the
service name for a given port number. (Contributed by Dave Cole and Barry
Warsaw.)
The sys.exitfunc() function has been deprecated. Code should be using
the existing atexit module, which correctly handles calling multiple exit
functions. Eventually sys.exitfunc() will become a purely internal
interface, accessed only by atexit.
The tarfile module now generates GNU-format tar files by default.
(Contributed by Lars Gustaebel.)
The threading module now has an elegantly simple way to support
thread-local data. The module contains a local class whose attribute
values are local to different threads.
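A sketch (the number and url attributes mentioned below are just examples):

import threading

data = threading.local()            # each thread sees its own attributes
data.number = 42
data.url = ('www.python.org', 80)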
Other threads can assign and retrieve their own values for the number
and url attributes. You can subclass local to initialize
attributes or to add methods. (Contributed by Jim Fulton.)
The timeit module now automatically disables periodic garbage
collection during the timing loop. This change makes consecutive timings more
comparable. (Contributed by Raymond Hettinger.)
The weakref module now supports a wider variety of objects including
Python functions, class instances, sets, frozensets, deques, arrays, files,
sockets, and regular expression pattern objects. (Contributed by Raymond
Hettinger.)
The xmlrpclib module now supports a multi-call extension for
transmitting multiple XML-RPC calls in a single HTTP operation. (Contributed by
Brian Quinlan.)
The mpz, rotor, and xreadlines modules have been
removed.
The cookielib library supports client-side handling for HTTP cookies,
mirroring the Cookie module’s server-side cookie support. Cookies are
stored in cookie jars; the library transparently stores cookies offered by the
web server in the cookie jar, and fetches the cookie from the jar when
connecting to the server. As in web browsers, policy objects control whether
cookies are accepted or not.
In order to store cookies across sessions, two implementations of cookie jars
are provided: one that stores cookies in the Netscape format so applications can
use the Mozilla or Lynx cookie files, and one that stores cookies in the same
format as the Perl libwww library.
urllib2 has been changed to interact with cookielib:
HTTPCookieProcessor manages a cookie jar that is used when accessing
URLs.
The doctest module underwent considerable refactoring thanks to Edward
Loper and Tim Peters. Testing can still be as simple as running
doctest.testmod(), but the refactorings allow customizing the module’s
operation in various ways
The new DocTestFinder class extracts the tests from a given object’s
docstrings:
import doctest

def f(x, y):
    """
    >>> f(2, 2)
    4
    >>> f(3, 2)
    6
    """
    return x * y

finder = doctest.DocTestFinder()

# Get list of DocTest instances
tests = finder.find(f)
The new DocTestRunner class then runs individual tests and can produce
a summary of the results:
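A sketch, reusing the tests found above:

runner = doctest.DocTestRunner()
for t in tests:
    runner.run(t)            # run each DocTest individually

runner.summarize(verbose=1)  # print a summary of all the runs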
DocTestRunner uses an instance of the OutputChecker class to
compare the expected output with the actual output. This class takes a number
of different flags that customize its behaviour; ambitious users can also write
a completely new subclass of OutputChecker.
The default output checker provides a number of handy features. For example,
with the doctest.ELLIPSIS option flag, an ellipsis (...) in the
expected output matches any substring, making it easier to accommodate outputs
that vary in minor ways:
def o(n):
    """
    >>> o(1)
    <__main__.C instance at 0x...>
    >>>
    """
Another special string, <BLANKLINE>, matches a blank line:
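For instance, a sketch with a made-up p() whose output is one blank line:

def p(n):
    """
    >>> p(1)
    <BLANKLINE>
    >>>
    """
    print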
def g(n):
    """
    >>> g(4)
    here
    is
    a
    lengthy
    >>>
    """
    L = 'here is a rather lengthy list of words'.split()
    for word in L[:n]:
        print word
Running the above function’s tests with doctest.REPORT_UDIFF specified,
you get the following output:
**********************************************************************
File "t.py", line 15, in g
Failed example:
g(4)
Differences (unified diff with -expected +actual):
@@ -2,3 +2,3 @@
is
a
-lengthy
+rather
**********************************************************************
Some of the changes to Python’s build process and to the C API are:
Three new convenience macros were added for common return values from
extension functions: Py_RETURN_NONE, Py_RETURN_TRUE, and
Py_RETURN_FALSE. (Contributed by Brett Cannon.)
Another new macro, Py_CLEAR(obj), decreases the reference count of
obj and sets obj to the null pointer. (Contributed by Jim Fulton.)
A new function, PyTuple_Pack(N, obj1, obj2, ..., objN), constructs
tuples from a variable length argument list of Python objects. (Contributed by
Raymond Hettinger.)
A new function, PyDict_Contains(d, k), implements fast dictionary
lookups without masking exceptions raised during the look-up process.
(Contributed by Raymond Hettinger.)
The Py_IS_NAN(X) macro returns 1 if its float or double argument
X is a NaN. (Contributed by Tim Peters.)
C code can avoid unnecessary locking by using the new
PyEval_ThreadsInitialized() function to tell if any thread operations
have been performed. If this function returns false, no lock operations are
needed. (Contributed by Nick Coghlan.)
A new method flag, METH_COEXISTS, allows a function defined in slots
to co-exist with a PyCFunction having the same name. This can halve
the access time for a method such as set.__contains__(). (Contributed by
Raymond Hettinger.)
Python can now be built with additional profiling for the interpreter itself,
intended as an aid to people developing the Python core. Providing
--enable-profiling to the configure script will let you
profile the interpreter with gprof, and providing the
--with-tsc switch enables profiling using the Pentium's
Time-Stamp-Counter register. Note that the --with-tsc switch is slightly
misnamed, because the profiling feature also works on the PowerPC platform,
though that processor architecture doesn’t call that register “the TSC
register”. (Contributed by Jeremy Hylton.)
The tracebackobject type has been renamed to
PyTracebackObject.
This section lists previously described changes that may require changes to your
code:
Left shifts and hexadecimal/octal constants that are too large no longer
trigger a FutureWarning and return a value limited to 32 or 64 bits;
instead they return a long integer.
Integer operations will no longer trigger an OverflowWarning. The
OverflowWarning warning will disappear in Python 2.5.
The zip() built-in function and itertools.izip() now return an
empty list instead of raising a TypeError exception if called with no
arguments.
You can no longer compare the date and datetime instances
provided by the datetime module. Two instances of different classes
will now always be unequal, and relative comparisons (<, >) will raise
a TypeError.
dircache.listdir() now passes exceptions to the caller instead of
returning empty lists.
LexicalHandler.startDTD() used to receive the public and system IDs in
the wrong order. This has been corrected; applications relying on the wrong
order need to be fixed.
fcntl.ioctl() now warns if the mutate argument is omitted and
relevant.
The tarfile module now generates GNU-format tar files by default.
Encountering a failure while importing a module no longer leaves a partially-
initialized module object in sys.modules.
None is now a constant; code that binds a new value to the name
None is now a syntax error.
The signal.signal() function now raises a RuntimeError exception
for certain illegal values; previously these errors would pass silently. For
example, you can no longer set a handler on the SIGKILL signal.
The author would like to thank the following people for offering suggestions,
corrections and assistance with various drafts of this article: Koray Can, Hye-
Shik Chang, Michael Dyck, Raymond Hettinger, Brian Hurt, Hamish Lawson, Fredrik
Lundh, Sean Reifschneider, Sadruddin Rejeb.
This article explains the new features in Python 2.3. Python 2.3 was released
on July 29, 2003.
The main themes for Python 2.3 are polishing some of the features added in 2.2,
adding various small but useful enhancements to the core language, and expanding
the standard library. The new object model introduced in the previous version
has benefited from 18 months of bugfixes and from optimization efforts that have
improved the performance of new-style classes. A few new built-in functions
have been added such as sum() and enumerate(). The in
operator can now be used for substring searches (e.g. "ab" in "abc" returns
True).
Some of the many new library features include Boolean, set, heap, and date/time
data types, the ability to import modules from ZIP-format archives, metadata
support for the long-awaited Python catalog, an updated version of IDLE, and
modules for logging messages, wrapping text, parsing CSV files, processing
command-line options, using BerkeleyDB databases... the list of new and
enhanced modules is lengthy.
This article doesn’t attempt to provide a complete specification of the new
features, but instead provides a convenient overview. For full details, you
should refer to the documentation for Python 2.3, such as the Python Library
Reference and the Python Reference Manual. If you want to understand the
complete implementation and design rationale, refer to the PEP for a particular
new feature.
The new sets module contains an implementation of a set datatype. The
Set class is for mutable sets, sets that can have members added and
removed. The ImmutableSet class is for sets that can’t be modified,
and instances of ImmutableSet can therefore be used as dictionary keys.
Sets are built on top of dictionaries, so the elements within a set must be
hashable.
The union and intersection of sets can be computed with the union() and
intersection() methods; an alternative notation uses the bitwise operators
& and |. Mutable sets also have in-place versions of these methods,
union_update() and intersection_update().
It’s also possible to take the symmetric difference of two sets. This is the
set of all elements in the union that aren’t in the intersection. Another way
of putting it is that the symmetric difference contains all elements that are in
exactly one set. Again, there’s an alternative notation (^), and an in-
place version with the ungainly name symmetric_difference_update().
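For instance, here's a quick interactive sketch of these operations (element ordering within the reprs may vary, since sets are built on dictionaries):

>>> from sets import Set
>>> S1 = Set([1, 2, 3])
>>> S2 = Set([2, 3, 4])
>>> S1.union(S2)
Set([1, 2, 3, 4])
>>> S1 & S2                # intersection
Set([2, 3])
>>> S1 ^ S2                # symmetric difference
Set([1, 4])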
In Python 2.2, generators were added as an optional feature, to be enabled by a
from __future__ import generators directive. In 2.3 generators no longer
need to be specially enabled, and are now always present; this means that
yield is now always a keyword. The rest of this section is a copy of
the description of generators from the “What’s New in Python 2.2” document; if
you read it back when Python 2.2 came out, you can skip the rest of this
section.
You’re doubtless familiar with how function calls work in Python or C. When you
call a function, it gets a private namespace where its local variables are
created. When the function reaches a return statement, the local
variables are destroyed and the resulting value is returned to the caller. A
later call to the same function will get a fresh new set of local variables.
But, what if the local variables weren’t thrown away on exiting a function?
What if you could later resume the function where it left off? This is what
generators provide; they can be thought of as resumable functions.
Here’s the simplest example of a generator function:
def generate_ints(N):
    for i in range(N):
        yield i
A new keyword, yield, was introduced for generators. Any function
containing a yield statement is a generator function; this is
detected by Python’s bytecode compiler which compiles the function specially as
a result.
When you call a generator function, it doesn’t return a single value; instead it
returns a generator object that supports the iterator protocol. On executing
the yield statement, the generator outputs the value of i,
similar to a return statement. The big difference between
yield and a return statement is that on reaching a
yield the generator’s state of execution is suspended and local
variables are preserved. On the next call to the generator’s .next()
method, the function will resume executing immediately after the
yield statement. (For complicated reasons, the yield
statement isn’t allowed inside the try block of a try...finally statement; read PEP 255 for a full explanation of the
interaction between yield and exceptions.)
Here’s a sample usage of the generate_ints() generator:
>>> gen = generate_ints(3)
>>> gen
<generator object at 0x8117f90>
>>> gen.next()
0
>>> gen.next()
1
>>> gen.next()
2
>>> gen.next()
Traceback (most recent call last):
  File "stdin", line 1, in ?
  File "stdin", line 2, in generate_ints
StopIteration
You could equally write for i in generate_ints(5), or a, b, c = generate_ints(3).
Inside a generator function, the return statement can only be used
without a value, and signals the end of the procession of values; afterwards the
generator cannot return any further values. return with a value, such
as return 5, is a syntax error inside a generator function. The end of the
generator’s results can also be indicated by raising StopIteration
manually, or by just letting the flow of execution fall off the bottom of the
function.
You could achieve the effect of generators manually by writing your own class
and storing all the local variables of the generator as instance variables. For
example, returning a list of integers could be done by setting self.count to
0, and having the next() method increment self.count and return it.
However, for a moderately complicated generator, writing a corresponding class
would be much messier. Lib/test/test_generators.py contains a number of
more interesting examples. The simplest one implements an in-order traversal of
a tree using generators recursively.
# A recursive generator that generates Tree leaves in in-order.
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x
Two other examples in Lib/test/test_generators.py produce solutions for
the N-Queens problem (placing N queens on an NxN chess board so that no
queen threatens another) and the Knight's Tour (a route that takes a knight to
every square of an NxN chessboard without visiting any square twice).
The idea of generators comes from other programming languages, especially Icon
(http://www.cs.arizona.edu/icon/), where the idea of generators is central. In
Icon, every expression and function call behaves like a generator. One example
from “An Overview of the Icon Programming Language” at
http://www.cs.arizona.edu/icon/docs/ipd266.htm gives an idea of what this looks
like:
sentence := "Store it in the neighboring harbor"
if (i := find("or", sentence)) > 5 then write(i)
In Icon the find() function returns the indexes at which the substring
“or” is found: 3, 23, 33. In the if statement, i is first
assigned a value of 3, but 3 is less than 5, so the comparison fails, and Icon
retries it with the second value of 23. 23 is greater than 5, so the comparison
now succeeds, and the code prints the value 23 to the screen.
Python doesn’t go nearly as far as Icon in adopting generators as a central
concept. Generators are considered part of the core Python language, but
learning or using them isn’t compulsory; if they don’t solve any problems that
you have, feel free to ignore them. One novel feature of Python’s interface as
compared to Icon’s is that a generator’s state is represented as a concrete
object (the iterator) that can be passed around to other functions or stored in
a data structure.
Written by Neil Schemenauer, Tim Peters, Magnus Lie Hetland. Implemented mostly
by Neil Schemenauer and Tim Peters, with other fixes from the Python Labs crew.
Python source files can now be declared as being in different character set
encodings. Encodings are declared by including a specially formatted comment in
the first or second line of the source file. For example, a UTF-8 file can be
declared with:
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
Without such an encoding declaration, the default encoding used is 7-bit ASCII.
Executing or importing modules that contain string literals with 8-bit
characters and have no encoding declaration will result in a
DeprecationWarning being signalled by Python 2.3; in 2.4 this will be a
syntax error.
The encoding declaration only affects Unicode string literals, which will be
converted to Unicode using the specified encoding. Note that Python identifiers
are still restricted to ASCII characters, so you can’t have variable names that
use characters outside of the usual alphanumerics.
The new zipimport module adds support for importing modules from a ZIP-
format archive. You don’t need to import the module explicitly; it will be
automatically imported if a ZIP archive’s filename is added to sys.path.
For example:
amk@nyman:~/src/python$ unzip -l /tmp/example.zip
Archive: /tmp/example.zip
Length Date Time Name
-------- ---- ---- ----
8467 11-26-02 22:30 jwzthreading.py
-------- -------
8467 1 file
amk@nyman:~/src/python$ ./python
Python 2.3 (#1, Aug 1 2003, 19:54:32)
>>> import sys
>>> sys.path.insert(0, '/tmp/example.zip') # Add .zip file to front of path
>>> import jwzthreading
>>> jwzthreading.__file__
'/tmp/example.zip/jwzthreading.py'
>>>
An entry in sys.path can now be the filename of a ZIP archive. The ZIP
archive can contain any kind of files, but only files named *.py,
*.pyc, or *.pyo can be imported. If an archive only contains
*.py files, Python will not attempt to modify the archive by adding the
corresponding *.pyc file, meaning that if a ZIP archive doesn’t contain
*.pyc files, importing may be rather slow.
A path within the archive can also be specified to only import from a
subdirectory; for example, the path /tmp/example.zip/lib/ would only
import from the lib/ subdirectory within the archive.
Written by James C. Ahlstrom, who also provided an implementation. Python 2.3
follows the specification in PEP 273, but uses an implementation written by
Just van Rossum that uses the import hooks described in PEP 302. See section
PEP 302: New Import Hooks for a description of the new import hooks.
PEP 277: Unicode file name support for Windows NT
On Windows NT, 2000, and XP, the system stores file names as Unicode strings.
Traditionally, Python has represented file names as byte strings, which is
inadequate because it renders some file names inaccessible.
Python now allows using arbitrary Unicode strings (within the limitations of the
file system) for all functions that expect file names, most notably the
open() built-in function. If a Unicode string is passed to
os.listdir(), Python now returns a list of Unicode strings. A new
function, os.getcwdu(), returns the current directory as a Unicode string.
Byte strings still work as file names, and on Windows Python will transparently
convert them to Unicode using the mbcs encoding.
Other systems also allow Unicode strings as file names but convert them to byte
strings before passing them to the system, which can cause a UnicodeError
to be raised. Applications can test whether arbitrary Unicode strings are
supported as file names by checking os.path.supports_unicode_filenames,
a Boolean value.
Under MacOS, os.listdir() may now return Unicode filenames.
See also
PEP 277 - Unicode file name support for Windows NT
Written by Neil Hodgson; implemented by Neil Hodgson, Martin von Löwis, and Mark
Hammond.
The three major operating systems used today are Microsoft Windows, Apple’s
Macintosh OS, and the various Unix derivatives. A minor irritation of cross-
platform work is that these three platforms all use different characters to
mark the ends of lines in text files. Unix uses the linefeed (ASCII character
10), MacOS uses the carriage return (ASCII character 13), and Windows uses a
two-character sequence of a carriage return plus a newline.
Python’s file objects can now support end of line conventions other than the one
followed by the platform on which Python is running. Opening a file with the
mode 'U' or 'rU' will open a file for reading in universal newline mode.
All three line ending conventions will be translated to a '\n' in the
strings returned by the various file methods such as read() and
readline().
Universal newline support is also used when importing modules and when executing
a file with the execfile() function. This means that Python modules can
be shared between all three operating systems without needing to convert the
line-endings.
This feature can be disabled when compiling Python by specifying the
--without-universal-newlines switch when running Python’s
configure script.
A new built-in function, enumerate(), will make certain loops a bit
clearer. enumerate(thing), where thing is either an iterator or a
sequence, returns an iterator that will return (0, thing[0]), (1, thing[1]), (2, thing[2]), and so forth.
A common idiom to change every element of a list looks like this:
for i in range(len(L)):
    item = L[i]
    # ... compute some result based on item ...
    L[i] = result
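Using enumerate(), the same loop needs no separate index lookup:

for i, item in enumerate(L):
    # ... compute some result based on item ...
    L[i] = result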
A standard package for writing logs, logging, has been added to Python
2.3. It provides a powerful and flexible mechanism for generating logging
output which can then be filtered and processed in various ways. A
configuration file written in a standard format can be used to control the
logging behavior of a program. Python includes handlers that will write log
records to standard error or to a file or socket, send them to the system log,
or even e-mail them to a particular address; of course, it’s also possible to
write your own handler classes.
The Logger class is the primary class. Most application code will deal
with one or more Logger objects, each one used by a particular
subsystem of the application. Each Logger is identified by a name, and
names are organized into a hierarchy using . as the component separator.
For example, you might have Logger instances named server,
server.auth and server.network. The latter two instances are below
server in the hierarchy. This means that if you turn up the verbosity for
server or direct server messages to a different handler, the changes
will also apply to records logged to server.auth and server.network.
There’s also a root Logger that’s the parent of all other loggers.
For simple uses, the logging package contains some convenience functions
that always use the root log:
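Here's a sketch of those convenience functions; each logs through the root logger at the corresponding severity level (the message texts are just illustrative):

import logging

logging.debug('Debugging information')
logging.info('Informational message')
logging.warning('Warning: config file %s not found', 'server.conf')
logging.error('Error occurred')
logging.critical('Critical error -- shutting down')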
In the default configuration, informational and debugging messages are
suppressed and the output is sent to standard error. You can enable the display
of informational and debugging messages by calling the setLevel() method
on the root logger.
Notice the warning() call’s use of string formatting operators; all of the
functions for logging messages take the arguments (msg, arg1, arg2, ...) and
log the string resulting from msg % (arg1, arg2, ...).
There’s also an exception() function that records the most recent
traceback. Any of the other functions will also record the traceback if you
specify a true value for the keyword argument exc_info.
Slightly more advanced programs will use a logger other than the root logger.
The getLogger(name) function is used to get a particular log, creating
it if it doesn't exist yet. getLogger(None) returns the root logger.
log = logging.getLogger('server')
...
log.info('Listening on port %i', port)
...
log.critical('Disk full')
...
Log records are usually propagated up the hierarchy, so a message logged to
server.auth is also seen by server and root, but a Logger
can prevent this by setting its propagate attribute to False.
There are more classes provided by the logging package that can be
customized. When a Logger instance is told to log a message, it
creates a LogRecord instance that is sent to any number of different
Handler instances. Loggers and handlers can also have an attached list
of filters, and each filter can cause the LogRecord to be ignored or
can modify the record before passing it along. When they’re finally output,
LogRecord instances are converted to text by a Formatter
class. All of these classes can be replaced by your own specially-written
classes.
With all of these features the logging package should provide enough
flexibility for even the most complicated applications. This is only an
incomplete overview of its features, so please see the package’s reference
documentation for all of the details. Reading PEP 282 will also be helpful.
A Boolean type was added to Python 2.3. Two new constants were added to the
__builtin__ module, True and False. (True and
False constants were added to the built-ins in Python 2.2.1, but the
2.2.1 versions are simply set to integer values of 1 and 0 and aren’t a
different type.)
The type object for this new type is named bool; the constructor for it
takes any Python value and converts it to True or False.
Python’s Booleans were added with the primary goal of making code clearer. For
example, if you’re reading a function and encounter the statement return1,
you might wonder whether the 1 represents a Boolean truth value, an index,
or a coefficient that multiplies some other quantity. If the statement is
returnTrue, however, the meaning of the return value is quite clear.
Python’s Booleans were not added for the sake of strict type-checking. A very
strict language such as Pascal would also prevent you performing arithmetic with
Booleans, and would require that the expression in an if statement
always evaluate to a Boolean result. Python is not this strict and never will
be, as PEP 285 explicitly says. This means you can still use any expression
in an if statement, even ones that evaluate to a list or tuple or
some random object. The Boolean type is a subclass of the int class so
that arithmetic using a Boolean still works.
>>> True + 1
2
>>> False + 1
1
>>> False * 75
0
>>> True * 75
75
To sum up True and False in a sentence: they’re alternative
ways to spell the integer values 1 and 0, with the single difference that
str() and repr() return the strings 'True' and 'False'
instead of '1' and '0'.
When encoding a Unicode string into a byte string, unencodable characters may be
encountered. So far, Python has allowed specifying the error processing as
either “strict” (raising UnicodeError), “ignore” (skipping the
character), or “replace” (using a question mark in the output string), with
“strict” being the default behavior. It may be desirable to specify alternative
processing of such errors, such as inserting an XML character reference or HTML
entity reference into the converted string.
Python now has a flexible framework to add different processing strategies. New
error handlers can be added with codecs.register_error(), and codecs then
can access the error handler with codecs.lookup_error(). An equivalent C
API has been added for codecs written in C. The error handler gets the necessary
state information such as the string being converted, the position in the string
where the error was detected, and the target encoding. The handler can then
either raise an exception or return a replacement string.
Two additional error handlers have been implemented using this framework:
“backslashreplace” uses Python backslash quoting to represent unencodable
characters and “xmlcharrefreplace” emits XML character references.
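For example, a quick sketch of the two new handlers (U+3042 is a character with no ASCII equivalent; the exact byte-string reprs are from memory):

>>> u'\u3042'.encode('ascii', 'xmlcharrefreplace')
'&#12354;'
>>> u'\u3042'.encode('ascii', 'backslashreplace')
'\\u3042'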
PEP 301: Package Index and Metadata for Distutils
Support for the long-requested Python catalog makes its first appearance in 2.3.
The heart of the catalog is the new Distutils register command.
Running python setup.py register will collect the metadata describing a
package, such as its name, version, maintainer, description, etc., and send it to
a central catalog server. The resulting catalog is available from
http://www.python.org/pypi.
To make the catalog a bit more useful, a new optional classifiers keyword
argument has been added to the Distutils setup() function. A list of
Trove-style strings can be supplied to help
classify the software.
Here’s an example setup.py with classifiers, written to be compatible
with older versions of the Distutils:
from distutils import core

kw = {'name': "Quixote",
      'version': "0.5.1",
      'description': "A highly Pythonic Web application framework",
      # ...
      }

if (hasattr(core, 'setup_keywords') and
        'classifiers' in core.setup_keywords):
    kw['classifiers'] = [
        'Topic :: Internet :: WWW/HTTP :: Dynamic Content',
        'Environment :: No Input/Output (Daemon)',
        'Intended Audience :: Developers']

core.setup(**kw)
The full list of classifiers can be obtained by running python setup.py register --list-classifiers.
See also
PEP 301 - Package Index and Metadata for Distutils
While it’s been possible to write custom import hooks ever since the
ihooks module was introduced in Python 1.3, no one has ever been really
happy with it because writing new import hooks is difficult and messy. There
have been various proposed alternatives such as the imputil and iu
modules, but none of them has ever gained much acceptance, and none of them were
easily usable from C code.
PEP 302 borrows ideas from its predecessors, especially from Gordon
McMillan’s iu module. Three new items are added to the sys
module:
sys.path_hooks is a list of callable objects; most often they’ll be
classes. Each callable takes a string containing a path and either returns an
importer object that will handle imports from this path or raises an
ImportError exception if it can’t handle this path.
sys.path_importer_cache caches importer objects for each path, so
sys.path_hooks will only need to be traversed once for each path.
sys.meta_path is a list of importer objects that will be traversed before
sys.path is checked. This list is initially empty, but user code can add
objects to it. Additional built-in and frozen modules can be imported by an
object added to this list.
Importer objects must have a single method, find_module(fullname, path=None).
fullname will be a module or package name, e.g. string or
distutils.core. find_module() must return a loader object that has a
single method, load_module(fullname), that creates and returns the
corresponding module object.
Pseudo-code for Python’s new import logic, therefore, looks something like this
(simplified a bit; see PEP 302 for the full details):
for mp in sys.meta_path:
    loader = mp(fullname)
    if loader is not None:
        <module> = loader.load_module(fullname)

for path in sys.path:
    for hook in sys.path_hooks:
        try:
            importer = hook(path)
        except ImportError:
            # ImportError, so try the other path hooks
            pass
        else:
            loader = importer.find_module(fullname)
            <module> = loader.load_module(fullname)

# Not found!
raise ImportError
Comma-separated files are a format frequently used for exporting data from
databases and spreadsheets. Python 2.3 adds a parser for comma-separated files.
Comma-separated format is deceptively simple at first glance:
Costs,150,200,3.95
Read a line and call line.split(','): what could be simpler? But toss in
string data that can contain commas, and things get more complicated:
"Costs",150,200,3.95,"Includes taxes, shipping, and sundry items"
A big ugly regular expression can parse this, but using the new csv
package is much simpler:
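A minimal sketch of reading such a file (the 'datafile' name is just a placeholder):

import csv

f = open('datafile', 'rb')
reader = csv.reader(f)
for row in reader:
    print row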
The reader() function takes a number of different options. The field
separator isn’t limited to the comma and can be changed to any character, and so
can the quoting and line-ending characters.
Different dialects of comma-separated files can be defined and registered;
currently there are two dialects, both used by Microsoft Excel. A separate
csv.writer class will generate comma-separated files from a succession
of tuples or lists, quoting strings that contain the delimiter.
The pickle and cPickle modules received some attention during the
2.3 development cycle. In 2.2, new-style classes could be pickled without
difficulty, but they weren’t pickled very compactly; PEP 307 quotes a trivial
example where a new-style class results in a pickled string three times longer
than that for a classic class.
The solution was to invent a new pickle protocol. The pickle.dumps()
function has supported a text-or-binary flag for a long time. In 2.3, this
flag is redefined from a Boolean to an integer: 0 is the old text-mode pickle
format, 1 is the old binary format, and now 2 is a new 2.3-specific format. A
new constant, pickle.HIGHEST_PROTOCOL, can be used to select the
fanciest protocol available.
Unpickling is no longer considered a safe operation. 2.2’s pickle
provided hooks for trying to prevent unsafe classes from being unpickled
(specifically, a __safe_for_unpickling__ attribute), but none of this
code was ever audited and therefore it’s all been ripped out in 2.3. You should
not unpickle untrusted data in any version of Python.
To reduce the pickling overhead for new-style classes, a new interface for
customizing pickling was added using three special methods:
__getstate__(), __setstate__(), and __getnewargs__(). Consult
PEP 307 for the full semantics of these methods.
As a way to compress pickles yet further, it’s now possible to use integer codes
instead of long strings to identify pickled classes. The Python Software
Foundation will maintain a list of standardized codes; there’s also a range of
codes for private use. Currently no codes have been specified.
Ever since Python 1.4, the slicing syntax has supported an optional third “step”
or “stride” argument. For example, these are all legal Python syntax:
L[1:10:2], L[:-1:1], L[::-1]. This was added to Python at the
request of the developers of Numerical Python, which uses the third argument
extensively. However, Python’s built-in list, tuple, and string sequence types
have never supported this feature, raising a TypeError if you tried it.
Michael Hudson contributed a patch to fix this shortcoming.
For example, you can now easily extract the elements of a list that have even
indexes:
>>> L = range(10)
>>> L[::2]
[0, 2, 4, 6, 8]
Negative values also work to make a copy of the same list in reverse order:
>>> L[::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
This also works for tuples, arrays, and strings:
>>> s = 'abcd'
>>> s[::2]
'ac'
>>> s[::-1]
'dcba'
If you have a mutable sequence such as a list or an array you can assign to or
delete an extended slice, but there are some differences between assignment to
extended and regular slices. Assignment to a regular slice can be used to
change the length of the sequence:
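For example, here's a quick sketch where the assignment changes the list's length:

>>> a = range(3)
>>> a
[0, 1, 2]
>>> a[1:3] = [4, 5, 6]
>>> a
[0, 4, 5, 6]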
Extended slices aren’t this flexible. When assigning to an extended slice, the
list on the right hand side of the statement must contain the same number of
items as the slice it is replacing:
>>> a = range(4)
>>> a
[0, 1, 2, 3]
>>> a[::2]
[0, 2]
>>> a[::2] = [0, -1]
>>> a
[0, 1, -1, 3]
>>> a[::2] = [0, 1, 2]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: attempt to assign sequence of size 3 to extended slice of size 2
One can also now pass slice objects to the __getitem__() methods of the
built-in sequences:
>>> range(10).__getitem__(slice(0, 5, 2))
[0, 2, 4]
Or use slice objects directly in subscripts:
>>> range(10)[slice(0, 5, 2)]
[0, 2, 4]
To simplify implementing sequences that support extended slicing, slice objects
now have a method indices(length) which, given the length of a sequence,
returns a (start,stop,step) tuple that can be passed directly to
range(). indices() handles omitted and out-of-bounds indices in a
manner consistent with regular slices (and this innocuous phrase hides a welter
of confusing details!). The method is intended to be used like this:
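Here's a sketch of that pattern; the FakeSeq class and its list-based storage are purely illustrative:

class FakeSeq:
    # A hypothetical sequence that computes its items on demand.
    def __init__(self, items):
        self._items = list(items)
    def __len__(self):
        return len(self._items)
    def calc_item(self, i):
        return self._items[i]
    def __getitem__(self, item):
        if isinstance(item, slice):
            start, stop, step = item.indices(len(self))
            return FakeSeq([self.calc_item(i)
                            for i in range(start, stop, step)])
        else:
            return self.calc_item(item)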
From this example you can also see that the built-in slice object is
now the type object for the slice type, and is no longer a function. This is
consistent with Python 2.2, where int, str, etc., underwent
the same change.
Two new constants, True and False were added along with the
built-in bool type, as described in section PEP 285: A Boolean Type of this
document.
The int() type constructor will now return a long integer instead of
raising an OverflowError when a string or floating-point number is too
large to fit into an integer. This can lead to the paradoxical result that
isinstance(int(expression), int) is false, but that seems unlikely to cause
problems in practice.
Built-in types now support the extended slicing syntax, as described in
section Extended Slices of this document.
A new built-in function, sum(iterable, start=0), adds up the numeric
items in the iterable object and returns their sum. sum() only accepts
numbers, meaning that you can’t use it to concatenate a bunch of strings.
(Contributed by Alex Martelli.)
list.insert(pos,value) used to insert value at the front of the list
when pos was negative. The behaviour has now been changed to be consistent
with slice indexing, so when pos is -1 the value will be inserted before the
last element, and so forth.
list.index(value), which searches for value within the list and returns
its index, now takes optional start and stop arguments to limit the search
to only part of the list.
Dictionaries have a new method, pop(key[, default]), that returns
the value corresponding to key and removes that key/value pair from the
dictionary. If the requested key isn’t present in the dictionary, default is
returned if it’s specified and KeyError raised if it isn’t.
>>> d = {1: 2}
>>> d
{1: 2}
>>> d.pop(4)
Traceback (most recent call last):
  File "stdin", line 1, in ?
KeyError: 4
>>> d.pop(1)
2
>>> d.pop(1)
Traceback (most recent call last):
  File "stdin", line 1, in ?
KeyError: 'pop(): dictionary is empty'
>>> d
{}
>>>
There’s also a new class method, dict.fromkeys(iterable, value), that
creates a dictionary with keys taken from the supplied iterator iterable and
all values set to value, defaulting to None.
(Patches contributed by Raymond Hettinger.)
Also, the dict() constructor now accepts keyword arguments to simplify
creating small dictionaries:
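For example (the key order in the resulting repr is arbitrary):

>>> dict(red=1, blue=2, green=3, black=4)
{'blue': 2, 'black': 4, 'green': 3, 'red': 1}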
The assert statement no longer checks the __debug__ flag, so
you can no longer disable assertions by assigning to __debug__. Running
Python with the -O switch will still generate code that doesn’t
execute any assertions.
Most type objects are now callable, so you can use them to create new objects
such as functions, classes, and modules. (This means that the new module
can be deprecated in a future Python version, because you can now use the type
objects available in the types module.) For example, you can create a new
module object with the following code:
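For instance, a short interactive sketch (the exact repr can vary):

>>> import types
>>> m = types.ModuleType('abc', 'docstring')
>>> m
<module 'abc' (built-in)>
>>> m.__doc__
'docstring'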
A new warning, PendingDeprecationWarning was added to indicate features
which are in the process of being deprecated. The warning will not be printed
by default. To check for use of features that will be deprecated in the future,
supply -Walways::PendingDeprecationWarning:: on the command line or
use warnings.filterwarnings().
The process of deprecating string-based exceptions, as in
raise "Error occurred", has begun. Raising a string will now trigger
PendingDeprecationWarning.
Using None as a variable name will now result in a SyntaxWarning
warning. In a future version of Python, None may finally become a keyword.
The xreadlines() method of file objects, introduced in Python 2.1, is no
longer necessary because files now behave as their own iterator.
xreadlines() was originally introduced as a faster way to loop over all
the lines in a file, but now you can simply write for line in file_obj.
File objects also have a new read-only encoding attribute that gives the
encoding used by the file; Unicode strings written to the file will be
automatically converted to bytes using the given encoding.
The method resolution order used by new-style classes has changed, though
you’ll only notice the difference if you have a really complicated inheritance
hierarchy. Classic classes are unaffected by this change. Python 2.2
originally used a topological sort of a class’s ancestors, but 2.3 now uses the
C3 algorithm as described in the paper “A Monotonic Superclass Linearization
for Dylan”. To
understand the motivation for this change, read Michele Simionato’s article
“Python 2.3 Method Resolution Order”, or
read the thread on python-dev starting with the message at
http://mail.python.org/pipermail/python-dev/2002-October/029035.html. Samuele
Pedroni first pointed out the problem and also implemented the fix by coding the
C3 algorithm.
Python runs multithreaded programs by switching between threads after
executing N bytecodes. The default value for N has been increased from 10 to
100 bytecodes, speeding up single-threaded applications by reducing the
switching overhead. Some multithreaded applications may suffer slower response
time, but that’s easily fixed by setting the limit back to a lower number using
sys.setcheckinterval(N). The limit can be retrieved with the new
sys.getcheckinterval() function.
One minor but far-reaching change is that the names of extension types defined
by the modules included with Python now contain the module and a '.' in
front of the type name. For example, in Python 2.2, if you created a socket and
printed its __class__, you’d get this output:
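A sketch of the 2.2 behaviour:

>>> import socket
>>> s = socket.socket()
>>> s.__class__
<type 'socket'>

In 2.3 the same code instead prints <type '_socket.socket'>, with the module prefix included.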
One of the noted incompatibilities between old- and new-style classes has been
removed: you can now assign to the __name__ and __bases__
attributes of new-style classes. There are some restrictions on what can be
assigned to __bases__ along the lines of those relating to assigning to
an instance’s __class__ attribute.
The in operator now works differently for strings. Previously, when
evaluating X in Y where X and Y are strings, X could only be a single
character. That’s now changed; X can be a string of any length, and X in Y
will return True if X is a substring of Y. If X is the empty
string, the result is always True.
Note that this doesn’t tell you where the substring starts; if you need that
information, use the find() string method.
The strip(), lstrip(), and rstrip() string methods now have
an optional argument for specifying the characters to strip. The default is
still to remove all whitespace characters:
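For example, a quick sketch:

>>> '   abc '.strip()
'abc'
>>> '><><abc<><><>'.strip('<>')
'abc'
>>> '><><abc<><><>\n'.strip('<>')
'abc<><><>\n'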
(Suggested by Simon Brunning and implemented by Walter Dörwald.)
The startswith() and endswith() string methods now accept negative
numbers for the start and end parameters.
Another new string method is zfill(), originally a function in the
string module. zfill() pads a numeric string with zeros on the
left until it’s the specified width. Note that the % operator is still more
flexible and powerful than zfill().
A new type object, basestring, has been added. Both 8-bit strings and
Unicode strings inherit from this type, so isinstance(obj,basestring) will
return True for either kind of string. It’s a completely abstract
type, so you can’t create basestring instances.
Interned strings are no longer immortal and will now be garbage-collected in
the usual way when the only reference to them is from the internal dictionary of
interned strings. (Implemented by Oren Tirosh.)
The creation of new-style class instances has been made much faster; they’re
now faster than classic classes!
The sort() method of list objects has been extensively rewritten by Tim
Peters, and the implementation is significantly faster.
Multiplication of large long integers is now much faster thanks to an
implementation of Karatsuba multiplication, an algorithm that scales better than
the O(n*n) required for the grade-school multiplication algorithm. (Original
patch by Christopher A. Craig, and significantly reworked by Tim Peters.)
The SET_LINENO opcode is now gone. This may provide a small speed
increase, depending on your compiler’s idiosyncrasies. See section
Other Changes and Fixes for a longer explanation. (Removed by Michael Hudson.)
xrange() objects now have their own iterator, making for i in xrange(n) slightly faster than for i in range(n). (Patch by Raymond
Hettinger.)
A number of small rearrangements have been made in various hotspots to improve
performance, such as inlining a function or removing some code. (Implemented
mostly by GvR, but lots of people have contributed single changes.)
The net result of the 2.3 optimizations is that Python 2.3 runs the pystone
benchmark around 25% faster than Python 2.2.
As usual, Python’s standard library received a number of enhancements and bug
fixes. Here’s a partial list of the most notable changes, sorted alphabetically
by module name. Consult the Misc/NEWS file in the source tree for a more
complete list of changes, or look through the CVS logs for all the details.
The array module now supports arrays of Unicode characters using the
'u' format character. Arrays also now support using the += assignment
operator to add another array’s contents, and the *= assignment operator to
repeat an array. (Contributed by Jason Orendorff.)
The bsddb module has been replaced by version 4.1.6 of the PyBSDDB package, providing a more complete interface
to the transactional features of the BerkeleyDB library.
The old version of the module has been renamed to bsddb185 and is no
longer built automatically; you’ll have to edit Modules/Setup to enable
it. Note that the new bsddb package is intended to be compatible with
the old module, so be sure to file bugs if you discover any incompatibilities.
When upgrading to Python 2.3, if the new interpreter is compiled with a new
version of the underlying BerkeleyDB library, you will almost certainly have to
convert your database files to the new version. You can do this fairly easily
with the new scripts db2pickle.py and pickle2db.py which you
will find in the distribution’s Tools/scripts directory. If you’ve
already been using the PyBSDDB package and importing it as bsddb3, you
will have to change your import statements to import it as bsddb.
The new bz2 module is an interface to the bz2 data compression library.
bz2-compressed data is usually smaller than corresponding zlib-compressed data. (Contributed by Gustavo Niemeyer.)
A set of standard date/time types has been added in the new datetime
module. See the following section for more details.
The Distutils Extension class now supports an extra constructor
argument named depends for listing additional source files that an extension
depends on. This lets Distutils recompile the module if any of the dependency
files are modified. For example, if sampmodule.c includes the header
file sample.h, you would create the Extension object like
this:
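A sketch of the constructor call for that case:

from distutils.core import Extension

ext = Extension('samp',
                sources=['sampmodule.c'],
                depends=['sample.h'])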
Modifying sample.h would then cause the module to be recompiled.
(Contributed by Jeremy Hylton.)
Other minor changes to Distutils: it now checks for the CC,
CFLAGS, CPP, LDFLAGS, and CPPFLAGS
environment variables, using them to override the settings in Python’s
configuration (contributed by Robert Weber).
Previously the doctest module would only search the docstrings of
public methods and functions for test cases, but it now also examines private
ones. The DocTestSuite() function creates a
unittest.TestSuite object from a set of doctest tests.
The new gc.get_referents(object) function returns a list of all the
objects referenced by object.
The getopt module gained a new function, gnu_getopt(), that
supports the same arguments as the existing getopt() function but uses
GNU-style scanning mode. The existing getopt() stops processing options as
soon as a non-option argument is encountered, but in GNU-style mode processing
continues, meaning that options and arguments can be mixed. For example:
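A brief sketch of the contrast between the two scanning modes:

>>> import getopt
>>> getopt.getopt(['-f', 'filename', 'output', '-v'], 'f:v')
([('-f', 'filename')], ['output', '-v'])
>>> getopt.gnu_getopt(['-f', 'filename', 'output', '-v'], 'f:v')
([('-f', 'filename'), ('-v', '')], ['output'])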
The gzip module can now handle files exceeding 2 GiB.
The new heapq module contains an implementation of a heap queue
algorithm. A heap is an array-like data structure that keeps items in a
partially sorted order such that, for every index k, heap[k] <= heap[2*k+1] and heap[k] <= heap[2*k+2]. This makes it quick to remove the
smallest item, and inserting a new item while maintaining the heap property is
O(lg n). (See http://www.nist.gov/dads/HTML/priorityque.html for more
information about the priority queue data structure.)
The heapq module provides heappush() and heappop() functions
for adding and removing items while maintaining the heap property on top of some
other mutable Python sequence type. Here’s an example that uses a Python list:
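For instance, a short interactive sketch:

>>> import heapq
>>> heap = []
>>> for item in [3, 7, 5, 11, 1]:
...     heapq.heappush(heap, item)
...
>>> heap
[1, 3, 5, 11, 7]
>>> heapq.heappop(heap)
1
>>> heapq.heappop(heap)
3
>>> heap
[5, 7, 11]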
The IDLE integrated development environment has been updated using the code
from the IDLEfork project (http://idlefork.sf.net). The most notable feature is
that the code being developed is now executed in a subprocess, meaning that
there’s no longer any need for manual reload() operations. IDLE’s core code
has been incorporated into the standard library as the idlelib package.
The imaplib module now supports IMAP over SSL. (Contributed by Piers
Lauder and Tino Lange.)
The itertools module contains a number of useful functions for use with
iterators, inspired by various functions provided by the ML and Haskell
languages. For example, itertools.ifilter(predicate, iterator) returns all
elements in the iterator for which the function predicate() returns
True, and itertools.repeat(obj, N) returns obj N times.
There are a number of other functions in the module; see the package’s reference
documentation for details.
(Contributed by Raymond Hettinger.)
Two new functions in the math module, degrees(rads) and
radians(degs), convert between radians and degrees. Other functions in
the math module such as math.sin() and math.cos() have always
required input values measured in radians. Also, an optional base argument
was added to math.log() to make it easier to compute logarithms for bases
other than e and 10. (Contributed by Raymond Hettinger.)
Several new POSIX functions (getpgid(), killpg(), lchown(),
loadavg(), major(), makedev(), minor(), and
mknod()) were added to the posix module that underlies the
os module. (Contributed by Gustavo Niemeyer, Geert Jansen, and Denis S.
Otkidach.)
In the os module, the *stat() family of functions can now report
fractions of a second in a timestamp. Such time stamps are represented as
floats, similar to the value returned by time.time().
During testing, it was found that some applications will break if time stamps
are floats. For compatibility, when using the tuple interface of the
stat_result, time stamps will be represented as integers. When using
named fields (a feature first introduced in Python 2.2), time stamps are still
represented as integers, unless os.stat_float_times() is invoked to enable
float return values:
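A sketch of the switch (the timestamp values are illustrative):

>>> os.stat('/tmp').st_mtime
1034791200
>>> os.stat_float_times(True)
>>> os.stat('/tmp').st_mtime
1034791200.6335014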
In Python 2.4, the default will change to always returning floats.
Application developers should enable this feature only if all their libraries
work properly when confronted with floating point time stamps, or if they use
the tuple API. If used, the feature should be activated on an application level
instead of trying to enable it on a per-use basis.
The optparse module contains a new parser for command-line arguments
that can convert option values to a particular Python type and will
automatically generate a usage message. See the following section for more
details.
The old and never-documented linuxaudiodev module has been deprecated,
and a new version named ossaudiodev has been added. The module was
renamed because the OSS sound drivers can be used on platforms other than Linux,
and the interface has also been tidied and brought up to date in various ways.
(Contributed by Greg Ward and Nicholas FitzRoy-Dale.)
The new platform module contains a number of functions that try to
determine various properties of the platform you’re running on. There are
functions for getting the architecture, CPU type, the Windows OS version, and
even the Linux distribution version. (Contributed by Marc-André Lemburg.)
The parser objects provided by the pyexpat module can now optionally
buffer character data, resulting in fewer calls to your character data handler
and therefore faster performance. Setting the parser object’s
buffer_text attribute to True will enable buffering.
The sample(population, k) function was added to the random
module. population is a sequence or xrange object containing the
elements of a population, and sample() chooses k elements from the
population without replacing chosen elements. k can be any value up to
len(population). For example:
>>> days = ['Mo', 'Tu', 'We', 'Th', 'Fr', 'St', 'Sn']
>>> random.sample(days, 3)      # Choose 3 elements
['St', 'Sn', 'Th']
>>> random.sample(days, 7)      # Choose 7 elements
['Tu', 'Th', 'Mo', 'We', 'St', 'Fr', 'Sn']
>>> random.sample(days, 7)      # Choose 7 again
['We', 'Mo', 'Sn', 'Fr', 'Tu', 'St', 'Th']
>>> random.sample(days, 8)      # Can't choose eight
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "random.py", line 414, in sample
    raise ValueError, "sample larger than population"
ValueError: sample larger than population
>>> random.sample(xrange(1, 10000, 2), 10)  # Choose ten odd nos. under 10000
[3407, 3805, 1505, 7023, 2401, 2267, 9733, 3151, 8083, 9195]
The random module now uses a new algorithm, the Mersenne Twister,
implemented in C. It’s faster and more extensively studied than the previous
algorithm.
(All changes contributed by Raymond Hettinger.)
The readline module also gained a number of new functions:
get_history_item(), get_current_history_length(), and
redisplay().
The rexec and Bastion modules have been declared dead, and
attempts to import them will fail with a RuntimeError. New-style classes
provide new ways to break out of the restricted execution environment provided
by rexec, and no one has interest in fixing them or time to do so. If
you have applications using rexec, rewrite them to use something else.
(Sticking with Python 2.2 or 2.1 will not make your applications any safer
because there are known bugs in the rexec module in those versions. To
repeat: if you’re using rexec, stop using it immediately.)
The rotor module has been deprecated because the algorithm it uses for
encryption is not believed to be secure. If you need encryption, use one of the
several AES Python modules that are available separately.
The shutil module gained a move(src, dest) function that
recursively moves a file or directory to a new location.
Support for more advanced POSIX signal handling was added to the signal
module but then removed again as it proved impossible to make it work reliably across
platforms.
The socket module now supports timeouts. You can call the
settimeout(t) method on a socket object to set a timeout of t seconds.
Subsequent socket operations that take longer than t seconds to complete will
abort and raise a socket.timeout exception.
The original timeout implementation was by Tim O’Malley. Michael Gilfix
integrated it into the Python socket module and shepherded it through a
lengthy review. After the code was checked in, Guido van Rossum rewrote parts
of it. (This is a good example of a collaborative development process in
action.)
On Windows, the socket module now ships with Secure Sockets Layer
(SSL) support.
The value of the C PYTHON_API_VERSION macro is now exposed at the
Python level as sys.api_version. The current exception can be cleared by
calling the new sys.exc_clear() function.
The new tarfile module allows reading from and writing to
tar-format archive files. (Contributed by Lars Gustäbel.)
The new textwrap module contains functions for wrapping strings
containing paragraphs of text. The wrap(text, width) function takes a
string and returns a list containing the text split into lines of no more than
the chosen width. The fill(text, width) function returns a single
string, reformatted to fit into lines no longer than the chosen width. (As you
can guess, fill() is built on top of wrap().) For example:
>>> import textwrap
>>> paragraph = "Not a whit, we defy augury: ... more text ..."
>>> textwrap.wrap(paragraph, 60)
["Not a whit, we defy augury: there's a special providence in",
 "the fall of a sparrow. If it be now, 'tis not to come; if it",
 ...]
>>> print textwrap.fill(paragraph, 35)
Not a whit, we defy augury: there's
a special providence in the fall of
a sparrow. If it be now, 'tis not
to come; if it be not to come, it
will be now; if it be not now, yet
it will come: the readiness is all.
>>>
The module also contains a TextWrapper class that actually implements
the text wrapping strategy. Both the TextWrapper class and the
wrap() and fill() functions support a number of additional keyword
arguments for fine-tuning the formatting; consult the module’s documentation
for details. (Contributed by Greg Ward.)
The thread and threading modules now have companion modules,
dummy_thread and dummy_threading, that provide a do-nothing
implementation of the thread module’s interface for platforms where
threads are not supported. The intention is to simplify thread-aware modules
(ones that don’t rely on threads to run) by putting the following code at the
top:
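The idiom is a try/except around the import:

try:
    import threading as _threading
except ImportError:
    import dummy_threading as _threading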
In this example, _threading is used as the module name to make it clear
that the module being used is not necessarily the actual threading
module. Code can call functions and use classes in _threading whether or
not threads are supported, avoiding an if statement and making the
code slightly clearer. This module will not magically make multithreaded code
run without threads; code that waits for another thread to return or to do
something will simply hang forever.
The time module’s strptime() function has long been an annoyance
because it uses the platform C library’s strptime() implementation, and
different platforms sometimes have odd bugs. Brett Cannon contributed a
portable implementation that’s written in pure Python and should behave
identically on all platforms.
The new timeit module helps measure how long snippets of Python code
take to execute. The timeit.py file can be run directly from the
command line, or the module’s Timer class can be imported and used
directly. Here’s a short example that figures out whether it’s faster to
convert an 8-bit string to Unicode by appending an empty Unicode string to it or
by using the unicode() function:
import timeit

timer1 = timeit.Timer('unicode("abc")')
timer2 = timeit.Timer('"abc" + u""')

# Run three trials
print timer1.repeat(repeat=3, number=100000)
print timer2.repeat(repeat=3, number=100000)

# On my laptop this outputs:
# [0.36831796169281006, 0.37441694736480713, 0.35304892063140869]
# [0.17574405670166016, 0.18193507194519043, 0.17565798759460449]
The Tix module has received various bug fixes and updates for the
current version of the Tix package.
The Tkinter module now works with a thread-enabled version of Tcl.
Tcl’s threading model requires that widgets only be accessed from the thread in
which they’re created; accesses from another thread can cause Tcl to panic. For
certain Tcl interfaces, Tkinter will now automatically avoid this when a
widget is accessed from a different thread by marshalling a command, passing it
to the correct thread, and waiting for the results. Other interfaces can’t be
handled automatically but Tkinter will now raise an exception on such an
access so that you can at least find out about the problem. See
http://mail.python.org/pipermail/python-dev/2002-December/031107.html for a more
detailed explanation of this change. (Implemented by Martin von Löwis.)
Calling Tcl methods through _tkinter no longer returns only strings.
Instead, if Tcl returns other objects those objects are converted to their
Python equivalent, if one exists, or wrapped with a _tkinter.Tcl_Obj
object if no Python equivalent exists. This behavior can be controlled through
the wantobjects() method of tkapp objects.
When using _tkinter through the Tkinter module (as most Tkinter
applications will), this feature is always activated. It should not cause
compatibility problems, since Tkinter would always convert string results to
Python types where possible.
If any incompatibilities are found, the old behavior can be restored by setting
the wantobjects variable in the Tkinter module to false before
creating the first tkapp object.
import Tkinter
Tkinter.wantobjects = 0
Any breakage caused by this change should be reported as a bug.
The UserDict module has a new DictMixin class which defines
all dictionary methods for classes that already have a minimum mapping
interface. This greatly simplifies writing classes that need to be
substitutable for dictionaries, such as the classes in the shelve
module.
Adding the mix-in as a superclass provides the full dictionary interface
whenever the class defines __getitem__(), __setitem__(),
__delitem__(), and keys(). For example:
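Here's a sketch of a minimal mapping class; the SeqDict name and its list-based storage are purely illustrative:

from UserDict import DictMixin

class SeqDict(DictMixin):
    # Dictionary lookalike implemented with parallel lists.
    def __init__(self):
        self.keylist = []
        self.valuelist = []
    def __getitem__(self, key):
        try:
            i = self.keylist.index(key)
        except ValueError:
            raise KeyError(key)
        return self.valuelist[i]
    def __setitem__(self, key, value):
        try:
            i = self.keylist.index(key)
            self.valuelist[i] = value
        except ValueError:
            self.keylist.append(key)
            self.valuelist.append(value)
    def __delitem__(self, key):
        try:
            i = self.keylist.index(key)
        except ValueError:
            raise KeyError(key)
        self.keylist.pop(i)
        self.valuelist.pop(i)
    def keys(self):
        return list(self.keylist)

With just these four methods defined, DictMixin supplies items(), values(), get(), setdefault(), and the rest of the mapping interface.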
The DOM implementation in xml.dom.minidom can now generate XML output
in a particular encoding by providing an optional encoding argument to the
toxml() and toprettyxml() methods of DOM nodes.
The xmlrpclib module now supports an XML-RPC extension for handling nil
data values such as Python’s None. Nil values are always supported on
unmarshalling an XML-RPC response. To generate requests containing None,
you must supply a true value for the allow_none parameter when creating a
Marshaller instance.
The new DocXMLRPCServer module allows writing self-documenting XML-RPC
servers. Run it in demo mode (as a program) to see it in action. Pointing the
Web browser to the RPC server produces pydoc-style documentation; pointing
xmlrpclib to the server allows invoking the actual methods. (Contributed by
Brian Quinlan.)
Support for internationalized domain names (RFCs 3454, 3490, 3491, and 3492)
has been added. The “idna” encoding can be used to convert between a Unicode
domain name and the ASCII-compatible encoding (ACE) of that name.
The socket module has also been extended to transparently convert
Unicode hostnames to the ACE version before passing them to the C library.
Modules that deal with hostnames (such as httplib and ftplib)
also support Unicode host names; httplib also sends HTTP Host
headers using the ACE version of the domain name. urllib supports
Unicode URLs with non-ASCII host names as long as the path part of the URL
is ASCII only.
To implement this change, the stringprep module, the mkstringprep
tool and the punycode encoding have been added.
Date and time types suitable for expressing timestamps were added as the
datetime module. The types don’t support different calendars or many
fancy features, and just stick to the basics of representing time.
The three primary types are: date, representing a day, month, and year;
time, consisting of hour, minute, and second; and datetime,
which contains all the attributes of both date and time.
There’s also a timedelta class representing differences between two
points in time, and time zone logic is implemented by classes inheriting from
the abstract tzinfo class.
You can create instances of date and time by either supplying
keyword arguments to the appropriate constructor, e.g.
datetime.date(year=1972,month=10,day=15), or by using one of a number of
class methods. For example, the date.today() class method returns the
current local date.
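For example (the today() result is obviously date-dependent):

>>> import datetime
>>> datetime.date(year=1972, month=10, day=15)
datetime.date(1972, 10, 15)
>>> datetime.date.today()
datetime.date(2002, 12, 30)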
Once created, instances of the date/time classes are all immutable. There are a
number of methods for producing formatted strings from objects:
>>> import datetime
>>> now = datetime.datetime.now()
>>> now.isoformat()
'2002-12-30T21:27:03.994956'
>>> now.ctime()  # Only available on date, datetime
'Mon Dec 30 21:27:03 2002'
>>> now.strftime('%Y %d %b')
'2002 30 Dec'
The replace() method allows modifying one or more fields of a
date or datetime instance, returning a new instance:
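For instance (a sketch; the timestamp values shown are illustrative):

>>> import datetime
>>> d = datetime.datetime.now()
>>> d
datetime.datetime(2002, 12, 30, 22, 15, 38, 827738)
>>> d.replace(year=2001, hour=12)
datetime.datetime(2001, 12, 30, 12, 15, 38, 827738)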
Instances can be compared, hashed, and converted to strings (the result is the
same as that of isoformat()). date and datetime
instances can be subtracted from each other, and added to timedelta
instances. The largest missing feature is that there’s no standard library
support for parsing strings and getting back a date or
datetime.
For more information, refer to the module’s reference documentation.
(Contributed by Tim Peters.)
The getopt module provides simple parsing of command-line arguments. The
new optparse module (originally named Optik) provides more elaborate
command-line parsing that follows the Unix conventions, automatically creates
the output for --help, and can perform different actions for different
options.
You start by creating an instance of OptionParser and telling it what
your program’s options are.
import sys
from optparse import OptionParser

op = OptionParser()
op.add_option('-i', '--input',
              action='store', type='string', dest='input',
              help='set input filename')
op.add_option('-l', '--length',
              action='store', type='int', dest='length',
              help='set maximum length of output')
Parsing a command line is then done by calling the parse_args() method.
This returns an object containing all of the option values, and a list of
strings containing the remaining arguments.
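A minimal sketch of the call (continuing the example above):

(options, args) = op.parse_args(sys.argv[1:])
print options
print args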
Invoking the script with the various arguments now works as you’d expect it to.
Note that the length argument is automatically converted to an integer.
$ ./python opt.py -i data arg1
<Values at 0x400cad4c: {'input': 'data', 'length': None}>
['arg1']
$ ./python opt.py --input=data --length=4
<Values at 0x400cad2c: {'input': 'data', 'length': 4}>
[]
$
The help message is automatically generated for you:
$ ./python opt.py --help
usage: opt.py [options]
options:
-h, --help show this help message and exit
-iINPUT, --input=INPUT
set input filename
-lLENGTH, --length=LENGTH
set maximum length of output
$
See the module’s documentation for more details.
Optik was written by Greg Ward, with suggestions from the readers of the Getopt
SIG.
Pymalloc, a specialized object allocator written by Vladimir Marangozov, was a
feature added to Python 2.1. Pymalloc is intended to be faster than the system
malloc() and to have less memory overhead for allocation patterns typical
of Python programs. The allocator uses C’s malloc() function to get large
pools of memory and then fulfills smaller memory requests from these pools.
In 2.1 and 2.2, pymalloc was an experimental feature and wasn’t enabled by
default; you had to explicitly enable it when compiling Python by providing the
--with-pymalloc option to the configure script. In 2.3,
pymalloc has had further enhancements and is now enabled by default; you’ll have
to supply --without-pymalloc to disable it.
This change is transparent to code written in Python; however, pymalloc may
expose bugs in C extensions. Authors of C extension modules should test their
code with pymalloc enabled, because some incorrect code may cause core dumps at
runtime.
There’s one particularly common error that causes problems. There are a number
of memory allocation functions in Python’s C API that have previously just been
aliases for the C library’s malloc() and free(), meaning that if
you accidentally called mismatched functions the error wouldn’t be noticeable.
When the object allocator is enabled, these functions aren’t aliases of
malloc() and free() any more, and calling the wrong function to
free memory may get you a core dump. For example, if memory was allocated using
PyObject_Malloc(), it has to be freed using PyObject_Free(), not
free(). A few modules included with Python fell afoul of this and had to
be fixed; doubtless there are more third-party modules that will have the same
problem.
As part of this change, the confusing multiple interfaces for allocating memory
have been consolidated down into two API families. Memory allocated with one
family must not be manipulated with functions from the other family. There is
one family for allocating chunks of memory and another family of functions
specifically for allocating Python objects.
The “object memory” family is the interface to the pymalloc facility described
above and is biased towards a large number of “small” allocations:
PyObject_Malloc(), PyObject_Realloc(), and PyObject_Free().
Thanks to lots of work by Tim Peters, pymalloc in 2.3 also provides debugging
features to catch memory overwrites and doubled frees in both extension modules
and in the interpreter itself. To enable this support, compile a debugging
version of the Python interpreter by running configure with
--with-pydebug.
To aid extension writers, a header file Misc/pymemcompat.h is
distributed with the source to Python 2.3 that allows Python extensions to use
the 2.3 interfaces to memory allocation while compiling against any version of
Python since 1.5.2. You would copy the file from Python’s source distribution
and bundle it with the source of your extension.
For the full details of the pymalloc implementation, see the comments at
the top of the file Objects/obmalloc.c in the Python source code.
The above link points to the file within the python.org SVN browser.
Changes to Python’s build process and to the C API include:
The cycle detection implementation used by the garbage collection has proven
to be stable, so it’s now been made mandatory. You can no longer compile Python
without it, and the --with-cycle-gc switch to configure has
been removed.
Python can now optionally be built as a shared library
(libpython2.3.so) by supplying --enable-shared when running
Python’s configure script. (Contributed by Ondrej Palkovsky.)
The DL_EXPORT and DL_IMPORT macros are now deprecated.
Initialization functions for Python extension modules should now be declared
using the new macro PyMODINIT_FUNC, while the Python core will
generally use the PyAPI_FUNC and PyAPI_DATA macros.
The interpreter can be compiled without any docstrings for the built-in
functions and modules by supplying --without-doc-strings to the
configure script. This makes the Python executable about 10% smaller,
but will also mean that you can’t get help for Python’s built-ins. (Contributed
by Gustavo Niemeyer.)
The PyArg_NoArgs() macro is now deprecated, and code that uses it
should be changed. For Python 2.2 and later, the method definition table can
specify the METH_NOARGS flag, signalling that there are no arguments,
and the argument checking can then be removed. If compatibility with pre-2.2
versions of Python is important, the code could use PyArg_ParseTuple(args, "") instead, but this will be slower than using METH_NOARGS.
PyArg_ParseTuple() accepts new format characters for various sizes of
unsigned integers: B for unsigned char, H for unsigned short int, I for
unsigned int, and K for unsigned long long.
A new function, PyObject_DelItemString(mapping, char *key) was added
as shorthand for PyObject_DelItem(mapping, PyString_New(key)).
File objects now manage their internal string buffer differently, increasing
it exponentially when needed. This results in the benchmark tests in
Lib/test/test_bufio.py speeding up considerably (from 57 seconds to 1.7
seconds, according to one measurement).
It’s now possible to define class and static methods for a C extension type by
setting either the METH_CLASS or METH_STATIC flags in a
method’s PyMethodDef structure.
Python now includes a copy of the Expat XML parser’s source code, removing any
dependence on a system version or local installation of Expat.
If you dynamically allocate type objects in your extension, you should be
aware of a change in the rules relating to the __module__ and
__name__ attributes. In summary, you will want to ensure the type’s
dictionary contains a '__module__' key; making the module name the part of
the type name leading up to the final period will no longer have the desired
effect. For more detail, read the API reference documentation or the source.
Support for a port to IBM’s OS/2 using the EMX runtime environment was merged
into the main Python source tree. EMX is a POSIX emulation layer over the OS/2
system APIs. The Python port for EMX tries to support all the POSIX-like
capability exposed by the EMX runtime, and mostly succeeds; fork() and
fcntl() are restricted by the limitations of the underlying emulation
layer. The standard OS/2 port, which uses IBM’s Visual Age compiler, also
gained support for case-sensitive import semantics as part of the integration of
the EMX port into CVS. (Contributed by Andrew MacIntyre.)
On MacOS, most toolbox modules have been weaklinked to improve backward
compatibility. This means that modules will no longer fail to load if a single
routine is missing on the current OS version. Instead calling the missing
routine will raise an exception. (Contributed by Jack Jansen.)
The RPM spec files, found in the Misc/RPM/ directory in the Python
source distribution, were updated for 2.3. (Contributed by Sean Reifschneider.)
Other new platforms now supported by Python include AtheOS
(http://www.atheos.cx/), GNU/Hurd, and OpenVMS.
As usual, there were a bunch of other improvements and bugfixes scattered
throughout the source tree. A search through the CVS change logs finds there
were 523 patches applied and 514 bugs fixed between Python 2.2 and 2.3. Both
figures are likely to be underestimates.
Some of the more notable changes are:
If the PYTHONINSPECT environment variable is set, the Python
interpreter will enter the interactive prompt after running a Python program, as
if Python had been invoked with the -i option. The environment
variable can be set before running the Python interpreter, or it can be set by
the Python program as part of its execution.
The regrtest.py script now provides a way to allow “all resources
except foo.” A resource name passed to the -u option can now be
prefixed with a hyphen ('-') to mean “remove this resource.” For example,
the option -uall,-bsddb could be used to enable the use of all resources
except bsddb.
The tools used to build the documentation now work under Cygwin as well as
Unix.
The SET_LINENO opcode has been removed. Back in the mists of time, this
opcode was needed to produce line numbers in tracebacks and support trace
functions (for, e.g., pdb). Since Python 1.5, the line numbers in
tracebacks have been computed using a different mechanism that works with
“python -O”. For Python 2.3 Michael Hudson implemented a similar scheme to
determine when to call the trace function, removing the need for SET_LINENO
entirely.
It would be difficult to detect any resulting difference from Python code, apart
from a slight speed up when Python is run without -O.
C extensions that access the f_lineno field of frame objects should
instead call PyCode_Addr2Line(f->f_code, f->f_lasti). This will have the
added effect of making the code work as desired under “python -O” in earlier
versions of Python.
A nifty new feature is that trace functions can now assign to the
f_lineno attribute of frame objects, changing the line that will be
executed next. A jump command has been added to the pdb debugger
taking advantage of this new feature. (Implemented by Richie Hindle.)
This section lists previously described changes that may require changes to your
code:
yield is now always a keyword; if it’s used as a variable name in
your code, a different name must be chosen.
For strings X and Y, X in Y now works if X is more than one
character long.
The int() type constructor will now return a long integer instead of
raising an OverflowError when a string or floating-point number is too
large to fit into an integer.
If you have Unicode strings that contain 8-bit characters, you must declare
the file’s encoding (UTF-8, Latin-1, or whatever) by adding a comment to the top
of the file. See section PEP 263: Source Code Encodings for more information.
Calling Tcl methods through _tkinter no longer returns only strings.
Instead, if Tcl returns other objects those objects are converted to their
Python equivalent, if one exists, or wrapped with a _tkinter.Tcl_Obj
object if no Python equivalent exists.
Large octal and hex literals such as 0xffffffff now trigger a
FutureWarning. Currently they’re stored as 32-bit numbers and result in a
negative value, but in Python 2.4 they’ll become positive long integers.
There are a few ways to fix this warning. If you really need a positive number,
just add an L to the end of the literal. If you’re trying to get a 32-bit
integer with low bits set and have previously used an expression such as
~(1 << 31), it’s probably clearest to start with all bits set and clear the
desired upper bits. For example, to clear just the top bit (bit 31), you could
write 0xffffffffL & ~(1L << 31).
You can no longer disable assertions by assigning to __debug__.
The Distutils setup() function has gained various new keyword arguments
such as depends. Old versions of the Distutils will abort if passed unknown
keywords. A solution is to check for the presence of the new
get_distutil_options() function in your setup.py and only use the
new keywords with a version of the Distutils that supports them:
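A sketch of the idea (assuming, as the text above suggests, that
get_distutil_options() lives in distutils.core; the package metadata and
the foo.h dependency are placeholders):

from distutils import core

kw = {'name': 'mypackage', 'version': '1.0'}  # hypothetical arguments
if hasattr(core, 'get_distutil_options'):
    # Only a Distutils that knows the new keywords provides this hook.
    kw['depends'] = ['foo.h']
core.setup(**kw)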
The author would like to thank the following people for offering suggestions,
corrections and assistance with various drafts of this article: Jeff Bauer,
Simon Brunning, Brett Cannon, Michael Chermside, Andrew Dalke, Scott David
Daniels, Fred L. Drake, Jr., David Fraser, Kelly Gerber, Raymond Hettinger,
Michael Hudson, Chris Lambert, Detlef Lannert, Martin von Löwis, Andrew
MacIntyre, Lalo Martins, Chad Netzer, Gustavo Niemeyer, Neal Norwitz, Hans
Nowak, Chris Reedy, Francesco Ricciardi, Vinay Sajip, Neil Schemenauer, Roman
Suzi, Jason Tishler, Just van Rossum.
This article explains the new features in Python 2.2.2, released on October 14,
2002. Python 2.2.2 is a bugfix release of Python 2.2, originally released on
December 21, 2001.
Python 2.2 can be thought of as the “cleanup release”. There are some features
such as generators and iterators that are completely new, but most of the
changes, significant and far-reaching though they may be, are aimed at cleaning
up irregularities and dark corners of the language design.
This article doesn’t attempt to provide a complete specification of the new
features, but instead provides a convenient overview. For full details, you
should refer to the documentation for Python 2.2, such as the Python Library
Reference and the Python
Reference Manual. If you want to
understand the complete implementation and design rationale for a change, refer
to the PEP for a particular new feature.
The largest and most far-reaching changes in Python 2.2 are to Python’s model of
objects and classes. The changes should be backward compatible, so it’s likely
that your code will continue to run unchanged, but the changes provide some
amazing new capabilities. Before beginning this, the longest and most
complicated section of this article, I’ll provide an overview of the changes and
offer some comments.
A long time ago I wrote a Web page listing flaws in Python’s design. One of the
most significant flaws was that it’s impossible to subclass Python types
implemented in C. In particular, it’s not possible to subclass built-in types,
so you can’t just subclass, say, lists in order to add a single useful method to
them. The UserList module provides a class that supports all of the
methods of lists and that can be subclassed further, but there’s lots of C code
that expects a regular Python list and won’t accept a UserList
instance.
Python 2.2 fixes this, and in the process adds some exciting new capabilities.
A brief summary:
You can subclass built-in types such as lists and even integers, and your
subclasses should work in every place that requires the original type.
It’s now possible to define static and class methods, in addition to the
instance methods available in previous versions of Python.
It’s also possible to automatically call methods on accessing or setting an
instance attribute by using a new mechanism called properties. Many uses
of __getattr__() can be rewritten to use properties instead, making the
resulting code simpler and faster. As a small side benefit, attributes can now
have docstrings, too.
The list of legal attributes for an instance can be limited to a particular
set using slots, making it possible to safeguard against typos and
perhaps make more optimizations possible in future versions of Python.
Some users have voiced concern about all these changes. Sure, they say, the new
features are neat and lend themselves to all sorts of tricks that weren’t
possible in previous versions of Python, but they also make the language more
complicated. Some people have said that they’ve always recommended Python for
its simplicity, and feel that its simplicity is being lost.
Personally, I think there’s no need to worry. Many of the new features are
quite esoteric, and you can write a lot of Python code without ever needing to be
aware of them. Writing a simple class is no more difficult than it ever was, so
you don’t need to bother learning or teaching them unless they’re actually
needed. Some very complicated tasks that were previously only possible from C
will now be possible in pure Python, and to my mind that’s all for the better.
I’m not going to attempt to cover every single corner case and small change that
were required to make the new features work. Instead this section will paint
only the broad strokes. See section Related Links, “Related Links”, for
further sources of information about Python 2.2’s new object model.
First, you should know that Python 2.2 really has two kinds of classes: classic
or old-style classes, and new-style classes. The old-style class model is
exactly the same as the class model in earlier versions of Python. All the new
features described in this section apply only to new-style classes. This
divergence isn’t intended to last forever; eventually old-style classes will be
dropped, possibly in Python 3.0.
So how do you define a new-style class? You do it by subclassing an existing
new-style class. Most of Python’s built-in types, such as integers, lists,
dictionaries, and even files, are new-style classes now. A new-style class
named object, the base class for all built-in types, has also been
added so if no built-in type is suitable, you can just subclass
object:
class C(object):
    def __init__(self):
        ...
    ...
This means that class statements that don’t have any base classes are
always classic classes in Python 2.2. (Actually you can also change this by
setting a module-level variable named __metaclass__ — see PEP 253
for the details — but it’s easier to just subclass object.)
The type objects for the built-in types are available as built-ins, named using
a clever trick. Python has always had built-in functions named int(),
float(), and str(). In 2.2, they aren’t functions any more, but
type objects that behave as factories when called.
>>> int
<type 'int'>
>>> int('123')
123
To make the set of types complete, new type objects such as dict() and
file() have been added. Here’s a more interesting example, adding a
lock() method to file objects:
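A sketch of such a subclass, using the fcntl module to do the actual
locking:

import fcntl

class LockableFile(file):
    def lock(self, operation, length=0, start=0, whence=0):
        # Apply a POSIX lock to the underlying file descriptor.
        return fcntl.lockf(self.fileno(), operation,
                           length, start, whence)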
The now-obsolete posixfile module contained a class that emulated all of
a file object’s methods and also added a lock() method, but this class
couldn’t be passed to internal functions that expected a built-in file,
something which is possible with our new LockableFile.
In previous versions of Python, there was no consistent way to discover what
attributes and methods were supported by an object. There were some informal
conventions, such as defining __members__ and __methods__
attributes that were lists of names, but often the author of an extension type
or a class wouldn’t bother to define them. You could fall back on inspecting
the __dict__ of an object, but when class inheritance or an arbitrary
__getattr__() hook were in use this could still be inaccurate.
The one big idea underlying the new class model is that an API for describing
the attributes of an object using descriptors has been formalized.
Descriptors specify the value of an attribute, stating whether it’s a method or
a field. With the descriptor API, static methods and class methods become
possible, as well as more exotic constructs.
Attribute descriptors are objects that live inside class objects, and have a few
attributes of their own:
__name__ is the attribute’s name.
__doc__ is the attribute’s docstring.
__get__(object) is a method that retrieves the attribute value from
object.
__set__(object, value) sets the attribute on object to value.
__delete__(object, value) deletes the value attribute of object.
For example, when you write obj.x, the steps that Python actually performs
are:
descriptor = obj.__class__.x
descriptor.__get__(obj)
For methods, descriptor.__get__() returns a temporary object that’s
callable, and wraps up the instance and the method to be called on it. This is
also why static methods and class methods are now possible; they have
descriptors that wrap up just the method, or the method and the class. As a
brief explanation of these new kinds of methods, static methods aren’t passed
the instance, and therefore resemble regular functions. Class methods are
passed the class of the object, but not the object itself. Static and class
methods are defined like this:
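In 2.2 there’s no special syntax; you define an ordinary function and wrap
it (the method bodies here are placeholders):

class C(object):
    def f(arg1, arg2):
        # No 'self': static methods aren't passed the instance.
        return arg1 + arg2
    f = staticmethod(f)

    def g(cls, arg1, arg2):
        # Class methods receive the class as the first argument.
        return cls, arg1, arg2
    g = classmethod(g)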
The staticmethod() function takes the function f(), and returns it
wrapped up in a descriptor so it can be stored in the class object. You might
expect there to be special syntax for creating such methods (def static f,
defstatic f(), or something like that) but no such syntax has been defined
yet; that’s been left for future versions of Python.
More new features, such as slots and properties, are also implemented as new
kinds of descriptors, and it’s not difficult to write a descriptor class that
does something novel. For example, it would be possible to write a descriptor
class that made it possible to write Eiffel-style preconditions and
postconditions for a method. A class that used this feature might be defined
like this:
from eiffel import eiffelmethod

class C(object):
    def f(self, arg1, arg2):
        # The actual function
        ...
    def pre_f(self):
        # Check preconditions
        ...
    def post_f(self):
        # Check postconditions
        ...
    f = eiffelmethod(f, pre_f, post_f)
Note that a person using the new eiffelmethod() doesn’t have to understand
anything about descriptors. This is why I think the new features don’t increase
the basic complexity of the language. There will be a few wizards who need to
know about it in order to write eiffelmethod() or the ZODB or whatever,
but most users will just write code on top of the resulting libraries and ignore
the implementation details.
Multiple inheritance has also been made more useful through changing the rules
under which names are resolved. Consider this set of classes (diagram taken
from PEP 253 by Guido van Rossum):
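The diagram itself doesn’t survive in this text, but the hierarchy is the
classic diamond; sketched in code, it would look roughly like this:

class A:
    def save(self):
        pass    # save A's internal state

class B(A):
    pass

class C(A):
    def save(self):
        pass    # save C's internal state as well

class D(B, C):
    pass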
The lookup rule for classic classes is simple but not very smart; the base
classes are searched depth-first, going from left to right. A reference to
D.save() will search the classes D, B, and then
A, where save() would be found and returned. C.save()
would never be found at all. This is bad, because if C‘s save()
method is saving some internal state specific to C, not calling it will
result in that state never getting saved.
New-style classes follow a different algorithm that’s a bit more complicated to
explain, but does the right thing in this situation. (Note that Python 2.3
changes this algorithm to one that produces the same results in most cases, but
produces more useful results for really complicated inheritance graphs.)
List all the base classes, following the classic lookup rule and include a
class multiple times if it’s visited repeatedly. In the above example, the list
of visited classes is [D, B, A, C,
A].
Scan the list for duplicated classes. If any are found, remove all but one
occurrence, leaving the last one in the list. In the above example, the list
becomes [D, B, C, A] after dropping
duplicates.
Following this rule, referring to D.save() will return C.save(),
which is the behaviour we’re after. This lookup rule is the same as the one
followed by Common Lisp. A new built-in function, super(), provides a way
to get at a class’s superclasses without having to reimplement Python’s
algorithm. The most commonly used form will be super(class, obj), which
returns a bound superclass object (not the actual class object). This form
will be used in methods to call a method in the superclass; for example,
D‘s save() method would look like this:
class D(B, C):
    def save(self):
        # Call superclass .save()
        super(D, self).save()
        # Save D's private information here
        ...
super() can also return unbound superclass objects when called as
super(class) or super(class1, class2), but this probably won’t
often be useful.
A fair number of sophisticated Python classes define hooks for attribute access
using __getattr__(); most commonly this is done for convenience, to make
code more readable by automatically mapping an attribute access such as
obj.parent into a method call such as obj.get_parent. Python 2.2 adds
some new ways of controlling attribute access.
First, __getattr__(attr_name) is still supported by new-style classes,
and nothing about it has changed. As before, it will be called when an attempt
is made to access obj.foo and no attribute named foo is found in the
instance’s dictionary.
New-style classes also support a new method,
__getattribute__(attr_name). The difference between the two methods is
that __getattribute__() is always called whenever any attribute is
accessed, while the old __getattr__() is only called if foo isn’t
found in the instance’s dictionary.
However, Python 2.2’s support for properties will often be a simpler way
to trap attribute references. Writing a __getattr__() method is
complicated because to avoid recursion you can’t use regular attribute accesses
inside them, and instead have to mess around with the contents of
__dict__. __getattr__() methods also end up being called by Python
when it checks for other methods such as __repr__() or __coerce__(),
and so have to be written with this in mind. Finally, calling a function on
every attribute access results in a sizable performance loss.
property is a new built-in type that packages up three functions that
get, set, or delete an attribute, and a docstring. For example, if you want to
define a size attribute that’s computed, but also settable, you could
write:
class C(object):
    def get_size(self):
        result = ... computation ...
        return result
    def set_size(self, size):
        ... compute something based on the size
        and set internal state appropriately ...
    # Define a property. The 'delete this attribute'
    # method is defined as None, so the attribute
    # can't be deleted.
    size = property(get_size, set_size,
                    None,
                    "Storage size of this instance")
That is certainly clearer and easier to write than a pair of
__getattr__()/__setattr__() methods that check for the size
attribute and handle it specially while retrieving all other attributes from the
instance’s __dict__. Accesses to size are also the only ones
which have to perform the work of calling a function, so references to other
attributes run at their usual speed.
Finally, it’s possible to constrain the list of attributes that can be
referenced on an object using the new __slots__ class attribute. Python
objects are usually very dynamic; at any time it’s possible to define a new
attribute on an instance by just doing obj.new_attr = 1. A new-style class
can define a class attribute named __slots__ to limit the legal
attributes to a particular set of names. An example will make this clear:
>>> class C(object):
...     __slots__ = ('template', 'name')
...
>>> obj = C()
>>> print obj.template
None
>>> obj.template = 'Test'
>>> print obj.template
Test
>>> obj.newattr = None
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'C' object has no attribute 'newattr'
Note how you get an AttributeError on the attempt to assign to an
attribute not listed in __slots__.
This section has just been a quick overview of the new features, giving enough
of an explanation to start you programming, but many details have been
simplified or ignored. Where should you go to get a more complete picture?
http://www.python.org/2.2/descrintro.html is a lengthy tutorial introduction to
the descriptor features, written by Guido van Rossum. If my description has
whetted your appetite, go read this tutorial next, because it goes into much
more detail about the new features while still remaining quite easy to read.
Next, there are two relevant PEPs, PEP 252 and PEP 253. PEP 252 is
titled “Making Types Look More Like Classes”, and covers the descriptor API.
PEP 253 is titled “Subtyping Built-in Types”, and describes the changes to
type objects that make it possible to subtype built-in objects. PEP 253 is
the more complicated PEP of the two, and at a few points the necessary
explanations of types and meta-types may cause your head to explode. Both PEPs
were written and implemented by Guido van Rossum, with substantial assistance
from the rest of the Zope Corp. team.
Finally, there’s the ultimate authority: the source code. Most of the machinery
for the type handling is in Objects/typeobject.c, but you should only
resort to it after all other avenues have been exhausted, including posting a
question to python-list or python-dev.
Another significant addition to 2.2 is an iteration interface at both the C and
Python levels. Objects can define how they can be looped over by callers.
In Python versions up to 2.1, the usual way to make for item in obj work is
to define a __getitem__() method that looks something like this:
def __getitem__(self, index):
    return <next item>
__getitem__() is more properly used to define an indexing operation on an
object so that you can write obj[5] to retrieve the sixth element. It’s a
bit misleading when you’re using this only to support for loops.
Consider some file-like object that wants to be looped over; the index
parameter is essentially meaningless, as the class probably assumes that a
series of __getitem__() calls will be made with index incrementing by
one each time. In other words, the presence of the __getitem__() method
doesn’t mean that using file[5] to randomly access the sixth element will
work, though it really should.
In Python 2.2, iteration can be implemented separately, and __getitem__()
methods can be limited to classes that really do support random access. The
basic idea of iterators is simple. A new built-in function, iter(obj)
or iter(C, sentinel), is used to get an iterator. iter(obj) returns
an iterator for the object obj, while iter(C, sentinel) returns an
iterator that will invoke the callable object C until it returns sentinel to
signal that the iterator is done.
Python classes can define an __iter__() method, which should create and
return a new iterator for the object; if the object is its own iterator, this
method can just return self. In particular, iterators will usually be their
own iterators. Extension types implemented in C can implement a tp_iter
function in order to return an iterator, and extension types that want to behave
as iterators can define a tp_iternext function.
So, after all this, what do iterators actually do? They have one required
method, next(), which takes no arguments and returns the next value. When
there are no more values to be returned, calling next() should raise the
StopIteration exception.
>>> L = [1, 2, 3]
>>> i = iter(L)
>>> print i
<iterator object at 0x8116870>
>>> i.next()
1
>>> i.next()
2
>>> i.next()
3
>>> i.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
StopIteration
>>>
In 2.2, Python’s for statement no longer expects a sequence; it
expects something for which iter() will return an iterator. For backward
compatibility and convenience, an iterator is automatically constructed for
sequences that don’t implement __iter__() or a tp_iter slot, so
for i in [1, 2, 3] will still work. Wherever the Python interpreter loops
over a sequence, it’s been changed to use the iterator protocol. This means you
can do things like this:
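For example, iterating over a dictionary loops over its keys (a sketch; the
key order you’ll see is arbitrary):

>>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3}
>>> for key in m:
...     print key, m[key]
...
Jan 1
Mar 3
Feb 2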
That’s just the default behaviour. If you want to iterate over keys, values, or
key/value pairs, you can explicitly call the iterkeys(),
itervalues(), or iteritems() methods to get an appropriate iterator.
In a minor related change, the in operator now works on dictionaries,
so key in dict is now equivalent to dict.has_key(key).
Files also provide an iterator, which calls the readline() method until
there are no more lines in the file. This means you can now read each line of a
file using code like this:
for line in file:
    # do something for each line
    ...
Note that you can only go forward in an iterator; there’s no way to get the
previous element, reset the iterator, or make a copy of it. An iterator object
could provide such additional capabilities, but the iterator protocol only
requires a next() method.
Generators are another new feature, one that interacts with the introduction of
iterators.
You’re doubtless familiar with how function calls work in Python or C. When you
call a function, it gets a private namespace where its local variables are
created. When the function reaches a return statement, the local
variables are destroyed and the resulting value is returned to the caller. A
later call to the same function will get a fresh new set of local variables.
But, what if the local variables weren’t thrown away on exiting a function?
What if you could later resume the function where it left off? This is what
generators provide; they can be thought of as resumable functions.
Here’s the simplest example of a generator function:
def generate_ints(N):
    for i in range(N):
        yield i
A new keyword, yield, was introduced for generators. Any function
containing a yield statement is a generator function; this is
detected by Python’s bytecode compiler which compiles the function specially as
a result. Because a new keyword was introduced, generators must be explicitly
enabled in a module by including a from __future__ import generators
statement near the top of the module’s source code. In Python 2.3 this
statement will become unnecessary.
When you call a generator function, it doesn’t return a single value; instead it
returns a generator object that supports the iterator protocol. On executing
the yield statement, the generator outputs the value of i,
similar to a return statement. The big difference between
yield and a return statement is that on reaching a
yield the generator’s state of execution is suspended and local
variables are preserved. On the next call to the generator’s next() method,
the function will resume executing immediately after the yield
statement. (For complicated reasons, the yield statement isn’t
allowed inside the try block of a try...finally statement; read PEP 255 for a full explanation of the
interaction between yield and exceptions.)
Here’s a sample usage of the generate_ints() generator:
>>> gen = generate_ints(3)
>>> gen
<generator object at 0x8117f90>
>>> gen.next()
0
>>> gen.next()
1
>>> gen.next()
2
>>> gen.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 2, in generate_ints
StopIteration
You could equally write for i in generate_ints(5), or a, b, c = generate_ints(3).
Inside a generator function, the return statement can only be used
without a value, and signals the end of the procession of values; afterwards the
generator cannot return any further values. return with a value, such
as return 5, is a syntax error inside a generator function. The end of the
generator’s results can also be indicated by raising StopIteration
manually, or by just letting the flow of execution fall off the bottom of the
function.
You could achieve the effect of generators manually by writing your own class
and storing all the local variables of the generator as instance variables. For
example, returning a list of integers could be done by setting self.count to
0, and having the next() method increment self.count and return it.
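A sketch of that hand-written equivalent (the class name is made up for
illustration):

class GeneratedInts:
    # Equivalent of generate_ints(N), written out as an iterator class.
    def __init__(self, N):
        self.count = 0
        self.N = N
    def __iter__(self):
        return self
    def next(self):
        if self.count >= self.N:
            raise StopIteration
        result = self.count
        self.count = self.count + 1
        return result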
However, for a moderately complicated generator, writing a corresponding class
would be much messier. Lib/test/test_generators.py contains a number of
more interesting examples. The simplest one implements an in-order traversal of
a tree using generators recursively.
# A recursive generator that generates Tree leaves in in-order.
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x
Two other examples in Lib/test/test_generators.py produce solutions for
the N-Queens problem (placing N queens on an NxN chess board so that no
queen threatens another) and the Knight’s Tour (a route that takes a knight to
every square of an NxN chessboard without visiting any square twice).
The idea of generators comes from other programming languages, especially Icon
(http://www.cs.arizona.edu/icon/), where the idea of generators is central. In
Icon, every expression and function call behaves like a generator. One example
from “An Overview of the Icon Programming Language” at
http://www.cs.arizona.edu/icon/docs/ipd266.htm gives an idea of what this looks
like:
sentence := "Store it in the neighboring harbor"
if (i := find("or", sentence)) > 5 then write(i)
In Icon the find() function returns the indexes at which the substring
“or” is found: 3, 23, 33. In the if statement, i is first
assigned a value of 3, but 3 is less than 5, so the comparison fails, and Icon
retries it with the second value of 23. 23 is greater than 5, so the comparison
now succeeds, and the code prints the value 23 to the screen.
Python doesn’t go nearly as far as Icon in adopting generators as a central
concept. Generators are considered a new part of the core Python language, but
learning or using them isn’t compulsory; if they don’t solve any problems that
you have, feel free to ignore them. One novel feature of Python’s interface as
compared to Icon’s is that a generator’s state is represented as a concrete
object (the iterator) that can be passed around to other functions or stored in
a data structure.
Written by Neil Schemenauer, Tim Peters, Magnus Lie Hetland. Implemented mostly
by Neil Schemenauer and Tim Peters, with other fixes from the Python Labs crew.
In recent versions, the distinction between regular integers, which are 32-bit
values on most machines, and long integers, which can be of arbitrary size, was
becoming an annoyance. For example, on platforms that support files larger than
2**32 bytes, the tell() method of file objects has to return a long
integer. However, there were various bits of Python that expected plain integers
and would raise an error if a long integer was provided instead. For example,
in Python 1.5, only regular integers could be used as a slice index, and
'abc'[1L:] would raise a TypeError exception with the message ‘slice
index must be int’.
Python 2.2 will shift values from short to long integers as required. The ‘L’
suffix is no longer needed to indicate a long integer literal, as now the
compiler will choose the appropriate type. (Using the ‘L’ suffix will be
discouraged in future 2.x versions of Python, triggering a warning in Python
2.4, and probably dropped in Python 3.0.) Many operations that used to raise an
OverflowError will now return a long integer as their result. For
example:
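A sketch at the interactive prompt (output shown as 2.2 prints it):

>>> 1234567890123
1234567890123L
>>> 2 ** 64
18446744073709551616L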
In most cases, integers and long integers will now be treated identically. You
can still distinguish them with the type() built-in function, but that’s
rarely needed.
The most controversial change in Python 2.2 heralds the start of an effort to
fix an old design flaw that’s been in Python from the beginning. Currently
Python’s division operator, /, behaves like C’s division operator when
presented with two integer arguments: it returns an integer result that’s
truncated down when there would be a fractional part. For example, 3/2 is
1, not 1.5, and (-1)/2 is -1, not -0.5. This means that the results of
division can vary unexpectedly depending on the type of the two operands and
because Python is dynamically typed, it can be difficult to determine the
possible types of the operands.
(The controversy is over whether this is really a design flaw, and whether
it’s worth breaking existing code to fix this. It’s caused endless discussions
on python-dev, and in July 2001 erupted into a storm of acidly sarcastic
postings on comp.lang.python. I won’t argue for either side here
and will stick to describing what’s implemented in 2.2. Read PEP 238 for a
summary of arguments and counter-arguments.)
Because this change might break code, it’s being introduced very gradually.
Python 2.2 begins the transition, but the switch won’t be complete until Python
3.0.
First, I’ll borrow some terminology from PEP 238. “True division” is the
division that most non-programmers are familiar with: 3/2 is 1.5, 1/4 is 0.25,
and so forth. “Floor division” is what Python’s / operator currently does
when given integer operands; the result is the floor of the value returned by
true division. “Classic division” is the current mixed behaviour of /; it
returns the result of floor division when the operands are integers, and returns
the result of true division when one of the operands is a floating-point number.
Here are the changes 2.2 introduces:
A new operator, //, is the floor division operator. (Yes, we know it looks
like C++’s comment symbol.) // always performs floor division no matter
what the types of its operands are, so 1//2 is 0 and 1.0//2.0 is
also 0.0.
// is always available in Python 2.2; you don’t need to enable it using a
__future__ statement.
By including a from __future__ import division in a module, the /
operator will be changed to return the result of true division, so 1/2 is
0.5 (see the sketch below). Without the __future__ statement, / still means
classic division. The default meaning of / will not change until Python 3.0.
Classes can define methods called __truediv__() and __floordiv__()
to overload the two division operators. At the C level, there are also slots in
the PyNumberMethods structure so extension types can define the two
operators.
Python 2.2 supports some command-line arguments for testing whether code will
work with the changed division semantics. Running python with -Q warn will
cause a warning to be issued whenever division is applied to two integers.
You can use this to find code that’s affected by the change and fix it. By
default, Python 2.2 will simply perform classic division without a
warning; the warning will be turned on by default in Python 2.3.
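Putting the pieces together, here’s a short interactive sketch of the three
behaviours:

>>> 1 / 2              # classic division of two ints
0
>>> 1.0 / 2.0          # classic division of floats gives the true result
0.5
>>> 1 // 2             # floor division, always available
0
>>> 1.0 // 2.0
0.0
>>> from __future__ import division
>>> 1 / 2              # true division is now in effect
0.5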
Python’s Unicode support has been enhanced a bit in 2.2. Unicode strings are
usually stored as UCS-2, as 16-bit unsigned integers. Python 2.2 can also be
compiled to use UCS-4, 32-bit unsigned integers, as its internal encoding by
supplying --enable-unicode=ucs4 to the configure script. (It’s also
possible to specify --disable-unicode to completely disable Unicode
support.)
When built to use UCS-4 (a “wide Python”), the interpreter can natively handle
Unicode characters from U+000000 to U+110000, so the range of legal values for
the unichr() function is expanded accordingly. Using an interpreter
compiled to use UCS-2 (a “narrow Python”), values greater than 65535 will still
cause unichr() to raise a ValueError exception. This is all
described in PEP 261, “Support for ‘wide’ Unicode characters”; consult it for
further details.
Another change is simpler to explain. Since their introduction, Unicode strings
have supported an encode() method to convert the string to a selected
encoding such as UTF-8 or Latin-1. A symmetric decode([encoding])
method has been added to 8-bit strings (though not to Unicode strings) in 2.2.
decode() assumes that the string is in the specified encoding and decodes
it, returning whatever is returned by the codec.
Using this new feature, codecs have been added for tasks not directly related to
Unicode. For example, codecs have been added for uu-encoding, MIME’s base64
encoding, and compression with the zlib module:
>>> s = """Here is a lengthy piece of redundant, overly verbose,
... and repetitive text.
... """
>>> data = s.encode('zlib')
>>> data
'x\x9c\r\xc9\xc1\r\x80 \x10\x04\xc0?Ul...'
>>> data.decode('zlib')
'Here is a lengthy piece of redundant, overly verbose,\nand repetitive text.\n'
>>> print s.encode('uu')
begin 666 <data>
M2&5R92!I<R!A(&QE;F=T:'D@<&EE8V4@;V8@<F5D=6YD86YT+"!O=F5R;'D@
>=F5R8F]S92P*86YD(')E<&5T:71I=F4@=&5X="X*

end
>>> "sheesh".encode('rot-13')
'furrfu'
To convert a class instance to Unicode, a __unicode__() method can be
defined by a class, analogous to __str__().
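A short sketch (the class is hypothetical):

class Place:
    def __str__(self):
        return 'Munich'
    def __unicode__(self):
        return u'M\xfcnchen'

print repr(unicode(Place()))   # prints u'M\xfcnchen'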
encode(), decode(), and __unicode__() were implemented by
Marc-André Lemburg. The changes to support using UCS-4 internally were
implemented by Fredrik Lundh and Martin von Löwis.
In Python 2.1, statically nested scopes were added as an optional feature, to be
enabled by a from __future__ import nested_scopes directive. In 2.2 nested
scopes no longer need to be specially enabled, and are now always present. The
rest of this section is a copy of the description of nested scopes from my
“What’s New in Python 2.1” document; if you read it when 2.1 came out, you can
skip the rest of this section.
The largest change introduced in Python 2.1, and made complete in 2.2, is to
Python’s scoping rules. In Python 2.0, at any given time there are at most
three namespaces used to look up variable names: local, module-level, and the
built-in namespace. This often surprised people because it didn’t match their
intuitive expectations. For example, a nested recursive function definition
doesn’t work:
def f():
    ...
    def g(value):
        ...
        return g(value-1) + 1
    ...
The function g() will always raise a NameError exception, because
the binding of the name g isn’t in either its local namespace or in the
module-level namespace. This isn’t much of a problem in practice (how often do
you recursively define interior functions like this?), but this also made using
the lambda statement clumsier, and this was a problem in practice.
In code which uses lambda you can often find local variables being
copied by passing them as the default values of arguments.
def find(self, name):
    "Return list of any entries equal to 'name'"
    L = filter(lambda x, name=name: x == name,
               self.list_attribute)
    return L
The readability of Python code written in a strongly functional style suffers
greatly as a result.
The most significant change to Python 2.2 is that static scoping has been added
to the language to fix this problem. As a first effect, the name=name
default argument is now unnecessary in the above example. Put simply, when a
given variable name is not assigned a value within a function (by an assignment,
or the def, class, or import statements),
references to the variable will be looked up in the local namespace of the
enclosing scope. A more detailed explanation of the rules, and a dissection of
the implementation, can be found in the PEP.
This change may cause some compatibility problems for code where the same
variable name is used both at the module level and as a local variable within a
function that contains further function definitions. This seems rather unlikely
though, since such code would have been pretty confusing to read in the first
place.
One side effect of the change is that the from module import * and
exec statements have been made illegal inside a function scope under
certain conditions. The Python reference manual has said all along that
from module import * is only legal at the top level of a module, but the
CPython interpreter has never enforced this before. As part of the
implementation of nested scopes, the compiler which turns Python source into
bytecodes has to generate different code to access variables in a containing
scope. from module import * and exec make it impossible for the compiler to
figure this out, because they add names to the local namespace that are
unknowable at compile time. Therefore, if a function contains function
definitions or lambda expressions with free variables, the compiler
will flag this by raising a SyntaxError exception.
To make the preceding explanation a bit clearer, here’s an example:
x = 1
def f():
    # The next line is a syntax error
    exec 'x=2'
    def g():
        return x
Line 4 containing the exec statement is a syntax error, since
exec would define a new local variable named x whose value should
be accessed by g().
This shouldn’t be much of a limitation, since exec is rarely used in
most Python code (and when it is used, it’s often a sign of a poor design
anyway).
The xmlrpclib module was contributed to the standard library by Fredrik
Lundh, providing support for writing XML-RPC clients. XML-RPC is a simple
remote procedure call protocol built on top of HTTP and XML. For example, the
following snippet retrieves a list of RSS channels from the O’Reilly Network,
and then lists the recent headlines for one channel:
import xmlrpclib
s = xmlrpclib.Server(
    'http://www.oreillynet.com/meerkat/xml-rpc/server.php')
channels = s.meerkat.getChannels()
# channels is a list of dictionaries, like this:
# [{'id': 4, 'title': 'Freshmeat Daily News'}
#  {'id': 190, 'title': '32Bits Online'},
#  {'id': 4549, 'title': '3DGamers'}, ... ]

# Get the items for one channel
items = s.meerkat.getItems({'channel': 4})

# 'items' is another list of dictionaries, like this:
# [{'link': 'http://freshmeat.net/releases/52719/',
#   'description': 'A utility which converts HTML to XSL FO.',
#   'title': 'html2fo 0.3 (Default)'}, ... ]
The SimpleXMLRPCServer module makes it easy to create straightforward
XML-RPC servers. See http://www.xmlrpc.com/ for more information about XML-RPC.
The new hmac module implements the HMAC algorithm described by
RFC 2104. (Contributed by Gerhard Häring.)
Several functions that originally returned lengthy tuples now return pseudo-
sequences that still behave like tuples but also have mnemonic attributes such
as st_mtime or tm_year. The enhanced functions include
stat(), fstat(), statvfs(), and fstatvfs() in the
os module, and localtime(), gmtime(), and strptime() in
the time module.
For example, to obtain a file’s size using the old tuples, you’d end up writing
something like file_size = os.stat(filename)[stat.ST_SIZE], but now this can
be written more clearly as file_size = os.stat(filename).st_size.
The original patch for this feature was contributed by Nick Mathewson.
The Python profiler has been extensively reworked and various errors in its
output have been corrected. (Contributed by Fred L. Drake, Jr. and Tim Peters.)
The socket module can be compiled to support IPv6; specify the
--enable-ipv6 option to Python’s configure script. (Contributed by
Jun-ichiro “itojun” Hagino.)
Two new format characters were added to the struct module for 64-bit
integers on platforms that support the C long long type. q is for
a signed 64-bit integer, and Q is for an unsigned one. The value is
returned in Python’s long integer type. (Contributed by Tim Peters.)
In the interpreter’s interactive mode, there’s a new built-in function
help() that uses the pydoc module introduced in Python 2.1 to
provide interactive help. help(object) displays any available help text
about object. help() with no argument puts you in an online help
utility, where you can enter the names of functions, classes, or modules to read
their help text. (Contributed by Guido van Rossum, using Ka-Ping Yee’s
pydoc module.)
Various bugfixes and performance improvements have been made to the SRE engine
underlying the re module. For example, the re.sub() and
re.split() functions have been rewritten in C. Another contributed patch
speeds up certain Unicode character ranges by a factor of two, and a new
finditer() method returns an iterator over all the non-overlapping
matches in a given string. (SRE is maintained by Fredrik Lundh. The
BIGCHARSET patch was contributed by Martin von Löwis.)
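A sketch of finditer() in use:

>>> import re
>>> for m in re.finditer('or', 'the northern shore'):
...     print m.start(), m.group()
...
5 or
15 or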
The smtplib module now supports RFC 2487, “Secure SMTP over TLS”, so
it’s now possible to encrypt the SMTP traffic between a Python program and the
mail transport agent being handed a message. smtplib also supports SMTP
authentication. (Contributed by Gerhard Häring.)
The imaplib module, maintained by Piers Lauder, has support for several
new extensions: the NAMESPACE extension defined in RFC 2342, SORT, GETACL and
SETACL. (Contributed by Anthony Baxter and Michel Pelletier.)
The rfc822 module’s parsing of email addresses is now compliant with
RFC 2822, an update to RFC 822. (The module’s name is not going to be
changed to rfc2822.) A new package, email, has also been added for
parsing and generating e-mail messages. (Contributed by Barry Warsaw, and
arising out of his work on Mailman.)
The difflib module now contains a new Differ class for
producing human-readable lists of changes (a “delta”) between two sequences of
lines of text. There are also two generator functions, ndiff() and
restore(), which respectively return a delta from two sequences, or one of
the original sequences from a delta. (Grunt work contributed by David Goodger,
from ndiff.py code by Tim Peters who then did the generatorization.)
New constants ascii_letters, ascii_lowercase, and
ascii_uppercase were added to the string module. There were
several modules in the standard library that used string.letters to
mean the ranges A-Za-z, but that assumption is incorrect when locales are in
use, because string.letters varies depending on the set of legal
characters defined by the current locale. The buggy modules have all been fixed
to use ascii_letters instead. (Reported by an unknown person; fixed by
Fred L. Drake, Jr.)
The mimetypes module now makes it easier to use alternative MIME-type
databases by the addition of a MimeTypes class, which takes a list of
filenames to be parsed. (Contributed by Fred L. Drake, Jr.)
A Timer class was added to the threading module that allows
scheduling an activity to happen at some future time. (Contributed by Itamar
Shtull-Trauring.)
Some of the changes only affect people who deal with the Python interpreter at
the C level because they’re writing Python extension modules, embedding the
interpreter, or just hacking on the interpreter itself. If you only write Python
code, none of the changes described here will affect you very much.
Profiling and tracing functions can now be implemented in C, which can operate
at much higher speeds than Python-based functions and should reduce the overhead
of profiling and tracing. This will be of interest to authors of development
environments for Python. Two new C functions were added to Python’s API,
PyEval_SetProfile() and PyEval_SetTrace(). The existing
sys.setprofile() and sys.settrace() functions still exist, and have
simply been changed to use the new C-level interface. (Contributed by Fred L.
Drake, Jr.)
The C-level interface to the garbage collector has been changed to make it
easier to write extension types that support garbage collection and to debug
misuses of the functions. Various functions have slightly different semantics,
so a bunch of functions had to be renamed. Extensions that use the old API will
still compile but will not participate in garbage collection, so updating them
for 2.2 should be considered fairly high priority.
To upgrade an extension module to the new API, perform the following steps:
Remove PyGC_HEAD_SIZE from object size calculations.
Remove calls to PyObject_AS_GC() and PyObject_FROM_GC().
A new et format sequence was added to PyArg_ParseTuple(); et
takes both a parameter and an encoding name, and converts the parameter to the
given encoding if the parameter turns out to be a Unicode string, or leaves it
alone if it’s an 8-bit string, assuming it to already be in the desired
encoding. This differs from the es format character, which assumes that
8-bit strings are in Python’s default ASCII encoding and converts them to the
specified new encoding. (Contributed by M.-A. Lemburg, and used for the MBCS
support on Windows described in the following section.)
A different argument parsing function, PyArg_UnpackTuple(), has been
added that’s simpler and presumably faster. Instead of specifying a format
string, the caller simply gives the minimum and maximum number of arguments
expected, and a set of pointers to PyObject* variables that will be
filled in with argument values.
Two new flags METH_NOARGS and METH_O are available in method
definition tables to simplify implementation of methods with no arguments or a
single untyped argument. Calling such methods is more efficient than calling a
corresponding method that uses METH_VARARGS. Also, the old
METH_OLDARGS style of writing C methods is now officially deprecated.
Two new wrapper functions, PyOS_snprintf() and PyOS_vsnprintf()
were added to provide cross-platform implementations for the relatively new
snprintf() and vsnprintf() C lib APIs. In contrast to the standard
sprintf() and vsprintf() functions, the Python versions check the
bounds of the buffer used to protect against buffer overruns. (Contributed by
M.-A. Lemburg.)
The _PyTuple_Resize() function has lost an unused parameter, so now it
takes 2 parameters instead of 3. The third argument was never used, and can
simply be discarded when porting code from earlier versions to Python 2.2.
As usual there were a bunch of other improvements and bugfixes scattered
throughout the source tree. A search through the CVS change logs finds there
were 527 patches applied and 683 bugs fixed between Python 2.1 and 2.2; 2.2.1
applied 139 patches and fixed 143 bugs; 2.2.2 applied 106 patches and fixed 82
bugs. These figures are likely to be underestimates.
Some of the more notable changes are:
The code for the MacOS port for Python, maintained by Jack Jansen, is now kept
in the main Python CVS tree, and many changes have been made to support MacOS X.
The most significant change is the ability to build Python as a framework,
enabled by supplying the --enable-framework option to the configure
script when compiling Python. According to Jack Jansen, “This installs a self-
contained Python installation plus the OS X framework “glue” into
/Library/Frameworks/Python.framework (or another location of choice).
For now there is little immediate added benefit to this (actually, there is the
disadvantage that you have to change your PATH to be able to find Python), but
it is the basis for creating a full-blown Python application, porting the
MacPython IDE, possibly using Python as a standard OSA scripting language and
much more.”
Most of the MacPython toolbox modules, which interface to MacOS APIs such as
windowing, QuickTime, scripting, etc. have been ported to OS X, but they’ve been
left commented out in setup.py. People who want to experiment with
these modules can uncomment them manually.
Keyword arguments passed to built-in functions that don’t take them now cause a
TypeError exception to be raised, with the message “function takes no
keyword arguments”.
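For example (the exact wording of the message varies from one built-in to
another):

>>> len(seq=[1, 2, 3])
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: len() takes no keyword arguments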
Weak references, added in Python 2.1 as an extension module, are now part of
the core because they’re used in the implementation of new-style classes. The
ReferenceError exception has therefore moved from the weakref
module to become a built-in exception.
A new script, Tools/scripts/cleanfuture.py by Tim Peters,
automatically removes obsolete __future__ statements from Python source
code.
An additional flags argument has been added to the built-in function
compile(), so the behaviour of __future__ statements can now be
correctly observed in simulated shells, such as those presented by IDLE and
other development environments. This is described in PEP 264. (Contributed
by Michael Hudson.)
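A sketch of how a simulated shell might use this, passing the compiler_flag
attribute of a feature from the __future__ module (the division feature is
used here purely as an illustration):

import __future__

code = compile("print 7 / 2", "<shell>", "exec",
               __future__.division.compiler_flag)
exec code      # prints 3.5, as if 'from __future__ import division'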
The new license introduced with Python 1.6 wasn’t GPL-compatible. This is
fixed by some minor textual changes to the 2.2 license, so it’s now legal to
embed Python inside a GPLed program again. Note that Python itself is not
GPLed, but instead is under a license that’s essentially equivalent to the BSD
license, same as it always was. The license changes were also applied to the
Python 2.0.1 and 2.1.1 releases.
When presented with a Unicode filename on Windows, Python will now convert it
to an MBCS encoded string, as used by the Microsoft file APIs. As MBCS is
explicitly used by the file APIs, Python’s choice of ASCII as the default
encoding turns out to be an annoyance. On Unix, the locale’s character set is
used if locale.nl_langinfo(CODESET) is available. (Windows support was
contributed by Mark Hammond with assistance from Marc-André Lemburg. Unix
support was added by Martin von Löwis.)
Large file support is now enabled on Windows. (Contributed by Tim Peters.)
The Tools/scripts/ftpmirror.py script now parses a .netrc
file, if you have one. (Contributed by Mike Romberg.)
Some features of the object returned by the xrange() function are now
deprecated, and trigger warnings when they’re accessed; they’ll disappear in
Python 2.3. xrange objects tried to pretend they were full sequence
types by supporting slicing, sequence multiplication, and the in
operator, but these features were rarely used and therefore buggy. The
tolist() method and the start, stop, and step
attributes are also being deprecated. At the C level, the fourth argument to
the PyRange_New() function, repeat, has also been deprecated.
There were a bunch of patches to the dictionary implementation, mostly to fix
potential core dumps if a dictionary contains objects that sneakily changed
their hash value, or mutated the dictionary they were contained in. For a while
python-dev fell into a gentle rhythm of Michael Hudson finding a case that
dumped core, Tim Peters fixing the bug, Michael finding another case, and round
and round it went.
On Windows, Python can now be compiled with Borland C thanks to a number of
patches contributed by Stephen Hansen, though the result isn’t fully functional
yet. (But this is progress...)
Another Windows enhancement: Wise Solutions generously offered PythonLabs use
of their InstallerMaster 8.1 system. Earlier PythonLabs Windows installers used
Wise 5.0a, which was beginning to show its age. (Packaged up by Tim Peters.)
Files ending in .pyw can now be imported on Windows. .pyw is a
Windows-only thing, used to indicate that a script needs to be run using
PYTHONW.EXE instead of PYTHON.EXE in order to prevent a DOS console from popping
up to display the output. This patch makes it possible to import such scripts,
in case they’re also usable as modules. (Implemented by David Bolen.)
On platforms where Python uses the C dlopen() function to load
extension modules, it’s now possible to set the flags used by dlopen()
using the sys.getdlopenflags() and sys.setdlopenflags() functions.
(Contributed by Bram Stolk.)
The pow() built-in function no longer supports 3 arguments when
floating-point numbers are supplied. pow(x,y,z) returns (x**y)%z,
but this is never useful for floating point numbers, and the final result varies
unpredictably depending on the platform. A call such as pow(2.0,8.0,7.0)
will now raise a TypeError exception.
The author would like to thank the following people for offering suggestions,
corrections and assistance with various drafts of this article: Fred Bremmer,
Keith Briggs, Andrew Dalke, Fred L. Drake, Jr., Carel Fellinger, David Goodger,
Mark Hammond, Stephen Hansen, Michael Hudson, Jack Jansen, Marc-André Lemburg,
Martin von Löwis, Fredrik Lundh, Michael McLay, Nick Mathewson, Paul Moore,
Gustavo Niemeyer, Don O’Donnell, Joonas Paalasma, Tim Peters, Jens Quade, Tom
Reinhardt, Neil Schemenauer, Guido van Rossum, Greg Ward, Edward Welbourne.
This article explains the new features in Python 2.1. While there aren’t as
many changes in 2.1 as there were in Python 2.0, there are still some pleasant
surprises in store. 2.1 is the first release to be steered through the use of
Python Enhancement Proposals, or PEPs, so most of the sizable changes have
accompanying PEPs that provide more complete documentation and a design
rationale for the change. This article doesn’t attempt to document the new
features completely, but simply provides an overview of the new features for
Python programmers. Refer to the Python 2.1 documentation, or to the specific
PEP, for more details about any new feature that particularly interests you.
One recent goal of the Python development team has been to accelerate the pace
of new releases, with a new release coming every 6 to 9 months. 2.1 is the first
release to come out at this faster pace, with the first alpha appearing in
January, 3 months after the final version of 2.0 was released.
The final release of Python 2.1 was made on April 17, 2001.
The largest change in Python 2.1 is to Python’s scoping rules. In Python 2.0,
at any given time there are at most three namespaces used to look up variable
names: local, module-level, and the built-in namespace. This often surprised
people because it didn’t match their intuitive expectations. For example, a
nested recursive function definition doesn’t work:
def f():
    ...
    def g(value):
        ...
        return g(value-1) + 1
    ...
The function g() will always raise a NameError exception, because
the binding of the name g isn’t in either its local namespace or in the
module-level namespace. This isn’t much of a problem in practice (how often do
you recursively define interior functions like this?), but this also made using
the lambda expression clumsier, and this was a problem in practice.
In code which uses lambda you can often find local variables being
copied by passing them as the default values of arguments.
def find(self, name):
    "Return list of any entries equal to 'name'"
    L = filter(lambda x, name=name: x == name,
               self.list_attribute)
    return L
The readability of Python code written in a strongly functional style suffers
greatly as a result.
The most significant change to Python 2.1 is that static scoping has been added
to the language to fix this problem. As a first effect, the name=name
default argument is now unnecessary in the above example. Put simply, when a
given variable name is not assigned a value within a function (by an assignment,
or the def, class, or import statements),
references to the variable will be looked up in the local namespace of the
enclosing scope. A more detailed explanation of the rules, and a dissection of
the implementation, can be found in the PEP.
This change may cause some compatibility problems for code where the same
variable name is used both at the module level and as a local variable within a
function that contains further function definitions. This seems rather unlikely
though, since such code would have been pretty confusing to read in the first
place.
One side effect of the change is that the from module import * and
exec statements have been made illegal inside a function scope under
certain conditions. The Python reference manual has said all along that
from module import * is only legal at the top level of a module, but the
CPython interpreter has never enforced this before. As part of the
implementation of nested scopes, the compiler which turns Python source into
bytecodes has to generate different code to access variables in a containing
scope. from module import * and exec make it impossible for the compiler
to figure this out, because they add names to the local namespace that are
unknowable at compile time. Therefore, if a function contains function
definitions or lambda expressions with free variables, the compiler
will flag this by raising a SyntaxError exception.
To make the preceding explanation a bit clearer, here’s an example:
x = 1
def f():
    # The next line is a syntax error
    exec 'x=2'
    def g():
        return x
Line 4 containing the exec statement is a syntax error, since
exec would define a new local variable named x whose value should
be accessed by g().
This shouldn’t be much of a limitation, since exec is rarely used in
most Python code (and when it is used, it’s often a sign of a poor design
anyway).
Compatibility concerns have led to nested scopes being introduced gradually; in
Python 2.1, they aren’t enabled by default, but can be turned on within a module
by using a future statement as described in PEP 236. (See the following section
for further discussion of PEP 236.) In Python 2.2, nested scopes will become
the default and there will be no way to turn them off, but users will have had
all of 2.1’s lifetime to fix any breakage resulting from their introduction.
The reaction to nested scopes was widespread concern about the dangers of
breaking code with the 2.1 release, and it was strong enough to make the
Pythoneers take a more conservative approach. This approach consists of
introducing a convention for enabling optional functionality in release N that
will become compulsory in release N+1.
The syntax uses a from...import statement using the reserved module name
__future__. Nested scopes can be enabled by the following statement:
from __future__ import nested_scopes
While it looks like a normal import statement, it’s not; there are
strict rules on where such a future statement can be put. They can only be at
the top of a module, and must precede any Python code or regular
import statements. This is because such statements can affect how
the Python bytecode compiler parses code and generates bytecode, so they must
precede any statement that will result in bytecodes being produced.
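Putting the pieces together, a small sketch in which the recursive inner
function from the earlier example now works:

from __future__ import nested_scopes

def f():
    def g(value):
        if value <= 0:
            return 0
        return g(value - 1) + 1
    return g(5)

print f()      # prints 5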
In earlier versions, Python’s support for implementing comparisons on user-
defined classes and extension types was quite simple. Classes could implement a
__cmp__() method that was given two instances of a class, and could only
return 0 if they were equal or +1 or -1 if they weren’t; the method couldn’t
raise an exception or return anything other than a Boolean value. Users of
Numeric Python often found this model too weak and restrictive, because in the
number-crunching programs that numeric Python is used for, it would be more
useful to be able to perform elementwise comparisons of two matrices, returning
a matrix containing the results of a given comparison for each element. If the
two matrices are of different sizes, then the compare has to be able to raise an
exception to signal the error.
In Python 2.1, rich comparisons were added in order to support this need.
Python classes can now individually overload each of the <, <=, >,
>=, ==, and != operations. The new magic method names are:

    <    __lt__()        >    __gt__()
    <=   __le__()        >=   __ge__()
    ==   __eq__()        !=   __ne__()

(The magic methods are named after the corresponding Fortran operators .LT.,
.LE., &c. Numeric programmers are almost certainly quite familiar with
these names and will find them easy to remember.)
Each of these magic methods is of the form method(self,other), where
self will be the object on the left-hand side of the operator, while
other will be the object on the right-hand side. For example, the
expression A<B will cause A.__lt__(B) to be called.
Each of these magic methods can return anything at all: a Boolean, a matrix, a
list, or any other Python object. Alternatively they can raise an exception if
the comparison is impossible, inconsistent, or otherwise meaningless.
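For instance, here is a sketch in the Numeric spirit: an elementwise
comparison that returns a list rather than a Boolean, and raises an exception
on a size mismatch:

class Vector:
    def __init__(self, data):
        self.data = data
    def __lt__(self, other):
        if len(self.data) != len(other.data):
            raise ValueError, "vectors have different sizes"
        # Elementwise comparison; the result is a list of 0/1 flags
        return [x < y for (x, y) in zip(self.data, other.data)]

print Vector([1, 5, 3]) < Vector([2, 4, 6])    # prints [1, 0, 1]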
The built-in cmp(A, B) function can use the rich comparison machinery,
and now accepts an optional argument specifying which comparison operation to
use; this is given as one of the strings "<", "<=", ">", ">=",
"==", or "!=". If called without the optional third argument,
cmp() will only return -1, 0, or +1 as in previous versions of Python;
otherwise it will call the appropriate method and can return any Python object.
There are also corresponding changes of interest to C programmers; there’s a new
slot tp_richcompare in type objects and an API for performing a given rich
comparison. I won’t cover the C API here, but will refer you to PEP 207, or to
2.1’s C API documentation, for the full list of related functions.
Over its 10 years of existence, Python has accumulated a certain number of
obsolete modules and features along the way. It’s difficult to know when a
feature is safe to remove, since there’s no way of knowing how much code uses it
— perhaps no programs depend on the feature, or perhaps many do. To enable
removing old features in a more structured way, a warning framework was added.
When the Python developers want to get rid of a feature, it will first trigger a
warning in the next version of Python. The following Python version can then
drop the feature, and users will have had a full release cycle to remove uses of
the old feature.
Python 2.1 adds the warning framework to be used in this scheme. It adds a
warnings module that provides functions to issue warnings, and to filter
out warnings that you don’t want to be displayed. Third-party modules can also
use this framework to deprecate old features that they no longer wish to
support.
For example, in Python 2.1 the regex module is deprecated, so importing
it causes a warning to be printed:
>>> import regex
__main__:1: DeprecationWarning: the regex module is deprecated; please use the re module
>>>
Warnings can be issued by calling the warnings.warn() function:
warnings.warn("feature X no longer supported")
The first parameter is the warning message; an optional second parameter
can be used to specify a particular warning category.
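For example, to issue a DeprecationWarning instead of the default
UserWarning:

warnings.warn("feature X no longer supported", DeprecationWarning)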
Filters can be added to disable certain warnings; a regular expression pattern
can be applied to the message or to the module name in order to suppress a
warning. For example, you may have a program that uses the regex module
and not want to spare the time to convert it to use the re module right
now. The warning can be suppressed by calling
import warnings
warnings.filterwarnings(action='ignore',
                        message='.*regex module is deprecated',
                        category=DeprecationWarning,
                        module='__main__')
This adds a filter that will apply only to warnings of the class
DeprecationWarning triggered in the __main__ module, and applies
a regular expression to only match the message about the regex module
being deprecated, and will cause such warnings to be ignored. Warnings can also
be printed only once, printed every time the offending code is executed, or
turned into exceptions that will cause the program to stop (unless the
exceptions are caught in the usual way, of course).
Functions were also added to Python’s C API for issuing warnings; refer to PEP
230 or to Python’s API documentation for the details.
PEP 5, written by Paul Prescod, specifies procedures to be followed when removing old
features from Python. The policy described in this PEP hasn’t been officially
adopted, but the eventual policy probably won’t be too different from Prescod’s
proposal.
When compiling Python, the user had to go in and edit the Modules/Setup
file in order to enable various additional modules; the default set is
relatively small and limited to modules that compile on most Unix platforms.
This means that on Unix platforms with many more features, most notably Linux,
Python installations often don’t contain all the useful modules they could.
Python 2.0 added the Distutils, a set of modules for distributing and installing
extensions. In Python 2.1, the Distutils are used to compile much of the
standard library of extension modules, autodetecting which ones are supported on
the current machine. It’s hoped that this will make Python installations easier
and more featureful.
Instead of having to edit the Modules/Setup file in order to enable
modules, a setup.py script in the top directory of the Python source
distribution is run at build time, and attempts to discover which modules can be
enabled by examining the modules and header files on the system. If a module is
configured in Modules/Setup, the setup.py script won’t attempt
to compile that module and will defer to the Modules/Setup file’s
contents. This provides a way to specify any strange command-line flags or
libraries that are required for a specific platform.
In another far-reaching change to the build mechanism, Neil Schemenauer
restructured things so Python now uses a single makefile that isn’t recursive,
instead of makefiles in the top directory and in each of the Python/,
Parser/, Objects/, and Modules/ subdirectories. This
makes building Python faster and also makes hacking the Makefiles clearer and
simpler.
Weak references, available through the weakref module, are a minor but
useful new data type in the Python programmer’s toolbox.
Storing a reference to an object (say, in a dictionary or a list) has the side
effect of keeping that object alive forever. There are a few specific cases
where this behaviour is undesirable, object caches being the most common one,
and another being circular references in data structures such as trees.
For example, consider a memoizing function that caches the results of another
function f(x) by storing the function’s argument and its result in a
dictionary:
_cache = {}
def memoize(x):
    if _cache.has_key(x):
        return _cache[x]

    retval = f(x)

    # Cache the returned object
    _cache[x] = retval

    return retval
This version works for simple things such as integers, but it has a side effect;
the _cache dictionary holds a reference to the return values, so they’ll
never be deallocated until the Python process exits and cleans up. This isn’t
very noticeable for integers, but if f() returns an object, or a data
structure that takes up a lot of memory, this can be a problem.
Weak references provide a way to implement a cache that won’t keep objects alive
beyond their time. If an object is only accessible through weak references, the
object will be deallocated, and the weak references will then report that the
object they referred to no longer exists. A weak reference to an object obj is
created by calling wr = weakref.ref(obj). The object being referred to is
returned by calling the weak reference as if it were a function: wr(). It
will return the referenced object, or None if the object no longer exists.
This makes it possible to write a memoize() function whose cache doesn’t
keep objects alive, by storing weak references in the cache.
_cache = {}
def memoize(x):
    if _cache.has_key(x):
        obj = _cache[x]()
        # If weak reference object still exists,
        # return it
        if obj is not None:
            return obj

    retval = f(x)

    # Cache a weak reference
    _cache[x] = weakref.ref(retval)

    return retval
The weakref module also allows creating proxy objects which behave like
weak references — an object referenced only by proxy objects is deallocated –
but instead of requiring an explicit call to retrieve the object, the proxy
transparently forwards all operations to the object as long as the object still
exists. If the object is deallocated, attempting to use a proxy will cause a
weakref.ReferenceError exception to be raised.
proxy = weakref.proxy(obj)
proxy.attr      # Equivalent to obj.attr
proxy.meth()    # Equivalent to obj.meth()
del obj
proxy.attr      # raises weakref.ReferenceError
In Python 2.1, functions can now have arbitrary information attached to them.
People were often using docstrings to hold information about functions and
methods, because the __doc__ attribute was the only way of attaching any
information to a function. For example, in the Zope Web application server,
functions are marked as safe for public access by having a docstring, and in
John Aycock’s SPARK parsing framework, docstrings hold parts of the BNF grammar
to be parsed. This overloading is unfortunate, since docstrings are really
intended to hold a function’s documentation; for example, it means you can’t
properly document functions intended for private use in Zope.
Arbitrary attributes can now be set and retrieved on functions using the regular
Python syntax:
def f():
    pass

f.publish = 1
f.secure = 1
f.grammar = "A ::= B (C D)*"
The dictionary containing attributes can be accessed as the function’s
__dict__. Unlike the __dict__ attribute of class instances, in
functions you can actually assign a new dictionary to __dict__, though
the new value is restricted to a regular Python dictionary; you can’t be
tricky and set it to a UserDict instance, or any other random object
that behaves like a mapping.
PEP 235: Importing Modules on Case-Insensitive Platforms
Some operating systems have filesystems that are case-insensitive, MacOS and
Windows being the primary examples; on these systems, it’s impossible to
distinguish the filenames FILE.PY and file.py, even though they do store
the file’s name in its original case (they’re case-preserving, too).
In Python 2.1, the import statement will work to simulate case-
sensitivity on case-insensitive platforms. Python will now search for the first
case-sensitive match by default, raising an ImportError if no such file
is found, so import file will not import a module named FILE.PY. Case-
insensitive matching can be requested by setting the PYTHONCASEOK
environment variable before starting the Python interpreter.
When using the Python interpreter interactively, the output of commands is
displayed using the built-in repr() function. In Python 2.1, the variable
sys.displayhook can be set to a callable object which will be called
instead of repr(). For example, you can set it to a special pretty-
printing function:
>>> # Create a recursive data structure
... L = [1, 2, 3]
>>> L.append(L)
>>> L                      # Show Python's default output
[1, 2, 3, [...]]
>>> # Use pprint.pprint() as the display function
... import sys, pprint
>>> sys.displayhook = pprint.pprint
>>> L
[1, 2, 3, <Recursion on list with id=135143996>]
>>>
How numeric coercion is done at the C level was significantly modified. This
will only affect the authors of C extensions to Python, allowing them more
flexibility in writing extension types that support numeric operations.
Extension types can now set the type flag Py_TPFLAGS_CHECKTYPES in their
PyTypeObject structure to indicate that they support the new coercion model.
In such extension types, the numeric slot functions can no longer assume that
they’ll be passed two arguments of the same type; instead they may be passed two
arguments of differing types, and can then perform their own internal coercion.
If the slot function is passed a type it can’t handle, it can indicate the
failure by returning a reference to the Py_NotImplemented singleton value.
The numeric functions of the other type will then be tried, and perhaps they can
handle the operation; if the other type also returns Py_NotImplemented, then
a TypeError will be raised. Numeric methods written in Python can also
return Py_NotImplemented, causing the interpreter to act as if the method
did not exist (perhaps raising a TypeError, perhaps trying another
object’s numeric methods).
PEP 208, written and implemented by Neil Schemenauer, is heavily based upon earlier work by
Marc-André Lemburg. Read this to understand the fine points of how numeric
operations will now be processed at the C level.
A common complaint from Python users is that there’s no single catalog of all
the Python modules in existence. T. Middleton’s Vaults of Parnassus at
http://www.vex.net/parnassus/ are the largest catalog of Python modules, but
registering software at the Vaults is optional, and many people don’t bother.
As a first small step toward fixing the problem, Python software packaged using
the Distutils sdist command will include a file named
PKG-INFO containing information about the package such as its name,
version, and author (metadata, in cataloguing terminology). PEP 241 contains
the full list of fields that can be present in the PKG-INFO file. As
people begin to package their software using Python 2.1, more and more packages
will include metadata, making it possible to build automated cataloguing systems
and experiment with them. With the resulting experience, perhaps it’ll be possible
to design a really good catalog and then build support for it into Python 2.2.
For example, the Distutils sdist and bdist_* commands
could support an upload option that would automatically upload your
package to a catalog server.
You can start creating packages containing PKG-INFO even if you’re not
using Python 2.1, since a new release of the Distutils will be made for users of
earlier Python versions. Version 1.0.2 of the Distutils includes the changes
described in PEP 241, as well as various bugfixes and enhancements. It will be
available from the Distutils SIG at http://www.python.org/sigs/distutils-sig/.
Ka-Ping Yee contributed two new modules: inspect.py, a module for
getting information about live Python code, and pydoc.py, a module for
interactively converting docstrings to HTML or text. As a bonus,
Tools/scripts/pydoc, which is now automatically installed, uses
pydoc.py to display documentation given a Python module, package, or
class name. For example, pydoc xml.dom displays the documentation for the
xml.dom package.
pydoc also includes a Tk-based interactive help browser. pydoc
quickly becomes addictive; try it out!
Two different modules for unit testing were added to the standard library.
The doctest module, contributed by Tim Peters, provides a testing
framework based on running embedded examples in docstrings and comparing the
results against the expected output. PyUnit, contributed by Steve Purcell, is a
unit testing framework inspired by JUnit, which was in turn an adaptation of
Kent Beck’s Smalltalk testing framework. See http://pyunit.sourceforge.net/ for
more information about PyUnit.
The difflib module contains a class, SequenceMatcher, which
compares two sequences and computes the changes required to transform one
sequence into the other. For example, this module can be used to write a tool
similar to the Unix diff program, and in fact the sample program
Tools/scripts/ndiff.py demonstrates how to write such a script.
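A small sketch of the class in action:

import difflib

sm = difflib.SequenceMatcher(None, 'abcd', 'bcde')
print sm.ratio()                  # similarity ratio; 0.75 here
print sm.get_matching_blocks()    # where the two sequences agree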
curses.panel, a wrapper for the panel library, part of ncurses and of
SYSV curses, was contributed by Thomas Gellekum. The panel library provides
windows with the additional feature of depth. Windows can be moved higher or
lower in the depth ordering, and the panel library figures out where panels
overlap and which sections are visible.
The PyXML package has gone through a few releases since Python 2.0, and Python
2.1 includes an updated version of the xml package. Some of the
noteworthy changes include support for Expat 1.2 and later versions, the ability
for Expat parsers to handle files in any encoding supported by Python, and
various bugfixes for SAX, DOM, and the minidom module.
Ping also contributed another hook for handling uncaught exceptions.
sys.excepthook can be set to a callable object. When an exception isn’t
caught by any try...except blocks, the exception will be
passed to sys.excepthook(), which can then do whatever it likes. At the
Ninth Python Conference, Ping demonstrated an application for this hook:
printing an extended traceback that not only lists the stack frames, but also
lists the function arguments and the local variables for each frame.
Various functions in the time module, such as asctime() and
localtime(), require a floating point argument containing the time in
seconds since the epoch. The most common use of these functions is to work with
the current time, so the floating point argument has been made optional; when a
value isn’t provided, the current time will be used. For example, log file
entries usually need a string containing the current time; in Python 2.1,
time.asctime() can be used, instead of the lengthier
time.asctime(time.localtime(time.time())) that was previously required.
This change was proposed and implemented by Thomas Wouters.
The ftplib module now defaults to retrieving files in passive mode,
because passive mode is more likely to work from behind a firewall. This
request came from the Debian bug tracking system, since other Debian packages
use ftplib to retrieve files and then don’t work from behind a firewall.
It’s deemed unlikely that this will cause problems for anyone, because Netscape
defaults to passive mode and few people complain, but if passive mode is
unsuitable for your application or network setup, call set_pasv(0) on
FTP objects to disable passive mode.
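For example (the hostname here is illustrative):

import ftplib

ftp = ftplib.FTP('ftp.example.com')
ftp.login()
ftp.set_pasv(0)      # switch back to active mode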
Support for raw socket access has been added to the socket module,
contributed by Grant Edwards.
The pstats module now contains a simple interactive statistics browser
for displaying timing profiles for Python programs, invoked when the module is
run as a script. Contributed by Eric S. Raymond.
A new implementation-dependent function, sys._getframe([depth]), has
been added to return a given frame object from the current call stack.
sys._getframe() returns the frame at the top of the call stack; if the
optional integer argument depth is supplied, the function returns the frame
that is depth calls below the top of the stack. For example,
sys._getframe(1) returns the caller’s frame object.
This function is only present in CPython, not in Jython or the .NET
implementation. Use it for debugging, and resist the temptation to put it into
production code.
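A tiny sketch, suitable only for debugging sessions:

import sys

def whos_calling():
    # Name of the function one frame up the stack (CPython only)
    return sys._getframe(1).f_code.co_name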
There were relatively few smaller changes made in Python 2.1 due to the shorter
release cycle. A search through the CVS change logs turns up 117 patches
applied, and 136 bugs fixed; both figures are likely to be underestimates. Some
of the more notable changes are:
A specialized object allocator is now optionally available, that should be
faster than the system malloc() and have less memory overhead. The
allocator uses C’s malloc() function to get large pools of memory, and
then fulfills smaller memory requests from these pools. It can be enabled by
providing the --with-pymalloc option to the configure
script; see Objects/obmalloc.c for the implementation details.
Authors of C extension modules should test their code with the object allocator
enabled, because some incorrect code may break, causing core dumps at runtime.
There are a bunch of memory allocation functions in Python’s C API that have
previously been just aliases for the C library’s malloc() and
free(), meaning that if you accidentally called mismatched functions, the
error wouldn’t be noticeable. When the object allocator is enabled, these
functions aren’t aliases of malloc() and free() any more, and
calling the wrong function to free memory will get you a core dump. For
example, if memory was allocated using PyMem_New(), it has to be freed
using PyMem_Del(), not free(). A few modules included with Python
fell afoul of this and had to be fixed; doubtless there are more third-party
modules that will have the same problem.
The object allocator was contributed by Vladimir Marangozov.
The speed of line-oriented file I/O has been improved because people often
complain about its lack of speed, and because it’s often been used as a naïve
benchmark. The readline() method of file objects has therefore been
rewritten to be much faster. The exact amount of the speedup will vary from
platform to platform depending on how slow the C library’s getc() was, but
is around 66%, and potentially much faster on some particular operating systems.
Tim Peters did much of the benchmarking and coding for this change, motivated by
a discussion in comp.lang.python.
A new module and method for file objects was also added, contributed by Jeff
Epler. The new method, xreadlines(), is similar to the existing
xrange() built-in. xreadlines() returns an opaque sequence object
that only supports being iterated over, reading a line on every iteration but
not reading the entire file into memory as the existing readlines() method
does. You’d use it like this:
for line in sys.stdin.xreadlines():
    # ... do something for each line ...
    ...
A new method, popitem(), was added to dictionaries to enable
destructively iterating through the contents of a dictionary; this can be faster
for large dictionaries because there’s no need to construct a list containing
all the keys or values. D.popitem() removes a random (key,value) pair
from the dictionary D and returns it as a 2-tuple. This was implemented
mostly by Tim Peters and Guido van Rossum, after a suggestion and preliminary
patch by Moshe Zadka.
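For example, destructively draining a dictionary:

D = {1: 'a', 2: 'b', 3: 'c'}
while D:
    key, value = D.popitem()
    print key, value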
Modules can now control which names are imported when from module import *
is used, by defining an __all__ attribute containing a list of names that
will be imported. One common complaint is that if the module imports other
modules such as sys or string, from module import * will add
them to the importing module’s namespace. To fix this, simply list the public
names in __all__:
# List public names
__all__ = ['Database', 'open']
A stricter version of this patch was first suggested and implemented by Ben
Wolfson, but after some python-dev discussion, a weaker final version was
checked in.
Applying repr() to strings previously used octal escapes for
non-printable characters; for example, a newline was '\012'. This was a
vestigial trace of Python’s C ancestry, but today octal is of very little
practical use. Ka-Ping Yee suggested using hex escapes instead of octal ones,
and using the \n, \t, \r escapes for the appropriate characters,
and implemented this new formatting.
Syntax errors detected at compile-time can now raise exceptions containing the
filename and line number of the error, a pleasant side effect of the compiler
reorganization done by Jeremy Hylton.
C extensions which import other modules have been changed to use
PyImport_ImportModule(), which means that they will use any import hooks
that have been installed. This is also encouraged for third-party extensions
that need to import some other module from C code.
The size of the Unicode character database was shrunk by another 340K thanks
to Fredrik Lundh.
Some new ports were contributed: MacOS X (by Steven Majewski), Cygwin (by
Jason Tishler); RISCOS (by Dietmar Schwertberger); Unixware 7 (by Billy G.
Allie).
And there’s the usual list of minor bugfixes, minor memory leaks, docstring
edits, and other tweaks, too lengthy to be worth itemizing; see the CVS logs for
the full details if you want them.
The author would like to thank the following people for offering suggestions on
various drafts of this article: Graeme Cross, David Goodger, Jay Graves, Michael
Hudson, Marc-André Lemburg, Fredrik Lundh, Neil Schemenauer, Thomas Wouters.
A new release of Python, version 2.0, was released on October 16, 2000. This
article covers the exciting new features in 2.0, highlights some other useful
changes, and points out a few incompatible changes that may require rewriting
code.
Python’s development never completely stops between releases, and a steady flow
of bug fixes and improvements are always being submitted. A host of minor fixes,
a few optimizations, additional docstrings, and better error messages went into
2.0; to list them all would be impossible, but they’re certainly significant.
Consult the publicly-available CVS logs if you want to see the full list. This
progress is due to the fact that the five developers working for PythonLabs are now getting
paid to spend their days fixing bugs, and also due to the improved communication
resulting from moving to SourceForge.
Python 1.6 can be thought of as the Contractual Obligations Python release.
After the core development team left CNRI in May 2000, CNRI requested that a 1.6
release be created, containing all the work on Python that had been performed at
CNRI. Python 1.6 therefore represents the state of the CVS tree as of May 2000,
with the most significant new feature being Unicode support. Development
continued after May, of course, so the 1.6 tree received a few fixes to ensure
that it’s forward-compatible with Python 2.0. 1.6 is therefore part of Python’s
evolution, and not a side branch.
So, should you take much interest in Python 1.6? Probably not. The 1.6final
and 2.0beta1 releases were made on the same day (September 5, 2000), the plan
being to finalize Python 2.0 within a month or so. If you have applications to
maintain, there seems little point in breaking things by moving to 1.6, fixing
them, and then having another round of breakage within a month by moving to 2.0;
you’re better off just going straight to 2.0. Most of the really interesting
features described in this document are only in 2.0, because a lot of work was
done between May and September.
The most important change in Python 2.0 may not be to the code at all, but to
how Python is developed: in May 2000 the Python developers began using the tools
made available by SourceForge for storing source code, tracking bug reports,
and managing the queue of patch submissions. To report bugs or submit patches
for Python 2.0, use the bug tracking and patch manager tools available from
Python’s project page, located at http://sourceforge.net/projects/python/.
The most important of the services now hosted at SourceForge is the Python CVS
tree, the version-controlled repository containing the source code for Python.
Previously, there were roughly 7 people who had write access to the CVS
tree, and all patches had to be inspected and checked in by one of the people on
this short list. Obviously, this wasn’t very scalable. By moving the CVS tree
to SourceForge, it became possible to grant write access to more people; as of
September 2000 there were 27 people able to check in changes, a fourfold
increase. This makes possible large-scale changes that wouldn’t be attempted if
they’d have to be filtered through the small group of core developers. For
example, one day Peter Schneider-Kamp took it into his head to drop K&R C
compatibility and convert the C source for Python to ANSI C. After getting
approval on the python-dev mailing list, he launched into a flurry of checkins
that lasted about a week, other developers joined in to help, and the job was
done. If there were only 5 people with write access, probably that task would
have been viewed as “nice, but not worth the time and effort needed” and it
would never have gotten done.
The shift to using SourceForge’s services has resulted in a remarkable increase
in the speed of development. Patches now get submitted, commented on, revised
by people other than the original submitter, and bounced back and forth between
people until the patch is deemed worth checking in. Bugs are tracked in one
central location and can be assigned to a specific person for fixing, and we can
count the number of open bugs to measure progress. This didn’t come without a
cost: developers now have more e-mail to deal with, more mailing lists to
follow, and special tools had to be written for the new environment. For
example, SourceForge sends default patch and bug notification e-mail messages
that are completely unhelpful, so Ka-Ping Yee wrote an HTML screen-scraper that
sends more useful messages.
The ease of adding code caused a few initial growing pains, such as code being
checked in before it was ready or without getting clear agreement from the
developer group. The approval process that has emerged is somewhat similar to
that used by the Apache group. Developers can vote +1, +0, -0, or -1 on a patch;
+1 and -1 denote acceptance or rejection, while +0 and -0 mean the developer is
mostly indifferent to the change, though with a slight positive or negative
slant. The most significant change from the Apache model is that the voting is
essentially advisory, letting Guido van Rossum, who has Benevolent Dictator For
Life status, know what the general opinion is. He can still ignore the result of
a vote, and approve or reject a change even if the community disagrees with him.
Producing an actual patch is the last step in adding a new feature, and is
usually easy compared to the earlier task of coming up with a good design.
Discussions of new features can often explode into lengthy mailing list threads,
making the discussion hard to follow, and no one can read every posting to
python-dev. Therefore, a relatively formal process has been set up to write
Python Enhancement Proposals (PEPs), modelled on the Internet RFC process. PEPs
are draft documents that describe a proposed new feature, and are continually
revised until the community reaches a consensus, either accepting or rejecting
the proposal. Quoting from the introduction to PEP 1, “PEP Purpose and
Guidelines”:
PEP stands for Python Enhancement Proposal. A PEP is a design document
providing information to the Python community, or describing a new feature for
Python. The PEP should provide a concise technical specification of the feature
and a rationale for the feature.
We intend PEPs to be the primary mechanisms for proposing new features, for
collecting community input on an issue, and for documenting the design decisions
that have gone into Python. The PEP author is responsible for building
consensus within the community and documenting dissenting opinions.
Read the rest of PEP 1 for the details of the PEP editorial process, style, and
format. PEPs are kept in the Python CVS tree on SourceForge, though they’re not
part of the Python 2.0 distribution, and are also available in HTML form from
http://www.python.org/peps/. As of September 2000, there are 25 PEPs, ranging
from PEP 201, “Lockstep Iteration”, to PEP 225, “Elementwise/Objectwise
Operators”.
The largest new feature in Python 2.0 is a new fundamental data type: Unicode
strings. Unicode uses 16-bit numbers to represent characters instead of the
8-bit number used by ASCII, meaning that 65,536 distinct characters can be
supported.
The final interface for Unicode support was arrived at through countless often-
stormy discussions on the python-dev mailing list, and mostly implemented by
Marc-André Lemburg, based on a Unicode string type implementation by Fredrik
Lundh. A detailed explanation of the interface was written up as PEP 100,
“Python Unicode Integration”. This article will simply cover the most
significant points about the Unicode interfaces.
In Python source code, Unicode strings are written as u"string". Arbitrary
Unicode characters can be written using a new escape sequence, \uHHHH, where
HHHH is a 4-digit hexadecimal number from 0000 to FFFF. The existing
\xHHHH escape sequence can also be used, and octal escapes can be used for
characters up to U+01FF, which is represented by \777.
Unicode strings, just like regular strings, are an immutable sequence type.
They can be indexed and sliced, but not modified in place. Unicode strings have
an encode([encoding]) method that returns an 8-bit string in the desired
encoding. Encodings are named by strings, such as 'ascii', 'utf-8',
'iso-8859-1', or whatever. A codec API is defined for implementing and
registering new encodings that are then available throughout a Python program.
If an encoding isn’t specified, the default encoding is usually 7-bit ASCII,
though it can be changed for your Python installation by calling the
sys.setdefaultencoding(encoding) function in a customised version of
site.py.
Combining 8-bit and Unicode strings always coerces to Unicode, using the default
ASCII encoding; the result of 'a'+u'bc' is u'abc'.
New built-in functions have been added, and existing built-ins modified to
support Unicode:
unichr(ch) returns a Unicode string 1 character long, containing the
character ch.
ord(u), where u is a 1-character regular or Unicode string, returns the
number of the character as an integer.
unicode(string[,encoding][,errors]) creates a Unicode string
from an 8-bit string. encoding is a string naming the encoding to use. The
errors parameter specifies the treatment of characters that are invalid for
the current encoding; passing 'strict' as the value causes an exception to
be raised on any encoding error, while 'ignore' causes errors to be silently
ignored and 'replace' uses U+FFFD, the official replacement character, in
case of any problems.
The exec statement, and various built-ins such as eval(),
getattr(), and setattr() will also accept Unicode strings as well as
regular strings. (It’s possible that the process of fixing this missed some
built-ins; if you find a built-in function that accepts strings but doesn’t
accept Unicode strings at all, please report it as a bug.)
A new module, unicodedata, provides an interface to Unicode character
properties. For example, unicodedata.category(u'A') returns the 2-character
string ‘Lu’, the ‘L’ denoting it’s a letter, and ‘u’ meaning that it’s
uppercase. unicodedata.bidirectional(u'\u0660') returns ‘AN’, meaning that
U+0660 is an Arabic number.
The codecs module contains functions to look up existing encodings and
register new ones. Unless you want to implement a new encoding, you’ll most
often use the codecs.lookup(encoding) function, which returns a
4-element tuple: (encode_func, decode_func, stream_reader, stream_writer).
encode_func is a function that takes a Unicode string, and returns a 2-tuple
(string,length). string is an 8-bit string containing a portion (perhaps
all) of the Unicode string converted into the given encoding, and length tells
you how much of the Unicode string was converted.
decode_func is the opposite of encode_func, taking an 8-bit string and
returning a 2-tuple (ustring,length), consisting of the resulting Unicode
string ustring and the integer length telling how much of the 8-bit string
was consumed.
stream_reader is a class that supports decoding input from a stream.
stream_reader(file_obj) returns an object that supports the read(),
readline(), and readlines() methods. These methods will all
translate from the given encoding and return Unicode strings.
stream_writer, similarly, is a class that supports encoding output to a
stream. stream_writer(file_obj) returns an object that supports the
write() and writelines() methods. These methods expect Unicode
strings, translating them to the given encoding on output.
For example, the following code writes a Unicode string into a file, encoding
it as UTF-8:
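# A sketch built from the lookup() interface described above;
# the sample string and output filename are illustrative.
import codecs

unistr = u'\u0660\u2000ab ...'

(UTF8_encode, UTF8_decode,
 UTF8_streamreader, UTF8_streamwriter) = codecs.lookup('UTF-8')

output = UTF8_streamwriter(open('/tmp/output', 'wb'))
output.write(unistr)
output.close()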
Unicode-aware regular expressions are available through the re module,
which has a new underlying implementation called SRE written by Fredrik Lundh of
Secret Labs AB.
A -U command line option was added which causes the Python compiler to
interpret all string literals as Unicode string literals. This is intended to be
used in testing and future-proofing your Python code, since some future version
of Python may drop support for 8-bit strings and provide only Unicode strings.
Lists are a workhorse data type in Python, and many programs manipulate a list
at some point. Two common operations on lists are to loop over them, and either
pick out the elements that meet a certain criterion, or apply some function to
each element. For example, given a list of strings, you might want to pull out
all the strings containing a given substring, or strip off trailing whitespace
from each line.
The existing map() and filter() functions can be used for this
purpose, but they require a function as one of their arguments. This is fine if
there’s an existing built-in function that can be passed directly, but if there
isn’t, you have to create a little function to do the required work, and
Python’s scoping rules make the result ugly if the little function needs
additional information. Take the first example in the previous paragraph,
finding all the strings in the list containing a given substring. You could
write the following to do it:
# Given the list L, make a list of all strings
# containing the substring S.
sublist = filter(lambda s, substring=S:
                 string.find(s, substring) != -1,
                 L)
Because of Python’s scoping rules, a default argument is used so that the
anonymous function created by the lambda statement knows what
substring is being searched for. List comprehensions make this cleaner:
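sublist = [s for s in L if string.find(s, S) != -1]

List comprehensions have the form:

[ expression for expr in sequence1
             for expr2 in sequence2 ...
             for exprN in sequenceN
             if condition ]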
The for...in clauses contain the sequences to be
iterated over. The sequences do not have to be the same length, because they
are not iterated over in parallel, but from left to right; this is explained
more clearly in the following paragraphs. The elements of the generated list
will be the successive values of expression. The final if clause
is optional; if present, expression is only evaluated and added to the result
if condition is true.
To make the semantics very clear, a list comprehension is equivalent to the
following Python code:
for expr1 in sequence1:
    for expr2 in sequence2:
        ...
        for exprN in sequenceN:
            if (condition):
                # Append the value of
                # the expression to the
                # resulting list.
This means that when there are multiple for...in
clauses, the resulting list will be equal to the product of the lengths of all
the sequences. If you have two lists of length 3, the output list is 9 elements
long:
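>>> seq1 = 'abc'
>>> seq2 = (1, 2, 3)
>>> [(x, y) for x in seq1 for y in seq2]
[('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2), ('b', 3),
 ('c', 1), ('c', 2), ('c', 3)]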
To avoid introducing an ambiguity into Python’s grammar, if expression is
creating a tuple, it must be surrounded with parentheses. The first list
comprehension below is a syntax error, while the second one is correct:
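# Syntax error: would be ambiguous without the parentheses
[x, y for x in seq1 for y in seq2]

# Correct: the tuple expression is parenthesized
[(x, y) for x in seq1 for y in seq2]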
The idea of list comprehensions originally comes from the functional programming
language Haskell (http://www.haskell.org). Greg Ewing argued most effectively
for adding them to Python and wrote the initial list comprehension patch, which
was then discussed for a seemingly endless time on the python-dev mailing list
and kept up-to-date by Skip Montanaro.
Augmented assignment operators, another long-requested feature, have been added
to Python 2.0. Augmented assignment operators include +=, -=, *=,
and so forth. For example, the statement a+=2 increments the value of the
variable a by 2, equivalent to the slightly lengthier a=a+2.
The full list of supported assignment operators is +=, -=, *=,
/=, %=, **=, &=, |=, ^=, >>=, and <<=. Python
classes can override the augmented assignment operators by defining methods
named __iadd__(), __isub__(), etc. For example, the following
Number class stores a number and supports using += to create a new
instance with an incremented value.
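class Number:
    def __init__(self, value):
        self.value = value
    def __iadd__(self, increment):
        return Number(self.value + increment)

n = Number(5)
n += 3
print n.value      # prints 8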
The __iadd__() special method is called with the value of the increment,
and should return a new instance with an appropriately modified value; this
return value is bound as the new value of the variable on the left-hand side.
Augmented assignment operators were first introduced in the C programming
language, and most C-derived languages, such as awk, C++, Java, Perl,
and PHP also support them. The augmented assignment patch was implemented by
Thomas Wouters.
Until now string-manipulation functionality was in the string module,
which was usually a front-end for the strop module written in C. The
addition of Unicode posed a difficulty for the strop module, because the
functions would all need to be rewritten in order to accept either 8-bit or
Unicode strings. For functions such as string.replace(), which takes 3
string arguments, that means eight possible permutations, and correspondingly
complicated code.
Instead, Python 2.0 pushes the problem onto the string type, making string
manipulation functionality available through methods on both 8-bit strings and
Unicode strings.
One thing that hasn’t changed, a noteworthy April Fools’ joke notwithstanding,
is that Python strings are immutable. Thus, the string methods return new
strings, and do not modify the string on which they operate.
The old string module is still around for backwards compatibility, but it
mostly acts as a front-end to the new string methods.
Two methods which have no parallel in pre-2.0 versions, although they did exist
in JPython for quite some time, are startswith() and endswith().
s.startswith(t) is equivalent to s[:len(t)]==t, while
s.endswith(t) is equivalent to s[-len(t):]==t.
One other method which deserves special mention is join(). The
join() method of a string receives one parameter, a sequence of strings,
and is equivalent to the string.join() function from the old string
module, with the arguments reversed. In other words, s.join(seq) is
equivalent to the old string.join(seq,s).
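For example:

>>> s = ', '
>>> s.join(['red', 'green', 'blue'])
'red, green, blue'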
The C implementation of Python uses reference counting to implement garbage
collection. Every Python object maintains a count of the number of references
pointing to itself, and adjusts the count as references are created or
destroyed. Once the reference count reaches zero, the object is no longer
accessible, since you need to have a reference to an object to access it, and if
the count is zero, no references exist any longer.
Reference counting has some pleasant properties: it’s easy to understand and
implement, and the resulting implementation is portable, fairly fast, and reacts
well with other libraries that implement their own memory handling schemes. The
major problem with reference counting is that it sometimes doesn’t realise that
objects are no longer accessible, resulting in a memory leak. This happens when
there are cycles of references.
Consider the simplest possible cycle, a class instance which has a reference to
itself:
instance = SomeClass()
instance.myself = instance
After the above two lines of code have been executed, the reference count of
instance is 2; one reference is from the variable named 'instance', and
the other is from the myself attribute of the instance.
If the next line of code is del instance, what happens? The reference count
of instance is decreased by 1, so it has a reference count of 1; the
reference in the myself attribute still exists. Yet the instance is no
longer accessible through Python code, and it could be deleted. Several objects
can participate in a cycle if they have references to each other, causing all of
the objects to be leaked.
Python 2.0 fixes this problem by periodically executing a cycle detection
algorithm which looks for inaccessible cycles and deletes the objects involved.
A new gc module provides functions to perform a garbage collection,
obtain debugging statistics, and tune the collector’s parameters.
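A quick sketch of the interface:

import gc

print gc.collect()     # force a collection pass; returns the number
                       # of unreachable objects found
print gc.garbage       # uncollectable objects end up here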
Running the cycle detection algorithm takes some time, and therefore will result
in some additional overhead. It is hoped that after we’ve gotten experience
with the cycle collection from using 2.0, Python 2.1 will be able to minimize
the overhead with careful tuning. It’s not yet obvious how much performance is
lost, because benchmarking this is tricky and depends crucially on how often the
program creates and destroys objects. The detection of cycles can be disabled
when Python is compiled, if you can’t afford even a tiny speed penalty or
suspect that the cycle collection is buggy, by specifying the
--without-cycle-gc switch when running the configure
script.
Several people tackled this problem and contributed to a solution. An early
implementation of the cycle detection approach was written by Toby Kelsey. The
current algorithm was suggested by Eric Tiedemann during a visit to CNRI, and
Guido van Rossum and Neil Schemenauer wrote two different implementations, which
were later integrated by Neil. Lots of other people offered suggestions along
the way; the March 2000 archives of the python-dev mailing list contain most of
the relevant discussion, especially in the threads titled “Reference cycle
collection for Python” and “Finalization again”.
Various minor changes have been made to Python’s syntax and built-in functions.
None of the changes are very far-reaching, but they’re handy conveniences.
A new syntax makes it more convenient to call a given function with a tuple of
arguments and/or a dictionary of keyword arguments. In Python 1.5 and earlier,
you’d use the apply() built-in function: apply(f,args,kw) calls the
function f() with the argument tuple args and the keyword arguments in
the dictionary kw. apply() is the same in 2.0, but thanks to a patch
from Greg Ewing, f(*args, **kw) is now a shorter and clearer way to achieve
the same effect. This syntax is symmetrical with the syntax for defining
functions:
def f(*args, **kw):
    # args is a tuple of positional args,
    # kw is a dictionary of keyword args
    ...
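A minimal sketch of that symmetry, assuming a 2.0 interpreter where apply()
is still available:

def f(*args, **kw):
    return args, kw

args = (1, 2)
kw = {'x': 3}
assert f(*args, **kw) == apply(f, args, kw)   # both give ((1, 2), {'x': 3})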
The print statement can now have its output directed to a file-like
object by following the print with >>file, similar to the
redirection operator in Unix shells. Previously you’d either have to use the
write() method of the file-like object, which lacks the convenience and
simplicity of print, or you could assign a new value to
sys.stdout and then restore the old value. For sending output to standard
error, it’s much easier to write this:
print >> sys.stderr, "Warning: action field not supplied"
Modules can now be renamed on importing them, using the syntax import module
as name or from module import name as othername. The patch was submitted
by Thomas Wouters.
A new format style is available when using the % operator; ‘%r’ will insert
the repr() of its argument. This was added for symmetry with the existing
‘%s’ format style, which inserts the str() of its argument. For example,
'%r %s' % ('abc', 'abc') returns a string containing 'abc' abc.
Previously there was no way to implement a class that overrode Python’s built-in
in operator and implemented a custom version. obj in seq returns
true if obj is present in the sequence seq; Python computes this by simply
trying every index of the sequence until either obj is found or an
IndexError is encountered. Moshe Zadka contributed a patch which adds a
__contains__() magic method for providing a custom implementation for
in. Additionally, new built-in objects written in C can define what
in means for them via a new slot in the sequence protocol.
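A minimal sketch of a class supplying its own in behaviour (the class is
illustrative):

class EvenNumbers:
    def __contains__(self, obj):
        # membership is true for even integers
        return obj % 2 == 0

evens = EvenNumbers()
assert 4 in evens
assert not (5 in evens)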
Earlier versions of Python used a recursive algorithm for deleting objects.
Deeply nested data structures could cause the interpreter to fill up the C stack
and crash; Christian Tismer rewrote the deletion logic to fix this problem. On
a related note, comparing recursive objects recursed infinitely and crashed;
Jeremy Hylton rewrote the code to no longer crash, producing a useful result
instead. For example, after this code:
a = []
b = []
a.append(a)
b.append(b)
the comparison a == b returns true, because the two recursive data structures
are isomorphic. See the thread “trashcan and PR#7” in the April 2000 archives of
the python-dev mailing list for the discussion leading up to this
implementation, and some useful relevant links. Note that comparisons can now
also raise exceptions. In earlier versions of Python, a comparison operation
such as cmp(a,b) would always produce an answer, even if a user-defined
__cmp__() method encountered an error, since the resulting exception would
simply be silently swallowed.
Work has been done on porting Python to 64-bit Windows on the Itanium processor,
mostly by Trent Mick of ActiveState. (Confusingly, sys.platform is still
'win32' on Win64 because it seems that for ease of porting, MS Visual C++
treats code as 32 bit on Itanium.) PythonWin also supports Windows CE; see the
Python CE page at http://pythonce.sourceforge.net/ for more information.
Another new platform is Darwin/MacOS X; initial support for it is in Python 2.0.
Dynamic loading works, if you specify “configure --with-dyld --with-suffix=.x”.
Consult the README in the Python source distribution for more instructions.
An attempt has been made to alleviate one of Python’s warts, the often-confusing
NameError exception when code refers to a local variable before the
variable has been assigned a value. For example, the following code raises an
exception on the print statement in both 1.5.2 and 2.0; in 1.5.2 a
NameError exception is raised, while 2.0 raises a new
UnboundLocalError exception. UnboundLocalError is a subclass of
NameError, so any existing code that expects NameError to be
raised should still work.
def f():
    print "i=", i
    i = i + 1
f()
Two new exceptions, TabError and IndentationError, have been
introduced. They’re both subclasses of SyntaxError, and are raised when
Python code is found to be improperly indented.
A new built-in, zip(seq1, seq2, ...), has been added. zip()
returns a list of tuples where each tuple contains the i-th element from each of
the argument sequences. The difference between zip() and map(None, seq1,
seq2) is that map() pads the sequences with None if the
sequences aren’t all of the same length, while zip() truncates the
returned list to the length of the shortest argument sequence.
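For example, assuming 2.0 semantics, where both calls return lists:

assert zip([1, 2, 3], 'ab') == [(1, 'a'), (2, 'b')]                   # truncates
assert map(None, [1, 2, 3], 'ab') == [(1, 'a'), (2, 'b'), (3, None)]  # pads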
The int() and long() functions now accept an optional “base”
parameter when the first argument is a string. int('123',10) returns 123,
while int('123',16) returns 291. int(123,16) raises a
TypeError exception with the message “can’t convert non-string with
explicit base”.
A new variable holding more detailed version information has been added to the
sys module. sys.version_info is a tuple (major, minor, micro, level,
serial). For example, in a hypothetical 2.0.1beta1, sys.version_info
would be (2, 0, 1, 'beta', 1). level is a string such as "alpha",
"beta", or "final" for a final release.
Dictionaries have an odd new method, setdefault(key, default), which
behaves similarly to the existing get() method. However, if the key is
missing, setdefault() both returns the value of default as get()
would do, and also inserts it into the dictionary as the value for key. Thus,
the following lines of code:

if dict.has_key(key):
    return dict[key]
else:
    dict[key] = []
    return dict[key]

can be reduced to a single return dict.setdefault(key, []) statement.
The interpreter sets a maximum recursion depth in order to catch runaway
recursion before filling the C stack and causing a core dump or GPF.
Previously this limit was fixed when you compiled Python, but in 2.0 the maximum
recursion depth can be read and modified using sys.getrecursionlimit() and
sys.setrecursionlimit(). The default value is 1000, and a rough maximum
value for a given platform can be found by running a new script,
Misc/find_recursionlimit.py.
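A minimal sketch of reading and adjusting the limit:

import sys

print(sys.getrecursionlimit())   # 1000 by default
sys.setrecursionlimit(2000)      # permit deeper recursion, at the risk of
                                 # overflowing the C stack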
New Python releases try hard to be compatible with previous releases, and the
record has been pretty good. However, some changes are considered useful
enough, usually because they fix initial design decisions that turned out to be
actively mistaken, that breaking backward compatibility can’t always be avoided.
This section lists the changes in Python 2.0 that may cause old Python code to
break.
The change which will probably break the most code is tightening up the
arguments accepted by some methods. Some methods would take multiple arguments
and treat them as a tuple, particularly various list methods such as
append() and insert(). In earlier versions of Python, if L is
a list, L.append(1,2) appends the tuple (1,2) to the list. In Python
2.0 this causes a TypeError exception to be raised, with the message:
‘append requires exactly 1 argument; 2 given’. The fix is to simply add an
extra set of parentheses to pass both values as a tuple: L.append((1,2)).
The earlier versions of these methods were more forgiving because they used an
old function in Python’s C interface to parse their arguments; 2.0 modernizes
them to use PyArg_ParseTuple(), the current argument parsing function,
which provides more helpful error messages and treats multi-argument calls as
errors. If you absolutely must use 2.0 but can’t fix your code, you can edit
Objects/listobject.c and define the preprocessor symbol
NO_STRICT_LIST_APPEND to preserve the old behaviour; this isn’t recommended.
Some of the functions in the socket module are still forgiving in this
way. For example, socket.connect(('hostname', 25)) is the correct
form, passing a tuple representing an IP address, but socket.connect('hostname', 25) also works. socket.connect_ex() and socket.bind()
are similarly easy-going. 2.0alpha1 tightened these functions up, but because
the documentation actually used the erroneous multiple argument form, many
people wrote code which would break with the stricter checking. GvR backed out
the changes in the face of public reaction, so for the socket module, the
documentation was fixed and the multiple argument form is simply marked as
deprecated; it will be tightened up again in a future Python version.
The \x escape in string literals now takes exactly 2 hex digits. Previously
it would consume all the hex digits following the ‘x’ and take the lowest 8 bits
of the result, so \x123456 was equivalent to \x56.
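For example:

assert '\x41' == 'A'                   # exactly two hex digits are consumed
assert '\x123456' == '\x12' + '3456'   # the remaining digits are literal text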
The AttributeError and NameError exceptions have a more friendly
error message, whose text will be something like 'Spam' instance has no
attribute 'eggs' or name 'eggs' is not defined. Previously the error
message was just the missing attribute name eggs, and code written to take
advantage of this fact will break in 2.0.
Some work has been done to make integers and long integers a bit more
interchangeable. In 1.5.2, large-file support was added for Solaris, to allow
reading files larger than 2 GiB; this made the tell() method of file
objects return a long integer instead of a regular integer. Some code would
subtract two file offsets and attempt to use the result to multiply a sequence
or slice a string, but this raised a TypeError. In 2.0, long integers
can be used to multiply or slice a sequence, and it’ll behave as you’d
intuitively expect it to; 3L*'abc' produces ‘abcabcabc’, and
(0,1,2,3)[2L:4L] produces (2,3). Long integers can also be used in various
contexts where previously only integers were accepted, such as in the
seek() method of file objects, and in the formats supported by the %
operator (%d, %i, %x, etc.). For example, "%d" % 2L**64 will
produce the string 18446744073709551616.
The subtlest long integer change of all is that the str() of a long
integer no longer has a trailing ‘L’ character, though repr() still
includes it. The ‘L’ annoyed many people who wanted to print long integers that
looked just like regular integers, since they had to go out of their way to chop
off the character. This is no longer a problem in 2.0, but code which does
str(longval)[:-1] and assumes the ‘L’ is there, will now lose the final
digit.
Taking the repr() of a float now uses a different formatting precision
than str(). repr() uses the %.17g format string for C’s
sprintf(), while str() uses %.12g as before. The effect is that
repr() may occasionally show more decimal places than str(), for
certain numbers. For example, the number 8.1 can’t be represented exactly in
binary, so repr(8.1) is '8.0999999999999996', while str(8.1) is
'8.1'.
The -X command-line option, which turned all standard exceptions into
strings instead of classes, has been removed; the standard exceptions will now
always be classes. The exceptions module containing the standard
exceptions was translated from Python to a built-in C module, written by Barry
Warsaw and Fredrik Lundh.
Some of the changes are under the covers, and will only be apparent to people
writing C extension modules or embedding a Python interpreter in a larger
application. If you aren’t dealing with Python’s C API, you can safely skip
this section.
The version number of the Python C API was incremented, so C extensions compiled
for 1.5.2 must be recompiled in order to work with 2.0. On Windows, it’s not
possible for Python 2.0 to import a third party extension built for Python 1.5.x
due to how Windows DLLs work, so Python will raise an exception and the import
will fail.
Users of Jim Fulton’s ExtensionClass module will be pleased to find out that
hooks have been added so that ExtensionClasses are now supported by
isinstance() and issubclass(). This means you no longer have to
remember to write code such as if type(obj) == myExtensionClass, but can use
the more natural if isinstance(obj, myExtensionClass).
The Python/importdl.c file, which was a mass of #ifdefs to support
dynamic loading on many different platforms, was cleaned up and reorganised by
Greg Stein. importdl.c is now quite small, and platform-specific code
has been moved into a bunch of Python/dynload_*.c files. Another
cleanup: there were also a number of my*.h files in the Include/
directory that held various portability hacks; they’ve been merged into a single
file, Include/pyport.h.
Vladimir Marangozov’s long-awaited malloc restructuring was completed, to make
it easy to have the Python interpreter use a custom allocator instead of C’s
standard malloc(). For documentation, read the comments in
Include/pymem.h and Include/objimpl.h. For the lengthy
discussions during which the interface was hammered out, see the Web archives of
the ‘patches’ and ‘python-dev’ lists at python.org.
Recent versions of the GUSI development environment for MacOS support POSIX
threads. Therefore, Python’s POSIX threading support now works on the
Macintosh. Threading support using the user-space GNU pth library was also
contributed.
Threading support on Windows was enhanced, too. Windows supports thread locks
that use kernel objects only in case of contention; in the common case when
there’s no contention, they use simpler functions which are an order of
magnitude faster. A threaded version of Python 1.5.2 on NT is twice as slow as
an unthreaded version; with the 2.0 changes, the difference is only 10%. These
improvements were contributed by Yakov Markovitch.
Python 2.0’s source now uses only ANSI C prototypes, so compiling Python now
requires an ANSI C compiler, and can no longer be done using a compiler that
only supports K&R C.
Previously the Python virtual machine used 16-bit numbers in its bytecode,
limiting the size of source files. In particular, this affected the maximum
size of literal lists and dictionaries in Python source; occasionally people who
are generating Python code would run into this limit. A patch by Charles G.
Waldman raises the limit from 2**16 to 2**32.
Three new convenience functions intended for adding constants to a module’s
dictionary at module initialization time were added: PyModule_AddObject(),
PyModule_AddIntConstant(), and PyModule_AddStringConstant(). Each
of these functions takes a module object, a null-terminated C string containing
the name to be added, and a third argument for the value to be assigned to the
name. This third argument is, respectively, a Python object, a C long, or a C
string.
A wrapper API was added for Unix-style signal handlers. PyOS_getsig() gets
a signal handler and PyOS_setsig() will set a new handler.
Before Python 2.0, installing modules was a tedious affair – there was no way
to figure out automatically where Python is installed, or what compiler options
to use for extension modules. Software authors had to go through an arduous
ritual of editing Makefiles and configuration files, which only really work on
Unix and leave Windows and MacOS unsupported. Python users faced wildly
differing installation instructions which varied between different extension
packages, which made administering a Python installation something of a chore.
The SIG for distribution utilities, shepherded by Greg Ward, has created the
Distutils, a system to make package installation much easier. They form the
distutils package, a new part of Python’s standard library. In the best
case, installing a Python module from source will require the same steps: first
you simply unpack the tarball or zip archive, and then run “python setup.py
install”. The platform will be automatically detected, the compiler
will be recognized, C extension modules will be compiled, and the distribution
installed into the proper directory. Optional command-line arguments provide
more control over the installation process, and the distutils package offers many
places to override defaults – separating the build from the install, building
or installing in non-default directories, and more.
In order to use the Distutils, you need to write a setup.py script. For
the simple case, when the software contains only .py files, a minimal
setup.py can be just a few lines long:

from distutils.core import setup
setup(name = "foo", version = "1.0",
      py_modules = ["module1", "module2"])
The Distutils can also take care of creating source and binary distributions.
The “sdist” command, run by “python setup.py sdist”, builds a source
distribution such as foo-1.0.tar.gz. Adding new commands isn’t
difficult; “bdist_rpm” and “bdist_wininst” commands have already been
contributed to create an RPM distribution and a Windows installer for the
software, respectively. Commands to create other distribution formats such as
Debian packages and Solaris .pkg files are in various stages of
development.
All this is documented in a new manual, Distributing Python Modules, that
joins the basic set of Python documentation.
Python 1.5.2 included a simple XML parser in the form of the xmllib
module, contributed by Sjoerd Mullender. Since 1.5.2’s release, two different
interfaces for processing XML have become common: SAX2 (version 2 of the Simple
API for XML) provides an event-driven interface with some similarities to
xmllib, and the DOM (Document Object Model) provides a tree-based
interface, transforming an XML document into a tree of nodes that can be
traversed and modified. Python 2.0 includes a SAX2 interface and a stripped-
down DOM interface as part of the xml package. Here we will give a brief
overview of these new interfaces; consult the Python documentation or the source
code for complete details. The Python XML SIG is also working on improved
documentation.
SAX defines an event-driven interface for parsing XML. To use SAX, you must
write a SAX handler class. Handler classes inherit from various classes
provided by SAX, and override various methods that will then be called by the
XML parser. For example, the startElement() and endElement()
methods are called for every starting and end tag encountered by the parser, the
characters() method is called for every chunk of character data, and so
forth.
The advantage of the event-driven approach is that the whole document doesn’t
have to be resident in memory at any one time, which matters if you are
processing really huge documents. However, writing the SAX handler class can
get very complicated if you’re trying to modify the document structure in some
elaborate way.
For example, this little example program defines a handler that prints a message
for every starting and ending tag, and then parses the file hamlet.xml
using it:
from xml import sax

class SimpleHandler(sax.ContentHandler):
    def startElement(self, name, attrs):
        print 'Start of element:', name, attrs.keys()

    def endElement(self, name):
        print 'End of element:', name

# Create a parser object
parser = sax.make_parser()

# Tell it what handler to use
handler = SimpleHandler()
parser.setContentHandler(handler)

# Parse a file!
parser.parse('hamlet.xml')
The Document Object Model is a tree-based representation for an XML document. A
top-level Document instance is the root of the tree, and has a single
child which is the top-level Element instance. This Element
has children nodes representing character data and any sub-elements, which may
have further children of their own, and so forth. Using the DOM you can
traverse the resulting tree any way you like, access element and attribute
values, insert and delete nodes, and convert the tree back into XML.
The DOM is useful for modifying XML documents, because you can create a DOM
tree, modify it by adding new nodes or rearranging subtrees, and then produce a
new XML document as output. You can also construct a DOM tree manually and
convert it to XML, which can be a more flexible way of producing XML output than
simply writing <tag1>...</tag1> to a file.
The DOM implementation included with Python lives in the xml.dom.minidom
module. It’s a lightweight implementation of the Level 1 DOM with support for
XML namespaces. The parse() and parseString() convenience
functions are provided for generating a DOM tree:

from xml.dom import minidom
doc = minidom.parse('hamlet.xml')
doc is a Document instance. Document, like all the other
DOM classes such as Element and Text, is a subclass of the
Node base class. All the nodes in a DOM tree therefore support certain
common methods, such as toxml() which returns a string containing the XML
representation of the node and its children. Each class also has special
methods of its own; for example, Element and Document
instances have a method to find all child elements with a given tag name.
Continuing from the previous 2-line example:

perslist = doc.getElementsByTagName('PERSONA')
print perslist[0].toxml()
print perslist[1].toxml()
The root element of the document is available as doc.documentElement, and
its children can be easily modified by deleting, adding, or removing nodes:
root = doc.documentElement

# Remove the first child
root.removeChild(root.childNodes[0])

# Move the new first child to the end
root.appendChild(root.childNodes[0])

# Insert the new first child (originally,
# the third child) before the 20th child.
root.insertBefore(root.childNodes[0], root.childNodes[20])
Again, I will refer you to the Python documentation for a complete listing of
the different Node classes and their various methods.
The XML Special Interest Group has been working on XML-related Python code for a
while. Its code distribution, called PyXML, is available from the SIG’s Web
pages at http://www.python.org/sigs/xml-sig/. The PyXML distribution also used
the package name xml. If you’ve written programs that used PyXML, you’re
probably wondering about its compatibility with the 2.0 xml package.
The answer is that Python 2.0’s xml package isn’t compatible with PyXML,
but can be made compatible by installing a recent version of PyXML. Many
applications can get by with the XML support that is included with Python 2.0,
but more complicated applications will require the full PyXML package to be
installed. When installed, PyXML versions 0.6.0 or greater will replace the
xml package shipped with Python, and will be a strict superset of the
standard package, adding a bunch of additional features. Some of the additional
features in PyXML include:
4DOM, a full DOM implementation from FourThought, Inc.
The xmlproc validating parser, written by Lars Marius Garshol.
The sgmlop parser accelerator module, written by Fredrik Lundh.
Lots of improvements and bugfixes were made to Python’s extensive standard
library; some of the affected modules include readline,
ConfigParser, cgi, calendar, posix,
xmllib, aifc, chunk, wave, random, shelve,
and nntplib. Consult the CVS logs for the exact patch-by-patch details.
Brian Gallew contributed OpenSSL support for the socket module. OpenSSL
is an implementation of the Secure Socket Layer, which encrypts the data being
sent over a socket. When compiling Python, you can edit Modules/Setup
to include SSL support, which adds an additional function to the socket
module: socket.ssl(socket, keyfile, certfile), which takes a socket
object and returns an SSL socket. The httplib and urllib modules
were also changed to support https:// URLs, though no one has implemented
FTP or SMTP over SSL.
The httplib module has been rewritten by Greg Stein to support HTTP/1.1.
Backward compatibility with the 1.5 version of httplib is provided,
though using HTTP/1.1 features such as pipelining will require rewriting code to
use a different set of interfaces.
The Tkinter module now supports Tcl/Tk version 8.1, 8.2, or 8.3, and
support for the older 7.x versions has been dropped. The Tkinter module now
supports displaying Unicode strings in Tk widgets. Also, Fredrik Lundh
contributed an optimization which makes operations like create_line and
create_polygon much faster, especially when using lots of coordinates.
The curses module has been greatly extended, starting from Oliver
Andrich’s enhanced version, to provide many additional functions from ncurses
and SYSV curses, such as colour, alternative character set support, pads, and
mouse support. This means the module is no longer compatible with operating
systems that only have BSD curses, but there don’t seem to be any currently
maintained OSes that fall into this category.
As mentioned in the earlier discussion of 2.0’s Unicode support, the underlying
implementation of the regular expressions provided by the re module has
been changed. SRE, a new regular expression engine written by Fredrik Lundh and
partially funded by Hewlett Packard, supports matching against both 8-bit
strings and Unicode strings.
A number of new modules were added. We’ll simply list them with brief
descriptions; consult the 2.0 documentation for the details of a particular
module.
atexit: For registering functions to be called before the Python
interpreter exits. Code that currently sets sys.exitfunc directly should be
changed to use the atexit module instead, importing atexit and
calling atexit.register() with the function to be called on exit.
(Contributed by Skip Montanaro.)
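A minimal sketch (the handler name is illustrative):

import atexit

def goodbye():
    print('Exiting; any cleanup can go here.')

atexit.register(goodbye)   # goodbye() will be called when the interpreter exits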
codecs, encodings, unicodedata: Added as part of the new
Unicode support.
filecmp: Supersedes the old cmp, cmpcache and
dircmp modules, which have now become deprecated. (Contributed by Gordon
MacMillan and Moshe Zadka.)
gettext: This module provides internationalization (I18N) and
localization (L10N) support for Python programs by providing an interface to the
GNU gettext message catalog library. (Integrated by Barry Warsaw, from separate
contributions by Martin von Löwis, Peter Funk, and James Henstridge.)
linuxaudiodev: Support for the /dev/audio device on Linux, a
twin to the existing sunaudiodev module. (Contributed by Peter Bosch,
with fixes by Jeremy Hylton.)
mmap: An interface to memory-mapped files on both Windows and Unix. A
file’s contents can be mapped directly into memory, at which point it behaves
like a mutable string, so its contents can be read and modified. They can even
be passed to functions that expect ordinary strings, such as the re
module. (Contributed by Sam Rushing, with some extensions by A.M. Kuchling.)
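A minimal sketch for Unix (the filename is illustrative, and the file must
already exist):

import mmap, os

f = open('mydata.txt', 'r+b')
m = mmap.mmap(f.fileno(), os.path.getsize('mydata.txt'))
print(m[0:5])    # slice the mapped region like a string
m.close()
f.close()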
pyexpat: An interface to the Expat XML parser. (Contributed by Paul
Prescod.)
robotparser: Parse a robots.txt file, which is used for writing
Web spiders that politely avoid certain areas of a Web site. The parser accepts
the contents of a robots.txt file, builds a set of rules from it, and
can then answer questions about the fetchability of a given URL. (Contributed
by Skip Montanaro.)
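A minimal sketch (the URLs are illustrative):

import robotparser

rp = robotparser.RobotFileParser()
rp.set_url('http://www.example.com/robots.txt')
rp.read()    # fetch and parse the rules
print(rp.can_fetch('MySpider', 'http://www.example.com/private/'))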
tabnanny: A module/script to check Python source code for ambiguous
indentation. (Contributed by Tim Peters.)
UserString: A base class useful for deriving objects that behave like
strings.
webbrowser: A module that provides a platform independent way to launch
a web browser on a specific URL. For each platform, various browsers are tried
in a specific order. The user can alter which browser is launched by setting the
BROWSER environment variable. (Originally inspired by Eric S. Raymond’s patch
to urllib which added similar functionality, but the final module comes
from code originally implemented by Fred Drake as
Tools/idle/BrowserControl.py, and adapted for the standard library by
Fred.)
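A one-line sketch:

import webbrowser
webbrowser.open('http://www.python.org')   # launch the user's preferred browser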
_winreg: An interface to the Windows registry. _winreg is an
adaptation of functions that have been part of PythonWin since 1995, but has now
been added to the core distribution, and enhanced to support Unicode.
_winreg was written by Bill Tutt and Mark Hammond.
zipfile: A module for reading and writing ZIP-format archives. These
are archives produced by PKZIP on DOS/Windows or zip on
Unix, not to be confused with gzip-format files (which are
supported by the gzip module). (Contributed by James C. Ahlstrom.)
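A minimal sketch of reading an archive (the names are illustrative):

import zipfile

z = zipfile.ZipFile('myarchive.zip', 'r')
print(z.namelist())           # the archived member names
data = z.read('mydata.txt')   # extract one member as a string
z.close()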
imputil: A module that provides a simpler way for writing customised
import hooks, in comparison to the existing ihooks module. (Implemented
by Greg Stein, with much discussion on python-dev along the way.)
IDLE is the official Python cross-platform IDE, written using Tkinter. Python
2.0 includes IDLE 0.6, which adds a number of new features and improvements. A
partial list:
UI improvements and optimizations, especially in the area of syntax
highlighting and auto-indentation.
The class browser now shows more information, such as the top level functions
in a module.
Tab width is now a user settable option. When opening an existing Python file,
IDLE automatically detects the indentation conventions, and adapts.
There is now support for calling browsers on various platforms, used to open
the Python documentation in a browser.
IDLE now has a command line, which is largely similar to the vanilla Python
interpreter.
Call tips were added in many places.
IDLE can now be installed as a package.
In the editor window, there is now a line/column bar at the bottom.
Three new keystroke commands: Check module (Alt-F5), Import module (F5) and
Run script (Ctrl-F5).
A few modules have been dropped because they’re obsolete, or because there are
now better ways to do the same thing. The stdwin module is gone; it was
for a platform-independent windowing toolkit that’s no longer developed.
A number of modules have been moved to the lib-old subdirectory:
cmp, cmpcache, dircmp, dump, find,
grep, packmail, poly, util, whatsound,
zmod. If you have code which relies on a module that’s been moved to
lib-old, you can simply add that directory to sys.path to get them
back, but you’re encouraged to update any code that uses these modules.
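For example (the installation path is illustrative):

import sys
sys.path.append('/usr/local/lib/python2.0/lib-old')
import dircmp   # one of the relocated modules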
The authors would like to thank the following people for offering suggestions on
various drafts of this article: David Bolen, Mark Hammond, Gregg Hauser, Jeremy
Hylton, Fredrik Lundh, Detlef Lannert, Aahz Maruch, Skip Montanaro, Vladimir
Marangozov, Tobias Polzin, Guido van Rossum, Neil Schemenauer, and Russ Schmidt.
$ python3.2
Python 3.2 (py3k, Sep 12 2011, 12:21:02)
[GCC 3.4.6 20060404 (Red Hat 3.4.6-8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
Continuation prompts are needed when entering a multi-line construct, for example, this if statement:
>>> the_world_is_flat = 1
>>> if the_world_is_flat:
...     print("Be careful not to fall off!")
...
Be careful not to fall off!
>>> 'spam eggs'
'spam eggs'
>>> 'doesn\'t'
"doesn't"
>>> "doesn't"
"doesn't"
>>> '"Yes," he said.'
'"Yes," he said.'
>>> "\"Yes,\" he said."
'"Yes," he said.'
>>> '"Isn\'t," she said.'
'"Isn\'t," she said.'
>>> word[0] = 'x'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: 'str' object does not support item assignment
>>> word[:1] = 'Splat'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: 'str' object does not support slice assignment
>>> x = int(input("Please enter an integer: "))
Please enter an integer: 42
>>> if x < 0:
...     x = 0
...     print('Negative changed to zero')
... elif x == 0:
...     print('Zero')
... elif x == 1:
...     print('Single')
... else:
...     print('More')
...
More
The break statement, like in C, breaks out of the smallest enclosing for or
while loop.
Loop statements may have an else clause; it is executed when the loop
terminates through exhaustion of the list (with for) or when the condition
becomes false (with while), but not when the loop is terminated by a
break statement. The following prime-searching example demonstrates this:
>>> for n in range(2, 10):
...     for x in range(2, n):
...         if n % x == 0:
...             print(n, 'equals', x, '*', n//x)
...             break
...     else:
...         # loop fell through without finding a factor
...         print(n, 'is a prime number')
...
2 is a prime number
3 is a prime number
4 equals 2 * 2
5 is a prime number
6 equals 2 * 3
7 is a prime number
8 equals 2 * 4
9 equals 3 * 3
>>> for num in range(2, 10):
...     if num % 2 == 0:
...         print("Found an even number", num)
...         continue
...     print("Found a number", num)
Found an even number 2
Found a number 3
Found an even number 4
Found a number 5
Found an even number 6
Found a number 7
Found an even number 8
Found a number 9
def ask_ok(prompt, retries=4, complaint='Yes or no, please!'):
    while True:
        ok = input(prompt)
        if ok in ('y', 'ye', 'yes'):
            return True
        if ok in ('n', 'no', 'nop', 'nope'):
            return False
        retries = retries - 1
        if retries < 0:
            raise IOError('refusenik user')
        print(complaint)
def parrot(voltage, state='a stiff', action='voom', type='Norwegian Blue'):
    print("-- This parrot wouldn't", action, end=' ')
    print("if you put", voltage, "volts through it.")
    print("-- Lovely plumage, the", type)
    print("-- It's", state, "!")
It can be called in any of the following ways:
parrot(1000)
parrot(action='VOOOOOM', voltage=1000000)
parrot('a thousand', state='pushing up the daisies')
parrot('a million', 'bereft of life', 'jump')
def cheeseshop(kind, *arguments, **keywords):
    print("-- Do you have any", kind, "?")
    print("-- I'm sorry, we're all out of", kind)
    for arg in arguments:
        print(arg)
    print("-" * 40)
    keys = sorted(keywords.keys())
    for kw in keys:
        print(kw, ":", keywords[kw])
It could be called like this:
cheeseshop("Limburger","It's very runny, sir.","It's really very, VERY runny, sir.",shopkeeper="Michael Palin",client="John Cleese",sketch="Cheese Shop Sketch")
and of course it would print:
-- Do you have any Limburger ?
-- I'm sorry, we're all out of Limburger
It's very runny, sir.
It's really very, VERY runny, sir.
----------------------------------------
client : John Cleese
shopkeeper : Michael Palin
sketch : Cheese Shop Sketch
>>> def parrot(voltage, state='a stiff', action='voom'):
...     print("-- This parrot wouldn't", action, end=' ')
...     print("if you put", voltage, "volts through it.", end=' ')
...     print("E's", state, "!")
...
>>> d = {"voltage": "four million", "state": "bleedin' demised", "action": "VOOM"}
>>> parrot(**d)
-- This parrot wouldn't VOOM if you put four million volts through it. E's bleedin' demised !
>>> def my_function():
...     """Do nothing, but document it.
...
...     No, really, it doesn't do anything.
...     """
...     pass
...
>>> print(my_function.__doc__)
Do nothing, but document it.

    No, really, it doesn't do anything.
Function annotations are completely optional, arbitrary metadata information about user-defined functions. Neither Python itself nor the standard library use function annotations in any way; this section just shows the syntax. Third-party projects are free to use function annotations for documentation, type checking, and other uses.
Annotations are stored in the __annotations__ attribute of the function as a dictionary and have no effect on any other part of the function. Parameter annotations are defined by a colon after the parameter name, followed by an expression evaluating to the value of the annotation. Return annotations are defined by a literal ->, followed by an expression, between the parameter list and the colon denoting the end of the def statement. The following example has a positional argument, a keyword argument, and the return value annotated with nonsense:
>>> def f(ham: 42, eggs: int = 'spam') -> "Nothing to see here":
...     print("Annotations:", f.__annotations__)
...     print("Arguments:", ham, eggs)
...
>>> f('wonderful')
Annotations: {'eggs': <class 'int'>, 'return': 'Nothing to see here', 'ham': 42}
Arguments: wonderful spam
>>> [x, x**2 for x in vec]  # error - parens required for tuples
  File "<stdin>", line 1, in ?
    [x, x**2 for x in vec]
             ^
SyntaxError: invalid syntax
>>> [(x, x**2) for x in vec]
[(2, 4), (4, 16), (6, 36)]
>>> questions = ['name', 'quest', 'favorite color']
>>> answers = ['lancelot', 'the holy grail', 'blue']
>>> for q, a in zip(questions, answers):
...     print('What is your {0}? It is {1}.'.format(q, a))
...
What is your name? It is lancelot.
What is your quest? It is the holy grail.
What is your favorite color? It is blue.
>>> s = 'Hello, world.'
>>> str(s)
'Hello, world.'
>>> repr(s)
"'Hello, world.'"
>>> str(1.0/7.0)
'0.142857142857'
>>> repr(1.0/7.0)
'0.14285714285714285'
>>> x = 10 * 3.25
>>> y = 200 * 200
>>> s = 'The value of x is ' + repr(x) + ', and y is ' + repr(y) + '...'
>>> print(s)
The value of x is 32.5, and y is 40000...
>>> # The repr() of a string adds string quotes and backslashes:
... hello = 'hello, world\n'
>>> hellos = repr(hello)
>>> print(hellos)
'hello, world\n'
>>> # The argument to repr() may be any Python object:
... repr((x, y, ('spam', 'eggs')))
"(32.5, 40000, ('spam', 'eggs'))"
>>> import math
>>> print('The value of PI is approximately {}.'.format(math.pi))
The value of PI is approximately 3.14159265359.
>>> print('The value of PI is approximately {!r}.'.format(math.pi))
The value of PI is approximately 3.141592653589793.
An optional ':' and format specifier can follow the field name. This allows
greater control over how the value is formatted. The following example rounds
Pi to three places after the decimal point.
>>> import math
>>> print('The value of PI is approximately {0:.3f}.'.format(math.pi))
The value of PI is approximately 3.142.
>>> f = open('/tmp/workfile', 'rb+')
>>> f.write(b'0123456789abcdef')
16
>>> f.seek(5)      # Go to the 6th byte in the file
5
>>> f.read(1)
b'5'
>>> f.seek(-3, 2)  # Go to the 3rd byte before the end
13
>>> f.read(1)
b'd'
In text files (those opened without a b in the mode string), only seeks
relative to the beginning of the file are allowed (the exception being a seek
to the very end of the file with seek(0, 2)).
>>> 10 * (1/0)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ZeroDivisionError: int division or modulo by zero
>>> 4 + spam*3
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: name 'spam' is not defined
>>> '2' + 2
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: Can't convert 'int' object to str implicitly
>>> while True:
...     try:
...         x = int(input("Please enter a number: "))
...         break
...     except ValueError:
...         print("Oops! That was no valid number. Try again...")
...
import sys

try:
    f = open('myfile.txt')
    s = f.readline()
    i = int(s.strip())
except IOError as err:
    print("I/O error: {0}".format(err))
except ValueError:
    print("Could not convert data to an integer.")
except:
    print("Unexpected error:", sys.exc_info()[0])
    raise
>>> def this_fails():
...     x = 1/0
...
>>> try:
...     this_fails()
... except ZeroDivisionError as err:
...     print('Handling run-time error:', err)
...
Handling run-time error: int division or modulo by zero
class Error(Exception):
    """Base class for exceptions in this module."""
    pass

class InputError(Error):
    """Exception raised for errors in the input.

    Attributes:
        expression -- input expression in which the error occurred
        message -- explanation of the error
    """

    def __init__(self, expression, message):
        self.expression = expression
        self.message = message

class TransitionError(Error):
    """Raised when an operation attempts a state transition that's not
    allowed.

    Attributes:
        previous -- state at beginning of transition
        next -- attempted new state
        message -- explanation of why the specific transition is not
            allowed
    """

    def __init__(self, previous, next, message):
        self.previous = previous
        self.next = next
        self.message = message
def scope_test():
    def do_local():
        spam = "local spam"

    def do_nonlocal():
        nonlocal spam
        spam = "nonlocal spam"

    def do_global():
        global spam
        spam = "global spam"

    spam = "test spam"
    do_local()
    print("After local assignment:", spam)
    do_nonlocal()
    print("After nonlocal assignment:", spam)
    do_global()
    print("After global assignment:", spam)

scope_test()
print("In global scope:", spam)
class Employee:
    pass

john = Employee()  # Create an empty employee record

# Fill the fields of the record
john.name = 'John Doe'
john.dept = 'computer lab'
john.salary = 1000
class Reverse:
    "Iterator for looping over a sequence backwards"
    def __init__(self, data):
        self.data = data
        self.index = len(data)

    def __iter__(self):
        return self

    def __next__(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.data[self.index]

>>> rev = Reverse('spam')
>>> iter(rev)
<__main__.Reverse object at 0x00A1DB50>
>>> for char in rev:
...     print(char)
...
m
a
p
s
The re module provides regular expression tools for advanced string
processing. For complex matching and manipulation, regular expressions offer
succinct, optimized solutions:
>>> import re
>>> re.findall(r'\bf[a-z]*', 'which foot or hand fell fastest')
['foot', 'fell', 'fastest']
>>> re.sub(r'(\b[a-z]+) \1', r'\1', 'cat in the the hat')
'cat in the hat'
When only simple capabilities are needed, string methods are preferred because
they are easier to read and debug:
>>> 'tea for too'.replace('too', 'two')
'tea for two'
>>> # dates are easily constructed and formatted
>>> from datetime import date
>>> now = date.today()
>>> now
datetime.date(2003, 12, 2)
>>> now.strftime("%m-%d-%y. %d %b %Y is a %A on the %d day of %B.")
'12-02-03. 02 Dec 2003 is a Tuesday on the 02 day of December.'

>>> # dates support calendar arithmetic
>>> birthday = date(1964, 7, 31)
>>> age = now - birthday
>>> age.days
14368
>>> import zlib
>>> s = b'witch which has which witches wrist watch'
>>> len(s)
41
>>> t = zlib.compress(s)
>>> len(t)
37
>>> zlib.decompress(t)
b'witch which has which witches wrist watch'
>>> zlib.crc32(s)
226805979
def average(values):
    """Computes the arithmetic mean of a list of numbers.

    >>> print(average([20, 30, 70]))
    40.0
    """
    return sum(values) / len(values)

import doctest
doctest.testmod()   # automatically validate the embedded tests
>>> import textwrap
>>> doc = """The wrap() method is just like fill() except that it returns
... a list of strings instead of one big string with newlines to separate
... the wrapped lines."""
...
>>> print(textwrap.fill(doc, width=40))
The wrap() method is just like fill()
except that it returns a list of strings
instead of one big string with newlines
to separate the wrapped lines.
import threading, zipfile

class AsyncZip(threading.Thread):
    def __init__(self, infile, outfile):
        threading.Thread.__init__(self)
        self.infile = infile
        self.outfile = outfile
    def run(self):
        f = zipfile.ZipFile(self.outfile, 'w', zipfile.ZIP_DEFLATED)
        f.write(self.infile)
        f.close()
        print('Finished background zip of:', self.infile)

background = AsyncZip('mydata.txt', 'myarchive.zip')
background.start()
print('The main program continues to run in foreground.')

background.join()    # Wait for the background task to finish
print('Main program waited until background was done.')
>>> from collections import deque
>>> d = deque(["task1", "task2", "task3"])
>>> d.append("task4")
>>> print("Handling", d.popleft())
Handling task1

unsearched = deque([starting_node])
def breadth_first_search(unsearched):
    node = unsearched.popleft()
    for m in gen_moves(node):
        if is_goal(m):
            return m
        unsearched.append(m)
# I prefer vi-style editing:
set editing-mode vi

# Edit using a single line:
set horizontal-scroll-mode On

# Rebind some keys:
Meta-h: backward-kill-word
"\C-u": universal-argument
"\C-x\C-r": re-read-init-file
# Add auto-completion and a stored history file of commands to your Python
# interactive interpreter. Requires Python 2.0+, readline. Autocomplete is
# bound to the Esc key by default (you can change it - see readline docs).
#
# Store the file in ~/.pystartup, and set an environment variable to point
# to it:  "export PYTHONSTARTUP=/home/user/.pystartup" in bash.
#
# Note that PYTHONSTARTUP does *not* expand "~", so you have to put in the
# full path to your home directory.

import atexit
import os
import readline
import rlcompleter

historyPath = os.path.expanduser("~/.pyhistory")

def save_history(historyPath=historyPath):
    import readline
    readline.write_history_file(historyPath)

if os.path.exists(historyPath):
    readline.read_history_file(historyPath)

atexit.register(save_history)
del os, atexit, readline, rlcompleter, save_history, historyPath
>>> format(math.pi, '.12g')  # give 12 significant digits
'3.14159265359'
>>> format(math.pi, '.2f')   # give 2 digits after the point
'3.14'
>>> repr(math.pi)
'3.141592653589793'
The interpreter interface resembles that of the UNIX shell, but provides some
additional methods of invocation:
When called with standard input connected to a tty device, it prompts for
commands and executes them until an EOF (an end-of-file character, you can
produce that with Ctrl-D on UNIX or Ctrl-Z, Enter on Windows) is read.
When called with a file name argument or with a file as standard input, it
reads and executes a script from that file.
When called with a directory name argument, it reads and executes an
appropriately named script from that directory.
When called with -c command, it executes the Python statement(s) given as
command. Here command may contain multiple statements separated by
newlines. Leading whitespace is significant in Python statements!
When called with -m module-name, the given module is located on the
Python module path and executed as a script.
In non-interactive mode, the entire input is parsed before it is executed.
An interface option terminates the list of options consumed by the interpreter;
all consecutive arguments will end up in sys.argv – note that the first
element, subscript zero (sys.argv[0]), is a string reflecting the program’s
source.
Execute the Python code in command. command can be one or more
statements separated by newlines, with significant leading whitespace as in
normal module code.
If this option is given, the first element of sys.argv will be
"-c" and the current directory will be added to the start of
sys.path (allowing modules in that directory to be imported as top
level modules).
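For example (a plausible invocation):

python -c "import sys; print(sys.version_info[:2])"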
Search sys.path for the named module and execute its contents as
the __main__ module.
Since the argument is a module name, you must not give a file extension
(.py). The module-name should be a valid Python module name, but
the implementation may not always enforce this (e.g. it may allow you to
use a name that includes a hyphen).
Package names are also permitted. When a package name is supplied instead
of a normal module, the interpreter will execute <pkg>.__main__ as
the main module. This behaviour is deliberately similar to the handling
of directories and zipfiles that are passed to the interpreter as the
script argument.
Note
This option cannot be used with built-in modules and extension modules
written in C, since they do not have Python module files. However, it
can still be used for precompiled modules, even if the original source
file is not available.
If this option is given, the first element of sys.argv will be the
full path to the module file (while the module file is being located, the
first element will be set to "-m"). As with the -c option,
the current directory will be added to the start of sys.path.
Many standard library modules contain code that is invoked on their execution
as a script. An example is the timeit module:

python -m timeit -s 'setup here' 'benchmarked code here'
python -m timeit -h  # for details
Changed in version 3.1: Supply the package name to run a __main__ submodule.
-
Read commands from standard input (sys.stdin). If standard input is
a terminal, -i is implied.
If this option is given, the first element of sys.argv will be
"-" and the current directory will be added to the start of
sys.path.
<script>
Execute the Python code contained in script, which must be a filesystem
path (absolute or relative) referring to either a Python file, a directory
containing a __main__.py file, or a zipfile containing a
__main__.py file.
If this option is given, the first element of sys.argv will be the
script name as given on the command line.
If the script name refers directly to a Python file, the directory
containing that file is added to the start of sys.path, and the
file is executed as the __main__ module.
If the script name refers to a directory or zipfile, the script name is
added to the start of sys.path and the __main__.py file in
that location is executed as the __main__ module.
If no interface option is given, -i is implied, sys.argv[0] is
an empty string ("") and the current directory will be added to the
start of sys.path.
When a script is passed as first argument or the -c option is used,
enter interactive mode after executing the script or the command, even when
sys.stdin does not appear to be a terminal. The
PYTHONSTARTUP file is not read.
This can be useful to inspect global variables or a stack trace when a script
raises an exception. See also PYTHONINSPECT.
Force the binary layer of the stdin, stdout and stderr streams (which is
available as their buffer attribute) to be unbuffered. The text I/O
layer will still be line-buffered.
Print a message each time a module is initialized, showing the place
(filename or built-in module) from which it is loaded. When given twice
(-vv), print a message for each file that is checked for when
searching for a module. Also provides information on module cleanup at exit.
See also PYTHONVERBOSE.
Warning control. Python’s warning machinery by default prints warning
messages to sys.stderr. A typical warning message has the following
form:
file:line: category: message
By default, each warning is printed once for each source line where it
occurs. This option controls how often warnings are printed.
Multiple -W options may be given; when a warning matches more than
one option, the action for the last matching option is performed. Invalid
-W options are ignored (though, a warning message is printed about
invalid options when the first warning is issued).
Warnings can also be controlled from within a Python program using the
warnings module.
The simplest form of argument is one of the following action strings (or a
unique abbreviation):
ignore
Ignore all warnings.
default
Explicitly request the default behavior (printing each warning once per
source line).
all
Print a warning each time it occurs (this may generate many messages if a
warning is triggered repeatedly for the same source line, such as inside a
loop).
module
Print each warning only the first time it occurs in each module.
once
Print each warning only the first time it occurs in the program.
error
Raise an exception instead of printing a warning message.
The full form of argument is:
action:message:category:module:line
Here, action is as explained above but only applies to messages that match
the remaining fields. Empty fields match all values; trailing empty fields
may be omitted. The message field matches the start of the warning message
printed; this match is case-insensitive. The category field matches the
warning category. This must be a class name; the match tests whether the
actual warning category of the message is a subclass of the specified warning
category. The full class name must be given. The module field matches the
(fully-qualified) module name; this match is case-sensitive. The line
field matches the line number, where zero matches all line numbers and is
thus equivalent to an omitted line number.
Reserved for various implementation-specific options. CPython currently
defines none of them, but allows arbitrary values to be passed and
retrieved through the sys._xoptions dictionary.
Changed in version 3.2: It is now allowed to pass -X with CPython.
Change the location of the standard Python libraries. By default, the
libraries are searched in prefix/lib/pythonversion and
exec_prefix/lib/pythonversion, where prefix and
exec_prefix are installation-dependent directories, both defaulting
to /usr/local.
When PYTHONHOME is set to a single directory, its value replaces
both prefix and exec_prefix. To specify different values
for these, set PYTHONHOME to prefix:exec_prefix.
Augment the default search path for module files. The format is the same as
the shell’s PATH: one or more directory pathnames separated by
os.pathsep (e.g. colons on Unix or semicolons on Windows).
Non-existent directories are silently ignored.
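For example, in a Unix shell (the paths are illustrative):

export PYTHONPATH=/opt/mylibs:/home/user/pylib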
In addition to normal directories, individual PYTHONPATH entries
may refer to zipfiles containing pure Python modules (in either source or
compiled form). Extension modules cannot be imported from zipfiles.
The default search path is installation dependent, but generally begins with
prefix/lib/pythonversion (see PYTHONHOME above). It
is always appended to PYTHONPATH.
An additional directory will be inserted in the search path in front of
PYTHONPATH as described above under
Interface options. The search path can be manipulated from
within a Python program as the variable sys.path.
If this is the name of a readable file, the Python commands in that file are
executed before the first prompt is displayed in interactive mode. The file
is executed in the same namespace where interactive commands are executed so
that objects defined or imported in it can be used without qualification in
the interactive session. You can also change the prompts sys.ps1 and
sys.ps2 in this file.
Set this to a non-empty string to cause the time module to require
dates specified as strings to include 4-digit years, otherwise 2-digit years
are converted based on rules described in the time module
documentation.
If this is set to a non-empty string it is equivalent to specifying the
-O option. If set to an integer, it is equivalent to specifying
-O multiple times.
If this is set to a non-empty string it is equivalent to specifying the
-d option. If set to an integer, it is equivalent to specifying
-d multiple times.
If this is set to a non-empty string it is equivalent to specifying the
-v option. If set to an integer, it is equivalent to specifying
-v multiple times.
If this is set before running the interpreter, it overrides the encoding used
for stdin/stdout/stderr, in the syntax encodingname:errorhandler. The
:errorhandler part is optional and has the same meaning as in
str.encode().
For stderr, the :errorhandler part is ignored; the handler will always be
'backslashreplace'.
Viewing environment variables can also be done more straightforwardly: the
command prompt will expand strings wrapped in percent signs automatically:

C:\>echo %PATH%
See also “Creating Python extensions in C/C++ with SWIG and compiling them with
MinGW gcc under Windows” or “Installing Python extension with distutils
and without Microsoft Visual C++” by Sébastien Sauvage, 2003.
Python on a Macintosh running Mac OS X is in principle very similar to Python on
any other Unix platform, but there are a number of additional features such as
the IDE and the Package Manager that are worth pointing out.
Mac OS X 10.5 comes with Python 2.5.1 pre-installed by Apple. If you wish, you
are invited to install the most recent version of Python from the Python website
(http://www.python.org). A current “universal binary” build of Python, which
runs natively on the Mac’s new Intel and legacy PPC CPUs, is available there.
What you get after installing is a number of things:
A MacPython2.5 folder in your Applications folder. In here
you find IDLE, the development environment that is a standard part of official
Python distributions; PythonLauncher, which handles double-clicking Python
scripts from the Finder; and the “Build Applet” tool, which allows you to
package Python scripts as standalone applications on your system.
A framework /Library/Frameworks/Python.framework, which includes the
Python executable and libraries. The installer adds this location to your shell
path. To uninstall MacPython, you can simply remove these three things. A
symlink to the Python executable is placed in /usr/local/bin/.
The Apple-provided build of Python is installed in
/System/Library/Frameworks/Python.framework and /usr/bin/python,
respectively. You should never modify or delete these, as they are
Apple-controlled and are used by Apple- or third-party software. Remember that
if you choose to install a newer Python version from python.org, you will have
two different but functional Python installations on your computer, so it will
be important that your paths and usages are consistent with what you want to do.
IDLE includes a help menu that allows you to access Python documentation. If you
are completely new to Python you should start reading the tutorial introduction
in that document.
If you are familiar with Python on other Unix platforms you should read the
section on running Python scripts from the Unix shell.
Your best way to get started with Python on Mac OS X is through the IDLE
integrated development environment, see section The IDE and use the Help menu
when the IDE is running.
If you want to run Python scripts from the Terminal window command line or from
the Finder you first need an editor to create your script. Mac OS X comes with a
number of standard Unix command line editors, vim and
emacs among them. If you want a more Mac-like editor,
BBEdit or TextWrangler from Bare Bones Software (see
http://www.barebones.com/products/bbedit/index.shtml) are good choices, as is
TextMate (see http://macromates.com/). Other editors include
Gvim (http://macvim.org) and Aquamacs
(http://aquamacs.org/).
To run your script from the Terminal window you must make sure that
/usr/local/bin is in your shell search path.
To run your script from the Finder you have two options:
Drag it to PythonLauncher
Select PythonLauncher as the default application to open your
script (or any .py script) through the Finder’s Info window and double-click it.
PythonLauncher has various preferences to control how your script is
launched. Option-dragging allows you to change these for one invocation, or use
its Preferences menu to change things globally.
With older versions of Python, there is one Mac OS X quirk that you need to be
aware of: programs that talk to the Aqua window manager (in other words,
anything that has a GUI) need to be run in a special way. Use pythonw
instead of python to start such scripts.
With Python 2.5, you can use either python or pythonw.
Python on OS X honors all standard Unix environment variables such as
PYTHONPATH, but setting these variables for programs started from the
Finder is non-standard as the Finder does not read your .profile or
.cshrc at startup. You need to create a file ~/.MacOSX/environment.plist. See Apple’s Technical Document QA1067 for details.
There are several options for building GUI applications on the Mac with Python.
PyObjC is a Python binding to Apple’s Objective-C/Cocoa framework, which is
the foundation of most modern Mac development. Information on PyObjC is
available from http://pyobjc.sourceforge.net.
The standard Python GUI toolkit is tkinter, based on the cross-platform
Tk toolkit (http://www.tcl.tk). An Aqua-native version of Tk is bundled with OS
X by Apple, and the latest version can be downloaded and installed from
http://www.activestate.com; it can also be built from source.
wxPython is another popular cross-platform GUI toolkit that runs natively on
Mac OS X. Packages and documentation are available from http://www.wxpython.org.
The “Build Applet” tool that is placed in the MacPython 2.5 folder is fine for
packaging small Python scripts on your own machine to run as a standard Mac
application. This tool, however, is not robust enough to distribute Python
applications to other users.
The standard tool for deploying standalone Python applications on the Mac is
py2app. More information on installing and using py2app can be found
at http://undefined.org/python/#py2app.
Python can also be used to script other Mac applications via Apple’s Open
Scripting Architecture (OSA); see http://appscript.sourceforge.net. Appscript is
a high-level, user-friendly Apple event bridge that allows you to control
scriptable Mac OS X applications using ordinary Python scripts. Appscript makes
Python a serious alternative to Apple’s own AppleScript language for
automating your Mac. A related package, PyOSA, is an OSA language component
for the Python scripting language, allowing Python code to be executed by any
OSA-enabled application (Script Editor, Mail, iTunes, etc.). PyOSA makes Python
a full peer to AppleScript.
if 1900 < year < 2100 and 1 <= month <= 12 \
   and 1 <= day <= 31 and 0 <= hour < 24 \
   and 0 <= minute < 60 and 0 <= second < 60:   # Looks like a valid date
    return 1
month_names = ['Januari', 'Februari', 'Maart',      # These are the
               'April',   'Mei',      'Juni',       # Dutch names
               'Juli',    'Augustus', 'September',  # for the months
               'Oktober', 'November', 'December']   # of the year
def perm(l):
    # Compute the list of all permutations of l
    if len(l) <= 1:
        return [l]
    r = []
    for i in range(len(l)):
        s = l[:i] + l[i+1:]
        p = perm(s)
        for x in p:
            r.append(l[i:i+1] + x)
    return r
The following example shows various indentation errors:
 def perm(l):                       # error: first line indented
for i in range(len(l)):             # error: not indented
    s = l[:i] + l[i+1:]
        p = perm(l[:i] + l[i+1:])   # error: unexpected indent
        for x in p:
                r.append(l[i:i+1] + x)
            return r                # error: inconsistent dedent
Python 3.0 introduces additional characters from outside the ASCII range (see PEP 3131).
For these characters, the classification uses the version of the
Unicode Character Database as included in the unicodedata module.
Identifiers are unlimited in length. Case is significant.
identifier ::= id_start id_continue*
id_start ::= <all characters in general categories Lu, Ll, Lt, Lm, Lo, Nl, the underscore, and characters with the Other_ID_Start property>
id_continue ::= <all characters in id_start, plus characters in the categories Mn, Mc, Nd, Pc and others with the Other_ID_Continue property>
False class finally is return
None continue for lambda try
True def from nonlocal while
and del global not with
as elif if or yield
assert else import pass
break except in raise
(The value of an immutable container object that contains a reference to a mutable object can change when the latter’s value is changed; however the container is still considered immutable, because the collection of objects it contains cannot be changed. So, immutability is not strictly the same as having an unchangeable value, it is more subtle.)
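As a brief interactive illustration of this point:
>>> t = ([1, 2], 'abc')       # an immutable tuple holding a mutable list
>>> t[0].append(3)            # the list's value changes...
>>> t
([1, 2, 3], 'abc')
>>> t[0] = [9]                # ...but the tuple's items cannot be rebound
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment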
Objects are never explicitly destroyed; however, when they become unreachable
they may be garbage-collected. An implementation is allowed to postpone garbage
collection or omit it altogether — it is a matter of implementation quality
how garbage collection is implemented, as long as no objects are collected that
are still reachable.
These represent finite sets of objects indexed by arbitrary index sets. The
subscript notation a[k] selects the item indexed by k from the mapping
a; this can be used in expressions and as the target of assignments or
del statements. The built-in function len() returns the number
of items in a mapping.
Special read-only attributes: __self__ is the class instance object,
__func__ is the function object; __doc__ is the method’s
documentation (same as __func__.__doc__); __name__ is the
method name (same as __func__.__name__); __module__ is the
name of the module the method was defined in, or None if unavailable.
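For illustration, a small interactive sketch (the class Greeter and its method hello are invented for the example):
>>> class Greeter:
...     def hello(self):
...         """Return a greeting."""
...         return 'hello'
...
>>> g = Greeter()
>>> m = g.hello
>>> m.__self__ is g
True
>>> m.__func__ is Greeter.hello
True
>>> m.__doc__
'Return a greeting.'
>>> m.__name__
'hello'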
>>> class C(object):
... pass
...
>>> c = C()
>>> c.__len__ = lambda: 5
>>> len(c)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: object of type 'C' has no len()
Names refer to objects. Names are introduced by name binding operations.
Each occurrence of a name in the program text refers to the binding of
that name established in the innermost function block containing the use.
A block is a piece of Python program text that is executed as a unit.
The following are blocks: a module, a function body, and a class definition.
Each command typed interactively is a block. A script file (a file given as
standard input to the interpreter or specified as the first argument on the
interpreter command line) is a code block. A script command (a command specified
on the interpreter command line with the '-c' option) is a code block. The
string argument passed to the built-in functions eval() and exec()
is a code block.
A code block is executed in an execution frame. A frame contains some
administrative information (used for debugging) and determines where and how
execution continues after the code block’s execution has completed.
A scope defines the visibility of a name within a block. If a local
variable is defined in a block, its scope includes that block. If the
definition occurs in a function block, the scope extends to any blocks contained
within the defining one, unless a contained block introduces a different binding
for the name. The scope of names defined in a class block is limited to the
class block; it does not extend to the code blocks of methods – this includes
comprehensions and generator expressions since they are implemented using a
function scope. This means that the following will fail:
class A:
a = 42
b = list(a + i for i in range(10))
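One hedged workaround, if a class attribute really is needed inside the comprehension, is to pass it in through a default argument, since default values are evaluated in the class scope:
class A:
    a = 42
    # the default a=a is evaluated in the class scope, where a is visible
    b = (lambda a=a: [a + i for i in range(10)])()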
When a name is used in a code block, it is resolved using the nearest enclosing
scope. The set of all such scopes visible to a code block is called the block’s
environment.
If a name is bound in a block, it is a local variable of that block, unless
declared as nonlocal. If a name is bound at the module level, it is
a global variable. (The variables of the module code block are local and
global.) If a variable is used in a code block but not defined there, it is a
free variable.
When a name is not found at all, a NameError exception is raised. If the
name refers to a local variable that has not been bound, a
UnboundLocalError exception is raised. UnboundLocalError is a
subclass of NameError.
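A minimal sketch of the distinction:
>>> def f():
...     print(x)    # x is local to f (it is bound below), but not bound yet
...     x = 1
...
>>> f()
Traceback (most recent call last):
  ...
UnboundLocalError: local variable 'x' referenced before assignment
>>> undefined_name
Traceback (most recent call last):
  ...
NameError: name 'undefined_name' is not defined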
The following constructs bind names: formal parameters to functions,
import statements, class and function definitions (these bind the
class or function name in the defining block), and targets that are identifiers
if occurring in an assignment, for loop header, or after
as in a with statement or except clause.
The import statement
of the form from ... import * binds all names defined in the imported
module, except those beginning with an underscore. This form may only be used
at the module level.
A target occurring in a del statement is also considered bound for
this purpose (though the actual semantics are to unbind the name).
Each assignment or import statement occurs within a block defined by a class or
function definition or at the module level (the top-level code block).
If a name binding operation occurs anywhere within a code block, all uses of the
name within the block are treated as references to the current block. This can
lead to errors when a name is used within a block before it is bound. This rule
is subtle. Python lacks declarations and allows name binding operations to
occur anywhere within a code block. The local variables of a code block can be
determined by scanning the entire text of the block for name binding operations.
If the global statement occurs within a block, all uses of the name
specified in the statement refer to the binding of that name in the top-level
namespace. Names are resolved in the top-level namespace by searching the
global namespace, i.e. the namespace of the module containing the code block,
and the builtins namespace, the namespace of the module builtins. The
global namespace is searched first. If the name is not found there, the builtins
namespace is searched. The global statement must precede all uses of the name.
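For example:
>>> counter = 0
>>> def bump():
...     global counter     # all uses of counter below refer to the module-level name
...     counter += 1
...
>>> bump()
>>> counter
1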
The builtins namespace associated with the execution of a code block is actually
found by looking up the name __builtins__ in its global namespace; this
should be a dictionary or a module (in the latter case the module’s dictionary
is used). By default, when in the __main__ module, __builtins__ is
the built-in module builtins; when in any other module,
__builtins__ is an alias for the dictionary of the builtins module
itself. __builtins__ can be set to a user-created dictionary to create a
weak form of restricted execution.
CPython implementation detail: Users should not touch __builtins__; it is strictly an implementation
detail. Users wanting to override values in the builtins namespace should
import the builtins module and modify its
attributes appropriately.
The namespace for a module is automatically created the first time a module is
imported. The main module for a script is always called __main__.
The global statement has the same scope as a name binding operation
in the same block. If the nearest enclosing scope for a free variable contains
a global statement, the free variable is treated as a global.
A class definition is an executable statement that may use and define names.
These references follow the normal rules for name resolution. The namespace of
the class definition becomes the attribute dictionary of the class. Names
defined at the class scope are not visible in methods.
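A short sketch of the consequence: methods must reach class attributes through self or the class name, never as bare names:
>>> class C:
...     attr = 10
...     def get(self):
...         return self.attr    # a bare 'attr' here would raise NameError
...
>>> C().get()
10
>>> C.attr                      # the class namespace became the attribute dictionary
10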
There are several cases where Python statements are illegal when used in
conjunction with nested scopes that contain free variables.
If a variable is referenced in an enclosing scope, it is illegal to delete the
name. An error will be reported at compile time.
If the wild card form of import — import * — is used in a function and
the function contains or is a nested block with free variables, the compiler
will raise a SyntaxError.
The eval() and exec() functions do not have access to the full
environment for resolving names. Names may be resolved in the local and global
namespaces of the caller. Free variables are not resolved in the nearest
enclosing namespace, but in the global namespace. [1] The exec() and
eval() functions have optional arguments to override the global and local
namespace. If only one namespace is specified, it is used for both.
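For instance, the namespaces can be supplied explicitly:
>>> eval('x + y', {'x': 1}, {'y': 2})    # separate global and local namespaces
3
>>> eval('x + 1', {'x': 41})             # a single namespace serves as both
42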
Exceptions are a means of breaking out of the normal flow of control of a code
block in order to handle errors or other exceptional conditions. An exception
is raised at the point where the error is detected; it may be handled by the
surrounding code block or by any code block that directly or indirectly invoked
the code block where the error occurred.
The Python interpreter raises an exception when it detects a run-time error
(such as division by zero). A Python program can also explicitly raise an
exception with the raise statement. Exception handlers are specified
with the try ... except statement. The finally
clause of such a statement can be used to specify cleanup code which does not
handle the exception, but is executed whether an exception occurred or not in
the preceding code.
Python uses the “termination” model of error handling: an exception handler can
find out what happened and continue execution at an outer level, but it cannot
repair the cause of the error and retry the failing operation (except by
re-entering the offending piece of code from the top).
When an exception is not handled at all, the interpreter terminates execution of
the program, or returns to its interactive main loop. In either case, it prints
a stack backtrace, except when the exception is SystemExit.
Exceptions are identified by class instances. The except clause is
selected depending on the class of the instance: it must reference the class of
the instance or a base class thereof. The instance can be received by the
handler and can carry additional information about the exceptional condition.
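A small sketch (the exception classes are invented for the example); the handler matches because the raised class derives from the class named in the except clause:
>>> class AppError(Exception): pass
...
>>> class ConfigError(AppError): pass
...
>>> try:
...     raise ConfigError('missing key')
... except AppError as e:      # matches: ConfigError is a subclass of AppError
...     print(type(e).__name__, e)
...
ConfigError missing key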
Note
Exception messages are not part of the Python API. Their contents may change
from one version of Python to the next without warning and should not be
relied on by code which will run under multiple versions of the interpreter.
This chapter explains the meaning of the elements of expressions in Python.
Syntax Notes: In this and the following chapters, extended BNF notation will
be used to describe syntax, not lexical analysis. When (one alternative of) a
syntax rule has the form
name ::= othername
and no semantics are given, the semantics of this form of name are the same
as for othername.
When a description of an arithmetic operator below uses the phrase “the numeric
arguments are converted to a common type,” this means that the operator
implementation for built-in types works that way:
If either argument is a complex number, the other is converted to complex;
otherwise, if either argument is a floating point number, the other is
converted to floating point;
otherwise, both must be integers and no conversion is necessary.
Some additional rules apply for certain operators (e.g., a string left argument
to the ‘%’ operator). Extensions must define their own conversion behavior.
Atoms are the most basic elements of expressions. The simplest atoms are
identifiers or literals. Forms enclosed in parentheses, brackets or braces are
also categorized syntactically as atoms. The syntax for atoms is:
An identifier occurring as an atom is a name. See section Identifiers and keywords
for lexical definition and section Naming and binding for documentation of naming and
binding.
When the name is bound to an object, evaluation of the atom yields that object.
When a name is not bound, an attempt to evaluate it raises a NameError
exception.
Private name mangling: When an identifier that textually occurs in a class
definition begins with two or more underscore characters and does not end in two
or more underscores, it is considered a private name of that class.
Private names are transformed to a longer form before code is generated for
them. The transformation inserts the class name in front of the name, with
leading underscores removed, and a single underscore inserted in front of the
class name. For example, the identifier __spam occurring in a class named
Ham will be transformed to _Ham__spam. This transformation is
independent of the syntactical context in which the identifier is used. If the
transformed name is extremely long (longer than 255 characters), implementation
defined truncation may happen. If the class name consists only of underscores,
no transformation is done.
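To illustrate the transformation:
>>> class Ham:
...     def __init__(self):
...         self.__spam = 1     # stored under the mangled name _Ham__spam
...
>>> h = Ham()
>>> h._Ham__spam
1
>>> h.__spam                    # no mangling outside the class body
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Ham' object has no attribute '__spam'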
Evaluation of a literal yields an object of the given type (string, bytes,
integer, floating point number, complex number) with the given value. The value
may be approximated in the case of floating point and imaginary (complex)
literals. See section Literals for details.
With the exception of bytes literals, these all correspond to immutable data
types, and hence the object’s identity is less important than its value.
Multiple evaluations of literals with the same value (either the same occurrence
in the program text or a different occurrence) may obtain the same object or a
different object with the same value.
A parenthesized expression list yields whatever that expression list yields: if
the list contains at least one comma, it yields a tuple; otherwise, it yields
the single expression that makes up the expression list.
An empty pair of parentheses yields an empty tuple object. Since tuples are
immutable, the rules for literals apply (i.e., two occurrences of the empty
tuple may or may not yield the same object).
Note that tuples are not formed by the parentheses, but rather by use of the
comma operator. The exception is the empty tuple, for which parentheses are
required — allowing unparenthesized “nothing” in expressions would cause
ambiguities and allow common typos to pass uncaught.
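A quick illustration:
>>> t = 1, 2, 3       # the commas form the tuple, not the parentheses
>>> type(t)
<class 'tuple'>
>>> single = (4,)     # a one-item tuple still needs a comma
>>> empty = ()        # only the empty tuple requires parentheses
>>> len(single), len(empty)
(1, 0)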
The comprehension consists of a single expression followed by at least one
for clause and zero or more for or if clauses.
In this case, the elements of the new container are those that would be produced
by considering each of the for or if clauses a block,
nesting from left to right, and evaluating the expression to produce an element
each time the innermost block is reached.
Note that the comprehension is executed in a separate scope, so names assigned
to in the target list don’t “leak” in the enclosing scope.
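For example:
>>> x = 'outer'
>>> [x*x for x in range(3)]    # the comprehension runs in its own scope
[0, 1, 4]
>>> x                          # the loop variable did not leak
'outer'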
A list display yields a new list object, the contents being specified by either
a list of expressions or a comprehension. When a comma-separated list of
expressions is supplied, its elements are evaluated from left to right and
placed into the list object in that order. When a comprehension is supplied,
the list is constructed from the elements resulting from the comprehension.
A set display yields a new mutable set object, the contents being specified by
either a sequence of expressions or a comprehension. When a comma-separated
list of expressions is supplied, its elements are evaluated from left to right
and added to the set object. When a comprehension is supplied, the set is
constructed from the elements resulting from the comprehension.
An empty set cannot be constructed with {}; this literal constructs an empty
dictionary.
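That is:
>>> type({})        # an empty display yields a dictionary
<class 'dict'>
>>> type(set())     # an empty set must be written set()
<class 'set'>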
A dictionary display yields a new dictionary object.
If a comma-separated sequence of key/datum pairs is given, they are evaluated
from left to right to define the entries of the dictionary: each key object is
used as a key into the dictionary to store the corresponding datum. This means
that you can specify the same key multiple times in the key/datum list, and the
final dictionary’s value for that key will be the last one given.
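For example:
>>> {'k': 1, 'k': 2}     # the last datum for a duplicate key prevails
{'k': 2}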
A dict comprehension, in contrast to list and set comprehensions, needs two
expressions separated with a colon followed by the usual “for” and “if” clauses.
When the comprehension is run, the resulting key and value elements are inserted
in the new dictionary in the order they are produced.
Restrictions on the types of the key values are listed earlier in section
The standard type hierarchy. (To summarize, the key type should be hashable, which excludes
all mutable objects.) Clashes between duplicate keys are not detected; the last
datum (textually rightmost in the display) stored for a given key value
prevails.
A generator expression yields a new generator object. Its syntax is the same as
for comprehensions, except that it is enclosed in parentheses instead of
brackets or curly braces.
Variables used in the generator expression are evaluated lazily when the
__next__() method is called for the generator object (in the same fashion as
normal generators). However, the leftmost for clause is immediately
evaluated, so that an error produced by it can be seen before any other possible
error in the code that handles the generator expression. Subsequent
for clauses cannot be evaluated immediately since they may depend on
the previous for loop. For example: (x*y for x in range(10) for y in bar(x)).
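A brief sketch of that evaluation order, with bar as a stand-in function:
>>> def bar(x):
...     return range(x)
...
>>> gen = (x*y for x in range(10) for y in bar(x))   # range(10) is evaluated here
>>> (i for i in 1/0)       # an error in the leftmost iterable surfaces immediately
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero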
The parentheses can be omitted on calls with only one argument. See section
Calls for details.
The yield expression is only used when defining a generator function,
and can only be used in the body of a function definition. Using a
yield expression in a function definition is sufficient to cause that
definition to create a generator function instead of a normal function.
When a generator function is called, it returns an iterator known as a
generator. That generator then controls the execution of a generator function.
The execution starts when one of the generator’s methods is called. At that
time, the execution proceeds to the first yield expression, where it
is suspended again, returning the value of expression_list to
generator’s caller. By suspended we mean that all local state is retained,
including the current bindings of local variables, the instruction pointer, and
the internal evaluation stack. When the execution is resumed by calling one of
the generator’s methods, the function can proceed exactly as if the
yield expression was just another external call. The value of the
yield expression after resuming depends on the method which resumed
the execution.
All of this makes generator functions quite similar to coroutines; they yield
multiple times, they have more than one entry point and their execution can be
suspended. The only difference is that a generator function cannot control
where execution should continue after it yields; control is always
transferred to the generator's caller.
The yield statement is allowed in the try clause of a
try ... finally construct. If the generator is not
resumed before it is finalized (by reaching a zero reference count or by being
garbage collected), the generator-iterator’s close() method will be
called, allowing any pending finally clauses to execute.
The following generator’s methods can be used to control the execution of a
generator function:
Starts the execution of a generator function or resumes it at the last
executed yield expression. When a generator function is resumed
with a __next__() method, the current yield expression
always evaluates to None. The execution then continues to the next
yield expression, where the generator is suspended again, and the
value of the expression_list is returned to next()'s caller.
If the generator exits without yielding another value, a StopIteration
exception is raised.
This method is normally called implicitly, e.g. by a for loop, or
by the built-in next() function.
Resumes the execution and “sends” a value into the generator function. The
value argument becomes the result of the current yield
expression. The send() method returns the next value yielded by the
generator, or raises StopIteration if the generator exits without
yielding another value. When send() is called to start the generator,
it must be called with None as the argument, because there is no
yield expression that could receive the value.
Raises an exception of type type at the point where the generator was paused,
and returns the next value yielded by the generator function. If the generator
exits without yielding another value, a StopIteration exception is
raised. If the generator function does not catch the passed-in exception, or
raises a different exception, then that exception propagates to the caller.
Raises a GeneratorExit at the point where the generator function was
paused. If the generator function then raises StopIteration (by
exiting normally, or due to already being closed) or GeneratorExit (by
not catching the exception), close returns to its caller. If the generator
yields a value, a RuntimeError is raised. If the generator raises any
other exception, it is propagated to the caller. close() does nothing
if the generator has already exited due to an exception or normal exit.
Here is a simple example that demonstrates the behavior of generators and
generator functions:
>>> def echo(value=None):
... print("Execution starts when 'next()' is called for the first time.")
... try:
... while True:
... try:
... value = (yield value)
... except Exception as e:
... value = e
... finally:
... print("Don't forget to clean up when 'close()' is called.")
...
>>> generator = echo(1)
>>> print(next(generator))
Execution starts when 'next()' is called for the first time.
1
>>> print(next(generator))
None
>>> print(generator.send(2))
2
>>> generator.throw(TypeError, "spam")
TypeError('spam',)
>>> generator.close()
Don't forget to clean up when 'close()' is called.
The primary must evaluate to an object of a type that supports attribute
references, which most objects do. This object is then asked to produce the
attribute whose name is the identifier (which can be customized by overriding
the __getattr__() method). If this attribute is not available, the
exception AttributeError is raised. Otherwise, the type and value of the
object produced is determined by the object. Multiple evaluations of the same
attribute reference may yield different objects.
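A minimal sketch of customizing attribute access (the class and attribute names are invented):
>>> class Lazy:
...     def __getattr__(self, name):
...         # called only when normal attribute lookup fails
...         if name == 'answer':
...             return 42
...         raise AttributeError(name)
...
>>> Lazy().answer
42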
The primary must evaluate to an object that supports subscription, e.g. a list
or dictionary. User-defined objects can support subscription by defining a
__getitem__() method.
For built-in objects, there are two types of objects that support subscription:
If the primary is a mapping, the expression list must evaluate to an object
whose value is one of the keys of the mapping, and the subscription selects the
value in the mapping that corresponds to that key. (The expression list is a
tuple except if it has exactly one item.)
If the primary is a sequence, the expression (list) must evaluate to an integer
or a slice (as discussed in the following section).
The formal syntax makes no special provision for negative indices in
sequences; however, built-in sequences all provide a __getitem__()
method that interprets negative indices by adding the length of the sequence
to the index (so that x[-1] selects the last item of x). The
resulting value must be a nonnegative integer less than the number of items in
the sequence, and the subscription selects the item whose index is that value
(counting from zero). Since the support for negative indices and slicing
occurs in the object’s __getitem__() method, subclasses overriding
this method will need to explicitly add that support.
A string’s items are characters. A character is not a separate data type but a
string of exactly one character.
A slicing selects a range of items in a sequence object (e.g., a string, tuple
or list). Slicings may be used as expressions or as targets in assignment or
del statements. The syntax for a slicing:
There is ambiguity in the formal syntax here: anything that looks like an
expression list also looks like a slice list, so any subscription can be
interpreted as a slicing. Rather than further complicating the syntax, this is
disambiguated by defining that in this case the interpretation as a subscription
takes priority over the interpretation as a slicing (this is the case if the
slice list contains no proper slice).
The semantics for a slicing are as follows. The primary must evaluate to a
mapping object, and it is indexed (using the same __getitem__() method as
normal subscription) with a key that is constructed from the slice list, as
follows. If the slice list contains at least one comma, the key is a tuple
containing the conversion of the slice items; otherwise, the conversion of the
lone slice item is the key. The conversion of a slice item that is an
expression is that expression. The conversion of a proper slice is a slice
object (see section The standard type hierarchy) whose start, stop and
step attributes are the values of the expressions given as lower bound,
upper bound and stride, respectively, substituting None for missing
expressions.
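The correspondence between slice notation and slice objects can be seen directly:
>>> s = list(range(10))
>>> s[1:8:2]
[1, 3, 5, 7]
>>> s[slice(1, 8, 2)]        # the same key, constructed explicitly
[1, 3, 5, 7]
>>> sl = slice(1, 8, 2)
>>> sl.start, sl.stop, sl.step
(1, 8, 2)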
A trailing comma may be present after the positional and keyword arguments but
does not affect the semantics.
The primary must evaluate to a callable object (user-defined functions, built-in
functions, methods of built-in objects, class objects, methods of class
instances, and all objects having a __call__() method are callable). All
argument expressions are evaluated before the call is attempted. Please refer
to section Function definitions for the syntax of formal parameter lists.
If keyword arguments are present, they are first converted to positional
arguments, as follows. First, a list of unfilled slots is created for the
formal parameters. If there are N positional arguments, they are placed in the
first N slots. Next, for each keyword argument, the identifier is used to
determine the corresponding slot (if the identifier is the same as the first
formal parameter name, the first slot is used, and so on). If the slot is
already filled, a TypeError exception is raised. Otherwise, the value of
the argument is placed in the slot, filling it (even if the expression is
None, it fills the slot). When all arguments have been processed, the slots
that are still unfilled are filled with the corresponding default value from the
function definition. (Default values are calculated, once, when the function is
defined; thus, a mutable object such as a list or dictionary used as default
value will be shared by all calls that don’t specify an argument value for the
corresponding slot; this should usually be avoided.) If there are any unfilled
slots for which no default value is specified, a TypeError exception is
raised. Otherwise, the list of filled slots is used as the argument list for
the call.
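The shared-default pitfall mentioned above, together with the usual idiom for avoiding it (the function names are invented):
>>> def append_to(item, seq=[]):      # the default list is created once
...     seq.append(item)
...     return seq
...
>>> append_to(1)
[1]
>>> append_to(2)                      # the same list is reused across calls
[1, 2]
>>> def append_to_safe(item, seq=None):
...     if seq is None:
...         seq = []                  # a fresh list per call
...     seq.append(item)
...     return seq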
CPython implementation detail: An implementation may provide built-in functions whose positional parameters
do not have names, even if they are ‘named’ for the purpose of documentation,
and which therefore cannot be supplied by keyword. In CPython, this is the
case for functions implemented in C that use PyArg_ParseTuple() to
parse their arguments.
If there are more positional arguments than there are formal parameter slots, a
TypeError exception is raised, unless a formal parameter using the syntax
*identifier is present; in this case, that formal parameter receives a tuple
containing the excess positional arguments (or an empty tuple if there were no
excess positional arguments).
If any keyword argument does not correspond to a formal parameter name, a
TypeError exception is raised, unless a formal parameter using the syntax
**identifier is present; in this case, that formal parameter receives a
dictionary containing the excess keyword arguments (using the keywords as keys
and the argument values as corresponding values), or a (new) empty dictionary if
there were no excess keyword arguments.
If the syntax *expression appears in the function call, expression must
evaluate to an iterable. Elements from this iterable are treated as if they
were additional positional arguments; if there are positional arguments
x1, ..., xN, and expression evaluates to a sequence y1, ..., yM,
this is equivalent to a call with M+N positional arguments x1, ..., xN,
y1, ..., yM.
A consequence of this is that although the *expression syntax may appear
after some keyword arguments, it is processed before the keyword arguments
(and the **expression argument, if any – see below). So:
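>>> def f(a, b):
...     print(a, b)
...
>>> f(b=1, *(2,))
2 1
>>> f(a=1, *(2,))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: f() got multiple values for keyword argument 'a'
>>> f(1, *(2,))
1 2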
It is unusual for both keyword arguments and the *expression syntax to be
used in the same call, so in practice this confusion does not arise.
If the syntax **expression appears in the function call, expression must
evaluate to a mapping, the contents of which are treated as additional keyword
arguments. In the case of a keyword appearing in both expression and as an
explicit keyword argument, a TypeError exception is raised.
Formal parameters using the syntax *identifier or **identifier cannot be
used as positional argument slots or as keyword argument names.
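A compact sketch of both sides of the star syntax:
>>> def f(*args, **kwargs):       # collects excess positional and keyword arguments
...     return args, kwargs
...
>>> f(1, 2, key='v')
((1, 2), {'key': 'v'})
>>> f(*[1, 2], **{'key': 'v'})    # unpacking an iterable and a mapping in the call
((1, 2), {'key': 'v'})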
A call always returns some value, possibly None, unless it raises an
exception. How this value is computed depends on the type of the callable
object.
If it is—
a user-defined function:
The code block for the function is executed, passing it the argument list. The
first thing the code block will do is bind the formal parameters to the
arguments; this is described in section Function definitions. When the code block
executes a return statement, this specifies the return value of the
function call.
a built-in function or method:
The result is up to the interpreter; see Built-in Functions for the
descriptions of built-in functions and methods.
a class object:
A new instance of that class is returned.
a class instance method:
The corresponding user-defined function is called, with an argument list that is
one longer than the argument list of the call: the instance becomes the first
argument.
a class instance:
The class must define a __call__() method; the effect is then the same as
if that method was called.
Thus, in an unparenthesized sequence of power and unary operators, the operators
are evaluated from right to left (this does not constrain the evaluation order
for the operands): -1**2 results in -1.
The power operator has the same semantics as the built-in pow() function,
when called with two arguments: it yields its left argument raised to the power
of its right argument. The numeric arguments are first converted to a common
type, and the result is of that type.
For int operands, the result has the same type as the operands unless the second
argument is negative; in that case, all arguments are converted to float and a
float result is delivered. For example, 10**2 returns 100, but
10**-2 returns 0.01.
Raising 0.0 to a negative power results in a ZeroDivisionError.
Raising a negative number to a fractional power results in a complex
number. (In earlier versions it raised a ValueError.)
The unary - (minus) operator yields the negation of its numeric argument.
The unary + (plus) operator yields its numeric argument unchanged.
The unary ~ (invert) operator yields the bitwise inversion of its integer
argument. The bitwise inversion of x is defined as -(x+1). It only
applies to integral numbers.
In all three cases, if the argument does not have the proper type, a
TypeError exception is raised.
The binary arithmetic operations have the conventional priority levels. Note
that some of these operations also apply to certain non-numeric types. Apart
from the power operator, there are only two levels, one for multiplicative
operators and one for additive operators:
The * (multiplication) operator yields the product of its arguments. The
arguments must either both be numbers, or one argument must be an integer and
the other must be a sequence. In the former case, the numbers are converted to a
common type and then multiplied together. In the latter case, sequence
repetition is performed; a negative repetition factor yields an empty sequence.
The / (division) and // (floor division) operators yield the quotient of
their arguments. The numeric arguments are first converted to a common type.
Integer division yields a float, while floor division of integers results in an
integer; the result is that of mathematical division with the ‘floor’ function
applied to the result. Division by zero raises the ZeroDivisionError
exception.
The % (modulo) operator yields the remainder from the division of the first
argument by the second. The numeric arguments are first converted to a common
type. A zero right argument raises the ZeroDivisionError exception. The
arguments may be floating point numbers, e.g., 3.14 % 0.7 equals 0.34
(since 3.14 equals 4*0.7 + 0.34). The modulo operator always yields a
result with the same sign as its second operand (or zero); the absolute value of
the result is strictly smaller than the absolute value of the second operand
[1].
The floor division and modulo operators are connected by the following
identity: x == (x//y)*y + (x%y). Floor division and modulo are also
connected with the built-in function divmod(): divmod(x, y) == (x//y, x%y). [2].
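For example:
>>> 7 % 3, -7 % 3, 7 % -3      # the sign follows the second operand
(1, 2, -2)
>>> x, y = 17, 5
>>> x == (x//y)*y + (x%y)
True
>>> divmod(x, y) == (x//y, x%y)
True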
In addition to performing the modulo operation on numbers, the % operator is
also overloaded by string objects to perform old-style string formatting (also
known as interpolation). The syntax for string formatting is described in the
Python Library Reference, section Old String Formatting Operations.
The floor division operator, the modulo operator, and the divmod()
function are not defined for complex numbers. Instead, convert to a floating
point number using the abs() function if appropriate.
The + (addition) operator yields the sum of its arguments. The arguments
must either both be numbers or both sequences of the same type. In the former
case, the numbers are converted to a common type and then added together. In
the latter case, the sequences are concatenated.
The - (subtraction) operator yields the difference of its arguments. The
numeric arguments are first converted to a common type.
These operators accept integers as arguments. They shift the first argument to
the left or right by the number of bits given by the second argument.
A right shift by n bits is defined as division by pow(2,n). A left shift
by n bits is defined as multiplication with pow(2,n).
Note
In the current implementation, the right-hand operand is required
to be at most sys.maxsize. If the right-hand operand is larger than
sys.maxsize an OverflowError exception is raised.
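Equivalently:
>>> 1 << 4          # same as 1 * pow(2, 4)
16
>>> 256 >> 4        # same as 256 // pow(2, 4)
16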
Unlike C, all comparison operations in Python have the same priority, which is
lower than that of any arithmetic, shifting or bitwise operation. Also unlike
C, expressions like a < b < c have the interpretation that is conventional
in mathematics:
Comparisons can be chained arbitrarily, e.g., x < y <= z is equivalent to
x < y and y <= z, except that y is evaluated only once (but in both
cases z is not evaluated at all when x < y is found to be false).
Formally, if a, b, c, ..., y, z are expressions and op1, op2, ...,
opN are comparison operators, then a op1 b op2 c ... y opN z is equivalent
to a op1 b and b op2 c and ... y opN z, except that each expression is
evaluated at most once.
Note that a op1 b op2 c doesn’t imply any kind of comparison between a and
c, so that, e.g., x < y > z is perfectly legal (though perhaps not
pretty).
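A sketch showing that the middle expression is evaluated only once (middle is an invented helper):
>>> def middle():
...     print('evaluated')
...     return 5
...
>>> 1 < middle() <= 10
evaluated
True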
The operators <, >, ==, >=, <=, and != compare the
values of two objects. The objects need not have the same type. If both are
numbers, they are converted to a common type. Otherwise, the == and !=
operators always consider objects of different types to be unequal, while the
<, >, >= and <= operators raise a TypeError when
comparing objects of different types that do not implement these operators for
the given pair of types. You can control comparison behavior of objects of
non-built-in types by defining rich comparison methods like __gt__(),
described in section Basic customization.
Comparison of objects of the same type depends on the type:
Numbers are compared arithmetically.
The values float('NaN') and Decimal('NaN') are special.
They are identical to themselves (x is x) but are not equal to themselves
(x != x). Additionally, comparing any value to a not-a-number value
will return False. For example, both 3 < float('NaN') and
float('NaN') < 3 will return False.
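In an interactive session:
>>> nan = float('NaN')
>>> nan is nan
True
>>> nan == nan
False
>>> 3 < nan, nan < 3
(False, False)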
Bytes objects are compared lexicographically using the numeric values of their
elements.
Strings are compared lexicographically using the numeric equivalents (the
result of the built-in function ord()) of their characters. [3] Strings
and bytes objects cannot be compared.
Tuples and lists are compared lexicographically using comparison of
corresponding elements. This means that to compare equal, each element must
compare equal and the two sequences must be of the same type and have the same
length.
If not equal, the sequences are ordered the same as their first differing
elements. For example, [1, 2, x] <= [1, 2, y] has the same value as
x <= y. If the corresponding element does not exist, the shorter
sequence is ordered first (for example, [1, 2] < [1, 2, 3]).
Mappings (dictionaries) compare equal if and only if they have the same
(key, value) pairs. Order comparisons (<, <=, >=, >)
raise TypeError.
Sets and frozensets define comparison operators to mean subset and superset
tests. Those relations do not define total orderings (the two sets {1,2}
and {2,3} are not equal, nor subsets of one another, nor supersets of one
another). Accordingly, sets are not appropriate arguments for functions
which depend on total ordering. For example, min(), max(), and
sorted() produce undefined results given a list of sets as inputs.
Most other objects of built-in types compare unequal unless they are the same
object; the choice whether one object is considered smaller or larger than
another one is made arbitrarily but consistently within one execution of a
program.
Comparison of objects of differing types depends on whether either
of the types provide explicit support for the comparison. Most numeric types
can be compared with one another, but comparisons of float and
Decimal are not supported to avoid the inevitable confusion arising
from representation issues such as float('1.1') being inexactly represented
and therefore not exactly equal to Decimal('1.1') which is. When
cross-type comparison is not supported, the comparison method returns
NotImplemented. This can create the illusion of non-transitivity between
supported cross-type comparisons and unsupported comparisons. For example,
Decimal(2) == 2 and 2 == float(2) but Decimal(2) != float(2).
The operators in and not in test for membership. x in s evaluates to
true if x is a member of s, and false otherwise. x not in s returns the
negation of x in s. All built-in sequences and set types support this, as
do dictionaries, for which in tests whether the dictionary has a given key.
For container types such as list, tuple, set, frozenset, dict, or
collections.deque, the expression x in y is equivalent to
any(x is e or x == e for e in y).
For the string and bytes types, x in y is true if and only if x is a
substring of y. An equivalent test is y.find(x) != -1. Empty strings are
always considered to be a substring of any other string, so "" in "abc" will
return True.
For user-defined classes which define the __contains__() method, x in y
is true if and only if y.__contains__(x) is true.
For user-defined classes which do not define __contains__() but do define
__iter__(), x in y is true if some value z with x == z is
produced while iterating over y. If an exception is raised during the
iteration, it is as if in raised that exception.
Lastly, the old-style iteration protocol is tried: if a class defines
__getitem__(), x in y is true if and only if there is a non-negative
integer index i such that x == y[i], and all lower integer indices do not
raise IndexError exception. (If any other exception is raised, it is as
if in raised that exception).
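Sketches of the two user-defined hooks (the class names are invented):
>>> class Evens:
...     def __contains__(self, x):
...         return x % 2 == 0
...
>>> 4 in Evens()
True
>>> class Letters:
...     def __getitem__(self, i):
...         return 'abc'[i]        # raises IndexError past the end
...
>>> 'b' in Letters()               # found via the old-style iteration protocol
True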
The operator not in is defined to have the inverse truth value of
in.
The operators is and is not test for object identity: x is y is true
if and only if x and y are the same object. x is not y
yields the inverse truth value. [4]
In the context of Boolean operations, and also when expressions are used by
control flow statements, the following values are interpreted as false:
False, None, numeric zero of all types, and empty strings and containers
(including strings, tuples, lists, dictionaries, sets and frozensets). All
other values are interpreted as true. User-defined objects can customize their
truth value by providing a __bool__() method.
The operator not yields True if its argument is false, False
otherwise.
The expression x and y first evaluates x; if x is false, its value is
returned; otherwise, y is evaluated and the resulting value is returned.
The expression x or y first evaluates x; if x is true, its value is
returned; otherwise, y is evaluated and the resulting value is returned.
(Note that neither and nor or restrict the value and type
they return to False and True, but rather return the last evaluated
argument. This is sometimes useful, e.g., if s is a string that should be
replaced by a default value if it is empty, the expression s or 'foo' yields
the desired value. Because not has to invent a value anyway, it does
not bother to return a value of the same type as its argument, so e.g.,
not 'foo' yields False, not ''.)
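Interactively:
>>> '' or 'foo'        # the left operand is false, so the right one is returned
'foo'
>>> 'bar' and 'foo'    # the left operand is true, so evaluation continues
'foo'
>>> 0 and 1/0          # short-circuit: 1/0 is never evaluated
0
>>> not 'foo'
False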
Conditional expressions (sometimes called a “ternary operator”) have the lowest
priority of all Python operations.
The expression x if C else y first evaluates the condition, C (not x);
if C is true, x is evaluated and its value is returned; otherwise, y is
evaluated and its value is returned.
See PEP 308 for more details about conditional expressions.
Lambda forms (lambda expressions) have the same syntactic position as
expressions. They are a shorthand to create anonymous functions; the expression
lambda arguments: expression yields a function object. The unnamed object
behaves like a function object defined with
def <lambda>(arguments):
return expression
See section Function definitions for the syntax of parameter lists. Note that
functions created with lambda forms cannot contain statements or annotations.
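For example:
>>> add = lambda a, b: a + b    # roughly: def <lambda>(a, b): return a + b
>>> add(2, 3)
5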
An expression list containing at least one comma yields a tuple. The length of
the tuple is the number of expressions in the list. The expressions are
evaluated from left to right.
The trailing comma is required only to create a single tuple (a.k.a. a
singleton); it is optional in all other cases. A single expression without a
trailing comma doesn’t create a tuple, but rather yields the value of that
expression. (To create an empty tuple, use an empty pair of parentheses:
().)
Python evaluates expressions from left to right. Notice that while evaluating
an assignment, the right-hand side is evaluated before the left-hand side.
In the following lines, expressions will be evaluated in the arithmetic order of
their suffixes:
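expr1, expr2, expr3, expr4
(expr1, expr2, expr3, expr4)
{expr1: expr2, expr3: expr4}
expr1 + expr2 * (expr3 - expr4)
expr1(expr2, expr3, *expr4, **expr5)
expr3, expr4 = expr1, expr2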
The following table summarizes the operator precedences in Python, from lowest
precedence (least binding) to highest precedence (most binding). Operators in
the same box have the same precedence. Unless the syntax is explicitly given,
operators are binary. Operators in the same box group left to right (except for
comparisons, including tests, which all have the same precedence and chain from
left to right — see section Comparisons — and exponentiation, which
groups from right to left).
While abs(x%y)<abs(y) is true mathematically, for floats it may not be
true numerically due to roundoff. For example, and assuming a platform on which
a Python float is an IEEE 754 double-precision number, in order that
-1e-100 % 1e100 have the same sign as 1e100, the computed result is
-1e-100 + 1e100, which is numerically exactly equal to 1e100. The function
math.fmod() returns a result whose sign matches the sign of the
first argument instead, and so returns -1e-100 in this case. Which approach
is more appropriate depends on the application.
If x is very close to an exact integer multiple of y, it’s possible for
x//y to be one larger than (x-x%y)//y due to rounding. In such
cases, Python returns the latter result, in order to preserve that
divmod(x, y)[0] * y + x % y be very close to x.
While comparisons between strings make sense at the byte level, they may
be counter-intuitive to users. For example, the strings "\u00C7" and
"\u0327\u0043" compare differently, even though they both represent the
same unicode character (LATIN CAPITAL LETTER C WITH CEDILLA). To compare
strings in a human recognizable way, compare using
unicodedata.normalize().
Due to automatic garbage-collection, free lists, and the dynamic nature of
descriptors, you may notice seemingly unusual behaviour in certain uses of
the is operator, like those involving comparisons between instance
methods, or constants. Check their documentation for more info.
Simple statements are comprised within a single logical line. Several simple
statements may occur on a single line separated by semicolons. The syntax for
simple statements is:
Expression statements are used (mostly interactively) to compute and write a
value, or (usually) to call a procedure (a function that returns no meaningful
result; in Python, procedures return the value None). Other uses of
expression statements are allowed and occasionally useful. The syntax for an
expression statement is:
An expression statement evaluates the expression list (which may be a single
expression).
In interactive mode, if the value is not None, it is converted to a string
using the built-in repr() function and the resulting string is written to
standard output on a line by itself (except if the result is None, so that
procedure calls do not cause any output.)
(See section Primaries for the syntax definitions for the last three
symbols.)
An assignment statement evaluates the expression list (remember that this can be
a single expression or a comma-separated list, the latter yielding a tuple) and
assigns the single resulting object to each of the target lists, from left to
right.
Assignment is defined recursively depending on the form of the target (list).
When a target is part of a mutable object (an attribute reference, subscription
or slicing), the mutable object must ultimately perform the assignment and
decide about its validity, and may raise an exception if the assignment is
unacceptable. The rules observed by various types and the exceptions raised are
given with the definition of the object types (see section The standard type hierarchy).
Assignment of an object to a target list, optionally enclosed in parentheses or
square brackets, is recursively defined as follows.
If the target list is a single target: The object is assigned to that target.
If the target list is a comma-separated list of targets: The object must be an
iterable with the same number of items as there are targets in the target list,
and the items are assigned, from left to right, to the corresponding targets.
If the target list contains one target prefixed with an asterisk, called a
“starred” target: The object must be a sequence with at least as many items
as there are targets in the target list, minus one. The first items of the
sequence are assigned, from left to right, to the targets before the starred
target. The final items of the sequence are assigned to the targets after
the starred target. A list of the remaining items in the sequence is then
assigned to the starred target (the list can be empty).
Else: The object must be a sequence with the same number of items as there
are targets in the target list, and the items are assigned, from left to
right, to the corresponding targets.
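For example:
>>> first, *middle, last = [1, 2, 3, 4, 5]
>>> first, middle, last
(1, [2, 3, 4], 5)
>>> a, *rest = 'xy'
>>> a, rest
('x', ['y'])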
Assignment of an object to a single target is recursively defined as follows.
If the target is an identifier (name):
If the name does not occur in a global or nonlocal
statement in the current code block: the name is bound to the object in the
current local namespace.
Otherwise: the name is bound to the object in the global namespace or the
outer namespace determined by nonlocal, respectively.
The name is rebound if it was already bound. This may cause the reference
count for the object previously bound to the name to reach zero, causing the
object to be deallocated and its destructor (if it has one) to be called.
If the target is a target list enclosed in parentheses or in square brackets:
The object must be an iterable with the same number of items as there are
targets in the target list, and its items are assigned, from left to right,
to the corresponding targets.
If the target is an attribute reference: The primary expression in the
reference is evaluated. It should yield an object with assignable attributes;
if this is not the case, TypeError is raised. That object is then
asked to assign the assigned object to the given attribute; if it cannot
perform the assignment, it raises an exception (usually but not necessarily
AttributeError).
Note: If the object is a class instance and the attribute reference occurs on
both sides of the assignment operator, the RHS expression, a.x, can access
either an instance attribute or (if no instance attribute exists) a class
attribute. The LHS target a.x is always set as an instance attribute,
creating it if necessary. Thus, the two occurrences of a.x do not
necessarily refer to the same attribute: if the RHS expression refers to a
class attribute, the LHS creates a new instance attribute as the target of the
assignment:
class Cls:
x = 3 # class variable
inst = Cls()
inst.x = inst.x + 1 # writes inst.x as 4 leaving Cls.x as 3
This description does not necessarily apply to descriptor attributes, such as
properties created with property().
If the target is a subscription: The primary expression in the reference is
evaluated. It should yield either a mutable sequence object (such as a list)
or a mapping object (such as a dictionary). Next, the subscript expression is
evaluated.
If the primary is a mutable sequence object (such as a list), the subscript
must yield an integer. If it is negative, the sequence’s length is added to
it. The resulting value must be a nonnegative integer less than the
sequence’s length, and the sequence is asked to assign the assigned object to
its item with that index. If the index is out of range, IndexError is
raised (assignment to a subscripted sequence cannot add new items to a list).
If the primary is a mapping object (such as a dictionary), the subscript must
have a type compatible with the mapping’s key type, and the mapping is then
asked to create a key/datum pair which maps the subscript to the assigned
object. This can either replace an existing key/value pair with the same key
value, or insert a new key/value pair (if no key with the same value existed).
For user-defined objects, the __setitem__() method is called with
appropriate arguments.
If the target is a slicing: The primary expression in the reference is
evaluated. It should yield a mutable sequence object (such as a list). The
assigned object should be a sequence object of the same type. Next, the lower
and upper bound expressions are evaluated, insofar they are present; defaults
are zero and the sequence’s length. The bounds should evaluate to integers.
If either bound is negative, the sequence’s length is added to it. The
resulting bounds are clipped to lie between zero and the sequence’s length,
inclusive. Finally, the sequence object is asked to replace the slice with
the items of the assigned sequence. The length of the slice may be different
from the length of the assigned sequence, thus changing the length of the
target sequence, if the object allows it.
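For example:
>>> s = [1, 2, 3, 4, 5]
>>> s[1:3] = ['a', 'b', 'c']    # the slice and its replacement may differ in length
>>> s
[1, 'a', 'b', 'c', 4, 5]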
CPython implementation detail: In the current implementation, the syntax for targets is taken to be the same
as for expressions, and invalid syntax is rejected during the code generation
phase, causing less detailed error messages.
WARNING: Although the definition of assignment implies that overlaps between the
left-hand side and the right-hand side are 'safe' (for example a, b = b, a
swaps two variables), overlaps within the collection of assigned-to variables
are not safe! For instance, the following program prints [0, 2]:
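x = [0, 1]
i = 0
i, x[i] = 1, 2      # i is updated first, then x[i] -- now x[1] -- is set
print(x)            # [0, 2]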
(See section Primaries for the syntax definitions for the last three
symbols.)
An augmented assignment evaluates the target (which, unlike normal assignment
statements, cannot be an unpacking) and the expression list, performs the binary
operation specific to the type of assignment on the two operands, and assigns
the result to the original target. The target is only evaluated once.
An augmented assignment expression like x += 1 can be rewritten as x = x + 1 to achieve a similar, but not exactly equal effect. In the augmented
version, x is only evaluated once. Also, when possible, the actual operation
is performed in-place, meaning that rather than creating a new object and
assigning that to the target, the old object is modified instead.
With the exception of assigning to tuples and multiple targets in a single
statement, the assignment done by augmented assignment statements is handled the
same way as normal assignments. Similarly, with the exception of the possible
in-place behavior, the binary operation performed by augmented assignment is
the same as the normal binary operations.
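A sketch of the in-place behavior for a mutable type:
>>> a = [1, 2]
>>> b = a
>>> a += [3]        # in-place: the list object itself is modified
>>> b               # b names the same object, so it sees the change
[1, 2, 3]
>>> a = a + [4]     # plain assignment rebinds a to a new list
>>> b
[1, 2, 3]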
The simple form, assert expression, is equivalent to
if __debug__:
if not expression: raise AssertionError
The extended form, assert expression1, expression2, is equivalent to
if __debug__:
if not expression1: raise AssertionError(expression2)
These equivalences assume that __debug__ and AssertionError refer to
the built-in variables with those names. In the current implementation, the
built-in variable __debug__ is True under normal circumstances,
False when optimization is requested (command line option -O). The current
code generator emits no code for an assert statement when optimization is
requested at compile time. Note that it is unnecessary to include the source
code for the expression that failed in the error message; it will be displayed
as part of the stack trace.
Assignments to __debug__ are illegal. The value for the built-in variable
is determined when the interpreter starts.
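For example (the function is invented for the sketch; under -O the assert would be skipped entirely):
>>> def average(values):
...     assert len(values) > 0, 'values must be non-empty'
...     return sum(values) / len(values)
...
>>> average([])
Traceback (most recent call last):
  ...
AssertionError: values must be non-empty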
pass is a null operation — when it is executed, nothing happens.
It is useful as a placeholder when a statement is required syntactically, but no
code needs to be executed, for example:
def f(arg): pass # a function that does nothing (yet)
class C: pass # a class with no methods (yet)
Deletion is recursively defined in a way very similar to the way assignment is
defined. Rather than spelling it out in full detail, here are some hints.
Deletion of a target list recursively deletes each target, from left to right.
Deletion of a name removes the binding of that name from the local or global
namespace, depending on whether the name occurs in a global statement
in the same code block. If the name is unbound, a NameError exception
will be raised.
Deletion of attribute references, subscriptions and slicings is passed to the
primary object involved; deletion of a slicing is in general equivalent to
assignment of an empty slice of the right type (but even this is determined by
the sliced object).
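A brief sketch of the three kinds of deletion targets described above (names
are illustrative):

x = 1
del x                    # removes the binding of the name x

items = list(range(10))
del items[::2]           # slicing deletion is passed to the list object
print(items)             # [1, 3, 5, 7, 9]

class Box: pass
b = Box()
b.attr = 42
del b.attr               # attribute deletion is passed to the object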
Changed in version 3.2: Previously it was illegal to delete a name from the local namespace if it
occurs as a free variable in a nested block.
return may only occur syntactically nested in a function definition,
not within a nested class definition.
If an expression list is present, it is evaluated, else None is substituted.
return leaves the current function call with the expression list (or
None) as return value.
When return passes control out of a try statement with a
finally clause, that finally clause is executed before
really leaving the function.
In a generator function, the return statement is not allowed to
include an expression_list. In that context, a bare return
indicates that the generator is done and will cause StopIteration to be
raised.
The yield statement is only used when defining a generator function,
and is only used in the body of the generator function. Using a yield
statement in a function definition is sufficient to cause that definition to
create a generator function instead of a normal function.
When a generator function is called, it returns an iterator known as a generator
iterator, or more commonly, a generator. The body of the generator function is
executed by calling the next() function on the generator repeatedly until
it raises an exception.
When a yield statement is executed, the state of the generator is
frozen and the value of expression_list is returned to next()’s
caller. By “frozen” we mean that all local state is retained, including the
current bindings of local variables, the instruction pointer, and the internal
evaluation stack: enough information is saved so that the next time next()
is invoked, the function can proceed exactly as if the yield
statement were just another external call.
The yield statement is allowed in the try clause of a
try ... finally construct. If the generator is not
resumed before it is finalized (by reaching a zero reference count or by being
garbage collected), the generator-iterator’s close() method will be
called, allowing any pending finally clauses to execute.
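A minimal generator illustrating the frozen state between next() calls (the
names are illustrative):

def counter(n):
    i = 0
    while i < n:
        yield i          # state is frozen here between next() calls
        i += 1

gen = counter(3)
print(next(gen))         # 0
print(next(gen))         # 1, resumes with the bindings of i and n retained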
If no expressions are present, raise re-raises the last exception
that was active in the current scope. If no exception is active in the current
scope, a TypeError exception is raised indicating that this is an error
(if running under IDLE, a queue.Empty exception is raised instead).
Otherwise, raise evaluates the first expression as the exception
object. It must be either a subclass or an instance of BaseException.
If it is a class, the exception instance will be obtained when needed by
instantiating the class with no arguments.
The type of the exception is the exception instance’s class, the
value is the instance itself.
A traceback object is normally created automatically when an exception is raised
and attached to it as the __traceback__ attribute, which is writable.
You can create an exception and set your own traceback in one step using the
with_traceback() exception method (which returns the same exception
instance, with its traceback set to its argument), like so:
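# tracebackobj is an existing traceback object
raise Exception("foo occurred").with_traceback(tracebackobj)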
The from clause is used for exception chaining: if given, the second
expression must be another exception class or instance, which will then be
attached to the raised exception as the __cause__ attribute (which is
writable). If the raised exception is not handled, both exceptions will be
printed:
>>> try:
... print(1 / 0)
... except Exception as exc:
... raise RuntimeError("Something bad happened") from exc
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
ZeroDivisionError: int division or modulo by zero
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 4, in <module>
RuntimeError: Something bad happened
A similar mechanism works implicitly if an exception is raised inside an
exception handler: the previous exception is then attached as the new
exception’s __context__ attribute:
>>> try:
... print(1 / 0)
... except:
... raise RuntimeError("Something bad happened")
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
ZeroDivisionError: int division or modulo by zero
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 4, in <module>
RuntimeError: Something bad happened
Additional information on exceptions can be found in section Exceptions,
and information about handling exceptions is in section The try statement.
continue may only occur syntactically nested in a for or
while loop, but not nested in a function or class definition or
finally clause within that loop. It continues with the next
cycle of the nearest enclosing loop.
When continue passes control out of a try statement with a
finally clause, that finally clause is executed before
really starting the next loop cycle.
Import statements are executed in two steps: (1) find a module, and initialize
it if necessary; (2) define a name or names in the local namespace (of the scope
where the import statement occurs). The statement comes in two
forms differing on whether it uses the from keyword. The first form
(without from) repeats these steps for each identifier in the list.
The form with from performs step (1) once, and then performs step
(2) repeatedly. For a reference implementation of step (1), see the
importlib module.
To understand how step (1) occurs, one must first understand how Python handles
hierarchical naming of modules. To help organize modules and provide a
hierarchy in naming, Python has a concept of packages. A package can contain
other packages and modules while modules cannot contain other modules or
packages. From a file system perspective, packages are directories and modules
are files. The original specification for packages is still available to read,
although minor details have changed since the writing of that document.
Once the name of the module is known (unless otherwise specified, the term
“module” will refer to both packages and modules), searching
for the module or package can begin. The first place checked is
sys.modules, the cache of all modules that have been imported
previously. If the module is found there then it is used in step (2) of import
unless None is found in sys.modules, in which case
ImportError is raised.
If the module is not found in the cache, then sys.meta_path is searched
(the specification for sys.meta_path can be found in PEP 302).
The object is a list of finder objects which are queried in order as to
whether they know how to load the module by calling their find_module()
method with the name of the module. If the module happens to be contained
within a package (as denoted by the existence of a dot in the name), then a
second argument to find_module() is given as the value of the
__path__ attribute from the parent package (everything up to the last
dot in the name of the module being imported). If a finder can find the module
it returns a loader (discussed later) or returns None.
If none of the finders on sys.meta_path are able to find the module
then some implicitly defined finders are queried. Implementations of Python
vary in what implicit meta path finders are defined. The one they all do
define, though, is one that handles sys.path_hooks,
sys.path_importer_cache, and sys.path.
The implicit finder searches for the requested module in the “paths” specified
in one of two places (“paths” do not have to be file system paths). If the
module being imported is supposed to be contained within a package then the
second argument passed to find_module(), __path__ on the parent
package, is used as the source of paths. If the module is not contained in a
package then sys.path is used as the source of paths.
Once the source of paths is chosen it is iterated over to find a finder that
can handle that path. The dict at sys.path_importer_cache caches
finders for paths and is checked for a finder. If the path does not have a
finder cached then sys.path_hooks is searched by calling each object in
the list with the path as its single argument; each call either returns a
finder or raises ImportError. If a finder is returned then it is cached in
sys.path_importer_cache and then used for that path entry. If no finder
can be found but the path exists then a value of None is
stored in sys.path_importer_cache to signify that an implicit,
file-based finder that handles modules stored as individual files should be
used for that path. If the path does not exist then a finder which always
returns None is placed in the cache for the path.
If no finder can find the module then ImportError is raised. Otherwise
some finder returned a loader whose load_module() method is called with
the name of the module to load (see PEP 302 for the original definition of
loaders). A loader has several responsibilities to perform on a module it
loads. First, if the module already exists in sys.modules (a
possibility if the loader is called outside of the import machinery) then it
is to use that module for initialization and not a new module. But if the
module does not exist in sys.modules then it is to be added to that
dict before initialization begins. If an error occurs during loading of the
module and it was added to sys.modules it is to be removed from the
dict. If an error occurs but the module was already in sys.modules it
is left in the dict.
The loader must set several attributes on the module. __name__ is to be
set to the name of the module. __file__ is to be the “path” to the file
unless the module is built-in (and thus listed in
sys.builtin_module_names) in which case the attribute is not set.
If what is being imported is a package then __path__ is to be set to a
list of paths to be searched when looking for modules and packages contained
within the package being imported. __package__ is optional but should
be set to the name of the package that contains the module or package (the
empty string is used for modules not contained in a package). __loader__ is
also optional but should be set to the loader object that is loading the
module.
If an error occurs during loading then the loader raises ImportError if
some other exception is not already being propagated. Otherwise the loader
returns the module that was loaded and initialized.
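The following sketch shows the shape of the finder/loader protocol described
above. It is a minimal illustration, not a realistic importer: the module name
'demo' and its single attribute are invented, and a real finder would perform
an actual search rather than matching one hard-coded name.

import sys
import types

class StubFinderLoader:
    """Illustrative PEP 302 finder/loader for a made-up module 'demo'."""

    def find_module(self, fullname, path=None):
        # Return a loader (here, ourselves) if we handle the module, else None.
        return self if fullname == 'demo' else None

    def load_module(self, fullname):
        # Reuse an existing module if called outside the import machinery.
        if fullname in sys.modules:
            return sys.modules[fullname]
        mod = types.ModuleType(fullname)
        mod.__file__ = '<stub>'
        mod.__loader__ = self
        sys.modules[fullname] = mod       # add before initialization begins
        try:
            mod.answer = 42               # "initialize" the module
        except Exception:
            del sys.modules[fullname]     # remove on error, per the rules above
            raise
        return mod

sys.meta_path.append(StubFinderLoader())
import demo
print(demo.answer)                        # 42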
When step (1) finishes without raising an exception, step (2) can begin.
The first form of import statement binds the module name in the local
namespace to the module object, and then goes on to import the next identifier,
if any. If the module name is followed by as, the name following
as is used as the local name for the module.
The from form does not bind the module name: it goes through the list
of identifiers, looks each one of them up in the module found in step (1), and
binds the name in the local namespace to the object thus found. As with the
first form of import, an alternate local name can be supplied by
specifying “as localname”. If a name is not found,
ImportError is raised. If the list of identifiers is replaced by a star
('*'), all public names defined in the module are bound in the local
namespace of the import statement.
The public names defined by a module are determined by checking the module’s
namespace for a variable named __all__; if defined, it must be a sequence of
strings which are names defined or imported by that module. The names given in
__all__ are all considered public and are required to exist. If __all__
is not defined, the set of public names includes all names found in the module’s
namespace which do not begin with an underscore character ('_').
__all__ should contain the entire public API. It is intended to avoid
accidentally exporting items that are not part of the API (such as library
modules which were imported and used within the module).
The from form with * may only occur in a module scope. The wild
card form of import — import * — is only allowed at the module level.
Attempting to use it in class or function definitions will raise a
SyntaxError.
When specifying what module to import you do not have to specify the absolute
name of the module. When a module or package is contained within another
package it is possible to make a relative import within the same top package
without having to mention the package name. By using leading dots in the
specified module or package after from you can specify how high to
traverse up the current package hierarchy without specifying exact names. One
leading dot means the current package where the module making the import
exists. Two dots means up one package level. Three dots is up two levels, etc.
So if you execute from . import mod from a module in the pkg package
then you will end up importing pkg.mod. If you execute from ..subpkg2
import mod from within pkg.subpkg1 you will import pkg.subpkg2.mod.
The specification for relative imports is contained within PEP 328.
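For instance, given a hypothetical package layout (all names invented for
illustration), the relative forms resolve as follows:

# Layout:
#   pkg/__init__.py
#   pkg/mod.py
#   pkg/mod2.py
#   pkg/subpkg1/__init__.py
#   pkg/subpkg1/helper.py
#   pkg/subpkg2/__init__.py
#   pkg/subpkg2/mod.py

# In pkg/mod2.py (a module directly inside pkg):
from . import mod             # imports pkg.mod

# In pkg/subpkg1/helper.py:
from ..subpkg2 import mod     # imports pkg.subpkg2.mod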
importlib.import_module() is provided to support applications that
determine which modules need to be loaded dynamically.
A future statement is a directive to the compiler that a particular
module should be compiled using syntax or semantics that will be available in a
specified future release of Python. The future statement is intended to ease
migration to future versions of Python that introduce incompatible changes to
the language. It allows use of the new features on a per-module basis before
the release in which the feature becomes standard.
A future statement must appear near the top of the module. The only lines that
can appear before a future statement are:
the module docstring (if any),
comments,
blank lines, and
other future statements.
The features recognized by Python 3.0 are absolute_import, division,
generators, unicode_literals, print_function, nested_scopes and
with_statement. They are all redundant because they are always enabled, and
only kept for backwards compatibility.
A future statement is recognized and treated specially at compile time: Changes
to the semantics of core constructs are often implemented by generating
different code. It may even be the case that a new feature introduces new
incompatible syntax (such as a new reserved word), in which case the compiler
may need to parse the module differently. Such decisions cannot be pushed off
until runtime.
For any given release, the compiler knows which feature names have been defined,
and raises a compile-time error if a future statement contains a feature not
known to it.
The direct runtime semantics are the same as for any import statement: there is
a standard module __future__, described later, and it will be imported in
the usual way at the time the future statement is executed.
The interesting runtime semantics depend on the specific feature enabled by the
future statement.
Note that there is nothing special about the statement:
import __future__ [as name]
That is not a future statement; it’s an ordinary import statement with no
special semantics or syntax restrictions.
Code compiled by calls to the built-in functions exec() and compile()
that occur in a module M containing a future statement will, by default,
use the new syntax or semantics associated with the future statement. This can
be controlled by optional arguments to compile() — see the documentation
of that function for details.
A future statement typed at an interactive interpreter prompt will take effect
for the rest of the interpreter session. If an interpreter is started with the
-i option, is passed a script name to execute, and the script includes
a future statement, it will be in effect in the interactive session started
after the script is executed.
The global statement is a declaration which holds for the entire
current code block. It means that the listed identifiers are to be interpreted
as globals. It would be impossible to assign to a global variable without
global, although free variables may refer to globals without being
declared global.
Names listed in a global statement must not be used in the same code
block textually preceding that global statement.
Names listed in a global statement must not be defined as formal
parameters or in a for loop control target, class
definition, function definition, or import statement.
CPython implementation detail: The current implementation does not enforce the latter two restrictions, but
programs should not abuse this freedom, as future implementations may enforce
them or silently change the meaning of the program.
Programmer’s note: the global is a directive to the parser. It
applies only to code parsed at the same time as the global statement.
In particular, a global statement contained in a string or code
object supplied to the built-in exec() function does not affect the code
block containing the function call, and code contained in such a string is
unaffected by global statements in the code containing the function
call. The same applies to the eval() and compile() functions.
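For example (names are illustrative):

counter = 0

def bump():
    global counter       # without this, counter += 1 would treat counter
    counter += 1         # as a local and raise UnboundLocalError

bump()
print(counter)           # 1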
The nonlocal statement causes the listed identifiers to refer to
previously bound variables in the nearest enclosing scope. This is important
because the default behavior for binding is to search the local namespace
first. The statement allows encapsulated code to rebind variables outside of
the local scope besides the global (module) scope.
Names listed in a nonlocal statement, unlike those listed in a
global statement, must refer to pre-existing bindings in an
enclosing scope (the scope in which a new binding should be created cannot
be determined unambiguously).
Names listed in a nonlocal statement must not collide with
pre-existing bindings in the local scope.
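For example, rebinding a variable in an enclosing function scope (names are
illustrative):

def make_counter():
    count = 0
    def increment():
        nonlocal count   # rebinds the enclosing count, not a new local
        count += 1
        return count
    return increment

inc = make_counter()
print(inc(), inc())      # 1 2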
The Python interpreter can get its input from a number of sources: from a script
passed to it as standard input or as program argument, typed in interactively,
from a module source file, etc. This chapter gives the syntax used in these
cases.
While a language specification need not prescribe how the language interpreter
is invoked, it is useful to have a notion of a complete Python program. A
complete Python program is executed in a minimally initialized environment: all
built-in and standard modules are available, but none have been initialized,
except for sys (various system services), builtins (built-in
functions, exceptions and None) and __main__. The latter is used to
provide the local and global namespace for execution of the complete program.
The syntax for a complete Python program is that for file input, described in
the next section.
The interpreter may also be invoked in interactive mode; in this case, it does
not read and execute a complete program but reads and executes one statement
(possibly compound) at a time. The initial environment is identical to that of
a complete program; each statement is executed in the namespace of
__main__.
Under Unix, a complete program can be passed to the interpreter in three forms:
with the -c string command line option, as a file passed as the
first command line argument, or as standard input. If the file or standard
input is a tty device, the interpreter enters interactive mode; otherwise, it
executes the file as a complete program.
Note that a (top-level) compound statement must be followed by a blank line in
interactive mode; this is needed to help the parser detect the end of the input.
The Python installers for the Windows platform usually include the entire
standard library and often also include many additional components. For
Unix-like operating systems Python is normally provided as a collection of
packages, so it may be necessary to use the packaging tools provided with the
operating system to obtain some or all of the optional components.
In addition to the standard library, there is a growing collection of
several thousand components (from individual programs and modules to
packages and entire application development frameworks), available from
the Python Package Index.
This means that whether you read the manual from front to back, or flip to an
arbitrary chapter out of boredom, you will get a reasonable and fairly complete
picture of the module covered by that chapter and of its applications. Of
course, you need not read this manual like a novel; you can consult the table
of contents, or search the index for a specific function, module, or term.
Finally, if you enjoy learning about random subjects, choose a random page
number (see module random) and read a section or two. Regardless of the order
in which you read the parts of this manual, it is best to start with the
Built-in Functions chapter, as the rest of the manual assumes familiarity with
that material.
Return a new array of bytes. The bytearray type is a mutable
sequence of integers in the range 0 <= x < 256. It has most of the usual
methods of mutable sequences, described in Mutable Sequence Types, as well
as most methods that the bytes type has, see Bytes and Byte Array Methods.
The optional source parameter can be used to initialize the array in a few
different ways:
If it is a string, you must also give the encoding (and optionally,
errors) parameters; bytearray() then converts the string to
bytes using str.encode().
If it is an integer, the array will have that size and will be
initialized with null bytes.
If it is an object conforming to the buffer interface, a read-only buffer
of the object will be used to initialize the bytes array.
If it is an iterable, it must be an iterable of integers in the range
0 <= x < 256, which are used as the initial contents of the array.
Without an argument, an array of size 0 is created.
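For example:

>>> bytearray('abc', 'ascii')     # from a string, encoding required
bytearray(b'abc')
>>> bytearray(3)                  # from an integer: three null bytes
bytearray(b'\x00\x00\x00')
>>> bytearray([65, 66, 67])       # from an iterable of integers
bytearray(b'ABC')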
Return a new “bytes” object, which is an immutable sequence of integers in
the range 0 <= x < 256. bytes is an immutable version of
bytearray – it has the same non-mutating methods and the same
indexing and slicing behavior.
Accordingly, constructor arguments are interpreted as for bytearray().
Bytes objects can also be created with literals, see String and Bytes literals.
Return True if the object argument appears callable,
False if not. If this returns true, it is still possible that a
call fails, but if it is false, calling object will never succeed.
Note that classes are callable (calling a class returns a new instance);
instances are callable if their class has a __call__() method.
New in version 3.2: This function was first removed in Python 3.0 and then brought back
in Python 3.2.
Return the string representing a character whose Unicode codepoint is the integer
i. For example, chr(97) returns the string 'a'. This is the
inverse of ord(). The valid range for the argument is from 0 through
1,114,111 (0x10FFFF in base 16). ValueError will be raised if i is
outside that range.
Note that on narrow Unicode builds, the result is a string of
length two for i greater than 65,535 (0xFFFF in hexadecimal).
A class method receives the class as implicit first argument, just like an
instance method receives the instance. To declare a class method, use this
idiom:
class C:
@classmethod
def f(cls, arg1, arg2, ...): ...
The @classmethod form is a function decorator – see the description
of function definitions in Function definitions for details.
It can be called either on the class (such as C.f()) or on an instance (such
as C().f()). The instance is ignored except for its class. If a class
method is called for a derived class, the derived class object is passed as the
implied first argument.
Class methods are different from C++ or Java static methods. If you want those,
see staticmethod() in this section.
For more information on class methods, consult the documentation on the standard
type hierarchy in The standard type hierarchy.
Compile the source into a code or AST object. Code objects can be executed
by exec() or eval(). source can either be a string or an AST
object. Refer to the ast module documentation for information on how
to work with AST objects.
The filename argument should give the file from which the code was read;
pass some recognizable value if it wasn’t read from a file ('<string>' is
commonly used).
The mode argument specifies what kind of code must be compiled; it can be
'exec' if source consists of a sequence of statements, 'eval' if it
consists of a single expression, or 'single' if it consists of a single
interactive statement (in the latter case, expression statements that
evaluate to something other than None will be printed).
The optional arguments flags and dont_inherit control which future
statements (see PEP 236) affect the compilation of source. If neither
is present (or both are zero) the code is compiled with those future
statements that are in effect in the code that is calling compile. If the
flags argument is given and dont_inherit is not (or is zero) then the
future statements specified by the flags argument are used in addition to
those that would be used anyway. If dont_inherit is a non-zero integer then
the flags argument is it – the future statements in effect around the call
to compile are ignored.
Future statements are specified by bits which can be bitwise ORed together to
specify multiple statements. The bitfield required to specify a given feature
can be found as the compiler_flag attribute on the _Feature
instance in the __future__ module.
The argument optimize specifies the optimization level of the compiler; the
default value of -1 selects the optimization level of the interpreter as
given by -O options. Explicit levels are 0 (no optimization;
__debug__ is true), 1 (asserts are removed, __debug__ is false)
or 2 (docstrings are removed too).
This function raises SyntaxError if the compiled source is invalid,
and TypeError if the source contains null bytes.
Note
When compiling a string with multi-line code in 'single' or
'eval' mode, input must be terminated by at least one newline
character. This is to facilitate detection of incomplete and complete
statements in the code module.
Changed in version 3.2: Allowed use of Windows and Mac newlines. Also input in 'exec' mode
does not have to end in a newline anymore. Added the optimize parameter.
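For example, compiling source once and then executing or evaluating it:

>>> code = compile('a = 1 + 2', '<string>', 'exec')
>>> ns = {}
>>> exec(code, ns)
>>> ns['a']
3
>>> eval(compile('1 + 2', '<string>', 'eval'))
3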
Create a complex number with the value real + imag*j or convert a string or
number to a complex number. If the first parameter is a string, it will be
interpreted as a complex number and the function must be called without a second
parameter. The second parameter can never be a string. Each argument may be any
numeric type (including complex). If imag is omitted, it defaults to zero and
the function serves as a numeric conversion function like int()
and float(). If both arguments are omitted, returns 0j.
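For example:

>>> complex(1, 2)
(1+2j)
>>> complex('1+2j')        # string form; no second argument allowed
(1+2j)
>>> complex(3)
(3+0j)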
This is a relative of setattr(). The arguments are an object and a
string. The string must be the name of one of the object’s attributes. The
function deletes the named attribute, provided the object allows it. For
example, delattr(x, 'foobar') is equivalent to del x.foobar.
dict([arg])
Create a new data dictionary, optionally with items taken from arg.
The dictionary type is described in Mapping Types — dict.
For other containers see the built in list, set, and
tuple classes, and the collections module.
Without arguments, return the list of names in the current local scope. With an
argument, attempt to return a list of valid attributes for that object.
If the object has a method named __dir__(), this method will be called and
must return the list of attributes. This allows objects that implement a custom
__getattr__() or __getattribute__() function to customize the way
dir() reports their attributes.
If the object does not provide __dir__(), the function tries its best to
gather information from the object’s __dict__ attribute, if defined, and
from its type object. The resulting list is not necessarily complete, and may
be inaccurate when the object has a custom __getattr__().
The default dir() mechanism behaves differently with different types of
objects, as it attempts to produce the most relevant, rather than complete,
information:
If the object is a module object, the list contains the names of the module’s
attributes.
If the object is a type or class object, the list contains the names of its
attributes, and recursively of the attributes of its bases.
Otherwise, the list contains the object’s attributes’ names, the names of its
class’s attributes, and recursively of the attributes of its class’s base
classes.
The resulting list is sorted alphabetically. For example:
>>> import struct
>>> dir() # show the names in the module namespace
['__builtins__', '__doc__', '__name__', 'struct']
>>> dir(struct) # show the names in the struct module
['Struct', '__builtins__', '__doc__', '__file__', '__name__',
'__package__', '_clearcache', 'calcsize', 'error', 'pack', 'pack_into',
'unpack', 'unpack_from']
>>> class Shape(object):
def __dir__(self):
return ['area', 'perimeter', 'location']
>>> s = Shape()
>>> dir(s)
['area', 'perimeter', 'location']
Note
Because dir() is supplied primarily as a convenience for use at an
interactive prompt, it tries to supply an interesting set of names more
than it tries to supply a rigorously or consistently defined set of names,
and its detailed behavior may change across releases. For example,
metaclass attributes are not in the result list when the argument is a
class.
Take two (non complex) numbers as arguments and return a pair of numbers
consisting of their quotient and remainder when using integer division. With
mixed operand types, the rules for binary arithmetic operators apply. For
integers, the result is the same as (a // b, a % b). For floating point
numbers the result is (q, a % b), where q is usually math.floor(a / b)
but may be 1 less than that. In any case q * b + a % b is very close to
a; if a % b is non-zero it has the same sign as b, and
0 <= abs(a % b) < abs(b).
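For example:

>>> divmod(7, 3)           # integers: same as (7 // 3, 7 % 3)
(2, 1)
>>> divmod(-7, 3)          # the remainder takes the sign of the divisor
(-3, 2)
>>> divmod(7.5, 2)
(3.0, 1.5)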
Return an enumerate object. iterable must be a sequence, an
iterator, or some other object which supports iteration. The
__next__() method of the iterator returned by enumerate() returns a
tuple containing a count (from start which defaults to 0) and the
values obtained from iterating over iterable.
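For example:

>>> seasons = ['Spring', 'Summer', 'Fall', 'Winter']
>>> list(enumerate(seasons))
[(0, 'Spring'), (1, 'Summer'), (2, 'Fall'), (3, 'Winter')]
>>> list(enumerate(seasons, start=1))
[(1, 'Spring'), (2, 'Summer'), (3, 'Fall'), (4, 'Winter')]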
The arguments are a string and optional globals and locals. If provided,
globals must be a dictionary. If provided, locals can be any mapping
object.
The expression argument is parsed and evaluated as a Python expression
(technically speaking, a condition list) using the globals and locals
dictionaries as global and local namespace. If the globals dictionary is
present and lacks ‘__builtins__’, the current globals are copied into globals
before expression is parsed. This means that expression normally has full
access to the standard builtins module and restricted environments are
propagated. If the locals dictionary is omitted it defaults to the globals
dictionary. If both dictionaries are omitted, the expression is executed in the
environment where eval() is called. The return value is the result of
the evaluated expression. Syntax errors are reported as exceptions. Example:
>>> x = 1
>>> eval('x+1')
2
This function can also be used to execute arbitrary code objects (such as
those created by compile()). In this case pass a code object instead
of a string. If the code object has been compiled with 'exec' as the
mode argument, eval()’s return value will be None.
Hints: dynamic execution of statements is supported by the exec()
function. The globals() and locals() functions
return the current global and local dictionary, respectively, which may be
useful to pass around for use by eval() or exec().
See ast.literal_eval() for a function that can safely evaluate strings
with expressions containing only literals.
This function supports dynamic execution of Python code. object must be
either a string or a code object. If it is a string, the string is parsed as
a suite of Python statements which is then executed (unless a syntax error
occurs). [1] If it is a code object, it is simply executed. In all cases,
the code that’s executed is expected to be valid as file input (see the
section “File input” in the Reference Manual). Be aware that the
return and yield statements may not be used outside of
function definitions even within the context of code passed to the
exec() function. The return value is None.
In all cases, if the optional parts are omitted, the code is executed in the
current scope. If only globals is provided, it must be a dictionary, which
will be used for both the global and the local variables. If globals and
locals are given, they are used for the global and local variables,
respectively. If provided, locals can be any mapping object.
If the globals dictionary does not contain a value for the key
__builtins__, a reference to the dictionary of the built-in module
builtins is inserted under that key. That way you can control what
builtins are available to the executed code by inserting your own
__builtins__ dictionary into globals before passing it to exec().
Note
The built-in functions globals() and locals() return the current
global and local dictionary, respectively, which may be useful to pass around
for use as the second and third argument to exec().
Note
The default locals act as described for function locals() below:
modifications to the default locals dictionary should not be attempted.
Pass an explicit locals dictionary if you need to see effects of the
code on locals after function exec() returns.
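For example, executing statements in an explicit globals dictionary:

>>> g = {}
>>> exec("x = 5\ny = x * 2", g)
>>> g['y']
10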
Construct an iterator from those elements of iterable for which function
returns true. iterable may be either a sequence, a container which
supports iteration, or an iterator. If function is None, the identity
function is assumed, that is, all elements of iterable that are false are
removed.
Note that filter(function, iterable) is equivalent to the generator
expression (item for item in iterable if function(item)) if function is
not None and (item for item in iterable if item) if function is
None.
See itertools.filterfalse() for the complementary function that returns
elements of iterable for which function returns false.
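For example:

>>> list(filter(str.isalpha, ['a', '1', 'b', '']))
['a', 'b']
>>> list(filter(None, [0, 1, '', 'x', None]))   # identity: drop false values
[1, 'x']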
If the argument is a string, it should contain a decimal number, optionally
preceded by a sign, and optionally embedded in whitespace. The optional
sign may be '+' or '-'; a '+' sign has no effect on the value
produced. The argument may also be a string representing a NaN
(not-a-number), or a positive or negative infinity. More precisely, the
input must conform to the following grammar after leading and trailing
whitespace characters are removed:
Here floatnumber is the form of a Python floating-point literal,
described in Floating point literals. Case is not significant, so, for example,
“inf”, “Inf”, “INFINITY” and “iNfINity” are all acceptable spellings for
positive infinity.
Otherwise, if the argument is an integer or a floating point number, a
floating point number with the same value (within Python’s floating point
precision) is returned. If the argument is outside the range of a Python
float, an OverflowError will be raised.
For a general Python object x, float(x) delegates to
x.__float__().
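For example:

>>> float('+1.23')
1.23
>>> float('   -12345\n')       # surrounding whitespace is ignored
-12345.0
>>> float('1e-003')
0.001
>>> float('-Infinity')
-inf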
Convert a value to a “formatted” representation, as controlled by
format_spec. The interpretation of format_spec will depend on the type
of the value argument, however there is a standard formatting syntax that
is used by most built-in types: Format Specification Mini-Language.
The default format_spec is an empty string which usually gives the same
effect as calling str(value).
A call to format(value, format_spec) is translated to
type(value).__format__(format_spec) which bypasses the instance
dictionary when searching for the value’s __format__() method. A
TypeError exception is raised if the method is not found or if either
the format_spec or the return value are not strings.
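For example:

>>> format(1234.5678, '.2f')       # fixed-point, two decimals
'1234.57'
>>> format(255, '#06x')            # hex with 0x prefix, zero-padded to width 6
'0x00ff'
>>> format('hi', '>5')             # right-align in a field of width 5
'   hi'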
frozenset([iterable])
Return a frozenset object, optionally with elements taken from iterable.
The frozenset type is described in Set Types — set, frozenset.
For other containers see the built in dict, list, and
tuple classes, and the collections module.
Return the value of the named attribute of object. name must be a string.
If the string is the name of one of the object’s attributes, the result is the
value of that attribute. For example, getattr(x, 'foobar') is equivalent to
x.foobar. If the named attribute does not exist, default is returned if
provided, otherwise AttributeError is raised.
Return a dictionary representing the current global symbol table. This is always
the dictionary of the current module (inside a function or method, this is the
module where it is defined, not the module from which it is called).
The arguments are an object and a string. The result is True if the
string is the name of one of the object’s attributes, False if not. (This
is implemented by calling getattr(object, name) and seeing whether it
raises an AttributeError or not.)
Return the hash value of the object (if it has one). Hash values are integers.
They are used to quickly compare dictionary keys during a dictionary lookup.
Numeric values that compare equal have the same hash value (even if they are of
different types, as is the case for 1 and 1.0).
Invoke the built-in help system. (This function is intended for interactive
use.) If no argument is given, the interactive help system starts on the
interpreter console. If the argument is a string, then the string is looked up
as the name of a module, function, class, method, keyword, or documentation
topic, and a help page is printed on the console. If the argument is any other
kind of object, a help page on the object is generated.
This function is added to the built-in namespace by the site module.
Convert an integer number to a hexadecimal string. The result is a valid Python
expression. If x is not a Python int object, it has to define an
__index__() method that returns an integer.
Note
To obtain a hexadecimal string representation for a float, use the
float.hex() method.
Return the “identity” of an object. This is an integer which
is guaranteed to be unique and constant for this object during its lifetime.
Two objects with non-overlapping lifetimes may have the same id()
value.
CPython implementation detail: This is the address of the object in memory.
If the prompt argument is present, it is written to standard output without
a trailing newline. The function then reads a line from input, converts it
to a string (stripping a trailing newline), and returns that. When EOF is
read, EOFError is raised. Example:
>>> s = input('--> ')
--> Monty Python's Flying Circus
>>> s
"Monty Python's Flying Circus"
If the readline module was loaded, then input() will use it
to provide elaborate line editing and history features.
Convert a number or string to an integer. If no arguments are given, return
0. If a number is given, return number.__int__(). Conversion of
floating point numbers to integers truncates towards zero. A string must be
a base-radix integer literal optionally preceded by ‘+’ or ‘-’ (with no space
in between) and optionally surrounded by whitespace. A base-n literal
consists of the digits 0 to n-1, with ‘a’ to ‘z’ (or ‘A’ to ‘Z’) having
values 10 to 35. The default base is 10. The allowed values are 0 and 2-36.
Base-2, -8, and -16 literals can be optionally prefixed with 0b/0B,
0o/0O, or 0x/0X, as with integer literals in code. Base 0
means to interpret exactly as a code literal, so that the actual base is 2,
8, 10, or 16, and so that int('010', 0) is not legal, while
int('010') is, as well as int('010', 8).
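For example:

>>> int('100', 2)       # base 2
4
>>> int('0x1f', 16)     # a prefix is allowed when it matches the base
31
>>> int('010', 8)
8
>>> int('0o10', 0)      # base 0: interpret exactly as a code literal
8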
Return true if the object argument is an instance of the classinfo
argument, or of a (direct or indirect) subclass thereof. If object is not
an object of the given type, the function always returns false. If
classinfo is not a class (type object), it may be a tuple of type objects,
or may recursively contain other such tuples (other sequence types are not
accepted). If classinfo is not a type or tuple of types and such tuples,
a TypeError exception is raised.
Return true if class is a subclass (direct or indirect) of classinfo. A
class is considered a subclass of itself. classinfo may be a tuple of class
objects, in which case every entry in classinfo will be checked. In any other
case, a TypeError exception is raised.
Return an iterator object. The first argument is interpreted very
differently depending on the presence of the second argument. Without a
second argument, object must be a collection object which supports the
iteration protocol (the __iter__() method), or it must support the
sequence protocol (the __getitem__() method with integer arguments
starting at 0). If it does not support either of those protocols,
TypeError is raised. If the second argument, sentinel, is given,
then object must be a callable object. The iterator created in this case
will call object with no arguments for each call to its __next__()
method; if the value returned is equal to sentinel, StopIteration
will be raised, otherwise the value will be returned.
One useful application of the second form of iter() is to read lines of
a file until a certain line is reached. The following example reads a file
until the readline() method returns an empty string:
with open('mydata.txt') as fp:
for line in iter(fp.readline, ''):
process_line(line)
Return a list whose items are the same and in the same order as iterable‘s
items. iterable may be either a sequence, a container that supports
iteration, or an iterator object. If iterable is already a list, a copy is
made and returned, similar to iterable[:]. For instance, list('abc')
returns ['a','b','c'] and list((1,2,3)) returns [1,2,3].
If no argument is given, returns a new empty list, [].
Update and return a dictionary representing the current local symbol table.
Free variables are returned by locals() when it is called in function
blocks, but not in class blocks.
Note
The contents of this dictionary should not be modified; changes may not
affect the values of local and free variables used by the interpreter.
Return an iterator that applies function to every item of iterable,
yielding the results. If additional iterable arguments are passed,
function must take that many arguments and is applied to the items from all
iterables in parallel. With multiple iterables, the iterator stops when the
shortest iterable is exhausted. For cases where the function inputs are
already arranged into argument tuples, see itertools.starmap().
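For example:

>>> list(map(abs, [-2, -1, 0, 1]))
[2, 1, 0, 1]
>>> list(map(pow, [2, 3, 4], [5, 2, 3]))   # multiple iterables in parallel
[32, 9, 64]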
With a single argument iterable, return the largest item of a non-empty
iterable (such as a string, tuple or list). With more than one argument, return
the largest of the arguments.
The optional keyword-only key argument specifies a one-argument ordering
function like that used for list.sort().
If multiple items are maximal, the function returns the first one
encountered. This is consistent with other sort-stability preserving tools
such as sorted(iterable, key=keyfunc, reverse=True)[0] and
heapq.nlargest(1, iterable, key=keyfunc).
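For example:

>>> max([3, 1, 4, 1, 5])
5
>>> max('apple', 'banana', 'cherry', key=len)   # first maximal item wins
'banana'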
memoryview(obj)
Return a “memory view” object created from the given argument. See
memoryview type for more information.
With a single argument iterable, return the smallest item of a non-empty
iterable (such as a string, tuple or list). With more than one argument, return
the smallest of the arguments.
The optional keyword-only key argument specifies a one-argument ordering
function like that used for list.sort().
If multiple items are minimal, the function returns the first one
encountered. This is consistent with other sort-stability preserving tools
such as sorted(iterable, key=keyfunc)[0] and
heapq.nsmallest(1, iterable, key=keyfunc).
Retrieve the next item from the iterator by calling its __next__()
method. If default is given, it is returned if the iterator is exhausted,
otherwise StopIteration is raised.
Return a new featureless object. object is a base for all classes.
It has the methods that are common to all instances of Python classes. This
function does not accept any arguments.
Note
object does not have a __dict__, so you can’t assign
arbitrary attributes to an instance of the object class.
Convert an integer number to an octal string. The result is a valid Python
expression. If x is not a Python int object, it has to define an
__index__() method that returns an integer.
Open file and return a corresponding stream. If the file cannot be opened,
an IOError is raised.
file is either a string or bytes object giving the pathname (absolute or
relative to the current working directory) of the file to be opened or
an integer file descriptor of the file to be wrapped. (If a file descriptor
is given, it is closed when the returned I/O object is closed, unless
closefd is set to False.)
mode is an optional string that specifies the mode in which the file is
opened. It defaults to 'r' which means open for reading in text mode.
Other common values are 'w' for writing (truncating the file if it
already exists), and 'a' for appending (which on some Unix systems,
means that all writes append to the end of the file regardless of the
current seek position). In text mode, if encoding is not specified the
encoding used is platform dependent. (For reading and writing raw bytes use
binary mode and leave encoding unspecified.) The available modes are:
Character   Meaning
'r'         open for reading (default)
'w'         open for writing, truncating the file first
'a'         open for writing, appending to the end of the file if it exists
'b'         binary mode
't'         text mode (default)
'+'         open a disk file for updating (reading and writing)
'U'         universal newlines mode (for backwards compatibility; should not
            be used in new code)
The default mode is 'r' (open for reading text, synonym of 'rt').
For binary read-write access, the mode 'w+b' opens and truncates the file
to 0 bytes. 'r+b' opens the file without truncation.
As mentioned in the Overview, Python distinguishes between binary
and text I/O. Files opened in binary mode (including 'b' in the mode
argument) return contents as bytes objects without any decoding. In
text mode (the default, or when 't' is included in the mode argument),
the contents of the file are returned as str, the bytes having been
first decoded using a platform-dependent encoding or using the specified
encoding if given.
Note
Python doesn’t depend on the underlying operating system’s notion of text
files; all the processing is done by Python itself, and is therefore
platform-independent.
buffering is an optional integer used to set the buffering policy. Pass 0
to switch buffering off (only allowed in binary mode), 1 to select line
buffering (only usable in text mode), and an integer > 1 to indicate the size
of a fixed-size chunk buffer. When no buffering argument is given, the
default buffering policy works as follows:
Binary files are buffered in fixed-size chunks; the size of the buffer is
chosen using a heuristic trying to determine the underlying device’s “block
size” and falling back on io.DEFAULT_BUFFER_SIZE. On many systems,
the buffer will typically be 4096 or 8192 bytes long.
“Interactive” text files (files for which isatty() returns True) use
line buffering. Other text files use the policy described above for binary
files.
encoding is the name of the encoding used to decode or encode the file.
This should only be used in text mode. The default encoding is platform
dependent (whatever locale.getpreferredencoding() returns), but any
encoding supported by Python can be used. See the codecs module for
the list of supported encodings.
errors is an optional string that specifies how encoding and decoding
errors are to be handled; this cannot be used in binary mode. Pass
'strict' to raise a ValueError exception if there is an encoding
error (the default of None has the same effect), or pass 'ignore' to
ignore errors. (Note that ignoring encoding errors can lead to data loss.)
'replace' causes a replacement marker (such as '?') to be inserted
where there is malformed data. When writing, 'xmlcharrefreplace'
(replace with the appropriate XML character reference) or
'backslashreplace' (replace with backslashed escape sequences) can be
used. Any other error handling name that has been registered with
codecs.register_error() is also valid.
newline controls how universal newlines works (it only applies to text
mode). It can be None, '', '\n', '\r', and '\r\n'. It
works as follows:
On input, if newline is None, universal newlines mode is enabled.
Lines in the input can end in '\n', '\r', or '\r\n', and these
are translated into '\n' before being returned to the caller. If it is
'', universal newline mode is enabled, but line endings are returned to
the caller untranslated. If it has any of the other legal values, input
lines are only terminated by the given string, and the line ending is
returned to the caller untranslated.
On output, if newline is None, any '\n' characters written are
translated to the system default line separator, os.linesep. If
newline is '', no translation takes place. If newline is any of
the other legal values, any '\n' characters written are translated to
the given string.
If closefd is False and a file descriptor rather than a filename was
given, the underlying file descriptor will be kept open when the file is
closed. If a filename is given closefd has no effect and must be True
(the default).
The type of file object returned by the open() function depends on the
mode. When open() is used to open a file in a text mode ('w',
'r', 'wt', 'rt', etc.), it returns a subclass of
io.TextIOBase (specifically io.TextIOWrapper). When used
to open a file in a binary mode with buffering, the returned class is a
subclass of io.BufferedIOBase. The exact class varies: in read
binary mode, it returns a io.BufferedReader; in write binary and
append binary modes, it returns a io.BufferedWriter, and in
read/write mode, it returns a io.BufferedRandom. When buffering is
disabled, the raw stream, a subclass of io.RawIOBase,
io.FileIO, is returned.
Given a string representing one Unicode character, return an integer
representing the Unicode code
point of that character. For example, ord('a') returns the integer 97
and ord('\u2020') returns 8224. This is the inverse of chr().
On wide Unicode builds, if the argument length is not one, a
TypeError will be raised. On narrow Unicode builds, strings
of length two are accepted when they form a UTF-16 surrogate pair.
Return x to the power y; if z is present, return x to the power y,
modulo z (computed more efficiently than pow(x, y) % z). The two-argument
form pow(x, y) is equivalent to using the power operator: x**y.
The arguments must have numeric types. With mixed operand types, the
coercion rules for binary arithmetic operators apply. For int
operands, the result has the same type as the operands (after coercion)
unless the second argument is negative; in that case, all arguments are
converted to float and a float result is delivered. For example, 10**2
returns 100, but 10**-2 returns 0.01. If the second argument is
negative, the third argument must be omitted. If z is present, x and y
must be of integer types, and y must be non-negative.
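For example:

>>> pow(10, 2)
100
>>> pow(10, -2)          # negative exponent: float result
0.01
>>> pow(2, 10, 100)      # same as 2**10 % 100, computed more efficiently
24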
Print object(s) to the stream file, separated by sep and followed by
end. sep, end and file, if present, must be given as keyword
arguments.
All non-keyword arguments are converted to strings like str() does and
written to the stream, separated by sep and followed by end. Both sep
and end must be strings; they can also be None, which means to use the
default values. If no object is given, print() will just write
end.
The file argument must be an object with a write(string) method; if it
is not present or None, sys.stdout will be used.
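For example:

>>> print('a', 'b', 'c', sep='-', end='.\n')
a-b-c.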
fget is a function for getting an attribute value, likewise fset is a
function for setting, and fdel a function for del’ing, an attribute. Typical
use is to define a managed attribute x:
class C:
def __init__(self):
self._x = None
def getx(self):
return self._x
def setx(self, value):
self._x = value
def delx(self):
del self._x
x = property(getx, setx, delx, "I'm the 'x' property.")
If c is an instance of C, c.x will invoke the getter,
c.x = value will invoke the setter and del c.x the deleter.
If given, doc will be the docstring of the property attribute. Otherwise, the
property will copy fget‘s docstring (if it exists). This makes it possible to
create read-only properties easily using property() as a decorator:
class Parrot:
def __init__(self):
self._voltage = 100000
@property
def voltage(self):
"""Get the current voltage."""
return self._voltage
turns the voltage() method into a “getter” for a read-only attribute
with the same name.
A property object has getter, setter, and deleter
methods usable as decorators that create a copy of the property with the
corresponding accessor function set to the decorated function. This is
best explained with an example:
class C:
def __init__(self):
self._x = None
@property
def x(self):
"""I'm the 'x' property."""
return self._x
@x.setter
def x(self, value):
self._x = value
@x.deleter
def x(self):
del self._x
This code is exactly equivalent to the first example. Be sure to give the
additional functions the same name as the original property (x in this
case.)
The returned property also has the attributes fget, fset, and
fdel corresponding to the constructor arguments.
This is a versatile function to create iterables yielding arithmetic
progressions. It is most often used in for loops. The arguments
must be integers. If the step argument is omitted, it defaults to 1.
If the start argument is omitted, it defaults to 0. The full form
returns an iterable of integers [start, start + step, start + 2*step, ...].
If step is positive, the last element is the largest start + i*step less
than stop; if step is negative, the last element is the smallest
start + i*step greater than stop. step must not be zero (or else
ValueError is raised). Example:
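>>> list(range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(range(1, 11))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> list(range(0, 30, 5))
[0, 5, 10, 15, 20, 25]
>>> list(range(0, -10, -1))
[0, -1, -2, -3, -4, -5, -6, -7, -8, -9]
>>> list(range(0))
[]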
Range objects implement the collections.Sequence ABC, and provide
features such as containment tests, element index lookup, slicing and
support for negative indices:
>>> r = range(0, 20, 2)
>>> r
range(0, 20, 2)
>>> 11 in r
False
>>> 10 in r
True
>>> r.index(10)
5
>>> r[5]
10
>>> r[:5]
range(0, 10, 2)
>>> r[-1]
18
Ranges containing absolute values larger than sys.maxsize are permitted
but some features (such as len()) will raise OverflowError.
Changed in version 3.2: Implement the Sequence ABC.
Support slicing and negative indices.
Test integers for membership in constant time instead of iterating
through all items.
Return a string containing a printable representation of an object. For many
types, this function makes an attempt to return a string that would yield an
object with the same value when passed to eval(), otherwise the
representation is a string enclosed in angle brackets that contains the name
of the type of the object together with additional information often
including the name and address of the object. A class can control what this
function returns for its instances by defining a __repr__() method.
Return a reverse iterator. seq must be an object which has
a __reversed__() method or supports the sequence protocol (the
__len__() method and the __getitem__() method with integer
arguments starting at 0).
Return the floating point value x rounded to n digits after the decimal
point. If n is omitted, it defaults to zero. Delegates to
x.__round__(n).
For the built-in types supporting round(), values are rounded to the
closest multiple of 10 to the power minus n; if two multiples are equally
close, rounding is done toward the even choice (so, for example, both
round(0.5) and round(-0.5) are 0, and round(1.5) is 2).
The return value is an integer if called with one argument, otherwise of the
same type as x.
Note
The behavior of round() for floats can be surprising: for example,
round(2.675,2) gives 2.67 instead of the expected 2.68.
This is not a bug: it’s a result of the fact that most decimal fractions
can’t be represented exactly as a float. See Floating Point Arithmetic:
Issues and Limitations for
more information.
set([iterable])
Return a new set, optionally with elements taken from iterable.
The set type is described in Set Types — set, frozenset.
This is the counterpart of getattr(). The arguments are an object, a
string and an arbitrary value. The string may name an existing attribute or a
new attribute. The function assigns the value to the attribute, provided the
object allows it. For example, setattr(x, 'foobar', 123) is equivalent to
x.foobar = 123.
Return a slice object representing the set of indices specified by
range(start, stop, step). The start and step arguments default to
None. Slice objects have read-only data attributes start,
stop and step which merely return the argument values (or their
default). They have no other explicit functionality; however they are used by
Numerical Python and other third party extensions. Slice objects are also
generated when extended indexing syntax is used. For example:
a[start:stop:step] or a[start:stop,i]. See itertools.islice()
for an alternate version that returns an iterator.
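For example:

>>> s = slice(1, 8, 2)
>>> 'abcdefghij'[s]            # same as 'abcdefghij'[1:8:2]
'bdfh'
>>> s.start, s.stop, s.step
(1, 8, 2)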
Return a new sorted list from the items in iterable.
Has two optional arguments which must be specified as keyword arguments.
key specifies a function of one argument that is used to extract a comparison
key from each list element: key=str.lower. The default value is None
(compare the elements directly).
reverse is a boolean value. If set to True, then the list elements are
sorted as if each comparison were reversed.
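For example:

>>> sorted([5, 2, 3, 1, 4])
[1, 2, 3, 4, 5]
>>> sorted(['banana', 'Apple', 'cherry'], key=str.lower)
['Apple', 'banana', 'cherry']
>>> sorted([5, 2, 3], reverse=True)
[5, 3, 2]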
A static method does not receive an implicit first argument. To declare a static
method, use this idiom:
class C:
@staticmethod
def f(arg1, arg2, ...): ...
The @staticmethod form is a function decorator – see the
description of function definitions in Function definitions for details.
It can be called either on the class (such as C.f()) or on an instance (such
as C().f()). The instance is ignored except for its class.
Static methods in Python are similar to those found in Java or C++. Also see
classmethod() for a variant that is useful for creating alternate class
constructors.
For more information on static methods, consult the documentation on the
standard type hierarchy in The standard type hierarchy.
Return a string version of an object, using one of the following modes:
If encoding and/or errors are given, str() will decode the
object which can either be a byte string or a character buffer using
the codec for encoding. The encoding parameter is a string giving
the name of an encoding; if the encoding is not known, LookupError
is raised. Error handling is done according to errors; this specifies the
treatment of characters which are invalid in the input encoding. If
errors is 'strict' (the default), a ValueError is raised on
errors, while a value of 'ignore' causes errors to be silently ignored,
and a value of 'replace' causes the official Unicode replacement character,
U+FFFD, to be used to replace input characters which cannot be decoded.
See also the codecs module.
When only object is given, this returns its nicely printable representation.
For strings, this is the string itself. The difference with repr(object)
is that str(object) does not always attempt to return a string that is
acceptable to eval(); its goal is to return a printable string.
With no arguments, this returns the empty string.
Objects can specify what str(object) returns by defining a __str__()
special method.
Sums start and the items of an iterable from left to right and returns the
total. start defaults to 0. The iterable’s items are normally numbers,
and the start value is not allowed to be a string.
For some use cases, there are good alternatives to sum().
The preferred, fast way to concatenate a sequence of strings is by calling
''.join(sequence). To add floating point values with extended precision,
see math.fsum(). To concatenate a series of iterables, consider using
itertools.chain().
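For example, math.fsum() avoids the loss of precision that sum() can
accumulate with floats:
>>> sum([0.1] * 10)
0.9999999999999999
>>> import math
>>> math.fsum([0.1] * 10)
1.0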
Return a proxy object that delegates method calls to a parent or sibling
class of type. This is useful for accessing inherited methods that have
been overridden in a class. The search order is the same as that used by
getattr() except that the type itself is skipped.
The __mro__ attribute of the type lists the method resolution
search order used by both getattr() and super(). The attribute
is dynamic and can change whenever the inheritance hierarchy is updated.
If the second argument is omitted, the super object returned is unbound. If
the second argument is an object, isinstance(obj, type) must be true. If
the second argument is a type, issubclass(type2, type) must be true (this
is useful for classmethods).
There are two typical use cases for super. In a class hierarchy with
single inheritance, super can be used to refer to parent classes without
naming them explicitly, thus making the code more maintainable. This use
closely parallels the use of super in other programming languages.
The second use case is to support cooperative multiple inheritance in a
dynamic execution environment. This use case is unique to Python and is
not found in statically compiled languages or languages that only support
single inheritance. This makes it possible to implement “diamond diagrams”
where multiple base classes implement the same method. Good design dictates
that this method have the same calling signature in every case (because the
order of calls is determined at runtime, because that order adapts
to changes in the class hierarchy, and because that order can include
sibling classes that are unknown prior to runtime).
For both use cases, a typical superclass call looks like this:
class C(B):
    def method(self, arg):
        super().method(arg)    # This does the same thing as:
                               # super(C, self).method(arg)
Note that super() is implemented as part of the binding process for
explicit dotted attribute lookups such as super().__getitem__(name).
It does so by implementing its own __getattribute__() method for searching
classes in a predictable order that supports cooperative multiple inheritance.
Accordingly, super() is undefined for implicit lookups using statements or
operators such as super()[name].
Also note that super() is not limited to use inside methods. The two
argument form specifies the arguments exactly and makes the appropriate
references. The zero argument form automatically searches the stack frame
for the class (__class__) and the first argument.
Return a tuple whose items are the same and in the same order as iterable’s
items. iterable may be a sequence, a container that supports iteration, or an
iterator object. If iterable is already a tuple, it is returned unchanged.
For instance, tuple('abc') returns ('a', 'b', 'c') and tuple([1, 2, 3])
returns (1, 2, 3). If no argument is given, returns a new empty tuple, ().
Return the type of an object. The return value is a type object and
generally the same object as returned by object.__class__.
The isinstance() built-in function is recommended for testing the type
of an object, because it takes subclasses into account.
With three arguments, type() functions as a constructor as detailed
below.
type(name, bases, dict)
Return a new type object. This is essentially a dynamic form of the
class statement. The name string is the class name and becomes the
__name__ attribute; the bases tuple itemizes the base classes and
becomes the __bases__ attribute; and the dict dictionary is the
namespace containing definitions for class body and becomes the __dict__
attribute. For example, the following two statements create identical
type objects:
>>> class X:
...     a = 1
...
>>> X = type('X', (object,), dict(a=1))
Make an iterator that aggregates elements from each of the iterables.
Returns an iterator of tuples, where the i-th tuple contains
the i-th element from each of the argument sequences or iterables. The
iterator stops when the shortest input iterable is exhausted. With a single
iterable argument, it returns an iterator of 1-tuples. With no arguments,
it returns an empty iterator. Equivalent to:
def zip(*iterables):
    # zip('ABCD', 'xy') --> Ax By
    sentinel = object()
    iterables = [iter(it) for it in iterables]
    while iterables:
        result = []
        for it in iterables:
            elem = next(it, sentinel)
            if elem is sentinel:
                return
            result.append(elem)
        yield tuple(result)
The left-to-right evaluation order of the iterables is guaranteed. This
makes possible an idiom for clustering a data series into n-length groups
using zip(*[iter(s)]*n).
zip() should only be used with unequal length inputs when you don’t
care about trailing, unmatched values from the longer iterables. If those
values are important, use itertools.zip_longest() instead.
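For example, the clustering idiom and zip_longest() side by side:
>>> s = 'ABCDEF'
>>> list(zip(*[iter(s)] * 2))
[('A', 'B'), ('C', 'D'), ('E', 'F')]
>>> from itertools import zip_longest
>>> list(zip_longest('ABCD', 'xy', fillvalue='-'))
[('A', 'x'), ('B', 'y'), ('C', '-'), ('D', '-')]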
zip() in conjunction with the * operator can be used to unzip a
list:
>>> x = [1, 2, 3]
>>> y = [4, 5, 6]
>>> zipped = zip(x, y)
>>> list(zipped)
[(1, 4), (2, 5), (3, 6)]
>>> x2, y2 = zip(*zip(x, y))
>>> x == list(x2) and y == list(y2)
True
This is an advanced function that is not needed in everyday Python
programming, unlike importlib.import_module().
This function is invoked by the import statement. It can be
replaced (by importing the builtins module and assigning to
builtins.__import__) in order to change semantics of the
import statement, but nowadays it is usually simpler to use import
hooks (see PEP 302). Direct use of __import__() is rare, except in
cases where you want to import a module whose name is only known at runtime.
The function imports the module name, potentially using the given globals
and locals to determine how to interpret the name in a package context.
The fromlist gives the names of objects or submodules that should be
imported from the module given by name. The standard implementation does
not use its locals argument at all, and uses its globals only to
determine the package context of the import statement.
level specifies whether to use absolute or relative imports. 0 (the
default) means only perform absolute imports. Positive values for
level indicate the number of parent directories to search relative to the
directory of the module calling __import__().
When the name variable is of the form package.module, normally, the
top-level package (the name up till the first dot) is returned, not the
module named by name. However, when a non-empty fromlist argument is
given, the module named by name is returned.
For example, the statement import spam results in bytecode resembling the
following code:
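spam = __import__('spam', globals(), locals(), [], 0)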
Note that the parser only accepts the Unix-style end of line convention.
If you are reading the code from a file, make sure to use newline conversion
mode to convert Windows or Mac-style newlines.
In the current implementation, local variable bindings cannot normally be
affected this way, but variables retrieved from other scopes (such as modules)
can be. This may change.
The sole value of the type NoneType. None is frequently used to
represent the absence of a value, as when default arguments are not passed to a
function. Assignments to None are illegal and raise a SyntaxError.
Special value which can be returned by the “rich comparison” special methods
(__eq__(), __lt__(), and friends), to indicate that the comparison
is not implemented with respect to the other type.
This constant is true if Python was not started with an -O option.
See also the assert statement.
Note
The names None, False, True and __debug__
cannot be reassigned (assignments to them, even as an attribute name, raise
SyntaxError), so they can be considered “true” constants.
The site module (which is imported automatically during startup, except
if the -S command-line option is given) adds several constants to the
built-in namespace. They are useful for the interactive interpreter shell and
should not be used in programs.
Objects that when printed, print a message like “Use quit() or Ctrl-D
(i.e. EOF) to exit”, and when called, raise SystemExit with the
specified exit code.
Objects that when printed, print a message like “Type license() to see the
full license text”, and when called, display the corresponding text in a
pager-like fashion (one screen at a time).
The following sections describe the standard types that are built into the
interpreter.
The principal built-in types are numerics, sequences, mappings, classes,
instances and exceptions.
Some operations are supported by several object types; in particular,
practically all objects can be compared, tested for truth value, and converted
to a string (with the repr() function or the slightly different
str() function). The latter function is implicitly used when an object is
written by the print() function.
Any object can be tested for truth value, for use in an if or
while condition or as operand of the Boolean operations below. The
following values are considered false:
None
False
zero of any numeric type, for example, 0, 0.0, 0j.
any empty sequence, for example, '', (), [].
any empty mapping, for example, {}.
instances of user-defined classes, if the class defines a __bool__() or
__len__() method, when that method returns the integer zero or
bool value False. [1]
All other values are considered true — so objects of many types are always
true.
Operations and built-in functions that have a Boolean result always return 0
or False for false and 1 or True for true, unless otherwise stated.
(Important exception: the Boolean operations or and and always return
one of their operands.)
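For example:
>>> bool([]), bool([0])
(False, True)
>>> 0 or 'default'
'default'
>>> 'first' and 'second'
'second'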
There are eight comparison operations in Python. They all have the same
priority (which is higher than that of the Boolean operations). Comparisons can
be chained arbitrarily; for example, x < y <= z is equivalent to
x < y and y <= z, except that y is evaluated only once (but in both cases
z is not evaluated at all when x < y is found to be false).
This table summarizes the comparison operations:
Operation    Meaning
<            strictly less than
<=           less than or equal
>            strictly greater than
>=           greater than or equal
==           equal
!=           not equal
is           object identity
is not       negated object identity
Objects of different types, except different numeric types, never compare equal.
Furthermore, some types (for example, function objects) support only a degenerate
notion of comparison where any two objects of that type are unequal. The <,
<=, > and >= operators will raise a TypeError exception when
comparing a complex number with another built-in numeric type, when the objects
are of different types that cannot be compared, or in other cases where there is
no defined ordering.
Non-identical instances of a class normally compare as non-equal unless the
class defines the __eq__() method.
Instances of a class cannot be ordered with respect to other instances of the
same class, or other types of object, unless the class defines enough of the
methods __lt__(), __le__(), __gt__(), and __ge__() (in
general, __lt__() and __eq__() are sufficient, if you want the
conventional meanings of the comparison operators).
The behavior of the is and is not operators cannot be
customized; also they can be applied to any two objects and never raise an
exception.
Two more operations with the same syntactic priority, in and
not in, are supported only by sequence types (below).
There are three distinct numeric types: integers, floating
point numbers, and complex numbers. In addition, Booleans are a
subtype of integers. Integers have unlimited precision. Floating point
numbers are usually implemented using double in C; information
about the precision and internal representation of floating point
numbers for the machine on which your program is running is available
in sys.float_info. Complex numbers have a real and imaginary
part, which are each a floating point number. To extract these parts
from a complex number z, use z.real and z.imag. (The standard
library includes additional numeric types: fractions.Fraction, which holds
rationals, and decimal.Decimal, which holds floating-point numbers with
user-definable precision.)
Numbers are created by numeric literals or as the result of built-in functions
and operators. Unadorned integer literals (including hex, octal and binary
numbers) yield integers. Numeric literals containing a decimal point or an
exponent sign yield floating point numbers. Appending 'j' or 'J' to a
numeric literal yields an imaginary number (a complex number with a zero real
part) which you can add to an integer or float to get a complex number with real
and imaginary parts.
Python fully supports mixed arithmetic: when a binary arithmetic operator has
operands of different numeric types, the operand with the “narrower” type is
widened to that of the other, where integer is narrower than floating point,
which is narrower than complex. Comparisons between numbers of mixed type use
the same rule. [2] The constructors int(), float(), and
complex() can be used to produce numbers of a specific type.
All numeric types (except complex) support the following operations, sorted by
ascending priority (operations in the same box have the same priority; all
numeric operations have a higher priority than comparison operations):
Also referred to as integer division. The resultant value is a whole
integer, though the result’s type is not necessarily int. The result is
always rounded towards minus infinity: 1//2 is 0, (-1)//2 is
-1, 1//(-2) is -1, and (-1)//(-2) is 0.
Not for complex numbers. Instead convert to floats using abs() if
appropriate.
Conversion from floating point to integer may round or truncate
as in C; see functions floor() and ceil() in the math module
for well-defined conversions.
float also accepts the strings “nan” and “inf” with an optional prefix “+”
or “-” for Not a Number (NaN) and positive or negative infinity.
Python defines pow(0, 0) and 0**0 to be 1, as is common for
programming languages.
The numeric literals accepted include the digits 0 to 9 or any
Unicode equivalent (code points with the Nd property).
Integers support additional operations that make sense only for bit-strings.
Negative numbers are treated as their 2’s complement value (this assumes a
sufficiently large number of bits that no overflow occurs during the operation).
The priorities of the binary bitwise operations are all lower than the numeric
operations and higher than the comparisons; the unary operation ~ has the
same priority as the other unary numeric operations (+ and -).
This table lists the bit-string operations sorted in ascending priority
(operations in the same box have the same priority):
Operation    Result                             Notes
x | y        bitwise or of x and y
x ^ y        bitwise exclusive or of x and y
x & y        bitwise and of x and y
x << n       x shifted left by n bits           (1)(2)
x >> n       x shifted right by n bits          (1)(3)
~x           the bits of x inverted
Notes:
(1) Negative shift counts are illegal and cause a ValueError to be raised.
(2) A left shift by n bits is equivalent to multiplication by pow(2, n)
    without overflow check.
(3) A right shift by n bits is equivalent to division by pow(2, n) without
    overflow check.
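For example:
>>> 5 << 2          # same as 5 * 2**2
20
>>> 20 >> 2         # same as 20 // 2**2
5
>>> ~5              # same as -(5 + 1)
-6
>>> 0b1100 & 0b1010, 0b1100 | 0b1010, 0b1100 ^ 0b1010
(8, 14, 6)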
Return the number of bits necessary to represent an integer in binary,
excluding the sign and leading zeros:
>>> n = -37
>>> bin(n)
'-0b100101'
>>> n.bit_length()
6
More precisely, if x is nonzero, then x.bit_length() is the
unique positive integer k such that 2**(k-1) <= abs(x) < 2**k.
Equivalently, when abs(x) is small enough to have a correctly
rounded logarithm, then k = 1 + int(log(abs(x), 2)).
If x is zero, then x.bit_length() returns 0.
Equivalent to:
def bit_length(self):
    s = bin(self)        # binary representation:  bin(-37) --> '-0b100101'
    s = s.lstrip('-0b')  # remove leading zeros and minus sign
    return len(s)        # len('100101') --> 6
The integer is represented using length bytes. An OverflowError
is raised if the integer is not representable with the given number of
bytes.
The byteorder argument determines the byte order used to represent the
integer. If byteorder is "big", the most significant byte is at the
beginning of the byte array. If byteorder is "little", the most
significant byte is at the end of the byte array. To request the native
byte order of the host system, use sys.byteorder as the byte order
value.
The signed argument determines whether two’s complement is used to
represent the integer. If signed is False and a negative integer is
given, an OverflowError is raised. The default value for signed
is False.
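For example:
>>> (1024).to_bytes(2, byteorder='big')
b'\x04\x00'
>>> (-1024).to_bytes(10, byteorder='big', signed=True)
b'\xff\xff\xff\xff\xff\xff\xff\xff\xfc\x00'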
The argument bytes must either support the buffer protocol or be an
iterable producing bytes. bytes and bytearray are
examples of built-in objects that support the buffer protocol.
The byteorder argument determines the byte order used to represent the
integer. If byteorder is "big", the most significant byte is at the
beginning of the byte array. If byteorder is "little", the most
significant byte is at the end of the byte array. To request the native
byte order of the host system, use sys.byteorder as the byte order
value.
The signed argument indicates whether two’s complement is used to
represent the integer.
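For example:
>>> int.from_bytes(b'\x00\x10', byteorder='big')
16
>>> int.from_bytes(b'\x00\x10', byteorder='little')
4096
>>> int.from_bytes(b'\xfc\x00', byteorder='big', signed=True)
-1024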
Return a pair of integers whose ratio is exactly equal to the
original float and with a positive denominator. Raises
OverflowError on infinities and a ValueError on
NaNs.
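For example:
>>> (0.25).as_integer_ratio()
(1, 4)
>>> (0.1).as_integer_ratio()
(3602879701896397, 36028797018963968)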
Two methods support conversion to
and from hexadecimal strings. Since Python’s floats are stored
internally as binary numbers, converting a float to or from a
decimal string usually involves a small rounding error. In
contrast, hexadecimal strings allow exact representation and
specification of floating-point numbers. This can be useful when
debugging, and in numerical work.
Return a representation of a floating-point number as a hexadecimal
string. For finite floating-point numbers, this representation
will always include a leading 0x and a trailing p and
exponent.
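A hexadecimal string takes the form:
[sign] ['0x'] integer ['.' fraction] ['p' exponent]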
where the optional sign may be either + or -, integer
and fraction are strings of hexadecimal digits, and exponent
is a decimal integer with an optional leading sign. Case is not
significant, and there must be at least one hexadecimal digit in
either the integer or the fraction. This syntax is similar to the
syntax specified in section 6.4.4.2 of the C99 standard, and also to
the syntax used in Java 1.5 onwards. In particular, the output of
float.hex() is usable as a hexadecimal floating-point literal in
C or Java code, and hexadecimal strings produced by C’s %a format
character or Java’s Double.toHexString are accepted by
float.fromhex().
Note that the exponent is written in decimal rather than hexadecimal,
and that it gives the power of 2 by which to multiply the coefficient.
For example, the hexadecimal string 0x3.a7p10 represents the
floating-point number (3+10./16+7./16**2)*2.0**10, or
3740.0:
>>> float.fromhex('0x3.a7p10')
3740.0
Applying the reverse conversion to 3740.0 gives a different
hexadecimal string representing the same number:
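>>> float.hex(3740.0)
'0x1.d380000000000p+11'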
For numbers x and y, possibly of different types, it’s a requirement
that hash(x) == hash(y) whenever x == y (see the __hash__()
method documentation for more details). For ease of implementation and
efficiency across a variety of numeric types (including int,
float, decimal.Decimal and fractions.Fraction)
Python’s hash for numeric types is based on a single mathematical function
that’s defined for any rational number, and hence applies to all instances of
int and fractions.Fraction, and all finite instances of
float and decimal.Decimal. Essentially, this function is
given by reduction modulo P for a fixed prime P. The value of P is
made available to Python as the modulus attribute of
sys.hash_info.
CPython implementation detail: Currently, the prime used is P = 2**31 - 1
on machines with 32-bit C longs and P = 2**61 - 1 on machines with 64-bit
C longs.
Here are the rules in detail:
If x = m/n is a nonnegative rational number and n is not divisible
by P, define hash(x) as m * invmod(n, P) % P, where invmod(n, P)
gives the inverse of n modulo P.
If x = m/n is a nonnegative rational number and n is
divisible by P (but m is not) then n has no inverse
modulo P and the rule above doesn’t apply; in this case define
hash(x) to be the constant value sys.hash_info.inf.
If x = m/n is a negative rational number, define hash(x)
as -hash(-x). If the resulting hash is -1, replace it with
-2.
The particular values sys.hash_info.inf, -sys.hash_info.inf
and sys.hash_info.nan are used as hash values for positive
infinity, negative infinity, or nans (respectively). (All hashable
nans have the same hash value.)
For a complex number z, the hash values of the real and imaginary parts
are combined by computing hash(z.real) + sys.hash_info.imag * hash(z.imag),
reduced modulo 2**sys.hash_info.width so that it lies in
range(-2**(sys.hash_info.width - 1), 2**(sys.hash_info.width - 1)). Again,
if the result is -1, it’s replaced with -2.
To clarify the above rules, here’s some example Python code,
equivalent to the builtin hash, for computing the hash of a rational
number, float, or complex:
import sys, math

def hash_fraction(m, n):
    """Compute the hash of a rational number m / n.

    Assumes m and n are integers, with n positive.
    Equivalent to hash(fractions.Fraction(m, n)).

    """
    P = sys.hash_info.modulus
    # Remove common factors of P.  (Unnecessary if m and n already coprime.)
    while m % P == n % P == 0:
        m, n = m // P, n // P

    if n % P == 0:
        hash_ = sys.hash_info.inf
    else:
        # Fermat's Little Theorem: pow(n, P-1, P) is 1, so
        # pow(n, P-2, P) gives the inverse of n modulo P.
        hash_ = (abs(m) % P) * pow(n, P - 2, P) % P
    if m < 0:
        hash_ = -hash_
    if hash_ == -1:
        hash_ = -2
    return hash_

def hash_float(x):
    """Compute the hash of a float x."""
    if math.isnan(x):
        return sys.hash_info.nan
    elif math.isinf(x):
        return sys.hash_info.inf if x > 0 else -sys.hash_info.inf
    else:
        return hash_fraction(*x.as_integer_ratio())

def hash_complex(z):
    """Compute the hash of a complex number z."""
    hash_ = hash_float(z.real) + sys.hash_info.imag * hash_float(z.imag)
    # do a signed reduction modulo 2**sys.hash_info.width
    M = 2**(sys.hash_info.width - 1)
    hash_ = (hash_ & (M - 1)) - (hash_ & M)
    if hash_ == -1:
        hash_ = -2
    return hash_
Python supports a concept of iteration over containers. This is implemented
using two distinct methods; these are used to allow user-defined classes to
support iteration. Sequences, described below in more detail, always support
the iteration methods.
One method needs to be defined for container objects to provide iteration
support:
Return an iterator object. The object is required to support the iterator
protocol described below. If a container supports different types of
iteration, additional methods can be provided to specifically request
iterators for those iteration types. (An example of an object supporting
multiple forms of iteration would be a tree structure which supports both
breadth-first and depth-first traversal.) This method corresponds to the
tp_iter slot of the type structure for Python objects in the Python/C
API.
The iterator objects themselves are required to support the following two
methods, which together form the iterator protocol:
Return the iterator object itself. This is required to allow both containers
and iterators to be used with the for and in statements.
This method corresponds to the tp_iter slot of the type structure for
Python objects in the Python/C API.
Return the next item from the container. If there are no further items, raise
the StopIteration exception. This method corresponds to the
tp_iternext slot of the type structure for Python objects in the
Python/C API.
Python defines several iterator objects to support iteration over general and
specific sequence types, dictionaries, and other more specialized forms. The
specific types are not important beyond their implementation of the iterator
protocol.
Once an iterator’s __next__() method raises StopIteration, it must
continue to do so on subsequent calls. Implementations that do not obey this
property are deemed broken.
Python’s generators provide a convenient way to implement the iterator
protocol. If a container object’s __iter__() method is implemented as a
generator, it will automatically return an iterator object (technically, a
generator object) supplying the __iter__() and __next__() methods.
More information about generators can be found in the documentation for
the yield expression.
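A minimal sketch (the Countdown class is hypothetical) of an __iter__()
method written as a generator:
>>> class Countdown:
...     def __init__(self, n):
...         self.n = n
...     def __iter__(self):      # a generator supplies __iter__() and __next__()
...         n = self.n
...         while n > 0:
...             yield n
...             n -= 1
...
>>> list(Countdown(3))
[3, 2, 1]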
There are six sequence types: strings, byte sequences (bytes objects),
byte arrays (bytearray objects), lists, tuples, and range objects. For
other containers see the built in dict and set classes, and
the collections module.
Strings contain Unicode characters. Their literals are written in single or
double quotes: 'xyzzy', "frobozz". See String and Bytes literals for more about
string literals. In addition to the functionality described here, there are
also string-specific methods described in the String Methods section.
Bytes and bytearray objects contain single bytes – the former is immutable
while the latter is a mutable sequence. Bytes objects can be constructed using
the constructor, bytes(), and from literals; use a b prefix with normal
string syntax: b'xyzzy'. To construct byte arrays, use the
bytearray() function.
While string objects are sequences of characters (represented by strings of
length 1), bytes and bytearray objects are sequences of integers (between 0
and 255), representing the ASCII value of single bytes. That means that for
a bytes or bytearray object b, b[0] will be an integer, while
b[0:1] will be a bytes or bytearray object of length 1. The
representation of bytes objects uses the literal format (b'...') since it
is generally more useful than e.g. bytes([50,19,100]). You can always
convert a bytes object into a list of integers using list(b).
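For example:
>>> b = b'xyz'
>>> b[0]
120
>>> b[0:1]
b'x'
>>> list(b)
[120, 121, 122]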
Also, while in previous Python versions, byte strings and Unicode strings
could be exchanged for each other rather freely (barring encoding issues),
strings and bytes are now completely separate concepts. There’s no implicit
en-/decoding if you pass an object of the wrong type. A string always
compares unequal to a bytes or bytearray object.
Lists are constructed with square brackets, separating items with commas:
[a, b, c]. Tuples are constructed by the comma operator (not within square
brackets), with or without enclosing parentheses, but an empty tuple must have
the enclosing parentheses, such as a, b, c or (). A single item tuple
must have a trailing comma, such as (d,).
Objects of type range are created using the range() function. They don’t
support concatenation or repetition, and using min() or max() on
them is inefficient.
Most sequence types support the following operations. The in and not in
operations have the same priorities as the comparison operations. The + and
* operations have the same priority as the corresponding numeric operations.
[3] Additional methods are provided for Mutable Sequence Types.
This table lists the sequence operations sorted in ascending priority
(operations in the same box have the same priority). In the table, s and t
are sequences of the same type; n, i, j and k are integers.
Operation       Result                                        Notes
x in s          True if an item of s is equal to x,           (1)
                else False
x not in s      False if an item of s is equal to x,          (1)
                else True
s + t           the concatenation of s and t                  (6)
s * n, n * s    n shallow copies of s concatenated            (2)
s[i]            i’th item of s, origin 0                      (3)
s[i:j]          slice of s from i to j                        (3)(4)
s[i:j:k]        slice of s from i to j with step k            (3)(5)
len(s)          length of s
min(s)          smallest item of s
max(s)          largest item of s
s.index(i)      index of the first occurrence of i in s
s.count(i)      total number of occurrences of i in s
Sequence types also support comparisons. In particular, tuples and lists are
compared lexicographically by comparing corresponding elements. This means that
to compare equal, every element must compare equal and the two sequences must be
of the same type and have the same length. (For full details see
Comparisons in the language reference.)
Notes:
(1) When s is a string object, the in and not in operations act like a
    substring test.
(2) Values of n less than 0 are treated as 0 (which yields an empty
    sequence of the same type as s). Note also that the copies are shallow;
    nested structures are not copied. This often haunts new Python
    programmers; consider:
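>>> lists = [[]] * 3
>>> lists
[[], [], []]
>>> lists[0].append(3)
>>> lists
[[3], [3], [3]]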
What has happened is that [[]] is a one-element list containing an empty
list, so all three elements of [[]]*3 are (pointers to) this single empty
list. Modifying any of the elements of lists modifies this single list.
You can create a list of different lists this way:
>>> lists = [[] for i in range(3)]
>>> lists[0].append(3)
>>> lists[1].append(5)
>>> lists[2].append(7)
>>> lists
[[3], [5], [7]]
(3) If i or j is negative, the index is relative to the end of the string:
    len(s) + i or len(s) + j is substituted. But note that -0 is
    still 0.
(4) The slice of s from i to j is defined as the sequence of items with
    index k such that i <= k < j. If i or j is greater than len(s),
    use len(s). If i is omitted or None, use 0. If j is omitted or
    None, use len(s). If i is greater than or equal to j, the slice
    is empty.
(5) The slice of s from i to j with step k is defined as the sequence of
    items with index x = i + n*k such that 0 <= n < (j-i)/k. In other
    words, the indices are i, i+k, i+2*k, i+3*k and so on, stopping when
    j is reached (but never including j). If i or j is greater than
    len(s), use len(s). If i or j are omitted or None, they become
    “end” values (which end depends on the sign of k). Note, k cannot be
    zero. If k is None, it is treated like 1.
(6) CPython implementation detail: If s and t are both strings, some
    Python implementations such as CPython can usually perform an in-place
    optimization for assignments of the form s = s + t or s += t. When
    applicable, this optimization makes quadratic run-time much less likely.
    This optimization is both version and implementation dependent. For
    performance sensitive code, it is preferable to use the str.join()
    method which assures consistent linear concatenation performance across
    versions and implementations.
Return the number of non-overlapping occurrences of substring sub in the
range [start, end]. Optional arguments start and end are
interpreted as in slice notation.
Return an encoded version of the string as a bytes object. Default encoding
is 'utf-8'. errors may be given to set a different error handling scheme.
The default for errors is 'strict', meaning that encoding errors raise
a UnicodeError. Other possible
values are 'ignore', 'replace', 'xmlcharrefreplace',
'backslashreplace' and any other name registered via
codecs.register_error(), see section Codec Base Classes. For a
list of possible encodings, see section Standard Encodings.
Changed in version 3.1: Support for keyword arguments added.
Return True if the string ends with the specified suffix, otherwise return
False. suffix can also be a tuple of suffixes to look for. With optional
start, test beginning at that position. With optional end, stop comparing
at that position.
Return a copy of the string where all tab characters are replaced by one or
more spaces, depending on the current column and the given tab size. The
column number is reset to zero after each newline occurring in the string.
If tabsize is not given, a tab size of 8 characters is assumed. This
doesn’t understand other non-printing characters or escape sequences.
Return the lowest index in the string where substring sub is found, such
that sub is contained in the slice s[start:end]. Optional arguments
start and end are interpreted as in slice notation. Return -1 if
sub is not found.
Note
The find() method should be used only if you need to know the
position of sub. To check if sub is a substring or not, use the
in operator:
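>>> 'Py' in 'Python'
True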
Perform a string formatting operation. The string on which this method is
called can contain literal text or replacement fields delimited by braces
{}. Each replacement field contains either the numeric index of a
positional argument, or the name of a keyword argument. Returns a copy of
the string where each replacement field is replaced with the string value of
the corresponding argument.
>>> "The sum of 1 + 2 is {0}".format(1+2)
'The sum of 1 + 2 is 3'
See Format String Syntax for a description of the various formatting options
that can be specified in format strings.
Similar to str.format(**mapping), except that mapping is
used directly and not copied to a dict. This is useful
if for example mapping is a dict subclass:
>>> class Default(dict):
...     def __missing__(self, key):
...         return key
...
>>> '{name} was born in {country}'.format_map(Default(name='Guido'))
'Guido was born in country'
Return true if all characters in the string are alphanumeric and there is at
least one character, false otherwise. A character c is alphanumeric if one
of the following returns True: c.isalpha(), c.isdecimal(),
c.isdigit(), or c.isnumeric().
Return true if all characters in the string are alphabetic and there is at least
one character, false otherwise. Alphabetic characters are those characters defined
in the Unicode character database as “Letter”, i.e., those with general category
property being one of “Lm”, “Lt”, “Lu”, “Ll”, or “Lo”. Note that this is different
from the “Alphabetic” property defined in the Unicode Standard.
Return true if all characters in the string are decimal
characters and there is at least one character, false
otherwise. Decimal characters are those from general category “Nd”. This
category includes digit characters, and all characters that can be used to
form decimal-radix numbers, e.g. U+0660, ARABIC-INDIC DIGIT ZERO.
Return true if all characters in the string are digits and there is at least one
character, false otherwise. Digits include decimal characters and digits that need
special handling, such as the compatibility superscript digits. Formally, a digit
is a character that has the property value Numeric_Type=Digit or Numeric_Type=Decimal.
Return true if all cased characters in the string are lowercase and there is at
least one cased character, false otherwise. Cased characters are those with
general category property being one of “Lu”, “Ll”, or “Lt” and lowercase characters
are those with general category property “Ll”.
Return true if all characters in the string are numeric
characters, and there is at least one character, false
otherwise. Numeric characters include digit characters, and all characters
that have the Unicode numeric value property, e.g. U+2155,
VULGAR FRACTION ONE FIFTH. Formally, numeric characters are those with the property
value Numeric_Type=Digit, Numeric_Type=Decimal or Numeric_Type=Numeric.
Return true if all characters in the string are printable or the string is
empty, false otherwise. Nonprintable characters are those characters defined
in the Unicode character database as “Other” or “Separator”, excepting the
ASCII space (0x20) which is considered printable. (Note that printable
characters in this context are those which should not be escaped when
repr() is invoked on a string. It has no bearing on the handling of
strings written to sys.stdout or sys.stderr.)
Return true if there are only whitespace characters in the string and there is
at least one character, false otherwise. Whitespace characters are those
characters defined in the Unicode character database as “Other” or “Separator”
and those with bidirectional property being one of “WS”, “B”, or “S”.
Return true if the string is a titlecased string and there is at least one
character, for example uppercase characters may only follow uncased characters
and lowercase characters only cased ones. Return false otherwise.
Return true if all cased characters in the string are uppercase and there is at
least one cased character, false otherwise. Cased characters are those with
general category property being one of “Lu”, “Ll”, or “Lt” and uppercase characters
are those with general category property “Lu”.
Return a string which is the concatenation of the strings in
iterable. A TypeError will be raised if there are
any non-string values in iterable, including bytes objects. The
separator between elements is the string providing this method.
Return the string left justified in a string of length width. Padding is done
using the specified fillchar (default is a space). The original string is
returned if width is less than len(s).
Return a copy of the string with leading characters removed. The chars
argument is a string specifying the set of characters to be removed. If omitted
or None, the chars argument defaults to removing whitespace. The chars
argument is not a prefix; rather, all combinations of its values are stripped:
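>>> '   spacious   '.lstrip()
'spacious   '
>>> 'www.example.com'.lstrip('cmowz.')
'example.com'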
This static method returns a translation table usable for str.translate().
If there is only one argument, it must be a dictionary mapping Unicode
ordinals (integers) or characters (strings of length 1) to Unicode ordinals,
strings (of arbitrary lengths) or None. Character keys will then be
converted to ordinals.
If there are two arguments, they must be strings of equal length, and in the
resulting dictionary, each character in x will be mapped to the character at
the same position in y. If there is a third argument, it must be a string,
whose characters will be mapped to None in the result.
Split the string at the first occurrence of sep, and return a 3-tuple
containing the part before the separator, the separator itself, and the part
after the separator. If the separator is not found, return a 3-tuple containing
the string itself, followed by two empty strings.
Return a copy of the string with all occurrences of substring old replaced by
new. If the optional argument count is given, only the first count
occurrences are replaced.
Return the highest index in the string where substring sub is found, such
that sub is contained within s[start:end]. Optional arguments start
and end are interpreted as in slice notation. Return -1 on failure.
Return the string right justified in a string of length width. Padding is done
using the specified fillchar (default is a space). The original string is
returned if width is less than len(s).
Split the string at the last occurrence of sep, and return a 3-tuple
containing the part before the separator, the separator itself, and the part
after the separator. If the separator is not found, return a 3-tuple containing
two empty strings, followed by the string itself.
Return a list of the words in the string, using sep as the delimiter string.
If maxsplit is given, at most maxsplit splits are done, the rightmost
ones. If sep is not specified or None, any whitespace string is a
separator. Except for splitting from the right, rsplit() behaves like
split() which is described in detail below.
Return a copy of the string with trailing characters removed. The chars
argument is a string specifying the set of characters to be removed. If omitted
or None, the chars argument defaults to removing whitespace. The chars
argument is not a suffix; rather, all combinations of its values are stripped:
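>>> '   spacious   '.rstrip()
'   spacious'
>>> 'mississippi'.rstrip('ipz')
'mississ'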
Return a list of the words in the string, using sep as the delimiter
string. If maxsplit is given, at most maxsplit splits are done (thus,
the list will have at most maxsplit+1 elements). If maxsplit is not
specified, then there is no limit on the number of splits (all possible
splits are made).
If sep is given, consecutive delimiters are not grouped together and are
deemed to delimit empty strings (for example, '1,,2'.split(',') returns
['1', '', '2']). The sep argument may consist of multiple characters
(for example, '1<>2<>3'.split('<>') returns ['1', '2', '3']).
Splitting an empty string with a specified separator returns [''].
If sep is not specified or is None, a different splitting algorithm is
applied: runs of consecutive whitespace are regarded as a single separator,
and the result will contain no empty strings at the start or end if the
string has leading or trailing whitespace. Consequently, splitting an empty
string or a string consisting of just whitespace with a None separator
returns [].
For example, '1 2 3'.split() returns ['1', '2', '3'], and
'1 2 3'.split(None, 1) returns ['1', '2 3'].
Return a list of the lines in the string, breaking at line boundaries. Line
breaks are not included in the resulting list unless keepends is given and
true.
Return True if string starts with the prefix, otherwise return False.
prefix can also be a tuple of prefixes to look for. With optional start,
test string beginning at that position. With optional end, stop comparing
string at that position.
Return a copy of the string with the leading and trailing characters removed.
The chars argument is a string specifying the set of characters to be removed.
If omitted or None, the chars argument defaults to removing whitespace.
The chars argument is not a prefix or suffix; rather, all combinations of its
values are stripped:
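>>> '   spacious   '.strip()
'spacious'
>>> 'www.example.com'.strip('cmowz.')
'example'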
Return a titlecased version of the string where words start with an uppercase
character and the remaining characters are lowercase.
The algorithm uses a simple language-independent definition of a word as
groups of consecutive letters. The definition works in many contexts but
it means that apostrophes in contractions and possessives form word
boundaries, which may not be the desired result:
>>> "they're bill's friends from the UK".title()
"They'Re Bill'S Friends From The Uk"
A workaround for apostrophes can be constructed using regular expressions:
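>>> import re
>>> def titlecase(s):
...     return re.sub(r"[A-Za-z]+('[A-Za-z]+)?",
...                   lambda mo: mo.group(0)[0].upper() +
...                              mo.group(0)[1:].lower(),
...                   s)
...
>>> titlecase("they're bill's friends.")
"They're Bill's Friends."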
Return a copy of the string where all characters have been mapped through
map, which must be a dictionary of Unicode ordinals (integers) to Unicode
ordinals, strings or None. Unmapped characters are left untouched.
Characters mapped to None are deleted.
You can use str.maketrans() to create a translation map from
character-to-character mappings in different formats.
Note
An even more flexible approach is to create a custom character mapping
codec using the codecs module (see encodings.cp1251 for an
example).
Return the numeric string left filled with zeros in a string of length
width. A sign prefix is handled correctly. The original string is
returned if width is less than len(s).
The formatting operations described here are obsolete and may go away in future
versions of Python. Use the new String Formatting in new code.
String objects have one unique built-in operation: the % operator (modulo).
This is also known as the string formatting or interpolation operator.
Given format % values (where format is a string), % conversion
specifications in format are replaced with zero or more elements of values.
The effect is similar to using sprintf() in the C language.
If format requires a single argument, values may be a single non-tuple
object. [4] Otherwise, values must be a tuple with exactly the number of
items specified by the format string, or a single mapping object (for example, a
dictionary).
A conversion specifier contains two or more characters and has the following
components, which must occur in this order:
The '%' character, which marks the start of the specifier.
Mapping key (optional), consisting of a parenthesised sequence of characters
(for example, (somename)).
Conversion flags (optional), which affect the result of some conversion
types.
Minimum field width (optional). If specified as an '*' (asterisk), the
actual width is read from the next element of the tuple in values, and the
object to convert comes after the minimum field width and optional precision.
Precision (optional), given as a '.' (dot) followed by the precision. If
specified as '*' (an asterisk), the actual precision is read from the next
element of the tuple in values, and the value to convert comes after the
precision.
Length modifier (optional).
Conversion type.
When the right argument is a dictionary (or other mapping type), then the
formats in the string must include a parenthesised mapping key into that
dictionary inserted immediately after the '%' character. The mapping key
selects the value to be formatted from the mapping. For example:
>>> print('%(language)s has %(number)03d quote types.' %
...       {'language': "Python", "number": 2})
Python has 002 quote types.
In this case no * specifiers may occur in a format (since they require a
sequential parameter list).
The conversion flag characters are:
Flag    Meaning
'#'     The value conversion will use the “alternate form” (where defined
        below).
'0'     The conversion will be zero padded for numeric values.
'-'     The converted value is left adjusted (overrides the '0'
        conversion if both are given).
' '     (a space) A blank should be left before a positive number (or empty
        string) produced by a signed conversion.
'+'     A sign character ('+' or '-') will precede the conversion
        (overrides a “space” flag).
A length modifier (h, l, or L) may be present, but is ignored as it
is not necessary for Python – so e.g. %ld is identical to %d.
The conversion types are:
Conversion   Meaning                                              Notes
'd'          Signed integer decimal.
'i'          Signed integer decimal.
'o'          Signed octal value.                                  (1)
'u'          Obsolete type – it is identical to 'd'.              (7)
'x'          Signed hexadecimal (lowercase).                      (2)
'X'          Signed hexadecimal (uppercase).                      (2)
'e'          Floating point exponential format (lowercase).       (3)
'E'          Floating point exponential format (uppercase).       (3)
'f'          Floating point decimal format.                       (3)
'F'          Floating point decimal format.                       (3)
'g'          Floating point format. Uses lowercase exponential
             format if exponent is less than -4 or not less than
             precision, decimal format otherwise.                 (4)
'G'          Floating point format. Uses uppercase exponential
             format if exponent is less than -4 or not less than
             precision, decimal format otherwise.                 (4)
'c'          Single character (accepts integer or single
             character string).
'r'          String (converts any Python object using repr()).   (5)
's'          String (converts any Python object using str()).    (5)
'a'          String (converts any Python object using
             ascii()).                                            (5)
'%'          No argument is converted, results in a '%'
             character in the result.
Notes:
(1) The alternate form causes a leading zero ('0') to be inserted between
    left-hand padding and the formatting of the number if the leading
    character of the result is not already a zero.
(2) The alternate form causes a leading '0x' or '0X' (depending on whether
    the 'x' or 'X' format was used) to be inserted between left-hand
    padding and the formatting of the number if the leading character of the
    result is not already a zero.
(3) The alternate form causes the result to always contain a decimal point,
    even if no digits follow it. The precision determines the number of
    digits after the decimal point and defaults to 6.
(4) The alternate form causes the result to always contain a decimal point,
    and trailing zeroes are not removed as they would otherwise be. The
    precision determines the number of significant digits before and after
    the decimal point and defaults to 6.
(5) If precision is N, the output is truncated to N characters.
(7) See PEP 237.
The range type is an immutable sequence which is commonly used for
looping. The advantage of the range type is that a range
object will always take the same amount of memory, no matter the size of the
range it represents.
Range objects have relatively little behavior: they support indexing,
containment tests, iteration, the len() function, and the following methods:
List and bytearray objects support additional operations that allow in-place
modification of the object. Other mutable sequence types (when added to the
language) should also support these operations. Strings and tuples are
immutable sequence types: such objects cannot be modified once created. The
following operations are defined on mutable sequence types (where x is an
arbitrary object).
Note that while lists allow their items to be of any type, bytearray object
“items” are all integers in the range 0 <= x < 256.
Operation                 Result                                    Notes
s[i] = x                  item i of s is replaced by x
s[i:j] = t                slice of s from i to j is replaced
                          by the contents of the iterable t
del s[i:j]                same as s[i:j] = []
s[i:j:k] = t              the elements of s[i:j:k] are
                          replaced by those of t                    (1)
del s[i:j:k]              removes the elements of s[i:j:k]
                          from the list
s.append(x)               same as s[len(s):len(s)] = [x]
s.extend(x)               same as s[len(s):len(s)] = x              (2)
s.count(x)                return number of i’s for which
                          s[i] == x
s.index(x[, i[, j]])      return smallest k such that
                          s[k] == x and i <= k < j                  (3)
s.insert(i, x)            same as s[i:i] = [x]                      (4)
s.pop([i])                same as x = s[i]; del s[i]; return x      (5)
s.remove(x)               same as del s[s.index(x)]                 (3)
s.reverse()               reverses the items of s in place          (6)
s.sort([key[, reverse]])  sort the items of s in place              (6), (7), (8)
Notes:
(1) t must have the same length as the slice it is replacing.
(2) x can be any iterable object.
(3) Raises ValueError when x is not found in s. When a negative index
    is passed as the second or third parameter to the index() method, the
    sequence length is added, as for slice indices. If it is still negative,
    it is truncated to zero, as for slice indices.
(4) When a negative index is passed as the first parameter to the
    insert() method, the sequence length is added, as for slice indices.
    If it is still negative, it is truncated to zero, as for slice indices.
(5) The optional argument i defaults to -1, so that by default the last
    item is removed and returned.
(6) The sort() and reverse() methods modify the sequence in place for
    economy of space when sorting or reversing a large sequence. To remind
    you that they operate by side effect, they don’t return the sorted or
    reversed sequence.
(7) The sort() method takes optional arguments for controlling the
    comparisons. Each must be specified as a keyword argument.
    key specifies a function of one argument that is used to extract a
    comparison key from each list element: key=str.lower. The default
    value is None. Use functools.cmp_to_key() to convert an old-style
    cmp function to a key function.
    reverse is a boolean value. If set to True, then the list elements
    are sorted as if each comparison were reversed.
(8) The sort() method is guaranteed to be stable. A sort is stable if it
    guarantees not to change the relative order of elements that compare
    equal — this is helpful for sorting in multiple passes (for example,
    sort by department, then by salary grade).
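For example, a sketch with hypothetical (name, grade) records: sorting first
by the secondary key and then by the primary key leaves equal-grade records
in name order, because the sort is stable:
>>> data = [('bob', 2), ('amy', 1), ('cal', 2), ('dee', 1)]
>>> data.sort()                       # secondary key: name
>>> data.sort(key=lambda r: r[1])     # primary key: grade
>>> data
[('amy', 1), ('dee', 1), ('bob', 2), ('cal', 2)]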
CPython implementation detail: While a list is being sorted, the effect of attempting to mutate, or even
inspect, the list is undefined. The C implementation of Python makes the
list appear empty for the duration, and raises ValueError if it can
detect that the list has been mutated during a sort.
Bytes and bytearray objects, being “strings of bytes”, have all methods found on
strings, with the exception of encode(), format() and
isidentifier(), which do not make sense with these types. For converting
the objects to strings, they have a decode() method.
Wherever one of these methods needs to interpret the bytes as characters
(e.g. the is...() methods), the ASCII character set is assumed.
Note
The methods on bytes and bytearray objects don’t accept strings as their
arguments, just as the methods on strings don’t accept bytes as their
arguments. For example, you have to write
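>>> 'abc'.replace('a', 'f')
'fbc'
>>> b'abc'.replace(b'a', b'f')
b'fbc'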
Return a string decoded from the given bytes. Default encoding is
'utf-8'. errors may be given to set a different
error handling scheme. The default for errors is 'strict', meaning
that encoding errors raise a UnicodeError. Other possible values are
'ignore', 'replace' and any other name registered via
codecs.register_error(), see section Codec Base Classes. For a
list of possible encodings, see section Standard Encodings.
Changed in version 3.1: Added support for keyword arguments.
The bytes and bytearray types have an additional class method:
This bytes class method returns a bytes or bytearray object,
decoding the given string object. The string must contain two hexadecimal
digits per byte, spaces are ignored.
>>> bytes.fromhex('f0 f1f2 ')
b'\xf0\xf1\xf2'
The maketrans and translate methods differ in semantics from the versions
available on strings:
Return a copy of the bytes or bytearray object where all bytes occurring in
the optional argument delete are removed, and the remaining bytes have been
mapped through the given translation table, which must be a bytes object of
length 256.
You can use the bytes.maketrans() method to create a translation table.
Set the table argument to None for translations that only delete
characters:
>>> b'read this short text'.translate(None, b'aeiou')
b'rd ths shrt txt'
This static method returns a translation table usable for
bytes.translate() that will map each character in from into the
character at the same position in to; from and to must be bytes objects
and have the same length.
A set object is an unordered collection of distinct hashable objects.
Common uses include membership testing, removing duplicates from a sequence, and
computing mathematical operations such as intersection, union, difference, and
symmetric difference.
(For other containers see the built in dict, list,
and tuple classes, and the collections module.)
Like other collections, sets support x in set, len(set), and
for x in set. Being an unordered collection, sets do not record element
position or order of insertion. Accordingly, sets do not support indexing,
slicing, or other sequence-like behavior.
There are currently two built-in set types, set and frozenset.
The set type is mutable — the contents can be changed using methods
like add() and remove(). Since it is mutable, it has no hash value
and cannot be used as either a dictionary key or as an element of another set.
The frozenset type is immutable and hashable — its contents cannot be
altered after it is created; it can therefore be used as a dictionary key or as
an element of another set.
Non-empty sets (not frozensets) can be created by placing a comma-separated
list of elements within braces, for example: {'jack', 'sjoerd'}, in addition
to the set constructor.
Return a new set or frozenset object whose elements are taken from
iterable. The elements of a set must be hashable. To represent sets of
sets, the inner sets must be frozenset objects. If iterable is
not specified, a new empty set is returned.
Instances of set and frozenset provide the following
operations:
Note, the non-operator versions of union(), intersection(),
difference(), and symmetric_difference(), issubset(), and
issuperset() methods will accept any iterable as an argument. In
contrast, their operator based counterparts require their arguments to be
sets. This precludes error-prone constructions like set('abc') & 'cbs'
in favor of the more readable set('abc').intersection('cbs').
Both set and frozenset support set to set comparisons. Two
sets are equal if and only if every element of each set is contained in the
other (each is a subset of the other). A set is less than another set if and
only if the first set is a proper subset of the second set (is a subset, but
is not equal). A set is greater than another set if and only if the first set
is a proper superset of the second set (is a superset, but is not equal).
Instances of set are compared to instances of frozenset
based on their members. For example, set('abc') == frozenset('abc')
returns True and so does set('abc') in set([frozenset('abc')]).
The subset and equality comparisons do not generalize to a complete ordering
function. For example, any two disjoint sets are not equal and are not
subsets of each other, so all of the following return False: a<b,
a==b, or a>b.
Since sets only define partial ordering (subset relationships), the output of
the list.sort() method is undefined for lists of sets.
Set elements, like dictionary keys, must be hashable.
Binary operations that mix set instances with frozenset
return the type of the first operand. For example:
frozenset('ab') | set('bc') returns an instance of frozenset.
The following table lists operations available for set that do not
apply to immutable instances of frozenset:
Note, the elem argument to the __contains__(), remove(), and
discard() methods may be a set. To support searching for an equivalent
frozenset, the elem set is temporarily mutated during the search and then
restored. During the search, the elem set should not be read or mutated
since it does not have a meaningful value.
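A few of the mutating methods in action (sorted() is used below because
iteration order is arbitrary):
>>> s = set('ab')
>>> s.add('c')        # add a single element
>>> s.discard('z')    # removing an absent element is a no-op
>>> s.remove('a')     # remove() would raise KeyError for an absent element
>>> sorted(s)
['b', 'c']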
A mapping object maps hashable values to arbitrary objects.
Mappings are mutable objects. There is currently only one standard mapping
type, the dictionary. (For other containers see the built-in
list, set, and tuple classes, and the
collections module.)
A dictionary’s keys are almost arbitrary values. Values that are not
hashable, that is, values containing lists, dictionaries or other
mutable types (that are compared by value rather than by object identity) may
not be used as keys. Numeric types used for keys obey the normal rules for
numeric comparison: if two numbers compare equal (such as 1 and 1.0)
then they can be used interchangeably to index the same dictionary entry. (Note
however, that since computers store floating-point numbers as approximations it
is usually unwise to use them as dictionary keys.)
Dictionaries can be created by placing a comma-separated list of key:value
pairs within braces, for example: {'jack': 4098, 'sjoerd': 4127} or {4098: 'jack', 4127: 'sjoerd'}, or by the dict constructor.
Return a new dictionary initialized from an optional positional argument or
from a set of keyword arguments. If no arguments are given, return a new
empty dictionary. If the positional argument arg is a mapping object,
return a dictionary mapping the same keys to the same values as does the
mapping object. Otherwise the positional argument must be a sequence, a
container that supports iteration, or an iterator object. The elements of
the argument must each also be of one of those kinds, and each must in turn
contain exactly two objects. The first is used as a key in the new
dictionary, and the second as the key’s value. If a given key is seen more
than once, the last value associated with it is retained in the new
dictionary.
If keyword arguments are given, the keywords themselves with their associated
values are added as items to the dictionary. If a key is specified both in
the positional argument and as a keyword argument, the value associated with
the keyword is retained in the dictionary. For example, these all return a
dictionary equal to {"one":1,"two":2}:
dict(one=1,two=2)
dict({'one':1,'two':2})
dict(zip(('one','two'),(1,2)))
dict([['two',2],['one',1]])
The first example only works for keys that are valid Python identifiers; the
others work with any valid keys.
These are the operations that dictionaries support (and therefore, custom
mapping types should support too):
len(d)
Return the number of items in the dictionary d.
d[key]
Return the item of d with key key. Raises a KeyError if key is
not in the map.
If a subclass of dict defines a method __missing__() and key is not
present, the d[key] operation calls that method with the key key as
argument. The d[key] operation then returns or raises whatever is
returned or raised by the __missing__(key) call. No other operations or
methods invoke __missing__(). If __missing__() is not defined,
KeyError is raised.
__missing__() must be a method; it cannot be an instance variable:
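>>> class Counter(dict):
...     def __missing__(self, key):
...         return 0
...
>>> c = Counter()
>>> c['red']
0
>>> c['red'] += 1
>>> c['red']
1
In this sketch, __missing__() supplies a default of zero for absent keys,
so the subscript operation never raises KeyError (compare
collections.Counter, which uses the same hook).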
Return the value for key if key is in the dictionary, else default.
If default is not given, it defaults to None, so that this method
never raises a KeyError.
If key is in the dictionary, remove it and return its value, else return
default. If default is not given and key is not in the dictionary,
a KeyError is raised.
Remove and return an arbitrary (key, value) pair from the dictionary.
popitem() is useful to destructively iterate over a dictionary, as
often used in set algorithms. If the dictionary is empty, calling
popitem() raises a KeyError.
Update the dictionary with the key/value pairs from other, overwriting
existing keys. Return None.
update() accepts either another dictionary object or an iterable of
key/value pairs (as tuples or other iterables of length two). If keyword
arguments are specified, the dictionary is then updated with those
key/value pairs: d.update(red=1, blue=2).
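For example, a mapping argument and keyword arguments may be combined:
>>> d = {'red': 1}
>>> d.update({'blue': 2}, green=3)
>>> d == {'red': 1, 'blue': 2, 'green': 3}
True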
The objects returned by dict.keys(), dict.values() and
dict.items() are view objects. They provide a dynamic view on the
dictionary’s entries, which means that when the dictionary changes, the view
reflects these changes.
Dictionary views can be iterated over to yield their respective data, and
support membership tests:
len(dictview)
Return the number of entries in the dictionary.
iter(dictview)
Return an iterator over the keys, values or items (represented as tuples of
(key, value)) in the dictionary.
Keys and values are iterated over in an arbitrary order which is non-random,
varies across Python implementations, and depends on the dictionary’s history
of insertions and deletions. If keys, values and items views are iterated
over with no intervening modifications to the dictionary, the order of items
will directly correspond. This allows the creation of (value, key) pairs
using zip(): pairs = zip(d.values(), d.keys()). Another way to
create the same list is pairs = [(v, k) for (k, v) in d.items()].
Iterating views while adding or deleting entries in the dictionary may raise
a RuntimeError or fail to iterate over all entries.
x in dictview
Return True if x is in the underlying dictionary’s keys, values or
items (in the latter case, x should be a (key, value) tuple).
Keys views are set-like since their entries are unique and hashable. If all
values are hashable, so that (key, value) pairs are unique and hashable,
then the items view is also set-like. (Values views are not treated as set-like
since the entries are generally not unique.) For set-like views, all of the
operations defined for the abstract base class collections.Set are
available (for example, ==, <, or ^).
An example of dictionary view usage:
>>> dishes = {'eggs': 2, 'sausage': 1, 'bacon': 1, 'spam': 500}
>>> keys = dishes.keys()
>>> values = dishes.values()
>>> # iteration
>>> n = 0
>>> for val in values:
...     n += val
...
>>> print(n)
504
>>> # keys and values are iterated over in the same order
>>> list(keys)
['eggs', 'bacon', 'sausage', 'spam']
>>> list(values)
[2, 1, 1, 500]
>>> # view objects are dynamic and reflect dict changes
>>> del dishes['eggs']
>>> del dishes['sausage']
>>> list(keys)
['spam', 'bacon']
>>> # set operations
>>> keys & {'eggs', 'bacon', 'salad'}
{'bacon'}
>>> keys ^ {'sausage', 'juice'}
{'juice', 'sausage', 'bacon', 'spam'}
memoryview objects allow Python code to access the internal data
of an object that supports the buffer protocol without
copying. Memory is generally interpreted as simple bytes.
Create a memoryview that references obj. obj must support the
buffer protocol. Built-in objects that support the buffer protocol include
bytes and bytearray.
A memoryview has the notion of an element, which is the
atomic memory unit handled by the originating object obj. For many
simple types such as bytes and bytearray, an element
is a single byte, but other types such as array.array may have
bigger elements.
len(view) returns the total number of elements in the memoryview,
view. The itemsize attribute will give you the
number of bytes in a single element.
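For example (a sketch assuming a platform where the array type code 'i'
denotes a four-byte C int):
>>> import array
>>> a = array.array('i', [1, 2, 3])
>>> m = memoryview(a)
>>> len(m)
3
>>> m.itemsize
4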
A memoryview supports slicing to expose its data. Taking a single
index will return a single element as a bytes object. Full
slicing will result in a subview:
>>> v = memoryview(b'abcefg')
>>> v[1]
b'b'
>>> v[-1]
b'g'
>>> v[1:4]
<memory at 0x77ab28>
>>> bytes(v[1:4])
b'bce'
If the object the memoryview is over supports changing its data, the
memoryview supports slice assignment:
>>> data = bytearray(b'abcefg')
>>> v = memoryview(data)
>>> v.readonly
False
>>> v[0] = b'z'
>>> data
bytearray(b'zbcefg')
>>> v[1:4] = b'123'
>>> data
bytearray(b'a123fg')
>>> v[2] = b'spam'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: cannot modify size of memoryview object
Notice how the size of the memoryview object cannot be changed.
Release the underlying buffer exposed by the memoryview object. Many
objects take special actions when a view is held on them (for example,
a bytearray would temporarily forbid resizing); therefore,
calling release() is handy to remove these restrictions (and free any
dangling resources) as soon as possible.
After this method has been called, any further operation on the view
raises a ValueError (except release() itself which can
be called multiple times):
>>> m = memoryview(b'abc')
>>> m.release()
>>> m[0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: operation forbidden on released memoryview object
The context management protocol can be used for a similar effect,
using the with statement:
>>> with memoryview(b'abc') as m:
... m[0]
...
b'a'
>>> m[0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: operation forbidden on released memoryview object
New in version 3.2.
There are also several read-only attributes available, such as itemsize and readonly used in the examples above.
Python’s with statement supports the concept of a runtime context
defined by a context manager. This is implemented using a pair of methods
that allow user-defined classes to define a runtime context that is entered
before the statement body is executed and exited when the statement ends:
Enter the runtime context and return either this object or another object
related to the runtime context. The value returned by this method is bound to
the identifier in the as clause of with statements using
this context manager.
An example of a context manager that returns itself is a file object.
File objects return themselves from __enter__() to allow open() to be
used as the context expression in a with statement.
An example of a context manager that returns a related object is the one
returned by decimal.localcontext(). These managers set the active
decimal context to a copy of the original decimal context and then return the
copy. This allows changes to be made to the current decimal context in the body
of the with statement without affecting code outside the
with statement.
Exit the runtime context and return a Boolean flag indicating if any exception
that occurred should be suppressed. If an exception occurred while executing the
body of the with statement, the arguments contain the exception type,
value and traceback information. Otherwise, all three arguments are None.
Returning a true value from this method will cause the with statement
to suppress the exception and continue execution with the statement immediately
following the with statement. Otherwise the exception continues
propagating after this method has finished executing. Exceptions that occur
during execution of this method will replace any exception that occurred in the
body of the with statement.
The exception passed in should never be reraised explicitly; instead, this
method should return a false value to indicate that the method completed
successfully and does not want to suppress the raised exception. This allows
context management code (such as the utilities in contextlib) to easily
detect whether or not an __exit__() method has actually failed.
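As a minimal sketch of the protocol (the class name suppress_keyerror is
hypothetical), the manager below suppresses only KeyError by returning a
true value from __exit__():
class suppress_keyerror:
    def __enter__(self):
        # Nothing to set up; return the manager itself.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # A true return value suppresses the exception; any other
        # exception type continues to propagate.
        return exc_type is KeyError

with suppress_keyerror():
    {}['missing']        # the KeyError raised here is swallowed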
Python defines several context managers to support easy thread synchronisation,
prompt closure of files or other objects, and simpler manipulation of the active
decimal arithmetic context. The specific types are not treated specially beyond
their implementation of the context management protocol. See the
contextlib module for some examples.
Python’s generators and the contextlib.contextmanager decorator
provide a convenient way to implement these protocols. If a generator function is
decorated with the contextlib.contextmanager decorator, it will return a
context manager implementing the necessary __enter__() and
__exit__() methods, rather than the iterator produced by an undecorated
generator function.
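A sketch of the decorator in use (the tag() generator is a made-up
example, not part of the standard library):
from contextlib import contextmanager

@contextmanager
def tag(name):
    print('<{}>'.format(name))        # runs as part of __enter__()
    try:
        yield                         # the with-block body executes here
    finally:
        print('</{}>'.format(name))   # runs as part of __exit__()

with tag('h1'):
    print('hello')     # prints <h1>, hello, </h1> in order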
Note that there is no specific slot for any of these methods in the type
structure for Python objects in the Python/C API. Extension types wanting to
define these methods must provide them as a normal Python accessible method.
Compared to the overhead of setting up the runtime context, the overhead of a
single class dictionary lookup is negligible.
The only special operation on a module is attribute access: m.name, where
m is a module and name accesses a name defined in m‘s symbol table.
Module attributes can be assigned to. (Note that the import
statement is not, strictly speaking, an operation on a module object; import foo does not require a module object named foo to exist, rather it requires
an (external) definition for a module named foo somewhere.)
A special attribute of every module is __dict__. This is the dictionary
containing the module’s symbol table. Modifying this dictionary will actually
change the module’s symbol table, but direct assignment to the __dict__
attribute is not possible (you can write m.__dict__['a'] = 1, which defines
m.a to be 1, but you can’t write m.__dict__ = {}). Modifying
__dict__ directly is not recommended.
Modules built into the interpreter are written like this: <module 'sys' (built-in)>. If loaded from a file, they are written as <module 'os' from '/usr/local/lib/pythonX.Y/os.pyc'>.
Function objects are created by function definitions. The only operation on a
function object is to call it: func(argument-list).
There are really two flavors of function objects: built-in functions and
user-defined functions. Both support the same operation (to call the function),
but the implementation is different, hence the different object types.
Methods are functions that are called using the attribute notation. There are
two flavors: built-in methods (such as append() on lists) and class
instance methods. Built-in methods are described with the types that support
them.
If you access a method (a function defined in a class namespace) through an
instance, you get a special object: a bound method (also called
instance method) object. When called, it will add the self argument
to the argument list. Bound methods have two special read-only attributes:
m.__self__ is the object on which the method operates, and m.__func__ is
the function implementing the method. Calling m(arg-1, arg-2, ..., arg-n)
is completely equivalent to calling m.__func__(m.__self__, arg-1, arg-2, ..., arg-n).
Like function objects, bound method objects support getting arbitrary
attributes. However, since method attributes are actually stored on the
underlying function object (meth.__func__), setting method attributes on
bound methods is disallowed. Attempting to set a method attribute results in a
TypeError being raised. In order to set a method attribute, you need to
explicitly set it on the underlying function object:
class C:
def method(self):
pass
c = C()
c.method.__func__.whoami = 'my name is c'
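Reading the attribute back works through the bound method as well, since
attribute reads on a bound method fall through to the underlying function:
>>> c.method.whoami
'my name is c'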
Code objects are used by the implementation to represent “pseudo-compiled”
executable Python code such as a function body. They differ from function
objects because they don’t contain a reference to their global execution
environment. Code objects are returned by the built-in compile() function
and can be extracted from function objects through their __code__
attribute. See also the code module.
A code object can be executed or evaluated by passing it (instead of a source
string) to the exec() or eval() built-in functions.
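For example:
>>> code = compile('x + 1', '<string>', 'eval')
>>> eval(code, {'x': 41})
42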
Type objects represent the various object types. An object’s type is accessed
by the built-in function type(). There are no special operations on
types. The standard module types defines names for all standard built-in
types.
This object is returned by functions that don’t explicitly return a value. It
supports no special operations. There is exactly one null object, named
None (a built-in name).
This object is commonly used by slicing (see Slicings). It supports no
special operations. There is exactly one ellipsis object, named
Ellipsis (a built-in name).
This object is returned from comparisons and binary operations when they are
asked to operate on types they don’t support. See Comparisons for more
information.
Boolean values are the two constant objects False and True. They are
used to represent truth values (although other values can also be considered
false or true). In numeric contexts (for example when used as the argument to
an arithmetic operator), they behave like the integers 0 and 1, respectively.
The built-in function bool() can be used to cast any value to a Boolean,
if the value can be interpreted as a truth value (see section Truth Value
Testing above).
The implementation adds a few special read-only attributes to several object
types, where they are relevant. Some of these are not reported by the
dir() built-in function.
This method can be overridden by a metaclass to customize the method
resolution order for its instances. It is called at class instantiation, and
its result is stored in __mro__.
Each new-style class keeps a list of weak references to its immediate
subclasses. This method returns a list of all those references still alive.
Example:
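>>> int.__subclasses__()
[<class 'bool'>]
(The exact list depends on which subclasses of int exist in the running
interpreter; bool is always present.)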
In Python, all exceptions must be instances of a class that derives from
BaseException. In a try statement with an except
clause that mentions a particular class, that clause also handles any exception
classes derived from that class (but not exception classes from which it is
derived). Two exception classes that are not related via subclassing are never
equivalent, even if they have the same name.
The built-in exceptions listed below can be generated by the interpreter or
built-in functions. Except where mentioned, they have an “associated value”
indicating the detailed cause of the error. This may be a string or a tuple of
several items of information (e.g., an error code and a string explaining the
code). The associated value is usually passed as arguments to the exception
class’s constructor.
User code can raise built-in exceptions. This can be used to test an exception
handler or to report an error condition “just like” the situation in which the
interpreter raises the same exception; but beware that there is nothing to
prevent user code from raising an inappropriate error.
The built-in exception classes can be sub-classed to define new exceptions;
programmers are encouraged to at least derive new exceptions from the
Exception class and not BaseException. More information on
defining exceptions is available in the Python Tutorial under
User-defined Exceptions.
The following exceptions are used mostly as base classes for other exceptions.
The base class for all built-in exceptions. It is not meant to be directly
inherited by user-defined classes (for that, use Exception). If
bytes() or str() is called on an instance of this class, the
representation of the argument(s) to the instance is returned, or the empty
string when there were no arguments.
The tuple of arguments given to the exception constructor. Some built-in
exceptions (like IOError) expect a certain number of arguments and
assign a special meaning to the elements of this tuple, while others are
usually called only with a single string giving an error message.
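For example:
>>> try:
...     raise ValueError('bad value', 42)
... except ValueError as exc:
...     print(exc.args)
...
('bad value', 42)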
The base class for the exceptions that are raised when a key or index used on
a mapping or sequence is invalid: IndexError, KeyError. This
can be raised directly by codecs.lookup().
The base class for exceptions that can occur outside the Python system:
IOError, OSError. When exceptions of this type are created with a
2-tuple, the first item is available on the instance’s errno attribute
(it is assumed to be an error number), and the second item is available on the
strerror attribute (it is usually the associated error message). The
tuple itself is also available on the args attribute.
When an EnvironmentError exception is instantiated with a 3-tuple, the
first two items are available as above, while the third item is available on the
filename attribute. However, for backwards compatibility, the
args attribute contains only a 2-tuple of the first two constructor
arguments.
The filename attribute is None when this exception is created with
other than 3 arguments. The errno and strerror attributes are
also None when the instance was created with other than 2 or 3 arguments.
In this last case, args contains the verbatim constructor arguments as a
tuple.
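A sketch of these attributes in action (the errno value and message shown
are platform-dependent):
>>> try:
...     open('/no/such/file')
... except IOError as exc:
...     print(exc.errno, exc.strerror, exc.filename)
...
2 No such file or directory /no/such/file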
The following exceptions are the exceptions that are usually raised.
Raised when an attribute reference (see Attribute references) or
assignment fails. (When an object does not support attribute references or
attribute assignments at all, TypeError is raised.)
Raised when the built-in function input() hits an end-of-file condition
(EOF) without reading any data. (N.B.: the read() and readline()
methods of file objects return an empty string when they hit EOF.)
Raised when a floating point operation fails. This exception is always defined,
but can only be raised when Python is configured with the
--with-fpectl option, or the WANT_SIGFPE_HANDLER symbol is
defined in the pyconfig.h file.
Raised when an I/O operation (such as the built-in print() or
open() functions or a method of a file object) fails for an
I/O-related reason, e.g., “file not found” or “disk full”.
This class is derived from EnvironmentError. See the discussion above
for more information on exception instance attributes.
Raised when a sequence subscript is out of range. (Slice indices are
silently truncated to fall in the allowed range; if an index is not an
integer, TypeError is raised.)
Raised when the user hits the interrupt key (normally Control-C or
Delete). During execution, a check for interrupts is made
regularly. The exception inherits from BaseException so as to not be
accidentally caught by code that catches Exception and thus prevent
the interpreter from exiting.
Raised when an operation runs out of memory but the situation may still be
rescued (by deleting some objects). The associated value is a string indicating
what kind of (internal) operation ran out of memory. Note that because of the
underlying memory management architecture (C’s malloc() function), the
interpreter may not always be able to completely recover from this situation; it
nevertheless raises an exception so that a stack traceback can be printed, in
case a run-away program was the cause.
Raised when a local or global name is not found. This applies only to
unqualified names. The associated value is an error message that includes the
name that could not be found.
This exception is derived from RuntimeError. In user defined base
classes, abstract methods should raise this exception when they require derived
classes to override the method.
This exception is derived from EnvironmentError. It is raised when a
function returns a system-related error (not for illegal argument types or
other incidental errors). The errno attribute is a numeric error
code from errno, and the strerror attribute is the
corresponding string, as would be printed by the C function perror().
See the module errno, which contains names for the error codes defined
by the underlying operating system.
For exceptions that involve a file system path (such as chdir() or
unlink()), the exception instance will contain a third attribute,
filename, which is the file name passed to the function.
Raised when the result of an arithmetic operation is too large to be
represented. This cannot occur for integers (which would rather raise
MemoryError than give up). Because of the lack of standardization of
floating point exception handling in C, most floating point operations also
aren’t checked.
This exception is raised when a weak reference proxy, created by the
weakref.proxy() function, is used to access an attribute of the referent
after it has been garbage collected. For more information on weak references,
see the weakref module.
Raised when an error is detected that doesn’t fall in any of the other
categories. The associated value is a string indicating what precisely went
wrong. (This exception is mostly a relic from a previous version of the
interpreter; it is not used very much any more.)
Raised when the parser encounters a syntax error. This may occur in an
import statement, in a call to the built-in functions exec()
or eval(), or when reading the initial script or standard input
(also interactively).
Instances of this class have attributes filename, lineno,
offset and text for easier access to the details. str()
of the exception instance returns only the message.
Raised when the interpreter finds an internal error, but the situation does not
look so serious to cause it to abandon all hope. The associated value is a
string indicating what went wrong (in low-level terms).
You should report this to the author or maintainer of your Python interpreter.
Be sure to report the version of the Python interpreter (sys.version; it is
also printed at the start of an interactive Python session), the exact error
message (the exception’s associated value) and if possible the source of the
program that triggered the error.
This exception is raised by the sys.exit() function. When it is not
handled, the Python interpreter exits; no stack traceback is printed. If the
associated value is an integer, it specifies the system exit status (passed
to C’s exit() function); if it is None, the exit status is zero;
if it has another type (such as a string), the object’s value is printed and
the exit status is one.
Instances have an attribute code which is set to the proposed exit
status or error message (defaulting to None). Also, this exception derives
directly from BaseException and not Exception, since it is not
technically an error.
A call to sys.exit() is translated into an exception so that clean-up
handlers (finally clauses of try statements) can be
executed, and so that a debugger can execute a script without running the risk
of losing control. The os._exit() function can be used if it is
absolutely positively necessary to exit immediately (for example, in the child
process after a call to fork()).
The exception inherits from BaseException instead of Exception so
that it is not accidentally caught by code that catches Exception. This
allows the exception to properly propagate up and cause the interpreter to exit.
Raised when an operation or function is applied to an object of inappropriate
type. The associated value is a string giving details about the type mismatch.
Raised when a reference is made to a local variable in a function or method, but
no value has been bound to that variable. This is a subclass of
NameError.
Raised when a built-in operation or function receives an argument that has the
right type but an inappropriate value, and the situation is not described by a
more precise exception such as IndexError.
Raised when a Windows-specific error occurs or when the error number does not
correspond to an errno value. The winerror and
strerror values are created from the return values of the
GetLastError() and FormatMessage() functions from the Windows
Platform API. The errno value maps the winerror value to
corresponding errno.h values. This is a subclass of OSError.
Raised when the second argument of a division or modulo operation is zero. The
associated value is a string indicating the type of the operands and the
operation.
The following exceptions are used as warning categories; see the warnings
module for more information.
The modules described in this chapter provide a wide range of string
manipulation operations.
In addition, Python’s built-in string classes support the sequence type methods
described in the Sequence Types — str, bytes, bytearray, list, tuple, range section, and also the string-specific methods
described in the String Methods section. To output formatted strings,
see the String Formatting section. Also, see the re module for
string functions based on regular expressions.
A string containing all ASCII characters that are considered whitespace.
This includes the characters space, tab, linefeed, return, formfeed, and
vertical tab.
The built-in string class provides the ability to do complex variable
substitutions and value formatting via the format() method described in
PEP 3101. The Formatter class in the string module allows
you to create and customize your own string formatting behaviors using the same
implementation as the built-in format() method.
format() is the primary API method. It takes a format template
string, and an arbitrary set of positional and keyword arguments.
format() is just a wrapper that calls vformat().
This function does the actual work of formatting. It is exposed as a
separate function for cases where you want to pass in a predefined
dictionary of arguments, rather than unpacking and repacking the
dictionary as individual arguments using the *args and **kwds
syntax. vformat() does the work of breaking up the format template
string into character data and replacement fields. It calls the various
methods described below.
In addition, the Formatter defines a number of methods that are
intended to be replaced by subclasses:
Loop over the format_string and return an iterable of tuples
(literal_text, field_name, format_spec, conversion). This is used
by vformat() to break the string into either literal text, or
replacement fields.
The values in the tuple conceptually represent a span of literal text
followed by a single replacement field. If there is no literal text
(which can happen if two replacement fields occur consecutively), then
literal_text will be a zero-length string. If there is no replacement
field, then the values of field_name, format_spec and conversion
will be None.
Given field_name as returned by parse() (see above), convert it to
an object to be formatted. Returns a tuple (obj, used_key). The default
version takes strings of the form defined in PEP 3101, such as
“0[name]” or “label.title”. args and kwargs are as passed in to
vformat(). The return value used_key has the same meaning as the
key parameter to get_value().
Retrieve a given field value. The key argument will be either an
integer or a string. If it is an integer, it represents the index of the
positional argument in args; if it is a string, then it represents a
named argument in kwargs.
The args parameter is set to the list of positional arguments to
vformat(), and the kwargs parameter is set to the dictionary of
keyword arguments.
For compound field names, these functions are only called for the first
component of the field name; subsequent components are handled through
normal attribute and indexing operations.
So for example, the field expression ‘0.name’ would cause
get_value() to be called with a key argument of 0. The name
attribute will be looked up after get_value() returns by calling the
built-in getattr() function.
If the index or keyword refers to an item that does not exist, then an
IndexError or KeyError should be raised.
Implement checking for unused arguments if desired. The arguments to this
function are the set of all argument keys that were actually referred to in
the format string (integers for positional arguments, and strings for
named arguments), and a reference to the args and kwargs that was
passed to vformat. The set of unused args can be calculated from these
parameters. check_unused_args() is assumed to raise an exception if
the check fails.
Converts the value (returned by get_field()) given a conversion type
(as in the tuple returned by the parse() method). The default
version understands ‘r’ (repr) and ‘s’ (str) conversion types.
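As a sketch of how these hooks fit together, the hypothetical subclass below
overrides get_value() so that a missing keyword argument formats as a
placeholder instead of raising KeyError:
from string import Formatter

class DefaultingFormatter(Formatter):
    # Hypothetical subclass: unknown keyword arguments format as '<missing>'.
    def get_value(self, key, args, kwargs):
        if isinstance(key, str) and key not in kwargs:
            return '<missing>'
        return Formatter.get_value(self, key, args, kwargs)

print(DefaultingFormatter().format('{a} and {b}', a='spam'))
# prints: spam and <missing>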
The str.format() method and the Formatter class share the same
syntax for format strings (although in the case of Formatter,
subclasses can define their own format string syntax).
Format strings contain “replacement fields” surrounded by curly braces {}.
Anything that is not contained in braces is considered literal text, which is
copied unchanged to the output. If you need to include a brace character in the
literal text, it can be escaped by doubling: {{ and }}.
The grammar for a replacement field is as follows:
In less formal terms, the replacement field can start with a field_name that specifies
the object whose value is to be formatted and inserted
into the output instead of the replacement field.
The field_name is optionally followed by a conversion field, which is
preceded by an exclamation point '!', and a format_spec, which is preceded
by a colon ':'. These specify a non-default format for the replacement value.
The field_name itself begins with an arg_name that is either a number or a
keyword. If it’s a number, it refers to a positional argument, and if it’s a keyword,
it refers to a named keyword argument. If the numerical arg_names in a format string
are 0, 1, 2, ... in sequence, they can all be omitted (not just some)
and the numbers 0, 1, 2, ... will be automatically inserted in that order.
Because arg_name is not quote-delimited, it is not possible to specify arbitrary
dictionary keys (e.g., the strings '10' or ':-]') within a format string.
The arg_name can be followed by any number of index or
attribute expressions. An expression of the form '.name' selects the named
attribute using getattr(), while an expression of the form '[index]'
does an index lookup using __getitem__().
Changed in version 3.1: The positional argument specifiers can be omitted, so '{}{}' is
equivalent to '{0}{1}'.
Some simple format string examples:
"First, thou shalt count to {0}" # References first positional argument
"Bring me a {}" # Implicitly references the first positional argument
"From {} to {}" # Same as "From {0} to {1}"
"My quest is {name}" # References keyword argument 'name'
"Weight in tons {0.weight}" # 'weight' attribute of first positional arg
"Units destroyed: {players[0]}" # First element of keyword argument 'players'.
The conversion field causes a type coercion before formatting. Normally, the
job of formatting a value is done by the __format__() method of the value
itself. However, in some cases it is desirable to force a type to be formatted
as a string, overriding its own definition of formatting. By converting the
value to a string before calling __format__(), the normal formatting logic
is bypassed.
Three conversion flags are currently supported: '!s' which calls str()
on the value, '!r' which calls repr() and '!a' which calls
ascii().
Some examples:
"Harold's a clever {0!s}" # Calls str() on the argument first
"Bring out the holy {name!r}" # Calls repr() on the argument first
"More {!a}" # Calls ascii() on the argument first
The format_spec field contains a specification of how the value should be
presented, including such details as field width, alignment, padding, decimal
precision and so on. Each value type can define its own “formatting
mini-language” or interpretation of the format_spec.
Most built-in types support a common formatting mini-language, which is
described in the next section.
A format_spec field can also include nested replacement fields within it.
These nested replacement fields can contain only a field name; conversion flags
and format specifications are not allowed. The replacement fields within the
format_spec are substituted before the format_spec string is interpreted.
This allows the formatting of a value to be dynamically specified.
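For example, here the alignment and width are themselves supplied as
arguments:
>>> '{0:{align}{width}}'.format('centered', align='^', width=12)
'  centered  '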
“Format specifications” are used within replacement fields contained within a
format string to define how individual values are presented (see
Format String Syntax). They can also be passed directly to the built-in
format() function. Each formattable type may define how the format
specification is to be interpreted.
Most built-in types implement the following options for format specifications,
although some of the formatting options are only supported by the numeric types.
A general convention is that an empty format string ("") produces
the same result as if you had called str() on the value. A
non-empty format string typically modifies the result.
The general form of a standard format specifier is:
format_spec ::= [[fill]align][sign][#][0][width][,][.precision][type]
fill ::= <a character other than '}'>
align ::= "<" | ">" | "=" | "^"
sign ::= "+" | "-" | " "
width ::= integer
precision ::= integer
type ::= "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g" | "G" | "n" | "o" | "s" | "x" | "X" | "%"
The fill character can be any character other than ‘{‘ or ‘}’. The presence
of a fill character is signaled by the character following it, which must be
one of the alignment options. If the second character of format_spec is not
a valid alignment option, then it is assumed that both the fill character and
the alignment option are absent.
The meaning of the various alignment options is as follows:
Option   Meaning
'<'      Forces the field to be left-aligned within the available space
         (this is the default for most objects).
'>'      Forces the field to be right-aligned within the available space
         (this is the default for numbers).
'='      Forces the padding to be placed after the sign (if any) but
         before the digits. This is used for printing fields in the form
         '+000000120'. This alignment option is only valid for numeric
         types.
'^'      Forces the field to be centered within the available space.
Note that unless a minimum field width is defined, the field width will always
be the same size as the data to fill it, so that the alignment option has no
meaning in this case.
The sign option is only valid for number types, and can be one of the
following:
Option   Meaning
'+'      Indicates that a sign should be used for both positive as well
         as negative numbers.
'-'      Indicates that a sign should be used only for negative numbers
         (this is the default behavior).
space    Indicates that a leading space should be used on positive
         numbers, and a minus sign on negative numbers.
The '#' option causes the “alternate form” to be used for the
conversion. The alternate form is defined differently for different
types. This option is only valid for integer, float, complex and
Decimal types. For integers, when binary, octal, or hexadecimal output
is used, this option adds the prefix respective '0b', '0o', or
'0x' to the output value. For floats, complex and Decimal the
alternate form causes the result of the conversion to always contain a
decimal-point character, even if no digits follow it. Normally, a
decimal-point character appears in the result of these conversions
only if a digit follows it. In addition, for 'g' and 'G'
conversions, trailing zeros are not removed from the result.
The ',' option signals the use of a comma for a thousands separator.
For a locale aware separator, use the 'n' integer presentation type
instead.
Changed in version 3.1: Added the ',' option (see also PEP 378).
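For example:
>>> '{:,}'.format(1234567890)
'1,234,567,890'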
width is a decimal integer defining the minimum field width. If not
specified, then the field width will be determined by the content.
If the width field is preceded by a zero ('0') character, this enables
zero-padding. This is equivalent to an alignment type of '=' and a fill
character of '0'.
The precision is a decimal number indicating how many digits should be
displayed after the decimal point for a floating point value formatted with
'f' and 'F', or before and after the decimal point for a floating point
value formatted with 'g' or 'G'. For non-number types the field
indicates the maximum field size - in other words, how many characters will be
used from the field content. The precision is not allowed for integer values.
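For example:
>>> '{:08.3f}'.format(3.14159)    # zero-padding, three digits after the point
'0003.142'
>>> '{:.5}'.format('xylophone')   # maximum field size for a string
'xylop'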
Finally, the type determines how the data should be presented.
The available string presentation types are:
Type     Meaning
's'      String format. This is the default type for strings and may be
         omitted.
None     The same as 's'.
The available integer presentation types are:
Type     Meaning
'b'      Binary format. Outputs the number in base 2.
'c'      Character. Converts the integer to the corresponding unicode
         character before printing.
'd'      Decimal Integer. Outputs the number in base 10.
'o'      Octal format. Outputs the number in base 8.
'x'      Hex format. Outputs the number in base 16, using lower-case
         letters for the digits above 9.
'X'      Hex format. Outputs the number in base 16, using upper-case
         letters for the digits above 9.
'n'      Number. This is the same as 'd', except that it uses the
         current locale setting to insert the appropriate number
         separator characters.
None     The same as 'd'.
In addition to the above presentation types, integers can be formatted
with the floating point presentation types listed below (except
'n' and None). When doing so, float() is used to convert the
integer to a floating point number before formatting.
The available presentation types for floating point and decimal values are:
Type     Meaning
'e'      Exponent notation. Prints the number in scientific notation
         using the letter 'e' to indicate the exponent.
'E'      Exponent notation. Same as 'e' except it uses an upper case
         'E' as the separator character.
'f'      Fixed point. Displays the number as a fixed-point number.
'F'      Fixed point. Same as 'f', but converts nan to NAN and
         inf to INF.
'g'      General format. For a given precision p >= 1, this rounds the
         number to p significant digits and then formats the result in
         either fixed-point format or in scientific notation, depending
         on its magnitude.
         The precise rules are as follows: suppose that the result
         formatted with presentation type 'e' and precision p-1
         would have exponent exp. Then if -4 <= exp < p, the number
         is formatted with presentation type 'f' and precision
         p-1-exp. Otherwise, the number is formatted with presentation
         type 'e' and precision p-1. In both cases insignificant
         trailing zeros are removed from the significand, and the
         decimal point is also removed if there are no remaining digits
         following it.
         Positive and negative infinity, positive and negative zero, and
         nans are formatted as inf, -inf, 0, -0 and nan
         respectively, regardless of the precision.
         A precision of 0 is treated as equivalent to a precision of 1.
'G'      General format. Same as 'g' except switches to 'E' if the
         number gets too large. The representations of infinity and NaN
         are uppercased, too.
'n'      Number. This is the same as 'g', except that it uses the
         current locale setting to insert the appropriate number
         separator characters.
'%'      Percentage. Multiplies the number by 100 and displays in fixed
         ('f') format, followed by a percent sign.
None     Similar to 'g', except with at least one digit past the
         decimal point and a default precision of 12. This is intended
         to match str(), except you can add the other format
         modifiers.
This section contains examples of the new format syntax and comparison with
the old %-formatting.
In most cases the syntax is similar to the old %-formatting, with the
addition of the {} and with : used instead of %.
For example, '%03.2f' can be translated to '{:03.2f}'.
The new format syntax also supports new and different options, shown in the
following examples.
Accessing arguments by position:
>>> '{0}, {1}, {2}'.format('a', 'b', 'c')
'a, b, c'
>>> '{}, {}, {}'.format('a', 'b', 'c') # 3.1+ only
'a, b, c'
>>> '{2}, {1}, {0}'.format('a', 'b', 'c')
'c, b, a'
>>> '{2}, {1}, {0}'.format(*'abc') # unpacking argument sequence
'c, b, a'
>>> '{0}{1}{0}'.format('abra', 'cad') # arguments' indices can be repeated
'abracadabra'
>>> c = 3-5j
>>> ('The complex number {0} is formed from the real part {0.real} '
... 'and the imaginary part {0.imag}.').format(c)
'The complex number (3-5j) is formed from the real part 3.0 and the imaginary part -5.0.'
>>> class Point:
... def __init__(self, x, y):
... self.x, self.y = x, y
... def __str__(self):
... return 'Point({self.x}, {self.y})'.format(self=self)
...
>>> str(Point(4, 2))
'Point(4, 2)'
>>> '{:<30}'.format('left aligned')
'left aligned '
>>> '{:>30}'.format('right aligned')
' right aligned'
>>> '{:^30}'.format('centered')
' centered '
>>> '{:*^30}'.format('centered') # use '*' as a fill char
'***********centered***********'
Replacing %+f, %-f, and %f and specifying a sign:
>>> '{:+f}; {:+f}'.format(3.14, -3.14) # show it always
'+3.140000; -3.140000'
>>> '{: f}; {: f}'.format(3.14, -3.14) # show a space for positive numbers
' 3.140000; -3.140000'
>>> '{:-f}; {:-f}'.format(3.14, -3.14) # show only the minus -- same as '{:f}; {:f}'
'3.140000; -3.140000'
Replacing %x and %o and converting the value to different bases:
>>> # format also supports binary numbers
>>> "int: {0:d}; hex: {0:x}; oct: {0:o}; bin: {0:b}".format(42)
'int: 42; hex: 2a; oct: 52; bin: 101010'
>>> # with 0x, 0o, or 0b as prefix:
>>> "int: {0:d}; hex: {0:#x}; oct: {0:#o}; bin: {0:#b}".format(42)
'int: 42; hex: 0x2a; oct: 0o52; bin: 0b101010'
Templates provide simpler string substitutions as described in PEP 292.
Instead of the normal %-based substitutions, Templates support $-based substitutions, using the following rules:
$$ is an escape; it is replaced with a single $.
$identifier names a substitution placeholder matching a mapping key of
"identifier". By default, "identifier" must spell a Python
identifier. The first non-identifier character after the $ character
terminates this placeholder specification.
${identifier} is equivalent to $identifier. It is required when valid
identifier characters follow the placeholder but are not part of the
placeholder, such as "${noun}ification".
Any other appearance of $ in the string will result in a ValueError
being raised.
The string module provides a Template class that implements
these rules. The methods of Template are:
Performs the template substitution, returning a new string. mapping is
any dictionary-like object with keys that match the placeholders in the
template. Alternatively, you can provide keyword arguments, where the
keywords are the placeholders. When both mapping and kwds are given
and there are duplicates, the placeholders from kwds take precedence.
Like substitute(), except that if placeholders are missing from
mapping and kwds, instead of raising a KeyError exception, the
original placeholder will appear in the resulting string intact. Also,
unlike with substitute(), any other appearances of the $ will
simply return $ instead of raising ValueError.
While other exceptions may still occur, this method is called “safe”
because it always tries to return a usable string instead of
raising an exception. In another sense, safe_substitute() may be
anything other than safe, since it will silently ignore malformed
templates containing dangling delimiters, unmatched braces, or
placeholders that are not valid Python identifiers.
Template instances also provide one public data attribute:
This is the object passed to the constructor’s template argument. In
general, you shouldn’t change it, but read-only access is not enforced.
Here is an example of how to use a Template:
>>> from string import Template
>>> s = Template('$who likes $what')
>>> s.substitute(who='tim', what='kung pao')
'tim likes kung pao'
>>> d = dict(who='tim')
>>> Template('Give $who $100').substitute(d)
Traceback (most recent call last):
[...]
ValueError: Invalid placeholder in string: line 1, col 10
>>> Template('$who likes $what').substitute(d)
Traceback (most recent call last):
[...]
KeyError: 'what'
>>> Template('$who likes $what').safe_substitute(d)
'tim likes $what'
Advanced usage: you can derive subclasses of Template to customize the
placeholder syntax, delimiter character, or the entire regular expression used
to parse template strings. To do this, you can override these class attributes:
delimiter – This is the literal string describing a placeholder introducing
delimiter. The default value is $. Note that this should not be a
regular expression, as the implementation will call re.escape() on this
string as needed.
idpattern – This is the regular expression describing the pattern for
non-braced placeholders (the braces will be added automatically as
appropriate). The default value is the regular expression
[_a-z][_a-z0-9]*.
flags – The regular expression flags that will be applied when compiling
the regular expression used for recognizing substitutions. The default value
is re.IGNORECASE. Note that re.VERBOSE will always be added to the
flags, so custom idpatterns must follow conventions for verbose regular
expressions.
New in version 3.2.
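For instance, a minimal sketch of a subclass that swaps the delimiter (the
name PercentTemplate is hypothetical):
from string import Template

class PercentTemplate(Template):
    # Hypothetical subclass: use '%' instead of '$' as the delimiter.
    delimiter = '%'

print(PercentTemplate('%who likes %what').substitute(who='tim', what='kung pao'))
# prints: tim likes kung pao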
Alternatively, you can provide the entire regular expression pattern by
overriding the class attribute pattern. If you do this, the value must be a
regular expression object with four named capturing groups. The capturing
groups correspond to the rules given above, along with the invalid placeholder
rule:
escaped – This group matches the escape sequence, e.g. $$, in the
default pattern.
named – This group matches the unbraced placeholder name; it should not
include the delimiter in the capturing group.
braced – This group matches the brace enclosed placeholder name; it should
not include either the delimiter or braces in the capturing group.
invalid – This group matches any other delimiter pattern (usually a single
delimiter), and it should appear last in the regular expression.
Split the argument into words using str.split(), capitalize each word
using str.capitalize(), and join the capitalized words using
str.join(). If the optional second argument sep is absent
or None, runs of whitespace characters are replaced by a single space
and leading and trailing whitespace are removed, otherwise sep is used to
split and join the words.
This module provides regular expression matching operations similar to
those found in Perl.
Both patterns and strings to be searched can be Unicode strings as well as
8-bit strings. However, Unicode strings and 8-bit strings cannot be mixed:
that is, you cannot match a Unicode string with a byte pattern or
vice-versa; similarly, when asking for a substitution, the replacement
string must be of the same type as both the pattern and the search string.
Regular expressions use the backslash character ('\') to indicate
special forms or to allow special characters to be used without invoking
their special meaning. This collides with Python’s usage of the same
character for the same purpose in string literals; for example, to match
a literal backslash, one might have to write '\\\\' as the pattern
string, because the regular expression must be \\, and each
backslash must be expressed as \\ inside a regular Python string
literal.
The solution is to use Python’s raw string notation for regular expression
patterns; backslashes are not handled in any special way in a string literal
prefixed with 'r'. So r"\n" is a two-character string containing
'\' and 'n', while "\n" is a one-character string containing a
newline. Usually patterns will be expressed in Python code using this raw
string notation.
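For example, both of the following find the single backslash in the target
string; the raw-string form is easier to read:
>>> import re
>>> bool(re.search('\\\\', r'C:\some\path'))   # pattern written as a regular string
True
>>> bool(re.search(r'\\', r'C:\some\path'))    # the same pattern as a raw string
True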
It is important to note that most regular expression operations are available as
module-level functions and methods on
compiled regular expressions. The functions are shortcuts
that don’t require you to compile a regex object first, but miss some
fine-tuning parameters.
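For example, the two calls below are equivalent; compiling once is usually
preferable when the same expression is reused:
>>> import re
>>> re.findall(r'\d+', '12 drummers, 11 pipers')   # module-level shortcut
['12', '11']
>>> pattern = re.compile(r'\d+')                   # reusable compiled form
>>> pattern.findall('12 drummers, 11 pipers')
['12', '11']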
See also
Mastering Regular Expressions
Book on regular expressions by Jeffrey Friedl, published by O’Reilly. The
second edition of the book no longer covers Python at all, but the first
edition covered writing good regular expression patterns in great detail.
A regular expression (or RE) specifies a set of strings that matches it; the
functions in this module let you check if a particular string matches a given
regular expression (or if a given regular expression matches a particular
string, which comes down to the same thing).
Regular expressions can be concatenated to form new regular expressions; if A
and B are both regular expressions, then AB is also a regular expression.
In general, if a string p matches A and another string q matches B, the
string pq will match AB. This holds unless A or B contain low precedence
operations; boundary conditions between A and B; or have numbered group
references. Thus, complex expressions can easily be constructed from simpler
primitive expressions like the ones described here. For details of the theory
and implementation of regular expressions, consult the Friedl book referenced
above, or almost any textbook about compiler construction.
A brief explanation of the format of regular expressions follows. For further
information and a gentler presentation, consult the Regular Expression HOWTO.
Regular expressions can contain both special and ordinary characters. Most
ordinary characters, like 'A', 'a', or '0', are the simplest regular
expressions; they simply match themselves. You can concatenate ordinary
characters, so last matches the string 'last'. (In the rest of this
section, we’ll write RE’s in this special style, usually without quotes, and
strings to be matched 'in single quotes'.)
Some characters, like '|' or '(', are special. Special
characters either stand for classes of ordinary characters, or affect
how the regular expressions around them are interpreted. Regular
expression pattern strings may not contain null bytes, but can specify
the null byte using the \number notation, e.g., '\x00'.
The special characters are:
'.'
(Dot.) In the default mode, this matches any character except a newline. If
the DOTALL flag has been specified, this matches any character
including a newline.
'^'
(Caret.) Matches the start of the string, and in MULTILINE mode also
matches immediately after each newline.
'$'
Matches the end of the string or just before the newline at the end of the
string, and in MULTILINE mode also matches before a newline. foo
matches both ‘foo’ and ‘foobar’, while the regular expression foo$ matches
only ‘foo’. More interestingly, searching for foo.$ in 'foo1\nfoo2\n'
matches ‘foo2’ normally, but ‘foo1’ in MULTILINE mode; searching for
a single $ in 'foo\n' will find two (empty) matches: one just before
the newline, and one at the end of the string.
'*'
Causes the resulting RE to match 0 or more repetitions of the preceding RE, as
many repetitions as are possible. ab* will match ‘a’, ‘ab’, or ‘a’ followed
by any number of ‘b’s.
'+'
Causes the resulting RE to match 1 or more repetitions of the preceding RE.
ab+ will match ‘a’ followed by any non-zero number of ‘b’s; it will not
match just ‘a’.
'?'
Causes the resulting RE to match 0 or 1 repetitions of the preceding RE.
ab? will match either ‘a’ or ‘ab’.
*?, +?, ??
The '*', '+', and '?' qualifiers are all greedy; they match
as much text as possible. Sometimes this behaviour isn’t desired; if the RE
<.*> is matched against '<H1>title</H1>', it will match the entire
string, and not just '<H1>'. Adding '?' after the qualifier makes it
perform the match in non-greedy or minimal fashion; as few
characters as possible will be matched. Using .*? in the previous
expression will match only '<H1>'.
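For example:
>>> import re
>>> re.match('<.*>', '<H1>title</H1>').group()
'<H1>title</H1>'
>>> re.match('<.*?>', '<H1>title</H1>').group()
'<H1>'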
{m}
Specifies that exactly m copies of the previous RE should be matched; fewer
matches cause the entire RE not to match. For example, a{6} will match
exactly six 'a' characters, but not five.
{m,n}
Causes the resulting RE to match from m to n repetitions of the preceding
RE, attempting to match as many repetitions as possible. For example,
a{3,5} will match from 3 to 5 'a' characters. Omitting m specifies a
lower bound of zero, and omitting n specifies an infinite upper bound. As an
example, a{4,}b will match aaaab or a thousand 'a' characters
followed by a b, but not aaab. The comma may not be omitted or the
modifier would be confused with the previously described form.
{m,n}?
Causes the resulting RE to match from m to n repetitions of the preceding
RE, attempting to match as few repetitions as possible. This is the
non-greedy version of the previous qualifier. For example, on the
6-character string 'aaaaaa', a{3,5} will match 5 'a' characters,
while a{3,5}? will only match 3 characters.
'\'
Either escapes special characters (permitting you to match characters like
'*', '?', and so forth), or signals a special sequence; special
sequences are discussed below.
If you’re not using a raw string to express the pattern, remember that Python
also uses the backslash as an escape sequence in string literals; if the escape
sequence isn’t recognized by Python’s parser, the backslash and subsequent
character are included in the resulting string. However, if Python would
recognize the resulting sequence, the backslash should be repeated twice. This
is complicated and hard to understand, so it’s highly recommended that you use
raw strings for all but the simplest expressions.
[]
Used to indicate a set of characters. Characters can be listed individually, or
a range of characters can be indicated by giving two characters and separating
them by a '-'. Special characters are not active inside sets. For example,
[akm$] will match any of the characters 'a', 'k',
'm', or '$'; [a-z] will match any lowercase letter, and
[a-zA-Z0-9] matches any letter or digit. Character classes such
as \w or \S (defined below) are also acceptable inside a
range, although the characters they match depend on whether
ASCII or LOCALE mode is in force. If you want to
include a ']' or a '-' inside a set, precede it with a
backslash, or place it as the first character. The pattern []]
will match ']', for example.
You can match the characters not within a range by complementing the set.
This is indicated by including a '^' as the first character of the set;
'^' elsewhere will simply match the '^' character. For example,
[^5] will match any character except '5', and [^^] will match any
character except '^'.
Note that inside [] the special forms and special characters lose
their meanings and only the syntaxes described here are valid. For
example, +, *, (, ), and so on are treated as
literals inside [], and backreferences cannot be used inside
[].
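For example, reusing the sets described above:
>>> import re
>>> re.findall(r'[akm$]', 'mark$')
['m', 'a', 'k', '$']
>>> re.search(r'[]]', 'a]b').group()
']'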
'|'
A|B, where A and B can be arbitrary REs, creates a regular expression that
will match either A or B. An arbitrary number of REs can be separated by the
'|' in this way. This can be used inside groups (see below) as well. As
the target string is scanned, REs separated by '|' are tried from left to
right. When one pattern completely matches, that branch is accepted. This means
that once A matches, B will not be tested further, even if it would
produce a longer overall match. In other words, the '|' operator is never
greedy. To match a literal '|', use \|, or enclose it inside a
character class, as in [|].
(...)
Matches whatever regular expression is inside the parentheses, and indicates the
start and end of a group; the contents of a group can be retrieved after a match
has been performed, and can be matched later in the string with the \number
special sequence, described below. To match the literals '(' or ')',
use \( or \), or enclose them inside a character class: [(][)].
(?...)
This is an extension notation (a '?' following a '(' is not meaningful
otherwise). The first character after the '?' determines what the meaning
and further syntax of the construct is. Extensions usually do not create a new
group; (?P<name>...) is the only exception to this rule. Following are the
currently supported extensions.
(?aiLmsux)
(One or more letters from the set 'a', 'i', 'L', 'm',
's', 'u', 'x'.) The group matches the empty string; the
letters set the corresponding flags: re.A (ASCII-only matching),
re.I (ignore case), re.L (locale dependent),
re.M (multi-line), re.S (dot matches all),
and re.X (verbose), for the entire regular expression. (The
flags are described in Module Contents.) This
is useful if you wish to include the flags as part of the regular
expression, instead of passing a flag argument to the
re.compile() function.
Note that the (?x) flag changes how the expression is parsed. It should be
used first in the expression string, or after one or more whitespace characters.
If there are non-whitespace characters before the flag, the results are
undefined.
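For instance, embedding the equivalent of re.I in the pattern itself:
>>> import re
>>> re.match('(?i)spam', 'SPAM and eggs').group()
'SPAM'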
(?:...)
A non-capturing version of regular parentheses. Matches whatever regular
expression is inside the parentheses, but the substring matched by the group
cannot be retrieved after performing a match or referenced later in the
pattern.
(?P<name>...)
Similar to regular parentheses, but the substring matched by the group is
accessible within the rest of the regular expression via the symbolic group
name name. Group names must be valid Python identifiers, and each group
name must be defined only once within a regular expression. A symbolic group
is also a numbered group, just as if the group were not named. So the group
named id in the example below can also be referenced as the numbered group
1.
For example, if the pattern is (?P<id>[a-zA-Z_]\w*), the group can be
referenced by its name in arguments to methods of match objects, such as
m.group('id') or m.end('id'), and also by name in the regular
expression itself (using (?P=id)) and replacement text given to
.sub() (using \g<id>).
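A short session showing both forms of access (the input string is an arbitrary illustration):
>>> import re
>>> m = re.match(r'(?P<id>[a-zA-Z_]\w*)', 'spam123 = 1')
>>> m.group('id')
'spam123'
>>> m.group(1)
'spam123'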
(?P=name)
Matches whatever text was matched by the earlier group named name.
(?#...)
A comment; the contents of the parentheses are simply ignored.
(?=...)
Matches if ... matches next, but doesn’t consume any of the string. This is
called a lookahead assertion. For example, Isaac(?=Asimov) will match
'Isaac' only if it’s followed by 'Asimov'.
(?!...)
Matches if ... doesn’t match next. This is a negative lookahead assertion.
For example, Isaac(?!Asimov) will match 'Isaac' only if it’s not
followed by 'Asimov'.
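Both assertions at work (the second search finds no match, so returns None and nothing is printed):
>>> import re
>>> re.search('Isaac(?=Asimov)', 'IsaacAsimov').group()
'Isaac'
>>> re.search('Isaac(?!Asimov)', 'IsaacAsimov')
>>> re.search('Isaac(?!Asimov)', 'IsaacNewton').group()
'Isaac'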
(?<=...)
Matches if the current position in the string is preceded by a match for ...
that ends at the current position. This is called a positive lookbehind
assertion. (?<=abc)def will find a match in abcdef, since the
lookbehind will back up 3 characters and check if the contained pattern matches.
The contained pattern must only match strings of some fixed length, meaning that
abc or a|b are allowed, but a* and a{3,4} are not. Note that
patterns which start with positive lookbehind assertions will never match at the
beginning of the string being searched; you will most likely want to use the
search() function rather than the match() function:
>>> import re
>>> m = re.search('(?<=abc)def', 'abcdef')
>>> m.group(0)
'def'
This example looks for a word following a hyphen:
>>> m = re.search(r'(?<=-)\w+', 'spam-egg')
>>> m.group(0)
'egg'
(?<!...)
Matches if the current position in the string is not preceded by a match for
.... This is called a negative lookbehind assertion. Similar to
positive lookbehind assertions, the contained pattern must only match strings of
some fixed length. Patterns which start with negative lookbehind assertions may
match at the beginning of the string being searched.
(?(id/name)yes-pattern|no-pattern)
Will try to match with yes-pattern if the group with given id or
name exists, and with no-pattern if it doesn’t. no-pattern is
optional and can be omitted. For example,
(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$) is a poor email matching pattern, which
will match with '<user@host.com>' as well as 'user@host.com', but
not with '<user@host.com' nor 'user@host.com>'.
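A quick check of the pattern from above; match() returns None for the unbalanced forms:
>>> import re
>>> pat = re.compile(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)')
>>> pat.match('<user@host.com>').group(2)
'user@host.com'
>>> pat.match('user@host.com').group(2)
'user@host.com'
>>> print(pat.match('<user@host.com'))
None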
The special sequences consist of '\' and a character from the list below.
If the ordinary character is not on the list, then the resulting RE will match
the second character. For example, \$ matches the character '$'.
\number
Matches the contents of the group of the same number. Groups are numbered
starting from 1. For example, (.+) \1 matches 'the the' or '55 55',
but not 'thethe' (note the space after the group). This special sequence
can only be used to match one of the first 99 groups. If the first digit of
number is 0, or number is 3 octal digits long, it will not be interpreted as
a group match, but as the character with octal value number. Inside the
'[' and ']' of a character class, all numeric escapes are treated as
characters.
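For instance:
>>> import re
>>> re.match(r'(.+) \1', 'the the').group()
'the the'
>>> print(re.match(r'(.+) \1', 'the end'))
None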
\A
Matches only at the start of the string.
\b
Matches the empty string, but only at the beginning or end of a word.
A word is defined as a sequence of Unicode alphanumeric or underscore
characters, so the end of a word is indicated by whitespace or a
non-alphanumeric, non-underscore Unicode character. Note that
formally, \b is defined as the boundary between a \w and a
\W character (or vice versa). By default Unicode alphanumerics
are the ones used, but this can be changed by using the ASCII
flag. Inside a character range, \b represents the backspace
character, for compatibility with Python’s string literals.
\B
Matches the empty string, but only when it is not at the beginning or end of a
word. This is just the opposite of \b, so word characters are
Unicode alphanumerics or the underscore, although this can be changed
by using the ASCII flag.
\d
For Unicode (str) patterns:
Matches any Unicode decimal digit (that is, any character in
Unicode character category [Nd]). This includes [0-9], and
also many other digit characters. If the ASCII flag is
used only [0-9] is matched (but the flag affects the entire
regular expression, so in such cases using an explicit [0-9]
may be a better choice).
For 8-bit (bytes) patterns:
Matches any decimal digit; this is equivalent to [0-9].
\D
Matches any character which is not a Unicode decimal digit. This is
the opposite of \d. If the ASCII flag is used this
becomes the equivalent of [^0-9] (but the flag affects the entire
regular expression, so in such cases using an explicit [^0-9] may
be a better choice).
\s
For Unicode (str) patterns:
Matches Unicode whitespace characters (which includes
[\t\n\r\f\v], and also many other characters, for example the
non-breaking spaces mandated by typography rules in many
languages). If the ASCII flag is used, only
[\t\n\r\f\v] is matched (but the flag affects the entire
regular expression, so in such cases using an explicit
[\t\n\r\f\v] may be a better choice).
For 8-bit (bytes) patterns:
Matches characters considered whitespace in the ASCII character set;
this is equivalent to [\t\n\r\f\v].
\S
Matches any character which is not a Unicode whitespace character. This is
the opposite of \s. If the ASCII flag is used this
becomes the equivalent of [^\t\n\r\f\v] (but the flag affects the entire
regular expression, so in such cases using an explicit [^\t\n\r\f\v] may
be a better choice).
\w
For Unicode (str) patterns:
Matches Unicode word characters; this includes most characters
that can be part of a word in any language, as well as numbers and
the underscore. If the ASCII flag is used, only
[a-zA-Z0-9_] is matched (but the flag affects the entire
regular expression, so in such cases using an explicit
[a-zA-Z0-9_] may be a better choice).
For 8-bit (bytes) patterns:
Matches characters considered alphanumeric in the ASCII character set;
this is equivalent to [a-zA-Z0-9_].
\W
Matches any character which is not a Unicode word character. This is
the opposite of \w. If the ASCII flag is used this
becomes the equivalent of [^a-zA-Z0-9_] (but the flag affects the
entire regular expression, so in such cases using an explicit
[^a-zA-Z0-9_] may be a better choice).
\Z
Matches only at the end of the string.
Most of the standard escapes supported by Python string literals are also
accepted by the regular expression parser:
\a \b \f \n
\r \t \v \x
\\
Octal escapes are included in a limited form: If the first digit is a 0, or if
there are three octal digits, it is considered an octal escape. Otherwise, it is
a group reference. As for string literals, octal escapes are always at most
three digits in length.
Python offers two different primitive operations based on regular expressions:
match checks for a match only at the beginning of the string, while
search checks for a match anywhere in the string (this is what Perl does
by default).
Note that match may differ from search even when using a regular expression
beginning with '^': '^' matches only at the start of the string, or in
MULTILINE mode also immediately following a newline. The “match”
operation succeeds only if the pattern matches at the start of the string
regardless of mode, or at the starting position given by the optional pos
argument regardless of whether a newline precedes it.
>>> re.match("c", "abcdef") # No match
>>> re.search("c", "abcdef") # Match
<_sre.SRE_Match object at ...>
The module defines several functions, constants, and an exception. Some of the
functions are simplified versions of the full featured methods for compiled
regular expressions. Most non-trivial applications always use the compiled
form.
re.compile(pattern, flags=0)
Compile a regular expression pattern into a regular expression object, which
can be used for matching using its match() and search() methods,
described below.
The expression’s behaviour can be modified by specifying a flags value.
Values can be any of the following variables, combined using bitwise OR (the
| operator).
The sequence
prog = re.compile(pattern)
result = prog.match(string)
is equivalent to
result = re.match(pattern, string)
but using re.compile() and saving the resulting regular expression
object for reuse is more efficient when the expression will be used several
times in a single program.
Note
The compiled versions of the most recent patterns passed to
re.match(), re.search() or re.compile() are cached, so
programs that use only a few regular expressions at a time needn’t worry
about compiling regular expressions.
re.A
re.ASCII
Make \w, \W, \b, \B, \d, \D, \s and \S
perform ASCII-only matching instead of full Unicode matching. This is only
meaningful for Unicode patterns, and is ignored for byte patterns.
Note that for backward compatibility, the re.U flag still
exists (as well as its synonym re.UNICODE and its embedded
counterpart (?u)), but these are redundant in Python 3 since
matches are Unicode by default for strings (and Unicode matching
isn’t allowed for bytes).
re.I
re.IGNORECASE
Perform case-insensitive matching; expressions like [A-Z] will match
lowercase letters, too. This is not affected by the current locale
and works for Unicode characters as expected.
re.L
re.LOCALE
Make \w, \W, \b, \B, \s and \S dependent on the
current locale. The use of this flag is discouraged as the locale mechanism
is very unreliable, and it only handles one “culture” at a time anyway;
you should use Unicode matching instead, which is the default in Python 3
for Unicode (str) patterns.
re.M
re.MULTILINE
When specified, the pattern character '^' matches at the beginning of the
string and at the beginning of each line (immediately following each newline);
and the pattern character '$' matches at the end of the string and at the
end of each line (immediately preceding each newline). By default, '^'
matches only at the beginning of the string, and '$' only at the end of the
string and immediately before the newline (if any) at the end of the string.
re.X
re.VERBOSE
This flag allows you to write regular expressions that look nicer. Whitespace
within the pattern is ignored, except when in a character class or preceded by
an unescaped backslash, and, when a line contains a '#' that is neither in a
character class nor preceded by an unescaped backslash, all characters from the
leftmost such '#' through the end of the line are ignored.
That means that the two following regular expression objects that match a
decimal number are functionally equal:
a = re.compile(r"""\d + # the integral part
\. # the decimal point
\d * # some fractional digits""", re.X)
b = re.compile(r"\d+\.\d*")
re.search(pattern, string, flags=0)
Scan through string looking for a location where the regular expression
pattern produces a match, and return a corresponding match object. Return None if no position in the string matches the
pattern; note that this is different from finding a zero-length match at some
point in the string.
re.match(pattern, string, flags=0)
If zero or more characters at the beginning of string match the regular
expression pattern, return a corresponding match object. Return None if the string does not match the pattern;
note that this is different from a zero-length match.
Note
If you want to locate a match anywhere in string, use search()
instead.
re.split(pattern, string, maxsplit=0, flags=0)
Split string by the occurrences of pattern. If capturing parentheses are
used in pattern, then the text of all groups in the pattern are also returned
as part of the resulting list. If maxsplit is nonzero, at most maxsplit
splits occur, and the remainder of the string is returned as the final element
of the list.
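A few representative calls (expected output under Python 3.2):
>>> import re
>>> re.split(r'\W+', 'Words, words, words.')
['Words', 'words', 'words', '']
>>> re.split(r'(\W+)', 'Words, words, words.')
['Words', ', ', 'words', ', ', 'words', '.', '']
>>> re.split(r'\W+', 'Words, words, words.', 1)
['Words', 'words, words.']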
If there are capturing groups in the separator and it matches at the start of
the string, the result will start with an empty string. The same holds for
the end of the string:
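For instance:
>>> re.split(r'(\W+)', '...words, words...')
['', '...', 'words', ', ', 'words', '...', '']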
That way, separator components are always found at the same relative
indices within the result list (e.g., if there’s one capturing group
in the separator, the 0th, the 2nd and so forth).
Note that split will never split a string on an empty pattern match.
For example:
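Both of the following return the input unsplit, since every match is empty (behaviour as of Python 3.2):
>>> re.split('x*', 'foo')
['foo']
>>> re.split('(?m)^$', 'foo\n\nbar\n')
['foo\n\nbar\n']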
re.findall(pattern, string, flags=0)
Return all non-overlapping matches of pattern in string, as a list of
strings. The string is scanned left-to-right, and matches are returned in
the order found. If one or more groups are present in the pattern, return a
list of groups; this will be a list of tuples if the pattern has more than
one group. Empty matches are included in the result unless they touch the
beginning of another match.
re.finditer(pattern, string, flags=0)
Return an iterator yielding match objects over
all non-overlapping matches for the RE pattern in string. The string
is scanned left-to-right, and matches are returned in the order found. Empty
matches are included in the result unless they touch the beginning of another
match.
re.sub(pattern, repl, string, count=0, flags=0)
Return the string obtained by replacing the leftmost non-overlapping occurrences
of pattern in string by the replacement repl. If the pattern isn’t found,
string is returned unchanged. repl can be a string or a function; if it is
a string, any backslash escapes in it are processed. That is, \n is
converted to a single newline character, \r is converted to a carriage return, and
so forth. Unknown escapes such as \j are left alone. Backreferences, such
as \6, are replaced with the substring matched by group 6 in the pattern.
For example:
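One illustrative substitution, rewriting a Python function header as a C prototype:
>>> import re
>>> re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
...        r'static PyObject*\npy_\1(void)\n{',
...        'def myfunc():')
'static PyObject*\npy_myfunc(void)\n{'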
If repl is a function, it is called for every non-overlapping occurrence of
pattern. The function takes a single match object argument, and returns the
replacement string. For example:
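A small function that turns single dashes into spaces and double dashes into single dashes (an arbitrary illustration):
>>> def dashrepl(matchobj):
...     if matchobj.group(0) == '-': return ' '
...     else: return '-'
>>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
'pro--gram files'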
The optional argument count is the maximum number of pattern occurrences to be
replaced; count must be a non-negative integer. If omitted or zero, all
occurrences will be replaced. Empty matches for the pattern are replaced only
when not adjacent to a previous match, so sub('x*', '-', 'abc') returns
'-a-b-c-'.
In addition to character escapes and backreferences as described above,
\g<name> will use the substring matched by the group named name, as
defined by the (?P<name>...) syntax. \g<number> uses the corresponding
group number; \g<2> is therefore equivalent to \2, but isn’t ambiguous
in a replacement such as \g<2>0. \20 would be interpreted as a
reference to group 20, not a reference to group 2 followed by the literal
character '0'. The backreference \g<0> substitutes in the entire
substring matched by the RE.
Changed in version 3.1: Added the optional flags argument.
re.escape(string)
Return string with all non-alphanumerics backslashed; this is useful if you
want to match an arbitrary literal string that may have regular expression
metacharacters in it.
exception re.error
Exception raised when a string passed to one of the functions here is not a
valid regular expression (for example, it might contain unmatched parentheses)
or when some other error occurs during compilation or matching. It is never an
error if a string contains no match for a pattern.
regex.search(string[, pos[, endpos]])
Scan through string looking for a location where this regular expression
produces a match, and return a corresponding match object. Return None if no position in the string matches the
pattern; note that this is different from finding a zero-length match at some
point in the string.
The optional second parameter pos gives an index in the string where the
search is to start; it defaults to 0. This is not completely equivalent to
slicing the string; the '^' pattern character matches at the real beginning
of the string and at positions just after a newline, but not necessarily at the
index where the search is to start.
The optional parameter endpos limits how far the string will be searched; it
will be as if the string is endpos characters long, so only the characters
from pos to endpos-1 will be searched for a match. If endpos is less
than pos, no match will be found; otherwise, if rx is a compiled regular
expression object, rx.search(string, 0, 50) is equivalent to
rx.search(string[:50], 0).
>>> pattern = re.compile("d")
>>> pattern.search("dog") # Match at index 0
<_sre.SRE_Match object at ...>
>>> pattern.search("dog", 1) # No match; search doesn't include the "d"
regex.match(string[, pos[, endpos]])
If zero or more characters at the beginning of string match this regular
expression, return a corresponding match object.
Return None if the string does not match the pattern; note that this is
different from a zero-length match.
The optional pos and endpos parameters have the same meaning as for the
search() method.
Note
If you want to locate a match anywhere in string, use
search() instead.
>>> pattern = re.compile("o")
>>> pattern.match("dog") # No match as "o" is not at the start of "dog".
>>> pattern.match("dog", 1) # Match as "o" is the 2nd character of "dog".
<_sre.SRE_Match object at ...>
regex.findall(string[, pos[, endpos]])
Similar to the findall() function, using the compiled pattern, but
also accepts optional pos and endpos parameters that limit the search
region like for match().
regex.finditer(string[, pos[, endpos]])
Similar to the finditer() function, using the compiled pattern, but
also accepts optional pos and endpos parameters that limit the search
region like for match().
regex.groupindex
A dictionary mapping any symbolic group names defined by (?P<id>) to group
numbers. The dictionary is empty if no symbolic groups were used in the
pattern.
Match objects always have a boolean value of True, so that you can test
whether e.g. match() resulted in a match with a simple if statement. They
support the following methods and attributes:
match.expand(template)
Return the string obtained by doing backslash substitution on the template
string template, as done by the sub() method.
Escapes such as \n are converted to the appropriate characters,
and numeric backreferences (\1, \2) and named backreferences
(\g<1>, \g<name>) are replaced by the contents of the
corresponding group.
match.group([group1, ...])
Returns one or more subgroups of the match. If there is a single argument, the
result is a single string; if there are multiple arguments, the result is a
tuple with one item per argument. Without arguments, group1 defaults to zero
(the whole match is returned). If a groupN argument is zero, the corresponding
return value is the entire matching string; if it is in the inclusive range
[1..99], it is the string matching the corresponding parenthesized group. If a
group number is negative or larger than the number of groups defined in the
pattern, an IndexError exception is raised. If a group is contained in a
part of the pattern that did not match, the corresponding result is None.
If a group is contained in a part of the pattern that matched multiple times,
the last match is returned.
>>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
>>> m.group(0) # The entire match
'Isaac Newton'
>>> m.group(1) # The first parenthesized subgroup.
'Isaac'
>>> m.group(2) # The second parenthesized subgroup.
'Newton'
>>> m.group(1, 2) # Multiple arguments give us a tuple.
('Isaac', 'Newton')
If the regular expression uses the (?P<name>...) syntax, the groupN
arguments may also be strings identifying groups by their group name. If a
string argument is not used as a group name in the pattern, an IndexError
exception is raised.
match.groups(default=None)
Return a tuple containing all the subgroups of the match, from 1 up to however
many groups are in the pattern. The default argument is used for groups that
did not participate in the match; it defaults to None.
For example:
>>> m = re.match(r"(\d+)\.(\d+)", "24.1632")
>>> m.groups()
('24', '1632')
If we make the decimal place and everything after it optional, not all groups
might participate in the match. These groups will default to None unless
the default argument is given:
>>> m = re.match(r"(\d+)\.?(\d+)?", "24")
>>> m.groups() # Second group defaults to None.
('24', None)
>>> m.groups('0') # Now, the second group defaults to '0'.
('24', '0')
match.groupdict(default=None)
Return a dictionary containing all the named subgroups of the match, keyed by
the subgroup name. The default argument is used for groups that did not
participate in the match; it defaults to None. For example:
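A short session (the pattern and input are arbitrary illustrations):
>>> m = re.match(r'(?P<first_name>\w+) (?P<last_name>\w+)', 'Malcolm Reynolds')
>>> m.groupdict()
{'first_name': 'Malcolm', 'last_name': 'Reynolds'}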
match.start([group])
match.end([group])
Return the indices of the start and end of the substring matched by group;
group defaults to zero (meaning the whole matched substring). Return -1 if
group exists but did not contribute to the match. For a match object m, and
a group g that did contribute to the match, the substring matched by group g
(equivalent to m.group(g)) is
m.string[m.start(g):m.end(g)]
Note that m.start(group) will equal m.end(group) if group matched a
null string. For example, after m = re.search('b(c?)', 'cba'),
m.start(0) is 1, m.end(0) is 2, m.start(1) and m.end(1) are both
2, and m.start(2) raises an IndexError exception.
An example that will remove remove_this from email addresses:
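One possibility:
>>> email = 'tony@tiremove_thisger.net'
>>> m = re.search('remove_this', email)
>>> email[:m.start()] + email[m.end():]
'tony@tiger.net'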
match.span([group])
For a match m, return the 2-tuple (m.start(group), m.end(group)). Note
that if group did not contribute to the match, this is (-1, -1).
group defaults to zero, the entire match.
match.pos
The value of pos which was passed to the search() or
match() method of a regex object. This
is the index into the string at which the RE engine started looking for a
match.
match.endpos
The value of endpos which was passed to the search() or
match() method of a regex object. This
is the index into the string beyond which the RE engine will not go.
match.lastindex
The integer index of the last matched capturing group, or None if no group
was matched at all. For example, the expressions (a)b, ((a)(b)), and
((ab)) will have lastindex==1 if applied to the string 'ab', while
the expression (a)(b) will have lastindex==2, if applied to the same
string.
In this example, we’ll use the following helper function to display match
objects a little more gracefully:
def displaymatch(match):
    if match is None:
        return None
    return '<Match: %r, groups=%r>' % (match.group(), match.groups())
Suppose you are writing a poker program where a player’s hand is represented as
a 5-character string with each character representing a card, “a” for ace, “k”
for king, “q” for queen, “j” for jack, “0” for 10, and “1” through “9”
representing the card with that value.
To see if a given string is a valid hand, one could do the following:
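One possible check, using a pattern consistent with the hand encoding above and the displaymatch() helper (match() returns None for invalid hands, so nothing is printed):
>>> import re
>>> valid = re.compile(r'[0-9akqj]{5}$')
>>> displaymatch(valid.match('ak05q'))  # Valid.
"<Match: 'ak05q', groups=()>"
>>> displaymatch(valid.match('ak05e'))  # Invalid.
>>> displaymatch(valid.match('727ak'))  # Valid.
"<Match: '727ak', groups=()>"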
That last hand, "727ak", contained a pair, or two of the same valued cards.
To match this with a regular expression, one could use backreferences as such:
>>> pair = re.compile(r".*(.).*\1")
>>> displaymatch(pair.match("717ak")) # Pair of 7s.
"<Match: '717', groups=('7',)>"
>>> displaymatch(pair.match("718ak")) # No pairs.
>>> displaymatch(pair.match("354aa")) # Pair of aces.
"<Match: '354aa', groups=('a',)>"
To find out what card the pair consists of, one could use the
group() method of the match object in the following manner:
>>> pair.match("717ak").group(1)
'7'
# Error because re.match() returns None, which doesn't have a group() method:
>>> pair.match("718ak").group(1)
Traceback (most recent call last):
File "<pyshell#23>", line 1, in <module>
re.match(r".*(.).*\1", "718ak").group(1)
AttributeError: 'NoneType' object has no attribute 'group'
>>> pair.match("354aa").group(1)
'a'
Python does not currently have an equivalent to scanf(). Regular
expressions are generally more powerful, though also more verbose, than
scanf() format strings. The table below offers some more-or-less
equivalent mappings between scanf() format tokens and regular
expressions.
scanf() Token        Regular Expression
%c                   .
%5c                  .{5}
%d                   [-+]?\d+
%e, %E, %f, %g       [-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?
%i                   [-+]?(0[xX][\dA-Fa-f]+|0[0-7]*|\d+)
%o                   0[0-7]*
%s                   \S+
%u                   \d+
%x, %X               0[xX][\dA-Fa-f]+
To extract the filename and numbers from a string like
/usr/sbin/sendmail - 0 errors, 4 warnings
you would use a scanf() format like
%s - %d errors, %d warnings
The equivalent regular expression would be
(\S+) - (\d+) errors, (\d+) warnings
If you create regular expressions that require the engine to perform a lot of
recursion, you may encounter a RuntimeError exception with the message
maximum recursion limit exceeded. For example,
>>> s = 'Begin ' + 1000*'a very long string ' + 'end'
>>> re.match('Begin (\w| )*? end', s).end()
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "/usr/local/lib/python3.2/re.py", line 132, in match
return _compile(pattern, flags).match(string)
RuntimeError: maximum recursion limit exceeded
You can often restructure your regular expression to avoid recursion.
Simple uses of the *? pattern are special-cased to avoid recursion. Thus,
the above regular expression can avoid recursion by being recast as Begin [a-zA-Z0-9_ ]*?end. As a further benefit, such regular expressions will run
faster than their recursive equivalents.
In a nutshell, match() only attempts to match a pattern at the beginning
of a string where search() will match a pattern anywhere in a string.
For example:
>>> re.match("o", "dog") # No match as "o" is not the first letter of "dog".
>>> re.search("o", "dog") # Match as search() looks everywhere in the string.
<_sre.SRE_Match object at ...>
Note
The following applies only to regular expression objects like those created
with re.compile("pattern"), not the primitives re.match(pattern, string) or re.search(pattern, string).
match() has an optional second parameter that gives an index in the string
where the search is to start:
>>> pattern = re.compile("o")
>>> pattern.match("dog") # No match as "o" is not at the start of "dog."
# Equivalent to the above expression as 0 is the default starting index:
>>> pattern.match("dog", 0)
# Match as "o" is the 2nd character of "dog" (index 0 is the first):
>>> pattern.match("dog", 1)
<_sre.SRE_Match object at ...>
>>> pattern.match("dog", 2) # No match as "o" is not the 3rd character of "dog."
split() splits a string into a list delimited by the passed pattern. The
method is invaluable for converting textual data into data structures that can be
easily read and modified by Python as demonstrated in the following example that
creates a phonebook.
First, here is the input. Normally it may come from a file, here we are using
triple-quoted string syntax:
>>> input = """Ross McFluff: 834.345.1254 155 Elm Street
...
... Ronald Heathmore: 892.345.3428 436 Finley Avenue
... Frank Burger: 925.541.7625 662 South Dogwood Way
...
...
... Heather Albrecht: 548.326.4584 919 Park Place"""
The entries are separated by one or more newlines. Now we convert the string
into a list with each nonempty line having its own entry:
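Splitting on runs of newlines does this:
>>> entries = re.split('\n+', input)
>>> entries
['Ross McFluff: 834.345.1254 155 Elm Street',
 'Ronald Heathmore: 892.345.3428 436 Finley Avenue',
 'Frank Burger: 925.541.7625 662 South Dogwood Way',
 'Heather Albrecht: 548.326.4584 919 Park Place']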
Finally, split each entry into a list with first name, last name, telephone
number, and address. We use the maxsplit parameter of split()
because the address has spaces, our splitting pattern, in it:
>>> [re.split(":? ", entry, 3) for entry in entries]
[['Ross', 'McFluff', '834.345.1254', '155 Elm Street'],
['Ronald', 'Heathmore', '892.345.3428', '436 Finley Avenue'],
['Frank', 'Burger', '925.541.7625', '662 South Dogwood Way'],
['Heather', 'Albrecht', '548.326.4584', '919 Park Place']]
The :? pattern matches the colon after the last name, so that it does not
occur in the result list. With a maxsplit of 4, we could separate the
house number from the street name:
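For instance:
>>> [re.split(':? ', entry, 4) for entry in entries]
[['Ross', 'McFluff', '834.345.1254', '155', 'Elm Street'],
 ['Ronald', 'Heathmore', '892.345.3428', '436', 'Finley Avenue'],
 ['Frank', 'Burger', '925.541.7625', '662', 'South Dogwood Way'],
 ['Heather', 'Albrecht', '548.326.4584', '919', 'Park Place']]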
sub() replaces every occurrence of a pattern with a string or the
result of a function. This example demonstrates using sub() with
a function to “munge” text, or randomize the order of all the characters
in each word of a sentence except for the first and last characters:
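One way to do this (since the shuffle is random, the exact output will differ from run to run):
>>> import random
>>> def repl(m):
...     inner_word = list(m.group(2))
...     random.shuffle(inner_word)
...     return m.group(1) + ''.join(inner_word) + m.group(3)
>>> text = 'Professor Abdolmalek, please report your absences promptly.'
>>> re.sub(r'(\w)(\w+)(\w)', repl, text)
'Poefsrosr Aealmlobdk, pslaee reorpt your abnseces plmrptoy.'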
findall() matches all occurrences of a pattern, not just the first
one as search() does. For example, if one was a writer and wanted to
find all of the adverbs in some text, he or she might use findall() in
the following manner:
>>> text = "He was carefully disguised but captured quickly by police."
>>> re.findall(r"\w+ly", text)
['carefully', 'quickly']
If one wants more information about all matches of a pattern than the matched
text, finditer() is useful as it provides match objects instead of strings. Continuing with the previous example, if
one was a writer who wanted to find all of the adverbs and their positions in
some text, he or she would use finditer() in the following manner:
>>> text = "He was carefully disguised but captured quickly by police."
>>> for m in re.finditer(r"\w+ly", text):
... print('%02d-%02d: %s' % (m.start(), m.end(), m.group(0)))
07-16: carefully
40-47: quickly
Raw string notation (r"text") keeps regular expressions sane. Without it,
every backslash ('\') in a regular expression would have to be prefixed with
another one to escape it. For example, the two following lines of code are
functionally identical:
>>> re.match(r"\W(.)\1\W", " ff ")
<_sre.SRE_Match object at ...>
>>> re.match("\\W(.)\\1\\W", " ff ")
<_sre.SRE_Match object at ...>
When one wants to match a literal backslash, it must be escaped in the regular
expression. With raw string notation, this means r"\\". Without raw string
notation, one must use "\\\\", making the following lines of code
functionally identical:
>>> re.match(r"\\", r"\\")
<_sre.SRE_Match object at ...>
>>> re.match("\\\\", r"\\")
<_sre.SRE_Match object at ...>
A tokenizer or scanner
analyzes a string to categorize groups of characters. This is a useful first
step in writing a compiler or interpreter.
The text categories are specified with regular expressions. The technique is
to combine those into a single master regular expression and to loop over
successive matches:
import collections
import re

Token = collections.namedtuple('Token', ['typ', 'value', 'line', 'column'])

def tokenize(s):
    keywords = {'IF', 'THEN', 'ENDIF', 'FOR', 'NEXT', 'GOSUB', 'RETURN'}
    token_specification = [
        ('NUMBER',  r'\d+(\.\d*)?'),  # Integer or decimal number
        ('ASSIGN',  r':='),           # Assignment operator
        ('END',     r';'),            # Statement terminator
        ('ID',      r'[A-Za-z]+'),    # Identifiers
        ('OP',      r'[+*\/\-]'),     # Arithmetic operators
        ('NEWLINE', r'\n'),           # Line endings
        ('SKIP',    r'[ \t]'),        # Skip over spaces and tabs
    ]
    tok_regex = '|'.join('(?P<%s>%s)' % pair for pair in token_specification)
    get_token = re.compile(tok_regex).match
    line = 1
    pos = line_start = 0
    mo = get_token(s)
    while mo is not None:
        typ = mo.lastgroup
        if typ == 'NEWLINE':
            line_start = pos
            line += 1
        elif typ != 'SKIP':
            val = mo.group(typ)
            if typ == 'ID' and val in keywords:
                typ = val
            yield Token(typ, val, line, mo.start() - line_start)
        pos = mo.end()
        mo = get_token(s, pos)
    if pos != len(s):
        raise RuntimeError('Unexpected character %r on line %d' % (s[pos], line))

statements = '''
    IF quantity THEN
        total := total + price * quantity;
        tax := price * 0.05;
    ENDIF;
'''

for token in tokenize(statements):
    print(token)
This module performs conversions between Python values and C structs represented
as Python bytes objects. This can be used in handling binary data
stored in files or from network connections, among other sources. It uses
Format Strings as compact descriptions of the layout of the C
structs and the intended conversion to/from Python values.
Note
By default, the result of packing a given C struct includes pad bytes in
order to maintain proper alignment for the C types involved; similarly,
alignment is taken into account when unpacking. This behavior is chosen so
that the bytes of a packed struct correspond exactly to the layout in memory
of the corresponding C struct. To handle platform-independent data formats
or omit implicit pad bytes, use standard size and alignment instead of
native size and alignment: see Byte Order, Size, and Alignment for details.
struct.pack(fmt, v1, v2, ...)
Return a bytes object containing the values v1, v2, ... packed according
to the format string fmt. The arguments must match the values required by
the format exactly.
struct.pack_into(fmt, buffer, offset, v1, v2, ...)
Pack the values v1, v2, ... according to the format string fmt and
write the packed bytes into the writable buffer buffer starting at
position offset. Note that offset is a required argument.
struct.unpack(fmt, buffer)
Unpack from the buffer buffer (presumably packed by pack(fmt, ...))
according to the format string fmt. The result is a tuple even if it
contains exactly one item. The buffer must contain exactly the amount of
data required by the format (len(bytes) must equal calcsize(fmt)).
struct.unpack_from(fmt, buffer, offset=0)
Unpack from buffer starting at position offset, according to the format
string fmt. The result is a tuple even if it contains exactly one
item. buffer must contain at least the amount of data required by the
format (len(buffer[offset:]) must be at least calcsize(fmt)).
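A short session using standard sizes (a '>' prefix, explained below) so the results are platform-independent:
>>> from struct import pack, unpack, calcsize
>>> pack('>hhl', 1, 2, 3)
b'\x00\x01\x00\x02\x00\x00\x00\x03'
>>> unpack('>hhl', b'\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)
>>> calcsize('>hhl')
8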
Format strings are the mechanism used to specify the expected layout when
packing and unpacking data. They are built up from Format Characters,
which specify the type of data being packed/unpacked. In addition, there are
special characters for controlling the Byte Order, Size, and Alignment.
By default, C types are represented in the machine’s native format and byte
order, and properly aligned by skipping pad bytes if necessary (according to the
rules used by the C compiler).
Alternatively, the first character of the format string can be used to indicate
the byte order, size and alignment of the packed data, according to the
following table:
Character   Byte order               Size       Alignment
@           native                   native     native
=           native                   standard   none
<           little-endian            standard   none
>           big-endian               standard   none
!           network (= big-endian)   standard   none
If the first character is not one of these, '@' is assumed.
Native byte order is big-endian or little-endian, depending on the host
system. For example, Intel x86 and AMD64 (x86-64) are little-endian;
Motorola 68000 and PowerPC G5 are big-endian; ARM and Intel Itanium feature
switchable endianness (bi-endian). Use sys.byteorder to check the
endianness of your system.
Native size and alignment are determined using the C compiler’s
sizeof expression. This is always combined with native byte order.
Standard size depends only on the format character; see the table in
the Format Characters section.
Note the difference between '@' and '=': both use native byte order, but
the size and alignment of the latter is standardized.
The form '!' is available for those poor souls who claim they can’t remember
whether network byte order is big-endian or little-endian.
There is no way to indicate non-native byte order (force byte-swapping); use the
appropriate choice of '<' or '>'.
Notes:
Padding is only automatically added between successive structure members.
No padding is added at the beginning or the end of the encoded struct.
No padding is added when using non-native size and alignment, e.g.
with ‘<’, ‘>’, ‘=’, and ‘!’.
To align the end of a structure to the alignment requirement of a
particular type, end the format with the code for that type with a repeat
count of zero. See Examples.
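For instance (the sizes shown assume a typical 64-bit platform where a C long is 8 bytes; native results vary by platform):
>>> from struct import calcsize
>>> calcsize('@llh')     # 8 + 8 + 2, no trailing padding
18
>>> calcsize('@llh0l')   # '0l' pads the end to long alignment
24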
Format characters have the following meaning; the conversion between C and
Python values should be obvious given their types. The ‘Standard size’ column
refers to the size of the packed value in bytes when using standard size; that
is, when the format string starts with one of '<', '>', '!' or
'='. When using native size, the size of the packed value is
platform-dependent.
Format   C Type               Python type         Standard size   Notes
x        pad byte             no value
c        char                 bytes of length 1   1
b        signed char          integer             1               (1), (3)
B        unsigned char        integer             1               (3)
?        _Bool                bool                1               (1)
h        short                integer             2               (3)
H        unsigned short       integer             2               (3)
i        int                  integer             4               (3)
I        unsigned int         integer             4               (3)
l        long                 integer             4               (3)
L        unsigned long        integer             4               (3)
q        long long            integer             8               (2), (3)
Q        unsigned long long   integer             8               (2), (3)
f        float                float               4               (4)
d        double               float               8               (4)
s        char[]               bytes
p        char[]               bytes
P        void *               integer                             (5)
Notes:
(1) The '?' conversion code corresponds to the _Bool type defined by
C99. If this type is not available, it is simulated using a char. In
standard mode, it is always represented by one byte.
(2) The 'q' and 'Q' conversion codes are available in native mode only if
the platform C compiler supports C long long, or, on Windows,
__int64. They are always available in standard modes.
(3) When attempting to pack a non-integer using any of the integer conversion
codes, if the non-integer has a __index__() method then that method is
called to convert the argument to an integer before packing.
Changed in version 3.2: Use of the __index__() method for non-integers is new in 3.2.
(4) For the 'f' and 'd' conversion codes, the packed representation uses
the IEEE 754 binary32 (for 'f') or binary64 (for 'd') format,
regardless of the floating-point format used by the platform.
(5) The 'P' format character is only available for the native byte ordering
(selected as the default or with the '@' byte order character). The byte
order character '=' chooses to use little- or big-endian ordering based
on the host system. The struct module does not interpret this as native
ordering, so the 'P' format is not available.
A format character may be preceded by an integral repeat count. For example,
the format string '4h' means exactly the same as 'hhhh'.
Whitespace characters between formats are ignored; a count and its format must
not contain whitespace though.
For the 's' format character, the count is interpreted as the length of the
bytes, not a repeat count like for the other format characters; for example,
'10s' means a single 10-byte string, while '10c' means 10 characters.
If a count is not given, it defaults to 1. For packing, the string is
truncated or padded with null bytes as appropriate to make it fit. For
unpacking, the resulting bytes object always has exactly the specified number
of bytes. As a special case, '0s' means a single, empty string (while
'0c' means 0 characters).
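Packing with the 's' format, truncating and padding as described:
>>> from struct import pack
>>> pack('5s', b'hello, world')
b'hello'
>>> pack('10s', b'hi')
b'hi\x00\x00\x00\x00\x00\x00\x00\x00'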
When packing a value x using one of the integer formats ('b',
'B', 'h', 'H', 'i', 'I', 'l', 'L',
'q', 'Q'), if x is outside the valid range for that format
then struct.error is raised.
Changed in version 3.1: In 3.0, some of the integer formats wrapped out-of-range values and
raised DeprecationWarning instead of struct.error.
The 'p' format character encodes a “Pascal string”, meaning a short
variable-length string stored in a fixed number of bytes, given by the count.
The first byte stored is the length of the string, or 255, whichever is
smaller. The bytes of the string follow. If the string passed in to
pack() is too long (longer than the count minus 1), only the leading
count-1 bytes of the string are stored. If the string is shorter than
count-1, it is padded with null bytes so that exactly count bytes in all
are used. Note that for unpack(), the 'p' format character consumes
count bytes, but that the string returned can never contain more than 255
bytes.
For the '?' format character, the return value is either True or
False. When packing, the truth value of the argument object is used.
Either 0 or 1 in the native or standard bool representation will be packed, and
any non-zero value will be True when unpacking.
class struct.Struct(format)
Return a new Struct object which writes and reads binary data according to
the format string format. Creating a Struct object once and calling its
methods is more efficient than calling the struct functions with the
same format since the format string only needs to be compiled once.
Compiled Struct objects support the following methods and attributes:
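The methods mirror the module-level functions (pack(), unpack(), pack_into(), unpack_from()), plus format and size attributes. For instance:
>>> from struct import Struct
>>> record = Struct('>hhl')
>>> record.size
8
>>> record.pack(1, 2, 3)
b'\x00\x01\x00\x02\x00\x00\x00\x03'
>>> record.unpack(b'\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)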
This module provides classes and functions for comparing sequences. It
can be used for example, for comparing files, and can produce difference
information in various formats, including HTML and context and unified
diffs. For comparing directories and files, see also, the filecmp module.
This is a flexible class for comparing pairs of sequences of any type, so long
as the sequence elements are hashable. The basic algorithm predates, and is a
little fancier than, an algorithm published in the late 1980’s by Ratcliff and
Obershelp under the hyperbolic name “gestalt pattern matching.” The idea is to
find the longest contiguous matching subsequence that contains no “junk”
elements (the Ratcliff and Obershelp algorithm doesn’t address junk). The same
idea is then applied recursively to the pieces of the sequences to the left and
to the right of the matching subsequence. This does not yield minimal edit
sequences, but does tend to yield matches that “look right” to people.
Timing: The basic Ratcliff-Obershelp algorithm is cubic time in the worst
case and quadratic time in the expected case. SequenceMatcher is
quadratic time for the worst case and has expected-case behavior dependent in a
complicated way on how many elements the sequences have in common; best case
time is linear.
Automatic junk heuristic: SequenceMatcher supports a heuristic that
automatically treats certain sequence items as junk. The heuristic counts how many
times each individual item appears in the sequence. If an item’s duplicates (after
the first one) account for more than 1% of the sequence and the sequence is at least
200 items long, this item is marked as “popular” and is treated as junk for
the purpose of sequence matching. This heuristic can be turned off by setting
the autojunk argument to False when creating the SequenceMatcher.
This is a class for comparing sequences of lines of text, and producing
human-readable differences or deltas. Differ uses SequenceMatcher
both to compare sequences of lines, and to compare sequences of characters
within similar (near-matching) lines.
Each line of a Differ delta begins with a two-letter code:
Code   Meaning
'-'    line unique to sequence 1
'+'    line unique to sequence 2
' '    line common to both sequences
'?'    line not present in either input sequence
Lines beginning with '?' attempt to guide the eye to intraline differences,
and were not present in either input sequence. These lines can be confusing if
the sequences contain tab characters.
class difflib.HtmlDiff(tabsize=8, wrapcolumn=None, linejunk=None, charjunk=IS_CHARACTER_JUNK)
This class can be used to create an HTML table (or a complete HTML file
containing the table) showing a side by side, line by line comparison of text
with inter-line and intra-line change highlights. The table can be generated in
either full or contextual difference mode.
tabsize is an optional keyword argument to specify tab stop spacing and
defaults to 8.
wrapcolumn is an optional keyword to specify column number where lines are
broken and wrapped, defaults to None where lines are not wrapped.
linejunk and charjunk are optional keyword arguments passed into ndiff()
(used by HtmlDiff to generate the side by side HTML differences). See
ndiff() documentation for argument default values and descriptions.
make_file(fromlines, tolines, fromdesc='', todesc='', context=False, numlines=5)
Compares fromlines and tolines (lists of strings) and returns a string which
is a complete HTML file containing a table showing line by line differences with
inter-line and intra-line changes highlighted.
fromdesc and todesc are optional keyword arguments to specify from/to file
column header strings (both default to an empty string).
context and numlines are both optional keyword arguments. Set context to
True when contextual differences are to be shown, else the default is
False to show the full files. numlines defaults to 5. When context
is True, numlines controls the number of context lines which surround the
difference highlights. When context is False, numlines controls the
number of lines which are shown before a difference highlight when using the
“next” hyperlinks (setting to zero would cause the “next” hyperlinks to place
the next difference highlight at the top of the browser without any leading
context).
make_table(fromlines, tolines, fromdesc='', todesc='', context=False, numlines=5)
Compares fromlines and tolines (lists of strings) and returns a string which
is a complete HTML table showing line by line differences with inter-line and
intra-line changes highlighted.
The arguments for this method are the same as those for the make_file()
method.
Tools/scripts/diff.py is a command-line front-end to this class and
contains a good example of its use.
difflib.context_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n')
Compare a and b (lists of strings); return a delta (a generator
generating the delta lines) in context diff format.
Context diffs are a compact way of showing just the lines that have changed plus
a few lines of context. The changes are shown in a before/after style. The
number of context lines is set by n which defaults to three.
By default, the diff control lines (those with *** or ---) are created
with a trailing newline. This is helpful so that inputs created from
file.readlines() result in diffs that are suitable for use with
file.writelines() since both the inputs and outputs have trailing
newlines.
For inputs that do not have trailing newlines, set the lineterm argument to
"" so that the output will be uniformly newline free.
The context diff format normally has a header for filenames and modification
times. Any or all of these may be specified using strings for fromfile,
tofile, fromfiledate, and tofiledate. The modification times are normally
expressed in the ISO 8601 format. If not specified, the
strings default to blanks.
difflib.get_close_matches(word, possibilities, n=3, cutoff=0.6)
Return a list of the best “good enough” matches. word is a sequence for which
close matches are desired (typically a string), and possibilities is a list of
sequences against which to match word (typically a list of strings).
Optional argument n (default 3) is the maximum number of close matches to
return; n must be greater than 0.
Optional argument cutoff (default 0.6) is a float in the range [0, 1].
Possibilities that don’t score at least that similar to word are ignored.
The best (no more than n) matches among the possibilities are returned in a
list, sorted by similarity score, most similar first.
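For instance:
>>> from difflib import get_close_matches
>>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
['apple', 'ape']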
difflib.ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK)
Compare a and b (lists of strings); return a Differ-style
delta (a generator generating the delta lines).
Optional keyword parameters linejunk and charjunk are for filter functions
(or None):
linejunk: A function that accepts a single string argument, and returns
true if the string is junk, or false if not. The default is None. There
is also a module-level function IS_LINE_JUNK(), which filters out lines
without visible characters, except for at most one pound character ('#')
– however the underlying SequenceMatcher class does a dynamic
analysis of which lines are so frequent as to constitute noise, and this
usually works better than using this function.
charjunk: A function that accepts a character (a string of length 1), and
returns true if the character is junk, or false if not. The default is the
module-level function IS_CHARACTER_JUNK(), which filters out whitespace
characters (a blank or tab; note: bad idea to include newline in this!).
Tools/scripts/ndiff.py is a command-line front-end to this function.
>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
... 'ore\ntree\nemu\n'.splitlines(1))
>>> print(''.join(diff), end="")
- one
? ^
+ ore
? ^
- two
- three
? -
+ tree
+ emu
difflib.restore(sequence, which)
Return one of the two sequences that generated a delta.
Given a sequence produced by Differ.compare() or ndiff(), extract
lines originating from file 1 or 2 (parameter which), stripping off line
prefixes.
Example:
>>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
... 'ore\ntree\nemu\n'.splitlines(1))
>>> diff = list(diff) # materialize the generated delta into a list
>>> print(''.join(restore(diff, 1)), end="")
one
two
three
>>> print(''.join(restore(diff, 2)), end="")
ore
tree
emu
difflib.unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n')
Compare a and b (lists of strings); return a delta (a generator
generating the delta lines) in unified diff format.
Unified diffs are a compact way of showing just the lines that have changed plus
a few lines of context. The changes are shown in an inline style (instead of
separate before/after blocks). The number of context lines is set by n which
defaults to three.
By default, the diff control lines (those with ---, +++, or @@) are
created with a trailing newline. This is helpful so that inputs created from
file.readlines() result in diffs that are suitable for use with
file.writelines() since both the inputs and outputs have trailing
newlines.
For inputs that do not have trailing newlines, set the lineterm argument to
"" so that the output will be uniformly newline free.
The context diff format normally has a header for filenames and modification
times. Any or all of these may be specified using strings for fromfile,
tofile, fromfiledate, and tofiledate. The modification times are normally
expressed in the ISO 8601 format. If not specified, the
strings default to blanks.
difflib.IS_LINE_JUNK(line)
Return true for ignorable lines. The line line is ignorable if line is
blank or contains a single '#', otherwise it is not ignorable. Used as a
default for parameter linejunk in ndiff() in older versions.
difflib.IS_CHARACTER_JUNK(ch)
Return true for ignorable characters. The character ch is ignorable if ch
is a space or tab, otherwise it is not ignorable. Used as a default for
parameter charjunk in ndiff().
class difflib.SequenceMatcher(isjunk=None, a='', b='', autojunk=True)
Optional argument isjunk must be None (the default) or a one-argument
function that takes a sequence element and returns true if and only if the
element is “junk” and should be ignored. Passing None for isjunk is
equivalent to passing lambda x: 0; in other words, no elements are ignored.
For example, pass:
lambda x: x in " \t"
if you’re comparing lines as sequences of characters, and don’t want to synch up
on blanks or hard tabs.
The optional arguments a and b are sequences to be compared; both default to
empty strings. The elements of both sequences must be hashable.
The optional argument autojunk can be used to disable the automatic junk
heuristic.
New in version 3.2: The autojunk parameter.
SequenceMatcher objects get three data attributes: bjunk is the
set of elements of b for which isjunk is True; bpopular is the set of
non-junk elements considered popular by the heuristic (if it is not
disabled); b2j is a dict mapping the remaining elements of b to a list
of positions where they occur. All three are reset whenever b is reset
with set_seqs() or set_seq2().
New in version 3.2: The bjunk and bpopular attributes.
SequenceMatcher computes and caches detailed information about the
second sequence, so if you want to compare one sequence against many
sequences, use set_seq2() to set the commonly used sequence once and
call set_seq1() repeatedly, once for each of the other sequences.
find_longest_match(alo, ahi, blo, bhi)
Find longest matching block in a[alo:ahi] and b[blo:bhi].
If isjunk was omitted or None, find_longest_match() returns
(i,j,k) such that a[i:i+k] is equal to b[j:j+k], where alo<=i<=i+k<=ahi and blo<=j<=j+k<=bhi. For all (i',j',k') meeting those conditions, the additional conditions k>=k', i<=i', and if i==i', j<=j' are also met. In other words, of
all maximal matching blocks, return one that starts earliest in a, and
of all those maximal matching blocks that start earliest in a, return
the one that starts earliest in b.
If isjunk was provided, first the longest matching block is determined
as above, but with the additional restriction that no junk element appears
in the block. Then that block is extended as far as possible by matching
(only) junk elements on both sides. So the resulting block never matches
on junk except as identical junk happens to be adjacent to an interesting
match.
Here’s the same example as before, but considering blanks to be junk. That
prevents 'abcd' from matching the 'abcd' at the tail end of the
second sequence directly. Instead only the 'abcd' can match, and
matches the leftmost 'abcd' in the second sequence:
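Both variants side by side (first without junk, then treating blanks as junk):
>>> from difflib import SequenceMatcher
>>> s = SequenceMatcher(None, ' abcd', 'abcd abcd')
>>> s.find_longest_match(0, 5, 0, 9)
Match(a=0, b=4, size=5)
>>> s = SequenceMatcher(lambda x: x == ' ', ' abcd', 'abcd abcd')
>>> s.find_longest_match(0, 5, 0, 9)
Match(a=1, b=0, size=4)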
get_matching_blocks()
Return list of triples describing matching subsequences. Each triple is of
the form (i, j, n), and means that a[i:i+n] == b[j:j+n]. The
triples are monotonically increasing in i and j.
The last triple is a dummy, and has the value (len(a), len(b), 0). It
is the only triple with n == 0. If (i, j, n) and (i', j', n')
are adjacent triples in the list, and the second is not the last triple in
the list, then i+n != i' or j+n != j'; in other words, adjacent
triples always describe non-adjacent equal blocks.
get_opcodes()
Return list of 5-tuples describing how to turn a into b. Each tuple is
of the form (tag, i1, i2, j1, j2). The first tuple has
i1 == j1 == 0, and remaining tuples have i1 equal to the i2 from the
preceding tuple, and, likewise, j1 equal to the previous j2.
The tag values are strings, with these meanings:
Value       Meaning
'replace'   a[i1:i2] should be replaced by b[j1:j2].
'delete'    a[i1:i2] should be deleted. Note that j1 == j2 in this case.
'insert'    b[j1:j2] should be inserted at a[i1:i1]. Note that i1 == i2 in this case.
'equal'     a[i1:i2] == b[j1:j2] (the sub-sequences are equal).
For example:
>>> a = "qabxcd"
>>> b = "abycdf"
>>> s = SequenceMatcher(None, a, b)
>>> for tag, i1, i2, j1, j2 in s.get_opcodes():
...     print('{:7} a[{}:{}] --> b[{}:{}] {!r:>8} --> {!r}'.format(
...         tag, i1, i2, j1, j2, a[i1:i2], b[j1:j2]))
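Run against the strings above, this loop is expected to print:
delete  a[0:1] --> b[0:0]      'q' --> ''
equal   a[1:3] --> b[1:3]     'ab' --> 'ab'
replace a[3:4] --> b[3:4]      'x' --> 'y'
equal   a[4:6] --> b[4:6]     'cd' --> 'cd'
insert  a[6:6] --> b[6:7]       '' --> 'f'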
get_grouped_opcodes(n=3)
Return a generator of groups with up to n lines of context.
Starting with the groups returned by get_opcodes(), this method
splits out smaller change clusters and eliminates intervening ranges which
have no changes.
The groups are returned in the same format as get_opcodes().
ratio()
Return a measure of the sequences’ similarity as a float in the range [0,
1].
Where T is the total number of elements in both sequences, and M is the
number of matches, this is 2.0*M / T. Note that this is 1.0 if the
sequences are identical, and 0.0 if they have nothing in common.
The three methods that return the ratio of matching to total characters can give
different results due to differing levels of approximation, although
quick_ratio() and real_quick_ratio() are always at least as large as
ratio():
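For instance:
>>> s = SequenceMatcher(None, 'abcd', 'bcde')
>>> s.ratio()
0.75
>>> s.quick_ratio()
0.75
>>> s.real_quick_ratio()
1.0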
This example compares two strings, considering blanks to be “junk”:
>>> s = SequenceMatcher(lambda x: x == " ",
... "private Thread currentThread;",
... "private volatile Thread currentThread;")
ratio() returns a float in [0, 1], measuring the similarity of the
sequences. As a rule of thumb, a ratio() value over 0.6 means the
sequences are close matches:
>>> print(round(s.ratio(), 3))
0.866
If you’re only interested in where the sequences match,
get_matching_blocks() is handy:
>>> for block in s.get_matching_blocks():
... print("a[%d] and b[%d] match for %d elements" % block)
a[0] and b[0] match for 8 elements
a[8] and b[17] match for 21 elements
a[29] and b[38] match for 0 elements
Note that the last tuple returned by get_matching_blocks() is always a
dummy, (len(a),len(b),0), and this is the only case in which the last
tuple element (number of elements matched) is 0.
If you want to know how to change the first sequence into the second, use
get_opcodes():
>>> for opcode in s.get_opcodes():
... print("%6s a[%d:%d] b[%d:%d]" % opcode)
equal a[0:8] b[0:8]
insert a[8:8] b[8:17]
equal a[8:29] b[17:38]
Note that Differ-generated deltas make no claim to be minimal
diffs. To the contrary, minimal diffs are often counter-intuitive, because they
synch up anywhere possible, sometimes on accidental matches 100 pages apart.
Restricting synch points to contiguous matches preserves some notion of
locality, at the occasional cost of producing a longer diff.
class difflib.Differ(linejunk=None, charjunk=None)
Optional keyword parameters linejunk and charjunk are for filter functions
(or None):
linejunk: A function that accepts a single string argument, and returns true
if the string is junk. The default is None, meaning that no line is
considered junk.
charjunk: A function that accepts a single character argument (a string of
length 1), and returns true if the character is junk. The default is None,
meaning that no character is considered junk.
Differ objects are used (deltas generated) via a single method, compare():
Compare two sequences of lines, and generate the delta (a sequence of lines).
Each sequence must contain individual single-line strings ending with newlines.
Such sequences can be obtained from the readlines() method of file-like
objects. The delta generated also consists of newline-terminated strings, ready
to be printed as-is via the writelines() method of a file-like object.
This example compares two texts. First we set up the texts, sequences of
individual single-line strings ending with newlines (such sequences can also be
obtained from the readlines() method of file-like objects):
>>> text1 = ''' 1. Beautiful is better than ugly.
... 2. Explicit is better than implicit.
... 3. Simple is better than complex.
... 4. Complex is better than complicated.
... '''.splitlines(1)
>>> len(text1)
4
>>> text1[0][-1]
'\n'
>>> text2 = ''' 1. Beautiful is better than ugly.
... 3. Simple is better than complex.
... 4. Complicated is better than complex.
... 5. Flat is better than nested.
... '''.splitlines(1)
Next we instantiate a Differ object:
>>> d = Differ()
Note that when instantiating a Differ object we may pass functions to
filter out line and character “junk.” See the Differ() constructor for
details.
Finally, we compare the two:
>>> result = list(d.compare(text1, text2))
result is a list of strings, so let’s pretty-print it:
>>> from pprint import pprint
>>> pprint(result)
[' 1. Beautiful is better than ugly.\n',
'- 2. Explicit is better than implicit.\n',
'- 3. Simple is better than complex.\n',
'+ 3. Simple is better than complex.\n',
'? ++\n',
'- 4. Complex is better than complicated.\n',
'? ^ ---- ^\n',
'+ 4. Complicated is better than complex.\n',
'? ++++ ^ ^\n',
'+ 5. Flat is better than nested.\n']
As a single multi-line string it looks like this:
>>> import sys
>>> sys.stdout.writelines(result)
1. Beautiful is better than ugly.
- 2. Explicit is better than implicit.
- 3. Simple is better than complex.
+ 3. Simple is better than complex.
? ++
- 4. Complex is better than complicated.
? ^ ---- ^
+ 4. Complicated is better than complex.
? ++++ ^ ^
+ 5. Flat is better than nested.
This example shows how to use difflib to create a diff-like utility.
It is also contained in the Python source distribution, as
Tools/scripts/diff.py.
""" Command line interface to difflib.py providing diffs in four formats:
* ndiff: lists every line and highlights interline changes.
* context: highlights clusters of changes in a before/after format.
* unified: highlights clusters of changes in an inline format.
* html: generates side by side comparison with change highlights.
"""
import sys, os, time, difflib, optparse
def main():
    # Configure the option parser
    usage = "usage: %prog [options] fromfile tofile"
    parser = optparse.OptionParser(usage)
    parser.add_option("-c", action="store_true", default=False,
                      help='Produce a context format diff (default)')
    parser.add_option("-u", action="store_true", default=False,
                      help='Produce a unified format diff')
    hlp = 'Produce HTML side by side diff (can use -c and -l in conjunction)'
    parser.add_option("-m", action="store_true", default=False, help=hlp)
    parser.add_option("-n", action="store_true", default=False,
                      help='Produce a ndiff format diff')
    parser.add_option("-l", "--lines", type="int", default=3,
                      help='Set number of context lines (default 3)')
    (options, args) = parser.parse_args()

    if len(args) == 0:
        parser.print_help()
        sys.exit(1)
    if len(args) != 2:
        parser.error("need to specify both a fromfile and tofile")

    n = options.lines
    fromfile, tofile = args  # as specified in the usage string

    # we're passing these as arguments to the diff function
    fromdate = time.ctime(os.stat(fromfile).st_mtime)
    todate = time.ctime(os.stat(tofile).st_mtime)
    fromlines = open(fromfile, 'U').readlines()
    tolines = open(tofile, 'U').readlines()

    if options.u:
        diff = difflib.unified_diff(fromlines, tolines, fromfile, tofile,
                                    fromdate, todate, n=n)
    elif options.n:
        diff = difflib.ndiff(fromlines, tolines)
    elif options.m:
        diff = difflib.HtmlDiff().make_file(fromlines, tolines, fromfile,
                                            tofile, context=options.c,
                                            numlines=n)
    else:
        diff = difflib.context_diff(fromlines, tolines, fromfile, tofile,
                                    fromdate, todate, n=n)

    # we're using writelines because diff is a generator
    sys.stdout.writelines(diff)

if __name__ == '__main__':
    main()
The textwrap module provides two convenience functions, wrap() and
fill(), as well as TextWrapper, the class that does all the work,
and a utility function dedent(). If you’re just wrapping or filling one
or two text strings, the convenience functions should be good enough;
otherwise, you should use an instance of TextWrapper for efficiency.
Wraps the single paragraph in text, and returns a single string containing the
wrapped paragraph. fill() is shorthand for
"\n".join(wrap(text, ...))
In particular, fill() accepts exactly the same keyword arguments as
wrap().
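For instance (a small doctest; the sample sentence is arbitrary):

>>> import textwrap
>>> textwrap.fill("The quick brown fox jumped over the lazy dog.", width=20)
'The quick brown fox\njumped over the lazy\ndog.'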
Both wrap() and fill() work by creating a TextWrapper
instance and calling a single method on it. That instance is not reused, so for
applications that wrap/fill many text strings, it will be more efficient for you
to create your own TextWrapper object.
Text is preferably wrapped on whitespaces and right after the hyphens in
hyphenated words; only then will long words be broken if necessary, unless
TextWrapper.break_long_words is set to false.
An additional utility function, dedent(), is provided to remove
indentation from strings that have unwanted whitespace to the left of the text.
Remove any common leading whitespace from every line in text.
This can be used to make triple-quoted strings line up with the left edge of the
display, while still presenting them in the source code in indented form.
Note that tabs and spaces are both treated as whitespace, but they are not
equal: the lines "hello" and "\thello" are considered to have no
common leading whitespace.
For example:
from textwrap import dedent

def test():
    # end first line with \ to avoid the empty line!
    s = '''\
    hello
      world
    '''
    print(repr(s))          # prints '    hello\n      world\n    '
    print(repr(dedent(s)))  # prints 'hello\n  world\n'
You can re-use the same TextWrapper object many times, and you can
change any of its options through direct assignment to instance attributes
between uses.
The TextWrapper instance attributes (and keyword arguments to the
constructor) are as follows:
width (default: 70) The maximum length of wrapped lines. As long as there
are no individual words in the input text longer than width,
TextWrapper guarantees that no output line will be longer than
width characters.
replace_whitespace (default: True) If true, each whitespace character (as
defined by string.whitespace) remaining after tab expansion will be replaced
by a single space.
Note
If expand_tabs is false and replace_whitespace is true,
each tab character will be replaced by a single space, which is not
the same as tab expansion.
Note
If replace_whitespace is false, newlines may appear in the
middle of a line and cause strange output. For this reason, text should
be split into paragraphs (using str.splitlines() or similar)
which are wrapped separately.
drop_whitespace (default: True) If true, whitespace that, after wrapping,
happens to end up at the beginning or end of a line is dropped (leading
whitespace in the first line is always preserved, though).
fix_sentence_endings (default: False) If true, TextWrapper attempts
to detect sentence endings and ensure that sentences are always separated by
exactly two spaces. This is generally desired for text in a monospaced font.
However, the sentence detection algorithm is imperfect: it assumes that a
sentence ending consists of a lowercase letter followed by one of '.',
'!', or '?', possibly followed by one of '"' or "'",
followed by a space. One problem with this algorithm is that it is
unable to detect the difference between “Dr.” in “[...] Dr. Frankenstein's
monster [...]” and “Spot.” in “[...] See Spot. See Spot run [...]”.
Since the sentence detection algorithm relies on string.lowercase for
the definition of “lowercase letter,” and a convention of using two spaces
after a period to separate sentences on the same line, it is specific to
English-language texts.
break_long_words (default: True) If true, then words longer than
width will be broken in order to ensure that no lines are longer than
width. If it is false, long words will not be broken, and some lines
may be longer than width. (Long words will be put on a line by
themselves, in order to minimize the amount by which width is exceeded.)
break_on_hyphens (default: True) If true, wrapping will occur preferably
on whitespaces and right after hyphens in compound words, as it is customary
in English. If false, only whitespaces will be considered as potentially
good places for line breaks, but you need to set break_long_words to
false if you want truly insecable words. Default behaviour in previous
versions was to always allow breaking hyphenated words.
TextWrapper also provides two public methods, analogous to the
module-level convenience functions:
wrap(text) Wraps the single paragraph in text (a string) so every line
is at most width characters long. All wrapping options are taken
from instance attributes of the TextWrapper instance. Returns a list
of output lines, without final newlines.
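A minimal sketch of such reuse (sample text and widths are arbitrary):

>>> import textwrap
>>> w = textwrap.TextWrapper(width=12)
>>> w.wrap("Hello there, world")
['Hello there,', 'world']
>>> w.width = 11                    # retune the instance between uses
>>> w.fill("Hello there, world")
'Hello\nthere,\nworld'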
This module defines base classes for standard Python codecs (encoders and
decoders) and provides access to the internal Python codec registry which
manages the codec and error handling lookup process.
Register a codec search function. Search functions are expected to take one
argument, the encoding name in all lower case letters, and return a
CodecInfo object having the following attributes:
name The name of the encoding;
encode The stateless encoding function;
decode The stateless decoding function;
incrementalencoder An incremental encoder class or factory function;
incrementaldecoder An incremental decoder class or factory function;
streamwriter A stream writer class or factory function;
streamreader A stream reader class or factory function.
The various functions or classes take the following arguments:
encode and decode: These must be functions or methods which have the same
interface as the encode()/decode() methods of Codec instances (see
Codec Interface). The functions/methods are expected to work in a stateless
mode.
incrementalencoder and incrementaldecoder: These have to be factory
functions providing the following interface:
factory(errors='strict')
The factory functions must return objects providing the interfaces defined by
the base classes IncrementalEncoder and IncrementalDecoder,
respectively. Incremental codecs can maintain state.
streamreader and streamwriter: These have to be factory functions providing
the following interface:
factory(stream, errors='strict')
The factory functions must return objects providing the interfaces defined by
the base classes StreamWriter and StreamReader, respectively.
Stream codecs can maintain state.
Possible values for errors are
'strict': raise an exception in case of an encoding error
'replace': replace malformed data with a suitable replacement marker,
such as '?' or '\ufffd'
'ignore': ignore malformed data and continue without further notice
'xmlcharrefreplace': replace with the appropriate XML character
reference (for encoding only)
'backslashreplace': replace with backslashed escape sequences (for
encoding only)
'surrogateescape': replace with surrogate U+DCxx, see PEP 383
as well as any other error handling name defined via register_error().
In case a search function cannot find a given encoding, it should return
None.
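As a sketch, a search function might simply map a private alias (the name
'mylatin' here is made up) onto an existing codec’s CodecInfo:

import codecs

def search(encoding_name):
    # the registry passes the requested name in lower case;
    # returning None means "this search function does not know it"
    if encoding_name == 'mylatin':
        return codecs.lookup('iso8859-15')   # reuse an existing CodecInfo
    return None

codecs.register(search)
print('4\u20ac'.encode('mylatin'))           # b'4\xa4' (Latin-9 has the euro sign)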
Looks up the codec info in the Python codec registry and returns a
CodecInfo object as defined above.
Encodings are first looked up in the registry’s cache. If not found, the list of
registered search functions is scanned. If no CodecInfo object is
found, a LookupError is raised. Otherwise, the CodecInfo object
is stored in the cache and returned to the caller.
To simplify access to the various codecs, the module provides these additional
functions which use lookup() for the codec lookup:
Register the error handling function error_handler under the name name.
error_handler will be called during encoding and decoding in case of an error,
when name is specified as the errors parameter.
For encoding error_handler will be called with a UnicodeEncodeError
instance, which contains information about the location of the error. The error
handler must either raise this or a different exception or return a tuple with a
replacement for the unencodable part of the input and a position where encoding
should continue. The encoder will encode the replacement and continue encoding
the original input at the specified position. Negative position values will be
treated as being relative to the end of the input string. If the resulting
position is out of bound an IndexError will be raised.
Decoding and translating works similar, except UnicodeDecodeError or
UnicodeTranslateError will be passed to the handler and that the
replacement from the error handler will be put into the output directly.
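A sketch of a custom encoding handler (the handler name 'question' is made
up for this example):

import codecs

def question_handler(exc):
    # replace each unencodable character with one '?', resume at exc.end
    if isinstance(exc, UnicodeEncodeError):
        return ('?' * (exc.end - exc.start), exc.end)
    raise exc

codecs.register_error('question', question_handler)
print('sp\u00e4m'.encode('ascii', 'question'))   # b'sp?m'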
Implements the replace error handling: malformed data is replaced with a
suitable replacement character such as '?' in bytestrings and
'\ufffd' in Unicode strings.
Open an encoded file using the given mode and return a wrapped version
providing transparent encoding/decoding. The default file mode is 'r'
meaning to open the file in read mode.
Note
The wrapped version’s methods will accept and return strings only. Bytes
arguments will be rejected.
Note
Files are always opened in binary mode, even if no binary mode was
specified. This is done to avoid data loss due to encodings using 8-bit
values. This means that no automatic conversion of b'\n' is done
on reading and writing.
encoding specifies the encoding which is to be used for the file.
errors may be given to define the error handling. It defaults to 'strict'
which causes a ValueError to be raised in case an encoding error occurs.
buffering has the same meaning as for the built-in open() function. It
defaults to line buffered.
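A short sketch (the file name is arbitrary):

import codecs

f = codecs.open('example.txt', 'w', encoding='utf-8')
f.write('Gr\u00fc\u00dfe\n')        # strings go in, encoded bytes reach the disk
f.close()

f = codecs.open('example.txt', encoding='utf-8')
print(f.read())                     # and come back out decoded
f.close()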
Return a wrapped version of file which provides transparent encoding
translation.
Bytes written to the wrapped file are interpreted according to the given
data_encoding and then written to the original file as bytes using the
file_encoding.
If file_encoding is not given, it defaults to data_encoding.
errors may be given to define the error handling. It defaults to
'strict', which causes ValueError to be raised in case an encoding
error occurs.
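For example, with an in-memory byte stream (a sketch):

import codecs, io

backend = io.BytesIO()
f = codecs.EncodedFile(backend, data_encoding='latin-1',
                       file_encoding='utf-8')
f.write(b'Gr\xfc\xdfe')             # Latin-1 bytes in ...
print(backend.getvalue())           # ... UTF-8 bytes out: b'Gr\xc3\xbc\xc3\x9fe'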
Uses an incremental encoder to iteratively encode the input provided by
iterator. This function is a generator. errors (as well as any
other keyword argument) is passed through to the incremental encoder.
Uses an incremental decoder to iteratively decode the input provided by
iterator. This function is a generator. errors (as well as any
other keyword argument) is passed through to the incremental decoder.
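For example, round-tripping a few chunks:

>>> import codecs
>>> encoded = list(codecs.iterencode(['sp', 'am'], 'utf-8'))
>>> encoded
[b'sp', b'am']
>>> ''.join(codecs.iterdecode(encoded, 'utf-8'))
'spam'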
The module also provides the following constants which are useful for reading
and writing to platform dependent files:
These constants define various encodings of the Unicode byte order mark (BOM)
used in UTF-16 and UTF-32 data streams to indicate the byte order used in the
stream or file and in UTF-8 as a Unicode signature. BOM_UTF16 is either
BOM_UTF16_BE or BOM_UTF16_LE depending on the platform’s
native byte order, BOM is an alias for BOM_UTF16,
BOM_LE for BOM_UTF16_LE and BOM_BE for
BOM_UTF16_BE. The others represent the BOM in UTF-8 and UTF-32
encodings.
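A quick look at a few of these constants:

>>> import codecs
>>> codecs.BOM_UTF8
b'\xef\xbb\xbf'
>>> codecs.BOM_UTF16_LE
b'\xff\xfe'
>>> codecs.BOM == codecs.BOM_UTF16
True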
The codecs module defines a set of base classes which define the
interface and can also be used to easily write your own codecs for use in
Python.
Each codec has to define four interfaces to make it usable as a codec in Python:
stateless encoder, stateless decoder, stream reader and stream writer. The
stream reader and writers typically reuse the stateless encoder/decoder to
implement the file protocols.
The Codec class defines the interface for stateless encoders/decoders.
To simplify and standardize error handling, the encode() and
decode() methods may implement different error handling schemes by
providing the errors string argument. The following string values are defined
and implemented by all standard Python codecs:
'strict': Raise UnicodeError (or a subclass); this is the default.
'ignore': Ignore the character and continue with the next.
'replace': Replace with a suitable replacement character; Python will use
the official U+FFFD REPLACEMENT CHARACTER for the built-in Unicode codecs
on decoding and '?' on encoding.
'xmlcharrefreplace': Replace with the appropriate XML character reference
(only for encoding).
'backslashreplace': Replace with backslashed escape sequences (only for
encoding).
'surrogateescape': Replace byte with surrogate U+DCxx, as defined in
PEP 383.
In addition, the following error handler is specific to a single codec:

'surrogatepass' (utf-8 codec only): Allow encoding and decoding of
surrogate codes in UTF-8.
New in version 3.1: The 'surrogateescape' and 'surrogatepass' error handlers.
Encodes the object input and returns a tuple (output object, length consumed).
Encoding converts a string object to a bytes object using a particular
character set encoding (e.g., cp1252 or iso-8859-1).
errors defines the error handling to apply. It defaults to 'strict'
handling.
The method may not store state in the Codec instance. Use
StreamWriter for codecs which have to keep state in order to make
encoding/decoding efficient.
The encoder must be able to handle zero length input and return an empty object
of the output object type in this situation.
Decodes the object input and returns a tuple (output object, length
consumed). Decoding converts a bytes object encoded using a particular
character set encoding to a string object.
input must be a bytes object or one which provides the read-only character
buffer interface – for example, buffer objects and memory mapped files.
errors defines the error handling to apply. It defaults to 'strict'
handling.
The method may not store state in the Codec instance. Use
StreamReader for codecs which have to keep state in order to make
encoding/decoding efficient.
The decoder must be able to handle zero length input and return an empty object
of the output object type in this situation.
The IncrementalEncoder and IncrementalDecoder classes provide
the basic interface for incremental encoding and decoding. Encoding/decoding the
input isn’t done with one call to the stateless encoder/decoder function, but
with multiple calls to the encode()/decode() method of the
incremental encoder/decoder. The incremental encoder/decoder keeps track of the
encoding/decoding process during method calls.
The joined output of calls to the encode()/decode() method is the
same as if all the single inputs were joined into one, and this input was
encoded/decoded with the stateless encoder/decoder.
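A sketch of why statefulness matters: a multi-byte character may arrive
split across feeds, and the incremental decoder buffers it until complete:

>>> import codecs
>>> dec = codecs.getincrementaldecoder('utf-8')()
>>> dec.decode(b'\xe2\x82')       # first two bytes of '€': buffered
''
>>> dec.decode(b'\xac')           # the final byte completes the character
'€'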
The IncrementalEncoder class is used for encoding an input in multiple
steps. It defines the following methods which every incremental encoder must
define in order to be compatible with the Python codec registry.
All incremental encoders must provide this constructor interface. They are free
to add additional keyword arguments, but only the ones defined here are used by
the Python codec registry.
The IncrementalEncoder may implement different error handling schemes
by providing the errors keyword argument. These parameters are predefined:
'strict' Raise ValueError (or a subclass); this is the default.
'ignore' Ignore the character and continue with the next.
'replace' Replace with a suitable replacement character
'xmlcharrefreplace' Replace with the appropriate XML character reference
'backslashreplace' Replace with backslashed escape sequences.
The errors argument will be assigned to an attribute of the same name.
Assigning to this attribute makes it possible to switch between different error
handling strategies during the lifetime of the IncrementalEncoder
object.
The set of allowed values for the errors argument can be extended with
register_error().
Encodes object (taking the current state of the encoder into account)
and returns the resulting encoded object. If this is the last call to
encode(), final must be true (the default is false).
Return the current state of the encoder which must be an integer. The
implementation should make sure that 0 is the most common state. (States
that are more complicated than integers can be converted into an integer by
marshaling/pickling the state and encoding the bytes of the resulting string
into an integer).
The IncrementalDecoder class is used for decoding an input in multiple
steps. It defines the following methods which every incremental decoder must
define in order to be compatible with the Python codec registry.
All incremental decoders must provide this constructor interface. They are free
to add additional keyword arguments, but only the ones defined here are used by
the Python codec registry.
The IncrementalDecoder may implement different error handling schemes
by providing the errors keyword argument. These parameters are predefined:
'strict' Raise ValueError (or a subclass); this is the default.
'ignore' Ignore the character and continue with the next.
'replace' Replace with a suitable replacement character.
The errors argument will be assigned to an attribute of the same name.
Assigning to this attribute makes it possible to switch between different error
handling strategies during the lifetime of the IncrementalDecoder
object.
The set of allowed values for the errors argument can be extended with
register_error().
Decodes object (taking the current state of the decoder into account)
and returns the resulting decoded object. If this is the last call to
decode(), final must be true (the default is false). If final is
true the decoder must decode the input completely and must flush all
buffers. If this isn’t possible (e.g. because of incomplete byte sequences
at the end of the input) it must initiate error handling just like in the
stateless case (which might raise an exception).
Return the current state of the decoder. This must be a tuple with two
items, the first must be the buffer containing the still undecoded
input. The second must be an integer and can be additional state
info. (The implementation should make sure that 0 is the most common
additional state info.) If this additional state info is 0 it must be
possible to set the decoder to the state which has no input buffered and
0 as the additional state info, so that feeding the previously
buffered input to the decoder returns it to the previous state without
producing any output. (Additional state info that is more complicated than
integers can be converted into an integer by marshaling/pickling the info
and encoding the bytes of the resulting string into an integer.)
Set the state of the decoder to state. state must be a decoder state
returned by getstate().
The StreamWriter and StreamReader classes provide generic
working interfaces which can be used to implement new encoding submodules very
easily. See encodings.utf_8 for an example of how this is done.
The StreamWriter class is a subclass of Codec and defines the
following methods which every stream writer must define in order to be
compatible with the Python codec registry.
All stream writers must provide this constructor interface. They are free to add
additional keyword arguments, but only the ones defined here are used by the
Python codec registry.
stream must be a file-like object open for writing binary data.
The StreamWriter may implement different error handling schemes by
providing the errors keyword argument. These parameters are predefined:
'strict' Raise ValueError (or a subclass); this is the default.
'ignore' Ignore the character and continue with the next.
'replace' Replace with a suitable replacement character
'xmlcharrefreplace' Replace with the appropriate XML character reference
'backslashreplace' Replace with backslashed escape sequences.
The errors argument will be assigned to an attribute of the same name.
Assigning to this attribute makes it possible to switch between different error
handling strategies during the lifetime of the StreamWriter object.
The set of allowed values for the errors argument can be extended with
register_error().
Flushes and resets the codec buffers used for keeping state.
Calling this method should ensure that the data on the output is put into
a clean state that allows appending of new fresh data without having to
rescan the whole stream to recover state.
In addition to the above methods, the StreamWriter must also inherit
all other methods and attributes from the underlying stream.
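In practice these classes are usually obtained from the registry and wrapped
around an existing byte stream; a sketch with an in-memory stream:

>>> import codecs, io
>>> raw = io.BytesIO()
>>> writer = codecs.getwriter('utf-8')(raw)
>>> writer.write('Gr\u00fc\u00dfe')
>>> raw.getvalue()
b'Gr\xc3\xbc\xc3\x9fe'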
The StreamReader class is a subclass of Codec and defines the
following methods which every stream reader must define in order to be
compatible with the Python codec registry.
All stream readers must provide this constructor interface. They are free to add
additional keyword arguments, but only the ones defined here are used by the
Python codec registry.
stream must be a file-like object open for reading (binary) data.
The StreamReader may implement different error handling schemes by
providing the errors keyword argument. These parameters are defined:
'strict' Raise ValueError (or a subclass); this is the default.
'ignore' Ignore the character and continue with the next.
'replace' Replace with a suitable replacement character.
The errors argument will be assigned to an attribute of the same name.
Assigning to this attribute makes it possible to switch between different error
handling strategies during the lifetime of the StreamReader object.
The set of allowed values for the errors argument can be extended with
register_error().
Decodes data from the stream and returns the resulting object.
chars indicates the number of characters to read from the
stream. read() will never return more than chars characters, but
it might return fewer, if there are not enough characters available.
size indicates the approximate maximum number of bytes to read from the
stream for decoding purposes. The decoder can modify this setting as
appropriate. The default value -1 indicates to read and decode as much as
possible. size is intended to prevent having to decode huge files in
one step.
firstline indicates that it would be sufficient to only return the first
line, if there are decoding errors on later lines.
The method should use a greedy read strategy meaning that it should read
as much data as is allowed within the definition of the encoding and the
given size, e.g. if optional encoding endings or state markers are
available on the stream, these should be read too.
The StreamReaderWriter allows wrapping streams which work in both read
and write modes.
The design is such that one can use the factory functions returned by the
lookup() function to construct the instance.
class codecs.StreamReaderWriter(stream, Reader, Writer, errors)
Creates a StreamReaderWriter instance. stream must be a file-like
object. Reader and Writer must be factory functions or classes providing the
StreamReader and StreamWriter interface resp. Error handling
is done in the same way as defined for the stream readers and writers.
StreamReaderWriter instances define the combined interfaces of
StreamReader and StreamWriter classes. They inherit all other
methods and attributes from the underlying stream.
The StreamRecoder provides a frontend/backend view of encoding data
which is sometimes useful when dealing with different encoding environments.
The design is such that one can use the factory functions returned by the
lookup() function to construct the instance.
class codecs.StreamRecoder(stream, encode, decode, Reader, Writer, errors)
Creates a StreamRecoder instance which implements a two-way conversion:
encode and decode work on the frontend (the input to read() and output
of write()) while Reader and Writer work on the backend (reading and
writing to the stream).
You can use these objects to do transparent direct recodings from e.g. Latin-1
to UTF-8 and back.
stream must be a file-like object.
encode, decode must adhere to the Codec interface. Reader,
Writer must be factory functions or classes providing objects of the
StreamReader and StreamWriter interface respectively.
encode and decode are needed for the frontend translation, Reader and
Writer for the backend translation.
Error handling is done in the same way as defined for the stream readers and
writers.
StreamRecoder instances define the combined interfaces of
StreamReader and StreamWriter classes. They inherit all other
methods and attributes from the underlying stream.
Strings are stored internally as sequences of codepoints (to be precise
as Py_UNICODE arrays). Depending on the way Python is compiled (either
via --without-wide-unicode or --with-wide-unicode, with the
former being the default) Py_UNICODE is either a 16-bit or 32-bit data
type. Once a string object is used outside of CPU and memory, CPU endianness
and how these arrays are stored as bytes become an issue. Transforming a
string object into a sequence of bytes is called encoding and recreating the
string object from the sequence of bytes is known as decoding. There are many
different methods for how this transformation can be done (these methods are
also called encodings). The simplest method is to map the codepoints 0-255 to
the bytes 0x0-0xff. This means that a string object that contains
codepoints above U+00FF can’t be encoded with this method (which is called
'latin-1' or 'iso-8859-1'). str.encode() will raise a
UnicodeEncodeError that looks like this: UnicodeEncodeError:'latin-1'codeccan'tencodecharacter'\u1234'inposition3:ordinalnotinrange(256).
There’s another group of encodings (the so called charmap encodings) that choose
a different subset of all Unicode code points and how these codepoints are
mapped to the bytes 0x0-0xff. To see how this is done simply open
e.g. encodings/cp1252.py (which is an encoding that is used primarily on
Windows). There’s a string constant with 256 characters that shows you which
character is mapped to which byte value.
All of these encodings can only encode 256 of the 65536 (or 1114111) codepoints
defined in Unicode. A simple and straightforward way that can store each Unicode
code point, is to store each codepoint as two consecutive bytes. There are two
possibilities: Store the bytes in big endian or in little endian order. These
two encodings are called UTF-16-BE and UTF-16-LE respectively. Their
disadvantage is that if e.g. you use UTF-16-BE on a little endian machine you
will always have to swap bytes on encoding and decoding. UTF-16 avoids this
problem: Bytes will always be in natural endianness. When these bytes are read
by a CPU with a different endianness, then bytes have to be swapped though. To
be able to detect the endianness of a UTF-16 byte sequence, there’s the so
called BOM (the “Byte Order Mark”). This is the Unicode character U+FEFF.
This character will be prepended to every UTF-16 byte sequence. The byte swapped
version of this character (0xFFFE) is an illegal character that may not
appear in a Unicode text. So when the first character in a UTF-16 byte sequence
appears to be a U+FFFE the bytes have to be swapped on decoding.
Unfortunately up to Unicode 4.0 the character U+FEFF had a second purpose as
a ZERO WIDTH NO-BREAK SPACE: a character that has no width and doesn’t allow
a word to be split. It can e.g. be used to give hints to a ligature algorithm.
With Unicode 4.0 using U+FEFF as a ZERO WIDTH NO-BREAK SPACE has been
deprecated (with U+2060 (WORD JOINER) assuming this role). Nevertheless
Unicode software still must be able to handle U+FEFF in both roles: as a BOM
it’s a device to determine the storage layout of the encoded bytes, and
vanishes once the byte sequence has been decoded into a string; as a ZERO
WIDTH NO-BREAK SPACE it’s a normal character that will be decoded like any
other.
There’s another encoding that is able to encode the full range of Unicode
characters: UTF-8. UTF-8 is an 8-bit encoding, which means there are no issues
with byte order in UTF-8. Each byte in a UTF-8 byte sequence consists of two
parts: Marker bits (the most significant bits) and payload bits. The marker bits
are a sequence of zero to six 1 bits followed by a 0 bit. Unicode characters are
encoded like this (with x being payload bits, which when concatenated give the
Unicode character):
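The byte patterns are (in the classic UTF-8 definition, which allowed
sequences of up to six bytes; modern Unicode caps sequences at four):

0xxxxxxx
110xxxxx 10xxxxxx
1110xxxx 10xxxxxx 10xxxxxx
11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx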
The least significant bit of the Unicode character is the rightmost x bit.
As UTF-8 is an 8-bit encoding no BOM is required and any U+FEFF character in
the decoded string (even if it’s the first character) is treated as a ZERO
WIDTH NO-BREAK SPACE.
Without external information it’s impossible to reliably determine which
encoding was used for encoding a string. Each charmap encoding can
decode any random byte sequence. However that’s not possible with UTF-8, as
UTF-8 byte sequences have a structure that doesn’t allow arbitrary byte
sequences. To increase the reliability with which a UTF-8 encoding can be
detected, Microsoft invented a variant of UTF-8 (that Python 2.5 calls
"utf-8-sig") for its Notepad program: Before any of the Unicode characters
is written to the file, a UTF-8 encoded BOM (which looks like this as a byte
sequence: 0xef, 0xbb, 0xbf) is written. As it’s rather improbable
that any charmap encoded file starts with these byte values (which would e.g.
map to
LATIN SMALL LETTER I WITH DIAERESIS
RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
INVERTED QUESTION MARK
in iso-8859-1), this increases the probability that a utf-8-sig encoding can be
correctly guessed from the byte sequence. So here the BOM is not used to be able
to determine the byte order used for generating the byte sequence, but as a
signature that helps in guessing the encoding. On encoding the utf-8-sig codec
will write 0xef, 0xbb, 0xbf as the first three bytes to the file. On
decoding utf-8-sig will skip those three bytes if they appear as the first three
bytes in the file.
Python comes with a number of codecs built-in, either implemented as C functions
or with dictionaries as mapping tables. The following table lists the codecs by
name, together with a few common aliases, and the languages for which the
encoding is likely used. Neither the list of aliases nor the list of languages
is meant to be exhaustive. Notice that spelling alternatives that only differ in
case or use a hyphen instead of an underscore are also valid aliases; therefore,
e.g. 'utf-8' is a valid alias for the 'utf_8' codec.
Many of the character sets support the same languages. They vary in individual
characters (e.g. whether the EURO SIGN is supported or not), and in the
assignment of characters to code positions. For the European languages in
particular, the following variants typically exist:
an ISO 8859 codeset
a Microsoft Windows code page, which is typically derived from an 8859
codeset, but replaces control characters with additional graphic characters
an IBM EBCDIC code page
an IBM PC code page, which is ASCII compatible
Codec        Aliases                              Languages
ascii        646, us-ascii                        English
big5         big5-tw, csbig5                      Traditional Chinese
big5hkscs    big5-hkscs, hkscs                    Traditional Chinese
cp037        IBM037, IBM039                       English
cp424        EBCDIC-CP-HE, IBM424                 Hebrew
cp437        437, IBM437                          English
cp500        EBCDIC-CP-BE, EBCDIC-CP-CH, IBM500   Western Europe
cp720                                             Arabic
cp737                                             Greek
cp775        IBM775                               Baltic languages
cp850        850, IBM850                          Western Europe
cp852        852, IBM852                          Central and Eastern Europe
cp855        855, IBM855                          Bulgarian, Byelorussian,
                                                  Macedonian, Russian, Serbian
cp856                                             Hebrew
cp857        857, IBM857                          Turkish
cp858        858, IBM858                          Western Europe
cp860        860, IBM860                          Portuguese
cp861        861, CP-IS, IBM861                   Icelandic
cp862        862, IBM862                          Hebrew
cp863        863, IBM863                          Canadian
cp864        IBM864                               Arabic
cp865        865, IBM865                          Danish, Norwegian
cp866        866, IBM866                          Russian
cp869        869, CP-GR, IBM869                   Greek
cp874                                             Thai
cp875                                             Greek
cp932        932, ms932, mskanji, ms-kanji        Japanese
cp949        949, ms949, uhc                      Korean
cp950        950, ms950                           Traditional Chinese
cp1006                                            Urdu
cp1026       ibm1026                              Turkish
cp1140       ibm1140                              Western Europe
cp1250       windows-1250                         Central and Eastern Europe
cp1251       windows-1251                         Bulgarian, Byelorussian,
                                                  Macedonian, Russian, Serbian
A number of predefined codecs are specific to Python, so their codec names
have no meaning outside Python; among them:

Codec               Purpose
raw_unicode_escape  Produce a string that is suitable as raw Unicode
                    literal in Python source code
undefined           Raise an exception for all conversions. Can be used
                    as the system encoding if no automatic coercion
                    between byte and Unicode strings is desired.
unicode_escape      Produce a string that is suitable as Unicode literal
                    in Python source code
unicode_internal    Return the internal representation of the operand
The following codecs provide bytes-to-bytes mappings.

Codec          Aliases                    Purpose
base64_codec   base64, base-64            Convert operand to MIME base64
bz2_codec      bz2                        Compress the operand using bz2
hex_codec      hex                        Convert operand to hexadecimal
                                          representation, with two digits
                                          per byte
quopri_codec   quopri, quoted-printable,  Convert operand to MIME quoted
               quotedprintable            printable
uu_codec       uu                         Convert the operand using uuencode
zlib_codec     zip, zlib                  Compress the operand using gzip
The following codecs provide string-to-string mappings.

Codec    Aliases   Purpose
rot_13   rot13     Returns the Caesar-cypher encryption of the operand
New in version 3.2: bytes-to-bytes and string-to-string codecs.
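These codecs are not text encodings, so they are reached through the codec
machinery rather than str.encode(); for example, using the stateless encoder
from lookup(), which returns the usual (output, length consumed) pair:

>>> import codecs
>>> codecs.lookup('hex_codec').encode(b'spam')
(b'7370616d', 4)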
encodings.idna — Internationalized Domain Names in Applications
This module implements RFC 3490 (Internationalized Domain Names in
Applications) and RFC 3492 (Nameprep: A Stringprep Profile for
Internationalized Domain Names (IDN)). It builds upon the punycode encoding
and stringprep.
These RFCs together define a protocol to support non-ASCII characters in domain
names. A domain name containing non-ASCII characters (such as
www.Alliancefrançaise.nu) is converted into an ASCII-compatible encoding
(ACE, such as www.xn--alliancefranaise-npb.nu). The ACE form of the domain
name is then used in all places where arbitrary characters are not allowed by
the protocol, such as DNS queries, HTTP Host fields, and so
on. This conversion is carried out in the application, if possible invisibly to
the user: The application should transparently convert Unicode domain labels to
IDNA on the wire, and convert back ACE labels to Unicode before presenting them
to the user.
Python supports this conversion in several ways: the idna codec performs
conversion between Unicode and ACE, separating an input string into labels
based on the separator characters defined in section 3.1 (1) of RFC 3490
and converting each label to ACE as required, and conversely separating an input
byte string into labels based on the . separator and converting any ACE
labels found into unicode. Furthermore, the socket module
transparently converts Unicode host names to ACE, so that applications need not
be concerned about converting host names themselves when they pass them to the
socket module. On top of that, modules that have host names as function
parameters, such as http.client and ftplib, accept Unicode host
names (http.client then also transparently sends an IDNA hostname in the
Host field if it sends that field at all).
When receiving host names from the wire (such as in reverse name lookup), no
automatic conversion to Unicode is performed: Applications wishing to present
such host names to the user should decode them to Unicode.
The module encodings.idna also implements the nameprep procedure, which
performs certain normalizations on host names, to achieve case-insensitivity of
international domain names, and to unify similar characters. The nameprep
functions can be used directly if desired.
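For example, using the sample domain from above:

>>> 'www.Alliancefrançaise.nu'.encode('idna')
b'www.xn--alliancefranaise-npb.nu'
>>> b'www.xn--alliancefranaise-npb.nu'.decode('idna')
'www.alliancefrançaise.nu'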
The encodings.mbcs codec encodes the operand according to the ANSI
codepage (CP_ACP). This codec only supports 'strict' and 'replace'
error handlers to encode, and 'strict' and 'ignore' error handlers to
decode.
Availability: Windows only.
Changed in version 3.2: Before 3.2, the errors argument was ignored;
'replace' was always used to encode, and 'ignore' to decode.
The encodings.utf_8_sig module implements a variant of the UTF-8 codec:
on encoding, a UTF-8 encoded BOM will be prepended to the UTF-8 encoded
bytes. For the stateful encoder this is only done once (on the first write
to the byte stream). On decoding, an optional UTF-8 encoded BOM at the
start of the data will be skipped.
This module provides access to the Unicode Character Database (UCD) which
defines character properties for all Unicode characters. The data contained in
this database is compiled from the UCD version 6.0.0.
The module uses the same names and symbols as defined by Unicode
Standard Annex #44, “Unicode Character Database”. It defines the
following functions:
Returns the decimal value assigned to the character chr as an integer.
If no such value is defined, default is returned, or, if not given,
ValueError is raised.
Returns the digit value assigned to the character chr as an integer.
If no such value is defined, default is returned, or, if not given,
ValueError is raised.
Returns the numeric value assigned to the character chr as a float.
If no such value is defined, default is returned, or, if not given,
ValueError is raised.
Returns the mirrored property assigned to the character chr as an
integer. Returns 1 if the character has been identified as a “mirrored”
character in bidirectional text, 0 otherwise.
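For example:

>>> import unicodedata
>>> unicodedata.decimal('9')
9
>>> unicodedata.numeric('¼')
0.25
>>> unicodedata.mirrored('(')
1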
Return the normal form form for the Unicode string unistr. Valid values for
form are ‘NFC’, ‘NFKC’, ‘NFD’, and ‘NFKD’.
The Unicode standard defines various normalization forms of a Unicode string,
based on the definition of canonical equivalence and compatibility equivalence.
In Unicode, several characters can be expressed in various ways. For example,
the character U+00C7 (LATIN CAPITAL LETTER C WITH CEDILLA) can also be
expressed as the sequence U+0043 (LATIN CAPITAL LETTER C) U+0327 (COMBINING
CEDILLA).
For each character, there are two normal forms: normal form C and normal form D.
Normal form D (NFD) is also known as canonical decomposition, and translates
each character into its decomposed form. Normal form C (NFC) first applies a
canonical decomposition, then composes pre-combined characters again.
In addition to these two forms, there are two additional normal forms based on
compatibility equivalence. In Unicode, certain characters are supported which
normally would be unified with other characters. For example, U+2160 (ROMAN
NUMERAL ONE) is really the same thing as U+0049 (LATIN CAPITAL LETTER I).
However, it is supported in Unicode for compatibility with existing character
sets (e.g. gb2312).
The normal form KD (NFKD) will apply the compatibility decomposition, i.e.
replace all compatibility characters with their equivalents. The normal form KC
(NFKC) first applies the compatibility decomposition, followed by the canonical
composition.
Even if two unicode strings are normalized and look the same to
a human reader, if one has combining characters and the other
doesn’t, they may not compare equal.
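A short doctest making this concrete:

>>> from unicodedata import normalize
>>> single = '\u00c7'            # LATIN CAPITAL LETTER C WITH CEDILLA
>>> combined = 'C\u0327'         # 'C' plus COMBINING CEDILLA
>>> single == combined
False
>>> normalize('NFD', single) == combined
True
>>> normalize('NFC', combined) == single
True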
In addition, the module exposes the following constant:
This is an object that has the same methods as the entire module, but uses the
Unicode database version 3.2 instead, for applications that require this
specific version of the Unicode database (such as IDNA).
When identifying things (such as host names) in the internet, it is often
necessary to compare such identifications for “equality”. Exactly how this
comparison is executed may depend on the application domain, e.g. whether it
should be case-insensitive or not. It may be also necessary to restrict the
possible identifications, to allow only identifications consisting of
“printable” characters.
RFC 3454 defines a procedure for “preparing” Unicode strings in internet
protocols. Before passing strings onto the wire, they are processed with the
preparation procedure, after which they have a certain normalized form. The RFC
defines a set of tables, which can be combined into profiles. Each profile must
define which tables it uses, and what other optional parts of the stringprep
procedure are part of the profile. One example of a stringprep profile is
nameprep, which is used for internationalized domain names.
The module stringprep only exposes the tables from RFC 3454. As these
tables would be very large to represent as dictionaries or lists, the module
uses the Unicode character database internally. The module source code itself
was generated using the mkstringprep.py utility.
As a result, these tables are exposed as functions, not as data structures.
There are two kinds of tables in the RFC: sets and mappings. For a set,
stringprep provides the “characteristic function”, i.e. a function that
returns true if the parameter is part of the set. For mappings, it provides the
mapping function: given the key, it returns the associated value. Below is a
list of all functions available in the module.
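Two representative examples (in_table_a1() is the set of code points
unassigned in Unicode 3.2, and map_table_b2() is the case-folding map
used by nameprep):

>>> import stringprep
>>> stringprep.in_table_a1('\u0221')
True
>>> stringprep.map_table_b2('A')
'a'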
The modules described in this chapter provide a variety of specialized data
types such as dates and times, fixed-type arrays, heap queues, synchronized
queues, and sets.
Python also provides some built-in data types, in particular,
dict, list, set and frozenset, and
tuple. The str class is used to hold
Unicode strings, and the bytes class is used to hold binary data.
The following modules are documented in this chapter:
The datetime module supplies classes for manipulating dates and times in
both simple and complex ways. While date and time arithmetic is supported, the
focus of the implementation is on efficient attribute extraction for output
formatting and manipulation. For related
functionality, see also the time and calendar modules.
There are two kinds of date and time objects: “naive” and “aware”. This
distinction refers to whether the object has any notion of time zone, daylight
saving time, or other kind of algorithmic or political time adjustment. Whether
a naive datetime object represents Coordinated Universal Time (UTC),
local time, or time in some other timezone is purely up to the program, just
like it’s up to the program whether a particular number represents metres,
miles, or mass. Naive datetime objects are easy to understand and to
work with, at the cost of ignoring some aspects of reality.
For applications requiring more, datetime and time objects
have an optional time zone information attribute, tzinfo, that can be
set to an instance of a subclass of the abstract tzinfo class. These
tzinfo objects capture information about the offset from UTC time, the
time zone name, and whether Daylight Saving Time is in effect. Note that only
one concrete tzinfo class, the timezone class, is supplied by the
datetime module. The timezone class can represent simple
timezones with fixed offset from UTC such as UTC itself or North American EST and
EDT timezones. Supporting timezones at whatever level of detail is
required is up to the application. The rules for time adjustment across the
world are more political than rational, change frequently, and there is no
standard suitable for every application aside from UTC.
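A brief sketch of the distinction (the fixed offset chosen is arbitrary):

>>> from datetime import datetime, timezone, timedelta
>>> datetime.now().tzinfo is None           # naive
True
>>> datetime.now(timezone.utc).tzinfo       # aware
datetime.timezone.utc
>>> tz = timezone(timedelta(hours=-5))      # a fixed-offset timezone
>>> datetime(2011, 1, 1, tzinfo=tz).utcoffset()
datetime.timedelta(-1, 68400)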
The datetime module exports the following constants: MINYEAR, the
smallest year number allowed in a date or datetime object (1), and
MAXYEAR, the largest (9999).
class datetime.date
An idealized naive date, assuming the current Gregorian calendar always was, and
always will be, in effect. Attributes: year, month, and
day.
class datetime.time
An idealized time, independent of any particular day, assuming that every day
has exactly 24*60*60 seconds (there is no notion of “leap seconds” here).
Attributes: hour, minute, second, microsecond,
and tzinfo.
class datetime.tzinfo
An abstract base class for time zone information objects. These are used by the
datetime and time classes to provide a customizable notion of
time adjustment (for example, to account for time zone and/or daylight saving
time).
An object d of type time or datetime may be naive or aware.
d is aware if d.tzinfo is not None and d.tzinfo.utcoffset(d) does
not return None. If d.tzinfo is None, or if d.tzinfo is not
None but d.tzinfo.utcoffset(d) returns None, d is naive.
The distinction between naive and aware doesn’t apply to timedelta
objects.
Subclass relationships:
object
timedelta
tzinfo
timezone
time
date
datetime
A timedelta object represents a duration, the difference between two
dates or times.
class datetime.timedelta(days=0, seconds=0, microseconds=0, milliseconds=0, minutes=0, hours=0, weeks=0)
All arguments are optional and default to 0. Arguments may be integers
or floats, and may be positive or negative.
Only days, seconds and microseconds are stored internally. Arguments are
converted to those units:
A millisecond is converted to 1000 microseconds.
A minute is converted to 60 seconds.
An hour is converted to 3600 seconds.
A week is converted to 7 days.
and days, seconds and microseconds are then normalized so that the
representation is unique, with
0 <= microseconds < 1000000
0 <= seconds < 3600*24 (the number of seconds in one day)
-999999999 <= days <= 999999999
If any argument is a float and there are fractional microseconds, the fractional
microseconds left over from all arguments are combined and their sum is rounded
to the nearest microsecond. If no argument is a float, the conversion and
normalization processes are exact (no information is lost).
If the normalized value of days lies outside the indicated range,
OverflowError is raised.
Note that normalization of negative values may be surprising at first. For
example,
>>> from datetime import timedelta
>>> d = timedelta(microseconds=-1)
>>> (d.days, d.seconds, d.microseconds)
(-1, 86399, 999999)
-timedelta.max is not representable as a timedelta object.
String representations of timedelta objects are normalized
similarly to their internal representation. This leads to somewhat
unusual results for negative timedeltas. For example:
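>>> timedelta(hours=-5)
datetime.timedelta(-1, 68400)
>>> print(timedelta(hours=-5))
-1 day, 19:00:00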
In addition to the operations listed above timedelta objects support
certain additions and subtractions with date and datetime
objects (see below).
Changed in version 3.2: Floor division and true division of a timedelta object by another
timedelta object are now supported, as are remainder operations and
the divmod() function. True division and multiplication of a
timedelta object by a float object are now supported.
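For instance:

>>> from datetime import timedelta
>>> timedelta(hours=1) / timedelta(minutes=5)
12.0
>>> timedelta(minutes=1) * 1.5
datetime.timedelta(0, 90)
>>> divmod(timedelta(hours=25), timedelta(days=1))
(1, datetime.timedelta(0, 3600))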
Comparisons of timedelta objects are supported with the
timedelta object representing the smaller duration considered to be the
smaller timedelta. In order to stop mixed-type comparisons from falling back to
the default comparison by object address, when a timedelta object is
compared to an object of a different type, TypeError is raised unless the
comparison is == or !=. The latter cases return False or
True, respectively.
timedelta objects are hashable (usable as dictionary keys), support
efficient pickling, and in Boolean contexts, a timedelta object is
considered to be true if and only if it isn’t equal to timedelta(0).
A date object represents a date (year, month and day) in an idealized
calendar, the current Gregorian calendar indefinitely extended in both
directions. January 1 of year 1 is called day number 1, January 2 of year 1 is
called day number 2, and so on. This matches the definition of the “proleptic
Gregorian” calendar in Dershowitz and Reingold’s book Calendrical Calculations,
where it’s the base calendar for all computations. See the book for algorithms
for converting between proleptic Gregorian ordinals and many other calendar
systems.
Return the local date corresponding to the POSIX timestamp, such as is returned
by time.time(). This may raise ValueError, if the timestamp is out
of the range of values supported by the platform C localtime() function.
It’s common for this to be restricted to years from 1970 through 2038. Note
that on non-POSIX systems that include leap seconds in their notion of a
timestamp, leap seconds are ignored by fromtimestamp().
Return the date corresponding to the proleptic Gregorian ordinal, where January
1 of year 1 has ordinal 1. ValueError is raised unless
1 <= ordinal <= date.max.toordinal(). For any date d,
date.fromordinal(d.toordinal()) == d.
The day attribute is between 1 and the number of days in the given month of
the given year.
Supported operations:

date2 = date1 + timedelta
    date2 is timedelta.days days removed from date1. (1)
date2 = date1 - timedelta
    Computes date2 such that date2 + timedelta == date1. (2)
timedelta = date1 - date2
    (3)
date1 < date2
    date1 is considered less than date2 when date1 precedes date2
    in time. (4)
Notes:

(1) date2 is moved forward in time if timedelta.days > 0, or backward if
timedelta.days < 0. Afterward date2 - date1 == timedelta.days.
timedelta.seconds and timedelta.microseconds are ignored.
OverflowError is raised if date2.year would be smaller than
MINYEAR or larger than MAXYEAR.

(2) This isn’t quite equivalent to date1 + (-timedelta), because -timedelta in
isolation can overflow in cases where date1 - timedelta does not.
timedelta.seconds and timedelta.microseconds are ignored.

(3) This is exact, and cannot overflow. timedelta.seconds and
timedelta.microseconds are 0, and date2 + timedelta == date1 after.

(4) In other words, date1 < date2 if and only if
date1.toordinal() < date2.toordinal(). In order to stop comparison from
falling back to the default scheme of comparing object addresses, date
comparison normally raises TypeError if the other comparand isn’t also a
date object. However, NotImplemented is returned instead if the other
comparand has a timetuple() attribute. This hook gives other kinds of date
objects a chance at implementing mixed-type comparison. If not, when a date
object is compared to an object of a different type, TypeError is raised
unless the comparison is == or !=. The latter cases return
False or True, respectively.
Dates can be used as dictionary keys. In Boolean contexts, all date
objects are considered to be true.
Return a date with the same value, except for those parameters given new
values by whichever keyword arguments are specified. For example, if
d == date(2002, 12, 31), then d.replace(day=26) == date(2002, 12, 26).
Return a time.struct_time such as returned by time.localtime().
The hours, minutes and seconds are 0, and the DST flag is -1. d.timetuple()
is equivalent to time.struct_time((d.year, d.month, d.day, 0, 0, 0,
d.weekday(), yday, -1)), where
yday = d.toordinal() - date(d.year, 1, 1).toordinal() + 1 is the day
number within the current year starting with 1 for January 1st.
Return the proleptic Gregorian ordinal of the date, where January 1 of year 1
has ordinal 1. For any date object d,
date.fromordinal(d.toordinal()) == d.
Return the day of the week as an integer, where Monday is 0 and Sunday is 6.
For example, date(2002, 12, 4).weekday() == 2, a Wednesday. See also
isoweekday().
Return the day of the week as an integer, where Monday is 1 and Sunday is 7.
For example, date(2002, 12, 4).isoweekday() == 3, a Wednesday. See also
weekday(), isocalendar().
The ISO year consists of 52 or 53 full weeks; a week starts on a Monday and
ends on a Sunday. The first week of an ISO year is the first (Gregorian)
calendar week of a year containing a Thursday. This is called week number 1,
and the ISO year of that Thursday is the same as its Gregorian year.
For example, 2004 begins on a Thursday, so the first week of ISO year 2004
begins on Monday, 29 Dec 2003 and ends on Sunday, 4 Jan 2004, so that
date(2003, 12, 29).isocalendar() == (2004, 1, 1) and
date(2004, 1, 4).isocalendar() == (2004, 1, 7).
Return a string representing the date, for example
date(2002, 12, 4).ctime() == 'Wed Dec  4 00:00:00 2002'. d.ctime() is
equivalent to time.ctime(time.mktime(d.timetuple())) on platforms where the
native C ctime() function (which time.ctime() invokes, but which
date.ctime() does not invoke) conforms to the C standard.
Return a string representing the date, controlled by an explicit format string.
Format codes referring to hours, minutes or seconds will see 0 values. See
section strftime() and strptime() Behavior.
>>> from datetime import date
>>> d = date.fromordinal(730920) # 730920th day after 1. 1. 0001
>>> d
datetime.date(2002, 3, 11)
>>> t = d.timetuple()
>>> for i in t:
... print(i)
2002 # year
3 # month
11 # day
0
0
0
0 # weekday (0 = Monday)
70 # 70th day in the year
-1
>>> ic = d.isocalendar()
>>> for i in ic:
... print(i)
2002 # ISO year
11 # ISO week number
1 # ISO day number ( 1 = Monday )
>>> d.isoformat()
'2002-03-11'
>>> d.strftime("%d/%m/%y")
'11/03/02'
>>> d.strftime("%A %d. %B %Y")
'Monday 11. March 2002'
A datetime object is a single object containing all the information
from a date object and a time object. Like a date
object, datetime assumes the current Gregorian calendar extended in
both directions; like a time object, datetime assumes there are exactly
3600*24 seconds in every day.
Constructor:
class datetime.datetime(year, month, day, hour=0, minute=0, second=0, microsecond=0, tzinfo=None)
The year, month and day arguments are required. tzinfo may be None, or an
instance of a tzinfo subclass. The remaining arguments may be integers,
in the following ranges:
MINYEAR <= year <= MAXYEAR
1 <= month <= 12
1 <= day <= number of days in the given month and year
0 <= hour < 24
0 <= minute < 60
0 <= second < 60
0 <= microsecond < 1000000
If an argument outside those ranges is given, ValueError is raised.
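A quick illustration (the dates below are arbitrary):

>>> from datetime import datetime
>>> datetime(2011, 3, 14, 15, 9, 26)
datetime.datetime(2011, 3, 14, 15, 9, 26)
>>> datetime(2011, 2, 30)
Traceback (most recent call last):
  ...
ValueError: day is out of range for month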
Return the current local date and time. If optional argument tz is None
or not specified, this is like today(), but, if possible, supplies more
precision than can be gotten from going through a time.time() timestamp
(for example, this may be possible on platforms supplying the C
gettimeofday() function).
Else tz must be an instance of a tzinfo subclass, and the
current date and time are converted to tz’s time zone. In this case the
result is equivalent to tz.fromutc(datetime.utcnow().replace(tzinfo=tz)).
See also today(), utcnow().
Return the current UTC date and time, with tzinfo None. This is like
now(), but returns the current UTC date and time, as a naive
datetime object. An aware current UTC datetime can be obtained by
calling datetime.now(timezone.utc). See also now().
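For example, contrasting the two calls (exact timestamps will vary):

>>> from datetime import datetime, timezone
>>> datetime.utcnow().tzinfo is None      # naive
True
>>> datetime.now(timezone.utc).tzinfo     # aware
datetime.timezone.utc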
Return the local date and time corresponding to the POSIX timestamp, such as is
returned by time.time(). If optional argument tz is None or not
specified, the timestamp is converted to the platform’s local date and time, and
the returned datetime object is naive.
Else tz must be an instance of a tzinfo subclass, and the
timestamp is converted to tz’s time zone. In this case the result is
equivalent to
tz.fromutc(datetime.utcfromtimestamp(timestamp).replace(tzinfo=tz)).
fromtimestamp() may raise ValueError, if the timestamp is out of
the range of values supported by the platform C localtime() or
gmtime() functions. It’s common for this to be restricted to years in
1970 through 2038. Note that on non-POSIX systems that include leap seconds in
their notion of a timestamp, leap seconds are ignored by fromtimestamp(),
and then it’s possible to have two timestamps differing by a second that yield
identical datetime objects. See also utcfromtimestamp().
Return the UTC datetime corresponding to the POSIX timestamp, with
tzinfo None. This may raise ValueError, if the timestamp is
out of the range of values supported by the platform C gmtime() function.
It’s common for this to be restricted to years in 1970 through 2038. See also
fromtimestamp().
Return the datetime corresponding to the proleptic Gregorian ordinal,
where January 1 of year 1 has ordinal 1. ValueError is raised unless 1 <= ordinal <= datetime.max.toordinal(). The hour, minute, second and
microsecond of the result are all 0, and tzinfo is None.
Return a new datetime object whose date components are equal to the
given date object’s, and whose time components and tzinfo
attributes are equal to the given time object’s. For any
datetime object d,
d == datetime.combine(d.date(), d.timetz()). If date is a
datetime object, its time components and tzinfo attributes
are ignored.
Return a datetime corresponding to date_string, parsed according to
format. This is equivalent to datetime(*(time.strptime(date_string, format)[0:6])). ValueError is raised if the date_string and format
can’t be parsed by time.strptime() or if it returns a value which isn’t a
time tuple. See section strftime() and strptime() Behavior.
datetime2 is a duration of timedelta removed from datetime1, moving forward in
time if timedelta.days > 0, or backward if timedelta.days < 0. The
result has the same tzinfo attribute as the input datetime, and
datetime2 - datetime1 == timedelta after. OverflowError is raised if
datetime2.year would be smaller than MINYEAR or larger than
MAXYEAR. Note that no time zone adjustments are done even if the
input is an aware object.
Computes the datetime2 such that datetime2 + timedelta == datetime1. As for
addition, the result has the same tzinfo attribute as the input
datetime, and no time zone adjustments are done even if the input is aware.
This isn’t quite equivalent to datetime1 + (-timedelta), because -timedelta
in isolation can overflow in cases where datetime1 - timedelta does not.
Subtraction of a datetime from a datetime is defined only if
both operands are naive, or if both are aware. If one is aware and the other is
naive, TypeError is raised.
If both are naive, or both are aware and have the same tzinfo attribute,
the tzinfo attributes are ignored, and the result is a timedelta
object t such that datetime2 + t == datetime1. No time zone adjustments
are done in this case.
If both are aware and have different tzinfo attributes, a-b acts
as if a and b were first converted to naive UTC datetimes. The
result is (a.replace(tzinfo=None) - a.utcoffset()) - (b.replace(tzinfo=None) - b.utcoffset()) except that the implementation never overflows.
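For example, two aware datetimes denoting the same instant in different zones subtract to zero (a small sketch using the fixed-offset timezone class):

>>> from datetime import datetime, timedelta, timezone
>>> a = datetime(2011, 1, 1, 12, 0, tzinfo=timezone.utc)
>>> b = datetime(2011, 1, 1, 7, 0, tzinfo=timezone(timedelta(hours=-5)))
>>> a - b      # same moment in time
datetime.timedelta(0)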
datetime1 is considered less than datetime2 when datetime1 precedes
datetime2 in time.
If one comparand is naive and the other is aware, TypeError is raised.
If both comparands are aware, and have the same tzinfo attribute, the
common tzinfo attribute is ignored and the base datetimes are
compared. If both comparands are aware and have different tzinfo
attributes, the comparands are first adjusted by subtracting their UTC
offsets (obtained from self.utcoffset()).
Note
In order to stop comparison from falling back to the default scheme of comparing
object addresses, datetime comparison normally raises TypeError if the
other comparand isn’t also a datetime object. However,
NotImplemented is returned instead if the other comparand has a
timetuple() attribute. This hook gives other kinds of date objects a
chance at implementing mixed-type comparison. If not, when a datetime
object is compared to an object of a different type, TypeError is raised
unless the comparison is == or !=. The latter cases return
False or True, respectively.
datetime objects can be used as dictionary keys. In Boolean contexts,
all datetime objects are considered to be true.
Return a datetime with the same attributes, except for those attributes given
new values by whichever keyword arguments are specified. Note that
tzinfo=None can be specified to create a naive datetime from an aware
datetime with no conversion of date and time data.
Return a datetime object with new tzinfo attribute tz,
adjusting the date and time data so the result is the same UTC time as
self, but in tz’s local time.
tz must be an instance of a tzinfo subclass, and its
utcoffset() and dst() methods must not return None. self must
be aware (self.tzinfo must not be None, and self.utcoffset() must
not return None).
If self.tzinfo is tz, self.astimezone(tz) is equal to self: no
adjustment of date or time data is performed. Else the result is local
time in time zone tz, representing the same UTC time as self: after
astz = dt.astimezone(tz), astz - astz.utcoffset() will usually have
the same date and time data as dt - dt.utcoffset(). The discussion
of class tzinfo explains the cases at Daylight Saving Time transition
boundaries where this cannot be achieved (an issue only if tz models both
standard and daylight time).
If you merely want to attach a time zone object tz to a datetime dt without
adjustment of date and time data, use dt.replace(tzinfo=tz). If you
merely want to remove the time zone object from an aware datetime dt without
conversion of date and time data, use dt.replace(tzinfo=None).
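For example (a small sketch; the +02:00 offset is arbitrary):

>>> from datetime import datetime, timedelta, timezone
>>> dt = datetime(2011, 6, 1, 9, 30)
>>> aware = dt.replace(tzinfo=timezone(timedelta(hours=2)))  # attach, no conversion
>>> aware.isoformat()
'2011-06-01T09:30:00+02:00'
>>> aware.replace(tzinfo=None) == dt                         # strip, no conversion
True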
def astimezone(self, tz):
if self.tzinfo is tz:
return self
# Convert self to UTC, and attach the new time zone object.
utc = (self - self.utcoffset()).replace(tzinfo=tz)
# Convert from UTC to tz's local time.
return tz.fromutc(utc)
If tzinfo is None, returns None, else returns
self.tzinfo.utcoffset(self), and raises an exception if the latter doesn’t
return None, or a timedelta object representing a whole number of
minutes with magnitude less than one day.
If tzinfo is None, returns None, else returns
self.tzinfo.dst(self), and raises an exception if the latter doesn’t return
None, or a timedelta object representing a whole number of minutes
with magnitude less than one day.
Return a time.struct_time such as returned by time.localtime().
d.timetuple() is equivalent to time.struct_time((d.year, d.month, d.day,
d.hour, d.minute, d.second, d.weekday(), yday, dst)), where
yday = d.toordinal() - date(d.year, 1, 1).toordinal() + 1 is the day number within
the current year starting with 1 for January 1st. The tm_isdst flag
of the result is set according to the dst() method: if tzinfo is
None or dst() returns None, tm_isdst is set to -1;
else if dst() returns a non-zero value, tm_isdst is set to 1;
else tm_isdst is set to 0.
If datetime instance d is naive, this is the same as
d.timetuple() except that tm_isdst is forced to 0 regardless of what
d.dst() returns. DST is never in effect for a UTC time.
If d is aware, d is normalized to UTC time, by subtracting
d.utcoffset(), and a time.struct_time for the
normalized time is returned. tm_isdst is forced to 0. Note
that an OverflowError may be raised if d.year was
MINYEAR or MAXYEAR and UTC adjustment spills over a year
boundary.
Return a string representing the date and time in ISO 8601 format,
YYYY-MM-DDTHH:MM:SS.mmmmmm or, if microsecond is 0,
YYYY-MM-DDTHH:MM:SS.
If utcoffset() does not return None, a 6-character string is
appended, giving the UTC offset in (signed) hours and minutes:
YYYY-MM-DDTHH:MM:SS.mmmmmm+HH:MM or, if microsecond is 0,
YYYY-MM-DDTHH:MM:SS+HH:MM.
The optional argument sep (default 'T') is a one-character separator,
placed between the date and time portions of the result. For example:
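>>> from datetime import datetime
>>> datetime(2002, 12, 25, 12, 30, 45).isoformat(' ')
'2002-12-25 12:30:45'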
Return a string representing the date and time, for example
datetime(2002, 12, 4, 20, 30, 40).ctime() == 'Wed Dec  4 20:30:40 2002'. d.ctime() is
equivalent to time.ctime(time.mktime(d.timetuple())) on platforms where the
native C ctime() function (which time.ctime() invokes, but which
datetime.ctime() does not invoke) conforms to the C standard.
Return a string representing the date and time, controlled by an explicit format
string. See section strftime() and strptime() Behavior.
Examples of working with datetime objects:
>>> from datetime import datetime, date, time
>>> # Using datetime.combine()
>>> d = date(2005, 7, 14)
>>> t = time(12, 30)
>>> datetime.combine(d, t)
datetime.datetime(2005, 7, 14, 12, 30)
>>> # Using datetime.now() or datetime.utcnow()
>>> datetime.now()
datetime.datetime(2007, 12, 6, 16, 29, 43, 79043) # GMT +1
>>> datetime.utcnow()
datetime.datetime(2007, 12, 6, 15, 29, 43, 79060)
>>> # Using datetime.strptime()
>>> dt = datetime.strptime("21/11/06 16:30", "%d/%m/%y %H:%M")
>>> dt
datetime.datetime(2006, 11, 21, 16, 30)
>>> # Using datetime.timetuple() to get tuple of all attributes
>>> tt = dt.timetuple()
>>> for it in tt:
... print(it)
...
2006 # year
11 # month
21 # day
16 # hour
30 # minute
0 # second
1 # weekday (0 = Monday)
325 # number of days since 1st January
-1 # dst - method tzinfo.dst() returned None
>>> # Date in ISO format
>>> ic = dt.isocalendar()
>>> for it in ic:
... print(it)
...
2006 # ISO year
47 # ISO week
2 # ISO weekday
>>> # Formatting datetime
>>> dt.strftime("%A, %d. %B %Y %I:%M%p")
'Tuesday, 21. November 2006 04:30PM'
Using datetime with tzinfo:
>>> from datetime import timedelta, datetime, tzinfo
>>> class GMT1(tzinfo):
... def __init__(self): # DST starts last Sunday in March
... d = datetime(dt.year, 4, 1) # ends last Sunday in October
... self.dston = d - timedelta(days=d.weekday() + 1)
... d = datetime(dt.year, 11, 1)
... self.dstoff = d - timedelta(days=d.weekday() + 1)
... def utcoffset(self, dt):
... return timedelta(hours=1) + self.dst(dt)
... def dst(self, dt):
... if self.dston <= dt.replace(tzinfo=None) < self.dstoff:
... return timedelta(hours=1)
... else:
... return timedelta(0)
... def tzname(self,dt):
... return "GMT +1"
...
>>> class GMT2(tzinfo):
... def __init__(self):
... d = datetime(dt.year, 4, 1)
... self.dston = d - timedelta(days=d.weekday() + 1)
... d = datetime(dt.year, 11, 1)
... self.dstoff = d - timedelta(days=d.weekday() + 1)
... def utcoffset(self, dt):
... return timedelta(hours=1) + self.dst(dt)
... def dst(self, dt):
... if self.dston <= dt.replace(tzinfo=None) < self.dstoff:
... return timedelta(hours=2)
... else:
... return timedelta(0)
... def tzname(self,dt):
... return "GMT +2"
...
>>> gmt1 = GMT1()
>>> # Daylight Saving Time
>>> dt1 = datetime(2006, 11, 21, 16, 30, tzinfo=gmt1)
>>> dt1.dst()
datetime.timedelta(0)
>>> dt1.utcoffset()
datetime.timedelta(0, 3600)
>>> dt2 = datetime(2006, 6, 14, 13, 0, tzinfo=gmt1)
>>> dt2.dst()
datetime.timedelta(0, 3600)
>>> dt2.utcoffset()
datetime.timedelta(0, 7200)
>>> # Convert datetime to another time zone
>>> dt3 = dt2.astimezone(GMT2())
>>> dt3 # doctest: +ELLIPSIS
datetime.datetime(2006, 6, 14, 14, 0, tzinfo=<GMT2 object at 0x...>)
>>> dt2 # doctest: +ELLIPSIS
datetime.datetime(2006, 6, 14, 13, 0, tzinfo=<GMT1 object at 0x...>)
>>> dt2.utctimetuple() == dt3.utctimetuple()
True
The smallest possible difference between non-equal time objects,
timedelta(microseconds=1), although note that arithmetic on time
objects is not supported.
The object passed as the tzinfo argument to the time constructor, or
None if none was passed.
Supported operations:
comparison of time to time, where a is considered less
than b when a precedes b in time. If one comparand is naive and the other
is aware, TypeError is raised. If both comparands are aware, and have
the same tzinfo attribute, the common tzinfo attribute is
ignored and the base times are compared. If both comparands are aware and
have different tzinfo attributes, the comparands are first adjusted by
subtracting their UTC offsets (obtained from self.utcoffset()). In order
to stop mixed-type comparisons from falling back to the default comparison by
object address, when a time object is compared to an object of a
different type, TypeError is raised unless the comparison is == or
!=. The latter cases return False or True, respectively.
hash, use as dict key
efficient pickling
in Boolean contexts, a time object is considered to be true if and
only if, after converting it to minutes and subtracting utcoffset() (or
0 if that’s None), the result is non-zero.
Return a time with the same value, except for those attributes given
new values by whichever keyword arguments are specified. Note that
tzinfo=None can be specified to create a naive time from an
aware time, without conversion of the time data.
Return a string representing the time in ISO 8601 format, HH:MM:SS.mmmmmm or, if
self.microsecond is 0, HH:MM:SS. If utcoffset() does not return None, a
6-character string is appended, giving the UTC offset in (signed) hours and
minutes: HH:MM:SS.mmmmmm+HH:MM or, if self.microsecond is 0, HH:MM:SS+HH:MM.
If tzinfo is None, returns None, else returns
self.tzinfo.utcoffset(None), and raises an exception if the latter doesn’t
return None or a timedelta object representing a whole number of
minutes with magnitude less than one day.
If tzinfo is None, returns None, else returns
self.tzinfo.dst(None), and raises an exception if the latter doesn’t return
None, or a timedelta object representing a whole number of minutes
with magnitude less than one day.
tzinfo is an abstract base class, meaning that this class should not be
instantiated directly. You need to derive a concrete subclass, and (at least)
supply implementations of the standard tzinfo methods needed by the
datetime methods you use. The datetime module supplies
a simple concrete subclass of tzinfo, timezone, which can represent
time zones with a fixed offset from UTC, such as UTC itself or North American EST and
EDT.
An instance of (a concrete subclass of) tzinfo can be passed to the
constructors for datetime and time objects. The latter objects
view their attributes as being in local time, and the tzinfo object
supports methods revealing offset of local time from UTC, the name of the time
zone, and DST offset, all relative to a date or time object passed to them.
Special requirement for pickling: A tzinfo subclass must have an
__init__() method that can be called with no arguments, else it can be
pickled but possibly not unpickled again. This is a technical requirement that
may be relaxed in the future.
A concrete subclass of tzinfo may need to implement the following
methods. Exactly which methods are needed depends on the uses made of aware
datetime objects. If in doubt, simply implement all of them.
Return offset of local time from UTC, in minutes east of UTC. If local time is
west of UTC, this should be negative. Note that this is intended to be the
total offset from UTC; for example, if a tzinfo object represents both
time zone and DST adjustments, utcoffset() should return their sum. If
the UTC offset isn’t known, return None. Else the value returned must be a
timedelta object specifying a whole number of minutes in the range
-1439 to 1439 inclusive (1440 = 24*60; the magnitude of the offset must be less
than one day). Most implementations of utcoffset() will probably look
like one of these two:
return CONSTANT # fixed-offset class
return CONSTANT + self.dst(dt) # daylight-aware class
If utcoffset() does not return None, dst() should not return
None either.
Return the daylight saving time (DST) adjustment, in minutes east of UTC, or
None if DST information isn’t known. Return timedelta(0) if DST is not
in effect. If DST is in effect, return the offset as a timedelta object
(see utcoffset() for details). Note that DST offset, if applicable, has
already been added to the UTC offset returned by utcoffset(), so there’s
no need to consult dst() unless you’re interested in obtaining DST info
separately. For example, datetime.timetuple() calls its tzinfo
attribute’s dst() method to determine how the tm_isdst flag
should be set, and tzinfo.fromutc() calls dst() to account for
DST changes when crossing time zones.
An instance tz of a tzinfo subclass that models both standard and
daylight times must be consistent in this sense:
tz.utcoffset(dt) - tz.dst(dt)
must return the same result for every datetime dt with dt.tzinfo == tz. For sane tzinfo subclasses, this expression yields the time
zone’s “standard offset”, which should not depend on the date or the time, but
only on geographic location. The implementation of datetime.astimezone()
relies on this, but cannot detect violations; it’s the programmer’s
responsibility to ensure it. If a tzinfo subclass cannot guarantee
this, it may be able to override the default implementation of
tzinfo.fromutc() to work correctly with astimezone() regardless.
Most implementations of dst() will probably look like one of these two:
def dst(self, dt):
# a fixed-offset class: doesn't account for DST
return timedelta(0)
or
def dst(self, dt):
# Code to set dston and dstoff to the time zone's DST
# transition times based on the input dt.year, and expressed
# in standard local time. Then
if dston <= dt.replace(tzinfo=None) < dstoff:
return timedelta(hours=1)
else:
return timedelta(0)
Return the time zone name corresponding to the datetime object dt, as
a string. Nothing about string names is defined by the datetime module,
and there’s no requirement that it mean anything in particular. For example,
“GMT”, “UTC”, “-500”, “-5:00”, “EDT”, “US/Eastern”, “America/New York” are all
valid replies. Return None if a string name isn’t known. Note that this is
a method rather than a fixed string primarily because some tzinfo
subclasses will wish to return different names depending on the specific value
of dt passed, especially if the tzinfo class is accounting for
daylight time.
These methods are called by a datetime or time object, in
response to their methods of the same names. A datetime object passes
itself as the argument, and a time object passes None as the
argument. A tzinfo subclass’s methods should therefore be prepared to
accept a dt argument of None, or of class datetime.
When None is passed, it’s up to the class designer to decide the best
response. For example, returning None is appropriate if the class wishes to
say that time objects don’t participate in the tzinfo protocols. It
may be more useful for utcoffset(None) to return the standard UTC offset, as
there is no other convention for discovering the standard offset.
When a datetime object is passed in response to a datetime
method, dt.tzinfo is the same object as self. tzinfo methods can
rely on this, unless user code calls tzinfo methods directly. The
intent is that the tzinfo methods interpret dt as being in local
time, without needing to worry about objects in other time zones.
There is one more tzinfo method that a subclass may wish to override:
This is called from the default datetime.astimezone()
implementation. When called from that, dt.tzinfo is self, and dt’s
date and time data are to be viewed as expressing a UTC time. The purpose
of fromutc() is to adjust the date and time data, returning an
equivalent datetime in self’s local time.
Most tzinfo subclasses should be able to inherit the default
fromutc() implementation without problems. It’s strong enough to handle
fixed-offset time zones, and time zones accounting for both standard and
daylight time, and the latter even if the DST transition times differ in
different years. An example of a time zone the default fromutc()
implementation may not handle correctly in all cases is one where the standard
offset (from UTC) depends on the specific date and time passed, which can happen
for political reasons. The default implementations of astimezone() and
fromutc() may not produce the result you want if the result is one of the
hours straddling the moment the standard offset changes.
Skipping code for error cases, the default fromutc() implementation acts
like:
def fromutc(self, dt):
    # raise ValueError if dt.tzinfo is not self
dtoff = dt.utcoffset()
dtdst = dt.dst()
# raise ValueError if dtoff is None or dtdst is None
delta = dtoff - dtdst # this is self's standard offset
if delta:
dt += delta # convert to standard local time
dtdst = dt.dst()
# raise ValueError if dtdst is None
if dtdst:
return dt + dtdst
else:
return dt
from datetime import tzinfo, timedelta, datetime
ZERO = timedelta(0)
HOUR = timedelta(hours=1)
# A UTC class.
class UTC(tzinfo):
"""UTC"""
def utcoffset(self, dt):
return ZERO
def tzname(self, dt):
return "UTC"
def dst(self, dt):
return ZERO
utc = UTC()
# A class building tzinfo objects for fixed-offset time zones.
# Note that FixedOffset(0, "UTC") is a different way to build a
# UTC tzinfo object.
class FixedOffset(tzinfo):
"""Fixed offset in minutes east from UTC."""
def __init__(self, offset, name):
self.__offset = timedelta(minutes=offset)
self.__name = name
def utcoffset(self, dt):
return self.__offset
def tzname(self, dt):
return self.__name
def dst(self, dt):
return ZERO
# A class capturing the platform's idea of local time.
import time as _time
STDOFFSET = timedelta(seconds=-_time.timezone)
if _time.daylight:
    DSTOFFSET = timedelta(seconds=-_time.altzone)
else:
DSTOFFSET = STDOFFSET
DSTDIFF = DSTOFFSET - STDOFFSET
class LocalTimezone(tzinfo):
def utcoffset(self, dt):
if self._isdst(dt):
return DSTOFFSET
else:
return STDOFFSET
def dst(self, dt):
if self._isdst(dt):
return DSTDIFF
else:
return ZERO
def tzname(self, dt):
return _time.tzname[self._isdst(dt)]
def _isdst(self, dt):
tt = (dt.year, dt.month, dt.day,
dt.hour, dt.minute, dt.second,
dt.weekday(), 0, 0)
stamp = _time.mktime(tt)
tt = _time.localtime(stamp)
return tt.tm_isdst > 0
Local = LocalTimezone()
# A complete implementation of current DST rules for major US time zones.
def first_sunday_on_or_after(dt):
days_to_go = 6 - dt.weekday()
if days_to_go:
dt += timedelta(days_to_go)
return dt
# US DST Rules
#
# This is a simplified (i.e., wrong for a few cases) set of rules for US
# DST start and end times. For a complete and up-to-date set of DST rules
# and timezone definitions, visit the Olson Database (or try pytz):
# http://www.twinsun.com/tz/tz-link.htm
# http://sourceforge.net/projects/pytz/ (might not be up-to-date)
#
# In the US, since 2007, DST starts at 2am (standard time) on the second
# Sunday in March, which is the first Sunday on or after Mar 8.
DSTSTART_2007 = datetime(1, 3, 8, 2)
# and ends at 2am (DST time; 1am standard time) on the first Sunday of Nov.
DSTEND_2007 = datetime(1, 11, 1, 1)
# From 1987 to 2006, DST used to start at 2am (standard time) on the first
# Sunday in April and to end at 2am (DST time; 1am standard time) on the last
# Sunday of October, which is the first Sunday on or after Oct 25.
DSTSTART_1987_2006 = datetime(1, 4, 1, 2)
DSTEND_1987_2006 = datetime(1, 10, 25, 1)
# From 1967 to 1986, DST used to start at 2am (standard time) on the last
# Sunday in April (the one on or after April 24) and to end at 2am (DST time;
# 1am standard time) on the last Sunday of October, which is the first Sunday
# on or after Oct 25.
DSTSTART_1967_1986 = datetime(1, 4, 24, 2)
DSTEND_1967_1986 = DSTEND_1987_2006
class USTimeZone(tzinfo):
def __init__(self, hours, reprname, stdname, dstname):
self.stdoffset = timedelta(hours=hours)
self.reprname = reprname
self.stdname = stdname
self.dstname = dstname
def __repr__(self):
return self.reprname
def tzname(self, dt):
if self.dst(dt):
return self.dstname
else:
return self.stdname
def utcoffset(self, dt):
return self.stdoffset + self.dst(dt)
def dst(self, dt):
if dt is None or dt.tzinfo is None:
# An exception may be sensible here, in one or both cases.
# It depends on how you want to treat them. The default
# fromutc() implementation (called by the default astimezone()
# implementation) passes a datetime with dt.tzinfo is self.
return ZERO
assert dt.tzinfo is self
# Find start and end times for US DST. For years before 1967, return
# ZERO for no DST.
if 2006 < dt.year:
dststart, dstend = DSTSTART_2007, DSTEND_2007
elif 1986 < dt.year < 2007:
dststart, dstend = DSTSTART_1987_2006, DSTEND_1987_2006
elif 1966 < dt.year < 1987:
dststart, dstend = DSTSTART_1967_1986, DSTEND_1967_1986
else:
return ZERO
start = first_sunday_on_or_after(dststart.replace(year=dt.year))
end = first_sunday_on_or_after(dstend.replace(year=dt.year))
# Can't compare naive to aware objects, so strip the timezone from
# dt first.
if start <= dt.replace(tzinfo=None) < end:
return HOUR
else:
return ZERO
Eastern = USTimeZone(-5, "Eastern", "EST", "EDT")
Central = USTimeZone(-6, "Central", "CST", "CDT")
Mountain = USTimeZone(-7, "Mountain", "MST", "MDT")
Pacific = USTimeZone(-8, "Pacific", "PST", "PDT")
Note that there are unavoidable subtleties twice per year in a tzinfo
subclass accounting for both standard and daylight time, at the DST transition
points. For concreteness, consider US Eastern (UTC -0500), where EDT begins the
minute after 1:59 (EST) on the second Sunday in March, and ends the minute after
1:59 (EDT) on the first Sunday in November:
When DST starts (the “start” line), the local wall clock leaps from 1:59 to
3:00. A wall time of the form 2:MM doesn’t really make sense on that day, so
astimezone(Eastern) won’t deliver a result with hour==2 on the day DST
begins. In order for astimezone() to make this guarantee, the
tzinfo.dst() method must consider times in the “missing hour” (2:MM for
Eastern) to be in daylight time.
When DST ends (the “end” line), there’s a potentially worse problem: there’s an
hour that can’t be spelled unambiguously in local wall time: the last hour of
daylight time. In Eastern, that’s times of the form 5:MM UTC on the day
daylight time ends. The local wall clock leaps from 1:59 (daylight time) back
to 1:00 (standard time) again. Local times of the form 1:MM are ambiguous.
astimezone() mimics the local clock’s behavior by mapping two adjacent UTC
hours into the same local hour then. In the Eastern example, UTC times of the
form 5:MM and 6:MM both map to 1:MM when converted to Eastern. In order for
astimezone() to make this guarantee, the tzinfo.dst() method must
consider times in the “repeated hour” to be in standard time. This is easily
arranged, as in the example, by expressing DST switch times in the time zone’s
standard local time.
Applications that can’t bear such ambiguities should avoid using hybrid
tzinfo subclasses; there are no ambiguities when using timezone,
or any other fixed-offset tzinfo subclass (such as a class representing
only EST (fixed offset -5 hours), or only EDT (fixed offset -4 hours)).
A timezone object represents a timezone that is defined by a
fixed offset from UTC. Note that objects of this class cannot be used
to represent timezone information in locations where different
offsets are used on different days of the year or where historical
changes have been made to civil time.
class datetime.timezone(offset[, name])
The offset argument must be specified as a timedelta
object representing the difference between the local time and UTC. It must
be strictly between -timedelta(hours=24) and
timedelta(hours=24) and represent a whole number of minutes,
otherwise ValueError is raised.
The name argument is optional. If specified it must be a string that
is used as the value returned by the tzname(dt) method. Otherwise,
tzname(dt) returns a string ‘UTCsHH:MM’, where s is the sign of
offset, HH and MM are two digits of offset.hours and
offset.minutes respectively.
Return the fixed value specified when the timezone instance is
constructed. The dt argument is ignored. The return value is a
timedelta instance equal to the difference between the
local time and UTC.
Return the fixed value specified when the timezone instance is
constructed or a string ‘UTCsHH:MM’, where s is the sign of
offset, HH and MM are two digits of offset.hours and
offset.minutes respectively.
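For example (the offset and name below are arbitrary):

>>> from datetime import timezone, timedelta
>>> est = timezone(timedelta(hours=-5), 'EST')
>>> est.tzname(None)
'EST'
>>> timezone(timedelta(hours=-5)).tzname(None)   # generated name
'UTC-05:00'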
date, datetime, and time objects all support a
strftime(format) method, to create a string representing the time under the
control of an explicit format string. Broadly speaking, d.strftime(fmt)
acts like the time module’s time.strftime(fmt, d.timetuple())
although not all objects support a timetuple() method.
Conversely, the datetime.strptime() class method creates a
datetime object from a string representing a date and time and a
corresponding format string. datetime.strptime(date_string, format) is
equivalent to datetime(*(time.strptime(date_string, format)[0:6])).
For time objects, the format codes for year, month, and day should not
be used, as time objects have no such values. If they’re used anyway, 1900
is substituted for the year, and 1 for the month and day.
For date objects, the format codes for hours, minutes, seconds, and
microseconds should not be used, as date objects have no such
values. If they’re used anyway, 0 is substituted for them.
For a naive object, the %z and %Z format codes are replaced by empty
strings.
For an aware object:
%z
utcoffset() is transformed into a 5-character string of the form +HHMM or
-HHMM, where HH is a 2-digit string giving the number of UTC offset hours, and
MM is a 2-digit string giving the number of UTC offset minutes. For example, if
utcoffset() returns timedelta(hours=-3, minutes=-30), %z is
replaced with the string '-0330'.
%Z
If tzname() returns None, %Z is replaced by an empty string.
Otherwise %Z is replaced by the returned value, which must be a string.
The full set of format codes supported varies across platforms, because Python
calls the platform C library’s strftime() function, and platform
variations are common.
The following is a list of all the format codes that the C standard (1989
version) requires, and these work on all platforms with a standard C
implementation. Note that the 1999 version of the C standard added additional
format codes.
Directive  Meaning                                                Notes
%a         Locale’s abbreviated weekday name.
%A         Locale’s full weekday name.
%b         Locale’s abbreviated month name.
%B         Locale’s full month name.
%c         Locale’s appropriate date and time representation.
%d         Day of the month as a decimal number [01,31].
%f         Microsecond as a decimal number [0,999999],            (1)
           zero-padded on the left.
%H         Hour (24-hour clock) as a decimal number [00,23].
%I         Hour (12-hour clock) as a decimal number [01,12].
%j         Day of the year as a decimal number [001,366].
%m         Month as a decimal number [01,12].
%M         Minute as a decimal number [00,59].
%p         Locale’s equivalent of either AM or PM.                (2)
%S         Second as a decimal number [00,59].                    (3)
%U         Week number of the year (Sunday as the first day of    (4)
           the week) as a decimal number [00,53]. All days in a
           new year preceding the first Sunday are considered
           to be in week 0.
%w         Weekday as a decimal number [0(Sunday),6].
%W         Week number of the year (Monday as the first day of    (4)
           the week) as a decimal number [00,53]. All days in a
           new year preceding the first Monday are considered
           to be in week 0.
%x         Locale’s appropriate date representation.
%X         Locale’s appropriate time representation.
%y         Year without century as a decimal number [00,99].
%Y         Year with century as a decimal number [0001,9999]      (5)
           (strptime), [1000,9999] (strftime).
%Z         Time zone name (empty string if the object is naive).
%z         UTC offset in the form +HHMM or -HHMM (empty string    (6)
           if the object is naive).
%%         A literal '%' character.
Notes:
(1) When used with the strptime() method, the %f directive
accepts from one to six digits and zero pads on the right. %f is
an extension to the set of format characters in the C standard (but
implemented separately in datetime objects, and therefore always
available).
(2) When used with the strptime() method, the %p directive only affects
the output hour field if the %I directive is used to parse the hour.
(3) Unlike the time module, the datetime module does not support
leap seconds.
(4) When used with the strptime() method, %U and %W are only used in
calculations when the day of the week and the year are specified.
(5) For technical reasons, the strftime() method does not support
dates before year 1000: t.strftime(format) will raise a
ValueError when t.year < 1000 even if format does
not contain a %Y directive. The strptime() method can
parse years in the full [1, 9999] range, but years < 1000 must be
zero-filled to 4-digit width.
Changed in version 3.2: In previous versions, the strftime() method was restricted to
years >= 1900.
(6) For example, if utcoffset() returns timedelta(hours=-3, minutes=-30),
%z is replaced with the string '-0330'.
Changed in version 3.2: When the %z directive is provided to the strptime() method, an
aware datetime object will be produced. The tzinfo of the
result will be set to a timezone instance.
This module allows you to output calendars like the Unix cal program,
and provides additional useful functions related to the calendar. By default,
these calendars have Monday as the first day of the week, and Sunday as the last
(the European convention). Use setfirstweekday() to set the first day of
the week to Sunday (6) or to any other weekday. Parameters that specify dates
are given as integers. For related
functionality, see also the datetime and time modules.
Most of these functions and classes rely on the datetime module which
uses an idealized calendar, the current Gregorian calendar extended
in both directions. This matches the definition of the “proleptic Gregorian”
calendar in Dershowitz and Reingold’s book “Calendrical Calculations”, where
it’s the base calendar for all computations.
Creates a Calendar object. firstweekday is an integer specifying the
first day of the week. 0 is Monday (the default), 6 is Sunday.
A Calendar object provides several methods that can be used for
preparing the calendar data for formatting. This class doesn’t do any formatting
itself. This is the job of subclasses.
Return an iterator for the week day numbers that will be used for one
week. The first value from the iterator will be the same as the value of
the firstweekday property.
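For example, with a Sunday start:

>>> import calendar
>>> c = calendar.Calendar(firstweekday=calendar.SUNDAY)
>>> list(c.iterweekdays())
[6, 0, 1, 2, 3, 4, 5]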
Return an iterator for the month month (1-12) in the year year. This
iterator will return all days (as datetime.date objects) for the
month and all days before the start of the month or after the end of the
month that are required to get a complete week.
Return an iterator for the month month in the year year similar to
itermonthdates(). Days returned will be tuples consisting of a day
number and a week day number.
Return the data for the specified year ready for formatting. The return
value is a list of month rows. Each month row contains up to width
months (defaulting to 3). Each month contains between 4 and 6 weeks and
each week contains 1–7 days. Days are datetime.date objects.
Return the data for the specified year ready for formatting (similar to
yeardatescalendar()). Entries in the week lists are tuples of day
numbers and weekday numbers. Day numbers outside this month are zero.
Return the data for the specified year ready for formatting (similar to
yeardatescalendar()). Entries in the week lists are day numbers. Day
numbers outside this month are zero.
Return a month’s calendar in a multi-line string. If w is provided, it
specifies the width of the date columns, which are centered. If l is
given, it specifies the number of lines that each week will use. Depends
on the first weekday as specified in the constructor or set by the
setfirstweekday() method.
Return an m-column calendar for an entire year as a multi-line string.
Optional parameters w, l, and c are for date column width, lines per
week, and number of spaces between month columns, respectively. Depends on
the first weekday as specified in the constructor or set by the
setfirstweekday() method. The earliest year for which a calendar
can be generated is platform-dependent.
Return a year’s calendar as a complete HTML page. width (defaulting to
3) specifies the number of months per row. css is the name for the
cascading style sheet to be used. None can be passed if no style
sheet should be used. encoding specifies the encoding to be used for the
output (defaulting to the system default encoding).
class calendar.LocaleTextCalendar(firstweekday=0, locale=None)
This subclass of TextCalendar can be passed a locale name in the
constructor and will return month and weekday names in the specified locale.
If this locale includes an encoding all strings containing month and weekday
names will be returned as unicode.
class calendar.LocaleHTMLCalendar(firstweekday=0, locale=None)
This subclass of HTMLCalendar can be passed a locale name in the
constructor and will return month and weekday names in the specified
locale. If this locale includes an encoding all strings containing month and
weekday names will be returned as unicode.
Note
The formatweekday() and formatmonthname() methods of these two
classes temporarily change the current locale to the given locale. Because
the current locale is a process-wide setting, they are not thread-safe.
For simple text calendars this module provides the following functions.
Sets the weekday (0 is Monday, 6 is Sunday) to start each week. The
values MONDAY, TUESDAY, WEDNESDAY, THURSDAY,
FRIDAY, SATURDAY, and SUNDAY are provided for
convenience. For example, to set the first weekday to Sunday:
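import calendar
calendar.setfirstweekday(calendar.SUNDAY)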
Returns a matrix representing a month’s calendar. Each row represents a week;
days outside of the month are represented by zeros. Each week begins with Monday
unless set by setfirstweekday().
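For example (August 2011, assuming the default Monday start):

>>> import calendar
>>> calendar.monthcalendar(2011, 8)
[[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14], [15, 16, 17, 18, 19, 20, 21], [22, 23, 24, 25, 26, 27, 28], [29, 30, 31, 0, 0, 0, 0]]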
An unrelated but handy function that takes a time tuple such as returned by the
gmtime() function in the time module, and returns the corresponding
Unix timestamp value, assuming an epoch of 1970, and the POSIX encoding. In
fact, time.gmtime() and timegm() are each others’ inverse.
The calendar module exports the following data attributes:
An array that represents the months of the year in the current locale. This
follows normal convention of January being month number 1, so it has a length of
13 and month_name[0] is the empty string.
An array that represents the abbreviated months of the year in the current
locale. This follows normal convention of January being month number 1, so it
has a length of 13 and month_abbr[0] is the empty string.
This module implements specialized container datatypes providing alternatives to
Python’s general purpose built-in containers, dict, list,
set, and tuple.
namedtuple()   factory function for creating tuple subclasses with named fields
deque          list-like container with fast appends and pops on either end
Counter        dict subclass for counting hashable objects
OrderedDict    dict subclass that remembers the order entries were added
defaultdict    dict subclass that calls a factory function to supply missing values
UserDict       wrapper around dictionary objects for easier dict subclassing
UserList       wrapper around list objects for easier list subclassing
UserString     wrapper around string objects for easier string subclassing
In addition to the concrete container classes, the collections module provides
abstract base classes that can be
used to test whether a class provides a particular interface, for example,
whether it is hashable or a mapping.
A counter tool is provided to support convenient and rapid tallies.
For example:
>>> # Tally occurrences of words in a list
>>> cnt = Counter()
>>> for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
... cnt[word] += 1
>>> cnt
Counter({'blue': 3, 'red': 2, 'green': 1})
>>> # Find the ten most common words in Hamlet
>>> import re
>>> words = re.findall(r'\w+', open('hamlet.txt').read().lower())
>>> Counter(words).most_common(10)
[('the', 1143), ('and', 966), ('to', 762), ('of', 669), ('i', 631),
('you', 554), ('a', 546), ('my', 514), ('hamlet', 471), ('in', 451)]
A Counter is a dict subclass for counting hashable objects.
It is an unordered collection where elements are stored as dictionary keys
and their counts are stored as dictionary values. Counts are allowed to be
any integer value including zero or negative counts. The Counter
class is similar to bags or multisets in other languages.
Elements are counted from an iterable or initialized from another
mapping (or counter):
>>> c = Counter() # a new, empty counter
>>> c = Counter('gallahad') # a new counter from an iterable
>>> c = Counter({'red': 4, 'blue': 2}) # a new counter from a mapping
>>> c = Counter(cats=4, dogs=8) # a new counter from keyword args
Counter objects have a dictionary interface except that they return a zero
count for missing items instead of raising a KeyError:
>>> c = Counter(['eggs', 'ham'])
>>> c['bacon'] # count of a missing element is zero
0
Setting a count to zero does not remove an element from a counter.
Use del to remove it entirely:
>>> c['sausage'] = 0 # counter entry with a zero count
>>> del c['sausage'] # del actually removes the entry
New in version 3.1.
Counter objects support three methods beyond those available for all
dictionaries:
Return an iterator over elements repeating each as many times as its
count. Elements are returned in arbitrary order. If an element’s count
is less than one, elements() will ignore it.
Return a list of the n most common elements and their counts from the
most common to the least. If n is not specified, most_common()
returns all elements in the counter. Elements with equal counts are
ordered arbitrarily:
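>>> Counter('abracadabra').most_common(3)
[('a', 5), ('r', 2), ('b', 2)]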
Elements are subtracted from an iterable or from another mapping
(or counter). Like dict.update() but subtracts counts instead
of replacing them. Both inputs and outputs may be zero or negative.
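For example, subtracting one counter from another in place:

>>> c = Counter(a=4, b=2, c=0, d=-2)
>>> d = Counter(a=1, b=2, c=3, d=4)
>>> c.subtract(d)
>>> c
Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6})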
Elements are counted from an iterable or added-in from another
mapping (or counter). Like dict.update() but adds counts
instead of replacing them. Also, the iterable is expected to be a
sequence of elements, not a sequence of (key,value) pairs.
sum(c.values()) # total of all counts
c.clear() # reset all counts
list(c) # list unique elements
set(c) # convert to a set
dict(c) # convert to a regular dictionary
c.items() # convert to a list of (elem, cnt) pairs
Counter(dict(list_of_pairs)) # convert from a list of (elem, cnt) pairs
c.most_common()[:-n:-1] # n least common elements
c += Counter() # remove zero and negative counts
Several mathematical operations are provided for combining Counter
objects to produce multisets (counters that have counts greater than zero).
Addition and subtraction combine counters by adding or subtracting the counts
of corresponding elements. Intersection and union return the minimum and
maximum of corresponding counts. Each operation can accept inputs with signed
counts, but the output will exclude results with counts of zero or less.
>>> c = Counter(a=3, b=1)
>>> d = Counter(a=1, b=2)
>>> c + d # add two counters together: c[x] + d[x]
Counter({'a': 4, 'b': 3})
>>> c - d # subtract (keeping only positive counts)
Counter({'a': 2})
>>> c & d # intersection: min(c[x], d[x])
Counter({'a': 1, 'b': 1})
>>> c | d # union: max(c[x], d[x])
Counter({'a': 3, 'b': 2})
Note
Counters were primarily designed to work with positive integers to represent
running counts; however, care was taken to not unnecessarily preclude use
cases needing other types or negative values. To help with those use cases,
this section documents the minimum range and type restrictions.
The Counter class itself is a dictionary subclass with no
restrictions on its keys and values. The values are intended to be numbers
representing counts, but you could store anything in the value field.
The most_common() method requires only that the values be orderable.
For in-place operations such as c[key]+=1, the value type need only
support addition and subtraction. So fractions, floats, and decimals would
work and negative values are supported. The same is also true for
update() and subtract() which allow negative and zero values
for both inputs and outputs.
The multiset methods are designed only for use cases with positive values.
The inputs may be negative or zero, but only outputs with positive values
are created. There are no type restrictions, but the value type needs to
support addition, subtraction, and comparison.
The elements() method requires integer counts. It ignores zero and
negative counts.
For mathematical operations on multisets and their use cases, see
Knuth, Donald. The Art of Computer Programming Volume II,
Section 4.6.3, Exercise 19.
Returns a new deque object initialized left-to-right (using append()) with
data from iterable. If iterable is not specified, the new deque is empty.
Deques are a generalization of stacks and queues (the name is pronounced “deck”
and is short for “double-ended queue”). Deques support thread-safe, memory
efficient appends and pops from either side of the deque with approximately the
same O(1) performance in either direction.
Though list objects support similar operations, they are optimized for
fast fixed-length operations and incur O(n) memory movement costs for
pop(0) and insert(0,v) operations which change both the size and
position of the underlying data representation.
If maxlen is not specified or is None, deques may grow to an
arbitrary length. Otherwise, the deque is bounded to the specified maximum
length. Once a bounded length deque is full, when new items are added, a
corresponding number of items are discarded from the opposite end. Bounded
length deques provide functionality similar to the tail filter in
Unix. They are also useful for tracking transactions and other pools of data
where only the most recent activity is of interest.
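For example, a bounded deque silently discards the oldest entries as new ones arrive:

>>> from collections import deque
>>> d = deque(maxlen=3)
>>> for i in range(5):
...     d.append(i)
>>> d
deque([2, 3, 4], maxlen=3)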
Extend the left side of the deque by appending elements from iterable.
Note, the series of left appends results in reversing the order of
elements in the iterable argument.
In addition to the above, deques support iteration, pickling, len(d),
reversed(d), copy.copy(d), copy.deepcopy(d), membership testing with
the in operator, and subscript references such as d[-1]. Indexed
access is O(1) at both ends but slows to O(n) in the middle. For fast random
access, use lists instead.
Example:
>>> from collections import deque
>>> d = deque('ghi') # make a new deque with three items
>>> for elem in d: # iterate over the deque's elements
... print(elem.upper())
G
H
I
>>> d.append('j') # add a new entry to the right side
>>> d.appendleft('f') # add a new entry to the left side
>>> d # show the representation of the deque
deque(['f', 'g', 'h', 'i', 'j'])
>>> d.pop() # return and remove the rightmost item
'j'
>>> d.popleft() # return and remove the leftmost item
'f'
>>> list(d) # list the contents of the deque
['g', 'h', 'i']
>>> d[0] # peek at leftmost item
'g'
>>> d[-1] # peek at rightmost item
'i'
>>> list(reversed(d)) # list the contents of a deque in reverse
['i', 'h', 'g']
>>> 'h' in d # search the deque
True
>>> d.extend('jkl') # add multiple elements at once
>>> d
deque(['g', 'h', 'i', 'j', 'k', 'l'])
>>> d.rotate(1) # right rotation
>>> d
deque(['l', 'g', 'h', 'i', 'j', 'k'])
>>> d.rotate(-1) # left rotation
>>> d
deque(['g', 'h', 'i', 'j', 'k', 'l'])
>>> deque(reversed(d)) # make a new deque in reverse order
deque(['l', 'k', 'j', 'i', 'h', 'g'])
>>> d.clear() # empty the deque
>>> d.pop() # cannot pop from an empty deque
Traceback (most recent call last):
File "<pyshell#6>", line 1, in -toplevel-
d.pop()
IndexError: pop from an empty deque
>>> d.extendleft('abc') # extendleft() reverses the input order
>>> d
deque(['c', 'b', 'a'])
This section shows various approaches to working with deques.
Bounded length deques provide functionality similar to the tail filter
in Unix:
from collections import deque

def tail(filename, n=10):
'Return the last n lines of a file'
return deque(open(filename), n)
Another approach to using deques is to maintain a sequence of recently
added elements by appending to the right and popping to the left:
from collections import deque
import itertools

def moving_average(iterable, n=3):
# moving_average([40, 30, 50, 46, 39, 44]) --> 40.0 42.0 45.0 43.0
# http://en.wikipedia.org/wiki/Moving_average
it = iter(iterable)
d = deque(itertools.islice(it, n-1))
d.appendleft(0)
s = sum(d)
for elem in it:
s += elem - d.popleft()
d.append(elem)
yield s / n
The rotate() method provides a way to implement deque slicing and
deletion. For example, a pure Python implementation of del d[n] relies on
the rotate() method to position elements to be popped:
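def delete_nth(d, n):
    d.rotate(-n)    # bring the nth element to the left end
    d.popleft()     # remove it
    d.rotate(n)     # restore the original ordering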
To implement deque slicing, use a similar approach applying
rotate() to bring a target element to the left side of the deque. Remove
old entries with popleft(), add new entries with extend(), and then
reverse the rotation.
With minor variations on that approach, it is easy to implement Forth style
stack manipulations such as dup, drop, swap, over, pick,
rot, and roll.
class collections.defaultdict([default_factory[, ...]])
Returns a new dictionary-like object. defaultdict is a subclass of the
built-in dict class. It overrides one method and adds one writable
instance variable. The remaining functionality is the same as for the
dict class and is not documented here.
The first argument provides the initial value for the default_factory
attribute; it defaults to None. All remaining arguments are treated the same
as if they were passed to the dict constructor, including keyword
arguments.
defaultdict objects support the following method in addition to the
standard dict operations:
__missing__(key)
If the default_factory attribute is None, this raises a
KeyError exception with the key as argument.
If default_factory is not None, it is called without arguments
to provide a default value for the given key, this value is inserted in
the dictionary for the key, and returned.
If calling default_factory raises an exception this exception is
propagated unchanged.
This method is called by the __getitem__() method of the
dict class when the requested key is not found; whatever it
returns or raises is then returned or raised by __getitem__().
defaultdict objects support the following instance variable:
default_factory
This attribute is used by the __missing__() method; it is initialized
from the first argument to the constructor, if present, or to None.
Using list as the default_factory, it is easy to group a
sequence of key-value pairs into a dictionary of lists:
>>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
>>> d = defaultdict(list)
>>> for k, v in s:
... d[k].append(v)
...
>>> list(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
When each key is encountered for the first time, it is not already in the
mapping; so an entry is automatically created using the default_factory
function which returns an empty list. The list.append()
operation then attaches the value to the new list. When keys are encountered
again, the look-up proceeds normally (returning the list for that key) and the
list.append() operation adds another value to the list. This technique is
simpler and faster than an equivalent technique using dict.setdefault():
>>> d = {}
>>> for k, v in s:
... d.setdefault(k, []).append(v)
...
>>> list(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
Setting the default_factory to int makes the
defaultdict useful for counting (like a bag or multiset in other
languages):
>>> s = 'mississippi'
>>> d = defaultdict(int)
>>> for k in s:
... d[k] += 1
...
>>> list(d.items())
[('i', 4), ('p', 2), ('s', 4), ('m', 1)]
When a letter is first encountered, it is missing from the mapping, so the
default_factory function calls int() to supply a default count of
zero. The increment operation then builds up the count for each letter.
The function int() which always returns zero is just a special case of
constant functions. A faster and more flexible way to create constant functions
is to use a lambda function which can supply any constant value (not just
zero):
>>> def constant_factory(value):
... return lambda: value
>>> d = defaultdict(constant_factory('<missing>'))
>>> d.update(name='John', action='ran')
>>> '%(name)s %(action)s to %(object)s' % d
'John ran to <missing>'
Setting the default_factory to set makes the
defaultdict useful for building a dictionary of sets:
>>> s = [('red', 1), ('blue', 2), ('red', 3), ('blue', 4), ('red', 1), ('blue', 4)]
>>> d = defaultdict(set)
>>> for k, v in s:
... d[k].add(v)
...
>>> list(d.items())
[('blue', {2, 4}), ('red', {1, 3})]
namedtuple() Factory Function for Tuples with Named Fields
Named tuples assign meaning to each position in a tuple and allow for more readable,
self-documenting code. They can be used wherever regular tuples are used, and
they add the ability to access fields by name instead of position index.
Returns a new tuple subclass named typename. The new subclass is used to
create tuple-like objects that have fields accessible by attribute lookup as
well as being indexable and iterable. Instances of the subclass also have a
helpful docstring (with typename and field_names) and a helpful __repr__()
method which lists the tuple contents in a name=value format.
The field_names are a single string with each fieldname separated by whitespace
and/or commas, for example 'x y' or 'x, y'. Alternatively, field_names
can be a sequence of strings such as ['x', 'y'].
Any valid Python identifier may be used for a fieldname except for names
starting with an underscore. Valid identifiers consist of letters, digits,
and underscores but do not start with a digit or underscore and cannot be
a keyword such as class, for, return, global, pass,
or raise.
If rename is true, invalid fieldnames are automatically replaced
with positional names. For example, ['abc', 'def', 'ghi', 'abc'] is
converted to ['abc', '_1', 'ghi', '_3'], eliminating the keyword
def and the duplicate fieldname abc.
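A quick check of that behavior:

>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['abc', 'def', 'ghi', 'abc'], rename=True)
>>> Point._fields
('abc', '_1', 'ghi', '_3')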
If verbose is true, the class definition is printed just before being built.
Named tuple instances do not have per-instance dictionaries, so they are
lightweight and require no more memory than regular tuples.
Changed in version 3.1:
Changed in version 3.1: Added support for rename.
>>> # Basic example
>>> Point = namedtuple('Point', ['x', 'y'])
>>> p = Point(x=10, y=11)
>>> # Example using the verbose option to print the class definition
>>> Point = namedtuple('Point', 'x y', verbose=True)
class Point(tuple):
'Point(x, y)'
__slots__ = ()
_fields = ('x', 'y')
def __new__(_cls, x, y):
'Create a new instance of Point(x, y)'
return _tuple.__new__(_cls, (x, y))
@classmethod
def _make(cls, iterable, new=tuple.__new__, len=len):
'Make a new Point object from a sequence or iterable'
result = new(cls, iterable)
if len(result) != 2:
raise TypeError('Expected 2 arguments, got %d' % len(result))
return result
def __repr__(self):
'Return a nicely formatted representation string'
return self.__class__.__name__ + '(x=%r, y=%r)' % self
def _asdict(self):
'Return a new OrderedDict which maps field names to their values'
return OrderedDict(zip(self._fields, self))
__dict__ = property(_asdict)
def _replace(_self, **kwds):
'Return a new Point object replacing specified fields with new values'
result = _self._make(map(kwds.pop, ('x', 'y'), _self))
if kwds:
raise ValueError('Got unexpected field names: %r' % list(kwds.keys()))
return result
def __getnewargs__(self):
'Return self as a plain tuple. Used by copy and pickle.'
return tuple(self)
x = _property(_itemgetter(0), doc='Alias for field number 0')
y = _property(_itemgetter(1), doc='Alias for field number 1')
>>> p = Point(11, y=22) # instantiate with positional or keyword arguments
>>> p[0] + p[1] # indexable like the plain tuple (11, 22)
33
>>> x, y = p # unpack like a regular tuple
>>> x, y
(11, 22)
>>> p.x + p.y # fields also accessible by name
33
>>> p # readable __repr__ with a name=value style
Point(x=11, y=22)
Named tuples are especially useful for assigning field names to result tuples returned
by the csv or sqlite3 modules:
EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title, department, paygrade')

import csv
for emp in map(EmployeeRecord._make, csv.reader(open("employees.csv"))):
    print(emp.name, emp.title)

import sqlite3
conn = sqlite3.connect('/companydata')
cursor = conn.cursor()
cursor.execute('SELECT name, age, title, department, paygrade FROM employees')
for emp in map(EmployeeRecord._make, cursor.fetchall()):
    print(emp.name, emp.title)
In addition to the methods inherited from tuples, named tuples support
three additional methods and one attribute. To prevent conflicts with
field names, the method and attribute names start with an underscore.
Return a new instance of the named tuple replacing specified fields with new
values:
>>> p = Point(x=11, y=22)
>>> p._replace(x=33)
Point(x=33, y=22)
>>> for partnum, record in inventory.items():
...     inventory[partnum] = record._replace(price=newprices[partnum], timestamp=time.time())
Since a named tuple is a regular Python class, it is easy to add or change
functionality with a subclass. Here is how to add a calculated field and
a fixed-width print format:
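A sketch of such a subclass, reconstructed to match the output shown below (the hypot property and the __str__ format string are the assumptions here):

class Point(namedtuple('Point', 'x y')):
    __slots__ = ()

    @property
    def hypot(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5

    def __str__(self):
        return 'Point: x=%6.3f y=%6.3f hypot=%6.3f' % (self.x, self.y, self.hypot)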
>>> for p in Point(3, 4), Point(14, 5/7):
...     print(p)
Point: x= 3.000 y= 4.000 hypot= 5.000
Point: x=14.000 y= 0.714 hypot=14.018
The subclass shown above sets __slots__ to an empty tuple. This helps
keep memory requirements low by preventing the creation of instance dictionaries.
Subclassing is not useful for adding new, stored fields. Instead, simply
create a new named tuple type from the _fields attribute:
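For example, a sketch that extends the Point type above with a z field:
>>> Point3D = namedtuple('Point3D', Point._fields + ('z',))
>>> Point3D(1, 2, 3)
Point3D(x=1, y=2, z=3)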
Ordered dictionaries are just like regular dictionaries but they remember the
order that items were inserted. When iterating over an ordered dictionary,
the items are returned in the order their keys were first added.
Return an instance of a dict subclass, supporting the usual dict
methods. An OrderedDict is a dict that remembers the order that keys
were first inserted. If a new entry overwrites an existing entry, the
original insertion position is left unchanged. Deleting an entry and
reinserting it will move it to the end.
The popitem() method for ordered dictionaries returns and removes a
(key, value) pair. The pairs are returned in LIFO order if last is true
or FIFO order if false.
Move an existing key to either end of an ordered dictionary. The item
is moved to the right end if last is true (the default) or to the
beginning if last is false. Raises KeyError if the key does
not exist:
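A quick sketch of the behavior just described:
>>> d = OrderedDict.fromkeys('abcde')
>>> d.move_to_end('b')
>>> ''.join(d.keys())
'acdeb'
>>> d.move_to_end('b', last=False)
>>> ''.join(d.keys())
'bacde'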
In addition to the usual mapping methods, ordered dictionaries also support
reverse iteration using reversed().
Equality tests between OrderedDict objects are order-sensitive
and are implemented as list(od1.items())==list(od2.items()).
Equality tests between OrderedDict objects and other
Mapping objects are order-insensitive like regular dictionaries.
This allows OrderedDict objects to be substituted anywhere a
regular dictionary is used.
The OrderedDict constructor and update() method both accept
keyword arguments, but their order is lost because Python’s function call
semantics pass in keyword arguments using a regular unordered dictionary.
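Since an ordered dictionary remembers its insertion order, it can be combined with sorting to make a sorted dictionary; a sketch:
>>> d = {'banana': 3, 'apple': 4, 'pear': 1, 'orange': 2}
>>> # dictionary sorted by key
>>> OrderedDict(sorted(d.items(), key=lambda t: t[0]))
OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])
>>> # dictionary sorted by value
>>> OrderedDict(sorted(d.items(), key=lambda t: t[1]))
OrderedDict([('pear', 1), ('orange', 2), ('banana', 3), ('apple', 4)])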
The new sorted dictionaries maintain their sort order when entries
are deleted. But when new keys are added, the keys are appended
to the end and the sort is not maintained.
It is also straightforward to create an ordered dictionary variant
that remembers the order the keys were last inserted.
If a new entry overwrites an existing entry, the
original insertion position is changed and moved to the end:
class LastUpdatedOrderedDict(OrderedDict):
    'Store items in the order the keys were last added'

    def __setitem__(self, key, value):
        if key in self:
            del self[key]
        OrderedDict.__setitem__(self, key, value)
An ordered dictionary can be combined with the Counter class
so that the counter remembers the order elements are first encountered:
class OrderedCounter(Counter, OrderedDict):
    'Counter that remembers the order elements are first encountered'

    def __repr__(self):
        return '%s(%r)' % (self.__class__.__name__, OrderedDict(self))

    def __reduce__(self):
        return self.__class__, (OrderedDict(self),)
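Assuming the class above, a quick check of the behavior:
>>> OrderedCounter('abracadabra')
OrderedCounter(OrderedDict([('a', 5), ('b', 2), ('r', 2), ('c', 1), ('d', 1)]))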
The UserDict class acts as a wrapper around dictionary objects.
The need for this class has been partially supplanted by the ability to
subclass directly from dict; however, this class can be easier
to work with because the underlying dictionary is accessible as an
attribute.
Class that simulates a dictionary. The instance’s contents are kept in a
regular dictionary, which is accessible via the data attribute of
UserDict instances. If initialdata is provided, data is
initialized with its contents; note that a reference to initialdata will not
be kept, allowing it to be used for other purposes.
In addition to supporting the methods and operations of mappings,
UserDict instances provide the following attribute:
This class acts as a wrapper around list objects. It is a useful base class
for your own list-like classes which can inherit from it and override
existing methods or add new ones. In this way, one can add new behaviors to
lists.
The need for this class has been partially supplanted by the ability to
subclass directly from list; however, this class can be easier
to work with because the underlying list is accessible as an attribute.
Class that simulates a list. The instance’s contents are kept in a regular
list, which is accessible via the data attribute of UserList
instances. The instance’s contents are initially set to a copy of list,
defaulting to the empty list []. list can be any iterable, for
example a real Python list or a UserList object.
In addition to supporting the methods and operations of mutable sequences,
UserList instances provide the following attribute:
A real list object used to store the contents of the
UserList class.
Subclassing requirements: Subclasses of UserList are expected to
offer a constructor which can be called with either no arguments or one
argument. List operations which return a new sequence attempt to create an
instance of the actual implementation class. To do so, it assumes that the
constructor can be called with a single parameter, which is a sequence object
used as a data source.
If a derived class does not wish to comply with this requirement, all of the
special methods supported by this class will need to be overridden; please
consult the sources for information about the methods which need to be provided
in that case.
The UserString class acts as a wrapper around string objects.
The need for this class has been partially supplanted by the ability to
subclass directly from str; however, this class can be easier
to work with because the underlying string is accessible as an
attribute.
Class that simulates a string or a Unicode string object. The instance’s
content is kept in a regular string object, which is accessible via the
data attribute of UserString instances. The instance’s
contents are initially set to a copy of sequence. The sequence can
be an instance of bytes, str, UserString (or a
subclass) or an arbitrary sequence which can be converted into a string using
the built-in str() function.
These ABCs allow us to ask classes or instances if they provide
particular functionality, for example:
size = None
if isinstance(myvar, collections.Sized):
    size = len(myvar)
Several of the ABCs are also useful as mixins that make it easier to develop
classes supporting container APIs. For example, to write a class supporting
the full Set API, it is only necessary to supply the three underlying
abstract methods: __contains__(), __iter__(), and __len__().
The ABC supplies the remaining methods such as __and__() and
isdisjoint():
class ListBasedSet(collections.Set):
    ''' Alternate set implementation favoring space over speed
        and not requiring the set elements to be hashable. '''
    def __init__(self, iterable):
        self.elements = lst = []
        for value in iterable:
            if value not in lst:
                lst.append(value)
    def __iter__(self):
        return iter(self.elements)
    def __contains__(self, value):
        return value in self.elements
    def __len__(self):
        return len(self.elements)

s1 = ListBasedSet('abcdef')
s2 = ListBasedSet('defghi')
overlap = s1 & s2            # The __and__() method is supported automatically
Since some set operations create new sets, the default mixin methods need
a way to create new instances from an iterable. The class constructor is
assumed to have a signature in the form ClassName(iterable).
That assumption is factored-out to an internal classmethod called
_from_iterable() which calls cls(iterable) to produce a new set.
If the Set mixin is being used in a class with a different
constructor signature, you will need to override _from_iterable()
with a classmethod that can construct new instances from
an iterable argument.
To override the comparisons (presumably for speed, as the
semantics are fixed), redefine __le__() and
then the other operations will automatically follow suit.
The Set mixin provides a _hash() method to compute a hash value
for the set; however, __hash__() is not defined because not all sets
are hashable or immutable. To add set hashability using mixins,
inherit from both Set() and Hashable(), then define
__hash__ = Set._hash.
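A minimal sketch, reusing the ListBasedSet example above:

class HashableListBasedSet(ListBasedSet, collections.Hashable):
    # the elements themselves must be hashable for Set._hash() to work
    __hash__ = collections.Set._hash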
This module provides an implementation of the heap queue algorithm, also known
as the priority queue algorithm.
Heaps are binary trees for which every parent node has a value less than or
equal to any of its children. This implementation uses arrays for which
heap[k] <= heap[2*k+1] and heap[k] <= heap[2*k+2] for all k, counting
elements from zero. For the sake of comparison, non-existing elements are
considered to be infinite. The interesting property of a heap is that its
smallest element is always the root, heap[0].
The API below differs from textbook heap algorithms in two aspects: (a) We use
zero-based indexing. This makes the relationship between the index for a node
and the indexes for its children slightly less obvious, but is more suitable
since Python uses zero-based indexing. (b) Our pop method returns the smallest
item, not the largest (called a “min heap” in textbooks; a “max heap” is more
common in texts because of its suitability for in-place sorting).
These two make it possible to view the heap as a regular Python list without
surprises: heap[0] is the smallest item, and heap.sort() maintains the
heap invariant!
To create a heap, use a list initialized to [], or you can transform a
populated list into a heap via function heapify().
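For example:
>>> from heapq import heappush, heappop, heapify
>>> heap = []
>>> for value in [5, 1, 3]:
...     heappush(heap, value)
>>> heappop(heap)
1
>>> data = [9, 7, 8]
>>> heapify(data)          # transform a populated list, in-place, in linear time
>>> data[0]
7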
Push item on the heap, then pop and return the smallest item from the
heap. The combined action runs more efficiently than heappush()
followed by a separate call to heappop().
Pop and return the smallest item from the heap, and also push the new item.
The heap size doesn’t change. If the heap is empty, IndexError is raised.
This one step operation is more efficient than a heappop() followed by
heappush() and can be more appropriate when using a fixed-size heap.
The pop/push combination always returns an element from the heap and replaces
it with item.
The value returned may be larger than the item added. If that isn’t
desired, consider using heappushpop() instead. Its push/pop
combination returns the smaller of the two values, leaving the larger value
on the heap.
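A short contrast of the two operations (a sketch):
>>> import heapq
>>> h = [1, 3, 5]
>>> heapq.heappushpop(h, 0)    # returns the smaller of 0 and h[0]; heap unchanged
0
>>> heapq.heapreplace(h, 0)    # always pops h[0] first, even if 0 is smaller
1
>>> h
[0, 3, 5]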
The module also offers three general purpose functions based on heaps.
Merge multiple sorted inputs into a single sorted output (for example, merge
timestamped entries from multiple log files). Returns an iterator
over the sorted values.
Similar to sorted(itertools.chain(*iterables)) but returns an iterable, does
not pull the data into memory all at once, and assumes that each of the input
streams is already sorted (smallest to largest).
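For instance:
>>> from heapq import merge
>>> list(merge([1, 3, 5], [2, 4, 6]))
[1, 2, 3, 4, 5, 6]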
Return a list with the n largest elements from the dataset defined by
iterable. key, if provided, specifies a function of one argument that is
used to extract a comparison key from each element in the iterable (for
example, key=str.lower). Equivalent to: sorted(iterable, key=key, reverse=True)[:n]
Return a list with the n smallest elements from the dataset defined by
iterable. key, if provided, specifies a function of one argument that is
used to extract a comparison key from each element in the iterable (for
example, key=str.lower). Equivalent to: sorted(iterable, key=key)[:n]
The latter two functions perform best for smaller values of n. For larger
values, it is more efficient to use the sorted() function. Also, when
n==1, it is more efficient to use the built-in min() and max()
functions.
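A quick sketch:
>>> from heapq import nlargest, nsmallest
>>> grades = [82, 99, 54, 21, 95, 77]
>>> nlargest(2, grades)
[99, 95]
>>> nsmallest(2, grades)
[21, 54]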
A priority queue is a common use
for a heap, and it presents several implementation challenges:
Sort stability: how do you get two tasks with equal priorities to be returned
in the order they were originally added?
Tuple comparison breaks for (priority, task) pairs if the priorities are equal
and the tasks do not have a default comparison order.
If the priority of a task changes, how do you move it to a new position in
the heap?
Or if a pending task needs to be deleted, how do you find it and remove it
from the queue?
A solution to the first two challenges is to store entries as a 3-element list
including the priority, an entry count, and the task. The entry count serves as
a tie-breaker so that two tasks with the same priority are returned in the order
they were added. And since no two entry counts are the same, the tuple
comparison will never attempt to directly compare two tasks.
The remaining challenges revolve around finding a pending task and making
changes to its priority or removing it entirely. Finding a task can be done
with a dictionary pointing to an entry in the queue.
Removing the entry or changing its priority is more difficult because it would
break the heap structure invariants. So, a possible solution is to mark an
entry as invalid and optionally add a new entry with the revised priority:
import itertools
from heapq import heappush, heappop

pq = []                        # the priority queue list
counter = itertools.count(1)   # unique sequence count
task_finder = {}               # mapping of tasks to entries
INVALID = 0                    # mark an entry as deleted

def add_task(priority, task, count=None):
    if count is None:
        count = next(counter)
    entry = [priority, count, task]
    task_finder[task] = entry
    heappush(pq, entry)

def get_top_priority():
    while True:
        priority, count, task = heappop(pq)
        del task_finder[task]
        if count != INVALID:   # compare by value; identity tests on small ints are unreliable
            return task

def delete_task(task):
    entry = task_finder[task]
    entry[1] = INVALID

def reprioritize(priority, task):
    entry = task_finder[task]
    add_task(priority, task, entry[1])
    entry[1] = INVALID
Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for all
k, counting elements from 0. For the sake of comparison, non-existing
elements are considered to be infinite. The interesting property of a heap is
that a[0] is always its smallest element.
The strange invariant above is meant to be an efficient memory representation
for a tournament. The numbers below are k, not a[k]:
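(The diagram below reconstructs the index tree the text refers to; each cell k sits above cells 2*k+1 and 2*k+2.)

                               0

              1                                 2

      3               4                5                6

  7       8       9       10      11      12      13      14

15 16   17 18   19 20   21 22   23 24   25 26   27 28   29 30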
In the tree above, each cell k is topping 2*k+1 and 2*k+2. In a usual
binary tournament we see in sports, each cell is the winner over the two cells
it tops, and we can trace the winner down the tree to see all opponents s/he
had. However, in many computer applications of such tournaments, we do not need
to trace the history of a winner. To be more memory efficient, when a winner is
promoted, we try to replace it by something else at a lower level, and the rule
becomes that a cell and the two cells it tops contain three different items, but
the top cell “wins” over the two topped cells.
If this heap invariant is protected at all times, index 0 is clearly the overall
winner. The simplest algorithmic way to remove it and find the “next” winner is
to move some loser (let’s say cell 30 in the diagram above) into the 0 position,
and then percolate this new 0 down the tree, exchanging values, until the
invariant is re-established. This is clearly logarithmic on the total number of
items in the tree. By iterating over all items, you get an O(n log n) sort.
A nice feature of this sort is that you can efficiently insert new items while
the sort is going on, provided that the inserted items are not “better” than the
last 0’th element you extracted. This is especially useful in simulation
contexts, where the tree holds all incoming events, and the “win” condition
means the smallest scheduled time. When an event schedules other events for
execution, they are scheduled into the future, so they can easily go into the
heap. So, a heap is a good structure for implementing schedulers (this is what
I used for my MIDI sequencer :-).
Various structures for implementing schedulers have been extensively studied,
and heaps are good for this, as they are reasonably speedy, the speed is almost
constant, and the worst case is not much different than the average case.
However, there are other representations which are more efficient overall, yet
the worst cases might be terrible.
Heaps are also very useful in big disk sorts. You most probably all know that a
big sort implies producing “runs” (which are pre-sorted sequences, whose size is
usually related to the amount of CPU memory), followed by merging passes for
these runs, which merging is often very cleverly organised [1]. It is very
important that the initial sort produces the longest runs possible. Tournaments
are a good way to achieve that. If, using all the memory available to hold a
tournament, you replace and percolate items that happen to fit the current run,
you’ll produce runs which are twice the size of the memory for random input, and
much better for input fuzzily ordered.
Moreover, if you output the 0’th item on disk and get an input which may not fit
in the current tournament (because the value “wins” over the last output value),
it cannot fit in the heap, so the size of the heap decreases. The freed memory
could be cleverly reused immediately for progressively building a second heap,
which grows at exactly the same rate the first heap is melting. When the first
heap completely vanishes, you switch heaps and start a new run. Clever and
quite effective!
In a word, heaps are useful memory structures to know. I use them in a few
applications, and I think it is good to keep a ‘heap’ module around. :-)
The disk balancing algorithms which are current, nowadays, are more annoying
than clever, and this is a consequence of the seeking capabilities of the disks.
On devices which cannot seek, like big tape drives, the story was quite
different, and one had to be very clever to ensure (far in advance) that each
tape movement will be the most effective possible (that is, will best
participate at “progressing” the merge). Some tapes were even able to read
backwards, and this was also used to avoid the rewinding time. Believe me, real
good tape sorts were quite spectacular to watch! From all times, sorting has
always been a Great Art! :-)
This module provides support for maintaining a list in sorted order without
having to sort the list after each insertion. For long lists of items with
expensive comparison operations, this can be an improvement over the more common
approach. The module is called bisect because it uses a basic bisection
algorithm to do its work. The source code may be most useful as a working
example of the algorithm (the boundary conditions are already right!).
Locate the insertion point for x in a to maintain sorted order.
The parameters lo and hi may be used to specify a subset of the list
which should be considered; by default the entire list is used. If x is
already present in a, the insertion point will be before (to the left of)
any existing entries. The return value is suitable for use as the first
parameter to list.insert() assuming that a is already sorted.
The returned insertion point i partitions the array a into two halves so
that all(val < x for val in a[lo:i]) for the left side and
all(val >= x for val in a[i:hi]) for the right side.
Similar to bisect_left(), but returns an insertion point which comes
after (to the right of) any existing entries of x in a.
The returned insertion point i partitions the array a into two halves so
that all(val <= x for val in a[lo:i]) for the left side and
all(val > x for val in a[i:hi]) for the right side.
Insert x in a in sorted order. This is equivalent to
a.insert(bisect.bisect_left(a, x, lo, hi), x) assuming that a is
already sorted. Keep in mind that the O(log n) search is dominated by
the slow O(n) insertion step.
Similar to insort_left(), but inserting x in a after any existing
entries of x.
See also
SortedCollection recipe that uses
bisect to build a full-featured collection class with straightforward search
methods and support for a key-function. The keys are precomputed to save
unnecessary calls to the key function during searches.
The above bisect() functions are useful for finding insertion points but
can be tricky or awkward to use for common searching tasks. The following five
functions show how to transform them into the standard lookups for sorted
lists:
def index(a, x):
    'Locate the leftmost value exactly equal to x'
    i = bisect_left(a, x)
    if i != len(a) and a[i] == x:
        return i
    raise ValueError

def find_lt(a, x):
    'Find rightmost value less than x'
    i = bisect_left(a, x)
    if i:
        return a[i-1]
    raise ValueError

def find_le(a, x):
    'Find rightmost value less than or equal to x'
    i = bisect_right(a, x)
    if i:
        return a[i-1]
    raise ValueError

def find_gt(a, x):
    'Find leftmost value greater than x'
    i = bisect_right(a, x)
    if i != len(a):
        return a[i]
    raise ValueError

def find_ge(a, x):
    'Find leftmost item greater than or equal to x'
    i = bisect_left(a, x)
    if i != len(a):
        return a[i]
    raise ValueError
The bisect() function can be useful for numeric table lookups. This
example uses bisect() to look up a letter grade for an exam score (say)
based on a set of ordered numeric breakpoints: 90 and up is an ‘A’, 80 to 89 is
a ‘B’, and so on:
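A sketch matching that description:
>>> from bisect import bisect
>>> def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):
...     i = bisect(breakpoints, score)
...     return grades[i]
...
>>> [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
['F', 'A', 'C', 'C', 'B', 'A', 'A']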
Unlike the sorted() function, it does not make sense for the bisect()
functions to have key or reversed arguments because that would lead to an
inefficient design (successive calls to bisect functions would not “remember”
all of the previous key lookups).
Instead, it is better to search a list of precomputed keys to find the index
of the record in question:
>>> data = [('red', 5), ('blue', 1), ('yellow', 8), ('black', 0)]
>>> data.sort(key=lambda r: r[1])
>>> keys = [r[1] for r in data] # precomputed list of keys
>>> data[bisect_left(keys, 0)]
('black', 0)
>>> data[bisect_left(keys, 1)]
('blue', 1)
>>> data[bisect_left(keys, 5)]
('red', 5)
>>> data[bisect_left(keys, 8)]
('yellow', 8)
This module defines an object type which can compactly represent an array of
basic values: characters, integers, floating point numbers. Arrays are sequence
types and behave very much like lists, except that the type of objects stored in
them is constrained. The type is specified at object creation time by using a
type code, which is a single character. The following type codes are
defined:
Type code   C Type           Python Type         Minimum size in bytes
'b'         signed char      int                 1
'B'         unsigned char    int                 1
'u'         Py_UNICODE       Unicode character   2 (see note)
'h'         signed short     int                 2
'H'         unsigned short   int                 2
'i'         signed int       int                 2
'I'         unsigned int     int                 2
'l'         signed long      int                 4
'L'         unsigned long    int                 4
'f'         float            float               4
'd'         double           float               8
Note
The 'u' typecode corresponds to Python’s unicode character. On narrow
Unicode builds this is 2 bytes; on wide builds it is 4 bytes.
The actual representation of values is determined by the machine architecture
(strictly speaking, by the C implementation). The actual size can be accessed
through the itemsize attribute.
A new array whose items are restricted by typecode, and initialized
from the optional initializer value, which must be a list, object
supporting the buffer interface, or iterable over elements of the
appropriate type.
If given a list or string, the initializer is passed to the new array’s
fromlist(), frombytes(), or fromunicode() method (see below)
to add initial items to the array. Otherwise, the iterable initializer is
passed to the extend() method.
Array objects support the ordinary sequence operations of indexing, slicing,
concatenation, and multiplication. When using slice assignment, the assigned
value must be an array object with the same type code; in all other cases,
TypeError is raised. Array objects also implement the buffer interface,
and may be used wherever buffer objects are supported.
The following data items and methods are also supported:
Return a tuple (address,length) giving the current memory address and the
length in elements of the buffer used to hold array’s contents. The size of the
memory buffer in bytes can be computed as array.buffer_info()[1] * array.itemsize. This is occasionally useful when working with low-level (and
inherently unsafe) I/O interfaces that require memory addresses, such as certain
ioctl() operations. The returned numbers are valid as long as the array
exists and no length-changing operations are applied to it.
Note
When using array objects from code written in C or C++ (the only way to
effectively make use of this information), it makes more sense to use the buffer
interface supported by array objects. This method is maintained for backward
compatibility and should be avoided in new code. The buffer interface is
documented in Buffer Protocol.
“Byteswap” all items of the array. This is only supported for values which are
1, 2, 4, or 8 bytes in size; for other types of values, RuntimeError is
raised. It is useful when reading data from a file written on a machine with a
different byte order.
Append items from iterable to the end of the array. If iterable is another
array, it must have exactly the same type code; if not, TypeError will
be raised. If iterable is not an array, it must be iterable and its elements
must be the right type to be appended to the array.
Read n items (as machine values) from the file object f and append
them to the end of the array. If fewer than n items are available,
EOFError is raised, but the items that were available are still
inserted into the array. f must be a real built-in file object; something
else with a read() method won’t do.
Extends this array with data from the given unicode string. The array must
be a type 'u' array; otherwise a ValueError is raised. Use
array.frombytes(unicodestring.encode(enc)) to append Unicode data to an
array of some other type.
Removes the item with the index i from the array and returns it. The optional
argument defaults to -1, so that by default the last item is removed and
returned.
Convert the array to an array of machine values and return the bytes
representation (the same sequence of bytes that would be written to a file by
the tofile() method.)
Convert the array to a unicode string. The array must be a type 'u' array;
otherwise a ValueError is raised. Use array.tobytes().decode(enc) to
obtain a unicode string from an array of some other type.
When an array object is printed or converted to a string, it is represented as
array(typecode, initializer). The initializer is omitted if the array is
empty, otherwise it is a string if the typecode is 'u', otherwise it is a
list of numbers. The string is guaranteed to be able to be converted back to an
array with the same type and value using eval(), so long as the
array() function has been imported using from array import array.
Examples:
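A few typical constructor calls:

array('l')
array('u', 'hello \u2641')
array('l', [1, 2, 3, 4, 5])
array('d', [1.0, 2.0, 3.14])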
The scheduler class defines a generic interface to scheduling events.
It needs two functions to actually deal with the “outside world” — timefunc
should be callable without arguments, and return a number (the “time”, in any
units whatsoever). The delayfunc function should be callable with one
argument, compatible with the output of timefunc, and should delay that many
time units. delayfunc will also be called with the argument 0 after each
event is run to allow other threads an opportunity to run in multi-threaded
applications.
In multi-threaded environments, the scheduler class has limitations
with respect to thread-safety: it cannot insert a new task before
the one currently pending in a running scheduler, and it holds up the main
thread until the event queue is empty. The preferred approach
is to use the threading.Timer class instead.
Example:
>>> import time
>>> from threading import Timer
>>> def print_time():
... print("From print_time", time.time())
...
>>> def print_some_times():
... print(time.time())
... Timer(5, print_time, ()).start()
... Timer(10, print_time, ()).start()
... time.sleep(11) # sleep while time-delay events execute
... print(time.time())
...
>>> print_some_times()
930343690.257
From print_time 930343695.274
From print_time 930343700.273
930343701.301
Schedule a new event. The time argument should be a numeric type compatible
with the return value of the timefunc function passed to the constructor.
Events scheduled for the same time will be executed in the order of their
priority.
Executing the event means executing action(*argument). argument must be a
sequence holding the parameters for action.
Return value is an event which may be used for later cancellation of the event
(see cancel()).
Schedule an event for delay more time units. Other than the relative time, the
other arguments, the effect and the return value are the same as those for
enterabs().
Run all scheduled events. This function will wait (using the delayfunc()
function passed to the constructor) for the next event, then execute it and so
on until there are no more scheduled events.
Either action or delayfunc can raise an exception. In either case, the
scheduler will maintain a consistent state and propagate the exception. If an
exception is raised by action, the event will not be attempted in future calls
to run().
If a sequence of events takes longer to run than the time available before the
next event, the scheduler will simply fall behind. No events will be dropped;
the calling code is responsible for canceling events which are no longer
pertinent.
Read-only attribute returning a list of upcoming events in the order they
will be run. Each event is shown as a named tuple with the
following fields: time, priority, action, argument.
The queue module implements multi-producer, multi-consumer queues.
It is especially useful in threaded programming when information must be
exchanged safely between multiple threads. The Queue class in this
module implements all the required locking semantics. It depends on the
availability of thread support in Python; see the threading
module.
Implements three types of queue whose only difference is the order that
the entries are retrieved. In a FIFO queue, the first tasks added are
the first retrieved. In a LIFO queue, the most recently added entry is
the first retrieved (operating like a stack). With a priority queue,
the entries are kept sorted (using the heapq module) and the
lowest valued entry is retrieved first.
The queue module defines the following classes and exceptions:
Constructor for a FIFO queue. maxsize is an integer that sets the upper bound
limit on the number of items that can be placed in the queue. Insertion will
block once this size has been reached, until queue items are consumed. If
maxsize is less than or equal to zero, the queue size is infinite.
Constructor for a LIFO queue. maxsize is an integer that sets the upper bound
limit on the number of items that can be placed in the queue. Insertion will
block once this size has been reached, until queue items are consumed. If
maxsize is less than or equal to zero, the queue size is infinite.
Constructor for a priority queue. maxsize is an integer that sets the upper bound
limit on the number of items that can be placed in the queue. Insertion will
block once this size has been reached, until queue items are consumed. If
maxsize is less than or equal to zero, the queue size is infinite.
The lowest valued entries are retrieved first (the lowest valued entry is the
one returned by sorted(list(entries))[0]). A typical pattern for entries
is a tuple in the form: (priority_number, data).
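For example, a sketch using the priority queue:
>>> from queue import PriorityQueue
>>> q = PriorityQueue()
>>> q.put((2, 'code'))
>>> q.put((1, 'eat'))
>>> q.put((3, 'sleep'))
>>> q.get()
(1, 'eat')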
Return the approximate size of the queue. Note, qsize() > 0 doesn’t
guarantee that a subsequent get() will not block, nor will qsize() < maxsize
guarantee that put() will not block.
Return True if the queue is empty, False otherwise. If empty()
returns True it doesn’t guarantee that a subsequent call to put()
will not block. Similarly, if empty() returns False it doesn’t
guarantee that a subsequent call to get() will not block.
Return True if the queue is full, False otherwise. If full()
returns True it doesn’t guarantee that a subsequent call to get()
will not block. Similarly, if full() returns False it doesn’t
guarantee that a subsequent call to put() will not block.
Put item into the queue. If optional args block is true and timeout is
None (the default), block if necessary until a free slot is available. If
timeout is a positive number, it blocks at most timeout seconds and raises
the Full exception if no free slot was available within that time.
Otherwise (block is false), put an item on the queue if a free slot is
immediately available, else raise the Full exception (timeout is
ignored in that case).
Remove and return an item from the queue. If optional args block is true and
timeout is None (the default), block if necessary until an item is available.
If timeout is a positive number, it blocks at most timeout seconds and
raises the Empty exception if no item was available within that time.
Otherwise (block is false), return an item if one is immediately available,
else raise the Empty exception (timeout is ignored in that case).
Indicate that a formerly enqueued task is complete. Used by queue consumer
threads. For each get() used to fetch a task, a subsequent call to
task_done() tells the queue that the processing on the task is complete.
If a join() is currently blocking, it will resume when all items have been
processed (meaning that a task_done() call was received for every item
that had been put() into the queue).
Raises a ValueError if called more times than there were items placed in
the queue.
Blocks until all items in the queue have been gotten and processed.
The count of unfinished tasks goes up whenever an item is added to the queue.
The count goes down whenever a consumer thread calls task_done() to
indicate that the item was retrieved and all work on it is complete. When the
count of unfinished tasks drops to zero, join() unblocks.
Example of how to wait for enqueued tasks to be completed:
from queue import Queue
from threading import Thread

# do_work(), source() and num_worker_threads are supplied by the caller
def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in source():
    q.put(item)

q.join()       # block until all tasks are done
The weakref module allows the Python programmer to create weak
references to objects.
In the following, the term referent means the object which is referred to
by a weak reference.
A weak reference to an object is not enough to keep the object alive: when the
only remaining references to a referent are weak references,
garbage collection is free to destroy the referent and reuse its memory
for something else. A primary use for weak references is to implement caches or
mappings holding large objects, where it’s desired that a large object not be
kept alive solely because it appears in a cache or mapping.
For example, if you have a number of large binary image objects, you may wish to
associate a name with each. If you used a Python dictionary to map names to
images, or images to names, the image objects would remain alive just because
they appeared as values or keys in the dictionaries. The
WeakKeyDictionary and WeakValueDictionary classes supplied by
the weakref module are an alternative, using weak references to construct
mappings that don’t keep objects alive solely because they appear in the mapping
objects. If, for example, an image object is a value in a
WeakValueDictionary, then when the last remaining references to that
image object are the weak references held by weak mappings, garbage collection
can reclaim the object, and its corresponding entries in weak mappings are
simply deleted.
WeakKeyDictionary and WeakValueDictionary use weak references
in their implementation, setting up callback functions on the weak references
that notify the weak dictionaries when a key or value has been reclaimed by
garbage collection. WeakSet implements the set interface,
but keeps weak references to its elements, just like a
WeakKeyDictionary does.
Most programs should find that using one of these weak container types is all
they need – it’s not usually necessary to create your own weak references
directly. The low-level machinery used by the weak dictionary implementations
is exposed by the weakref module for the benefit of advanced uses.
Note
Weak references to an object are cleared before the object’s __del__()
is called, to ensure that the weak reference callback (if any) finds the
object still alive.
Not all objects can be weakly referenced; those objects which can include class
instances, functions written in Python (but not in C), instance methods, sets,
frozensets, some file objects, generators, type
objects, sockets, arrays, deques, regular expression pattern objects, and code
objects.
Changed in version 3.2: Added support for thread.lock, threading.Lock, and code objects.
Several built-in types such as list and dict do not directly
support weak references but can add support through subclassing:
class Dict(dict):
    pass

obj = Dict(red=1, green=2, blue=3)   # this object is weak referenceable
Other built-in types such as tuple and int do not support weak
references even when subclassed (this is an implementation detail and may
differ across Python implementations).
Extension types can easily be made to support weak references; see
Weak Reference Support.
Return a weak reference to object. The original object can be retrieved by
calling the reference object if the referent is still alive; if the referent is
no longer alive, calling the reference object will cause None to be
returned. If callback is provided and not None, and the returned
weakref object is still alive, the callback will be called when the object is
about to be finalized; the weak reference object will be passed as the only
parameter to the callback; the referent will no longer be available.
It is allowable for many weak references to be constructed for the same object.
Callbacks registered for each weak reference will be called from the most
recently registered callback to the oldest registered callback.
Exceptions raised by the callback will be noted on the standard error output,
but cannot be propagated; they are handled in exactly the same way as exceptions
raised from an object’s __del__() method.
Weak references are hashable if the object is hashable. They will
maintain their hash value even after the object was deleted. If
hash() is first called only after the object was deleted,
the call will raise TypeError.
Weak references support tests for equality, but not ordering. If the referents
are still alive, two references have the same equality relationship as their
referents (regardless of the callback). If either referent has been deleted,
the references are equal only if the reference objects are the same object.
This is a subclassable type rather than a factory function.
Return a proxy to object which uses a weak reference. This supports use of
the proxy in most contexts instead of requiring the explicit dereferencing used
with weak reference objects. The returned object will have a type of either
ProxyType or CallableProxyType, depending on whether object is
callable. Proxy objects are not hashable regardless of the referent; this
avoids a number of problems related to their fundamentally mutable nature, and
prevents their use as dictionary keys. callback is the same as the parameter
of the same name to the ref() function.
Mapping class that references keys weakly. Entries in the dictionary will be
discarded when there is no longer a strong reference to the key. This can be
used to associate additional data with an object owned by other parts of an
application without adding attributes to those objects. This can be especially
useful with objects that override attribute accesses.
Note
Caution: Because a WeakKeyDictionary is built on top of a Python
dictionary, it must not change size when iterating over it. This can be
difficult to ensure for a WeakKeyDictionary because actions
performed by the program during iteration may cause items in the
dictionary to vanish “by magic” (as a side effect of garbage collection).
WeakKeyDictionary objects have the following additional methods. These
expose the internal references directly. The references are not guaranteed to
be “live” at the time they are used, so the result of calling the references
needs to be checked before being used. This can be used to avoid creating
references that will cause the garbage collector to keep the keys around longer
than needed.
Mapping class that references values weakly. Entries in the dictionary will be
discarded when no strong reference to the value exists any more.
Note
Caution: Because a WeakValueDictionary is built on top of a Python
dictionary, it must not change size when iterating over it. This can be
difficult to ensure for a WeakValueDictionary because actions performed
by the program during iteration may cause items in the dictionary to vanish “by
magic” (as a side effect of garbage collection).
WeakValueDictionary objects have the following additional methods.
These methods have the same issues as the keyrefs() method of
WeakKeyDictionary objects.
Sequence containing all the type objects for proxies. This can make it simpler
to test if an object is a proxy without being dependent on naming both proxy
types.
Weak reference objects have no attributes or methods, but do allow the referent
to be obtained, if it still exists, by calling it:
>>> import weakref
>>> class Object:
... pass
...
>>> o = Object()
>>> r = weakref.ref(o)
>>> o2 = r()
>>> o is o2
True
If the referent no longer exists, calling the reference object returns
None:
>>> del o, o2
>>> print(r())
None
Testing that a weak reference object is still live should be done using the
expression ref() is not None. Normally, application code that needs to use
a reference object should follow this pattern:
# r is a weak reference object
o = r()
if o is None:
    # referent has been garbage collected
    print("Object has been deallocated; can't frobnicate.")
else:
    print("Object is still live!")
    o.do_something_useful()
Using a separate test for “liveness” creates race conditions in threaded
applications; another thread can cause a weak reference to become invalidated
before the weak reference is called; the idiom shown above is safe in threaded
applications as well as single-threaded applications.
Specialized versions of ref objects can be created through subclassing.
This is used in the implementation of the WeakValueDictionary to reduce
the memory overhead for each entry in the mapping. This may be most useful to
associate additional information with a reference, but could also be used to
insert additional processing on calls to retrieve the referent.
This example shows how a subclass of ref can be used to store
additional information about an object and affect the value that’s returned when
the referent is accessed:
import weakref

class ExtendedRef(weakref.ref):
    def __init__(self, ob, callback=None, **annotations):
        super(ExtendedRef, self).__init__(ob, callback)
        self.__counter = 0
        for k, v in annotations.items():
            setattr(self, k, v)

    def __call__(self):
        """Return a pair containing the referent and the number of
        times the reference has been called.
        """
        ob = super(ExtendedRef, self).__call__()
        if ob is not None:
            self.__counter += 1
            ob = (ob, self.__counter)
        return ob
This simple example shows how an application can use object IDs to retrieve
objects that it has seen before. The IDs of the objects can then be used in
other data structures without forcing the objects to remain alive, but the
objects can still be retrieved by ID if they do.
import weakref

_id2obj_dict = weakref.WeakValueDictionary()

def remember(obj):
    oid = id(obj)
    _id2obj_dict[oid] = obj
    return oid

def id2obj(oid):
    return _id2obj_dict[oid]
This module defines names for some object types that are used by the standard
Python interpreter, but not exposed as builtins like int or
str are. Also, it does not include some of the types that arise
transparently during processing such as the listiterator type.
The type of objects defined in extension modules with PyGetSetDef, such
as FrameType.f_locals or array.array.typecode. This type is used as
descriptor for object attributes; it has the same purpose as the
property type, but for classes defined in extension modules.
The type of objects defined in extension modules with PyMemberDef, such
as datetime.timedelta.days. This type is used as descriptor for simple C
data members which use standard conversion functions; it has the same purpose
as the property type, but for classes defined in extension modules.
CPython implementation detail: In other implementations of Python, this type may be identical to
GetSetDescriptorType.
The difference between shallow and deep copying is only relevant for compound
objects (objects that contain other objects, like lists or class instances):
A shallow copy constructs a new compound object and then (to the extent
possible) inserts references into it to the objects found in the original.
A deep copy constructs a new compound object and then, recursively, inserts
copies into it of the objects found in the original.
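A quick illustration of the difference (a sketch):
>>> import copy
>>> a = [[1, 2], [3, 4]]
>>> shallow = copy.copy(a)
>>> deep = copy.deepcopy(a)
>>> a[0].append(99)          # mutate an inner object
>>> shallow[0]               # the shallow copy shares the inner lists
[1, 2, 99]
>>> deep[0]                  # the deep copy does not
[1, 2]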
Two problems often exist with deep copy operations that don’t exist with shallow
copy operations:
Recursive objects (compound objects that, directly or indirectly, contain a
reference to themselves) may cause a recursive loop.
Because deep copy copies everything it may copy too much, e.g.,
administrative data structures that should be shared even between copies.
The deepcopy() function avoids these problems by:
keeping a “memo” dictionary of objects already copied during the current
copying pass; and
letting user-defined classes override the copying operation or the set of
components copied.
This module does not copy types like module, method, stack trace, stack frame,
file, socket, window, array, or any similar types. It does “copy” functions and
classes (shallow and deeply), by returning the original object unchanged; this
is compatible with the way these are treated by the pickle module.
Shallow copies of dictionaries can be made using dict.copy(), and
of lists by assigning a slice of the entire list, for example,
copied_list = original_list[:].
Classes can use the same interfaces to control copying that they use to control
pickling. See the description of module pickle for information on these
methods. The copy module does not use the copyreg registration
module.
In order for a class to define its own copy implementation, it can define
special methods __copy__() and __deepcopy__(). The former is called
to implement the shallow copy operation; no additional arguments are passed.
The latter is called to implement the deep copy operation; it is passed one
argument, the memo dictionary. If the __deepcopy__() implementation needs
to make a deep copy of a component, it should call the deepcopy() function
with the component as first argument and the memo dictionary as second argument.
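A minimal sketch of a class customizing its deep copy (the Node class and its fields are hypothetical):

import copy

class Node:
    def __init__(self, payload, shared_registry):
        self.payload = payload
        self.shared_registry = shared_registry   # administrative data, deliberately shared

    def __deepcopy__(self, memo):
        new = Node.__new__(Node)
        memo[id(self)] = new                     # register early to handle reference cycles
        new.payload = copy.deepcopy(self.payload, memo)   # deep-copy the component via deepcopy()
        new.shared_registry = self.shared_registry        # keep the shared structure as-is
        return new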
The pprint module provides a capability to “pretty-print” arbitrary
Python data structures in a form which can be used as input to the interpreter.
If the formatted structures include objects which are not fundamental Python
types, the representation may not be loadable. This may be the case if objects
such as files, sockets, classes, or instances are included, as well as many
other built-in objects which are not representable as Python constants.
The formatted representation keeps objects on a single line if it can, and
breaks them onto multiple lines if they don’t fit within the allowed width.
Construct PrettyPrinter objects explicitly if you need to adjust the
width constraint.
Dictionaries are sorted by key before the display is computed.
class pprint.PrettyPrinter(indent=1, width=80, depth=None, stream=None)
Construct a PrettyPrinter instance. This constructor understands
several keyword parameters. An output stream may be set using the stream
keyword; the only method used on the stream object is the file protocol’s
write() method. If not specified, the PrettyPrinter adopts
sys.stdout. Three additional parameters may be used to control the
formatted representation. The keywords are indent, depth, and width. The
amount of indentation added for each recursive level is specified by indent;
the default is one. Other values can cause output to look a little odd, but can
make nesting easier to spot. The number of levels which may be printed is
controlled by depth; if the data structure being printed is too deep, the next
contained level is replaced by .... By default, there is no constraint on
the depth of the objects being formatted. The desired output width is
constrained using the width parameter; the default is 80 characters. If a
structure cannot be formatted within the constrained width, a best effort will
be made.
Return the formatted representation of object as a string. indent, width
and depth will be passed to the PrettyPrinter constructor as
formatting parameters.
Prints the formatted representation of object on stream, followed by a
newline. If stream is None, sys.stdout is used. This may be used
in the interactive interpreter instead of the print() function for
inspecting values (you can even reassign print = pprint.pprint for use
within a scope). indent, width and depth will be passed to the
PrettyPrinter constructor as formatting parameters.
>>> import pprint
>>> stuff = ['spam', 'eggs', 'lumberjack', 'knights', 'ni']
>>> stuff.insert(0, stuff)
>>> pprint.pprint(stuff)
[<Recursion on list with id=...>,
'spam',
'eggs',
'lumberjack',
'knights',
'ni']
Determine if the formatted representation of object is “readable,” or can be
used to reconstruct the value using eval(). This always returns False
for recursive objects.
Return a string representation of object, protected against recursive data
structures. If the representation of object exposes a recursive entry, the
recursive reference will be represented as <Recursion on typename with id=number>. The representation is not otherwise formatted.
>>> pprint.saferepr(stuff)
"[<Recursion on list with id=...>, 'spam', 'eggs', 'lumberjack', 'knights', 'ni']"
Print the formatted representation of object on the configured stream,
followed by a newline.
The following methods provide the implementations for the corresponding
functions of the same names. Using these methods on an instance is slightly
more efficient since new PrettyPrinter objects don’t need to be
created.
Determine if the formatted representation of the object is “readable,” or can be
used to reconstruct the value using eval(). Note that this returns
False for recursive objects. If the depth parameter of the
PrettyPrinter is set and the object is deeper than allowed, this
returns False.
Determine if the object requires a recursive representation.
This method is provided as a hook to allow subclasses to modify the way objects
are converted to strings. The default implementation uses the internals of the
saferepr() implementation.
Returns three values: the formatted version of object as a string, a flag
indicating whether the result is readable, and a flag indicating whether
recursion was detected. The first argument is the object to be presented. The
second is a dictionary which contains the id() of objects that are part of
the current presentation context (direct and indirect containers for object
that are affecting the presentation) as the keys; if an object needs to be
presented which is already represented in context, the third return value
should be True. Recursive calls to the format() method should add
additional entries for containers to this dictionary. The third argument,
maxlevels, gives the requested limit to recursion; this will be 0 if there
is no requested limit. This argument should be passed unmodified to recursive
calls. The fourth argument, level, gives the current level; recursive calls
should be passed a value less than that of the current call.
The reprlib module provides a means for producing object representations
with limits on the size of the resulting strings. This is used in the Python
debugger and may be useful in other contexts as well.
This module provides a class, an instance, and a function:
Class which provides formatting services useful in implementing functions
similar to the built-in repr(); size limits for different object types
are added to avoid the generation of representations which are excessively long.
This is an instance of Repr which is used to provide the
repr() function described below. Changing the attributes of this
object will affect the size limits used by repr() and the Python
debugger.
This is the repr() method of aRepr. It returns a string
similar to that returned by the built-in function of the same name, but with
limits on most sizes.
In addition to size-limiting tools, the module also provides a decorator for
detecting recursive calls to __repr__() and substituting a placeholder
string instead.
Decorator for __repr__() methods to detect recursive calls within the
same thread. If a recursive call is made, the fillvalue is returned,
otherwise, the usual __repr__() call is made. For example:
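A sketch of its use:
>>> from reprlib import recursive_repr
>>> class MyList(list):
...     @recursive_repr()
...     def __repr__(self):
...         return '<' + '|'.join(map(repr, self)) + '>'
...
>>> m = MyList('abc')
>>> m.append(m)
>>> m.append('x')
>>> print(m)
<'a'|'b'|'c'|...|'x'>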
Repr instances provide several attributes which can be used to provide
size limits for the representations of different object types, and methods
which format specific object types.
Limit on the number of characters in the representation of the string. Note
that the “normal” representation of the string is used as the character source:
if escape sequences are needed in the representation, these may be mangled when
the representation is shortened. The default is 30.
This limit is used to control the size of object types for which no specific
formatting method is available on the Repr object. It is applied in a
similar manner as maxstring. The default is 20.
Recursive implementation used by repr(). This uses the type of obj to
determine which formatting method to call, passing it obj and level. The
type-specific methods should call repr1() to perform recursive formatting,
with level-1 for the value of level in the recursive call.
Repr.repr_TYPE(obj, level)
Formatting methods for specific types are implemented as methods with a name
based on the type name. In the method name, TYPE is replaced by
'_'.join(type(obj).__name__.split()). Dispatch to these
methods is handled by repr1(). Type-specific methods which need to
recursively format a value should call self.repr1(subobj,level-1).
The use of dynamic dispatching by Repr.repr1() allows subclasses of
Repr to add support for additional built-in object types or to modify
the handling of types already supported. This example shows how special support
for file objects could be added:
import reprlib
import sys

class MyRepr(reprlib.Repr):
    def repr_file(self, obj, level):
        if obj.name in ['<stdin>', '<stdout>', '<stderr>']:
            return obj.name
        else:
            return repr(obj)

aRepr = MyRepr()
print(aRepr.repr(sys.stdin))          # prints '<stdin>'
The modules described in this chapter provide numeric and math-related functions
and data types. The numbers module defines an abstract hierarchy of
numeric types. The math and cmath modules contain various
mathematical functions for floating-point and complex numbers. For users more
interested in decimal accuracy than in speed, the decimal module supports
exact representations of decimal numbers.
The following modules are documented in this chapter:
The numbers module (PEP 3141) defines a hierarchy of numeric
abstract base classes which progressively define
more operations. None of the types defined in this module can be instantiated.
Subclasses of this type describe complex numbers and include the operations
that work on the built-in complex type. These are: conversions to
complex and bool, real, imag, +,
-, *, /, abs(), conjugate(), ==, and !=. All
except - and != are abstract.
Subtypes Rational and adds a conversion to int.
Provides defaults for float(), numerator, and
denominator, and bit-string operations: <<,
>>, &, ^, |, ~.
Implementors should be careful to make equal numbers equal and hash
them to the same values. This may be subtle if there are two different
extensions of the real numbers. For example, fractions.Fraction
implements hash() as follows:
def __hash__(self):
    if self.denominator == 1:
        # Get integers right.
        return hash(self.numerator)
    # Expensive check, but definitely correct.
    if self == float(self):
        return hash(float(self))
    else:
        # Use tuple's hash to avoid a high collision rate on
        # simple fractions.
        return hash((self.numerator, self.denominator))
There are, of course, more possible ABCs for numbers, and this would
be a poor hierarchy if it precluded the possibility of adding
those. You can add MyFoo between Complex and
Real with:
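class MyFoo(Complex): ...
MyFoo.register(Real)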
We want to implement the arithmetic operations so that mixed-mode
operations either call an implementation whose author knew about the
types of both arguments, or convert both to the nearest built-in type
and do the operation there. For subtypes of Integral, this
means that __add__() and __radd__() should be defined as:
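# A sketch following the numbers module documentation; do_my_adding_stuff()
# and do_my_other_adding_stuff() stand in for real implementations.
class MyIntegral(Integral):

    def __add__(self, other):
        if isinstance(other, MyIntegral):
            return do_my_adding_stuff(self, other)
        elif isinstance(other, OtherTypeIKnowAbout):
            return do_my_other_adding_stuff(self, other)
        else:
            return NotImplemented

    def __radd__(self, other):
        if isinstance(other, MyIntegral):
            return do_my_adding_stuff(other, self)
        elif isinstance(other, OtherTypeIKnowAbout):
            return do_my_other_adding_stuff(other, self)
        elif isinstance(other, Integral):
            return int(other) + int(self)
        elif isinstance(other, Real):
            return float(other) + float(self)
        elif isinstance(other, Complex):
            return complex(other) + complex(self)
        else:
            return NotImplemented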
There are 5 different cases for a mixed-type operation on subclasses
of Complex. I’ll refer to all of the above code that doesn’t
refer to MyIntegral and OtherTypeIKnowAbout as
“boilerplate”. a will be an instance of A, which is a subtype
of Complex (a:A<:Complex), and b:B<:Complex. I’ll consider a+b:
If A defines an __add__() which accepts b, all is
well.
If A falls back to the boilerplate code, and it were to
return a value from __add__(), we’d miss the possibility
that B defines a more intelligent __radd__(), so the
boilerplate should return NotImplemented from
__add__(). (Or A may not implement __add__() at
all.)
Then B's __radd__() gets a chance. If it accepts
a, all is well.
If it falls back to the boilerplate, there are no more possible
methods to try, so this is where the default implementation
should live.
If B<:A, Python tries B.__radd__ before
A.__add__. This is ok, because it was implemented with
knowledge of A, so it can handle those instances before
delegating to Complex.
If A<:Complex and B<:Real without sharing any other knowledge,
then the appropriate shared operation is the one involving the built-in
complex, and both __radd__() s land there, so a+b==b+a.
Because most of the operations on any given type will be very similar,
it can be useful to define a helper function which generates the
forward and reverse instances of any given operator. For example,
fractions.Fraction uses:
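# Abridged from the Fraction implementation: the generator builds a
# forward/reverse method pair for a given monomorphic operator.
def _operator_fallbacks(monomorphic_operator, fallback_operator):
    def forward(a, b):
        if isinstance(b, (int, Fraction)):
            return monomorphic_operator(a, b)
        elif isinstance(b, float):
            return fallback_operator(float(a), b)
        elif isinstance(b, complex):
            return fallback_operator(complex(a), b)
        else:
            return NotImplemented
    forward.__name__ = '__' + fallback_operator.__name__ + '__'
    forward.__doc__ = monomorphic_operator.__doc__

    def reverse(b, a):
        if isinstance(a, Rational):
            # Includes ints.
            return monomorphic_operator(a, b)
        elif isinstance(a, Real):
            return fallback_operator(float(a), float(b))
        elif isinstance(a, Complex):
            return fallback_operator(complex(a), complex(b))
        else:
            return NotImplemented
    reverse.__name__ = '__r' + fallback_operator.__name__ + '__'
    reverse.__doc__ = monomorphic_operator.__doc__

    return forward, reverse

def _add(a, b):
    """a + b"""
    return Fraction(a.numerator * b.denominator +
                    b.numerator * a.denominator,
                    a.denominator * b.denominator)

__add__, __radd__ = _operator_fallbacks(_add, operator.add)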
This module is always available. It provides access to the mathematical
functions defined by the C standard.
These functions cannot be used with complex numbers; use the functions of the
same name from the cmath module if you require support for complex
numbers. The distinction between functions which support complex numbers and
those which don’t is made since most users do not want to learn quite as much
mathematics as required to understand complex numbers. Receiving an exception
instead of a complex result allows earlier detection of the unexpected complex
number used as a parameter, so that the programmer can determine how and why it
was generated in the first place.
The following functions are provided by this module. Except when explicitly
noted otherwise, all return values are floats.
Return the ceiling of x, the smallest integer greater than or equal to x.
If x is not a float, delegates to x.__ceil__(), which should return an
Integral value.
Return the floor of x, the largest integer less than or equal to x.
If x is not a float, delegates to x.__floor__(), which should return an
Integral value.
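For example:
>>> import math
>>> math.ceil(-0.5)
0
>>> math.floor(-0.5)
-1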
Return fmod(x,y), as defined by the platform C library. Note that the
Python expression x%y may not return the same result. The intent of the C
standard is that fmod(x,y) be exactly (mathematically; to infinite
precision) equal to x-n*y for some integer n such that the result has
the same sign as x and magnitude less than abs(y). Python’s x%y
returns a result with the sign of y instead, and may not be exactly computable
for float arguments. For example, fmod(-1e-100,1e100) is -1e-100, but
the result of Python’s -1e-100%1e100 is 1e100-1e-100, which cannot be
represented exactly as a float, and rounds to the surprising 1e100. For
this reason, function fmod() is generally preferred when working with
floats, while Python’s x%y is preferred when working with integers.
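For example, the case described above:
>>> import math
>>> math.fmod(-1e-100, 1e100)
-1e-100
>>> -1e-100 % 1e100
1e+100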
Return the mantissa and exponent of x as the pair (m,e). m is a float
and e is an integer such that x==m*2**e exactly. If x is zero,
returns (0.0,0), otherwise 0.5<=abs(m)<1. This is used to “pick
apart” the internal representation of a float in a portable way.
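For example:
>>> import math
>>> math.frexp(8.0)     # 8.0 == 0.5 * 2**4
(0.5, 4)
>>> math.frexp(0.0)
(0.0, 0)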
fsum(iterable) returns an accurate floating point sum of the values in
the iterable, avoiding loss of precision by tracking multiple intermediate
partial sums. The algorithm's accuracy depends on IEEE-754 arithmetic
guarantees and the typical case where the rounding mode is half-even. On some
non-Windows builds, the underlying C library uses extended precision addition
and may occasionally double-round an intermediate sum causing it to be off in
its least significant bit.
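For example:
>>> from math import fsum
>>> sum([.1] * 10)
0.9999999999999999
>>> fsum([.1] * 10)
1.0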
Return the Real value x truncated to an Integral (usually
an integer). Delegates to x.__trunc__().
Note that frexp() and modf() have a different call/return pattern
than their C equivalents: they take a single argument and return a pair of
values, rather than returning their second return value through an ‘output
parameter’ (there is no such thing in Python).
For the ceil(), floor(), and modf() functions, note that all
floating-point numbers of sufficiently large magnitude are exact integers.
Python floats typically carry no more than 53 bits of precision (the same as the
platform C double type), in which case any float x with abs(x)>=2**52
necessarily has no fractional bits.
Return e**x-1. For small floats x, the subtraction in exp(x)-1
can result in a significant loss of precision; the expm1()
function provides a way to compute this quantity to full precision:
>>> from math import exp, expm1
>>> exp(1e-5) - 1 # gives result accurate to 11 places
1.0000050000069649e-05
>>> expm1(1e-5) # result accurate to full precision
1.0000050000166668e-05
Return x raised to the power y. Exceptional cases follow
Annex ‘F’ of the C99 standard as far as possible. In particular,
pow(1.0,x) and pow(x,0.0) always return 1.0, even
when x is a zero or a NaN. If both x and y are finite,
x is negative, and y is not an integer then pow(x,y)
is undefined, and raises ValueError.
Return atan(y/x), in radians. The result is between -pi and pi.
The vector in the plane from the origin to point (x,y) makes this angle
with the positive X axis. The point of atan2() is that the signs of both
inputs are known to it, so it can compute the correct quadrant for the angle.
For example, atan(1) and atan2(1,1) are both pi/4, but atan2(-1,-1) is -3*pi/4.
Return the complementary error function at x. The complementary error
function is defined as
1.0-erf(x). It is used for large values of x where a subtraction
from one would cause a loss of significance.
The mathematical constant e = 2.718281..., to available precision.
CPython implementation detail: The math module consists mostly of thin wrappers around the platform C
math library functions. Behavior in exceptional cases follows Annex F of
the C99 standard where appropriate. The current implementation will raise
ValueError for invalid operations like sqrt(-1.0) or log(0.0)
(where C99 Annex F recommends signaling invalid operation or divide-by-zero),
and OverflowError for results that overflow (for example,
exp(1000.0)). A NaN will not be returned from any of the functions
above unless one or more of the input arguments was a NaN; in that case,
most functions will return a NaN, but (again following C99 Annex F) there
are some exceptions to this rule, for example pow(float('nan'),0.0) or
hypot(float('nan'),float('inf')).
Note that Python makes no effort to distinguish signaling NaNs from
quiet NaNs, and behavior for signaling NaNs remains unspecified.
Typical behavior is to treat all NaNs as though they were quiet.
Complex number versions of many of these functions.
cmath — Mathematical functions for complex numbers
This module is always available. It provides access to mathematical functions
for complex numbers. The functions in this module accept integers,
floating-point numbers or complex numbers as arguments. They will also accept
any Python object that has either a __complex__() or a __float__()
method: these methods are used to convert the object to a complex or
floating-point number, respectively, and the function is then applied to the
result of the conversion.
Note
On platforms with hardware and system-level support for signed
zeros, functions involving branch cuts are continuous on both
sides of the branch cut: the sign of the zero distinguishes one
side of the branch cut from the other. On platforms that do not
support signed zeros the continuity is as specified below.
A Python complex number z is stored internally using rectangular
or Cartesian coordinates. It is completely determined by its real
part z.real and its imaginary part z.imag. In other
words:
z == z.real + z.imag*1j
Polar coordinates give an alternative way to represent a complex
number. In polar coordinates, a complex number z is defined by the
modulus r and the phase angle phi. The modulus r is the distance
from z to the origin, while the phase phi is the counterclockwise
angle, measured in radians, from the positive x-axis to the line
segment that joins the origin to z.
The following functions can be used to convert from the native
rectangular coordinates to polar coordinates and back.
Return the phase of x (also known as the argument of x), as a
float. phase(x) is equivalent to math.atan2(x.imag,x.real). The result lies in the range [-π, π], and the branch
cut for this operation lies along the negative real axis,
continuous from above. On systems with support for signed zeros
(which includes most systems in current use), this means that the
sign of the result is the same as the sign of x.imag, even when
x.imag is zero:
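>>> from cmath import phase
>>> phase(complex(-1.0, 0.0))
3.141592653589793
>>> phase(complex(-1.0, -0.0))
-3.141592653589793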
The modulus (absolute value) of a complex number x can be
computed using the built-in abs() function. There is no
separate cmath module function for this operation.
Return the representation of x in polar coordinates. Returns a
pair (r,phi) where r is the modulus of x and phi is the
phase of x. polar(x) is equivalent to (abs(x),phase(x)).
Returns the logarithm of x to the given base. If the base is not
specified, returns the natural logarithm of x. There is one branch cut, from 0
along the negative real axis to -∞, continuous from above.
Return the arc cosine of x. There are two branch cuts: One extends right from
1 along the real axis to ∞, continuous from below. The other extends left from
-1 along the real axis to -∞, continuous from above.
Return the arc tangent of x. There are two branch cuts: One extends from
1j along the imaginary axis to ∞j, continuous from the right. The
other extends from -1j along the imaginary axis to -∞j, continuous
from the left.
Return the hyperbolic arc sine of x. There are two branch cuts:
One extends from 1j along the imaginary axis to ∞j,
continuous from the right. The other extends from -1j along
the imaginary axis to -∞j, continuous from the left.
Return the hyperbolic arc tangent of x. There are two branch cuts: One
extends from 1 along the real axis to ∞, continuous from below. The
other extends from -1 along the real axis to -∞, continuous from
above.
Note that the selection of functions is similar, but not identical, to that in
module math. The reason for having two modules is that some users aren’t
interested in complex numbers, and perhaps don’t even know what they are. They
would rather have math.sqrt(-1) raise an exception than return a complex
number. Also note that the functions defined in cmath always return a
complex number, even if the answer can be expressed as a real number (in which
case the complex number has an imaginary part of zero).
A note on branch cuts: They are curves along which the given function fails to
be continuous. They are a necessary feature of many complex functions. It is
assumed that if you need to compute with complex functions, you will understand
about branch cuts. Consult almost any (not too elementary) book on complex
variables for enlightenment. For information on the proper choice of branch
cuts for numerical purposes, a good reference should be the following:
See also
Kahan, W: Branch cuts for complex elementary functions; or, Much ado about
nothing’s sign bit. In Iserles, A., and Powell, M. (eds.), The state of the art
in numerical analysis. Clarendon Press (1987) pp165-211.
decimal — Decimal fixed point and floating point arithmetic
The decimal module provides support for decimal floating point
arithmetic. It offers several advantages over the float datatype:
Decimal “is based on a floating-point model which was designed with people
in mind, and necessarily has a paramount guiding principle – computers must
provide an arithmetic that works in the same way as the arithmetic that
people learn at school.” – excerpt from the decimal arithmetic specification.
Decimal numbers can be represented exactly. In contrast, numbers like
1.1 and 2.2 do not have exact representations in binary
floating point. End users typically would not expect 1.1+2.2 to display
as 3.3000000000000003 as it does with binary floating point.
The exactness carries over into arithmetic. In decimal floating point, 0.1+0.1+0.1-0.3 is exactly equal to zero. In binary floating point, the result
is 5.5511151231257827e-17. While near to zero, the differences
prevent reliable equality testing and differences can accumulate. For this
reason, decimal is preferred in accounting applications which have strict
equality invariants.
The decimal module incorporates a notion of significant places so that 1.30+1.20 is 2.50. The trailing zero is kept to indicate significance.
This is the customary presentation for monetary applications. For
multiplication, the “schoolbook” approach uses all the figures in the
multiplicands. For instance, 1.3*1.2 gives 1.56 while 1.30*1.20 gives 1.5600.
Unlike hardware based binary floating point, the decimal module has a user
alterable precision (defaulting to 28 places) which can be as large as needed for
a given problem:
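>>> from decimal import *
>>> getcontext().prec = 6
>>> Decimal(1) / Decimal(7)
Decimal('0.142857')
>>> getcontext().prec = 28
>>> Decimal(1) / Decimal(7)
Decimal('0.1428571428571428571428571429')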
Both binary and decimal floating point are implemented in terms of published
standards. While the built-in float type exposes only a modest portion of its
capabilities, the decimal module exposes all required parts of the standard.
When needed, the programmer has full control over rounding and signal handling.
This includes an option to enforce exact arithmetic by using exceptions
to block any inexact operations.
The decimal module was designed to support “without prejudice, both exact
unrounded decimal arithmetic (sometimes called fixed-point arithmetic)
and rounded floating-point arithmetic.” – excerpt from the decimal
arithmetic specification.
The module design is centered around three concepts: the decimal number, the
context for arithmetic, and signals.
A decimal number is immutable. It has a sign, coefficient digits, and an
exponent. To preserve significance, the coefficient digits do not truncate
trailing zeros. Decimals also include special values such as
Infinity, -Infinity, and NaN. The standard also
differentiates -0 from +0.
The context for arithmetic is an environment specifying precision, rounding
rules, limits on exponents, flags indicating the results of operations, and trap
enablers which determine whether signals are treated as exceptions. Rounding
options include ROUND_CEILING, ROUND_DOWN,
ROUND_FLOOR, ROUND_HALF_DOWN, ROUND_HALF_EVEN,
ROUND_HALF_UP, ROUND_UP, and ROUND_05UP.
Signals are groups of exceptional conditions arising during the course of
computation. Depending on the needs of the application, signals may be ignored,
considered as informational, or treated as exceptions. The signals in the
decimal module are: Clamped, InvalidOperation,
DivisionByZero, Inexact, Rounded, Subnormal,
Overflow, and Underflow.
For each signal there is a flag and a trap enabler. When a signal is
encountered, its flag is set to one, then, if the trap enabler is
set to one, an exception is raised. Flags are sticky, so the user needs to
reset them before monitoring a calculation.
The usual start to using decimals is importing the module, viewing the current
context with getcontext() and, if necessary, setting new values for
precision, rounding, or enabled traps:
>>> from decimal import *
>>> getcontext()
Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999999, Emax=999999999,
capitals=1, clamp=0, flags=[], traps=[Overflow, DivisionByZero,
InvalidOperation])
>>> getcontext().prec = 7 # Set a new precision
Decimal instances can be constructed from integers, strings, floats, or tuples.
Construction from an integer or a float performs an exact conversion of the
value of that integer or float. Decimal numbers include special values such as
NaN which stands for “Not a number”, positive and negative
Infinity, and -0.
The significance of a new Decimal is determined solely by the number of digits
input. Context precision and rounding only come into play during arithmetic
operations.
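For example:
>>> getcontext().prec = 28
>>> Decimal(10)
Decimal('10')
>>> Decimal('3.14')
Decimal('3.14')
>>> Decimal(3.14)
Decimal('3.14000000000000012490009027033061087131500244140625')
>>> Decimal((0, (3, 1, 4), -2))
Decimal('3.14')
>>> Decimal(str(2.0 ** 0.5))
Decimal('1.4142135623730951')
>>> Decimal('NaN')
Decimal('NaN')
>>> Decimal('-Infinity')
Decimal('-Infinity')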
The quantize() method rounds a number to a fixed exponent. This method is
useful for monetary applications that often round results to a fixed number of
places:
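>>> Decimal('7.325').quantize(Decimal('.01'), rounding=ROUND_DOWN)
Decimal('7.32')
>>> Decimal('7.325').quantize(Decimal('1.'), rounding=ROUND_UP)
Decimal('8')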
As shown above, the getcontext() function accesses the current context and
allows the settings to be changed. This approach meets the needs of most
applications.
For more advanced work, it may be useful to create alternate contexts using the
Context() constructor. To make an alternate active, use the setcontext()
function.
In accordance with the standard, the decimal module provides two ready-to-use
standard contexts, BasicContext and ExtendedContext. The
former is especially useful for debugging because many of the traps are
enabled:
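>>> setcontext(ExtendedContext)     # no traps enabled
>>> Decimal(42) / Decimal(0)
Decimal('Infinity')
>>> setcontext(BasicContext)        # traps DivisionByZero
>>> Decimal(42) / Decimal(0)
Traceback (most recent call last):
  ...
decimal.DivisionByZero: x / 0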
Contexts also have signal flags for monitoring exceptional conditions
encountered during computations. The flags remain set until explicitly cleared,
so it is best to clear the flags before each set of monitored computations by
using the clear_flags() method.
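For example:
>>> setcontext(ExtendedContext)
>>> getcontext().clear_flags()
>>> Decimal(355) / Decimal(113)
Decimal('3.14159292')
>>> getcontext()
Context(prec=9, rounding=ROUND_HALF_EVEN, Emin=-999999999, Emax=999999999,
        capitals=1, clamp=0, flags=[Inexact, Rounded], traps=[])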
The flags entry shows that the rational approximation to Pi was
rounded (digits beyond the context precision were thrown away) and that the
result is inexact (some of the discarded digits were non-zero).
Individual traps are set using the dictionary in the traps field of a
context:
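>>> setcontext(ExtendedContext)
>>> Decimal(1) / Decimal(0)
Decimal('Infinity')
>>> getcontext().traps[DivisionByZero] = 1
>>> Decimal(1) / Decimal(0)
Traceback (most recent call last):
  ...
decimal.DivisionByZero: x / 0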
Most programs adjust the current context only once, at the beginning of the
program. And, in many applications, data is converted to Decimal with
a single cast inside a loop. With context set and decimals created, the bulk of
the program manipulates the data no differently than with other Python numeric
types.
value can be an integer, string, tuple, float, or another Decimal
object. If no value is given, returns Decimal('0'). If value is a
string, it should conform to the decimal numeric string syntax after leading
and trailing whitespace characters are removed:
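sign           ::=  '+' | '-'
digit          ::=  '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
indicator      ::=  'e' | 'E'
digits         ::=  digit [digit]...
decimal-part   ::=  digits '.' [digits] | ['.'] digits
exponent-part  ::=  indicator [sign] digits
infinity       ::=  'Infinity' | 'Inf'
nan            ::=  'NaN' [digits] | 'sNaN' [digits]
numeric-value  ::=  decimal-part [exponent-part] | infinity
numeric-string ::=  [sign] numeric-value | [sign] nan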
Other Unicode decimal digits are also permitted where digit
appears above. These include decimal digits from various other
alphabets (for example, Arabic-Indic and Devanāgarī digits) along
with the fullwidth digits '\uff10' through '\uff19'.
If value is a tuple, it should have three components, a sign
(0 for positive or 1 for negative), a tuple of
digits, and an integer exponent. For example, Decimal((0,(1,4,1,4),-3))
returns Decimal('1.414').
If value is a float, the binary floating point value is losslessly
converted to its exact decimal equivalent. This conversion can often require
53 or more digits of precision. For example, Decimal(float('1.1'))
converts to
Decimal('1.100000000000000088817841970012523233890533447265625').
The context precision does not affect how many digits are stored. That is
determined exclusively by the number of digits in value. For example,
Decimal('3.00000') records all five zeros even if the context precision is
only three.
The purpose of the context argument is determining what to do if value is a
malformed string. If the context traps InvalidOperation, an exception
is raised; otherwise, the constructor returns a new Decimal with the value of
NaN.
Changed in version 3.2: The argument to the constructor is now permitted to be a float
instance.
Decimal floating point objects share many properties with the other built-in
numeric types such as float and int. All of the usual math
operations and special methods apply. Likewise, decimal objects can be
copied, pickled, printed, used as dictionary keys, used as set elements,
compared, sorted, and coerced to another type (such as float or
int).
Decimal objects cannot generally be combined with floats or
instances of fractions.Fraction in arithmetic operations:
an attempt to add a Decimal to a float, for
example, will raise a TypeError. However, it is possible to
use Python’s comparison operators to compare a Decimal
instance x with another number y. This avoids confusing results
when doing equality comparisons between numbers of different types.
Changed in version 3.2: Mixed-type comparisons between Decimal instances and other
numeric types are now fully supported.
In addition to the standard numeric properties, decimal floating point
objects also have a number of specialized methods:
Return the adjusted exponent after shifting out the coefficient’s
rightmost digits until only the lead digit remains:
Decimal('321e+5').adjusted() returns seven. Used for determining the
position of the most significant digit with respect to the decimal point.
Return the canonical encoding of the argument. Currently, the encoding of
a Decimal instance is always canonical, so this operation returns
its argument unchanged.
This operation is identical to the compare() method, except that all
NaNs signal. That is, if neither operand is a signaling NaN then any
quiet NaN operand is treated as though it were a signaling NaN.
Compare two operands using their abstract representation rather than their
numerical value. Similar to the compare() method, but the result
gives a total ordering on Decimal instances. Two
Decimal instances with the same numeric value but different
representations compare unequal in this ordering:
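>>> Decimal('12.0').compare_total(Decimal('12'))
Decimal('-1')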
Quiet and signaling NaNs are also included in the total ordering. The
result of this function is Decimal('0') if both operands have the same
representation, Decimal('-1') if the first operand is lower in the
total order than the second, and Decimal('1') if the first operand is
higher in the total order than the second operand. See the specification
for details of the total order.
Compare two operands using their abstract representation rather than their
value as in compare_total(), but ignoring the sign of each operand.
x.compare_total_mag(y) is equivalent to
x.copy_abs().compare_total(y.copy_abs()).
Return the absolute value of the argument. This operation is unaffected
by the context and is quiet: no flags are changed and no rounding is
performed.
Return the value of the (natural) exponential function e**x at the
given number. The result is correctly rounded using the
ROUND_HALF_EVEN rounding mode.
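For example, with the default context precision of 28:
>>> Decimal(1).exp()
Decimal('2.718281828459045235360287471')
>>> Decimal(321).exp()
Decimal('2.561702493119680037517373933E+139')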
Classmethod that converts a float to a decimal number, exactly.
Note that Decimal.from_float(0.1) is not the same as Decimal('0.1').
Since 0.1 is not exactly representable in binary floating point, the
value is stored as the nearest representable value which is
0x1.999999999999ap-4. That equivalent value in decimal is
0.1000000000000000055511151231257827021181583404541015625.
Note
From Python 3.2 onwards, a Decimal instance
can also be constructed directly from a float.
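For example:
>>> Decimal.from_float(0.1)
Decimal('0.1000000000000000055511151231257827021181583404541015625')
>>> Decimal.from_float(float('nan'))
Decimal('NaN')
>>> Decimal.from_float(float('inf'))
Decimal('Infinity')
>>> Decimal.from_float(float('-inf'))
Decimal('-Infinity')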
Return True if the argument is canonical and False
otherwise. Currently, a Decimal instance is always canonical, so
this operation always returns True.
For a nonzero number, return the adjusted exponent of its operand as a
Decimal instance. If the operand is a zero then
Decimal('-Infinity') is returned and the DivisionByZero flag
is raised. If the operand is an infinity then Decimal('Infinity') is
returned.
logical_xor() is a logical operation which takes two logical
operands (see Logical operands). The result is the
digit-wise exclusive or of the two operands.
Like max(self,other) except that the context rounding rule is applied
before returning and that NaN values are either signaled or
ignored (depending on the context and whether they are signaling or
quiet).
Like min(self,other) except that the context rounding rule is applied
before returning and that NaN values are either signaled or
ignored (depending on the context and whether they are signaling or
quiet).
Return the largest number representable in the given context (or in the
current thread’s context if no context is given) that is smaller than the
given operand.
Return the smallest number representable in the given context (or in the
current thread’s context if no context is given) that is larger than the
given operand.
If the two operands are unequal, return the number closest to the first
operand in the direction of the second operand. If both operands are
numerically equal, return a copy of the first operand with the sign set to
be the same as the sign of the second operand.
Normalize the number by stripping the rightmost trailing zeros and
converting any result equal to Decimal('0') to
Decimal('0e0'). Used for producing canonical values for attributes
of an equivalence class. For example, Decimal('32.100') and
Decimal('0.321000e+2') both normalize to the equivalent value
Decimal('32.1').
Unlike other operations, if the length of the coefficient after the
quantize operation would be greater than precision, then an
InvalidOperation is signaled. This guarantees that, unless there
is an error condition, the quantized exponent is always equal to that of
the right-hand operand.
Also unlike other operations, quantize never signals Underflow, even if
the result is subnormal and inexact.
If the exponent of the second operand is larger than that of the first
then rounding may be necessary. In this case, the rounding mode is
determined by the rounding argument if given, else by the given
context argument; if neither argument is given the rounding mode of
the current thread’s context is used.
If watchexp is set (default), then an error is returned whenever the
resulting exponent is greater than Emax or less than
Etiny.
Compute the modulo as either a positive or negative value depending on
which is closest to zero. For instance, Decimal(10).remainder_near(6)
returns Decimal('-2') which is closer to zero than Decimal('4').
If both are equally close, the one chosen will have the same sign as
self.
Return the result of rotating the digits of the first operand by an amount
specified by the second operand. The second operand must be an integer in
the range -precision through precision. The absolute value of the second
operand gives the number of places to rotate. If the second operand is
positive then rotation is to the left; otherwise rotation is to the right.
The coefficient of the first operand is padded on the left with zeros to
length precision if necessary. The sign and exponent of the first operand
are unchanged.
Return the first operand with exponent adjusted by the second.
Equivalently, return the first operand multiplied by 10**other. The
second operand must be an integer.
Return the result of shifting the digits of the first operand by an amount
specified by the second operand. The second operand must be an integer in
the range -precision through precision. The absolute value of the second
operand gives the number of places to shift. If the second operand is
positive then the shift is to the left; otherwise the shift is to the
right. Digits shifted into the coefficient are zeros. The sign and
exponent of the first operand are unchanged.
Engineering notation has an exponent which is a multiple of 3, so there
are up to 3 digits left of the decimal place. For example, it converts
Decimal('123E+1') to Decimal('1.23E+3').
Round to the nearest integer, signaling Inexact or
Rounded as appropriate if rounding occurs. The rounding mode is
determined by the rounding parameter if given, else by the given
context. If neither parameter is given then the rounding mode of the
current context is used.
Round to the nearest integer without signaling Inexact or
Rounded. If rounding is given, it is applied; otherwise the
rounding method of either the supplied context or the current context is used.
The logical_and(), logical_invert(), logical_or(),
and logical_xor() methods expect their arguments to be logical
operands. A logical operand is a Decimal instance whose
exponent and sign are both zero, and whose digits are all either
0 or 1.
Contexts are environments for arithmetic operations. They govern precision, set
rules for rounding, determine which signals are treated as exceptions, and limit
the range for exponents.
Each thread has its own current context which is accessed or changed using the
getcontext() and setcontext() functions.
Return a context manager that will set the current context for the active thread
to a copy of c on entry to the with-statement and restore the previous context
when exiting the with-statement. If no context is specified, a copy of the
current context is used.
For example, the following code sets the current decimal precision to 42 places,
performs a calculation, and then automatically restores the previous context:
from decimal import localcontext

with localcontext() as ctx:
    ctx.prec = 42              # Perform a high precision calculation
    s = calculate_something()
s = +s                         # Round the final result back to the default precision
New contexts can also be created using the Context constructor
described below. In addition, the module provides three pre-made contexts:
This is a standard context defined by the General Decimal Arithmetic
Specification. Precision is set to nine. Rounding is set to
ROUND_HALF_UP. All flags are cleared. All traps are enabled (treated
as exceptions) except Inexact, Rounded, and
Subnormal.
Because many of the traps are enabled, this context is useful for debugging.
This is a standard context defined by the General Decimal Arithmetic
Specification. Precision is set to nine. Rounding is set to
ROUND_HALF_EVEN. All flags are cleared. No traps are enabled (so that
exceptions are not raised during computations).
Because the traps are disabled, this context is useful for applications that
prefer to have result value of NaN or Infinity instead of
raising exceptions. This allows an application to complete a run in the
presence of conditions that would otherwise halt the program.
This context is used by the Context constructor as a prototype for new
contexts. Changing a field (such as precision) has the effect of changing the
default for new contexts created by the Context constructor.
This context is most useful in multi-threaded environments. Changing one of the
fields before threads are started has the effect of setting system-wide
defaults. Changing the fields after threads have started is not recommended as
it would require thread synchronization to prevent race conditions.
In single threaded environments, it is preferable to not use this context at
all. Instead, simply create contexts explicitly as described below.
The default values are precision=28, rounding=ROUND_HALF_EVEN, and enabled traps
for Overflow, InvalidOperation, and DivisionByZero.
In addition to the three supplied contexts, new contexts can be created with the
Context constructor.
class decimal.Context(prec=None, rounding=None, traps=None, flags=None, Emin=None, Emax=None, capitals=None, clamp=None)
Creates a new context. If a field is not specified or is None, the
default values are copied from the DefaultContext. If the flags
field is not specified or is None, all flags are cleared.
The prec field is a positive integer that sets the precision for arithmetic
operations in the context.
The rounding option is one of:
ROUND_CEILING (towards Infinity),
ROUND_DOWN (towards zero),
ROUND_FLOOR (towards -Infinity),
ROUND_HALF_DOWN (to nearest with ties going towards zero),
ROUND_HALF_EVEN (to nearest with ties going to nearest even integer),
ROUND_HALF_UP (to nearest with ties going away from zero),
ROUND_UP (away from zero), or
ROUND_05UP (away from zero if last digit after rounding towards zero
would have been 0 or 5; otherwise towards zero).
The traps and flags fields list any signals to be set. Generally, new
contexts should only set traps and leave the flags clear.
The Emin and Emax fields are integers specifying the outer limits allowable
for exponents.
The capitals field is either 0 or 1 (the default). If set to
1, exponents are printed with a capital E; otherwise, a
lowercase e is used: Decimal('6.02e+23').
The clamp field is either 0 (the default) or 1.
If set to 1, the exponent e of a Decimal
instance representable in this context is strictly limited to the
range Emin-prec+1<=e<=Emax-prec+1. If clamp is
0 then a weaker condition holds: the adjusted exponent of
the Decimal instance is at most Emax. When clamp is
1, a large normal number will, where possible, have its
exponent reduced and a corresponding number of zeros added to its
coefficient, in order to fit the exponent constraints; this
preserves the value of the number but loses information about
significant trailing zeros. For example:
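>>> Context(prec=6, Emax=999, clamp=1).create_decimal('1.23e999')
Decimal('1.23000E+999')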
A clamp value of 1 allows compatibility with the
fixed-width decimal interchange formats specified in IEEE 754.
The Context class defines several general purpose methods as well as
a large number of methods for doing arithmetic directly in a given context.
In addition, for each of the Decimal methods described above (with
the exception of the adjusted() and as_tuple() methods) there is
a corresponding Context method. For example, for a Context
instance C and Decimal instance x, C.exp(x) is
equivalent to x.exp(context=C). Each Context method accepts a
Python integer (an instance of int) anywhere that a
Decimal instance is accepted.
Creates a new Decimal instance from num but using self as
context. Unlike the Decimal constructor, the context precision,
rounding method, flags, and traps are applied to the conversion.
This is useful because constants are often given to a greater precision
than is needed by the application. Another benefit is that rounding
immediately eliminates unintended effects from digits beyond the current
precision. In the following example, using unrounded inputs means that
adding zero to a sum can change the result:
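>>> getcontext().prec = 3
>>> Decimal('3.4445') + Decimal('1.0023')
Decimal('4.45')
>>> Decimal('3.4445') + Decimal(0) + Decimal('1.0023')
Decimal('4.44')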
Creates a new Decimal instance from a float f but rounding using self
as the context. Unlike the Decimal.from_float() class method,
the context precision, rounding method, flags, and traps are applied to
the conversion.
The usual approach to working with decimals is to create Decimal
instances and then apply arithmetic operations which take place within the
current context for the active thread. An alternative approach is to use
context methods for calculating within a specific context. The methods are
similar to those for the Decimal class and are only briefly
recounted here.
Plus corresponds to the unary prefix plus operator in Python. This
operation applies the context precision and rounding, so it is not an
identity operation.
Return x to the power of y, reduced modulo modulo if given.
With two arguments, compute x**y. If x is negative then y
must be integral. The result will be inexact unless y is integral and
the result is finite and can be expressed exactly in ‘precision’ digits.
The result should always be correctly rounded, using the rounding mode of
the current thread’s context.
With three arguments, compute (x**y)%modulo. For the three argument
form, the following restrictions on the arguments hold:
all three arguments must be integral
y must be nonnegative
at least one of x or y must be nonzero
modulo must be nonzero and have at most ‘precision’ digits
The value resulting from Context.power(x,y,modulo) is
equal to the value that would be obtained by computing (x**y)%modulo with unbounded precision, but is computed more
efficiently. The exponent of the result is zero, regardless of
the exponents of x, y and modulo. The result is
always exact.
Signals represent conditions that arise during computation. Each corresponds to
one context flag and one context trap enabler.
The context flag is set whenever the condition is encountered. After the
computation, flags may be checked for informational purposes (for instance, to
determine whether a computation was exact). After checking the flags, be sure to
clear all flags before starting the next computation.
If the context’s trap enabler is set for the signal, then the condition causes a
Python exception to be raised. For example, if the DivisionByZero trap
is set, then a DivisionByZero exception is raised upon encountering the
condition.
Altered an exponent to fit representation constraints.
Typically, clamping occurs when an exponent falls outside the context’s
Emin and Emax limits. If possible, the exponent is reduced to
fit by adding zeros to the coefficient.
Signals the division of a non-infinite number by zero.
Can occur with division, modulo division, or when raising a number to a negative
power. If this signal is not trapped, returns Infinity or
-Infinity with the sign determined by the inputs to the calculation.
Indicates that rounding occurred and the result is not exact.
Signals when non-zero digits were discarded during rounding. The rounded result
is returned. The signal flag or trap is used to detect when results are
inexact.
Indicates the exponent is larger than Emax after rounding has
occurred. If not trapped, the result depends on the rounding mode, either
pulling inward to the largest representable finite number or rounding outward
to Infinity. In either case, Inexact and Rounded
are also signaled.
Rounding occurred though possibly no information was lost.
Signaled whenever rounding discards digits; even if those digits are zero
(such as rounding 5.00 to 5.0). If not trapped, returns
the result unchanged. This signal is used to detect loss of significant
digits.
Mitigating round-off error with increased precision
The use of decimal floating point eliminates decimal representation error
(making it possible to represent 0.1 exactly); however, some operations
can still incur round-off error when non-zero digits exceed the fixed precision.
The effects of round-off error can be amplified by the addition or subtraction
of nearly offsetting quantities resulting in loss of significance. Knuth
provides two instructive examples where rounded floating point arithmetic with
insufficient precision causes the breakdown of the associative and distributive
properties of addition:
# Examples from Seminumerical Algorithms, Section 4.2.2.
>>> from decimal import Decimal, getcontext
>>> getcontext().prec = 8
>>> u, v, w = Decimal(11111113), Decimal(-11111111), Decimal('7.51111111')
>>> (u + v) + w
Decimal('9.5111111')
>>> u + (v + w)
Decimal('10')
>>> u, v, w = Decimal(20000), Decimal(-6), Decimal('6.0000003')
>>> (u*v) + (u*w)
Decimal('0.01')
>>> u * (v+w)
Decimal('0.0060000')
The decimal module makes it possible to restore the identities by
expanding the precision sufficiently to avoid loss of significance:
>>> getcontext().prec = 20
>>> u, v, w = Decimal(11111113), Decimal(-11111111), Decimal('7.51111111')
>>> (u + v) + w
Decimal('9.51111111')
>>> u + (v + w)
Decimal('9.51111111')
>>>
>>> u, v, w = Decimal(20000), Decimal(-6), Decimal('6.0000003')
>>> (u*v) + (u*w)
Decimal('0.0060000')
>>> u * (v+w)
Decimal('0.0060000')
The number system for the decimal module provides special values
including NaN, sNaN, -Infinity, Infinity,
and two zeros, +0 and -0.
Infinities can be constructed directly with: Decimal('Infinity'). Also,
they can arise from dividing by zero when the DivisionByZero signal is
not trapped. Likewise, when the Overflow signal is not trapped, infinity
can result from rounding beyond the limits of the largest representable number.
The infinities are signed (affine) and can be used in arithmetic operations
where they get treated as very large, indeterminate numbers. For instance,
adding a constant to infinity gives another infinite result.
Some operations are indeterminate and return NaN, or if the
InvalidOperation signal is trapped, raise an exception. For example,
0/0 returns NaN which means “not a number”. This variety of
NaN is quiet and, once created, will flow through other computations
always resulting in another NaN. This behavior can be useful for a
series of computations that occasionally have missing inputs — it allows the
calculation to proceed while flagging specific results as invalid.
A variant is sNaN which signals rather than remaining quiet after every
operation. This is a useful return value when an invalid result needs to
interrupt a calculation for special handling.
The behavior of Python’s comparison operators can be a little surprising where a
NaN is involved. A test for equality where one of the operands is a
quiet or signaling NaN always returns False (even when doing
Decimal('NaN')==Decimal('NaN')), while a test for inequality always returns
True. An attempt to compare two Decimals using any of the <,
<=, > or >= operators will raise the InvalidOperation signal
if either operand is a NaN, and return False if this signal is
not trapped. Note that the General Decimal Arithmetic specification does not
specify the behavior of direct comparisons; these rules for comparisons
involving a NaN were taken from the IEEE 854 standard (see Table 3 in
section 5.7). To ensure strict standards-compliance, use the compare()
and compare_signal() methods instead.
The signed zeros can result from calculations that underflow. They keep the sign
that would have resulted if the calculation had been carried out to greater
precision. Since their magnitude is zero, both positive and negative zeros are
treated as equal and their sign is informational.
In addition to the two signed zeros which are distinct yet equal, there are
various representations of zero with differing precisions yet equivalent in
value. This takes a bit of getting used to. For an eye accustomed to
normalized floating point representations, it is not immediately obvious that
the following calculation returns a value equal to zero:
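>>> 1 / Decimal('Infinity')      # exponent is Etiny for the default context
Decimal('0E-1000000026')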
The getcontext() function accesses a different Context object for
each thread. Having separate thread contexts means that threads may make
changes (such as getcontext().prec=10) without interfering with other threads.
Likewise, the setcontext() function automatically assigns its target to
the current thread.
If setcontext() has not been called before getcontext(), then
getcontext() will automatically create a new context for use in the
current thread.
The new context is copied from a prototype context called DefaultContext. To
control the defaults so that each thread will use the same values throughout the
application, directly modify the DefaultContext object. This should be done
before any threads are started so that there won’t be a race condition between
threads calling getcontext(). For example:
# Set applicationwide defaults for all threads about to be launched
DefaultContext.prec = 12
DefaultContext.rounding = ROUND_DOWN
DefaultContext.traps = ExtendedContext.traps.copy()
DefaultContext.traps[InvalidOperation] = 1
setcontext(DefaultContext)
# Afterwards, the threads can be started
t1.start()
t2.start()
t3.start()
. . .
Here are a few recipes that serve as utility functions and that demonstrate ways
to work with the Decimal class:
def moneyfmt(value, places=2, curr='', sep=',', dp='.',
             pos='', neg='-', trailneg=''):
    """Convert Decimal to a money formatted string.

    places:  required number of places after the decimal point
    curr:    optional currency symbol before the sign (may be blank)
    sep:     optional grouping separator (comma, period, space, or blank)
    dp:      decimal point indicator (comma or period)
             only specify as blank when places is zero
    pos:     optional sign for positive numbers: '+', space or blank
    neg:     optional sign for negative numbers: '-', '(', space or blank
    trailneg:optional trailing minus indicator:  '-', ')', space or blank

    >>> d = Decimal('-1234567.8901')
    >>> moneyfmt(d, curr='$')
    '-$1,234,567.89'
    >>> moneyfmt(d, places=0, sep='.', dp='', neg='', trailneg='-')
    '1.234.568-'
    >>> moneyfmt(d, curr='$', neg='(', trailneg=')')
    '($1,234,567.89)'
    >>> moneyfmt(Decimal(123456789), sep=' ')
    '123 456 789.00'
    >>> moneyfmt(Decimal('-0.02'), neg='<', trailneg='>')
    '<0.02>'

    """
    q = Decimal(10) ** -places      # 2 places --> '0.01'
    sign, digits, exp = value.quantize(q).as_tuple()
    result = []
    digits = list(map(str, digits))
    build, next = result.append, digits.pop
    if sign:
        build(trailneg)
    for i in range(places):
        build(next() if digits else '0')
    if places:
        build(dp)
    if not digits:
        build('0')
    i = 0
    while digits:
        build(next())
        i += 1
        if i == 3 and digits:
            i = 0
            build(sep)
    build(curr)
    build(neg if sign else pos)
    return ''.join(reversed(result))
def pi():
    """Compute Pi to the current precision.

    >>> print(pi())
    3.141592653589793238462643383

    """
    getcontext().prec += 2  # extra digits for intermediate steps
    three = Decimal(3)      # substitute "three=3.0" for regular floats
    lasts, t, s, n, na, d, da = 0, three, 3, 1, 0, 0, 24
    while s != lasts:
        lasts = s
        n, na = n+na, na+8
        d, da = d+da, da+32
        t = (t * n) / d
        s += t
    getcontext().prec -= 2
    return +s               # unary plus applies the new precision
def exp(x):
    """Return e raised to the power of x.  Result type matches input type.

    >>> print(exp(Decimal(1)))
    2.718281828459045235360287471
    >>> print(exp(Decimal(2)))
    7.389056098930650227230427461
    >>> print(exp(2.0))
    7.38905609893
    >>> print(exp(2+0j))
    (7.38905609893+0j)

    """
    getcontext().prec += 2
    i, lasts, s, fact, num = 0, 0, 1, 1, 1
    while s != lasts:
        lasts = s
        i += 1
        fact *= i
        num *= x
        s += num / fact
    getcontext().prec -= 2
    return +s
def cos(x):
    """Return the cosine of x as measured in radians.

    The Taylor series approximation works best for a small value of x.
    For larger values, first compute x = x % (2 * pi).

    >>> print(cos(Decimal('0.5')))
    0.8775825618903727161162815826
    >>> print(cos(0.5))
    0.87758256189
    >>> print(cos(0.5+0j))
    (0.87758256189+0j)

    """
    getcontext().prec += 2
    i, lasts, s, fact, num, sign = 0, 0, 1, 1, 1, 1
    while s != lasts:
        lasts = s
        i += 2
        fact *= i * (i-1)
        num *= x * x
        sign *= -1
        s += num / fact * sign
    getcontext().prec -= 2
    return +s
def sin(x):
    """Return the sine of x as measured in radians.

    The Taylor series approximation works best for a small value of x.
    For larger values, first compute x = x % (2 * pi).

    >>> print(sin(Decimal('0.5')))
    0.4794255386042030002732879352
    >>> print(sin(0.5))
    0.479425538604
    >>> print(sin(0.5+0j))
    (0.479425538604+0j)

    """
    getcontext().prec += 2
    i, lasts, s, fact, num, sign = 1, 0, x, 1, x, 1
    while s != lasts:
        lasts = s
        i += 2
        fact *= i * (i-1)
        num *= x * x
        sign *= -1
        s += num / fact * sign
    getcontext().prec -= 2
    return +s
Q. It is cumbersome to type decimal.Decimal('1234.5'). Is there a way to
minimize typing when using the interactive interpreter?
A. Some users abbreviate the constructor to just a single letter:
>>> D = decimal.Decimal
>>> D('1.23') + D('3.45')
Decimal('4.68')
Q. In a fixed-point application with two decimal places, some inputs have many
places and need to be rounded. Others are not supposed to have excess digits
and need to be validated. What methods should be used?
A. The quantize() method rounds to a fixed number of decimal places. If
the Inexact trap is set, it is also useful for validation:
>>> TWOPLACES = Decimal(10) ** -2 # same as Decimal('0.01')
>>> # Round to two places
>>> Decimal('3.214').quantize(TWOPLACES)
Decimal('3.21')
>>> # Validate that a number does not exceed two places
>>> Decimal('3.21').quantize(TWOPLACES, context=Context(traps=[Inexact]))
Decimal('3.21')
Q. Once I have valid two place inputs, how do I maintain that invariant
throughout an application?
A. Some operations like addition, subtraction, and multiplication by an integer
will automatically preserve fixed point. Other operations, like division and
non-integer multiplication, will change the number of decimal places and need to
be followed up with a quantize() step:
>>> a = Decimal('102.72') # Initial fixed-point values
>>> b = Decimal('3.17')
>>> a + b # Addition preserves fixed-point
Decimal('105.89')
>>> a - b
Decimal('99.55')
>>> a * 42 # So does integer multiplication
Decimal('4314.24')
>>> (a * b).quantize(TWOPLACES) # Must quantize non-integer multiplication
Decimal('325.62')
>>> (b / a).quantize(TWOPLACES) # And quantize division
Decimal('0.03')
In developing fixed-point applications, it is convenient to define functions
to handle the quantize() step:
>>> def mul(x, y, fp=TWOPLACES):
...     return (x * y).quantize(fp)
>>> def div(x, y, fp=TWOPLACES):
...     return (x / y).quantize(fp)
>>> mul(a, b) # Automatically preserve fixed-point
Decimal('325.62')
>>> div(b, a)
Decimal('0.03')
Q. There are many ways to express the same value. The numbers 200,
200.000, 2E2, and .02E+4 all have the same value at
various precisions. Is there a way to transform them to a single recognizable
canonical value?
A. The normalize() method maps all equivalent values to a single
representative:
>>> values = map(Decimal, '200 200.000 2E2 .02E+4'.split())
>>> [v.normalize() for v in values]
[Decimal('2E+2'), Decimal('2E+2'), Decimal('2E+2'), Decimal('2E+2')]
Q. Some decimal values always print with exponential notation. Is there a way
to get a non-exponential representation?
A. For some values, exponential notation is the only way to express the number
of significant places in the coefficient. For example, expressing
5.0E+3 as 5000 keeps the value constant but cannot show the
original’s two-place significance.
If an application does not care about tracking significance, it is easy to
remove the exponent and trailing zeroes, losing significance, but keeping the
value unchanged:
>>> def remove_exponent(d):
...     return d.quantize(Decimal(1)) if d == d.to_integral() else d.normalize()
Q. Is there a way to convert a regular float to a Decimal?
A. Yes, any binary floating point number can be exactly expressed as a
Decimal though an exact conversion may take more precision than intuition would
suggest:
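>>> import math
>>> Decimal(math.pi)
Decimal('3.141592653589793115997963468544185161590576171875')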
Q. Within a complex calculation, how can I make sure that I haven’t gotten a
spurious result because of insufficient precision or rounding anomalies?
A. The decimal module makes it easy to test results. A best practice is to
re-run calculations using greater precision and with various rounding modes.
Widely differing results indicate insufficient precision, rounding mode issues,
ill-conditioned inputs, or a numerically unstable algorithm.
Q. I noticed that context precision is applied to the results of operations but
not to the inputs. Is there anything to watch out for when mixing values of
different precisions?
A. Yes. The principle is that all values are considered to be exact and so is
the arithmetic on those values. Only the results are rounded. The advantage
for inputs is that “what you type is what you get”. A disadvantage is that the
results can look odd if you forget that the inputs haven’t been rounded:
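>>> getcontext().prec = 3
>>> Decimal('3.104') + Decimal('2.104')
Decimal('5.21')
>>> Decimal('3.104') + Decimal('0.000') + Decimal('2.104')
Decimal('5.20')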
The fractions module provides support for rational number arithmetic.
A Fraction instance can be constructed from a pair of integers, from
another rational number, or from a string.
class fractions.Fraction(numerator=0, denominator=1)
class fractions.Fraction(other_fraction)
class fractions.Fraction(float)
class fractions.Fraction(decimal)
class fractions.Fraction(string)
The first version requires that numerator and denominator are instances
of numbers.Rational and returns a new Fraction instance
with value numerator/denominator. If denominator is 0, it
raises a ZeroDivisionError. The second version requires that
other_fraction is an instance of numbers.Rational and returns a
Fraction instance with the same value. The next two versions accept
either a float or a decimal.Decimal instance, and return a
Fraction instance with exactly the same value. Note that due to the
usual issues with binary floating-point (see Floating Point Arithmetic: Issues and Limitations), the
argument to Fraction(1.1) is not exactly equal to 11/10, and so
Fraction(1.1) does not return Fraction(11,10) as one might expect.
(But see the documentation for the limit_denominator() method below.)
The last version of the constructor expects a string or unicode instance.
The usual form for this instance is:
[sign] numerator ['/' denominator]
where the optional sign may be either '+' or '-' and
numerator and denominator (if present) are strings of
decimal digits. In addition, any string that represents a finite
value and is accepted by the float constructor is also
accepted by the Fraction constructor. In either form the
input string may also have leading and/or trailing whitespace.
Here are some examples:
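>>> from fractions import Fraction
>>> Fraction(16, -10)
Fraction(-8, 5)
>>> Fraction(123)
Fraction(123, 1)
>>> Fraction('3/7')
Fraction(3, 7)
>>> Fraction(' -3/7 ')
Fraction(-3, 7)
>>> Fraction('-.125')
Fraction(-1, 8)
>>> Fraction(2.25)
Fraction(9, 4)
>>> Fraction(1.1)
Fraction(2476979795053773, 2251799813685248)
>>> from decimal import Decimal
>>> Fraction(Decimal('1.1'))
Fraction(11, 10)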
The Fraction class inherits from the abstract base class
numbers.Rational, and implements all of the methods and
operations from that class. Fraction instances are hashable,
and should be treated as immutable. In addition,
Fraction has the following methods:
This class method constructs a Fraction representing the exact
value of flt, which must be a float. Beware that
Fraction.from_float(0.3) is not the same value as Fraction(3,10).
Note
From Python 3.2 onwards, you can also construct a
Fraction instance directly from a float.
Finds and returns the closest Fraction to self that has
denominator at most max_denominator. This method is useful for finding
rational approximations to a given floating-point number:
>>> from fractions import Fraction
>>> Fraction('3.1415926535897932').limit_denominator(1000)
Fraction(355, 113)
or for recovering a rational number that’s represented as a float:
>>> from math import pi, cos
>>> Fraction(cos(pi/3))
Fraction(4503599627370497, 9007199254740992)
>>> Fraction(cos(pi/3)).limit_denominator()
Fraction(1, 2)
>>> Fraction(1.1).limit_denominator()
Fraction(11, 10)
The first version returns the nearest int to self,
rounding half to even. The second version rounds self to the
nearest multiple of Fraction(1,10**ndigits) (logically, if
ndigits is negative), again rounding half toward even. This
method can also be accessed through the round() function.
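For example (a sketch of the half-even behavior described above):
>>> from fractions import Fraction
>>> round(Fraction(3, 2))       # 1.5 rounds half to even
2
>>> round(Fraction(7, 3), 2)    # nearest multiple of Fraction(1, 100)
Fraction(233, 100)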
Return the greatest common divisor of the integers a and b. If either
a or b is nonzero, then the absolute value of gcd(a,b) is the
largest integer that divides both a and b. gcd(a,b) has the same
sign as b if b is nonzero; otherwise it takes the sign of a. gcd(0,0) returns 0.
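For example:
>>> from fractions import gcd
>>> gcd(12, 8)
4
>>> gcd(12, -8)      # result takes the sign of b
-4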
This module implements pseudo-random number generators for various
distributions.
For integers, there is uniform selection from a range. For sequences, there is
uniform selection of a random element, a function to generate a random
permutation of a list in-place, and a function for random sampling without
replacement.
On the real line, there are functions to compute uniform, normal (Gaussian),
lognormal, negative exponential, gamma, and beta distributions. For generating
distributions of angles, the von Mises distribution is available.
Almost all module functions depend on the basic function random(), which
generates a random float uniformly in the semi-open range [0.0, 1.0). Python
uses the Mersenne Twister as the core generator. It produces 53-bit precision
floats and has a period of 2**19937-1. The underlying implementation in C is
both fast and threadsafe. The Mersenne Twister is one of the most extensively
tested random number generators in existence. However, being completely
deterministic, it is not suitable for all purposes, and is completely unsuitable
for cryptographic purposes.
The functions supplied by this module are actually bound methods of a hidden
instance of the random.Random class. You can instantiate your own
instances of Random to get generators that don’t share state.
Class Random can also be subclassed if you want to use a different
basic generator of your own devising: in that case, override the random(),
seed(), getstate(), and setstate() methods.
Optionally, a new generator can supply a getrandbits() method — this
allows randrange() to produce selections over an arbitrarily large range.
The random module also provides the SystemRandom class which
uses the system function os.urandom() to generate random numbers
from sources provided by the operating system.
If x is omitted or None, the current system time is used. If
randomness sources are provided by the operating system, they are used
instead of the system time (see the os.urandom() function for details
on availability).
If x is an int, it is used directly.
With version 2 (the default), a str, bytes, or bytearray
object gets converted to an int and all of its bits are used. With version 1,
the hash() of x is used instead.
Changed in version 3.2: Moved to the version 2 scheme which uses all of the bits in a string seed.
state should have been obtained from a previous call to getstate(), and
setstate() restores the internal state of the generator to what it was at
the time getstate() was called.
Returns a Python integer with k random bits. This method is supplied with
the MersenneTwister generator and some other generators may also provide it
as an optional part of the API. When available, getrandbits() enables
randrange() to handle arbitrarily large ranges.
Return a randomly selected element from range(start, stop, step). This is
equivalent to choice(range(start, stop, step)), but doesn’t actually build a
range object.
The positional argument pattern matches that of range(). Keyword arguments
should not be used because the function may use them in unexpected ways.
Changed in version 3.2: randrange() is more sophisticated about producing equally distributed
values. Formerly it used a style like int(random()*n) which could produce
slightly uneven distributions.
Shuffle the sequence x in place. The optional argument random is a
0-argument function returning a random float in [0.0, 1.0); by default, this is
the function random().
Note that for even rather small len(x), the total number of permutations of
x is larger than the period of most random number generators; this implies
that most permutations of a long sequence can never be generated.
Return a k length list of unique elements chosen from the population sequence
or set. Used for random sampling without replacement.
Returns a new list containing elements from the population while leaving the
original population unchanged. The resulting list is in selection order so that
all sub-slices will also be valid random samples. This allows raffle winners
(the sample) to be partitioned into grand prize and second place winners (the
subslices).
Members of the population need not be hashable or unique. If the population
contains repeats, then each occurrence is a possible selection in the sample.
To choose a sample from a range of integers, use a range() object as an
argument. This is especially fast and space efficient for sampling from a large
population: sample(range(10000000), 60).
The following functions generate specific real-valued distributions. Function
parameters are named after the corresponding variables in the distribution’s
equation, as used in common mathematical practice; most of these equations can
be found in any statistics text.
Return a random floating point number N such that low <= N <= high and
with the specified mode between those bounds. The low and high bounds
default to zero and one. The mode argument defaults to the midpoint
between the bounds, giving a symmetric distribution.
Exponential distribution. lambd is 1.0 divided by the desired
mean. It should be nonzero. (The parameter would be called
“lambda”, but that is a reserved word in Python.) Returned values
range from 0 to positive infinity if lambd is positive, and from
negative infinity to 0 if lambd is negative.
Log normal distribution. If you take the natural logarithm of this
distribution, you’ll get a normal distribution with mean mu and standard
deviation sigma. mu can have any value, and sigma must be greater than
zero.
mu is the mean angle, expressed in radians between 0 and 2*pi, and kappa
is the concentration parameter, which must be greater than or equal to zero. If
kappa is equal to zero, this distribution reduces to a uniform random angle
over the range 0 to 2*pi.
Class that uses the os.urandom() function for generating random numbers
from sources provided by the operating system. Not available on all systems.
Does not rely on software state, and sequences are not reproducible. Accordingly,
the seed() method has no effect and is ignored.
The getstate() and setstate() methods raise
NotImplementedError if called.
See also
M. Matsumoto and T. Nishimura, “Mersenne Twister: A 623-dimensionally
equidistributed uniform pseudorandom number generator”, ACM Transactions on
Modeling and Computer Simulation, Vol. 8, No. 1, January 1998, pp. 3-30.
Sometimes it is useful to be able to reproduce the sequences given by a pseudo
random number generator. By re-using a seed value, the same sequence should be
reproducible from run to run as long as multiple threads are not running.
Most of the random module’s algorithms and seeding functions are subject to
change across Python versions, but two aspects are guaranteed not to change:
If a new seeding method is added, then a backward compatible seeder will be
offered.
The generator’s random() method will continue to produce the same
sequence when the compatible seeder is given the same seed.
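A minimal demonstration of that guarantee:
>>> import random
>>> random.seed(1234)
>>> first = [random.random() for i in range(3)]
>>> random.seed(1234)
>>> second = [random.random() for i in range(3)]
>>> first == second
True
Basic examples: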
>>> random.random() # Random float x, 0.0 <= x < 1.0
0.37444887175646646
>>> random.uniform(1, 10) # Random float x, 1.0 <= x < 10.0
1.1800146073117523
>>> random.randrange(10) # Integer from 0 to 9
7
>>> random.randrange(0, 101, 2) # Even integer from 0 to 100
26
>>> random.choice('abcdefghij') # Single random element
'c'
>>> items = [1, 2, 3, 4, 5, 6, 7]
>>> random.shuffle(items)
>>> items
[7, 3, 2, 5, 6, 4, 1]
>>> random.sample([1, 2, 3, 4, 5], 3) # Three samples without replacement
[4, 1, 5]
A common task is to make a random.choice() with weighted probabilities.
If the weights are small integer ratios, a simple technique is to build a sample
population with repeats:
>>> weighted_choices = [('Red', 3), ('Blue', 2), ('Yellow', 1), ('Green', 4)]
>>> population = [val for val, cnt in weighted_choices for i in range(cnt)]
>>> random.choice(population)
'Green'
A more general approach is to arrange the weights in a cumulative distribution
with itertools.accumulate(), and then locate the random value with
bisect.bisect():
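>>> import bisect, random
>>> from itertools import accumulate
>>> choices, weights = zip(*weighted_choices)
>>> cumdist = list(accumulate(weights))
>>> x = random.random() * cumdist[-1]
>>> choices[bisect.bisect(cumdist, x)]
'Blue'
(The selected color varies from run to run; 'Blue' is just one possible outcome.)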
The modules described in this chapter provide functions and classes that support
a functional programming style, and general operations on callables.
The following modules are documented in this chapter:
itertools — Functions creating iterators for efficient looping
This module implements a number of iterator building blocks inspired
by constructs from APL, Haskell, and SML. Each has been recast in a form
suitable for Python.
The module standardizes a core set of fast, memory efficient tools that are
useful by themselves or in combination. Together, they form an “iterator
algebra” making it possible to construct specialized tools succinctly and
efficiently in pure Python.
For instance, SML provides a tabulation tool: tabulate(f) which produces a
sequence f(0), f(1), .... The same effect can be achieved in Python
by combining map() and count() to form map(f, count()).
These tools and their built-in counterparts also work well with the high-speed
functions in the operator module. For example, the multiplication
operator can be mapped across two vectors to form an efficient dot-product:
sum(map(operator.mul, vector1, vector2)).
The following module functions all construct and return iterators. Some provide
streams of infinite length, so they should only be accessed by functions or
loops that truncate the stream.
Make an iterator that returns accumulated sums. Elements may be any addable
type including Decimal or Fraction. Equivalent to:
def accumulate(iterable):
'Return running totals'
# accumulate([1,2,3,4,5]) --> 1 3 6 10 15
it = iter(iterable)
total = next(it)
yield total
for element in it:
total = total + element
yield total
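For example (accumulate() is new in 3.2, so the real function can be used
directly):
>>> from itertools import accumulate
>>> list(accumulate([1, 2, 3, 4, 5]))
[1, 3, 6, 10, 15]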
Make an iterator that returns elements from the first iterable until it is
exhausted, then proceeds to the next iterable, until all of the iterables are
exhausted. Used for treating consecutive sequences as a single sequence.
Equivalent to:
def chain(*iterables):
# chain('ABC', 'DEF') --> A B C D E F
for it in iterables:
for element in it:
yield element
Return r length subsequences of elements from the input iterable.
Combinations are emitted in lexicographic sort order. So, if the
input iterable is sorted, the combination tuples will be produced
in sorted order.
Elements are treated as unique based on their position, not on their
value. So if the input elements are unique, there will be no repeat
values in each combination.
Equivalent to:
def combinations(iterable, r):
# combinations('ABCD', 2) --> AB AC AD BC BD CD
# combinations(range(4), 3) --> 012 013 023 123
pool = tuple(iterable)
n = len(pool)
if r > n:
return
indices = list(range(r))
yield tuple(pool[i] for i in indices)
while True:
for i in reversed(range(r)):
if indices[i] != i + n - r:
break
else:
return
indices[i] += 1
for j in range(i+1, r):
indices[j] = indices[j-1] + 1
yield tuple(pool[i] for i in indices)
The code for combinations() can be also expressed as a subsequence
of permutations() after filtering entries where the elements are not
in sorted order (according to their position in the input pool):
def combinations(iterable, r):
pool = tuple(iterable)
n = len(pool)
for indices in permutations(range(n), r):
if sorted(indices) == list(indices):
yield tuple(pool[i] for i in indices)
The number of items returned is n!/r!/(n-r)! when 0<=r<=n
or zero when r>n.
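For example, choosing 2 of 4 items yields 4!/2!/2! == 6 combinations:
>>> from itertools import combinations
>>> list(combinations('ABCD', 2))
[('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]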
Return r length subsequences of elements from the input iterable
allowing individual elements to be repeated more than once.
Combinations are emitted in lexicographic sort order. So, if the
input iterable is sorted, the combination tuples will be produced
in sorted order.
Elements are treated as unique based on their position, not on their
value. So if the input elements are unique, the generated combinations
will also be unique.
Equivalent to:
def combinations_with_replacement(iterable, r):
# combinations_with_replacement('ABC', 2) --> AA AB AC BB BC CC
pool = tuple(iterable)
n = len(pool)
if not n and r:
return
indices = [0] * r
yield tuple(pool[i] for i in indices)
while True:
for i in reversed(range(r)):
if indices[i] != n - 1:
break
else:
return
indices[i:] = [indices[i] + 1] * (r - i)
yield tuple(pool[i] for i in indices)
The code for combinations_with_replacement() can be also expressed as
a subsequence of product() after filtering entries where the elements
are not in sorted order (according to their position in the input pool):
def combinations_with_replacement(iterable, r):
pool = tuple(iterable)
n = len(pool)
for indices in product(range(n), repeat=r):
if sorted(indices) == list(indices):
yield tuple(pool[i] for i in indices)
The number of items returned is (n+r-1)!/r!/(n-1)! when n>0.
Make an iterator that filters elements from data returning only those that
have a corresponding element in selectors that evaluates to True.
Stops when either the data or selectors iterables has been exhausted.
Equivalent to:
def compress(data, selectors):
# compress('ABCDEF', [1,0,1,0,1,1]) --> A C E F
return (d for d, s in zip(data, selectors) if s)
Make an iterator that returns evenly spaced values starting with n. Often
used as an argument to map() to generate consecutive data points.
Also, used with zip() to add sequence numbers. Equivalent to:
def count(start=0, step=1):
# count(10) --> 10 11 12 13 14 ...
# count(2.5, 0.5) -> 2.5 3.0 3.5 ...
n = start
while True:
yield n
n += step
When counting with floating point numbers, better accuracy can sometimes be
achieved by substituting multiplicative code such as: (start + step * i for i in count()).
Changed in version 3.1: Added step argument and allowed non-integer arguments.
Make an iterator returning elements from the iterable and saving a copy of each.
When the iterable is exhausted, return elements from the saved copy. Repeats
indefinitely. Equivalent to:
def cycle(iterable):
# cycle('ABCD') --> A B C D A B C D A B C D ...
saved = []
for element in iterable:
yield element
saved.append(element)
while saved:
for element in saved:
yield element
Note, this member of the toolkit may require significant auxiliary storage
(depending on the length of the iterable).
Make an iterator that drops elements from the iterable as long as the predicate
is true; afterwards, returns every element. Note, the iterator does not produce
any output until the predicate first becomes false, so it may have a lengthy
start-up time. Equivalent to:
def dropwhile(predicate, iterable):
# dropwhile(lambda x: x<5, [1,4,6,4,1]) --> 6 4 1
iterable = iter(iterable)
for x in iterable:
if not predicate(x):
yield x
break
for x in iterable:
yield x
Make an iterator that filters elements from iterable returning only those for
which the predicate is False. If predicate is None, return the items
that are false. Equivalent to:
def filterfalse(predicate, iterable):
# filterfalse(lambda x: x%2, range(10)) --> 0 2 4 6 8
if predicate is None:
predicate = bool
for x in iterable:
if not predicate(x):
yield x
Make an iterator that returns consecutive keys and groups from the iterable.
The key is a function computing a key value for each element. If not
specified or is None, key defaults to an identity function and returns
the element unchanged. Generally, the iterable needs to already be sorted on
the same key function.
The operation of groupby() is similar to the uniq filter in Unix. It
generates a break or new group every time the value of the key function changes
(which is why it is usually necessary to have sorted the data using the same key
function). That behavior differs from SQL’s GROUP BY which aggregates common
elements regardless of their input order.
The returned group is itself an iterator that shares the underlying iterable
with groupby(). Because the source is shared, when the groupby()
object is advanced, the previous group is no longer visible. So, if that data
is needed later, it should be stored as a list:
groups = []
uniquekeys = []
data = sorted(data, key=keyfunc)
for k, g in groupby(data, keyfunc):
groups.append(list(g)) # Store group iterator as a list
uniquekeys.append(k)
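groupby() is roughly equivalent to: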
class groupby:
# [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B
# [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D
def __init__(self, iterable, key=None):
if key is None:
key = lambda x: x
self.keyfunc = key
self.it = iter(iterable)
self.tgtkey = self.currkey = self.currvalue = object()
def __iter__(self):
return self
def __next__(self):
while self.currkey == self.tgtkey:
self.currvalue = next(self.it) # Exit on StopIteration
self.currkey = self.keyfunc(self.currvalue)
self.tgtkey = self.currkey
return (self.currkey, self._grouper(self.tgtkey))
def _grouper(self, tgtkey):
while self.currkey == tgtkey:
yield self.currvalue
self.currvalue = next(self.it) # Exit on StopIteration
self.currkey = self.keyfunc(self.currvalue)
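Example calls, matching the comments in the code above:
>>> from itertools import groupby
>>> [k for k, g in groupby('AAAABBBCCDAABBB')]
['A', 'B', 'C', 'D', 'A', 'B']
>>> [list(g) for k, g in groupby('AAAABBBCCD')]
[['A', 'A', 'A', 'A'], ['B', 'B', 'B'], ['C', 'C'], ['D']]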
Make an iterator that returns selected elements from the iterable. If start is
non-zero, then elements from the iterable are skipped until start is reached.
Afterward, elements are returned consecutively unless step is set higher than
one which results in items being skipped. If stop is None, then iteration
continues until the iterator is exhausted, if at all; otherwise, it stops at the
specified position. Unlike regular slicing, islice() does not support
negative values for start, stop, or step. Can be used to extract related
fields from data where the internal structure has been flattened (for example, a
multi-line report may list a name field on every third line). Equivalent to:
def islice(iterable, *args):
# islice('ABCDEFG', 2) --> A B
# islice('ABCDEFG', 2, 4) --> C D
# islice('ABCDEFG', 2, None) --> C D E F G
# islice('ABCDEFG', 0, None, 2) --> A C E G
s = slice(*args)
it = iter(range(s.start or 0, s.stop or sys.maxsize, s.step or 1))
nexti = next(it)
for i, element in enumerate(iterable):
if i == nexti:
yield element
nexti = next(it)
If start is None, then iteration starts at zero. If step is None,
then the step defaults to one.
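A couple of calls matching the comments above:
>>> from itertools import islice
>>> list(islice('ABCDEFG', 2, 4))
['C', 'D']
>>> list(islice('ABCDEFG', 0, None, 2))
['A', 'C', 'E', 'G']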
Return successive r length permutations of elements in the iterable.
If r is not specified or is None, then r defaults to the length
of the iterable and all possible full-length permutations
are generated.
Permutations are emitted in lexicographic sort order. So, if the
input iterable is sorted, the permutation tuples will be produced
in sorted order.
Elements are treated as unique based on their position, not on their
value. So if the input elements are unique, there will be no repeat
values in each permutation.
Equivalent to:
def permutations(iterable, r=None):
# permutations('ABCD', 2) --> AB AC AD BA BC BD CA CB CD DA DB DC
# permutations(range(3)) --> 012 021 102 120 201 210
pool = tuple(iterable)
n = len(pool)
r = n if r is None else r
if r > n:
return
indices = list(range(n))
    cycles = list(range(n, n-r, -1))
yield tuple(pool[i] for i in indices[:r])
while n:
for i in reversed(range(r)):
cycles[i] -= 1
if cycles[i] == 0:
indices[i:] = indices[i+1:] + indices[i:i+1]
cycles[i] = n - i
else:
j = cycles[i]
indices[i], indices[-j] = indices[-j], indices[i]
yield tuple(pool[i] for i in indices[:r])
break
else:
return
The code for permutations() can be also expressed as a subsequence of
product(), filtered to exclude entries with repeated elements (those
from the same position in the input pool):
def permutations(iterable, r=None):
pool = tuple(iterable)
n = len(pool)
r = n if r is None else r
for indices in product(range(n), repeat=r):
if len(set(indices)) == r:
yield tuple(pool[i] for i in indices)
The number of items returned is n!/(n-r)! when 0<=r<=n
or zero when r>n.
Equivalent to nested for-loops in a generator expression. For example,
product(A, B) returns the same as ((x, y) for x in A for y in B).
The nested loops cycle like an odometer with the rightmost element advancing
on every iteration. This pattern creates a lexicographic ordering so that if
the input’s iterables are sorted, the product tuples are emitted in sorted
order.
To compute the product of an iterable with itself, specify the number of
repetitions with the optional repeat keyword argument. For example,
product(A,repeat=4) means the same as product(A,A,A,A).
This function is equivalent to the following code, except that the
actual implementation does not build up intermediate results in memory:
def product(*args, repeat=1):
# product('ABCD', 'xy') --> Ax Ay Bx By Cx Cy Dx Dy
# product(range(2), repeat=3) --> 000 001 010 011 100 101 110 111
pools = [tuple(pool) for pool in args] * repeat
result = [[]]
for pool in pools:
result = [x+[y] for x in result for y in pool]
for prod in result:
yield tuple(prod)
Make an iterator that returns object over and over again. Runs indefinitely
unless the times argument is specified. Used as argument to map() for
invariant parameters to the called function. Also used with zip() to
create an invariant part of a tuple record. Equivalent to:
def repeat(object, times=None):
# repeat(10, 3) --> 10 10 10
if times is None:
while True:
yield object
else:
for i in range(times):
yield object
Make an iterator that computes the function using arguments obtained from
the iterable. Used instead of map() when argument parameters are already
grouped in tuples from a single iterable (the data has been “pre-zipped”). The
difference between map() and starmap() parallels the distinction
between function(a,b) and function(*c). Equivalent to:
def starmap(function, iterable):
# starmap(pow, [(2,5), (3,2), (10,3)]) --> 32 9 1000
for args in iterable:
yield function(*args)
Return n independent iterators from a single iterable. Equivalent to:
def tee(iterable, n=2):
it = iter(iterable)
deques = [collections.deque() for i in range(n)]
def gen(mydeque):
while True:
if not mydeque: # when the local deque is empty
newval = next(it) # fetch a new value and
for d in deques: # load it to all the deques
d.append(newval)
yield mydeque.popleft()
return tuple(gen(d) for d in deques)
Once tee() has made a split, the original iterable should not be
used anywhere else; otherwise, the iterable could get advanced without
the tee objects being informed.
This itertool may require significant auxiliary storage (depending on how
much temporary data needs to be stored). In general, if one iterator uses
most or all of the data before another iterator starts, it is faster to use
list() instead of tee().
Make an iterator that aggregates elements from each of the iterables. If the
iterables are of uneven length, missing values are filled-in with fillvalue.
Iteration continues until the longest iterable is exhausted. Equivalent to:
def zip_longest(*args, fillvalue=None):
# zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
def sentinel(counter = ([fillvalue]*(len(args)-1)).pop):
yield counter() # yields the fillvalue, or raises IndexError
fillers = repeat(fillvalue)
iters = [chain(it, sentinel(), fillers) for it in args]
try:
for tup in zip(*iters):
yield tup
except IndexError:
pass
If one of the iterables is potentially infinite, then the zip_longest()
function should be wrapped with something that limits the number of calls
(for example islice() or takewhile()). If not specified,
fillvalue defaults to None.
This section shows recipes for creating an extended toolset using the existing
itertools as building blocks.
The extended tools offer the same high performance as the underlying toolset.
The superior memory performance is kept by processing elements one at a time
rather than bringing the whole iterable into memory all at once. Code volume is
kept small by linking the tools together in a functional style which helps
eliminate temporary variables. High speed is retained by preferring
“vectorized” building blocks over the use of for-loops and generators
which incur interpreter overhead.
def take(n, iterable):
"Return first n items of the iterable as a list"
return list(islice(iterable, n))
def tabulate(function, start=0):
"Return function(0), function(1), ..."
return map(function, count(start))
def consume(iterator, n):
"Advance the iterator n-steps ahead. If n is none, consume entirely."
# Use functions that consume iterators at C speed.
if n is None:
# feed the entire iterator into a zero-length deque
collections.deque(iterator, maxlen=0)
else:
# advance to the empty slice starting at position n
next(islice(iterator, n, n), None)
def nth(iterable, n, default=None):
"Returns the nth item or a default value"
return next(islice(iterable, n, None), default)
def quantify(iterable, pred=bool):
"Count how many times the predicate is true"
return sum(map(pred, iterable))
def padnone(iterable):
"""Returns the sequence elements and then returns None indefinitely.
Useful for emulating the behavior of the built-in map() function.
"""
return chain(iterable, repeat(None))
def ncycles(iterable, n):
"Returns the sequence elements n times"
return chain.from_iterable(repeat(tuple(iterable), n))
def dotproduct(vec1, vec2):
return sum(map(operator.mul, vec1, vec2))
def flatten(listOfLists):
"Flatten one level of nesting"
return chain.from_iterable(listOfLists)
def repeatfunc(func, times=None, *args):
"""Repeat calls to func with specified arguments.
Example: repeatfunc(random.random)
"""
if times is None:
return starmap(func, repeat(args))
return starmap(func, repeat(args, times))
def pairwise(iterable):
"s -> (s0,s1), (s1,s2), (s2, s3), ..."
a, b = tee(iterable)
next(b, None)
return zip(a, b)
def grouper(n, iterable, fillvalue=None):
"grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
args = [iter(iterable)] * n
return zip_longest(*args, fillvalue=fillvalue)
def roundrobin(*iterables):
"roundrobin('ABC', 'D', 'EF') --> A D E B F C"
# Recipe credited to George Sakkis
pending = len(iterables)
nexts = cycle(iter(it).__next__ for it in iterables)
while pending:
try:
for next in nexts:
yield next()
except StopIteration:
pending -= 1
nexts = cycle(islice(nexts, pending))
def partition(pred, iterable):
'Use a predicate to partition entries into false entries and true entries'
# partition(is_odd, range(10)) --> 0 2 4 6 8 and 1 3 5 7 9
t1, t2 = tee(iterable)
return filterfalse(pred, t1), filter(pred, t2)
def powerset(iterable):
"powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
s = list(iterable)
return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))
def unique_everseen(iterable, key=None):
"List unique elements, preserving order. Remember all elements ever seen."
# unique_everseen('AAAABBBCCDAABBB') --> A B C D
# unique_everseen('ABBCcAD', str.lower) --> A B C D
seen = set()
seen_add = seen.add
if key is None:
for element in filterfalse(seen.__contains__, iterable):
seen_add(element)
yield element
else:
for element in iterable:
k = key(element)
if k not in seen:
seen_add(k)
yield element
def unique_justseen(iterable, key=None):
"List unique elements, preserving order. Remember only the element just seen."
# unique_justseen('AAAABBBCCDAABBB') --> A B C D A B
# unique_justseen('ABBCcAD', str.lower) --> A B C A D
return map(next, map(itemgetter(1), groupby(iterable, key)))
def iter_except(func, exception, first=None):
""" Call a function repeatedly until an exception is raised.
Converts a call-until-exception interface to an iterator interface.
Like __builtin__.iter(func, sentinel) but uses an exception instead
of a sentinel to end the loop.
Examples:
iter_except(functools.partial(heappop, h), IndexError) # priority queue iterator
iter_except(d.popitem, KeyError) # non-blocking dict iterator
iter_except(d.popleft, IndexError) # non-blocking deque iterator
iter_except(q.get_nowait, Queue.Empty) # loop over a producer Queue
iter_except(s.pop, KeyError) # non-blocking set iterator
"""
try:
if first is not None:
yield first() # For database APIs needing an initial cast to db.first()
while 1:
yield func()
except exception:
pass
def random_product(*args, repeat=1):
"Random selection from itertools.product(*args, **kwds)"
pools = [tuple(pool) for pool in args] * repeat
return tuple(random.choice(pool) for pool in pools)
def random_permutation(iterable, r=None):
"Random selection from itertools.permutations(iterable, r)"
pool = tuple(iterable)
r = len(pool) if r is None else r
return tuple(random.sample(pool, r))
def random_combination(iterable, r):
"Random selection from itertools.combinations(iterable, r)"
pool = tuple(iterable)
n = len(pool)
indices = sorted(random.sample(range(n), r))
return tuple(pool[i] for i in indices)
def random_combination_with_replacement(iterable, r):
"Random selection from itertools.combinations_with_replacement(iterable, r)"
pool = tuple(iterable)
n = len(pool)
indices = sorted(random.randrange(n) for i in range(r))
return tuple(pool[i] for i in indices)
Note, many of the above recipes can be optimized by replacing global lookups
with local variables defined as default values. For example, the
dotproduct recipe can be written as:
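def dotproduct(vec1, vec2, sum=sum, map=map, mul=operator.mul):
    # Binding the globals as default argument values makes them locals,
    # avoiding a dictionary lookup on each call (a CPython micro-optimization).
    return sum(map(mul, vec1, vec2))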
The functools module is for higher-order functions: functions that act on
or return other functions. In general, any callable object can be treated as a
function for the purposes of this module.
The functools module defines the following functions:
Transform an old-style comparison function to a key-function. Used with
tools that accept key functions (such as sorted(), min(),
max(), heapq.nlargest(), heapq.nsmallest(),
itertools.groupby()). This function is primarily used as a transition
tool for programs being converted from Py2.x which supported the use of
comparison functions.
A compare function is any callable that accepts two arguments, compares them,
and returns a negative number for less-than, zero for equality, or a positive
number for greater-than. A key function is a callable that accepts one
argument and returns another value indicating the position in the desired
collation sequence.
Example:
sorted(iterable, key=cmp_to_key(locale.strcoll)) # locale-aware sort order
Decorator to wrap a function with a memoizing callable that saves up to the
maxsize most recent calls. It can save time when an expensive or I/O bound
function is periodically called with the same arguments.
Since a dictionary is used to cache results, the positional and keyword
arguments to the function must be hashable.
If maxsize is set to None, the LRU feature is disabled and the cache
can grow without bound.
To help measure the effectiveness of the cache and tune the maxsize
parameter, the wrapped function is instrumented with a cache_info()
function that returns a named tuple showing hits, misses,
maxsize and currsize. In a multi-threaded environment, the hits
and misses are approximate.
The decorator also provides a cache_clear() function for clearing or
invalidating the cache.
The original underlying function is accessible through the
__wrapped__ attribute. This is useful for introspection, for
bypassing the cache, or for rewrapping the function with a different cache.
An LRU (least recently used) cache works
best when more recent calls are the best predictors of upcoming calls (for
example, the most popular articles on a news server tend to change daily).
The cache’s size limit assures that the cache does not grow without bound on
long-running processes such as web servers.
Example of an LRU cache for static web content:
@lru_cache(maxsize=20)
def get_pep(num):
'Retrieve text of a Python Enhancement Proposal'
resource = 'http://www.python.org/dev/peps/pep-%04d/' % num
try:
with urllib.request.urlopen(resource) as s:
return s.read()
except urllib.error.HTTPError:
return 'Not Found'
>>> for n in 8, 290, 308, 320, 8, 218, 320, 279, 289, 320, 9991:
... pep = get_pep(n)
... print(n, len(pep))
>>> print(get_pep.cache_info())
CacheInfo(hits=3, misses=8, maxsize=20, currsize=8)
Given a class defining one or more rich comparison ordering methods, this
class decorator supplies the rest. This simplifies the effort involved
in specifying all of the possible rich comparison operations:
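The class must define at least one ordering method and should also supply
__eq__(); a minimal sketch (the Student class here is illustrative, not from
the surrounding text):
from functools import total_ordering

@total_ordering
class Student:
    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname
    def __eq__(self, other):
        return ((self.lastname.lower(), self.firstname.lower()) ==
                (other.lastname.lower(), other.firstname.lower()))
    def __lt__(self, other):
        return ((self.lastname.lower(), self.firstname.lower()) <
                (other.lastname.lower(), other.firstname.lower()))

# __le__, __gt__, and __ge__ are now supplied automatically:
assert Student('Ada', 'Lovelace') <= Student('Alan', 'Turing')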
Return a new partial object which when called will behave like func
called with the positional arguments args and keyword arguments keywords. If
more arguments are supplied to the call, they are appended to args. If
additional keyword arguments are supplied, they extend and override keywords.
Roughly equivalent to:
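def partial(func, *args, **keywords):
    # Reconstructed sketch; the real partial is implemented as a C type.
    def newfunc(*fargs, **fkeywords):
        newkeywords = keywords.copy()
        newkeywords.update(fkeywords)
        return func(*(args + fargs), **newkeywords)
    newfunc.func = func
    newfunc.args = args
    newfunc.keywords = keywords
    return newfunc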
partial() is used for partial function application which “freezes”
some portion of a function’s arguments and/or keywords resulting in a new object
with a simplified signature. For example, partial() can be used to create
a callable that behaves like the int() function where the base argument
defaults to two:
>>> from functools import partial
>>> basetwo = partial(int, base=2)
>>> basetwo.__doc__ = 'Convert base 2 string to an int.'
>>> basetwo('10010')
18
Apply function of two arguments cumulatively to the items of sequence, from
left to right, so as to reduce the sequence to a single value. For example,
reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5).
The left argument, x, is the accumulated value and the right argument, y, is
the update value from the sequence. If the optional initializer is present,
it is placed before the items of the sequence in the calculation, and serves as
a default when the sequence is empty. If initializer is not given and
sequence contains only one item, the first item is returned.
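Roughly, reduce() behaves like this pure-Python sketch (which, as a
simplification, cannot accept None as an explicit initializer):
def reduce(function, iterable, initializer=None):
    it = iter(iterable)
    if initializer is None:
        value = next(it)    # the first item seeds the accumulator
    else:
        value = initializer
    for element in it:
        value = function(value, element)
    return value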
Update a wrapper function to look like the wrapped function. The optional
arguments are tuples to specify which attributes of the original function are
assigned directly to the matching attributes on the wrapper function and which
attributes of the wrapper function are updated with the corresponding attributes
from the original function. The default values for these arguments are the
module level constants WRAPPER_ASSIGNMENTS (which assigns to the wrapper
function’s __name__, __module__, __annotations__ and __doc__, the
documentation string) and WRAPPER_UPDATES (which updates the wrapper
function’s __dict__, i.e. the instance dictionary).
To allow access to the original function for introspection and other purposes
(e.g. bypassing a caching decorator such as lru_cache()), this function
automatically adds a __wrapped__ attribute to the wrapper that refers to
the original function.
The main intended use for this function is in decorator functions which
wrap the decorated function and return the wrapper. If the wrapper function is
not updated, the metadata of the returned function will reflect the wrapper
definition rather than the original function definition, which is typically less
than helpful.
update_wrapper() may be used with callables other than functions. Any
attributes named in assigned or updated that are missing from the object
being wrapped are ignored (i.e. this function will not attempt to set them
on the wrapper function). AttributeError is still raised if the
wrapper function itself is missing any attributes named in updated.
New in version 3.2: Automatic addition of the __wrapped__ attribute.
New in version 3.2: Copying of the __annotations__ attribute by default.
Changed in version 3.2: Missing attributes no longer trigger an AttributeError.
This is a convenience function for invoking partial(update_wrapper,
wrapped=wrapped, assigned=assigned, updated=updated) as a function decorator
when defining a wrapper function. For example:
>>> from functools import wraps
>>> def my_decorator(f):
... @wraps(f)
... def wrapper(*args, **kwds):
... print('Calling decorated function')
... return f(*args, **kwds)
... return wrapper
...
>>> @my_decorator
... def example():
... """Docstring"""
... print('Called example function')
...
>>> example()
Calling decorated function
Called example function
>>> example.__name__
'example'
>>> example.__doc__
'Docstring'
Without the use of this decorator factory, the name of the example function
would have been 'wrapper', and the docstring of the original example()
would have been lost.
The keyword arguments that will be supplied when the partial object is
called.
partial objects are like function objects in that they are
callable, weak referencable, and can have attributes. There are some important
differences. For instance, the __name__ and __doc__ attributes
are not created automatically. Also, partial objects defined in
classes behave like static methods and do not transform into bound methods
during instance attribute look-up.
The operator module exports a set of functions implemented in C
corresponding to the intrinsic operators of Python. For example,
operator.add(x,y) is equivalent to the expression x+y. The function
names are those used for special class methods; variants without leading and
trailing __ are also provided for convenience.
The functions fall into categories that perform object comparisons, logical
operations, mathematical operations and sequence operations.
The object comparison functions are useful for all objects, and are named after
the rich comparison operators they support:
Perform “rich comparisons” between a and b. Specifically, lt(a,b) is
equivalent to a<b, le(a,b) is equivalent to a<=b, eq(a,b) is equivalent to a==b, ne(a,b) is equivalent to a!=b,
gt(a,b) is equivalent to a>b and ge(a,b) is equivalent to a>=b. Note that these functions can return any value, which may
or may not be interpretable as a Boolean value. See
Comparisons for more information about rich comparisons.
The logical operations are also generally applicable to all objects, and support
truth tests, identity tests, and boolean operations:
Return the outcome of not obj. (Note that there is no
__not__() method for object instances; only the interpreter core defines
this operation. The result is affected by the __bool__() and
__len__() methods.)
The operator module also defines tools for generalized attribute and item
lookups. These are useful for making fast field extractors as arguments for
map(), sorted(), itertools.groupby(), or other functions that
expect a function argument.
Return a callable object that fetches attr from its operand. If more than one
attribute is requested, returns a tuple of attributes. After
f = attrgetter('name'), the call f(b) returns b.name. After
f = attrgetter('name', 'date'), the call f(b) returns (b.name, b.date). Equivalent to:
def attrgetter(*items):
if any(not isinstance(item, str) for item in items):
raise TypeError('attribute name must be a string')
if len(items) == 1:
attr = items[0]
def g(obj):
return resolve_attr(obj, attr)
else:
def g(obj):
            return tuple(resolve_attr(obj, attr) for attr in items)
return g
def resolve_attr(obj, attr):
for name in attr.split("."):
obj = getattr(obj, name)
return obj
The attribute names can also contain dots; after f = attrgetter('date.month'),
the call f(b) returns b.date.month.
Return a callable object that fetches item from its operand using the
operand’s __getitem__() method. If multiple items are specified,
returns a tuple of lookup values. Equivalent to:
def itemgetter(*items):
if len(items) == 1:
item = items[0]
def g(obj):
return obj[item]
else:
def g(obj):
return tuple(obj[item] for item in items)
return g
The items can be any type accepted by the operand’s __getitem__()
method. Dictionaries accept any hashable value. Lists, tuples, and
strings accept an index or a slice:
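>>> from operator import itemgetter
>>> itemgetter(1)('ABCDEFG')
'B'
>>> itemgetter(1, 3, 5)('ABCDEFG')
('B', 'D', 'F')
>>> itemgetter(slice(2, None))('ABCDEFG')
'CDEFG'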
Return a callable object that calls the method name on its operand. If
additional arguments and/or keyword arguments are given, they will be given
to the method as well. After f = methodcaller('name'), the call f(b)
returns b.name(). After f = methodcaller('name', 'foo', bar=1), the
call f(b) returns b.name('foo', bar=1). Equivalent to:
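def methodcaller(name, *args, **kwargs):
    # Reconstructed sketch of the documented equivalence.
    def caller(obj):
        return getattr(obj, name)(*args, **kwargs)
    return caller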
Many operations have an “in-place” version. Listed below are functions
providing a more primitive access to in-place operators than the usual syntax
does; for example, the statement x += y is equivalent to
x = operator.iadd(x, y). Another way to put it is to say that
z = operator.iadd(x, y) is equivalent to the compound statement
z = x; z += y.
In those examples, note that when an in-place method is called, the computation
and assignment are performed in two separate steps. The in-place functions
listed below only do the first step, calling the in-place method. The second
step, assignment, is not handled.
For immutable targets such as strings, numbers, and tuples, the updated
value is computed, but not assigned back to the input variable:
>>> a = 'hello'
>>> iadd(a, ' world')
'hello world'
>>> a
'hello'
For mutable targets such as lists and dictionaries, the in-place method
will perform the update, so no subsequent assignment is necessary:
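>>> from operator import iadd
>>> s = ['h', 'e', 'l', 'l', 'o']
>>> iadd(s, [' world'])    # the list is extended in place and returned
['h', 'e', 'l', 'l', 'o', ' world']
>>> s
['h', 'e', 'l', 'l', 'o', ' world']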
The modules described in this chapter deal with disk files and directories. For
example, there are modules for reading the properties of files, manipulating
paths in a portable way, and creating temporary files. The full list of modules
in this chapter is:
This module implements some useful functions on pathnames. To read or
write files see open(), and for accessing the filesystem see the
os module. The path parameters can be passed as either strings,
or bytes. Applications are encouraged to represent file names as
(Unicode) character strings. Unfortunately, some file names may not be
representable as strings on Unix, so applications that need to support
arbitrary file names on Unix should use bytes objects to represent
path names. Conversely, bytes objects cannot represent all file
names on Windows (in the standard mbcs encoding), hence Windows
applications should use string objects to access all files.
Note
All of these functions accept either only bytes or only string objects as
their parameters. The result is an object of the same type, if a path or
file name is returned.
Note
Since different operating systems have different path name conventions, there
are several versions of this module in the standard library. The
os.path module is always the path module suitable for the operating
system Python is running on, and therefore usable for local paths. However,
you can also import and use the individual modules if you want to manipulate
a path that is always in one of the different formats. They all have the
same interface:
Return the base name of pathname path. This is the second half of the pair
returned by split(path). Note that the result of this function is different
from the Unix basename program; where basename for
'/foo/bar/' returns 'bar', the basename() function returns an
empty string ('').
Return the longest path prefix (taken character-by-character) that is a prefix
of all paths in list. If list is empty, return the empty string ('').
Note that this may return invalid paths because it works a character at a time.
Return True if path refers to an existing path. Returns False for
broken symbolic links. On some platforms, this function may return False if
permission is not granted to execute os.stat() on the requested file, even
if the path physically exists.
On Unix and Windows, return the argument with an initial component of ~ or
~user replaced by that user's home directory.
On Unix, an initial ~ is replaced by the environment variable HOME
if it is set; otherwise the current user’s home directory is looked up in the
password directory through the built-in module pwd. An initial ~user
is looked up directly in the password directory.
On Windows, HOME and USERPROFILE will be used if set,
otherwise a combination of HOMEPATH and HOMEDRIVE will be
used. An initial ~user is handled by stripping the last directory component
from the created user path derived above.
If the expansion fails or if the path does not begin with a tilde, the path is
returned unchanged.
Return the argument with environment variables expanded. Substrings of the form
$name or ${name} are replaced by the value of environment variable
name. Malformed variable names and references to non-existing variables are
left unchanged.
On Windows, %name% expansions are supported in addition to $name and
${name}.
Return the time of last access of path. The return value is a number giving
the number of seconds since the epoch (see the time module). Raise
os.error if the file does not exist or is inaccessible.
Return the time of last modification of path. The return value is a number
giving the number of seconds since the epoch (see the time module).
Raise os.error if the file does not exist or is inaccessible.
Return the system’s ctime which, on some systems (like Unix) is the time of the
last change, and, on others (like Windows), is the creation time for path.
The return value is a number giving the number of seconds since the epoch (see
the time module). Raise os.error if the file does not exist or
is inaccessible.
Return True if path is an absolute pathname. On Unix, that means it
begins with a slash, on Windows that it begins with a (back)slash after chopping
off a potential drive letter.
Return True if pathname path is a mount point: a point in a file
system where a different file system has been mounted. The function checks
whether path's parent, path/.., is on a different device than path,
or whether path/.. and path point to the same i-node on the same
device — this should detect mount points for all Unix and POSIX variants.
Join one or more path components intelligently. If any component is an absolute
path, all previous components (on Windows, including the previous drive letter,
if there was one) are thrown away, and joining continues. The return value is
the concatenation of path1, and optionally path2, etc., with exactly one
directory separator (os.sep) following each non-empty part except the last.
(This means that an empty last part will result in a path that ends with a
separator.) Note that on Windows, since there is a current directory for
each drive, os.path.join("c:","foo") represents a path relative to the
current directory on drive C: (c:foo), not c:\foo.
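For instance, on Unix:
>>> import os.path
>>> os.path.join('/usr', 'lib', 'python3.2')
'/usr/lib/python3.2'
>>> os.path.join('/usr', '/lib')    # the absolute component resets the join
'/lib'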
Normalize the case of a pathname. On Unix and Mac OS X, this returns the
path unchanged; on case-insensitive filesystems, it converts the path to
lowercase. On Windows, it also converts forward slashes to backward slashes.
Raise a TypeError if the type of path is not str or bytes.
Normalize a pathname. This collapses redundant separators and up-level
references so that A//B, A/B/, A/./B and A/foo/../B all become
A/B.
It does not normalize the case (use normcase() for that). On Windows, it
converts forward slashes to backward slashes. It should be understood that this
may change the meaning of the path if it contains symbolic links!
Return the canonical path of the specified filename, eliminating any symbolic
links encountered in the path (if they are supported by the operating system).
Return True if both pathname arguments refer to the same file or directory.
On Unix, this is determined by the device number and i-node number and raises an
exception if a os.stat() call on either pathname fails.
On Windows, two files are the same if they resolve to the same final path
name using the Windows API call GetFinalPathNameByHandle. This function
raises an exception if handles cannot be obtained to either file.
Return True if the stat tuples stat1 and stat2 refer to the same file.
These structures may have been returned by fstat(), lstat(), or
stat(). This function implements the underlying comparison used by
samefile() and sameopenfile().
Split the pathname path into a pair, (head,tail) where tail is the
last pathname component and head is everything leading up to that. The
tail part will never contain a slash; if path ends in a slash, tail
will be empty. If there is no slash in path, head will be empty. If
path is empty, both head and tail are empty. Trailing slashes are
stripped from head unless it is the root (one or more slashes only). In
all cases, join(head,tail) returns a path to the same location as path
(but the strings may differ).
Split the pathname path into a pair (drive,tail) where drive is either
a mount point or the empty string. On systems which do not use drive
specifications, drive will always be the empty string. In all cases, drive+tail will be the same as path.
On Windows, splits a pathname into drive/UNC sharepoint and relative path.
If the path contains a drive letter, drive will contain everything
up to and including the colon.
e.g. splitdrive("c:/dir") returns ("c:","/dir")
If the path contains a UNC path, drive will contain the host name
and share, up to but not including the fourth separator.
e.g. splitdrive("//host/computer/dir") returns ("//host/computer","/dir")
Split the pathname path into a pair (root, ext) such that root + ext ==
path, and ext is empty or begins with a period and contains at most one
period. Leading periods on the basename are ignored; splitext('.cshrc')
returns ('.cshrc', '').
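For example:
>>> import os.path
>>> os.path.splitext('/path/archive.tar.gz')    # ext keeps at most one period
('/path/archive.tar', '.gz')
>>> os.path.splitext('.cshrc')    # a leading period is not an extension
('.cshrc', '')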
Deprecated since version 3.1: Use splitdrive instead.
Split the pathname path into a pair (unc,rest) so that unc is the UNC
mount point (such as r'\\host\mount'), if present, and rest the rest of
the path (such as r'\path\file.ext'). For paths containing drive letters,
unc will always be the empty string.
This module implements a helper class and functions to quickly write a
loop over standard input or a list of files. If you just want to read or
write one file see open().
The typical use is:
import fileinput
for line in fileinput.input():
process(line)
This iterates over the lines of all files listed in sys.argv[1:], defaulting
to sys.stdin if the list is empty. If a filename is '-', it is also
replaced by sys.stdin. To specify an alternative list of filenames, pass it
as the first argument to input(). A single file name is also allowed.
All files are opened in text mode by default, but you can override this by
specifying the mode parameter in the call to input() or
FileInput. If an I/O error occurs during opening or reading a file,
IOError is raised.
If sys.stdin is used more than once, the second and further use will return
no lines, except perhaps for interactive use, or if it has been explicitly reset
(e.g. using sys.stdin.seek(0)).
Empty files are opened and immediately closed; the only time their presence in
the list of filenames is noticeable at all is when the last file opened is
empty.
Lines are returned with any newlines intact, which means that the last line in
a file may not have one.
You can control how files are opened by providing an opening hook via the
openhook parameter to fileinput.input() or FileInput(). The
hook must be a function that takes two arguments, filename and mode, and
returns an accordingly opened file-like object. Two useful hooks are already
provided by this module.
The following function is the primary interface of this module:
Create an instance of the FileInput class. The instance will be used
as global state for the functions of this module, and is also returned to use
during iteration. The parameters to this function will be passed along to the
constructor of the FileInput class.
The FileInput instance can be used as a context manager in the
with statement. In this example, input is closed after the
with statement is exited, even if an exception occurs:
with fileinput.input(files=('spam.txt', 'eggs.txt')) as f:
for line in f:
process(line)
Changed in version 3.2: Can be used as a context manager.
The following functions use the global state created by fileinput.input();
if there is no active state, RuntimeError is raised.
Return the cumulative line number of the line that has just been read. Before
the first line has been read, returns 0. After the last line of the last
file has been read, returns the line number of that line.
Return the line number in the current file. Before the first line has been
read, returns 0. After the last line of the last file has been read,
returns the line number of that line within the file.
Close the current file so that the next iteration will read the first line from
the next file (if any); lines not read from the file will not count towards the
cumulative line count. The filename is not changed until after the first line
of the next file has been read. Before the first line has been read, this
function has no effect; it cannot be used to skip the first file. After the
last line of the last file has been read, this function has no effect.
With mode you can specify which file mode will be passed to open(). It
must be one of 'r', 'rU', 'U' and 'rb'.
The openhook, when given, must be a function that takes two arguments,
filename and mode, and returns an accordingly opened file-like object. You
cannot use inplace and openhook together.
A FileInput instance can be used as a context manager in the
with statement. In this example, input is closed after the
with statement is exited, even if an exception occurs:
with FileInput(files=('spam.txt', 'eggs.txt')) as input:
process(input)
Changed in version 3.2: Can be used as a context manager.
Optional in-place filtering: if the keyword argument inplace=True is
passed to fileinput.input() or to the FileInput constructor, the
file is moved to a backup file and standard output is directed to the input file
(if a file of the same name as the backup file already exists, it will be
replaced silently). This makes it possible to write a filter that rewrites its
input file in place. If the backup parameter is given (typically as
backup='.<someextension>'), it specifies the extension for the backup file,
and the backup file remains around; by default, the extension is '.bak' and
it is deleted when the output file is closed. In-place filtering is disabled
when standard input is read.
Note
The current implementation does not work for MS-DOS 8+3 filesystems.
The two following opening hooks are provided by this module:
Transparently opens files compressed with gzip and bzip2 (recognized by the
extensions '.gz' and '.bz2') using the gzip and bz2
modules. If the filename extension is not '.gz' or '.bz2', the file is
opened normally (i.e., using open() without any decompression).
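For example, combining the hook with the typical loop shown earlier
(process() remains a stand-in for real work):
import fileinput

for line in fileinput.input(openhook=fileinput.hook_compressed):
    process(line)    # lines from .gz/.bz2 files are decompressed transparently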
The stat module defines constants and functions for interpreting the
results of os.stat(), os.fstat() and os.lstat() (if they
exist). For complete details about the stat(), fstat() and
lstat() calls, consult the documentation for your system.
The stat module defines the following functions to test for specific file
types:
Return the portion of the file’s mode that can be set by os.chmod()—that is, the file’s permission bits, plus the sticky bit, set-group-id, and
set-user-id bits (on systems that support them).
Return the portion of the file’s mode that describes the file type (used by the
S_IS*() functions above).
Normally, you would use the os.path.is*() functions for testing the type
of a file; the functions here are useful when you are doing multiple tests of
the same file and wish to avoid the overhead of the stat() system call
for each test. These are also useful when checking for information about a file
that isn’t handled by os.path, like the tests for block and character
devices.
Example:
import os, sys
from stat import *
def walktree(top, callback):
'''recursively descend the directory tree rooted at top,
calling the callback function for each regular file'''
for f in os.listdir(top):
pathname = os.path.join(top, f)
mode = os.stat(pathname).st_mode
if S_ISDIR(mode):
# It's a directory, recurse into it
walktree(pathname, callback)
elif S_ISREG(mode):
# It's a file, call the callback function
callback(pathname)
else:
# Unknown file type, print a message
print('Skipping %s' % pathname)
def visitfile(file):
print('visiting', file)
if __name__ == '__main__':
walktree(sys.argv[1], visitfile)
All the variables below are simply symbolic indexes into the 10-tuple returned
by os.stat(), os.fstat() or os.lstat().
The “ctime” as reported by the operating system. On some systems (like Unix)
it is the time of the last metadata change, and, on others (like Windows),
it is the creation time (see platform documentation for details).
The interpretation of “file size” changes according to the file type. For plain
files this is the size of the file in bytes. For FIFOs and sockets under most
flavors of Unix (including Linux in particular), the “size” is the number of
bytes waiting to be read at the time of the call to os.stat(),
os.fstat(), or os.lstat(); this can sometimes be useful, especially
for polling one of these special files after a non-blocking open. The meaning
of the size field for other character and block devices varies more, depending
on the implementation of the underlying system call.
The variables below define the flags used in the ST_MODE field.
Use of the functions above is more portable than use of the first set of flags:
Set-group-ID bit. This bit has several special uses. For a directory
it indicates that BSD semantics is to be used for that directory:
files created there inherit their group ID from the directory, not
from the effective group ID of the creating process, and directories
created there will also get the S_ISGID bit set. For a
file that does not have the group execution bit (S_IXGRP)
set, the set-group-ID bit indicates mandatory file/record locking
(see also S_ENFMT).
Sticky bit. When this bit is set on a directory it means that a file
in that directory can be renamed or deleted only by the owner of the
file, by the owner of the directory, or by a privileged process.
System V file locking enforcement. This flag is shared with S_ISGID:
file/record locking is enforced on files that do not have the group
execution bit (S_IXGRP) set.
The filecmp module defines functions to compare files and directories,
with various optional time/correctness trade-offs. For comparing files,
see also the difflib module.
The filecmp module defines the following functions:
Compare the files in the two directories dir1 and dir2 whose names are
given by common.
Returns three lists of file names: match, mismatch,
errors. match contains the list of files that match, mismatch contains
the names of those that don’t, and errors lists the names of files which
could not be compared. Files are listed in errors if they don’t exist in
one of the directories, the user lacks permission to read them or if the
comparison could not be done for some other reason.
The shallow parameter has the same meaning and default value as for
filecmp.cmp().
For example, cmpfiles('a','b',['c','d/e']) will compare a/c with
b/c and a/d/e with b/d/e. 'c' and 'd/e' will each be in
one of the three returned lists.
dircmp instances are built using this constructor:
class filecmp.dircmp(a, b, ignore=None, hide=None)
Construct a new directory comparison object, to compare the directories a and
b. ignore is a list of names to ignore, and defaults to ['RCS', 'CVS', 'tags']. hide is a list of names to hide, and defaults to [os.curdir, os.pardir].
Print a comparison between a and b and common subdirectories
(recursively).
The dircmp class offers a number of interesting attributes that may be
used to get various bits of information about the directory trees being
compared.
Note that via __getattr__() hooks, all attributes are computed lazily,
so there is no speed penalty if only those attributes which are lightweight
to compute are used.
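As a sketch, here is a recursive report of differing files built on the
diff_files, subdirs, left, and right attributes ('dir1' and 'dir2' are
placeholder directory names):
from filecmp import dircmp

def print_diff_files(dcmp):
    # Report files that differ, then recurse into common subdirectories.
    for name in dcmp.diff_files:
        print('diff_file %s found in %s and %s' % (name, dcmp.left, dcmp.right))
    for sub_dcmp in dcmp.subdirs.values():
        print_diff_files(sub_dcmp)

print_diff_files(dircmp('dir1', 'dir2'))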
This module generates temporary files and directories. It works on all
supported platforms. It provides three new functions,
NamedTemporaryFile(), mkstemp(), and mkdtemp(), which should
eliminate all remaining need to use the insecure mktemp() function.
Temporary file names created by this module no longer contain the process ID;
instead a string of six random characters is used.
Also, all the user-callable functions now take additional arguments which
allow direct control over the location and name of temporary files. It is
no longer necessary to use the global tempdir and template variables.
To maintain backward compatibility, the argument order is somewhat odd; it
is recommended to use keyword arguments for clarity.
The module defines the following user-callable items:
Return a file-like object that can be used as a temporary storage area.
The file is created using mkstemp(). It will be destroyed as soon
as it is closed (including an implicit close when the object is garbage
collected). Under Unix, the directory entry for the file is removed
immediately after the file is created. Other platforms do not support
this; your code should not rely on a temporary file created using this
function having or not having a visible name in the file system.
The mode parameter defaults to 'w+b' so that the file created can
be read and written without being closed. Binary mode is used so that it
behaves consistently on all platforms without regard for the data that is
stored. buffering, encoding and newline are interpreted as for
open().
The dir, prefix and suffix parameters are passed to mkstemp().
The returned object is a true file object on POSIX platforms. On other
platforms, it is a file-like object whose file attribute is the
underlying true file object. This file-like object can be used in a
with statement, just like a normal file.
This function operates exactly as TemporaryFile() does, except that
the file is guaranteed to have a visible name in the file system (on
Unix, the directory entry is not unlinked). That name can be retrieved
from the name attribute of the file object. Whether the name can be
used to open the file a second time, while the named temporary file is
still open, varies across platforms (it can be so used on Unix; it cannot
on Windows NT or later). If delete is true (the default), the file is
deleted as soon as it is closed.
The returned object is always a file-like object whose file
attribute is the underlying true file object. This file-like object can
be used in a with statement, just like a normal file.
This function operates exactly as TemporaryFile() does, except that
data is spooled in memory until the file size exceeds max_size, or
until the file’s fileno() method is called, at which point the
contents are written to disk and operation proceeds as with
TemporaryFile().
The resulting file has one additional method, rollover(), which
causes the file to roll over to an on-disk file regardless of its size.
The returned object is a file-like object whose _file attribute
is either a StringIO object or a true file object, depending on
whether rollover() has been called. This file-like object can be
used in a with statement, just like a normal file.
This function creates a temporary directory using mkdtemp()
(the supplied arguments are passed directly to the underlying function).
The resulting object can be used as a context manager (see
With Statement Context Managers). On completion of the context (or destruction
of the temporary directory object), the newly created temporary directory
and all its contents are removed from the filesystem.
The directory name can be retrieved from the name attribute
of the returned object.
The directory can be explicitly cleaned up by calling the
cleanup() method.
Creates a temporary file in the most secure manner possible. There are
no race conditions in the file’s creation, assuming that the platform
properly implements the os.O_EXCL flag for os.open(). The
file is readable and writable only by the creating user ID. If the
platform uses permission bits to indicate whether a file is executable,
the file is executable by no one. The file descriptor is not inherited
by child processes.
Unlike TemporaryFile(), the user of mkstemp() is responsible
for deleting the temporary file when done with it.
If suffix is specified, the file name will end with that suffix,
otherwise there will be no suffix. mkstemp() does not put a dot
between the file name and the suffix; if you need one, put it at the
beginning of suffix.
If prefix is specified, the file name will begin with that prefix;
otherwise, a default prefix is used.
If dir is specified, the file will be created in that directory;
otherwise, a default directory is used. The default directory is chosen
from a platform-dependent list, but the user of the application can
control the directory location by setting the TMPDIR, TEMP or TMP
environment variables. There is thus no guarantee that the generated
filename will have any nice properties, such as not requiring quoting
when passed to external commands via os.popen().
If text is specified, it indicates whether to open the file in binary
mode (the default) or text mode. On some platforms, this makes no
difference.
mkstemp() returns a tuple containing an OS-level handle to an open
file (as would be returned by os.open()) and the absolute pathname
of that file, in that order.
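As an illustration, a sketch of the usual mkstemp() pattern, wrapping the
OS-level handle and cleaning up afterwards (the suffix and contents are
arbitrary):

import os
import tempfile

fd, path = tempfile.mkstemp(suffix='.txt')
try:
    with os.fdopen(fd, 'w') as tmp:   # wrap the OS-level handle
        tmp.write('scratch data')
finally:
    os.unlink(path)                   # the caller must delete the file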
Creates a temporary directory in the most secure manner possible. There
are no race conditions in the directory’s creation. The directory is
readable, writable, and searchable only by the creating user ID.
The user of mkdtemp() is responsible for deleting the temporary
directory and its contents when done with it.
The prefix, suffix, and dir arguments are the same as for
mkstemp().
mkdtemp() returns the absolute pathname of the new directory.
Deprecated since version 2.3: Use mkstemp() instead.
Return an absolute pathname of a file that did not exist at the time the
call is made. The prefix, suffix, and dir arguments are the same
as for mkstemp().
Warning
Use of this function may introduce a security hole in your program. By
the time you get around to doing anything with the file name it returns,
someone else may have beaten you to the punch. mktemp() usage can
be replaced easily with NamedTemporaryFile(), passing it the
delete=False parameter:
>>> f = NamedTemporaryFile(delete=False)
>>> f
<open file '<fdopen>', mode 'w+b' at 0x384698>
>>> f.name
'/var/folders/5q/5qTPn6xq2RaWqk+1Ytw3-U+++TI/-Tmp-/tmpG7V1Y0'
>>> f.write(b"Hello World!\n")
13
>>> f.close()
>>> os.unlink(f.name)
>>> os.path.exists(f.name)
False
The module uses two global variables that tell it how to construct a
temporary name. They are initialized at the first call to any of the
functions above. The caller may change them, but this is discouraged; use
the appropriate function arguments, instead.
When set to a value other than None, this variable defines the
default value for the dir argument to all the functions defined in this
module.
If tempdir is unset or None at any call to any of the above
functions, Python searches a standard list of directories and sets
tempdir to the first one which the calling user can create files in.
The list is:
The directory named by the TMPDIR environment variable.
The directory named by the TEMP environment variable.
The directory named by the TMP environment variable.
A platform-specific location:
On Windows, the directories C:\TEMP, C:\TMP,
\TEMP, and \TMP, in that order.
On all other platforms, the directories /tmp, /var/tmp, and
/usr/tmp, in that order.
Return the directory currently selected to create temporary files in. If
tempdir is not None, this simply returns its contents; otherwise,
the search described above is performed, and the result returned.
Here are some examples of typical usage of the tempfile module:
>>> import tempfile
# create a temporary file and write some data to it
>>> fp = tempfile.TemporaryFile()
>>> fp.write(b'Hello world!')
# read data from file
>>> fp.seek(0)
>>> fp.read()
b'Hello world!'
# close the file, it will be removed
>>> fp.close()
# create a temporary file using a context manager
>>> with tempfile.TemporaryFile() as fp:
... fp.write(b'Hello world!')
... fp.seek(0)
... fp.read()
b'Hello world!'
>>>
# file is now closed and removed
# create a temporary directory using the context manager
>>> with tempfile.TemporaryDirectory() as tmpdirname:
... print('created temporary directory', tmpdirname)
>>>
# directory and contents have been removed
The glob module finds all the pathnames matching a specified pattern
according to the rules used by the Unix shell. No tilde expansion is done, but
*, ?, and character ranges expressed with [] will be correctly
matched. This is done by using the os.listdir() and
fnmatch.fnmatch() functions in concert, and not by actually invoking a
subshell. (For tilde and shell variable expansion, use
os.path.expanduser() and os.path.expandvars().)
Return a possibly-empty list of path names that match pathname, which must be
a string containing a path specification. pathname can be either absolute
(like /usr/src/Python-1.5/Makefile) or relative (like
../../Tools/*/*.gif), and can contain shell-style wildcards. Broken
symlinks are included in the results (as in the shell).
Return an iterator which yields the same values as glob()
without actually storing them all simultaneously.
For example, consider a directory containing only the following files:
1.gif, 2.txt, and card.gif. glob() will produce
the following results. Notice how any leading components of the path are
preserved.
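The results would look roughly like this (a reconstructed transcript
consistent with that directory):

>>> import glob
>>> glob.glob('./[0-9].*')
['./1.gif', './2.txt']
>>> glob.glob('*.gif')
['1.gif', 'card.gif']
>>> glob.glob('?.gif')
['1.gif']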
This module provides support for Unix shell-style wildcards, which are not the
same as regular expressions (which are documented in the re module). The
special characters used in shell-style wildcards are:
Pattern   Meaning
*         matches everything
?         matches any single character
[seq]     matches any character in seq
[!seq]    matches any character not in seq
Note that the filename separator ('/' on Unix) is not special to this
module. See module glob for pathname expansion (glob uses
fnmatch() to match pathname segments). Similarly, filenames starting with
a period are not special for this module, and are matched by the * and ?
patterns.
Test whether the filename string matches the pattern string, returning
True or False. If the operating system is case-insensitive,
then both parameters will be normalized to all lower- or upper-case before
the comparison is performed. fnmatchcase() can be used to perform a
case-sensitive comparison, regardless of whether that’s standard for the
operating system.
This example will print all file names in the current directory with the
extension .txt:
import fnmatch
import os

for file in os.listdir('.'):
    if fnmatch.fnmatch(file, '*.txt'):
        print(file)
The linecache module allows one to get any line from any file, while
attempting to optimize internally, using a cache, the common case where many
lines are read from a single file. This is used by the traceback module
to retrieve source lines for inclusion in the formatted traceback.
The linecache module defines the following functions:
Get line lineno from file named filename. This function will never raise an
exception — it will return '' on errors (the terminating newline character
will be included for lines that are found).
If a file named filename is not found, the function will look for it in the
module search path, sys.path, after first checking for a PEP 302
__loader__ in module_globals, in case the module was imported from a
zipfile or other non-filesystem import source.
Check the cache for validity. Use this function if files in the cache may have
changed on disk, and you require the updated version. If filename is omitted,
it will check all the entries in the cache.
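For instance (the line returned depends entirely on the system, so the
output shown is illustrative):

>>> import linecache
>>> linecache.getline('/etc/passwd', 4)
'sys:x:3:3:sys:/dev:/bin/sh\n'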
The shutil module offers a number of high-level operations on files and
collections of files. In particular, functions are provided which support file
copying and removal. For operations on individual files, see also the
os module.
Warning
Even the higher-level file copying functions (copy(), copy2())
cannot copy all file metadata.
On POSIX platforms, this means that file owner and group are lost as well
as ACLs. On Mac OS, the resource fork and other metadata are not used.
This means that resources will be lost and file type and creator codes will
not be correct. On Windows, file owners, ACLs and alternate data streams
are not copied.
Copy the contents of the file-like object fsrc to the file-like object fdst.
The integer length, if given, is the buffer size. In particular, a negative
length value means to copy the data without looping over the source data in
chunks; by default the data is read in chunks to avoid uncontrolled memory
consumption. Note that if the current file position of the fsrc object is not
0, only the contents from the current file position to the end of the file will
be copied.
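A minimal sketch using in-memory buffers to show the copy:

import io
import shutil

src = io.BytesIO(b'example payload')
dst = io.BytesIO()
shutil.copyfileobj(src, dst)    # copies from src's current position onward
assert dst.getvalue() == b'example payload'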
Copy the contents (no metadata) of the file named src to a file named dst.
dst must be the complete target file name; look at copy() for a copy that
accepts a target directory path. If src and dst are the same files,
Error is raised.
The destination location must be writable; otherwise, an IOError exception
will be raised. If dst already exists, it will be replaced. Special files
such as character or block devices and pipes cannot be copied with this
function. src and dst are path names given as strings.
Copy the permission bits, last access time, last modification time, and flags
from src to dst. The file contents, owner, and group are unaffected. src
and dst are path names given as strings.
Copy the file src to the file or directory dst. If dst is a directory, a
file with the same basename as src is created (or overwritten) in the
directory specified. Permission bits are copied. src and dst are path
names given as strings.
This factory function creates a function that can be used as a callable for
copytree()'s ignore argument, ignoring files and directories that
match one of the glob-style patterns provided. See the example below.
Recursively copy an entire directory tree rooted at src. The destination
directory, named by dst, must not already exist; it will be created as well
as missing parent directories. Permissions and times of directories are
copied with copystat(), individual files are copied using
copy2().
If symlinks is true, symbolic links in the source tree are represented as
symbolic links in the new tree, but the metadata of the original links is NOT
copied; if false or omitted, the contents and metadata of the linked files
are copied to the new tree.
When symlinks is false, if the file pointed to by the symlink doesn’t
exist, an exception will be added to the list of errors raised in
an Error exception at the end of the copy process.
You can set the optional ignore_dangling_symlinks flag to true if you
want to silence this exception. Notice that this option has no effect
on platforms that don’t support os.symlink().
If ignore is given, it must be a callable that will receive as its
arguments the directory being visited by copytree(), and a list of its
contents, as returned by os.listdir(). Since copytree() is
called recursively, the ignore callable will be called once for each
directory that is copied. The callable must return a sequence of directory
and file names relative to the current directory (i.e. a subset of the items
in its second argument); these names will then be ignored in the copy
process. ignore_patterns() can be used to create such a callable that
ignores names based on glob-style patterns.
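For example, a sketch using ignore_patterns() with copytree() (the source
and destination paths are hypothetical):

from shutil import copytree, ignore_patterns

# Skip compiled files and anything starting with 'tmp' during the copy.
copytree('source', 'destination', ignore=ignore_patterns('*.pyc', 'tmp*'))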
If exception(s) occur, an Error is raised with a list of reasons.
If copy_function is given, it must be a callable that will be used
to copy each file. It will be called with the source path and the
destination path as arguments. By default, copy2() is used, but any
function that supports the same signature (like copy()) can be used.
Changed in version 3.2: Added the copy_function argument to be able to provide a custom copy
function.
Changed in version 3.2: Added the ignore_dangling_symlinks argument to silence errors on
dangling symlinks when symlinks is false.
Delete an entire directory tree; path must point to a directory (but not a
symbolic link to a directory). If ignore_errors is true, errors resulting
from failed removals will be ignored; if false or omitted, such errors are
handled by calling a handler specified by onerror or, if that is omitted,
they raise an exception.
If onerror is provided, it must be a callable that accepts three
parameters: function, path, and excinfo. The first parameter,
function, is the function which raised the exception; it will be
os.path.islink(), os.listdir(), os.remove() or
os.rmdir(). The second parameter, path, will be the path name passed
to function. The third parameter, excinfo, will be the exception
information returned by sys.exc_info(). Exceptions raised by onerror
will not be caught.
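As an illustration, a sketch of an onerror handler that clears the
read-only bit and retries the removal (the directory name is a placeholder):

import os
import shutil
import stat

def remove_readonly(func, path, _):
    # Clear the read-only bit and reattempt the removal.
    os.chmod(path, stat.S_IWRITE)
    func(path)

shutil.rmtree('build', onerror=remove_readonly)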
Recursively move a file or directory (src) to another location (dst).
If the destination is a directory or a symlink to a directory, then src is
moved inside that directory.
The destination directory must not already exist. If the destination already
exists but is not a directory, it may be overwritten depending on
os.rename() semantics.
If the destination is on the current filesystem, then os.rename() is
used. Otherwise, src is copied (using copy2()) to dst and then
removed.
This exception collects exceptions that are raised during a multi-file
operation. For copytree(), the exception argument is a list of 3-tuples
(srcname, dstname, exception).
This example is the implementation of the copytree() function, described
above, with the docstring omitted. It demonstrates many of the other functions
provided by this module.
def copytree(src, dst, symlinks=False):
    names = os.listdir(src)
    os.makedirs(dst)
    errors = []
    for name in names:
        srcname = os.path.join(src, name)
        dstname = os.path.join(dst, name)
        try:
            if symlinks and os.path.islink(srcname):
                linkto = os.readlink(srcname)
                os.symlink(linkto, dstname)
            elif os.path.isdir(srcname):
                copytree(srcname, dstname, symlinks)
            else:
                copy2(srcname, dstname)
            # XXX What about devices, sockets etc.?
        except (IOError, os.error) as why:
            errors.append((srcname, dstname, str(why)))
        # catch the Error from the recursive copytree so that we can
        # continue with other files
        except Error as err:
            errors.extend(err.args[0])
    try:
        copystat(src, dst)
    except WindowsError:
        # can't copy file access times on Windows
        pass
    except OSError as why:
        errors.append((src, dst, str(why)))
    if errors:
        raise Error(errors)
Create an archive file (such as zip or tar) and return its name.
base_name is the name of the file to create, including the path, minus
any format-specific extension. format is the archive format: one of
“zip”, “tar”, “bztar” (if the bz2 module is available) or “gztar”.
root_dir is a directory that will be the root directory of the
archive; for example, we typically chdir into root_dir before creating the
archive.
base_dir is the directory where we start archiving from;
i.e. base_dir will be the common prefix of all files and
directories in the archive.
root_dir and base_dir both default to the current directory.
owner and group are used when creating a tar archive. By default,
uses the current owner and group.
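For instance, a sketch that archives a user's .ssh directory into
~/myarchive.tar.gz (the paths are chosen purely for illustration):

import os
from shutil import make_archive

archive_name = os.path.expanduser(os.path.join('~', 'myarchive'))
root_dir = os.path.expanduser(os.path.join('~', '.ssh'))
make_archive(archive_name, 'gztar', root_dir)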
Unpack an archive. filename is the full path of the archive.
extract_dir is the name of the target directory where the archive is
unpacked. If not provided, the current working directory is used.
format is the archive format: one of “zip”, “tar”, or “gztar”. Or any
other format registered with register_unpack_format(). If not
provided, unpack_archive() will use the archive file name extension
and see if an unpacker was registered for that extension. In case none is
found, a ValueError is raised.
Registers an unpack format. name is the name of the format and
extensions is a list of extensions corresponding to the format, like
.zip for Zip files.
function is the callable that will be used to unpack archives. The
callable will receive the path of the archive, followed by the directory
the archive must be extracted to.
When provided, extra_args is a sequence of (name, value) tuples that
will be passed as keyword arguments to the callable.
description can be provided to describe the format, and will be returned
by the get_unpack_formats() function.
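A minimal sketch of registering a hypothetical format (the format name,
extension, and unpacker are made up for illustration):

import shutil

def sample_unpacker(path, extract_dir):
    # Would receive the archive path and the target directory.
    ...

shutil.register_unpack_format('sample', ['.smpl'], sample_unpacker,
                              description='Sample archives')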
This module is the Mac OS 9 (and earlier) implementation of the os.path
module. It can be used to manipulate old-style Macintosh pathnames on Mac OS X
(or any other platform).
The following functions are available in this module: normcase(),
normpath(), isabs(), join(), split(), isdir(),
isfile(), walk(), and exists(). For the other functions available in
os.path, dummy counterparts are available.
The modules described in this chapter support storing Python data in a
persistent form on disk. The pickle and marshal modules can turn
many Python data types into a stream of bytes and then recreate the objects from
the bytes. The various DBM-related modules support a family of hash-based file
formats that store a mapping of strings to other strings.
The pickle module implements a fundamental, but powerful algorithm for
serializing and de-serializing a Python object structure. “Pickling” is the
process whereby a Python object hierarchy is converted into a byte stream, and
“unpickling” is the inverse operation, whereby a byte stream is converted back
into an object hierarchy. Pickling (and unpickling) is alternatively known as
“serialization”, “marshalling,” [1] or “flattening”; however, to avoid
confusion, the terms used here are “pickling” and “unpickling”.
Warning
The pickle module is not intended to be secure against erroneous or
maliciously constructed data. Never unpickle data received from an untrusted
or unauthenticated source.
The pickle module has a transparent optimizer (_pickle) written
in C. It is used whenever available. Otherwise the pure Python implementation is
used.
Python has a more primitive serialization module called marshal, but in
general pickle should always be the preferred way to serialize Python
objects. marshal exists primarily to support Python’s .pyc
files.
The pickle module differs from marshal in several significant ways:
The pickle module keeps track of the objects it has already serialized,
so that later references to the same object won’t be serialized again.
marshal doesn’t do this.
This has implications both for recursive objects and object sharing. Recursive
objects are objects that contain references to themselves. These are not
handled by marshal, and in fact, attempting to marshal recursive objects will
crash your Python interpreter. Object sharing happens when there are multiple
references to the same object in different places in the object hierarchy being
serialized. pickle stores such objects only once, and ensures that all
other references point to the master copy. Shared objects remain shared, which
can be very important for mutable objects.
marshal cannot be used to serialize user-defined classes and their
instances. pickle can save and restore class instances transparently,
however the class definition must be importable and live in the same module as
when the object was stored.
The marshal serialization format is not guaranteed to be portable
across Python versions. Because its primary job in life is to support
.pyc files, the Python implementers reserve the right to change the
serialization format in non-backwards compatible ways should the need arise.
The pickle serialization format is guaranteed to be backwards compatible
across Python releases.
Note that serialization is a more primitive notion than persistence; although
pickle reads and writes file objects, it does not handle the issue of
naming persistent objects, nor the (even more complicated) issue of concurrent
access to persistent objects. The pickle module can transform a complex
object into a byte stream and it can transform the byte stream into an object
with the same internal structure. Perhaps the most obvious thing to do with
these byte streams is to write them onto a file, but it is also conceivable to
send them across a network or store them in a database. The module
shelve provides a simple interface to pickle and unpickle objects on
DBM-style database files.
The data format used by pickle is Python-specific. This has the
advantage that there are no restrictions imposed by external standards such as
XDR (which can’t represent pointer sharing); however it means that non-Python
programs may not be able to reconstruct pickled Python objects.
By default, the pickle data format uses a compact binary representation.
The module pickletools contains tools for analyzing data streams
generated by pickle.
There are currently 4 different protocols which can be used for pickling.
Protocol version 0 is the original human-readable protocol and is
backwards compatible with earlier versions of Python.
Protocol version 1 is the old binary format which is also compatible with
earlier versions of Python.
Protocol version 2 was introduced in Python 2.3. It provides much more
efficient pickling of new-style classes.
Protocol version 3 was added in Python 3.0. It has explicit support for
bytes and cannot be unpickled by Python 2.x pickle modules. This is
the current recommended protocol; use it whenever possible.
Refer to PEP 307 for information about improvements brought by
protocol 2. See pickletools's source code for extensive
comments about opcodes used by pickle protocols.
To serialize an object hierarchy, you first create a pickler, then you call the
pickler’s dump() method. To de-serialize a data stream, you first create
an unpickler, then you call the unpickler’s load() method. The
pickle module provides the following constant:
The default protocol used for pickling. May be less than HIGHEST_PROTOCOL.
Currently the default protocol is 3; a backward-incompatible protocol
designed for Python 3.0.
The pickle module provides the following functions to make the pickling
process more convenient:
Write a pickled representation of obj to the open file object file.
This is equivalent to Pickler(file, protocol).dump(obj).
The optional protocol argument tells the pickler to use the given protocol;
supported protocols are 0, 1, 2, 3. The default protocol is 3; a
backward-incompatible protocol designed for Python 3.0.
Specifying a negative protocol version selects the highest protocol version
supported. The higher the protocol used, the more recent the version of
Python needed to read the pickle produced.
The file argument must have a write() method that accepts a single bytes
argument. It can thus be an on-disk file opened for binary writing, a
io.BytesIO instance, or any other custom object that meets this
interface.
If fix_imports is True and protocol is less than 3, pickle will try to
map the new Python 3.x names to the old module names used in Python 2.x,
so that the pickle data stream is readable with Python 2.x.
Return the pickled representation of the object as a bytes
object, instead of writing it to a file.
The optional protocol argument tells the pickler to use the given protocol;
supported protocols are 0, 1, 2, 3. The default protocol is 3; a
backward-incompatible protocol designed for Python 3.0.
Specifying a negative protocol version selects the highest protocol version
supported. The higher the protocol used, the more recent the version of
Python needed to read the pickle produced.
If fix_imports is True and protocol is less than 3, pickle will try to
map the new Python 3.x names to the old module names used in Python 2.x,
so that the pickle data stream is readable with Python 2.x.
Read a pickled object representation from the open file object file
and return the reconstituted object hierarchy specified therein. This is
equivalent to Unpickler(file).load().
The protocol version of the pickle is detected automatically, so no protocol
argument is needed. Bytes past the pickled object’s representation are
ignored.
The argument file must have two methods, a read() method that takes an
integer argument, and a readline() method that requires no arguments. Both
methods should return bytes. Thus file can be an on-disk file opened
for binary reading, a io.BytesIO object, or any other custom object
that meets this interface.
Optional keyword arguments are fix_imports, encoding and errors,
which are used to control compatibility support for pickle stream generated
by Python 2.x. If fix_imports is True, pickle will try to map the old
Python 2.x names to the new names used in Python 3.x. The encoding and
errors tell pickle how to decode 8-bit string instances pickled by Python
2.x; these default to ‘ASCII’ and ‘strict’, respectively.
Read a pickled object hierarchy from a bytes object and return the
reconstituted object hierarchy specified therein.
The protocol version of the pickle is detected automatically, so no protocol
argument is needed. Bytes past the pickled object’s representation are
ignored.
Optional keyword arguments are fix_imports, encoding and errors,
which are used to control compatibility support for pickle stream generated
by Python 2.x. If fix_imports is True, pickle will try to map the old
Python 2.x names to the new names used in Python 3.x. The encoding and
errors tell pickle how to decode 8-bit string instances pickled by Python
2.x; these default to ‘ASCII’ and ‘strict’, respectively.
Error raised when there is a problem unpickling an object, such as data
corruption or a security violation. It inherits PickleError.
Note that other exceptions may also be raised during unpickling, including
(but not necessarily limited to) AttributeError, EOFError, ImportError, and
IndexError.
class pickle.Pickler(file, protocol=None, *, fix_imports=True)¶
This takes a binary file for writing a pickle data stream.
The optional protocol argument tells the pickler to use the given protocol;
supported protocols are 0, 1, 2, 3. The default protocol is 3; a
backward-incompatible protocol designed for Python 3.0.
Specifying a negative protocol version selects the highest protocol version
supported. The higher the protocol used, the more recent the version of
Python needed to read the pickle produced.
The file argument must have a write() method that accepts a single bytes
argument. It can thus be an on-disk file opened for binary writing, a
io.BytesIO instance, or any other custom object that meets this interface.
If fix_imports is True and protocol is less than 3, pickle will try to
map the new Python 3.x names to the old module names used in Python 2.x,
so that the pickle data stream is readable with Python 2.x.
Do nothing by default. This exists so a subclass can override it.
If persistent_id() returns None, obj is pickled as usual. Any
other value causes Pickler to emit the returned value as a
persistent ID for obj. The meaning of this persistent ID should be
defined by Unpickler.persistent_load(). Note that the value
returned by persistent_id() cannot itself have a persistent ID.
Deprecated. Enable fast mode if set to a true value. The fast mode
disables the usage of memo, therefore speeding the pickling process by not
generating superfluous PUT opcodes. It should not be used with
self-referential objects, doing otherwise will cause Pickler to
recurse infinitely.
class pickle.Unpickler(file, *, fix_imports=True, encoding="ASCII", errors="strict")¶
This takes a binary file for reading a pickle data stream.
The protocol version of the pickle is detected automatically, so no
protocol argument is needed.
The argument file must have two methods, a read() method that takes an
integer argument, and a readline() method that requires no arguments. Both
methods should return bytes. Thus file can be an on-disk file object opened
for binary reading, a io.BytesIO object, or any other custom object
that meets this interface.
Optional keyword arguments are fix_imports, encoding and errors,
which are used to control compatibility support for pickle stream generated
by Python 2.x. If fix_imports is True, pickle will try to map the old
Python 2.x names to the new names used in Python 3.x. The encoding and
errors tell pickle how to decode 8-bit string instances pickled by Python
2.x; these default to ‘ASCII’ and ‘strict’, respectively.
Read a pickled object representation from the open file object given in
the constructor, and return the reconstituted object hierarchy specified
therein. Bytes past the pickled object’s representation are ignored.
If defined, persistent_load() should return the object specified by
the persistent ID pid. If an invalid persistent ID is encountered, an
UnpicklingError should be raised.
Import module if necessary and return the object called name from it,
where the module and name arguments are str objects. Note that,
despite its name, find_class() is also used for finding
functions.
Subclasses may override this to gain control over what type of objects and
how they can be loaded, potentially reducing security risks. Refer to
Restricting Globals for details.
The following types can be pickled:
None, True, and False
integers, floating point numbers, complex numbers
strings, bytes, bytearrays
tuples, lists, sets, and dictionaries containing only picklable objects
functions defined at the top level of a module
built-in functions defined at the top level of a module
classes that are defined at the top level of a module
instances of such classes whose __dict__ or __setstate__() is
picklable (see section Pickling Class Instances for details)
Attempts to pickle unpicklable objects will raise the PicklingError
exception; when this happens, an unspecified number of bytes may already
have been written to the underlying file. Trying to pickle a highly recursive
data structure may exceed the maximum recursion depth; a RuntimeError
will be raised in this case. You can carefully raise this limit with
sys.setrecursionlimit().
Note that functions (built-in and user-defined) are pickled by “fully qualified”
name reference, not by value. This means that only the function name is
pickled, along with the name of module the function is defined in. Neither the
function’s code, nor any of its function attributes are pickled. Thus the
defining module must be importable in the unpickling environment, and the module
must contain the named object, otherwise an exception will be raised. [2]
Similarly, classes are pickled by named reference, so the same restrictions in
the unpickling environment apply. Note that none of the class’s code or data is
pickled, so in the following example the class attribute attr is not
restored in the unpickling environment:
class Foo:
    attr = 'A class attribute'

picklestring = pickle.dumps(Foo)
These restrictions are why picklable functions and classes must be defined in
the top level of a module.
Similarly, when class instances are pickled, their class’s code and data are not
pickled along with them. Only the instance data are pickled. This is done on
purpose, so you can fix bugs in a class or add methods to the class and still
load objects that were created with an earlier version of the class. If you
plan to have long-lived objects that will see many versions of a class, it may
be worthwhile to put a version number in the objects so that suitable
conversions can be made by the class’s __setstate__() method.
In this section, we describe the general mechanisms available to you to define,
customize, and control how class instances are pickled and unpickled.
In most cases, no additional code is needed to make instances picklable. By
default, pickle will retrieve the class and the attributes of an instance via
introspection. When a class instance is unpickled, its __init__() method
is usually not invoked. The default behaviour first creates an uninitialized
instance and then restores the saved attributes. The following code is a
rough sketch of that behaviour (save and load here are illustrative
helpers, not actual pickle functions):
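def save(obj):
    # What pickling records by default: the class and the instance dict.
    return (obj.__class__, obj.__dict__)

def load(cls, attributes):
    # What unpickling does by default: create a bare instance with
    # __new__() (bypassing __init__()), then restore the saved state.
    obj = cls.__new__(cls)
    obj.__dict__.update(attributes)
    return obj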
In protocol 2 and newer, classes that implement the __getnewargs__()
method can dictate the values passed to the __new__() method upon
unpickling. This is often needed for classes whose __new__() method
requires arguments.
Classes can further influence how their instances are pickled; if the class
defines the method __getstate__(), it is called and the returned object
is pickled as the contents for the instance, instead of the contents of the
instance’s dictionary. If the __getstate__() method is absent, the
instance’s __dict__ is pickled as usual.
Upon unpickling, if the class defines __setstate__(), it is called with
the unpickled state. In that case, there is no requirement for the state
object to be a dictionary. Otherwise, the pickled state must be a dictionary
and its items are assigned to the new instance’s dictionary.
Refer to the section Handling Stateful Objects for more information about how to use
the methods __getstate__() and __setstate__().
Note
At unpickling time, some methods like __getattr__(),
__getattribute__(), or __setattr__() may be called upon the
instance. In case those methods rely on some internal invariant being true,
the type should implement __getnewargs__() to establish such an
invariant; otherwise, neither __new__() nor __init__() will be
called.
As we shall see, pickle does not directly use the methods described above. In
fact, these methods are part of the copy protocol which implements the
__reduce__() special method. The copy protocol provides a unified
interface for retrieving the data necessary for pickling and copying
objects. [3]
Although powerful, implementing __reduce__() directly in your classes is
error prone. For this reason, class designers should use the high-level
interface (i.e., __getnewargs__(), __getstate__() and
__setstate__()) whenever possible. We will show, however, cases where
using __reduce__() is the only option or leads to more efficient pickling
or both.
The interface is currently defined as follows. The __reduce__() method
takes no argument and shall return either a string or preferably a tuple (the
returned object is often referred to as the “reduce value”).
If a string is returned, the string should be interpreted as the name of a
global variable. It should be the object’s local name relative to its
module; the pickle module searches the module namespace to determine the
object’s module. This behaviour is typically useful for singletons.
When a tuple is returned, it must be between two and five items long.
Optional items can either be omitted, or None can be provided as their
value. The semantics of each item are in order:
A callable object that will be called to create the initial version of the
object.
A tuple of arguments for the callable object. An empty tuple must be given
if the callable does not accept any argument.
Optionally, the object’s state, which will be passed to the object’s
__setstate__() method as previously described. If the object has no
such method then the value must be a dictionary and it will be added to
the object’s __dict__ attribute.
Optionally, an iterator (and not a sequence) yielding successive items.
These items will be appended to the object either using
obj.append(item) or, in batch, using obj.extend(list_of_items).
This is primarily used for list subclasses, but may be used by other
classes as long as they have append() and extend() methods with
the appropriate signature. (Whether append() or extend() is
used depends on which pickle protocol version is used as well as the number
of items to append, so both must be supported.)
Optionally, an iterator (not a sequence) yielding successive key-value
pairs. These items will be stored to the object using
obj[key] = value. This is primarily used for dictionary subclasses, but may be used
by other classes as long as they implement __setitem__().
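To make the common two-item case concrete, here is a minimal sketch
(Pair is a hypothetical class):

import pickle

class Pair:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def __reduce__(self):
        # A callable and the argument tuple it will be called with.
        return (Pair, (self.left, self.right))

p = pickle.loads(pickle.dumps(Pair(1, 2)))
assert (p.left, p.right) == (1, 2)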
Alternatively, a __reduce_ex__() method may be defined. The only
difference is this method should take a single integer argument, the protocol
version. When defined, pickle will prefer it over the __reduce__()
method. In addition, __reduce__() automatically becomes a synonym for
the extended version. The main use for this method is to provide
backwards-compatible reduce values for older Python releases.
For the benefit of object persistence, the pickle module supports the
notion of a reference to an object outside the pickled data stream. Such
objects are referenced by a persistent ID, which should be either a string of
alphanumeric characters (for protocol 0) [4] or just an arbitrary object (for
any newer protocol).
The resolution of such persistent IDs is not defined by the pickle
module; it will delegate this resolution to the user defined methods on the
pickler and unpickler, persistent_id() and persistent_load()
respectively.
To pickle objects that have an external persistent id, the pickler must have a
custom persistent_id() method that takes an object as an argument and
returns either None or the persistent id for that object. When None is
returned, the pickler simply pickles the object as normal. When a persistent ID
string is returned, the pickler will pickle that object, along with a marker so
that the unpickler will recognize it as a persistent ID.
To unpickle external objects, the unpickler must have a custom
persistent_load() method that takes a persistent ID object and returns the
referenced object.
Here is a comprehensive example presenting how persistent ID can be used to
pickle external objects by reference.
# Simple example presenting how persistent ID can be used to pickle
# external objects by reference.

import pickle
import sqlite3
from collections import namedtuple

# Simple class representing a record in our database.
MemoRecord = namedtuple("MemoRecord", "key, task")

class DBPickler(pickle.Pickler):

    def persistent_id(self, obj):
        # Instead of pickling MemoRecord as a regular class instance, we emit a
        # persistent ID.
        if isinstance(obj, MemoRecord):
            # Here, our persistent ID is simply a tuple, containing a tag and a
            # key, which refers to a specific record in the database.
            return ("MemoRecord", obj.key)
        else:
            # If obj does not have a persistent ID, return None. This means obj
            # needs to be pickled as usual.
            return None

class DBUnpickler(pickle.Unpickler):

    def __init__(self, file, connection):
        super().__init__(file)
        self.connection = connection

    def persistent_load(self, pid):
        # This method is invoked whenever a persistent ID is encountered.
        # Here, pid is the tuple returned by DBPickler.
        cursor = self.connection.cursor()
        type_tag, key_id = pid
        if type_tag == "MemoRecord":
            # Fetch the referenced record from the database and return it.
            cursor.execute("SELECT * FROM memos WHERE key=?", (str(key_id),))
            key, task = cursor.fetchone()
            return MemoRecord(key, task)
        else:
            # Always raises an error if you cannot return the correct object.
            # Otherwise, the unpickler will think None is the object referenced
            # by the persistent ID.
            raise pickle.UnpicklingError("unsupported persistent object")

def main():
    import io
    import pprint

    # Initialize and populate our database.
    conn = sqlite3.connect(":memory:")
    cursor = conn.cursor()
    cursor.execute("CREATE TABLE memos(key INTEGER PRIMARY KEY, task TEXT)")
    tasks = (
        'give food to fish',
        'prepare group meeting',
        'fight with a zebra',
        )
    for task in tasks:
        cursor.execute("INSERT INTO memos VALUES(NULL, ?)", (task,))

    # Fetch the records to be pickled.
    cursor.execute("SELECT * FROM memos")
    memos = [MemoRecord(key, task) for key, task in cursor]

    # Save the records using our custom DBPickler.
    file = io.BytesIO()
    DBPickler(file).dump(memos)

    print("Pickled records:")
    pprint.pprint(memos)

    # Update a record, just for good measure.
    cursor.execute("UPDATE memos SET task='learn italian' WHERE key=1")

    # Load the records from the pickle data stream.
    file.seek(0)
    memos = DBUnpickler(file, conn).load()

    print("Unpickled records:")
    pprint.pprint(memos)

if __name__ == '__main__':
    main()
Here’s an example that shows how to modify pickling behavior for a class.
The TextReader class opens a text file, and returns the line number and
line contents each time its readline() method is called. If a
TextReader instance is pickled, all attributes except the file object
member are saved. When the instance is unpickled, the file is reopened, and
reading resumes from the last location. The __setstate__() and
__getstate__() methods are used to implement this behavior.
class TextReader:
    """Print and number lines in a text file."""

    def __init__(self, filename):
        self.filename = filename
        self.file = open(filename)
        self.lineno = 0

    def readline(self):
        self.lineno += 1
        line = self.file.readline()
        if not line:
            return None
        if line.endswith('\n'):
            line = line[:-1]
        return "%i: %s" % (self.lineno, line)

    def __getstate__(self):
        # Copy the object's state from self.__dict__ which contains
        # all our instance attributes. Always use the dict.copy()
        # method to avoid modifying the original state.
        state = self.__dict__.copy()
        # Remove the unpicklable entries.
        del state['file']
        return state

    def __setstate__(self, state):
        # Restore instance attributes (i.e., filename and lineno).
        self.__dict__.update(state)
        # Restore the previously opened file's state. To do so, we need to
        # reopen it and read from it until the line count is restored.
        file = open(self.filename)
        for _ in range(self.lineno):
            file.readline()
        # Finally, save the file.
        self.file = file
A sample usage might be something like this:
>>> reader = TextReader("hello.txt")
>>> reader.readline()
'1: Hello world!'
>>> reader.readline()
'2: I am line number two.'
>>> new_reader = pickle.loads(pickle.dumps(reader))
>>> new_reader.readline()
'3: Goodbye!'
By default, unpickling will import any class or function that it finds in the
pickle data. For many applications, this behaviour is unacceptable as it
permits the unpickler to import and invoke arbitrary code. Just consider what
this hand-crafted pickle data stream does when loaded:
>>> import pickle
>>> pickle.loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
hello world
0
In this example, the unpickler imports the os.system() function and then
applies the string argument “echo hello world”. Although this example is
inoffensive, it is not difficult to imagine one that could damage your system.
For this reason, you may want to control what gets unpickled by customizing
Unpickler.find_class(). Unlike its name suggests, find_class() is
called whenever a global (i.e., a class or a function) is requested. Thus it is
possible either to forbid globals completely or to restrict them to a safe subset.
Here is an example of an unpickler allowing only few safe classes from the
builtins module to be loaded:
import builtins
import io
import pickle

safe_builtins = {
    'range',
    'complex',
    'set',
    'frozenset',
    'slice',
}

class RestrictedUnpickler(pickle.Unpickler):

    def find_class(self, module, name):
        # Only allow safe classes from builtins.
        if module == "builtins" and name in safe_builtins:
            return getattr(builtins, name)
        # Forbid everything else.
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
                                     (module, name))

def restricted_loads(s):
    """Helper function analogous to pickle.loads()."""
    return RestrictedUnpickler(io.BytesIO(s)).load()
A sample usage of our unpickler, working as intended (a reconstructed transcript):
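>>> restricted_loads(pickle.dumps([1, 2, range(15)]))
[1, 2, range(0, 15)]
>>> restricted_loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
Traceback (most recent call last):
  ...
pickle.UnpicklingError: global 'os.system' is forbidden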
As our examples show, you have to be careful with what you allow to be
unpickled. Therefore, if security is a concern, you may want to consider
alternatives such as the marshalling API in xmlrpc.client or
third-party solutions.
For the simplest code, use the dump() and load() functions.
import pickle
# An arbitrary collection of objects supported by pickle.
data = {
    'a': [1, 2.0, 3, 4+6j],
    'b': ("character string", b"byte string"),
    'c': set([None, True, False])
}

with open('data.pickle', 'wb') as f:
    # Pickle the 'data' dictionary using the highest protocol available.
    pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)
The following example reads the resulting pickled data.
import pickle
with open('data.pickle', 'rb') as f:
    # The protocol version used is detected automatically, so we do not
    # have to specify it.
    data = pickle.load(f)
The limitation to alphanumeric characters is due to the fact that
persistent IDs in protocol 0 are delimited by the newline
character. Therefore, if any kind of newline character occurs in
persistent IDs, the resulting pickle will become unreadable.
The copyreg module provides support for the pickle module. The
copy module is likely to use this in the future as well. It provides
configuration information about object constructors which are not classes.
Such constructors may be factory functions or class instances.
Declares that function should be used as a “reduction” function for objects
of type type. function should return either a string or a tuple
containing two or three elements.
The optional constructor parameter, if provided, is a callable object which
can be used to reconstruct the object when called with the tuple of arguments
returned by function at pickling time. TypeError will be raised if
object is a class or constructor is not callable.
See the pickle module for more details on the interface expected of
function and constructor.
A “shelf” is a persistent, dictionary-like object. The difference with “dbm”
databases is that the values (not the keys!) in a shelf can be essentially
arbitrary Python objects — anything that the pickle module can handle.
This includes most class instances, recursive data types, and objects containing
lots of shared sub-objects. The keys are ordinary strings.
Open a persistent dictionary. The filename specified is the base filename for
the underlying database. As a side-effect, an extension may be added to the
filename and more than one file may be created. By default, the underlying
database file is opened for reading and writing. The optional flag parameter
has the same interpretation as the flag parameter of dbm.open().
By default, version 3 pickles are used to serialize values. The version of the
pickle protocol can be specified with the protocol parameter.
Because of Python semantics, a shelf cannot know when a mutable
persistent-dictionary entry is modified. By default modified objects are
written only when assigned to the shelf (see Example). If the
optional writeback parameter is set to True, all entries accessed are also
cached in memory, and written back on sync() and
close(); this can make it handier to mutate mutable entries in
the persistent dictionary, but, if many entries are accessed, it can consume
vast amounts of memory for the cache, and it can make the close operation
very slow since all accessed entries are written back (there is no way to
determine which accessed entries are mutable, nor which ones were actually
mutated).
Note
Do not rely on the shelf being closed automatically; always call
close() explicitly when you don’t need it any more, or use a
with statement with contextlib.closing().
Warning
Because the shelve module is backed by pickle, it is insecure
to load a shelf from an untrusted source. Like with pickle, loading a shelf
can execute arbitrary code.
Shelf objects support all methods supported by dictionaries. This eases the
transition from dictionary based scripts to those requiring persistent storage.
Write back all entries in the cache if the shelf was opened with writeback
set to True. Also empty the cache and synchronize the persistent
dictionary on disk, if feasible. This is called automatically when the shelf
is closed with close().
The choice of which database package will be used (such as dbm.ndbm or
dbm.gnu) depends on which interface is available. Therefore it is not
safe to open the database directly using dbm. The database is also
(unfortunately) subject to the limitations of dbm, if it is used —
this means that (the pickled representation of) the objects stored in the
database should be fairly small, and in rare cases key collisions may cause
the database to refuse updates.
The shelve module does not support concurrent read/write access to
shelved objects. (Multiple simultaneous read accesses are safe.) When a
program has a shelf open for writing, no other program should have it open for
reading or writing. Unix file locking can be used to solve this, but this
differs across Unix versions and requires knowledge about the database
implementation used.
class shelve.Shelf(dict, protocol=None, writeback=False, keyencoding='utf-8')¶
By default, version 0 pickles are used to serialize values. The version of the
pickle protocol can be specified with the protocol parameter. See the
pickle documentation for a discussion of the pickle protocols.
If the writeback parameter is True, the object will hold a cache of all
entries accessed and write them back to the dict at sync and close times.
This allows natural operations on mutable entries, but can consume much more
memory and make sync and close take a long time.
The keyencoding parameter is the encoding used to encode keys before they
are used with the underlying dict.
New in version 3.2: The keyencoding parameter; previously, keys were always encoded in
UTF-8.
class shelve.BsdDbShelf(dict, protocol=None, writeback=False, keyencoding='utf-8')¶
A subclass of Shelf which exposes first(), next(),
previous(), last() and set_location() which are available
in the third-party bsddb module from pybsddb but not in other database
modules. The dict object passed to the constructor must support those
methods. This is generally accomplished by calling one of
bsddb.hashopen(), bsddb.btopen() or bsddb.rnopen(). The
optional protocol, writeback, and keyencoding parameters have the same
interpretation as for the Shelf class.
class shelve.DbfilenameShelf(filename, flag='c', protocol=None, writeback=False)¶
A subclass of Shelf which accepts a filename instead of a dict-like
object. The underlying file will be opened using dbm.open(). By
default, the file will be created and opened for both read and write. The
optional flag parameter has the same interpretation as for the open()
function. The optional protocol and writeback parameters have the same
interpretation as for the Shelf class.
To summarize the interface (key is a string, data is an arbitrary
object):
import shelve
d = shelve.open(filename) # open -- file may get suffix added by low-level
# library
d[key] = data # store data at key (overwrites old data if
# using an existing key)
data = d[key] # retrieve a COPY of data at key (raise KeyError if no
# such key)
del d[key] # delete data stored at key (raises KeyError
# if no such key)
flag = key in d # true if the key exists
klist = list(d.keys()) # a list of all existing keys (slow!)
# as d was opened WITHOUT writeback=True, beware:
d['xx'] = [0, 1, 2] # this works as expected, but...
d['xx'].append(3) # *this doesn't!* -- d['xx'] is STILL [0, 1, 2]!
# having opened d without writeback=True, you need to code carefully:
temp = d['xx'] # extracts the copy
temp.append(5) # mutates the copy
d['xx'] = temp # stores the copy right back, to persist it
# or, d=shelve.open(filename,writeback=True) would let you just code
# d['xx'].append(5) and have it work as expected, BUT it would also
# consume more memory and make the d.close() operation slower.
d.close() # close it
This module contains functions that can read and write Python values in a binary
format. The format is specific to Python, but independent of machine
architecture issues (e.g., you can write a Python value to a file on a PC,
transport the file to a Sun, and read it back there). Details of the format are
undocumented on purpose; it may change between Python versions (although it
rarely does). [1]
This is not a general “persistence” module. For general persistence and
transfer of Python objects through RPC calls, see the modules pickle and
shelve. The marshal module exists mainly to support reading and
writing the “pseudo-compiled” code for Python modules of .pyc files.
Therefore, the Python maintainers reserve the right to modify the marshal format
in backward incompatible ways should the need arise. If you’re serializing and
de-serializing Python objects, use the pickle module instead – the
performance is comparable, version independence is guaranteed, and pickle
supports a substantially wider range of objects than marshal.
Warning
The marshal module is not intended to be secure against erroneous or
maliciously constructed data. Never unmarshal data received from an
untrusted or unauthenticated source.
Not all Python object types are supported; in general, only objects whose value
is independent from a particular invocation of Python can be written and read by
this module. The following types are supported: booleans, integers, floating
point numbers, complex numbers, strings, bytes, bytearrays, tuples, lists, sets,
frozensets, dictionaries, and code objects, where it should be understood that
tuples, lists, sets, frozensets and dictionaries are only supported as long as
the values contained therein are themselves supported; and recursive lists, sets
and dictionaries should not be written (they will cause infinite loops). The
singletons None, Ellipsis and StopIteration can also be
marshalled and unmarshalled.
There are functions that read/write files as well as functions operating on
strings.
Write the value on the open file. The value must be a supported type. The
file must be an open file object such as sys.stdout or returned by
open() or os.popen(). It must be opened in binary mode ('wb'
or 'w+b').
If the value has (or contains an object that has) an unsupported type, a
ValueError exception is raised — but garbage data will also be written
to the file. The object will not be properly read back by load().
The version argument indicates the data format that dump should use
(see below).
Read one value from the open file and return it. If no valid value is read
(e.g. because the data has a different Python version’s incompatible marshal
format), raise EOFError, ValueError or TypeError. The
file must be an open file object opened in binary mode ('rb' or
'r+b').
Note
If an object containing an unsupported type was marshalled with dump(),
load() will substitute None for the unmarshallable type.
Return the string that would be written to a file by dump(value,file). The
value must be a supported type. Raise a ValueError exception if value
has (or contains an object that has) an unsupported type.
The version argument indicates the data format that dumps should use
(see below).
Indicates the format that the module uses. Version 0 is the historical
format, version 1 shares interned strings and version 2 uses a binary format
for floating point numbers. The current version is 2.
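As a minimal round-trip sketch (the file name is hypothetical):
import marshal
data = {'answer': 42, 'values': [1.5, 2+3j, b'bytes']}
# Write in the current (version 2) format and read it back.
with open('cache.dat', 'wb') as f:
    marshal.dump(data, f)
with open('cache.dat', 'rb') as f:
    assert marshal.load(f) == data
# dumps()/loads() do the same without a file object.
assert marshal.loads(marshal.dumps(data)) == data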
The name of this module stems from a bit of terminology used by the designers of
Modula-3 (amongst others), who use the term “marshalling” for shipping of data
around in a self-contained form. Strictly speaking, “to marshal” means to
convert some data from internal to external form (in an RPC buffer for instance)
and “to unmarshal” means the reverse process.
dbm is a generic interface to variants of the DBM database —
dbm.gnu or dbm.ndbm. If none of these modules is installed, the
slow-but-simple implementation in module dbm.dumb will be used. There
is a third party interface to
the Oracle Berkeley DB.
A tuple containing the exceptions that can be raised by each of the supported
modules, with a unique exception also named dbm.error as the first
item — the latter is used when dbm.error is raised.
This function attempts to guess which of the several simple database modules
available — dbm.gnu, dbm.ndbm or dbm.dumb — should
be used to open a given file.
Returns one of the following values: None if the file can’t be opened
because it’s unreadable or doesn’t exist; the empty string ('') if the
file’s format can’t be guessed; or a string containing the required module
name, such as 'dbm.ndbm' or 'dbm.gnu'.
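A quick sketch (the file name is hypothetical):
import dbm
# Prints e.g. 'dbm.gnu', '' (unknown format) or None (unreadable/missing).
print(dbm.whichdb('cache'))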
Open the database file file and return a corresponding object.
If the database file already exists, the whichdb() function is used to
determine its type and the appropriate module is used; if it does not exist,
the first module listed above that can be imported is used.
The optional flag argument can be:
Value   Meaning
'r'     Open existing database for reading only (default)
'w'     Open existing database for reading and writing
'c'     Open database for reading and writing, creating it if it doesn’t exist
'n'     Always create a new, empty database, open for reading and writing
The optional mode argument is the Unix mode of the file, used only when the
database has to be created. It defaults to octal 0o666 (and will be
modified by the prevailing umask).
The object returned by open() supports the same basic functionality as
dictionaries; keys and their corresponding values can be stored, retrieved, and
deleted, and the in operator and the keys() method are
available, as well as get() and setdefault().
Changed in version 3.2: get() and setdefault() are now available in all database modules.
Keys and values are always stored as bytes. This means that when
strings are used they are implicitly converted to the default encoding before
being stored.
The following example records some hostnames and a corresponding title, and
then prints out the contents of the database:
import dbm
# Open database, creating it if necessary.
db = dbm.open('cache', 'c')
# Record some values
db[b'hello'] = b'there'
db['www.python.org'] = 'Python Website'
db['www.cnn.com'] = 'Cable News Network'
# Note that the keys are considered bytes now.
assert db[b'www.python.org'] == b'Python Website'
# Notice how the value is now in bytes.
assert db['www.cnn.com'] == b'Cable News Network'
# Often-used methods of the dict interface work too.
print(db.get('python.org', b'not present'))
# Storing a non-string key or value will raise an exception (most
# likely a TypeError), so guard it to keep the rest of the script running.
try:
    db['www.yahoo.com'] = 4
except TypeError as err:
    print('Expected failure:', err)
# Close when done.
db.close()
This module is quite similar to the dbm module, but uses the GNU library
gdbm instead to provide some additional functionality. Please note that the
file formats created by dbm.gnu and dbm.ndbm are incompatible.
The dbm.gnu module provides an interface to the GNU DBM library.
dbm.gnu.gdbm objects behave like mappings (dictionaries), except that keys and
values are always converted to bytes before storing. Printing a gdbm
object doesn’t print the
keys and values, and the items() and values() methods are not
supported.
Open a gdbm database and return a gdbm object. The filename
argument is the name of the database file.
The optional flag argument can be:
Value   Meaning
'r'     Open existing database for reading only (default)
'w'     Open existing database for reading and writing
'c'     Open database for reading and writing, creating it if it doesn’t exist
'n'     Always create a new, empty database, open for reading and writing
The following additional characters may be appended to the flag to control
how the database is opened:
Value   Meaning
'f'     Open the database in fast mode. Writes to the database will not be synchronized.
's'     Synchronized mode. This will cause changes to the database to be immediately written to the file.
'u'     Do not lock database.
Not all flags are valid for all versions of gdbm. The module constant
open_flags is a string of supported flag characters. The exception
error is raised if an invalid flag is specified.
The optional mode argument is the Unix mode of the file, used only when the
database has to be created. It defaults to octal 0o666.
In addition to the dictionary-like methods, gdbm objects have the
following methods:
It’s possible to loop over every key in the database using this method and the
nextkey() method. The traversal is ordered by gdbm‘s internal
hash values, and won’t be sorted by the key values. This method returns
the starting key.
Returns the key that follows key in the traversal. The following code prints
every key in the database db, without having to create a list in memory that
contains them all:
k = db.firstkey()
while k is not None:
    print(k)
    k = db.nextkey(k)
If you have carried out a lot of deletions and would like to shrink the space
used by the gdbm file, this routine will reorganize the database. gdbm
objects will not shorten the length of a database file except by using this
reorganization; otherwise, deleted file space will be kept and reused as new
(key, value) pairs are added.
The dbm.ndbm module provides an interface to the Unix “(n)dbm” library.
Dbm objects behave like mappings (dictionaries), except that keys and values are
always stored as bytes. Printing a dbm object doesn’t print the keys and
values, and the items() and values() methods are not supported.
This module can be used with the “classic” ndbm interface or the GNU GDBM
compatibility interface. On Unix, the configure script will attempt
to locate the appropriate header file to simplify building this module.
Open a dbm database and return a dbm object. The filename argument is the
name of the database file (without the .dir or .pag extensions).
The optional flag argument must be one of these values:
Value   Meaning
'r'     Open existing database for reading only (default)
'w'     Open existing database for reading and writing
'c'     Open database for reading and writing, creating it if it doesn’t exist
'n'     Always create a new, empty database, open for reading and writing
The optional mode argument is the Unix mode of the file, used only when the
database has to be created. It defaults to octal 0o666 (and will be
modified by the prevailing umask).
The dbm.dumb module is intended as a last resort fallback for the
dbm module when a more robust module is not available. The dbm.dumb
module is not written for speed and is not nearly as heavily used as the other
database modules.
The dbm.dumb module provides a persistent dictionary-like interface which
is written entirely in Python. Unlike other modules such as dbm.gnu no
external library is required. As with other persistent mappings, the keys and
values are always stored as bytes.
Open a dumbdbm database and return a dumbdbm object. The filename argument is
the basename of the database file (without any specific extensions). When a
dumbdbm database is created, files with .dat and .dir extensions
are created.
The optional flag argument is currently ignored; the database is always opened
for update, and will be created if it does not exist.
The optional mode argument is the Unix mode of the file, used only when the
database has to be created. It defaults to octal 0o666 (and will be modified
by the prevailing umask).
In addition to the methods provided by the collections.MutableMapping class,
dumbdbm objects provide the following method:
Synchronize the on-disk directory and data files. This method is called
by the shelve.Shelf.sync() method.
sqlite3 — DB-API 2.0 interface for SQLite databases
SQLite is a C library that provides a lightweight disk-based database that
doesn’t require a separate server process and allows accessing the database
using a nonstandard variant of the SQL query language. Some applications can use
SQLite for internal data storage. It’s also possible to prototype an
application using SQLite and then port the code to a larger database such as
PostgreSQL or Oracle.
sqlite3 was written by Gerhard Häring and provides a SQL interface compliant
with the DB-API 2.0 specification described by PEP 249.
To use the module, you must first create a Connection object that
represents the database. Here the data will be stored in the
/tmp/example file:
import sqlite3
conn = sqlite3.connect('/tmp/example')
You can also supply the special name :memory: to create a database in RAM.
Once you have a Connection, you can create a Cursor object
and call its execute() method to perform SQL commands:
c = conn.cursor()
# Create table
c.execute('''create table stocks
(date text, trans text, symbol text,
qty real, price real)''')
# Insert a row of data
c.execute("""insert into stocks
values ('2006-01-05','BUY','RHAT',100,35.14)""")
# Save (commit) the changes
conn.commit()
# We can also close the cursor if we are done with it
c.close()
Usually your SQL operations will need to use values from Python variables. You
shouldn’t assemble your query using Python’s string operations because doing so
is insecure; it makes your program vulnerable to an SQL injection attack.
Instead, use the DB-API’s parameter substitution. Put ? as a placeholder
wherever you want to use a value, and then provide a tuple of values as the
second argument to the cursor’s execute() method. (Other database
modules may use a different placeholder, such as %s or :1.) For
example:
# Never do this -- insecure!
symbol = 'IBM'
c.execute("... where symbol = '%s'" % symbol)
# Do this instead
t = (symbol,)
c.execute('select * from stocks where symbol=?', t)
# Larger example
for t in [('2006-03-28', 'BUY', 'IBM', 1000, 45.00),
('2006-04-05', 'BUY', 'MSOFT', 1000, 72.00),
('2006-04-06', 'SELL', 'IBM', 500, 53.00),
]:
c.execute('insert into stocks values (?,?,?,?,?)', t)
To retrieve data after executing a SELECT statement, you can either treat the
cursor as an iterator, call the cursor’s fetchone() method to
retrieve a single matching row, or call fetchall() to get a list of the
matching rows.
This example uses the iterator form:
>>> c = conn.cursor()
>>> c.execute('select * from stocks order by price')
>>> for row in c:
... print(row)
...
('2006-01-05', 'BUY', 'RHAT', 100, 35.14)
('2006-03-28', 'BUY', 'IBM', 1000, 45.0)
('2006-04-06', 'SELL', 'IBM', 500, 53.0)
('2006-04-05', 'BUY', 'MSOFT', 1000, 72.0)
>>>
This constant is meant to be used with the detect_types parameter of the
connect() function.
Setting it makes the sqlite3 module parse the declared type for each
column it returns. It will parse out the first word of the declared type,
i. e. for “integer primary key”, it will parse out “integer”, or for
“number(10)” it will parse out “number”. Then for that column, it will look
into the converters dictionary and use the converter function registered for
that type there.
This constant is meant to be used with the detect_types parameter of the
connect() function.
Setting this makes the SQLite interface parse the column name for each column it
returns. It will look for a string formed as [mytype] in there, and then decide
that ‘mytype’ is the type of the column. It will try to find an entry of
‘mytype’ in the converters dictionary and then use the converter function found
there to return the value. The column name found in Cursor.description
is only the first word of the column name, i. e. if you use something like
'as "x [datetime]"' in your SQL, then we will parse out everything until the
first blank for the column name: the column name would simply be “x”.
Opens a connection to the SQLite database file database. You can use
":memory:" to open a database connection to a database that resides in RAM
instead of on disk.
When a database is accessed by multiple connections, and one of the processes
modifies the database, the SQLite database is locked until that transaction is
committed. The timeout parameter specifies how long the connection should wait
for the lock to go away until raising an exception. The default for the timeout
parameter is 5.0 (five seconds).
SQLite natively supports only the types TEXT, INTEGER, FLOAT, BLOB and NULL. If
you want to use other types you must add support for them yourself. The
detect_types parameter and using custom converters registered with the
module-level register_converter() function allow you to easily do that.
detect_types defaults to 0 (i. e. off, no type detection); you can set it to
any combination of PARSE_DECLTYPES and PARSE_COLNAMES to turn
type detection on.
By default, the sqlite3 module uses its Connection class for the
connect call. You can, however, subclass the Connection class and make
connect() use your class instead by providing your class for the factory
parameter.
The sqlite3 module internally uses a statement cache to avoid SQL parsing
overhead. If you want to explicitly set the number of statements that are cached
for the connection, you can set the cached_statements parameter. The currently
implemented default is to cache 100 statements.
Registers a callable to convert a bytestring from the database into a custom
Python type. The callable will be invoked for all database values that are of
the type typename. See the detect_types parameter of the connect()
function for how the type detection works. Note that the case of typename and
the name of the type in your query must match!
Registers a callable to convert the custom Python type type into one of
SQLite’s supported types. The callable callable accepts as single parameter
the Python value, and must return a value of the following types: int,
float, str or bytes.
Returns True if the string sql contains one or more complete SQL
statements terminated by semicolons. It does not verify that the SQL is
syntactically correct, only that there are no unclosed string literals and the
statement is terminated by a semicolon.
This can be used to build a shell for SQLite, as in the following example:
# A minimal SQLite shell for experiments
import sqlite3
con = sqlite3.connect(":memory:")
con.isolation_level = None
cur = con.cursor()
buffer = ""
print("Enter your SQL commands to execute in sqlite3.")
print("Enter a blank line to exit.")
while True:
    line = input()
    if line == "":
        break
    buffer += line
    if sqlite3.complete_statement(buffer):
        try:
            buffer = buffer.strip()
            cur.execute(buffer)
            if buffer.lstrip().upper().startswith("SELECT"):
                print(cur.fetchall())
        except sqlite3.Error as e:
            print("An error occurred:", e.args[0])
        buffer = ""
con.close()
By default you will not get any tracebacks in user-defined functions,
aggregates, converters, authorizer callbacks etc. If you want to debug them, you
can call this function with flag set to True. Afterwards, you will get tracebacks
from callbacks on sys.stderr. Use False to disable the feature
again.
Get or set the current isolation level. None for autocommit mode or
one of “DEFERRED”, “IMMEDIATE” or “EXCLUSIVE”. See section
Controlling Transactions for a more detailed explanation.
This method commits the current transaction. If you don’t call this method,
anything you did since the last call to commit() is not visible from
other database connections. If you wonder why you don’t see the data you’ve
written to the database, please check you didn’t forget to call this method.
This closes the database connection. Note that this does not automatically
call commit(). If you just close your database connection without
calling commit() first, your changes will be lost!
This is a nonstandard shortcut that creates an intermediate cursor object by
calling the cursor method, then calls the cursor’s execute method with the parameters given.
This is a nonstandard shortcut that creates an intermediate cursor object by
calling the cursor method, then calls the cursor’s executemany method with the parameters given.
This is a nonstandard shortcut that creates an intermediate cursor object by
calling the cursor method, then calls the cursor’s executescript method with the parameters given.
Creates a user-defined function that you can later use from within SQL
statements under the function name name. num_params is the number of
parameters the function accepts, and func is a Python callable that is called
as the SQL function.
The function can return any of the types supported by SQLite: bytes, str, int,
float and None.
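Example (a small sketch using hashlib; the SQL function name md5 is our choice):
import sqlite3
import hashlib
def md5sum(t):
    return hashlib.md5(t).hexdigest()
con = sqlite3.connect(":memory:")
con.create_function("md5", 1, md5sum)
cur = con.cursor()
cur.execute("select md5(?)", (b"foo",))
print(cur.fetchone()[0])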
The aggregate class must implement a step method, which accepts the number
of parameters num_params, and a finalize method which will return the
final result of the aggregate.
The finalize method can return any of the types supported by SQLite:
bytes, str, int, float and None.
Example:
import sqlite3
class MySum:
    def __init__(self):
        self.count = 0

    def step(self, value):
        self.count += value

    def finalize(self):
        return self.count
con = sqlite3.connect(":memory:")
con.create_aggregate("mysum", 1, MySum)
cur = con.cursor()
cur.execute("create table test(i)")
cur.execute("insert into test(i) values (1)")
cur.execute("insert into test(i) values (2)")
cur.execute("select mysum(i) from test")
print(cur.fetchone()[0])
Creates a collation with the specified name and callable. The callable will
be passed two string arguments. It should return -1 if the first is ordered
lower than the second, 0 if they are ordered equal and 1 if the first is ordered
higher than the second. Note that this controls sorting (ORDER BY in SQL) so
your comparisons don’t affect other SQL operations.
Note that the callable will get its parameters as Python bytestrings, which will
normally be encoded in UTF-8.
The following example shows a custom collation that sorts “the wrong way”:
import sqlite3
def collate_reverse(string1, string2):
    if string1 == string2:
        return 0
    elif string1 < string2:
        return 1
    else:
        return -1
con = sqlite3.connect(":memory:")
con.create_collation("reverse", collate_reverse)
cur = con.cursor()
cur.execute("create table test(x)")
cur.executemany("insert into test(x) values (?)", [("a",), ("b",)])
cur.execute("select x from test order by x collate reverse")
for row in cur:
print(row)
con.close()
To remove a collation, call create_collation with None as callable:
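con.create_collation("reverse", None)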
You can call this method from a different thread to abort any queries that might
be executing on the connection. The query will then abort and the caller will
get an exception.
This routine registers a callback. The callback is invoked for each attempt to
access a column of a table in the database. The callback should return
SQLITE_OK if access is allowed, SQLITE_DENY if the entire SQL
statement should be aborted with an error and SQLITE_IGNORE if the
column should be treated as a NULL value. These constants are available in the
sqlite3 module.
The first argument to the callback signifies what kind of operation is to be
authorized. The second and third argument will be arguments or None
depending on the first argument. The 4th argument is the name of the database
(“main”, “temp”, etc.) if applicable. The 5th argument is the name of the
inner-most trigger or view that is responsible for the access attempt or
None if this access attempt is directly from input SQL code.
Please consult the SQLite documentation about the possible values for the first
argument and the meaning of the second and third argument depending on the first
one. All necessary constants are available in the sqlite3 module.
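As a sketch (the "salary" column name is hypothetical), an authorizer that hides one column from all queries might look like this:
import sqlite3
def authorizer(action, arg1, arg2, db_name, trigger_or_view):
    # For SQLITE_READ, arg1 is the table name and arg2 the column name.
    if action == sqlite3.SQLITE_READ and arg2 == "salary":
        return sqlite3.SQLITE_IGNORE    # return NULL for this column
    return sqlite3.SQLITE_OK            # allow everything else
con = sqlite3.connect(":memory:")
con.set_authorizer(authorizer)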
This routine registers a callback. The callback is invoked for every n
instructions of the SQLite virtual machine. This is useful if you want to
get called from SQLite during long-running operations, for example to update
a GUI.
If you want to clear any previously installed progress handler, call the
method with None for handler.
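A minimal sketch; the handler name and the 10000-instruction interval are arbitrary choices:
import sqlite3
def progress():
    print("query still running...")
    return 0  # a non-zero return value asks SQLite to abort the query
con = sqlite3.connect(":memory:")
con.set_progress_handler(progress, 10000)  # invoke every 10000 instructions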
This routine allows/disallows the SQLite engine to load SQLite extensions
from shared libraries. SQLite extensions can define new functions,
aggregates or whole new virtual table implementations. One well-known
extension is the fulltext-search extension distributed with SQLite.
New in version 3.2.
import sqlite3
con = sqlite3.connect(":memory:")
# enable extension loading
con.enable_load_extension(True)
# Load the fulltext search extension
con.execute("select load_extension('./fts3.so')")
# alternatively you can load the extension using an API call:
# con.load_extension("./fts3.so")
# disable extension loading again
con.enable_load_extension(False)
# example from SQLite wiki
con.execute("create virtual table recipe using fts3(name, ingredients)")
con.executescript("""
insert into recipe (name, ingredients) values ('broccoli stew', 'broccoli peppers cheese tomatoes');
insert into recipe (name, ingredients) values ('pumpkin stew', 'pumpkin onions garlic celery');
insert into recipe (name, ingredients) values ('broccoli pie', 'broccoli cheese onions flour');
insert into recipe (name, ingredients) values ('pumpkin pie', 'pumpkin sugar flour butter');
""")
for row in con.execute("select rowid, name, ingredients from recipe where name match 'pie'"):
    print(row)
Loadable extensions are disabled by default. See [1].
This routine loads a SQLite extension from a shared library. You have to
enable extension loading with enable_load_extension() before you can
use this routine.
New in version 3.2.
Loadable extensions are disabled by default. See [1].
You can change this attribute to a callable that accepts the cursor and the
original row as a tuple and will return the real result row. This way, you can
implement more advanced ways of returning results, such as returning an object
that can also access columns by name.
Example:
import sqlite3
def dict_factory(cursor, row):
    d = {}
    for idx, col in enumerate(cursor.description):
        d[col[0]] = row[idx]
    return d
con = sqlite3.connect(":memory:")
con.row_factory = dict_factory
cur = con.cursor()
cur.execute("select 1 as a")
print(cur.fetchone()["a"])
If returning a tuple doesn’t suffice and you want name-based access to
columns, you should consider setting row_factory to the
highly-optimized sqlite3.Row type. Row provides both
index-based and case-insensitive name-based access to columns with almost no
memory overhead. It will probably be better than your own custom
dictionary-based approach or even a db_row based solution.
Using this attribute you can control what objects are returned for the TEXT
data type. By default, this attribute is set to str and the
sqlite3 module will return Unicode objects for TEXT. If you want to
return bytestrings instead, you can set it to bytes.
For efficiency reasons, there’s also a way to return str objects
only for non-ASCII data, and bytes otherwise. To activate it, set
this attribute to sqlite3.OptimizedUnicode.
You can also set it to any other callable that accepts a single bytestring
parameter and returns the resulting object.
See the following example code for illustration:
import sqlite3
con = sqlite3.connect(":memory:")
cur = con.cursor()
# Create the table
con.execute("create table person(lastname, firstname)")
AUSTRIA = "\xd6sterreich"
# by default, rows are returned as Unicode
cur.execute("select ?", (AUSTRIA,))
row = cur.fetchone()
assert row[0] == AUSTRIA
# but we can make sqlite3 always return bytestrings ...
con.text_factory = bytes
cur.execute("select ?", (AUSTRIA,))
row = cur.fetchone()
assert type(row[0]) == bytes
# the bytestrings will be encoded in UTF-8, unless you stored garbage in the
# database ...
assert row[0] == AUSTRIA.encode("utf-8")
# we can also implement a custom text_factory ...
# here we implement one that will ignore Unicode characters that cannot be
# decoded from UTF-8
con.text_factory = lambda x: str(x, "utf-8", "ignore")
cur.execute("select ?", ("this is latin1 and would normally create errors" +
"\xe4\xf6\xfc".encode("latin1"),))
row = cur.fetchone()
assert type(row[0]) == str
# sqlite3 offers a built-in optimized text_factory that will return bytestring
# objects, if the data is in ASCII only, and otherwise return unicode objects
con.text_factory = sqlite3.OptimizedUnicode
cur.execute("select ?", (AUSTRIA,))
row = cur.fetchone()
assert type(row[0]) == str
cur.execute("select ?", ("Germany",))
row = cur.fetchone()
assert type(row[0]) == str
Returns an iterator to dump the database in an SQL text format. Useful when
saving an in-memory database for later restoration. This function provides
the same capabilities as the .dump command in the sqlite3
shell.
Example:
# Convert file existing_db.db to SQL dump file dump.sql
import sqlite3, os
con = sqlite3.connect('existing_db.db')
with open('dump.sql', 'w') as f:
    for line in con.iterdump():
        f.write('%s\n' % line)
Executes an SQL statement. The SQL statement may be parametrized (i. e.
placeholders instead of SQL literals). The sqlite3 module supports two
kinds of placeholders: question marks (qmark style) and named placeholders
(named style).
This example shows how to use parameters with qmark style:
import sqlite3
con = sqlite3.connect("mydb")
cur = con.cursor()
who = "Yeltsin"
age = 72
cur.execute("select name_last, age from people where name_last=? and age=?", (who, age))
print(cur.fetchone())
This example shows how to use the named style:
import sqlite3
con = sqlite3.connect("mydb")
cur = con.cursor()
who = "Yeltsin"
age = 72
cur.execute("select name_last, age from people where name_last=:who and age=:age",
{"who": who, "age": age})
print(cur.fetchone())
execute() will only execute a single SQL statement. If you try to execute
more than one statement with it, it will raise a Warning. Use
executescript() if you want to execute multiple SQL statements with one
call.
Executes an SQL command against all parameter sequences or mappings found in
the sequence seq_of_parameters. The sqlite3 module also allows using an
iterator yielding parameters instead of a sequence.
import sqlite3
class IterChars:
    def __init__(self):
        self.count = ord('a')

    def __iter__(self):
        return self

    def __next__(self):
        if self.count > ord('z'):
            raise StopIteration
        self.count += 1
        return (chr(self.count - 1),)  # this is a 1-tuple
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table characters(c)")
theIter = IterChars()
cur.executemany("insert into characters(c) values (?)", theIter)
cur.execute("select c from characters")
print(cur.fetchall())
import sqlite3
def char_generator():
    import string
    for c in string.ascii_lowercase:
        yield (c,)
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table characters(c)")
cur.executemany("insert into characters(c) values (?)", char_generator())
cur.execute("select c from characters")
print(cur.fetchall())
This is a nonstandard convenience method for executing multiple SQL statements
at once. It issues a COMMIT statement first, then executes the SQL script it
gets as a parameter.
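Example (the tables and data are made up for illustration):
import sqlite3
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
    create table person(
        firstname,
        lastname,
        age
    );

    create table book(
        title,
        author,
        published
    );

    insert into book(title, author, published)
    values (
        'Dirk Gently''s Holistic Detective Agency',
        'Douglas Adams',
        1987
    );
    """)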
Fetches the next set of rows of a query result, returning a list. An empty
list is returned when no more rows are available.
The number of rows to fetch per call is specified by the size parameter.
If it is not given, the cursor’s arraysize determines the number of rows
to be fetched. The method should try to fetch as many rows as indicated by
the size parameter. If this is not possible due to the specified number of
rows not being available, fewer rows may be returned.
Note there are performance considerations involved with the size parameter.
For optimal performance, it is usually best to use the arraysize attribute.
If the size parameter is used, then it is best for it to retain the same
value from one fetchmany() call to the next.
Fetches all (remaining) rows of a query result, returning a list. Note that
the cursor’s arraysize attribute can affect the performance of this operation.
An empty list is returned when no rows are available.
Although the Cursor class of the sqlite3 module implements this
attribute, the database engine’s own support for the determination of “rows
affected”/”rows selected” is quirky.
For DELETE statements, SQLite reports rowcount as 0 if you make a
DELETE FROM table without any condition.
For executemany() statements, the number of modifications are summed up
into rowcount.
As required by the Python DB API Spec, the rowcount attribute “is -1 in
case no executeXX() has been performed on the cursor or the rowcount of the
last operation is not determinable by the interface”.
This includes SELECT statements because we cannot determine the number of
rows a query produced until all rows were fetched.
This read-only attribute provides the rowid of the last modified row. It is
only set if you issued an INSERT statement using the execute()
method. For operations other than INSERT or when executemany() is
called, lastrowid is set to None.
This read-only attribute provides the column names of the last query. To
remain compatible with the Python DB API, it returns a 7-tuple for each
column where the last six items of each tuple are None.
It is set for SELECT statements without any matching rows as well.
The type system of the sqlite3 module is extensible in two ways: you can
store additional Python types in a SQLite database via object adaptation, and
you can let the sqlite3 module convert SQLite types to different Python
types via converters.
Using adapters to store additional Python types in SQLite databases
As described before, SQLite supports only a limited set of types natively. To
use other Python types with SQLite, you must adapt them to one of the
sqlite3 module’s supported types for SQLite: one of NoneType, int, float,
str, bytes.
The sqlite3 module uses Python object adaptation, as described in
PEP 246 for this. The protocol to use is PrepareProtocol.
There are two ways to enable the sqlite3 module to adapt a custom Python
type to one of the supported ones.
This is a good approach if you write the class yourself. Let’s suppose you have
a class like this:
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
Now you want to store the point in a single SQLite column. First you’ll have to
choose one of the supported types to be used for representing the point.
Let’s just use str and separate the coordinates using a semicolon. Then you need
to give your class a method __conform__(self, protocol) which must return
the converted value. The parameter protocol will be PrepareProtocol.
import sqlite3
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __conform__(self, protocol):
        if protocol is sqlite3.PrepareProtocol:
            return "%f;%f" % (self.x, self.y)
con = sqlite3.connect(":memory:")
cur = con.cursor()
p = Point(4.0, -3.2)
cur.execute("select ?", (p,))
print(cur.fetchone()[0])
The other possibility is to create a function that converts the type to the
string representation and register the function with register_adapter().
import sqlite3
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

def adapt_point(point):
    return "%f;%f" % (point.x, point.y)
sqlite3.register_adapter(Point, adapt_point)
con = sqlite3.connect(":memory:")
cur = con.cursor()
p = Point(4.0, -3.2)
cur.execute("select ?", (p,))
print(cur.fetchone()[0])
The sqlite3 module has two default adapters for Python’s built-in
datetime.date and datetime.datetime types. Now let’s suppose
we want to store datetime.datetime objects not in ISO representation,
but as a Unix timestamp.
import sqlite3
import datetime
import time
def adapt_datetime(ts):
    return time.mktime(ts.timetuple())
sqlite3.register_adapter(datetime.datetime, adapt_datetime)
con = sqlite3.connect(":memory:")
cur = con.cursor()
now = datetime.datetime.now()
cur.execute("select ?", (now,))
print(cur.fetchone()[0])
Writing an adapter lets you send custom Python types to SQLite. But to make it
really useful we need to make the Python to SQLite to Python roundtrip work.
Enter converters.
Let’s go back to the Point class. We stored the x and y coordinates
separated via semicolons as strings in SQLite.
First, we’ll define a converter function that accepts the string as a parameter
and constructs a Point object from it.
Note
Converter functions are always passed a bytes object, no matter under which
data type you sent the value to SQLite.
def convert_point(s):
    x, y = map(float, s.split(b";"))
    return Point(x, y)
Now you need to make the sqlite3 module know that what you select from
the database is actually a point. There are two ways of doing this:
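Both ways are sketched below, reusing the Point class, adapter and converter from above: the first uses declared column types (PARSE_DECLTYPES), the second column names (PARSE_COLNAMES):
import sqlite3

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __repr__(self):
        return "Point(%f;%f)" % (self.x, self.y)

def adapt_point(point):
    return "%f;%f" % (point.x, point.y)

def convert_point(s):
    x, y = map(float, s.split(b";"))
    return Point(x, y)

sqlite3.register_adapter(Point, adapt_point)
sqlite3.register_converter("point", convert_point)

p = Point(4.0, -3.2)

# 1) Implicitly, via the declared type of the column:
con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
cur = con.cursor()
cur.execute("create table test(p point)")
cur.execute("insert into test(p) values (?)", (p,))
cur.execute("select p from test")
print("with declared types:", cur.fetchone()[0])
con.close()

# 2) Explicitly, via the column name:
con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_COLNAMES)
cur = con.cursor()
cur.execute("create table test(p)")
cur.execute("insert into test(p) values (?)", (p,))
cur.execute('select p as "p [point]" from test')
print("with column names:", cur.fetchone()[0])
con.close()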
There are default adapters for the date and datetime types in the datetime
module. They will be sent as ISO dates/ISO timestamps to SQLite.
The default converters are registered under the name “date” for
datetime.date and under the name “timestamp” for
datetime.datetime.
This way, you can use date/timestamps from Python without any additional
fiddling in most cases. The format of the adapters is also compatible with the
experimental SQLite date/time functions.
The following example demonstrates this.
import sqlite3
import datetime
con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES|sqlite3.PARSE_COLNAMES)
cur = con.cursor()
cur.execute("create table test(d date, ts timestamp)")
today = datetime.date.today()
now = datetime.datetime.now()
cur.execute("insert into test(d, ts) values (?, ?)", (today, now))
cur.execute("select d, ts from test")
row = cur.fetchone()
print(today, "=>", row[0], type(row[0]))
print(now, "=>", row[1], type(row[1]))
cur.execute('select current_date as "d [date]", current_timestamp as "ts [timestamp]"')
row = cur.fetchone()
print("current_date", row[0], type(row[0]))
print("current_timestamp", row[1], type(row[1]))
By default, the sqlite3 module opens transactions implicitly before a
Data Modification Language (DML) statement (i.e.
INSERT/UPDATE/DELETE/REPLACE), and commits transactions
implicitly before a non-DML, non-query statement (i. e.
anything other than SELECT or the aforementioned).
So if you are within a transaction and issue a command like CREATE TABLE ..., VACUUM or PRAGMA, the sqlite3 module will commit implicitly
before executing that command. There are two reasons for doing that. The first
is that some of these commands don’t work within transactions. The other reason
is that sqlite3 needs to keep track of the transaction state (if a transaction
is active or not). The current transaction state is exposed through the
Connection.in_transaction attribute of the connection object.
You can control which kind of BEGIN statements sqlite3 implicitly executes
(or none at all) via the isolation_level parameter to the connect()
call, or via the isolation_level property of connections.
If you want autocommit mode, then set isolation_level to None.
Otherwise leave it at its default, which will result in a plain “BEGIN”
statement, or set it to one of SQLite’s supported isolation levels: “DEFERRED”,
“IMMEDIATE” or “EXCLUSIVE”.
Using the nonstandard execute(), executemany() and
executescript() methods of the Connection object, your code can
be written more concisely because you don’t have to create the (often
superfluous) Cursor objects explicitly. Instead, the Cursor
objects are created implicitly and these shortcut methods return the cursor
objects. This way, you can execute a SELECT statement and iterate over it
directly using only a single call on the Connection object.
import sqlite3
persons = [
("Hugo", "Boss"),
("Calvin", "Klein")
]
con = sqlite3.connect(":memory:")
# Create the table
con.execute("create table person(firstname, lastname)")
# Fill the table
con.executemany("insert into person(firstname, lastname) values (?, ?)", persons)
# Print the table contents
for row in con.execute("select firstname, lastname from person"):
    print(row)
# Using a dummy WHERE clause to not let SQLite take the shortcut table deletes.
print("I just deleted", con.execute("delete from person where 1=1").rowcount, "rows")
Connection objects can be used as context managers
that automatically commit or rollback transactions. In the event of an
exception, the transaction is rolled back; otherwise, the transaction is
committed:
import sqlite3
con = sqlite3.connect(":memory:")
con.execute("create table person (id integer primary key, firstname varchar unique)")
# Successful, con.commit() is called automatically afterwards
with con:
    con.execute("insert into person(firstname) values (?)", ("Joe",))
# con.rollback() is called after the with block finishes with an exception, the
# exception is still raised and must be caught
try:
    with con:
        con.execute("insert into person(firstname) values (?)", ("Joe",))
except sqlite3.IntegrityError:
    print("couldn't add Joe twice")
Older SQLite versions had issues with sharing connections between threads.
That’s why the Python module disallows sharing connections and cursors between
threads. If you still try to do so, you will get an exception at runtime.
The only exception is calling the interrupt() method, which
only makes sense to call from a different thread.
Footnotes
[1] The sqlite3 module is not built with loadable extension support by
default, because some platforms (notably Mac OS X) have SQLite
libraries which are compiled without this feature. To get loadable
extension support, you must pass --enable-loadable-sqlite-extensions to
configure.
The modules described in this chapter support data compression with the zlib,
gzip, and bzip2 algorithms, and the creation of ZIP- and tar-format archives.
For applications that require data compression, the functions in this module
allow compression and decompression, using the zlib library. The zlib library
has its own home page at http://www.zlib.net. There are known
incompatibilities between the Python module and versions of the zlib library
earlier than 1.1.3; 1.1.3 has a security vulnerability, so we recommend using
1.1.4 or later.
zlib’s functions have many options and often need to be used in a particular
order. This documentation doesn’t attempt to cover all of the permutations;
consult the zlib manual at http://www.zlib.net/manual.html for authoritative
information.
For reading and writing .gz files see the gzip module. For
other archive formats, see the bz2, zipfile, and
tarfile modules.
The available exception and functions in this module are:
Computes an Adler-32 checksum of data. (An Adler-32 checksum is almost as
reliable as a CRC32 but can be computed much more quickly.) If value is
present, it is used as the starting value of the checksum; otherwise, a fixed
default value is used. This allows computing a running checksum over the
concatenation of several inputs. The algorithm is not cryptographically
strong, and should not be used for authentication or digital signatures. Since
the algorithm is designed for use as a checksum algorithm, it is not suitable
for use as a general hash algorithm.
Always returns an unsigned 32-bit integer.
Note
To generate the same numeric value across all Python versions and
platforms use adler32(data) & 0xffffffff. If you are only using
the checksum in packed binary format this is not necessary as the
return value is the correct 32bit binary representation
regardless of sign.
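For example, a running checksum over two chunks matches the checksum of the concatenation:
import zlib
part = zlib.adler32(b"hello, ")
part = zlib.adler32(b"world", part)       # continue from the previous value
assert part == zlib.adler32(b"hello, world")
print(part & 0xffffffff)                  # portable unsigned form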
Compresses the bytes in data, returning a bytes object containing compressed data.
level is an integer from 1 to 9 controlling the level of compression;
1 is fastest and produces the least compression, 9 is slowest and
produces the most. The default value is 6. Raises the error
exception if any error occurs.
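A one-shot round trip might look like this (the sample data is arbitrary):
import zlib
data = b"witch which has which witches wrist watch" * 10
packed = zlib.compress(data, 9)           # level 9: best compression
assert zlib.decompress(packed) == data
print(len(data), "->", len(packed))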
Returns a compression object, to be used for compressing data streams that won’t
fit into memory at once. level is an integer from 1 to 9 controlling
the level of compression; 1 is fastest and produces the least compression,
9 is slowest and produces the most. The default value is 6.
Computes a CRC (Cyclic Redundancy Check) checksum of data. If value is
present, it is used as the starting value of the checksum; otherwise, a fixed
default value is used. This allows computing a running checksum over the
concatenation of several inputs. The algorithm is not cryptographically
strong, and should not be used for authentication or digital signatures. Since
the algorithm is designed for use as a checksum algorithm, it is not suitable
for use as a general hash algorithm.
Always returns an unsigned 32-bit integer.
Note
To generate the same numeric value across all Python versions and
platforms use crc32(data) & 0xffffffff. If you are only using
the checksum in packed binary format this is not necessary as the
return value is the correct 32bit binary representation
regardless of sign.
Decompresses the bytes in data, returning a bytes object containing the
uncompressed data. The wbits parameter controls the size of the window
buffer, and is discussed further below.
If bufsize is given, it is used as the initial size of the output
buffer. Raises the error exception if any error occurs.
The absolute value of wbits is the base two logarithm of the size of the
history buffer (the “window size”) used when compressing data. Its absolute
value should be between 8 and 15 for the most recent versions of the zlib
library, larger values resulting in better compression at the expense of greater
memory usage. When decompressing a stream, wbits must not be smaller
than the size originally used to compress the stream; using a too-small
value will result in an exception. The default value is therefore the
highest value, 15. When wbits is negative, the standard
gzip header is suppressed.
bufsize is the initial size of the buffer used to hold decompressed data. If
more space is required, the buffer size will be increased as needed, so you
don’t have to get this value exactly right; tuning it will only save a few calls
to malloc(). The default size is 16384.
Returns a decompression object, to be used for decompressing data streams that
won’t fit into memory at once. The wbits parameter controls the size of the
window buffer.
Compression objects support the following methods:
Compress data, returning a bytes object containing compressed data for at least
part of the data in data. This data should be concatenated to the output
produced by any preceding calls to the compress() method. Some input may
be kept in internal buffers for later processing.
All pending input is processed, and a bytes object containing the remaining compressed
output is returned. mode can be selected from the constants
Z_SYNC_FLUSH, Z_FULL_FLUSH, or Z_FINISH,
defaulting to Z_FINISH. Z_SYNC_FLUSH and
Z_FULL_FLUSH allow compressing further bytestrings of data, while
Z_FINISH finishes the compressed stream and prevents compressing any
more data. After calling flush() with mode set to Z_FINISH,
the compress() method cannot be called again; the only realistic action is
to delete the object.
A bytes object which contains any bytes past the end of the compressed data. That is,
this remains b"" until the last byte that contains compression data is
available. If the whole bytestring turned out to contain compressed data, this is
b"", an empty bytes object.
The only way to determine where a bytestring of compressed data ends is by actually
decompressing it. This means that when compressed data is contained in part of a
larger file, you can only find the end of it by reading data and feeding it,
followed by some non-empty bytestring, into a decompression object’s
decompress() method until the unused_data attribute is no longer
empty.
A bytes object that contains any data that was not consumed by the last
decompress() call because it exceeded the limit for the uncompressed data
buffer. This data has not yet been seen by the zlib machinery, so you must feed
it (possibly with further data concatenated to it) back to a subsequent
decompress() method call in order to get correct output.
Decompress data, returning a bytes object containing the uncompressed data
corresponding to at least part of the data in data. This data should be
concatenated to the output produced by any preceding calls to the
decompress() method. Some of the input data may be preserved in internal
buffers for later processing.
If the optional parameter max_length is supplied then the return value will be
no longer than max_length. This may mean that not all of the compressed input
can be processed; and unconsumed data will be stored in the attribute
unconsumed_tail. This bytestring must be passed to a subsequent call to
decompress() if decompression is to continue. If max_length is not
supplied then the whole input is decompressed, and unconsumed_tail is
empty.
All pending input is processed, and a bytes object containing the remaining
uncompressed output is returned. After calling flush(), the
decompress() method cannot be called again; the only realistic action is
to delete the object.
The optional parameter length sets the initial size of the output buffer.
Returns a copy of the decompression object. This can be used to save the state
of the decompressor midway through the data stream in order to speed up random
seeks into the stream at a future point.
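A sketch of incremental (de)compression with these objects (the chunk contents are arbitrary):
import zlib
comp = zlib.compressobj(6)
chunks = [b"first chunk ", b"second chunk ", b"third chunk"]
compressed = b"".join(comp.compress(c) for c in chunks) + comp.flush()
decomp = zlib.decompressobj()
restored = decomp.decompress(compressed) + decomp.flush()
assert restored == b"".join(chunks)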
This module provides a simple interface to compress and decompress files just
like the GNU programs gzip and gunzip would.
The data compression is provided by the zlib module.
The gzip module provides the GzipFile class. The GzipFile
class reads and writes gzip-format files, automatically compressing
or decompressing the data so that it looks like an ordinary file object.
Note that additional file formats which can be decompressed by the
gzip and gunzip programs, such as those produced by
compress and pack, are not supported by this module.
For other archive formats, see the bz2, zipfile, and
tarfile modules.
The module defines the following items:
class gzip.GzipFile(filename=None, mode=None, compresslevel=9, fileobj=None, mtime=None)
Constructor for the GzipFile class, which simulates most of the
methods of a file object, with the exception of the truncate()
method. At least one of fileobj and filename must be given a non-trivial
value.
The new class instance is based on fileobj, which can be a regular file, an
io.BytesIO object, or any other object which simulates a file. It
defaults to None, in which case filename is opened to provide a file
object.
When fileobj is not None, the filename argument is only used to be
included in the gzip file header, which may include the original
filename of the uncompressed file. It defaults to the filename of fileobj, if
discernible; otherwise, it defaults to the empty string, and in this case the
original filename is not included in the header.
The mode argument can be any of 'r', 'rb', 'a', 'ab', 'w',
or 'wb', depending on whether the file will be read or written. The default
is the mode of fileobj if discernible; otherwise, the default is 'rb'. If
not given, the ‘b’ flag will be added to the mode to ensure the file is opened
in binary mode for cross-platform portability.
The compresslevel argument is an integer from 1 to 9 controlling the
level of compression; 1 is fastest and produces the least compression, and
9 is slowest and produces the most compression. The default is 9.
The mtime argument is an optional numeric timestamp to be written to
the stream when compressing. All gzip compressed streams are
required to contain a timestamp. If omitted or None, the current
time is used. This module ignores the timestamp when decompressing;
however, some programs, such as gunzip, make use of it.
The format of the timestamp is the same as that of the return value of
time.time() and of the st_mtime attribute of the object returned
by os.stat().
Calling a GzipFile object’s close() method does not close
fileobj, since you might wish to append more material after the compressed
data. This also allows you to pass a io.BytesIO object opened for
writing as fileobj, and retrieve the resulting memory buffer using the
io.BytesIO object’s getvalue() method.
GzipFile supports the io.BufferedIOBase interface,
including iteration and the with statement. Only the
read1() and truncate() methods aren’t implemented.
Read n uncompressed bytes without advancing the file position.
At most one single read on the compressed stream is done to satisfy
the call. The number of bytes returned may be more or less than
requested.
New in version 3.2.
Changed in version 3.1: Support for the with statement was added.
Changed in version 3.2: Support for zero-padded files was added.
Changed in version 3.2: Support for unseekable files was added.
This is a shorthand for GzipFile(filename, mode, compresslevel).
The filename argument is required; mode defaults to 'rb' and
compresslevel defaults to 9.
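For example (the file name is hypothetical):
import gzip
# Write a compressed file, then read it back.
with gzip.open('example.txt.gz', 'wb') as f:
    f.write(b'Lots of content here\n')
with gzip.open('example.txt.gz', 'rb') as f:
    assert f.read() == b'Lots of content here\n'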
This module provides a comprehensive interface for the bz2 compression library.
It implements a complete file interface, one-shot (de)compression functions, and
types for sequential (de)compression.
Handling of compressed files is offered by the BZ2File class.
class bz2.BZ2File(filename, mode='r', buffering=0, compresslevel=9)
Open a bz2 file. Mode can be either 'r' or 'w', for reading (default)
or writing. When opened for writing, the file will be created if it doesn’t
exist, and truncated otherwise. If buffering is given, 0 means
unbuffered, and larger numbers specify the buffer size; the default is
0. If compresslevel is given, it must be a number between 1 and
9; the default is 9. Add a 'U' to mode to open the file for input
with universal newline support. Any line ending in the input file will be
seen as a '\n' in Python. Also, a file so opened gains the attribute
newlines; the value for this attribute is one of None (no newline
read yet), '\r', '\n', '\r\n' or a tuple containing all the
newline types seen. Universal newlines are available only when
reading. Instances support iteration in the same way as normal file
instances.
Close the file. Sets data attribute closed to true. A closed file
cannot be used for further I/O operations. close() may be called
more than once without error.
Return the next line from the file, as a byte string, retaining newline.
A non-negative size argument limits the maximum number of bytes to
return (an incomplete line may be returned then). Return an empty byte
string at EOF.
Move to new file position. Argument offset is a byte count. Optional
argument whence defaults to os.SEEK_SET or 0 (offset from start
of file; offset should be >=0); other values are os.SEEK_CUR or
1 (move relative to current position; offset can be positive or
negative), and os.SEEK_END or 2 (move relative to end of file;
offset is usually negative, although many platforms allow seeking beyond
the end of a file).
Note that seeking of bz2 files is emulated, and depending on the
parameters the operation may be extremely slow.
Write the sequence of byte strings to the file. Note that newlines are not
added. The sequence can be any iterable object producing byte strings.
This is equivalent to calling write() for each byte string.
Create a new compressor object. This object may be used to compress data
sequentially. If you want to compress data in one shot, use the
compress() function instead. The compresslevel parameter, if given,
must be a number between 1 and 9; the default is 9.
Provide more data to the compressor object. It will return chunks of
compressed data whenever possible. When you’ve finished providing data to
compress, call the flush() method to finish the compression process,
and return what is left in internal buffers.
Create a new decompressor object. This object may be used to decompress data
sequentially. If you want to decompress data in one shot, use the
decompress() function instead.
Provide more data to the decompressor object. It will return chunks of
decompressed data whenever possible. If you try to decompress data after
the end of stream is found, EOFError will be raised. If any data
was found after the end of stream, it’ll be ignored and saved in
unused_data attribute.
Compress data in one shot. If you want to compress data sequentially, use
an instance of BZ2Compressor instead. The compresslevel parameter,
if given, must be a number between 1 and 9; the default is 9.
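A one-shot round trip, mirroring the zlib sketch above (the sample data is arbitrary):
import bz2
data = b"repetitive data " * 100
packed = bz2.compress(data, 9)
assert bz2.decompress(packed) == data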
The ZIP file format is a common archive and compression standard. This module
provides tools to create, read, write, append, and list a ZIP file. Any
advanced use of this module will require an understanding of the format, as
defined in PKZIP Application Note.
This module does not currently handle multi-disk ZIP files.
It can handle ZIP files that use the ZIP64 extensions
(that is ZIP files that are more than 4 GByte in size). It supports
decryption of encrypted files in ZIP archives, but it currently cannot
create an encrypted file. Decryption is extremely slow as it is
implemented in native Python rather than C.
For other archive formats, see the bz2, gzip, and
tarfile modules.
The error raised when a ZIP file would require ZIP64 functionality but that has
not been enabled.
class zipfile.ZipFile
The class for reading and writing ZIP files. See section
ZipFile Objects for constructor details.
class zipfile.PyZipFile
Class for creating ZIP archives containing Python libraries.
class zipfile.ZipInfo(filename='NoName', date_time=(1980, 1, 1, 0, 0, 0))
Class used to represent information about a member of an archive. Instances
of this class are returned by the getinfo() and infolist()
methods of ZipFile objects. Most users of the zipfile module
will not need to create these, but only use those created by this
module. filename should be the full name of the archive member, and
date_time should be a tuple containing six fields which describe the time
of the last modification to the file; the fields are described in section
ZipInfo Objects.
class zipfile.ZipFile(file, mode='r', compression=ZIP_STORED, allowZip64=False)
Open a ZIP file, where file can be either a path to a file (a string) or a
file-like object. The mode parameter should be 'r' to read an existing
file, 'w' to truncate and write a new file, or 'a' to append to an
existing file. If mode is 'a' and file refers to an existing ZIP
file, then additional files are added to it. If file does not refer to a
ZIP file, then a new ZIP archive is appended to the file. This is meant for
adding a ZIP archive to another file (such as python.exe). If
mode is 'a' and the file does not exist at all, it is created.
compression is the ZIP compression method to use when writing the archive,
and should be ZIP_STORED or ZIP_DEFLATED; unrecognized
values will cause RuntimeError to be raised. If ZIP_DEFLATED
is specified but the zlib module is not available, RuntimeError
is also raised. The default is ZIP_STORED. If allowZip64 is
True zipfile will create ZIP files that use the ZIP64 extensions when
the zipfile is larger than 2 GB. If it is false (the default) zipfile
will raise an exception when the ZIP file would require ZIP64 extensions.
ZIP64 extensions are disabled by default because the default zip
and unzip commands on Unix (the InfoZIP utilities) don’t support
these extensions.
If the file is created with mode 'a' or 'w' and then
close()d without adding any files to the archive, the appropriate
ZIP structures for an empty archive will be written to the file.
ZipFile is also a context manager and therefore supports the
with statement. In the example, myzip is closed after the
with statement’s suite is finished—even if an exception occurs:
with ZipFile('spam.zip', 'w') as myzip:
myzip.write('eggs.txt')
New in version 3.2:
New in version 3.2: Added the ability to use ZipFile as a context manager.
Return a ZipInfo object with information about the archive member
name. Calling getinfo() for a name not currently contained in the
archive will raise a KeyError.
Return a list containing a ZipInfo object for each member of the
archive. The objects are in the same order as their entries in the actual ZIP
file on disk if an existing archive was opened.
Extract a member from the archive as a file-like object (ZipExtFile). name is
the name of the file in the archive, or a ZipInfo object. The mode
parameter, if included, must be one of the following: 'r' (the default),
'U', or 'rU'. Choosing 'U' or 'rU' will enable universal newline
support in the read-only object. pwd is the password used for encrypted files.
Calling open() on a closed ZipFile will raise a RuntimeError.
Note
The file-like object is read-only and provides the following methods:
read(), readline(), readlines(), __iter__(),
__next__().
Note
If the ZipFile was created by passing in a file-like object as the first
argument to the constructor, then the object returned by open() shares the
ZipFile’s file pointer. Under these circumstances, the object returned by
open() should not be used after any additional operations are performed
on the ZipFile object. If the ZipFile was created by passing in a string (the
filename) as the first argument to the constructor, then open() will
create a new file object that will be held by the ZipExtFile, allowing it to
operate independently of the ZipFile.
Note
The open(), read() and extract() methods can take a filename
or a ZipInfo object. You will appreciate this when trying to read a
ZIP file that contains members with duplicate names.
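As a sketch, reading one member line by line (archive and member names are
illustrative):
import zipfile

with zipfile.ZipFile('spam.zip') as zf:
    member = zf.open('eggs.txt')
    for line in member:
        print(line)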
Extract a member from the archive to the current working directory; member
must be its full name or a ZipInfo object. Its file information is
extracted as accurately as possible. path specifies a different directory
to extract to.
pwd is the password used for encrypted files.
Extract all members from the archive to the current working directory. path
specifies a different directory to extract to. members is optional and must
be a subset of the list returned by namelist(). pwd is the password
used for encrypted files.
Warning
Never extract archives from untrusted sources without prior inspection.
It is possible that files are created outside of path, e.g. members
that have absolute filenames starting with "/" or filenames with two
dots "..".
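A cautious sketch of such an inspection (archive name and target directory
are illustrative):
import zipfile

with zipfile.ZipFile('spam.zip') as zf:
    for name in zf.namelist():
        if name.startswith('/') or '..' in name:
            raise ValueError('unsafe member name: ' + name)
    zf.extractall('target_dir')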
Return the bytes of the file name in the archive. name is the name of the
file in the archive, or a ZipInfo object. The archive must be open for
read or append. pwd is the password used for encrypted files and, if specified,
it will override the default password set with setpassword(). Calling
read() on a closed ZipFile will raise a RuntimeError.
Read all the files in the archive and check their CRCs and file headers.
Return the name of the first bad file, or else return None. Calling
testzip() on a closed ZipFile will raise a RuntimeError.
Write the file named filename to the archive, giving it the archive name
arcname (by default, this will be the same as filename, but without a drive
letter and with leading path separators removed). If given, compress_type
overrides the value given for the compression parameter to the constructor for
the new entry. The archive must be open with mode 'w' or 'a' – calling
write() on a ZipFile created with mode 'r' will raise a
RuntimeError. Calling write() on a closed ZipFile will raise a
RuntimeError.
Note
There is no official file name encoding for ZIP files. If you have unicode file
names, you must convert them to byte strings in your desired encoding before
passing them to write(). WinZip interprets all file names as encoded in
CP437, also known as DOS Latin.
Note
Archive names should be relative to the archive root, that is, they should not
start with a path separator.
Note
If arcname (or filename, if arcname is not given) contains a null
byte, the name of the file in the archive will be truncated at the null byte.
Write the string bytes to the archive; zinfo_or_arcname is either the file
name it will be given in the archive, or a ZipInfo instance. If it’s
an instance, at least the filename, date, and time must be given. If it’s a
name, the date and time is set to the current date and time. The archive must be
opened with mode 'w' or 'a' – calling writestr() on a ZipFile
created with mode 'r' will raise a RuntimeError. Calling
writestr() on a closed ZipFile will raise a RuntimeError.
If given, compress_type overrides the value given for the compression
parameter to the constructor for the new entry, or in the zinfo_or_arcname
(if that is a ZipInfo instance).
Note
When passing a ZipInfo instance as the zinfo_or_arcname parameter,
the compression method used will be that specified in the compress_type
member of the given ZipInfo instance. By default, the
ZipInfo constructor sets this member to ZIP_STORED.
Changed in version 3.2: The compress_type argument.
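To illustrate both methods, a brief sketch (file and archive names are
illustrative):
import zipfile

with zipfile.ZipFile('spam.zip', 'w') as zf:
    # Store eggs.txt under a different name inside the archive.
    zf.write('eggs.txt', arcname='data/eggs.txt')
    # Add a member directly from a string, without a file on disk.
    zf.writestr('notes.txt', 'created on the fly')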
The level of debug output to use. This may be set from 0 (the default, no
output) to 3 (the most output). Debugging information is written to
sys.stdout.
The comment text associated with the ZIP file. If assigning a comment to a
ZipFile instance created with mode 'a' or 'w', this should be a
string no longer than 65535 bytes. Comments longer than this will be
truncated in the written archive when ZipFile.close() is called.
Search for files *.py and add the corresponding file to the
archive.
If the optimize parameter to PyZipFile was not given or -1,
the corresponding file is a *.pyo file if available, else a
*.pyc file, compiling if necessary.
If the optimize parameter to PyZipFile was 0, 1 or
2, only files with that optimization level (see compile()) are
added to the archive, compiling if necessary.
If the pathname is a file, the filename must end with .py, and
just the (corresponding *.py[co]) file is added at the top level
(no path information). If the pathname is a file that does not end with
.py, a RuntimeError will be raised. If it is a directory,
and the directory is not a package directory, then all the files
*.py[co] are added at the top level. If the directory is a
package directory, then all *.py[co] are added under the package
name as a file path, and if any subdirectories are package directories,
all of these are added recursively. basename is intended for internal
use only. The writepy() method makes archives with file names like
this:
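For instance, along these lines (module names are illustrative):
string.pyc                   # Top level name
test/__init__.pyc            # Package directory
test/testall.pyc             # Module test.testall
test/bogus/__init__.pyc      # Subpackage directory
test/bogus/myfile.pyc        # Submodule test.bogus.myfile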
Instances of the ZipInfo class are returned by the getinfo() and
infolist() methods of ZipFile objects. Each object stores
information about a single member of the ZIP archive.
The tarfile module makes it possible to read and write tar
archives, including those using gzip or bz2 compression.
(.zip files can be read and written using the zipfile module.)
Some facts and figures:
reads and writes gzip and bz2 compressed archives.
read/write support for the POSIX.1-1988 (ustar) format.
read/write support for the GNU tar format including longname and longlink
extensions, read-only support for all variants of the sparse extension
including restoration of sparse files.
read/write support for the POSIX.1-2001 (pax) format.
handles directories, regular files, hardlinks, symbolic links, fifos,
character devices and block devices and is able to acquire and restore file
information like timestamp, access permissions and owner.
Return a TarFile object for the pathname name. For detailed
information on TarFile objects and the keyword arguments that are
allowed, see TarFile Objects.
mode has to be a string of the form 'filemode[:compression]', it defaults
to 'r'. Here is a full list of mode combinations:
mode            action
'r' or 'r:*'    Open for reading with transparent compression (recommended).
'r:'            Open for reading exclusively without compression.
'r:gz'          Open for reading with gzip compression.
'r:bz2'         Open for reading with bzip2 compression.
'a' or 'a:'     Open for appending with no compression. The file is created if it does not exist.
'w' or 'w:'     Open for uncompressed writing.
'w:gz'          Open for gzip compressed writing.
'w:bz2'         Open for bzip2 compressed writing.
Note that 'a:gz' or 'a:bz2' is not possible. If mode is not suitable
to open a certain (compressed) file for reading, ReadError is raised. Use
mode 'r' to avoid this. If a compression method is not supported,
CompressionError is raised.
If fileobj is specified, it is used as an alternative to a file object
opened in binary mode for name. It is supposed to be at position 0.
For special purposes, there is a second format for mode:
'filemode|[compression]'. tarfile.open() will return a TarFile
object that processes its data as a stream of blocks. No random seeking will
be done on the file. If given, fileobj may be any object that has a
read() or write() method (depending on the mode). bufsize
specifies the blocksize and defaults to 20*512 bytes. Use this variant
in combination with e.g. sys.stdin, a socket file object or a tape
device. However, such a TarFile object is limited in that it does
not allow random access; see Examples. The currently
possible modes:
Mode    Action
'r|*'   Open a stream of tar blocks for reading with transparent compression.
'r|'    Open a stream of uncompressed tar blocks for reading.
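A stream-mode sketch that reads a tar stream from standard input without
seeking (it assumes the data arrives on sys.stdin.buffer):
import sys
import tarfile

tar = tarfile.open(fileobj=sys.stdin.buffer, mode='r|*')
for member in tar:            # iterating yields TarInfo objects
    print(member.name)
tar.close()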
The TarFile object provides an interface to a tar archive. A tar
archive is a sequence of blocks. An archive member (a stored file) is made up of
a header block followed by data blocks. It is possible to store a file in a tar
archive several times. Each archive member is represented by a TarInfo
object, see TarInfo Objects for details.
A TarFile object can be used as a context manager in a with
statement. It will automatically be closed when the block is completed. Please
note that in the event of an exception an archive opened for writing will not
be finalized; only the internally used file object will be closed. See the
Examples section for a use case.
New in version 3.2: Added support for the context manager protocol.
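A minimal sketch of the context manager protocol (it assumes a file foo
exists to be added):
import tarfile

with tarfile.open('sample.tar', 'w') as tar:
    tar.add('foo')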
All following arguments are optional and can be accessed as instance attributes
as well.
name is the pathname of the archive. It can be omitted if fileobj is given.
In this case, the file object’s name attribute is used if it exists.
mode is either 'r' to read from an existing archive, 'a' to append
data to an existing file or 'w' to create a new file overwriting an existing
one.
If fileobj is given, it is used for reading or writing data. If it can be
determined, mode is overridden by fileobj's mode. fileobj will be used
from position 0.
format controls the archive format. It must be one of the constants
USTAR_FORMAT, GNU_FORMAT or PAX_FORMAT that are
defined at module level.
The tarinfo argument can be used to replace the default TarInfo class
with a different one.
If dereference is False, add symbolic and hard links to the archive. If it
is True, add the content of the target files to the archive. This has no
effect on systems that do not support symbolic links.
If ignore_zeros is False, treat an empty block as the end of the archive.
If it is True, skip empty (and invalid) blocks and try to get as many members
as possible. This is only useful for reading concatenated or damaged archives.
debug can be set from 0 (no debug messages) up to 3 (all debug
messages). The messages are written to sys.stderr.
If errorlevel is 0, all errors are ignored when using TarFile.extract().
Nevertheless, they appear as error messages in the debug output, when debugging
is enabled. If 1, all fatal errors are raised as OSError or
IOError exceptions. If 2, all non-fatal errors are raised as
TarError exceptions as well.
The encoding and errors arguments define the character encoding to be
used for reading or writing the archive and how conversion errors are going
to be handled. The default settings will work for most users.
See section Unicode issues for in-depth information.
Changed in version 3.2: Use 'surrogateescape' as the default for the errors argument.
The pax_headers argument is an optional dictionary of strings which
will be added as a pax global header if format is PAX_FORMAT.
Print a table of contents to sys.stdout. If verbose is False,
only the names of the members are printed. If it is True, output
similar to that of ls -l is produced.
Extract all members from the archive to the current working directory or
directory path. If optional members is given, it must be a subset of the
list returned by getmembers(). Directory information like owner,
modification time and permissions are set after all members have been extracted.
This is done to work around two problems: A directory’s modification time is
reset each time a file is created in it. And, if a directory’s permissions do
not allow writing, extracting files to it will fail.
Warning
Never extract archives from untrusted sources without prior inspection.
It is possible that files are created outside of path, e.g. members
that have absolute filenames starting with "/" or filenames with two
dots "..".
Extract a member from the archive to the current working directory, using its
full name. Its file information is extracted as accurately as possible. member
may be a filename or a TarInfo object. You can specify a different
directory using path. File attributes (owner, mtime, mode) are set unless
set_attrs is False.
Note
The extract() method does not take care of several extraction issues.
In most cases you should consider using the extractall() method.
Extract a member from the archive as a file object. member may be a filename
or a TarInfo object. If member is a regular file, a file-like
object is returned. If member is a link, a file-like object is constructed from
the link’s target. If member is none of the above, None is returned.
Note
The file-like object is read-only. It provides the methods
read(), readline(), readlines(), seek(), tell(),
and close(), and also supports iteration over its lines.
Add the file name to the archive. name may be any type of file
(directory, fifo, symbolic link, etc.). If given, arcname specifies an
alternative name for the file in the archive. Directories are added
recursively by default. This can be avoided by setting recursive to
False. If exclude is given, it must be a function that takes one
filename argument and returns a boolean value. Depending on this value the
respective file is either excluded (True) or added
(False). If filter is specified it must be a keyword argument. It
should be a function that takes a TarInfo object argument and
returns the changed TarInfo object. If it instead returns
None the TarInfo object will be excluded from the
archive. See Examples for an example.
Changed in version 3.2: Added the filter parameter.
Deprecated since version 3.2: The exclude parameter is deprecated, please use the filter parameter
instead.
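A sketch of the filter parameter that resets ownership metadata on each
member before it is stored (file names are illustrative):
import tarfile

def reset(tarinfo):
    tarinfo.uid = tarinfo.gid = 0
    tarinfo.uname = tarinfo.gname = 'root'
    return tarinfo

with tarfile.open('sample.tar', 'w') as tar:
    tar.add('foo', filter=reset)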
Add the TarInfo object tarinfo to the archive. If fileobj is given,
tarinfo.size bytes are read from it and added to the archive. You can
create TarInfo objects using gettarinfo().
Note
On Windows platforms, fileobj should always be opened with mode 'rb' to
avoid irritation about the file size.
Create a TarInfo object for either the file name or the file
object fileobj (using os.fstat() on its file descriptor). You can modify
some of the TarInfo's attributes before you add it using addfile().
If given, arcname specifies an alternative name for the file in the archive.
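A sketch combining the two methods (file names are illustrative):
import tarfile

with tarfile.open('sample.tar', 'w') as tar:
    with open('foo', 'rb') as f:
        info = tar.gettarinfo(fileobj=f, arcname='bar')
        tar.addfile(info, f)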
A TarInfo object represents one member in a TarFile. Aside
from storing all required attributes of a file (like file type, size, time,
permissions, owner etc.), it provides some useful methods to determine its type.
It does not contain the file’s data itself.
TarInfo objects are returned by TarFile's methods
getmember(), getmembers() and gettarinfo().
File type. type is usually one of these constants: REGTYPE,
AREGTYPE, LNKTYPE, SYMTYPE, DIRTYPE,
FIFOTYPE, CONTTYPE, CHRTYPE, BLKTYPE,
GNUTYPE_SPARSE. To determine the type of a TarInfo object
more conveniently, use the is_*() methods below.
How to extract an entire tar archive to the current working directory:
import tarfile
tar = tarfile.open("sample.tar.gz")
tar.extractall()
tar.close()
How to extract a subset of a tar archive with TarFile.extractall() using
a generator function instead of a list:
import os
import tarfile
def py_files(members):
    for tarinfo in members:
        if os.path.splitext(tarinfo.name)[1] == ".py":
            yield tarinfo
tar = tarfile.open("sample.tar.gz")
tar.extractall(members=py_files(tar))
tar.close()
How to create an uncompressed tar archive from a list of filenames:
import tarfile
tar = tarfile.open("sample.tar", "w")
for name in ["foo", "bar", "quux"]:
    tar.add(name)
tar.close()
There are three tar formats that can be created with the tarfile module:
The POSIX.1-1988 ustar format (USTAR_FORMAT). It supports filenames
up to a length of at best 256 characters and linknames up to 100 characters. The
maximum file size is 8 gigabytes. This is an old and limited but widely
supported format.
The GNU tar format (GNU_FORMAT). It supports long filenames and
linknames, files bigger than 8 gigabytes and sparse files. It is the de facto
standard on GNU/Linux systems. tarfile fully supports the GNU tar
extensions for long names; sparse file support is read-only.
The POSIX.1-2001 pax format (PAX_FORMAT). It is the most flexible
format with virtually no limits. It supports long filenames and linknames, large
files and stores pathnames in a portable way. However, not all tar
implementations today are able to handle pax archives properly.
The pax format is an extension to the existing ustar format. It uses extra
headers for information that cannot be stored otherwise. There are two flavours
of pax headers: Extended headers only affect the subsequent file header, global
headers are valid for the complete archive and affect all following files. All
the data in a pax header is encoded in UTF-8 for portability reasons.
There are some more variants of the tar format which can be read, but not
created:
The ancient V7 format. This is the first tar format from Unix Seventh Edition,
storing only regular files and directories. Names must not be longer than 100
characters, there is no user/group name information. Some archives have
miscalculated header checksums in case of fields with non-ASCII characters.
The SunOS tar extended format. This format is a variant of the POSIX.1-2001
pax format, but is not compatible.
The tar format was originally conceived to make backups on tape drives with the
main focus on preserving file system information. Nowadays tar archives are
commonly used for file distribution and exchanging archives over networks. One
problem of the original format (which is the basis of all other formats) is
that there is no concept of supporting different character encodings. For
example, an ordinary tar archive created on a UTF-8 system cannot be read
correctly on a Latin-1 system if it contains non-ASCII characters. Textual
metadata (like filenames, linknames, user/group names) will appear damaged.
Unfortunately, there is no way to autodetect the encoding of an archive. The
pax format was designed to solve this problem. It stores non-ASCII metadata
using the universal character encoding UTF-8.
The details of character conversion in tarfile are controlled by the
encoding and errors keyword arguments of the TarFile class.
encoding defines the character encoding to use for the metadata in the
archive. The default value is sys.getfilesystemencoding() or 'ascii'
as a fallback. Depending on whether the archive is read or written, the
metadata must be either decoded or encoded. If encoding is not set
appropriately, this conversion may fail.
In case of PAX_FORMAT archives, encoding is generally not needed
because all the metadata is stored using UTF-8. encoding is only used in
the rare cases when binary pax headers are decoded or when strings with
surrogate characters are stored.
The so-called CSV (Comma Separated Values) format is the most common import and
export format for spreadsheets and databases. There is no “CSV standard”, so
the format is operationally defined by the many applications which read and
write it. The lack of a standard means that subtle differences often exist in
the data produced and consumed by different applications. These differences can
make it annoying to process CSV files from multiple sources. Still, while the
delimiters and quoting characters vary, the overall format is similar enough
that it is possible to write a single module which can efficiently manipulate
such data, hiding the details of reading and writing the data from the
programmer.
The csv module implements classes to read and write tabular data in CSV
format. It allows programmers to say, “write this data in the format preferred
by Excel,” or “read data from this file which was generated by Excel,” without
knowing the precise details of the CSV format used by Excel. Programmers can
also describe the CSV formats understood by other applications or define their
own special-purpose CSV formats.
The csv module’s reader and writer objects read and
write sequences. Programmers can also read and write data in dictionary form
using the DictReader and DictWriter classes.
Return a reader object which will iterate over lines in the given csvfile.
csvfile can be any object which supports the iterator protocol and returns a
string each time its __next__() method is called — file objects and list objects are both suitable. If csvfile is a file object,
it should be opened with newline=''. [1] An optional
dialect parameter can be given which is used to define a set of parameters
specific to a particular CSV dialect. It may be an instance of a subclass of
the Dialect class or one of the strings returned by the
list_dialects() function. The other optional fmtparams keyword arguments
can be given to override individual formatting parameters in the current
dialect. For full details about the dialect and formatting parameters, see
section Dialects and Formatting Parameters.
Each row read from the csv file is returned as a list of strings. No
automatic data type conversion is performed unless the QUOTE_NONNUMERIC format
option is specified (in which case unquoted fields are transformed into floats).
Return a writer object responsible for converting the user’s data into delimited
strings on the given file-like object. csvfile can be any object with a
write() method. If csvfile is a file object, it should be opened with
newline=''. [1] An optional dialect
parameter can be given which is used to define a set of parameters specific to a
particular CSV dialect. It may be an instance of a subclass of the
Dialect class or one of the strings returned by the
list_dialects() function. The other optional fmtparams keyword arguments
can be given to override individual formatting parameters in the current
dialect. For full details about the dialect and formatting parameters, see
section Dialects and Formatting Parameters. To make it
as easy as possible to interface with modules which implement the DB API, the
value None is written as the empty string. While this isn’t a
reversible transformation, it makes it easier to dump SQL NULL data values to
CSV files without preprocessing the data returned from a cursor.fetch* call.
All other non-string data are stringified with str() before being written.
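For instance, a sketch in which the rows come from a DB API cursor and the
None values end up as empty fields (file name and data are illustrative):
import csv

rows = [(1, 'alice', None), (2, 'bob', 3.5)]   # e.g. from cursor.fetchall()
with open('out.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(rows)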
Associate dialect with name. name must be a string. The
dialect can be specified either by passing a sub-class of Dialect, or
by fmtparams keyword arguments, or both, with keyword arguments overriding
parameters of the dialect. For full details about the dialect and formatting
parameters, see section Dialects and Formatting Parameters.
class csv.DictReader(csvfile, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)
Create an object which operates like a regular reader but maps the information
read into a dict whose keys are given by the optional fieldnames parameter.
If the fieldnames parameter is omitted, the values in the first row of the
csvfile will be used as the fieldnames. If the row read has more fields
than the fieldnames sequence, the remaining data is added as a sequence
keyed by the value of restkey. If the row read has fewer fields than the
fieldnames sequence, the remaining keys take the value of the optional
restval parameter. Any other optional or keyword arguments are passed to
the underlying reader instance.
class csv.DictWriter(csvfile, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds)
Create an object which operates like a regular writer but maps dictionaries onto
output rows. The fieldnames parameter identifies the order in which values in
the dictionary passed to the writerow() method are written to the
csvfile. The optional restval parameter specifies the value to be written
if the dictionary is missing a key in fieldnames. If the dictionary passed to
the writerow() method contains a key not found in fieldnames, the
optional extrasaction parameter indicates what action to take. If it is set
to 'raise' a ValueError is raised. If it is set to 'ignore',
extra values in the dictionary are ignored. Any other optional or keyword
arguments are passed to the underlying writer instance.
Note that unlike the DictReader class, the fieldnames parameter of
the DictWriter is not optional. Since Python’s dict objects
are not ordered, there is not enough information available to deduce the order
in which the row should be written to the csvfile.
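A short sketch pairing the two classes (field names and file name are
illustrative):
import csv

fieldnames = ['first', 'last']
with open('names.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writerow({'first': 'John', 'last': 'Cleese'})

with open('names.csv', newline='') as f:
    for row in csv.DictReader(f, fieldnames=fieldnames):
        print(row['first'], row['last'])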
The Dialect class is a container class relied on primarily for its
attributes, which are used to define the parameters for a specific
reader or writer instance.
The unix_dialect class defines the usual properties of a CSV file
generated on UNIX systems, i.e. using '\n' as line terminator and quoting
all fields. It is registered with the dialect name 'unix'.
Analyze the given sample and return a Dialect subclass
reflecting the parameters found. If the optional delimiters parameter
is given, it is interpreted as a string containing possible valid
delimiter characters.
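A sketch of sniffing a file's dialect before reading it (the file name is
illustrative):
import csv

with open('example.csv', newline='') as f:
    dialect = csv.Sniffer().sniff(f.read(1024))
    f.seek(0)
    reader = csv.reader(f, dialect)
    for row in reader:
        print(row)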
Instructs writer objects to only quote those fields which contain
special characters such as delimiter, quotechar or any of the characters in
lineterminator.
Instructs writer objects to never quote fields. When the current
delimiter occurs in output data it is preceded by the current escapechar
character. If escapechar is not set, the writer will raise Error if
any characters that require escaping are encountered.
Instructs reader to perform no special processing of quote characters.
To make it easier to specify the format of input and output records, specific
formatting parameters are grouped together into dialects. A dialect is a
subclass of the Dialect class having a set of specific methods and a
single validate() method. When creating reader or
writer objects, the programmer can specify a string or a subclass of
the Dialect class as the dialect parameter. In addition to, or instead
of, the dialect parameter, the programmer can also specify individual
formatting parameters, which have the same names as the attributes defined below
for the Dialect class.
Controls how instances of quotechar appearing inside a field should
themselves be quoted. When True, the character is doubled. When
False, the escapechar is used as a prefix to the quotechar. It
defaults to True.
On output, if doublequote is False and no escapechar is set,
Error is raised if a quotechar is found in a field.
A one-character string used by the writer to escape the delimiter if quoting
is set to QUOTE_NONE and the quotechar if doublequote is
False. On reading, the escapechar removes any special meaning from
the following character. It defaults to None, which disables escaping.
A one-character string used to quote fields containing special characters, such
as the delimiter or quotechar, or which contain new-line characters. It
defaults to '"'.
Controls when quotes should be generated by the writer and recognised by the
reader. It can take on any of the QUOTE_* constants (see section
Module Contents) and defaults to QUOTE_MINIMAL.
Writer objects (DictWriter instances and objects returned by
the writer() function) have the following public methods. A row must be
a sequence of strings or numbers for Writer objects and a dictionary
mapping fieldnames to strings or numbers (by passing them through str()
first) for DictWriter objects. Note that complex numbers are written
out surrounded by parens. This may cause some problems for other programs which
read CSV files (assuming they support complex numbers at all).
The simplest example of reading a CSV file:
import csv
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
Reading a file with an alternate format:
import csv
with open('passwd', newline='') as f:
    reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE)
    for row in reader:
        print(row)
The corresponding simplest possible writing example is:
import csv
with open('some.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(someiterable)
Since open() is used to open a CSV file for reading, the file
will by default be decoded into unicode using the system default
encoding (see locale.getpreferredencoding()). To decode a file
using a different encoding, use the encoding argument of open:
import csv
with open('some.csv', newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
The same applies to writing in something other than the system default
encoding: specify the encoding argument when opening the output file.
Registering a new dialect:
import csv
csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE)
with open('passwd', newline='') as f:
    reader = csv.reader(f, 'unixpwd')
A slightly more advanced use of the reader — catching and reporting errors:
import csv, sys
filename = 'some.csv'
with open(filename, newline='') as f:
    reader = csv.reader(f)
    try:
        for row in reader:
            print(row)
    except csv.Error as e:
        sys.exit('file {}, line {}: {}'.format(filename, reader.line_num, e))
And while the module doesn’t directly support parsing strings, it can easily be
done:
import csv
for row in csv.reader(['one,two,three']):
    print(row)
Footnotes
[1]
If newline='' is not specified, newlines embedded inside quoted fields
will not be interpreted correctly, and on platforms that use \r\n line endings
on write an extra \r will be added. It should always be safe to specify
newline='', since the csv module does its own (universal) newline handling.
This module provides the ConfigParser class which implements a basic
configuration language which provides a structure similar to what’s found in
Microsoft Windows INI files. You can use this to write Python programs which
can be customized by end users easily.
Note
This library does not interpret or write the value-type prefixes used in
the Windows Registry extended version of INI syntax.
The structure of INI files is described in the following section. Essentially, the file
consists of sections, each of which contains keys with values.
configparser classes can read and write such files. Let's start by
creating a configuration file like the one shown below programmatically.
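A minimal example file in this format (all names and values here are
illustrative):

[DEFAULT]
ServerAliveInterval = 45
Compression = yes
CompressionLevel = 9
ForwardX11 = yes

[bitbucket.org]
User = hg

[topsecret.server.com]
Port = 50022
ForwardX11 = no

A sketch of building the same file with the dictionary-style API:

import configparser

config = configparser.ConfigParser()
config['DEFAULT'] = {'ServerAliveInterval': '45',
                     'Compression': 'yes',
                     'CompressionLevel': '9'}
config['bitbucket.org'] = {}
config['bitbucket.org']['User'] = 'hg'
config['topsecret.server.com'] = {}
topsecret = config['topsecret.server.com']
topsecret['Port'] = '50022'
topsecret['ForwardX11'] = 'no'
config['DEFAULT']['ForwardX11'] = 'yes'
with open('example.ini', 'w') as configfile:
    config.write(configfile)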
As you can see, we can treat a config parser much like a dictionary.
There are differences, outlined later, but
the behavior is very close to what you would expect from a dictionary.
Now that we have created and saved a configuration file, let’s read it
back and explore the data it holds.
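A sketch, continuing with the illustrative example.ini:
>>> import configparser
>>> config = configparser.ConfigParser()
>>> config.read('example.ini')
['example.ini']
>>> config.sections()
['bitbucket.org', 'topsecret.server.com']
>>> 'bitbucket.org' in config
True
>>> config['bitbucket.org']['User']
'hg'
>>> topsecret = config['topsecret.server.com']
>>> topsecret['ForwardX11']
'no'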
As we can see above, the API is pretty straightforward. The only bit of magic
involves the DEFAULT section which provides default values for all other
sections [1]. Note also that keys in sections are
case-insensitive and stored in lowercase [1].
Config parsers do not guess datatypes of values in configuration files, always
storing them internally as strings. This means that if you need other
datatypes, you should convert on your own:
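Continuing the illustrative example:
>>> int(topsecret['Port'])
50022
>>> float(topsecret['CompressionLevel'])
9.0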
Extracting Boolean values is not that simple, though. Passing the value
to bool() would do no good since bool('False') is still
True. This is why config parsers also provide getboolean().
This method is case-insensitive and recognizes Boolean values from
'yes'/'no', 'on'/'off' and '1'/'0' [1].
For example:
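Continuing the illustrative example:
>>> topsecret.getboolean('ForwardX11')
False
>>> config['bitbucket.org'].getboolean('ForwardX11')
True
>>> config.getboolean('bitbucket.org', 'Compression')
True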
Apart from getboolean(), config parsers also provide equivalent
getint() and getfloat() methods, but these are far less
useful since conversion using int() and float() is
sufficient for these types.
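Fallback values can be supplied through the section-level get() method,
much as with a regular dictionary; a sketch continuing the illustrative
example ('Cipher' is not present in the file, so the fallback is returned):
>>> topsecret.get('Port')
'50022'
>>> topsecret.get('Cipher', '3des-cbc')
'3des-cbc'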
Please note that default values have precedence over fallback values.
For instance, in our example the 'CompressionLevel' key was
specified only in the 'DEFAULT' section. If we try to get it from
the section 'topsecret.server.com', we will always get the default,
even if we specify a fallback:
>>> topsecret.get('CompressionLevel', '3')
'9'
One more thing to be aware of is that the parser-level get() method
provides a custom, more complex interface, maintained for backwards
compatibility. When using this method, a fallback value can be provided via
the fallback keyword-only argument:
>>> config.get('bitbucket.org', 'monster',
... fallback='No such things as monsters')
'No such things as monsters'
The same fallback argument can be used with the getint(),
getfloat() and getboolean() methods, for example:
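Continuing the illustrative example, where 'BatchMode' is absent from the
file, so the fallback is used:
>>> topsecret.getboolean('BatchMode', fallback=True)
True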
A configuration file consists of sections, each led by a [section] header,
followed by key/value entries separated by a specific string (= or : by
default [1]). By default, section names are case sensitive but keys are not
[1]. Leading and trailing whitespace is removed from keys and values.
Values can be omitted, in which case the key/value delimiter may also be left
out. Values can also span multiple lines, as long as they are indented deeper
than the first line of the value. Depending on the parser’s mode, blank lines
may be treated as parts of multiline values or ignored.
Configuration files may include comments, prefixed by specific
characters (# and ; by default [1]). Comments may appear on
their own on an otherwise empty line, possibly indented. [1]
For example:
[Simple Values]
key=value
spaces in keys=allowed
spaces in values=allowed as well
spaces around the delimiter = obviously
you can also use : to delimit keys from values
[All Values Are Strings]
values like this: 1000000
or this: 3.14159265359
are they treated as numbers? : no
integers, floats and booleans are held as: strings
can use the API to get converted values directly: true
[Multiline Values]
chorus: I'm a lumberjack, and I'm okay
I sleep all night and I work all day
[No Values]
key_without_value
empty string value here =
[You can use comments]
# like this
; or this
# By default only in an empty line.
# Inline comments can be harmful because they prevent users
# from using the delimiting characters as parts of values.
# That being said, this can be customized.
[Sections Can Be Indented]
can_values_be_as_well = True
does_that_mean_anything_special = False
purpose = formatting for readability
multiline_values = are
handled just fine as
long as they are indented
deeper than the first line
of a value
# Did I mention we can indent comments, too?
The default implementation used by ConfigParser. It enables
values to contain format strings which refer to other values in the same
section, or values in the special default section [1]. Additional default
values can be provided on initialization.
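For illustration, consider a file that uses basic interpolation (all paths
here are illustrative):

[Paths]
home_dir: /Users
my_dir: %(home_dir)s/lumberjack
my_pictures: %(my_dir)s/Pictures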
In the example above, ConfigParser with interpolation set to
BasicInterpolation() would resolve %(home_dir)s to the value of
home_dir (/Users in this case). %(my_dir)s in effect would
resolve to /Users/lumberjack. All interpolations are done on demand so
keys used in the chain of references do not have to be specified in any
specific order in the configuration file.
With interpolation set to None, the parser would simply return
%(my_dir)s/Pictures as the value of my_pictures and
%(home_dir)s/lumberjack as the value of my_dir.
An alternative handler for interpolation which implements a more advanced
syntax, used for instance in zc.buildout. Extended interpolation is
using ${section:option} to denote a value from a foreign section.
Interpolation can span multiple levels. For convenience, if the section:
part is omitted, interpolation defaults to the current section (and possibly
the default values from the special section).
For example, the configuration specified above with basic interpolation,
would look like this with extended interpolation:
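[Paths]
home_dir: /Users
my_dir: ${home_dir}/lumberjack
my_pictures: ${my_dir}/Pictures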
Mapping protocol access is a generic name for functionality that enables using
custom objects as if they were dictionaries. In case of configparser,
the mapping interface implementation is using the
parser['section']['option'] notation.
parser['section'] in particular returns a proxy for the section’s data in
the parser. This means that the values are not copied but they are taken from
the original parser on demand. What’s even more important is that when values
are changed on a section proxy, they are actually mutated in the original
parser.
configparser objects behave as close to actual dictionaries as possible.
The mapping interface is complete and adheres to the MutableMapping ABC.
However, there are a few differences that should be taken into account:
By default, all keys in sections are accessible in a case-insensitive manner
[1]. E.g. for option in parser["section"] yields only optionxform'ed
option key names. This means lowercased keys by default. At the same time,
for a section that holds the key 'a', both expressions return True:
"a" in parser["section"]
"A" in parser["section"]
All sections include DEFAULTSECT values as well which means that
.clear() on a section may not leave the section visibly empty. This is
because default values cannot be deleted from the section (because technically
they are not there). If they are overridden in the section, deleting causes
the default value to be visible again. Trying to delete a default value
causes a KeyError.
Trying to delete the DEFAULTSECT raises ValueError.
parser.get(section, option, **kwargs) - the second argument is not
a fallback value. Note however that the section-level get() methods are
compatible both with the mapping protocol and the classic configparser API.
parser.items() is compatible with the mapping protocol (returns a list of
section_name, section_proxy pairs including the DEFAULTSECT). However,
this method can also be invoked with arguments: parser.items(section, raw, vars).
The latter call returns a list of option, value pairs for
a specified section, with all interpolations expanded (unless
raw=True is provided).
The mapping protocol is implemented on top of the existing legacy API so that
subclasses overriding the original interface still should have mappings working
as expected.
There are nearly as many INI format variants as there are applications using it.
configparser goes a long way to provide support for the largest sensible
set of INI styles available. The default functionality is mainly dictated by
historical background and it’s very likely that you will want to customize some
of the features.
The most common way to change the way a specific config parser works is to use
the __init__() options:
defaults, default value: None
This option accepts a dictionary of key-value pairs which will be initially
put in the DEFAULT section. This makes for an elegant way to support
concise configuration files that don’t specify values which are the same as
the documented default.
Hint: if you want to specify default values for a specific section, use
read_dict() before you read the actual file.
dict_type, default value: collections.OrderedDict
This option has a major impact on how the mapping protocol will behave and how
the written configuration files look. With the default ordered
dictionary, every section is stored in the order they were added to the
parser. Same goes for options within sections.
An alternative dictionary type can be used for example to sort sections and
options on write-back. You can also use a regular dictionary for performance
reasons.
Please note: there are ways to add a set of key-value pairs in a single
operation. When you use a regular dictionary in those operations, the order
of the keys may be random. For example:
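The following sketch assumes an interpreter whose plain dictionaries happen
to preserve insertion order; on older Pythons the section and key order
below may differ, which is exactly the caveat being described:
>>> parser = configparser.ConfigParser()
>>> parser.read_dict({'section1': {'key1': 'value1',
...                                'key2': 'value2',
...                                'key3': 'value3'},
...                   'section2': {'keyA': 'valueA',
...                                'keyB': 'valueB',
...                                'keyC': 'valueC'}})
>>> parser.sections()
['section1', 'section2']
>>> [option for option in parser['section1']]
['key1', 'key2', 'key3']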
allow_no_value, default value: False
Some configuration files are known to include settings without values, but
which otherwise conform to the syntax supported by configparser. The
allow_no_value parameter to the constructor can be used to
indicate that such values should be accepted:
>>> import configparser
>>> sample_config = """
... [mysqld]
... user = mysql
... pid-file = /var/run/mysqld/mysqld.pid
... skip-external-locking
... old_passwords = 1
... skip-bdb
... # we don't need ACID today
... skip-innodb
... """
>>> config = configparser.ConfigParser(allow_no_value=True)
>>> config.read_string(sample_config)
>>> # Settings with values are treated as before:
>>> config["mysqld"]["user"]
'mysql'
>>> # Settings without values provide None:
>>> config["mysqld"]["skip-bdb"]
>>> # Settings which aren't specified still raise an error:
>>> config["mysqld"]["does-not-exist"]
Traceback (most recent call last):
...
KeyError: 'does-not-exist'
delimiters, default value: ('=',':')
Delimiters are substrings that delimit keys from values within a section. The
first occurrence of a delimiting substring on a line is considered a delimiter.
This means values (but not keys) can contain the delimiters.
comment_prefixes, default value: ('#', ';'); inline_comment_prefixes, default value: None
Comment prefixes are strings that indicate the start of a valid comment within
a config file. comment_prefixes are used only on otherwise empty lines
(optionally indented) whereas inline_comment_prefixes can be used after
every valid value (e.g. section names, options and empty lines as well). By
default inline comments are disabled and '#' and ';' are used as
prefixes for whole line comments.
Changed in version 3.2: In previous versions of configparser behaviour matched
comment_prefixes=('#',';') and inline_comment_prefixes=(';',).
Please note that config parsers don’t support escaping of comment prefixes so
using inline_comment_prefixes may prevent users from specifying option
values with characters used as comment prefixes. When in doubt, avoid setting
inline_comment_prefixes. In any circumstances, the only way of storing
comment prefix characters at the beginning of a line in multiline values is to
interpolate the prefix, for example:
>>> from configparser import ConfigParser, ExtendedInterpolation
>>> parser = ConfigParser(interpolation=ExtendedInterpolation())
>>> # the default BasicInterpolation could be used as well
>>> parser.read_string("""
... [DEFAULT]
... hash = #
...
... [hashes]
... shebang =
... ${hash}!/usr/bin/env python
... ${hash} -*- coding: utf-8 -*-
...
... extensions =
... enabled_extension
... another_extension
... #disabled_by_comment
... yet_another_extension
...
... interpolation not necessary = if # is not at line start
... even in multiline values = line #1
... line #2
... line #3
... """)
>>> print(parser['hashes']['shebang'])
#!/usr/bin/env python
# -*- coding: utf-8 -*-
>>> print(parser['hashes']['extensions'])
enabled_extension
another_extension
yet_another_extension
>>> print(parser['hashes']['interpolation not necessary'])
if # is not at line start
>>> print(parser['hashes']['even in multiline values'])
line #1
line #2
line #3
strict, default value: True
When set to True, the parser will not allow for any section or option
duplicates while reading from a single source (using read_file(),
read_string() or read_dict()). It is recommended to use strict
parsers in new applications.
Changed in version 3.2: In previous versions of configparser behaviour matched
strict=False.
empty_lines_in_values, default value: True
In config parsers, values can span multiple lines as long as they are
indented more than the key that holds them. By default parsers also allow
empty lines to be part of values. At the same time, keys can be arbitrarily
indented themselves to improve readability. In consequence, when
configuration files get big and complex, it is easy for the user to lose
track of the file structure. Take for instance:
[Section]
key = multiline
  value with a gotcha

 this = is still a part of the multiline value of 'key'
This can be especially problematic for the user to see if she’s using a
proportional font to edit the file. That is why when your application does
not need values with empty lines, you should consider disallowing them. This
will make empty lines split keys every time. In the example above, it would
produce two keys, key and this.
default_section, default value: configparser.DEFAULTSECT (that is:
"DEFAULT")
The convention of allowing a special section of default values for other
sections or interpolation purposes is a powerful concept of this library,
letting users create complex declarative configurations. This section is
normally called "DEFAULT" but this can be customized to point to any
other valid section name. Some typical values include: "general" or
"common". The name provided is used for recognizing default sections when
reading from any source and is used when writing configuration back to
a file. Its current value can be retrieved using the
parser_instance.default_section attribute and may be modified at runtime
(i.e. to convert files from one format to another).
interpolation, default value: configparser.BasicInterpolation
Interpolation behaviour may be customized by providing a custom handler
through the interpolation argument. None can be used to turn off
interpolation completely, ExtendedInterpolation() provides a more
advanced variant inspired by zc.buildout. More on the subject in the
dedicated documentation section.
RawConfigParser has a default value of None.
More advanced customization may be achieved by overriding default values of
these parser attributes. The defaults are defined on the classes, so they
may be overridden by subclasses or by attribute assignment.
By default when using getboolean(), config parsers consider the
following values True: '1', 'yes', 'true', 'on' and the
following values False: '0', 'no', 'false', 'off'. You
can override this by specifying a custom dictionary of strings and their
Boolean outcomes. For example:
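A minimal sketch (section and option names are illustrative):
>>> custom = configparser.ConfigParser()
>>> custom['section1'] = {'funky': 'nope'}
>>> custom['section1'].getboolean('funky')
Traceback (most recent call last):
...
ValueError: Not a boolean: nope
>>> custom.BOOLEAN_STATES = {'sure': True, 'nope': False}
>>> custom['section1'].getboolean('funky')
False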
This method transforms option names on every read, get, or set
operation. The default converts the name to lowercase. This also
means that when a configuration file gets written, all keys will be
lowercase. Override this method if that’s unsuitable.
For example:
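A minimal sketch (section and key names are illustrative):
>>> sample = """
... [Section1]
... Key = Value
... """
>>> typical = configparser.ConfigParser()
>>> typical.read_string(sample)
>>> list(typical['Section1'].keys())
['key']
>>> custom = configparser.RawConfigParser()
>>> custom.optionxform = lambda option: option
>>> custom.read_string(sample)
>>> list(custom['Section1'].keys())
['Key']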
A compiled regular expression used to parse section headers. The default
matches [section] to the name "section". Whitespace is considered part
of the section name, thus [larch] will be read as a section of name
"larch". Override this attribute if that’s unsuitable. For example:
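A minimal sketch that strips the whitespace instead (section names are
illustrative):
>>> import re
>>> sample = """
... [Section 1]
... option = value
...
... [  Section 2  ]
... another = val
... """
>>> typical = configparser.ConfigParser()
>>> typical.read_string(sample)
>>> typical.sections()
['Section 1', '  Section 2  ']
>>> custom = configparser.ConfigParser()
>>> custom.SECTCRE = re.compile(r"\[ *(?P<header>[^]]+?) *\]")
>>> custom.read_string(sample)
>>> custom.sections()
['Section 1', 'Section 2']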
While ConfigParser objects also use an OPTCRE attribute for recognizing
option lines, it’s not recommended to override it because that would
interfere with constructor options allow_no_value and delimiters.
Mainly because of backwards compatibility concerns, configparser
provides also a legacy API with explicit get/set methods. While there
are valid use cases for the methods outlined below, mapping protocol access is
preferred for new projects. The legacy API is at times more advanced,
low-level and downright counterintuitive.
An example of writing to a configuration file:
import configparser
config = configparser.RawConfigParser()
# Please note that using RawConfigParser's set functions, you can assign
# non-string values to keys internally, but will receive an error when
# attempting to write to a file or when you get it in non-raw mode. Setting
# values using the mapping protocol or ConfigParser's set() does not allow
# such assignments to take place.
config.add_section('Section1')
config.set('Section1', 'int', '15')
config.set('Section1', 'bool', 'true')
config.set('Section1', 'float', '3.1415')
config.set('Section1', 'baz', 'fun')
config.set('Section1', 'bar', 'Python')
config.set('Section1', 'foo', '%(bar)s is %(baz)s!')
# Writing our configuration file to 'example.cfg'
with open('example.cfg', 'w') as configfile:
    config.write(configfile)
An example of reading the configuration file again:
import configparser
config = configparser.RawConfigParser()
config.read('example.cfg')
# getfloat() raises an exception if the value is not a float
# getint() and getboolean() also do this for their respective types
float = config.getfloat('Section1', 'float')
int = config.getint('Section1', 'int')
print(float + int)
# Notice that the next output does not interpolate '%(bar)s' or '%(baz)s'.
# This is because we are using a RawConfigParser().
if config.getboolean('Section1', 'bool'):
    print(config.get('Section1', 'foo'))
To get interpolation, use ConfigParser:
import configparser
cfg = configparser.ConfigParser()
cfg.read('example.cfg')
# Set the optional `raw` argument of get() to True if you wish to disable
# interpolation in a single get operation.
print(cfg.get('Section1', 'foo', raw=False)) # -> "Python is fun!"
print(cfg.get('Section1', 'foo', raw=True)) # -> "%(bar)s is %(baz)s!"
# The optional `vars` argument is a dict with members that will take
# precedence in interpolation.
print(cfg.get('Section1', 'foo', vars={'bar': 'Documentation',
'baz': 'evil'}))
# The optional `fallback` argument can be used to provide a fallback value
print(cfg.get('Section1', 'foo'))
# -> "Python is fun!"
print(cfg.get('Section1', 'foo', fallback='Monty is not.'))
# -> "Python is fun!"
print(cfg.get('Section1', 'monster', fallback='No such things as monsters.'))
# -> "No such things as monsters."
# A bare print(cfg.get('Section1', 'monster')) would raise NoOptionError
# but we can also use:
print(cfg.get('Section1', 'monster', fallback=None))
# -> None
Default values are available in both types of ConfigParsers. They are used in
interpolation if an option used is not defined elsewhere.
import configparser
# New instance with 'bar' and 'baz' defaulting to 'Life' and 'hard', respectively
config = configparser.ConfigParser({'bar': 'Life', 'baz': 'hard'})
config.read('example.cfg')
print(config.get('Section1', 'foo')) # -> "Python is fun!"
config.remove_option('Section1', 'bar')
config.remove_option('Section1', 'baz')
print(config.get('Section1', 'foo')) # -> "Life is hard!"
The main configuration parser. When defaults is given, it is initialized
into the dictionary of intrinsic defaults. When dict_type is given, it
will be used to create the dictionary objects for the list of sections, for
the options within a section, and for the default values.
When delimiters is given, it is used as the set of substrings that
divide keys from values. When comment_prefixes is given, it will be used
as the set of substrings that prefix comments in otherwise empty lines.
Comments can be indented. When inline_comment_prefixes is given, it will be
used as the set of substrings that prefix comments in non-empty lines.
When strict is True (the default), the parser won’t allow for
any section or option duplicates while reading from a single source (file,
string or dictionary), raising DuplicateSectionError or
DuplicateOptionError. When empty_lines_in_values is False
(default: True), each empty line marks the end of an option. Otherwise,
internal empty lines of a multiline option are kept as part of the value.
When allow_no_value is True (default: False), options without
values are accepted; the value held for these is None and they are
serialized without the trailing delimiter.
When default_section is given, it specifies the name for the special
section holding default values for other sections and interpolation purposes
(normally named "DEFAULT"). This value can be retrieved and changed on
runtime using the default_section instance attribute.
Interpolation behaviour may be customized by providing a custom handler
through the interpolation argument. None can be used to turn off
interpolation completely, ExtendedInterpolation() provides a more
advanced variant inspired by zc.buildout. More on the subject in the
dedicated documentation section.
All option names used in interpolation will be passed through the
optionxform() method just like any other option name reference. For
example, using the default implementation of optionxform() (which
converts option names to lower case), the values foo%(bar)s and foo%(BAR)s are equivalent.
Add a section named section to the instance. If a section by the given
name already exists, DuplicateSectionError is raised. If the
default section name is passed, ValueError is raised. The name
of the section must be a string; if not, TypeError is raised.
Changed in version 3.2: Non-string section names raise TypeError.
If the given section exists, and contains the given option, return
True; otherwise return False. If the specified
section is None or an empty string, DEFAULT is assumed.
Attempt to read and parse a list of filenames, returning a list of
filenames which were successfully parsed. If filenames is a string, it
is treated as a single filename. If a file named in filenames cannot
be opened, that file will be ignored. This is designed so that you can
specify a list of potential configuration file locations (for example,
the current directory, the user’s home directory, and some system-wide
directory), and all existing configuration files in the list will be
read. If none of the named files exist, the ConfigParser
instance will contain an empty dataset. An application which requires
initial values to be loaded from a file should load the required file or
files using read_file() before calling read() for any
optional files:
import configparser, os
config = configparser.ConfigParser()
config.read_file(open('defaults.cfg'))
config.read(['site.cfg', os.path.expanduser('~/.myapp.cfg')],
encoding='cp1250')
New in version 3.2: The encoding parameter. Previously, all files were read using the
default encoding for open().
Read and parse configuration data from f which must be an iterable
yielding Unicode strings (for example files opened in text mode).
Optional argument source specifies the name of the file being read. If
not given and f has a name attribute, that is used for
source; the default is '<???>'.
Optional argument source specifies a context-specific name of the
string passed. If not given, '<string>' is used. This should
commonly be a filesystem path or a URL.
Load configuration from any object that provides a dict-like items()
method. Keys are section names, values are dictionaries with keys and
values that should be present in the section. If the used dictionary
type preserves order, sections and their keys will be added in order.
Values are automatically converted to strings.
Optional argument source specifies a context-specific name of the
dictionary passed. If not given, <dict> is used.
This method can be used to copy state between parsers.
Get an option value for the named section. If vars is provided, it
must be a dictionary. The option is looked up in vars (if provided),
section, and in DEFAULTSECT in that order. If the key is not found
and fallback is provided, it is used as a fallback value. None can
be provided as a fallback value.
All the '%' interpolations are expanded in the return values, unless
the raw argument is true. Values for interpolation keys are looked up
in the same manner as the option.
Changed in version 3.2: Arguments raw, vars and fallback are keyword only to protect
users from trying to use the third argument as the fallback value
(especially when using the mapping protocol).
A convenience method which coerces the option in the specified section
to a floating point number. See get() for explanation of raw,
vars and fallback.
A convenience method which coerces the option in the specified section
to a Boolean value. Note that the accepted values for the option are
'1', 'yes', 'true', and 'on', which cause this method to
return True, and '0', 'no', 'false', and 'off', which
cause it to return False. These string values are checked in a
case-insensitive manner. Any other value will cause it to raise
ValueError. See get() for explanation of raw, vars and
fallback.
When section is not given, return a list of section_name,
section_proxy pairs, including DEFAULTSECT.
Otherwise, return a list of name, value pairs for the options in the
given section. Optional arguments have the same meaning as for the
get() method.
Changed in version 3.2: Items present in vars no longer appear in the result. The previous
behaviour mixed actual parser options with variables provided for
interpolation.
If the given section exists, set the given option to the specified value;
otherwise raise NoSectionError. option and value must be
strings; if not, TypeError is raised.
Write a representation of the configuration to the specified file
object, which must be opened in text mode (accepting strings). This
representation can be parsed by a future read() call. If
space_around_delimiters is true, delimiters between
keys and values are surrounded by spaces.
Remove the specified option from the specified section. If the
section does not exist, raise NoSectionError. If the option
existed to be removed, return True; otherwise return
False.
Transforms the option name option as found in an input file or as passed
in by client code to the form that should be used in the internal
structures. The default implementation returns a lower-case version of
option; subclasses may override this or client code can set an attribute
of this name on instances to affect this behavior.
You don’t need to subclass the parser to use this method; you can also
set it on an instance, to a function that takes a string argument and
returns a string. Setting it to str, for example, would make option
names case sensitive:
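A minimal sketch of the identity transform described above:

import configparser

parser = configparser.ConfigParser()
parser.optionxform = str    # option names keep their case instead of being lower-cased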
Deprecated since version 3.2: Use read_file() instead.
Changed in version 3.2: readfp() now iterates on f instead of calling f.readline().
For existing code calling readfp() with arguments which don’t
support iteration, the following generator may be used as a wrapper
around the file-like object:
def readline_generator(f):
    line = f.readline()
    while line:
        yield line
        line = f.readline()
Instead of parser.readfp(f) use
parser.read_file(readline_generator(f)).
Legacy variant of the ConfigParser with interpolation disabled
by default and unsafe add_section and set methods.
Note
Consider using ConfigParser instead which checks types of
the values to be stored internally. If you don’t want interpolation, you
can use ConfigParser(interpolation=None).
Add a section named section to the instance. If a section by the given
name already exists, DuplicateSectionError is raised. If the
default section name is passed, ValueError is raised.
The type of section is not checked, which lets users create non-string
section names. This behaviour is unsupported and may cause internal errors.
If the given section exists, set the given option to the specified value;
otherwise raise NoSectionError. While it is possible to use
RawConfigParser (or ConfigParser with raw parameters
set to true) for internal storage of non-string values, full
functionality (including interpolation and output to files) can only be
achieved using string values.
This method lets users assign non-string values to keys internally. This
behaviour is unsupported and will cause errors when attempting to write
to a file or get it in non-raw mode. Use the mapping protocol API
which does not allow such assignments to take place.
Exception raised if add_section() is called with the name of a section
that is already present, or, in strict parsers, when a section is found more
than once in a single input file, string or dictionary.
New in version 3.2: Optional source and lineno attributes and arguments to
__init__() were added.
Exception raised by strict parsers if a single option appears twice during
reading from a single file, string or dictionary. This catches misspellings
and case sensitivity-related errors, e.g. a dictionary may have two keys
representing the same case-insensitive configuration key.
Exception raised when errors occur attempting to parse a file.
Changed in version 3.2: The filename attribute and __init__() argument were renamed to
source for consistency.
Footnotes
[1] Config parsers allow for heavy customization. If you are interested in
changing the behaviour outlined by the footnote reference, consult the
Customizing Parser Behaviour section.
A netrc instance or subclass instance encapsulates data from a netrc
file. The initialization argument, if present, specifies the file to parse. If
no argument is given, the file .netrc in the user’s home directory will
be read. Parse errors will raise NetrcParseError with diagnostic
information including the file name, line number, and terminating token.
Exception raised by the netrc class when syntactical errors are
encountered in source text. Instances of this exception provide three
interesting attributes: msg is a textual explanation of the error,
filename is the name of the source file, and lineno gives the
line number on which the error was found.
Return a 3-tuple (login, account, password) of authenticators for host.
If the netrc file did not contain an entry for the given host, return the tuple
associated with the ‘default’ entry. If neither matching host nor default entry
is available, return None.
Passwords are limited to a subset of the ASCII character set. All ASCII
punctuation is allowed in passwords, however, note that whitespace and
non-printable characters are not allowed in passwords. This is a limitation
of the way the .netrc file is parsed and may be removed in the future.
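As a minimal sketch, assuming your ~/.netrc contains an entry for the
(hypothetical) host example.com:

import netrc

auth = netrc.netrc().authenticators('example.com')   # hypothetical host name
if auth is not None:
    login, account, password = auth
    print('logging in as', login)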
The xdrlib module supports the External Data Representation Standard as
described in RFC 1014, written by Sun Microsystems, Inc. June 1987. It
supports most of the data types described in the RFC.
The xdrlib module defines two classes, one for packing variables into XDR
representation, and another for unpacking from XDR representation. There are
also two exception classes.
In general, you can pack any of the most common XDR data types by calling the
appropriate pack_type() method. Each method takes a single argument, the
value to pack. The following simple data type packing methods are supported:
pack_uint(), pack_int(), pack_enum(), pack_bool(),
pack_uhyper(), and pack_hyper().
Packs a fixed length string, s. n is the length of the string but it is
not packed into the data buffer. The string is padded with null bytes if
necessary to guarantee 4-byte alignment.
Packs a variable length string, s. The length of the string is first packed
as an unsigned integer, then the string data is packed with
pack_fstring().
Packs a list of homogeneous items. This method is useful for lists with an
indeterminate size; i.e. the size is not available until the entire list has
been walked. For each item in the list, an unsigned integer 1 is packed
first, followed by the data value from the list. pack_item is the function
that is called to pack the individual item. At the end of the list, an unsigned
integer 0 is packed.
For example, to pack a list of integers, the code might appear like this:
import xdrlib
p = xdrlib.Packer()
p.pack_list([1, 2, 3], p.pack_int)
Packs a fixed length list (array) of homogeneous items. n is the length of
the list; it is not packed into the buffer, but a ValueError exception
is raised if len(array) is not equal to n. As above, pack_item is the
function used to pack each element.
Packs a variable length list of homogeneous items. First, the length of the
list is packed as an unsigned integer, then each element is packed as in
pack_farray() above.
Indicates unpack completion. Raises an Error exception if all of the
data has not been unpacked.
In addition, every data type that can be packed with a Packer can be
unpacked with an Unpacker. Unpacking methods are of the form
unpack_type(), and take no arguments. They return the unpacked object.
Unpacks and returns a variable length string. The length of the string is first
unpacked as an unsigned integer, then the string data is unpacked with
unpack_fstring().
Unpacks and returns a list of homogeneous items. The list is unpacked one
element at a time by first unpacking an unsigned integer flag. If the flag is
1, then the item is unpacked and appended to the list. A flag of 0
indicates the end of the list. unpack_item is the function that is called to
unpack the items.
Unpacks and returns (as a list) a fixed length array of homogeneous items. n
is number of list elements to expect in the buffer. As above, unpack_item is
the function used to unpack each element.
Unpacks and returns a variable length list of homogeneous items. First, the
length of the list is unpacked as an unsigned integer, then each element is
unpacked as in unpack_farray() above.
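A minimal round-trip sketch tying the packing and unpacking methods together:

import xdrlib

p = xdrlib.Packer()
p.pack_list([1, 2, 3], p.pack_int)

u = xdrlib.Unpacker(p.get_buffer())
print(u.unpack_list(u.unpack_int))   # [1, 2, 3]
u.done()                             # raises xdrlib.Error if data remains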
This module provides an interface for reading and writing the “property list”
XML files used mainly by Mac OS X.
The property list (.plist) file format is a simple XML pickle supporting
basic object types, like dictionaries, lists, numbers and strings. Usually the
top level object is a dictionary.
Values can be strings, integers, floats, booleans, tuples, lists, dictionaries
(but only with string keys), Data or datetime.datetime
objects. String values (including dictionary keys) have to be unicode strings –
they will be written out as UTF-8.
The <data> plist type is supported through the Data class. This is
a thin wrapper around a Python bytes object. Use Data if your strings
contain control characters.
Read a plist file. pathOrFile may either be a file name or a (readable)
file object. Return the unpacked root object (which usually is a
dictionary).
The XML data is parsed using the Expat parser from xml.parsers.expat
– see its documentation for possible exceptions on ill-formed XML.
Unknown elements will simply be ignored by the plist parser.
Return a “data” wrapper object around the bytes object data. This is used
in functions converting from/to plists to represent the <data> type
available in plists.
It has one attribute, data, that can be used to retrieve the Python
bytes object stored in it.
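A minimal write/read sketch; the output file name is hypothetical:

import datetime
import plistlib

pl = {
    'aString': 'Doodah',
    'aNumber': 42,
    'aDate': datetime.datetime.now(),
    'someData': plistlib.Data(b'<binary gunk>'),
}
plistlib.writePlist(pl, 'example.plist')

pl2 = plistlib.readPlist('example.plist')
print(pl2['aString'])    # 'Doodah'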
The modules described in this chapter implement various algorithms of a
cryptographic nature. They are available at the discretion of the installation.
Here’s an overview:
This module implements a common interface to many different secure hash and
message digest algorithms. Included are the FIPS secure hash algorithms SHA1,
SHA224, SHA256, SHA384, and SHA512 (defined in FIPS 180-2) as well as RSA’s MD5
algorithm (defined in Internet RFC 1321). The terms “secure hash” and
“message digest” are interchangeable. Older algorithms were called message
digests. The modern term is secure hash.
Note
If you want the adler32 or crc32 hash functions, they are available in
the zlib module.
Warning
Some algorithms have known hash collision weaknesses, see the FAQ at the end.
There is one constructor method named for each type of hash. All return
a hash object with the same simple interface. For example: use sha1() to
create a SHA1 hash object. You can now feed this object with objects conforming
to the buffer interface (normally bytes objects) using the
update() method. At any point you can ask it for the digest of the
concatenation of the data fed to it so far using the digest() or
hexdigest() methods.
Note
For better multithreading performance, the Python GIL is released for
strings of more than 2047 bytes at object creation or on update.
Note
Feeding string objects into update() is not supported, as hashes work
on bytes, not on characters.
Constructors for hash algorithms that are always present in this module are
md5(), sha1(), sha224(), sha256(), sha384(), and
sha512(). Additional algorithms may also be available depending upon the
OpenSSL library that Python uses on your platform.
For example, to obtain the digest of the byte string b'Nobody inspects the spammish repetition':
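A minimal sketch; note that repeated update() calls are equivalent to a
single call with the concatenated data:

import hashlib

m = hashlib.md5()
m.update(b"Nobody inspects")
m.update(b" the spammish repetition")
print(m.hexdigest())    # 32 hex digits; same result as hashing the whole string at once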
Is a generic constructor that takes the string name of the desired
algorithm as its first parameter. It also exists to allow access to the
above listed hashes as well as any other algorithms that your OpenSSL
library may offer. The named constructors are much faster than new()
and should be preferred.
Using new() with an algorithm provided by OpenSSL:
>>> h = hashlib.new('ripemd160')
>>> h.update(b"Nobody inspects the spammish repetition")
>>> h.hexdigest()
'cc4a5ce1b3df48aec5d22d1f16b894a0b894eccc'
Hashlib provides the following constant attributes:
Contains the names of the hash algorithms that are available
in the running Python interpreter. These names will be recognized
when passed to new(). algorithms_guaranteed
will always be a subset. Duplicate algorithms with different
name formats may appear in this set (thanks to OpenSSL).
New in version 3.2.
The following values are provided as constant attributes of the hash objects
returned by the constructors:
Update the hash object with the object arg, which must be interpretable as
a buffer of bytes. Repeated calls are equivalent to a single call with the
concatenation of all the arguments: m.update(a);m.update(b) is
equivalent to m.update(a+b).
Changed in version 3.1: The Python GIL is released to allow other threads to run while hash
updates on data larger than 2048 bytes is taking place when using hash
algorithms supplied by OpenSSL.
Return the digest of the data passed to the update() method so far.
This is a bytes object of size digest_size which may contain bytes in
the whole range from 0 to 255.
Like digest() except the digest is returned as a string object of
double length, containing only hexadecimal digits. This may be used to
exchange the value safely in email or other non-binary environments.
Return a new hmac object. key is a bytes object giving the secret key. If
msg is present, the method call update(msg) is made. digestmod is
the digest constructor or module for the HMAC object to use. It defaults to
the hashlib.md5() constructor.
Update the hmac object with the bytes object msg. Repeated calls are
equivalent to a single call with the concatenation of all the arguments:
m.update(a);m.update(b) is equivalent to m.update(a+b).
Return the digest of the bytes passed to the update() method so far.
This bytes object will be the same length as the digest_size of the digest
given to the constructor. It may contain non-ASCII bytes, including NUL
bytes.
Like digest() except the digest is returned as a string twice the
length containing only hexadecimal digits. This may be used to exchange the
value safely in email or other non-binary environments.
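A minimal sketch of computing a keyed digest; the key and message are made up:

import hashlib
import hmac

h = hmac.new(b'secret-key', digestmod=hashlib.sha256)
h.update(b'message to authenticate')
print(h.hexdigest())    # 64 hex digits for SHA-256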
The Python module providing secure hash functions.
Hardcore cypherpunks will probably find the cryptographic modules written by
A.M. Kuchling of further interest; the package contains modules for various
encryption algorithms, most notably AES. These modules are not distributed with
Python but available separately. See the URL
http://www.pycrypto.org for more information.
The modules described in this chapter provide interfaces to operating system
features that are available on (almost) all operating systems, such as files and
a clock. The interfaces are generally modeled after the Unix or C interfaces,
but they are available on most other systems as well. Here’s an overview:
This module provides a portable way of using operating system dependent
functionality. If you just want to read or write a file see open(), if
you want to manipulate paths, see the os.path module, and if you want to
read all the lines in all the files on the command line see the fileinput
module. For creating temporary files and directories see the tempfile
module, and for high-level file and directory handling see the shutil
module.
Notes on the availability of these functions:
The design of all built-in operating system dependent modules of Python is
such that as long as the same functionality is available, it uses the same
interface; for example, the function os.stat(path) returns stat
information about path in the same format (which happens to have originated
with the POSIX interface).
Extensions peculiar to a particular operating system are also available
through the os module, but using them is of course a threat to
portability.
All functions accepting path or file names accept both bytes and string
objects, and result in an object of the same type, if a path or file name is
returned.
Note
If not separately noted, all functions that claim “Availability: Unix” are
supported on Mac OS X, which builds on a Unix core.
An “Availability: Unix” note means that this function is commonly found on
Unix systems. It does not make any claims about its existence on a specific
operating system.
Note
All functions in this module raise OSError in the case of invalid or
inaccessible file names and paths, or other arguments that have the correct
type, but are not accepted by the operating system.
The name of the operating system dependent module imported. The following
names have currently been registered: 'posix', 'nt', 'mac',
'os2', 'ce', 'java'.
See also
sys.platform has a finer granularity. os.uname() gives
system-dependent version information.
The platform module provides detailed checks for the
system’s identity.
File Names, Command Line Arguments, and Environment Variables¶
In Python, file names, command line arguments, and environment variables are
represented using the string type. On some systems, decoding these strings to
and from bytes is necessary before passing them to the operating system. Python
uses the file system encoding to perform this conversion (see
sys.getfilesystemencoding()).
Changed in version 3.1: On some systems, conversion using the file system encoding may fail. In this
case, Python uses the surrogateescape encoding error handler, which means
that undecodable bytes are replaced by a Unicode character U+DCxx on
decoding, and these are again translated to the original byte on encoding.
The file system encoding must guarantee to successfully decode all bytes
below 128. If the file system encoding fails to provide this guarantee, API
functions may raise UnicodeErrors.
A mapping object representing the string environment. For example,
environ['HOME'] is the pathname of your home directory (on some platforms),
and is equivalent to getenv("HOME") in C.
This mapping is captured the first time the os module is imported,
typically during Python startup as part of processing site.py. Changes
to the environment made after this time are not reflected in os.environ,
except for changes made by modifying os.environ directly.
If the platform supports the putenv() function, this mapping may be used
to modify the environment as well as query the environment. putenv() will
be called automatically when the mapping is modified.
On Unix, keys and values use sys.getfilesystemencoding() and
'surrogateescape' error handler. Use environb if you would like
to use a different encoding.
Note
Calling putenv() directly does not change os.environ, so it’s better
to modify os.environ.
Note
On some platforms, including FreeBSD and Mac OS X, setting environ may
cause memory leaks. Refer to the system documentation for
putenv().
If putenv() is not provided, a modified copy of this mapping may be
passed to the appropriate process-creation functions to cause child processes
to use a modified environment.
If the platform supports the unsetenv() function, you can delete items in
this mapping to unset environment variables. unsetenv() will be called
automatically when an item is deleted from os.environ, and when
one of the pop() or clear() methods is called.
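A minimal sketch of working through the mapping; MY_SETTING is a made-up
variable name:

import os

os.environ['MY_SETTING'] = 'enabled'    # calls putenv() where supported
print(os.environ.get('MY_SETTING'))     # 'enabled'
del os.environ['MY_SETTING']            # calls unsetenv() where supported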
Bytes version of environ: a mapping object representing the
environment as byte strings. environ and environb are
synchronized (modifying environb updates environ, and vice
versa).
Returns the list of directories that will be searched for a named
executable, similar to a shell, when launching a process.
env, when specified, should be an environment variable dictionary
in which to look up the PATH. By default, when env is None,
environ is used.
Call the system initgroups() to initialize the group access list with all of
the groups of which the specified username is a member, plus the specified
group id.
Return the name of the user logged in on the controlling terminal of the
process. For most purposes, it is more useful to use the environment variables
LOGNAME or USERNAME to find out who the user is, or
pwd.getpwuid(os.getuid())[0] to get the login name of the currently
effective user id.
Return the parent’s process id. When the parent process has exited, on Unix
the id returned is the one of the init process (1), on Windows it is still
the same id, which may be already reused by another process.
Availability: Unix, Windows
Changed in version 3.2: Added support for Windows.
Return the value of the environment variable key if it exists, or
default if it doesn’t. key, default and the result are str.
On Unix, keys and values are decoded with sys.getfilesystemencoding()
and 'surrogateescape' error handler. Use os.getenvb() if you
would like to use a different encoding.
Set the environment variable named key to the string value. Such
changes to the environment affect subprocesses started with os.system(),
popen() or fork() and execv().
Availability: most flavors of Unix, Windows.
Note
On some platforms, including FreeBSD and Mac OS X, setting environ may
cause memory leaks. Refer to the system documentation for putenv.
When putenv() is supported, assignments to items in os.environ are
automatically translated into corresponding calls to putenv(); however,
calls to putenv() don’t update os.environ, so it is actually
preferable to assign to items of os.environ.
Set the list of supplemental group ids associated with the current process to
groups. groups must be a sequence, and each element must be an integer
identifying a group. This operation is typically available only to the superuser.
Call the system call setpgid() to set the process group id of the
process with id pid to the process group with id pgrp. See the Unix manual
for the semantics.
Return the error message corresponding to the error code in code.
On platforms where strerror() returns NULL when given an unknown
error number, ValueError is raised.
Return a 5-tuple containing information identifying the current operating
system. The tuple contains 5 strings: (sysname, nodename, release,
version, machine). Some systems truncate the nodename to 8 characters or to the
leading component; a better way to get the hostname is
socket.gethostname() or even
socket.gethostbyaddr(socket.gethostname()).
Unset (delete) the environment variable named key. Such changes to the
environment affect subprocesses started with os.system(), popen() or
fork() and execv().
When unsetenv() is supported, deletion of items in os.environ is
automatically translated into a corresponding call to unsetenv(); however,
calls to unsetenv() don’t update os.environ, so it is actually
preferable to delete items of os.environ.
Return an open file object connected to the file descriptor fd. The mode
and bufsize arguments have the same meaning as the corresponding arguments to
the built-in open() function.
When specified, the mode argument must start with one of the letters
'r', 'w', or 'a', otherwise a ValueError is raised.
On Unix, when the mode argument starts with 'a', the O_APPEND flag is
set on the file descriptor (which the fdopen() implementation already
does on most platforms).
These functions operate on I/O streams referenced using file descriptors.
File descriptors are small integers corresponding to a file that has been opened
by the current process. For example, standard input is usually file descriptor
0, standard output is 1, and standard error is 2. Further files opened by a
process will then be assigned 3, 4, 5, and so forth. The name “file descriptor”
is slightly deceptive; on Unix platforms, sockets and pipes are also referenced
by file descriptors.
The fileno() method can be used to obtain the file descriptor
associated with a file object when required. Note that using the file
descriptor directly will bypass the file object methods, ignoring aspects such
as internal buffering of data.
This function is intended for low-level I/O and must be applied to a file
descriptor as returned by os.open() or pipe(). To close a “file
object” returned by the built-in function open() or by popen() or
fdopen(), use its close() method.
Return system configuration information relevant to an open file. name
specifies the configuration value to retrieve; it may be a string which is the
name of a defined system value; these names are specified in a number of
standards (POSIX.1, Unix 95, Unix 98, and others). Some platforms define
additional names as well. The names known to the host operating system are
given in the pathconf_names dictionary. For configuration variables not
included in that mapping, passing an integer for name is also accepted.
If name is a string and is not known, ValueError is raised. If a
specific value for name is not supported by the host system, even if it is
included in pathconf_names, an OSError is raised with
errno.EINVAL for the error number.
Force write of file with filedescriptor fd to disk. On Unix, this calls the
native fsync() function; on Windows, the MS _commit() function.
If you’re starting with a buffered Python file object f, first do
f.flush(), and then do os.fsync(f.fileno()), to ensure that all internal
buffers associated with f are written to disk.
Set the current position of file descriptor fd to position pos, modified
by how: SEEK_SET or 0 to set the position relative to the
beginning of the file; SEEK_CUR or 1 to set it relative to the
current position; os.SEEK_END or 2 to set it relative to the end of
the file.
Open the file file and set various flags according to flags and possibly
its mode according to mode. The default mode is 0o777 (octal), and
the current umask value is first masked out. Return the file descriptor for
the newly opened file.
For a description of the flag and mode values, see the C run-time documentation;
flag constants (like O_RDONLY and O_WRONLY) are defined in
this module too (see open() flag constants). In particular, on Windows adding
O_BINARY is needed to open files in binary mode.
Availability: Unix, Windows.
Note
This function is intended for low-level I/O. For normal usage, use the
built-in function open(), which returns a file object with
read() and write() methods (and many more). To
wrap a file descriptor in a file object, use fdopen().
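A minimal low-level I/O sketch; the file name is hypothetical:

import os

fd = os.open('example.txt', os.O_WRONLY | os.O_CREAT, 0o644)
try:
    os.write(fd, b'low-level I/O\n')
finally:
    os.close(fd)

fd = os.open('example.txt', os.O_RDONLY)
try:
    print(os.read(fd, 100))    # b'low-level I/O\n'
finally:
    os.close(fd)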
Open a new pseudo-terminal pair. Return a pair of file descriptors
(master, slave) for the pty and the tty, respectively. For a (slightly) more
portable approach, use the pty module.
Read at most n bytes from file descriptor fd. Return a bytestring containing the
bytes read. If the end of the file referred to by fd has been reached, an
empty bytes object is returned.
Availability: Unix, Windows.
Note
This function is intended for low-level I/O and must be applied to a file
descriptor as returned by os.open() or pipe(). To read a “file object”
returned by the built-in function open() or by popen() or
fdopen(), or sys.stdin, use its read() or
readline() methods.
Return a string which specifies the terminal device associated with
file descriptor fd. If fd is not associated with a terminal device, an
exception is raised.
Write the bytestring in str to file descriptor fd. Return the number of
bytes actually written.
Availability: Unix, Windows.
Note
This function is intended for low-level I/O and must be applied to a file
descriptor as returned by os.open() or pipe(). To write a “file
object” returned by the built-in function open() or by popen() or
fdopen(), or sys.stdout or sys.stderr, use its
write() method.
The following constants are options for the flags parameter to the
open() function. They can be combined using the bitwise OR operator
|. Some of them are not available on all platforms. For descriptions of
their availability and use, consult the open(2) manual page on Unix
or the MSDN on Windows.
Use the real uid/gid to test for access to path. Note that most operations
will use the effective uid/gid, therefore this routine can be used in a
suid/sgid environment to test if the invoking user has the specified access to
path. mode should be F_OK to test the existence of path, or it
can be the inclusive OR of one or more of R_OK, W_OK, and
X_OK to test permissions. Return True if access is allowed,
False if not. See the Unix man page access(2) for more
information.
Availability: Unix, Windows.
Note
Using access() to check if a user is authorized to e.g. open a file
before actually doing so using open() creates a security hole,
because the user might exploit the short time interval between checking
and opening the file to manipulate it. It’s preferable to use EAFP
techniques. For example:
if os.access("myfile", os.R_OK):
    with open("myfile") as fp:
        return fp.read()
return "some default data"
is better written as:
try:
    fp = open("myfile")
except IOError as e:
    if e.errno == errno.EACCES:   # requires the errno module
        return "some default data"
    # Not a permission error.
    raise
else:
    with fp:
        return fp.read()
Note
I/O operations may fail even when access() indicates that they would
succeed, particularly for operations on network filesystems which may have
permissions semantics beyond the usual POSIX permission-bit model.
Change the current working directory to the directory represented by the file
descriptor fd. The descriptor must refer to an opened directory, not an open
file.
Change the mode of path to the numeric mode. mode may take one of the
following values (as defined in the stat module) or bitwise ORed
combinations of them:
Although Windows supports chmod(), you can only set the file’s read-only
flag with it (via the stat.S_IWRITE and stat.S_IREAD
constants or a corresponding integer value). All other bits are
ignored.
Change the mode of path to the numeric mode. If path is a symlink, this
affects the symlink rather than the target. See the docs for chmod()
for possible values of mode.
Return a list containing the names of the entries in the directory given by
path (default: '.'). The list is in arbitrary order. It does not include the special
entries '.' and '..' even if they are present in the directory.
This function can be called with a bytes or string argument, and returns
filenames of the same datatype.
Availability: Unix, Windows.
Changed in version 3.2: The path parameter became optional.
Perform the equivalent of an lstat() system call on the given path.
Similar to stat(), but does not follow symbolic links. On
platforms that do not support symbolic links, this is an alias for
stat().
Changed in version 3.2: Added support for Windows 6.0 (Vista) symbolic links.
Create a FIFO (a named pipe) named path with numeric mode mode. The
default mode is 0o666 (octal). The current umask value is first masked
out from the mode.
FIFOs are pipes that can be accessed like regular files. FIFOs exist until they
are deleted (for example with os.unlink()). Generally, FIFOs are used as
rendezvous between “client” and “server” type processes: the server opens the
FIFO for reading, and the client opens it for writing. Note that mkfifo()
doesn’t open the FIFO — it just creates the rendezvous point.
Create a filesystem node (file, device special file or named pipe) named
filename. mode specifies both the permissions to use and the type of node
to be created, being combined (bitwise OR) with one of stat.S_IFREG,
stat.S_IFCHR, stat.S_IFBLK, and stat.S_IFIFO (those constants are
available in stat). For stat.S_IFCHR and stat.S_IFBLK,
device defines the newly created device special file (probably using
os.makedev()), otherwise it is ignored.
Create a directory named path with numeric mode mode. The default mode
is 0o777 (octal). On some systems, mode is ignored. Where it is used,
the current umask value is first masked out. If the directory already
exists, OSError is raised.
It is also possible to create temporary directories; see the
tempfile module’s tempfile.mkdtemp() function.
Recursive directory creation function. Like mkdir(), but makes all
intermediate-level directories needed to contain the leaf directory. If
the target directory already exists (with the same mode as specified), an
OSError exception is raised if exist_ok is False; otherwise no
exception is raised. If the directory cannot be created in other cases,
an OSError exception is raised. The default mode is 0o777 (octal).
On some systems, mode is ignored. Where it is used, the current umask
value is first masked out.
Note
makedirs() will become confused if the path elements to create
include pardir.
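A minimal sketch of the exist_ok parameter; the path is hypothetical:

import os

os.makedirs('build/output/logs', exist_ok=True)   # creates all three levels
os.makedirs('build/output/logs', exist_ok=True)   # now a no-op instead of OSError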
Return system configuration information relevant to a named file. name
specifies the configuration value to retrieve; it may be a string which is the
name of a defined system value; these names are specified in a number of
standards (POSIX.1, Unix 95, Unix 98, and others). Some platforms define
additional names as well. The names known to the host operating system are
given in the pathconf_names dictionary. For configuration variables not
included in that mapping, passing an integer for name is also accepted.
If name is a string and is not known, ValueError is raised. If a
specific value for name is not supported by the host system, even if it is
included in pathconf_names, an OSError is raised with
errno.EINVAL for the error number.
Dictionary mapping names accepted by pathconf() and fpathconf() to
the integer values defined for those names by the host operating system. This
can be used to determine the set of names known to the system. Availability:
Unix.
Return a string representing the path to which the symbolic link points. The
result may be either an absolute or relative pathname; if it is relative, it
may be converted to an absolute pathname using
os.path.join(os.path.dirname(path), result).
If the path is a string object, the result will also be a string object,
and the call may raise a UnicodeDecodeError. If the path is a bytes
object, the result will be a bytes object.
Availability: Unix, Windows
Changed in version 3.2: Added support for Windows 6.0 (Vista) symbolic links.
Remove (delete) the file path. If path is a directory, OSError is
raised; see rmdir() below to remove a directory. This is identical to
the unlink() function documented below. On Windows, attempting to
remove a file that is in use causes an exception to be raised; on Unix, the
directory entry is removed but the storage allocated to the file is not made
available until the original file is no longer in use.
Remove directories recursively. Works like rmdir() except that, if the
leaf directory is successfully removed, removedirs() tries to
successively remove every parent directory mentioned in path until an error
is raised (which is ignored, because it generally means that a parent directory
is not empty). For example, os.removedirs('foo/bar/baz') will first remove
the directory 'foo/bar/baz', and then remove 'foo/bar' and 'foo' if
they are empty. Raises OSError if the leaf directory could not be
successfully removed.
Rename the file or directory src to dst. If dst is a directory,
OSError will be raised. On Unix, if dst exists and is a file, it will
be replaced silently if the user has permission. The operation may fail on some
Unix flavors if src and dst are on different filesystems. If successful,
the renaming will be an atomic operation (this is a POSIX requirement). On
Windows, if dst already exists, OSError will be raised even if it is a
file; there may be no way to implement an atomic rename when dst names an
existing file.
Recursive directory or file renaming function. Works like rename(), except
creation of any intermediate directories needed to make the new pathname good is
attempted first. After the rename, directories corresponding to rightmost path
segments of the old name will be pruned away using removedirs().
Note
This function can fail with the new directory structure made if you lack
permissions needed to remove the leaf directory or file.
Remove (delete) the directory path. Only works when the directory is
empty, otherwise, OSError is raised. In order to remove whole
directory trees, shutil.rmtree() can be used.
Perform the equivalent of a stat() system call on the given path.
(This function follows symlinks; to stat a symlink use lstat().)
The return value is an object whose attributes correspond to the members
of the stat structure, namely:
st_mode - protection bits,
st_ino - inode number,
st_dev - device,
st_nlink - number of hard links,
st_uid - user id of owner,
st_gid - group id of owner,
st_size - size of file, in bytes,
st_atime - time of most recent access,
st_mtime - time of most recent content modification,
st_ctime - platform dependent; time of most recent metadata change on
Unix, or the time of creation on Windows.
On some Unix systems (such as Linux), the following attributes may also be
available:
st_blocks - number of blocks allocated for file
st_blksize - filesystem blocksize
st_rdev - type of device if an inode device
st_flags - user defined flags for file
On other Unix systems (such as FreeBSD), the following attributes may be
available (but may be only filled out if root tries to use them):
st_gen - file generation number
st_birthtime - time of file creation
On Mac OS systems, the following attributes may also be available:
st_rsize
st_creator
st_type
Note
The exact meaning and resolution of the st_atime,
st_mtime, and st_ctime attributes depend on the operating
system and the file system. For example, on Windows systems using the FAT
or FAT32 file systems, st_mtime has 2-second resolution, and
st_atime has only 1-day resolution. See your operating system
documentation for details.
For backward compatibility, the return value of stat() is also accessible
as a tuple of at least 10 integers giving the most important (and portable)
members of the stat structure, in the order st_mode,
st_ino, st_dev, st_nlink, st_uid,
st_gid, st_size, st_atime, st_mtime,
st_ctime. More items may be added at the end by some implementations.
The standard module stat defines functions and constants that are useful
for extracting information from a stat structure. (On Windows, some
items are filled with dummy values.)
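A minimal sketch of reading a few attributes from the result; the path is
hypothetical:

import os
import time

st = os.stat('setup.py')
print(st.st_size, 'bytes')
print('modified:', time.ctime(st.st_mtime))
print('mode: %o' % st.st_mode)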
Determine whether stat_result represents time stamps as float objects.
If newvalue is True, future calls to stat() return floats, if it is
False, future calls return ints. If newvalue is omitted, return the
current setting.
For compatibility with older Python versions, accessing stat_result as
a tuple always returns integers.
Python now returns float values by default. Applications which do not work
correctly with floating point time stamps can use this function to restore the
old behaviour.
The resolution of the timestamps (that is the smallest possible fraction)
depends on the system. Some systems only support second resolution; on these
systems, the fraction will always be zero.
It is recommended that this setting is only changed at program startup time in
the __main__ module; libraries should never change this setting. If an
application uses a library that works incorrectly if floating point time stamps
are processed, this application should turn the feature off until the library
has been corrected.
Perform a statvfs() system call on the given path. The return value is
an object whose attributes describe the filesystem on the given path, and
correspond to the members of the statvfs structure, namely:
f_bsize, f_frsize, f_blocks, f_bfree,
f_bavail, f_files, f_ffree, f_favail,
f_flag, f_namemax.
Two module-level constants are defined for the f_flag attribute’s
bit-flags: if ST_RDONLY is set, the filesystem is mounted
read-only, and if ST_NOSUID is set, the semantics of
setuid/setgid bits are disabled or not supported.
Changed in version 3.2: The ST_RDONLY and ST_NOSUID constants were added.
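A minimal sketch computing free space from the returned attributes:

import os

st = os.statvfs('/')
free = st.f_bavail * st.f_frsize     # bytes available to unprivileged users
total = st.f_blocks * st.f_frsize
print(free, 'bytes free of', total)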
Create a symbolic link pointing to source named link_name.
On Windows, symlink() takes an additional optional parameter,
target_is_directory, which defaults to False.
On Windows, a symlink represents a file or a directory, and does not morph to
the target dynamically. For this reason, when creating a symlink on Windows,
if the target is not already present, the symlink will default to being a
file symlink. If target_is_directory is set to True, the symlink will
be created as a directory symlink. This parameter is ignored if the target
exists (and the symlink is created with the same type as the target).
Symbolic link support was introduced in Windows 6.0 (Vista). symlink()
will raise a NotImplementedError on Windows versions earlier than 6.0.
Note
The SeCreateSymbolicLinkPrivilege is required in order to successfully
create symlinks. This privilege is not typically granted to regular
users but is available to accounts which can escalate privileges to the
administrator level. Either obtaining the privilege or running your
application as an administrator are ways to successfully create symlinks.
OSError is raised when the function is called by an unprivileged
user.
Availability: Unix, Windows.
Changed in version 3.2: Added support for Windows 6.0 (Vista) symbolic links.
Set the access and modified times of the file specified by path. If times
is None, then the file’s access and modified times are set to the current
time. (The effect is similar to running the Unix program touch on
the path.) Otherwise, times must be a 2-tuple of numbers, of the form
(atime, mtime) which is used to set the access and modified times,
respectively. Whether a directory can be given for path depends on whether
the operating system implements directories as files (for example, Windows
does not). Note that the exact times you set here may not be returned by a
subsequent stat() call, depending on the resolution with which your
operating system records access and modification times; see stat().
Generate the file names in a directory tree by walking the tree
either top-down or bottom-up. For each directory in the tree rooted at directory
top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).
dirpath is a string, the path to the directory. dirnames is a list of the
names of the subdirectories in dirpath (excluding '.' and '..').
filenames is a list of the names of the non-directory files in dirpath.
Note that the names in the lists contain no path components. To get a full path
(which begins with top) to a file or directory in dirpath, do
os.path.join(dirpath,name).
If optional argument topdown is True or not specified, the triple for a
directory is generated before the triples for any of its subdirectories
(directories are generated top-down). If topdown is False, the triple for a
directory is generated after the triples for all of its subdirectories
(directories are generated bottom-up).
When topdown is True, the caller can modify the dirnames list in-place
(perhaps using del or slice assignment), and walk() will only
recurse into the subdirectories whose names remain in dirnames; this can be
used to prune the search, impose a specific order of visiting, or even to inform
walk() about directories the caller creates or renames before it resumes
walk() again. Modifying dirnames when topdown is False is
ineffective, because in bottom-up mode the directories in dirnames are
generated before dirpath itself is generated.
By default errors from the listdir() call are ignored. If optional
argument onerror is specified, it should be a function; it will be called with
one argument, an OSError instance. It can report the error to continue
with the walk, or raise the exception to abort the walk. Note that the filename
is available as the filename attribute of the exception object.
By default, walk() will not walk down into symbolic links that resolve to
directories. Set followlinks to True to visit directories pointed to by
symlinks, on systems that support them.
Note
Be aware that setting followlinks to True can lead to infinite recursion if a
link points to a parent directory of itself. walk() does not keep track of
the directories it visited already.
Note
If you pass a relative pathname, don’t change the current working directory
between resumptions of walk(). walk() never changes the current
directory, and assumes that its caller doesn’t either.
This example displays the number of bytes taken by non-directory files in each
directory under the starting directory, except that it doesn’t look under any
CVS subdirectory:
import os
from os.path import join, getsize
for root, dirs, files in os.walk('python/Lib/email'):
    print(root, "consumes", end=" ")
    print(sum(getsize(join(root, name)) for name in files), end=" ")
    print("bytes in", len(files), "non-directory files")
    if 'CVS' in dirs:
        dirs.remove('CVS')  # don't visit CVS directories
In the next example, walking the tree bottom-up is essential: rmdir()
doesn’t allow deleting a directory before the directory is empty:
# Delete everything reachable from the directory named in "top",
# assuming there are no symbolic links.
# CAUTION: This is dangerous! For example, if top == '/', it
# could delete all your disk files.
import os
for root, dirs, files in os.walk(top, topdown=False):
    for name in files:
        os.remove(os.path.join(root, name))
    for name in dirs:
        os.rmdir(os.path.join(root, name))
These functions may be used to create and manage processes.
The various exec*() functions take a list of arguments for the new
program loaded into the process. In each case, the first of these arguments is
passed to the new program as its own name rather than as an argument a user may
have typed on a command line. For the C programmer, this is the argv[0]
passed to a program’s main(). For example,
os.execv('/bin/echo', ['foo', 'bar']) will only print bar on standard
output; foo will seem to be ignored.
Generate a SIGABRT signal to the current process. On Unix, the default
behavior is to produce a core dump; on Windows, the process immediately returns
an exit code of 3. Be aware that calling this function will not call the
Python signal handler registered for SIGABRT with
signal.signal().
These functions all execute a new program, replacing the current process; they
do not return. On Unix, the new executable is loaded into the current process,
and will have the same process id as the caller. Errors will be reported as
OSError exceptions.
The current process is replaced immediately. Open file objects and
descriptors are not flushed, so if there may be data buffered
on these open files, you should flush them using
sys.stdout.flush() or os.fsync() before calling an
exec*() function.
The “l” and “v” variants of the exec*() functions differ in how
command-line arguments are passed. The “l” variants are perhaps the easiest
to work with if the number of parameters is fixed when the code is written; the
individual parameters simply become additional parameters to the execl*()
functions. The “v” variants are good when the number of parameters is
variable, with the arguments being passed in a list or tuple as the args
parameter. In either case, the arguments to the child process should start with
the name of the command being run, but this is not enforced.
The variants which include a “p” near the end (execlp(),
execlpe(), execvp(), and execvpe()) will use the
PATH environment variable to locate the program file. When the
environment is being replaced (using one of the exec*e() variants,
discussed in the next paragraph), the new environment is used as the source of
the PATH variable. The other variants, execl(), execle(),
execv(), and execve(), will not use the PATH variable to
locate the executable; path must contain an appropriate absolute or relative
path.
For execle(), execlpe(), execve(), and execvpe() (note
that these all end in “e”), the env parameter must be a mapping which is
used to define the environment variables for the new process (these are used
instead of the current process’ environment); the functions execl(),
execlp(), execv(), and execvp() all cause the new process to
inherit the environment of the current process.
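A minimal sketch; execvp() replaces the current process, so on success this
script ends as soon as the call is made:

import os

# The first list element becomes argv[0], the program's own name;
# only 'hello' is printed by echo, which is located via PATH.
os.execvp('echo', ['echo', 'hello'])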
Exit the process with status n, without calling cleanup handlers, flushing
stdio buffers, etc.
Availability: Unix, Windows.
Note
The standard way to exit is sys.exit(n). _exit() should
normally only be used in the child process after a fork().
The following exit codes are defined and can be used with _exit(),
although they are not required. These are typically used for system programs
written in Python, such as a mail server’s external command delivery program.
Note
Some of these may not be available on all Unix platforms, since there is some
variation. These constants are defined where they are defined by the underlying
platform.
Exit code that means a temporary failure occurred. This indicates something
that may not really be an error, such as a network connection that couldn’t be
made during a retryable operation.
Fork a child process, using a new pseudo-terminal as the child’s controlling
terminal. Return a pair of (pid, fd), where pid is 0 in the child, the
new child’s process id in the parent, and fd is the file descriptor of the
master end of the pseudo-terminal. For a more portable approach, use the
pty module. If an error occurs OSError is raised.
Send signal sig to the process pid. Constants for the specific signals
available on the host platform are defined in the signal module.
Windows: The signal.CTRL_C_EVENT and
signal.CTRL_BREAK_EVENT signals are special signals which can
only be sent to console processes which share a common console window,
e.g., some subprocesses. Any other value for sig will cause the process
to be unconditionally killed by the TerminateProcess API, and the exit code
will be set to sig. The Windows version of kill() additionally takes
process handles to be killed.
(Note that the subprocess module provides more powerful facilities for
spawning new processes and retrieving their results; using that module is
preferable to using these functions. Check especially the
Replacing Older Functions with the subprocess Module section.)
If mode is P_NOWAIT, this function returns the process id of the new
process; if mode is P_WAIT, returns the process’s exit code if it
exits normally, or -signal, where signal is the signal that killed the
process. On Windows, the process id will actually be the process handle, so can
be used with the waitpid() function.
The “l” and “v” variants of the spawn*() functions differ in how
command-line arguments are passed. The “l” variants are perhaps the easiest
to work with if the number of parameters is fixed when the code is written; the
individual parameters simply become additional parameters to the
spawnl*() functions. The “v” variants are good when the number of
parameters is variable, with the arguments being passed in a list or tuple as
the args parameter. In either case, the arguments to the child process must
start with the name of the command being run.
The variants which include a second “p” near the end (spawnlp(),
spawnlpe(), spawnvp(), and spawnvpe()) will use the
PATH environment variable to locate the program file. When the
environment is being replaced (using one of the spawn*e() variants,
discussed in the next paragraph), the new environment is used as the source of
the PATH variable. The other variants, spawnl(),
spawnle(), spawnv(), and spawnve(), will not use the
PATH variable to locate the executable; path must contain an
appropriate absolute or relative path.
For spawnle(), spawnlpe(), spawnve(), and spawnvpe()
(note that these all end in “e”), the env parameter must be a mapping
which is used to define the environment variables for the new process (they are
used instead of the current process’ environment); the functions
spawnl(), spawnlp(), spawnv(), and spawnvp() all cause
the new process to inherit the environment of the current process. Note that
keys and values in the env dictionary must be strings; invalid keys or
values will cause the function to fail, with a return value of 127.
As an example, the following calls to spawnlp() and spawnvpe() are
equivalent:
import os
os.spawnlp(os.P_WAIT, 'cp', 'cp', 'index.html', '/dev/null')
L = ['cp', 'index.html', '/dev/null']
os.spawnvpe(os.P_WAIT, 'cp', L, os.environ)
Possible values for the mode parameter to the spawn*() family of
functions. If either of these values is given, the spawn*() functions
will return as soon as the new process has been created, with the process id as
the return value.
Possible value for the mode parameter to the spawn*() family of
functions. If this is given as mode, the spawn*() functions will not
return until the new process has run to completion, and will return the exit
code of the process if the run is successful, or -signal if a signal kills
the process.
Possible values for the mode parameter to the spawn*() family of
functions. These are less portable than those listed above. P_DETACH
is similar to P_NOWAIT, but the new process is detached from the
console of the calling process. If P_OVERLAY is used, the current
process will be replaced; the spawn*() function will not return.
When operation is not specified or 'open', this acts like double-clicking
the file in Windows Explorer, or giving the file name as an argument to the
start command from the interactive command shell: the file is opened
with whatever application (if any) its extension is associated.
When another operation is given, it must be a “command verb” that specifies
what should be done with the file. Common verbs documented by Microsoft are
'print' and 'edit' (to be used on files) as well as 'explore' and
'find' (to be used on directories).
startfile() returns as soon as the associated application is launched.
There is no option to wait for the application to close, and no way to retrieve
the application’s exit status. The path parameter is relative to the current
directory. If you want to use an absolute path, make sure the first character
is not a slash ('/'); the underlying Win32 ShellExecute() function
doesn’t work if it is. Use the os.path.normpath() function to ensure that
the path is properly encoded for Win32.
Execute the command (a string) in a subshell. This is implemented by calling
the Standard C function system(), and has the same limitations.
Changes to sys.stdin, etc. are not reflected in the environment of
the executed command. If command generates any output, it will be sent to
the interpreter standard output stream.
On Unix, the return value is the exit status of the process encoded in the
format specified for wait(). Note that POSIX does not specify the
meaning of the return value of the C system() function, so the return
value of the Python function is system-dependent.
On Windows, the return value is that returned by the system shell after
running command. The shell is given by the Windows environment variable
COMSPEC: it is usually cmd.exe, which returns the exit
status of the command run; on systems using a non-native shell, consult your
shell documentation.
The subprocess module provides more powerful facilities for spawning
new processes and retrieving their results; using that module is preferable
to using this function. See the Replacing Older Functions with the subprocess Module section in
the subprocess documentation for some helpful recipes.
Return a 5-tuple of floating point numbers indicating accumulated (processor
or other) times, in seconds. The items are: user time, system time,
children’s user time, children’s system time, and elapsed real time since a
fixed point in the past, in that order. See the Unix manual page
times(2) or the corresponding Windows Platform API documentation.
On Windows, only the first two items are filled, the others are zero.
Wait for completion of a child process, and return a tuple containing its pid
and exit status indication: a 16-bit number, whose low byte is the signal number
that killed the process, and whose high byte is the exit status (if the signal
number is zero); the high bit of the low byte is set if a core file was
produced.
The details of this function differ on Unix and Windows.
On Unix: Wait for completion of a child process given by process id pid, and
return a tuple containing its process id and exit status indication (encoded as
for wait()). The semantics of the call are affected by the value of the
integer options, which should be 0 for normal operation.
If pid is greater than 0, waitpid() requests status information for
that specific process. If pid is 0, the request is for the status of any
child in the process group of the current process. If pid is -1, the
request pertains to any child of the current process. If pid is less than
-1, status is requested for any process in the process group -pid (the
absolute value of pid).
An OSError is raised with the value of errno when the syscall
returns -1.
On Windows: Wait for completion of a process given by process handle pid, and
return a tuple containing pid, and its exit status shifted left by 8 bits
(shifting makes cross-platform use of the function easier). A pid less than or
equal to 0 has no special meaning on Windows, and raises an exception. The
value of integer options has no effect. pid can refer to any process whose
id is known, not necessarily a child process. The spawn() functions called
with P_NOWAIT return suitable process handles.
Similar to waitpid(), except no process id argument is given and a
3-element tuple containing the child’s process id, exit status indication, and
resource usage information is returned. Refer to resource.getrusage() for details on resource usage information. The option
argument is the same as that provided to waitpid() and wait4().
Similar to waitpid(), except a 3-element tuple, containing the child’s
process id, exit status indication, and resource usage information is returned.
Refer to resource.getrusage() for details on resource usage
information. The arguments to wait4() are the same as those provided to
waitpid().
This option causes child processes to be reported if they have been stopped but
their current state has not been reported since they were stopped.
Availability: Unix.
The following functions take a process status code as returned by
system(), wait(), or waitpid() as a parameter. They may be
used to determine the disposition of a process.
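For example, on Unix a child's status word can be decoded with the helpers
described below (a sketch; os.fork() is Unix-only):
import os
pid = os.fork()
if pid == 0:                 # child process
    os._exit(7)
pid, status = os.waitpid(pid, 0)
if os.WIFEXITED(status):
    print('child exited with', os.WEXITSTATUS(status))   # prints 7
elif os.WIFSIGNALED(status):
    print('child killed by signal', os.WTERMSIG(status))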
Return string-valued system configuration values. name specifies the
configuration value to retrieve; it may be a string which is the name of a
defined system value; these names are specified in a number of standards (POSIX,
Unix 95, Unix 98, and others). Some platforms define additional names as well.
The names known to the host operating system are given as the keys of the
confstr_names dictionary. For configuration variables not included in that
mapping, passing an integer for name is also accepted.
If the configuration value specified by name isn’t defined, None is
returned.
If name is a string and is not known, ValueError is raised. If a
specific value for name is not supported by the host system, even if it is
included in confstr_names, an OSError is raised with
errno.EINVAL for the error number.
Dictionary mapping names accepted by confstr() to the integer values
defined for those names by the host operating system. This can be used to
determine the set of names known to the system.
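For example (a Unix-only sketch; 'CS_PATH' is one of the POSIX-standard
names):
import os
print(os.confstr('CS_PATH'))           # e.g. '/bin:/usr/bin'
print('CS_PATH' in os.confstr_names)   # True on conforming systems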
Return the number of processes in the system run queue averaged over the last
1, 5, and 15 minutes or raises OSError if the load average was
unobtainable.
Return integer-valued system configuration values. If the configuration value
specified by name isn’t defined, -1 is returned. The comments regarding
the name parameter for confstr() apply here as well; the dictionary that
provides information on the known names is given by sysconf_names.
Dictionary mapping names accepted by sysconf() to the integer values
defined for those names by the host operating system. This can be used to
determine the set of names known to the system.
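For example (a Unix-only sketch; which names are defined varies by
platform):
import os
print(os.sysconf('SC_PAGE_SIZE'))          # e.g. 4096
print('SC_PAGE_SIZE' in os.sysconf_names)  # True where the name is defined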
Availability: Unix.
The following data values are used to support path manipulation operations. These
are defined for all platforms.
Higher-level operations on pathnames are defined in the os.path module.
The character used by the operating system to separate pathname components.
This is '/' for POSIX and '\\' for Windows. Note that knowing this
is not sufficient to be able to parse or concatenate pathnames — use
os.path.split() and os.path.join() — but it is occasionally
useful. Also available via os.path.
An alternative character used by the operating system to separate pathname
components, or None if only one separator character exists. This is set to
'/' on Windows systems where sep is a backslash. Also available via
os.path.
The character conventionally used by the operating system to separate search
path components (as in PATH), such as ':' for POSIX or ';' for
Windows. Also available via os.path.
The string used to separate (or, rather, terminate) lines on the current
platform. This may be a single character, such as '\n' for POSIX, or
multiple characters, for example, '\r\n' for Windows. Do not use
os.linesep as a line terminator when writing files opened in text mode (the
default); use a single '\n' instead, on all platforms.
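For example (a short sketch; the printed values depend on the platform):
import os
# Prefer os.path.join() over concatenating with os.sep by hand.
print(os.path.join('usr', 'share', 'zoneinfo'))  # 'usr/share/zoneinfo' on POSIX
print(repr(os.sep), repr(os.pathsep), repr(os.linesep))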
Return a string of n random bytes suitable for cryptographic use.
This function returns random bytes from an OS-specific randomness source. The
returned data should be unpredictable enough for cryptographic applications,
though its exact quality depends on the OS implementation. On a UNIX-like
system this will query /dev/urandom, and on Windows it will use CryptGenRandom.
If a randomness source is not found, NotImplementedError will be raised.
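For example, sixteen random bytes can be obtained and hex-encoded for
display (a minimal sketch):
import binascii
import os
token = os.urandom(16)
print(binascii.hexlify(token))   # 32 hex digits, different on every run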
The io module provides Python’s main facilities for dealing with various
types of I/O. There are three main types of I/O: text I/O, binary I/O, and raw
I/O. These are generic categories, and various backing stores can be used for
each of them. Concrete objects belonging to any of these categories will often
be called streams; another common term is file-like objects.
Independently of its category, each concrete stream object will also have
various capabilities: it can be read-only, write-only, or read-write. It can
also allow arbitrary random access (seeking forwards or backwards to any
location), or only sequential access (for example in the case of a socket or
pipe).
All streams are careful about the type of data you give to them. For example
giving a str object to the write() method of a binary stream
will raise a TypeError. So will giving a bytes object to the
write() method of a text stream.
Text I/O expects and produces str objects. This means that whenever
the backing store is natively made of bytes (such as in the case of a file),
encoding and decoding of data is made transparently as well as optional
translation of platform-specific newline characters.
The easiest way to create a text stream is with open(), optionally
specifying an encoding:
f = open("myfile.txt", "r", encoding="utf-8")
In-memory text streams are also available as StringIO objects:
f = io.StringIO("some initial text data")
The text stream API is described in detail in the documentation of
TextIOBase.
Binary I/O (also called buffered I/O) expects and produces bytes
objects. No encoding, decoding, or newline translation is performed. This
category of streams can be used for all kinds of non-text data, and also when
manual control over the handling of text data is desired.
The easiest way to create a binary stream is with open() with 'b' in
the mode string:
f = open("myfile.jpg", "rb")
In-memory binary streams are also available as BytesIO objects:
f = io.BytesIO(b"some initial binary data: \x00\x01")
The binary stream API is described in detail in the docs of
BufferedIOBase.
Other library modules may provide additional ways to create text or binary
streams. See socket.socket.makefile() for example.
Raw I/O (also called unbuffered I/O) is generally used as a low-level
building-block for binary and text streams; it is rarely useful to directly
manipulate a raw stream from user code. Nevertheless, you can create a raw
stream by opening a file in binary mode with buffering disabled:
f = open("myfile.jpg", "rb", buffering=0)
The raw stream API is described in detail in the docs of RawIOBase.
An int containing the default buffer size used by the module’s buffered I/O
classes. open() uses the file’s blksize (as obtained by
os.stat()) if possible.
It is also possible to use a str or bytes-like object as a
file for both reading and writing. For strings StringIO can be used
like a file opened in text mode. BytesIO can be used like a file
opened in binary mode. Both provide full read-write capabilities with random
access.
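For example (a sketch of the common write-then-rewind pattern):
import io
buf = io.BytesIO()
buf.write(b'some data')
buf.seek(0)           # rewind before reading back
print(buf.read())     # b'some data'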
The implementation of I/O streams is organized as a hierarchy of classes. First
abstract base classes (ABCs), which are used to
specify the various categories of streams, then concrete classes providing the
standard stream implementations.
Note
The abstract base classes also provide default implementations of some
methods in order to help implementation of concrete stream classes. For
example, BufferedIOBase provides unoptimized implementations of
readinto() and readline().
At the top of the I/O hierarchy is the abstract base class IOBase. It
defines the basic interface to a stream. Note, however, that there is no
separation between reading and writing to streams; implementations are allowed
to raise UnsupportedOperation if they do not support a given operation.
The RawIOBase ABC extends IOBase. It deals with the reading
and writing of bytes to a stream. FileIO subclasses RawIOBase
to provide an interface to files in the machine’s file system.
The TextIOBase ABC, another subclass of IOBase, deals with
streams whose bytes represent text, and handles encoding and decoding to and
from strings. TextIOWrapper, which extends it, is a buffered text
interface to a buffered raw stream (BufferedIOBase). Finally,
StringIO is an in-memory stream for text.
Argument names are not part of the specification, and only the arguments of
open() are intended to be used as keyword arguments.
The abstract base class for all I/O classes, acting on streams of bytes.
There is no public constructor.
This class provides empty abstract implementations for many methods
that derived classes can override selectively; the default
implementations represent a file that cannot be read, written or
seeked.
Even though IOBase does not declare read(), readinto(),
or write() because their signatures will vary, implementations and
clients should consider those methods part of the interface. Also,
implementations may raise an IOError when operations they do not
support are called.
The basic type used for binary data read from or written to a file is
bytes. bytearrays are accepted too, and in some cases
(such as readinto) required. Text I/O classes work with
str data.
Note that calling any method (even inquiries) on a closed stream is
undefined. Implementations may raise IOError in this case.
IOBase (and its subclasses) support the iterator protocol, meaning that an
IOBase object can be iterated over yielding the lines in a stream.
Lines are defined slightly differently depending on whether the stream is
a binary stream (yielding bytes), or a text stream (yielding character
strings). See readline() below.
IOBase is also a context manager and therefore supports the
with statement. In this example, file is closed after the
with statement’s suite is finished—even if an exception occurs:
with open('spam.txt', 'w') as file:
file.write('Spam and eggs!')
IOBase provides these data attributes and methods:
Flush and close this stream. This method has no effect if the file is
already closed. Once the file is closed, any operation on the file
(e.g. reading or writing) will raise a ValueError.
As a convenience, it is allowed to call this method more than once;
only the first call, however, will have an effect.
Read and return one line from the stream. If limit is specified, at
most limit bytes will be read.
The line terminator is always b'\n' for binary files; for text files,
the newlines argument to open() can be used to select the line
terminator(s) recognized.
Read and return a list of lines from the stream. hint can be specified
to control the number of lines read: no more lines will be read if the
total size (in bytes/characters) of all lines so far exceeds hint.
Resize the stream to the given size in bytes (or the current position
if size is not specified). The current stream position isn’t changed.
This resizing can extend or reduce the current file size. In case of
extension, the contents of the new file area depend on the platform
(on most systems, additional bytes are zero-filled, on Windows they’re
undetermined). The new file size is returned.
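For example (a sketch using a throwaway binary file named spam.bin):
with open('spam.bin', 'w+b') as f:
    f.write(b'0123456789')
    f.truncate(4)         # shrink the file to its first four bytes
    f.seek(0)
    print(f.read())       # b'0123'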
Base class for raw binary I/O. It inherits IOBase. There is no
public constructor.
Raw binary I/O typically provides low-level access to an underlying OS
device or API, and does not try to encapsulate it in high-level primitives
(this is left to Buffered I/O and Text I/O, described later in this page).
In addition to the attributes and methods from IOBase,
RawIOBase provides the following methods:
Read up to n bytes from the object and return them. As a convenience,
if n is unspecified or -1, readall() is called. Otherwise,
only one system call is ever made. Fewer than n bytes may be
returned if the operating system call returns fewer than n bytes.
If 0 bytes are returned, and n was not 0, this indicates end of file.
If the object is in non-blocking mode and no bytes are available,
None is returned.
Read up to len(b) bytes into bytearray b and return the number
of bytes read. If the object is in non-blocking mode and no
bytes are available, None is returned.
Write the given bytes or bytearray object, b, to the underlying raw
stream and return the number of bytes written. This can be less than
len(b), depending on specifics of the underlying raw stream, and
especially if it is in non-blocking mode. None is returned if the
raw stream is set not to block and no single byte could be readily
written to it.
Base class for binary streams that support some kind of buffering.
It inherits IOBase. There is no public constructor.
The main difference with RawIOBase is that methods read(),
readinto() and write() will try (respectively) to read as much
input as requested or to consume all given output, at the expense of
making perhaps more than one system call.
In addition, those methods can raise BlockingIOError if the
underlying raw stream is in non-blocking mode and cannot take or give
enough data; unlike their RawIOBase counterparts, they will
never return None.
Besides, the read() method does not have a default
implementation that defers to readinto().
The underlying raw stream (a RawIOBase instance) that
BufferedIOBase deals with. This is not part of the
BufferedIOBase API and may not exist on some implementations.
Read and return up to n bytes. If the argument is omitted, None, or
negative, data is read and returned until EOF is reached. An empty bytes
object is returned if the stream is already at EOF.
If the argument is positive, and the underlying raw stream is not
interactive, multiple raw reads may be issued to satisfy the byte count
(unless EOF is reached first). But for interactive raw streams, at most
one raw read will be issued, and a short result does not imply that EOF is
imminent.
A BlockingIOError is raised if the underlying raw stream is in
non blocking-mode, and has no data available at the moment.
Read and return up to n bytes, with at most one call to the underlying
raw stream’s read() method. This can be useful if you
are implementing your own buffering on top of a BufferedIOBase
object.
Write the given bytes or bytearray object, b, and return the number
of bytes written (never less than len(b), since if the write fails
an IOError will be raised). Depending on the actual
implementation, these bytes may be readily written to the underlying
stream, or held in a buffer for performance and latency reasons.
When in non-blocking mode, a BlockingIOError is raised if the
data needed to be written to the raw stream but it couldn’t accept
all the data without blocking.
FileIO represents an OS-level file containing bytes data.
It implements the RawIOBase interface (and therefore the
IOBase interface, too).
The name can be one of two things:
a character string or bytes object representing the path to the file
which will be opened;
an integer representing the number of an existing OS-level file descriptor
to which the resulting FileIO object will give access.
The mode can be 'r', 'w' or 'a' for reading (default), writing,
or appending. The file will be created if it doesn’t exist when opened for
writing or appending; it will be truncated when opened for writing. Add a
'+' to the mode to allow simultaneous reading and writing.
The read() (when called with a positive argument), readinto()
and write() methods on this class will only make one system call.
In addition to the attributes and methods from IOBase and
RawIOBase, FileIO provides the following data
attributes and methods:
mode
The mode as given in the constructor.
name
The file name. This is the file descriptor of the file when no name is
given in the constructor.
class io.BytesIO([initial_bytes])
A stream implementation using an in-memory bytes buffer. It inherits
BufferedIOBase. Its getbuffer() method returns a readable and
writable view over the contents of the buffer without copying them.
Mutating the view will transparently update the contents of the buffer:
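>>> b = io.BytesIO(b"abcdef")
>>> view = b.getbuffer()
>>> view[2:4] = b"56"
>>> b.getvalue()
b'ab56ef'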
class io.BufferedReader(raw, buffer_size=DEFAULT_BUFFER_SIZE)
A buffer providing higher-level access to a readable, sequential
RawIOBase object. It inherits BufferedIOBase.
When reading data from this object, a larger amount of data may be
requested from the underlying raw stream, and kept in an internal buffer.
The buffered data can then be returned directly on subsequent reads.
The constructor creates a BufferedReader for the given readable
raw stream and buffer_size. If buffer_size is omitted,
DEFAULT_BUFFER_SIZE is used.
Return bytes from the stream without advancing the position. At most one
single read on the raw stream is done to satisfy the call. The number of
bytes returned may be less or more than requested.
Read and return up to n bytes with only one call on the raw stream. If
at least one byte is buffered, only buffered bytes are returned.
Otherwise, one raw stream read call is made.
class io.BufferedWriter(raw, buffer_size=DEFAULT_BUFFER_SIZE)
A buffer providing higher-level access to a writeable, sequential
RawIOBase object. It inherits BufferedIOBase.
When writing to this object, data is normally held into an internal
buffer. The buffer will be written out to the underlying RawIOBase
object under various conditions, including:
when the buffer gets too small for all pending data;
when flush() is called;
when a seek() is requested (for BufferedRandom objects);
when the BufferedWriter object is closed or destroyed.
Write the bytes or bytearray object, b, and return the number of bytes
written. When in non-blocking mode, a BlockingIOError is raised
if the buffer needs to be written out but the raw stream blocks.
class io.BufferedRandom(raw, buffer_size=DEFAULT_BUFFER_SIZE)
A buffered interface to random access streams. It inherits
BufferedReader and BufferedWriter, and further supports
seek() and tell() functionality.
The constructor creates a reader and writer for a seekable raw stream, given
in the first argument. If the buffer_size is omitted it defaults to
DEFAULT_BUFFER_SIZE.
A third argument, max_buffer_size, is supported, but unused and deprecated.
class io.BufferedRWPair(reader, writer, buffer_size=DEFAULT_BUFFER_SIZE)
A buffered I/O object combining two unidirectional RawIOBase
objects – one readable, the other writeable – into a single bidirectional
endpoint. It inherits BufferedIOBase.
reader and writer are RawIOBase objects that are readable and
writeable respectively. If the buffer_size is omitted it defaults to
DEFAULT_BUFFER_SIZE.
A fourth argument, max_buffer_size, is supported, but unused and
deprecated.
BufferedRWPair does not attempt to synchronize accesses to
its underlying raw streams. You should not pass it the same object
as reader and writer; use BufferedRandom instead.
Base class for text streams. This class provides a character and line based
interface to stream I/O. There is no readinto() method because
Python’s character strings are immutable. It inherits IOBase.
There is no public constructor.
TextIOBase provides or overrides these data attributes and
methods in addition to those from IOBase:
A string, a tuple of strings, or None, indicating the newlines
translated so far. Depending on the implementation and the initial
constructor flags, this may not be available.
The underlying binary buffer (a BufferedIOBase instance) that
TextIOBase deals with. This is not part of the
TextIOBase API and may not exist on some implementations.
Separate the underlying binary buffer from the TextIOBase and
return it.
After the underlying buffer has been detached, the TextIOBase is
in an unusable state.
Some TextIOBase implementations, like StringIO, may not
have the concept of an underlying buffer and calling this method will
raise UnsupportedOperation.
class io.TextIOWrapper(buffer, encoding=None, errors=None, newline=None, line_buffering=False)
A buffered text stream over a BufferedIOBase binary stream, buffer.
encoding gives the name of the encoding that the stream will be decoded or
encoded with. It defaults to locale.getpreferredencoding().
errors is an optional string that specifies how encoding and decoding
errors are to be handled. Pass 'strict' to raise a ValueError
exception if there is an encoding error (the default of None has the same
effect), or pass 'ignore' to ignore errors. (Note that ignoring encoding
errors can lead to data loss.) 'replace' causes a replacement marker
(such as '?') to be inserted where there is malformed data. When
writing, 'xmlcharrefreplace' (replace with the appropriate XML character
reference) or 'backslashreplace' (replace with backslashed escape
sequences) can be used. Any other error handling name that has been
registered with codecs.register_error() is also valid.
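For instance, 'replace' substitutes U+FFFD for undecodable bytes (a sketch
with a deliberately invalid UTF-8 byte):
import io
raw = io.BytesIO(b'caf\xff')   # \xff is not valid UTF-8
text = io.TextIOWrapper(raw, encoding='utf-8', errors='replace')
print(text.read())             # 'caf\ufffd'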
newline can be None, '', '\n', '\r', or '\r\n'. It
controls the handling of line endings. If it is None, universal newlines
is enabled. With this enabled, on input, the lines endings '\n',
'\r', or '\r\n' are translated to '\n' before being returned to
the caller. Conversely, on output, '\n' is translated to the system
default line separator, os.linesep. If newline is any other of its
legal values, that newline becomes the newline when the file is read and it
is returned untranslated. On output, '\n' is converted to the newline.
If line_buffering is True, flush() is implied when a call to
write contains a newline character.
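For example, newline='\r\n' translates written '\n' characters on output
(a minimal sketch):
import io
buf = io.BytesIO()
text = io.TextIOWrapper(buf, encoding='ascii', newline='\r\n')
text.write('one\ntwo\n')
text.flush()
print(buf.getvalue())   # b'one\r\ntwo\r\n'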
class io.StringIO(initial_value='', newline=None)
An in-memory stream for text I/O.
The initial value of the buffer (an empty string by default) can be set by
providing initial_value. The newline argument works like that of
TextIOWrapper. The default is to do no newline translation.
StringIO provides this method in addition to those from
TextIOBase and its parents:
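import io

output = io.StringIO()
output.write('First line.\n')
print('Second line.', file=output)

# Retrieve the whole buffer contents --
# this will be 'First line.\nSecond line.\n'
contents = output.getvalue()

# Close object and discard memory buffer --
# .getvalue() will now raise an exception.
output.close()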
By reading and writing only large chunks of data even when the user asks for a
single byte, buffered I/O hides any inefficiency in calling and executing the
operating system’s unbuffered I/O routines. The gain depends on the OS and the
kind of I/O which is performed. For example, on some modern OSes such as Linux,
unbuffered disk I/O can be as fast as buffered I/O. The bottom line, however,
is that buffered I/O offers predictable performance regardless of the platform
and the backing device. Therefore, it is almost always preferable to use
buffered I/O rather than unbuffered I/O for binary data.
Text I/O over a binary storage (such as a file) is significantly slower than
binary I/O over the same storage, because it requires conversions between
unicode and binary data using a character codec. This can become noticeable
handling huge amounts of text data like large log files. Also,
TextIOWrapper.tell() and TextIOWrapper.seek() are both quite slow
due to the reconstruction algorithm used.
StringIO, however, is a native in-memory unicode container and will
exhibit similar speed to BytesIO.
Binary buffered objects (instances of BufferedReader,
BufferedWriter, BufferedRandom and BufferedRWPair)
are not reentrant. While reentrant calls will not happen in normal situations,
they can arise from doing I/O in a signal handler. If a thread tries to
re-enter a buffered object which it is already accessing, a RuntimeError is
raised. Note this doesn’t prohibit a different thread from entering the
buffered object.
The above implicitly extends to text files, since the open() function
will wrap a buffered object inside a TextIOWrapper. This includes
standard streams and therefore affects the built-in function print() as
well.
This module provides various time-related functions. For related
functionality, see also the datetime and calendar modules.
Although this module is always available,
not all functions are available on all platforms. Most of the functions
defined in this module call platform C library functions with the same name. It
may sometimes be helpful to consult the platform documentation, because the
semantics of these functions vary among platforms.
An explanation of some terminology and conventions is in order.
The epoch is the point where the time starts. On January 1st of that
year, at 0 hours, the “time since the epoch” is zero. For Unix, the epoch is
1970. To find out what the epoch is, look at gmtime(0).
The functions in this module may not handle dates and times before the epoch or
far in the future. The cut-off point in the future is determined by the C
library; for 32-bit systems, it is typically in 2038.
Year 2000 (Y2K) issues: Python depends on the platform’s C library, which
generally doesn’t have year 2000 issues, since all dates and times are
represented internally as seconds since the epoch. Function strptime()
can parse 2-digit years when given %y format code. When 2-digit years are
parsed, they are converted according to the POSIX and ISO C standards: values
69–99 are mapped to 1969–1999, and values 0–68 are mapped to 2000–2068.
For backward compatibility, years with less than 4 digits are treated
specially by asctime(), mktime(), and strftime() functions
that operate on a 9-tuple or struct_time values. If year (the first
value in the 9-tuple) is specified with less than 4 digits, its interpretation
depends on the value of accept2dyear variable.
If accept2dyear is true (default), a backward compatibility behavior is
invoked as follows:
for a 2-digit year, the century is guessed according to the POSIX rules for
the %y strptime format. A deprecation warning is issued when century
information is guessed in this way.
for a 3-digit or negative year, a ValueError exception is raised.
If accept2dyear is false (set by the program or as a result of a
non-empty value assigned to PYTHONY2K environment variable) all year
values are interpreted as given.
UTC is Coordinated Universal Time (formerly known as Greenwich Mean Time, or
GMT). The acronym UTC is not a mistake but a compromise between English and
French.
DST is Daylight Saving Time, an adjustment of the timezone by (usually) one
hour during part of the year. DST rules are magic (determined by local law) and
can change from year to year. The C library has a table containing the local
rules (often it is read from a system file for flexibility) and is the only
source of True Wisdom in this respect.
The precision of the various real-time functions may be less than suggested by
the units in which their value or argument is expressed. E.g. on most Unix
systems, the clock “ticks” only 50 or 100 times a second.
On the other hand, the precision of time() and sleep() is better
than their Unix equivalents: times are expressed as floating point numbers,
time() returns the most accurate time available (using Unix
gettimeofday() where available), and sleep() will accept a time
with a nonzero fraction (Unix select() is used to implement this, where
available).
Boolean value indicating whether two-digit year values will be
mapped to 1969–2068 range by asctime(), mktime(), and
strftime() functions. This is true by default, but will be
set to false if the environment variable PYTHONY2K has
been set to a non-empty string. It may also be modified at run
time.
Deprecated since version 3.2: Mapping of 2-digit year values by asctime(),
mktime(), and strftime() functions to 1969–2068
range is deprecated. Programs that need to process 2-digit
years should use %y code available in strptime()
function or convert 2-digit year values to 4-digit themselves.
The offset of the local DST timezone, in seconds west of UTC, if one is defined.
This is negative if the local DST timezone is east of UTC (as in Western Europe,
including the UK). Only use this if daylight is nonzero.
Convert a tuple or struct_time representing a time as returned by
gmtime() or localtime() to a string of the following
form: 'Sun Jun 20 23:21:05 1993'. If t is not provided, the current time
as returned by localtime() is used. Locale information is not used by
asctime().
Note
Unlike the C function of the same name, there is no trailing newline.
On Unix, return the current processor time as a floating point number expressed
in seconds. The precision, and in fact the very definition of the meaning of
“processor time”, depends on that of the C function of the same name, but in any
case, this is the function to use for benchmarking Python or timing algorithms.
On Windows, this function returns wall-clock seconds elapsed since the first
call to this function, as a floating point number, based on the Win32 function
QueryPerformanceCounter(). The resolution is typically better than one
microsecond.
Convert a time expressed in seconds since the epoch to a string representing
local time. If secs is not provided or None, the current time as
returned by time() is used. ctime(secs) is equivalent to
asctime(localtime(secs)). Locale information is not used by ctime().
Convert a time expressed in seconds since the epoch to a struct_time in
UTC in which the dst flag is always zero. If secs is not provided or
None, the current time as returned by time() is used. Fractions
of a second are ignored. See above for a description of the
struct_time object. See calendar.timegm() for the inverse of this
function.
Like gmtime() but converts to local time. If secs is not provided or
None, the current time as returned by time() is used. The dst
flag is set to 1 when DST applies to the given time.
This is the inverse function of localtime(). Its argument is the
struct_time or full 9-tuple (since the dst flag is needed; use -1
as the dst flag if it is unknown) which expresses the time in local time, not
UTC. It returns a floating point number, for compatibility with time().
If the input value cannot be represented as a valid time, either
OverflowError or ValueError will be raised (which depends on
whether the invalid value is caught by Python or the underlying C libraries).
The earliest date for which it can generate a time is platform-dependent.
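For example, localtime() and mktime() round-trip (a sketch; assumes the
clock and the timezone rules do not change between calls):
import time
now = time.localtime()
seconds = time.mktime(now)               # back to seconds since the epoch
print(time.localtime(seconds) == now)    # usually True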
Suspend execution for the given number of seconds. The argument may be a
floating point number to indicate a more precise sleep time. The actual
suspension time may be less than that requested because any caught signal will
terminate the sleep() following execution of that signal’s catching
routine. Also, the suspension time may be longer than requested by an arbitrary
amount because of the scheduling of other activity in the system.
Convert a tuple or struct_time representing a time as returned by
gmtime() or localtime() to a string as specified by the format
argument. If t is not provided, the current time as returned by
localtime() is used. format must be a string. ValueError is
raised if any field in t is outside of the allowed range.
0 is a legal argument for any position in the time tuple; if it is normally
illegal the value is forced to a correct one.
The following directives can be embedded in the format string. They are shown
without the optional field width and precision specification, and are replaced
by the indicated characters in the strftime() result:
Directive   Meaning                                            Notes

%a          Locale’s abbreviated weekday name.
%A          Locale’s full weekday name.
%b          Locale’s abbreviated month name.
%B          Locale’s full month name.
%c          Locale’s appropriate date and time representation.
%d          Day of the month as a decimal number [01,31].
%H          Hour (24-hour clock) as a decimal number [00,23].
%I          Hour (12-hour clock) as a decimal number [01,12].
%j          Day of the year as a decimal number [001,366].
%m          Month as a decimal number [01,12].
%M          Minute as a decimal number [00,59].
%p          Locale’s equivalent of either AM or PM.            (1)
%S          Second as a decimal number [00,61].                (2)
%U          Week number of the year (Sunday as the first day
            of the week) as a decimal number [00,53]. All days
            in a new year preceding the first Sunday are
            considered to be in week 0.                        (3)
%w          Weekday as a decimal number [0(Sunday),6].
%W          Week number of the year (Monday as the first day
            of the week) as a decimal number [00,53]. All days
            in a new year preceding the first Monday are
            considered to be in week 0.                        (3)
%x          Locale’s appropriate date representation.
%X          Locale’s appropriate time representation.
%y          Year without century as a decimal number [00,99].
%Y          Year with century as a decimal number.             (4)
%Z          Time zone name (no characters if no time zone
            exists).
%%          A literal '%' character.
Notes:
(1) When used with the strptime() function, the %p directive only affects
    the output hour field if the %I directive is used to parse the hour.
(2) The range really is 0 to 61; value 60 is valid in timestamps
    representing leap seconds and value 61 is supported for historical
    reasons.
(3) When used with the strptime() function, %U and %W are only used in
    calculations when the day of the week and the year are specified.
(4) Produces different results depending on the value of the
    time.accept2dyear variable. See Year 2000 (Y2K) issues for details.
Here is an example, a format for dates compatible with that specified in the
RFC 2822 Internet email standard. [1]
>>> from time import gmtime, strftime
>>> strftime("%a, %d %b %Y %H:%M:%S +0000", gmtime())
'Thu, 28 Jun 2001 14:17:15 +0000'
Additional directives may be supported on certain platforms, but only the ones
listed here have a meaning standardized by ANSI C.
On some platforms, an optional field width and precision specification can
immediately follow the initial '%' of a directive in the following order;
this is also not portable. The field width is normally 2 except for %j where
it is 3.
Parse a string representing a time according to a format. The return value
is a struct_time as returned by gmtime() or
localtime().
The format parameter uses the same directives as those used by
strftime(); it defaults to "%a %b %d %H:%M:%S %Y" which matches the
formatting returned by ctime(). If string cannot be parsed according
to format, or if it has excess data after parsing, ValueError is
raised. The default values used to fill in any missing data when more
accurate values cannot be inferred are (1900, 1, 1, 0, 0, 0, 0, 1, -1).
Both string and format must be strings.
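For example:
>>> import time
>>> time.strptime("30 Nov 00", "%d %b %y")
time.struct_time(tm_year=2000, tm_mon=11, tm_mday=30, tm_hour=0, tm_min=0,
                 tm_sec=0, tm_wday=3, tm_yday=335, tm_isdst=-1)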
Support for the %Z directive is based on the values contained in tzname
and whether daylight is true. Because of this, it is platform-specific
except for recognizing UTC and GMT which are always known (and are considered to
be non-daylight savings timezones).
Only the directives specified in the documentation are supported. Because
strftime() is implemented per platform it can sometimes offer more
directives than those listed. But strptime() is independent of any platform
and thus does not necessarily support all directives available that are not
documented as supported.
The type of the time value sequence returned by gmtime(),
localtime(), and strptime(). It is an object with a named
tuple interface: values can be accessed by index and by attribute name. The
following values are present:

Index   Attribute   Values
0       tm_year     (for example, 1993)
1       tm_mon      range [1, 12]
2       tm_mday     range [1, 31]
3       tm_hour     range [0, 23]
4       tm_min      range [0, 59]
5       tm_sec      range [0, 61]; see note (2) in the strftime() description
6       tm_wday     range [0, 6], Monday is 0
7       tm_yday     range [1, 366]
8       tm_isdst    0, 1 or -1; see below
Note that unlike the C structure, the month value is a range of [1, 12], not
[0, 11]. A year value will be handled as described under Year 2000
(Y2K) issues above. A -1 argument as the daylight
savings flag, passed to mktime(), will usually result in the correct
daylight savings state being filled in.
When a tuple with an incorrect length is passed to a function expecting a
struct_time, or having elements of the wrong type, a
TypeError is raised.
Return the time as a floating point number expressed in seconds since the epoch,
in UTC. Note that even though the time is always returned as a floating point
number, not all systems provide time with a better precision than 1 second.
While this function normally returns non-decreasing values, it can return a
lower value than a previous call if the system clock has been set back between
the two calls.
A tuple of two strings: the first is the name of the local non-DST timezone, the
second is the name of the local DST timezone. If no DST timezone is defined,
the second string should not be used.
Resets the time conversion rules used by the library routines. The environment
variable TZ specifies how this is done.
Availability: Unix.
Note
Although in many cases, changing the TZ environment variable may
affect the output of functions like localtime() without calling
tzset(), this behavior should not be relied on.
The TZ environment variable should contain no whitespace.
The standard format of the TZ environment variable is (whitespace
added for clarity):
std offset [dst [offset [,start[/time], end[/time]]]]
Where the components are:
std and dst
Three or more alphanumerics giving the timezone abbreviations. These will be
propagated into time.tzname
offset
The offset has the form: ±hh[:mm[:ss]]. This indicates the value
added to the local time to arrive at UTC. If preceded by a '-', the timezone
is east of the Prime Meridian; otherwise, it is west. If no offset follows
dst, summer time is assumed to be one hour ahead of standard time.
start[/time],end[/time]
Indicates when to change to and back from DST. The format of the
start and end dates are one of the following:
Jn
The Julian day n (1 <= n <= 365). Leap days are not counted, so in
all years February 28 is day 59 and March 1 is day 60.
n
The zero-based Julian day (0 <= n <= 365). Leap days are counted, and
it is possible to refer to February 29.
Mm.n.d
The d'th day (0 <= d <= 6) of week n of month m of the year (1
<= n <= 5, 1 <= m <= 12, where week 5 means "the last d day in
month m" which may occur in either the fourth or the fifth
week). Week 1 is the first week in which the d'th day occurs. Day
zero is Sunday.
time has the same format as offset except that no leading sign
('-' or '+') is allowed. The default, if time is not given, is 02:00:00.
On many Unix systems (including *BSD, Linux, Solaris, and Darwin), it is more
convenient to use the system’s zoneinfo (tzfile(5)) database to
specify the timezone rules. To do this, set the TZ environment
variable to the path of the required timezone datafile, relative to the root of
the system's zoneinfo timezone database, usually located at
/usr/share/zoneinfo. For example, 'US/Eastern',
'Australia/Melbourne', 'Egypt' or 'Europe/Amsterdam'.
The use of %Z is now deprecated, but the %z escape that expands to the
preferred hour/minute offset is not supported by all ANSI C libraries. Also, a
strict reading of the original 1982 RFC 822 standard calls for a two-digit
year (%y rather than %Y), but practice moved to 4-digit years long before the
year 2000. After that, RFC 822 became obsolete and the 4-digit year was
first recommended by RFC 1123 and then mandated by RFC 2822.
argparse — Parser for command-line options, arguments and sub-commands
The argparse module makes it easy to write user-friendly command-line
interfaces. The program defines what arguments it requires, and argparse
will figure out how to parse those out of sys.argv. The argparse
module also automatically generates help and usage messages and issues errors
when users give the program invalid arguments.
The following code is a Python program that takes a list of integers and
produces either the sum or the max:
import argparse
parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument('integers', metavar='N', type=int, nargs='+',
help='an integer for the accumulator')
parser.add_argument('--sum', dest='accumulate', action='store_const',
const=sum, default=max,
help='sum the integers (default: find the max)')
args = parser.parse_args()
print(args.accumulate(args.integers))
Assuming the Python code above is saved into a file called prog.py, it can
be run at the command line and provides useful help messages:
$ prog.py -h
usage: prog.py [-h] [--sum] N [N ...]
Process some integers.
positional arguments:
N an integer for the accumulator
optional arguments:
-h, --help show this help message and exit
--sum sum the integers (default: find the max)
When run with the appropriate arguments, it prints either the sum or the max of
the command-line integers:
$ prog.py 1 2 3 4
4
$ prog.py 1 2 3 4 --sum
10
If invalid arguments are passed in, it will issue an error:
$ prog.py a b c
usage: prog.py [-h] [--sum] N [N ...]
prog.py: error: argument N: invalid int value: 'a'
The following sections walk you through this example.
Filling an ArgumentParser with information about program arguments is
done by making calls to the add_argument() method.
Generally, these calls tell the ArgumentParser how to take the strings
on the command line and turn them into objects. This information is stored and
used when parse_args() is called. For example:
>>> parser.add_argument('integers', metavar='N', type=int, nargs='+',
... help='an integer for the accumulator')
>>> parser.add_argument('--sum', dest='accumulate', action='store_const',
... const=sum, default=max,
... help='sum the integers (default: find the max)')
Later, calling parse_args() will return an object with
two attributes, integers and accumulate. The integers attribute
will be a list of one or more ints, and the accumulate attribute will be
either the sum() function, if --sum was specified at the command line,
or the max() function if it was not.
ArgumentParser parses arguments through the
parse_args() method. This will inspect the command line,
convert each arg to the appropriate type and then invoke the appropriate action.
In most cases, this means a simple Namespace object will be built up from
attributes parsed out of the command line:
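>>> parser.parse_args(['--sum', '7', '-1', '42'])
Namespace(accumulate=<built-in function sum>, integers=[7, -1, 42])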
In a script, parse_args() will typically be called with no
arguments, and the ArgumentParser will automatically determine the
command-line arguments from sys.argv.
Most calls to the ArgumentParser constructor will use the
description= keyword argument. This argument gives a brief description of
what the program does and how it works. In help messages, the description is
displayed between the command-line usage string and the help messages for the
various arguments:
>>> parser = argparse.ArgumentParser(description='A foo that bars')
>>> parser.print_help()
usage: argparse.py [-h]
A foo that bars
optional arguments:
-h, --help show this help message and exit
By default, the description will be line-wrapped so that it fits within the
given space. To change this behavior, see the formatter_class argument.
Some programs like to display additional description of the program after the
description of the arguments. Such text can be specified using the epilog=
argument to ArgumentParser:
>>> parser = argparse.ArgumentParser(
... description='A foo that bars',
... epilog="And that's how you'd foo a bar")
>>> parser.print_help()
usage: argparse.py [-h]
A foo that bars
optional arguments:
-h, --help show this help message and exit
And that's how you'd foo a bar
As with the description argument, the epilog= text is by default
line-wrapped, but this behavior can be adjusted with the formatter_class
argument to ArgumentParser.
By default, ArgumentParser objects add an option which simply displays
the parser’s help message. For example, consider a file named
myprogram.py containing the following code:
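import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--foo', help='foo help')
args = parser.parse_args()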
If -h or --help is supplied at the command line, the ArgumentParser
help will be printed:
$ python myprogram.py --help
usage: myprogram.py [-h] [--foo FOO]
optional arguments:
-h, --help show this help message and exit
--foo FOO foo help
Occasionally, it may be useful to disable the addition of this help option.
This can be achieved by passing False as the add_help= argument to
ArgumentParser:
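>>> parser = argparse.ArgumentParser(prog='PROG', add_help=False)
>>> parser.add_argument('--foo', help='foo help')
>>> parser.print_help()
usage: PROG [--foo FOO]

optional arguments:
  --foo FOO  foo help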
The help option is typically -h/--help. The exception to this is
if the prefix_chars= is specified and does not include '-', in
which case -h and --help are not valid options. In
this case, the first character in prefix_chars is used to prefix
the help options:
>>> parser = argparse.ArgumentParser(prog='PROG', prefix_chars='+/')
>>> parser.print_help()
usage: PROG [+h]
optional arguments:
+h, ++help show this help message and exit
Most command-line options will use '-' as the prefix, e.g. -f/--foo.
Parsers that need to support different or additional prefix
characters, e.g. for options
like +f or /foo, may specify them using the prefix_chars= argument
to the ArgumentParser constructor:
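>>> parser = argparse.ArgumentParser(prog='PROG', prefix_chars='-+')
>>> parser.add_argument('+f')
>>> parser.add_argument('++bar')
>>> parser.parse_args('+f X ++bar Y'.split())
Namespace(bar='Y', f='X')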
Sometimes, for example when dealing with particularly long argument lists, it
may make sense to keep the list of arguments in a file rather than typing it out
at the command line. If the fromfile_prefix_chars= argument is given to the
ArgumentParser constructor, then arguments that start with any of the
specified characters will be treated as files, and will be replaced by the
arguments they contain. For example:
>>> with open('args.txt', 'w') as fp:
... fp.write('-f\nbar')
>>> parser = argparse.ArgumentParser(fromfile_prefix_chars='@')
>>> parser.add_argument('-f')
>>> parser.parse_args(['-f', 'foo', '@args.txt'])
Namespace(f='bar')
Arguments read from a file must by default be one per line (but see also
convert_arg_line_to_args()) and are treated as if they
were in the same place as the original file referencing argument on the command
line. So in the example above, the expression ['-f','foo','@args.txt']
is considered equivalent to the expression ['-f','foo','-f','bar'].
The fromfile_prefix_chars= argument defaults to None, meaning that
arguments will never be treated as file references.
Generally, argument defaults are specified either by passing a default to
add_argument() or by calling the
set_defaults() methods with a specific set of name-value
pairs. Sometimes however, it may be useful to specify a single parser-wide
default for arguments. This can be accomplished by passing the
argument_default= keyword argument to ArgumentParser. For example,
to globally suppress attribute creation on parse_args()
calls, we supply argument_default=SUPPRESS:
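>>> parser = argparse.ArgumentParser(argument_default=argparse.SUPPRESS)
>>> parser.add_argument('--foo')
>>> parser.add_argument('bar', nargs='?')
>>> parser.parse_args(['--foo', '1', 'BAR'])
Namespace(bar='BAR', foo='1')
>>> parser.parse_args([])
Namespace()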
Sometimes, several parsers share a common set of arguments. Rather than
repeating the definitions of these arguments, a single parser with all the
shared arguments, passed via the parents= argument to ArgumentParser,
can be used. The parents= argument takes a list of ArgumentParser
objects, collects all the positional and optional actions from them, and adds
these actions to the ArgumentParser object being constructed:
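>>> parent_parser = argparse.ArgumentParser(add_help=False)
>>> parent_parser.add_argument('--parent', type=int)
>>> foo_parser = argparse.ArgumentParser(parents=[parent_parser])
>>> foo_parser.add_argument('foo')
>>> foo_parser.parse_args(['--parent', '2', 'XXX'])
Namespace(foo='XXX', parent=2)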
Note that most parent parsers will specify add_help=False. Otherwise, the
ArgumentParser will see two -h/--help options (one in the parent
and one in the child) and raise an error.
Note
You must fully initialize the parsers before passing them via parents=.
If you change the parent parsers after the child parser, those changes will
not be reflected in the child.
ArgumentParser objects allow the help formatting to be customized by
specifying an alternate formatting class. Currently, there are three such
classes: RawDescriptionHelpFormatter, RawTextHelpFormatter, and
ArgumentDefaultsHelpFormatter.
The first two allow more control over how textual descriptions are displayed,
while the last automatically adds information about argument default values.
>>> parser = argparse.ArgumentParser(
... prog='PROG',
... description='''this description
... was indented weird
... but that is okay''',
... epilog='''
... likewise for this epilog whose whitespace will
... be cleaned up and whose words will be wrapped
... across a couple lines''')
>>> parser.print_help()
usage: PROG [-h]
this description was indented weird but that is okay
optional arguments:
-h, --help show this help message and exit
likewise for this epilog whose whitespace will be cleaned up and whose words
will be wrapped across a couple lines
>>> parser = argparse.ArgumentParser(
... prog='PROG',
... formatter_class=argparse.RawDescriptionHelpFormatter,
... description=textwrap.dedent('''\
... Please do not mess up this text!
... --------------------------------
... I have indented it
... exactly the way
... I want it
... '''))
>>> parser.print_help()
usage: PROG [-h]
Please do not mess up this text!
--------------------------------
I have indented it
exactly the way
I want it
optional arguments:
-h, --help show this help message and exit
RawTextHelpFormatter maintains whitespace for all sorts of help text
including argument descriptions.
The other formatter class available, ArgumentDefaultsHelpFormatter,
will add information about the default value of each of the arguments:
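>>> parser = argparse.ArgumentParser(
...     prog='PROG',
...     formatter_class=argparse.ArgumentDefaultsHelpFormatter)
>>> parser.add_argument('--foo', type=int, default=42, help='FOO!')
>>> parser.add_argument('bar', nargs='*', default=[1, 2, 3], help='BAR!')
>>> parser.print_help()
usage: PROG [-h] [--foo FOO] [bar [bar ...]]

positional arguments:
  bar         BAR! (default: [1, 2, 3])

optional arguments:
  -h, --help  show this help message and exit
  --foo FOO   FOO! (default: 42)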
ArgumentParser objects do not allow two actions with the same option
string. By default, ArgumentParser objects raise an exception if an
attempt is made to create an argument with an option string that is already in
use:
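>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('-f', '--foo', help='old foo help')
>>> parser.add_argument('--foo', help='new foo help')
Traceback (most recent call last):
 ..
ArgumentError: argument --foo: conflicting option string(s): --foo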
Sometimes (e.g. when using parents) it may be useful to simply override any
older arguments with the same option string. To get this behavior, the value
'resolve' can be supplied to the conflict_handler= argument of
ArgumentParser:
>>> parser = argparse.ArgumentParser(prog='PROG', conflict_handler='resolve')
>>> parser.add_argument('-f', '--foo', help='old foo help')
>>> parser.add_argument('--foo', help='new foo help')
>>> parser.print_help()
usage: PROG [-h] [-f FOO] [--foo FOO]
optional arguments:
-h, --help show this help message and exit
-f FOO old foo help
--foo FOO new foo help
Note that ArgumentParser objects only remove an action if all of its
option strings are overridden. So, in the example above, the old -f/--foo
action is retained as the -f action, because only the --foo option
string was overridden.
By default, ArgumentParser objects use sys.argv[0] to determine
how to display the name of the program in help messages. This default is almost
always desirable because it will make the help messages match how the program was
invoked on the command line. For example, consider a file named
myprogram.py with the following code:
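import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--foo', help='foo help')
args = parser.parse_args()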
The help for this program will display myprogram.py as the program name
(regardless of where the program was invoked from):
$ python myprogram.py --help
usage: myprogram.py [-h] [--foo FOO]
optional arguments:
-h, --help show this help message and exit
--foo FOO foo help
$ cd ..
$ python subdir\myprogram.py --help
usage: myprogram.py [-h] [--foo FOO]
optional arguments:
-h, --help show this help message and exit
--foo FOO foo help
To change this default behavior, another value can be supplied using the
prog= argument to ArgumentParser:
>>> parser = argparse.ArgumentParser(prog='myprogram')
>>> parser.print_help()
usage: myprogram [-h]
optional arguments:
-h, --help show this help message and exit
Note that the program name, whether determined from sys.argv[0] or from the
prog= argument, is available to help messages using the %(prog)s format
specifier.
>>> parser = argparse.ArgumentParser(prog='myprogram')
>>> parser.add_argument('--foo', help='foo of the %(prog)s program')
>>> parser.print_help()
usage: myprogram [-h] [--foo FOO]
optional arguments:
-h, --help show this help message and exit
--foo FOO foo of the myprogram program
By default, ArgumentParser calculates the usage message from the
arguments it contains:
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('--foo', nargs='?', help='foo help')
>>> parser.add_argument('bar', nargs='+', help='bar help')
>>> parser.print_help()
usage: PROG [-h] [--foo [FOO]] bar [bar ...]
positional arguments:
bar bar help
optional arguments:
-h, --help show this help message and exit
--foo [FOO] foo help
The default message can be overridden with the usage= keyword argument:
>>> parser = argparse.ArgumentParser(prog='PROG', usage='%(prog)s [options]')
>>> parser.add_argument('--foo', nargs='?', help='foo help')
>>> parser.add_argument('bar', nargs='+', help='bar help')
>>> parser.print_help()
usage: PROG [options]
positional arguments:
bar bar help
optional arguments:
-h, --help show this help message and exit
--foo [FOO] foo help
The %(prog)s format specifier is available to fill in the program name in
your usage messages.
The add_argument() method must know whether an optional
argument, like -f or --foo, or a positional argument, like a list of
filenames, is expected. The first arguments passed to
add_argument() must therefore be either a series of
flags, or a simple argument name. For example, an optional argument could
be created like:
>>> parser.add_argument('-f', '--foo')
while a positional argument could be created like:
>>> parser.add_argument('bar')
When parse_args() is called, optional arguments will be
identified by the - prefix, and the remaining arguments will be assumed to
be positional:
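>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('-f')
>>> parser.add_argument('bar')
>>> parser.parse_args(['BAR'])
Namespace(bar='BAR', f=None)
>>> parser.parse_args(['BAR', '-f', 'FOO'])
Namespace(bar='BAR', f='FOO')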
ArgumentParser objects associate command-line arguments with actions. These
actions can do just about anything with the command-line arguments associated with
them, though most actions simply add an attribute to the object returned by
parse_args(). The action keyword argument specifies
how the command-line arguments should be handled. The supported actions are:
'store' - This just stores the argument’s value. This is the default
action. For example:
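>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo')
>>> parser.parse_args('--foo 1'.split())
Namespace(foo='1')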
'store_const' - This stores the value specified by the const keyword
argument. (Note that the const keyword argument defaults to the rather
unhelpful None.) The 'store_const' action is most commonly used with
optional arguments that specify some sort of flag. For example:
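>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', action='store_const', const=42)
>>> parser.parse_args('--foo'.split())
Namespace(foo=42)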
'append' - This stores a list, and appends each argument value to the
list. This is useful to allow an option to be specified multiple times.
Example usage:
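>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', action='append')
>>> parser.parse_args('--foo 1 --foo 2'.split())
Namespace(foo=['1', '2'])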
'append_const' - This stores a list, and appends the value specified by
the const keyword argument to the list. (Note that the const keyword
argument defaults to None.) The 'append_const' action is typically
useful when multiple arguments need to store constants to the same list. For
example:
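>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--str', dest='types', action='append_const', const=str)
>>> parser.add_argument('--int', dest='types', action='append_const', const=int)
>>> parser.parse_args('--str --int'.split())
Namespace(types=[<class 'str'>, <class 'int'>])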
You can also specify an arbitrary action by passing an object that implements
the Action API. The easiest way to do this is to extend
argparse.Action, supplying an appropriate __call__ method. The
__call__ method should accept four parameters:
parser - The ArgumentParser object which contains this action.
namespace - The Namespace object that will be returned by
parse_args(). Most actions add an attribute to this
object.
values - The associated command-line arguments, with any type conversions
applied. (Type conversions are specified with the type keyword argument to
add_argument().)
option_string - The option string that was used to invoke this action.
The option_string argument is optional, and will be absent if the action
is associated with a positional argument.
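For example (a sketch of a custom action that reports each invocation
before storing the value):
>>> class FooAction(argparse.Action):
...     def __call__(self, parser, namespace, values, option_string=None):
...         print('%r %r %r' % (namespace, values, option_string))
...         setattr(namespace, self.dest, values)
...
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', action=FooAction)
>>> parser.add_argument('bar', action=FooAction)
>>> parser.parse_args('1 --foo 2'.split())
Namespace(bar=None, foo=None) '1' None
Namespace(bar='1', foo=None) '2' '--foo'
Namespace(bar='1', foo='2')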
ArgumentParser objects usually associate a single command-line argument with a
single action to be taken. The nargs keyword argument associates a
different number of command-line arguments with a single action. The supported
values are:
N (an integer). N arguments from the command line will be gathered together into a
list. For example:
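>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', nargs=2)
>>> parser.add_argument('bar', nargs=1)
>>> parser.parse_args('c --foo a b'.split())
Namespace(bar=['c'], foo=['a', 'b'])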
Note that nargs=1 produces a list of one item. This is different from
the default, in which the item is produced by itself.
'?'. One arg will be consumed from the command line if possible, and
produced as a single item. If no command-line arg is present, the value from
default will be produced. Note that for optional arguments, there is an
additional case - the option string is present but not followed by a
command-line arg. In this case the value from const will be produced. Some
examples to illustrate this:
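>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', nargs='?', const='c', default='d')
>>> parser.add_argument('bar', nargs='?', default='d')
>>> parser.parse_args('XX --foo YY'.split())
Namespace(bar='XX', foo='YY')
>>> parser.parse_args('XX --foo'.split())
Namespace(bar='XX', foo='c')
>>> parser.parse_args(''.split())
Namespace(bar='d', foo='d')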
'*'. All command-line arguments present are gathered into a list. Note that
it generally doesn’t make much sense to have more than one positional argument
with nargs='*', but multiple optional arguments with nargs='*' is
possible. For example:
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', nargs='*')
>>> parser.add_argument('--bar', nargs='*')
>>> parser.add_argument('baz', nargs='*')
>>> parser.parse_args('a b --foo x y --bar 1 2'.split())
Namespace(bar=['1', '2'], baz=['a', 'b'], foo=['x', 'y'])
'+'. Just like '*', all command-line args present are gathered into a
list. Additionally, an error message will be generated if there wasn’t at
least one command-line arg present. For example:
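>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('foo', nargs='+')
>>> parser.parse_args('a b'.split())
Namespace(foo=['a', 'b'])
>>> parser.parse_args(''.split())
usage: PROG [-h] foo [foo ...]
PROG: error: too few arguments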
If the nargs keyword argument is not provided, the number of arguments consumed
is determined by the action. Generally this means a single command-line arg
will be consumed and a single item (not a list) will be produced.
The const argument of add_argument() is used to hold
constant values that are not read from the command line but are required for
the various ArgumentParser actions. The two most common uses of it are:
When add_argument() is called with
action='store_const' or action='append_const'. These actions add the
const value to one of the attributes of the object returned by parse_args(). See the action description for examples.
When add_argument() is called with option strings
(like -f or --foo) and nargs='?'. This creates an optional
argument that can be followed by zero or one command-line arguments.
When parsing the command line, if the option string is encountered with no
command-line arg following it, the value of const will be assumed instead.
See the nargs description for examples.
All optional arguments and some positional arguments may be omitted at the
command line. The default keyword argument of
add_argument(), whose value defaults to None,
specifies what value should be used if the command-line arg is not present.
For optional arguments, the default value is used when the option string
was not present at the command line:
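>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', default=42)
>>> parser.parse_args('--foo 2'.split())
Namespace(foo='2')
>>> parser.parse_args(''.split())
Namespace(foo=42)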
By default, ArgumentParser objects read command-line arguments in as simple
strings. However, quite often the command-line string should instead be
interpreted as another type, like a float or int. The
type keyword argument of add_argument() allows any
necessary type-checking and type conversions to be performed. Common built-in
types and functions can be used directly as the value of the type argument:
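>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('foo', type=int)
>>> parser.add_argument('bar', type=float)
>>> parser.parse_args('2 3.5'.split())
Namespace(bar=3.5, foo=2)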
To ease the use of various types of files, the argparse module provides the
factory FileType which takes the mode= and bufsize= arguments of the
open() function. For example, FileType('w') can be used to create a
writable file:
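For example (the exact repr of the opened file object varies by platform
and default encoding):
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('out', type=argparse.FileType('w'))
>>> parser.parse_args(['out.txt'])
Namespace(out=<_io.TextIOWrapper name='out.txt' encoding='UTF-8'>)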
Some command-line arguments should be selected from a restricted set of values.
These can be handled by passing a container object as the choices keyword
argument to add_argument(). When the command line is
parsed, arg values will be checked, and an error message will be displayed if
the arg was not one of the acceptable values:
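>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('foo', choices='abc')
>>> parser.parse_args('c'.split())
Namespace(foo='c')
>>> parser.parse_args('X'.split())
usage: PROG [-h] {a,b,c}
PROG: error: argument foo: invalid choice: 'X' (choose from 'a', 'b', 'c')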
Note that inclusion in the choices container is checked after any type
conversions have been performed, so the type of the objects in the choices
container should match the type specified:
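>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('foo', type=complex, choices=[1, 1j])
>>> parser.parse_args('1j'.split())
Namespace(foo=1j)
>>> parser.parse_args('-- -4'.split())
usage: PROG [-h] {1,1j}
PROG: error: argument foo: invalid choice: (-4+0j) (choose from 1, 1j)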
In general, the argparse module assumes that flags like -f and --bar
indicate optional arguments, which can always be omitted at the command line.
To make an option required, True can be specified for the required=
keyword argument to add_argument():
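>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', required=True)
>>> parser.parse_args(['--foo', 'BAR'])
Namespace(foo='BAR')
>>> parser.parse_args([])
usage: argparse.py [-h] [--foo FOO]
argparse.py: error: option --foo is required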
The help value is a string containing a brief description of the argument.
When a user requests help (usually by using -h or --help at the
command line), these help descriptions will be displayed with each
argument:
>>> parser = argparse.ArgumentParser(prog='frobble')
>>> parser.add_argument('--foo', action='store_true',
... help='foo the bars before frobbling')
>>> parser.add_argument('bar', nargs='+',
... help='one of the bars to be frobbled')
>>> parser.parse_args('-h'.split())
usage: frobble [-h] [--foo] bar [bar ...]
positional arguments:
bar one of the bars to be frobbled
optional arguments:
-h, --help show this help message and exit
--foo foo the bars before frobbling
The help strings can include various format specifiers to avoid repetition
of things like the program name or the argument default. The available
specifiers include the program name, %(prog)s and most keyword arguments to
add_argument(), e.g. %(default)s, %(type)s, etc.:
>>> parser = argparse.ArgumentParser(prog='frobble')
>>> parser.add_argument('bar', nargs='?', type=int, default=42,
... help='the bar to %(prog)s (default: %(default)s)')
>>> parser.print_help()
usage: frobble [-h] [bar]
positional arguments:
bar the bar to frobble (default: 42)
optional arguments:
-h, --help show this help message and exit
When ArgumentParser generates help messages, it needs some way to refer
to each expected argument. By default, ArgumentParser objects use the dest
value as the “name” of each object. By default, for positional argument
actions, the dest value is used directly, and for optional argument actions,
the dest value is uppercased. So, a single positional argument with
dest='bar' will be referred to as bar. A single
optional argument --foo that should be followed by a single command-line arg
will be referred to as FOO. An example:
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo')
>>> parser.add_argument('bar')
>>> parser.parse_args('X --foo Y'.split())
Namespace(bar='X', foo='Y')
>>> parser.print_help()
usage: [-h] [--foo FOO] bar
positional arguments:
bar
optional arguments:
-h, --help show this help message and exit
--foo FOO
An alternative name can be specified with metavar:
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', metavar='YYY')
>>> parser.add_argument('bar', metavar='XXX')
>>> parser.parse_args('X --foo Y'.split())
Namespace(bar='X', foo='Y')
>>> parser.print_help()
usage: [-h] [--foo YYY] XXX
positional arguments:
XXX
optional arguments:
-h, --help show this help message and exit
--foo YYY
Note that metavar only changes the displayed name - the name of the
attribute on the parse_args() object is still determined
by the dest value.
Different values of nargs may cause the metavar to be used multiple times.
Providing a tuple to metavar specifies a different display for each of the
arguments:
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('-x', nargs=2)
>>> parser.add_argument('--foo', nargs=2, metavar=('bar', 'baz'))
>>> parser.print_help()
usage: PROG [-h] [-x X X] [--foo bar baz]
optional arguments:
-h, --help show this help message and exit
-x X X
--foo bar baz
Most ArgumentParser actions add some value as an attribute of the
object returned by parse_args(). The name of this
attribute is determined by the dest keyword argument of
add_argument(). For positional argument actions,
dest is normally supplied as the first argument to
add_argument():
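For example (a minimal sketch):
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('bar')
>>> parser.parse_args(['XXX'])
Namespace(bar='XXX')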
For optional argument actions, the value of dest is normally inferred from
the option strings. ArgumentParser generates the value of dest by
taking the first long option string and stripping away the initial '--'
string. If no long option strings were supplied, dest will be derived from
the first short option string by stripping the initial '-' character. Any
internal '-' characters will be converted to '_' characters to make sure
the string is a valid attribute name. The examples below illustrate this
behavior:
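A sketch of the inference rules (the option names are illustrative):
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('-f', '--foo-bar', '--foo')
>>> parser.add_argument('-x', '-y')
>>> parser.parse_args('-f 1 -x 2'.split())
Namespace(foo_bar='1', x='2')
>>> parser.parse_args('--foo 1 -y 2'.split())
Namespace(foo_bar='1', x='2')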
Convert argument strings to objects and assign them as attributes of the
namespace. Return the populated namespace.
Previous calls to add_argument() determine exactly what objects are
created and how they are assigned. See the documentation for
add_argument() for details.
By default, the arg strings are taken from sys.argv, and a new empty
Namespace object is created for the attributes.
The parse_args() method supports several ways of
specifying the value of an option (if it takes one). In the simplest case, the
option and its value are passed as two separate arguments:
For long options (options with names longer than a single character), the option
and value can also be passed as a single command-line argument, using = to
separate them:
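Both styles in one sketch (short options may additionally be joined directly
with their value):
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('-x')
>>> parser.add_argument('--foo')
>>> parser.parse_args('-x X'.split())
Namespace(foo=None, x='X')
>>> parser.parse_args('--foo=FOO'.split())
Namespace(foo='FOO', x=None)
>>> parser.parse_args('-xX'.split())
Namespace(foo=None, x='X')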
While parsing the command line, parse_args() checks for a
variety of errors, including ambiguous options, invalid types, invalid options,
wrong number of positional arguments, etc. When it encounters such an error,
it exits and prints the error along with a usage message:
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('--foo', type=int)
>>> parser.add_argument('bar', nargs='?')
>>> # invalid type
>>> parser.parse_args(['--foo', 'spam'])
usage: PROG [-h] [--foo FOO] [bar]
PROG: error: argument --foo: invalid int value: 'spam'
>>> # invalid option
>>> parser.parse_args(['--bar'])
usage: PROG [-h] [--foo FOO] [bar]
PROG: error: no such option: --bar
>>> # wrong number of arguments
>>> parser.parse_args(['spam', 'badger'])
usage: PROG [-h] [--foo FOO] [bar]
PROG: error: extra arguments found: badger
The parse_args() method attempts to give errors whenever
the user has clearly made a mistake, but some situations are inherently
ambiguous. For example, the command-line arg '-1' could either be an
attempt to specify an option or an attempt to provide a positional argument.
The parse_args() method is cautious here: positional
arguments may only begin with '-' if they look like negative numbers and
there are no options in the parser that look like negative numbers:
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('-x')
>>> parser.add_argument('foo', nargs='?')
>>> # no negative number options, so -1 is a positional argument
>>> parser.parse_args(['-x', '-1'])
Namespace(foo=None, x='-1')
>>> # no negative number options, so -1 and -5 are positional arguments
>>> parser.parse_args(['-x', '-1', '-5'])
Namespace(foo='-5', x='-1')
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('-1', dest='one')
>>> parser.add_argument('foo', nargs='?')
>>> # negative number options present, so -1 is an option
>>> parser.parse_args(['-1', 'X'])
Namespace(foo=None, one='X')
>>> # negative number options present, so -2 is an option
>>> parser.parse_args(['-2'])
usage: PROG [-h] [-1 ONE] [foo]
PROG: error: no such option: -2
>>> # negative number options present, so both -1s are options
>>> parser.parse_args(['-1', '-1'])
usage: PROG [-h] [-1 ONE] [foo]
PROG: error: argument -1: expected one argument
If you have positional arguments that must begin with '-' and don’t look
like negative numbers, you can insert the pseudo-argument '--' which tells
parse_args() that everything after that is a positional
argument:
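For example (a sketch):
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('-f')
>>> parser.add_argument('bar')
>>> parser.parse_args(['--', '-f'])
Namespace(bar='-f', f=None)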
Sometimes it may be useful to have an ArgumentParser parse arguments other than those
of sys.argv. This can be accomplished by passing a list of strings to
parse_args(). This is useful for testing at the
interactive prompt:
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument(
... 'integers', metavar='int', type=int, choices=range(10),
... nargs='+', help='an integer in the range 0..9')
>>> parser.add_argument(
... '--sum', dest='accumulate', action='store_const', const=sum,
... default=max, help='sum the integers (default: find the max)')
>>> parser.parse_args(['1', '2', '3', '4'])
Namespace(accumulate=<built-in function max>, integers=[1, 2, 3, 4])
>>> parser.parse_args('1 2 3 4 --sum'.split())
Namespace(accumulate=<built-in function sum>, integers=[1, 2, 3, 4])
Simple class used by default by parse_args() to create
an object holding attributes and return it.
This class is deliberately simple, just an object subclass with a
readable string representation. If you prefer to have dict-like view of the
attributes, you can use the standard Python idiom, vars():
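A short sketch:
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo')
>>> args = parser.parse_args(['--foo', 'BAR'])
>>> vars(args)
{'foo': 'BAR'}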
It may also be useful to have an ArgumentParser assign attributes to an
already existing object, rather than a new Namespace object. This can
be achieved by specifying the namespace= keyword argument:
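For instance, attributes can be assigned to an instance of an ordinary class
(a sketch):
>>> class C:
...     pass
...
>>> c = C()
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo')
>>> ns = parser.parse_args(args=['--foo', 'BAR'], namespace=c)
>>> c.foo
'BAR'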
Many programs split up their functionality into a number of sub-commands,
for example, the svn program can invoke sub-commands like svn checkout,
svn update, and svn commit. Splitting up functionality
this way can be a particularly good idea when a program performs several
different functions which require different kinds of command-line arguments.
ArgumentParser supports the creation of such sub-commands with the
add_subparsers() method. The add_subparsers() method is normally
called with no arguments and returns a special action object. This object
has a single method, add_parser(), which takes a
command name and any ArgumentParser constructor arguments, and
returns an ArgumentParser object that can be modified as usual.
Some example usage:
>>> # create the top-level parser
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('--foo', action='store_true', help='foo help')
>>> subparsers = parser.add_subparsers(help='sub-command help')
>>>
>>> # create the parser for the "a" command
>>> parser_a = subparsers.add_parser('a', help='a help')
>>> parser_a.add_argument('bar', type=int, help='bar help')
>>>
>>> # create the parser for the "b" command
>>> parser_b = subparsers.add_parser('b', help='b help')
>>> parser_b.add_argument('--baz', choices='XYZ', help='baz help')
>>>
>>> # parse some arg lists
>>> parser.parse_args(['a', '12'])
Namespace(bar=12, foo=False)
>>> parser.parse_args(['--foo', 'b', '--baz', 'Z'])
Namespace(baz='Z', foo=True)
Note that the object returned by parse_args() will only contain
attributes for the main parser and the subparser that was selected by the
command line (and not any other subparsers). So in the example above, when
the "a" command is specified, only the foo and bar attributes are
present, and when the "b" command is specified, only the foo and
baz attributes are present.
Similarly, when a help message is requested from a subparser, only the help
for that particular parser will be printed. The help message will not
include parent parser or sibling parser messages. (A help message for each
subparser command, however, can be given by supplying the help= argument
to add_parser() as above.)
>>> parser.parse_args(['--help'])
usage: PROG [-h] [--foo] {a,b} ...
positional arguments:
{a,b} sub-command help
a a help
b b help
optional arguments:
-h, --help show this help message and exit
--foo foo help
>>> parser.parse_args(['a', '--help'])
usage: PROG a [-h] bar
positional arguments:
bar bar help
optional arguments:
-h, --help show this help message and exit
>>> parser.parse_args(['b', '--help'])
usage: PROG b [-h] [--baz {X,Y,Z}]
optional arguments:
-h, --help show this help message and exit
--baz {X,Y,Z} baz help
The add_subparsers() method also supports title and description
keyword arguments. When either is present, the subparser’s commands will
appear in their own group in the help output. For example:
>>> parser = argparse.ArgumentParser()
>>> subparsers = parser.add_subparsers(title='subcommands',
... description='valid subcommands',
... help='additional help')
>>> subparsers.add_parser('foo')
>>> subparsers.add_parser('bar')
>>> parser.parse_args(['-h'])
usage: [-h] {foo,bar} ...
optional arguments:
-h, --help show this help message and exit
subcommands:
valid subcommands
{foo,bar} additional help
Furthermore, add_parser supports an additional aliases argument,
which allows multiple strings to refer to the same subparser. This example,
like svn, aliases co as a shorthand for checkout:
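A sketch:
>>> parser = argparse.ArgumentParser()
>>> subparsers = parser.add_subparsers()
>>> checkout = subparsers.add_parser('checkout', aliases=['co'])
>>> checkout.add_argument('foo')
>>> parser.parse_args(['co', 'bar'])
Namespace(foo='bar')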
One particularly effective way of handling sub-commands is to combine the use
of the add_subparsers() method with calls to set_defaults() so
that each subparser knows which Python function it should execute. For
example:
>>> # sub-command functions
>>> def foo(args):
... print(args.x * args.y)
...
>>> def bar(args):
... print('((%s))' % args.z)
...
>>> # create the top-level parser
>>> parser = argparse.ArgumentParser()
>>> subparsers = parser.add_subparsers()
>>>
>>> # create the parser for the "foo" command
>>> parser_foo = subparsers.add_parser('foo')
>>> parser_foo.add_argument('-x', type=int, default=1)
>>> parser_foo.add_argument('y', type=float)
>>> parser_foo.set_defaults(func=foo)
>>>
>>> # create the parser for the "bar" command
>>> parser_bar = subparsers.add_parser('bar')
>>> parser_bar.add_argument('z')
>>> parser_bar.set_defaults(func=bar)
>>>
>>> # parse the args and call whatever function was selected
>>> args = parser.parse_args('foo 1 -x 2'.split())
>>> args.func(args)
2.0
>>>
>>> # parse the args and call whatever function was selected
>>> args = parser.parse_args('bar XYZYX'.split())
>>> args.func(args)
((XYZYX))
This way, you can let parse_args() do the job of calling the
appropriate function after argument parsing is complete. Associating
functions with actions like this is typically the easiest way to handle the
different actions for each of your subparsers. However, if it is necessary
to check the name of the subparser that was invoked, the dest keyword
argument to the add_subparsers() call will work:
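For example (subparser_name is an illustrative dest):
>>> parser = argparse.ArgumentParser()
>>> subparsers = parser.add_subparsers(dest='subparser_name')
>>> subparser1 = subparsers.add_parser('1')
>>> subparser1.add_argument('-x')
>>> subparser2 = subparsers.add_parser('2')
>>> subparser2.add_argument('y')
>>> parser.parse_args(['2', 'frobble'])
Namespace(subparser_name='2', y='frobble')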
The FileType factory creates objects that can be passed to the type
argument of ArgumentParser.add_argument(). Arguments that have
FileType objects as their type will open command-line arguments as files
with the requested modes and buffer sizes:
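For example (a sketch; a buffer size of 0 requires a binary mode, and the
repr is approximate):
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--raw', type=argparse.FileType('wb', 0))
>>> parser.parse_args(['--raw', 'raw.dat'])
Namespace(raw=<_io.FileIO name='raw.dat' mode='wb'>)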
FileType objects understand the pseudo-argument '-' and automatically
convert this into sys.stdin for readable FileType objects and
sys.stdout for writable FileType objects:
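For example (repr abridged):
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('infile', type=argparse.FileType('r'))
>>> parser.parse_args(['-'])
Namespace(infile=<_io.TextIOWrapper name='<stdin>' ...>)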
By default, ArgumentParser groups command-line arguments into
“positional arguments” and “optional arguments” when displaying help
messages. When there is a better conceptual grouping of arguments than this
default one, appropriate groups can be created using the
add_argument_group() method:
>>> parser = argparse.ArgumentParser(prog='PROG', add_help=False)
>>> group = parser.add_argument_group('group')
>>> group.add_argument('--foo', help='foo help')
>>> group.add_argument('bar', help='bar help')
>>> parser.print_help()
usage: PROG [--foo FOO] bar
group:
bar bar help
--foo FOO foo help
The add_argument_group() method returns an argument group object which
has an add_argument() method just like a regular
ArgumentParser. When an argument is added to the group, the parser
treats it just like a normal argument, but displays the argument in a
separate group for help messages. The add_argument_group() method
accepts title and description arguments which can be used to
customize this display:
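For example (a sketch, with the help output shown approximately):
>>> parser = argparse.ArgumentParser(prog='PROG', add_help=False)
>>> group1 = parser.add_argument_group('group1', 'group1 description')
>>> group1.add_argument('foo', help='foo help')
>>> group2 = parser.add_argument_group('group2', 'group2 description')
>>> group2.add_argument('--bar', help='bar help')
>>> parser.print_help()
usage: PROG [--bar BAR] foo
group1:
  group1 description
  foo    foo help
group2:
  group2 description
  --bar BAR  bar help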
Create a mutually exclusive group. argparse will make sure that only
one of the arguments in the mutually exclusive group was present on the
command line:
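For example (a sketch):
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> group = parser.add_mutually_exclusive_group()
>>> group.add_argument('--foo', action='store_true')
>>> group.add_argument('--bar', action='store_false')
>>> parser.parse_args(['--foo'])
Namespace(bar=True, foo=True)
>>> parser.parse_args(['--foo', '--bar'])
usage: PROG [-h] [--foo | --bar]
PROG: error: argument --bar: not allowed with argument --foo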
The add_mutually_exclusive_group() method also accepts a required
argument, to indicate that at least one of the mutually exclusive arguments
is required:
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> group = parser.add_mutually_exclusive_group(required=True)
>>> group.add_argument('--foo', action='store_true')
>>> group.add_argument('--bar', action='store_false')
>>> parser.parse_args([])
usage: PROG [-h] (--foo | --bar)
PROG: error: one of the arguments --foo --bar is required
Note that currently mutually exclusive argument groups do not support the
title and description arguments of
add_argument_group().
Most of the time, the attributes of the object returned by parse_args()
will be fully determined by inspecting the command-line arguments and the argument
actions. set_defaults() allows some additional
attributes that are determined without any inspection of the command line to
be added:
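For example (a sketch):
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('foo', type=int)
>>> parser.set_defaults(bar=42, baz='badger')
>>> parser.parse_args(['736'])
Namespace(bar=42, baz='badger', foo=736)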
In most typical applications, parse_args() will take
care of formatting and printing any usage or error messages. However, several
formatting methods are available:
Print a help message, including the program usage and information about the
arguments registered with the ArgumentParser. If file is
None, sys.stdout is assumed.
There are also variants of these methods that simply return a string instead of
printing it:
Sometimes a script may only parse a few of the command-line arguments, passing
the remaining arguments on to another script or program. In these cases, the
parse_known_args() method can be useful. It works much like
parse_args() except that it does not produce an error when
extra arguments are present. Instead, it returns a two item tuple containing
the populated namespace and the list of remaining argument strings.
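For example (a sketch):
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', action='store_true')
>>> parser.add_argument('bar')
>>> parser.parse_known_args(['--foo', '--badger', 'BAR', 'spam'])
(Namespace(bar='BAR', foo=True), ['--badger', 'spam'])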
Arguments that are read from a file (see the fromfile_prefix_chars
keyword argument to the ArgumentParser constructor) are read one
argument per line. convert_arg_line_to_args() can be overridden for
fancier reading.
This method takes a single argument arg_line which is a string read from
the argument file. It returns a list of arguments parsed from this string.
The method is called once per line read from the argument file, in order.
A useful override of this method is one that treats each space-separated word
as an argument:
def convert_arg_line_to_args(self, arg_line):
    for arg in arg_line.split():
        if not arg.strip():
            continue
        yield arg
Originally, the argparse module had attempted to maintain compatibility
with optparse. However, optparse was difficult to extend
transparently, particularly with the changes required to support the new
nargs= specifiers and better usage messages. When most everything in
optparse had either been copy-pasted over or monkey-patched, it no
longer seemed practical to try to maintain the backwards compatibility.
Replace (options, args) = parser.parse_args() with args =
parser.parse_args() and add additional ArgumentParser.add_argument()
calls for the positional arguments.
Replace callback actions and the callback_* keyword arguments with
type or action arguments.
Replace string names for type keyword arguments with the corresponding
type objects (e.g. int, float, complex, etc).
Replace optparse.Values with Namespace and
optparse.OptionError and optparse.OptionValueError with
ArgumentError.
Replace strings with implicit arguments such as %default or %prog with
the standard Python syntax to use dictionaries to format strings, that is,
%(default)s and %(prog)s.
Replace the OptionParser constructor version argument with a call to
parser.add_argument('--version', action='version', version='<the version>')
Deprecated since version 2.7: The optparse module is deprecated and will not be developed further;
development will continue with the argparse module.
optparse is a more convenient, flexible, and powerful library for parsing
command-line options than the old getopt module. optparse uses a
more declarative style of command-line parsing: you create an instance of
OptionParser, populate it with options, and parse the command
line. optparse allows users to specify options in the conventional
GNU/POSIX syntax, and additionally generates usage and help messages for you.
Here’s an example of using optparse in a simple script:
from optparse import OptionParser
[...]
parser = OptionParser()
parser.add_option("-f", "--file", dest="filename",
                  help="write report to FILE", metavar="FILE")
parser.add_option("-q", "--quiet",
                  action="store_false", dest="verbose", default=True,
                  help="don't print status messages to stdout")
(options, args) = parser.parse_args()
With these few lines of code, users of your script can now do the “usual thing”
on the command-line, for example:
<yourscript> --file=outfile -q
As it parses the command line, optparse sets attributes of the
options object returned by parse_args() based on user-supplied
command-line values. When parse_args() returns from parsing this command
line, options.filename will be "outfile" and options.verbose will be
False. optparse supports both long and short options, allows short
options to be merged together, and allows options to be associated with their
arguments in a variety of ways. Thus, the following command lines are all
equivalent to the above example:
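<yourscript> -f outfile --quiet
<yourscript> --quiet --file outfile
<yourscript> -q -foutfile
<yourscript> --file=outfile -q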
Additionally, users can run one of <yourscript> -h or <yourscript> --help,
and optparse will print out a brief summary of your script’s options:
Usage: <yourscript> [options]
Options:
-h, --help show this help message and exit
-f FILE, --file=FILE write report to FILE
-q, --quiet don't print status messages to stdout
where the value of yourscript is determined at runtime (normally from
sys.argv[0]).
optparse was explicitly designed to encourage the creation of programs
with straightforward, conventional command-line interfaces. To that end, it
supports only the most common command-line syntax and semantics conventionally
used under Unix. If you are unfamiliar with these conventions, read this
section to acquaint yourself with them.
argument
a string entered on the command-line, and passed by the shell to execl()
or execv(). In Python, arguments are elements of sys.argv[1:]
(sys.argv[0] is the name of the program being executed). Unix shells
also use the term “word”.
It is occasionally desirable to substitute an argument list other than
sys.argv[1:], so you should read “argument” as “an element of
sys.argv[1:], or of some other list provided as a substitute for
sys.argv[1:]”.
option
an argument used to supply extra information to guide or customize the
execution of a program. There are many different syntaxes for options; the
traditional Unix syntax is a hyphen (“-”) followed by a single letter,
e.g. -x or -F. Also, traditional Unix syntax allows multiple
options to be merged into a single argument, e.g. -x -F is equivalent
to -xF. The GNU project introduced -- followed by a series of
hyphen-separated words, e.g. --file or --dry-run. These are the
only two option syntaxes provided by optparse.
Some other option syntaxes that the world has seen include:
a hyphen followed by a few letters, e.g. -pf (this is not the same
as multiple options merged into a single argument)
a hyphen followed by a whole word, e.g. -file (this is technically
equivalent to the previous syntax, but they aren’t usually seen in the same
program)
a plus sign followed by a single letter, or a few letters, or a word, e.g.
+f, +rgb
a slash followed by a letter, or a few letters, or a word, e.g. /f,
/file
These option syntaxes are not supported by optparse, and they never
will be. This is deliberate: the first three are non-standard on any
environment, and the last only makes sense if you’re exclusively targeting
VMS, MS-DOS, and/or Windows.
option argument
an argument that follows an option, is closely associated with that option,
and is consumed from the argument list when that option is. With
optparse, option arguments may either be in a separate argument from
their option:
-f foo
--file foo
or included in the same argument:
-ffoo
--file=foo
Typically, a given option either takes an argument or it doesn’t. Lots of
people want an “optional option arguments” feature, meaning that some options
will take an argument if they see it, and won’t if they don’t. This is
somewhat controversial, because it makes parsing ambiguous: if -a takes
an optional argument and -b is another option entirely, how do we
interpret -ab? Because of this ambiguity, optparse does not
support this feature.
positional argument
something leftover in the argument list after options have been parsed, i.e.
after options and their arguments have been parsed and removed from the
argument list.
required option
an option that must be supplied on the command-line; note that the phrase
“required option” is self-contradictory in English. optparse doesn’t
prevent you from implementing required options, but doesn’t give you much
help at it either.
For example, consider this hypothetical command-line:
prog -v --report /tmp/report.txt foo bar
-v and --report are both options. Assuming that --report
takes one argument, /tmp/report.txt is an option argument. foo and
bar are positional arguments.
Options are used to provide extra information to tune or customize the execution
of a program. In case it wasn’t clear, options are usually optional. A
program should be able to run just fine with no options whatsoever. (Pick a
random program from the Unix or GNU toolsets. Can it run without any options at
all and still make sense? The main exceptions are find, tar, and
dd—all of which are mutant oddballs that have been rightly criticized
for their non-standard syntax and confusing interfaces.)
Lots of people want their programs to have “required options”. Think about it.
If it’s required, then it’s not optional! If there is a piece of information
that your program absolutely requires in order to run successfully, that’s what
positional arguments are for.
As an example of good command-line interface design, consider the humble cp
utility, for copying files. It doesn’t make much sense to try to copy files
without supplying a destination and at least one source. Hence, cp fails if
you run it with no arguments. However, it has a flexible, useful syntax that
does not require any options at all:
cp SOURCE DEST
cp SOURCE ... DEST-DIR
You can get pretty far with just that. Most cp implementations provide a
bunch of options to tweak exactly how the files are copied: you can preserve
mode and modification time, avoid following symlinks, ask before clobbering
existing files, etc. But none of this distracts from the core mission of
cp, which is to copy either one file to another, or several files to another
directory.
Positional arguments are for those pieces of information that your program
absolutely, positively requires to run.
A good user interface should have as few absolute requirements as possible. If
your program requires 17 distinct pieces of information in order to run
successfully, it doesn’t much matter how you get that information from the
user—most people will give up and walk away before they successfully run the
program. This applies whether the user interface is a command-line, a
configuration file, or a GUI: if you make that many demands on your users, most
of them will simply give up.
In short, try to minimize the amount of information that users are absolutely
required to supply—use sensible defaults whenever possible. Of course, you
also want to make your programs reasonably flexible. That’s what options are
for. Again, it doesn’t matter if they are entries in a config file, widgets in
the “Preferences” dialog of a GUI, or command-line options—the more options
you implement, the more flexible your program is, and the more complicated its
implementation becomes. Too much flexibility has drawbacks as well, of course;
too many options can overwhelm users and make your code much harder to maintain.
While optparse is quite flexible and powerful, it’s also straightforward
to use in most cases. This section covers the code patterns that are common to
any optparse-based program.
First, you need to import the OptionParser class; then, early in the main
program, create an OptionParser instance:
from optparse import OptionParser
[...]
parser = OptionParser()
Then you can start defining options. The basic syntax is:
parser.add_option(opt_str, ...,
                  attr=value, ...)
Each option has one or more option strings, such as -f or --file,
and several option attributes that tell optparse what to expect and what
to do when it encounters that option on the command line.
Typically, each option will have one short option string and one long option
string, e.g.:
parser.add_option("-f", "--file", ...)
You’re free to define as many short option strings and as many long option
strings as you like (including zero), as long as there is at least one option
string overall.
The option strings passed to add_option() are effectively labels for the
option defined by that call. For brevity, we will frequently refer to
encountering an option on the command line; in reality, optparse
encounters option strings and looks up options from them.
Once all of your options are defined, instruct optparse to parse your
program’s command line:
(options, args) = parser.parse_args()
(If you like, you can pass a custom argument list to parse_args(), but
that’s rarely necessary: by default it uses sys.argv[1:].)
parse_args() returns two values:
options, an object containing values for all of your options—e.g. if
--file takes a single string argument, then options.file will be the
filename supplied by the user, or None if the user did not supply that
option
args, the list of positional arguments leftover after parsing options
This tutorial section only covers the four most important option attributes:
action, type, dest
(destination), and help. Of these, action is the
most fundamental.
Actions tell optparse what to do when it encounters an option on the
command line. There is a fixed set of actions hard-coded into optparse;
adding new actions is an advanced topic covered in section
Extending optparse. Most actions tell optparse to store
a value in some variable—for example, take a string from the command line and
store it in an attribute of options.
If you don’t specify an option action, optparse defaults to store.
The most common option action is store, which tells optparse to take
the next argument (or the remainder of the current argument), ensure that it is
of the correct type, and store it to your chosen destination.
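For example, given an option defined along these lines and a fake argument
list to parse (a sketch):
parser.add_option("-f", "--file",
                  action="store", type="string", dest="filename")
args = ["-f", "foo.txt"]
(options, args) = parser.parse_args(args)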
When optparse sees the option string -f, it consumes the next
argument, foo.txt, and stores it in options.filename. So, after this
call to parse_args(), options.filename is "foo.txt".
Some other option types supported by optparse are int and float.
Here’s an option that expects an integer argument:
parser.add_option("-n", type="int", dest="num")
Note that this option has no long option string, which is perfectly acceptable.
Also, there’s no explicit action, since the default is store.
Let’s parse another fake command-line. This time, we’ll jam the option argument
right up against the option: since -n42 (one argument) is equivalent to
-n 42 (two arguments), the following code will print 42:
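(options, args) = parser.parse_args(["-n42"])
print(options.num)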
If you don’t specify a type, optparse assumes string. Combined with
the fact that the default action is store, that means our first example can
be a lot shorter:
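parser.add_option("-f", "--file", dest="filename")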
If you don’t supply a destination, optparse figures out a sensible
default from the option strings: if the first long option string is
--foo-bar, then the default destination is foo_bar. If there are no
long option strings, optparse looks at the first short option string: the
default destination for -f is f.
optparse also includes the built-in complex type. Adding
types is covered in section Extending optparse.
Flag options—set a variable to true or false when a particular option is seen
—are quite common. optparse supports them with two separate actions,
store_true and store_false. For example, you might have a verbose
flag that is turned on with -v and off with -q:
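parser.add_option("-v", action="store_true", dest="verbose")
parser.add_option("-q", action="store_false", dest="verbose")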
Here we have two different options with the same destination, which is perfectly
OK. (It just means you have to be a bit careful when setting default values—
see below.)
When optparse encounters -v on the command line, it sets
options.verbose to True; when it encounters -q,
options.verbose is set to False.
All of the above examples involve setting some variable (the “destination”) when
certain command-line options are seen. What happens if those options are never
seen? Since we didn’t supply any defaults, they are all set to None. This
is usually fine, but sometimes you want more control. optparse lets you
supply a default value for each destination, which is assigned before the
command line is parsed.
First, consider the verbose/quiet example. If we want optparse to set
verbose to True unless -q is seen, then we can do this:
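parser.add_option("-v", action="store_true", dest="verbose", default=True)
parser.add_option("-q", action="store_false", dest="verbose")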
Since default values apply to the destination rather than to any particular
option, and these two options happen to have the same destination, this is
exactly equivalent:
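parser.add_option("-v", action="store_true", dest="verbose")
parser.add_option("-q", action="store_false", dest="verbose", default=True)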
As before, the last value specified for a given option destination is the one
that counts. For clarity, try to use one method or the other of setting default
values, not both.
optparse’s ability to generate help and usage text automatically is
useful for creating user-friendly command-line interfaces. All you have to do
is supply a help value for each option, and optionally a short
usage message for your whole program. Here’s an OptionParser populated with
user-friendly (documented) options:
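One plausible definition, consistent with the help output shown below:
usage = "usage: %prog [options] arg1 arg2"
parser = OptionParser(usage=usage)
parser.add_option("-v", "--verbose",
                  action="store_true", dest="verbose", default=True,
                  help="make lots of noise [default]")
parser.add_option("-q", "--quiet",
                  action="store_false", dest="verbose",
                  help="be vewwy quiet (I'm hunting wabbits)")
parser.add_option("-f", "--filename",
                  metavar="FILE", help="write output to FILE")
parser.add_option("-m", "--mode",
                  default="intermediate",
                  help="interaction mode: novice, intermediate, "
                       "or expert [default: %default]")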
If optparse encounters either -h or --help on the
command-line, or if you just call parser.print_help(), it prints the
following to standard output:
Usage: <yourscript> [options] arg1 arg2
Options:
-h, --help show this help message and exit
-v, --verbose make lots of noise [default]
-q, --quiet be vewwy quiet (I'm hunting wabbits)
-f FILE, --filename=FILE
write output to FILE
-m MODE, --mode=MODE interaction mode: novice, intermediate, or
expert [default: intermediate]
(If the help output is triggered by a help option, optparse exits after
printing the help text.)
There’s a lot going on here to help optparse generate the best possible
help message:
the script defines its own usage message:
usage = "usage: %prog [options] arg1 arg2"
optparse expands %prog in the usage string to the name of the
current program, i.e. os.path.basename(sys.argv[0]). The expanded string
is then printed before the detailed option help.
If you don’t supply a usage string, optparse uses a bland but sensible
default: "usage: %prog [options]", which is fine if your script doesn’t
take any positional arguments.
every option defines a help string, and doesn’t worry about line-wrapping—
optparse takes care of wrapping lines and making the help output look
good.
options that take a value indicate this fact in their automatically-generated
help message, e.g. for the “mode” option:
-m MODE, --mode=MODE
Here, “MODE” is called the meta-variable: it stands for the argument that the
user is expected to supply to -m/--mode. By default,
optparse converts the destination variable name to uppercase and uses
that for the meta-variable. Sometimes, that’s not what you want—for
example, the --filename option explicitly sets metavar="FILE",
resulting in this automatically-generated option description:
-f FILE, --filename=FILE
This is important for more than just saving space, though: the manually
written help text uses the meta-variable FILE to clue the user in that
there’s a connection between the semi-formal syntax -f FILE and the informal
semantic description “write output to FILE”. This is a simple but effective
way to make your help text a lot clearer and more useful for end users.
options that have a default value can include %default in the help
string—optparse will replace it with str() of the option’s
default value. If an option has no default value (or the default value is
None), %default expands to none.
When dealing with many options, it is convenient to group these options for
better help output. An OptionParser can contain several option groups,
each of which can contain several options.
An option group is obtained using the class OptionGroup:
class optparse.OptionGroup(parser, title, description=None)
where
parser is the OptionParser instance the group will be inserted into
title is the group title
description, optional, is a long description of the group
OptionGroup inherits from OptionContainer (like
OptionParser) and so the add_option() method can be used to add
an option to the group.
Once all the options are declared, using the OptionParser method
add_option_group() the group is added to the previously defined parser.
Continuing with the parser defined in the previous section, adding an
OptionGroup to a parser is easy:
group = OptionGroup(parser, "Dangerous Options",
                    "Caution: use these options at your own risk. "
                    "It is believed that some of them bite.")
group.add_option("-g", action="store_true", help="Group option.")
parser.add_option_group(group)
This would result in the following help output:
Usage: <yourscript> [options] arg1 arg2
Options:
-h, --help show this help message and exit
-v, --verbose make lots of noise [default]
-q, --quiet be vewwy quiet (I'm hunting wabbits)
-f FILE, --filename=FILE
write output to FILE
-m MODE, --mode=MODE interaction mode: novice, intermediate, or
expert [default: intermediate]
Dangerous Options:
Caution: use these options at your own risk. It is believed that some
of them bite.
-g Group option.
A more complete example might involve using more than one group, still
extending the previous example:
group = OptionGroup(parser, "Dangerous Options",
                    "Caution: use these options at your own risk. "
                    "It is believed that some of them bite.")
group.add_option("-g", action="store_true", help="Group option.")
parser.add_option_group(group)
group = OptionGroup(parser, "Debug Options")
group.add_option("-d", "--debug", action="store_true",
                 help="Print debug information")
group.add_option("-s", "--sql", action="store_true",
                 help="Print all SQL statements executed")
group.add_option("-e", action="store_true", help="Print every action done")
parser.add_option_group(group)
that results in the following output:
Usage: <yourscript> [options] arg1 arg2
Options:
-h, --help show this help message and exit
-v, --verbose make lots of noise [default]
-q, --quiet be vewwy quiet (I'm hunting wabbits)
-f FILE, --filename=FILE
write output to FILE
-m MODE, --mode=MODE interaction mode: novice, intermediate, or expert
[default: intermediate]
Dangerous Options:
Caution: use these options at your own risk. It is believed that some
of them bite.
-g Group option.
Debug Options:
-d, --debug Print debug information
-s, --sql Print all SQL statements executed
-e Print every action done
Another interesting method, particularly when working programmatically with
option groups, is get_option_group():
Return the OptionGroup to which the short or long option
string opt_str (e.g. '-o' or '--option') belongs. If
there’s no such OptionGroup, return None.
Similar to the brief usage string, optparse can also print a version
string for your program. You have to supply the string as the version
argument to OptionParser:
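For example (a sketch):
parser = OptionParser(usage="%prog [-f] [-q]", version="%prog 1.0")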
%prog is expanded just like it is in usage. Apart from that,
version can contain anything you like. When you supply it, optparse
automatically adds a --version option to your parser. If it encounters
this option on the command line, it expands your version string (by
replacing %prog), prints it to stdout, and exits.
For example, if your script is called /usr/bin/foo:
$ /usr/bin/foo --version
foo 1.0
The following two methods can be used to print and get the version string:
Print the version message for the current program (self.version) to
file (default stdout). As with print_usage(), any occurrence
of %prog in self.version is replaced with the name of the current
program. Does nothing if self.version is empty or undefined.
There are two broad classes of errors that optparse has to worry about:
programmer errors and user errors. Programmer errors are usually erroneous
calls to OptionParser.add_option(), e.g. invalid option strings, unknown
option attributes, missing option attributes, etc. These are dealt with in the
usual way: raise an exception (either optparse.OptionError or
TypeError) and let the program crash.
Handling user errors is much more important, since they are guaranteed to happen
no matter how stable your code is. optparse can automatically detect
some user errors, such as bad option arguments (passing -n 4x where
-n takes an integer argument), missing arguments (-n at the end
of the command line, where -n takes an argument of any type). Also,
you can call OptionParser.error() to signal an application-defined error
condition:
(options, args) = parser.parse_args()
[...]
if options.a and options.b:
    parser.error("options -a and -b are mutually exclusive")
In either case, optparse handles the error the same way: it prints the
program’s usage message and an error message to standard error and exits with
error status 2.
Consider the first example above, where the user passes 4x to an option
that takes an integer:
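The output would be approximately:
$ /usr/bin/foo -n 4x
Usage: foo [options]
foo: error: option -n: invalid integer value: '4x'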
optparse-generated error messages take care always to mention the
option involved in the error; be sure to do the same when calling
OptionParser.error() from your application code.
If optparse’s default error-handling behaviour does not suit your needs,
you’ll need to subclass OptionParser and override its exit()
and/or error() methods.
The OptionParser constructor has no required arguments, but a number of
optional keyword arguments. You should always pass them as keyword
arguments, i.e. do not rely on the order in which the arguments are declared.
usage (default: "%prog [options]")
The usage summary to print when your program is run incorrectly or with a
help option. When optparse prints the usage string, it expands
%prog to os.path.basename(sys.argv[0]) (or to prog if you
passed that keyword argument). To suppress a usage message, pass the
special value optparse.SUPPRESS_USAGE.
option_list (default: [])
A list of Option objects to populate the parser with. The options in
option_list are added after any options in standard_option_list (a
class attribute that may be set by OptionParser subclasses), but before
any version or help options. Deprecated; use add_option() after
creating the parser instead.
option_class (default: optparse.Option)
Class to use when adding options to the parser in add_option().
version (default: None)
A version string to print when the user supplies a version option. If you
supply a true value for version, optparse automatically adds a
version option with the single option string --version. The
substring %prog is expanded the same as for usage.
conflict_handler (default: "error")
Specifies what to do when options with conflicting option strings are
added to the parser; see section
Conflicts between options.
description (default: None)
A paragraph of text giving a brief overview of your program.
optparse reformats this paragraph to fit the current terminal width
and prints it when the user requests help (after usage, but before the
list of options).
formatter (default: a new IndentedHelpFormatter)
An instance of optparse.HelpFormatter that will be used for printing help
text. optparse provides two concrete classes for this purpose:
IndentedHelpFormatter and TitledHelpFormatter.
add_help_option (default: True)
If true, optparse will add a help option (with option strings -h
and --help) to the parser.
prog
The string to use when expanding %prog in usage and version
instead of os.path.basename(sys.argv[0]).
epilog (default: None)
A paragraph of help text to print after the option help.
There are several ways to populate the parser with options. The preferred way
is by using OptionParser.add_option(), as shown in section
Tutorial. add_option() can be called in one of two ways:
pass it an Option instance (as returned by make_option())
pass it any combination of positional and keyword arguments that are
acceptable to make_option() (i.e., to the Option constructor), and it
will create the Option instance for you
The other alternative is to pass a list of pre-constructed Option instances to
the OptionParser constructor, as in:
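A sketch, assuming make_option has also been imported from optparse:
option_list = [
    make_option("-f", "--filename",
                action="store", type="string", dest="filename"),
    make_option("-q", "--quiet",
                action="store_false", dest="verbose"),
    ]
parser = OptionParser(option_list=option_list)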
(make_option() is a factory function for creating Option instances;
currently it is an alias for the Option constructor. A future version of
optparse may split Option into several classes, and make_option()
will pick the right class to instantiate. Do not instantiate Option directly.)
Each Option instance represents a set of synonymous command-line option strings,
e.g. -f and --file. You can specify any number of short or
long option strings, but you must specify at least one overall option string.
The canonical way to create an Option instance is with the
add_option() method of OptionParser.
To define an option with only a short option string:
parser.add_option("-f", attr=value, ...)
And to define an option with only a long option string:
parser.add_option("--foo", attr=value, ...)
The keyword arguments define attributes of the new Option object. The most
important option attribute is action, and it largely
determines which other attributes are relevant or required. If you pass
irrelevant option attributes, or fail to pass required ones, optparse
raises an OptionError exception explaining your mistake.
An option’s action determines what optparse does when it encounters
this option on the command-line. The standard option actions hard-coded into
optparse are:
"store"
store this option’s argument (default)
"store_const"
store a constant value
"store_true"
store a true value
"store_false"
store a false value
"append"
append this option’s argument to a list
"append_const"
append a constant value to a list
"count"
increment a counter by one
"callback"
call a specified function
"help"
print a usage message including all options and the documentation for them
(If you don’t supply an action, the default is "store". For this action,
you may also supply type and dest option
attributes; see Standard option actions.)
As you can see, most actions involve storing or updating a value somewhere.
optparse always creates a special object for this, conventionally called
options (it happens to be an instance of optparse.Values). Option
arguments (and various other values) are stored as attributes of this object,
according to the dest (destination) option attribute.
For example, when you call
parser.parse_args()
one of the first things optparse does is create the options object:
options = Values()
For example, suppose one of the options in this parser is defined with:
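parser.add_option("-f", "--file", action="store", type="string", dest="filename")
and the command line being parsed includes any of the following:
-ffoo
-f foo
--file=foo
--file foo
then optparse, on seeing this option, will do the equivalent of
options.filename = "foo"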
The following option attributes may be passed as keyword arguments to
OptionParser.add_option(). If you pass an option attribute that is not
relevant to a particular option, or fail to pass a required option attribute,
optparse raises OptionError.
If the option’s action implies writing or modifying a value somewhere, this
tells optparse where to write it: dest names an
attribute of the options object that optparse builds as it parses
the command line.
For options with action "callback", the callable to call when this option
is seen. See section Option Callbacks for detail on the
arguments passed to the callable.
Help text to print for this option when listing all available options after
the user supplies a help option (such as --help). If
no help text is supplied, the option will be listed without help text. To
hide this option, use the special value optparse.SUPPRESS_HELP.
The various option actions all have slightly different requirements and effects.
Most actions have several relevant option attributes which you may specify to
guide optparse’s behaviour; a few have required attributes, which you
must specify for any option using that action.
The option must be followed by an argument, which is converted to a value
according to type and stored in dest. If
nargs > 1, multiple arguments will be consumed from the
command line; all will be converted according to type and
stored to dest as a tuple. See the
Standard option types section.
If choices is supplied (a list or tuple of strings), the type
defaults to "choice".
If dest is not supplied, optparse derives a destination
from the first long option string (e.g., --foo-bar implies
foo_bar). If there are no long option strings, optparse derives a
destination from the first short option string (e.g., -f implies f).
The option must be followed by an argument, which is appended to the list in
dest. If no default value for dest is
supplied, an empty list is automatically created when optparse first
encounters this option on the command-line. If nargs > 1,
multiple arguments are consumed, and a tuple of length nargs
is appended to dest.
The defaults for type and dest are the same as
for the "store" action.
Like "store_const", but the value const is appended to
dest; as with "append", dest defaults to
None, and an empty list is automatically created the first time the option
is encountered.
Prints a complete help message for all the options in the current option
parser. The help message is constructed from the usage string passed to
OptionParser’s constructor and the help string passed to every
option.
If no help string is supplied for an option, it will still be
listed in the help message. To omit an option entirely, use the special value
optparse.SUPPRESS_HELP.
optparse automatically adds a help option to all
OptionParsers, so you do not normally need to create one.
Example:
from optparse import OptionParser, SUPPRESS_HELP
# usually, a help option is added automatically, but that can
# be suppressed using the add_help_option argument
parser = OptionParser(add_help_option=False)
parser.add_option("-h", "--help", action="help")
parser.add_option("-v", action="store_true", dest="verbose",
help="Be moderately verbose")
parser.add_option("--file", dest="filename",
help="Input file to read data from")
parser.add_option("--secret", help=SUPPRESS_HELP)
If optparse sees either -h or --help on the command line,
it will print something like the following help message to stdout (assuming
sys.argv[0] is "foo.py"):
Usage: foo.py [options]
Options:
-h, --help Show this help message and exit
-v Be moderately verbose
--file=FILENAME Input file to read data from
After printing the help message, optparse terminates your process with
sys.exit(0).
"version"
Prints the version number supplied to the OptionParser to stdout and exits.
The version number is actually formatted and printed by the
print_version() method of OptionParser. Generally only relevant if the
version argument is supplied to the OptionParser constructor. As with
help options, you will rarely create version options,
since optparse automatically adds them when needed.
optparse has five built-in option types: "string", "int",
"choice", "float" and "complex". If you need to add new
option types, see section Extending optparse.
Arguments to string options are not checked or converted in any way: the text on
the command line is stored in the destination (or passed to the callback) as-is.
Integer arguments (type "int") are parsed as follows:
if the number starts with 0x, it is parsed as a hexadecimal number
if the number starts with 0, it is parsed as an octal number
if the number starts with 0b, it is parsed as a binary number
otherwise, the number is parsed as a decimal number
The conversion is done by calling int() with the appropriate base (2, 8,
10, or 16). If this fails, so will optparse, although with a more useful
error message.
"float" and "complex" option arguments are converted directly with
float() and complex(), with similar error-handling.
"choice" options are a subtype of "string" options. The
choices option attribute (a sequence of strings) defines the
set of allowed option arguments. optparse.check_choice() compares
user-supplied option arguments against this master list and raises
OptionValueError if an invalid string is given.
args
the list of arguments to process (default: sys.argv[1:])
values
an optparse.Values object to store option arguments in (default: a
new instance of Values) – if you give an existing object, the
option defaults will not be initialized on it
and the return values are
options
the same object that was passed in as values, or the optparse.Values
instance created by optparse
args
the leftover positional arguments after all options have been processed
The most common usage is to supply neither keyword argument. If you supply
values, it will be modified with repeated setattr() calls (roughly one
for every option argument stored to an option destination) and returned by
parse_args().
If parse_args() encounters any errors in the argument list, it calls the
OptionParser’s error() method with an appropriate end-user error message.
This ultimately terminates your process with an exit status of 2 (the
traditional Unix exit status for command-line errors).
The default behavior of the option parser can be customized slightly, and you
can also poke around your option parser and see what’s there. OptionParser
provides several methods to help you out:
Set parsing to stop on the first non-option. For example, if -a and
-b are both simple options that take no arguments, optparse
normally accepts this syntax:
prog -a arg1 -b arg2
and treats it as equivalent to
prog -a -b arg1 arg2
To disable this feature, call disable_interspersed_args(). This
restores traditional Unix syntax, where option parsing stops with the first
non-option argument.
Use this if you have a command processor which runs another command which has
options of its own and you want to make sure these options don’t get
confused. For example, each command might have a different set of options.
If the OptionParser has an option corresponding to opt_str, that
option is removed. If that option provided any other option strings, all of
those option strings become invalid. If opt_str does not occur in any
option belonging to this OptionParser, raises ValueError.
(This is particularly true if you’ve defined your own OptionParser subclass with
some standard options.)
Every time you add an option, optparse checks for conflicts with existing
options. If it finds any, it invokes the current conflict-handling mechanism.
You can set the conflict-handling mechanism either in the constructor:
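parser = OptionParser(conflict_handler="resolve")
or with a separate call (a sketch; "resolve" is the alternative to the
default handler, "error"):
parser.set_conflict_handler("resolve")
Suppose two options with a conflicting -n string are then added (the
destinations here are illustrative):
parser.add_option("-n", "--dry-run", action="store_true", dest="dry_run",
                  help="do no harm")
parser.add_option("-n", "--noisy", action="store_true", dest="noisy",
                  help="be noisy")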
At this point, optparse detects that a previously-added option is already
using the -n option string. Since conflict_handler is "resolve",
it resolves the situation by removing -n from the earlier option’s list of
option strings. Now --dry-run is the only way for the user to activate
that option. If the user asks for help, the help message will reflect that:
Options:
--dry-run do no harm
[...]
-n, --noisy be noisy
It’s possible to whittle away the option strings for a previously-added option
until there are none left, and the user has no way of invoking that option from
the command-line. In that case, optparse removes that option completely,
so it doesn’t show up in help text or anywhere else. Carrying on with our
existing OptionParser:
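Continuing the sketch (the destination dry_run2 is illustrative):
parser.add_option("--dry-run", action="store_true", dest="dry_run2",
                  help="new dry-run option")
Now the earlier option has lost its last remaining option string, --dry-run,
so optparse removes it entirely.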
OptionParser instances have several cyclic references. This should not be a
problem for Python’s garbage collector, but you may wish to break the cyclic
references explicitly by calling destroy() on your
OptionParser once you are done with it. This is particularly useful in
long-running applications where large object graphs are reachable from your
OptionParser.
Set the usage string according to the rules described above for the usage
constructor keyword argument. Passing None sets the default usage
string; use optparse.SUPPRESS_USAGE to suppress a usage message.
Print the usage message for the current program (self.usage) to file
(default stdout). Any occurrence of the string %prog in self.usage
is replaced with the name of the current program. Does nothing if
self.usage is empty or not defined.
Set default values for several option destinations at once. Using
set_defaults() is the preferred way to set default values for options,
since multiple options can share the same destination. For example, if
several “mode” options all set the same destination, any one of them can set
the default, and the last one wins:
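For instance (a sketch; "mode" and its values are illustrative):
parser.add_option("--advanced", action="store_const",
                  dest="mode", const="advanced",
                  default="novice")    # overridden below
parser.add_option("--novice", action="store_const",
                  dest="mode", const="novice",
                  default="advanced")  # overrides the above setting
The preferred spelling avoids the duplicated (and conflicting) defaults:
parser.set_defaults(mode="advanced")
parser.add_option("--advanced", action="store_const",
                  dest="mode", const="advanced")
parser.add_option("--novice", action="store_const",
                  dest="mode", const="novice")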
When optparse’s built-in actions and types aren’t quite enough for your
needs, you have two choices: extend optparse or define a callback option.
Extending optparse is more general, but overkill for a lot of simple
cases. Quite often a simple callback is all you need.
There are two steps to defining a callback option:
define the option itself using the "callback" action
write the callback; this is a function (or method) that takes at least four
arguments, as described below
As always, the easiest way to define a callback option is by using the
OptionParser.add_option() method. Apart from action, the
only option attribute you must specify is callback, the function to call:
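parser.add_option("-c", action="callback", callback=my_callback)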
callback is a function (or other callable object), so you must have already
defined my_callback() when you create this callback option. In this simple
case, optparse doesn’t even know if -c takes any arguments,
which usually means that the option takes no arguments—the mere presence of
-c on the command-line is all it needs to know. In some
circumstances, though, you might want your callback to consume an arbitrary
number of command-line arguments. This is where writing callbacks gets tricky;
it’s covered later in this section.
optparse always passes four particular arguments to your callback, and it
will only pass additional arguments if you specify them via
callback_args and callback_kwargs. Thus, the
minimal callback function signature is:
def my_callback(option, opt, value, parser):
The four arguments to a callback are described below.
There are several other option attributes that you can supply when you define a
callback option:
type
has its usual meaning: as with the "store" or "append" actions, it
instructs optparse to consume one argument and convert it to
type. Rather than storing the converted value(s) anywhere,
though, optparse passes it to your callback function.
also has its usual meaning: if it is supplied and > 1, optparse will
consume nargs arguments, each of which must be convertible to
type. It then passes a tuple of converted values to your
callback.
option
is the Option instance that’s calling the callback
opt_str
is the option string seen on the command-line that’s triggering the callback.
(If an abbreviated long option was used, opt_str will be the full,
canonical option string—e.g. if the user puts --foo on the
command-line as an abbreviation for --foobar, then opt_str will be
"--foobar".)
value
is the argument to this option seen on the command-line. optparse will
only expect an argument if type is set; the type of value will be
the type implied by the option’s type. If type for this option is
None (no argument expected), then value will be None. If nargs
> 1, value will be a tuple of values of the appropriate type.
parser
is the OptionParser instance driving the whole thing, mainly useful because
you can access some other interesting data through its instance attributes:
parser.largs
the current list of leftover arguments, i.e. arguments that have been
consumed but are neither options nor option arguments. Feel free to modify
parser.largs, e.g. by adding more arguments to it. (This list will
become args, the second return value of parse_args().)
parser.rargs
the current list of remaining arguments, i.e. with opt_str and
value (if applicable) removed, and only the arguments following them
still there. Feel free to modify parser.rargs, e.g. by consuming more
arguments.
parser.values
the object where option values are by default stored (an instance of
optparse.Values). This lets callbacks use the same mechanism as the
rest of optparse for storing option values; you don’t need to mess
around with globals or closures. You can also access or modify the
value(s) of any options already encountered on the command-line.
args
is a tuple of arbitrary positional arguments supplied via the
callback_args option attribute.
kwargs
is a dictionary of arbitrary keyword arguments supplied via
callback_kwargs.
The callback function should raise OptionValueError if there are any
problems with the option or its argument(s). optparse catches this and
terminates the program, printing the error message you supply to stderr. Your
message should be clear, concise, accurate, and mention the option at fault.
Otherwise, the user will have a hard time figuring out what they did wrong.
Here’s a slightly more interesting example: record the fact that -a is
seen, but blow up if it comes after -b in the command-line.
def check_order(option, opt_str, value, parser):
    if parser.values.b:
        raise OptionValueError("can't use -a after -b")
    parser.values.a = 1
[...]
parser.add_option("-a", action="callback", callback=check_order)
parser.add_option("-b", action="store_true", dest="b")
Callback example 3: check option order (generalized)
If you want to re-use this callback for several similar options (set a flag, but
blow up if -b has already been seen), it needs a bit of work: the error
message and the flag that it sets must be generalized.
def check_order(option, opt_str, value, parser):
    if parser.values.b:
        raise OptionValueError("can't use %s after -b" % opt_str)
    setattr(parser.values, option.dest, 1)
[...]
parser.add_option("-a", action="callback", callback=check_order, dest='a')
parser.add_option("-b", action="store_true", dest="b")
parser.add_option("-c", action="callback", callback=check_order, dest='c')
Of course, you could put any condition in there—you’re not limited to checking
the values of already-defined options. For example, if you have options that
should not be called when the moon is full, all you have to do is this:
def check_moon(option, opt_str, value, parser):
    if is_moon_full():
        raise OptionValueError("%s option invalid when moon is full"
                               % opt_str)
    setattr(parser.values, option.dest, 1)
[...]
parser.add_option("--foo",
                  action="callback", callback=check_moon, dest="foo")
(The definition of is_moon_full() is left as an exercise for the reader.)
Things get slightly more interesting when you define callback options that take
a fixed number of arguments. Specifying that a callback option takes arguments
is similar to defining a "store" or "append" option: if you define
type, then the option takes one argument that must be
convertible to that type; if you further define nargs, then the
option takes nargs arguments.
Here’s an example that just emulates the standard "store" action:
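def store_value(option, opt_str, value, parser):
    # store_value is an illustrative helper emulating the "store" action
    setattr(parser.values, option.dest, value)
[...]
parser.add_option("--foo",
                  action="callback", callback=store_value,
                  type="int", nargs=3, dest="foo")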
Note that optparse takes care of consuming 3 arguments and converting
them to integers for you; all you have to do is store them. (Or whatever;
obviously you don’t need a callback for this example.)
Things get hairy when you want an option to take a variable number of arguments.
For this case, you must write a callback, as optparse doesn’t provide any
built-in capabilities for it. And you have to deal with certain intricacies of
conventional Unix command-line parsing that optparse normally handles for
you. In particular, callbacks should implement the conventional rules for bare
-- and - arguments:
either -- or - can be option arguments
bare -- (if not the argument to some option): halt command-line
processing and discard the --
bare - (if not the argument to some option): halt command-line
processing but keep the - (append it to parser.largs)
If you want an option that takes a variable number of arguments, there are
several subtle, tricky issues to worry about. The exact implementation you
choose will be based on which trade-offs you’re willing to make for your
application (which is why optparse doesn’t support this sort of thing
directly).
Nevertheless, here’s a stab at a callback for an option with variable
arguments:
def vararg_callback(option, opt_str, value, parser):
    assert value is None
    value = []

    def floatable(str):
        try:
            float(str)
            return True
        except ValueError:
            return False

    for arg in parser.rargs:
        # stop on --foo like options
        if arg[:2] == "--" and len(arg) > 2:
            break
        # stop on -a, but not on -3 or -3.0
        if arg[:1] == "-" and len(arg) > 1 and not floatable(arg):
            break
        value.append(arg)

    del parser.rargs[:len(value)]
    setattr(parser.values, option.dest, value)

[...]
parser.add_option("-c", "--callback", dest="vararg_attr",
                  action="callback", callback=vararg_callback)
Since the two major controlling factors in how optparse interprets
command-line options are the action and type of each option, the most likely
direction of extension is to add new actions and new types.
To add new types, you need to define your own subclass of optparse’s
Option class. This class has a couple of attributes that define
optparse’s types: TYPES and TYPE_CHECKER.
TYPES
A tuple of type names; in your subclass, simply define a new tuple
TYPES that builds on the standard one.
TYPE_CHECKER
A dictionary mapping type names to type-checking functions. A type-checking
function has the following signature:
def check_mytype(option, opt, value)
where option is an Option instance, opt is an option string
(e.g., -f), and value is the string from the command line that must
be checked and converted to your desired type. check_mytype() should
return an object of the hypothetical type mytype. The value returned by
a type-checking function will wind up in the Values instance returned
by OptionParser.parse_args(), or be passed to a callback as the
value parameter.
Your type-checking function should raise OptionValueError if it
encounters any problems. OptionValueError takes a single string
argument, which is passed as-is to OptionParser’s error()
method, which in turn prepends the program name and the string "error:"
and prints everything to stderr before terminating the process.
Here’s a silly example that demonstrates adding a "complex" option type to
parse Python-style complex numbers on the command line. (This is even sillier
than it used to be, because optparse 1.3 added built-in support for
complex numbers, but never mind.)
First, the necessary imports:
from copy import copy
from optparse import Option, OptionValueError
You need to define your type-checker first, since it’s referred to later (in the
TYPE_CHECKER class attribute of your Option subclass):
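def check_complex(option, opt, value):
    try:
        return complex(value)
    except ValueError:
        raise OptionValueError(
            "option %s: invalid complex value: %r" % (opt, value))

Finally, the Option subclass:

class MyOption(Option):
    TYPES = Option.TYPES + ("complex",)
    TYPE_CHECKER = copy(Option.TYPE_CHECKER)
    TYPE_CHECKER["complex"] = check_complex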
(If we didn’t make a copy() of Option.TYPE_CHECKER, we would end
up modifying the TYPE_CHECKER attribute of optparse’s
Option class. This being Python, nothing stops you from doing that except good
manners and common sense.)
That’s it! Now you can write a script that uses the new option type just like
any other optparse-based script, except you have to instruct your
OptionParser to use MyOption instead of Option:
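parser = OptionParser(option_class=MyOption)
parser.add_option("-c", type="complex")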
Alternately, you can build your own option list and pass it to OptionParser; if
you don’t use add_option() in the above way, you don’t need to tell
OptionParser which option class to use:
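option_list = [MyOption("-c", action="store", type="complex", dest="c")]
parser = OptionParser(option_list=option_list)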
Adding new actions is a bit trickier, because you have to understand that
optparse has a couple of classifications for actions:
“store” actions
actions that result in optparse storing a value to an attribute of the
current OptionValues instance; these options require a dest
attribute to be supplied to the Option constructor.
“typed” actions
actions that take a value from the command line and expect it to be of a
certain type; or rather, a string that can be converted to a certain type.
These options require a type attribute to be supplied to the Option
constructor.
These are overlapping sets: some default “store” actions are "store",
"store_const", "append", and "count", while the default “typed”
actions are "store", "append", and "callback".
When you add an action, you need to categorize it by listing it in at least one
of the following class attributes of Option (all are lists of strings):
ACTIONS
All actions must be listed in ACTIONS.
STORE_ACTIONS
"store" actions are additionally listed here.
TYPED_ACTIONS
"typed" actions are additionally listed here.
ALWAYS_TYPED_ACTIONS
Actions that always take a type (i.e. whose options always take a value) are
additionally listed here. The only effect of this is that optparse
assigns the default type, "string", to options with no explicit type
whose action is listed in ALWAYS_TYPED_ACTIONS.
In order to actually implement your new action, you must override Option’s
take_action() method and add a case that recognizes your action.
For example, let’s add an "extend" action. This is similar to the standard
"append" action, but instead of taking a single value from the command-line
and appending it to an existing list, "extend" will take multiple values in
a single comma-delimited string, and extend an existing list with them. That
is, if --names is an "extend" option of type "string", the command
line
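--names=foo,bar --names blah --names ding,dong

would result in a list:

["foo", "bar", "blah", "ding", "dong"]

To implement this: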
"extend" both expects a value on the command-line and stores that value
somewhere, so it goes in both STORE_ACTIONS and
TYPED_ACTIONS.
to ensure that optparse assigns the default type of "string" to
"extend" actions, we put the "extend" action in
ALWAYS_TYPED_ACTIONS as well.
MyOption.take_action() implements just this one new action, and passes
control back to Option.take_action() for the standard optparse
actions.
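Putting it together, here is a sketch of such an Option subclass:

class MyOption(Option):

    ACTIONS = Option.ACTIONS + ("extend",)
    STORE_ACTIONS = Option.STORE_ACTIONS + ("extend",)
    TYPED_ACTIONS = Option.TYPED_ACTIONS + ("extend",)
    ALWAYS_TYPED_ACTIONS = Option.ALWAYS_TYPED_ACTIONS + ("extend",)

    def take_action(self, action, dest, opt, value, values, parser):
        if action == "extend":
            lvalue = value.split(",")
            values.ensure_value(dest, []).extend(lvalue)
        else:
            Option.take_action(
                self, action, dest, opt, value, values, parser)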
values is an instance of the optparse.Values class, which provides
the very useful ensure_value() method. ensure_value() is
essentially getattr() with a safety valve; it is called as
values.ensure_value(attr, value)
If the attr attribute of values doesn’t exist or is None, then
ensure_value() first sets it to value, and then returns value. This is
very handy for actions like "extend", "append", and "count", all
of which accumulate data in a variable and expect that variable to be of a
certain type (a list for the first two, an integer for the last). Using
ensure_value() means that scripts using your action don’t have to worry
about setting a default value for the option destinations in question; they
can just leave the default as None and ensure_value() will take care of
getting it right when it’s needed.
The getopt module is a parser for command line options whose API is
designed to be familiar to users of the C getopt() function. Users who
are unfamiliar with the C getopt() function or who would like to write
less code and get better help and error messages should consider using the
argparse module instead.
This module helps scripts to parse the command line arguments in sys.argv.
It supports the same conventions as the Unix getopt() function (including
the special meanings of arguments of the form '-' and '--'). Long
options similar to those supported by GNU software may be used as well via an
optional third argument.
This module provides two functions and an
exception:
getopt.getopt(args, shortopts, longopts=[])
Parses command line options and parameter list. args is the argument list to
be parsed, without the leading reference to the running program. Typically, this
means sys.argv[1:]. shortopts is the string of option letters that the
script wants to recognize, with options that require an argument followed by a
colon (':'; i.e., the same format that Unix getopt() uses).
Note
Unlike GNU getopt(), after a non-option argument, all further
arguments are considered also non-options. This is similar to the way
non-GNU Unix systems work.
longopts, if specified, must be a list of strings with the names of the
long options which should be supported. The leading '--' characters
should not be included in the option name. Long options which require an
argument should be followed by an equal sign ('='). Optional arguments
are not supported. To accept only long options, shortopts should be an
empty string. Long options on the command line can be recognized so long as
they provide a prefix of the option name that matches exactly one of the
accepted options. For example, if longopts is ['foo','frob'], the
option --fo will match as --foo, but --f will
not match uniquely, so GetoptError will be raised.
The return value consists of two elements: the first is a list of
(option, value) pairs; the second is the list of program arguments left after the
option list was stripped (this is a trailing slice of args). Each
option-and-value pair returned has the option as its first element, prefixed
with a hyphen for short options (e.g., '-x') or two hyphens for long
options (e.g., '--long-option'), and the option argument as its
second element, or an empty string if the option has no argument. The
options occur in the list in the same order in which they were found, thus
allowing multiple occurrences. Long and short options may be mixed.
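For example:

>>> import getopt
>>> args = '-a -b -cfoo -d bar a1 a2'.split()
>>> optlist, args = getopt.getopt(args, 'abc:d:')
>>> optlist
[('-a', ''), ('-b', ''), ('-c', 'foo'), ('-d', 'bar')]
>>> args
['a1', 'a2']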
getopt.gnu_getopt(args, shortopts, longopts=[])
This function works like getopt(), except that GNU style scanning mode is
used by default. This means that option and non-option arguments may be
intermixed. The getopt() function stops processing options as soon as a
non-option argument is encountered.
If the first character of the option string is '+', or if the environment
variable POSIXLY_CORRECT is set, then option processing stops as
soon as a non-option argument is encountered.
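For example, note how gnu_getopt() accepts the non-option argument a1
before -a, while getopt() stops at it:

>>> import getopt
>>> getopt.gnu_getopt(['a1', '-a', 'a2'], 'a')
([('-a', '')], ['a1', 'a2'])
>>> getopt.getopt(['a1', '-a', 'a2'], 'a')
([], ['a1', '-a', 'a2'])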
exception getopt.GetoptError
This is raised when an unrecognized option is found in the argument list or when
an option requiring an argument is given none. The argument to the exception is
a string indicating the cause of the error. For long options, an argument given
to an option which does not require one will also cause this exception to be
raised. The attributes msg and opt give the error message and
related option; if there is no specific option to which the exception relates,
opt is an empty string.
In a script, typical usage is something like this:
import getopt, sys

def main():
    try:
        opts, args = getopt.getopt(sys.argv[1:], "ho:v", ["help", "output="])
    except getopt.GetoptError as err:
        # print help information and exit:
        print(err)  # will print something like "option -a not recognized"
        usage()
        sys.exit(2)
    output = None
    verbose = False
    for o, a in opts:
        if o == "-v":
            verbose = True
        elif o in ("-h", "--help"):
            usage()
            sys.exit()
        elif o in ("-o", "--output"):
            output = a
        else:
            assert False, "unhandled option"
    # ...

if __name__ == "__main__":
    main()
Note that an equivalent command line interface could be produced with less code
and more informative help and error messages by using the argparse module:
import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-o', '--output')
    parser.add_argument('-v', dest='verbose', action='store_true')
    args = parser.parse_args()
    # ... do something with args.output ...
    # ... do something with args.verbose ...
This module defines functions and classes which implement a flexible event
logging system for applications and libraries.
The key benefit of having the logging API provided by a standard library module
is that all Python modules can participate in logging, so your application log
can include your own messages integrated with messages from third-party
modules.
The module provides a lot of functionality and flexibility. If you are
unfamiliar with logging, the best way to get to grips with it is to work
through the basic and advanced logging tutorials.
The basic classes defined by the module, together with their functions, are
listed below.
Loggers expose the interface that application code directly uses.
Handlers send the log records (created by loggers) to the appropriate
destination.
Filters provide a finer grained facility for determining which log records
to output.
Formatters specify the layout of log records in the final output.
Loggers have the following attributes and methods. Note that Loggers are never
instantiated directly, but always through the module-level function
logging.getLogger(name).
propagate
If this evaluates to false, logging messages are not passed by this logger or by
its child loggers to the handlers of higher level (ancestor) loggers. The
constructor sets this attribute to 1.
setLevel(lvl)
Sets the threshold for this logger to lvl. Logging messages which are less
severe than lvl will be ignored. When a logger is created, the level is set to
NOTSET (which causes all messages to be processed when the logger is
the root logger, or delegation to the parent when the logger is a non-root
logger). Note that the root logger is created with level WARNING.
The term ‘delegation to the parent’ means that if a logger has a level of
NOTSET, its chain of ancestor loggers is traversed until either an ancestor with
a level other than NOTSET is found, or the root is reached.
If an ancestor is found with a level other than NOTSET, then that ancestor’s
level is treated as the effective level of the logger where the ancestor search
began, and is used to determine how a logging event is handled.
If the root is reached, and it has a level of NOTSET, then all messages will be
processed. Otherwise, the root’s level will be used as the effective level.
isEnabledFor(lvl)
Indicates if a message of severity lvl would be processed by this logger.
This method checks first the module-level level set by
logging.disable(lvl) and then the logger’s effective level as determined
by getEffectiveLevel().
getEffectiveLevel()
Indicates the effective level for this logger. If a value other than
NOTSET has been set using setLevel(), it is returned. Otherwise,
the hierarchy is traversed towards the root until a value other than
NOTSET is found, and that value is returned.
getChild(suffix)
Returns a logger which is a descendant to this logger, as determined by the suffix.
Thus, logging.getLogger('abc').getChild('def.ghi') would return the same
logger as would be returned by logging.getLogger('abc.def.ghi'). This is a
convenience method, useful when the parent logger is named using e.g. __name__
rather than a literal string.
debug(msg, *args, **kwargs)
Logs a message with level DEBUG on this logger. The msg is the
message format string, and the args are the arguments which are merged into
msg using the string formatting operator. (Note that this means that you can
use keywords in the format string, together with a single dictionary argument.)
There are three keyword arguments in kwargs which are inspected: exc_info
which, if it does not evaluate as false, causes exception information to be
added to the logging message. If an exception tuple (in the format returned by
sys.exc_info()) is provided, it is used; otherwise, sys.exc_info()
is called to get the exception information.
The second optional keyword argument is stack_info, which defaults to
False. If specified as True, stack information is added to the logging
message, including the actual logging call. Note that this is not the same
stack information as that displayed through specifying exc_info: The
former is stack frames from the bottom of the stack up to the logging call
in the current thread, whereas the latter is information about stack frames
which have been unwound, following an exception, while searching for
exception handlers.
You can specify stack_info independently of exc_info, e.g. to just show
how you got to a certain point in your code, even when no exceptions were
raised. The stack frames are printed following a header line which says:
Stack (most recent call last):
This mimics the Traceback (most recent call last): which is used when
displaying exception frames.
The third optional keyword argument is extra which can be used to pass a
dictionary which is used to populate the __dict__ of the LogRecord created for
the logging event with user-defined attributes. These custom attributes can then
be used as you like. For example, they could be incorporated into logged
messages:
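FORMAT = '%(asctime)-15s %(clientip)s %(user)-8s %(message)s'
logging.basicConfig(format=FORMAT)
d = {'clientip': '192.168.0.1', 'user': 'fbloggs'}
logger = logging.getLogger('tcpserver')
logger.warning('Protocol problem: %s', 'connection reset', extra=d)

would print something like:

2006-02-08 22:20:02,165 192.168.0.1 fbloggs  Protocol problem: connection reset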
The keys in the dictionary passed in extra should not clash with the keys used
by the logging system. (See the Formatter documentation for more
information on which keys are used by the logging system.)
If you choose to use these attributes in logged messages, you need to exercise
some care. In the above example, for instance, the Formatter has been
set up with a format string which expects ‘clientip’ and ‘user’ in the attribute
dictionary of the LogRecord. If these are missing, the message will not be
logged because a string formatting exception will occur. So in this case, you
always need to pass the extra dictionary with these keys.
While this might be annoying, this feature is intended for use in specialized
circumstances, such as multi-threaded servers where the same code executes in
many contexts, and interesting conditions which arise are dependent on this
context (such as remote client IP address and authenticated user name, in the
above example). In such circumstances, it is likely that specialized
Formatters would be used with particular Handlers.
New in version 3.2: The stack_info parameter was added.
exception(msg, *args)
Logs a message with level ERROR on this logger. The arguments are
interpreted as for debug(). Exception info is added to the logging
message. This method should only be called from an exception handler.
findCaller(stack_info=False)
Finds the caller’s source filename and line number. Returns the filename, line
number, function name and stack information as a 4-element tuple. The stack
information is returned as None unless stack_info is True.
handle(record)
Handles a record by passing it to all handlers associated with this logger and
its ancestors (until a false value of propagate is found). This method is used
for unpickled records received from a socket, as well as those created locally.
Logger-level filtering is applied using filter().
hasHandlers()
Checks to see if this logger has any handlers configured. This is done by
looking for handlers in this logger and its parents in the logger hierarchy.
Returns True if a handler was found, else False. The method stops searching
up the hierarchy whenever a logger with the propagate attribute set to
False is found; that will be the last logger which is checked for the
existence of handlers.
Handlers have the following attributes and methods. Note that Handler
is never instantiated directly; this class acts as a base for more useful
subclasses. However, the __init__() method in subclasses needs to call
Handler.__init__().
__init__(level=NOTSET)
Initializes the Handler instance by setting its level, setting the list
of filters to the empty list and creating a lock (using createLock()) for
serializing access to an I/O mechanism.
setLevel(lvl)
Sets the threshold for this handler to lvl. Logging messages which are less
severe than lvl will be ignored. When a handler is created, the level is set
to NOTSET (which causes all messages to be processed).
close()
Tidy up any resources used by the handler. This version does no output but
removes the handler from an internal list of handlers which is closed when
shutdown() is called. Subclasses should ensure that this gets called
from overridden close() methods.
handle(record)
Conditionally emits the specified logging record, depending on filters which may
have been added to the handler. Wraps the actual emission of the record with
acquisition/release of the I/O thread lock.
handleError(record)
This method should be called from handlers when an exception is encountered
during an emit() call. By default it does nothing, which means that
exceptions get silently ignored. This is what is mostly wanted for a logging
system - most users will not care about errors in the logging system, they are
more interested in application errors. You could, however, replace this with a
custom handler if you wish. The specified record is the one which was being
processed when the exception occurred.
emit(record)
Do whatever it takes to actually log the specified logging record. This version
is intended to be implemented by subclasses and so raises a
NotImplementedError.
For a list of handlers included as standard, see logging.handlers.
Formatter objects have the following attributes and methods. They are
responsible for converting a LogRecord to (usually) a string which can
be interpreted by either a human or an external system. The base
Formatter allows a formatting string to be specified. If none is
supplied, the default value of '%(message)s' is used.
A Formatter can be initialized with a format string which makes use of knowledge
of the LogRecord attributes - such as the default value mentioned above
making use of the fact that the user’s message and arguments are pre-formatted
into a LogRecord’s message attribute. This format string contains
standard Python %-style mapping keys. See section Old String Formatting Operations
for more information on string formatting.
class logging.Formatter(fmt=None, datefmt=None, style='%')
Returns a new instance of the Formatter class. The instance is
initialized with a format string for the message as a whole, as well as a
format string for the date/time portion of a message. If no fmt is
specified, '%(message)s' is used. If no datefmt is specified, the
ISO8601 date format is used.
The style parameter can be one of '%', '{' or '$' and determines how
the format string will be merged with its data: using one of %-formatting,
str.format() or string.Template.
Changed in version 3.2: The style parameter was added.
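A minimal sketch of the new style parameter in use:

import logging

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter('{asctime} {levelname} {message}',
                                       style='{'))
logging.getLogger().addHandler(handler)
logging.getLogger().warning('str.format()-style formatting in use')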
format(record)
The record’s attribute dictionary is used as the operand to a string
formatting operation. Returns the resulting string. Before formatting the
dictionary, a couple of preparatory steps are carried out. The message
attribute of the record is computed using msg % args. If the
formatting string contains '(asctime)', formatTime() is called
to format the event time. If there is exception information, it is
formatted using formatException() and appended to the message. Note
that the formatted exception information is cached in attribute
exc_text. This is useful because the exception information can be
pickled and sent across the wire, but you should be careful if you have
more than one Formatter subclass which customizes the formatting
of exception information. In this case, you will have to clear the cached
value after a formatter has done its formatting, so that the next
formatter to handle the event doesn’t use the cached value but
recalculates it afresh.
If stack information is available, it’s appended after the exception
information, using formatStack() to transform it if necessary.
formatTime(record, datefmt=None)
This method should be called from format() by a formatter which
wants to make use of a formatted time. This method can be overridden in
formatters to provide for any specific requirement, but the basic behavior
is as follows: if datefmt (a string) is specified, it is used with
time.strftime() to format the creation time of the
record. Otherwise, the ISO8601 format is used. The resulting string is
returned.
This function uses a user-configurable function to convert the creation
time to a tuple. By default, time.localtime() is used; to change
this for a particular formatter instance, set the converter attribute
to a function with the same signature as time.localtime() or
time.gmtime(). To change it for all formatters, for example if you
want all logging times to be shown in GMT, set the converter
attribute in the Formatter class.
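For example, to show all times in GMT for one formatter instance (a sketch):

import logging
import time

formatter = logging.Formatter('%(asctime)s %(message)s')
formatter.converter = time.gmtime  # this instance now formats times in GMT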
formatException(exc_info)
Formats the specified exception information (a standard exception tuple as
returned by sys.exc_info()) as a string. This default implementation
just uses traceback.print_exception(). The resulting string is
returned.
formatStack(stack_info)
Formats the specified stack information (a string as returned by
traceback.print_stack(), but with the last newline removed) as a
string. This default implementation just returns the input value.
Filters can be used by Handlers and Loggers for more sophisticated
filtering than is provided by levels. The base filter class only allows events
which are below a certain point in the logger hierarchy. For example, a filter
initialized with 'A.B' will allow events logged by loggers 'A.B', 'A.B.C',
'A.B.C.D', 'A.B.D' etc. but not 'A.BB', 'B.A.B' etc. If initialized with the
empty string, all events are passed.
class logging.Filter(name='')
Returns an instance of the Filter class. If name is specified, it
names a logger which, together with its children, will have its events allowed
through the filter. If name is the empty string, allows every event.
filter(record)
Is the specified record to be logged? Returns zero for no, nonzero for
yes. If deemed appropriate, the record may be modified in-place by this
method.
Note that filters attached to handlers are consulted whenever an event is
emitted by the handler, whereas filters attached to loggers are consulted
whenever an event is logged (using debug(), info(), etc.), before
sending an event to handlers. This means that events which have been
generated by descendant loggers will not be filtered by a logger’s filter
setting, unless the filter has also been applied to those descendant
loggers.
You don’t actually need to subclass Filter: you can pass any instance
which has a filter method with the same semantics.
Changed in version 3.2: You don’t need to create specialized Filter classes, or use other
classes with a filter method: you can use a function (or other
callable) as a filter. The filtering logic will check to see if the filter
object has a filter attribute: if it does, it’s assumed to be a
Filter and its filter() method is called. Otherwise, it’s
assumed to be a callable and called with the record as the single
parameter. The returned value should conform to that returned by
filter().
Although filters are used primarily to filter records based on more
sophisticated criteria than levels, they get to see every record which is
processed by the handler or logger they’re attached to: this can be useful if
you want to do things like counting how many records were processed by a
particular logger or handler, or adding, changing or removing attributes in
the LogRecord being processed. Obviously changing the LogRecord needs to be
done with some care, but it does allow the injection of contextual information
into logs (see Using Filters to impart contextual information).
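For example, a callable filter that counts records and injects a custom
attribute (a sketch; the attribute and function names are illustrative):

import logging

def counting_filter(record):
    counting_filter.count += 1
    record.count = counting_filter.count  # inject a custom attribute
    return True  # a true return value lets every record through
counting_filter.count = 0

handler = logging.StreamHandler()
handler.addFilter(counting_filter)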
LogRecord instances are created automatically by the Logger
every time something is logged, and can be created manually via
makeLogRecord() (for example, from a pickled event received over the
wire).
class logging.LogRecord(name, level, pathname, lineno, msg, args, exc_info, func=None, sinfo=None)
Contains all the information pertinent to the event being logged.
The primary information is passed in msg and args, which
are combined using msg % args to create the message field of the
record.
Parameters:
name – The name of the logger used to log the event represented by
this LogRecord.
level – The numeric level of the logging event (one of DEBUG, INFO etc.)
Note that this is converted to two attributes of the LogRecord:
levelno for the numeric value and levelname for the
corresponding level name.
pathname – The full pathname of the source file where the logging call
was made.
lineno – The line number in the source file where the logging call was
made.
msg – The event description message, possibly a format string with
placeholders for variable data.
args – Variable data to merge into the msg argument to obtain the
event description.
exc_info – An exception tuple with the current exception information,
or None if no exception information is available.
func – The name of the function or method from which the logging call
was invoked.
sinfo – A text string representing stack information from the base of
the stack in the current thread, up to the logging call.
getMessage()
Returns the message for this LogRecord instance after merging any
user-supplied arguments with the message. If the user-supplied message
argument to the logging call is not a string, str() is called on it to
convert it to a string. This allows use of user-defined classes as
messages, whose __str__ method can return the actual format string to
be used.
Changed in version 3.2: The creation of a LogRecord has been made more configurable by
providing a factory which is used to create the record. The factory can be
set using getLogRecordFactory() and setLogRecordFactory()
(see setLogRecordFactory() for the factory’s signature).
This functionality can be used to inject your own values into a
LogRecord at creation time. You can use the following pattern:
old_factory = logging.getLogRecordFactory()

def record_factory(*args, **kwargs):
    record = old_factory(*args, **kwargs)
    record.custom_attribute = 0xdecafbad
    return record

logging.setLogRecordFactory(record_factory)
With this pattern, multiple factories could be chained, and as long
as they don’t overwrite each other’s attributes or unintentionally
overwrite the standard attributes listed above, there should be no
surprises.
The LogRecord has a number of attributes, most of which are derived from the
parameters to the constructor. (Note that the names do not always correspond
exactly between the LogRecord constructor parameters and the LogRecord
attributes.) These attributes can be used to merge data from the record into
the format string. The following table lists (in alphabetical order) the
attribute names, their meanings and the corresponding placeholder in a %-style
format string.
If you are using {}-formatting (str.format()), you can use
{attrname} as the placeholder in the format string. If you are using
$-formatting (string.Template), use the form ${attrname}. In
both cases, of course, replace attrname with the actual attribute name
you want to use.
In the case of {}-formatting, you can specify formatting flags by placing them
after the attribute name, separated from it with a colon. For example: a
placeholder of {msecs:03d} would format a millisecond value of 4 as
004. Refer to the str.format() documentation for full details on
the options available to you.
Attribute name    Format                  Description
args              You shouldn’t need to   The tuple of arguments merged into msg to
                  format this yourself.   produce message.
asctime           %(asctime)s             Human-readable time when the LogRecord
                                          was created. By default this is of the
                                          form '2003-07-08 16:49:45,896' (the
                                          numbers after the comma are the
                                          millisecond portion of the time).
exc_info          You shouldn’t need to   Exception tuple (à la sys.exc_info) or,
                  format this yourself.   if no exception has occurred, None.
filename          %(filename)s            Filename portion of pathname.
funcName          %(funcName)s            Name of function containing the logging
                                          call.
levelname         %(levelname)s           Text logging level for the message
                                          ('DEBUG', 'INFO', 'WARNING', 'ERROR',
                                          'CRITICAL').
levelno           %(levelno)s             Numeric logging level for the message
                                          (DEBUG, INFO, WARNING, ERROR,
                                          CRITICAL).
lineno            %(lineno)d              Source line number where the logging
                                          call was issued (if available).
module            %(module)s              Module (name portion of filename).
msecs             %(msecs)d               Millisecond portion of the time when the
                                          LogRecord was created.
message           %(message)s             The logged message, computed as
                                          msg % args. This is set when
                                          Formatter.format() is invoked.
msg               You shouldn’t need to   The format string passed in the original
                  format this yourself.   logging call. Merged with args to
                                          produce message, or an arbitrary object
                                          (see Using arbitrary objects as
                                          messages).
name              %(name)s                Name of the logger used to log the call.
pathname          %(pathname)s            Full pathname of the source file where
                                          the logging call was issued (if
                                          available).
process           %(process)d             Process ID (if available).
processName       %(processName)s         Process name (if available).
relativeCreated   %(relativeCreated)d     Time in milliseconds when the LogRecord
                                          was created, relative to the time the
                                          logging module was loaded.
stack_info        You shouldn’t need to   Stack frame information (where
                  format this yourself.   available) from the bottom of the stack
                                          in the current thread, up to and
                                          including the stack frame of the
                                          logging call which resulted in the
                                          creation of this record.
process(msg, kwargs)
Modifies the message and/or keyword arguments passed to a logging call in
order to insert contextual information. This implementation takes the object
passed as extra to the constructor and adds it to kwargs using key
'extra'. The return value is a (msg, kwargs) tuple which has the
(possibly modified) versions of the arguments passed in.
In addition to the above, LoggerAdapter supports the following
methods of Logger, i.e. debug(), info(), warning(),
error(), exception(), critical(), log(),
isEnabledFor(), getEffectiveLevel(), setLevel(),
hasHandlers(). These methods have the same signatures as their
counterparts in Logger, so you can use the two types of instances
interchangeably.
Changed in version 3.2: The isEnabledFor(), getEffectiveLevel(), setLevel() and
hasHandlers() methods were added to LoggerAdapter. These
methods delegate to the underlying logger.
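A minimal sketch (connid is an illustrative contextual key):

import logging

logging.basicConfig(format='%(connid)s %(message)s')
logger = logging.getLogger(__name__)
adapter = logging.LoggerAdapter(logger, {'connid': '1234'})
adapter.warning('Connection reset')  # prints: 1234 Connection reset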
The logging module is intended to be thread-safe without any special work
needing to be done by its clients. It achieves this through using threading
locks; there is one lock to serialize access to the module’s shared data, and
each handler also creates a lock to serialize access to its underlying I/O.
If you are implementing asynchronous signal handlers using the signal
module, you may not be able to use logging from within such handlers. This is
because lock implementations in the threading module are not always
re-entrant, and so cannot be invoked from such signal handlers.
logging.getLogger(name=None)
Return a logger with the specified name or, if name is None, return a
logger which is the root logger of the hierarchy. If specified, the name is
typically a dot-separated hierarchical name like 'a', 'a.b' or 'a.b.c.d'.
Choice of these names is entirely up to the developer who is using logging.
All calls to this function with a given name return the same logger instance.
This means that logger instances never need to be passed between different parts
of an application.
logging.getLoggerClass()
Return either the standard Logger class, or the last class passed to
setLoggerClass(). This function may be called from within a new class
definition, to ensure that installing a customised Logger class will
not undo customisations already applied by other code. For example:
class MyLogger(logging.getLoggerClass()):
    # ... override behaviour here
    pass
logging.getLogRecordFactory()
Return a callable which is used to create a LogRecord.
New in version 3.2: This function has been provided, along with setLogRecordFactory(),
to allow developers more control over how the LogRecord
representing a logging event is constructed.
logging.debug(msg, *args, **kwargs)
Logs a message with level DEBUG on the root logger. The msg is the
message format string, and the args are the arguments which are merged into
msg using the string formatting operator. (Note that this means that you can
use keywords in the format string, together with a single dictionary argument.)
There are three keyword arguments in kwargs which are inspected: exc_info
which, if it does not evaluate as false, causes exception information to be
added to the logging message. If an exception tuple (in the format returned by
sys.exc_info()) is provided, it is used; otherwise, sys.exc_info()
is called to get the exception information.
The second optional keyword argument is stack_info, which defaults to
False. If specified as True, stack information is added to the logging
message, including the actual logging call. Note that this is not the same
stack information as that displayed through specifying exc_info: The
former is stack frames from the bottom of the stack up to the logging call
in the current thread, whereas the latter is information about stack frames
which have been unwound, following an exception, while searching for
exception handlers.
You can specify stack_info independently of exc_info, e.g. to just show
how you got to a certain point in your code, even when no exceptions were
raised. The stack frames are printed following a header line which says:
Stack (most recent call last):
This mimics the Traceback (most recent call last): which is used when
displaying exception frames.
The third optional keyword argument is extra which can be used to pass a
dictionary which is used to populate the __dict__ of the LogRecord created for
the logging event with user-defined attributes. These custom attributes can then
be used as you like. For example, they could be incorporated into logged
messages:
FORMAT = '%(asctime)-15s %(clientip)s %(user)-8s %(message)s'
logging.basicConfig(format=FORMAT)
d = {'clientip': '192.168.0.1', 'user': 'fbloggs'}
logging.warning('Protocol problem: %s', 'connection reset', extra=d)
The keys in the dictionary passed in extra should not clash with the keys used
by the logging system. (See the Formatter documentation for more
information on which keys are used by the logging system.)
If you choose to use these attributes in logged messages, you need to exercise
some care. In the above example, for instance, the Formatter has been
set up with a format string which expects ‘clientip’ and ‘user’ in the attribute
dictionary of the LogRecord. If these are missing, the message will not be
logged because a string formatting exception will occur. So in this case, you
always need to pass the extra dictionary with these keys.
While this might be annoying, this feature is intended for use in specialized
circumstances, such as multi-threaded servers where the same code executes in
many contexts, and interesting conditions which arise are dependent on this
context (such as remote client IP address and authenticated user name, in the
above example). In such circumstances, it is likely that specialized
Formatters would be used with particular Handlers.
New in version 3.2: The stack_info parameter was added.
logging.exception(msg, *args)
Logs a message with level ERROR on the root logger. The arguments are
interpreted as for debug(). Exception info is added to the logging
message. This function should only be called from an exception handler.
logging.log(level, msg, *args, **kwargs)
Logs a message with level level on the root logger. The other arguments are
interpreted as for debug().
PLEASE NOTE: The above module-level functions which delegate to the root
logger should not be used in threads, in versions of Python earlier than
2.7.1 and 3.2, unless at least one handler has been added to the root
logger before the threads are started. These convenience functions call
basicConfig() to ensure that at least one handler is available; in
earlier versions of Python, this can (under rare circumstances) lead to
handlers being added multiple times to the root logger, which can in turn
lead to multiple messages for the same event.
logging.disable(lvl)
Provides an overriding level lvl for all loggers which takes precedence over
the logger’s own level. When the need arises to temporarily throttle logging
output down across the whole application, this function can be useful. Its
effect is to disable all logging calls of severity lvl and below, so that
if you call it with a value of INFO, then all INFO and DEBUG events would be
discarded, whereas those of severity WARNING and above would be processed
according to the logger’s effective level.
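For example:

import logging

logging.disable(logging.INFO)     # discard all INFO and DEBUG events
logging.warning('still shown')    # WARNING is above the disabled threshold
logging.info('silently dropped')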
logging.addLevelName(lvl, levelName)
Associates level lvl with text levelName in an internal dictionary, which is
used to map numeric levels to a textual representation, for example when a
Formatter formats a message. This function can also be used to define
your own levels. The only constraints are that all levels used must be
registered using this function, levels should be positive integers, and
severity should increase with the numeric value of the level.
NOTE: If you are thinking of defining your own levels, please see the section
on Custom Levels.
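A minimal sketch defining a hypothetical TRACE level below DEBUG:

import logging

TRACE = 5  # hypothetical level, more verbose than DEBUG (10)
logging.addLevelName(TRACE, 'TRACE')
logging.basicConfig(level=TRACE)
logging.log(TRACE, 'very detailed message')  # shown as TRACE:root:...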
logging.getLevelName(lvl)
Returns the textual representation of logging level lvl. If the level is one
of the predefined levels CRITICAL, ERROR, WARNING,
INFO or DEBUG then you get the corresponding string. If you
have associated levels with names using addLevelName() then the name you
have associated with lvl is returned. If a numeric value corresponding to one
of the defined levels is passed in, the corresponding string representation is
returned. Otherwise, the string 'Level %s' % lvl is returned.
logging.makeLogRecord(attrdict)
Creates and returns a new LogRecord instance whose attributes are
defined by attrdict. This function is useful for taking a pickled
LogRecord attribute dictionary, sent over a socket, and reconstituting
it as a LogRecord instance at the receiving end.
logging.basicConfig(**kwargs)
Does basic configuration for the logging system by creating a
StreamHandler with a default Formatter and adding it to the
root logger. This function does nothing if the root logger already has handlers
configured for it.
PLEASE NOTE: This function should be called from the main thread
before other threads are started. In versions of Python prior to
2.7.1 and 3.2, if this function is called from multiple threads,
it is possible (in rare circumstances) that a handler will be added
to the root logger more than once, leading to unexpected results
such as messages being duplicated in the log.
The following keyword arguments are supported.
Format     Description
filename   Specifies that a FileHandler be created, using the specified
           filename, rather than a StreamHandler.
filemode   Specifies the mode to open the file, if filename is specified
           (if filemode is unspecified, it defaults to 'a').
format     Use the specified format string for the handler.
datefmt    Use the specified date/time format.
style      If format is specified, use this style for the format string.
           One of '%', '{' or '$' for %-formatting, str.format() or
           string.Template respectively, and defaulting to '%' if not
           specified.
level      Set the root logger level to the specified level.
stream     Use the specified stream to initialize the StreamHandler. Note
           that this argument is incompatible with 'filename' - if both
           are present, 'stream' is ignored.
Changed in version 3.2: The style argument was added.
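A minimal sketch (example.log is an illustrative file name):

import logging

logging.basicConfig(filename='example.log', filemode='w',
                    format='%(asctime)s %(levelname)s %(message)s',
                    level=logging.DEBUG)
logging.debug('This message goes to example.log')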
logging.shutdown()
Informs the logging system to perform an orderly shutdown by flushing and
closing all handlers. This should be called at application exit and no
further use of the logging system should be made after this call.
logging.setLoggerClass(klass)
Tells the logging system to use the class klass when instantiating a logger.
The class should define __init__() such that only a name argument is
required, and the __init__() should call Logger.__init__(). This
function is typically called before any loggers are instantiated by applications
which need to use custom logger behavior.
logging.setLogRecordFactory(factory)
Set a callable which is used to create a LogRecord.
Parameters:
factory – The factory callable to be used to instantiate a log record.
New in version 3.2: This function has been provided, along with getLogRecordFactory(), to
allow developers more control over how the LogRecord representing
a logging event is constructed.
logging.captureWarnings(capture)
This function is used to turn the capture of warnings by logging on and
off.
If capture is True, warnings issued by the warnings module will
be redirected to the logging system. Specifically, a warning will be
formatted using warnings.formatwarning() and the resulting string
logged to a logger named 'py.warnings' with a severity of WARNING.
If capture is False, the redirection of warnings to the logging system
will stop, and warnings will be redirected to their original destinations
(i.e. those in effect before captureWarnings(True) was called).
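For example:

import logging
import warnings

logging.basicConfig()
logging.captureWarnings(True)
warnings.warn('this ends up in the log')  # logged via the 'py.warnings' logger
logging.captureWarnings(False)            # restore normal warning display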
This is the original source for the logging package. The standalone version
of the package is suitable for use with Python 1.5.2, 2.1.x
and 2.2.x, which do not include the logging package in the standard
library.
The following functions configure the logging module. They are located in the
logging.config module. Their use is optional — you can configure the
logging module using these functions or by making calls to the main API (defined
in logging itself) and defining handlers which are declared either in
logging or logging.handlers.
logging.config.dictConfig(config)
Takes the logging configuration from a dictionary. The contents of
this dictionary are described in Configuration dictionary schema
below.
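A minimal sketch of a configuration dictionary and its use:

import logging
import logging.config

config = {
    'version': 1,
    'formatters': {
        'brief': {'format': '%(levelname)s:%(name)s:%(message)s'},
    },
    'handlers': {
        'console': {'class': 'logging.StreamHandler', 'formatter': 'brief'},
    },
    'root': {'level': 'INFO', 'handlers': ['console']},
}
logging.config.dictConfig(config)
logging.getLogger('demo').info('configured via dictConfig')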
If an error is encountered during configuration, this function will
raise a ValueError, TypeError, AttributeError
or ImportError with a suitably descriptive message. The
following is a (possibly incomplete) list of conditions which will
raise an error:
A level which is not a string or which is a string not
corresponding to an actual logging level.
A propagate value which is not a boolean.
An id which does not have a corresponding destination.
A non-existent handler id found during an incremental call.
An invalid logger name.
Inability to resolve to an internal or external object.
Parsing is performed by the DictConfigurator class, whose
constructor is passed the dictionary used for configuration, and
has a configure() method. The logging.config module
has a callable attribute dictConfigClass
which is initially set to DictConfigurator.
You can replace the value of dictConfigClass with a
suitable implementation of your own.
dictConfig() calls dictConfigClass passing
the specified dictionary, and then calls the configure() method on
the returned object to put the configuration into effect:
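def dictConfig(config):
    dictConfigClass(config).configure()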
For example, a subclass of DictConfigurator could call
DictConfigurator.__init__() in its own __init__(), then
set up custom prefixes which would be usable in the subsequent
configure() call. dictConfigClass would be bound to
this new subclass, and then dictConfig() could be called exactly as
in the default, uncustomized state.
logging.config.fileConfig(fname, defaults=None, disable_existing_loggers=True)
Reads the logging configuration from a configparser-format file
named fname. This function can be called several times from an
application, allowing an end user to select from various pre-canned
configurations (if the developer provides a mechanism to present the choices
and load the chosen configuration).
Parameters:
defaults – Defaults to be passed to the ConfigParser can be specified
in this argument.
disable_existing_loggers – If specified as False, loggers which exist when
this call is made are left alone. The default is True because this enables
old behaviour in a backward-compatible way. This behaviour is to disable
any existing loggers unless they or their ancestors are explicitly named
in the logging configuration.
logging.config.listen(port=DEFAULT_LOGGING_CONFIG_PORT)
Starts up a socket server on the specified port, and listens for new
configurations. If no port is specified, the module’s default
DEFAULT_LOGGING_CONFIG_PORT is used. Logging configurations will be
sent as a file suitable for processing by fileConfig(). Returns a
Thread instance on which you can call start() to start the
server, and which you can join() when appropriate. To stop the server,
call stopListening().
To send a configuration to the socket, read in the configuration file and
send it to the socket as a string of bytes preceded by a four-byte length
string packed in binary using struct.pack('>L', n).
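A sketch of a sender (logconf.ini is an illustrative file name):

import socket
import struct
import logging.config

with open('logconf.ini', 'rb') as f:
    data_to_send = f.read()

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(('localhost', logging.config.DEFAULT_LOGGING_CONFIG_PORT))
sock.send(struct.pack('>L', len(data_to_send)))  # four-byte length prefix
sock.send(data_to_send)
sock.close()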
logging.config.stopListening()
Stops the listening server which was created with a call to listen().
This is typically called before calling join() on the return value from
listen().
Describing a logging configuration requires listing the various
objects to create and the connections between them; for example, you
may create a handler named ‘console’ and then say that the logger
named ‘startup’ will send its messages to the ‘console’ handler.
These objects aren’t limited to those provided by the logging
module because you might write your own formatter or handler class.
The parameters to these classes may also need to include external
objects such as sys.stderr. The syntax for describing these
objects and connections is defined in Object connections
below.
The dictionary passed to dictConfig() must contain the following
keys:
version - to be set to an integer value representing the schema
version. The only valid value at present is 1, but having this key
allows the schema to evolve while still preserving backwards
compatibility.
All other keys are optional, but if present they will be interpreted
as described below. In all cases below where a ‘configuring dict’ is
mentioned, it will be checked for the special '()' key to see if a
custom instantiation is required. If so, the mechanism described in
User-defined objects below is used to create an instance;
otherwise, the context is used to determine what to instantiate.
formatters - the corresponding value will be a dict in which each
key is a formatter id and each value is a dict describing how to
configure the corresponding Formatter instance.
The configuring dict is searched for keys format and datefmt
(with defaults of None) and these are used to construct a
logging.Formatter instance.
filters - the corresponding value will be a dict in which each key
is a filter id and each value is a dict describing how to configure
the corresponding Filter instance.
The configuring dict is searched for the key name (defaulting to the
empty string) and this is used to construct a logging.Filter
instance.
handlers - the corresponding value will be a dict in which each
key is a handler id and each value is a dict describing how to
configure the corresponding Handler instance.
The configuring dict is searched for the following keys:
class (mandatory). This is the fully qualified name of the
handler class.
level (optional). The level of the handler.
formatter (optional). The id of the formatter for this
handler.
filters (optional). A list of ids of the filters for this
handler.
All other keys are passed through as keyword arguments to the
handler’s constructor. For example, given the snippet:
handlers:
  console:
    class : logging.StreamHandler
    formatter: brief
    level   : INFO
    filters: [allow_foo]
    stream  : ext://sys.stdout
  file:
    class : logging.handlers.RotatingFileHandler
    formatter: precise
    filename: logconfig.log
    maxBytes: 1024
    backupCount: 3
the handler with id console is instantiated as a
logging.StreamHandler, using sys.stdout as the underlying
stream. The handler with id file is instantiated as a
logging.handlers.RotatingFileHandler with the keyword arguments
filename='logconfig.log', maxBytes=1024, backupCount=3.
loggers - the corresponding value will be a dict in which each key
is a logger name and each value is a dict describing how to
configure the corresponding Logger instance.
The configuring dict is searched for the following keys:
level (optional). The level of the logger.
propagate (optional). The propagation setting of the logger.
filters (optional). A list of ids of the filters for this
logger.
handlers (optional). A list of ids of the handlers for this
logger.
The specified loggers will be configured according to the level,
propagation, filters and handlers specified.
root - this will be the configuration for the root logger.
Processing of the configuration will be as for any logger, except
that the propagate setting will not be applicable.
incremental - whether the configuration is to be interpreted as
incremental to the existing configuration. This value defaults to
False, which means that the specified configuration replaces the
existing configuration with the same semantics as used by the
existing fileConfig() API.
If the specified value is True, the configuration is processed
as described in the section on Incremental Configuration.
disable_existing_loggers - whether any existing loggers are to be
disabled. This setting mirrors the parameter of the same name in
fileConfig(). If absent, this parameter defaults to True.
This value is ignored if incremental is True.
It is difficult to provide complete flexibility for incremental
configuration. For example, because objects such as filters
and formatters are anonymous, once a configuration is set up, it is
not possible to refer to such anonymous objects when augmenting a
configuration.
Furthermore, there is not a compelling case for arbitrarily altering
the object graph of loggers, handlers, filters, formatters at
run-time, once a configuration is set up; the verbosity of loggers and
handlers can be controlled just by setting levels (and, in the case of
loggers, propagation flags). Changing the object graph arbitrarily in
a safe way is problematic in a multi-threaded environment; while not
impossible, the benefits are not worth the complexity it adds to the
implementation.
Thus, when the incremental key of a configuration dict is present
and is True, the system will completely ignore any formatters and
filters entries, and process only the level
settings in the handlers entries, and the level and
propagate settings in the loggers and root entries.
Because incremental configuration is triggered by a simple value in the
configuration dict, configurations can be sent over the wire as pickled
dicts to a socket listener. Thus, the logging verbosity of a long-running
application can be altered over time with no need to stop and restart the
application.
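For instance, a minimal client sketch, assuming a listener has been started
in the target process with logging.config.listen() on the default port
(9030), and sending the configuration as JSON rather than pickle:

import json
import socket
import struct

# An incremental configuration that only raises the root logger's level.
config = {
    'version': 1,
    'incremental': True,
    'root': {'level': 'WARNING'},
}

payload = json.dumps(config).encode('utf-8')
sock = socket.create_connection(('localhost', 9030))
try:
    # The listener expects a 4-byte big-endian length prefix followed by
    # the serialized configuration dict.
    sock.sendall(struct.pack('>L', len(payload)) + payload)
finally:
    sock.close()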
The schema describes a set of logging objects - loggers,
handlers, formatters, filters - which are connected to each other in
an object graph. Thus, the schema needs to represent connections
between the objects. For example, say that, once configured, a
particular logger has attached to it a particular handler. For the
purposes of this discussion, we can say that the logger represents the
source, and the handler the destination, of a connection between the
two. Of course in the configured objects this is represented by the
logger holding a reference to the handler. In the configuration dict,
this is done by giving each destination object an id which identifies
it unambiguously, and then using the id in the source object’s
configuration to indicate that a connection exists between the source
and the destination object with that id.
So, for example, consider the following YAML snippet:
formatters:
  brief:
    # configuration for formatter with id 'brief' goes here
  precise:
    # configuration for formatter with id 'precise' goes here
handlers:
  h1: # This is an id
    # configuration of handler with id 'h1' goes here
    formatter: brief
  h2: # This is another id
    # configuration of handler with id 'h2' goes here
    formatter: precise
loggers:
  foo.bar.baz:
    # other configuration for logger 'foo.bar.baz'
    handlers: [h1, h2]
(Note: YAML is used here because it’s a little more readable than the
equivalent Python source form for the dictionary.)
The ids for loggers are the logger names which would be used
programmatically to obtain a reference to those loggers, e.g.
foo.bar.baz. The ids for Formatters and Filters can be any string
value (such as brief, precise above) and they are transient,
in that they are only meaningful for processing the configuration
dictionary and used to determine connections between objects, and are
not persisted anywhere when the configuration call is complete.
The above snippet indicates that the logger named foo.bar.baz should
have two handlers attached to it, which are described by the handler
ids h1 and h2. The formatter for h1 is that described by id
brief, and the formatter for h2 is that described by id
precise.
The schema supports user-defined objects for handlers, filters and
formatters. (Loggers do not need to have different types for
different instances, so there is no support in this configuration
schema for user-defined logger classes.)
Objects to be configured are described by dictionaries
which detail their configuration. In some places, the logging system
will be able to infer from the context how an object is to be
instantiated, but when a user-defined object is to be instantiated,
the system will not know how to do this. In order to provide complete
flexibility for user-defined object instantiation, the user needs
to provide a ‘factory’ - a callable which is called with a
configuration dictionary and which returns the instantiated object.
This is signalled by an absolute import path to the factory being
made available under the special key '()'. Here’s a concrete
example:
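formatters:
  brief:
    format: '%(message)s'
  default:
    format: '%(asctime)s %(levelname)-8s %(name)-15s %(message)s'
    datefmt: '%Y-%m-%d %H:%M:%S'
  custom:
    (): my.package.customFormatterFactory
    bar: baz
    spam: 99.9
    answer: 42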
The above YAML snippet defines three formatters. The first, with id
brief, is a standard logging.Formatter instance with the
specified format string. The second, with id default, has a
longer format and also defines the time format explicitly, and will
result in a logging.Formatter initialized with those two format
strings. Shown in Python source form, the brief and default
formatters have configuration sub-dictionaries:
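{
  'format': '%(message)s'
}

and

{
  'format': '%(asctime)s %(levelname)-8s %(name)-15s %(message)s',
  'datefmt': '%Y-%m-%d %H:%M:%S'
}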
respectively, and as these dictionaries do not contain the special key
'()', the instantiation is inferred from the context: as a result,
standard logging.Formatter instances are created. The
configuration sub-dictionary for the third formatter, with id
custom, is:
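{
  '()': 'my.package.customFormatterFactory',
  'bar': 'baz',
  'spam': 99.9,
  'answer': 42
}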
and this contains the special key '()', which means that
user-defined instantiation is wanted. In this case, the specified
factory callable will be used. If it is an actual callable it will be
used directly - otherwise, if you specify a string (as in the example)
the actual callable will be located using normal import mechanisms.
The callable will be called with the remaining items in the
configuration sub-dictionary as keyword arguments. In the above
example, the formatter with id custom will be assumed to be
returned by the call:
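my.package.customFormatterFactory(bar='baz', spam=99.9, answer=42)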
The key '()' has been used as the special key because it is not a
valid keyword parameter name, and so will not clash with the names of
the keyword arguments used in the call. The '()' also serves as a
mnemonic that the corresponding value is a callable.
There are times where a configuration needs to refer to objects
external to the configuration, for example sys.stderr. If the
configuration dict is constructed using Python code, this is
straightforward, but a problem arises when the configuration is
provided via a text file (e.g. JSON, YAML). In a text file, there is
no standard way to distinguish sys.stderr from the literal string
'sys.stderr'. To facilitate this distinction, the configuration
system looks for certain special prefixes in string values and
treats them specially. For example, if the literal string
'ext://sys.stderr' is provided as a value in the configuration,
then the ext:// will be stripped off and the remainder of the
value processed using normal import mechanisms.
The handling of such prefixes is done in a way analogous to protocol
handling: there is a generic mechanism to look for prefixes which
match the regular expression ^(?P<prefix>[a-z]+)://(?P<suffix>.*)$
whereby, if the prefix is recognised, the suffix is processed
in a prefix-dependent manner and the result of the processing replaces
the string value. If the prefix is not recognised, then the string
value will be left as-is.
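As an illustration, the following sketch shows how such a prefix might be
recognised and dispatched; the resolve() helper and the toy ext:// handler
here are hypothetical, not the actual implementation:

import re
import sys

PREFIX_PATTERN = re.compile(r'^(?P<prefix>[a-z]+)://(?P<suffix>.*)$')

def resolve(value, prefix_handlers):
    """Apply a matching prefix handler, or return the value unchanged."""
    m = PREFIX_PATTERN.match(value)
    if m and m.group('prefix') in prefix_handlers:
        return prefix_handlers[m.group('prefix')](m.group('suffix'))
    return value

# A toy 'ext://' handler that only resolves attributes of the sys module.
prefix_handlers = {'ext': lambda suffix: getattr(sys, suffix.split('.', 1)[1])}

print(resolve('ext://sys.stderr', prefix_handlers))  # the sys.stderr object
print(resolve('sys.stderr', prefix_handlers))        # the literal string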
As well as external objects, there is sometimes also a need to refer
to objects in the configuration. This will be done implicitly by the
configuration system for things that it knows about. For example, the
string value 'DEBUG' for a level in a logger or handler will
automatically be converted to the value logging.DEBUG, and the
handlers, filters and formatter entries will take an
object id and resolve to the appropriate destination object.
However, a more generic mechanism is needed for user-defined
objects which are not known to the logging module. For
example, consider logging.handlers.MemoryHandler, which takes
a target argument which is another handler to delegate to. Since
the system already knows about this class, then in the configuration,
the given target just needs to be the object id of the relevant
target handler, and the system will resolve to the handler from the
id. If, however, a user defines a my.package.MyHandler which has
an alternate handler, the configuration system would not know that
the alternate referred to a handler. To cater for this, a generic
resolution system allows the user to specify:
handlers:
  file:
    # configuration of file handler goes here
  custom:
    (): my.package.MyHandler
    alternate: cfg://handlers.file
The literal string 'cfg://handlers.file' will be resolved in an
analogous way to strings with the ext:// prefix, but looking
in the configuration itself rather than the import namespace. The
mechanism allows access by dot or by index, in a similar way to
that provided by str.format. Thus, given the following snippet:
handlers:
  email:
    class: logging.handlers.SMTPHandler
    mailhost: localhost
    fromaddr: my_app@domain.tld
    toaddrs:
      - support_team@domain.tld
      - dev_team@domain.tld
    subject: Houston, we have a problem.
in the configuration, the string 'cfg://handlers' would resolve to
the dict with key handlers, the string 'cfg://handlers.email'
would resolve to the dict with key email in the handlers dict,
and so on. The string 'cfg://handlers.email.toaddrs[1]' would
resolve to 'dev_team@domain.tld' and the string
'cfg://handlers.email.toaddrs[0]' would resolve to the value
'support_team@domain.tld'. The subject value could be accessed
using either 'cfg://handlers.email.subject' or, equivalently,
'cfg://handlers.email[subject]'. The latter form only needs to be
used if the key contains spaces or non-alphanumeric characters. If an
index value consists only of decimal digits, access will be attempted
using the corresponding integer value, falling back to the string
value if needed.
Given a string cfg://handlers.myhandler.mykey.123, this will
resolve to config_dict['handlers']['myhandler']['mykey']['123'].
If the string is specified as cfg://handlers.myhandler.mykey[123],
the system will attempt to retrieve the value from
config_dict['handlers']['myhandler']['mykey'][123], and fall back
to config_dict['handlers']['myhandler']['mykey']['123'] if that
fails.
Import resolution, by default, uses the builtin __import__() function
to do its importing. You may want to replace this with your own importing
mechanism: if so, you can replace the importer attribute of the
DictConfigurator or its superclass, the
BaseConfigurator class. However, you need to be
careful because of the way functions are accessed from classes via
descriptors. If you are using a Python callable to do your imports, and you
want to define it at class level rather than instance level, you need to wrap
it with staticmethod(). For example:
from importlib import import_module
from logging.config import BaseConfigurator
BaseConfigurator.importer = staticmethod(import_module)
You don’t need to wrap with staticmethod() if you’re setting the import
callable on a configurator instance.
The configuration file format understood by fileConfig() is based on
configparser functionality. The file must contain sections called
[loggers], [handlers] and [formatters] which identify by name the
entities of each type which are defined in the file. For each such entity, there
is a separate section which identifies how that entity is configured. Thus, for
a logger named log01 in the [loggers] section, the relevant
configuration details are held in a section [logger_log01]. Similarly, a
handler called hand01 in the [handlers] section will have its
configuration held in a section called [handler_hand01], while a formatter
called form01 in the [formatters] section will have its configuration
specified in a section called [formatter_form01]. The root logger
configuration must be specified in a section called [logger_root].
Examples of these sections in the file are given below.
The root logger must specify a level and a list of handlers. An example of a
root logger section is given below.
[logger_root]
level=NOTSET
handlers=hand01
The level entry can be one of DEBUG, INFO, WARNING, ERROR, CRITICAL or
NOTSET. For the root logger only, NOTSET means that all messages will be
logged. Level values are eval()uated in the context of the logging
package’s namespace.
The handlers entry is a comma-separated list of handler names, which must
appear in the [handlers] section and have corresponding sections in the
configuration file.
For loggers other than the root logger, some additional information is required.
This is illustrated by the following example.
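[logger_parser]
level=DEBUG
handlers=hand01
propagate=1
qualname=compiler.parser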
The level and handlers entries are interpreted as for the root logger,
except that if a non-root logger’s level is specified as NOTSET, the system
consults loggers higher up the hierarchy to determine the effective level of the
logger. The propagate entry is set to 1 to indicate that messages must
propagate to handlers higher up the logger hierarchy from this logger, or 0 to
indicate that messages are not propagated to handlers up the hierarchy. The
qualname entry is the hierarchical channel name of the logger, that is to
say the name used by the application to get the logger.
Sections which specify handler configuration are exemplified by the following.
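[handler_hand01]
class=StreamHandler
level=NOTSET
formatter=form01
args=(sys.stdout,)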
The class entry indicates the handler’s class (as determined by eval()
in the logging package’s namespace). The level is interpreted as for
loggers, and NOTSET is taken to mean ‘log everything’.
The formatter entry indicates the key name of the formatter for this
handler. If blank, a default formatter (logging._defaultFormatter) is used.
If a name is specified, it must appear in the [formatters] section and have
a corresponding section in the configuration file.
The args entry, when eval()uated in the context of the logging
package’s namespace, is the list of arguments to the constructor for the handler
class. Refer to the constructors for the relevant handlers, or to the examples
below, to see how typical entries are constructed.
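Sections which specify formatter configuration are typified by the following.
[formatter_form01]
format=F1 %(asctime)s %(levelname)s %(message)s
datefmt=
class=logging.Formatter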
The format entry is the overall format string, and the datefmt entry is
the strftime()-compatible date/time format string. If empty, the
package substitutes ISO8601 format date/times, which is almost equivalent to
specifying the date format string '%Y-%m-%d %H:%M:%S'. The ISO8601 format
also specifies milliseconds, which are appended to the result of using the above
format string, with a comma separator. An example time in ISO8601 format is
2003-01-23 00:29:50,411.
The class entry is optional. It indicates the name of the formatter’s class
(as a dotted module and class name.) This option is useful for instantiating a
Formatter subclass. Subclasses of Formatter can present
exception tracebacks in an expanded or condensed format.
The following useful handlers are provided in the package. Note that three of
the handlers (StreamHandler, FileHandler and
NullHandler) are actually defined in the logging module itself,
but have been documented here along with the other handlers.
The StreamHandler class, located in the core logging package,
sends logging output to streams such as sys.stdout, sys.stderr or any
file-like object (or, more precisely, any object which supports write()
and flush() methods).
Returns a new instance of the StreamHandler class. If stream is
specified, the instance will use it for logging output; otherwise, sys.stderr
will be used.
If a formatter is specified, it is used to format the record. The record
is then written to the stream with a terminator. If exception information
is present, it is formatted using traceback.print_exception() and
appended to the stream.
Flushes the stream by calling its flush() method. Note that the
close() method is inherited from Handler and so does
no output, so an explicit flush() call may be needed at times.
Changed in version 3.2: The StreamHandler class now has a terminator attribute, default
value '\n', which is used as the terminator when writing a formatted
record to a stream. If you don’t want this newline termination, you can
set the handler instance’s terminator attribute to the empty string.
In earlier versions, the terminator was hardcoded as '\n'.
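For example, a minimal sketch that suppresses the newline terminator:

import logging

handler = logging.StreamHandler()
handler.terminator = ''   # records are written without a trailing newline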
The FileHandler class, located in the core logging package,
sends logging output to a disk file. It inherits the output functionality from
StreamHandler.
class logging.FileHandler(filename, mode='a', encoding=None, delay=False)
Returns a new instance of the FileHandler class. The specified file is
opened and used as the stream for logging. If mode is not specified,
'a' is used. If encoding is not None, it is used to open the file
with that encoding. If delay is true, then file opening is deferred until the
first call to emit(). By default, the file grows indefinitely.
The NullHandler class, located in the core logging package,
does not do any formatting or output. It is essentially a ‘no-op’ handler
for use by library developers.
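For example, a library might attach one to its top-level logger so that
logging calls are silently discarded unless the using application configures
logging (the logger name here is hypothetical):

import logging

logging.getLogger('mylibrary').addHandler(logging.NullHandler())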
The WatchedFileHandler class, located in the logging.handlers
module, is a FileHandler which watches the file it is logging to. If
the file changes, it is closed and reopened using the file name.
A file change can happen because of usage of programs such as newsyslog and
logrotate which perform log file rotation. This handler, intended for use
under Unix/Linux, watches the file to see if it has changed since the last emit.
(A file is deemed to have changed if its device or inode have changed.) If the
file has changed, the old file stream is closed, and the file opened to get a
new stream.
This handler is not appropriate for use under Windows, because under Windows
open log files cannot be moved or renamed - logging opens the files with
exclusive locks - and so there is no need for such a handler. Furthermore,
ST_INO is not supported under Windows; stat() always returns zero for
this value.
class logging.handlers.WatchedFileHandler(filename[, mode[, encoding[, delay]]])
Returns a new instance of the WatchedFileHandler class. The specified
file is opened and used as the stream for logging. If mode is not specified,
'a' is used. If encoding is not None, it is used to open the file
with that encoding. If delay is true, then file opening is deferred until the
first call to emit(). By default, the file grows indefinitely.
Outputs the record to the file, but first checks to see if the file has
changed. If it has, the existing stream is flushed and closed and the
file opened again, before outputting the record to the file.
class logging.handlers.RotatingFileHandler(filename, mode='a', maxBytes=0, backupCount=0, encoding=None, delay=0)
Returns a new instance of the RotatingFileHandler class. The specified
file is opened and used as the stream for logging. If mode is not specified,
'a' is used. If encoding is not None, it is used to open the file
with that encoding. If delay is true, then file opening is deferred until the
first call to emit(). By default, the file grows indefinitely.
You can use the maxBytes and backupCount values to allow the file to
rollover at a predetermined size. When the size is about to be exceeded,
the file is closed and a new file is silently opened for output. Rollover occurs
whenever the current log file is nearly maxBytes in length; if maxBytes is
zero, rollover never occurs. If backupCount is non-zero, the system will save
old log files by appending the extensions ‘.1’, ‘.2’ etc., to the filename. For
example, with a backupCount of 5 and a base file name of app.log, you
would get app.log, app.log.1, app.log.2, up to
app.log.5. The file being written to is always app.log. When
this file is filled, it is closed and renamed to app.log.1, and if files
app.log.1, app.log.2, etc. exist, then they are renamed to
app.log.2, app.log.3 etc. respectively.
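A minimal sketch of such a setup (the file name, size limit and backup count
are illustrative):

import logging
from logging.handlers import RotatingFileHandler

# Keep app.log plus up to five rotated backups of roughly 10 KB each.
handler = RotatingFileHandler('app.log', maxBytes=10240, backupCount=5)
logging.getLogger().addHandler(handler)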
class logging.handlers.TimedRotatingFileHandler(filename, when='h', interval=1, backupCount=0, encoding=None, delay=False, utc=False)
Returns a new instance of the TimedRotatingFileHandler class. The
specified file is opened and used as the stream for logging. On rotating it also
sets the filename suffix. Rotating happens based on the product of when and
interval.
You can use the when argument to specify the type of interval. The list of possible
values is below. Note that they are not case sensitive.
Value        Type of interval
'S'          Seconds
'M'          Minutes
'H'          Hours
'D'          Days
'W'          Week day (0=Monday)
'midnight'   Roll over at midnight
The system will save old log files by appending extensions to the filename.
The extensions are date-and-time based, using the strftime format
%Y-%m-%d_%H-%M-%S or a leading portion thereof, depending on the
rollover interval.
When computing the next rollover time for the first time (when the handler
is created), the last modification time of an existing log file, or else
the current time, is used to compute when the next rotation will occur.
If the utc argument is true, times in UTC will be used; otherwise
local time is used.
If backupCount is nonzero, at most backupCount files
will be kept, and if more would be created when rollover occurs, the oldest
one is deleted. The deletion logic uses the interval to determine which
files to delete, so changing the interval may leave old files lying around.
If delay is true, then file opening is deferred until the first call to
emit().
Pickles the record’s attribute dictionary and writes it to the socket in
binary format. If there is an error with the socket, silently drops the
packet. If the connection was previously lost, re-establishes the
connection. To unpickle the record at the receiving end into a
LogRecord, use the makeLogRecord() function.
Handles an error which has occurred during emit(). The most likely
cause is a lost connection. Closes the socket so that we can retry on the
next event.
This is a factory method which allows subclasses to define the precise
type of socket they want. The default implementation creates a TCP socket
(socket.SOCK_STREAM).
Pickles the record’s attribute dictionary in binary format with a length
prefix, and returns it ready for transmission across the socket.
Note that pickles aren’t completely secure. If you are concerned about
security, you may want to override this method to implement a more secure
mechanism. For example, you can sign pickles using HMAC and then verify
them on the receiving end, or alternatively you can disable unpickling of
global objects on the receiving end.
Tries to create a socket; on failure, uses an exponential back-off
algorithm. On initial failure, the handler will drop the message it was
trying to send. When subsequent messages are handled by the same
instance, it will not try connecting until some time has passed. The
default parameters are such that the initial delay is one second, and if
after that delay the connection still can’t be made, the handler will
double the delay each time up to a maximum of 30 seconds.
This behaviour is controlled by the following handler attributes:
retryStart (initial delay, defaulting to 1.0 seconds).
retryFactor (multiplier, defaulting to 2.0).
retryMax (maximum delay, defaulting to 30.0 seconds).
This means that if the remote listener starts up after the handler has
been used, you could lose messages (since the handler won’t even attempt
a connection until the delay has elapsed, but just silently drop messages
during the delay period).
Pickles the record’s attribute dictionary and writes it to the socket in
binary format. If there is an error with the socket, silently drops the
packet. To unpickle the record at the receiving end into a
LogRecord, use the makeLogRecord() function.
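A receiving end might rebuild and handle records along these lines (a
sketch, assuming the 4-byte length prefix used by these handlers and
trusting the sender - see the security note above; the handler class name
is hypothetical):

import logging
import pickle
import socketserver
import struct

class LogRecordStreamHandler(socketserver.StreamRequestHandler):
    def handle(self):
        while True:
            header = self.rfile.read(4)
            if len(header) < 4:
                break
            size = struct.unpack('>L', header)[0]
            # The payload is a pickled dict of LogRecord attributes.
            attrs = pickle.loads(self.rfile.read(size))
            record = logging.makeLogRecord(attrs)
            logging.getLogger(record.name).handle(record)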
The SysLogHandler class, located in the logging.handlers module,
supports sending logging messages to a remote or local Unix syslog.
class logging.handlers.SysLogHandler(address=('localhost', SYSLOG_UDP_PORT), facility=LOG_USER, socktype=socket.SOCK_DGRAM)
Returns a new instance of the SysLogHandler class intended to
communicate with a remote Unix machine whose address is given by address in
the form of a (host, port) tuple. If address is not specified,
('localhost', 514) is used. The address is used to open a socket. An
alternative to providing a (host, port) tuple is providing an address as a
string, for example ‘/dev/log’. In this case, a Unix domain socket is used to
send the message to the syslog. If facility is not specified,
LOG_USER is used. The type of socket opened depends on the
socktype argument, which defaults to socket.SOCK_DGRAM and thus
opens a UDP socket. To open a TCP socket (for use with the newer syslog
daemons such as rsyslog), specify a value of socket.SOCK_STREAM.
Note that if your server is not listening on UDP port 514,
SysLogHandler may appear not to work. In that case, check what
address you should be using for a domain socket - it’s system dependent.
For example, on Linux it’s usually ‘/dev/log’ but on OS/X it’s
‘/var/run/syslog’. You’ll need to check your platform and use the
appropriate address (you may need to do this check at runtime if your
application needs to run on several platforms). On Windows, you pretty
much have to use the UDP option.
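For example (a minimal sketch; the Unix domain socket address is platform
dependent, as noted above):

import logging
from logging.handlers import SysLogHandler

handler = SysLogHandler(address='/dev/log')   # e.g. '/var/run/syslog' on OS X
logging.getLogger().addHandler(handler)
logging.warning('something to report to syslog')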
The record is formatted, and then sent to the syslog server. If exception
information is present, it is not sent to the server.
Changed in version 3.2.1: (See: issue 12168.) In earlier versions, the message sent to the
syslog daemons was always terminated with a NUL byte, because early
versions of these daemons expected a NUL terminated message - even
though it’s not in the relevant specification (RFC 5424). More recent
versions of these daemons don’t expect the NUL byte but strip it off
if it’s there, and even more recent daemons (which adhere more closely
to RFC 5424) pass the NUL byte on as part of the message.
To enable easier handling of syslog messages in the face of all these
differing daemon behaviours, the appending of the NUL byte has been
made configurable, through the use of a class-level attribute,
append_nul. This defaults to True (preserving the existing
behaviour) but can be set to False on a SysLogHandler instance
in order for that instance to not append the NUL terminator.
Encodes the facility and priority into an integer. You can pass in strings
or integers - if strings are passed, internal mapping dictionaries are
used to convert them to integers.
The symbolic LOG_ values are defined in SysLogHandler and
mirror the values defined in the sys/syslog.h header file.
Maps a logging level name to a syslog priority name.
You may need to override this if you are using custom levels, or
if the default algorithm is not suitable for your needs. The
default algorithm maps DEBUG, INFO, WARNING, ERROR and
CRITICAL to the equivalent syslog names, and all other level
names to ‘warning’.
The NTEventLogHandler class, located in the logging.handlers
module, supports sending logging messages to a local Windows NT, Windows 2000 or
Windows XP event log. Before you can use it, you need Mark Hammond’s Win32
extensions for Python installed.
class logging.handlers.NTEventLogHandler(appname, dllname=None, logtype='Application')
Returns a new instance of the NTEventLogHandler class. The appname is
used to define the application name as it appears in the event log. An
appropriate registry entry is created using this name. The dllname should give
the fully qualified pathname of a .dll or .exe which contains message
definitions to hold in the log (if not specified, 'win32service.pyd' is used
- this is installed with the Win32 extensions and contains some basic
placeholder message definitions. Note that use of these placeholders will make
your event logs big, as the entire message source is held in the log. If you
want slimmer logs, you have to pass in the name of your own .dll or .exe which
contains the message definitions you want to use in the event log). The
logtype is one of 'Application', 'System' or 'Security', and
defaults to 'Application'.
At this point, you can remove the application name from the registry as a
source of event log entries. However, if you do this, you will not be able
to see the events as you intended in the Event Log Viewer - it needs to be
able to access the registry to get the .dll name. The current version does
not do this.
Returns the event type for the record. Override this if you want to
specify your own types. This version does a mapping using the handler’s
typemap attribute, which is set up in __init__() to a dictionary
which contains mappings for DEBUG, INFO,
WARNING, ERROR and CRITICAL. If you are using
your own levels, you will either need to override this method or place a
suitable dictionary in the handler’s typemap attribute.
Returns the message ID for the record. If you are using your own messages,
you could do this by having the msg passed to the logger being an ID
rather than a format string. Then, in here, you could use a dictionary
lookup to get the message ID. This version returns 1, which is the base
message ID in win32service.pyd.
The SMTPHandler class, located in the logging.handlers module,
supports sending logging messages to an email address via SMTP.
class logging.handlers.SMTPHandler(mailhost, fromaddr, toaddrs, subject, credentials=None, secure=None)
Returns a new instance of the SMTPHandler class. The instance is
initialized with the from and to addresses and subject line of the email. The
toaddrs should be a list of strings. To specify a non-standard SMTP port, use
the (host, port) tuple format for the mailhost argument. If you use a string,
the standard SMTP port is used. If your SMTP server requires authentication, you
can specify a (username, password) tuple for the credentials argument.
To specify the use of a secure protocol (TLS), pass in a tuple to the
secure argument. This will only be used when authentication credentials are
supplied. The tuple should be either an empty tuple, or a single-value tuple
with the name of a keyfile, or a 2-value tuple with the names of the keyfile
and certificate file. (This tuple is passed to the
smtplib.SMTP.starttls() method.)
The MemoryHandler class, located in the logging.handlers module,
supports buffering of logging records in memory, periodically flushing them to a
target handler. Flushing occurs whenever the buffer is full, or when an
event of a certain severity or greater is seen.
MemoryHandler is a subclass of the more general
BufferingHandler, which is an abstract class. This buffers logging
records in memory. Whenever a record is added to the buffer, a check is made
by calling shouldFlush() to see if the buffer should be flushed. If it
should, then flush() is expected to do the flushing.
class logging.handlers.BufferingHandler(capacity)
Initializes the handler with a buffer of the specified capacity.
Returns true if the buffer is up to capacity. This method can be
overridden to implement custom flushing strategies.
class logging.handlers.MemoryHandler(capacity, flushLevel=ERROR, target=None)
Returns a new instance of the MemoryHandler class. The instance is
initialized with a buffer size of capacity. If flushLevel is not specified,
ERROR is used. If no target is specified, the target will need to be
set using setTarget() before this handler does anything useful.
For a MemoryHandler, flushing means just sending the buffered
records to the target, if there is one. The buffer is also cleared when
this happens. Override if you want different behavior.
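A minimal sketch (the capacity and target are illustrative):

import logging
from logging.handlers import MemoryHandler

console = logging.StreamHandler()
# Buffer up to 100 records; flush to the console when the buffer fills
# or when a record of severity ERROR or higher arrives.
buffered = MemoryHandler(100, flushLevel=logging.ERROR, target=console)
logging.getLogger().addHandler(buffered)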
The HTTPHandler class, located in the logging.handlers module,
supports sending logging messages to a Web server, using either GET or
POST semantics.
class logging.handlers.HTTPHandler(host, url, method='GET', secure=False, credentials=None)
Returns a new instance of the HTTPHandler class. The host can be
of the form host:port, should you need to use a specific port number.
If no method is specified, GET is used. If secure is True, an HTTPS
connection will be used. If credentials is specified, it should be a
2-tuple consisting of userid and password, which will be placed in an HTTP
‘Authorization’ header using Basic authentication. If you specify
credentials, you should also specify secure=True so that your userid and
password are not passed in cleartext across the wire.
Along with the QueueListener class, QueueHandler can be used
to let handlers do their work on a separate thread from the one which does the
logging. This is important in Web applications and also other service
applications where threads servicing clients need to respond as quickly as
possible, while any potentially slow operations (such as sending an email via
SMTPHandler) are done on a separate thread.
Returns a new instance of the QueueHandler class. The instance is
initialized with the queue to send messages to. The queue can be any queue-
like object; it’s used as-is by the enqueue() method, which needs
to know how to send messages to it.
Prepares a record for queuing. The object returned by this
method is enqueued.
The base implementation formats the record to merge the message
and arguments, and removes unpickleable items from the record
in-place.
You might want to override this method if you want to convert
the record to a dict or JSON string, or send a modified copy
of the record while leaving the original intact.
Enqueues the record on the queue using put_nowait(); you may
want to override this if you want to use blocking behaviour, or a
timeout, or a customised queue implementation.
The QueueListener class, located in the logging.handlers
module, supports receiving logging messages from a queue, such as those
implemented in the queue or multiprocessing modules. The
messages are received from a queue in an internal thread and passed, on
the same thread, to one or more handlers for processing. While
QueueListener is not itself a handler, it is documented here
because it works hand-in-hand with QueueHandler.
Along with the QueueHandler class, QueueListener can be used
to let handlers do their work on a separate thread from the one which does the
logging. This is important in Web applications and also other service
applications where threads servicing clients need to respond as quickly as
possible, while any potentially slow operations (such as sending an email via
SMTPHandler) are done on a separate thread.
class logging.handlers.QueueListener(queue, *handlers)
Returns a new instance of the QueueListener class. The instance is
initialized with the queue to send messages to and a list of handlers which
will handle entries placed on the queue. The queue can be any queue-
like object; it’s passed as-is to the dequeue() method, which needs
to know how to get messages from it.
This implementation just returns the passed-in record. You may want to
override this method if you need to do any custom marshalling or
manipulation of the record before passing it to the handlers.
This just loops through the handlers offering them the record
to handle. The actual object passed to the handlers is that which
is returned from prepare().
This asks the thread to terminate, and then waits for it to do so.
Note that if you don’t call this before your application exits, there
may be some records still left on the queue, which won’t be processed.
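Putting the two classes together (a minimal sketch):

import logging
import queue
from logging.handlers import QueueHandler, QueueListener

q = queue.Queue(-1)   # unbounded queue shared by handler and listener
logging.getLogger().addHandler(QueueHandler(q))

# The listener's internal thread pulls records off the queue and
# passes them to the (potentially slow) real handlers.
listener = QueueListener(q, logging.StreamHandler())
listener.start()
logging.warning('handled on the listener thread')
listener.stop()   # flush remaining records before the application exits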
Prompt the user for a password without echoing. The user is prompted using
the string prompt, which defaults to 'Password:'. On Unix, the prompt
is written to the file-like object stream. stream defaults to the
controlling terminal (/dev/tty) or if that is unavailable to
sys.stderr (this argument is ignored on Windows).
If echo-free input is unavailable, getpass() falls back to printing
a warning message to stream, reading from sys.stdin, and
issuing a GetPassWarning.
Availability: Macintosh, Unix, Windows.
Note
If you call getpass from within IDLE, the input may be done in the
terminal you launched IDLE from rather than the IDLE window itself.
Return the “login name” of the user. Availability: Unix, Windows.
This function checks the environment variables LOGNAME,
USER, LNAME and USERNAME, in order, and returns
the value of the first one which is set to a non-empty string. If none are set,
the login name from the password database is returned on systems which support
the pwd module, otherwise, an exception is raised.
curses — Terminal handling for character-cell displays
The curses module provides an interface to the curses library, the
de-facto standard for portable advanced terminal handling.
While curses is most widely used in the Unix environment, versions are available
for DOS, OS/2, and possibly other systems as well. This extension module is
designed to match the API of ncurses, an open-source curses library hosted on
Linux and the BSD variants of Unix.
Note
Since version 5.4, the ncurses library decides how to interpret non-ASCII data
using the nl_langinfo function. That means that you have to call
locale.setlocale() in the application and encode Unicode strings
using one of the system’s available encodings. This example uses the
system’s default encoding:
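import locale
locale.setlocale(locale.LC_ALL, '')
code = locale.getpreferredencoding()
Then use code as the encoding for str.encode() calls.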
Exception raised when a curses library function returns an error.
Note
Whenever x or y arguments to a function or a method are optional, they
default to the current cursor location. Whenever attr is optional, it defaults
to A_NORMAL.
The module curses defines the following functions:
Return the output speed of the terminal in bits per second. On software
terminal emulators it will have a fixed high value. Included for historical
reasons; in former times, it was used to write output loops for time delays and
occasionally to change interfaces depending on the line speed.
Enter cbreak mode. In cbreak mode (sometimes called “rare” mode) normal tty
line buffering is turned off and characters are available to be read one by one.
However, unlike raw mode, special characters (interrupt, quit, suspend, and flow
control) retain their effects on the tty driver and calling program. Calling
first raw() then cbreak() leaves the terminal in cbreak mode.
Return the intensity of the red, green, and blue (RGB) components in the color
color_number, which must be between 0 and COLORS. A 3-tuple is
returned, containing the R,G,B values for the given color, which will be between
0 (no component) and 1000 (maximum amount of component).
Return the attribute value for displaying text in the specified color. This
attribute value can be combined with A_STANDOUT, A_REVERSE,
and the other A_* attributes. pair_number() is the counterpart
to this function.
Set the cursor state. visibility can be set to 0, 1, or 2, for invisible,
normal, or very visible. If the terminal supports the visibility requested, the
previous cursor state is returned; otherwise, an exception is raised. On many
terminals, the “visible” mode is an underline cursor and the “very visible” mode
is a block cursor.
Save the current terminal mode as the “program” mode, the mode when the running
program is using curses. (Its counterpart is the “shell” mode, for when the
program is not in curses.) Subsequent calls to reset_prog_mode() will
restore this mode.
Save the current terminal mode as the “shell” mode, the mode when the running
program is not using curses. (Its counterpart is the “program” mode, when the
program is using curses capabilities.) Subsequent calls to
reset_shell_mode() will restore this mode.
Update the physical screen. The curses library keeps two data structures, one
representing the current physical screen contents and a virtual screen
representing the desired next state. The doupdate() routine updates the
physical screen to match the virtual screen.
The virtual screen may be updated by a noutrefresh() call after write
operations such as addstr() have been performed on a window. The normal
refresh() call is simply noutrefresh() followed by doupdate();
if you have to update multiple windows, you can speed performance and perhaps
reduce screen flicker by issuing noutrefresh() calls on all windows,
followed by a single doupdate().
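For example (a minimal sketch with two windows):

import curses

def main(stdscr):
    win1 = curses.newwin(5, 20, 0, 0)
    win2 = curses.newwin(5, 20, 6, 0)
    win1.addstr(0, 0, 'first window')
    win2.addstr(0, 0, 'second window')
    # Stage both windows on the virtual screen, then update the
    # physical screen with a single doupdate() call.
    win1.noutrefresh()
    win2.noutrefresh()
    curses.doupdate()
    stdscr.getch()

curses.wrapper(main)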
Return the user’s current erase character. Under Unix operating systems this
is a property of the controlling tty of the curses program, and is not set by
the curses library itself.
The filter() routine, if used, must be called before initscr() is
called. The effect is that, during those calls, LINES is set to 1; the
capabilities clear, cup, cud, cud1, cuu1, cuu, vpa are disabled; and the home
string is set to the value of cr. The effect is that the cursor is confined to
the current line, and so are screen updates. This may be used for enabling
character-at-a-time line editing without touching the rest of the screen.
Flash the screen. That is, change it to reverse-video and then change it back
in a short interval. Some people prefer such a ‘visible bell’ to the audible
attention signal produced by beep().
After getch() returns KEY_MOUSE to signal a mouse event, this
method should be called to retrieve the queued mouse event, represented as a
5-tuple (id, x, y, z, bstate). id is an ID value used to distinguish
multiple devices, and x, y, z are the event’s coordinates. (z is
currently unused.) bstate is an integer value whose bits will be set to
indicate the type of event, and will be the bitwise OR of one or more of the
following constants, where n is the button number from 1 to 4:
BUTTONn_PRESSED, BUTTONn_RELEASED, BUTTONn_CLICKED,
BUTTONn_DOUBLE_CLICKED, BUTTONn_TRIPLE_CLICKED,
BUTTON_SHIFT, BUTTON_CTRL, BUTTON_ALT.
Read window related data stored in the file by an earlier putwin() call.
The routine then creates and initializes a new window using that data, returning
the new window object.
Return True if the terminal has insert- and delete-character capabilities.
This function is included for historical reasons only, as all modern software
terminal emulators have such capabilities.
Return True if the terminal has insert- and delete-line capabilities, or can
simulate them using scrolling regions. This function is included for
historical reasons only, as all modern software terminal emulators have such
capabilities.
Used for half-delay mode, which is similar to cbreak mode in that characters
typed by the user are immediately available to the program. However, after
blocking for tenths tenths of a second, an exception is raised if nothing has
been typed. The value of tenths must be a number between 1 and 255. Use
nocbreak() to leave half-delay mode.
Change the definition of a color, taking the number of the color to be changed
followed by three RGB values (for the amounts of red, green, and blue
components). The value of color_number must be between 0 and
COLORS. Each of r, g, b, must be a value between 0 and
1000. When init_color() is used, all occurrences of that color on the
screen immediately change to the new definition. This function is a no-op on
most terminals; it is active only if can_change_color() returns 1.
Change the definition of a color-pair. It takes three arguments: the number of
the color-pair to be changed, the foreground color number, and the background
color number. The value of pair_number must be between 1 and
COLOR_PAIRS-1 (the 0 color pair is wired to white on black and cannot
be changed). The value of fg and bg arguments must be between 0 and
COLORS. If the color-pair was previously initialized, the screen is
refreshed and all occurrences of that color-pair are changed to the new
definition.
Return the name of the key numbered k. The name of a key generating printable
ASCII character is the key’s character. The name of a control-key combination
is a two-character string consisting of a caret followed by the corresponding
printable ASCII character. The name of an alt-key combination (128-255) is a
string consisting of the prefix ‘M-‘ followed by the name of the corresponding
ASCII character.
Return the user’s current line kill character. Under Unix operating systems
this is a property of the controlling tty of the curses program, and is not set
by the curses library itself.
Return a string containing the terminfo long name field describing the current
terminal. The maximum length of a verbose description is 128 characters. It is
defined only after the call to initscr().
Set the maximum time in milliseconds that can elapse between press and release
events in order for them to be recognized as a click, and return the previous
interval value. The default value is 200 msec, or one fifth of a second.
Set the mouse events to be reported, and return a tuple (availmask, oldmask). availmask indicates which of the specified mouse events can be
reported; on complete failure it returns 0. oldmask is the previous value of
the given window’s mouse event mask. If this function is never called, no mouse
events are ever reported.
Create and return a pointer to a new pad data structure with the given number
of lines and columns. A pad is returned as a window object.
A pad is like a window, except that it is not restricted by the screen size, and
is not necessarily associated with a particular part of the screen. Pads can be
used when a large window is needed, and only a part of the window will be on the
screen at one time. Automatic refreshes of pads (such as from scrolling or
echoing of input) do not occur. The refresh() and noutrefresh()
methods of a pad require 6 arguments to specify the part of the pad to be
displayed and the location on the screen to be used for the display. The
arguments are pminrow, pmincol, sminrow, smincol, smaxrow, smaxcol; the p
arguments refer to the upper left corner of the pad region to be displayed and
the s arguments define a clipping box on the screen within which the pad region
is to be displayed.
Enter newline mode. This mode translates the return key into newline on input,
and translates newline into return and line-feed on output. Newline mode is
initially on.
Leave newline mode. Disable translation of return into newline on input, and
disable low-level translation of newline into newline/return on output (but this
does not change the behavior of addch('\n'), which always does the
equivalent of return and line feed on the virtual screen). With translation
off, curses can sometimes speed up vertical motion a little; also, it will be
able to detect the return key on input.
When the noqiflush() routine is used, normal flush of input and output queues
associated with the INTR, QUIT and SUSP characters will not be done. You may
want to call noqiflush() in a signal handler if you want output to
continue as though the interrupt had not occurred, after the handler exits.
Equivalent to tputs(str, 1, putchar); emit the value of a specified
terminfo capability for the current terminal. Note that the output of putp()
always goes to standard output.
If flag is False, the effect is the same as calling noqiflush(). If
flag is True, or no argument is provided, the queues will be flushed when
these control characters are read.
Enter raw mode. In raw mode, normal line buffering and processing of
interrupt, quit, suspend, and flow control keys are turned off; characters are
presented to curses input functions one by one.
Backend function used by resizeterm(), performing most of the work;
when resizing the windows, resize_term() blank-fills the areas that are
extended. The calling application should fill in these areas with
appropriate data. The resize_term() function attempts to resize all
windows. However, due to the calling convention of pads, it is not possible
to resize these without additional interaction with the application.
Resize the standard and current windows to the specified dimensions, and
adjusts other bookkeeping data used by the curses library that record the
window dimensions (in particular the SIGWINCH handler).
Initialize the terminal. termstr is a string giving the terminal name; if
omitted, the value of the TERM environment variable will be used. fd is the
file descriptor to which any initialization sequences will be sent; if not
supplied, the file descriptor for sys.stdout will be used.
Must be called if the programmer wants to use colors, and before any other color
manipulation routine is called. It is good practice to call this routine right
after initscr().
start_color() initializes eight basic colors (black, red, green, yellow,
blue, magenta, cyan, and white), and two global variables in the curses
module, COLORS and COLOR_PAIRS, containing the maximum number
of colors and color-pairs the terminal can support. It also restores the colors
on the terminal to the values they had when the terminal was just turned on.
Return a logical OR of all video attributes supported by the terminal. This
information is useful when a curses program needs complete control over the
appearance of the screen.
Return the value of the Boolean capability corresponding to the terminfo
capability name capname. The value -1 is returned if capname is not a
Boolean capability, or 0 if it is canceled or absent from the terminal
description.
Return the value of the numeric capability corresponding to the terminfo
capability name capname. The value -2 is returned if capname is not a
numeric capability, or -1 if it is canceled or absent from the terminal
description.
Return the value of the string capability corresponding to the terminfo
capability name capname. None is returned if capname is not a string
capability, or is canceled or absent from the terminal description.
Instantiate the string str with the supplied parameters, where str should
be a parameterized string obtained from the terminfo database. E.g.
tparm(tigetstr("cup"), 5, 3) could result in '\033[6;4H', the exact
result depending on terminal type.
Specify that the file descriptor fd be used for typeahead checking. If fd
is -1, then no typeahead checking is done.
The curses library does “line-breakout optimization” by looking for typeahead
periodically while updating the screen. If input is found, and it is coming
from a tty, the current update is postponed until refresh or doupdate is called
again, allowing faster response to commands typed in advance. This function
allows specifying a different file descriptor for typeahead checking.
Return a string which is a printable representation of the character ch.
Control characters are displayed as a caret followed by the character, for
example as ^C. Printing characters are left as they are.
If used, this function should be called before initscr() or newterm are
called. When flag is False, the values of lines and columns specified in the
terminfo database will be used, even if environment variables LINES
and COLUMNS (used by default) are set, or if curses is running in a
window (in which case default behavior would be to use the window size if
LINES and COLUMNS are not set).
Allow use of default values for colors on terminals supporting this feature. Use
this to support transparency in your application. The default color is assigned
to the color number -1. After calling this function,
init_pair(x, curses.COLOR_RED, -1) initializes, for instance, color pair x
to a red foreground color on the default background.
Initialize curses and call another callable object, func, which should be the
rest of your curses-using application. If the application raises an exception,
this function will restore the terminal to a sane state before re-raising the
exception and generating a traceback. The callable object func is then passed
the main window ‘stdscr’ as its first argument, followed by any other arguments
passed to wrapper(). Before calling func, wrapper() turns on
cbreak mode, turns off echo, enables the terminal keypad, and initializes colors
if the terminal has color support. On exit (whether normally or by exception)
it restores cooked mode, turns on echo, and disables the terminal keypad.
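For example (a minimal sketch):

import curses

def main(stdscr):
    # stdscr is the window supplied by wrapper(); cbreak mode, echo off,
    # keypad handling and colors have already been set up.
    stdscr.clear()
    stdscr.addstr(0, 0, 'Hello, curses world!')
    stdscr.refresh()
    stdscr.getch()   # wait for a keypress before restoring the terminal

curses.wrapper(main)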
A character means a C character (an ASCII code), rather than a Python
character (a string of length 1). (This note is true whenever the
documentation mentions a character.) The built-in ord() is handy for
conveying strings to codes.
Paint character ch at (y, x) with attributes attr, overwriting any
character previously painted at that location. By default, the character
position and attributes are the current settings for the window object.
Set the background property of the window to the character ch, with
attributes attr. The change is then applied to every character position in
that window:
The attribute of every character in the window is changed to the new
background attribute.
Wherever the former background character appears, it is changed to the new
background character.
Set the window’s background. A window’s background consists of a character and
any combination of attributes. The attribute part of the background is combined
(OR’ed) with all non-blank characters that are written into the window. Both
the character and attribute parts of the background are combined with the blank
characters. The background becomes a property of the character and moves with
the character through any scrolling and insert/delete line/character operations.
Draw a border around the edges of the window. Each parameter specifies the
character to use for a specific part of the border; see the table below for more
details. The characters can be specified as integers or as one-character
strings.
Note
A 0 value for any parameter will cause the default character to be used for
that parameter. Keyword parameters can not be used. The defaults are listed
in this table:
Set the attributes of num characters at the current cursor position, or at
position (y,x) if supplied. If no value of num is given or num = -1,
the attribute will be set on all the characters to the end of the line. This
function does not move the cursor. The changed line will be touched using the
touchline() method so that the contents will be redisplayed by the next
window refresh.
An abbreviation for “derive window”, derwin() is the same as calling
subwin(), except that begin_y and begin_x are relative to the origin
of the window, rather than relative to the entire screen. Return a window
object for the derived window.
Test whether the given pair of screen-relative character-cell coordinates are
enclosed by the given window, returning True or False. It is useful for
determining what subset of the screen windows enclose the location of a mouse
event.
Get a character. Note that the integer returned does not have to be in ASCII
range: function keys, keypad keys and so on return numbers higher than 256. In
no-delay mode, -1 is returned if there is no input, else getch() waits
until a key is pressed.
Get a character, returning a string instead of an integer, as getch()
does. Function keys, keypad keys and so on return a multibyte string containing
the key name. In no-delay mode, an exception is raised if there is no input.
Return the beginning coordinates of this window relative to its parent window
into two integer variables y and x. Return -1,-1 if this window has no
parent.
If flag is False, curses no longer considers using the hardware insert/delete
character feature of the terminal; if flag is True, use of character insertion
and deletion is enabled. When curses is first initialized, use of character
insert/delete is enabled by default.
If flag is True, any change in the window image automatically causes the
window to be refreshed; you no longer have to call refresh() yourself.
However, it may degrade performance considerably, due to repeated calls to
wrefresh. This option is disabled by default.
Insert nlines lines into the specified window above the current line. The
nlines bottom lines are lost. For negative nlines, delete nlines lines
starting with the one under the cursor, and move the remaining lines up. The
bottom nlines lines are cleared. The current cursor position remains the
same.
Insert a character string (as many characters as will fit on the line) before
the character under the cursor, up to n characters. If n is zero or
negative, the entire string is inserted. All characters to the right of the
cursor are shifted right, with the rightmost characters on the line being lost.
The cursor position does not change (after moving to y, x, if specified).
Insert a character string (as many characters as will fit on the line) before
the character under the cursor. All characters to the right of the cursor are
shifted right, with the rightmost characters on the line being lost. The cursor
position does not change (after moving to y, x, if specified).
Return a string of characters, extracted from the window starting at the
current cursor position, or at y, x if specified. Attributes are stripped
from the characters. If n is specified, instr() returns a string
at most n characters long (exclusive of the trailing NUL).
Return True if the specified line was modified since the last call to
refresh(); otherwise return False. Raise a curses.error
exception if line is not valid for the given window.
If yes is 1, escape sequences generated by some keys (keypad, function keys)
will be interpreted by curses. If yes is 0, escape sequences will be
left as is in the input stream.
If yes is 1, cursor is left where it is on update, instead of being at “cursor
position.” This reduces cursor movement where possible. If possible the cursor
will be made invisible.
If yes is 0, cursor will always be at “cursor position” after an update.
Move the window inside its parent window. The screen-relative parameters of
the window are not changed. This routine is used to display different parts of
the parent window at the same physical position on the screen.
Mark for refresh but wait. This function updates the data structure
representing the desired state of the window, but does not force an update of
the physical screen. To accomplish that, call doupdate().
Overlay the window on top of destwin. The windows need not be the same size,
only the overlapping region is copied. This copy is non-destructive, which means
that the current background character does not overwrite the old contents of
destwin.
To get fine-grained control over the copied region, the second form of
overlay() can be used. sminrow and smincol are the upper-left
coordinates of the source window, and the other variables mark a rectangle in
the destination window.
Overwrite the window on top of destwin. The windows need not be the same size,
in which case only the overlapping region is copied. This copy is destructive,
which means that the current background character overwrites the old contents of
destwin.
To get fine-grained control over the copied region, the second form of
overwrite() can be used. sminrow and smincol are the upper-left
coordinates of the source window, the other variables mark a rectangle in the
destination window.
Update the display immediately (sync actual screen with previous
drawing/deleting methods).
The 6 optional arguments can only be specified when the window is a pad created
with newpad(). The additional parameters are needed to indicate what part
of the pad and screen are involved. pminrow and pmincol specify the upper
left-hand corner of the rectangle to be displayed in the pad. sminrow,
smincol, smaxrow, and smaxcol specify the edges of the rectangle to be
displayed on the screen. The lower right-hand corner of the rectangle to be
displayed in the pad is calculated from the screen coordinates, since the
rectangles must be the same size. Both rectangles must be entirely contained
within their respective structures. Negative values of pminrow, pmincol,
sminrow, or smincol are treated as if they were zero.
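For example, a minimal sketch (the pad size and screen coordinates here are
arbitrary):
import curses

stdscr = curses.initscr()
pad = curses.newpad(100, 100)       # a pad may be larger than the screen
pad.addstr(0, 0, "pad contents")
# Display the part of the pad starting at pad coordinate (0, 0) in the
# screen rectangle whose corners are (5, 5) and (20, 75).
pad.refresh(0, 0, 5, 5, 20, 75)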
Reallocate storage for a curses window to adjust its dimensions to the
specified values. If either dimension is larger than the current values, the
window’s data is filled with blanks that have the current background
rendition (as set by bkgdset()) merged into them.
Control what happens when the cursor of a window is moved off the edge of the
window or scrolling region, either as a result of a newline action on the bottom
line, or typing the last character of the last line. If flag is false, the
cursor is left on the bottom line. If flag is true, the window is scrolled up
one line. Note that in order to get the physical scrolling effect on the
terminal, it is also necessary to call idlok().
Touch each location in the window that has been touched in any of its ancestor
windows. This routine is called by refresh(), so it should almost never
be necessary to call it manually.
Set blocking or non-blocking read behavior for the window. If delay is
negative, blocking read is used (which will wait indefinitely for input). If
delay is zero, then non-blocking read is used, and -1 will be returned by
getch() if no input is waiting. If delay is positive, then
getch() will block for delay milliseconds, and return -1 if there is
still no input at the end of that time.
Pretend count lines have been changed, starting with line start. If
changed is supplied, it specifies whether the affected lines are marked as
having been changed (changed=1) or unchanged (changed=0).
A string representing the current version of the module. Also available as
__version__.
Several constants are available to specify character cell attributes:
A_ALTCHARSET: Alternate character set mode.
A_BLINK: Blink mode.
A_BOLD: Bold mode.
A_DIM: Dim mode.
A_NORMAL: Normal attribute.
A_REVERSE: Reverse background and foreground colors.
A_STANDOUT: Standout mode.
A_UNDERLINE: Underline mode.
Keys are referred to by integer constants with names starting with KEY_.
The exact keycaps available are system dependent.
KEY_MIN: Minimum key value
KEY_BREAK: Break key (unreliable)
KEY_DOWN: Down-arrow
KEY_UP: Up-arrow
KEY_LEFT: Left-arrow
KEY_RIGHT: Right-arrow
KEY_HOME: Home key (upward+left arrow)
KEY_BACKSPACE: Backspace (unreliable)
KEY_F0: Function keys; up to 64 function keys are supported
KEY_Fn: Value of function key n
KEY_DL: Delete line
KEY_IL: Insert line
KEY_DC: Delete character
KEY_IC: Insert char or enter insert mode
KEY_EIC: Exit insert char mode
KEY_CLEAR: Clear screen
KEY_EOS: Clear to end of screen
KEY_EOL: Clear to end of line
KEY_SF: Scroll 1 line forward
KEY_SR: Scroll 1 line backward (reverse)
KEY_NPAGE: Next page
KEY_PPAGE: Previous page
KEY_STAB: Set tab
KEY_CTAB: Clear tab
KEY_CATAB: Clear all tabs
KEY_ENTER: Enter or send (unreliable)
KEY_SRESET: Soft (partial) reset (unreliable)
KEY_RESET: Reset or hard reset (unreliable)
KEY_PRINT: Print
KEY_LL: Home down or bottom (lower left)
KEY_A1: Upper left of keypad
KEY_A3: Upper right of keypad
KEY_B2: Center of keypad
KEY_C1: Lower left of keypad
KEY_C3: Lower right of keypad
KEY_BTAB: Back tab
KEY_BEG: Beg (beginning)
KEY_CANCEL: Cancel
KEY_CLOSE: Close
KEY_COMMAND: Cmd (command)
KEY_COPY: Copy
KEY_CREATE: Create
KEY_END: End
KEY_EXIT: Exit
KEY_FIND: Find
KEY_HELP: Help
KEY_MARK: Mark
KEY_MESSAGE: Message
KEY_MOVE: Move
KEY_NEXT: Next
KEY_OPEN: Open
KEY_OPTIONS: Options
KEY_PREVIOUS: Prev (previous)
KEY_REDO: Redo
KEY_REFERENCE: Ref (reference)
KEY_REFRESH: Refresh
KEY_REPLACE: Replace
KEY_RESTART: Restart
KEY_RESUME: Resume
KEY_SAVE: Save
KEY_SBEG: Shifted Beg (beginning)
KEY_SCANCEL: Shifted Cancel
KEY_SCOMMAND: Shifted Command
KEY_SCOPY: Shifted Copy
KEY_SCREATE: Shifted Create
KEY_SDC: Shifted Delete char
KEY_SDL: Shifted Delete line
KEY_SELECT: Select
KEY_SEND: Shifted End
KEY_SEOL: Shifted Clear line
KEY_SEXIT: Shifted Exit
KEY_SFIND: Shifted Find
KEY_SHELP: Shifted Help
KEY_SHOME: Shifted Home
KEY_SIC: Shifted Input
KEY_SLEFT: Shifted Left arrow
KEY_SMESSAGE: Shifted Message
KEY_SMOVE: Shifted Move
KEY_SNEXT: Shifted Next
KEY_SOPTIONS: Shifted Options
KEY_SPREVIOUS: Shifted Prev
KEY_SPRINT: Shifted Print
KEY_SREDO: Shifted Redo
KEY_SREPLACE: Shifted Replace
KEY_SRIGHT: Shifted Right arrow
KEY_SRSUME: Shifted Resume
KEY_SSAVE: Shifted Save
KEY_SSUSPEND: Shifted Suspend
KEY_SUNDO: Shifted Undo
KEY_SUSPEND: Suspend
KEY_UNDO: Undo
KEY_MOUSE: Mouse event has occurred
KEY_RESIZE: Terminal resize event
KEY_MAX: Maximum key value
On VT100s and their software emulations, such as X terminal emulators, there are
normally at least four function keys (KEY_F1, KEY_F2,
KEY_F3, KEY_F4) available, and the arrow keys mapped to
KEY_UP, KEY_DOWN, KEY_LEFT and KEY_RIGHT in
the obvious way. If your machine has a PC keyboard, it is safe to expect arrow
keys and twelve function keys (older PC keyboards may have only ten function
keys); also, the following keypad mappings are standard:
Insert: KEY_IC
Delete: KEY_DC
Home: KEY_HOME
End: KEY_END
Page Up: KEY_PPAGE
Page Down: KEY_NPAGE
The following table lists characters from the alternate character set. These are
inherited from the VT100 terminal, and will generally be available on software
emulations such as X terminals. When there is no graphic available, curses
falls back on a crude printable ASCII approximation.
Note
These are available only after initscr() has been called.
The curses.textpad module provides a Textbox class that handles
elementary text editing in a curses window, supporting a set of keybindings
resembling those of Emacs (thus, also of Netscape Navigator, BBedit 6.x,
FrameMaker, and many other programs). The module also provides a
rectangle-drawing function useful for framing text boxes or for other purposes.
The module curses.textpad defines the following function:
Draw a rectangle. The first argument must be a window object; the remaining
arguments are coordinates relative to that window. The second and third
arguments are the y and x coordinates of the upper left hand corner of the
rectangle to be drawn; the fourth and fifth arguments are the y and x
coordinates of the lower right hand corner. The rectangle will be drawn using
VT100/IBM PC forms characters on terminals that make this possible (including
xterm and most other software terminal emulators). Otherwise it will be drawn
with ASCII dashes, vertical bars, and plus signs.
Return a textbox widget object. The win argument should be a curses
WindowObject in which the textbox is to be contained. The edit cursor
of the textbox is initially located at the upper left hand corner of the
containing window, with coordinates (0,0). The instance’s
stripspaces flag is initially on.
This is the entry point you will normally use. It accepts editing
keystrokes until one of the termination keystrokes is entered. If
validator is supplied, it must be a function. It will be called for
each keystroke entered with the keystroke as a parameter; command dispatch
is done on the result. This method returns the window contents as a
string; whether blanks in the window are included is affected by the
stripspaces attribute.
This attribute is a flag which controls the interpretation of blanks in
the window. When it is on, trailing blanks on each line are ignored; any
cursor motion that would land the cursor on a trailing blank goes to the
end of that line instead, and trailing blanks are stripped when the window
contents are gathered.
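As a sketch of typical use (the window sizes and prompt text here are
arbitrary), a small message box can be framed with rectangle() and edited
with a Textbox:
import curses
from curses.textpad import Textbox, rectangle

def main(stdscr):
    stdscr.addstr(0, 0, "Enter IM message: (hit Ctrl-G to send)")
    editwin = curses.newwin(5, 30, 2, 1)
    rectangle(stdscr, 1, 0, 1 + 5 + 1, 1 + 30 + 1)
    stdscr.refresh()
    box = Textbox(editwin)
    box.edit()              # let the user edit until Ctrl-G is struck
    return box.gather()     # window contents, as affected by stripspaces

message = curses.wrapper(main)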
The curses.ascii module supplies name constants for ASCII characters and
functions to test membership in various ASCII character classes. The constants
supplied are names for control characters as follows:
NUL: Null
SOH: Start of heading, console interrupt
STX: Start of text
ETX: End of text
EOT: End of transmission
ENQ: Enquiry, goes with ACK flow control
ACK: Acknowledgement
BEL: Bell
BS: Backspace
TAB: Tab
HT: Alias for TAB: "Horizontal tab"
LF: Line feed
NL: Alias for LF: "New line"
VT: Vertical tab
FF: Form feed
CR: Carriage return
SO: Shift-out, begin alternate character set
SI: Shift-in, resume default character set
DLE: Data-link escape
DC1: XON, for flow control
DC2: Device control 2, block-mode flow control
DC3: XOFF, for flow control
DC4: Device control 4
NAK: Negative acknowledgement
SYN: Synchronous idle
ETB: End transmission block
CAN: Cancel
EM: End of medium
SUB: Substitute
ESC: Escape
FS: File separator
GS: Group separator
RS: Record separator, block-mode terminator
US: Unit separator
SP: Space
DEL: Delete
Note that many of these have little practical significance in modern usage. The
mnemonics derive from teleprinter conventions that predate digital computers.
The module supplies the following functions, patterned on those in the standard
C library:
Checks for a non-ASCII character (ordinal values 0x80 and above).
These functions accept either integers or strings; when the argument is a
string, it is first converted using the built-in function ord().
Note that all these functions check ordinal bit values derived from the first
character of the string you pass in; they do not actually know anything about
the host machine’s character encoding. For functions that know about the
character encoding (and handle internationalization properly) see the
string module.
The following two functions take either a single-character string or integer
byte value; they return a value of the same type.
Return a string representation of the ASCII character c. If c is printable,
this string is the character itself. If the character is a control character
(0x00-0x1f) the string consists of a caret ('^') followed by the
corresponding uppercase letter. If the character is an ASCII delete (0x7f) the
string is '^?'. If the character has its meta bit (0x80) set, the meta bit
is stripped, the preceding rules applied, and '!' prepended to the result.
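For example (these functions accept either integers or single-character
strings):
>>> import curses.ascii
>>> curses.ascii.isalnum('a'), curses.ascii.isalnum('@')
(True, False)
>>> curses.ascii.unctrl('\x01')    # control character, caret notation
'^A'
>>> curses.ascii.unctrl(0x7f)      # ASCII delete
'^?'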
A 33-element string array that contains the ASCII mnemonics for the thirty-two
ASCII control characters from 0 (NUL) to 0x1f (US), in order, plus the mnemonic
SP for the space character.
Panels are windows with the added feature of depth, so they can be stacked on
top of each other, and only the visible portions of each window will be
displayed. Panels can be added, moved up or down in the stack, and removed.
Returns a panel object, associating it with the given window win. Be aware
that you need to keep the returned panel object referenced explicitly. If you
don’t, the panel object is garbage collected and removed from the panel stack.
Panel objects, as returned by new_panel() above, are windows with a
stacking order. There’s always a window associated with a panel which determines
the content, while the panel methods are responsible for the window’s depth in
the panel stack.
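A minimal sketch of stacking two panels (the window sizes are arbitrary):
import curses
import curses.panel

stdscr = curses.initscr()
win1 = curses.newwin(10, 40, 2, 2)
win2 = curses.newwin(10, 40, 5, 10)
# Keep references to the panel objects; otherwise they are garbage
# collected and silently removed from the panel stack.
pan1 = curses.panel.new_panel(win1)
pan2 = curses.panel.new_panel(win2)
pan1.top()                    # move pan1 to the top of the stack
curses.panel.update_panels()  # write panel changes to the virtual screen
curses.doupdate()             # and update the physical screen
curses.endwin()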
Queries the given executable (defaults to the Python interpreter binary) for
various architecture information.
Returns a tuple (bits, linkage) which contains information about the bit
architecture and the linkage format used for the executable. Both values are
returned as strings.
Values that cannot be determined are returned as given by the parameter presets.
If bits is given as '', sizeof(pointer) (or sizeof(long) on Python
versions earlier than 1.5.2) is used as an indicator for the supported
pointer size.
The function relies on the system’s file command to do the actual work.
This is available on most if not all Unix platforms and some non-Unix platforms
and then only if the executable points to the Python interpreter. Reasonable
defaults are used when the above needs are not met.
Note
On Mac OS X (and perhaps other platforms), executable files may be
universal files containing multiple architectures.
To get at the “64-bitness” of the current interpreter, it is more
reliable to query the sys.maxsize attribute:
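import sys
is_64bits = sys.maxsize > 2**32   # True on a 64-bit interpreter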
Returns a single string identifying the underlying platform with as much useful
information as possible.
The output is intended to be human readable rather than machine parseable. It
may look different on different platforms and this is intended.
If aliased is true, the function will use aliases for various platforms that
report system names which differ from their common names, for example SunOS will
be reported as Solaris. The system_alias() function is used to implement
this.
Setting terse to true causes the function to return only the absolute minimum
information needed to identify the platform.
An empty string is returned if the value cannot be determined. Note that many
platforms do not provide this information or simply return the same value as for
machine(). NetBSD does this.
Returns (system, release, version) aliased to common marketing names used
for some systems. It also does some reordering of the information in some cases
where it would otherwise cause confusion.
Returns a tuple (release, vendor, vminfo, osinfo) with vminfo being a
tuple (vm_name, vm_release, vm_vendor) and osinfo being a tuple
(os_name, os_version, os_arch). Values which cannot be determined are set to
the defaults given as parameters (which all default to '').
Get additional version information from the Windows Registry and return a tuple
(version, csd, ptype) referring to version number, CSD level and OS type
(multi/single processor).
As a hint: ptype is 'UniprocessorFree' on single processor NT machines
and 'MultiprocessorFree' on multi processor machines. The ‘Free’ refers
to the OS version being free of debugging code. It could also state ‘Checked’
which means the OS version uses debugging code, i.e. code that checks arguments,
ranges, etc.
Note
This function works best with Mark Hammond's
win32all package installed, but also works on Python 2.3 and
later (support for this was added in Python 2.6). It obviously
only runs on Win32 compatible platforms.
Portable popen() interface. Find a working popen implementation
preferring win32pipe.popen(). On Windows NT, win32pipe.popen()
should work; on Windows 9x it hangs due to bugs in the MS C library.
Get Mac OS version information and return it as a tuple
(release, versioninfo, machine), with versioninfo being a tuple
(version, dev_stage, non_release_version). Entries which cannot be determined
are set to ''. All tuple entries are strings.
Tries to determine the name of the Linux OS distribution.
supported_dists may be given to define the set of Linux distributions to
look for. It defaults to a list of currently supported Linux distributions
identified by their release file name.
If full_distribution_name is true (default), the full distribution read
from the OS is returned. Otherwise the short name taken from
supported_dists is used.
Returns a tuple (distname, version, id) which defaults to the args given as
parameters. id is the item in parentheses after the version number. It
is usually the version codename.
Tries to determine the libc version against which the file executable (defaults
to the Python interpreter) is linked. Returns a tuple of strings
(lib, version), which default to the given parameters in case the lookup fails.
Note that this function has intimate knowledge of how different libc versions
add symbols to the executable and is probably only usable for executables
compiled using gcc.
The file is read and scanned in chunks of chunksize bytes.
This module makes available standard errno system symbols. The value of each
symbol is the corresponding integer value. The names and descriptions are
borrowed from linux/include/errno.h, which should be pretty
all-inclusive.
Dictionary providing a mapping from the errno value to the string name in the
underlying system. For instance, errno.errorcode[errno.EPERM] maps to
'EPERM'.
To translate a numeric error code to an error message, use os.strerror().
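For example (the exact message text varies by platform):
>>> import errno, os
>>> errno.errorcode[errno.EPERM]
'EPERM'
>>> os.strerror(errno.EPERM)
'Operation not permitted'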
Of the following list, symbols that are not used on the current platform are not
defined by the module. The specific list of defined symbols is available as
errno.errorcode.keys(). Symbols available can include:
ctypes is a foreign function library for Python. It provides C compatible
data types, and allows calling functions in DLLs or shared libraries. It can be
used to wrap these libraries in pure Python.
Note: The code samples in this tutorial use doctest to make sure that
they actually work. Since some code samples behave differently under Linux,
Windows, or Mac OS X, they contain doctest directives in comments.
Note: Some code samples reference the ctypes c_int type. On 32-bit
systems it is an alias for the c_long type, so you should not be
confused if c_long is printed where you expect c_int;
they are actually the same type.
ctypes exports the cdll, and on Windows windll and oledll
objects, for loading dynamic link libraries.
You load libraries by accessing them as attributes of these objects. cdll
loads libraries which export functions using the standard cdecl calling
convention, while windll libraries call functions using the stdcall
calling convention. oledll also uses the stdcall calling convention, and
assumes the functions return a Windows HRESULT error code. The error
code is used to automatically raise a WindowsError exception when the
function call fails.
Here are some examples for Windows. Note that msvcrt is the MS standard C
library containing most standard C functions, and uses the cdecl calling
convention:
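>>> from ctypes import *
>>> print(windll.kernel32)    # doctest: +WINDOWS
<WinDLL 'kernel32', handle ... at ...>
>>> print(cdll.msvcrt)        # doctest: +WINDOWS
<CDLL 'msvcrt', handle ... at ...>
>>> libc = cdll.msvcrt        # doctest: +WINDOWS
>>>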
Windows appends the usual .dll file suffix automatically.
On Linux, it is required to specify the filename including the extension to
load a library, so attribute access cannot be used to load libraries. Either the
LoadLibrary() method of the dll loaders should be used, or you should load
the library by creating an instance of CDLL by calling the constructor:
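>>> cdll.LoadLibrary("libc.so.6")   # doctest: +LINUX
<CDLL 'libc.so.6', handle ... at ...>
>>> libc = CDLL("libc.so.6")        # doctest: +LINUX
>>> libc                            # doctest: +LINUX
<CDLL 'libc.so.6', handle ... at ...>
>>>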
Functions are accessed as attributes of dll objects:
>>> from ctypes import *
>>> libc.printf
<_FuncPtr object at 0x...>
>>> print(windll.kernel32.GetModuleHandleA) # doctest: +WINDOWS
<_FuncPtr object at 0x...>
>>> print(windll.kernel32.MyOwnFunction) # doctest: +WINDOWS
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "ctypes.py", line 239, in __getattr__
func = _StdcallFuncPtr(name, self)
AttributeError: function 'MyOwnFunction' not found
>>>
Note that win32 system dlls like kernel32 and user32 often export ANSI
as well as UNICODE versions of a function. The UNICODE version is exported with
a W appended to the name, while the ANSI version is exported with an A
appended to the name. The win32 GetModuleHandle function, which returns a
module handle for a given module name, has the following C prototype, and a
macro is used to expose one of them as GetModuleHandle depending on whether
UNICODE is defined or not:
/* ANSI version */
HMODULE GetModuleHandleA(LPCSTR lpModuleName);
/* UNICODE version */
HMODULE GetModuleHandleW(LPCWSTR lpModuleName);
windll does not try to select one of them by magic, you must access the
version you need by specifying GetModuleHandleA or GetModuleHandleW
explicitly, and then call it with bytes or string objects respectively.
Sometimes, dlls export functions with names which aren’t valid Python
identifiers, like "??2@YAPAXI@Z". In this case you have to use
getattr() to retrieve the function:
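>>> getattr(cdll.msvcrt, "??2@YAPAXI@Z")   # doctest: +WINDOWS
<_FuncPtr object at 0x...>
>>>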
You can call these functions like any other Python callable. This example uses
the time() function, which returns system time in seconds since the Unix
epoch, and the GetModuleHandleA() function, which returns a win32 module
handle.
This example calls both functions with a NULL pointer (None should be used
as the NULL pointer):
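>>> print(libc.time(None))   # doctest: +SKIP
1150640792
>>> print(hex(windll.kernel32.GetModuleHandleA(None)))   # doctest: +WINDOWS
0x1d000000
>>>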
ctypes tries to protect you from calling functions with the wrong number
of arguments or the wrong calling convention. Unfortunately this only works on
Windows. It does this by examining the stack after the function returns, so
although an error is raised the function has been called:
>>> windll.kernel32.GetModuleHandleA() # doctest: +WINDOWS
Traceback (most recent call last):
File "<stdin>", line 1, in ?
ValueError: Procedure probably called with not enough arguments (4 bytes missing)
>>> windll.kernel32.GetModuleHandleA(0, 0) # doctest: +WINDOWS
Traceback (most recent call last):
File "<stdin>", line 1, in ?
ValueError: Procedure probably called with too many arguments (4 bytes in excess)
>>>
The same exception is raised when you call an stdcall function with the
cdecl calling convention, or vice versa:
>>> cdll.kernel32.GetModuleHandleA(None) # doctest: +WINDOWS
Traceback (most recent call last):
File "<stdin>", line 1, in ?
ValueError: Procedure probably called with not enough arguments (4 bytes missing)
>>>
>>> windll.msvcrt.printf(b"spam") # doctest: +WINDOWS
Traceback (most recent call last):
File "<stdin>", line 1, in ?
ValueError: Procedure probably called with too many arguments (4 bytes in excess)
>>>
To find out the correct calling convention you have to look into the C header
file or the documentation for the function you want to call.
On Windows, ctypes uses win32 structured exception handling to prevent
crashes from general protection faults when functions are called with invalid
argument values:
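>>> windll.kernel32.GetModuleHandleA(32)   # doctest: +WINDOWS
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
WindowsError: exception: access violation reading 0x00000020
>>>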
There are, however, enough ways to crash Python with ctypes, so you
should be careful anyway.
None, integers, bytes objects and (unicode) strings are the only native
Python objects that can directly be used as parameters in these function calls.
None is passed as a C NULL pointer; bytes objects and strings are passed
as a pointer to the memory block that contains their data (char* or
wchar_t*). Python integers are passed as the platform's default C
int type; their value is masked to fit into the C type.
Before we move on to calling functions with other parameter types, we have to learn
more about ctypes data types.
Assigning a new value to instances of the pointer types c_char_p,
c_wchar_p, and c_void_p changes the memory location they
point to, not the contents of the memory block (of course not, because Python
bytes objects are immutable):
>>> s = "Hello, World"
>>> c_s = c_wchar_p(s)
>>> print(c_s)
c_wchar_p('Hello, World')
>>> c_s.value = "Hi, there"
>>> print(c_s)
c_wchar_p('Hi, there')
>>> print(s) # first object is unchanged
Hello, World
>>>
You should be careful, however, not to pass them to functions expecting pointers
to mutable memory. If you need mutable memory blocks, ctypes has a
create_string_buffer() function which creates these in various ways. The
current memory block contents can be accessed (or changed) with the raw
property; if you want to access it as NUL terminated string, use the value
property:
>>> from ctypes import *
>>> p = create_string_buffer(3) # create a 3 byte buffer, initialized to NUL bytes
>>> print(sizeof(p), repr(p.raw))
3 b'\x00\x00\x00'
>>> p = create_string_buffer(b"Hello") # create a buffer containing a NUL terminated string
>>> print(sizeof(p), repr(p.raw))
6 b'Hello\x00'
>>> print(repr(p.value))
b'Hello'
>>> p = create_string_buffer(b"Hello", 10) # create a 10 byte buffer
>>> print(sizeof(p), repr(p.raw))
10 b'Hello\x00\x00\x00\x00\x00'
>>> p.value = b"Hi"
>>> print(sizeof(p), repr(p.raw))
10 b'Hi\x00lo\x00\x00\x00\x00\x00'
>>>
The create_string_buffer() function replaces the c_buffer() function
(which is still available as an alias), as well as the c_string() function
from earlier ctypes releases. To create a mutable memory block containing
unicode characters of the C type wchar_t use the
create_unicode_buffer() function.
Note that printf prints to the real standard output channel, not to
sys.stdout, so these examples will only work at the console prompt, not
from within IDLE or PythonWin:
>>> printf = libc.printf
>>> printf(b"Hello, %s\n", b"World!")
Hello, World!
14
>>> printf(b"Hello, %S\n", "World!")
Hello, World!
14
>>> printf(b"%d bottles of beer\n", 42)
42 bottles of beer
19
>>> printf(b"%f bottles of beer\n", 42.5)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
ArgumentError: argument 2: exceptions.TypeError: Don't know how to convert parameter 2
>>>
As has been mentioned before, all Python types except integers, strings, and
bytes objects have to be wrapped in their corresponding ctypes type, so
that they can be converted to the required C data type:
>>> printf(b"An int %d, a double %f\n", 1234, c_double(3.14))
An int 1234, a double 3.140000
31
>>>
Calling functions with your own custom data types
You can also customize ctypes argument conversion to allow instances of
your own classes to be used as function arguments. ctypes looks for an
_as_parameter_ attribute and uses this as the function argument. Of
course, it must be an integer, string, or bytes object:
>>> class Bottles:
... def __init__(self, number):
... self._as_parameter_ = number
...
>>> bottles = Bottles(42)
>>> printf(b"%d bottles of beer\n", bottles)
42 bottles of beer
19
>>>
If you don’t want to store the instance’s data in the _as_parameter_
instance variable, you could define a property which makes the
attribute available on request.
Specifying the required argument types (function prototypes)
It is possible to specify the required argument types of functions exported from
DLLs by setting the argtypes attribute.
argtypes must be a sequence of C data types (the printf function is
probably not a good example here, because it takes a variable number of
parameters of different types depending on the format string; on the other hand,
it is quite handy for experimenting with this feature):
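>>> printf.argtypes = [c_char_p, c_char_p, c_int, c_double]
>>> printf(b"String '%s', Int %d, Double %f\n", b"Hi", 10, 2.2)
String 'Hi', Int 10, Double 2.200000
37
>>>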
Specifying a format protects against incompatible argument types (just as a
prototype for a C function), and tries to convert the arguments to valid types:
>>> printf(b"%d %d %d", 1, 2, 3)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
ArgumentError: argument 2: exceptions.TypeError: wrong type
>>> printf(b"%s %d %f\n", b"X", 2, 3)
X 2 3.000000
13
>>>
If you have defined your own classes which you pass to function calls, you have
to implement a from_param() class method for them to be usable in the
argtypes sequence. The from_param() class method receives
the Python object passed to the function call; it should do a typecheck or
whatever is needed to make sure this object is acceptable, and then return the
object itself, its _as_parameter_ attribute, or whatever you want to
pass as the C function argument in this case. Again, the result should be an
integer, string, bytes, a ctypes instance, or an object with an
_as_parameter_ attribute.
By default functions are assumed to return the C int type. Other
return types can be specified by setting the restype attribute of the
function object.
Here is a more advanced example; it uses the strchr function, which expects
a string pointer and a char, and returns a pointer to a string:
>>> strchr = libc.strchr
>>> strchr(b"abcdef", ord("d")) # doctest: +SKIP
8059983
>>> strchr.restype = c_char_p # c_char_p is a pointer to a string
>>> strchr(b"abcdef", ord("d"))
b'def'
>>> print(strchr(b"abcdef", ord("x")))
None
>>>
If you want to avoid the ord("x") calls above, you can set the
argtypes attribute, and the second argument will be converted from a
single character Python bytes object into a C char:
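>>> strchr.restype = c_char_p
>>> strchr.argtypes = [c_char_p, c_char]
>>> strchr(b"abcdef", b"d")
b'def'
>>> strchr(b"abcdef", b"def")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ArgumentError: argument 2: exceptions.TypeError: one character string expected
>>> print(strchr(b"abcdef", b"x"))
None
>>>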
You can also use a callable Python object (a function or a class for example) as
the restype attribute, if the foreign function returns an integer. The
callable will be called with the integer the C function returns, and the
result of this call will be used as the result of your function call. This is
useful to check for error return values and automatically raise an exception:
>>> GetModuleHandle = windll.kernel32.GetModuleHandleA # doctest: +WINDOWS
>>> def ValidHandle(value):
... if value == 0:
... raise WinError()
... return value
...
>>>
>>> GetModuleHandle.restype = ValidHandle # doctest: +WINDOWS
>>> GetModuleHandle(None) # doctest: +WINDOWS
486539264
>>> GetModuleHandle("something silly") # doctest: +WINDOWS
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "<stdin>", line 3, in ValidHandle
WindowsError: [Errno 126] The specified module could not be found.
>>>
WinError is a function which will call the Windows FormatMessage() api to
get the string representation of an error code, and returns an exception.
WinError takes an optional error code parameter; if none is given, it calls
GetLastError() to retrieve it.
Please note that a much more powerful error checking mechanism is available
through the errcheck attribute; see the reference manual for details.
Passing pointers (or: passing parameters by reference)
Sometimes a C api function expects a pointer to a data type as parameter,
probably to write into the corresponding location, or if the data is too large
to be passed by value. This is also known as passing parameters by reference.
ctypes exports the byref() function which is used to pass parameters
by reference. The same effect can be achieved with the pointer() function,
although pointer() does a lot more work since it constructs a real pointer
object, so it is faster to use byref() if you don’t need the pointer
object in Python itself:
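>>> i = c_int()
>>> f = c_float()
>>> s = create_string_buffer(b'\000' * 32)
>>> print(i.value, f.value, repr(s.value))
0 0.0 b''
>>> libc.sscanf(b"1 3.14 Hello", b"%d %f %s",
...             byref(i), byref(f), s)
3
>>> print(i.value, f.value, repr(s.value))
1 3.1400001049041748 b'Hello'
>>>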
Structures and unions must derive from the Structure and Union
base classes which are defined in the ctypes module. Each subclass must
define a _fields_ attribute. _fields_ must be a list of
2-tuples, containing a field name and a field type.
The field type must be a ctypes type like c_int, or any other
derived ctypes type: structure, union, array, pointer.
Here is a simple example of a POINT structure, which contains two integers named
x and y, and also shows how to initialize a structure in the constructor:
>>> from ctypes import *
>>> class POINT(Structure):
... _fields_ = [("x", c_int),
... ("y", c_int)]
...
>>> point = POINT(10, 20)
>>> print(point.x, point.y)
10 20
>>> point = POINT(y=5)
>>> print(point.x, point.y)
0 5
>>> POINT(1, 2, 3)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
ValueError: too many initializers
>>>
You can, however, build much more complicated structures. A structure can itself
contain other structures by using a structure as a field type.
Here is a RECT structure which contains two POINTs named upperleft and
lowerright:
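>>> class RECT(Structure):
...     _fields_ = [("upperleft", POINT),
...                 ("lowerright", POINT)]
...
>>> rc = RECT(point)
>>> print(rc.upperleft.x, rc.upperleft.y)
0 5
>>> print(rc.lowerright.x, rc.lowerright.y)
0 0
>>>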
By default, Structure and Union fields are aligned in the same way the C
compiler does it. It is possible to override this behavior by specifying a
_pack_ class attribute in the subclass definition. This must be set to a
positive integer and specifies the maximum alignment for the fields. This is
what #pragma pack(n) also does in MSVC.
ctypes uses the native byte order for Structures and Unions. To build
structures with non-native byte order, you can use one of the
BigEndianStructure, LittleEndianStructure,
BigEndianUnion, and LittleEndianUnion base classes. These
classes cannot contain pointer fields.
It is possible to create structures and unions containing bit fields. Bit fields
are only possible for integer fields, the bit width is specified as the third
item in the _fields_ tuples:
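>>> class Int(Structure):
...     _fields_ = [("first_16", c_int, 16),
...                 ("second_16", c_int, 16)]
...
>>> print(Int.first_16)
<Field type=c_long, ofs=0:0, bits=16>
>>> print(Int.second_16)
<Field type=c_long, ofs=0:16, bits=16>
>>>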
It is also possible to use indexes different from 0, but you must know what
you’re doing, just as in C: You can access or change arbitrary memory locations.
Generally you only use this feature if you receive a pointer from a C function,
and you know that the pointer actually points to an array instead of a single
item.
Behind the scenes, the pointer() function does more than simply create
pointer instances, it has to create pointer types first. This is done with the
POINTER() function, which accepts any ctypes type, and returns a
new type:
>>> PI = POINTER(c_int)
>>> PI
<class 'ctypes.LP_c_long'>
>>> PI(42)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: expected c_long instead of int
>>> PI(c_int(42))
<ctypes.LP_c_long object at 0x...>
>>>
Calling the pointer type without an argument creates a NULL pointer.
NULL pointers have a False boolean value:
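>>> null_ptr = POINTER(c_int)()
>>> print(bool(null_ptr))
False
>>>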
Usually, ctypes does strict type checking. This means, if you have
POINTER(c_int) in the argtypes list of a function or as the type of
a member field in a structure definition, only instances of exactly the same
type are accepted. There are some exceptions to this rule, where ctypes accepts
other objects. For example, you can pass compatible array instances instead of
pointer types. So, for POINTER(c_int), ctypes accepts an array of c_int:
>>> class Bar(Structure):
... _fields_ = [("count", c_int), ("values", POINTER(c_int))]
...
>>> bar = Bar()
>>> bar.values = (c_int * 3)(1, 2, 3)
>>> bar.count = 3
>>> for i in range(bar.count):
... print(bar.values[i])
...
1
2
3
>>>
To set a POINTER type field to NULL, you can assign None:
>>> bar.values = None
>>>
Sometimes you have instances of incompatible types. In C, you can cast one type
into another type. ctypes provides a cast() function which can be
used in the same way. The Bar structure defined above accepts
POINTER(c_int) pointers or c_int arrays for its values field,
but not instances of other types:
>>> bar.values = (c_byte * 4)()
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: incompatible types, c_byte_Array_4 instance instead of LP_c_long instance
>>>
The cast() function can be used to cast a ctypes instance into a pointer
to a different ctypes data type. cast() takes two parameters, a ctypes
object that is or can be converted to a pointer of some kind, and a ctypes
pointer type. It returns an instance of the second argument, which references
the same memory block as the first argument:
>>> a = (c_byte * 4)()
>>> cast(a, POINTER(c_int))
<ctypes.LP_c_long object at ...>
>>>
So, cast() can be used to assign to the values field of the Bar
structure:
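>>> bar = Bar()
>>> bar.values = cast((c_byte * 4)(), POINTER(c_int))
>>> print(bar.values[0])
0
>>>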
Incomplete Types are structures, unions or arrays whose members are not yet
specified. In C, they are specified by forward declarations that are defined
later.
The straightforward translation into ctypes code would be this, but it does not
work:
>>> class cell(Structure):
... _fields_ = [("name", c_char_p),
... ("next", POINTER(cell))]
...
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "<stdin>", line 2, in cell
NameError: name 'cell' is not defined
>>>
because the new class cell is not available in the class statement itself.
In ctypes, we can define the cell class and set the _fields_
attribute later, after the class statement:
>>> from ctypes import *
>>> class cell(Structure):
... pass
...
>>> cell._fields_ = [("name", c_char_p),
... ("next", POINTER(cell))]
>>>
Let's try it. We create two instances of cell, let them point to each
other, and finally follow the pointer chain a few times:
>>> c1 = cell()
>>> c1.name = b"foo"
>>> c2 = cell()
>>> c2.name = b"bar"
>>> c1.next = pointer(c2)
>>> c2.next = pointer(c1)
>>> p = c1
>>> for i in range(8):
...     print(p.name.decode(), end=" ")
...     p = p.next[0]
...
foo bar foo bar foo bar foo bar
>>>
ctypes allows creating C callable function pointers from Python callables.
These are sometimes called callback functions.
First, you must create a class for the callback function. The class knows the
calling convention, the return type, and the number and types of arguments this
function will receive.
The CFUNCTYPE factory function creates types for callback functions using the
normal cdecl calling convention, and, on Windows, the WINFUNCTYPE factory
function creates types for callback functions using the stdcall calling
convention.
Both of these factory functions are called with the result type as first
argument, and the callback functions expected argument types as the remaining
arguments.
I will present an example here which uses the standard C library's
qsort() function; it sorts items with the help of a callback
function. qsort() will be used to sort an array of integers:
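>>> IntArray5 = c_int * 5
>>> ia = IntArray5(5, 1, 7, 33, 99)
>>> qsort = libc.qsort
>>> qsort.restype = None
>>>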
qsort() must be called with a pointer to the data to sort, the number of
items in the data array, the size of one item, and a pointer to the comparison
function, the callback. The callback will then be called with two pointers to
items, and it must return a negative integer if the first item is smaller than
the second, zero if they are equal, and a positive integer otherwise.
So our callback function receives pointers to integers, and must return an
integer. First we create the type for the callback function:
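>>> CMPFUNC = CFUNCTYPE(c_int, POINTER(c_int), POINTER(c_int))
>>>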
For the first implementation of the callback function, we simply print the
arguments we get, and return 0 (incremental development ;-):
>>> def py_cmp_func(a, b):
... print("py_cmp_func", a, b)
... return 0
...
>>>
Create the C callable callback:
>>> cmp_func = CMPFUNC(py_cmp_func)
>>>
And we’re ready to go:
>>> qsort(ia, len(ia), sizeof(c_int), cmp_func) # doctest: +WINDOWS
py_cmp_func <ctypes.LP_c_long object at 0x00...> <ctypes.LP_c_long object at 0x00...>
py_cmp_func <ctypes.LP_c_long object at 0x00...> <ctypes.LP_c_long object at 0x00...>
py_cmp_func <ctypes.LP_c_long object at 0x00...> <ctypes.LP_c_long object at 0x00...>
py_cmp_func <ctypes.LP_c_long object at 0x00...> <ctypes.LP_c_long object at 0x00...>
py_cmp_func <ctypes.LP_c_long object at 0x00...> <ctypes.LP_c_long object at 0x00...>
py_cmp_func <ctypes.LP_c_long object at 0x00...> <ctypes.LP_c_long object at 0x00...>
py_cmp_func <ctypes.LP_c_long object at 0x00...> <ctypes.LP_c_long object at 0x00...>
py_cmp_func <ctypes.LP_c_long object at 0x00...> <ctypes.LP_c_long object at 0x00...>
py_cmp_func <ctypes.LP_c_long object at 0x00...> <ctypes.LP_c_long object at 0x00...>
py_cmp_func <ctypes.LP_c_long object at 0x00...> <ctypes.LP_c_long object at 0x00...>
>>>
We know how to access the contents of a pointer, so let's redefine our callback:
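>>> def py_cmp_func(a, b):
...     print("py_cmp_func", a[0], b[0])
...     return a[0] - b[0]     # negative, zero, or positive comparison result
...
>>> qsort(ia, len(ia), sizeof(c_int), CMPFUNC(py_cmp_func))   # doctest: +SKIP
>>>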
It is quite interesting to see that the Windows qsort() function needs
more comparisons than the Linux version!
As we can easily check, our array is sorted now:
>>> for i in ia: print(i, end=" ")
...
1 5 7 33 99
>>>
Important note for callback functions:
Make sure you keep references to CFUNCTYPE objects as long as they are used from
C code. ctypes doesn't keep them alive itself, and if you don't, they may be garbage collected,
crashing your program when a callback is made.
Some shared libraries not only export functions, they also export variables. An
example in the Python library itself is the Py_OptimizeFlag, an integer
set to 0, 1, or 2, depending on the -O or -OO flag given on
startup.
ctypes can access values like this with the in_dll() class methods of
the type. pythonapi is a predefined symbol giving access to the Python C
api:
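>>> opt_flag = c_int.in_dll(pythonapi, "Py_OptimizeFlag")
>>> print(opt_flag)
c_long(0)
>>>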
If the interpreter had been started with -O, the sample would
have printed c_long(1), or c_long(2) if -OO had been
specified.
An extended example which also demonstrates the use of pointers accesses the
PyImport_FrozenModules pointer exported by Python.
Quoting the docs for that value:
This pointer is initialized to point to an array of struct_frozen
records, terminated by one whose members are all NULL or zero. When a frozen
module is imported, it is searched in this table. Third-party code could play
tricks with this to provide a dynamically created collection of frozen modules.
So manipulating this pointer could even prove useful. To restrict the example
size, we show only how this table can be read with ctypes:
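>>> from ctypes import *
>>> class struct_frozen(Structure):
...     _fields_ = [("name", c_char_p),
...                 ("code", POINTER(c_ubyte)),
...                 ("size", c_int)]
...
>>> FrozenTable = POINTER(struct_frozen)
>>> table = FrozenTable.in_dll(pythonapi, "PyImport_FrozenModules")
>>>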
Since table is a pointer to the array of struct_frozen records, we
can iterate over it, but we just have to make sure that our loop terminates,
because pointers have no size. Sooner or later it would probably crash with an
access violation or whatever, so it’s better to break out of the loop when we
hit the NULL entry:
>>> for item in table:
... print(item.name, item.size)
... if item.name is None:
... break
...
__hello__ 104
__phello__ -104
__phello__.spam 104
None 0
>>>
The fact that standard Python has a frozen module and a frozen package
(indicated by the negative size member) is not well known; it is only used for
testing. Try it out with import __hello__, for example.
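The discussion below assumes, as a sketch, a POINT structure with x and y
fields and a RECT structure with two POINT fields named a and b:
>>> from ctypes import *
>>> class POINT(Structure):
...     _fields_ = ("x", c_int), ("y", c_int)
...
>>> class RECT(Structure):
...     _fields_ = ("a", POINT), ("b", POINT)
...
>>> p1 = POINT(1, 2)
>>> p2 = POINT(3, 4)
>>> rc = RECT(p1, p2)
>>> print(rc.a.x, rc.a.y, rc.b.x, rc.b.y)
1 2 3 4
>>> # swap the two points; this executes roughly as
>>> # temp0 = rc.b; temp1 = rc.a; rc.a = temp0; rc.b = temp1
>>> rc.a, rc.b = rc.b, rc.a
>>> print(rc.a.x, rc.a.y, rc.b.x, rc.b.y)
3 4 3 4
>>>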
Note that temp0 and temp1 are objects still using the internal buffer of
the rc object above. So executing rc.a = temp0 copies the buffer
contents of temp0 into rc's buffer. This, in turn, changes the
contents of temp1. So the last assignment, rc.b = temp1, doesn't have
the expected effect.
Keep in mind that retrieving sub-objects from Structures, Unions, and Arrays
doesn't copy the sub-object; instead it retrieves a wrapper object accessing
the root object's underlying buffer.
Another example that may behave differently from what one would expect is this:
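>>> s = c_char_p()
>>> s.value = b"abc def ghi"
>>> s.value
b'abc def ghi'
>>> s.value is s.value
False
>>>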
Why is it printing False? ctypes instances are objects containing a memory
block plus some descriptors accessing the contents of the memory.
Storing a Python object in the memory block does not store the object itself,
instead, the contents of the object are stored. Accessing the contents again
constructs a new Python object each time!
ctypes provides some support for variable-sized arrays and structures.
The resize() function can be used to resize the memory buffer of an
existing ctypes object. The function takes the object as first argument, and
the requested size in bytes as the second argument. The memory block cannot be
made smaller than the natural memory block specified by the object's type; a
ValueError is raised if this is tried:
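>>> short_array = (c_short * 4)()
>>> print(sizeof(short_array))
8
>>> resize(short_array, 4)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: minimum size is 8
>>> resize(short_array, 32)
>>> sizeof(short_array)
8
>>> sizeof(type(short_array))
8
>>>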
This is nice and fine, but how would one access the additional elements
contained in this array? Since the type still only knows about 4 elements, we
get errors accessing other elements:
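>>> short_array[:]
[0, 0, 0, 0]
>>> short_array[7]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
IndexError: invalid index
>>>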
Another way to use variable-sized data types with ctypes is to use the
dynamic nature of Python, and (re-)define the data type after the required size
is already known, on a case by case basis.
When programming in a compiled language, shared libraries are accessed when
compiling/linking a program, and when the program is run.
The purpose of the find_library() function is to locate a library in a way
similar to what the compiler does (on platforms with several versions of a
shared library, the most recent should be loaded), while the ctypes library
loaders act as a running program does and call the runtime loader directly.
The ctypes.util module provides a function which can help to determine
the library to load.
ctypes.util.find_library(name)
Try to find a library and return a pathname. name is the library name without
any prefix like lib, suffix like .so, .dylib or version number (this
is the form used for the posix linker option -l). If no library can
be found, returns None.
The exact functionality is system dependent.
On Linux, find_library() tries to run external programs
(/sbin/ldconfig, gcc, and objdump) to find the library file. It
returns the filename of the library file. Here are some examples:
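>>> from ctypes.util import find_library   # doctest: +LINUX
>>> find_library("m")     # exact filenames vary from system to system
'libm.so.6'
>>> find_library("c")
'libc.so.6'
>>> find_library("bz2")
'libbz2.so.1.0'
>>>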
On Windows, find_library() searches along the system search path, and
returns the full pathname, but since there is no predefined naming scheme a call
like find_library("c") will fail and return None.
If wrapping a shared library with ctypes, it may be better to determine
the shared library name at development time, and hardcode that into the wrapper
module instead of using find_library() to locate the library at runtime.
There are several ways to load shared libraries into the Python process. One
way is to instantiate one of the following classes:
class ctypes.CDLL(name, mode=DEFAULT_MODE, handle=None, use_errno=False, use_last_error=False)
Instances of this class represent loaded shared libraries. Functions in these
libraries use the standard C calling convention, and are assumed to return
int.
class ctypes.OleDLL(name, mode=DEFAULT_MODE, handle=None, use_errno=False, use_last_error=False)
Windows only: Instances of this class represent loaded shared libraries,
functions in these libraries use the stdcall calling convention, and are
assumed to return the windows specific HRESULT code. HRESULT
values contain information specifying whether the function call failed or
succeeded, together with an additional error code. If the return value signals a
failure, a WindowsError is automatically raised.
class ctypes.WinDLL(name, mode=DEFAULT_MODE, handle=None, use_errno=False, use_last_error=False)
Windows only: Instances of this class represent loaded shared libraries,
functions in these libraries use the stdcall calling convention, and are
assumed to return int by default.
On Windows CE only the standard calling convention is used; for convenience,
WinDLL and OleDLL use the standard calling convention on this
platform.
The Python global interpreter lock is released before calling any
function exported by these libraries, and reacquired afterwards.
class ctypes.PyDLL(name, mode=DEFAULT_MODE, handle=None)
Instances of this class behave like CDLL instances, except that the
Python GIL is not released during the function call, and after the function
execution the Python error flag is checked. If the error flag is set, a Python
exception is raised.
Thus, this is only useful to call Python C api functions directly.
All these classes can be instantiated by calling them with at least one
argument, the pathname of the shared library. If you have an existing handle to
an already loaded shared library, it can be passed as the handle named
parameter; otherwise the underlying platform's dlopen or LoadLibrary
function is used to load the library into the process, and to get a handle to
it.
The mode parameter can be used to specify how the library is loaded. For
details, consult the dlopen(3) manpage; on Windows, mode is
ignored.
The use_errno parameter, when set to True, enables a ctypes mechanism that
allows accessing the system errno error number in a safe way.
ctypes maintains a thread-local copy of the system's errno
variable; if you call foreign functions created with use_errno=True, then the
errno value before the function call is swapped with the ctypes private
copy, and the same happens immediately after the function call.
The function ctypes.get_errno() returns the value of the ctypes private
copy, and the function ctypes.set_errno() changes the ctypes private copy
to a new value and returns the former value.
The use_last_error parameter, when set to True, enables the same mechanism for
the Windows error code which is managed by the GetLastError() and
SetLastError() Windows API functions; ctypes.get_last_error() and
ctypes.set_last_error() are used to request and change the ctypes private
copy of the windows error code.
ctypes.RTLD_GLOBAL
Flag to use as mode parameter. On platforms where this flag is not available,
it is defined as the integer zero.
ctypes.RTLD_LOCAL
Flag to use as mode parameter. On platforms where this is not available, it
is the same as RTLD_GLOBAL.
ctypes.DEFAULT_MODE
The default mode which is used to load shared libraries. On OSX 10.3, this is
RTLD_GLOBAL, otherwise it is the same as RTLD_LOCAL.
Instances of these classes have no public methods; however, __getattr__()
and __getitem__() have special behavior: functions exported by the shared
library can be accessed as attributes or by index. Please note that both
__getattr__() and __getitem__() cache their result, so calling them
repeatedly returns the same object each time.
The following public attributes are available; their names start with an
underscore so as not to clash with exported function names:
The name of the library passed in the constructor.
Shared libraries can also be loaded by using one of the prefabricated objects,
which are instances of the LibraryLoader class, either by calling the
LoadLibrary() method, or by retrieving the library as an attribute of the
loader instance.
Class which loads shared libraries. dlltype should be one of the
CDLL, PyDLL, WinDLL, or OleDLL types.
__getattr__() has special behavior: it allows loading a shared library by
accessing it as an attribute of a library loader instance. The result is cached,
so repeated attribute accesses return the same library each time.
For accessing the C Python api directly, a ready-to-use Python shared library
object is available:
ctypes.pythonapi
An instance of PyDLL that exposes Python C API functions as
attributes. Note that all these functions are assumed to return C
int, which is of course not always the case, so you have to assign
the correct restype attribute to use these functions.
As explained in the previous section, foreign functions can be accessed as
attributes of loaded shared libraries. The function objects created in this way
by default accept any number of arguments, accept any ctypes data instances as
arguments, and return the default result type specified by the library loader.
They are instances of a private class:
Assign a ctypes type to specify the result type of the foreign function.
Use None for void, a function not returning anything.
It is possible to assign a callable Python object that is not a ctypes
type; in this case the function is assumed to return a C int, and
the callable will be called with this integer, allowing further
processing or error checking. Using this is deprecated; for more flexible
post-processing or error checking use a ctypes data type as
restype and assign a callable to the errcheck attribute.
Assign a tuple of ctypes types to specify the argument types that the
function accepts. Functions using the stdcall calling convention can
only be called with the same number of arguments as the length of this
tuple; functions using the C calling convention accept additional,
unspecified arguments as well.
When a foreign function is called, each actual argument is passed to the
from_param() class method of the items in the argtypes
tuple; this method allows adapting the actual argument to an object that
the foreign function accepts. For example, a c_char_p item in
the argtypes tuple will convert a string passed as argument into
a bytes object using ctypes conversion rules.
New: It is now possible to put items in argtypes which are not ctypes
types, but each item must have a from_param() method which returns a
value usable as argument (integer, string, ctypes instance). This allows
defining adapters that can adapt custom objects as function parameters.
Assign a Python function or another callable to this attribute. The
callable will be called with three or more arguments:
callable(result, func, arguments)
result is what the foreign function returns, as specified by the
restype attribute.
func is the foreign function object itself; this allows reusing the
same callable object to check or post-process the results of several
functions.
arguments is a tuple containing the parameters originally passed to
the function call; this allows specializing the behavior on the
arguments used.
The object that this function returns will be returned from the
foreign function call, but it can also check the result value
and raise an exception if the foreign function call failed.
Foreign functions can also be created by instantiating function prototypes.
Function prototypes are similar to function prototypes in C; they describe a
function (return type, argument types, calling convention) without defining an
implementation. The factory functions must be called with the desired result
type and the argument types of the function.
The returned function prototype creates functions that use the standard C
calling convention. The function will release the GIL during the call. If
use_errno is set to True, the ctypes private copy of the system
errno variable is exchanged with the real errno value before
and after the call; use_last_error does the same for the Windows error
code.
Windows only: The returned function prototype creates functions that use the
stdcall calling convention, except on Windows CE where
WINFUNCTYPE() is the same as CFUNCTYPE(). The function will
release the GIL during the call. use_errno and use_last_error have the
same meaning as above.
The returned function prototype creates functions that use the Python calling
convention. The function will not release the GIL during the call.
Function prototypes created by these factory functions can be instantiated in
different ways, depending on the type and number of the parameters in the call:
prototype(address)
Returns a foreign function at the specified address which must be an integer.
prototype(callable)
Create a C callable function (a callback function) from a Python callable.
prototype(func_spec[, paramflags])
Returns a foreign function exported by a shared library. func_spec must
be a 2-tuple (name_or_ordinal,library). The first item is the name of
the exported function as string, or the ordinal of the exported function
as small integer. The second item is the shared library instance.
prototype(vtbl_index, name[, paramflags[, iid]])
Returns a foreign function that will call a COM method. vtbl_index is
the index into the virtual function table, a small non-negative
integer. name is the name of the COM method. iid is an optional pointer to
the interface identifier which is used in extended error reporting.
COM methods use a special calling convention: They require a pointer to
the COM interface as first argument, in addition to those parameters that
are specified in the argtypes tuple.
The optional paramflags parameter creates foreign function wrappers with much
more functionality than the features described above.
paramflags must be a tuple of the same length as argtypes.
Each item in this tuple contains further information about a parameter; it must
be a tuple containing one, two, or three items.
The first item is an integer containing a combination of direction
flags for the parameter:
1: Specifies an input parameter to the function.
2: Output parameter. The foreign function fills in a value.
4: Input parameter which defaults to the integer zero.
The optional second item is the parameter name as string. If this is specified,
the foreign function can be called with named parameters.
The optional third item is the default value for this parameter.
This example demonstrates how to wrap the Windows MessageBoxA function so
that it supports default parameters and named arguments. The C declaration in
the windows header file is, roughly, int WINAPI MessageBoxA(HWND hWnd,
LPCSTR lpText, LPCSTR lpCaption, UINT uType).
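A sketch of such a wrapper (the default text and caption values here are
arbitrary):
>>> from ctypes import c_int, WINFUNCTYPE, windll
>>> from ctypes.wintypes import HWND, LPCSTR, UINT
>>> prototype = WINFUNCTYPE(c_int, HWND, LPCSTR, LPCSTR, UINT)
>>> paramflags = (1, "hwnd", 0), (1, "text", b"Hi"), (1, "caption", None), (1, "flags", 0)
>>> MessageBox = prototype(("MessageBoxA", windll.user32), paramflags)
>>> MessageBox(text=b"Spam, spam, spam")   # doctest: +SKIP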
A second example demonstrates output parameters. The win32 GetWindowRect
function retrieves the dimensions of a specified window by copying them into a
RECT structure that the caller has to supply. Its C declaration is,
roughly, BOOL WINAPI GetWindowRect(HWND hWnd, LPRECT lpRect).
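>>> from ctypes import POINTER, WINFUNCTYPE, windll, WinError
>>> from ctypes.wintypes import BOOL, HWND, RECT
>>> prototype = WINFUNCTYPE(BOOL, HWND, POINTER(RECT))
>>> paramflags = (1, "hwnd"), (2, "lprect")
>>> GetWindowRect = prototype(("GetWindowRect", windll.user32), paramflags)
>>>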
Functions with output parameters will automatically return the output parameter
value if there is a single one, or a tuple containing the output parameter
values when there are more than one, so the GetWindowRect function now returns a
RECT instance, when called.
Output parameters can be combined with the errcheck protocol to do
further output processing and error checking. The win32 GetWindowRect api
function returns a BOOL to signal success or failure, so this function could
do the error checking and raise an exception when the api call failed:
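>>> def errcheck(result, func, args):
...     if not result:
...         raise WinError()
...     return args
...
>>> GetWindowRect.errcheck = errcheck
>>>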
If the errcheck function returns the argument tuple it receives
unchanged, ctypes continues the normal processing it does on the output
parameters. If you want to return a tuple of window coordinates instead of a
RECT instance, you can retrieve the fields in the function and return them
instead, the normal processing will no longer take place:
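>>> def errcheck(result, func, args):
...     if not result:
...         raise WinError()
...     rc = args[1]
...     return rc.left, rc.top, rc.bottom, rc.right
...
>>> GetWindowRect.errcheck = errcheck
>>>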
Returns a light-weight pointer to obj, which must be an instance of a
ctypes type. offset defaults to zero, and must be an integer that will be
added to the internal pointer value.
byref(obj, offset) corresponds to this C code:
(((char *)&obj) + offset)
The returned object can only be used as a foreign function call parameter.
It behaves similarly to pointer(obj), but the construction is a lot faster.
This function is similar to the cast operator in C. It returns a new instance
of type which points to the same memory block as obj. type must be a
pointer type, and obj must be an object that can be interpreted as a
pointer.
This function creates a mutable character buffer. The returned object is a
ctypes array of c_char.
init_or_size must be an integer which specifies the size of the array, or a
bytes object which will be used to initialize the array items.
If a bytes object is specified as first argument, the buffer is made one item
larger than its length so that the last element in the array is a NUL
termination character. An integer can be passed as second argument which allows
specifying the size of the array if the length of the bytes should not be used.
If the first parameter is a string, it is converted into a bytes object
according to ctypes conversion rules.
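A small sketch of the three calling styles:

from ctypes import create_string_buffer, sizeof

buf = create_string_buffer(3)             # 3 NUL bytes
buf = create_string_buffer(b"Hello")      # 6 bytes, trailing NUL added
print(sizeof(buf), repr(buf.raw))         # 6 b'Hello\x00'
buf = create_string_buffer(b"Hello", 10)  # explicit size of 10 bytes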
This function creates a mutable unicode character buffer. The returned object is
a ctypes array of c_wchar.
init_or_size must be an integer which specifies the size of the array, or a
string which will be used to initialize the array items.
If a string is specified as the first argument, the buffer is made one item
larger than the length of the string so that the last element in the array is
a NUL termination character. An integer can be passed as the second argument
to specify the size of the array if the length of the string should not be
used.
If the first parameter is a bytes object, it is converted into a unicode
string according to ctypes conversion rules.
Windows only: This function is a hook which makes it possible to implement in-process
COM servers with ctypes. It is called from the DllCanUnloadNow function that
the _ctypes extension dll exports.
Windows only: This function is a hook which makes it possible to implement in-process
COM servers with ctypes. It is called from the DllGetClassObject function
that the _ctypes extension dll exports.
Try to find a library and return a pathname. name is the library name
without any prefix like lib, suffix like .so, .dylib or version
number (this is the form used for the posix linker option -l). If
no library can be found, returns None.
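For example, on a typical Linux system (exact results vary by platform and
installed libraries):

from ctypes.util import find_library

print(find_library("m"))    # e.g. 'libm.so.6'
print(find_library("c"))    # e.g. 'libc.so.6'
print(find_library("bz2"))  # e.g. 'libbz2.so.1.0', or None if absent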
Windows only: return the filename of the VC runtime library used by Python,
and by the extension modules. If the name of the library cannot be
determined, None is returned.
If you need to free memory that was, for example, allocated by an extension
module with a call to free(void *), it is important that you use the free
function from the same library that allocated the memory.
Windows only: Returns a textual description of the error code code. If no
error code is specified, the last error code is used by calling the Windows
api function GetLastError.
Windows only: Returns the last error code set by Windows in the calling thread.
This function calls the Windows GetLastError() function directly,
it does not return the ctypes-private copy of the error code.
Same as the standard C memmove library function: copies count bytes from
src to dst. dst and src must be integers or ctypes instances that can
be converted to pointers.
Same as the standard C memset library function: fills the memory block at
address dst with count bytes of value c. dst must be an integer
specifying an address, or a ctypes instance.
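A small sketch of both helpers operating on a mutable buffer:

from ctypes import create_string_buffer, memmove, memset

buf = create_string_buffer(10)
memmove(buf, b"Hello", 5)    # copy 5 bytes into the buffer
memset(buf, 0x2a, 3)         # overwrite the first 3 bytes with b'*'
print(buf.raw)               # b'***lo\x00\x00\x00\x00\x00'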
This factory function creates and returns a new ctypes pointer type. Pointer
types are cached and reused internally, so calling this function repeatedly is
cheap. type must be a ctypes type.
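The caching means repeated calls return the very same type object, as this
small sketch illustrates:

from ctypes import POINTER, c_int

LP_c_int = POINTER(c_int)
assert POINTER(c_int) is LP_c_int    # cached, not recreated
p = LP_c_int(c_int(42))              # pointer instance to a c_int
print(p.contents.value)              # 42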
This function resizes the internal memory buffer of obj, which must be an
instance of a ctypes type. It is not possible to make the buffer smaller
than the native size of the object's type, as given by sizeof(type(obj)),
but it is possible to enlarge the buffer.
Windows only: set the current value of the ctypes-private copy of the system
LastError variable in the calling thread to value and return the
previous value.
This function returns the C string starting at memory address address as a
bytes object. If size is specified, it is used as the size; otherwise the
string is assumed to be zero-terminated.
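For example (a sketch using a buffer created in-process):

from ctypes import addressof, create_string_buffer, string_at

buf = create_string_buffer(b"Hello world")
print(string_at(addressof(buf)))       # b'Hello world' (stops at NUL)
print(string_at(addressof(buf), 5))    # b'Hello'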
Windows only: this function is probably the worst-named thing in ctypes. It
creates an instance of WindowsError. If code is not specified,
GetLastError is called to determine the error code. If descr is not
specified, FormatError() is called to get a textual description of the
error.
This function returns the wide character string starting at memory address
address as a string. If size is specified, it is used as the number of
characters of the string, otherwise the string is assumed to be
zero-terminated.
This non-public class is the common base class of all ctypes data types.
Among other things, all ctypes type instances contain a memory block that
holds C compatible data; the address of the memory block is returned by the
addressof() helper function. Another instance variable is exposed as
_objects; this contains other Python objects that need to be kept
alive in case the memory block contains pointers.
Common methods of ctypes data types; these are all class methods (to be
exact, they are methods of the metaclass):
This method returns a ctypes instance that shares the buffer of the
source object. The source object must support the writeable buffer
interface. The optional offset parameter specifies an offset into the
source buffer in bytes; the default is zero. If the source buffer is not
large enough a ValueError is raised.
This method creates a ctypes instance, copying the buffer from the
source object buffer which must be readable. The optional offset
parameter specifies an offset into the source buffer in bytes; the default
is zero. If the source buffer is not large enough a ValueError is
raised.
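A small sketch of from_buffer_copy (the printed value assumes a
little-endian platform):

from ctypes import c_int32

x = c_int32.from_buffer_copy(b'\x01\x00\x00\x00')
print(x.value)    # 1 on little-endian platforms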
This method adapts obj to a ctypes type. It is called with the actual
object used in a foreign function call when the type is present in the
foreign function’s argtypes tuple; it must return an object that
can be used as a function call parameter.
All ctypes data types have a default implementation of this classmethod
that normally returns obj if that is an instance of the type. Some
types accept other objects as well.
This method returns a ctypes type instance exported by a shared
library. name is the name of the symbol that exports the data, library
is the loaded shared library.
Sometimes ctypes data instances do not own the memory block they contain,
instead they share part of the memory block of a base object. The
_b_base_ read-only member is the root ctypes object that owns the
memory block.
This member is either None or a dictionary containing Python objects
that need to be kept alive so that the memory block contents are kept
valid. This object is only exposed for debugging; never modify the
contents of this dictionary.
This non-public class is the base class of all fundamental ctypes data
types. It is mentioned here because it contains the common attributes of the
fundamental ctypes data types. _SimpleCData is a subclass of
_CData, so it inherits its methods and attributes. ctypes data
types that are not pointers and do not contain pointers can now be pickled.
This attribute contains the actual value of the instance. For integer and
pointer types, it is an integer, for character types, it is a single
character bytes object or string, for character pointer types it is a
Python bytes object or string.
When the value attribute is retrieved from a ctypes instance, usually
a new object is returned each time. ctypes does not implement
original object return; a new object is constructed on each access. The same
is true for all other ctypes object instances.
Fundamental data types, when returned as foreign function call results, or, for
example, by retrieving structure field members or array items, are transparently
converted to native Python types. In other words, if a foreign function has a
restype of c_char_p, you will always receive a Python bytes
object, not a c_char_p instance.
Subclasses of fundamental data types do not inherit this behavior. So, if a
foreign function's restype is a subclass of c_void_p, you will
receive an instance of this subclass from the function call. Of course, you can
get the value of the pointer by accessing the value attribute.
Represents the C signed char datatype, and interprets the value as a
small integer. The constructor accepts an optional integer initializer; no
overflow checking is done.
Represents the C char datatype, and interprets the value as a single
character. The constructor accepts an optional string initializer; the
length of the string must be exactly one character.
Represents the C char* datatype when it points to a zero-terminated
string. For a general character pointer that may also point to binary data,
POINTER(c_char) must be used. The constructor accepts an integer
address, or a bytes object.
Represents the C long double datatype. The constructor accepts an
optional float initializer. On platforms where
sizeof(long double) == sizeof(double) it is an alias to c_double.
Represents the C signed int datatype. The constructor accepts an
optional integer initializer; no overflow checking is done. On platforms
where sizeof(int) == sizeof(long) it is an alias to c_long.
Represents the C unsigned char datatype, and interprets the value as a
small integer. The constructor accepts an optional integer initializer; no
overflow checking is done.
Represents the C unsigned int datatype. The constructor accepts an
optional integer initializer; no overflow checking is done. On platforms
where sizeof(int) == sizeof(long) it is an alias for c_ulong.
Represents the C wchar_t datatype, and interprets the value as a
single character unicode string. The constructor accepts an optional string
initializer; the length of the string must be exactly one character.
Represents the C wchar_t* datatype, which must be a pointer to a
zero-terminated wide character string. The constructor accepts an integer
address, or a string.
Represents the C bool datatype (more accurately, _Bool from
C99). Its value can be True or False, and the constructor accepts any object
that has a truth value.
Represents the C PyObject * datatype. Calling this without an
argument creates a NULL PyObject * pointer.
The ctypes.wintypes module provides a number of other Windows-specific
data types, for example HWND, WPARAM, or DWORD. Some
useful structures like MSG or RECT are also defined.
Abstract base class for structures in native byte order.
Concrete structure and union types must be created by subclassing one of these
types, and must at least define a _fields_ class variable. ctypes will
create descriptors which allow reading and writing the fields by direct
attribute access.
A sequence defining the structure fields. The items must be 2-tuples or
3-tuples. The first item is the name of the field, the second item
specifies the type of the field; it can be any ctypes data type.
For integer type fields like c_int, a third optional item can be
given. It must be a small positive integer defining the bit width of the
field.
Field names must be unique within one structure or union. This is not
checked; when names are repeated, only one of the fields can be accessed.
It is possible to define the _fields_ class variable after the
class statement that defines the Structure subclass; this makes it possible
to create data types that directly or indirectly reference themselves:
class List(Structure):
    pass
List._fields_ = [("pnext", POINTER(List)),
                 ...
                ]
The _fields_ class variable must, however, be defined before the
type is first used (an instance is created, sizeof() is called on it,
and so on). Later assignments to the _fields_ class variable will
raise an AttributeError.
Structure and union subclass constructors accept both positional and named
arguments. Positional arguments are used to initialize the fields in the
same order as they appear in the _fields_ definition, named
arguments are used to initialize the fields with the corresponding name.
It is possible to define sub-subclasses of structure types; they inherit
the fields of the base class plus the _fields_ defined in the
sub-subclass, if any.
An optional small integer that overrides the alignment of
structure fields in the instance. _pack_ must already be defined
when _fields_ is assigned, otherwise it will have no effect.
An optional sequence that lists the names of unnamed (anonymous) fields.
_anonymous_ must already be defined when _fields_ is
assigned, otherwise it will have no effect.
The fields listed in this variable must be structure or union type fields.
ctypes will create descriptors in the structure type that allow
access to the nested fields directly, without the need to go through the
structure or union field.
The TYPEDESC structure describes a COM data type; the vt field
specifies which one of the union fields is valid. Since the u field
is defined as an anonymous field, it is now possible to access the members
directly off the TYPEDESC instance. td.lptdesc and td.u.lptdesc
are equivalent, but the former is faster since it does not need to create
a temporary union instance:
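The real TYPEDESC comes from the COM type libraries; the following is a
minimal, runnable sketch of the same idea using placeholder field types
(_U, ival, dval and vt are illustrative names):

from ctypes import Structure, Union, c_double, c_int

class _U(Union):
    _fields_ = [("ival", c_int),
                ("dval", c_double)]

class TYPEDESC(Structure):
    _anonymous_ = ("u",)          # expose _U's fields on TYPEDESC itself
    _fields_ = [("u", _U),
                ("vt", c_int)]

td = TYPEDESC()
td.ival = 42                      # same as td.u.ival = 42, without the
assert td.u.ival == 42            # temporary union instance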
It is possible to define sub-subclasses of structures; they inherit the
fields of the base class. If the subclass definition has a separate
_fields_ variable, the fields specified in it are appended to the
fields of the base class.
Structure and union constructors accept both positional and keyword
arguments. Positional arguments are used to initialize member fields in the
same order as they appear in _fields_. Keyword arguments in the
constructor are interpreted as attribute assignments, so they will initialize
_fields_ with the same name, or create new attributes for names not
present in _fields_.
The modules described in this chapter provide interfaces to operating system
features that are available on selected operating systems only. The interfaces
are generally modeled after the Unix or C interfaces but they are available on
some other systems as well (e.g. Windows). Here’s an overview:
This module provides access to the select() and poll() functions
available in most operating systems, epoll() available on Linux 2.5+ and
kqueue() available on most BSD systems.
Note that on Windows, it only works for sockets; on other operating systems,
it also works for other file types (in particular, on Unix, it works on pipes).
It cannot be used on regular files to determine whether a file has grown since
it was last read.
The exception raised when an error occurs. The accompanying value is a pair
containing the numeric error code from errno and the corresponding
string, as would be printed by the C function perror().
(Only supported on Linux 2.5.44 and newer.) Returns an edge polling object,
which can be used as an edge- or level-triggered interface for I/O events; see
section Edge and Level Trigger Polling (epoll) Objects below for the methods supported by epolling
objects.
(Not supported by all operating systems.) Returns a polling object, which
supports registering and unregistering file descriptors, and then polling them
for I/O events; see section Polling Objects below for the methods supported
by polling objects.
This is a straightforward interface to the Unix select() system call.
The first three arguments are sequences of ‘waitable objects’: either
integers representing file descriptors or objects with a parameterless method
named fileno() returning such an integer:
rlist: wait until ready for reading
wlist: wait until ready for writing
xlist: wait for an “exceptional condition” (see the manual page for what
your system considers such a condition)
Empty sequences are allowed, but acceptance of three empty sequences is
platform-dependent. (It is known to work on Unix but not on Windows.) The
optional timeout argument specifies a time-out as a floating point number
in seconds. When the timeout argument is omitted the function blocks until
at least one file descriptor is ready. A time-out value of zero specifies a
poll and never blocks.
The return value is a triple of lists of objects that are ready: subsets of the
first three arguments. When the time-out is reached without a file descriptor
becoming ready, three empty lists are returned.
Among the acceptable object types in the sequences are Python file
objects (e.g. sys.stdin, or objects returned by
open() or os.popen()), socket objects returned by
socket.socket(). You may also define a wrapper class yourself,
as long as it has an appropriate fileno() method (that really returns
a file descriptor, not just a random integer).
Note
File objects on Windows are not acceptable, but sockets are. On Windows,
the underlying select() function is provided by the WinSock
library, and does not handle file descriptors that don’t originate from
WinSock.
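A minimal sketch of a select() call with a one-second timeout
(socket.socketpair() is Unix-only and used here purely for demonstration):

import select
import socket

a, b = socket.socketpair()
b.send(b"ping")
rlist, wlist, xlist = select.select([a], [], [], 1.0)
if rlist:
    print(a.recv(4))    # b'ping'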
The minimum number of bytes which can be written without blocking to a pipe
when the pipe has been reported as ready for writing by select(),
poll() or another interface in this module. This doesn't apply
to other kinds of file-like objects such as sockets.
This value is guaranteed by POSIX to be at least 512. Availability: Unix.
The poll() system call, supported on most Unix systems, provides better
scalability for network servers that service many, many clients at the same
time. poll() scales better because the system call only requires listing
the file descriptors of interest, while select() builds a bitmap, turns
on bits for the fds of interest, and then afterward the whole bitmap has to be
linearly scanned again. select() is O(highest file descriptor), while
poll() is O(number of file descriptors).
Register a file descriptor with the polling object. Future calls to the
poll() method will then check whether the file descriptor has any pending
I/O events. fd can be either an integer, or an object with a fileno()
method that returns an integer. File objects implement fileno(), so they
can also be used as the argument.
eventmask is an optional bitmask describing the type of events you want to
check for, and can be a combination of the constants POLLIN,
POLLPRI, and POLLOUT, described in the table below. If not
specified, the default value used will check for all 3 types of events.
Constant   Meaning
POLLIN     There is data to read
POLLPRI    There is urgent data to read
POLLOUT    Ready for output: writing will not block
POLLERR    Error condition of some sort
POLLHUP    Hung up
POLLNVAL   Invalid request: descriptor not open
Registering a file descriptor that’s already registered is not an error, and has
the same effect as registering the descriptor exactly once.
Modifies an already registered fd. This has the same effect as
register(fd,eventmask). Attempting to modify a file descriptor
that was never registered causes an IOError exception with errno
ENOENT to be raised.
Remove a file descriptor being tracked by a polling object. Just like the
register() method, fd can be an integer or an object with a
fileno() method that returns an integer.
Attempting to remove a file descriptor that was never registered causes a
KeyError exception to be raised.
Polls the set of registered file descriptors, and returns a possibly-empty list
containing (fd,event) 2-tuples for the descriptors that have events or
errors to report. fd is the file descriptor, and event is a bitmask with
bits set for the reported events for that descriptor — POLLIN for
waiting input, POLLOUT to indicate that the descriptor can be written
to, and so forth. An empty list indicates that the call timed out and no file
descriptors had any events to report. If timeout is given, it specifies the
length of time in milliseconds which the system will wait for events before
returning. If timeout is omitted, negative, or None, the call will
block until there is an event for this poll object.
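A small sketch of the register/poll cycle (again using a Unix-only
socketpair purely for demonstration):

import select
import socket

a, b = socket.socketpair()
p = select.poll()
p.register(a, select.POLLIN)     # watch for readable events
b.send(b"ping")
for fd, event in p.poll(1000):   # timeout in milliseconds
    if event & select.POLLIN:
        print(a.recv(4))         # b'ping'
p.unregister(a)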
Value used to identify the event. The interpretation depends on the filter
but it’s usually the file descriptor. In the constructor ident can either
be an int or an object with a fileno() function. kevent stores the integer
internally.
While they are not listed below, the camelCase names used for some
methods and functions in this module in the Python 2.x series are still
supported by this module.
CPython implementation detail: Due to the Global Interpreter Lock, in CPython only one thread
can execute Python code at once (even though certain performance-oriented
libraries might overcome this limitation).
If you want your application to make better use of the computational
resources of multi-core machines, you are advised to use
multiprocessing or concurrent.futures.ProcessPoolExecutor.
However, threading is still an appropriate model if you want to run
multiple I/O-bound tasks simultaneously.
This module defines the following functions and objects:
Return the number of Thread objects currently alive. The returned
count is equal to the length of the list returned by enumerate().
threading.Condition()
A factory function that returns a new condition variable object. A condition
variable allows one or more threads to wait until they are notified by another
thread.
Return the current Thread object, corresponding to the caller’s thread
of control. If the caller’s thread of control was not created through the
threading module, a dummy thread object with limited functionality is
returned.
Return a list of all Thread objects currently alive. The list
includes daemonic threads, dummy thread objects created by
current_thread(), and the main thread. It excludes terminated threads
and threads that have not yet been started.
threading.Event()
A factory function that returns a new event object. An event manages a flag
that can be set to true with the set() method and reset to false
with the clear() method. The wait() method blocks until the flag
is true.
A class that represents thread-local data. Thread-local data are data whose
values are thread specific. To manage thread-local data, just create an
instance of local (or a subclass) and store attributes on it:
mydata = threading.local()
mydata.x = 1
The instance’s values will be different for separate threads.
For more details and extensive examples, see the documentation string of the
_threading_local module.
A factory function that returns a new primitive lock object. Once a thread has
acquired it, subsequent attempts to acquire it block, until it is released; any
thread may release it.
A factory function that returns a new reentrant lock object. A reentrant lock
must be released by the thread that acquired it. Once a thread has acquired a
reentrant lock, the same thread may acquire it again without blocking; the
thread must release it once for each time it has acquired it.
A factory function that returns a new semaphore object. A semaphore manages a
counter representing the number of release() calls minus the number of
acquire() calls, plus an initial value. The acquire() method blocks
if necessary until it can return without making the counter negative. If not
given, value defaults to 1.
A factory function that returns a new bounded semaphore object. A bounded
semaphore checks to make sure its current value doesn’t exceed its initial
value. If it does, ValueError is raised. In most situations semaphores
are used to guard resources with limited capacity. If the semaphore is released
too many times it’s a sign of a bug. If not given, value defaults to 1.
class threading.Thread
A class that represents a thread of control. This class can be safely
subclassed in a limited fashion.
Set a trace function for all threads started from the threading module.
The func will be passed to sys.settrace() for each thread, before its
run() method is called.
Set a profile function for all threads started from the threading module.
The func will be passed to sys.setprofile() for each thread, before its
run() method is called.
Return the thread stack size used when creating new threads. The optional
size argument specifies the stack size to be used for subsequently created
threads, and must be 0 (use platform or configured default) or a positive
integer value of at least 32,768 (32kB). If changing the thread stack size is
unsupported, a ThreadError is raised. If the specified stack size is
invalid, a ValueError is raised and the stack size is unmodified. 32kB
is currently the minimum supported stack size value to guarantee sufficient
stack space for the interpreter itself. Note that some platforms may have
particular restrictions on values for the stack size, such as requiring a
minimum stack size > 32kB or requiring allocation in multiples of the system
memory page size - platform documentation should be referred to for more
information (4kB pages are common; using multiples of 4096 for the stack size is
the suggested approach in the absence of more specific information).
Availability: Windows, systems with POSIX threads.
Detailed interfaces for the objects are documented below.
The design of this module is loosely based on Java’s threading model. However,
where Java makes locks and condition variables basic behavior of every object,
they are separate objects in Python. Python’s Thread class supports a
subset of the behavior of Java’s Thread class; currently, there are no
priorities, no thread groups, and threads cannot be destroyed, stopped,
suspended, resumed, or interrupted. The static methods of Java’s Thread class,
when implemented, are mapped to module-level functions.
All of the methods described below are executed atomically.
This class represents an activity that is run in a separate thread of control.
There are two ways to specify the activity: by passing a callable object to the
constructor, or by overriding the run() method in a subclass. No other
methods (except for the constructor) should be overridden in a subclass. In
other words, only override the __init__() and run() methods of
this class.
Once a thread object is created, its activity must be started by calling the
thread’s start() method. This invokes the run() method in a
separate thread of control.
Once the thread’s activity is started, the thread is considered ‘alive’. It
stops being alive when its run() method terminates – either normally, or
by raising an unhandled exception. The is_alive() method tests whether the
thread is alive.
Other threads can call a thread’s join() method. This blocks the calling
thread until the thread whose join() method is called is terminated.
A thread has a name. The name can be passed to the constructor, and read or
changed through the name attribute.
A thread can be flagged as a “daemon thread”. The significance of this flag is
that the entire Python program exits when only daemon threads are left. The
initial value is inherited from the creating thread. The flag can be set
through the daemon property.
There is a “main thread” object; this corresponds to the initial thread of
control in the Python program. It is not a daemon thread.
There is the possibility that “dummy thread objects” are created. These are
thread objects corresponding to “alien threads”, which are threads of control
started outside the threading module, such as directly from C code. Dummy
thread objects have limited functionality; they are always considered alive and
daemonic, and cannot be join()ed. They are never deleted, since it is
impossible to detect the termination of alien threads.
class threading.Thread(group=None, target=None, name=None, args=(), kwargs={})
This constructor should always be called with keyword arguments. Arguments
are:
group should be None; reserved for future extension when a
ThreadGroup class is implemented.
target is the callable object to be invoked by the run() method.
Defaults to None, meaning nothing is called.
name is the thread name. By default, a unique name is constructed of the
form “Thread-N” where N is a small decimal number.
args is the argument tuple for the target invocation. Defaults to ().
kwargs is a dictionary of keyword arguments for the target invocation.
Defaults to {}.
If the subclass overrides the constructor, it must make sure to invoke the
base class constructor (Thread.__init__()) before doing anything else to
the thread.
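For instance, a minimal sketch of the target/args style (worker and the
thread name are illustrative):

import threading

def worker(n):
    print('worker', n)

t = threading.Thread(target=worker, name='worker-1', args=(1,))
t.start()
t.join()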
You may override this method in a subclass. The standard run()
method invokes the callable object passed to the object’s constructor as
the target argument, if any, with sequential and keyword arguments taken
from the args and kwargs arguments, respectively.
Wait until the thread terminates. This blocks the calling thread until the
thread whose join() method is called terminates – either normally
or through an unhandled exception – or until the optional timeout occurs.
When the timeout argument is present and not None, it should be a
floating point number specifying a timeout for the operation in seconds
(or fractions thereof). As join() always returns None, you must
call is_alive() after join() to decide whether a timeout
happened – if the thread is still alive, the join() call timed out.
When the timeout argument is not present or None, the operation will
block until the thread terminates.
join() raises a RuntimeError if an attempt is made to join
the current thread as that would cause a deadlock. It is also an error to
join() a thread before it has been started and attempts to do so
raises the same exception.
A string used for identification purposes only. It has no semantics.
Multiple threads may be given the same name. The initial name is set by
the constructor.
The ‘thread identifier’ of this thread or None if the thread has not
been started. This is a nonzero integer. See the
thread.get_ident() function. Thread identifiers may be recycled
when a thread exits and another thread is created. The identifier is
available even after the thread has exited.
This method returns True just before the run() method starts
until just after the run() method terminates. The module function
enumerate() returns a list of all alive threads.
A boolean value indicating whether this thread is a daemon thread (True)
or not (False). This must be set before start() is called,
otherwise RuntimeError is raised. Its initial value is inherited
from the creating thread; the main thread is not a daemon thread and
therefore all threads created in the main thread default to daemon
= False.
The entire Python program exits when no alive non-daemon threads are left.
A primitive lock is a synchronization primitive that is not owned by a
particular thread when locked. In Python, it is currently the lowest level
synchronization primitive available, implemented directly by the _thread
extension module.
A primitive lock is in one of two states, “locked” or “unlocked”. It is created
in the unlocked state. It has two basic methods, acquire() and
release(). When the state is unlocked, acquire() changes the state
to locked and returns immediately. When the state is locked, acquire()
blocks until a call to release() in another thread changes it to unlocked,
then the acquire() call resets it to locked and returns. The
release() method should only be called in the locked state; it changes the
state to unlocked and returns immediately. If an attempt is made to release an
unlocked lock, a RuntimeError will be raised.
When more than one thread is blocked in acquire() waiting for the state to
turn to unlocked, only one thread proceeds when a release() call resets
the state to unlocked; which one of the waiting threads proceeds is not defined,
and may vary across implementations.
When invoked without arguments, block until the lock is unlocked, then set it to
locked, and return true.
When invoked with the blocking argument set to true, do the same thing as when
called without arguments, and return true.
When invoked with the blocking argument set to false, do not block. If a call
without an argument would block, return false immediately; otherwise, do the
same thing as when called without arguments, and return true.
When invoked with the floating-point timeout argument set to a positive
value, block for at most the number of seconds specified by timeout
and as long as the lock cannot be acquired. A negative timeout argument
specifies an unbounded wait. It is forbidden to specify a timeout
when blocking is false.
The return value is True if the lock is acquired successfully,
False if not (for example if the timeout expired).
Changed in version 3.2: The timeout parameter is new.
Changed in version 3.2: Lock acquires can now be interrupted by signals on POSIX.
When the lock is locked, reset it to unlocked, and return. If any other threads
are blocked waiting for the lock to become unlocked, allow exactly one of them
to proceed.
Do not call this method when the lock is unlocked.
A reentrant lock is a synchronization primitive that may be acquired multiple
times by the same thread. Internally, it uses the concepts of “owning thread”
and “recursion level” in addition to the locked/unlocked state used by primitive
locks. In the locked state, some thread owns the lock; in the unlocked state,
no thread owns it.
To lock the lock, a thread calls its acquire() method; this returns once
the thread owns the lock. To unlock the lock, a thread calls its
release() method. acquire()/release() call pairs may be
nested; only the final release() (the release() of the outermost
pair) resets the lock to unlocked and allows another thread blocked in
acquire() to proceed.
When invoked without arguments: if this thread already owns the lock, increment
the recursion level by one, and return immediately. Otherwise, if another
thread owns the lock, block until the lock is unlocked. Once the lock is
unlocked (not owned by any thread), then grab ownership, set the recursion level
to one, and return. If more than one thread is blocked waiting until the lock
is unlocked, only one at a time will be able to grab ownership of the lock.
There is no return value in this case.
When invoked with the blocking argument set to true, do the same thing as when
called without arguments, and return true.
When invoked with the blocking argument set to false, do not block. If a call
without an argument would block, return false immediately; otherwise, do the
same thing as when called without arguments, and return true.
When invoked with the floating-point timeout argument set to a positive
value, block for at most the number of seconds specified by timeout
and as long as the lock cannot be acquired. Return true if the lock has
been acquired, false if the timeout has elapsed.
Changed in version 3.2: The timeout parameter is new.
Release a lock, decrementing the recursion level. If after the decrement it is
zero, reset the lock to unlocked (not owned by any thread), and if any other
threads are blocked waiting for the lock to become unlocked, allow exactly one
of them to proceed. If after the decrement the recursion level is still
nonzero, the lock remains locked and owned by the calling thread.
Only call this method when the calling thread owns the lock. A
RuntimeError is raised if this method is called when the lock is
unlocked.
A condition variable is always associated with some kind of lock; this can be
passed in or one will be created by default. (Passing one in is useful when
several condition variables must share the same lock.)
A condition variable has acquire() and release() methods that call
the corresponding methods of the associated lock. It also has a wait()
method, and notify() and notify_all() methods. These three must only
be called when the calling thread has acquired the lock, otherwise a
RuntimeError is raised.
The wait() method releases the lock, and then blocks until it is awakened
by a notify() or notify_all() call for the same condition variable in
another thread. Once awakened, it re-acquires the lock and returns. It is also
possible to specify a timeout.
The notify() method wakes up one of the threads waiting for the condition
variable, if any are waiting. The notify_all() method wakes up all threads
waiting for the condition variable.
Note: the notify() and notify_all() methods don’t release the lock;
this means that the thread or threads awakened will not return from their
wait() call immediately, but only when the thread that called
notify() or notify_all() finally relinquishes ownership of the lock.
Tip: the typical programming style using condition variables uses the lock to
synchronize access to some shared state; threads that are interested in a
particular change of state call wait() repeatedly until they see the
desired state, while threads that modify the state call notify() or
notify_all() when they change the state in such a way that it could
possibly be a desired state for one of the waiters. For example, the following
code is a generic producer-consumer situation with unlimited buffer capacity:
# Consume one item
cv.acquire()
while not an_item_is_available():
    cv.wait()
get_an_available_item()
cv.release()

# Produce one item
cv.acquire()
make_an_item_available()
cv.notify()
cv.release()
To choose between notify() and notify_all(), consider whether one
state change can be interesting for only one or several waiting threads. E.g.
in a typical producer-consumer situation, adding one item to the buffer only
needs to wake up one consumer thread.
Note: Condition variables can be, depending on the implementation, subject
to both spurious wakeups (when wait() returns without a notify()
call) and stolen wakeups (when another thread acquires the lock before the
awoken thread.) For this reason, it is always necessary to verify the state
the thread is waiting for when wait() returns and optionally repeat
the call as often as necessary.
If the lock argument is given and not None, it must be a Lock
or RLock object, and it is used as the underlying lock. Otherwise,
a new RLock object is created and used as the underlying lock.
Wait until notified or until a timeout occurs. If the calling thread has
not acquired the lock when this method is called, a RuntimeError is
raised.
This method releases the underlying lock, and then blocks until it is
awakened by a notify() or notify_all() call for the same
condition variable in another thread, or until the optional timeout
occurs. Once awakened or timed out, it re-acquires the lock and returns.
When the timeout argument is present and not None, it should be a
floating point number specifying a timeout for the operation in seconds
(or fractions thereof).
When the underlying lock is an RLock, it is not released using
its release() method, since this may not actually unlock the lock
when it was acquired multiple times recursively. Instead, an internal
interface of the RLock class is used, which really unlocks it
even when it has been recursively acquired several times. Another internal
interface is then used to restore the recursion level when the lock is
reacquired.
The return value is True unless a given timeout expired, in which
case it is False.
Changed in version 3.2: Previously, the method always returned None.
Wait until a condition evaluates to True. predicate should be a
callable whose result will be interpreted as a boolean value.
A timeout may be provided giving the maximum time to wait.
This utility method may call wait() repeatedly until the predicate
is satisfied, or until a timeout occurs. The return value is
the last return value of the predicate and will evaluate to
False if the method timed out.
Ignoring the timeout feature, calling this method is roughly equivalent to
writing:
while not predicate():
    cv.wait()
Therefore, the same rules apply as with wait(): The lock must be
held when called and is re-acquired on return. The predicate is evaluated
with the lock held.
Using this method, the consumer example above can be written thus:
with cv:
    cv.wait_for(an_item_is_available)
    get_an_available_item()
Wake up a thread waiting on this condition, if any. If the calling thread
has not acquired the lock when this method is called, a
RuntimeError is raised.
This method wakes up one of the threads waiting for the condition
variable, if any are waiting; it is a no-op if no threads are waiting.
The current implementation wakes up exactly one thread, if any are
waiting. However, it’s not safe to rely on this behavior. A future,
optimized implementation may occasionally wake up more than one thread.
Note: the awakened thread does not actually return from its wait()
call until it can reacquire the lock. Since notify() does not
release the lock, its caller should.
Wake up all threads waiting on this condition. This method acts like
notify(), but wakes up all waiting threads instead of one. If the
calling thread has not acquired the lock when this method is called, a
RuntimeError is raised.
This is one of the oldest synchronization primitives in the history of computer
science, invented by the early Dutch computer scientist Edsger W. Dijkstra (he
used P() and V() instead of acquire() and release()).
A semaphore manages an internal counter which is decremented by each
acquire() call and incremented by each release() call. The counter
can never go below zero; when acquire() finds that it is zero, it blocks,
waiting until some other thread calls release().
When invoked without arguments: if the internal counter is larger than
zero on entry, decrement it by one and return immediately. If it is zero
on entry, block, waiting until some other thread has called
release() to make it larger than zero. This is done with proper
interlocking so that if multiple acquire() calls are blocked,
release() will wake exactly one of them up. The implementation may
pick one at random, so the order in which blocked threads are awakened
should not be relied on. Returns true (or blocks indefinitely).
When invoked with blocking set to false, do not block. If a call
without an argument would block, return false immediately; otherwise,
do the same thing as when called without arguments, and return true.
When invoked with a timeout other than None, it will block for at
most timeout seconds. If acquire does not complete successfully in
that interval, return false. Return true otherwise.
Changed in version 3.2: The timeout parameter is new.
Release a semaphore, incrementing the internal counter by one. When it
was zero on entry and another thread is waiting for it to become larger
than zero again, wake up that thread.
Semaphores are often used to guard resources with limited capacity, for example,
a database server. In any situation where the size of the resource is fixed,
you should use a bounded semaphore. Before spawning any worker threads, your
main thread would initialize the semaphore:
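A minimal sketch of that initialization (maxconnections is an illustrative
name):

import threading

maxconnections = 5
pool_sema = threading.BoundedSemaphore(value=maxconnections)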
Once spawned, worker threads call the semaphore’s acquire and release methods
when they need to connect to the server:
pool_sema.acquire()
conn = connectdb()
... use connection ...
conn.close()
pool_sema.release()
The use of a bounded semaphore reduces the chance that a programming error which
causes the semaphore to be released more than it’s acquired will go undetected.
This is one of the simplest mechanisms for communication between threads: one
thread signals an event and other threads wait for it.
An event object manages an internal flag that can be set to true with the
set() method and reset to false with the clear() method. The
wait() method blocks until the flag is true.
Set the internal flag to true. All threads waiting for it to become true
are awakened. Threads that call wait() once the flag is true will
not block at all.
Block until the internal flag is true. If the internal flag is true on
entry, return immediately. Otherwise, block until another thread calls
set() to set the flag to true, or until the optional timeout occurs.
When the timeout argument is present and not None, it should be a
floating point number specifying a timeout for the operation in seconds
(or fractions thereof).
This method returns the internal flag on exit, so it will always return
True except if a timeout is given and the operation times out.
Changed in version 3.1: Previously, the method always returned None.
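A small sketch of the signaling pattern (waiter is an illustrative name):

import threading

evt = threading.Event()

def waiter():
    # wait() returns the flag value: True, or False on timeout
    if evt.wait(timeout=5.0):
        print('event was set')
    else:
        print('timed out')

t = threading.Thread(target=waiter)
t.start()
evt.set()
t.join()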
This class represents an action that should be run only after a certain amount
of time has passed — a timer. Timer is a subclass of Thread
and as such also functions as an example of creating custom threads.
Timers are started, as with threads, by calling their start() method. The
timer can be stopped (before its action has begun) by calling the cancel()
method. The interval the timer will wait before executing its action may not be
exactly the same as the interval specified by the user.
For example:
def hello():
    print("hello, world")

t = Timer(30.0, hello)
t.start() # after 30 seconds, "hello, world" will be printed
class threading.Timer(interval, function, args=[], kwargs={})
Create a timer that will run function with arguments args and keyword
arguments kwargs, after interval seconds have passed.
This class provides a simple synchronization primitive for use by a fixed number
of threads that need to wait for each other. Each of the threads tries to pass
the barrier by calling the wait() method and will block until all of the
threads have made the call. At this point, the threads are released
simultaneously.
The barrier can be reused any number of times for the same number of threads.
As an example, here is a simple way to synchronize a client and server thread:
b = Barrier(2, timeout=5)

def server():
    start_server()
    b.wait()
    while True:
        connection = accept_connection()
        process_server_connection(connection)

def client():
    b.wait()
    while True:
        connection = make_connection()
        process_client_connection(connection)
class threading.Barrier(parties, action=None, timeout=None)
Create a barrier object for parties number of threads. An action, when
provided, is a callable to be called by one of the threads when they are
released. timeout is the default timeout value if none is specified for
the wait() method.
Pass the barrier. When all the threads party to the barrier have called
this function, they are all released simultaneously. If a timeout is
provided, it is used in preference to any that was supplied to the class
constructor.
The return value is an integer in the range 0 to parties – 1, different
for each thread. This can be used to select a thread to do some special
housekeeping, e.g.:
i = barrier.wait()
if i == 0:
    # Only one thread needs to print this
    print("passed the barrier")
If an action was provided to the constructor, one of the threads will
have called it prior to being released. Should this call raise an error,
the barrier is put into the broken state.
If the call times out, the barrier is put into the broken state.
This method may raise a BrokenBarrierError exception if the
barrier is broken or reset while a thread is waiting.
Return the barrier to the default, empty state. Any threads waiting on it
will receive the BrokenBarrierError exception.
Note that using this function can require some external
synchronization if there are other threads whose state is unknown. If a
barrier is broken it may be better to just leave it and create a new one.
Put the barrier into a broken state. This causes any active or future
calls to wait() to fail with the BrokenBarrierError. Use
this for example if one of the threads needs to abort, to avoid deadlocking the
application.
It may be preferable to simply create the barrier with a sensible
timeout value to automatically guard against one of the threads going
awry.
This exception, a subclass of RuntimeError, is raised when the
Barrier object is reset or broken.
Using locks, conditions, and semaphores in the with statement
All of the objects provided by this module that have acquire() and
release() methods can be used as context managers for a with
statement. The acquire() method will be called when the block is entered,
and release() will be called when the block is exited.
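For example (a minimal sketch; some_lock is an illustrative name):

import threading

some_lock = threading.Lock()
with some_lock:
    # some_lock.acquire() was called on entry; some_lock.release()
    # will be called on exit, even if an exception is raised
    print('lock held')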
While the import machinery is thread-safe, there are two key restrictions on
threaded imports due to inherent limitations in the way that thread-safety is
provided:
Firstly, other than in the main module, an import should not have the
side effect of spawning a new thread and then waiting for that thread in
any way. Failing to abide by this restriction can lead to a deadlock if
the spawned thread directly or indirectly attempts to import a module.
Secondly, all import attempts must be completed before the interpreter
starts shutting itself down. This can be most easily achieved by only
performing imports from non-daemon threads created through the threading
module. Daemon threads and threads created directly with the thread
module will require some other form of synchronization to ensure they do
not attempt imports after system shutdown has commenced. Failure to
abide by this restriction will lead to intermittent exceptions and
crashes during interpreter shutdown (as the late imports attempt to
access machinery which is no longer in a valid state).
multiprocessing is a package that supports spawning processes using an
API similar to the threading module. The multiprocessing package
offers both local and remote concurrency, effectively side-stepping the
Global Interpreter Lock by using subprocesses instead of threads. Due
to this, the multiprocessing module allows the programmer to fully
leverage multiple processors on a given machine. It runs on both Unix and
Windows.
Note
Some of this package’s functionality requires a functioning shared semaphore
implementation on the host operating system. Without one, the
multiprocessing.synchronize module will be disabled, and attempts to
import it will result in an ImportError. See
issue 3770 for additional information.
Note
Functionality within this package requires that the __main__ module be
importable by the children. This is covered in Programming guidelines
however it is worth pointing out here. This means that some examples, such
as the multiprocessing.Pool examples will not work in the
interactive interpreter. For example:
>>> from multiprocessing import Pool
>>> p = Pool(5)
>>> def f(x):
...     return x*x
...
>>> p.map(f, [1,2,3])
Process PoolWorker-1:
Process PoolWorker-2:
Process PoolWorker-3:
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
AttributeError: 'module' object has no attribute 'f'
AttributeError: 'module' object has no attribute 'f'
AttributeError: 'module' object has no attribute 'f'
(If you try this it will actually output three full tracebacks
interleaved in a semi-random fashion, and then you may have to
stop the master process somehow.)
In multiprocessing, processes are spawned by creating a Process
object and then calling its start() method. A trivial example:

from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
To show the individual process IDs involved, here is an expanded example:
from multiprocessing import Process
import os

def info(title):
    print(title)
    print('module name:', __name__)
    print('parent process:', os.getppid())
    print('process id:', os.getpid())

def f(name):
    info('function f')
    print('hello', name)

if __name__ == '__main__':
    info('main line')
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
For an explanation of why (on Windows) the if __name__ == '__main__' part is
necessary, see Programming guidelines.
Queues
The Queue class is a near clone of queue.Queue. For example:

from multiprocessing import Process, Queue

def f(q):
    q.put([42, None, 'hello'])

if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print(q.get())    # prints "[42, None, 'hello']"
    p.join()
Queues are thread and process safe, but note that they must never
be instantiated as a side effect of importing a module: this can lead
to a deadlock! (see Importing in threaded code)
Pipes
The Pipe() function returns a pair of connection objects connected by a
pipe which by default is duplex (two-way). For example:
from multiprocessing import Process, Pipe

def f(conn):
    conn.send([42, None, 'hello'])
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(child_conn,))
    p.start()
    print(parent_conn.recv())    # prints "[42, None, 'hello']"
    p.join()
The two connection objects returned by Pipe() represent the two ends of
the pipe. Each connection object has send() and
recv() methods (among others). Note that data in a pipe
may become corrupted if two processes (or threads) try to read from or write
to the same end of the pipe at the same time. Of course there is no risk
of corruption from processes using different ends of the pipe at the same
time.
multiprocessing contains equivalents of all the synchronization
primitives from threading. For instance one can use a lock to ensure
that only one process prints to standard output at a time:
from multiprocessing import Process, Lock

def f(l, i):
    l.acquire()
    print('hello world', i)
    l.release()

if __name__ == '__main__':
    lock = Lock()
    for num in range(10):
        Process(target=f, args=(lock, num)).start()
Without using the lock output from the different processes is liable to get all
mixed up.
As mentioned above, when doing concurrent programming it is usually best to
avoid using shared state as far as possible. This is particularly true when
using multiple processes.
However, if you really do need to use some shared data then
multiprocessing provides a couple of ways of doing so.
Shared memory
Data can be stored in a shared memory map using Value or
Array. For example, the following code
from multiprocessing import Process, Value, Array

def f(n, a):
    n.value = 3.1415927
    for i in range(len(a)):
        a[i] = -a[i]

if __name__ == '__main__':
    num = Value('d', 0.0)
    arr = Array('i', range(10))

    p = Process(target=f, args=(num, arr))
    p.start()
    p.join()

    print(num.value)
    print(arr[:])
will print
3.1415927
[0, -1, -2, -3, -4, -5, -6, -7, -8, -9]
The 'd' and 'i' arguments used when creating num and arr are
typecodes of the kind used by the array module: 'd' indicates a
double precision float and 'i' indicates a signed integer. These shared
objects will be process and thread-safe.
For more flexibility in using shared memory one can use the
multiprocessing.sharedctypes module which supports the creation of
arbitrary ctypes objects allocated from shared memory.
Server process
A manager object returned by Manager() controls a server process which
holds Python objects and allows other processes to manipulate them using
proxies.
Server process managers are more flexible than using shared memory objects
because they can be made to support arbitrary object types. Also, a single
manager can be shared by processes on different computers over a network.
They are, however, slower than using shared memory.
The Pool class represents a pool of worker
processes. It has methods which allows tasks to be offloaded to the worker
processes in a few different ways.
For example:
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)            # start 4 worker processes
    result = pool.apply_async(f, [10])  # evaluate "f(10)" asynchronously
    print(result.get(timeout=1))        # prints "100" unless your computer is *very* slow
    print(pool.map(f, range(10)))       # prints "[0, 1, 4,..., 81]"
class multiprocessing.Process([group[, target[, name[, args[, kwargs]]]]])
Process objects represent activity that is run in a separate process. The
Process class has equivalents of all the methods of
threading.Thread.
The constructor should always be called with keyword arguments. group
should always be None; it exists solely for compatibility with
threading.Thread. target is the callable object to be invoked by
the run() method. It defaults to None, meaning nothing is
called. name is the process name. By default, a unique name is constructed
of the form 'Process-N1:N2:...:Nk' where N1,N2,...,Nk is a sequence of integers whose length
is determined by the generation of the process. args is the argument
tuple for the target invocation. kwargs is a dictionary of keyword
arguments for the target invocation. By default, no arguments are passed to
target.
If a subclass overrides the constructor, it must make sure it invokes the
base class constructor (Process.__init__()) before doing anything else
to the process.
You may override this method in a subclass. The standard run()
method invokes the callable object passed to the object’s constructor as
the target argument, if any, with sequential and keyword arguments taken
from the args and kwargs arguments, respectively.
The name is a string used for identification purposes only. It has no
semantics. Multiple processes may be given the same name. The initial
name is set by the constructor.
The process’s daemon flag, a Boolean value. This must be set before
start() is called.
The initial value is inherited from the creating process.
When a process exits, it attempts to terminate all of its daemonic child
processes.
Note that a daemonic process is not allowed to create child processes.
Otherwise a daemonic process would leave its children orphaned if it gets
terminated when its parent process exits. Additionally, these are not
Unix daemons or services, they are normal processes that will be
terminated (and not joined) if non-daemonic processes have exited.
In addition to the threading.Thread API, Process objects
also support the following attributes and methods:
The child’s exit code. This will be None if the process has not yet
terminated. A negative value -N indicates that the child was terminated
by signal N.
When multiprocessing is initialized the main process is assigned a
random string using os.urandom().
When a Process object is created, it will inherit the
authentication key of its parent process, although this may be changed by
setting authkey to another byte string.
Terminate the process. On Unix this is done using the SIGTERM signal;
on Windows TerminateProcess() is used. Note that exit handlers and
finally clauses, etc., will not be executed.
Note that descendant processes of the process will not be terminated –
they will simply become orphaned.
Warning
If this method is used when the associated process is using a pipe or
queue then the pipe or queue is liable to become corrupted and may
become unusable by other processes. Similarly, if the process has
acquired a lock or semaphore etc. then terminating it is liable to
cause other processes to deadlock.
Note that the start(), join(), is_alive(),
terminate() and exitcode methods should only be called by
the process that created the process object.
When using multiple processes, one generally uses message passing for
communication between processes and avoids having to use any synchronization
primitives like locks.
For passing messages one can use Pipe() (for a connection between two
processes) or a queue (which allows multiple producers and consumers).
The Queue and JoinableQueue types are multi-producer,
multi-consumer FIFO queues modelled on the queue.Queue class in the
standard library. They differ in that Queue lacks the
task_done() and join() methods introduced
into Python 2.5’s queue.Queue class.
If you use JoinableQueue then you must call
JoinableQueue.task_done() for each task removed from the queue or else the
semaphore used to count the number of unfinished tasks may eventually overflow,
raising an exception.
Note that one can also create a shared queue by using a manager object – see
Managers.
If a process is killed using Process.terminate() or os.kill()
while it is trying to use a Queue, then the data in the queue is
likely to become corrupted. This may cause any other process to get an
exception when it tries to use the queue later on.
Warning
As mentioned above, if a child process has put items on a queue (and it has
not used JoinableQueue.cancel_join_thread()), then that process will
not terminate until all buffered items have been flushed to the pipe.
This means that if you try joining that process you may get a deadlock unless
you are sure that all items which have been put on the queue have been
consumed. Similarly, if the child process is non-daemonic then the parent
process may hang on exit when it tries to join all its non-daemonic children.
Note that a queue created using a manager does not have this issue. See
Programming guidelines.
For an example of the usage of queues for interprocess communication see
Examples.
multiprocessing.Pipe([duplex])
Returns a pair (conn1, conn2) of Connection objects representing
the ends of a pipe.
If duplex is True (the default) then the pipe is bidirectional. If
duplex is False then the pipe is unidirectional: conn1 can only be
used for receiving messages and conn2 can only be used for sending
messages.
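A short sketch of the two-connection pattern; the function and variable names are illustrative:

from multiprocessing import Process, Pipe

def child(conn):
    conn.send(['hello', 42])    # any picklable object can be sent
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()    # duplex by default
    p = Process(target=child, args=(child_conn,))
    p.start()
    print(parent_conn.recv())           # prints "['hello', 42]"
    p.join()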
class multiprocessing.Queue([maxsize])
Returns a process shared queue implemented using a pipe and a few
locks/semaphores. When a process first puts an item on the queue a feeder
thread is started which transfers objects from a buffer into the pipe.
The usual queue.Empty and queue.Full exceptions from the
standard library’s queue module are raised to signal timeouts.
put(obj[, block[, timeout]])
Put obj into the queue. If the optional argument block is True
(the default) and timeout is None (the default), block if necessary until
a free slot is available. If timeout is a positive number, it blocks at
most timeout seconds and raises the queue.Full exception if no
free slot was available within that time. Otherwise (block is
False), put an item on the queue if a free slot is immediately
available, else raise the queue.Full exception (timeout is
ignored in that case).
get([block[, timeout]])
Remove and return an item from the queue. If optional args block is
True (the default) and timeout is None (the default), block if
necessary until an item is available. If timeout is a positive number,
it blocks at most timeout seconds and raises the queue.Empty
exception if no item was available within that time. Otherwise (block is
False), return an item if one is immediately available, else raise the
queue.Empty exception (timeout is ignored in that case).
close()
Indicate that no more data will be put on this queue by the current
process. The background thread will quit once it has flushed all buffered
data to the pipe. This is called automatically when the queue is garbage
collected.
join_thread()
Join the background thread. This can only be used after close() has
been called. It blocks until the background thread exits, ensuring that
all data in the buffer has been flushed to the pipe.
By default if a process is not the creator of the queue then on exit it
will attempt to join the queue’s background thread. The process can call
cancel_join_thread() to make join_thread() do nothing.
cancel_join_thread()
Prevent join_thread() from blocking. In particular, this prevents
the background thread from being joined automatically when the process
exits – see join_thread().
task_done()
Indicate that a formerly enqueued task is complete. Used by queue consumer
threads. For each get() used to fetch a task, a subsequent
call to task_done() tells the queue that the processing on the task
is complete.
If a join() is currently blocking, it will resume when all
items have been processed (meaning that a task_done() call was
received for every item that had been put() into the queue).
Raises a ValueError if called more times than there were items
placed in the queue.
join()
Block until all items in the queue have been gotten and processed.
The count of unfinished tasks goes up whenever an item is added to the
queue. The count goes down whenever a consumer thread calls
task_done() to indicate that the item was retrieved and all work on
it is complete. When the count of unfinished tasks drops to zero,
join() unblocks.
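A minimal sketch of the task_done()/join() pattern; the consumer function and the None sentinel are illustrative:

from multiprocessing import JoinableQueue, Process

def consumer(q):
    while True:
        item = q.get()
        if item is None:        # sentinel: stop consuming
            q.task_done()
            break
        print('processed', item)
        q.task_done()           # exactly one task_done() per get()

if __name__ == '__main__':
    q = JoinableQueue()
    p = Process(target=consumer, args=(q,))
    p.start()
    for i in range(5):
        q.put(i)
    q.put(None)
    q.join()        # blocks until task_done() was called for every item
    p.join()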
multiprocessing.freeze_support()
Add support for when a program which uses multiprocessing has been
frozen to produce a Windows executable. (Has been tested with py2exe,
PyInstaller and cx_Freeze.)
One needs to call this function straight after the if __name__ == '__main__' line of the main module. For example:
from multiprocessing import Process, freeze_support

def f():
    print('hello world!')

if __name__ == '__main__':
    freeze_support()
    Process(target=f).start()
If the freeze_support() line is omitted then trying to run the frozen
executable will raise RuntimeError.
If the module is being run normally by the Python interpreter then
freeze_support() has no effect.
multiprocessing.set_executable()
Sets the path of the Python interpreter to use when starting a child process.
(By default sys.executable is used). Embedders will probably need to
do something like

set_executable(os.path.join(sys.exec_prefix, 'pythonw.exe'))

before they can create child processes.
recv()
Return an object sent from the other end of the connection using
send(). Raises EOFError if there is nothing left to receive
and the other end was closed.
poll([timeout])
Return whether there is any data available to be read.
If timeout is not specified then it will return immediately. If
timeout is a number then this specifies the maximum time in seconds to
block. If timeout is None then an infinite timeout is used.
send_bytes(buffer[, offset[, size]])
Send byte data from an object supporting the buffer interface as a
complete message.
If offset is given then data is read from that position in buffer. If
size is given then that many bytes will be read from buffer. Very large
buffers (approximately 32 MB+, though it depends on the OS) may raise a
ValueError exception.
recv_bytes([maxlength])
Return a complete message of byte data sent from the other end of the
connection as a string. Raises EOFError if there is nothing left
to receive and the other end has closed.
If maxlength is specified and the message is longer than maxlength
then IOError is raised and the connection will no longer be
readable.
recv_bytes_into(buffer[, offset])
Read into buffer a complete message of byte data sent from the other end
of the connection and return the number of bytes in the message. Raises
EOFError if there is nothing left to receive and the other end was
closed.
buffer must be an object satisfying the writable buffer interface. If
offset is given then the message will be written into the buffer from
that position. Offset must be a non-negative integer less than the
length of buffer (in bytes).
If the buffer is too short then a BufferTooShort exception is
raised and the complete message is available as e.args[0] where e
is the exception instance.
The Connection.recv() method automatically unpickles the data it
receives, which can be a security risk unless you can trust the process
which sent the message.
Therefore, unless the connection object was produced using Pipe() you
should only use the recv() and send()
methods after performing some sort of authentication. See
Authentication keys.
Warning
If a process is killed while it is trying to read or write to a pipe then
the data in the pipe is likely to become corrupted, because it may become
impossible to be sure where the message boundaries lie.
Generally synchronization primitives are not as necessary in a multiprocess
program as they are in a multithreaded program. See the documentation for
the threading module.
Note that one can also create synchronization primitives by using a manager
object – see Managers.
A clone of threading.Event.
The wait() method returns the state of the internal semaphore on exit, so it
will always return True except if a timeout is given and the operation
times out.
Changed in version 3.1: Previously, the method always returned None.
The acquire() method of BoundedSemaphore, Lock,
RLock and Semaphore has a timeout parameter not supported
by the equivalents in threading. The signature is
acquire(block=True, timeout=None) with keyword parameters being
acceptable. If block is True and timeout is not None then it
specifies a timeout in seconds. If block is False then timeout is
ignored.
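For example, a lock acquisition that gives up after two seconds might look like this sketch:

from multiprocessing import Lock

lock = Lock()
if lock.acquire(block=True, timeout=2.0):
    try:
        pass    # ... access the shared resource ...
    finally:
        lock.release()
else:
    print('could not acquire the lock within 2 seconds')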
On Mac OS X, sem_timedwait is unsupported, so calling acquire() with
a timeout will emulate that function’s behavior using a sleeping loop.
Note
If the SIGINT signal generated by Ctrl-C arrives while the main thread is
blocked by a call to BoundedSemaphore.acquire(), Lock.acquire(),
RLock.acquire(), Semaphore.acquire(), Condition.acquire()
or Condition.wait() then the call will be immediately interrupted and
KeyboardInterrupt will be raised.
This differs from the behaviour of threading where SIGINT will be
ignored while the equivalent blocking calls are in progress.
multiprocessing.Value(typecode_or_type, *args, lock=True)
Return a ctypes object allocated from shared memory. By default the
return value is actually a synchronized wrapper for the object.
typecode_or_type determines the type of the returned object: it is either a
ctypes type or a one character typecode of the kind used by the array
module. *args is passed on to the constructor for the type.
If lock is True (the default) then a new lock object is created to
synchronize access to the value. If lock is a Lock or
RLock object then that will be used to synchronize access to the
value. If lock is False then access to the returned object will not be
automatically protected by a lock, so it will not necessarily be
“process-safe”.
multiprocessing.Array(typecode_or_type, size_or_initializer, *, lock=True)
Return a ctypes array allocated from shared memory. By default the return
value is actually a synchronized wrapper for the array.
typecode_or_type determines the type of the elements of the returned array:
it is either a ctypes type or a one character typecode of the kind used by
the array module. If size_or_initializer is an integer, then it
determines the length of the array, and the array will be initially zeroed.
Otherwise, size_or_initializer is a sequence which is used to initialize
the array and whose length determines the length of the array.
If lock is True (the default) then a new lock object is created to
synchronize access to the value. If lock is a Lock or
RLock object then that will be used to synchronize access to the
value. If lock is False then access to the returned object will not be
automatically protected by a lock, so it will not necessarily be
“process-safe”.
Note that lock is a keyword-only argument.
Note that an array of ctypes.c_char has value and raw
attributes which allow one to use it to store and retrieve strings.
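For instance, a short sketch (note that the stored bytes must fit within the array):

from multiprocessing import Array

s = Array('c', b'hello world')   # an array of ctypes.c_char
print(s.value)                   # b'hello world'
print(s.raw)                     # the same bytes, including trailing NULs if any
s.value = b'bye'                 # shorter byte strings may be assigned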
The multiprocessing.sharedctypes module provides functions for allocating
ctypes objects from shared memory which can be inherited by child
processes.
Note
Although it is possible to store a pointer in shared memory remember that
this will refer to a location in the address space of a specific process.
However, the pointer is quite likely to be invalid in the context of a second
process and trying to dereference the pointer from the second process may
cause a crash.
multiprocessing.sharedctypes.RawArray(typecode_or_type, size_or_initializer)
Return a ctypes array allocated from shared memory.
typecode_or_type determines the type of the elements of the returned array:
it is either a ctypes type or a one character typecode of the kind used by
the array module. If size_or_initializer is an integer then it
determines the length of the array, and the array will be initially zeroed.
Otherwise size_or_initializer is a sequence which is used to initialize the
array and whose length determines the length of the array.
Note that setting and getting an element is potentially non-atomic – use
Array() instead to make sure that access is automatically synchronized
using a lock.
multiprocessing.sharedctypes.RawValue(typecode_or_type, *args)
Return a ctypes object allocated from shared memory.
typecode_or_type determines the type of the returned object: it is either a
ctypes type or a one character typecode of the kind used by the array
module. *args is passed on to the constructor for the type.
Note that setting and getting the value is potentially non-atomic – use
Value() instead to make sure that access is automatically synchronized
using a lock.
Note that an array of ctypes.c_char has value and raw
attributes which allow one to use it to store and retrieve strings – see
documentation for ctypes.
multiprocessing.sharedctypes.Array(typecode_or_type, size_or_initializer, *, lock=True)
The same as RawArray() except that depending on the value of lock a
process-safe synchronization wrapper may be returned instead of a raw ctypes
array.
If lock is True (the default) then a new lock object is created to
synchronize access to the value. If lock is a Lock or
RLock object then that will be used to synchronize access to the
value. If lock is False then access to the returned object will not be
automatically protected by a lock, so it will not necessarily be
“process-safe”.
multiprocessing.sharedctypes.Value(typecode_or_type, *args, lock=True)
The same as RawValue() except that depending on the value of lock a
process-safe synchronization wrapper may be returned instead of a raw ctypes
object.
If lock is True (the default) then a new lock object is created to
synchronize access to the value. If lock is a Lock or
RLock object then that will be used to synchronize access to the
value. If lock is False then access to the returned object will not be
automatically protected by a lock, so it will not necessarily be
“process-safe”.
multiprocessing.sharedctypes.synchronized(obj[, lock])
Return a process-safe wrapper object for a ctypes object which uses lock to
synchronize access. If lock is None (the default) then a
multiprocessing.RLock object is created automatically.
A synchronized wrapper will have two methods in addition to those of the
object it wraps: get_obj() returns the wrapped object and
get_lock() returns the lock object used for synchronization.
Note that accessing the ctypes object through the wrapper can be a lot slower
than accessing the raw ctypes object.
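For example, one might hold the wrapper's lock across several operations, as in this sketch:

from multiprocessing import Value

counter = Value('i', 0)          # a synchronized wrapper around a c_int

with counter.get_lock():         # hold one lock across several operations
    counter.value += 1           # avoids a race between the read and the write
    print(counter.get_obj())     # the underlying ctypes object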
The table below compares the syntax for creating shared ctypes objects from
shared memory with the normal ctypes syntax. (In the table MyStruct is some
subclass of ctypes.Structure.)
ctypes                  sharedctypes using type       sharedctypes using typecode
c_double(2.4)           RawValue(c_double, 2.4)       RawValue('d', 2.4)
MyStruct(4, 6)          RawValue(MyStruct, 4, 6)
(c_short * 7)()         RawArray(c_short, 7)          RawArray('h', 7)
(c_int * 3)(9, 2, 8)    RawArray(c_int, (9, 2, 8))    RawArray('i', (9, 2, 8))
Below is an example where a number of ctypes objects are modified by a child
process:
from multiprocessing import Process, Lock
from multiprocessing.sharedctypes import Value, Array
from ctypes import Structure, c_double

class Point(Structure):
    _fields_ = [('x', c_double), ('y', c_double)]

def modify(n, x, s, A):
    n.value **= 2
    x.value **= 2
    s.value = s.value.upper()
    for a in A:
        a.x **= 2
        a.y **= 2

if __name__ == '__main__':
    lock = Lock()

    n = Value('i', 7)
    x = Value(c_double, 1.0/3.0, lock=False)
    s = Array('c', b'hello world', lock=lock)
    A = Array(Point, [(1.875,-6.25), (-5.75,2.0), (2.375,9.5)], lock=lock)

    p = Process(target=modify, args=(n, x, s, A))
    p.start()
    p.join()

    print(n.value)
    print(x.value)
    print(s.value)
    print([(a.x, a.y) for a in A])
The results printed are:
49
0.1111111111111111
b'HELLO WORLD'
[(3.515625, 39.0625), (33.0625, 4.0), (5.640625, 90.25)]
Managers provide a way to create data which can be shared between different
processes. A manager object controls a server process which manages shared
objects. Other processes can access the shared objects by using proxies.
multiprocessing.Manager()
Returns a started SyncManager object which
can be used for sharing objects between processes. The returned manager
object corresponds to a spawned child process and has methods which will
create shared objects and return corresponding proxies.
Manager processes will be shut down as soon as they are garbage collected or
their parent process exits. The manager classes are defined in the
multiprocessing.managers module:
class multiprocessing.managers.BaseManager([address[, authkey]])
Create a BaseManager object.
Once created one should call start() or get_server().serve_forever() to ensure
that the manager object refers to a started manager process.
address is the address on which the manager process listens for new
connections. If address is None then an arbitrary one is chosen.
authkey is the authentication key which will be used to check the validity
of incoming connections to the server process. If authkey is None then
current_process().authkey is used. Otherwise authkey is used and it
must be a string.
register(typeid[, callable[, proxytype[, exposed[, method_to_typeid[, create_method]]]]])
A classmethod which can be used for registering a type or callable with
the manager class.
typeid is a “type identifier” which is used to identify a particular
type of shared object. This must be a string.
callable is a callable used for creating objects for this type
identifier. If a manager instance will be created using the
from_address() classmethod or if the create_method argument is
False then this can be left as None.
proxytype is a subclass of BaseProxy which is used to create
proxies for shared objects with this typeid. If None then a proxy
class is created automatically.
exposed is used to specify a sequence of method names which proxies for
this typeid should be allowed to access using
BaseProxy._callmethod(). (If exposed is None then
proxytype._exposed_ is used instead if it exists.) In the case
where no exposed list is specified, all “public methods” of the shared
object will be accessible. (Here a “public method” means any attribute
which has a __call__() method and whose name does not begin with
'_'.)
method_to_typeid is a mapping used to specify the return type of those
exposed methods which should return a proxy. It maps method names to
typeid strings. (If method_to_typeid is None then
proxytype._method_to_typeid_ is used instead if it exists.) If a
method’s name is not a key of this mapping or if the mapping is None
then the object returned by the method will be copied by value.
create_method determines whether a method should be created with name
typeid which can be used to tell the server process to create a new
shared object and return a proxy for it. By default it is True.
BaseManager instances also have one read-only property:
address
The address used by the manager.
SyncManager instances additionally support methods for creating shared
objects, for example:
list()
Create a shared list object and return a proxy for it.
Note
Modifications to mutable values or items in dict and list proxies will not
be propagated through the manager, because the proxy has no way of knowing
when its values or items are modified. To modify such an item, you can
re-assign the modified object to the container proxy:
# create a list proxy and append a mutable object (a dictionary)
lproxy = manager.list()
lproxy.append({})
# now mutate the dictionary
d = lproxy[0]
d['a'] = 1
d['b'] = 2
# at this point, the changes to d are not yet synced, but by
# reassigning the dictionary, the proxy is notified of the change
lproxy[0] = d
A namespace object has no public methods, but does have writable attributes.
Its representation shows the values of its attributes.
However, when using a proxy for a namespace object, an attribute beginning with
'_' will be an attribute of the proxy and not an attribute of the referent:
>>> manager = multiprocessing.Manager()
>>> Global = manager.Namespace()
>>> Global.x = 10
>>> Global.y = 'hello'
>>> Global._z = 12.3    # this is an attribute of the proxy
>>> print(Global)
Namespace(x=10, y='hello')
To create one’s own manager, one creates a subclass of BaseManager and
uses the register() classmethod to register new types or
callables with the manager class. For example:
from multiprocessing.managers import BaseManager

class MathsClass:
    def add(self, x, y):
        return x + y
    def mul(self, x, y):
        return x * y

class MyManager(BaseManager):
    pass

MyManager.register('Maths', MathsClass)

if __name__ == '__main__':
    manager = MyManager()
    manager.start()
    maths = manager.Maths()
    print(maths.add(4, 3))   # prints 7
    print(maths.mul(7, 8))   # prints 56
A proxy is an object which refers to a shared object which lives (presumably)
in a different process. The shared object is said to be the referent of the
proxy. Multiple proxy objects may have the same referent.
A proxy object has methods which invoke corresponding methods of its referent
(although not every method of the referent will necessarily be available through
the proxy). A proxy can usually be used in most of the same ways that its
referent can:
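A sketch of an interactive session illustrating this (the exact address shown in the repr() will differ):

>>> from multiprocessing import Manager
>>> manager = Manager()
>>> l = manager.list([i*i for i in range(10)])
>>> print(l)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> print(repr(l))
<ListProxy object, typeid 'list' at 0x...>
>>> l[4]
16
>>> l[2:5]
[4, 9, 16]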
Notice that applying str() to a proxy will return the representation of
the referent, whereas applying repr() will return the representation of
the proxy.
An important feature of proxy objects is that they are picklable so they can be
passed between processes. Note, however, that if a proxy is sent to the
corresponding manager’s process then unpickling it will produce the referent
itself. This means, for example, that one shared object can contain a second:
>>> a=manager.list()>>> b=manager.list()>>> a.append(b)# referent of a now contains referent of b>>> print(a,b)[[]] []>>> b.append('hello')>>> print(a,b)[['hello']] ['hello']
Note
The proxy types in multiprocessing do nothing to support comparisons
by value. So, for instance, we have:
>>> manager.list([1,2,3])==[1,2,3]False
One should just use a copy of the referent instead when making comparisons.
_callmethod(methodname[, args[, kwds]])
Call and return the result of a method of the proxy’s referent.
If proxy is a proxy whose referent is obj then the expression
proxy._callmethod(methodname, args, kwds)
will evaluate the expression
getattr(obj, methodname)(*args, **kwds)
in the manager’s process.
The returned value will be a copy of the result of the call or a proxy to
a new shared object – see documentation for the method_to_typeid
argument of BaseManager.register().
If an exception is raised by the call, then it is re-raised by
_callmethod(). If some other exception is raised in the manager’s
process then this is converted into a RemoteError exception and is
raised by _callmethod().
Note in particular that an exception will be raised if methodname has
not been exposed. An example of the usage of _callmethod():
>>> l = manager.list(range(10))
>>> l._callmethod('__len__')
10
>>> l._callmethod('__getitem__', (slice(2, 7),))   # equiv to `l[2:7]`
[2, 3, 4, 5, 6]
>>> l._callmethod('__getitem__', (20,))            # equiv to `l[20]`
Traceback (most recent call last):
...
IndexError: list index out of range
One can create a pool of processes which will carry out tasks submitted to it
with the Pool class.
class multiprocessing.Pool([processes[, initializer[, initargs[, maxtasksperchild]]]])
A process pool object which controls a pool of worker processes to which jobs
can be submitted. It supports asynchronous results with timeouts and
callbacks and has a parallel map implementation.
processes is the number of worker processes to use. If processes is
None then the number returned by cpu_count() is used. If
initializer is not None then each worker process will call
initializer(*initargs) when it starts.
New in version 3.2: maxtasksperchild is the number of tasks a worker process can complete
before it will exit and be replaced with a fresh worker process, to enable
unused resources to be freed. The default maxtasksperchild is None, which
means worker processes will live as long as the pool.
Note
Worker processes within a Pool typically live for the complete
duration of the Pool’s work queue. A frequent pattern found in other
systems (such as Apache, mod_wsgi, etc) to free resources held by
workers is to allow a worker within a pool to complete only a set
amount of work before exiting, being cleaned up, and being replaced by
a freshly spawned process. The maxtasksperchild argument to the
Pool exposes this ability to the end user.
apply(func[, args[, kwds]])
Call func with arguments args and keyword arguments kwds. It blocks
until the result is ready. Given that it blocks, apply_async() is better
suited for performing work in parallel. Additionally, func is only
executed in one of the workers of the pool.
apply_async(func[, args[, kwds[, callback[, error_callback]]]])
A variant of the apply() method which returns a result object.
If callback is specified then it should be a callable which accepts a
single argument. When the result becomes ready callback is applied to
it, that is unless the call failed, in which case the error_callback
is applied instead.
If error_callback is specified then it should be a callable which
accepts a single argument. If the target function fails, then
the error_callback is called with the exception instance.
Callbacks should complete immediately since otherwise the thread which
handles the results will get blocked.
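A small sketch of the callback pattern; the function names are illustrative:

from multiprocessing import Pool

def square(x):
    return x * x

def on_result(value):
    print('got', value)      # runs in the result-handling thread; keep it quick

def on_error(exc):
    print('failed:', exc)

if __name__ == '__main__':
    pool = Pool(processes=2)
    pool.apply_async(square, (6,), callback=on_result,
                     error_callback=on_error)
    pool.close()
    pool.join()              # prints "got 36"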
map(func, iterable[, chunksize])
A parallel equivalent of the map() built-in function (it supports only
one iterable argument though). It blocks until the result is ready.
This method chops the iterable into a number of chunks which it submits to
the process pool as separate tasks. The (approximate) size of these
chunks can be specified by setting chunksize to a positive integer.
map_async(func, iterable[, chunksize[, callback[, error_callback]]])
A variant of the map() method which returns a result object.
If callback is specified then it should be a callable which accepts a
single argument. When the result becomes ready callback is applied to
it, that is unless the call failed, in which case the error_callback
is applied instead.
If error_callback is specified then it should be a callable which
accepts a single argument. If the target function fails, then
the error_callback is called with the exception instance.
Callbacks should complete immediately since otherwise the thread which
handles the results will get blocked.
imap(func, iterable[, chunksize])
A lazier version of map().
The chunksize argument is the same as the one used by the map()
method. For very long iterables using a large value for chunksize can
make the job complete much faster than using the default value of 1.
Also if chunksize is 1 then the next() method of the iterator
returned by the imap() method has an optional timeout parameter:
next(timeout) will raise multiprocessing.TimeoutError if the
result cannot be returned within timeout seconds.
imap_unordered(func, iterable[, chunksize])
The same as imap() except that the ordering of the results from the
returned iterator should be considered arbitrary. (Only when there is
only one worker process is the order guaranteed to be “correct”.)
terminate()
Stops the worker processes immediately without completing outstanding
work. When the pool object is garbage collected terminate() will be
called immediately.
get([timeout])
Return the result when it arrives. If timeout is not None and the
result does not arrive within timeout seconds then
multiprocessing.TimeoutError is raised. If the remote call raised
an exception then that exception will be reraised by get().
successful()
Return whether the call completed without raising an exception. Will
raise AssertionError if the result is not ready.
The following example demonstrates the use of a pool:
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)              # start 4 worker processes

    result = pool.apply_async(f, (10,))   # evaluate "f(10)" asynchronously
    print(result.get(timeout=1))          # prints "100" unless your computer is *very* slow

    print(pool.map(f, range(10)))         # prints "[0, 1, 4,..., 81]"

    it = pool.imap(f, range(10))
    print(next(it))                       # prints "0"
    print(next(it))                       # prints "1"
    print(it.next(timeout=1))             # prints "4" unless your computer is *very* slow

    import time
    result = pool.apply_async(time.sleep, (10,))
    print(result.get(timeout=1))          # raises TimeoutError
Usually message passing between processes is done using queues or by using
Connection objects returned by Pipe().
However, the multiprocessing.connection module allows some extra
flexibility. It basically gives a high level message oriented API for dealing
with sockets or Windows named pipes, and also has support for digest
authentication using the hmac module.
multiprocessing.connection.deliver_challenge(connection, authkey)
Send a randomly generated message to the other end of the connection and wait
for a reply.
If the reply matches the digest of the message using authkey as the key
then a welcome message is sent to the other end of the connection. Otherwise
AuthenticationError is raised.
multiprocessing.connection.Client(address[, family[, authenticate[, authkey]]])
Attempt to set up a connection to the listener which is using address
address, returning a Connection.
The type of the connection is determined by family argument, but this can
generally be omitted since it can usually be inferred from the format of
address. (See Address Formats)
If authenticate is True or authkey is a string then digest
authentication is used. The key used for authentication will be either
authkey or current_process().authkey if authkey is None.
If authentication fails then AuthenticationError is raised. See
Authentication keys.
class multiprocessing.connection.Listener([address[, family[, backlog[, authenticate[, authkey]]]]])
A wrapper for a bound socket or Windows named pipe which is ‘listening’ for
connections.
address is the address to be used by the bound socket or named pipe of the
listener object.
Note
If an address of '0.0.0.0' is used, the address will not be a connectable
end point on Windows. If you require a connectable end-point,
you should use '127.0.0.1'.
family is the type of socket (or named pipe) to use. This can be one of
the strings 'AF_INET' (for a TCP socket), 'AF_UNIX' (for a Unix
domain socket) or 'AF_PIPE' (for a Windows named pipe). Of these only
the first is guaranteed to be available. If family is None then the
family is inferred from the format of address. If address is also
None then a default is chosen. This default is the family which is
assumed to be the fastest available. See
Address Formats. Note that if family is
'AF_UNIX' and address is None then the socket will be created in a
private temporary directory created using tempfile.mkstemp().
If the listener object uses a socket then backlog (1 by default) is passed
to the listen() method of the socket once it has been bound.
If authenticate is True (False by default) or authkey is not
None then digest authentication is used.
If authkey is a string then it will be used as the authentication key;
otherwise it must be None.
If authkey is None and authenticate is True then
current_process().authkey is used as the authentication key. If
authkey is None and authenticate is False then no
authentication is done. If authentication fails then
AuthenticationError is raised. See Authentication keys.
accept()
Accept a connection on the bound socket or named pipe of the listener
object and return a Connection object. If authentication is
attempted and fails, then AuthenticationError is raised.
close()
Close the bound socket or named pipe of the listener object. This is
called automatically when the listener is garbage collected. However it
is advisable to call it explicitly.
Listener objects have the following read-only properties:
address
The address which is being used by the Listener object.
last_accepted
The address from which the last accepted connection came. If this is
unavailable then it is None.
exception multiprocessing.AuthenticationError
Exception raised when there is an authentication error.
Examples
The following server code creates a listener which uses 'secret password' as
an authentication key. It then waits for a connection and sends some data to
the client:
from multiprocessing.connection import Listener
from array import array

address = ('localhost', 6000)     # family is deduced to be 'AF_INET'
listener = Listener(address, authkey=b'secret password')

conn = listener.accept()
print('connection accepted from', listener.last_accepted)

conn.send([2.25, None, 'junk', float])

conn.send_bytes(b'hello')

conn.send_bytes(array('i', [42, 1729]))

conn.close()
listener.close()
The following code connects to the server and receives some data from the
server:
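from multiprocessing.connection import Client
from array import array

# a client sketch matching the server example above
address = ('localhost', 6000)
conn = Client(address, authkey=b'secret password')

print(conn.recv())                  # => [2.25, None, 'junk', float]

print(conn.recv_bytes())            # => 'hello'

arr = array('i', [0, 0, 0, 0, 0])
print(conn.recv_bytes_into(arr))    # => 8
print(arr)                          # => array('i', [42, 1729, 0, 0, 0])

conn.close()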
Address Formats
An 'AF_INET' address is a tuple of the form (hostname, port) where
hostname is a string and port is an integer.
An 'AF_UNIX' address is a string representing a filename on the
filesystem.
An 'AF_PIPE' address is a string of the form
r'\\.\pipe\PipeName'. To use Client() to connect to a named
pipe on a remote computer called ServerName one should use an address of the
form r'\\ServerName\pipe\PipeName' instead.
Note that any string beginning with two backslashes is assumed by default to be
an 'AF_PIPE' address rather than an 'AF_UNIX' address.
Authentication keys
When one uses Connection.recv(), the data received is automatically
unpickled. Unfortunately unpickling data from an untrusted source is a security
risk. Therefore Listener and Client() use the hmac module
to provide digest authentication.
An authentication key is a string which can be thought of as a password: once a
connection is established both ends will demand proof that the other knows the
authentication key. (Demonstrating that both ends are using the same key does
not involve sending the key over the connection.)
If authentication is requested but no authentication key is specified then the
return value of current_process().authkey is used (see
Process). This value will automatically be inherited by
any Process object that the current process creates.
This means that (by default) all processes of a multi-process program will share
a single authentication key which can be used when setting up connections
between themselves.
Suitable authentication keys can also be generated by using os.urandom().
Logging
Some support for logging is available. Note, however, that the logging
package does not use process shared locks so it is possible (depending on the
handler type) for messages from different processes to get mixed up.
multiprocessing.get_logger()
Returns the logger used by multiprocessing. If necessary, a new one
will be created.
When first created the logger has level logging.NOTSET and no
default handler. Messages sent to this logger will not by default propagate
to the root logger.
Note that on Windows child processes will only inherit the level of the
parent process’s logger – any other customization of the logger will not be
inherited.
multiprocessing.log_to_stderr()
This function performs a call to get_logger() but in addition to
returning the logger created by get_logger, it adds a handler which sends
output to sys.stderr using format
'[%(levelname)s/%(processName)s] %(message)s'.
Below is an example session with logging turned on:
>>> import multiprocessing, logging
>>> logger = multiprocessing.log_to_stderr()
>>> logger.setLevel(logging.INFO)
>>> logger.warning('doomed')
[WARNING/MainProcess] doomed
>>> m = multiprocessing.Manager()
[INFO/SyncManager-...] child process calling self.run()
[INFO/SyncManager-...] created temp directory /.../pymp-...
[INFO/SyncManager-...] manager serving at '/.../listener-...'
>>> del m
[INFO/MainProcess] sending shutdown message to manager
[INFO/SyncManager-...] manager exiting with exitcode 0
In addition to having these two logging functions, the multiprocessing module
also exposes two additional logging level attributes: SUBWARNING
and SUBDEBUG. The table below illustrates where these fit in the
normal level hierarchy.
Level         Numeric value
SUBWARNING    25
SUBDEBUG      5
For a full table of logging levels, see the logging module.
These additional logging levels are used primarily for certain debug messages
within the multiprocessing module. Below is the same example as above, except
with SUBDEBUG enabled:
>>> import multiprocessing, logging
>>> logger = multiprocessing.log_to_stderr()
>>> logger.setLevel(multiprocessing.SUBDEBUG)
>>> logger.warning('doomed')
[WARNING/MainProcess] doomed
>>> m = multiprocessing.Manager()
[INFO/SyncManager-...] child process calling self.run()
[INFO/SyncManager-...] created temp directory /.../pymp-...
[INFO/SyncManager-...] manager serving at '/.../pymp-djGBXN/listener-...'
>>> del m
[SUBDEBUG/MainProcess] finalizer calling ...
[INFO/MainProcess] sending shutdown message to manager
[DEBUG/SyncManager-...] manager received shutdown message
[SUBDEBUG/SyncManager-...] calling <Finalize object, callback=unlink, ...
[SUBDEBUG/SyncManager-...] finalizer calling <built-in function unlink> ...
[SUBDEBUG/SyncManager-...] calling <Finalize object, dead>
[SUBDEBUG/SyncManager-...] finalizer calling <function rmtree at 0x5aa730> ...
[INFO/SyncManager-...] manager exiting with exitcode 0
Programming guidelines
Avoid shared state
As far as possible one should try to avoid shifting large amounts of data
between processes.
It is probably best to stick to using queues or pipes for communication
between processes rather than using the lower level synchronization
primitives from the threading module.
Picklability
Ensure that the arguments to the methods of proxies are picklable.
Thread safety of proxies
Do not use a proxy object from more than one thread unless you protect it
with a lock.
(There is never a problem with different processes using the same proxy.)
Joining zombie processes
On Unix when a process finishes but has not been joined it becomes a zombie.
There should never be very many because each time a new process starts (or
active_children() is called) all completed processes which have not
yet been joined will be joined. Also calling a finished process’s
Process.is_alive() will join the process. Even so it is probably good
practice to explicitly join all the processes that you start.
Better to inherit than pickle/unpickle
On Windows many types from multiprocessing need to be picklable so
that child processes can use them. However, one should generally avoid
sending shared objects to other processes using pipes or queues. Instead
you should arrange the program so that a process which needs access to a
shared resource created elsewhere can inherit it from an ancestor process.
Avoid terminating processes
Using the Process.terminate() method to stop a process is liable to
cause any shared resources (such as locks, semaphores, pipes and queues)
currently being used by the process to become broken or unavailable to other
processes.
Therefore it is probably best to only consider using
Process.terminate() on processes which never use any shared resources.
Joining processes that use queues
Bear in mind that a process that has put items in a queue will wait before
terminating until all the buffered items are fed by the “feeder” thread to
the underlying pipe. (The child process can call the
Queue.cancel_join_thread() method of the queue to avoid this behaviour.)
This means that whenever you use a queue you need to make sure that all
items which have been put on the queue will eventually be removed before the
process is joined. Otherwise you cannot be sure that processes which have
put items on the queue will terminate. Remember also that non-daemonic
processes will automatically be joined.
An example which will deadlock is the following:
from multiprocessing import Process, Queue

def f(q):
    q.put('X' * 1000000)

if __name__ == '__main__':
    queue = Queue()
    p = Process(target=f, args=(queue,))
    p.start()
    p.join()                    # this deadlocks
    obj = queue.get()
A fix here would be to swap the last two lines round (or simply remove the
p.join() line).
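That is, a working version of the example above, obtained by swapping those two lines, would be:

from multiprocessing import Process, Queue

def f(q):
    q.put('X' * 1000000)

if __name__ == '__main__':
    queue = Queue()
    p = Process(target=f, args=(queue,))
    p.start()
    obj = queue.get()   # drain the queue first
    p.join()            # now the join cannot deadlock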
Explicitly pass resources to child processes
On Unix a child process can make use of a shared resource created in a
parent process using a global resource. However, it is better to pass the
object as an argument to the constructor for the child process.
Apart from making the code (potentially) compatible with Windows this also
ensures that as long as the child process is still alive the object will not
be garbage collected in the parent process. This might be important if some
resource is freed when the object is garbage collected in the parent
process.
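A sketch of the preferred style; the use of a lock here is purely illustrative:

from multiprocessing import Process, Lock

def f(l):
    with l:
        print('got the lock')    # the lock arrived as an argument

if __name__ == '__main__':
    lock = Lock()
    for i in range(3):
        # pass the lock explicitly rather than relying on a global
        Process(target=f, args=(lock,)).start()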
Beware of replacing sys.stdin with a “file like object”
multiprocessing originally unconditionally called
os.close(sys.stdin.fileno())
in the multiprocessing.Process._bootstrap() method — this resulted
in issues with processes-in-processes. This has been changed to:
sys.stdin.close()
sys.stdin = open(os.devnull)
Which solves the fundamental issue of processes colliding with each other
resulting in a bad file descriptor error, but introduces a potential danger
to applications which replace sys.stdin with a “file-like object”
with output buffering. This danger is that if multiple processes call
close() on this file-like object, it could result in the same
data being flushed to the object multiple times, resulting in corruption.
If you write a file-like object and implement your own caching, you can
make it fork-safe by storing the pid whenever you append to the cache,
and discarding the cache when the pid changes. For example:
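# a sketch of such a fork-safe cache property, following the description above
@property
def cache(self):
    pid = os.getpid()
    if pid != self._pid:
        self._pid = pid
        self._cache = []
    return self._cache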
Windows
Since Windows lacks os.fork() it has a few extra restrictions:
More picklability
Ensure that all arguments to Process.__init__() are picklable. This
means, in particular, that bound or unbound methods cannot be used directly
as the target argument on Windows — just define a function and use
that instead.
Also, if you subclass Process then make sure that instances will be
picklable when the Process.start() method is called.
Global variables
Bear in mind that if code run in a child process tries to access a global
variable, then the value it sees (if any) may not be the same as the value
in the parent process at the time that Process.start() was called.
However, global variables which are just module level constants cause no
problems.
Safe importing of main module
Make sure that the main module can be safely imported by a new Python
interpreter without causing unintended side effects (such as starting a new
process).
For example, under Windows running the following module would fail with a
RuntimeError:
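from multiprocessing import Process

def foo():
    print('hello')

p = Process(target=foo)
p.start()

Instead one should protect the “entry point” of the program by using if __name__ == '__main__': as follows:

from multiprocessing import Process, freeze_support

def foo():
    print('hello')

if __name__ == '__main__':
    freeze_support()
    p = Process(target=foo)
    p.start()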
Demonstration of how to create and use customized managers and proxies:
#
# This module shows how to use arbitrary callables with a subclass of
# `BaseManager`.
#
# Copyright (c) 2006-2008, R Oudkerk
# All rights reserved.
#

from multiprocessing import freeze_support
from multiprocessing.managers import BaseManager, BaseProxy
import operator

##

class Foo:
    def f(self):
        print('you called Foo.f()')
    def g(self):
        print('you called Foo.g()')
    def _h(self):
        print('you called Foo._h()')

# A simple generator function
def baz():
    for i in range(10):
        yield i*i

# Proxy type for generator objects
class GeneratorProxy(BaseProxy):
    _exposed_ = ('__next__',)
    def __iter__(self):
        return self
    def __next__(self):
        return self._callmethod('__next__')

# Function to return the operator module
def get_operator_module():
    return operator

##

class MyManager(BaseManager):
    pass

# register the Foo class; make `f()` and `g()` accessible via proxy
MyManager.register('Foo1', Foo)

# register the Foo class; make `g()` and `_h()` accessible via proxy
MyManager.register('Foo2', Foo, exposed=('g', '_h'))

# register the generator function baz; use `GeneratorProxy` to make proxies
MyManager.register('baz', baz, proxytype=GeneratorProxy)

# register get_operator_module(); make public functions accessible via proxy
MyManager.register('operator', get_operator_module)

##

def test():
    manager = MyManager()
    manager.start()

    print('-' * 20)

    f1 = manager.Foo1()
    f1.f()
    f1.g()
    assert not hasattr(f1, '_h')
    assert sorted(f1._exposed_) == sorted(['f', 'g'])

    print('-' * 20)

    f2 = manager.Foo2()
    f2.g()
    f2._h()
    assert not hasattr(f2, 'f')
    assert sorted(f2._exposed_) == sorted(['g', '_h'])

    print('-' * 20)

    it = manager.baz()
    for i in it:
        print('<%d>' % i, end=' ')
    print()

    print('-' * 20)

    op = manager.operator()
    print('op.add(23, 45) =', op.add(23, 45))
    print('op.pow(2, 94) =', op.pow(2, 94))
    # operator.getslice() and operator.repeat() no longer exist in
    # Python 3, so use getitem() and concat() here instead
    print('op.getitem([0, 1, 2], 2) =', op.getitem([0, 1, 2], 2))
    print('op.concat([1, 2], [3, 4]) =', op.concat([1, 2], [3, 4]))
    print('op._exposed_ =', op._exposed_)

##

if __name__ == '__main__':
    freeze_support()
    test()
Using Pool:
#
# A test of `multiprocessing.Pool` class
#
# Copyright (c) 2006-2008, R Oudkerk
# All rights reserved.
#
import multiprocessing
import time
import random
import sys

#
# Functions used by test code
#

def calculate(func, args):
    result = func(*args)
    return '%s says that %s%s = %s' % (
        multiprocessing.current_process().name,
        func.__name__, args, result
        )

def calculatestar(args):
    return calculate(*args)

def mul(a, b):
    time.sleep(0.5 * random.random())
    return a * b

def plus(a, b):
    time.sleep(0.5 * random.random())
    return a + b

def f(x):
    return 1.0 / (x - 5.0)

def pow3(x):
    return x ** 3

def noop(x):
    pass

#
# Test code
#

def test():
    print('cpu_count() = %d\n' % multiprocessing.cpu_count())

    #
    # Create pool
    #

    PROCESSES = 4
    print('Creating pool with %d processes\n' % PROCESSES)
    pool = multiprocessing.Pool(PROCESSES)
    print('pool = %s' % pool)
    print()

    #
    # Tests
    #

    TASKS = [(mul, (i, 7)) for i in range(10)] + \
            [(plus, (i, 8)) for i in range(10)]

    results = [pool.apply_async(calculate, t) for t in TASKS]
    imap_it = pool.imap(calculatestar, TASKS)
    imap_unordered_it = pool.imap_unordered(calculatestar, TASKS)

    print('Ordered results using pool.apply_async():')
    for r in results:
        print('\t', r.get())
    print()

    print('Ordered results using pool.imap():')
    for x in imap_it:
        print('\t', x)
    print()

    print('Unordered results using pool.imap_unordered():')
    for x in imap_unordered_it:
        print('\t', x)
    print()

    print('Ordered results using pool.map() --- will block till complete:')
    for x in pool.map(calculatestar, TASKS):
        print('\t', x)
    print()

    #
    # Simple benchmarks
    #

    N = 100000
    print('def pow3(x): return x**3')

    t = time.time()
    A = list(map(pow3, range(N)))
    print('\tmap(pow3, range(%d)):\n\t\t%s seconds' % \
          (N, time.time() - t))

    t = time.time()
    B = pool.map(pow3, range(N))
    print('\tpool.map(pow3, range(%d)):\n\t\t%s seconds' % \
          (N, time.time() - t))

    t = time.time()
    C = list(pool.imap(pow3, range(N), chunksize=N//8))
    print('\tlist(pool.imap(pow3, range(%d), chunksize=%d)):\n\t\t%s' \
          ' seconds' % (N, N//8, time.time() - t))

    assert A == B == C, (len(A), len(B), len(C))
    print()

    L = [None] * 1000000
    print('def noop(x): pass')
    print('L = [None] * 1000000')

    t = time.time()
    A = list(map(noop, L))
    print('\tmap(noop, L):\n\t\t%s seconds' % \
          (time.time() - t))

    t = time.time()
    B = pool.map(noop, L)
    print('\tpool.map(noop, L):\n\t\t%s seconds' % \
          (time.time() - t))

    t = time.time()
    C = list(pool.imap(noop, L, chunksize=len(L)//8))
    print('\tlist(pool.imap(noop, L, chunksize=%d)):\n\t\t%s seconds' % \
          (len(L)//8, time.time() - t))

    assert A == B == C, (len(A), len(B), len(C))
    print()
    del A, B, C, L

    #
    # Test error handling
    #

    print('Testing error handling:')

    try:
        print(pool.apply(f, (5,)))
    except ZeroDivisionError:
        print('\tGot ZeroDivisionError as expected from pool.apply()')
    else:
        raise AssertionError('expected ZeroDivisionError')

    try:
        print(pool.map(f, list(range(10))))
    except ZeroDivisionError:
        print('\tGot ZeroDivisionError as expected from pool.map()')
    else:
        raise AssertionError('expected ZeroDivisionError')

    try:
        print(list(pool.imap(f, list(range(10)))))
    except ZeroDivisionError:
        print('\tGot ZeroDivisionError as expected from list(pool.imap())')
    else:
        raise AssertionError('expected ZeroDivisionError')

    it = pool.imap(f, list(range(10)))
    for i in range(10):
        try:
            x = next(it)
        except ZeroDivisionError:
            if i == 5:
                pass
        except StopIteration:
            break
        else:
            if i == 5:
                raise AssertionError('expected ZeroDivisionError')
    assert i == 9
    print('\tGot ZeroDivisionError as expected from IMapIterator.next()')
    print()

    #
    # Testing timeouts
    #

    print('Testing ApplyResult.get() with timeout:', end=' ')
    res = pool.apply_async(calculate, TASKS[0])
    while 1:
        sys.stdout.flush()
        try:
            sys.stdout.write('\n\t%s' % res.get(0.02))
            break
        except multiprocessing.TimeoutError:
            sys.stdout.write('.')
    print()
    print()

    print('Testing IMapIterator.next() with timeout:', end=' ')
    it = pool.imap(calculatestar, TASKS)
    while 1:
        sys.stdout.flush()
        try:
            sys.stdout.write('\n\t%s' % it.next(0.02))
        except StopIteration:
            break
        except multiprocessing.TimeoutError:
            sys.stdout.write('.')
    print()
    print()

    #
    # Testing callback
    #

    print('Testing callback:')

    A = []
    B = [56, 0, 1, 8, 27, 64, 125, 216, 343, 512, 729]

    r = pool.apply_async(mul, (7, 8), callback=A.append)
    r.wait()

    r = pool.map_async(pow3, list(range(10)), callback=A.extend)
    r.wait()

    if A == B:
        print('\tcallbacks succeeded\n')
    else:
        print('\t*** callbacks failed\n\t\t%s != %s\n' % (A, B))

    #
    # Check there are no outstanding tasks
    #

    assert not pool._cache, 'cache = %r' % pool._cache

    #
    # Check close() methods
    #

    print('Testing close():')

    for worker in pool._pool:
        assert worker.is_alive()

    result = pool.apply_async(time.sleep, [0.5])
    pool.close()
    pool.join()

    assert result.get() is None

    for worker in pool._pool:
        assert not worker.is_alive()

    print('\tclose() succeeded\n')

    #
    # Check terminate() method
    #

    print('Testing terminate():')

    pool = multiprocessing.Pool(2)
    DELTA = 0.1
    ignore = pool.apply(pow3, [2])
    results = [pool.apply_async(time.sleep, [DELTA]) for i in range(100)]
    pool.terminate()
    pool.join()

    for worker in pool._pool:
        assert not worker.is_alive()

    print('\tterminate() succeeded\n')

    #
    # Check garbage collection
    #

    print('Testing garbage collection:')

    pool = multiprocessing.Pool(2)
    DELTA = 0.1
    processes = pool._pool
    ignore = pool.apply(pow3, [2])
    results = [pool.apply_async(time.sleep, [DELTA]) for i in range(100)]

    results = pool = None

    time.sleep(DELTA * 2)

    for worker in processes:
        assert not worker.is_alive()

    print('\tgarbage collection succeeded\n')


if __name__ == '__main__':
    multiprocessing.freeze_support()

    assert len(sys.argv) in (1, 2)

    if len(sys.argv) == 1 or sys.argv[1] == 'processes':
        print(' Using processes '.center(79, '-'))
    elif sys.argv[1] == 'threads':
        print(' Using threads '.center(79, '-'))
        import multiprocessing.dummy as multiprocessing
    else:
        print('Usage:\n\t%s [processes | threads]' % sys.argv[0])
        raise SystemExit(2)

    test()
Synchronization types like locks, conditions and queues:
#
# A test file for the `multiprocessing` package
#
# Copyright (c) 2006-2008, R Oudkerk
# All rights reserved.
#
import time
import sys
import random
from queue import Empty

import multiprocessing     # may get overwritten


#### TEST_VALUE

def value_func(running, mutex):
    random.seed()
    time.sleep(random.random()*4)

    mutex.acquire()
    print('\n\t\t\t' + str(multiprocessing.current_process()) + ' has finished')
    running.value -= 1
    mutex.release()

def test_value():
    TASKS = 10
    running = multiprocessing.Value('i', TASKS)
    mutex = multiprocessing.Lock()

    for i in range(TASKS):
        p = multiprocessing.Process(target=value_func, args=(running, mutex))
        p.start()

    while running.value > 0:
        time.sleep(0.08)
        mutex.acquire()
        print(running.value, end=' ')
        sys.stdout.flush()
        mutex.release()

    print()
    print('No more running processes')


#### TEST_QUEUE

def queue_func(queue):
    for i in range(30):
        time.sleep(0.5 * random.random())
        queue.put(i*i)
    queue.put('STOP')

def test_queue():
    q = multiprocessing.Queue()

    p = multiprocessing.Process(target=queue_func, args=(q,))
    p.start()

    o = None
    while o != 'STOP':
        try:
            o = q.get(timeout=0.3)
            print(o, end=' ')
            sys.stdout.flush()
        except Empty:
            print('TIMEOUT')

    print()


#### TEST_CONDITION

def condition_func(cond):
    cond.acquire()
    print('\t' + str(cond))
    time.sleep(2)
    print('\tchild is notifying')
    print('\t' + str(cond))
    cond.notify()
    cond.release()

def test_condition():
    cond = multiprocessing.Condition()

    p = multiprocessing.Process(target=condition_func, args=(cond,))
    print(cond)

    cond.acquire()
    print(cond)
    cond.acquire()
    print(cond)

    p.start()

    print('main is waiting')
    cond.wait()
    print('main has woken up')

    print(cond)
    cond.release()
    print(cond)
    cond.release()

    p.join()
    print(cond)


#### TEST_SEMAPHORE

def semaphore_func(sema, mutex, running):
    sema.acquire()

    mutex.acquire()
    running.value += 1
    print(running.value, 'tasks are running')
    mutex.release()

    random.seed()
    time.sleep(random.random()*2)

    mutex.acquire()
    running.value -= 1
    print('%s has finished' % multiprocessing.current_process())
    mutex.release()

    sema.release()

def test_semaphore():
    sema = multiprocessing.Semaphore(3)
    mutex = multiprocessing.RLock()
    running = multiprocessing.Value('i', 0)

    processes = [
        multiprocessing.Process(target=semaphore_func,
                                args=(sema, mutex, running))
        for i in range(10)
        ]

    for p in processes:
        p.start()

    for p in processes:
        p.join()


#### TEST_JOIN_TIMEOUT

def join_timeout_func():
    print('\tchild sleeping')
    time.sleep(5.5)
    print('\n\tchild terminating')

def test_join_timeout():
    p = multiprocessing.Process(target=join_timeout_func)
    p.start()

    print('waiting for process to finish')

    while 1:
        p.join(timeout=1)
        if not p.is_alive():
            break
        print('.', end=' ')
        sys.stdout.flush()


#### TEST_EVENT

def event_func(event):
    print('\t%r is waiting' % multiprocessing.current_process())
    event.wait()
    print('\t%r has woken up' % multiprocessing.current_process())

def test_event():
    event = multiprocessing.Event()

    processes = [multiprocessing.Process(target=event_func, args=(event,))
                 for i in range(5)]

    for p in processes:
        p.start()

    print('main is sleeping')
    time.sleep(2)

    print('main is setting event')
    event.set()

    for p in processes:
        p.join()


#### TEST_SHAREDVALUES

def sharedvalues_func(values, arrays, shared_values, shared_arrays):
    for i in range(len(values)):
        v = values[i][1]
        sv = shared_values[i].value
        assert v == sv

    for i in range(len(values)):
        a = arrays[i][1]
        sa = list(shared_arrays[i][:])
        assert a == sa

    print('Tests passed')

def test_sharedvalues():
    values = [
        ('i', 10),
        ('h', -2),
        ('d', 1.25)
        ]
    arrays = [
        ('i', list(range(100))),
        ('d', [0.25 * i for i in range(100)]),
        ('H', list(range(1000)))
        ]

    shared_values = [multiprocessing.Value(id, v) for id, v in values]
    shared_arrays = [multiprocessing.Array(id, a) for id, a in arrays]

    p = multiprocessing.Process(
        target=sharedvalues_func,
        args=(values, arrays, shared_values, shared_arrays)
        )
    p.start()
    p.join()

    assert p.exitcode == 0


####

def test(namespace=multiprocessing):
    global multiprocessing

    multiprocessing = namespace

    for func in [test_value, test_queue, test_condition,
                 test_semaphore, test_join_timeout, test_event,
                 test_sharedvalues]:
        print('\n\t######## %s\n' % func.__name__)
        func()

    ignore = multiprocessing.active_children()      # cleanup any old processes
    if hasattr(multiprocessing, '_debug_info'):
        info = multiprocessing._debug_info()
        if info:
            print(info)
            raise ValueError('there should be no positive refcounts left')


if __name__ == '__main__':
    multiprocessing.freeze_support()

    assert len(sys.argv) in (1, 2)

    if len(sys.argv) == 1 or sys.argv[1] == 'processes':
        print(' Using processes '.center(79, '-'))
        namespace = multiprocessing
    elif sys.argv[1] == 'manager':
        print(' Using processes and a manager '.center(79, '-'))
        namespace = multiprocessing.Manager()
        namespace.Process = multiprocessing.Process
        namespace.current_process = multiprocessing.current_process
        namespace.active_children = multiprocessing.active_children
    elif sys.argv[1] == 'threads':
        print(' Using threads '.center(79, '-'))
        import multiprocessing.dummy as namespace
    else:
        print('Usage:\n\t%s [processes | manager | threads]' % sys.argv[0])
        raise SystemExit(2)

    test(namespace)
An example showing how to use queues to feed tasks to a collection of worker
processes and collect the results:
#
# Simple example which uses a pool of workers to carry out some tasks.
#
# Notice that the results will probably not come out of the output
# queue in the same order as the corresponding tasks were put on the
# input queue.  If it is important to get the results back in the
# original order then consider using `Pool.map()` or `Pool.imap()`
# (which will save on the amount of code needed anyway).
#
# Copyright (c) 2006-2008, R Oudkerk
# All rights reserved.
#

import time
import random

from multiprocessing import Process, Queue, current_process, freeze_support

#
# Function run by worker processes
#

def worker(input, output):
    for func, args in iter(input.get, 'STOP'):
        result = calculate(func, args)
        output.put(result)

#
# Function used to calculate result
#

def calculate(func, args):
    result = func(*args)
    return '%s says that %s%s = %s' % \
        (current_process().name, func.__name__, args, result)

#
# Functions referenced by tasks
#

def mul(a, b):
    time.sleep(0.5 * random.random())
    return a * b

def plus(a, b):
    time.sleep(0.5 * random.random())
    return a + b

#
#
#

def test():
    NUMBER_OF_PROCESSES = 4
    TASKS1 = [(mul, (i, 7)) for i in range(20)]
    TASKS2 = [(plus, (i, 8)) for i in range(10)]

    # Create queues
    task_queue = Queue()
    done_queue = Queue()

    # Submit tasks
    for task in TASKS1:
        task_queue.put(task)

    # Start worker processes
    for i in range(NUMBER_OF_PROCESSES):
        Process(target=worker, args=(task_queue, done_queue)).start()

    # Get and print results
    print('Unordered results:')
    for i in range(len(TASKS1)):
        print('\t', done_queue.get())

    # Add more tasks using `put()`
    for task in TASKS2:
        task_queue.put(task)

    # Get and print some more results
    for i in range(len(TASKS2)):
        print('\t', done_queue.get())

    # Tell child processes to stop
    for i in range(NUMBER_OF_PROCESSES):
        task_queue.put('STOP')


if __name__ == '__main__':
    freeze_support()
    test()
An example of how a pool of worker processes can each run a
SimpleHTTPRequestHandler instance while sharing a single
listening socket.
#
# Example where a pool of http servers share a single listening socket
#
# On Windows this module depends on the ability to pickle a socket
# object so that the worker processes can inherit a copy of the server
# object.  (We import `multiprocessing.reduction` to enable this pickling.)
#
# Not sure if we should synchronize access to `socket.accept()` method by
# using a process-shared lock -- does not seem to be necessary.
#
# Copyright (c) 2006-2008, R Oudkerk
# All rights reserved.
#

import os
import sys

from multiprocessing import Process, current_process, freeze_support
from http.server import HTTPServer
from http.server import SimpleHTTPRequestHandler

if sys.platform == 'win32':
    import multiprocessing.reduction    # make sockets picklable/inheritable


def note(format, *args):
    sys.stderr.write('[%s]\t%s\n' % (current_process().name, format % args))


class RequestHandler(SimpleHTTPRequestHandler):
    # we override log_message() to show which process is handling the request
    def log_message(self, format, *args):
        note(format, *args)

def serve_forever(server):
    note('starting server')
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass


def runpool(address, number_of_processes):
    # create a single server object -- children will each inherit a copy
    server = HTTPServer(address, RequestHandler)

    # create child processes to act as workers
    for i in range(number_of_processes-1):
        Process(target=serve_forever, args=(server,)).start()

    # main process also acts as a worker
    serve_forever(server)


def test():
    DIR = os.path.join(os.path.dirname(__file__), '..')
    ADDRESS = ('localhost', 8000)
    NUMBER_OF_PROCESSES = 4

    print('Serving at http://%s:%d using %d worker processes' % \
          (ADDRESS[0], ADDRESS[1], NUMBER_OF_PROCESSES))
    print('To exit press Ctrl-' + ['C', 'Break'][sys.platform == 'win32'])

    os.chdir(DIR)
    runpool(ADDRESS, NUMBER_OF_PROCESSES)


if __name__ == '__main__':
    freeze_support()
    test()
Some simple benchmarks comparing multiprocessing with threading:

#
# Simple benchmarks for the multiprocessing package
#
# Copyright (c) 2006-2008, R Oudkerk
# All rights reserved.
#

import time
import sys
import multiprocessing
import threading
import queue
import gc

if sys.platform == 'win32':
    _timer = time.clock
else:
    _timer = time.time

delta = 1


#### TEST_QUEUESPEED

def queuespeed_func(q, c, iterations):
    a = '0' * 256
    c.acquire()
    c.notify()
    c.release()

    for i in range(iterations):
        q.put(a)

    q.put('STOP')

def test_queuespeed(Process, q, c):
    elapsed = 0
    iterations = 1

    while elapsed < delta:
        iterations *= 2

        p = Process(target=queuespeed_func, args=(q, c, iterations))
        c.acquire()
        p.start()
        c.wait()
        c.release()

        result = None
        t = _timer()

        while result != 'STOP':
            result = q.get()

        elapsed = _timer() - t

        p.join()

    print(iterations, 'objects passed through the queue in',
          elapsed, 'seconds')
    print('average number/sec:', iterations / elapsed)


#### TEST_PIPESPEED

def pipe_func(c, cond, iterations):
    a = '0' * 256
    cond.acquire()
    cond.notify()
    cond.release()

    for i in range(iterations):
        c.send(a)

    c.send('STOP')

def test_pipespeed():
    c, d = multiprocessing.Pipe()
    cond = multiprocessing.Condition()
    elapsed = 0
    iterations = 1

    while elapsed < delta:
        iterations *= 2

        p = multiprocessing.Process(target=pipe_func,
                                    args=(d, cond, iterations))
        cond.acquire()
        p.start()
        cond.wait()
        cond.release()

        result = None
        t = _timer()

        while result != 'STOP':
            result = c.recv()

        elapsed = _timer() - t
        p.join()

    print(iterations, 'objects passed through connection in',
          elapsed, 'seconds')
    print('average number/sec:', iterations / elapsed)


#### TEST_SEQSPEED

def test_seqspeed(seq):
    elapsed = 0
    iterations = 1

    while elapsed < delta:
        iterations *= 2

        t = _timer()

        for i in range(iterations):
            a = seq[5]

        elapsed = _timer() - t

    print(iterations, 'iterations in', elapsed, 'seconds')
    print('average number/sec:', iterations / elapsed)


#### TEST_LOCK

def test_lockspeed(l):
    elapsed = 0
    iterations = 1

    while elapsed < delta:
        iterations *= 2

        t = _timer()

        for i in range(iterations):
            l.acquire()
            l.release()

        elapsed = _timer() - t

    print(iterations, 'iterations in', elapsed, 'seconds')
    print('average number/sec:', iterations / elapsed)


#### TEST_CONDITION

def conditionspeed_func(c, N):
    c.acquire()
    c.notify()

    for i in range(N):
        c.wait()
        c.notify()

    c.release()

def test_conditionspeed(Process, c):
    elapsed = 0
    iterations = 1

    while elapsed < delta:
        iterations *= 2

        c.acquire()
        p = Process(target=conditionspeed_func, args=(c, iterations))
        p.start()

        c.wait()

        t = _timer()

        for i in range(iterations):
            c.notify()
            c.wait()

        elapsed = _timer() - t

        c.release()
        p.join()

    print(iterations * 2, 'waits in', elapsed, 'seconds')
    print('average number/sec:', iterations * 2 / elapsed)

####

def test():
    manager = multiprocessing.Manager()

    gc.disable()

    print('\n\t######## testing Queue.Queue\n')
    test_queuespeed(threading.Thread, queue.Queue(),
                    threading.Condition())
    print('\n\t######## testing multiprocessing.Queue\n')
    test_queuespeed(multiprocessing.Process, multiprocessing.Queue(),
                    multiprocessing.Condition())
    print('\n\t######## testing Queue managed by server process\n')
    test_queuespeed(multiprocessing.Process, manager.Queue(),
                    manager.Condition())
    print('\n\t######## testing multiprocessing.Pipe\n')
    test_pipespeed()

    print()

    print('\n\t######## testing list\n')
    test_seqspeed(list(range(10)))
    print('\n\t######## testing list managed by server process\n')
    test_seqspeed(manager.list(list(range(10))))
    print('\n\t######## testing Array("i", ..., lock=False)\n')
    test_seqspeed(multiprocessing.Array('i', list(range(10)), lock=False))
    print('\n\t######## testing Array("i", ..., lock=True)\n')
    test_seqspeed(multiprocessing.Array('i', list(range(10)), lock=True))

    print()

    print('\n\t######## testing threading.Lock\n')
    test_lockspeed(threading.Lock())
    print('\n\t######## testing threading.RLock\n')
    test_lockspeed(threading.RLock())
    print('\n\t######## testing multiprocessing.Lock\n')
    test_lockspeed(multiprocessing.Lock())
    print('\n\t######## testing multiprocessing.RLock\n')
    test_lockspeed(multiprocessing.RLock())
    print('\n\t######## testing lock managed by server process\n')
    test_lockspeed(manager.Lock())
    print('\n\t######## testing rlock managed by server process\n')
    test_lockspeed(manager.RLock())

    print()

    print('\n\t######## testing threading.Condition\n')
    test_conditionspeed(threading.Thread, threading.Condition())
    print('\n\t######## testing multiprocessing.Condition\n')
    test_conditionspeed(multiprocessing.Process, multiprocessing.Condition())
    print('\n\t######## testing condition managed by a server process\n')
    test_conditionspeed(multiprocessing.Process, manager.Condition())

    gc.enable()

if __name__ == '__main__':
    multiprocessing.freeze_support()
    test()
The concurrent.futures module provides a high-level interface for
asynchronously executing callables.
The asynchronous execution can be performed with threads, using
ThreadPoolExecutor, or separate processes, using
ProcessPoolExecutor. Both implement the same interface, which is
defined by the abstract Executor class.
Equivalent to map(func, *iterables) except func is executed
asynchronously and several calls to func may be made concurrently. The
returned iterator raises a TimeoutError if __next__() is
called and the result isn’t available after timeout seconds from the
original call to Executor.map(). timeout can be an int or a
float. If timeout is not specified or None, there is no limit to
the wait time. If a call raises an exception, then that exception will
be raised when its value is retrieved from the iterator.
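For instance, a minimal sketch of calling map() on a thread pool (the pool
size and inputs here are arbitrary):

from concurrent.futures import ThreadPoolExecutor

# Compute 2**5, 3**5 and 4**5 concurrently; results arrive in input order.
with ThreadPoolExecutor(max_workers=3) as executor:
    for result in executor.map(pow, [2, 3, 4], [5, 5, 5]):
        print(result)        # 32, then 243, then 1024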
Signal the executor that it should free any resources that it is using
when the currently pending futures are done executing. Calls to
Executor.submit() and Executor.map() made after shutdown will
raise RuntimeError.
If wait is True then this method will not return until all the
pending futures are done executing and the resources associated with the
executor have been freed. If wait is False then this method will
return immediately and the resources associated with the executor will be
freed when all pending futures are done executing. Regardless of the
value of wait, the entire Python program will not exit until all
pending futures are done executing.
You can avoid having to call this method explicitly if you use the
with statement, which will shut down the Executor
(waiting as if Executor.shutdown() were called with wait set to
True):
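A minimal sketch (the file names are hypothetical):

import shutil
from concurrent.futures import ThreadPoolExecutor

# Exiting the with block shuts the pool down after the copies finish.
with ThreadPoolExecutor(max_workers=4) as e:
    e.submit(shutil.copy, 'src1.txt', 'dest1.txt')
    e.submit(shutil.copy, 'src2.txt', 'dest2.txt')
    e.submit(shutil.copy, 'src3.txt', 'dest3.txt')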
Deadlocks can occur when the callable associated with a Future waits on
the results of another Future. For example:
import time
def wait_on_b():
    time.sleep(5)
    print(b.result())   # b will never complete because it is waiting on a.
    return 5

def wait_on_a():
    time.sleep(5)
    print(a.result())   # a will never complete because it is waiting on b.
    return 6


executor = ThreadPoolExecutor(max_workers=2)
a = executor.submit(wait_on_b)
b = executor.submit(wait_on_a)
And:
def wait_on_future():
    f = executor.submit(pow, 5, 2)
    # This will never complete because there is only one worker thread and
    # it is executing this function.
    print(f.result())

executor = ThreadPoolExecutor(max_workers=1)
executor.submit(wait_on_future)
class concurrent.futures.ThreadPoolExecutor(max_workers)
An Executor subclass that uses a pool of at most max_workers
threads to execute calls asynchronously.
import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

def load_url(url, timeout):
    return urllib.request.urlopen(url, timeout=timeout).read()

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    future_to_url = dict((executor.submit(load_url, url, 60), url)
                         for url in URLS)

    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        if future.exception() is not None:
            print('%r generated an exception: %s' % (url,
                                                     future.exception()))
        else:
            print('%r page is %d bytes' % (url, len(future.result())))
class concurrent.futures.ProcessPoolExecutor(max_workers=None)
An Executor subclass that executes calls asynchronously using a pool
of at most max_workers processes. If max_workers is None or not
given, it will default to the number of processors on the machine.
import concurrent.futures
import math

PRIMES = [
    112272535095293,
    112582705942171,
    112272535095293,
    115280095190773,
    115797848077099,
    1099726899285419]

def is_prime(n):
    if n % 2 == 0:
        return False

    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True

def main():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
            print('%d is prime: %s' % (number, prime))

if __name__ == '__main__':
    main()
Encapsulates the asynchronous execution of a callable. Future
instances are created by Executor.submit() and should not be created
directly except for testing.
Attempt to cancel the call. If the call is currently being executed and
cannot be cancelled then the method will return False, otherwise the
call will be cancelled and the method will return True.
Return the value returned by the call. If the call hasn’t yet completed
then this method will wait up to timeout seconds. If the call hasn’t
completed in timeout seconds, then a TimeoutError will be
raised. timeout can be an int or float. If timeout is not specified
or None, there is no limit to the wait time.
If the future is cancelled before completing then CancelledError
will be raised.
If the call raised, this method will raise the same exception.
Return the exception raised by the call. If the call hasn’t yet
completed then this method will wait up to timeout seconds. If the
call hasn’t completed in timeout seconds, then a TimeoutError
will be raised. timeout can be an int or float. If timeout is not
specified or None, there is no limit to the wait time.
If the future is cancelled before completing then CancelledError
will be raised.
If the call completed without raising, None is returned.
Attaches the callable fn to the future. fn will be called, with the
future as its only argument, when the future is cancelled or finishes
running.
Added callables are called in the order that they were added and are
always called in a thread belonging to the process that added them. If
the callable raises an Exception subclass, it will be logged and
ignored. If the callable raises a BaseException subclass, the
behavior is undefined.
If the future has already completed or been cancelled, fn will be
called immediately.
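A minimal sketch of attaching a callback, assuming a small thread pool:

from concurrent.futures import ThreadPoolExecutor

def report(future):
    # Called with the finished future as its only argument.
    print('2**10 =', future.result())

with ThreadPoolExecutor(max_workers=1) as executor:
    f = executor.submit(pow, 2, 10)
    f.add_done_callback(report)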
The following Future methods are meant for use in unit tests and
Executor implementations.
This method should only be called by Executor implementations
before executing the work associated with the Future and by unit
tests.
If the method returns False then the Future was cancelled,
i.e. Future.cancel() was called and returned True. Any threads
waiting on the Future completing (i.e. through
as_completed() or wait()) will be woken up.
If the method returns True then the Future was not cancelled
and has been put in the running state, i.e. calls to
Future.running() will return True.
Wait for the Future instances (possibly created by different
Executor instances) given by fs to complete. Returns a named
2-tuple of sets. The first set, named done, contains the futures that
completed (finished or were cancelled) before the wait completed. The second
set, named not_done, contains uncompleted futures.
timeout can be used to control the maximum number of seconds to wait before
returning. timeout can be an int or float. If timeout is not specified
or None, there is no limit to the wait time.
return_when indicates when this function should return. It must be one of
the following constants:
FIRST_COMPLETED
    The function will return when any future finishes or is cancelled.
FIRST_EXCEPTION
    The function will return when any future finishes by raising an
    exception. If no future raises an exception then it is equivalent
    to ALL_COMPLETED.
ALL_COMPLETED
    The function will return when all futures finish or are cancelled.
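A rough sketch of wait() with these constants (the sleep durations are
arbitrary; the last future is still running when the timeout expires):

import time
import concurrent.futures

def slow(seconds):
    time.sleep(seconds)
    return seconds

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    fs = [executor.submit(slow, s) for s in (0.1, 0.2, 5)]
    done, not_done = concurrent.futures.wait(
        fs, timeout=1, return_when=concurrent.futures.ALL_COMPLETED)
    print(len(done), 'done;', len(not_done), 'not done')   # 2 done; 1 not done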
Returns an iterator over the Future instances (possibly created by
different Executor instances) given by fs that yields futures as
they complete (finished or were cancelled). Any futures that completed
before as_completed() is called will be yielded first. The returned
iterator raises a TimeoutError if __next__() is called and the
result isn’t available after timeout seconds from the original call to
as_completed(). timeout can be an int or float. If timeout is not
specified or None, there is no limit to the wait time.
See also
PEP 3148 – futures - execute computations asynchronously
The proposal which described this feature for inclusion in the Python
standard library.
Memory-mapped file objects behave like both bytearray objects and
file objects. You can use mmap objects in most places
where bytearray objects are expected; for example, you can use the re
module to search through a memory-mapped file. You can also change a single
byte by doing obj[index] = 97, or change a subsequence by assigning to a
slice: obj[i1:i2] = b'...'. You can also read and write data starting at
the current file position, and seek() through the file to different positions.
A memory-mapped file is created by the mmap constructor, which is
different on Unix and on Windows. In either case you must provide a file
descriptor for a file opened for update. If you wish to map an existing Python
file object, use its fileno() method to obtain the correct value for the
fileno parameter. Otherwise, you can open the file using the
os.open() function, which returns a file descriptor directly (the file
still needs to be closed when done).
Note
If you want to create a memory-mapping for a writable, buffered file, you
should flush() the file first. This is necessary to ensure
that local modifications to the buffers are actually available to the
mapping.
For both the Unix and Windows versions of the constructor, access may be
specified as an optional keyword parameter. access accepts one of three
values: ACCESS_READ, ACCESS_WRITE, or ACCESS_COPY
to specify read-only, write-through or copy-on-write memory respectively.
access can be used on both Unix and Windows. If access is not specified,
Windows mmap returns a write-through mapping. The initial memory values for
all three access types are taken from the specified file. Assignment to an
ACCESS_READ memory map raises a TypeError exception.
Assignment to an ACCESS_WRITE memory map affects both memory and the
underlying file. Assignment to an ACCESS_COPY memory map affects
memory but does not update the underlying file.
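For instance, a short sketch of the copy-on-write behavior (the file name
is hypothetical):

import mmap

with open("sample.bin", "wb") as f:
    f.write(b"original")

with open("sample.bin", "r+b") as f:
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
    m[:8] = b"modified"      # visible through the mapping only
    m.close()

with open("sample.bin", "rb") as f:
    print(f.read())          # still b'original'; the file was not updated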
To map anonymous memory, -1 should be passed as the fileno along with the length.
class mmap.mmap(fileno, length, tagname=None, access=ACCESS_DEFAULT[, offset])
(Windows version) Maps length bytes from the file specified by the
file handle fileno, and creates a mmap object. If length is larger
than the current size of the file, the file is extended to contain length
bytes. If length is 0, the maximum length of the map is the current
size of the file, except that if the file is empty Windows raises an
exception (you cannot create an empty mapping on Windows).
tagname, if specified and not None, is a string giving a tag name for
the mapping. Windows allows you to have many different mappings against
the same file. If you specify the name of an existing tag, that tag is
opened, otherwise a new tag of this name is created. If this parameter is
omitted or None, the mapping is created without a name. Avoiding the
use of the tag parameter will assist in keeping your code portable between
Unix and Windows.
offset may be specified as a non-negative integer offset. mmap references
will be relative to the offset from the beginning of the file. offset
defaults to 0. offset must be a multiple of the ALLOCATIONGRANULARITY.
class mmap.mmap(fileno, length, flags=MAP_SHARED, prot=PROT_WRITE|PROT_READ, access=ACCESS_DEFAULT[, offset])
(Unix version) Maps length bytes from the file specified by the file
descriptor fileno, and returns a mmap object. If length is 0, the
maximum length of the map will be the current size of the file when
mmap is called.
flags specifies the nature of the mapping. MAP_PRIVATE creates a
private copy-on-write mapping, so changes to the contents of the mmap
object will be private to this process, and MAP_SHARED creates a
mapping that’s shared with all other processes mapping the same areas of
the file. The default value is MAP_SHARED.
prot, if specified, gives the desired memory protection; the two most
useful values are PROT_READ and PROT_WRITE, to specify
that the pages may be read or written. prot defaults to
PROT_READ|PROT_WRITE.
access may be specified in lieu of flags and prot as an optional
keyword parameter. It is an error to specify both flags, prot and
access. See the description of access above for information on how to
use this parameter.
offset may be specified as a non-negative integer offset. mmap references
will be relative to the offset from the beginning of the file. offset
defaults to 0. offset must be a multiple of the PAGESIZE or
ALLOCATIONGRANULARITY.
To ensure validity of the created memory mapping the file specified
by the descriptor fileno is internally automatically synchronized
with physical backing store on Mac OS X and OpenVMS.
import mmap

# write a simple example file
with open("hello.txt", "wb") as f:
    f.write(b"Hello Python!\n")

with open("hello.txt", "r+b") as f:
    # memory-map the file, size 0 means whole file
    map = mmap.mmap(f.fileno(), 0)
    # read content via standard file methods
    print(map.readline())  # prints b"Hello Python!\n"
    # read content via slice notation
    print(map[:5])  # prints b"Hello"
    # update content using slice notation;
    # note that new content must have same size
    map[6:] = b" world!\n"
    # ... and read again using standard file methods
    map.seek(0)
    print(map.readline())  # prints b"Hello world!\n"
    # close the map
    map.close()
mmap can also be used as a context manager in a with
statement:
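import mmap

# anonymous 13-byte map (fileno -1); closed automatically on exit
with mmap.mmap(-1, 13) as mm:
    mm.write(b"Hello world!")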
Returns the lowest index in the object where the subsequence sub is
found, such that sub is contained in the range [start, end].
Optional arguments start and end are interpreted as in slice notation.
Returns -1 on failure.
Flushes changes made to the in-memory copy of a file back to disk. Without
use of this call there is no guarantee that changes are written back before
the object is destroyed. If offset and size are specified, only
changes to the given range of bytes will be flushed to disk; otherwise, the
whole extent of the mapping is flushed.
(Windows version) A nonzero value returned indicates success; zero
indicates failure.
(Unix version) A zero value is returned to indicate success. An
exception is raised when the call failed.
Copy the count bytes starting at offset src to the destination index
dest. If the mmap was created with ACCESS_READ, then calls to
move will raise a TypeError exception.
Return a bytes containing up to num bytes starting from the
current file position; the file position is updated to point after the
bytes that were returned.
Resizes the map and the underlying file, if any. If the mmap was created
with ACCESS_READ or ACCESS_COPY, resizing the map will
raise a TypeError exception.
Returns the highest index in the object where the subsequence sub is
found, such that sub is contained in the range [start, end].
Optional arguments start and end are interpreted as in slice notation.
Returns -1 on failure.
Set the file’s current position. whence argument is optional and
defaults to os.SEEK_SET or 0 (absolute file positioning); other
values are os.SEEK_CUR or 1 (seek relative to the current
position) and os.SEEK_END or 2 (seek relative to the file’s end).
Write the bytes in bytes into memory at the current position of the
file pointer; the file position is updated to point after the bytes that
were written. If the mmap was created with ACCESS_READ, then
writing to it will raise a TypeError exception.
Write the integer byte into memory at the current
position of the file pointer; the file position is advanced by 1. If
the mmap was created with ACCESS_READ, then writing to it will
raise a TypeError exception.
The readline module defines a number of functions to facilitate
completion and reading/writing of history files from the Python interpreter.
This module can be used directly or via the rlcompleter module. Settings
made using this module affect the behaviour of both the interpreter’s
interactive prompt and the prompts offered by the built-in input()
function.
Note
On MacOS X the readline module can be implemented using
the libedit library instead of GNU readline.
The configuration file for libedit is different from that
of GNU readline. If you programmatically load configuration strings
you can check for the text “libedit” in readline.__doc__
to differentiate between GNU readline and libedit.
The readline module defines the following functions:
Set the number of lines to save in the history file. write_history_file()
uses this value to truncate the history file when saving. Negative values imply
unlimited history file size.
Return the number of lines currently in the history. (This is different from
get_history_length(), which returns the maximum number of lines that will
be written to a history file.)
Set or remove the startup_hook function. If function is specified, it will be
used as the new startup_hook function; if omitted or None, any hook function
already installed is removed. The startup_hook function is called with no
arguments just before readline prints the first prompt.
Set or remove the pre_input_hook function. If function is specified, it will
be used as the new pre_input_hook function; if omitted or None, any hook
function already installed is removed. The pre_input_hook function is called
with no arguments after the first prompt has been printed and just before
readline starts reading input characters.
Set or remove the completer function. If function is specified, it will be
used as the new completer function; if omitted or None, any completer
function already installed is removed. The completer function is called as
function(text, state), for state in 0, 1, 2, ..., until it
returns a non-string value. It should return the next possible completion
starting with text.
Set or remove the completion display function. If function is
specified, it will be used as the new completion display function;
if omitted or None, any completion display function already
installed is removed. The completion display function is called as
function(substitution, [matches], longest_match_length) once
each time matches need to be displayed.
The following example demonstrates how to use the readline module’s
history reading and writing functions to automatically load and save a history
file named .pyhist from the user’s home directory. The code below would
normally be executed automatically during interactive sessions from the user’s
PYTHONSTARTUP file.
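A minimal version of such a startup file (IOError is raised on the first
run, when no history file exists yet):

import atexit
import os
import readline

histfile = os.path.join(os.path.expanduser("~"), ".pyhist")
try:
    readline.read_history_file(histfile)
except IOError:
    pass                     # no history file yet
atexit.register(readline.write_history_file, histfile)
del os, histfile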
The rlcompleter module defines a completion function suitable for the
readline module by completing valid Python identifiers and keywords.
When this module is imported on a Unix platform with the readline module
available, an instance of the Completer class is automatically created
and its complete() method is set as the readline completer.
The rlcompleter module is designed for use with Python’s interactive
mode. A user can add the following lines to his or her initialization file
(identified by the PYTHONSTARTUP environment variable) to get
automatic Tab completion:
try:
    import readline
except ImportError:
    print("Module readline not available.")
else:
    import rlcompleter
    readline.parse_and_bind("tab: complete")
On platforms without readline, the Completer class defined by
this module can still be used for custom purposes.
If called for text that doesn’t include a period character ('.'), it will
complete from names currently defined in __main__, builtins and
keywords (as defined by the keyword module).
If called for a dotted name, it will try to evaluate anything without obvious
side-effects (functions will not be evaluated, but it can generate calls to
__getattr__()) up to the last part, and find matches for the rest via the
dir() function. Any exception raised during the evaluation of the
expression is caught, silenced and None is returned.
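A rough sketch of such direct use:

import rlcompleter

completer = rlcompleter.Completer()
# Cycle through candidate completions for the prefix 'pri'.
state = 0
while True:
    match = completer.complete('pri', state)
    if match is None:
        break
    print(match)             # e.g. 'print('
    state += 1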
This module provides low-level primitives for working with multiple threads
(also called light-weight processes or tasks) — multiple threads of
control sharing their global data space. For synchronization, simple locks
(also called mutexes or binary semaphores) are provided.
The threading module provides an easier to use and higher-level
threading API built on top of this module.
The module is optional. It is supported on Windows, Linux, SGI IRIX, Solaris
2.x, as well as on systems that have a POSIX thread (a.k.a. “pthread”)
implementation. For systems lacking the _thread module, the
_dummy_thread module is available. It duplicates this module’s interface
and can be used as a drop-in replacement.
Start a new thread and return its identifier. The thread executes the function
function with the argument list args (which must be a tuple). The optional
kwargs argument specifies a dictionary of keyword arguments. When the function
returns, the thread silently exits. When the function terminates with an
unhandled exception, a stack trace is printed and then the thread exits (but
other threads continue to run).
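A minimal sketch:

import _thread
import time

def heartbeat(interval, count):
    for i in range(count):
        print('beat', i, 'from thread', _thread.get_ident())
        time.sleep(interval)

_thread.start_new_thread(heartbeat, (0.1, 3))
time.sleep(1)                # crude: let the worker finish before main exits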
Return the ‘thread identifier’ of the current thread. This is a nonzero
integer. Its value has no direct meaning; it is intended as a magic cookie to
be used e.g. to index a dictionary of thread-specific data. Thread identifiers
may be recycled when a thread exits and another thread is created.
Return the thread stack size used when creating new threads. The optional
size argument specifies the stack size to be used for subsequently created
threads, and must be 0 (use platform or configured default) or a positive
integer value of at least 32,768 (32kB). If changing the thread stack size is
unsupported, a ThreadError is raised. If the specified stack size is
invalid, a ValueError is raised and the stack size is unmodified. 32kB
is currently the minimum supported stack size value to guarantee sufficient
stack space for the interpreter itself. Note that some platforms may have
particular restrictions on values for the stack size, such as requiring a
minimum stack size > 32kB or requiring allocation in multiples of the system
memory page size - platform documentation should be referred to for more
information (4kB pages are common; using multiples of 4096 for the stack size is
the suggested approach in the absence of more specific information).
Availability: Windows, systems with POSIX threads.
Without any optional argument, this method acquires the lock unconditionally, if
necessary waiting until it is released by another thread (only one thread at a
time can acquire a lock — that’s their reason for existence).
If the integer waitflag argument is present, the action depends on its
value: if it is zero, the lock is only acquired if it can be acquired
immediately without waiting, while if it is nonzero, the lock is acquired
unconditionally as above.
If the floating-point timeout argument is present and positive, it
specifies the maximum wait time in seconds before returning. A negative
timeout argument specifies an unbounded wait. You cannot specify
a timeout if waitflag is zero.
The return value is True if the lock is acquired successfully,
False if not.
Changed in version 3.2: The timeout parameter is new.
Changed in version 3.2: Lock acquires can now be interrupted by signals on POSIX.
Return the status of the lock: True if it has been acquired by some thread,
False if not.
In addition to these methods, lock objects can also be used via the
with statement, e.g.:
import _thread

a_lock = _thread.allocate_lock()

with a_lock:
    print("a_lock is locked while this executes")
Caveats:
Threads interact strangely with interrupts: the KeyboardInterrupt
exception will be received by an arbitrary thread. (When the signal
module is available, interrupts always go to the main thread.)
Not all built-in functions that may block waiting for I/O allow other threads
to run. (The most popular ones (time.sleep(), file.read(),
select.select()) work as expected.)
It is not possible to interrupt the acquire() method on a lock — the
KeyboardInterrupt exception will happen after the lock has been acquired.
When the main thread exits, it is system defined whether the other threads
survive. On most systems, they are killed without executing
try ... finally clauses or executing object
destructors.
When the main thread exits, it does not do any of its usual cleanup (except
that try ... finally clauses are honored), and the
standard I/O files are not flushed.
Be careful to not use this module where deadlock might occur from a thread being
created that blocks waiting for another thread to be created. This often occurs
with blocking I/O.
The modules described in this chapter provide mechanisms for different processes
to communicate.
Some modules only work for two processes that are on the same machine, e.g.
signal and subprocess. Other modules support networking protocols
that two or more processes can use to communicate across machines.
The subprocess module allows you to spawn new processes, connect to their
input/output/error pipes, and obtain their return codes. This module intends to
replace several other, older modules and functions, such as:
os.system
os.spawn*
Information about how the subprocess module can be used to replace these
modules and functions can be found in the following sections.
args should be a string, or a sequence of program arguments. The program
to execute is normally the first item in the args sequence or the string if
a string is given, but can be explicitly set by using the executable
argument. When executable is given, the first item in the args sequence
is still treated by most programs as the command name, which can then be
different from the actual executable name. On Unix, it becomes the display
name for the executing program in utilities such as ps.
On Unix, with shell=False (default): In this case, the Popen class uses
os.execvp()-like behavior to execute the child program.
args should normally be a
sequence. If a string is specified for args, it will be used as the name
or path of the program to execute; this will only work if the program is
being given no arguments.
Note
shlex.split() can be useful when determining the correct
tokenization for args, especially in complex cases:
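>>> import shlex, subprocess
>>> command_line = input()
/bin/vikings -input eggs.txt -output "spam spam.txt" -cmd "echo '$MONEY'"
>>> args = shlex.split(command_line)
>>> print(args)
['/bin/vikings', '-input', 'eggs.txt', '-output', 'spam spam.txt', '-cmd', "echo '$MONEY'"]
>>> p = subprocess.Popen(args) # Success!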
Note in particular that options (such as -input) and arguments (such
as eggs.txt) that are separated by whitespace in the shell go in separate
list elements, while arguments that need quoting or backslash escaping when
used in the shell (such as filenames containing spaces or the echo command
shown above) are single list elements.
On Unix, with shell=True: If args is a string, it specifies the command
string to execute through the shell. This means that the string must be
formatted exactly as it would be when typed at the shell prompt. This
includes, for example, quoting or backslash escaping filenames with spaces in
them. If args is a sequence, the first item specifies the command string, and
any additional items will be treated as additional arguments to the shell
itself. That is to say, Popen does the equivalent of:
Popen(['/bin/sh', '-c', args[0], args[1], ...])
Warning
Executing shell commands that incorporate unsanitized input from an
untrusted source makes a program vulnerable to shell injection,
a serious security flaw which can result in arbitrary command execution.
For this reason, the use of shell=True is strongly discouraged in cases
where the command string is constructed from external input:
>>> from subprocess import call
>>> filename = input("What file would you like to display?\n")
What file would you like to display?
non_existent; rm -rf / #
>>> call("cat " + filename, shell=True) # Uh-oh. This will end badly...
shell=False does not suffer from this vulnerability; the above Note may be
helpful in getting code using shell=False to work.
On Windows: the Popen class uses CreateProcess() to execute the
child program, which operates on strings. If args is a sequence, it will
be converted to a string in a manner described in
Converting an argument sequence to a string on Windows.
bufsize, if given, has the same meaning as the corresponding argument to the
built-in open() function: 0 means unbuffered, 1 means line
buffered, any other positive value means use a buffer of (approximately) that
size. A negative bufsize means to use the system default, which usually means
fully buffered. The default value for bufsize is 0 (unbuffered).
Note
If you experience performance issues, it is recommended that you try to
enable buffering by setting bufsize to either -1 or a large enough
positive value (such as 4096).
The executable argument specifies the program to execute. It is very seldom
needed: Usually, the program to execute is defined by the args argument. If
shell=True, the executable argument specifies which shell to use. On Unix,
the default shell is /bin/sh. On Windows, the default shell is
specified by the COMSPEC environment variable. The only reason you
would need to specify shell=True on Windows is where the command you
wish to execute is actually built into the shell, e.g. dir or copy.
You don’t need shell=True to run a batch file, nor to run a console-based
executable.
stdin, stdout and stderr specify the executed program's standard input,
standard output and standard error file handles, respectively. Valid values
are PIPE, an existing file descriptor (a positive integer), an
existing file object, and None. PIPE indicates that a
new pipe to the child should be created. With None, no redirection will
occur; the child’s file handles will be inherited from the parent. Additionally,
stderr can be STDOUT, which indicates that the stderr data from the
applications should be captured into the same file handle as for stdout.
If preexec_fn is set to a callable object, this object will be called in the
child process just before the child is executed.
(Unix only)
Warning
The preexec_fn parameter is not safe to use in the presence of threads
in your application. The child process could deadlock before exec is
called.
If you must use it, keep it trivial! Minimize the number of libraries
you call into.
Note
If you need to modify the environment for the child use the env
parameter rather than doing it in a preexec_fn.
The start_new_session parameter can take the place of a previously
common use of preexec_fn to call os.setsid() in the child.
If close_fds is true, all file descriptors except 0, 1 and
2 will be closed before the child process is executed. (Unix only).
The default varies by platform: Always true on Unix. On Windows it is
true when stdin/stdout/stderr are None, false otherwise.
On Windows, if close_fds is true then no handles will be inherited by the
child process. Note that on Windows, you cannot set close_fds to true and
also redirect the standard handles by setting stdin, stdout or stderr.
Changed in version 3.2: The default for close_fds was changed from False to
what is described above.
pass_fds is an optional sequence of file descriptors to keep open
between the parent and child. Providing any pass_fds forces
close_fds to be True. (Unix only)
New in version 3.2: The pass_fds parameter was added.
If cwd is not None, the child’s current directory will be changed to cwd
before it is executed. Note that this directory is not considered when
searching the executable, so you can’t specify the program’s path relative to
cwd.
If restore_signals is True (the default) all signals that Python has set to
SIG_IGN are restored to SIG_DFL in the child process before the exec.
Currently this includes the SIGPIPE, SIGXFZ and SIGXFSZ signals.
(Unix only)
Changed in version 3.2: restore_signals was added.
If start_new_session is True the setsid() system call will be made in the
child process prior to the execution of the subprocess. (Unix only)
Changed in version 3.2: start_new_session was added.
If env is not None, it must be a mapping that defines the environment
variables for the new process; these are used instead of the default
behavior of inheriting the current process’ environment.
Note
If specified, env must provide any variables required for the program to
execute. On Windows, in order to run a side-by-side assembly the
specified env must include a valid SystemRoot.
If universal_newlines is True, the file objects stdout and stderr are
opened as text files, but lines may be terminated by any of '\n', the Unix
end-of-line convention, '\r', the old Macintosh convention or '\r\n', the
Windows convention. All of these external representations are seen as '\n'
by the Python program.
Note
This feature is only available if Python is built with universal newline
support (the default). Also, the newlines attribute of the file objects
stdout, stdin and stderr is not updated by the
communicate() method.
Run command with arguments. Wait for command to complete, then return the
returncode attribute.
The arguments are the same as for the Popen constructor. Example:
>>> retcode = subprocess.call(["ls", "-l"])
Warning
Like Popen.wait(), this will deadlock when using
stdout=PIPE and/or stderr=PIPE and the child process
generates enough output to a pipe such that it blocks waiting
for the OS pipe buffer to accept more data.
Run command with arguments. Wait for command to complete. If the exit code was
zero then return, otherwise raise CalledProcessError. The
CalledProcessError object will have the return code in the
returncode attribute.
The arguments are the same as for the Popen constructor. Example:
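>>> subprocess.check_call(["ls", "-l"])
0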
Run command with arguments and return its output as a byte string.
If the exit code was non-zero it raises a CalledProcessError. The
CalledProcessError object will have the return code in the
returncode
attribute and output in the output attribute.
The arguments are the same as for the Popen constructor. Example:
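>>> subprocess.check_output(["echo", "Hello World!"])
b'Hello World!\n'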
The stdout argument is not allowed as it is used internally.
To capture standard error in the result, use stderr=subprocess.STDOUT:
>>> subprocess.check_output(
...     ["/bin/sh", "-c", "ls non_existent_file; exit 0"],
...     stderr=subprocess.STDOUT)
b'ls: non_existent_file: No such file or directory\n'
Return (status, output) of executing cmd in a shell.
Execute the string cmd in a shell with os.popen() and return a 2-tuple
(status, output). cmd is actually run as { cmd ; } 2>&1, so that the
returned output will contain output or error messages. A trailing newline is
stripped from the output. The exit status for the command can be interpreted
according to the rules for the C function wait(). Example:
>>> subprocess.getstatusoutput('ls /bin/ls')
(0, '/bin/ls')
>>> subprocess.getstatusoutput('cat /bin/junk')
(256, 'cat: /bin/junk: No such file or directory')
>>> subprocess.getstatusoutput('/bin/junk')
(256, 'sh: /bin/junk: not found')
Exceptions raised in the child process, before the new program has started to
execute, will be re-raised in the parent. Additionally, the exception object
will have one extra attribute called child_traceback, which is a string
containing traceback information from the child’s point of view.
The most common exception raised is OSError. This occurs, for example,
when trying to execute a non-existent file. Applications should prepare for
OSError exceptions.
A ValueError will be raised if Popen is called with invalid
arguments.
check_call() will raise CalledProcessError, if the called process returns
a non-zero return code.
Unlike some other popen functions, this implementation will never call /bin/sh
implicitly. This means that all characters, including shell metacharacters, can
safely be passed to child processes.
Wait for child process to terminate. Set and return returncode
attribute.
Warning
This will deadlock when using stdout=PIPE and/or
stderr=PIPE and the child process generates enough output to
a pipe such that it blocks waiting for the OS pipe buffer to
accept more data. Use communicate() to avoid that.
Interact with process: Send data to stdin. Read data from stdout and stderr,
until end-of-file is reached. Wait for process to terminate. The optional
input argument should be a byte string to be sent to the child process, or
None, if no data should be sent to the child.
communicate() returns a tuple (stdoutdata, stderrdata).
Note that if you want to send data to the process’s stdin, you need to create
the Popen object with stdin=PIPE. Similarly, to get anything other than
None in the result tuple, you need to give stdout=PIPE and/or
stderr=PIPE too.
Note
The data read is buffered in memory, so do not use this method if the data
size is large or unlimited.
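A short sketch of typical pipe use, with the Unix cat utility as the child:

from subprocess import Popen, PIPE

p = Popen(['cat'], stdin=PIPE, stdout=PIPE)
out, err = p.communicate(input=b'hello, child\n')
print(out)                   # b'hello, child\n'; err is None (stderr not piped)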
On Windows, SIGTERM is an alias for terminate(). CTRL_C_EVENT and
CTRL_BREAK_EVENT can be sent to processes started with a creationflags
parameter which includes CREATE_NEW_PROCESS_GROUP.
Kills the child. On Posix OSs the function sends SIGKILL to the child.
On Windows kill() is an alias for terminate().
The following attributes are also available:
Warning
Use communicate() rather than .stdin.write,
.stdout.read or .stderr.read to avoid
deadlocks due to any of the other OS pipe buffers filling up and blocking the
child process.
If dwFlags specifies STARTF_USESTDHANDLES, this attribute
is the standard input handle for the process. If
STARTF_USESTDHANDLES is not specified, the default for standard
input is the keyboard buffer.
If dwFlags specifies STARTF_USESTDHANDLES, this attribute
is the standard output handle for the process. Otherwise, this attribute
is ignored and the default for standard output is the console window’s
buffer.
If dwFlags specifies STARTF_USESTDHANDLES, this attribute
is the standard error handle for the process. Otherwise, this attribute is
ignored and the default for standard error is the console window’s buffer.
If dwFlags specifies STARTF_USESHOWWINDOW, this attribute
can be any of the values that can be specified in the nCmdShow
parameter for the
ShowWindow
function, except for SW_SHOWDEFAULT. Otherwise, this attribute is
ignored.
SW_HIDE is provided for this attribute. It is used when
Popen is called with shell=True.
pipe = os.popen(cmd, 'w')
...
rc = pipe.close()
if rc is not None and rc >> 8:
print("There were some errors")
==>
process = Popen(cmd, stdin=PIPE)
...
process.stdin.close()
if process.wait() != 0:
print("There were some errors")
the capturestderr argument is replaced with the stderr argument.
stdin=PIPE and stdout=PIPE must be specified.
popen2 closes all file descriptors by default, but you have to specify
close_fds=True with Popen to guarantee this behavior on
all platforms or past Python versions.
Converting an argument sequence to a string on Windows
On Windows, an args sequence is converted to a string that can be parsed
using the following rules (which correspond to the rules used by the MS C
runtime):
1. Arguments are delimited by white space, which is either a
   space or a tab.
2. A string surrounded by double quotation marks is
   interpreted as a single argument, regardless of white space
   contained within. A quoted string can be embedded in an
   argument.
3. A double quotation mark preceded by a backslash is
   interpreted as a literal double quotation mark.
4. Backslashes are interpreted literally, unless they
   immediately precede a double quotation mark.
5. If backslashes immediately precede a double quotation mark,
   every pair of backslashes is interpreted as a literal
   backslash. If the number of backslashes is odd, the last
   backslash escapes the next double quotation mark as
   described in rule 3.
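These rules can be observed through subprocess.list2cmdline(), the
undocumented helper that performs this conversion (shown here purely for
illustration):

from subprocess import list2cmdline

# Whitespace forces quoting; an embedded quote gets a backslash;
# backslashes are literal unless they precede a quote.
print(list2cmdline(['ab', 'a b', 'a"b', 'a\\b']))
# ab "a b" a\"b a\b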
This module provides access to the BSD socket interface. It is available on
all modern Unix systems, Windows, MacOS, OS/2, and probably additional
platforms.
Note
Some behavior may be platform dependent, since calls are made to the operating
system socket APIs.
The Python interface is a straightforward transliteration of the Unix system
call and library interface for sockets to Python’s object-oriented style: the
socket() function returns a socket object whose methods implement
the various socket system calls. Parameter types are somewhat higher-level than
in the C interface: as with read() and write() operations on Python
files, buffer allocation on receive operations is automatic, and buffer length
is implicit on send operations.
Depending on the system and the build options, various socket families
are supported by this module.
Socket addresses are represented as follows:
A single string is used for the AF_UNIX address family.
A pair (host, port) is used for the AF_INET address family,
where host is a string representing either a hostname in Internet domain
notation like 'daring.cwi.nl' or an IPv4 address like '100.50.200.5',
and port is an integral port number.
For the AF_INET6 address family, a four-tuple (host, port, flowinfo,
scopeid) is used, where flowinfo and scopeid represent the sin6_flowinfo
and sin6_scope_id members of struct sockaddr_in6 in C. For
socket module methods, flowinfo and scopeid can be omitted just for
backward compatibility. Note, however, omission of scopeid can cause problems
in manipulating scoped IPv6 addresses.
AF_NETLINK sockets are represented as pairs (pid, groups).
Linux-only support for TIPC is available using the AF_TIPC
address family. TIPC is an open, non-IP based networked protocol designed
for use in clustered computer environments. Addresses are represented by a
tuple, and the fields depend on the address type. The general tuple form is
(addr_type, v1, v2, v3[, scope]), where:
addr_type is one of TIPC_ADDR_NAMESEQ, TIPC_ADDR_NAME, or
TIPC_ADDR_ID.
scope is one of TIPC_ZONE_SCOPE, TIPC_CLUSTER_SCOPE, and
TIPC_NODE_SCOPE.
If addr_type is TIPC_ADDR_NAME, then v1 is the server type, v2 is
the port identifier, and v3 should be 0.
If addr_type is TIPC_ADDR_NAMESEQ, then v1 is the server type, v2
is the lower port number, and v3 is the upper port number.
If addr_type is TIPC_ADDR_ID, then v1 is the node, v2 is the
reference, and v3 should be set to 0.
Certain other address families (AF_BLUETOOTH, AF_PACKET)
support specific representations.
For IPv4 addresses, two special forms are accepted instead of a host address:
the empty string represents INADDR_ANY, and the string
'<broadcast>' represents INADDR_BROADCAST. This behavior is not
compatible with IPv6, therefore, you may want to avoid these if you intend
to support IPv6 with your Python programs.
If you use a hostname in the host portion of IPv4/v6 socket address, the
program may show a nondeterministic behavior, as Python uses the first address
returned from the DNS resolution. The socket address will be resolved
differently into an actual IPv4/v6 address, depending on the results from DNS
resolution and/or the host configuration. For deterministic behavior use a
numeric address in host portion.
All errors raise exceptions. The normal exceptions for invalid argument types
and out-of-memory conditions can be raised; errors related to socket or address
semantics raise socket.error or one of its subclasses.
Non-blocking mode is supported through setblocking(). A
generalization of this based on timeouts is supported through
settimeout().
A subclass of IOError, this exception is raised for socket-related
errors. It is recommended that you inspect its errno attribute to
discriminate between different kinds of errors.
See also
The errno module contains symbolic names for the error codes
defined by the underlying operating system.
A subclass of socket.error, this exception is raised for
address-related errors, i.e. for functions that use h_errno in the POSIX
C API, including gethostbyname_ex() and gethostbyaddr().
The accompanying value is a pair (h_errno, string) representing an
error returned by a library call. h_errno is a numeric value, while
string represents the description of h_errno, as returned by the
hstrerror() C function.
A subclass of socket.error, this exception is raised for
address-related errors by getaddrinfo() and getnameinfo().
The accompanying value is a pair (error, string) representing an error
returned by a library call. string represents the description of
error, as returned by the gai_strerror() C function. The
numeric error value will match one of the EAI_* constants
defined in this module.
A subclass of socket.error, this exception is raised when a timeout
occurs on a socket which has had timeouts enabled via a prior call to
settimeout() (or implicitly through
setdefaulttimeout()). The accompanying value is a string
whose value is currently always “timed out”.
These constants represent the address (and protocol) families, used for the
first argument to socket(). If the AF_UNIX constant is not
defined then this protocol is unsupported. More constants may be available
depending on the system.
These constants represent the socket types, used for the second argument to
socket(). More constants may be available depending on the system.
(Only SOCK_STREAM and SOCK_DGRAM appear to be generally
useful.)
These two constants, if defined, can be combined with the socket types and
allow you to set some flags atomically (thus avoiding possible race
conditions and the need for separate calls).
Many constants of these forms, documented in the Unix documentation on sockets
and/or the IP protocol, are also defined in the socket module. They are
generally used in arguments to the setsockopt() and getsockopt()
methods of socket objects. In most cases, only those symbols that are defined
in the Unix header files are defined; for a few symbols, default values are
provided.
SIO_*
RCVALL_*
Constants for Windows’ WSAIoctl(). The constants are used as arguments to the
ioctl() method of socket objects.
TIPC_*
TIPC related constants, matching the ones exported by the C socket API. See
the TIPC documentation for more information.
Convenience function. Connect to address (a 2-tuple (host, port)),
and return the socket object. Passing the optional timeout parameter will
set the timeout on the socket instance before attempting to connect. If no
timeout is supplied, the global default timeout setting returned by
getdefaulttimeout() is used.
If supplied, source_address must be a 2-tuple (host, port) for the
socket to bind to as its source address before connecting. If host or port
are '' or 0 respectively, the OS default behavior will be used.
Changed in version 3.2: source_address was added.
Changed in version 3.2: support for the with statement was added.
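A short sketch of typical use (the host and request are illustrative):

import socket

with socket.create_connection(('www.python.org', 80), timeout=10) as s:
    s.sendall(b'HEAD / HTTP/1.0\r\nHost: www.python.org\r\n\r\n')
    print(s.recv(128))       # first bytes of the HTTP response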
Translate the host/port argument into a sequence of 5-tuples that contain
all the necessary arguments for creating a socket connected to that service.
host is a domain name, a string representation of an IPv4/v6 address
or None. port is a string service name such as 'http', a numeric
port number or None. By passing None as the value of host
and port, you can pass NULL to the underlying C API.
The family, type and proto arguments can be optionally specified
in order to narrow the list of addresses returned. Passing zero as a
value for each of these arguments selects the full range of results.
The flags argument can be one or several of the AI_* constants,
and will influence how results are computed and returned.
For example, AI_NUMERICHOST will disable domain name resolution
and will raise an error if host is a domain name.
The function returns a list of 5-tuples with the following structure:
(family, type, proto, canonname, sockaddr)
In these tuples, family, type, proto are all integers and are
meant to be passed to the socket() function. canonname will be
a string representing the canonical name of the host if
AI_CANONNAME is part of the flags argument; else canonname
will be empty. sockaddr is a tuple describing a socket address, whose
format depends on the returned family (an (address, port) 2-tuple for
AF_INET, an (address, port, flowinfo, scopeid) 4-tuple for
AF_INET6), and is meant to be passed to the socket.connect()
method.
The following example fetches address information for a hypothetical TCP
connection to www.python.org on port 80 (results may differ on your
system if IPv6 isn’t enabled):
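>>> socket.getaddrinfo("www.python.org", 80, proto=socket.SOL_TCP)
[(2, 1, 6, '', ('82.94.164.162', 80)),
 (10, 1, 6, '', ('2001:888:2000:d::a2', 80, 0, 0))]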
Return a fully qualified domain name for name. If name is omitted or empty,
it is interpreted as the local host. To find the fully qualified name, the
hostname returned by gethostbyaddr() is checked, followed by aliases for the
host, if available. The first name which includes a period is selected. In
case no fully qualified domain name is available, the hostname as returned by
gethostname() is returned.
Translate a host name to IPv4 address format. The IPv4 address is returned as a
string, such as '100.50.200.5'. If the host name is an IPv4 address itself
it is returned unchanged. See gethostbyname_ex() for a more complete
interface. gethostbyname() does not support IPv6 name resolution, and
getaddrinfo() should be used instead for IPv4/v6 dual stack support.
Translate a host name to IPv4 address format, extended interface. Return a
triple (hostname, aliaslist, ipaddrlist) where hostname is the primary
host name responding to the given ip_address, aliaslist is a (possibly
empty) list of alternative host names for the same address, and ipaddrlist is
a list of IPv4 addresses for the same interface on the same host (often but not
always a single address). gethostbyname_ex() does not support IPv6 name
resolution, and getaddrinfo() should be used instead for IPv4/v6 dual
stack support.
Return a string containing the hostname of the machine where the Python
interpreter is currently executing.
If you want to know the current machine’s IP address, you may want to use
gethostbyname(gethostname()). This operation assumes that there is a
valid address-to-host mapping for the host, and the assumption does not
always hold.
Note: gethostname() doesn’t always return the fully qualified domain
name; use getfqdn() (see above).
Return a triple (hostname, aliaslist, ipaddrlist) where hostname is the
primary host name responding to the given ip_address, aliaslist is a
(possibly empty) list of alternative host names for the same address, and
ipaddrlist is a list of IPv4/v6 addresses for the same interface on the same
host (most likely containing only a single address). To find the fully qualified
domain name, use the function getfqdn(). gethostbyaddr() supports
both IPv4 and IPv6.
Translate a socket address sockaddr into a 2-tuple (host, port). Depending
on the settings of flags, the result can contain a fully-qualified domain name
or numeric address representation in host. Similarly, port can contain a
string port name or a numeric port number.
Translate an Internet protocol name (for example, 'icmp') to a constant
suitable for passing as the (optional) third argument to the socket()
function. This is usually only needed for sockets opened in “raw” mode
(SOCK_RAW); for the normal socket modes, the correct protocol is chosen
automatically if the protocol is omitted or zero.
Translate an Internet service name and protocol name to a port number for that
service. The optional protocol name, if given, should be 'tcp' or
'udp', otherwise any protocol will match.
Translate an Internet port number and protocol name to a service name for that
service. The optional protocol name, if given, should be 'tcp' or
'udp', otherwise any protocol will match.
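For example:

>>> import socket
>>> socket.getservbyname('http', 'tcp')
80
>>> socket.getservbyport(25, 'tcp')
'smtp'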
Create a new socket using the given address family, socket type and protocol
number. The address family should be AF_INET (the default),
AF_INET6 or AF_UNIX. The socket type should be
SOCK_STREAM (the default), SOCK_DGRAM or perhaps one of the
other SOCK_ constants. The protocol number is usually zero and may be
omitted in that case.
Build a pair of connected socket objects using the given address family, socket
type, and protocol number. Address family, socket type, and protocol number are
as for the socket() function above. The default family is AF_UNIX
if defined on the platform; otherwise, the default is AF_INET.
Availability: Unix.
Changed in version 3.2: The returned socket objects now support the whole
socket API, rather than a subset.
Duplicate the file descriptor fd (an integer as returned by a file object’s
fileno() method) and build a socket object from the result. Address
family, socket type and protocol number are as for the socket() function
above. The file descriptor should refer to a socket, but this is not checked —
subsequent operations on the object may fail if the file descriptor is invalid.
This function is rarely needed, but can be used to get or set socket options on
a socket passed to a program as standard input or output (such as a server
started by the Unix inet daemon). The socket is assumed to be in blocking mode.
Convert 32-bit positive integers from network to host byte order. On machines
where the host byte order is the same as network byte order, this is a no-op;
otherwise, it performs a 4-byte swap operation.
Convert 16-bit positive integers from network to host byte order. On machines
where the host byte order is the same as network byte order, this is a no-op;
otherwise, it performs a 2-byte swap operation.
Convert 32-bit positive integers from host to network byte order. On machines
where the host byte order is the same as network byte order, this is a no-op;
otherwise, it performs a 4-byte swap operation.
Convert 16-bit positive integers from host to network byte order. On machines
where the host byte order is the same as network byte order, this is a no-op;
otherwise, it performs a 2-byte swap operation.
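For illustration, a sketch of the 16-bit and 32-bit conversions; the swapped output shown assumes a little-endian host, and on a big-endian host the values would be returned unchanged:
>>> import socket
>>> hex(socket.htons(0x1234))
'0x3412'
>>> hex(socket.htonl(0x12345678))
'0x78563412'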
Convert an IPv4 address from dotted-quad string format (for example,
‘123.45.67.89’) to 32-bit packed binary format, as a bytes object four characters in
length. This is useful when conversing with a program that uses the standard C
library and needs objects of type struct in_addr, which is the C type
for the 32-bit packed binary this function returns.
inet_aton() also accepts strings with fewer than three dots; see the
Unix manual page inet(3) for details.
If the IPv4 address string passed to this function is invalid,
socket.error will be raised. Note that exactly what is valid depends on
the underlying C implementation of inet_aton().
inet_aton() does not support IPv6, and inet_pton() should be used
instead for IPv4/v6 dual stack support.
Convert a 32-bit packed IPv4 address (a bytes object four characters in
length) to its standard dotted-quad string representation (for example,
‘123.45.67.89’). This is useful when conversing with a program that uses the
standard C library and needs objects of type struct in_addr, which
is the C type for the 32-bit packed binary data this function takes as an
argument.
If the byte sequence passed to this function is not exactly 4 bytes in
length, socket.error will be raised. inet_ntoa() does not
support IPv6, and inet_ntop() should be used instead for IPv4/v6 dual
stack support.
Convert an IP address from its family-specific string format to a packed,
binary format. inet_pton() is useful when a library or network protocol
calls for an object of type struct in_addr (similar to
inet_aton()) or struct in6_addr.
Supported values for address_family are currently AF_INET and
AF_INET6. If the IP address string ip_string is invalid,
socket.error will be raised. Note that exactly what is valid depends on
both the value of address_family and the underlying implementation of
inet_pton().
Convert a packed IP address (a bytes object of some number of characters) to its
standard, family-specific string representation (for example, '7.10.0.5' or
'5aef:2b::8'). inet_ntop() is useful when a library or network protocol
returns an object of type struct in_addr (similar to inet_ntoa())
or struct in6_addr.
Supported values for address_family are currently AF_INET and
AF_INET6. If the string packed_ip is not the correct length for the
specified address family, ValueError will be raised. A
socket.error is raised for errors from the call to inet_ntop().
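A minimal round-trip sketch, assuming a platform where inet_pton() and inet_ntop() are available (they are not present on all systems):
>>> import socket
>>> packed = socket.inet_pton(socket.AF_INET6, '5aef:2b::8')
>>> socket.inet_ntop(socket.AF_INET6, packed)
'5aef:2b::8'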
Return the default timeout in seconds (float) for new socket objects. A value
of None indicates that new socket objects have no timeout. When the socket
module is first imported, the default is None.
Set the default timeout in seconds (float) for new socket objects. When
the socket module is first imported, the default is None. See
settimeout() for possible values and their respective
meanings.
Accept a connection. The socket must be bound to an address and listening for
connections. The return value is a pair (conn, address) where conn is a
new socket object usable to send and receive data on the connection, and
address is the address bound to the socket on the other end of the connection.
Close the socket. All future operations on the socket object will fail. The
remote end will receive no more data (after queued data is flushed). Sockets are
automatically closed when they are garbage-collected.
Note
close() releases the resource associated with a connection but
does not necessarily close the connection immediately. If you want
to close the connection in a timely fashion, call shutdown()
before close().
Like connect(address), but return an error indicator instead of raising an
exception for errors returned by the C-level connect() call (other
problems, such as “host not found,” can still raise exceptions). The error
indicator is 0 if the operation succeeded, otherwise the value of the
errno variable. This is useful to support, for example, asynchronous
connects.
Put the socket object into closed state without actually closing the
underlying file descriptor. The file descriptor is returned, and can
be reused for other purposes.
Return the socket’s file descriptor (a small integer). This is useful with
select.select().
Under Windows the small integer returned by this method cannot be used where a
file descriptor can be used (such as os.fdopen()). Unix does not have
this limitation.
Return the remote address to which the socket is connected. This is useful to
find out the port number of a remote IPv4/v6 socket, for instance. (The format
of the address returned depends on the address family — see above.) On some
systems this function is not supported.
Return the socket’s own address. This is useful to find out the port number of
an IPv4/v6 socket, for instance. (The format of the address returned depends on
the address family — see above.)
Return the value of the given socket option (see the Unix man page
getsockopt(2)). The needed symbolic constants (SO_* etc.)
are defined in this module. If buflen is absent, an integer option is assumed
and its integer value is returned by the function. If buflen is present, it
specifies the maximum length of the buffer used to receive the option in, and
this buffer is returned as a bytes object. It is up to the caller to decode the
contents of the buffer (see the optional built-in module struct for a way
to decode C structures encoded as byte strings).
Return the timeout in seconds (float) associated with socket operations,
or None if no timeout is set. This reflects the last call to
setblocking() or settimeout().
Listen for connections made to the socket. The backlog argument specifies the
maximum number of queued connections and should be at least 0; the maximum value
is system-dependent (usually 5), the minimum value is forced to 0.
Return a file object associated with the socket. The exact returned
type depends on the arguments given to makefile(). These arguments are
interpreted the same way as by the built-in open() function.
Closing the file object won’t close the socket unless there are no remaining
references to the socket. The socket must be in blocking mode; it can have
a timeout, but the file object’s internal buffer may end up in an inconsistent
state if a timeout occurs.
Note
On Windows, the file-like object created by makefile() cannot be
used where a file object with a file descriptor is expected, such as the
stream arguments of subprocess.Popen().
Receive data from the socket. The return value is a bytes object representing the
data received. The maximum amount of data to be received at once is specified
by bufsize. See the Unix manual page recv(2) for the meaning of
the optional argument flags; it defaults to zero.
Note
For best match with hardware and network realities, the value of bufsize
should be a relatively small power of 2, for example, 4096.
Receive data from the socket. The return value is a pair (bytes, address)
where bytes is a bytes object representing the data received and address is the
address of the socket sending the data. See the Unix manual page
recv(2) for the meaning of the optional argument flags; it defaults
to zero. (The format of address depends on the address family — see above.)
Receive data from the socket, writing it into buffer instead of creating a
new bytestring. The return value is a pair (nbytes, address) where nbytes is
the number of bytes received and address is the address of the socket sending
the data. See the Unix manual page recv(2) for the meaning of the
optional argument flags; it defaults to zero. (The format of address
depends on the address family — see above.)
Receive up to nbytes bytes from the socket, storing the data into a buffer
rather than creating a new bytestring. If nbytes is not specified (or 0),
receive up to the size available in the given buffer. Returns the number of
bytes received. See the Unix manual page recv(2) for the meaning
of the optional argument flags; it defaults to zero.
Send data to the socket. The socket must be connected to a remote socket. The
optional flags argument has the same meaning as for recv() above.
Returns the number of bytes sent. Applications are responsible for checking that
all data has been sent; if only some of the data was transmitted, the
application needs to attempt delivery of the remaining data.
Send data to the socket. The socket must be connected to a remote socket. The
optional flags argument has the same meaning as for recv() above.
Unlike send(), this method continues to send data from bytes until
either all data has been sent or an error occurs. None is returned on
success. On error, an exception is raised, and there is no way to determine how
much data, if any, was successfully sent.
Send data to the socket. The socket should not be connected to a remote socket,
since the destination socket is specified by address. The optional flags
argument has the same meaning as for recv() above. Return the number of
bytes sent. (The format of address depends on the address family — see
above.)
Set a timeout on blocking socket operations. The value argument can be a
nonnegative floating point number expressing seconds, or None.
If a non-zero value is given, subsequent socket operations will raise a
timeout exception if the timeout period value has elapsed before
the operation has completed. If zero is given, the socket is put in
non-blocking mode. If None is given, the socket is put in blocking mode.
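A short sketch of the three resulting modes (5.0 is an arbitrary example value):
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(None)   # blocking mode (the default for new sockets)
s.settimeout(0.0)    # non-blocking mode
s.settimeout(5.0)    # timeout mode: blocking calls raise socket.timeout after 5 seconds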
Set the value of the given socket option (see the Unix manual page
setsockopt(2)). The needed symbolic constants are defined in the
socket module (SO_* etc.). The value can be an integer or a
bytes object representing a buffer. In the latter case it is up to the caller to
ensure that the bytestring contains the proper bits (see the optional built-in
module struct for a way to encode C structures as bytestrings).
Shut down one or both halves of the connection. If how is SHUT_RD,
further receives are disallowed. If how is SHUT_WR, further sends
are disallowed. If how is SHUT_RDWR, further sends and receives are
disallowed. Depending on the platform, shutting down one half of the connection
can also close the opposite half (e.g. on Mac OS X, shutdown(SHUT_WR) does
not allow further reads on the other end of the connection).
Note that there are no methods read() or write(); use
recv() and send() without flags argument instead.
Socket objects also have these (read-only) attributes that correspond to the
values given to the socket constructor.
A socket object can be in one of three modes: blocking, non-blocking, or
timeout. Sockets are by default always created in blocking mode, but this
can be changed by calling setdefaulttimeout().
In blocking mode, operations block until complete or the system returns
an error (such as connection timed out).
In non-blocking mode, operations fail (with an error that is unfortunately
system-dependent) if they cannot be completed immediately: functions from the
select module can be used to find out when (and whether) a socket is available
for reading or writing.
In timeout mode, operations fail if they cannot be completed within the
timeout specified for the socket (they raise a timeout exception)
or if the system returns an error.
Note
At the operating system level, sockets in timeout mode are internally set
in non-blocking mode. Also, the blocking and timeout modes are shared between
file descriptors and socket objects that refer to the same network endpoint.
This implementation detail can have visible consequences if e.g. you decide
to use the fileno() of a socket.
The connect() operation is also subject to the timeout
setting, and in general it is recommended to call settimeout()
before calling connect() or pass a timeout parameter to
create_connection(). However, the system network stack may also
return a connection timeout error of its own regardless of any Python socket
timeout setting.
If getdefaulttimeout() is not None, sockets returned by
the accept() method inherit that timeout. Otherwise, the
behaviour depends on settings of the listening socket:
if the listening socket is in blocking mode or in timeout mode,
the socket returned by accept() is in blocking mode;
if the listening socket is in non-blocking mode, whether the socket
returned by accept() is in blocking or non-blocking mode
is operating system-dependent. If you want to ensure cross-platform
behaviour, it is recommended you manually override this setting.
Here are four minimal example programs using the TCP/IP protocol: a server that
echoes all data that it receives back (servicing only one client), and a client
using it. Note that a server must perform the sequence socket(),
bind(), listen(), accept() (possibly
repeating the accept() to service more than one client), while a
client only needs the sequence socket(), connect(). Also
note that the server does not send()/recv() on the
socket it is listening on but on the new socket returned by
accept().
The first two examples support IPv4 only.
# Echo server program
import socket

HOST = ''                 # Symbolic name meaning all available interfaces
PORT = 50007              # Arbitrary non-privileged port
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((HOST, PORT))
s.listen(1)
conn, addr = s.accept()
print('Connected by', addr)
while True:
    data = conn.recv(1024)
    if not data: break
    conn.send(data)
conn.close()
# Echo client program
import socket

HOST = 'daring.cwi.nl'    # The remote host
PORT = 50007              # The same port as used by the server
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOST, PORT))
s.send(b'Hello, world')
data = s.recv(1024)
s.close()
print('Received', repr(data))
The next two examples are identical to the above two, but support both IPv4 and
IPv6. The server side will listen to the first address family available (it
should listen to both instead). On most IPv6-ready systems, IPv6 will take
precedence and the server may not accept IPv4 traffic. The client side will try
to connect to all addresses returned as a result of the name resolution, and
send traffic to the first one it connects to successfully.
# Echo server program
import socket
import sys

HOST = None               # Symbolic name meaning all available interfaces
PORT = 50007              # Arbitrary non-privileged port
s = None
for res in socket.getaddrinfo(HOST, PORT, socket.AF_UNSPEC,
                              socket.SOCK_STREAM, 0, socket.AI_PASSIVE):
    af, socktype, proto, canonname, sa = res
    try:
        s = socket.socket(af, socktype, proto)
    except socket.error as msg:
        s = None
        continue
    try:
        s.bind(sa)
        s.listen(1)
    except socket.error as msg:
        s.close()
        s = None
        continue
    break
if s is None:
    print('could not open socket')
    sys.exit(1)
conn, addr = s.accept()
print('Connected by', addr)
while True:
    data = conn.recv(1024)
    if not data: break
    conn.send(data)
conn.close()
# Echo client program
import socket
import sys

HOST = 'daring.cwi.nl'    # The remote host
PORT = 50007              # The same port as used by the server
s = None
for res in socket.getaddrinfo(HOST, PORT, socket.AF_UNSPEC, socket.SOCK_STREAM):
    af, socktype, proto, canonname, sa = res
    try:
        s = socket.socket(af, socktype, proto)
    except socket.error as msg:
        s = None
        continue
    try:
        s.connect(sa)
    except socket.error as msg:
        s.close()
        s = None
        continue
    break
if s is None:
    print('could not open socket')
    sys.exit(1)
s.send(b'Hello, world')
data = s.recv(1024)
s.close()
print('Received', repr(data))
The last example shows how to write a very simple network sniffer with raw
sockets on Windows. The example requires administrator privileges to modify
the interface:
import socket

# the public network interface
HOST = socket.gethostbyname(socket.gethostname())

# create a raw socket and bind it to the public interface
s = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_IP)
s.bind((HOST, 0))

# Include IP headers
s.setsockopt(socket.IPPROTO_IP, socket.IP_HDRINCL, 1)

# receive all packets
s.ioctl(socket.SIO_RCVALL, socket.RCVALL_ON)

# receive a packet
print(s.recvfrom(65565))

# disable promiscuous mode
s.ioctl(socket.SIO_RCVALL, socket.RCVALL_OFF)
Running an example several times with too small a delay between executions could
lead to this error:
socket.error: [Errno 98] Address already in use
This is because the previous execution has left the socket in a TIME_WAIT
state, and it can’t be immediately reused.
To prevent this, set the socket.SO_REUSEADDR flag:
the SO_REUSEADDR flag tells the kernel to reuse a local socket in
TIME_WAIT state, without waiting for its natural timeout to expire.
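For instance, in the echo server above the flag could be set just before bind(); this sketch reuses the HOST and PORT names from that example:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind((HOST, PORT))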
See also
For an introduction to socket programming (in C), see the following papers:
An Introductory 4.3BSD Interprocess Communication Tutorial, by Stuart Sechrest
An Advanced 4.3BSD Interprocess Communication Tutorial, by Samuel J. Leffler et
al.,
both in the UNIX Programmer’s Manual, Supplementary Documents 1 (sections
PS1:7 and PS1:8). The platform-specific reference material for the various
socket-related system calls are also a valuable source of information on the
details of socket semantics. For Unix, refer to the manual pages; for Windows,
see the WinSock (or Winsock 2) specification. For IPv6-ready APIs, readers may
want to refer to RFC 3493 titled Basic Socket Interface Extensions for IPv6.
This module provides access to Transport Layer Security (often known as “Secure
Sockets Layer”) encryption and peer authentication facilities for network
sockets, both client-side and server-side. This module uses the OpenSSL
library. It is available on all modern Unix systems, Windows, Mac OS X, and
probably additional platforms, as long as OpenSSL is installed on that platform.
Note
Some behavior may be platform dependent, since calls are made to the
operating system socket APIs. The installed version of OpenSSL may also
cause variations in behavior.
This section documents the objects and functions in the ssl module; for more
general information about TLS, SSL, and certificates, the reader is referred to
the documents in the “See Also” section at the bottom.
This module provides a class, ssl.SSLSocket, which is derived from the
socket.socket type, and provides a socket-like wrapper that also
encrypts and decrypts the data going over the socket with SSL. It supports
additional methods such as getpeercert(), which retrieves the
certificate of the other side of the connection, and cipher(), which
retrieves the cipher being used for the secure connection.
For more sophisticated applications, the ssl.SSLContext class
helps manage settings and certificates, which can then be inherited
by SSL sockets created through the SSLContext.wrap_socket() method.
Raised to signal an error from the underlying SSL implementation
(currently provided by the OpenSSL library). This signifies some
problem in the higher-level encryption and authentication layer that’s
superimposed on the underlying network connection. This error
is a subtype of socket.error, which in turn is a subtype of
IOError. The error code and message of SSLError instances
are provided by the OpenSSL library.
The following function allows for standalone socket creation. Starting from
Python 3.2, it can be more flexible to use SSLContext.wrap_socket()
instead.
Takes an instance sock of socket.socket, and returns an instance
of ssl.SSLSocket, a subtype of socket.socket, which wraps
the underlying socket in an SSL context. For client-side sockets, the
context construction is lazy; if the underlying socket isn’t connected yet,
the context construction will be performed after connect() is called on
the socket. For server-side sockets, if the socket has no remote peer, it is
assumed to be a listening socket, and the server-side SSL wrapping is
automatically performed on client connections accepted via the accept()
method. wrap_socket() may raise SSLError.
The keyfile and certfile parameters specify optional files which
contain a certificate to be used to identify the local side of the
connection. See the discussion of Certificates for more
information on how the certificate is stored in the certfile.
The parameter server_side is a boolean which identifies whether
server-side or client-side behavior is desired from this socket.
The parameter cert_reqs specifies whether a certificate is required from
the other side of the connection, and whether it will be validated if
provided. It must be one of the three values CERT_NONE
(certificates ignored), CERT_OPTIONAL (not required, but validated
if provided), or CERT_REQUIRED (required and validated). If the
value of this parameter is not CERT_NONE, then the ca_certs
parameter must point to a file of CA certificates.
The ca_certs file contains a set of concatenated “certification
authority” certificates, which are used to validate certificates passed from
the other end of the connection. See the discussion of
Certificates for more information about how to arrange the
certificates in this file.
The parameter ssl_version specifies which version of the SSL protocol to
use. Typically, the server chooses a particular protocol version, and the
client must adapt to the server’s choice. Most of the versions are not
interoperable with the other versions. If not specified, for client-side
operation, the default SSL version is SSLv3; for server-side operation,
SSLv23. These version selections provide the most compatibility with other
versions.
Here’s a table showing which versions in a client (down the side) can connect
to which versions in a server (along the top):
client / server    SSLv2    SSLv3    SSLv23    TLSv1
SSLv2              yes      no       yes       no
SSLv3              yes      yes      yes       no
SSLv23             yes      no       yes       no
TLSv1              no       no       yes       yes
Note
Which connections succeed will vary depending on the version of
OpenSSL. For instance, in some older versions of OpenSSL (such
as 0.9.7l on OS X 10.4), an SSLv2 client could not connect to an
SSLv23 server. Another example: beginning with OpenSSL 1.0.0,
an SSLv23 client will not actually attempt SSLv2 connections
unless you explicitly enable SSLv2 ciphers; for example, you
might specify "ALL" or "SSLv2" as the ciphers parameter
to enable them.
The ciphers parameter sets the available ciphers for this SSL object.
It should be a string in the OpenSSL cipher list format.
The parameter do_handshake_on_connect specifies whether to do the SSL
handshake automatically after doing a socket.connect(), or whether the
application program will call it explicitly, by invoking the
SSLSocket.do_handshake() method. Calling
SSLSocket.do_handshake() explicitly gives the program control over the
blocking behavior of the socket I/O involved in the handshake.
The parameter suppress_ragged_eofs specifies how the
SSLSocket.recv() method should signal unexpected EOF from the other end
of the connection. If specified as True (the default), it returns a
normal EOF (an empty bytes object) in response to unexpected EOF errors
raised from the underlying socket; if False, it will raise the
exceptions back to the caller.
Changed in version 3.2: New optional argument ciphers.
Returns True if the SSL pseudo-random number generator has been seeded with
‘enough’ randomness, and False otherwise. You can use ssl.RAND_egd()
and ssl.RAND_add() to increase the randomness of the pseudo-random
number generator.
If you are running an entropy-gathering daemon (EGD) somewhere, and path
is the pathname of a socket connection open to it, this will read 256 bytes
of randomness from the socket, and add it to the SSL pseudo-random number
generator to increase the security of generated secret keys. This is
typically only necessary on systems without better sources of randomness.
Mixes the given bytes into the SSL pseudo-random number generator. The
parameter entropy (a float) is a lower bound on the entropy contained in
the given bytes (so you can always use 0.0). See RFC 1750 for more
information on sources of entropy.
Verify that cert (in decoded format as returned by
SSLSocket.getpeercert()) matches the given hostname. The rules
applied are those for checking the identity of HTTPS servers as outlined
in RFC 2818, except that IP addresses are not currently supported.
In addition to HTTPS, this function should be suitable for checking the
identity of servers in various SSL-based protocols such as FTPS, IMAPS,
POPS and others.
CertificateError is raised on failure. On success, the function
returns nothing:
>>> cert = {'subject': ((('commonName', 'example.com'),),)}
>>> ssl.match_hostname(cert, "example.com")
>>> ssl.match_hostname(cert, "example.org")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/py3k/Lib/ssl.py", line 130, in match_hostname
ssl.CertificateError: hostname 'example.org' doesn't match 'example.com'
Returns a floating-point value containing a normal seconds-after-the-epoch
time value, given the time-string representing the “notBefore” or “notAfter”
date from a certificate.
Given the address addr of an SSL-protected server, as a (hostname,
port-number) pair, fetches the server’s certificate, and returns it as a
PEM-encoded string. If ssl_version is specified, uses that version of
the SSL protocol to attempt to connect to the server. If ca_certs is
specified, it should be a file containing a list of root certificates, the
same format as used for the same parameter in wrap_socket(). The call
will attempt to validate the server certificate against that set of root
certificates, and will fail if the validation attempt fails.
Possible value for SSLContext.verify_mode, or the cert_reqs
parameter to wrap_socket(). In this mode (the default), no
certificates will be required from the other side of the socket connection.
If a certificate is received from the other end, no attempt to validate it
is made.
Possible value for SSLContext.verify_mode, or the cert_reqs
parameter to wrap_socket(). In this mode no certificates will be
required from the other side of the socket connection; but if they
are provided, validation will be attempted and an SSLError
will be raised on failure.
Possible value for SSLContext.verify_mode, or the cert_reqs
parameter to wrap_socket(). In this mode, certificates are
required from the other side of the socket connection; an SSLError
will be raised if no certificate is provided, or if its validation fails.
Selects SSL version 2 or 3 as the channel encryption protocol. This is a
setting to use with servers for maximum compatibility with the other end of
an SSL connection, but it may cause the specific ciphers chosen for the
encryption to be of fairly low quality.
Selects TLS version 1 as the channel encryption protocol. This is the most
modern version, and probably the best choice for maximum protection, if both
sides can speak it.
Prevents an SSLv2 connection. This option is only applicable in
conjunction with PROTOCOL_SSLv23. It prevents the peers from
choosing SSLv2 as the protocol version.
Prevents an SSLv3 connection. This option is only applicable in
conjunction with PROTOCOL_SSLv23. It prevents the peers from
choosing SSLv3 as the protocol version.
Prevents a TLSv1 connection. This option is only applicable in
conjunction with PROTOCOL_SSLv23. It prevents the peers from
choosing TLSv1 as the protocol version.
Whether the OpenSSL library has built-in support for the Server Name
Indication extension to the SSLv3 and TLSv1 protocols (as defined in
RFC 4366). When true, you can use the server_hostname argument to
SSLContext.wrap_socket().
However, since the SSL (and TLS) protocol has its own framing atop
of TCP, the SSL sockets abstraction can, in certain respects, diverge from
the specification of normal, OS-level sockets. See especially the
notes on non-blocking sockets.
SSL sockets also have the following additional methods and attributes:
If there is no certificate for the peer on the other end of the connection,
returns None.
If the parameter binary_form is False, and a certificate was
received from the peer, this method returns a dict instance. If the
certificate was not validated, the dict is empty. If the certificate was
validated, it returns a dict with the keys subject (the principal for
which the certificate was issued), and notAfter (the time after which the
certificate should not be trusted). If a certificate contains an instance
of the Subject Alternative Name extension (see RFC 3280), there will
also be a subjectAltName key in the dictionary.
The “subject” field is a tuple containing the sequence of relative
distinguished names (RDNs) given in the certificate’s data structure for the
principal, and each RDN is a sequence of name-value pairs:
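An illustrative (hypothetical) value, showing the nesting of RDN tuples:
{'notBefore': 'Feb 16 16:54:50 2008 GMT',
 'notAfter': 'Feb 16 16:54:50 2013 GMT',
 'subject': ((('countryName', 'US'),),
             (('stateOrProvinceName', 'Delaware'),),
             (('organizationName', 'Example Corp.'),),
             (('commonName', 'example.com'),))}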
If the binary_form parameter is True, and a certificate was
provided, this method returns the DER-encoded form of the entire certificate
as a sequence of bytes, or None if the peer did not provide a
certificate. This return value is independent of validation; if validation
was required (CERT_OPTIONAL or CERT_REQUIRED), it will have
been validated, but if CERT_NONE was used to establish the
connection, the certificate, if present, will not have been validated.
Changed in version 3.2: The returned dictionary includes additional items such as issuer and notBefore.
Returns a three-value tuple containing the name of the cipher being used, the
version of the SSL protocol that defines its use, and the number of secret
bits being used. If no connection has been established, returns None.
Performs the SSL shutdown handshake, which removes the TLS layer from the
underlying socket, and returns the underlying socket object. This can be
used to go from encrypted operation over a connection to unencrypted. The
returned socket should always be used for further communication with the
other side of the connection, rather than the original socket.
The SSLContext object this SSL socket is tied to. If the SSL
socket was created using the top-level wrap_socket() function
(rather than SSLContext.wrap_socket()), this is a custom context
object created for this SSL socket.
An SSL context holds various data longer-lived than single SSL connections,
such as SSL configuration options, certificate(s) and private key(s).
It also manages a cache of SSL sessions for server-side sockets, in order
to speed up repeated connections from the same clients.
Create a new SSL context. You must pass protocol which must be one
of the PROTOCOL_* constants defined in this module.
PROTOCOL_SSLv23 is recommended for maximum interoperability.
SSLContext objects have the following methods and attributes:
Load a private key and the corresponding certificate. The certfile
string must be the path to a single file in PEM format containing the
certificate as well as any number of CA certificates needed to establish
the certificate’s authenticity. The keyfile string, if present, must
point to a file containing the private key. Otherwise the private
key will be taken from certfile as well. See the discussion of
Certificates for more information on how the certificate
is stored in the certfile.
An SSLError is raised if the private key doesn’t
match with the certificate.
Load a set of “certification authority” (CA) certificates used to validate
other peers’ certificates when verify_mode is other than
CERT_NONE. At least one of cafile or capath must be specified.
The cafile string, if present, is the path to a file of concatenated
CA certificates in PEM format. See the discussion of
Certificates for more information about how to arrange the
certificates in this file.
The capath string, if present, is
the path to a directory containing several CA certificates in PEM format,
following an OpenSSL specific layout.
Load a set of default “certification authority” (CA) certificates from
a filesystem path defined when building the OpenSSL library. Unfortunately,
there’s no easy way to know whether this method succeeds: no error is
returned if no certificates are to be found. When the OpenSSL library is
provided as part of the operating system, though, it is likely to be
configured properly.
Set the available ciphers for sockets created with this context.
It should be a string in the OpenSSL cipher list format.
If no cipher can be selected (because compile-time options or other
configuration forbids use of all the specified ciphers), an
SSLError will be raised.
Note
When connected, the SSLSocket.cipher() method of SSL sockets will
give the currently selected cipher.
Wrap an existing Python socket sock and return an SSLSocket
object. The SSL socket is tied to the context, its settings and
certificates. The parameters server_side, do_handshake_on_connect
and suppress_ragged_eofs have the same meaning as in the top-level
wrap_socket() function.
On client connections, the optional parameter server_hostname specifies
the hostname of the service which we are connecting to. This allows a
single server to host multiple SSL-based services with distinct certificates,
quite similarly to HTTP virtual hosts. Specifying server_hostname
will raise a ValueError if the OpenSSL library doesn’t have support
for it (that is, if HAS_SNI is False). Specifying
server_hostname will also raise a ValueError if server_side
is true.
Get statistics about the SSL sessions created or managed by this context.
A dictionary is returned which maps the names of each piece of information to their
numeric values. For example, here is the total number of hits and misses
in the session cache since the context was created:
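A minimal sketch, assuming a context bound to the name context (the exact set of keys comes from OpenSSL):
>>> stats = context.session_stats()
>>> stats['hits'], stats['misses']
(0, 0)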
An integer representing the set of SSL options enabled on this context.
The default value is OP_ALL, but you can specify other options
such as OP_NO_SSLv2 by ORing them together.
Note
With versions of OpenSSL older than 0.9.8m, it is only possible
to set options, not to clear them. Attempting to clear an option
(by resetting the corresponding bits) will raise a ValueError.
Whether to try to verify other peers’ certificates and how to behave
if verification fails. This attribute must be one of
CERT_NONE, CERT_OPTIONAL or CERT_REQUIRED.
Certificates in general are part of a public-key / private-key system. In this
system, each principal, (which may be a machine, or a person, or an
organization) is assigned a unique two-part encryption key. One part of the key
is public, and is called the public key; the other part is kept secret, and is
called the private key. The two parts are related, in that if you encrypt a
message with one of the parts, you can decrypt it with the other part, and
only with the other part.
A certificate contains information about two principals. It contains the name
of a subject, and the subject’s public key. It also contains a statement by a
second principal, the issuer, that the subject is who he claims to be, and
that this is indeed the subject’s public key. The issuer’s statement is signed
with the issuer’s private key, which only the issuer knows. However, anyone can
verify the issuer’s statement by finding the issuer’s public key, decrypting the
statement with it, and comparing it to the other information in the certificate.
The certificate also contains information about the time period over which it is
valid. This is expressed as two fields, called “notBefore” and “notAfter”.
In the Python use of certificates, a client or server can use a certificate to
prove who they are. The other side of a network connection can also be required
to produce a certificate, and that certificate can be validated to the
satisfaction of the client or server that requires such validation. The
connection attempt can be set to raise an exception if the validation fails.
Validation is done automatically, by the underlying OpenSSL framework; the
application need not concern itself with its mechanics. But the application
does usually need to provide sets of certificates to allow this process to take
place.
Python uses files to contain certificates. They should be formatted as “PEM”
(see RFC 1422), which is a base-64 encoded form wrapped with a header line
and a footer line:
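For a certificate, the wrapping looks like this (the base-64 body is elided):
-----BEGIN CERTIFICATE-----
... (certificate in base64 PEM encoding) ...
-----END CERTIFICATE-----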
The Python files which contain certificates can contain a sequence of
certificates, sometimes called a certificate chain. This chain should start
with the specific certificate for the principal who “is” the client or server,
and then the certificate for the issuer of that certificate, and then the
certificate for the issuer of that certificate, and so on up the chain till
you get to a certificate which is self-signed, that is, a certificate which
has the same subject and issuer, sometimes called a root certificate. The
certificates should just be concatenated together in the certificate file. For
example, suppose we had a three certificate chain, from our server certificate
to the certificate of the certification authority that signed our server
certificate, to the root certificate of the agency which issued the
certification authority’s certificate:
-----BEGIN CERTIFICATE-----
... (certificate for your server)...
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
... (the certificate for the CA)...
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
... (the root certificate for the CA's issuer)...
-----END CERTIFICATE-----
If you are going to require validation of the other side of the connection’s
certificate, you need to provide a “CA certs” file, filled with the certificate
chains for each issuer you are willing to trust. Again, this file just contains
these chains concatenated together. For validation, Python will use the first
chain it finds in the file which matches. Some “standard” root certificates are
available from various certification authorities: CACert.org, Thawte, Verisign, Positive SSL
(used by python.org), Equifax and GeoTrust.
In general, if you are using SSL3 or TLS1, you don’t need to put the full chain
in your “CA certs” file; you only need the root certificates, and the remote
peer is supposed to furnish the other certificates necessary to chain from its
certificate to a root certificate. See RFC 4158 for more discussion of the
way in which certification chains can be built.
Often the private key is stored in the same file as the certificate; in this
case, only the certfile parameter to SSLContext.load_cert_chain()
and wrap_socket() needs to be passed. If the private key is stored
with the certificate, it should come before the first certificate in
the certificate chain:
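Sketched out, such a combined file looks like this (bodies elided):
-----BEGIN RSA PRIVATE KEY-----
... (private key in base64 encoding) ...
-----END RSA PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
... (certificate in base64 PEM encoding) ...
-----END CERTIFICATE-----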
If you are going to create a server that provides SSL-encrypted connection
services, you will need to acquire a certificate for that service. There are
many ways of acquiring appropriate certificates, such as buying one from a
certification authority. Another common practice is to generate a self-signed
certificate. The simplest way to do this is with the OpenSSL package, using
something like the following:
% openssl req -new -x509 -days 365 -nodes -out cert.pem -keyout cert.pem
Generating a 1024 bit RSA private key
.......++++++
.............................++++++
writing new private key to 'cert.pem'
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:US
State or Province Name (full name) [Some-State]:MyState
Locality Name (eg, city) []:Some City
Organization Name (eg, company) [Internet Widgits Pty Ltd]:My Organization, Inc.
Organizational Unit Name (eg, section) []:My Group
Common Name (eg, YOUR name) []:myserver.mygroup.myorganization.com
Email Address []:ops@myserver.mygroup.myorganization.com
%
The disadvantage of a self-signed certificate is that it is its own root
certificate, and no one else will have it in their cache of known (and trusted)
root certificates.
This example connects to an SSL server and prints the server’s certificate:
import socket, ssl, pprint

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# require a certificate from the server
ssl_sock = ssl.wrap_socket(s,
                           ca_certs="/etc/ca_certs_file",
                           cert_reqs=ssl.CERT_REQUIRED)

ssl_sock.connect(('www.verisign.com', 443))

pprint.pprint(ssl_sock.getpeercert())

# note that closing the SSLSocket will also close the underlying socket
ssl_sock.close()
As of October 6, 2010, the certificate printed by this program looks like
this:
{'notAfter': 'May 25 23:59:59 2012 GMT',
 'subject': ((('1.3.6.1.4.1.311.60.2.1.3', 'US'),),
             (('1.3.6.1.4.1.311.60.2.1.2', 'Delaware'),),
             (('businessCategory', 'V1.0, Clause 5.(b)'),),
             (('serialNumber', '2497886'),),
             (('countryName', 'US'),),
             (('postalCode', '94043'),),
             (('stateOrProvinceName', 'California'),),
             (('localityName', 'Mountain View'),),
             (('streetAddress', '487 East Middlefield Road'),),
             (('organizationName', 'VeriSign, Inc.'),),
             (('organizationalUnitName', ' Production Security Services'),),
             (('commonName', 'www.verisign.com'),))}
This other example first creates an SSL context, instructs it to verify
certificates sent by peers, and feeds it a set of recognized certificate
authorities (CA):
(it is assumed your operating system places a bundle of all CA certificates
in /etc/ssl/certs/ca-bundle.crt; if not, you’ll get an error and have
to adjust the location)
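A minimal sketch of such a context setup (the bundle path is the assumed location mentioned above):
>>> import ssl
>>> context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
>>> context.verify_mode = ssl.CERT_REQUIRED
>>> context.load_verify_locations("/etc/ssl/certs/ca-bundle.crt")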
When you use the context to connect to a server, CERT_REQUIRED
validates the server certificate: it ensures that the server certificate
was signed with one of the CA certificates, and checks the signature for
correctness:
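A sketch of such a connection ("example.org" is an illustrative host name):
>>> import socket
>>> conn = context.wrap_socket(socket.socket(socket.AF_INET))
>>> conn.connect(("example.org", 443))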
For server operation, typically you’ll need to have a server certificate, and
private key, each in a file. You’ll first create a context holding the key
and the certificate, so that clients can check your authenticity. Then
you’ll open a socket, bind it to a port, call listen() on it, and start
waiting for clients to connect:
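A minimal server-side sketch; the file names and address are placeholders:
import socket, ssl

context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
context.load_cert_chain(certfile="mycertfile", keyfile="mykeyfile")

bindsocket = socket.socket()
bindsocket.bind(('myaddr.mydomain.com', 10023))
bindsocket.listen(5)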
When a client connects, you’ll call accept() on the socket to get the
new socket from the other end, and use the context’s SSLContext.wrap_socket()
method to create a server-side SSL socket for the connection:
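Continuing the sketch above:
newsocket, fromaddr = bindsocket.accept()
connstream = context.wrap_socket(newsocket, server_side=True)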
Then you’ll read data from the connstream and do something with it till you
are finished with the client (or the client is finished with you):
def deal_with_client(connstream):
    data = connstream.recv(1024)
    # empty data means the client is finished with us
    while data:
        if not do_something(connstream, data):
            # we'll assume do_something returns False
            # when we're finished with client
            break
        data = connstream.recv(1024)
    # finished with client
And go back to listening for new client connections (of course, a real server
would probably handle each client connection in a separate thread, or put
the sockets in non-blocking mode and use an event loop).
When working with non-blocking sockets, there are several things you need
to be aware of:
Calling select() tells you that the OS-level socket can be
read from (or written to), but it does not imply that there is sufficient
data at the upper SSL layer. For example, only part of an SSL frame might
have arrived. Therefore, you must be ready to handle SSLSocket.recv()
and SSLSocket.send() failures, and retry after another call to
select().
(of course, similar provisions apply when using other primitives such as
poll())
The SSL handshake itself will be non-blocking: the
SSLSocket.do_handshake() method has to be retried until it returns
successfully. Here is a synopsis using select() to wait for
the socket’s readiness:
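A sketch of such a retry loop, assuming sock is a non-blocking SSL socket:
import select, ssl

while True:
    try:
        sock.do_handshake()
        break
    except ssl.SSLError as err:
        if err.args[0] == ssl.SSL_ERROR_WANT_READ:
            # wait until the socket is readable, then retry the handshake
            select.select([sock], [], [])
        elif err.args[0] == ssl.SSL_ERROR_WANT_WRITE:
            # wait until the socket is writable, then retry the handshake
            select.select([], [sock], [])
        else:
            raise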
CERT_NONE is the default. Since it does not authenticate the other
peer, it can be insecure, especially in client mode where most of the time you
would like to ensure the authenticity of the server you’re talking to.
Therefore, when in client mode, it is highly recommended to use
CERT_REQUIRED. However, it is in itself not sufficient; you also
have to check that the server certificate, which can be obtained by calling
SSLSocket.getpeercert(), matches the desired service. For many
protocols and applications, the service can be identified by the hostname;
in this case, the match_hostname() function can be used.
In server mode, if you want to authenticate your clients using the SSL layer
(rather than using a higher-level authentication mechanism), you’ll also have
to specify CERT_REQUIRED and similarly check the client certificate.
Note
In client mode, CERT_OPTIONAL and CERT_REQUIRED are
equivalent unless anonymous ciphers are enabled (they are disabled
by default).
SSL version 2 is considered insecure and is therefore dangerous to use. If
you want maximum compatibility between clients and servers, it is recommended
to use PROTOCOL_SSLv23 as the protocol version and then disable
SSLv2 explicitly using the SSLContext.options attribute:
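A short sketch:
import ssl

context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
context.options |= ssl.OP_NO_SSLv2
Sockets created from this context will then refuse SSLv2 while still negotiating SSLv3 or TLSv1 with the peer.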
This module provides mechanisms to use signal handlers in Python. Some general
rules for working with signals and their handlers:
A handler for a particular signal, once set, remains installed until it is
explicitly reset (Python emulates the BSD style interface regardless of the
underlying implementation), with the exception of the handler for
SIGCHLD, which follows the underlying implementation.
There is no way to “block” signals temporarily from critical sections (since
this is not supported by all Unix flavors).
Although Python signal handlers are called asynchronously as far as the Python
user is concerned, they can only occur between the “atomic” instructions of the
Python interpreter. This means that signals arriving during long calculations
implemented purely in C (such as regular expression matches on large bodies of
text) may be delayed for an arbitrary amount of time.
When a signal arrives during an I/O operation, it is possible that the I/O
operation raises an exception after the signal handler returns. This is
dependent on the underlying Unix system’s semantics regarding interrupted system
calls.
Because the C signal handler always returns, it makes little sense to catch
synchronous errors like SIGFPE or SIGSEGV.
Python installs a small number of signal handlers by default: SIGPIPE
is ignored (so write errors on pipes and sockets can be reported as ordinary
Python exceptions) and SIGINT is translated into a
KeyboardInterrupt exception. All of these can be overridden.
Some care must be taken if both signals and threads are used in the same
program. The fundamental thing to remember in using signals and threads
simultaneously is: always perform signal() operations in the main thread
of execution. Any thread can perform an alarm(), getsignal(),
pause(), setitimer() or getitimer(); only the main thread
can set a new signal handler, and the main thread will be the only one to
receive signals (this is enforced by the Python signal module, even
if the underlying thread implementation supports sending signals to
individual threads). This means that signals can’t be used as a means of
inter-thread communication. Use locks instead.
This is one of two standard signal handling options; it will simply perform
the default function for the signal. For example, on most systems the
default action for SIGQUIT is to dump core and exit, while the
default action for SIGCHLD is to simply ignore it.
This is another standard signal handler, which will simply ignore the given
signal.
SIG*
All the signal numbers are defined symbolically. For example, the hangup signal
is defined as signal.SIGHUP; the variable names are identical to the
names used in C programs, as found in <signal.h>. The Unix man page for
signal() lists the existing signals (on some systems this is
signal(2), on others the list is in signal(7)). Note that
not all systems define the same set of signal names; only those names defined by
the system are defined by this module.
Decrements interval timer both when the process executes and when the
system is executing on behalf of the process. Coupled with ITIMER_VIRTUAL,
this timer is usually used to profile the time spent by the application
in user and kernel space. SIGPROF is delivered upon expiration.
Raised to signal an error from the underlying setitimer() or
getitimer() implementation. Expect this error if an invalid
interval timer or a negative time is passed to setitimer().
This error is a subtype of IOError.
The signal module defines the following functions:
If time is non-zero, this function requests that a SIGALRM signal be
sent to the process in time seconds. Any previously scheduled alarm is
canceled (only one alarm can be scheduled at any time). The returned value is
then the number of seconds before any previously set alarm was to have been
delivered. If time is zero, no alarm is scheduled, and any scheduled alarm is
canceled. If the return value is zero, no alarm is currently scheduled. (See
the Unix man page alarm(2).) Availability: Unix.
Return the current signal handler for the signal signalnum. The returned value
may be a callable Python object, or one of the special values
signal.SIG_IGN, signal.SIG_DFL or None. Here,
signal.SIG_IGN means that the signal was previously ignored,
signal.SIG_DFL means that the default way of handling the signal was
previously in use, and None means that the previous signal handler was not
installed from Python.
Cause the process to sleep until a signal is received; the appropriate handler
will then be called. Returns nothing. Not on Windows. (See the Unix man page
signal(2).)
Sets given interval timer (one of signal.ITIMER_REAL,
signal.ITIMER_VIRTUAL or signal.ITIMER_PROF) specified
by which to fire after seconds (float is accepted, different from
alarm()) and after that every interval seconds. The interval
timer specified by which can be cleared by setting seconds to zero.
When an interval timer fires, a signal is sent to the process.
The signal sent is dependent on the timer being used;
signal.ITIMER_REAL will deliver SIGALRM,
signal.ITIMER_VIRTUAL sends SIGVTALRM,
and signal.ITIMER_PROF will deliver SIGPROF.
The old values are returned as a tuple: (delay, interval).
Attempting to pass an invalid interval timer will cause an
ItimerError. Availability: Unix.
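For illustration, a sketch that delivers SIGALRM after half a second and then every half second thereafter (the handler shown is a hypothetical example):
import signal

def handler(signum, frame):
    print('timer fired')

signal.signal(signal.SIGALRM, handler)

# returns the (delay, interval) of any previously set timer
old_delay, old_interval = signal.setitimer(signal.ITIMER_REAL, 0.5, 0.5)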
Set the wakeup fd to fd. When a signal is received, a '\0' byte is
written to the fd. This can be used by a library to wake up a poll or select
call, allowing the signal to be fully processed.
The old wakeup fd is returned. fd must be non-blocking. It is up to the
library to remove any bytes before calling poll or select again.
When threads are enabled, this function can only be called from the main thread;
attempting to call it from other threads will cause a ValueError
exception to be raised.
Change system call restart behaviour: if flag is False, system
calls will be restarted when interrupted by signal signalnum, otherwise
system calls will be interrupted. Returns nothing. Availability: Unix (see
the man page siginterrupt(3) for further information).
Note that installing a signal handler with signal() will reset the
restart behaviour to interruptible by implicitly calling
siginterrupt() with a true flag value for the given signal.
Set the handler for signal signalnum to the function handler. handler can
be a callable Python object taking two arguments (see below), or one of the
special values signal.SIG_IGN or signal.SIG_DFL. The previous
signal handler will be returned (see the description of getsignal()
above). (See the Unix man page signal(2).)
When threads are enabled, this function can only be called from the main thread;
attempting to call it from other threads will cause a ValueError
exception to be raised.
The handler is called with two arguments: the signal number and the current
stack frame (None or a frame object; for a description of frame objects,
see the description in the type hierarchy or see the
attribute descriptions in the inspect module).
On Windows, signal() can only be called with SIGABRT,
SIGFPE, SIGILL, SIGINT, SIGSEGV, or
SIGTERM. A ValueError will be raised in any other case.
Here is a minimal example program. It uses the alarm() function to limit
the time spent waiting to open a file; this is useful if the file is for a
serial device that may not be turned on, which would normally cause the
os.open() to hang indefinitely. The solution is to set a 5-second alarm
before opening the file; if the operation takes too long, the alarm signal will
be sent, and the handler raises an exception.
import signal, os

def handler(signum, frame):
    print('Signal handler called with signal', signum)
    raise IOError("Couldn't open device!")

# Set the signal handler and a 5-second alarm
signal.signal(signal.SIGALRM, handler)
signal.alarm(5)

# This open() may hang indefinitely
fd = os.open('/dev/ttyS0', os.O_RDWR)

signal.alarm(0)          # Disable the alarm
This module provides the basic infrastructure for writing asynchronous socket
service clients and servers.
There are only two ways to have a program on a single processor do “more than
one thing at a time.” Multi-threaded programming is the simplest and most
popular way to do it, but there is another very different technique, that lets
you have nearly all the advantages of multi-threading, without actually using
multiple threads. It’s really only practical if your program is largely I/O
bound. If your program is processor bound, then pre-emptive scheduled threads
are probably what you really need. Network servers are rarely processor
bound, however.
If your operating system supports the select() system call in its I/O
library (and nearly all do), then you can use it to juggle multiple
communication channels at once; doing other work while your I/O is taking
place in the “background.” Although this strategy can seem strange and
complex, especially at first, it is in many ways easier to understand and
control than multi-threaded programming. The asyncore module solves
many of the difficult problems for you, making the task of building
sophisticated high-performance network servers and clients a snap. For
“conversational” applications and protocols the companion asynchat
module is invaluable.
The basic idea behind both modules is to create one or more network
channels, instances of class asyncore.dispatcher and
asynchat.async_chat. Creating the channels adds them to a global
map, used by the loop() function if you do not provide it with your own
map.
Once the initial channels are created, calling the loop() function
activates channel service, which continues until the last channel (including
any that have been added to the map during asynchronous service) is closed.
Enter a polling loop that terminates after count passes or all open
channels have been closed. All arguments are optional. The count
parameter defaults to None, resulting in the loop terminating only when all
channels have been closed. The timeout argument sets the timeout
parameter for the appropriate select() or poll() call, measured
in seconds; the default is 30 seconds. The use_poll parameter, if true,
indicates that poll() should be used in preference to select()
(the default is False).
The map parameter is a dictionary whose items are the channels to watch.
As channels are closed they are deleted from their map. If map is
omitted, a global map is used. Channels (instances of
asyncore.dispatcher, asynchat.async_chat and subclasses
thereof) can freely be mixed in the map.
The dispatcher class is a thin wrapper around a low-level socket
object. To make it more useful, it has a few methods for event-handling
which are called from the asynchronous loop. Otherwise, it can be treated
as a normal non-blocking socket object.
The firing of low-level events at certain times or in certain connection
states tells the asynchronous loop that certain higher-level events have
taken place. For example, if we have asked for a socket to connect to
another host, we know that the connection has been made when the socket
becomes writable for the first time (at this point you know that you may
write to it with the expectation of success). The implied higher-level
events are:
Event
Description
handle_connect()
Implied by the first read or write
event
handle_close()
Implied by a read event with no data
available
handle_accepted()
Implied by a read event on a listening
socket
During asynchronous processing, each mapped channel’s readable() and
writable() methods are used to determine whether the channel’s socket
should be added to the list of channels select()ed or
poll()ed for read and write events.
Thus, the set of channel events is larger than the basic socket events. The
full set of methods that can be overridden in your subclass follows:
Called when the asynchronous loop detects that a writable socket can be
written. Often this method will implement the necessary buffering for
performance. For example:
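A sketch of such buffered writing, assuming the subclass keeps pending output in self.buffer (a bytes object):
def handle_write(self):
    sent = self.send(self.buffer)
    # keep whatever could not be sent this time around
    self.buffer = self.buffer[sent:]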
handle_connect() is called when the active opener’s socket actually makes
a connection. It might send a “welcome” banner, or initiate a protocol
negotiation with the remote endpoint, for example.
handle_accept() is called on listening channels (passive openers) when a
connection can be established with a new remote endpoint that has issued a
connect() call for the local endpoint. Deprecated in version 3.2; use
handle_accepted() instead.
handle_accepted(sock, addr) is called on listening channels (passive
openers) when a connection has been established with a new remote endpoint
that has issued a connect() call for the local endpoint. sock is a new
socket object usable to send and receive data on the connection, and addr
is the address bound to the socket on the other end of the connection.
readable() is called each time around the asynchronous loop to determine
whether a channel’s socket should be added to the list on which read events
can occur. The default method simply returns True, indicating that by
default, all channels will be interested in read events.
writable() is called each time around the asynchronous loop to determine
whether a channel’s socket should be added to the list on which write
events can occur. The default method simply returns True, indicating that
by default, all channels will be interested in write events.
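For example, a channel that buffers its output (as in the handle_write()
sketch above, with the illustrative self.buffer attribute) might override
writable() so that it is polled for write events only while output is
actually pending:

def writable(self):
    # ask the loop for write events only while there is data to send
    return len(self.buffer) > 0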
In addition, each channel delegates or extends many of the socket methods.
Most of these are nearly identical to their socket partners.
create_socket() is identical to the creation of a normal socket, and
will use the same options for creation. Refer to the socket documentation
for information on creating sockets.
listen(backlog) listens for connections made to the socket. The backlog
argument specifies the maximum number of queued connections and should be
at least 1; the maximum value is system-dependent (usually 5).
bind(address) binds the socket to address. The socket must not already
be bound. (The format of address depends on the address family; refer to
the socket documentation for more information.) To mark the socket as
re-usable (setting the SO_REUSEADDR option), call the dispatcher object’s
set_reuse_addr() method.
accept() accepts a connection. The socket must be bound to an address
and listening for connections. The return value can be either None or a
pair (conn, address) where conn is a new socket object usable to send and
receive data on the connection, and address is the address bound to the
socket on the other end of the connection.
When None is returned, it means the connection didn’t take place, in which
case the server should just ignore this event and keep listening for
further incoming connections.
close() closes the socket. All future operations on the socket object
will fail. The remote end-point will receive no more data (after queued
data is flushed). Sockets are automatically closed when they are
garbage-collected.
A file_dispatcher takes a file descriptor or file object along with an
optional map argument and wraps it for use with the poll() or
loop() functions. If provided a file object or anything with a
fileno() method, that method will be called and its result passed to the
file_wrapper constructor. Availability: UNIX.
A file_wrapper takes an integer file descriptor and calls os.dup() to
duplicate the handle so that the original handle may be closed independently
of the file_wrapper. This class implements sufficient methods to emulate a
socket for use by the file_dispatcher class. Availability: UNIX.
Here is a basic echo server that uses the dispatcher class to accept
connections and dispatches the incoming connections to a handler:
import asyncore
import socket

class EchoHandler(asyncore.dispatcher_with_send):

    def handle_read(self):
        data = self.recv(8192)
        if data:
            self.send(data)

class EchoServer(asyncore.dispatcher):

    def __init__(self, host, port):
        asyncore.dispatcher.__init__(self)
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.set_reuse_addr()
        self.bind((host, port))
        self.listen(5)

    def handle_accepted(self, sock, addr):
        print('Incoming connection from %s' % repr(addr))
        handler = EchoHandler(sock)

server = EchoServer('localhost', 8080)
asyncore.loop()
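For illustration only, a matching client can be sketched along the same
lines; the EchoClient name, the buffering scheme, and the address below are
assumptions for this example, not part of the module:

import asyncore
import socket

class EchoClient(asyncore.dispatcher):

    def __init__(self, host, port, message):
        asyncore.dispatcher.__init__(self)
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.connect((host, port))
        self.buffer = message

    def handle_connect(self):
        pass

    def writable(self):
        # poll for write events only while data remains to be sent
        return len(self.buffer) > 0

    def handle_write(self):
        sent = self.send(self.buffer)
        self.buffer = self.buffer[sent:]

    def handle_read(self):
        print('Received back: %r' % self.recv(8192))
        self.close()

client = EchoClient('localhost', 8080, b'Hello, world')
asyncore.loop()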
This module builds on the asyncore infrastructure, simplifying
asynchronous clients and servers and making it easier to handle protocols
whose elements are terminated by arbitrary strings, or are of variable length.
asynchat defines the abstract class async_chat that you
subclass, providing implementations of the collect_incoming_data() and
found_terminator() methods. It uses the same asynchronous loop as
asyncore, and the two types of channel, asyncore.dispatcher
and asynchat.async_chat, can freely be mixed in the channel map.
Typically an asyncore.dispatcher server channel generates new
asynchat.async_chat channel objects as it receives incoming
connection requests.
Like asyncore.dispatcher, async_chat defines a set of
events that are generated by an analysis of socket conditions after a
select() call. Once the polling loop has been started the
async_chat object’s methods are called by the event-processing
framework with no action on the part of the programmer.
Two class attributes can be modified, to improve performance, or possibly
even to conserve memory: ac_in_buffer_size, the asynchronous input buffer
size, and ac_out_buffer_size, the asynchronous output buffer size (each
defaults to 4096).
Unlike asyncore.dispatcher, async_chat allows you to
define a first-in-first-out queue (fifo) of producers. A producer need
have only one method, more(), which should return data to be
transmitted on the channel.
The producer indicates exhaustion (i.e. that it contains no more data) by
having its more() method return an empty bytes object. At this point the
async_chat object removes the producer from the fifo and starts
using the next producer, if any. When the producer fifo is empty the
handle_write() method does nothing. You use the channel object’s
set_terminator() method to describe how to recognize the end of, or
an important breakpoint in, an incoming transmission from the remote
endpoint.
To build a functioning async_chat subclass your input methods
collect_incoming_data() and found_terminator() must handle the
data that the channel receives asynchronously. The methods are described
below.
collect_incoming_data(data) is called whenever data arrives on the
channel. The default method, which must be overridden, raises a
NotImplementedError exception.
found_terminator() is called when the incoming data stream matches the
termination condition set by set_terminator(). The default method, which
must be overridden, raises a NotImplementedError exception. The buffered
input data should be available via an instance attribute.
push(data) pushes data onto the channel’s fifo to ensure its
transmission. This is all you need to do to have the channel write the data
out to the network, although it is possible to use your own producers in
more complex schemes to implement encryption and chunking, for example.
push_with_producer(producer) takes a producer object and adds it to the
producer fifo associated with the channel. When all currently-pushed
producers have been exhausted the channel will consume this producer’s data
by calling its more() method and send the data to the remote endpoint.
set_terminator(term) sets the terminating condition to be recognized on
the channel. term may be any of three types of value, corresponding to
three different ways to handle incoming protocol data:
string: found_terminator() will be called when the string is found in
the input stream
integer: found_terminator() will be called when the indicated number of
characters have been received
None: the channel continues to collect data forever
Note that any data following the terminator will be available for reading
by the channel after found_terminator() is called.
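Putting collect_incoming_data(), found_terminator() and
set_terminator() together, a minimal line-oriented channel might look like
the following sketch (the LineReader name and the printing are illustrative
only):

import asynchat

class LineReader(asynchat.async_chat):

    def __init__(self, sock):
        asynchat.async_chat.__init__(self, sock=sock)
        self.set_terminator(b'\r\n')   # found_terminator() fires at each CRLF
        self.received = []

    def collect_incoming_data(self, data):
        # buffer whatever arrives between terminators
        self.received.append(data)

    def found_terminator(self):
        line = b''.join(self.received)
        self.received = []
        print('line: %r' % line)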
A fifo holding data which has been pushed by the application but
not yet popped for writing to the channel. A fifo is a list used
to hold data and/or producers until they are required. If the list
argument is provided then it should contain producers or data items to be
written to the channel.
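For illustration, a producer can be as simple as the following sketch,
which serves a bytes payload in fixed-size chunks (the ChunkedProducer name
and chunk size are arbitrary):

class ChunkedProducer:

    def __init__(self, data, chunk_size=512):
        self.data = data
        self.chunk_size = chunk_size

    def more(self):
        # return the next chunk; an empty result signals exhaustion
        chunk = self.data[:self.chunk_size]
        self.data = self.data[self.chunk_size:]
        return chunk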
The following partial example shows how HTTP requests can be read with
async_chat. A web server might create an
http_request_handler object for each incoming client connection.
Notice that initially the channel terminator is set to match the blank line at
the end of the HTTP headers, and a flag indicates that the headers are being
read.
Once the headers have been read, if the request is of type POST (indicating
that further data are present in the input stream) then the
Content-Length: header is used to set a numeric terminator to read the
right amount of data from the channel.
The handle_request() method is called once all relevant input has been
marshalled, after setting the channel terminator to None to ensure that
any extraneous data sent by the web client are ignored.
class http_request_handler(asynchat.async_chat):

    def __init__(self, sock, addr, sessions, log):
        asynchat.async_chat.__init__(self, sock=sock)
        self.addr = addr
        self.sessions = sessions
        self.ibuffer = []
        self.obuffer = b""
        self.set_terminator(b"\r\n\r\n")
        self.reading_headers = True
        self.handling = False
        self.cgi_data = None
        self.log = log

    def collect_incoming_data(self, data):
        """Buffer the data"""
        self.ibuffer.append(data)

    def found_terminator(self):
        if self.reading_headers:
            self.reading_headers = False
            self.parse_headers(b"".join(self.ibuffer))
            self.ibuffer = []
            if self.op.upper() == b"POST":
                clen = self.headers.getheader("content-length")
                self.set_terminator(int(clen))
            else:
                self.handling = True
                self.set_terminator(None)
                self.handle_request()
        elif not self.handling:
            self.set_terminator(None)  # browsers sometimes over-send
            self.cgi_data = parse(self.headers, b"".join(self.ibuffer))
            self.handling = True
            self.ibuffer = []
            self.handle_request()
The email package is a library for managing email messages, including
MIME and other RFC 2822-based message documents. It is specifically not
designed to do any sending of email messages to SMTP (RFC 2821), NNTP, or
other servers; those are functions of modules such as smtplib and
nntplib. The email package attempts to be as RFC-compliant as
possible, supporting in addition to RFC 2822, such MIME-related RFCs as
RFC 2045, RFC 2046, RFC 2047, and RFC 2231.
The primary distinguishing feature of the email package is that it splits
the parsing and generating of email messages from the internal object model
representation of email. Applications using the email package deal
primarily with objects; you can add sub-objects to messages, remove sub-objects
from messages, completely re-arrange the contents, etc. There is a separate
parser and a separate generator which handle the transformation from flat text
to the object model, and then back to flat text again. There are also handy
subclasses for some common MIME object types, and a few miscellaneous utilities
that help with such common tasks as extracting and parsing message field values,
creating RFC-compliant dates, etc.
The following sections describe the functionality of the email package.
The ordering follows a progression that should be common in applications: an
email message is read as flat text from a file or other source, the text is
parsed to produce the object structure of the email message, this structure is
manipulated, and finally, the object tree is rendered back into flat text.
It is perfectly feasible to create the object structure out of whole cloth —
i.e. completely from scratch. From there, a similar progression can be taken as
above.
Also included are detailed specifications of all the classes and modules that
the email package provides, the exception classes you might encounter
while using the email package, some auxiliary utilities, and a few
examples. For users of the older mimelib package, or previous versions
of the email package, a section on differences and porting is provided.
The central class in the email package is the Message class,
imported from the email.message module. It is the base class for the
email object model. Message provides the core functionality for
setting and querying header fields, and for accessing message bodies.
Conceptually, a Message object consists of headers and payloads.
Headers are RFC 2822 style field names and values where the field name and
value are separated by a colon. The colon is not part of either the field name
or the field value.
Headers are stored and returned in case-preserving form but are matched
case-insensitively. There may also be a single envelope header, also known as
the Unix-From header or the From_ header. The payload is either a string
in the case of simple message objects or a list of Message objects for
MIME container documents (e.g. multipart/* and
message/rfc822).
Message objects provide a mapping style interface for accessing the
message headers, and an explicit interface for accessing both the headers
and the payload. The class also provides convenience methods for generating
a flat text representation of the message object tree, for accessing
commonly used header parameters, and for recursively walking over the
object tree.
The as_string() method returns the entire message flattened as a string.
When optional unixfrom is True, the envelope header is included in the
returned string. unixfrom defaults to False. Flattening the message may trigger
changes to the Message if defaults need to be filled in to
complete the transformation to a string (for example, MIME boundaries may
be generated or modified).
Note that this method is provided as a convenience and may not always
format the message the way you want. For example, by default it does
not do the mangling of lines that begin with From that is
required by the unix mbox format. For more flexibility, instantiate a
Generator instance and use its flatten()
method directly. For example:
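from io import StringIO
from email.generator import Generator

# msg is assumed to be an existing Message object
fp = StringIO()
g = Generator(fp, mangle_from_=True, maxheaderlen=60)
g.flatten(msg)
text = fp.getvalue()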
is_multipart() returns True if the message’s payload is a list of
sub-Message objects, otherwise it returns False. When is_multipart()
returns False, the payload should be a string object.
attach(payload) adds the given payload to the current payload, which must
be None or a list of Message objects before the call. After the call, the
payload will always be a list of Message objects. If you want to
set the payload to a scalar object (e.g. a string), use
set_payload() instead.
get_payload(i=None, decode=False) returns the current payload, which will
be a list of Message objects when is_multipart() is True, or a
string when is_multipart() is False. If the payload is a list
and you mutate the list object, you modify the message’s payload in place.
With optional argument i, get_payload() will return the i-th
element of the payload, counting from zero, if is_multipart() is
True. An IndexError will be raised if i is less than 0 or
greater than or equal to the number of items in the payload. If the
payload is a string (i.e. is_multipart() is False) and i is
given, a TypeError is raised.
Optional decode is a flag indicating whether the payload should be
decoded or not, according to the Content-Transfer-Encoding
header. When True and the message is not a multipart, the payload will
be decoded if this header’s value is quoted-printable or base64.
If some other encoding is used, or Content-Transfer-Encoding
header is missing, or if the payload has bogus base64 data, the payload is
returned as-is (undecoded). In all cases the returned value is binary
data. If the message is a multipart and the decode flag is True,
then None is returned.
When decode is False (the default) the body is returned as a string
without decoding the Content-Transfer-Encoding. However,
for a Content-Transfer-Encoding of 8bit, an attempt is made
to decode the original bytes using the charset specified by the
Content-Type header, using the replace error handler.
If no charset is specified, or if the charset given is not
recognized by the email package, the body is decoded using the default
ASCII charset.
set_payload(payload, charset=None) sets the entire message object’s
payload to payload. It is the client’s responsibility to ensure the payload
invariants. Optional charset sets the message’s default character set; see
set_charset() for details.
set_charset(charset) sets the character set of the payload to charset,
which can either be a Charset instance (see email.charset), a
string naming a character set, or None. If it is a string, it will
be converted to a Charset instance. If charset
is None, the charset parameter will be removed from the
Content-Type header (the message will not be otherwise
modified). Anything else will generate a TypeError.
If there is no existing MIME-Version header one will be
added. If there is no existing Content-Type header, one
will be added with a value of text/plain. Whether the
Content-Type header already exists or not, its charset
parameter will be set to charset.output_charset. If
charset.input_charset and charset.output_charset differ, the payload
will be re-encoded to the output_charset. If there is no existing
Content-Transfer-Encoding header, then the payload will be
transfer-encoded, if needed, using the specified
Charset, and a header with the appropriate value
will be added. If a Content-Transfer-Encoding header
already exists, the payload is assumed to already be correctly encoded
using that Content-Transfer-Encoding and is not modified.
get_charset() returns the Charset instance associated with the
message’s payload.
The following methods implement a mapping-like interface for accessing the
message’s RFC 2822 headers. Note that there are some semantic differences
between these methods and a normal mapping (i.e. dictionary) interface. For
example, in a dictionary there are no duplicate keys, but here there may be
duplicate message headers. Also, in dictionaries there is no guaranteed
order to the keys returned by keys(), but in a Message object,
headers are always returned in the order they appeared in the original
message, or were added to the message later. Any header deleted and then
re-added is always appended to the end of the header list.
These semantic differences are intentional and are biased toward maximal
convenience.
Note that in all cases, any envelope header present in the message is not
included in the mapping interface.
In a model generated from bytes, any header values that (in contravention of
the RFCs) contain non-ASCII bytes will, when retrieved through this
interface, be represented as Header objects with
a charset of unknown-8bit.
__contains__(name) returns true if the message object has a field named
name. Matching is done case-insensitively and name should not include the
trailing colon. Used for the in operator, e.g.:
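# myMessage is assumed to be an existing Message instance
if 'message-id' in myMessage:
    print('Message-ID:', myMessage['message-id'])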
__getitem__(name) returns the value of the named header field. name
should not include the colon field separator. If the header is missing,
None is returned; a KeyError is never raised.
Note that if the named field appears more than once in the message’s
headers, exactly which of those field values will be returned is
undefined. Use the get_all() method to get the values of all the
extant named headers.
__setitem__(name, val) adds a header to the message with field name name
and value val. The field is appended to the end of the message’s existing
fields.
Note that this does not overwrite or delete any existing header with the same
name. If you want to ensure that the new header is the only one present in the
message with field name name, delete the field first, e.g.:
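# msg is assumed to be an existing Message instance
del msg['subject']
msg['subject'] = 'Python roolz!'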
get(name, failobj=None) returns the value of the named header field.
This is identical to __getitem__() except that optional failobj is
returned if the named header is missing (defaults to None).
add_header(_name, _value, **_params) provides extended header setting.
This method is similar to __setitem__() except that additional header
parameters can be provided as keyword arguments. _name is the header field
to add and _value is the primary value for the header.
For each item in the keyword argument dictionary _params, the key is
taken as the parameter name, with underscores converted to dashes (since
dashes are illegal in Python identifiers). Normally, the parameter will
be added as key="value" unless the value is None, in which case
only the key will be added. If the value contains non-ASCII characters,
it can be specified as a three tuple in the format
(CHARSET,LANGUAGE,VALUE), where CHARSET is a string naming the
charset to be used to encode the value, LANGUAGE can usually be set
to None or the empty string (see RFC 2231 for other possibilities),
and VALUE is the string value containing non-ASCII code points. If
a three tuple is not passed and the value contains non-ASCII characters,
it is automatically encoded in RFC 2231 format using a CHARSET
of utf-8 and a LANGUAGE of None.
replace_header(_name, _value) replaces a header: the first header found
in the message that matches _name is replaced, retaining header order and
field name case. If no matching header was found, a KeyError is raised.
The get_content_type() method returns the message’s content type. The
returned string is coerced to lower case, in the form maintype/subtype.
If there was no
Content-Type header in the message the default type as given
by get_default_type() will be returned. Since according to
RFC 2045, messages always have a default type, get_content_type()
will always return a value.
RFC 2045 defines a message’s default type to be text/plain
unless it appears inside a multipart/digest container, in
which case it would be message/rfc822. If the
Content-Type header has an invalid type specification,
RFC 2045 mandates that the default type be text/plain.
The get_default_type() method returns the default content type. Most
messages have a default content type of text/plain, except for messages
that are subparts of
multipart/digest containers. Such subparts have a default
content type of message/rfc822.
set_default_type(ctype) sets the default content type. ctype should
either be text/plain or message/rfc822, although this is not
enforced. The default content type is not stored in the
Content-Type header.
The get_params() method returns the message’s Content-Type parameters, as a list.
The elements of the returned list are 2-tuples of key/value pairs, as
split on the '=' sign. The left hand side of the '=' is the key,
while the right hand side is the value. If there is no '=' sign in
the parameter the value is the empty string, otherwise the value is as
described in get_param() and is unquoted if optional unquote is
True (the default).
Optional failobj is the object to return if there is no
Content-Type header. Optional header is the header to
search instead of Content-Type.
The get_param() method returns the value of the Content-Type header’s
parameter param as a string. If the message has no Content-Type
header or if there is no such parameter, then failobj is returned
(defaults to None).
Optional header if given, specifies the message header to use instead of
Content-Type.
Parameter keys are always compared case insensitively. The return value
can either be a string, or a 3-tuple if the parameter was RFC 2231
encoded. When it’s a 3-tuple, the elements of the value are of the form
(CHARSET,LANGUAGE,VALUE). Note that both CHARSET and
LANGUAGE can be None, in which case you should consider VALUE
to be encoded in the us-ascii charset. You can usually ignore
LANGUAGE.
If your application doesn’t care whether the parameter was encoded as in
RFC 2231, you can collapse the parameter value by calling
email.utils.collapse_rfc2231_value(), passing in the return value
from get_param(). This will return a suitably decoded Unicode
string when the value is a tuple, or the original string unquoted if it
isn’t. For example:
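import email.utils

# msg is assumed to be an existing Message; 'foo' is a hypothetical parameter
rawparam = msg.get_param('foo')
param = email.utils.collapse_rfc2231_value(rawparam)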
The set_param() method sets a parameter in the Content-Type header. If
the parameter already exists in the header, its value will be replaced with
value. If the Content-Type header has not yet been defined
for this message, it will be set to text/plain and the new
parameter value will be appended as per RFC 2045.
Optional header specifies an alternative header to
Content-Type, and all parameters will be quoted as necessary
unless optional requote is False (the default is True).
If optional charset is specified, the parameter will be encoded
according to RFC 2231. Optional language specifies the RFC 2231
language, defaulting to the empty string. Both charset and language
should be strings.
The del_param() method removes the given parameter completely from the
Content-Type header. The header will be re-written in place without the
parameter or
its value. All values will be quoted as necessary unless requote is
False (the default is True). Optional header specifies an
alternative to Content-Type.
The set_type() method sets the main type and subtype for the
Content-Type header. type must be a string in the form
maintype/subtype, otherwise a ValueError is raised.
This method replaces the Content-Type header, keeping all
the parameters in place. If requote is False, this leaves the
existing header’s quoting as is, otherwise the parameters will be quoted
(the default).
An alternative header can be specified in the header argument. When the
Content-Type header is set a MIME-Version
header is also added.
The get_filename() method returns the value of the filename parameter
of the Content-Disposition header of the message. If the header
does not have a filename parameter, this method falls back to looking
for the name parameter on the Content-Type header. If
neither is found, or the header is missing, then failobj is returned.
The returned string will always be unquoted as per
email.utils.unquote().
The get_boundary() method returns the value of the boundary parameter
of the Content-Type header of the message, or failobj if either
the header is missing, or has no boundary parameter. The returned
string will always be unquoted as per email.utils.unquote().
Set the boundary parameter of the Content-Type header to
boundary. set_boundary() will always quote boundary if
necessary. A HeaderParseError is raised if the message object has
no Content-Type header.
Note that using this method is subtly different from deleting the old
Content-Type header and adding a new one with the new
boundary via add_header(), because set_boundary() preserves
the order of the Content-Type header in the list of
headers. However, it does not preserve any continuation lines which may
have been present in the original Content-Type header.
The get_content_charset() method returns the charset parameter of the
Content-Type header, coerced to lower case. If there is no Content-Type header, or if
that header has no charset parameter, failobj is returned.
Note that this method differs from get_charset() which returns the
Charset instance for the default encoding of the message body.
The get_charsets() method returns a list containing the character set
names in the message. If the message is a multipart, then the list will contain one element
for each subpart in the payload, otherwise, it will be a list of length 1.
Each item in the list will be a string which is the value of the
charset parameter in the Content-Type header for the
represented subpart. However, if the subpart has no
Content-Type header, no charset parameter, or is not of
the text main MIME type, then that item in the returned list
will be failobj.
The walk() method is an all-purpose generator which can be used to
iterate over all the parts and subparts of a message object tree, in
depth-first traversal order. You will typically use walk() as the
iterator in a for loop; each iteration returns the next subpart.
Here’s an example that prints the MIME type of every part of a multipart
message structure:
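# msg is assumed to be a parsed multipart Message
for part in msg.walk():
    print(part.get_content_type())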
The format of a MIME document allows for some text between the blank line
following the headers, and the first multipart boundary string. Normally,
this text is never visible in a MIME-aware mail reader because it falls
outside the standard MIME armor. However, when viewing the raw text of
the message, or when viewing the message in a non-MIME aware reader, this
text can become visible.
The preamble attribute contains this leading extra-armor text for MIME
documents. When the Parser discovers some text
after the headers but before the first boundary string, it assigns this
text to the message’s preamble attribute. When the
Generator is writing out the plain text
representation of a MIME message, and it finds the
message has a preamble attribute, it will write this text in the area
between the headers and the first boundary. See email.parser and
email.generator for details.
Note that if the message object has no preamble, the preamble attribute
will be None.
The epilogue attribute acts the same way as the preamble attribute,
except that it contains text that appears between the last boundary and
the end of the message.
You do not need to set the epilogue to the empty string in order for the
Generator to print a newline at the end of the file.
The defects attribute contains a list of all the problems found when
parsing this message. See email.errors for a detailed description
of the possible parsing defects.
Message object structures can be created in one of two ways: they can be created
from whole cloth by instantiating Message objects and
stringing them together via attach() and set_payload() calls, or they
can be created by parsing a flat text representation of the email message.
The email package provides a standard parser that understands most email
document structures, including MIME documents. You can pass the parser a string
or a file object, and the parser will return to you the root
Message instance of the object structure. For simple,
non-MIME messages the payload of this root object will likely be a string
containing the text of the message. For MIME messages, the root object will
return True from its is_multipart() method, and the subparts can be
accessed via the get_payload() and walk() methods.
There are actually two parser interfaces available for use, the classic
Parser API and the incremental FeedParser API. The classic
Parser API is fine if you have the entire text of the message in memory
as a string, or if the entire message lives in a file on the file system.
FeedParser is more appropriate for when you’re reading the message from
a stream which might block waiting for more input (e.g. reading an email message
from a socket). The FeedParser can consume and parse the message
incrementally, and only returns the root object when you close the parser [1].
Note that the parser can be extended in limited ways, and of course you can
implement your own parser completely from scratch. There is no magical
connection between the email package’s bundled parser and the
Message class, so your custom parser can create message
object trees any way it finds necessary.
The FeedParser, imported from the email.feedparser module,
provides an API that is conducive to incremental parsing of email messages, such
as would be necessary when reading the text of an email message from a source
that can block (e.g. a socket). The FeedParser can of course be used
to parse an email message fully contained in a string or a file, but the classic
Parser API may be more convenient for such use cases. The semantics
and results of the two parser APIs are identical.
The FeedParser’s API is simple; you create an instance, feed it a bunch
of text until there’s no more to feed it, then close the parser to retrieve the
root message object. The FeedParser is extremely accurate when parsing
standards-compliant messages, and it does a very good job of parsing
non-compliant messages, providing information about how a message was deemed
broken. It will populate a message object’s defects attribute with a list of
any problems it found in a message. See the email.errors module for the
list of defects that it can find.
class email.parser.FeedParser(_factory=email.message.Message)
Create a FeedParser instance. Optional _factory is a no-argument
callable that will be called whenever a new message object is needed. It
defaults to the email.message.Message class.
feed(data) feeds the FeedParser some more data. data should be a
string containing one or more lines. The lines can be partial and the
FeedParser will stitch such partial lines together properly. The lines
in the string can have any of the three common line endings: carriage
return, newline, or carriage return and newline (they can even be mixed).
close() completes the parsing of all previously fed data and returns the
root message object. It is undefined what happens if you feed more data to
a closed FeedParser.
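A short sketch of the incremental pattern (the chunk boundaries below are
arbitrary; in practice the data would arrive from a socket or similar
source):

from email.feedparser import FeedParser

parser = FeedParser()
parser.feed('Subject: test\r\n')       # header data may arrive a piece at a time
parser.feed('\r\nThis is the body.\r\n')
msg = parser.close()                   # returns the root Message object
print(msg['subject'])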
class email.parser.BytesFeedParser(_factory=email.message.Message)
Works exactly like FeedParser except that the input to the
feed() method must be bytes, not str.
The Parser class, imported from the email.parser module,
provides an API that can be used to parse a message when the complete contents
of the message are available in a string or file. The email.parser
module also provides a second class, called HeaderParser which can be
used if you’re only interested in the headers of the message.
HeaderParser can be much faster in these situations, since it does not
attempt to parse the message body, instead setting the payload to the raw body
as a string. HeaderParser has the same API as the Parser
class.
class email.parser.Parser(_class=email.message.Message, strict=None)
The constructor for the Parser class takes an optional argument
_class. This must be a callable factory (such as a function or a class), and
it is used whenever a sub-message object needs to be created. It defaults to
Message (see email.message). The factory will
be called without arguments.
The optional strict flag is ignored.
Deprecated since version 2.4: Because the Parser class is a backward compatible API wrapper
around the new-in-Python 2.4 FeedParser, all parsing is
effectively non-strict. You should simply stop passing a strict flag to
the Parser constructor.
parse(fp, headersonly=False) reads all the data from the file-like
object fp, parses the resulting text, and returns the root message object.
fp must support both the readline() and the read() methods on
file-like objects.
The text contained in fp must be formatted as a block of RFC 2822
style headers and header continuation lines, optionally preceded by an
envelope header. The header block is terminated either by the end of the
data or by a blank line. Following the header block is the body of the
message (which may contain MIME-encoded subparts).
Optional headersonly is as with the parse() method.
parsestr(text, headersonly=False) is similar to the parse() method,
except it takes a string object instead of a file-like object. Calling this
method on a string is exactly equivalent to wrapping text in a
StringIO instance first and calling parse().
Optional headersonly is a flag specifying whether to stop parsing after
reading the headers or not. The default is False, meaning it parses
the entire contents of the file.
class email.parser.BytesParser(_class=email.message.Message, strict=None)
This class is exactly parallel to Parser, but handles bytes input.
The _class and strict arguments are interpreted in the same way as for
the Parser constructor. strict is supported only to make porting
code easier; it is deprecated.
parse(fp, headersonly=False) reads all the data from the binary
file-like object fp, parses the resulting bytes, and returns the message
object. fp must support both the readline() and the read()
methods on file-like objects.
The bytes contained in fp must be formatted as a block of RFC 2822
style headers and header continuation lines, optionally preceded by an
envelope header. The header block is terminated either by the end of the
data or by a blank line. Following the header block is the body of the
message (which may contain MIME-encoded subparts, including subparts
with a Content-Transfer-Encoding of 8bit).
Optional headersonly is a flag specifying whether to stop parsing after
reading the headers or not. The default is False, meaning it parses
the entire contents of the file.
parsebytes(text, headersonly=False) is similar to the parse() method,
except it takes a byte string object instead of a file-like object. Calling
this method on a byte string is
exactly equivalent to wrapping text in a BytesIO instance
first and calling parse().
Optional headersonly is as with the parse() method.
New in version 3.2.
Since creating a message object structure from a string or a file object is such
a common task, four functions are provided as a convenience. They are available
in the top-level email package namespace.
email.message_from_string(s) returns a message object structure from a
string. This is exactly equivalent to Parser().parsestr(s). Optional
_class and strict are interpreted as with the Parser class constructor.
email.message_from_bytes(s) returns a message object structure from a
byte string. This is exactly equivalent to BytesParser().parsebytes(s).
Optional _class and strict are interpreted as with the Parser class
constructor.
email.message_from_file(fp) returns a message object structure tree from
an open file object. This is exactly equivalent to Parser().parse(fp).
Optional _class and strict are interpreted as with the Parser class
constructor.
email.message_from_binary_file(fp) returns a message object structure
tree from an open binary file object. This is exactly equivalent to
BytesParser().parse(fp). Optional _class and strict are interpreted as
with the Parser class constructor.
New in version 3.2.
Here’s an example of how you might use this at an interactive Python prompt:
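>>> import email
>>> msg = email.message_from_string(myString)   # myString holds the message text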
Most non-multipart type messages are parsed as a single message
object with a string payload. These objects will return False for
is_multipart(). Their get_payload() method will return a string
object.
All multipart type messages will be parsed as a container message
object with a list of sub-message objects for their payload. The outer
container message will return True for is_multipart() and their
get_payload() method will return the list of Message
subparts.
Most messages with a content type of message/* (e.g.
message/delivery-status and message/rfc822) will also be
parsed as a container object containing a list payload of length 1. Their
is_multipart() method will return True. The single element in the
list payload will be a sub-message object.
Some non-standards compliant messages may not be internally consistent about
their multipart-edness. Such messages may have a
Content-Type header of type multipart, but their
is_multipart() method may return False. If such messages were parsed
with the FeedParser, they will have an instance of the
MultipartInvariantViolationDefect class in their defects attribute
list. See email.errors for details.
As of email package version 3.0, introduced in Python 2.4, the classic
Parser was re-implemented in terms of the FeedParser, so the
semantics and results are identical between the two parsers.
One of the most common tasks is to generate the flat text of the email message
represented by a message object structure. You will need to do this if you want
to send your message via the smtplib module or the nntplib module,
or print the message on the console. Taking a message object structure and
producing a flat text document is the job of the Generator class.
Again, as with the email.parser module, you aren’t limited to the
functionality of the bundled generator; you could write one from scratch
yourself. However the bundled generator knows how to generate most email in a
standards-compliant way, should handle MIME and non-MIME email messages just
fine, and is designed so that the transformation from flat text, to a message
structure via the Parser class, and back to flat text,
is idempotent (the input is identical to the output). On the other hand, using
the Generator on a Message constructed by a program may
result in changes to the Message object as defaults are
filled in.
Bytes output can be generated using the BytesGenerator class.
If the message object structure contains non-ASCII bytes, this generator’s
flatten() method will emit the original bytes. Parsing a
binary message and then flattening it with BytesGenerator should be
idempotent for standards compliant messages.
class email.generator.Generator(outfp, mangle_from_=True, maxheaderlen=78)
The constructor for the Generator class takes a file-like object
called outfp for an argument. outfp must support the write() method
and be usable as the output file for the print() function.
Optional mangle_from_ is a flag that, when True, puts a > character in
front of any line in the body that starts exactly as From, i.e. From
followed by a space at the beginning of the line. This is the only guaranteed
portable way to avoid having such lines be mistaken for a Unix mailbox format
envelope header separator (see WHY THE CONTENT-LENGTH FORMAT IS BAD for details). mangle_from_
defaults to True, but you might want to set this to False if you are not
writing Unix mailbox format files.
Optional maxheaderlen specifies the longest length for a non-continued header.
When a header line is longer than maxheaderlen (in characters, with tabs
expanded to 8 spaces), the header will be split as defined in the
Header class. Set to zero to disable header wrapping.
The default is 78, as recommended (but not required) by RFC 2822.
flatten(msg, unixfrom=False, linesep='\n') prints the textual
representation of the message object structure rooted at msg to the output
file specified when the Generator instance
was created. Subparts are visited depth-first and the resulting text will
be properly MIME encoded.
Optional unixfrom is a flag that forces the printing of the envelope
header delimiter before the first RFC 2822 header of the root message
object. If the root object has no envelope header, a standard one is
crafted. By default, this is set to False to inhibit the printing of
the envelope delimiter.
Note that for subparts, no envelope header is ever printed.
Optional linesep specifies the line separator character used to
terminate lines in the output. It defaults to \n because that is
the most useful value for Python application code (other library packages
expect \n separated lines). linesep='\r\n' can be used to
generate output with RFC-compliant line separators.
Messages parsed with a Bytes parser that have a
Content-Transfer-Encoding of 8bit will be converted to use a 7bit
Content-Transfer-Encoding. Non-ASCII bytes in the headers
will be RFC 2047 encoded with a charset of unknown-8bit.
Changed in version 3.2: Added support for re-encoding 8bit message bodies, and the linesep
argument.
write(s) writes the string s to the underlying file object, i.e. the
outfp passed to Generator’s constructor. This provides just enough
file-like API
for Generator instances to be used in the print() function.
As a convenience, see the Message methods
as_string() and str(aMessage), a.k.a.
__str__(), which simplify the generation of a
formatted string representation of a message object. For more detail, see
email.message.
class email.generator.BytesGenerator(outfp, mangle_from_=True, maxheaderlen=78)
The constructor for the BytesGenerator class takes a binary
file-like object called outfp for an argument. outfp must
support a write() method that accepts binary data.
Optional mangle_from_ is a flag that, when True, puts a >
character in front of any line in the body that starts exactly as From,
i.e. From followed by a space at the beginning of the line. This is the
only guaranteed portable way to avoid having such lines be mistaken for a
Unix mailbox format envelope header separator (see WHY THE CONTENT-LENGTH
FORMAT IS BAD for details).
mangle_from_ defaults to True, but you might want to set this to
False if you are not writing Unix mailbox format files.
Optional maxheaderlen specifies the longest length for a non-continued
header. When a header line is longer than maxheaderlen (in characters,
with tabs expanded to 8 spaces), the header will be split as defined in the
Header class. Set to zero to disable header
wrapping. The default is 78, as recommended (but not required) by
RFC 2822.
flatten(msg, unixfrom=False, linesep='\n') prints the textual
representation of the message object structure rooted at msg to the output
file specified when the BytesGenerator instance was created. Subparts are
visited depth-first and the resulting
text will be properly MIME encoded. If the input that created the msg
contained bytes with the high bit set and those bytes have not been
modified, they will be copied faithfully to the output, even if doing so
is not strictly RFC compliant. (To produce strictly RFC compliant
output, use the Generator class.)
Messages parsed with a Bytes parser that have a
Content-Transfer-Encoding of 8bit will be reconstructed
as 8bit if they have not been modified.
Optional unixfrom is a flag that forces the printing of the envelope
header delimiter before the first RFC 2822 header of the root message
object. If the root object has no envelope header, a standard one is
crafted. By default, this is set to False to inhibit the printing of
the envelope delimiter.
Note that for subparts, no envelope header is ever printed.
Optional linesep specifies the line separator character used to
terminate lines in the output. It defaults to \n because that is
the most useful value for Python application code (other library packages
expect \n separated lines). linesep='\r\n' can be used to
generate output with RFC-compliant line separators.
write(s) writes the string s to the underlying file object. s is encoded
using the ASCII codec and written to the outfp passed to the
BytesGenerator’s constructor. This
provides just enough file-like API for BytesGenerator instances
to be used in the print() function.
New in version 3.2.
The email.generator module also provides a derived class, called
DecodedGenerator which is like the Generator base class,
except that non-text parts are substituted with a format string
representing the part.
class email.generator.DecodedGenerator(outfp, mangle_from_=True, maxheaderlen=78, fmt=None)
This class, derived from Generator, walks through all the subparts of a
message. If the subpart is of main type text, then it prints the
decoded payload of the subpart. Optional mangle_from_ and maxheaderlen are
as with the Generator base class.
If the subpart is not of main type text, optional fmt is a format
string that is used instead of the message payload. fmt is expanded with
the following keywords, in %(keyword)s format:
type – Full MIME type of the non-text part
maintype – Main MIME type of the non-text part
subtype – Sub-MIME type of the non-text part
filename – Filename of the non-text part
description – Description associated with the non-text part
encoding – Content transfer encoding of the non-text part
The default value for fmt is None, meaning
[Non-text (%(type)s) part of message omitted, filename %(filename)s]
email.mime: Creating email and MIME objects from scratch
Ordinarily, you get a message object structure by passing a file or some text to
a parser, which parses the text and returns the root message object. However
you can also build a complete message structure from scratch, or even individual
Message objects by hand. In fact, you can also take an
existing structure and add new Message objects, move them
around, etc. This makes a very convenient interface for slicing-and-dicing MIME
messages.
You can create a new object structure by creating Message
instances, adding attachments and all the appropriate headers manually. For MIME
messages though, the email package provides some convenient subclasses to
make things easier.
Here are the classes:
class email.mime.base.MIMEBase(_maintype, _subtype, **_params)
Module: email.mime.base
This is the base class for all the MIME-specific subclasses of
Message. Ordinarily you won’t create instances
specifically of MIMEBase, although you could. MIMEBase
is provided primarily as a convenient base class for more specific
MIME-aware subclasses.
_maintype is the Content-Type major type (e.g. text
or image), and _subtype is the Content-Type minor
type (e.g. plain or gif). _params is a parameter
key/value dictionary and is passed directly to Message.add_header().
The MIMEBase class always adds a Content-Type header
(based on _maintype, _subtype, and _params), and a
MIME-Version header (always set to 1.0).
class email.mime.nonmultipart.MIMENonMultipart
Module: email.mime.nonmultipart
A subclass of MIMEBase, this is an intermediate base
class for MIME messages that are not multipart. The primary
purpose of this class is to prevent the use of the attach() method,
which only makes sense for multipart messages. If attach()
is called, a MultipartConversionError exception is raised.
class email.mime.multipart.MIMEMultipart(_subtype='mixed', boundary=None, _subparts=None, **_params)
Module: email.mime.multipart
A subclass of MIMEBase, this is an intermediate base
class for MIME messages that are multipart. Optional _subtype
defaults to mixed, but can be used to specify the subtype of the
message. A Content-Type header of multipart/_subtype
will be added to the message object. A MIME-Version header will
also be added.
Optional boundary is the multipart boundary string. When None (the
default), the boundary is calculated when needed (for example, when the
message is serialized).
_subparts is a sequence of initial subparts for the payload. It must be
possible to convert this sequence to a list. You can always attach new subparts
to the message by using the Message.attach() method.
Additional parameters for the Content-Type header are taken from
the keyword arguments, or passed into the _params argument, which is a keyword
dictionary.
class email.mime.application.MIMEApplication(_data, _subtype='octet-stream', _encoder=email.encoders.encode_base64, **_params)
Module: email.mime.application
A subclass of MIMENonMultipart, the
MIMEApplication class is used to represent MIME message objects of
major type application. _data is a string containing the raw
byte data. Optional _subtype specifies the MIME subtype and defaults to
octet-stream.
Optional _encoder is a callable (i.e. function) which will perform the actual
encoding of the data for transport. This callable takes one argument, which is
the MIMEApplication instance. It should use get_payload() and
set_payload() to change the payload to encoded form. It should also add
any Content-Transfer-Encoding or other headers to the message
object as necessary. The default encoding is base64. See the
email.encoders module for a list of the built-in encoders.
_params are passed straight through to the base class constructor.
class email.mime.audio.MIMEAudio(_audiodata, _subtype=None, _encoder=email.encoders.encode_base64, **_params)
Module: email.mime.audio
A subclass of MIMENonMultipart, the
MIMEAudio class is used to create MIME message objects of major type
audio. _audiodata is a string containing the raw audio data. If
this data can be decoded by the standard Python module sndhdr, then the
subtype will be automatically included in the Content-Type header.
Otherwise you can explicitly specify the audio subtype via the _subtype
parameter. If the minor type could not be guessed and _subtype was not given,
then TypeError is raised.
Optional _encoder is a callable (i.e. function) which will perform the actual
encoding of the audio data for transport. This callable takes one argument,
which is the MIMEAudio instance. It should use get_payload() and
set_payload() to change the payload to encoded form. It should also add
any Content-Transfer-Encoding or other headers to the message
object as necessary. The default encoding is base64. See the
email.encoders module for a list of the built-in encoders.
_params are passed straight through to the base class constructor.
class email.mime.image.MIMEImage(_imagedata, _subtype=None, _encoder=email.encoders.encode_base64, **_params)
Module: email.mime.image
A subclass of MIMENonMultipart, the
MIMEImage class is used to create MIME message objects of major type
image. _imagedata is a string containing the raw image data. If
this data can be decoded by the standard Python module imghdr, then the
subtype will be automatically included in the Content-Type header.
Otherwise you can explicitly specify the image subtype via the _subtype
parameter. If the minor type could not be guessed and _subtype was not given,
then TypeError is raised.
Optional _encoder is a callable (i.e. function) which will perform the actual
encoding of the image data for transport. This callable takes one argument,
which is the MIMEImage instance. It should use get_payload() and
set_payload() to change the payload to encoded form. It should also add
any Content-Transfer-Encoding or other headers to the message
object as necessary. The default encoding is base64. See the
email.encoders module for a list of the built-in encoders.
_params are passed straight through to the MIMEBase
constructor.
class email.mime.message.MIMEMessage(_msg, _subtype='rfc822')
Module: email.mime.message
A subclass of MIMENonMultipart, the
MIMEMessage class is used to create MIME objects of main type
message. _msg is used as the payload, and must be an instance
of class Message (or a subclass thereof), otherwise
a TypeError is raised.
Optional _subtype sets the subtype of the message; it defaults to
rfc822.
class email.mime.text.MIMEText(_text, _subtype='plain', _charset='us-ascii')
Module: email.mime.text
A subclass of MIMENonMultipart, the
MIMEText class is used to create MIME objects of major type
text. _text is the string for the payload. _subtype is the
minor type and defaults to plain. _charset is the character
set of the text and is passed as a parameter to the
MIMENonMultipart constructor; it defaults
to us-ascii. No guessing or encoding is performed on the text data.
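For instance, a simple multipart message can be assembled from these
classes as in the following sketch (the addresses and text are
placeholders):

from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

msg = MIMEMultipart()
msg['Subject'] = 'Status report'
msg['From'] = 'author@example.com'
msg['To'] = 'recipient@example.com'
msg.attach(MIMEText('All systems nominal.', 'plain'))
print(msg.as_string())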
RFC 2822 is the base standard that describes the format of email messages.
It derives from the older RFC 822 standard which came into widespread use at
a time when most email was composed of ASCII characters only. RFC 2822 is a
specification written assuming email contains only 7-bit ASCII characters.
Of course, as email has been deployed worldwide, it has become
internationalized, such that language specific character sets can now be used in
email messages. The base standard still requires email messages to be
transferred using only 7-bit ASCII characters, so a slew of RFCs have been
written describing how to encode email containing non-ASCII characters into
RFC 2822-compliant format. These RFCs include RFC 2045, RFC 2046,
RFC 2047, and RFC 2231. The email package supports these standards
in its email.header and email.charset modules.
If you want to include non-ASCII characters in your email headers, say in the
Subject or To fields, you should use the
Header class and assign the field in the Message
object to an instance of Header instead of using a string for the header
value. Import the Header class from the email.header module.
For example:
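>>> from email.message import Message
>>> from email.header import Header
>>> msg = Message()
>>> h = Header('p\xf6stal', 'iso-8859-1')
>>> msg['Subject'] = h
>>> print(msg.as_string())
Subject: =?iso-8859-1?q?p=F6stal?=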
Notice how we wanted the Subject field to contain a non-ASCII
character. We did this by creating a Header instance and passing in
the character set that the byte string was encoded in. When the subsequent
Message instance was flattened, the Subject
field was properly RFC 2047 encoded. MIME-aware mail readers would show this
header using the embedded ISO-8859-1 character.
class email.header.Header(s=None, charset=None, maxlinelen=None, header_name=None, continuation_ws=' ', errors='strict')
Create a MIME-compliant header that can contain strings in different character
sets.
Optional s is the initial header value. If None (the default), the
initial header value is not set. You can later append to the header with
append() method calls. s may be an instance of bytes or
str, but see the append() documentation for semantics.
Optional charset serves two purposes: it has the same meaning as the charset
argument to the append() method. It also sets the default character set
for all subsequent append() calls that omit the charset argument. If
charset is not provided in the constructor (the default), the us-ascii
character set is used both as s’s initial charset and as the default for
subsequent append() calls.
The maximum line length can be specified explicitly via maxlinelen. For
splitting the first line to a shorter value (to account for the field header
which isn’t included in s, e.g. Subject) pass in the name of the
field in header_name. The default maxlinelen is 76, and the default value
for header_name is None, meaning it is not taken into account for the
first line of a long, split header.
Optional continuation_ws must be RFC 2822-compliant folding
whitespace, and is usually either a space or a hard tab character. This
character will be prepended to continuation lines. continuation_ws
defaults to a single space character.
Optional errors is passed straight through to the append() method.
The append(s, charset=None, errors='strict') method appends the string
s to the header. Optional charset, if given, should be a
Charset instance (see email.charset) or the name of a character
set, which will be converted to a Charset instance. A value of
None (the default) means that the charset given in the constructor is
used.
s may be an instance of bytes or str. If it is an
instance of bytes, then charset is the encoding of that byte
string, and a UnicodeError will be raised if the string cannot be
decoded with that character set.
If s is an instance of str, then charset is a hint specifying
the character set of the characters in the string.
In either case, when producing an RFC 2822-compliant header using
RFC 2047 rules, the string will be encoded using the output codec of
the charset. If the string cannot be encoded using the output codec, a
UnicodeError will be raised.
Optional errors is passed as the errors argument to the decode call
if s is a byte string.
The encode() method encodes a message header into an RFC-compliant
format, possibly wrapping long lines and encapsulating non-ASCII parts in
base64 or quoted-printable encodings.
Optional splitchars is a string containing characters which should be
given extra weight by the splitting algorithm during normal header
wrapping. This is in very rough support of RFC 2822’s ‘higher level
syntactic breaks’: split points preceded by a splitchar are preferred
during line splitting, with the characters preferred in the order in
which they appear in the string. Space and tab may be included in the
string to indicate whether preference should be given to one over the
other as a split point when other split chars do not appear in the line
being split. Splitchars does not affect RFC 2047 encoded lines.
maxlinelen, if given, overrides the instance’s value for the maximum
line length.
linesep specifies the characters used to separate the lines of the
folded header. It defaults to the most useful value for Python
application code (\n), but \r\n can be specified in order
to produce headers with RFC-compliant line separators.
Changed in version 3.2: Added the linesep argument.
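As a sketch, folding a long header for the wire might look like:

from email.header import Header

h = Header('a rather long subject line that will certainly need folding',
           header_name='Subject')
folded = h.encode(maxlinelen=40, linesep='\r\n')   # RFC-compliant line endings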
The Header class also provides a number of methods to support
standard operators and built-in functions.
Returns an approximation of the Header as a string, using an
unlimited line length. All pieces are converted to unicode using the
specified encoding and joined together appropriately. Any pieces with a
charset of unknown-8bit are decoded as ASCII using the replace
error handler.
Changed in version 3.2: Added handling for the unknown-8bit charset.
Decode a message header value without converting the character set. The header
value is in header.
This function returns a list of (decoded_string, charset) pairs containing
each of the decoded parts of the header. charset is None for non-encoded
parts of the header, otherwise a lower case string containing the name of the
character set specified in the encoded string.
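For example:

>>> from email.header import decode_header
>>> decode_header('=?iso-8859-1?q?p=F6stal?=')
[(b'p\xf6stal', 'iso-8859-1')]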
decode_header() takes a header value string and returns a sequence of
pairs of the format (decoded_string, charset) where charset is the name of
the character set.
This function takes one such sequence of pairs and returns a
Header instance. Optional maxlinelen, header_name, and
continuation_ws are as in the Header constructor.
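A round trip through decode_header() and make_header() (a sketch):

>>> from email.header import decode_header, make_header
>>> h = make_header(decode_header('=?iso-8859-1?q?p=F6stal?='))
>>> str(h)
'pöstal'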
This module provides a class Charset for representing character sets
and character set conversions in email messages, as well as a character set
registry and several convenience methods for manipulating this registry.
Instances of Charset are used in several other modules within the
email package.
class email.charset.Charset(input_charset=DEFAULT_CHARSET)
Map character sets to their email properties.
This class provides information about the requirements imposed on email for a
specific character set. It also provides convenience routines for converting
between character sets, given the availability of the applicable codecs. Given
a character set, it will do its best to provide information on how to use that
character set in an email message in an RFC-compliant way.
Certain character sets must be encoded with quoted-printable or base64 when used
in email headers or bodies. Certain character sets must be converted outright,
and are not allowed in email.
Optional input_charset is as described below; it is always coerced to lower
case. After being alias normalized it is also used as a lookup into the
registry of character sets to find out the header encoding, body encoding, and
output conversion codec to be used for the character set. For example, if
input_charset is iso-8859-1, then headers and bodies will be encoded using
quoted-printable and no output conversion codec is necessary. If
input_charset is euc-jp, then headers will be encoded with base64, bodies
will not be encoded, but output text will be converted from the euc-jp
character set to the iso-2022-jp character set.
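A sketch of looking up a character set’s email properties:

>>> from email import charset
>>> c = charset.Charset('latin_1')     # a common alias
>>> str(c)                             # normalized to the official email name
'iso-8859-1'
>>> c.header_encoding == charset.QP    # headers use quoted-printable
True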
Charset instances have the following data attributes:
The initial character set specified. Common aliases are converted to
their official email names (e.g. latin_1 is converted to
iso-8859-1). Defaults to 7-bit us-ascii.
If the character set must be encoded before it can be used in an email
header, this attribute will be set to Charset.QP (for
quoted-printable), Charset.BASE64 (for base64 encoding), or
Charset.SHORTEST for the shortest of QP or BASE64 encoding. Otherwise,
it will be None.
Same as header_encoding, but describes the encoding for the mail
message’s body, which indeed may be different than the header encoding.
Charset.SHORTEST is not allowed for body_encoding.
Some character sets must be converted before they can be used in email
headers or bodies. If the input_charset is one of them, this attribute
will contain the name of the character set output will be converted to.
Otherwise, it will be None.
The name of the Python codec used to convert Unicode to the
output_charset. If no conversion codec is necessary, this attribute
will have the same value as the input_codec.
Charset instances also have the following methods:
Return the content transfer encoding used for body encoding.
This is either the string quoted-printable or base64 depending on
the encoding used, or it is a function, in which case you should call the
function with a single argument, the Message object being encoded. The
function should then set the Content-Transfer-Encoding
header itself to whatever is appropriate.
Returns the string quoted-printable if body_encoding is QP,
returns the string base64 if body_encoding is BASE64, and
returns the string 7bit otherwise.
Header-encode a string by converting it first to bytes.
This is similar to header_encode() except that the string is fit
into maximum line lengths as given by the argument maxlengths, which
must be an iterator: each element returned from this iterator will provide
the next maximum line length.
charset is the input character set, and must be the canonical name of a
character set.
Optional header_enc and body_enc are either Charset.QP for
quoted-printable, Charset.BASE64 for base64 encoding,
Charset.SHORTEST for the shortest of quoted-printable or base64 encoding,
or None for no encoding. SHORTEST is only valid for
header_enc. The default is None for no encoding.
Optional output_charset is the character set that the output should be in.
Conversions will proceed from input charset, to Unicode, to the output charset
when the method Charset.convert() is called. The default is to output in
the same character set as the input.
Both input_charset and output_charset must have Unicode codec entries in the
module’s character set-to-codec mapping; use add_codec() to add codecs the
module does not know about. See the codecs module’s documentation for
more information.
The global character set registry is kept in the module global dictionary
CHARSETS.
Add a codec that maps characters in the given character set to and from Unicode.
charset is the canonical name of a character set. codecname is the name of a
Python codec, as appropriate for the second argument to the str’s
decode() method.
When creating Message objects from scratch, you often
need to encode the payloads for transport through compliant mail servers. This
is especially true for image/* and text/* type messages
containing binary data.
The email package provides some convenient encodings in its
encoders module. These encoders are actually used by the
MIMEAudio and MIMEImage
class constructors to provide default encodings. All encoder functions take
exactly one argument, the message object to encode. They usually extract the
payload, encode it, and reset the payload to this newly encoded value. They
should also set the Content-Transfer-Encoding header as appropriate.
Encodes the payload into quoted-printable form and sets the
Content-Transfer-Encoding header to quoted-printable [1].
This is a good encoding to use when most of your payload is normal printable
data, but contains a few unprintable characters.
Encodes the payload into base64 form and sets the
Content-Transfer-Encoding header to base64. This is a good
encoding to use when most of your payload is unprintable data since it is a more
compact form than quoted-printable. The drawback of base64 encoding is that it
renders the text non-human readable.
This doesn’t actually modify the message’s payload, but it does set the
Content-Transfer-Encoding header to either 7bit or 8bit as
appropriate, based on the payload data.
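For instance, a sketch of applying the base64 encoder by hand:

from email import encoders
from email.mime.base import MIMEBase

part = MIMEBase('application', 'octet-stream')
part.set_payload(b'\x00\x01 some binary bytes')
encoders.encode_base64(part)   # payload is replaced by its base64 text form
# part['Content-Transfer-Encoding'] is now 'base64'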
This is the base class for all exceptions that the email package can
raise. It is derived from the standard Exception class and defines no
additional methods.
Raised under some error conditions when parsing the RFC 2822 headers of a
message, this class is derived from MessageParseError. It can be raised
from the Parser.parse() or Parser.parsestr() methods.
Situations where it can be raised include finding an envelope header after the
first RFC 2822 header of the message, finding a continuation line before the
first RFC 2822 header is found, or finding a line in the headers which is
neither a header nor a continuation line.
Raised under some error conditions when parsing the RFC 2822 headers of a
message, this class is derived from MessageParseError. It can be raised
from the Parser.parse() or Parser.parsestr() methods.
Situations where it can be raised include not being able to find the starting or
terminating boundary in a multipart/* message when strict parsing
is used.
Raised when a payload is added to a Message object using
add_payload(), but the payload is already a scalar and the message’s
Content-Type main type is not either multipart or
missing. MultipartConversionError multiply inherits from
MessageError and the built-in TypeError.
Since Message.add_payload() is deprecated, this exception is rarely raised
in practice. However the exception may also be raised if the attach()
method is called on an instance of a class derived from
MIMENonMultipart (e.g.
MIMEImage).
Here’s the list of the defects that the FeedParser
can find while parsing messages. Note that the defects are added to the message
where the problem was found, so for example, if a message nested inside a
multipart/alternative had a malformed header, that nested message
object would have a defect, but the containing messages would not.
All defect classes are subclassed from email.errors.MessageDefect, but
this class is not an exception!
NoBoundaryInMultipartDefect – A message claimed to be a multipart,
but had no boundary parameter.
StartBoundaryNotFoundDefect – The start boundary claimed in the
Content-Type header was never found.
FirstHeaderLineIsContinuationDefect – The message had a continuation
line as its first header line.
MisplacedEnvelopeHeaderDefect – A “Unix From” header was found in the
middle of a header block.
MalformedHeaderDefect – A header was found that was missing a colon,
or was otherwise malformed.
MultipartInvariantViolationDefect – A message claimed to be a
multipart, but no subparts were found. Note that when a message has
this defect, its is_multipart() method may return False even though its
content type claims to be multipart.
Return a new string which is an unquoted version of str. If str ends and
begins with double quotes, they are stripped off. Likewise if str ends and
begins with angle brackets, they are stripped off.
Parse address – which should be the value of some address-containing field such
as To or Cc – into its constituent realname and
email address parts. Returns a tuple of that information, unless the parse
fails, in which case a 2-tuple of ('', '') is returned.
The inverse of parseaddr(), this takes a 2-tuple of the form (realname, email_address) and returns the string value suitable for a To or
Cc header. If the first element of pair is false, then the
second element is returned unmodified.
This method returns a list of 2-tuples of the form returned by parseaddr().
fieldvalues is a sequence of header field values as might be returned by
Message.get_all(). Here’s a simple example that gets all the recipients
of a message:
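# msg is assumed to be an existing email.message.Message instance
from email.utils import getaddresses

tos = msg.get_all('to', [])
ccs = msg.get_all('cc', [])
resent_tos = msg.get_all('resent-to', [])
resent_ccs = msg.get_all('resent-cc', [])
all_recipients = getaddresses(tos + ccs + resent_tos + resent_ccs)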
Attempts to parse a date according to the rules in RFC 2822. However, some
mailers don’t follow that format as specified, so parsedate() tries to
guess correctly in such cases. date is a string containing an RFC 2822
date, such as "Mon,20Nov199519:12:08-0500". If it succeeds in parsing
the date, parsedate() returns a 9-tuple that can be passed directly to
time.mktime(); otherwise None will be returned. Note that indexes 6,
7, and 8 of the result tuple are not usable.
Performs the same function as parsedate(), but returns either None or
a 10-tuple; the first 9 elements make up a tuple that can be passed directly to
time.mktime(), and the tenth is the offset of the date’s timezone from UTC
(which is the official term for Greenwich Mean Time) [1]. If the input string
has no timezone, the last element of the tuple returned is None. Note that
indexes 6, 7, and 8 of the result tuple are not usable.
Turn a 10-tuple as returned by parsedate_tz() into a UTC timestamp. If
the timezone item in the tuple is None, assume local time. Minor
deficiency: mktime_tz() interprets the first 8 elements of tuple as a
local time and then compensates for the timezone difference. This may yield a
slight error around changes in daylight savings time, though not worth worrying
about for common use.
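For example (a sketch; the timestamp shown is the computed UTC value):

>>> from email.utils import parsedate_tz, mktime_tz
>>> t = parsedate_tz('Mon, 20 Nov 1995 19:12:08 -0500')
>>> t
(1995, 11, 20, 19, 12, 8, 0, 1, -1, -18000)
>>> mktime_tz(t)
816912728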
Optional timeval if given is a floating point time value as accepted by
time.gmtime() and time.localtime(), otherwise the current time is
used.
Optional localtime is a flag that when True, interprets timeval, and
returns a date relative to the local timezone instead of UTC, properly taking
daylight savings time into account. The default is False meaning UTC is
used.
Optional usegmt is a flag that when True, outputs a date string with the
timezone as an ascii string GMT, rather than a numeric -0000. This is
needed for some protocols (such as HTTP). This only applies when localtime is
False. The default is False.
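For example, formatting the epoch gives a deterministic result:

>>> from email.utils import formatdate
>>> formatdate(0, usegmt=True)
'Thu, 01 Jan 1970 00:00:00 GMT'
>>> formatdate(0)
'Thu, 01 Jan 1970 00:00:00 -0000'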
Returns a string suitable for an RFC 2822-compliant
Message-ID header. Optional idstring if given, is a string
used to strengthen the uniqueness of the message id. Optional domain if
given provides the portion of the msgid after the ‘@’. The default is the
local hostname. It is not normally necessary to override this default, but
may be useful in certain cases, such as constructing a distributed system that
uses a consistent domain name across multiple hosts.
Encode the string s according to RFC 2231. Optional charset and
language, if given, are the character set name and language name to use. If
neither is given, s is returned as-is. If charset is given but language
is not, the string is encoded using the empty string for language.
When a header parameter is encoded in RFC 2231 format,
Message.get_param() may return a 3-tuple containing the character set,
language, and value. collapse_rfc2231_value() turns this into a unicode
string. Optional errors is passed to the errors argument of str’s
encode() method; it defaults to 'replace'. Optional
fallback_charset specifies the character set to use if the one in the
RFC 2231 header is not known by Python; it defaults to 'us-ascii'.
For convenience, if the value passed to collapse_rfc2231_value() is not
a tuple, it should be a string and it is returned unquoted.
Note that the sign of the timezone offset is the opposite of the sign of the
time.timezone variable for the same timezone; the latter variable follows
the POSIX standard while this module follows RFC 2822.
Iterating over a message object tree is fairly easy with the
Message.walk() method. The email.iterators module provides some
useful higher level iterations over message object trees.
This iterates over all the payloads in all the subparts of msg, returning the
string payloads line-by-line. It skips over all the subpart headers, and it
skips over any subpart with a payload that isn’t a Python string. This is
somewhat equivalent to reading the flat text representation of the message from
a file using readline(), skipping over all the intervening headers.
Optional decode is passed through to Message.get_payload().
This iterates over all the subparts of msg, returning only those subparts that
match the MIME type specified by maintype and subtype.
Note that subtype is optional; if omitted, then subpart MIME type matching is
done only with the main type. maintype is optional too; it defaults to
text.
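A sketch of both iterators, assuming msg is an existing message object:

from email.iterators import body_line_iterator, typed_subpart_iterator

# msg is an existing email.message.Message instance
for line in body_line_iterator(msg):
    pass                                   # each flat body line in turn

for part in typed_subpart_iterator(msg, 'text', 'plain'):
    print(part.get_payload())              # only the text/plain subparts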
Optional fp is a file-like object to print the output to. It must be
suitable for Python’s print() function. level is used internally.
include_default, if true, prints the default type as well.
Here are a few examples of how to use the email package to read, write,
and send simple email messages, as well as more complex MIME messages.
First, let’s see how to create and send a simple text message:
# Import smtplib for the actual sending function
import smtplib

# Import the email modules we'll need
from email.mime.text import MIMEText

# Open a plain text file for reading.  For this example, assume that
# the text file contains only ASCII characters.
fp = open(textfile)
# Create a text/plain message
msg = MIMEText(fp.read())
fp.close()

# me == the sender's email address
# you == the recipient's email address
msg['Subject'] = 'The contents of %s' % textfile
msg['From'] = me
msg['To'] = you

# Send the message via our own SMTP server.
s = smtplib.SMTP('localhost')
s.send_message(msg)
s.quit()
And parsing RFC822 headers can easily be done by the parse(filename) or
parsestr(message_as_string) methods of the Parser() class:
# Import the email modules we'll need
from email.parser import Parser

# If the e-mail headers are in a file, uncomment this line:
#headers = Parser().parse(open(messagefile, 'r'))

# Or for parsing headers in a string, use:
headers = Parser().parsestr('From: <user@example.com>\n'
        'To: <someone_else@example.com>\n'
        'Subject: Test message\n'
        '\n'
        'Body would go here\n')

# Now the header items can be accessed as a dictionary:
print('To: %s' % headers['to'])
print('From: %s' % headers['from'])
print('Subject: %s' % headers['subject'])
Here’s an example of how to send a MIME message containing a bunch of family
pictures that may be residing in a directory:
# Import smtplib for the actual sending function
import smtplib

# Here are the email package modules we'll need
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart

COMMASPACE = ', '

# Create the container (outer) email message.
msg = MIMEMultipart()
msg['Subject'] = 'Our family reunion'
# me == the sender's email address
# family = the list of all recipients' email addresses
msg['From'] = me
msg['To'] = COMMASPACE.join(family)
msg.preamble = 'Our family reunion'

# Assume we know that the image files are all in PNG format
for file in pngfiles:
    # Open the files in binary mode.  Let the MIMEImage class automatically
    # guess the specific image type.
    fp = open(file, 'rb')
    img = MIMEImage(fp.read())
    fp.close()
    msg.attach(img)

# Send the email via our own SMTP server.
s = smtplib.SMTP('localhost')
s.send_message(msg)
s.quit()
Here’s an example of how to send the entire contents of a directory as an email
message: [1]
#!/usr/bin/env python3"""Send the contents of a directory as a MIME message."""importosimportsysimportsmtplib# For guessing MIME type based on file name extensionimportmimetypesfromoptparseimportOptionParserfromemailimportencodersfromemail.messageimportMessagefromemail.mime.audioimportMIMEAudiofromemail.mime.baseimportMIMEBasefromemail.mime.imageimportMIMEImagefromemail.mime.multipartimportMIMEMultipartfromemail.mime.textimportMIMETextCOMMASPACE=', 'defmain():parser=OptionParser(usage="""\Send the contents of a directory as a MIME message.Usage: %prog [options]Unless the -o option is given, the email is sent by forwarding to your localSMTP server, which then does the normal delivery process. Your local machinemust be running an SMTP server.""")parser.add_option('-d','--directory',type='string',action='store',help="""Mail the contents of the specified directory, otherwise use the current directory. Only the regular files in the directory are sent, and we don't recurse to subdirectories.""")parser.add_option('-o','--output',type='string',action='store',metavar='FILE',help="""Print the composed message to FILE instead of sending the message to the SMTP server.""")parser.add_option('-s','--sender',type='string',action='store',metavar='SENDER',help='The value of the From: header (required)')parser.add_option('-r','--recipient',type='string',action='append',metavar='RECIPIENT',default=[],dest='recipients',help='A To: header value (at least one required)')opts,args=parser.parse_args()ifnotopts.senderornotopts.recipients:parser.print_help()sys.exit(1)directory=opts.directoryifnotdirectory:directory='.'# Create the enclosing (outer) messageouter=MIMEMultipart()outer['Subject']='Contents of directory %s'%os.path.abspath(directory)outer['To']=COMMASPACE.join(opts.recipients)outer['From']=opts.senderouter.preamble='You will not see this in a MIME-aware mail reader.\n'forfilenameinos.listdir(directory):path=os.path.join(directory,filename)ifnotos.path.isfile(path):continue# Guess the content type based on the file's extension. Encoding# will be ignored, although we should check for simple things like# gzip'd or compressed files.ctype,encoding=mimetypes.guess_type(path)ifctypeisNoneorencodingisnotNone:# No guess could be made, or the file is encoded (compressed), so# use a generic bag-of-bits type.ctype='application/octet-stream'maintype,subtype=ctype.split('/',1)ifmaintype=='text':fp=open(path)# Note: we should handle calculating the charsetmsg=MIMEText(fp.read(),_subtype=subtype)fp.close()elifmaintype=='image':fp=open(path,'rb')msg=MIMEImage(fp.read(),_subtype=subtype)fp.close()elifmaintype=='audio':fp=open(path,'rb')msg=MIMEAudio(fp.read(),_subtype=subtype)fp.close()else:fp=open(path,'rb')msg=MIMEBase(maintype,subtype)msg.set_payload(fp.read())fp.close()# Encode the payload using Base64encoders.encode_base64(msg)# Set the filename parametermsg.add_header('Content-Disposition','attachment',filename=filename)outer.attach(msg)# Now send or store the messagecomposed=outer.as_string()ifopts.output:fp=open(opts.output,'w')fp.write(composed)fp.close()else:s=smtplib.SMTP('localhost')s.sendmail(opts.sender,opts.recipients,composed)s.quit()if__name__=='__main__':main()
Here’s an example of how to unpack a MIME message like the one
above, into a directory of files:
#!/usr/bin/env python3"""Unpack a MIME message into a directory of files."""importosimportsysimportemailimporterrnoimportmimetypesfromoptparseimportOptionParserdefmain():parser=OptionParser(usage="""\Unpack a MIME message into a directory of files.Usage: %prog [options] msgfile""")parser.add_option('-d','--directory',type='string',action='store',help="""Unpack the MIME message into the named directory, which will be created if it doesn't already exist.""")opts,args=parser.parse_args()ifnotopts.directory:parser.print_help()sys.exit(1)try:msgfile=args[0]exceptIndexError:parser.print_help()sys.exit(1)try:os.mkdir(opts.directory)exceptOSErrorase:# Ignore directory exists errorife.errno!=errno.EEXIST:raisefp=open(msgfile)msg=email.message_from_file(fp)fp.close()counter=1forpartinmsg.walk():# multipart/* are just containersifpart.get_content_maintype()=='multipart':continue# Applications should really sanitize the given filename so that an# email message can't be used to overwrite important filesfilename=part.get_filename()ifnotfilename:ext=mimetypes.guess_extension(part.get_content_type())ifnotext:# Use a generic bag-of-bits extensionext='.bin'filename='part-%03d%s'%(counter,ext)counter+=1fp=open(os.path.join(opts.directory,filename),'wb')fp.write(part.get_payload(decode=True))fp.close()if__name__=='__main__':main()
Here’s an example of how to create an HTML message with an alternative plain
text version: [2]
#!/usr/bin/env python3

import smtplib

from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# me == my email address
# you == recipient's email address
me = "my@email.com"
you = "your@email.com"

# Create message container - the correct MIME type is multipart/alternative.
msg = MIMEMultipart('alternative')
msg['Subject'] = "Link"
msg['From'] = me
msg['To'] = you

# Create the body of the message (a plain-text and an HTML version).
text = "Hi!\nHow are you?\nHere is the link you wanted:\nhttp://www.python.org"
html = """\
<html>
  <head></head>
  <body>
    <p>Hi!<br>
       How are you?<br>
       Here is the <a href="http://www.python.org">link</a> you wanted.
    </p>
  </body>
</html>
"""

# Record the MIME types of both parts - text/plain and text/html.
part1 = MIMEText(text, 'plain')
part2 = MIMEText(html, 'html')

# Attach parts into message container.
# According to RFC 2046, the last part of a multipart message, in this case
# the HTML message, is best and preferred.
msg.attach(part1)
msg.attach(part2)

# Send the message via local SMTP server.
s = smtplib.SMTP('localhost')
# sendmail function takes 3 arguments: sender's address, recipient's address
# and message to send - here it is sent as one string.
s.sendmail(me, you, msg.as_string())
s.quit()
This table describes the release history of the email package, corresponding to
the version of Python that the package was released with. For purposes of this
document, when you see a note about change or added versions, these refer to the
Python version the change was made in, not the email package version. This
table also describes the Python compatibility of each version of the package.
email version   distributed with                compatible with
1.x             Python 2.2.0 to Python 2.2.1    no longer supported
2.5             Python 2.2.2+ and Python 2.3    Python 2.1 to 2.5
3.0             Python 2.4                      Python 2.3 to 2.5
4.0             Python 2.5                      Python 2.3 to 2.5
5.0             Python 3.0 and Python 3.1       Python 3.0 to 3.2
5.1             Python 3.2                      Python 3.0 to 3.2
Here are the major differences between email version 5.1 and
version 5.0:
It is once again possible to parse messages containing non-ASCII bytes,
and to reproduce such messages if the data containing the non-ASCII
bytes is not modified.
Given bytes input to the model, get_payload()
will by default decode a message body that has a
Content-Transfer-Encoding of 8bit using the charset
specified in the MIME headers and return the resulting string.
Given bytes input to the model, Generator will
convert message bodies that have a Content-Transfer-Encoding of
8bit to instead have a 7bit Content-Transfer-Encoding.
New class BytesGenerator produces bytes
as output, preserving any unchanged non-ASCII data that was
present in the input used to build the model, including message bodies
with a Content-Transfer-Encoding of 8bit.
Here are the major differences between email version 5.0 and version 4:
All operations are on unicode strings. Text inputs must be strings,
text outputs are strings. Outputs are limited to the ASCII character
set and so can be encoded to ASCII for transmission. Inputs are also
limited to ASCII; this is an acknowledged limitation of email 5.0 and
means it can only be used to parse email that is 7bit clean.
Here are the major differences between email version 4 and version 3:
All modules have been renamed according to PEP 8 standards. For example,
the version 3 module email.Message was renamed to email.message in
version 4.
A new subpackage email.mime was added and all the version 3
email.MIME* modules were renamed and situated into the email.mime
subpackage. For example, the version 3 module email.MIMEText was renamed
to email.mime.text.
Note that the version 3 names will continue to work until Python 2.6.
The email.mime.application module was added, which contains the
MIMEApplication class.
Methods that were deprecated in version 3 have been removed. These include
Generator.__call__(), Message.get_type(),
Message.get_main_type(), Message.get_subtype().
Fixes have been added for RFC 2231 support which can change some of the
return types for Message.get_param() and friends. Under some
circumstances, values which used to return a 3-tuple now return simple strings
(specifically, if all extended parameter segments were unencoded, there is no
language and charset designation expected, so the return type is now a simple
string). Also, %-decoding used to be done for both encoded and unencoded
segments; this decoding is now done only for encoded segments.
Here are the major differences between email version 3 and version 2:
The FeedParser class was introduced, and the Parser class
was implemented in terms of the FeedParser. All parsing therefore is
non-strict, and parsing will make a best effort never to raise an exception.
Problems found while parsing messages are stored in the message’s defects
attribute.
All aspects of the API which raised DeprecationWarnings in version 2
have been removed. These include the _encoder argument to the
MIMEText constructor, the Message.add_payload() method, the
Utils.dump_address_pair() function, and the functions Utils.decode()
and Utils.encode().
New DeprecationWarnings have been added to:
Generator.__call__(), Message.get_type(),
Message.get_main_type(), Message.get_subtype(), and the strict
argument to the Parser class. These are expected to be removed in
future versions.
Support for Pythons earlier than 2.3 has been removed.
Here are the differences between email version 2 and version 1:
The email.Header and email.Charset modules have been added.
The pickle format for Message instances has changed. Since this was
never (and still isn’t) formally defined, this isn’t considered a backward
incompatibility. However if your application pickles and unpickles
Message instances, be aware that in email version 2,
Message instances now have private variables _charset and
_default_type.
Several methods in the Message class have been deprecated, or their
signatures changed. Also, many new methods have been added. See the
documentation for the Message class for details. The changes should be
completely backward compatible.
The object structure has changed in the face of message/rfc822
content types. In email version 1, such a type would be represented by a
scalar payload, i.e. the container message’s is_multipart() returned
False, and get_payload() was not a list object but a single Message
instance.
This structure was inconsistent with the rest of the package, so the object
representation for message/rfc822 content types was changed. In
email version 2, the container does return True from
is_multipart(), and get_payload() returns a list containing a single
Message item.
Note that this is one place that backward compatibility could not be completely
maintained. However, if you’re already testing the return type of
get_payload(), you should be fine. You just need to make sure your code
doesn’t do a set_payload() with a Message instance on a container
with a content type of message/rfc822.
The Parser constructor’s strict argument was added, and its
parse() and parsestr() methods grew a headersonly argument. The
strict flag was also added to functions email.message_from_file() and
email.message_from_string().
Generator.__call__() is deprecated; use Generator.flatten()
instead. The Generator class has also grown the clone() method.
The DecodedGenerator class in the email.Generator module was
added.
The intermediate base classes MIMENonMultipart and
MIMEMultipart have been added, and interposed in the class hierarchy
for most of the other MIME-related derived classes.
The _encoder argument to the MIMEText constructor has been
deprecated. Encoding now happens implicitly based on the _charset argument.
The following functions in the email.Utils module have been deprecated:
dump_address_pairs(), decode(), and encode(). The following
functions have been added to the module: make_msgid(),
decode_rfc2231(), encode_rfc2231(), and decode_params().
The non-public function email.Iterators._structure() was added.
The email package was originally prototyped as a separate library called
mimelib. Changes have been made so that method names
are more consistent, and some methods or modules have either been added or
removed. The semantics of some of the methods have also changed. For the most
part, any functionality available in mimelib is still available in the
email package, albeit often in a different way. Backward compatibility
between the mimelib package and the email package was not a
priority.
Here is a brief description of the differences between the mimelib and
the email packages, along with hints on how to port your applications.
Of course, the most visible difference between the two packages is that the
package name has been changed to email. In addition, the top-level
package has the following differences:
The method ismultipart() was renamed to is_multipart().
The get_payload() method has grown a decode optional argument.
The method getall() was renamed to get_all().
The method addheader() was renamed to add_header().
The method gettype() was renamed to get_type().
The method getmaintype() was renamed to get_main_type().
The method getsubtype() was renamed to get_subtype().
The method getparams() was renamed to get_params(). Also, whereas
getparams() returned a list of strings, get_params() returns a list
of 2-tuples, effectively the key/value pairs of the parameters, split on the
'=' sign.
The method getparam() was renamed to get_param().
The method getcharsets() was renamed to get_charsets().
The method getfilename() was renamed to get_filename().
The method getboundary() was renamed to get_boundary().
The method setboundary() was renamed to set_boundary().
The method getdecodedpayload() was removed. To get similar
functionality, pass the value 1 to the decode flag of the get_payload()
method.
The method getpayloadastext() was removed. Similar functionality is
supported by the DecodedGenerator class in the email.generator
module.
The method getbodyastext() was removed. You can get similar
functionality by creating an iterator with typed_subpart_iterator() in the
email.iterators module.
The Parser class has no differences in its public interface. It does
have some additional smarts to recognize message/delivery-status
type messages, which it represents as a Message instance containing
separate Message subparts for each header block in the delivery status
notification [1].
The Generator class has no differences in its public interface. There
is a new class in the email.generator module though, called
DecodedGenerator which provides most of the functionality previously
available in the Message.getpayloadastext() method.
The following modules and classes have been changed:
The MIMEBase class constructor arguments _major and _minor have
changed to _maintype and _subtype respectively.
The Image class/module has been renamed to MIMEImage. The _minor
argument has been renamed to _subtype.
The Text class/module has been renamed to MIMEText. The _minor
argument has been renamed to _subtype.
The MessageRFC822 class/module has been renamed to MIMEMessage. Note
that an earlier version of mimelib called this class/module RFC822,
but that clashed with the Python standard library module rfc822 on some
case-insensitive file systems.
Also, the MIMEMessage class now represents any kind of MIME message
with main type message. It takes an optional argument _subtype
which is used to set the MIME subtype. _subtype defaults to
rfc822.
mimelib provided some utility functions in its address and
date modules. All of these functions have been moved to the
email.utils module.
The MsgReader class/module has been removed. Its functionality is most
closely supported in the body_line_iterator() function in the
email.iterators module.
Serialize obj as a JSON formatted stream to fp (a .write()-supporting
file-like object).
If skipkeys is True (default: False), then dict keys that are not
of a basic type (str, int, float, bool,
None) will be skipped instead of raising a TypeError.
The json module always produces str objects, not
bytes objects. Therefore, fp.write() must support str
input.
If check_circular is False (default: True), then the circular
reference check for container types will be skipped and a circular reference
will result in an OverflowError (or worse).
If allow_nan is False (default: True), then it will be a
ValueError to serialize out of range float values (nan,
inf, -inf) in strict compliance of the JSON specification, instead of
using the JavaScript equivalents (NaN, Infinity, -Infinity).
If indent is a non-negative integer or string, then JSON array elements and
object members will be pretty-printed with that indent level. An indent level
of 0, negative, or "" will only insert newlines. None (the default)
selects the most compact representation. Using a positive integer indent
indents that many spaces per level. If indent is a string (such as '\t'),
that string is used to indent each level.
If separators is an (item_separator, dict_separator) tuple, then it
will be used instead of the default (', ', ': ') separators. (',', ':') is the most compact JSON representation.
default(obj) is a function that should return a serializable version of
obj or raise TypeError. The default simply raises TypeError.
To use a custom JSONEncoder subclass (e.g. one that overrides the
default() method to serialize additional types), specify it with the
cls kwarg; otherwise JSONEncoder is used.
Serialize obj to a JSON formatted str. The arguments have the
same meaning as in dump().
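For example:

>>> import json
>>> json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])
'["foo", {"bar": ["baz", null, 1.0, 2]}]'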
Note
Unlike pickle and marshal, JSON is not a framed protocol,
so trying to serialize multiple objects with repeated calls to
dump() using the same fp will result in an invalid JSON file.
Deserialize fp (a .read()-supporting file-like object containing a JSON
document) to a Python object.
object_hook is an optional function that will be called with the result of
any object literal decoded (a dict). The return value of
object_hook will be used instead of the dict. This feature can be used
to implement custom decoders (e.g. JSON-RPC class hinting).
object_pairs_hook is an optional function that will be called with the
result of any object literal decoded with an ordered list of pairs. The
return value of object_pairs_hook will be used instead of the
dict. This feature can be used to implement custom decoders that
rely on the order that the key and value pairs are decoded (for example,
collections.OrderedDict() will remember the order of insertion). If
object_hook is also defined, the object_pairs_hook takes priority.
Changed in version 3.1: Added support for object_pairs_hook.
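For example, preserving key order while decoding:

>>> import json
>>> from collections import OrderedDict
>>> json.loads('{"b": 1, "a": 2}', object_pairs_hook=OrderedDict)
OrderedDict([('b', 1), ('a', 2)])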
parse_float, if specified, will be called with the string of every JSON
float to be decoded. By default, this is equivalent to float(num_str).
This can be used to use another datatype or parser for JSON floats
(e.g. decimal.Decimal).
parse_int, if specified, will be called with the string of every JSON int
to be decoded. By default, this is equivalent to int(num_str). This can
be used to use another datatype or parser for JSON integers
(e.g. float).
parse_constant, if specified, will be called with one of the following
strings: '-Infinity', 'Infinity', 'NaN', 'null', 'true',
'false'. This can be used to raise an exception if invalid JSON numbers
are encountered.
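For example, decoding JSON numbers into other Python types:

>>> import json
>>> from decimal import Decimal
>>> json.loads('1.1', parse_float=Decimal)
Decimal('1.1')
>>> json.loads('7', parse_int=float)
7.0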
To use a custom JSONDecoder subclass, specify it with the cls
kwarg; otherwise JSONDecoder is used. Additional keyword arguments
will be passed to the constructor of the class.
class json.JSONDecoder(object_hook=None, parse_float=None, parse_int=None, parse_constant=None, strict=True, object_pairs_hook=None)
Simple JSON decoder.
Performs the following translations in decoding by default:
JSON            Python
object          dict
array           list
string          str
number (int)    int
number (real)   float
true            True
false           False
null            None
It also understands NaN, Infinity, and -Infinity as their
corresponding float values, which is outside the JSON spec.
object_hook, if specified, will be called with the result of every JSON
object decoded and its return value will be used in place of the given
dict. This can be used to provide custom deserializations (e.g. to
support JSON-RPC class hinting).
object_pairs_hook, if specified will be called with the result of every
JSON object decoded with an ordered list of pairs. The return value of
object_pairs_hook will be used instead of the dict. This
feature can be used to implement custom decoders that rely on the order
that the key and value pairs are decoded (for example,
collections.OrderedDict() will remember the order of insertion). If
object_hook is also defined, the object_pairs_hook takes priority.
Changed in version 3.1: Added support for object_pairs_hook.
parse_float, if specified, will be called with the string of every JSON
float to be decoded. By default, this is equivalent to float(num_str).
This can be used to use another datatype or parser for JSON floats
(e.g. decimal.Decimal).
parse_int, if specified, will be called with the string of every JSON int
to be decoded. By default, this is equivalent to int(num_str). This can
be used to use another datatype or parser for JSON integers
(e.g. float).
parse_constant, if specified, will be called with one of the following
strings: '-Infinity', 'Infinity', 'NaN', 'null', 'true',
'false'. This can be used to raise an exception if invalid JSON numbers
are encountered.
If strict is False (True is the default), then control characters
will be allowed inside strings. Control characters in this context are
those with character codes in the 0-31 range, including '\t' (tab),
'\n', '\r' and '\0'.
Decode a JSON document from s (a str beginning with a
JSON document) and return a 2-tuple of the Python representation
and the index in s where the document ended.
This can be used to decode a JSON document from a string that may have
extraneous data at the end.
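A sketch:

>>> import json
>>> json.JSONDecoder().raw_decode('{"key": "value"} and trailing junk')
({'key': 'value'}, 16)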
class json.JSONEncoder(skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
Extensible JSON encoder for Python data structures.
Supports the following objects and types by default:
Python          JSON
dict            object
list, tuple     array
str             string
int, float      number
True            true
False           false
None            null
To extend this to recognize other objects, subclass and implement a
default() method that returns a serializable object
for o if possible; otherwise it should call the superclass implementation
(to raise TypeError).
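For example, a subclass that also serializes complex numbers:

import json

class ComplexEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, complex):
            return [obj.real, obj.imag]
        # Let the base class default method raise the TypeError
        return json.JSONEncoder.default(self, obj)

# json.dumps(2 + 1j, cls=ComplexEncoder) -> '[2.0, 1.0]'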
If skipkeys is False (the default), then it is a TypeError to
attempt encoding of keys that are not str, int, float or None. If
skipkeys is True, such items are simply skipped.
If ensure_ascii is True (the default), the output is guaranteed to
have all incoming non-ASCII characters escaped. If ensure_ascii is
False, these characters will be output as-is.
If check_circular is True (the default), then lists, dicts, and custom
encoded objects will be checked for circular references during encoding to
prevent an infinite recursion (which would cause an OverflowError).
Otherwise, no such check takes place.
If allow_nan is True (the default), then NaN, Infinity, and
-Infinity will be encoded as such. This behavior is not JSON
specification compliant, but is consistent with most JavaScript based
encoders and decoders. Otherwise, it will be a ValueError to encode
such floats.
If sort_keys is True (default False), then the output of dictionaries
will be sorted by key; this is useful for regression tests to ensure that
JSON serializations can be compared on a day-to-day basis.
If indent is a non-negative integer (it is None by default), then JSON
array elements and object members will be pretty-printed with that indent
level. An indent level of 0 will only insert newlines. None is the most
compact representation.
If specified, separators should be an (item_separator, key_separator)
tuple. The default is (', ', ': '). To get the most compact JSON
representation, you should specify (',', ':') to eliminate whitespace.
If specified, default is a function that gets called for objects that can’t
otherwise be serialized. It should return a JSON encodable version of the
object or raise a TypeError.
Mailcap files are used to configure how MIME-aware applications such as mail
readers and Web browsers react to files with different MIME types. (The name
“mailcap” is derived from the phrase “mail capability”.) For example, a mailcap
file might contain a line like video/mpeg; xmpeg %s. Then, if the user
encounters an email message or Web document with the MIME type
video/mpeg, %s will be replaced by a filename (usually one
belonging to a temporary file) and the xmpeg program can be
automatically started to view the file.
The mailcap format is documented in RFC 1524, “A User Agent Configuration
Mechanism For Multimedia Mail Format Information,” but is not an Internet
standard. However, mailcap files are supported on most Unix systems.
Return a 2-tuple; the first element is a string containing the command line to
be executed (which can be passed to os.system()), and the second element
is the mailcap entry for a given MIME type. If no matching MIME type can be
found, (None,None) is returned.
key is the name of the field desired, which represents the type of activity to
be performed; the default value is ‘view’, since in the most common case you
simply want to view the body of the MIME-typed data. Other possible values
might be ‘compose’ and ‘edit’, if you wanted to create a new body of the given
MIME type or alter the existing body data. See RFC 1524 for a complete list
of these fields.
filename is the filename to be substituted for %s in the command line; the
default value is '/dev/null' which is almost certainly not what you want, so
usually you’ll override it by specifying a filename.
plist can be a list containing named parameters; the default value is simply
an empty list. Each entry in the list must be a string containing the parameter
name, an equals sign ('='), and the parameter’s value. Mailcap entries can
contain named parameters like %{foo}, which will be replaced by the value
of the parameter named ‘foo’. For example, if the command line showpartial %{id} %{number} %{total} was in a mailcap file, and plist was set to
['id=1', 'number=2', 'total=3'], the resulting command line would be
'showpartial 1 2 3'.
In a mailcap file, the “test” field can optionally be specified to test some
external condition (such as the machine architecture, or the window system in
use) to determine whether or not the mailcap line applies. findmatch()
will automatically check such conditions and skip the entry if the check fails.
Returns a dictionary mapping MIME types to a list of mailcap file entries. This
dictionary must be passed to the findmatch() function. An entry is stored
as a list of dictionaries, but it shouldn’t be necessary to know the details of
this representation.
The information is derived from all of the mailcap files found on the system.
Settings in the user’s mailcap file $HOME/.mailcap will override
settings in the system mailcap files /etc/mailcap,
/usr/etc/mailcap, and /usr/local/etc/mailcap.
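A sketch of the lookup; the result shown assumes a mailcap entry like the video/mpeg line above, and will vary with the mailcap files on the system:

>>> import mailcap
>>> d = mailcap.getcaps()
>>> mailcap.findmatch(d, 'video/mpeg', filename='tmp1223')
('xmpeg tmp1223', {'view': 'xmpeg %s'})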
mailbox — Manipulate mailboxes in various formats
This module defines two classes, Mailbox and Message, for
accessing and manipulating on-disk mailboxes and the messages they contain.
Mailbox offers a dictionary-like mapping from keys to messages.
Message extends the email.Message module’s Message
class with format-specific state and behavior. Supported mailbox formats are
Maildir, mbox, MH, Babyl, and MMDF.
The Mailbox class defines an interface and is not intended to be
instantiated. Instead, format-specific subclasses should inherit from
Mailbox and your code should instantiate a particular subclass.
The Mailbox interface is dictionary-like, with small keys
corresponding to messages. Keys are issued by the Mailbox instance
with which they will be used and are only meaningful to that Mailbox
instance. A key continues to identify a message even if the corresponding
message is modified, such as by replacing it with another message.
Messages may be added to a Mailbox instance using the set-like
method add() and removed using a del statement or the set-like
methods remove() and discard().
Mailbox interface semantics differ from dictionary semantics in some
noteworthy ways. Each time a message is requested, a new representation
(typically a Message instance) is generated based upon the current
state of the mailbox. Similarly, when a message is added to a
Mailbox instance, the provided message representation’s contents are
copied. In neither case is a reference to the message representation kept by
the Mailbox instance.
The default Mailbox iterator iterates over message representations,
not keys as the default dictionary iterator does. Moreover, modification of a
mailbox during iteration is safe and well-defined. Messages added to the
mailbox after an iterator is created will not be seen by the
iterator. Messages removed from the mailbox before the iterator yields them
will be silently skipped, though using a key from an iterator may result in a
KeyError exception if the corresponding message is subsequently
removed.
Warning
Be very cautious when modifying mailboxes that might be simultaneously
changed by some other process. The safest mailbox format to use for such
tasks is Maildir; try to avoid using single-file formats such as mbox for
concurrent writing. If you’re modifying a mailbox, you must lock it by
calling the lock() and unlock() methods before reading any
messages in the file or making any changes by adding or deleting a
message. Failing to lock the mailbox runs the risk of losing messages or
corrupting the entire mailbox.
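A sketch of the safe pattern, using a hypothetical mbox path:

import mailbox

box = mailbox.mbox('family.mbox')          # hypothetical path
box.lock()
try:
    key = box.add('From: me\n\nhello\n')   # a minimal message as a string
    box.flush()
finally:
    box.unlock()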
Add message to the mailbox and return the key that has been assigned to
it.
Parameter message may be a Message instance, an
email.Message.Message instance, a string, a byte string, or a
file-like object (which should be open in binary mode). If message is
an instance of the
appropriate format-specific Message subclass (e.g., if it’s an
mboxMessage instance and this is an mbox instance), its
format-specific information is used. Otherwise, reasonable defaults for
format-specific information are used.
Delete the message corresponding to key from the mailbox.
If no such message exists, a KeyError exception is raised if the
method was called as remove() or __delitem__() but no
exception is raised if the method was called as discard(). The
behavior of discard() may be preferred if the underlying mailbox
format supports concurrent modification by other processes.
Replace the message corresponding to key with message. Raise a
KeyError exception if no message already corresponds to key.
As with add(), parameter message may be a Message
instance, an email.Message.Message instance, a string, a byte
string, or a file-like object (which should be open in binary mode). If
message is an
instance of the appropriate format-specific Message subclass
(e.g., if it’s an mboxMessage instance and this is an
mbox instance), its format-specific information is
used. Otherwise, the format-specific information of the message that
currently corresponds to key is left unchanged.
Return an iterator over representations of all messages if called as
itervalues() or __iter__() or return a list of such
representations if called as values(). The messages are represented
as instances of the appropriate format-specific Message subclass
unless a custom message factory was specified when the Mailbox
instance was initialized.
Note
The behavior of __iter__() is unlike that of dictionaries, which
iterate over keys.
Return an iterator over (key, message) pairs, where key is a key and
message is a message representation, if called as iteritems() or
return a list of such pairs if called as items(). The messages are
represented as instances of the appropriate format-specific
Message subclass unless a custom message factory was specified
when the Mailbox instance was initialized.
Return a representation of the message corresponding to key. If no such
message exists, default is returned if the method was called as
get() and a KeyError exception is raised if the method was
called as __getitem__(). The message is represented as an instance
of the appropriate format-specific Message subclass unless a
custom message factory was specified when the Mailbox instance
was initialized.
Return a representation of the message corresponding to key as an
instance of the appropriate format-specific Message subclass, or
raise a KeyError exception if no such message exists.
Return a string representation of the message corresponding to key, or
raise a KeyError exception if no such message exists. The
message is processed through email.message.Message to
convert it to a 7bit clean representation.
Return a file-like representation of the message corresponding to key,
or raise a KeyError exception if no such message exists. The
file-like object behaves as if open in binary mode. This file should be
closed once it is no longer needed.
Changed in version 3.2: The file object really is a binary file; previously it was incorrectly
returned in text mode. Also, the file-like object now supports the
context manager protocol: you can use a with statement to
automatically close it.
Note
Unlike other representations of messages, file-like representations are
not necessarily independent of the Mailbox instance that
created them or of the underlying mailbox. More specific documentation
is provided by each subclass.
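Given the 3.2 context-manager support mentioned above, a sketch using a hypothetical mbox path:

import mailbox

box = mailbox.mbox('family.mbox')          # hypothetical path
key = box.add('From: me\n\nhello\n')
with box.get_file(key) as fp:              # binary file-like object
    data = fp.read()                       # raw bytes of the stored message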
Return a representation of the message corresponding to key and delete
the message. If no such message exists, return default. The message is
represented as an instance of the appropriate format-specific
Message subclass unless a custom message factory was specified
when the Mailbox instance was initialized.
Return an arbitrary (key, message) pair, where key is a key and
message is a message representation, and delete the corresponding
message. If the mailbox is empty, raise a KeyError exception. The
message is represented as an instance of the appropriate format-specific
Message subclass unless a custom message factory was specified
when the Mailbox instance was initialized.
Parameter arg should be a key-to-message mapping or an iterable of
(key, message) pairs. Updates the mailbox so that, for each given
key and message, the message corresponding to key is set to
message as if by using __setitem__(). As with __setitem__(),
each key must already correspond to a message in the mailbox or else a
KeyError exception will be raised, so in general it is incorrect
for arg to be a Mailbox instance.
Note
Unlike with dictionaries, keyword arguments are not supported.
Write any pending changes to the filesystem. For some Mailbox
subclasses, changes are always written immediately and flush() does
nothing, but you should still make a habit of calling this method.
Acquire an exclusive advisory lock on the mailbox so that other processes
know not to modify it. An ExternalClashError is raised if the lock
is not available. The particular locking mechanisms used depend upon the
mailbox format. You should always lock the mailbox before making any
modifications to its contents.
class mailbox.Maildir(dirname, factory=None, create=True)
A subclass of Mailbox for mailboxes in Maildir format. Parameter
factory is a callable object that accepts a file-like message representation
(which behaves as if opened in binary mode) and returns a custom representation.
If factory is None, MaildirMessage is used as the default message
representation. If create is True, the mailbox is created if it does not
exist.
It is for historical reasons that dirname is named as such rather than path.
Maildir is a directory-based mailbox format invented for the qmail mail
transfer agent and now widely supported by other programs. Messages in a
Maildir mailbox are stored in separate files within a common directory
structure. This design allows Maildir mailboxes to be accessed and modified
by multiple unrelated programs without data corruption, so file locking is
unnecessary.
Maildir mailboxes contain three subdirectories, namely: tmp,
new, and cur. Messages are created momentarily in the
tmp subdirectory and then moved to the new subdirectory to
finalize delivery. A mail user agent may subsequently move the message to the
cur subdirectory and store information about the state of the message
in a special “info” section appended to its file name.
Folders of the style introduced by the Courier mail transfer agent are also
supported. Any subdirectory of the main mailbox is considered a folder if
'.' is the first character in its name. Folder names are represented by
Maildir without the leading '.'. Each folder is itself a Maildir
mailbox but should not contain other folders. Instead, a logical nesting is
indicated using '.' to delimit levels, e.g., “Archived.2005.07”.
Note
The Maildir specification requires the use of a colon (':') in certain
message file names. However, some operating systems do not permit this
character in file names. If you wish to use a Maildir-like format on such
an operating system, you should specify another character to use
instead. The exclamation point ('!') is a popular choice. For
example:
import mailbox
mailbox.Maildir.colon = '!'
The colon attribute may also be set on a per-instance basis.
Maildir instances have all of the methods of Mailbox in
addition to the following:
Delete the folder whose name is folder. If the folder contains any
messages, a NotEmptyError exception will be raised and the folder
will not be deleted.
Delete temporary files from the mailbox that have not been accessed in the
last 36 hours. The Maildir specification says that mail-reading programs
should do this occasionally.
Some Mailbox methods implemented by Maildir deserve special
remarks:
These methods generate unique file names based upon the current process
ID. When using multiple threads, undetected name clashes may occur and
cause corruption of the mailbox unless threads are coordinated to avoid
using these methods to manipulate the same mailbox simultaneously.
class mailbox.mbox(path, factory=None, create=True)¶
A subclass of Mailbox for mailboxes in mbox format. Parameter factory
is a callable object that accepts a file-like message representation (which
behaves as if opened in binary mode) and returns a custom representation. If
factory is None, mboxMessage is used as the default message
representation. If create is True, the mailbox is created if it does not
exist.
The mbox format is the classic format for storing mail on Unix systems. All
messages in an mbox mailbox are stored in a single file with the beginning of
each message indicated by a line whose first five characters are “From ”.
Several variations of the mbox format exist to address perceived shortcomings in
the original. In the interest of compatibility, mbox implements the
original format, which is sometimes referred to as mboxo. This means that
the Content-Length header, if present, is ignored and that any
occurrences of “From ” at the beginning of a line in a message body are
transformed to “>From ” when storing the message, although occurrences of “>From
” are not transformed to “From ” when reading the message.
Some Mailbox methods implemented by mbox deserve special
remarks:
class mailbox.MH(path, factory=None, create=True)¶
A subclass of Mailbox for mailboxes in MH format. Parameter factory
is a callable object that accepts a file-like message representation (which
behaves as if opened in binary mode) and returns a custom representation. If
factory is None, MHMessage is used as the default message
representation. If create is True, the mailbox is created if it does not
exist.
MH is a directory-based mailbox format invented for the MH Message Handling
System, a mail user agent. Each message in an MH mailbox resides in its own
file. An MH mailbox may contain other MH mailboxes (called folders) in
addition to messages. Folders may be nested indefinitely. MH mailboxes also
support sequences, which are named lists used to logically group
messages without moving them to sub-folders. Sequences are defined in a file
called .mh_sequences in each folder.
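A short sketch of reading and rewriting sequences, assuming an MH mailbox at ~/Mail/inbox (the path and sequence contents are illustrative):
import mailbox

inbox = mailbox.MH('~/Mail/inbox')
sequences = inbox.get_sequences()    # e.g. {'unseen': [1, 2], 'flagged': [2]}
sequences.setdefault('flagged', [])
inbox.set_sequences(sequences)       # rewrites the .mh_sequences file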
The MH class manipulates MH mailboxes, but it does not attempt to
emulate all of mh's behaviors. In particular, it does not modify
and is not affected by the context or .mh_profile files that
are used by mh to store its state and configuration.
MH instances have all of the methods of Mailbox in addition
to the following:
Delete the folder whose name is folder. If the folder contains any
messages, a NotEmptyError exception will be raised and the folder
will not be deleted.
Three locking mechanisms are used—dot locking and, if available, the
flock() and lockf() system calls. For MH mailboxes, locking
the mailbox means locking the .mh_sequences file and, only for the
duration of any operations that affect them, locking individual message
files.
class mailbox.Babyl(path, factory=None, create=True)¶
A subclass of Mailbox for mailboxes in Babyl format. Parameter
factory is a callable object that accepts a file-like message representation
(which behaves as if opened in binary mode) and returns a custom representation.
If factory is None, BabylMessage is used as the default message
representation. If create is True, the mailbox is created if it does not
exist.
Babyl is a single-file mailbox format used by the Rmail mail user agent
included with Emacs. The beginning of a message is indicated by a line
containing the two characters Control-Underscore ('\037') and Control-L
('\014'). The end of a message is indicated by the start of the next
message or, in the case of the last message, a line containing a
Control-Underscore ('\037') character.
Messages in a Babyl mailbox have two sets of headers, original headers and
so-called visible headers. Visible headers are typically a subset of the
original headers that have been reformatted or abridged to be more
attractive. Each message in a Babyl mailbox also has an accompanying list of
labels, or short strings that record extra information about the
message, and a list of all user-defined labels found in the mailbox is kept
in the Babyl options section.
Babyl instances have all of the methods of Mailbox in
addition to the following:
Return a list of the names of all user-defined labels used in the mailbox.
Note
The actual messages are inspected to determine which labels exist in
the mailbox rather than consulting the list of labels in the Babyl
options section, but the Babyl section is updated whenever the mailbox
is modified.
Some Mailbox methods implemented by Babyl deserve special
remarks:
In Babyl mailboxes, the headers of a message are not stored contiguously
with the body of the message. To generate a file-like representation, the
headers and body are copied together into an io.BytesIO instance,
which has an API identical to that of a
file. As a result, the file-like object is truly independent of the
underlying mailbox but does not save memory compared to a string
representation.
class mailbox.MMDF(path, factory=None, create=True)¶
A subclass of Mailbox for mailboxes in MMDF format. Parameter factory
is a callable object that accepts a file-like message representation (which
behaves as if opened in binary mode) and returns a custom representation. If
factory is None, MMDFMessage is used as the default message
representation. If create is True, the mailbox is created if it does not
exist.
MMDF is a single-file mailbox format invented for the Multichannel Memorandum
Distribution Facility, a mail transfer agent. Each message is in the same
form as an mbox message but is bracketed before and after by lines containing
four Control-A ('\001') characters. As with the mbox format, the
beginning of each message is indicated by a line whose first five characters
are “From ”, but additional occurrences of “From ” are not transformed to
“>From ” when storing messages because the extra message separator lines
prevent mistaking such occurrences for the starts of subsequent messages.
Some Mailbox methods implemented by MMDF deserve special
remarks:
A subclass of the email.message module's Message. Subclasses of
mailbox.Message add mailbox-format-specific state and behavior.
If message is omitted, the new instance is created in a default, empty state.
If message is an email.message.Message instance, its contents are
copied; furthermore, any format-specific information is converted insofar as
possible if message is a Message instance. If message is a string,
a byte string,
or a file, it should contain an RFC 2822-compliant message, which is read
and parsed. Files should be open in binary mode, but text mode files
are accepted for backward compatibility.
The format-specific state and behaviors offered by subclasses vary, but in
general it is only the properties that are not specific to a particular
mailbox that are supported (although presumably the properties are specific
to a particular mailbox format). For example, file offsets for single-file
mailbox formats and file names for directory-based mailbox formats are not
retained, because they are only applicable to the original mailbox. But state
such as whether a message has been read by the user or marked as important is
retained, because it applies to the message itself.
There is no requirement that Message instances be used to represent
messages retrieved using Mailbox instances. In some situations, the
time and memory required to generate Message representations might
not be acceptable. For such situations, Mailbox instances also
offer string and file-like representations, and a custom message factory may
be specified when a Mailbox instance is initialized.
A message with Maildir-specific behaviors. Parameter message has the same
meaning as with the Message constructor.
Typically, a mail user agent application moves all of the messages in the
new subdirectory to the cur subdirectory after the first time
the user opens and closes the mailbox, recording that the messages are old
whether or not they’ve actually been read. Each message in cur has an
“info” section added to its file name to store information about its state.
(Some mail readers may also add an “info” section to messages in
new.) The “info” section may take one of two forms: it may contain
“2,” followed by a list of standardized flags (e.g., “2,FR”) or it may
contain “1,” followed by so-called experimental information. Standard flags
for Maildir messages are as follows:
Flag  Meaning   Explanation
D     Draft     Under composition
F     Flagged   Marked as important
P     Passed    Forwarded, resent, or bounced
R     Replied   Replied to
S     Seen      Read
T     Trashed   Marked for subsequent deletion
Return either “new” (if the message should be stored in the new
subdirectory) or “cur” (if the message should be stored in the cur
subdirectory).
Note
A message is typically moved from new to cur after its
mailbox has been accessed, whether or not the message has been
read. A message msg has been read if "S" in msg.get_flags() is
True.
Return a string specifying the flags that are currently set. If the
message complies with the standard Maildir format, the result is the
concatenation in alphabetical order of zero or one occurrence of each of
'D', 'F', 'P', 'R', 'S', and 'T'. The empty string
is returned if no flags are set or if “info” contains experimental
semantics.
Set the flag(s) specified by flag without changing other flags. To add
more than one flag at a time, flag may be a string of more than one
character. The current “info” is overwritten whether or not it contains
experimental information rather than flags.
Unset the flag(s) specified by flag without changing other flags. To
remove more than one flag at a time, flag may be a string of more than
one character. If “info” contains experimental information rather than
flags, the current “info” is not modified.
Return a string containing the “info” for a message. This is useful for
accessing and modifying “info” that is experimental (i.e., not a list of
flags).
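A short sketch of flag manipulation (the message content is illustrative):
import mailbox

msg = mailbox.MaildirMessage('From: me@example.com\n\nBody\n')
msg.set_subdir('cur')     # the message has been seen by the mail reader
msg.add_flag('S')         # mark as read ("Seen")
msg.add_flag('FR')        # several flags may be added at once
print(msg.get_flags())    # 'FRS' -- alphabetical order
print(msg.get_info())     # '2,FRS'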
When a MaildirMessage instance is created based upon an
mboxMessage or MMDFMessage instance, the Status
and X-Status headers are omitted and the following conversions
take place:
A message with mbox-specific behaviors. Parameter message has the same meaning
as with the Message constructor.
Messages in an mbox mailbox are stored together in a single file. The
sender’s envelope address and the time of delivery are typically stored in a
line beginning with “From ” that is used to indicate the start of a message,
though there is considerable variation in the exact format of this data among
mbox implementations. Flags that indicate the state of the message, such as
whether it has been read or marked as important, are typically stored in
Status and X-Status headers.
Conventional flags for mbox messages are as follows:
Flag  Meaning   Explanation
R     Read      Read
O     Old       Previously detected by MUA
D     Deleted   Marked for subsequent deletion
F     Flagged   Marked as important
A     Answered  Replied to
The “R” and “O” flags are stored in the Status header, and the
“D”, “F”, and “A” flags are stored in the X-Status header. The
flags and headers typically appear in the order mentioned.
mboxMessage instances offer the following methods:
Return a string representing the “From ” line that marks the start of the
message in an mbox mailbox. The leading “From ” and the trailing newline
are excluded.
Set the “From ” line to from_, which should be specified without a
leading “From ” or trailing newline. For convenience, time_ may be
specified and will be formatted appropriately and appended to from_. If
time_ is specified, it should be a struct_time instance, a
tuple suitable for passing to time.strftime(), or True (to use
time.gmtime()).
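For example, the following minimal sketch sets the envelope sender and appends the current time (the sender name is illustrative):
import mailbox

msg = mailbox.mboxMessage()
msg.set_from('MAILER-DAEMON', True)   # True means use time.gmtime()
print(msg.get_from())                 # e.g. 'MAILER-DAEMON Sat Jul 23 15:35:34 2011'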
Return a string specifying the flags that are currently set. If the
message complies with the conventional format, the result is the
concatenation in the following order of zero or one occurrence of each of
'R', 'O', 'D', 'F', and 'A'.
Set the flags specified by flags and unset all others. Parameter flags
should be the concatenation in any order of zero or more occurrences of
each of 'R', 'O', 'D', 'F', and 'A'.
Unset the flag(s) specified by flag without changing other flags. To
remove more than one flag at a time, flag may be a string of more than
one character.
When an mboxMessage instance is created based upon a
MaildirMessage instance, a “From ” line is generated based upon the
MaildirMessage instance’s delivery date, and the following conversions
take place:
A message with MH-specific behaviors. Parameter message has the same meaning
as with the Message constructor.
MH messages do not support marks or flags in the traditional sense, but they
do support sequences, which are logical groupings of arbitrary messages. Some
mail reading programs (although not the standard mh and
nmh) use sequences in much the same way flags are used with other
formats, as follows:
When an MHMessage instance is created based upon an
mboxMessage or MMDFMessage instance, the Status
and X-Status headers are omitted and the following conversions
take place:
A message with Babyl-specific behaviors. Parameter message has the same
meaning as with the Message constructor.
Certain message labels, called attributes, are defined by convention
to have special meanings. The attributes are as follows:
Label      Explanation
unseen     Not read, but previously detected by MUA
deleted    Marked for subsequent deletion
filed      Copied to another file or mailbox
answered   Replied to
forwarded  Forwarded
edited     Modified by the user
resent     Resent
By default, Rmail displays only visible headers. The BabylMessage
class, though, uses the original headers because they are more
complete. Visible headers may be accessed explicitly if desired.
BabylMessage instances offer the following methods:
Set the message’s visible headers to be the same as the headers in
message. Parameter visible should be a Message instance, an
email.message.Message instance, a string, or a file-like object
(which should be open in text mode).
When a BabylMessage instance’s original headers are modified, the
visible headers are not automatically modified to correspond. This method
updates the visible headers as follows: each visible header with a
corresponding original header is set to the value of the original header,
each visible header without a corresponding original header is removed,
and any of Date, From, Reply-To,
To, CC, and Subject that are
present in the original headers but not the visible headers are added to
the visible headers.
When a BabylMessage instance is created based upon a
MaildirMessage instance, the following conversions take place:
When a BabylMessage instance is created based upon an
mboxMessage or MMDFMessage instance, the Status
and X-Status headers are omitted and the following conversions
take place:
A message with MMDF-specific behaviors. Parameter message has the same meaning
as with the Message constructor.
As with message in an mbox mailbox, MMDF messages are stored with the
sender’s address and the delivery date in an initial line beginning with
“From ”. Likewise, flags that indicate the state of the message are
typically stored in Status and X-Status headers.
Conventional flags for MMDF messages are identical to those of mbox message
and are as follows:
Flag  Meaning   Explanation
R     Read      Read
O     Old       Previously detected by MUA
D     Deleted   Marked for subsequent deletion
F     Flagged   Marked as important
A     Answered  Replied to
The “R” and “O” flags are stored in the Status header, and the
“D”, “F”, and “A” flags are stored in the X-Status header. The
flags and headers typically appear in the order mentioned.
MMDFMessage instances offer the following methods, which are
identical to those offered by mboxMessage:
Return a string representing the “From ” line that marks the start of the
message in an mbox mailbox. The leading “From ” and the trailing newline
are excluded.
Set the “From ” line to from_, which should be specified without a
leading “From ” or trailing newline. For convenience, time_ may be
specified and will be formatted appropriately and appended to from_. If
time_ is specified, it should be a struct_time instance, a
tuple suitable for passing to time.strftime(), or True (to use
time.gmtime()).
Return a string specifying the flags that are currently set. If the
message complies with the conventional format, the result is the
concatenation in the following order of zero or one occurrence of each of
'R', 'O', 'D', 'F', and 'A'.
Set the flags specified by flags and unset all others. Parameter flags
should be the concatenation in any order of zero or more occurrences of
each of 'R', 'O', 'D', 'F', and 'A'.
Unset the flag(s) specified by flag without changing other flags. To
remove more than one flag at a time, flag may be a string of more than
one character.
When an MMDFMessage instance is created based upon a
MaildirMessage instance, a “From ” line is generated based upon the
MaildirMessage instance’s delivery date, and the following conversions
take place:
Raised when a mailbox is expected but is not found, such as when instantiating a
Mailbox subclass with a path that does not exist (and with the create
parameter set to False), or when opening a folder that does not exist.
Raised when some mailbox-related condition beyond the control of the program
causes it to be unable to proceed, such as failing to acquire a lock that
another program already holds, or when a uniquely generated file name
already exists.
A simple example of printing the subjects of all messages in a mailbox that seem
interesting:
import mailbox
for message in mailbox.mbox('~/mbox'):
    subject = message['subject']       # Could possibly be None.
    if subject and 'python' in subject.lower():
        print(subject)
To copy all mail from a Babyl mailbox to an MH mailbox, converting all of the
format-specific information that can be converted:
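A minimal sketch, assuming the Babyl file lives at ~/RMAIL and the MH mailbox at ~/Mail:
import mailbox

destination = mailbox.MH('~/Mail')
destination.lock()
for message in mailbox.Babyl('~/RMAIL'):
    destination.add(mailbox.MHMessage(message))   # converts labels where possible
destination.flush()
destination.unlock()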
This example sorts mail from several mailing lists into different mailboxes,
being careful to avoid mail corruption due to concurrent modification by other
programs, mail loss due to interruption of the program, or premature termination
due to malformed messages in the mailbox:
import mailbox
import email.errors

list_names = ('python-list', 'python-dev', 'python-bugs')
boxes = {name: mailbox.mbox('~/email/%s' % name) for name in list_names}
inbox = mailbox.Maildir('~/Maildir', factory=None)

for key in inbox.iterkeys():
    try:
        message = inbox[key]
    except email.errors.MessageParseError:
        continue                # The message is malformed. Just leave it.

    for name in list_names:
        list_id = message['list-id']
        if list_id and name in list_id:
            # Get mailbox to use
            box = boxes[name]

            # Write copy to disk before removing original.
            # If there's a crash, you might duplicate a message, but
            # that's better than losing a message completely.
            box.lock()
            box.add(message)
            box.flush()
            box.unlock()

            # Remove original message
            inbox.lock()
            inbox.discard(key)
            inbox.flush()
            inbox.unlock()
            break               # Found destination, so stop looking.

for box in boxes.values():
    box.close()
The mimetypes module converts between a filename or URL and the MIME type
associated with the filename extension. Conversions are provided from filename
to MIME type and from MIME type to filename extension; encodings are not
supported for the latter conversion.
The module provides one class and a number of convenience functions. The
functions are the normal interface to this module, but some applications may be
interested in the class as well.
The functions described below provide the primary interface for this module. If
the module has not been initialized, they will call init() if they rely on
the information init() sets up.
Guess the type of a file based on its filename or URL, given by filename. The
return value is a tuple (type, encoding) where type is None if the
type can’t be guessed (missing or unknown suffix) or a string of the form
'type/subtype', usable for a MIME content-type header.
encoding is None for no encoding or the name of the program used to encode
(e.g. compress or gzip). The encoding is suitable for use
as a Content-Encoding header, not as a
Content-Transfer-Encoding header. The mappings are table driven.
Encoding suffixes are case sensitive; type suffixes are first tried case
sensitively, then case insensitively.
Optional strict is a flag specifying whether the list of known MIME types
is limited to only the official types registered with IANA.
When strict is true (the default), only the IANA types are supported; when
strict is false, some additional non-standard but commonly used MIME types
are also recognized.
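A doctest-style sketch (results can vary slightly with the local type map):
>>> import mimetypes
>>> mimetypes.guess_type('archive.tar.gz')
('application/x-tar', 'gzip')
>>> mimetypes.guess_type('photo.jpeg')
('image/jpeg', None)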
Guess the extensions for a file based on its MIME type, given by type. The
return value is a list of strings giving all possible filename extensions,
including the leading dot ('.'). The extensions are not guaranteed to have
been associated with any particular data stream, but would be mapped to the MIME
type type by guess_type().
Optional strict has the same meaning as with the guess_type() function.
Guess the extension for a file based on its MIME type, given by type. The
return value is a string giving a filename extension, including the leading dot
('.'). The extension is not guaranteed to have been associated with any
particular data stream, but would be mapped to the MIME type type by
guess_type(). If no extension can be guessed for type, None is
returned.
Optional strict has the same meaning as with the guess_type() function.
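A brief sketch of both functions; the exact extensions and their order depend on the local type map:
import mimetypes

print(mimetypes.guess_all_extensions('image/jpeg'))  # e.g. ['.jpe', '.jpeg', '.jpg']
print(mimetypes.guess_extension('image/jpeg'))       # e.g. '.jpe'
print(mimetypes.guess_extension('no/such-type'))     # None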
Some additional functions and data items are available for controlling the
behavior of the module.
Initialize the internal data structures. If given, files must be a sequence
of file names which should be used to augment the default type map. If omitted,
the file names to use are taken from knownfiles; on Windows, the
current registry settings are loaded. Each file named in files or
knownfiles takes precedence over those named before it. Calling
init() repeatedly is allowed.
Changed in version 3.2: Previously, Windows registry settings were ignored.
Load the type map given in the file filename, if it exists. The type map is
returned as a dictionary mapping filename extensions, including the leading dot
('.'), to strings of the form 'type/subtype'. If the file filename
does not exist or cannot be read, None is returned.
Add a mapping from the mimetype type to the extension ext. When the
extension is already known, the new type will replace the old one. When the type
is already known the extension will be added to the list of known extensions.
When strict is True (the default), the mapping will be added to the official MIME
types, otherwise to the non-standard ones.
List of type map file names commonly installed. These files are typically named
mime.types and are installed in different locations by different
packages.
Dictionary mapping suffixes to suffixes. This is used to allow recognition of
encoded files for which the encoding and the type are indicated by the same
extension. For example, the .tgz extension is mapped to .tar.gz
to allow the encoding and type to be recognized separately.
Dictionary mapping filename extensions to non-standard, but commonly found MIME
types.
The MimeTypes class may be useful for applications which may want more
than one MIME-type database:
class mimetypes.MimeTypes(filenames=(), strict=True)¶
This class represents a MIME-types database. By default, it provides access to
the same database as the rest of this module. The initial database is a copy of
that provided by the module, and may be extended by loading additional
mime.types-style files into the database using the read() or
readfp() methods. The mapping dictionaries may also be cleared before
loading additional data if the default data is not desired.
The optional filenames parameter can be used to cause additional files to be
loaded “on top” of the default database.
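A minimal sketch of a private database that leaves the module-level maps untouched (the type and extension are hypothetical):
import mimetypes

db = mimetypes.MimeTypes()
db.add_type('application/x-example', '.exm')   # hypothetical mapping
print(db.guess_type('data.exm'))               # ('application/x-example', None)
print(mimetypes.guess_type('data.exm'))        # the global map is unaffected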
Dictionary mapping suffixes to suffixes. This is used to allow recognition of
encoded files for which the encoding and the type are indicated by the same
extension. For example, the .tgz extension is mapped to .tar.gz
to allow the encoding and type to be recognized separately. This is initially a
copy of the global suffix_map defined in the module.
Dictionary mapping filename extensions to non-standard, but commonly found MIME
types. This is initially a copy of the global common_types defined in the
module.
Load MIME type information from the Windows registry. Availability: Windows.
New in version 3.2.
base64 — RFC 3548: Base16, Base32, Base64 Data Encodings¶
This module provides data encoding and decoding as specified in RFC 3548.
This standard defines the Base16, Base32, and Base64 algorithms for encoding
and decoding arbitrary binary strings into ASCII-only byte strings that can be
safely sent by email, used as parts of URLs, or included as part of an HTTP
POST request. The encoding algorithm is not the same as the
uuencode program.
There are two interfaces provided by this module. The modern interface
supports encoding and decoding ASCII byte string objects using all three
alphabets. The legacy interface provides for encoding and decoding to and from
file-like objects as well as byte strings, but only using the Base64 standard
alphabet.
s is the byte string to encode. Optional altchars must be a string of at least
length 2 (additional characters are ignored) which specifies an alternative
alphabet for the + and / characters. This allows an application to e.g.
generate URL or filesystem safe Base64 strings. The default is None, for
which the standard Base64 alphabet is used.
s is the byte string to decode. Optional altchars must be a string of
at least length 2 (additional characters are ignored) which specifies the
alternative alphabet used instead of the + and / characters.
The decoded string is returned. A binascii.Error is raised if s is
incorrectly padded.
If validate is False (the default), non-base64-alphabet characters are
discarded prior to the padding check. If validate is True,
non-base64-alphabet characters in the input result in a
binascii.Error.
Encode byte string s using a URL-safe alphabet, which substitutes - instead of
+ and _ instead of / in the standard Base64 alphabet. The result
can still contain =.
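A doctest-style sketch contrasting the two alphabets:
>>> import base64
>>> base64.b64encode(b'\xfb\xef\xbe')
b'++++'
>>> base64.urlsafe_b64encode(b'\xfb\xef\xbe')
b'----'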
s is the byte string to decode. Optional casefold is a flag specifying
whether a lowercase alphabet is acceptable as input. For security purposes,
the default is False.
RFC 3548 allows for optional mapping of the digit 0 (zero) to the letter O
(oh), and for optional mapping of the digit 1 (one) to either the letter I (eye)
or letter L (el). The optional argument map01 when not None, specifies
which letter the digit 1 should be mapped to (when map01 is not None, the
digit 0 is always mapped to the letter O). For security purposes the default is
None, so that 0 and 1 are not allowed in the input.
The decoded byte string is returned. A TypeError is raised if s is
incorrectly padded or if there are non-alphabet characters present in the
string.
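A doctest-style example of the Base32 round trip:
>>> import base64
>>> base64.b32encode(b'foo')
b'MZXW6==='
>>> base64.b32decode(b'MZXW6===')
b'foo'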
s is the byte string to decode. Optional casefold is a flag specifying whether a
lowercase alphabet is acceptable as input. For security purposes, the default
is False.
The decoded byte string is returned. A TypeError is raised if s is
incorrectly padded or if there are non-alphabet characters present in the
string.
Decode the contents of the binary input file and write the resulting binary
data to the output file. input and output must be file objects. input will be read until input.read() returns an empty
bytes object.
Decode the byte string s, which must contain one or more lines of base64
encoded data, and return a byte string containing the resulting binary data.
decodestring is a deprecated alias.
Encode the contents of the binary input file and write the resulting base64
encoded data to the output file. input and output must be file
objects. input will be read until input.read() returns
an empty bytes object. encode() returns the encoded data plus a trailing
newline character (b'\n').
Encode the byte string s, which can contain arbitrary binary data, and
return a byte string containing one or more lines of base64-encoded data,
always including an extra trailing newline (b'\n').
encodestring is a deprecated alias.
An example usage of the module:
>>> import base64
>>> encoded = base64.b64encode(b'data to be encoded')
>>> encoded
b'ZGF0YSB0byBiZSBlbmNvZGVk'
>>> data = base64.b64decode(encoded)
>>> data
b'data to be encoded'
Convert a binary file with filename input to binhex file output. The
output parameter can either be a filename or a file-like object (any object
supporting a write() and close() method).
Decode a binhex file input. input may be a filename or a file-like object
supporting read() and close() methods. The resulting file is written
to a file named output, unless the argument is None in which case the
output filename is read from the binhex file.
Exception raised when something can’t be encoded using the binhex format (for
example, a filename is too long to fit in the filename field), or when input is
not properly encoded binhex data.
The binascii module contains a number of methods to convert between
binary and various ASCII-encoded binary representations. Normally, you will not
use these functions directly but use wrapper modules like uu,
base64, or binhex instead. The binascii module contains
low-level functions written in C for greater speed that are used by the
higher-level modules.
Note
Encoding and decoding functions do not accept Unicode strings. Only bytestring
and bytearray objects can be processed.
The binascii module defines the following functions:
Convert a single line of uuencoded data back to binary and return the binary
data. Lines normally contain 45 (binary) bytes, except for the last line. Line
data may be followed by whitespace.
Convert binary data to a line of ASCII characters; the return value is the
converted line, including a newline char. The length of data should be at most
45.
Convert binary data to a line of ASCII characters in base64 coding. The return
value is the converted line, including a newline char. The length of data
should be at most 57 to adhere to the base64 standard.
Convert a block of quoted-printable data back to binary and return the binary
data. More than one line may be passed at a time. If the optional argument
header is present and true, underscores will be decoded as spaces.
Changed in version 3.2: Accept only bytestring or bytearray objects as input.
Convert binary data to a line(s) of ASCII characters in quoted-printable
encoding. The return value is the converted line(s). If the optional argument
quotetabs is present and true, all tabs and spaces will be encoded. If the
optional argument istext is present and true, newlines are not encoded but
trailing whitespace will be encoded. If the optional argument header is
present and true, spaces will be encoded as underscores per RFC1522. If the
optional argument header is present and false, newline characters will be
encoded as well; otherwise linefeed conversion might corrupt the binary data
stream.
Convert binhex4 formatted ASCII data to binary, without doing RLE-decompression.
The string should contain a complete number of binary bytes, or (in case of the
last portion of the binhex4 data) have the remaining bits zero.
Perform RLE-decompression on the data, as per the binhex4 standard. The
algorithm uses 0x90 after a byte as a repeat indicator, followed by a count.
A count of 0 specifies a byte value of 0x90. The routine returns the
decompressed data, unless the input data ends in an orphaned repeat indicator,
in which case the Incomplete exception is raised.
Changed in version 3.2: Accept only bytestring or bytearray objects as input.
Perform hexbin4 binary-to-ASCII translation and return the resulting string. The
argument should already be RLE-coded, and have a length divisible by 3 (except
possibly the last fragment).
Compute CRC-32, the 32-bit checksum of data, starting with an initial crc. This
is consistent with the ZIP file checksum. Since the algorithm is designed for
use as a checksum algorithm, it is not suitable for use as a general hash
algorithm. Use as follows:
print(binascii.crc32(b"hello world"))
# Or, in two pieces:
crc = binascii.crc32(b"hello")
crc = binascii.crc32(b" world", crc) & 0xffffffff
print('crc32 = {:#010x}'.format(crc))
Note
To generate the same numeric value across all Python versions and
platforms use crc32(data) & 0xffffffff. If you are only using
the checksum in packed binary format this is not necessary as the
return value is the correct 32-bit binary representation
regardless of sign.
Return the hexadecimal representation of the binary data. Every byte of
data is converted into the corresponding 2-digit hex representation. The
resulting string is therefore twice as long as the length of data.
Return the binary data represented by the hexadecimal string hexstr. This
function is the inverse of b2a_hex(). hexstr must contain an even number
of hexadecimal digits (which can be upper or lower case), otherwise a
TypeError is raised.
Changed in version 3.2: Accept only bytestring or bytearray objects as input.
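A doctest-style example of the round trip:
>>> import binascii
>>> binascii.b2a_hex(b'\xb9\x01\xef')
b'b901ef'
>>> binascii.a2b_hex(b'b901ef')
b'\xb9\x01\xef'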
This module performs quoted-printable transport encoding and decoding, as
defined in RFC 1521: “MIME (Multipurpose Internet Mail Extensions) Part One:
Mechanisms for Specifying and Describing the Format of Internet Message Bodies”.
The quoted-printable encoding is designed for data where there are relatively
few nonprintable characters; the base64 encoding scheme available via the
base64 module is more compact if there are many such characters, as when
sending a graphics file.
Decode the contents of the input file and write the resulting decoded binary
data to the output file. input and output must be file objects. input will be read until input.readline() returns an
empty string. If the optional argument header is present and true, underscore
will be decoded as space. This is used to decode “Q”-encoded headers as
described in RFC 1522: “MIME (Multipurpose Internet Mail Extensions)
Part Two: Message Header Extensions for Non-ASCII Text”.
Encode the contents of the input file and write the resulting quoted-printable
data to the output file. input and output must be file objects. input will be read until input.readline() returns an
empty string. quotetabs is a flag which controls whether to encode embedded
spaces and tabs; when true it encodes such embedded whitespace, and when
false it leaves them unencoded. Note that spaces and tabs appearing at the
end of lines are always encoded, as per RFC 1521. header is a flag
which controls if spaces are encoded as underscores as per RFC 1522.
Like encode(), except that it accepts a source string and returns the
corresponding encoded string. quotetabs and header are optional
(defaulting to False), and are passed straight through to encode().
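A doctest-style sketch of the string interface (the byte values are illustrative):
>>> import quopri
>>> quopri.encodestring(b'caf\xe9 au lait')
b'caf=E9 au lait'
>>> quopri.decodestring(b'caf=E9 au lait')
b'caf\xe9 au lait'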
This module encodes and decodes files in uuencode format, allowing arbitrary
binary data to be transferred over ASCII-only connections. Wherever a file
argument is expected, the methods accept a file-like object. For backwards
compatibility, a string containing a pathname is also accepted, and the
corresponding file will be opened for reading and writing; the pathname '-'
is understood to mean the standard input or output. However, this interface is
deprecated; it’s better for the caller to open the file itself, and be sure
that, when required, the mode is 'rb' or 'wb' on Windows.
This code was contributed by Lance Ellinghouse, and modified by Jack Jansen.
Uuencode file in_file into file out_file. The uuencoded file will have
the header specifying name and mode as the defaults for the results of
decoding the file. The default defaults are taken from in_file, or '-'
and 0o666 respectively.
This call decodes uuencoded file in_file placing the result on file
out_file. If out_file is a pathname, mode is used to set the permission
bits if the file must be created. Defaults for out_file and mode are taken
from the uuencode header. However, if the file specified in the header already
exists, a uu.Error is raised.
decode() may print a warning to standard error if the input was produced
by an incorrect uuencoder and Python could recover from that error. Setting
quiet to a true value silences this warning.
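A round-trip sketch using explicit binary file objects, since the pathname interface is deprecated; the file names are illustrative and photo.bin is assumed to exist:
import uu

with open('photo.bin', 'rb') as in_file, open('photo.uu', 'wb') as out_file:
    uu.encode(in_file, out_file, name='photo.bin', mode=0o644)

with open('photo.uu', 'rb') as in_file, open('photo.copy', 'wb') as out_file:
    uu.decode(in_file, out_file)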
Subclass of Exception, this can be raised by uu.decode() under
various situations, such as described above, but also including a badly
formatted header, or truncated input file.
Python supports a variety of modules to work with various forms of structured
data markup. This includes modules to work with the Standard Generalized Markup
Language (SGML) and the Hypertext Markup Language (HTML), and several interfaces
for working with the Extensible Markup Language (XML).
It is important to note that modules in the xml package require that
there be at least one SAX-compliant XML parser available. The Expat parser is
included with Python, so the xml.parsers.expat module will always be
available.
The documentation for the xml.dom and xml.sax packages is the
definition of the Python bindings for the DOM and SAX interfaces.
Convert the characters &, < and > in string s to HTML-safe
sequences. Use this if you need to display text that might contain such
characters in HTML. If the optional flag quote is true, the characters
(") and (') are also translated; this helps for inclusion in an HTML
attribute value delimited by quotes, as in <ahref="...">.
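A doctest-style sketch, assuming the function described here is html.escape():
>>> import html
>>> html.escape('<a href="test">link & text</a>')
'&lt;a href=&quot;test&quot;&gt;link &amp; text&lt;/a&gt;'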
Create a parser instance. If strict is True (the default), invalid
HTML results in HTMLParseError exceptions [1]. If
strict is False, the parser uses heuristics to make a best guess at
the intention of any invalid HTML it encounters, similar to the way most
browsers do.
An HTMLParser instance is fed HTML data and calls handler functions when tags
begin and end. The HTMLParser class is meant to be overridden by the
user to provide a desired behavior.
This parser does not check that end tags match start tags or call the end-tag
handler for elements which are closed implicitly by closing an outer element.
Exception raised by the HTMLParser class when it encounters an error
while parsing. This exception provides three attributes: msg is a brief
message explaining the error, lineno is the number of the line on which
the broken construct was detected, and offset is the number of
characters into the line at which the construct starts.
Feed some text to the parser. It is processed insofar as it consists of
complete elements; incomplete data is buffered until more data is fed or
close() is called.
Force processing of all buffered data as if it were followed by an end-of-file
mark. This method may be redefined by a derived class to define additional
processing at the end of the input, but the redefined version should always call
the HTMLParser base class method close().
Return the text of the most recently opened start tag. This should not normally
be needed for structured processing, but may be useful in dealing with HTML “as
deployed” or for re-generating input with minimal changes (whitespace between
attributes can be preserved, etc.).
This method is called to handle the start of a tag. It is intended to be
overridden by a derived class; the base class implementation does nothing.
The tag argument is the name of the tag converted to lower case. The attrs
argument is a list of (name, value) pairs containing the attributes found
inside the tag's <> brackets. The name is translated to lower case,
quotes in the value are removed, and character and entity references
are replaced. For instance, for the tag <A HREF="http://www.cwi.nl/">,
this method would be called as
handle_starttag('a', [('href', 'http://www.cwi.nl/')]).
All entity references from html.entities are replaced in the attribute
values.
Similar to handle_starttag(), but called when the parser encounters an
XHTML-style empty tag (<a.../>). This method may be overridden by
subclasses which require this particular lexical information; the default
implementation simply calls handle_starttag() and handle_endtag().
This method is called to handle the end tag of an element. It is intended to be
overridden by a derived class; the base class implementation does nothing. The
tag argument is the name of the tag converted to lower case.
This method is called to process a character reference of the form &#ref;.
It is intended to be overridden by a derived class; the base class
implementation does nothing.
This method is called to process a general entity reference of the form
&name; where name is an general entity reference. It is intended to be
overridden by a derived class; the base class implementation does nothing.
This method is called when a comment is encountered. The comment argument is
a string containing the text between the <!-- and --> delimiters, but not
the delimiters themselves. For example, the comment <!--text--> will cause
this method to be called with the argument 'text'. It is intended to be
overridden by a derived class; the base class implementation does nothing.
Method called when an SGML doctype declaration is read by the parser.
The decl parameter will be the entire contents of the declaration inside
the <!...> markup. It is intended to be overridden by a derived class;
the base class implementation does nothing.
Method called when an unrecognized SGML declaration is read by the parser.
The data parameter will be the entire contents of the declaration inside
the <!...> markup. It is sometimes useful to be overridden by a
derived class; the base class implementation raises an HTMLParseError.
Method called when a processing instruction is encountered. The data
parameter will contain the entire processing instruction. For example, for the
processing instruction <?proc color='red'>, this method would be called as
handle_pi("proc color='red'"). It is intended to be overridden by a derived
class; the base class implementation does nothing.
Note
The HTMLParser class uses the SGML syntactic rules for processing
instructions. An XHTML processing instruction using the trailing '?' will
cause the '?' to be included in data.
As a basic example, below is a simple HTML parser that uses the
HTMLParser class to print out tags as they are encountered:
>>> from html.parser import HTMLParser
>>>
>>> class MyHTMLParser(HTMLParser):
...     def handle_starttag(self, tag, attrs):
...         print("Encountered a {} start tag".format(tag))
...     def handle_endtag(self, tag):
...         print("Encountered a {} end tag".format(tag))
...
>>> page = """<html><h1>Title</h1><p>I'm a paragraph!</p></html>"""
>>>
>>> myparser = MyHTMLParser()
>>> myparser.feed(page)
Encountered a html start tag
Encountered a h1 start tag
Encountered a h1 end tag
Encountered a p start tag
Encountered a p end tag
Encountered a html end tag
For backward compatibility reasons strict mode does not raise
exceptions for all non-compliant HTML. That is, some invalid HTML
is tolerated even in strict mode.
This module defines three dictionaries, name2codepoint, codepoint2name,
and entitydefs. entitydefs is used to provide the entitydefs
attribute of the html.parser.HTMLParser class. The definition provided
here contains all the entities defined by XHTML 1.0 that can be handled using
simple textual substitution in the Latin-1 character set (ISO-8859-1).
The xml.parsers.expat module is a Python interface to the Expat
non-validating XML parser. The module provides a single extension type,
xmlparser, that represents the current state of an XML parser. After
an xmlparser object has been created, various attributes of the object
can be set to handler functions. When an XML document is then fed to the
parser, the handler functions are called for the character data and markup in
the XML document.
This module uses the pyexpat module to provide access to the Expat
parser. Direct use of the pyexpat module is deprecated.
This module provides one exception and one type object:
Creates and returns a new xmlparser object. encoding, if specified,
must be a string naming the encoding used by the XML data. Expat doesn’t
support as many encodings as Python does, and its repertoire of encodings can’t
be extended; it supports UTF-8, UTF-16, ISO-8859-1 (Latin1), and ASCII. If
encoding is given it will override the implicit or explicit encoding of the
document.
Expat can optionally do XML namespace processing for you, enabled by providing a
value for namespace_separator. The value must be a one-character string; a
ValueError will be raised if the string has an illegal length (None
is considered the same as omission). When namespace processing is enabled,
element type names and attribute names that belong to a namespace will be
expanded. The element name passed to the element handlers
StartElementHandler and EndElementHandler will be the
concatenation of the namespace URI, the namespace separator character, and the
local part of the name. If the namespace separator is a zero byte (chr(0))
then the namespace URI and the local part will be concatenated without any
separator.
For example, if namespace_separator is set to a space character (' ') and
the following document is parsed:
Parses the contents of the string data, calling the appropriate handler
functions to process the parsed data. isfinal must be true on the final call
to this method. data can be the empty string at any time.
Sets the base to be used for resolving relative URIs in system identifiers in
declarations. Resolving relative identifiers is left to the application: this
value will be passed through as the base argument to the
ExternalEntityRefHandler(), NotationDeclHandler(), and
UnparsedEntityDeclHandler() functions.
Returns the input data that generated the current event as a string. The data is
in the encoding of the entity which contains the text. When called while an
event handler is not active, the return value is None.
Create a “child” parser which can be used to parse an external parsed entity
referred to by content parsed by the parent parser. The context parameter
should be the string passed to the ExternalEntityRefHandler() handler
function, described below. The child parser is created with the
ordered_attributes and specified_attributes set to the values of
this parser.
Control parsing of parameter entities (including the external DTD subset).
Possible flag values are XML_PARAM_ENTITY_PARSING_NEVER,
XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE and
XML_PARAM_ENTITY_PARSING_ALWAYS. Return true if setting the flag
was successful.
Passing a false value for flag will cancel a previous call that passed a true
value, but otherwise has no effect.
This method can only be called before the Parse() or ParseFile()
methods are called; calling it after either of those have been called causes
ExpatError to be raised with the code attribute set to
errors.codes[errors.XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING].
The size of the buffer used when buffer_text is true.
A new buffer size can be set by assigning a new integer value
to this attribute.
When the size is changed, the buffer will be flushed.
Setting this to true causes the xmlparser object to buffer textual
content returned by Expat to avoid multiple calls to the
CharacterDataHandler() callback whenever possible. This can improve
performance substantially since Expat normally breaks character data into chunks
at every line ending. This attribute is false by default, and may be changed at
any time.
If buffer_text is enabled, the number of bytes stored in the buffer.
These bytes represent UTF-8 encoded text. This attribute has no meaningful
interpretation when buffer_text is false.
Setting this attribute to a non-zero integer causes the attributes to be
reported as a list rather than a dictionary. The attributes are presented in
the order found in the document text. For each attribute, two list entries are
presented: the attribute name and the attribute value. (Older versions of this
module also used this format.) By default, this attribute is false; it may be
changed at any time.
If set to a non-zero integer, the parser will report only those attributes which
were specified in the document instance and not those which were derived from
attribute declarations. Applications which set this need to be especially
careful to use what additional information is available from the declarations as
needed to comply with the standards for the behavior of XML processors. By
default, this attribute is false; it may be changed at any time.
The following attributes contain values relating to the most recent error
encountered by an xmlparser object, and will only have correct values
once a call to Parse() or ParseFile() has raised an
xml.parsers.expat.ExpatError exception.
Numeric code specifying the problem. This value can be passed to the
ErrorString() function, or compared to one of the constants defined in the
errors object.
The following attributes contain values relating to the current parse location
in an xmlparser object. During a callback reporting a parse event they
indicate the location of the first of the sequence of characters that generated
the event. When called outside of a callback, the position indicated will be
just past the last parse event (regardless of whether there was an associated
callback).
Here is the list of handlers that can be set. To set a handler on an
xmlparser object o, use o.handlername = func. handlername must
be taken from the following list, and func must be a callable object accepting
the correct number of arguments. The arguments are all strings, unless
otherwise stated.
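As an illustration, here is a minimal sketch that attaches three handlers and parses a small document (the document text is illustrative):
import xml.parsers.expat

def start_element(name, attrs):
    print('Start element:', name, attrs)

def end_element(name):
    print('End element:', name)

def char_data(data):
    print('Character data:', repr(data))

p = xml.parsers.expat.ParserCreate()
p.StartElementHandler = start_element
p.EndElementHandler = end_element
p.CharacterDataHandler = char_data

p.Parse("""<?xml version="1.0"?>
<parent id="top"><child1 name="paul">Text goes here</child1>
<child2 name="fred">More text</child2>
</parent>""", True)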
Called when the XML declaration is parsed. The XML declaration is the
(optional) declaration of the applicable version of the XML recommendation, the
encoding of the document text, and an optional “standalone” declaration.
version and encoding will be strings, and standalone will be 1 if the
document is declared standalone, 0 if it is declared not to be standalone,
or -1 if the standalone clause was omitted. This is only available with
Expat version 1.95.0 or newer.
Called when Expat begins parsing the document type declaration
(<!DOCTYPE ...). The doctypeName is provided exactly as presented. The systemId and
publicId parameters give the system and public identifiers if specified, or
None if omitted. has_internal_subset will be true if the document
contains an internal document declaration subset. This requires Expat version
1.2 or newer.
Called for each declared attribute for an element type. If an attribute list
declaration declares three attributes, this handler is called three times, once
for each attribute. elname is the name of the element to which the
declaration applies and attname is the name of the attribute declared. The
attribute type is a string passed as type; the possible values are
'CDATA', 'ID', 'IDREF', ... default gives the default value for
the attribute used when the attribute is not specified by the document instance,
or None if there is no default value (#IMPLIED values). If the
attribute is required to be given in the document instance, required will be
true. This requires Expat version 1.95.0 or newer.
Called for the start of every element. name is a string containing the
element name, and attributes is a dictionary mapping attribute names to their
values.
Called for character data. This will be called for normal character data, CDATA
marked content, and ignorable whitespace. Applications which must distinguish
these cases can use the StartCdataSectionHandler,
EndCdataSectionHandler, and ElementDeclHandler callbacks to
collect the required information.
Called for unparsed (NDATA) entity declarations. This is only present for
version 1.2 of the Expat library; for more recent versions, use
EntityDeclHandler instead. (The underlying function in the Expat
library has been declared obsolete.)
Called for all entity declarations. For parameter and internal entities,
value will be a string giving the declared contents of the entity; this will
be None for external entities. The notationName parameter will be
None for parsed entities, and the name of the notation for unparsed
entities. is_parameter_entity will be true if the entity is a parameter entity
or false for general entities (most applications only need to be concerned with
general entities). This is only available starting with version 1.95.0 of the
Expat library.
Called for notation declarations. notationName, base, systemId, and
publicId are strings if given. If the public identifier is omitted,
publicId will be None.
Called when an element contains a namespace declaration. Namespace declarations
are processed before the StartElementHandler is called for the element
on which declarations are placed.
Called when the closing tag is reached for an element that contained a
namespace declaration. This is called once for each namespace declaration on
the element in the reverse of the order for which the
StartNamespaceDeclHandler was called to indicate the start of each
namespace declaration’s scope. Calls to this handler are made after the
corresponding EndElementHandler for the end of the element.
Called at the start of a CDATA section. This and EndCdataSectionHandler
are needed to be able to identify the syntactical start and end for CDATA
sections.
Called for any characters in the XML document for which no applicable handler
has been specified. This means characters that are part of a construct which
could be reported, but for which no handler has been supplied.
This is the same as the DefaultHandler(), but doesn’t inhibit expansion
of internal entities. The entity reference will not be passed to the default
handler.
Called if the XML document hasn’t been declared as being a standalone document.
This happens when there is an external subset or a reference to a parameter
entity, but the XML declaration does not set standalone to yes in an XML
declaration. If this handler returns 0, then the parser will raise an
XML_ERROR_NOT_STANDALONE error. If this handler is not set, no
exception is raised by the parser for this condition.
Called for references to external entities. base is the current base, as set
by a previous call to SetBase(). The public and system identifiers,
systemId and publicId, are strings if given; if the public identifier is not
given, publicId will be None. The context value is opaque and should
only be used as described below.
For external entities to be parsed, this handler must be implemented. It is
responsible for creating the sub-parser using
ExternalEntityParserCreate(context), initializing it with the appropriate
callbacks, and parsing the entity. This handler should return an integer; if it
returns 0, the parser will raise an
XML_ERROR_EXTERNAL_ENTITY_HANDLING error, otherwise parsing will
continue.
If this handler is not provided, external entities are reported by the
DefaultHandler callback, if provided.
Content models are described using nested tuples. Each tuple contains four
values: the type, the quantifier, the name, and a tuple of children. Children
are simply additional content model descriptions.
The values of the first two fields are constants defined in the
xml.parsers.expat.model module. These constants can be collected in two
groups: the model type group and the quantifier group.
The constants in the model type group are:
xml.parsers.expat.model.XML_CTYPE_ANY
The element named by the model name was declared to have a content model of
ANY.
xml.parsers.expat.model.XML_CTYPE_CHOICE
The named element allows a choice from a number of options; this is used for
content models such as (A|B|C).
xml.parsers.expat.model.XML_CTYPE_EMPTY
Elements which are declared to be EMPTY have this model type.
xml.parsers.expat.model.XML_CTYPE_MIXED
xml.parsers.expat.model.XML_CTYPE_NAME
xml.parsers.expat.model.XML_CTYPE_SEQ
Models which represent a series of models which follow one after the other are
indicated with this model type. This is used for models such as (A,B,C).
The constants in the quantifier group are:
xml.parsers.expat.model.XML_CQUANT_NONE
No modifier is given, so it can appear exactly once, as for A.
xml.parsers.expat.model.XML_CQUANT_OPT
The model is optional: it can appear once or not at all, as for A?.
xml.parsers.expat.model.XML_CQUANT_PLUS
The model must occur one or more times (like A+).
xml.parsers.expat.model.XML_CQUANT_REP
The model must occur zero or more times, as for A*.
The following constants are provided in the xml.parsers.expat.errors
module. These constants are useful in interpreting some of the attributes of
the ExpatError exception objects raised when an error has occurred.
For backwards compatibility reasons, the value of each constant is the error
message itself rather than the numeric error code. To test an ExpatError
against a particular constant, compare the exception's code attribute with
errors.codes[errors.XML_ERROR_CONSTANT_NAME].
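For example (a minimal sketch), a deliberately malformed document can be used
to observe the code attribute:

import xml.parsers.expat
from xml.parsers.expat import errors

try:
    xml.parsers.expat.ParserCreate().Parse(b"", True)   # no content at all
except xml.parsers.expat.ExpatError as err:
    if err.code == errors.codes[errors.XML_ERROR_NO_ELEMENTS]:
        print("document contained no elements")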
The parser determined that the document was not “standalone” though it declared
itself to be in the XML declaration, and the NotStandaloneHandler was
set and returned 0.
An operation was requested that requires DTD support to be compiled in, but
Expat was configured without DTD support. This should never be reported by a
standard build of the xml.parsers.expat module.
A behavioral change was requested after parsing started that can only be changed
before parsing has started. This is (currently) only raised by
UseForeignDTD().
The requested operation was made on a parser which was finished parsing input,
but isn’t allowed. This includes attempts to provide additional input or to
stop the parser.
The Document Object Model, or “DOM,” is a cross-language API from the World Wide
Web Consortium (W3C) for accessing and modifying XML documents. A DOM
implementation presents an XML document as a tree structure, or allows client
code to build such a structure from scratch. It then gives access to the
structure through a set of objects which provide well-known interfaces.
The DOM is extremely useful for random-access applications. SAX only allows you
a view of one bit of the document at a time. If you are looking at one SAX
element, you have no access to another. If you are looking at a text node, you
have no access to a containing element. When you write a SAX application, you
need to keep track of your program’s position in the document somewhere in your
own code. SAX does not do it for you. Also, if you need to look ahead in the
XML document, you are just out of luck.
Some applications are simply impossible in an event driven model with no access
to a tree. Of course you could build some sort of tree yourself in SAX events,
but the DOM allows you to avoid writing that code. The DOM is a standard tree
representation for XML data.
The Document Object Model is being defined by the W3C in stages, or “levels” in
their terminology. The Python mapping of the API is substantially based on the
DOM Level 2 recommendation.
DOM applications typically start by parsing some XML into a DOM. How this is
accomplished is not covered at all by DOM Level 1, and Level 2 provides only
limited improvements: There is a DOMImplementation object class which
provides access to Document creation methods, but no way to access an
XML reader/parser/Document builder in an implementation-independent way. There
is also no well-defined way to access these methods without an existing
Document object. In Python, each DOM implementation will provide a
function getDOMImplementation(). DOM Level 3 adds a Load/Store
specification, which defines an interface to the reader, but this is not yet
available in the Python standard library.
Once you have a DOM document object, you can access the parts of your XML
document through its properties and methods. These properties are defined in
the DOM specification; this portion of the reference manual describes the
interpretation of the specification in Python.
The specification provided by the W3C defines the DOM API for Java, ECMAScript,
and OMG IDL. The Python mapping defined here is based in large part on the IDL
version of the specification, but strict compliance is not required (though
implementations are free to support the strict mapping from IDL). See section
Conformance for a detailed discussion of mapping requirements.
Register the factory function with the name name. The factory function
should return an object which implements the DOMImplementation
interface. The factory function can return the same object every time, or a new
one for each call, as appropriate for the specific implementation (e.g. if that
implementation supports some customization).
Return a suitable DOM implementation. The name is either well-known, the
module name of a DOM implementation, or None. If it is not None, imports
the corresponding module and returns a DOMImplementation object if the
import succeeds. If no name is given, and if the environment variable
PYTHON_DOM is set, this variable is used to find the implementation.
If name is not given, this examines the available implementations to find one
with the required feature set. If no implementation can be found, raise an
ImportError. The features list must be a sequence of (feature, version)
pairs which are passed to the hasFeature() method on available
DOMImplementation objects.
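A minimal sketch, assuming the bundled minidom implementation is available and
advertises DOM Core 2.0 support:

import xml.dom

# Any implementation claiming DOM Core 2.0 support will do here.
impl = xml.dom.getDOMImplementation(features=[("core", "2.0")])
print(impl.hasFeature("core", "2.0"))   # True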
The value used to indicate that no namespace is associated with a node in the
DOM. This is typically found as the namespaceURI of a node, or used as
the namespaceURI parameter to a namespaces-specific method.
In addition, xml.dom contains a base Node class and the DOM
exception classes. The Node class provided by this module does not
implement any of the methods or attributes defined by the DOM specification;
concrete DOM implementations must provide those. The Node class
provided as part of this module does provide the constants used for the
nodeType attribute on concrete Node objects; they are located
within the class rather than at the module level to conform with the DOM
specifications.
The definitive documentation for the DOM is the DOM specification from the W3C.
Note that DOM attributes may also be manipulated as nodes instead of as simple
strings. It is fairly rare that you must do this, however, so this usage is not
yet documented.
The DOMImplementation interface provides a way for applications to
determine the availability of particular features in the DOM they are using.
DOM Level 2 added the ability to create new Document and
DocumentType objects using the DOMImplementation as well.
Return a new Document object (the root of the DOM), with a child
Element object having the given namespaceUri and qualifiedName. The
doctype must be a DocumentType object created by
createDocumentType(), or None. In the Python DOM API, the first two
arguments can also be None in order to indicate that no Element
child is to be created.
Return a new DocumentType object that encapsulates the given
qualifiedName, publicId, and systemId strings, representing the
information contained in an XML document type declaration.
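For illustration, a sketch using the minidom implementation (the document and
doctype names here are arbitrary):

from xml.dom.minidom import getDOMImplementation

impl = getDOMImplementation()
doctype = impl.createDocumentType("slideshow", None, None)
doc = impl.createDocument(None, "slideshow", doctype)
print(doc.documentElement.tagName)   # slideshow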
An integer representing the node type. Symbolic constants for the types are on
the Node object: ELEMENT_NODE, ATTRIBUTE_NODE,
TEXT_NODE, CDATA_SECTION_NODE, ENTITY_NODE,
PROCESSING_INSTRUCTION_NODE, COMMENT_NODE,
DOCUMENT_NODE, DOCUMENT_TYPE_NODE, NOTATION_NODE.
This is a read-only attribute.
The parent of the current node, or None for the document node. The value is
always a Node object or None. For Element nodes, this
will be the parent element, except for the root element, in which case it will
be the Document object. For Attr nodes, this is always
None. This is a read-only attribute.
The node that immediately precedes this one with the same parent. For
instance the element with an end-tag that comes just before the self
element’s start-tag. Of course, XML documents are made up of more than just
elements so the previous sibling could be text, a comment, or something else.
If this node is the first child of the parent, this attribute will be
None. This is a read-only attribute.
The node that immediately follows this one with the same parent. See also
previousSibling. If this is the last child of the parent, this
attribute will be None. This is a read-only attribute.
This has a different meaning for each node type; see the DOM specification for
details. You can always get the information you would get here from another
property such as the tagName property for elements or the name
property for attributes. For all node types, the value of this attribute will be
either a string or None. This is a read-only attribute.
This has a different meaning for each node type; see the DOM specification for
details. The situation is similar to that with nodeName. The value is
a string or None.
Returns true if other refers to the same node as this node. This is especially
useful for DOM implementations which use any sort of proxy architecture (because
more than one object can refer to the same node).
Note
This is based on a proposed DOM Level 3 API which is still in the “working
draft” stage, but this particular interface appears uncontroversial. Changes
from the W3C will not necessarily affect this method in the Python DOM interface
(though any new W3C API for this would also be supported).
Insert a new child node before an existing child. It must be the case that
refChild is a child of this node; if not, ValueError is raised.
newChild is returned. If refChild is None, it inserts newChild at the
end of the children’s list.
Remove a child node. oldChild must be a child of this node; if not,
ValueError is raised. oldChild is returned on success. If oldChild
will not be used further, its unlink() method should be called.
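A short sketch of both methods on a minidom tree (the element names here are
arbitrary):

from xml.dom.minidom import parseString

doc = parseString("<doc><a/><b/></doc>")
root = doc.documentElement
a, b = root.childNodes

c = doc.createElement("c")
root.insertBefore(c, b)        # <doc><a/><c/><b/></doc>
old = root.removeChild(a)      # <doc><c/><b/></doc>
old.unlink()                   # old is no longer needed
print(root.toxml())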
Join adjacent text nodes so that all stretches of text are stored as single
Text instances. This simplifies processing text from a DOM tree for
many applications.
A NodeList represents a sequence of nodes. These objects are used in
two ways in the DOM Core recommendation: an Element object provides
one as its list of child nodes, and the getElementsByTagName() and
getElementsByTagNameNS() methods of Node return objects with this
interface to represent query results.
The DOM Level 2 recommendation defines one method and one attribute for these
objects:
Return the i‘th item from the sequence, if there is one, or None. The
index i is not allowed to be less than zero or greater than or equal to the
length of the sequence.
In addition, the Python DOM interface requires that some additional support is
provided to allow NodeList objects to be used as Python sequences. All
NodeList implementations must include support for __len__() and
__getitem__(); this allows iteration over the NodeList in
for statements and proper support for the len() built-in
function.
If a DOM implementation supports modification of the document, the
NodeList implementation must also support the __setitem__() and
__delitem__() methods.
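For example, a sketch of the sequence protocol on a minidom NodeList:

from xml.dom.minidom import parseString

doc = parseString("<doc><item/><item/></doc>")
items = doc.getElementsByTagName("item")   # a NodeList
print(len(items))          # 2, via __len__()
print(items[0].tagName)    # 'item', via __getitem__()
for node in items:         # iteration also works
    print(node.nodeName)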
Information about the notations and entities declared by a document (including
the external subset if the parser uses it and can provide the information) is
available from a DocumentType object. The DocumentType for a
document is available from the Document object’s doctype
attribute; if there is no DOCTYPE declaration for the document, the
document’s doctype attribute will be set to None instead of an
instance of this interface.
DocumentType is a specialization of Node, and adds the
following attributes:
A string giving the complete internal subset from the document. This does not
include the brackets which enclose the subset. If the document has no internal
subset, this should be None.
This is a NamedNodeMap giving the definitions of external entities.
For entity names defined more than once, only the first definition is provided
(others are ignored as required by the XML recommendation). This may be
None if the information is not provided by the parser, or if no entities are
defined.
This is a NamedNodeMap giving the definitions of notations. For
notation names defined more than once, only the first definition is provided
(others are ignored as required by the XML recommendation). This may be
None if the information is not provided by the parser, or if no notations
are defined.
A Document represents an entire XML document, including its constituent
elements, attributes, processing instructions, comments etc. Remember that it
inherits properties from Node.
Create and return a new element node. The element is not inserted into the
document when it is created. You need to explicitly insert it with one of the
other methods such as insertBefore() or appendChild().
Create and return a new element with a namespace. The tagName may have a
prefix. The element is not inserted into the document when it is created. You
need to explicitly insert it with one of the other methods such as
insertBefore() or appendChild().
Create and return a text node containing the data passed as a parameter. As
with the other creation methods, this one does not insert the node into the
tree.
Create and return a comment node containing the data passed as a parameter. As
with the other creation methods, this one does not insert the node into the
tree.
Create and return a processing instruction node containing the target and
data passed as parameters. As with the other creation methods, this one does
not insert the node into the tree.
Create and return an attribute node. This method does not associate the
attribute node with any particular element. You must use
setAttributeNode() on the appropriate Element object to use the
newly created attribute instance.
Create and return an attribute node with a namespace. The tagName may have a
prefix. This method does not associate the attribute node with any particular
element. You must use setAttributeNode() on the appropriate
Element object to use the newly created attribute instance.
Search for all descendants (direct children, children’s children, etc.) with a
particular namespace URI and localname. The localname is the part of the
qualified name after the prefix.
Return the value of the attribute named by name as a string. If no such
attribute exists, an empty string is returned, as if the attribute had no value.
Return the value of the attribute named by namespaceURI and localName as a
string. If no such attribute exists, an empty string is returned, as if the
attribute had no value.
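A small sketch of the empty-string behaviour:

from xml.dom.minidom import parseString

elem = parseString('<a href="http://example.org/"/>').documentElement
print(elem.getAttribute("href"))    # 'http://example.org/'
print(elem.getAttribute("title"))   # '' (attribute does not exist)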
Add a new attribute node to the element, replacing an existing attribute if
necessary when the name attribute matches. If a replacement occurs, the
old attribute node will be returned. If newAttr is already in use,
InuseAttributeErr will be raised.
Add a new attribute node to the element, replacing an existing attribute if
necessary when the namespaceURI and localName attributes match.
If a replacement occurs, the old attribute node will be returned. If newAttr
is already in use, InuseAttributeErr will be raised.
Return an attribute with a particular index. The order you get the attributes
in is arbitrary but will be consistent for the life of a DOM. Each item is an
attribute node. Get its value with the value attribute.
There are also experimental methods that give this class more mapping behavior.
You can use them or you can use the standardized getAttribute*() family
of methods on the Element objects.
The Text interface represents text in the XML document. If the parser
and DOM implementation support the DOM’s XML extension, portions of the text
enclosed in CDATA marked sections are stored in CDATASection objects.
These two interfaces are identical, but provide different values for the
nodeType attribute.
These interfaces extend the Node interface. They cannot have child
nodes.
The use of a CDATASection node does not indicate that the node
represents a complete CDATA marked section, only that the content of the node
was part of a CDATA section. A single CDATA section may be represented by more
than one node in the document tree. There is no way to determine whether two
adjacent CDATASection nodes represent different CDATA marked sections.
The DOM Level 2 recommendation defines a single exception, DOMException,
and a number of constants that allow applications to determine what sort of
error occurred. DOMException instances carry a code attribute
that provides the appropriate value for the specific exception.
The Python DOM interface provides the constants, but also expands the set of
exceptions so that a specific exception exists for each of the exception codes
defined by the DOM. The implementations must raise the appropriate specific
exception, each of which carries the appropriate value for the code
attribute.
Raised when a specified range of text does not fit into a string. This is not
known to be used in the Python DOM implementations, but may be received from DOM
implementations not written in Python.
This exception is raised when a string parameter contains a character that is
not permitted in the context it’s being used in by the XML 1.0 recommendation.
For example, attempting to create an Element node with a space in the
element type name will cause this error to be raised.
If an attempt is made to change any object in a way that is not permitted with
regard to the Namespaces in XML
recommendation, this exception is raised.
Exception when a node does not exist in the referenced context. For example,
NamedNodeMap.removeNamedItem() will raise this if the node passed in does
not exist in the map.
Raised when a node is inserted in a different document than it currently belongs
to, and the implementation does not support migrating the node from one document
to the other.
The exception codes defined in the DOM recommendation map to the exceptions
described above according to this table:
This section describes the conformance requirements and relationships between
the Python DOM API, the W3C DOM recommendations, and the OMG IDL mapping for
Python.
The mapping from OMG IDL to Python defines accessor functions for IDL
attribute declarations in much the way the Java mapping does.
Mapping the IDL declarations

readonly attribute string someValue;
         attribute string anotherValue;
yields three accessor functions: a “get” method for someValue
(_get_someValue()), and “get” and “set” methods for anotherValue
(_get_anotherValue() and _set_anotherValue()). The mapping, in
particular, does not require that the IDL attributes are accessible as normal
Python attributes: object.someValue is not required to work, and may
raise an AttributeError.
The Python DOM API, however, does require that normal attribute access work.
This means that the typical surrogates generated by Python IDL compilers are not
likely to work, and wrapper objects may be needed on the client if the DOM
objects are accessed via CORBA. While this does require some additional
consideration for CORBA DOM clients, the implementers with experience using DOM
over CORBA from Python do not consider this a problem. Attributes that are
declared readonly may not restrict write access in all DOM
implementations.
In the Python DOM API, accessor functions are not required. If provided, they
should take the form defined by the Python IDL mapping, but these methods are
considered unnecessary since the attributes are accessible directly from Python.
“Set” accessors should never be provided for readonly attributes.
The IDL definitions do not fully embody the requirements of the W3C DOM API,
such as the notion of certain objects, such as the return value of
getElementsByTagName(), being “live”. The Python DOM API does not require
implementations to enforce such requirements.
xml.dom.minidom is a light-weight implementation of the Document Object
Model interface. It is intended to be simpler than the full DOM and also
significantly smaller.
DOM applications typically start by parsing some XML into a DOM. With
xml.dom.minidom, this is done through the parse functions:
from xml.dom.minidom import parse, parseString

dom1 = parse('c:\\temp\\mydata.xml')    # parse an XML file by name

datasource = open('c:\\temp\\mydata.xml')
dom2 = parse(datasource)                # parse an open file

dom3 = parseString('<myxml>Some data<empty/> some more data</myxml>')
The parse() function can take either a filename or an open file object.
Return a Document from the given input. filename_or_file may be
either a file name, or a file-like object. parser, if given, must be a SAX2
parser object. This function will change the document handler of the parser and
activate namespace support; other parser configuration (like setting an entity
resolver) must have been done in advance.
If you have XML in a string, you can use the parseString() function
instead:
Return a Document that represents the string. This method creates a
StringIO object for the string and passes that on to parse().
Both functions return a Document object representing the content of the
document.
What the parse() and parseString() functions do is connect an XML
parser with a “DOM builder” that can accept parse events from any SAX parser and
convert them into a DOM tree. The names of these functions are perhaps misleading,
but are easy to grasp when learning the interfaces. The parsing of the document
will be completed before these functions return; it’s simply that these
functions do not provide a parser implementation themselves.
You can also create a Document by calling a method on a “DOM
Implementation” object. You can get this object either by calling the
getDOMImplementation() function in the xml.dom package or the
xml.dom.minidom module. Once you have a Document, you
can add child nodes to it to populate the DOM:
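For instance, a minimal sketch using the minidom implementation:

from xml.dom.minidom import getDOMImplementation

impl = getDOMImplementation()
newdoc = impl.createDocument(None, "some_tag", None)
top_element = newdoc.documentElement
text = newdoc.createTextNode('Some textual content.')
top_element.appendChild(text)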
Once you have a DOM document object, you can access the parts of your XML
document through its properties and methods. These properties are defined in
the DOM specification. The main property of the document object is the
documentElement property. It gives you the main element in the XML
document: the one that holds all others. Here is an example program:
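One possible version (a sketch reusing parseString from the example above):

dom3 = parseString("<myxml>Some data</myxml>")
assert dom3.documentElement.tagName == "myxml"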
When you are finished with a DOM tree, you may optionally call the
unlink() method to encourage early cleanup of the now-unneeded
objects. unlink() is an xml.dom.minidom-specific
extension to the DOM API that renders the node and its descendants
essentially useless. Otherwise, Python’s garbage collector will
eventually take care of the objects in the tree.
The definition of the DOM API for Python is given as part of the xml.dom
module documentation. This section lists the differences between the API and
xml.dom.minidom.
Break internal references within the DOM so that it will be garbage collected on
versions of Python without cyclic GC. Even when cyclic GC is available, using
this can make large amounts of memory available sooner, so calling this on DOM
objects as soon as they are no longer needed is good practice. This only needs
to be called on the Document object, but may be called on child nodes
to discard children of that node.
You can avoid calling this method explicitly by using the with
statement. The following code will automatically unlink dom when the
with block is exited:
with xml.dom.minidom.parse(datasource) as dom:
    ...  # Work with dom.
Write XML to the writer object. The writer should have a write() method
which matches that of the file object interface. The indent parameter is the
indentation of the current node. The addindent parameter is the incremental
indentation to use for subnodes of the current one. The newl parameter
specifies the string to use to terminate newlines.
For the Document node, an additional keyword argument encoding can
be used to specify the encoding field of the XML header.
Return a string or byte string containing the XML represented by
the DOM node.
With an explicit encoding argument, the result is a byte
string in the specified encoding. It is recommended that you
always specify an encoding; you may use any encoding you like, but
an argument of “utf-8” is the most common choice, avoiding
UnicodeError exceptions in case of unrepresentable text
data.
With no encoding argument, the result is a Unicode string, and the
XML declaration in the resulting string does not specify an
encoding. Encoding this string in an encoding other than UTF-8 is
likely incorrect, since UTF-8 is the default encoding of XML.
Return a pretty-printed version of the document. indent specifies the
indentation string and defaults to a tabulator; newl specifies the string
emitted at the end of each line and defaults to \n.
The encoding argument behaves like the corresponding argument of
toxml().
This example program is a fairly realistic example of a simple program. In this
particular case, we do not take much advantage of the flexibility of the DOM.
import xml.dom.minidom

document = """\
<slideshow>
<title>Demo slideshow</title>
<slide><title>Slide title</title>
<point>This is a demo</point>
<point>Of a program for processing slides</point>
</slide>

<slide><title>Another demo slide</title>
<point>It is important</point>
<point>To have more than</point>
<point>one slide</point>
</slide>
</slideshow>
"""

dom = xml.dom.minidom.parseString(document)

def getText(nodelist):
    rc = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc.append(node.data)
    return ''.join(rc)

def handleSlideshow(slideshow):
    print("<html>")
    handleSlideshowTitle(slideshow.getElementsByTagName("title")[0])
    slides = slideshow.getElementsByTagName("slide")
    handleToc(slides)
    handleSlides(slides)
    print("</html>")

def handleSlides(slides):
    for slide in slides:
        handleSlide(slide)

def handleSlide(slide):
    handleSlideTitle(slide.getElementsByTagName("title")[0])
    handlePoints(slide.getElementsByTagName("point"))

def handleSlideshowTitle(title):
    print("<title>%s</title>" % getText(title.childNodes))

def handleSlideTitle(title):
    print("<h2>%s</h2>" % getText(title.childNodes))

def handlePoints(points):
    print("<ul>")
    for point in points:
        handlePoint(point)
    print("</ul>")

def handlePoint(point):
    print("<li>%s</li>" % getText(point.childNodes))

def handleToc(slides):
    for slide in slides:
        title = slide.getElementsByTagName("title")[0]
        print("<p>%s</p>" % getText(title.childNodes))

handleSlideshow(dom)
The xml.dom.minidom module is essentially a DOM 1.0-compatible DOM with
some DOM 2 features (primarily namespace features).
Usage of the DOM interface in Python is straightforward. The following mapping
rules apply:
Interfaces are accessed through instance objects. Applications should not
instantiate the classes themselves; they should use the creator functions
available on the Document object. Derived interfaces support all
operations (and attributes) from the base interfaces, plus any new operations.
Operations are used as methods. Since the DOM uses only in
parameters, the arguments are passed in normal order (from left to right).
There are no optional arguments. void operations return None.
IDL attributes map to instance attributes. For compatibility with the OMG IDL
language mapping for Python, an attribute foo can also be accessed through
accessor methods _get_foo() and _set_foo(). readonly
attributes must not be changed; this is not enforced at runtime.
The types short int, unsigned int, unsigned long long, and
boolean all map to Python integer objects.
The type DOMString maps to Python strings. xml.dom.minidom supports
either bytes or strings, but will normally produce strings.
Values of type DOMString may also be None where allowed to have the IDL
null value by the DOM specification from the W3C.
const declarations map to variables in their respective scope (e.g.
xml.dom.minidom.Node.PROCESSING_INSTRUCTION_NODE); they must not be changed.
NodeList objects are implemented using Python’s built-in list type.
These objects provide the interface defined in the DOM specification, but with
earlier versions of Python they do not support the official API. They are,
however, much more “Pythonic” than the interface defined in the W3C
recommendations.
The following interfaces have no implementation in xml.dom.minidom:
DOMTimeStamp
DocumentType
DOMImplementation
CharacterData
CDATASection
Notation
Entity
EntityReference
DocumentFragment
Most of these reflect information in the XML document that is not of general
utility to most DOM users.
The xml.sax package provides a number of modules which implement the
Simple API for XML (SAX) interface for Python. The package itself provides the
SAX exceptions and the convenience functions which will be most used by users of
the SAX API.
Create and return a SAX XMLReader object. The first parser found will
be used. If parser_list is provided, it must be a sequence of strings which
name modules that have a function named create_parser(). Modules listed
in parser_list will be used before modules in the default list of parsers.
Create a SAX parser and use it to parse a document. The document, passed in as
filename_or_stream, can be a filename or a file object. The handler
parameter needs to be a SAX ContentHandler instance. If
error_handler is given, it must be a SAX ErrorHandler instance; if
omitted, SAXParseException will be raised on all errors. There is no
return value; all work must be done by the handler passed in.
Similar to parse(), but parses from a buffer string received as a
parameter.
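A minimal sketch of parseString() driving a ContentHandler subclass
(TitlePrinter is an illustrative name):

import xml.sax

class TitlePrinter(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self._in_title = False
    def startElement(self, name, attrs):
        self._in_title = (name == "title")
    def characters(self, content):
        if self._in_title:
            print(content)
    def endElement(self, name):
        self._in_title = False

xml.sax.parseString(b"<doc><title>Hello</title></doc>", TitlePrinter())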
A typical SAX application uses three kinds of objects: readers, handlers and
input sources. “Reader” in this context is another term for parser, i.e. some
piece of code that reads the bytes or characters from the input source, and
produces a sequence of events. The events then get distributed to the handler
objects, i.e. the reader invokes a method on the handler. A SAX application
must therefore obtain a reader object, create or open the input sources, create
the handlers, and connect these objects all together. As the final step of
preparation, the reader is called to parse the input. During parsing, methods on
the handler objects are called based on structural and syntactic events from the
input data.
For these objects, only the interfaces are relevant; they are normally not
instantiated by the application itself. Since Python does not have an explicit
notion of interface, they are formally introduced as classes, but applications
may use implementations which do not inherit from the provided classes. The
InputSource, Locator, Attributes,
AttributesNS, and XMLReader interfaces are defined in the
module xml.sax.xmlreader. The handler interfaces are defined in
xml.sax.handler. For convenience, InputSource (which is often
instantiated directly) and the handler classes are also available from
xml.sax. These interfaces are described below.
In addition to these classes, xml.sax provides the following exception
classes.
Encapsulate an XML error or warning. This class can contain basic error or
warning information from either the XML parser or the application: it can be
subclassed to provide additional functionality or to add localization. Note
that although the handlers defined in the ErrorHandler interface
receive instances of this exception, it is not required to actually raise the
exception — it is also useful as a container for information.
When instantiated, msg should be a human-readable description of the error.
The optional exception parameter, if given, should be None or an exception
that was caught by the parsing code and is being passed along as information.
This is the base class for the other SAX exception classes.
Subclass of SAXException raised on parse errors. Instances of this class
are passed to the methods of the SAX ErrorHandler interface to provide
information about the parse error. This class supports the SAX Locator
interface as well as the SAXException interface.
Subclass of SAXException raised when a SAX XMLReader is
confronted with an unrecognized feature or property. SAX applications and
extensions may use this class for similar purposes.
Subclass of SAXException raised when a SAX XMLReader is asked to
enable a feature that is not supported, or to set a property to a value that the
implementation does not support. SAX applications and extensions may use this
class for similar purposes.
This site is the focal point for the definition of the SAX API. It provides a
Java implementation and online documentation. Links to implementations and
historical information are also available.
The SAX API defines four kinds of handlers: content handlers, DTD handlers,
error handlers, and entity resolvers. Applications normally only need to
implement those interfaces whose events they are interested in; they can
implement the interfaces in a single object or in multiple objects. Handler
implementations should inherit from the base classes provided in the module
xml.sax.handler, so that all methods get default implementations.
This is the main callback interface in SAX, and the one most important to
applications. The order of events in this interface mirrors the order of the
information in the document.
Basic interface for resolving entities. If you create an object implementing
this interface, then register the object with your Parser, the parser will call
the method in your object to resolve all external entities.
Interface used by the parser to present error and warning messages to the
application. The methods of this object control whether errors are immediately
converted to exceptions or are handled in some other way.
In addition to these classes, xml.sax.handler provides symbolic constants
for the feature and property names.
Users are expected to subclass ContentHandler to support their
application. The following methods are called by the parser on the appropriate
events in the input document:
Called by the parser to give the application a locator for locating the origin
of document events.
SAX parsers are strongly encouraged (though not absolutely required) to supply a
locator: if a parser does so, it must supply the locator to the application by
invoking this method before invoking any of the other methods in the
DocumentHandler interface.
The locator allows the application to determine the end position of any
document-related event, even if the parser is not reporting an error. Typically,
the application will use this information for reporting its own errors (such as
character content that does not match an application’s business rules). The
information returned by the locator is probably not sufficient for use with a
search engine.
Note that the locator will return correct information only during the invocation
of the events in this interface. The application should not attempt to use it at
any other time.
The SAX parser will invoke this method only once, and it will be the last method
invoked during the parse. The parser shall not invoke this method until it has
either abandoned parsing (because of an unrecoverable error) or reached the end
of input.
Begin the scope of a prefix-URI Namespace mapping.
The information from this event is not necessary for normal Namespace
processing: the SAX XML reader will automatically replace prefixes for element
and attribute names when the feature_namespaces feature is enabled (the
default).
There are cases, however, when applications need to use prefixes in character
data or in attribute values, where they cannot safely be expanded automatically;
the startPrefixMapping() and endPrefixMapping() events supply the
information to the application to expand prefixes in those contexts itself, if
necessary.
Signals the start of an element in non-namespace mode.
The name parameter contains the raw XML 1.0 name of the element type as a
string and the attrs parameter holds an object of the Attributes
interface (see The Attributes Interface) containing the attributes of
the element. The object passed as attrs may be re-used by the parser; holding
on to a reference to it is not a reliable way to keep a copy of the attributes.
To keep a copy of the attributes, use the copy() method of the attrs
object.
Signals the start of an element in namespace mode.
The name parameter contains the name of the element type as a (uri, localname) tuple, the qname parameter contains the raw XML 1.0 name used in
the source document, and the attrs parameter holds an instance of the
AttributesNS interface (see The AttributesNS Interface)
containing the attributes of the element. If no namespace is associated with
the element, the uri component of name will be None. The object passed
as attrs may be re-used by the parser; holding on to a reference to it is not
a reliable way to keep a copy of the attributes. To keep a copy of the
attributes, use the copy() method of the attrs object.
Parsers may set the qname parameter to None, unless the
feature_namespace_prefixes feature is activated.
The Parser will call this method to report each chunk of character data. SAX
parsers may return all contiguous character data in a single chunk, or they may
split it into several chunks; however, all of the characters in any single event
must come from the same external entity so that the Locator provides useful
information.
content may be a string or bytes instance; the expat reader module
always produces strings.
Note
The earlier SAX 1 interface provided by the Python XML Special Interest Group
used a more Java-like interface for this method. Since most parsers used from
Python did not take advantage of the older interface, the simpler signature was
chosen to replace it. To convert old code to the new interface, use content
instead of slicing content with the old offset and length parameters.
Receive notification of ignorable whitespace in element content.
Validating Parsers must use this method to report each chunk of ignorable
whitespace (see the W3C XML 1.0 recommendation, section 2.10): non-validating
parsers may also use this method if they are capable of parsing and using
content models.
SAX parsers may return all contiguous whitespace in a single chunk, or they may
split it into several chunks; however, all of the characters in any single event
must come from the same external entity, so that the Locator provides useful
information.
The Parser will invoke this method once for each processing instruction found:
note that processing instructions may occur before or after the main document
element.
A SAX parser should never report an XML declaration (XML 1.0, section 2.8) or a
text declaration (XML 1.0, section 4.3.1) using this method.
The Parser will invoke this method once for each entity skipped. Non-validating
processors may skip entities if they have not seen the declarations (because,
for example, the entity was declared in an external DTD subset). All processors
may skip external entities, depending on the values of the
feature_external_ges and the feature_external_pes properties.
Resolve the system identifier of an entity and return either the system
identifier to read from as a string, or an InputSource to read from. The default
implementation returns systemId.
Objects with this interface are used to receive error and warning information
from the XMLReader. If you create an object that implements this
interface, then register the object with your XMLReader, the parser
will call the methods in your object to report all warnings and errors. There
are three levels of errors available: warnings, (possibly) recoverable errors,
and unrecoverable errors. All methods take a SAXParseException as the
only parameter. Errors and warnings may be converted to an exception by raising
the passed-in exception object.
Called when the parser encounters a recoverable error. If this method does not
raise an exception, parsing may continue, but further document information
should not be expected by the application. Allowing the parser to continue may
allow additional errors to be discovered in the input document.
Called when the parser presents minor warning information to the application.
Parsing is expected to continue when this method returns, and document
information will continue to be passed to the application. Raising an exception
in this method will cause parsing to end.
The module xml.sax.saxutils contains a number of classes and functions
that are commonly useful when creating SAX applications, either in direct use,
or as base classes.
Escape '&', '<', and '>' in a string of data.
You can escape other strings of data by passing a dictionary as the optional
entities parameter. The keys and values must all be strings; each key will be
replaced with its corresponding value. The characters '&', '<' and
'>' are always escaped, even if entities is provided.
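For example, a short sketch of both forms:

from xml.sax.saxutils import escape

print(escape("<tag> & more"))                  # &lt;tag&gt; &amp; more
print(escape('say "hi"', {'"': "&quot;"}))     # say &quot;hi&quot;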
Unescape '&', '<', and '>' in a string of data.
You can unescape other strings of data by passing a dictionary as the optional
entities parameter. The keys and values must all be strings; each key will be
replaced with its corresponding value. '&', '<', and '>'
are always unescaped, even if entities is provided.
Similar to escape(), but also prepares data to be used as an
attribute value. The return value is a quoted version of data with any
additional required replacements. quoteattr() will select a quote
character based on the content of data, attempting to avoid encoding any
quote characters in the string. If both single- and double-quote characters
are already in data, the double-quote characters will be encoded and data
will be wrapped in double-quotes. The resulting string can be used directly
as an attribute value:
>>> print("<element attr=%s>" % quoteattr("ab ' cd \" ef"))
<element attr="ab ' cd &quot; ef">
This function is useful when generating attribute values for HTML or any SGML
using the reference concrete syntax.
class xml.sax.saxutils.XMLGenerator(out=None, encoding='iso-8859-1', short_empty_elements=False)
This class implements the ContentHandler interface by writing SAX
events back into an XML document. In other words, using an XMLGenerator
as the content handler will reproduce the original document being parsed. out
should be a file-like object which will default to sys.stdout. encoding is
the encoding of the output stream which defaults to 'iso-8859-1'.
short_empty_elements controls the formatting of elements that contain no
content: if False (the default) they are emitted as a pair of start/end
tags, if set to True they are emitted as a single self-closed tag.
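A sketch of round-tripping a document through an XMLGenerator writing to an
in-memory text stream:

import io
import xml.sax
from xml.sax.saxutils import XMLGenerator

out = io.StringIO()
xml.sax.parseString(b"<doc><empty/></doc>", XMLGenerator(out))
print(out.getvalue())   # the reconstructed document, with an XML declaration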
This class is designed to sit between an XMLReader and the client
application’s event handlers. By default, it does nothing but pass requests up
to the reader and events on to the handlers unmodified, but subclasses can
override specific methods to modify the event stream or the configuration
requests as they pass through.
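A rough sketch of a filter that lowercases element names on their way
downstream (LowercaseNames and data.xml are illustrative names):

import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator

class LowercaseNames(XMLFilterBase):
    # Rewrites element names before they reach the downstream handler.
    def startElement(self, name, attrs):
        super().startElement(name.lower(), attrs)
    def endElement(self, name):
        super().endElement(name.lower())

reader = LowercaseNames(xml.sax.make_parser())
reader.setContentHandler(XMLGenerator())
reader.parse("data.xml")   # hypothetical input file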
This function takes an input source and an optional base URL and returns a fully
resolved InputSource object ready for reading. The input source can be
given as a string, a file-like object, or an InputSource object;
parsers will use this function to implement the polymorphic source argument to
their parse() method.
SAX parsers implement the XMLReader interface. They are implemented in
a Python module, which must provide a function create_parser(). This
function is invoked by xml.sax.make_parser() with no arguments to create
a new parser object.
In some cases, it is desirable not to parse an input source at once, but to feed
chunks of the document as they become available. Note that the reader will normally
not read the entire file, but read it in chunks as well; still parse()
won’t return until the entire document is processed. So these interfaces should
be used if the blocking behaviour of parse() is not desirable.
When the parser is instantiated it is ready to begin accepting data from the
feed method immediately. After parsing has been finished with a call to close,
the reset method must be called to make the parser ready to accept new data,
either from feed or using the parse method.
Note that these methods must not be called during parsing, that is, after
parse has been called and before it returns.
By default, the class also implements the parse method of the XMLReader
interface using the feed, close and reset methods of the IncrementalParser
interface as a convenience to SAX 2.0 driver writers.
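A minimal sketch of the incremental interface with the default expat reader:

import xml.sax

parser = xml.sax.make_parser()   # the default expat reader is incremental
parser.setContentHandler(xml.sax.ContentHandler())
for chunk in (b"<doc>", b"<item/>", b"</doc>"):
    parser.feed(chunk)           # hand over data as it arrives
parser.close()                   # final well-formedness checks
parser.reset()                   # ready for the next document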
Interface for associating a SAX event with a document location. A locator object
will return valid results only during calls to DocumentHandler methods; at any
other time, the results are unpredictable. If information is not available,
methods may return None.
class xml.sax.xmlreader.InputSource(system_id=None)
Encapsulation of the information needed by the XMLReader to read
entities.
This class may include information about the public identifier, system
identifier, byte stream (possibly with character encoding information) and/or
the character stream of an entity.
Applications will create objects of this class for use in the
XMLReader.parse() method and for returning from
EntityResolver.resolveEntity.
An InputSource belongs to the application, the XMLReader is
not allowed to modify InputSource objects passed to it from the
application, although it may make copies and modify those.
This is an implementation of the Attributes interface (see section
The Attributes Interface). This is a dictionary-like object which
represents the element attributes in a startElement() call. In addition
to the most useful dictionary operations, it supports a number of other
methods as described by the interface. Objects of this class should be
instantiated by readers; attrs must be a dictionary-like object containing
a mapping from attribute names to attribute values.
class xml.sax.xmlreader.AttributesNSImpl(attrs, qnames)
Namespace-aware variant of AttributesImpl, which will be passed to
startElementNS(). It is derived from AttributesImpl, but
understands attribute names as two-tuples of namespaceURI and
localname. In addition, it provides a number of methods expecting qualified
names as they appear in the original document. This class implements the
AttributesNS interface (see section The AttributesNS Interface).
Process an input source, producing SAX events. The source object can be a
system identifier (a string identifying the input source – typically a file
name or an URL), a file-like object, or an InputSource object. When
parse() returns, the input is completely processed, and the parser object
can be discarded or reset. As a limitation, the current implementation only
accepts byte streams; processing of character streams is for further study.
Set the current EntityResolver. If no EntityResolver is set,
attempts to resolve an external entity will result in opening the system
identifier for the entity, and fail if it is not available.
Allow an application to set the locale for errors and warnings.
SAX parsers are not required to provide localization for errors and warnings; if
they cannot support the requested locale, however, they must raise a SAX
exception. Applications may request a locale change in the middle of a parse.
Return the current setting for feature featurename. If the feature is not
recognized, SAXNotRecognizedException is raised. The well-known
featurenames are listed in the module xml.sax.handler.
Set the featurename to value. If the feature is not recognized,
SAXNotRecognizedException is raised. If the feature or its setting is not
supported by the parser, SAXNotSupportedException is raised.
Return the current setting for property propertyname. If the property is not
recognized, a SAXNotRecognizedException is raised. The well-known
propertynames are listed in the module xml.sax.handler.
Set the propertyname to value. If the property is not recognized,
SAXNotRecognizedException is raised. If the property or its setting is
not supported by the parser, SAXNotSupportedException is raised.
Assume the end of the document. That will check well-formedness conditions that
can be checked only at the end, invoke handlers, and may clean up resources
allocated during parsing.
This method is called after close has been called to reset the parser so that it
is ready to parse new documents. The results of calling parse or feed after
close without calling reset are undefined.
Set the byte stream (a Python file-like object which does not perform
byte-to-character conversion) for this input source.
The SAX parser will ignore this if there is also a character stream specified,
but it will use a byte stream in preference to opening a URI connection itself.
If the application knows the character encoding of the byte stream, it should
set it with the setEncoding method.
Set the character stream for this input source. (The stream must be a
text-mode file-like object, i.e. one that reads and returns strings.)
If there is a character stream specified, the SAX parser will ignore any byte
stream and will not attempt to open a URI connection to the system identifier.
Attributes objects implement a portion of the mapping protocol,
including the methods copy(), get(), __contains__(),
items(), keys(), and values(). The following methods
are also provided:
This interface is a subtype of the Attributes interface (see section
The Attributes Interface). All methods supported by that interface are also
available on AttributesNS objects.
The Element type is a flexible container object, designed to store
hierarchical data structures in memory. The type can be described as a cross
between a list and a dictionary.
Each element has a number of properties associated with it:
a tag which is a string identifying what kind of data this element represents
(the element type, in other words).
a number of attributes, stored in a Python dictionary.
a text string.
an optional tail string.
a number of child elements, stored in a Python sequence
To create an element instance, use the Element constructor or the
SubElement() factory function.
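A short sketch of building and serializing a small tree:

from xml.etree.ElementTree import Element, SubElement, tostring

root = Element("root")
child = SubElement(root, "child", attrib={"name": "a"})
child.text = "hello"
child.tail = "\n"
print(tostring(root))   # b'<root><child name="a">hello</child>\n</root>'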
The ElementTree class can be used to wrap an element structure, and
convert it from and to XML.
A C implementation of this API is available as xml.etree.cElementTree.
See http://effbot.org/zone/element-index.htm for tutorials and links to other
docs. Fredrik Lundh’s page is also the location of the development version of
the xml.etree.ElementTree.
Changed in version 3.2: The ElementTree API is updated to 1.3. For more information, see
Introducing ElementTree 1.3.
Comment element factory. This factory function creates a special element
that will be serialized as an XML comment by the standard serializer. The
comment string can be either a bytestring or a Unicode string. text is a
string containing the comment string. Returns an element instance
representing a comment.
Parses an XML document from a sequence of string fragments. sequence is a
list or other sequence containing XML data fragments. parser is an
optional parser instance. If not given, the standard XMLParser
parser is used. Returns an Element instance.
Parses an XML section into an element tree incrementally, and reports what’s
going on to the user. source is a filename or file object containing
XML data. events is a list of events to report back. If omitted, only “end”
events are reported. parser is an optional parser instance. If not
given, the standard XMLParser parser is used. Returns an
iterator providing (event, elem) pairs.
Note
iterparse() only guarantees that it has seen the “>”
character of a starting tag when it emits a “start” event, so the
attributes are defined, but the contents of the text and tail attributes
are undefined at that point. The same applies to the element children;
they may or may not be present.
If you need a fully populated element, look for “end” events instead.
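For example, a sketch iterating over the default “end” events:

import io
from xml.etree.ElementTree import iterparse

source = io.BytesIO(b"<root><item>1</item><item>2</item></root>")
for event, elem in iterparse(source):   # "end" events only, by default
    print(event, elem.tag, elem.text)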
Parses an XML section into an element tree. source is a filename or file
object containing XML data. parser is an optional parser instance. If
not given, the standard XMLParser parser is used. Returns an
ElementTree instance.
PI element factory. This factory function creates a special element that
will be serialized as an XML processing instruction. target is a string
containing the PI target. text is a string containing the PI contents, if
given. Returns an element instance, representing a processing instruction.
Registers a namespace prefix. The registry is global, and any existing
mapping for either the given prefix or the namespace URI will be removed.
prefix is a namespace prefix. uri is a namespace URI. Tags and
attributes in this namespace will be serialized with the given prefix, if at
all possible.
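A minimal sketch (the prefix and URI here are arbitrary):

import xml.etree.ElementTree as ET

ET.register_namespace("ex", "http://example.org/ns")
elem = ET.Element("{http://example.org/ns}item")
print(ET.tostring(elem))   # e.g. b'<ex:item xmlns:ex="http://example.org/ns" />'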
Subelement factory. This function creates an element instance, and appends
it to an existing element.
The element name, attribute names, and attribute values can be either
bytestrings or Unicode strings. parent is the parent element. tag is
the subelement name. attrib is an optional dictionary, containing element
attributes. extra contains additional attributes, given as keyword
arguments. Returns an element instance.
Generates a string representation of an XML element, including all
subelements. element is an Element instance. encoding is
the output encoding (default is US-ASCII). Use encoding="unicode" to
generate a Unicode string. method is either "xml",
"html" or "text" (default is "xml"). Returns an (optionally)
encoded string containing the XML data.
Generates a string representation of an XML element, including all
subelements. element is an Element instance. encoding is
the output encoding (default is US-ASCII). Use encoding="unicode" to
generate a Unicode string. method is either "xml",
"html" or "text" (default is "xml"). Returns a list of
(optionally) encoded strings containing the XML data. It does not guarantee
any specific sequence, except that "".join(tostringlist(element)) == tostring(element).
Parses an XML section from a string constant. This function can be used to
embed “XML literals” in Python code. text is a string containing XML
data. parser is an optional parser instance. If not given, the standard
XMLParser parser is used. Returns an Element instance.
Parses an XML section from a string constant, and also returns a dictionary
which maps from element id:s to elements. text is a string containing XML
data. parser is an optional parser instance. If not given, the standard
XMLParser parser is used. Returns a tuple containing an
Element instance and a dictionary.
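For example, a short sketch:

from xml.etree.ElementTree import XMLID

root, ids = XMLID('<doc><p id="intro">hi</p></doc>')
print(ids["intro"].text)   # 'hi'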
class xml.etree.ElementTree.Element(tag, attrib={}, **extra)
Element class. This class defines the Element interface, and provides a
reference implementation of this interface.
The element name, attribute names, and attribute values can be either
bytestrings or Unicode strings. tag is the element name. attrib is
an optional dictionary, containing element attributes. extra contains
additional attributes, given as keyword arguments.
The text attribute can be used to hold additional data associated with
the element. As the name implies this attribute is usually a string but
may be any application-specific object. If the element is created from
an XML file the attribute will contain any text found between the element
tags.
The tail attribute can be used to hold additional data associated with
the element. This attribute is usually a string but may be any
application-specific object. If the element is created from an XML file
the attribute will contain any text found after the element’s end tag and
before the next tag.
A dictionary containing the element’s attributes. Note that while the
attrib value is always a real mutable Python dictionary, an ElementTree
implementation may choose to use another internal representation, and
create the dictionary only if someone asks for it. To take advantage of
such implementations, use the dictionary methods below whenever possible.
The following dictionary-like methods work on the element attributes.
Finds text for the first subelement matching match. match may be
a tag name or path. Returns the text content of the first matching
element, or default if no element was found. Note that if the matching
element has no text content an empty string is returned.
Creates a tree iterator with the current element as the root.
The iterator iterates over this element and all elements below it, in
document (depth first) order. If tag is not None or '*', only
elements whose tag equals tag are returned from the iterator. If the
tree structure is modified during iteration, the result is undefined.
Removes subelement from the element. Unlike the find* methods this
method compares elements based on the instance identity, not on tag value
or contents.
Caution: Elements with no subelements will test as False. This behavior
will change in future versions. Use a specific len(elem) or elem is None
test instead.
element = root.find('foo')

if not element:  # careful!
    print("element not found, or element has no subelements")

if element is None:
    print("element not found")
Replaces the root element for this tree. This discards the current
contents of the tree, and replaces it with the given element. Use with
care. element is an element instance.
Finds the first toplevel element matching match. match may be a tag
name or path. Same as getroot().find(match). Returns the first matching
element, or None if no element was found.
Finds all matching subelements, by tag name or path. Same as
getroot().findall(match). match may be a tag name or path. Returns a
list containing all matching elements, in document order.
Finds the element text for the first toplevel element with given tag.
Same as getroot().findtext(match). match may be a tag name or path.
default is the value to return if the element was not found. Returns
the text content of the first matching element, or the default value if no
element was found. Note that if the element is found, but has no text
content, this method returns an empty string.
Creates and returns a tree iterator for the root element. The iterator
loops over all elements in this tree, in section order. tag is the tag
to look for (default is to return all elements).
Finds all matching subelements, by tag name or path. Same as
getroot().iterfind(match). Returns an iterable yielding all matching
elements in document order.
Loads an external XML section into this element tree. source is a file
name or file object. parser is an optional parser instance.
If not given, the standard XMLParser parser is used. Returns the section
root element.
Writes the element tree to a file, as XML. file is a file name, or a
file object opened for writing. encoding[1] is the output encoding
(default is US-ASCII). Use encoding="unicode" to write a Unicode string.
xml_declaration controls if an XML declaration
should be added to the file. Use False for never, True for always, None
for only if not US-ASCII or UTF-8 or Unicode (default is None). method is
either "xml", "html" or "text" (default is "xml").
Returns an (optionally) encoded string.
This is the XML file that is going to be manipulated:
<html>
<head>
<title>Example page</title>
</head>
<body>
<p>Moved to <a href="http://example.org/">example.org</a>
or <a href="http://example.com/">example.com</a>.</p>
</body>
</html>
Example of changing the attribute “target” of every link in first paragraph:
>>> from xml.etree.ElementTree import ElementTree
>>> tree = ElementTree()
>>> tree.parse("index.xhtml")
<Element 'html' at 0xb77e6fac>
>>> p = tree.find("body/p")     # Finds first occurrence of tag p in body
>>> p
<Element 'p' at 0xb77ec26c>
>>> links = list(p.iter("a"))   # Returns list of all links
>>> links
[<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>]
>>> for i in links:             # Iterates through all found links
...     i.attrib["target"] = "blank"
>>> tree.write("output.xhtml")
class xml.etree.ElementTree.QName(text_or_uri, tag=None)
QName wrapper. This can be used to wrap a QName attribute value, in order
to get proper namespace handling on output. text_or_uri is a string
containing the QName value, in the form {uri}local, or, if the tag argument
is given, the URI part of a QName. If tag is given, the first argument is
interpreted as a URI, and this argument is interpreted as a local name.
QName instances are opaque.
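For illustration, a hedged sketch (the namespace URI here is made up);
serializing an element whose tag is a QName lets ElementTree emit a generated
namespace prefix on output:

from xml.etree.ElementTree import Element, QName, tostring

tag = QName('http://example.com/ns', 'item')   # URI part plus local name
elem = Element(tag)
print(tostring(elem))   # e.g. b'<ns0:item xmlns:ns0="http://example.com/ns" />'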
class xml.etree.ElementTree.TreeBuilder(element_factory=None)
Generic element structure builder. This builder converts a sequence of
start, data, and end method calls to a well-formed element structure. You
can use this class to build an element structure using a custom XML parser,
or a parser for some other XML-like format. The element_factory is called
to create new Element instances when given.
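A minimal sketch of driving a TreeBuilder by hand with start/data/end calls,
as a custom parser would:

from xml.etree.ElementTree import TreeBuilder, tostring

builder = TreeBuilder()
builder.start('root', {})        # open the <root> element
builder.data('payload')          # add character data
builder.end('root')              # close it again
root = builder.close()           # returns the toplevel Element
print(tostring(root))            # b'<root>payload</root>'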
Handles a doctype declaration. name is the doctype name. pubid is
the public identifier. system is the system identifier. This method
does not exist on the default TreeBuilder class.
class xml.etree.ElementTree.XMLParser(html=0, target=None, encoding=None)
Element structure builder for XML source data, based on the expat
parser. html are predefined HTML entities. This flag is not supported by
the current implementation. target is the target object. If omitted, the
builder uses an instance of the standard TreeBuilder class. encoding[1]
is optional. If given, the value overrides the encoding specified in the
XML file.
XMLParser.feed() calls target’s start() method
for each opening tag, its end() method for each closing tag,
and data is processed by method data(). XMLParser.close()
calls target’s method close().
XMLParser can be used not only for building a tree structure.
This is an example of counting the maximum depth of an XML file:
>>> from xml.etree.ElementTree import XMLParser
>>> class MaxDepth:                     # The target object of the parser
...     maxDepth = 0
...     depth = 0
...     def start(self, tag, attrib):   # Called for each opening tag.
...         self.depth += 1
...         if self.depth > self.maxDepth:
...             self.maxDepth = self.depth
...     def end(self, tag):             # Called for each closing tag.
...         self.depth -= 1
...     def data(self, data):
...         pass                        # We do not need to do anything with data.
...     def close(self):                # Called when all data has been parsed.
...         return self.maxDepth
...
>>> target = MaxDepth()
>>> parser = XMLParser(target=target)
>>> exampleXml = """
... <a>
...   <b>
...   </b>
...   <b>
...     <c>
...       <d>
...       </d>
...     </c>
...   </b>
... </a>"""
>>> parser.feed(exampleXml)
>>> parser.close()
4
The modules described in this chapter implement Internet protocols and support
for related technology. They are all implemented in Python. Most of these
modules require the presence of the system-dependent module socket, which
is currently supported on most popular platforms. Here is an overview:
The webbrowser module provides a high-level interface to allow displaying
Web-based documents to users. Under most circumstances, simply calling the
open() function from this module will do the right thing.
Under Unix, graphical browsers are preferred under X11, but text-mode browsers
will be used if graphical browsers are not available or an X11 display isn’t
available. If text-mode browsers are used, the calling process will block until
the user exits the browser.
If the environment variable BROWSER exists, it is interpreted to
override the platform default list of browsers, as an os.pathsep-separated
list of browsers to try in order. When the value of a list part contains the
string %s, then it is interpreted as a literal browser command line to be
used with the argument URL substituted for %s; if the part does not contain
%s, it is simply interpreted as the name of the browser to launch. [1]
For non-Unix platforms, or when a remote browser is available on Unix, the
controlling process will not wait for the user to finish with the browser, but
allow the remote browser to maintain its own windows on the display. If remote
browsers are not available on Unix, the controlling process will launch a new
browser and wait.
The script webbrowser can be used as a command-line interface for the
module. It accepts a URL as the argument. It accepts the following optional
parameters: -n opens the URL in a new browser window, if possible;
-t opens the URL in a new browser page (“tab”). The options are,
naturally, mutually exclusive.
Display url using the default browser. If new is 0, the url is opened
in the same browser window if possible. If new is 1, a new browser window
is opened if possible. If new is 2, a new browser page (“tab”) is opened
if possible. If autoraise is True, the window is raised if possible
(note that under many window managers this will occur regardless of the
setting of this variable).
Note that on some platforms, trying to open a filename using this function
may work and start the operating system’s associated program. However, this
is neither supported nor portable.
Return a controller object for the browser type using. If using is
None, return a controller for a default browser appropriate to the
caller’s environment.
Register the browser type name. Once a browser type is registered, the
get() function can return a controller for that browser type. If
instance is not provided, or is None, constructor will be called without
parameters to create an instance when needed. If instance is provided,
constructor will never be called, and may be None.
This entry point is only useful if you plan to either set the BROWSER
variable or call get() with a nonempty argument matching the name of a
handler you declare.
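As a hedged sketch (the browser path and the name 'my-browser' are
hypothetical), registering an instance and then retrieving it via get()
might look like:

import webbrowser

# Pass an instance directly, so the constructor argument may be None.
webbrowser.register('my-browser', None,
                    webbrowser.GenericBrowser('/usr/local/bin/mybrowser'))

controller = webbrowser.get('my-browser')
controller.open('http://www.python.org/')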
A number of browser types are predefined. This table gives the type names that
may be passed to the get() function and the corresponding instantiations
for the controller classes, all defined in this module.
Type Name            Class Name                       Notes
'mozilla'            Mozilla('mozilla')
'firefox'            Mozilla('mozilla')
'netscape'           Mozilla('netscape')
'galeon'             Galeon('galeon')
'epiphany'           Galeon('epiphany')
'skipstone'          BackgroundBrowser('skipstone')
'kfmclient'          Konqueror()                      (1)
'konqueror'          Konqueror()                      (1)
'kfm'                Konqueror()                      (1)
'mosaic'             BackgroundBrowser('mosaic')
'opera'              Opera()
'grail'              Grail()
'links'              GenericBrowser('links')
'elinks'             Elinks('elinks')
'lynx'               GenericBrowser('lynx')
'w3m'                GenericBrowser('w3m')
'windows-default'    WindowsDefault                   (2)
'internet-config'    InternetConfig                   (3)
'macosx'             MacOSX('default')                (4)
Notes:
(1) “Konqueror” is the file manager for the KDE desktop environment for Unix, and
only makes sense to use if KDE is running. Some way of reliably detecting KDE
would be nice; the KDEDIR variable is not sufficient. Note also that
the name “kfm” is used even when using the konqueror command with KDE
2 — the implementation selects the best strategy for running Konqueror.
(2) Only on Windows platforms.
(3) Only on Mac OS platforms; requires the standard MacPython ic module.
(4) Only on Mac OS X platform.
Here are some simple examples:
import webbrowser

url = 'http://www.python.org/'

# Open URL in a new tab, if a browser window is already open.
webbrowser.open_new_tab(url + 'doc/')

# Open URL in new window, raising the window if possible.
webbrowser.open_new(url)
Display url using the browser handled by this controller. If new is 1, a new
browser window is opened if possible. If new is 2, a new browser page (“tab”)
is opened if possible.
A CGI script is invoked by an HTTP server, usually to process user input
submitted through an HTML <FORM> or <ISINDEX> element.
Most often, CGI scripts live in the server’s special cgi-bin directory.
The HTTP server places all sorts of information about the request (such as the
client’s hostname, the requested URL, the query string, and lots of other
goodies) in the script’s shell environment, executes the script, and sends the
script’s output back to the client.
The script’s input is connected to the client too, and sometimes the form data
is read this way; at other times the form data is passed via the “query string”
part of the URL. This module is intended to take care of the different cases
and provide a simpler interface to the Python script. It also provides a number
of utilities that help in debugging scripts, and the latest addition is support
for file uploads from a form (if your browser supports it).
The output of a CGI script should consist of two sections, separated by a blank
line. The first section contains a number of headers, telling the client what
kind of data is following. Python code to generate a minimal header section
looks like this:
print("Content-Type: text/html")# HTML is followingprint()# blank line, end of headers
The second section is usually HTML, which allows the client software to display
nicely formatted text with header, in-line images, etc. Here’s Python code that
prints a simple piece of HTML:
print("<TITLE>CGI script output</TITLE>")print("<H1>This is my first CGI script</H1>")print("Hello, world!")
When you write a new script, consider adding these lines:
import cgitb
cgitb.enable()
This activates a special exception handler that will display detailed reports in
the Web browser if any errors occur. If you’d rather not show the guts of your
program to users of your script, you can have the reports saved to files
instead, with code like this:
import cgitb
cgitb.enable(display=0, logdir="/tmp")
It’s very helpful to use this feature during script development. The reports
produced by cgitb provide information that can save you a lot of time in
tracking down bugs. You can always remove the cgitb line later when you
have tested your script and are confident that it works correctly.
To get at submitted form data, use the FieldStorage class. Instantiate
it exactly once, without arguments. This reads the form contents from standard
input or the environment (depending on the value of various environment
variables set according to the CGI standard). Since it may consume standard
input, it should be instantiated only once.
The FieldStorage instance can be indexed like a Python dictionary.
It allows membership testing with the in operator, and also supports
the standard dictionary method keys() and the built-in function
len(). Form fields containing empty strings are ignored and do not appear
in the dictionary; to keep such values, provide a true value for the optional
keep_blank_values keyword parameter when creating the FieldStorage
instance.
For instance, the following code (which assumes that the
Content-Type header and blank line have already been printed)
checks that the fields name and addr are both set to a non-empty
string:
form = cgi.FieldStorage()
if "name" not in form or "addr" not in form:
    print("<H1>Error</H1>")
    print("Please fill in the name and addr fields.")
    return

print("<p>name:", form["name"].value)
print("<p>addr:", form["addr"].value)
...further form processing here...
Here the fields, accessed through form[key], are themselves instances of
FieldStorage (or MiniFieldStorage, depending on the form
encoding). The value attribute of the instance yields the string value
of the field. The getvalue() method returns this string value directly;
it also accepts an optional second argument as a default to return if the
requested key is not present.
If the submitted form data contains more than one field with the same name, the
object retrieved by form[key] is not a FieldStorage or
MiniFieldStorage instance but a list of such instances. Similarly, in
this situation, form.getvalue(key) would return a list of strings. If you
expect this possibility (when your HTML form contains multiple fields with the
same name), use the getlist() function, which always returns a list of
values (so that you do not need to special-case the single item case). For
example, this code concatenates any number of username fields, separated by
commas:
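A short sketch of that pattern (the field name username is assumed from the
surrounding text):

value = form.getlist("username")
usernames = ",".join(value)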
If a field represents an uploaded file, accessing the value via the
value attribute or the getvalue() method reads the entire file in
memory as a string. This may not be what you want. You can test for an uploaded
file by testing either the filename attribute or the file
attribute. You can then read the data at leisure from the file
attribute:
fileitem=form["userfile"]iffileitem.file:# It's an uploaded file; count lineslinecount=0whileTrue:line=fileitem.file.readline()ifnotline:breaklinecount=linecount+1
If an error is encountered when obtaining the contents of an uploaded file
(for example, when the user interrupts the form submission by clicking on
a Back or Cancel button) the done attribute of the object for the
field will be set to the value -1.
The file upload draft standard entertains the possibility of uploading multiple
files from one field (using a recursive multipart/* encoding).
When this occurs, the item will be a dictionary-like FieldStorage item.
This can be determined by testing its type attribute, which should be
multipart/form-data (or perhaps another MIME type matching
multipart/*). In this case, it can be iterated over recursively
just like the top-level form object.
When a form is submitted in the “old” format (as the query string or as a single
data part of type application/x-www-form-urlencoded), the items will
actually be instances of the class MiniFieldStorage. In this case, the
list, file, and filename attributes are always None.
A form submitted via POST that also has a query string will contain both
FieldStorage and MiniFieldStorage items.
The previous section explains how to read CGI form data using the
FieldStorage class. This section describes a higher level interface
which was added to this class to allow one to do it in a more readable and
intuitive way. The interface doesn’t make the techniques described in previous
sections obsolete — they are still useful to process file uploads efficiently,
for example.
The interface consists of two simple methods. Using the methods you can process
form data in a generic way, without needing to worry whether one or more
values were posted under a given name.
In the previous section, you learned to write the following code anytime you
expected a user to post more than one value under one name:

item = form.getvalue("item")
if isinstance(item, list):
    pass  # The user is requesting more than one item.
else:
    pass  # The user is requesting only one item.
This situation is common for example when a form contains a group of multiple
checkboxes with the same name:
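For instance, form markup along these lines (a hypothetical fragment; the
field name matches the item examples nearby) posts several values under one
name:

<input type="checkbox" name="item" value="1" />
<input type="checkbox" name="item" value="2" />
<input type="checkbox" name="item" value="3" />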
In most situations, however, there’s only one form control with a particular
name in a form and then you expect and need only one value associated with this
name. So you write a script containing for example this code:
user = form.getvalue("user").upper()
The problem with the code is that you should never expect that a client will
provide valid input to your scripts. For example, if a curious user appends
another user=foo pair to the query string, then the script would crash,
because in this situation the getvalue("user") method call returns a list
instead of a string. Calling the upper() method on a list is not valid
(since lists do not have a method of this name) and results in an
AttributeError exception.
Therefore, the appropriate way to read form data values was to always use the
code which checks whether the obtained value is a single value or a list of
values. That’s annoying and leads to less readable scripts.
A more convenient approach is to use the methods getfirst() and
getlist() provided by this higher level interface.
This method always returns only one value associated with form field name.
The method returns only the first value when more than one value was posted
under that name. Please note that the order in which the values are received
may vary from browser to browser and should not be counted on. [1] If no such
form field or value exists then the method returns the value specified by the
optional parameter default. This parameter defaults to None if not
specified.
This method always returns a list of values associated with form field name.
The method returns an empty list if no such form field or value exists for
name. It returns a list consisting of one item if only one such value exists.
Using these methods you can write nice compact code:
import cgi

form = cgi.FieldStorage()
user = form.getfirst("user", "").upper()    # This way it's safe.
for item in form.getlist("item"):
    do_something(item)
Parse a query in the environment or from a file (the file defaults to
sys.stdin). The keep_blank_values and strict_parsing parameters are
passed to urllib.parse.parse_qs() unchanged.
Parse input of type multipart/form-data (for file uploads).
Arguments are fp for the input file and pdict for a dictionary containing
other parameters in the Content-Type header.
Returns a dictionary just like urllib.parse.parse_qs(): keys are the field names, each
value is a list of values for that field. This is easy to use but not much good
if you are expecting megabytes to be uploaded — in that case, use the
FieldStorage class instead which is much more flexible.
Note that this does not parse nested multipart parts — use
FieldStorage for that.
Convert the characters '&', '<' and '>' in string s to HTML-safe
sequences. Use this if you need to display text that might contain such
characters in HTML. If the optional flag quote is true, the quotation mark
character (") is also translated; this helps for inclusion in an HTML
attribute value delimited by double quotes, as in <a href="...">. Note
that single quotes are never translated.
Deprecated since version 3.2: This function is unsafe because quote is false by
default, and is therefore deprecated. Use html.escape() instead.
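For the recommended replacement, a one-line sketch:

import html

print(html.escape('<a href="x">M & N</a>'))
# prints: &lt;a href=&quot;x&quot;&gt;M &amp; N&lt;/a&gt;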
There’s one important rule: if you invoke an external program (via the
os.system() or os.popen() functions, or others with similar
functionality), make very sure you don’t pass arbitrary strings received from
the client to the shell. This is a well-known security hole whereby clever
hackers anywhere on the Web can exploit a gullible CGI script to invoke
arbitrary shell commands. Even parts of the URL or field names cannot be
trusted, since the request doesn’t have to come from your form!
To be on the safe side, if you must pass a string gotten from a form to a shell
command, you should make sure the string contains only alphanumeric characters,
dashes, underscores, and periods.
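One possible sketch of such a check (the helper name is ours, not part of the
cgi module):

import re

def is_shell_safe(s):
    # Accept only alphanumerics, dashes, underscores, and periods.
    return re.match(r'\A[A-Za-z0-9._-]+\Z', s) is not None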
Read the documentation for your HTTP server and check with your local system
administrator to find the directory where CGI scripts should be installed;
usually this is in a directory cgi-bin in the server tree.
Make sure that your script is readable and executable by “others”; the Unix file
mode should be 0o755 octal (use chmod 0755 filename). Make sure that the
first line of the script contains #! starting in column 1 followed by the
pathname of the Python interpreter, for instance:
#!/usr/local/bin/python
Make sure the Python interpreter exists and is executable by “others”.
Make sure that any files your script needs to read or write are readable or
writable, respectively, by “others” — their mode should be 0o644 for
readable and 0o666 for writable. This is because, for security reasons, the
HTTP server executes your script as user “nobody”, without any special
privileges. It can only read (write, execute) files that everybody can read
(write, execute). The current directory at execution time is also different (it
is usually the server’s cgi-bin directory) and the set of environment variables
is also different from what you get when you log in. In particular, don’t count
on the shell’s search path for executables (PATH) or the Python module
search path (PYTHONPATH) to be set to anything interesting.
If you need to load modules from a directory which is not on Python’s default
module search path, you can change the path in your script, before importing
other modules. For example:
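import sys
sys.path.insert(0, "/usr/home/joe/lib/python")
sys.path.insert(0, "/usr/local/lib/python")

(The directories shown are placeholders for your own library locations.)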
Unfortunately, a CGI script will generally not run when you try it from the
command line, and a script that works perfectly from the command line may fail
mysteriously when run from the server. There’s one reason why you should still
test your script from the command line: if it contains a syntax error, the
Python interpreter won’t execute it at all, and the HTTP server will most likely
send a cryptic error to the client.
Assuming your script has no syntax errors, yet it does not work, you have no
choice but to read the next section.
First of all, check for trivial installation errors — reading the section
above on installing your CGI script carefully can save you a lot of time. If
you wonder whether you have understood the installation procedure correctly, try
installing a copy of this module file (cgi.py) as a CGI script. When
invoked as a script, the file will dump its environment and the contents of the
form in HTML form. Give it the right mode etc, and send it a request. If it’s
installed in the standard cgi-bin directory, it should be possible to
send it a request by entering a URL into your browser of the form:
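http://yourhostname/cgi-bin/cgi.py?name=Joe+Blow&addr=At+Home

(where yourhostname is a placeholder for your server’s host name)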
If this gives an error of type 404, the server cannot find the script – perhaps
you need to install it in a different directory. If it gives another error,
there’s an installation problem that you should fix before trying to go any
further. If you get a nicely formatted listing of the environment and form
content (in this example, the fields should be listed as “addr” with value “At
Home” and “name” with value “Joe Blow”), the cgi.py script has been
installed correctly. If you follow the same procedure for your own script, you
should now be able to debug it.
The next step could be to call the cgi module’s test() function
from your script: replace its main code with the single statement
cgi.test()
This should produce the same results as those gotten from installing the
cgi.py file itself.
When an ordinary Python script raises an unhandled exception (for whatever
reason: a typo in a module name, a file that can’t be opened, etc.), the
Python interpreter prints a nice traceback and exits. While the Python
interpreter will still do this when your CGI script raises an exception, most
likely the traceback will end up in one of the HTTP server’s log files, or be
discarded altogether.
Fortunately, once you have managed to get your script to execute some code,
you can easily send tracebacks to the Web browser using the cgitb module.
If you haven’t done so already, just add the lines:
import cgitb
cgitb.enable()
to the top of your script. Then try running it again; when a problem occurs,
you should see a detailed report that will likely make apparent the cause of the
crash.
If you suspect that there may be a problem in importing the cgitb module,
you can use an even more robust approach (which only uses built-in modules):
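import sys
sys.stderr = sys.stdout
print("Content-Type: text/plain")
print()
...your code here...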
This relies on the Python interpreter to print the traceback. The content type
of the output is set to plain text, which disables all HTML processing. If your
script works, the raw HTML will be displayed by your client. If it raises an
exception, most likely after the first two lines have been printed, a traceback
will be displayed. Because no HTML interpretation is going on, the traceback
will be readable.
Most HTTP servers buffer the output from CGI scripts until the script is
completed. This means that it is not possible to display a progress report on
the client’s display while the script is running.
Check the installation instructions above.
Check the HTTP server’s log files. (tail -f logfile in a separate window
may be useful!)
Always check a script for syntax errors first, by doing something like
python script.py.
If your script does not have any syntax errors, try adding import cgitb; cgitb.enable() to the top of the script.
When invoking external programs, make sure they can be found. Usually, this
means using absolute path names — PATH is usually not set to a very
useful value in a CGI script.
When reading or writing external files, make sure they can be read or written
by the userid under which your CGI script will be running: this is typically the
userid under which the web server is running, or some explicitly specified
userid for a web server’s suexec feature.
Don’t try to give a CGI script a set-uid mode. This doesn’t work on most
systems, and is a security liability as well.
Note that some recent versions of the HTML specification do state what
order the field values should be supplied in, but knowing whether a request
was received from a conforming browser, or even from a browser at all, is
tedious and error-prone.
The cgitb module provides a special exception handler for Python scripts.
(Its name is a bit misleading. It was originally designed to display extensive
traceback information in HTML for CGI scripts. It was later generalized to also
display this information in plain text.) After this module is activated, if an
uncaught exception occurs, a detailed, formatted report will be displayed. The
report includes a traceback showing excerpts of the source code for each level,
as well as the values of the arguments and local variables to currently running
functions, to help you debug the problem. Optionally, you can save this
information to a file instead of sending it to the browser.
To enable this feature, simply add this to the top of your CGI script:
import cgitb
cgitb.enable()
The options to the enable() function control whether the report is
displayed in the browser and whether the report is logged to a file for later
analysis.
This function causes the cgitb module to take over the interpreter’s
default handling for exceptions by setting the value of sys.excepthook.
The optional argument display defaults to 1 and can be set to 0 to
suppress sending the traceback to the browser. If the argument logdir is
present, the traceback reports are written to files. The value of logdir
should be a directory where these files will be placed. The optional argument
context is the number of lines of context to display around the current line
of source code in the traceback; this defaults to 5. If the optional
argument format is "html", the output is formatted as HTML. Any other
value forces plain text output. The default value is "html".
This function handles an exception using the default settings (that is, show a
report in the browser, but don’t log to a file). This can be used when you’ve
caught an exception and want to report it using cgitb. The optional
info argument should be a 3-tuple containing an exception type, exception
value, and traceback object, exactly like the tuple returned by
sys.exc_info(). If the info argument is not supplied, the current
exception is obtained from sys.exc_info().
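For instance, a hedged sketch of reporting a caught exception
(risky_operation() is a stand-in for your own code):

import sys
import cgitb

try:
    risky_operation()   # hypothetical function that may fail
except Exception:
    cgitb.handler(sys.exc_info())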
wsgiref — WSGI Utilities and Reference Implementation
The Web Server Gateway Interface (WSGI) is a standard interface between web
server software and web applications written in Python. Having a standard
interface makes it easy to use an application that supports WSGI with a number
of different web servers.
Only authors of web servers and programming frameworks need to know every detail
and corner case of the WSGI design. You don’t need to understand every detail
of WSGI just to install a WSGI application or to write a web application using
an existing framework.
wsgiref is a reference implementation of the WSGI specification that can
be used to add WSGI support to a web server or framework. It provides utilities
for manipulating WSGI environment variables and response headers, base classes
for implementing WSGI servers, a demo HTTP server that serves WSGI applications,
and a validation tool that checks WSGI servers and applications for conformance
to the WSGI specification (PEP 3333).
See http://www.wsgi.org for more information about WSGI, and links to tutorials
and other resources.
This module provides a variety of utility functions for working with WSGI
environments. A WSGI environment is a dictionary containing HTTP request
variables as described in PEP 3333. All of the functions taking an environ
parameter expect a WSGI-compliant dictionary to be supplied; please see
PEP 3333 for a detailed specification.
Return a guess for whether wsgi.url_scheme should be “http” or “https”, by
checking for a HTTPS environment variable in the environ dictionary. The
return value is a string.
This function is useful when creating a gateway that wraps CGI or a CGI-like
protocol such as FastCGI. Typically, servers providing such protocols will
include a HTTPS variable with a value of “1”, “yes”, or “on” when a request
is received via SSL. So, this function returns “https” if such a value is
found, and “http” otherwise.
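A two-line sketch of the heuristic, with hand-built environ dictionaries:

from wsgiref.util import guess_scheme

print(guess_scheme({'HTTPS': 'on'}))   # 'https'
print(guess_scheme({}))                # 'http'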
Return the full request URI, optionally including the query string, using the
algorithm found in the “URL Reconstruction” section of PEP 3333. If
include_query is false, the query string is not included in the resulting URI.
Similar to request_uri(), except that the PATH_INFO and
QUERY_STRING variables are ignored. The result is the base URI of the
application object addressed by the request.
Shift a single name from PATH_INFO to SCRIPT_NAME and return the name.
The environ dictionary is modified in-place; use a copy if you need to keep
the original PATH_INFO or SCRIPT_NAME intact.
If there are no remaining path segments in PATH_INFO, None is returned.
Typically, this routine is used to process each portion of a request URI path,
for example to treat the path as a series of dictionary keys. This routine
modifies the passed-in environment to make it suitable for invoking another WSGI
application that is located at the target URI. For example, if there is a WSGI
application at /foo, and the request URI path is /foo/bar/baz, and the
WSGI application at /foo calls shift_path_info(), it will receive the
string “bar”, and the environment will be updated to be suitable for passing to
a WSGI application at /foo/bar. That is, SCRIPT_NAME will change from
/foo to /foo/bar, and PATH_INFO will change from /bar/baz to
/baz.
When PATH_INFO is just a “/”, this routine returns an empty string and
appends a trailing slash to SCRIPT_NAME, even though empty path segments are
normally ignored, and SCRIPT_NAME doesn’t normally end in a slash. This is
intentional behavior, to ensure that an application can tell the difference
between URIs ending in /x and ones ending in /x/ when using this
routine to do object traversal.
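Continuing the /foo/bar/baz example above, a minimal sketch with a hand-built
environ dictionary:

from wsgiref.util import shift_path_info

environ = {'SCRIPT_NAME': '/foo', 'PATH_INFO': '/bar/baz'}
print(shift_path_info(environ))    # 'bar'
print(environ['SCRIPT_NAME'])      # '/foo/bar'
print(environ['PATH_INFO'])        # '/baz'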
Update environ with trivial defaults for testing purposes.
This routine adds various parameters required for WSGI, including HTTP_HOST,
SERVER_NAME, SERVER_PORT, REQUEST_METHOD, SCRIPT_NAME,
PATH_INFO, and all of the PEP 3333-defined wsgi.* variables. It
only supplies default values, and does not replace any existing settings for
these variables.
This routine is intended to make it easier for unit tests of WSGI servers and
applications to set up dummy environments. It should NOT be used by actual WSGI
servers or applications, since the data is fake!
Example usage:
from wsgiref.util import setup_testing_defaults
from wsgiref.simple_server import make_server

# A relatively simple WSGI application. It's going to print out the
# environment dictionary after being updated by setup_testing_defaults
def simple_app(environ, start_response):
    setup_testing_defaults(environ)

    status = '200 OK'
    headers = [('Content-type', 'text/plain; charset=utf-8')]

    start_response(status, headers)

    ret = [("%s: %s\n" % (key, value)).encode("utf-8")
           for key, value in environ.items()]
    return ret

httpd = make_server('', 8000, simple_app)
print("Serving on port 8000...")
httpd.serve_forever()
In addition to the environment functions above, the wsgiref.util module
also provides these miscellaneous utilities:
Return true if ‘header_name’ is an HTTP/1.1 “Hop-by-Hop” header, as defined by
RFC 2616.
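A two-line sketch:

from wsgiref.util import is_hop_by_hop

print(is_hop_by_hop('Connection'))     # True -- hop-by-hop
print(is_hop_by_hop('Content-Type'))   # False -- end-to-end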
class wsgiref.util.FileWrapper(filelike, blksize=8192)
A wrapper to convert a file-like object to an iterator. The resulting objects
support both __getitem__() and __iter__() iteration styles, for
compatibility with Python 2.1 and Jython. As the object is iterated over, the
optional blksize parameter will be repeatedly passed to the filelike
object’s read() method to obtain bytestrings to yield. When read()
returns an empty bytestring, iteration is ended and is not resumable.
If filelike has a close() method, the returned object will also have a
close() method, and it will invoke the filelike object’s close()
method when called.
Example usage:
from io import StringIO
from wsgiref.util import FileWrapper

# We're using a StringIO buffer as the file-like object
filelike = StringIO("This is an example file-like object" * 10)
wrapper = FileWrapper(filelike, blksize=5)

for chunk in wrapper:
    print(chunk)
class wsgiref.headers.Headers(headers)
Create a mapping-like object wrapping headers, which must be a list of header
name/value tuples as described in PEP 3333.
Headers objects support typical mapping operations including
__getitem__(), get(), __setitem__(), setdefault(),
__delitem__() and __contains__(). For each of
these methods, the key is the header name (treated case-insensitively), and the
value is the first value associated with that header name. Setting a header
deletes any existing values for that header, then adds a new value at the end of
the wrapped header list. Headers’ existing order is generally maintained, with
new headers added to the end of the wrapped list.
Unlike a dictionary, Headers objects do not raise an error when you try
to get or delete a key that isn’t in the wrapped header list. Getting a
nonexistent header just returns None, and deleting a nonexistent header does
nothing.
Headers objects also support keys(), values(), and
items() methods. The lists returned by keys() and items() can
include the same key more than once if there is a multi-valued header. The
len() of a Headers object is the same as the length of its
items(), which is the same as the length of the wrapped header list. In
fact, the items() method just returns a copy of the wrapped header list.
Calling bytes() on a Headers object returns a formatted bytestring
suitable for transmission as HTTP response headers. Each header is placed on a
line with its value, separated by a colon and a space. Each line is terminated
by a carriage return and line feed, and the bytestring is terminated with a
blank line.
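A hedged sketch of the mapping interface and the bytes() formatting described
above (the header values are made up):

from wsgiref.headers import Headers

h = Headers([('Content-Type', 'text/plain')])
h['X-Served-By'] = 'wsgiref'            # mapping-style assignment
print(h['content-type'])                # case-insensitive lookup
print(bytes(h))                         # formatted for an HTTP response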
In addition to their mapping interface and formatting features, Headers
objects also have the following methods for querying and adding multi-valued
headers, and for adding headers with MIME parameters:
Return a list of all the values for the named header.
The returned list will be sorted in the order they appeared in the original
header list or were added to this instance, and may contain duplicates. Any
fields deleted and re-inserted are always appended to the header list. If no
fields exist with the given name, returns an empty list.
Add a (possibly multi-valued) header, with optional MIME parameters specified
via keyword arguments.
name is the header field to add. Keyword arguments can be used to set MIME
parameters for the header field. Each parameter must be a string or None.
Underscores in parameter names are converted to dashes, since dashes are illegal
in Python identifiers, but many MIME parameter names include dashes. If the
parameter value is a string, it is added to the header value parameters in the
form name="value". If it is None, only the parameter name is added.
(This is used for MIME parameters without a value.) Example usage:
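For instance, with a Headers object h as above, the following call would
produce a header of the form Content-Disposition: attachment; filename="bud.gif":

h.add_header('content-disposition', 'attachment', filename='bud.gif')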
This module implements a simple HTTP server (based on http.server)
that serves WSGI applications. Each server instance serves a single WSGI
application on a given host and port. If you want to serve multiple
applications on a single host and port, you should create a WSGI application
that parses PATH_INFO to select which application to invoke for each
request. (E.g., using the shift_path_info() function from
wsgiref.util.)
Create a new WSGI server listening on host and port, accepting connections
for app. The return value is an instance of the supplied server_class, and
will process requests using the specified handler_class. app must be a WSGI
application object, as defined by PEP 3333.
Example usage:
from wsgiref.simple_server import make_server, demo_app

httpd = make_server('', 8000, demo_app)
print("Serving HTTP on port 8000...")

# Respond to requests until process is killed
httpd.serve_forever()

# Alternative: serve one request, then exit
httpd.handle_request()
This function is a small but complete WSGI application that returns a text page
containing the message “Hello world!” and a list of the key/value pairs provided
in the environ parameter. It’s useful for verifying that a WSGI server (such
as wsgiref.simple_server) is able to run a simple WSGI application
correctly.
class wsgiref.simple_server.WSGIServer(server_address, RequestHandlerClass)
Create a WSGIServer instance. server_address should be a
(host, port) tuple, and RequestHandlerClass should be the subclass of
http.server.BaseHTTPRequestHandler that will be used to process
requests.
You do not normally need to call this constructor, as the make_server()
function can handle all the details for you.
WSGIServer is a subclass of http.server.HTTPServer, so all
of its methods (such as serve_forever() and handle_request()) are
available. WSGIServer also provides these WSGI-specific methods:
Normally, however, you do not need to use these additional methods, as
set_app() is normally called by make_server(), and the
get_app() exists mainly for the benefit of request handler instances.
class wsgiref.simple_server.WSGIRequestHandler(request, client_address, server)
Create an HTTP handler for the given request (i.e. a socket), client_address
(a (host, port) tuple), and server (WSGIServer instance).
You do not need to create instances of this class directly; they are
automatically created as needed by WSGIServer objects. You can,
however, subclass this class and supply it as a handler_class to the
make_server() function. Some possibly relevant methods for overriding in
subclasses:
Returns a dictionary containing the WSGI environment for a request. The default
implementation copies the contents of the WSGIServer object’s
base_environ dictionary attribute and then adds various headers derived
from the HTTP request. Each call to this method should return a new dictionary
containing all of the relevant CGI environment variables as specified in
PEP 3333.
Process the HTTP request. The default implementation creates a handler instance
using a wsgiref.handlers class to implement the actual WSGI application
interface.
When creating new WSGI application objects, frameworks, servers, or middleware,
it can be useful to validate the new code’s conformance using
wsgiref.validate. This module provides a function that creates WSGI
application objects that validate communications between a WSGI server or
gateway and a WSGI application object, to check both sides for protocol
conformance.
Note that this utility does not guarantee complete PEP 3333 compliance; an
absence of errors from this module does not necessarily mean that errors do not
exist. However, if this module does produce an error, then it is virtually
certain that either the server or application is not 100% compliant.
This module is based on the paste.lint module from Ian Bicking’s “Python
Paste” library.
Wrap application and return a new WSGI application object. The returned
application will forward all requests to the original application, and will
check that both the application and the server invoking it are conforming to
the WSGI specification and to RFC 2616.
Any detected nonconformance results in an AssertionError being raised;
note, however, that how these errors are handled is server-dependent. For
example, wsgiref.simple_server and other servers based on
wsgiref.handlers (that don’t override the error handling methods to do
something else) will simply output a message that an error has occurred, and
dump the traceback to sys.stderr or some other error stream.
This wrapper may also generate output using the warnings module to
indicate behaviors that are questionable but which may not actually be
prohibited by PEP 3333. Unless they are suppressed using Python command-line
options or the warnings API, any such warnings will be written to
sys.stderr (not wsgi.errors, unless they happen to be the same
object).
Example usage:
from wsgiref.validate import validator
from wsgiref.simple_server import make_server

# Our callable object which is intentionally not compliant to the
# standard, so the validator is going to break
def simple_app(environ, start_response):
    status = '200 OK'  # HTTP Status
    headers = [('Content-type', 'text/plain')]  # HTTP Headers

    start_response(status, headers)

    # This is going to break because we need to return a list, and
    # the validator is going to inform us
    return b"Hello World"

# This is the application wrapped in a validator
validator_app = validator(simple_app)

httpd = make_server('', 8000, validator_app)
print("Listening on port 8000....")
httpd.serve_forever()
This module provides base handler classes for implementing WSGI servers and
gateways. These base classes handle most of the work of communicating with a
WSGI application, as long as they are given a CGI-like environment, along with
input, output, and error streams.
class wsgiref.handlers.CGIHandler
CGI-based invocation via sys.stdin, sys.stdout, sys.stderr and
os.environ. This is useful when you have a WSGI application and want to run
it as a CGI script. Simply invoke CGIHandler().run(app), where app is
the WSGI application object you wish to invoke.
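A hedged sketch of such a CGI script (myapp.app is a hypothetical WSGI
application object):

#!/usr/bin/env python
from wsgiref.handlers import CGIHandler
from myapp import app   # hypothetical WSGI application object

CGIHandler().run(app)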
This class is a subclass of BaseCGIHandler that sets wsgi.run_once
to true, wsgi.multithread to false, and wsgi.multiprocess to true, and
always uses sys and os to obtain the necessary CGI streams and
environment.
class wsgiref.handlers.IISCGIHandler
A specialized alternative to CGIHandler, for use when deploying on
Microsoft’s IIS web server, without having set the config allowPathInfo
option (IIS>=7) or metabase allowPathInfoForScriptMappings (IIS<7).
By default, IIS gives a PATH_INFO that duplicates the SCRIPT_NAME at
the front, causing problems for WSGI applications that wish to implement
routing. This handler strips any such duplicated path.
IIS can be configured to pass the correct PATH_INFO, but this causes
another bug where PATH_TRANSLATED is wrong. Luckily this variable is
rarely used and is not guaranteed by WSGI. On IIS<7, though, the
setting can only be made on a vhost level, affecting all other script
mappings, many of which break when exposed to the PATH_TRANSLATED bug.
For this reason IIS<7 is almost never deployed with the fix. (Even IIS7
rarely uses it because there is still no UI for it.)
There is no way for CGI code to tell whether the option was set, so a
separate handler class is provided. It is used in the same way as
CGIHandler, i.e., by calling IISCGIHandler().run(app), where
app is the WSGI application object you wish to invoke.
New in version 3.2.
class wsgiref.handlers.BaseCGIHandler(stdin, stdout, stderr, environ, multithread=True, multiprocess=False)
Similar to CGIHandler, but instead of using the sys and
os modules, the CGI environment and I/O streams are specified explicitly.
The multithread and multiprocess values are used to set the
wsgi.multithread and wsgi.multiprocess flags for any applications run by
the handler instance.
This class is a subclass of SimpleHandler intended for use with
software other than HTTP “origin servers”. If you are writing a gateway
protocol implementation (such as CGI, FastCGI, SCGI, etc.) that uses a
Status: header to send an HTTP status, you probably want to subclass this
instead of SimpleHandler.
class wsgiref.handlers.SimpleHandler(stdin, stdout, stderr, environ, multithread=True, multiprocess=False)
Similar to BaseCGIHandler, but designed for use with HTTP origin
servers. If you are writing an HTTP server implementation, you will probably
want to subclass this instead of BaseCGIHandler.
This class is a subclass of BaseHandler. It overrides the
__init__(), get_stdin(), get_stderr(), add_cgi_vars(),
_write(), and _flush() methods to support explicitly setting the
environment and streams via the constructor. The supplied environment and
streams are stored in the stdin, stdout, stderr, and
environ attributes.
This is an abstract base class for running WSGI applications. Each instance
will handle a single HTTP request, although in principle you could create a
subclass that was reusable for multiple requests.
BaseHandler instances have only one method intended for external use:

run(app)
Run the specified WSGI application, app.
All of the other BaseHandler methods are invoked by this method in the
process of running the application, and thus exist primarily to allow
customizing the process.
The following methods MUST be overridden in a subclass:
Buffer the bytes data for transmission to the client. It’s okay if this
method actually transmits the data; BaseHandler just separates write
and flush operations for greater efficiency when the underlying system actually
has such a distinction.
Insert CGI variables for the current request into the environ attribute.
Here are some other methods and attributes you may wish to override. This list
is only a summary, however, and does not include every method that can be
overridden. You should consult the docstrings and source code for additional
information before attempting to create a customized BaseHandler
subclass.
Attributes and methods for customizing the WSGI environment:
The value to be used for the wsgi.multithread environment variable. It
defaults to true in BaseHandler, but may have a different default (or
be set by the constructor) in the other subclasses.
The value to be used for the wsgi.multiprocess environment variable. It
defaults to true in BaseHandler, but may have a different default (or
be set by the constructor) in the other subclasses.
The default environment variables to be included in every request’s WSGI
environment. By default, this is a copy of os.environ at the time that
wsgiref.handlers was imported, but subclasses can create their own
at the class or instance level. Note that the dictionary should be considered
read-only, since the default value is shared between multiple classes and
instances.
If the origin_server attribute is set, this attribute’s value is used to
set the default SERVER_SOFTWARE WSGI environment variable, and also to set a
default Server: header in HTTP responses. It is ignored for handlers (such
as BaseCGIHandler and CGIHandler) that are not HTTP origin
servers.
Return the URL scheme being used for the current request. The default
implementation uses the guess_scheme() function from wsgiref.util
to guess whether the scheme should be “http” or “https”, based on the current
request’s environ variables.
Set the environ attribute to a fully-populated WSGI environment. The
default implementation uses all of the above methods and attributes, plus the
get_stdin(), get_stderr(), and add_cgi_vars() methods and the
wsgi_file_wrapper attribute. It also inserts a SERVER_SOFTWARE key
if not present, as long as the origin_server attribute is a true value
and the server_software attribute is set.
Methods and attributes for customizing exception handling:
Log the exc_info tuple in the server log. exc_info is a (type, value, traceback) tuple. The default implementation simply writes the traceback to
the request’s wsgi.errors stream and flushes it. Subclasses can override
this method to change the format or retarget the output, mail the traceback to
an administrator, or whatever other action may be deemed suitable.
This method is a WSGI application to generate an error page for the user. It is
only invoked if an error occurs before headers are sent to the client.
This method can access the current error information using sys.exc_info(),
and should pass that information to start_response when calling it (as
described in the “Error Handling” section of PEP 3333).
The default implementation just uses the error_status,
error_headers, and error_body attributes to generate an output
page. Subclasses can override this to produce more dynamic error output.
Note, however, that it’s not recommended from a security perspective to spit out
diagnostics to any old user; ideally, you should have to do something special to
enable diagnostic output, which is why the default implementation doesn’t
include any.
The HTTP headers used for error responses. This should be a list of WSGI
response headers ((name,value) tuples), as described in PEP 3333. The
default list just sets the content type to text/plain.
The error response body. This should be an HTTP response body bytestring. It
defaults to the plain text, “A server error occurred. Please contact the
administrator.”
Methods and attributes for PEP 3333‘s “Optional Platform-Specific File
Handling” feature:
Override to implement platform-specific file transmission. This method is
called only if the application’s return value is an instance of the class
specified by the wsgi_file_wrapper attribute. It should return a true
value if it was able to successfully transmit the file, so that the default
transmission code will not be executed. The default implementation of this
method just returns a false value.
This attribute should be set to a true value if the handler’s _write() and
_flush() are being used to communicate directly to the client, rather than
via a CGI-like gateway protocol that wants the HTTP status in a special
Status: header.
Transcode CGI variables from os.environ to PEP 3333 “bytes in unicode”
strings, returning a new dictionary. This function is used by
CGIHandler and IISCGIHandler in place of directly using
os.environ, which is not necessarily WSGI-compliant on all platforms
and web servers using Python 3 – specifically, ones where the OS’s
actual environment is Unicode (i.e. Windows), or ones where the environment
is bytes, but the system encoding used by Python to decode it is anything
other than ISO-8859-1 (e.g. Unix systems using UTF-8).
If you are implementing a CGI-based handler of your own, you probably want
to use this routine instead of just copying values out of os.environ
directly.
from wsgiref.simple_server import make_server

# Every WSGI application must have an application object - a callable
# object that accepts two arguments. For that purpose, we're going to
# use a function (note that you're not limited to a function, you can
# use a class for example). The first argument passed to the function
# is a dictionary containing CGI-style environment variables and the
# second variable is the callable object (see PEP 333).
def hello_world_app(environ, start_response):
    status = '200 OK'  # HTTP Status
    headers = [('Content-type', 'text/plain; charset=utf-8')]  # HTTP Headers
    start_response(status, headers)

    # The returned object is going to be printed
    return [b"Hello World"]

httpd = make_server('', 8000, hello_world_app)
print("Serving on port 8000...")

# Serve until process is killed
httpd.serve_forever()
The urllib.request module defines functions and classes which help in
opening URLs (mostly HTTP) in a complex world — basic and digest
authentication, redirections, cookies and more.
The urllib.request module defines the following functions:
Open the URL url, which can be either a string or a
Request object.
data may be a bytes object specifying additional data to send to the
server, or None if no such data is needed. data may also be an
iterable object and in that case Content-Length value must be specified in
the headers. Currently HTTP requests are the only ones that use data; the
HTTP request will be a POST instead of a GET when the data parameter is
provided. data should be a buffer in the standard
application/x-www-form-urlencoded format. The
urllib.parse.urlencode() function takes a mapping or sequence of
2-tuples and returns a string in this format. The urllib.request module uses
HTTP/1.1 and includes a Connection: close header in its HTTP requests.
The optional timeout parameter specifies a timeout in seconds for
blocking operations like the connection attempt (if not specified,
the global default timeout setting will be used). This actually
only works for HTTP, HTTPS and FTP connections.
The optional cafile and capath parameters specify a set of trusted
CA certificates for HTTPS requests. cafile should point to a single
file containing a bundle of CA certificates, whereas capath should
point to a directory of hashed certificate files. More information can
be found in ssl.SSLContext.load_verify_locations().
Warning
If neither cafile nor capath is specified, an HTTPS request
will not do any verification of the server’s certificate.
This function returns a file-like object with two additional methods from
the urllib.response module:
geturl() — return the URL of the resource retrieved,
commonly used to determine if a redirect was followed
info() — return the meta-information of the page, such as headers
Note that None may be returned if no handler handles the request (though
the default installed global OpenerDirector uses
UnknownHandler to ensure this never happens).
In addition, default installed ProxyHandler makes sure the requests
are handled through the proxy when they are set.
The legacy urllib.urlopen function from Python 2.6 and earlier has been
discontinued; urlopen() corresponds to the old urllib2.urlopen.
Proxy handling, which was done by passing a dictionary parameter to
urllib.urlopen, can be obtained by using ProxyHandler objects.
Changed in version 3.2: cafile and capath were added.
Changed in version 3.2: HTTPS virtual hosts are now supported if possible (that is, if
ssl.HAS_SNI is true).
New in version 3.2: data can be an iterable object.
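A brief sketch of a simple GET request:

import urllib.request

f = urllib.request.urlopen('http://www.python.org/')
print(f.geturl())     # final URL, after any redirects
print(f.read(100))    # first 100 bytes of the body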
Install an OpenerDirector instance as the default global opener.
Installing an opener is only necessary if you want urlopen to use that opener;
otherwise, simply call OpenerDirector.open() instead of urlopen().
The code does not check for a real OpenerDirector, and any class with
the appropriate interface will work.
Convert the pathname path from the local syntax for a path to the form used in
the path component of a URL. This does not produce a complete URL. The return
value will already be quoted using the quote() function.
Convert the path component path from a percent-encoded URL to the local syntax for a
path. This does not accept a complete URL. This function uses unquote()
to decode path.
This helper function returns a dictionary of scheme to proxy server URL
mappings. It first scans the environment for variables named <scheme>_proxy,
on all operating systems; when it cannot find any, it looks for proxy
information in the System Configuration framework on Mac OS X and in the
Registry on Windows.
The following classes are provided:
class urllib.request.Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False)
This class is an abstraction of a URL request.
url should be a string containing a valid URL.
data may be a string specifying additional data to send to the
server, or None if no such data is needed. Currently HTTP
requests are the only ones that use data; the HTTP request will
be a POST instead of a GET when the data parameter is provided.
data should be a buffer in the standard
application/x-www-form-urlencoded format. The
urllib.parse.urlencode() function takes a mapping or sequence
of 2-tuples and returns a string in this format.
headers should be a dictionary, and will be treated as if
add_header() was called with each key and value as arguments.
This is often used to “spoof” the User-Agent header, which is
used by a browser to identify itself – some HTTP servers only
allow requests coming from common browsers as opposed to scripts.
For example, Mozilla Firefox may identify itself as "Mozilla/5.0 (X11; U; Linux
i686) Gecko/20071127 Firefox/2.0.0.11", while urllib’s default user agent
string is "Python-urllib/2.6" (on Python 2.6).
The final two arguments are only of interest for correct handling
of third-party HTTP cookies:
origin_req_host should be the request-host of the origin
transaction, as defined by RFC 2965. It defaults to
http.cookiejar.request_host(self). This is the host name or IP
address of the original request that was initiated by the user.
For example, if the request is for an image in an HTML document,
this should be the request-host of the request for the page
containing the image.
unverifiable should indicate whether the request is unverifiable,
as defined by RFC 2965. It defaults to False. An unverifiable
request is one whose URL the user did not have the option to
approve. For example, if the request is for an image in an HTML
document, and the user had no option to approve the automatic
fetching of the image, this should be true.
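As an illustration of these parameters, a POST request might be built as
follows (the URL and form fields are hypothetical):
import urllib.parse
import urllib.request

# Encode the form fields; Request expects the body as bytes.
data = urllib.parse.urlencode({'name': 'Somebody', 'language': 'Python'})
req = urllib.request.Request('http://www.example.com/cgi-bin/query',
                             data=data.encode('ascii'),
                             headers={'User-Agent': 'Mozilla/5.0'})
# The presence of data makes this a POST rather than a GET.
f = urllib.request.urlopen(req)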
Cause requests to go through a proxy. If proxies is given, it must be a
dictionary mapping protocol names to URLs of proxies. The default is to read the
list of proxies from the environment variables <protocol>_proxy.
If no proxy environment variables are set, in a Windows environment, proxy
settings are obtained from the registry’s Internet Settings section and in a
Mac OS X environment, proxy information is retrieved from the OS X System
Configuration Framework.
To disable autodetected proxies, pass an empty dictionary.
Keep a database of (realm, uri) -> (user, password) mappings.
class urllib.request.HTTPPasswordMgrWithDefaultRealm¶
Keep a database of (realm, uri) -> (user, password) mappings. A realm of
None is considered a catch-all realm, which is searched if no other realm
fits.
class urllib.request.AbstractBasicAuthHandler(password_mgr=None)¶
This is a mixin class that helps with HTTP authentication, both to the remote
host and to a proxy. password_mgr, if given, should be something that is
compatible with HTTPPasswordMgr; refer to section
HTTPPasswordMgr Objects for information on the interface that must be
supported.
class urllib.request.HTTPBasicAuthHandler(password_mgr=None)¶
Handle authentication with the remote host. password_mgr, if given, should be
something that is compatible with HTTPPasswordMgr; refer to section
HTTPPasswordMgr Objects for information on the interface that must be
supported.
class urllib.request.ProxyBasicAuthHandler(password_mgr=None)¶
Handle authentication with the proxy. password_mgr, if given, should be
something that is compatible with HTTPPasswordMgr; refer to section
HTTPPasswordMgr Objects for information on the interface that must be
supported.
class urllib.request.AbstractDigestAuthHandler(password_mgr=None)¶
This is a mixin class that helps with HTTP authentication, both to the remote
host and to a proxy. password_mgr, if given, should be something that is
compatible with HTTPPasswordMgr; refer to section
HTTPPasswordMgr Objects for information on the interface that must be
supported.
class urllib.request.HTTPDigestAuthHandler(password_mgr=None)¶
Handle authentication with the remote host. password_mgr, if given, should be
something that is compatible with HTTPPasswordMgr; refer to section
HTTPPasswordMgr Objects for information on the interface that must be
supported.
class urllib.request.ProxyDigestAuthHandler(password_mgr=None)¶
Handle authentication with the proxy. password_mgr, if given, should be
something that is compatible with HTTPPasswordMgr; refer to section
HTTPPasswordMgr Objects for information on the interface that must be
supported.
The following methods describe Request's public interface,
and so all may be overridden in subclasses. It also defines several
public attributes that can be used by clients to inspect the parsed
request.
Set the Request data to data. This is ignored by all handlers except
HTTP handlers — and there it should be a byte string, and will change the
request to be POST rather than GET.
Add another header to the request. Headers are currently ignored by all
handlers except HTTP handlers, where they are added to the list of headers sent
to the server. Note that there cannot be more than one header with the same
name, and later calls will overwrite previous calls in case the key collides.
Currently, this is no loss of HTTP functionality, since all headers which have
meaning when used more than once have a (header-specific) way of gaining the
same functionality using only one header.
Prepare the request by connecting to a proxy server. The host and type will
replace those of the instance, and the instance’s selector will be the original
URL given in the constructor.
handler should be an instance of BaseHandler. The following methods
are searched, and added to the possible chains (note that HTTP errors are a
special case).
protocol_open() — signal that the handler knows how to open protocol
URLs.
http_error_type() — signal that the handler knows how to handle HTTP
errors with HTTP error code type.
protocol_error() — signal that the handler knows how to handle errors
from (non-http) protocol.
protocol_request() — signal that the handler knows how to pre-process
protocol requests.
protocol_response() — signal that the handler knows how to
post-process protocol responses.
Open the given url (which can be a request object or a string), optionally
passing the given data. Arguments, return values and exceptions raised are
the same as those of urlopen() (which simply calls the open()
method on the currently installed global OpenerDirector). The
optional timeout parameter specifies a timeout in seconds for blocking
operations like the connection attempt (if not specified, the global default
timeout setting will be used). The timeout feature actually works only for
HTTP, HTTPS and FTP connections.
Handle an error of the given protocol. This will call the registered error
handlers for the given protocol with the given arguments (which are protocol
specific). The HTTP protocol is a special case which uses the HTTP response
code to determine the specific error handler; refer to the http_error_*()
methods of the handler classes.
Return values and exceptions raised are the same as those of urlopen().
OpenerDirector objects open URLs in three stages:
The order in which these methods are called within each stage is determined by
sorting the handler instances.
Every handler with a method named like protocol_request() has that
method called to pre-process the request.
Handlers with a method named like protocol_open() are called to handle
the request. This stage ends when a handler either returns a non-None
value (ie. a response), or raises an exception (usually URLError).
Exceptions are allowed to propagate.
In fact, the above algorithm is first tried for methods named
default_open(). If all such methods return None, the algorithm
is repeated for methods named like protocol_open(). If all such methods
return None, the algorithm is repeated for methods named
unknown_open().
Note that the implementation of these methods may involve calls of the parent
OpenerDirector instance’s open() and
error() methods.
Every handler with a method named like protocol_response() has that
method called to post-process the response.
BaseHandler objects provide a couple of methods that are directly
useful, and others that are meant to be used by derived classes. These are
intended for direct use:
The following attribute and methods should only be used by classes derived from
BaseHandler.
Note
The convention has been adopted that subclasses defining
protocol_request() or protocol_response() methods are named
*Processor; all others are named *Handler.
This method is not defined in BaseHandler, but subclasses should
define it if they want to catch all URLs.
This method, if implemented, will be called by the parent
OpenerDirector. It should return a file-like object as described in
the return value of the open() of OpenerDirector, or None.
It should raise URLError, unless a truly exceptional thing happens (for
example, MemoryError should not be mapped to URLError).
This method will be called before any protocol-specific open method.
BaseHandler.protocol_open(req)
This method is not defined in BaseHandler, but subclasses should
define it if they want to handle URLs with the given protocol.
This method, if defined, will be called by the parent OpenerDirector.
Return values should be the same as for default_open().
This method is not defined in BaseHandler, but subclasses should
define it if they want to catch all URLs with no specific registered handler to
open it.
This method is not defined in BaseHandler, but subclasses should
override it if they intend to provide a catch-all for otherwise unhandled HTTP
errors. It will be called automatically by the OpenerDirector getting
the error, and should not normally be called in other circumstances.
req will be a Request object, fp will be a file-like object with
the HTTP error body, code will be the three-digit code of the error, msg
will be the user-visible explanation of the code and hdrs will be a mapping
object with the headers of the error.
Return values and exceptions raised should be the same as those of
urlopen().
nnn should be a three-digit HTTP error code. This method is also not defined
in BaseHandler, but will be called, if it exists, on an instance of a
subclass, when an HTTP error with code nnn occurs.
Subclasses should override this method to handle specific HTTP errors.
Arguments, return values and exceptions raised should be the same as for
http_error_default().
BaseHandler.protocol_request(req)
This method is not defined in BaseHandler, but subclasses should
define it if they want to pre-process requests of the given protocol.
This method, if defined, will be called by the parent OpenerDirector.
req will be a Request object. The return value should be a
Request object.
BaseHandler.protocol_response(req, response)
This method is not defined in BaseHandler, but subclasses should
define it if they want to post-process responses of the given protocol.
This method, if defined, will be called by the parent OpenerDirector.
req will be a Request object. response will be an object
implementing the same interface as the return value of urlopen(). The
return value should implement the same interface as the return value of
urlopen().
Some HTTP redirections require action from this module’s client code. If this
is the case, HTTPError is raised. See RFC 2616 for details of the
precise meanings of the various redirection codes.
An HTTPError exception is raised as a security consideration if the
HTTPRedirectHandler is presented with a redirected URL which is not an HTTP,
HTTPS or FTP URL.
Return a Request or None in response to a redirect. This is called
by the default implementations of the http_error_30*() methods when a
redirection is received from the server. If a redirection should take place,
return a new Request to allow http_error_30*() to perform the
redirect to newurl. Otherwise, raise HTTPError if no other handler
should try to handle this URL, or return None if you can’t but another
handler might.
Note
The default implementation of this method does not strictly follow RFC 2616,
which says that 301 and 302 responses to POST requests must not be
automatically redirected without confirmation by the user. In reality, browsers
do allow automatic redirection of these responses, changing the POST to a
GET, and the default implementation reproduces this behavior.
The ProxyHandler will have a method protocol_open() for every
protocol which has a proxy in the proxies dictionary given in the
constructor. The method will modify requests to go through the proxy, by
calling request.set_proxy(), and call the next handler in the chain to
actually execute the protocol.
uri can be either a single URI, or a sequence of URIs. realm, user and
passwd must be strings. This causes (user, passwd) to be used as
authentication tokens when authentication for realm and a super-URI of any of
the given URIs is given.
Handle an authentication request by getting a user/password pair, and re-trying
the request. authreq should be the name of the header where the information
about the realm is included in the request, host specifies the URL and path to
authenticate for, req should be the (failed) Request object, and
headers should be the error headers.
host is either an authority (e.g. "python.org") or a URL containing an
authority component (e.g. "http://python.org/"). In either case, the
authority must not contain a userinfo component (so, "python.org" and
"python.org:80" are fine, "joe:password@python.org" is not).
authreq should be the name of the header where the information about the realm
is included in the request, host should be the host to authenticate to, req
should be the (failed) Request object, and headers should be the
error headers.
For 200 error codes, the response object is returned immediately.
For non-200 error codes, this simply passes the job on to the
protocol_error_code() handler methods, via OpenerDirector.error().
Eventually, HTTPDefaultErrorHandler will raise an
HTTPError if no other handler handles the error.
This example gets the python.org main page and displays the first 300 bytes of
it.
>>> import urllib.request
>>> f = urllib.request.urlopen('http://www.python.org/')
>>> print(f.read(300))
b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n
<meta http-equiv="content-type" content="text/html; charset=utf-8" />\n
<title>Python Programming '
Note that urlopen returns a bytes object. This is because there is no way
for urlopen to automatically determine the encoding of the byte stream
it receives from the http server. In general, a program will decode
the returned bytes object to string once it determines or guesses
the appropriate encoding.
The following W3C document, http://www.w3.org/International/O-charset, lists
the various ways in which an (X)HTML or an XML document could have specified its
encoding information.
As the python.org website uses utf-8 encoding as specified in its meta tag, we
will use the same for decoding the bytes object.
>>> import urllib.request
>>> f = urllib.request.urlopen('http://www.python.org/')
>>> print(f.read(100).decode('utf-8'))
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtm
In the following example, we are sending a data-stream to the stdin of a CGI
and reading the data it returns to us. Note that this example will only work
when the Python installation supports SSL.
>>> import urllib.request
>>> req = urllib.request.Request(url='https://localhost/cgi-bin/test.cgi',
...                              data=b'This data is passed to stdin of the CGI')
>>> f = urllib.request.urlopen(req)
>>> print(f.read().decode('utf-8'))
Got Data: "This data is passed to stdin of the CGI"
The code for the sample CGI used in the above example is:
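A minimal sketch of such a CGI, echoing its standard input:
#!/usr/bin/env python
import sys

# Read the POSTed body from stdin and echo it back in a plain-text response.
data = sys.stdin.read()
print('Content-type: text/plain\n\nGot Data: "%s"' % data)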
Use of Basic HTTP Authentication:
import urllib.request
# Create an OpenerDirector with support for Basic HTTP Authentication...
auth_handler = urllib.request.HTTPBasicAuthHandler()
auth_handler.add_password(realm='PDQ Application',
                          uri='https://mahler:8092/site-updates.py',
                          user='klem',
                          passwd='kadidd!ehopper')
opener = urllib.request.build_opener(auth_handler)
# ...and install it globally so it can be used with urlopen.
urllib.request.install_opener(opener)
urllib.request.urlopen('http://www.example.com/login.html')
build_opener() provides many handlers by default, including a
ProxyHandler. By default, ProxyHandler uses the environment
variables named <scheme>_proxy, where <scheme> is the URL scheme
involved. For example, the http_proxy environment variable is read to
obtain the HTTP proxy’s URL.
This example replaces the default ProxyHandler with one that uses
programmatically-supplied proxy URLs, and adds proxy authorization support with
ProxyBasicAuthHandler.
proxy_handler = urllib.request.ProxyHandler({'http': 'http://www.example.com:3128/'})
proxy_auth_handler = urllib.request.ProxyBasicAuthHandler()
proxy_auth_handler.add_password('realm', 'host', 'username', 'password')

opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler)
# This time, rather than install the OpenerDirector, we use it directly:
opener.open('http://www.example.com/login.html')
Adding HTTP headers:
Use the headers argument to the Request constructor, or add them to an
existing request with add_header(), as in this sketch:
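import urllib.request

req = urllib.request.Request('http://www.example.com/')
req.add_header('Referer', 'http://www.python.org/')
r = urllib.request.urlopen(req)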
Also, remember that a few standard headers (Content-Length,
Content-Type and Host) are added when the
Request is passed to urlopen() (or OpenerDirector.open()).
Here is an example session that uses the GET method to retrieve a URL
containing parameters:
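>>> import urllib.request
>>> import urllib.parse
>>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> # www.example.com is a placeholder host
>>> f = urllib.request.urlopen("http://www.example.com/query?%s" % params)
>>> print(f.read().decode('utf-8'))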
The following functions and classes are ported from the Python 2 module
urllib (as opposed to urllib2). They might become deprecated at
some point in the future.
Copy a network object denoted by a URL to a local file, if necessary. If the URL
points to a local file, or a valid cached copy of the object exists, the object
is not copied. Return a tuple (filename, headers) where filename is the
local file name under which the object can be found, and headers is whatever
the info() method of the object returned by urlopen() returned (for
a remote object, possibly cached). Exceptions are the same as for
urlopen().
The second argument, if present, specifies the file location to copy to (if
absent, the location will be a tempfile with a generated name). The third
argument, if present, is a hook function that will be called once on
establishment of the network connection and once after each block read
thereafter. The hook will be passed three arguments; a count of blocks
transferred so far, a block size in bytes, and the total size of the file. The
third argument may be -1 on older FTP servers which do not return a file
size in response to a retrieval request.
If the url uses the http: scheme identifier, the optional data
argument may be given to specify a POST request (normally the request type
is GET). The data argument must be in standard
application/x-www-form-urlencoded format; see the urlencode()
function below.
urlretrieve() will raise ContentTooShortError when it detects that
the amount of data available was less than the expected amount (which is the
size reported by a Content-Length header). This can occur, for example, when
the download is interrupted.
The Content-Length is treated as a lower bound: if there’s more data to read,
urlretrieve() reads more data, but if less data is available, it raises
the exception.
You can still retrieve the downloaded data in this case; it is stored in the
content attribute of the exception instance.
If no Content-Length header was supplied, urlretrieve() can not check
the size of the data it has downloaded, and just returns it. In this case
you just have to assume that the download was successful.
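For example, a download with a progress hook might look like this (the URL
is hypothetical):
import urllib.request

def report(blocks_read, block_size, total_size):
    # total_size is -1 if the server did not send a Content-Length header.
    print('%d blocks read' % blocks_read)

filename, headers = urllib.request.urlretrieve(
    'http://www.example.com/some-file.txt', reporthook=report)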
Clear the cache that may have been built up by previous calls to
urlretrieve().
class urllib.request.URLopener(proxies=None, **x509)¶
Base class for opening and reading URLs. Unless you need to support opening
objects using schemes other than http:, ftp:, or file:,
you probably want to use FancyURLopener.
By default, the URLopener class sends a User-Agent header
of urllib/VVV, where VVV is the urllib version number.
Applications can define their own User-Agent header by subclassing
URLopener or FancyURLopener and setting the class attribute
version to an appropriate string value in the subclass definition.
The optional proxies parameter should be a dictionary mapping scheme names to
proxy URLs, where an empty dictionary turns proxies off completely. Its default
value is None, in which case environmental proxy settings will be used if
present, as discussed in the definition of urlopen(), above.
Additional keyword parameters, collected in x509, may be used for
authentication of the client when using the https: scheme. The keywords
key_file and cert_file are supported to provide an SSL key and certificate;
both are needed to support client authentication.
URLopener objects will raise an IOError exception if the server
returns an error code.
Open fullurl using the appropriate protocol. This method sets up cache and
proxy information, then calls the appropriate open method with its input
arguments. If the scheme is not recognized, open_unknown() is called.
The data argument has the same meaning as the data argument of
urlopen().
Retrieves the contents of url and places it in filename. The return value
is a tuple consisting of a local filename and either a
email.message.Message object containing the response headers (for remote
URLs) or None (for local URLs). The caller must then open and read the
contents of filename. If filename is not given and the URL refers to a
local file, the input filename is returned. If the URL is non-local and
filename is not given, the filename is the output of tempfile.mktemp()
with a suffix that matches the suffix of the last path component of the input
URL. If reporthook is given, it must be a function accepting three numeric
parameters. It will be called after each chunk of data is read from the
network. reporthook is ignored for local URLs.
If the url uses the http: scheme identifier, the optional data
argument may be given to specify a POST request (normally the request type
is GET). The data argument must be in standard
application/x-www-form-urlencoded format; see the urlencode()
function below.
Variable that specifies the user agent of the opener object. To get
urllib to tell servers that it is a particular user agent, set this in a
subclass as a class variable or in the constructor before calling the base
constructor.
FancyURLopener subclasses URLopener providing default handling
for the following HTTP response codes: 301, 302, 303, 307 and 401. For the 30x
response codes listed above, the Location header is used to fetch
the actual URL. For 401 response codes (authentication required), basic HTTP
authentication is performed. For the 30x response codes, recursion is bounded
by the value of the maxtries attribute, which defaults to 10.
For all other response codes, the method http_error_default() is called
which you can override in subclasses to handle the error appropriately.
Note
According to the letter of RFC 2616, 301 and 302 responses to POST requests
must not be automatically redirected without confirmation by the user. In
reality, browsers do allow automatic redirection of these responses, changing
the POST to a GET, and urllib reproduces this behaviour.
The parameters to the constructor are the same as those for URLopener.
Note
When performing basic authentication, a FancyURLopener instance calls
its prompt_user_passwd() method. The default implementation asks the
users for the required information on the controlling terminal. A subclass may
override this method to support more appropriate behavior if needed.
The FancyURLopener class offers one additional method that should be
overloaded to provide the appropriate behavior:
Return information needed to authenticate the user at the given host in the
specified security realm. The return value should be a tuple, (user, password), which can be used for basic authentication.
The implementation prompts for this information on the terminal; an application
should override this method to use an appropriate interaction model in the local
environment.
Currently, only the following protocols are supported: HTTP (versions 0.9 and
1.0), FTP, and local files.
The caching feature of urlretrieve() has been disabled until I find the
time to hack proper processing of Expiration time headers.
There should be a function to query whether a particular URL is in the cache.
For backward compatibility, if a URL appears to point to a local file but the
file can’t be opened, the URL is re-interpreted using the FTP protocol. This
can sometimes cause confusing error messages.
The urlopen() and urlretrieve() functions can cause arbitrarily
long delays while waiting for a network connection to be set up. This means
that it is difficult to build an interactive Web client using these functions
without using threads.
The data returned by urlopen() or urlretrieve() is the raw data
returned by the server. This may be binary data (such as an image), plain text
or (for example) HTML. The HTTP protocol provides type information in the reply
header, which can be inspected by looking at the Content-Type
header. If the returned data is HTML, you can use the module
html.parser to parse it.
The code handling the FTP protocol cannot differentiate between a file and a
directory. This can lead to unexpected behavior when attempting to read a URL
that points to a file that is not accessible. If the URL ends in a /, it is
assumed to refer to a directory and will be handled accordingly. But if an
attempt to read a file leads to a 550 error (meaning the URL cannot be found or
is not accessible, often for permission reasons), then the path is treated as a
directory in order to handle the case when a directory is specified by a URL but
the trailing / has been left off. This can cause misleading results when
you try to fetch a file whose read permissions make it inaccessible; the FTP
code will try to read it, fail with a 550 error, and then perform a directory
listing for the unreadable file. If fine-grained control is needed, consider
using the ftplib module, subclassing FancyURLopener, or changing
_urlopener to meet your needs.
The urllib.response module defines functions and classes which define a
minimal file-like interface, including read() and readline(). The
typical response object is an addinfourl instance, which defines an info()
method that returns headers and a geturl() method that returns the URL.
Functions defined by this module are used internally by the
urllib.request module.
This module defines a standard interface to break Uniform Resource Locator (URL)
strings up in components (addressing scheme, network location, path etc.), to
combine the components back into a URL string, and to convert a “relative URL”
to an absolute URL given a “base URL.”
The module has been designed to match the Internet RFC on Relative Uniform
Resource Locators (and discovered a bug in an earlier draft!). It supports the
following URL schemes: file, ftp, gopher, hdl, http,
https, imap, mailto, mms, news, nntp, prospero,
rsync, rtsp, rtspu, sftp, shttp, sip, sips,
snews, svn, svn+ssh, telnet, wais.
The urllib.parse module defines functions that fall into two broad
categories: URL parsing and URL quoting. These are covered in detail in
the following sections.
Parse a URL into six components, returning a 6-tuple. This corresponds to the
general structure of a URL: scheme://netloc/path;parameters?query#fragment.
Each tuple item is a string, possibly empty. The components are not broken up in
smaller parts (for example, the network location is a single string), and %
escapes are not expanded. The delimiters as shown above are not part of the
result, except for a leading slash in the path component, which is retained if
present. For example:
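>>> from urllib.parse import urlparse
>>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
>>> o.scheme
'http'
>>> o.netloc
'www.cwi.nl:80'
>>> o.port
80
>>> o.geturl()
'http://www.cwi.nl:80/%7Eguido/Python.html'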
Following the syntax specifications in RFC 1808, urlparse recognizes
a netloc only if it is properly introduced by '//'. Otherwise the
input is presumed to be a relative URL and thus to start with
a path component.
If the scheme argument is specified, it gives the default addressing
scheme, to be used only if the URL does not specify one. The default value for
this argument is the empty string.
If the allow_fragments argument is false, fragment identifiers are not
allowed, even if the URL’s addressing scheme normally does support them. The
default value for this argument is True.
The return value is actually an instance of a subclass of tuple. This
class has the following additional read-only convenience attributes:
Parse a query string given as a string argument (data of type
application/x-www-form-urlencoded). Data are returned as a
dictionary. The dictionary keys are the unique query variable names and the
values are lists of values for each name.
The optional argument keep_blank_values is a flag indicating whether blank
values in percent-encoded queries should be treated as blank strings. A true value
indicates that blanks should be retained as blank strings. The default false
value indicates that blank values are to be ignored and treated as if they were
not included.
The optional argument strict_parsing is a flag indicating what to do with
parsing errors. If false (the default), errors are silently ignored. If true,
errors raise a ValueError exception.
The optional encoding and errors parameters specify how to decode
percent-encoded sequences into Unicode characters, as accepted by the
bytes.decode() method.
Parse a query string given as a string argument (data of type
application/x-www-form-urlencoded). Data are returned as a list of
name, value pairs.
The optional argument keep_blank_values is a flag indicating whether blank
values in percent-encoded queries should be treated as blank strings. A true value
indicates that blanks should be retained as blank strings. The default false
value indicates that blank values are to be ignored and treated as if they were
not included.
The optional argument strict_parsing is a flag indicating what to do with
parsing errors. If false (the default), errors are silently ignored. If true,
errors raise a ValueError exception.
The optional encoding and errors parameters specify how to decode
percent-encoded sequences into Unicode characters, as accepted by the
bytes.decode() method.
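For instance, duplicate keys are collected into a list by parse_qs() and
kept in order as pairs by parse_qsl():
>>> from urllib.parse import parse_qs, parse_qsl
>>> parse_qs('key=value1&key=value2')
{'key': ['value1', 'value2']}
>>> parse_qsl('key=value1&key=value2')
[('key', 'value1'), ('key', 'value2')]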
Construct a URL from a tuple as returned by urlparse(). The parts
argument can be any six-item iterable. This may result in a slightly
different, but equivalent URL, if the URL that was parsed originally had
unnecessary delimiters (for example, a ? with an empty query; the RFC
states that these are equivalent).
This is similar to urlparse(), but does not split the params from the URL.
This should generally be used instead of urlparse() if the more recent URL
syntax allowing parameters to be applied to each segment of the path portion
of the URL (see RFC 2396) is wanted. A separate function is needed to
separate the path segments and parameters. This function returns a 5-tuple:
(addressing scheme, network location, path, query, fragment identifier).
The return value is actually an instance of a subclass of tuple. This
class has the following additional read-only convenience attributes:
Combine the elements of a tuple as returned by urlsplit() into a
complete URL as a string. The parts argument can be any five-item
iterable. This may result in a slightly different, but equivalent URL, if the
URL that was parsed originally had unnecessary delimiters (for example, a ?
with an empty query; the RFC states that these are equivalent).
Construct a full (“absolute”) URL by combining a “base URL” (base) with
another URL (url). Informally, this uses components of the base URL, in
particular the addressing scheme, the network location and (part of) the
path, to provide missing components in the relative URL. For example:
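>>> from urllib.parse import urljoin
>>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
'http://www.cwi.nl/%7Eguido/FAQ.html'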
If url contains a fragment identifier, return a modified version of url
with no fragment identifier, and the fragment identifier as a separate
string. If there is no fragment identifier in url, return url unmodified
and an empty string.
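For example:
>>> from urllib.parse import urldefrag
>>> urldefrag('http://www.python.org/doc/#intro')
DefragResult(url='http://www.python.org/doc/', fragment='intro')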
The return value is actually an instance of a subclass of tuple. This
class has the following additional read-only convenience attributes:
The URL parsing functions were originally designed to operate on character
strings only. In practice, it is useful to be able to manipulate properly
quoted and encoded URLs as sequences of ASCII bytes. Accordingly, the
URL parsing functions in this module all operate on bytes and
bytearray objects in addition to str objects.
If str data is passed in, the result will also contain only
str data. If bytes or bytearray data is
passed in, the result will contain only bytes data.
Attempting to mix str data with bytes or
bytearray in a single function call will result in a
TypeError being raised, while attempting to pass in non-ASCII
byte values will trigger UnicodeDecodeError.
To support easier conversion of result objects between str and
bytes, all return values from URL parsing functions provide
either an encode() method (when the result contains str
data) or a decode() method (when the result contains bytes
data). The signatures of these methods match those of the corresponding
str and bytes methods (except that the default encoding
is 'ascii' rather than 'utf-8'). Each produces a value of a
corresponding type that contains either bytes data (for
encode() methods) or str data (for
decode() methods).
Applications that need to operate on potentially improperly quoted URLs
that may contain non-ASCII data will need to do their own decoding from
bytes to characters before invoking the URL parsing methods.
The behaviour described in this section applies only to the URL parsing
functions. The URL quoting functions use their own rules when producing
or consuming byte sequences as detailed in the documentation of the
individual URL quoting functions.
Changed in version 3.2: URL parsing functions now accept ASCII encoded byte sequences
The result objects from the urlparse(), urlsplit() and
urldefrag() functions are subclasses of the tuple type.
These subclasses add the attributes listed in the documentation for
those functions, the encoding and decoding support described in the
previous section, as well as an additional method:
Return the re-combined version of the original URL as a string. This may
differ from the original URL in that the scheme may be normalized to lower
case and empty components may be dropped. Specifically, empty parameters,
queries, and fragment identifiers will be removed.
For urldefrag() results, only empty fragment identifiers will be removed.
For urlsplit() and urlparse() results, all noted changes will be
made to the URL returned by this method.
The result of this method remains unchanged if passed back through the original
parsing function:
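>>> from urllib.parse import urlsplit
>>> url = 'HTTP://www.Python.org/doc/#'
>>> r1 = urlsplit(url)
>>> r1.geturl()
'http://www.Python.org/doc/'
>>> r2 = urlsplit(r1.geturl())
>>> r2.geturl()
'http://www.Python.org/doc/'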
The URL quoting functions focus on taking program data and making it safe
for use as URL components by quoting special characters and appropriately
encoding non-ASCII text. They also support reversing these operations to
recreate the original data from the contents of a URL component if that
task isn’t already covered by the URL parsing functions above.
Replace special characters in string using the %xx escape. Letters,
digits, and the characters '_.-' are never quoted. By default, this
function is intended for quoting the path section of a URL. The optional safe
parameter specifies additional ASCII characters that should not be quoted
— its default value is '/'.
The optional encoding and errors parameters specify how to deal with
non-ASCII characters, as accepted by the str.encode() method.
encoding defaults to 'utf-8'.
errors defaults to 'strict', meaning unsupported characters raise a
UnicodeEncodeError.
encoding and errors must not be supplied if string is a
bytes, or a TypeError is raised.
Note that quote(string, safe, encoding, errors) is equivalent to
quote_from_bytes(string.encode(encoding, errors), safe).
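For example:
>>> from urllib.parse import quote
>>> quote('/El Niño/')
'/El%20Ni%C3%B1o/'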
Like quote(), but also replace spaces by plus signs, as required for
quoting HTML form values when building up a query string to go into a URL.
Plus signs in the original string are escaped unless they are included in
safe. Unlike quote(), safe does not default to '/'.
Replace %xx escapes by their single-character equivalent.
The optional encoding and errors parameters specify how to decode
percent-encoded sequences into Unicode characters, as accepted by the
bytes.decode() method.
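For example:
>>> from urllib.parse import unquote
>>> unquote('/El%20Ni%C3%B1o/')
'/El Niño/'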
Convert a mapping object or a sequence of two-element tuples, which may
either be a str or a bytes, to a “percent-encoded”
string. The resultant string must be converted to bytes using the
user-specified encoding before it is sent to urlopen() as the optional
data argument.
The resulting string is a series of key=value pairs separated by '&'
characters, where both key and value are quoted using quote_plus()
above. When a sequence of two-element tuples is used as the query
argument, the first element of each tuple is a key and the second is a
value. The value element in itself can be a sequence and in that case, if
the optional parameter doseq evaluates to True, individual
key=value pairs separated by '&' are generated for each element of
the value sequence for the key. The order of parameters in the encoded
string will match the order of parameter tuples in the sequence.
When the query parameter is a str, the safe, encoding and errors
parameters are passed down to quote_plus() for encoding.
To reverse this encoding process, parse_qs() and parse_qsl() are
provided in this module to parse query strings into Python data structures.
Refer to urllib examples to find out how the urlencode()
function can be used to generate the query string of a URL or data for a POST request.
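For example, a sequence of 2-tuples preserves parameter order, and doseq
expands sequence values:
>>> from urllib.parse import urlencode
>>> urlencode([('spam', 1), ('eggs', 2)])
'spam=1&eggs=2'
>>> urlencode([('key', ('v1', 'v2'))], doseq=True)
'key=v1&key=v2'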
Changed in version 3.2: Query parameter supports bytes and string objects.
RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax.
This is the current standard (STD66). Any changes to the urllib.parse module
should conform to this. Certain deviations could be observed, which are
mostly for backward compatibility purposes and for certain de-facto
parsing requirements as commonly observed in major browsers.
RFC 2732 - Format for Literal IPv6 Addresses in URL’s.
This specifies the parsing requirements of IPv6 URLs.
This Request For Comments includes the rules for joining an absolute and a
relative URL, including a fair number of “Abnormal Examples” which govern the
treatment of border cases.
Though being an exception (a subclass of URLError), an
HTTPError can also function as a non-exceptional file-like return
value (the same thing that urlopen() returns). This is useful when
handling exotic HTTP errors, such as requests for authentication.
This exception is raised when the urlretrieve() function detects that
the amount of the downloaded data is less than the expected amount (given by
the Content-Length header). The content attribute stores the
downloaded (and supposedly truncated) data.
This module provides a single class, RobotFileParser, which answers
questions about whether or not a particular user agent can fetch a URL on the
Web site that published the robots.txt file. For more details on the
structure of robots.txt files, see http://www.robotstxt.org/orig.html.
Returns the time the robots.txt file was last fetched. This is
useful for long-running web spiders that need to check for new
robots.txt files periodically.
This module defines classes which implement the client side of the HTTP and
HTTPS protocols. It is normally not used directly — the module
urllib.request uses it to handle URLs that use HTTP and HTTPS.
Note
HTTPS support is only available if Python was compiled with SSL support
(through the ssl module).
The module provides the following classes:
class http.client.HTTPConnection(host, port=None[, strict[, timeout[, source_address]]])¶
An HTTPConnection instance represents one transaction with an HTTP
server. It should be instantiated passing it a host and optional port
number. If no port number is passed, the port is extracted from the host
string if it has the form host:port, else the default HTTP port (80) is
used. If the optional timeout parameter is given, blocking
operations (like connection attempts) will timeout after that many seconds
(if it is not given, the global default timeout setting is used).
The optional source_address parameter may be a tuple of a (host, port)
to use as the source address the HTTP connection is made from.
For example, the following calls all create instances that connect to the server
at the same host and port:
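>>> import http.client
>>> h1 = http.client.HTTPConnection('www.cwi.nl')
>>> h2 = http.client.HTTPConnection('www.cwi.nl:80')
>>> h3 = http.client.HTTPConnection('www.cwi.nl', 80)
>>> h4 = http.client.HTTPConnection('www.cwi.nl', 80, timeout=10)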
A subclass of HTTPConnection that uses SSL for communication with
secure servers. Default port is 443. If context is specified, it
must be a ssl.SSLContext instance describing the various SSL
options. If context is specified and has a verify_mode
of either CERT_OPTIONAL or CERT_REQUIRED, then
by default host is matched against the host name(s) allowed by the
server’s certificate. If you want to change that behaviour, you can
explicitly set check_hostname to False.
If you access arbitrary hosts on the Internet, it is recommended to
require certificate checking and feed the context with a set of
trusted CA certificates:
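import ssl
import http.client

context = ssl.SSLContext(ssl.PROTOCOL_TLSv1)
context.verify_mode = ssl.CERT_REQUIRED
# Path to a CA bundle; adjust for your platform.
context.load_verify_locations('/etc/pki/tls/certs/ca-bundle.crt')
conn = http.client.HTTPSConnection('www.python.org', 443, context=context)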
This will send a request to the server using the HTTP request
method method and the selector url. If the body argument is
present, it should be string or bytes object of data to send after
the headers are finished. Strings are encoded as ISO-8859-1, the
default charset for HTTP. To use other encodings, pass a bytes
object. The Content-Length header is set to the length of the
string.
The body may also be an open file object, in which case the
contents of the file is sent; this file object should support fileno()
and read() methods. The header Content-Length is automatically set to
the length of the file as reported by stat. The body argument may also be
an iterable, in which case the Content-Length header should be provided
explicitly.
The headers argument should be a mapping of extra HTTP
headers to send with the request.
Set the debugging level. The default debug level is 0, meaning no
debugging output is printed. Any value greater than 0 will cause all
currently defined debug output to be printed to stdout. The debuglevel
is passed to any new HTTPResponse objects that are created.
This should be the first call after the connection to the server has been made.
It sends a line to the server consisting of the request string, the selector
string, and the HTTP version (HTTP/1.1). To disable automatic sending of
Host: or Accept-Encoding: headers (for example to accept additional
content encodings), specify skip_host or skip_accept_encoding with non-False
values.
Send an RFC 822-style header to the server. It sends a line to the server
consisting of the header, a colon and a space, and the first argument. If more
arguments are given, continuation lines are sent, each consisting of a tab and
an argument.
An HTTPResponse instance wraps the HTTP response from the
server. It provides access to the request headers and the entity
body. The response is an iterable object and can be used in a with
statement.
Return the value of the header name, or default if there is no header
matching name. If there is more than one header with the name name,
return all of the values joined by ', '. If default is any iterable other
than a single string, its elements are similarly returned joined by commas.
Here is an example session that uses the GET method:
>>> import http.client
>>> conn = http.client.HTTPConnection("www.python.org")
>>> conn.request("GET", "/index.html")
>>> r1 = conn.getresponse()
>>> print(r1.status, r1.reason)
200 OK
>>> data1 = r1.read()  # This will return entire content.
>>> # The following example demonstrates reading data in chunks.
>>> conn.request("GET", "/index.html")
>>> r1 = conn.getresponse()
>>> while not r1.closed:
...     print(r1.read(200))  # 200 bytes
b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"...
...
>>> # Example of an invalid request
>>> conn.request("GET", "/parrot.spam")
>>> r2 = conn.getresponse()
>>> print(r2.status, r2.reason)
404 Not Found
>>> data2 = r2.read()
>>> conn.close()
Here is an example session that uses the HEAD method. Note that the
HEAD method never returns any data.
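>>> import http.client
>>> conn = http.client.HTTPConnection("www.python.org")
>>> conn.request("HEAD", "/index.html")
>>> res = conn.getresponse()
>>> print(res.status, res.reason)
200 OK
>>> data = res.read()
>>> print(len(data))
0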
This module defines the class FTP and a few related items. The
FTP class implements the client side of the FTP protocol. You can use
this to write Python programs that perform a variety of automated FTP jobs, such
as mirroring other ftp servers. It is also used by the module
urllib.request to handle URLs that use FTP. For more information on FTP
(File Transfer Protocol), see Internet RFC 959.
>>> from ftplib import FTP
>>> ftp = FTP('ftp.cwi.nl')   # connect to host, default port
>>> ftp.login()               # user anonymous, passwd anonymous@
>>> ftp.retrlines('LIST')     # list directory contents
total 24418
drwxrwsr-x   5 ftp-usr  pdmaint     1536 Mar 20 09:48 .
dr-xr-srwt 105 ftp-usr  pdmaint     1536 Mar 21 14:32 ..
-rw-r--r--   1 ftp-usr  pdmaint     5305 Mar 20 09:48 INDEX
 .
 .
 .
>>> ftp.retrbinary('RETR README', open('README', 'wb').write)
'226 Transfer complete.'
>>> ftp.quit()
The module defines the following items:
class ftplib.FTP(host='', user='', passwd='', acct=''[, timeout])¶
Return a new instance of the FTP class. When host is given, the
method call connect(host) is made. When user is given, additionally
the method call login(user, passwd, acct) is made (where passwd and
acct default to the empty string when not given). The optional timeout
parameter specifies a timeout in seconds for blocking operations like the
connection attempt (if it is not specified, the global default timeout setting
will be used).
The FTP class supports the with statement. Here is a sample of
how to use it (server output will vary):
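>>> from ftplib import FTP
>>> with FTP('ftp1.at.proftpd.org') as ftp:
...     ftp.login()
...     ftp.dir()
...
'230 Anonymous login ok, restrictions apply.'
dr-xr-xr-x   9 ftp      ftp           154 May  6 10:43 .
dr-xr-xr-x   9 ftp      ftp           154 May  6 10:43 ..
dr-xr-xr-x   5 ftp      ftp          4096 May  6 10:43 CentOS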
Changed in version 3.2: Support for the with statement was added.
class ftplib.FTP_TLS(host='', user='', passwd='', acct=''[, keyfile[, certfile[, context[, timeout]]]])¶
An FTP subclass which adds TLS support to FTP as described in
RFC 4217.
Connect as usual to port 21 implicitly securing the FTP control connection
before authenticating. Securing the data connection requires the user to
explicitly ask for it by calling the prot_p() method.
keyfile and certfile are optional – they can contain a PEM formatted
private key and certificate chain file name for the SSL connection.
The context parameter is an ssl.SSLContext object which allows
bundling SSL configuration options, certificates and private keys into a
single (potentially long-lived) structure.
Exception raised when a reply is received from the server that does not fit
the response specifications of the File Transfer Protocol, i.e. does not
begin with a digit in the range 1–5.
The set of all exceptions (as a tuple) that methods of FTP
instances may raise as a result of problems with the FTP connection (as
opposed to programming errors made by the caller). This set includes the
four exceptions listed above as well as socket.error and
IOError.
Parser for the .netrc file format. The file .netrc is
typically used by FTP clients to load user authentication information
before prompting the user.
The file Tools/scripts/ftpmirror.py in the Python source distribution is
a script that can mirror FTP sites, or portions thereof, using the ftplib
module. It can be used as an extended example that applies this module.
Several methods are available in two flavors: one for handling text files and
another for binary files. These are named for the command which is used
followed by lines for the text version or binary for the binary version.
Set the instance’s debugging level. This controls the amount of debugging
output printed. The default, 0, produces no debugging output. A value of
1 produces a moderate amount of debugging output, generally a single line
per request. A value of 2 or higher produces the maximum amount of
debugging output, logging each line sent and received on the control connection.
Connect to the given host and port. The default port number is 21, as
specified by the FTP protocol specification. It is rarely needed to specify a
different port number. This function should be called only once for each
instance; it should not be called at all if a host was given when the instance
was created. All other methods can only be used after a connection has been
made.
The optional timeout parameter specifies a timeout in seconds for the
connection attempt. If no timeout is passed, the global default timeout
setting will be used.
Return the welcome message sent by the server in reply to the initial
connection. (This message sometimes contains disclaimers or help information
that may be relevant to the user.)
Log in as the given user. The passwd and acct parameters are optional and
default to the empty string. If no user is specified, it defaults to
'anonymous'. If user is 'anonymous', the default passwd is
'anonymous@'. This function should be called only once for each instance,
after a connection has been established; it should not be called at all if a
host and user were given when the instance was created. Most FTP commands are
only allowed after the client has logged in. The acct parameter supplies
“accounting information”; few systems implement this.
Send a simple command string to the server and handle the response. Return
nothing if a response code corresponding to success (codes in the range
200–299) is received. Raise error_reply otherwise.
Retrieve a file in binary transfer mode. cmd should be an appropriate
RETR command: 'RETR filename'. The callback function is called for
each block of data received, with a single string argument giving the data
block. The optional blocksize argument specifies the maximum chunk size to
read on the low-level socket object created to do the actual transfer (which
will also be the largest size of the data blocks passed to callback). A
reasonable default is chosen. rest means the same thing as in the
transfercmd() method.
Retrieve a file or directory listing in ASCII transfer mode. cmd should be
an appropriate RETR command (see retrbinary()) or a command such as
LIST, NLST or MLSD (usually just the string 'LIST').
LIST retrieves a list of files and information about those files.
NLST retrieves a list of file names. On some servers, MLSD retrieves
a machine readable list of files and information about those files. The
callback function is called for each line with a string argument containing
the line with the trailing CRLF stripped. The default callback prints the
line to sys.stdout.
Store a file in binary transfer mode. cmd should be an appropriate
STOR command: "STORfilename". file is an open file object
which is read until EOF using its read() method in blocks of size
blocksize to provide the data to be stored. The blocksize argument
defaults to 8192. callback is an optional single parameter callable that
is called on each block of data after it is sent. rest means the same thing
as in the transfercmd() method.
Store a file in ASCII transfer mode. cmd should be an appropriate
STOR command (see storbinary()). Lines are read until EOF from the
open file object file using its readline() method to provide
the data to be stored. callback is an optional single parameter callable
that is called on each line after it is sent.
Initiate a transfer over the data connection. If the transfer is active, send an
EPRT or PORT command and the transfer command specified by cmd, and
accept the connection. If the server is passive, send an EPSV or PASV
command, connect to it, and start the transfer command. Either way, return the
socket for the connection.
If optional rest is given, a REST command is sent to the server, passing
rest as an argument. rest is usually a byte offset into the requested file,
telling the server to restart sending the file’s bytes at the requested offset,
skipping over the initial bytes. Note however that RFC 959 requires only that
rest be a string containing characters in the printable range from ASCII code
33 to ASCII code 126. The transfercmd() method, therefore, converts
rest to a string, but no check is performed on the string’s contents. If the
server does not recognize the REST command, an error_reply exception
will be raised. If this happens, simply call transfercmd() without a
rest argument.
Like transfercmd(), but returns a tuple of the data connection and the
expected size of the data. If the expected size could not be computed, None
will be returned as the expected size. cmd and rest mean the same thing as
in transfercmd().
Return a list of file names as returned by the NLST command. The
optional argument is a directory to list (default is the current server
directory). Multiple arguments can be used to pass non-standard options to
the NLST command.
Produce a directory listing as returned by the LIST command, printing it to
standard output. The optional argument is a directory to list (default is the
current server directory). Multiple arguments can be used to pass non-standard
options to the LIST command. If the last argument is a function, it is used
as a callback function as for retrlines(); the default prints to
sys.stdout. This method returns None.
Remove the file named filename from the server. If successful, returns the
text of the response, otherwise raises error_perm on permission errors or
error_reply on other errors.
Request the size of the file named filename on the server. On success, the
size of the file is returned as an integer, otherwise None is returned.
Note that the SIZE command is not standardized, but is supported by many
common server implementations.
Send a QUIT command to the server and close the connection. This is the
“polite” way to close a connection, but it may raise an exception if the server
responds with an error to the QUIT command. This implies a call to the
close() method which renders the FTP instance useless for
subsequent calls (see below).
Close the connection unilaterally. This should not be applied to an already
closed connection such as after a successful call to quit(). After this
call the FTP instance should not be used any more (after a call to
close() or quit() you cannot reopen the connection by issuing
another login() method).
This module defines a class, POP3, which encapsulates a connection to a
POP3 server and implements the protocol as defined in RFC 1725. The
POP3 class supports both the minimal and optional command sets.
Additionally, this module provides a class POP3_SSL, which provides
support for connecting to POP3 servers that use SSL as an underlying protocol
layer.
Note that POP3, though widely supported, is obsolescent. The implementation
quality of POP3 servers varies widely, and too many are quite poor. If your
mailserver supports IMAP, you would be better off using the
imaplib.IMAP4 class, as IMAP servers tend to be better implemented.
class poplib.POP3(host, port=POP3_PORT[, timeout])¶
This class implements the actual POP3 protocol. The connection is created when
the instance is initialized. If port is omitted, the standard POP3 port (110)
is used. The optional timeout parameter specifies a timeout in seconds for the
connection attempt (if not specified, the global default timeout setting will
be used).
class poplib.POP3_SSL(host, port=POP3_SSL_PORT, keyfile=None, certfile=None, timeout=None, context=None)¶
This is a subclass of POP3 that connects to the server over an SSL
encrypted socket. If port is not specified, 995, the standard POP3-over-SSL
port is used. keyfile and certfile are also optional - they can contain a
PEM formatted private key and certificate chain file for the SSL connection.
timeout works as in the POP3 constructor. The context parameter is an
ssl.SSLContext object which allows bundling SSL configuration
options, certificates and private keys into a single (potentially long-lived)
structure.
Changed in version 3.2: context parameter added.
One exception is defined as an attribute of the poplib module:
Exception raised on any errors from this module (errors from socket
module are not caught). The reason for the exception is passed to the
constructor as a string.
The FAQ for the fetchmail POP/IMAP client collects information on
POP3 server variations and RFC noncompliance that may be useful if you need to
write an application based on the POP protocol.
Set the instance’s debugging level. This controls the amount of debugging
output printed. The default, 0, produces no debugging output. A value of
1 produces a moderate amount of debugging output, generally a single line
per request. A value of 2 or higher produces the maximum amount of
debugging output, logging each line sent and received on the control connection.
Flag message number which for deletion. On most servers deletions are not
actually performed until QUIT (the major exception is Eudora QPOP, which
deliberately violates the RFCs by doing pending deletes on any disconnect).
Retrieves the message header plus howmuch lines of the message after the
header of message number which. Result is in form (response, ['line', ...], octets).
The POP3 TOP command this method uses, unlike the RETR command, doesn’t set the
message’s seen flag; unfortunately, TOP is poorly specified in the RFCs and is
frequently broken in off-brand servers. Test this method by hand against the
POP3 servers you will use before trusting it.
Return message digest (unique id) list. If which is specified, result contains
the unique id for that message in the form 'response mesgnum uid', otherwise
result is list (response, ['mesgnum uid', ...], octets).
Instances of POP3_SSL have no additional methods. The interface of this
subclass is identical to its parent.
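Here is a minimal usage sketch that retrieves and prints all messages from a
mailbox (the server name is hypothetical):
import getpass
import poplib

M = poplib.POP3('mail.example.com')        # hypothetical POP3 server
M.user(getpass.getuser())
M.pass_(getpass.getpass())
num_messages = len(M.list()[1])
for i in range(num_messages):
    for line in M.retr(i + 1)[1]:          # message numbers start at 1
        print(line)
M.quit()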
This module defines three classes, IMAP4, IMAP4_SSL and
IMAP4_stream, which encapsulate a connection to an IMAP4 server and
implement a large subset of the IMAP4rev1 client protocol as defined in
RFC 2060. It is backward compatible with IMAP4 (RFC 1730) servers, but
note that the STATUS command is not supported in IMAP4.
Three classes are provided by the imaplib module, IMAP4 is the
base class:
This class implements the actual IMAP4 protocol. The connection is created and
protocol version (IMAP4 or IMAP4rev1) is determined when the instance is
initialized. If host is not specified, '' (the local host) is used. If
port is omitted, the standard IMAP4 port (143) is used.
Three exceptions are defined as attributes of the IMAP4 class:
exception IMAP4.error¶
Exception raised on any errors. The reason for the exception is passed to the
constructor as a string.
exception IMAP4.abort¶
IMAP4 server errors cause this exception to be raised. This is a sub-class of
IMAP4.error. Note that closing the instance and instantiating a new one
will usually allow recovery from this exception.
exception IMAP4.readonly¶
This exception is raised when a writable mailbox has its status changed by the
server. This is a sub-class of IMAP4.error. Some other client now has
write permission, and the mailbox will need to be re-opened to re-obtain write
permission.
There’s also a subclass for secure connections:
class imaplib.IMAP4_SSL(host='', port=IMAP4_SSL_PORT, keyfile=None, certfile=None)¶
This is a subclass derived from IMAP4 that connects over an SSL
encrypted socket (to use this class you need a socket module that was compiled
with SSL support). If host is not specified, '' (the local host) is used.
If port is omitted, the standard IMAP4-over-SSL port (993) is used. keyfile
and certfile are also optional - they can contain a PEM formatted private key
and certificate chain file for the SSL connection.
The second subclass allows for connections created by a child process:
class imaplib.IMAP4_stream(command)¶
Parse an IMAP4 INTERNALDATE string and return corresponding local
time. The return value is a time.struct_time tuple or
None if the string has the wrong format.
Convert date_time to an IMAP4 INTERNALDATE representation. The
return value is a string in the form: "DD-Mmm-YYYY HH:MM:SS +HHMM"
(including double-quotes). The date_time argument can be a
number (int or float) representing seconds since the epoch (as returned
by time.time()), a 9-tuple representing local time (as returned by
time.localtime()), or a double-quoted string. In the last case, it
is assumed to already be in the correct format.
Note that IMAP4 message numbers change as the mailbox changes; in particular,
after an EXPUNGE command performs deletions the remaining messages are
renumbered. So it is highly advisable to use UIDs instead, with the UID command.
At the end of the module, there is a test section that contains a more extensive
example of usage.
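For a quick start, a minimal session (again with no error checking, assuming
an IMAP server on the local host) that opens a mailbox and retrieves and
prints all messages looks like this:

import getpass, imaplib

M = imaplib.IMAP4()
M.login(getpass.getuser(), getpass.getpass())
M.select()
typ, data = M.search(None, 'ALL')
for num in data[0].split():
    typ, data = M.fetch(num, '(RFC822)')
    print('Message %s\n%s\n' % (num, data[0][1]))
M.close()
M.logout()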
See also
Documents describing the protocol, and sources and binaries for servers
implementing it, can all be found at the University of Washington’s IMAP
Information Center (http://www.washington.edu/imap/).
All IMAP4rev1 commands are represented by methods of the same name, either
upper-case or lower-case.
All arguments to commands are converted to strings, except for AUTHENTICATE,
and the last argument to APPEND which is passed as an IMAP4 literal. If
necessary (the string contains IMAP4 protocol-sensitive characters and isn’t
enclosed with either parentheses or double quotes) each string is quoted.
However, the password argument to the LOGIN command is always quoted. If
you want to avoid having an argument string quoted (eg: the flags argument to
STORE) then enclose the string in parentheses (eg: r'(\Deleted)').
Each command returns a tuple: (type,[data,...]) where type is usually
'OK' or 'NO', and data is either the text from the command response,
or mandated results from the command. Each data is either a string, or a
tuple. If a tuple, then the first part is the header of the response, and the
second part contains the data (ie: ‘literal’ value).
The message_set option to commands below is a string specifying one or more
messages to be acted upon. It may be a simple message number ('1'), a range
of message numbers ('2:4'), or a group of non-contiguous ranges separated by
commas ('1:3,6:9'). A range can contain an asterisk to indicate an infinite
upper bound ('3:*').
mechanism specifies which authentication mechanism is to be used - it should
appear in the instance variable capabilities in the form AUTH=mechanism.
authobject must be a callable object:
data = authobject(response)
It will be called to process server continuation responses. It should return
data that will be encoded and sent to the server. It should return None if
the client abort response * should be sent instead.
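As an illustrative sketch (not part of the module), an authobject for the
PLAIN mechanism could look like the following, assuming the server
advertises AUTH=PLAIN and that user and password are defined elsewhere:

def plain_auth(response):
    # response is the server's continuation data (unused for PLAIN);
    # return the bytes to be encoded and sent, or None to abort with '*'
    return b"\0" + user.encode('utf-8') + b"\0" + password.encode('utf-8')

typ, data = M.authenticate('PLAIN', plain_auth)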
Permanently remove deleted items from selected mailbox. Generates an EXPUNGE
response for each deleted message. Returned data contains a list of EXPUNGE
message numbers in order received.
Fetch (parts of) messages. message_parts should be a string of message part
names enclosed within parentheses, eg: "(UID BODY[TEXT])". Returned data
are tuples of message part envelope and data.
List mailbox names in directory matching pattern. directory defaults to
the top-level mail folder, and pattern defaults to match anything. Returned
data contains a list of LIST responses.
Force use of CRAM-MD5 authentication when identifying the client to protect
the password. Will only work if the server CAPABILITY response includes the
phrase AUTH=CRAM-MD5.
List subscribed mailbox names in directory matching pattern. directory
defaults to the top level directory and pattern defaults to match any mailbox.
Returned data are tuples of message part envelope and data.
Opens socket to port at host. This method is implicitly called by
the IMAP4 constructor. The connection objects established by this
method will be used in the read, readline, send, and shutdown
methods. You may override this method.
Search mailbox for matching messages. charset may be None, in which case
no CHARSET will be specified in the request to the server. The IMAP
protocol requires that at least one criterion be specified; an exception will be
raised when the server returns an error.
Example:
# M is a connected IMAP4 instance...
typ, msgnums = M.search(None, 'FROM', '"LDJ"')

# or:
typ, msgnums = M.search(None, '(FROM "LDJ")')
Select a mailbox. Returned data is the count of messages in mailbox
(EXISTS response). The default mailbox is 'INBOX'. If the readonly
flag is set, modifications to the mailbox are not allowed.
The sort command is a variant of search with sorting semantics for the
results. Returned data contains a space separated list of matching message
numbers.
Sort has two arguments before the search_criterion argument(s); a
parenthesized list of sort_criteria, and the searching charset. Note that
unlike search, the searching charset argument is mandatory. There is also
a uid sort command which corresponds to sort the way that uid search
corresponds to search. The sort command first searches the mailbox for
messages that match the given searching criteria using the charset argument for
the interpretation of strings in the searching criteria. It then returns the
numbers of matching messages.
Send a STARTTLS command. The ssl_context argument is optional
and should be a ssl.SSLContext object. This will enable
encryption on the IMAP connection.
Alters flag dispositions for messages in mailbox. command is specified by
section 6.4.6 of RFC 2060 as being one of “FLAGS”, “+FLAGS”, or “-FLAGS”,
optionally with a suffix of “.SILENT”.
For example, to set the delete flag on all messages:
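A sketch, assuming M is a connected IMAP4 instance with a mailbox selected:

typ, data = M.search(None, 'ALL')
for num in data[0].split():
    M.store(num, '+FLAGS', '\\Deleted')
M.expunge()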
The thread command is a variant of search with threading semantics for
the results. Returned data contains a space separated list of thread members.
Thread members consist of zero or more messages numbers, delimited by spaces,
indicating successive parent and child.
Thread has two arguments before the search_criterion argument(s); a
threading_algorithm, and the searching charset. Note that unlike
search, the searching charset argument is mandatory. There is also a
uid thread command which corresponds to thread the way that uid search
corresponds to search. The thread command first searches the
mailbox for messages that match the given searching criteria using the charset
argument for the interpretation of strings in the searching criteria. It then
returns the matching messages threaded according to the specified threading
algorithm.
Execute command args with messages identified by UID, rather than message
number. Returns response appropriate to command. At least one argument must be
supplied; if none are provided, the server will return an error and an exception
will be raised.
This module defines the class NNTP which implements the client side of
the Network News Transfer Protocol. It can be used to implement a news reader
or poster, or automated news processors. It is compatible with RFC 3977
as well as the older RFC 977 and RFC 2980.
Here are two small examples of how it can be used. To list some statistics
about a newsgroup and print the subjects of the last 10 articles:
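>>> import nntplib
>>> s = nntplib.NNTP('news.gmane.org')
>>> resp, count, first, last, name = s.group('gmane.comp.python.committers')
>>> print('Group', name, 'has', count, 'articles, range', first, 'to', last)
Group gmane.comp.python.committers has 1096 articles, range 1 to 1096
>>> resp, overviews = s.over((last - 9, last))
>>> for id, over in overviews:
...     print(id, nntplib.decode_header(over['subject']))
...
>>> s.quit()

To post an article from a binary file (this assumes that the article has
valid headers, and that you have the right to post on that particular
newsgroup):

>>> s = nntplib.NNTP('news.gmane.org')
>>> f = open('article.txt', 'rb')
>>> resp = s.post(f)
>>> s.quit()

The server name and the output shown here are illustrative.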
class nntplib.NNTP(host, port=119, user=None, password=None, readermode=None, usenetrc=False[, timeout])¶
Return a new NNTP object, representing a connection
to the NNTP server running on host host, listening at port port.
An optional timeout can be specified for the socket connection.
If the optional user and password are provided, or if suitable
credentials are present in ~/.netrc and the optional flag usenetrc
is true, the AUTHINFO USER and AUTHINFO PASS commands are used
to identify and authenticate the user to the server. If the optional
flag readermode is true, then a mode reader command is sent before
authentication is performed. Reader mode is sometimes necessary if you are
connecting to an NNTP server on the local machine and intend to call
reader-specific commands, such as group. If you get unexpected
NNTPPermanentErrors, you might need to set readermode.
Changed in version 3.2: usenetrc is now False by default.
class nntplib.NNTP_SSL(host, port=563, user=None, password=None, ssl_context=None, readermode=None, usenetrc=False[, timeout])¶
Return a new NNTP_SSL object, representing an encrypted
connection to the NNTP server running on host host, listening at
port port. NNTP_SSL objects have the same methods as
NNTP objects. If port is omitted, port 563 (NNTPS) is used.
ssl_context is also optional, and is a SSLContext object.
All other parameters behave the same as for NNTP.
Note that SSL-on-563 is discouraged per RFC 4642, in favor of
STARTTLS as described below. However, some servers only support the
former.
Derived from the standard exception Exception, this is the base
class for all exceptions raised by the nntplib module. Instances
of this class have the following attribute:
response
The response of the server, if available, as a str object.
An integer representing the version of the NNTP protocol supported by the
server. In practice, this should be 2 for servers advertising
RFC 3977 compliance and 1 for others.
The response that is returned as the first item in the return tuple of almost
all methods is the server’s response: a string beginning with a three-digit
code. If the server’s response indicates an error, the method raises one of
the above exceptions.
Many of the following methods take an optional keyword-only argument file.
When the file argument is supplied, it must be either a file object
opened for binary writing, or the name of an on-disk file to be written to.
The method will then write any data returned by the server (except for the
response line and the terminating dot) to the file; any list of lines,
tuples or objects that the method normally returns will be empty.
Changed in version 3.2: Many of the following methods have been reworked and
fixed, which makes them incompatible with their 3.1 counterparts.
Return the welcome message sent by the server in reply to the initial
connection. (This message sometimes contains disclaimers or help information
that may be relevant to the user.)
Return the RFC 3977 capabilities advertised by the server, as a
dict instance mapping capability names to (possibly empty) lists
of values. On legacy servers which don’t understand the CAPABILITIES
command, an empty dictionary is returned instead.
Send AUTHINFO commands with the user name and password. If user
and password are None and usenetrc is True, credentials from
~/.netrc will be used if possible.
Unless intentionally delayed, login is normally performed during the
NNTP object initialization and separately calling this function
is unnecessary. To force authentication to be delayed, you must not set
user or password when creating the object, and must set usenetrc to
False.
Send a STARTTLS command. The ssl_context argument is optional
and should be a ssl.SSLContext object. This will enable
encryption on the NNTP connection.
Note that this may not be done after authentication information has
been transmitted, and authentication occurs by default if possible during a
NNTP object initialization. See NNTP.login() for information
on suppressing this behavior.
Send a NEWGROUPS command. The date argument should be a
datetime.date or datetime.datetime object.
Return a pair (response,groups) where groups is a list representing
the groups that are new since the given date. If file is supplied,
though, then groups will be empty.
Send a NEWNEWS command. Here, group is a group name or '*', and
date has the same meaning as for newgroups(). Return a pair
(response,articles) where articles is a list of message ids.
This command is frequently disabled by NNTP server administrators.
Send a LIST or LIST ACTIVE command. Return a pair
(response, list) where list is a list of tuples representing all
the groups available from this NNTP server, optionally matching the
pattern string group_pattern. Each tuple has the form
(group,last,first,flag), where group is a group name, last
and first are the last and first article numbers, and flag usually
takes one of these values:
y: Local postings and articles from peers are allowed.
m: The group is moderated and all postings must be approved.
n: No local postings are allowed, only articles from peers.
j: Articles from peers are filed in the junk group instead.
x: No local postings, and articles from peers are ignored.
=foo.bar: Articles are filed in the foo.bar group instead.
If flag has another value, then the status of the newsgroup should be
considered unknown.
This command can return very large results, especially if group_pattern
is not specified. It is best to cache the results offline unless you
really need to refresh them.
Send a LIST NEWSGROUPS command, where group_pattern is a wildmat string as
specified in RFC 3977 (it’s essentially the same as DOS or UNIX shell wildcard
strings). Return a pair (response,descriptions), where descriptions
is a dictionary mapping group names to textual descriptions.
>>> resp, descs = s.descriptions('gmane.comp.python.*')
>>> len(descs)
295
>>> descs.popitem()
('gmane.comp.python.bio.general', 'BioPython discussion list (Moderated)')
Get a description for a single group group. If more than one group matches
(if ‘group’ is a real wildmat string), return the first match. If no group
matches, return an empty string.
This elides the response code from the server. If the response code is needed,
use descriptions().
Send a GROUP command, where name is the group name. The group is
selected as the current group, if it exists. Return a tuple
(response,count,first,last,name) where count is the (estimated)
number of articles in the group, first is the first article number in
the group, last is the last article number in the group, and name
is the group name.
Send an OVER command, or an XOVER command on legacy servers.
message_spec can be either a string representing a message id, or
a (first,last) tuple of numbers indicating a range of articles in
the current group, or a (first,None) tuple indicating a range of
articles starting from first to the last article in the current group,
or None to select the current article in the current group.
Return a pair (response,overviews). overviews is a list of
(article_number,overview) tuples, one for each article selected
by message_spec. Each overview is a dictionary with the same number
of items, but this number depends on the server. These items are either
message headers (the key is then the lower-cased header name) or metadata
items (the key is then the metadata name prepended with ":"). The
following items are guaranteed to be present by the NNTP specification:
the subject, from, date, message-id and references
headers
the :bytes metadata: the number of bytes in the entire raw article
(including headers and body)
the :lines metadata: the number of lines in the article body
The value of each item is either a string, or None if not present.
It is advisable to use the decode_header() function on header
values when they may contain non-ASCII characters:
Send a STAT command, where message_spec is either a message id
(enclosed in '<' and '>') or an article number in the current group.
If message_spec is omitted or None, the current article in the
current group is considered. Return a triple (response,number,id)
where number is the article number and id is the message id.
Send an ARTICLE command, where message_spec has the same meaning as
for stat(). Return a tuple (response,info) where info
is a namedtuple with three attributes number,
message_id and lines (in that order). number is the article number
in the group (or 0 if the information is not available), message_id the
message id as a string, and lines a list of lines (without terminating
newlines) comprising the raw message including headers and body.
>>> resp, info = s.article('<20030112190404.GE29873@epoch.metaslash.com>')
>>> info.number
0
>>> info.message_id
'<20030112190404.GE29873@epoch.metaslash.com>'
>>> len(info.lines)
65
>>> info.lines[0]
b'Path: main.gmane.org!not-for-mail'
>>> info.lines[1]
b'From: Neal Norwitz <neal@metaslash.com>'
>>> info.lines[-3:]
[b'There is a patch for 2.3 as well as 2.2.', b'', b'Neal']
Post an article using the POST command. The data argument is either
a file object opened for binary reading, or any iterable of bytes
objects (representing raw lines of the article to be posted). It should
represent a well-formed news article, including the required headers. The
post() method automatically escapes lines beginning with . and
appends the termination line.
If the method succeeds, the server’s response is returned. If the server
refuses posting, a NNTPReplyError is raised.
Send an IHAVE command. message_id is the id of the message to send
to the server (enclosed in '<' and '>'). The data parameter
and the return value are the same as for post().
Set the instance’s debugging level. This controls the amount of debugging
output printed. The default, 0, produces no debugging output. A value of
1 produces a moderate amount of debugging output, generally a single line
per request or response. A value of 2 or higher produces the maximum amount
of debugging output, logging each line sent and received on the connection
(including message text).
The following are optional NNTP extensions defined in RFC 2980. Some of
them have been superseded by newer commands in RFC 3977.
Send an XHDR command. The header argument is a header keyword, e.g.
'subject'. The string argument should have the form 'first-last'
where first and last are the first and last article numbers to search.
Return a pair (response, list), where list is a list of pairs (id, text),
where id is an article number (as a string) and text is the text of
the requested header for that article. If the file parameter is supplied, then
the output of the XHDR command is stored in a file. If file is a string,
then the method will open a file with that name, write to it then close it.
If file is a file object, then it will start calling write() on
it to store the lines of the command output. If file is supplied, then the
returned list is an empty list.
Send an XOVER command. start and end are article numbers
delimiting the range of articles to select. The return value is the
same as for over(). It is recommended to use over()
instead, since it will automatically use the newer OVER command
if available.
Return a pair (resp,path), where path is the directory path to the
article with message ID id. Most of the time, this extension is not
enabled by NNTP server administrators.
Decode a header value, un-escaping any escaped non-ASCII characters.
header_str must be a str object. The unescaped value is
returned. Using this function is recommended to display some headers
in a human readable form:
>>> decode_header("Some subject")
'Some subject'
>>> decode_header("=?ISO-8859-15?Q?D=E9buter_en_Python?=")
'Débuter en Python'
>>> decode_header("Re: =?UTF-8?B?cHJvYmzDqG1lIGRlIG1hdHJpY2U=?=")
'Re: problème de matrice'
The smtplib module defines an SMTP client session object that can be used
to send mail to any Internet machine with an SMTP or ESMTP listener daemon. For
details of SMTP and ESMTP operation, consult RFC 821 (Simple Mail Transfer
Protocol) and RFC 1869 (SMTP Service Extensions).
class smtplib.SMTP(host='', port=0, local_hostname=None[, timeout])¶
An SMTP instance encapsulates an SMTP connection. It has methods
that support a full repertoire of SMTP and ESMTP operations. If the optional
host and port parameters are given, the SMTP connect() method is called
with those parameters during initialization. An SMTPConnectError is
raised if the specified host doesn’t respond correctly. The optional
timeout parameter specifies a timeout in seconds for blocking operations
like the connection attempt (if not specified, the global default timeout
setting will be used).
For normal use, you should only require the initialization/connect,
sendmail(), and quit() methods. An example is included below.
class smtplib.SMTP_SSL(host='', port=0, local_hostname=None, keyfile=None, certfile=None[, timeout])¶
An SMTP_SSL instance behaves exactly the same as instances of
SMTP. SMTP_SSL should be used for situations where SSL is
required from the beginning of the connection and using starttls() is
not appropriate. If host is not specified, the local host is used. If
port is zero, the standard SMTP-over-SSL port (465) is used. keyfile
and certfile are also optional, and can contain a PEM formatted private key
and certificate chain file for the SSL connection. The optional timeout
parameter specifies a timeout in seconds for blocking operations like the
connection attempt (if not specified, the global default timeout setting
will be used).
class smtplib.LMTP(host='', port=LMTP_PORT, local_hostname=None)¶
The LMTP protocol, which is very similar to ESMTP, is heavily based on the
standard SMTP client. It’s common to use Unix sockets for LMTP, so our connect()
method must support that as well as a regular host:port server. To specify a
Unix socket, you must use an absolute path for host, starting with a ‘/’.
Authentication is supported, using the regular SMTP mechanism. When using a Unix
socket, LMTP servers generally don’t support or require any authentication, but
your mileage might vary.
A nice selection of exceptions is defined as well:
This exception is raised when the server unexpectedly disconnects, or when an
attempt is made to use the SMTP instance before connecting it to a
server.
Base class for all exceptions that include an SMTP error code. These exceptions
are generated in some instances when the SMTP server returns an error code. The
error code is stored in the smtp_code attribute of the error, and the
smtp_error attribute is set to the error message.
Sender address refused. In addition to the attributes set on all
SMTPResponseException exceptions, this sets ‘sender’ to the string that
the SMTP server refused.
All recipient addresses refused. The errors for each recipient are accessible
through the attribute recipients, which is a dictionary of exactly the
same sort as SMTP.sendmail() returns.
Definition of the ESMTP extensions for SMTP. This describes a framework for
extending SMTP with new commands, supporting dynamic discovery of the commands
provided by the server, and defines a few additional commands.
Connect to a host on a given port. The defaults are to connect to the local
host at the standard SMTP port (25). If the hostname ends with a colon (':')
followed by a number, that suffix will be stripped off and the number
interpreted as the port number to use. This method is automatically invoked by
the constructor if a host is specified during instantiation.
Send a command cmd to the server. The optional argument args is simply
concatenated to the command, separated by a space.
This returns a 2-tuple composed of a numeric response code and the actual
response line (multiline responses are joined into one long line.)
In normal operation it should not be necessary to call this method explicitly.
It is used to implement other methods and may be useful for testing private
extensions.
If the connection to the server is lost while waiting for the reply,
SMTPServerDisconnected will be raised.
Identify yourself to the SMTP server using HELO. The hostname argument
defaults to the fully qualified domain name of the local host.
The message returned by the server is stored as the helo_resp attribute
of the object.
In normal operation it should not be necessary to call this method explicitly.
It will be implicitly called by sendmail() when necessary.
Identify yourself to an ESMTP server using EHLO. The hostname argument
defaults to the fully qualified domain name of the local host. Examine the
response for ESMTP options and store them for use by has_extn().
Also sets several informational attributes: the message returned by
the server is stored as the ehlo_resp attribute, does_esmtp
is set to true or false depending on whether the server supports ESMTP, and
esmtp_features will be a dictionary containing the names of the
SMTP service extensions this server supports, and their
parameters (if any).
Unless you wish to use has_extn() before sending mail, it should not be
necessary to call this method explicitly. It will be implicitly called by
sendmail() when necessary.
Check the validity of an address on this server using SMTP VRFY. Returns a
tuple consisting of code 250 and a full RFC 822 address (including human
name) if the user address is valid. Otherwise returns an SMTP error code of 400
or greater and an error string.
Note
Many sites disable SMTP VRFY in order to foil spammers.
Log in on an SMTP server that requires authentication. The arguments are the
username and the password to authenticate with. If there has been no previous
EHLO or HELO command this session, this method tries ESMTP EHLO
first. This method will return normally if the authentication was successful, or
may raise the following exceptions:
Send mail. The required arguments are an RFC 822 from-address string, a list
of RFC 822 to-address strings (a bare string will be treated as a list with 1
address), and a message string. The caller may pass a list of ESMTP options
(such as 8bitmime) to be used in MAIL FROM commands as mail_options.
ESMTP options (such as DSN commands) that should be used with all RCPT
commands can be passed as rcpt_options. (If you need to use different ESMTP
options to different recipients you have to use the low-level methods such as
mail(), rcpt() and data() to send the message.)
Note
The from_addr and to_addrs parameters are used to construct the message
envelope used by the transport agents. sendmail does not modify the
message headers in any way.
msg may be a string containing characters in the ASCII range, or a byte
string. A string is encoded to bytes using the ascii codec, and lone \r
and \n characters are converted to \r\n characters. A byte string
is not modified.
If there has been no previous EHLO or HELO command this session, this
method tries ESMTP EHLO first. If the server does ESMTP, message size and
each of the specified options will be passed to it (if the option is in the
feature set the server advertises). If EHLO fails, HELO will be tried
and ESMTP options suppressed.
This method will return normally if the mail is accepted for at least one
recipient. Otherwise it will raise an exception. That is, if this method does
not raise an exception, then someone should get your mail. If this method does
not raise an exception, it returns a dictionary, with one entry for each
recipient that was refused. Each entry contains a tuple of the SMTP error code
and the accompanying error message sent by the server.
All recipients were refused. Nobody got the mail. The recipients
attribute of the exception object is a dictionary with information about the
refused recipients (like the one returned when at least one recipient was
accepted).
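As a sketch of the return value (server responses shown are illustrative),
a refused recipient appears in the dictionary returned by sendmail():

import smtplib

s = smtplib.SMTP("localhost")
tolist = ["one@one.org", "two@two.org", "three@three.org"]
msg = '''\
From: Me@my.org
Subject: test

This is a test'''
refused = s.sendmail("me@my.org", tolist, msg)
# e.g. {'three@three.org': (550, b'User unknown')} if one address is rejected
s.quit()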
This is a convenience method for calling sendmail() with the message
represented by an email.message.Message object. The arguments have
the same meaning as for sendmail(), except that msg is a Message
object.
If from_addr is None or to_addrs is None, send_message fills
those arguments with addresses extracted from the headers of msg as
specified in RFC 2822: from_addr is set to the Sender
field if it is present, and otherwise to the From field.
to_addrs combines the values (if any) of the To,
Cc, and Bcc fields from msg. If exactly one
set of Resent-* headers appear in the message, the regular
headers are ignored and the Resent-* headers are used instead.
If the message contains more than one set of Resent-* headers,
a ValueError is raised, since there is no way to unambiguously detect
the most recent set of Resent- headers.
send_message serializes msg using
BytesGenerator with \r\n as the linesep, and
calls sendmail() to transmit the resulting message. Regardless of the
values of from_addr and to_addrs, send_message does not transmit any
Bcc or Resent-Bcc headers that may appear
in msg.
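A minimal sketch (addresses and content are illustrative):

from email.message import Message
import smtplib

msg = Message()
msg['Subject'] = 'Test'
msg['From'] = 'me@example.org'
msg['To'] = 'you@example.org'
msg.set_payload('Hello')

server = smtplib.SMTP('localhost')
server.send_message(msg)  # from_addr and to_addrs are taken from the headers
server.quit()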
Terminate the SMTP session and close the connection. Return the result of
the SMTP QUIT command.
Low-level methods corresponding to the standard SMTP/ESMTP commands HELP,
RSET, NOOP, MAIL, RCPT, and DATA are also supported.
Normally these do not need to be called directly, so they are not documented
here. For details, consult the module code.
This example prompts the user for addresses needed in the message envelope (‘To’
and ‘From’ addresses), and the message to be delivered. Note that the headers
to be included with the message must be included in the message as entered; this
example doesn’t do any processing of the RFC 822 headers. In particular, the
‘To’ and ‘From’ addresses must be included in the message headers explicitly.
import smtplib

def prompt(prompt):
    return input(prompt).strip()

fromaddr = prompt("From: ")
toaddrs = prompt("To: ").split()
print("Enter message, end with ^D (Unix) or ^Z (Windows):")

# Add the From: and To: headers at the start!
msg = ("From: %s\r\nTo: %s\r\n\r\n"
       % (fromaddr, ", ".join(toaddrs)))
while True:
    try:
        line = input()
    except EOFError:
        break
    if not line:
        break
    msg = msg + line

print("Message length is", len(msg))

server = smtplib.SMTP('localhost')
server.set_debuglevel(1)
server.sendmail(fromaddr, toaddrs, msg)
server.quit()
Note
In general, you will want to use the email package’s features to
construct an email message, which you can then send
via send_message(); see email: Examples.
This module offers several classes to implement SMTP (email) servers.
Several server implementations are present; one is a generic
do-nothing implementation, which can be overridden, while the other two offer
specific mail-sending strategies.
Additionally the SMTPChannel may be extended to implement very specific
interaction behaviour with SMTP clients.
Create a new SMTPServer object, which binds to local address
localaddr. It will treat remoteaddr as an upstream SMTP relayer. It
inherits from asyncore.dispatcher, and so will insert itself into
asyncore’s event loop on instantiation.
Raise NotImplementedError exception. Override this in subclasses to
do something useful with this message. Whatever was passed in the
constructor as remoteaddr will be available as the _remoteaddr
attribute. peer is the remote host’s address, mailfrom is the envelope
originator, rcpttos are the envelope recipients and data is a string
containing the contents of the e-mail (which should be in RFC 2822
format).
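For instance, a minimal sketch of a subclass that merely prints incoming
messages (the class name and port are illustrative):

import asyncore
import smtpd

class PrintingSMTPServer(smtpd.SMTPServer):
    def process_message(self, peer, mailfrom, rcpttos, data):
        # peer is the remote address; mailfrom and rcpttos describe the
        # envelope; data is the message text in RFC 2822 format
        print('Message from', mailfrom, 'to', rcpttos)
        print(data)

server = PrintingSMTPServer(('localhost', 1025), None)
asyncore.loop()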
Create a new pure proxy server. Arguments are as per SMTPServer.
Everything will be relayed to remoteaddr. Note that running this has a good
chance to make you into an open relay, so please be careful.
Create a new pure proxy server. Arguments are as per SMTPServer.
Everything will be relayed to remoteaddr, unless the local mailman configuration
knows about an address, in which case it will be handled via mailman. Note that
running this has a good chance to make you into an open relay, so please be
careful.
Holds the name of the client peer as returned by conn.getpeername(),
where conn is the connection socket.
The SMTPChannel operates by invoking methods named smtp_<command>
upon reception of a command line from the client. Built into the base
SMTPChannel class are methods for handling the following commands
(and responding to them appropriately):
Command   Action taken
HELO      Accepts the greeting from the client and stores it in
          seen_greeting.
NOOP      Takes no action.
QUIT      Closes the connection cleanly.
MAIL      Accepts the “MAIL FROM:” syntax and stores the supplied address
          as mailfrom.
RCPT      Accepts the “RCPT TO:” syntax and stores the supplied addresses
          in the rcpttos list.
The telnetlib module provides a Telnet class that implements the
Telnet protocol. See RFC 854 for details about the protocol. In addition, it
provides symbolic constants for the protocol characters (see below), and for the
telnet options. The symbolic names of the telnet options follow the definitions
in arpa/telnet.h, with the leading TELOPT_ removed. For symbolic names
of options which are traditionally not included in arpa/telnet.h, see the
module source itself.
The symbolic constants for the telnet commands are: IAC, DONT, DO, WONT, WILL,
SE (Subnegotiation End), NOP (No Operation), DM (Data Mark), BRK (Break), IP
(Interrupt process), AO (Abort output), AYT (Are You There), EC (Erase
Character), EL (Erase Line), GA (Go Ahead), SB (Subnegotiation Begin).
class telnetlib.Telnet(host=None, port=0[, timeout])¶
Telnet represents a connection to a Telnet server. The instance is
initially not connected by default; the open() method must be used to
establish a connection. Alternatively, the host name and optional port
number can be passed to the constructor, in which case the connection to
the server will be established before the constructor returns. The optional
timeout parameter specifies a timeout in seconds for blocking operations
like the connection attempt (if not specified, the global default timeout
setting will be used).
Do not reopen an already connected instance.
This class has many read_*() methods. Note that some of them raise
EOFError when the end of the connection is read, because they can return
an empty string for other reasons. See the individual descriptions below.
Read until a given byte string, expected, is encountered or until timeout
seconds have passed.
When no match is found, return whatever is available instead, possibly empty
bytes. Raise EOFError if the connection is closed and no cooked data
is available.
Read everything that can be read without blocking in I/O (eager).
Raise EOFError if connection closed and no cooked data available.
Return b'' if no cooked data available otherwise. Do not block unless in
the midst of an IAC sequence.
Raise EOFError if connection closed and no cooked data available.
Return b'' if no cooked data available otherwise. Do not block unless in
the midst of an IAC sequence.
Process and return data already in the queues (lazy).
Raise EOFError if connection closed and no data available. Return
b'' if no cooked data available otherwise. Do not block unless in the
midst of an IAC sequence.
Return the data collected between a SB/SE pair (suboption begin/end). The
callback should access this data when it is invoked with a SE command.
This method never blocks.
Connect to a host. The optional second argument is the port number, which
defaults to the standard Telnet port (23). The optional timeout parameter
specifies a timeout in seconds for blocking operations like the connection
attempt (if not specified, the global default timeout setting will be used).
Do not try to reopen an already connected instance.
Print a debug message when the debug level is > 0. If extra arguments are
present, they are substituted in the message using the standard string
formatting operator.
Write a byte string to the socket, doubling any IAC characters. This can
block if the connection is blocked. May raise socket.error if the
connection is closed.
Read until one from a list of regular expressions matches.
The first argument is a list of regular expressions, either compiled
(re.RegexObject instances) or uncompiled (byte strings). The
optional second argument is a timeout, in seconds; the default is to block
indefinitely.
Return a tuple of three items: the index in the list of the first regular
expression that matches; the match object returned; and the bytes read up
till and including the match.
If end of file is found and no bytes were read, raise EOFError.
Otherwise, when nothing matches, return (-1,None,data) where data is
the bytes received so far (may be empty bytes if a timeout happened).
If a regular expression ends with a greedy match (such as .*) or if more
than one expression can match the same input, the results are
non-deterministic, and may depend on the I/O timing.
Each time a telnet option is read on the input flow, this callback (if set) is
called with the following parameters: callback(telnet socket, command
(DO/DONT/WILL/WONT), option). No other action is done afterwards by telnetlib.
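A simple example illustrating typical use (the host, the prompts, and the
commands sent are assumptions about the remote system):

import getpass
import telnetlib

HOST = "localhost"
user = input("Enter your remote account: ")
password = getpass.getpass()

tn = telnetlib.Telnet(HOST)

tn.read_until(b"login: ")
tn.write(user.encode('ascii') + b"\n")
if password:
    tn.read_until(b"Password: ")
    tn.write(password.encode('ascii') + b"\n")

tn.write(b"ls\n")
tn.write(b"exit\n")

print(tn.read_all().decode('ascii'))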
This module provides immutable UUID objects (the UUID class)
and the functions uuid1(), uuid3(), uuid4(), uuid5() for
generating version 1, 3, 4, and 5 UUIDs as specified in RFC 4122.
If all you want is a unique ID, you should probably call uuid1() or
uuid4(). Note that uuid1() may compromise privacy since it creates
a UUID containing the computer’s network address. uuid4() creates a
random UUID.
class uuid.UUID(hex=None, bytes=None, bytes_le=None, fields=None, int=None, version=None)¶
Create a UUID from either a string of 32 hexadecimal digits, a string of 16
bytes as the bytes argument, a string of 16 bytes in little-endian order as
the bytes_le argument, a tuple of six integers (32-bit time_low, 16-bit
time_mid, 16-bit time_hi_version, 8-bit clock_seq_hi_variant, 8-bit
clock_seq_low, 48-bit node) as the fields argument, or a single 128-bit
integer as the int argument. When a string of hex digits is given, curly
braces, hyphens, and a URN prefix are all optional. For example, these
expressions all yield the same UUID:
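UUID('{12345678-1234-5678-1234-567812345678}')
UUID('12345678123456781234567812345678')
UUID('urn:uuid:12345678-1234-5678-1234-567812345678')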
Exactly one of hex, bytes, bytes_le, fields, or int must be given.
The version argument is optional; if given, the resulting UUID will have its
variant and version number set according to RFC 4122, overriding bits in the
given hex, bytes, bytes_le, fields, or int.
Get the hardware address as a 48-bit positive integer. The first time this
runs, it may launch a separate program, which could be quite slow. If all
attempts to obtain the hardware address fail, we choose a random 48-bit number
with its eighth bit set to 1 as recommended in RFC 4122. “Hardware address”
means the MAC address of a network interface, and on a machine with multiple
network interfaces the MAC address of any one of them may be returned.
Generate a UUID from a host ID, sequence number, and the current time. If node
is not given, getnode() is used to obtain the hardware address. If
clock_seq is given, it is used as the sequence number; otherwise a random
14-bit sequence number is chosen.
Here are some examples of typical usage of the uuid module:
>>> import uuid

# make a UUID based on the host ID and current time
>>> uuid.uuid1()
UUID('a8098c1a-f86e-11da-bd1a-00112444be1e')

# make a UUID using an MD5 hash of a namespace UUID and a name
>>> uuid.uuid3(uuid.NAMESPACE_DNS, 'python.org')
UUID('6fa459ea-ee8a-3ca4-894e-db77e160355e')

# make a random UUID
>>> uuid.uuid4()
UUID('16fd2706-8baf-433b-82eb-8c7fada847da')

# make a UUID using a SHA-1 hash of a namespace UUID and a name
>>> uuid.uuid5(uuid.NAMESPACE_DNS, 'python.org')
UUID('886313e1-3b8a-5372-9b90-0c9aee199e5d')

# make a UUID from a string of hex digits (braces and hyphens ignored)
>>> x = uuid.UUID('{00010203-0405-0607-0809-0a0b0c0d0e0f}')

# convert a UUID to a string of hex digits in standard form
>>> str(x)
'00010203-0405-0607-0809-0a0b0c0d0e0f'

# get the raw 16 bytes of the UUID
>>> x.bytes
b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f'

# make a UUID from a 16-byte string
>>> uuid.UUID(bytes=x.bytes)
UUID('00010203-0405-0607-0809-0a0b0c0d0e0f')
The socketserver module simplifies the task of writing network servers.
There are four basic server classes: TCPServer uses the Internet TCP
protocol, which provides for continuous streams of data between the client and
server. UDPServer uses datagrams, which are discrete packets of
information that may arrive out of order or be lost while in transit. The more
infrequently used UnixStreamServer and UnixDatagramServer
classes are similar, but use Unix domain sockets; they’re not available on
non-Unix platforms. For more details on network programming, consult a book
such as
W. Richard Stevens' UNIX Network Programming or Ralph Davis's Win32 Network
Programming.
These four classes process requests synchronously; each request must be
completed before the next request can be started. This isn’t suitable if each
request takes a long time to complete, because it requires a lot of computation,
or because it returns a lot of data which the client is slow to process. The
solution is to create a separate process or thread to handle each request; the
ForkingMixIn and ThreadingMixIn mix-in classes can be used to
support asynchronous behaviour.
Creating a server requires several steps. First, you must create a request
handler class by subclassing the BaseRequestHandler class and
overriding its handle() method; this method will process incoming
requests. Second, you must instantiate one of the server classes, passing it
the server’s address and the request handler class. Finally, call the
handle_request() or serve_forever() method of the server object to
process one or many requests.
When inheriting from ThreadingMixIn for threaded connection behavior,
you should explicitly declare how you want your threads to behave on an abrupt
shutdown. The ThreadingMixIn class defines an attribute
daemon_threads, which indicates whether or not the server should wait for
thread termination. You should set the flag explicitly if you would like threads
to behave autonomously; the default is False, meaning that Python will
not exit until all threads created by ThreadingMixIn have exited.
Server classes have the same external methods and attributes, no matter what
network protocol they use.
Note that UnixDatagramServer derives from UDPServer, not from
UnixStreamServer — the only difference between an IP and a Unix
stream server is the address family, which is simply repeated in both Unix
server classes.
Forking and threading versions of each type of server can be created using the
ForkingMixIn and ThreadingMixIn mix-in classes. For instance,
a threading UDP server class is created as follows:
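class ThreadingUDPServer(ThreadingMixIn, UDPServer):
    pass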
The mix-in class must come first, since it overrides a method defined in
UDPServer. Setting the various attributes also changes the
behavior of the underlying server mechanism.
To implement a service, you must derive a class from BaseRequestHandler
and redefine its handle() method. You can then run various versions of
the service by combining one of the server classes with your request handler
class. The request handler class must be different for datagram or stream
services. This can be hidden by using the handler subclasses
StreamRequestHandler or DatagramRequestHandler.
Of course, you still have to use your head! For instance, it makes no sense to
use a forking server if the service contains state in memory that can be
modified by different requests, since the modifications in the child process
would never reach the initial state kept in the parent process and passed to
each child. In this case, you can use a threading server, but you will probably
have to use locks to protect the integrity of the shared data.
On the other hand, if you are building an HTTP server where all data is stored
externally (for instance, in the file system), a synchronous class will
essentially render the service “deaf” while one request is being handled –
which may be for a very long time if a client is slow to receive all the data it
has requested. Here a threading or forking server is appropriate.
In some cases, it may be appropriate to process part of a request synchronously,
but to finish processing in a forked child depending on the request data. This
can be implemented by using a synchronous server and doing an explicit fork in
the request handler class handle() method.
Another approach to handling multiple simultaneous requests in an environment
that supports neither threads nor fork() (or where these are too expensive
or inappropriate for the service) is to maintain an explicit table of partially
finished requests and to use select() to decide which request to work on
next (or whether to handle a new incoming request). This is particularly
important for stream services where each client can potentially be connected for
a long time (if threads or subprocesses cannot be used). See asyncore for
another way to manage this.
This is the superclass of all Server objects in the module. It defines the
interface, given below, but does not implement most of the methods, which is
done in subclasses.
Return an integer file descriptor for the socket on which the server is
listening. This function is most commonly passed to select.select(), to
allow monitoring multiple servers in the same process.
Process a single request. This function calls the following methods in
order: get_request(), verify_request(), and
process_request(). If the user-provided handle() method of the
handler class raises an exception, the server’s handle_error() method
will be called. If no request is received within self.timeout
seconds, handle_timeout() will be called and handle_request()
will return.
The address on which the server is listening. The format of addresses varies
depending on the protocol family; see the documentation for the socket module
for details. For Internet protocols, this is a tuple containing a string giving
the address, and an integer port number: ('127.0.0.1',80), for example.
The size of the request queue. If it takes a long time to process a single
request, any requests that arrive while the server is busy are placed into a
queue, up to request_queue_size requests. Once the queue is full,
further requests from clients will get a “Connection denied” error. The default
value is usually 5, but this can be overridden by subclasses.
Timeout duration, measured in seconds, or None if no timeout is
desired. If handle_request() receives no incoming requests within the
timeout period, the handle_timeout() method is called.
There are various server methods that can be overridden by subclasses of base
server classes like TCPServer; these methods aren’t useful to external
users of the server object.
Must accept a request from the socket, and return a 2-tuple containing the new
socket object to be used to communicate with the client, and the client’s
address.
This function is called if the RequestHandlerClass’s handle()
method raises an exception. The default action is to print the traceback to
standard output and continue handling further requests.
This function is called when the timeout attribute has been set to a
value other than None and the timeout period has passed with no
requests being received. The default action for forking servers is
to collect the status of any child processes that have exited, while
in threading servers this method does nothing.
Calls finish_request() to create an instance of the
RequestHandlerClass. If desired, this function can create a new process
or thread to handle the request; the ForkingMixIn and
ThreadingMixIn classes do this.
Must return a Boolean value; if the value is True, the request will be
processed, and if it’s False, the request will be denied. This function
can be overridden to implement access controls for a server. The default
implementation always returns True.
The request handler class must define a new handle() method, and can
override any of the following methods. A new instance is created for each
request.
Called after the handle() method to perform any clean-up actions
required. The default implementation does nothing. If setup() or
handle() raise an exception, this function will not be called.
This function must do all the work required to service a request. The
default implementation does nothing. Several instance attributes are
available to it; the request is available as self.request; the client
address as self.client_address; and the server instance as
self.server, in case it needs access to per-server information.
The type of self.request is different for datagram or stream
services. For stream services, self.request is a socket object; for
datagram services, self.request is a pair of string and socket.
However, this can be hidden by using the request handler subclasses
StreamRequestHandler or DatagramRequestHandler, which
override the setup() and finish() methods, and provide
self.rfile and self.wfile attributes. self.rfile and
self.wfile can be read or written, respectively, to get the request
data or return data to the client.
import socketserver

class MyTCPHandler(socketserver.BaseRequestHandler):
    """
    The RequestHandler class for our server.

    It is instantiated once per connection to the server, and must
    override the handle() method to implement communication to the
    client.
    """

    def handle(self):
        # self.request is the TCP socket connected to the client
        self.data = self.request.recv(1024).strip()
        print("%s wrote:" % self.client_address[0])
        print(self.data)
        # just send back the same data, but upper-cased
        self.request.send(self.data.upper())

if __name__ == "__main__":
    HOST, PORT = "localhost", 9999

    # Create the server, binding to localhost on port 9999
    server = socketserver.TCPServer((HOST, PORT), MyTCPHandler)

    # Activate the server; this will keep running until you
    # interrupt the program with Ctrl-C
    server.serve_forever()
An alternative request handler class that makes use of streams (file-like
objects that simplify communication by providing the standard file interface):
class MyTCPHandler(socketserver.StreamRequestHandler):

    def handle(self):
        # self.rfile is a file-like object created by the handler;
        # we can now use e.g. readline() instead of raw recv() calls
        self.data = self.rfile.readline().strip()
        print("%s wrote:" % self.client_address[0])
        print(self.data)
        # Likewise, self.wfile is a file-like object used to write back
        # to the client
        self.wfile.write(self.data.upper())
The difference is that the readline() call in the second handler will call
recv() multiple times until it encounters a newline character, while the
single recv() call in the first handler will just return what has been sent
from the client in one send() call.
This is the client side:
import socket
import sys

HOST, PORT = "localhost", 9999
data = " ".join(sys.argv[1:])

# Create a socket (SOCK_STREAM means a TCP socket)
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Connect to server and send data
sock.connect((HOST, PORT))
sock.send(bytes(data + "\n", "utf8"))

# Receive data from the server and shut down
received = sock.recv(1024)
sock.close()

print("Sent:     %s" % data)
print("Received: %s" % received)
The output of the example should look something like this:
Server:
$ python TCPServer.py
127.0.0.1 wrote:
b'hello world with TCP'
127.0.0.1 wrote:
b'python is nice'
Client:
$ python TCPClient.py hello world with TCP
Sent: hello world with TCP
Received: b'HELLO WORLD WITH TCP'
$ python TCPClient.py python is nice
Sent: python is nice
Received: b'PYTHON IS NICE'
import socketserver

class MyUDPHandler(socketserver.BaseRequestHandler):
    """
    This class works similar to the TCP handler class, except that
    self.request consists of a pair of data and client socket, and since
    there is no connection the client address must be given explicitly
    when sending data back via sendto().
    """

    def handle(self):
        data = self.request[0].strip()
        socket = self.request[1]
        print("%s wrote:" % self.client_address[0])
        print(data)
        socket.sendto(data.upper(), self.client_address)

if __name__ == "__main__":
    HOST, PORT = "localhost", 9999
    server = socketserver.UDPServer((HOST, PORT), MyUDPHandler)
    server.serve_forever()
This is the client side:
import socket
import sys

HOST, PORT = "localhost", 9999
data = " ".join(sys.argv[1:])

# SOCK_DGRAM is the socket type to use for UDP sockets
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# As you can see, there is no connect() call; UDP has no connections.
# Instead, data is directly sent to the recipient via sendto().
sock.sendto(bytes(data + "\n", "utf8"), (HOST, PORT))
received = sock.recv(1024)

print("Sent:     %s" % data)
print("Received: %s" % received)
The output of the example should look exactly like for the TCP server example.
To build asynchronous handlers, use the ThreadingMixIn and
ForkingMixIn classes.
An example for the ThreadingMixIn class:
import socket
import threading
import socketserver

class ThreadedTCPRequestHandler(socketserver.BaseRequestHandler):

    def handle(self):
        data = self.request.recv(1024)
        cur_thread = threading.current_thread()
        response = bytes("%s: %s" % (cur_thread.getName(), data), 'ascii')
        self.request.send(response)

class ThreadedTCPServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    pass

def client(ip, port, message):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((ip, port))
    sock.send(message)
    response = sock.recv(1024)
    print("Received: %s" % response)
    sock.close()

if __name__ == "__main__":
    # Port 0 means to select an arbitrary unused port
    HOST, PORT = "localhost", 0

    server = ThreadedTCPServer((HOST, PORT), ThreadedTCPRequestHandler)
    ip, port = server.server_address

    # Start a thread with the server -- that thread will then start one
    # more thread for each request
    server_thread = threading.Thread(target=server.serve_forever)
    # Exit the server thread when the main thread terminates
    server_thread.setDaemon(True)
    server_thread.start()
    print("Server loop running in thread:", server_thread.name)

    client(ip, port, b"Hello World 1")
    client(ip, port, b"Hello World 2")
    client(ip, port, b"Hello World 3")

    server.shutdown()
The output of the example should look something like this:
$ python ThreadedTCPServer.py
Server loop running in thread: Thread-1
Received: b"Thread-2: b'Hello World 1'"
Received: b"Thread-3: b'Hello World 2'"
Received: b"Thread-4: b'Hello World 3'"
The ForkingMixIn class is used in the same way, except that the server
will spawn a new process for each request.
This module defines classes for implementing HTTP servers (Web servers).
One class, HTTPServer, is a socketserver.TCPServer subclass.
It creates and listens at the HTTP socket, dispatching the requests to a
handler. Code to create and run the server looks like this:
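def run(server_class=HTTPServer, handler_class=BaseHTTPRequestHandler):
    server_address = ('', 8000)
    httpd = server_class(server_address, handler_class)
    httpd.serve_forever()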
class http.server.HTTPServer(server_address, RequestHandlerClass)¶
This class builds on the TCPServer class by storing the server
address as instance variables named server_name and
server_port. The server is accessible by the handler, typically
through the handler’s server instance variable.
The HTTPServer must be given a RequestHandlerClass on instantiation,
of which this module provides three different variants:
class http.server.BaseHTTPRequestHandler(request, client_address, server)¶
This class is used to handle the HTTP requests that arrive at the server. By
itself, it cannot respond to any actual HTTP requests; it must be subclassed
to handle each request method (e.g. GET or POST).
BaseHTTPRequestHandler provides a number of class and instance
variables, and methods for use by subclasses.
The handler will parse the request and the headers, then call a method
specific to the request type. The method name is constructed from the
request. For example, for the request method SPAM, the do_SPAM()
method will be called with no arguments. All of the relevant information is
stored in instance variables of the handler. Subclasses should not need to
override or extend the __init__() method.
Specifies the server software version. You may want to override this. The
format is multiple whitespace-separated strings, where each string is of
the form name[/version]. For example, 'BaseHTTP/0.2'.
Specifies a format string for building an error response to the client. It
uses parenthesized, keyed format specifiers, so the format operand must be
a dictionary. The code key should be an integer, specifying the numeric
HTTP error code value. message should be a string containing a
(detailed) error message of what occurred, and explain should be an
explanation of the error code number. Default message and explain
values can be found in the responses class variable.
This specifies the HTTP protocol version used in responses. If set to
'HTTP/1.1', the server will permit HTTP persistent connections;
however, your server must then include an accurate Content-Length
header (using send_header()) in all of its responses to clients.
For backwards compatibility, the setting defaults to 'HTTP/1.0'.
This variable contains a mapping of error code integers to two-element tuples
containing a short and long message. For example, {code: (shortmessage, longmessage)}. The shortmessage is usually used as the message key in an
error response, and longmessage as the explain key (see the
error_message_format class variable).
Calls handle_one_request() once (or, if persistent connections are
enabled, multiple times) to handle incoming HTTP requests. You should
never need to override it; instead, implement appropriate do_*()
methods.
When an HTTP/1.1 compliant server receives an Expect: 100-continue
request header, it responds with a 100 Continue followed by 200 OK headers.
This method can be overridden to raise an error if the server does not
want the client to continue. For example, the server can choose to send
417 Expectation Failed as a response header and return False.
Sends and logs a complete error reply to the client. The numeric code
specifies the HTTP error code, with message as optional, more specific text. A
complete set of headers is sent, followed by text composed using the
error_message_format class variable.
Sends a response header and logs the accepted request. The HTTP response
line is sent, followed by Server and Date headers. The values for
these two headers are picked up from the version_string() and
date_time_string() methods, respectively.
Stores the HTTP header to an internal buffer which will be written to the
output stream when end_headers() method is invoked.
keyword should specify the header keyword, with value
specifying its value.
Changed in version 3.2: Storing the headers in an internal buffer.
Sends the response header only, for use when a 100 Continue response is
sent by the server to the client. The headers are not buffered; they are
sent directly to the output stream. If the message is not specified, the
HTTP message corresponding to the response code is sent.
Logs an accepted (successful) request. code should specify the numeric
HTTP code associated with the response. If a size of the response is
available, then it should be passed as the size parameter.
Logs an error when a request cannot be fulfilled. By default, it passes
the message to log_message(), so it takes the same arguments
(format and additional values).
Logs an arbitrary message to sys.stderr. This is typically overridden
to create custom error logging mechanisms. The format argument is a
standard printf-style format string, where the additional arguments to
log_message() are applied as inputs to the formatting. The client
address and current date and time are prefixed to every message logged.
Returns the date and time given by timestamp (which must be None or in
the format returned by time.time()), formatted for a message
header. If timestamp is omitted, it uses the current date and time.
Returns the client address, formatted for logging. A name lookup is
performed on the client’s IP address.
class http.server.SimpleHTTPRequestHandler(request, client_address, server)
This class serves files from the current directory and below, directly
mapping the directory structure to HTTP requests.
A lot of the work, such as parsing the request, is done by the base class
BaseHTTPRequestHandler. This class implements the do_GET()
and do_HEAD() functions.
A dictionary mapping suffixes into MIME types. The default is
signified by an empty string, and is considered to be
application/octet-stream. The mapping is used case-insensitively,
and so should contain only lower-cased keys.
This method serves the 'HEAD' request type: it sends the headers it
would send for the equivalent GET request. See the do_GET()
method for a more complete explanation of the possible headers.
The request is mapped to a local file by interpreting the request as a
path relative to the current working directory.
If the request was mapped to a directory, the directory is checked for a
file named index.html or index.htm (in that order). If found, the
file’s contents are returned; otherwise a directory listing is generated
by calling the list_directory() method. This method uses
os.listdir() to scan the directory, and returns a 404 error
response if the listdir() fails.
If the request was mapped to a file, it is opened and the contents are
returned. Any IOError exception in opening the requested file is
mapped to a 404, 'File not found' error. Otherwise, the content
type is guessed by calling the guess_type() method, which in turn
uses the extensions_map variable.
A 'Content-type:' header with the guessed content type is output,
followed by a 'Content-Length:' header with the file’s size and a
'Last-Modified:' header with the file’s modification time.
Then follows a blank line signifying the end of the headers, and then the
contents of the file are output. If the file’s MIME type starts with
text/ the file is opened in text mode; otherwise binary mode is used.
For example usage, see the implementation of the test() function
invocation in the http.server module.
The SimpleHTTPRequestHandler class can be used in the following
manner in order to create a very basic webserver serving files relative to
the current directory.
import http.server
import socketserver

PORT = 8000

Handler = http.server.SimpleHTTPRequestHandler

httpd = socketserver.TCPServer(("", PORT), Handler)

print("serving at port", PORT)
httpd.serve_forever()
http.server can also be invoked directly using the -m
switch of the interpreter with a port number argument. Similar to
the previous example, this serves files relative to the current directory.
python -m http.server 8000
class http.server.CGIHTTPRequestHandler(request, client_address, server)
This class is used to serve either files or output of CGI scripts from the
current directory and below. Note that mapping HTTP hierarchic structure to
local directory structure is exactly as in SimpleHTTPRequestHandler.
Note
CGI scripts run by the CGIHTTPRequestHandler class cannot execute
redirects (HTTP code 302), because code 200 (script output follows) is
sent prior to execution of the CGI script. This pre-empts the status
code.
The class will, however, run the CGI script instead of serving it as a file,
if it guesses it to be a CGI script. Only directory-based CGI scripts are used —
the other common server configuration is to treat special extensions as
denoting CGI scripts.
The do_GET() and do_HEAD() functions are modified to run CGI scripts
and serve the output, instead of serving files, if the request leads to
somewhere below the cgi_directories path.
This method serves the 'POST' request type, only allowed for CGI
scripts. Error 501, “Can only POST to CGI scripts”, is output when trying
to POST to a non-CGI URL.
Note that CGI scripts will be run with UID of user nobody, for security
reasons. Problems with the CGI script will be translated to error 403.
The http.cookies module defines classes for abstracting the concept of
cookies, an HTTP state management mechanism. It supports both simple string-only
cookies, and provides an abstraction for having any serializable data-type as
cookie value.
The module formerly strictly applied the parsing rules described in the
RFC 2109 and RFC 2068 specifications. It has since been discovered that
MSIE 3.0x doesn’t follow the character rules outlined in those specs. As a
result, the parsing rules used are a bit less strict.
Note
On encountering an invalid cookie, CookieError is raised, so if your
cookie data comes from a browser you should always prepare for invalid data
and catch CookieError on parsing.
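For instance, a minimal sketch of round-tripping a cookie and guarding against
bad input might look like this (the cookie name and value are arbitrary):

import http.cookies

C = http.cookies.SimpleCookie()
C["fig"] = "newton"           # store a string-only cookie value
C["fig"]["path"] = "/"        # set an RFC 2109 attribute on the Morsel
print(C.output())             # emits: Set-Cookie: fig=newton; Path=/

try:
    C["invalid name"] = "x"   # spaces are illegal in cookie names
except http.cookies.CookieError:
    print("refused an invalid cookie")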
This class is a dictionary-like object whose keys are strings and whose values
are Morsel instances. Note that upon setting a key to a value, the
value is first converted to a Morsel containing the key and the value.
If input is given, it is passed to the load() method.
Return a decoded value from a string representation. Return value can be any
type. This method does nothing in BaseCookie — it exists so it can be
overridden.
Return an encoded value. val can be any type, but return value must be a
string. This method does nothing in BaseCookie — it exists so it can
be overridden.
In general, it should be the case that value_encode() and
value_decode() are inverses on the range of value_decode.
Return a string representation suitable to be sent as HTTP headers. attrs and
header are sent to each Morsel‘s output() method. sep is used
to join the headers together, and is by default the combination '\r\n'
(CRLF).
Abstract a key/value pair, which has some RFC 2109 attributes.
Morsels are dictionary-like objects, whose set of keys is constant — the valid
RFC 2109 attributes, which are
expires
path
comment
domain
max-age
secure
version
httponly
The attribute httponly specifies that the cookie is only transferred
in HTTP requests, and is not accessible through JavaScript. This is intended
to mitigate some forms of cross-site scripting.
Return a string representation of the Morsel, suitable to be sent as an HTTP
header. By default, all the attributes are included, unless attrs is given, in
which case it should be a list of attributes to use. header is by default
"Set-Cookie:".
The http.cookiejar module defines classes for automatic handling of HTTP
cookies. It is useful for accessing web sites that require small pieces of data
– cookies – to be set on the client machine by an HTTP response from a
web server, and then returned to the server in later HTTP requests.
Both the regular Netscape cookie protocol and the protocol defined by
RFC 2965 are handled. RFC 2965 handling is switched off by default.
RFC 2109 cookies are parsed as Netscape cookies and subsequently treated
either as Netscape or RFC 2965 cookies according to the ‘policy’ in effect.
Note that the great majority of cookies on the Internet are Netscape cookies.
http.cookiejar attempts to follow the de-facto Netscape cookie protocol (which
differs substantially from that set out in the original Netscape specification),
including taking note of the max-age and port cookie-attributes
introduced with RFC 2965.
Note
The various named parameters found in Set-Cookie and
Set-Cookie2 headers (eg. domain and expires) are
conventionally referred to as attributes. To distinguish them from
Python attributes, the documentation for this module uses the term
cookie-attribute instead.
policy is an object implementing the CookiePolicy interface.
The CookieJar class stores HTTP cookies. It extracts cookies from HTTP
requests, and returns them in HTTP responses. CookieJar instances
automatically expire contained cookies when necessary. Subclasses are also
responsible for storing and retrieving cookies from a file or database.
class http.cookiejar.FileCookieJar(filename, delayload=None, policy=None)
policy is an object implementing the CookiePolicy interface. For the
other arguments, see the documentation for the corresponding attributes.
Constructor arguments should be passed as keyword arguments only.
blocked_domains is a sequence of domain names that we never accept cookies
from, nor return cookies to. allowed_domains if not None, this is a
sequence of the only domains for which we accept and return cookies. For all
other arguments, see the documentation for CookiePolicy and
DefaultCookiePolicy objects.
DefaultCookiePolicy implements the standard accept / reject rules for
Netscape and RFC 2965 cookies. By default, RFC 2109 cookies (ie. cookies
received in a Set-Cookie header with a version cookie-attribute of
1) are treated according to the RFC 2965 rules. However, if RFC 2965 handling
is turned off or rfc2109_as_netscape is True, RFC 2109 cookies are
‘downgraded’ by the CookieJar instance to Netscape cookies, by
setting the version attribute of the Cookie instance to 0.
DefaultCookiePolicy also provides some parameters to allow some
fine-tuning of policy.
This class represents Netscape, RFC 2109 and RFC 2965 cookies. It is not
expected that users of http.cookiejar construct their own Cookie
instances. Instead, if necessary, call make_cookies() on a
CookieJar instance.
The specification of the original Netscape cookie protocol. Though this is
still the dominant protocol, the ‘Netscape cookie protocol’ implemented by all
the major browsers (and http.cookiejar) only bears a passing resemblance to
the one sketched out in cookie_spec.html.
If policy allows (ie. the rfc2965 and hide_cookie2 attributes of
the CookieJar‘s CookiePolicy instance are true and false
respectively), the Cookie2 header is also added when appropriate.
The request object (usually a urllib.request.Request instance)
must support the methods get_full_url(), get_host(),
get_type(), unverifiable(), get_origin_req_host(),
has_header(), get_header(), header_items(), and
add_unredirected_header(), as documented by urllib.request.
Extract cookies from HTTP response and store them in the CookieJar,
where allowed by policy.
The CookieJar will look for allowable Set-Cookie and
Set-Cookie2 headers in the response argument, and store cookies
as appropriate (subject to the CookiePolicy.set_ok() method’s approval).
The request object (usually a urllib.request.Request instance)
must support the methods get_full_url(), get_host(),
unverifiable(), and get_origin_req_host(), as documented by
urllib.request. The request is used to set default values for
cookie-attributes as well as for checking that the cookie is allowed to be
set.
If invoked without arguments, clear all cookies. If given a single argument,
only cookies belonging to that domain will be removed. If given two arguments,
cookies belonging to the specified domain and URL path are removed. If
given three arguments, then the cookie with the specified domain, path and
name is removed.
Discards all contained cookies that have a true discard attribute
(usually because they had either no max-age or expires cookie-attribute,
or an explicit discard cookie-attribute). For interactive browsers, the end
of a session usually corresponds to closing the browser window.
Note that the save() method won’t save session cookies anyway, unless you
ask otherwise by passing a true ignore_discard argument.
FileCookieJar implements the following additional methods:
This base class raises NotImplementedError. Subclasses may leave this
method unimplemented.
filename is the name of file in which to save cookies. If filename is not
specified, self.filename is used (whose default is the value passed to
the constructor, if any); if self.filename is None,
ValueError is raised.
ignore_discard: save even cookies set to be discarded. ignore_expires: save
even cookies that have expired.
The file is overwritten if it already exists, thus wiping all the cookies it
contains. Saved cookies can be restored later using the load() or
revert() methods.
The named file must be in the format understood by the class, or
LoadError will be raised. Also, IOError may be raised, for
example if the file does not exist.
If true, load cookies lazily from disk. This attribute should not be assigned
to. This is only a hint, since this only affects performance, not behaviour
(unless the cookies on disk are changing). A CookieJar object may
ignore it. None of the FileCookieJar classes included in the standard
library lazily loads cookies.
FileCookieJar subclasses and co-operation with web browsers
The following CookieJar subclasses are provided for reading and
writing.
class http.cookiejar.MozillaCookieJar(filename, delayload=None, policy=None)
A FileCookieJar that can load from and save cookies to disk in the
Mozilla cookies.txt file format (which is also used by the Lynx and Netscape
browsers).
Note
This loses information about RFC 2965 cookies, and also about newer or
non-standard cookie-attributes such as port.
Warning
Back up your cookies before saving if you have cookies whose loss / corruption
would be inconvenient (there are some subtleties which may lead to slight
changes in the file over a load / save round-trip).
Also note that cookies saved while Mozilla is running will get clobbered by
Mozilla.
class http.cookiejar.LWPCookieJar(filename, delayload=None, policy=None)
A FileCookieJar that can load from and save cookies to disk in format
compatible with the libwww-perl library’s Set-Cookie3 file format. This is
convenient if you want to store cookies in a human-readable file.
Return false if cookies should not be returned, given cookie domain.
This method is an optimization. It removes the need for checking every cookie
with a particular domain (which might involve reading many files). Returning
true from domain_return_ok() and path_return_ok() leaves all the
work to return_ok().
Note that domain_return_ok() is called for every cookie domain, not just
for the request domain. For example, the function might be called with both
".example.com" and "www.example.com" if the request domain is
"www.example.com". The same goes for path_return_ok().
The request argument is as documented for return_ok().
In addition to implementing the methods above, implementations of the
CookiePolicy interface must also supply the following attributes,
indicating which protocols should be used, and how. All of these attributes may
be assigned to.
Don’t add Cookie2 header to requests (the presence of this header
indicates to the server that we understand RFC 2965 cookies).
The most useful way to define a CookiePolicy class is by subclassing
from DefaultCookiePolicy and overriding some or all of the methods
above. CookiePolicy itself may be used as a ‘null policy’ to allow
setting and receiving any and all cookies (this is unlikely to be useful).
Implements the standard rules for accepting and returning cookies.
Both RFC 2965 and Netscape cookies are covered. RFC 2965 handling is switched
off by default.
The easiest way to provide your own policy is to override this class and call
its methods in your overridden implementations before adding your own additional
checks:
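import http.cookiejar

# a minimal sketch; the extra rule rejecting names that start with
# "__tracking" is invented purely for illustration
class BlockerPolicy(http.cookiejar.DefaultCookiePolicy):
    def set_ok(self, cookie, request):
        # apply the standard accept / reject rules first
        if not http.cookiejar.DefaultCookiePolicy.set_ok(self, cookie, request):
            return False
        # then add a check of our own
        return not cookie.name.startswith("__tracking")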
In addition to the features required to implement the CookiePolicy
interface, this class allows you to block and allow domains from setting and
receiving cookies. There are also some strictness switches that allow you to
tighten up the rather loose Netscape protocol rules a little bit (at the cost of
blocking some benign cookies).
A domain blacklist and whitelist is provided (both off by default). Only domains
not in the blacklist and present in the whitelist (if the whitelist is active)
participate in cookie setting and returning. Use the blocked_domains
constructor argument, and blocked_domains() and
set_blocked_domains() methods (and the corresponding argument and methods
for allowed_domains). If you set a whitelist, you can turn it off again by
setting it to None.
Domains in block or allow lists that do not start with a dot must equal the
cookie domain to be matched. For example, "example.com" matches a blacklist
entry of "example.com", but "www.example.com" does not. Domains that do
start with a dot are matched by more specific domains too. For example, both
"www.example.com" and "www.coyote.example.com" match ".example.com"
(but "example.com" itself does not). IP addresses are an exception, and
must match exactly. For example, if blocked_domains contains "192.168.1.2"
and ".168.1.2", 192.168.1.2 is blocked, but 193.168.1.2 is not.
Return whether domain is not on the whitelist for setting or receiving
cookies.
DefaultCookiePolicy instances have the following attributes, which are
all initialised from the constructor arguments of the same name, and which may
all be assigned to.
If true, request that the CookieJar instance downgrade RFC 2109 cookies
(ie. cookies received in a Set-Cookie header with a version
cookie-attribute of 1) to Netscape cookies by setting the version attribute of
the Cookie instance to 0. The default value is None, in which
case RFC 2109 cookies are downgraded if and only if RFC 2965 handling is turned
off. Therefore, RFC 2109 cookies are downgraded by default.
Don’t allow sites to set two-component domains with country-code top-level
domains like .co.uk, .gov.uk, .co.nz, etc. This is far from perfect
and isn’t guaranteed to work!
Follow RFC 2965 rules on unverifiable transactions (usually, an unverifiable
transaction is one resulting from a redirect or a request for an image hosted on
another site). If this is false, cookies are never blocked on the basis of
verifiability.
Don’t allow setting cookies whose path doesn’t path-match request URI.
strict_ns_domain is a collection of flags. Its value is constructed by
or-ing together (for example, DomainStrictNoDots|DomainStrictNonDomain means
both flags are set).
Cookies that did not explicitly specify a domain cookie-attribute can only
be returned to a domain equal to the domain that set the cookie (eg.
spam.example.com won’t be returned cookies from example.com that had no
domain cookie-attribute).
Cookie instances have Python attributes roughly corresponding to the
standard cookie-attributes specified in the various cookie standards. The
correspondence is not one-to-one, because there are complicated rules for
assigning default values, because the max-age and expires
cookie-attributes contain equivalent information, and because RFC 2109 cookies
may be ‘downgraded’ by http.cookiejar from version 1 to version 0 (Netscape)
cookies.
Assignment to these attributes should not be necessary other than in rare
circumstances in a CookiePolicy method. The class does not enforce
internal consistency, so you should know what you’re doing if you do that.
Integer or None. Netscape cookies have version 0. RFC 2965 and
RFC 2109 cookies have a version cookie-attribute of 1. However, note that
http.cookiejar may ‘downgrade’ RFC 2109 cookies to Netscape cookies, in which
case version is 0.
True if this cookie was received as an RFC 2109 cookie (ie. the cookie
arrived in a Set-Cookie header, and the value of the Version
cookie-attribute in that header was 1). This attribute is provided because
http.cookiejar may ‘downgrade’ RFC 2109 cookies to Netscape cookies, in
which case version is 0.
True if cookie has passed the time at which the server requested it should
expire. If now is given (in seconds since the epoch), return whether the
cookie has expired at the specified time.
This example illustrates how to open a URL using your Netscape, Mozilla, or Lynx
cookies (assumes Unix/Netscape convention for location of the cookies file):
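import os
import http.cookiejar
import urllib.request

# a sketch; the cookies.txt location below follows the Unix/Netscape
# convention and may differ on your system
cookie_file = os.path.join(os.environ["HOME"], ".netscape", "cookies.txt")
cj = http.cookiejar.MozillaCookieJar()
cj.load(cookie_file)
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
r = opener.open("http://example.com/")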
The next example illustrates the use of DefaultCookiePolicy. Turn on
RFC 2965 cookies, be more strict about domains when setting and returning
Netscape cookies, and block some domains from setting cookies or having them
returned:
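import http.cookiejar
import urllib.request

# a sketch; the blocked domains are placeholders
policy = http.cookiejar.DefaultCookiePolicy(
    rfc2965=True,
    strict_ns_domain=http.cookiejar.DefaultCookiePolicy.DomainStrict,
    blocked_domains=["ads.net", ".ads.net"])
cj = http.cookiejar.CookieJar(policy)
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
r = opener.open("http://example.com/")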
XML-RPC is a Remote Procedure Call method that uses XML passed via HTTP as a
transport. With it, a client can call methods with parameters on a remote
server (the server is named by a URI) and get back structured data. This module
supports writing XML-RPC client code; it handles all the details of translating
between conformable Python objects and XML on the wire.
class xmlrpc.client.ServerProxy(uri, transport=None, encoding=None, verbose=False, allow_none=False, use_datetime=False)
A ServerProxy instance is an object that manages communication with a
remote XML-RPC server. The required first argument is a URI (Uniform Resource
Identifier), and will normally be the URL of the server. The optional second
argument is a transport factory instance; by default it is an internal
SafeTransport instance for https: URLs and an internal HTTP
Transport instance otherwise. The optional third argument is an
encoding, by default UTF-8. The optional fourth argument is a debugging flag.
If allow_none is true, the Python constant None will be translated into
XML; the default behaviour is for None to raise a TypeError. This is
a commonly-used extension to the XML-RPC specification, but isn’t supported by
all clients and servers; see http://ontosys.com/xml-rpc/extensions.php for a
description. The use_datetime flag can be used to cause date/time values to
be presented as datetime.datetime objects; this is false by default.
datetime.datetime objects may be passed to calls.
Both the HTTP and HTTPS transports support the URL syntax extension for HTTP
Basic Authentication: http://user:pass@host:port/path. The user:pass
portion will be base64-encoded as an HTTP ‘Authorization’ header, and sent to
the remote server as part of the connection process when invoking an XML-RPC
method. You only need to use this if the remote server requires a Basic
Authentication user and password.
The returned instance is a proxy object with methods that can be used to invoke
corresponding RPC calls on the remote server. If the remote server supports the
introspection API, the proxy can also be used to query the remote server for the
methods it supports (service discovery) and fetch other server-associated
metadata.
ServerProxy instance methods take Python basic types and objects as
arguments and return Python basic types and classes. Types that are conformable
(e.g. that can be marshalled through XML), include the following (and except
where noted, they are unmarshalled as the same Python type):
arrays
    Any Python sequence type containing conformable elements. Arrays are
    returned as lists.
structures
    A Python dictionary. Keys must be strings, values may be any conformable
    type. Objects of user-defined classes can be passed in; only their
    __dict__ attribute is transmitted.
dates
    In seconds since the epoch (pass in an instance of the DateTime
    class) or a datetime.datetime instance.
binary data
    Pass in an instance of the Binary wrapper class.
This is the full set of data types supported by XML-RPC. Method calls may also
raise a special Fault instance, used to signal XML-RPC server errors, or
ProtocolError used to signal an error in the HTTP/HTTPS transport layer.
Both Fault and ProtocolError derive from a base class called
Error. Note that the xmlrpc.client module currently does not marshal
instances of subclasses of built-in types.
When passing strings, characters special to XML such as <, >, and &
will be automatically escaped. However, it’s the caller’s responsibility to
ensure that the string is free of characters that aren’t allowed in XML, such as
the control characters with ASCII values between 0 and 31 (except, of course,
tab, newline and carriage return); failing to do this will result in an XML-RPC
request that isn’t well-formed XML. If you have to pass arbitrary strings via
XML-RPC, use the Binary wrapper class described below.
Server is retained as an alias for ServerProxy for backwards
compatibility. New code should use ServerProxy.
A good description of XML-RPC operation and client software in several languages.
Contains pretty much everything an XML-RPC client developer needs to know.
Fredrik Lundh’s “unofficial errata, intended to clarify certain
details in the XML-RPC specification, as well as hint at
‘best practices’ to use when designing your own XML-RPC
implementations.”
A ServerProxy instance has a method corresponding to each remote
procedure call accepted by the XML-RPC server. Calling the method performs an
RPC, dispatched by both name and argument signature (e.g. the same method name
can be overloaded with multiple argument signatures). The RPC finishes by
returning a value, which may be either returned data in a conformant type or a
Fault or ProtocolError object indicating an error.
Servers that support the XML introspection API support some common methods
grouped under the reserved system attribute:
This method takes one parameter, the name of a method implemented by the XML-RPC
server. It returns an array of possible signatures for this method. A signature
is an array of types. The first of these types is the return type of the method,
the rest are parameters.
Because multiple signatures (i.e. overloading) are permitted, this method
returns a list of signatures rather than a singleton.
Signatures themselves are restricted to the top level parameters expected by a
method. For instance if a method expects one array of structs as a parameter,
and it returns a string, its signature is simply “string, array”. If it expects
three integers and returns a string, its signature is “string, int, int, int”.
If no signature is defined for the method, a non-array value is returned. In
Python this means that the type of the returned value will be something other
than list.
This method takes one parameter, the name of a method implemented by the XML-RPC
server. It returns a documentation string describing the use of that method. If
no such string is available, an empty string is returned. The documentation
string may contain HTML markup.
A working example follows. The server code:
from xmlrpc.server import SimpleXMLRPCServer

def is_even(n):
    return n % 2 == 0

server = SimpleXMLRPCServer(("localhost", 8000))
print("Listening on port 8000...")
server.register_function(is_even, "is_even")
server.serve_forever()
The client code for the preceding server:
import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
print("3 is even: %s" % str(proxy.is_even(3)))
print("100 is even: %s" % str(proxy.is_even(100)))
This class may be initialized with seconds since the epoch, a time
tuple, an ISO 8601 time/date string, or a datetime.datetime
instance. It has the following methods, supported mainly for internal
use by the marshalling/unmarshalling code:
Write the XML-RPC encoding of this DateTime item to the out stream
object.
It also supports certain of Python’s built-in operators through rich comparison
and __repr__() methods.
A working example follows. The server code:
import datetime
from xmlrpc.server import SimpleXMLRPCServer
import xmlrpc.client

def today():
    today = datetime.datetime.today()
    return xmlrpc.client.DateTime(today)

server = SimpleXMLRPCServer(("localhost", 8000))
print("Listening on port 8000...")
server.register_function(today, "today")
server.serve_forever()
The client code for the preceding server:
import xmlrpc.client
import datetime

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")

today = proxy.today()
# convert the ISO8601 string to a datetime object
converted = datetime.datetime.strptime(today.value, "%Y%m%dT%H:%M:%S")
print("Today: %s" % converted.strftime("%d.%m.%Y, %H:%M"))
This class may be initialized from string data (which may include NULs). The
primary access to the content of a Binary object is provided by an
attribute:
Write the XML-RPC base 64 encoding of this binary item to the out stream object.
The encoded data will have newlines every 76 characters as per
RFC 2045 section 6.8,
which was the de facto standard base64 specification when the
XML-RPC spec was written.
It also supports certain of Python’s built-in operators through __eq__()
and __ne__() methods.
Example usage of the binary objects. We’re going to transfer an image over
XMLRPC:
from xmlrpc.server import SimpleXMLRPCServer
import xmlrpc.client

def python_logo():
    with open("python_logo.jpg", "rb") as handle:
        return xmlrpc.client.Binary(handle.read())

server = SimpleXMLRPCServer(("localhost", 8000))
print("Listening on port 8000...")
server.register_function(python_logo, 'python_logo')
server.serve_forever()
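A matching client might fetch the image and save it to a local file (a sketch
that assumes the server above is running; the output file name is arbitrary):

import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
with open("fetched_python_logo.jpg", "wb") as handle:
    handle.write(proxy.python_logo().data)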
A string containing a diagnostic message associated with the fault.
In the following example we’re going to intentionally cause a Fault by
returning a complex type object. The server code:
from xmlrpc.server import SimpleXMLRPCServer

# A marshalling error is going to occur because we're returning a
# complex number
def add(x, y):
    return x + y + 0j

server = SimpleXMLRPCServer(("localhost", 8000))
print("Listening on port 8000...")
server.register_function(add, 'add')
server.serve_forever()
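Client code for the preceding server might look like this (a sketch that
assumes the server above is running):

import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
try:
    proxy.add(2, 5)
except xmlrpc.client.Fault as err:
    print("A fault occurred")
    print("Fault code: %d" % err.faultCode)
    print("Fault string: %s" % err.faultString)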
A ProtocolError object describes a protocol error in the underlying
transport layer (such as a 404 ‘not found’ error if the server named by the URI
does not exist). It has the following attributes:
A dict containing the headers of the HTTP/HTTPS request that triggered the
error.
In the following example we’re going to intentionally cause a ProtocolError
by providing an invalid URI:
import xmlrpc.client

# create a ServerProxy with a URI that doesn't respond to XML-RPC requests
proxy = xmlrpc.client.ServerProxy("http://google.com/")

try:
    proxy.some_method()
except xmlrpc.client.ProtocolError as err:
    print("A protocol error occurred")
    print("URL: %s" % err.url)
    print("HTTP/HTTPS headers: %s" % err.headers)
    print("Error code: %d" % err.errcode)
    print("Error message: %s" % err.errmsg)
Create an object used to boxcar method calls. server is the eventual target of
the call. Calls can be made to the result object, but they will immediately
return None, and only store the call name and parameters in the
MultiCall object. Calling the object itself causes all stored calls to
be transmitted as a single system.multicall request. The result of this call
is a generator; iterating over this generator yields the individual
results.
A usage example of this class follows. The server code:
from xmlrpc.server import SimpleXMLRPCServer

def add(x, y):
    return x + y

def subtract(x, y):
    return x - y

def multiply(x, y):
    return x * y

def divide(x, y):
    return x / y

# A simple server with simple arithmetic functions
server = SimpleXMLRPCServer(("localhost", 8000))
print("Listening on port 8000...")
server.register_multicall_functions()
server.register_function(add, 'add')
server.register_function(subtract, 'subtract')
server.register_function(multiply, 'multiply')
server.register_function(divide, 'divide')
server.serve_forever()
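The client code for the preceding server might look like this (a sketch that
assumes the server above is running):

import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
multicall = xmlrpc.client.MultiCall(proxy)
multicall.add(7, 3)
multicall.subtract(7, 3)
multicall.multiply(7, 3)
multicall.divide(7, 3)
result = multicall()

print("7+3=%d, 7-3=%d, 7*3=%d, 7/3=%d" % tuple(result))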
Convert params into an XML-RPC request, or into a response if methodresponse
is true. params can be either a tuple of arguments or an instance of the
Fault exception class. If methodresponse is true, only a single value
can be returned, meaning that params must be of length 1. encoding, if
supplied, is the encoding to use in the generated XML; the default is UTF-8.
Python’s None value cannot be used in standard XML-RPC; to allow using
it via an extension, provide a true value for allow_none.
Convert an XML-RPC request or response into Python objects, a
(params, methodname) pair. params is a tuple of arguments; methodname is a
string, or None if no method name is present in the packet. If the XML-RPC
packet represents a fault condition, this function will raise a Fault exception.
The use_datetime flag can be used to cause date/time values to be presented as
datetime.datetime objects; this is false by default.
# simple test program (from the XML-RPC specification)
from xmlrpc.client import ServerProxy, Error

# server = ServerProxy("http://localhost:8000") # local server
server = ServerProxy("http://betty.userland.com")

print(server)

try:
    print(server.examples.getStateName(41))
except Error as v:
    print("ERROR", v)
To access an XML-RPC server through a proxy, you need to define a custom
transport. The following example shows how:
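import http.client
import xmlrpc.client

# a sketch; it relies on internal Transport hooks (make_connection,
# send_request, send_host) of this version of the library, and the
# proxy host and target URL below are placeholders
class ProxiedTransport(xmlrpc.client.Transport):
    def set_proxy(self, proxy):
        self.proxy = proxy
    def make_connection(self, host):
        self.realhost = host
        return http.client.HTTPConnection(self.proxy)
    def send_request(self, connection, handler, request_body):
        connection.putrequest("POST", 'http://%s%s' % (self.realhost, handler))
    def send_host(self, connection, host):
        connection.putheader('Host', self.realhost)

p = ProxiedTransport()
p.set_proxy('proxy-server:8080')
server = xmlrpc.client.ServerProxy('http://time.xmlrpc.com/RPC2', transport=p)
print(server.currentTime.getCurrentTime())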
The xmlrpc.server module provides a basic server framework for XML-RPC
servers written in Python. Servers can either be free standing, using
SimpleXMLRPCServer, or embedded in a CGI environment, using
CGIXMLRPCRequestHandler.
class xmlrpc.server.SimpleXMLRPCServer(addr, requestHandler=SimpleXMLRPCRequestHandler, logRequests=True, allow_none=False, encoding=None, bind_and_activate=True)
Create a new server instance. This class provides methods for registration of
functions that can be called by the XML-RPC protocol. The requestHandler
parameter should be a factory for request handler instances; it defaults to
SimpleXMLRPCRequestHandler. The addr and requestHandler parameters
are passed to the socketserver.TCPServer constructor. If logRequests
is true (the default), requests will be logged; setting this parameter to false
will turn off logging. The allow_none and encoding parameters are passed
on to xmlrpc.client and control the XML-RPC responses that will be returned
from the server. The bind_and_activate parameter controls whether
server_bind() and server_activate() are called immediately by the
constructor; it defaults to true. Setting it to false allows code to manipulate
the allow_reuse_address class variable before the address is bound.
class xmlrpc.server.CGIXMLRPCRequestHandler(allow_none=False, encoding=None)
Create a new instance to handle XML-RPC requests in a CGI environment. The
allow_none and encoding parameters are passed on to xmlrpc.client
and control the XML-RPC responses that will be returned from the server.
Create a new request handler instance. This request handler supports POST
requests and modifies logging so that the logRequests parameter to the
SimpleXMLRPCServer constructor is honored.
Register a function that can respond to XML-RPC requests. If name is given,
it will be the method name associated with function, otherwise
function.__name__ will be used. name can be either a normal or Unicode
string, and may contain characters not legal in Python identifiers, including
the period character.
Register an object which is used to expose method names which have not been
registered using register_function(). If instance contains a
_dispatch() method, it is called with the requested method name and the
parameters from the request. Its API is def _dispatch(self, method, params)
(note that params does not represent a variable argument list). If it calls
an underlying function to perform its task, that function is called as
func(*params), expanding the parameter list. The return value from
_dispatch() is returned to the client as the result. If instance does
not have a _dispatch() method, it is searched for an attribute matching
the name of the requested method.
If the optional allow_dotted_names argument is true and the instance does not
have a _dispatch() method, then if the requested method name contains
periods, each component of the method name is searched for individually, with
the effect that a simple hierarchical search is performed. The value found from
this search is then called with the parameters from the request, and the return
value is passed back to the client.
Warning
Enabling the allow_dotted_names option allows intruders to access your
module’s global variables and may allow intruders to execute arbitrary code on
your machine. Only use this option on a secure, closed network.
An attribute value that must be a tuple listing valid path portions of the URL
for receiving XML-RPC requests. Requests posted to other paths will result in a
404 “no such page” HTTP error. If this tuple is empty, all paths will be
considered valid. The default value is ('/', '/RPC2').
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.server import SimpleXMLRPCRequestHandler

# Restrict to a particular path.
class RequestHandler(SimpleXMLRPCRequestHandler):
    rpc_paths = ('/RPC2',)

# Create server
server = SimpleXMLRPCServer(("localhost", 8000),
                            requestHandler=RequestHandler)
server.register_introspection_functions()

# Register pow() function; this will use the value of
# pow.__name__ as the name, which is just 'pow'.
server.register_function(pow)

# Register a function under a different name
def adder_function(x, y):
    return x + y
server.register_function(adder_function, 'add')

# Register an instance; all the methods of the instance are
# published as XML-RPC methods (in this case, just 'mul').
class MyFuncs:
    def mul(self, x, y):
        return x * y

server.register_instance(MyFuncs())

# Run the server's main loop
server.serve_forever()
The following client code will call the methods made available by the preceding
server:
import xmlrpc.client

s = xmlrpc.client.ServerProxy('http://localhost:8000')
print(s.pow(2, 3))  # Returns 2**3 = 8
print(s.add(2, 3))  # Returns 5
print(s.mul(5, 2))  # Returns 5*2 = 10

# Print list of available methods
print(s.system.listMethods())
Register a function that can respond to XML-RPC requests. If name is given,
it will be the method name associated with function, otherwise
function.__name__ will be used. name can be either a normal or Unicode
string, and may contain characters not legal in Python identifiers, including
the period character.
Register an object which is used to expose method names which have not been
registered using register_function(). If instance contains a
_dispatch() method, it is called with the requested method name and the
parameters from the request; the return value is returned to the client as the
result. If instance does not have a _dispatch() method, it is searched
for an attribute matching the name of the requested method; if the requested
method name contains periods, each component of the method name is searched for
individually, with the effect that a simple hierarchical search is performed.
The value found from this search is then called with the parameters from the
request, and the return value is passed back to the client.
Handle an XML-RPC request. If request_text is given, it should be the POST
data provided by the HTTP server, otherwise the contents of stdin will be used.
These classes extend the above classes to serve HTML documentation in response
to HTTP GET requests. Servers can either be free standing, using
DocXMLRPCServer, or embedded in a CGI environment, using
DocCGIXMLRPCRequestHandler.
class xmlrpc.server.DocXMLRPCServer(addr, requestHandler=DocXMLRPCRequestHandler, logRequests=True, allow_none=False, encoding=None, bind_and_activate=True)
Create a new request handler instance. This request handler supports XML-RPC
POST requests, documentation GET requests, and modifies logging so that the
logRequests parameter to the DocXMLRPCServer constructor is honored.
The DocXMLRPCServer class is derived from SimpleXMLRPCServer
and provides a means of creating self-documenting, stand-alone XML-RPC
servers. HTTP POST requests are handled as XML-RPC method calls. HTTP GET
requests are handled by generating pydoc-style HTML documentation. This allows a
server to provide its own web-based documentation.
Set the description used in the generated HTML documentation. This description
will appear as a paragraph, below the server name, in the documentation.
The DocCGIXMLRPCRequestHandler class is derived from
CGIXMLRPCRequestHandler and provides a means of creating
self-documenting, XML-RPC CGI scripts. HTTP POST requests are handled as XML-RPC
method calls. HTTP GET requests are handled by generating pydoc-style HTML
documentation. This allows a server to provide its own web-based documentation.
Set the description used in the generated HTML documentation. This description
will appear as a paragraph, below the server name, in the documentation.
The modules described in this chapter implement various algorithms or interfaces
that are mainly useful for multimedia applications. They are available at the
discretion of the installation. Here’s an overview:
The audioop module contains some useful operations on sound fragments.
It operates on sound fragments consisting of signed integer samples 8, 16 or 32
bits wide, stored in Python strings. All scalar items are integers, unless
specified otherwise.
This module provides support for a-LAW, u-LAW and Intel/DVI ADPCM encodings.
A few of the more complicated operations only take 16-bit samples, otherwise the
sample size (in bytes) is always a parameter of the operation.
The module defines the following variables and functions:
Return a fragment which is the addition of the two samples passed as parameters.
width is the sample width in bytes, either 1, 2 or 4. Both
fragments should have the same length.
Decode an Intel/DVI ADPCM coded fragment to a linear fragment. See the
description of lin2adpcm() for details on ADPCM coding. Return a tuple
(sample, newstate) where the sample has the width specified in width.
Convert sound fragments in a-LAW encoding to linearly encoded sound fragments.
a-LAW encoding always uses 8-bit samples, so width refers only to the sample
width of the output fragment here.
Return a factor F such that rms(add(fragment, mul(reference, -F))) is
minimal, i.e., return the factor with which you should multiply reference to
make it match as well as possible to fragment. The fragments should both
contain 2-byte samples.
The time taken by this routine is proportional to len(fragment).
Try to match reference as well as possible to a portion of fragment (which
should be the longer fragment). This is (conceptually) done by taking slices
out of fragment, using findfactor() to compute the best match, and
minimizing the result. The fragments should both contain 2-byte samples.
Return a tuple (offset, factor) where offset is the (integer) offset into
fragment where the optimal match started and factor is the (floating-point)
factor as per findfactor().
Search fragment for a slice of length length samples (not bytes!) with
maximum energy, i.e., return i for which rms(fragment[i*2:(i+length)*2])
is maximal. The fragments should both contain 2-byte samples.
The routine takes time proportional to len(fragment).
Convert samples to 4 bit Intel/DVI ADPCM encoding. ADPCM coding is an adaptive
coding scheme, whereby each 4 bit number is the difference between one sample
and the next, divided by a (varying) step. The Intel/DVI ADPCM algorithm has
been selected for use by the IMA, so it may well become a standard.
state is a tuple containing the state of the coder. The coder returns a tuple
(adpcmfrag, newstate), and newstate should be passed to the next call
of lin2adpcm(). In the initial call, None can be passed as the state.
adpcmfrag is the ADPCM coded fragment packed two 4-bit values per byte.
Convert samples in the audio fragment to a-LAW encoding and return this as a
Python string. a-LAW is an audio encoding format whereby you get a dynamic
range of about 13 bits using only 8 bit samples. It is used by the Sun audio
hardware, among others.
Convert samples between 1-, 2- and 4-byte formats.
Note
In some audio formats, such as .WAV files, 16 and 32 bit samples are
signed, but 8 bit samples are unsigned. So when converting to 8 bit wide
samples for these formats, you need to also add 128 to the result:
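import audioop

frames = b'\x00\x10\x00\x20'                   # placeholder 16-bit sample data
new_frames = audioop.lin2lin(frames, 2, 1)     # convert 16-bit to 8-bit samples
new_frames = audioop.bias(new_frames, 1, 128)  # shift signed to unsigned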
Convert samples in the audio fragment to u-LAW encoding and return this as a
Python string. u-LAW is an audio encoding format whereby you get a dynamic
range of about 14 bits using only 8 bit samples. It is used by the Sun audio
hardware, among others.
state is a tuple containing the state of the converter. The converter returns
a tuple (newfragment, newstate), and newstate should be passed to the next
call of ratecv(). The initial call should pass None as the state.
The weightA and weightB arguments are parameters for a simple digital filter
and default to 1 and 0 respectively.
Convert a stereo fragment to a mono fragment. The left channel is multiplied by
lfactor and the right channel by rfactor before adding the two channels to
give a mono signal.
Generate a stereo fragment from a mono fragment. Each pair of samples in the
stereo fragment are computed from the mono sample, whereby left channel samples
are multiplied by lfactor and right channel samples by rfactor.
Convert sound fragments in u-LAW encoding to linearly encoded sound fragments.
u-LAW encoding always uses 8-bit samples, so width refers only to the sample
width of the output fragment here.
Note that operations such as mul() or max() make no distinction
between mono and stereo fragments, i.e. all samples are treated equal. If this
is a problem the stereo fragment should be split into two mono fragments first
and recombined later. Here is an example of how to do that:
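import audioop

def mul_stereo(sample, width, lfactor, rfactor):
    # split the stereo fragment into two mono fragments
    lsample = audioop.tomono(sample, width, 1, 0)
    rsample = audioop.tomono(sample, width, 0, 1)
    # operate on each channel separately
    lsample = audioop.mul(lsample, width, lfactor)
    rsample = audioop.mul(rsample, width, rfactor)
    # recombine the two mono fragments into a stereo fragment
    lsample = audioop.tostereo(lsample, width, 1, 0)
    rsample = audioop.tostereo(rsample, width, 0, 1)
    return audioop.add(lsample, rsample, width)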
If you use the ADPCM coder to build network packets and you want your protocol
to be stateless (i.e. to be able to tolerate packet loss) you should not only
transmit the data but also the state. Note that you should send the initial
state (the one you passed to lin2adpcm()) along to the decoder, not the
final state (as returned by the coder). If you want to use the
struct module to store the state in binary, you can code the first
element (the predicted value) in 16 bits and the second (the delta index) in 8.
The ADPCM coders have never been tried against other ADPCM coders, only against
themselves. It could well be that I misinterpreted the standards in which case
they will not be interoperable with the respective standards.
The find*() routines might look a bit funny at first sight. They are
primarily meant to do echo cancellation. A reasonably fast way to do this is to
pick the most energetic piece of the output sample, locate that in the input
sample and subtract the whole output sample from the input sample:
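import audioop

def echocancel(outputdata, inputdata):
    # a sketch for fragments of 2-byte samples; 800 samples is one
    # tenth of a second at 8 kHz (an assumed sampling rate)
    pos = audioop.findmax(outputdata, 800)
    out_test = outputdata[pos*2:]
    in_test = inputdata[pos*2:]
    ipos, factor = audioop.findfit(in_test, out_test)
    prefill = b'\0' * (pos + ipos) * 2
    postfill = b'\0' * (len(inputdata) - len(prefill) - len(outputdata))
    outputdata = prefill + audioop.mul(outputdata, 2, -factor) + postfill
    return audioop.add(inputdata, outputdata, 2)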
This module provides support for reading and writing AIFF and AIFF-C files.
AIFF is Audio Interchange File Format, a format for storing digital audio
samples in a file. AIFF-C is a newer version of the format that includes the
ability to compress the audio data.
Note
Some operations may only work under IRIX; these will raise ImportError
when attempting to import the cl module, which is only available on
IRIX.
Audio files have a number of parameters that describe the audio data. The
sampling rate or frame rate is the number of times per second the sound is
sampled. The number of channels indicate if the audio is mono, stereo, or
quadro. Each frame consists of one sample per channel. The sample size is the
size in bytes of each sample. Thus a frame consists of
nchannels * samplesize bytes, and a second’s worth of audio consists of
nchannels * samplesize * framerate bytes.
For example, CD quality audio has a sample size of two bytes (16 bits), uses two
channels (stereo) and has a frame rate of 44,100 frames/second. This gives a
frame size of 4 bytes (2*2), and a second’s worth occupies 2*2*44100 bytes
(176,400 bytes).
Open an AIFF or AIFF-C file and return an object instance with methods that are
described below. The argument file is either a string naming a file or a
file object. mode must be 'r' or 'rb' when the file must be
opened for reading, or 'w' or 'wb' when the file must be opened for writing.
If omitted, file.mode is used if it exists, otherwise 'rb' is used. When
used for writing, the file object should be seekable, unless you know ahead of
time how many samples you are going to write in total and use
writeframesraw() and setnframes().
Objects returned by open() when a file is opened for reading have the
following methods:
Return a bytes array convertible to a human-readable description
of the type of compression used in the audio file. For AIFF files,
the returned value is b'not compressed'.
Return a list of markers in the audio file. A marker consists of a tuple of
three elements. The first is the mark ID (an integer), the second is the mark
position in frames from the beginning of the data (an integer), the third is the
name of the mark (a string).
Read and return the next nframes frames from the audio file. The returned
data is a string containing for each frame the uncompressed samples of all
channels.
Close the AIFF file. After calling this method, the object can no longer be
used.
Objects returned by open() when a file is opened for writing have all the
above methods, except for readframes() and setpos(). In addition
the following methods exist. The get*() methods can only be called after
the corresponding set*() methods have been called. Before the first
writeframes() or writeframesraw(), all parameters except for the
number of frames must be filled in.
Create an AIFF file. The default is that an AIFF-C file is created, unless the
name of the file ends in '.aiff' in which case the default is an AIFF file.
Create an AIFF-C file. The default is that an AIFF-C file is created, unless
the name of the file ends in '.aiff' in which case the default is an AIFF
file.
Specify the number of frames that are to be written to the audio file. If this
parameter is not set, or not set correctly, the file needs to support seeking.
Specify the compression type. If not specified, the audio data will
not be compressed. In AIFF files, compression is not possible.
The name parameter should be a human-readable description of the
compression type as a bytes array, the type parameter should be a
bytes array of length 4. Currently the following compression types
are supported: b'NONE', b'ULAW', b'ALAW', b'G722'.
Set all the above parameters at once. The argument is a tuple consisting of the
various parameters. This means that it is possible to use the result of a
getparams() call as argument to setparams().
Like writeframes(), except that the header of the audio file is not
updated.
aifc.close()
Close the AIFF file. The header of the file is updated to reflect the actual
size of the audio data. After calling this method, the object can no longer be
used.
The sunau module provides a convenient interface to the Sun AU sound
format. Note that this module is interface-compatible with the modules
aifc and wave.
An audio file consists of a header followed by the data. The fields of the
header are:
Field            Contents
magic word       The four bytes .snd.
header size      Size of the header, including info, in bytes.
data size        Physical size of the data, in bytes.
encoding         Indicates how the audio samples are encoded.
sample rate      The sampling rate.
# of channels    The number of channels in the samples.
info             ASCII string giving a description of the audio file
                 (padded with null bytes).
Apart from the info field, all header fields are 4 bytes in size. They are all
32-bit unsigned integers encoded in big-endian byte order.
Reads and returns at most n frames of audio, as a string of bytes. The data
will be returned in linear format. If the original data is in u-LAW format, it
will be converted.
The wave module provides a convenient interface to the WAV sound format.
It does not support compression/decompression, but it does support mono/stereo.
The wave module defines the following function and exception:
If file is a string, open the file by that name, otherwise treat it as a
seekable file-like object. mode can be any of
'r', 'rb'
Read only mode.
'w', 'wb'
Write only mode.
Note that it does not allow read/write WAV files.
A mode of 'r' or 'rb' returns a Wave_read object, while a
mode of 'w' or 'wb' returns a Wave_write object. If
mode is omitted and a file-like object is passed as file, file.mode
is used as the default value for mode (the 'b' flag is still added if
necessary).
If you pass in a file-like object, the wave object will not close it when its
close() method is called; it is the caller’s responsibility to close
the file object.
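Reading the parameters and frames of a WAV file might look like this (a
sketch; the file name is hypothetical):

import wave

w = wave.open('example.wav', 'rb')
print(w.getnchannels(), w.getsampwidth(), w.getframerate())
frames = w.readframes(w.getnframes())   # the raw audio data as bytes
w.close()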
This module provides an interface for reading files that use EA IFF 85 chunks.
[1] This format is used in at least the Audio Interchange File Format
(AIFF/AIFF-C) and the Real Media File Format (RMFF). The WAVE audio file format
is closely related and can also be read using this module.
A chunk has the following structure:
Offset    Length    Contents
0         4         Chunk ID
4         4         Size of chunk in big-endian byte order, not including
                    the header
8         n         Data bytes, where n is the size given in the preceding
                    field
8 + n     0 or 1    Pad byte needed if n is odd and chunk alignment is used
The ID is a 4-byte string which identifies the type of chunk.
The size field (a 32-bit value, encoded using big-endian byte order) gives the
size of the chunk data, not including the 8-byte header.
Usually an IFF-type file consists of one or more chunks. The proposed usage of
the Chunk class defined here is to instantiate an instance at the start
of each chunk and read from the instance until it reaches the end, after which a
new instance can be instantiated. At the end of the file, creating a new
instance will fail with an EOFError exception.
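That usage might look like the following sketch (the file name is
hypothetical):

import chunk

with open('sample.aiff', 'rb') as f:
    while True:
        try:
            c = chunk.Chunk(f)
        except EOFError:
            break                        # no more chunks in the file
        print(c.getname(), c.getsize())  # the 4-byte ID and the data size
        c.skip()                         # position at the start of the next chunk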
class chunk.Chunk(file, align=True, bigendian=True, inclheader=False)
Class which represents a chunk. The file argument is expected to be a
file-like object. An instance of this class is specifically allowed as the
file argument (so chunks can be nested). The only method that is needed is
read(). If the methods seek() and
tell() are present and don’t raise an exception, they are also used.
If these methods are present and raise an exception, they are expected to not
have altered the object. If the optional argument align is true, chunks
are assumed to be aligned on 2-byte boundaries. If align is false, no
alignment is assumed. The default value is true. If the optional argument
bigendian is false, the chunk size is assumed to be in little-endian order.
This is needed for WAVE audio files. The default value is true. If the
optional argument inclheader is true, the size given in the chunk header
includes the size of the header. The default value is false.
Set the chunk’s current position. The whence argument is optional and
defaults to 0 (absolute file positioning); other values are 1
(seek relative to the current position) and 2 (seek relative to the
file’s end). There is no return value. If the underlying file does not
allow seek, only forward seeks are allowed.
Read at most size bytes from the chunk (less if the read hits the end of
the chunk before obtaining size bytes). If the size argument is
negative or omitted, read all data until the end of the chunk. The bytes
are returned as a string object. An empty string is returned when the end
of the chunk is encountered immediately.
Skip to the end of the chunk. All further calls to read() for the
chunk will return ''. If you are not interested in the contents of
the chunk, this method should be called so that the file points to the
start of the next chunk.
The colorsys module defines bidirectional conversions of color values
between colors expressed in the RGB (Red Green Blue) color space used in
computer monitors and three other coordinate systems: YIQ, HLS (Hue Lightness
Saturation) and HSV (Hue Saturation Value). Coordinates in all of these color
spaces are floating point values. In the YIQ space, the Y coordinate is between
0 and 1, but the I and Q coordinates can be positive or negative. In all other
spaces, the coordinates are all between 0 and 1.
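For example, converting between RGB and HSV:

>>> import colorsys
>>> colorsys.rgb_to_hsv(0.2, 0.4, 0.4)
(0.5, 0.5, 0.4)
>>> colorsys.hsv_to_rgb(0.5, 0.5, 0.4)
(0.2, 0.4, 0.4)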
Tests the image data contained in the file named by filename, and returns a
string describing the image type. If optional h is provided, the filename
is ignored and h is assumed to contain the byte stream to test.
The following image types are recognized, as listed below with the return value
from what():
Value     Image format
'rgb'     SGI ImgLib Files
'gif'     GIF 87a and 89a Files
'pbm'     Portable Bitmap Files
'pgm'     Portable Graymap Files
'ppm'     Portable Pixmap Files
'tiff'    TIFF Files
'rast'    Sun Raster Files
'xbm'     X Bitmap Files
'jpeg'    JPEG data in JFIF or Exif formats
'bmp'     BMP files
'png'     Portable Network Graphics
You can extend the list of file types imghdr can recognize by appending
to this variable:
A list of functions performing the individual tests. Each function takes two
arguments: the byte-stream and an open file-like object. When what() is
called with a byte-stream, the file-like object will be None.
The test function should return a string describing the image type if the test
succeeded, or None if it failed.
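For example, a sketch of a custom test for a made-up 'spam' image format (the name and magic bytes are hypothetical):

import imghdr

def test_spam(h, f):
    # h is the byte stream; f is an open file-like object or None
    if h.startswith(b'\x89SPAM'):
        return 'spam'

imghdr.tests.append(test_spam)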
The sndhdr module provides utility functions which attempt to determine the type
of sound data which is in a file. When these functions are able to determine
what type of sound data is stored in a file, they return a tuple
(type, sampling_rate, channels, frames, bits_per_sample). The value for type
indicates the data type and will be one of the strings 'aifc', 'aiff',
'au', 'hcom', 'sndr', 'sndt', 'voc', 'wav', '8svx',
'sb', 'ub', or 'ul'. The sampling_rate will be either the actual
value or 0 if unknown or difficult to decode. Similarly, channels will be
either the number of channels or 0 if it cannot be determined or if the
value is difficult to decode. The value for frames will be either the number
of frames or -1. The last item in the tuple, bits_per_sample, will either
be the sample size in bits or 'A' for A-LAW or 'U' for u-LAW.
Determines the type of sound data stored in the file filename using
whathdr(). If it succeeds, returns a tuple as described above, otherwise
None is returned.
Determines the type of sound data stored in a file based on the file header.
The name of the file is given by filename. This function returns a tuple as
described above on success, or None.
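For example (the file name and the returned values are illustrative):

>>> import sndhdr
>>> sndhdr.what('example.wav')
('wav', 44100, 2, -1, 16)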
ossaudiodev — Access to OSS-compatible audio devices
This module allows you to access the OSS (Open Sound System) audio interface.
OSS is available for a wide range of open-source and commercial Unices, and is
the standard audio interface for Linux and recent versions of FreeBSD.
This exception is raised on certain errors. The argument is a string describing
what went wrong.
(If ossaudiodev receives an error from a system call such as
open(), write(), or ioctl(), it raises IOError.
Errors detected directly by ossaudiodev result in OSSAudioError.)
(For backwards compatibility, the exception class is also available as
ossaudiodev.error.)
Open an audio device and return an OSS audio device object. This object
supports many file-like methods, such as read(), write(), and
fileno() (although there are subtle differences between conventional Unix
read/write semantics and those of OSS audio devices). It also supports a number
of audio-specific methods; see below for the complete list of methods.
device is the audio device filename to use. If it is not specified, this
module first looks in the environment variable AUDIODEV for a device
to use. If not found, it falls back to /dev/dsp.
mode is one of 'r' for read-only (record) access, 'w' for
write-only (playback) access and 'rw' for both. Since many sound cards
only allow one process to have the recorder or player open at a time, it is a
good idea to open the device only for the activity needed. Further, some
sound cards are half-duplex: they can be opened for reading or writing, but
not both at once.
Note the unusual calling syntax: the first argument is optional, and the
second is required. This is a historical artifact for compatibility with the
older linuxaudiodev module which ossaudiodev supersedes.
Open a mixer device and return an OSS mixer device object. device is the
mixer device filename to use. If it is not specified, this module first looks
in the environment variable MIXERDEV for a device to use. If not
found, it falls back to /dev/mixer.
Before you can write to or read from an audio device, you must call three
methods in the correct order:
setfmt() to set the output format
channels() to set the number of channels
speed() to set the sample rate
Alternately, you can use the setparameters() method to set all three audio
parameters at once. This is more convenient, but may not be as flexible in all
cases.
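A sketch of the required call order for CD-quality playback; the single setparameters() call shown in the comment is equivalent:

import ossaudiodev

dsp = ossaudiodev.open('w')               # default device, playback
dsp.setfmt(ossaudiodev.AFMT_S16_LE)       # 1. sample format
dsp.channels(2)                           # 2. number of channels
dsp.speed(44100)                          # 3. sampling rate
# equivalently: dsp.setparameters(ossaudiodev.AFMT_S16_LE, 2, 44100)
# ... write audio data with dsp.write(data) ...
dsp.close()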
The audio device objects returned by open() define the following methods
and (read-only) attributes:
Explicitly close the audio device. When you are done writing to or reading from
an audio device, you should explicitly close it. A closed device cannot be used
again.
Read size bytes from the audio input and return them as a Python string.
Unlike most Unix device drivers, OSS audio devices in blocking mode (the
default) will block read() until the entire requested amount of data is
available.
Write the Python string data to the audio device and return the number of
bytes written. If the audio device is in blocking mode (the default), the
entire string is always written (again, this is different from usual Unix device
semantics). If the device is in non-blocking mode, some data may not be written
—see writeall().
Write the entire Python string data to the audio device: waits until the audio
device is able to accept data, writes as much data as it will accept, and
repeats until data has been completely written. If the device is in blocking
mode (the default), this has the same effect as write(); writeall()
is only useful in non-blocking mode. Has no return value, since the amount of
data written is always equal to the amount of data supplied.
Changed in version 3.2: Audio device objects also support the context manager protocol, i.e. they can
be used in a with statement.
The following methods each map to exactly one ioctl() system call. The
correspondence is obvious: for example, setfmt() corresponds to the
SNDCTL_DSP_SETFMT ioctl, and sync() to SNDCTL_DSP_SYNC (this can
be useful when consulting the OSS documentation). If the underlying
ioctl() fails, they all raise IOError.
Return a bitmask of the audio output formats supported by the soundcard. Some
of the formats supported by OSS are:
Format          Description
AFMT_MU_LAW     a logarithmic encoding (used by Sun .au files and /dev/audio)
AFMT_A_LAW      a logarithmic encoding
AFMT_IMA_ADPCM  a 4:1 compressed format defined by the Interactive Multimedia Association
AFMT_U8         Unsigned, 8-bit audio
AFMT_S16_LE     Signed, 16-bit audio, little-endian byte order (as used by Intel processors)
AFMT_S16_BE     Signed, 16-bit audio, big-endian byte order (as used by 68k, PowerPC, Sparc)
AFMT_S8         Signed, 8 bit audio
AFMT_U16_LE     Unsigned, 16-bit little-endian audio
AFMT_U16_BE     Unsigned, 16-bit big-endian audio
Consult the OSS documentation for a full list of audio formats, and note that
most devices support only a subset of these formats. Some older devices only
support AFMT_U8; the most common format used today is
AFMT_S16_LE.
Try to set the current audio format to format—see getfmts() for a
list. Returns the audio format that the device was set to, which may not be the
requested format. May also be used to return the current audio format—do this
by passing an “audio format” of AFMT_QUERY.
Set the number of output channels to nchannels. A value of 1 indicates
monophonic sound, 2 stereophonic. Some devices may have more than 2 channels,
and some high-end devices may not support mono. Returns the number of channels
the device was set to.
Try to set the audio sampling rate to samplerate samples per second. Returns
the rate actually set. Most sound devices don’t support arbitrary sampling
rates. Common rates are:
Rate    Description
8000    default rate for /dev/audio
11025   speech recording
22050
44100   CD quality audio (at 16 bits/sample and 2 channels)
Wait until the sound device has played every byte in its buffer. (This happens
implicitly when the device is closed.) The OSS documentation recommends closing
and re-opening the device rather than using sync().
Immediately stop playing or recording and return the device to a state where it
can accept commands. The OSS documentation recommends closing and re-opening
the device after calling reset().
Tell the driver that there is likely to be a pause in the output, making it
possible for the device to handle the pause more intelligently. You might use
this after playing a spot sound effect, before waiting for user input, or before
doing disk I/O.
The following convenience methods combine several ioctls, or one ioctl and some
simple calculations.
Set the key audio sampling parameters—sample format, number of channels, and
sampling rate—in one method call. format, nchannels, and samplerate
should be as specified in the setfmt(), channels(), and
speed() methods. If strict is true, setparameters() checks to
see if each parameter was actually set to the requested value, and raises
OSSAudioError if not. Returns a tuple (format, nchannels,
samplerate) indicating the parameter values that were actually set by the
device driver (i.e., the same as the return values of setfmt(),
channels(), and speed()).
This method returns a bitmask specifying the available mixer controls (“Control”
being a specific mixable “channel”, such as SOUND_MIXER_PCM or
SOUND_MIXER_SYNTH). This bitmask indicates a subset of all available
mixer controls—the SOUND_MIXER_* constants defined at module level.
To determine if, for example, the current mixer object supports a PCM mixer, use
the following Python code:
mixer = ossaudiodev.openmixer()
if mixer.controls() & (1 << ossaudiodev.SOUND_MIXER_PCM):
    # PCM is supported
    ... code ...
For most purposes, the SOUND_MIXER_VOLUME (master volume) and
SOUND_MIXER_PCM controls should suffice—but code that uses the mixer
should be flexible when it comes to choosing mixer controls. On the Gravis
Ultrasound, for example, SOUND_MIXER_VOLUME does not exist.
Returns a bitmask indicating stereo mixer controls. If a bit is set, the
corresponding control is stereo; if it is unset, the control is either
monophonic or not supported by the mixer (use in combination with
controls() to determine which).
See the code example for the controls() function for an example of getting
data from a bitmask.
Returns a bitmask specifying the mixer controls that may be used to record. See
the code example for controls() for an example of reading from a bitmask.
Returns the volume of a given mixer control. The returned volume is a 2-tuple
(left_volume,right_volume). Volumes are specified as numbers from 0
(silent) to 100 (full volume). If the control is monophonic, a 2-tuple is still
returned, but both volumes are the same.
Raises OSSAudioError if an invalid control is specified, or
IOError if an unsupported control is specified.
Sets the volume for a given mixer control to (left,right). left and
right must be ints and between 0 (silent) and 100 (full volume). On
success, the new volume is returned as a 2-tuple. Note that this may not be
exactly the same as the volume specified, because of the limited resolution of
some soundcard’s mixers.
Raises OSSAudioError if an invalid mixer control was specified, or if the
specified volumes were out-of-range.
Call this function to specify a recording source. Returns a bitmask indicating
the new recording source (or sources) if successful; raises IOError if an
invalid source was specified. To set the current recording source to the
microphone input:
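mixer.setrecsrc(1 << ossaudiodev.SOUND_MIXER_MIC)   # 'mixer' as returned by openmixer()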
The modules described in this chapter help you write software that is
independent of language and locale by providing mechanisms for selecting a
language to be used in program messages or by tailoring output to match local
conventions.
The gettext module provides internationalization (I18N) and localization
(L10N) services for your Python modules and applications. It supports both the
GNU gettext message catalog API and a higher level, class-based API that may
be more appropriate for Python files. The interface described below allows you
to write your module and application messages in one natural language, and
provide a catalog of translated messages for running under different natural
languages.
Some hints on localizing your Python modules and applications are also given.
The gettext module defines the following API, which is very similar to
the GNU gettext API. If you use this API you will affect the
translation of your entire application globally. Often this is what you want if
your application is monolingual, with the choice of language dependent on the
locale of your user. If you are localizing a Python module, or if your
application needs to switch languages on the fly, you probably want to use the
class-based API instead.
Bind the domain to the locale directory localedir. More concretely,
gettext will look for binary .mo files for the given domain using
the path (on Unix): localedir/language/LC_MESSAGES/domain.mo, where
language is searched for in the environment variables LANGUAGE,
LC_ALL, LC_MESSAGES, and LANG respectively.
If localedir is omitted or None, then the current binding for domain is
returned. [1]
Bind the domain to codeset, changing the encoding of strings returned by the
gettext() family of functions. If codeset is omitted, then the current
binding is returned.
Change or query the current global domain. If domain is None, then the
current global domain is returned, otherwise the global domain is set to
domain, which is returned.
Return the localized translation of message, based on the current global
domain, language, and locale directory. This function is usually aliased as
_() in the local namespace (see examples below).
Equivalent to gettext(), but the translation is returned in the
preferred system encoding, if no other encoding was explicitly set with
bind_textdomain_codeset().
Equivalent to dgettext(), but the translation is returned in the
preferred system encoding, if no other encoding was explicitly set with
bind_textdomain_codeset().
Like gettext(), but consider plural forms. If a translation is found,
apply the plural formula to n, and return the resulting message (some
languages have more than two plural forms). If no translation is found, return
singular if n is 1; return plural otherwise.
The Plural formula is taken from the catalog header. It is a C or Python
expression that has a free variable n; the expression evaluates to the index
of the plural in the catalog. See the GNU gettext documentation for the precise
syntax to be used in .po files and the formulas for a variety of
languages.
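For example, a catalog for a language with two plural forms (such as English) typically carries a header entry like this:

Plural-Forms: nplurals=2; plural=(n != 1);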
Equivalent to ngettext(), but the translation is returned in the
preferred system encoding, if no other encoding was explicitly set with
bind_textdomain_codeset().
Equivalent to dngettext(), but the translation is returned in the
preferred system encoding, if no other encoding was explicitly set with
bind_textdomain_codeset().
Note that GNU gettext also defines a dcgettext() method, but
this was deemed not useful and so it is currently unimplemented.
Here’s an example of typical usage for this API:
import gettext
gettext.bindtextdomain('myapplication', '/path/to/my/language/directory')
gettext.textdomain('myapplication')
_ = gettext.gettext
# ...
print(_('This is a translatable string.'))
The class-based API of the gettext module gives you more flexibility and
greater convenience than the GNU gettext API. It is the recommended
way of localizing your Python applications and modules. gettext defines
a “translations” class which implements the parsing of GNU .mo format
files, and has methods for returning strings. Instances of this “translations”
class can also install themselves in the built-in namespace as the function
_().
This function implements the standard .mo file search algorithm. It
takes a domain, identical to what textdomain() takes. Optional
localedir is as in bindtextdomain(). Optional languages is a list of
strings, where each string is a language code.
If localedir is not given, then the default system locale directory is used.
[2] If languages is not given, then the following environment variables are
searched: LANGUAGE, LC_ALL, LC_MESSAGES, and
LANG. The first one returning a non-empty value is used for the
languages variable. The environment variables should contain a colon separated
list of languages, which will be split on the colon to produce the expected list
of language code strings.
find() then expands and normalizes the languages, and then iterates
through them, searching for an existing file built of these components:
localedir/language/LC_MESSAGES/domain.mo
The first such file name that exists is returned by find(). If no such
file is found, then None is returned. If all is given, it returns a list
of all file names, in the order in which they appear in the languages list or
the environment variables.
Return a Translations instance based on the domain, localedir,
and languages, which are first passed to find() to get a list of the
associated .mo file paths. Instances with identical .mo file
names are cached. The actual class instantiated is either class_ if
provided, otherwise GNUTranslations. The class’s constructor must
take a single file object argument. If provided, codeset will change
the charset used to encode translated strings in the lgettext() and
lngettext() methods.
If multiple files are found, later files are used as fallbacks for earlier ones.
To allow setting the fallback, copy.copy() is used to clone each
translation object from the cache; the actual instance data is still shared with
the cache.
If no .mo file is found, this function raises IOError if
fallback is false (which is the default), and returns a
NullTranslations instance if fallback is true.
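A sketch of typical use, assuming a hypothetical domain 'myapplication' with catalogs under /usr/share/locale:

import gettext

t = gettext.translation('myapplication', '/usr/share/locale',
                        languages=['de'], fallback=True)
_ = t.gettext        # never raises IOError, thanks to fallback=True
print(_('This is a translatable string.'))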
This installs the function _() in Python’s builtins namespace, based on
domain, localedir, and codeset which are passed to the function
translation().
For the names parameter, please see the description of the translation
object’s install() method.
As seen below, you usually mark the strings in your application that are
candidates for translation, by wrapping them in a call to the _()
function, like this:
print(_('This string will be translated.'))
For convenience, you want the _() function to be installed in Python’s
builtins namespace, so it is easily accessible in all modules of your
application.
Translation classes are what actually implement the translation of original
source file message strings to translated message strings. The base class used
by all translation classes is NullTranslations; this provides the basic
interface you can use to write your own specialized translation classes. Here
are the methods of NullTranslations:
Takes an optional file object fp, which is ignored by the base class.
Initializes “protected” instance variables _info and _charset which are set
by derived classes, as well as _fallback, which is set through
add_fallback(). It then calls self._parse(fp) if fp is not
None.
No-op’d in the base class, this method takes file object fp, and reads
the data from the file, initializing its message catalog. If you have an
unsupported message catalog file format, you should override this method
to parse your format.
Add fallback as the fallback object for the current translation object.
A translation object should consult the fallback if it cannot provide a
translation for a given message.
This method installs self.gettext() into the built-in namespace,
binding it to _.
If the names parameter is given, it must be a sequence containing the
names of functions you want to install in the builtins namespace in
addition to _(). Supported names are 'gettext' (bound to
self.gettext()), 'ngettext' (bound to self.ngettext()),
'lgettext' and 'lngettext'.
Note that this is only one way, albeit the most convenient way, to make
the _() function available to your application. Because it affects
the entire application globally, and specifically the built-in namespace,
localized modules should never install _(). Instead, they should use
this code to make _() available to their module:
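import gettext
t = gettext.translation('mymodule', '/path/to/locale/dir')   # hypothetical localedir
_ = t.gettext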
The gettext module provides one additional class derived from
NullTranslations: GNUTranslations. This class overrides
_parse() to enable reading GNU gettext format .mo files
in both big-endian and little-endian format.
GNUTranslations parses optional meta-data out of the translation
catalog. It is convention with GNU gettext to include meta-data as
the translation for the empty string. This meta-data is in RFC 822-style
key:value pairs, and should contain the Project-Id-Version key. If the
key Content-Type is found, then the charset property is used to
initialize the “protected” _charset instance variable, defaulting to
None if not found. If the charset encoding is specified, then all message
ids and message strings read from the catalog are converted to Unicode using
this encoding, else ASCII encoding is assumed.
Since message ids are read as Unicode strings too, all *gettext() methods
will assume message ids as Unicode strings, not byte strings.
The entire set of key/value pairs are placed into a dictionary and set as the
“protected” _info instance variable.
If the .mo file’s magic number is invalid, or if other problems occur
while reading the file, instantiating a GNUTranslations class can raise
IOError.
The following methods are overridden from the base class implementation:
Look up the message id in the catalog and return the corresponding message
string, as a Unicode string. If there is no entry in the catalog for the
message id, and a fallback has been set, the look up is forwarded to the
fallback’s gettext() method. Otherwise, the message id is returned.
Equivalent to gettext(), but the translation is returned as a
bytestring encoded in the selected output charset, or in the preferred system
encoding if no encoding was explicitly set with set_output_charset().
Do a plural-forms lookup of a message id. singular is used as the message id
for purposes of lookup in the catalog, while n is used to determine which
plural form to use. The returned message string is a Unicode string.
If the message id is not found in the catalog, and a fallback is specified, the
request is forwarded to the fallback’s ngettext() method. Otherwise, when
n is 1 singular is returned, and plural is returned in all other cases.
Here is an example:
n = len(os.listdir('.'))
cat = GNUTranslations(somefile)
message = cat.ngettext(
    'There is %(num)d file in this directory',
    'There are %(num)d files in this directory',
    n) % {'num': n}
Equivalent to gettext(), but the translation is returned as a
bytestring encoded in the selected output charset, or in the preferred system
encoding if no encoding was explicitly set with set_output_charset().
The Solaris operating system defines its own binary .mo file format, but
since no documentation can be found on this format, it is not supported at this
time.
For compatibility with this older module, the function Catalog() is an
alias for the translation() function described above.
One difference between this module and Henstridge’s: his catalog objects
supported access through a mapping API, but this appears to be unused and so is
not currently supported.
Internationalization (I18N) refers to the operation by which a program is made
aware of multiple languages. Localization (L10N) refers to the adaptation of
your program, once internationalized, to the local language and cultural habits.
In order to provide multilingual messages for your Python programs, you need to
take the following steps:
prepare your program or module by specially marking translatable strings
run a suite of tools over your marked files to generate raw messages catalogs
create language specific translations of the message catalogs
use the gettext module so that message strings are properly translated
In order to prepare your code for I18N, you need to look at all the strings in
your files. Any string that needs to be translated should be marked by wrapping
it in _('...') — that is, a call to the function _(). For example:
filename = 'mylog.txt'
message = _('writing a log message')
fp = open(filename, 'w')
fp.write(message)
fp.close()
In this example, the string 'writing a log message' is marked as a candidate
for translation, while the strings 'mylog.txt' and 'w' are not.
The Python distribution comes with two tools which help you generate the message
catalogs once you’ve prepared your source code. These may or may not be
available from a binary distribution, but they can be found in a source
distribution, in the Tools/i18n directory.
The pygettext [3] program scans all your Python source code looking
for the strings you previously marked as translatable. It is similar to the GNU
gettext program except that it understands all the intricacies of
Python source code, but knows nothing about C or C++ source code. You don’t
need GNU gettext unless you’re also going to be translating C code (such as
C extension modules).
pygettext generates textual Uniforum-style human readable message
catalog .pot files, essentially structured human readable files which
contain every marked string in the source code, along with a placeholder for the
translation strings. pygettext is a command line script that supports
a similar command line interface as xgettext; for details on its use,
run:
pygettext.py --help
Copies of these .pot files are then handed over to the individual human
translators who write language-specific versions for every supported natural
language. They send you back the filled in language-specific versions as a
.po file. Using the msgfmt.py [4] program (in the
Tools/i18n directory), you take the .po files from your
translators and generate the machine-readable .mo binary catalog files.
The .mo files are what the gettext module uses for the actual
translation processing during run-time.
How you use the gettext module in your code depends on whether you are
internationalizing a single module or your entire application. The next two
sections will discuss each case.
If you are localizing your module, you must take care not to make global
changes, e.g. to the built-in namespace. You should not use the GNU gettext
API but instead the class-based API.
Let’s say your module is called “spam” and the module’s various natural language
translation .mo files reside in /usr/share/locale in GNU
gettext format. Here’s what you would put at the top of your
module:
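import gettext
t = gettext.translation('spam', '/usr/share/locale')
_ = t.gettext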
If you are localizing your application, you can install the _() function
globally into the built-in namespace, usually in the main driver file of your
application. This will let all your application-specific files just use
_('...') without having to explicitly install it in each file.
In the simple case then, you need only add the following bit of code to the main
driver file of your application:
import gettext
gettext.install('myapplication')
If you need to set the locale directory, you can pass these into the
install() function:
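import gettext
gettext.install('myapplication', '/usr/share/locale')   # '/usr/share/locale' is an example path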
If your program needs to support many languages at the same time, you may want
to create multiple translation instances and then switch between them
explicitly, like so:
import gettext

lang1 = gettext.translation('myapplication', languages=['en'])
lang2 = gettext.translation('myapplication', languages=['fr'])
lang3 = gettext.translation('myapplication', languages=['de'])

# start by using language1
lang1.install()

# ... time goes by, user selects language 2
lang2.install()

# ... more time goes by, user selects language 3
lang3.install()
In most coding situations, strings are translated where they are coded.
Occasionally however, you need to mark strings for translation, but defer actual
translation until later. A classic example is:
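def _(message): return message

animals = [_('mollusk'),
           _('albatross'),
           _('rat'),
           _('penguin'),
           _('python'), ]

del _

# ...

for a in animals:
    print(_(a))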
This works because the dummy definition of _() simply returns the string
unchanged. And this dummy definition will temporarily override any definition
of _() in the built-in namespace (until the del command). Take
care, though if you have a previous definition of _() in the local
namespace.
Note that the second use of _() will not identify “a” as being
translatable to the pygettext program, since it is not a string.
Another way to handle this is with the following example:
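def N_(message): return message

animals = [N_('mollusk'),
           N_('albatross'),
           N_('rat'),
           N_('penguin'),
           N_('python'), ]

# ...

for a in animals:
    print(_(a))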
In this case, you are marking translatable strings with the function N_(),
[5] which won’t conflict with any definition of _(). However, you will
need to teach your message extraction program to look for translatable strings
marked with N_(). pygettext and xpot both support
this through the use of command line switches.
The default locale directory is system dependent; for example, on RedHat Linux
it is /usr/share/locale, but on Solaris it is /usr/lib/locale.
The gettext module does not try to support these system dependent
defaults; instead its default is sys.prefix/share/locale. For this
reason, it is always best to call bindtextdomain() with an explicit
absolute path at the start of your application.
François Pinard has written a program called xpot which does a
similar job. It is available as part of his po-utils package at
http://po-utils.progiciels-bpi.ca/.
msgfmt.py is binary compatible with GNU msgfmt except that
it provides a simpler, all-Python implementation. With this and
pygettext.py, you generally won’t need to install the GNU
gettext package to internationalize your Python applications.
The locale module opens access to the POSIX locale database and
functionality. The POSIX locale mechanism allows programmers to deal with
certain cultural issues in an application, without requiring the programmer to
know all the specifics of each country where the software is executed.
The locale module is implemented on top of the _locale module,
which in turn uses an ANSI C locale implementation if available.
The locale module defines the following exception and functions:
If locale is specified, it may be a string, a tuple of the form
(language code, encoding), or None. If it is a tuple, it is converted to a string
using the locale aliasing engine. If locale is given and not None,
setlocale() modifies the locale setting for the category. The available
categories are listed in the data description below. The value is the name of a
locale. An empty string specifies the user’s default settings. If the
modification of the locale fails, the exception Error is raised. If
successful, the new locale setting is returned.
If locale is omitted or None, the current setting for category is
returned.
setlocale() is not thread-safe on most systems. Applications typically
start with a call of
import locale
locale.setlocale(locale.LC_ALL, '')
This sets the locale for all categories to the user’s default setting (typically
specified in the LANG environment variable). If the locale is not
changed thereafter, using multithreading should not cause problems.
The 'grouping' entry returned by localeconv() is a sequence of numbers
specifying the relative positions in which the 'thousands_sep' is expected.
If the sequence is terminated with CHAR_MAX, no further grouping is
performed. If the sequence terminates with a 0, the last group size is
repeatedly used.
Return some locale-specific information as a string. This function is not
available on all systems, and the set of possible options might also vary
across platforms. The possible argument values are numbers, for which
symbolic constants are available in the locale module.
The nl_langinfo() function accepts one of the following keys. Most
descriptions are taken from the corresponding description in the GNU C
library.
Get the currency symbol, preceded by “-” if the symbol should appear before
the value, “+” if the symbol should appear after the value, or ”.” if the
symbol should replace the radix character.
Get a string that represents the era used in the current locale.
Most locales do not define this value. An example of a locale which does
define this value is the Japanese one. In Japan, the traditional
representation of dates includes the name of the era corresponding to the
then-emperor’s reign.
Normally it should not be necessary to use this value directly. Specifying
the E modifier in their format strings causes the strftime()
function to use this information. The format of the returned string is not
specified, and therefore you should not assume knowledge of it on different
systems.
Tries to determine the default locale settings and returns them as a tuple of
the form (language code, encoding).
According to POSIX, a program which has not called setlocale(LC_ALL,'')
runs using the portable 'C' locale. Calling setlocale(LC_ALL,'') lets
it use the default locale as defined by the LANG variable. Since we
do not want to interfere with the current locale setting we thus emulate the
behavior in the way described above.
To maintain compatibility with other platforms, not only the LANG
variable is tested, but a list of variables given as envvars parameter. The
first found to be defined will be used. envvars defaults to the search
path used in GNU gettext; it must always contain the variable name
'LANG'. The GNU gettext search path contains 'LC_ALL',
'LC_CTYPE', 'LANG' and 'LANGUAGE', in that order.
Except for the code 'C', the language code corresponds to RFC 1766.
language code and encoding may be None if their values cannot be
determined.
Returns the current setting for the given locale category as a sequence containing
language code, encoding. category may be one of the LC_* values
except LC_ALL. It defaults to LC_CTYPE.
Except for the code 'C', the language code corresponds to RFC 1766.
language code and encoding may be None if their values cannot be
determined.
Return the encoding used for text data, according to user preferences. User
preferences are expressed differently on different systems, and might not be
available programmatically on some systems, so this function only returns a
guess.
On some systems, it is necessary to invoke setlocale() to obtain the user
preferences, so this function is not thread-safe. If invoking setlocale is not
necessary or desired, do_setlocale should be set to False.
Returns a normalized locale code for the given locale name. The returned locale
code is formatted for use with setlocale(). If normalization fails, the
original name is returned unchanged.
If the given encoding is not known, the function defaults to the default
encoding for the locale code just like setlocale().
Compares two strings according to the current LC_COLLATE setting. As
with any other compare function, returns a negative value, a positive value, or
0, depending on whether string1 collates before or after string2 or is equal
to it.
Transforms a string to one that can be used in locale-aware
comparisons. For example, strxfrm(s1) < strxfrm(s2) is
equivalent to strcoll(s1, s2) < 0. This function can be used
when the same string is compared repeatedly, e.g. when collating a
sequence of strings.
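For example, a sketch of sorting a list with strxfrm() as the key function (assumes a suitable LC_COLLATE locale can be set):

import locale

locale.setlocale(locale.LC_COLLATE, '')   # use the user's collation rules
words = ['peach', 'Päckchen', 'pear']     # hypothetical word list
words.sort(key=locale.strxfrm)            # locale-aware ordering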
Formats a number val according to the current LC_NUMERIC setting.
The format follows the conventions of the % operator. For floating point
values, the decimal point is modified if appropriate. If grouping is true,
also takes the grouping into account.
If monetary is true, the conversion uses monetary thousands separator and
grouping strings.
Please note that this function will only work for exactly one %char specifier.
For whole format strings, use format_string().
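For example, in a locale that groups by thousands with ',' as the separator (the locale name and output shown are illustrative):

>>> import locale
>>> locale.setlocale(locale.LC_NUMERIC, 'en_US.UTF-8')
'en_US.UTF-8'
>>> locale.format('%.2f', 1234567.89, grouping=True)
'1,234,567.89'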
Formats a number val according to the current LC_MONETARY settings.
The returned string includes the currency symbol if symbol is true, which is
the default. If grouping is true (which is not the default), grouping is done
with the value. If international is true (which is not the default), the
international currency symbol is used.
Note that this function will not work with the ‘C’ locale, so you have to set a
locale via setlocale() first.
Locale category for the character type functions. Depending on the settings of
this category, the functions of module string dealing with case change
their behaviour.
Locale category for message display. Python currently does not support
application specific locale-aware messages. Messages displayed by the operating
system, like those returned by os.strerror() might be affected by this
category.
Locale category for formatting numbers. The functions format(),
atoi(), atof() and str() of the locale module are
affected by that category. All other numeric formatting operations are not
affected.
Combination of all locale settings. If this flag is used when the locale is
changed, setting the locale for all categories is attempted. If that fails for
any category, no category is changed at all. When the locale is retrieved using
this flag, a string indicating the setting for all categories is returned. This
string can be later used to restore the settings.
This is a symbolic constant used for different values returned by
localeconv().
Example:
>>> import locale
>>> loc = locale.getlocale()  # get current locale
# use German locale; name might vary with platform
>>> locale.setlocale(locale.LC_ALL, 'de_DE')
>>> locale.strcoll('f\xe4n', 'foo')  # compare a string containing an umlaut
>>> locale.setlocale(locale.LC_ALL, '')  # use user's preferred locale
>>> locale.setlocale(locale.LC_ALL, 'C')  # use default (C) locale
>>> locale.setlocale(locale.LC_ALL, loc)  # restore saved locale
The C standard defines the locale as a program-wide property that may be
relatively expensive to change. On top of that, some implementations are broken
in such a way that frequent locale changes may cause core dumps. This makes the
locale somewhat painful to use correctly.
Initially, when a program is started, the locale is the C locale, no matter
what the user’s preferred locale is. The program must explicitly say that it
wants the user’s preferred locale settings by calling setlocale(LC_ALL,'').
It is generally a bad idea to call setlocale() in some library routine,
since as a side effect it affects the entire program. Saving and restoring it
is almost as bad: it is expensive and affects other threads that happen to run
before the settings have been restored.
If, when coding a module for general use, you need a locale independent version
of an operation that is affected by the locale (such as
certain formats used with time.strftime()), you will have to find a way to
do it without using the standard library routine. Even better is convincing
yourself that using locale settings is okay. Only as a last resort should you
document that your module is not compatible with non-C locale settings.
The only way to perform numeric operations according to the locale is to use the
special functions defined by this module: atof(), atoi(),
format(), str().
There is no way to perform case conversions and character classifications
according to the locale. For (Unicode) text strings these are done according
to the character value only, while for byte strings, the conversions and
classifications are done according to the ASCII value of the byte, and bytes
whose high bit is set (i.e., non-ASCII bytes) are never converted or considered
part of a character class such as letter or whitespace.
For extension writers and programs that embed Python
Extension modules should never call setlocale(), except to find out what
the current locale is. But since the return value can only be used portably to
restore it, that is not very useful (except perhaps to find out whether or not
the locale is C).
When Python code uses the locale module to change the locale, this also
affects the embedding application. If the embedding application doesn’t want
this to happen, it should remove the _locale extension module (which does
all the work) from the table of built-in modules in the config.c file,
and make sure that the _locale module is not accessible as a shared
library.
The locale module exposes the C library’s gettext interface on systems that
provide this interface. It consists of the functions gettext(),
dgettext(), dcgettext(), textdomain(), bindtextdomain(),
and bind_textdomain_codeset(). These are similar to the same functions in
the gettext module, but use the C library’s binary format for message
catalogs, and the C library’s search algorithms for locating message catalogs.
Python applications should normally find no need to invoke these functions, and
should use gettext instead. A known exception to this rule are
applications that link with additional C libraries which internally invoke
gettext() or dcgettext(). For these applications, it may be
necessary to bind the text domain, so that the libraries can properly locate
their message catalogs.
The modules described in this chapter are frameworks that will largely dictate
the structure of your program. Currently the modules described here are all
oriented toward writing command-line interfaces.
The full list of modules described in this chapter is:
Turtle graphics is a popular way for introducing programming to kids. It was
part of the original Logo programming language developed by Wally Feurzig and
Seymour Papert in 1966.
Imagine a robotic turtle starting at (0, 0) in the x-y plane. After an import turtle, give it the
command turtle.forward(15), and it moves (on-screen!) 15 pixels in the
direction it is facing, drawing a line as it moves. Give it the command
turtle.right(25), and it rotates in-place 25 degrees clockwise.
Turtle star
Turtle can draw intricate shapes using programs that repeat simple
moves.
By combining together these and similar commands, intricate shapes and pictures
can easily be drawn.
The turtle module is an extended reimplementation of the same-named
module from the Python standard distribution up to version Python 2.5.
It tries to keep the merits of the old turtle module and to be (nearly) 100%
compatible with it. This means in the first place to enable the learning
programmer to use all the commands, classes and methods interactively when using
the module from within IDLE run with the -n switch.
The turtle module provides turtle graphics primitives, in both object-oriented
and procedure-oriented ways. Because it uses tkinter for the underlying
graphics, it needs a version of Python installed with Tk support.
The object-oriented interface uses essentially two+two classes:
The TurtleScreen class defines graphics windows as a playground for
the drawing turtles. Its constructor needs a tkinter.Canvas or a
ScrolledCanvas as argument. It should be used when turtle is
used as part of some application.
The function Screen() returns a singleton object of a
TurtleScreen subclass. This function should be used when
turtle is used as a standalone tool for doing graphics.
As a singleton object, inheriting from its class is not possible.
All methods of TurtleScreen/Screen also exist as functions, i.e. as part of
the procedure-oriented interface.
RawTurtle (alias: RawPen) defines Turtle objects which draw
on a TurtleScreen. Its constructor needs a Canvas, ScrolledCanvas
or TurtleScreen as argument, so the RawTurtle objects know where to draw.
Derived from RawTurtle is the subclass Turtle (alias: Pen),
which draws on “the” Screen instance which is automatically
created, if not already present.
All methods of RawTurtle/Turtle also exist as functions, i.e. part of the
procedure-oriented interface.
The procedural interface provides functions which are derived from the methods
of the classes Screen and Turtle. They have the same names as
the corresponding methods. A screen object is automatically created whenever a
function derived from a Screen method is called. An (unnamed) turtle object is
automatically created whenever any of the functions derived from a Turtle method
is called.
To use multiple turtles on a screen one has to use the object-oriented interface.
Note
In the following documentation the argument list for functions is given.
Methods, of course, have the additional first argument self which is
omitted here.
Turn turtle right by angle units. (Units are by default degrees, but
can be set via the degrees() and radians() functions.) Angle
orientation depends on the turtle mode, see mode().
Turn turtle left by angle units. (Units are by default degrees, but
can be set via the degrees() and radians() functions.) Angle
orientation depends on the turtle mode, see mode().
Draw a circle with given radius. The center is radius units left of
the turtle; extent – an angle – determines which part of the circle
is drawn. If extent is not given, draw the entire circle. If extent
is not a full circle, one endpoint of the arc is the current pen
position. Draw the arc in counterclockwise direction if radius is
positive, otherwise in clockwise direction. Finally the direction of the
turtle is changed by the amount of extent.
As the circle is approximated by an inscribed regular polygon, steps
determines the number of steps to use. If not given, it will be
calculated automatically. May be used to draw regular polygons.
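For example (a sketch; the turtle starts at the home position):

>>> turtle.home()
>>> turtle.circle(50)          # draw a full circle with radius 50
>>> turtle.circle(120, 180)    # draw a semicircle with radius 120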
Stamp a copy of the turtle shape onto the canvas at the current turtle
position. Return a stamp_id for that stamp, which can be used to delete
it by calling clearstamp(stamp_id).
x – a number or a pair/vector of numbers or a turtle instance
y – a number if x is a number, else None
Return the angle between the line from turtle position to position specified
by (x,y), the vector or the other turtle. This depends on the turtle’s start
orientation, which depends on the mode - “standard”/“world” or “logo”.
Set angle measurement units, i.e. set number of “degrees” for a full circle.
Default value is 360 degrees.
>>> turtle.home()
>>> turtle.left(90)
>>> turtle.heading()
90.0

Change angle measurement unit to grad (also known as gon, grade, or gradian; equals 1/100th of the right angle):

>>> turtle.degrees(400.0)
>>> turtle.heading()
100.0
>>> turtle.degrees(360)
>>> turtle.heading()
90.0
Set the line thickness to width or return it. If resizemode is set to
“auto” and turtleshape is a polygon, that polygon is drawn with the same line
thickness. If no argument is given, the current pensize is returned.
>>> turtle.pensize()
1
>>> turtle.pensize(10)   # from here on lines of width 10 are drawn
This dictionary can be used as argument for a subsequent call to pen()
to restore the former pen-state. Moreover one or more of these attributes
can be provided as keyword-arguments. This can be used to set several pen
attributes in one statement.
Return the current pencolor as color specification string or
as a tuple (see example). May be used as input to another
color/pencolor/fillcolor call.
pencolor(colorstring)
Set pencolor to colorstring, which is a Tk color specification string,
such as "red", "yellow", or "#33cc8c".
pencolor((r,g,b))
Set pencolor to the RGB color represented by the tuple of r, g, and
b. Each of r, g, and b must be in the range 0..colormode, where
colormode is either 1.0 or 255 (see colormode()).
pencolor(r,g,b)
Set pencolor to the RGB color represented by r, g, and b. Each of
r, g, and b must be in the range 0..colormode.
If turtleshape is a polygon, the outline of that polygon is drawn with the
newly set pencolor.
Return the current fillcolor as color specification string, possibly
in tuple format (see example). May be used as input to another
color/pencolor/fillcolor call.
fillcolor(colorstring)
Set fillcolor to colorstring, which is a Tk color specification string,
such as "red", "yellow", or "#33cc8c".
fillcolor((r,g,b))
Set fillcolor to the RGB color represented by the tuple of r, g, and
b. Each of r, g, and b must be in the range 0..colormode, where
colormode is either 1.0 or 255 (see colormode()).
fillcolor(r,g,b)
Set fillcolor to the RGB color represented by r, g, and b. Each of
r, g, and b must be in the range 0..colormode.
If turtleshape is a polygon, the interior of that polygon is drawn
with the newly set fillcolor.
Delete the turtle’s drawings from the screen. Do not move turtle. State and
position of the turtle as well as drawings of other turtles are not affected.
align – one of the strings “left”, “center” or “right”
font – a triple (fontname, fontsize, fonttype)
Write text - the string representation of arg - at the current turtle
position according to align (“left”, “center” or “right”) and with the given
font. If move is True, the pen is moved to the bottom-right corner of the
text. By default, move is False.
Make the turtle invisible. It’s a good idea to do this while you’re in the
middle of doing some complex drawing, because hiding the turtle speeds up the
drawing observably.
Set turtle shape to shape with given name or, if name is not given, return
name of current shape. Shape with name must exist in the TurtleScreen’s
shape dictionary. Initially there are the following polygon shapes: “arrow”,
“turtle”, “circle”, “square”, “triangle”, “classic”. To learn about how to
deal with shapes see Screen method register_shape().
rmode – one of the strings “auto”, “user”, “noresize”
Set resizemode to one of the values: “auto”, “user”, “noresize”. If rmode
is not given, return current resizemode. Different resizemodes have the
following effects:
“auto”: adapts the appearance of the turtle corresponding to the value of pensize.
“user”: adapts the appearance of the turtle according to the values of
stretchfactor and outlinewidth (outline), which are set by
shapesize().
“noresize”: no adaption of the turtle’s appearance takes place.
resizemode(“user”) is called by shapesize() when used with arguments.
Return or set the pen’s attributes x/y-stretchfactors and/or outline. Set
resizemode to “user”. If and only if resizemode is set to “user”, the turtle
will be displayed stretched according to its stretchfactors: stretch_wid is
stretchfactor perpendicular to its orientation, stretch_len is
stretchfactor in direction of its orientation, outline determines the width
of the shapes’s outline.
Set or return the current shearfactor. Shear the turtleshape according to
the given shearfactor shear, which is the tangent of the shear angle.
Do not change the turtle’s heading (direction of movement).
If shear is not given: return the current shearfactor, i. e. the
tangent of the shear angle, by which lines parallel to the
heading of the turtle are sheared.
Rotate the turtleshape to point in the direction specified by angle,
regardless of its current tilt-angle. Do not change the turtle’s heading
(direction of movement).
Set or return the current tilt-angle. If angle is given, rotate the
turtleshape to point in the direction specified by angle,
regardless of its current tilt-angle. Do not change the turtle’s
heading (direction of movement).
If angle is not given: return the current tilt-angle, i. e. the angle
between the orientation of the turtleshape and the heading of the
turtle (its direction of movement).
Set or return the current transformation matrix of the turtle shape.
If none of the matrix elements are given, return the transformation
matrix as a tuple of 4 elements.
Otherwise set the given elements and transform the turtleshape
according to the matrix consisting of first row t11, t12 and
second row t21, t22. The determinant t11 * t22 - t12 * t21 must not be
zero, otherwise an error is raised.
Modify stretchfactor, shearfactor and tiltangle according to the
given matrix.
fun – a function with two arguments which will be called with the
coordinates of the clicked point on the canvas
num – number of the mouse-button, defaults to 1 (left mouse button)
add – True or False – if True, a new binding will be
added, otherwise it will replace a former binding
Bind fun to mouse-click events on this turtle. If fun is None,
existing bindings are removed. Example for the anonymous turtle, i.e. the
procedural way:
>>> def turn(x, y):
...     left(180)
...
>>> onclick(turn)   # Now clicking into the turtle will turn it.
>>> onclick(None)   # event-binding will be removed
Set or disable undobuffer. If size is an integer an empty undobuffer of
given size is installed. size gives the maximum number of turtle actions
that can be undone by the undo() method/function. If size is
None, the undobuffer is disabled.
To use compound turtle shapes, which consist of several polygons of different
color, you must use the helper class Shape explicitly as described
below:
Create an empty Shape object of type “compound”.
Add as many components to this object as desired, using the
addcomponent() method.
The Shape class is used internally by the register_shape()
method in different ways. The application programmer has to deal with the
Shape class only when using compound shapes like shown above!
Methods of TurtleScreen/Screen and corresponding functions
Most of the examples in this section refer to a TurtleScreen instance called
screen.
picname – a string, name of a gif-file or "nopic", or None
Set background image or return name of current backgroundimage. If picname
is a filename, set the corresponding image as background. If picname is
"nopic", delete background image, if present. If picname is None,
return the filename of the current backgroundimage.
Delete all drawings and all turtles from the TurtleScreen. Reset the now
empty TurtleScreen to its initial state: white background, no background
image, no event bindings and tracing on.
Note
This TurtleScreen method is available as a global function only under the
name clearscreen. The global function clear is a different one
derived from the Turtle method clear.
Reset all Turtles on the Screen to their initial state.
Note
This TurtleScreen method is available as a global function only under the
name resetscreen. The global function reset is another one
derived from the Turtle method reset.
canvwidth – positive integer, new width of canvas in pixels
canvheight – positive integer, new height of canvas in pixels
bg – colorstring or color-tuple, new background color
If no arguments are given, return current (canvaswidth, canvasheight). Else
resize the canvas the turtles are drawing on. Do not alter the drawing
window. To observe hidden parts of the canvas, use the scrollbars. With this
method, one can make visible those parts of a drawing which were outside the
canvas before.
llx – a number, x-coordinate of lower left corner of canvas
lly – a number, y-coordinate of lower left corner of canvas
urx – a number, x-coordinate of upper right corner of canvas
ury – a number, y-coordinate of upper right corner of canvas
Set up user-defined coordinate system and switch to mode “world” if
necessary. This performs a screen.reset(). If mode “world” is already
active, all drawings are redrawn according to the new coordinates.
ATTENTION: in user-defined coordinate systems angles may appear
distorted.
>>> screen.reset()
>>> screen.setworldcoordinates(-50, -7.5, 50, 7.5)
>>> for _ in range(72):
...     left(10)
...
>>> for _ in range(8):
...     left(45); fd(2)   # a regular octagon
Set or return the drawing delay in milliseconds. (This is approximately
the time interval between two consecutive canvas updates.) The longer the
drawing delay, the slower the animation.
Turn turtle animation on/off and set delay for update drawings. If
n is given, only each n-th regular screen update is really
performed. (Can be used to accelerate the drawing of complex
graphics.) When called without arguments, returns the currently
stored value of n. Second argument sets delay value (see
delay()).
key – a string: key (e.g. “a”) or key-symbol (e.g. “space”)
Bind fun to key-release event of key. If fun is None, event bindings
are removed. Remark: in order to be able to register key-events, TurtleScreen
must have the focus. (See method listen().)
key – a string: key (e.g. “a”) or key-symbol (e.g. “space”)
Bind fun to key-press event of key if key is given,
or to any key-press-event if no key is given.
Remark: in order to be able to register key-events, TurtleScreen
must have focus. (See method listen().)
fun – a function with two arguments which will be called with the
coordinates of the clicked point on the canvas
num – number of the mouse-button, defaults to 1 (left mouse button)
add – True or False – if True, a new binding will be
added, otherwise it will replace a former binding
Bind fun to mouse-click events on this screen. If fun is None,
existing bindings are removed.
Example for a TurtleScreen instance named screen and a Turtle instance
named turtle:
>>> screen.onclick(turtle.goto)   # Subsequently clicking into the TurtleScreen will
>>>                               # make the turtle move to the clicked point.
>>> screen.onclick(None)          # remove event binding again
Note
This TurtleScreen method is available as a global function only under the
name onscreenclick. The global function onclick is another one
derived from the Turtle method onclick.
Starts event loop - calling Tkinter’s mainloop function.
Must be the last statement in a turtle graphics program.
Must not be used if a script is run from within IDLE in -n mode
(no subprocess), i.e. when turtle graphics is used interactively.
Pop up a dialog window for input of a string. Parameter title is
the title of the dialog window, prompt is a text mostly describing
what information to input.
Return the string input. If the dialog is canceled, return None.
>>> screen.textinput("NIM","Name of first player:")
Pop up a dialog window for input of a number. title is the title of the
dialog window, prompt is a text mostly describing what numerical information
to input. default: default value, minval: minimum value for input,
maxval: maximum value for input
The number input must be in the range minval .. maxval if these are
given. If not, a hint is issued and the dialog remains open for
correction.
Return the number input. If the dialog is canceled, return None.
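A typical call (assuming a Screen instance named screen):

>>> screen.numinput("Poker", "Your stakes:", 1000, minval=10, maxval=10000)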
mode – one of the strings “standard”, “logo” or “world”
Set turtle mode (“standard”, “logo” or “world”) and perform reset. If mode
is not given, current mode is returned.
Mode “standard” is compatible with old turtle. Mode “logo” is
compatible with most Logo turtle graphics. Mode “world” uses user-defined
“world coordinates”. Attention: in this mode angles appear distorted if
x/y unit-ratio doesn’t equal 1.
Mode          Initial turtle heading    positive angles
------------  ------------------------  ----------------
"standard"    to the right (east)       counterclockwise
"logo"        upward (north)            clockwise
>>> mode("logo")   # resets turtle heading to north
>>> mode()
'logo'
If the value “using_IDLE” in the configuration dictionary is False
(default value), also enter mainloop. Remark: If IDLE with the -n switch
(no subprocess) is used, this value should be set to True in
turtle.cfg. In this case IDLE’s own mainloop is active also for the
client script.
Set the size and position of the main window. Default values of arguments
are stored in the configuration dictionary and can be changed via a
turtle.cfg file.
Parameters:
width – if an integer, a size in pixels, if a float, a fraction of the
screen; default is 50% of screen
height – if an integer, the height in pixels, if a float, a fraction of
the screen; default is 75% of screen
startx – if positive, starting position in pixels from the left
edge of the screen, if negative from the right edge, if None,
center window horizontally
starty – if positive, starting position in pixels from the top
edge of the screen, if negative from the bottom edge, if None,
center window vertically
>>> screen.setup(width=200, height=200, startx=0, starty=0)
>>> # sets window to 200x200 pixels, in upper left of screen
>>> screen.setup(width=.75, height=0.5, startx=None, starty=None)
>>> # sets window to 75% of screen by 50% of screen and centers
poly – a polygon, i.e. a tuple of pairs of numbers
fill – a color the poly will be filled with
outline – a color for the poly’s outline (if given)
Example:
>>> poly = ((0, 0), (10, -5), (0, 10), (-10, -5))
>>> s = Shape("compound")
>>> s.addcomponent(poly, "red", "blue")
>>> # ... add more components and then use register_shape()
A two-dimensional vector class, used as a helper class for implementing
turtle graphics. May be useful for turtle graphics programs too. Derived
from tuple, so a vector is a tuple!
The public methods of the Screen and Turtle classes are documented extensively
via docstrings. So these can be used as online-help via the Python help
facilities:
When using IDLE, tooltips show the signatures and first lines of the
docstrings of typed in function-/method calls.
Calling help() on methods or functions displays the docstrings:
>>> help(Screen.bgcolor)
Help on method bgcolor in module turtle:

bgcolor(self, *args) unbound turtle.Screen method
    Set or return backgroundcolor of the TurtleScreen.

    Arguments (if given): a color string or three numbers
    in the range 0..colormode or a 3-tuple of such numbers.

      >>> screen.bgcolor("orange")
      >>> screen.bgcolor()
      "orange"
      >>> screen.bgcolor(0.5,0,0.5)
      >>> screen.bgcolor()
      "#800080"

>>> help(Turtle.penup)
Help on method penup in module turtle:

penup(self) unbound turtle.Turtle method
    Pull the pen up -- no drawing when moving.

    Aliases: penup | pu | up

    No argument

    >>> turtle.penup()
The docstrings of the functions which are derived from methods have a modified
form:
>>> help(bgcolor)
Help on function bgcolor in module turtle:

bgcolor(*args)
    Set or return backgroundcolor of the TurtleScreen.

    Arguments (if given): a color string or three numbers
    in the range 0..colormode or a 3-tuple of such numbers.

    Example::

      >>> bgcolor("orange")
      >>> bgcolor()
      "orange"
      >>> bgcolor(0.5,0,0.5)
      >>> bgcolor()
      "#800080"

>>> help(penup)
Help on function penup in module turtle:

penup()
    Pull the pen up -- no drawing when moving.

    Aliases: penup | pu | up

    No argument

    Example:
    >>> penup()
These modified docstrings are created automatically together with the function
definitions that are derived from the methods at import time.
Translation of docstrings into different languages¶
There is a utility to create a dictionary the keys of which are the method names
and the values of which are the docstrings of the public methods of the classes
Screen and Turtle.
Create and write docstring-dictionary to a Python script with the given
filename. This function has to be called explicitly (it is not used by the
turtle graphics classes). The docstring dictionary will be written to the
Python script filename.py. It is intended to serve as a template
for translation of the docstrings into different languages.
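For example (the target language in the filename is illustrative):

>>> import turtle
>>> turtle.write_docstringdict("turtle_docstringdict_german")   # writes turtle_docstringdict_german.py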
If you (or your students) want to use turtle with online help in your
native language, you have to translate the docstrings and save the resulting
file as e.g. turtle_docstringdict_german.py.
If you have an appropriate entry in your turtle.cfg file this dictionary
will be read in at import time and will replace the original English docstrings.
At the time of this writing there are docstring dictionaries in German and in
Italian. (Requests please to glingl@aon.at.)
The built-in default configuration mimics the appearance and behaviour of the
old turtle module in order to retain best possible compatibility with it.
If you want to use a different configuration which better reflects the features
of this module or which better fits to your needs, e.g. for use in a classroom,
you can prepare a configuration file turtle.cfg which will be read at import
time and modify the configuration according to its settings.
The built-in configuration would correspond to the following turtle.cfg:
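(Reconstructed here from the module's built-in defaults; the authoritative values live in the module source.)

width = 0.5
height = 0.75
leftright = None
topbottom = None
canvwidth = 400
canvheight = 300
mode = standard
colormode = 1.0
delay = 10
undobuffersize = 1000
shape = classic
pencolor = black
fillcolor = black
resizemode = noresize
visible = True
language = english
exampleturtle = turtle
examplescreen = screen
title = Python Turtle Graphics
using_IDLE = False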
The first four lines correspond to the arguments of the Screen.setup()
method.
Line 5 and 6 correspond to the arguments of the method
Screen.screensize().
shape can be any of the built-in shapes, e.g: arrow, turtle, etc. For more
info try help(shape).
If you want to use no fillcolor (i.e. make the turtle transparent), you have
to write fillcolor="" (but all nonempty strings must not have quotes in
the cfg-file).
If you want the turtle to reflect its state (for example its pensize), you have to use resizemode=auto.
If you set e.g. language=italian the docstringdict
turtle_docstringdict_italian.py will be loaded at import time (if
present on the import path, e.g. in the same directory as turtle).
The entries exampleturtle and examplescreen define the names of these
objects as they occur in the docstrings. The transformation of
method-docstrings to function-docstrings will delete these names from the
docstrings.
using_IDLE: Set this to True if you regularly work with IDLE and its -n
switch (“no subprocess”). This will prevent exitonclick() from entering the
mainloop.
There can be a turtle.cfg file in the directory where turtle is
stored and an additional one in the current working directory. The latter will
override the settings of the first one.
The Lib/turtledemo directory contains a turtle.cfg file. You can
study it as an example and see its effects when running the demos (preferably
not from within the demo-viewer).
There is a set of demo scripts in the turtledemo package. These
scripts can be run and viewed using the supplied demo viewer as follows:
python -m turtledemo
Alternatively, you can run the demo scripts individually. For example,
python -m turtledemo.bytedesign
The turtledemo package directory contains:
a set of 15 demo scripts demonstrating different features of the new module
turtle;
a demo viewer __main__.py which can be used to view the sourcecode
of the scripts and run them at the same time. 14 of the examples can be
accessed via the Examples menu; all of them can also be run standalone.
The example turtledemo.two_canvases demonstrates the simultaneous
use of two canvases with the turtle module. Therefore, it can only be run
standalone.
There is a turtle.cfg file in this directory, which serves as an
example for how to write and use such files.
The methods Turtle.tracer(), Turtle.window_width() and
Turtle.window_height() have been eliminated.
Methods with these names and functionality are now available only
as methods of Screen. The functions derived from these remain
available. (In fact already in Python 2.6 these methods were merely
duplications of the corresponding
TurtleScreen/Screen-methods.)
The method Turtle.fill() has been eliminated.
The behaviour of begin_fill() and end_fill()
has changed slightly: now every filling process must be completed with an
end_fill() call.
A method Turtle.filling() has been added. It returns a boolean
value: True if a filling process is under way, False otherwise.
This behaviour corresponds to a fill() call without arguments in
Python 2.6.
The methods Turtle.shearfactor(), Turtle.shapetransform() and
Turtle.get_shapepoly() have been added. Thus the full range of
regular linear transforms is now available for transforming turtle shapes.
Turtle.tiltangle() has been enhanced in functionality: it now can
be used to get or set the tiltangle. Turtle.settiltangle() has been
deprecated.
The method Screen.onkeypress() has been added as a complement to
Screen.onkey() which in fact binds actions to the keyrelease event.
Accordingly the latter got an alias: Screen.onkeyrelease().
The method Screen.mainloop() has been added, so when working only
with Screen and Turtle objects one no longer needs to import
mainloop() additionally.
Two input methods have been added: Screen.textinput() and
Screen.numinput(). These pop up input dialogs and return
strings and numbers respectively.
Two example scripts tdemo_nim.py and tdemo_round_dance.py
have been added to the Lib/turtledemo directory.
cmd — Support for line-oriented command interpreters
The Cmd class provides a simple framework for writing line-oriented
command interpreters. These are often useful for test harnesses, administrative
tools, and prototypes that will later be wrapped in a more sophisticated
interface.
class cmd.Cmd(completekey='tab', stdin=None, stdout=None)
A Cmd instance or subclass instance is a line-oriented interpreter
framework. There is no good reason to instantiate Cmd itself; rather,
it’s useful as a superclass of an interpreter class you define yourself in order
to inherit Cmd's methods and encapsulate action methods.
The optional argument completekey is the readline name of a completion
key; it defaults to Tab. If completekey is not None and
readline is available, command completion is done automatically.
The optional arguments stdin and stdout specify the input and output file
objects that the Cmd instance or subclass instance will use for input and
output. If not specified, they will default to sys.stdin and
sys.stdout.
If you want a given stdin to be used, make sure to set the instance’s
use_rawinput attribute to False, otherwise stdin will be
ignored.
Repeatedly issue a prompt, accept input, parse an initial prefix off the
received input, and dispatch to action methods, passing them the remainder of
the line as argument.
The optional argument is a banner or intro string to be issued before the first
prompt (this overrides the intro class attribute).
If the readline module is loaded, input will automatically inherit
bash-like history-list editing (e.g. Control-P scrolls back
to the last command, Control-N forward to the next one, Control-F
moves the cursor to the right non-destructively, Control-B moves the
cursor to the left non-destructively, etc.).
An end-of-file on input is passed back as the string 'EOF'.
An interpreter instance will recognize a command name foo if and only if it
has a method do_foo(). As a special case, a line beginning with the
character '?' is dispatched to the method do_help(). As another
special case, a line beginning with the character '!' is dispatched to the
method do_shell() (if such a method is defined).
This method will return when the postcmd() method returns a true value.
The stop argument to postcmd() is the return value from the command’s
corresponding do_*() method.
If completion is enabled, completing commands will be done automatically, and
completion of command arguments is done by calling complete_foo() with
arguments text, line, begidx, and endidx. text is the string prefix
we are attempting to match: all returned matches must begin with it. line is
the current input line with leading whitespace removed, begidx and endidx
are the beginning and ending indexes of the prefix text, which could be used to
provide different completion depending upon which position the argument is in.
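A minimal completer sketch (the ColorShell class and its color list are hypothetical, not part of the module):

import cmd

class ColorShell(cmd.Cmd):
    prompt = '(color) '
    colors = ['black', 'blue', 'green', 'red']

    def do_color(self, arg):
        'Set the current color: COLOR BLUE'
        print('color set to', arg)

    def complete_color(self, text, line, begidx, endidx):
        # offer only those colors that start with the prefix typed so far
        return [c for c in self.colors if c.startswith(text)]

    def do_EOF(self, arg):
        'Quit: EOF'
        return True

if __name__ == '__main__':
    ColorShell().cmdloop()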
All subclasses of Cmd inherit a predefined do_help(). This
method, called with an argument 'bar', invokes the corresponding method
help_bar(), and if that is not present, prints the docstring of
do_bar(), if available. With no argument, do_help() lists all
available help topics (that is, all commands with corresponding
help_*() methods or commands that have docstrings), and also lists any
undocumented commands.
Interpret the argument as though it had been typed in response to the prompt.
This may be overridden, but should not normally need to be; see the
precmd() and postcmd() methods for useful execution hooks. The
return value is a flag indicating whether interpretation of commands by the
interpreter should stop. If there is a do_*() method for the command
str, the return value of that method is returned, otherwise the return value
from the default() method is returned.
Hook method executed just before the command line line is interpreted, but
after the input prompt is generated and issued. This method is a stub in
Cmd; it exists to be overridden by subclasses. The return value is
used as the command which will be executed by the onecmd() method; the
precmd() implementation may re-write the command or simply return line
unchanged.
Hook method executed just after a command dispatch is finished. This method is
a stub in Cmd; it exists to be overridden by subclasses. line is the
command line which was executed, and stop is a flag which indicates whether
execution will be terminated after the call to postcmd(); this will be the
return value of the onecmd() method. The return value of this method will
be used as the new value for the internal flag which corresponds to stop;
returning false will cause interpretation to continue.
The header to issue if the help output has a section for miscellaneous help
topics (that is, there are help_*() methods without corresponding
do_*() methods).
The header to issue if the help output has a section for undocumented commands
(that is, there are do_*() methods without corresponding help_*()
methods).
A flag, defaulting to true. If true, cmdloop() uses input() to
display a prompt and read the next command; if false, sys.stdout.write()
and sys.stdin.readline() are used. (This means that by importing
readline, on systems that support it, the interpreter will automatically
support Emacs-like line editing and command-history keystrokes.)
The cmd module is mainly useful for building custom shells that let a
user work with a program interactively.
This section presents a simple example of how to build a shell around a few of
the commands in the turtle module.
Basic turtle commands such as forward() are added to a
Cmd subclass with method named do_forward(). The argument is
converted to a number and dispatched to the turtle module. The docstring is
used in the help utility provided by the shell.
The example also includes a basic record and playback facility implemented with
the precmd() method which is responsible for converting the input to
lowercase and writing the commands to a file. The do_playback() method
reads the file and adds the recorded commands to the cmdqueue for
immediate playback:
import cmd, sys
from turtle import *

class TurtleShell(cmd.Cmd):
    intro = 'Welcome to the turtle shell. Type help or ? to list commands.\n'
    prompt = '(turtle) '
    file = None

    # ----- basic turtle commands -----
    def do_forward(self, arg):
        'Move the turtle forward by the specified distance: FORWARD 10'
        forward(*parse(arg))
    def do_right(self, arg):
        'Turn turtle right by given number of degrees: RIGHT 20'
        right(*parse(arg))
    def do_left(self, arg):
        'Turn turtle left by given number of degrees: LEFT 90'
        left(*parse(arg))
    def do_goto(self, arg):
        'Move turtle to an absolute position with changing orientation. GOTO 100 200'
        goto(*parse(arg))
    def do_home(self, arg):
        'Return turtle to the home position: HOME'
        home()
    def do_circle(self, arg):
        'Draw circle with given radius and optional extent and steps: CIRCLE 50'
        circle(*parse(arg))
    def do_position(self, arg):
        'Print the current turtle position: POSITION'
        print('Current position is %d %d\n' % position())
    def do_heading(self, arg):
        'Print the current turtle heading in degrees: HEADING'
        print('Current heading is %d\n' % (heading(),))
    def do_color(self, arg):
        'Set the color: COLOR BLUE'
        color(arg.lower())
    def do_undo(self, arg):
        'Undo (repeatedly) the last turtle action(s): UNDO'
        undo()
    def do_reset(self, arg):
        'Clear the screen and return turtle to center: RESET'
        reset()
    def do_bye(self, arg):
        'Stop recording, close the turtle window, and exit: BYE'
        print('Thank you for using Turtle')
        self.close()
        bye()
        sys.exit(0)

    # ----- record and playback -----
    def do_record(self, arg):
        'Save future commands to filename: RECORD rose.cmd'
        self.file = open(arg, 'w')
    def do_playback(self, arg):
        'Playback commands from a file: PLAYBACK rose.cmd'
        self.close()
        with open(arg) as f:     # close the file promptly after reading
            self.cmdqueue.extend(f.read().splitlines())
    def precmd(self, line):
        line = line.lower()
        if self.file and 'playback' not in line:
            print(line, file=self.file)
        return line
    def close(self):
        if self.file:
            self.file.close()
            self.file = None

def parse(arg):
    'Convert a series of zero or more numbers to an argument tuple'
    return tuple(map(int, arg.split()))

if __name__ == '__main__':
    TurtleShell().cmdloop()
Here is a sample session with the turtle shell showing the help functions, using
blank lines to repeat commands, and the simple record and playback facility:
Welcome to the turtle shell. Type help or ? to list commands.
(turtle) ?
Documented commands (type help <topic>):
========================================
bye color goto home playback record right
circle forward heading left position reset undo
(turtle) help forward
Move the turtle forward by the specified distance: FORWARD 10
(turtle) record spiral.cmd
(turtle) position
Current position is 0 0
(turtle) heading
Current heading is 0
(turtle) reset
(turtle) circle 20
(turtle) right 30
(turtle) circle 40
(turtle) right 30
(turtle) circle 60
(turtle) right 30
(turtle) circle 80
(turtle) right 30
(turtle) circle 100
(turtle) right 30
(turtle) circle 120
(turtle) right 30
(turtle) circle 120
(turtle) heading
Current heading is 180
(turtle) forward 100
(turtle)
(turtle) right 90
(turtle) forward 100
(turtle)
(turtle) right 90
(turtle) forward 400
(turtle) right 90
(turtle) forward 500
(turtle) right 90
(turtle) forward 400
(turtle) right 90
(turtle) forward 300
(turtle) playback spiral.cmd
Current position is 0 0
Current heading is 0
Current heading is 180
(turtle) bye
Thank you for using Turtle
The shlex class makes it easy to write lexical analyzers for simple
syntaxes resembling that of the Unix shell. This will often be useful for
writing minilanguages (for example, in run control files for Python
applications) or for parsing quoted strings.
Split the string s using shell-like syntax. If comments is False
(the default), the parsing of comments in the given string will be disabled
(setting the commenters attribute of the shlex instance to
the empty string). This function operates in POSIX mode by default, but uses
non-POSIX mode if the posix argument is false.
Note
Since the split() function instantiates a shlex instance,
passing None for s will read the string to split from standard
input.
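For example, a quoted argument survives the split as a single token:

>>> import shlex
>>> shlex.split('cp "my file.txt" /tmp')
['cp', 'my file.txt', '/tmp']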
class shlex.shlex(instream=None, infile=None, posix=False)
A shlex instance or subclass instance is a lexical analyzer object.
The initialization argument, if present, specifies where to read characters
from. It must be a file-/stream-like object with read() and
readline() methods, or a string. If no argument is given, input will
be taken from sys.stdin. The second optional argument is a filename
string, which sets the initial value of the infile attribute. If the
instream argument is omitted or equal to sys.stdin, this second
argument defaults to “stdin”. The posix argument defines the operational
mode: when posix is not true (default), the shlex instance will
operate in compatibility mode. When operating in POSIX mode, shlex
will try to be as close as possible to the POSIX shell parsing rules.
Return a token. If tokens have been stacked using push_token(), pop a
token off the stack. Otherwise, read one from the input stream. If reading
encounters an immediate end-of-file, self.eof is returned (the empty
string ('') in non-POSIX mode, and None in POSIX mode).
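A short sketch of reading tokens one by one (non-POSIX mode, so the quotes are kept):

>>> import shlex
>>> lexer = shlex.shlex('a "b c" d  # a comment')
>>> lexer.get_token()
'a'
>>> lexer.get_token()
'"b c"'
>>> lexer.get_token()
'd'
>>> lexer.get_token()     # the comment is skipped; EOF yields self.eof
''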
Read a raw token. Ignore the pushback stack, and do not interpret source
requests. (This is not ordinarily a useful entry point, and is documented here
only for the sake of completeness.)
When shlex detects a source request (see source below) this
method is given the following token as argument, and expected to return a tuple
consisting of a filename and an open file-like object.
Normally, this method first strips any quotes off the argument. If the result
is an absolute pathname, or there was no previous source request in effect, or
the previous source was a stream (such as sys.stdin), the result is left
alone. Otherwise, if the result is a relative pathname, the directory part of
the name of the file immediately before it on the source inclusion stack is
prepended (this behavior is like the way the C preprocessor handles #include "file.h").
The result of the manipulations is treated as a filename, and returned as the
first component of the tuple, with open() called on it to yield the second
component. (Note: this is the reverse of the order of arguments in instance
initialization!)
This hook is exposed so that you can use it to implement directory search paths,
addition of file extensions, and other namespace hacks. There is no
corresponding ‘close’ hook, but a shlex instance will call the close()
method of the sourced input stream when it returns EOF.
Push an input source stream onto the input stack. If the filename argument is
specified it will later be available for use in error messages. This is the
same method used internally by the sourcehook() method.
This method generates an error message leader in the format of a Unix C compiler
error label; the format is '"%s", line %d: ', where the %s is replaced
with the name of the current source file and the %d with the current input
line number (the optional arguments can be used to override these).
This convenience is provided to encourage shlex users to generate error
messages in the standard, parseable format understood by Emacs and other Unix
tools.
Instances of shlex subclasses have some public instance variables which
either control lexical analysis or can be used for debugging:
The string of characters that are recognized as comment beginners. All
characters from the comment beginner to end of line are ignored. Includes just
'#' by default.
Characters that will be considered string quotes. The token accumulates until
the same quote is encountered again (thus, different quote types protect each
other as in the shell.) By default, includes ASCII single and double quotes.
If True, tokens will only be split on whitespace. This is useful, for
example, for parsing command lines with shlex, getting tokens in a
similar way to shell arguments.
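A sketch of the effect in POSIX mode (note that punctuation no longer splits the token):

>>> import shlex
>>> lex = shlex.shlex('--output=/tmp/out,backup', posix=True)
>>> lex.whitespace_split = True
>>> list(lex)
['--output=/tmp/out,backup']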
The name of the current input file, as initially set at class instantiation time
or stacked by later source requests. It may be useful to examine this when
constructing error messages.
This attribute is None by default. If you assign a string to it, that
string will be recognized as a lexical-level inclusion request similar to the
source keyword in various shells. That is, the immediately following token
will be opened as a filename and input taken from that stream until EOF, at which
point the close() method of that stream will be called and the input
source will again become the original input stream. Source requests may be
stacked any number of levels deep.
If this attribute is numeric and 1 or more, a shlex instance will
print verbose progress output on its behavior. If you need to use this, you can
read the module source code to learn the details.
When operating in non-POSIX mode, shlex will try to obey the
following rules.
Quote characters are not recognized within words (Do"Not"Separate is
parsed as the single word Do"Not"Separate);
Escape characters are not recognized;
Enclosing characters in quotes preserve the literal value of all characters
within the quotes;
Closing quotes separate words ("Do"Separate is parsed as "Do" and
Separate);
If whitespace_split is False, any character not declared to be a
word character, whitespace, or a quote will be returned as a single-character
token. If it is True, shlex will only split words in whitespaces;
EOF is signaled with an empty string ('');
It’s not possible to parse empty strings, even if quoted.
When operating in POSIX mode, shlex will try to obey the following
parsing rules.
Quotes are stripped out, and do not separate words ("Do"Not"Separate" is
parsed as the single word DoNotSeparate);
Non-quoted escape characters (e.g. '\') preserve the literal value of the
next character that follows;
Enclosing characters in quotes which are not part of escapedquotes
(e.g. "'") preserve the literal value of all characters within the quotes;
Enclosing characters in quotes which are part of escapedquotes (e.g.
'"') preserve the literal value of all characters within the quotes, with
the exception of the characters mentioned in escape. The escape
characters retain their special meaning only when followed by the quote in use, or
the escape character itself. Otherwise the escape character will be considered a
normal character.
Tk/Tcl has long been an integral part of Python. It provides a robust and
platform independent windowing toolkit, that is available to Python programmers
using the tkinter package, and its extension, the tkinter.tix and
the tkinter.ttk modules.
The tkinter package is a thin object-oriented layer on top of Tcl/Tk. To
use tkinter, you don’t need to write Tcl code, but you will need to
consult the Tk documentation, and occasionally the Tcl documentation.
tkinter is a set of wrappers that implement the Tk widgets as Python
classes. In addition, the internal module _tkinter provides a threadsafe
mechanism which allows Python and Tcl to interact.
tkinter's chief virtues are that it is fast, and that it usually comes
bundled with Python. Although its standard documentation is weak, good
material is available, which includes: references, tutorials, a book and
others. tkinter is also famous for having an outdated look and feel,
which has been vastly improved in Tk 8.5. Nevertheless, there are many other
GUI libraries that you could be interested in. For more information about
alternatives, see the Other Graphical User Interface Packages section.
The tkinter package (“Tk interface”) is the standard Python interface to
the Tk GUI toolkit. Both Tk and tkinter are available on most Unix
platforms, as well as on Windows systems. (Tk itself is not part of Python; it
is maintained at ActiveState.) You can check that tkinter is properly
installed on your system by running python -m tkinter from the command line;
this should open a window demonstrating a simple Tk interface.
Most of the time, tkinter is all you really need, but a number of
additional modules are available as well. The Tk interface is located in a
binary module named _tkinter. This module contains the low-level
interface to Tk, and should never be used directly by application programmers.
It is usually a shared library (or DLL), but might in some cases be statically
linked with the Python interpreter.
In addition to the Tk interface module, tkinter includes a number of
Python modules, tkinter.constants being one of the most important.
Importing tkinter will automatically import tkinter.constants,
so, usually, to use Tkinter all you need is a simple import statement:
import tkinter
Or, more often:
from tkinter import *
class tkinter.Tk(screenName=None, baseName=None, className='Tk', useTk=1)
The Tk class is instantiated without arguments. This creates a toplevel
widget of Tk which usually is the main window of an application. Each instance
has its own associated Tcl interpreter.
The Tcl() function is a factory function which creates an object much like
that created by the Tk class, except that it does not initialize the Tk
subsystem. This is most often useful when driving the Tcl interpreter in an
environment where one doesn’t want to create extraneous toplevel windows, or
where one cannot (such as Unix/Linux systems without an X server). An object
created by the Tcl() function can have a Toplevel window created (and the Tk
subsystem initialized) by calling its loadtk() method.
This section is not designed to be an exhaustive tutorial on either Tk or
Tkinter. Rather, it is intended as a stop gap, providing some introductory
orientation on the system.
Credits:
Tk was written by John Ousterhout while at Berkeley.
Tkinter was written by Steen Lumholt and Guido van Rossum.
This Life Preserver was written by Matt Conway at the University of Virginia.
The HTML rendering, and some liberal editing, was produced from a FrameMaker
version by Ken Manheimer.
Fredrik Lundh elaborated and revised the class interface descriptions, to get
them current with Tk 4.2.
Mike Clarkson converted the documentation to LaTeX, and compiled the User
Interface chapter of the reference manual.
This section is designed in two parts: the first half (roughly) covers
background material, while the second half can be taken to the keyboard as a
handy reference.
When trying to answer questions of the form “how do I do blah”, it is often best
to find out how to do "blah" in straight Tk, and then convert this back into the
corresponding tkinter call. Python programmers can often guess at the
correct Python command by looking at the Tk documentation. This means that in
order to use Tkinter, you will have to know a little bit about Tk. This document
can’t fulfill that role, so the best we can do is point you to the best
documentation that exists. Here are some hints:
The authors strongly suggest getting a copy of the Tk man pages.
Specifically, the man pages in the manN directory are most useful.
The man3 man pages describe the C interface to the Tk library and thus
are not especially helpful for script writers.
Addison-Wesley publishes a book called Tcl and the Tk Toolkit by John
Ousterhout (ISBN 0-201-63337-X) which is a good introduction to Tcl and Tk for
the novice. The book is not exhaustive, and for many details it defers to the
man pages.
tkinter/__init__.py is a last resort for most, but can be a good
place to go when nothing else makes sense.
The class hierarchy looks complicated, but in actual practice, application
programmers almost always refer to the classes at the very bottom of the
hierarchy.
Notes:
These classes are provided for the purposes of organizing certain functions
under one namespace. They aren’t meant to be instantiated independently.
The Tk class is meant to be instantiated only once in an application.
Application programmers need not instantiate one explicitly, the system creates
one whenever any of the other classes are instantiated.
The Widget class is not meant to be instantiated, it is meant only
for subclassing to make “real” widgets (in C++, this is called an ‘abstract
class’).
To make use of this reference material, there will be times when you will need
to know how to read short passages of Tk and how to identify the various parts
of a Tk command. (See section Mapping Basic Tk into Tkinter for the
tkinter equivalents of what’s below.)
Tk scripts are Tcl programs. Like all Tcl programs, Tk scripts are just lists
of tokens separated by spaces. A Tk widget is just its class, the options
that help configure it, and the actions that make it do useful things.
To make a widget in Tk, the command is always of the form:
classCommand newPathname options
classCommand
denotes which kind of widget to make (a button, a label, a menu...)
newPathname
is the new name for this widget. All names in Tk must be unique. To help
enforce this, widgets in Tk are named with pathnames, just like files in a
file system. The top level widget, the root, is called . (period) and
children are delimited by more periods. For example,
.myApp.controlPanel.okButton might be the name of a widget.
options
configure the widget’s appearance and in some cases, its behavior. The options
come in the form of a list of flags and values. Flags are preceded by a ‘-‘,
like Unix shell command flags, and values are put in quotes if they are more
than one word.
For example:
button .fred -fg red -text "hi there"
^ ^ \______________________/
| | |
class new options
command widget (-opt val -opt val ...)
Once created, the pathname to the widget becomes a new command. This new
widget command is the programmer’s handle for getting the new widget to
perform some action. In C, you’d express this as someAction(fred,
someOptions), in C++, you would express this as fred.someAction(someOptions),
and in Tk, you say:
.fred someAction someOptions
Note that the object name, .fred, starts with a dot.
As you’d expect, the legal values for someAction will depend on the widget’s
class: .fred disable works if fred is a button (fred gets greyed out), but
does not work if fred is a label (disabling of labels is not supported in Tk).
The legal values of someOptions is action dependent. Some actions, like
disable, require no arguments, others, like a text-entry box’s delete
command, would need arguments to specify what range of text to delete.
Class commands in Tk correspond to class constructors in Tkinter.
button .fred =====> fred = Button()
The master of an object is implicit in the new name given to it at creation
time. In Tkinter, masters are specified explicitly.
button .panel.fred =====> fred = Button(panel)
The configuration options in Tk are given in lists of hyphened tags followed by
values. In Tkinter, options are specified as keyword-arguments in the instance
constructor, and keyword-args for configure calls or as instance indices, in
dictionary style, for established instances. See section
Setting Options on setting options.
button .fred -fg red =====> fred = Button(panel, fg="red")
.fred configure -fg red =====> fred["fg"] = "red"
OR ==> fred.config(fg="red")
In Tk, to perform an action on a widget, use the widget name as a command, and
follow it with an action name, possibly with arguments (options). In Tkinter,
you call methods on the class instance to invoke actions on the widget. The
actions (methods) that a given widget can perform are listed in
tkinter/__init__.py.
.fred invoke =====> fred.invoke()
To give a widget to the packer (geometry manager), you call pack with optional
arguments. In Tkinter, the Pack class holds all this functionality, and the
various forms of the pack command are implemented as methods. All widgets in
tkinter are subclassed from the Packer, and so inherit all the packing
methods. See the tkinter.tix module documentation for additional
information on the Form geometry manager.
pack .fred -side left =====> fred.pack(side="left")
tkinter (Python)
This call (say, for example, creating a button widget) is implemented in
the tkinter package, which is written in Python. This Python
function will parse the commands and the arguments and convert them into a
form that makes them look as if they had come from a Tk script instead of
a Python script.
_tkinter (C)
These commands and their arguments will be passed to a C function in the
_tkinter - note the underscore - extension module.
Tk Widgets (C and Tcl)
This C function is able to make calls into other C modules, including the C
functions that make up the Tk library. Tk is implemented in C and some Tcl.
The Tcl part of the Tk widgets is used to bind certain default behaviors to
widgets, and is executed once at the point where the Python tkinter
package is imported. (The user never sees this stage).
Tk (C)
The Tk part of the Tk Widgets implements the final mapping to ...
Options control things like the color and border width of a widget. Options can
be set in three ways:
At object creation time, using keyword arguments
fred = Button(self, fg="red", bg="blue")
After object creation, treating the option name like a dictionary index
fred["fg"] = "red"
fred["bg"] = "blue"
Use the config() method to update multiple attributes subsequent to object creation
fred.config(fg="red", bg="blue")
For a complete explanation of a given option and its behavior, see the Tk man
pages for the widget in question.
Note that the man pages list “STANDARD OPTIONS” and “WIDGET SPECIFIC OPTIONS”
for each widget. The former is a list of options that are common to many
widgets, the latter are the options that are idiosyncratic to that particular
widget. The Standard Options are documented on the options(3) man
page.
No distinction between standard and widget-specific options is made in this
document. Some options don’t apply to some kinds of widgets. Whether a given
widget responds to a particular option depends on the class of the widget;
buttons have a command option, labels do not.
The options supported by a given widget are listed in that widget’s man page, or
can be queried at runtime by calling the config() method without
arguments, or by calling the keys() method on that widget. The return
value of these calls is a dictionary whose key is the name of the option as a
string (for example, 'relief') and whose values are 5-tuples.
Some options, like bg are synonyms for common options with long names
(bg is shorthand for “background”). Passing the config() method the name
of a shorthand option will return a 2-tuple, not a 5-tuple. The 2-tuple passed
back will contain the name of the synonym and the “real” option (such as
('bg','background')).
The packer is one of Tk’s geometry-management mechanisms. Geometry managers
are used to specify the relative positioning of widgets
within their container - their mutual master. In contrast to the more
cumbersome placer (which is used less commonly, and we do not cover here), the
packer takes qualitative relationship specification - above, to the left of,
filling, etc - and works everything out to determine the exact placement
coordinates for you.
The size of any master widget is determined by the size of the “slave widgets”
inside. The packer is used to control where slave widgets appear inside the
master into which they are packed. You can pack widgets into frames, and frames
into other frames, in order to achieve the kind of layout you desire.
Additionally, the arrangement is dynamically adjusted to accommodate incremental
changes to the configuration, once it is packed.
Note that widgets do not appear until they have had their geometry specified
with a geometry manager. It’s a common early mistake to leave out the geometry
specification, and then be surprised when the widget is created but nothing
appears. A widget will appear only after it has had, for example, the packer’s
pack() method applied to it.
The pack() method can be called with keyword-option/value pairs that control
where the widget is to appear within its container, and how it is to behave when
the main application window is resized. Here are some examples:
fred.pack()                # defaults to side = "top"
fred.pack(side="left")
fred.pack(expand=1)
The current-value setting of some widgets (like text entry widgets) can be
connected directly to application variables by using special options. These
options are variable, textvariable, onvalue, offvalue, and
value. This connection works both ways: if the variable changes for any
reason, the widget it’s connected to will be updated to reflect the new value.
Unfortunately, in the current implementation of tkinter it is not
possible to hand over an arbitrary Python variable to a widget through a
variable or textvariable option. The only kinds of variables for which
this works are variables that are subclassed from a class called Variable,
defined in tkinter.
There are many useful subclasses of Variable already defined:
StringVar, IntVar, DoubleVar, and
BooleanVar. To read the current value of such a variable, call the
get() method on it, and to change its value you call the set()
method. If you follow this protocol, the widget will always track the value of
the variable, with no further intervention on your part.
For example:
class App(Frame):
    def __init__(self, master=None):
        Frame.__init__(self, master)
        self.pack()

        self.entrythingy = Entry()
        self.entrythingy.pack()

        # here is the application variable
        self.contents = StringVar()
        # set it to some value
        self.contents.set("this is a variable")
        # tell the entry widget to watch this variable
        self.entrythingy["textvariable"] = self.contents

        # and here we get a callback when the user hits return.
        # we will have the program print out the value of the
        # application variable when the user hits return
        self.entrythingy.bind('<Key-Return>', self.print_contents)

    def print_contents(self, event):
        print("hi. contents of entry is now ---->",
              self.contents.get())
In Tk, there is a utility command, wm, for interacting with the window
manager. Options to the wm command allow you to control things like titles,
placement, icon bitmaps, and the like. In tkinter, these commands have
been implemented as methods on the Wm class. Toplevel widgets are
subclassed from the Wm class, and so can call the Wm methods
directly.
To get at the toplevel window that contains a given widget, you can often just
refer to the widget’s master. Of course if the widget has been packed inside of
a frame, the master won’t represent a toplevel window. To get at the toplevel
window that contains an arbitrary widget, you can call the _root() method.
This method begins with an underscore to denote the fact that this function is
part of the implementation, and not an interface to Tk functionality.
Here are some examples of typical usage:
from tkinter import *

class App(Frame):
    def __init__(self, master=None):
        Frame.__init__(self, master)
        self.pack()

# create the application
myapp = App()

#
# here are method calls to the window manager class
#
myapp.master.title("My Do-Nothing Application")
myapp.master.maxsize(1000, 400)

# start the program
myapp.mainloop()
anchor
Legal values are points of the compass: "n", "ne", "e", "se",
"s", "sw", "w", "nw", and also "center".
bitmap
There are eight built-in, named bitmaps: 'error', 'gray25',
'gray50', 'hourglass', 'info', 'questhead', 'question',
'warning'. To specify an X bitmap filename, give the full path to the file,
preceded with an @, as in "@/usr/contrib/bitmap/gumby.bit".
boolean
You can pass integers 0 or 1 or the strings "yes" or "no".
callback
This is any Python function that takes no arguments. For example (fred and print_it below are illustrative names):
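def print_it():
    print("hi there")
fred["command"] = print_it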
color
Colors can be given as the names of X colors in the rgb.txt file, or as strings
representing RGB values in 4 bit: "#RGB", 8 bit: "#RRGGBB", 12 bit:
"#RRRGGGBBB", or 16 bit: "#RRRRGGGGBBBB" ranges, where R,G,B here
represent any legal hex digit. See page 160 of Ousterhout’s book for details.
cursor
The standard X cursor names from cursorfont.h can be used, without the
XC_ prefix. For example to get a hand cursor (XC_hand2), use the
string "hand2". You can also specify a bitmap and mask file of your own.
See page 179 of Ousterhout’s book.
distance
Screen distances can be specified in either pixels or absolute distances.
Pixels are given as numbers and absolute distances as strings, with the trailing
character denoting units: c for centimetres, i for inches, m for
millimetres, p for printer’s points. For example, 3.5 inches is expressed
as "3.5i".
font
Tk uses a list font name format, such as {courier 10 bold}. Font sizes with
positive numbers are measured in points; sizes with negative numbers are
measured in pixels.
geometry
This is a string of the form widthxheight, where width and height are
measured in pixels for most widgets (in characters for widgets displaying text).
For example: fred["geometry"] = "200x100".
justify
Legal values are the strings: "left", "center", "right", and
"fill".
region
This is a string with four space-delimited elements, each of which is a legal
distance (see above). For example: "2 3 4 5" and "3i 2i 4.5i 2i" and
"3c 2c 4c 10.43c" are all legal regions.
relief
Determines what the border style of a widget will be. Legal values are:
"raised", "sunken", "flat", "groove", and "ridge".
scrollcommand
This is almost always the set() method of some scrollbar widget, but can
be any widget method that takes a single argument.
The bind method from the widget command allows you to watch for certain events
and to have a callback function trigger when that event type occurs. The form
of the bind method is:
def bind(self, sequence, func, add=''):
where:
sequence
is a string that denotes the target kind of event. (See the bind man page and
page 201 of John Ousterhout’s book for details).
func
is a Python function, taking one argument, to be invoked when the event occurs.
An Event instance will be passed as the argument. (Functions deployed this way
are commonly known as callbacks.)
add
is optional, either '' or '+'. Passing an empty string denotes that
this binding is to replace any other bindings that this event is associated
with. Passing a '+' means that this function is to be added to the list
of functions bound to this event type.
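For example, a sketch (assuming a widget class with a Button attribute self.button) that turns the button red whenever the mouse enters it:

def turnRed(self, event):
    event.widget["activeforeground"] = "red"

self.button.bind("<Enter>", self.turnRed)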
Notice how the widget field of the event is being accessed in the
turnRed() callback. This field contains the widget that caught the X
event. The following table lists the other event fields you can access, and how
they are denoted in Tk, which can be useful when referring to the Tk man pages.
A number of widgets require “index” parameters to be passed. These are used to
point at a specific place in a Text widget, or to particular characters in an
Entry widget, or to particular menu items in a Menu widget.
Entry widget indexes (index, view index, etc.)
Entry widgets have options that refer to character positions in the text being
displayed. You can use these tkinter functions to access these special
points in text widgets:
AtEnd()
refers to the last position in the text
AtInsert()
refers to the point where the text cursor is
AtSelFirst()
indicates the beginning point of the selected text
AtSelLast()
denotes the last point of the selected text and finally
At(x[, y])
refers to the character at pixel location x, y (with y not used in the
case of a text entry widget, which contains a single line of text).
Text widget indexes
The index notation for Text widgets is very rich and is best described in the Tk
man pages.
Menu indexes (menu.invoke(), menu.entryconfig(), etc.)
Some options and methods for menus manipulate specific menu entries. Anytime a
menu index is needed for an option or a parameter, you may pass in:
an integer which refers to the numeric position of the entry in the widget,
counted from the top, starting with 0;
the string "active", which refers to the menu position that is currently
under the cursor;
the string "last" which refers to the last menu item;
An integer preceded by @, as in @6, where the integer is interpreted
as a y pixel coordinate in the menu’s coordinate system;
the string "none", which indicates no menu entry at all, most often used
with menu.activate() to deactivate all entries, and finally,
a text string that is pattern matched against the label of the menu entry, as
scanned from the top of the menu to the bottom. Note that this index type is
considered after all the others, which means that matches for menu items
labelled last, active, or none may be interpreted as the above
literals, instead.
Bitmap/Pixelmap images can be created through the subclasses of
tkinter.Image:
BitmapImage can be used for X11 bitmap data.
PhotoImage can be used for GIF and PPM/PGM color bitmaps.
Either type of image is created through either the file or the data
option (other options are available as well).
The image object can then be used wherever an image option is supported by
some widget (e.g. labels, buttons, menus). In these cases, Tk will not keep a
reference to the image. When the last Python reference to the image object is
deleted, the image data is deleted as well, and Tk will display an empty box
wherever the image was used.
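A minimal sketch (the GIF filename is hypothetical); note that the program keeps its own reference to the image for exactly this reason:

import tkinter as tk

root = tk.Tk()
logo = tk.PhotoImage(file="logo.gif")   # hypothetical file; keep this reference alive
button = tk.Button(root, image=logo)
button.pack()
root.mainloop()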
The tkinter.ttk module provides access to the Tk themed widget set,
introduced in Tk 8.5. If Python has not been compiled against Tk 8.5, this
module can still be accessed if Tile has been installed. The former
method using Tk 8.5 provides additional benefits including anti-aliased font
rendering under X11 and window transparency (requiring a composition
window manager on X11).
The basic idea for tkinter.ttk is to separate, to the extent possible,
the code implementing a widget’s behavior from the code implementing its
appearance.
To override the basic Tk widgets, the import should follow the Tk import:
from tkinter import *
from tkinter.ttk import *
That code causes several tkinter.ttk widgets (Button,
Checkbutton, Entry, Frame, Label,
LabelFrame, Menubutton, PanedWindow,
Radiobutton, Scale and Scrollbar) to
automatically replace the Tk widgets.
This has the direct benefit of using the new widgets which gives a better
look and feel across platforms; however, the replacement widgets are not
completely compatible. The main difference is that widget options such as
“fg”, “bg” and others related to widget styling are no
longer present in Ttk widgets. Instead, use the ttk.Style class
for improved styling effects.
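A small sketch of the Style approach (the style name "BW.TLabel" is arbitrary):

from tkinter import Tk
from tkinter.ttk import Label, Style

root = Tk()
style = Style()
# colors are configured on a named style, not on the widget itself
style.configure("BW.TLabel", foreground="black", background="white")
Label(root, text="Styled with ttk", style="BW.TLabel").pack()
root.mainloop()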
Ttk comes with 17 widgets, eleven of which already existed in tkinter:
Button, Checkbutton, Entry, Frame,
Label, LabelFrame, Menubutton, PanedWindow,
Radiobutton, Scale and Scrollbar. The other six are
new: Combobox, Notebook, Progressbar,
Separator, Sizegrip and Treeview. And all of them are
subclasses of Widget.
Using the Ttk widgets gives the application an improved look and feel.
As discussed above, there are differences in how the styling is coded.
All ttk widgets accept the following options:
Option
Description
class
Specifies the window class. The class is used when querying
the option database for the window’s other options, to
determine the default bindtags for the window, and to select
the widget’s default layout and style. This is a read-only option
which may only be specified when the window is created.
cursor
Specifies the mouse cursor to be used for the widget. If set
to the empty string (the default), the cursor is inherited from
the parent widget.
takefocus
Determines whether the window accepts the focus during
keyboard traversal. 0, 1 or an empty string is returned.
If 0 is returned, it means that the window should be skipped
entirely during keyboard traversal. If 1, it means that the
window should receive the input focus as long as it is
viewable. And an empty string means that the traversal
scripts make the decision about whether or not to focus
on the window.
The following options are supported by widgets that are controlled by a
scrollbar.
option
description
xscrollcommand
Used to communicate with horizontal scrollbars.
When the view in the widget’s window changes, the widget
will generate a Tcl command based on the scrollcommand.
Usually this option consists of the method
Scrollbar.set() of some scrollbar. This will cause
the scrollbar to be updated whenever the view in the
window changes.
yscrollcommand
Used to communicate with vertical scrollbars.
For some more information, see above.
The following options are supported by labels, buttons and other button-like
widgets.
option
description
text
Specifies a text string to be displayed inside the widget.
textvariable
Specifies a name whose value will be used in place of the
text option resource.
underline
If set, specifies the index (0-based) of a character to
underline in the text string. The underline character is
used for mnemonic activation.
image
Specifies an image to display. This is a list of 1 or more
elements. The first element is the default image name. The
rest of the list is a sequence of statespec/value pairs as
defined by Style.map(), specifying different images
to use when the widget is in a particular state or a
combination of states. All images in the list should have
the same size.
compound
Specifies how to display the image relative to the text,
in the case both text and images options are present.
Valid values are:
text: display text only
image: display image only
top, bottom, left, right: display image above, below,
left of, or right of the text, respectively.
none: the default. display the image if present,
otherwise the text.
width
If greater than zero, specifies how much space, in
character widths, to allocate for the text label; if less
than zero, specifies a minimum width. If zero or
unspecified, the natural width of the text label is used.
state
May be set to “normal” or “disabled” to control the “disabled”
state bit. This is a write-only option: setting it changes the
widget state, but the Widget.state() method does not
affect this option.
The widget state is a bitmap of independent state flags.
flag
description
active
The mouse cursor is over the widget and pressing a mouse
button will cause some action to occur
disabled
Widget is disabled under program control
focus
Widget has keyboard focus
pressed
Widget is being pressed
selected
“On”, “true”, or “current” for things like Checkbuttons and
radiobuttons
background
Windows and Mac have a notion of an “active” or foreground
window. The background state is set for widgets in a
background window, and cleared for those in the foreground
window
readonly
Widget should not allow user modification
alternate
A widget-specific alternate display format
invalid
The widget’s value is invalid
A state specification is a sequence of state names, optionally prefixed with
an exclamation point indicating that the bit is off.
Test the widget’s state. If a callback is not specified, returns True
if the widget state matches statespec and False otherwise. If callback
is specified then it is called with args if widget state matches
statespec.
Modify or inquire widget state. If statespec is specified, sets the
widget state according to it and return a new statespec indicating
which flags were changed. If statespec is not specified, returns
the currently-enabled state flags.
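A sketch of reading and changing state flags (using a ttk Button as an example):

from tkinter import Tk
from tkinter.ttk import Button

root = Tk()
b = Button(root, text="Press me")
b.pack()
b.state(['disabled'])             # turn the disabled flag on
print(b.instate(['disabled']))    # True
b.state(['!disabled'])            # the '!' prefix turns the flag off again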
The ttk.Combobox widget combines a text field with a pop-down list of
values. This widget is a subclass of Entry.
Besides the methods inherited from Widget: Widget.cget(),
Widget.configure(), Widget.identify(), Widget.instate()
and Widget.state(), and the following inherited from Entry:
Entry.bbox(), Entry.delete(), Entry.icursor(),
Entry.index(), Entry.insert(), Entry.selection(),
Entry.xview(), it has some other methods, described at
ttk.Combobox.
This widget accepts the following specific options:
option
description
exportselection
Boolean value. If set, the widget selection is linked
to the Window Manager selection (which can be returned
by invoking Misc.selection_get, for example).
justify
Specifies how the text is aligned within the widget.
One of “left”, “center”, or “right”.
height
Specifies the height of the pop-down listbox, in rows.
postcommand
A script (possibly registered with Misc.register) that
is called immediately before displaying the values. It
may specify which values to display.
state
One of “normal”, “readonly”, or “disabled”. In the
“readonly” state, the value may not be edited directly,
and the user may only select one of the values from the
dropdown list. In the “normal” state, the text field is
directly editable. In the “disabled” state, no
interaction is possible.
textvariable
Specifies a name whose value is linked to the widget
value. Whenever the value associated with that name
changes, the widget value is updated, and vice versa.
See tkinter.StringVar.
values
Specifies the list of values to display in the
drop-down listbox.
width
Specifies an integer value indicating the desired width
of the entry window, in average-size characters of the
widget’s font.
Combobox.current(newindex=None)
If newindex is specified, sets the combobox value to the element at
position newindex. Otherwise, returns the index of the current value, or
-1 if the current value is not in the values list.
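A short sketch of typical Combobox usage (the values are illustrative):

from tkinter import ttk
import tkinter

root = tkinter.Tk()
combo = ttk.Combobox(root, values=['January', 'February', 'March'],
                     state='readonly')   # editing only via the dropdown
combo.pack()
combo.current(0)          # select the first value
print(combo.current())    # 0
print(combo.get())        # 'January'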
The ttk.Notebook widget manages a collection of windows and displays a single
one at a time. Each child window is associated with a tab, which the user
may select to change the currently-displayed window.
This widget accepts the following specific options:
option
description
height
If present and greater than zero, specifies the desired height
of the pane area (not including internal padding or tabs).
Otherwise, the maximum height of all panes is used.
padding
Specifies the amount of extra space to add around the outside
of the notebook. The padding is a list up to four length
specifications left top right bottom. If fewer than four
elements are specified, bottom defaults to top, right defaults
to left, and top defaults to left.
width
If present and greater than zero, specifies the desired width
of the pane area (not including internal padding). Otherwise,
the maximum width of all panes is used.
This widget also supports the following specific tab options:
option
description
state
Either “normal”, “disabled” or “hidden”. If “disabled”, then
the tab is not selectable. If “hidden”, then the tab is not
shown.
sticky
Specifies how the child window is positioned within the pane
area. Value is a string containing zero or more of the
characters “n”, “s”, “e” or “w”. Each letter refers to a
side (north, south, east or west) that the child window will
stick to, as per the grid() geometry manager.
padding
Specifies the amount of extra space to add between the
notebook and this pane. Syntax is the same as for the option
padding used by this widget.
text
Specifies a text to be displayed in the tab.
image
Specifies an image to display in the tab. See the option
image described in Widget.
compound
Specifies how to display the image relative to the text, in
the case both options text and image are present. See
Label Options for legal values.
underline
Specifies the index (0-based) of a character to underline in
the text string. The underlined character is used for
mnemonic activation if Notebook.enable_traversal() is
called.
The tab will not be displayed, but the associated window remains
managed by the notebook and its configuration remembered. Hidden tabs
may be restored with the add() command.
pos is either the string “end”, an integer index, or the name of a
managed child. If child is already managed by the notebook, moves it to
the specified position.
See Tab Options for the list of available options.
The associated child window will be displayed, and the
previously-selected window (if different) is unmapped. If tab_id is
omitted, returns the widget name of the currently selected pane.
Query or modify the options of the specific tab_id.
If kw is not given, returns a dictionary of the tab option values. If
option is specified, returns the value of that option. Otherwise,
sets the options to the corresponding values.
Enable keyboard traversal for a toplevel window containing this notebook.
This will extend the bindings for the toplevel window containing the
notebook as follows:
Control-Tab: selects the tab following the currently selected one.
Shift-Control-Tab: selects the tab preceding the currently selected one.
Alt-K: where K is the mnemonic (underlined) character of any tab, will
select that tab.
Multiple notebooks in a single toplevel may be enabled for traversal,
including nested notebooks. However, notebook traversal only works
properly if all panes have the notebook they are in as master.
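A minimal sketch of a notebook with traversal enabled (tab names are
illustrative):

from tkinter import ttk
import tkinter

root = tkinter.Tk()
notebook = ttk.Notebook(root)
frame1 = ttk.Frame(notebook)
frame2 = ttk.Frame(notebook)
notebook.add(frame1, text='General', underline=0)   # Alt-G selects this tab
notebook.add(frame2, text='Details', underline=0)   # Alt-D selects this tab
notebook.pack(fill='both', expand=True)
notebook.enable_traversal()   # Control-Tab / Shift-Control-Tab also work
root.mainloop()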
The ttk.Progressbar widget shows the status of a long-running
operation. It can operate in two modes: 1) the determinate mode which shows the
amount completed relative to the total amount of work to be done and 2) the
indeterminate mode which provides an animated display to let the user know that
work is progressing.
This widget accepts the following specific options:
option
description
orient
One of “horizontal” or “vertical”. Specifies the orientation
of the progress bar.
length
Specifies the length of the long axis of the progress bar
(width if horizontal, height if vertical).
mode
One of “determinate” or “indeterminate”.
maximum
A number specifying the maximum value. Defaults to 100.
value
The current value of the progress bar. In “determinate” mode,
this represents the amount of work completed. In
“indeterminate” mode, it is interpreted as modulo maximum;
that is, the progress bar completes one “cycle” when its value
increases by maximum.
variable
A name which is linked to the option value. If specified, the
value of the progress bar is automatically set to the value of
this name whenever the latter is modified.
phase
Read-only option. The widget periodically increments the value
of this option whenever its value is greater than 0 and, in
determinate mode, less than maximum. This option may be used
by the current theme to provide additional animation effects.
Progressbar.start(interval=None)
Begin autoincrement mode: schedules a recurring timer event that calls
Progressbar.step() every interval milliseconds. If omitted,
interval defaults to 50 milliseconds.
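For example, a minimal sketch of an indeterminate progress bar:

from tkinter import ttk
import tkinter

root = tkinter.Tk()
bar = ttk.Progressbar(root, orient='horizontal', length=200,
                      mode='indeterminate')
bar.pack()
bar.start(25)                 # autoincrement every 25 milliseconds
root.after(3000, bar.stop)    # stop the animation after three seconds
root.mainloop()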
On MacOS X, toplevel windows automatically include a built-in size grip
by default. Adding a Sizegrip is harmless, since the built-in
grip will just mask the widget.
If the containing toplevel’s position was specified relative to the right
or bottom of the screen (e.g. ....), the Sizegrip widget will
not resize the window.
The ttk.Treeview widget displays a hierarchical collection of items.
Each item has a textual label, an optional image, and an optional list of data
values. The data values are displayed in successive columns after the tree
label.
The order in which data values are displayed may be controlled by setting
the widget option displaycolumns. The tree widget can also display column
headings. Columns may be accessed by number or symbolic names listed in the
widget option columns. See Column Identifiers.
Each item is identified by a unique name. The widget will generate item IDs
if they are not supplied by the caller. There is a distinguished root item,
named {}. The root item itself is not displayed; its children appear at the
top level of the hierarchy.
Each item also has a list of tags, which can be used to associate event bindings
with individual items and control the appearance of the item.
This widget accepts the following specific options:
option
description
columns
A list of column identifiers, specifying the number of
columns and their names.
displaycolumns
A list of column identifiers (either symbolic or
integer indices) specifying which data columns are
displayed and the order in which they appear, or the
string “#all”.
height
Specifies the number of rows which should be visible.
Note: the requested width is determined from the sum
of the column widths.
padding
Specifies the internal padding for the widget. The
padding is a list of up to four length specifications.
selectmode
Controls how the built-in class bindings manage the
selection. One of “extended”, “browse” or “none”.
If set to “extended” (the default), multiple items may
be selected. If “browse”, only a single item will be
selected at a time. If “none”, the selection will not
be changed.
Note that the application code and tag bindings can set
the selection however they wish, regardless of the
value of this option.
show
A list containing zero or more of the following values,
specifying which elements of the tree to display.
tree: display tree labels in column #0.
headings: display the heading row.
The default is “tree headings”, i.e., show all
elements.
Note: Column #0 always refers to the tree column,
even if show=”tree” is not specified.
The following item options may be specified for items in the insert and item
widget commands.
option
description
text
The textual label to display for the item.
image
A Tk Image, displayed to the left of the label.
values
The list of values associated with the item.
Each item should have the same number of values as the widget
option columns. If there are fewer values than columns, the
remaining values are assumed empty. If there are more values
than columns, the extra values are ignored.
open
True/False value indicating whether the item’s children should
be displayed or hidden.
Column identifiers take any of the following forms:
A symbolic name from the list of columns option.
An integer n, specifying the nth data column.
A string of the form #n, where n is an integer, specifying the nth display
column.
Notes:
Item’s option values may be displayed in a different order than the order
in which they are stored.
Column #0 always refers to the tree column, even if show=”tree” is not
specified.
A data column number is an index into an item’s option values list; a display
column number is the column number in the tree where the values are displayed.
Tree labels are displayed in column #0. If option displaycolumns is not set,
then data column n is displayed in column #n+1. Again, column #0 always
refers to the tree column.
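A short sketch tying these pieces together (column names and contents are
illustrative):

from tkinter import ttk
import tkinter

root = tkinter.Tk()
tree = ttk.Treeview(root, columns=('size', 'modified'))
tree.heading('#0', text='Name')            # '#0' is always the tree column
tree.heading('size', text='Size')          # data column by symbolic name
tree.heading('modified', text='Modified')
tree.column('size', width=80, anchor='e')

folder = tree.insert('', 'end', text='docs', open=True)
tree.insert(folder, 'end', text='intro.txt', values=('1 KB', 'today'))
tree.pack(fill='both', expand=True)
root.mainloop()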
Returns the bounding box (relative to the treeview widget’s window) of
the specified item in the form (x, y, width, height).
If column is specified, returns the bounding box of that cell. If the
item is not visible (i.e., if it is a descendant of a closed item or is
scrolled offscreen), returns an empty string.
Children present in item that are not present in newchildren are
detached from the tree. No items in newchildren may be an ancestor of
item. Note that not specifying newchildren results in detaching
item’s children.
Query or modify the options for the specified column.
If kw is not given, returns a dict of the column option values. If
option is specified then the value for that option is returned.
Otherwise, sets the options to the corresponding values.
The valid options/values are:
id
Returns the column name. This is a read-only option.
anchor: One of the standard Tk anchor values.
Specifies how the text in this column should be aligned with respect
to the cell.
minwidth: width
The minimum width of the column in pixels. The treeview widget will
not make the column any smaller than specified by this option when
the widget is resized or the user drags a column.
stretch: True/False
Specifies whether the column’s width should be adjusted when
the widget is resized.
width: width
The width of the column in pixels.
To configure the tree column, call this with column = “#0”.
Query or modify the heading options for the specified column.
If kw is not given, returns a dict of the heading option values. If
option is specified then the value for that option is returned.
Otherwise, sets the options to the corresponding values.
The valid options/values are:
text: text
The text to display in the column heading.
image: imageName
Specifies an image to display to the right of the column heading.
anchor: anchor
Specifies how the heading text should be aligned. One of the standard
Tk anchor values.
command: callback
A callback to be invoked when the heading label is pressed.
To configure the tree column heading, call this with column = “#0”.
Returns a description of the specified component under the point given
by x and y, or the empty string if no such component is present at
that position.
Creates a new item and returns the item identifier of the newly created
item.
parent is the item ID of the parent item, or the empty string to create
a new top-level item. index is an integer, or the value “end”,
specifying where in the list of parent’s children to insert the new item.
If index is less than or equal to zero, the new node is inserted at
the beginning; if index is greater than or equal to the current number
of children, it is inserted at the end. If iid is specified, it is used
as the item identifier; iid must not already exist in the tree.
Otherwise, a new unique identifier is generated.
See Item Options for the list of available options.
Query or modify the options for the specified item.
If no options are given, a dict with options/values for the item is
returned.
If option is specified then the value for that option is returned.
Otherwise, sets the options to the corresponding values as given by kw.
Moves item to position index in parent’s list of children.
It is illegal to move an item under one of its descendants. If index is
less than or equal to zero, item is moved to the beginning; if greater
than or equal to the number of children, it is moved to the end. If item
was detached it is reattached.
With one argument, returns a dictionary of column/value pairs for the
specified item. With two arguments, returns the current value of the
specified column. With three arguments, sets the value of given
column in given item to the specified value.
Bind a callback for the given event sequence to the tag tagname.
When an event is delivered to an item, the callbacks for each of the
item’s tags option are called.
Query or modify the options for the specified tagname.
If kw is not given, returns a dict of the option settings for
tagname. If option is specified, returns the value for that option
for the specified tagname. Otherwise, sets the options to the
corresponding values for the given tagname.
If item is specified, returns 1 or 0 depending on whether the specified
item has the given tagname. Otherwise, returns a list of all items
that have the specified tag.
Each widget in ttk is assigned a style, which specifies the set of
elements making up the widget and how they are arranged, along with dynamic
and default settings for element options. By default the style name is the
same as the widget’s class name, but it may be overridden by the widget’s style
option. If you don’t know the class name of a widget, use the method
Misc.winfo_class() (somewidget.winfo_class()).
Query or sets dynamic values of the specified option(s) in style.
Each key in kw is an option and each value should be a list or a
tuple (usually) containing statespecs grouped in tuples, lists, or
some other preference. A statespec is a compound of one
or more states and then a value.
Note that the order of the (states, value) sequences for an option does
matter: if the order were changed to [('active', 'blue'), ('pressed', 'red')]
for the foreground option, for example, the result would be a blue
foreground when the widget is in the active or pressed states.
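For instance, a minimal sketch (the style name “C.TButton” is illustrative):

from tkinter import ttk
import tkinter

root = tkinter.Tk()
style = ttk.Style()
style.map("C.TButton",
          foreground=[('pressed', 'red'), ('active', 'blue')],
          background=[('pressed', '!disabled', 'black'),
                      ('active', 'white')])
ttk.Button(root, text="Sample", style="C.TButton").pack()
root.mainloop()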
If state is specified, it is expected to be a sequence of one or more
states. If the default argument is set, it is used as a fallback value
in case no specification for option is found.
Define the widget layout for given style. If layoutspec is omitted,
return the layout specification for given style.
layoutspec, if specified, is expected to be a list or some other
sequence type (excluding strings), where each item should be a tuple and
the first item is the layout name and the second item should have the
format described in Layouts.
To understand the format, see the following example (it is not
intended to do anything useful):
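A sketch along those lines, assuming the standard Menubutton element names:

from tkinter import ttk
import tkinter

root = tkinter.Tk()

style = ttk.Style()
style.layout("TMenubutton", [
   ("Menubutton.background", None),
   ("Menubutton.button", {"children":
       [("Menubutton.focus", {"children":
           [("Menubutton.padding", {"children":
               [("Menubutton.label", {"side": "left", "expand": 1})]
           })]
       })]
   }),
])

mbtn = ttk.Menubutton(text='Text')
mbtn.pack()
root.mainloop()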
Create a new element in the current theme, of the given etype which is
expected to be either “image”, “from” or “vsapi”. The latter is only
available in Tk 8.6a for Windows XP and Vista and is not described here.
If “image” is used, args should contain the default image name followed
by statespec/value pairs (this is the imagespec), and kw may have the
following options:
border=padding
padding is a list of up to four integers, specifying the left, top,
right, and bottom borders, respectively.
height=height
Specifies a minimum height for the element. If less than zero, the
base image’s height is used as a default.
padding=padding
Specifies the element’s interior padding. Defaults to border’s value
if not specified.
sticky=spec
Specifies how the image is placed within the final parcel. spec
contains zero or more characters “n”, “s”, “w”, or “e”.
width=width
Specifies a minimum width for the element. If less than zero, the
base image’s width is used as a default.
If “from” is used as the value of etype,
element_create() will clone an existing
element. args is expected to contain a themename, from which
the element will be cloned, and optionally an element to clone from.
If this element to clone from is not specified, an empty element will
be used. kw is discarded.
Create a new theme. It is an error if themename already exists. If
parent is specified, the new theme will inherit styles, elements and
layouts from the parent theme. If settings are present, they are expected
to have the same syntax used for theme_settings().
Temporarily sets the current theme to themename, applies the specified
settings, and then restores the previous theme.
Each key in settings is a style and each value may contain the keys
‘configure’, ‘map’, ‘layout’ and ‘element create’ and they are expected
to have the same format as specified by the methods
Style.configure(), Style.map(), Style.layout() and
Style.element_create() respectively.
As an example, let’s change the Combobox for the default theme a bit:
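A sketch of what that might look like (the padding and colors are
illustrative):

from tkinter import ttk
import tkinter

root = tkinter.Tk()

style = ttk.Style()
style.theme_settings("default", {
   "TCombobox": {
       "configure": {"padding": 5},
       "map": {
           "background": [("active", "green2"),
                          ("!disabled", "green4")],
           "fieldbackground": [("!disabled", "green3")],
           "foreground": [("focus", "OliveDrab1"),
                          ("!disabled", "OliveDrab2")]
       }
   }
})

combo = ttk.Combobox()
combo.pack()
root.mainloop()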
If themename is not given, returns the theme in use. Otherwise, sets
the current theme to themename, refreshes all widgets and emits a
<<ThemeChanged>> event.
A layout can be just None, if it takes no options, or a dict of
options specifying how to arrange the element. The layout mechanism
uses a simplified version of the pack geometry manager: given an
initial cavity, each element is allocated a parcel. Valid
options/values are:
side: whichside
Specifies which side of the cavity to place the element; one of
top, right, bottom or left. If omitted, the element occupies the
entire cavity.
sticky: nswe
Specifies where the element is placed inside its allocated parcel.
unit: 0 or 1
If set to 1, causes the element and all of its descendants to be treated as
a single element for the purposes of Widget.identify() et al. It’s
used for things like scrollbar thumbs with grips.
children: [sublayout... ]
Specifies a list of elements to place inside the element. Each
element is a tuple (or other sequence type) where the first item is
the layout name, and the other is a Layout.
The tkinter.tix (Tk Interface Extension) module provides an additional
rich set of widgets. Although the standard Tk library has many useful widgets,
they are far from complete. The tkinter.tix library provides most of the
commonly needed widgets that are missing from standard Tk: HList,
ComboBox, Control (a.k.a. SpinBox) and an assortment of
scrollable widgets.
tkinter.tix also includes many more widgets that are generally useful in
a wide range of applications: NoteBook, FileEntry,
PanedWindow, etc; there are more than 40 of them.
With all these new widgets, you can introduce new interaction techniques into
applications, creating more useful and more intuitive user interfaces. You can
design your application by choosing the most appropriate widgets to match the
special needs of your application and users.
Tide provides applications for the development of Tix and Tkinter programs.
Tide applications work under Tk or Tkinter, and include TixInspect, an
inspector to remotely modify and debug Tix/Tk/Tkinter applications.
class tkinter.tix.Tk(screenName=None, baseName=None, className='Tix')
Toplevel widget of Tix which represents mostly the main window of an
application. It has an associated Tcl interpreter.
The classes in the tkinter.tix module subclass the classes in
tkinter. The former imports the latter, so to use tkinter.tix
with Tkinter, all you need to do is to import one module. In general, you
can just import tkinter.tix, and replace the toplevel call to
tkinter.Tk with tix.Tk:
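A minimal sketch:

from tkinter import tix
from tkinter.constants import *

root = tix.Tk()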
To use tkinter.tix, you must have the Tix widgets installed, usually
alongside your installation of the Tk widgets. To test your installation, try
the following:
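One quick check (it raises a TclError if the Tix package cannot be loaded):

import tkinter.tix

root = tkinter.tix.Tk()
root.tk.eval('package require Tix')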
If this fails, you have a Tk installation problem which must be resolved before
proceeding. Use the environment variable TIX_LIBRARY to point to the
installed Tix library directory, and make sure you have the dynamic
object library (tix8183.dll or libtix8183.so) in the same
directory that contains your Tk dynamic object library (tk8183.dll or
libtk8183.so). The directory with the dynamic object library should also
have a file called pkgIndex.tcl (case sensitive), which contains the
line:
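For the library names above, that line would presumably look something like
(the exact version and file name depend on your installation):

package ifneeded Tix 8.1 [list load "[file join $dir tix8183.dll]" Tix]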
A Balloon that
pops up over a widget to provide help. When the user moves the cursor inside a
widget to which a Balloon widget has been bound, a small pop-up window with a
descriptive message will be shown on the screen.
The ComboBox
widget is similar to the combo box control in MS Windows. The user can select a
choice by either typing in the entry subwidget or selecting from the listbox
subwidget.
The Control
widget is also known as the SpinBox widget. The user can adjust the
value by pressing the two arrow buttons or by entering the value directly into
the entry. The new value will be checked against the user-defined upper and
lower limits.
The LabelEntry
widget packages an entry widget and a label into one mega widget. It can be
used to simplify the creation of “entry-form” type interfaces.
The LabelFrame
widget packages a frame widget and a label into one mega widget. To create
widgets inside a LabelFrame widget, one creates the new widgets relative to the
frame subwidget and manages them inside the frame subwidget.
The PopupMenu
widget can be used as a replacement for the tk_popup command. The advantage
of the TixPopupMenu widget is that it requires less application code
to manipulate.
The DirList
widget displays a list view of a directory, its previous directories and its
sub-directories. The user can choose one of the directories displayed in the
list or change to another directory.
The DirTree
widget displays a tree view of a directory, its previous directories and its
sub-directories. The user can choose one of the directories displayed in the
list or change to another directory.
The DirSelectDialog
widget presents the directories in the file system in a dialog window. The user
can use this dialog window to navigate through the file system to select the
desired directory.
The DirSelectBox is similar to the standard Motif(TM)
directory-selection box. It is generally used for the user to choose a
directory. DirSelectBox stores the directories most recently selected into
a ComboBox widget so that they can be quickly selected again.
The ExFileSelectBox
widget is usually embedded in a tixExFileSelectDialog widget. It provides a
convenient method for the user to select files. The style of the
ExFileSelectBox widget is very similar to the standard file dialog on
MS Windows 3.1.
The FileSelectBox
is similar to the standard Motif(TM) file-selection box. It is generally used
for the user to choose a file. FileSelectBox stores the files most recently
selected into a ComboBox widget so that they can be quickly selected
again.
The FileEntry
widget can be used to input a filename. The user can type in the filename
manually. Alternatively, the user can press the button widget that sits next to
the entry, which will bring up a file selection dialog.
The HList widget
can be used to display any data that have a hierarchical structure, for example,
file system directory trees. The list entries are indented and connected by
branch lines according to their places in the hierarchy.
The CheckList
widget displays a list of items to be selected by the user. CheckList acts
similarly to the Tk checkbutton or radiobutton widgets, except it is capable of
handling many more items than checkbuttons or radiobuttons.
The Tree widget
can be used to display hierarchical data in a tree form. The user can adjust the
view of the tree by opening or closing parts of the tree.
The TList widget
can be used to display data in a tabular format. The list entries of a
TList widget are similar to the entries in the Tk listbox widget. The
main differences are (1) the TList widget can display the list entries
in a two dimensional format and (2) you can use graphical images as well as
multiple colors and fonts for the list entries.
The PanedWindow
widget allows the user to interactively manipulate the sizes of several panes.
The panes can be arranged either vertically or horizontally. The user changes
the sizes of the panes by dragging the resize handle between two panes.
The ListNoteBook
widget is very similar to the TixNoteBook widget: it can be used to
display many windows in a limited space using a notebook metaphor. The notebook
is divided into a stack of pages (windows). At one time only one of these pages
can be shown. The user can navigate through these pages by choosing the name of
the desired page in the hlist subwidget.
The NoteBook
widget can be used to display many windows in a limited space using a notebook
metaphor. The notebook is divided into a stack of pages. At one time only one of
these pages can be shown. The user can navigate through these pages by choosing
the visual “tabs” at the top of the NoteBook widget.
The pixmap image type provides the
capability for all tkinter.tix and tkinter widgets to create
color images from XPM files.
Compound image
types can be used to create images that consist of multiple horizontal lines;
each line is composed of a series of items (texts, bitmaps, images or spaces)
arranged from left to right. For example, a compound image can be used to
display a bitmap and a text string simultaneously in a Tk Button
widget.
The tix commands provide
access to miscellaneous elements of Tix’s internal state and the
Tix application context. Most of the information manipulated by these
methods pertains to the application as a whole, or to a screen or display,
rather than to a particular window.
To view the current settings, the common usage is:
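For example, a minimal sketch:

import tkinter.tix

root = tkinter.tix.Tk()
print(root.tix_configure())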
Query or modify the configuration options of the Tix application context. If no
option is specified, returns a dictionary of all the available options. If
option is specified with no value, then the method returns a list describing the
one named option (this list will be identical to the corresponding sublist of
the value returned if no option is specified). If one or more option-value
pairs are specified, then the method modifies the given option(s) to have the
given value(s); in this case the method returns an empty string. Option may be
any of the configuration options.
Locates a bitmap file of the name name.xpm or name in one of the bitmap
directories (see the tix_addbitmapdir() method). By using
tix_getbitmap(), you can avoid hard coding the pathnames of the bitmap
files in your application. When successful, it returns the complete pathname of
the bitmap file, prefixed with the character @. The returned value can be
used to configure the bitmap option of the Tk and Tix widgets.
Tix maintains a list of directories under which the tix_getimage() and
tix_getbitmap() methods will search for image files. The standard bitmap
directory is $TIX_LIBRARY/bitmaps. The tix_addbitmapdir() method
adds directory into this list. By using this method, the image files of an
application can also be located using the tix_getimage() or
tix_getbitmap() method.
Returns the file selection dialog that may be shared among different calls from
this application. This method will create a file selection dialog widget when
it is called the first time. This dialog will be returned by all subsequent
calls to tix_filedialog(). An optional dlgclass parameter can be passed
as a string to specify what type of file selection dialog widget is desired.
Possible options are tix, FileSelectDialog or tixExFileSelectDialog.
Locates an image file of the name name.xpm, name.xbm or
name.ppm in one of the bitmap directories (see the
tix_addbitmapdir() method above). If more than one file with the same name
(but different extensions) exist, then the image type is chosen according to the
depth of the X display: xbm images are chosen on monochrome displays and color
images are chosen on color displays. By using tix_getimage(), you can
avoid hard coding the pathnames of the image files in your application. When
successful, this method returns the name of the newly created image, which can
be used to configure the image option of the Tk and Tix widgets.
Resets the scheme and fontset of the Tix application to newScheme and
newFontSet, respectively. This affects only those widgets created after this
call. Therefore, it is best to call the resetoptions method before the creation
of any widgets in a Tix application.
The optional parameter newScmPrio can be given to reset the priority level of
the Tk options set by the Tix schemes.
Because of the way Tk handles the X option database, after Tix has been
imported and initialized, it is not possible to reset the color schemes and
font sets
using the tix_config() method. Instead, the tix_resetoptions()
method must be used.
The tkinter.scrolledtext module provides a class of the same name which
implements a basic text widget which has a vertical scroll bar configured to do
the “right thing.” Using the ScrolledText class is a lot easier than
setting up a text widget and scroll bar directly. The constructor is the same
as that of the tkinter.Text class.
The text widget and scrollbar are packed together in a Frame, and the
methods of the Grid and Pack geometry managers are acquired
from the Frame object. This allows the ScrolledText widget to
be used directly to achieve most normal geometry management behavior.
Should more specific control be necessary, the following attributes are
available:
frame
The frame which surrounds the text and scroll bar widgets.
vbar
The scroll bar widget.
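A minimal sketch of typical usage:

import tkinter
from tkinter.scrolledtext import ScrolledText

root = tkinter.Tk()
text = ScrolledText(root, width=40, height=10)
text.pack(fill='both', expand=True)      # geometry methods come from the Frame
text.insert('end', 'Hello, world!\n')    # Text methods work as usual
root.mainloop()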
After a block-opening statement, the next line is indented by 4 spaces (in the
Python Shell window by one tab). After certain keywords (break, return etc.)
the next line is dedented. In leading indentation, Backspace deletes up
to 4 spaces if they are there. Tab inserts 1-4 spaces (in the Python
Shell window one tab). See also the indent/dedent region commands in the edit
menu.
The coloring is applied in a background “thread,” so you may occasionally see
uncolorized text. To change the color scheme, edit the [Colors] section in
config.txt.
Upon startup with the -s option, IDLE will execute the file referenced by
the environment variables IDLESTARTUP or PYTHONSTARTUP.
Idle first checks for IDLESTARTUP; if IDLESTARTUP is present the file
referenced is run. If IDLESTARTUP is not present, Idle checks for
PYTHONSTARTUP. Files referenced by these environment variables are
convenient places to store functions that are used frequently from the Idle
shell, or for executing import statements to import common modules.
In addition, Tk also loads a startup file if it is present. Note that the
Tk file is loaded unconditionally. This additional file is .Idle.py and is
looked for in the user’s home directory. Statements in this file will be
executed in the Tk namespace, so this file is not useful for importing functions
to be used from Idle’s Python shell.
idle.py [-c command] [-d] [-e] [-s] [-t title] [arg] ...
-c command run this command
-d enable debugger
-e edit mode; arguments are files to be edited
-s run $IDLESTARTUP or $PYTHONSTARTUP first
-t title set title of shell window
If there are arguments:
If -e is used, arguments are files opened for editing and
sys.argv reflects the arguments passed to IDLE itself.
Otherwise, if -c is used, all arguments are placed in
sys.argv[1:...], with sys.argv[0] set to '-c'.
Otherwise, if neither -e nor -c is used, the first
argument is a script which is executed with the remaining arguments in
sys.argv[1:...] and sys.argv[0] set to the script name. If the script
name is ‘-’, no script is executed but an interactive Python session is started;
the arguments are still available in sys.argv.
Pmw (Python megawidgets) is a toolkit for building high-level compound widgets
in Python using the tkinter package. It consists of a set of base classes and a library of
flexible and extensible megawidgets built on this foundation. These megawidgets
include notebooks, comboboxes, selection widgets, paned widgets, scrolled
widgets, dialog windows, etc. Also, with the Pmw.Blt interface to BLT, the
busy, graph, stripchart, tabset and vector commands become available.
The initial ideas for Pmw were taken from the Tk itcl extensions [incrTk] by Michael McLennan and [incrWidgets] by Mark Ulferts. Several of the
megawidgets are direct translations from the itcl to Python. It offers most of
the range of widgets that [incrWidgets] does, and is almost as complete as
Tix, lacking however Tix’s fast HList widget for drawing trees.
The Widget Construction Kit (WCK) is a library that allows you to write new
Tkinter widgets in pure Python. The
WCK framework gives you full control over widget creation, configuration, screen
appearance, and event handling. WCK widgets can be very fast and light-weight,
since they can operate directly on Python data structures, without having to
transfer data through the Tk/Tcl layer.
The major cross-platform (Windows, Mac OS X, Unix-like) GUI toolkits that are
also available for Python:
PyGTK is a set of bindings for the GTK widget set. It
provides an object oriented interface that is slightly higher level than
the C one. It comes with many more widgets than Tkinter provides, and has
good Python-specific reference documentation. There are also bindings to
GNOME. One well known PyGTK application is
PythonCAD. An online tutorial is available.
PyQt is a sip-wrapped binding to the Qt toolkit. Qt is an
extensive C++ GUI application development framework that is
available for Unix, Windows and Mac OS X. sip is a tool
for generating bindings for C++ libraries as Python classes, and
is specifically designed for Python. The PyQt3 bindings have a
book, GUI Programming with Python: QT Edition by Boudewijn
Rempt. The PyQt4 bindings also have a book, Rapid GUI Programming
with Python and Qt, by Mark
Summerfield.
wxPython is a cross-platform GUI toolkit for Python that is built around
the popular wxWidgets (formerly wxWindows)
C++ toolkit. It provides a native look and feel for applications on
Windows, Mac OS X, and Unix systems by using each platform’s native
widgets wherever possible (GTK+ on Unix-like systems). In addition to
an extensive set of widgets, wxPython provides classes for online
documentation and context sensitive help, printing, HTML viewing,
low-level device context drawing, drag and drop, system clipboard access,
an XML-based resource format and more, including an ever growing library
of user-contributed modules. wxPython has a book, wxPython in Action, by Noel Rappin and
Robin Dunn.
PyGTK, PyQt, and wxPython all have a modern look and feel and more
widgets than Tkinter. In addition, there are many other GUI toolkits for
Python, both cross-platform and platform-specific. See the GUI Programming page in the Python Wiki for a
much more complete list, and also for links to documents where the
different GUI toolkits are compared.
The modules described in this chapter help you write software. For example, the
pydoc module takes a module and generates documentation based on the
module’s contents. The doctest and unittest modules contain
frameworks for writing unit tests that automatically exercise code and verify
that the expected output is produced. 2to3 can translate Python 2.x
source code into valid Python 3.x code.
The list of modules described in this chapter is:
pydoc — Documentation generator and online help system
The pydoc module automatically generates documentation from Python
modules. The documentation can be presented as pages of text on the console,
served to a Web browser, or saved to HTML files.
The built-in function help() invokes the online help system in the
interactive interpreter, which uses pydoc to generate its documentation
as text on the console. The same text documentation can also be viewed from
outside the Python interpreter by running pydoc as a script at the
operating system’s command prompt. For example, running
pydoc sys
at a shell prompt will display documentation on the sys module, in a
style similar to the manual pages shown by the Unix man command. The
argument to pydoc can be the name of a function, module, or package,
or a dotted reference to a class, method, or function within a module or module
in a package. If the argument to pydoc looks like a path (that is,
it contains the path separator for your operating system, such as a slash in
Unix), and refers to an existing Python source file, then documentation is
produced for that file.
Note
In order to find objects and their documentation, pydoc imports the
module(s) to be documented. Therefore, any code on module level will be
executed on that occasion. Use an if __name__ == '__main__': guard to
only execute code when a file is invoked as a script and not just imported.
Specifying a -w flag before the argument will cause HTML documentation
to be written out to a file in the current directory, instead of displaying text
on the console.
Specifying a -k flag before the argument will search the synopsis
lines of all available modules for the keyword given as the argument, again in a
manner similar to the Unix man command. The synopsis line of a
module is the first line of its documentation string.
You can also use pydoc to start an HTTP server on the local machine
that will serve documentation to visiting Web browsers. pydoc -p 1234
will start an HTTP server on port 1234, allowing you to browse the
documentation at http://localhost:1234/ in your preferred Web browser.
Specifying 0 as the port number will select an arbitrary unused port.
pydoc -g will start the server and additionally bring up a
small tkinter-based graphical interface to help you search for
documentation pages. The -g option is deprecated, since the server can
now be controlled directly from HTTP clients.
pydoc -b will start the server and additionally open a web
browser to a module index page. Each served page has a navigation bar at the
top where you can Get help on an individual item, Search all modules with a
keyword in their synopsis line, and go to the Module index, Topics and
Keywords pages.
When pydoc generates documentation, it uses the current environment
and path to locate modules. Thus, invoking pydoc spam
documents precisely the version of the module you would get if you started the
Python interpreter and typed import spam.
Module docs for core modules are assumed to reside in
http://docs.python.org/X.Y/library/ where X and Y are the
major and minor version numbers of the Python interpreter. This can
be overridden by setting the PYTHONDOCS environment variable
to a different URL or to a local directory containing the Library
Reference Manual pages.
Changed in version 3.2: Added the -b option, deprecated the -g option.
The doctest module searches for pieces of text that look like interactive
Python sessions, and then executes those sessions to verify that they work
exactly as shown. There are several common ways to use doctest:
To check that a module’s docstrings are up-to-date by verifying that all
interactive examples still work as documented.
To perform regression testing by verifying that interactive examples from a
test file or a test object work as expected.
To write tutorial documentation for a package, liberally illustrated with
input-output examples. Depending on whether the examples or the expository text
are emphasized, this has the flavor of “literate testing” or “executable
documentation”.
Here’s a complete but small example module:
"""This is the "example" module.The example module supplies one function, factorial(). For example,>>> factorial(5)120"""deffactorial(n):"""Return the factorial of n, an exact integer >= 0. >>> [factorial(n) for n in range(6)] [1, 1, 2, 6, 24, 120] >>> factorial(30) 265252859812191058636308480000000 >>> factorial(-1) Traceback (most recent call last): ... ValueError: n must be >= 0 Factorials of floats are OK, but the float must be an exact integer: >>> factorial(30.1) Traceback (most recent call last): ... ValueError: n must be exact integer >>> factorial(30.0) 265252859812191058636308480000000 It must also not be ridiculously large: >>> factorial(1e100) Traceback (most recent call last): ... OverflowError: n too large """importmathifnotn>=0:raiseValueError("n must be >= 0")ifmath.floor(n)!=n:raiseValueError("n must be exact integer")ifn+1==n:# catch a value like 1e300raiseOverflowError("n too large")result=1factor=2whilefactor<=n:result*=factorfactor+=1returnresultif__name__=="__main__":importdoctestdoctest.testmod()
If you run example.py directly from the command line, doctest
works its magic:
$ python example.py
$
There’s no output! That’s normal, and it means all the examples worked. Pass
-v to the script, and doctest prints a detailed log of what
it’s trying, and prints a summary at the end:
$ python example.py -v
Trying:
factorial(5)
Expecting:
120
ok
Trying:
[factorial(n) for n in range(6)]
Expecting:
[1, 1, 2, 6, 24, 120]
ok
And so on, eventually ending with:
Trying:
factorial(1e100)
Expecting:
Traceback (most recent call last):
...
OverflowError: n too large
ok
2 items passed all tests:
1 tests in __main__
8 tests in __main__.factorial
9 tests in 2 items.
9 passed and 0 failed.
Test passed.
$
That’s all you need to know to start making productive use of doctest!
Jump in. The following sections provide full details. Note that there are many
examples of doctests in the standard Python test suite and libraries.
Especially useful examples can be found in the standard test file
Lib/test/test_doctest.py.
Running the module as a script causes the examples in the docstrings to get
executed and verified:
python M.py
This won’t display anything unless an example fails, in which case the failing
example(s) and the cause(s) of the failure(s) are printed to stdout, and the
final line of output is ***Test Failed*** N failures., where N is the
number of examples that failed.
Run it with the -v switch instead:
python M.py -v
and a detailed report of all examples tried is printed to standard output, along
with assorted summaries at the end.
You can force verbose mode by passing verbose=True to testmod(), or
prohibit it by passing verbose=False. In either of those cases,
sys.argv is not examined by testmod() (so passing -v or not
has no effect).
There is also a command line shortcut for running testmod(). You can
instruct the Python interpreter to run the doctest module directly from the
standard library and pass the module name(s) on the command line:
python -m doctest -v example.py
This will import example.py as a standalone module and run
testmod() on it. Note that this may not work correctly if the file is
part of a package and imports other submodules from that package.
Another simple application of doctest is testing interactive examples in a text
file. This can be done with the testfile() function:
importdoctestdoctest.testfile("example.txt")
That short script executes and verifies any interactive Python examples
contained in the file example.txt. The file content is treated as if it
were a single giant docstring; the file doesn’t need to contain a Python
program! For example, perhaps example.txt contains this:
The ``example`` module
======================
Using ``factorial``
-------------------
This is an example text file in reStructuredText format. First import
``factorial`` from the ``example`` module:
>>> from example import factorial
Now use it:
>>> factorial(6)
120
Running doctest.testfile("example.txt") then finds the error in this
documentation:
File "./example.txt", line 14, in example.txt
Failed example:
factorial(6)
Expected:
120
Got:
720
As with testmod(), testfile() won’t display anything unless an
example fails. If an example does fail, then the failing example(s) and the
cause(s) of the failure(s) are printed to stdout, using the same format as
testmod().
By default, testfile() looks for files in the calling module’s directory.
See section Basic API for a description of the optional arguments
that can be used to tell it to look for files in other locations.
Like testmod(), testfile()’s verbosity can be set with the
-v command-line switch or with the optional keyword argument
verbose.
There is also a command line shortcut for running testfile(). You can
instruct the Python interpreter to run the doctest module directly from the
standard library and pass the file name(s) on the command line:
python -m doctest -v example.txt
Because the file name does not end with .py, doctest infers that
it must be run with testfile(), not testmod().
This section examines in detail how doctest works: which docstrings it looks at,
how it finds interactive examples, what execution context it uses, how it
handles exceptions, and how option flags can be used to control its behavior.
This is the information that you need to know to write doctest examples; for
information about actually running doctest on these examples, see the following
sections.
The module docstring, and all function, class and method docstrings are
searched. Objects imported into the module are not searched.
In addition, if M.__test__ exists and “is true”, it must be a dict, and each
entry maps a (string) name to a function object, class object, or string.
Function and class object docstrings found from M.__test__ are searched, and
strings are treated as if they were docstrings. In output, a key K in
M.__test__ appears with name
<name of M>.__test__.K
Any classes found are recursively searched similarly, to test docstrings in
their contained methods and nested classes.
In most cases a copy-and-paste of an interactive console session works fine,
but doctest isn’t trying to do an exact emulation of any specific Python shell.
Any expected output must immediately follow the final '>>>' or '...'
line containing the code, and the expected output (if any) extends to the next
'>>>' or all-whitespace line.
The fine print:
Expected output cannot contain an all-whitespace line, since such a line is
taken to signal the end of expected output. If expected output does contain a
blank line, put <BLANKLINE> in your doctest example each place a blank line
is expected.
All hard tab characters are expanded to spaces, using 8-column tab stops.
Tabs in output generated by the tested code are not modified. Because any
hard tabs in the sample output are expanded, this means that if the code
output includes hard tabs, the only way the doctest can pass is if the
NORMALIZE_WHITESPACE option or directive is in effect.
Alternatively, the test can be rewritten to capture the output and compare it
to an expected value as part of the test. This handling of tabs in the
source was arrived at through trial and error, and has proven to be the least
error prone way of handling them. It is possible to use a different
algorithm for handling tabs by writing a custom DocTestParser class.
Output to stdout is captured, but not output to stderr (exception tracebacks
are captured via a different means).
If you continue a line via backslashing in an interactive session, or for any
other reason use a backslash, you should use a raw docstring, which will
preserve your backslashes exactly as you type them:
>>> def f(x):
...     r'''Backslashes in a raw docstring: m\n'''
>>> print(f.__doc__)
Backslashes in a raw docstring: m\n
Otherwise, the backslash will be interpreted as part of the string. For example,
the “\n” above would be interpreted as a newline character. Alternatively, you
can double each backslash in the doctest version (and not use a raw string):
>>> def f(x):
...     '''Backslashes in a raw docstring: m\\n'''
>>> print(f.__doc__)
Backslashes in a raw docstring: m\n
The starting column doesn’t matter:
>>> assert"Easy!" >>> import math >>> math.floor(1.9) 1
and as many leading whitespace characters are stripped from the expected output
as appeared in the initial '>>>' line that started the example.
By default, each time doctest finds a docstring to test, it uses a
shallow copy of M‘s globals, so that running tests doesn’t change the
module’s real globals, and so that one test in M can’t leave behind
crumbs that accidentally allow another test to work. This means examples can
freely use any names defined at top-level in M, and names defined earlier
in the docstring being run. Examples cannot see names defined in other
docstrings.
You can force use of your own dict as the execution context by passing
globs=your_dict to testmod() or testfile() instead.
No problem, provided that the traceback is the only output produced by the
example: just paste in the traceback. [1] Since tracebacks contain details
that are likely to change rapidly (for example, exact file paths and line
numbers), this is one case where doctest works hard to be flexible in what it
accepts.
Simple example:
>>> [1, 2, 3].remove(42)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: list.remove(x): x not in list

That doctest succeeds if ValueError is raised, with the
list.remove(x): x not in list detail as shown.
The expected output for an exception must start with a traceback header, which
may be either of the following two lines, indented the same as the first line of
the example:

Traceback (most recent call last):
Traceback (innermost last):
The traceback header is followed by an optional traceback stack, whose contents
are ignored by doctest. The traceback stack is typically omitted, or copied
verbatim from an interactive session.
The traceback stack is followed by the most interesting part: the line(s)
containing the exception type and detail. This is usually the last line of a
traceback, but can extend across multiple lines if the exception has a
multi-line detail:
>>> raise ValueError('multi\n    line\ndetail')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: multi
    line
detail
The last three lines (starting with ValueError) are compared against the
exception’s type and detail, and the rest are ignored.
Best practice is to omit the traceback stack, unless it adds significant
documentation value to the example. So the last example is probably better as:
>>> raise ValueError('multi\n    line\ndetail')
Traceback (most recent call last):
    ...
ValueError: multi
    line
detail
Note that tracebacks are treated very specially. In particular, in the
rewritten example, the use of ... is independent of doctest’s
ELLIPSIS option. The ellipsis in that example could be left out, or
could just as well be three (or three hundred) commas or digits, or an indented
transcript of a Monty Python skit.
Some details you should read once, but won’t need to remember:
Doctest can’t guess whether your expected output came from an exception
traceback or from ordinary printing. So, e.g., an example that expects
ValueError: 42 is prime will pass whether ValueError is actually
raised or if the example merely prints that traceback text. In practice,
ordinary output rarely begins with a traceback header line, so this doesn’t
create real problems.
Each line of the traceback stack (if present) must be indented further than
the first line of the example, or start with a non-alphanumeric character.
The first line following the traceback header indented the same and starting
with an alphanumeric is taken to be the start of the exception detail. Of
course this does the right thing for genuine tracebacks.
When the IGNORE_EXCEPTION_DETAIL doctest option is specified,
everything following the leftmost colon and any module information in the
exception name is ignored.
The interactive shell omits the traceback header line for some
SyntaxErrors. But doctest uses the traceback header line to
distinguish exceptions from non-exceptions. So in the rare case where you need
to test a SyntaxError that omits the traceback header, you will need to
manually add the traceback header line to your test example.
For some SyntaxErrors, Python displays the character position of the
syntax error, using a ^ marker:
>>> 1 1
  File "<stdin>", line 1
    1 1
      ^
SyntaxError: invalid syntax
Since the lines showing the position of the error come before the exception type
and detail, they are not checked by doctest. For example, the following test
would pass, even though it puts the ^ marker in the wrong location:
>>> 1 1
  File "<stdin>", line 1
    1 1
   ^
SyntaxError: invalid syntax
A number of option flags control various aspects of doctest’s behavior.
Symbolic names for the flags are supplied as module constants, which can be
or’ed together and passed to various functions. The names can also be used in
doctest directives (see below).
The first group of options define test semantics, controlling aspects of how
doctest decides whether actual output matches an example’s expected output:
By default, if an expected output block contains just 1, an actual output
block containing just 1 or just True is considered to be a match, and
similarly for 0 versus False. When DONT_ACCEPT_TRUE_FOR_1 is
specified, neither substitution is allowed. The default behavior caters to
the fact that Python changed the return type of many functions from integer to
boolean; doctests expecting “little integer” output still work in these cases. This
option will probably go away, but not for several years.
By default, if an expected output block contains a line containing only the
string <BLANKLINE>, then that line will match a blank line in the actual
output. Because a genuinely blank line delimits the expected output, this is
the only way to communicate that a blank line is expected. When
DONT_ACCEPT_BLANKLINE is specified, this substitution is not allowed.
When specified, all sequences of whitespace (blanks and newlines) are treated as
equal. Any sequence of whitespace within the expected output will match any
sequence of whitespace within the actual output. By default, whitespace must
match exactly. NORMALIZE_WHITESPACE is especially useful when a line of
expected output is very long, and you want to wrap it across multiple lines in
your source.
When specified, an ellipsis marker (...) in the expected output can match
any substring in the actual output. This includes substrings that span line
boundaries, and empty substrings, so it’s best to keep usage of this simple.
Complicated uses can lead to the same kinds of “oops, it matched too much!”
surprises that .* is prone to in regular expressions.
When specified, an example that expects an exception passes if an exception of
the expected type is raised, even if the exception detail does not match. For
example, an example expecting ValueError: 42 will pass if the actual
exception raised is ValueError: 3*14, but will fail, e.g., if
TypeError is raised.
It will also ignore the module name used in Python 3 doctest reports. Hence
both these variations will work regardless of whether the test is run under
Python 2.7 or Python 3.2 (or later versions):
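For example, something along these lines (CustomError and my_module are
hypothetical placeholder names):

>>> raise CustomError('message')
Traceback (most recent call last):
CustomError: message

>>> raise CustomError('message')
Traceback (most recent call last):
my_module.CustomError: message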
Note that ELLIPSIS can also be used to ignore the
details of the exception message, but such a test may still fail based
on whether or not the module details are printed as part of the
exception name. Using IGNORE_EXCEPTION_DETAIL and the details
from Python 2.3 is also the only clear way to write a doctest that doesn’t
care about the exception detail yet continues to pass under Python 2.3 or
earlier (those releases do not support doctest directives and ignore them
as irrelevant comments). For example,
>>> (1, 2)[3] = 'moo'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: object doesn't support item assignment
passes under Python 2.3 and later Python versions, even though the detail
changed in Python 2.4 to say “does not” instead of “doesn’t”.
Changed in version 3.2:
Changed in version 3.2: IGNORE_EXCEPTION_DETAIL now also ignores any information relating
to the module containing the exception under test.
When specified, do not run the example at all. This can be useful in contexts
where doctest examples serve as both documentation and test cases, and an
example should be included for documentation purposes, but should not be
checked. E.g., the example’s output might be random; or the example might
depend on resources which would be unavailable to the test driver.
The SKIP flag can also be used for temporarily “commenting out” examples.
When specified, differences are computed by difflib.Differ, using the same
algorithm as the popular ndiff.py utility. This is the only method that
marks differences within lines as well as across lines. For example, if a line
of expected output contains digit 1 where actual output contains letter
l, a line is inserted with a caret marking the mismatching column positions.
When specified, display the first failing example in each doctest, but suppress
output for all remaining examples. This will prevent doctest from reporting
correct examples that break because of earlier failures; but it might also hide
incorrect examples that fail independently of the first failure. When
REPORT_ONLY_FIRST_FAILURE is specified, the remaining examples are
still run, and still count towards the total number of failures reported; only
the output is suppressed.
A bitmask or’ing together all the reporting flags above.
“Doctest directives” may be used to modify the option flags for individual
examples. Doctest directives are expressed as a special Python comment
following an example’s source code, of the form # doctest: +OPTION_NAME or
# doctest: -OPTION_NAME.
Whitespace is not allowed between the + or - and the directive option
name. The directive option name can be any of the option flag names explained
above.
An example’s doctest directives modify doctest’s behavior for that single
example. Use + to enable the named behavior, or - to disable it.
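For example, this test passes:

>>> print(list(range(20))) # doctest: +NORMALIZE_WHITESPACE
[0,   1,  2,  3,  4,  5,  6,  7,  8,  9,
10,  11, 12, 13, 14, 15, 16, 17, 18, 19]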
Without the directive it would fail, both because the actual output doesn’t have
two blanks before the single-digit list elements, and because the actual output
is on a single line. This test also passes, and also requires a directive to do
so:
>>> print(list(range(20))) # doctest: +ELLIPSIS
[0, 1, ..., 18, 19]
Multiple directives can be used on a single physical line, separated by commas:
>>> print(list(range(20))) # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
[0,    1, ...,   18,    19]
If multiple directive comments are used for a single example, then they are
combined:
>>> print(list(range(20))) # doctest: +ELLIPSIS
...                        # doctest: +NORMALIZE_WHITESPACE
[0,    1, ...,   18,    19]
As the previous example shows, you can add ... lines to your example
containing only directives. This can be useful when an example is too long for
a directive to comfortably fit on the same line:
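>>> print(list(range(5)) + list(range(10, 20)) + list(range(30, 40)))
... # doctest: +ELLIPSIS
[0, ..., 4, 10, ..., 19, 30, ..., 39]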
Note that since all options are disabled by default, and directives apply only
to the example they appear in, enabling options (via + in a directive) is
usually the only meaningful choice. However, option flags can also be passed to
functions that run doctests, establishing different defaults. In such cases,
disabling an option via - in a directive can be useful.
There’s also a way to register new option flag names, although this isn’t useful
unless you intend to extend doctest internals via subclassing:
Create a new option flag with a given name, and return the new flag’s integer
value. register_optionflag() can be used when subclassing
OutputChecker or DocTestRunner to create new options that are
supported by your subclasses. register_optionflag() should always be
called using the following idiom:
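MY_FLAG = register_optionflag('MY_FLAG')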
doctest is serious about requiring exact matches in expected output. If
even a single character doesn’t match, the test fails. This will probably
surprise you a few times, as you learn exactly what Python does and doesn’t
guarantee about output. For example, when printing a dict, Python doesn’t
guarantee that the key-value pairs will be printed in any particular order, so a
test that simply prints a dict is vulnerable to spurious failures.
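One workaround, sketched here with a hypothetical foo() that returns a dict,
is to compare against the expected value so the printed result is
order-independent:

>>> foo() == {"Hermione": "hippogryph", "Harry": "broomstick"}
True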
Another bad idea is to print things that embed an object address, like
>>> id(1.0) # certain to fail some of the time
7948648
>>> class C: pass
>>> C()   # the default repr() for instances embeds an address
<__main__.C instance at 0x00AC18F0>
The ELLIPSIS directive gives a nice approach for the last example:
>>> C() # doctest: +ELLIPSIS
<__main__.C instance at 0x...>
Floating-point numbers are also subject to small output variations across
platforms, because Python defers to the platform C library for float formatting,
and C libraries vary widely in quality here.
>>> 1./7  # risky
0.14285714285714285
>>> print(1./7) # safer
0.142857142857
>>> print(round(1./7, 6)) # much safer
0.142857
Numbers of the form I/2.**J are safe across all platforms, and I often
contrive doctest examples to produce numbers of that form:
>>> 3./4  # utterly safe
0.75
Simple fractions are also easier for people to understand, and that makes for
better documentation.
All arguments except filename are optional, and should be specified in keyword
form.
Test examples in the file named filename. Return (failure_count, test_count).
Optional argument module_relative specifies how the filename should be
interpreted:
If module_relative is True (the default), then filename specifies an
OS-independent module-relative path. By default, this path is relative to the
calling module’s directory; but if the package argument is specified, then it
is relative to that package. To ensure OS-independence, filename should use
/ characters to separate path segments, and may not be an absolute path
(i.e., it may not begin with /).
If module_relative is False, then filename specifies an OS-specific
path. The path may be absolute or relative; relative paths are resolved with
respect to the current working directory.
Optional argument name gives the name of the test; by default, or if None,
os.path.basename(filename) is used.
Optional argument package is a Python package or the name of a Python package
whose directory should be used as the base directory for a module-relative
filename. If no package is specified, then the calling module’s directory is
used as the base directory for module-relative filenames. It is an error to
specify package if module_relative is False.
Optional argument globs gives a dict to be used as the globals when executing
examples. A new shallow copy of this dict is created for the doctest, so its
examples start with a clean slate. By default, or if None, a new empty dict
is used.
Optional argument extraglobs gives a dict merged into the globals used to
execute examples. This works like dict.update(): if globs and
extraglobs have a common key, the associated value in extraglobs appears in
the combined dict. By default, or if None, no extra globals are used. This
is an advanced feature that allows parameterization of doctests. For example, a
doctest can be written for a base class, using a generic name for the class,
then reused to test any number of subclasses by passing an extraglobs dict
mapping the generic name to the subclass to be tested.
Optional argument verbose prints lots of stuff if true, and prints only
failures if false; by default, or if None, it’s true if and only if '-v'
is in sys.argv.
Optional argument report prints a summary at the end when true, else prints
nothing at the end. In verbose mode, the summary is detailed, else the summary
is very brief (in fact, empty if all tests passed).
Optional argument raise_on_error defaults to false. If true, an exception is
raised upon the first failure or unexpected exception in an example. This
allows failures to be post-mortem debugged. Default behavior is to continue
running examples.
Optional argument parser specifies a DocTestParser (or subclass) that
should be used to extract tests from the files. It defaults to a normal parser
(i.e., DocTestParser()).
Optional argument encoding specifies an encoding that should be used to
convert the file to unicode.
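A minimal invocation might look like this (example.txt is an illustrative
file name):

import doctest
doctest.testfile('example.txt')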
All arguments are optional, and all except for m should be specified in
keyword form.
Test examples in docstrings in functions and classes reachable from module m
(or module __main__ if m is not supplied or is None), starting with
m.__doc__.
Also test examples reachable from dict m.__test__, if it exists and is not
None. m.__test__ maps names (strings) to functions, classes and
strings; function and class docstrings are searched for examples; strings are
searched directly, as if they were docstrings.
Only docstrings attached to objects belonging to module m are searched.
Return (failure_count, test_count).
Optional argument name gives the name of the module; by default, or if
None, m.__name__ is used.
Optional argument exclude_empty defaults to false. If true, objects for which
no doctests are found are excluded from consideration. The default is a backward
compatibility hack, so that code still using doctest.master.summarize() in
conjunction with testmod() continues to get output for objects with no
tests. The exclude_empty argument to the newer DocTestFinder
constructor defaults to true.
Optional arguments extraglobs, verbose, report, optionflags,
raise_on_error, and globs are the same as for function testfile()
above, except that globs defaults to m.__dict__.
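A minimal use might look like this, run from within the module whose
docstrings contain the examples:

if __name__ == '__main__':
    import doctest
    doctest.testmod()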
There’s also a function to run the doctests associated with a single object.
This function is provided for backward compatibility. There are no plans to
deprecate it, but it’s rarely useful:
Test examples associated with object f; for example, f may be a module,
function, or class object.
A shallow copy of dictionary argument globs is used for the execution context.
Optional argument name is used in failure messages, and defaults to
"NoName".
If optional argument verbose is true, output is generated even if there are no
failures. By default, output is generated only in case of an example failure.
Optional argument compileflags gives the set of flags that should be used by
the Python compiler when running the examples. By default, or if None,
flags are deduced corresponding to the set of future features found in globs.
Optional argument optionflags works as for function testfile() above.
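A small sketch (the add function here is illustrative):

import doctest

def add(a, b):
    """Return the sum of a and b.

    >>> add(2, 3)
    5
    """
    return a + b

doctest.run_docstring_examples(add, globals(), name='add')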
As your collection of doctest’ed modules grows, you’ll want a way to run all
their doctests systematically. doctest provides two functions that can
be used to create unittest test suites from modules and text files
containing doctests. To integrate with unittest test discovery, include
a load_tests() function in your test module:
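import unittest
import doctest
import my_module_with_doctests   # placeholder for your own module

def load_tests(loader, tests, ignore):
    tests.addTests(doctest.DocTestSuite(my_module_with_doctests))
    return tests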
Convert doctest tests from one or more text files to a
unittest.TestSuite.
The returned unittest.TestSuite is to be run by the unittest framework
and runs the interactive examples in each file. If an example in any file
fails, then the synthesized unit test fails, and a failureException
exception is raised showing the name of the file containing the test and a
(sometimes approximate) line number.
Pass one or more paths (as strings) to text files to be examined.
Options may be provided as keyword arguments:
Optional argument module_relative specifies how the filenames in paths
should be interpreted:
If module_relative is True (the default), then each filename in
paths specifies an OS-independent module-relative path. By default, this
path is relative to the calling module’s directory; but if the package
argument is specified, then it is relative to that package. To ensure
OS-independence, each filename should use / characters to separate path
segments, and may not be an absolute path (i.e., it may not begin with
/).
If module_relative is False, then each filename in paths specifies
an OS-specific path. The path may be absolute or relative; relative paths
are resolved with respect to the current working directory.
Optional argument package is a Python package or the name of a Python
package whose directory should be used as the base directory for
module-relative filenames in paths. If no package is specified, then the
calling module’s directory is used as the base directory for module-relative
filenames. It is an error to specify package if module_relative is
False.
Optional argument setUp specifies a set-up function for the test suite.
This is called before running the tests in each file. The setUp function
will be passed a DocTest object. The setUp function can access the
test globals as the globs attribute of the test passed.
Optional argument tearDown specifies a tear-down function for the test
suite. This is called after running the tests in each file. The tearDown
function will be passed a DocTest object. The setUp function can
access the test globals as the globs attribute of the test passed.
Optional argument globs is a dictionary containing the initial global
variables for the tests. A new copy of this dictionary is created for each
test. By default, globs is a new empty dictionary.
Optional argument optionflags specifies the default doctest options for the
tests, created by or-ing together individual option flags. See section
Option Flags and Directives. See function set_unittest_reportflags() below
for a better way to set reporting options.
Optional argument parser specifies a DocTestParser (or subclass)
that should be used to extract tests from the files. It defaults to a normal
parser (i.e., DocTestParser()).
Optional argument encoding specifies an encoding that should be used to
convert the file to unicode.
The global __file__ is added to the globals provided to doctests loaded
from a text file using DocFileSuite().
The returned unittest.TestSuite is to be run by the unittest framework
and runs each doctest in the module. If any of the doctests fail, then the
synthesized unit test fails, and a failureException exception is raised
showing the name of the file containing the test and a (sometimes approximate)
line number.
Optional argument module provides the module to be tested. It can be a module
object or a (possibly dotted) module name. If not specified, the module calling
this function is used.
Optional argument globs is a dictionary containing the initial global
variables for the tests. A new copy of this dictionary is created for each
test. By default, globs is a new empty dictionary.
Optional argument extraglobs specifies an extra set of global variables, which
is merged into globs. By default, no extra globals are used.
Optional argument test_finder is the DocTestFinder object (or a
drop-in replacement) that is used to extract doctests from the module.
Optional arguments setUp, tearDown, and optionflags are the same as for
function DocFileSuite() above.
This function uses the same search technique as testmod().
Under the covers, DocTestSuite() creates a unittest.TestSuite out
of doctest.DocTestCase instances, and DocTestCase is a
subclass of unittest.TestCase. DocTestCase isn’t documented
here (it’s an internal detail), but studying its code can answer questions about
the exact details of unittest integration.
Similarly, DocFileSuite() creates a unittest.TestSuite out of
doctest.DocFileCase instances, and DocFileCase is a subclass
of DocTestCase.
So both ways of creating a unittest.TestSuite run instances of
DocTestCase. This is important for a subtle reason: when you run
doctest functions yourself, you can control the doctest options in
use directly, by passing option flags to doctest functions. However, if
you’re writing a unittest framework, unittest ultimately controls
when and how tests get run. The framework author typically wants to control
doctest reporting options (perhaps, e.g., specified by command line
options), but there’s no way to pass options through unittest to
doctest test runners.
For this reason, doctest also supports a notion of doctest
reporting flags specific to unittest support, via this function:
Argument flags or’s together option flags. See section
Option Flags and Directives. Only “reporting flags” can be used.
This is a module-global setting, and affects all future doctests run by module
unittest: the runTest() method of DocTestCase looks at
the option flags specified for the test case when the DocTestCase
instance was constructed. If no reporting flags were specified (which is the
typical and expected case), doctest’s unittest reporting flags are
or’ed into the option flags, and the option flags so augmented are passed to the
DocTestRunner instance created to run the doctest. If any reporting
flags were specified when the DocTestCase instance was constructed,
doctest’s unittest reporting flags are ignored.
The value of the unittest reporting flags in effect before the function
was called is returned by the function.
The basic API is a simple wrapper that’s intended to make doctest easy to use.
It is fairly flexible, and should meet most users’ needs; however, if you
require more fine-grained control over testing, or wish to extend doctest’s
capabilities, then you should use the advanced API.
The advanced API revolves around two container classes, which are used to store
the interactive examples extracted from doctest cases:
Example: A single Python statement, paired with its expected
output.
DocTest: A collection of Examples, typically extracted
from a single docstring or text file.
Additional processing classes are defined to find, parse, run, and check
doctest examples:
DocTestFinder: Finds all docstrings in a given module, and uses a
DocTestParser to create a DocTest from every docstring that
contains interactive examples.
DocTestParser: Creates a DocTest object from a string (such
as an object’s docstring).
DocTestRunner: Executes the examples in a DocTest, and
verifies their output.
OutputChecker: Checks whether the actual output from a doctest
example matches the expected output.
class doctest.DocTest(examples, globs, name, filename, lineno, docstring)
A collection of doctest examples that should be run in a single namespace. The
constructor arguments are used to initialize the attributes of the same names.
DocTest defines the following attributes. They are initialized by
the constructor, and should not be modified directly.
The namespace (aka globals) that the examples should be run in. This is a
dictionary mapping names to values. Any changes to the namespace made by the
examples (such as binding new variables) will be reflected in globs
after the test is run.
The line number within filename where this DocTest begins, or
None if the line number is unavailable. This line number is zero-based
with respect to the beginning of the file.
class doctest.Example(source, want, exc_msg=None, lineno=0, indent=0, options=None)
A single interactive example, consisting of a Python statement and its expected
output. The constructor arguments are used to initialize the attributes of
the same names.
Example defines the following attributes. They are initialized by
the constructor, and should not be modified directly.
A string containing the example’s source code. This source code consists of a
single Python statement, and always ends with a newline; the constructor adds
a newline when necessary.
The expected output from running the example’s source code (either from
stdout, or a traceback in case of exception). want ends with a
newline unless no output is expected, in which case it’s an empty string. The
constructor adds a newline when necessary.
The exception message generated by the example, if the example is expected to
generate an exception; or None if it is not expected to generate an
exception. This exception message is compared against the return value of
traceback.format_exception_only(). exc_msg ends with a newline
unless it’s None. The constructor adds a newline if needed.
The line number within the string containing this example where the example
begins. This line number is zero-based with respect to the beginning of the
containing string.
A dictionary mapping from option flags to True or False, which is used
to override default options for this example. Any option flags not contained
in this dictionary are left at their default value (as specified by the
DocTestRunner’s optionflags). By default, no options are set.
class doctest.DocTestFinder(verbose=False, parser=DocTestParser(), recurse=True, exclude_empty=True)
A processing class used to extract the DocTests that are relevant to
a given object, from its docstring and the docstrings of its contained objects.
DocTests can currently be extracted from the following object types:
modules, functions, classes, methods, staticmethods, classmethods, and
properties.
The optional argument verbose can be used to display the objects searched by
the finder. It defaults to False (no output).
The optional argument parser specifies the DocTestParser object (or a
drop-in replacement) that is used to extract doctests from docstrings.
If the optional argument recurse is false, then DocTestFinder.find()
will only examine the given object, and not any contained objects.
If the optional argument exclude_empty is false, then
DocTestFinder.find() will include tests for objects with empty docstrings.
Return a list of the DocTests that are defined by obj’s
docstring, or by any of its contained objects’ docstrings.
The optional argument name specifies the object’s name; this name will be
used to construct names for the returned DocTests. If name is
not specified, then obj.__name__ is used.
The optional parameter module is the module that contains the given object.
If the module is not specified or is None, then the test finder will attempt
to automatically determine the correct module. The object’s module is used:
As a default namespace, if globs is not specified.
To prevent the DocTestFinder from extracting DocTests from objects that are
imported from other modules. (Contained objects with modules other than
module are ignored.)
To find the name of the file containing the object.
To help find the line number of the object within its file.
If module is False, no attempt to find the module will be made. This is
obscure, of use mostly in testing doctest itself: if module is False, or
is None but cannot be found automatically, then all objects are considered
to belong to the (non-existent) module, so all contained objects will
(recursively) be searched for doctests.
The globals for each DocTest are formed by combining globs and
extraglobs (bindings in extraglobs override bindings in globs). A new
shallow copy of the globals dictionary is created for each DocTest.
If globs is not specified, then it defaults to the module’s __dict__, if
specified, or {} otherwise. If extraglobs is not specified, then it
defaults to {}.
Extract all doctest examples from the given string, and return them as a list
of Example objects. Line numbers are 0-based. The optional argument
name is a name identifying this string, and is only used for error messages.
Divide the given string into examples and intervening text, and return them as
a list of alternating Examples and strings. Line numbers for the
Examples are 0-based. The optional argument name is a name
identifying this string, and is only used for error messages.
class doctest.DocTestRunner(checker=None, verbose=None, optionflags=0)
A processing class used to execute and verify the interactive examples in a
DocTest.
The comparison between expected outputs and actual outputs is done by an
OutputChecker. This comparison may be customized with a number of
option flags; see section Option Flags and Directives for more information. If the
option flags are insufficient, then the comparison may also be customized by
passing a subclass of OutputChecker to the constructor.
The test runner’s display output can be controlled in two ways. First, an output
function can be passed to TestRunner.run(); this function will be called
with strings that should be displayed. It defaults to sys.stdout.write. If
capturing the output is not sufficient, then the display output can be also
customized by subclassing DocTestRunner, and overriding the methods
report_start(), report_success(),
report_unexpected_exception(), and report_failure().
The optional keyword argument checker specifies the OutputChecker
object (or drop-in replacement) that should be used to compare the expected
outputs to the actual outputs of doctest examples.
The optional keyword argument verbose controls the DocTestRunner’s
verbosity. If verbose is True, then information is printed about each
example, as it is run. If verbose is False, then only failures are
printed. If verbose is unspecified, or None, then verbose output is used
iff the command-line switch -v is used.
The optional keyword argument optionflags can be used to control how the test
runner compares expected output to actual output, and how it displays failures.
For more information, see section Option Flags and Directives.
Report that the test runner is about to process the given example. This method
is provided to allow subclasses of DocTestRunner to customize their
output; it should not be called directly.
example is the example about to be processed. test is the test
containing example. out is the output function that was passed to
DocTestRunner.run().
Report that the given example ran successfully. This method is provided to
allow subclasses of DocTestRunner to customize their output; it
should not be called directly.
example is the example about to be processed. got is the actual output
from the example. test is the test containing example. out is the
output function that was passed to DocTestRunner.run().
Report that the given example failed. This method is provided to allow
subclasses of DocTestRunner to customize their output; it should not
be called directly.
example is the example about to be processed. got is the actual output
from the example. test is the test containing example. out is the
output function that was passed to DocTestRunner.run().
report_unexpected_exception(out, test, example, exc_info)
Report that the given example raised an unexpected exception. This method is
provided to allow subclasses of DocTestRunner to customize their
output; it should not be called directly.
example is the example about to be processed. exc_info is a tuple
containing information about the unexpected exception (as returned by
sys.exc_info()). test is the test containing example. out is the
output function that was passed to DocTestRunner.run().
Run the examples in test (a DocTest object), and display the
results using the writer function out.
The examples are run in the namespace test.globs. If clear_globs is
true (the default), then this namespace will be cleared after the test runs,
to help with garbage collection. If you would like to examine the namespace
after the test completes, then use clear_globs=False.
compileflags gives the set of flags that should be used by the Python
compiler when running the examples. If not specified, then it will default to
the set of future-import flags that apply to globs.
The output of each example is checked using the DocTestRunner’s
output checker, and the results are formatted by the
DocTestRunner.report_*() methods.
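As a sketch of how these pieces fit together, a hypothetical helper (not
part of doctest) might find and run every test in a module:

import doctest

def run_module_doctests(module):
    # Find each DocTest in the module, run it, then print a summary.
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(module):
        runner.run(test)
    runner.summarize()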
A class used to check whether the actual output from a doctest example
matches the expected output. OutputChecker defines two methods:
check_output(), which compares a given pair of outputs, and returns true
if they match; and output_difference(), which returns a string describing
the differences between two outputs.
Return True iff the actual output from an example (got) matches the
expected output (want). These strings are always considered to match if
they are identical; but depending on what option flags the test runner is
using, several non-exact match types are also possible. See section
Option Flags and Directives for more information about option flags.
Return a string describing the differences between the expected output for a
given example (example) and the actual output (got). optionflags is the
set of option flags used to compare want and got.
Doctest provides several mechanisms for debugging doctest examples:
Several functions convert doctests to executable Python programs, which can be
run under the Python debugger, pdb.
The DebugRunner class is a subclass of DocTestRunner that
raises an exception for the first failing example, containing information about
that example. This information can be used to perform post-mortem debugging on
the example.
You can add a call to pdb.set_trace() in a doctest example, and you’ll
drop into the Python debugger when that line is executed. Then you can inspect
current values of variables, and so on. For example, suppose a.py
contains just this module docstring:
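"""
>>> def f(x):
...     g(x*2)
>>> def g(x):
...     print(x+3)
...     import pdb; pdb.set_trace()
>>> f(3)
9
"""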
Argument s is a string containing doctest examples. The string is converted
to a Python script, where doctest examples in s are converted to regular code,
and everything else is converted to Python comments. The generated script is
returned as a string. For example,
import doctest

print(doctest.script_from_examples(r"""
    Set x and y to 1 and 2.
    >>> x, y = 1, 2

    Print their sum:
    >>> print(x+y)
    3
"""))
displays:
# Set x and y to 1 and 2.
x, y = 1, 2
#
# Print their sum:
print(x+y)
# Expected:
## 3
This function is used internally by other functions (see below), but can also be
useful when you want to transform an interactive Python session into a Python
script.
Argument module is a module object, or dotted name of a module, containing the
object whose doctests are of interest. Argument name is the name (within the
module) of the object with the doctests of interest. The result is a string,
containing the object’s docstring converted to a Python script, as described for
script_from_examples() above. For example, if module a.py
contains a top-level function f(), then
import a, doctest
print(doctest.testsource(a, "a.f"))
prints a script version of function f()’s docstring, with doctests
converted to code, and the rest placed in comments.
The module and name arguments are the same as for function
testsource() above. The synthesized Python script for the named object’s
docstring is written to a temporary file, and then that file is run under the
control of the Python debugger, pdb.
A shallow copy of module.__dict__ is used for both local and global
execution context.
Optional argument pm controls whether post-mortem debugging is used. If pm
has a true value, the script file is run directly, and the debugger gets
involved only if the script terminates via raising an unhandled exception. If
it does, then post-mortem debugging is invoked, via pdb.post_mortem(),
passing the traceback object from the unhandled exception. If pm is not
specified, or is false, the script is run under the debugger from the start, via
passing an appropriate exec() call to pdb.run().
This is like function debug() above, except that a string containing
doctest examples is specified directly, via the src argument.
Optional argument pm has the same meaning as in function debug() above.
Optional argument globs gives a dictionary to use as both local and global
execution context. If not specified, or None, an empty dictionary is used.
If specified, a shallow copy of the dictionary is used.
The DebugRunner class, and the special exceptions it may raise, are of
most interest to testing framework authors, and will only be sketched here. See
the source code, and especially DebugRunner’s docstring (which is a
doctest!) for more details:
class doctest.DebugRunner(checker=None, verbose=None, optionflags=0)
A subclass of DocTestRunner that raises an exception as soon as a
failure is encountered. If an unexpected exception occurs, an
UnexpectedException exception is raised, containing the test, the
example, and the original exception. If the output doesn’t match, then a
DocTestFailure exception is raised, containing the test, the example, and
the actual output.
For information about the constructor parameters and methods, see the
documentation for DocTestRunner in section Advanced API.
There are two exceptions that may be raised by DebugRunner instances:
exception doctest.DocTestFailure(test, example, got)
An exception raised by DocTestRunner to signal that a doctest example’s
actual output did not match its expected output. The constructor arguments are
used to initialize the attributes of the same names.
exception doctest.UnexpectedException(test, example, exc_info)
An exception raised by DocTestRunner to signal that a doctest
example raised an unexpected exception. The constructor arguments are used
to initialize the attributes of the same names.
As mentioned in the introduction, doctest has grown to have three primary
uses:
Checking examples in docstrings.
Regression testing.
Executable documentation / literate testing.
These uses have different requirements, and it is important to distinguish them.
In particular, filling your docstrings with obscure test cases makes for bad
documentation.
When writing a docstring, choose docstring examples with care. There’s an art to
this that needs to be learned—it may not be natural at first. Examples should
add genuine value to the documentation. A good example can often be worth many
words. If done with care, the examples will be invaluable for your users, and
will pay back the time it takes to collect them many times over as the years go
by and things change. I’m still amazed at how often one of my doctest
examples stops working after a “harmless” change.
Doctest also makes an excellent tool for regression testing, especially if you
don’t skimp on explanatory text. By interleaving prose and examples, it becomes
much easier to keep track of what’s actually being tested, and why. When a test
fails, good prose can make it much easier to figure out what the problem is, and
how it should be fixed. It’s true that you could write extensive comments in
code-based testing, but few programmers do. Many have found that using doctest
approaches instead leads to much clearer tests. Perhaps this is simply because
doctest makes writing prose a little easier than writing code, while writing
comments in code is a little harder. I think it goes deeper than just that:
the natural attitude when writing a doctest-based test is that you want to
explain the fine points of your software, and illustrate them with examples.
This in turn naturally leads to test files that start with the simplest
features, and logically progress to complications and edge cases. A coherent
narrative is the result, instead of a collection of isolated functions that test
isolated bits of functionality seemingly at random. It’s a different attitude,
and produces different results, blurring the distinction between testing and
explaining.
Regression testing is best confined to dedicated objects or files. There are
several options for organizing tests:
Write text files containing test cases as interactive examples, and test the
files using testfile() or DocFileSuite(). This is recommended,
although it is easiest to do for new projects, designed from the start to use
doctest.
Define functions named _regrtest_topic that consist of single docstrings,
containing test cases for the named topics. These functions can be included in
the same file as the module, or separated out into a separate test file.
Define a __test__ dictionary mapping from regression test topics to
docstrings containing test cases (a minimal sketch follows this list).
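A minimal sketch of the last option (the topic name is illustrative):

__test__ = {
    'arithmetic': """
Addition works as expected:

    >>> 2 + 2
    4
""",
}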
Examples containing both expected output and an exception are not supported.
Trying to guess where one ends and the other begins is too error-prone, and that
also makes for a confusing test.
(If you are already familiar with the basic concepts of testing, you might want
to skip to the list of assert methods.)
The Python unit testing framework, sometimes referred to as “PyUnit,” is a
Python language version of JUnit, by Kent Beck and Erich Gamma. JUnit is, in
turn, a Java version of Kent’s Smalltalk testing framework. Each is the de
facto standard unit testing framework for its respective language.
unittest supports test automation, sharing of setup and shutdown code for
tests, aggregation of tests into collections, and independence of the tests from
the reporting framework. The unittest module provides classes that make
it easy to support these qualities for a set of tests.
To achieve this, unittest supports some important concepts:
test fixture
A test fixture represents the preparation needed to perform one or more
tests, and any associate cleanup actions. This may involve, for example,
creating temporary or proxy databases, directories, or starting a server
process.
test case
A test case is the smallest unit of testing. It checks for a specific
response to a particular set of inputs. unittest provides a base class,
TestCase, which may be used to create new test cases.
test suite
A test suite is a collection of test cases, test suites, or both. It is
used to aggregate tests that should be executed together.
test runner
A test runner is a component which orchestrates the execution of tests
and provides the outcome to the user. The runner may use a graphical interface,
a textual interface, or return a special value to indicate the results of
executing the tests.
The test case and test fixture concepts are supported through the
TestCase and FunctionTestCase classes; the former should be
used when creating new tests, and the latter can be used when integrating
existing test code with a unittest-driven framework. When building test
fixtures using TestCase, the setUp() and
tearDown() methods can be overridden to provide initialization
and cleanup for the fixture. With FunctionTestCase, existing functions
can be passed to the constructor for these purposes. When the test is run, the
fixture initialization is run first; if it succeeds, the cleanup method is run
after the test has been executed, regardless of the outcome of the test. Each
instance of the TestCase will only be used to run a single test method,
so a new fixture is created for each test.
Test suites are implemented by the TestSuite class. This class allows
individual tests and test suites to be aggregated; when the suite is executed,
all tests added directly to the suite and in “child” test suites are run.
A test runner is an object that provides a single method,
run(), which accepts a TestCase or TestSuite
object as a parameter, and returns a result object. The class
TestResult is provided for use as the result object. unittest
provides the TextTestRunner as an example test runner which reports
test results on the standard error stream by default. Alternate runners can be
implemented for other environments (such as graphical environments) without any
need to derive from a specific class.
Many new features were added to unittest in Python 2.7, including test
discovery. unittest2 allows you to use these features with earlier
versions of Python.
A special-interest-group for discussion of testing, and testing tools,
in Python.
The script Tools/unittestgui/unittestgui.py in the Python source distribution is
a GUI tool for test discovery and execution. This is intended largely for ease of use
for those new to unit testing. For production environments it is recommended that
tests be driven by a continuous integration system such as Hudson
or Buildbot.
The unittest module provides a rich set of tools for constructing and
running tests. This section demonstrates that a small subset of the tools
suffice to meet the needs of most users.
Here is a short script to test three functions from the random module:
import random
import unittest

class TestSequenceFunctions(unittest.TestCase):

    def setUp(self):
        self.seq = list(range(10))

    def test_shuffle(self):
        # make sure the shuffled sequence does not lose any elements
        random.shuffle(self.seq)
        self.seq.sort()
        self.assertEqual(self.seq, list(range(10)))

        # should raise an exception for an immutable sequence
        self.assertRaises(TypeError, random.shuffle, (1, 2, 3))

    def test_choice(self):
        element = random.choice(self.seq)
        self.assertTrue(element in self.seq)

    def test_sample(self):
        with self.assertRaises(ValueError):
            random.sample(self.seq, 20)
        for element in random.sample(self.seq, 5):
            self.assertTrue(element in self.seq)

if __name__ == '__main__':
    unittest.main()
A testcase is created by subclassing unittest.TestCase. The three
individual tests are defined with methods whose names start with the letters
test. This naming convention informs the test runner about which methods
represent tests.
The crux of each test is a call to assertEqual() to check for an
expected result; assertTrue() to verify a condition; or
assertRaises() to verify that an expected exception gets raised.
These methods are used instead of the assert statement so the test
runner can accumulate all test results and produce a report.
When a setUp() method is defined, the test runner will run that
method prior to each test. Likewise, if a tearDown() method is
defined, the test runner will invoke that method after each test. In the
example, setUp() was used to create a fresh sequence for each
test.
The final block shows a simple way to run the tests. unittest.main()
provides a command-line interface to the test script. When run from the command
line, the above script produces an output that looks like this:
...
----------------------------------------------------------------------
Ran 3 tests in 0.000s
OK
Instead of unittest.main(), there are other ways to run the tests with a
finer level of control, less terse output, and no requirement to be run from the
command line. For example, the last two lines may be replaced with:
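suite = unittest.TestLoader().loadTestsFromTestCase(TestSequenceFunctions)
unittest.TextTestRunner(verbosity=2).run(suite)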
Running the revised script from the interpreter or another script produces the
following output:
test_choice (__main__.TestSequenceFunctions) ... ok
test_sample (__main__.TestSequenceFunctions) ... ok
test_shuffle (__main__.TestSequenceFunctions) ... ok
----------------------------------------------------------------------
Ran 3 tests in 0.110s
OK
The above examples show the most commonly used unittest features which
are sufficient to meet many everyday testing needs. The remainder of the
documentation explores the full feature set from first principles.
You can pass in a list with any combination of module names, and fully
qualified class or method names.
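For example:

python -m unittest test_module1 test_module2
python -m unittest test_module.TestClass
python -m unittest test_module.TestClass.test_method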
Test modules can be specified by file path as well:
python -m unittest tests/test_something.py
This allows you to use the shell filename completion to specify the test module.
The file specified must still be importable as a module. The path is converted
to a module name by removing the ‘.py’ and converting path separators into ‘.’.
If you want to execute a test file that isn’t importable as a module you should
execute the file directly instead.
You can run tests with more detail (higher verbosity) by passing in the -v flag:
python -m unittest -v test_module
When executed without arguments Test Discovery is started:
python -m unittest
For a list of all the command-line options:
python -m unittest -h
Changed in version 3.2: In earlier versions it was only possible to run individual test methods and
not modules or classes.
The standard output and standard error streams are buffered during the test
run. Output during a passing test is discarded. Output is echoed normally
on test fail or error and is added to the failure messages.
Control-C during the test run waits for the current test to end and then
reports all the results so far. A second control-C raises the normal
KeyboardInterrupt exception.
See Signal Handling for the functions that provide this functionality.
Unittest supports simple test discovery. In order to be compatible with test
discovery, all of the test files must be modules or
packages importable from the top-level directory of
the project (this means that their filenames must be valid
identifiers).
Test discovery is implemented in TestLoader.discover(), but can also be
used from the command line. The basic command-line usage is:
cd project_directory
python -m unittest discover
Note
As a shortcut, python -m unittest is the equivalent of
python -m unittest discover. If you want to pass arguments to test
discovery the discover sub-command must be used explicitly.
The discover sub-command has the following options:
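-v, --verbose
    Verbose output
-s, --start-directory directory
    Directory to start discovery ('.' default)
-p, --pattern pattern
    Pattern to match test files ('test*.py' default)
-t, --top-level-directory directory
    Top level directory of project (defaults to start directory)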
As well as being a path it is possible to pass a package name, for example
myproject.subpackage.test, as the start directory. The package name you
supply will then be imported and its location on the filesystem will be used
as the start directory.
Caution
Test discovery loads tests by importing them. Once test discovery has found
all the test files from the start directory you specify it turns the paths
into package names to import. For example foo/bar/baz.py will be
imported as foo.bar.baz.
If you have a package installed globally and attempt test discovery on
a different copy of the package then the import could happen from the
wrong place. If this happens test discovery will warn you and exit.
If you supply the start directory as a package name rather than a
path to a directory then discover assumes that whichever location it
imports from is the location you intended, so you will not get the
warning.
Test modules and packages can customize test loading and discovery through
the load_tests protocol.
The basic building blocks of unit testing are test cases — single
scenarios that must be set up and checked for correctness. In unittest,
test cases are represented by unittest.TestCase instances.
To make your own test cases you must write subclasses of
TestCase or use FunctionTestCase.
An instance of a TestCase-derived class is an object that can
completely run a single test method, together with optional set-up and tidy-up
code.
The testing code of a TestCase instance should be entirely self
contained, such that it can be run either in isolation or in arbitrary
combination with any number of other test cases.
The simplest TestCase subclass will simply override the
runTest() method in order to perform specific testing code:
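import unittest

class DefaultWidgetSizeTestCase(unittest.TestCase):
    def runTest(self):
        widget = Widget('The widget')
        self.assertEqual(widget.size(), (50, 50), 'incorrect default size')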
Note that in order to test something, we use one of the assert*()
methods provided by the TestCase base class. If the test fails, an
exception will be raised, and unittest will identify the test case as a
failure. Any other exceptions will be treated as errors. This
helps you identify where the problem is: failures are caused by incorrect
results - a 5 where you expected a 6. Errors are caused by incorrect
code - e.g., a TypeError caused by an incorrect function call.
The way to run a test case will be described later. For now, note that to
construct an instance of such a test case, we call its constructor without
arguments:
testCase = DefaultWidgetSizeTestCase()
Now, such test cases can be numerous, and their set-up can be repetitive. In
the above case, constructing a Widget in each of 100 Widget test case
subclasses would mean unsightly duplication.
Luckily, we can factor out such set-up code by implementing a method called
setUp(), which the testing framework will automatically call for
us when we run the test:
import unittest

class SimpleWidgetTestCase(unittest.TestCase):
    def setUp(self):
        self.widget = Widget('The widget')

class DefaultWidgetSizeTestCase(SimpleWidgetTestCase):
    def runTest(self):
        self.assertEqual(self.widget.size(), (50, 50),
                         'incorrect default size')

class WidgetResizeTestCase(SimpleWidgetTestCase):
    def runTest(self):
        self.widget.resize(100, 150)
        self.assertEqual(self.widget.size(), (100, 150),
                         'wrong size after resize')
If the setUp() method raises an exception while the test is
running, the framework will consider the test to have suffered an error, and the
runTest() method will not be executed.
Similarly, we can provide a tearDown() method that tidies up
after the runTest() method has been run:
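import unittest

class SimpleWidgetTestCase(unittest.TestCase):
    def setUp(self):
        self.widget = Widget('The widget')

    def tearDown(self):
        self.widget.dispose()
        self.widget = None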
If setUp() succeeded, the tearDown() method will
be run whether runTest() succeeded or not.
Such a working environment for the testing code is called a fixture.
Often, many small test cases will use the same fixture. In this case, we would
end up subclassing SimpleWidgetTestCase into many small one-method
classes such as DefaultWidgetSizeTestCase. This is time-consuming and
discouraging, so in the same vein as JUnit, unittest provides a simpler
mechanism:
import unittest

class WidgetTestCase(unittest.TestCase):
    def setUp(self):
        self.widget = Widget('The widget')

    def tearDown(self):
        self.widget.dispose()
        self.widget = None

    def test_default_size(self):
        self.assertEqual(self.widget.size(), (50, 50),
                         'incorrect default size')

    def test_resize(self):
        self.widget.resize(100, 150)
        self.assertEqual(self.widget.size(), (100, 150),
                         'wrong size after resize')
Here we have not provided a runTest() method, but have instead
provided two different test methods. Class instances will now each run one of
the test_*() methods, with self.widget created and destroyed
separately for each instance. When creating an instance we must specify the
test method it is to run. We do this by passing the method name in the
constructor:
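defaultSizeTestCase = WidgetTestCase('test_default_size')
resizeTestCase = WidgetTestCase('test_resize')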
Test case instances are grouped together according to the features they test.
unittest provides a mechanism for this: the test suite,
represented by unittest’s TestSuite class:
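widgetTestSuite = unittest.TestSuite()
widgetTestSuite.addTest(WidgetTestCase('test_default_size'))
widgetTestSuite.addTest(WidgetTestCase('test_resize'))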
For the ease of running tests, as we will see later, it is a good idea to
provide in each test module a callable object that returns a pre-built test
suite:
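def suite():
    suite = unittest.TestSuite()
    suite.addTest(WidgetTestCase('test_default_size'))
    suite.addTest(WidgetTestCase('test_resize'))
    return suite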
Since it is a common pattern to create a TestCase subclass with many
similarly named test functions, unittest provides a TestLoader
class that can be used to automate the process of creating a test suite and
populating it with individual tests. For example,
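suite = unittest.TestLoader().loadTestsFromTestCase(WidgetTestCase)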
will create a test suite that will run WidgetTestCase.test_default_size() and
WidgetTestCase.test_resize(). TestLoader uses the 'test' method
name prefix to identify test methods automatically.
Note that the order in which the various test cases will be run is
determined by sorting the test function names with respect to the
built-in ordering for strings.
Often it is desirable to group suites of test cases together, so as to run tests
for the whole system at once. This is easy, since TestSuite instances
can be added to a TestSuite just as TestCase instances can be
added to a TestSuite:
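suite1 = module1.TheTestSuite()   # module1 and module2 are illustrative
suite2 = module2.TheTestSuite()   # test modules providing suite factories
alltests = unittest.TestSuite([suite1, suite2])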
You can place the definitions of test cases and test suites in the same modules
as the code they are to test (such as widget.py), but there are several
advantages to placing the test code in a separate module, such as
test_widget.py:
The test module can be run standalone from the command line.
The test code can more easily be separated from shipped code.
There is less temptation to change test code to fit the code it tests without
a good reason.
Test code should be modified much less frequently than the code it tests.
Tested code can be refactored more easily.
Tests for modules written in C must be in separate modules anyway, so why not
be consistent?
If the testing strategy changes, there is no need to change the source code.
Some users will find that they have existing test code that they would like to
run from unittest, without converting every old test function to a
TestCase subclass.
For this reason, unittest provides a FunctionTestCase class.
This subclass of TestCase can be used to wrap an existing test
function. Set-up and tear-down functions can also be provided.
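For example, given a pre-existing test function, a test case can be built
with optional set-up and tear-down callables (makeSomething(),
makeSomethingDB, and deleteSomethingDB are illustrative helpers):

def testSomething():
    something = makeSomething()
    assert something.name is not None

testcase = unittest.FunctionTestCase(testSomething,
                                     setUp=makeSomethingDB,
                                     tearDown=deleteSomethingDB)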
To make migrating existing test suites easier, unittest supports tests
raising AssertionError to indicate test failure. However, it is
recommended that you use the explicit TestCase.fail*() and
TestCase.assert*() methods instead, as future versions of unittest
may treat AssertionError differently.
Note
Even though FunctionTestCase can be used to quickly convert an
existing test base over to a unittest-based system, this approach is
not recommended. Taking the time to set up proper TestCase
subclasses will make future test refactorings infinitely easier.
In some cases, the existing tests may have been written using the doctest
module. If so, doctest provides a DocTestSuite class that can
automatically build unittest.TestSuite instances from the existing
doctest-based tests.
Unittest supports skipping individual test methods and even whole classes of
tests. In addition, it supports marking a test as an “expected failure,” a test
that is broken and will fail, but shouldn’t be counted as a failure on a
TestResult.
Skipping a test is simply a matter of using the skip() decorator
or one of its conditional variants.
Basic skipping looks like this:
class MyTestCase(unittest.TestCase):

    @unittest.skip("demonstrating skipping")
    def test_nothing(self):
        self.fail("shouldn't happen")

    @unittest.skipIf(mylib.__version__ < (1, 3),
                     "not supported in this library version")
    def test_format(self):
        # Tests that work for only a certain version of the library.
        pass

    @unittest.skipUnless(sys.platform.startswith("win"), "requires Windows")
    def test_windows_support(self):
        # windows specific testing code
        pass
This is the output of running the example above in verbose mode:
test_format (__main__.MyTestCase) ... skipped 'not supported in this library version'
test_nothing (__main__.MyTestCase) ... skipped 'demonstrating skipping'
test_windows_support (__main__.MyTestCase) ... skipped 'requires Windows'
----------------------------------------------------------------------
Ran 3 tests in 0.005s
OK (skipped=3)
Classes can be skipped just like methods:
@unittest.skip("showing class skipping")
class MySkippedTestCase(unittest.TestCase):
    def test_not_run(self):
        pass
TestCase.setUp() can also skip the test. This is useful when a resource
that needs to be set up is not available.
It’s easy to roll your own skipping decorators by making a decorator that calls
skip() on the test when it wants it to be skipped. This decorator skips
the test unless the passed object has a certain attribute:
def skipUnlessHasattr(obj, attr):
    if hasattr(obj, attr):
        return lambda func: func
    return unittest.skip("{0!r} doesn't have {1!r}".format(obj, attr))
The skip(), skipIf(), and skipUnless() decorators, together with
expectedFailure(), implement test skipping and expected failures.
Instances of the TestCase class represent the smallest testable units
in the unittest universe. This class is intended to be used as a base
class, with specific tests being implemented by concrete subclasses. This class
implements the interface needed by the test runner to allow it to drive the
test, and methods that the test code can use to check for and report various
kinds of failure.
Each instance of TestCase will run a single test method: the method
named methodName. If you remember, we had an earlier example that went
something like this:
Here, we create two instances of WidgetTestCase, each of which runs a
single test.
Changed in version 3.2: TestCase can be instantiated successfully without providing a method
name. This makes it easier to experiment with TestCase from the
interactive interpreter.
methodName defaults to runTest().
TestCase instances provide three groups of methods: one group used
to run the test, another used by the test implementation to check conditions
and report failures, and some inquiry methods allowing information about the
test itself to be gathered.
Methods in the first group (running the test) are:
Method called to prepare the test fixture. This is called immediately
before calling the test method; any exception raised by this method will
be considered an error rather than a test failure. The default
implementation does nothing.
Method called immediately after the test method has been called and the
result recorded. This is called even if the test method raised an
exception, so the implementation in subclasses may need to be particularly
careful about checking internal state. Any exception raised by this
method will be considered an error rather than a test failure. This
method will only be called if the setUp() succeeds, regardless of
the outcome of the test method. The default implementation does nothing.
A class method called before tests in an individual class run.
setUpClass is called with the class as the only argument
and must be decorated as a classmethod():
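@classmethod
def setUpClass(cls):
    ...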
A class method called after tests in an individual class have run.
tearDownClass is called with the class as the only argument
and must be decorated as a classmethod():
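@classmethod
def tearDownClass(cls):
    ...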
Run the test, collecting the result into the test result object passed as
result. If result is omitted or None, a temporary result
object is created (by calling the defaultTestResult() method) and
used. The result object is not returned to run()‘s caller.
The same effect may be had by simply calling the TestCase
instance.
Run the test without collecting the result. This allows exceptions raised
by the test to be propagated to the caller, and can be used to support
running tests under a debugger.
The TestCase class provides a number of methods to check for and
report failures, such as:
Test that first and second are equal. If the values do not
compare equal, the test will fail.
In addition, if first and second are the exact same type and one of
list, tuple, dict, set, frozenset or str or any type that a subclass
registers with addTypeEqualityFunc() the type specific equality
function will be called in order to generate a more useful default
error message (see also the list of type-specific methods).
Changed in version 3.1: Added the automatic calling of type specific equality function.
Changed in version 3.2: assertMultiLineEqual() added as the default type equality
function for comparing strings.
Note that this is equivalent to bool(expr) is True and not to expr
is True (use assertIs(expr, True) for the latter). This method
should also be avoided when more specific methods are available (e.g.
assertEqual(a, b) instead of assertTrue(a == b)), because they
provide a better error message in case of failure.
Test that an exception is raised when callable is called with any
positional or keyword arguments that are also passed to
assertRaises(). The test passes if exception is raised, is an
error if another exception is raised, or fails if no exception is raised.
To catch any of a group of exceptions, a tuple containing the exception
classes may be passed as exception.
If only the exception argument is given, returns a context manager so
that the code under test can be written inline rather than as a function:
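with self.assertRaises(SomeException):
    do_something()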
The context manager will store the caught exception object in its
exception attribute. This can be useful if the intention
is to perform additional checks on the exception raised:
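with self.assertRaises(SomeException) as cm:
    do_something()

the_exception = cm.exception
self.assertEqual(the_exception.error_code, 3)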
Like assertRaises() but also tests that regex matches
on the string representation of the raised exception. regex may be
a regular expression object or a string containing a regular expression
suitable for use by re.search(). Examples:
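self.assertRaisesRegex(ValueError, "invalid literal for.*XYZ'$",
                       int, 'XYZ')

with self.assertRaisesRegex(ValueError, 'literal'):
    int('XYZ')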
Test that a warning is triggered when callable is called with any
positional or keyword arguments that are also passed to
assertWarns(). The test passes if warning is triggered and
fails if it isn’t. Also, any unexpected exception is an error.
To catch any of a group of warnings, a tuple containing the warning
classes may be passed as warnings.
If only the warning argument is given, returns a context manager so
that the code under test can be written inline rather than as a function:
with self.assertWarns(SomeWarning):
    do_something()
The context manager will store the caught warning object in its
warning attribute, and the source line which triggered the
warnings in the filename and lineno attributes.
This can be useful if the intention is to perform additional checks
on the warning caught:
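with self.assertWarns(SomeWarning) as cm:
    do_something()

self.assertIn('myfile.py', cm.filename)
self.assertEqual(320, cm.lineno)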
Like assertWarns() but also tests that regex matches on the
message of the triggered warning. regex may be a regular expression
object or a string containing a regular expression suitable for use
by re.search(). Example:
self.assertWarnsRegex(DeprecationWarning,
                      r'legacy_function\(\) is deprecated',
                      legacy_function, 'XYZ')
Test that first and second are approximately (or not approximately)
equal by computing the difference, rounding to the given number of
decimal places (default 7), and comparing to zero. Note that these
methods round the values to the given number of decimal places (i.e.
like the round() function) and not significant digits.
If delta is supplied instead of places then the difference
between first and second must be less (or more) than delta.
Supplying both delta and places raises a TypeError.
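As a quick sketch (the values here are chosen only for illustration), rounding
to seven decimal places means a difference of 1e-8 compares equal, while the
delta form checks the raw difference:
self.assertAlmostEqual(1.00000001, 1.0)       # difference rounds to 0 at 7 places
self.assertAlmostEqual(1.05, 1.0, delta=0.1)  # |1.05 - 1.0| < 0.1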
Changed in version 3.2: assertAlmostEqual() automatically considers almost equal objects
that compare equal. assertNotAlmostEqual() automatically fails
if the objects compare equal. Added the delta keyword argument.
Test that a regex search matches (or does not match) text. In case
of failure, the error message will include the pattern and the text (or
the pattern and the part of text that unexpectedly matched). regex
may be a regular expression object or a string containing a regular
expression suitable for use by re.search().
New in version 3.1: Added under the name assertRegexpMatches.
Changed in version 3.2: The method assertRegexpMatches() has been renamed to
assertRegex().
Tests whether the key/value pairs in dictionary are a superset of
those in subset. If not, an error message listing the missing keys
and mismatched values is generated.
Note that the arguments are in the opposite order of what the method name
suggests. Instead, consider using the set-methods on dictionary
views, for example: d.keys() <= e.keys() or
d.items() <= e.items().
Test that sequence first contains the same elements as second,
regardless of their order. When they don’t, an error message listing the
differences between the sequences will be generated.
Duplicate elements are not ignored when comparing first and
second. It verifies whether each element has the same count in both
sequences. Equivalent to:
assertEqual(Counter(list(first)), Counter(list(second)))
but works with sequences of unhashable objects as well.
Test that sequence first contains the same elements as second,
regardless of their order. When they don’t, an error message listing
the differences between the sequences will be generated.
Duplicate elements are ignored when comparing first and second.
It is the equivalent of assertEqual(set(first),set(second))
but it works with sequences of unhashable objects as well. Because
duplicates are ignored, this method has been deprecated in favour of
assertCountEqual().
New in version 3.1.
Deprecated since version 3.2.
The assertEqual() method dispatches the equality check for objects of
the same type to different type-specific methods. These methods are already
implemented for most of the built-in types, but it’s also possible to
register new methods using addTypeEqualityFunc():
Registers a type-specific method called by assertEqual() to check
if two objects of exactly the same type obj (not subclasses) compare
equal. function must take two positional arguments and a third msg=None
keyword argument just as assertEqual() does. It must raise
self.failureException(msg) when inequality
between the first two parameters is detected, possibly providing useful
information and explaining the inequalities in detail in the error
message.
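A minimal sketch of such a registration; the Money class, its attributes and
the assert_money_equal method are hypothetical names:
class MoneyTest(unittest.TestCase):
    def setUp(self):
        self.addTypeEqualityFunc(Money, self.assert_money_equal)

    def assert_money_equal(self, first, second, msg=None):
        # Called by assertEqual() whenever both arguments are Money instances.
        if (first.amount, first.currency) != (second.amount, second.currency):
            raise self.failureException(msg or '%r != %r' % (first, second))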
New in version 3.1.
The list of type-specific methods automatically used by
assertEqual() are summarized in the following table. Note
that it’s usually not necessary to invoke these methods directly.
Test that the multiline string first is equal to the string second.
When they are not equal, a diff of the two strings highlighting the
differences will be included in the error message. This method is used by
default when comparing strings with assertEqual().
Tests that two sequences are equal. If a seq_type is supplied, both
first and second must be instances of seq_type or a failure will
be raised. If the sequences are different an error message is
constructed that shows the difference between the two.
Tests that two lists or tuples are equal. If not, an error message is
constructed that shows only the differences between the two. An error
is also raised if either of the parameters are of the wrong type.
These methods are used by default when comparing lists or tuples with
assertEqual().
Tests that two sets are equal. If not, an error message is constructed
that lists the differences between the sets. This method is used by
default when comparing sets or frozensets with assertEqual().
Fails if either of first or second does not have a set.difference()
method.
Test that two dictionaries are equal. If not, an error message is
constructed that shows the differences in the dictionaries. This
method will be used by default to compare dictionaries in
calls to assertEqual().
New in version 3.1.
Finally, the TestCase provides the following methods and attributes:
This class attribute gives the exception raised by the test method. If a
test framework needs to use a specialized exception, possibly to carry
additional information, it must subclass this exception in order to “play
fair” with the framework. The initial value of this attribute is
AssertionError.
If set to True then any explicit failure message you pass in to the
assert methods will be appended to the end of the
normal failure message. The normal messages contain useful information
about the objects involved, for example the message from assertEqual
shows you the repr of the two unequal objects. Setting this attribute
to True allows you to have a custom error message in addition to the
normal one.
This attribute defaults to True. If set to False then a custom message
passed to an assert method will silence the normal message.
The class setting can be overridden in individual tests by assigning an
instance attribute to True or False before calling the assert methods.
This attribute controls the maximum length of diffs output by assert
methods that report diffs on failure. It defaults to 80*8 characters.
Assert methods affected by this attribute are
assertSequenceEqual() (including all the sequence comparison
methods that delegate to it), assertDictEqual() and
assertMultiLineEqual().
Setting maxDiff to None means that there is no maximum length of
diffs.
New in version 3.2.
Testing frameworks can use the following methods to collect information on
the test:
Return an instance of the test result class that should be used for this
test case class (if no other result instance is provided to the
run() method).
For TestCase instances, this will always be an instance of
TestResult; subclasses of TestCase should override this
as necessary.
Returns a description of the test, or None if no description
has been provided. The default implementation of this method
returns the first line of the test method’s docstring, if available,
or None.
Changed in version 3.1: In 3.1 this was changed to add the test name to the short description
even in the presence of a docstring. This caused compatibility issues with
unittest extensions, so adding the test name was moved to the
TextTestResult in Python 3.2.
Add a function to be called after tearDown() to cleanup resources
used during the test. Functions will be called in reverse order to the
order they are added (LIFO). They are called with any arguments and
keyword arguments passed into addCleanup() when they are
added.
If setUp() fails, meaning that tearDown() is not called,
then any cleanup functions added will still be called.
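A short sketch of typical usage; the temporary file is just an example
resource:
import os
import tempfile
import unittest

class TempFileTest(unittest.TestCase):
    def setUp(self):
        fd, self.path = tempfile.mkstemp()
        os.close(fd)
        # Runs after tearDown(), and even if setUp() fails past this point.
        self.addCleanup(os.remove, self.path)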
This method is called unconditionally after tearDown(), or
after setUp() if setUp() raises an exception.
It is responsible for calling all the cleanup functions added by
addCleanup(). If you need cleanup functions to be called
prior to tearDown() then you can call doCleanups()
yourself.
doCleanups() pops methods off the stack of cleanup
functions one at a time, so it can be called at any time.
New in version 3.1.
class unittest.FunctionTestCase(testFunc, setUp=None, tearDown=None, description=None)
This class implements the portion of the TestCase interface which
allows the test runner to drive the test, but does not provide the methods
which test code can use to check and report errors. This is used to create
test cases using legacy test code, allowing it to be integrated into a
unittest-based test framework.
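For example, an existing test function might be wrapped like this
(legacy_check is a stand-in name):
def legacy_check():
    assert 2 + 2 == 4

testcase = unittest.FunctionTestCase(legacy_check,
                                     description='sanity check from old test code')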
For historical reasons, some of the TestCase methods had one or more
aliases that are now deprecated. The following table lists the correct names
along with their deprecated aliases:
This class represents an aggregation of individual test cases and test suites.
The class presents the interface needed by the test runner to allow it to be run
as any other test case. Running a TestSuite instance is the same as
iterating over the suite, running each test individually.
If tests is given, it must be an iterable of individual test cases or other
test suites that will be used to build the suite initially. Additional methods
are provided to add test cases and suites to the collection later on.
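For example, a suite might be assembled by hand like this (WidgetTestCase is
a placeholder class):
suite = unittest.TestSuite()
suite.addTest(WidgetTestCase('test_default_size'))   # a single test case
suite.addTests(unittest.TestLoader().loadTestsFromTestCase(WidgetTestCase))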
TestSuite objects behave much like TestCase objects, except
they do not actually implement a test. Instead, they are used to aggregate
tests into groups of tests that should be run together. Some additional
methods are available to add tests to TestSuite instances:
Run the tests associated with this suite, collecting the result into the
test result object passed as result. Note that unlike
TestCase.run(), TestSuite.run() requires the result object to
be passed in.
Run the tests associated with this suite without collecting the
result. This allows exceptions raised by the test to be propagated to the
caller and can be used to support running tests under a debugger.
Tests grouped by a TestSuite are always accessed by iteration.
Subclasses can lazily provide tests by overriding __iter__(). Note
that this method may be called several times on a single suite
(for example when counting tests or comparing for equality),
so the tests returned must be the same for repeated iterations.
Changed in version 3.2: In earlier versions the TestSuite accessed tests directly rather
than through iteration, so overriding __iter__() wasn’t sufficient
for providing tests.
In the typical usage of a TestSuite object, the run() method
is invoked by a TestRunner rather than by the end-user test harness.
The TestLoader class is used to create test suites from classes and
modules. Normally, there is no need to create an instance of this class; the
unittest module provides an instance that can be shared as
unittest.defaultTestLoader. Using a subclass or instance, however, allows
customization of some configurable properties.
Return a suite of all test cases contained in the given module. This
method searches module for classes derived from TestCase and
creates an instance of the class for each test method defined for the
class.
Note
While using a hierarchy of TestCase-derived classes can be
convenient in sharing fixtures and helper functions, defining test
methods on base classes that are not intended to be instantiated
directly does not play well with this method. Doing so, however, can
be useful when the fixtures are different and defined in subclasses.
If a module provides a load_tests function it will be called to
load the tests. This allows modules to customize test loading.
This is the load_tests protocol.
Changed in version 3.2: Support for load_tests added.
Return a suite of all test cases given a string specifier.
The specifier name is a “dotted name” that may resolve either to a
module, a test case class, a test method within a test case class, a
TestSuite instance, or a callable object which returns a
TestCase or TestSuite instance. These checks are
applied in the order listed here; that is, a method on a possible test
case class will be picked up as “a test method within a test case class”,
rather than “a callable object”.
For example, if you have a module SampleTests containing a
TestCase-derived class SampleTestCase with three test
methods (test_one(), test_two(), and test_three()), the
specifier 'SampleTests.SampleTestCase' would cause this method to
return a suite which will run all three test methods. Using the specifier
'SampleTests.SampleTestCase.test_two' would cause it to return a test
suite which will run only the test_two() test method. The specifier
can refer to modules and packages which have not been imported; they will
be imported as a side-effect.
The method optionally resolves name relative to the given module.
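Continuing the example above, a sketch of the call:
loader = unittest.TestLoader()
suite = loader.loadTestsFromName('SampleTests.SampleTestCase.test_two')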
Similar to loadTestsFromName(), but takes a sequence of names rather
than a single name. The return value is a test suite which supports all
the tests defined for each name.
Find and return all test modules from the specified start directory,
recursing into subdirectories to find them. Only test files that match
pattern will be loaded. (Using shell style pattern matching.) Only
module names that are importable (i.e. are valid Python identifiers) will
be loaded.
All test modules must be importable from the top level of the project. If
the start directory is not the top level directory then the top level
directory must be specified separately.
If importing a module fails, for example due to a syntax error, then this
will be recorded as a single error and discovery will continue.
If a test package name (directory with __init__.py) matches the
pattern then the package will be checked for a load_tests
function. If this exists then it will be called with loader, tests,
pattern.
If load_tests exists then discovery does not recurse into the package,
load_tests is responsible for loading all tests in the package.
The pattern is deliberately not stored as a loader attribute so that
packages can continue discovery themselves. top_level_dir is stored so
load_tests does not need to pass this argument in to
loader.discover().
start_dir can be a dotted module name as well as a directory.
New in version 3.2.
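A minimal sketch of programmatic discovery, assuming the project keeps its
tests in a tests/ subdirectory:
loader = unittest.TestLoader()
suite = loader.discover(start_dir='tests', pattern='test*.py',
                        top_level_dir='.')
unittest.TextTestRunner(verbosity=2).run(suite)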
The following attributes of a TestLoader can be configured either by
subclassing or assignment on an instance:
Callable object that constructs a test suite from a list of tests. No
methods on the resulting object are needed. The default value is the
TestSuite class.
This class is used to compile information about which tests have succeeded
and which have failed.
A TestResult object stores the results of a set of tests. The
TestCase and TestSuite classes ensure that results are
properly recorded; test authors do not need to worry about recording the
outcome of tests.
Testing frameworks built on top of unittest may want access to the
TestResult object generated by running a set of tests for reporting
purposes; a TestResult instance is returned by the
TestRunner.run() method for this purpose.
TestResult instances have the following attributes that will be of
interest when inspecting the results of running a set of tests:
A list containing 2-tuples of TestCase instances and strings
holding formatted tracebacks. Each tuple represents a test which raised an
unexpected exception.
A list containing 2-tuples of TestCase instances and strings
holding formatted tracebacks. Each tuple represents a test where a failure
was explicitly signalled using the TestCase.fail*() or
TestCase.assert*() methods.
If set to true, sys.stdout and sys.stderr will be buffered in between
startTest() and stopTest() being called. Collected output will
only be echoed onto the real sys.stdout and sys.stderr if the test
fails or errors. Any output is also attached to the failure / error message.
This method can be called to signal that the set of tests being run should
be aborted by setting the shouldStop attribute to True.
TestRunner objects should respect this flag and return without
running any additional tests.
For example, this feature is used by the TextTestRunner class to
stop the test framework when the user signals an interrupt from the
keyboard. Interactive tools which provide TestRunner
implementations can use this in a similar manner.
The following methods of the TestResult class are used to maintain
the internal data structures, and may be extended in subclasses to support
additional reporting requirements. This is particularly useful in building
tools which support interactive reporting while tests are being run.
Called when the test case test raises an unexpected exception. err is a
tuple of the form returned by sys.exc_info(): (type, value, traceback).
The default implementation appends a tuple (test,formatted_err) to
the instance’s errors attribute, where formatted_err is a
formatted traceback derived from err.
Called when the test case test signals a failure. err is a tuple of
the form returned by sys.exc_info(): (type, value, traceback).
The default implementation appends a tuple (test,formatted_err) to
the instance’s failures attribute, where formatted_err is a
formatted traceback derived from err.
Called when the test case test fails, but was marked with the
expectedFailure() decorator.
The default implementation appends a tuple (test,formatted_err) to
the instance’s expectedFailures attribute, where formatted_err
is a formatted traceback derived from err.
Instance of the TestLoader class intended to be shared. If no
customization of the TestLoader is needed, this instance can be used
instead of repeatedly creating new instances.
class unittest.TextTestRunner(stream=None, descriptions=True, verbosity=1, resultclass=None, warnings=None)
A basic test runner implementation that outputs results to a stream. If stream
is None, the default, sys.stderr is used as the output stream. This class
has a few configurable parameters, but is essentially very simple. Graphical
applications which run test suites should provide alternate implementations.
By default this runner shows DeprecationWarning,
PendingDeprecationWarning, and ImportWarning even if they are
ignored by default. Deprecation warnings caused by
deprecated unittest methods are also
special-cased and, when the warning filters are 'default' or 'always',
they will appear only once per-module, in order to avoid too many warning
messages. This behavior can be overridden using the -Wd or
-Wa options and leaving warnings to None.
Changed in version 3.2: Added the warnings argument.
Changed in version 3.2: The default stream is set to sys.stderr at instantiation time rather
than at import time.
This method returns the instance of TestResult used by run().
It is not intended to be called directly, but can be overridden in
subclasses to provide a custom TestResult.
_makeResult() instantiates the class or callable passed in the
TextTestRunner constructor as the resultclass argument. It
defaults to TextTestResult if no resultclass is provided.
The result class is instantiated with the following arguments:
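stream, descriptions, verbosity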
A command-line program that runs a set of tests; this is primarily for making
test modules conveniently executable. The simplest use for this function is to
include the following line at the end of a test script:
if __name__ == '__main__':
    unittest.main()
You can run tests with more detailed information by passing in the verbosity
argument:
if __name__ == '__main__':
    unittest.main(verbosity=2)
The testRunner argument can either be a test runner class or an already
created instance of it. By default main calls sys.exit() with
an exit code indicating success or failure of the tests run.
main supports being used from the interactive interpreter by passing in the
argument exit=False. This displays the result on standard output without
calling sys.exit():
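>>> from unittest import main
>>> main(module='test_module', exit=False)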
The failfast, catchbreak and buffer parameters have the same
effect as the same-name command-line options.
The warnings argument specifies the warning filter
that should be used while running the tests. If it’s not specified, it will
remain None if a -W option is passed to python,
otherwise it will be set to 'default'.
Calling main actually returns an instance of the TestProgram class.
This stores the result of the tests run as the result attribute.
Changed in version 3.1: The exit parameter was added.
Changed in version 3.2: The verbosity, failfast, catchbreak, buffer
and warnings parameters were added.
Modules or packages can customize how tests are loaded from them during normal
test runs or test discovery by implementing a function called load_tests.
loader is the instance of TestLoader doing the loading.
standard_tests are the tests that would be loaded by default from the
module. It is common for test modules to only want to add or remove tests
from the standard set of tests.
The third argument is used when loading packages as part of test discovery.
A typical load_tests function that loads tests from a specific set of
TestCase classes may look like:
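test_cases = (TestCase1, TestCase2, TestCase3)

def load_tests(loader, tests, pattern):
    suite = TestSuite()
    for test_class in test_cases:
        tests = loader.loadTestsFromTestCase(test_class)
        suite.addTests(tests)
    return suite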
If discovery is started, either from the command line or by calling
TestLoader.discover(), with a pattern that matches a package
name then the package __init__.py will be checked for load_tests.
Note
The default pattern is ‘test*.py’. This matches all Python files
that start with ‘test’ but won’t match any test directories.
A pattern like ‘test*’ will match test packages as well as
modules.
If the package __init__.py defines load_tests then it will be
called and discovery is not continued into the package. load_tests
is called with the following arguments:
load_tests(loader, standard_tests, pattern)
This should return a TestSuite representing all the tests
from the package. (standard_tests will only contain tests
collected from __init__.py.)
Because the pattern is passed into load_tests the package is free to
continue (and potentially modify) test discovery. A ‘do nothing’
load_tests function for a test package would look like:
def load_tests(loader, standard_tests, pattern):
    # top level directory cached on loader instance
    this_dir = os.path.dirname(__file__)
    package_tests = loader.discover(start_dir=this_dir, pattern=pattern)
    standard_tests.addTests(package_tests)
    return standard_tests
Class and module level fixtures are implemented in TestSuite. When
the test suite encounters a test from a new class then tearDownClass()
from the previous class (if there is one) is called, followed by
setUpClass() from the new class.
Similarly if a test is from a different module from the previous test then
tearDownModule from the previous module is run, followed by
setUpModule from the new module.
After all the tests have run the final tearDownClass and
tearDownModule are run.
Note that shared fixtures do not play well with [potential] features like test
parallelization and they break test isolation. They should be used with care.
The default ordering of tests created by the unittest test loaders is to group
all tests from the same modules and classes together. This will lead to
setUpClass / setUpModule (etc) being called exactly once per class and
module. If you randomize the order, so that tests from different modules and
classes are adjacent to each other, then these shared fixture functions may be
called multiple times in a single test run.
Shared fixtures are not intended to work with suites with non-standard
ordering. A BaseTestSuite still exists for frameworks that don’t want to
support shared fixtures.
If there are any exceptions raised during one of the shared fixture functions
the test is reported as an error. Because there is no corresponding test
instance an _ErrorHolder object (that has the same interface as a
TestCase) is created to represent the error. If you are just using
the standard unittest test runner then this detail doesn’t matter, but if you
are a framework author it may be relevant.
If you want the setUpClass and tearDownClass on base classes called
then you must call up to them yourself. The implementations in
TestCase are empty.
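A sketch of the cooperative pattern; the class name and the
open_shared_resource() helper are illustrative:
class BaseFixture(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        super().setUpClass()                   # safe: the TestCase implementation is empty
        cls.resource = open_shared_resource()  # hypothetical shared fixture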
If an exception is raised during a setUpClass then the tests in the class
are not run and the tearDownClass is not run. Skipped classes will not
have setUpClass or tearDownClass run. If the exception is a
SkipTest exception then the class will be reported as having been skipped
instead of as an error.
If an exception is raised in a setUpModule then none of the tests in the
module will be run and the tearDownModule will not be run. If the exception is a
SkipTest exception then the module will be reported as having been skipped
instead of as an error.
The -c/--catch command-line option to unittest,
along with the catchbreak parameter to unittest.main(), provides
more friendly handling of control-C during a test run. With catch break
behavior enabled, control-C will allow the currently running test to complete,
and the test run will then end and report all the results so far. A second
control-C will raise a KeyboardInterrupt in the usual way.
The control-c handling signal handler attempts to remain compatible with code or
tests that install their own signal.SIGINT handler. If the unittest
handler is called but isn’t the installed signal.SIGINT handler,
i.e. it has been replaced by the system under test and delegated to, then it
calls the default handler. This will normally be the expected behavior by code
that replaces an installed handler and delegates to it. For individual tests
that need unittest control-c handling disabled the removeHandler()
decorator can be used.
There are a few utility functions for framework authors to enable control-c
handling functionality within test frameworks.
Install the control-c handler. When a signal.SIGINT is received
(usually in response to the user pressing control-c) all registered results
have stop() called.
Register a TestResult object for control-c handling. Registering a
result stores a weak reference to it, so it doesn’t prevent the result from
being garbage collected.
Registering a TestResult object has no side-effects if control-c
handling is not enabled, so test frameworks can unconditionally register
all results they create independently of whether or not handling is enabled.
When called without arguments this function removes the control-c handler
if it has been installed. This function can also be used as a test decorator
to temporarily remove the handler whilst the test is being executed:
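@unittest.removeHandler
def test_signal_handling(self):
    ...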
2to3 is a Python program that reads Python 2.x source code and applies a series
of fixers to transform it into valid Python 3.x code. The standard library
contains a rich set of fixers that will handle almost all code. 2to3’s
supporting library, lib2to3, is, however, a flexible and generic library,
so it is possible to write your own fixers for 2to3. lib2to3 could also be
adapted to custom applications in which Python code needs to be edited
automatically.
2to3 will usually be installed with the Python interpreter as a script. It is
also located in the Tools/scripts directory of the Python root.
2to3’s basic arguments are a list of files or directories to transform. The
directories are recursively traversed for Python sources.
Here is a sample Python 2.x source file, example.py:
def greet(name):
    print "Hello, {0}!".format(name)
print "What's your name?"
name = raw_input()
greet(name)
It can be converted to Python 3.x code via 2to3 on the command line:
$ 2to3 example.py
A diff against the original source file is printed. 2to3 can also write the
needed modifications right back to the source file. (A backup of the original
file is made unless -n is also given.) Writing the changes back is
enabled with the -w flag:
$ 2to3 -w example.py
After transformation, example.py looks like this:
def greet(name):
    print("Hello, {0}!".format(name))
print("What's your name?")
name = input()
greet(name)
Comments and exact indentation are preserved throughout the translation process.
By default, 2to3 runs a set of predefined fixers. The
-l flag lists all available fixers. An explicit set of fixers to run
can be given with -f. Likewise the -x explicitly disables a
fixer. The following example runs only the imports and has_key fixers:
$ 2to3 -f imports -f has_key example.py
This command runs every fixer except the apply fixer:
$ 2to3 -x apply example.py
Some fixers are explicit, meaning they aren’t run by default and must be
listed on the command line to be run. Here, in addition to the default fixers,
the idioms fixer is run:
$ 2to3 -f all -f idioms example.py
Notice how passing all enables all default fixers.
Sometimes 2to3 will find a place in your source code that needs to be changed,
but 2to3 cannot fix automatically. In this case, 2to3 will print a warning
beneath the diff for a file. You should address the warning in order to have
compliant 3.x code.
2to3 can also refactor doctests. To enable this mode, use the -d
flag. Note that only doctests will be refactored. This also doesn’t require
the module to be valid Python; for example, doctest-like examples in a reST
document could also be refactored with this option.
The -v option enables output of more information on the translation
process.
Since some print statements can be parsed as function calls or statements, 2to3
cannot always read files containing the print function. When 2to3 detects the
presence of the from __future__ import print_function compiler directive, it
modifies its internal grammar to interpret print() as a function. This
change can also be enabled manually with the -p flag. Use
-p to run fixers on code that already has had its print statements
converted.
Each step of transforming code is encapsulated in a fixer. The command 2to3 -l lists them. As documented above, each can be turned on
and off individually. They are described here in more detail.
This optional fixer performs several transformations that make Python code
more idiomatic. Type comparisons like type(x) is SomeClass and
type(x) == SomeClass are converted to isinstance(x, SomeClass).
while 1 becomes while True. This fixer also tries to make use of
sorted() in appropriate places. For example, this block
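L = list(some_iterable)
L.sort()

is changed to

L = sorted(some_iterable)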
Removes imports of itertools.ifilter(), itertools.izip(), and
itertools.imap(). Imports of itertools.ifilterfalse() are also
changed to itertools.filterfalse().
Changes usage of itertools.ifilter(), itertools.izip(), and
itertools.imap() to their built-in equivalents.
itertools.ifilterfalse() is changed to itertools.filterfalse().
Converts calls to various functions in the operator module to other,
but equivalent, function calls. When needed, the appropriate import
statements are added, e.g. import collections. The following mappings
are made:
Converts raise E, V to raise E(V), and raise E, V, T to raise E(V).with_traceback(T). If E is a tuple, the translation will be
incorrect because substituting tuples for exceptions has been removed in 3.0.
The test package is meant for internal use by Python only. It is
documented for the benefit of the core developers of Python. Any use of
this package outside of Python’s standard library is discouraged as code
mentioned here can change or be removed without notice between releases of
Python.
The test package contains all regression tests for Python as well as the
modules test.support and test.regrtest.
test.support is used to enhance your tests while
test.regrtest drives the testing suite.
Each module in the test package whose name starts with test_ is a
testing suite for a specific module or feature. All new tests should be written
using the unittest or doctest module. Some older tests are
written using a “traditional” testing style that compares output printed to
sys.stdout; this style of test is considered deprecated.
It is preferred that tests that use the unittest module follow a few
guidelines. One is to name the test module by starting it with test_ and
ending it with the name of the module being tested. The test methods in the
test module should start with test_ and end with a description of what the
method is testing. This is needed so that the methods are recognized by the
test driver as test methods. Also, no documentation string for the method
should be included. A comment (such as # Tests function returns only True
or False) should be used to provide documentation for test methods. This is
done because documentation strings get printed out if they exist and thus
what test is being run is not stated.
A basic boilerplate is often used:
import unittest
from test import support

class MyTestCase1(unittest.TestCase):

    # Only use setUp() and tearDown() if necessary

    def setUp(self):
        ... code to execute in preparation for tests ...

    def tearDown(self):
        ... code to execute to clean up after tests ...

    def test_feature_one(self):
        # Test feature one.
        ... testing code ...

    def test_feature_two(self):
        # Test feature two.
        ... testing code ...

    ... more test methods ...

class MyTestCase2(unittest.TestCase):
    ... same structure as MyTestCase1 ...

... more test classes ...

def test_main():
    support.run_unittest(MyTestCase1,
                         MyTestCase2,
                         ... list other tests ...
                        )

if __name__ == '__main__':
    test_main()
This boilerplate code allows the testing suite to be run by test.regrtest
as well as on its own as a script.
The goal for regression testing is to try to break code. This leads to a few
guidelines to be followed:
The testing suite should exercise all classes, functions, and constants. This
includes not just the external API that is to be presented to the outside
world but also “private” code.
Whitebox testing (examining the code being tested when the tests are being
written) is preferred. Blackbox testing (testing only the published user
interface) is not complete enough to make sure all boundary and edge cases
are tested.
Make sure all possible values are tested including invalid ones. This makes
sure that not only all valid values are acceptable but also that improper
values are handled correctly.
Exhaust as many code paths as possible. Test where branching occurs and thus
tailor input to make sure as many different paths through the code are taken.
Add an explicit test for any bugs discovered for the tested code. This will
make sure that the error does not crop up again if the code is changed in the
future.
Make sure to clean up after your tests (such as close and remove all temporary
files).
If a test is dependent on a specific condition of the operating system then
verify the condition already exists before attempting the test.
Import as few modules as possible and do it as soon as possible. This
minimizes external dependencies of tests and also minimizes possible anomalous
behavior from side-effects of importing a module.
Try to maximize code reuse. On occasion, tests will vary by something as small
as what type of input is used. Minimize code duplication by subclassing a
basic test class with a class that specifies the input:
The test package can be run as a script to drive Python’s regression
test suite, thanks to the -m option: python -m test. Under
the hood, it uses test.regrtest; the call python -m
test.regrtest used in previous Python versions still works.
Running the script by itself automatically starts running all regression
tests in the test package. It does this by finding all modules in the
package whose name starts with test_, importing them, and executing the
function test_main() if present. The names of tests to execute may also
be passed to the script. Specifying a single regression test (python
-m test test_spam) will minimize output, printing only
whether the test passed or failed.
Running test directly allows you to set which resources are available
for tests to use. You do this with the -u command-line
option. Run python -m test -uall to turn on all
resources; specifying all as an option for -u enables all
possible resources. If all but one resource is desired (a more common case), a
comma-separated list of resources that are not desired may be listed after
all. The command python -m test -uall,-audio,-largefile
will run test with all resources except the audio and
largefile resources. For a list of all resources and more command-line
options, run python -m test -h.
Some other ways to execute the regression tests depend on what platform the
tests are being executed on. On Unix, you can run make test at the
top-level directory where Python was built. On Windows,
executing rt.bat from your PCBuild directory will run all
regression tests.
The test.support module provides support for Python’s regression
test suite.
Note
test.support is not a public module. It is documented here to help
Python developers write tests. The API of this module is subject to change
without backwards compatibility concerns between releases.
True when verbose output is enabled. Should be checked when more
detailed information is desired about a running test. verbose is set by
test.regrtest.
Raise ResourceDenied if resource is not available. msg is the
argument to ResourceDenied if it is raised. Always returns
True if called by a function whose __name__ is '__main__'.
Used when tests are executed by test.regrtest.
Return the path to the file named filename. If no match is found,
filename is returned. This does not equal a failure, since it could be the
path to the file.
Execute unittest.TestCase subclasses passed to the function. The
function scans the classes for methods starting with the prefix test_
and executes the tests individually.
It is also legal to pass strings as parameters; these should be keys in
sys.modules. Each associated module will be scanned by
unittest.TestLoader.loadTestsFromModule(). This is usually seen in the
following test_main() function:
def test_main():
    support.run_unittest(__name__)
This will run all tests defined in the named module.
A convenience wrapper for warnings.catch_warnings() that makes it
easier to test that a warning was correctly raised. It is approximately
equivalent to calling warnings.catch_warnings(record=True) with
warnings.simplefilter() set to always and with the option to
automatically validate the results that are recorded.
check_warnings accepts 2-tuples of the form ("message regexp",
WarningCategory) as positional arguments. If one or more filters are
it checks to make sure the warnings are as expected: each specified filter
must match at least one of the warnings raised by the enclosed code or the
test fails, and if any warnings are raised that do not match any of the
specified filters the test fails. To disable the first of these checks,
set quiet to True.
If no arguments are specified, it defaults to:
check_warnings(("",Warning),quiet=True)
In this case all warnings are caught and no errors are raised.
On entry to the context manager, a WarningRecorder instance is
returned. The underlying warnings list from
catch_warnings() is available via the recorder object’s
warnings attribute. As a convenience, the attributes of the object
representing the most recent warning can also be accessed directly through
the recorder object (see example below). If no warning has been raised,
then any of the attributes that would otherwise be expected on an object
representing a warning will return None.
The recorder object also has a reset() method, which clears the
warnings list.
The context manager is designed to be used like this:
with check_warnings(("assertion is always true", SyntaxWarning),
                    ("", UserWarning)):
    exec('assert(False, "Hey!")')
    warnings.warn(UserWarning("Hide me!"))
In this case if either warning was not raised, or some other warning was
raised, check_warnings() would raise an error.
When a test needs to look more deeply into the warnings, rather than
just checking whether or not they occurred, code like this can be used:
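with check_warnings(quiet=True) as w:
    warnings.warn("foo")
    assert str(w.args[0]) == "foo"
    warnings.warn("bar")
    assert str(w.args[0]) == "bar"
    assert str(w.warnings[0].args[0]) == "foo"
    assert str(w.warnings[1].args[0]) == "bar"
    w.reset()
    assert len(w.warnings) == 0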
This is a context manager that runs the with statement body using
an io.StringIO object as sys.stdout. That object can be
retrieved using the as clause of the with statement.
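A short sketch of typical usage, assuming the context manager is
test.support.captured_stdout():
from test.support import captured_stdout

with captured_stdout() as s:
    print("hello")
assert s.getvalue() == "hello\n"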
This function imports and returns a fresh copy of the named Python module
by removing the named module from sys.modules before doing the import.
Note that unlike reload(), the original module is not affected by
this operation.
fresh is an iterable of additional module names that are also removed
from the sys.modules cache before doing the import.
blocked is an iterable of module names that are replaced with 0
in the module cache during the import to ensure that attempts to import
them raise ImportError.
The named module and any modules named in the fresh and blocked
parameters are saved before starting the import and then reinserted into
sys.modules when the fresh import is complete.
Module and package deprecation messages are suppressed during this import
if deprecated is True.
This function will raise unittest.SkipTest if the named module
cannot be imported.
Example use:
# Get copies of the warnings module for testing without affecting the
# version being used by the rest of the test suite. One copy uses the
# C implementation, the other is forced to use the pure Python fallback
# implementation.
py_warnings = import_fresh_module('warnings', blocked=['_warnings'])
c_warnings = import_fresh_module('warnings', fresh=['_warnings'])
New in version 3.1.
The test.support module defines the following classes:
class test.support.TransientResource(exc, **kwargs)
Instances are a context manager that raises ResourceDenied if the
specified exception type is raised. Any keyword arguments are treated as
attribute/value pairs to be compared against any exception raised within the
with statement. Only if all pairs match properly against
attributes on the exception is ResourceDenied raised.
Class used to temporarily set or unset environment variables. Instances can
be used as a context manager and have a complete dictionary interface for
querying/modifying the underlying os.environ. After exit from the
context manager all changes to environment variables done through this
instance will be rolled back.
Changed in version 3.1: Added dictionary interface.
These libraries help you with Python development: the debugger enables you to
step through code, analyze stack frames and set breakpoints etc., and the
profilers run code and give you a detailed breakdown of execution times,
allowing you to identify bottlenecks in your programs.
class bdb.Breakpoint(file, line, temporary=0, cond=None, funcname=None)
This class implements temporary breakpoints, ignore counts, disabling and
(re-)enabling, and conditionals.
Breakpoints are indexed by number through a list called bpbynumber
and by (file,line) pairs through bplist. The former points to a
single instance of class Breakpoint. The latter points to a list of
such instances since there may be more than one breakpoint per line.
When creating a breakpoint, its associated filename should be in canonical
form. If a funcname is defined, a breakpoint hit will be counted when the
first line of that function is executed. A conditional breakpoint always
counts a hit.
Delete the breakpoint from the list associated to a file/line. If it is
the last breakpoint in that position, it also deletes the entry for the
file/line.
The Bdb class acts as a generic Python debugger base class.
This class takes care of the details of the trace facility; a derived class
should implement user interaction. The standard debugger class
(pdb.Pdb) is an example.
The skip argument, if given, must be an iterable of glob-style
module name patterns. The debugger will not step into frames that
originate in a module that matches one of these patterns. Whether a
frame is considered to originate in a certain module is determined
by the __name__ in the frame globals.
New in version 3.1: The skip argument.
The following methods of Bdb normally don’t need to be overridden.
Auxiliary method for getting a filename in a canonical form, that is, as a
case-normalized (on case-insensitive filesystems) absolute path, stripped
of surrounding angle brackets.
This function is installed as the trace function of debugged frames. Its
return value is the new trace function (in most cases, that is, itself).
The default implementation decides how to dispatch a frame, depending on
the type of event (passed as a string) that is about to be executed.
event can be one of the following:
"line": A new line of code is going to be executed.
"call": A function is about to be called, or another code block
entered.
"return": A function or other code block is about to return.
"exception": An exception has occurred.
"c_call": A C function is about to be called.
"c_return": A C function has returned.
"c_exception": A C function has raised an exception.
For the Python events, specialized functions (see below) are called. For
the C events, no action is taken.
The arg parameter depends on the previous event.
See the documentation for sys.settrace() for more information on the
trace function. For more information on code and frame objects, refer to
The standard type hierarchy.
If the debugger should stop on the current line, invoke the
user_line() method (which should be overridden in subclasses).
Raise a BdbQuit exception if the Bdb.quitting flag is set
(which can be set from user_line()). Return a reference to the
trace_dispatch() method for further tracing in that scope.
If the debugger should stop on this function call, invoke the
user_call() method (which should be overridden in subclasses).
Raise a BdbQuit exception if the Bdb.quitting flag is set
(which can be set from user_call()). Return a reference to the
trace_dispatch() method for further tracing in that scope.
If the debugger should stop on this function return, invoke the
user_return() method (which should be overridden in subclasses).
Raise a BdbQuit exception if the Bdb.quitting flag is set
(which can be set from user_return()). Return a reference to the
trace_dispatch() method for further tracing in that scope.
If the debugger should stop at this exception, invokes the
user_exception() method (which should be overridden in subclasses).
Raise a BdbQuit exception if the Bdb.quitting flag is set
(which can be set from user_exception()). Return a reference to the
trace_dispatch() method for further tracing in that scope.
Normally derived classes don’t override the following methods, but they may
if they want to redefine the definition of stopping and breakpoints.
This method checks if there is a breakpoint in the filename and line
belonging to frame or, at least, in the current function. If the
breakpoint is a temporary one, this method deletes it.
Set the quitting attribute to True. This raises BdbQuit in
the next call to one of the dispatch_*() methods.
Derived classes and clients can call the following methods to manipulate
breakpoints. These methods return a string containing an error message if
something went wrong, or None if all is well.
Set a new breakpoint. If the lineno line doesn’t exist for the
filename passed as argument, return an error message. The filename
should be in canonical form, as described in the canonic() method.
Return a breakpoint specified by the given number. If arg is a string,
it will be converted to a number. If arg is a non-numeric string, or if
the given breakpoint never existed or has been deleted, ValueError is
raised.
Check whether we should break here, depending on the way the breakpoint b
was set.
If it was set via line number, it checks if b.line is the same as the one
in the frame also passed as argument. If the breakpoint was set via function
name, we have to check we are in the right frame (the right function) and if
we are in its first executable line.
Determine if there is an effective (active) breakpoint at this line of code.
Return a tuple of the breakpoint and a boolean that indicates if it is ok
to delete a temporary breakpoint. Return (None, None) if there is no
matching breakpoint.
The module pdb defines an interactive source code debugger for Python
programs. It supports setting (conditional) breakpoints and single stepping at
the source line level, inspection of stack frames, source code listing, and
evaluation of arbitrary Python code in the context of any stack frame. It also
supports post-mortem debugging and can be called under program control.
The debugger is extensible – it is actually defined as the class Pdb.
This is currently undocumented but easily understood by reading the source. The
extension interface uses the modules bdb and cmd.
The debugger’s prompt is (Pdb). Typical usage to run a program under control
of the debugger is:
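>>> import pdb
>>> import mymodule
>>> pdb.run('mymodule.test()')
> <string>(0)?()
(Pdb) continue
> <string>(1)?()
(Pdb) continue
NameError: 'spam'
> <string>(1)?()
(Pdb)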
pdb.py can also be invoked as a script to debug other scripts. For
example:
python3 -m pdb myscript.py
When invoked as a script, pdb will automatically enter post-mortem debugging if
the program being debugged exits abnormally. After post-mortem debugging (or
after normal exit of the program), pdb will restart the program. Automatic
restarting preserves pdb’s state (such as breakpoints) and in most cases is more
useful than quitting the debugger upon the program’s exit.
New in version 3.2: pdb.py now accepts a -c option that executes commands as if given
in a .pdbrc file; see Debugger Commands.
The typical usage to break into the debugger from a running program is to
insert
import pdb; pdb.set_trace()
at the location you want to break into the debugger. You can then step through
the code following this statement, and continue running without the debugger
using the continue command.
The typical usage to inspect a crashed program is:
>>> import pdb
>>> import mymodule
>>> mymodule.test()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "./mymodule.py", line 4, in test
    test2()
  File "./mymodule.py", line 3, in test2
    print(spam)
NameError: spam
>>> pdb.pm()
> ./mymodule.py(3)test2()
-> print(spam)
(Pdb)
The module defines the following functions; each enters the debugger in a
slightly different way:
Execute the statement (given as a string or a code object) under debugger
control. The debugger prompt appears before any code is executed; you can
set breakpoints and type continue, or you can step through the
statement using step or next (all these commands are
explained below). The optional globals and locals arguments specify the
environment in which the code is executed; by default the dictionary of the
module __main__ is used. (See the explanation of the built-in
exec() or eval() functions.)
Evaluate the expression (given as a string or a code object) under debugger
control. When runeval() returns, it returns the value of the
expression. Otherwise this function is similar to run().
Call the function (a function or method object, not a string) with the
given arguments. When runcall() returns, it returns whatever the
function call returned. The debugger prompt appears as soon as the function
is entered.
Enter the debugger at the calling stack frame. This is useful to hard-code a
breakpoint at a given point in a program, even if the code is not otherwise
being debugged (e.g. when an assertion fails).
Enter post-mortem debugging of the given traceback object. If no
traceback is given, it uses the one of the exception that is currently
being handled (an exception must be being handled if the default is to be
used).
Enter post-mortem debugging of the traceback found in
sys.last_traceback.
The run* functions and set_trace() are aliases for instantiating the
Pdb class and calling the method of the same name. If you want to
access further features, you have to do this yourself:
class pdb.Pdb(completekey='tab', stdin=None, stdout=None, skip=None, nosigint=False)
The completekey, stdin and stdout arguments are passed to the
underlying cmd.Cmd class; see the description there.
The skip argument, if given, must be an iterable of glob-style module name
patterns. The debugger will not step into frames that originate in a module
that matches one of these patterns. [1]
By default, Pdb sets a handler for the SIGINT signal (which is sent when the
user presses Ctrl-C on the console) when you give a continue command.
This allows you to break into the debugger again by pressing Ctrl-C. If you
want Pdb not to touch the SIGINT handler, set nosigint to true.
Example call to enable tracing with skip:
import pdb; pdb.Pdb(skip=['django.*']).set_trace()
New in version 3.1: The skip argument.
New in version 3.2: The nosigint argument. Previously, a SIGINT handler was never set by
Pdb.
The commands recognized by the debugger are listed below. Most commands can be
abbreviated to one or two letters as indicated; e.g. h(elp) means that
either h or help can be used to enter the help command (but not he
or hel, nor H or Help or HELP). Arguments to commands must be
separated by whitespace (spaces or tabs). Optional arguments are enclosed in
square brackets ([]) in the command syntax; the square brackets must not be
typed. Alternatives in the command syntax are separated by a vertical bar
(|).
Entering a blank line repeats the last command entered. Exception: if the last
command was a list command, the next 11 lines are listed.
Commands that the debugger doesn’t recognize are assumed to be Python statements
and are executed in the context of the program being debugged. Python
statements can also be prefixed with an exclamation point (!). This is a
powerful way to inspect the program being debugged; it is even possible to
change a variable or call a function. When an exception occurs in such a
statement, the exception name is printed but the debugger’s state is not
changed.
The debugger supports aliases. Aliases can have parameters, which allows a
certain level of adaptability to the context under examination.
Multiple commands may be entered on a single line, separated by ;;. (A
single ; is not used as it is the separator for multiple commands in a line
that is passed to the Python parser.) No intelligence is applied to separating
the commands; the input is split at the first ;; pair, even if it is in the
middle of a quoted string.
If a file .pdbrc exists in the user’s home directory or in the current
directory, it is read in and executed as if it had been typed at the debugger
prompt. This is particularly useful for aliases. If both files exist, the one
in the home directory is read first and aliases defined there can be overridden
by the local file.
Changed in version 3.2: .pdbrc can now contain commands that continue debugging, such as
continue or next. Previously, these commands had no
effect.
Without argument, print the list of available commands. With a command as
argument, print help about that command. help pdb displays the full
documentation (the docstring of the pdb module). Since the command
argument must be an identifier, help exec must be entered to get help on
the ! command.
With a lineno argument, set a break there in the current file. With a
function argument, set a break at the first executable statement within
that function. The line number may be prefixed with a filename and a colon,
to specify a breakpoint in another file (probably one that hasn’t been loaded
yet). The file is searched on sys.path. Note that each breakpoint
is assigned a number to which all the other breakpoint commands refer.
If a second argument is present, it is an expression which must evaluate to
true before the breakpoint is honored.
Without argument, list all breaks, including for each breakpoint, the number
of times that breakpoint has been hit, the current ignore count, and the
associated condition if any.
With a filename:lineno argument, clear all the breakpoints at this line.
With a space separated list of breakpoint numbers, clear those breakpoints.
Without argument, clear all breaks (but first ask confirmation).
Disable the breakpoints given as a space separated list of breakpoint
numbers. Disabling a breakpoint means it cannot cause the program to stop
execution, but unlike clearing a breakpoint, it remains in the list of
breakpoints and can be (re-)enabled.
Set the ignore count for the given breakpoint number. If count is omitted,
the ignore count is set to 0. A breakpoint becomes active when the ignore
count is zero. When non-zero, the count is decremented each time the
breakpoint is reached and the breakpoint is not disabled and any associated
condition evaluates to true.
Set a new condition for the breakpoint, an expression which must evaluate
to true before the breakpoint is honored. If condition is absent, any
existing condition is removed; i.e., the breakpoint is made unconditional.
Specify a list of commands for breakpoint number bpnumber. The commands
themselves appear on the following lines. Type a line containing just
end to terminate the commands. An example:
(Pdb) commands 1
(com) print some_variable
(com) end
(Pdb)
To remove all commands from a breakpoint, type commands and follow it
immediately with end; that is, give no commands.
With no bpnumber argument, commands refers to the last breakpoint set.
You can use breakpoint commands to start your program up again. Simply use
the continue command, or step, or any other command that resumes execution.
Specifying any command resuming execution (currently continue, step, next,
return, jump, quit and their abbreviations) terminates the command list (as if
that command was immediately followed by end). This is because any time you
resume execution (even with a simple next or step), you may encounter another
breakpoint, which could have its own command list, leading to ambiguities about
which list to execute.
If you use the ‘silent’ command in the command list, the usual message about
stopping at a breakpoint is not printed. This may be desirable for breakpoints
that are to print a specific message and then continue. If none of the other
commands print anything, you see no sign that the breakpoint was reached.
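As an illustrative sketch, the following command list prints a message and
resumes execution without the usual stop notice; it assumes breakpoint 1
exists and that a variable x is visible there:
(Pdb) commands 1
(com) silent
(com) print("x is now", x)
(com) continue
(com) end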
Continue execution until the next line in the current function is reached or
it returns. (The difference between next and step is
that step stops inside a called function, while next
executes called functions at (nearly) full speed, only stopping at the next
line in the current function.)
Without argument, continue execution until the line with a number greater
than the current one is reached.
With a line number, continue execution until a line with a number greater or
equal to that is reached. In both cases, also stop when the current frame
returns.
Changed in version 3.2: Allow giving an explicit line number.
Set the next line that will be executed. Only available in the bottom-most
frame. This lets you jump back and execute code again, or jump forward to
skip code that you don’t want to run.
It should be noted that not all jumps are allowed – for instance it is not
possible to jump into the middle of a for loop or out of a
finally clause.
List source code for the current file. Without arguments, list 11 lines
around the current line or continue the previous listing. With . as
argument, list 11 lines around the current line. With one argument,
list 11 lines around that line. With two arguments, list the given range;
if the second argument is less than the first, it is interpreted as a count.
The current line in the current frame is indicated by ->. If an
exception is being debugged, the line where the exception was originally
raised or propagated is indicated by >>, if it differs from the current
line.
Create an alias called name that executes command. The command must
not be enclosed in quotes. Replaceable parameters can be indicated by
%1, %2, and so on, while %* is replaced by all the parameters.
If no command is given, the current alias for name is shown. If no
arguments are given, all aliases are listed.
Aliases may be nested and can contain anything that can be legally typed at
the pdb prompt. Note that internal pdb commands can be overridden by
aliases. Such a command is then hidden until the alias is removed. Aliasing
is recursively applied to the first word of the command line; all other words
in the line are left alone.
As an example, here are two useful aliases (especially when placed in the
.pdbrc file):
# Print instance variables (usage "pi classInst")
alias pi for k in %1.__dict__.keys(): print("%1.", k, "=", %1.__dict__[k])
# Print instance variables in self
alias ps pi self
Execute the (one-line) statement in the context of the current stack frame.
The exclamation point can be omitted unless the first word of the statement
resembles a debugger command. To set a global variable, you can prefix the
assignment command with a global statement on the same line,
e.g.:
(Pdb) global list_options; list_options = ['-l']
(Pdb)
Restart the debugged Python program. If an argument is supplied, it is split
with shlex and the result is used as the new sys.argv.
History, breakpoints, actions and debugger options are preserved.
restart is an alias for run.
A profiler is a program that describes the run time performance of a
program, providing a variety of statistics. This documentation describes the
profiler functionality provided in the modules cProfile, profile
and pstats. This profiler provides deterministic profiling of
Python programs. It also provides a series of report generation tools to allow
users to rapidly examine the results of a profile operation.
The Python standard library provides two different profilers:
cProfile is recommended for most users; it’s a C extension with
reasonable overhead that makes it suitable for profiling long-running
programs. Based on lsprof, contributed by Brett Rosen and Ted
Czotter.
profile, a pure Python module whose interface is imitated by
cProfile. Adds significant overhead to profiled programs. If you’re
trying to extend the profiler in some way, the task might be easier with this
module.
The profile and cProfile modules export the same interface, so
they are mostly interchangeable; cProfile has a much lower overhead but
is newer and might not be available on all systems. cProfile is really a
compatibility layer on top of the internal _lsprof module.
Note
The profiler modules are designed to provide an execution profile for a given
program, not for benchmarking purposes (for that, there is timeit for
reasonably accurate results). This particularly applies to benchmarking
Python code against C code: the profilers introduce overhead for Python code,
but not for C-level functions, and so the C code would seem faster than any
Python one.
This section is provided for users that “don’t want to read the manual.” It
provides a very brief overview, and allows a user to rapidly perform profiling
on an existing application.
To profile an application with a main entry point of foo(), you would add
the following to your module:
import cProfile
cProfile.run('foo()')
(Use profile instead of cProfile if the latter is not available on
your system.)
The above action would cause foo() to be run, and a series of informative
lines (the profile) to be printed. The above approach is most useful when
working with the interpreter. If you would like to save the results of a
profile into a file for later examination, you can supply a file name as the
second argument to the run() function:
import cProfile
cProfile.run('foo()', 'fooprof')
The file cProfile.py can also be invoked as a script to profile another
script. For example:
python -m cProfile myscript.py
cProfile.py accepts two optional arguments on the command line:
cProfile.py [-o output_file] [-s sort_order]
-s applies only when the profile is written to standard output, i.e. when
-o is not supplied.
Look in the Stats documentation for valid sort values.
When you wish to review the profile, you should use the methods in the
pstats module. Typically you would load the statistics data as follows:
import pstats
p = pstats.Stats('fooprof')
The class Stats (the above code just created an instance of this class)
has a variety of methods for manipulating and printing the data that was just
read into p. When you ran cProfile.run() above, what was printed was
the result of three method calls:
p.strip_dirs().sort_stats(-1).print_stats()
The first method removed the extraneous path from all the module names. The
second method sorted all the entries according to the standard module/line/name
string that is printed. The third method printed out all the statistics. You
might try the following sort calls:
p.sort_stats('name')
p.print_stats()
The first call will actually sort the list by function name, and the second call
will print out the statistics. The following are some interesting calls to
experiment with:
p.sort_stats('cumulative').print_stats(10)
This sorts the profile by cumulative time in a function, and then only prints
the ten most significant lines. If you want to understand what algorithms are
taking time, the above line is what you would use.
If you were looking to see what functions were looping a lot, and taking a lot
of time, you would do:
p.sort_stats('time').print_stats(10)
to sort according to time spent within each function, and then print the
statistics for the top ten functions.
You might also try:
p.sort_stats('file').print_stats('__init__')
This will sort all the statistics by file name, and then print out statistics
for only the class init methods (since they are spelled with __init__ in
them). As one final example, you could try:
p.sort_stats('time', 'cum').print_stats(.5, 'init')
This line sorts statistics with a primary key of time, and a secondary key of
cumulative time, and then prints out some of the statistics. To be specific, the
list is first culled down to 50% (re: .5) of its original size, then only
lines containing init are maintained, and that sub-sub-list is printed.
If you wondered what functions called the above functions, you could now (p
is still sorted according to the last criteria) do:
p.print_callers(.5, 'init')
and you would get a list of callers for each of the listed functions.
If you want more functionality, you’re going to have to read the manual, or
guess what the following functions do:
p.print_callees()
p.add('fooprof')
Invoked as a script, the pstats module is a statistics browser for
reading and examining profile dumps. It has a simple line-oriented interface
(implemented using cmd) and interactive help.
Deterministic profiling is meant to reflect the fact that all function
call, function return, and exception events are monitored, and precise
timings are made for the intervals between these events (during which time the
user’s code is executing). In contrast, statistical profiling (which is
not done by this module) randomly samples the effective instruction pointer, and
deduces where time is being spent. The latter technique traditionally involves
less overhead (as the code does not need to be instrumented), but provides only
relative indications of where time is being spent.
In Python, since there is an interpreter active during execution, the presence
of instrumented code is not required to do deterministic profiling. Python
automatically provides a hook (optional callback) for each event. In
addition, the interpreted nature of Python tends to add so much overhead to
execution, that deterministic profiling tends to only add small processing
overhead in typical applications. The result is that deterministic profiling is
not that expensive, yet provides extensive run time statistics about the
execution of a Python program.
Call count statistics can be used to identify bugs in code (surprising counts),
and to identify possible inline-expansion points (high call counts). Internal
time statistics can be used to identify “hot loops” that should be carefully
optimized. Cumulative time statistics should be used to identify high level
errors in the selection of algorithms. Note that the unusual handling of
cumulative times in this profiler allows statistics for recursive
implementations of algorithms to be directly compared to iterative
implementations.
The primary entry point for the profiler is the global function
profile.run() (resp. cProfile.run()). It is typically used to create
any profile information. The reports are formatted and printed using methods of
the class pstats.Stats. The following is a description of all of these
standard entry points and functions. For a more in-depth view of some of the
code, consider reading the later section on Profiler Extensions, which includes
discussion of how to derive “better” profilers from the classes presented, or
reading the source code for these modules.
This function takes a single argument that can be passed to the exec()
function, and an optional file name. In all cases this routine attempts to
exec() its first argument, and gather profiling statistics from the
execution. If no file name is present, then this function automatically
prints a simple profiling report, sorted by the standard name string
(file/line/function-name) that is presented in each line. The following is a
typical output from such a call:
2706 function calls (2004 primitive calls) in 4.504 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
2 0.006 0.003 0.953 0.477 pobject.py:75(save_objects)
43/3 0.533 0.012 0.749 0.250 pobject.py:99(evaluate)
...
The first line indicates that 2706 calls were monitored. Of those calls, 2004
were primitive. We define primitive to mean that the call was not
induced via recursion. The next line: Ordered by: standard name, indicates
that the text string in the far right column was used to sort the output. The
column headings include:
ncalls
    the number of calls
tottime
    the total time spent in the given function (excluding time spent in
    calls to sub-functions)
percall
    the quotient of tottime divided by ncalls
cumtime
    the total time spent in this and all subfunctions (from invocation
    till exit); this figure is accurate even for recursive functions
percall
    the quotient of cumtime divided by primitive calls
filename:lineno(function)
    the respective data of each function
When there are two numbers in the first column (for example, 43/3), then the
latter is the number of primitive calls, and the former is the actual number of
calls. Note that when the function does not recurse, these two values are the
same, and only the single figure is printed.
If sort is given, it can be one of 'stdname' (sort by filename:lineno),
'calls' (sort by number of calls), 'time' (sort by total time) or
'cumulative' (sort by cumulative time). The default is 'stdname'.
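As a minimal sketch, assuming a function foo() has been defined as in the
quick-start example above, the report can be sorted directly by passing the
sort argument:
import cProfile
cProfile.run('foo()', sort='cumulative')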
This function is similar to run(), with added arguments to supply the
globals and locals dictionaries for the command string.
Analysis of the profiler data is done using the pstats.Stats class.
class pstats.Stats(*filenames, stream=sys.stdout)
This class constructor creates an instance of a “statistics object” from a
filename (or set of filenames). Stats objects are manipulated by
methods, in order to print useful reports. You may specify an alternate output
stream by giving the keyword argument, stream.
The file selected by the above constructor must have been created by the
corresponding version of profile or cProfile. To be specific,
there is no file compatibility guaranteed with future versions of this
profiler, and there is no compatibility with files produced by other profilers.
If several files are provided, all the statistics for identical functions will
be coalesced, so that an overall view of several processes can be considered in
a single report. If additional files need to be combined with data in an
existing Stats object, the add() method can be used.
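A short sketch of combining several dumps into one report; run1.prof,
run2.prof and run3.prof are hypothetical file names assumed to have been
created beforehand with cProfile.run():
import pstats

p = pstats.Stats('run1.prof', 'run2.prof')   # identical functions coalesced
p.add('run3.prof')                           # merge in yet another dump
p.strip_dirs().sort_stats('cumulative').print_stats(10)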
This method for the Stats class removes all leading path information
from file names. It is very useful in reducing the size of the printout to fit
within (close to) 80 columns. This method modifies the object, and the stripped
information is lost. After performing a strip operation, the object is
considered to have its entries in a “random” order, as it was just after object
initialization and loading. If strip_dirs() causes two function names to
be indistinguishable (they are on the same line of the same filename, and have
the same function name), then the statistics for these two entries are
accumulated into a single entry.
This method of the Stats class accumulates additional profiling
information into the current profiling object. Its arguments should refer to
filenames created by the corresponding version of profile.run() or
cProfile.run(). Statistics for identically named (re: file, line, name)
functions are automatically accumulated into single function statistics.
Save the data loaded into the Stats object to a file named filename.
The file is created if it does not exist, and is overwritten if it already
exists. This is equivalent to the method of the same name on the
profile.Profile and cProfile.Profile classes.
This method modifies the Stats object by sorting it according to the
supplied criteria. The argument is typically a string identifying the basis of
a sort (example: 'time' or 'name').
When more than one key is provided, then additional keys are used as secondary
criteria when there is equality in all keys selected before them. For example,
sort_stats('name', 'file') will sort all the entries according to their
function name, and resolve all ties (identical function names) by sorting by
file name.
Abbreviations can be used for any key names, as long as the abbreviation is
unambiguous. The following are the keys currently defined:
Valid Arg       Meaning
'calls'         call count
'cumulative'    cumulative time
'file'          file name
'module'        file name
'pcalls'        primitive call count
'line'          line number
'name'          function name
'nfl'           name/file/line
'stdname'       standard name
'time'          internal time
Note that all sorts on statistics are in descending order (placing most time
consuming items first), whereas name, file, and line number searches are in
ascending order (alphabetical). The subtle distinction between 'nfl' and
'stdname' is that the standard name is a sort of the name as printed, which
means that the embedded line numbers get compared in an odd way. For example,
lines 3, 20, and 40 would (if the file names were the same) appear in the string
order 20, 3 and 40. In contrast, 'nfl' does a numeric compare of the line
numbers. In fact, sort_stats('nfl') is the same as sort_stats('name', 'file', 'line').
For backward-compatibility reasons, the numeric arguments -1, 0, 1,
and 2 are permitted. They are interpreted as 'stdname', 'calls',
'time', and 'cumulative' respectively. If this old style format
(numeric) is used, only one sort key (the numeric key) will be used, and
additional arguments will be silently ignored.
This method for the Stats class reverses the ordering of the basic list
within the object. Note that by default ascending vs descending order is
properly selected based on the sort key of choice.
This method for the Stats class prints out a report as described in the
profile.run() definition.
The order of the printing is based on the last sort_stats() operation done
on the object (subject to caveats in add() and strip_dirs()).
The arguments provided (if any) can be used to limit the list down to the
significant entries. Initially, the list is taken to be the complete set of
profiled functions. Each restriction is either an integer (to select a count of
lines), or a decimal fraction between 0.0 and 1.0 inclusive (to select a
percentage of lines), or a regular expression (to pattern match the standard
name that is printed; as of Python 1.5b1, this uses the Perl-style regular
expression syntax defined by the re module). If several restrictions are
provided, then they are applied sequentially. For example:
print_stats(.1, 'foo:')
would first limit the printing to the first 10% of the list, and then only print
functions that were part of filename .*foo:. In contrast, the
command:
print_stats('foo:', .1)
would limit the list to all functions having file names .*foo:, and
then proceed to only print the first 10% of them.
This method for the Stats class prints a list of all functions that
called each function in the profiled database. The ordering is identical to
that provided by print_stats(), and the definition of the restricting
argument is also identical. Each caller is reported on its own line. The
format differs slightly depending on the profiler that produced the stats:
With profile, a number is shown in parentheses after each caller to
show how many times this specific call was made. For convenience, a second
non-parenthesized number repeats the cumulative time spent in the function
at the right.
With cProfile, each caller is preceded by three numbers: the number of
times this specific call was made, and the total and cumulative times spent in
the current function while it was invoked by this specific caller.
This method for the Stats class prints a list of all functions that were
called by the indicated function. Aside from this reversal of direction of
calls (re: called vs was called by), the arguments and ordering are identical to
the print_callers() method.
One limitation has to do with accuracy of timing information. There is a
fundamental problem with deterministic profilers involving accuracy. The most
obvious restriction is that the underlying “clock” is only ticking at a rate
(typically) of about .001 seconds. Hence no measurements will be more accurate
than the underlying clock. If enough measurements are taken, then the “error”
will tend to average out. Unfortunately, removing this first error induces a
second source of error.
The second problem is that it “takes a while” from when an event is dispatched
until the profiler’s call to get the time actually gets the state of the
clock. Similarly, there is a certain lag when exiting the profiler event
handler from the time that the clock’s value was obtained (and then squirreled
away), until the user’s code is once again executing. As a result, functions
that are called many times, or call many functions, will typically accumulate
this error. The error that accumulates in this fashion is typically less than
the accuracy of the clock (less than one clock tick), but it can accumulate
and become very significant.
The problem is more important with profile than with the lower-overhead
cProfile. For this reason, profile provides a means of
calibrating itself for a given platform so that this error can be
probabilistically (on the average) removed. After the profiler is calibrated, it
will be more accurate (in a least square sense), but it will sometimes produce
negative numbers (when call counts are exceptionally low, and the gods of
probability work against you :-)). Do not be alarmed by negative numbers in
the profile. They should only appear if you have calibrated your profiler,
and the results are actually better than without calibration.
The profiler of the profile module subtracts a constant from each event
handling time to compensate for the overhead of calling the time function, and
socking away the results. By default, the constant is 0. The following
procedure can be used to obtain a better constant for a given platform (see
discussion in section Limitations above).
The method executes the number of Python calls given by the argument, directly
and again under the profiler, measuring the time for both. It then computes the
hidden overhead per profiler event, and returns that as a float. For example,
on an 800 MHz Pentium running Windows 2000, and using Python’s time.clock() as
the timer, the magical number is about 12.5e-6.
The object of this exercise is to get a fairly consistent result. If your
computer is very fast, or your timer function has poor resolution, you might
have to pass 100000, or even 1000000, to get consistent results.
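A minimal sketch of the calibration loop, repeating the measurement to
check that the returned constant is consistent:
import profile

pr = profile.Profile()
for i in range(5):
    print(pr.calibrate(10000))   # the values should agree closely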
When you have a consistent answer, there are three ways you can use it:
import profile

# 1. Apply computed bias to all Profile instances created hereafter.
profile.Profile.bias = your_computed_bias

# 2. Apply computed bias to a specific Profile instance.
pr = profile.Profile()
pr.bias = your_computed_bias

# 3. Specify computed bias in instance constructor.
pr = profile.Profile(bias=your_computed_bias)
If you have a choice, you are better off choosing a smaller constant, and then
your results will “less often” show up as negative in profile statistics.
The Profile classes of both modules, profile and cProfile,
were written so that derived classes could be developed to extend the profiler.
The details are not described here, as doing this successfully requires an
expert understanding of how the Profile class works internally. Study
the source code of the module carefully if you want to pursue this.
If all you want to do is change how current time is determined (for example, to
force use of wall-clock time or elapsed process time), pass the timing function
you want to the Profile class constructor:
pr = profile.Profile(your_time_func)
The resulting profiler will then call your_time_func().
profile.Profile
your_time_func() should return a single number, or a list of numbers whose
sum is the current time (like what os.times() returns). If the function
returns a single time number, or the list of returned numbers has length 2, then
you will get an especially fast version of the dispatch routine.
Be warned that you should calibrate the profiler class for the timer function
that you choose. For most machines, a timer that returns a lone integer value
will provide the best results in terms of low overhead during profiling.
(os.times() is pretty bad, as it returns a tuple of floating point
values). If you want to substitute a better timer in the cleanest fashion,
derive a class and hardwire a replacement dispatch method that best handles your
timer call, along with the appropriate calibration constant.
cProfile.Profile
your_time_func() should return a single number. If it returns
integers, you can also invoke the class constructor with a second argument
specifying the real duration of one unit of time. For example, if
your_integer_time_func() returns times measured in thousands of seconds,
you would construct the Profile instance as follows:
pr = profile.Profile(your_integer_time_func, 0.001)
As the cProfile.Profile class cannot be calibrated, custom timer
functions should be used with care and should be as fast as possible. For the
best results with a custom timer, it might be necessary to hard-code it in the C
source of the internal _lsprof module.
timeit — Measure execution time of small code snippets
This module provides a simple way to time small bits of Python code. It has both
command line as well as callable interfaces. It avoids a number of common traps
for measuring execution times. See also Tim Peters’ introduction to the
“Algorithms” chapter in the Python Cookbook, published by O’Reilly.
The module defines the following public class:
class timeit.Timer(stmt='pass', setup='pass', timer=<timer function>)
Class for timing execution speed of small code snippets.
The constructor takes a statement to be timed, an additional statement used for
setup, and a timer function. Both statements default to 'pass'; the timer
function is platform-dependent (see the module doc string). stmt and setup
may also contain multiple statements separated by ; or newlines, as long as
they don’t contain multi-line string literals.
To measure the execution time of the first statement, use the timeit()
method. The repeat() method is a convenience to call timeit()
multiple times and return a list of results.
The stmt and setup parameters can also take objects that are callable
without arguments. This will embed calls to them in a timer function that
will then be executed by timeit(). Note that the timing overhead is a
little larger in this case because of the extra function calls.
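A small sketch passing a callable instead of a statement string (the test
function here is a made-up example):
import timeit

def test():
    # the statement under test; must be callable without arguments
    return "-".join(map(str, range(100)))

t = timeit.Timer(test)
print(t.timeit(number=10000))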
t = Timer(...)       # outside the try/except
try:
    t.timeit(...)    # or t.repeat(...)
except:
    t.print_exc()
The advantage over the standard traceback is that source lines in the compiled
template will be displayed. The optional file argument directs where the
traceback is sent; it defaults to sys.stderr.
This is a convenience function that calls the timeit() repeatedly,
returning a list of results. The first argument specifies how many times to
call timeit(). The second argument specifies the number argument for
timeit().
Note
It’s tempting to calculate mean and standard deviation from the result vector
and report these. However, this is not very useful. In a typical case, the
lowest value gives a lower bound for how fast your machine can run the given
code snippet; higher values in the result vector are typically not caused by
variability in Python’s speed, but by other processes interfering with your
timing accuracy. So the min() of the result is probably the only number
you should be interested in. After that, you should look at the entire vector
and apply common sense rather than statistics.
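For example, taking the minimum of the result vector as suggested above:
import timeit

t = timeit.Timer('"-".join(map(str, range(100)))')
print(min(t.repeat(repeat=3, number=10000)))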
Time number executions of the main statement. This executes the setup
statement once, and then returns the time it takes to execute the main statement
a number of times, measured in seconds as a float. The argument is the number
of times through the loop, defaulting to one million. The main statement, the
setup statement and the timer function to be used are passed to the constructor.
Note
By default, timeit() temporarily turns off garbage collection
during the timing. The advantage of this approach is that it makes
independent timings more comparable. The disadvantage is that GC may be
an important component of the performance of the function being measured.
If so, GC can be re-enabled as the first statement in the setup string.
For example:
timeit.Timer('for i in range(10): oct(i)', 'gc.enable()').timeit()
The module also defines two convenience functions:
Create a Timer instance with the given statement, setup code and timer
function and run its repeat() method with the given repeat count and
number executions.
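A short sketch using the repeat() convenience function described above:
import timeit

print(timeit.repeat('"-".join(map(str, range(100)))', repeat=3, number=10000))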
A multi-line statement may be given by specifying each line as a separate
statement argument; indented lines are possible by enclosing an argument in
quotes and using leading spaces. Multiple -s options are treated
similarly.
If -n is not given, a suitable number of loops is calculated by trying
successive powers of 10 until the total time is at least 0.2 seconds.
The default timer function is platform dependent. On Windows,
time.clock() has microsecond granularity but time.time()'s
granularity is 1/60th of a second; on Unix, time.clock() has 1/100th of a
second granularity and time.time() is much more precise. On either
platform, the default timer functions measure wall clock time, not the CPU time.
This means that other processes running on the same computer may interfere with
the timing. The best thing to do when accurate timing is necessary is to repeat
the timing a few times and use the best time. The -r option is good
for this; the default of 3 repetitions is probably enough in most cases. On
Unix, you can use time.clock() to measure CPU time.
Note
There is a certain baseline overhead associated with executing a pass statement.
The code here doesn’t try to hide it, but you should be aware of it. The
baseline overhead can be measured by invoking the program without arguments.
The baseline overhead differs between Python versions! Also, to fairly compare
older Python versions to Python 2.3, you may want to use Python’s -O
option for the older versions to avoid timing SET_LINENO instructions.
Here are two example sessions (one using the command line, one using the module
interface) that compare the cost of using hasattr() vs.
try/except to test for missing and present object
attributes.
$ python -m timeit 'try:' ' str.__bool__' 'except AttributeError:' ' pass'
100000 loops, best of 3: 15.7 usec per loop
$ python -m timeit 'if hasattr(str, "__bool__"): pass'
100000 loops, best of 3: 4.26 usec per loop
$ python -m timeit 'try:' ' int.__bool__' 'except AttributeError:' ' pass'
1000000 loops, best of 3: 1.43 usec per loop
$ python -m timeit 'if hasattr(int, "__bool__"): pass'
100000 loops, best of 3: 2.23 usec per loop
The trace module allows you to trace program execution, generate
annotated statement coverage listings, print caller/callee relationships and
list functions executed during a program run. It can be used in another program
or from the command line.
At least one of the following options must be specified when invoking
trace. The --listfuncs option is mutually exclusive with
the --trace and --counts options. When
--listfuncs is provided, neither --counts nor
--trace are accepted, and vice versa.
Produce a set of annotated listing files upon program completion that shows
how many times each statement was executed. See also
--coverdir, --file and
--no-report below.
Do not generate annotated listings. This is useful if you intend to make
several runs with --count, and then produce a single set of
annotated listings at the end.
class trace.Trace(count=1, trace=1, countfuncs=0, countcallers=0, ignoremods=(), ignoredirs=(), infile=None, outfile=None, timing=False)
Create an object to trace execution of a single statement or expression. All
parameters are optional. count enables counting of line numbers. trace
enables line execution tracing. countfuncs enables listing of the
functions called during the run. countcallers enables call relationship
tracking. ignoremods is a list of modules or packages to ignore.
ignoredirs is a list of directories whose modules or packages should be
ignored. infile is the name of the file from which to read stored count
information. outfile is the name of the file in which to write updated
count information. timing enables a timestamp relative to when tracing was
started to be displayed.
Execute the command and gather statistics from the execution with
the current tracing parameters. cmd must be a string or code object,
suitable for passing into exec().
Execute the command and gather statistics from the execution with the
current tracing parameters, in the defined global and local
environments. If not defined, globals and locals default to empty
dictionaries.
Return a CoverageResults object that contains the cumulative
results of all previous calls to run, runctx and runfunc
for the given Trace instance. Does not reset the accumulated
trace results.
Write coverage results. Set show_missing to show lines that had no
hits. Set summary to include in the output the coverage summary per
module. coverdir specifies the directory into which the coverage
result files will be output. If None, the results for each source
file are placed in its directory.
A simple example demonstrating the use of the programmatic interface:
import sys
import trace

# create a Trace object, telling it what to ignore, and whether to
# do tracing or line-counting or both.
tracer = trace.Trace(
    ignoredirs=[sys.prefix, sys.exec_prefix],
    trace=0,
    count=1)

# run the new command using the given tracer
tracer.run('main()')

# make a report, placing output in /tmp
r = tracer.results()
r.write_results(show_missing=True, coverdir="/tmp")
The modules described in this chapter provide a wide range of services related
to the Python interpreter and its interaction with its environment. Here’s an
overview:
This module provides access to some variables used or maintained by the
interpreter and to functions that interact strongly with the interpreter. It is
always available.
The list of command line arguments passed to a Python script. argv[0] is the
script name (it is operating system dependent whether this is a full pathname or
not). If the command was executed using the -c command line option to
the interpreter, argv[0] is set to the string '-c'. If no script name
was passed to the Python interpreter, argv[0] is the empty string.
To loop over the standard input, or the list of files given on the
command line, see the fileinput module.
An indicator of the native byte order. This will have the value 'big' on
big-endian (most-significant byte first) platforms, and 'little' on
little-endian (least-significant byte first) platforms.
A tuple of strings giving the names of all modules that are compiled into this
Python interpreter. (This information is not available in any other way —
modules.keys() only lists the imported modules.)
Call func(*args), while tracing is enabled. The tracing state is saved,
and restored afterwards. This is intended to be called from a debugger from
a checkpoint, to recursively debug some other code.
Clear the internal type cache. The type cache is used to speed up attribute
and method lookups. Use the function only to drop unnecessary references
during reference leak debugging.
This function should be used for internal and specialized purposes only.
Return a dictionary mapping each thread’s identifier to the topmost stack frame
currently active in that thread at the time the function is called. Note that
functions in the traceback module can build the call stack given such a
frame.
This is most useful for debugging deadlock: this function does not require the
deadlocked threads’ cooperation, and such threads’ call stacks are frozen for as
long as they remain deadlocked. The frame returned for a non-deadlocked thread
may bear no relationship to that thread’s current activity by the time calling
code examines the frame.
This function should be used for internal and specialized purposes only.
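A minimal sketch that dumps the stack of every thread, using the
traceback module as mentioned above:
import sys
import traceback

def dump_all_threads():
    for thread_id, frame in sys._current_frames().items():
        print("Thread %d:" % thread_id)
        traceback.print_stack(frame)

dump_all_threads()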
If value is not None, this function prints repr(value) to
sys.stdout, and saves value in builtins._. If repr(value) is
not encodable to sys.stdout.encoding with sys.stdout.errors error
handler (which is probably 'strict'), encode it to
sys.stdout.encoding with 'backslashreplace' error handler.
sys.displayhook is called on the result of evaluating an expression
entered in an interactive Python session. The display of these values can be
customized by assigning another one-argument function to sys.displayhook.
Pseudo-code:
def displayhook(value):
    if value is None:
        return
    # Set '_' to None to avoid recursion
    builtins._ = None
    text = repr(value)
    try:
        sys.stdout.write(text)
    except UnicodeEncodeError:
        bytes = text.encode(sys.stdout.encoding, 'backslashreplace')
        if hasattr(sys.stdout, 'buffer'):
            sys.stdout.buffer.write(bytes)
        else:
            text = bytes.decode(sys.stdout.encoding, 'strict')
            sys.stdout.write(text)
    sys.stdout.write("\n")
    builtins._ = value
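As an illustration, a minimal hypothetical replacement hook that tags each
result while preserving the '_' convention:
import builtins
import sys

def my_displayhook(value):
    if value is None:
        return
    builtins._ = None            # avoid recursion through '_'
    print("->", repr(value))
    builtins._ = value

sys.displayhook = my_displayhook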
Changed in version 3.2: Use 'backslashreplace' error handler on UnicodeEncodeError.
This function prints out a given traceback and exception to sys.stderr.
When an exception is raised and uncaught, the interpreter calls
sys.excepthook with three arguments, the exception class, exception
instance, and a traceback object. In an interactive session this happens just
before control is returned to the prompt; in a Python program this happens just
before the program exits. The handling of such top-level exceptions can be
customized by assigning another three-argument function to sys.excepthook.
These objects contain the original values of displayhook and excepthook
at the start of the program. They are saved so that displayhook and
excepthook can be restored in case they happen to get replaced with broken
objects.
This function returns a tuple of three values that give information about the
exception that is currently being handled. The information returned is specific
both to the current thread and to the current stack frame. If the current stack
frame is not handling an exception, the information is taken from the calling
stack frame, or its caller, and so on until a stack frame is found that is
handling an exception. Here, “handling an exception” is defined as “executing
an except clause.” For any stack frame, only information about the exception
being currently handled is accessible.
If no exception is being handled anywhere on the stack, a tuple containing
three None values is returned. Otherwise, the values returned are
(type, value, traceback). Their meaning is: type gets the type of the
exception being handled (a subclass of BaseException); value gets
the exception instance (an instance of the exception type); traceback gets
a traceback object (see the Reference Manual) which encapsulates the call
stack at the point where the exception originally occurred.
Warning
Assigning the traceback return value to a local variable in a function
that is handling an exception will cause a circular reference. Since most
functions don’t need access to the traceback, the best solution is to use
something like exctype, value = sys.exc_info()[:2] to extract only the
exception type and value. If you do need the traceback, make sure to
delete it after use (best done with a try
... finally statement) or to call exc_info() in a
function that does not itself handle an exception.
Such cycles are normally automatically reclaimed when garbage collection
is enabled and they become unreachable, but it remains more efficient to
avoid creating cycles.
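For example, extracting only the type and value as recommended:
import sys

try:
    1 / 0
except ZeroDivisionError:
    exctype, value = sys.exc_info()[:2]   # no traceback, so no cycle
    print(exctype.__name__, value)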
A string giving the site-specific directory prefix where the platform-dependent
Python files are installed; by default, this is also '/usr/local'. This can
be set at build time with the --exec-prefix argument to the
configure script. Specifically, all configuration files (e.g. the
pyconfig.h header file) are installed in the directory
exec_prefix + '/lib/pythonversion/config', and shared library modules are
installed in exec_prefix + '/lib/pythonversion/lib-dynload', where version
is equal to version[:3].
Exit from Python. This is implemented by raising the SystemExit
exception, so cleanup actions specified by finally clauses of try
statements are honored, and it is possible to intercept the exit attempt at
an outer level.
The optional argument arg can be an integer giving the exit status
(defaulting to zero), or another type of object. If it is an integer, zero
is considered “successful termination” and any nonzero value is considered
“abnormal termination” by shells and the like. Most systems require it to be
in the range 0-127, and produce undefined results otherwise. Some systems
have a convention for assigning specific meanings to specific exit codes, but
these are generally underdeveloped; Unix programs generally use 2 for command
line syntax errors and 1 for all other kind of errors. If another type of
object is passed, None is equivalent to passing zero, and any other
object is printed to stderr and results in an exit code of 1. In
particular, sys.exit("some error message") is a quick way to exit a
program when an error occurs.
Since exit() ultimately “only” raises an exception, it will only exit
the process when called from the main thread, and the exception is not
intercepted.
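A typical sketch (the script name and message are hypothetical):
import sys

if len(sys.argv) < 2:
    sys.exit("usage: myscript.py FILENAME")   # message to stderr, exit code 1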
A structseq holding information about the float type. It contains low level
information about the precision and internal representation. The values
correspond to the various floating-point constants defined in the standard
header file float.h for the ‘C’ programming language; see section
5.2.4.2.2 of the 1999 ISO/IEC C standard [C99], ‘Characteristics of
floating types’, for details.
attribute    float.h macro    explanation
epsilon      DBL_EPSILON      difference between 1 and the least value
                              greater than 1 that is representable as a float
dig          DBL_DIG          maximum number of decimal digits that can be
                              faithfully represented in a float; see below
mant_dig     DBL_MANT_DIG     float precision: the number of base-radix
                              digits in the significand of a float
min_exp      DBL_MIN_EXP      minimum integer e such that radix**(e-1) is
                              a normalized float
min_10_exp   DBL_MIN_10_EXP   minimum integer e such that 10**e is a
                              normalized float
radix        FLT_RADIX        radix of exponent representation
rounds       FLT_ROUNDS       constant representing the rounding mode used
                              for arithmetic operations
The attribute sys.float_info.dig needs further explanation. If
s is any string representing a decimal number with at most
sys.float_info.dig significant digits, then converting s to a
float and back again will recover a string representing the same decimal
value:
>>> import sys
>>> sys.float_info.dig
15
>>> s = '3.14159265358979'    # decimal string with 15 significant digits
>>> format(float(s), '.15g')  # convert to float and back -> same value
'3.14159265358979'
But for strings with more than sys.float_info.dig significant digits,
this isn’t always true:
>>> s = '9876543211234567'    # 16 significant digits is too many!
>>> format(float(s), '.16g')  # conversion changes value
'9876543211234568'
A string indicating how the repr() function behaves for
floats. If the string has value 'short' then for a finite
float x, repr(x) aims to produce a short string with the
property that float(repr(x))==x. This is the usual behaviour
in Python 3.1 and later. Otherwise, float_repr_style has value
'legacy' and repr(x) behaves in the same way as it did in
versions of Python prior to 3.1.
Return the current value of the flags that are used for dlopen() calls.
The flag constants are defined in the ctypes and DLFCN modules.
Availability: Unix.
Return the name of the encoding used to convert Unicode filenames into
system file names. The result value depends on the operating system:
On Mac OS X, the encoding is 'utf-8'.
On Unix, the encoding is the user’s preference according to the result of
nl_langinfo(CODESET), or 'utf-8' if nl_langinfo(CODESET) failed.
On Windows NT+, file names are Unicode natively, so no conversion is
performed. getfilesystemencoding() still returns 'mbcs', as
this is the encoding that applications should use when they explicitly
want to convert Unicode strings to byte strings that are equivalent when
used as file names.
On Windows 9x, the encoding is 'mbcs'.
Changed in version 3.2: On Unix, use 'utf-8' instead of None if nl_langinfo(CODESET)
failed. getfilesystemencoding() result cannot be None.
Return the reference count of the object. The count returned is generally one
higher than you might expect, because it includes the (temporary) reference as
an argument to getrefcount().
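For instance (the exact count may vary, but the temporary argument
reference explains why it is one higher than expected):
>>> import sys
>>> x = []
>>> sys.getrefcount(x)
2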
Return the current value of the recursion limit, the maximum depth of the Python
interpreter stack. This limit prevents infinite recursion from causing an
overflow of the C stack and crashing Python. It can be set by
setrecursionlimit().
Return the size of an object in bytes. The object can be any type of
object. All built-in objects will return correct results, but this
does not have to hold true for third-party extensions as it is implementation
specific.
If given, default will be returned if the object does not provide means to
retrieve the size. Otherwise a TypeError will be raised.
getsizeof() calls the object’s __sizeof__ method and adds an
additional garbage collector overhead if the object is managed by the garbage
collector.
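A small sketch; the exact numbers are implementation specific:
import sys

print(sys.getsizeof(b''))        # size of an empty bytes object
print(sys.getsizeof([1, 2, 3]))  # size of the list itself, not of the
                                 # integers it refers to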
Return a frame object from the call stack. If optional integer depth is
given, return the frame object that many calls below the top of the stack. If
that is deeper than the call stack, ValueError is raised. The default
for depth is zero, returning the frame at the top of the call stack.
CPython implementation detail: This function should be used for internal and specialized purposes only.
It is not guaranteed to exist in all implementations of Python.
CPython implementation detail: The gettrace() function is intended only for implementing debuggers,
profilers, coverage tools and the like. Its behavior is part of the
implementation platform, rather than part of the language definition, and
thus may not be available in all Python implementations.
Return a named tuple describing the Windows version
currently running. The named elements are major, minor,
build, platform, service_pack, service_pack_minor,
service_pack_major, suite_mask, and product_type.
service_pack contains a string while all other values are
integers. The components can also be accessed by name, so
sys.getwindowsversion()[0] is equivalent to
sys.getwindowsversion().major. For compatibility with prior
versions, only the first 5 elements are retrievable by indexing.
platform may be one of the following values:
Constant                         Platform
0 (VER_PLATFORM_WIN32s)          Win32s on Windows 3.1
1 (VER_PLATFORM_WIN32_WINDOWS)   Windows 95/98/ME
2 (VER_PLATFORM_WIN32_NT)        Windows NT/2000/XP/x64
3 (VER_PLATFORM_WIN32_CE)        Windows CE
product_type may be one of the following values:
Constant                       Meaning
1 (VER_NT_WORKSTATION)         The system is a workstation.
2 (VER_NT_DOMAIN_CONTROLLER)   The system is a domain controller.
3 (VER_NT_SERVER)              The system is a server, but not a domain
                               controller.
This function wraps the Win32 GetVersionEx() function; see the
Microsoft documentation on OSVERSIONINFOEX() for more information
about these fields.
Availability: Windows.
Changed in version 3.2: Changed to a named tuple and added service_pack_minor,
service_pack_major, suite_mask, and product_type.
The version number encoded as a single integer. This is guaranteed to increase
with each version, including proper support for non-production releases. For
example, to test that the Python interpreter is at least version 1.5.2, use:
if sys.hexversion >= 0x010502F0:
    # use some advanced feature
    ...
else:
    # use an alternative implementation or warn the user
    ...
This is called hexversion since it only really looks meaningful when viewed
as the result of passing it to the built-in hex() function. The
struct sequence sys.version_info may be used for a more human-friendly
encoding of the same information.
The hexversion is a 32-bit number with the following layout:
Bits (big endian order)   Meaning
1-8                       PY_MAJOR_VERSION (the 2 in 2.1.0a3)
9-16                      PY_MINOR_VERSION (the 1 in 2.1.0a3)
17-24                     PY_MICRO_VERSION (the 0 in 2.1.0a3)
25-28                     PY_RELEASE_LEVEL (0xA for alpha, 0xB for beta,
                          0xC for release candidate and 0xF for final)
29-32                     PY_RELEASE_SERIAL (the 3 in 2.1.0a3, zero for
                          final releases)
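A sketch decoding the fields according to the layout above:
import sys

major  = (sys.hexversion >> 24) & 0xff
minor  = (sys.hexversion >> 16) & 0xff
micro  = (sys.hexversion >> 8) & 0xff
level  = (sys.hexversion >> 4) & 0xf    # 0xA, 0xB, 0xC or 0xF
serial = sys.hexversion & 0xf
print(major, minor, micro, hex(level), serial)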
Enter string in the table of “interned” strings and return the interned string
– which is string itself or a copy. Interning strings is useful to gain a
little performance on dictionary lookup – if the keys in a dictionary are
interned, and the lookup key is interned, the key comparisons (after hashing)
can be done by a pointer compare instead of a string compare. Normally, the
names used in Python programs are automatically interned, and the dictionaries
used to hold module, class or instance attributes have interned keys.
Interned strings are not immortal; you must keep a reference to the return
value of intern() around to benefit from it.
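For example:
>>> import sys
>>> a = sys.intern('some-longish-dictionary-key')
>>> b = sys.intern('some-longish-dictionary-key')
>>> a is b
True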
These three variables are not always defined; they are set when an exception is
not handled and the interpreter prints an error message and a stack traceback.
Their intended use is to allow an interactive user to import a debugger module
and engage in post-mortem debugging without having to re-execute the command
that caused the error. (Typical use is import pdb; pdb.pm() to enter the
post-mortem debugger; see pdb module for
more information.)
The meaning of the variables is the same as that of the return values from
exc_info() above.
An integer giving the maximum value a variable of type Py_ssize_t can
take. It’s usually 2**31-1 on a 32-bit platform and 2**63-1 on a
64-bit platform.
An integer giving the largest supported code point for a Unicode character. The
value of this depends on the configuration option that specifies whether Unicode
characters are stored as UCS-2 or UCS-4.
A list of finder objects that have their find_module()
methods called to see if one of the objects can find the module to be
imported. The find_module() method is called at least with the
absolute name of the module being imported. If the module to be imported is
contained in package then the parent package’s __path__ attribute
is passed in as a second argument. The method returns None if
the module cannot be found, else returns a loader.
This is a dictionary that maps module names to modules which have already been
loaded. This can be manipulated to force reloading of modules and other tricks.
A list of strings that specifies the search path for modules. Initialized from
the environment variable PYTHONPATH, plus an installation-dependent
default.
As initialized upon program startup, the first item of this list, path[0],
is the directory containing the script that was used to invoke the Python
interpreter. If the script directory is not available (e.g. if the interpreter
is invoked interactively or if the script is read from standard input),
path[0] is the empty string, which directs Python to search modules in the
current directory first. Notice that the script directory is inserted before
the entries inserted as a result of PYTHONPATH.
A program is free to modify this list for its own purposes.
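For example, a program may prepend a directory (a hypothetical path here)
so that its own modules are found first:
import sys

sys.path.insert(0, '/opt/myapp/modules')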
See also
Module site This describes how to use .pth files to extend
sys.path.
A list of callables that take a path argument and try to create a
finder for that path. If a finder can be created, it is returned by the
callable; otherwise, ImportError is raised.
A dictionary acting as a cache for finder objects. The keys are
paths that have been passed to sys.path_hooks and the values are
the finders that are found. If a path is a valid file system path but no
explicit finder is found on sys.path_hooks then None is
stored to represent that the implicit default finder should be used. If the
path is not an existing path then imp.NullImporter is set.
This string contains a platform identifier that can be used to append
platform-specific components to sys.path, for instance.
For most Unix systems, this is the lowercased OS name as returned by
uname -s with the first part of the version as returned by uname -r
appended,
e.g. 'sunos5', at the time when Python was built. Unless you want to
test for a specific system version, it is therefore recommended to use the
following idiom:
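if sys.platform.startswith('freebsd'):
    # FreeBSD-specific code here...
elif sys.platform.startswith('linux'):
    # Linux-specific code here...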
Changed in version 3.2.2: Since lots of code checks for sys.platform == 'linux2', and there is
no essential change between Linux 2.x and 3.x, sys.platform is always
set to 'linux2', even on Linux 3.x. In Python 3.3 and later, the
value will always be set to 'linux', so it is recommended to always
use the startswith idiom presented above.
A string giving the site-specific directory prefix where the platform
independent Python files are installed; by default, this is the string
'/usr/local'. This can be set at build time with the --prefix
argument to the configure script. The main collection of Python
library modules is installed in the directory prefix + '/lib/pythonversion'
while the platform independent header files (all except pyconfig.h) are
stored in prefix + '/include/pythonversion', where version is equal to
version[:3].
Strings specifying the primary and secondary prompt of the interpreter. These
are only defined if the interpreter is in interactive mode. Their initial
values in this case are '>>>' and '...'. If a non-string object is
assigned to either variable, its str() is re-evaluated each time the
interpreter prepares to read a new interactive command; this can be used to
implement a dynamic prompt.
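As an illustration, a hypothetical dynamic prompt object whose str() is
re-evaluated for each new command:
import sys

class Prompt:
    def __init__(self):
        self.count = 0
    def __str__(self):
        self.count += 1
        return "[%d] >>> " % self.count

sys.ps1 = Prompt()   # takes effect only in an interactive session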
If this is true, Python won’t try to write .pyc or .pyo files on the
import of source modules. This value is initially set to True or False
depending on the -B command line option and the PYTHONDONTWRITEBYTECODE
environment variable, but you can set it yourself to control bytecode file
generation.
Set the interpreter’s “check interval”. This integer value determines how often
the interpreter checks for periodic things such as thread switches and signal
handlers. The default is 100, meaning the check is performed every 100
Python virtual instructions. Setting it to a larger value may increase
performance for programs using threads. Setting it to a value <= 0 checks
every virtual instruction, maximizing responsiveness as well as overhead.
Deprecated since version 3.2: This function doesn’t have an effect anymore, as the internal logic for
thread switching and asynchronous tasks has been rewritten. Use
setswitchinterval() instead.
Set the flags used by the interpreter for dlopen() calls, such as when
the interpreter loads extension modules. Among other things, this will enable a
lazy resolving of symbols when importing a module, if called as
sys.setdlopenflags(0). To share symbols across extension modules, call as
sys.setdlopenflags(ctypes.RTLD_GLOBAL). Symbolic names for the
flag values can be found either in the ctypes module or in the DLFCN
module. If DLFCN is not available, it can be generated from
/usr/include/dlfcn.h using the h2py script. Availability:
Unix.
Set the system’s profile function, which allows you to implement a Python source
code profiler in Python. See chapter The Python Profilers for more information on the
Python profiler. The system’s profile function is called similarly to the
system’s trace function (see settrace()), but it isn’t called for each
executed line of code (only on call and return, but the return event is reported
even when an exception has been set). The function is thread-specific, but
there is no way for the profiler to know about context switches between threads,
so it does not make sense to use this in the presence of multiple threads. Also,
its return value is not used, so it can simply return None.
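A minimal sketch of a profile function that reports call and return
events:
import sys

def profiler(frame, event, arg):
    # invoked on call and return events, not for each executed line
    if event in ('call', 'return'):
        print(event, frame.f_code.co_name)

def f():
    return 42

sys.setprofile(profiler)
f()
sys.setprofile(None)    # disable the profile function again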
Set the maximum depth of the Python interpreter stack to limit. This limit
prevents infinite recursion from causing an overflow of the C stack and crashing
Python.
The highest possible limit is platform-dependent. A user may need to set the
limit higher when they have a program that requires deep recursion and a platform
that supports a higher limit. This should be done with care, because a too-high
limit can lead to a crash.
Set the interpreter’s thread switch interval (in seconds). This floating-point
value determines the ideal duration of the “timeslices” allocated to
concurrently running Python threads. Please note that the actual value
can be higher, especially if long-running internal functions or methods
are used. Also, which thread becomes scheduled at the end of the interval
is the operating system’s decision. The interpreter doesn’t have its
own scheduler.
Set the system’s trace function, which allows you to implement a Python
source code debugger in Python. The function is thread-specific; for a
debugger to support multiple threads, it must be registered using
settrace() for each thread being debugged.
Trace functions should have three arguments: frame, event, and
arg. frame is the current stack frame. event is a string: 'call',
'line', 'return', 'exception', 'c_call', 'c_return', or
'c_exception'. arg depends on the event type.
The trace function is invoked (with event set to 'call') whenever a new
local scope is entered; it should return a reference to a local trace
function to be used in that scope, or None if the scope shouldn’t be traced.
The local trace function should return a reference to itself (or to another
function for further tracing in that scope), or None to turn off tracing
in that scope.
The events have the following meaning:
'call'
A function is called (or some other code block entered). The
global trace function is called; arg is None; the return value
specifies the local trace function.
'line'
The interpreter is about to execute a new line of code or re-execute the
condition of a loop. The local trace function is called; arg is
None; the return value specifies the new local trace function. See
Objects/lnotab_notes.txt for a detailed explanation of how this
works.
'return'
A function (or other code block) is about to return. The local trace
function is called; arg is the value that will be returned, or None
if the event is caused by an exception being raised. The trace function’s
return value is ignored.
'exception'
An exception has occurred. The local trace function is called; arg is a
tuple (exception,value,traceback); the return value specifies the
new local trace function.
'c_call'
A C function is about to be called. This may be an extension function or
a built-in. arg is the C function object.
'c_return'
A C function has returned. arg is the C function object.
'c_exception'
A C function has raised an exception. arg is the C function object.
Note that as an exception is propagated down the chain of callers, an
'exception' event is generated at each level.
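As a minimal sketch (the function name trace_calls is illustrative), a global trace function that reports each 'call' event and declines line-by-line tracing by returning None:
import sys

def trace_calls(frame, event, arg):
    # Called with event 'call' whenever a new local scope is entered.
    if event == 'call':
        print('entering', frame.f_code.co_name)
    # Returning None means the new scope gets no local trace function.
    return None

sys.settrace(trace_calls)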
For more information on code and frame objects, refer to The standard type hierarchy.
CPython implementation detail: The settrace() function is intended only for implementing debuggers,
profilers, coverage tools and the like. Its behavior is part of the
implementation platform, rather than part of the language definition, and
thus may not be available in all Python implementations.
Activate dumping of VM measurements using the Pentium timestamp counter, if
on_flag is true. Deactivate these dumps if on_flag is off. The function is
available only if Python was compiled with --with-tsc. To understand
the output of this dump, read Python/ceval.c in the Python sources.
CPython implementation detail: This function is intimately bound to CPython implementation details and
thus not likely to be implemented elsewhere.
File objects corresponding to the interpreter’s standard
input, output and error streams. stdin is used for all interpreter input
except for scripts but including calls to input(). stdout is used
for the output of print() and expression statements and for the
prompts of input(). The interpreter’s own prompts
and (almost all of) its error messages go to stderr. stdout and
stderr needn’t be built-in file objects: any object is acceptable as long
as it has a write() method that takes a string argument. (Changing these
objects doesn’t affect the standard I/O streams of processes executed by
os.popen(), os.system() or the exec*() family of functions in
the os module.)
The standard streams are in text mode by default. To write or read binary
data to these, use the underlying binary buffer. For example, to write bytes
to stdout, use sys.stdout.buffer.write(b'abc'). Using
io.TextIOBase.detach(), streams can be made binary by default. The
following function (a minimal sketch) sets stdin and stdout to binary:
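import sys

def make_streams_binary():
    # Sketch: detach the text wrappers, leaving the binary buffers.
    sys.stdin = sys.stdin.detach()
    sys.stdout = sys.stdout.detach()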
These objects contain the original values of stdin, stderr and
stdout at the start of the program. They are used during finalization,
and can be useful for printing to the actual standard stream even if the
sys.std* object has been redirected.
It can also be used to restore the actual files to known working file objects
in case they have been overwritten with a broken object. However, the
preferred way to do this is to explicitly save the previous stream before
replacing it, and restore the saved object.
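For example, a sketch of this save-and-restore pattern (the redirection to a StringIO buffer is illustrative):
import io
import sys

saved = sys.stdout            # save the previous stream first
sys.stdout = io.StringIO()    # redirect print() output to a buffer
print('captured')
captured = sys.stdout.getvalue()
sys.stdout = saved            # restore the saved object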
Note
Under some conditions stdin, stdout and stderr as well as the
original values __stdin__, __stdout__ and __stderr__ can be
None. It is usually the case for Windows GUI apps that aren’t connected
to a console and Python apps started with pythonw.
A triple (repo, branch, version) representing the Subversion information of the
Python interpreter. repo is the name of the repository, 'CPython'.
branch is a string of one of the forms 'trunk', 'branches/name' or
'tags/name'. version is the output of svnversion, if the interpreter
was built from a Subversion checkout; it contains the revision number (range)
and possibly a trailing ‘M’ if there were local modifications. If the tree was
exported (or svnversion was not available), it is the revision of
Include/patchlevel.h if the branch is a tag. Otherwise, it is None.
Deprecated since version 3.2.1: Python is now developed using
Mercurial. In recent Python 3.2 bugfix releases, subversion
therefore contains placeholder information. It is removed in Python
3.3.
When this variable is set to an integer value, it determines the maximum number
of levels of traceback information printed when an unhandled exception occurs.
The default is 1000. When set to 0 or less, all traceback information
is suppressed and only the exception type and value are printed.
A string containing the version number of the Python interpreter plus additional
information on the build number and compiler used. This string is displayed
when the interactive interpreter is started. Do not extract version information
out of it, rather, use version_info and the functions provided by the
platform module.
A tuple containing the five components of the version number: major, minor,
micro, releaselevel, and serial. All values except releaselevel are
integers; the release level is 'alpha', 'beta', 'candidate', or
'final'. The version_info value corresponding to the Python version 2.0
is (2,0,0,'final',0). The components can also be accessed by name,
so sys.version_info[0] is equivalent to sys.version_info.major
and so on.
Changed in version 3.1: Added named component attributes.
This is an implementation detail of the warnings framework; do not modify this
value. Refer to the warnings module for more information on the warnings
framework.
The version number used to form registry keys on Windows platforms. This is
stored as string resource 1000 in the Python DLL. The value is normally the
first three characters of version. It is provided in the sys
module for informational purposes; modifying this value has no effect on the
registry keys used by Python. Availability: Windows.
A dictionary of the various implementation-specific flags passed through
the -X command-line option. Option names are either mapped to
their values, if given explicitly, or to True. Example:
$ ./python -Xa=b -Xc
Python 3.2a3+ (py3k, Oct 16 2010, 20:14:50)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys._xoptions
{'a': 'b', 'c': True}
CPython implementation detail: This is a CPython-specific way of accessing options passed through
-X. Other implementations may export them through other
means, or not at all.
The sysconfig module provides access to Python’s configuration
information like the list of installation paths and the configuration variables
relevant for the current platform.
A Python distribution contains a Makefile and a pyconfig.h
header file that are necessary to build both the Python binary itself and
third-party C extensions compiled using distutils.
Python uses an installation scheme that differs depending on the platform and on
the installation options. These schemes are stored in sysconfig under
unique identifiers based on the value returned by os.name.
Every new component that is installed using distutils or a
Distutils-based system will follow the same scheme to copy its file in the right
places.
Python currently supports seven schemes:
posix_prefix: scheme for Posix platforms like Linux or Mac OS X. This is
the default scheme used when Python or a component is installed.
posix_home: scheme for Posix platforms used when a home option is used
upon installation. This scheme is used when a component is installed through
Distutils with a specific home prefix.
posix_user: scheme for Posix platforms used when a component is installed
through Distutils and the user option is used. This scheme defines paths
located under the user home directory.
nt: scheme for NT platforms like Windows.
nt_user: scheme for NT platforms, when the user option is used.
os2: scheme for OS/2 platforms.
os2_home: scheme for OS/2 platforms, when the user option is used.
Each scheme is itself composed of a series of paths and each path has a unique
identifier. Python currently uses eight paths:
stdlib: directory containing the standard Python library files that are not
platform-specific.
platstdlib: directory containing the standard Python library files that are
platform-specific.
platlib: directory for site-specific, platform-specific files.
purelib: directory for site-specific, non-platform-specific files.
include: directory for non-platform-specific header files.
platinclude: directory for platform-specific header files.
scripts: directory for script files.
data: directory for data files.
sysconfig provides some functions to determine these paths.
Return an installation path corresponding to the path name, from the
install scheme named scheme.
name has to be a value from the list returned by get_path_names().
sysconfig stores installation paths corresponding to each path name,
for each platform, with variables to be expanded. For instance the stdlib
path for the nt scheme is: {base}/Lib.
get_path() will use the variables returned by get_config_vars()
to expand the path. All variables have default values for each platform so
one may call this function and get the default value.
If scheme is provided, it must be a value from the list returned by
get_path_names(). Otherwise, the default scheme for the current
platform is used.
If vars is provided, it must be a dictionary of variables that will update
the dictionary returned by get_config_vars().
If expand is set to False, the path will not be expanded using the
variables.
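For example, a short interactive sketch (the expanded path shown is platform-dependent; the unexpanded nt value matches the {base}/Lib template described above):
>>> import sysconfig
>>> sysconfig.get_path('stdlib')                     # default scheme, expanded
'/usr/local/lib/python3.2'
>>> sysconfig.get_path('stdlib', 'nt', expand=False)
'{base}/Lib'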
Return a string that identifies the current platform.
This is used mainly to distinguish platform-specific build directories and
platform-specific built distributions. Typically includes the OS name and
version and the architecture (as supplied by os.uname()), although the
exact information included depends on the OS; e.g. for IRIX the architecture
isn’t particularly important (IRIX only runs on SGI hardware), but for Linux
the kernel version isn’t particularly important.
Examples of returned values:
linux-i586
linux-alpha (?)
solaris-2.6-sun4u
irix-5.3
irix64-6.2
Windows will return one of:
win-amd64 (64bit Windows on AMD64 (aka x86_64, Intel64, EM64T, etc))
win-ia64 (64bit Windows on Itanium)
win32 (all others - specifically, sys.platform is returned)
Mac OS X can return:
macosx-10.6-ppc
macosx-10.4-ppc64
macosx-10.3-i386
macosx-10.4-fat
For other non-POSIX platforms, currently just returns sys.platform.
Parse a config.h-style file; fp is a file-like object pointing to the config.h-like file.
A dictionary containing name/value pairs is returned. If an optional
dictionary is passed in as the second argument, it is used instead of a new
dictionary, and updated with the values read in the file.
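A brief usage sketch, pairing this function with sysconfig.get_config_h_filename() to locate pyconfig.h:
import sysconfig

# Parse pyconfig.h into a dictionary of name/value pairs.
with open(sysconfig.get_config_h_filename()) as fp:
    config_vars = sysconfig.parse_config_h(fp)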
This module provides direct access to all ‘built-in’ identifiers of Python; for
example, builtins.open is the full name for the built-in function
open(). See Built-in Functions and Built-in Constants for
documentation.
This module is not normally accessed explicitly by most applications, but can be
useful in modules that provide objects with the same name as a built-in value,
but in which the built-in of that name is also needed. For example, in a module
that wants to implement an open() function that wraps the built-in
open(), this module can be used directly:
import builtins

def open(path):
    f = builtins.open(path, 'r')
    return UpperCaser(f)

class UpperCaser:
    '''Wrapper around a file that converts output to upper-case.'''

    def __init__(self, f):
        self._f = f

    def read(self, count=-1):
        return self._f.read(count).upper()

    # ...
As an implementation detail, most modules have the name __builtins__ made
available as part of their globals. The value of __builtins__ is normally
either this module or the value of this module’s __dict__ attribute.
Since this is an implementation detail, it may not be used by alternate
implementations of Python.
This module represents the (otherwise anonymous) scope in which the
interpreter’s main program executes — commands read either from standard
input, from a script file, or from an interactive prompt. It is this
environment in which the idiomatic “conditional script” stanza causes a script
to run:
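def main():
    # main() here is illustrative; any top-level entry point works.
    print('running as the main program')

if __name__ == '__main__':
    # True only when this module is the interpreter's main program,
    # not when it is imported by another module.
    main()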
Warning messages are typically issued in situations where it is useful to alert
the user of some condition in a program, where that condition (normally) doesn’t
warrant raising an exception and terminating the program. For example, one
might want to issue a warning when a program uses an obsolete module.
Python programmers issue warnings by calling the warn() function defined
in this module. (C programmers use PyErr_WarnEx(); see
Exception Handling for details).
Warning messages are normally written to sys.stderr, but their disposition
can be changed flexibly, from ignoring all warnings to turning them into
exceptions. The disposition of warnings can vary based on the warning category
(see below), the text of the warning message, and the source location where it
is issued. Repetitions of a particular warning for the same source location are
typically suppressed.
There are two stages in warning control: first, each time a warning is issued, a
determination is made whether a message should be issued or not; next, if a
message is to be issued, it is formatted and printed using a user-settable hook.
The determination whether to issue a warning message is controlled by the
warning filter, which is a sequence of matching rules and actions. Rules can be
added to the filter by calling filterwarnings() and reset to its default
state by calling resetwarnings().
The printing of warning messages is done by calling showwarning(), which
may be overridden; the default implementation of this function formats the
message by calling formatwarning(), which is also available for use by
custom implementations.
See also
logging.captureWarnings() allows you to handle all warnings with
the standard logging infrastructure.
There are a number of built-in exceptions that represent warning categories.
This categorization is useful to be able to filter out groups of warnings. The
following warnings category classes are currently defined:
Base category for warnings related to
resource usage.
While these are technically built-in exceptions, they are documented here,
because conceptually they belong to the warnings mechanism.
User code can define additional warning categories by subclassing one of the
standard warning categories. A warning category must always be a subclass of
the Warning class.
The warnings filter controls whether warnings are ignored, displayed, or turned
into errors (raising an exception).
Conceptually, the warnings filter maintains an ordered list of filter
specifications; any specific warning is matched against each filter
specification in the list in turn until a match is found; the match determines
the disposition of the match. Each entry is a tuple of the form (action,
message, category, module, lineno), where:
action is one of the following strings:
"error"      turn matching warnings into exceptions
"ignore"     never print matching warnings
"always"     always print matching warnings
"default"    print the first occurrence of matching warnings
             for each location where the warning is issued
"module"     print the first occurrence of matching warnings
             for each module where the warning is issued
"once"       print only the first occurrence of matching
             warnings, regardless of location
message is a string containing a regular expression that the warning message
must match (the match is compiled to always be case-insensitive).
category is a class (a subclass of Warning) of which the warning
category must be a subclass in order to match.
module is a string containing a regular expression that the module name must
match (the match is compiled to be case-sensitive).
lineno is an integer that the line number where the warning occurred must
match, or 0 to match all line numbers.
Since the Warning class is derived from the built-in Exception
class, to turn a warning into an error we simply raise category(message).
The warnings filter is initialized by -W options passed to the Python
interpreter command line. The interpreter saves the arguments for all
-W options without interpretation in sys.warnoptions; the
warnings module parses these when it is first imported (invalid options
are ignored, after printing a message to sys.stderr).
BytesWarning is ignored unless the -b option is given once or
twice; in this case this warning is either printed (-b) or turned into an
exception (-bb).
ResourceWarning is ignored unless Python was built in debug mode.
If you are using code that you know will raise a warning, such as a deprecated
function, but do not want to see the warning, then it is possible to suppress
the warning using the catch_warnings context manager:
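import warnings

def fxn():
    # Stand-in (illustrative) for code known to raise a warning.
    warnings.warn("deprecated", DeprecationWarning)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    fxn()  # the DeprecationWarning is suppressed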
While within the context manager all warnings will simply be ignored. This
allows you to use known-deprecated code without having to see the warning while
not suppressing the warning for other code that might not be aware of its use
of deprecated code. Note: this can only be guaranteed in a single-threaded
application. If two or more threads use the catch_warnings context
manager at the same time, the behavior is undefined.
To test warnings raised by code, use the catch_warnings context
manager. With it you can temporarily mutate the warnings filter to facilitate
your testing. For instance, do the following to capture all raised warnings to
check:
import warnings

def fxn():
    warnings.warn("deprecated", DeprecationWarning)

with warnings.catch_warnings(record=True) as w:
    # Cause all warnings to always be triggered.
    warnings.simplefilter("always")
    # Trigger a warning.
    fxn()
    # Verify some things
    assert len(w) == 1
    assert issubclass(w[-1].category, DeprecationWarning)
    assert "deprecated" in str(w[-1].message)
One can also cause all warnings to be exceptions by using error instead of
always. One thing to be aware of is that if a warning has already been
raised because of a once/default rule, then no matter what filters are
set the warning will not be seen again unless the warnings registry related to
the warning has been cleared.
Once the context manager exits, the warnings filter is restored to its state
when the context was entered. This prevents tests from changing the warnings
filter in unexpected ways between tests and leading to indeterminate test
results. The showwarning() function in the module is also restored to
its original value. Note: this can only be guaranteed in a single-threaded
application. If two or more threads use the catch_warnings context
manager at the same time, the behavior is undefined.
When testing multiple operations that raise the same kind of warning, it
is important to test them in a manner that confirms each operation is raising
a new warning (e.g. set warnings to be raised as exceptions and check the
operations raise exceptions, check that the length of the warning list
continues to increase after each operation, or else delete the previous
entries from the warnings list before each new operation).
Warnings that are only of interest to the developer are ignored by default. As
such you should make sure to test your code with typically ignored warnings
made visible. You can do this from the command-line by passing -Wd
to the interpreter (this is shorthand for -Wdefault). This enables
default handling for all warnings, including those that are ignored by default.
To change what action is taken for encountered warnings you simply change what
argument is passed to -W, e.g. -Werror. See the
-W flag for more details on what is possible.
To programmatically do the same as -Wd, use:
warnings.simplefilter('default')
Make sure to execute this code as soon as possible. This prevents the
registering of what warnings have been raised from unexpectedly influencing how
future warnings are treated.
Having certain warnings ignored by default is done to prevent a user from
seeing warnings that are only of interest to the developer. As you do not
necessarily have control over what interpreter a user uses to run their code,
it is possible that a new version of Python will be released between your
release cycles. The new interpreter release could trigger new warnings in your
code that were not there in an older interpreter, e.g.
DeprecationWarning for a module that you are using. While you as a
developer want to be notified that your code is using a deprecated module, to a
user this information is essentially noise and provides no benefit to them.
The unittest module has also been updated to use the 'default'
filter while running tests.
Issue a warning, or maybe ignore it or raise an exception. The category
argument, if given, must be a warning category class (see above); it defaults to
UserWarning. Alternatively message can be a Warning instance,
in which case category will be ignored and message.__class__ will be used.
In this case the message text will be str(message). This function raises an
exception if the particular warning issued is changed into an error by the
warnings filter (see above). The stacklevel argument can be used by wrapper
functions written in Python, like this:
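import warnings

def deprecation(message):
    # stacklevel=2 attributes the warning to deprecation()'s caller.
    warnings.warn(message, DeprecationWarning, stacklevel=2)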
This makes the warning refer to deprecation()‘s caller, rather than to the
source of deprecation() itself (since the latter would defeat the purpose
of the warning message).
This is a low-level interface to the functionality of warn(), passing in
explicitly the message, category, filename and line number, and optionally the
module name and the registry (which should be the __warningregistry__
dictionary of the module). The module name defaults to the filename with
.py stripped; if no registry is passed, the warning is never suppressed.
message must be a string and category a subclass of Warning or
message may be a Warning instance, in which case category will be
ignored.
module_globals, if supplied, should be the global namespace in use by the code
for which the warning is issued. (This argument is used to support displaying
source for modules found in zipfiles or other non-filesystem import
sources).
Write a warning to a file. The default implementation calls
formatwarning(message,category,filename,lineno,line) and writes the
resulting string to file, which defaults to sys.stderr. You may replace
this function with an alternative implementation by assigning to
warnings.showwarning.
line is a line of source code to be included in the warning
message; if line is not supplied, showwarning() will
try to read the line specified by filename and lineno.
Format a warning the standard way. This returns a string which may contain
embedded newlines and ends in a newline. line is a line of source code to
be included in the warning message; if line is not supplied,
formatwarning() will try to read the line specified by filename and
lineno.
Insert an entry into the list of warnings filter specifications. The entry is inserted at the front by default; if
append is true, it is inserted at the end. This checks the types of the
arguments, compiles the message and module regular expressions, and
inserts them as a tuple in the list of warnings filters. Entries closer to
the front of the list override entries later in the list, if both match a
particular warning. Omitted arguments default to a value that matches
everything.
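For example, a hedged sketch (the module name legacy is hypothetical) that turns matching DeprecationWarnings into errors:
import warnings

# Escalate DeprecationWarnings issued from a module named "legacy".
warnings.filterwarnings('error',
                        category=DeprecationWarning,
                        module=r'legacy')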
Insert a simple entry into the list of warnings filter specifications. The meaning of the function parameters is as for
filterwarnings(), but regular expressions are not needed as the filter
inserted always matches any message in any module as long as the category and
line number match.
Reset the warnings filter. This discards the effect of all previous calls to
filterwarnings(), including that of the -W command line options
and calls to simplefilter().
class warnings.catch_warnings(*, record=False, module=None)
A context manager that copies and, upon exit, restores the warnings filter
and the showwarning() function.
If the record argument is False (the default) the context manager
returns None on entry. If record is True, a list is
returned that is progressively populated with objects as seen by a custom
showwarning() function (which also suppresses output to sys.stdout).
Each object in the list has attributes with the same names as the arguments to
showwarning().
The module argument takes a module that will be used instead of the
module returned when you import warnings whose filter will be
protected. This argument exists primarily for testing the warnings
module itself.
Note
The catch_warnings manager works by replacing and
then later restoring the module’s
showwarning() function and internal list of filter
specifications. This means the context manager is modifying
global state and therefore is not thread-safe.
This function is a decorator that can be used to define a factory
function for with statement context managers, without needing to
create a class or separate __enter__() and __exit__() methods.
A simple example (this is not recommended as a real way of generating HTML!):
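from contextlib import contextmanager

@contextmanager
def tag(name):
    # Sketch: emit an opening tag, run the block, emit a closing tag.
    print("<%s>" % name)
    yield
    print("</%s>" % name)

>>> with tag("h1"):
...     print("foo")
...
<h1>
foo
</h1>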
The function being decorated must return a generator-iterator when
called. This iterator must yield exactly one value, which will be bound to
the targets in the with statement’s as clause, if any.
At the point where the generator yields, the block nested in the with
statement is executed. The generator is then resumed after the block is exited.
If an unhandled exception occurs in the block, it is reraised inside the
generator at the point where the yield occurred. Thus, you can use a
try...except...finally statement to trap
the error (if any), or ensure that some cleanup takes place. If an exception is
trapped merely in order to log it or to perform some action (rather than to
suppress it entirely), the generator must reraise that exception. Otherwise the
generator context manager will indicate to the with statement that
the exception has been handled, and execution will resume with the statement
immediately following the with statement.
contextmanager() uses ContextDecorator so the context managers
it creates can be used as decorators as well as in with statements.
When used as a decorator, a new generator instance is implicitly created on
each function call (this allows the otherwise “one-shot” context managers
created by contextmanager() to meet the requirement that context
managers support multiple invocations in order to be used as decorators).
A base class that enables a context manager to also be used as a decorator.
Context managers inheriting from ContextDecorator have to implement
__enter__ and __exit__ as normal. __exit__ retains its optional
exception handling even when used as a decorator.
ContextDecorator is used by contextmanager(), so you get this
functionality automatically.
Example of ContextDecorator:
from contextlib import ContextDecorator

class mycontext(ContextDecorator):
    def __enter__(self):
        print('Starting')
        return self

    def __exit__(self, *exc):
        print('Finishing')
        return False

>>> @mycontext()
... def function():
...     print('The bit in the middle')
...
>>> function()
Starting
The bit in the middle
Finishing

>>> with mycontext():
...     print('The bit in the middle')
...
Starting
The bit in the middle
Finishing
This change is just syntactic sugar for any construct of the following form:
def f():
    with cm():
        # Do stuff

ContextDecorator lets you instead write:

@cm()
def f():
    # Do stuff
It makes it clear that the cm applies to the whole function, rather than
just a piece of it (and saving an indentation level is nice, too).
Existing context managers that already have a base class can be extended by
using ContextDecorator as a mixin class:
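from contextlib import ContextDecorator

class mycontext(ContextBaseClass, ContextDecorator):
    # ContextBaseClass stands for the existing base class (illustrative).
    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False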
As the decorated function must be able to be called multiple times, the
underlying context manager must support use in multiple with
statements. If this is not the case, then the original construct with the
explicit with statement inside the function should be used.
This module provides the infrastructure for defining abstract base
classes (ABCs) in Python, as outlined in PEP 3119; see the PEP for why this
was added to Python. (See also PEP 3141 and the numbers module
regarding a type hierarchy for numbers based on ABCs.)
The collections module has some concrete classes that derive from
ABCs; these can, of course, be further derived. In addition the
collections module has some ABCs that can be used to test whether
a class or instance provides a particular interface, for example, is it
hashable or a mapping.
Metaclass for defining Abstract Base Classes (ABCs).
Use this metaclass to create an ABC. An ABC can be subclassed directly, and
then acts as a mix-in class. You can also register unrelated concrete
classes (even built-in classes) and unrelated ABCs as “virtual subclasses” –
these and their descendants will be considered subclasses of the registering
ABC by the built-in issubclass() function, but the registering ABC
won’t show up in their MRO (Method Resolution Order) nor will method
implementations defined by the registering ABC be callable (not even via
super()). [1]
Classes created with a metaclass of ABCMeta have the following method:
Check whether subclass is considered a subclass of this ABC. This means
that you can customize the behavior of issubclass further without the
need to call register() on every class you want to consider a
subclass of the ABC. (This class method is called from the
__subclasscheck__() method of the ABC.)
This method should return True, False or NotImplemented. If
it returns True, the subclass is considered a subclass of this ABC.
If it returns False, the subclass is not considered a subclass of
this ABC, even if it would normally be one. If it returns
NotImplemented, the subclass check is continued with the usual
mechanism.
For a demonstration of these concepts, look at this example ABC definition:
from abc import ABCMeta, abstractmethod

class Foo:
    def __getitem__(self, index):
        ...
    def __len__(self):
        ...
    def get_iterator(self):
        return iter(self)

class MyIterable(metaclass=ABCMeta):

    @abstractmethod
    def __iter__(self):
        while False:
            yield None

    def get_iterator(self):
        return self.__iter__()

    @classmethod
    def __subclasshook__(cls, C):
        if cls is MyIterable:
            if any("__iter__" in B.__dict__ for B in C.__mro__):
                return True
        return NotImplemented

MyIterable.register(Foo)
The ABC MyIterable defines the standard iterable method,
__iter__(), as an abstract method. The implementation given here can
still be called from subclasses. The get_iterator() method is also
part of the MyIterable abstract base class, but it does not have to be
overridden in non-abstract derived classes.
The __subclasshook__() class method defined here says that any class
that has an __iter__() method in its __dict__ (or in that of
one of its base classes, accessed via the __mro__ list) is
considered a MyIterable too.
Finally, the last line makes Foo a virtual subclass of MyIterable,
even though it does not define an __iter__() method (it uses the
old-style iterable protocol, defined in terms of __len__() and
__getitem__()). Note that this will not make get_iterator
available as a method of Foo, so it is provided separately.
Using this decorator requires that the class’s metaclass is ABCMeta or
is derived from it.
A class that has a metaclass derived from ABCMeta
cannot be instantiated unless all of its abstract methods and
properties are overridden.
The abstract methods can be called using any of the normal ‘super’ call
mechanisms.
Dynamically adding abstract methods to a class, or attempting to modify the
abstraction status of a method or class once it is created, are not
supported. The abstractmethod() only affects subclasses derived using
regular inheritance; “virtual subclasses” registered with the ABC’s
register() method are not affected.
Usage:
class C(metaclass=ABCMeta):
    @abstractmethod
    def my_abstract_method(self, ...):
        ...
Note
Unlike Java abstract methods, these abstract
methods may have an implementation. This implementation can be
called via the super() mechanism from the class that
overrides it. This could be useful as an end-point for a
super-call in a framework that uses cooperative
multiple-inheritance.
A subclass of the built-in property(), indicating an abstract property.
Using this function requires that the class’s metaclass is ABCMeta or
is derived from it.
A class that has a metaclass derived from ABCMeta cannot be
instantiated unless all of its abstract methods and properties are overridden.
The abstract properties can be called using any of the normal
‘super’ call mechanisms.
Usage:
class C(metaclass=ABCMeta):
    @abstractproperty
    def my_abstract_property(self):
        ...
This defines a read-only property; you can also define a read-write abstract
property using the ‘long’ form of property declaration:
class C(metaclass=ABCMeta):
    def getx(self): ...
    def setx(self, value): ...
    x = abstractproperty(getx, setx)
The atexit module defines functions to register and unregister cleanup
functions. Functions thus registered are automatically executed upon normal
interpreter termination. The order in which the functions are called is not
defined; if you have cleanup operations that depend on each other, you should
wrap them in a function and register that one. This keeps atexit simple.
Note: the functions registered via this module are not called when the program
is killed by a signal not handled by Python, when a Python fatal internal error
is detected, or when os._exit() is called.
Register func as a function to be executed at termination. Any optional
arguments that are to be passed to func must be passed as arguments to
register().
At normal program termination (for instance, if sys.exit() is called or
the main module’s execution completes), all functions registered are called in
last in, first out order. The assumption is that lower level modules will
normally be imported before higher level modules and thus must be cleaned up
later.
If an exception is raised during execution of the exit handlers, a traceback is
printed (unless SystemExit is raised) and the exception information is
saved. After all exit handlers have had a chance to run the last exception to
be raised is re-raised.
This function returns func which makes it possible to use it as a decorator
without binding the original name to None.
Remove a function func from the list of functions to be run at interpreter-
shutdown. After calling unregister(), func is guaranteed not to be
called when the interpreter shuts down.
The following simple example demonstrates how a module can initialize a counter
from a file when it is imported and save the counter’s updated value
automatically when the program terminates without relying on the application
making an explicit call into this module at termination.
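A sketch of such a module (the counter file path is illustrative):
try:
    with open('/tmp/counter') as infile:
        _count = int(infile.read())
except IOError:
    _count = 0

def incrcounter(n):
    global _count
    _count = _count + n

def savecounter():
    # Runs automatically at normal interpreter termination.
    with open('/tmp/counter', 'w') as outfile:
        outfile.write('%d' % _count)

import atexit
atexit.register(savecounter)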
Positional and keyword arguments may also be passed to register() to be
passed along to the registered function when it is called:
def goodbye(name, adjective):
    print('Goodbye, %s, it was %s to meet you.' % (name, adjective))

import atexit
atexit.register(goodbye, 'Donny', 'nice')

# or:
atexit.register(goodbye, adjective='nice', name='Donny')
This module provides a standard interface to extract, format and print stack
traces of Python programs. It exactly mimics the behavior of the Python
interpreter when it prints a stack trace. This is useful when you want to print
stack traces under program control, such as in a “wrapper” around the
interpreter.
The module uses traceback objects — this is the object type that is stored in
the sys.last_traceback variable and returned as the third item from
sys.exc_info().
Print up to limit stack trace entries from traceback. If limit is omitted
or None, all entries are printed. If file is omitted or None, the
output goes to sys.stderr; otherwise it should be an open file or file-like
object to receive the output.
Print exception information and up to limit stack trace entries from
traceback to file. This differs from print_tb() in the following
ways:
if traceback is not None, it prints a header Traceback (most recent call last):
it prints the exception type and value after the stack trace
if type is SyntaxError and value has the appropriate format, it
prints the line where the syntax error occurred with a caret indicating the
approximate position of the error.
If chain is true (the default), then chained exceptions (the
__cause__ or __context__ attributes of the exception) will be
printed as well, like the interpreter itself does when printing an unhandled
exception.
This is a shorthand for print_exception(sys.last_type, sys.last_value,
sys.last_traceback, limit, file). In general it will work only after
an exception has reached an interactive prompt (see sys.last_type).
This function prints a stack trace from its invocation point. The optional f
argument can be used to specify an alternate stack frame to start. The optional
limit and file arguments have the same meaning as for
print_exception().
Return a list of up to limit “pre-processed” stack trace entries extracted
from the traceback object traceback. It is useful for alternate formatting of
stack traces. If limit is omitted or None, all entries are extracted. A
“pre-processed” stack trace entry is a quadruple (filename, line number,
function name, text) representing the information that is usually printed
for a stack trace. The text is a string with leading and trailing whitespace
stripped; if the source is not available it is None.
Extract the raw traceback from the current stack frame. The return value has
the same format as for extract_tb(). The optional f and limit
arguments have the same meaning as for print_stack().
Given a list of tuples as returned by extract_tb() or
extract_stack(), return a list of strings ready for printing. Each string
in the resulting list corresponds to the item with the same index in the
argument list. Each string ends in a newline; the strings may contain internal
newlines as well, for those items whose source text line is not None.
Format the exception part of a traceback. The arguments are the exception type
and value such as given by sys.last_type and sys.last_value. The return
value is a list of strings, each ending in a newline. Normally, the list
contains a single string; however, for SyntaxError exceptions, it
contains several lines that (when printed) display detailed information about
where the syntax error occurred. The message indicating which exception
occurred is always the last string in the list.
Format a stack trace and the exception information. The arguments have the
same meaning as the corresponding arguments to print_exception(). The
return value is a list of strings, each ending in a newline and some containing
internal newlines. When these lines are concatenated and printed, exactly the
same text is printed as does print_exception().
This simple example implements a basic read-eval-print loop, similar to (but
less useful than) the standard Python interactive interpreter loop. For a more
complete implementation of the interpreter loop, refer to the code
module.
import sys, traceback

def run_user_code(envdir):
    source = input(">>> ")
    try:
        exec(source, envdir)
    except:
        print("Exception in user code:")
        print("-" * 60)
        traceback.print_exc(file=sys.stdout)
        print("-" * 60)

envdir = {}
while True:
    run_user_code(envdir)
The following example demonstrates the different ways to print and format the
exception and traceback:
import sys, traceback

def lumberjack():
    bright_side_of_death()

def bright_side_of_death():
    return tuple()[0]

try:
    lumberjack()
except IndexError:
    exc_type, exc_value, exc_traceback = sys.exc_info()
    print("*** print_tb:")
    traceback.print_tb(exc_traceback, limit=1, file=sys.stdout)
    print("*** print_exception:")
    traceback.print_exception(exc_type, exc_value, exc_traceback,
                              limit=2, file=sys.stdout)
    print("*** print_exc:")
    traceback.print_exc()
    print("*** format_exc, first and last line:")
    formatted_lines = traceback.format_exc().splitlines()
    print(formatted_lines[0])
    print(formatted_lines[-1])
    print("*** format_exception:")
    print(repr(traceback.format_exception(exc_type, exc_value,
                                          exc_traceback)))
    print("*** extract_tb:")
    print(repr(traceback.extract_tb(exc_traceback)))
    print("*** format_tb:")
    print(repr(traceback.format_tb(exc_traceback)))
    print("*** tb_lineno:", exc_traceback.tb_lineno)
The output for the example would look similar to this:
*** print_tb:
File "<doctest...>", line 10, in <module>
lumberjack()
*** print_exception:
Traceback (most recent call last):
File "<doctest...>", line 10, in <module>
lumberjack()
File "<doctest...>", line 4, in lumberjack
bright_side_of_death()
IndexError: tuple index out of range
*** print_exc:
Traceback (most recent call last):
File "<doctest...>", line 10, in <module>
lumberjack()
File "<doctest...>", line 4, in lumberjack
bright_side_of_death()
IndexError: tuple index out of range
*** format_exc, first and last line:
Traceback (most recent call last):
IndexError: tuple index out of range
*** format_exception:
['Traceback (most recent call last):\n',
' File "<doctest...>", line 10, in <module>\n lumberjack()\n',
' File "<doctest...>", line 4, in lumberjack\n bright_side_of_death()\n',
' File "<doctest...>", line 7, in bright_side_of_death\n return tuple()[0]\n',
'IndexError: tuple index out of range\n']
*** extract_tb:
[('<doctest...>', 10, '<module>', 'lumberjack()'),
('<doctest...>', 4, 'lumberjack', 'bright_side_of_death()'),
('<doctest...>', 7, 'bright_side_of_death', 'return tuple()[0]')]
*** format_tb:
[' File "<doctest...>", line 10, in <module>\n lumberjack()\n',
' File "<doctest...>", line 4, in lumberjack\n bright_side_of_death()\n',
' File "<doctest...>", line 7, in bright_side_of_death\n return tuple()[0]\n']
*** tb_lineno: 10
The following example shows the different ways to print and format the stack:
>>> import traceback
>>> def another_function():
...     lumberstack()
...
>>> def lumberstack():
...     traceback.print_stack()
...     print(repr(traceback.extract_stack()))
...     print(repr(traceback.format_stack()))
...
>>> another_function()
  File "<doctest>", line 10, in <module>
    another_function()
  File "<doctest>", line 3, in another_function
    lumberstack()
  File "<doctest>", line 6, in lumberstack
    traceback.print_stack()
[('<doctest>', 10, '<module>', 'another_function()'),
 ('<doctest>', 3, 'another_function', 'lumberstack()'),
 ('<doctest>', 7, 'lumberstack', 'print(repr(traceback.extract_stack()))')]
['  File "<doctest>", line 10, in <module>\n    another_function()\n',
 '  File "<doctest>", line 3, in another_function\n    lumberstack()\n',
 '  File "<doctest>", line 8, in lumberstack\n    print(repr(traceback.format_stack()))\n']
This last example demonstrates the final few formatting functions:
>>> import traceback
>>> traceback.format_list([('spam.py', 3, '<module>', 'spam.eggs()'),
...                        ('eggs.py', 42, 'eggs', 'return "bacon"')])
['  File "spam.py", line 3, in <module>\n    spam.eggs()\n',
 '  File "eggs.py", line 42, in eggs\n    return "bacon"\n']
>>> an_error = IndexError('tuple index out of range')
>>> traceback.format_exception_only(type(an_error), an_error)
['IndexError: tuple index out of range\n']
__future__ is a real module, and serves three purposes:
To avoid confusing existing tools that analyze import statements and expect to
find the modules they’re importing.
To ensure that future statements run under releases prior to
2.1 at least yield runtime exceptions (the import of __future__ will
fail, because there was no module of that name prior to 2.1).
To document when incompatible changes were introduced, and when they will be
— or were — made mandatory. This is a form of executable documentation, and
can be inspected programmatically via importing __future__ and examining
its contents.
Each feature description in __future__ is an instance of class
_Feature, created as _Feature(OptionalRelease, MandatoryRelease,
CompilerFlag), where, normally, OptionalRelease is less than
MandatoryRelease, and both are 5-tuples of the same form as
sys.version_info:
(PY_MAJOR_VERSION,    # the 2 in 2.1.0a3; an int
 PY_MINOR_VERSION,    # the 1; an int
 PY_MICRO_VERSION,    # the 0; an int
 PY_RELEASE_LEVEL,    # "alpha", "beta", "candidate" or "final"; string
 PY_RELEASE_SERIAL)   # the 3; an int
OptionalRelease records the first release in which the feature was accepted.
In the case of a MandatoryRelease that has not yet occurred,
MandatoryRelease predicts the release in which the feature will become part of
the language.
Else MandatoryRelease records when the feature became part of the language; in
releases at or after that, modules no longer need a future statement to use the
feature in question, but may continue to use such imports.
MandatoryRelease may also be None, meaning that a planned feature got
dropped.
Instances of class _Feature have two corresponding methods,
getOptionalRelease() and getMandatoryRelease().
CompilerFlag is the (bitfield) flag that should be passed in the fourth
argument to the built-in function compile() to enable the feature in
dynamically compiled code. This flag is stored in the compiler_flag
attribute on _Feature instances.
No feature description will ever be deleted from __future__. Since its
introduction in Python 2.1 the following features have found their way into the
language using this mechanism:
This module provides an interface to the optional garbage collector. It
provides the ability to disable the collector, tune the collection frequency,
and set debugging options. It also provides access to unreachable objects that
the collector found but cannot free. Since the collector supplements the
reference counting already used in Python, you can disable the collector if you
are sure your program does not create reference cycles. Automatic collection
can be disabled by calling gc.disable(). To debug a leaking program call
gc.set_debug(gc.DEBUG_LEAK). Notice that this includes
gc.DEBUG_SAVEALL, causing garbage-collected objects to be saved in
gc.garbage for inspection.
With no arguments, run a full collection. The optional argument generation
may be an integer specifying which generation to collect (from 0 to 2). A
ValueError is raised if the generation number is invalid. The number of
unreachable objects found is returned.
The free lists maintained for a number of built-in types are cleared
whenever a full collection or collection of the highest generation (2)
is run. Not all items in some free lists may be freed due to the
particular implementation, in particular float.
Set the garbage collection debugging flags. Debugging information will be
written to sys.stderr. See below for a list of debugging flags which can be
combined using bit operations to control debugging.
Set the garbage collection thresholds (the collection frequency). Setting
threshold0 to zero disables collection.
The GC classifies objects into three generations depending on how many
collection sweeps they have survived. New objects are placed in the youngest
generation (generation 0). If an object survives a collection it is moved
into the next older generation. Since generation 2 is the oldest
generation, objects in that generation remain there after a collection. In
order to decide when to run, the collector keeps track of the number of object
allocations and deallocations since the last collection. When the number of
allocations minus the number of deallocations exceeds threshold0, collection
starts. Initially only generation 0 is examined. If generation 0 has
been examined more than threshold1 times since generation 1 has been
examined, then generation 1 is examined as well. Similarly, threshold2
controls the number of collections of generation 1 before collecting
generation 2.
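For example (the threshold values shown are the usual defaults, but may vary between builds):
>>> import gc
>>> gc.get_threshold()
(700, 10, 10)
>>> gc.set_threshold(700, 15, 15)   # examine older generations less often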
Return the list of objects that directly refer to any of objs. This function
will only locate those containers which support garbage collection; extension
types which do refer to other objects but do not support garbage collection will
not be found.
Note that objects which have already been dereferenced, but which live in cycles
and have not yet been collected by the garbage collector can be listed among the
resulting referrers. To get only currently live objects, call collect()
before calling get_referrers().
Care must be taken when using objects returned by get_referrers() because
some of them could still be under construction and hence in a temporarily
invalid state. Avoid using get_referrers() for any purpose other than
debugging.
Return a list of objects directly referred to by any of the arguments. The
referents returned are those objects visited by the arguments’ C-level
tp_traverse methods (if any), and may not be all objects actually
directly reachable. tp_traverse methods are supported only by objects
that support garbage collection, and are only required to visit objects that may
be involved in a cycle. So, for example, if an integer is directly reachable
from an argument, that integer object may or may not appear in the result list.
Returns True if the object is currently tracked by the garbage collector,
False otherwise. As a general rule, instances of atomic types aren’t
tracked and instances of non-atomic types (containers, user-defined
objects...) are. However, some type-specific optimizations can be present
in order to suppress the garbage collector footprint of simple instances
(e.g. dicts containing only atomic keys and values):
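>>> import gc
>>> gc.is_tracked(0)
False
>>> gc.is_tracked("a")
False
>>> gc.is_tracked([])
True
>>> gc.is_tracked({})
False
>>> gc.is_tracked({"a": 1})
False
>>> gc.is_tracked({"a": []})
True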
A list of objects which the collector found to be unreachable but could not be
freed (uncollectable objects). By default, this list contains only objects with
__del__() methods. Objects that have __del__() methods and are
part of a reference cycle cause the entire reference cycle to be uncollectable,
including objects not necessarily in the cycle but reachable only from it.
Python doesn’t collect such cycles automatically because, in general, it isn’t
possible for Python to guess a safe order in which to run the __del__()
methods. If you know a safe order, you can force the issue by examining the
garbage list, and explicitly breaking cycles due to your objects within the
list. Note that these objects are kept alive even so by virtue of being in the
garbage list, so they should be removed from garbage too. For example,
after breaking cycles, do del gc.garbage[:] to empty the list. It’s
generally better to avoid the issue by not creating cycles containing objects
with __del__() methods, and garbage can be examined in that case to
verify that no such cycles are being created.
If DEBUG_SAVEALL is set, then all unreachable objects will be added
to this list rather than freed.
Changed in version 3.2: If this list is non-empty at interpreter shutdown, a
ResourceWarning is emitted, which is silent by default. If
DEBUG_UNCOLLECTABLE is set, in addition all uncollectable objects
are printed.
The following constants are provided for use with set_debug():
Print information of uncollectable objects found (objects which are not
reachable but cannot be freed by the collector). These objects will be added
to the garbage list.
Changed in version 3.2: Also print the contents of the garbage list at interpreter
shutdown, if it isn’t empty.
The debugging flags necessary for the collector to print information about a
leaking program (equal to DEBUG_COLLECTABLE|DEBUG_UNCOLLECTABLE|DEBUG_SAVEALL).
The inspect module provides several useful functions to help get
information about live objects such as modules, classes, methods, functions,
tracebacks, frame objects, and code objects. For example, it can help you
examine the contents of a class, retrieve the source code of a method, extract
and format the argument list for a function, or get all the information you need
to display a detailed traceback.
There are four main kinds of services provided by this module: type checking,
getting source code, inspecting classes and functions, and examining the
interpreter stack.
The getmembers() function retrieves the members of an object such as a
class or module. The sixteen functions whose names begin with “is” are mainly
provided as convenient choices for the second argument to getmembers().
They also help you determine when you can expect to find the following special
attributes:
Type      Attribute    Description

module    __doc__      documentation string
          __file__     filename (missing for built-in modules)

class     __doc__      documentation string
          __module__   name of module in which this class was defined

method    __doc__      documentation string
          __name__     name with which this method was defined
          __func__     function object containing implementation of the method
Return all the members of an object in a list of (name, value) pairs sorted by
name. If the optional predicate argument is supplied, only members for which
the predicate returns a true value are included.
Note
getmembers() does not return metaclass attributes when the argument
is a class (this behavior is inherited from the dir() function).
Returns a named tuple ModuleInfo(name, suffix, mode, module_type)
of values that describe how Python will interpret the file identified by
path if it is a module, or None if it would not be identified as a
module. In that tuple, name is the name of the module without the name of
any enclosing package, suffix is the trailing part of the file name (which
may not be a dot-delimited extension), mode is the open() mode that
would be used ('r' or 'rb'), and module_type is an integer giving
the type of the module. module_type will have a value which can be
compared to the constants defined in the imp module; see the
documentation for that module for more information on module types.
Return the name of the module named by the file path, without including the
names of enclosing packages. This uses the same algorithm as the interpreter
uses when searching for modules. If the name cannot be matched according to the
interpreter’s rules, None is returned.
Return true if the object is a method descriptor, but not if ismethod(),
isclass(), isfunction() or isbuiltin() are true.
This, for example, is true of int.__add__. An object passing this test
has a __get__ attribute but not a __set__ attribute, but
beyond that the set of attributes varies. __name__ is usually
sensible, and __doc__ often is.
Methods implemented via descriptors that also pass one of the other tests
return false from the ismethoddescriptor() test, simply because the
other tests promise more – you can, e.g., count on having the
__func__ attribute (etc) when an object passes ismethod().
Return true if the object is a data descriptor. Data descriptors have both a __get__ and a __set__ attribute.
Examples are properties (defined in Python), getsets, and members. The
latter two are defined in C and there are more specific tests available for
those types, which are robust across Python implementations. Typically, data
descriptors will also have __name__ and __doc__ attributes
(properties, getsets, and members have both of these attributes), but this is
not guaranteed.
CPython implementation detail: getsets are attributes defined in extension modules via
PyGetSetDef structures. For Python implementations without such
types, this method will always return False.
CPython implementation detail: Member descriptors are attributes defined in extension modules via
PyMemberDef structures. For Python implementations without such
types, this method will always return False.
Return in a single string any lines of comments immediately preceding the
object’s source code (for a class, function, or method), or at the top of the
Python source file (if the object is a module).
Return the name of the (text or binary) file in which an object was defined.
This will fail with a TypeError if the object is a built-in module,
class, or function.
Return the name of the Python source file in which an object was defined. This
will fail with a TypeError if the object is a built-in module, class, or
function.
Return a list of source lines and starting line number for an object. The
argument may be a module, class, method, function, traceback, frame, or code
object. The source code is returned as a list of the lines corresponding to the
object and the line number indicates where in the original source file the first
line of code was found. An IOError is raised if the source code cannot
be retrieved.
Return the text of the source code for an object. The argument may be a module,
class, method, function, traceback, frame, or code object. The source code is
returned as a single string. An IOError is raised if the source code
cannot be retrieved.
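As an illustration (assuming the target object is implemented in Python source):

import inspect

# print the source of a pure-Python standard library function
print(inspect.getsource(inspect.getmembers))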
Clean up indentation from docstrings that are indented to line up with blocks
of code. Any whitespace that can be uniformly removed from the second line
onwards is removed. Also, all tabs are expanded to spaces.
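A small sketch of the effect:

>>> import inspect
>>> def f():
...     """Summary line.
...     Indented body line.
...     """
>>> print(inspect.cleandoc(f.__doc__))
Summary line.
Indented body line.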
Arrange the given list of classes into a hierarchy of nested lists. Where a
nested list appears, it contains classes derived from the class whose entry
immediately precedes the list. Each entry is a 2-tuple containing a class and a
tuple of its base classes. If the unique argument is true, exactly one entry
appears in the returned structure for each class in the given list. Otherwise,
classes using multiple inheritance and their descendants will appear multiple
times.
Get the names and default values of a Python function’s arguments. A
named tuple ArgSpec(args, varargs, keywords, defaults) is
returned. args is a list of the argument names. varargs and keywords
are the names of the * and ** arguments or None. defaults is a
tuple of default argument values or None if there are no default arguments;
if this tuple has n elements, they correspond to the last n elements
listed in args.
Deprecated since version 3.0: Use getfullargspec() instead, which provides information about
keyword-only arguments and annotations.
args is a list of the argument names. varargs and varkw are the names
of the * and ** arguments or None. defaults is an n-tuple of
the default values of the last n arguments. kwonlyargs is a list of
keyword-only argument names. kwonlydefaults is a dictionary mapping names
from kwonlyargs to defaults. annotations is a dictionary mapping argument
names to annotations.
The first four items in the tuple correspond to getargspec().
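A brief sketch using the field names described above:

>>> import inspect
>>> def f(a, b=1, *args, flag=False, **kwargs): pass
>>> spec = inspect.getfullargspec(f)
>>> spec.args, spec.varargs, spec.kwonlyargs, spec.kwonlydefaults
(['a', 'b'], 'args', ['flag'], {'flag': False})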
Get information about arguments passed into a particular frame. A
named tuple ArgInfo(args, varargs, keywords, locals) is
returned. args is a list of the argument names. varargs and keywords
are the names of the * and ** arguments or None. locals is the
locals dictionary of the given frame.
Format a pretty argument spec from the four values returned by
getargspec(). The format* arguments are the corresponding optional
formatting functions that are called to turn names and values into strings.
Format a pretty argument spec from the four values returned by
getargvalues(). The format* arguments are the corresponding optional
formatting functions that are called to turn names and values into strings.
Return a tuple of class cls’s base classes, including cls, in method resolution
order. No class appears more than once in this tuple. Note that the method
resolution order depends on cls’s type. Unless a very peculiar user-defined
metatype is in use, cls will be the first element of the tuple.
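For example, a minimal sketch:

>>> import inspect
>>> class Base: pass
>>> class Derived(Base): pass
>>> [c.__name__ for c in inspect.getmro(Derived)]
['Derived', 'Base', 'object']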
Bind the args and kwds to the argument names of the Python function or
method func, as if it were called with them. For bound methods, also bind the
first argument (typically named self) to the associated instance. A dict
is returned, mapping the argument names (including the names of the * and
** arguments, if any) to their values from args and kwds. If func is
invoked incorrectly, i.e. whenever func(*args, **kwds) would raise an
exception because of an incompatible signature, an exception of the same type
and with the same or a similar message is raised.
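For example (a minimal sketch; dict ordering and the exact error wording vary by version):

>>> from inspect import getcallargs
>>> def f(a, b=1, *pos, **named):
...     pass
>>> getcallargs(f, 1, 2, 3) == {'a': 1, 'b': 2, 'pos': (3,), 'named': {}}
True
>>> getcallargs(f)
Traceback (most recent call last):
  ...
TypeError: f() takes at least 1 argument (0 given)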
When the following functions return “frame records,” each record is a tuple of
six items: the frame object, the filename, the line number of the current line,
the function name, a list of lines of context from the source code, and the
index of the current line within that list.
Note
Keeping references to frame objects, as found in the first element of the frame
records these functions return, can cause your program to create reference
cycles. Once a reference cycle has been created, the lifespan of all objects
which can be accessed from the objects which form the cycle can become much
longer even if Python’s optional cycle detector is enabled. If such cycles must
be created, it is important to ensure they are explicitly broken to avoid the
delayed destruction of objects and increased memory consumption which occurs.
Though the cycle detector will catch these, destruction of the frames (and local
variables) can be made deterministic by removing the cycle in a
finally clause. This is also important if the cycle detector was
disabled when Python was compiled or using gc.disable(). For example:
def handle_stackframe_without_leak():
    frame = inspect.currentframe()
    try:
        # do something with the frame
        pass
    finally:
        del frame
The optional context argument supported by most of these functions specifies
the number of lines of context to return, which are centered around the current
line.
Get a list of frame records for a frame and all outer frames. These frames
represent the calls that led to the creation of frame. The first entry in the
returned list represents frame; the last entry represents the outermost call
on frame's stack.
Get a list of frame records for a traceback’s frame and all inner frames. These
frames represent calls made as a consequence of frame. The first entry in the
list represents traceback; the last entry represents where the exception was
raised.
Return the frame object for the caller’s stack frame.
CPython implementation detail: This function relies on Python stack frame support in the interpreter,
which isn’t guaranteed to exist in all implementations of Python. If
running in an implementation without Python stack frame support this
function returns None.
Return a list of frame records for the caller’s stack. The first entry in the
returned list represents the caller; the last entry represents the outermost
call on the stack.
Return a list of frame records for the stack between the current frame and the
frame in which an exception currently being handled was raised in. The first
entry in the list represents the caller; the last entry represents where the
exception was raised.
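For instance, a small sketch that uses stack() to recover the caller's name:

import inspect

def whoami():
    # each frame record is (frame, filename, lineno, function, context, index)
    return inspect.stack()[1][3]

def outer():
    return whoami()

print(outer())   # prints 'outer'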
Both getattr() and hasattr() can trigger code execution when
fetching or checking for the existence of attributes. Descriptors, like
properties, will be invoked and __getattr__() and __getattribute__()
may be called.
For cases where you want passive introspection, like documentation tools, this
can be inconvenient. getattr_static has the same signature as getattr()
but avoids executing code when it fetches attributes.
Retrieve attributes without triggering dynamic lookup via the
descriptor protocol, __getattr__ or __getattribute__.
Note: this function may not be able to retrieve all attributes
that getattr can fetch (like dynamically created attributes)
and may find attributes that getattr can’t (like descriptors
that raise AttributeError). It can also return descriptor objects
instead of instance members.
If the instance __dict__ is shadowed by another member (for example a
property) then this function will be unable to find instance members.
New in version 3.2.
getattr_static does not resolve descriptors, for example slot descriptors or
getset descriptors on objects implemented in C. The descriptor object
is returned instead of the underlying attribute.
You can handle these with code like the following. Note that
for arbitrary getset descriptors invoking these may trigger
code execution:
# example code for resolving the builtin descriptor types
class _foo:
    __slots__ = ['foo']

slot_descriptor = type(_foo.foo)
getset_descriptor = type(type(open(__file__)).name)
wrapper_descriptor = type(str.__dict__['__add__'])
descriptor_types = (slot_descriptor, getset_descriptor, wrapper_descriptor)

result = getattr_static(some_object, 'foo')
if type(result) in descriptor_types:
    try:
        result = result.__get__()
    except AttributeError:
        # descriptors can raise AttributeError to
        # indicate there is no underlying value
        # in which case the descriptor itself will
        # have to do
        pass
When implementing coroutine schedulers and for other advanced uses of
generators, it is useful to determine whether a generator is currently
executing, is waiting to start or resume execution, or has already
terminated. getgeneratorstate() allows the current state of a
generator to be determined easily.
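For example, a minimal sketch of the states a generator moves through:

>>> from inspect import getgeneratorstate
>>> def gen():
...     yield 'a'
>>> g = gen()
>>> getgeneratorstate(g)
'GEN_CREATED'
>>> next(g)
'a'
>>> getgeneratorstate(g)
'GEN_SUSPENDED'
>>> g.close()
>>> getgeneratorstate(g)
'GEN_CLOSED'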
This module is automatically imported during initialization. The automatic
import can be suppressed using the interpreter’s -S option.
Importing this module will append site-specific paths to the module search path
and add a few builtins.
It starts by constructing up to four directories from a head and a tail part.
For the head part, it uses sys.prefix and sys.exec_prefix; empty heads
are skipped. For the tail part, it uses the empty string and then
lib/site-packages (on Windows) or
lib/pythonX.Y/site-packages and then lib/site-python (on
Unix and Macintosh). For each of the distinct head-tail combinations, it sees
if it refers to an existing directory, and if so, adds it to sys.path and
also inspects the newly added path for configuration files.
A path configuration file is a file whose name has the form name.pth
and exists in one of the four directories mentioned above; its contents are
additional items (one per line) to be added to sys.path. Non-existing items
are never added to sys.path, and no check is made that the item refers to a
directory rather than a file. No item is added to sys.path more than
once. Blank lines and lines beginning with # are skipped. Lines starting
with import (followed by space or tab) are executed.
For example, suppose sys.prefix and sys.exec_prefix are set to
/usr/local. The Python X.Y library is then installed in
/usr/local/lib/pythonX.Y. Suppose this has
a subdirectory /usr/local/lib/pythonX.Y/site-packages with three
subsubdirectories, foo, bar and spam, and two path
configuration files, foo.pth and bar.pth. Assume
foo.pth contains the following:
# foo package configuration
foo
bar
bletch
and bar.pth contains:
# bar package configuration
bar
Then the following version-specific directories are added to
sys.path, in this order:

/usr/local/lib/pythonX.Y/site-packages/bar
/usr/local/lib/pythonX.Y/site-packages/foo
Note that bletch is omitted because it doesn’t exist; the bar
directory precedes the foo directory because bar.pth comes
alphabetically before foo.pth; and spam is omitted because it is
not mentioned in either path configuration file.
After these path manipulations, an attempt is made to import a module named
sitecustomize, which can perform arbitrary site-specific customizations.
It is typically created by a system administrator in the site-packages
directory. If this import fails with an ImportError exception, it is
silently ignored.
After this, an attempt is made to import a module named usercustomize,
which can perform arbitrary user-specific customizations, if
ENABLE_USER_SITE is true. This file is intended to be created in the
user site-packages directory (see below), which is part of sys.path unless
disabled by -s. An ImportError will be silently ignored.
Note that for some non-Unix systems, sys.prefix and sys.exec_prefix are
empty, and the path manipulations are skipped; however the import of
sitecustomize and usercustomize is still attempted.
Flag showing the status of the user site-packages directory. True means
that it is enabled and was added to sys.path. False means that it
was disabled by user request (with -s or
PYTHONNOUSERSITE). None means it was disabled for security
reasons (mismatch between user or group id and effective id) or by an
administrator.
Path to the user site-packages for the running Python. Can be None if
getusersitepackages() hasn’t been called yet. Default value is
~/.local/lib/pythonX.Y/site-packages for UNIX and non-framework Mac
OS X builds, ~/Library/Python/X.Y/lib/python/site-packages for Mac
framework builds, and %APPDATA%\Python\PythonXY\site-packages
on Windows. This directory is a site directory, which means that
.pth files in it will be processed.
Path to the base directory for the user site-packages. Can be None if
getuserbase() hasn’t been called yet. Default value is
~/.local for UNIX and Mac OS X non-framework builds,
~/Library/Python/X.Y for Mac framework builds, and
%APPDATA%\Python for Windows. This value is used by Distutils to
compute the installation directories for scripts, data files, Python modules,
etc. for the user installation scheme. See
also PYTHONUSERBASE.
Return the path of the user-specific site-packages directory,
USER_SITE. If it is not initialized yet, this function will also set
it, respecting PYTHONNOUSERSITE and USER_BASE.
New in version 3.2.
The site module also provides a way to get the user directories from the
command line:
$ python3 -m site --user-site
/home/user/.local/lib/python3.3/site-packages
If it is called without arguments, it will print the contents of
sys.path on the standard output, followed by the value of
USER_BASE and whether the directory exists, then the same thing for
USER_SITE, and finally the value of ENABLE_USER_SITE.
The --user-base and --user-site options print the path to the user base
directory and to the user site-packages directory, respectively.
If both options are given, user base and user site will be printed (always in
this order), separated by os.pathsep.
If any option is given, the script will exit with one of these values: 0 if
the user site-packages directory is enabled, 1 if it was disabled by the
user, 2 if it is disabled for security reasons or by an administrator, and a
value greater than 2 if there is an error.
The fpectl module is not built by default, and its usage is discouraged
and may be dangerous except in the hands of experts. See the section
Limitations and other considerations for more details.
Most computers carry out floating point operations in conformance with the
so-called IEEE-754 standard. On any real computer, some floating point
operations produce results that cannot be expressed as a normal floating point
value. For example, try
>>> import math
>>> math.exp(1000)
inf
>>> math.exp(1000) / math.exp(1000)
nan
(The example above will work on many platforms. DEC Alpha may be one exception.)
“Inf” is a special, non-numeric value in IEEE-754 that stands for “infinity”,
and “nan” means “not a number.” Note that, other than the non-numeric results,
nothing special happened when you asked Python to carry out those calculations.
That is in fact the default behaviour prescribed in the IEEE-754 standard, and
if it works for you, stop reading now.
In some circumstances, it would be better to raise an exception and stop
processing at the point where the faulty operation was attempted. The
fpectl module is for use in that situation. It provides control over
floating point units from several hardware manufacturers, allowing the user to
turn on the generation of SIGFPE whenever any of the IEEE-754
exceptions Division by Zero, Overflow, or Invalid Operation occurs. In tandem
with a pair of wrapper macros that are inserted into the C code comprising your
python system, SIGFPE is trapped and converted into the Python
FloatingPointError exception.
The fpectl module defines the following functions and may raise the given
exception:
After turnon_sigfpe() has been executed, a floating point operation that
raises one of the IEEE-754 exceptions Division by Zero, Overflow, or Invalid
operation will in turn raise this standard Python exception.
The following example demonstrates how to start up and test operation of the
fpectl module.
>>> import fpectl
>>> import fpetest
>>> fpectl.turnon_sigfpe()
>>> fpetest.test()
overflow PASS
FloatingPointError: Overflow
div by 0 PASS
FloatingPointError: Division by zero
[ more output from test elided ]
>>> import math
>>> math.exp(1000)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
FloatingPointError: in math_1
Setting up a given processor to trap IEEE-754 floating point errors currently
requires custom code on a per-architecture basis. You may have to modify
fpectl to control your particular hardware.
Conversion of an IEEE-754 exception to a Python exception requires that the
wrapper macros PyFPE_START_PROTECT and PyFPE_END_PROTECT be inserted
into your code in an appropriate fashion. Python itself has been modified to
support the fpectl module, but many other codes of interest to numerical
analysts have not.
Some files in the source distribution may be interesting in learning more about
how this module operates. The include file Include/pyfpe.h discusses the
implementation of this module at some length. Modules/fpetestmodule.c
gives several examples of use. Many additional examples can be found in
Objects/floatobject.c.
distutils — Building and installing Python modules
The distutils package provides support for building and installing
additional modules into a Python installation. The new modules may be either
100%-pure Python, or may be extension modules written in C, or may be
collections of Python packages which include modules coded in both Python and C.
This package is discussed in two separate chapters:
The manual for developers and packagers of Python modules. This describes
how to prepare distutils-based packages so that they may be
easily installed into an existing Python installation.
An “administrators” manual which includes information on installing
modules into an existing Python installation. You do not need to be a
Python programmer to read this manual.
The modules described in this chapter allow writing interfaces similar to
Python’s interactive interpreter. If you want a Python interpreter that
supports some special feature in addition to the Python language, you should
look at the code module. (The codeop module is lower-level, used
to support compiling a possibly-incomplete chunk of Python code.)
The full list of modules described in this chapter is:
The code module provides facilities to implement read-eval-print loops in
Python. Two classes and convenience functions are included which can be used to
build applications which provide an interactive interpreter prompt.
This class deals with parsing and interpreter state (the user’s namespace); it
does not deal with input buffering or prompting or input file naming (the
filename is always passed in explicitly). The optional locals argument
specifies the dictionary in which code will be executed; it defaults to a newly
created dictionary with key '__name__' set to '__console__' and key
'__doc__' set to None.
class code.InteractiveConsole(locals=None, filename="<console>")
Closely emulate the behavior of the interactive Python interpreter. This class
builds on InteractiveInterpreter and adds prompting using the familiar
sys.ps1 and sys.ps2, and input buffering.
Convenience function to run a read-eval-print loop. This creates a new instance
of InteractiveConsole and sets readfunc to be used as the
raw_input() method, if provided. If local is provided, it is passed to
the InteractiveConsole constructor for use as the default namespace for
the interpreter loop. The interact() method of the instance is then run
with banner passed as the banner to use, if provided. The console object is
discarded after use.
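For example, a minimal sketch that starts a console with a preset namespace:

import code

# the mapping passed as local becomes the interpreter's namespace;
# leave the console with exit() or an EOF (Ctrl-D, or Ctrl-Z on Windows)
code.interact(banner='demo console', local={'answer': 42})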
This function is useful for programs that want to emulate Python’s interpreter
main loop (a.k.a. the read-eval-print loop). The tricky part is to determine
when the user has entered an incomplete command that can be completed by
entering more text (as opposed to a complete command or a syntax error). This
function almost always makes the same decision as the real interpreter main
loop.
source is the source string; filename is the optional filename from which
source was read, defaulting to '<input>'; and symbol is the optional
grammar start symbol, which should be either 'single' (the default) or
'eval'.
Returns a code object (the same as compile(source, filename, symbol)) if the
command is complete and valid; None if the command is incomplete; raises
SyntaxError if the command is complete and contains a syntax error, or
raises OverflowError or ValueError if the command contains an
invalid literal.
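A brief sketch of the first two outcomes (a complete but invalid command,
such as 'x +', raises SyntaxError instead):

>>> from code import compile_command
>>> compile_command('x = 1') is None   # complete and valid: a code object
False
>>> compile_command('if x:') is None   # incomplete: None
True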
Compile and run some source in the interpreter. Arguments are the same as for
compile_command(); the default for filename is '<input>', and for
symbol is 'single'. One of several things can happen:
The input is complete; compile_command() returned a code object. The
code is executed by calling runcode() (which also handles run-time
exceptions, except for SystemExit). runsource() returns False.
The return value can be used to decide whether to use sys.ps1 or sys.ps2
to prompt the next line.
Execute a code object. When an exception occurs, showtraceback() is called
to display a traceback. All exceptions are caught except SystemExit,
which is allowed to propagate.
A note about KeyboardInterrupt: this exception may occur elsewhere in
this code, and may not always be caught. The caller should be prepared to deal
with it.
Display the syntax error that just occurred. This does not display a stack
trace because there isn’t one for syntax errors. If filename is given, it is
stuffed into the exception instead of the default filename provided by Python’s
parser, because it always uses '<string>' when reading from a string. The
output is written by the write() method.
Display the exception that just occurred. We remove the first stack item
because it is within the interpreter object implementation. The output is
written by the write() method.
The InteractiveConsole class is a subclass of
InteractiveInterpreter, and so offers all the methods of the
interpreter objects as well as the following additions.
Closely emulate the interactive Python console. The optional banner argument
specifies the banner to print before the first interaction; by default it prints a
banner similar to the one printed by the standard Python interpreter, followed
by the class name of the console object in parentheses (so as not to confuse
this with the real interpreter – since it’s so close!).
Push a line of source text to the interpreter. The line should not have a
trailing newline; it may have internal newlines. The line is appended to a
buffer and the interpreter’s runsource() method is called with the
concatenated contents of the buffer as source. If this indicates that the
command was executed or invalid, the buffer is reset; otherwise, the command is
incomplete, and the buffer is left as it was after the line was appended. The
return value is True if more input is required, False if the line was
dealt with in some way (this is the same as runsource()).
Write a prompt and read a line. The returned line does not include the trailing
newline. When the user enters the EOF key sequence, EOFError is raised.
The base implementation reads from sys.stdin; a subclass may replace this
with a different implementation.
The codeop module provides utilities upon which the Python
read-eval-print loop can be emulated, as is done in the code module. As
a result, you probably don’t want to use the module directly; if you want to
include such a loop in your program you probably want to use the code
module instead.
There are two parts to this job:
Being able to tell if a line of input completes a Python statement: in
short, telling whether to print '>>>' or '...' next.
Remembering which future statements the user has entered, so subsequent
input can be compiled with these in effect.
The codeop module provides a way of doing each of these things, and a way
of doing them both.
Tries to compile source, which should be a string of Python code and return a
code object if source is valid Python code. In that case, the filename
attribute of the code object will be filename, which defaults to
'<input>'. Returns None if source is not valid Python code, but is a
prefix of valid Python code.
If there is a problem with source, an exception will be raised.
SyntaxError is raised if there is invalid Python syntax, and
OverflowError or ValueError if there is an invalid literal.
The symbol argument determines whether source is compiled as a statement
('single', the default) or as an expression ('eval'). Any
other value will cause ValueError to be raised.
Note
It is possible (but not likely) that the parser stops parsing with a
successful outcome before reaching the end of the source; in this case,
trailing symbols may be ignored instead of causing an error. For example,
a backslash followed by two newlines may be followed by arbitrary garbage.
This will be fixed once the API for the parser is better.
Instances of this class have __call__() methods identical in signature to
the built-in function compile(), but with the difference that if the
instance compiles program text containing a __future__ statement, the
instance ‘remembers’ and compiles all subsequent program texts with the
statement in force.
Instances of this class have __call__() methods identical in signature to
compile_command(); the difference is that if the instance compiles program
text containing a __future__ statement, the instance ‘remembers’ and
compiles all subsequent program texts with the statement in force.
Return a list of 3-element tuples, each describing a particular type of
module. Each triple has the form (suffix, mode, type), where suffix is
a string to be appended to the module name to form the filename to search
for, mode is the mode string to pass to the built-in open() function
to open the file (this can be 'r' for text files or 'rb' for binary
files), and type is the file type, which has one of the values
PY_SOURCE, PY_COMPILED, or C_EXTENSION, described
below.
Try to find the module name. If path is omitted or None, the list of
directory names given by sys.path is searched, but first a few special
places are searched: the function tries to find a built-in module with the
given name (C_BUILTIN), then a frozen module (PY_FROZEN),
and on some systems some other places are looked in as well (on Windows, it
looks in the registry which may point to a specific file).
Otherwise, path must be a list of directory names; each directory is
searched for files with any of the suffixes returned by get_suffixes()
above. Invalid names in the list are silently ignored (but all list items
must be strings).
If the search is successful, the return value is a 3-element tuple (file, pathname, description):
file is an open file object positioned at the beginning, pathname
is the pathname of the file found, and description is a 3-element tuple as
contained in the list returned by get_suffixes() describing the kind of
module found.
If the module does not live in a file, the returned file is None,
pathname is the empty string, and the description tuple contains empty
strings for its suffix and mode; the module type is indicated as given in
parentheses above. If the search is unsuccessful, ImportError is
raised. Other exceptions indicate problems with the arguments or
environment.
If the module is a package, file is None, pathname is the package
path and the last item in the description tuple is PKG_DIRECTORY.
This function does not handle hierarchical module names (names containing
dots). In order to find P.M, that is, submodule M of package P, use
find_module() and load_module() to find and load package P, and
then use find_module() with the path argument set to P.__path__.
When P itself has a dotted name, apply this recipe recursively.
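A sketch of that recipe for a hypothetical package P with submodule M:

import imp

file, pathname, description = imp.find_module('P')
try:
    P = imp.load_module('P', file, pathname, description)
finally:
    if file:               # for a package, file is None
        file.close()

file, pathname, description = imp.find_module('M', P.__path__)
try:
    M = imp.load_module('P.M', file, pathname, description)
finally:
    if file:
        file.close()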
Load a module that was previously found by find_module() (or by an
otherwise conducted search yielding compatible results). This function does
more than importing the module: if the module was already imported, it will
reload the module! The name argument indicates the full
module name (including the package name, if this is a submodule of a
package). The file argument is an open file, and pathname is the
corresponding file name; these can be None and '', respectively, when
the module is a package or not being loaded from a file. The description
argument is a tuple, as would be returned by get_suffixes(), describing
what kind of module must be loaded.
If the load is successful, the return value is the module object; otherwise,
an exception (usually ImportError) is raised.
Important: the caller is responsible for closing the file argument, if
it was not None, even when an exception is raised. This is best done
using a try ... finally statement.
Return True if the import lock is currently held, else False. On
platforms without threads, always return False.
On platforms with threads, a thread executing an import holds an internal lock
until the import is complete. This lock blocks other threads from doing an
import until the original import completes, which in turn prevents other threads
from seeing incomplete module objects constructed by the original thread while
in the process of completing its import (and the imports, if any, triggered by
that).
Acquire the interpreter’s import lock for the current thread. This lock should
be used by import hooks to ensure thread-safety when importing modules.
Once a thread has acquired the import lock, the same thread may acquire it
again without blocking; the thread must release it once for each time it has
acquired it.
On platforms without threads, this function does nothing.
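A minimal sketch of the usual acquire/release pairing in an import hook:

import imp

imp.acquire_lock()
try:
    pass   # perform import-related work here
finally:
    imp.release_lock()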
Reload a previously imported module. The argument must be a module object, so
it must have been successfully imported before. This is useful if you have
edited the module source file using an external editor and want to try out the
new version without leaving the Python interpreter. The return value is the
module object (the same as the module argument).
When reload(module) is executed:
Python modules’ code is recompiled and the module-level code reexecuted,
defining a new set of objects which are bound to names in the module’s
dictionary. The init function of extension modules is not called a second
time.
As with all other objects in Python the old objects are only reclaimed after
their reference counts drop to zero.
The names in the module namespace are updated to point to any new or changed
objects.
Other references to the old objects (such as names external to the module) are
not rebound to refer to the new objects and must be updated in each namespace
where they occur if that is desired.
There are a number of other caveats:
If a module is syntactically correct but its initialization fails, the first
import statement for it does not bind its name locally, but does
store a (partially initialized) module object in sys.modules. To reload the
module you must first import it again (this will bind the name to the
partially initialized module object) before you can reload() it.
When a module is reloaded, its dictionary (containing the module’s global
variables) is retained. Redefinitions of names will override the old
definitions, so this is generally not a problem. If the new version of a module
does not define a name that was defined by the old version, the old definition
remains. This feature can be used to the module’s advantage if it maintains a
global table or cache of objects — with a try statement it can test
for the table’s presence and skip its initialization if desired:
try:
    cache
except NameError:
    cache = {}
It is legal though generally not very useful to reload built-in or dynamically
loaded modules, except for sys, __main__ and builtins.
In many cases, however, extension modules are not designed to be initialized
more than once, and may fail in arbitrary ways when reloaded.
If a module imports objects from another module using from ...
import ..., calling reload() for the other module does not
redefine the objects imported from it — one way around this is to re-execute
the from statement, another is to use import and qualified
names (module.name) instead.
If a module instantiates instances of a class, reloading the module that defines
the class does not affect the method definitions of the instances — they
continue to use the old class definition. The same is true for derived classes.
The following functions are conveniences for handling PEP 3147 byte-compiled
file paths.
Return the PEP 3147 path to the byte-compiled file associated with the
source path. For example, if path is /foo/bar/baz.py the return
value would be /foo/bar/__pycache__/baz.cpython-32.pyc for Python 3.2.
The cpython-32 string comes from the current magic tag (see
get_tag()). The returned path will end in .pyc when
__debug__ is True or .pyo for an optimized Python
(i.e. __debug__ is False). By passing in True or False for
debug_override you can override the system’s value for __debug__ for
extension selection.
Given the path to a PEP 3147 file name, return the associated source code
file path. For example, if path is
/foo/bar/__pycache__/baz.cpython-32.pyc the returned path would be
/foo/bar/baz.py. path need not exist, however if it does not conform
to PEP 3147 format, a ValueError is raised.
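For example (the tag in the file name depends on the running interpreter):

>>> import imp
>>> imp.cache_from_source('/foo/bar/baz.py')
'/foo/bar/__pycache__/baz.cpython-32.pyc'
>>> imp.source_from_cache('/foo/bar/__pycache__/baz.cpython-32.pyc')
'/foo/bar/baz.py'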
The NullImporter type is a PEP 302 import hook that handles
non-directory path strings by failing to find any modules. Calling this type
with an existing directory or empty string raises ImportError.
Otherwise, a NullImporter instance is returned.
Python adds instances of this type to sys.path_importer_cache for any path
entries that are not directories and are not handled by any other path hooks on
sys.path_hooks. Instances have only one method, find_module(), which
always returns None, indicating that the requested module could not be
found.
The following function emulates what was the standard import statement up to
Python 1.4 (no hierarchical module names). (This implementation wouldn’t work
in that version, since find_module() has been extended and
load_module() has been added in 1.4.)
import imp
import sys

def __import__(name, globals=None, locals=None, fromlist=None):
    # Fast path: see if the module has already been imported.
    try:
        return sys.modules[name]
    except KeyError:
        pass

    # If any of the following calls raises an exception,
    # there's a problem we can't handle -- let the caller handle it.
    fp, pathname, description = imp.find_module(name)

    try:
        return imp.load_module(name, fp, pathname, description)
    finally:
        # Since we may exit via an exception, close fp explicitly.
        if fp:
            fp.close()
This module adds the ability to import Python modules (*.py,
*.py[co]) and packages from ZIP-format archives. It is usually not
needed to use the zipimport module explicitly; it is automatically used
by the built-in import mechanism for sys.path items that are paths
to ZIP archives.
Typically, sys.path is a list of directory names as strings. This module
also allows an item of sys.path to be a string naming a ZIP file archive.
The ZIP archive can contain a subdirectory structure to support package imports,
and a path within the archive can be specified to only import from a
subdirectory. For example, the path /tmp/example.zip/lib/ would only
import from the lib/ subdirectory within the archive.
Any files may be present in the ZIP archive, but only files .py and
.py[co] are available for import. ZIP import of dynamic modules
(.pyd, .so) is disallowed. Note that if an archive only contains
.py files, Python will not attempt to modify the archive by adding the
corresponding .pyc or .pyo file, meaning that if a ZIP archive
doesn’t contain .pyc files, importing may be rather slow.
ZIP archives with an archive comment are currently not supported.
Written by James C. Ahlstrom, who also provided an implementation. Python 2.3
follows the specification in PEP 273, but uses an implementation written by Just
van Rossum that uses the import hooks described in PEP 302.
Create a new zipimporter instance. archivepath must be a path to a ZIP
file, or to a specific path within a ZIP file. For example, an archivepath
of foo/bar.zip/lib will look for modules in the lib directory
inside the ZIP file foo/bar.zip (provided that it exists).
ZipImportError is raised if archivepath doesn’t point to a valid ZIP
archive.
Search for a module specified by fullname. fullname must be the fully
qualified (dotted) module name. It returns the zipimporter instance itself
if the module was found, or None if it wasn’t. The optional
path argument is ignored—it’s there for compatibility with the
importer protocol.
Return the source code for the specified module. Raise
ZipImportError if the module couldn’t be found, return
None if the archive does contain the module, but has no source
for it.
Load the module specified by fullname. fullname must be the fully
qualified (dotted) module name. It returns the imported module, or raises
ZipImportError if it wasn’t found.
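For example, a minimal sketch (example.zip and mymodule are hypothetical):

import zipimport

importer = zipimport.zipimporter('example.zip')
if importer.find_module('mymodule') is not None:
    mymodule = importer.load_module('mymodule')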
Extend the search path for the modules which comprise a package. Intended
use is to place the following code in a package’s __init__.py:
from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)
This will add to the package’s __path__ all subdirectories of directories
on sys.path named after the package. This is useful if one wants to
distribute different parts of a single logical package as multiple
directories.
It also looks for *.pkg files beginning where * matches the
name argument. This feature is similar to *.pth files (see the
site module for more information), except that it doesn’t special-case
lines starting with import. A *.pkg file is trusted at face
value: apart from checking for duplicates, all entries found in a
*.pkg file are added to the path, regardless of whether they exist
on the filesystem. (This is a feature.)
If the input path is not a list (as is the case for frozen packages) it is
returned unchanged. The input path is not modified; an extended copy is
returned. Items are only appended to the copy at the end.
It is assumed that sys.path is a sequence. Items of sys.path
that are not strings referring to existing directories are ignored. Unicode
items on sys.path that cause errors when used as filenames may cause
this function to raise an exception (in line with os.path.isdir()
behavior).
PEP 302 Importer that wraps Python’s “classic” import algorithm.
If dirname is a string, a PEP 302 importer is created that searches that
directory. If dirname is None, a PEP 302 importer is created that
searches the current sys.path, plus any modules that are frozen or
built-in.
If fullname contains dots, path must be the containing package’s
__path__. Returns None if the module cannot be found or imported.
This function uses iter_importers(), and is thus subject to the same
limitations regarding platform-specific special import locations such as the
Windows registry.
Retrieve a PEP 302 importer for the given path_item.
The returned importer is cached in sys.path_importer_cache if it was
newly created by a path hook.
If there is no importer, a wrapper around the basic import machinery is
returned. This wrapper is never inserted into the importer cache (None
is inserted instead).
The cache (or part of it) can be cleared manually if a rescan of
sys.path_hooks is necessary.
If the module or package is accessible via the normal import mechanism, a
wrapper around the relevant part of that machinery is returned. Returns
None if the module cannot be found or imported. If the named module is
not already imported, its containing package (if any) is imported, in order
to establish the package __path__.
This function uses iter_importers(), and is thus subject to the same
limitations regarding platform-specific special import locations such as the
Windows registry.
Yield PEP 302 importers for the given module name.
If fullname contains a ‘.’, the importers will be for the package containing
fullname, otherwise they will be importers for sys.meta_path,
sys.path, and Python’s “classic” import machinery, in that order. If
the named module is in a package, that package is imported as a side effect
of invoking this function.
Non-PEP 302 mechanisms (e.g. the Windows registry) used by the standard
import machinery to find files in alternative locations are partially
supported, but are searched after sys.path. Normally, these
locations are searched before sys.path, preventing sys.path
entries from shadowing them.
For this to cause a visible difference in behaviour, there must be a module
or package name that is accessible via both sys.path and one of the
non-PEP 302 file system mechanisms. In this case, the emulation will find
the former version, while the builtin import mechanism will find the latter.
Items of the following types can be affected by this discrepancy:
imp.C_EXTENSION, imp.PY_SOURCE, imp.PY_COMPILED,
imp.PKG_DIRECTORY.
Yields (module_loader, name, ispkg) for all modules recursively on
path, or, if path is None, all accessible modules.
path should be either None or a list of paths to look for modules in.
prefix is a string to output on the front of every module name on output.
Note that this function must import all packages (not all modules!) on
the given path, in order to access the __path__ attribute to find
submodules.
onerror is a function which gets called with one argument (the name of the
package which was being imported) if any exception occurs while trying to
import a package. If no onerror function is supplied, ImportErrors
are caught and ignored, while all other exceptions are propagated,
terminating the search.
Examples:
from pkgutil import walk_packages
import ctypes

# list all modules python can access (consume the generator)
list(walk_packages())

# list all submodules of ctypes
list(walk_packages(ctypes.__path__, ctypes.__name__ + '.'))
This is a wrapper for the PEP 302 loader get_data() API. The
package argument should be the name of a package, in standard module format
(foo.bar). The resource argument should be in the form of a relative
filename, using / as the path separator. The parent directory name
.. is not allowed, and nor is a rooted name (starting with a /).
The function returns a binary string that is the contents of the specified
resource.
For packages located in the filesystem, which have already been imported,
this is the rough equivalent of:
d = os.path.dirname(sys.modules[package].__file__)
data = open(os.path.join(d, resource), 'rb').read()
If the package cannot be located or loaded, or it uses a PEP 302 loader
which does not support get_data(), then None is returned.
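For example, a minimal sketch (the package and resource names are hypothetical):

from pkgutil import get_data

data = get_data('foo.bar', 'templates/index.html')
if data is not None:
    text = data.decode('utf-8')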
This module provides a ModuleFinder class that can be used to determine
the set of modules imported by a script. modulefinder.py can also be run as
a script, giving the filename of a Python script as its argument, after which a
report of the imported modules will be printed.
Allows specifying that the module named oldname is in fact the package named
newname.
class modulefinder.ModuleFinder(path=None, debug=0, excludes=[], replace_paths=[])
This class provides run_script() and report() methods to determine
the set of modules imported by a script. path can be a list of directories to
search for modules; if not specified, sys.path is used. debug sets the
debugging level; higher values make the class print debugging messages about
what it’s doing. excludes is a list of module names to exclude from the
analysis. replace_paths is a list of (oldpath,newpath) tuples that will
be replaced in module paths.
Print a report to standard output that lists the modules imported by the
script and their paths, as well as modules that are missing or seem to be
missing.
A script that reports on the modules imported by bacon.py:
from modulefinder import ModuleFinder
finder = ModuleFinder()
finder.run_script('bacon.py')
print('Loaded modules:')
for name, mod in finder.modules.items():
print('%s: ' % name, end='')
print(','.join(list(mod.globalnames.keys())[:3]))
print('-'*50)
print('Modules not imported:')
print('\n'.join(finder.badmodules.keys()))
Sample output (may vary depending on the architecture):
The runpy module is used to locate and run Python modules without
importing them first. Its main use is to implement the -m command
line switch that allows scripts to be located using the Python module
namespace rather than the filesystem.
Execute the code of the specified module and return the resulting module
globals dictionary. The module’s code is first located using the standard
import mechanism (refer to PEP 302 for details) and then executed in a
fresh module namespace.
If the supplied module name refers to a package rather than a normal
module, then that package is imported and the __main__ submodule within
that package is then executed and the resulting module globals dictionary
returned.
The optional dictionary argument init_globals may be used to pre-populate
the module’s globals dictionary before the code is executed. The supplied
dictionary will not be modified. If any of the special global variables
below are defined in the supplied dictionary, those definitions are
overridden by run_module().
The special global variables __name__, __file__, __cached__,
__loader__
and __package__ are set in the globals dictionary before the module
code is executed (Note that this is a minimal set of variables - other
variables may be set implicitly as an interpreter implementation detail).
__name__ is set to run_name if this optional argument is not
None, to mod_name+'.__main__' if the named module is a
package and to the mod_name argument otherwise.
__file__ is set to the name provided by the module loader. If the
loader does not make filename information available, this variable is set
to None.
__cached__ will be set to None.
__loader__ is set to the PEP 302 module loader used to retrieve the
code for the module (This loader may be a wrapper around the standard
import mechanism).
__package__ is set to mod_name if the named module is a package and
to mod_name.rpartition('.')[0] otherwise.
If the argument alter_sys is supplied and evaluates to True,
then sys.argv[0] is updated with the value of __file__ and
sys.modules[__name__] is updated with a temporary module object for the
module being executed. Both sys.argv[0] and sys.modules[__name__]
are restored to their original values before the function returns.
Note that this manipulation of sys is not thread-safe. Other threads
may see the partially initialised module, as well as the altered list of
arguments. It is recommended that the sys module be left alone when
invoking this function from threaded code.
Changed in version 3.1: Added ability to execute packages by looking for a __main__ submodule.
Changed in version 3.2: Added __cached__ global variable (see PEP 3147).
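For instance, a minimal sketch that runs the standard-library module this as
if invoked with python -m this:

import runpy

module_globals = runpy.run_module('this', run_name='__main__')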
Execute the code at the named filesystem location and return the resulting
module globals dictionary. As with a script name supplied to the CPython
command line, the supplied path may refer to a Python source file, a
compiled bytecode file or a valid sys.path entry containing a __main__
module (e.g. a zipfile containing a top-level __main__.py file).
For a simple script, the specified code is simply executed in a fresh
module namespace. For a valid sys.path entry (typically a zipfile or
directory), the entry is first added to the beginning of sys.path. The
function then looks for and executes a __main__ module using the
updated path. Note that there is no special protection against invoking
an existing __main__ entry located elsewhere on sys.path if
there is no such module at the specified location.
The optional dictionary argument init_globals may be used to pre-populate
the module’s globals dictionary before the code is executed. The supplied
dictionary will not be modified. If any of the special global variables
below are defined in the supplied dictionary, those definitions are
overridden by run_path().
The special global variables __name__, __file__, __loader__
and __package__ are set in the globals dictionary before the module
code is executed (Note that this is a minimal set of variables - other
variables may be set implicitly as an interpreter implementation detail).
__name__ is set to run_name if this optional argument is not
None and to '<run_path>' otherwise.
__file__ is set to the name provided by the module loader. If the
loader does not make filename information available, this variable is set
to None. For a simple script, this will be set to file_path.
__loader__ is set to the PEP 302 module loader used to retrieve the
code for the module (This loader may be a wrapper around the standard
import mechanism). For a simple script, this will be set to None.
__package__ is set to __name__.rpartition('.')[0].
A number of alterations are also made to the sys module. Firstly,
sys.path may be altered as described above. sys.argv[0] is updated
with the value of file_path and sys.modules[__name__] is updated
with a temporary module object for the module being executed. All
modifications to items in sys are reverted before the function
returns.
Note that, unlike run_module(), the alterations made to sys
are not optional in this function as these adjustments are essential to
allowing the execution of sys.path entries. As the thread-safety
limitations still apply, use of this function in threaded code should be
either serialised with the import lock or delegated to a separate process.
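A minimal sketch (scripts/tool.py is a hypothetical path):

import runpy

result_globals = runpy.run_path('scripts/tool.py', run_name='__main__')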
The purpose of the importlib package is two-fold. One is to provide an
implementation of the import statement (and thus, by extension, the
__import__() function) in Python source code. This provides an
implementation of import which is portable to any Python
interpreter. This also provides a reference implementation which is easier to
comprehend than one implemented in a programming language other than Python.
Two, the components to implement import are exposed in this
package, making it easier for users to create their own custom objects (known
generically as an importer) to participate in the import process.
Details on custom importers can be found in PEP 302.
Import a module. The name argument specifies what module to
import in absolute or relative terms
(e.g. either pkg.mod or ..mod). If the name is
specified in relative terms, then the package argument must be set to
the name of the package which is to act as the anchor for resolving the
package name (e.g. import_module('..mod','pkg.subpkg') will import
pkg.mod).
The import_module() function acts as a simplifying wrapper around
importlib.__import__(). This means all semantics of the function are
derived from importlib.__import__(), including requiring the package
from which an import is occurring to have been previously imported
(i.e., package must already be imported). The most important difference
is that import_module() returns the most nested package or module
that was imported (e.g. pkg.mod), while __import__() returns the
top-level package or module (e.g. pkg).
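For example, a brief sketch (the relative form uses hypothetical package names):

import importlib

os_path = importlib.import_module('os.path')   # returns the most nested module

# a relative import requires the anchor package to be imported first:
# importlib.import_module('..mod', 'pkg.subpkg')   # would import pkg.mod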
importlib.abc – Abstract base classes related to import
The importlib.abc module contains all of the core abstract base classes
used by import. Some subclasses of the core abstract base classes
are also provided to help in implementing the core ABCs.
An abstract method for finding a loader for the specified
module. If the finder is found on sys.meta_path and the
module to be searched for is a subpackage or module then path will
be the value of __path__ from the parent package. If a loader
cannot be found, None is returned.
An abstract method for loading a module. If the module cannot be
loaded, ImportError is raised, otherwise the loaded module is
returned.
If the requested module already exists in sys.modules, that
module should be used and reloaded.
Otherwise the loader should create a new module and insert it into
sys.modules before any loading begins, to prevent recursion
from the import. If the loader inserted a module and the load fails, it
must be removed by the loader from sys.modules; modules already
in sys.modules before the loader began execution should be left
alone. The importlib.util.module_for_loader() decorator handles
all of these details.
The loader should set several attributes on the module.
(Note that some of these attributes can change when a module is
reloaded.)
__name__
The name of the module.
__file__
The path to where the module data is stored (not set for built-in
modules).
__path__
A list of strings specifying the search path within a
package. This attribute is not set on modules.
__package__
The parent package for the module/package. If the module is
top-level then it has a value of the empty string. The
importlib.util.set_package() decorator can handle the details
for __package__.
__loader__
The loader used to load the module.
(This is not set by the built-in import machinery,
but it should be set whenever a loader is used.)
An abstract method to return the bytes for the data located at path.
Loaders that have a file-like storage back-end
that allows storing arbitrary data
can implement this abstract method to give direct access
to the data stored. IOError is to be raised if the path cannot
be found. The path is expected to be constructed using a module’s
__file__ attribute or an item from a package’s __path__.
An abstract method to return the code object for a module.
None is returned if the module does not have a code object
(e.g. built-in module). ImportError is raised if the loader cannot
find the requested module.
An abstract method to return the source of a module. It is returned as
a text string with universal newlines. Returns None if no
source is available (e.g. a built-in module). Raises ImportError
if the loader cannot find the module specified.
An abstract method to return a true value if the module is a package, a
false value otherwise. ImportError is raised if the
loader cannot find the module.
An abstract base class which inherits from InspectLoader that,
when implemented, helps a module to be executed as a script. The ABC
represents an optional PEP 302 protocol.
An abstract base class for implementing source (and optionally bytecode)
file loading. The class inherits from both ResourceLoader and
ExecutionLoader, requiring the implementation of
ResourceLoader.get_data() and ExecutionLoader.get_filename(); the latter
should only return the path to the source file (sourceless loading is
not supported).
The abstract methods defined by this class are to add optional bytecode
file support. Not implementing these optional methods causes the loader to
only work with source code. Implementing the methods allows the loader to
work with source and bytecode files; it does not allow for sourceless
loading where only bytecode is provided. Bytecode files are an
optimization to speed up loading by removing the parsing step of Python’s
compiler, and so no bytecode-specific API is exposed.
Optional abstract method which writes the specified bytes to a file
path. Any intermediate directories which do not exist are to be created
automatically.
When writing to the path fails because the path is read-only
(errno.EACCES), do not propagate the exception.
Concrete implementation of InspectLoader.is_package(). A module
is determined to be a package if its file path is a file named
__init__ when the file extension is removed.
An abstract base class inheriting from
ExecutionLoader and
ResourceLoader designed to ease the loading of
Python source modules (bytecode is not handled; see
SourceLoader for a source/bytecode ABC). A subclass
implementing this ABC will only need to worry about exposing how the source
code is stored; all other details for loading Python source code will be
handled by the concrete implementations of key methods.
Deprecated since version 3.2: This class has been deprecated in favor of SourceLoader and is
slated for removal in Python 3.4. See below for how to create a
subclass that is compatible with Python 3.1 onwards.
If compatibility with Python 3.1 is required, then use the following idiom
to implement a subclass that will work with Python 3.1 onwards (make sure
to implement ExecutionLoader.get_filename()):
import os

try:
    from importlib.abc import SourceLoader
except ImportError:
    from importlib.abc import PyLoader as SourceLoader

class CustomLoader(SourceLoader):
    def get_filename(self, fullname):
        """Return the path to the source file."""
        # Implement ...

    def source_path(self, fullname):
        """Implement source_path in terms of get_filename."""
        try:
            return self.get_filename(fullname)
        except ImportError:
            return None

    def is_package(self, fullname):
        """Implement is_package by looking for an __init__ file
        name as returned by get_filename."""
        filename = os.path.basename(self.get_filename(fullname))
        return os.path.splitext(filename)[0] == '__init__'
An abstract method that returns the path to the source code for a
module. Should return None if there is no source code.
Raises ImportError if the loader knows it cannot handle the
module.
A concrete implementation of importlib.abc.Loader.load_module()
that loads Python source code. All needed information comes from the
abstract methods required by this ABC. The only pertinent assumption
made by this method is that when loading a package
__path__ is set to [os.path.dirname(__file__)].
A concrete implementation of
importlib.abc.InspectLoader.get_code() that creates code objects
from Python source code, by requesting the source code (using
source_path() and get_data()) and compiling it with the
built-in compile() function.
An abstract base class inheriting from PyLoader.
This ABC is meant to help in creating loaders that support both Python
source and bytecode.
Deprecated since version 3.2: This class has been deprecated in favor of SourceLoader and to
properly support PEP 3147. If compatibility is required with
Python 3.1, implement both SourceLoader and PyLoader;
instructions on how to do so are included in the documentation for
PyLoader. Do note that this solution will not support
sourceless/bytecode-only loading; only source and bytecode loading.
An abstract method which returns the modification time for the source
code of the specified module. The modification time should be an
integer. If there is no source code, return None. If the
module cannot be found then ImportError is raised.
An abstract method which returns the path to the bytecode for the
specified module, if it exists. It returns None
if no bytecode exists (yet).
Raises ImportError if the loader knows it cannot handle the
module.
An abstract method which has the loader write bytecode for future
use. If the bytecode is written, return True. Return
False if the bytecode could not be written. This method
should not be called if sys.dont_write_bytecode is true.
The bytecode argument should be a bytes string or bytes array.
This class does not perfectly mirror the semantics of import in
terms of sys.path. No implicit path hooks are assumed for
simplification of the class and its semantics.
Only class methods are defined by this class to alleviate the need for
instantiation.
Class method that attempts to find a loader for the module
specified by fullname on sys.path or, if defined, on
path. For each path entry that is searched,
sys.path_importer_cache is checked. If a non-false object is
found then it is used as the finder to look for the module
being searched for. If no entry is found in
sys.path_importer_cache, then sys.path_hooks is
searched for a finder for the path entry and, if found, is stored in
sys.path_importer_cache along with being queried about the
module. If no finder is ever found then None is returned.
A decorator for a loader method,
to handle selecting the proper
module object to load with. The decorated method is expected to have a call
signature taking two positional arguments
(e.g. load_module(self, module)) for which the second argument
will be the module object to be used by the loader.
Note that the decorator
will not work on static methods because of the assumption of two
arguments.
The decorated method will take in the name of the module to be loaded
as expected for a loader. If the module is not found in
sys.modules then a new one is constructed with its
__name__ attribute set. Otherwise the module found in
sys.modules will be passed into the method. If an
exception is raised by the decorated method and a module was added to
sys.modules it will be removed to prevent a partially initialized
module from being left in sys.modules. If the module was already
in sys.modules then it is left alone.
Use of this decorator handles all the details of which module object a
loader should initialize as specified by PEP 302.
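As a rough sketch under stated assumptions (the DummyLoader class and its hard-coded module body are hypothetical, not part of the library), a method decorated this way only needs to populate the module object it is handed:

import importlib.abc
import importlib.util

class DummyLoader(importlib.abc.Loader):
    """Hypothetical loader whose modules all share one fixed body."""

    @importlib.util.module_for_loader
    def load_module(self, module):
        # 'module' is the object the decorator selected: either a fresh
        # module already placed in sys.modules, or an existing one on reload.
        exec("answer = 42", module.__dict__)
        return module

Calling DummyLoader().load_module('dummy') would then return a module whose answer attribute is 42, with all of the sys.modules bookkeeping handled by the decorator.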
A decorator for a loader method,
to set the __loader__
attribute on loaded modules. If the attribute is already set the decorator
does nothing. It is assumed that the first positional argument to the
wrapped method is what __loader__ should be set to.
A decorator for a loader to set the __package__
attribute on the module returned by the loader. If __package__ is
set and has a value other than None it will not be changed.
Note that the module returned by the loader is what has the attribute
set on and not the module found in sys.modules.
Reliance on this decorator is discouraged when it is possible to set
__package__ before the module's code is executed. Setting it
beforehand allows the attribute to be used at the global level of the
module during initialization.
Python provides a number of modules to assist in working with the Python
language. These modules support tokenizing, parsing, syntax analysis, bytecode
disassembly, and various other facilities.
The parser module provides an interface to Python’s internal parser and
byte-code compiler. The primary purpose for this interface is to allow Python
code to edit the parse tree of a Python expression and create executable code
from this. This is better than trying to parse and modify an arbitrary Python
code fragment as a string because parsing is performed in a manner identical to
the code forming the application. It is also faster.
Note
From Python 2.5 onward, it’s much more convenient to cut in at the Abstract
Syntax Tree (AST) generation and compilation stage, using the ast
module.
There are a few things to note about this module which are important to making
use of the data structures created. This is not a tutorial on editing the parse
trees for Python code, but some examples of using the parser module are
presented.
Most importantly, a good understanding of the Python grammar processed by the
internal parser is required. For full information on the language syntax, refer
to The Python Language Reference. The parser
itself is created from a grammar specification defined in the file
Grammar/Grammar in the standard Python distribution. The parse trees
stored in the ST objects created by this module are the actual output from the
internal parser when created by the expr() or suite() functions,
described below. The ST objects created by sequence2st() faithfully
simulate those structures. Be aware that the values of the sequences which are
considered “correct” will vary from one version of Python to another as the
formal grammar for the language is revised. However, transporting code from one
Python version to another as source text will always allow correct parse trees
to be created in the target version, with the only restriction being that
migrating to an older version of the interpreter will not support more recent
language constructs. The parse trees are not typically compatible from one
version to another, whereas source code has always been forward-compatible.
Each element of the sequences returned by st2list() or st2tuple()
has a simple form. Sequences representing non-terminal elements in the grammar
always have a length greater than one. The first element is an integer which
identifies a production in the grammar. These integers are given symbolic names
in the C header file Include/graminit.h and the Python module
symbol. Each additional element of the sequence represents a component
of the production as recognized in the input string: these are always sequences
which have the same form as the parent. An important aspect of this structure
which should be noted is that keywords used to identify the parent node type,
such as the keyword if in an if_stmt, are included in the
node tree without any special treatment. For example, the if keyword
is represented by the tuple (1, 'if'), where 1 is the numeric value
associated with all NAME tokens, including variable and function names
defined by the user. In an alternate form returned when line number information
is requested, the same token might be represented as (1, 'if', 12), where
the 12 represents the line number at which the terminal symbol was found.
Terminal elements are represented in much the same way, but without any child
elements and the addition of the source text which was identified. The example
of the if keyword above is representative. The various types of
terminal symbols are defined in the C header file Include/token.h and
the Python module token.
The ST objects are not required to support the functionality of this module,
but are provided for three purposes: to allow an application to amortize the
cost of processing complex parse trees, to provide a parse tree representation
which conserves memory space when compared to the Python list or tuple
representation, and to ease the creation of additional modules in C which
manipulate parse trees. A simple “wrapper” class may be created in Python to
hide the use of ST objects.
The parser module defines functions for a few distinct purposes. The
most important purposes are to create ST objects and to convert ST objects to
other representations such as parse trees and compiled code objects, but there
are also functions which serve to query the type of parse tree represented by an
ST object.
ST objects may be created from source code or from a parse tree. When creating
an ST object from source, different functions are used to create the 'eval'
and 'exec' forms.
The expr() function parses the parameter source as if it were an input
to compile(source, 'file.py', 'eval'). If the parse succeeds, an ST object
is created to hold the internal parse tree representation, otherwise an
appropriate exception is raised.
The suite() function parses the parameter source as if it were an input
to compile(source, 'file.py', 'exec'). If the parse succeeds, an ST object
is created to hold the internal parse tree representation, otherwise an
appropriate exception is raised.
This function accepts a parse tree represented as a sequence and builds an
internal representation if possible. If it can validate that the tree conforms
to the Python grammar and all nodes are valid node types in the host version of
Python, an ST object is created from the internal representation and returned
to the caller. If there is a problem creating the internal representation, or
if the tree cannot be validated, a ParserError exception is raised. An
ST object created this way should not be assumed to compile correctly; normal
exceptions raised by compilation may still be initiated when the ST object is
passed to compilest(). This may indicate problems not related to syntax
(such as a MemoryError exception), but may also be due to constructs such
as the result of parsing del f(0), which escapes the Python parser but is
checked by the bytecode compiler.
Sequences representing terminal tokens may be represented as either two-element
lists of the form (1, 'name') or as three-element lists of the form
(1, 'name', 56). If the third element is present, it is assumed to be a valid
line number. The line number may be specified for any subset of the terminal
symbols in the input tree.
ST objects, regardless of the input used to create them, may be converted to
parse trees represented as list- or tuple- trees, or may be compiled into
executable code objects. Parse trees may be extracted with or without line
numbering information.
This function accepts an ST object from the caller in st and returns a
Python list representing the equivalent parse tree. The resulting list
representation can be used for inspection or the creation of a new parse tree in
list form. This function does not fail so long as memory is available to build
the list representation. If the parse tree will only be used for inspection,
st2tuple() should be used instead to reduce memory consumption and
fragmentation. When the list representation is required, this function is
significantly faster than retrieving a tuple representation and converting that
to nested lists.
If line_info is true, line number information will be included for all
terminal tokens as a third element of the list representing the token. Note
that the line number provided specifies the line on which the token ends.
This information is omitted if the flag is false or omitted.
This function accepts an ST object from the caller in st and returns a
Python tuple representing the equivalent parse tree. Other than returning a
tuple instead of a list, this function is identical to st2list().
If line_info is true, line number information will be included for all
terminal tokens as a third element of the list representing the token. This
information is omitted if the flag is false or omitted.
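For instance, the root of the list returned for an expression maps to the
eval_input production (the nested numeric values themselves vary from one
Python version to another, so only the symbolic lookup is shown):

>>> import parser, symbol
>>> st = parser.expr('a + 5')
>>> tree = parser.st2list(st)
>>> symbol.sym_name[tree[0]]
'eval_input'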
The Python byte compiler can be invoked on an ST object to produce code objects
which can be used as part of a call to the built-in exec() or eval()
functions. This function provides the interface to the compiler, passing the
internal parse tree from st to the parser, using the source file name
specified by the filename parameter. The default value supplied for filename
indicates that the source was an ST object.
Compiling an ST object may result in exceptions related to compilation; an
example would be a SyntaxError caused by the parse tree for del f(0):
this statement is considered legal within the formal grammar for Python but is
not a legal language construct. The SyntaxError raised for this
condition is actually generated by the Python byte-compiler normally, which is
why it can be raised at this point by the parser module. Most causes of
compilation failure can be diagnosed programmatically by inspection of the parse
tree.
Two functions are provided which allow an application to determine if an ST was
created as an expression or a suite. Neither of these functions can be used to
determine if an ST was created from source code via expr() or
suite() or from a parse tree via sequence2st().
When st represents an 'eval' form, this function returns true, otherwise
it returns false. This is useful, since code objects normally cannot be queried
for this information using existing built-in functions. Note that the code
objects created by compilest() cannot be queried like this either, and
are identical to those created by the built-in compile() function.
This function mirrors isexpr() in that it reports whether an ST object
represents an 'exec' form, commonly known as a “suite.” It is not safe to
assume that this function is equivalent to not isexpr(st), as additional
syntactic fragments may be supported in the future.
The parser module defines a single exception, but may also pass other built-in
exceptions from other portions of the Python runtime environment. See each
function for information about the exceptions it can raise.
Exception raised when a failure occurs within the parser module. This is
generally produced for validation failures rather than the built-in
SyntaxError raised during normal parsing. The exception argument is
either a string describing the reason of the failure or a tuple containing a
sequence causing the failure from a parse tree passed to sequence2st()
and an explanatory string. Calls to sequence2st() need to be able to
handle either type of exception, while calls to other functions in the module
will only need to be aware of the simple string values.
Note that the functions compilest(), expr(), and suite() may
raise exceptions which are normally raised by the parsing and compilation
process. These include the built-in exceptions MemoryError,
OverflowError, SyntaxError, and SystemError. In these
cases, these exceptions carry all the meaning normally associated with them.
Refer to the descriptions of each function for detailed information.
While many useful operations may take place between parsing and bytecode
generation, the simplest operation is to do nothing. For this purpose, using
the parser module to produce an intermediate data structure is equivalent
to the code
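In the simplest case, that code is just a call to the built-in compile():

>>> code = compile('a + 5', 'file.py', 'eval')
>>> a = 5
>>> eval(code)
10

The equivalent operation using the parser module is slightly longer, but keeps
the intermediate parse tree available as an ST object:

>>> import parser
>>> st = parser.expr('a + 5')
>>> code = st.compile('file.py')
>>> a = 5
>>> eval(code)
10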
The ast module helps Python applications to process trees of the Python
abstract syntax grammar. The abstract syntax itself might change with each
Python release; this module helps to find out programmatically what the current
grammar looks like.
An abstract syntax tree can be generated by passing ast.PyCF_ONLY_AST as
a flag to the compile() built-in function, or using the parse()
helper provided in this module. The result will be a tree of objects whose
classes all inherit from ast.AST. An abstract syntax tree can be
compiled into a Python code object using the built-in compile() function.
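A minimal round trip looks like this:

>>> import ast
>>> tree = ast.parse('x = 1 + 2', mode='exec')
>>> code = compile(tree, '<ast>', 'exec')
>>> namespace = {}
>>> exec(code, namespace)
>>> namespace['x']
3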
This is the base of all AST node classes. The actual node classes are
derived from the Parser/Python.asdl file, which is reproduced
below. They are defined in the _ast C
module and re-exported in ast.
There is one class defined for each left-hand side symbol in the abstract
grammar (for example, ast.stmt or ast.expr). In addition,
there is one class defined for each constructor on the right-hand side; these
classes inherit from the classes for the left-hand side trees. For example,
ast.BinOp inherits from ast.expr. For production rules
with alternatives (aka “sums”), the left-hand side class is abstract: only
instances of specific constructor nodes are ever created.
Each concrete class has an attribute _fields which gives the names
of all child nodes.
Each instance of a concrete class has one attribute for each child node,
of the type as defined in the grammar. For example, ast.BinOp
instances have an attribute left of type ast.expr.
If these attributes are marked as optional in the grammar (using a
question mark), the value might be None. If the attributes can have
zero-or-more values (marked with an asterisk), the values are represented
as Python lists. All possible attributes must be present and have valid
values when compiling an AST with compile().
Instances of ast.expr and ast.stmt subclasses have
lineno and col_offset attributes. The lineno is
the line number of source text (1-indexed so the first line is line 1) and
the col_offset is the UTF-8 byte offset of the first token that
generated the node. The UTF-8 offset is recorded because the parser uses
UTF-8 internally.
The constructor of a class ast.T parses its arguments as follows:
If there are positional arguments, there must be as many as there are items
in T._fields; they will be assigned as attributes of these names.
If there are keyword arguments, they will set the attributes of the same
names to the given values.
For example, to create and populate an ast.UnaryOp node, you could
use
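node = ast.UnaryOp(ast.USub(), ast.Num(n=5, lineno=0, col_offset=0),
                   lineno=0, col_offset=0)

Every field of the UnaryOp node and its children is supplied here, including
the lineno and col_offset attributes that compile() requires.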
Safely evaluate an expression node or a string containing a Python
expression. The string or node provided may only consist of the following
Python literal structures: strings, bytes, numbers, tuples, lists, dicts,
sets, booleans, and None.
This can be used for safely evaluating strings containing Python expressions
from untrusted sources without the need to parse the values oneself.
Changed in version 3.2: Now allows bytes and set literals.
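For example:

>>> import ast
>>> ast.literal_eval("[1, 2, {'a': (3, 4)}]")
[1, 2, {'a': (3, 4)}]
>>> ast.literal_eval("{'x': b'raw'}")
{'x': b'raw'}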
Return the docstring of the given node (which must be a
FunctionDef, ClassDef or Module node), or None
if it has no docstring. If clean is true, clean up the docstring’s
indentation with inspect.cleandoc().
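For example:

>>> import ast
>>> tree = ast.parse('"""Frobnicate the knob."""\nx = 1')
>>> ast.get_docstring(tree)
'Frobnicate the knob.'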
When you compile a node tree with compile(), the compiler expects
lineno and col_offset attributes for every node that supports
them. This is rather tedious to fill in for generated nodes, so this helper
adds these attributes recursively where not already set, by setting them to
the values of the parent node. It works recursively starting at node.
Recursively yield all descendant nodes in the tree starting at node
(including node itself), in no specified order. This is useful if you only
want to modify nodes in place and don’t care about the context.
A node visitor base class that walks the abstract syntax tree and calls a
visitor function for every node found. This function may return a value
which is forwarded by the visit() method.
This class is meant to be subclassed, with the subclass adding visitor
methods.
Visit a node. The default implementation calls the method called
self.visit_classname where classname is the name of the node
class, or generic_visit() if that method doesn’t exist.
This visitor calls visit() on all children of the node.
Note that child nodes of nodes that have a custom visitor method won’t be
visited unless the visitor calls generic_visit() or visits them
itself.
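A small sketch of a subclass that prints every name referenced in a piece of
source code:

import ast

class NameLister(ast.NodeVisitor):
    def visit_Name(self, node):
        print(node.id)
        # Without this call, the children of any node handled by a custom
        # visitor method would be skipped.
        self.generic_visit(node)

NameLister().visit(ast.parse('a + b * c'))   # prints a, b and c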
Don’t use the NodeVisitor if you want to apply changes to nodes
during traversal. For this a special visitor exists
(NodeTransformer) that allows modifications.
A NodeVisitor subclass that walks the abstract syntax tree and
allows modification of nodes.
The NodeTransformer will walk the AST and use the return value of
the visitor methods to replace or remove the old node. If the return value
of the visitor method is None, the node will be removed from its
location, otherwise it is replaced with the return value. The return value
may be the original node in which case no replacement takes place.
Here is an example transformer that rewrites all occurrences of name lookups
(foo) to data['foo']:
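A sketch along the lines of the standard example (ast.copy_location keeps the
replacement node's source location attributes consistent with the original):

import ast

class RewriteName(ast.NodeTransformer):
    def visit_Name(self, node):
        return ast.copy_location(
            ast.Subscript(value=ast.Name(id='data', ctx=ast.Load()),
                          slice=ast.Index(value=ast.Str(s=node.id)),
                          ctx=node.ctx),
            node)

tree = RewriteName().visit(ast.parse('foo'))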
Keep in mind that if the node you’re operating on has child nodes you must
either transform the child nodes yourself or call the generic_visit()
method for the node first.
For nodes that were part of a collection of statements (that applies to all
statement nodes), the visitor may also return a list of nodes rather than
just a single node.
Return a formatted dump of the tree in node. This is mainly useful for
debugging purposes. The returned string will show the names and the values
for fields. This makes the code impossible to evaluate, so if evaluation is
wanted, annotate_fields must be set to False. Attributes such as line
numbers and column offsets are not dumped by default. If this is wanted,
include_attributes can be set to True.
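For example (the exact layout of the string differs between Python versions):

>>> import ast
>>> ast.dump(ast.parse('x = 1'))
"Module(body=[Assign(targets=[Name(id='x', ctx=Store())], value=Num(n=1))])"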
symtable — Access to the compiler’s symbol tables
Symbol tables are generated by the compiler from AST just before bytecode is
generated. The symbol table is responsible for calculating the scope of every
identifier in the code. symtable provides an interface to examine these
tables.
Return the toplevel SymbolTable for the Python source code.
filename is the name of the file containing the code. compile_type is
like the mode argument to compile().
Return the table’s name. This is the name of the class if the table is
for a class, the name of the function if the table is for a function, or
'top' if the table is global (get_type() returns 'module').
Note that a single name can be bound to multiple objects. If the result
is True, the name may also be bound to other objects, like an int or
list, that do not introduce a new namespace.
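For example:

>>> import symtable
>>> table = symtable.symtable('def spam(x): return x', '<string>', 'exec')
>>> table.get_type(), table.get_name()
('module', 'top')
>>> [t.get_name() for t in table.get_children()]
['spam']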
This module provides constants which represent the numeric values of internal
nodes of the parse tree. Unlike most Python constants, these use lower-case
names. Refer to the file Grammar/Grammar in the Python distribution for
the definitions of the names in the context of the language grammar. The
specific numeric values which the names map to may change between Python
versions.
This module also provides one additional data object:
Dictionary mapping the numeric values of the constants defined in this module
back to name strings, allowing more human-readable representation of parse trees
to be generated.
This module provides constants which represent the numeric values of leaf nodes
of the parse tree (terminal tokens). Refer to the file Grammar/Grammar
in the Python distribution for the definitions of the names in the context of
the language grammar. The specific numeric values which the names map to may
change between Python versions.
The module also provides a mapping from numeric codes to names and some
functions. The functions mirror definitions in the Python C header files.
Dictionary mapping the numeric values of the constants defined in this module
back to name strings, allowing more human-readable representation of parse trees
to be generated.
Sequence containing all the keywords defined for the interpreter. If any
keywords are defined to only be active when particular __future__
statements are in effect, these will be included as well.
The tokenize module provides a lexical scanner for Python source code,
implemented in Python. The scanner in this module returns comments as tokens
as well, making it useful for implementing “pretty-printers,” including
colorizers for on-screen displays.
The tokenize() generator requires one argument, readline, which
must be a callable object which provides the same interface as the
io.IOBase.readline() method of file objects. Each call to the
function should return one line of input as bytes.
The generator produces 5-tuples with these members: the token type; the
token string; a 2-tuple (srow,scol) of ints specifying the row and
column where the token begins in the source; a 2-tuple (erow,ecol) of
ints specifying the row and column where the token ends in the source; and
the line on which the token was found. The line passed (the last tuple item)
is the logical line; continuation lines are included. The 5-tuple is
returned as a named tuple with the field names
type, string, start, end, line.
Changed in version 3.1: Added support for named tuples.
tokenize() determines the source encoding of the file by looking for a
UTF-8 BOM or encoding cookie, according to PEP 263.
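A minimal invocation, feeding the generator from an in-memory buffer:

from io import BytesIO
from tokenize import tokenize

# Yields ENCODING, NUMBER, OP, NUMBER, NEWLINE and ENDMARKER tokens.
for tok in tokenize(BytesIO(b'1 + 2\n').readline):
    print(tok.type, repr(tok.string))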
All constants from the token module are also exported from
tokenize, as are three additional token type values:
Token value used to indicate a non-terminating newline. The NEWLINE token
indicates the end of a logical line of Python code; NL tokens are generated
when a logical line of code is continued over multiple physical lines.
Token value that indicates the encoding used to decode the source bytes
into text. The first token returned by tokenize() will always be an
ENCODING token.
Another function is provided to reverse the tokenization process. This is
useful for creating tools that tokenize a script, modify the token stream, and
write back the modified script.
Converts tokens back into Python source code. The iterable must return
sequences with at least two elements, the token type and the token string.
Any additional sequence elements are ignored.
The reconstructed script is returned as a single string. The result is
guaranteed to tokenize back to match the input so that the conversion is
lossless and round-trips are assured. The guarantee applies only to the
token type and token string as the spacing between tokens (column
positions) may change.
It returns bytes, encoded using the ENCODING token, which is the first
token sequence output by tokenize().
tokenize() needs to detect the encoding of source files it tokenizes. The
function it uses to do this is available:
The detect_encoding() function is used to detect the encoding that
should be used to decode a Python source file. It requires one argument,
readline, in the same way as the tokenize() generator.
It will call readline a maximum of twice, and return the encoding used
(as a string) and a list of any lines (not decoded from bytes) it has read
in.
It detects the encoding from the presence of a UTF-8 BOM or an encoding
cookie as specified in PEP 263. If both a BOM and a cookie are present,
but disagree, a SyntaxError will be raised. Note that if the BOM is found,
'utf-8-sig' will be returned as an encoding.
If no encoding is specified, then the default of 'utf-8' will be
returned.
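For example:

>>> from io import BytesIO
>>> from tokenize import detect_encoding
>>> detect_encoding(BytesIO(b'# -*- coding: utf-8 -*-\nx = 1\n').readline)
('utf-8', [b'# -*- coding: utf-8 -*-\n'])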
Use open() to open Python source files: it uses
detect_encoding() to detect the file encoding.
Open a file in read only mode using the encoding detected by
detect_encoding().
New in version 3.2.
Example of a script rewriter that transforms float literals into Decimal
objects:
from tokenize import tokenize, untokenize, NUMBER, STRING, NAME, OP
from io import BytesIO

def decistmt(s):
    """Substitute Decimals for floats in a string of statements.

    >>> from decimal import Decimal
    >>> s = 'print(+21.3e-5*-.1234/81.7)'
    >>> decistmt(s)
    "print (+Decimal ('21.3e-5')*-Decimal ('.1234')/Decimal ('81.7'))"

    The format of the exponent is inherited from the platform C library.
    Known cases are "e-007" (Windows) and "e-07" (not Windows).  Since
    we're only showing 12 digits, and the 13th isn't close to 5, the
    rest of the output should be platform-independent.

    >>> exec(s) #doctest: +ELLIPSIS
    -3.21716034272e-0...7

    Output from calculations with Decimal should be identical across all
    platforms.

    >>> exec(decistmt(s))
    -3.217160342717258261933904529E-7
    """
    result = []
    g = tokenize(BytesIO(s.encode('utf-8')).readline)  # tokenize the string
    for toknum, tokval, _, _, _ in g:
        if toknum == NUMBER and '.' in tokval:  # replace NUMBER tokens
            result.extend([
                (NAME, 'Decimal'),
                (OP, '('),
                (STRING, repr(tokval)),
                (OP, ')')
            ])
        else:
            result.append((toknum, tokval))
    return untokenize(result).decode('utf-8')
For the time being this module is intended to be called as a script. However it
is possible to import it into an IDE and use the function check()
described below.
Note
The API provided by this module is likely to change in future releases; such
changes may not be backward compatible.
If file_or_dir is a directory and not a symbolic link, then recursively
descend the directory tree named by file_or_dir, checking all .py
files along the way. If file_or_dir is an ordinary Python source file, it
is checked for whitespace related problems. The diagnostic messages are
written to standard output using the print() function.
Flag indicating whether to print only the filenames of files containing
whitespace related problems. This is set to true by the -q option if called
as a script.
The pyclbr module can be used to determine some limited information
about the classes, methods and top-level functions defined in a module. The
information provided is sufficient to implement a traditional three-pane
class browser. The information is extracted from the source code rather
than by importing the module, so this module is safe to use with untrusted
code. This restriction makes it impossible to use this module with modules
not implemented in Python, including all standard and optional extension
modules.
Read a module and return a dictionary mapping class names to class
descriptor objects. The parameter module should be the name of a
module as a string; it may be the name of a module within a package. The
path parameter should be a sequence, and is used to augment the value
of sys.path, which is used to locate module source code.
Like readmodule(), but the returned dictionary, in addition to
mapping class names to class descriptor objects, also maps top-level
function names to function descriptor objects. Moreover, if the module
being read is a package, the key '__path__' in the returned
dictionary has as its value a list which contains the package search
path.
A list of Class objects which describe the immediate base
classes of the class being described. Classes which are named as
superclasses but which are not discoverable by readmodule() are
listed as a string with the class name instead of as Class
objects.
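A short sketch of browsing a standard module (the classes reported depend on
that version's source for the module):

import pyclbr

descriptors = pyclbr.readmodule('queue')     # maps class name -> Class object
for name, cls in sorted(descriptors.items()):
    print(name, cls.lineno)                  # e.g. Queue and its subclasses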
The py_compile module provides a function to generate a byte-code file
from a source file, and another function used when the module source file is
invoked as a script.
Though not often needed, this function can be useful when installing modules for
shared use, especially if some of the users may not have permission to write the
byte-code cache files in the directory containing the source code.
Compile a source file to byte-code and write out the byte-code cache file.
The source code is loaded from the file name file. The byte-code is
written to cfile, which defaults to the PEP 3147 path, ending in
.pyc (.pyo if optimization is enabled in the current interpreter).
For example, if file is /foo/bar/baz.py, cfile will default to
/foo/bar/__pycache__/baz.cpython-32.pyc for Python 3.2. If dfile is
specified, it is used as the name of the source file in error messages
instead of file. If doraise is true, a PyCompileError is raised
when an error is encountered while compiling file. If doraise is false
(the default), an error string is written to sys.stderr, but no exception
is raised. This function returns the path to the byte-compiled file, i.e.
whatever cfile value was used.
optimize controls the optimization level and is passed to the built-in
compile() function. The default of -1 selects the optimization
level of the current interpreter.
Changed in version 3.2: Changed default value of cfile to be PEP 3147-compliant. Previous
default was file + 'c' ('o' if optimization was enabled).
Also added the optimize parameter.
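A minimal sketch, assuming a source file example.py exists in the current
directory:

import py_compile

# Returns the cfile that was written (the PEP 3147 location by default).
cache_path = py_compile.compile('example.py', doraise=True)
print(cache_path)    # e.g. __pycache__/example.cpython-32.pyc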
Compile several source files. The files named in args (or on the command
line, if args is None) are compiled and the resulting bytecode is
cached in the normal manner. This function does not search a directory
structure to locate source files; it only compiles files named explicitly.
If '-' is the only parameter in args, the list of files is taken from
standard input.
Changed in version 3.2: Added support for '-'.
When this module is run as a script, the main() function is used to compile all the
files named on the command line. The exit status is nonzero if one of the files
could not be compiled.
This module provides some utility functions to support installing Python
libraries. These functions compile Python source files in a directory tree.
This module can be used to create the cached byte-code files at library
installation time, which makes them available for use even by users who don’t
have write permission to the library directories.
This module can work as a script (using python -m compileall) to
compile Python sources.
[directory|file]...
Positional arguments are files to compile or directories that contain
source files, traversed recursively. If no argument is given, behave as if
the command line was -l <directories from sys.path>.
Directory prepended to the path to each file being compiled. This will
appear in compilation time tracebacks, and is also compiled in to the
byte-code file, where it will be used in tracebacks and other messages in
cases where the source file does not exist at the time the byte-code file is
executed.
Write the byte-code files to their legacy locations and names, which may
overwrite byte-code files created by another version of Python. The default
is to write files to their PEP 3147 locations and names, which allows
byte-code files from multiple versions of Python to coexist.
Changed in version 3.2: Added the -i, -b and -h options.
There is no command-line option to control the optimization level used by the
compile() function, because the Python interpreter itself already
provides the option: python -O -m compileall.
Recursively descend the directory tree named by dir, compiling all .py
files along the way.
The maxlevels parameter is used to limit the depth of the recursion; it
defaults to 10.
If ddir is given, it is prepended to the path to each file being compiled
for use in compilation time tracebacks, and is also compiled in to the
byte-code file, where it will be used in tracebacks and other messages in
cases where the source file does not exist at the time the byte-code file is
executed.
If force is true, modules are re-compiled even if the timestamps are up to
date.
If rx is given, its search method is called on the complete path to each
file considered for compilation, and if it returns a true value, the file
is skipped.
If quiet is true, nothing is printed to the standard output unless errors
occur.
If legacy is true, byte-code files are written to their legacy locations
and names, which may overwrite byte-code files created by another version of
Python. The default is to write files to their PEP 3147 locations and
names, which allows byte-code files from multiple versions of Python to
coexist.
optimize specifies the optimization level for the compiler. It is passed to
the built-in compile() function.
Changed in version 3.2: Added the legacy and optimize parameters.
If ddir is given, it is prepended to the path to the file being compiled
for use in compilation time tracebacks, and is also compiled in to the
byte-code file, where it will be used in tracebacks and other messages in
cases where the source file does not exist at the time the byte-code file is
executed.
If rx is given, its search method is passed the full path name to the
file being compiled, and if it returns a true value, the file is not
compiled and True is returned.
If quiet is true, nothing is printed to the standard output unless errors
occur.
If legacy is true, byte-code files are written to their legacy locations
and names, which may overwrite byte-code files created by another version of
Python. The default is to write files to their PEP 3147 locations and
names, which allows byte-code files from multiple versions of Python to
coexist.
optimize specifies the optimization level for the compiler. It is passed to
the built-in compile() function.
Byte-compile all the .py files found along sys.path. If
skip_curdir is true (the default), the current directory is not included
in the search. All other parameters are passed to the compile_dir()
function. Note that unlike the other compile functions, maxlevels
defaults to 0.
Changed in version 3.2: Added the legacy and optimize parameters.
To force a recompile of all the .py files in the Lib/
subdirectory and all its subdirectories:
import compileall
compileall.compile_dir('Lib/', force=True)
# Perform same compilation, excluding files in .svn directories.
import re
compileall.compile_dir('Lib/', rx=re.compile('/[.]svn'), force=True)
The dis module supports the analysis of CPython bytecode by
disassembling it. The CPython bytecode which this module takes as an
input is defined in the file Include/opcode.h and used by the compiler
and the interpreter.
CPython implementation detail: Bytecode is an implementation detail of the CPython interpreter. No
guarantees are made that bytecode will not be added, removed, or changed
between versions of Python. Use of this module should not be considered to
work across Python VMs or Python releases.
Example: Given the function myfunc():
def myfunc(alist):
    return len(alist)
the following command can be used to get the disassembly of myfunc():
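Under CPython 3.2 the output resembles the following (the exact offsets and
opcode choices are version-specific):

>>> import dis
>>> dis.dis(myfunc)
  2           0 LOAD_GLOBAL              0 (len)
              3 LOAD_FAST                0 (alist)
              6 CALL_FUNCTION            1
              9 RETURN_VALUE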
Return a formatted multi-line string with detailed code object information
for the supplied function, method, source code string or code object.
Note that the exact contents of code info strings are highly implementation
dependent and they may change arbitrarily across Python VMs or Python
releases.
Disassemble the x object. x can denote either a module, a class, a
method, a function, a code object, a string of source code or a byte sequence
of raw bytecode. For a module, it disassembles all functions. For a class,
it disassembles all methods. For a code object or sequence of raw bytecode,
it prints one line per bytecode instruction. Strings are first compiled to
code objects with the compile() built-in function before being
disassembled. If no object is provided, this function disassembles the last
traceback.
This generator function uses the co_firstlineno and co_lnotab
attributes of the code object code to find the offsets which are starts of
lines in the source code. They are generated as (offset, lineno) pairs.
Binary operations remove the top of the stack (TOS) and the second top-most
stack item (TOS1) from the stack. They perform the operation, and put the
result back on the stack.
In-place operations are like binary operations, in that they remove TOS and
TOS1, and push the result back on the stack, but the operation is done in-place
when TOS1 supports it, and the resulting TOS may be (but does not have to be)
the original TOS1.
Implements the expression statement for the interactive mode. TOS is removed
from the stack and printed. In non-interactive mode, an expression statement is
terminated with POP_TOP.
Calls dict.setitem(TOS1[-i], TOS, TOS1). Used to implement dict
comprehensions.
For all of the SET_ADD, LIST_APPEND and MAP_ADD instructions, while the
added value or key/value pair is popped off, the container object remains on
the stack so that it is available for further iterations of the loop.
Loads all symbols not starting with '_' directly from the module TOS to the
local namespace. The module is popped after loading all names. This opcode
implements from module import *.
Removes one block from the block stack. The popped block must be an exception
handler block, as implicitly created when entering an except handler.
In addition to popping extraneous values from the frame stack, the
last three popped values are used to restore the exception state.
Terminates a finally clause. The interpreter recalls whether the
exception has to be re-raised, or whether the function returns, and continues
with the outer-next block.
This opcode performs several operations before a with block starts. First,
it loads __exit__() from the context manager and pushes it onto
the stack for later use by WITH_CLEANUP. Then,
__enter__() is called, and a finally block pointing to delta
is pushed. Finally, the result of calling the enter method is pushed onto
the stack. The next opcode will either ignore it (POP_TOP), or
store it in (a) variable(s) (STORE_FAST, STORE_NAME, or
UNPACK_SEQUENCE).
Cleans up the stack when a with statement block exits. TOS is
the context manager’s __exit__() bound method. Below TOS are 1–3
values indicating how/why the finally clause was entered:
SECOND = None
(SECOND, THIRD) = (WHY_{RETURN,CONTINUE}), retval
SECOND = WHY_*; no retval below it
(SECOND, THIRD, FOURTH) = exc_info()
In the last case, TOS(SECOND,THIRD,FOURTH) is called, otherwise
TOS(None,None,None). In addition, TOS is removed from the stack.
If the stack represents an exception, and the function call returns
a ‘true’ value, this information is “zapped” and replaced with a single
WHY_SILENCED to prevent END_FINALLY from re-raising the exception.
(But non-local gotos will still be resumed.)
Implements name = TOS. namei is the index of name in the attribute
co_names of the code object. The compiler tries to use STORE_FAST
or STORE_GLOBAL if possible.
Implements assignment with a starred target: Unpacks an iterable in TOS into
individual values, where the total number of values can be smaller than the
number of items in the iterable: one of the new values will be a list of all
leftover items.
The low byte of counts is the number of values before the list value, the
high byte of counts the number of values after it. The resulting values
are put onto the stack right-to-left.
Imports the module co_names[namei]. TOS and TOS1 are popped and provide
the fromlist and level arguments of __import__(). The module
object is pushed onto the stack. The current namespace is not affected:
for a proper import statement, a subsequent STORE_FAST instruction
modifies the namespace.
Loads the attribute co_names[namei] from the module found in TOS. The
resulting object is pushed onto the stack, to be subsequently stored by a
STORE_FAST instruction.
TOS is an iterator. Call its __next__() method. If this
yields a new value, push it on the stack (leaving the iterator below it). If
the iterator indicates it is exhausted TOS is popped, and the byte code
counter is incremented by delta.
Pushes a reference to the cell contained in slot i of the cell and free
variable storage. The name of the variable is co_cellvars[i] if i is
less than the length of co_cellvars. Otherwise it is co_freevars[i-len(co_cellvars)].
Raises an exception. argc indicates the number of parameters to the raise
statement, ranging from 0 to 3. The handler will find the traceback as TOS2,
the parameter as TOS1, and the exception as TOS.
Calls a function. The low byte of argc indicates the number of positional
parameters, the high byte the number of keyword parameters. On the stack, the
opcode finds the keyword parameters first. For each keyword argument, the value
is on top of the key. Below the keyword parameters, the positional parameters
are on the stack, with the right-most parameter on top. Below the parameters,
the function object to call is on the stack. Pops all function arguments, and
the function itself off the stack, and pushes the return value.
Pushes a new function object on the stack. TOS is the code associated with the
function. The function object is defined to have argc default parameters,
which are found below TOS.
Creates a new function object, sets its __closure__ slot, and pushes it on
the stack. TOS is the code associated with the function, TOS1 the tuple
containing cells for the closure’s free variables. The function also has
argc default parameters, which are found below the cells.
Pushes a slice object on the stack. argc must be 2 or 3. If it is 2,
slice(TOS1,TOS) is pushed; if it is 3, slice(TOS2,TOS1,TOS) is
pushed. See the slice() built-in function for more information.
Prefixes any opcode which has an argument too big to fit into the default two
bytes. ext holds two additional bytes which, taken together with the
subsequent opcode’s argument, comprise a four-byte argument, ext being the two
most-significant bytes.
Calls a function. argc is interpreted as in CALL_FUNCTION. The top element
on the stack contains the variable argument list, followed by keyword and
positional arguments.
Calls a function. argc is interpreted as in CALL_FUNCTION. The top element
on the stack contains the keyword arguments dictionary, followed by explicit
keyword and positional arguments.
Calls a function. argc is interpreted as in CALL_FUNCTION. The top
element on the stack contains the keyword arguments dictionary, followed by the
variable-arguments tuple, followed by explicit keyword and positional arguments.
This is not really an opcode. It identifies the dividing line between opcodes
which don’t take arguments (< HAVE_ARGUMENT) and those which do (>= HAVE_ARGUMENT).
This module contains various constants relating to the intimate details of the
pickle module, some lengthy comments about the implementation, and a
few useful functions for analyzing pickled data. The contents of this module
are useful for Python core developers who are working on the pickle;
ordinary users of the pickle module probably won’t find the
pickletools module relevant.
When invoked from the command line, python -m pickletools will
disassemble the contents of one or more pickle files. Note that if
you want to see the Python object stored in the pickle rather than the
details of the pickle format, you may want to use -m pickle instead.
However, when the pickle file that you want to examine comes from an
untrusted source, -m pickletools is a safer option because it does
not execute pickle bytecode.
For example, with a tuple (1, 2) pickled in file x.pickle:
$ python -m pickle x.pickle
(1, 2)
$ python -m pickletools x.pickle
0: \x80 PROTO 3
2: K BININT1 1
4: K BININT1 2
6: \x86 TUPLE2
7: q BINPUT 0
9: . STOP
highest protocol among opcodes = 2
Outputs a symbolic disassembly of the pickle to the file-like
object out, defaulting to sys.stdout. pickle can be a
string or a file-like object. memo can be a Python dictionary
that will be used as the pickle’s memo; it can be used to perform
disassemblies across multiple pickles created by the same
pickler. Successive levels, indicated by MARK opcodes in the
stream, are indented by indentlevel spaces. If a nonzero value
is given to annotate, each opcode in the output is annotated with
a short description. The value of annotate is used as a hint for
the column where annotation should start.
Provides an iterator over all of the opcodes in a pickle, returning a
sequence of (opcode,arg,pos) triples. opcode is an instance of an
OpcodeInfo class; arg is the decoded value, as a Python object, of
the opcode’s argument; pos is the position at which this opcode is located.
pickle can be a string or a file-like object.
Returns a new equivalent pickle string after eliminating unused PUT
opcodes. The optimized pickle is shorter, takes less transmission time,
requires less storage space, and unpickles more efficiently.
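For example, a pickle that memoizes values it never looks up again shrinks
when optimized:

>>> import pickle, pickletools
>>> p = pickle.dumps([1, 2, 3])
>>> len(pickletools.optimize(p)) <= len(p)
True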
This module supports two interface definitions, each with multiple
implementations: The formatter interface, and the writer interface which is
required by the formatter interface.
Formatter objects transform an abstract flow of formatting events into specific
output events on writer objects. Formatters manage several stack structures to
allow various properties of a writer object to be changed and restored; writers
need not be able to handle relative changes nor any sort of “change back”
operation. Specific writer properties which may be controlled via formatter
objects are horizontal alignment, font, and left margin indentations. A
mechanism is provided which supports providing arbitrary, non-exclusive style
settings to a writer as well. Additional interfaces facilitate formatting
events which are not reversible, such as paragraph separation.
Writer objects encapsulate device interfaces. Abstract devices, such as file
formats, are supported as well as physical devices. The provided
implementations all work with abstract devices. The interface makes available
mechanisms for setting the properties which formatter objects manage and
inserting data into the output.
Interfaces to create formatters are dependent on the specific formatter class
being instantiated. The interfaces described below are the required interfaces
which all formatters must support once initialized.
Value which can be used in the font specification passed to the push_font()
method described below, or as the new value to any other push_property()
method. Pushing the AS_IS value allows the corresponding pop_property()
method to be called without having to track whether the property was changed.
The following attributes are defined for formatter instance objects:
Insert a horizontal rule in the output. A hard break is inserted if there is
data in the current paragraph, but the logical paragraph is not broken. The
arguments and keywords are passed on to the writer’s send_line_break()
method.
Provide data which should be formatted with collapsed whitespace. Whitespace
from preceding and successive calls to add_flowing_data() is considered as
well when the whitespace collapse is performed. The data which is passed to
this method is expected to be word-wrapped by the output device. Note that any
word-wrapping still must be performed by the writer object due to the need to
rely on device and font information.
Provide data which should be passed to the writer unchanged. Whitespace,
including newline and tab characters, are considered legal in the value of
data.
Insert a label which should be placed to the left of the current left margin.
This should be used for constructing bulleted or numbered lists. If the
format value is a string, it is interpreted as a format specification for
counter, which should be an integer. The result of this formatting becomes the
value of the label; if format is not a string it is used as the label value
directly. The label value is passed as the only argument to the writer’s
send_label_data() method. Interpretation of non-string label values is
dependent on the associated writer.
Format specifications are strings which, in combination with a counter value,
are used to compute label values. Each character in the format string is copied
to the label value, with some characters recognized to indicate a transform on
the counter value. Specifically, the character '1' represents the counter
value formatter as an Arabic number, the characters 'A' and 'a'
represent alphabetic representations of the counter value in upper and lower
case, respectively, and 'I' and 'i' represent the counter value in Roman
numerals, in upper and lower case. Note that the alphabetic and roman
transforms require that the counter value be greater than zero.
Send any pending whitespace buffered from a previous call to
add_flowing_data() to the associated writer object. This should be called
before any direct manipulation of the writer object.
Push a new alignment setting onto the alignment stack. This may be
AS_IS if no change is desired. If the alignment value is changed from
the previous setting, the writer’s new_alignment() method is called with
the align value.
Change some or all font properties of the writer object. Properties which are
not set to AS_IS are set to the values passed in while others are
maintained at their current settings. The writer’s new_font() method is
called with the fully resolved font specification.
Increase the number of left margin indentations by one, associating the logical
tag margin with the new indentation. The initial margin level is 0.
Changed values of the logical tag must be true values; false values other than
AS_IS are not sufficient to change the margin.
Push any number of arbitrary style specifications. All styles are pushed onto
the styles stack in order. A tuple representing the entire stack, including
AS_IS values, is passed to the writer’s new_styles() method.
Pop the last n style specifications passed to push_style(). A tuple
representing the revised stack, including AS_IS values, is passed to
the writer’s new_styles() method.
Inform the formatter that data has been added to the current paragraph
out-of-band. This should be used when the writer has been manipulated
directly. The optional flag argument can be set to false if the writer
manipulations produced a hard line break at the end of the output.
Two implementations of formatter objects are provided by this module. Most
applications may use one of these classes without modification or subclassing.
A formatter which does nothing. If writer is omitted, a NullWriter
instance is created. No methods of the writer are called by
NullFormatter instances. Implementations should inherit from this
class if implementing a writer interface but don’t need to inherit any
implementation.
The standard formatter. This implementation has demonstrated wide applicability
to many writers, and may be used directly in most circumstances. It has been
used to implement a full-featured World Wide Web browser.
Interfaces to create writers are dependent on the specific writer class being
instantiated. The interfaces described below are the required interfaces which
all writers must support once initialized. Note that while most applications can
use the AbstractFormatter class as a formatter, the writer must
typically be provided by the application.
Set the alignment style. The align value can be any object, but by convention
is a string or None, where None indicates that the writer’s “preferred”
alignment should be used. Conventional align values are 'left',
'center', 'right', and 'justify'.
Set the font style. The value of font will be None, indicating that the
device’s default font should be used, or a tuple of the form (size,
italic, bold, teletype). size will be a string indicating the size of
font that should be used; specific strings and their interpretation must be
defined by the application. The italic, bold, and teletype values are
Boolean values specifying which of those font attributes should be used.
Set the margin level to the integer level and the logical tag to margin.
Interpretation of the logical tag is at the writer’s discretion; the only
restriction on the value of the logical tag is that it not be a false value for
non-zero values of level.
Set additional styles. The styles value is a tuple of arbitrary values; the
value AS_IS should be ignored. The styles tuple may be interpreted
either as a set or as a stack depending on the requirements of the application
and writer implementation.
Produce a paragraph separation of at least blankline blank lines, or the
equivalent. The blankline value will be an integer. Note that the
implementation will receive a call to send_line_break() before this call
if a line break is needed; this method should not include ending the last line
of the paragraph. It is only responsible for vertical spacing between
paragraphs.
Display a horizontal rule on the output device. The arguments to this method
are entirely application- and writer-specific, and should be interpreted with
care. The method implementation may assume that a line break has already been
issued via send_line_break().
Output character data which may be word-wrapped and re-flowed as needed. Within
any sequence of calls to this method, the writer may assume that spans of
multiple whitespace characters have been collapsed to single space characters.
Output character data which has already been formatted for display. Generally,
this should be interpreted to mean that line breaks indicated by newline
characters should be preserved and no new line breaks should be introduced. The
data may contain embedded newline and tab characters, unlike data provided to
the send_flowing_data() interface.
Set data to the left of the current left margin, if possible. The value of
data is not restricted; treatment of non-string values is entirely
application- and writer-dependent. This method will only be called at the
beginning of a line.
Three implementations of the writer object interface are provided as examples by
this module. Most applications will need to derive new writer classes from the
NullWriter class.
A writer which only provides the interface definition; no actions are taken on
any methods. This should be the base class for all writers which do not need to
inherit any implementation methods.
A writer which can be used in debugging formatters, but not much else. Each
method simply announces itself by printing its name and arguments on standard
output.
Simple writer class which writes output on the file object passed
in as file or, if file is omitted, on standard output. The output is
simply word-wrapped to the number of columns specified by maxcol. This
class is suitable for reflowing a sequence of paragraphs.
This chapter describes modules that are only available on MS Windows platforms.
msilib — Read and write Microsoft Installer files
The msilib module supports the creation of Microsoft Installer (.msi) files.
Because these files often contain an embedded “cabinet” file (.cab), it also
exposes an API to create CAB files. Support for reading .cab files is
currently not implemented; read support for the .msi database is possible.
This package aims to provide complete access to all tables in an .msi file;
it is therefore a fairly low-level API. Two primary applications of this
package are the distutils command bdist_msi, and the creation of the
Python installer package itself (although that currently uses a different
version of msilib).
The package contents can be roughly split into four parts: low-level CAB
routines, low-level MSI routines, higher-level MSI routines, and standard table
structures.
Create a new CAB file named cabname. files must be a list of tuples, each
containing the name of the file on disk, and the name of the file inside the CAB
file.
The files are added to the CAB file in the order they appear in the list. All
files are added into a single CAB file, using the MSZIP compression algorithm.
Callbacks to Python for the various steps of MSI creation are currently not
exposed.
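For instance, a minimal sketch of packing two files into a CAB archive (the
file names here are hypothetical):

import msilib

# Each tuple is (name of the file on disk, name stored inside the CAB file).
msilib.FCICreate('python.cab', [
    ('build/app.exe', 'app.exe'),
    ('build/README.txt', 'README.txt'),
])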
Return a new database object by calling MsiOpenDatabase. path is the file
name of the MSI file; persist can be one of the constants
MSIDBOPEN_CREATEDIRECT, MSIDBOPEN_CREATE, MSIDBOPEN_DIRECT,
MSIDBOPEN_READONLY, or MSIDBOPEN_TRANSACT, and may include the flag
MSIDBOPEN_PATCHFILE. See the Microsoft documentation for the meaning of
these flags; depending on the flags, an existing database is opened, or a new
one created.
Add all records to the table named table in database.
The table argument must be one of the predefined tables in the MSI schema,
e.g. 'Feature', 'File', 'Component', 'Dialog', 'Control',
etc.
records should be a list of tuples, each one containing all fields of a
record according to the schema of the table. For optional fields,
None can be passed.
Field values can be ints, strings, or instances of the Binary class.
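As a brief illustration, a sketch of creating a new database and adding a
record to it (the file name and property value are hypothetical):

import msilib

db = msilib.OpenDatabase('example.msi', msilib.MSIDBOPEN_CREATE)
# One record for the Property table: (Property, Value).
msilib.add_data(db, 'Property', [('ProductName', 'Example Product')])
db.Commit()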
Add all table content from module to database. module must contain an
attribute tables listing all tables for which content should be added, and one
attribute per table that has the actual content.
This is typically used to install the sequence tables.
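For example, the standard sequence tables shipped with the package can be
installed into the database created above:

from msilib import sequence
import msilib

msilib.add_tables(db, sequence)   # db as created in the previous example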
Execute the SQL query of the view, through MSIViewExecute(). If
params is not None, it is a record describing actual values of the
parameter tokens in the query.
Modify the view, by calling MsiViewModify(). kind can be one of
MSIMODIFY_SEEK, MSIMODIFY_REFRESH, MSIMODIFY_INSERT,
MSIMODIFY_UPDATE, MSIMODIFY_ASSIGN, MSIMODIFY_REPLACE,
MSIMODIFY_MERGE, MSIMODIFY_DELETE, MSIMODIFY_INSERT_TEMPORARY,
MSIMODIFY_VALIDATE, MSIMODIFY_VALIDATE_NEW,
MSIMODIFY_VALIDATE_FIELD, or MSIMODIFY_VALIDATE_DELETE.
Return a property of the summary, through MsiSummaryInfoGetProperty().
field is the name of the property, and can be one of the constants
PID_CODEPAGE, PID_TITLE, PID_SUBJECT, PID_AUTHOR,
PID_KEYWORDS, PID_COMMENTS, PID_TEMPLATE, PID_LASTAUTHOR,
PID_REVNUMBER, PID_LASTPRINTED, PID_CREATE_DTM,
PID_LASTSAVE_DTM, PID_PAGECOUNT, PID_WORDCOUNT, PID_CHARCOUNT,
PID_APPNAME, or PID_SECURITY.
Set a property through MsiSummaryInfoSetProperty(). field can have the
same values as in GetProperty(), value is the new value of the property.
Possible value types are integer and string.
The class CAB represents a CAB file. During MSI construction, files
will be added simultaneously to the Files table, and to a CAB file. Then,
when all files have been added, the CAB file can be written, then added to the
MSI file.
class msilib.Directory(database, cab, basedir, physical, logical, default[, componentflags])
Create a new directory in the Directory table. There is a current component at
each point in time for the directory, which is either explicitly created through
start_component(), or implicitly when files are added for the first time.
Files are added into the current component, and into the cab file. To create a
directory, a base directory object needs to be specified (can be None), the
path to the physical directory, and a logical directory name. default
specifies the DefaultDir slot in the directory table. componentflags specifies
the default flags that new components get.
Add an entry to the Component table, and make this component the current
component for this directory. If no component name is given, the directory
name is used. If no feature is given, the current feature is used. If no
flags are given, the directory’s default flags are used. If no keyfile
is given, the KeyPath is left null in the Component table.
Add a file to the current component of the directory, starting a new one
if there is no current component. By default, the file name in the source
and the file table will be identical. If the src file is specified, it
is interpreted relative to the current directory. Optionally, a version
and a language can be specified for the entry in the File table.
class msilib.Feature(db, id, title, desc, display, level=1, parent=None, directory=None, attributes=0)
Add a new record to the Feature table, using the values id, parent.id,
title, desc, display, level, directory, and attributes. The
resulting feature object can be passed to the start_component() method of
Directory.
Make this feature the current feature of msilib. New components are
automatically added to the default feature, unless a feature is explicitly
specified.
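A short sketch of typical usage, continuing with the database from the
examples above (the id and title are hypothetical):

import msilib

feature = msilib.Feature(db, 'default', 'Default Feature', 'Everything', 1,
                         directory='TARGETDIR')
feature.set_current()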
msilib provides several classes that wrap the GUI tables in an MSI
database. However, no standard user interface is provided; use bdist_msi
to create MSI files with a user-interface for installing Python packages.
Add a radio button named name to the group, at the coordinates x, y,
width, height, and with the label text. If value is None, it
defaults to name.
class msilib.Dialog(db, name, x, y, w, h, attr, title, first, default, cancel)
Return a new Dialog object. An entry in the Dialog table is made,
with the specified coordinates, dialog attributes, title, name of the first,
default, and cancel controls.
This is the standard MSI schema for MSI 2.0, with the tables variable
providing a list of table definitions, and _Validation_records providing the
data for MSI validation.
This module contains table contents for the standard sequence tables:
AdminExecuteSequence, AdminUISequence, AdvtExecuteSequence,
InstallExecuteSequence, and InstallUISequence.
This module contains definitions for the UIText and ActionText tables, for the
standard installer actions.
msvcrt — Useful routines from the MS VC++ runtime
These functions provide access to some useful capabilities on Windows platforms.
Some higher-level modules use these functions to build the Windows
implementations of their services. For example, the getpass module uses
this in the implementation of the getpass() function.
Further documentation on these functions can be found in the Platform API
documentation.
The module implements both the normal and wide char variants of the console I/O
API. The normal API deals only with ASCII characters and is of limited use
for internationalized applications. The wide char API should be used wherever
possible.
Lock part of a file based on file descriptor fd from the C runtime. Raises
IOError on failure. The locked region of the file extends from the
current file position for nbytes bytes, and may continue beyond the end of the
file. mode must be one of the LK_* constants listed below. Multiple
regions in a file may be locked at the same time, but may not overlap. Adjacent
regions are not merged; they must be unlocked individually.
Locks the specified bytes. If the bytes cannot be locked, the program
immediately tries again after 1 second. If, after 10 attempts, the bytes cannot
be locked, IOError is raised.
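A minimal sketch of locking and unlocking the first bytes of a file (the file
name is hypothetical):

import msvcrt

f = open('data.bin', 'r+b')
try:
    f.seek(0)
    msvcrt.locking(f.fileno(), msvcrt.LK_NBLCK, 1024)  # non-blocking lock
    # ... work with the locked region ...
    f.seek(0)
    msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, 1024)  # unlock the same region
finally:
    f.close()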
Create a C runtime file descriptor from the file handle handle. The flags
parameter should be a bitwise OR of os.O_APPEND, os.O_RDONLY,
and os.O_TEXT. The returned file descriptor may be used as a parameter
to os.fdopen() to create a file object.
Read a keypress and return the resulting character as a byte string.
Nothing is echoed to the console. This call will block if a keypress
is not already available, but will not wait for Enter to be
pressed. If the pressed key was a special function key, this will
return '\000' or '\xe0'; the next call will return the keycode.
The Control-C keypress cannot be read with this function.
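For example, a small sketch that reads one keypress, following the convention
described above for special function keys:

import msvcrt

ch = msvcrt.getch()            # blocks until a key is pressed; returns bytes
if ch in (b'\x00', b'\xe0'):   # special function key: fetch the keycode
    ch = msvcrt.getch()
print(ch)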
These functions expose the Windows registry API to Python. Instead of using an
integer as the registry handle, a handle object is used
to ensure that the handles are closed correctly, even if the programmer neglects
to explicitly close them.
Creates or opens the specified key, returning a
handle object.
key is an already open key, or one of the predefined
HKEY_* constants.
sub_key is a string that names the key this method opens or creates.
res is a reserved integer, and must be zero. The default is zero.
sam is an integer that specifies an access mask that describes the desired
security access for the key. Default is KEY_ALL_ACCESS. See
Access Rights for other allowed values.
If key is one of the predefined keys, sub_key may be None. In that
case, the handle returned is the same key handle passed in to the function.
If the key already exists, this function opens the existing key.
The return value is the handle of the opened key. If the function fails, a
WindowsError exception is raised.
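A short sketch of creating a key and storing a string value in it (the subkey
path and value are hypothetical):

import winreg

with winreg.CreateKeyEx(winreg.HKEY_CURRENT_USER, r'Software\ExampleApp',
                        0, winreg.KEY_ALL_ACCESS) as key:
    winreg.SetValueEx(key, 'Version', 0, winreg.REG_SZ, '1.0')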
The DeleteKeyEx() function is implemented with the RegDeleteKeyEx
Windows API function, which is specific to 64-bit versions of Windows.
See the RegDeleteKeyEx documentation.
key is an already open key, or one of the predefined
HKEY_* constants.
sub_key is a string that must be a subkey of the key identified by the
key parameter. This value must not be None, and the key may not have
subkeys.
res is a reserved integer, and must be zero. The default is zero.
sam is an integer that specifies an access mask that describes the desired
security access for the key. Default is KEY_ALL_ACCESS. See
Access Rights for other allowed values.
This method can not delete keys with subkeys.
If the method succeeds, the entire key, including all of its values, is
removed. If the method fails, a WindowsError exception is raised.
Enumerates subkeys of an open registry key, returning a string.
key is an already open key, or one of the predefined
HKEY_* constants.
index is an integer that identifies the index of the key to retrieve.
The function retrieves the name of one subkey each time it is called. It is
typically called repeatedly until a WindowsError exception is
raised, indicating that no more subkeys are available.
Enumerates values of an open registry key, returning a tuple.
key is an already open key, or one of the predefined
HKEY_* constants.
index is an integer that identifies the index of the value to retrieve.
The function retrieves the name of one value each time it is called. It is
typically called repeatedly, until a WindowsError exception is
raised, indicating that no more values are available.
The result is a tuple of 3 items:

Index   Meaning
0       A string that identifies the value name
1       An object that holds the value data, and whose type depends
        on the underlying registry type
2       An integer that identifies the type of the value data (see the
        table in the docs for SetValueEx())
Writes all the attributes of a key to the registry.
key is an already open key, or one of the predefined
HKEY_* constants.
It is not necessary to call FlushKey() to change a key. Registry changes are
flushed to disk by the registry using its lazy flusher. Registry changes are
also flushed to disk at system shutdown. Unlike CloseKey(), the
FlushKey() method returns only when all the data has been written to the
registry. An application should only call FlushKey() if it requires
absolute certainty that registry changes are on disk.
Note
If you don’t know whether a FlushKey() call is required, it probably
isn’t.
sub_key is a string that identifies the subkey to load.
file_name is the name of the file to load registry data from. This file must
have been created with the SaveKey() function. Under the file allocation
table (FAT) file system, the filename may not have an extension.
A call to LoadKey() fails if the calling process does not have the
SE_RESTORE_PRIVILEGE privilege. Note that privileges are different
from permissions – see the RegLoadKey documentation for
more details.
If key is a handle returned by ConnectRegistry(), then the path
specified in file_name is relative to the remote computer.
key is an already open key, or one of the predefined
HKEY_* constants.
sub_key is a string that identifies the sub_key to open.
res is a reserved integer, and must be zero. The default is zero.
sam is an integer that specifies an access mask that describes the desired
security access for the key. Default is KEY_READ. See Access
Rights for other allowed values.
Retrieves the unnamed value for a key, as a string.
key is an already open key, or one of the predefined
HKEY_* constants.
sub_key is a string that holds the name of the subkey with which the value is
associated. If this parameter is None or empty, the function retrieves the
value set by the SetValue() method for the key identified by key.
Values in the registry have name, type, and data components. This method
retrieves the data for a key’s first value that has a NULL name. But the
underlying API call doesn’t return the type, so always use
QueryValueEx() if possible.
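Since QueryValueEx() also returns the value's type, it is usually the better
choice. A minimal sketch (the key path and value name are hypothetical):

import winreg

with winreg.OpenKey(winreg.HKEY_CURRENT_USER, r'Software\ExampleApp') as key:
    value, vtype = winreg.QueryValueEx(key, 'Version')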
Saves the specified key, and all its subkeys to the specified file.
key is an already open key, or one of the predefined
HKEY_* constants.
file_name is the name of the file to save registry data to. This file
cannot already exist. If this filename includes an extension, it cannot be
used on file allocation table (FAT) file systems by the LoadKey()
method.
If key represents a key on a remote computer, the path described by
file_name is relative to the remote computer. The caller of this method must
possess the SeBackupPrivilege security privilege. Note that
privileges are different than permissions – see the
Conflicts Between User Rights and Permissions documentation
for more details.
This function passes NULL for security_attributes to the API.
key is an already open key, or one of the predefined
HKEY_* constants.
sub_key is a string that names the subkey with which the value is associated.
type is an integer that specifies the type of the data. Currently this must be
REG_SZ, meaning only strings are supported. Use the SetValueEx()
function for support for other data types.
value is a string that specifies the new value.
If the key specified by the sub_key parameter does not exist, the SetValue
function creates it.
Value lengths are limited by available memory. Long values (more than 2048
bytes) should be stored as files with the filenames stored in the configuration
registry. This helps the registry perform efficiently.
The key identified by the key parameter must have been opened with
KEY_SET_VALUE access.
Stores data in the value field of an open registry key.
key is an already open key, or one of the predefined
HKEY_* constants.
value_name is a string that names the subkey with which the value is
associated.
type is an integer that specifies the type of the data. See
Value Types for the available types.
reserved can be anything – zero is always passed to the API.
value is a string that specifies the new value.
This method can also set additional value and type information for the specified
key. The key identified by the key parameter must have been opened with
KEY_SET_VALUE access.
Value lengths are limited by available memory. Long values (more than 2048
bytes) should be stored as files with the filenames stored in the configuration
registry. This helps the registry perform efficiently.
Disables registry reflection for 32-bit processes running on a 64-bit
operating system.
key is an already open key, or one of the predefined HKEY_* constants.
Will generally raise NotImplementedError if executed on a 32-bit operating
system.
If the key is not on the reflection list, the function succeeds but has no
effect. Disabling reflection for a key does not affect reflection of any
subkeys.
HKEY_CLASSES_ROOT
Registry entries subordinate to this key define types (or classes) of
documents and the properties associated with those types. Shell and
COM applications use the information stored under this key.

HKEY_CURRENT_USER
Registry entries subordinate to this key define the preferences of
the current user. These preferences include the settings of
environment variables, data about program groups, colors, printers,
network connections, and application preferences.

HKEY_LOCAL_MACHINE
Registry entries subordinate to this key define the physical state
of the computer, including data about the bus type, system memory,
and installed hardware and software.

HKEY_USERS
Registry entries subordinate to this key define the default user
configuration for new users on the local computer and the user
configuration for the current user.

HKEY_PERFORMANCE_DATA
Registry entries subordinate to this key allow you to access
performance data. The data is not actually stored in the registry;
the registry functions cause the system to collect the data from
its source.
This object wraps a Windows HKEY object, automatically closing it when the
object is destroyed. To guarantee cleanup, you can call either the
Close() method on the object, or the CloseKey() function.
All registry functions in this module return one of these objects.
All registry functions in this module which accept a handle object also accept
an integer, however, use of the handle object is encouraged.
Handle objects provide semantics for __bool__() – thus

if handle:
    print("Yes")

will print Yes if the handle is currently valid (has not been closed or
detached).
Handle objects also support comparison semantics: two handle objects compare
equal if they both reference the same underlying Windows handle value.
Handle objects can be converted to an integer (e.g., using the built-in
int() function), in which case the underlying Windows handle value is
returned. You can also use the Detach() method to return the
integer handle, and also disconnect the Windows handle from the handle object.
Detaches the Windows handle from the handle object.
The result is an integer that holds the value of the handle before it is
detached. If the handle is already detached or closed, this will return
zero.
After calling this function, the handle is effectively invalidated, but the
handle is not closed. You would call this function when you need the
underlying Win32 handle to exist beyond the lifetime of the handle object.
Beep the PC’s speaker. The frequency parameter specifies frequency, in hertz,
of the sound, and must be in the range 37 through 32,767. The duration
parameter specifies the number of milliseconds the sound should last. If the
system is not able to beep the speaker, RuntimeError is raised.
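For instance, a one-line sketch:

import winsound

winsound.Beep(440, 500)   # 440 Hz (concert A) for half a second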
Call the underlying PlaySound() function from the Platform API. The
sound parameter may be a filename, audio data as a string, or None. Its
interpretation depends on the value of flags, which can be a bitwise ORed
combination of the constants described below. If the sound parameter is
None, any currently playing waveform sound is stopped. If the system
indicates an error, RuntimeError is raised.
Call the underlying MessageBeep() function from the Platform API. This
plays a sound as specified in the registry. The type argument specifies which
sound to play; possible values are -1, MB_ICONASTERISK,
MB_ICONEXCLAMATION, MB_ICONHAND, MB_ICONQUESTION, and MB_OK, all
described below. The value -1 produces a “simple beep”; this is the final
fallback if a sound cannot be played otherwise.
The sound parameter is a sound association name from the registry. If the
registry contains no such name, play the system default sound unless
SND_NODEFAULT is also specified. If no default sound is registered,
raise RuntimeError. Do not use with SND_FILENAME.
All Win32 systems support at least the following; most systems support many
more:
import winsound
# Play Windows exit sound.
winsound.PlaySound("SystemExit", winsound.SND_ALIAS)
# Probably play Windows default sound, if any is registered (because
# "*" probably isn't the registered name of any sound).
winsound.PlaySound("*", winsound.SND_ALIAS)
The modules described in this chapter provide interfaces to features that are
unique to the Unix operating system, or in some cases to some or many variants
of it. Here’s an overview:
This module provides access to operating system functionality that is
standardized by the C Standard and the POSIX standard (a thinly disguised Unix
interface).
Do not import this module directly. Instead, import the module os,
which provides a portable version of this interface. On Unix, the os
module provides a superset of the posix interface. On non-Unix operating
systems the posix module is not available, but a subset is always
available through the os interface. Once os is imported, there is
no performance penalty in using it instead of posix. In addition,
os provides some additional functionality, such as automatically calling
putenv() when an entry in os.environ is changed.
Errors are reported as exceptions; the usual exceptions are given for type
errors, while errors reported by the system calls raise OSError.
Several operating systems (including AIX, HP-UX, Irix and Solaris) provide
support for files that are larger than 2 GB from a C programming model where
int and long are 32-bit values. This is typically accomplished
by defining the relevant size and offset types as 64-bit values. Such files are
sometimes referred to as large files.
Large file support is enabled in Python when the size of an off_t is
larger than a long and the long long type is available and is
at least as large as an off_t.
It may be necessary to configure and compile Python with certain compiler flags
to enable this mode. For example, it is enabled by default with recent versions
of Irix, but with Solaris 2.6 and 2.7 you need to do something like:

CFLAGS="`getconf LFS_CFLAGS`" OPT="-g -O2 $CFLAGS" ./configure
A dictionary representing the string environment at the time the interpreter
was started. Keys and values are bytes on Unix and str on Windows. For
example, environ[b'HOME'] (environ['HOME'] on Windows) is the
pathname of your home directory, equivalent to getenv("HOME") in C.
Modifying this dictionary does not affect the string environment passed on by
execv(), popen() or system(); if you need to change the
environment, pass environ to execve() or add variable assignments and
export statements to the command string for system() or popen().
Changed in version 3.2: On Unix, keys and values are bytes.
Note
The os module provides an alternate implementation of environ
which updates the environment on modification. Note also that updating
os.environ will render this dictionary obsolete. Use of the
os module version of this is recommended over direct access to the
posix module.
This module provides access to the Unix user account and password database. It
is available on all Unix versions.
Password database entries are reported as a tuple-like object, whose attributes
correspond to the members of the passwd structure (Attribute field below,
see <pwd.h>):
Index   Attribute   Meaning
0       pw_name     Login name
1       pw_passwd   Optional encrypted password
2       pw_uid      Numerical user ID
3       pw_gid      Numerical group ID
4       pw_gecos    User name or comment field
5       pw_dir      User home directory
6       pw_shell    User command interpreter
The uid and gid items are integers, all others are strings. KeyError is
raised if the entry asked for cannot be found.
Note
In traditional Unix the field pw_passwd usually contains a password
encrypted with a DES derived algorithm (see module crypt). However most
modern unices use a so-called shadow password system. On those unices the
pw_passwd field only contains an asterisk ('*') or the letter 'x'
where the encrypted password is stored in a file /etc/shadow which is
not world readable. Whether the pw_passwd field contains anything useful is
system-dependent. If available, the spwd module should be used where
access to the encrypted password is required.
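A brief sketch of typical lookups (the login name here is hypothetical):

import pwd

entry = pwd.getpwnam('alice')        # raises KeyError if no such user
print(entry.pw_uid, entry.pw_dir, entry.pw_shell)

for entry in pwd.getpwall():         # iterate over all password entries
    print(entry.pw_name)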
This module provides access to the Unix shadow password database. It is
available on various Unix versions.
You must have enough privileges to access the shadow password database (this
usually means you have to be root).
Shadow password database entries are reported as a tuple-like object, whose
attributes correspond to the members of the spwd structure (Attribute field
below, see <shadow.h>):
Index   Attribute   Meaning
0       sp_nam      Login name
1       sp_pwd      Encrypted password
2       sp_lstchg   Date of last change
3       sp_min      Minimal number of days between changes
4       sp_max      Maximum number of days between changes
5       sp_warn     Number of days before password expires to warn user about it
6       sp_inact    Number of days after password expires until account is blocked
7       sp_expire   Number of days since 1970-01-01 until account is disabled
8       sp_flag     Reserved
The sp_nam and sp_pwd items are strings, all others are integers.
KeyError is raised if the entry asked for cannot be found.
This module provides access to the Unix group database. It is available on all
Unix versions.
Group database entries are reported as a tuple-like object, whose attributes
correspond to the members of the group structure (Attribute field below, see
<grp.h>):
Index   Attribute   Meaning
0       gr_name     the name of the group
1       gr_passwd   the (encrypted) group password; often empty
2       gr_gid      the numerical group ID
3       gr_mem      all the group members' user names
The gid is an integer, name and password are strings, and the member list is a
list of strings. (Note that most users are not explicitly listed as members of
the group they are in according to the password database. Check both databases
to get complete membership information. Also note that a gr_name that
starts with a + or - is likely to be a YP/NIS reference and may not be
accessible via getgrnam() or getgrgid().)
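A brief sketch (the group name here is hypothetical):

import grp

g = grp.getgrnam('staff')     # raises KeyError if the group does not exist
print(g.gr_gid, g.gr_mem)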
This module implements an interface to the crypt(3) routine, which is
a one-way hash function based upon a modified DES algorithm; see the Unix man
page for further details. Possible uses include allowing Python scripts to
accept typed passwords from the user, or attempting to crack Unix passwords with
a dictionary.
Notice that the behavior of this module depends on the actual implementation of
the crypt(3) routine in the running system. Therefore, any
extensions available on the current implementation will also be available on
this module.
word will usually be a user’s password as typed at a prompt or in a graphical
interface. salt is usually a random two-character string which will be used
to perturb the DES algorithm in one of 4096 ways. The characters in salt must
be in the set [./a-zA-Z0-9]. Returns the hashed password as a string, which
will be composed of characters from the same alphabet as the salt (the first two
characters represent the salt itself).
Since a few crypt(3) extensions allow different values, with
different sizes in the salt, it is recommended to use the full crypted
password as the salt when checking a password.
A simple example illustrating typical use:
import crypt, getpass, pwd

def login():
    username = input('Python login: ')
    cryptedpasswd = pwd.getpwnam(username)[1]
    if cryptedpasswd:
        if cryptedpasswd in ('x', '*'):
            raise NotImplementedError('no support for shadow passwords')
        cleartext = getpass.getpass()
        return crypt.crypt(cleartext, cryptedpasswd) == cryptedpasswd
    else:
        return True
This module provides an interface to the POSIX calls for tty I/O control. For a
complete description of these calls, see the POSIX or Unix manual pages. It is
only available for those Unix versions that support POSIX termios style tty
I/O control (and then only if configured at installation time).
All functions in this module take a file descriptor fd as their first
argument. This can be an integer file descriptor, such as returned by
sys.stdin.fileno(), or a file object, such as sys.stdin itself.
This module also defines all the constants needed to work with the functions
provided here; these have the same name as their counterparts in C. Please
refer to your system documentation for more information on using these terminal
control interfaces.
Return a list containing the tty attributes for file descriptor fd, as
follows: [iflag, oflag, cflag, lflag, ispeed, ospeed, cc] where cc is a
list of the tty special characters (each a string of length 1, except the
items with indices VMIN and VTIME, which are integers when
these fields are defined). The interpretation of the flags and the speeds as
well as the indexing in the cc array must be done using the symbolic
constants defined in the termios module.
Set the tty attributes for file descriptor fd from the attributes, which is
a list like the one returned by tcgetattr(). The when argument
determines when the attributes are changed: TCSANOW to change
immediately, TCSADRAIN to change after transmitting all queued output,
or TCSAFLUSH to change after transmitting all queued output and
discarding all queued input.
Discard queued data on file descriptor fd. The queue selector specifies
which queue: TCIFLUSH for the input queue, TCOFLUSH for the
output queue, or TCIOFLUSH for both queues.
Suspend or resume input or output on file descriptor fd. The action
argument can be TCOOFF to suspend output, TCOON to restart
output, TCIOFF to suspend input, or TCION to restart input.
Here’s a function that prompts for a password with echoing turned off. Note the
technique using a separate tcgetattr() call and a try ...
finally statement to ensure that the old tty attributes are restored
exactly no matter what happens:
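def getpass(prompt='Password: '):
    import termios, sys
    fd = sys.stdin.fileno()
    old = termios.tcgetattr(fd)
    new = termios.tcgetattr(fd)
    new[3] = new[3] & ~termios.ECHO          # index 3 is the lflags field
    try:
        termios.tcsetattr(fd, termios.TCSADRAIN, new)
        passwd = input(prompt)
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, old)
    return passwd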
The pty module defines operations for handling the pseudo-terminal
concept: starting another process and being able to write to and read from its
controlling terminal programmatically.
Because pseudo-terminal handling is highly platform dependent, there is code to
do it only for Linux. (The Linux code is supposed to work on other platforms,
but hasn’t been tested yet.)
Fork. Connect the child’s controlling terminal to a pseudo-terminal. Return
value is (pid, fd). Note that the child gets pid 0, and the fd is
invalid. The parent’s return value is the pid of the child, and fd is a
file descriptor connected to the child’s controlling terminal (and also to the
child’s standard input and output).
Open a new pseudo-terminal pair, using os.openpty() if possible, or
emulation code for generic Unix systems. Return a pair of file descriptors
(master, slave), for the master and the slave end, respectively.
Spawn a process, and connect its controlling terminal with the current
process’s standard I/O. This is often used to baffle programs which insist on
reading from the controlling terminal.
The functions master_read and stdin_read should be functions which read from
a file descriptor. The defaults try to read 1024 bytes each time they are
called.
The following program acts like the Unix command script(1), using a
pseudo-terminal to record all input and output of a terminal session in a
“typescript”.
import sys, os, time, getopt
import pty

mode = 'wb'
shell = 'sh'
filename = 'typescript'
if 'SHELL' in os.environ:
    shell = os.environ['SHELL']

try:
    opts, args = getopt.getopt(sys.argv[1:], 'ap')
except getopt.error as msg:
    print('%s: %s' % (sys.argv[0], msg))
    sys.exit(2)

for opt, arg in opts:
    # option -a: append to typescript file
    if opt == '-a':
        mode = 'ab'
    # option -p: use a Python shell as the terminal command
    elif opt == '-p':
        shell = sys.executable

if args:
    filename = args[0]

script = open(filename, mode)

def read(fd):
    data = os.read(fd, 1024)
    script.write(data)
    return data

sys.stdout.write('Script started, file is %s\n' % filename)
script.write(('Script started on %s\n' % time.asctime()).encode())
pty.spawn(shell, read)
script.write(('Script done on %s\n' % time.asctime()).encode())
sys.stdout.write('Script done, file is %s\n' % filename)
This module performs file control and I/O control on file descriptors. It is an
interface to the fcntl() and ioctl() Unix routines.
All functions in this module take a file descriptor fd as their first
argument. This can be an integer file descriptor, such as returned by
sys.stdin.fileno(), or an io.IOBase object, such as sys.stdin
itself, which provides a fileno() that returns a genuine file descriptor.
Perform the requested operation on file descriptor fd (file objects providing
a fileno() method are accepted as well). The operation is defined by op
and is operating system dependent. These codes are also found in the
fcntl module. The argument arg is optional, and defaults to the integer
value 0. When present, it can either be an integer value, or a string.
With the argument missing or an integer value, the return value of this function
is the integer return value of the C fcntl() call. When the argument is
a string it represents a binary structure, e.g. created by struct.pack().
The binary data is copied to a buffer whose address is passed to the C
fcntl() call. The return value after a successful call is the contents
of the buffer, converted to a string object. The length of the returned string
will be the same as the length of the arg argument. This is limited to 1024
bytes. If the information returned in the buffer by the operating system is
larger than 1024 bytes, this is most likely to result in a segmentation
violation or a more subtle data corruption.
This function is identical to the fcntl() function, except that the
argument handling is even more complicated.
The op parameter is limited to values that can fit in 32 bits.
The parameter arg can be one of an integer, absent (treated identically to the
integer 0), an object supporting the read-only buffer interface (most likely
a plain Python string) or an object supporting the read-write buffer interface.
In all but the last case, behaviour is as for the fcntl() function.
If a mutable buffer is passed, then the behaviour is determined by the value of
the mutate_flag parameter.
If it is false, the buffer’s mutability is ignored and behaviour is as for a
read-only buffer, except that the 1024 byte limit mentioned above is avoided –
so long as the buffer you pass is as least as long as what the operating system
wants to put there, things should work.
If mutate_flag is true (the default), then the buffer is (in effect) passed
to the underlying ioctl() system call, the latter’s return code is
passed back to the calling Python, and the buffer’s new contents reflect the
action of the ioctl(). This is a slight simplification, because if the
supplied buffer is less than 1024 bytes long it is first copied into a static
buffer 1024 bytes long which is then passed to ioctl() and copied back
into the supplied buffer.
Perform the lock operation op on file descriptor fd (file objects providing
a fileno() method are accepted as well). See the Unix manual
flock(2) for details. (On some systems, this function is emulated
using fcntl().)
This is essentially a wrapper around the fcntl() locking calls. fd is
the file descriptor of the file to lock or unlock, and operation is one of the
following values:
LOCK_UN – unlock
LOCK_SH – acquire a shared lock
LOCK_EX – acquire an exclusive lock
When operation is LOCK_SH or LOCK_EX, it can also be
bitwise ORed with LOCK_NB to avoid blocking on lock acquisition.
If LOCK_NB is used and the lock cannot be acquired, an
IOError will be raised and the exception will have an errno
attribute set to EACCES or EAGAIN (depending on the
operating system; for portability, check for both values). On at least some
systems, LOCK_EX can only be used if the file descriptor refers to a
file opened for writing.
length is the number of bytes to lock, start is the byte offset at which the
lock starts, relative to whence, and whence is as with fileobj.seek(),
specifically:
0 – relative to the start of the file (SEEK_SET)
1 – relative to the current buffer position (SEEK_CUR)
2 – relative to the end of the file (SEEK_END)
The default for start is 0, which means to start at the beginning of the file.
The default for length is 0 which means to lock to the end of the file. The
default for whence is also 0.
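Examples (all on an SVR4-compliant system; the file name here is
hypothetical):

import struct, fcntl, os

f = open('/tmp/spam.txt', 'wb')

# First example: an integer argument.
rv = fcntl.fcntl(f, fcntl.F_SETFL, os.O_NDELAY)

# Second example: a string argument holding a binary structure.
lockdata = struct.pack('hhllhh', fcntl.F_WRLCK, 0, 0, 0, 0, 0)
rv = fcntl.fcntl(f, fcntl.F_SETLKW, lockdata)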
Note that in the first example the return value variable rv will hold an
integer value; in the second example it will hold a string value. The structure
layout for the lockdata variable is system dependent; therefore using the
flock() call may be better.
If the locking flags O_SHLOCK and O_EXLOCK are present
in the os module (on BSD only), the os.open() function
provides an alternative to the lockf() and flock() functions.
If flag is true, turn debugging on. Otherwise, turn debugging off. When
debugging is on, commands to be executed are printed, and the shell is given a
'set -x' command to be more verbose.
Append a new action at the end. The cmd variable must be a valid Bourne shell
command. The kind variable consists of two letters.
The first letter can be either of '-' (which means the command reads its
standard input), 'f' (which means the command reads a given file on the
command line) or '.' (which means the command reads no input, and hence
must be first.)
Similarly, the second letter can be either of '-' (which means the command
writes to standard output), 'f' (which means the command writes a file on
the command line) or '.' (which means the command does not write anything,
and hence must be last.)
Resource usage can be limited using the setrlimit() function described
below. Each resource is controlled by a pair of limits: a soft limit and a hard
limit. The soft limit is the current limit, and may be lowered or raised by a
process over time. The soft limit can never exceed the hard limit. The hard
limit can be lowered to any value greater than the soft limit, but not raised.
(Only processes with the effective UID of the super-user can raise a hard
limit.)
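For example, a minimal sketch of raising this process's soft limit on open
files up to its hard limit (RLIMIT_NOFILE availability is platform-dependent):

import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))   # needs no privilege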
The specific resources that can be limited are system dependent. They are
described in the getrlimit(2) man page. The resources listed below
are supported when the underlying operating system supports them; resources
which cannot be checked or controlled by the operating system are not defined in
this module for those platforms.
Returns a tuple (soft, hard) with the current soft and hard limits of
resource. Raises ValueError if an invalid resource is specified, or
error if the underlying system call fails unexpectedly.
Sets new limits of consumption of resource. The limits argument must be a
tuple (soft, hard) of two integers describing the new limits. A value of
-1 can be used to specify the maximum possible upper limit.
Raises ValueError if an invalid resource is specified, if the new soft
limit exceeds the hard limit, or if a process tries to raise its hard limit
(unless the process has an effective UID of super-user). Can also raise
error if the underlying system call fails.
These symbols define resources whose consumption can be controlled using the
setrlimit() and getrlimit() functions described above. The values of
these symbols are exactly the constants used by C programs.
The Unix man page for getrlimit(2) lists the available resources.
Note that not all systems use the same symbol or same value to denote the same
resource. This module does not attempt to mask platform differences — symbols
not defined for a platform will not be available from this module on that
platform.
The maximum size (in bytes) of a core file that the current process can create.
This may result in the creation of a partial core file if a larger core would be
required to contain the entire process image.
The maximum amount of processor time (in seconds) that a process can use. If
this limit is exceeded, a SIGXCPU signal is sent to the process. (See
the signal module documentation for information about how to catch this
signal and do something useful, e.g. flush open files to disk.)
This function returns an object that describes the resources consumed by either
the current process or its children, as specified by the who parameter. The
who parameter should be specified using one of the RUSAGE_*
constants described below.
The fields of the return value each describe how a particular system resource
has been used, e.g. amount of time spent running in user mode or number of times
the process was swapped out of main memory. Some values are dependent on the
clock tick interval, e.g. the amount of memory the process is using.
For backward compatibility, the return value is also accessible as a tuple of 16
elements.
The fields ru_utime and ru_stime of the return value are
floating point values representing the amount of time spent executing in user
mode and the amount of time spent executing in system mode, respectively. The
remaining values are integers. Consult the getrusage(2) man page for
detailed information about these values. A brief summary is presented here:
Index   Field         Resource
0       ru_utime      time in user mode (float)
1       ru_stime      time in system mode (float)
2       ru_maxrss     maximum resident set size
3       ru_ixrss      shared memory size
4       ru_idrss      unshared memory size
5       ru_isrss      unshared stack size
6       ru_minflt     page faults not requiring I/O
7       ru_majflt     page faults requiring I/O
8       ru_nswap      number of swap outs
9       ru_inblock    block input operations
10      ru_oublock    block output operations
11      ru_msgsnd     messages sent
12      ru_msgrcv     messages received
13      ru_nsignals   signals received
14      ru_nvcsw      voluntary context switches
15      ru_nivcsw     involuntary context switches
This function will raise a ValueError if an invalid who parameter is
specified. It may also raise an error exception in unusual circumstances.
Returns the number of bytes in a system page. (This need not be the same as the
hardware page size.) This function is useful for determining the number of bytes
of memory a process is using. The third element of the tuple returned by
getrusage() describes memory usage in pages; multiplying by page size
produces number of bytes.
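A hedged sketch combining the two (note that some systems report ru_maxrss in
kilobytes rather than pages, so the arithmetic below is platform-dependent):

import resource

usage = resource.getrusage(resource.RUSAGE_SELF)
print('user time: %.3f s' % usage.ru_utime)
print('max RSS:   %d bytes (if reported in pages)'
      % (usage[2] * resource.getpagesize()))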
The following RUSAGE_* symbols are passed to the getrusage()
function to specify which processes the information should be provided for.
Return the match for key in map mapname, or raise an error
(nis.error) if there is none. Both should be strings; key is 8-bit
clean. The return value is an arbitrary array of bytes (it may contain NULL and
other joys).
Note that mapname is first checked to see whether it is an alias for another name.
The domain argument allows overriding the NIS domain used for the lookup. If
unspecified, lookup is in the default NIS domain.
Return a dictionary mapping key to value such that
match(key, mapname) == value. Note that both keys and values of the dictionary
are arbitrary arrays of bytes.
Note that mapname is first checked to see whether it is an alias for another name.
The domain argument allows overriding the NIS domain used for the lookup. If
unspecified, lookup is in the default NIS domain.
This module provides an interface to the Unix syslog library routines.
Refer to the Unix manual pages for a detailed description of the syslog
facility.
This module wraps the system syslog family of routines. A pure Python
library that can speak to a syslog server is available in the
logging.handlers module as SysLogHandler.
Send the string message to the system logger. A trailing newline is added
if necessary. Each message is tagged with a priority composed of a
facility and a level. The optional priority argument, which defaults
to LOG_INFO, determines the message priority. If the facility is
not encoded in priority using logical-or (LOG_INFO|LOG_USER), the
value given in the openlog() call is used.
If openlog() has not been called prior to the call to syslog(),
openlog() will be called with no arguments.
Logging options of subsequent syslog() calls can be set by calling
openlog(). syslog() will call openlog() with no arguments
if the log is not currently open.
The optional ident keyword argument is a string which is prepended to every
message, and defaults to sys.argv[0] with leading path components
stripped. The optional logopt keyword argument (default is 0) is a bit
field – see below for possible values to combine. The optional facility
keyword argument (default is LOG_USER) sets the default facility for
messages which do not have a facility explicitly encoded.
Changed in version 3.2: In previous versions, keyword arguments were not allowed, and ident was
required. The default for ident was dependent on the system libraries,
and often was python instead of the name of the Python program file.
Reset the syslog module values and call the system library closelog().
This causes the module to behave as it does when initially imported. For
example, openlog() will be called on the first syslog() call (if
openlog() hasn’t already been called), and ident and other
openlog() parameters are reset to defaults.
Set the priority mask to maskpri and return the previous mask value. Calls
to syslog() with a priority level not set in maskpri are ignored.
The default is to log all priorities. The function LOG_MASK(pri)
calculates the mask for the individual priority pri. The function
LOG_UPTO(pri) calculates the mask for all priorities up to and including
pri.
import syslog

syslog.syslog('Processing started')
if error:
    syslog.syslog(syslog.LOG_ERR, 'Processing started')
An example of setting some log options; these would include the process ID in
logged messages, and write the messages to the destination facility used for
mail logging:
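syslog.openlog(logoption=syslog.LOG_PID, facility=syslog.LOG_MAIL)
syslog.syslog('E-mail processing initiated...')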
Here’s a quick listing of modules that are currently undocumented, but that
should be documented. Feel free to contribute documentation for them! (Send
via email to docs@python.org.)
The idea and original contents for this chapter were taken from a posting by
Fredrik Lundh; the specific contents of this chapter have been substantially
revised.
It is quite easy to add new built-in modules to Python, if you know how to
program in C. Such extension modules can do two things that can’t be
done directly in Python: they can implement new built-in object types, and they
can call C library functions and system calls.
To support extensions, the Python API (Application Programmers Interface)
defines a set of functions, macros and variables that provide access to most
aspects of the Python run-time system. The Python API is incorporated in a C
source file by including the header "Python.h".
The compilation of an extension module depends on its intended use as well as on
your system setup; details are given in later chapters.
Do note that if your use case is calling C library functions or system calls,
you should consider using the ctypes module rather than writing custom
C code. Not only does ctypes let you write Python code to interface
with C code, but it is more portable between implementations of Python than
writing and compiling an extension module which typically ties you to CPython.
Let’s create an extension module called spam (the favorite food of Monty
Python fans...) and let’s say we want to create a Python interface to the C
library function system(). [1] This function takes a null-terminated
character string as argument and returns an integer. We want this function to
be callable from Python as follows:
>>> import spam
>>> status = spam.system("ls -l")
Begin by creating a file spammodule.c. (Historically, if a module is
called spam, the C file containing its implementation is called
spammodule.c; if the module name is very long, like spammify, the
module name can be just spammify.c.)
The first line of our file can be:
#include <Python.h>
which pulls in the Python API (you can add a comment describing the purpose of
the module and a copyright notice if you like).
Note
Since Python may define some pre-processor definitions which affect the standard
headers on some systems, you must include Python.h before any standard
headers are included.
All user-visible symbols defined by Python.h have a prefix of Py or
PY, except those defined in standard header files. For convenience, and
since they are used extensively by the Python interpreter, "Python.h"
includes a few standard header files: <stdio.h>, <string.h>,
<errno.h>, and <stdlib.h>. If the latter header file does not exist on
your system, it declares the functions malloc(), free() and
realloc() directly.
The next thing we add to our module file is the C function that will be called
when the Python expression spam.system(string) is evaluated (we’ll see
shortly how it ends up being called):
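static PyObject *
spam_system(PyObject *self, PyObject *args)
{
    const char *command;
    int sts;

    if (!PyArg_ParseTuple(args, "s", &command))
        return NULL;
    sts = system(command);
    return PyLong_FromLong(sts);
}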
There is a straightforward translation from the argument list in Python (for
example, the single expression "ls -l") to the arguments passed to the C
function. The C function always has two arguments, conventionally named self
and args.
The self argument points to the module object for module-level functions;
for a method it would point to the object instance.
The args argument will be a pointer to a Python tuple object containing the
arguments. Each item of the tuple corresponds to an argument in the call’s
argument list. The arguments are Python objects — in order to do anything
with them in our C function we have to convert them to C values. The function
PyArg_ParseTuple() in the Python API checks the argument types and
converts them to C values. It uses a template string to determine the required
types of the arguments as well as the types of the C variables into which to
store the converted values. More about this later.
PyArg_ParseTuple() returns true (nonzero) if all arguments have the right
type and its components have been stored in the variables whose addresses are
passed. It returns false (zero) if an invalid argument list was passed. In the
latter case it also raises an appropriate exception so the calling function can
return NULL immediately (as we saw in the example).
An important convention throughout the Python interpreter is the following: when
a function fails, it should set an exception condition and return an error value
(usually a NULL pointer). Exceptions are stored in a static global variable
inside the interpreter; if this variable is NULL no exception has occurred. A
second global variable stores the “associated value” of the exception (the
second argument to raise). A third variable contains the stack
traceback in case the error originated in Python code. These three variables
are the C equivalents of the result in Python of sys.exc_info() (see the
section on module sys in the Python Library Reference). It is important
to know about them to understand how errors are passed around.
The Python API defines a number of functions to set various types of exceptions.
The most common one is PyErr_SetString(). Its arguments are an exception
object and a C string. The exception object is usually a predefined object like
PyExc_ZeroDivisionError. The C string indicates the cause of the error
and is converted to a Python string object and stored as the “associated value”
of the exception.
Another useful function is PyErr_SetFromErrno(), which only takes an
exception argument and constructs the associated value by inspection of the
global variable errno. The most general function is
PyErr_SetObject(), which takes two object arguments, the exception and
its associated value. You don’t need to Py_INCREF() the objects passed
to any of these functions.
You can test non-destructively whether an exception has been set with
PyErr_Occurred(). This returns the current exception object, or NULL
if no exception has occurred. You normally don’t need to call
PyErr_Occurred() to see whether an error occurred in a function call,
since you should be able to tell from the return value.
When a function f that calls another function g detects that the latter
fails, f should itself return an error value (usually NULL or -1). It
should not call one of the PyErr_*() functions — one has already
been called by g. f's caller is then supposed to also return an error
indication to its caller, again without calling PyErr_*(), and so on
— the most detailed cause of the error was already reported by the function
that first detected it. Once the error reaches the Python interpreter’s main
loop, this aborts the currently executing Python code and tries to find an
exception handler specified by the Python programmer.
(There are situations where a module can actually give a more detailed error
message by calling another PyErr_*() function, and in such cases it is
fine to do so. As a general rule, however, this is not necessary, and can cause
information about the cause of the error to be lost: most operations can fail
for a variety of reasons.)
To ignore an exception set by a function call that failed, the exception
condition must be cleared explicitly by calling PyErr_Clear(). The only
time C code should call PyErr_Clear() is if it doesn’t want to pass the
error on to the interpreter but wants to handle it completely by itself
(possibly by trying something else, or pretending nothing went wrong).
Every failing malloc() call must be turned into an exception — the
direct caller of malloc() (or realloc()) must call
PyErr_NoMemory() and return a failure indicator itself. All the
object-creating functions (for example, PyLong_FromLong()) already do
this, so this note is only relevant to those who call malloc() directly.
Also note that, with the important exception of PyArg_ParseTuple() and
friends, functions that return an integer status usually return a positive value
or zero for success and -1 for failure, like Unix system calls.
Finally, be careful to clean up garbage (by making Py_XDECREF() or
Py_DECREF() calls for objects you have already created) when you return
an error indicator!
The choice of which exception to raise is entirely yours. There are predeclared
C objects corresponding to all built-in Python exceptions, such as
PyExc_ZeroDivisionError, which you can use directly. Of course, you
should choose exceptions wisely — don’t use PyExc_TypeError to mean
that a file couldn’t be opened (that should probably be PyExc_IOError).
If something’s wrong with the argument list, the PyArg_ParseTuple()
function usually raises PyExc_TypeError. If you have an argument whose
value must be in a particular range or must satisfy other conditions,
PyExc_ValueError is appropriate.
You can also define a new exception that is unique to your module. For this, you
usually declare a static object variable at the beginning of your file:
static PyObject *SpamError;
and initialize it in your module’s initialization function (PyInit_spam())
with an exception object (leaving out the error checking for now):
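PyMODINIT_FUNC
PyInit_spam(void)
{
    PyObject *m;

    m = PyModule_Create(&spammodule);
    if (m == NULL)
        return NULL;

    SpamError = PyErr_NewException("spam.error", NULL, NULL);
    Py_INCREF(SpamError);
    PyModule_AddObject(m, "error", SpamError);
    return m;
}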
Note that the Python name for the exception object is spam.error. The
PyErr_NewException() function creates a class whose base class is
Exception (unless another class is passed in instead of NULL),
described in Built-in Exceptions.
Note also that the SpamError variable retains a reference to the newly
created exception class; this is intentional! Since the exception could be
removed from the module by external code, an owned reference to the class is
needed to ensure that it will not be discarded, causing SpamError to
become a dangling pointer. Should it become a dangling pointer, C code which
raises the exception could cause a core dump or other unintended side effects.
We discuss the use of PyMODINIT_FUNC as a function return type later in this
sample.
The spam.error exception can be raised in your extension module using a
call to PyErr_SetString() as shown below:
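static PyObject *
spam_system(PyObject *self, PyObject *args)
{
    const char *command;
    int sts;

    if (!PyArg_ParseTuple(args, "s", &command))
        return NULL;
    sts = system(command);
    if (sts < 0) {
        PyErr_SetString(SpamError, "System command failed");
        return NULL;
    }
    return PyLong_FromLong(sts);
}

Going back to the example function, you should now be able to understand this
statement:

if (!PyArg_ParseTuple(args, "s", &command))
    return NULL;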
It returns NULL (the error indicator for functions returning object pointers)
if an error is detected in the argument list, relying on the exception set by
PyArg_ParseTuple(). Otherwise the string value of the argument has been
copied to the local variable command. This is a pointer assignment and
you are not supposed to modify the string to which it points (so in Standard C,
the variable command should properly be declared as const char *command).
The next statement is a call to the Unix function system(), passing it
the string we just got from PyArg_ParseTuple():
sts = system(command);
Our spam.system() function must return the value of sts as a
Python object. This is done using the function PyLong_FromLong().
return PyLong_FromLong(sts);
In this case, it will return an integer object. (Yes, even integers are objects
on the heap in Python!)
If you have a C function that returns no useful argument (a function returning
void), the corresponding Python function must return None. You
need this idiom to do so (which is implemented by the Py_RETURN_NONE
macro):
Py_INCREF(Py_None);
return Py_None;
Py_None is the C name for the special Python object None. It is a
genuine Python object rather than a NULL pointer, which means “error” in most
contexts, as we have seen.
The Module’s Method Table and Initialization Function
I promised to show how spam_system() is called from Python programs.
First, we need to list its name and address in a “method table”:
static PyMethodDef SpamMethods[] = {
    ...
    {"system",  spam_system, METH_VARARGS,
     "Execute a shell command."},
    ...
    {NULL, NULL, 0, NULL}        /* Sentinel */
};
Note the third entry (METH_VARARGS). This is a flag telling the interpreter
the calling convention to be used for the C function. It should normally always
be METH_VARARGS or METH_VARARGS|METH_KEYWORDS; a value of 0 means
that an obsolete variant of PyArg_ParseTuple() is used.
When using only METH_VARARGS, the function should expect the Python-level
parameters to be passed in as a tuple acceptable for parsing via
PyArg_ParseTuple(); more information on this function is provided below.
The METH_KEYWORDS bit may be set in the third field if keyword
arguments should be passed to the function. In this case, the C function should
accept a third PyObject * parameter which will be a dictionary of keywords.
Use PyArg_ParseTupleAndKeywords() to parse the arguments to such a
function.
The method table must be referenced in the module definition structure:
static struct PyModuleDef spammodule = {
    PyModuleDef_HEAD_INIT,
    "spam",   /* name of module */
    spam_doc, /* module documentation, may be NULL */
    -1,       /* size of per-interpreter state of the module,
                 or -1 if the module keeps state in global variables. */
    SpamMethods
};
This structure, in turn, must be passed to the interpreter in the module’s
initialization function. The initialization function must be named
PyInit_name(), where name is the name of the module, and should be the
only non-static item defined in the module file:
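For the spam module, that function can be as simple as:

PyMODINIT_FUNC
PyInit_spam(void)
{
    return PyModule_Create(&spammodule);
}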
Note that PyMODINIT_FUNC declares the function as PyObject* return type,
declares any special linkage declarations required by the platform, and for C++
declares the function as extern "C".
When the Python program imports module spam for the first time,
PyInit_spam() is called. (See below for comments about embedding Python.)
It calls PyModule_Create(), which returns a module object, and
inserts built-in function objects into the newly created module based upon the
table (an array of PyMethodDef structures) found in the module definition.
PyModule_Create() returns a pointer to the module object
that it creates. It may abort with a fatal error for
certain errors, or return NULL if the module could not be initialized
satisfactorily. The init function must return the module object to its caller,
so that it then gets inserted into sys.modules.
When embedding Python, the PyInit_spam() function is not called
automatically unless there’s an entry in the PyImport_Inittab table.
To add the module to the initialization table, use PyImport_AppendInittab(),
optionally followed by an import of the module:
int
main(int argc, char *argv[])
{
    /* Add a built-in module, before Py_Initialize */
    PyImport_AppendInittab("spam", PyInit_spam);

    /* Pass argv[0] to the Python interpreter */
    Py_SetProgramName(argv[0]);

    /* Initialize the Python interpreter.  Required. */
    Py_Initialize();

    /* Optionally import the module; alternatively,
       import can be deferred until the embedded script
       imports it. */
    PyImport_ImportModule("spam");

    ...
}
An example may be found in the file Demo/embed/demo.c in the Python
source distribution.
Note
Removing entries from sys.modules or importing compiled modules into
multiple interpreters within a process (or following a fork() without an
intervening exec()) can create problems for some extension modules.
Extension module authors should exercise caution when initializing internal data
structures.
A more substantial example module is included in the Python source distribution
as Modules/xxmodule.c. This file may be used as a template or simply
read as an example.
There are two more things to do before you can use your new extension: compiling
and linking it with the Python system. If you use dynamic loading, the details
may depend on the style of dynamic loading your system uses; see the chapters
about building extension modules (chapter Building C and C++ Extensions with distutils) and additional
information that pertains only to building on Windows (chapter
Building C and C++ Extensions on Windows) for more information about this.
If you can’t use dynamic loading, or if you want to make your module a permanent
part of the Python interpreter, you will have to change the configuration setup
and rebuild the interpreter. Luckily, this is very simple on Unix: just place
your file (spammodule.c for example) in the Modules/ directory
of an unpacked source distribution, add a line to the file
Modules/Setup.local describing your file:
spam spammodule.o
and rebuild the interpreter by running make in the toplevel
directory. You can also run make in the Modules/
subdirectory, but then you must first rebuild Makefile there by running
‘make Makefile’. (This is necessary each time you change the
Setup file.)
If your module requires additional libraries to link with, these can be listed
on the line in the configuration file as well, for instance:

spam spammodule.o -lX11
So far we have concentrated on making C functions callable from Python. The
reverse is also useful: calling Python functions from C. This is especially the
case for libraries that support so-called “callback” functions. If a C
interface makes use of callbacks, the equivalent Python often needs to provide a
callback mechanism to the Python programmer; the implementation will require
calling the Python callback functions from a C callback. Other uses are also
imaginable.
Fortunately, the Python interpreter is easily called recursively, and there is a
standard interface to call a Python function. (I won’t dwell on how to call the
Python parser with a particular string as input — if you’re interested, have a
look at the implementation of the -c command line option in
Modules/main.c from the Python source code.)
Calling a Python function is easy. First, the Python program must somehow pass
you the Python function object. You should provide a function (or some other
interface) to do this. When this function is called, save a pointer to the
Python function object (be careful to Py_INCREF() it!) in a global
variable — or wherever you see fit. For example, the following function might
be part of a module definition:
static PyObject *my_callback = NULL;

static PyObject *
my_set_callback(PyObject *dummy, PyObject *args)
{
    PyObject *result = NULL;
    PyObject *temp;

    if (PyArg_ParseTuple(args, "O:set_callback", &temp)) {
        if (!PyCallable_Check(temp)) {
            PyErr_SetString(PyExc_TypeError, "parameter must be callable");
            return NULL;
        }
        Py_XINCREF(temp);         /* Add a reference to new callback */
        Py_XDECREF(my_callback);  /* Dispose of previous callback */
        my_callback = temp;       /* Remember new callback */
        /* Boilerplate to return "None" */
        Py_INCREF(Py_None);
        result = Py_None;
    }
    return result;
}
The macros Py_XINCREF() and Py_XDECREF() increment/decrement the
reference count of an object and are safe in the presence of NULL pointers
(but note that temp will not be NULL in this context). More info on them
in section Reference Counts.
Later, when it is time to call the function, you call the C function
PyObject_CallObject(). This function has two arguments, both pointers to
arbitrary Python objects: the Python function, and the argument list. The
argument list must always be a tuple object, whose length is the number of
arguments. To call the Python function with no arguments, pass in NULL, or
an empty tuple; to call it with one argument, pass a singleton tuple.
Py_BuildValue() returns a tuple when its format string consists of zero
or more format codes between parentheses. For example:
int arg;
PyObject *arglist;
PyObject *result;
...
arg = 123;
...
/* Time to call the callback */
arglist = Py_BuildValue("(i)", arg);
result = PyObject_CallObject(my_callback, arglist);
Py_DECREF(arglist);
PyObject_CallObject() returns a Python object pointer: this is the return
value of the Python function. PyObject_CallObject() is
“reference-count-neutral” with respect to its arguments. In the example a new
tuple was created to serve as the argument list, which is Py_DECREF()-ed immediately after the call.
The return value of PyObject_CallObject() is “new”: either it is a brand
new object, or it is an existing object whose reference count has been
incremented. So, unless you want to save it in a global variable, you should
somehow Py_DECREF() the result, even (especially!) if you are not
interested in its value.
Before you do this, however, it is important to check that the return value
isn’t NULL. If it is, the Python function terminated by raising an exception.
If the C code that called PyObject_CallObject() is called from Python, it
should now return an error indication to its Python caller, so the interpreter
can print a stack trace, or the calling Python code can handle the exception.
If this is not possible or desirable, the exception should be cleared by calling
PyErr_Clear(). For example:
if (result == NULL)
    return NULL; /* Pass error back */
...use result...
Py_DECREF(result);
Depending on the desired interface to the Python callback function, you may also
have to provide an argument list to PyObject_CallObject(). In some cases
the argument list is also provided by the Python program, through the same
interface that specified the callback function. It can then be saved and used
in the same manner as the function object. In other cases, you may have to
construct a new tuple to pass as the argument list. The simplest way to do this
is to call Py_BuildValue(). For example, if you want to pass an integral
event code, you might use the following code:
PyObject *arglist;
...
arglist = Py_BuildValue("(l)", eventcode);
result = PyObject_CallObject(my_callback, arglist);
Py_DECREF(arglist);
if (result == NULL)
    return NULL; /* Pass error back */
/* Here maybe use the result */
Py_DECREF(result);
Note the placement of Py_DECREF(arglist) immediately after the call, before
the error check! Also note that strictly speaking this code is not complete:
Py_BuildValue() may run out of memory, and this should be checked.
You may also call a function with keyword arguments by using
PyObject_Call(), which supports arguments and keyword arguments. As in
the above example, we use Py_BuildValue() to construct the dictionary.
PyObject *dict;
PyObject *empty;
...
dict = Py_BuildValue("{s:i}", "name", val);
empty = PyTuple_New(0);   /* PyObject_Call requires a real (possibly
                             empty) tuple for the positional arguments,
                             not NULL */
result = PyObject_Call(my_callback, empty, dict);
Py_DECREF(empty);
Py_DECREF(dict);
if (result == NULL)
    return NULL; /* Pass error back */
/* Here maybe use the result */
Py_DECREF(result);
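The next paragraphs document PyArg_ParseTuple() itself, which is declared as follows:

int PyArg_ParseTuple(PyObject *arg, const char *format, ...);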
The arg argument must be a tuple object containing an argument list passed
from Python to a C function. The format argument must be a format string,
whose syntax is explained in Parsing arguments and building values in the Python/C API Reference
Manual. The remaining arguments must be addresses of variables whose type is
determined by the format string.
Note that while PyArg_ParseTuple() checks that the Python arguments have
the required types, it cannot check the validity of the addresses of C variables
passed to the call: if you make mistakes there, your code will probably crash or
at least overwrite random bits in memory. So be careful!
Note that any Python object references which are provided to the caller are
borrowed references; do not decrement their reference count!
Some example calls:
#define PY_SSIZE_T_CLEAN  /* Make "s#" use Py_ssize_t rather than int. */
#include <Python.h>

int ok;
int i, j;
long k, l;
const char *s;
Py_ssize_t size;

ok = PyArg_ParseTuple(args, "");  /* No arguments */
    /* Python call: f() */

ok = PyArg_ParseTuple(args, "s", &s);  /* A string */
    /* Possible Python call: f('whoops!') */

ok = PyArg_ParseTuple(args, "lls", &k, &l, &s);  /* Two longs and a string */
    /* Possible Python call: f(1, 2, 'three') */

ok = PyArg_ParseTuple(args, "(ii)s#", &i, &j, &s, &size);
    /* A pair of ints and a string, whose size is also returned */
    /* Possible Python call: f((1, 2), 'three') */

{
    const char *file;
    const char *mode = "r";
    int bufsize = 0;
    ok = PyArg_ParseTuple(args, "s|si", &file, &mode, &bufsize);
    /* A string, and optionally another string and an integer */
    /* Possible Python calls:
       f('spam')
       f('spam', 'w')
       f('spam', 'wb', 100000) */
}

{
    int left, top, right, bottom, h, v;
    ok = PyArg_ParseTuple(args, "((ii)(ii))(ii)",
             &left, &top, &right, &bottom, &h, &v);
    /* A rectangle and a point */
    /* Possible Python call:
       f(((0, 0), (400, 300)), (10, 10)) */
}

{
    Py_complex c;
    ok = PyArg_ParseTuple(args, "D:myfunction", &c);
    /* a complex, also providing a function name for errors */
    /* Possible Python call: myfunction(1+2j) */
}
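The keyword-aware variant discussed next, PyArg_ParseTupleAndKeywords(), is declared as:

int PyArg_ParseTupleAndKeywords(PyObject *arg, PyObject *kwdict,
                                const char *format, char *kwlist[], ...);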
The arg and format parameters are identical to those of the
PyArg_ParseTuple() function. The kwdict parameter is the dictionary of
keywords received as the third parameter from the Python runtime. The kwlist
parameter is a NULL-terminated list of strings which identify the parameters;
the names are matched with the type information from format from left to
right. On success, PyArg_ParseTupleAndKeywords() returns true, otherwise
it returns false and raises an appropriate exception.
Note
Nested tuples cannot be parsed when using keyword arguments! Keyword parameters
passed in which are not present in the kwlist will cause TypeError to
be raised.
Here is an example module which uses keywords, based on an example by Geoff
Philbrick (philbrick@hks.com):
#include "Python.h"staticPyObject*keywdarg_parrot(PyObject*self,PyObject*args,PyObject*keywds){intvoltage;char*state="a stiff";char*action="voom";char*type="Norwegian Blue";staticchar*kwlist[]={"voltage","state","action","type",NULL};if(!PyArg_ParseTupleAndKeywords(args,keywds,"i|sss",kwlist,&voltage,&state,&action,&type))returnNULL;printf("-- This parrot wouldn't %s if you put %i Volts through it.\n",action,voltage);printf("-- Lovely plumage, the %s -- It's %s!\n",type,state);Py_INCREF(Py_None);returnPy_None;}staticPyMethodDefkeywdarg_methods[]={/* The cast of the function is necessary since PyCFunction values * only take two PyObject* parameters, and keywdarg_parrot() takes * three. */{"parrot",(PyCFunction)keywdarg_parrot,METH_VARARGS|METH_KEYWORDS,"Print a lovely skit to standard output."},{NULL,NULL,0,NULL}/* sentinel */};
voidinitkeywdarg(void){/* Create the module and add the functions */Py_InitModule("keywdarg",keywdarg_methods);}
This function is the counterpart to PyArg_ParseTuple(). It is declared
as follows:
PyObject *Py_BuildValue(char *format, ...);
It recognizes a set of format units similar to the ones recognized by
PyArg_ParseTuple(), but the arguments (which are input to the function,
not output) must not be pointers, just values. It returns a new Python object,
suitable for returning from a C function called from Python.
One difference with PyArg_ParseTuple(): while the latter requires its
first argument to be a tuple (since Python argument lists are always represented
as tuples internally), Py_BuildValue() does not always build a tuple. It
builds a tuple only if its format string contains two or more format units. If
the format string is empty, it returns None; if it contains exactly one
format unit, it returns whatever object is described by that format unit. To
force it to return a tuple of size 0 or one, parenthesize the format string.
Examples (to the left the call, to the right the resulting Python value):
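Some representative pairs:

Py_BuildValue("")                        None
Py_BuildValue("i", 123)                  123
Py_BuildValue("iii", 123, 456, 789)      (123, 456, 789)
Py_BuildValue("s", "hello")              'hello'
Py_BuildValue("ss", "hello", "world")    ('hello', 'world')
Py_BuildValue("s#", "hello", 4)          'hell'
Py_BuildValue("()")                      ()
Py_BuildValue("(i)", 123)                (123,)
Py_BuildValue("(ii)", 123, 456)          (123, 456)
Py_BuildValue("[i,i]", 123, 456)         [123, 456]
Py_BuildValue("{s:i,s:i}", "abc", 123, "def", 456)
                                         {'abc': 123, 'def': 456}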
In languages like C or C++, the programmer is responsible for dynamic allocation
and deallocation of memory on the heap. In C, this is done using the functions
malloc() and free(). In C++, the operators new and
delete are used with essentially the same meaning and we’ll restrict
the following discussion to the C case.
Every block of memory allocated with malloc() should eventually be
returned to the pool of available memory by exactly one call to free().
It is important to call free() at the right time. If a block’s address
is forgotten but free() is not called for it, the memory it occupies
cannot be reused until the program terminates. This is called a memory
leak. On the other hand, if a program calls free() for a block and then
continues to use the block, it creates a conflict with re-use of the block
through another malloc() call. This is called using freed memory.
It has the same bad consequences as referencing uninitialized data — core
dumps, wrong results, mysterious crashes.
Common causes of memory leaks are unusual paths through the code. For instance,
a function may allocate a block of memory, do some calculation, and then free
the block again. Now a change in the requirements for the function may add a
test to the calculation that detects an error condition and can return
prematurely from the function. It’s easy to forget to free the allocated memory
block when taking this premature exit, especially when it is added later to the
code. Such leaks, once introduced, often go undetected for a long time: the
error exit is taken only in a small fraction of all calls, and most modern
machines have plenty of virtual memory, so the leak only becomes apparent in a
long-running process that uses the leaking function frequently. Therefore, it’s
important to prevent leaks from happening by having a coding convention or
strategy that minimizes this kind of error.
Since Python makes heavy use of malloc() and free(), it needs a
strategy to avoid memory leaks as well as the use of freed memory. The chosen
method is called reference counting. The principle is simple: every
object contains a counter, which is incremented when a reference to the object
is stored somewhere, and which is decremented when a reference to it is deleted.
When the counter reaches zero, the last reference to the object has been deleted
and the object is freed.
An alternative strategy is called automatic garbage collection.
(Sometimes, reference counting is also referred to as a garbage collection
strategy, hence my use of “automatic” to distinguish the two.) The big
advantage of automatic garbage collection is that the user doesn’t need to call
free() explicitly. (Another claimed advantage is an improvement in speed
or memory usage — this is no hard fact however.) The disadvantage is that for
C, there is no truly portable automatic garbage collector, while reference
counting can be implemented portably (as long as the functions malloc()
and free() are available — which the C Standard guarantees). Maybe some
day a sufficiently portable automatic garbage collector will be available for C.
Until then, we’ll have to live with reference counts.
While Python uses the traditional reference counting implementation, it also
offers a cycle detector that works to detect reference cycles. This allows
applications to not worry about creating direct or indirect circular references;
these are the weakness of garbage collection implemented using only reference
counting. Reference cycles consist of objects which contain (possibly indirect)
references to themselves, so that each object in the cycle has a reference count
which is non-zero. Typical reference counting implementations are not able to
reclaim the memory belonging to any objects in a reference cycle, or referenced
from the objects in the cycle, even though there are no further references to
the cycle itself.
The cycle detector is able to detect garbage cycles and can reclaim them so long
as there are no finalizers implemented in Python (__del__() methods).
When there are such finalizers, the detector exposes the cycles through the
gc module (specifically, the
garbage variable in that module). The gc module also exposes a way
to run the detector (the collect() function), as well as configuration
interfaces and the ability to disable the detector at runtime. The cycle
detector is considered an optional component; though it is included by default,
it can be disabled at build time using the --without-cycle-gc option
to the configure script on Unix platforms (including Mac OS X). If
the cycle detector is disabled in this way, the gc module will not be
available.
There are two macros, Py_INCREF(x) and Py_DECREF(x), which handle the
incrementing and decrementing of the reference count. Py_DECREF() also
frees the object when the count reaches zero. For flexibility, it doesn’t call
free() directly — rather, it makes a call through a function pointer in
the object’s type object. For this purpose (and others), every object
also contains a pointer to its type object.
The big question now remains: when to use Py_INCREF(x) and Py_DECREF(x)?
Let’s first introduce some terms. Nobody “owns” an object; however, you can
own a reference to an object. An object’s reference count is now defined
as the number of owned references to it. The owner of a reference is
responsible for calling Py_DECREF() when the reference is no longer
needed. Ownership of a reference can be transferred. There are three ways to
dispose of an owned reference: pass it on, store it, or call Py_DECREF().
Forgetting to dispose of an owned reference creates a memory leak.
It is also possible to borrow[2] a reference to an object. The
borrower of a reference should not call Py_DECREF(). The borrower must
not hold on to the object longer than the owner from which it was borrowed.
Using a borrowed reference after the owner has disposed of it risks using freed
memory and should be avoided completely. [3]
The advantage of borrowing over owning a reference is that you don’t need to
take care of disposing of the reference on all possible paths through the code
— in other words, with a borrowed reference you don’t run the risk of leaking
when a premature exit is taken. The disadvantage of borrowing over owning is
that there are some subtle situations where in seemingly correct code a borrowed
reference can be used after the owner from which it was borrowed has in fact
disposed of it.
A borrowed reference can be changed into an owned reference by calling
Py_INCREF(). This does not affect the status of the owner from which the
reference was borrowed — it creates a new owned reference, and gives full
owner responsibilities (the new owner must dispose of the reference properly, as
well as the previous owner).
Whenever an object reference is passed into or out of a function, it is part of
the function’s interface specification whether ownership is transferred with the
reference or not.
Most functions that return a reference to an object pass on ownership with the
reference. In particular, all functions whose purpose is to create a new
object, such as PyLong_FromLong() and Py_BuildValue(), pass
ownership to the receiver. Even if the object is not actually new, you still
receive ownership of a new reference to that object. For instance,
PyLong_FromLong() maintains a cache of popular values and can return a
reference to a cached item.
The function PyImport_AddModule() also returns a borrowed reference, even
though it may actually create the object it returns: this is possible because an
owned reference to the object is stored in sys.modules.
When you pass an object reference into another function, in general, the
function borrows the reference from you — if it needs to store it, it will use
Py_INCREF() to become an independent owner. There are exactly two
important exceptions to this rule: PyTuple_SetItem() and
PyList_SetItem(). These functions take over ownership of the item passed
to them — even if they fail! (Note that PyDict_SetItem() and friends
don’t take over ownership — they are “normal.”)
When a C function is called from Python, it borrows references to its arguments
from the caller. The caller owns a reference to the object, so the borrowed
reference’s lifetime is guaranteed until the function returns. Only when such a
borrowed reference must be stored or passed on must it be turned into an owned
reference by calling Py_INCREF().
The object reference returned from a C function that is called from Python must
be an owned reference — ownership is transferred from the function to its
caller.
There are a few situations where seemingly harmless use of a borrowed reference
can lead to problems. These all have to do with implicit invocations of the
interpreter, which can cause the owner of a reference to dispose of it.
The first and most important case to know about is using Py_DECREF() on
an unrelated object while borrowing a reference to a list item. For instance:
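A sketch of the offending function (using PyLong_FromLong(), as in the rest of this chapter):

void
bug(PyObject *list)
{
    PyObject *item = PyList_GetItem(list, 0);

    PyList_SetItem(list, 1, PyLong_FromLong(0L));
    PyObject_Print(item, stdout, 0); /* BUG! */
}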
This function first borrows a reference to list[0], then replaces
list[1] with the value 0, and finally prints the borrowed reference.
Looks harmless, right? But it’s not!
Let’s follow the control flow into PyList_SetItem(). The list owns
references to all its items, so when item 1 is replaced, it has to dispose of
the original item 1. Now let’s suppose the original item 1 was an instance of a
user-defined class, and let’s further suppose that the class defined a
__del__() method. If this class instance has a reference count of 1,
disposing of it will call its __del__() method.
Since it is written in Python, the __del__() method can execute arbitrary
Python code. Could it perhaps do something to invalidate the reference to
item in bug()? You bet! Assuming that the list passed into
bug() is accessible to the __del__() method, it could execute a
statement to the effect of del list[0], and assuming this was the last
reference to that object, it would free the memory associated with it, thereby
invalidating item.
The solution, once you know the source of the problem, is easy: temporarily
increment the reference count. The correct version of the function reads:
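A sketch of the corrected version:

void
no_bug(PyObject *list)
{
    PyObject *item = PyList_GetItem(list, 0);

    Py_INCREF(item);   /* protect the borrowed reference */
    PyList_SetItem(list, 1, PyLong_FromLong(0L));
    PyObject_Print(item, stdout, 0);
    Py_DECREF(item);
}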
This is a true story. An older version of Python contained variants of this bug
and someone spent a considerable amount of time in a C debugger to figure out
why his __del__() methods would fail...
The second case of problems with a borrowed reference is a variant involving
threads. Normally, multiple threads in the Python interpreter can’t get in each
other’s way, because there is a global lock protecting Python’s entire object
space. However, it is possible to temporarily release this lock using the macro
Py_BEGIN_ALLOW_THREADS, and to re-acquire it using
Py_END_ALLOW_THREADS. This is common around blocking I/O calls, to
let other threads use the processor while waiting for the I/O to complete.
Obviously, the following function has the same problem as the previous one:
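A sketch of the threaded variant:

void
bug2(PyObject *list)
{
    PyObject *item = PyList_GetItem(list, 0);

    Py_BEGIN_ALLOW_THREADS
    /* ...some blocking I/O call... */
    Py_END_ALLOW_THREADS

    PyObject_Print(item, stdout, 0); /* BUG! */
}

While the lock is released, any other thread may run Python code that disposes of the owner's reference to item, with the same fatal result as before.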
In general, functions that take object references as arguments do not expect you
to pass them NULL pointers, and will dump core (or cause later core dumps) if
you do so. Functions that return object references generally return NULL only
to indicate that an exception occurred. The reason for not testing for NULL
arguments is that functions often pass the objects they receive on to other
function — if each function were to test for NULL, there would be a lot of
redundant tests and the code would run more slowly.
It is better to test for NULL only at the “source”: when a pointer that may be
NULL is received, for example, from malloc() or from a function that
may raise an exception.
The macros for checking for a particular object type (of the form
Py<type>_Check(), for example PyList_Check()) don’t
check for NULL pointers — again, there is much code that calls several of
these in a row to test an object against various different expected types, and
this would generate redundant tests. There are no variants with NULL
checking.
The C function calling mechanism guarantees that the argument list passed to C
functions (args in the examples) is never NULL — in fact it guarantees
that it is always a tuple. [4]
It is a severe error to ever let a NULL pointer “escape” to the Python user.
It is possible to write extension modules in C++. Some restrictions apply. If
the main program (the Python interpreter) is compiled and linked by the C
compiler, global or static objects with constructors cannot be used. This is
not a problem if the main program is linked by the C++ compiler. Functions that
will be called by the Python interpreter (in particular, module initialization
functions) have to be declared using extern "C". It is unnecessary to
enclose the Python header files in extern "C" { ... } — they use this form
already if the symbol __cplusplus is defined (all recent C++ compilers
define this symbol).
Many extension modules just provide new functions and types to be used from
Python, but sometimes the code in an extension module can be useful for other
extension modules. For example, an extension module could implement a type
“collection” which works like lists without order. Just like the standard Python
list type has a C API which permits extension modules to create and manipulate
lists, this new collection type should have a set of C functions for direct
manipulation from other extension modules.
At first sight this seems easy: just write the functions (without declaring them
static, of course), provide an appropriate header file, and document
the C API. And in fact this would work if all extension modules were always
linked statically with the Python interpreter. When modules are used as shared
libraries, however, the symbols defined in one module may not be visible to
another module. The details of visibility depend on the operating system; some
systems use one global namespace for the Python interpreter and all extension
modules (Windows, for example), whereas others require an explicit list of
imported symbols at module link time (AIX is one example), or offer a choice of
different strategies (most Unices). And even if symbols are globally visible,
the module whose functions one wishes to call might not have been loaded yet!
Portability therefore requires not to make any assumptions about symbol
visibility. This means that all symbols in extension modules should be declared
static, except for the module’s initialization function, in order to
avoid name clashes with other extension modules (as discussed in section
The Module’s Method Table and Initialization Function). And it means that symbols that should be accessible from
other extension modules must be exported in a different way.
Python provides a special mechanism to pass C-level information (pointers) from
one extension module to another one: Capsules. A Capsule is a Python data type
which stores a pointer (void*). Capsules can only be created and
accessed via their C API, but they can be passed around like any other Python
object. In particular, they can be assigned to a name in an extension module’s
namespace. Other extension modules can then import this module, retrieve the
value of this name, and then retrieve the pointer from the Capsule.
There are many ways in which Capsules can be used to export the C API of an
extension module. Each function could get its own Capsule, or all C API pointers
could be stored in an array whose address is published in a Capsule. And the
various tasks of storing and retrieving the pointers can be distributed in
different ways between the module providing the code and the client modules.
Whichever method you choose, it’s important to name your Capsules properly.
The function PyCapsule_New() takes a name parameter
(const char *); you’re permitted to pass in a NULL name, but
we strongly encourage you to specify a name. Properly named Capsules provide
a degree of runtime type-safety; there is no feasible way to tell one unnamed
Capsule from another.
In particular, Capsules used to expose C APIs should be given a name following
this convention:
modulename.attributename
The convenience function PyCapsule_Import() makes it easy to
load a C API provided via a Capsule, but only if the Capsule’s name
matches this convention. This behavior gives C API users a high degree
of certainty that the Capsule they load contains the correct C API.
The following example demonstrates an approach that puts most of the burden on
the writer of the exporting module, which is appropriate for commonly used
library modules. It stores all C API pointers (just one in the example!) in an
array of void pointers which becomes the value of a Capsule. The header
file corresponding to the module provides a macro that takes care of importing
the module and retrieving its C API pointers; client modules only have to call
this macro before accessing the C API.
The exporting module is a modification of the spam module from section
A Simple Example. The function spam.system() does not call
the C library function system() directly, but a function
PySpam_System(), which would of course do something more complicated in
reality (such as adding “spam” to every command). This function
PySpam_System() is also exported to other extension modules.
The function PySpam_System() is a plain C function, declared
static like everything else:
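A minimal definition that simply forwards to the C library:

static int
PySpam_System(const char *command)
{
    return system(command);
}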
In the beginning of the module, right after the line
#include "Python.h"
two more lines must be added:
#define SPAM_MODULE
#include "spammodule.h"
The #define is used to tell the header file that it is being included in the
exporting module, not a client module. Finally, the module’s initialization
function must take care of initializing the C API pointer array:
PyMODINIT_FUNC
PyInit_spam(void)
{
    PyObject *m;
    static void *PySpam_API[PySpam_API_pointers];
    PyObject *c_api_object;

    m = PyModule_Create(&spammodule);
    if (m == NULL)
        return NULL;

    /* Initialize the C API pointer array */
    PySpam_API[PySpam_System_NUM] = (void *)PySpam_System;

    /* Create a Capsule containing the API pointer array's address */
    c_api_object = PyCapsule_New((void *)PySpam_API, "spam._C_API", NULL);

    if (c_api_object != NULL)
        PyModule_AddObject(m, "_C_API", c_api_object);
    return m;
}
Note that PySpam_API is declared static; otherwise the pointer
array would disappear when PyInit_spam() terminates!
The bulk of the work is in the header file spammodule.h, which looks
like this:
#ifndef Py_SPAMMODULE_H
#define Py_SPAMMODULE_H
#ifdef __cplusplus
extern "C" {
#endif

/* Header file for spammodule */

/* C API functions */
#define PySpam_System_NUM 0
#define PySpam_System_RETURN int
#define PySpam_System_PROTO (const char *command)

/* Total number of C API pointers */
#define PySpam_API_pointers 1


#ifdef SPAM_MODULE
/* This section is used when compiling spammodule.c */

static PySpam_System_RETURN PySpam_System PySpam_System_PROTO;

#else
/* This section is used in modules that use spammodule's API */

static void **PySpam_API;

#define PySpam_System \
 (*(PySpam_System_RETURN (*)PySpam_System_PROTO) PySpam_API[PySpam_System_NUM])

/* Return -1 on error, 0 on success.
 * PyCapsule_Import will set an exception if there's an error.
 */
static int
import_spam(void)
{
    PySpam_API = (void **)PyCapsule_Import("spam._C_API", 0);
    return (PySpam_API != NULL) ? 0 : -1;
}

#endif

#ifdef __cplusplus
}
#endif

#endif /* !defined(Py_SPAMMODULE_H) */
All that a client module must do in order to have access to the function
PySpam_System() is to call the function (or rather macro)
import_spam() in its initialization function:
PyMODINIT_FUNC
PyInit_client(void)
{
    PyObject *m;

    m = PyModule_Create(&clientmodule);
    if (m == NULL)
        return NULL;
    if (import_spam() < 0)
        return NULL;
    /* additional initialization can happen here */
    return m;
}
The main disadvantage of this approach is that the file spammodule.h is
rather complicated. However, the basic structure is the same for each function
that is exported, so it has to be learned only once.
Finally it should be mentioned that Capsules offer additional functionality,
which is especially useful for memory allocation and deallocation of the pointer
stored in a Capsule. The details are described in the Python/C API Reference
Manual in the section Capsules and in the implementation of Capsules (files
Include/pycapsule.h and Objects/pycapsule.c in the Python source
code distribution).
[3] Checking that the reference count is at least 1 does not work — the
reference count itself could be in freed memory and may thus be reused for
another object!
As mentioned in the last chapter, Python allows the writer of an extension
module to define new types that can be manipulated from Python code, much like
strings and lists in core Python.
This is not hard; the code for all extension types follows a pattern, but there
are some details that you need to understand before you can get started.
The Python runtime sees all Python objects as variables of type
PyObject*. A PyObject is not a very magnificent object - it
just contains the refcount and a pointer to the object’s “type object”. This is
where the action is; the type object determines which (C) functions get called
when, for instance, an attribute gets looked up on an object or it is multiplied
by another object. These C functions are called “type methods”.
So, if you want to define a new object type, you need to create a new type
object.
This sort of thing can only be explained by example, so here’s a minimal, but
complete, module that defines a new type:
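A sketch of such a module, assembled from the pieces discussed below (the fuller listings later in this chapter follow the same layout):

#include <Python.h>

typedef struct {
    PyObject_HEAD
    /* Type-specific fields go here. */
} noddy_NoddyObject;

static PyTypeObject noddy_NoddyType = {
    PyVarObject_HEAD_INIT(NULL, 0)
    "noddy.Noddy",             /* tp_name */
    sizeof(noddy_NoddyObject), /* tp_basicsize */
    0,                         /* tp_itemsize */
    0,                         /* tp_dealloc */
    0,                         /* tp_print */
    0,                         /* tp_getattr */
    0,                         /* tp_setattr */
    0,                         /* tp_reserved */
    0,                         /* tp_repr */
    0,                         /* tp_as_number */
    0,                         /* tp_as_sequence */
    0,                         /* tp_as_mapping */
    0,                         /* tp_hash */
    0,                         /* tp_call */
    0,                         /* tp_str */
    0,                         /* tp_getattro */
    0,                         /* tp_setattro */
    0,                         /* tp_as_buffer */
    Py_TPFLAGS_DEFAULT,        /* tp_flags */
    "Noddy objects",           /* tp_doc */
};

static PyModuleDef noddymodule = {
    PyModuleDef_HEAD_INIT,
    "noddy",
    "Example module that creates an extension type.",
    -1,
    NULL, NULL, NULL, NULL, NULL
};

PyMODINIT_FUNC
PyInit_noddy(void)
{
    PyObject *m;

    noddy_NoddyType.tp_new = PyType_GenericNew;
    if (PyType_Ready(&noddy_NoddyType) < 0)
        return NULL;

    m = PyModule_Create(&noddymodule);
    if (m == NULL)
        return NULL;

    Py_INCREF(&noddy_NoddyType);
    PyModule_AddObject(m, "Noddy", (PyObject *)&noddy_NoddyType);
    return m;
}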
Now that’s quite a bit to take in at once, but hopefully bits will seem familiar
from the last chapter.
The first bit that will be new is:
typedef struct {
    PyObject_HEAD
} noddy_NoddyObject;
This is what a Noddy object will contain—in this case, nothing more than every
Python object contains, namely a refcount and a pointer to a type object. These
are the fields the PyObject_HEAD macro brings in. The reason for the macro
is to standardize the layout and to enable special debugging fields in debug
builds. Note that there is no semicolon after the PyObject_HEAD macro; one
is included in the macro definition. Be wary of adding one by accident; it’s
easy to do from habit, and your compiler might not complain, but someone else’s
probably will! (On Windows, MSVC is known to call this an error and refuse to
compile the code.)
For contrast, let’s take a look at the corresponding definition for standard
Python floats:
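The float object carries one extra field after the standard header (this matches the definition in CPython's floatobject.h):

typedef struct {
    PyObject_HEAD
    double ob_fval;
} PyFloatObject;

The second new thing in the sketch above is the static initializer for the type object itself, noddy_NoddyType, which the following paragraphs pick apart.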
Now if you go and look up the definition of PyTypeObject in
object.h you’ll see that it has many more fields than the definition
above. The remaining fields will be filled with zeros by the C compiler, and
it’s common practice to not specify them explicitly unless you need them.
This is so important that we’re going to pick the top of it apart still
further:
PyVarObject_HEAD_INIT(NULL, 0)
This line is a bit of a wart; what we’d like to write is:
PyVarObject_HEAD_INIT(&PyType_Type, 0)
as the type of a type object is “type”, but this isn’t strictly conforming C and
some compilers complain. Fortunately, this member will be filled in for us by
PyType_Ready().
"noddy.Noddy",/* tp_name */
The name of our type. This will appear in the default textual representation of
our objects and in some error messages, for example:
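For instance, the default repr of an instance embeds this dotted name (the address shown is illustrative only):

>>> import noddy
>>> noddy.Noddy()
<noddy.Noddy object at 0x7f3f8e3b5a60>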
Note that the name is a dotted name that includes both the module name and the
name of the type within the module. The module in this case is noddy and
the type is Noddy, so we set the type name to noddy.Noddy.
sizeof(noddy_NoddyObject), /* tp_basicsize */
This is so that Python knows how much memory to allocate when you call
PyObject_New().
Note
If you want your type to be subclassable from Python, and your type has the same
tp_basicsize as its base type, you may have problems with multiple
inheritance. A Python subclass of your type will have to list your type first
in its __bases__, or else it will not be able to call your type’s
__new__() method without getting an error. You can avoid this problem by
ensuring that your type has a larger value for tp_basicsize than its
base type does. Most of the time, this will be true anyway, because either your
base type will be object, or else you will be adding data members to
your base type, and therefore increasing its size.
0,                         /* tp_itemsize */
This has to do with variable length objects like lists and strings. Ignore this
for now.
Skipping a number of type methods that we don’t provide, we set the class flags
to Py_TPFLAGS_DEFAULT.
Py_TPFLAGS_DEFAULT,        /* tp_flags */
All types should include this constant in their flags. It enables all of the
members defined by the current version of Python.
We provide a doc string for the type in tp_doc.
"Noddy objects",/* tp_doc */
Now we get into the type methods, the things that make your objects different
from the others. We aren’t going to implement any of these in this version of
the module. We’ll expand this example later to have more interesting behavior.
For now, all we want to be able to do is to create new Noddy objects.
To enable object creation, we have to provide a tp_new implementation.
In this case, we can just use the default implementation provided by the API
function PyType_GenericNew(). We’d like to just assign this to the
tp_new slot, but we can’t, for portability’s sake. On some platforms or
compilers, we can’t statically initialize a structure member with a function
defined in another C module, so, instead, we’ll assign the tp_new slot
in the module initialization function just before calling
PyType_Ready():
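In the sketch above this is:

noddy_NoddyType.tp_new = PyType_GenericNew;
if (PyType_Ready(&noddy_NoddyType) < 0)
    return NULL;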
Building the module with a short distutils setup.py script (sketched after
this paragraph) by typing python setup.py build at a shell should produce a
file noddy.so in a subdirectory; move to that directory and fire up Python
— you should be able to import noddy and
play around with Noddy objects.
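The setup.py script referred to above can be as small as this (a sketch, using the distutils machinery described earlier in this document):

from distutils.core import setup, Extension
setup(name="noddy", version="1.0",
      ext_modules=[Extension("noddy", ["noddy.c"])])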
That wasn’t so hard, was it?
Of course, the current Noddy type is pretty uninteresting. It has no data and
doesn’t do anything. It can’t even be subclassed.
Let’s extend the basic example to add some data and methods. Let’s also make
the type usable as a base class. We’ll create a new module, noddy2, that
adds these capabilities:
#include <Python.h>#include "structmember.h"typedefstruct{PyObject_HEADPyObject*first;/* first name */PyObject*last;/* last name */intnumber;}Noddy;staticvoidNoddy_dealloc(Noddy*self){Py_XDECREF(self->first);Py_XDECREF(self->last);Py_TYPE(self)->tp_free((PyObject*)self);}staticPyObject*Noddy_new(PyTypeObject*type,PyObject*args,PyObject*kwds){Noddy*self;self=(Noddy*)type->tp_alloc(type,0);if(self!=NULL){self->first=PyUnicode_FromString("");if(self->first==NULL){Py_DECREF(self);returnNULL;}self->last=PyUnicode_FromString("");if(self->last==NULL){Py_DECREF(self);returnNULL;}self->number=0;}return(PyObject*)self;}staticintNoddy_init(Noddy*self,PyObject*args,PyObject*kwds){PyObject*first=NULL,*last=NULL,*tmp;staticchar*kwlist[]={"first","last","number",NULL};if(!PyArg_ParseTupleAndKeywords(args,kwds,"|OOi",kwlist,&first,&last,&self->number))return-1;if(first){tmp=self->first;Py_INCREF(first);self->first=first;Py_XDECREF(tmp);}if(last){tmp=self->last;Py_INCREF(last);self->last=last;Py_XDECREF(tmp);}return0;}staticPyMemberDefNoddy_members[]={{"first",T_OBJECT_EX,offsetof(Noddy,first),0,"first name"},{"last",T_OBJECT_EX,offsetof(Noddy,last),0,"last name"},{"number",T_INT,offsetof(Noddy,number),0,"noddy number"},{NULL}/* Sentinel */};staticPyObject*Noddy_name(Noddy*self){staticPyObject*format=NULL;PyObject*args,*result;if(format==NULL){format=PyUnicode_FromString("%s %s");if(format==NULL)returnNULL;}if(self->first==NULL){PyErr_SetString(PyExc_AttributeError,"first");returnNULL;}if(self->last==NULL){PyErr_SetString(PyExc_AttributeError,"last");returnNULL;}args=Py_BuildValue("OO",self->first,self->last);if(args==NULL)returnNULL;result=PyUnicode_Format(format,args);Py_DECREF(args);returnresult;}staticPyMethodDefNoddy_methods[]={{"name",(PyCFunction)Noddy_name,METH_NOARGS,"Return the name, combining the first and last name"},{NULL}/* Sentinel */};staticPyTypeObjectNoddyType={PyVarObject_HEAD_INIT(NULL,0)"noddy.Noddy",/* tp_name */sizeof(Noddy),/* tp_basicsize */0,/* tp_itemsize */(destructor)Noddy_dealloc,/* tp_dealloc */0,/* tp_print */0,/* tp_getattr */0,/* tp_setattr */0,/* tp_reserved */0,/* tp_repr */0,/* tp_as_number */0,/* tp_as_sequence */0,/* tp_as_mapping */0,/* tp_hash */0,/* tp_call */0,/* tp_str */0,/* tp_getattro */0,/* tp_setattro */0,/* tp_as_buffer */Py_TPFLAGS_DEFAULT|Py_TPFLAGS_BASETYPE,/* tp_flags */"Noddy objects",/* tp_doc */0,/* tp_traverse */0,/* tp_clear */0,/* tp_richcompare */0,/* tp_weaklistoffset */0,/* tp_iter */0,/* tp_iternext */Noddy_methods,/* tp_methods */Noddy_members,/* tp_members */0,/* tp_getset */0,/* tp_base */0,/* tp_dict */0,/* tp_descr_get */0,/* tp_descr_set */0,/* tp_dictoffset */(initproc)Noddy_init,/* tp_init */0,/* tp_alloc */Noddy_new,/* tp_new */};staticPyModuleDefnoddy2module={PyModuleDef_HEAD_INIT,"noddy2","Example module that creates an extension type.",-1,NULL,NULL,NULL,NULL,NULL};PyMODINIT_FUNCPyInit_noddy2(void){PyObject*m;if(PyType_Ready(&NoddyType)<0)returnNULL;m=PyModule_Create(&noddy2module);if(m==NULL)returnNULL;Py_INCREF(&NoddyType);PyModule_AddObject(m,"Noddy",(PyObject*)&NoddyType);returnm;}
This version of the module has a number of changes.
We’ve added an extra include:
#include <structmember.h>
This include provides declarations that we use to handle attributes, as
described a bit later.
The name of the Noddy object structure has been shortened to
Noddy. The type object name has been shortened to NoddyType.
The Noddy type now has three data attributes, first, last, and
number. The first and last variables are Python strings containing first
and last names. The number attribute is an integer.
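The first of the new methods is the deallocator, repeated here from the listing above:

static void
Noddy_dealloc(Noddy* self)
{
    Py_XDECREF(self->first);
    Py_XDECREF(self->last);
    Py_TYPE(self)->tp_free((PyObject*)self);
}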
This method decrements the reference counts of the two Python attributes. We use
Py_XDECREF() here because the first and last members
could be NULL. It then calls the tp_free member of the object’s type
to free the object’s memory. Note that the object’s type might not be
NoddyType, because the object may be an instance of a subclass.
We want to make sure that the first and last names are initialized to empty
strings, so we provide a new method:
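Again from the listing above:

static PyObject *
Noddy_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    Noddy *self;

    self = (Noddy *)type->tp_alloc(type, 0);
    if (self != NULL) {
        self->first = PyUnicode_FromString("");
        if (self->first == NULL) {
            Py_DECREF(self);
            return NULL;
        }

        self->last = PyUnicode_FromString("");
        if (self->last == NULL) {
            Py_DECREF(self);
            return NULL;
        }

        self->number = 0;
    }

    return (PyObject *)self;
}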
The new member is responsible for creating (as opposed to initializing) objects
of the type. It is exposed in Python as the __new__() method. See the
paper titled “Unifying types and classes in Python” for a detailed discussion of
the __new__() method. One reason to implement a new method is to assure
the initial values of instance variables. In this case, we use the new method
to make sure that the initial values of the members first and
last are not NULL. If we didn’t care whether the initial values were
NULL, we could have used PyType_GenericNew() as our new method, as we
did before. PyType_GenericNew() initializes all of the instance variable
members to NULL.
The new method is a static method that is passed the type being instantiated and
any arguments passed when the type was called, and that returns the new object
created. New methods always accept positional and keyword arguments, but they
often ignore the arguments, leaving the argument handling to initializer
methods. Note that if the type supports subclassing, the type passed may not be
the type being defined. The new method calls the tp_alloc slot to allocate
memory. We don’t fill the tp_alloc slot ourselves. Rather
PyType_Ready() fills it for us by inheriting it from our base class,
which is object by default. Most types use the default allocation.
Note
If you are creating a co-operative tp_new (one that calls a base type’s
tp_new or __new__()), you must not try to determine what method
to call using method resolution order at runtime. Always statically determine
what type you are going to call, and call its tp_new directly, or via
type->tp_base->tp_new. If you do not do this, Python subclasses of your
type that also inherit from other Python-defined classes may not work correctly.
(Specifically, you may not be able to create instances of such subclasses
without getting a TypeError.)
The tp_init slot is exposed in Python as the __init__() method. It
is used to initialize an object after it’s created. Unlike the new method, we
can’t guarantee that the initializer is called. The initializer isn’t called
when unpickling objects and it can be overridden. Our initializer accepts
arguments to provide initial values for our instance. Initializers always accept
positional and keyword arguments.
Initializers can be called multiple times. Anyone can call the __init__()
method on our objects. For this reason, we have to be extra careful when
assigning the new values. We might be tempted, for example to assign the
first member like this:
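That is, the assignment from the listing but without the temporary variable:

if (first) {
    Py_XDECREF(self->first);
    Py_INCREF(first);
    self->first = first;
}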
But this would be risky. Our type doesn’t restrict the type of the
first member, so it could be any kind of object. It could have a
destructor that causes code to be executed that tries to access the
first member. To be paranoid and protect ourselves against this
possibility, we almost always reassign members before decrementing their
reference counts. When don’t we have to do this?
1. when we absolutely know that the reference count is greater than 1;
2. when we know that deallocation of the object [1] will not cause any calls
   back into our type’s code;
3. when decrementing a reference count in a tp_dealloc handler when
   garbage collection is not supported. [2]
We want to expose our instance variables as attributes. There are a
number of ways to do that. The simplest way is to define member definitions:
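From the listing above:

static PyMemberDef Noddy_members[] = {
    {"first", T_OBJECT_EX, offsetof(Noddy, first), 0,
     "first name"},
    {"last", T_OBJECT_EX, offsetof(Noddy, last), 0,
     "last name"},
    {"number", T_INT, offsetof(Noddy, number), 0,
     "noddy number"},
    {NULL}  /* Sentinel */
};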
Each member definition has a member name, type, offset, access flags and
documentation string. See the Generic Attribute Management section below for
details.
A disadvantage of this approach is that it doesn’t provide a way to restrict the
types of objects that can be assigned to the Python attributes. We expect the
first and last names to be strings, but any Python objects can be assigned.
Further, the attributes can be deleted, setting the C pointers to NULL. Even
though we can make sure the members are initialized to non-NULL values, the
members can be set to NULL if the attributes are deleted.
We define a single method, name(), that outputs the object’s name as the
concatenation of the first and last names.
The method is implemented as a C function that takes a Noddy (or
Noddy subclass) instance as the first argument. Methods always take an
instance as the first argument. Methods often take positional and keyword
arguments as well, but in this case we don’t take any and don’t need to accept
a positional argument tuple or keyword argument dictionary. This method is
equivalent to the Python method:
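In Python terms, roughly:

def name(self):
    return "%s %s" % (self.first, self.last)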
Note that we have to check for the possibility that our first and
last members are NULL. This is because they can be deleted, in which
case they are set to NULL. It would be better to prevent deletion of these
attributes and to restrict the attribute values to be strings. We’ll see how to
do that in the next section.
Now that we’ve defined the method, we need to create an array of method
definitions:
static PyMethodDef Noddy_methods[] = {
    {"name", (PyCFunction)Noddy_name, METH_NOARGS,
     "Return the name, combining the first and last name"},
    {NULL}  /* Sentinel */
};
and assign them to the tp_methods slot:
Noddy_methods,             /* tp_methods */
Note that we used the METH_NOARGS flag to indicate that the method is
passed no arguments.
Finally, we’ll make our type usable as a base class. We’ve written our methods
carefully so far so that they don’t make any assumptions about the type of the
object being created or used, so all we need to do is to add the
Py_TPFLAGS_BASETYPE to our class flag definition:
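From the listing above:

Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,  /* tp_flags */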
In this section, we’ll provide finer control over how the first and
last attributes are set in the Noddy example. In the previous
version of our module, the instance variables first and last
could be set to non-string values or even deleted. We want to make sure that
these attributes always contain strings.
#include <Python.h>#include "structmember.h"typedefstruct{PyObject_HEADPyObject*first;PyObject*last;intnumber;}Noddy;staticvoidNoddy_dealloc(Noddy*self){Py_XDECREF(self->first);Py_XDECREF(self->last);Py_TYPE(self)->tp_free((PyObject*)self);}staticPyObject*Noddy_new(PyTypeObject*type,PyObject*args,PyObject*kwds){Noddy*self;self=(Noddy*)type->tp_alloc(type,0);if(self!=NULL){self->first=PyUnicode_FromString("");if(self->first==NULL){Py_DECREF(self);returnNULL;}self->last=PyUnicode_FromString("");if(self->last==NULL){Py_DECREF(self);returnNULL;}self->number=0;}return(PyObject*)self;}staticintNoddy_init(Noddy*self,PyObject*args,PyObject*kwds){PyObject*first=NULL,*last=NULL,*tmp;staticchar*kwlist[]={"first","last","number",NULL};if(!PyArg_ParseTupleAndKeywords(args,kwds,"|SSi",kwlist,&first,&last,&self->number))return-1;if(first){tmp=self->first;Py_INCREF(first);self->first=first;Py_DECREF(tmp);}if(last){tmp=self->last;Py_INCREF(last);self->last=last;Py_DECREF(tmp);}return0;}staticPyMemberDefNoddy_members[]={{"number",T_INT,offsetof(Noddy,number),0,"noddy number"},{NULL}/* Sentinel */};staticPyObject*Noddy_getfirst(Noddy*self,void*closure){Py_INCREF(self->first);returnself->first;}staticintNoddy_setfirst(Noddy*self,PyObject*value,void*closure){if(value==NULL){PyErr_SetString(PyExc_TypeError,"Cannot delete the first attribute");return-1;}if(!PyUnicode_Check(value)){PyErr_SetString(PyExc_TypeError,"The first attribute value must be a string");return-1;}Py_DECREF(self->first);Py_INCREF(value);self->first=value;return0;}staticPyObject*Noddy_getlast(Noddy*self,void*closure){Py_INCREF(self->last);returnself->last;}staticintNoddy_setlast(Noddy*self,PyObject*value,void*closure){if(value==NULL){PyErr_SetString(PyExc_TypeError,"Cannot delete the last attribute");return-1;}if(!PyUnicode_Check(value)){PyErr_SetString(PyExc_TypeError,"The last attribute value must be a string");return-1;}Py_DECREF(self->last);Py_INCREF(value);self->last=value;return0;}staticPyGetSetDefNoddy_getseters[]={{"first",(getter)Noddy_getfirst,(setter)Noddy_setfirst,"first name",NULL},{"last",(getter)Noddy_getlast,(setter)Noddy_setlast,"last name",NULL},{NULL}/* Sentinel */};staticPyObject*Noddy_name(Noddy*self){staticPyObject*format=NULL;PyObject*args,*result;if(format==NULL){format=PyUnicode_FromString("%s %s");if(format==NULL)returnNULL;}args=Py_BuildValue("OO",self->first,self->last);if(args==NULL)returnNULL;result=PyUnicode_Format(format,args);Py_DECREF(args);returnresult;}staticPyMethodDefNoddy_methods[]={{"name",(PyCFunction)Noddy_name,METH_NOARGS,"Return the name, combining the first and last name"},{NULL}/* Sentinel */};staticPyTypeObjectNoddyType={PyVarObject_HEAD_INIT(NULL,0)"noddy.Noddy",/* tp_name */sizeof(Noddy),/* tp_basicsize */0,/* tp_itemsize */(destructor)Noddy_dealloc,/* tp_dealloc */0,/* tp_print */0,/* tp_getattr */0,/* tp_setattr */0,/* tp_reserved */0,/* tp_repr */0,/* tp_as_number */0,/* tp_as_sequence */0,/* tp_as_mapping */0,/* tp_hash */0,/* tp_call */0,/* tp_str */0,/* tp_getattro */0,/* tp_setattro */0,/* tp_as_buffer */Py_TPFLAGS_DEFAULT|Py_TPFLAGS_BASETYPE,/* tp_flags */"Noddy objects",/* tp_doc */0,/* tp_traverse */0,/* tp_clear */0,/* tp_richcompare */0,/* tp_weaklistoffset */0,/* tp_iter */0,/* tp_iternext */Noddy_methods,/* tp_methods */Noddy_members,/* tp_members */Noddy_getseters,/* tp_getset */0,/* tp_base */0,/* tp_dict */0,/* tp_descr_get */0,/* tp_descr_set */0,/* tp_dictoffset */(initproc)Noddy_init,/* tp_init */0,/* tp_alloc */Noddy_new,/* tp_new 
*/};staticPyModuleDefnoddy3module={PyModuleDef_HEAD_INIT,"noddy3","Example module that creates an extension type.",-1,NULL,NULL,NULL,NULL,NULL};PyMODINIT_FUNCPyInit_noddy3(void){PyObject*m;if(PyType_Ready(&NoddyType)<0)returnNULL;m=PyModule_Create(&noddy3module);if(m==NULL)returnNULL;Py_INCREF(&NoddyType);PyModule_AddObject(m,"Noddy",(PyObject*)&NoddyType);returnm;}
To provide greater control over the first and last attributes,
we’ll use custom getter and setter functions. Here are the functions for
getting and setting the first attribute:
static PyObject *
Noddy_getfirst(Noddy *self, void *closure)
{
    Py_INCREF(self->first);
    return self->first;
}

static int
Noddy_setfirst(Noddy *self, PyObject *value, void *closure)
{
    if (value == NULL) {
        PyErr_SetString(PyExc_TypeError, "Cannot delete the first attribute");
        return -1;
    }

    if (!PyUnicode_Check(value)) {
        PyErr_SetString(PyExc_TypeError,
                        "The first attribute value must be a string");
        return -1;
    }

    Py_DECREF(self->first);
    Py_INCREF(value);
    self->first = value;

    return 0;
}
The getter function is passed a Noddy object and a “closure”, which is a
void pointer. In this case, the closure is ignored. (The closure supports an
advanced usage in which definition data is passed to the getter and setter. This
could, for example, be used to allow a single set of getter and setter functions
that decide the attribute to get or set based on data in the closure.)
The setter function is passed the Noddy object, the new value, and the
closure. The new value may be NULL, in which case the attribute is being
deleted. In our setter, we raise an error if the attribute is deleted or if the
attribute value is not a string.
With these changes, we can assure that the first and last
members are never NULL so we can remove checks for NULL values in almost all
cases. This means that most of the Py_XDECREF() calls can be converted to
Py_DECREF() calls. The only place we can’t change these calls is in the
deallocator, where there is the possibility that the initialization of these
members failed in the constructor.
We also rename the module initialization function and module name in the
initialization function, as we did before, and we add an extra definition to the
setup.py file.
Python has a cyclic-garbage collector that can identify unneeded objects even
when their reference counts are not zero. This can happen when objects are
involved in cycles. For example, consider:
>>>l=[]>>>l.append(l)>>>dell
In this example, we create a list that contains itself. When we delete it, it
still has a reference from itself. Its reference count doesn’t drop to zero.
Fortunately, Python’s cyclic-garbage collector will eventually figure out that
the list is garbage and free it.
In the second version of the Noddy example, we allowed any kind of
object to be stored in the first or last attributes. [4] This
means that Noddy objects can participate in cycles:
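For example (a minimal sketch, assuming the second version of the module was
built under the name noddy2):

>>> import noddy2
>>> n = noddy2.Noddy()
>>> l = [n]
>>> n.first = l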
This is pretty silly, but it gives us an excuse to add support for the
cyclic-garbage collector to the Noddy example. To support cyclic
garbage collection, types need to fill two slots and set a class flag that
enables these slots:
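The two slots are tp_traverse and tp_clear, and the enabling flag is
Py_TPFLAGS_HAVE_GC, OR'ed into tp_flags. A hand-written traversal method for
Noddy might look like this sketch:

static int
Noddy_traverse(Noddy *self, visitproc visit, void *arg)
{
    int vret;

    if (self->first) {
        vret = visit(self->first, arg);
        if (vret != 0)
            return vret;
    }
    if (self->last) {
        vret = visit(self->last, arg);
        if (vret != 0)
            return vret;
    }
    return 0;
}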
For each subobject that can participate in cycles, we need to call the
visit() function, which is passed to the traversal method. The
visit() function takes as arguments the subobject and the extra argument
arg passed to the traversal method. It returns an integer value that must be
returned if it is non-zero.
Python provides a Py_VISIT() macro that automates calling visit
functions. With Py_VISIT(), Noddy_traverse() can be simplified:
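A sketch of the simplified version:

static int
Noddy_traverse(Noddy *self, visitproc visit, void *arg)
{
    Py_VISIT(self->first);
    Py_VISIT(self->last);
    return 0;
}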
Note that the tp_traverse implementation must name its arguments exactly
visit and arg in order to use Py_VISIT(). This is to encourage
uniformity across these boring implementations.
We also need to provide a method for clearing any subobjects that can
participate in cycles. We implement the method and reimplement the deallocator
to use it:
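A sketch of the clear method and the reworked deallocator:

static int
Noddy_clear(Noddy *self)
{
    PyObject *tmp;

    tmp = self->first;
    self->first = NULL;   /* clear the member before decref'ing it */
    Py_XDECREF(tmp);

    tmp = self->last;
    self->last = NULL;
    Py_XDECREF(tmp);

    return 0;
}

static void
Noddy_dealloc(Noddy *self)
{
    Noddy_clear(self);
    Py_TYPE(self)->tp_free((PyObject *)self);
}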
Notice the use of a temporary variable in Noddy_clear(). We use the
temporary variable so that we can set each member to NULL before decrementing
its reference count. We do this because, as was discussed earlier, if the
reference count drops to zero, we might cause code to run that calls back into
the object. In addition, because we now support garbage collection, we also
have to worry about code being run that triggers garbage collection. If garbage
collection is run, our tp_traverse handler could get called. We can’t
take a chance of having Noddy_traverse() called when a member’s reference
count has dropped to zero and its value hasn’t been set to NULL.
Python provides a Py_CLEAR() macro that automates the careful decrementing of
reference counts. With Py_CLEAR(), the Noddy_clear() function can
be simplified:
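A sketch of the Py_CLEAR() version:

static int
Noddy_clear(Noddy *self)
{
    Py_CLEAR(self->first);
    Py_CLEAR(self->last);
    return 0;
}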
That’s pretty much it. If we had written custom tp_alloc or
tp_free slots, we’d need to modify them for cyclic-garbage collection.
Most extensions will use the versions automatically provided.
It is possible to create new extension types that are derived from existing
types. It is easiest to inherit from the built-in types, since an extension can
easily use the PyTypeObject it needs. It can be difficult to share
these PyTypeObject structures between extension modules.
In this example we will create a Shoddy type that inherits from the
built-in list type. The new type will be completely compatible with
regular lists, but will have an additional increment() method that
increases an internal counter.
As you can see, the source code closely resembles the Noddy examples in
previous sections. We will break down the main differences between them.
typedef struct {
    PyListObject list;
    int state;
} Shoddy;
The primary difference for derived type objects is that the base type’s object
structure must be the first value. The base type will already include the
PyObject_HEAD() at the beginning of its structure.
When a Python object is a Shoddy instance, its PyObject* pointer can
be safely cast to both PyListObject* and Shoddy*.
In the __init__ method for our type, we can see how to call through to
the __init__ method of the base type.
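A minimal sketch, assuming the Shoddy structure shown above:

static int
Shoddy_init(Shoddy *self, PyObject *args, PyObject *kwds)
{
    /* Delegate list initialization to the base type first. */
    if (PyList_Type.tp_init((PyObject *)self, args, kwds) < 0)
        return -1;
    self->state = 0;
    return 0;
}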
This pattern is important when writing a type with custom new and
dealloc methods. The new method should not actually create the
memory for the object with tp_alloc, that will be handled by the base
class when calling its tp_new.
When filling out the PyTypeObject for the Shoddy type, you see
a slot for tp_base. Due to cross-platform compiler issues, you can't
fill that field directly with a reference to PyList_Type; it can be done later in
the module's init() function.
Before calling PyType_Ready(), the type structure must have the
tp_base slot filled in. When we are deriving a new type, it is not
necessary to fill out the tp_alloc slot with PyType_GenericNew()
– the allocate function from the base type will be inherited.
After that, calling PyType_Ready() and adding the type object to the
module is the same as with the basic Noddy examples.
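Putting these pieces together, the module initialization function might look
like this sketch (assuming the type object and module definition are named
ShoddyType and shoddymodule):

PyMODINIT_FUNC
PyInit_shoddy(void)
{
    PyObject *m;

    /* Fill tp_base at run time, then finalize the type. */
    ShoddyType.tp_base = &PyList_Type;
    if (PyType_Ready(&ShoddyType) < 0)
        return NULL;

    m = PyModule_Create(&shoddymodule);
    if (m == NULL)
        return NULL;

    Py_INCREF(&ShoddyType);
    PyModule_AddObject(m, "Shoddy", (PyObject *)&ShoddyType);
    return m;
}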
This section aims to give a quick fly-by on the various type methods you can
implement and what they do.
Here is the definition of PyTypeObject, with some fields only used in
debug builds omitted:
typedef struct _typeobject {
    PyObject_VAR_HEAD
    char *tp_name; /* For printing, in format "<module>.<name>" */
    int tp_basicsize, tp_itemsize; /* For allocation */

    /* Methods to implement standard operations */
    destructor tp_dealloc;
    printfunc tp_print;
    getattrfunc tp_getattr;
    setattrfunc tp_setattr;
    void *tp_reserved;
    reprfunc tp_repr;

    /* Method suites for standard classes */
    PyNumberMethods *tp_as_number;
    PySequenceMethods *tp_as_sequence;
    PyMappingMethods *tp_as_mapping;

    /* More standard operations (here for binary compatibility) */
    hashfunc tp_hash;
    ternaryfunc tp_call;
    reprfunc tp_str;
    getattrofunc tp_getattro;
    setattrofunc tp_setattro;

    /* Functions to access object as input/output buffer */
    PyBufferProcs *tp_as_buffer;

    /* Flags to define presence of optional/expanded features */
    long tp_flags;

    char *tp_doc; /* Documentation string */

    /* call function for all accessible objects */
    traverseproc tp_traverse;

    /* delete references to contained objects */
    inquiry tp_clear;

    /* rich comparisons */
    richcmpfunc tp_richcompare;

    /* weak reference enabler */
    long tp_weaklistoffset;

    /* Iterators */
    getiterfunc tp_iter;
    iternextfunc tp_iternext;

    /* Attribute descriptor and subclassing stuff */
    struct PyMethodDef *tp_methods;
    struct PyMemberDef *tp_members;
    struct PyGetSetDef *tp_getset;
    struct _typeobject *tp_base;
    PyObject *tp_dict;
    descrgetfunc tp_descr_get;
    descrsetfunc tp_descr_set;
    long tp_dictoffset;
    initproc tp_init;
    allocfunc tp_alloc;
    newfunc tp_new;
    freefunc tp_free; /* Low-level free-memory routine */
    inquiry tp_is_gc; /* For PyObject_IS_GC */
    PyObject *tp_bases;
    PyObject *tp_mro; /* method resolution order */
    PyObject *tp_cache;
    PyObject *tp_subclasses;
    PyObject *tp_weaklist;
} PyTypeObject;
Now that’s a lot of methods. Don’t worry too much though - if you have a type
you want to define, the chances are very good that you will only implement a
handful of these.
As you probably expect by now, we’re going to go over this and give more
information about the various handlers. We won’t go in the order they are
defined in the structure, because there is a lot of historical baggage that
impacts the ordering of the fields; be sure your type initialization keeps the
fields in the right order! It’s often easiest to find an example that includes
all the fields you need (even if they’re initialized to 0) and then change
the values to suit your new type.
char *tp_name; /* For printing */
The name of the type - as mentioned in the last section, this will appear in
various places, almost entirely for diagnostic purposes. Try to choose something
that will be helpful in such a situation!
int tp_basicsize, tp_itemsize; /* For allocation */
These fields tell the runtime how much memory to allocate when new objects of
this type are created. Python has some built-in support for variable length
structures (think: strings, lists) which is where the tp_itemsize field
comes in. This will be dealt with later.
char *tp_doc;
Here you can put a string (or its address) that you want returned when the
Python script references obj.__doc__ to retrieve the doc string.
Now we come to the basic type methods—the ones most extension types will
implement.
This function is called when the reference count of the instance of your type is
reduced to zero and the Python interpreter wants to reclaim it. If your type
has memory to free or other clean-up to perform, put it here. The object itself
needs to be freed here as well. Here is an example of this function:
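A sketch, in terms of the hypothetical newdatatypeobject used throughout this
section:

static void
newdatatype_dealloc(newdatatypeobject *obj)
{
    free(obj->obj_UnderlyingDatatypePtr);    /* release our own storage */
    Py_TYPE(obj)->tp_free((PyObject *)obj);  /* then free the object itself */
}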
One important requirement of the deallocator function is that it leaves any
pending exceptions alone. This is important since deallocators are frequently
called as the interpreter unwinds the Python stack; when the stack is unwound
due to an exception (rather than normal returns), nothing is done to protect the
deallocators from seeing that an exception has already been set. Any actions
which a deallocator performs which may cause additional Python code to be
executed may detect that an exception has been set. This can lead to misleading
errors from the interpreter. The proper way to protect against this is to save
a pending exception before performing the unsafe action, and restoring it when
done. This can be done using the PyErr_Fetch() and
PyErr_Restore() functions:
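A sketch of that pattern, assuming a hypothetical instance structure mydata_t
that holds a my_callback callable:

static void
my_dealloc(PyObject *obj)
{
    mydata_t *self = (mydata_t *)obj;   /* hypothetical instance structure */
    PyObject *cbresult;

    if (self->my_callback != NULL) {
        PyObject *err_type, *err_value, *err_traceback;

        /* Save the current exception state. */
        PyErr_Fetch(&err_type, &err_value, &err_traceback);

        cbresult = PyObject_CallObject(self->my_callback, NULL);
        if (cbresult == NULL)
            PyErr_WriteUnraisable(self->my_callback);
        else
            Py_DECREF(cbresult);

        /* Restore the saved exception state. */
        PyErr_Restore(err_type, err_value, err_traceback);

        Py_DECREF(self->my_callback);
    }
    Py_TYPE(obj)->tp_free(obj);
}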
In Python, there are two ways to generate a textual representation of an object:
the repr() function, and the str() function. (The print()
function just calls str().) These handlers are both optional.
reprfunc tp_repr;
reprfunc tp_str;
The tp_repr handler should return a string object containing a
representation of the instance for which it is called. Here is a simple
example:
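A sketch, again using the hypothetical newdatatype:

static PyObject *
newdatatype_repr(newdatatypeobject *obj)
{
    return PyUnicode_FromFormat("Repr-ified_newdatatype{{size:%d}}",
                                obj->obj_UnderlyingDatatypePtr->size);
}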
If no tp_repr handler is specified, the interpreter will supply a
representation that uses the type’s tp_name and a uniquely-identifying
value for the object.
The tp_str handler is to str() what the tp_repr handler
described above is to repr(); that is, it is called when Python code calls
str() on an instance of your object. Its implementation is very similar
to the tp_repr function, but the resulting string is intended for human
consumption. If tp_str is not specified, the tp_repr handler is
used instead.
For every object which can support attributes, the corresponding type must
provide the functions that control how the attributes are resolved. There needs
to be a function which can retrieve attributes (if any are defined), and another
to set attributes (if setting attributes is allowed). Removing an attribute is
a special case, for which the new value passed to the handler is NULL.
Python supports two pairs of attribute handlers; a type that supports attributes
only needs to implement the functions for one pair. The difference is that one
pair takes the name of the attribute as a char*, while the other
accepts a PyObject*. Each type can use whichever pair makes more
sense for the implementation’s convenience.
getattrfunc tp_getattr;    /* char * version */
setattrfunc tp_setattr;
/* ... */
getattrofunc tp_getattro;  /* PyObject * version */
setattrofunc tp_setattro;
If accessing attributes of an object is always a simple operation (this will be
explained shortly), there are generic implementations which can be used to
provide the PyObject* version of the attribute management functions.
The actual need for type-specific attribute handlers almost completely
disappeared starting with Python 2.2, though there are many examples which have
not been updated to use some of the new generic mechanism that is available.
Most extension types only use simple attributes. So, what makes the
attributes simple? There are only a couple of conditions that must be met:
The name of the attributes must be known when PyType_Ready() is
called.
No special processing is needed to record that an attribute was looked up or
set, nor do actions need to be taken based on the value.
Note that this list does not place any restrictions on the values of the
attributes, when the values are computed, or how relevant data is stored.
When PyType_Ready() is called, it uses three tables referenced by the
type object to create descriptors which are placed in the dictionary of the
type object. Each descriptor controls access to one attribute of the instance
object. Each of the tables is optional; if all three are NULL, instances of
the type will only have attributes that are inherited from their base type, and
should leave the tp_getattro and tp_setattro fields NULL as
well, allowing the base type to handle attributes.
The tables are declared as three fields of the type object:
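They are the same three slots that appear near the end of the PyTypeObject
listing above:

struct PyMethodDef *tp_methods;
struct PyMemberDef *tp_members;
struct PyGetSetDef *tp_getset;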
If tp_methods is not NULL, it must refer to an array of
PyMethodDef structures. Each entry in the table is an instance of this
structure:
typedef struct PyMethodDef {
    char        *ml_name;    /* method name */
    PyCFunction  ml_meth;    /* implementation function */
    int          ml_flags;   /* flags */
    char        *ml_doc;     /* docstring */
} PyMethodDef;
One entry should be defined for each method provided by the type; no entries are
needed for methods inherited from a base type. One additional entry is needed
at the end; it is a sentinel that marks the end of the array. The
ml_name field of the sentinel must be NULL.
The second table is used to define attributes which map directly to data stored
in the instance. A variety of primitive C types are supported, and access may
be read-only or read-write. The structures in the table are defined as:
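The layout, as declared in structmember.h (a sketch; the exact integer type of
the offset field may vary between Python versions):

typedef struct PyMemberDef {
    char       *name;    /* attribute name */
    int         type;    /* type code, e.g. T_INT */
    Py_ssize_t  offset;  /* offset of the C member in the instance struct */
    int         flags;   /* access flags, e.g. READONLY */
    char       *doc;     /* docstring */
} PyMemberDef;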
For each entry in the table, a descriptor will be constructed and added to the
type which will be able to extract a value from the instance structure. The
type field should contain one of the type codes defined in the
structmember.h header; the value will be used to determine how to
convert Python values to and from C values. The flags field is used to
store flags which control how the attribute can be accessed.
The following flag constants are defined in structmember.h; they may be
combined using bitwise-OR.
Constant            Meaning
READONLY            Never writable.
READ_RESTRICTED     Not readable in restricted mode.
WRITE_RESTRICTED    Not writable in restricted mode.
RESTRICTED          Not readable or writable in restricted mode.
An interesting advantage of using the tp_members table to build
descriptors that are used at runtime is that any attribute defined this way can
have an associated doc string simply by providing the text in the table. An
application can use the introspection API to retrieve the descriptor from the
class object, and get the doc string using its __doc__ attribute.
As with the tp_methods table, a sentinel entry with a name value
of NULL is required.
For simplicity, only the char* version will be demonstrated here; the
type of the name parameter is the only difference between the char*
and PyObject* flavors of the interface. This example effectively does
the same thing as the generic example above, but does not use the generic
support added in Python 2.2. It explains how the handler functions are
called, so that if you do need to extend their functionality, you’ll understand
what needs to be done.
The tp_getattr handler is called when the object requires an attribute
look-up. It is called in the same situations where the __getattr__()
method of a class would be called.
Here is an example:
static PyObject *
newdatatype_getattr(newdatatypeobject *obj, char *name)
{
    if (strcmp(name, "data") == 0) {
        return PyLong_FromLong(obj->data);
    }

    PyErr_Format(PyExc_AttributeError,
                 "'%.50s' object has no attribute '%.400s'",
                 Py_TYPE(obj)->tp_name, name);
    return NULL;
}
The tp_setattr handler is called when the __setattr__() or
__delattr__() method of a class instance would be called. When an
attribute should be deleted, the third parameter will be NULL. Here is an
example that simply raises an exception; if this were really all you wanted, the
tp_setattr handler should be set to NULL.
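A sketch of such a rejecting handler:

static int
newdatatype_setattr(newdatatypeobject *obj, char *name, PyObject *v)
{
    PyErr_Format(PyExc_RuntimeError, "Read-only attribute: %s", name);
    return -1;
}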
This function is called with two Python objects and the operator as arguments,
where the operator is one of Py_EQ, Py_NE, Py_LE, Py_GE,
Py_LT or Py_GT. It should compare the two objects with respect to the
specified operator and return Py_True or Py_False if the comparison is
successful, Py_NotImplemented to indicate that comparison is not
implemented and the other object’s comparison method should be tried, or NULL
if an exception was set.
Here is a sample implementation, for a datatype whose instances are compared by
the size of an internal data structure:
static PyObject *
newdatatype_richcmp(PyObject *obj1, PyObject *obj2, int op)
{
    PyObject *result;
    int c, size1, size2;

    /* code to make sure that both arguments are of type
       newdatatype omitted */

    size1 = ((newdatatypeobject *)obj1)->obj_UnderlyingDatatypePtr->size;
    size2 = ((newdatatypeobject *)obj2)->obj_UnderlyingDatatypePtr->size;

    switch (op) {
    case Py_LT: c = size1 <  size2; break;
    case Py_LE: c = size1 <= size2; break;
    case Py_EQ: c = size1 == size2; break;
    case Py_NE: c = size1 != size2; break;
    case Py_GT: c = size1 >  size2; break;
    case Py_GE: c = size1 >= size2; break;
    }
    result = c ? Py_True : Py_False;
    Py_INCREF(result);
    return result;
}
Python supports a variety of abstract ‘protocols’; the specific interfaces
for using them are documented in the Abstract Objects Layer.
A number of these abstract interfaces were defined early in the development of
the Python implementation. In particular, the number, mapping, and sequence
protocols have been part of Python since the beginning. Other protocols have
been added over time. For protocols which depend on several handler routines
from the type implementation, the older protocols have been defined as optional
blocks of handlers referenced by the type object. For newer protocols there are
additional slots in the main type object, with a flag bit being set to indicate
that the slots are present and should be checked by the interpreter. (The flag
bit does not indicate that the slot values are non-NULL. The flag may be set
to indicate the presence of a slot, but a slot may still be unfilled.)
If you wish your object to be able to act like a number, a sequence, or a
mapping object, then you place the address of a structure that implements the C
type PyNumberMethods, PySequenceMethods, or
PyMappingMethods, respectively. It is up to you to fill in this
structure with appropriate values. You can find examples of the use of each of
these in the Objects directory of the Python source distribution.
hashfunc tp_hash;
This function, if you choose to provide it, should return a hash number for an
instance of your data type. Here is a moderately pointless example:
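A sketch for the hypothetical newdatatype (note that a hash function must never
return -1, which signals an error; recent Python versions spell the return type
Py_hash_t rather than long):

static long
newdatatype_hash(newdatatypeobject *obj)
{
    long result;

    result = obj->obj_UnderlyingDatatypePtr->size * 3;
    if (result == -1)
        result = -2;    /* -1 is reserved for signalling errors */
    return result;
}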
This function is called when an instance of your data type is “called”, for
example, if obj1 is an instance of your data type and the Python script
contains obj1('hello'), the tp_call handler is invoked.
This function takes three arguments:
arg1 is the instance of the data type which is the subject of the call. If
the call is obj1('hello'), then arg1 is obj1.
arg2 is a tuple containing the arguments to the call. You can use
PyArg_ParseTuple() to extract the arguments.
arg3 is a dictionary of keyword arguments that were passed. If this is
non-NULL and you support keyword arguments, use
PyArg_ParseTupleAndKeywords() to extract the arguments. If you do not
want to support keyword arguments and this is non-NULL, raise a
TypeError with a message saying that keyword arguments are not supported.
Here is a desultory example of the implementation of the call function.
/* Implement the call function.
 *    obj is the instance receiving the call.
 *    args is a tuple containing the arguments to the call, in this
 *    case 3 strings.
 */
static PyObject *
newdatatype_call(newdatatypeobject *obj, PyObject *args, PyObject *kwds)
{
    PyObject *result;
    char *arg1;
    char *arg2;
    char *arg3;

    if (!PyArg_ParseTuple(args, "sss:call", &arg1, &arg2, &arg3)) {
        return NULL;
    }
    result = PyUnicode_FromFormat(
        "Returning -- value: [%d] arg1: [%s] arg2: [%s] arg3: [%s]\n",
        obj->obj_UnderlyingDatatypePtr->size, arg1, arg2, arg3);
    return result;
}
These functions provide support for the iterator protocol. Any object which
wishes to support iteration over its contents (which may be generated during
iteration) must implement the tp_iter handler. Objects which are returned
by a tp_iter handler must implement both the tp_iter and tp_iternext
handlers. Both handlers take exactly one parameter, the instance for which they
are being called, and return a new reference. In the case of an error, they
should set an exception and return NULL.
For an object which represents an iterable collection, the tp_iter handler
must return an iterator object. The iterator object is responsible for
maintaining the state of the iteration. For collections which can support
multiple iterators which do not interfere with each other (as lists and tuples
do), a new iterator should be created and returned. Objects which can only be
iterated over once (usually due to side effects of iteration) should implement
this handler by returning a new reference to themselves, and should also
implement the tp_iternext handler. File objects are an example of such an
iterator.
Iterator objects should implement both handlers. The tp_iter handler should
return a new reference to the iterator (this is the same as the tp_iter
handler for objects which can only be iterated over destructively). The
tp_iternext handler should return a new reference to the next object in the
iteration if there is one. If the iteration has reached the end, it may return
NULL without setting an exception or it may set StopIteration; avoiding
the exception can yield slightly better performance. If an actual error occurs,
it should set an exception and return NULL.
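As a concrete sketch, here is a tp_iternext handler for a hypothetical counter
type (not part of the examples above) that yields the integers from current up
to limit:

typedef struct {
    PyObject_HEAD
    long current;
    long limit;
} counterobject;

static PyObject *
counter_iternext(counterobject *self)
{
    if (self->current >= self->limit)
        return NULL;                          /* end of iteration, no exception */
    return PyLong_FromLong(self->current++);  /* new reference to the next value */
}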
One of the goals of Python’s weak-reference implementation is to allow any type
to participate in the weak reference mechanism without incurring the overhead on
those objects which do not benefit by weak referencing (such as numbers).
For an object to be weakly referencable, the extension must include a
PyObject* field in the instance structure for the use of the weak
reference mechanism; it must be initialized to NULL by the object’s
constructor. It must also set the tp_weaklistoffset field of the
corresponding type object to the offset of the field. For example, the instance
type is defined with the following structure:
typedef struct {
    PyObject_HEAD
    PyClassObject *in_class;        /* The class object */
    PyObject      *in_dict;         /* A dictionary */
    PyObject      *in_weakreflist;  /* List of weak references */
} PyInstanceObject;
The statically-declared type object for instances is defined this way:
PyTypeObject PyInstance_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "module.instance",

    /* Lots of stuff omitted for brevity... */

    Py_TPFLAGS_DEFAULT,                         /* tp_flags */
    0,                                          /* tp_doc */
    0,                                          /* tp_traverse */
    0,                                          /* tp_clear */
    0,                                          /* tp_richcompare */
    offsetof(PyInstanceObject, in_weakreflist), /* tp_weaklistoffset */
};
The type constructor is responsible for initializing the weak reference list to
NULL:
static PyObject *
instance_new() {
    /* Other initialization stuff omitted for brevity */

    self->in_weakreflist = NULL;

    return (PyObject *) self;
}
The only further addition is that the destructor needs to call the weak
reference manager to clear any weak references. This should be done before any
other parts of the destruction have occurred, but is only required if the weak
reference list is non-NULL:
static void
instance_dealloc(PyInstanceObject *inst)
{
    /* Allocate temporaries if needed, but do not begin
       destruction just yet. */

    if (inst->in_weakreflist != NULL)
        PyObject_ClearWeakRefs((PyObject *) inst);

    /* Proceed with object destruction normally. */
}
Remember that you can omit most of these functions, in which case you provide
0 as a value. There are type definitions for each of the functions you must
provide. They are in object.h in the Python include directory that
comes with the source distribution of Python.
In order to learn how to implement any specific method for your new data type,
do the following: Download and unpack the Python source distribution. Go to
the Objects directory, then search the C source files for tp_ plus
the function you want (for example, tp_richcompare). You will find examples
of the function you want to implement.
When you need to verify that an object is an instance of the type you are
implementing, use the PyObject_TypeCheck() function. A sample of its use
might be something like the following:
if (!PyObject_TypeCheck(some_object, &MyType)) {
    PyErr_SetString(PyExc_TypeError, "arg #1 not a mything");
    return NULL;
}
We relied on this in the tp_dealloc handler in this example, because our
type doesn’t support garbage collection. Even if a type supports garbage
collection, there are calls that can be made to “untrack” the object from
garbage collection; however, these calls are advanced and not covered here.
We now know that the first and last members are strings, so perhaps we could be
less careful about decrementing their reference counts; however, we accept
instances of string subclasses. Even though deallocating normal strings won’t
call back into our objects, we can’t guarantee that deallocating an instance of
a string subclass won’t call back into our objects.
Even in the third version, we aren’t guaranteed to avoid cycles. Instances of
string subclasses are allowed and string subclasses could allow cycles even if
normal strings don’t.
Starting in Python 1.4, Python provided, on Unix, a special make file for
building make files used to build dynamically-linked extensions and custom
interpreters. Starting with Python 2.0, this mechanism (based on the
Makefile.pre.in and Setup files) is no longer supported. Building custom
interpreters was rarely used, and extension modules can be built using
distutils.
Building an extension module using distutils requires that distutils be
installed on the build machine; it is included with Python 2.x and available
separately for Python 1.5. Since distutils also supports creation of binary
packages, users don’t necessarily need a compiler and distutils to install the
extension.
A distutils package contains a driver script, setup.py. This is a plain
Python file, which, in the simplest case, could look like this:
from distutils.core import setup, Extension

module1 = Extension('demo',
                    sources = ['demo.c'])

setup (name = 'PackageName',
       version = '1.0',
       description = 'This is a demo package',
       ext_modules = [module1])
With this setup.py, and a file demo.c, running
python setup.py build
will compile demo.c, and produce an extension module named demo in
the build directory. Depending on the system, the module file will end
up in a subdirectory build/lib.system, and may have a name like
demo.so or demo.pyd.
In the setup.py, all execution is performed by calling the setup
function. This takes a variable number of keyword arguments, of which the
example above uses only a subset. Specifically, the example specifies
meta-information to build packages, and it specifies the contents of the
package. Normally, a package will contain additional modules, such as Python
source modules, documentation, subpackages, etc. Please refer to the distutils
documentation in Distributing Python Modules to learn more about the features of
distutils; this section explains building extension modules only.
It is common to pre-compute arguments to setup(), to better structure the
driver script. In the example above, the ext_modules argument to
setup() is a list of extension modules, each of which is an instance of
the Extension class. In the example, the instance defines an extension named
demo which is built by compiling a single source file, demo.c.
In many cases, building an extension is more complex, since additional
preprocessor defines and libraries may be needed. This is demonstrated in the
example below.
from distutils.core import setup, Extension

module1 = Extension('demo',
                    define_macros = [('MAJOR_VERSION', '1'),
                                     ('MINOR_VERSION', '0')],
                    include_dirs = ['/usr/local/include'],
                    libraries = ['tcl83'],
                    library_dirs = ['/usr/local/lib'],
                    sources = ['demo.c'])

setup (name = 'PackageName',
       version = '1.0',
       description = 'This is a demo package',
       author = 'Martin v. Loewis',
       author_email = 'martin@v.loewis.de',
       url = 'http://docs.python.org/extending/building',
       long_description = '''
This is really just a demo package.
''',
       ext_modules = [module1])
In this example, setup() is called with additional meta-information, which
is recommended when distribution packages have to be built. For the extension
itself, it specifies preprocessor defines, include directories, library
directories, and libraries. Depending on the compiler, distutils passes this
information in different ways to the compiler. For example, on Unix, this may
result in the compilation commands
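For example, on a Linux system the two compiler invocations might look roughly
like this (paths and version numbers are illustrative and will differ on your
machine):

gcc -DNDEBUG -O2 -Wall -fPIC -DMAJOR_VERSION=1 -DMINOR_VERSION=0 -I/usr/local/include -c demo.c -o build/temp.linux-i686-3.2/demo.o
gcc -shared build/temp.linux-i686-3.2/demo.o -L/usr/local/lib -ltcl83 -o build/lib.linux-i686-3.2/demo.so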
When an extension has been successfully built, there are three ways to use it.
End-users will typically want to install the module; they do so by running
python setup.py install
Module maintainers should produce source packages; to do so, they run
python setup.py sdist
In some cases, additional files need to be included in a source distribution;
this is done through a MANIFEST.in file; see the distutils documentation
for details.
If the source distribution has been built successfully, maintainers can also
create binary distributions. Depending on the platform, one of the following
commands can be used to do so.
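For example (the exact set of bdist commands available depends on the
platform):

python setup.py bdist_wininst
python setup.py bdist_rpm
python setup.py bdist_dumb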
This chapter briefly explains how to create a Windows extension module for
Python using Microsoft Visual C++, and follows with more detailed background
information on how it works. The explanatory material is useful for both the
Windows programmer learning to build Python extensions and the Unix programmer
interested in producing software which can be successfully built on both Unix
and Windows.
Module authors are encouraged to use the distutils approach for building
extension modules, instead of the one described in this section. You will still
need the C compiler that was used to build Python; typically Microsoft Visual
C++.
Note
This chapter mentions a number of filenames that include an encoded Python
version number. These filenames are represented with the version number shown
as XY; in practice, 'X' will be the major version number and 'Y'
will be the minor version number of the Python release you’re working with. For
example, if you are using Python 2.2.1, XY will actually be 22.
There are two approaches to building extension modules on Windows, just as there
are on Unix: use the distutils package to control the build process, or
do things manually. The distutils approach works well for most extensions;
documentation on using distutils to build and package extension modules
is available in Distributing Python Modules. This section describes the manual
approach to building Python extensions written in C or C++.
To build extensions using these instructions, you need to have a copy of the
Python sources of the same version as your installed Python. You will need
Microsoft Visual C++ “Developer Studio”; project files are supplied for VC++
version 7.1, but you can use older versions of VC++. Notice that you should use
the same version of VC++ that was used to build Python itself. The example files
described here are distributed with the Python sources in the
PC\example_nt\ directory.
Copy the example files — The example_nt directory is a
subdirectory of the PC directory, in order to keep all the PC-specific
files under the same directory in the source distribution. However, the
example_nt directory can’t actually be used from this location. You
first need to copy or move it up one level, so that example_nt is a
sibling of the PC and Include directories. Do all your work
from within this new location.
Open the project — From VC++, use the File ‣ Open
Solution dialog (not File ‣ Open!). Navigate to and select
the file example.sln, in the copy of the example_nt directory
you made above. Click Open.
Build the example DLL — In order to check that everything is set up
right, try building:
Select a configuration. This step is optional. Choose
Build ‣ Configuration Manager ‣ Active Solution Configuration
and select either Release or Debug. If you skip this
step, VC++ will use the Debug configuration by default.
Build the DLL. Choose Build ‣ Build Solution. This
creates all intermediate and result files in a subdirectory called either
Debug or Release, depending on which configuration you selected
in the preceding step.
Testing the debug-mode DLL — Once the Debug build has succeeded, bring
up a DOS box, and change to the example_nt\Debug directory. You should
now be able to repeat the following session (C> is the DOS prompt, >>>
is the Python prompt; note that build information and various debug output from
Python may not match this screen dump exactly):
C>..\..\PCbuild\python_d
Adding parser accelerators ...
Done.
Python 2.2 (#28, Dec 19 2001, 23:26:37) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
>>> import example
[4897 refs]
>>> example.foo()
Hello, world
[4903 refs]
>>>
Congratulations! You’ve successfully built your first Python extension module.
Creating your own project — Choose a name and create a directory for
it. Copy your C sources into it. Note that the module source file name does
not necessarily have to match the module name, but the name of the
initialization function should match the module name — you can only import a
module spam if its initialization function is called initspam(),
and it should call Py_InitModule() with the string "spam" as its
first argument (use the minimal example.c in this directory as a guide).
By convention, it lives in a file called spam.c or spammodule.c.
The output file should be called spam.pyd (in Release mode) or
spam_d.pyd (in Debug mode). The extension .pyd was chosen
to avoid confusion with a system library spam.dll to which your module
could be a Python interface.
Now your options are:
Copy example.sln and example.vcproj, rename them to
spam.*, and edit them by hand, or
Create a brand new project; instructions are below.
In either case, copy example_nt\example.def to spam\spam.def,
and edit the new spam.def so its second line contains the string
“initspam”. If you created a new project yourself, add the file
spam.def to the project now. (This is an annoying little file with only
two lines. An alternative approach is to forget about the .def file,
and add the option /export:initspam somewhere to the Link settings, by
manually editing the setting in Project Properties dialog).
Creating a brand new project — Use the File ‣ New
‣ Project dialog to create a new Project Workspace. Select Visual
C++ Projects/Win32/ Win32 Project, enter the name (spam), and make sure the
Location is set to parent of the spam directory you have created (which
should be a direct subdirectory of the Python build tree, a sibling of
Include and PC). Select Win32 as the platform (in my version,
this is the only choice). Make sure the Create new workspace radio button is
selected. Click OK.
You should now create the file spam.def as instructed in the previous
section. Add the source files to the project, using Project ‣
Add Existing Item. Set the pattern to *.* and select both spam.c
and spam.def and click OK. (Inserting them one by one is fine too.)
Now open the Project ‣ spam properties dialog. You only need
to change a few settings. Make sure All Configurations is selected
from the Settings for: dropdown list. Select the C/C++ tab. Choose
the General category in the popup menu at the top. Type the following text in
the entry box labeled Additional Include Directories:
..\Include,..\PC
Then, choose the General category in the Linker tab, and enter
..\PCbuild
in the text box labelled Additional library Directories.
Now you need to add some mode-specific settings:
Select Release in the Configuration dropdown list.
Choose the Link tab, choose the Input category, and
append pythonXY.lib to the list in the Additional Dependencies
box.
Select Debug in the Configuration dropdown list, and
append pythonXY_d.lib to the list in the Additional Dependencies
box. Then click the C/C++ tab, select Code Generation, and select
Multi-threaded Debug DLL from the Runtime library
dropdown list.
Select Release again from the Configuration dropdown
list. Select Multi-threaded DLL from the Runtime
library dropdown list.
If your module creates a new type, you may have trouble with this line:
PyVarObject_HEAD_INIT(&PyType_Type, 0)
Static type object initializers in extension modules may cause
compiles to fail with an error message like “initializer not a
constant”. This shows up when building a DLL under MSVC. Change it to:
PyVarObject_HEAD_INIT(NULL, 0)
and add the following to the module initialization function:
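Assuming your type object is named MyObject_Type (a placeholder name), the
addition is simply:

if (PyType_Ready(&MyObject_Type) < 0)
    return NULL;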
Unix and Windows use completely different paradigms for run-time loading of
code. Before you try to build a module that can be dynamically loaded, be aware
of how your system works.
In Unix, a shared object (.so) file contains code to be used by the
program, and also the names of functions and data that it expects to find in the
program. When the file is joined to the program, all references to those
functions and data in the file’s code are changed to point to the actual
locations in the program where the functions and data are placed in memory.
This is basically a link operation.
In Windows, a dynamic-link library (.dll) file has no dangling
references. Instead, an access to functions or data goes through a lookup
table. So the DLL code does not have to be fixed up at runtime to refer to the
program’s memory; instead, the code already uses the DLL’s lookup table, and the
lookup table is modified at runtime to point to the functions and data.
In Unix, there is only one type of library file (.a) which contains code
from several object files (.o). During the link step to create a shared
object file (.so), the linker may find that it doesn’t know where an
identifier is defined. The linker will look for it in the object files in the
libraries; if it finds it, it will include all the code from that object file.
In Windows, there are two types of library, a static library and an import
library (both called .lib). A static library is like a Unix .a
file; it contains code to be included as necessary. An import library is
basically used only to reassure the linker that a certain identifier is legal,
and will be present in the program when the DLL is loaded. So the linker uses
the information from the import library to build the lookup table for using
identifiers that are not included in the DLL. When an application or a DLL is
linked, an import library may be generated, which will need to be used for all
future DLLs that depend on the symbols in the application or DLL.
Suppose you are building two dynamic-load modules, B and C, which should share
another block of code A. On Unix, you would not pass A.a to the
linker for B.so and C.so; that would cause it to be included
twice, so that B and C would each have their own copy. In Windows, building
A.dll will also build A.lib. You do pass A.lib to the
linker for B and C. A.lib does not contain code; it just contains
information which will be used at runtime to access A’s code.
In Windows, using an import library is sort of like using import spam; it
gives you access to spam’s names, but does not create a separate copy. On Unix,
linking with a library is more like from spam import *; it does create a
separate copy.
Windows Python is built in Microsoft Visual C++; using other compilers may or
may not work (though Borland seems to). The rest of this section is MSVC++
specific.
When creating DLLs in Windows, you must pass pythonXY.lib to the linker.
To build two DLLs, spam and ni (which uses C functions found in spam), you could
use these commands:
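The commands might look something like this (the include and library paths are
illustrative):

cl /LD /I/python/include spam.c ../libs/pythonXY.lib
cl /LD /I/python/include ni.c spam.lib ../libs/pythonXY.lib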
The first command created three files: spam.obj, spam.dll and
spam.lib. Spam.dll does not contain any Python functions (such
as PyArg_ParseTuple()), but it does know how to find the Python code
thanks to pythonXY.lib.
The second command created ni.dll (and .obj and .lib),
which knows how to find the necessary functions from spam, and also from the
Python executable.
Not every identifier is exported to the lookup table. If you want any other
modules (including Python) to be able to see your identifiers, you have to say
_declspec(dllexport), as in void _declspec(dllexport) initspam(void) or
PyObject _declspec(dllexport) *NiGetSpamData(void).
Developer Studio will throw in a lot of import libraries that you do not really
need, adding about 100K to your executable. To get rid of them, use the Project
Settings dialog, Link tab, to specify ignore default libraries. Add the
correct msvcrtxx.lib to the list of libraries.
The previous chapters discussed how to extend Python, that is, how to extend the
functionality of Python by attaching a library of C functions to it. It is also
possible to do it the other way around: enrich your C/C++ application by
embedding Python in it. Embedding provides your application with the ability to
implement some of the functionality of your application in Python rather than C
or C++. This can be used for many purposes; one example would be to allow users
to tailor the application to their needs by writing some scripts in Python. You
can also use it yourself if some of the functionality can be written in Python
more easily.
Embedding Python is similar to extending it, but not quite. The difference is
that when you extend Python, the main program of the application is still the
Python interpreter, while if you embed Python, the main program may have nothing
to do with Python — instead, some parts of the application occasionally call
the Python interpreter to run some Python code.
So if you are embedding Python, you are providing your own main program. One of
the things this main program has to do is initialize the Python interpreter. At
the very least, you have to call the function Py_Initialize(). There are
optional calls to pass command line arguments to Python. Then later you can
call the interpreter from any part of the application.
There are several different ways to call the interpreter: you can pass a string
containing Python statements to PyRun_SimpleString(), or you can pass a
stdio file pointer and a file name (for identification in error messages only)
to PyRun_SimpleFile(). You can also call the lower-level operations
described in the previous chapters to construct and use Python objects.
The simplest form of embedding Python is the use of the very high level
interface. This interface is intended to execute a Python script without needing
to interact with the application directly. This can for example be used to
perform some operation on a file.
#include <Python.h>

int
main(int argc, char *argv[])
{
    Py_Initialize();
    PyRun_SimpleString("from time import time,ctime\n"
                       "print('Today is', ctime(time()))\n");
    Py_Finalize();
    return 0;
}
The above code first initializes the Python interpreter with
Py_Initialize(), followed by the execution of a hard-coded Python script
that prints the date and time. Afterwards, the Py_Finalize() call shuts
the interpreter down, followed by the end of the program. In a real program,
you may want to get the Python script from another source, perhaps a text-editor
routine, a file, or a database. Getting the Python code from a file can better
be done by using the PyRun_SimpleFile() function, which saves you the
trouble of allocating memory space and loading the file contents.
The high level interface gives you the ability to execute arbitrary pieces of
Python code from your application, but exchanging data values is quite
cumbersome to say the least. If you want that, you should use lower level calls.
At the cost of having to write more C code, you can achieve almost anything.
It should be noted that extending Python and embedding Python amount to much the
same activity, despite the different intent. Most topics discussed in the previous
chapters are still valid. To show this, consider what the extension code from
Python to C really does:
Convert data values from Python to C,
Perform a function call to a C routine using the converted values, and
Convert the data values from the call from C to Python.
When embedding Python, the interface code does:
Convert data values from C to Python,
Perform a function call to a Python interface routine using the converted
values, and
Convert the data values from the call from Python to C.
As you can see, the data conversion steps are simply swapped to accommodate the
different direction of the cross-language transfer. The only difference is the
routine that you call between both data conversions. When extending, you call a
C routine, when embedding, you call a Python routine.
This chapter will not discuss how to convert data from Python to C and vice
versa. Also, proper use of references and dealing with errors is assumed to be
understood. Since these aspects do not differ from extending the interpreter,
you can refer to earlier chapters for the required information.
The first program aims to execute a function in a Python script. Like in the
section about the very high level interface, the Python interpreter does not
directly interact with the application (but that will change in the next
section).
The code to run a function defined in a Python script is:
#include <Python.h>

int
main(int argc, char *argv[])
{
    PyObject *pName, *pModule, *pDict, *pFunc;
    PyObject *pArgs, *pValue;
    int i;

    if (argc < 3) {
        fprintf(stderr, "Usage: call pythonfile funcname [args]\n");
        return 1;
    }

    Py_Initialize();
    pName = PyUnicode_FromString(argv[1]);
    /* Error checking of pName left out */

    pModule = PyImport_Import(pName);
    Py_DECREF(pName);

    if (pModule != NULL) {
        pFunc = PyObject_GetAttrString(pModule, argv[2]);
        /* pFunc is a new reference */

        if (pFunc && PyCallable_Check(pFunc)) {
            pArgs = PyTuple_New(argc - 3);
            for (i = 0; i < argc - 3; ++i) {
                pValue = PyLong_FromLong(atoi(argv[i + 3]));
                if (!pValue) {
                    Py_DECREF(pArgs);
                    Py_DECREF(pModule);
                    fprintf(stderr, "Cannot convert argument\n");
                    return 1;
                }
                /* pValue reference stolen here: */
                PyTuple_SetItem(pArgs, i, pValue);
            }
            pValue = PyObject_CallObject(pFunc, pArgs);
            Py_DECREF(pArgs);
            if (pValue != NULL) {
                printf("Result of call: %ld\n", PyLong_AsLong(pValue));
                Py_DECREF(pValue);
            }
            else {
                Py_DECREF(pFunc);
                Py_DECREF(pModule);
                PyErr_Print();
                fprintf(stderr, "Call failed\n");
                return 1;
            }
        }
        else {
            if (PyErr_Occurred())
                PyErr_Print();
            fprintf(stderr, "Cannot find function \"%s\"\n", argv[2]);
        }
        Py_XDECREF(pFunc);
        Py_DECREF(pModule);
    }
    else {
        PyErr_Print();
        fprintf(stderr, "Failed to load \"%s\"\n", argv[1]);
        return 1;
    }
    Py_Finalize();
    return 0;
}
This code loads a Python script using argv[1], and calls the function named
in argv[2]. Its integer arguments are the other values of the argv
array. If you compile and link this program (let’s call the finished executable
call), and use it to execute a Python script, such as:
$ call multiply multiply 3 2
Will compute 3 times 2
Result of call: 6
Although the program is quite large for its functionality, most of the code is
for data conversion between Python and C, and for error reporting. The
interesting part with respect to embedding Python starts with
Py_Initialize();
pName = PyUnicode_FromString(argv[1]);
/* Error checking of pName left out */
pModule = PyImport_Import(pName);
After initializing the interpreter, the script is loaded using
PyImport_Import(). This routine needs a Python string as its argument,
which is constructed using the PyUnicode_FromString() data conversion
routine.
pFunc = PyObject_GetAttrString(pModule, argv[2]);
/* pFunc is a new reference */

if (pFunc && PyCallable_Check(pFunc)) {
    ...
}
Py_XDECREF(pFunc);
Once the script is loaded, the name we’re looking for is retrieved using
PyObject_GetAttrString(). If the name exists, and the object returned is
callable, you can safely assume that it is a function. The program then
proceeds by constructing a tuple of arguments as normal. The call to the Python
function is then made with:
pValue = PyObject_CallObject(pFunc, pArgs);
Upon return of the function, pValue is either NULL or it contains a
reference to the return value of the function. Be sure to release the reference
after examining the value.
Until now, the embedded Python interpreter had no access to functionality from
the application itself. The Python API allows this by extending the embedded
interpreter. That is, the embedded interpreter gets extended with routines
provided by the application. While it sounds complex, it is not so bad. Simply
forget for a while that the application starts the Python interpreter. Instead,
consider the application to be a set of subroutines, and write some glue code
that gives Python access to those routines, just like you would write a normal
Python extension. For example:
static int numargs = 0;

/* Return the number of arguments of the application command line */
static PyObject *
emb_numargs(PyObject *self, PyObject *args)
{
    if (!PyArg_ParseTuple(args, ":numargs"))
        return NULL;
    return PyLong_FromLong(numargs);
}

static PyMethodDef EmbMethods[] = {
    {"numargs", emb_numargs, METH_VARARGS,
     "Return the number of arguments received by the process."},
    {NULL, NULL, 0, NULL}
};

static PyModuleDef EmbModule = {
    PyModuleDef_HEAD_INIT, "emb", NULL, -1, EmbMethods,
    NULL, NULL, NULL, NULL
};

static PyObject *
PyInit_emb(void)
{
    return PyModule_Create(&EmbModule);
}
Insert the above code just above the main() function. Also, insert the
following two statements before the call to Py_Initialize():
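numargs = argc;
PyImport_AppendInittab("emb", &PyInit_emb);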
These two lines initialize the numargs variable, and make the
emb.numargs() function accessible to the embedded Python interpreter.
With these extensions, the Python script can do things like
import emb
print("Number of arguments", emb.numargs())
In a real application, the methods will expose an API of the application to
Python.
It is also possible to embed Python in a C++ program; precisely how this is done
will depend on the details of the C++ system used; in general you will need to
write the main program in C++, and use the C++ compiler to compile and link your
program. There is no need to recompile Python itself using C++.
While the configure script shipped with the Python sources will
correctly build Python to export the symbols needed by dynamically linked
extensions, this is not automatically inherited by applications which embed the
Python library statically, at least on Unix. This is an issue when the
application is linked to the static runtime library (libpython.a) and
needs to load dynamic extensions (implemented as .so files).
The problem is that some entry points are defined by the Python runtime solely
for extension modules to use. If the embedding application does not use any of
these entry points, some linkers will not include those entries in the symbol
table of the finished executable. Some additional options are needed to inform
the linker not to remove these symbols.
Determining the right options to use for any given platform can be quite
difficult, but fortunately the Python configuration already has those values.
To retrieve them from an installed Python interpreter, start an interactive
interpreter and have a short session like this:
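A typical session (shown here for one Linux build; the exact output varies by
platform):

>>> import distutils.sysconfig
>>> distutils.sysconfig.get_config_var('LINKFORSHARED')
'-Xlinker -export-dynamic'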
The contents of the string presented will be the options that should be used.
If the string is empty, there’s no need to add any additional options. The
LINKFORSHARED definition corresponds to the variable of the same name
in Python’s top-level Makefile.
The Application Programmer’s Interface to Python gives C and C++ programmers
access to the Python interpreter at a variety of levels. The API is equally
usable from C++, but for brevity it is generally referred to as the Python/C
API. There are two fundamentally different reasons for using the Python/C API.
The first reason is to write extension modules for specific purposes; these
are C modules that extend the Python interpreter. This is probably the most
common use. The second reason is to use Python as a component in a larger
application; this technique is generally referred to as embedding Python
in an application.
Writing an extension module is a relatively well-understood process, where a
“cookbook” approach works well. There are several tools that automate the
process to some extent. While people have embedded Python in other
applications since its early existence, the process of embedding Python is less
straightforward than writing an extension.
Many API functions are useful independent of whether you’re embedding or
extending Python; moreover, most applications that embed Python will need to
provide a custom extension as well, so it’s probably a good idea to become
familiar with writing an extension before attempting to embed Python in a real
application.
All function, type and macro definitions needed to use the Python/C API are
included in your code by the following line:
#include "Python.h"
This implies inclusion of the following standard headers: <stdio.h>,
<string.h>, <errno.h>, <limits.h>, <assert.h> and <stdlib.h>
(if available).
Note
Since Python may define some pre-processor definitions which affect the standard
headers on some systems, you must include Python.h before any standard
headers are included.
All user-visible names defined by Python.h (except those defined by the included
standard headers) have one of the prefixes Py or _Py. Names beginning
with _Py are for internal use by the Python implementation and should not be
used by extension writers. Structure member names do not have a reserved prefix.
Important: user code should never define names that begin with Py or
_Py. This confuses the reader, and jeopardizes the portability of the user
code to future Python versions, which may define additional names beginning with
one of these prefixes.
The header files are typically installed with Python. On Unix, these are
located in the directories prefix/include/pythonversion/ and
exec_prefix/include/pythonversion/, where prefix and
exec_prefix are defined by the corresponding parameters to Python’s
configure script and version is sys.version[:3]. On Windows,
the headers are installed in prefix/include, where prefix is
the installation directory specified to the installer.
To include the headers, place both directories (if different) on your compiler’s
search path for includes. Do not place the parent directories on the search
path and then use #include <pythonX.Y/Python.h>; this will break on
multi-platform builds since the platform independent headers under
prefix include the platform specific headers from
exec_prefix.
C++ users should note that though the API is defined entirely using C, the
header files do properly declare the entry points to be extern "C", so there
is no need to do anything special to use the API from C++.
Most Python/C API functions have one or more arguments as well as a return value
of type PyObject*. This type is a pointer to an opaque data type
representing an arbitrary Python object. Since all Python object types are
treated the same way by the Python language in most situations (e.g.,
assignments, scope rules, and argument passing), it is only fitting that they
should be represented by a single C type. Almost all Python objects live on the
heap: you never declare an automatic or static variable of type
PyObject, only pointer variables of type PyObject* can be
declared. The sole exception are the type objects; since these must never be
deallocated, they are typically static PyTypeObject objects.
All Python objects (even Python integers) have a type and a
reference count. An object’s type determines what kind of object it is
(e.g., an integer, a list, or a user-defined function; there are many more as
explained in The standard type hierarchy). For each of the well-known types there is a macro
to check whether an object is of that type; for instance, PyList_Check(a) is
true if (and only if) the object pointed to by a is a Python list.
The reference count is important because today’s computers have a finite (and
often severely limited) memory size; it counts how many different places there
are that have a reference to an object. Such a place could be another object,
or a global (or static) C variable, or a local variable in some C function.
When an object’s reference count becomes zero, the object is deallocated. If
it contains references to other objects, their reference count is decremented.
Those other objects may be deallocated in turn, if this decrement makes their
reference count become zero, and so on. (There’s an obvious problem with
objects that reference each other here; for now, the solution is “don’t do
that.”)
Reference counts are always manipulated explicitly. The normal way is to use
the macro Py_INCREF() to increment an object’s reference count by one,
and Py_DECREF() to decrement it by one. The Py_DECREF() macro
is considerably more complex than the incref one, since it must check whether
the reference count becomes zero and then cause the object’s deallocator to be
called. The deallocator is a function pointer contained in the object’s type
structure. The type-specific deallocator takes care of decrementing the
reference counts for other objects contained in the object if this is a compound
object type, such as a list, as well as performing any additional finalization
that’s needed. There’s no chance that the reference count can overflow; at
least as many bits are used to hold the reference count as there are distinct
memory locations in virtual memory (assuming sizeof(Py_ssize_t) >= sizeof(void*)).
Thus, the reference count increment is a simple operation.
It is not necessary to increment an object’s reference count for every local
variable that contains a pointer to an object. In theory, the object’s
reference count goes up by one when the variable is made to point to it and it
goes down by one when the variable goes out of scope. However, these two
cancel each other out, so at the end the reference count hasn’t changed. The
only real reason to use the reference count is to prevent the object from being
deallocated as long as our variable is pointing to it. If we know that there
is at least one other reference to the object that lives at least as long as
our variable, there is no need to increment the reference count temporarily.
An important situation where this arises is in objects that are passed as
arguments to C functions in an extension module that are called from Python;
the call mechanism guarantees to hold a reference to every argument for the
duration of the call.
However, a common pitfall is to extract an object from a list and hold on to it
for a while without incrementing its reference count. Some other operation might
conceivably remove the object from the list, decrementing its reference count
and possibly deallocating it. The real danger is that innocent-looking
operations may invoke arbitrary Python code which could do this; there is a code
path which allows control to flow back to the user from a Py_DECREF(), so
almost any operation is potentially dangerous.
A safe approach is to always use the generic operations (functions whose name
begins with PyObject_, PyNumber_, PySequence_ or PyMapping_).
These operations always increment the reference count of the object they return.
This leaves the caller with the responsibility to call Py_DECREF() when
they are done with the result; this soon becomes second nature.
The reference count behavior of functions in the Python/C API is best explained
in terms of ownership of references. Ownership pertains to references, never
to objects (objects are not owned: they are always shared). “Owning a
reference” means being responsible for calling Py_DECREF on it when the
reference is no longer needed. Ownership can also be transferred, meaning that
the code that receives ownership of the reference then becomes responsible for
eventually decref’ing it by calling Py_DECREF() or Py_XDECREF()
when it’s no longer needed—or passing on this responsibility (usually to its
caller). When a function passes ownership of a reference on to its caller, the
caller is said to receive a new reference. When no ownership is transferred,
the caller is said to borrow the reference. Nothing needs to be done for a
borrowed reference.
Conversely, when a calling function passes in a reference to an object, there
are two possibilities: the function steals a reference to the object, or it
does not. Stealing a reference means that when you pass a reference to a
function, that function assumes that it now owns that reference, and you are not
responsible for it any longer.
Few functions steal references; the two notable exceptions are
PyList_SetItem() and PyTuple_SetItem(), which steal a reference
to the item (but not to the tuple or list into which the item is put!). These
functions were designed to steal a reference because of a common idiom for
populating a tuple or list with newly created objects; for example, the code to
create the tuple (1,2,"three") could look like this (forgetting about
error handling for the moment; a better way to code this is shown below):
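PyObject *t;

t = PyTuple_New(3);
PyTuple_SetItem(t, 0, PyLong_FromLong(1L));
PyTuple_SetItem(t, 1, PyLong_FromLong(2L));
PyTuple_SetItem(t, 2, PyUnicode_FromString("three"));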
Here, PyLong_FromLong() returns a new reference which is immediately
stolen by PyTuple_SetItem(). When you want to keep using an object
although the reference to it will be stolen, use Py_INCREF() to grab
another reference before calling the reference-stealing function.
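A minimal sketch of that pattern (t is an existing tuple and item is an object
reference we want to keep; both names are placeholders):

Py_INCREF(item);               /* take an extra reference first */
PyTuple_SetItem(t, 0, item);   /* this call steals one reference to item */
/* ... item remains safely usable here ... */
Py_DECREF(item);               /* drop our extra reference when done */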
However, in practice, you will rarely use these ways of creating and populating
a tuple or list. There’s a generic function, Py_BuildValue(), that can
create most common objects from C values, directed by a format string.
For example, the tuple-building code above (and the equivalent code for
populating a list using PyList_New() and PyList_SetItem()) could be replaced
by the following (which also takes care of the error checking):
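PyObject *tuple, *list;

tuple = Py_BuildValue("(iis)", 1, 2, "three");
list = Py_BuildValue("[iis]", 1, 2, "three");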
It is much more common to use PyObject_SetItem() and friends with items
whose references you are only borrowing, like arguments that were passed in to
the function you are writing. In that case, their behaviour regarding reference
counts is much saner, since you don’t have to increment a reference count so you
can give a reference away (“have it be stolen”). For example, this function
sets all items of a list (actually, any mutable sequence) to a given item:
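int
set_all(PyObject *target, PyObject *item)
{
    Py_ssize_t i, n;

    n = PyObject_Length(target);
    if (n < 0)
        return -1;
    for (i = 0; i < n; i++) {
        PyObject *index = PyLong_FromSsize_t(i);
        if (!index)
            return -1;
        if (PyObject_SetItem(target, index, item) < 0) {
            Py_DECREF(index);
            return -1;
        }
        Py_DECREF(index);
    }
    return 0;
}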
The situation is slightly different for function return values. While passing
a reference to most functions does not change your ownership responsibilities
for that reference, many functions that return a reference to an object give
you ownership of the reference. The reason is simple: in many cases, the
returned object is created on the fly, and the reference you get is the only
reference to the object. Therefore, the generic functions that return object
references, like PyObject_GetItem() and PySequence_GetItem(),
always return a new reference (the caller becomes the owner of the reference).
It is important to realize that whether you own a reference returned by a
function depends on which function you call only — the plumage (the type of
the object passed as an argument to the function) doesn’t enter into it!
Thus, if you extract an item from a list using PyList_GetItem(), you
don’t own the reference — but if you obtain the same item from the same list
using PySequence_GetItem() (which happens to take exactly the same
arguments), you do own a reference to the returned object.
Here is an example of how you could write a function that computes the sum of
the items in a list of integers; once using PyList_GetItem(), and once
using PySequence_GetItem().
long
sum_list(PyObject *list)
{
    int i, n;
    long total = 0;
    PyObject *item;

    n = PyList_Size(list);
    if (n < 0)
        return -1; /* Not a list */
    for (i = 0; i < n; i++) {
        item = PyList_GetItem(list, i); /* Can't fail */
        if (!PyLong_Check(item)) continue; /* Skip non-integers */
        total += PyLong_AsLong(item);
    }
    return total;
}
long
sum_sequence(PyObject *sequence)
{
    int i, n;
    long total = 0;
    PyObject *item;

    n = PySequence_Length(sequence);
    if (n < 0)
        return -1; /* Has no length */
    for (i = 0; i < n; i++) {
        item = PySequence_GetItem(sequence, i);
        if (item == NULL)
            return -1; /* Not a sequence, or other failure */
        if (PyLong_Check(item))
            total += PyLong_AsLong(item);
        Py_DECREF(item); /* Discard reference ownership */
    }
    return total;
}
There are few other data types that play a significant role in the Python/C
API; most are simple C types such as int, long,
double and char*. A few structure types are used to
describe static tables used to list the functions exported by a module or the
data attributes of a new object type, and another is used to describe the value
of a complex number. These will be discussed together with the functions that
use them.
The Python programmer only needs to deal with exceptions if specific error
handling is required; unhandled exceptions are automatically propagated to the
caller, then to the caller’s caller, and so on, until they reach the top-level
interpreter, where they are reported to the user accompanied by a stack
traceback.
For C programmers, however, error checking always has to be explicit. All
functions in the Python/C API can raise exceptions, unless an explicit claim is
made otherwise in a function’s documentation. In general, when a function
encounters an error, it sets an exception, discards any object references that
it owns, and returns an error indicator. If not documented otherwise, this
indicator is either NULL or -1, depending on the function’s return type.
A few functions return a Boolean true/false result, with false indicating an
error. Very few functions return no explicit error indicator or have an
ambiguous return value, and require explicit testing for errors with
PyErr_Occurred(). These exceptions are always explicitly documented.
Exception state is maintained in per-thread storage (this is equivalent to
using global storage in an unthreaded application). A thread can be in one of
two states: an exception has occurred, or not. The function
PyErr_Occurred() can be used to check for this: it returns a borrowed
reference to the exception type object when an exception has occurred, and
NULL otherwise. There are a number of functions to set the exception state:
PyErr_SetString() is the most common (though not the most general)
function to set the exception state, and PyErr_Clear() clears the
exception state.
The full exception state consists of three objects (all of which can be
NULL): the exception type, the corresponding exception value, and the
traceback. These have the same meanings as the Python result of
sys.exc_info(); however, they are not the same: the Python objects represent
the last exception being handled by a Python try ...
except statement, while the C level exception state only exists while
an exception is being passed on between C functions until it reaches the Python
bytecode interpreter’s main loop, which takes care of transferring it to
sys.exc_info() and friends.
Note that starting with Python 1.5, the preferred, thread-safe way to access the
exception state from Python code is to call the function sys.exc_info(),
which returns the per-thread exception state for Python code. Also, the
semantics of both ways to access the exception state have changed so that a
function which catches an exception will save and restore its thread’s exception
state so as to preserve the exception state of its caller. This prevents common
bugs in exception handling code caused by an innocent-looking function
overwriting the exception being handled; it also reduces the often unwanted
lifetime extension for objects that are referenced by the stack frames in the
traceback.
As a general principle, a function that calls another function to perform some
task should check whether the called function raised an exception, and if so,
pass the exception state on to its caller. It should discard any object
references that it owns, and return an error indicator, but it should not set
another exception — that would overwrite the exception that was just raised,
and lose important information about the exact cause of the error.
A simple example of detecting exceptions and passing them on is shown in the
sum_sequence() example above. It so happens that that example doesn’t
need to clean up any owned references when it detects an error. The following
example function shows some error cleanup. First, to remind you why you like
Python, we show the equivalent Python code:
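def incr_item(dict, key):
    try:
        item = dict[key]
    except KeyError:
        item = 0
    dict[key] = item + 1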
Here is the corresponding C code, in all its glory:
int
incr_item(PyObject *dict, PyObject *key)
{
    /* Objects all initialized to NULL for Py_XDECREF */
    PyObject *item = NULL, *const_one = NULL, *incremented_item = NULL;
    int rv = -1; /* Return value initialized to -1 (failure) */

    item = PyObject_GetItem(dict, key);
    if (item == NULL) {
        /* Handle KeyError only: */
        if (!PyErr_ExceptionMatches(PyExc_KeyError))
            goto error;
        /* Clear the error and use zero: */
        PyErr_Clear();
        item = PyLong_FromLong(0L);
        if (item == NULL)
            goto error;
    }
    const_one = PyLong_FromLong(1L);
    if (const_one == NULL)
        goto error;

    incremented_item = PyNumber_Add(item, const_one);
    if (incremented_item == NULL)
        goto error;

    if (PyObject_SetItem(dict, key, incremented_item) < 0)
        goto error;
    rv = 0; /* Success */
    /* Continue with cleanup code */

 error:
    /* Cleanup code, shared by success and failure path */

    /* Use Py_XDECREF() to ignore NULL references */
    Py_XDECREF(item);
    Py_XDECREF(const_one);
    Py_XDECREF(incremented_item);

    return rv; /* -1 for error, 0 for success */
}
This example represents an endorsed use of the goto statement in C!
It illustrates the use of PyErr_ExceptionMatches() and
PyErr_Clear() to handle specific exceptions, and the use of
Py_XDECREF() to dispose of owned references that may be NULL (note the
'X' in the name; Py_DECREF() would crash when confronted with a
NULL reference). It is important that the variables used to hold owned
references are initialized to NULL for this to work; likewise, the proposed
return value is initialized to -1 (failure) and only set to success after
the final call made is successful.
The one important task that only embedders (as opposed to extension writers) of
the Python interpreter have to worry about is the initialization, and possibly
the finalization, of the Python interpreter. Most functionality of the
interpreter can only be used after the interpreter has been initialized.
The basic initialization function is Py_Initialize(). This initializes
the table of loaded modules, and creates the fundamental modules
builtins, __main__, and sys. It also
initializes the module search path (sys.path).
Py_Initialize() does not set the “script argument list” (sys.argv).
If this variable is needed by Python code that will be executed later, it must
be set explicitly with a call to PySys_SetArgvEx(argc, argv, updatepath)
after the call to Py_Initialize().
On most systems (in particular, on Unix and Windows, although the details are
slightly different), Py_Initialize() calculates the module search path
based upon its best guess for the location of the standard Python interpreter
executable, assuming that the Python library is found in a fixed location
relative to the Python interpreter executable. In particular, it looks for a
directory named lib/pythonX.Y relative to the parent directory
where the executable named python is found on the shell command search
path (the environment variable PATH).
For instance, if the Python executable is found in
/usr/local/bin/python, it will assume that the libraries are in
/usr/local/lib/pythonX.Y. (In fact, this particular path is also
the “fallback” location, used when no executable file named python is
found along PATH.) The user can override this behavior by setting the
environment variable PYTHONHOME, or insert additional directories in
front of the standard path by setting PYTHONPATH.
The embedding application can steer the search by calling
Py_SetProgramName(file) before calling Py_Initialize(). Note that
PYTHONHOME still overrides this and PYTHONPATH is still
inserted in front of the standard path. An application that requires total
control has to provide its own implementation of Py_GetPath(),
Py_GetPrefix(), Py_GetExecPrefix(), and
Py_GetProgramFullPath() (all defined in Modules/getpath.c).
Sometimes, it is desirable to “uninitialize” Python. For instance, the
application may want to start over (make another call to
Py_Initialize()) or the application is simply done with its use of
Python and wants to free memory allocated by Python. This can be accomplished
by calling Py_Finalize(). The function Py_IsInitialized() returns
true if Python is currently in the initialized state. More information about
these functions is given in a later chapter. Notice that Py_Finalize()
does not free all memory allocated by the Python interpreter, e.g. memory
allocated by extension modules currently cannot be released.
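To make this concrete, here is a minimal sketch of an embedding program built
from the calls discussed above (it runs a short script and then shuts the
interpreter down):

#include <Python.h>

int
main(void)
{
    Py_Initialize();                      /* creates builtins, __main__, sys */
    PyRun_SimpleString("from time import time, ctime\n"
                       "print('Today is', ctime(time()))\n");
    Py_Finalize();                        /* releases (most) interpreter memory */
    return 0;
}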
Python can be built with several macros to enable extra checks of the
interpreter and extension modules. These checks tend to add a large amount of
overhead to the runtime so they are not enabled by default.
A full list of the various types of debugging builds is in the file
Misc/SpecialBuilds.txt in the Python source distribution. Builds are
available that support tracing of reference counts, debugging the memory
allocator, or low-level profiling of the main interpreter loop. Only the most
frequently-used builds will be described in the remainder of this section.
Compiling the interpreter with the Py_DEBUG macro defined produces
what is generally meant by “a debug build” of Python. Py_DEBUG is
enabled in the Unix build by adding --with-pydebug to the
./configure command. It is also implied by the presence of the
not-Python-specific _DEBUG macro. When Py_DEBUG is enabled
in the Unix build, compiler optimization is disabled.
In addition to the reference count debugging described below, the following
extra checks are performed:
Extra checks are added to the object allocator.
Extra checks are added to the parser and compiler.
Downcasts from wide types to narrow types are checked for loss of information.
A number of assertions are added to the dictionary and set implementations.
In addition, the set object acquires a test_c_api() method.
Sanity checks of the input arguments are added to frame creation.
The storage for ints is initialized with a known invalid pattern to catch
reference to uninitialized digits.
Low-level tracing and extra exception checking are added to the runtime
virtual machine.
Extra checks are added to the memory arena implementation.
Extra debugging is added to the thread module.
There may be additional checks not mentioned here.
Defining Py_TRACE_REFS enables reference tracing. When defined, a
circular doubly linked list of active objects is maintained by adding two extra
fields to every PyObject. Total allocations are tracked as well. Upon
exit, all existing references are printed. (In interactive mode this happens
after every statement run by the interpreter.) Implied by Py_DEBUG.
Please refer to Misc/SpecialBuilds.txt in the Python source distribution
for more detailed information.
The functions in this chapter will let you execute Python source code given in a
file or a buffer, but they will not let you interact in a more detailed way with
the interpreter.
Several of these functions accept a start symbol from the grammar as a
parameter. The available start symbols are Py_eval_input,
Py_file_input, and Py_single_input. These are described
following the functions which accept them as parameters.
Note also that several of these functions take FILE* parameters. One
particular issue which needs to be handled carefully is that the FILE
structure for different C libraries can be different and incompatible. Under
Windows (at least), it is possible for dynamically linked extensions to actually
use different libraries, so care should be taken that FILE* parameters
are only passed to these functions if it is certain that they were created by
the same library that the Python runtime is using.
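int Py_Main(int argc, wchar_t **argv)¶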
The main program for the standard interpreter. This is made available for
programs which embed Python. The argc and argv parameters should be
prepared exactly as those which are passed to a C program’s main()
function (converted to wchar_t according to the user’s locale). It is
important to note that the argument list may be modified (but the contents of
the strings pointed to by the argument list are not). The return value will
be 0 if the interpreter exits normally (i.e., without an exception),
1 if the interpreter exits due to an exception, or 2 if the parameter
list does not represent a valid Python command line.
Note that if an otherwise unhandled SystemExit is raised, this
function will not return 1, but exit the process, as long as
Py_InspectFlag is not set.
int PyRun_AnyFile(FILE *fp, const char *filename)¶
This is a simplified interface to PyRun_AnyFileExFlags() below, leaving
closeit set to 0 and flags set to NULL.
int PyRun_AnyFileFlags(FILE *fp, const char *filename, PyCompilerFlags *flags)¶
This is a simplified interface to PyRun_AnyFileExFlags() below, leaving
the closeit argument set to 0.
int PyRun_AnyFileEx(FILE *fp, const char *filename, int closeit)¶
This is a simplified interface to PyRun_AnyFileExFlags() below, leaving
the flags argument set to NULL.
int PyRun_AnyFileExFlags(FILE *fp, const char *filename, int closeit, PyCompilerFlags *flags)¶
If fp refers to a file associated with an interactive device (console or
terminal input or Unix pseudo-terminal), return the value of
PyRun_InteractiveLoop(), otherwise return the result of
PyRun_SimpleFile(). filename is decoded from the filesystem
encoding (sys.getfilesystemencoding()). If filename is NULL, this
function uses "???" as the filename.
This is a simplified interface to PyRun_SimpleStringFlags() below,
leaving the PyCompilerFlags* argument set to NULL.
int PyRun_SimpleStringFlags(const char *command, PyCompilerFlags *flags)¶
Executes the Python source code from command in the __main__ module
according to the flags argument. If __main__ does not already exist, it
is created. Returns 0 on success or -1 if an exception was raised. If
there was an error, there is no way to get the exception information. For the
meaning of flags, see below.
Note that if an otherwise unhandled SystemExit is raised, this
function will not return -1, but exit the process, as long as
Py_InspectFlag is not set.
int PyRun_SimpleFile(FILE *fp, const char *filename)¶
This is a simplified interface to PyRun_SimpleFileExFlags() below,
leaving closeit set to 0 and flags set to NULL.
int PyRun_SimpleFileFlags(FILE *fp, const char *filename, PyCompilerFlags *flags)¶
int PyRun_SimpleFileExFlags(FILE *fp, const char *filename, int closeit, PyCompilerFlags *flags)¶
Similar to PyRun_SimpleStringFlags(), but the Python source code is read
from fp instead of an in-memory string. filename should be the name of
the file, it is decoded from the filesystem encoding
(sys.getfilesystemencoding()). If closeit is true, the file is
closed before PyRun_SimpleFileExFlags returns.
int PyRun_InteractiveOne(FILE *fp, const char *filename)¶
int PyRun_InteractiveOneFlags(FILE *fp, const char *filename, PyCompilerFlags *flags)¶
Read and execute a single statement from a file associated with an
interactive device according to the flags argument. The user will be
prompted using sys.ps1 and sys.ps2. filename is decoded from the
filesystem encoding (sys.getfilesystemencoding()).
Returns 0 when the input was
executed successfully, -1 if there was an exception, or an error code
from the errcode.h include file distributed as part of Python if
there was a parse error. (Note that errcode.h is not included by
Python.h, so it must be included specifically if needed.)
int PyRun_InteractiveLoop(FILE *fp, const char *filename)¶
int PyRun_InteractiveLoopFlags(FILE *fp, const char *filename, PyCompilerFlags *flags)¶
Read and execute statements from a file associated with an interactive device
until EOF is reached. The user will be prompted using sys.ps1 and
sys.ps2. filename is decoded from the filesystem encoding
(sys.getfilesystemencoding()). Returns 0 at EOF.
struct _node* PyParser_SimpleParseString(const char *str, int start)¶
struct _node* PyParser_SimpleParseStringFlagsFilename(const char *str, const char *filename, int start, int flags)¶
Parse Python source code from str using the start token start according to
the flags argument. The result can be used to create a code object which can
be evaluated efficiently. This is useful if a code fragment must be evaluated
many times. filename is decoded from the filesystem encoding
(sys.getfilesystemencoding()).
struct _node* PyParser_SimpleParseFile(FILE *fp, const char *filename, int start)¶
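Similar to PyParser_SimpleParseStringFlagsFilename(), but the Python source
code is read from fp instead of an in-memory string.
PyObject* PyRun_StringFlags(const char *str, int start, PyObject *globals, PyObject *locals, PyCompilerFlags *flags)¶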
Execute Python source code from str in the context specified by the
dictionaries globals and locals with the compiler flags specified by
flags. The parameter start specifies the start token that should be used to
parse the source code.
Returns the result of executing the code as a Python object, or NULL if an
exception was raised.
Similar to PyRun_StringFlags(), but the Python source code is read from
fp instead of an in-memory string. filename should be the name of the file,
it is decoded from the filesystem encoding (sys.getfilesystemencoding()).
If closeit is true, the file is closed before PyRun_FileExFlags()
returns.
PyObject* Py_CompileString(const char *str, const char *filename, int start)¶
PyObject* Py_CompileStringExFlags(const char *str, const char *filename, int start, PyCompilerFlags *flags, int optimize)¶
Parse and compile the Python source code in str, returning the resulting code
object. The start token is given by start; this can be used to constrain the
code which can be compiled and should be Py_eval_input,
Py_file_input, or Py_single_input. The filename specified by
filename is used to construct the code object and may appear in tracebacks or
SyntaxError exception messages; it is decoded from the filesystem
encoding (sys.getfilesystemencoding()). This returns NULL if the
code cannot be parsed or compiled.
The integer optimize specifies the optimization level of the compiler; a
value of -1 selects the optimization level of the interpreter as given by
-O options. Explicit levels are 0 (no optimization;
__debug__ is true), 1 (asserts are removed, __debug__ is false)
or 2 (docstrings are removed too).
This is a simplified interface to PyEval_EvalCodeEx(), with just
the code object, and the dictionaries of global and local variables.
The other arguments are set to NULL.
Evaluate a precompiled code object, given a particular environment for its
evaluation. This environment consists of dictionaries of global and local
variables, arrays of arguments, keywords and defaults, and a closure tuple of
cells.
Evaluate an execution frame. This is a simplified interface to
PyEval_EvalFrameEx, for backward compatibility.
PyObject* PyEval_EvalFrameEx(PyFrameObject *f, int throwflag)¶
This is the main, unvarnished function of Python interpretation. It is
literally 2000 lines long. The code object associated with the execution
frame f is executed, interpreting bytecode and executing calls as needed.
The additional throwflag parameter can mostly be ignored - if true, then
it causes an exception to immediately be thrown; this is used for the
throw() methods of generator objects.
The start symbol from the Python grammar for sequences of statements as read
from a file or other source; for use with Py_CompileString(). This is
the symbol to use when compiling arbitrarily long Python source code.
The start symbol from the Python grammar for a single statement; for use with
Py_CompileString(). This is the symbol used for the interactive
interpreter loop.
This is the structure used to hold compiler flags. In cases where code is only
being compiled, it is passed as int flags, and in cases where code is being
executed, it is passed as PyCompilerFlags *flags. In this case, from __future__ import can modify flags.
Whenever PyCompilerFlags *flags is NULL, cf_flags is treated as
equal to 0, and any modification due to from __future__ import is
discarded.
Decrement the reference count for object o. The object must not be NULL; if
you aren’t sure that it isn’t NULL, use Py_XDECREF(). If the reference
count reaches zero, the object’s type’s deallocation function (which must not be
NULL) is invoked.
Warning
The deallocation function can cause arbitrary Python code to be invoked (e.g.
when a class instance with a __del__() method is deallocated). While
exceptions in such code are not propagated, the executed code has free access to
all Python global variables. This means that any object that is reachable from
a global variable should be in a consistent state before Py_DECREF() is
invoked. For example, code to delete an object from a list should copy a
reference to the deleted object in a temporary variable, update the list data
structure, and then call Py_DECREF() for the temporary variable.
Decrement the reference count for object o. The object may be NULL, in
which case the macro has no effect; otherwise the effect is the same as for
Py_DECREF(), and the same warning applies.
Decrement the reference count for object o. The object may be NULL, in
which case the macro has no effect; otherwise the effect is the same as for
Py_DECREF(), except that the argument is also set to NULL. The warning
for Py_DECREF() does not apply with respect to the object passed because
the macro carefully uses a temporary variable and sets the argument to NULL
before decrementing its reference count.
It is a good idea to use this macro whenever decrementing the value of a
variable that might be traversed during garbage collection.
The following functions are for runtime dynamic embedding of Python:
Py_IncRef(PyObject *o), Py_DecRef(PyObject *o). They are
simply exported function versions of Py_XINCREF() and
Py_XDECREF(), respectively.
The following functions or macros are only for use within the interpreter core:
_Py_Dealloc(), _Py_ForgetReference(), _Py_NewReference(),
as well as the global variable _Py_RefTotal.
The functions described in this chapter will let you handle and raise Python
exceptions. It is important to understand some of the basics of Python
exception handling. It works somewhat like the Unix errno variable:
there is a global indicator (per thread) of the last error that occurred. Most
functions don’t clear this on success, but will set it to indicate the cause of
the error on failure. Most functions also return an error indicator, usually
NULL if they are supposed to return a pointer, or -1 if they return an
integer (exception: the PyArg_*() functions return 1 for success and
0 for failure).
When a function must fail because some function it called failed, it generally
doesn’t set the error indicator; the function it called already set it. It is
responsible for either handling the error and clearing the exception or
returning after cleaning up any resources it holds (such as object references or
memory allocations); it should not continue normally if it is not prepared to
handle the error. If returning due to an error, it is important to indicate to
the caller that an error has been set. If the error is not handled or carefully
propagated, additional calls into the Python/C API may not behave as intended
and may fail in mysterious ways.
The error indicator consists of three Python objects corresponding to the result
of sys.exc_info(). API functions exist to interact with the error indicator
in various ways. There is a separate error indicator for each thread.
Print a standard traceback to sys.stderr and clear the error indicator.
Call this function only when the error indicator is set. (Otherwise it will
cause a fatal error!)
If set_sys_last_vars is nonzero, the variables sys.last_type,
sys.last_value and sys.last_traceback will be set to the
type, value and traceback of the printed exception, respectively.
Test whether the error indicator is set. If set, return the exception type
(the first argument to the last call to one of the PyErr_Set*()
functions or to PyErr_Restore()). If not set, return NULL. You do not
own a reference to the return value, so you do not need to Py_DECREF()
it.
Note
Do not compare the return value to a specific exception; use
PyErr_ExceptionMatches() instead, shown below. (The comparison could
easily fail since the exception may be an instance instead of a class, in the
case of a class exception, or it may be a subclass of the expected exception.)
Equivalent to PyErr_GivenExceptionMatches(PyErr_Occurred(),exc). This
should only be called when an exception is actually set; a memory access
violation will occur if no exception has been raised.
Return true if the given exception matches the exception in exc. If
exc is a class object, this also returns true when given is an instance
of a subclass. If exc is a tuple, all exceptions in the tuple (and
recursively in subtuples) are searched for a match.
Under certain circumstances, the values returned by PyErr_Fetch() below
can be “unnormalized”, meaning that *exc is a class object but *val is
not an instance of the same class. This function can be used to instantiate
the class in that case. If the values are already normalized, nothing happens.
The delayed normalization is implemented to improve performance.
Retrieve the error indicator into three variables whose addresses are passed.
If the error indicator is not set, set all three variables to NULL. If it is
set, it will be cleared and you own a reference to each object retrieved. The
value and traceback object may be NULL even when the type object is not.
Note
This function is normally only used by code that needs to handle exceptions or
by code that needs to save and restore the error indicator temporarily.
Set the error indicator from the three objects. If the error indicator is
already set, it is cleared first. If the objects are NULL, the error
indicator is cleared. Do not pass a NULL type and non-NULL value or
traceback. The exception type should be a class. Do not pass an invalid
exception type or value. (Violating these rules will cause subtle problems
later.) This call takes away a reference to each object: you must own a
reference to each object before the call and after the call you no longer own
these references. (If you don’t understand this, don’t use this function. I
warned you.)
Note
This function is normally only used by code that needs to save and restore the
error indicator temporarily; use PyErr_Fetch() to save the current
exception state.
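A sketch of the save-and-restore pattern (do_something_else() is a
hypothetical call that must run with a clean error indicator):

PyObject *type, *value, *traceback;

PyErr_Fetch(&type, &value, &traceback);  /* clears the indicator; we own the refs */
do_something_else();                     /* hypothetical; may set and clear errors */
PyErr_Restore(type, value, traceback);   /* gives our references back */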
This is the most common way to set the error indicator. The first argument
specifies the exception type; it is normally one of the standard exceptions,
e.g. PyExc_RuntimeError. You need not increment its reference count.
The second argument is an error message; it is decoded from 'utf-8'.
This function sets the error indicator and returns NULL. exception
should be a Python exception class. The format and subsequent
parameters help format the error message; they have the same meaning and
values as in PyUnicode_FromFormat(). format is an ASCII-encoded
string.
This is a shorthand for PyErr_SetString(PyExc_TypeError,message), where
message indicates that a built-in operation was invoked with an illegal
argument. It is mostly for internal use.
This is a shorthand for PyErr_SetNone(PyExc_MemoryError); it returns NULL
so an object allocation function can write return PyErr_NoMemory(); when it
runs out of memory.
This is a convenience function to raise an exception when a C library function
has returned an error and set the C variable errno. It constructs a
tuple object whose first item is the integer errno value and whose
second item is the corresponding error message (gotten from strerror()),
and then calls PyErr_SetObject(type, object). On Unix, when the
errno value is EINTR, indicating an interrupted system call,
this calls PyErr_CheckSignals(), and if that set the error indicator,
leaves it set to that. The function always returns NULL, so a wrapper
function around a system call can write return PyErr_SetFromErrno(type);
when the system call returns an error.
Similar to PyErr_SetFromErrno(), with the additional behavior that if
filename is not NULL, it is passed to the constructor of type as a third
parameter. In the case of exceptions such as IOError and OSError,
this is used to define the filename attribute of the exception instance.
filename is decoded from the filesystem encoding
(sys.getfilesystemencoding()).
This is a convenience function to raise WindowsError. If called with
ierr of 0, the error code returned by a call to GetLastError()
is used instead. It calls the Win32 function FormatMessage() to retrieve
the Windows description of error code given by ierr or GetLastError(),
then it constructs a tuple object whose first item is the ierr value and whose
second item is the corresponding error message (gotten from
FormatMessage()), and then calls PyErr_SetObject(PyExc_WindowsError, object). This function always returns NULL. Availability: Windows.
Similar to PyErr_SetFromWindowsErr(), with the additional behavior that
if filename is not NULL, it is passed to the constructor of
WindowsError as a third parameter. filename is decoded from the
filesystem encoding (sys.getfilesystemencoding()). Availability:
Windows.
PyObject* PyErr_SetExcFromWindowsErrWithFilename(PyObject *type, int ierr, char *filename)¶
void PyErr_SyntaxLocationEx(char *filename, int lineno, int col_offset)¶
Set file, line, and offset information for the current exception. If the
current exception is not a SyntaxError, then it sets additional
attributes, which make the exception printing subsystem think the exception
is a SyntaxError. filename is decoded from the filesystem encoding
(sys.getfilesystemencoding()).
New in version 3.2.
void PyErr_SyntaxLocation(char *filename, int lineno)¶
Like PyErr_SyntaxLocationEx(), but the col_offset parameter is
omitted.
This is a shorthand for PyErr_SetString(PyExc_SystemError,message),
where message indicates that an internal operation (e.g. a Python/C API
function) was invoked with an illegal argument. It is mostly for internal
use.
int PyErr_WarnEx(PyObject *category, char *message, int stack_level)¶
Issue a warning message. The category argument is a warning category (see
below) or NULL; the message argument is a UTF-8 encoded string. stack_level is a
positive number giving a number of stack frames; the warning will be issued from
the currently executing line of code in that stack frame. A stack_level of 1
is the function calling PyErr_WarnEx(), 2 is the function above that,
and so forth.
This function normally prints a warning message to sys.stderr; however, it is
also possible that the user has specified that warnings are to be turned into
errors, and in that case this will raise an exception. It is also possible that
the function raises an exception because of a problem with the warning machinery
(the implementation imports the warnings module to do the heavy lifting).
The return value is 0 if no exception is raised, or -1 if an exception
is raised. (It is not possible to determine whether a warning message is
actually printed, nor what the reason is for the exception; this is
intentional.) If an exception is raised, the caller should do its normal
exception handling (for example, Py_DECREF() owned references and return
an error value).
Warning categories must be subclasses of Warning; the default warning
category is RuntimeWarning. The standard Python warning categories are
available as global variables whose names are PyExc_ followed by the Python
exception name. These have the type PyObject*; they are all class
objects. Their names are PyExc_Warning, PyExc_UserWarning,
PyExc_UnicodeWarning, PyExc_DeprecationWarning,
PyExc_SyntaxWarning, PyExc_RuntimeWarning, and
PyExc_FutureWarning. PyExc_Warning is a subclass of
PyExc_Exception; the other warning categories are subclasses of
PyExc_Warning.
For information about warning control, see the documentation for the
warnings module and the -W option in the command line
documentation. There is no C API for warning control.
int PyErr_WarnExplicit(PyObject *category, const char *message, const char *filename, int lineno, const char *module, PyObject *registry)¶
Issue a warning message with explicit control over all warning attributes. This
is a straightforward wrapper around the Python function
warnings.warn_explicit(), see there for more information. The module
and registry arguments may be set to NULL to get the default effect
described there. message and module are UTF-8 encoded strings,
filename is decoded from the filesystem encoding
(sys.getfilesystemencoding()).
int PyErr_WarnFormat(PyObject *category, Py_ssize_t stack_level, const char *format, ...)¶
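Function similar to PyErr_WarnEx(), but use PyUnicode_FromFormat() to format
the warning message. format is an ASCII-encoded string.
New in version 3.2.
int PyErr_CheckSignals()¶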
This function interacts with Python’s signal handling. It checks whether a
signal has been sent to the processes and if so, invokes the corresponding
signal handler. If the signal module is supported, this can invoke a
signal handler written in Python. In all cases, the default effect for
SIGINT is to raise the KeyboardInterrupt exception. If an
exception is raised the error indicator is set and the function returns -1;
otherwise the function returns 0. The error indicator may or may not be
cleared if it was previously set.
This function simulates the effect of a SIGINT signal arriving — the
next time PyErr_CheckSignals() is called, KeyboardInterrupt will
be raised. It may be called without holding the interpreter lock.
This utility function specifies a file descriptor to which a '\0' byte will
be written whenever a signal is received. It returns the previous such file
descriptor. The value -1 disables the feature; this is the initial state.
This is equivalent to signal.set_wakeup_fd() in Python, but without any
error checking. fd should be a valid file descriptor. The function should
only be called from the main thread.
This utility function creates and returns a new exception class. The name
argument must be the name of the new exception, a C string of the form
module.classname. The base and dict arguments are normally NULL.
This creates a class object derived from Exception (accessible in C as
PyExc_Exception).
The __module__ attribute of the new class is set to the first part (up
to the last dot) of the name argument, and the class name is set to the last
part (after the last dot). The base argument can be used to specify alternate
base classes; it can either be only one class or a tuple of classes. The dict
argument can be used to specify a dictionary of class variables and methods.
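As an illustration, a module initialization function might create and expose
its own exception class roughly like this (SpamError and the module object m
are placeholders):

static PyObject *SpamError;

/* ... in the module initialization function, with module object m ... */
SpamError = PyErr_NewException("spam.error", NULL, NULL);
Py_INCREF(SpamError);                       /* keep a reference of our own */
PyModule_AddObject(m, "error", SpamError);  /* steals a reference to SpamError */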
Same as PyErr_NewException(), except that the new exception class can
easily be given a docstring: If doc is non-NULL, it will be used as the
docstring for the exception class.
This utility function prints a warning message to sys.stderr when an
exception has been set but it is impossible for the interpreter to actually
raise the exception. It is used, for example, when an exception occurs in an
__del__() method.
The function is called with a single argument obj that identifies the context
in which the unraisable exception occurred. The repr of obj will be printed in
the warning message.
Return the traceback associated with the exception as a new reference, as
accessible from Python through __traceback__. If there is no
traceback associated, this returns NULL.
Return the context (another exception instance during whose handling ex was
raised) associated with the exception as a new reference, as accessible from
Python through __context__. If there is no context associated, this
returns NULL.
Set the context associated with the exception to ctx. Use NULL to clear
it. There is no type check to make sure that ctx is an exception instance.
This steals a reference to ctx.
Return the cause (another exception instance set by raise...from...)
associated with the exception as a new reference, as accessible from Python
through __cause__. If there is no cause associated, this returns
NULL.
Set the cause associated with the exception to cause. Use NULL to clear
it. There is no type check to make sure that cause is an exception instance.
This steals a reference to cause.
These two functions provide a way to perform safe recursive calls at the C
level, both in the core and in extension modules. They are needed if the
recursive code does not necessarily invoke Python code (which tracks its
recursion depth automatically).
Marks a point where a recursive C-level call is about to be performed.
If USE_STACKCHECK is defined, this function checks whether the OS
stack overflowed using PyOS_CheckStack(). If this is the case, it
sets a MemoryError and returns a nonzero value.
The function then checks if the recursion limit is reached. If this is the
case, a RuntimeError is set and a nonzero value is returned.
Otherwise, zero is returned.
where should be a string such as "ininstancecheck" to be
concatenated to the RuntimeError message caused by the recursion depth
limit.
Properly implementing tp_repr for container types requires
special recursion handling. In addition to protecting the stack,
tp_repr also needs to track objects to prevent cycles. The
following two functions facilitate this functionality. Effectively,
these are the C equivalent to reprlib.recursive_repr().
Called at the beginning of the tp_repr implementation to
detect cycles.
If the object has already been processed, the function returns a
positive integer. In that case the tp_repr implementation
should return a string object indicating a cycle. As examples,
dict objects return {...} and list objects
return [...].
The function will return a negative integer if the recursion limit
is reached. In that case the tp_repr implementation should
typically return NULL.
Otherwise, the function returns zero and the tp_repr
implementation can continue normally.
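A sketch of how a container type's tp_repr might use this (container_repr and
the ITEM() accessor are hypothetical):

static PyObject *
container_repr(PyObject *self)
{
    PyObject *result;
    int status = Py_ReprEnter(self);

    if (status > 0)              /* self is already being printed: a cycle */
        return PyUnicode_FromString("container(...)");
    if (status < 0)              /* recursion limit reached */
        return NULL;

    result = PyUnicode_FromFormat("container(%R)", ITEM(self));
    Py_ReprLeave(self);          /* balance the successful Py_ReprEnter() */
    return result;
}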
All standard Python exceptions are available as global variables whose names are
PyExc_ followed by the Python exception name. These have the type
PyObject*; they are all class objects. For completeness, here are all
the variables:
The functions in this chapter perform various utility tasks, ranging from
helping C code be more portable across platforms to using Python modules from
C and parsing function arguments and constructing Python values from C values.
int Py_FdIsInteractive(FILE *fp, const char *filename)¶
Return true (nonzero) if the standard I/O file fp with name filename is
deemed interactive. This is the case for files for which isatty(fileno(fp))
is true. If the global flag Py_InteractiveFlag is true, this function
also returns true if the filename pointer is NULL or if the name is equal to
one of the strings '<stdin>' or '???'.
Function to update some internal state after a process fork; this should be
called in the new process if the Python interpreter will continue to be used.
If a new executable is loaded into the new process, this function does not need
to be called.
Return true when the interpreter runs out of stack space. This is a reliable
check, but is only available when USE_STACKCHECK is defined (currently
on Windows using the Microsoft Visual C++ compiler). USE_STACKCHECK
will be defined automatically; you should never change the definition in your
own code.
Return the current signal handler for signal i. This is a thin wrapper around
either sigaction() or signal(). Do not call those functions
directly! PyOS_sighandler_t is a typedef alias for void (*)(int).
PyOS_sighandler_t PyOS_setsig(int i, PyOS_sighandler_t h)¶
Set the signal handler for signal i to be h; return the old signal handler.
This is a thin wrapper around either sigaction() or signal(). Do
not call those functions directly! PyOS_sighandler_t is a typedef
alias for void (*)(int).
These are utility functions that make functionality from the sys module
accessible to C code. They all work with the current interpreter thread’s
sys module’s dict, which is contained in the internal thread state structure.
Set sys.path to a list object of paths found in path which should
be a list of paths separated with the platform’s search path delimiter
(: on Unix, ; on Windows).
Write the output string described by format to sys.stdout. No
exceptions are raised, even if truncation occurs (see below).
format should limit the total size of the formatted output string to
1000 bytes or less – after 1000 bytes, the output string is truncated.
In particular, this means that no unrestricted “%s” formats should occur;
these should be limited using “%.<N>s” where <N> is a decimal number
calculated so that <N> plus the maximum size of other formatted text does not
exceed 1000 bytes. Also watch out for “%f”, which can print hundreds of
digits for very large numbers.
If a problem occurs, or sys.stdout is unset, the formatted message
is written to the real (C level) stdout.
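For example, a bounded conversion keeps the output safely under the limit
(message is a hypothetical C string of unknown length):

PySys_WriteStdout("log: %.500s\n", message);  /* prints at most 500 bytes of message */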
Print a fatal error message and kill the process. No cleanup is performed.
This function should only be invoked when a condition is detected that would
make it dangerous to continue using the Python interpreter; e.g., when the
object administration appears to be corrupted. On Unix, the standard C library
function abort() is called which will attempt to produce a core
file.
Register a cleanup function to be called by Py_Finalize(). The cleanup
function will be called with no arguments and should return no value. At most
32 cleanup functions can be registered. When the registration is successful,
Py_AtExit() returns 0; on failure, it returns -1. The cleanup
function registered last is called first. Each cleanup function will be called
at most once. Since Python’s internal finalization will have completed before
the cleanup function, no Python APIs should be called by func.
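A sketch of registering such a hook (cleanup is a placeholder that
deliberately avoids all Python APIs):

#include <stdio.h>

static void
cleanup(void)
{
    /* Runs after Python's internal finalization: no Python APIs allowed */
    fputs("interpreter shut down\n", stderr);
}

/* ... before calling Py_Finalize() ... */
if (Py_AtExit(cleanup) < 0) {
    /* registration failed: all 32 slots are in use */
}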
This is a simplified interface to PyImport_ImportModuleEx() below,
leaving the globals and locals arguments set to NULL and level set
to 0. When the name
argument contains a dot (when it specifies a submodule of a package), the
fromlist argument is set to the list ['*'] so that the return value is the
named module rather than the top-level package containing it as would otherwise
be the case. (Unfortunately, this has an additional side effect when name in
fact specifies a subpackage instead of a submodule: the submodules specified in
the package’s __all__ variable are loaded.) Return a new reference to the
imported module, or NULL with an exception set on failure. A failing
import of a module doesn’t leave the module in sys.modules.
This version of PyImport_ImportModule() does not block. It’s intended
to be used in C functions that import other modules to execute a function.
The import may block if another thread holds the import lock. The function
PyImport_ImportModuleNoBlock() never blocks. It first tries to fetch
the module from sys.modules and falls back to PyImport_ImportModule()
unless the lock is held, in which case the function will raise an
ImportError.
Import a module. This is best described by referring to the built-in Python
function __import__(), as the standard __import__() function calls
this function directly.
The return value is a new reference to the imported module or top-level
package, or NULL with an exception set on failure. Like for
__import__(), the return value when a submodule of a package was
requested is normally the top-level package, unless a non-empty fromlist
was given.
Import a module. This is best described by referring to the built-in Python
function __import__(), as the standard __import__() function calls
this function directly.
The return value is a new reference to the imported module or top-level package,
or NULL with an exception set on failure. Like for __import__(),
the return value when a submodule of a package was requested is normally the
top-level package, unless a non-empty fromlist was given.
This is a higher-level interface that calls the current “import hook
function” (with an explicit level of 0, meaning absolute import). It
invokes the __import__() function from the __builtins__ of the
current globals. This means that the import is done using whatever import
hooks are installed in the current environment.
Return the module object corresponding to a module name. The name argument
may be of the form package.module. First check the modules dictionary if
there’s one there, and if not, create a new one and insert it in the modules
dictionary. Return NULL with an exception set on failure.
Note
This function does not load or import the module; if the module wasn’t already
loaded, you will get an empty module object. Use PyImport_ImportModule()
or one of its variants to import a module. Package structures implied by a
dotted name for name are not created if not already present.
Given a module name (possibly of the form package.module) and a code object
read from a Python bytecode file or obtained from the built-in function
compile(), load the module. Return a new reference to the module object,
or NULL with an exception set if an error occurred. name
is removed from sys.modules in error cases, even if name was already
in sys.modules on entry to PyImport_ExecCodeModule(). Leaving
incompletely initialized modules in sys.modules is dangerous, as imports of
such modules have no way to know that the module object is in an unknown (and
probably damaged with respect to the module author's intents) state.
The module’s __file__ attribute will be set to the code object’s
co_filename.
This function will reload the module if it was already imported. See
PyImport_ReloadModule() for the intended way to reload a module.
If name points to a dotted name of the form package.module, any package
structures not already created will still not be created.
Like PyImport_ExecCodeModuleEx(), but the __cached__
attribute of the module object is set to cpathname if it is
non-NULL. Of the three functions, this is the preferred one to use.
Return the magic number for Python bytecode files (a.k.a. .pyc and
.pyo files). The magic number should be present in the first four bytes
of the bytecode file, in little-endian byte order.
Return an importer object for a sys.path/pkg.__path__ item
path, possibly by fetching it from the sys.path_importer_cache
dict. If it wasn’t yet cached, traverse sys.path_hooks until a hook
is found that can handle the path item. Return None if no hook could;
this tells our caller it should fall back to the built-in import mechanism.
Cache the result in sys.path_importer_cache. Return a new reference
to the importer object.
Load a frozen module named name. Return 1 for success, 0 if the
module is not found, and -1 with an exception set if the initialization
failed. To access the imported module on a successful load, use
PyImport_ImportModule(). (Note the misnomer — this function would
reload the module if it was already imported.)
This is the structure type definition for frozen module descriptors, as
generated by the freeze utility (see Tools/freeze/ in the
Python source distribution). Its definition, found in Include/import.h,
is:
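struct _frozen {
    char *name;
    unsigned char *code;
    int size;
};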
This pointer is initialized to point to an array of struct _frozen
records, terminated by one whose members are all NULL or zero. When a frozen
module is imported, it is searched in this table. Third-party code could play
tricks with this to provide a dynamically created collection of frozen modules.
int PyImport_AppendInittab(const char *name, PyObject* (*initfunc)(void))¶
Add a single module to the existing table of built-in modules. This is a
convenience wrapper around PyImport_ExtendInittab(), returning -1 if
the table could not be extended. The new module can be imported by the name
name, and uses the function initfunc as the initialization function called
on the first attempted import. This should be called before
Py_Initialize().
Structure describing a single entry in the list of built-in modules. Each of
these structures gives the name and initialization function for a module built
into the interpreter. Programs which embed Python may use an array of these
structures in conjunction with PyImport_ExtendInittab() to provide
additional built-in modules. The structure is defined in
Include/import.h as:
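struct _inittab {
    char *name;
    PyObject* (*initfunc)(void);
};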
int PyImport_ExtendInittab(struct _inittab *newtab)¶
Add a collection of modules to the table of built-in modules. The newtab
array must end with a sentinel entry which contains NULL for the name
field; failure to provide the sentinel value can result in a memory fault.
Returns 0 on success or -1 if insufficient memory could be allocated to
extend the internal table. In the event of failure, no modules are added to the
internal table. This should be called before Py_Initialize().
These routines allow C code to work with serialized objects using the same
data format as the marshal module. There are functions to write data
into the serialization format, and additional functions that can be used to
read the data back. Files used to store marshalled data must be opened in
binary mode.
Numeric values are stored with the least significant byte first.
The module supports several versions of the data format: version 0 is the
historical version; version 1 shares interned strings in the file and upon
unmarshalling; version 2 uses a binary format for floating point numbers.
Py_MARSHAL_VERSION indicates the current file format (currently 2).
void PyMarshal_WriteLongToFile(long value, FILE *file, int version)¶
Marshal a long integer, value, to file. This will only write
the least-significant 32 bits of value; regardless of the size of the
native long type. version indicates the file format.
void PyMarshal_WriteObjectToFile(PyObject *value, FILE *file, int version)¶
Marshal a Python object, value, to file.
version indicates the file format.
PyObject* PyMarshal_WriteObjectToString(PyObject *value, int version)¶
Return a string object containing the marshalled representation of value.
version indicates the file format.
The following functions allow marshalled values to be read back in.
A note on error detection: reading past the end of the file appears always to
result in a negative numeric value (where that is relevant), but it is not
clear that a legitimately negative value can be distinguished from an error.
A conservative approach is to write only non-negative values using these
routines.
long PyMarshal_ReadLongFromFile(FILE *file)¶
Return a C long from the data stream in a FILE* opened
for reading. Only a 32-bit value can be read in using this function,
regardless of the native size of long.
int PyMarshal_ReadShortFromFile(FILE *file)¶
Return a C short from the data stream in a FILE* opened
for reading. Only a 16-bit value can be read in using this function,
regardless of the native size of short.
PyObject* PyMarshal_ReadObjectFromFile(FILE *file)¶
Return a Python object from the data stream in a FILE* opened for
reading. On error, sets the appropriate exception (EOFError or
TypeError) and returns NULL.
PyObject* PyMarshal_ReadLastObjectFromFile(FILE *file)¶
Return a Python object from the data stream in a FILE* opened for
reading. Unlike PyMarshal_ReadObjectFromFile(), this function
assumes that no further objects will be read from the file, allowing it to
aggressively load file data into memory so that the de-serialization can
operate from data in memory rather than reading a byte at a time from the
file. Only use this variant if you are certain that you won’t be reading
anything else from the file. On error, sets the appropriate exception
(EOFError or TypeError) and returns NULL.
PyObject* PyMarshal_ReadObjectFromString(char *string, Py_ssize_t len)¶
Return a Python object from the data stream in a character buffer
containing len bytes pointed to by string. On error, sets the
appropriate exception (EOFError or TypeError) and returns
NULL.
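A minimal sketch of a round trip through the in-memory interface (the helper
name marshal_roundtrip is illustrative; error handling is abbreviated):
#include <Python.h>
#include <marshal.h>

/* Serialize obj to bytes and deserialize it again. */
static PyObject *
marshal_roundtrip(PyObject *obj)
{
    PyObject *data, *copy = NULL;
    char *buf;
    Py_ssize_t len;

    data = PyMarshal_WriteObjectToString(obj, Py_MARSHAL_VERSION);
    if (data == NULL)
        return NULL;
    if (PyBytes_AsStringAndSize(data, &buf, &len) == 0)
        copy = PyMarshal_ReadObjectFromString(buf, len);
    Py_DECREF(data);
    return copy;        /* new reference, or NULL with an exception set */
}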
These functions are useful when creating your own extension functions and
methods. Additional information and examples are available in
Extending and Embedding the Python Interpreter.
The first three of these functions described, PyArg_ParseTuple(),
PyArg_ParseTupleAndKeywords(), and PyArg_Parse(), all use format
strings which are used to tell the function about the expected arguments. The
format strings use the same syntax for each of these functions.
A format string consists of zero or more “format units.” A format unit
describes one Python object; it is usually a single character or a parenthesized
sequence of format units. With a few exceptions, a format unit that is not a
parenthesized sequence normally corresponds to a single address argument to
these functions. In the following description, the quoted form is the format
unit; the entry in (round) parentheses is the Python object type that matches
the format unit; and the entry in [square] brackets is the type of the C
variable(s) whose address should be passed.
These formats allow accessing an object as a contiguous chunk of memory.
You don’t have to provide raw storage for the returned unicode or bytes
area. Also, you won’t have to release any memory yourself, except with the
es, es#, et and et# formats.
However, when a Py_buffer structure gets filled, the underlying
buffer is locked so that the caller can subsequently use the buffer even
inside a Py_BEGIN_ALLOW_THREADS block without the risk of mutable data
being resized or destroyed. As a result, you have to call
PyBuffer_Release() after you have finished processing the data (or
in any early abort case).
Unless otherwise stated, buffers are not NUL-terminated.
Note
For all # variants of formats (s#, y#, etc.), the type of
the length argument (int or Py_ssize_t) is controlled by
defining the macro PY_SSIZE_T_CLEAN before including
Python.h. If the macro was defined, length is a
Py_ssize_t rather than an int. This behavior will change
in a future Python version to only support Py_ssize_t and
drop int support. It is best to always define PY_SSIZE_T_CLEAN.
s (str) [const char *]
Convert a Unicode object to a C pointer to a character string.
A pointer to an existing string is stored in the character pointer
variable whose address you pass. The C string is NUL-terminated.
The Python string must not contain embedded NUL bytes; if it does,
a TypeError exception is raised. Unicode objects are converted
to C strings using 'utf-8' encoding. If this conversion fails, a
UnicodeError is raised.
Note
This format does not accept bytes-like objects. If you want to accept
filesystem paths and convert them to C character strings, it is
preferable to use the O& format with PyUnicode_FSConverter()
as converter.
s* (str, bytes, bytearray or buffer compatible object) [Py_buffer]
This format accepts Unicode objects as well as objects supporting the
buffer protocol.
It fills a Py_buffer structure provided by the caller.
In this case the resulting C string may contain embedded NUL bytes.
Unicode objects are converted to C strings using 'utf-8' encoding.
s# (str, bytes or read-only buffer compatible object) [const char *, int or Py_ssize_t]
Like s*, except that it doesn’t accept mutable buffer-like objects
such as bytearray. The result is stored into two C variables,
the first one a pointer to a C string, the second one its length.
The string may contain embedded null bytes. Unicode objects are converted
to C strings using 'utf-8' encoding.
y (bytes) [const char *]
This format converts a bytes-like object to a C pointer to a character
string; it does not accept Unicode objects. The bytes buffer must not
contain embedded NUL bytes; if it does, a TypeError
exception is raised.
y* (bytes, bytearray or buffer compatible object) [Py_buffer]
This variant on s* doesn’t accept Unicode objects, only objects
supporting the buffer protocol. This is the recommended way to accept
binary data.
S (bytes) [PyBytesObject *]
Requires that the Python object is a bytes object, without
attempting any conversion. Raises TypeError if the object is not
a bytes object. The C variable may also be declared as PyObject*.
Y (bytearray) [PyByteArrayObject *]
Requires that the Python object is a bytearray object, without
attempting any conversion. Raises TypeError if the object is not
a bytearray object. The C variable may also be declared as PyObject*.
u (str) [Py_UNICODE *]
Convert a Python Unicode object to a C pointer to a NUL-terminated buffer of
Unicode characters. You must pass the address of a Py_UNICODE
pointer variable, which will be filled with the pointer to an existing
Unicode buffer. Please note that the width of a Py_UNICODE
character depends on compilation options (it is either 16 or 32 bits).
The Python string must not contain embedded NUL characters; if it does,
a TypeError exception is raised.
Note
Since u doesn’t give you back the length of the string, and it
may contain embedded NUL characters, it is recommended to use u#
or U instead.
U (str) [PyObject *]
Requires that the Python object is a Unicode object, without attempting
any conversion. Raises TypeError if the object is not a Unicode
object. The C variable may also be declared as PyObject*.
w* (bytearray or read-write byte-oriented buffer) [Py_buffer]
This format accepts any object which implements the read-write buffer
interface. It fills a Py_buffer structure provided by the caller.
The buffer may contain embedded null bytes. The caller has to call
PyBuffer_Release() when it is done with the buffer.
es (str) [const char *encoding, char **buffer]
This variant on s is used for encoding Unicode into a character buffer.
It only works for encoded data without embedded NUL bytes.
This format requires two arguments. The first is only used as input, and
must be a const char* which points to the name of an encoding as a
NUL-terminated string, or NULL, in which case 'utf-8' encoding is used.
An exception is raised if the named encoding is not known to Python. The
second argument must be a char**; the value of the pointer it
references will be set to a buffer with the contents of the argument text.
The text will be encoded in the encoding specified by the first argument.
PyArg_ParseTuple() will allocate a buffer of the needed size, copy the
encoded data into this buffer and adjust *buffer to reference the newly
allocated storage. The caller is responsible for calling PyMem_Free() to
free the allocated buffer after use.
et (str, bytes or bytearray) [const char *encoding, char **buffer]
Same as es except that byte string objects are passed through without
recoding them. Instead, the implementation assumes that the byte string object uses
the encoding passed in as parameter.
es# (str) [const char *encoding, char **buffer, int *buffer_length]
This variant on s# is used for encoding Unicode into a character buffer.
Unlike the es format, this variant allows input data which contains NUL
characters.
It requires three arguments. The first is only used as input, and must be a
const char* which points to the name of an encoding as a
NUL-terminated string, or NULL, in which case 'utf-8' encoding is used.
An exception is raised if the named encoding is not known to Python. The
second argument must be a char**; the value of the pointer it
references will be set to a buffer with the contents of the argument text.
The text will be encoded in the encoding specified by the first argument.
The third argument must be a pointer to an integer; the referenced integer
will be set to the number of bytes in the output buffer.
There are two modes of operation:
If *buffer points to a NULL pointer, the function will allocate a buffer of
the needed size, copy the encoded data into this buffer and set *buffer to
reference the newly allocated storage. The caller is responsible for calling
PyMem_Free() to free the allocated buffer after usage.
If *buffer points to a non-NULL pointer (an already allocated buffer),
PyArg_ParseTuple() will use this location as the buffer and interpret the
initial value of *buffer_length as the buffer size. It will then copy the
encoded data into the buffer and NUL-terminate it. If the buffer is not large
enough, a ValueError will be set.
In both cases, *buffer_length is set to the length of the encoded data
without the trailing NUL byte.
et# (str, bytes or bytearray) [const char *encoding, char **buffer, int *buffer_length]
Same as es# except that byte string objects are passed through without recoding
them. Instead, the implementation assumes that the byte string object uses the
encoding passed in as parameter.
K (int) [unsigned PY_LONG_LONG]
Convert a Python integer to a C unsigned long long
without overflow checking. This format is only available on platforms that
support unsigned long long (or unsigned __int64 on Windows).
O (object) [PyObject *]
Store a Python object (without any conversion) in a C object pointer. The C
program thus receives the actual object that was passed. The object’s reference
count is not increased. The pointer stored is not NULL.
O! (object) [typeobject, PyObject *]
Store a Python object in a C object pointer. This is similar to O, but
takes two C arguments: the first is the address of a Python type object, the
second is the address of the C variable (of type PyObject*) into which
the object pointer is stored. If the Python object does not have the required
type, TypeError is raised.
O& (object) [converter, anything]
Convert a Python object to a C variable through a converter function. This
takes two arguments: the first is a function, the second is the address of a C
variable (of arbitrary type), converted to void*. The converter
function in turn is called as follows:
status = converter(object, address);
where object is the Python object to be converted and address is the
void* argument that was passed to the PyArg_Parse*() function.
The returned status should be 1 for a successful conversion and 0 if
the conversion has failed. When the conversion fails, the converter function
should raise an exception and leave the content of address unmodified.
If the converter returns Py_CLEANUP_SUPPORTED, it may get called a
second time if the argument parsing eventually fails, giving the converter a
chance to release any memory that it had already allocated. In this second
call, the object parameter will be NULL; address will have the same value
as in the original call.
Changed in version 3.1: Py_CLEANUP_SUPPORTED was added.
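As a sketch, a converter that accepts only positive numbers could look like
this (the function name and the range check are illustrative):
/* Converter for the O& format: store a positive C double at address. */
static int
positive_double(PyObject *object, void *address)
{
    double value = PyFloat_AsDouble(object);
    if (value == -1.0 && PyErr_Occurred())
        return 0;                   /* failure; exception already set */
    if (value <= 0.0) {
        PyErr_SetString(PyExc_ValueError, "expected a positive number");
        return 0;
    }
    *(double *)address = value;
    return 1;                       /* success */
}

/* Typical use:
       double d;
       if (!PyArg_ParseTuple(args, "O&", positive_double, &d))
           return NULL;
*/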
(items) (tuple) [matching-items]
The object must be a Python sequence whose length is the number of format units
in items. The C arguments must correspond to the individual format units in
items. Format units for sequences may be nested.
It is possible to pass “long” integers (integers whose value exceeds the
platform’s LONG_MAX) however no proper range checking is done — the
most significant bits are silently truncated when the receiving field is too
small to receive the value (actually, the semantics are inherited from downcasts
in C — your mileage may vary).
A few other characters have a meaning in a format string. These may not occur
inside nested parentheses. They are:
|
Indicates that the remaining arguments in the Python argument list are optional.
The C variables corresponding to optional arguments should be initialized to
their default value — when an optional argument is not specified,
PyArg_ParseTuple() does not touch the contents of the corresponding C
variable(s).
:
The list of format units ends here; the string after the colon is used as the
function name in error messages (the “associated value” of the exception that
PyArg_ParseTuple() raises).
;
The list of format units ends here; the string after the semicolon is used as
the error message instead of the default error message. : and ;
mutually exclude each other.
Note that any Python object references which are provided to the caller are
borrowed references; do not decrement their reference count!
Additional arguments passed to these functions must be addresses of variables
whose type is determined by the format string; these are used to store values
from the input tuple. There are a few cases, as described in the list of format
units above, where these parameters are used as input values; they should match
what is specified for the corresponding format unit in that case.
For the conversion to succeed, the arg object must match the format
and the format must be exhausted. On success, the
PyArg_Parse*() functions return true, otherwise they return
false and raise an appropriate exception. When the
PyArg_Parse*() functions fail due to conversion failure in one
of the format units, the variables at the addresses corresponding to that
and the following format units are left untouched.
int PyArg_ParseTuple(PyObject *args, const char *format, ...)¶
Parse the parameters of a function that takes only positional parameters into
local variables. Returns true on success; on failure, it returns false and
raises the appropriate exception.
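For example, a hypothetical module function taking a required string and an
optional int might parse its arguments like this (the names repeat, text and
count are illustrative):
static PyObject *
example_repeat(PyObject *self, PyObject *args)
{
    const char *text;
    int count = 1;      /* default, used when the optional argument is absent */

    /* "s" -> const char *; "|" starts the optional arguments;
       "i" -> int; ":repeat" names the function in error messages. */
    if (!PyArg_ParseTuple(args, "s|i:repeat", &text, &count))
        return NULL;
    return PyUnicode_FromFormat("%s repeated %d times", text, count);
}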
int PyArg_VaParse(PyObject *args, const char *format, va_list vargs)¶
Identical to PyArg_ParseTuple(), except that it accepts a va_list rather
than a variable number of arguments.
int PyArg_ParseTupleAndKeywords(PyObject *args, PyObject *kw, const char *format, char *keywords[], ...)¶
Parse the parameters of a function that takes both positional and keyword
parameters into local variables. Returns true on success; on failure, it
returns false and raises the appropriate exception.
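A sketch of keyword parsing (all names are illustrative; such a function would
be registered with METH_VARARGS | METH_KEYWORDS):
static PyObject *
example_connect(PyObject *self, PyObject *args, PyObject *kwargs)
{
    static char *kwlist[] = {"host", "port", "timeout", NULL};
    const char *host;
    int port = 80;              /* defaults for the optional arguments */
    double timeout = 30.0;

    if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s|id:connect",
                                     kwlist, &host, &port, &timeout))
        return NULL;
    /* ... open the connection here ... */
    Py_RETURN_NONE;
}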
Ensure that the keys in the keywords argument dictionary are strings. This
is only needed if PyArg_ParseTupleAndKeywords() is not used, since the
latter already does this check.
New in version 3.2.
int PyArg_Parse(PyObject *args, const char *format, ...)¶
Function used to deconstruct the argument lists of “old-style” functions —
these are functions which use the METH_OLDARGS parameter parsing
method. This is not recommended for use in parameter parsing in new code, and
most code in the standard interpreter has been modified to no longer use this
for that purpose. It does remain a convenient way to decompose other tuples,
however, and may continue to be used for that purpose.
int PyArg_UnpackTuple(PyObject *args, const char *name, Py_ssize_t min, Py_ssize_t max, ...)¶
A simpler form of parameter retrieval which does not use a format string to
specify the types of the arguments. Functions which use this method to retrieve
their parameters should be declared as METH_VARARGS in function or
method tables. The tuple containing the actual parameters should be passed as
args; it must actually be a tuple. The length of the tuple must be at least
min and no more than max; min and max may be equal. Additional
arguments must be passed to the function, each of which should be a pointer to a
PyObject* variable; these will be filled in with the values from
args; they will contain borrowed references. The variables which correspond
to optional parameters not given by args will not be filled in; these should
be initialized by the caller. This function returns true on success and false if
args is not a tuple or contains the wrong number of elements; an exception
will be set if there was a failure.
This is an example of the use of this function, taken from the sources for the
_weakref helper module for weak references:
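static PyObject *
weakref_ref(PyObject *self, PyObject *args)
{
    PyObject *object;
    PyObject *callback = NULL;
    PyObject *result = NULL;

    if (PyArg_UnpackTuple(args, "ref", 1, 2, &object, &callback)) {
        result = PyWeakref_NewRef(object, callback);
    }
    return result;
}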
PyObject* Py_BuildValue(const char *format, ...)¶
Create a new value based on a format string similar to those accepted by the
PyArg_Parse*() family of functions and a sequence of values. Returns
the value or NULL in the case of an error; an exception will be raised if
NULL is returned.
Py_BuildValue() does not always build a tuple. It builds a tuple only if
its format string contains two or more format units. If the format string is
empty, it returns None; if it contains exactly one format unit, it returns
whatever object is described by that format unit. To force it to return a tuple
of size 0 or one, parenthesize the format string.
When memory buffers are passed as parameters to supply data to build objects, as
for the s and s# formats, the required data is copied. Buffers provided
by the caller are never referenced by the objects created by
Py_BuildValue(). In other words, if your code invokes malloc()
and passes the allocated memory to Py_BuildValue(), your code is
responsible for calling free() for that memory once
Py_BuildValue() returns.
In the following description, the quoted form is the format unit; the entry in
(round) parentheses is the Python object type that the format unit will return;
and the entry in [square] brackets is the type of the C value(s) to be passed.
The characters space, tab, colon and comma are ignored in format strings (but
not within format units such as s#). This can be used to make long format
strings a tad more readable.
Convert a C string and its length to a Python str object using 'utf-8'
encoding. If the C string pointer is NULL, the length is ignored and
None is returned.
Convert a Unicode (UCS-2 or UCS-4) data buffer and its length to a Python
Unicode object. If the Unicode buffer pointer is NULL, the length is ignored
and None is returned.
Convert a C Py_complex structure to a Python complex number.
O (object) [PyObject *]
Pass a Python object untouched (except for its reference count, which is
incremented by one). If the object passed in is a NULL pointer, it is assumed
that this was caused because the call producing the argument found an error and
set an exception. Therefore, Py_BuildValue() will return NULL but won’t
raise an exception. If no exception has been raised yet, SystemError is
set.
S (object) [PyObject *]
Same as O.
N (object) [PyObject *]
Same as O, except it doesn’t increment the reference count on the object.
Useful when the object is created by a call to an object constructor in the
argument list.
O& (object) [converter, anything]
Convert anything to a Python object through a converter function. The
function is called with anything (which should be compatible with
void*) as its argument and should return a “new” Python object, or
NULL if an error occurred.
Convert a sequence of C values to a Python dictionary. Each pair of consecutive
C values adds one item to the dictionary, serving as key and value,
respectively.
If there is an error in the format string, the SystemError exception is
set and NULL returned.
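A few illustrative calls, each returning a new reference (the call on the
left, the resulting Python value on the right):
Py_BuildValue("")                        None
Py_BuildValue("i", 123)                  123
Py_BuildValue("iii", 123, 456, 789)      (123, 456, 789)
Py_BuildValue("s", "hello")              'hello'
Py_BuildValue("ss", "hello", "world")    ('hello', 'world')
Py_BuildValue("()")                      ()
Py_BuildValue("(i)", 123)                (123,)
Py_BuildValue("[i,i]", 123, 456)         [123, 456]
Py_BuildValue("{s:i,s:i}", "abc", 123, "def", 456)
                                         {'abc': 123, 'def': 456}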
Functions for number conversion and formatted string output.
int PyOS_snprintf(char *str, size_t size, const char *format, ...)¶
Output not more than size bytes to str according to the format string
format and the extra arguments. See the Unix man page snprintf(2).
int PyOS_vsnprintf(char *str, size_t size, const char *format, va_list va)¶
Output not more than size bytes to str according to the format string
format and the variable argument list va. Unix man page
vsnprintf(2).
PyOS_snprintf() and PyOS_vsnprintf() wrap the Standard C library
functions snprintf() and vsnprintf(). Their purpose is to
guarantee consistent behavior in corner cases, which the Standard C functions do
not.
The wrappers ensure that str[size-1] is always '\0' upon return. They
never write more than size bytes (including the trailing '\0') into str.
Both functions require that str != NULL, size > 0 and format != NULL.
If the platform doesn’t have vsnprintf() and the buffer size needed to
avoid truncation exceeds size by more than 512 bytes, Python aborts with a
Py_FatalError.
The return value (rv) for these functions should be interpreted as follows:
When 0 <= rv < size, the output conversion was successful and rv
characters were written to str (excluding the trailing '\0' byte at
str[rv]).
When rv >= size, the output conversion was truncated and a buffer with
rv + 1 bytes would have been needed to succeed. str[size-1] is '\0'
in this case.
When rv < 0, “something bad happened.” str[size-1] is '\0' in
this case too, but the rest of str is undefined. The exact cause of the error
depends on the underlying platform.
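A minimal sketch of checking the three documented outcomes:
char buf[16];
int rv = PyOS_snprintf(buf, sizeof(buf), "x = %d", 42);
if (rv < 0) {
    /* hard error: buf[sizeof(buf)-1] is '\0', the rest is undefined */
}
else if ((size_t)rv >= sizeof(buf)) {
    /* truncated: rv + 1 bytes would have been needed */
}
else {
    /* success: buf holds rv characters plus the trailing '\0' */
}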
The following functions provide locale-independent string to number conversions.
double PyOS_string_to_double(const char *s, char **endptr, PyObject *overflow_exception)¶
Convert a string s to a double, raising a Python
exception on failure. The set of accepted strings corresponds to
the set of strings accepted by Python’s float() constructor,
except that s must not have leading or trailing whitespace.
The conversion is independent of the current locale.
If endptr is NULL, convert the whole string. Raise
ValueError and return -1.0 if the string is not a valid
representation of a floating-point number.
If endptr is not NULL, convert as much of the string as
possible and set *endptr to point to the first unconverted
character. If no initial segment of the string is the valid
representation of a floating-point number, set *endptr to point
to the beginning of the string, raise ValueError, and return
-1.0.
If s represents a value that is too large to store in a float
(for example, "1e500" is such a string on many platforms) then
if overflow_exception is NULL return Py_HUGE_VAL (with
an appropriate sign) and don’t set any exception. Otherwise,
overflow_exception must point to a Python exception object;
raise that exception and return -1.0. In both cases, set
*endptr to point to the first character after the converted value.
If any other error occurs during the conversion (for example an
out-of-memory error), set the appropriate Python exception and
return -1.0.
New in version 3.1.
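A minimal sketch of a strict whole-string conversion:
/* Convert "3.14", requiring that the entire string be consumed. */
double v = PyOS_string_to_double("3.14", NULL, PyExc_OverflowError);
if (v == -1.0 && PyErr_Occurred()) {
    /* ValueError, OverflowError or another exception is set */
}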
char* PyOS_double_to_string(double val, char format_code, int precision, int flags, int *ptype)¶
Convert a double val to a string using the supplied
format_code, precision, and flags.
format_code must be one of 'e', 'E', 'f', 'F',
'g', 'G' or 'r'. For 'r', the supplied precision
must be 0 and is ignored. The 'r' format code specifies the
standard repr() format.
flags can be zero or more of the values Py_DTSF_SIGN,
Py_DTSF_ADD_DOT_0, or Py_DTSF_ALT, or-ed together:
Py_DTSF_SIGN means to always precede the returned string with a sign
character, even if val is non-negative.
Py_DTSF_ADD_DOT_0 means to ensure that the returned string will not look
like an integer.
Py_DTSF_ALT means to apply “alternate” formatting rules. See the
documentation for the PyOS_snprintf() '#' specifier for
details.
If ptype is non-NULL, then the value it points to will be set to one of
Py_DTST_FINITE, Py_DTST_INFINITE, or Py_DTST_NAN, signifying that
val is a finite number, an infinite number, or not a number, respectively.
The return value is a pointer to a buffer with the converted string or
NULL if the conversion failed. The caller is responsible for freeing the
returned string by calling PyMem_Free().
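A short sketch (the format code and values are illustrative):
int kind;
char *s = PyOS_double_to_string(1234.5678, 'g', 6, Py_DTSF_SIGN, &kind);
if (s != NULL) {
    /* kind is Py_DTST_FINITE here; s might be "+1234.57" */
    ...
    PyMem_Free(s);      /* the caller owns the returned string */
}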
Return a description string, depending on the type of func.
Return values include “()” for functions and methods, ” constructor”,
” instance”, and ” object”. Concatenated with the result of
PyEval_GetFuncName(), the result will be a description of
func.
PyObject* PyCodec_Encode(PyObject *object, const char *encoding, const char *errors)¶
object is passed through the encoder function found for the given
encoding using the error handling method defined by errors. errors may
be NULL to use the default method defined for the codec. Raises a
LookupError if no encoder can be found.
PyObject* PyCodec_Decode(PyObject *object, const char *encoding, const char *errors)¶
object is passed through the decoder function found for the given
encoding using the error handling method defined by errors. errors may
be NULL to use the default method defined for the codec. Raises a
LookupError if no decoder can be found.
In the following functions, the encoding string is first converted to all
lower-case characters before being looked up, which makes encodings looked up
through this mechanism effectively case-insensitive. If no codec is found, a
KeyError is set and NULL returned.
int PyCodec_RegisterError(const char *name, PyObject *error)¶
Register the error handling callback function error under the given name.
This callback function will be called by a codec when it encounters
unencodable characters/undecodable bytes and name is specified as the error
parameter in the call to the encode/decode function.
The callback gets a single argument, an instance of
UnicodeEncodeError, UnicodeDecodeError or
UnicodeTranslateError that holds information about the problematic
sequence of characters or bytes and their offset in the original string (see
Unicode Exception Objects for functions to extract this information). The
callback must either raise the given exception, or return a two-item tuple
containing the replacement for the problematic sequence, and an integer
giving the offset in the original string at which encoding/decoding should be
resumed.
PyObject* PyCodec_LookupError(const char *name)¶
Look up the error handling callback function registered under name. As a
special case NULL can be passed, in which case the error handling callback
for “strict” will be returned.
The functions in this chapter interact with Python objects regardless of their
type, or with wide classes of object types (e.g. all numerical types, or all
sequence types). When used on object types for which they do not apply, they
will raise a Python exception.
It is not possible to use these functions on objects that are not properly
initialized, such as a list object that has been created by PyList_New(),
but whose items have not been set to some non-NULL value yet.
int PyObject_Print(PyObject *o, FILE *fp, int flags)¶
Print an object o, on file fp. Returns -1 on error. The flags argument
is used to enable certain printing options. The only option currently supported
is Py_PRINT_RAW; if given, the str() of the object is written
instead of the repr().
int PyObject_HasAttr(PyObject *o, PyObject *attr_name)¶
Returns 1 if o has the attribute attr_name, and 0 otherwise. This
is equivalent to the Python expression hasattr(o, attr_name). This function
always succeeds.
int PyObject_HasAttrString(PyObject *o, const char *attr_name)¶
Returns 1 if o has the attribute attr_name, and 0 otherwise. This
is equivalent to the Python expression hasattr(o, attr_name). This function
always succeeds.
PyObject* PyObject_GetAttr(PyObject *o, PyObject *attr_name)¶
Retrieve an attribute named attr_name from object o. Returns the attribute
value on success, or NULL on failure. This is the equivalent of the Python
expression o.attr_name.
PyObject* PyObject_GetAttrString(PyObject *o, const char *attr_name)¶
Retrieve an attribute named attr_name from object o. Returns the attribute
value on success, or NULL on failure. This is the equivalent of the Python
expression o.attr_name.
Generic attribute getter function that is meant to be put into a type
object’s tp_getattro slot. It looks for a descriptor in the dictionary
of classes in the object’s MRO as well as an attribute in the object’s
__dict__ (if present). As outlined in 实现描述符, data
descriptors take preference over instance attributes, while non-data
descriptors don’t. Otherwise, an AttributeError is raised.
int PyObject_SetAttr(PyObject *o, PyObject *attr_name, PyObject *v)¶
Set the value of the attribute named attr_name, for object o, to the value
v. Returns -1 on failure. This is the equivalent of the Python statement
o.attr_name = v.
int PyObject_SetAttrString(PyObject *o, const char *attr_name, PyObject *v)¶
Set the value of the attribute named attr_name, for object o, to the value
v. Returns -1 on failure. This is the equivalent of the Python statement
o.attr_name = v.
Generic attribute setter function that is meant to be put into a type
object’s tp_setattro slot. It looks for a data descriptor in the
dictionary of classes in the object’s MRO, and if found it takes preference
over setting the attribute in the instance dictionary. Otherwise, the
attribute is set in the object’s __dict__ (if present); failing that,
an AttributeError is raised and -1 is returned.
PyObject* PyObject_RichCompare(PyObject *o1, PyObject *o2, int opid)¶
Compare the values of o1 and o2 using the operation specified by opid,
which must be one of Py_LT, Py_LE, Py_EQ,
Py_NE, Py_GT, or Py_GE, corresponding to <,
<=, ==, !=, >, or >= respectively. This is the equivalent of
the Python expression o1 op o2, where op is the operator corresponding
to opid. Returns the value of the comparison on success, or NULL on failure.
int PyObject_RichCompareBool(PyObject *o1, PyObject *o2, int opid)¶
Compare the values of o1 and o2 using the operation specified by opid,
which must be one of Py_LT, Py_LE, Py_EQ,
Py_NE, Py_GT, or Py_GE, corresponding to <,
<=, ==, !=, >, or >= respectively. Returns -1 on error,
0 if the result is false, 1 otherwise. This is the equivalent of the
Python expression o1 op o2, where op is the operator corresponding to
opid.
Note
If o1 and o2 are the same object, PyObject_RichCompareBool()
will always return 1 for Py_EQ and 0 for Py_NE.
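A minimal sketch of an equality test, assuming item and target are objects
owned elsewhere:
int eq = PyObject_RichCompareBool(item, target, Py_EQ);
if (eq < 0) {
    /* error; exception set */
}
else if (eq) {
    /* item == target */
}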
Compute a string representation of object o. Returns the string
representation on success, NULL on failure. This is the equivalent of the
Python expression repr(o). Called by the repr() built-in function.
As PyObject_Repr(), compute a string representation of object o, but
escape the non-ASCII characters in the string returned by
PyObject_Repr() with \x, \u or \U escapes. This generates
a string similar to that returned by PyObject_Repr() in Python 2.
Called by the ascii() built-in function.
Compute a string representation of object o. Returns the string
representation on success, NULL on failure. This is the equivalent of the
Python expression str(o). Called by the str() built-in function
and, therefore, by the print() function.
Compute a bytes representation of object o. NULL is returned on
failure and a bytes object on success. This is equivalent to the Python
expression bytes(o), when o is not an integer. Unlike bytes(o),
a TypeError is raised when o is an integer instead of a zero-initialized
bytes object.
Returns 1 if inst is an instance of the class cls or a subclass of
cls, or 0 if not. On error, returns -1 and sets an exception. If
cls is a type object rather than a class object, PyObject_IsInstance()
returns 1 if inst is of type cls. If cls is a tuple, the check will
be done against every entry in cls. The result will be 1 when at least one
of the checks returns 1, otherwise it will be 0. If inst is not a
class instance and cls is neither a type object, nor a class object, nor a
tuple, inst must have a __class__ attribute — the class relationship
of the value of that attribute with cls will be used to determine the result
of this function.
Subclass determination is done in a fairly straightforward way, but includes a
wrinkle that implementors of extensions to the class system may want to be aware
of. If A and B are class objects, B is a subclass of
A if it inherits from A either directly or indirectly. If
either is not a class object, a more general mechanism is used to determine the
class relationship of the two objects. When testing if B is a subclass of
A, if A is B, PyObject_IsSubclass() returns true. If A and B
are different objects, B’s __bases__ attribute is searched in a
depth-first fashion for A — the presence of the __bases__ attribute
is considered sufficient for this determination.
Returns 1 if the class derived is identical to or derived from the class
cls, otherwise returns 0. In case of an error, returns -1. If cls
is a tuple, the check will be done against every entry in cls. The result will
be 1 when at least one of the checks returns 1, otherwise it will be
0. If either derived or cls is not an actual class object (or tuple),
this function uses the generic algorithm described above.
PyObject* PyObject_Call(PyObject *callable_object, PyObject *args, PyObject *kw)¶
Call a callable Python object callable_object, with arguments given by the
tuple args, and named arguments given by the dictionary kw. If no named
arguments are needed, kw may be NULL. args must not be NULL, use an
empty tuple if no arguments are needed. Returns the result of the call on
success, or NULL on failure. This is the equivalent of the Python expression
callable_object(*args,**kw).
PyObject* PyObject_CallObject(PyObject *callable_object, PyObject *args)¶
Call a callable Python object callable_object, with arguments given by the
tuple args. If no arguments are needed, then args may be NULL. Returns
the result of the call on success, or NULL on failure. This is the equivalent
of the Python expression callable_object(*args).
PyObject* PyObject_CallFunction(PyObject *callable, char *format, ...)¶
Call a callable Python object callable, with a variable number of C arguments.
The C arguments are described using a Py_BuildValue() style format
string. The format may be NULL, indicating that no arguments are provided.
Returns the result of the call on success, or NULL on failure. This is the
equivalent of the Python expression callable(*args). Note that if you only
pass PyObject* args, PyObject_CallFunctionObjArgs() is a
faster alternative.
PyObject* PyObject_CallMethod(PyObject *o, char *method, char *format, ...)¶
Call the method named method of object o with a variable number of C
arguments. The C arguments are described by a Py_BuildValue() format
string that should produce a tuple. The format may be NULL, indicating that
no arguments are provided. Returns the result of the call on success, or NULL
on failure. This is the equivalent of the Python expression o.method(args).
Note that if you only pass PyObject* args,
PyObject_CallMethodObjArgs() is a faster alternative.
PyObject* PyObject_CallFunctionObjArgs(PyObject *callable, ..., NULL)¶
Call a callable Python object callable, with a variable number of
PyObject* arguments. The arguments are provided as a variable number
of parameters followed by NULL. Returns the result of the call on success, or
NULL on failure.
PyObject* PyObject_CallMethodObjArgs(PyObject *o, PyObject *name, ..., NULL)¶
Calls a method of the object o, where the name of the method is given as a
Python string object in name. It is called with a variable number of
PyObject* arguments. The arguments are provided as a variable number
of parameters followed by NULL. Returns the result of the call on success, or
NULL on failure.
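For instance, the C equivalent of f.seek(0, 0) might look like this sketch,
where f is assumed to be a file-like Python object obtained elsewhere:
PyObject *result = PyObject_CallMethod(f, "seek", "(ii)", 0, 0);
if (result == NULL) {
    /* exception set by the call */
}
else {
    Py_DECREF(result);      /* discard the returned None */
}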
Set a TypeError indicating that type(o) is not hashable and return -1.
This function receives special treatment when stored in a tp_hash slot,
allowing a type to explicitly indicate to the interpreter that it is not
hashable.
When o is non-NULL, returns a type object corresponding to the object type
of object o. On failure, raises SystemError and returns NULL. This
is equivalent to the Python expression type(o). This function increments the
reference count of the return value. There’s really no reason to use this
function instead of the common expression o->ob_type, which returns a
pointer of type PyTypeObject*, except when the incremented reference
count is needed.
Return the length of object o. If the object o provides either the sequence
and mapping protocols, the sequence length is returned. On error, -1 is
returned. This is the equivalent to the Python expression len(o).
This is equivalent to the Python expression dir(o), returning a (possibly
empty) list of strings appropriate for the object argument, or NULL if there
was an error. If the argument is NULL, this is like the Python dir(),
returning the names of the current locals; in this case, if no execution frame
is active then NULL is returned but PyErr_Occurred() will return false.
This is equivalent to the Python expression iter(o). It returns a new
iterator for the object argument, or the object itself if the object is already
an iterator. Raises TypeError and returns NULL if the object cannot be
iterated.
Return a reasonable approximation for the mathematical value of o1 divided by
o2, or NULL on failure. The return value is “approximate” because binary
floating point numbers are approximate; it is not possible to represent all real
numbers in base two. This function can return a floating point value when
passed two integers.
See the built-in function pow(). Returns NULL on failure. This is the
equivalent of the Python expression pow(o1, o2, o3), where o3 is optional.
If o3 is to be ignored, pass Py_None in its place (passing NULL for
o3 would cause an illegal memory access).
Returns the result of adding o1 and o2, or NULL on failure. The operation
is done in-place when o1 supports it. This is the equivalent of the Python
statement o1 += o2.
Returns the result of subtracting o2 from o1, or NULL on failure. The
operation is done in-place when o1 supports it. This is the equivalent of
the Python statement o1 -= o2.
Returns the result of multiplying o1 and o2, or NULL on failure. The
operation is done in-place when o1 supports it. This is the equivalent of
the Python statement o1 *= o2.
Returns the mathematical floor of dividing o1 by o2, or NULL on failure.
The operation is done in-place when o1 supports it. This is the equivalent
of the Python statement o1 //= o2.
Return a reasonable approximation for the mathematical value of o1 divided by
o2, or NULL on failure. The return value is “approximate” because binary
floating point numbers are approximate; it is not possible to represent all real
numbers in base two. This function can return a floating point value when
passed two integers. The operation is done in-place when o1 supports it.
Returns the remainder of dividing o1 by o2, or NULL on failure. The
operation is done in-place when o1 supports it. This is the equivalent of
the Python statement o1 %= o2.
See the built-in function pow(). Returns NULL on failure. The operation
is done in-place when o1 supports it. This is the equivalent of the Python
statement o1 **= o2 when o3 is Py_None, or an in-place variant of
pow(o1, o2, o3) otherwise. If o3 is to be ignored, pass Py_None
in its place (passing NULL for o3 would cause an illegal memory access).
Returns the result of left shifting o1 by o2 on success, or NULL on
failure. The operation is done in-place when o1 supports it. This is the
equivalent of the Python statement o1 <<= o2.
Returns the result of right shifting o1 by o2 on success, or NULL on
failure. The operation is done in-place when o1 supports it. This is the
equivalent of the Python statement o1 >>= o2.
Returns the “bitwise and” of o1 and o2 on success and NULL on failure. The
operation is done in-place when o1 supports it. This is the equivalent of
the Python statement o1 &= o2.
Returns the “bitwise exclusive or” of o1 by o2 on success, or NULL on
failure. The operation is done in-place when o1 supports it. This is the
equivalent of the Python statement o1 ^= o2.
Returns the “bitwise or” of o1 and o2 on success, or NULL on failure. The
operation is done in-place when o1 supports it. This is the equivalent of
the Python statement o1 |= o2.
Returns the integer n converted to base base as a string with a base
marker of '0b', '0o', or '0x' if applicable. When
base is not 2, 8, 10, or 16, the format is 'x#num' where x is the
base. If n is not an int object, it is converted with
PyNumber_Index() first.
Returns o converted to a Py_ssize_t value if o can be interpreted as an
integer. If the call fails, an exception is raised and -1 is returned.
If o can be converted to a Python int but the attempt to
convert to a Py_ssize_t value would raise an OverflowError, then the
exc argument is the type of exception that will be raised (usually
IndexError or OverflowError). If exc is NULL, then the
exception is cleared and the value is clipped to PY_SSIZE_T_MIN for a negative
integer or PY_SSIZE_T_MAX for a positive integer.
Returns the number of objects in sequence o on success, and -1 on failure.
For objects that do not provide sequence protocol, this is equivalent to the
Python expression len(o).
Return the concatenation of o1 and o2 on success, and NULL on failure.
The operation is done in-place when o1 supports it. This is the equivalent
of the Python expression o1 += o2.
Return the result of repeating sequence object o count times, or NULL on
failure. The operation is done in-place when o supports it. This is the
equivalent of the Python expression o *= count.
int PySequence_SetItem(PyObject *o, Py_ssize_t i, PyObject *v)¶
Assign object v to the ith element of o. Returns -1 on failure. This
is the equivalent of the Python statement o[i] = v. This function does
not steal a reference to v.
int PySequence_DelItem(PyObject *o, Py_ssize_t i)¶
Delete the ith element of object o. Returns -1 on failure. This is the
equivalent of the Python statement del o[i].
int PySequence_SetSlice(PyObject *o, Py_ssize_t i1, Py_ssize_t i2, PyObject *v)¶
Assign the sequence object v to the slice in sequence object o from i1 to
i2. This is the equivalent of the Python statement o[i1:i2] = v.
int PySequence_DelSlice(PyObject *o, Py_ssize_t i1, Py_ssize_t i2)¶
Delete the slice in sequence object o from i1 to i2. Returns -1 on
failure. This is the equivalent of the Python statement del o[i1:i2].
Return the number of occurrences of value in o, that is, return the number
of keys for which o[key] == value. On failure, return -1. This is
equivalent to the Python expression o.count(value).
Determine if o contains value. If an item in o is equal to value,
return 1, otherwise return 0. On error, return -1. This is
equivalent to the Python expression value in o.
Return a tuple object with the same contents as the arbitrary sequence o or
NULL on failure. If o is a tuple, a new reference will be returned,
otherwise a tuple will be constructed with the appropriate contents. This is
equivalent to the Python expression tuple(o).
Returns the sequence o as a tuple, unless it is already a tuple or list, in
which case o is returned. Use PySequence_Fast_GET_ITEM() to access the
members of the result. Returns NULL on failure. If the object is not a
sequence, raises TypeError with m as the message text.
Return the underlying array of PyObject pointers. Assumes that o was returned
by PySequence_Fast() and o is not NULL.
Note, if a list gets resized, the reallocation may relocate the items array.
So, only use the underlying array pointer in contexts where the sequence
cannot change.
Return the ith element of o or NULL on failure. Macro form of
PySequence_GetItem() but without checking that
PySequence_Check() on o is true and without adjustment for negative
indices.
Returns the length of o, assuming that o was returned by
PySequence_Fast() and that o is not NULL. The size can also be
gotten by calling PySequence_Size() on o, but
PySequence_Fast_GET_SIZE() is faster because it can assume o is a list
or tuple.
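A sketch of the typical pattern, assuming seq is an arbitrary sequence
obtained elsewhere:
PyObject *fast;
Py_ssize_t i, n;

fast = PySequence_Fast(seq, "expected a sequence");
if (fast == NULL)
    return NULL;
n = PySequence_Fast_GET_SIZE(fast);
for (i = 0; i < n; i++) {
    PyObject *item = PySequence_Fast_GET_ITEM(fast, i);     /* borrowed */
    /* ... use item; do not Py_DECREF it ... */
}
Py_DECREF(fast);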
Returns the number of keys in object o on success, and -1 on failure. For
objects that do not provide mapping protocol, this is equivalent to the Python
expression len(o).
int PyMapping_DelItemString(PyObject *o, char *key)¶
Remove the mapping for object key from the object o. Return -1 on
failure. This is equivalent to the Python statement del o[key].
int PyMapping_DelItem(PyObject *o, PyObject *key)¶
Remove the mapping for object key from the object o. Return -1 on
failure. This is equivalent to the Python statement del o[key].
int PyMapping_HasKeyString(PyObject *o, char *key)¶
On success, return 1 if the mapping object has the key key and 0
otherwise. This is equivalent to the Python expression key in o.
This function always succeeds.
On success, return a list of the items in object o, where each item is a tuple
containing a key-value pair. On failure, return NULL. This is equivalent to
the Python expression list(o.items()).
PyObject* PyIter_Next(PyObject *o)¶
Return the next value from the iteration o. If the object is an iterator,
this retrieves the next value from the iteration, and returns NULL with no
exception set if there are no remaining items. If the object is not an
iterator, TypeError is raised, or if there is an error in retrieving the
item, returns NULL and passes along the exception.
To write a loop which iterates over an iterator, the C code should look
something like this:
PyObject *iterator = PyObject_GetIter(obj);
PyObject *item;

if (iterator == NULL) {
    /* propagate error */
}

while (item = PyIter_Next(iterator)) {
    /* do something with item */
    ...
    /* release reference when done */
    Py_DECREF(item);
}

Py_DECREF(iterator);

if (PyErr_Occurred()) {
    /* propagate error */
}
else {
    /* continue doing useful work */
}
Certain objects available in Python wrap access to an underlying memory
array or buffer. Such objects include the built-in bytes and
bytearray, and some extension types like array.array.
Third-party libraries may define their own types for special purposes, such
as image processing or numeric analysis.
While each of these types has its own semantics, they share the common
characteristic of being backed by a possibly large memory buffer. It is
then desirable, in some situations, to access that buffer directly and
without intermediate copying.
Python provides such a facility at the C level in the form of the buffer
protocol. This protocol has two sides:
on the producer side, a type can export a “buffer interface” which allows
objects of that type to expose information about their underlying buffer.
This interface is described in the section Buffer Object Structures;
on the consumer side, several means are available to obtain a pointer to
the raw underlying data of an object (for example a method parameter).
Simple objects such as bytes and bytearray expose their
underlying buffer in byte-oriented form. Other forms are possible; for example,
the elements exposed by an array.array can be multi-byte values.
An example consumer of the buffer interface is the write()
method of file objects: any object that can export a series of bytes through
the buffer interface can be written to a file. While write() only
needs read-only access to the internal contents of the object passed to it,
other methods such as readinto() need write access
to the contents of their argument. The buffer interface allows objects to
selectively allow or reject exporting of read-write and read-only buffers.
There are two ways for a consumer of the buffer interface to acquire a buffer
over a target object:
call PyObject_GetBuffer() with the right parameters;
call PyArg_ParseTuple() (or one of its siblings) with one of the
y*, w* or s* format codes.
In both cases, PyBuffer_Release() must be called when the buffer
isn’t needed anymore. Failure to do so could lead to various issues such as
resource leaks.
Buffer structures (or simply “buffers”) are useful as a way to expose the
binary data from another object to the Python programmer. They can also be
used as a zero-copy slicing mechanism. Using their ability to reference a
block of memory, it is possible to expose any data to the Python programmer
quite easily. The memory could be a large, constant array in a C extension,
it could be a raw block of memory for manipulation before passing to an
operating system library, or it could be used to pass around structured data
in its native, in-memory format.
Contrary to most data types exposed by the Python interpreter, buffers
are not PyObject pointers but rather simple C structures. This
allows them to be created and copied very simply. When a generic wrapper
around a buffer is needed, a memoryview object
can be created.
A NULL terminated string in struct module style syntax giving
the contents of the elements available through the buffer. If this is
NULL, "B" (unsigned bytes) is assumed.
An array of Py_ssize_ts the length of ndim giving the
shape of the memory as a multi-dimensional array. Note that
shape[0] * ... * shape[ndim-1] * itemsize should be equal to
len.
An array of Py_ssize_ts the length of ndim. If these
suboffset numbers are greater than or equal to 0, then the value stored
along the indicated dimension is a pointer and the suboffset value
dictates how many bytes to add to the pointer after de-referencing. A
suboffset value that is negative indicates that no de-referencing should
occur (striding in a contiguous memory block).
Here is a function that returns a pointer to the element in an N-D array
pointed to by an N-dimensional index when there are both non-NULL strides
and suboffsets:
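void *get_item_pointer(int ndim, void *buf, Py_ssize_t *strides,
                       Py_ssize_t *suboffsets, Py_ssize_t *indices) {
    char *pointer = (char*)buf;
    int i;
    for (i = 0; i < ndim; i++) {
        pointer += strides[i] * indices[i];
        if (suboffsets[i] >= 0) {
            pointer = *((char**)pointer) + suboffsets[i];
        }
    }
    return (void*)pointer;
}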
This is a storage for the itemsize (in bytes) of each element of the
shared memory. It is technically unnecessary as it can be obtained
using PyBuffer_SizeFromFormat(), however an exporter may know
this information without parsing the format string and it is necessary
to know the itemsize for proper interpretation of striding. Therefore,
storing it is more convenient and faster.
This is for use internally by the exporting object. For example, this
might be re-cast as an integer by the exporter and used to store flags
about whether or not the shape, strides, and suboffsets arrays must be
freed when the buffer is released. The consumer should never alter this
value.
int PyObject_GetBuffer(PyObject *obj, Py_buffer *view, int flags)¶
Export a view over some internal data from the target object obj.
obj must not be NULL, and view must point to an existing
Py_buffer structure allocated by the caller (most uses of
this function will simply declare a local variable of type
Py_buffer). The flags argument is a bit field indicating
what kind of buffer is requested. The buffer interface allows
for complicated memory layout possibilities; however, some callers
won’t want to handle all the complexity and instead request a simple
view of the target object (using PyBUF_SIMPLE for a read-only
view and PyBUF_WRITABLE for a read-write view).
Some exporters may not be able to share memory in every possible way and
may need to raise errors to signal to some consumers that something is
just not possible. These errors should be a BufferError unless
there is another error that is actually causing the problem. The
exporter can use flags information to simplify how much of the
Py_buffer structure is filled in with non-default values and/or
raise an error if the object can’t support a simpler view of its memory.
On success, 0 is returned and the view structure is filled with useful
values. On error, -1 is returned and an exception is raised; the view
is left in an undefined state.
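A minimal consumer sketch; obj is assumed to support the buffer interface,
and consume_bytes is a hypothetical helper:
Py_buffer view;

if (PyObject_GetBuffer(obj, &view, PyBUF_SIMPLE) < 0)
    return NULL;                    /* exception (often BufferError) set */
/* view.buf points to view.len contiguous bytes */
consume_bytes((const char *)view.buf, view.len);
PyBuffer_Release(&view);            /* always release the view when done */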
The following are the possible values for the flags argument.
PyBUF_SIMPLE
This is the default flag. The returned buffer exposes a read-only
memory area. The format of data is assumed to be raw unsigned bytes,
without any particular structure. This is a “stand-alone” flag
constant. It never needs to be ‘|’d to the others. The exporter will
raise an error if it cannot provide such a contiguous buffer of bytes.
PyBUF_STRIDES
This implies PyBUF_ND. The returned buffer must provide
strides information (i.e. the strides cannot be NULL). This would be
used when the consumer can handle strided, discontiguous arrays.
Handling strides automatically assumes you can handle shape. The
exporter can raise an error if a strided representation of the data is
not possible (i.e. without the suboffsets).
PyBUF_ND
The returned buffer must provide shape information. The memory will be
assumed C-style contiguous (last dimension varies the fastest). The
exporter may raise an error if it cannot provide this kind of
contiguous buffer. If this is not given then shape will be NULL.
PyBUF_C_CONTIGUOUS, PyBUF_F_CONTIGUOUS, PyBUF_ANY_CONTIGUOUS
These flags indicate that the returned buffer must be, respectively,
C-contiguous (last dimension varies the fastest), Fortran
contiguous (first dimension varies the fastest) or either one. All of
these flags imply PyBUF_STRIDES and guarantee that the
strides buffer info structure will be filled in correctly.
PyBUF_INDIRECT
This flag indicates the returned buffer must have suboffsets
information (which can be NULL if no suboffsets are needed). This can
be used when the consumer can handle indirect array referencing implied
by these suboffsets. This implies PyBUF_STRIDES.
PyBUF_FORMAT
The returned buffer must have true format information if this flag is
provided. This would be used when the consumer is going to be checking
for what ‘kind’ of data is actually stored. An exporter should always
be able to provide this information if requested. If format is not
explicitly requested then the format must be returned as NULL (which
means 'B', or unsigned bytes).
Py_ssize_t PyBuffer_SizeFromFormat(const char *)¶
Return the implied itemsize from the struct-style
format.
int PyBuffer_IsContiguous(Py_buffer *view, char fortran)¶
Return 1 if the memory defined by the view is C-style (fortran is
'C') or Fortran-style (fortran is 'F') contiguous or either one
(fortran is 'A'). Return 0 otherwise.
Fill the strides array with byte-strides of a contiguous (C-style if
fortran is 'C' or Fortran-style if fortran is 'F') array of the
given shape with the given number of bytes per element.
int PyBuffer_FillInfo(Py_buffer *view, PyObject *obj, void *buf, Py_ssize_t len, int readonly, int infoflags)¶
Fill in a buffer-info structure, view, correctly for an exporter that can
only share a contiguous chunk of memory of “unsigned bytes” of the given
length. Return 0 on success and -1 (raising an exception) on error.
These functions were part of the “old buffer protocol” API in Python 2.
In Python 3, this protocol doesn’t exist anymore but the functions are still
exposed to ease porting 2.x code. They act as a compatibility wrapper
around the new buffer protocol, but they don’t give
you control over the lifetime of the resources acquired when a buffer is
exported.
int PyObject_AsCharBuffer(PyObject *obj, const char **buffer, Py_ssize_t *buffer_len)¶
Returns a pointer to a read-only memory location usable as character-based
input. The obj argument must support the single-segment character buffer
interface. On success, returns 0, sets buffer to the memory location
and buffer_len to the buffer length. Returns -1 and sets a
TypeError on error.
int PyObject_AsReadBuffer(PyObject *obj, const void **buffer, Py_ssize_t *buffer_len)¶
Returns a pointer to a read-only memory location containing arbitrary data.
The obj argument must support the single-segment readable buffer
interface. On success, returns 0, sets buffer to the memory location
and buffer_len to the buffer length. Returns -1 and sets a
TypeError on error.
Returns 1 if o supports the single-segment readable buffer interface.
Otherwise returns 0.
int PyObject_AsWriteBuffer(PyObject *obj, void **buffer, Py_ssize_t *buffer_len)¶
Returns a pointer to a writable memory location. The obj argument must
support the single-segment, character buffer interface. On success,
returns 0, sets buffer to the memory location and buffer_len to the
buffer length. Returns -1 and sets a TypeError on error.
The functions in this chapter are specific to certain Python object types.
Passing them an object of the wrong type is not a good idea; if you receive an
object from a Python program and you are not sure that it has the right type,
you must perform a type check first; for example, to check that an object is a
dictionary, use PyDict_Check(). The chapter is structured like the
“family tree” of Python object types.
Warning
While the functions described in this chapter carefully check the type of the
objects which are passed in, many of them do not check for NULL being passed
instead of a valid object. Allowing NULL to be passed in can cause memory
access violations and immediate termination of the interpreter.
long PyType_GetFlags(PyTypeObject *type)¶
Return the tp_flags member of type. This function is primarily
meant for use with Py_LIMITED_API; the individual flag bits are
guaranteed to be stable across Python releases, but access to
tp_flags itself is not part of the limited API.
void PyType_Modified(PyTypeObject *type)¶
Invalidate the internal lookup cache for the type and all of its
subtypes. This function must be called after any manual
modification of the attributes or base classes of the type.
int PyType_Ready(PyTypeObject *type)¶
Finalize a type object. This should be called on all type objects to finish
their initialization. This function is responsible for adding inherited slots
from a type’s base class. Returns 0 on success, or returns -1 and sets an
exception on error.
Note that the PyTypeObject for None is not directly exposed in the
Python/C API. Since None is a singleton, testing for object identity (using
== in C) is sufficient. There is no PyNone_Check() function for the
same reason.
The Python None object, denoting lack of value. This object has no methods.
It needs to be treated just like any other object with respect to reference
counts.
PyObject* PyLong_FromLong(long v)¶
Return a new PyLongObject object from v, or NULL on failure.
The current implementation keeps an array of integer objects for all integers
between -5 and 256; when you create an int in that range you actually
just get back a reference to the existing object. So it should be possible to
change the value of 1. I suspect the behaviour of Python in this case is
undefined. :-)
PyObject* PyLong_FromUnsignedLong(unsigned long v)¶
Return a new PyLongObject object from a C unsigned long, or
NULL on failure.
PyObject* PyLong_FromDouble(double v)¶
Return a new PyLongObject object from the integer part of v, or
NULL on failure.
PyObject* PyLong_FromString(char *str, char **pend, int base)¶
Return a new PyLongObject based on the string value in str, which
is interpreted according to the radix in base. If pend is non-NULL,
*pend will point to the first character in str which follows the
representation of the number. If base is 0, the radix will be
determined based on the leading characters of str: if str starts with
'0x' or '0X', radix 16 will be used; if str starts with '0o' or
'0O', radix 8 will be used; if str starts with '0b' or '0B',
radix 2 will be used; otherwise radix 10 will be used. If base is not
0, it must be between 2 and 36, inclusive. Leading spaces are
ignored. If there are no digits, ValueError will be raised.
Convert a sequence of Unicode digits to a Python integer value. The Unicode
string is first encoded to a byte string using PyUnicode_EncodeDecimal()
and then converted using PyLong_FromString().
long PyLong_AsLong(PyObject *pylong)¶
Return a C long representation of the contents of pylong. If
pylong is greater than LONG_MAX, raise an OverflowError,
and return -1. Convert non-long objects automatically to long first,
and return -1 if that raises exceptions.
long PyLong_AsLongAndOverflow(PyObject *pylong, int *overflow)¶
Return a C long representation of the contents of
pylong. If pylong is greater than LONG_MAX or less
than LONG_MIN, set *overflow to 1 or -1,
respectively, and return -1; otherwise, set *overflow to
0. If any other exception occurs (for example a TypeError or
MemoryError), then -1 will be returned and *overflow will
be 0.
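A sketch of telling the three outcomes apart, assuming pylong was obtained
elsewhere:
int overflow;
long value = PyLong_AsLongAndOverflow(pylong, &overflow);

if (value == -1 && PyErr_Occurred()) {
    /* some other error (e.g. TypeError); overflow is 0 */
}
else if (overflow) {
    /* out of range: 1 means above LONG_MAX, -1 means below LONG_MIN */
}
else {
    /* value holds the converted result */
}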
PY_LONG_LONG PyLong_AsLongLongAndOverflow(PyObject *pylong, int *overflow)¶
Return a C long long representation of the contents of
pylong. If pylong is greater than PY_LLONG_MAX or less
than PY_LLONG_MIN, set *overflow to 1 or -1,
respectively, and return -1; otherwise, set *overflow to
0. If any other exception occurs (for example a TypeError or
MemoryError), then -1 will be returned and *overflow will
be 0.
Py_ssize_t PyLong_AsSsize_t(PyObject *pylong)¶
Return a C Py_ssize_t representation of the contents of pylong.
If pylong is greater than PY_SSIZE_T_MAX, an OverflowError
is raised and -1 will be returned.
unsigned long PyLong_AsUnsignedLong(PyObject *pylong)
Return a C unsigned long representation of the contents of pylong.
If pylong is greater than ULONG_MAX, an OverflowError is
raised.
Return a C unsigned long long from a Python integer. If
pylong cannot be represented as an unsigned long long,
an OverflowError is raised and (unsigned long long)-1 is
returned.
Return a C double representation of the contents of pylong. If
pylong cannot be approximately represented as a double, an
OverflowError exception is raised and -1.0 will be returned.
Convert a Python integer pylong to a C void pointer.
If pylong cannot be converted, an OverflowError will be raised. This
is only assured to produce a usable void pointer for values created
with PyLong_FromVoidPtr().
Booleans in Python are implemented as a subclass of integers. There are only
two booleans, Py_False and Py_True. As such, the normal
creation and deletion functions don’t apply to booleans. The following macros
are available, however.
Return a C double representation of the contents of pyfloat. If
pyfloat is not a Python floating point object but has a __float__()
method, this method will first be called to convert pyfloat into a float.
Return a structseq instance which contains information about the
precision, minimum and maximum values of a float. It’s a thin wrapper
around the header file float.h.
Python’s complex number objects are implemented as two distinct types when
viewed from the C API: one is the Python object exposed to Python programs, and
the other is a C structure which represents the actual complex number value.
The API provides functions for working with both.
Note that the functions which accept these structures as parameters and return
them as results do so by value rather than dereferencing them through
pointers. This is consistent throughout the API.
The C structure which corresponds to the value portion of a Python complex
number object. Most of the functions for dealing with complex number objects
use structures of this type as input or output values, as appropriate. It is
defined as:
Return the Py_complex value of the complex number op.
If op is not a Python complex number object but has a __complex__()
method, this method will first be called to convert op to a Python complex
number object.
Generic operations on sequence objects were discussed in the previous chapter;
this section deals with the specific kinds of sequence objects that are
intrinsic to the Python language.
Return a new bytes object with a copy of the string v as value on success,
and NULL on failure. The parameter v must not be NULL; it will not be
checked.
Return a new bytes object with a copy of the string v as value and length
len on success, and NULL on failure. If v is NULL, the contents of
the bytes object are uninitialized.
Take a C printf()-style format string and a variable number of
arguments, calculate the size of the resulting Python bytes object and return
a bytes object with the values formatted into it. The variable arguments
must be C types and must correspond exactly to the format characters in the
format string. The following format characters are allowed:
Format Characters  Type           Comment
%%                 n/a            The literal % character.
%c                 int            A single character, represented as a C int.
%d                 int            Exactly equivalent to printf("%d").
%u                 unsigned int   Exactly equivalent to printf("%u").
%ld                long           Exactly equivalent to printf("%ld").
%lu                unsigned long  Exactly equivalent to printf("%lu").
%zd                Py_ssize_t     Exactly equivalent to printf("%zd").
%zu                size_t         Exactly equivalent to printf("%zu").
%i                 int            Exactly equivalent to printf("%i").
%x                 int            Exactly equivalent to printf("%x").
%s                 char*          A null-terminated C character array.
%p                 void*          The hex representation of a C pointer. Mostly
                                  equivalent to printf("%p") except that it is
                                  guaranteed to start with the literal 0x
                                  regardless of what the platform’s printf yields.
An unrecognized format character causes all the rest of the format string to be
copied as-is to the result string, and any extra arguments discarded.
Return a NUL-terminated representation of the contents of o. The pointer
refers to the internal buffer of o, not a copy. The data must not be
modified in any way, unless the string was just created using
PyBytes_FromStringAndSize(NULL, size). It must not be deallocated. If
o is not a bytes object at all, PyBytes_AsString() returns NULL
and raises TypeError.
int PyBytes_AsStringAndSize(PyObject *obj, char **buffer, Py_ssize_t *length)
Return a NUL-terminated representation of the contents of the object obj
through the output variables buffer and length.
If length is NULL, the resulting buffer may not contain NUL characters;
if it does, the function returns -1 and a TypeError is raised.
The buffer refers to an internal string buffer of obj, not a copy. The data
must not be modified in any way, unless the string was just created using
PyBytes_FromStringAndSize(NULL, size). It must not be deallocated. If
obj is not a bytes object at all, PyBytes_AsStringAndSize()
returns -1 and raises TypeError.
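For example (a sketch; obj is assumed to be a bytes object produced elsewhere):

char *buffer;
Py_ssize_t length;
if (PyBytes_AsStringAndSize(obj, &buffer, &length) < 0)
    return NULL;        /* TypeError already set */
/* buffer points at obj's internal data: do not modify or free it */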
Create a new bytes object in *bytes containing the contents of newpart
appended to bytes; the caller will own the new reference. The reference to
the old value of bytes will be stolen. If the new string cannot be
created, the old reference to bytes will still be discarded and the value
of *bytes will be set to NULL; the appropriate exception will be set.
Create a new string object in *bytes containing the contents of newpart
appended to bytes. This version decrements the reference count of
newpart.
int _PyBytes_Resize(PyObject **bytes, Py_ssize_t newsize)
A way to resize a bytes object even though it is “immutable”. Only use this
to build up a brand new bytes object; don’t use this if the bytes may already
be known in other parts of the code. It is an error to call this function if
the refcount on the input bytes object is not one. Pass the address of an
existing bytes object as an lvalue (it may be written into), and the new size
desired. On success, *bytes holds the resized bytes object and 0 is
returned; the address in *bytes may differ from its input value. If the
reallocation fails, the original bytes object at *bytes is deallocated,
*bytes is set to NULL, a memory exception is set, and -1 is
returned.
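A common pattern is to over-allocate, fill, and then shrink to fit (a sketch; used is a hypothetical count of bytes actually written):

PyObject *b = PyBytes_FromStringAndSize(NULL, 1024);   /* uninitialized buffer */
if (b == NULL)
    return NULL;
/* ... write `used` bytes into PyBytes_AS_STRING(b) ... */
if (_PyBytes_Resize(&b, used) < 0)
    return NULL;        /* b was deallocated and set to NULL */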
This type represents the storage type which is used by Python internally as
basis for holding Unicode ordinals. Python’s default builds use a 16-bit type
for Py_UNICODE and store Unicode values internally as UCS2. It is also
possible to build a UCS4 version of Python (most recent Linux distributions come
with UCS4 builds of Python). These builds then use a 32-bit type for
Py_UNICODE and store Unicode data internally as UCS4. On platforms
where wchar_t is available and compatible with the chosen Python
Unicode build variant, Py_UNICODE is a typedef alias for
wchar_t to enhance native platform compatibility. On all other
platforms, Py_UNICODE is a typedef alias for either unsigned short (UCS2) or unsigned long (UCS4).
Note that UCS2 and UCS4 Python builds are not binary compatible. Please keep
this in mind when writing extensions or interfaces.
Unicode provides many different character properties. The most often needed ones
are available through these macros which are mapped to C functions depending on
the Python configuration.
Return 1 or 0 depending on whether ch is a printable character.
Nonprintable characters are those characters defined in the Unicode character
database as “Other” or “Separator”, excepting the ASCII space (0x20) which is
considered printable. (Note that printable characters in this context are
those which should not be escaped when repr() is invoked on a string.
It has no bearing on the handling of strings written to sys.stdout or
sys.stderr.)
These APIs can be used for fast direct character conversions:
Create a Unicode object from the Py_UNICODE buffer u of the given size. u
may be NULL which causes the contents to be undefined. It is the user’s
responsibility to fill in the needed data. The buffer is copied into the new
object. If the buffer is not NULL, the return value might be a shared object.
Therefore, modification of the resulting Unicode object is only allowed when u
is NULL.
Create a Unicode object from the char buffer u. The bytes will be interpreted
as being UTF-8 encoded. u may also be NULL which
causes the contents to be undefined. It is the user’s responsibility to fill in
the needed data. The buffer is copied into the new object. If the buffer is not
NULL, the return value might be a shared object. Therefore, modification of
the resulting Unicode object is only allowed when u is NULL.
Take a C printf()-style format string and a variable number of
arguments, calculate the size of the resulting Python unicode string and return
a string with the values formatted into it. The variable arguments must be C
types and must correspond exactly to the format characters in the format
ASCII-encoded string. The following format characters are allowed:
Format Characters  Type                Comment
%%                 n/a                 The literal % character.
%c                 int                 A single character, represented as a C int.
%d                 int                 Exactly equivalent to printf("%d").
%u                 unsigned int        Exactly equivalent to printf("%u").
%ld                long                Exactly equivalent to printf("%ld").
%lu                unsigned long       Exactly equivalent to printf("%lu").
%lld               long long           Exactly equivalent to printf("%lld").
%llu               unsigned long long  Exactly equivalent to printf("%llu").
%zd                Py_ssize_t          Exactly equivalent to printf("%zd").
%zu                size_t              Exactly equivalent to printf("%zu").
%i                 int                 Exactly equivalent to printf("%i").
%x                 int                 Exactly equivalent to printf("%x").
%s                 char*               A null-terminated C character array.
%p                 void*               The hex representation of a C pointer. Mostly
                                       equivalent to printf("%p") except that it is
                                       guaranteed to start with the literal 0x
                                       regardless of what the platform’s printf yields.
%V                 PyObject*, char*    A unicode object (which may be NULL) and a
                                       null-terminated C character array as a second
                                       parameter (which will be used, if the first
                                       parameter is NULL).
Create a Unicode object by replacing all decimal digits in a
Py_UNICODE buffer of the given size with ASCII digits 0–9
according to their decimal value. Return NULL if an exception
occurs.
Create a copy of a Unicode string ending with a nul character. Return NULL
and raise a MemoryError exception on memory allocation failure,
otherwise return a newly allocated buffer (use PyMem_Free() to free the
buffer).
Coerce an encoded object obj to a Unicode object and return a reference with
incremented refcount.
bytes, bytearray and other char buffer compatible objects
are decoded according to the given encoding and using the error handling
defined by errors. Both can be NULL to have the interface use the default
values (see the next section for details).
All other objects, including Unicode objects, cause a TypeError to be
set.
The API returns NULL if there was an error. The caller is responsible for
decref’ing the returned objects.
Shortcut for PyUnicode_FromEncodedObject(obj, NULL, "strict") which is used
throughout the interpreter whenever coercion to Unicode is needed.
If the platform supports wchar_t and provides a header file wchar.h,
Python can interface directly to this type using the following functions.
Support is optimized if Python’s own Py_UNICODE type is identical to
the system’s wchar_t.
To encode and decode file names and other environment strings,
Py_FileSystemDefaultEncoding should be used as the encoding, and
"surrogateescape" should be used as the error handler (PEP 383). To
encode file names during argument parsing, the "O&" converter should be
used, passing PyUnicode_FSConverter() as the conversion function:
int PyUnicode_FSConverter(PyObject* obj, void* result)
ParseTuple converter: encode str objects to bytes using
PyUnicode_EncodeFSDefault(); bytes objects are output as-is.
result must be a PyBytesObject* which must be released when it is
no longer used.
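For example, a sketch of a function that accepts one file-system path argument:

PyObject *path = NULL;                  /* receives a bytes object */
if (!PyArg_ParseTuple(args, "O&", PyUnicode_FSConverter, &path))
    return NULL;
/* ... use PyBytes_AS_STRING(path) ... */
Py_DECREF(path);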
Create a Unicode object from the wchar_t buffer w of the given size.
Passing -1 as the size indicates that the function must itself compute the length,
using wcslen.
Return NULL on failure.
Copy the Unicode object contents into the wchar_t buffer w. At most
size wchar_t characters are copied (excluding a possibly trailing
0-termination character). Return the number of wchar_t characters
copied or -1 in case of an error. Note that the resulting wchar_t
string may or may not be 0-terminated. It is the responsibility of the caller
to make sure that the wchar_t string is 0-terminated in case this is
required by the application.
Convert the Unicode object to a wide character string. The output string
always ends with a nul character. If size is not NULL, write the number
of wide characters (excluding the trailing 0-termination character) into
*size.
Returns a buffer allocated by PyMem_Malloc() (use PyMem_Free()
to free it) on success. On error, returns NULL, *size is undefined and
raises a MemoryError.
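For example (a minimal sketch, assuming this describes PyUnicode_AsWideCharString()):

wchar_t *w = PyUnicode_AsWideCharString(unicode, NULL);
if (w == NULL)
    return NULL;        /* MemoryError already set */
/* ... use the nul-terminated string w ... */
PyMem_Free(w);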
Python provides a set of built-in codecs which are written in C for speed. All of
these codecs are directly usable via the following functions.
Many of the following APIs take two arguments encoding and errors, and they
have the same semantics as the ones of the built-in str() string object
constructor.
Setting encoding to NULL causes the default encoding to be used
which is ASCII. The file system calls should use
PyUnicode_FSConverter() for encoding file names. This uses the
variable Py_FileSystemDefaultEncoding internally. This
variable should be treated as read-only: on some systems, it will be a
pointer to a static string, on others, it will change at run-time
(such as when the application invokes setlocale).
Error handling is set by errors which may also be set to NULL meaning to use
the default handling defined for the codec. Default error handling for all
built-in codecs is “strict” (ValueError is raised).
The codecs all use a similar interface. Only deviations from the following
generic ones are documented for simplicity.
Create a Unicode object by decoding size bytes of the encoded string s.
encoding and errors have the same meaning as the parameters of the same name
in the str() built-in function. The codec to be used is looked up
using the Python codec registry. Return NULL if an exception was raised by
the codec.
Encode the Py_UNICODE buffer s of the given size and return a Python
bytes object. encoding and errors have the same meaning as the
parameters of the same name in the Unicode encode() method. The codec
to be used is looked up using the Python codec registry. Return NULL if an
exception was raised by the codec.
Encode a Unicode object and return the result as Python bytes object.
encoding and errors have the same meaning as the parameters of the same
name in the Unicode encode() method. The codec to be used is looked up
using the Python codec registry. Return NULL if an exception was raised by
the codec.
If consumed is NULL, behave like PyUnicode_DecodeUTF8(). If
consumed is not NULL, trailing incomplete UTF-8 byte sequences will not be
treated as an error. Those bytes will not be decoded and the number of bytes
that have been decoded will be stored in consumed.
Encode a Unicode object using UTF-8 and return the result as Python bytes
object. Error handling is “strict”. Return NULL if an exception was
raised by the codec.
Decode size bytes from a UTF-32 encoded buffer string and return the
corresponding Unicode object. errors (if non-NULL) defines the error
handling. It defaults to “strict”.
If byteorder is non-NULL, the decoder starts decoding using the given byte
order:
If *byteorder is zero, and the first four bytes of the input data are a
byte order mark (BOM), the decoder switches to this byte order and the BOM is
not copied into the resulting Unicode string. If *byteorder is -1 or
1, any byte order mark is copied to the output.
After completion, *byteorder is set to the current byte order at the end
of input data.
In a narrow build, code points outside the BMP will be decoded as surrogate pairs.
If byteorder is NULL, the codec starts in native order mode.
Return NULL if an exception was raised by the codec.
If consumed is NULL, behave like PyUnicode_DecodeUTF32(). If
consumed is not NULL, PyUnicode_DecodeUTF32Stateful() will not treat
trailing incomplete UTF-32 byte sequences (such as a number of bytes not divisible
by four) as an error. Those bytes will not be decoded and the number of bytes
that have been decoded will be stored in consumed.
Return a Python byte string using the UTF-32 encoding in native byte
order. The string always starts with a BOM mark. Error handling is “strict”.
Return NULL if an exception was raised by the codec.
Decode size bytes from a UTF-16 encoded buffer string and return the
corresponding Unicode object. errors (if non-NULL) defines the error
handling. It defaults to “strict”.
If byteorder is non-NULL, the decoder starts decoding using the given byte
order:
If *byteorder is zero, and the first two bytes of the input data are a
byte order mark (BOM), the decoder switches to this byte order and the BOM is
not copied into the resulting Unicode string. If *byteorder is -1 or
1, any byte order mark is copied to the output (where it will result in
either a \ufeff or a \ufffe character).
After completion, *byteorder is set to the current byte order at the end
of input data.
If byteorder is NULL, the codec starts in native order mode.
Return NULL if an exception was raised by the codec.
If consumed is NULL, behave like PyUnicode_DecodeUTF16(). If
consumed is not NULL, PyUnicode_DecodeUTF16Stateful() will not treat
trailing incomplete UTF-16 byte sequences (such as an odd number of bytes or a
split surrogate pair) as an error. Those bytes will not be decoded and the
number of bytes that have been decoded will be stored in consumed.
If byteorder is 0, the output string will always start with the Unicode BOM
mark (U+FEFF). In the other two modes, no BOM mark is prepended.
If Py_UNICODE_WIDE is defined, a single Py_UNICODE value may get
represented as a surrogate pair. If it is not defined, each Py_UNICODE
value is interpreted as a UCS-2 character.
Return NULL if an exception was raised by the codec.
Return a Python byte string using the UTF-16 encoding in native byte
order. The string always starts with a BOM mark. Error handling is “strict”.
Return NULL if an exception was raised by the codec.
If consumed is NULL, behave like PyUnicode_DecodeUTF7(). If
consumed is not NULL, trailing incomplete UTF-7 base-64 sections will not
be treated as an error. Those bytes will not be decoded and the number of
bytes that have been decoded will be stored in consumed.
PyObject* PyUnicode_EncodeUTF7(const Py_UNICODE *s, Py_ssize_t size, int base64SetO, int base64WhiteSpace, const char *errors)
Encode the Py_UNICODE buffer of the given size using UTF-7 and
return a Python bytes object. Return NULL if an exception was raised by
the codec.
If base64SetO is nonzero, “Set O” (punctuation that has no otherwise
special meaning) will be encoded in base-64. If base64WhiteSpace is
nonzero, whitespace will be encoded in base-64. Both are set to zero for the
Python “utf-7” codec.
Encode the Py_UNICODE buffer of the given size using Unicode-Escape and
return a Python string object. Return NULL if an exception was raised by the
codec.
Encode a Unicode object using Unicode-Escape and return the result as Python
string object. Error handling is “strict”. Return NULL if an exception was
raised by the codec.
Encode the Py_UNICODE buffer of the given size using Raw-Unicode-Escape
and return a Python string object. Return NULL if an exception was raised by
the codec.
Encode a Unicode object using Raw-Unicode-Escape and return the result as
Python string object. Error handling is “strict”. Return NULL if an exception
was raised by the codec.
Encode a Unicode object using Latin-1 and return the result as Python bytes
object. Error handling is “strict”. Return NULL if an exception was
raised by the codec.
Encode a Unicode object using ASCII and return the result as Python bytes
object. Error handling is “strict”. Return NULL if an exception was
raised by the codec.
This codec is special in that it can be used to implement many different codecs
(and this is in fact what was done to obtain most of the standard codecs
included in the encodings package). The codec uses mapping to encode and
decode characters.
Decoding mappings must map single string characters to single Unicode
characters, integers (which are then interpreted as Unicode ordinals) or None
(meaning “undefined mapping” and causing an error).
Encoding mappings must map single Unicode characters to single string
characters, integers (which are then interpreted as Latin-1 ordinals) or None
(meaning “undefined mapping” and causing an error).
The mapping objects provided must only support the __getitem__ mapping
interface.
If a character lookup fails with a LookupError, the character is copied as-is,
meaning that its ordinal value will be interpreted as a Unicode or Latin-1
ordinal, respectively. Because of this, mappings only need to contain those
mappings which map characters to different code points.
Create a Unicode object by decoding size bytes of the encoded string s using
the given mapping object. Return NULL if an exception was raised by the
codec. If mapping is NULL, Latin-1 decoding will be done. Otherwise, mapping
can be a dictionary (mapping bytes to Unicode strings) or a unicode string,
which is treated as a lookup table. Byte values greater than the length of the
string and U+FFFE “characters” are treated as “undefined mapping”.
Encode the Py_UNICODE buffer of the given size using the given
mapping object and return a Python string object. Return NULL if an
exception was raised by the codec.
Encode a Unicode object using the given mapping object and return the result
as Python string object. Error handling is “strict”. Return NULL if an
exception was raised by the codec.
The following codec API is special in that it maps Unicode to Unicode.
Translate a Py_UNICODE buffer of the given size by applying a
character mapping table to it and return the resulting Unicode object. Return
NULL when an exception was raised by the codec.
The mapping table must map Unicode ordinal integers to Unicode ordinal
integers or None (causing deletion of the character).
Mapping tables need only provide the __getitem__() interface; dictionaries
and sequences work well. Unmapped character ordinals (ones which cause a
LookupError) are left untouched and are copied as-is.
These are the MBCS codec APIs. They are currently only available on Windows and
use the Win32 MBCS converters to implement the conversions. Note that MBCS (or
DBCS) is a class of encodings, not just one. The target encoding is defined by
the user settings on the machine running the codec.
Create a Unicode object by decoding size bytes of the MBCS encoded string s.
Return NULL if an exception was raised by the codec.
PyObject* PyUnicode_DecodeMBCSStateful(const char *s, int size, const char *errors, int *consumed)
If consumed is NULL, behave like PyUnicode_DecodeMBCS(). If
consumed is not NULL, PyUnicode_DecodeMBCSStateful() will not decode
a trailing lead byte, and the number of bytes that have been decoded will be
stored in consumed.
Encode a Unicode object using MBCS and return the result as Python bytes
object. Error handling is “strict”. Return NULL if an exception was
raised by the codec.
The following APIs are capable of handling Unicode objects and strings on input
(we refer to them as strings in the descriptions) and return Unicode objects or
integers as appropriate.
They all return NULL or -1 if an exception occurs.
Split a string giving a list of Unicode strings. If sep is NULL, splitting
will be done at all whitespace substrings. Otherwise, splits occur at the given
separator. At most maxsplit splits will be done. If negative, no limit is
set. Separators are not included in the resulting list.
Split a Unicode string at line breaks, returning a list of Unicode strings.
CRLF is considered to be one line break. If keepend is 0, the line break
characters are not included in the resulting strings.
Translate a string by applying a character mapping table to it and return the
resulting Unicode object.
The mapping table must map Unicode ordinal integers to Unicode ordinal integers
or None (causing deletion of the character).
Mapping tables need only provide the __getitem__() interface; dictionaries
and sequences work well. Unmapped character ordinals (ones which cause a
LookupError) are left untouched and are copied as-is.
errors has the usual meaning for codecs. It may be NULL which indicates to
use the default error handling.
Join a sequence of strings using the given separator and return the resulting
Unicode string.
int PyUnicode_Tailmatch(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)
Return 1 if substr matches str[start:end] at the given tail end
(direction == -1 means to do a prefix match, direction == 1 a suffix match),
0 otherwise. Return -1 if an error occurred.
Return the first position of substr in str[start:end] using the given
direction (direction == 1 means to do a forward search, direction == -1 a
backward search). The return value is the index of the first match; a value of
-1 indicates that no match was found, and -2 indicates that an error
occurred and an exception has been set.
Replace at most maxcount occurrences of substr in str with replstr and
return the resulting Unicode object. maxcount == -1 means replace all
occurrences.
Compare two strings and return -1, 0, 1 for less than, equal, and greater than,
respectively.
int PyUnicode_CompareWithASCIIString(PyObject *uni, char *string)
Compare a unicode object, uni, with string and return -1, 0, 1 for less
than, equal, and greater than, respectively. It is best to pass only
ASCII-encoded strings, but the function interprets the input string as
ISO-8859-1 if it contains non-ASCII characters.
Intern the argument *string in place. The argument must be the address of a
pointer variable pointing to a Python unicode string object. If there is an
existing interned string that is the same as *string, it sets *string to
it (decrementing the reference count of the old string object and incrementing
the reference count of the interned string object), otherwise it leaves
*string alone and interns it (incrementing its reference count).
(Clarification: even though there is a lot of talk about reference counts, think
of this function as reference-count-neutral; you own the object after the call
if and only if you owned it before the call.)
A combination of PyUnicode_FromString() and
PyUnicode_InternInPlace(), returning either a new unicode string object
that has been interned, or a new (“owned”) reference to an earlier interned
string object with the same value.
Return a new tuple object of size n, or NULL on failure. The tuple values
are initialized to the subsequent n C arguments pointing to Python objects.
PyTuple_Pack(2, a, b) is equivalent to Py_BuildValue("(OO)", a, b).
Like PyTuple_SetItem(), but does no error checking, and should only be
used to fill in brand new tuples.
Note
This function “steals” a reference to o.
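For example, filling a brand new tuple (a sketch, assuming the macro described is PyTuple_SET_ITEM()):

PyObject *t = PyTuple_New(2);
PyObject *first = PyLong_FromLong(1);
PyObject *second = PyUnicode_FromString("two");
if (t == NULL || first == NULL || second == NULL) {
    Py_XDECREF(t);
    Py_XDECREF(first);
    Py_XDECREF(second);
    return NULL;
}
PyTuple_SET_ITEM(t, 0, first);      /* steals the reference to first */
PyTuple_SET_ITEM(t, 1, second);     /* steals the reference to second */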
int _PyTuple_Resize(PyObject **p, Py_ssize_t newsize)
Can be used to resize a tuple. newsize will be the new length of the tuple.
Because tuples are supposed to be immutable, this should only be used if there
is only one reference to the object. Do not use this if the tuple may already
be known to some other part of the code. The tuple will always grow or shrink
at the end. Think of this as destroying the old tuple and creating a new one,
only more efficiently. Returns 0 on success. Client code should never
assume that the resulting value of *p will be the same as before calling
this function. If the object referenced by *p is replaced, the original
*p is destroyed. On failure, returns -1 and sets *p to NULL, and
raises MemoryError or SystemError.
Return a new list of length len on success, or NULL on failure.
Note
If len is greater than zero, the returned list object’s items are
set to NULL. Thus you cannot use abstract API functions such as
PySequence_SetItem() or expose the object to Python code before
setting all items to a real object with PyList_SetItem().
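For example, a sketch that fills every slot before the list can escape to Python code:

PyObject *list = PyList_New(n);     /* n items, all initially NULL */
Py_ssize_t i;
if (list == NULL)
    return NULL;
for (i = 0; i < n; i++) {
    PyObject *item = PyLong_FromSsize_t(i);
    if (item == NULL) {
        Py_DECREF(list);
        return NULL;
    }
    PyList_SET_ITEM(list, i, item);     /* steals the reference to item */
}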
Return the object at position index in the list pointed to by list. The
position must be non-negative; indexing from the end of the list is not
supported. If index is out of bounds, return NULL and set an
IndexError exception.
Macro form of PyList_SetItem() without error checking. This is
normally only used to fill in new lists where there is no previous content.
Note
This macro “steals” a reference to item, and, unlike
PyList_SetItem(), does not discard a reference to any item that
is being replaced; any reference in list at position i will be
leaked.
int PyList_Insert(PyObject *list, Py_ssize_t index, PyObject *item)
Insert the item item into list list in front of index index. Return
0 if successful; return -1 and set an exception if unsuccessful.
Analogous to list.insert(index, item).
Append the object item at the end of list list. Return 0 if
successful; return -1 and set an exception if unsuccessful. Analogous
to list.append(item).
Return a list of the objects in list containing the objects between low
and high. Return NULL and set an exception if unsuccessful. Analogous
to list[low:high]. Negative indices, as when slicing from Python, are not
supported.
int PyList_SetSlice(PyObject *list, Py_ssize_t low, Py_ssize_t high, PyObject *itemlist)
Set the slice of list between low and high to the contents of
itemlist. Analogous to list[low:high] = itemlist. The itemlist may
be NULL, indicating the assignment of an empty list (slice deletion).
Return 0 on success, -1 on failure. Negative indices, as when
slicing from Python, are not supported.
Return a proxy object for a mapping which enforces read-only behavior.
This is normally used to create a proxy to prevent modification of the
dictionary for non-dynamic class types.
Determine if dictionary p contains key. If an item in p matches
key, return 1, otherwise return 0. On error, return -1.
This is equivalent to the Python expression key in p.
Insert value into the dictionary p with a key of key. key must be
hashable; if it isn’t, TypeError will be raised. Return
0 on success or -1 on failure.
int PyDict_SetItemString(PyObject *p, const char *key, PyObject *val)
Insert value into the dictionary p using key as a key. key should
be a char*. The key object is created using
PyUnicode_FromString(key). Return 0 on success or -1 on
failure.
Variant of PyDict_GetItem() that does not suppress
exceptions. Return NULL with an exception set if an exception
occurred. Return NULL without an exception set if the key
wasn’t present.
Iterate over all key-value pairs in the dictionary p. The
Py_ssize_t referred to by ppos must be initialized to 0
prior to the first call to this function to start the iteration; the
function returns true for each pair in the dictionary, and false once all
pairs have been reported. The parameters pkey and pvalue should either
point to PyObject* variables that will be filled in with each key
and value, respectively, or may be NULL. Any references returned through
them are borrowed. ppos should not be altered during iteration. Its
value represents offsets within the internal dictionary structure, and
since the structure is sparse, the offsets are not consecutive.
For example:
PyObject *key, *value;
Py_ssize_t pos = 0;

while (PyDict_Next(self->dict, &pos, &key, &value)) {
    /* do something interesting with the values... */
    ...
}
The dictionary p should not be mutated during iteration. It is safe to
modify the values of the keys as you iterate over the dictionary, but only
so long as the set of keys does not change. For example:
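(A sketch that increments every value in self->dict, assuming the values are integers:)

PyObject *key, *value;
Py_ssize_t pos = 0;

while (PyDict_Next(self->dict, &pos, &key, &value)) {
    long i = PyLong_AsLong(value);
    if (i == -1 && PyErr_Occurred())
        return -1;
    PyObject *o = PyLong_FromLong(i + 1);
    if (o == NULL)
        return -1;
    if (PyDict_SetItem(self->dict, key, o) < 0) {
        Py_DECREF(o);
        return -1;
    }
    Py_DECREF(o);
}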
Iterate over mapping object b adding key-value pairs to dictionary a.
b may be a dictionary, or any object supporting PyMapping_Keys()
and PyObject_GetItem(). If override is true, existing pairs in a
will be replaced if a matching key is found in b, otherwise pairs will
only be added if there is not a matching key in a. Return 0 on
success or -1 if an exception was raised.
This is the same as PyDict_Merge(a, b, 1) in C, or a.update(b) in
Python. Return 0 on success or -1 if an exception was raised.
int PyDict_MergeFromSeq2(PyObject *a, PyObject *seq2, int override)
Update or merge into dictionary a, from the key-value pairs in seq2.
seq2 must be an iterable object producing iterable objects of length 2,
viewed as key-value pairs. In case of duplicate keys, the last wins if
override is true, else the first wins. Return 0 on success or -1
if an exception was raised. Equivalent Python (except for the return
value):

def PyDict_MergeFromSeq2(a, seq2, override):
    for key, value in seq2:
        if override or key not in a:
            a[key] = value
This subtype of PyObject is used to hold the internal data for both
set and frozenset objects. It is like a PyDictObject
in that it is a fixed size for small sets (much like tuple storage) and will
point to a separate, variable sized block of memory for medium and large sized
sets (much like list storage). None of the fields of this structure should be
considered public and are subject to change. All access should be done through
the documented API rather than by manipulating the values in the structure.
Return a new set containing objects returned by the iterable. The
iterable may be NULL to create a new empty set. Return the new set on
success or NULL on failure. Raise TypeError if iterable is not
actually iterable. The constructor is also useful for copying a set
(c=set(s)).
Return a new frozenset containing objects returned by the iterable.
The iterable may be NULL to create a new empty frozenset. Return the new
set on success or NULL on failure. Raise TypeError if iterable is
not actually iterable.
The following functions and macros are available for instances of set
or frozenset or instances of their subtypes.
Return the length of a set or frozenset object. Equivalent to
len(anyset). Raises a PyExc_SystemError if anyset is not a
set, frozenset, or an instance of a subtype.
Return 1 if found, 0 if not found, and -1 if an error is encountered. Unlike
the Python __contains__() method, this function does not automatically
convert unhashable sets into temporary frozensets. Raise a TypeError if
the key is unhashable. Raise PyExc_SystemError if anyset is not a
set, frozenset, or an instance of a subtype.
Add key to a set instance. Also works with frozenset
instances (like PyTuple_SetItem() it can be used to fill-in the values
of brand new frozensets before they are exposed to other code). Return 0 on
success or -1 on failure. Raise a TypeError if the key is
unhashable. Raise a MemoryError if there is no room to grow. Raise a
SystemError if set is not an instance of set or its
subtype.
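For example, a sketch that builds a frozenset before exposing it (key is an assumed hashable object):

PyObject *fs = PyFrozenSet_New(NULL);   /* new empty frozenset */
if (fs == NULL)
    return NULL;
if (PySet_Add(fs, key) < 0) {           /* legal while fs is still private */
    Py_DECREF(fs);
    return NULL;
}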
The following functions are available for instances of set or its
subtypes but not for instances of frozenset or its subtypes.
Return 1 if found and removed, 0 if not found (no action taken), and -1 if an
error is encountered. Does not raise KeyError for missing keys. Raise a
TypeError if the key is unhashable. Unlike the Python discard()
method, this function does not automatically convert unhashable sets into
temporary frozensets. Raise PyExc_SystemError if set is not an
instance of set or its subtype.
Return a new reference to an arbitrary object in the set, and remove the
object from the set. Return NULL on failure. Raise KeyError if the
set is empty. Raise a SystemError if set is not an instance of
set or its subtype.
Return the __module__ attribute of the function object op. This is normally
a string containing the module name, but can be set to any other object by
Python code.
An instance method is a wrapper for a PyCFunction and the new way
to bind a PyCFunction to a class object. It replaces the former call
PyMethod_New(func, NULL, class).
Return a new instance method object, with func being any callable object.
func is the function that will be called when the instance method is
called.
Methods are bound function objects. Methods are always bound to an instance of
a user-defined class. Unbound methods (methods bound to a class object) are
no longer available.
Return a new method object, with func being any callable object and self
the instance the method should be bound to. func is the function that will
be called when the method is called. self must not be NULL.
These APIs are a minimal emulation of the Python 2 C API for built-in file
objects, which used to rely on the buffered I/O (FILE*) support
from the C standard library. In Python 3, files and streams use the new
io module, which defines several layers over the low-level unbuffered
I/O of the operating system. The functions described below are
convenience C wrappers over these new APIs, and meant mostly for internal
error reporting in the interpreter; third-party code is advised to access
the io APIs instead.
PyObject* PyFile_FromFd(int fd, char *name, char *mode, int buffering, char *encoding, char *errors, char *newline, int closefd)
Create a Python file object from the file descriptor of an already
opened file fd. The arguments name, encoding, errors and newline
can be NULL to use the defaults; buffering can be -1 to use the
default. name is ignored and kept for backward compatibility. Return
NULL on failure. For a more comprehensive description of the arguments,
please refer to the io.open() function documentation.
Warning
Since Python streams have their own buffering layer, mixing them with
OS-level file descriptors can produce various issues (such as unexpected
ordering of data).
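For example, wrapping an already-open descriptor as a writable text stream (a sketch):

PyObject *f = PyFile_FromFd(fd, NULL, "w", -1, NULL, NULL, NULL, 1);
if (f == NULL)
    return NULL;    /* with closefd == 1, fd is closed when f is */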
Return the file descriptor associated with p as an int. If the
object is an integer, its value is returned. If not, the
object’s fileno() method is called if it exists; the method must return
an integer, which is returned as the file descriptor value. Sets an
exception and returns -1 on failure.
Equivalent to p.readline([n]), this function reads one line from the
object p. p may be a file object or any object with a readline()
method. If n is 0, exactly one line is read, regardless of the length of
the line. If n is greater than 0, no more than n bytes will be read
from the file; a partial line can be returned. In both cases, an empty string
is returned if the end of the file is reached immediately. If n is less than
0, however, one line is read regardless of length, but EOFError is
raised if the end of the file is reached immediately.
Write object obj to file object p. The only supported flag for flags is
Py_PRINT_RAW; if given, the str() of the object is written
instead of the repr(). Return 0 on success or -1 on failure; the
appropriate exception will be set.
int PyFile_WriteString(const char *s, PyObject *p)
Write string s to file object p. Return 0 on success or -1 on
failure; the appropriate exception will be set.
Return a new module object with the __name__ attribute set to name.
Only the module’s __doc__ and __name__ attributes are filled in;
the caller is responsible for providing a __file__ attribute.
Return the dictionary object that implements module’s namespace; this object
is the same as the __dict__ attribute of the module object. This
function never fails. It is recommended extensions use other
PyModule_*() and PyObject_*() functions rather than directly
manipulate a module’s __dict__.
Return the name of the file from which module was loaded using module’s
__file__ attribute. If this is not defined, or if it is not a
unicode string, raise SystemError and return NULL; otherwise return
a reference to a PyUnicodeObject.
Create a new module object, given the definition in module, assuming the
API version module_api_version. If that version does not match the version
of the running interpreter, a RuntimeWarning is emitted.
Note
Most uses of this function should be using PyModule_Create()
instead; only use this if you are sure you need it.
This struct holds all information that is needed to create a module object.
There is usually only one static variable of that type for each module, which
is statically initialized and then passed to PyModule_Create() in the
module initialization function.
If the module object needs additional memory, this should be set to the
number of bytes to allocate; a pointer to the block of memory can be
retrieved with PyModule_GetState(). If no memory is needed, set
this to -1.
This memory should be used, rather than static globals, to hold per-module
state, since it is then safe for use in multiple sub-interpreters. It is
freed when the module object is deallocated, after the m_free
function has been called, if present.
A function to call during deallocation of the module object, or NULL if
not needed.
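Putting the pieces together, a typical definition and initialization function look roughly like this (a sketch for a hypothetical module named example):

static struct PyModuleDef examplemodule = {
    PyModuleDef_HEAD_INIT,
    "example",              /* m_name */
    "An example module.",   /* m_doc */
    -1,                     /* m_size: no per-module state */
    NULL,                   /* m_methods */
    NULL,                   /* m_reload */
    NULL,                   /* m_traverse */
    NULL,                   /* m_clear */
    NULL                    /* m_free */
};

PyMODINIT_FUNC
PyInit_example(void)
{
    return PyModule_Create(&examplemodule);
}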
int PyModule_AddObject(PyObject *module, const char *name, PyObject *value)
Add an object to module as name. This is a convenience function which can
be used from the module’s initialization function. This steals a reference to
value. Return -1 on error, 0 on success.
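Because the reference is stolen, a conservative error-handling pattern looks like this (a sketch; on failure the reference has not been consumed and must be released by the caller):

PyObject *answer = PyLong_FromLong(42);
if (answer == NULL || PyModule_AddObject(module, "answer", answer) < 0) {
    Py_XDECREF(answer);     /* still ours if the call failed */
    return NULL;
}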
int PyModule_AddIntConstant(PyObject *module, const char *name, long value)
Add an integer constant to module as name. This convenience function can be
used from the module’s initialization function. Return -1 on error, 0 on
success.
int PyModule_AddStringConstant(PyObject *module, const char *name, const char *value)
Add a string constant to module as name. This convenience function can be
used from the module’s initialization function. The string value must be
null-terminated. Return -1 on error, 0 on success.
int PyModule_AddIntMacro(PyObject *module, macro)
Add an int constant to module. The name and the value are taken from
macro. For example PyModule_AddIntMacro(module, AF_INET) adds the int
constant AF_INET with the value of AF_INET to module.
Return -1 on error, 0 on success.
int PyModule_AddStringMacro(PyObject *module, macro)
Add a string constant to module; the name and the value are taken from
macro. Return -1 on error, 0 on success.
Python provides two general-purpose iterator objects. The first, a sequence
iterator, works with an arbitrary sequence supporting the __getitem__()
method. The second works with a callable object and a sentinel value, calling
the callable for each item in the sequence, and ending the iteration when the
sentinel value is returned.
Return an iterator that works with a general sequence object, seq. The
iteration ends when the sequence raises IndexError for the subscripting
operation.
Return a new iterator. The first parameter, callable, can be any Python
callable object that can be called with no parameters; each call to it should
return the next item in the iteration. When callable returns a value equal to
sentinel, the iteration will be terminated.
Return true if the descriptor object descr describes a data attribute, or
false if it describes a method. descr must be a descriptor object; there is
no error checking.
Return a new slice object with the given values. The start, stop, and
step parameters are used as the values of the slice object attributes of
the same names. Any of the values may be NULL, in which case
None will be used for the corresponding attribute. Return NULL if
the new object could not be allocated.
Retrieve the start, stop and step indices from the slice object slice,
assuming a sequence of length length. Treats indices greater than
length as errors.
Returns 0 on success and -1 on error with no exception set (unless one of
the indices was not None and failed to be converted to an integer,
in which case -1 is returned with an exception set).
You probably do not want to use this function.
Changed in version 3.2: The parameter type for the slice parameter was
PySliceObject* before.
Usable replacement for PySlice_GetIndices(). Retrieve the start,
stop, and step indices from the slice object slice assuming a sequence of
length length, and store the length of the slice in slicelength. Out
of bounds indices are clipped in a manner consistent with the handling of
normal slices.
Returns 0 on success and -1 on error with exception set.
Changed in version 3.2: The parameter type for the slice parameter was
PySliceObject* before.
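For example, a sketch of walking a list according to a slice object (assuming this describes PySlice_GetIndicesEx()):

Py_ssize_t start, stop, step, slicelength, i;
if (PySlice_GetIndicesEx(slice, PyList_GET_SIZE(list),
                         &start, &stop, &step, &slicelength) < 0)
    return NULL;
for (i = 0; i < slicelength; i++) {
    PyObject *item = PyList_GET_ITEM(list, start + i*step);    /* borrowed */
    /* ... process item ... */
}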
Create a memoryview object from an object that provides the buffer interface.
If obj supports writable buffer exports, the memoryview object will be
readable and writable, otherwise it will be read-only.
Create a memoryview object wrapping the given buffer structure view.
The memoryview object then owns the buffer represented by view, which
means you shouldn’t try to call PyBuffer_Release() yourself: it
will be done on deallocation of the memoryview object.
PyObject *PyMemoryView_GetContiguous(PyObject *obj, int buffertype, char order)
Create a memoryview object to a contiguous chunk of memory (in either
‘C’ or ‘F’ortran order) from an object that defines the buffer
interface. If memory is contiguous, the memoryview object points to the
original memory. Otherwise a copy is made and the memoryview points to a
new bytes object.
Return a pointer to the buffer structure wrapped by the given
memoryview object. The object must be a memoryview instance;
this macro doesn’t check its type; you must do it yourself or you
will risk crashes.
Python supports weak references as first-class objects. There are two
specific object types which directly implement weak references. The first is a
simple reference object, and the second acts as a proxy for the original object
as much as it can.
Return a weak reference object for the object ob. This will always return
a new reference, but is not guaranteed to create a new object; an existing
reference object may be returned. The second parameter, callback, can be a
callable object that receives notification when ob is garbage collected; it
should accept a single parameter, which will be the weak reference object
itself. callback may also be None or NULL. If ob is not a
weakly-referencable object, or if callback is not callable, None, or
NULL, this will return NULL and raise TypeError.
Return a weak reference proxy object for the object ob. This will always
return a new reference, but is not guaranteed to create a new object; an
existing proxy object may be returned. The second parameter, callback, can
be a callable object that receives notification when ob is garbage
collected; it should accept a single parameter, which will be the weak
reference object itself. callback may also be None or NULL. If ob
is not a weakly-referencable object, or if callback is not callable,
None, or NULL, this will return NULL and raise TypeError.
Return the referenced object from a weak reference, ref. If the referent is
no longer live, returns Py_None.
Note
This function returns a borrowed reference to the referenced object.
This means that you should always call Py_INCREF() on the object
except if you know that it cannot be destroyed while you are still
using it.
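For example (a minimal sketch; the getter is assumed to be PyWeakref_GetObject()):

PyObject *ref = PyWeakref_NewRef(ob, NULL);     /* no callback */
if (ref == NULL)
    return NULL;        /* e.g. ob is not weakly referencable */
/* ... later ... */
PyObject *target = PyWeakref_GetObject(ref);    /* borrowed reference */
if (target != Py_None) {
    Py_INCREF(target);  /* keep it alive while in use */
    /* ... use target ... */
    Py_DECREF(target);
}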
This subtype of PyObject represents an opaque value, useful for C
extension modules that need to pass an opaque value (as a void*
pointer) through Python code to other C code. It is often used to make a C
function pointer defined in one module available to other modules, so the
regular import mechanism can be used to access C APIs defined in dynamically
loaded modules.
Create a PyCapsule encapsulating the pointer. The pointer
argument may not be NULL.
On failure, set an exception and return NULL.
The name string may either be NULL or a pointer to a valid C string. If
non-NULL, this string must outlive the capsule. (Though it is permitted to
free it inside the destructor.)
If the destructor argument is not NULL, it will be called with the
capsule as its argument when it is destroyed.
If this capsule will be stored as an attribute of a module, the name should
be specified as modulename.attributename. This will enable other modules
to import the capsule using PyCapsule_Import().
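For example, a hypothetical module spam could export a table of C function pointers (a sketch):

/* in spam's module initialization */
static void *spam_api[1];   /* filled with pointers to spam's C functions */
PyObject *capsule = PyCapsule_New(spam_api, "spam._C_API", NULL);
if (capsule == NULL || PyModule_AddObject(module, "_C_API", capsule) < 0)
    return NULL;

/* in a client module */
void **api = (void **)PyCapsule_Import("spam._C_API", 0);
if (api == NULL)
    return NULL;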
Retrieve the pointer stored in the capsule. On failure, set an exception
and return NULL.
The name parameter must compare exactly to the name stored in the capsule.
If the name stored in the capsule is NULL, the name passed in must also
be NULL. Python uses the C function strcmp() to compare capsule
names.
Return the current destructor stored in the capsule. On failure, set an
exception and return NULL.
It is legal for a capsule to have a NULL destructor. This makes a NULL
return code somewhat ambiguous; use PyCapsule_IsValid() or
PyErr_Occurred() to disambiguate.
Return the current context stored in the capsule. On failure, set an
exception and return NULL.
It is legal for a capsule to have a NULL context. This makes a NULL
return code somewhat ambiguous; use PyCapsule_IsValid() or
PyErr_Occurred() to disambiguate.
Return the current name stored in the capsule. On failure, set an exception
and return NULL.
It is legal for a capsule to have a NULL name. This makes a NULL return
code somewhat ambiguous; use PyCapsule_IsValid() or
PyErr_Occurred() to disambiguate.
void* PyCapsule_Import(const char *name, int no_block)
Import a pointer to a C object from a capsule attribute in a module. The
name parameter should specify the full name to the attribute, as in
module.attribute. The name stored in the capsule must match this
string exactly. If no_block is true, import the module without blocking
(using PyImport_ImportModuleNoBlock()). If no_block is false,
import the module conventionally (using PyImport_ImportModule()).
Return the capsule’s internal pointer on success. On failure, set an
exception and return NULL. However, if PyCapsule_Import() failed to
import the module, and no_block was true, no exception is set.
int PyCapsule_IsValid(PyObject *capsule, const char *name)
Determines whether or not capsule is a valid capsule. A valid capsule is
non-NULL, passes PyCapsule_CheckExact(), has a non-NULL pointer
stored in it, and its internal name matches the name parameter. (See
PyCapsule_GetPointer() for information on how capsule names are
compared.)
In other words, if PyCapsule_IsValid() returns a true value, calls to
any of the accessors (any function starting with PyCapsule_Get()) are
guaranteed to succeed.
Return a nonzero value if the object is valid and matches the name passed in.
Return 0 otherwise. This function will not fail.
int PyCapsule_SetContext(PyObject *capsule, void *context)
Set the context pointer inside capsule to context.
Return 0 on success. Return nonzero and set an exception on failure.
int PyCapsule_SetDestructor(PyObject *capsule, PyCapsule_Destructor destructor)
Set the destructor inside capsule to destructor.
Return 0 on success. Return nonzero and set an exception on failure.
int PyCapsule_SetName(PyObject *capsule, const char *name)
Set the name inside capsule to name. If non-NULL, the name must
outlive the capsule. If the previous name stored in the capsule was not
NULL, no attempt is made to free it.
Return 0 on success. Return nonzero and set an exception on failure.
int PyCapsule_SetPointer(PyObject *capsule, void *pointer)
Set the void pointer inside capsule to pointer. The pointer may not be
NULL.
Return 0 on success. Return nonzero and set an exception on failure.
“Cell” objects are used to implement variables referenced by multiple scopes.
For each such variable, a cell object is created to store the value; the local
variables of each stack frame that references the value contain a reference to
the cells from outer scopes which also use that variable. When the value is
accessed, the value contained in the cell is used instead of the cell object
itself. This de-referencing of the cell object requires support from the
generated byte-code; these are not automatically de-referenced when accessed.
Cell objects are not likely to be useful elsewhere.
Set the contents of the cell object cell to value. This releases the
reference to any current content of the cell. value may be NULL. cell
must be non-NULL; if it is not a cell object, -1 will be returned. On
success, 0 will be returned.
Sets the value of the cell object cell to value. No reference counts are
adjusted, and no checks are made for safety; cell must be non-NULL and must
be a cell object.
Generator objects are what Python uses to implement generator iterators. They
are normally created by iterating over a function that yields values, rather
than explicitly calling PyGen_New().
Various date and time objects are supplied by the datetime module.
Before using any of these functions, the header file datetime.h must be
included in your source (note that this is not included by Python.h),
and the macro PyDateTime_IMPORT must be invoked, usually as part of
the module initialisation function. The macro puts a pointer to a C structure
into a static variable, PyDateTimeAPI, that is used by the following
macros.
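For example, the top of a module that uses these macros looks roughly like this (a sketch; the surrounding initialization function is hypothetical):

#include <Python.h>
#include "datetime.h"           /* not pulled in by Python.h */

/* inside the module initialization function: */
PyDateTime_IMPORT;              /* sets the static PyDateTimeAPI pointer */
PyObject *d = PyDate_FromDate(2011, 2, 20);
if (d == NULL)
    return NULL;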
Return true if ob is of type PyDateTime_TZInfoType. ob must not be
NULL.
Macros to create objects:
PyObject* PyDate_FromDate(int year, int month, int day)
Return a datetime.date object with the specified year, month and day.
PyObject* PyDateTime_FromDateAndTime(int year, int month, int day, int hour, int minute, int second, int usecond)
Return a datetime.datetime object with the specified year, month, day, hour,
minute, second and microsecond.
PyObject* PyTime_FromTime(int hour, int minute, int second, int usecond)
Return a datetime.time object with the specified hour, minute, second and
microsecond.
PyObject* PyDelta_FromDSU(int days, int seconds, int useconds)
Return a datetime.timedelta object representing the given number of days,
seconds and microseconds. Normalization is performed so that the resulting
number of microseconds and seconds lie in the ranges documented for
datetime.timedelta objects.
Macros to extract fields from date objects. The argument must be an instance of
PyDateTime_Date, including subclasses (such as
PyDateTime_DateTime). The argument must not be NULL, and the type is
not checked:
Macros to extract fields from datetime objects. The argument must be an
instance of PyDateTime_DateTime, including subclasses. The argument
must not be NULL, and the type is not checked:
int PyDateTime_DATE_GET_HOUR(PyDateTime_DateTime *o)
Return the hour, as an int from 0 through 23.
int PyDateTime_DATE_GET_MINUTE(PyDateTime_DateTime *o)
Return the minute, as an int from 0 through 59.
int PyDateTime_DATE_GET_SECOND(PyDateTime_DateTime *o)
Return the second, as an int from 0 through 59.
int PyDateTime_DATE_GET_MICROSECOND(PyDateTime_DateTime *o)
Return the microsecond, as an int from 0 through 999999.
Macros to extract fields from time objects. The argument must be an instance of
PyDateTime_Time, including subclasses. The argument must not be NULL,
and the type is not checked:
Code objects are a low-level detail of the CPython implementation.
Each one represents a chunk of executable code that hasn’t yet been
bound into a function.
Return a new code object. If you need a dummy code object to
create a frame, use PyCode_NewEmpty() instead. Calling
PyCode_New() directly can bind you to a precise Python
version since the definition of the bytecode changes often.
PyCodeObject* PyCode_NewEmpty(const char *filename, const char *funcname, int firstlineno)
Return a new empty code object with the specified filename,
function name, and first line number. It is illegal to
exec() or eval() the resulting code object.
Initialize the Python interpreter. In an application embedding Python, this
should be called before using any other Python/C API functions; with the
exception of Py_SetProgramName() and Py_SetPath(). This initializes
the table of loaded modules (sys.modules), and creates the fundamental
modules builtins, __main__ and sys. It also initializes
the module search path (sys.path). It does not set sys.argv; use
PySys_SetArgvEx() for that. This is a no-op when called for a second time
(without calling Py_Finalize() first). There is no return value; it is a
fatal error if the initialization fails.
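For example, the smallest embedding program (a minimal sketch):

#include <Python.h>

int
main(int argc, char *argv[])
{
    Py_Initialize();
    PyRun_SimpleString("print('embedded interpreter running')");
    Py_Finalize();
    return 0;
}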
This function works like Py_Initialize() if initsigs is 1. If
initsigs is 0, it skips registration of signal handlers, which
might be useful when Python is embedded.
Return true (nonzero) when the Python interpreter has been initialized, false
(zero) if not. After Py_Finalize() is called, this returns false until
Py_Initialize() is called again.
Undo all initializations made by Py_Initialize() and subsequent use of
Python/C API functions, and destroy all sub-interpreters (see
Py_NewInterpreter() below) that were created and not yet destroyed since
the last call to Py_Initialize(). Ideally, this frees all memory
allocated by the Python interpreter. This is a no-op when called for a second
time (without calling Py_Initialize() again first). There is no return
value; errors during finalization are ignored.
This function is provided for a number of reasons. An embedding application
might want to restart Python without having to restart the application itself.
An application that has loaded the Python interpreter from a dynamically
loadable library (or DLL) might want to free all memory allocated by Python
before unloading the DLL. During a hunt for memory leaks in an application a
developer might want to free all memory allocated by Python before exiting from
the application.
Bugs and caveats: The destruction of modules and objects in modules is done
in random order; this may cause destructors (__del__() methods) to fail
when they depend on other objects (even functions) or modules. Dynamically
loaded extension modules loaded by Python are not unloaded. Small amounts of
memory allocated by the Python interpreter may not be freed (if you find a leak,
please report it). Memory tied up in circular references between objects is not
freed. Some memory allocated by extension modules may not be freed. Some
extensions may not work properly if their initialization routine is called more
than once; this can happen if an application calls Py_Initialize() and
Py_Finalize() more than once.
This function should be called before Py_Initialize() is called for
the first time, if it is called at all. It tells the interpreter the value
of the argv[0] argument to the main() function of the program
(converted to wide characters).
This is used by Py_GetPath() and some other functions below to find
the Python run-time libraries relative to the interpreter executable. The
default value is 'python'. The argument should point to a
zero-terminated wide character string in static storage whose contents will not
change for the duration of the program’s execution. No code in the Python
interpreter will change the contents of this storage.
Return the program name set with Py_SetProgramName(), or the default.
The returned string points into static storage; the caller should not modify its
value.
Return the prefix for installed platform-independent files. This is derived
through a number of complicated rules from the program name set with
Py_SetProgramName() and some environment variables; for example, if the
program name is '/usr/local/bin/python', the prefix is '/usr/local'. The
returned string points into static storage; the caller should not modify its
value. This corresponds to the prefix variable in the top-level
Makefile and the --prefix argument to the configure
script at build time. The value is available to Python code as sys.prefix.
It is only useful on Unix. See also the next function.
Return the exec-prefix for installed platform-dependent files. This is
derived through a number of complicated rules from the program name set with
Py_SetProgramName() and some environment variables; for example, if the
program name is '/usr/local/bin/python', the exec-prefix is
'/usr/local'. The returned string points into static storage; the caller
should not modify its value. This corresponds to the exec_prefix
variable in the top-level Makefile and the --exec-prefix
argument to the configure script at build time. The value is
available to Python code as sys.exec_prefix. It is only useful on Unix.
Background: The exec-prefix differs from the prefix when platform dependent
files (such as executables and shared libraries) are installed in a different
directory tree. In a typical installation, platform dependent files may be
installed in the /usr/local/plat subtree while platform independent ones may
be installed in /usr/local.
Generally speaking, a platform is a combination of hardware and software
families, e.g. Sparc machines running the Solaris 2.x operating system are
considered the same platform, but Intel machines running Solaris 2.x are another
platform, and Intel machines running Linux are yet another platform. Different
major revisions of the same operating system generally also form different
platforms. Non-Unix operating systems are a different story; the installation
strategies on those systems are so different that the prefix and exec-prefix are
meaningless, and set to the empty string. Note that compiled Python bytecode
files are platform independent (but not independent from the Python version by
which they were compiled!).
System administrators will know how to configure the mount or
automount programs to share /usr/local between platforms
while having /usr/local/plat be a different filesystem for each
platform.
Return the full program name of the Python executable; this is computed as a
side-effect of deriving the default module search path from the program name
(set by Py_SetProgramName() above). The returned string points into
static storage; the caller should not modify its value. The value is available
to Python code as sys.executable.
Return the default module search path; this is computed from the program name
(set by Py_SetProgramName() above) and some environment variables.
The returned string consists of a series of directory names separated by a
platform dependent delimiter character. The delimiter character is ':'
on Unix and Mac OS X, ';' on Windows. The returned string points into
static storage; the caller should not modify its value. The list
sys.path is initialized with this value on interpreter startup; it
can be (and usually is) modified later to change the search path for loading
modules.
Set the default module search path. If this function is called before
Py_Initialize(), then Py_GetPath() won’t attempt to compute a
default search path but uses the one provided instead. This is useful if
Python is embedded by an application that has full knowledge of the location
of all modules. The path components should be separated by semicolons.
Return the version of this Python interpreter. This is a string that looks
something like
"3.0a5+ (py3k:63103M, May 12 2008, 00:53:55) \n[GCC 4.2.3]"
The first word (up to the first space character) is the current Python version;
the first three characters are the major and minor version separated by a
period. The returned string points into static storage; the caller should not
modify its value. The value is available to Python code as sys.version.
Return the platform identifier for the current platform. On Unix, this is
formed from the “official” name of the operating system, converted to lower
case, followed by the major revision number; e.g., for Solaris 2.x, which is
also known as SunOS 5.x, the value is 'sunos5'. On Mac OS X, it is
'darwin'. On Windows, it is 'win'. The returned string points into
static storage; the caller should not modify its value. The value is available
to Python code as sys.platform.
Return an indication of the compiler used to build the current Python version,
in square brackets, for example:
"[GCC 2.7.2.2]"
The returned string points into static storage; the caller should not modify its
value. The value is available to Python code as part of the variable
sys.version.
Return information about the sequence number and build date and time of the
current Python interpreter instance, for example
"#67, Aug 1 1997, 22:34:28"
The returned string points into static storage; the caller should not modify its
value. The value is available to Python code as part of the variable
sys.version.
void PySys_SetArgvEx(int argc, wchar_t **argv, int updatepath)
Set sys.argv based on argc and argv. These parameters are
similar to those passed to the program’s main() function with the
difference that the first entry should refer to the script file to be
executed rather than the executable hosting the Python interpreter. If there
isn’t a script that will be run, the first entry in argv can be an empty
string. If this function fails to initialize sys.argv, a fatal
condition is signalled using Py_FatalError().
If updatepath is zero, this is all the function does. If updatepath
is non-zero, the function also modifies sys.path according to the
following algorithm:
If the name of an existing script is passed in argv[0], the absolute
path of the directory where the script is located is prepended to
sys.path.
Otherwise (that is, if argc is 0 or argv[0] doesn’t point
to an existing file name), an empty string is prepended to
sys.path, which is the same as prepending the current working
directory (".").
Note
It is recommended that applications embedding the Python interpreter
for purposes other than executing a single script pass 0 as updatepath,
and update sys.path themselves if desired.
See CVE-2008-5983.
On versions before 3.1.3, you can achieve the same effect by manually
popping the first sys.path element after having called
PySys_SetArgv(), for example using:
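PyRun_SimpleString("import sys; sys.path.pop(0)\n");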
Set the default “home” directory, that is, the location of the standard
Python libraries. See PYTHONHOME for the meaning of the
argument string.
The argument should point to a zero-terminated character string in static
storage whose contents will not change for the duration of the program’s
execution. No code in the Python interpreter will change the contents of
this storage.
Return the default “home”, that is, the value set by a previous call to
Py_SetPythonHome(), or the value of the PYTHONHOME
environment variable if it is set.
The Python interpreter is not fully thread-safe. In order to support
multi-threaded Python programs, there’s a global lock, called the global
interpreter lock or GIL, that must be held by the current thread before
it can safely access Python objects. Without the lock, even the simplest
operations could cause problems in a multi-threaded program: for example, when
two threads simultaneously increment the reference count of the same object, the
reference count could end up being incremented only once instead of twice.
Therefore, the rule exists that only the thread that has acquired the
GIL may operate on Python objects or call Python/C API functions.
In order to emulate concurrency of execution, the interpreter regularly
tries to switch threads (see sys.setswitchinterval()). The lock is also
released around potentially blocking I/O operations like reading or writing
a file, so that other Python threads can run in the meantime.
The Python interpreter keeps some thread-specific bookkeeping information
inside a data structure called PyThreadState. There’s also one
global variable pointing to the current PyThreadState: it can
be retrieved using PyThreadState_Get().
The Py_BEGIN_ALLOW_THREADS macro opens a new block and declares a
hidden local variable; the Py_END_ALLOW_THREADS macro closes the
block. These two macros are still available when Python is compiled without
thread support (they simply have an empty expansion).
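A typical use of the pair, releasing the lock around a blocking operation, looks like this:
Py_BEGIN_ALLOW_THREADS
... Do some blocking I/O operation ...
Py_END_ALLOW_THREADS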
When thread support is enabled, the block above expands to the following code:
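{
    PyThreadState *_save;

    _save = PyEval_SaveThread();
    ... Do some blocking I/O operation ...
    PyEval_RestoreThread(_save);
}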
Here is how these functions work: the global interpreter lock is used to protect the pointer to the
current thread state. When releasing the lock and saving the thread state,
the current thread state pointer must be retrieved before the lock is released
(since another thread could immediately acquire the lock and store its own thread
state in the global variable). Conversely, when acquiring the lock and restoring
the thread state, the lock must be acquired before storing the thread state
pointer.
Note
Calling system I/O functions is the most common use case for releasing
the GIL, but it can also be useful before calling long-running computations
which don’t need access to Python objects, such as compression or
cryptographic functions operating over memory buffers. For example, the
standard zlib and hashlib modules release the GIL when
compressing or hashing data.
When threads are created using the dedicated Python APIs (such as the
threading module), a thread state is automatically associated with them
and the code shown above is therefore correct. However, when threads are
created from C (for example by a third-party library with its own thread
management), they don’t hold the GIL, nor is there a thread state structure
for them.
If you need to call Python code from these threads (often this will be part
of a callback API provided by the aforementioned third-party library),
you must first register these threads with the interpreter by
creating a thread state data structure, then acquiring the GIL, and finally
storing their thread state pointer, before you can start using the Python/C
API. When you are done, you should reset the thread state pointer, release
the GIL, and finally free the thread state data structure.
PyGILState_STATE gstate;
gstate = PyGILState_Ensure();

/* Perform Python actions here. */
result = CallSomeFunction();
/* evaluate result or handle exception */

/* Release the thread. No Python API allowed beyond this point. */
PyGILState_Release(gstate);
Note that the PyGILState_*() functions assume there is only one global
interpreter (created automatically by Py_Initialize()). Python
supports the creation of additional interpreters (using
Py_NewInterpreter()), but mixing multiple interpreters and the
PyGILState_*() API is unsupported.
Another important thing to note about threads is their behaviour in the face
of the C fork() call. On most systems with fork(), after a
process forks only the thread that issued the fork will exist. That also
means any locks held by other threads will never be released. Python solves
this for os.fork() by acquiring the locks it uses internally before
the fork, and releasing them afterwards. In addition, it resets any
Lock Objects in the child. When extending or embedding Python, there
is no way to inform Python of additional (non-Python) locks that need to be
acquired before or reset after a fork. OS facilities such as
pthread_atfork() would need to be used to accomplish the same thing.
Additionally, when extending or embedding Python, calling fork()
directly rather than through os.fork() (and returning to or calling
into Python) may result in a deadlock by one of Python’s internal locks
being held by a thread that is defunct after the fork.
PyOS_AfterFork() tries to reset the necessary locks, but is not
always able to.
This data structure represents the state shared by a number of cooperating
threads. Threads belonging to the same interpreter share their module
administration and a few other internal items. There are no public members in
this structure.
Threads belonging to different interpreters initially share nothing, except
process state like available memory, open file descriptors and such. The global
interpreter lock is also shared by all threads, regardless of to which
interpreter they belong.
This data structure represents the state of a single thread. The only public
data member is PyInterpreterState *interp, which points to
this thread’s interpreter state.
Initialize and acquire the global interpreter lock. It should be called in the
main thread before creating a second thread or engaging in any other thread
operations such as PyEval_ReleaseThread(tstate). It is not needed before
calling PyEval_SaveThread() or PyEval_RestoreThread().
This is a no-op when called for a second time.
Changed in version 3.2: This function cannot be called before Py_Initialize() anymore.
Note
When only the main thread exists, no GIL operations are needed. This is a
common situation (most Python programs do not use threads), and the lock
operations slow the interpreter down a bit. Therefore, the lock is not
created initially. This situation is equivalent to having acquired the lock:
when there is only a single thread, all object accesses are safe. Therefore,
when this function initializes the global interpreter lock, it also acquires
it. Before the Python _thread module creates a new thread, knowing
that either it has the lock or the lock hasn’t been created yet, it calls
PyEval_InitThreads(). When this call returns, it is guaranteed that
the lock has been created and that the calling thread has acquired it.
It is not safe to call this function when it is unknown which thread (if
any) currently has the global interpreter lock.
This function is not available when thread support is disabled at compile time.
Returns a non-zero value if PyEval_InitThreads() has been called. This
function can be called without holding the GIL, and therefore can be used to
avoid calls to the locking API when running single-threaded. This function is
not available when thread support is disabled at compile time.
Release the global interpreter lock (if it has been created and thread
support is enabled) and reset the thread state to NULL, returning the
previous thread state (which is not NULL). If the lock has been created,
the current thread must have acquired it. (This function is available even
when thread support is disabled at compile time.)
Acquire the global interpreter lock (if it has been created and thread
support is enabled) and set the thread state to tstate, which must not be
NULL. If the lock has been created, the current thread must not have
acquired it, otherwise deadlock ensues. (This function is available even
when thread support is disabled at compile time.)
Return the current thread state. The global interpreter lock must be held.
When the current thread state is NULL, this issues a fatal error (so that
the caller needn’t check for NULL).
Swap the current thread state with the thread state given by the argument
tstate, which may be NULL. The global interpreter lock must be held
and is not released.
This function is called from PyOS_AfterFork() to ensure that newly
created child processes don’t hold locks referring to threads which
are not running in the child process.
The following functions use thread-local storage, and are not compatible
with sub-interpreters:
Ensure that the current thread is ready to call the Python C API regardless
of the current state of Python, or of the global interpreter lock. This may
be called as many times as desired by a thread as long as each call is
matched with a call to PyGILState_Release(). In general, other
thread-related APIs may be used between PyGILState_Ensure() and
PyGILState_Release() calls as long as the thread state is restored to
its previous state before the Release(). For example, normal usage of the
Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS macros is
acceptable.
The return value is an opaque “handle” to the thread state when
PyGILState_Ensure() was called, and must be passed to
PyGILState_Release() to ensure Python is left in the same state. Even
though recursive calls are allowed, these handles cannot be shared - each
unique call to PyGILState_Ensure() must save the handle for its call
to PyGILState_Release().
When the function returns, the current thread will hold the GIL and be able
to call arbitrary Python code. Failure is a fatal error.
Release any resources previously acquired. After this call, Python’s state will
be the same as it was prior to the corresponding PyGILState_Ensure() call
(but generally this state will be unknown to the caller, hence the use of the
GILState API).
Get the current thread state for this thread. May return NULL if no
GILState API has been used on the current thread. Note that the main thread
always has such a thread-state, even if no auto-thread-state call has been
made on the main thread. This is mainly a helper/diagnostic function.
The following macros are normally used without a trailing semicolon; look for
example usage in the Python source distribution.
This macro expands to { PyThreadState *_save; _save = PyEval_SaveThread();.
Note that it contains an opening brace; it must be matched with a following
Py_END_ALLOW_THREADS macro. See above for further discussion of this
macro. It is a no-op when thread support is disabled at compile time.
This macro expands to PyEval_RestoreThread(_save); }. Note that it contains
a closing brace; it must be matched with an earlier
Py_BEGIN_ALLOW_THREADS macro. See above for further discussion of
this macro. It is a no-op when thread support is disabled at compile time.
This macro expands to PyEval_RestoreThread(_save);: it is equivalent to
Py_END_ALLOW_THREADS without the closing brace. It is a no-op when
thread support is disabled at compile time.
This macro expands to _save = PyEval_SaveThread();: it is equivalent to
Py_BEGIN_ALLOW_THREADS without the opening brace and variable
declaration. It is a no-op when thread support is disabled at compile time.
All of the following functions are only available when thread support is enabled
at compile time, and must be called only when the global interpreter lock has
been created.
Create a new interpreter state object. The global interpreter lock need not
be held, but may be held if it is necessary to serialize calls to this
function.
Destroy an interpreter state object. The global interpreter lock need not be
held. The interpreter state must have been reset with a previous call to
PyInterpreterState_Clear().
Create a new thread state object belonging to the given interpreter object.
The global interpreter lock need not be held, but may be held if it is
necessary to serialize calls to this function.
Destroy a thread state object. The global interpreter lock need not be held.
The thread state must have been reset with a previous call to
PyThreadState_Clear().
Return a dictionary in which extensions can store thread-specific state
information. Each extension should use a unique key to store state in
the dictionary. It is okay to call this function when no current thread state
is available. If this function returns NULL, no exception has been raised and
the caller should assume no current thread state is available.
int PyThreadState_SetAsyncExc(long id, PyObject *exc)
Asynchronously raise an exception in a thread. The id argument is the thread
id of the target thread; exc is the exception object to be raised. This
function does not steal any references to exc. To prevent naive misuse, you
must write your own C extension to call this. Must be called with the GIL held.
Returns the number of thread states modified; this is normally one, but will be
zero if the thread id isn’t found. If exc is NULL, the pending
exception (if any) for the thread is cleared. This raises no exceptions.
Acquire the global interpreter lock and set the current thread state to
tstate, which should not be NULL. The lock must have been created earlier.
If this thread already has the lock, deadlock ensues.
PyEval_RestoreThread() is a higher-level function which is always
available (even when thread support isn’t enabled or when threads have
not been initialized).
Reset the current thread state to NULL and release the global interpreter
lock. The lock must have been created earlier and must be held by the current
thread. The tstate argument, which must not be NULL, is only used to check
that it represents the current thread state — if it isn’t, a fatal error is
reported.
PyEval_SaveThread() is a higher-level function which is always
available (even when thread support isn’t enabled or when threads have
not been initialized).
While in most uses, you will only embed a single Python interpreter, there
are cases where you need to create several independent interpreters in the
same process and perhaps even in the same thread. Sub-interpreters allow
you to do that. You can switch between sub-interpreters using the
PyThreadState_Swap() function. You can create and destroy them
using the following functions:
Create a new sub-interpreter. This is an (almost) totally separate environment
for the execution of Python code. In particular, the new interpreter has
separate, independent versions of all imported modules, including the
fundamental modules builtins, __main__ and sys. The
table of loaded modules (sys.modules) and the module search path
(sys.path) are also separate. The new environment has no sys.argv
variable. It has new standard I/O stream file objects sys.stdin,
sys.stdout and sys.stderr (however these refer to the same underlying
file descriptors).
The return value points to the first thread state created in the new
sub-interpreter. This thread state is made the current thread state.
Note that no actual thread is created; see the discussion of thread states
below. If creation of the new interpreter is unsuccessful, NULL is
returned; no exception is set since the exception state is stored in the
current thread state and there may not be a current thread state. (Like all
other Python/C API functions, the global interpreter lock must be held before
calling this function and is still held when it returns; however, unlike most
other Python/C API functions, there needn’t be a current thread state on
entry.)
Extension modules are shared between (sub-)interpreters as follows: the first
time a particular extension is imported, it is initialized normally, and a
(shallow) copy of its module’s dictionary is squirreled away. When the same
extension is imported by another (sub-)interpreter, a new module is initialized
and filled with the contents of this copy; the extension’s init function is
not called. Note that this is different from what happens when an extension is
imported after the interpreter has been completely re-initialized by calling
Py_Finalize() and Py_Initialize(); in that case, the extension’s
initmodule function is called again.
Destroy the (sub-)interpreter represented by the given thread state. The given
thread state must be the current thread state. See the discussion of thread
states below. When the call returns, the current thread state is NULL. All
thread states associated with this interpreter are destroyed. (The global
interpreter lock must be held before calling this function and is still held
when it returns.) Py_Finalize() will destroy all sub-interpreters that
haven’t been explicitly destroyed at that point.
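As a sketch, creating and tearing down a sub-interpreter from the main thread (which already holds the GIL after Py_Initialize()) might look like this:
PyThreadState *main_tstate = PyThreadState_Get();  /* save the main thread state */

PyThreadState *sub = Py_NewInterpreter();          /* sub is now the current state */
if (sub == NULL) {
    /* creation failed; note that no exception is set */
}
else {
    PyRun_SimpleString("import sys; print(sys.prefix)");
    Py_EndInterpreter(sub);            /* current thread state is NULL afterwards */
}
PyThreadState_Swap(main_tstate);       /* switch back to the main interpreter */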
Because sub-interpreters (and the main interpreter) are part of the same
process, the insulation between them isn’t perfect — for example, using
low-level file operations like os.close(), they can
(accidentally or maliciously) affect each other’s open files. Because of the
way extensions are shared between (sub-)interpreters, some extensions may not
work properly; this is especially likely when the extension makes use of
(static) global variables, or when the extension manipulates its module’s
dictionary after its initialization. It is possible to insert objects created
in one sub-interpreter into a namespace of another sub-interpreter; this should
be done with great care to avoid sharing user-defined functions, methods,
instances or classes between sub-interpreters, since import operations executed
by such objects may affect the wrong (sub-)interpreter’s dictionary of loaded
modules.
Also note that combining this functionality with PyGILState_*() APIs
is delicate, because these APIs assume a bijection between Python thread states
and OS-level threads, an assumption broken by the presence of sub-interpreters.
It is highly recommended that you don’t switch sub-interpreters between a pair
of matching PyGILState_Ensure() and PyGILState_Release() calls.
Furthermore, extensions (such as ctypes) using these APIs to allow calling
of Python code from non-Python created threads will probably be broken when using
sub-interpreters.
A mechanism is provided to make asynchronous notifications to the main
interpreter thread. These notifications take the form of a function
pointer and a void argument.
Every check interval, when the global interpreter lock is released and
reacquired, Python will also call any such provided functions. This can be used
for example by asynchronous IO handlers. The notification can be scheduled from
a worker thread and the actual call then made at the earliest convenience by the
main thread where it has possession of the global interpreter lock and can
perform any Python API calls.
int Py_AddPendingCall(int (*func)(void *), void *arg)
Post a notification to the Python main thread. If successful, func will be
called with the argument arg at the earliest convenience. func will be
called having the global interpreter lock held and can thus use the full
Python API and can take any action such as setting object attributes to
signal IO completion. It must return 0 on success, or -1 signalling an
exception. The notification function won’t be interrupted to perform another
asynchronous notification recursively, but it can still be interrupted to
switch threads if the global interpreter lock is released, for example, if it
calls back into Python code.
This function returns 0 on success in which case the notification has been
scheduled. Otherwise, for example if the notification buffer is full, it
returns -1 without setting any exception.
This function can be called on any thread, be it a Python thread or some
other system thread. If it is a Python thread, it doesn’t matter if it holds
the global interpreter lock or not.
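A sketch of posting a notification (notify_done and its string argument are illustrative, not part of the API):
/* runs later in the main thread, with the GIL held */
static int
notify_done(void *arg)
{
    PySys_WriteStdout("I/O finished: %s\n", (const char *)arg);
    return 0;                          /* 0 on success, -1 with an exception set */
}

/* callable from any thread, Python-created or not: */
if (Py_AddPendingCall(notify_done, (void *)"request-1") < 0) {
    /* notification buffer full; no exception has been set */
}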
The Python interpreter provides some low-level support for attaching profiling
and execution tracing facilities. These are used for profiling, debugging, and
coverage analysis tools.
This C interface allows the profiling or tracing code to avoid the overhead of
calling through Python-level callable objects, making a direct C function call
instead. The essential attributes of the facility have not changed; the
interface allows trace functions to be installed per-thread, and the basic
events reported to the trace function are the same as had been reported to the
Python-level trace functions in previous versions.
int (*Py_tracefunc)(PyObject *obj, PyFrameObject *frame, int what, PyObject *arg)
The type of the trace function registered using PyEval_SetProfile() and
PyEval_SetTrace(). The first parameter is the object passed to the
registration function as obj, frame is the frame object to which the event
pertains, what is one of the constants PyTrace_CALL,
PyTrace_EXCEPTION, PyTrace_LINE, PyTrace_RETURN,
PyTrace_C_CALL, PyTrace_C_EXCEPTION, or
PyTrace_C_RETURN, and arg depends on the value of what:
The value of the what parameter to a Py_tracefunc function when a new
call to a function or method is being reported, or a new entry into a generator.
Note that the creation of the iterator for a generator function is not reported
as there is no control transfer to the Python bytecode in the corresponding
frame.
The value of the what parameter to a Py_tracefunc function when an
exception has been raised. The callback function is called with this value
for what after any bytecode is processed that causes the exception to become
set within the frame being executed. The effect of this is that as exception
propagation causes the Python stack to unwind, the callback is called upon
return to each frame as the exception propagates. Only trace functions receive
these events; they are not needed by the profiler.
Set the profiler function to func. The obj parameter is passed to the
function as its first parameter, and may be any Python object, or NULL. If
the profile function needs to maintain state, using a different value for obj
for each thread provides a convenient and thread-safe place to store it. The
profile function is called for all monitored events except the line-number
events.
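A minimal sketch of installing a profile function (my_profiler and call_count are illustrative names):
static Py_ssize_t call_count = 0;      /* illustrative: count function calls */

static int
my_profiler(PyObject *obj, PyFrameObject *frame, int what, PyObject *arg)
{
    if (what == PyTrace_CALL)
        call_count++;
    return 0;
}

/* install for the current thread; obj is NULL because no per-thread
   state object is needed here */
PyEval_SetProfile(my_profiler, NULL);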
Memory management in Python involves a private heap containing all Python
objects and data structures. The management of this private heap is ensured
internally by the Python memory manager. The Python memory manager has
different components which deal with various dynamic storage management aspects,
like sharing, segmentation, preallocation or caching.
At the lowest level, a raw memory allocator ensures that there is enough room in
the private heap for storing all Python-related data by interacting with the
memory manager of the operating system. On top of the raw memory allocator,
several object-specific allocators operate on the same heap and implement
distinct memory management policies adapted to the peculiarities of every object
type. For example, integer objects are managed differently within the heap than
strings, tuples or dictionaries because integers imply different storage
requirements and speed/space tradeoffs. The Python memory manager thus delegates
some of the work to the object-specific allocators, but ensures that the latter
operate within the bounds of the private heap.
It is important to understand that the management of the Python heap is
performed by the interpreter itself and that the user has no control over it,
even if she regularly manipulates object pointers to memory blocks inside that
heap. The allocation of heap space for Python objects and other internal
buffers is performed on demand by the Python memory manager through the Python/C
API functions listed in this document.
To avoid memory corruption, extension writers should never try to operate on
Python objects with the functions exported by the C library: malloc(),
calloc(), realloc() and free(). This will result in mixed
calls between the C allocator and the Python memory manager with fatal
consequences, because they implement different algorithms and operate on
different heaps. However, one may safely allocate and release memory blocks
with the C library allocator for individual purposes, as shown in the following
example:
PyObject *res;
char *buf = (char *) malloc(BUFSIZ); /* for I/O */

if (buf == NULL)
    return PyErr_NoMemory();
...Do some I/O operation involving buf...
res = PyBytes_FromString(buf);
free(buf); /* malloc'ed */
return res;
In this example, the memory request for the I/O buffer is handled by the C
library allocator. The Python memory manager is involved only in the allocation
of the string object returned as a result.
In most situations, however, it is recommended to allocate memory from the
Python heap specifically because the latter is under control of the Python
memory manager. For example, this is required when the interpreter is extended
with new object types written in C. Another reason for using the Python heap is
the desire to inform the Python memory manager about the memory needs of the
extension module. Even when the requested memory is used exclusively for
internal, highly-specific purposes, delegating all memory requests to the Python
memory manager causes the interpreter to have a more accurate image of its
memory footprint as a whole. Consequently, under certain circumstances, the
Python memory manager may or may not trigger appropriate actions, like garbage
collection, memory compaction or other preventive procedures. Note that by using
the C library allocator as shown in the previous example, the allocated memory
for the I/O buffer escapes completely the Python memory manager.
The following function sets, modeled after the ANSI C standard, but specifying
behavior when requesting zero bytes, are available for allocating and releasing
memory from the Python heap:
Allocates n bytes and returns a pointer of type void* to the
allocated memory, or NULL if the request fails. Requesting zero bytes returns
a distinct non-NULL pointer if possible, as if PyMem_Malloc(1) had
been called instead. The memory will not have been initialized in any way.
Resizes the memory block pointed to by p to n bytes. The contents will be
unchanged to the minimum of the old and the new sizes. If p is NULL, the
call is equivalent to PyMem_Malloc(n); else if n is equal to zero,
the memory block is resized but is not freed, and the returned pointer is
non-NULL. Unless p is NULL, it must have been returned by a previous call
to PyMem_Malloc() or PyMem_Realloc(). If the request fails,
PyMem_Realloc() returns NULL and p remains a valid pointer to the
previous memory area.
Frees the memory block pointed to by p, which must have been returned by a
previous call to PyMem_Malloc() or PyMem_Realloc(). Otherwise, or
if PyMem_Free(p) has been called before, undefined behavior occurs. If
p is NULL, no operation is performed.
The following type-oriented macros are provided for convenience. Note that
TYPE refers to any C type.
Same as PyMem_Malloc(), but allocates (n*sizeof(TYPE)) bytes of
memory. Returns a pointer cast to TYPE*. The memory will not have
been initialized in any way.
Same as PyMem_Realloc(), but the memory block is resized to (n*sizeof(TYPE)) bytes. Returns a pointer cast to TYPE*. On return,
p will be a pointer to the new memory area, or NULL in the event of
failure. This is a C preprocessor macro; p is always reassigned. Save
the original value of p to avoid losing memory when handling errors.
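For instance, a careful resize that preserves the block on failure might look like this (a sketch; buf and orig are illustrative):
char *buf = PyMem_New(char, BUFSIZ);
if (buf == NULL)
    return PyErr_NoMemory();

char *orig = buf;                      /* save p: the macro reassigns it */
PyMem_Resize(buf, char, 2 * BUFSIZ);
if (buf == NULL) {
    PyMem_Del(orig);                   /* the original block is still allocated */
    return PyErr_NoMemory();
}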
In addition, the following macro sets are provided for calling the Python memory
allocator directly, without involving the C API functions listed above. However,
note that their use does not preserve binary compatibility across Python
versions and is therefore deprecated in extension modules.
Here is the example from section Overview, rewritten so that the
I/O buffer is allocated from the Python heap by using the first function set:
PyObject *res;
char *buf = (char *) PyMem_Malloc(BUFSIZ); /* for I/O */

if (buf == NULL)
    return PyErr_NoMemory();
/* ...Do some I/O operation involving buf... */
res = PyBytes_FromString(buf);
PyMem_Free(buf); /* allocated with PyMem_Malloc */
return res;
The same code using the type-oriented function set:
PyObject *res;
char *buf = PyMem_New(char, BUFSIZ); /* for I/O */

if (buf == NULL)
    return PyErr_NoMemory();
/* ...Do some I/O operation involving buf... */
res = PyBytes_FromString(buf);
PyMem_Del(buf); /* allocated with PyMem_New */
return res;
Note that in the two examples above, the buffer is always manipulated via
functions belonging to the same set. Indeed, it is required to use the same
memory API family for a given memory block, so that the risk of mixing different
allocators is reduced to a minimum. The following code sequence contains two
errors, one of which is labeled as fatal because it mixes two different
allocators operating on different heaps.
char *buf1 = PyMem_New(char, BUFSIZ);
char *buf2 = (char *) malloc(BUFSIZ);
char *buf3 = (char *) PyMem_Malloc(BUFSIZ);
...
PyMem_Del(buf3);  /* Wrong -- should be PyMem_Free() */
free(buf2);       /* Right -- allocated via malloc() */
free(buf1);       /* Fatal -- should be PyMem_Del()  */
In addition to the functions aimed at handling raw memory blocks from the Python
heap, objects in Python are allocated and released with PyObject_New(),
PyObject_NewVar() and PyObject_Del().
These will be explained in the next chapter on defining and implementing new
object types in C.
Initialize a newly-allocated object op with its type and initial
reference. Returns the initialized object. If type indicates that the
object participates in the cyclic garbage detector, it is added to the
detector’s set of observed objects. Other fields of the object are not
affected.
Allocate a new Python object using the C structure type TYPE and the
Python type object type. Fields not defined by the Python object header
are not initialized; the object’s reference count will be one. The size of
the memory allocation is determined from the tp_basicsize field of
the type object.
Allocate a new Python object using the C structure type TYPE and the
Python type object type. Fields not defined by the Python object header
are not initialized. The allocated memory allows for the TYPE structure
plus size fields of the size given by the tp_itemsize field of
type. This is useful for implementing objects like tuples, which are
able to determine their size at construction time. Embedding the array of
fields into the same allocation decreases the number of allocations,
improving the memory management efficiency.
Releases memory allocated to an object using PyObject_New() or
PyObject_NewVar(). This is normally called from the
tp_dealloc handler specified in the object’s type. The fields of
the object should not be accessed after this call as the memory is no
longer a valid Python object.
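A sketch of these object-level functions in use (FooObject and Foo_Type are hypothetical; Foo_Type is assumed to be a fully initialized PyTypeObject):
typedef struct {
    PyObject_HEAD
    int payload;                       /* illustrative instance data */
} FooObject;

FooObject *op = PyObject_New(FooObject, &Foo_Type);
if (op == NULL)
    return PyErr_NoMemory();
op->payload = 42;
/* ... later, normally from the type's tp_dealloc handler: */
PyObject_Del(op);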
There are a large number of structures which are used in the definition of
object types for Python. This section describes these structures and how they
are used.
All Python objects ultimately share a small number of fields at the beginning
of the object’s representation in memory. These are represented by the
PyObject and PyVarObject types, which are defined, in turn,
by the expansions of some macros also used, whether directly or indirectly, in
the definition of all other Python objects.
All object types are extensions of this type. This is a type which
contains the information Python needs to treat a pointer to an object as an
object. In a normal “release” build, it contains only the object’s
reference count and a pointer to the corresponding type object. It
corresponds to the fields defined by the expansion of the PyObject_HEAD
macro.
This is an extension of PyObject that adds the ob_size
field. This is only used for objects that have some notion of length.
This type does not often appear in the Python/C API. It corresponds to the
fields defined by the expansion of the PyObject_VAR_HEAD macro.
This is a macro which expands to the declarations of the fields of the
PyObject type; it is used when declaring new types which represent
objects without a varying length. The specific fields it expands to depend
on the definition of Py_TRACE_REFS. By default, that macro is
not defined, and PyObject_HEAD expands to:
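Py_ssize_t ob_refcnt;
PyTypeObject *ob_type;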
This is a macro which expands to the declarations of the fields of the
PyVarObject type; it is used when declaring new types which
represent objects with a length that varies from instance to instance.
This macro always expands to:
PyObject_HEAD
Py_ssize_t ob_size;
Note that PyObject_HEAD is part of the expansion, and that its own
expansion varies depending on the definition of Py_TRACE_REFS.
Type of the functions used to implement most Python callables in C.
Functions of this type take two PyObject* parameters and return
one such value. If the return value is NULL, an exception shall have
been set. If not NULL, the return value is interpreted as the return
value of the function as exposed in Python. The function must return a new
reference.
Type of the functions used to implement Python callables in C that take
keyword arguments: they take three PyObject* parameters and return
one such value. See PyCFunction above for the meaning of the return
value.
Structure used to describe a method of an extension type. This structure has
four fields:
Field      C Type        Meaning
ml_name    char *        name of the method
ml_meth    PyCFunction   pointer to the C implementation
ml_flags   int           flag bits indicating how the call should be constructed
ml_doc     char *        points to the contents of the docstring
The ml_meth is a C function pointer. The functions may be of different
types, but they always return PyObject*. If the function is not of
type PyCFunction, the compiler will require a cast in the method table.
Even though PyCFunction defines the first parameter as
PyObject*, it is common that the method implementation uses the
specific C type of the self object.
The ml_flags field is a bitfield which can include the following flags.
The individual flags indicate either a calling convention or a binding
convention. Of the calling convention flags, only METH_VARARGS and
METH_KEYWORDS can be combined (but note that METH_KEYWORDS
alone is equivalent to METH_VARARGS|METH_KEYWORDS). Any of the calling
convention flags can be combined with a binding flag.
This is the typical calling convention, where the methods have the type
PyCFunction. The function expects two PyObject* values.
The first one is the self object for methods; for module functions, it is
the module object. The second parameter (often called args) is a tuple
object representing all arguments. This parameter is typically processed
using PyArg_ParseTuple() or PyArg_UnpackTuple().
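For example, a METH_VARARGS function and its method table entry might look like this (example_add and example_methods are illustrative names):
static PyObject *
example_add(PyObject *self, PyObject *args)
{
    long a, b;
    if (!PyArg_ParseTuple(args, "ll", &a, &b))
        return NULL;
    return PyLong_FromLong(a + b);
}

static PyMethodDef example_methods[] = {
    {"add", example_add, METH_VARARGS, "Add two integers."},
    {NULL, NULL, 0, NULL}              /* sentinel */
};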
Methods with these flags must be of type PyCFunctionWithKeywords.
The function expects three parameters: self, args, and a dictionary of
all the keyword arguments. The flag is typically combined with
METH_VARARGS, and the parameters are typically processed using
PyArg_ParseTupleAndKeywords().
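A sketch of the keyword form (example_greet is an illustrative name; note the (PyCFunction) cast needed in the method table for the three-argument type):
static PyObject *
example_greet(PyObject *self, PyObject *args, PyObject *kwargs)
{
    static char *kwlist[] = {"name", "punct", NULL};
    const char *name;
    const char *punct = "!";
    if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s|s", kwlist,
                                     &name, &punct))
        return NULL;
    return PyUnicode_FromFormat("Hello, %s%s", name, punct);
}

/* method table entry:
   {"greet", (PyCFunction)example_greet, METH_VARARGS | METH_KEYWORDS, "..."} */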
Methods without parameters don’t need to check whether arguments are given if
they are listed with the METH_NOARGS flag. They need to be of type
PyCFunction. The first parameter is typically named self and will
hold a reference to the module or object instance. In all cases the second
parameter will be NULL.
Methods with a single object argument can be listed with the METH_O
flag, instead of invoking PyArg_ParseTuple() with a "O" argument.
They have the type PyCFunction, with the self parameter, and a
PyObject* parameter representing the single argument.
These two constants are not used to indicate the calling convention but the
binding when used with methods of classes. They may not be used for functions
defined for modules. At most one of these flags may be set for any given
method.
The method will be passed the type object as the first parameter rather
than an instance of the type. This is used to create class methods,
similar to what is created when using the classmethod() built-in
function.
The method will be passed NULL as the first parameter rather than an
instance of the type. This is used to create static methods, similar to
what is created when using the staticmethod() built-in function.
One other constant controls whether a method is loaded in place of another
definition with the same method name.
The method will be loaded in place of existing definitions. Without
METH_COEXIST, the default is to skip repeated definitions. Since slot
wrappers are loaded before the method table, the existence of a
sq_contains slot, for example, would generate a wrapped method named
__contains__() and preclude the loading of a corresponding
PyCFunction with the same name. With the flag defined, the PyCFunction
will be loaded in place of the wrapper object and will co-exist with the
slot. This is helpful because calls to PyCFunctions are optimized more
than wrapper object calls.
the offset in bytes that the member is located on the type’s object struct
flags    int       flag bits indicating if the field should be read-only or writable
doc      char *    points to the contents of the docstring
type can be one of many T_ macros corresponding to various C
types. When the member is accessed in Python, it will be converted to the
equivalent Python type.
Macro name    C type
T_SHORT       short
T_INT         int
T_LONG        long
T_FLOAT       float
T_DOUBLE      double
T_STRING      char *
T_OBJECT      PyObject *
T_OBJECT_EX   PyObject *
T_CHAR        char
T_BYTE        char
T_UBYTE       unsigned char
T_UINT        unsigned int
T_USHORT      unsigned short
T_ULONG       unsigned long
T_BOOL        char
T_LONGLONG    long long
T_ULONGLONG   unsigned long long
T_PYSSIZET    Py_ssize_t
T_OBJECT and T_OBJECT_EX differ in that
T_OBJECT returns None if the member is NULL and
T_OBJECT_EX raises an AttributeError. Try to use
T_OBJECT_EX over T_OBJECT because T_OBJECT_EX
handles use of the del statement on that attribute more correctly
than T_OBJECT.
flags can be 0 for write and read access or READONLY for
read-only access. Using T_STRING for type implies
READONLY. Only T_OBJECT and T_OBJECT_EX
members can be deleted. (They are set to NULL).
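As a sketch (FooObject and its fields are hypothetical; structmember.h supplies the T_ macros and READONLY):
#include <stddef.h>        /* offsetof */
#include <structmember.h>

typedef struct {
    PyObject_HEAD
    int number;            /* exposed for reading and writing */
    PyObject *first;       /* read-only; T_OBJECT_EX raises AttributeError when NULL */
} FooObject;

static PyMemberDef Foo_members[] = {
    {"number", T_INT,       offsetof(FooObject, number), 0,        "a number"},
    {"first",  T_OBJECT_EX, offsetof(FooObject, first),  READONLY, "first name"},
    {NULL}                 /* sentinel */
};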
Perhaps one of the most important structures of the Python object system is the
structure that defines a new type: the PyTypeObject structure. Type
objects can be handled using any of the PyObject_*() or
PyType_*() functions, but do not offer much that’s interesting to most
Python applications. These objects are fundamental to how objects behave, so
they are very important to the interpreter itself and to any extension module
that implements new types.
Type objects are fairly large compared to most of the standard types. The reason
for the size is that each type object stores a large number of values, mostly C
function pointers, each of which implements a small part of the type’s
functionality. The fields of the type object are examined in detail in this
section. The fields will be described in the order in which they occur in the
structure.
The structure definition for PyTypeObject can be found in
Include/object.h. For convenience of reference, this repeats the
definition found there:
typedef struct _typeobject {
    PyObject_VAR_HEAD
    char *tp_name; /* For printing, in format "<module>.<name>" */
    int tp_basicsize, tp_itemsize; /* For allocation */

    /* Methods to implement standard operations */
    destructor tp_dealloc;
    printfunc tp_print;
    getattrfunc tp_getattr;
    setattrfunc tp_setattr;
    void *tp_reserved;
    reprfunc tp_repr;

    /* Method suites for standard classes */
    PyNumberMethods *tp_as_number;
    PySequenceMethods *tp_as_sequence;
    PyMappingMethods *tp_as_mapping;

    /* More standard operations (here for binary compatibility) */
    hashfunc tp_hash;
    ternaryfunc tp_call;
    reprfunc tp_str;
    getattrofunc tp_getattro;
    setattrofunc tp_setattro;

    /* Functions to access object as input/output buffer */
    PyBufferProcs *tp_as_buffer;

    /* Flags to define presence of optional/expanded features */
    long tp_flags;

    char *tp_doc; /* Documentation string */

    /* call function for all accessible objects */
    traverseproc tp_traverse;

    /* delete references to contained objects */
    inquiry tp_clear;

    /* rich comparisons */
    richcmpfunc tp_richcompare;

    /* weak reference enabler */
    long tp_weaklistoffset;

    /* Iterators */
    getiterfunc tp_iter;
    iternextfunc tp_iternext;

    /* Attribute descriptor and subclassing stuff */
    struct PyMethodDef *tp_methods;
    struct PyMemberDef *tp_members;
    struct PyGetSetDef *tp_getset;
    struct _typeobject *tp_base;
    PyObject *tp_dict;
    descrgetfunc tp_descr_get;
    descrsetfunc tp_descr_set;
    long tp_dictoffset;
    initproc tp_init;
    allocfunc tp_alloc;
    newfunc tp_new;
    freefunc tp_free; /* Low-level free-memory routine */
    inquiry tp_is_gc; /* For PyObject_IS_GC */
    PyObject *tp_bases;
    PyObject *tp_mro; /* method resolution order */
    PyObject *tp_cache;
    PyObject *tp_subclasses;
    PyObject *tp_weaklist;
} PyTypeObject;
The type object structure extends the PyVarObject structure. The
ob_size field is used for dynamic types (created by type_new(),
usually called from a class statement). Note that PyType_Type (the
metatype) initializes tp_itemsize, which means that its instances (i.e.
type objects) must have the ob_size field.
These fields are only present when the macro Py_TRACE_REFS is defined.
Their initialization to NULL is taken care of by the PyObject_HEAD_INIT
macro. For statically allocated objects, these fields always remain NULL.
For dynamically allocated objects, these two fields are used to link the object
into a doubly-linked list of all live objects on the heap. This could be used
for various debugging purposes; currently the only use is to print the objects
that are still alive at the end of a run when the environment variable
PYTHONDUMPREFS is set.
This is the type object’s reference count, initialized to 1 by the
PyObject_HEAD_INIT macro. Note that for statically allocated type objects,
the type’s instances (objects whose ob_type points back to the type) do
not count as references. But for dynamically allocated type objects, the
instances do count as references.
This is the type’s type, in other words its metatype. It is initialized by the
argument to the PyObject_HEAD_INIT macro, and its value should normally be
&PyType_Type. However, for dynamically loadable extension modules that must
be usable on Windows (at least), the compiler complains that this is not a valid
initializer. Therefore, the convention is to pass NULL to the
PyObject_HEAD_INIT macro and to initialize this field explicitly at the
start of the module’s initialization function, before doing anything else. This
is typically done like this:
Foo_Type.ob_type = &PyType_Type;
This should be done before any instances of the type are created.
PyType_Ready() checks if ob_type is NULL, and if so,
initializes it to the ob_type field of the base class.
PyType_Ready() will not change this field if it is non-zero.
For statically allocated type objects, this should be initialized to zero. For
dynamically allocated type objects, this field has a special internal meaning.
Pointer to a NUL-terminated string containing the name of the type. For types
that are accessible as module globals, the string should be the full module
name, followed by a dot, followed by the type name; for built-in types, it
should be just the type name. If the module is a submodule of a package, the
full package name is part of the full module name. For example, a type named
T defined in module M in subpackage Q in package P
should have the tp_name initializer "P.Q.M.T".
For dynamically allocated type objects, this should just be the type name, and
the module name explicitly stored in the type dict as the value for key
'__module__'.
For statically allocated type objects, the tp_name field should contain a dot.
Everything before the last dot is made accessible as the __module__
attribute, and everything after the last dot is made accessible as the
__name__ attribute.
If no dot is present, the entire tp_name field is made accessible as the
__name__ attribute, and the __module__ attribute is undefined
(unless explicitly set in the dictionary, as explained above). This means your
type will be impossible to pickle.
These fields allow calculating the size in bytes of instances of the type.
There are two kinds of types: types with fixed-length instances have a zero
tp_itemsize field, types with variable-length instances have a non-zero
tp_itemsize field. For a type with fixed-length instances, all
instances have the same size, given in tp_basicsize.
For a type with variable-length instances, the instances must have an
ob_size field, and the instance size is tp_basicsize plus N
times tp_itemsize, where N is the “length” of the object. The value of
N is typically stored in the instance’s ob_size field. There are
exceptions: for example, ints use a negative ob_size to indicate a
negative number, and N is abs(ob_size) there. Also, the presence of an
ob_size field in the instance layout doesn’t mean that the instance
structure is variable-length (for example, the structure for the list type has
fixed-length instances, yet those instances have a meaningful ob_size
field).
The basic size includes the fields in the instance declared by the macro
PyObject_HEAD or PyObject_VAR_HEAD (whichever is used to
declare the instance struct) and this in turn includes the _ob_prev and
_ob_next fields if they are present. This means that the only correct
way to get an initializer for the tp_basicsize is to use the
sizeof operator on the struct used to declare the instance layout.
The basic size does not include the GC header size.
These fields are inherited separately by subtypes. If the base type has a
non-zero tp_itemsize, it is generally not safe to set
tp_itemsize to a different non-zero value in a subtype (though this
depends on the implementation of the base type).
A note about alignment: if the variable items require a particular alignment,
this should be taken care of by the value of tp_basicsize. Example:
suppose a type implements an array of double. tp_itemsize is
sizeof(double). It is the programmer’s responsibility that
tp_basicsize is a multiple of sizeof(double) (assuming this is the
alignment requirement for double).
A pointer to the instance destructor function. This function must be defined
unless the type guarantees that its instances will never be deallocated (as is
the case for the singletons None and Ellipsis).
The destructor function is called by the Py_DECREF() and
Py_XDECREF() macros when the new reference count is zero. At this point,
the instance is still in existence, but there are no references to it. The
destructor function should free all references which the instance owns, free all
memory buffers owned by the instance (using the freeing function corresponding
to the allocation function used to allocate the buffer), and finally (as its
last action) call the type’s tp_free function. If the type is not
subtypable (doesn’t have the Py_TPFLAGS_BASETYPE flag bit set), it is
permissible to call the object deallocator directly instead of via
tp_free. The object deallocator should be the one used to allocate the
instance; this is normally PyObject_Del() if the instance was allocated
using PyObject_New() or PyObject_NewVar(), or
PyObject_GC_Del() if the instance was allocated using
PyObject_GC_New() or PyObject_GC_NewVar().
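A sketch of a typical destructor for the hypothetical FooObject above (the one owning a reference in its first field):
static void
Foo_dealloc(FooObject *self)
{
    Py_XDECREF(self->first);                   /* free references the instance owns */
    Py_TYPE(self)->tp_free((PyObject *)self);  /* finally, free via the type's tp_free */
}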
An optional pointer to the instance print function.
The print function is only called when the instance is printed to a real file;
when it is printed to a pseudo-file (like a StringIO instance), the
instance’s tp_repr or tp_str function is called to convert it to
a string. These are also called when the type’s tp_print field is
NULL. A type should never implement tp_print in a way that produces
different output than tp_repr or tp_str would.
The print function is called with the same signature as PyObject_Print():
int tp_print(PyObject *self, FILE *file, int flags). The self argument is
the instance to be printed. The file argument is the stdio file to which it
is to be printed. The flags argument is composed of flag bits. The only flag
bit currently defined is Py_PRINT_RAW. When the Py_PRINT_RAW
flag bit is set, the instance should be printed the same way as tp_str
would format it; when the Py_PRINT_RAW flag bit is clear, the instance
should be printed the same way as tp_repr would format it. It should
return -1 and set an exception condition when an error occurs during
printing.
It is possible that the tp_print field will be deprecated. In any case,
it is recommended not to define tp_print, but instead to rely on
tp_repr and tp_str for printing.
An optional pointer to the get-attribute-string function.
This field is deprecated. When it is defined, it should point to a function
that acts the same as the tp_getattro function, but taking a C string
instead of a Python string object to give the attribute name. The signature is
the same as for PyObject_GetAttrString().
This field is inherited by subtypes together with tp_getattro: a subtype
inherits both tp_getattr and tp_getattro from its base type when
the subtype’s tp_getattr and tp_getattro are both NULL.
An optional pointer to the set-attribute-string function.
This field is deprecated. When it is defined, it should point to a function
that acts the same as the tp_setattro function, but taking a C string
instead of a Python string object to give the attribute name. The signature is
the same as for PyObject_SetAttrString().
This field is inherited by subtypes together with tp_setattro: a subtype
inherits both tp_setattr and tp_setattro from its base type when
the subtype’s tp_setattr and tp_setattro are both NULL.
An optional pointer to a function that implements the built-in function
repr().
The signature is the same as for PyObject_Repr(); it must return a string
or a Unicode object. Ideally, this function should return a string that, when
passed to eval(), given a suitable environment, returns an object with the
same value. If this is not feasible, it should return a string starting with
'<' and ending with '>' from which both the type and the value of the
object can be deduced.
When this field is not set, a string of the form <%s object at %p> is
returned, where %s is replaced by the type name, and %p by the object’s
memory address.
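For example, a tp_repr handler that falls back to the angle-bracket form
might look like this (a sketch reusing the hypothetical MyObject layout
shown earlier):

static PyObject *
mytype_repr(MyObject *self)
{
    /* An eval()-able representation is not feasible here, so return
       the conventional '<... at ...>' form instead. */
    return PyUnicode_FromFormat("<MyObject object at %p>", (void *)self);
}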
Pointer to an additional structure that contains fields relevant only to
objects which implement the number protocol. These fields are documented in
Number Object Structures.
The tp_as_number field is not inherited, but the contained fields are
inherited individually.
Pointer to an additional structure that contains fields relevant only to
objects which implement the sequence protocol. These fields are documented
in Sequence Object Structures.
The tp_as_sequence field is not inherited, but the contained fields
are inherited individually.
Pointer to an additional structure that contains fields relevant only to
objects which implement the mapping protocol. These fields are documented in
Mapping Object Structures.
The tp_as_mapping field is not inherited, but the contained fields
are inherited individually.
An optional pointer to a function that implements the built-in function
hash().
The signature is the same as for PyObject_Hash(); it must return a
value of the type Py_hash_t. The value -1 should not be returned as a
normal return value; when an error occurs during the computation of the hash
value, the function should set an exception and return -1.
This field can be set explicitly to PyObject_HashNotImplemented() to
block inheritance of the hash method from a parent type. This is interpreted
as the equivalent of __hash__ = None at the Python level, causing
isinstance(o, collections.Hashable) to correctly return False. Note
that the converse is also true: setting __hash__ = None on a class at
the Python level will result in the tp_hash slot being set to
PyObject_HashNotImplemented().
When this field is not set, an attempt to take the hash of the
object raises TypeError.
This field is inherited by subtypes together with
tp_richcompare: a subtype inherits both of
tp_richcompare and tp_hash, when the subtype’s
tp_richcompare and tp_hash are both NULL.
An optional pointer to a function that implements calling the object. This
should be NULL if the object is not callable. The signature is the same as
for PyObject_Call().
An optional pointer to a function that implements the built-in operation
str(). (Note that str is a type now, and str() calls the
constructor for that type. This constructor calls PyObject_Str() to do
the actual work, and PyObject_Str() will call this handler.)
The signature is the same as for PyObject_Str(); it must return a string
or a Unicode object. This function should return a “friendly” string
representation of the object, as this is the representation that will be used,
among other things, by the print() function.
When this field is not set, PyObject_Repr() is called to return a string
representation.
An optional pointer to the get-attribute function.
The signature is the same as for PyObject_GetAttr(). It is usually
convenient to set this field to PyObject_GenericGetAttr(), which
implements the normal way of looking for object attributes.
This field is inherited by subtypes together with tp_getattr: a subtype
inherits both tp_getattr and tp_getattro from its base type when
the subtype’s tp_getattr and tp_getattro are both NULL.
An optional pointer to the set-attribute function.
The signature is the same as for PyObject_SetAttr(). It is usually
convenient to set this field to PyObject_GenericSetAttr(), which
implements the normal way of setting object attributes.
This field is inherited by subtypes together with tp_setattr: a subtype
inherits both tp_setattr and tp_setattro from its base type when
the subtype’s tp_setattr and tp_setattro are both NULL.
Pointer to an additional structure that contains fields relevant only to objects
which implement the buffer interface. These fields are documented in
Buffer Object Structures.
The tp_as_buffer field is not inherited, but the contained fields are
inherited individually.
This field is a bit mask of various flags. Some flags indicate variant
semantics for certain situations; others are used to indicate that certain
fields in the type object (or in the extension structures referenced via
tp_as_number, tp_as_sequence, tp_as_mapping, and
tp_as_buffer) that were historically not always present are valid; if
such a flag bit is clear, the type fields it guards must not be accessed and
must be considered to have a zero or NULL value instead.
Inheritance of this field is complicated. Most flag bits are inherited
individually, i.e. if the base type has a flag bit set, the subtype inherits
this flag bit. The flag bits that pertain to extension structures are strictly
inherited if the extension structure is inherited, i.e. the base type’s value of
the flag bit is copied into the subtype together with a pointer to the extension
structure. The Py_TPFLAGS_HAVE_GC flag bit is inherited together with
the tp_traverse and tp_clear fields: the subtype inherits all three from
its base type if, in the subtype, the Py_TPFLAGS_HAVE_GC flag bit is
clear and the tp_traverse and tp_clear fields are NULL.
The following bit masks are currently defined; these can be ORed together using
the | operator to form the value of the tp_flags field. The macro
PyType_HasFeature() takes a type and a flags value, tp and f, and
checks whether tp->tp_flags & f is non-zero.
This bit is set when the type object itself is allocated on the heap. In this
case, the ob_type field of its instances is considered a reference to
the type, and the type object is INCREF’ed when a new instance is created, and
DECREF’ed when an instance is destroyed (this does not apply to instances of
subtypes; only the type referenced by the instance’s ob_type gets INCREF’ed or
DECREF’ed).
This bit is set when the type can be used as the base type of another type. If
this bit is clear, the type cannot be subtyped (similar to a “final” class in
Java).
This bit is set when the object supports garbage collection. If this bit
is set, instances must be created using PyObject_GC_New() and
destroyed using PyObject_GC_Del(). More information in section
Supporting Cyclic Garbage Collection. This bit also implies that the
GC-related fields tp_traverse and tp_clear are present in
the type object.
This is a bitmask of all the bits that pertain to the existence of certain
fields in the type object and its extension structures. Currently, it includes
the following bits: Py_TPFLAGS_HAVE_STACKLESS_EXTENSION,
Py_TPFLAGS_HAVE_VERSION_TAG.
An optional pointer to a NUL-terminated C string giving the docstring for this
type object. This is exposed as the __doc__ attribute on the type and
instances of the type.
An optional pointer to a traversal function for the garbage collector. This is
only used if the Py_TPFLAGS_HAVE_GC flag bit is set. More information
about Python’s garbage collection scheme can be found in section
Supporting Cyclic Garbage Collection.
The tp_traverse pointer is used by the garbage collector to detect
reference cycles. A typical implementation of a tp_traverse function
simply calls Py_VISIT() on each of the instance’s members that are Python
objects. For example, this is function local_traverse() from the
_thread extension module:
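static int
local_traverse(localobject *self, visitproc visit, void *arg)
{
    Py_VISIT(self->args);
    Py_VISIT(self->kw);
    Py_VISIT(self->dict);
    return 0;
}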
Note that Py_VISIT() is called only on those members that can participate
in reference cycles. Although there is also a self->key member, it can only
be NULL or a Python string and therefore cannot be part of a reference cycle.
On the other hand, even if you know a member can never be part of a cycle, as a
debugging aid you may want to visit it anyway just so the gc module’s
get_referents() function will include it.
Note that Py_VISIT() requires the visit and arg parameters to
local_traverse() to have these specific names; don’t name them just
anything.
This field is inherited by subtypes together with tp_clear and the
Py_TPFLAGS_HAVE_GC flag bit: the flag bit, tp_traverse, and
tp_clear are all inherited from the base type if they are all zero in
the subtype.
An optional pointer to a clear function for the garbage collector. This is only
used if the Py_TPFLAGS_HAVE_GC flag bit is set.
The tp_clear member function is used to break reference cycles in cyclic
garbage detected by the garbage collector. Taken together, all tp_clear
functions in the system must combine to break all reference cycles. This is
subtle, and if in any doubt supply a tp_clear function. For example,
the tuple type does not implement a tp_clear function, because it’s
possible to prove that no reference cycle can be composed entirely of tuples.
Therefore the tp_clear functions of other types must be sufficient to
break any cycle containing a tuple. This isn’t immediately obvious, and there’s
rarely a good reason to avoid implementing tp_clear.
Implementations of tp_clear should drop the instance’s references to
those of its members that may be Python objects, and set its pointers to those
members to NULL, as in the following example:
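static int
local_clear(localobject *self)
{
    Py_CLEAR(self->key);
    Py_CLEAR(self->args);
    Py_CLEAR(self->kw);
    Py_CLEAR(self->dict);
    return 0;
}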
The Py_CLEAR() macro should be used, because clearing references is
delicate: the reference to the contained object must not be decremented until
after the pointer to the contained object is set to NULL. This is because
decrementing the reference count may cause the contained object to become trash,
triggering a chain of reclamation activity that may include invoking arbitrary
Python code (due to finalizers, or weakref callbacks, associated with the
contained object). If it’s possible for such code to reference self again,
it’s important that the pointer to the contained object be NULL at that time,
so that self knows the contained object can no longer be used. The
Py_CLEAR() macro performs the operations in a safe order.
Because the goal of tp_clear functions is to break reference cycles,
it’s not necessary to clear contained objects like Python strings or Python
integers, which can’t participate in reference cycles. On the other hand, it may
be convenient to clear all contained Python objects, and write the type’s
tp_dealloc function to invoke tp_clear.
This field is inherited by subtypes together with tp_traverse and the
Py_TPFLAGS_HAVE_GC flag bit: the flag bit, tp_traverse, and
tp_clear are all inherited from the base type if they are all zero in
the subtype.
An optional pointer to the rich comparison function, whose signature is
PyObject *tp_richcompare(PyObject *a, PyObject *b, int op).
The function should return the result of the comparison (usually Py_True
or Py_False). If the comparison is undefined, it must return
Py_NotImplemented; if another error occurred, it must return NULL and
set an exception condition.
Note
If you want to implement a type for which only a limited set of
comparisons makes sense (e.g. == and !=, but not < and
friends), directly raise TypeError in the rich comparison function.
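A sketch of such a handler, assuming a hypothetical NumObject struct with
a long value field and its NumType type object (all names invented):

typedef struct {
    PyObject_HEAD
    long value;
} NumObject;

extern PyTypeObject NumType;   /* defined elsewhere in the module */

static PyObject *
num_richcompare(PyObject *a, PyObject *b, int op)
{
    int eq;
    if (!PyObject_TypeCheck(a, &NumType) || !PyObject_TypeCheck(b, &NumType)) {
        Py_INCREF(Py_NotImplemented);   /* let the other type have a try */
        return Py_NotImplemented;
    }
    if (op != Py_EQ && op != Py_NE) {
        /* Only == and != make sense for this type. */
        PyErr_SetString(PyExc_TypeError, "ordering is not defined for Num");
        return NULL;
    }
    eq = ((NumObject *)a)->value == ((NumObject *)b)->value;
    if (op == Py_NE)
        eq = !eq;
    if (eq)
        Py_RETURN_TRUE;
    Py_RETURN_FALSE;
}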
This field is inherited by subtypes together with tp_hash:
a subtype inherits tp_richcompare and tp_hash when
the subtype’s tp_richcompare and tp_hash are both
NULL.
The following constants are defined to be used as the third argument for
tp_richcompare and for PyObject_RichCompare():
Py_LT (<), Py_LE (<=), Py_EQ (==), Py_NE (!=), Py_GT (>), Py_GE (>=)
If the instances of this type are weakly referenceable, this field is greater
than zero and contains the offset in the instance structure of the weak
reference list head (ignoring the GC header, if present); this offset is used by
PyObject_ClearWeakRefs() and the PyWeakref_*() functions. The
instance structure needs to include a field of type PyObject* which is
initialized to NULL.
Do not confuse this field with tp_weaklist; that is the list head for
weak references to the type object itself.
This field is inherited by subtypes, but see the rules listed below. A subtype
may override this offset; this means that the subtype uses a different weak
reference list head than the base type. Since the list head is always found via
tp_weaklistoffset, this should not be a problem.
When a type defined by a class statement has no __slots__ declaration,
and none of its base types are weakly referenceable, the type is made weakly
referenceable by adding a weak reference list head slot to the instance layout
and setting tp_weaklistoffset to that slot’s offset.
When a type’s __slots__ declaration contains a slot named
__weakref__, that slot becomes the weak reference list head for
instances of the type, and the slot’s offset is stored in the type’s
tp_weaklistoffset.
When a type’s __slots__ declaration does not contain a slot named
__weakref__, the type inherits its tp_weaklistoffset from its
base type.
An optional pointer to a function that returns an iterator for the object. Its
presence normally signals that the instances of this type are iterable (although
sequences may be iterable without this function).
An optional pointer to a function that returns the next item in an iterator.
When the iterator is exhausted, it must return NULL; a StopIteration
exception may or may not be set. When another error occurs, it must return
NULL too. Its presence signals that the instances of this type are
iterators.
Iterator types should also define the tp_iter function, and that
function should return the iterator instance itself (not a new iterator
instance).
This function has the same signature as PyIter_Next().
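As a sketch (all names invented), an iterator counting up to a stored limit
could implement these two slots as follows:

typedef struct {
    PyObject_HEAD
    Py_ssize_t index;   /* next value to produce */
    Py_ssize_t len;     /* one past the last value */
} CountIterObject;

static PyObject *
countiter_next(CountIterObject *self)          /* tp_iternext */
{
    if (self->index >= self->len)
        return NULL;   /* exhausted; setting StopIteration is optional */
    return PyLong_FromSsize_t(self->index++);
}

static PyObject *
countiter_iter(PyObject *self)                 /* tp_iter */
{
    Py_INCREF(self);   /* an iterator's tp_iter returns itself */
    return self;
}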
An optional pointer to a static NULL-terminated array of PyMemberDef
structures, declaring regular data members (fields or slots) of instances of
this type.
For each entry in the array, an entry is added to the type’s dictionary (see
tp_dict below) containing a member descriptor.
This field is not inherited by subtypes (members are inherited through a
different mechanism).
An optional pointer to a static NULL-terminated array of PyGetSetDef
structures, declaring computed attributes of instances of this type.
For each entry in the array, an entry is added to the type’s dictionary (see
tp_dict below) containing a getset descriptor.
This field is not inherited by subtypes (computed attributes are inherited
through a different mechanism).
Docs for PyGetSetDef:

typedef PyObject *(*getter)(PyObject *, void *);
typedef int (*setter)(PyObject *, PyObject *, void *);

typedef struct PyGetSetDef {
    char *name;      /* attribute name */
    getter get;      /* C function to get the attribute */
    setter set;      /* C function to set the attribute */
    char *doc;       /* optional doc string */
    void *closure;   /* optional additional data for getter and setter */
} PyGetSetDef;
An optional pointer to a base type from which type properties are inherited. At
this level, only single inheritance is supported; multiple inheritance requires
dynamically creating a type object by calling the metatype.
This field is not inherited by subtypes (obviously), but it defaults to
&PyBaseObject_Type (which to Python programmers is known as the type
object).
This field should normally be initialized to NULL before PyType_Ready() is
called; it may also be initialized to a dictionary containing initial attributes
for the type. Once PyType_Ready() has initialized the type, extra
attributes for the type may be added to this dictionary only if they don’t
correspond to overloaded operations (like __add__()).
This field is not inherited by subtypes (though the attributes defined in here
are inherited through a different mechanism).
Warning
It is not safe to use PyDict_SetItem() on or otherwise modify
tp_dict with the dictionary C-API.
If the instances of this type have a dictionary containing instance variables,
this field is non-zero and contains the offset in the instances of the type of
the instance variable dictionary; this offset is used by
PyObject_GenericGetAttr().
Do not confuse this field with tp_dict; that is the dictionary for
attributes of the type object itself.
If the value of this field is greater than zero, it specifies the offset from
the start of the instance structure. If the value is less than zero, it
specifies the offset from the end of the instance structure. A negative
offset is more expensive to use, and should only be used when the instance
structure contains a variable-length part. This is used for example to add an
instance variable dictionary to subtypes of str or tuple. Note
that the tp_basicsize field should account for the dictionary added to
the end in that case, even though the dictionary is not included in the basic
object layout. On a system with a pointer size of 4 bytes,
tp_dictoffset should be set to -4 to indicate that the dictionary is
at the very end of the structure.
The real dictionary offset in an instance can be computed from a negative
tp_dictoffset as follows:
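dictoffset = tp_basicsize + abs(ob_size)*tp_itemsize + tp_dictoffset
if dictoffset is not aligned on sizeof(void*):
    round up to sizeof(void*)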
where tp_basicsize, tp_itemsize and tp_dictoffset are
taken from the type object, and ob_size is taken from the instance. The
absolute value is taken because ints use the sign of ob_size to
store the sign of the number. (There’s never a need to do this calculation
yourself; it is done for you by _PyObject_GetDictPtr().)
This field is inherited by subtypes, but see the rules listed below. A subtype
may override this offset; this means that the subtype instances store the
dictionary at a different offset than the base type. Since the dictionary is
always found via tp_dictoffset, this should not be a problem.
When a type defined by a class statement has no __slots__ declaration,
and none of its base types has an instance variable dictionary, a dictionary
slot is added to the instance layout and the tp_dictoffset is set to
that slot’s offset.
When a type defined by a class statement has a __slots__ declaration,
the type inherits its tp_dictoffset from its base type.
(Adding a slot named __dict__ to the __slots__ declaration does
not have the expected effect, it just causes confusion. Maybe this should be
added as a feature just like __weakref__ though.)
An optional pointer to an instance initialization function.
This function corresponds to the __init__() method of classes. Like
__init__(), it is possible to create an instance without calling
__init__(), and it is possible to reinitialize an instance by calling its
__init__() method again.
The self argument is the instance to be initialized; the args and kwds
arguments represent positional and keyword arguments of the call to
__init__().
The tp_init function, if not NULL, is called when an instance is
created normally by calling its type, after the type’s tp_new function
has returned an instance of the type. If the tp_new function returns an
instance of some other type that is not a subtype of the original type, no
tp_init function is called; if tp_new returns an instance of a
subtype of the original type, the subtype’s tp_init is called.
An optional pointer to an instance allocation function.
The purpose of this function is to separate memory allocation from memory
initialization. It should return a pointer to a block of memory of adequate
length for the instance, suitably aligned, and initialized to zeros, but with
ob_refcnt set to 1 and ob_type set to the type argument. If
the type’s tp_itemsize is non-zero, the object’s ob_size field
should be initialized to nitems and the length of the allocated memory block
should be tp_basicsize + nitems*tp_itemsize, rounded up to a multiple of
sizeof(void*); otherwise, nitems is not used and the length of the block
should be tp_basicsize.
Do not use this function to do any other instance initialization, not even to
allocate additional memory; that should be done by tp_new.
This field is inherited by static subtypes, but not by dynamic subtypes
(subtypes created by a class statement); in the latter, this field is always set
to PyType_GenericAlloc(), to force a standard heap allocation strategy.
That is also the recommended value for statically defined types.
An optional pointer to an instance creation function.
If this function is NULL for a particular type, that type cannot be called to
create new instances; presumably there is some other way to create instances,
like a factory function.
The subtype argument is the type of the object being created; the args and
kwds arguments represent positional and keyword arguments of the call to the
type. Note that subtype doesn’t have to equal the type whose tp_new
function is called; it may be a subtype of that type (but not an unrelated
type).
The tp_new function should call subtype->tp_alloc(subtype, nitems)
to allocate space for the object, and then do only as much further
initialization as is absolutely necessary. Initialization that can safely be
ignored or repeated should be placed in the tp_init handler. A good
rule of thumb is that for immutable types, all initialization should take place
in tp_new, while for mutable types, most initialization should be
deferred to tp_init.
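For instance, a minimal tp_new sketch for the hypothetical MyObject layout
shown earlier could be:

static PyObject *
mytype_new(PyTypeObject *subtype, PyObject *args, PyObject *kwds)
{
    MyObject *self = (MyObject *)subtype->tp_alloc(subtype, 0);
    if (self != NULL)
        self->attr = NULL;   /* defer real initialization to tp_init */
    return (PyObject *)self;
}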
This field is inherited by subtypes, except it is not inherited by static types
whose tp_base is NULL or &PyBaseObject_Type.
An optional pointer to an instance deallocation function. Its signature is
freefunc:
void tp_free(void *)
An initializer that is compatible with this signature is PyObject_Free().
This field is inherited by static subtypes, but not by dynamic subtypes
(subtypes created by a class statement); in the latter, this field is set to a
deallocator suitable to match PyType_GenericAlloc() and the value of the
Py_TPFLAGS_HAVE_GC flag bit.
An optional pointer to a function called by the garbage collector.
The garbage collector needs to know whether a particular object is collectible
or not. Normally, it is sufficient to look at the object’s type’s
tp_flags field, and check the Py_TPFLAGS_HAVE_GC flag bit. But
some types have a mixture of statically and dynamically allocated instances, and
the statically allocated instances are not collectible. Such types should
define this function; it should return 1 for a collectible instance, and
0 for a non-collectible instance. The signature is
int tp_is_gc(PyObject *self)
(The only example of this are types themselves. The metatype,
PyType_Type, defines this function to distinguish between statically
and dynamically allocated types.)
Weak reference list head, for weak references to this type object. Not
inherited. Internal use only.
The remaining fields are only defined if the feature test macro
COUNT_ALLOCS is defined, and are for internal use only. They are
documented here for completeness. None of these fields are inherited by
subtypes.
Pointer to the next type object with a non-zero tp_allocs field.
Also, note that, in a garbage collected Python, tp_dealloc may be called from
any Python thread, not just the thread which created the object (if the object
becomes part of a refcount cycle, that cycle might be collected by a garbage
collection on any thread). This is not a problem for Python API calls, since
the thread on which tp_dealloc is called will own the Global Interpreter Lock
(GIL). However, if the object being destroyed in turn destroys objects from some
other C or C++ library, care should be taken to ensure that destroying those
objects on the thread which called tp_dealloc will not violate any assumptions
of the library.
This structure holds pointers to the functions which an object uses to
implement the number protocol. Each function is used by the function of
similar name documented in the Number Protocol section.
Binary and ternary functions must check the type of all their operands,
and implement the necessary conversions (at least one of the operands is
an instance of the defined type). If the operation is not defined for the
given operands, binary and ternary functions must return
Py_NotImplemented; if another error occurred, they must return NULL
and set an exception.
Note
The nb_reserved field should always be NULL. It
was previously called nb_long, and was renamed in
Python 3.0.1.
This function is used by PyMapping_Length() and
PyObject_Size(), and has the same signature. This slot may be set to
NULL if the object has no defined length.
This function is used by PyObject_GetItem() and has the same
signature. This slot must be filled for the PyMapping_Check()
function to return 1, it can be NULL otherwise.
This function is used by PySequence_Concat() and has the same
signature. It is also used by the + operator, after trying the numeric
addition via the tp_as_number.nb_add slot.
This function is used by PySequence_Repeat() and has the same
signature. It is also used by the * operator, after trying numeric
multiplication via the tp_as_number.nb_multiply slot.
This function is used by PySequence_GetItem() and has the same
signature. This slot must be filled for the PySequence_Check()
function to return 1, it can be NULL otherwise.
Negative indexes are handled as follows: if the sq_length slot is
filled, it is called and the sequence length is used to compute a positive
index which is passed to sq_item. If sq_length is NULL,
the index is passed as is to the function.
This function is used by PySequence_SetItem() and has the same
signature. This slot may be left to NULL if the object does not support
item assignment.
This function may be used by PySequence_Contains() and has the same
signature. This slot may be left to NULL, in this case
PySequence_Contains() simply traverses the sequence until it finds a
match.
The buffer interface exports a model where an object can expose its internal
data.
If an object does not export the buffer interface, then its tp_as_buffer
member in the PyTypeObject structure should be NULL. Otherwise, the
tp_as_buffer will point to a PyBufferProcs structure.
This should fill a Py_buffer with the necessary data for
exporting the type. The signature of getbufferproc is
int (PyObject *obj, Py_buffer *view, int flags). obj is the object to
export, view is the Py_buffer struct to fill, and flags gives
the conditions the caller wants the memory under. (See
PyObject_GetBuffer() for all flags.) bf_getbuffer is
responsible for filling view with the appropriate information.
(PyBuffer_FillInfo() can be used in simple cases.) See the
Py_buffer docs for what needs to be filled in.
This should release the resources of the buffer. The signature of
releasebufferproc is void (PyObject *obj, Py_buffer *view).
If the bf_releasebuffer function is not provided (i.e. it is
NULL), then it does not ever need to be called.
The exporter of the buffer interface must make sure that any memory
pointed to in the Py_buffer structure remains valid until
releasebuffer is called. Exporters will need to define a
bf_releasebuffer function if they can re-allocate their memory,
strides, shape, suboffsets, or format variables which they might share
through the struct bufferinfo.
Python’s support for detecting and collecting garbage which involves circular
references requires support from object types which are “containers” for other
objects which may also be containers. Types which do not store references to
other objects, or which only store references to atomic types (such as numbers
or strings), do not need to provide any explicit support for garbage
collection.
To create a container type, the tp_flags field of the type object must
include the Py_TPFLAGS_HAVE_GC flag bit and provide an implementation of the
tp_traverse handler. If instances of the type are mutable, a
tp_clear implementation must also be provided.
Py_TPFLAGS_HAVE_GC
Objects with a type with this flag set must conform with the rules
documented here. For convenience these objects will be referred to as
container objects.
Constructors for container types must conform to two rules:

1. The memory for the object must be allocated using PyObject_GC_New()
   or PyObject_GC_NewVar().

2. Once all the fields which may contain references to other containers
   are initialized, it must call PyObject_GC_Track().
Adds the object op to the set of container objects tracked by the
collector. The collector can run at unexpected times so objects must be
valid while being tracked. This should be called once all the fields
traversed by the tp_traverse handler become valid, usually near the
end of the constructor.
Remove the object op from the set of container objects tracked by the
collector. Note that PyObject_GC_Track() can be called again on
this object to add it back to the set of tracked objects. The deallocator
(tp_dealloc handler) should call this for the object before any of
the fields used by the tp_traverse handler become invalid.
Type of the visitor function passed to the tp_traverse handler; its
signature is int (*visitproc)(PyObject *object, void *arg).
The function should be called with an object to traverse as object and
the third parameter to the tp_traverse handler as arg. The
Python core uses several visitor functions to implement cyclic garbage
detection; it’s not expected that users will need to write their own
visitor functions.
The tp_traverse handler must have the following type:

int (*traverseproc)(PyObject *self, visitproc visit, void *arg)
Traversal function for a container object. Implementations must call the
visit function for each object directly contained by self, with the
parameters to visit being the contained object and the arg value passed
to the handler. The visit function must not be called with a NULL
object argument. If visit returns a non-zero value that value should be
returned immediately.
To simplify writing tp_traverse handlers, a Py_VISIT() macro is
provided. In order to use this macro, the tp_traverse implementation
must name its arguments exactly visit and arg:
Call the visit callback, with arguments o and arg. If visit returns
a non-zero value, then return it. Using this macro, tp_traverse
handlers look like:
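static int
my_traverse(MyObject *self, visitproc visit, void *arg)
{
    Py_VISIT(self->foo);   /* members that may be Python objects */
    Py_VISIT(self->bar);
    return 0;
}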
Drop references that may have created reference cycles. Immutable objects
do not have to define this method since they can never directly create
reference cycles. Note that the object must still be valid after calling
this method (don’t just call Py_DECREF() on a reference). The
collector will call this method if it detects that this object is involved
in a reference cycle.
This document describes the Python Distribution Utilities (“Distutils”) from
the module developer’s point of view, describing how to use the Distutils to
make Python modules and extensions easily available to a wider audience with
very little overhead for build/release/install mechanics.
This document covers using the Distutils to distribute your Python modules,
concentrating on the role of developer/distributor: if you’re looking for
information on installing Python modules, you should refer to the
Installing Python Modules chapter.
Using the Distutils is quite simple, both for module developers and for
users/administrators installing third-party modules. As a developer, your
responsibilities (apart from writing solid, well-documented and well-tested
code, of course!) are:
write a setup script (setup.py by convention)
(optional) write a setup configuration file
create a source distribution
(optional) create one or more built (binary) distributions
Each of these tasks is covered in this document.
Not all module developers have access to a multitude of platforms, so it’s not
always feasible to expect them to create a multitude of built distributions. It
is hoped that a class of intermediaries, called packagers, will arise to
address this need. Packagers will take source distributions released by module
developers, build them on one or more platforms, and release the resulting built
distributions. Thus, users on the most popular platforms will be able to
install most popular Python module distributions in the most natural way for
their platform, without having to run a single setup script or compile a line of
code.
The setup script is usually quite simple, although since it’s written in Python,
there are no arbitrary limits to what you can do with it, though you should be
careful about putting arbitrarily expensive operations in your setup script.
Unlike, say, Autoconf-style configure scripts, the setup script may be run
multiple times in the course of building and installing your module
distribution.
If all you want to do is distribute a module called foo, contained in a
file foo.py, then your setup script can be as simple as this:
from distutils.core import setup
setup(name='foo',
version='1.0',
py_modules=['foo'],
)
Some observations:
most information that you supply to the Distutils is supplied as keyword
arguments to the setup() function
those keyword arguments fall into two categories: package metadata (name,
version number) and information about what’s in the package (a list of pure
Python modules, in this case)
modules are specified by module name, not filename (the same will hold true
for packages and extensions)
it’s recommended that you supply a little more metadata, in particular your
name, email address and a URL for the project (see section Writing the Setup Script
for an example)
To create a source distribution for this module, you would create a setup
script, setup.py, containing the above code, and run this command from a
terminal:
python setup.py sdist
For Windows, open a command prompt windows (“DOS box”) and change the command
to:
setup.py sdist
sdist will create an archive file (e.g., tarball on Unix, ZIP file on Windows)
containing your setup script setup.py, and your module foo.py.
The archive file will be named foo-1.0.tar.gz (or .zip), and
will unpack into a directory foo-1.0.
If an end-user wishes to install your foo module, all she has to do is
download foo-1.0.tar.gz (or .zip), unpack it, and—from the
foo-1.0 directory—run
python setup.py install
which will ultimately copy foo.py to the appropriate directory for
third-party modules in their Python installation.
This simple example demonstrates some fundamental concepts of the Distutils.
First, both developers and installers have the same basic user interface, i.e.
the setup script. The difference is which Distutils commands they use: the
sdist command is almost exclusively for module developers, while
install is more often for installers (although most developers will
want to install their own code occasionally).
If you want to make things really easy for your users, you can create one or
more built distributions for them. For instance, if you are running on a
Windows machine, and want to make things easy for other Windows users, you can
create an executable installer (the most appropriate type of built distribution
for this platform) with the bdist_wininst command. For example:
python setup.py bdist_wininst
will create an executable installer, foo-1.0.win32.exe, in the current
directory.
Other useful built distribution formats are RPM, implemented by the
bdist_rpm command, Solaris pkgtool
(bdist_pkgtool), and HP-UX swinstall
(bdist_sdux). For example, the following command will create an RPM
file called foo-1.0.noarch.rpm:
python setup.py bdist_rpm
(The bdist_rpm command uses the rpm executable, therefore
this has to be run on an RPM-based system such as Red Hat Linux, SuSE Linux, or
Mandrake Linux.)
You can find out what distribution formats are available at any time by running

python setup.py bdist --help-formats
If you’re reading this document, you probably have a good idea of what modules,
extensions, and so forth are. Nevertheless, just to be sure that everyone is
operating from a common starting point, we offer the following glossary of
common Python terms:
module
the basic unit of code reusability in Python: a block of code imported by some
other code. Three types of modules concern us here: pure Python modules,
extension modules, and packages.
pure Python module
a module written in Python and contained in a single .py file (and
possibly associated .pyc and/or .pyo files). Sometimes referred
to as a “pure module.”
extension module
a module written in the low-level language of the Python implementation: C/C++
for Python, Java for Jython. Typically contained in a single dynamically
loadable pre-compiled file, e.g. a shared object (.so) file for Python
extensions on Unix, a DLL (given the .pyd extension) for Python
extensions on Windows, or a Java class file for Jython extensions. (Note that
currently, the Distutils only handles C/C++ extensions for Python.)
package
a module that contains other modules; typically contained in a directory in the
filesystem and distinguished from other directories by the presence of a file
__init__.py.
root package
the root of the hierarchy of packages. (This isn’t really a package, since it
doesn’t have an __init__.py file. But we have to call it something.)
The vast majority of the standard library is in the root package, as are many
small, standalone third-party modules that don’t belong to a larger module
collection. Unlike regular packages, modules in the root package can be found in
many directories: in fact, every directory listed in sys.path contributes
modules to the root package.
The following terms apply more specifically to the domain of distributing Python
modules using the Distutils:
module distribution
a collection of Python modules distributed together as a single downloadable
resource and meant to be installed en masse. Examples of some well-known
module distributions are NumPy, SciPy, PIL (the Python Imaging
Library), or mxBase. (This would be called a package, except that term is
already taken in the Python context: a single module distribution may contain
zero, one, or many Python packages.)
pure module distribution
a module distribution that contains only pure Python modules and packages.
Sometimes referred to as a “pure distribution.”
non-pure module distribution
a module distribution that contains at least one extension module. Sometimes
referred to as a “non-pure distribution.”
distribution root
the top-level directory of your source tree (or source distribution); the
directory where setup.py exists. Generally setup.py will be
run from this directory.
The setup script is the centre of all activity in building, distributing, and
installing modules using the Distutils. The main purpose of the setup script is
to describe your module distribution to the Distutils, so that the various
commands that operate on your modules do the right thing. As we saw in section
A Simple Example above, the setup script consists mainly of a call to
setup(), and most information supplied to the Distutils by the module
developer is supplied as keyword arguments to setup().
Here’s a slightly more involved example, which we’ll follow for the next couple
of sections: the Distutils’ own setup script. (Keep in mind that although the
Distutils are included with Python 1.6 and later, they also have an independent
existence so that Python 1.5.2 users can use them to install other module
distributions. The Distutils’ own setup script, shown here, is used to install
the package into Python 1.5.2.)
#!/usr/bin/env python
from distutils.core import setup
setup(name='Distutils',
version='1.0',
description='Python Distribution Utilities',
author='Greg Ward',
author_email='gward@python.net',
url='http://www.python.org/sigs/distutils-sig/',
packages=['distutils', 'distutils.command'],
)
There are only two differences between this and the trivial one-file
distribution presented in section A Simple Example: more metadata, and the
specification of pure Python modules by package, rather than by module. This is
important since the Distutils consist of a couple of dozen modules split into
(so far) two packages; an explicit list of every module would be tedious to
generate and difficult to maintain. For more information on the additional
meta-data, see section Additional meta-data.
Note that any pathnames (files or directories) supplied in the setup script
should be written using the Unix convention, i.e. slash-separated. The
Distutils will take care of converting this platform-neutral representation into
whatever is appropriate on your current platform before actually using the
pathname. This makes your setup script portable across operating systems, which
of course is one of the major goals of the Distutils. In this spirit, all
pathnames in this document are slash-separated.
This, of course, only applies to pathnames given to Distutils functions. If
you, for example, use standard Python functions such as glob.glob() or
os.listdir() to specify files, you should be careful to write portable
code instead of hardcoding path separators:
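For example (a sketch; the directory names and pattern are invented for
illustration):

import glob
import os

# Build the pattern with os.path.join() instead of hardcoding
# a path separator such as '/' or '\\'.
files = glob.glob(os.path.join('mydir', 'subdir', '*.html'))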
The packages option tells the Distutils to process (build, distribute,
install, etc.) all pure Python modules found in each package mentioned in the
packages list. In order to do this, of course, there has to be a
correspondence between package names and directories in the filesystem. The
default correspondence is the most obvious one, i.e. package distutils is
found in the directory distutils relative to the distribution root.
Thus, when you say packages=['foo'] in your setup script, you are
promising that the Distutils will find a file foo/__init__.py (which
might be spelled differently on your system, but you get the idea) relative to
the directory where your setup script lives. If you break this promise, the
Distutils will issue a warning but still process the broken package anyway.
If you use a different convention to lay out your source directory, that’s no
problem: you just have to supply the package_dir option to tell the
Distutils about your convention. For example, say you keep all Python source
under lib, so that modules in the “root package” (i.e., not in any
package at all) are in lib, modules in the foo package are in
lib/foo, and so forth. Then you would put
package_dir = {'': 'lib'}
in your setup script. The keys to this dictionary are package names, and an
empty package name stands for the root package. The values are directory names
relative to your distribution root. In this case, when you say packages=['foo'], you are promising that the file lib/foo/__init__.py exists.
Another possible convention is to put the foo package right in
lib, the foo.bar package in lib/bar, etc. This would be
written in the setup script as
package_dir = {'foo': 'lib'}
A package:dir entry in the package_dir dictionary implicitly
applies to all packages below package, so the foo.bar case is
automatically handled here. In this example, having packages=['foo', 'foo.bar'] tells the Distutils to look for lib/__init__.py and
lib/bar/__init__.py. (Keep in mind that although package_dir
applies recursively, you must explicitly list all packages in
packages: the Distutils will not recursively scan your source tree
looking for any directory with an __init__.py file.)
For a small module distribution, you might prefer to list all modules rather
than listing packages—especially the case of a single module that goes in the
“root package” (i.e., no package at all). This simplest case was shown in
section A Simple Example; here is a slightly more involved example:
py_modules = ['mod1', 'pkg.mod2']
This describes two modules, one of them in the “root” package, the other in the
pkg package. Again, the default package/directory layout implies that
these two modules can be found in mod1.py and pkg/mod2.py, and
that pkg/__init__.py exists as well. And again, you can override the
package/directory correspondence using the package_dir option.
Just as writing Python extension modules is a bit more complicated than writing
pure Python modules, describing them to the Distutils is a bit more complicated.
Unlike pure modules, it’s not enough just to list modules or packages and expect
the Distutils to go out and find the right files; you have to specify the
extension name, source file(s), and any compile/link requirements (include
directories, libraries to link with, etc.).
All of this is done through another keyword argument to setup(), the
ext_modules option. ext_modules is just a list of
Extension instances, each of which describes a single extension module.
Suppose your distribution includes a single extension, called foo and
implemented by foo.c. If no additional instructions to the
compiler/linker are needed, describing this extension is quite simple:
Extension('foo', ['foo.c'])
The Extension class can be imported from distutils.core along
with setup(). Thus, the setup script for a module distribution that
contains only this one extension and nothing else might be:
from distutils.core import setup, Extension
setup(name='foo',
version='1.0',
ext_modules=[Extension('foo', ['foo.c'])],
)
The Extension class (actually, the underlying extension-building
machinery implemented by the build_ext command) supports a great deal
of flexibility in describing Python extensions, which is explained in the
following sections.
If the extension lives inside a package, pass a dotted name to the
Extension constructor: Extension('foo', ['foo.c']) describes an extension
living in the root package, while Extension('pkg.foo', ['foo.c'])
describes the same extension in the pkg package. The source files and
resulting object code are identical in both cases; the only difference is where
in the filesystem (and therefore where in Python’s namespace hierarchy) the
resulting extension lives.
If you have a number of extensions all in the same package (or all under the
same base package), use the ext_package keyword argument to
setup(). For example (a sketch; the distribution name and version below are invented):
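from distutils.core import setup, Extension

setup(name='foobar',          # distribution name is illustrative
      version='1.0',
      ext_package='pkg',
      ext_modules=[Extension('foo', ['foo.c']),
                   Extension('subpkg.bar', ['bar.c'])],
      )

compiles foo.c to the extension pkg.foo, and bar.c to
pkg.subpkg.bar.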
The second argument to the Extension constructor is a list of source
files. Since the Distutils currently only support C, C++, and Objective-C
extensions, these are normally C/C++/Objective-C source files. (Be sure to use
appropriate extensions to distinguish C++ source files: .cc and
.cpp seem to be recognized by both Unix and Windows compilers.)
However, you can also include SWIG interface (.i) files in the list; the
build_ext command knows how to deal with SWIG extensions: it will run
SWIG on the interface file and compile the resulting C/C++ file into your
extension.
Options can currently be passed to SWIG through the setup script like
this (the distribution name and version below are illustrative):
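from distutils.core import setup, Extension

setup(name='foo',             # illustrative name and version
      version='1.0',
      ext_modules=[Extension('_foo', ['foo.i'],
                             swig_opts=['-modern', '-I../include'])],
      py_modules=['foo'],
      )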
On some platforms, you can include non-source files that are processed by the
compiler and included in your extension. Currently, this just means Windows
message text (.mc) files and resource definition (.rc) files for
Visual C++. These will be compiled to binary resource (.res) files and
linked into the executable.
Three optional arguments to Extension will help if you need to specify
include directories to search or preprocessor macros to define/undefine:
include_dirs, define_macros, and undef_macros.
For example, if your extension requires header files in the include
directory under your distribution root, use the include_dirs option:
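Extension('foo', ['foo.c'], include_dirs=['include'])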
You can specify absolute directories there; if you know that your extension will
only be built on Unix systems with X11R6 installed to /usr, you can get
away with
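Extension('foo', ['foo.c'], include_dirs=['/usr/include/X11'])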
You should avoid this sort of non-portable usage if you plan to distribute your
code: it’s probably better to write C code like
#include <X11/Xlib.h>
If you need to include header files from some other Python extension, you can
take advantage of the fact that header files are installed in a consistent way
by the Distutils install_header command. For example, the Numerical
Python header files are installed (on a standard Unix installation) to
/usr/local/include/python1.5/Numerical. (The exact location will differ
according to your platform and Python installation.) Since the Python include
directory—/usr/local/include/python1.5 in this case—is always
included in the search path when building Python extensions, the best approach
is to write C code like
#include <Numerical/arrayobject.h>
If you must put the Numerical include directory right into your header
search path, though, you can find that directory using the Distutils
distutils.sysconfig module:
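A sketch (the distribution name and version are illustrative):

import os
from distutils.core import setup, Extension
from distutils.sysconfig import get_python_inc

# get_python_inc(plat_specific=1) returns the platform-specific Python
# include directory; Numerical installs its headers beneath it.
incdir = os.path.join(get_python_inc(plat_specific=1), 'Numerical')

setup(name='foo',
      version='1.0',
      ext_modules=[Extension('foo', ['foo.c'], include_dirs=[incdir])],
      )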
Even though this is quite portable—it will work on any Python installation,
regardless of platform—it’s probably easier to just write your C code in the
sensible way.
You can define and undefine pre-processor macros with the define_macros and
undef_macros options. define_macros takes a list of (name,value)
tuples, where name is the name of the macro to define (a string) and
value is its value: either a string or None. (Defining a macro FOO
to None is the equivalent of a bare #define FOO in your C source: with
most compilers, this sets FOO to the string 1.) undef_macros is
just a list of macros to undefine.
You can also specify the libraries to link against when building your extension,
and the directories to search for those libraries. The libraries option is
a list of libraries to link against, library_dirs is a list of directories
to search for libraries at link-time, and runtime_library_dirs is a list of
directories to search for shared (dynamically loaded) libraries at run-time.
For example, if you need to link against libraries known to be in the standard
library search path on target systems:
Extension(...,
libraries=['gdbm', 'readline'])
If you need to link with libraries in a non-standard location, you’ll have to
include the location in library_dirs:
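For example (directory and library names as in the classic X11 case):

Extension('foo', ['foo.c'],
          library_dirs=['/usr/X11R6/lib'],
          libraries=['X11', 'Xt'])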
There are still some other options which can be used to handle special cases.
The optional option is a boolean; if it is true,
a build failure in the extension will not abort the build process, but
instead simply not install the failing extension.
The extra_objects option is a list of object files to be passed to the
linker. These files must not have extensions, as the default extension for the
compiler is used.
extra_compile_args and extra_link_args can be used to
specify additional command line options for the respective compiler and linker
command lines.
export_symbols is only useful on Windows. It can contain a list of
symbols (functions or variables) to be exported. This option is not needed when
building compiled extensions: Distutils will automatically add the module’s
initialization symbol to the list of exported symbols.
The depends option is a list of files that the extension depends on
(for example header files). The build command will call the compiler on the
sources to rebuild the extension if any of these files has been modified since
the previous build.
A distribution may relate to packages in three specific ways:
It can require packages or modules.
It can provide packages or modules.
It can obsolete packages or modules.
These relationships can be specified using keyword arguments to the
distutils.core.setup() function.
Dependencies on other Python modules and packages can be specified by supplying
the requires keyword argument to setup(). The value must be a list of
strings. Each string specifies a package that is required, and optionally what
versions are sufficient.
To specify that any version of a module or package is required, the string
should consist entirely of the module or package name. Examples include
'mymodule' and 'xml.parsers.expat'.
If specific versions are required, a sequence of qualifiers can be supplied in
parentheses. Each qualifier may consist of a comparison operator and a version
number. The accepted comparison operators are:
<    >    ==    <=    >=    !=
These can be combined by using multiple qualifiers separated by commas (and
optional whitespace). In this case, all of the qualifiers must be matched; a
logical AND is used to combine the evaluations.
Let’s look at a bunch of examples:
Requires Expression      Explanation
==1.0                    Only version 1.0 is compatible
>1.0, !=1.5.1, <2.0      Any version after 1.0 and before 2.0
                         is compatible, except 1.5.1
Now that we can specify dependencies, we also need to be able to specify what we
provide that other distributions can require. This is done using the provides
keyword argument to setup(). The value for this keyword is a list of
strings, each of which names a Python module or package, and optionally
identifies the version. If the version is not specified, it is assumed to match
that of the distribution.
Some examples:
Provides Expression      Explanation
mypkg                    Provide mypkg, using the distribution
                         version
mypkg (1.1)              Provide mypkg version 1.1, regardless of
                         the distribution version
A package can declare that it obsoletes other packages using the obsoletes
keyword argument. The value for this is similar to that of the requires
keyword: a list of strings giving module or package specifiers. Each specifier
consists of a module or package name optionally followed by one or more version
qualifiers. Version qualifiers are given in parentheses after the module or
package name.
The versions identified by the qualifiers are those that are obsoleted by the
distribution being described. If no qualifiers are given, all versions of the
named module or package are understood to be obsoleted.
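Putting the three keywords together, a sketch (all names and versions are
invented) might look like:

from distutils.core import setup

setup(name='mypkg',
      version='1.1',
      requires=['xml.parsers.expat', 'othermod (>1.0, !=1.5.1, <2.0)'],
      provides=['mypkg (1.1)'],
      obsoletes=['oldpkg (<=0.9)'],
      )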
So far we have been dealing with pure and non-pure Python modules, which are
usually not run by themselves but imported by scripts.
Scripts are files containing Python source code, intended to be started from the
command line. Scripts don’t require Distutils to do anything very complicated.
The only clever feature is that if the first line of the script starts with
#! and contains the word “python”, the Distutils will adjust the first line
to refer to the current interpreter location; the --executable (or
-e) option allows that interpreter path to be explicitly overridden.
The scripts option simply is a list of files to be handled in this
way. From the PyXML setup script:
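setup(...,
      scripts=['scripts/xmlproc_parse', 'scripts/xmlproc_val']
      )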
Often, additional files need to be installed into a package. These files are
often data that’s closely related to the package’s implementation, or text files
containing documentation that might be of interest to programmers using the
package. These files are called package data.
Package data can be added to packages using the package_data keyword
argument to the setup() function. The value must be a mapping from
package name to a list of relative path names that should be copied into the
package. The paths are interpreted as relative to the directory containing the
package (information from the package_dir mapping is used if appropriate);
that is, the files are expected to be part of the package in the source
directories. They may contain glob patterns as well.
The path names may contain directory portions; any necessary directories will be
created in the installation.
For example, if a package should contain a subdirectory with several data files,
the files can be arranged like this in the source tree:
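setup.py
src/
    mypkg/
        __init__.py
        module.py
        data/
            tables.dat
            spoons.dat
            forks.dat

The corresponding call to setup() might then be:

setup(...,
      packages=['mypkg'],
      package_dir={'mypkg': 'src/mypkg'},
      package_data={'mypkg': ['data/*.dat']},
      )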
Changed in version 3.1: All the files that match package_data will be added to the MANIFEST
file if no template is provided. See Specifying the files to distribute.
The data_files option can be used to specify additional files needed
by the module distribution: configuration files, message catalogs, data files,
anything which doesn’t fit in the previous categories.
data_files specifies a sequence of (directory, files) pairs in the
following way:
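setup(...,
      data_files=[('bitmaps', ['bm/b1.gif', 'bm/b2.gif']),
                  ('config', ['cfg/data.cfg']),
                  ('/etc/init.d', ['init-script'])],
      )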
Note that you can specify the directory names where the data files will be
installed, but you cannot rename the data files themselves.
Each (directory, files) pair in the sequence specifies the installation
directory and the files to install there. If directory is a relative path, it
is interpreted relative to the installation prefix (Python’s sys.prefix for
pure-Python packages, sys.exec_prefix for packages that contain extension
modules). Each file name in files is interpreted relative to the
setup.py script at the top of the package source distribution. No
directory information from files is used to determine the final location of
the installed file; only the name of the file is used.
You can specify the data_files options as a simple sequence of files
without specifying a target directory, but this is not recommended, and the
install command will print a warning in this case. To install data
files directly in the target directory, an empty string should be given as the
directory.
Changed in version 3.1: All the files that match data_files will be added to the MANIFEST
file if no template is provided. See Specifying the files to distribute.
The setup script may include additional meta-data beyond the name and version.
This information includes:
Meta-Data           Description                    Value             Notes
name                name of the package            short string      (1)
version             version of this release        short string      (1)(2)
author              package author’s name          short string      (3)
author_email        email address of the           email address     (3)
                    package author
maintainer          package maintainer’s name      short string      (3)
maintainer_email    email address of the           email address     (3)
                    package maintainer
url                 home page for the package      URL               (1)
description         short, summary description     short string
                    of the package
long_description    longer description of the      long string       (5)
                    package
download_url        location where the package     URL               (4)
                    may be downloaded
classifiers         a list of classifiers          list of strings   (4)
platforms           a list of platforms            list of strings
license             license for the package        short string      (6)
Notes:
(1) These fields are required.
(2) It is recommended that versions take the form major.minor[.patch[.sub]].
(3) Either the author or the maintainer must be identified.
(4) These fields should not be used if your package is to be compatible with
    Python versions prior to 2.2.3 or 2.3. The list is available from the PyPI
    website.
(5) The long_description field is used by PyPI when you are registering a
    package, to build its home page.
(6) The license field is a text indicating the license covering the
    package where the license is not a selection from the “License” Trove
    classifiers. See the Classifier field. Notice that
    there’s a licence distribution option which is deprecated but still
    acts as an alias for license.
‘short string’
A single line of text, not more than 200 characters.
Encoding the version information is an art in itself. Python packages generally
adhere to the version format major.minor[.patch][sub]. The major number is 0
for initial, experimental releases of software. It is incremented for releases
that represent major milestones in a package. The minor number is incremented
when important new features are added to the package. The patch number
increments when bug-fix releases are made. Additional trailing version
information is sometimes used to indicate sub-releases. These are
“a1,a2,...,aN” (for alpha releases, where functionality and API may change),
“b1,b2,...,bN” (for beta releases, which only fix bugs) and “pr1,pr2,...,prN”
(for final pre-release release testing). Some examples:
0.1.0
the first, experimental release of a package
1.0.1a2
the second alpha release of the first patch version of 1.0
The classifiers are specified in a Python list:
setup(...,
classifiers=[
'Development Status :: 4 - Beta',
'Environment :: Console',
'Environment :: Web Environment',
'Intended Audience :: End Users/Desktop',
'Intended Audience :: Developers',
'Intended Audience :: System Administrators',
'License :: OSI Approved :: Python Software Foundation License',
'Operating System :: MacOS :: MacOS X',
'Operating System :: Microsoft :: Windows',
'Operating System :: POSIX',
'Programming Language :: Python',
'Topic :: Communications :: Email',
'Topic :: Office/Business',
'Topic :: Software Development :: Bug Tracking',
],
)
If you wish to include classifiers in your setup.py file and also wish
to remain backwards-compatible with Python releases prior to 2.2.3, then you can
include the following code fragment in your setup.py before the
setup() call.
# patch distutils if it can't cope with the "classifiers" or
# "download_url" keywords
from sys import version
if version < '2.2.3':
    from distutils.dist import DistributionMetadata
    DistributionMetadata.classifiers = None
    DistributionMetadata.download_url = None
Sometimes things go wrong, and the setup script doesn’t do what the developer
wants.
Distutils catches any exceptions when running the setup script, and prints a
simple error message before the script is terminated. The motivation for this
behaviour is to not confuse administrators who don’t know much about Python and
are trying to install a package. If they get a big long traceback from deep
inside the guts of Distutils, they may think the package or the Python
installation is broken because they don’t read all the way down to the bottom
and see that it’s a permission problem.
On the other hand, this doesn’t help the developer find the cause of the
failure. For this purpose, the DISTUTILS_DEBUG environment variable can be set
to anything except an empty string, and distutils will then print detailed
information about what it is doing, and the full traceback in case an exception
occurs.
Often, it’s not possible to write down everything needed to build a distribution
a priori: you may need to get some information from the user, or from the
user’s system, in order to proceed. As long as that information is fairly
simple—a list of directories to search for C header files or libraries, for
example—then providing a configuration file, setup.cfg, for users to
edit is a cheap and easy way to solicit it. Configuration files also let you
provide default values for any command option, which the installer can then
override either on the command-line or by editing the config file.
The setup configuration file is a useful middle-ground between the setup script
—which, ideally, would be opaque to installers [1]—and the command-line to
the setup script, which is outside of your control and entirely up to the
installer. In fact, setup.cfg (and any other Distutils configuration
files present on the target system) are processed after the contents of the
setup script, but before the command-line. This has several useful
consequences:
- installers can override some of what you put in setup.py by editing
  setup.cfg
- you can provide non-standard defaults for options that are not easily set in
  setup.py
- installers can override anything in setup.cfg using the command-line
  options to setup.py
The basic syntax of the configuration file is simple:
[command]
option=value
...
where command is one of the Distutils commands (e.g. build_py,
install), and option is one of the options that command supports.
Any number of options can be supplied for each command, and any number of
command sections can be included in the file. Blank lines are ignored, as are
comments, which run from a '#' character until the end of the line. Long
option values can be split across multiple lines simply by indenting the
continuation lines.
You can find out the list of options supported by a particular command with the
universal --help option, e.g.
> python setup.py --help build_ext
[...]
Options for 'build_ext' command:
--build-lib (-b) directory for compiled extension modules
--build-temp (-t) directory for temporary files (build by-products)
--inplace (-i) ignore build-lib and put compiled extensions into the
source directory alongside your pure Python modules
--include-dirs (-I) list of directories to search for header files
--define (-D) C preprocessor macros to define
--undef (-U) C preprocessor macros to undefine
--swig-opts list of SWIG command line options
[...]
Note that an option spelled --foo-bar on the command-line is spelled
foo_bar in configuration files.
For example, say you want your extensions to be built “in-place”—that is, you
have an extension pkg.ext, and you want the compiled extension file
(ext.so on Unix, say) to be put in the same source directory as your
pure Python modules pkg.mod1 and pkg.mod2. You can always use the
--inplace option on the command-line to ensure this:
python setup.py build_ext --inplace
But this requires that you always specify the build_ext command
explicitly, and remember to provide --inplace. An easier way is to
“set and forget” this option, by encoding it in setup.cfg, the
configuration file for this distribution:
[build_ext]
inplace=1
This will affect all builds of this module distribution, whether or not you
explicitly specify build_ext. If you include setup.cfg in
your source distribution, it will also affect end-user builds—which is
probably a bad idea for this option, since always building extensions in-place
would break installation of the module distribution. In certain peculiar cases,
though, modules are built right in their installation directory, so this is
conceivably a useful ability. (Distributing extensions that expect to be built
in their installation directory is almost always a bad idea, though.)
Another example: certain commands take a lot of options that don’t change from
run to run; for example, bdist_rpm needs to know everything required
to generate a “spec” file for creating an RPM distribution. Some of this
information comes from the setup script, and some is automatically generated by
the Distutils (such as the list of files installed). But some of it has to be
supplied as options to bdist_rpm, which would be very tedious to do
on the command-line for every run. Hence, here is a snippet from the Distutils’
own setup.cfg:
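# reconstructed from the Distutils documentation; file names are illustrative
[bdist_rpm]
release = 1
packager = Greg Ward <gward@python.net>
doc_files = CHANGES.txt
            README.txt
            USAGE.txt
            doc/
            examples/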
As shown in section A Simple Example, you use the sdist command
to create a source distribution. In the simplest case,
python setup.py sdist
(assuming you haven’t specified any sdist options in the setup script
or config file), sdist creates the archive of the default format for
the current platform. The default format is a gzip’ed tar file
(.tar.gz) on Unix, and ZIP file on Windows.
You can specify as many formats as you like using the --formats
option, for example:
python setup.py sdist --formats=gztar,zip
to create a gzipped tarball and a zip file. The available formats are:
Format  Description                   Notes
zip     zip file (.zip)               (1),(3)
gztar   gzip’ed tar file (.tar.gz)    (2),(4)
bztar   bzip2’ed tar file (.tar.bz2)  (4)
ztar    compressed tar file (.tar.Z)  (4)
tar     tar file (.tar)               (4)
Notes:
(1) default on Windows
(2) default on Unix
(3) requires either external zip utility or zipfile module (part
    of the standard Python library since Python 1.6)
(4) requires external utilities: tar and possibly one of gzip,
    bzip2, or compress
If you don’t supply an explicit list of files (or instructions on how to
generate one), the sdist command puts a minimal default set into the
source distribution:
- all Python source files implied by the py_modules and
  packages options
- all C source files mentioned in the ext_modules or
  libraries options
- anything that looks like a test script: test/test*.py (currently, the
  Distutils don’t do anything with test scripts except include them in source
  distributions, but in the future there will be a standard for testing Python
  module distributions)
- README.txt (or README), setup.py (or whatever you
  called your setup script), and setup.cfg
Sometimes this is enough, but usually you will want to specify additional files
to distribute. The typical way to do this is to write a manifest template,
called MANIFEST.in by default. The manifest template is just a list of
instructions for how to generate your manifest file, MANIFEST, which is
the exact list of files to include in your source distribution. The
sdist command processes this template and generates a manifest based
on its instructions and what it finds in the filesystem.
If you prefer to roll your own manifest file, the format is simple: one filename
per line, regular files (or symlinks to them) only. If you do supply your own
MANIFEST, you must specify everything: the default set of files
described above does not apply in this case.
Changed in version 3.1: An existing generated MANIFEST will be regenerated without
sdist comparing its modification time to the one of
MANIFEST.in or setup.py.
Changed in version 3.1.3: MANIFEST files start with a comment indicating they are generated.
Files without this comment are not overwritten or removed.
Changed in version 3.2.2: sdist will read a MANIFEST file if no MANIFEST.in
exists, like it used to do.
The manifest template has one command per line, where each command specifies a
set of files to include or exclude from the source distribution. For an
example, again we turn to the Distutils’ own manifest template:
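include *.txt
recursive-include examples *.txt *.py
prune examples/sample?/build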
The meanings should be fairly clear: include all files in the distribution root
matching *.txt, all files anywhere under the examples directory
matching *.txt or *.py, and exclude all directories matching
examples/sample?/build. All of this is done after the standard
include set, so you can exclude files from the standard set with explicit
instructions in the manifest template. (Or, you can use the
--no-defaults option to disable the standard set entirely.) There are
several other commands available in the manifest template mini-language; see
section Creating a source distribution: the sdist command.
The order of commands in the manifest template matters: initially, we have the
list of default files as described above, and each command in the template adds
to or removes from that list of files. Once we have fully processed the
manifest template, we remove files that should not be included in the source
distribution:
- all files in the Distutils “build” tree (default build/)
- all files in directories named RCS, CVS, .svn,
  .hg, .git, .bzr or _darcs
Now we have our complete list of files, which is written to the manifest for
future reference, and then used to build the source distribution archive(s).
You can disable the default set of included files with the
--no-defaults option, and you can disable the standard exclude set
with --no-prune.
Following the Distutils’ own manifest template, let’s trace how the
sdist command builds the list of files to include in the Distutils
source distribution:
- include all Python source files in the distutils and
  distutils/command subdirectories (because packages corresponding to
  those two directories were mentioned in the packages option in the
  setup script—see section Writing the Setup Script)
- include README.txt, setup.py, and setup.cfg (standard
  files)
- include test/test*.py (standard files)
- include *.txt in the distribution root (this will find
  README.txt a second time, but such redundancies are weeded out later)
- include anything matching *.txt or *.py in the sub-tree
  under examples
- exclude all files in the sub-trees starting at directories matching
  examples/sample?/build—this may exclude files included by the
  previous two steps, so it’s important that the prune command in the manifest
  template comes after the recursive-include command
- exclude the entire build tree, and any RCS, CVS,
  .svn, .hg, .git, .bzr and _darcs
  directories
Just like in the setup script, file and directory names in the manifest template
should always be slash-separated; the Distutils will take care of converting
them to the standard representation on your platform. That way, the manifest
template is portable across operating systems.
The normal course of operations for the sdist command is as follows:
- if the manifest file (MANIFEST by default) exists and the first line
  does not have a comment indicating it is generated from MANIFEST.in,
  then it is used as is, unaltered
- if the manifest file doesn’t exist or has been previously automatically
  generated, read MANIFEST.in and create the manifest
- if neither MANIFEST nor MANIFEST.in exist, create a manifest
  with just the default file set
- use the list of files now in MANIFEST (either just generated or read
  in) to create the source distribution archive(s)
There are a couple of options that modify this behaviour. First, use the
--no-defaults and --no-prune options to disable the standard
“include” and “exclude” sets.
Second, you might just want to (re)generate the manifest, but not create a source
distribution:
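python setup.py sdist --manifest-only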
A “built distribution” is what you’re probably used to thinking of either as a
“binary package” or an “installer” (depending on your background). It’s not
necessarily binary, though, because it might contain only Python source code
and/or byte-code; and we don’t call it a package, because that word is already
spoken for in Python. (And “installer” is a term specific to the world of
mainstream desktop systems.)
A built distribution is how you make life as easy as possible for installers of
your module distribution: for users of RPM-based Linux systems, it’s a binary
RPM; for Windows users, it’s an executable installer; for Debian-based Linux
users, it’s a Debian package; and so forth. Obviously, no one person will be
able to create built distributions for every platform under the sun, so the
Distutils are designed to enable module developers to concentrate on their
specialty—writing code and creating source distributions—while an
intermediary species called packagers springs up to turn source distributions
into built distributions for as many platforms as there are packagers.
Of course, the module developer could be his own packager; or the packager could
be a volunteer “out there” somewhere who has access to a platform which the
original developer does not; or it could be software periodically grabbing new
source distributions and turning them into built distributions for as many
platforms as the software has access to. Regardless of who they are, a packager
uses the setup script and the bdist command family to generate built
distributions.
As a simple example, if I run the following command in the Distutils source
tree:
python setup.py bdist
then the Distutils builds my module distribution (the Distutils itself in this
case), does a “fake” installation (also in the build directory), and
creates the default type of built distribution for my platform. The default
format for built distributions is a “dumb” tar file on Unix, and a simple
executable installer on Windows. (That tar file is considered “dumb” because it
has to be unpacked in a specific location to work.)
Thus, the above command on a Unix system creates
Distutils-1.0.plat.tar.gz; unpacking this tarball from the right place
installs the Distutils just as though you had downloaded the source distribution
and run python setup.py install. (The “right place” is either the root of
the filesystem or Python’s prefix directory, depending on the options
given to the bdist_dumb command; the default is to make dumb
distributions relative to prefix.)
Obviously, for pure Python distributions, this isn’t any simpler than just
running python setup.py install—but for non-pure distributions, which
include extensions that would need to be compiled, it can mean the difference
between someone being able to use your extensions or not. And creating “smart”
built distributions, such as an RPM package or an executable installer for
Windows, is far more convenient for users even if your distribution doesn’t
include any extensions.
The bdist command has a --formats option, similar to the
sdist command, which you can use to select the types of built
distribution to generate: for example,
python setup.py bdist --format=zip
would, when run on a Unix system, create Distutils-1.0.plat.zip—again, this archive would be unpacked from the root directory to install the
Distutils.
The available formats for built distributions are:
Format   Description                   Notes
gztar    gzipped tar file (.tar.gz)    (1),(3)
ztar     compressed tar file (.tar.Z)  (3)
tar      tar file (.tar)               (3)
zip      zip file (.zip)               (2),(4)
rpm      RPM                           (5)
pkgtool  Solaris pkgtool
sdux     HP-UX swinstall
wininst  self-extracting ZIP file      (4)
         for Windows
msi      Microsoft Installer
Notes:
(1) default on Unix
(2) default on Windows
(3) requires external utilities: tar and possibly one of gzip,
    bzip2, or compress
(4) requires either external zip utility or zipfile module (part
    of the standard Python library since Python 1.6)
(5) requires external rpm utility, version 3.0.4 or better (use
    rpm --version to find out which version you have)
You don’t have to use the bdist command with the --formats
option; you can also use the command that directly implements the format you’re
interested in. Some of these bdist “sub-commands” actually generate
several similar formats; for instance, the bdist_dumb command
generates all the “dumb” archive formats (tar, ztar, gztar, and
zip), and bdist_rpm generates both binary and source RPMs. The
bdist sub-commands, and the formats generated by each, are:
Command        Formats
bdist_dumb     tar, ztar, gztar, zip
bdist_rpm      rpm, srpm
bdist_wininst  wininst
bdist_msi      msi
The following sections give details on the individual bdist_*
commands.
The RPM format is used by many popular Linux distributions, including Red Hat,
SuSE, and Mandrake. If one of these (or any of the other RPM-based Linux
distributions) is your usual environment, creating RPM packages for other users
of that same distribution is trivial. Depending on the complexity of your module
distribution and differences between Linux distributions, you may also be able
to create RPMs that work on different RPM-based distributions.
The usual way to create an RPM of your module distribution is to run the
bdist_rpm command:
python setup.py bdist_rpm
or the bdist command with the --format option:
python setup.py bdist --formats=rpm
The former allows you to specify RPM-specific options; the latter allows you to
easily specify multiple formats in one run. If you need to do both, you can
explicitly specify multiple bdist_* commands and their options:
python setup.py bdist_rpm --packager="John Doe <jdoe@example.org>" \
                bdist_wininst --target-version="2.0"
Creating RPM packages is driven by a .spec file, much as using the
Distutils is driven by the setup script. To make your life easier, the
bdist_rpm command normally creates a .spec file based on the
information you supply in the setup script, on the command line, and in any
Distutils configuration files. Various options and sections in the
.spec file are derived from options in the setup script as follows:
RPM .spec file option or section  Distutils setup script option
Name                              name
Summary (in preamble)             description
Version                           version
Vendor                            author and author_email, or
                                  maintainer and maintainer_email
Copyright                         license
Url                               url
%description (section)            long_description
Additionally, there are many options in .spec files that don’t have
corresponding options in the setup script. Most of these are handled through
options to the bdist_rpm command as follows:
RPM .spec file option or section  bdist_rpm option   default value
Release                           release            “1”
Group                             group              “Development/Libraries”
Vendor                            vendor             (see above)
Packager                          packager           (none)
Provides                          provides           (none)
Requires                          requires           (none)
Conflicts                         conflicts          (none)
Obsoletes                         obsoletes          (none)
Distribution                      distribution_name  (none)
BuildRequires                     build_requires     (none)
Icon                              icon               (none)
Obviously, supplying even a few of these options on the command-line would be
tedious and error-prone, so it’s usually best to put them in the setup
configuration file, setup.cfg—see section Writing the Setup Configuration File. If
you distribute or package many Python module distributions, you might want to
put options that apply to all of them in your personal Distutils configuration
file (~/.pydistutils.cfg).
There are three steps to building a binary RPM package, all of which are
handled automatically by the Distutils:
1. create a .spec file, which describes the package (analogous to the
   Distutils setup script; in fact, much of the information in the setup script
   winds up in the .spec file)
2. create the source RPM
3. create the “binary” RPM (which may or may not contain binary code, depending
   on whether your module distribution contains Python extensions)
Normally, RPM bundles the last two steps together; when you use the Distutils,
all three steps are typically bundled together.
If you wish, you can separate these three steps. You can use the
--spec-only option to make bdist_rpm just create the
.spec file and exit; in this case, the .spec file will be
written to the “distribution directory”—normally dist/, but
customizable with the --dist-dir option. (Normally, the .spec
file winds up deep in the “build tree,” in a temporary directory created by
bdist_rpm.)
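For example, to write just the .spec file into the default dist/ directory:
python setup.py bdist_rpm --spec-only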
Executable installers are the natural format for binary distributions on
Windows. They display a nice graphical user interface, display some information
about the module distribution to be installed taken from the metadata in the
setup script, let the user select a few options, and start or cancel the
installation.
Since the metadata is taken from the setup script, creating Windows installers
is usually as easy as running:
python setup.py bdist_wininst
or the bdist command with the --formats option:
python setup.py bdist --formats=wininst
If you have a pure module distribution (only containing pure Python modules and
packages), the resulting installer will be version independent and have a name
like foo-1.0.win32.exe. These installers can even be created on Unix
platforms or Mac OS X.
If you have a non-pure distribution, the extensions can only be created on a
Windows platform, and will be Python version dependent. The installer filename
will reflect this and now has the form foo-1.0.win32-py2.0.exe. You
have to create a separate installer for every Python version you want to
support.
The installer will try to compile pure modules into bytecode after installation
on the target system in normal and optimizing mode. If you don’t want this to
happen for some reason, you can run the bdist_wininst command with
the --no-target-compile and/or the --no-target-optimize
option.
By default the installer will display the cool “Python Powered” logo when it is
run, but you can also supply your own 152x261 bitmap which must be a Windows
.bmp file with the --bitmap option.
The installer will also display a large title on the desktop background window
when it is run, which is constructed from the name of your distribution and the
version number. This can be changed to another text by using the
--title option.
The installer file will be written to the “distribution directory” — normally
dist/, but customizable with the --dist-dir option.
Starting with Python 2.6, distutils is capable of cross-compiling between
Windows platforms. In practice, this means that with the correct tools
installed, you can use a 32bit version of Windows to create 64bit extensions
and vice-versa.
To build for an alternate platform, specify the --plat-name option
to the build command. Valid values are currently ‘win32’, ‘win-amd64’ and
‘win-ia64’. For example, on a 32bit version of Windows, you could execute:
python setup.py build --plat-name=win-amd64
to build a 64bit version of your extension. The Windows Installers also
support this option, so the command:
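python setup.py build --plat-name=win-amd64 bdist_wininst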
would create a 64bit installation executable on your 32bit version of Windows.
To cross-compile, you must download the Python source code and cross-compile
Python itself for the platform you are targeting - it is not possible from a
binary installation of Python (as the .lib and related files for other
platforms are not included). In practice, this means the user of a 32 bit operating
system will need to use Visual Studio 2008 to open the
PCBuild/PCbuild.sln solution in the Python source tree and build the
“x64” configuration of the ‘pythoncore’ project before cross-compiling
extensions is possible.
Note that by default, Visual Studio 2008 does not install 64bit compilers or
tools. You may need to reexecute the Visual Studio setup process and select
these tools (using Control Panel->[Add/Remove] Programs is a convenient way to
check or modify your existing install.)
Starting with Python 2.3, a postinstallation script can be specified with the
--install-script option. The basename of the script must be
specified, and the script filename must also be listed in the scripts argument
to the setup function.
This script will be run at installation time on the target system after all the
files have been copied, with argv[1] set to -install, and again at
uninstallation time before the files are removed with argv[1] set to
-remove.
The installation script runs embedded in the Windows installer; all output
(sys.stdout, sys.stderr) is redirected into a buffer and will be
displayed in the GUI after the script has finished.
Some functions especially useful in this context are available as additional
built-in functions in the installation script.
The functions directory_created(path) and file_created(path) should be called
when a directory or file is created by the postinstall script at installation
time. Each registers path with the uninstaller, so that it will be removed
when the distribution is uninstalled. To be safe, directories are only removed
if they are empty.
The function get_special_folder_path(csidl_string) can be used to retrieve
special folder locations on Windows like the Start Menu or the Desktop. It
returns the full path to the folder. csidl_string must be one of the
following strings:
If the folder cannot be retrieved, OSError is raised.
Which folders are available depends on the exact Windows version, and probably
also the configuration. For details refer to Microsoft’s documentation of the
SHGetSpecialFolderPath() function.
The function create_shortcut(target, description, filename[, arguments[,
workdir[, iconpath[, iconindex]]]]) creates a shortcut. target is the path
to the program to be started by the shortcut. description is the description
of the shortcut.
filename is the title of the shortcut that the user will see. arguments
specifies the command line arguments, if any. workdir is the working directory
for the program. iconpath is the file containing the icon for the shortcut,
and iconindex is the index of the icon in the file iconpath. Again, for
details consult the Microsoft documentation for the IShellLink
interface.
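A minimal sketch of such a postinstall script (the application name and
shortcut details are illustrative):
# postinstall script sketch -- the helper functions used below
# (get_special_folder_path, create_shortcut, file_created) are injected
# as built-ins by the bdist_wininst installer at run time.
import os
import sys

if sys.argv[1] == '-install':
    start_menu = get_special_folder_path('CSIDL_STARTMENU')
    shortcut = os.path.join(start_menu, 'Foo.lnk')   # hypothetical name
    create_shortcut(os.path.join(sys.prefix, 'python.exe'),
                    'Run the Foo application',
                    shortcut)
    file_created(shortcut)   # ensure the shortcut is removed on uninstall
elif sys.argv[1] == '-remove':
    pass                     # registered files are removed automatically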
Starting with Python 2.6, bdist_wininst supports a --user-access-control
option. The default is ‘none’ (meaning no UAC handling is done), and other
valid values are ‘auto’ (meaning prompt for UAC elevation if Python was
installed for all users) and ‘force’ (meaning always prompt for elevation).
The Python Package Index (PyPI) holds meta-data describing distributions
packaged with distutils. The distutils command register is used to
submit your distribution’s meta-data to the index. It is invoked as follows:
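python setup.py register
Distutils then presents a short menu asking whether to use an existing login
(option 1), register as a new user (option 2), have the server generate a new
password, or quit.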
Note: if your username and password are saved locally, you will not see this
menu.
If you have not registered with PyPI, then you will need to do so now. You
should choose option 2, and enter your details as required. Soon after
submitting your details, you will receive an email which will be used to confirm
your registration.
Once you are registered, you may choose option 1 from the menu. You will be
prompted for your PyPI username and password, and register will then
submit your meta-data to the index.
You may submit any number of versions of your distribution to the index. If you
alter the meta-data for a particular version, you may submit it again and the
index will be updated.
PyPI holds a record for each (name, version) combination submitted. The first
user to submit information for a given name is designated the Owner of that
name. They may submit changes through the register command or through
the web interface. They may also designate other users as Owners or Maintainers.
Maintainers may edit the package information, but not designate other Owners or
Maintainers.
By default PyPI will list all versions of a given package. To hide certain
versions, the Hidden property should be set to yes. This must be edited through
the web interface.
The Python Package Index (PyPI) not only stores the package info, but also the
package data if the author of the package wishes to. The distutils command
upload pushes the distribution files to PyPI.
The command is invoked immediately after building one or more distribution
files. For example, the command
python setup.py sdist bdist_wininst upload
will cause the source distribution and the Windows installer to be uploaded to
PyPI. Note that these will be uploaded even if they are built using an earlier
invocation of setup.py, but that only distributions named on the command
line for the invocation including the upload command are uploaded.
The upload command uses the username, password, and repository URL
from the $HOME/.pypirc file (see section The .pypirc file for more on this
file). If a register command was previously called in the same command,
and if the password was entered in the prompt, upload will reuse the
entered password. This is useful if you do not want to store a clear text
password in the $HOME/.pypirc file.
You can specify another PyPI server with the --repository=url option:
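python setup.py sdist bdist_wininst upload -r http://example.com/pypi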
See section The .pypirc file for more on defining several servers.
You can use the --sign option to tell upload to sign each
uploaded file using GPG (GNU Privacy Guard). The gpg program must
be available for execution on the system PATH. You can also specify
which key to use for signing using the --identity=name option.
Other upload options include --repository=url or
--repository=section, where url is the URL of the server and
section the name of the section in $HOME/.pypirc, and
--show-response (which displays the full response text from the PyPI
server for help in debugging upload problems).
The long_description field plays a special role at PyPI. It is used by
the server to display a home page for the registered package.
If you use the reStructuredText
syntax for this field, PyPI will parse it and display an HTML output for
the package home page.
The long_description field can be attached to a text file located
in the package:
from distutils.core import setup

with open('README.txt') as file:
    long_description = file.read()

setup(name='Distutils',
      long_description=long_description)
In that case, README.txt is a regular reStructuredText text file located
in the root of the package beside setup.py.
To prevent registering broken reStructuredText content, you can use the
rst2html program that is provided by the docutils package and
check the long_description from the command line:
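python setup.py --long-description | rst2html.py > output.html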
This chapter provides a number of basic examples to help get started with
distutils. Additional information about using distutils can be found in the
Distutils Cookbook.
If you’re just distributing a couple of modules, especially if they don’t live
in a particular package, you can specify them individually using the
py_modules option in the setup script.
In the simplest case, you’ll have two files to worry about: a setup script and
the single module you’re distributing, foo.py in this example:
<root>/
        setup.py
        foo.py
(In all diagrams in this section, <root> will refer to the distribution root
directory.) A minimal setup script to describe this situation would be:
from distutils.core import setup
setup(name='foo',
      version='1.0',
      py_modules=['foo'],
      )
Note that the name of the distribution is specified independently with the
name option, and there’s no rule that says it has to be the same as
the name of the sole module in the distribution (although that’s probably a good
convention to follow). However, the distribution name is used to generate
filenames, so you should stick to letters, digits, underscores, and hyphens.
Since py_modules is a list, you can of course specify multiple
modules, eg. if you’re distributing modules foo and bar, your
setup might look like this:
<root>/
        setup.py
        foo.py
        bar.py
and the setup script might be
from distutils.core import setup
setup(name='foobar',
      version='1.0',
      py_modules=['foo', 'bar'],
      )
You can put module source files into another directory, but if you have enough
modules to do that, it’s probably easier to specify modules by package rather
than listing them individually.
If you have more than a couple of modules to distribute, especially if they are
in multiple packages, it’s probably easier to specify whole packages rather than
individual modules. This works even if your modules are not in a package; you
can just tell the Distutils to process modules from the root package, and that
works the same as any other package (except that you don’t have to have an
__init__.py file).
The setup script from the last example could also be written as
from distutils.core import setup
setup(name='foobar',
      version='1.0',
      packages=[''],
      )
(The empty string stands for the root package.)
If those two files are moved into a subdirectory, but remain in the root
package, e.g.:
<root>/
        setup.py
        src/
            foo.py
            bar.py
then you would still specify the root package, but you have to tell the
Distutils where source files in the root package live:
from distutils.core import setup
setup(name='foobar',
      version='1.0',
      package_dir={'': 'src'},
      packages=[''],
      )
More typically, though, you will want to distribute multiple modules in the same
package (or in sub-packages). For example, if the foo and bar
modules belong in package foobar, one way to lay out your source tree is
<root>/
        setup.py
        foobar/
            __init__.py
            foo.py
            bar.py
This is in fact the default layout expected by the Distutils, and the one that
requires the least work to describe in your setup script:
from distutils.core import setup
setup(name='foobar',
      version='1.0',
      packages=['foobar'],
      )
If you want to put modules in directories not named for their package, then you
need to use the package_dir option again. For example, if the
src directory holds modules in the foobar package:
<root>/
        setup.py
        src/
            __init__.py
            foo.py
            bar.py
an appropriate setup script would be
from distutils.core import setup
setup(name='foobar',
      version='1.0',
      package_dir={'foobar': 'src'},
      packages=['foobar'],
      )
Or, you might put modules from your main package right in the distribution
root:
<root>/
        setup.py
        __init__.py
        foo.py
        bar.py
in which case your setup script would be
from distutils.core import setup
setup(name='foobar',
      version='1.0',
      package_dir={'foobar': ''},
      packages=['foobar'],
      )
(The empty string also stands for the current directory.)
If you have sub-packages, they must be explicitly listed in packages,
but any entries in package_dir automatically extend to sub-packages.
(In other words, the Distutils does not scan your source tree, trying to
figure out which directories correspond to Python packages by looking for
__init__.py files.) Thus, if the default layout grows a sub-package:
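the layout and setup script might look like this (subfoo and blah.py are
illustrative names):
<root>/
        setup.py
        foobar/
            __init__.py
            foo.py
            bar.py
            subfoo/
                __init__.py
                blah.py
and the setup script would list both packages:
from distutils.core import setup

setup(name='foobar',
      version='1.0',
      packages=['foobar', 'foobar.subfoo'],
      )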
Extension modules are specified using the ext_modules option.
package_dir has no effect on where extension source files are found;
it only affects the source for pure Python modules. The simplest case, a
single extension module in a single C source file, is:
<root>/
        setup.py
        foo.c
If the foo extension belongs in the root package, the setup script for
this could be
from distutils.core import setup
from distutils.extension import Extension
setup(name='foobar',
      version='1.0',
      ext_modules=[Extension('foo', ['foo.c'])],
      )
If the extension actually belongs in a package, say foopkg, then with
exactly the same source tree layout, it can be put in the foopkg
package simply by changing the name of the extension:
from distutils.core import setup
from distutils.extension import Extension
setup(name='foobar',
      version='1.0',
      ext_modules=[Extension('foopkg.foo', ['foo.c'])],
      )
The check command allows you to verify whether your package meta-data
meets the minimum requirements to build a distribution.
To run it, just call it using your setup.py script. If something is
missing, check will display a warning.
Let’s take an example with a simple script:
from distutils.core import setup
setup(name='foobar')
Running the check command will display some warnings:
$ python setup.py check
running check
warning: check: missing required meta-data: version, url
warning: check: missing meta-data: either (author and author_email) or
(maintainer and maintainer_email) must be supplied
If you use the reStructuredText syntax in the long_description field and
docutils is installed you can check if the syntax is fine with the
check command, using the restructuredtext option.
For example, if the setup.py script is changed like this:
from distutils.core import setup
desc = """\
My description
=============
This is the description of the ``foobar`` package.
"""
setup(name='foobar', version='1', author='tarek',
      author_email='tarek@ziade.org',
      url='http://example.com', long_description=desc)
Where the long description is broken, check will be able to detect it
by using the docutils parser:
$ python setup.py check --restructuredtext
running check
warning: check: Title underline too short. (line 2)
warning: check: Could not finish the parsing.
Distutils can be extended in various ways. Most extensions take the form of new
commands or replacements for existing commands. New commands may be written to
support new types of platform-specific packaging, for example, while
replacements for existing commands may be made to modify details of how the
command operates on a package.
Most extensions of the distutils are made within setup.py scripts that
want to modify existing commands; many simply add a few file extensions that
should be copied into packages in addition to .py files as a
convenience.
Most distutils command implementations are subclasses of the
distutils.cmd.Command class. New commands may directly inherit from
Command, while replacements often derive from Command
indirectly, directly subclassing the command they are replacing. Commands are
required to derive from Command.
There are different ways to integrate new command implementations into
distutils. The most difficult is to lobby for the inclusion of the new features
in distutils itself, and wait for (and require) a version of Python that
provides that support. This is really hard for many reasons.
The most common, and possibly the most reasonable for most needs, is to include
the new implementations with your setup.py script, and cause the
distutils.core.setup() function to use them:
from distutils.command.build_py import build_py as _build_py
from distutils.core import setup

class build_py(_build_py):
    """Specialized Python source builder."""

    # implement whatever needs to be different...

setup(cmdclass={'build_py': build_py},
      ...)
This approach is most valuable when the new implementations must be used in
order to install a particular package, since everyone interested in the package
will need to have the new command implementation.
Beginning with Python 2.4, a third option is available, intended to allow new
commands to be added which can support existing setup.py scripts without
requiring modifications to the Python installation. This is expected to allow
third-party extensions to provide support for additional packaging systems, but
the commands can be used for anything distutils commands can be used for. A new
configuration option, command_packages (command-line option
--command-packages), can be used to specify additional packages to be
searched for modules implementing commands. Like all distutils options, this
can be specified on the command line or in a configuration file. This option
can only be set in the [global] section of a configuration file, or before
any commands on the command line. If set in a configuration file, it can be
overridden from the command line; setting it to an empty string on the command
line causes the default to be used. This should never be set in a configuration
file provided with a package.
This new option can be used to add any number of packages to the list of
packages searched for command implementations; multiple package names should be
separated by commas. When not specified, the search is only performed in the
distutils.command package. When setup.py is run with the option
--command-packages distcmds,buildcmds, however, the packages
distutils.command, distcmds, and buildcmds will be searched
in that order. New commands are expected to be implemented in modules of the
same name as the command by classes sharing the same name. Given the example
command line option above, the command bdist_openpkg could be
implemented by the class distcmds.bdist_openpkg.bdist_openpkg or
buildcmds.bdist_openpkg.bdist_openpkg.
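As an illustration, a skeleton for such a command module might look like this
(the bdist_openpkg name follows the example above; the body is a sketch, not a
real packaging implementation):
# distcmds/bdist_openpkg.py -- hypothetical module implementing the
# bdist_openpkg command from the example above; the body is a sketch.
from distutils.cmd import Command

class bdist_openpkg(Command):
    description = 'create an OpenPKG distribution (sketch only)'
    user_options = []          # no command-specific options in this sketch

    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def run(self):
        # a real implementation would create the distribution file here,
        # then register it so the upload command can find it:
        # self.distribution.dist_files.append(('bdist_openpkg', filename))
        self.announce('bdist_openpkg: not implemented in this sketch')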
Commands that create distributions (files in the dist/ directory) need
to add (command, filename) pairs to self.distribution.dist_files so that
upload can upload it to PyPI. The filename in the pair contains no
path information, only the name of the file itself. In dry-run mode, pairs
should still be added to represent what would have been created.
This command installs all (Python) scripts in the distribution.
Creating a source distribution: the sdist command
The manifest template commands are:
Command                              Description
include pat1 pat2 ...                include all files matching any of the
                                     listed patterns
exclude pat1 pat2 ...                exclude all files matching any of the
                                     listed patterns
recursive-include dir pat1 pat2 ...  include all files under dir matching
                                     any of the listed patterns
recursive-exclude dir pat1 pat2 ...  exclude all files under dir matching
                                     any of the listed patterns
global-include pat1 pat2 ...         include all files anywhere in the
                                     source tree matching any of the
                                     listed patterns
global-exclude pat1 pat2 ...         exclude all files anywhere in the
                                     source tree matching any of the
                                     listed patterns
prune dir                            exclude all files under dir
graft dir                            include all files under dir
The patterns here are Unix-style “glob” patterns: * matches any sequence of
regular filename characters, ? matches any single regular filename
character, and [range] matches any of the characters in range (e.g.,
a-z, a-zA-Z, a-f0-9_.). The definition of “regular filename
character” is platform-specific: on Unix it is anything except slash; on Windows
anything except backslash or colon.
The distutils.core module is the only module that needs to be installed
to use the Distutils. It provides the setup() function (which is called from
the setup script) and indirectly provides the distutils.dist.Distribution
and distutils.cmd.Command classes.
run_setup(script_name[, script_args=None, stop_after='run'])
Run a setup script in a somewhat controlled environment, and return the
distutils.dist.Distribution instance that drives things. This is
useful if you need to find out the distribution meta-data (passed as keyword
args from script to setup()), or the contents of the config files or
command-line.
script_name is a file that will be read and run with exec(). sys.argv[0]
will be replaced with script for the duration of the call. script_args is a
list of strings; if supplied, sys.argv[1:] will be replaced by script_args
for the duration of the call.
stop_after tells setup() when to stop processing; possible values:
value        description
init         Stop after the Distribution instance has been
             created and populated with the keyword arguments
             to setup()
config       Stop after config files have been parsed (and
             their data stored in the Distribution instance)
commandline  Stop after the command-line (sys.argv[1:] or
             script_args) have been parsed (and the data stored
             in the Distribution instance)
run          Stop after all commands have been run (the same as
             if setup() had been called in the usual way). This
             is the default value.
In addition, the distutils.core module exposes a number of classes that
live elsewhere.
The Extension class describes a single C or C++ extension module in a setup
script. It accepts the following keyword arguments in its constructor:
argument name         value                                        type
name                  the full name of the extension, including
                      any packages — ie. not a filename or
                      pathname, but Python dotted name             string
sources               list of source filenames, relative to the
                      distribution root (where the setup script
                      lives), in Unix form (slash-separated) for
                      portability. Source files may be C, C++,
                      SWIG (.i), platform-specific resource
                      files, or whatever else is recognized by
                      the build_ext command as source for a
                      Python extension.                            list of strings
include_dirs          list of directories to search for C/C++
                      header files (in Unix form for
                      portability)                                 list of strings
define_macros         list of macros to define; each macro is
                      defined using a 2-tuple (name, value),
                      where value is either the string to
                      define it to or None to define it without
                      a particular value (equivalent of
                      #define FOO in source or -DFOO on Unix C
                      compiler command line)                       list of (string, string)
                                                                   tuples or (name, None)
undef_macros          list of macros to undefine explicitly        list of strings
library_dirs          list of directories to search for C/C++
                      libraries at link time                       list of strings
libraries             list of library names (not filenames or
                      paths) to link against                       list of strings
runtime_library_dirs  list of directories to search for C/C++
                      libraries at run time (for shared
                      extensions, this is when the extension is
                      loaded)                                      list of strings
extra_objects         list of extra files to link with (eg.
                      object files not implied by ‘sources’,
                      static library that must be explicitly
                      specified, binary resource files, etc.)      list of strings
extra_compile_args    any extra platform- and compiler-specific
                      information to use when compiling the
                      source files in ‘sources’. For platforms
                      and compilers where a command line makes
                      sense, this is typically a list of
                      command-line arguments, but for other
                      platforms it could be anything.              list of strings
extra_link_args       any extra platform- and compiler-specific
                      information to use when linking object
                      files together to create the extension
                      (or to create a new static Python
                      interpreter). Similar interpretation as
                      for ‘extra_compile_args’.                    list of strings
export_symbols        list of symbols to be exported from a
                      shared extension. Not used on all
                      platforms, and not generally necessary
                      for Python extensions, which typically
                      export exactly one symbol:
                      init + extension_name.                       list of strings
depends               list of files that the extension depends
                      on                                           list of strings
language              extension language (i.e. 'c', 'c++',
                      'objc'). Will be detected from the source
                      extensions if not provided.                  string
optional              specifies that a build failure in the
                      extension should not abort the build
                      process, but simply skip the extension.      boolean
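For illustration, here is a sketch combining several of these arguments (the
package name, paths, and macros are invented for the example):
from distutils.core import setup
from distutils.extension import Extension

setup(name='foobar',
      version='1.0',
      ext_modules=[Extension('foopkg.foo', ['src/foo.c'],
                             include_dirs=['include'],
                             define_macros=[('NDEBUG', '1'),
                                            ('FOO', None)],  # -DNDEBUG=1 -DFOO
                             libraries=['m'],                # link against libm
                             optional=True)],                # skip on build failure
      )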
This module provides the abstract base class for the CCompiler
classes. A CCompiler instance can be used for all the compile and
link steps needed to build a single project. Methods are provided to set
options for the compiler — macro definitions, include directories, link path,
libraries and the like.
gen_lib_options(compiler, library_dirs, runtime_library_dirs, libraries)
Generate linker options for searching library directories and linking with
specific libraries. libraries and library_dirs are, respectively, lists of
library names (not filenames!) and search directories. Returns a list of
command-line options suitable for use with some compiler (depending on the two
format strings passed in).
gen_preprocess_options(macros, include_dirs)
Generate C pre-processor options (-D, -U, -I) as
used by at least two types of compilers: the typical Unix compiler and Visual
C++. macros is the usual thing, a list of 1- or 2-tuples, where (name,)
means undefine (-U) macro name, and (name, value) means define
(-D) macro name to value. include_dirs is just a list of
directory names to be added to the header file search path (-I).
Returns a list of command-line options suitable for either Unix compilers or
Visual C++.
get_default_compiler(osname, platform)
Determine the default compiler to use for the given platform.
osname should be one of the standard Python OS names (i.e. the ones returned
by os.name) and platform the common value returned by sys.platform for
the platform in question.
The default values are os.name and sys.platform in case the parameters
are not given.
new_compiler(plat=None, compiler=None, verbose=0, dry_run=0, force=0)
Factory function to generate an instance of some CCompiler subclass for the
supplied platform/compiler combination. plat defaults to os.name (eg.
'posix', 'nt'), and compiler defaults to the default compiler for
that platform. Currently only 'posix' and 'nt' are supported, and the
default compilers are “traditional Unix interface” (UnixCCompiler
class) and Visual C++ (MSVCCompiler class). Note that it’s perfectly
possible to ask for a Unix compiler object under Windows, and a Microsoft
compiler object under Unix—if you supply a value for compiler, plat is
ignored.
show_compilers()
Print the list of available compilers (used by the --help-compiler options
to build, build_ext, build_clib).
class distutils.ccompiler.CCompiler([verbose=0, dry_run=0, force=0])
The abstract base class CCompiler defines the interface that must be
implemented by real compiler classes. The class also has some utility methods
used by several compiler classes.
The basic idea behind a compiler abstraction class is that each instance can be
used for all the compile/link steps in building a single project. Thus,
attributes common to all of those compile and link steps — include
directories, macros to define, libraries to link against, etc. — are
attributes of the compiler instance. To allow for variability in how individual
files are treated, most of those attributes may be varied on a per-compilation
or per-link basis.
The constructor for each subclass creates an instance of the Compiler object.
Flags are verbose (show verbose output), dry_run (don’t actually execute the
steps) and force (rebuild everything, regardless of dependencies). All of
these flags default to 0 (off). Note that you probably don’t want to
instantiate CCompiler or one of its subclasses directly - use the
distutils.ccompiler.new_compiler() factory function instead.
The following methods allow you to manually alter compiler options for the
instance of the Compiler class.
add_include_dir(dir)
Add dir to the list of directories that will be searched for header files.
The compiler is instructed to search directories in the order in which they are
supplied by successive calls to add_include_dir().
set_include_dirs(dirs)
Set the list of directories that will be searched to dirs (a list of strings).
Overrides any preceding calls to add_include_dir(); subsequent calls to
add_include_dir() add to the list passed to set_include_dirs().
This does not affect any list of standard include directories that the compiler
may search by default.
add_library(libname)
Add libname to the list of libraries that will be included in all links driven
by this compiler object. Note that libname should not be the name of a
file containing a library, but the name of the library itself: the actual
filename will be inferred by the linker, the compiler, or the compiler class
(depending on the platform).
The linker will be instructed to link against libraries in the order they were
supplied to add_library() and/or set_libraries(). It is perfectly
valid to duplicate library names; the linker will be instructed to link against
libraries as many times as they are mentioned.
set_libraries(libnames)
Set the list of libraries to be included in all links driven by this compiler
object to libnames (a list of strings). This does not affect any standard
system libraries that the linker may include by default.
set_library_dirs(dirs)
Set the list of library search directories to dirs (a list of strings). This
does not affect any standard library search path that the linker may search by
default.
set_runtime_library_dirs(dirs)
Set the list of directories to search for shared libraries at runtime to dirs
(a list of strings). This does not affect any standard search path that the
runtime linker may search by default.
define_macro(name[, value])
Define a preprocessor macro for all compilations driven by this compiler object.
The optional parameter value should be a string; if it is not supplied, then
the macro will be defined without an explicit value and the exact outcome
depends on the compiler used.
undefine_macro(name)
Undefine a preprocessor macro for all compilations driven by this compiler
object. If the same macro is defined by define_macro() and
undefined by undefine_macro() the last call takes precedence
(including multiple redefinitions or undefinitions). If the macro is
redefined/undefined on a per-compilation basis (ie. in the call to
compile()), then that takes precedence.
add_link_object(object)
Add object to the list of object files (or analogues, such as explicitly named
library files or the output of “resource compilers”) to be included in every
link driven by this compiler object.
set_link_objects(objects)
Set the list of object files (or analogues) to be included in every link to
objects. This does not affect any standard object files that the linker may
include by default (such as system libraries).
The following methods provide autodetection of compiler options,
offering some functionality similar to GNU autoconf.
detect_language(sources)
Detect the language of a given file, or list of files. Uses the instance
attributes language_map (a dictionary), and language_order (a
list) to do the job.
find_library_file(dirs, lib[, debug=0])
Search the specified list of directories for a static or shared library file
lib and return the full path to that file. If debug is true, look for a
debugging version (if that makes sense on the current platform). Return
None if lib wasn’t found in any of the specified directories.
has_function(funcname[, includes=None, include_dirs=None, libraries=None, library_dirs=None])
Return a boolean indicating whether funcname is supported on the current
platform. The optional arguments can be used to augment the compilation
environment by providing additional include files and paths and libraries and
paths.
set_executables(**args)
Define the executables (and options for them) that will be run to perform the
various stages of compilation. The exact set of executables that may be
specified here depends on the compiler class (via the ‘executables’ class
attribute), but most will have:
attribute   description
compiler    the C/C++ compiler
linker_so   linker used to create shared objects and libraries
linker_exe  linker used to create binary executables
archiver    static library creator
On platforms with a command-line (Unix, DOS/Windows), each of these is a string
that will be split into executable name and (optional) list of arguments.
(Splitting the string is done similarly to how Unix shells operate: words are
delimited by spaces, but quotes and backslashes can override this. See
distutils.util.split_quoted().)
The following methods invoke stages in the build process.
compile(sources[, output_dir=None, macros=None, include_dirs=None, debug=0, extra_preargs=None, extra_postargs=None, depends=None])
Compile one or more source files. Generates object files (e.g. transforms a
.c file to a .o file.)
sources must be a list of filenames, most likely C/C++ files, but in reality
anything that can be handled by a particular compiler and compiler class (eg.
MSVCCompiler can handle resource files in sources). Return a list of
object filenames, one per source filename in sources. Depending on the
implementation, not all source files will necessarily be compiled, but all
corresponding object filenames will be returned.
If output_dir is given, object files will be put under it, while retaining
their original path component. That is, foo/bar.c normally compiles to
foo/bar.o (for a Unix implementation); if output_dir is build, then
it would compile to build/foo/bar.o.
macros, if given, must be a list of macro definitions. A macro definition is
either a (name,value) 2-tuple or a (name,) 1-tuple. The former defines
a macro; if the value is None, the macro is defined without an explicit
value. The 1-tuple case undefines a macro. Later
definitions/redefinitions/undefinitions take precedence.
include_dirs, if given, must be a list of strings, the directories to add to
the default include file search path for this compilation only.
debug is a boolean; if true, the compiler will be instructed to output debug
symbols in (or alongside) the object file(s).
extra_preargs and extra_postargs are implementation-dependent. On platforms
that have the notion of a command-line (e.g. Unix, DOS/Windows), they are most
likely lists of strings: extra command-line arguments to prepend/append to the
compiler command line. On other platforms, consult the implementation class
documentation. In any event, they are intended as an escape hatch for those
occasions when the abstract compiler framework doesn’t cut the mustard.
depends, if given, is a list of filenames that all targets depend on. If a
source file is older than any file in depends, then the source file will be
recompiled. This supports dependency tracking, but only at a coarse
granularity.
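A sketch of a typical compile() call, with hypothetical source files:

from distutils.ccompiler import new_compiler

cc = new_compiler()
objects = cc.compile(['foo.c', 'bar/baz.c'],
                     output_dir='build',
                     macros=[('NDEBUG', None),  # define NDEBUG, no value
                             ('DEBUG',)],       # undefine DEBUG
                     include_dirs=['include'])
# On a Unix implementation this would typically return
# ['build/foo.o', 'build/bar/baz.o'].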
Link a bunch of stuff together to create a static library file. The “bunch of
stuff” consists of the list of object files supplied as objects, the extra
object files supplied to add_link_object() and/or
set_link_objects(), the libraries supplied to add_library() and/or
set_libraries(), and the libraries supplied as libraries (if any).
output_libname should be a library name, not a filename; the filename will be
inferred from the library name. output_dir is the directory where the library
file will be put.
debug is a boolean; if true, debugging information will be included in the
library (note that on most platforms, it is the compile step where this matters:
the debug flag is included here just for consistency).
target_lang is the target language for which the given objects are being
compiled. This allows specific linkage time treatment of certain languages.
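Continuing in the same vein, a sketch that archives freshly compiled objects
into a static library (file names hypothetical):

from distutils.ccompiler import new_compiler

cc = new_compiler()
objects = cc.compile(['foo.c'], output_dir='build')
# The filename is inferred from the library name,
# e.g. build/libfoo.a on Unix.
cc.create_static_lib(objects, 'foo', output_dir='build')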
Link a bunch of stuff together to create an executable or shared library file.
The “bunch of stuff” consists of the list of object files supplied as objects.
output_filename should be a filename. If output_dir is supplied,
output_filename is relative to it (i.e. output_filename can provide
directory components if needed).
libraries is a list of libraries to link against. These are library names,
not filenames, since they’re translated into filenames in a platform-specific
way (eg. foo becomes libfoo.a on Unix and foo.lib on
DOS/Windows). However, they can include a directory component, which means the
linker will look in that specific directory rather than searching all the normal
locations.
library_dirs, if supplied, should be a list of directories to search for
libraries that were specified as bare library names (ie. no directory
component). These are on top of the system default and those supplied to
add_library_dir() and/or set_library_dirs(). runtime_library_dirs
is a list of directories that will be embedded into the shared library and used
to search for other shared libraries that *it* depends on at run-time. (This
may only be relevant on Unix.)
export_symbols is a list of symbols that the shared library will export.
(This appears to be relevant only on Windows.)
debug is as for compile() and create_static_lib(), with the
slight distinction that it actually matters on most platforms (as opposed to
create_static_lib(), which includes a debug flag mostly for form’s
sake).
extra_preargs and extra_postargs are as for compile() (except of
course that they supply command-line arguments for the particular linker being
used).
target_lang is the target language for which the given objects are being
compiled. This allows specific linkage time treatment of certain languages.
Link an executable. output_progname is the name of the file executable, while
objects are a list of object filenames to link in. Other arguments are as for
the link() method.
Link a shared library. output_libname is the name of the output library,
while objects is a list of object filenames to link in. Other arguments are
as for the link() method.
Link a shared object. output_filename is the name of the shared object that
will be created, while objects is a list of object filenames to link in.
Other arguments are as for the link() method.
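A sketch of linking a shared object from compiled objects (the source file
and library list are hypothetical):

from distutils.ccompiler import new_compiler

cc = new_compiler()
objects = cc.compile(['foo.c'], output_dir='build')
cc.link_shared_object(objects, 'build/foo.so',
                      libraries=['m'],                # link against libm
                      library_dirs=['/usr/local/lib'])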
Preprocess a single C/C++ source file, named in source. Output will be written
to file named output_file, or stdout if output_file not supplied.
macros is a list of macro definitions as for compile(), which will
augment the macros set with define_macro() and undefine_macro().
include_dirs is a list of directory names that will be added to the default
list, in the same way as add_include_dir().
Raises PreprocessError on failure.
The following utility methods are defined by the CCompiler class, for
use by the various concrete subclasses.
Returns the filename of the executable for the given basename. Typically for
non-Windows platforms this is the same as the basename, while Windows will get
a .exe added.
Returns the filename for the given library name on the current platform. On Unix
a library with lib_type of 'static' will typically be of the form
liblibname.a, while a lib_type of 'shared' will be of the form
liblibname.so.
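For example, on a typical Unix compiler class these helpers behave roughly as
follows:

from distutils.ccompiler import new_compiler

cc = new_compiler()
cc.executable_filename('app')                  # 'app' on Unix, 'app.exe' on Windows
cc.library_filename('foo', lib_type='static')  # e.g. 'libfoo.a' on Unix
cc.library_filename('foo', lib_type='shared')  # e.g. 'libfoo.so' on Unix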
Invokes distutils.util.execute(): this method invokes a Python function
func with the given arguments args, after logging and taking into account
the dry_run flag.
This module provides MSVCCompiler, an implementation of the abstract
CCompiler class for Microsoft Visual Studio. Typically, extension
modules need to be compiled with the same compiler that was used to compile
Python. For Python 2.3 and earlier, the compiler was Visual Studio 6. For Python
2.4 and 2.5, the compiler is Visual Studio .NET 2003. The AMD64 and Itanium
binaries are created using the Platform SDK.
MSVCCompiler will normally choose the right compiler, linker etc. on
its own. To override this choice, the environment variables DISTUTILS_USE_SDK
and MSSdk must be both set. MSSdk indicates that the current environment has
been setup by the SDK’s SetEnv.Cmd script, or that the environment variables
had been registered when the SDK was installed; DISTUTILS_USE_SDK indicates
that the distutils user has made an explicit choice to override the compiler
selection by MSVCCompiler.
This module provides the CygwinCCompiler class, a subclass of
UnixCCompiler that handles the Cygwin port of the GNU C compiler to
Windows. It also contains the Mingw32CCompiler class which handles the mingw32
port of GCC (same as cygwin in no-cygwin mode).
Create an archive file (eg. zip or tar). base_name is the name of
the file to create, minus any format-specific extension; format is the
archive format: one of zip, tar, ztar, or gztar. root_dir is
a directory that will be the root directory of the archive; ie. we typically
chdir into root_dir before creating the archive. base_dir is the
directory where we start archiving from; ie. base_dir will be the common
prefix of all files and directories in the archive. root_dir and base_dir
both default to the current directory. Returns the name of the archive file.
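A sketch, assuming a hypothetical dist/foo-1.0 directory holding the files
to ship:

from distutils.archive_util import make_archive

# Creates foo-1.0.tar.gz whose members all start with foo-1.0/.
make_archive('foo-1.0', 'gztar', root_dir='dist', base_dir='foo-1.0')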
Create an (optionally compressed) archive as a tar file from all files in and
under base_dir. compress must be 'gzip' (the default), 'compress',
'bzip2', or None. Both tar and the compression utility named
by compress must be on the default program search path, so this is probably
Unix-specific. The output tar file will be named base_dir.tar,
possibly plus the appropriate compression extension (.gz, .bz2
or .Z). Return the output filename.
Create a zip file from all files in and under base_dir. The output zip file
will be named base_name + .zip. Uses either the zipfile Python
module (if available) or the InfoZIP zip utility (if installed and
found on the default search path). If neither tool is available, raises
DistutilsExecError. Returns the name of the output zip file.
This module provides functions for performing simple, timestamp-based
dependency analysis on files and groups of files; also, functions based
entirely on such timestamp dependency analysis.
Return true if source exists and is more recently modified than target, or
if source exists and target doesn’t. Return false if both exist and target
is the same age or newer than source. Raise DistutilsFileError if
source does not exist.
Walk two filename lists in parallel, testing if each source is newer than its
corresponding target. Return a pair of lists (sources, targets) where
source is newer than target, according to the semantics of newer().
Return true if target is out-of-date with respect to any file listed in
sources. In other words, if target exists and is newer than every file in
sources, return false; otherwise return true. missing controls what we do
when a source file is missing; the default ('error') is to blow up with an
OSError from inside os.stat(); if it is 'ignore', we silently
drop any missing source files; if it is 'newer', any missing source files
make us assume that target is out-of-date (this is handy in “dry-run” mode:
it’ll make you pretend to carry out commands that wouldn’t work because inputs
are missing, but that doesn’t matter because you’re not actually going to run
the commands).
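A sketch of both dependency helpers, with hypothetical file names:

from distutils.dep_util import newer, newer_group

if newer('foo.c', 'foo.o'):
    print('foo.o is out of date')

# Treat a missing source as "newer", forcing a rebuild (handy in dry-run mode).
if newer_group(['foo.c', 'foo.h'], 'foo.o', missing='newer'):
    print('foo.o must be rebuilt')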
Create a directory and any missing ancestor directories. If the directory
already exists (or if name is the empty string, which means the current
directory, which of course exists), then do nothing. Raise
DistutilsFileError if unable to create some directory along the way (eg.
some sub-path exists, but is a file rather than a directory). If verbose is
true, print a one-line summary of each mkdir to stdout. Return the list of
directories actually created.
Create all the empty directories under base_dir needed to put files there.
base_dir is just the name of a directory which doesn’t necessarily exist
yet; files is a list of filenames to be interpreted relative to base_dir.
base_dir + the directory portion of every file in files will be created if
it doesn’t already exist. mode, verbose and dry_run flags are as for
mkpath().
Copy an entire directory tree src to a new location dst. Both src and
dst must be directory names. If src is not a directory, raise
DistutilsFileError. If dst does not exist, it is created with
mkpath(). The end result of the copy is that every file in src is
copied to dst, and directories under src are recursively copied to dst.
Return the list of files that were copied or might have been copied, using their
output name. The return value is unaffected by update or dry_run: it is
simply the list of all files under src, with the names changed to be under
dst.
preserve_mode and preserve_times are the same as for copy_file() in
distutils.file_util; note that they only apply to regular files, not to
directories. If preserve_symlinks is true, symlinks will be copied as
symlinks (on platforms that support them!); otherwise (the default), the
destination of the symlink will be copied. update and verbose are the same
as for copy_file().
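A sketch of the two most common directory operations (paths hypothetical):

from distutils.dir_util import mkpath, copy_tree

mkpath('build/pkg/data')                    # creates missing ancestors too
copied = copy_tree('src/pkg', 'build/pkg',  # list of destination file names
                   update=1)                # skip up-to-date files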
Recursively remove directory and all files and directories underneath it. Any
errors are ignored (apart from being reported to sys.stdout if verbose is
true).
Copy file src to dst. If dst is a directory, then src is copied there
with the same name; otherwise, it must be a filename. (If the file exists, it
will be ruthlessly clobbered.) If preserve_mode is true (the default), the
file’s mode (type and permission bits, or whatever is analogous on the
current platform) is copied. If preserve_times is true (the default), the
last-modified and last-access times are copied as well. If update is true,
src will only be copied if dst does not exist, or if dst does exist but
is older than src.
link allows you to make hard links (using os.link()) or symbolic links
(using os.symlink()) instead of copying: set it to 'hard' or
'sym'; if it is None (the default), files are copied. Don’t set link
on systems that don’t support it: copy_file() doesn’t check if hard or
symbolic linking is available. It uses _copy_file_contents() to copy file
contents.
Return a tuple (dest_name, copied): dest_name is the actual name of the
output file, and copied is true if the file was copied (or would have been
copied, if dry_run true).
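For instance, an update-only copy (file names hypothetical):

from distutils.file_util import copy_file

# Copies only if build/README.txt is missing or older than the source.
dest, copied = copy_file('README.txt', 'build', update=1)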
Move file src to dst. If dst is a directory, the file will be moved into
it with the same name; otherwise, src is just renamed to dst. Returns the
new full name of the file.
Warning
On Unix, cross-device moves are handled with copy_file(); the behaviour on
other systems is unspecified.
Return a string that identifies the current platform. This is used mainly to
distinguish platform-specific build directories and platform-specific built
distributions. Typically includes the OS name and version and the architecture
(as supplied by ‘os.uname()’), although the exact information included depends
on the OS; eg. for IRIX the architecture isn’t particularly important (IRIX only
runs on SGI hardware), but for Linux the kernel version isn’t particularly
important.
Examples of returned values:
linux-i586
linux-alpha
solaris-2.6-sun4u
irix-5.3
irix64-6.2
For non-POSIX platforms, currently just returns sys.platform.
For Mac OS X systems the OS version reflects the minimal version on which
binaries will run (that is, the value of MACOSX_DEPLOYMENT_TARGET
during the build of Python), not the OS version of the current system.
For universal binary builds on Mac OS X the architecture value reflects
the universal binary status instead of the architecture of the current
processor. For 32-bit universal binaries the architecture is fat,
for 64-bit universal binaries the architecture is fat64, and
for 4-way universal binaries the architecture is universal. Starting
from Python 2.7 and Python 3.2 the architecture fat3 is used for
a 3-way universal build (ppc, i386, x86_64) and intel is used for
a universal build with the i386 and x86_64 architectures.
Return ‘pathname’ as a name that will work on the native filesystem, i.e. split
it on ‘/’ and put it back together again using the current directory separator.
Needed because filenames in the setup script are always supplied in Unix style,
and have to be converted to the local convention before we can actually use them
in the filesystem. Raises ValueError on non-Unix-ish systems if
pathname either starts or ends with a slash.
Return pathname with new_root prepended. If pathname is relative, this is
equivalent to os.path.join(new_root, pathname). Otherwise, it requires making
pathname relative and then joining the two, which is tricky on DOS/Windows.
Ensure that ‘os.environ’ has all the environment variables we guarantee that
users can use in config files, command-line options, etc. Currently this
includes:
HOME - user’s home directory (Unix only)
PLAT - description of the current platform, including hardware and
OS (see get_platform())
Perform shell/Perl-style variable substitution on s. Every occurrence of
$ followed by a name is considered a variable, and the variable is substituted
by the value found in the local_vars dictionary, or in os.environ if it’s
not in local_vars. os.environ is first checked/augmented to guarantee that
it contains certain values: see check_environ(). Raise ValueError
for any variables not found in either local_vars or os.environ.
Note that this is not a fully-fledged string interpolation function. A valid
$variable can consist only of upper and lower case letters, numbers and an
underscore. No { } or ( ) style quoting is available.
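A small illustration (the $VERSION variable is hypothetical):

from distutils.util import subst_vars

subst_vars('lib/python$VERSION/site-packages', {'VERSION': '3.2'})
# -> 'lib/python3.2/site-packages'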
Generate a useful error message from an EnvironmentError (IOError
or OSError) exception object. Handles Python 1.5.1 and later styles,
and does what it can to deal with exception objects that don’t have a filename
(which happens when the error is due to a two-file operation, such as
rename() or link()). Returns the error message as a string
prefixed with prefix.
Split a string up according to Unix shell-like rules for quotes and backslashes.
In short: words are delimited by spaces, as long as those spaces are not escaped
by a backslash, or inside a quoted string. Single and double quotes are
equivalent, and the quote characters can be backslash-escaped. The backslash is
stripped from any two-character escape sequence, leaving only the escaped
character. The quote characters are stripped from any quoted string. Returns a
list of words.
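For example:

from distutils.util import split_quoted

split_quoted('gcc -DMSG="hello world" -O2')
# -> ['gcc', '-DMSG=hello world', '-O2']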
Perform some action that affects the outside world (for instance, writing to the
filesystem). Such actions are special because they are disabled by the
dry_run flag. This method takes care of all that bureaucracy for you; all
you have to do is supply the function to call and an argument tuple for it (to
embody the “external action” being performed), and an optional message to print.
Byte-compile a collection of Python source files to either .pyc or
.pyo files in the same directory. py_files is a list of files to
compile; any files that don’t end in .py are silently skipped.
optimize must be one of the following:
0 - don’t optimize (generate .pyc)
1 - normal optimization (like python -O)
2 - extra optimization (like python -OO)
If force is true, all files are recompiled regardless of timestamps.
The source filename encoded in each bytecode file defaults to the filenames
listed in py_files; you can modify these with prefix and basedir.
prefix is a string that will be stripped off of each source filename, and
base_dir is a directory name that will be prepended (after prefix is
stripped). You can supply either or both (or neither) of prefix and
base_dir, as you wish.
If dry_run is true, doesn’t actually do anything that would affect the
filesystem.
Byte-compilation is either done directly in this interpreter process with the
standard py_compile module, or indirectly by writing a temporary script
and executing it. Normally, you should let byte_compile() figure out to
use direct compilation or not (see the source for details). The direct flag
is used by the script generated in indirect mode; unless you know what you’re
doing, leave it set to None.
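A sketch, assuming a module built under build/lib that will eventually be
installed under a hypothetical site-packages path:

from distutils.util import byte_compile

byte_compile(['build/lib/foo.py'],
             optimize=0,
             prefix='build/lib/',   # stripped from the recorded source name
             base_dir='/usr/lib/python3.2/site-packages')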
Return a version of header escaped for inclusion in an RFC 822 header, by
ensuring there are 8 spaces after each newline. Note that it does no other
modification of the string.
Provides exceptions used by the Distutils modules. Note that Distutils modules
may raise standard exceptions; in particular, SystemExit is usually raised for
errors that are obviously the end-user’s fault (eg. bad command-line arguments).
This module is safe to use in from ... import * mode; it only exports
symbols whose names start with Distutils and end with Error.
This module provides a wrapper around the standard getopt module that
provides the following additional features:
short and long options are tied together
options have help strings, so fancy_getopt() could potentially create a
complete usage summary
options set attributes of a passed-in object
boolean options can have “negative aliases” — eg. if --quiet is
the “negative alias” of --verbose, then --quiet on the
command line sets verbose to false.
Wrapper function. options is a list of (long_option, short_option,
help_string) 3-tuples as described in the constructor for
FancyGetopt. negative_opt should be a dictionary mapping option names
to option names, both the key and value should be in the options list.
object is an object which will be used to store values (see the getopt()
method of the FancyGetopt class). args is the argument list. Will use
sys.argv[1:] if you pass None as args.
class distutils.fancy_getopt.FancyGetopt([option_table=None])
The option_table is a list of 3-tuples: (long_option, short_option, help_string).
If an option takes an argument, its long_option should have '=' appended;
short_option should just be a single character, no ':' in any case.
short_option should be None if a long_option doesn’t have a
corresponding short_option. All option tuples must have long options.
The FancyGetopt class provides the following methods:
Parse command-line options in args. Store as attributes on object.
If args is None or not supplied, uses sys.argv[1:]. If object is
None or not supplied, creates a new OptionDummy instance, stores
option values there, and returns a tuple (args,object). If object is
supplied, it is modified in place and getopt() just returns args; in
both cases, the returned args is a modified copy of the passed-in args list,
which is left untouched.
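A sketch of the class in action (the option table is hypothetical):

from distutils.fancy_getopt import FancyGetopt

options = [('verbose', 'v', 'run verbosely'),
           ('output=', 'o', 'write results to FILE')]  # '=' means takes a value
parser = FancyGetopt(options)
# No object supplied, so getopt() returns (remaining_args, OptionDummy).
args, opts = parser.getopt(['-v', '--output', 'out.txt', 'extra'])
print(args)         # ['extra']
print(opts.output)  # 'out.txt'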
This module provides the spawn() function, a front-end to various
platform-specific functions for launching another program in a sub-process.
Also provides find_executable() to search the path for a given executable
name.
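For example (the program name is hypothetical; spawn() raises
DistutilsExecError on failure):

from distutils.spawn import find_executable, spawn

print(find_executable('gcc'))  # e.g. '/usr/bin/gcc', or None if not found
spawn(['gcc', '--version'])    # run the command in a sub-process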
The distutils.sysconfig module provides access to Python’s low-level
configuration information. The specific configuration variables available
depend heavily on the platform and configuration. The specific variables depend
on the build process for the specific version of Python being run; the variables
are those found in the Makefile and configuration header that are
installed with Python on Unix systems. The configuration header is called
pyconfig.h for Python versions starting with 2.2, and config.h
for earlier versions of Python.
Some additional functions are provided which perform some useful manipulations
for other parts of the distutils package.
Return a set of variable definitions. If there are no arguments, this returns a
dictionary mapping names of configuration variables to values. If arguments are
provided, they should be strings, and the return value will be a sequence giving
the associated values. If a given name does not have a corresponding value,
None will be included for that variable.
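For example, on a Unix build one might see something like:

from distutils import sysconfig

sysconfig.get_config_vars('CC', 'SO')   # e.g. ['gcc', '.so']
all_vars = sysconfig.get_config_vars()  # dictionary of every variable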
Return the full path name of the configuration header. For Unix, this will be
the header generated by the configure script; for other platforms the
header will have been supplied directly by the Python source distribution. The
file is a platform-specific text file.
Return the full path name of the Makefile used to build Python. For
Unix, this will be a file generated by the configure script; the
meaning for other platforms will vary. The file is a platform-specific text
file, if it exists. This function is only useful on POSIX platforms.
Return the directory for either the general or platform-dependent C include
files. If plat_specific is true, the platform-dependent include directory is
returned; if false or omitted, the platform-independent directory is returned.
If prefix is given, it is used as either the prefix instead of
PREFIX, or as the exec-prefix instead of EXEC_PREFIX if
plat_specific is true.
Return the directory for either the general or platform-dependent library
installation. If plat_specific is true, the platform-dependent include
directory is returned; if false or omitted, the platform-independent directory
is returned. If prefix is given, it is used as either the prefix instead of
PREFIX, or as the exec-prefix instead of EXEC_PREFIX if
plat_specific is true. If standard_lib is true, the directory for the
standard library is returned rather than the directory for the installation of
third-party extensions.
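A quick illustration of the two directory queries (the returned paths vary by
installation):

from distutils.sysconfig import get_python_inc, get_python_lib

get_python_inc()                    # e.g. '/usr/include/python3.2'
get_python_lib(plat_specific=True)  # site-packages used for extension modules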
The following function is only intended for use within the distutils
package.
This function is only needed on Unix at this time, but should be called
consistently to support forward-compatibility. It inserts the information that
varies across Unix flavors and is stored in Python’s Makefile. This
information includes the selected compiler, compiler and linker options, and the
extension used by the linker for shared objects.
This function is even more special-purpose, and should only be used from
Python’s own build procedures.
Inform the distutils.sysconfig module that it is being used as part of
the build process for Python. This changes a lot of relative locations for
files, allowing them to be located in the build area rather than in an installed
Python.
This module provides the TextFile class, which gives an interface to
text files that (optionally) takes care of stripping comments, ignoring blank
lines, and joining lines with backslashes.
class distutils.text_file.TextFile([filename=None, file=None, **options])
This class provides a file-like object that takes care of all the things you
commonly want to do when processing a text file that has some line-by-line
syntax: strip comments (as long as # is your comment character), skip blank
lines, join adjacent lines by escaping the newline (ie. backslash at end of
line), strip leading and/or trailing whitespace. All of these are optional and
independently controllable.
The class provides a warn() method so you can generate warning messages
that report physical line number, even if the logical line in question spans
multiple physical lines. Also provides unreadline() for implementing
line-at-a-time lookahead.
TextFile instances are created with either filename, file, or both.
RuntimeError is raised if both are None. filename should be a
string, and file a file object (or something that provides readline()
and close() methods). It is recommended that you supply at least
filename, so that TextFile can include it in warning messages. If
file is not supplied, TextFile creates its own using the
open() built-in function.
The options are all boolean, and affect the values returned by readline():

strip_comments (default: true)
    Strip from '#' to end-of-line, as well as any whitespace leading up to
    the '#', unless the '#' is escaped by a backslash.
lstrip_ws (default: false)
    Strip leading whitespace from each line before returning it.
rstrip_ws (default: true)
    Strip trailing whitespace (including the line terminator!) from each
    line before returning it.
skip_blanks (default: true)
    Skip lines that are empty *after* stripping comments and whitespace.
    (If both lstrip_ws and rstrip_ws are false, then some lines may consist
    of solely whitespace: these will *not* be skipped, even if skip_blanks
    is true.)
join_lines (default: false)
    If a backslash is the last non-newline character on a line after
    stripping comments and whitespace, join the following line to it to
    form one logical line; if N consecutive lines end with a backslash,
    then N+1 physical lines will be joined to form one logical line.
collapse_join (default: false)
    Strip leading whitespace from lines that are joined to their
    predecessor; only matters if (join_lines and not lstrip_ws).
Note that since rstrip_ws can strip the trailing newline, the semantics of
readline() must differ from those of the built-in file object’s
readline() method! In particular, readline() returns None for
end-of-file: an empty string might just be a blank line (or an all-whitespace
line), if rstrip_ws is true but skip_blanks is not.
Print (to stderr) a warning message tied to the current logical line in the
current file. If the current logical line in the file spans multiple physical
lines, the warning refers to the whole range, such as "lines3-5". If
line is supplied, it overrides the current line number; it may be a list or
tuple to indicate a range of physical lines, or an integer for a single
physical line.
Read and return a single logical line from the current file (or from an internal
buffer if lines have previously been “unread” with unreadline()). If the
join_lines option is true, this may involve reading multiple physical lines
concatenated into a single string. Updates the current line number, so calling
warn() after readline() emits a warning about the physical line(s)
just read. Returns None on end-of-file, since the empty string can occur
if rstrip_ws is true but skip_blanks is not.
Push line (a string) onto an internal buffer that will be checked by future
readline() calls. Handy for implementing a parser with line-at-a-time
lookahead. Note that lines that are “unread” with unreadline() are not
subsequently re-cleansed (whitespace stripped, or whatever) when read with
readline(). If multiple calls are made to unreadline() before a call
to readline(), the lines will be returned in most-recently-unread-first order.
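A sketch of a typical read loop (the file name is hypothetical):

from distutils.text_file import TextFile

f = TextFile('setup.cfg', strip_comments=1, skip_blanks=1, join_lines=1)
while True:
    line = f.readline()
    if line is None:   # None, not '', signals end-of-file
        break
    print(line)
f.close()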
Abstract base class for defining command classes, the “worker bees” of the
Distutils. A useful analogy for command classes is to think of them as
subroutines with local variables called options. The options are declared
in initialize_options() and defined (given their final values) in
finalize_options(), both of which must be defined by every command
class. The distinction between the two is necessary because option values
might come from the outside world (command line, config file, ...), and any
options dependent on other options must be computed after these outside
influences have been processed — hence finalize_options(). The body
of the subroutine, where it does all its work based on the values of its
options, is the run() method, which must also be implemented by every
command class.
The class constructor takes a single argument dist, a Distribution
instance.
This section outlines the steps to create a new Distutils command.
A new command lives in a module in the distutils.command package. There
is a sample template in that directory called command_template. Copy
this file to a new module with the same name as the new command you’re
implementing. This module should implement a class with the same name as the
module (and the command). So, for instance, to create the command
peel_banana (so that users can run setup.py peel_banana), you’d copy
command_template to distutils/command/peel_banana.py, then edit
it so that it’s implementing the class peel_banana, a subclass of
distutils.cmd.Command.
Subclasses of Command must define the following methods.
Set default values for all the options that this command supports. Note that
these defaults may be overridden by other commands, by the setup script, by
config files, or by the command-line. Thus, this is not the place to code
dependencies between options; generally, initialize_options()
implementations are just a bunch of self.foo = None assignments.
Set final values for all the options that this command supports. This is
always called as late as possible, ie. after any option assignments from the
command-line or from other commands have been done. Thus, this is the place
to code option dependencies: if foo depends on bar, then it is safe to
set foo from bar as long as foo still has the same value it was
assigned in initialize_options().
A command’s raison d’etre: carry out the action it exists to perform, controlled
by the options initialized in initialize_options(), customized by other
commands, the setup script, the command-line, and config files, and finalized in
finalize_options(). All terminal output and filesystem interaction should
be done by run().
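A minimal sketch of the pattern, reusing the peel_banana example from above
with a hypothetical --flavor option:

from distutils.cmd import Command

class peel_banana(Command):
    description = 'peel a banana'
    user_options = [('flavor=', 'f', 'which flavor of banana to peel')]

    def initialize_options(self):
        self.flavor = None               # declare, but don't compute

    def finalize_options(self):
        if self.flavor is None:          # compute final values late
            self.flavor = 'plain'

    def run(self):
        self.announce('peeling a %s banana' % self.flavor)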
sub_commands formalizes the notion of a “family” of commands,
e.g. install as the parent with sub-commands install_lib,
install_headers, etc. The parent of a family of commands defines
sub_commands as a class attribute; it’s a list of 2-tuples
(command_name, predicate), where command_name is a string and predicate is
a function, a string or None. predicate is a method of the parent command that
determines whether the corresponding command is applicable in the current
situation. (E.g. install_headers is only applicable if we have any C
header files to install.) If predicate is None, that command is always
applicable.
sub_commands is usually defined at the end of a class, because
predicates can be methods of the class, so they must already have been
defined. The canonical example is the install command.
In most cases, the bdist_msi installer is a better choice than the
bdist_wininst installer, because it provides better support for
Win64 platforms, allows administrators to perform non-interactive
installations, and allows installation through group policies.
Alternative implementation of build_py which also runs the
2to3 conversion library on each .py file that is going to be
installed. To use this in a setup.py file for a distribution
that is designed to run with both Python 2.x and 3.x, add:
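try:
    from distutils.command.build_py import build_py_2to3 as build_py
except ImportError:
    from distutils.command.build_py import build_py

and then pass the selected class to setup() through its cmdclass argument:

cmdclass = {'build_py': build_py}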
The check command performs some tests on the meta-data of a package.
For example, it verifies that all required meta-data are provided as
the arguments passed to the setup() function.
This document describes the Python Distribution Utilities (“Distutils”) from the
end-user’s point-of-view, describing how to extend the capabilities of a
standard Python installation by building and installing third-party Python
modules and extensions.
Although Python’s extensive standard library covers many programming needs,
there often comes a time when you need to add some new functionality to your
Python installation in the form of third-party modules. This might be necessary
to support your own programming, or to support an application that you want to
use and that happens to be written in Python.
In the past, there has been little support for adding third-party modules to an
existing Python installation. With the introduction of the Python Distribution
Utilities (Distutils for short) in Python 2.0, this changed.
This document is aimed primarily at the people who need to install third-party
Python modules: end-users and system administrators who just need to get some
Python application running, and existing Python programmers who want to add some
new goodies to their toolbox. You don’t need to know Python to read this
document; there will be some brief forays into using Python’s interactive mode
to explore your installation, but that’s it. If you’re looking for information
on how to distribute your own Python modules so that others may use them, see
the Distributing Python Modules manual.
In the best case, someone will have prepared a special version of the module
distribution you want to install that is targeted specifically at your platform
and is installed just like any other software on your platform. For example,
the module developer might make an executable installer available for Windows
users, an RPM package for users of RPM-based Linux systems (Red Hat, SuSE,
Mandrake, and many others), a Debian package for users of Debian-based Linux
systems, and so forth.
In that case, you would download the installer appropriate to your platform and
do the obvious thing with it: run it if it’s an executable installer,
rpm --install it if it’s an RPM, etc. You don’t need to run Python or a setup
script, you don’t need to compile anything—you might not even need to read any
instructions (although it’s always a good idea to do so anyway).
Of course, things will not always be that easy. You might be interested in a
module distribution that doesn’t have an easy-to-use installer for your
platform. In that case, you’ll have to start with the source distribution
released by the module’s author/maintainer. Installing from a source
distribution is not too hard, as long as the modules are packaged in the
standard way. The bulk of this document is about building and installing
modules from standard source distributions.
If you download a module source distribution, you can tell pretty quickly if it
was packaged and distributed in the standard way, i.e. using the Distutils.
First, the distribution’s name and version number will be featured prominently
in the name of the downloaded archive, e.g. foo-1.0.tar.gz or
widget-0.9.7.zip. Next, the archive will unpack into a similarly-named
directory: foo-1.0 or widget-0.9.7. Additionally, the
distribution will contain a setup script setup.py, and a file named
README.txt or possibly just README, which should explain that
building and installing the module distribution is a simple matter of running
one command from a terminal:
python setup.py install
For Windows, this command should be run from a command prompt window (“DOS
box”):
setup.py install
If all these things are true, then you already know how to build and install the
modules you’ve just downloaded: Run the command above. Unless you need to
install things in a non-standard way or customize the build process, you don’t
really need this manual. Or rather, the above command is everything you need to
get out of this manual.
As described in section The new standard: Distutils, building and installing a module
distribution using the Distutils is usually one simple command to run from a
terminal:

python setup.py install
You should always run the setup command from the distribution root directory,
i.e. the top-level subdirectory that the module source distribution unpacks
into. For example, if you’ve just downloaded a module source distribution
foo-1.0.tar.gz onto a Unix system, the normal thing to do is:
gunzip -c foo-1.0.tar.gz | tar xf - # unpacks into directory foo-1.0
cd foo-1.0
python setup.py install
On Windows, you’d probably download foo-1.0.zip. If you downloaded the
archive file to C:\Temp, then it would unpack into
C:\Temp\foo-1.0; you can use either an archive manipulator with a
graphical user interface (such as WinZip) or a command-line tool (such as
unzip or pkunzip) to unpack the archive. Then, open a
command prompt window (“DOS box”), and run:

setup.py install

Running setup.py install builds and installs all modules in one run. If you
prefer to work incrementally—especially useful if you want to customize the
build process, or if things are going wrong—you can use the setup script to do
one thing at a time. This is particularly helpful when the build and install
will be done by different users—for example, you might want to build a module
distribution and hand it off to a system administrator for installation (or do
it yourself, with super-user privileges).
For example, you can build everything in one step, and then install everything
in a second step, by invoking the setup script twice:
python setup.py build
python setup.py install
If you do this, you will notice that running the install command
first runs the build command, which—in this case—quickly notices
that it has nothing to do, since everything in the build directory is
up-to-date.
You may not need this ability to break things down often if all you do is
install modules downloaded off the ‘net, but it’s very handy for more advanced
tasks. If you get into distributing your own Python modules and extensions,
you’ll run lots of individual Distutils commands on their own.
As implied above, the build command is responsible for putting the
files to install into a build directory. By default, this is build
under the distribution root; if you’re excessively concerned with speed, or want
to keep the source tree pristine, you can change the build directory with the
--build-base option. For example:

python setup.py build --build-base=/path/to/pybuild/foo-1.0
(Or you could do this permanently with a directive in your system or personal
Distutils configuration file; see section Distutils Configuration Files.) Normally, this
isn’t necessary.
The default layout for the build tree is as follows:
--- build/ --- lib/
or
--- build/ --- lib.<plat>/
               temp.<plat>/
where <plat> expands to a brief description of the current OS/hardware
platform and Python version. The first form, with just a lib directory,
is used for “pure module distributions”—that is, module distributions that
include only pure Python modules. If a module distribution contains any
extensions (modules written in C/C++), then the second form, with two <plat>
directories, is used. In that case, the temp.plat directory holds
temporary files generated by the compile/link process that don’t actually get
installed. In either case, the lib (or lib.plat) directory
contains all Python modules (pure Python and extensions) that will be installed.
In the future, more directories will be added to handle Python scripts,
documentation, binary executables, and whatever else is needed to handle the job
of installing Python modules and applications.
After the build command runs (whether you run it explicitly, or the
install command does it for you), the work of the install
command is relatively simple: all it has to do is copy everything under
build/lib (or build/lib.plat) to your chosen installation
directory.
If you don’t choose an installation directory—i.e., if you just run setup.py install—then the install command installs to the standard
location for third-party Python modules. This location varies by platform and
by how you built/installed Python itself. On Unix (and Mac OS X, which is also
Unix-based), it also depends on whether the module distribution being installed
is pure Python or contains extensions (“non-pure”):
Platform          Standard installation location             Default value                             Notes
Unix (pure)       prefix/lib/pythonX.Y/site-packages         /usr/local/lib/pythonX.Y/site-packages    (1)
Unix (non-pure)   exec-prefix/lib/pythonX.Y/site-packages    /usr/local/lib/pythonX.Y/site-packages    (1)
Windows           prefix\Lib\site-packages                   C:\PythonXY\Lib\site-packages             (2)
Notes:
Most Linux distributions include Python as a standard part of the system, so
prefix and exec-prefix are usually both /usr on
Linux. If you build Python yourself on Linux (or any Unix-like system), the
default prefix and exec-prefix are /usr/local.
The default installation directory on Windows was C:\Program Files\Python under Python 1.6a1, 1.5.2, and earlier.
prefix and exec-prefix stand for the directories that Python
is installed to, and where it finds its libraries at run-time. They are always
the same under Windows, and very often the same under Unix and Mac OS X. You
can find out what your Python installation uses for prefix and
exec-prefix by running Python in interactive mode and typing a few
simple commands. Under Unix, just type python at the shell prompt. Under
Windows, choose Start ‣ Programs ‣ Python X.Y ‣
Python (command line). Once the interpreter is started, you type Python code
at the prompt. For example, on my Linux system, I type the three Python
statements shown below, and get the output as shown, to find out my
prefix and exec-prefix:
Python 2.4 (#26, Aug 7 2004, 17:19:02)
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.prefix
'/usr'
>>> sys.exec_prefix
'/usr'
A few other placeholders are used in this document: X.Y stands for the
version of Python, for example 3.2; abiflags will be replaced by
the value of sys.abiflags or the empty string for platforms which don’t
define ABI flags; distname will be replaced by the name of the module
distribution being installed. Dots and capitalization are important in the
paths; for example, a value that uses python3.2 on UNIX will typically use
Python32 on Windows.
If you don’t want to install modules to the standard location, or if you don’t
have permission to write there, then you need to read about alternate
installations in section Alternate Installation. If you want to customize your
installation directories more heavily, see section Custom Installation on
custom installations.
Often, it is necessary or desirable to install modules to a location other than
the standard location for third-party Python modules. For example, on a Unix
system you might not have permission to write to the standard third-party module
directory. Or you might wish to try out a module before making it a standard
part of your local Python installation. This is especially true when upgrading
a distribution already present: you want to make sure your existing base of
scripts still works with the new version before actually upgrading.
The Distutils install command is designed to make installing module
distributions to an alternate location simple and painless. The basic idea is
that you supply a base directory for the installation, and the
install command picks a set of directories (called an installation
scheme) under this base directory in which to install files. The details
differ across platforms, so read whichever of the following sections applies to
you.
Note that the various alternate installation schemes are mutually exclusive: you
can pass --user, or --home, or --prefix and --exec-prefix, or
--install-base and --install-platbase, but you can’t mix from these
groups.
This scheme is designed to be the most convenient solution for users that don’t
have write permission to the global site-packages directory or don’t want to
install into it. It is enabled with a simple option:
python setup.py install --user
Files will be installed into subdirectories of site.USER_BASE (written
as userbase hereafter). This scheme installs pure Python modules and
extension modules in the same location (also known as site.USER_SITE).
Here are the values for UNIX, including Mac OS X:
Type of file   Installation directory
modules        userbase/lib/pythonX.Y/site-packages
scripts        userbase/bin
data           userbase
C headers      userbase/include/pythonX.Yabiflags/distname
And here are the values used on Windows:
Type of file   Installation directory
modules        userbase\PythonXY\site-packages
scripts        userbase\Scripts
data           userbase
C headers      userbase\PythonXY\Include\distname
The advantage of using this scheme compared to the other ones described below is
that the user site-packages directory is under normal conditions always included
in sys.path (see site for more information), which means that
there is no additional step to perform after running the setup.py script
to finalize the installation.
The build_ext command also has a --user option to add
userbase/include to the compiler search path for header files and
userbase/lib to the compiler search path for libraries as well as to
the runtime search path for shared C libraries (rpath).
The idea behind the “home scheme” is that you build and maintain a personal
stash of Python modules. This scheme’s name is derived from the idea of a
“home” directory on Unix, since it’s not unusual for a Unix user to make their
home directory have a layout similar to /usr/ or /usr/local/.
This scheme can be used by anyone, regardless of the operating system they
are installing for.
Installing a new module distribution is as simple as
python setup.py install --home=<dir>
where you can supply any directory you like for the --home option. On
Unix, lazy typists can just type a tilde (~); the install command
will expand this to your home directory:

python setup.py install --home=~
The “prefix scheme” is useful when you wish to use one Python installation to
perform the build/install (i.e., to run the setup script), but install modules
into the third-party module directory of a different Python installation (or
something that looks like a different Python installation). If this sounds a
trifle unusual, it is—that’s why the user and home schemes come before. However,
there are at least two known cases where the prefix scheme will be useful.
First, consider that many Linux distributions put Python in /usr, rather
than the more traditional /usr/local. This is entirely appropriate,
since in those cases Python is part of “the system” rather than a local add-on.
However, if you are installing Python modules from source, you probably want
them to go in /usr/local/lib/python2.X rather than
/usr/lib/python2.X. This can be done with

/usr/bin/python setup.py install --prefix=/usr/local
Another possibility is a network filesystem where the name used to write to a
remote directory is different from the name used to read it: for example, the
Python interpreter accessed as /usr/local/bin/python might search for
modules in /usr/local/lib/python2.X, but those modules would have to
be installed to, say, /mnt/@server/export/lib/python2.X. This could
be done with

/usr/local/bin/python setup.py install --prefix=/mnt/@server/export
In either case, the --prefix option defines the installation base, and
the --exec-prefix option defines the platform-specific installation
base, which is used for platform-specific files. (Currently, this just means
non-pure module distributions, but could be expanded to C libraries, binary
executables, etc.) If --exec-prefix is not supplied, it defaults to
--prefix. Files are installed as follows:
Type of file        Installation directory
Python modules      prefix/lib/pythonX.Y/site-packages
extension modules   exec-prefix/lib/pythonX.Y/site-packages
scripts             prefix/bin
data                prefix
C headers           prefix/include/pythonX.Yabiflags/distname
There is no requirement that --prefix or --exec-prefix
actually point to an alternate Python installation; if the directories listed
above do not already exist, they are created at installation time.
Incidentally, the real reason the prefix scheme is important is simply that a
standard Unix installation uses the prefix scheme, but with --prefix
and --exec-prefix supplied by Python itself as sys.prefix and
sys.exec_prefix. Thus, you might think you’ll never use the prefix scheme,
but every time you run python setup.py install without any other options,
you’re using it.
Note that installing extensions to an alternate Python installation has no
effect on how those extensions are built: in particular, the Python header files
(Python.h and friends) installed with the Python interpreter used to run
the setup script will be used in compiling extensions. It is your
responsibility to ensure that the interpreter used to run extensions installed
in this way is compatible with the interpreter used to build them. The best way
to do this is to ensure that the two interpreters are the same version of Python
(possibly different builds, or possibly copies of the same build). (Of course,
if your --prefix and --exec-prefix don’t even point to an
alternate Python installation, this is immaterial.)
Alternate installation: Windows (the prefix scheme)
Windows has no concept of a user’s home directory, and since the standard Python
installation under Windows is simpler than under Unix, the --prefix
option has traditionally been used to install additional packages in separate
locations on Windows. For example, you can enter:
python setup.py install --prefix="\Temp\Python"
to install modules to the \Temp\Python directory on the current drive.
The installation base is defined by the --prefix option; the
--exec-prefix option is not supported under Windows, which means that
pure Python modules and extension modules are installed into the same location.
Files are installed as follows:

Type of file                    Installation directory
modules (pure and extension)    prefix\Lib\site-packages
scripts                         prefix\Scripts
data                            prefix
C headers                       prefix\Include\distname
Sometimes, the alternate installation schemes described in section
Alternate Installation just don’t do what you want. You might want to tweak just
one or two directories while keeping everything under the same base directory,
or you might want to completely redefine the installation scheme. In either
case, you’re creating a custom installation scheme.
To create a custom installation scheme, you start with one of the alternate
schemes and override some of the installation directories used for the various
types of files, using these options:
Type of file        Override option
Python modules      --install-purelib
extension modules   --install-platlib
all modules         --install-lib
scripts             --install-scripts
data                --install-data
C headers           --install-headers
These override options can be relative, absolute,
or explicitly defined in terms of one of the installation base directories.
(There are two installation base directories, and they are normally the same—
they only differ when you use the Unix “prefix scheme” and supply different
--prefix and --exec-prefix options; using --install-lib will
override values computed or given for --install-purelib and
--install-platlib, and is recommended for schemes that don’t distinguish
between Python and extension modules.)
For example, say you’re installing a module distribution to your home directory
under Unix—but you want scripts to go in ~/scripts rather than
~/bin. As you might expect, you can override this directory with the
--install-scripts option; in this case, it makes most sense to supply
a relative path, which will be interpreted relative to the installation base
directory (your home directory, in this case):

python setup.py install --home=~ --install-scripts=scripts
Another Unix example: suppose your Python installation was built and installed
with a prefix of /usr/local/python, so under a standard installation
scripts will wind up in /usr/local/python/bin. If you want them in
/usr/local/bin instead, you would supply this absolute directory for the
--install-scripts option:

python setup.py install --install-scripts=/usr/local/bin
(This performs an installation using the “prefix scheme,” where the prefix is
whatever your Python interpreter was installed with— /usr/local/python
in this case.)
If you maintain Python on Windows, you might want third-party modules to live in
a subdirectory of prefix, rather than right in prefix
itself. This is almost as easy as customizing the script installation directory
—you just have to remember that there are two types of modules to worry about,
Python and extension modules, which can conveniently be both controlled by one
option:
python setup.py install --install-lib=Site
The specified installation directory is relative to prefix. Of
course, you also have to ensure that this directory is in Python’s module
search path, such as by putting a .pth file in a site directory (see
site). See section Modifying Python’s Search Path to find out how to modify
Python’s search path.
If you want to define an entire installation scheme, you just have to supply all
of the installation directory options. The recommended way to do this is to
supply relative paths; for example, if you want to maintain all Python
module-related files under python in your home directory, and you want a
separate directory for each platform that you use your home directory from, you
might define the following installation scheme:

python setup.py install --home=$HOME \
                        --install-purelib=python/lib \
                        --install-platlib=python/lib.$PLAT \
                        --install-scripts=python/scripts \
                        --install-data=python/data
$PLAT is not (necessarily) an environment variable—it will be expanded by
the Distutils as it parses your command line options, just as it does when
parsing your configuration file(s).
Obviously, specifying the entire installation scheme every time you install a
new module distribution would be very tedious. Thus, you can put these options
into your Distutils config file (see section Distutils Configuration Files):

[install]
install-base=$HOME
install-purelib=python/lib
install-platlib=python/lib.$PLAT
install-scripts=python/scripts
install-data=python/data

or, equivalently:

[install]
install-base=$HOME/python
install-purelib=lib
install-platlib=lib.$PLAT
install-scripts=scripts
install-data=data
Note that these two are not equivalent if you supply a different installation
base directory when you run the setup script. For example,
python setup.py install --install-base=/tmp
would install pure modules to /tmp/python/lib in the first case, and
to /tmp/lib in the second case. (For the second case, you probably
want to supply an installation base of /tmp/python.)
You probably noticed the use of $HOME and $PLAT in the sample
configuration file input. These are Distutils configuration variables, which
bear a strong resemblance to environment variables. In fact, you can use
environment variables in config files on platforms that have such a notion but
the Distutils additionally define a few extra variables that may not be in your
environment, such as $PLAT. (And of course, on systems that don’t have
environment variables, such as Mac OS 9, the configuration variables supplied by
the Distutils are the only ones you can use.) See section Distutils Configuration Files
for details.
When the Python interpreter executes an import statement, it searches
for both Python code and extension modules along a search path. A default value
for the path is configured into the Python binary when the interpreter is built.
You can determine the path by importing the sys module and printing the
value of sys.path.
$ python
Python 2.2 (#11, Oct 3 2002, 13:31:27)
[GCC 2.96 20000731 (Red Hat Linux 7.3 2.96-112)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path
['', '/usr/local/lib/python2.3', '/usr/local/lib/python2.3/plat-linux2',
'/usr/local/lib/python2.3/lib-tk', '/usr/local/lib/python2.3/lib-dynload',
'/usr/local/lib/python2.3/site-packages']
>>>
The null string in sys.path represents the current working directory.
The expected convention for locally installed packages is to put them in the
.../site-packages/ directory, but you may want to install Python
modules into some arbitrary directory. For example, your site may have a
convention of keeping all software related to the web server under /www.
Add-on Python modules might then belong in /www/python, and in order to
import them, this directory must be added to sys.path. There are several
different ways to add the directory.
The most convenient way is to add a path configuration file to a directory
that’s already on Python’s path, usually to the .../site-packages/
directory. Path configuration files have an extension of .pth, and each
line must contain a single path that will be appended to sys.path. (Because
the new paths are appended to sys.path, modules in the added directories
will not override standard modules. This means you can’t use this mechanism for
installing fixed versions of standard modules.)
Paths can be absolute or relative, in which case they’re relative to the
directory containing the .pth file. See the documentation of
the site module for more information.
A slightly less convenient way is to edit the site.py file in Python’s
standard library, and modify sys.path. site.py is automatically
imported when the Python interpreter is executed, unless the -S switch
is supplied to suppress this behaviour. So you could simply edit
site.py and add two lines to it:
import sys
sys.path.append('/www/python/')
However, if you reinstall the same major version of Python (perhaps when
upgrading from 2.2 to 2.2.2, for example) site.py will be overwritten by
the stock version. You’d have to remember that it was modified and save a copy
before doing the installation.
There are two environment variables that can modify sys.path.
PYTHONHOME sets an alternate value for the prefix of the Python
installation. For example, if PYTHONHOME is set to /www/python,
the search path will be set to ['', '/www/python/lib/pythonX.Y/', '/www/python/lib/pythonX.Y/plat-linux2', ...].
The PYTHONPATH variable can be set to a list of paths that will be
added to the beginning of sys.path. For example, if PYTHONPATH is
set to /www/python:/opt/py, the search path will begin with
['/www/python', '/opt/py']. (Note that directories must exist in order to
be added to sys.path; the site module removes paths that don’t
exist.)
Finally, sys.path is just a regular Python list, so any Python application
can modify it by adding or removing entries.
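For example, a minimal sketch (the directory name is hypothetical):

import sys
sys.path.insert(0, '/www/python')  # consult this directory first
print(sys.path[0])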
As mentioned above, you can use Distutils configuration files to record personal
or site preferences for any Distutils options. That is, any option to any
command can be stored in one of two or three (depending on your platform)
configuration files, which will be consulted before the command-line is parsed.
This means that configuration files will override default values, and the
command-line will in turn override configuration files. Furthermore, if
multiple configuration files apply, values from “earlier” files are overridden
by “later” files.
The names and locations of the configuration files vary slightly across
platforms. On Unix and Mac OS X, the three configuration files (in the order
they are processed) are:
Type of file   Location and filename                           Notes
system         prefix/lib/pythonver/distutils/distutils.cfg    (1)
personal       $HOME/.pydistutils.cfg                          (2)
local          setup.cfg                                       (3)
And on Windows, the configuration files are:
Type of file   Location and filename                 Notes
system         prefix\Lib\distutils\distutils.cfg    (4)
personal       %HOME%\pydistutils.cfg                (5)
local          setup.cfg                             (3)
On all platforms, the “personal” file can be temporarily disabled by
passing the --no-user-cfg option.
Notes:
(1) Strictly speaking, the system-wide configuration file lives in the directory
    where the Distutils are installed; under Python 1.6 and later on Unix, this is
    as shown. For Python 1.5.2, the Distutils will normally be installed to
    prefix/lib/python1.5/site-packages/distutils, so the system
    configuration file should be put there under Python 1.5.2.
(2) On Unix, if the HOME environment variable is not defined, the user’s
    home directory will be determined with the getpwuid() function from the
    standard pwd module. This is done by the os.path.expanduser()
    function used by Distutils.
(3) I.e., in the current directory (usually the location of the setup script).
(4) (See also note (1).) Under Python 1.6 and later, Python’s default “installation
    prefix” is C:\Python, so the system configuration file is normally
    C:\Python\Lib\distutils\distutils.cfg. Under Python 1.5.2, the
    default prefix was C:\Program Files\Python, and the Distutils were not
    part of the standard library, so the system configuration file would be
    C:\Program Files\Python\distutils\distutils.cfg in a standard Python
    1.5.2 installation under Windows.
(5) On Windows, if the HOME environment variable is not defined,
    USERPROFILE then HOMEDRIVE and HOMEPATH will
    be tried. This is done by the os.path.expanduser() function used
    by Distutils.
The Distutils configuration files all have the same syntax. The config files
are grouped into sections. There is one section for each Distutils command,
plus a global section for global options that affect every command. Each
section consists of one option per line, specified as option=value.
For example, the following is a complete config file that just forces all
commands to run quietly by default:
[global]
verbose=0
If this is installed as the system config file, it will affect all processing of
any Python module distribution by any user on the current system. If it is
installed as your personal config file (on systems that support them), it will
affect only module distributions processed by you. And if it is used as the
setup.cfg for a particular module distribution, it affects only that
distribution.
You could override the default “build base” directory and make the
build* commands always forcibly rebuild all files with the
following:
[build]
build-base=blib
force=1
which corresponds to the command-line arguments
python setup.py build --build-base=blib --force
except that including the build command on the command-line means
that command will be run. Including a particular command in config files has no
such implication; it only means that if the command is run, the options in the
config file will apply. (Or if other commands that derive values from it are
run, they will use the values in the config file.)
You can find out the complete list of options for any command using the
--help option, e.g.:
python setup.py build --help
and you can find out the complete list of global options by using
--help without a command:
python setup.py --help
See also the “Reference” section of the “Distributing Python Modules” manual.
Whenever possible, the Distutils try to use the configuration information made
available by the Python interpreter used to run the setup.py script.
For example, the same compiler and linker flags used to compile Python will also
be used for compiling extensions. Usually this will work well, but in
complicated situations this might be inappropriate. This section discusses how
to override the usual Distutils behaviour.
Compiling a Python extension written in C or C++ will sometimes require
specifying custom flags for the compiler and linker in order to use a particular
library or produce a special kind of object code. This is especially true if the
extension hasn’t been tested on your platform, or if you’re trying to
cross-compile Python.
In the most general case, the extension author might have foreseen that
compiling the extensions would be complicated, and provided a Setup file
for you to edit. This will likely only be done if the module distribution
contains many separate extension modules, or if they often require elaborate
sets of compiler flags in order to work.
A Setup file, if present, is parsed in order to get a list of extensions
to build. Each line in a Setup describes a single module. Lines have
the following structure:

module ... [sourcefile ...] [cpparg ...] [library ...]
module is the name of the extension module to be built, and should be a
valid Python identifier. You can’t just change this in order to rename a module
(edits to the source code would also be needed), so this should be left alone.
sourcefile is anything that’s likely to be a source code file, at least
judging by the filename. Filenames ending in .c are assumed to be
written in C, filenames ending in .C, .cc, and .c++ are
assumed to be C++, and filenames ending in .m or .mm are assumed
to be in Objective C.
cpparg is an argument for the C preprocessor, and is anything starting with
-I, -D, -U or -C.
library is anything ending in .a or beginning with -l or
-L.
If a module requires a special library on your platform, you can
add it by editing the Setup file and running python setup.py build.
For example, if the module defined by the line
foo foomodule.c
must be linked with the math library libm.a on your platform, simply add
-lm to the line:
foo foomodule.c -lm
Arbitrary switches intended for the compiler or the linker can be supplied with
the -Xcompiler arg and -Xlinker arg options:

foo foomodule.c -Xcompiler -o32 -Xlinker -shared

The next option after -Xcompiler and -Xlinker will be
appended to the proper command line, so in the above example the compiler will
be passed the -o32 option, and the linker will be passed
-shared. If a compiler option requires an argument, you’ll have to
supply multiple -Xcompiler options; for example, to pass -x c++
the Setup file would have to contain -Xcompiler -x -Xcompiler c++.
Compiler flags can also be supplied through setting the CFLAGS
environment variable. If set, the contents of CFLAGS will be added to
the compiler flags specified in the Setup file.
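For example (the flags shown are illustrative only):

CFLAGS='-g -O0' python setup.py build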
This subsection describes the necessary steps to use Distutils with the Borland
C++ compiler version 5.5. First you have to know that Borland’s object file
format (OMF) is different from the format used by the Python version you can
download from the Python or ActiveState Web site. (Python is built with
Microsoft Visual C++, which uses COFF as the object file format.) For this
reason you have to convert Python’s library python25.lib into the
Borland format. You can do this as follows:
coff2omf python25.lib python25_bcpp.lib
The coff2omf program comes with the Borland compiler. The file
python25.lib is in the Libs directory of your Python
installation. If your extension uses other libraries (zlib, ...) you have to
convert them too.
The converted files have to reside in the same directories as the normal
libraries.
How does Distutils manage to use these libraries with their changed names? If
the extension needs a library (e.g. foo) Distutils checks first if it
finds a library with suffix _bcpp (e.g. foo_bcpp.lib) and then
uses this library. If it doesn’t find such a special library it uses
the default name (foo.lib). [1]
To let Distutils compile your extension with Borland C++ you now have to type:
python setup.py build --compiler=bcpp
If you want to use the Borland C++ compiler as the default, you could specify
this in your personal or system-wide configuration file for Distutils (see
section Distutils Configuration Files.)
This section describes the necessary steps to use Distutils with the GNU C/C++
compilers in their Cygwin and MinGW distributions. [2] For a Python interpreter
that was built with Cygwin, everything should work without any of these
following steps.
Not all extensions can be built with MinGW or Cygwin, but many can. Extensions
most likely to not work are those that use C++ or depend on Microsoft Visual C
extensions.
To let Distutils compile your extension with Cygwin you have to type:
python setup.py build --compiler=cygwin
and for Cygwin in no-cygwin mode [3] or for MinGW type:
python setup.py build --compiler=mingw32
If you want to use any of these options/compilers as default, you should
consider writing it in your personal or system-wide configuration file for
Distutils (see section Distutils Configuration Files.)
The following instructions only apply if you’re using a version of Python
older than 2.4.1 with a MinGW older than 3.0.0 (with
binutils-2.13.90-20030111-1).
These compilers require some special libraries. This task is more complex than
for Borland’s C++, because there is no program to convert the library. First
you have to create a list of symbols which the Python DLL exports. (You can find
a good program for this task at
http://www.emmestech.com/software/pexports-0.43/download_pexports.html).
pexports python25.dll >python25.def
The location of an installed python25.dll will depend on the
installation options and the version and language of Windows. In a “just for
me” installation, it will appear in the root of the installation directory. In
a shared installation, it will be located in the system directory.
Then you can create an import library for gcc from this information:

dlltool --dllname python25.dll --def python25.def --output-lib libpython25.a

The resulting library has to be placed in the same directory as
python25.lib. (Should be the libs directory under your Python
installation directory.)
If your extension uses other libraries (zlib,...) you might have to convert
them too. The converted files have to reside in the same directories as the
normal libraries do.
CPU
For “central processing unit.” Many style guides say this should be spelled
out on the first use (and if you must use it, do so!). For the Python
documentation, this abbreviation should be avoided since there’s no
reasonable way to predict which occurrence will be the first seen by the
reader. It is better to use the word “processor” instead.
POSIX
The name assigned to a particular group of standards. This is always
uppercase.
Python
The name of our favorite programming language is always capitalized.
Unicode
The name of a character set and matching encoding. This is always written
capitalized.
Unix
The name of the operating system developed at AT&T Bell Labs in the early
1970s.
* This is a bulleted list.
* It has two items, the second
item uses two lines.
1. This is a numbered list.
2. It has two items too.
#. This is a numbered list.
#. It has two items too.
Nested lists are possible, but be aware that they must be separated from
the parent list items by blank lines:
* this is
* a list
* with a nested list
* and some subitems
* and here the parent list continues
Definition lists are created as follows:
term (up to a line of text)
   Definition of the term, which must be indented

   and can even consist of multiple paragraphs

next term
   Description.
This is a normal text paragraph. The next paragraph is a code sample::

   It is not processed in any way, except
   that the indentation is removed.

   It can span multiple lines.
This is a normal text paragraph again.
The handling of the :: marker is smart:

If it occurs as a paragraph of its own, that paragraph is completely left
out of the document.
If it is preceded by whitespace, the marker is removed.
If it is preceded by non-whitespace, the marker is replaced by a single
colon.

That way, the example above would be rendered as “The next paragraph is a
code sample:”.
There are some problems one commonly runs into while authoring reST documents:
Separation of inline markup: As said above, inline markup spans must be
separated from the surrounding text by non-word characters; you have to use
an escaped space to get around that.
:mod:`parrot` -- Dead parrot access
===================================

.. module:: parrot
   :platform: Unix, Windows
   :synopsis: Analyze and reanimate dead parrots.
.. moduleauthor:: Eric Cleese <eric@python.invalid>
.. moduleauthor:: John Idle <john@python.invalid>

.. _my-reference-label:

Section to cross-reference
--------------------------
This is the text of the section.
It refers to the section itself, see :ref:`my-reference-label`.
CPython implementation detail: This describes some implementation detail.
More explanation.
or:

.. impl-detail:: This shortly mentions an implementation detail.

“CPython implementation detail:” is automatically prepended to the
content.
seealso
Many sections include a list of references to other documents. These are
placed in a seealso directive. The seealso directive is typically placed
just before any subsections of a section. In the HTML output, it is shown
boxed off from the main flow of the text.
The content of the seealso directive should be a reST definition list.
For example:
.. seealso::

   Module :mod:`zipfile`
      Documentation of the :mod:`zipfile` standard module.

   `GNU tar manual, Basic Tar Format <http://link>`_
      Documentation for tar archive files, including GNU tar extensions.
The classdesc* and excclassdesc environments have been dropped; the
class and exception directives support classes documented with and without
constructor arguments.
Multiple objects
The equivalent of the ...line commands is:

.. function:: do_foo(bar)
              do_bar(baz)

   Description of the functions.
Python HOWTOs are documents that cover a single, specific topic,
and attempt to cover it fairly completely. Modelled on the Linux
Documentation Project’s HOWTO collection, this collection is an
effort to foster documentation that’s more detailed than the
Python Library Reference.
It’s usually difficult to get your management to accept open source software,
and Python is no exception to this rule. This document discusses reasons to use
Python, strategies for winning acceptance, facts and arguments you can use, and
cases where you shouldn’t try to use Python.
There are several reasons to incorporate a scripting language into your
development process, and this section will discuss them, and why Python has some
properties that make it a particularly good choice.
Programs are often organized in a modular fashion. Lower-level operations are
grouped together, and called by higher-level functions, which may in turn be
used as basic operations by still further upper levels.
For example, the lowest level might define a very low-level set of functions for
accessing a hash table. The next level might use hash tables to store the
headers of a mail message, mapping a header name like Date to a value such
as Tue, 13 May 1997 20:00:54 -0400. A yet higher level may operate on
message objects, without knowing or caring that message headers are stored in a
hash table, and so forth.
Often, the lowest levels do very simple things; they implement a data structure
such as a binary tree or hash table, or they perform some simple computation,
such as converting a date string to a number. The higher levels then contain
logic connecting these primitive operations. Using this approach, the primitives
can be seen as basic building blocks which are then glued together to produce
the complete product.
Why is this design approach relevant to Python? Because Python is well suited
to functioning as such a glue language. A common approach is to write a Python
module that implements the lower level operations; for the sake of speed, the
implementation might be in C, Java, or even Fortran. Once the primitives are
available to Python programs, the logic underlying higher level operations is
written in the form of Python code. The high-level logic is then more
understandable, and easier to modify.
John Ousterhout wrote a paper that explains this idea at greater length,
entitled “Scripting: Higher Level Programming for the 21st Century”. I
recommend that you read this paper; see the references for the URL. Ousterhout
is the inventor of the Tcl language, and therefore argues that Tcl should be
used for this purpose; he only briefly refers to other languages such as Python,
Perl, and Lisp/Scheme, but in reality, Ousterhout’s argument applies to
scripting languages in general, since you could equally write extensions for any
of the languages mentioned above.
In The Mythical Man-Month, Frederick Brooks suggests the following rule when
planning software projects: “Plan to throw one away; you will anyway.” Brooks
is saying that the first attempt at a software design often turns out to be
wrong; unless the problem is very simple or you’re an extremely good designer,
you’ll find that new requirements and features become apparent once development
has actually started. If these new requirements can’t be cleanly incorporated
into the program’s structure, you’re presented with two unpleasant choices:
hammer the new features into the program somehow, or scrap everything and write
a new version of the program, taking the new features into account from the
beginning.
Python provides you with a good environment for quickly developing an initial
prototype. That lets you get the overall program structure and logic right, and
you can fine-tune small details in the fast development cycle that Python
provides. Once you’re satisfied with the GUI interface or program output, you
can translate the Python code into C++, Fortran, Java, or some other compiled
language.
Prototyping means you have to be careful not to use too many Python features
that are hard to implement in your other language. Using eval(), or regular
expressions, or the pickle module, means that you’re going to need C or
Java libraries for formula evaluation, regular expressions, and serialization,
for example. But it’s not hard to avoid such tricky code, and in the end the
translation usually isn’t very difficult. The resulting code can be rapidly
debugged, because any serious logical errors will have been removed from the
prototype, leaving only more minor slip-ups in the translation to track down.
This strategy builds on the earlier discussion of programmability. Using Python
as glue to connect lower-level components has obvious relevance for constructing
prototype systems. In this way Python can help you with development, even if
end users never come in contact with Python code at all. If the performance of
the Python version is adequate and corporate politics allow it, you may not need
to do a translation into C or Java, but it can still be faster to develop a
prototype and then translate it, instead of attempting to produce the final
version immediately.
One example of this development strategy is Microsoft Merchant Server. Version
1.0 was written in pure Python, by a company that subsequently was purchased by
Microsoft. Version 2.0 began to translate the code into C++, shipping with some
C++ code and some Python code. Version 3.0 didn’t contain any Python at all; all
the code had been translated into C++. Even though the product doesn’t contain
a Python interpreter, the Python language has still served a useful purpose by
speeding up development.
This is a very common use for Python. Past conference papers have also
described this approach for developing high-level numerical algorithms; see
David M. Beazley and Peter S. Lomdahl’s paper “Feeding a Large-scale Physics
Application to Python” in the references for a good example. If an algorithm’s
basic operations are things like “Take the inverse of this 4000x4000 matrix”,
and are implemented in some lower-level language, then Python has almost no
additional performance cost; the extra time required for Python to evaluate an
expression like m.invert() is dwarfed by the cost of the actual computation.
It’s particularly good for applications where seemingly endless tweaking is
required to get things right. GUI interfaces and Web sites are prime examples.
The Python code is also shorter and faster to write (once you’re familiar with
Python), so it’s easier to throw it away if you decide your approach was wrong;
if you’d spent two weeks working on it instead of just two hours, you might
waste time trying to patch up what you’ve got out of a natural reluctance to
admit that those two weeks were wasted. Truthfully, those two weeks haven’t
been wasted, since you’ve learnt something about the problem and the technology
you’re using to solve it, but it’s human nature to view this as a failure of
some sort.
Python is definitely not a toy language that’s only usable for small tasks.
The language features are general and powerful enough to enable it to be used
for many different purposes. It’s useful at the small end, for 10- or 20-line
scripts, but it also scales up to larger systems that contain thousands of lines
of code.
However, this expressiveness doesn’t come at the cost of an obscure or tricky
syntax. While Python has some dark corners that can lead to obscure code, there
are relatively few such corners, and proper design can isolate their use to only
a few classes or modules. It’s certainly possible to write confusing code by
using too many features with too little concern for clarity, but most Python
code can look a lot like a slightly-formalized version of human-understandable
pseudocode.
In The New Hacker’s Dictionary, Eric S. Raymond gives the following definition
for “compact”:
Compact adj. Of a design, describes the valuable property that it can all be
apprehended at once in one’s head. This generally means the thing created from
the design can be used with greater facility and fewer errors than an equivalent
tool that is not compact. Compactness does not imply triviality or lack of
power; for example, C is compact and FORTRAN is not, but C is more powerful than
FORTRAN. Designs become non-compact through accreting features and cruft that
don’t merge cleanly into the overall design scheme (thus, some fans of Classic C
maintain that ANSI C is no longer compact).
In this sense of the word, Python is quite compact, because the language has
just a few ideas, which are used in lots of places. Take namespaces, for
example. Import a module with import math, and you create a new namespace
called math. Classes are also namespaces that share many of the properties
of modules, and have a few of their own; for example, you can create instances
of a class. Instances? They’re yet another namespace. Namespaces are currently
implemented as Python dictionaries, so they have the same methods as the
standard dictionary data type: .keys() returns all the keys, and so forth.
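A small sketch of the idea (all names here are made up):

import math                        # a module is a namespace

class Tracker(object):             # a class is a namespace, too
    count = 0

t = Tracker()                      # and so is each instance
t.count = 1

print(math.__dict__['pi'])         # the underlying dictionaries are visible
print(Tracker.__dict__['count'])
print(t.__dict__['count'])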
This simplicity arises from Python’s development history. The language syntax
derives from different sources; ABC, a relatively obscure teaching language, is
one primary influence, and Modula-3 is another. (For more information about ABC
and Modula-3, consult their respective Web sites at http://www.cwi.nl/~steven/abc/
and http://www.m3.org.) Other features have come from C, Icon,
Algol-68, and even Perl. Python hasn’t really innovated very much, but instead
has tried to keep the language small and easy to learn, building on ideas that
have been tried in other languages and found useful.
Simplicity is a virtue that should not be underestimated. It lets you learn the
language more quickly, and then rapidly write code – code that often works the
first time you run it.
If you’re working with Java, Jython (http://www.jython.org/) is definitely worth
your attention. Jython is a re-implementation of Python in Java that compiles
Python code into Java bytecodes. The resulting environment has very tight,
almost seamless, integration with Java. It’s trivial to access Java classes
from Python, and you can write Python classes that subclass Java classes.
Jython can be used for prototyping Java applications in much the same way
CPython is used, and it can also be used for test suites for Java code, or
embedded in a Java application to add scripting capabilities.
Let’s say that you’ve decided upon Python as the best choice for your
application. How can you convince your management, or your fellow developers,
to use Python? This section lists some common arguments against using Python,
and provides some possible rebuttals.
Python is freely available software that doesn’t cost anything. How good can
it be?
Very good, indeed. These days Linux and Apache, two other pieces of open source
software, are becoming more respected as alternatives to commercial software,
but Python hasn’t had all the publicity.
Python has been around for several years, with many users and developers.
Accordingly, the interpreter has been used by many people, and has gotten most
of the bugs shaken out of it. While bugs are still discovered at intervals,
they’re usually either quite obscure (they’d have to be, for no one to have run
into them before) or they involve interfaces to external libraries. The
internals of the language itself are quite stable.
Having the source code should be viewed as making the software available for
peer review; people can examine the code, suggest (and implement) improvements,
and track down bugs. To find out more about the idea of open source code, along
with arguments and case studies supporting it, go to http://www.opensource.org.
Who’s going to support it?
Python has a sizable community of developers, and the number is still growing.
The Internet community surrounding the language is an active one, and is worth
being considered another one of Python’s advantages. Most questions posted to
the comp.lang.python newsgroup are quickly answered by someone.
Should you need to dig into the source code, you’ll find it’s clear and
well-organized, so it’s not very difficult to write extensions and track down
bugs yourself. If you’d prefer to pay for support, there are companies and
individuals who offer commercial support for Python.
Who uses Python for serious work?
Lots of people; one interesting thing about Python is the surprising diversity
of applications that it’s been used for. People are using Python to:
Run Web sites
Write GUI interfaces
Control number-crunching code on supercomputers
Make a commercial application scriptable by embedding the Python interpreter
inside it
Process large XML data sets
Build test suites for C or Java code
Whatever your application domain is, there’s probably someone who’s used Python
for something similar. Yet, despite being usable for such high-end
applications, Python’s still simple enough to use for little jobs.
They’re practically nonexistent. Consult the Misc/COPYRIGHT file in the
source distribution, or the section History and License for the full
language, but it boils down to three conditions:
You have to leave the copyright notice on the software; if you don’t include
the source code in a product, you have to put the copyright notice in the
supporting documentation.
Don’t claim that the institutions that have developed Python endorse your
product in any way.
If something goes wrong, you can’t sue for damages. Practically all software
licenses contain this condition.
Notice that you don’t have to provide source code for anything that contains
Python or is built with it. Also, the Python interpreter and accompanying
documentation can be modified and redistributed in any way you like, and you
don’t have to pay anyone any licensing fees at all.
Why should we use an obscure language like Python instead of well-known
language X?
I hope this HOWTO, and the documents listed in the final section, will help
convince you that Python isn’t obscure, and has a healthily growing user base.
One word of advice: always present Python’s positive advantages, instead of
concentrating on language X’s failings. People want to know why a solution is
good, rather than why all the other solutions are bad. So instead of attacking
a competing solution on various grounds, simply show how Python’s virtues can
help.
John Ousterhout’s white paper on scripting is a good argument for the utility of
scripting languages, though naturally enough, he emphasizes Tcl, the language he
developed. Most of the arguments would apply to any scripting language.
The authors, David M. Beazley and Peter S. Lomdahl, describe their use of
Python at Los Alamos National Laboratory. It’s another good example of how
Python can help get real work done. This quotation from the paper has been
echoed by many people:
Originally developed as a large monolithic application for massively parallel
processing systems, we have used Python to transform our application into a
flexible, highly modular, and extremely powerful system for performing
simulation, data analysis, and visualization. In addition, we describe how
Python has solved a number of important problems related to the development,
debugging, deployment, and maintenance of scientific software.
This interview with Andy Feit, discussing Infoseek’s use of Python, can be used
to show that choosing Python didn’t introduce any difficulties into a company’s
development process, and provided some substantial benefits.
Management may be doubtful of the reliability and usefulness of software that
wasn’t written commercially. This site presents arguments that show how open
source software can have considerable advantages over closed-source software.
The Linux Advocacy mini-HOWTO was the inspiration for this document, and is also
well worth reading for general suggestions on winning acceptance for a new
technology, such as Linux or Python. In general, you won’t make much progress
by simply attacking existing systems and complaining about their inadequacies;
this often ends up looking like unfocused whining. It’s much better to point
out some of the many areas where Python is an improvement over other systems.
With Python 3 being the future of Python while Python 2 is still in active
use, it is good to have your project available for both major releases of
Python. This guide is meant to help you choose which strategy works best
for your project to support both Python 2 & 3 along with how to execute
that strategy.
When a project makes the decision that it’s time to support both Python 2 & 3,
a decision needs to be made as to how to go about accomplishing that goal.
The chosen strategy will depend on how large the project’s existing
codebase is and how much divergence you want between your Python 2 codebase
and your Python 3 one (e.g., starting a new version with Python 3).
If your project is brand-new or does not have a large codebase, then you may
want to consider writing/porting all of your code for Python 3
and use 3to2 to port your code for Python 2.
If you would prefer to maintain a codebase which is semantically and
syntactically compatible with Python 2 & 3 simultaneously, you can write
Python 2/3 Compatible Source. While this tends to lead to somewhat non-idiomatic
code, it does mean you keep a rapid development process as the developer.
Finally, you do have the option of using 2to3 to translate
Python 2 code into Python 3 code (with some manual help). This can take the
form of branching your code and using 2to3 to start a Python 3 branch. You can
also have users perform the translation at installation time automatically so
that you only have to maintain a Python 2 codebase.
Regardless of which approach you choose, porting is not as hard or
time-consuming as you might initially think. You can also tackle the problem
piecemeal, as a good portion of porting is simply updating your code to follow
current best practices in a Python 2/3 compatible way.
Regardless of what strategy you pick, there are a few things you should
consider.
One is to make sure you have a robust test suite. You need to make sure everything
continues to work, just like when you support a new minor version of Python.
This means making sure your test suite is thorough and is ported properly
between Python 2 & 3. You will also most likely want to use something like tox
to automate testing between both a Python 2 and Python 3 VM.
Two, once your project has Python 3 support, make sure to add the proper
classifier for the Cheeseshop (PyPI):

setup(
    name='Your Library',
    version='1.0',
    classifiers=[
        # make sure to use :: Python *and* :: Python :: 3 so
        # that pypi can list the package on the python 3 page
        'Programming Language :: Python',
        'Programming Language :: Python :: 3'
    ],
    packages=['yourlibrary'],
    # make sure to add custom_fixers to the MANIFEST.in
    include_package_data=True,
    # ...
)
Doing so will cause your project to show up in the
Python 3 packages list. You will know
you set the classifier properly as visiting your project page on the Cheeseshop
will show a Python 3 logo in the upper-left corner of the page.
Three, the six project provides a library which helps iron out differences
between Python 2 & 3. If you find there is a sticky point that is a continual
point of contention in your translation or maintenance of code, consider using
a source-compatible solution relying on six. If you have to create your own
Python 2/3 compatible solution, you can use sys.version_info[0] >= 3 as a
guard.
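For example, a minimal sketch of such a guard:

import sys

if sys.version_info[0] >= 3:
    text_type = str        # in Python 3, str is the text type
else:
    text_type = unicode    # in Python 2, unicode is the text type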
Four, read all the approaches. Just because some bit of advice applies to one
approach more than another doesn’t mean that some advice doesn’t apply to other
strategies.
Five, drop support for older Python versions if possible. Python 2.5
introduced a lot of useful syntax and libraries which have become idiomatic
in Python 3. Python 2.6 introduced future statements which make
compatibility much easier if you are going from Python 2 to 3.
Python 2.7 continues the trend in the stdlib. So choose the newest version
of Python which you believe can be your minimum support version
and work from there.
If you are starting a new project or your codebase is small enough, you may
want to consider writing your code for Python 3 and backporting to Python 2
using 3to2. Thanks to Python 3 being more strict about things than Python 2
(e.g., bytes vs. strings), the source translation can be easier and more
straightforward than from Python 2 to 3. Plus it gives you more direct
experience developing in Python 3 which, since it is the future of Python, is a
good thing long-term.
A drawback of this approach is that 3to2 is a third-party project. This means
that the Python core developers (and thus this guide) can make no promises
about how well 3to2 works at any time. There is nothing to suggest, though,
that 3to2 is not a high-quality project.
Included with Python since 2.6, the 2to3 tool (and lib2to3 module)
helps with porting Python 2 to Python 3 by performing various source
translations. This is a perfect solution for projects which wish to branch
their Python 3 code from their Python 2 codebase and maintain them as
independent codebases. You can even begin preparing to use this approach
today by writing future-compatible Python code which works cleanly in
Python 2 in conjunction with 2to3; all steps outlined below will work
with Python 2 code up to the point when the actual use of 2to3 occurs.
Use of 2to3 as an on-demand translation step at install time is also possible,
preventing the need to maintain a separate Python 3 codebase, but this approach
does come with some drawbacks. While users will only have to pay the
translation cost once at installation, you as a developer will need to pay the
cost regularly during development. If your codebase is sufficiently large
then the translation step ends up acting like a compilation step,
robbing you of the rapid development process you are used to with Python.
Obviously the time required to translate a project will vary, so do an
experimental translation just to see how long it takes, and then evaluate
whether you prefer this approach compared to using Python 2/3 Compatible
Source or simply keeping a separate Python 3 codebase.
Below are the typical steps taken by a project which uses a 2to3-based approach
to supporting Python 2 & 3.
As a first step, make sure that your project is compatible with Python 2.7.
This is just good to do as Python 2.7 is the last release of Python 2 and thus
will be used for a rather long time. It also allows for use of the -3 flag
to Python to help discover places in your code which 2to3 cannot handle but are
known to cause issues.
While not possible for all projects, if you can support Python 2.6 and newer
only, your life will be much easier. Various future statements, stdlib
additions, etc. exist only in Python 2.6 and later which greatly assist in
porting to Python 3. But if your project must keep support for Python 2.5 (or
even Python 2.4) then it is still possible to port to Python 3.
Below are the benefits you gain if you only have to support Python 2.6 and
newer. Some of these options are personal choice while others are
strongly recommended (the ones that are more for personal choice are
labeled as such). If you continue to support older versions of Python then you
at least need to watch out for situations that these solutions fix.
from __future__ import print_function
This is a personal choice. 2to3 handles the translation from the print
statement to the print function rather well, so this is an optional step. This
future statement does help, though, with getting used to typing
print('Hello, World') instead of print 'Hello, World'.
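For example:

from __future__ import print_function

print('Hello, World')  # the function form now works the same in Python 2.6+ and 3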
from __future__ import unicode_literals
Another personal choice. You can always mark what you want to be a (unicode)
string with a u prefix to get the same effect. But regardless of whether
you use this future statement or not, you must make sure you know exactly
which Python 2 strings you want to be bytes, and which are to be text. This
means you should, at minimum, mark all strings that are meant to be text
strings with a u prefix if you do not use this future statement.
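A short sketch of the future-statement style (the literal values are arbitrary):

from __future__ import unicode_literals  # must appear at the top of the module

title = 'spam'      # unicode in Python 2 as well, thanks to the future statement
data = b'\x00\x01'  # still bytes; the b prefix is unaffected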
Bytes literals
This is a very important one. The ability to prefix Python 2 strings that
are meant to contain bytes with a b prefix helps to very clearly delineate
what is and is not a Python 3 string. When you run 2to3 on code, all Python 2
strings become Python 3 strings unless they are prefixed with b.
There are some differences between byte literals in Python 2 and those in
Python 3 thanks to the bytes type just being an alias to str in Python 2.
Probably the biggest “gotcha” is that indexing results in different values. In
Python 2, the value of b'py'[1] is 'y', while in Python 3 it’s 121.
You can avoid this disparity by always slicing at the size of a single element:
b'py'[1:2] is 'y' in Python 2 and b'y' in Python 3 (i.e., close
enough).
You cannot concatenate bytes and strings in Python 3. But since Python
2 has bytes aliased to str, it will succeed: b'a' + u'b' works in
Python 2, but b'a' + 'b' in Python 3 is a TypeError. A similar issue
also comes about when doing comparisons between bytes and strings.
from __future__ import absolute_import
Implicit relative imports (e.g., importing spam.bacon from within
spam.eggs with the statement import bacon) do not work in Python 3.
This future statement moves away from that and allows the use of explicit
relative imports (e.g., from . import bacon).
In Python 2.5 you must use
the __future__ statement to be able to use explicit relative imports and to
prevent implicit ones. In Python 2.6 explicit relative imports are available
without the statement, but you still want the __future__ statement to prevent
implicit relative imports. In Python 2.7 the __future__ statement is not
needed. In other words, unless you are only supporting Python 2.7 or a version
earlier than Python 2.5, use the __future__ statement.
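For example, inside a hypothetical spam/eggs.py:

from __future__ import absolute_import

from . import bacon  # explicit relative import of spam.bacon
import string        # now guaranteed to be the top-level stdlib module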
There are a few things that just consistently come up as sticking points for
people which 2to3 cannot handle automatically or can easily be done in Python 2
to help modernize your code.
from __future__ import division
While the exact same outcome can be had by using the -Qnew argument to
Python, using this future statement lifts the requirement that your users use
the flag to get the expected behavior of division in Python 3
(e.g., 1 / 2 == 0.5; 1 // 2 == 0).
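For example:

from __future__ import division

print(1 / 2)   # 0.5 (true division, as in Python 3)
print(1 // 2)  # 0 (floor division)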
Unless you have been working on Windows, there is a chance you have not always
bothered to add the b mode when opening a binary file (e.g., rb for
binary reading). Under Python 3, binary files and text files are clearly
distinct and mutually incompatible; see the io module for details.
Therefore, you must decide whether a file will be used for
binary access (allowing bytes data to be read and/or written) or text access
(allowing unicode data to be read and/or written).
Text files created using open() under Python 2 return byte strings,
while under Python 3 they return unicode strings. Depending on your porting
strategy, this can be an issue.
If you want text files to return unicode strings in Python 2, you have two
possibilities:
Under Python 2.6 and higher, use io.open(). Since io.open()
is essentially the same function in both Python 2 and Python 3, it will
help iron out any issues that might arise.
If pre-2.6 compatibility is needed, then you should use codecs.open()
instead. This will make sure that you get back unicode strings in Python 2.
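A sketch of the io.open() option (the file name and encoding are illustrative):

import io

with io.open('notes.txt', encoding='utf-8') as f:
    text = f.read()  # unicode in Python 2, str in Python 3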
New-style classes have been around since Python 2.2. You need to make sure
you are subclassing from object to avoid odd edge cases involving method
resolution order, etc. This continues to be totally valid in Python 3 (although
unneeded as all classes implicitly inherit from object).
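For example:

class Account(object):  # explicitly subclass object so Python 2 uses a new-style class
    pass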
One of the biggest issues people have when porting code to Python 3 is handling
the bytes/string dichotomy. Because Python 2 allowed the str type to hold
textual data, people have over the years been rather loose in their delineation
of what str instances held text compared to bytes. In Python 3 you cannot
be so carefree anymore and need to properly handle the difference. The key to
handling this issue is to make sure that every string literal in your
Python 2 code is either syntactically or functionally marked as either bytes or
text data. After this is done you then need to make sure your APIs are designed
to either handle a specific type or are made to be properly polymorphic.
The first thing you must do is designate every single string literal in Python 2
as either textual or bytes data. If you are only supporting Python 2.6 or
newer, this can be accomplished by marking bytes literals with a b prefix
and then designating textual data with a u prefix or using the
unicode_literals future statement.
If your project supports versions of Python pre-dating 2.6, then you should use
the six project and its b() function to denote bytes literals. For text
literals you can either use six’s u() function or use a u prefix.
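A sketch using six (six is a third-party package and must be installed separately):

from six import b, u

data = b('raw bytes')  # bytes on both Python 2 and 3
text = u('some text')  # unicode on both Python 2 and 3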
In Python 2 it was very easy to accidentally create an API that accepted both
bytes and textual data. But in Python 3, thanks to the more strict handling of
disparate types, this loose usage of bytes and text together tends to fail.
Take the dict {b'a': 'bytes', u'a': 'text'} in Python 2.6. It creates the
dict {u'a': 'text'} since b'a' == u'a'. But in Python 3 the equivalent
dict creates {b'a': 'bytes', 'a': 'text'}, i.e., no lost data. Similar
issues can crop up when transitioning Python 2 code to Python 3.
This means you need to choose what an API is going to accept and create and
consistently stick to that API in both Python 2 and 3.
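For example, a minimal sketch of an API that commits to accepting only text:

def store_key(key):
    # Reject bytes explicitly so misuse fails the same way on Python 2 and 3.
    if isinstance(key, bytes):
        raise TypeError('store_key() expects text, not bytes')
    return key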
In Python 3, mixing bytes and unicode is forbidden in most situations; it
will raise a TypeError where Python 2 would have attempted an implicit
coercion between types. However, there is one case where it doesn’t and
it can be very misleading:
>>> b"" == ""
False
This is because an equality comparison is required by the language to always
succeed (and return False for incompatible types). However, this also
means that code incorrectly ported to Python 3 can display buggy behaviour
if such comparisons are silently executed. To detect such situations,
Python 3 has a -b flag that will display a warning:
$ python3 -b
>>> b"" == ""
__main__:1: BytesWarning: Comparison between bytes and string
False
To turn the warning into an exception, use the -bb flag instead:
$ python3 -bb
>>> b"" == ""
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
BytesWarning: Comparison between bytes and string
Another potentially surprising change is the indexing behaviour of bytes
objects in Python 3:
>>> b"xyz"[0]
120
Indeed, Python 3 bytes objects (as well as bytearray objects)
are sequences of integers. But code converted from Python 2 will often
assume that indexing a bytestring produces another bytestring, not an
integer. To reconcile both behaviours, use slicing:
>>> b"xyz"[0:1]
b'x'
>>> n = 1
>>> b"xyz"[n:n+1]
b'y'
The only remaining gotcha is that an out-of-bounds slice returns an empty
bytes object instead of raising IndexError:
>>> b"xyz"[3]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: index out of range
>>> b"xyz"[3:4]
b''
In Python 2, objects can specify both a string and unicode representation of
themselves. In Python 3, though, there is only a string representation. This
becomes an issue as people can inadvertently do things in their __str__()
methods which have unpredictable results (e.g., infinite recursion if you
happen to use the unicode(self).encode('utf8') idiom as the body of your
__str__() method).
There are two ways to solve this issue. One is to use a custom 2to3 fixer. The
blog post at http://lucumr.pocoo.org/2011/1/22/forwards-compatible-python/
specifies how to do this. That will allow 2to3 to change all instances of
def __unicode__(self): ... to def __str__(self): .... This does require that
you define your __str__() method in Python 2 before your __unicode__()
method. The other option is to use a mixin class that defines __str__() in
terms of __unicode__() for you:
import sys
class UnicodeMixin(object):
"""Mixin class to handle defining the proper __str__/__unicode__
methods in Python 2 or 3."""
if sys.version_info[0] >= 3: # Python 3
def __str__(self):
return self.__unicode__()
else: # Python 2
def __str__(self):
return self.__unicode__().encode('utf8')
class Spam(UnicodeMixin):
def __unicode__(self):
return u'spam-spam-bacon-spam' # 2to3 will remove the 'u' prefix
In Python 2 you could index directly on an exception to get at the arguments
it was created with. But in Python 3, indexing directly on an exception is an
error. You need to make sure to only index on the BaseException.args
attribute, which is a sequence containing all arguments passed to the
__init__() method. Even better is to use the documented attributes the
exception provides.
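For example:

try:
    raise Exception('spam', 'eggs')
except Exception:
    import sys
    exc = sys.exc_info()[1]
    first = exc.args[0]  # portable; exc[0] fails on Python 3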
2to3 will attempt to generate fixes for doctests that it comes across. It’s
not perfect, though. If you wrote a monolithic set of doctests (e.g., a single
docstring containing all of your doctests), you should at least consider
breaking the doctests up into smaller pieces to make it more manageable to fix.
Otherwise it might very well be worth your time and effort to port your tests
to unittest.
When you run your application’s test suite, run it using the -3 flag passed
to Python. This will cause various warnings to be raised during execution about
things that 2to3 cannot handle automatically (e.g., modules that have been
removed). Try to eliminate those warnings to make your code even more portable
to Python 3.
To manually convert source code using 2to3, you use the 2to3 script that
is installed with Python 2.6 and later:
2to3 <directory or file to convert>
This will cause 2to3 to write out a diff with all of the fixers applied for the
converted source code. If you would like 2to3 to go ahead and apply the changes
you can pass it the -w flag:
2to3 -w <stuff to convert>
There are other flags available to control exactly which fixers are applied,
etc.
When a user installs your project for Python 3, you can have either
distutils or Distribute run 2to3 on your behalf.
For distutils, use the following idiom:
try:  # Python 3
    from distutils.command.build_py import build_py_2to3 as build_py
except ImportError:  # Python 2
    from distutils.command.build_py import build_py

setup(cmdclass={'build_py': build_py},
      # ...
)
For Distribute:
setup(use_2to3=True,
      # ...
)
This will allow you to not have to distribute a separate Python 3 version of
your project. It does require, though, that when you perform development that
you at least build your project and use the built Python 3 source for testing.
At this point you should (hopefully) have your project converted in such a way
that it works in Python 3. Verify it by running your unit tests and making sure
nothing has gone awry. If you miss something then figure out how to fix it in
Python 3, backport to your Python 2 code, and run your code through 2to3 again
to verify the fix transforms properly.
While it may seem counter-intuitive, you can write Python code which is
source-compatible between Python 2 & 3. It does lead to code that is not
entirely idiomatic Python (e.g., having to extract the currently raised
exception from sys.exc_info()[1]), but it can be run under Python 2
and Python 3 without using 2to3 as a translation step (although the tool
should be used to help find potential portability problems). This allows you to
continue to have a rapid development process regardless of whether you are
developing under Python 2 or Python 3. Whether this approach or using
Python 2 and 2to3 works best for you will be a per-project decision.
All of the steps outlined in how to
port Python 2 code with 2to3 apply
to creating a Python 2/3 codebase. This includes trying to support only Python
2.6 or newer (the __future__ statements work in Python 3 without issue),
eliminating warnings that are triggered by -3, etc.
You should even consider running 2to3 over your code (without committing the
changes). This will let you know where potential pain points are within your
code so that you can fix them properly before they become an issue.
The six project contains many things to help you write portable Python code.
You should make sure to read its documentation from beginning to end and use
any and all features it provides. That way you will minimize any mistakes you
might make in writing cross-version code.
One change between Python 2 and 3 that will require changing how you code (if
you support Python 2.5 and earlier) is
accessing the currently raised exception. In Python 2.5 and earlier the syntax
to access the current exception is:
try:
raise Exception()
except Exception, exc:
# Current exception is 'exc'
pass
This syntax changed in Python 3 (and backported to Python 2.6 and later)
to:
try:
raise Exception()
except Exception as exc:
# Current exception is 'exc'
# In Python 3, 'exc' is restricted to the block; Python 2.6 will "leak"
pass
Because of this syntax change, you must change how you capture the current
exception to:
try:
raise Exception()
except Exception:
import sys
exc = sys.exc_info()[1]
# Current exception is 'exc'
pass
You can get more information about the raised exception from
sys.exc_info() than simply the current exception instance, but you most
likely don’t need it.
Note
In Python 3, the traceback is attached to the exception instance
through the __traceback__ attribute. If the instance is saved in
a local variable that persists outside of the except block, the
traceback will create a reference cycle with the current frame and its
dictionary of local variables. This will delay reclaiming dead
resources until the next cyclic garbage collection pass.
In Python 2, this problem only occurs if you save the traceback itself
(e.g. the third element of the tuple returned by sys.exc_info())
in a variable.
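A sketch of defensively dropping the reference (the handling code is hypothetical):

import sys

try:
    raise Exception('boom')
except Exception:
    exc = sys.exc_info()[1]
    try:
        print(exc.args)
    finally:
        exc = None  # let Python 3's __traceback__ cycle be collected sooner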
The authors of the following blog posts, wiki pages, and books deserve special
thanks for making public their tips for porting Python 2 code to Python 3 (and
thus helping provide information for this document):
Although changing the C-API was not one of Python 3.0’s objectives, the many
Python level changes made leaving 2.x’s API intact impossible. In fact, some
changes such as int() and long() unification are more obvious on
the C level. This document endeavors to document incompatibilities and how
they can be worked around.
Python 3.0’s str() type (PyUnicode_* functions in C) is equivalent to
2.x’s unicode(). The old 8-bit string type (PyString_* in 2.x) has become
bytes() (PyBytes_*). Python 2.6 and later provide a compatibility header,
bytesobject.h, mapping PyBytes names to PyString ones. For best
compatibility with 3.0, PyUnicode should be used for textual data and
PyBytes for binary data. It’s also important to remember that
PyBytes and PyUnicode in 3.0 are not interchangeable like
PyString and PyUnicode are in 2.x. The following example
shows best practices with regards to PyUnicode, PyString,
and PyBytes.
#include "stdlib.h"
#include "Python.h"
#include "bytesobject.h"

/* text example */
static PyObject *
say_hello(PyObject *self, PyObject *args) {
    PyObject *name, *result;

    if (!PyArg_ParseTuple(args, "U:say_hello", &name))
        return NULL;

    result = PyUnicode_FromFormat("Hello, %S!", name);
    return result;
}

/* just a forward */
static char * do_encode(PyObject *);

/* bytes example */
static PyObject *
encode_object(PyObject *self, PyObject *args) {
    char *encoded;
    PyObject *result, *myobj;

    if (!PyArg_ParseTuple(args, "O:encode_object", &myobj))
        return NULL;

    encoded = do_encode(myobj);
    if (encoded == NULL)
        return NULL;
    result = PyBytes_FromString(encoded);
    free(encoded);
    return result;
}
In Python 3.0, there is only one integer type. It is called int() on the
Python level, but actually corresponds to 2.x’s long() type. In the
C-API, PyInt_* functions are replaced by their PyLong_* neighbors. The
best course of action here is using the PyInt_* functions aliased to
PyLong_* found in intobject.h. The abstract PyNumber_* APIs
can also be used in some cases.
Python 3.0 has a revamped extension module initialization system. (See
PEP 3121.) Instead of storing module state in globals, it should be stored
in an interpreter-specific structure. Creating modules that act correctly in
both 2.x and 3.0 is tricky; see PEP 3121 for a simple example of how to do it.
If you are writing a new extension module, you might consider Cython. It translates a Python-like language to C. The
extension modules it creates are compatible with Python 3.x and 2.x.
The curses library supplies a terminal-independent screen-painting and
keyboard-handling facility for text-based terminals; such terminals include
VT100s, the Linux console, and the simulated terminal provided by X11 programs
such as xterm and rxvt. Display terminals support various control codes to
perform common operations such as moving the cursor, scrolling the screen, and
erasing areas. Different terminals use widely differing codes, and often have
their own minor quirks.
In a world of X displays, one might ask “why bother”? It’s true that
character-cell display terminals are an obsolete technology, but there are
niches in which being able to do fancy things with them is still valuable. One
is on small-footprint or embedded Unixes that don’t carry an X server. Another
is for tools like OS installers and kernel configurators that may have to run
before X is available.
The curses library hides all the details of different terminals, and provides
the programmer with an abstraction of a display, containing multiple
non-overlapping windows. The contents of a window can be changed in various
ways (adding text, erasing it, changing its appearance) and the curses library
will automagically figure out what control codes need to be sent to the terminal
to produce the right output.
The curses library was originally written for BSD Unix; the later System V
versions of Unix from AT&T added many enhancements and new functions. BSD curses
is no longer maintained, having been replaced by ncurses, which is an
open-source implementation of the AT&T interface. If you’re using an
open-source Unix such as Linux or FreeBSD, your system almost certainly uses
ncurses. Since most current commercial Unix versions are based on System V
code, all the functions described here will probably be available. The older
versions of curses carried by some proprietary Unixes may not support
everything, though.
No one has made a Windows port of the curses module. On a Windows platform, try
the Console module written by Fredrik Lundh. The Console module provides
cursor-addressable text output, plus full support for mouse and keyboard input,
and is available from http://effbot.org/zone/console-index.htm.
The Python module is a fairly simple wrapper over the C functions provided by
curses; if you’re already familiar with curses programming in C, it’s really
easy to transfer that knowledge to Python. The biggest difference is that the
Python interface makes things simpler, by merging different C functions such as
addstr(), mvaddstr(), mvwaddstr(), into a single
addstr() method. You’ll see this covered in more detail later.
This HOWTO is simply an introduction to writing text-mode programs with curses
and Python. It doesn’t attempt to be a complete guide to the curses API; for
that, see the Python library guide’s section on ncurses, and the C manual pages
for ncurses. It will, however, give you the basic ideas.
Before doing anything, curses must be initialized. This is done by calling the
initscr() function, which will determine the terminal type, send any
required setup codes to the terminal, and create various internal data
structures. If successful, initscr() returns a window object representing
the entire screen; this is usually called stdscr, after the name of the
corresponding C variable.
import curses
stdscr = curses.initscr()
Usually curses applications turn off automatic echoing of keys to the screen, in
order to be able to read keys and only display them under certain circumstances.
This requires calling the noecho() function.
curses.noecho()
Applications will also commonly need to react to keys instantly, without
requiring the Enter key to be pressed; this is called cbreak mode, as opposed to
the usual buffered input mode.
curses.cbreak()
Terminals usually return special keys, such as the cursor keys or navigation
keys such as Page Up and Home, as a multibyte escape sequence. While you could
write your application to expect such sequences and process them accordingly,
curses can do it for you, returning a special value such as
curses.KEY_LEFT. To get curses to do the job, you’ll have to enable
keypad mode.
stdscr.keypad(1)
Terminating a curses application is much easier than starting one. You’ll need
to call
curses.nocbreak(); stdscr.keypad(0); curses.echo()
to reverse the curses-friendly terminal settings. Then call the endwin()
function to restore the terminal to its original operating mode.
curses.endwin()
A common problem when debugging a curses application is to get your terminal
messed up when the application dies without restoring the terminal to its
previous state. In Python this commonly happens when your code is buggy and
raises an uncaught exception. Keys are no longer echoed to the screen when
you type them, for example, which makes using the shell difficult.
In Python you can avoid these complications and make debugging much easier by
importing the module curses.wrapper. It supplies a wrapper()
function that takes a callable. It does the initializations described above,
and also initializes colors if color support is present. It then runs your
provided callable and finally deinitializes appropriately. The callable is
called inside a try-except clause which catches exceptions, performs curses
deinitialization, and then passes the exception upwards. Thus, your terminal
won’t be left in a funny state on exception.
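A minimal sketch of this pattern (the main() name is arbitrary):

from curses import wrapper

def main(stdscr):
    # wrapper() has already called initscr(), noecho(), and cbreak(),
    # and enabled keypad mode; it undoes all of this even if main() raises.
    stdscr.clear()
    stdscr.addstr(0, 0, "Hello from curses")
    stdscr.refresh()
    stdscr.getkey()

wrapper(main)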
Windows are the basic abstraction in curses. A window object represents a
rectangular area of the screen, and supports various methods to display text,
erase it, allow the user to input strings, and so forth.
The stdscr object returned by the initscr() function is a window
object that covers the entire screen. Many programs may need only this single
window, but you might wish to divide the screen into smaller windows, in order
to redraw or clear them separately. The newwin() function creates a new
window of a given size, returning the new window object.
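For example, a sketch that creates a 5-row, 40-column window whose top-left
corner is at screen coordinate (7, 20); note the y-first argument order
discussed below:

begin_x = 20; begin_y = 7
height = 5; width = 40
win = curses.newwin(height, width, begin_y, begin_x)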
A word about the coordinate system used in curses: coordinates are always passed
in the order y,x, and the top-left corner of a window is coordinate (0,0).
This breaks a common convention for handling coordinates, where the x
coordinate usually comes first. This is an unfortunate difference from most
other computer applications, but it’s been part of curses since it was first
written, and it’s too late to change things now.
When you call a method to display or erase text, the effect doesn’t immediately
show up on the display. This is because curses was originally written with slow
300-baud terminal connections in mind; with these terminals, minimizing the time
required to redraw the screen is very important. This lets curses accumulate
changes to the screen, and display them in the most efficient manner. For
example, if your program displays some characters in a window, and then clears
the window, there’s no need to send the original characters because they’d never
be visible.
Accordingly, curses requires that you explicitly tell it to redraw windows,
using the refresh() method of window objects. In practice, this doesn’t
really complicate programming with curses much. Most programs go into a flurry
of activity, and then pause waiting for a keypress or some other action on the
part of the user. All you have to do is to be sure that the screen has been
redrawn before pausing to wait for user input, by simply calling
stdscr.refresh() or the refresh() method of some other relevant
window.
A pad is a special case of a window; it can be larger than the actual display
screen, and only a portion of it displayed at a time. Creating a pad simply
requires the pad’s height and width, while refreshing a pad requires giving the
coordinates of the on-screen area where a subsection of the pad will be
displayed.
pad = curses.newpad(100, 100)
# These loops fill the pad with letters; this is
# explained in the next section
for y in range(0, 100):
    for x in range(0, 100):
        try:
            pad.addch(y, x, ord('a') + (x*x + y*y) % 26)
        except curses.error:
            pass

# Displays a section of the pad in the middle of the screen
pad.refresh(0, 0, 5, 5, 20, 75)
The refresh() call displays a section of the pad in the rectangle
extending from coordinate (5,5) to coordinate (20,75) on the screen; the upper
left corner of the displayed section is coordinate (0,0) on the pad. Beyond
that difference, pads are exactly like ordinary windows and support the same
methods.
If you have multiple windows and pads on screen there is a more efficient way to
go, which will prevent annoying screen flicker at refresh time. Use the
noutrefresh() method of each window to update the data structure
representing the desired state of the screen; then change the physical screen to
match the desired state in one go with the function doupdate(). The
normal refresh() method calls doupdate() as its last act.
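A sketch, assuming two windows win1 and win2 already exist:

win1.noutrefresh()    # update the virtual screen only
win2.noutrefresh()
curses.doupdate()     # repaint the physical screen in one go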
From a C programmer’s point of view, curses may sometimes look like a twisty
maze of functions, all subtly different. For example, addstr() displays a
string at the current cursor location in the stdscr window, while
mvaddstr() moves to a given y,x coordinate first before displaying the
string. waddstr() is just like addstr(), but allows specifying a
window to use, instead of using stdscr by default. mvwaddstr() follows
similarly.
Fortunately the Python interface hides all these details; stdscr is a window
object like any other, and methods like addstr() accept multiple argument
forms. Usually there are four different forms.
Form                       Description
str or ch                  Display the string str or character ch at
                           the current position
str or ch, attr            Display the string str or character ch,
                           using attribute attr, at the current
                           position
y, x, str or ch            Move to position y,x within the window, and
                           display str or ch
y, x, str or ch, attr      Move to position y,x within the window, and
                           display str or ch, using attribute attr
Attributes allow displaying text in highlighted forms, such as in boldface,
underline, reverse code, or in color. They’ll be explained in more detail in
the next subsection.
The addstr() function takes a Python string as the value to be displayed,
while the addch() functions take a character, which can be either a Python
string of length 1 or an integer. If it’s a string, you’re limited to
displaying characters between 0 and 255. SVr4 curses provides constants for
extension characters; these constants are integers greater than 255. For
example, ACS_PLMINUS is a +/- symbol, and ACS_ULCORNER is the
upper left corner of a box (handy for drawing borders).
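For example, a sketch drawing those two characters (after initscr() has run,
so the ACS_* constants are defined):

stdscr.addch(0, 0, curses.ACS_ULCORNER)
stdscr.addch(0, 1, curses.ACS_PLMINUS)
stdscr.refresh()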
Windows remember where the cursor was left after the last operation, so if you
leave out the y,x coordinates, the string or character will be displayed
wherever the last operation left off. You can also move the cursor with the
move(y,x) method. Because some terminals always display a flashing cursor,
you may want to ensure that the cursor is positioned in some location where it
won’t be distracting; it can be confusing to have the cursor blinking at some
apparently random location.
If your application doesn’t need a blinking cursor at all, you can call
curs_set(0) to make it invisible. Equivalently, and for compatibility with
older curses versions, there’s a leaveok(bool) function. When bool is
true, the curses library will attempt to suppress the flashing cursor, and you
won’t need to worry about leaving it in odd locations.
Characters can be displayed in different ways. Status lines in a text-based
application are commonly shown in reverse video; a text viewer may need to
highlight certain words. curses supports this by allowing you to specify an
attribute for each cell on the screen.
An attribute is an integer, each bit representing a different attribute. You can
try to display text with multiple attribute bits set, but curses doesn’t
guarantee that all the possible combinations are available, or that they’re all
visually distinct. That depends on the ability of the terminal being used, so
it’s safest to stick to the most commonly available attributes, listed here.
Attribute       Description
A_BLINK         Blinking text
A_BOLD          Extra bright or bold text
A_DIM           Half bright text
A_REVERSE       Reverse-video text
A_STANDOUT      The best highlighting mode available
A_UNDERLINE     Underlined text
So, to display a reverse-video status line on the top line of the screen, you
could code:
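stdscr.addstr(0, 0, "Current mode: Typing mode", curses.A_REVERSE)
stdscr.refresh()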
The curses library also supports color on those terminals that provide it. The
most common such terminal is probably the Linux console, followed by color
xterms.
To use color, you must call the start_color() function soon after calling
initscr(), to initialize the default color set (the
curses.wrapper.wrapper() function does this automatically). Once that’s
done, the has_colors() function returns TRUE if the terminal in use can
actually display color. (Note: curses uses the American spelling ‘color’,
instead of the Canadian/British spelling ‘colour’. If you’re used to the
British spelling, you’ll have to resign yourself to misspelling it for the sake
of these functions.)
The curses library maintains a finite number of color pairs, containing a
foreground (or text) color and a background color. You can get the attribute
value corresponding to a color pair with the color_pair() function; this
can be bitwise-OR’ed with other attributes such as A_REVERSE, but
again, such combinations are not guaranteed to work on all terminals.
An example, which displays a line of text using color pair 1:
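stdscr.addstr("Pretty text", curses.color_pair(1))
stdscr.refresh()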
As I said before, a color pair consists of a foreground and background color.
start_color() initializes 8 basic colors when it activates color mode.
They are: 0:black, 1:red, 2:green, 3:yellow, 4:blue, 5:magenta, 6:cyan, and
7:white. The curses module defines named constants for each of these colors:
curses.COLOR_BLACK, curses.COLOR_RED, and so forth.
The init_pair(n,f,b) function changes the definition of color pair n, to
foreground color f and background color b. Color pair 0 is hard-wired to white
on black, and cannot be changed.
Let’s put all this together. To change color 1 to red text on a white
background, you would call:
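curses.init_pair(1, curses.COLOR_RED, curses.COLOR_WHITE)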
When you change a color pair, any text already displayed using that color pair
will change to the new colors. You can also display new text in this color
with:
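stdscr.addstr(0, 0, "RED ALERT!", curses.color_pair(1))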
Very fancy terminals can change the definitions of the actual colors to a given
RGB value. This lets you change color 1, which is usually red, to purple or
blue or any other color you like. Unfortunately, the Linux console doesn’t
support this, so I’m unable to try it out, and can’t provide any examples. You
can check if your terminal can do this by calling can_change_color(),
which returns TRUE if the capability is there. If you’re lucky enough to have
such a talented terminal, consult your system’s man pages for more information.
The curses library itself offers only very simple input mechanisms. Python’s
support adds a text-input widget that makes up for some of the lack.
The most common way to get input to a window is to use its getch() method.
getch() pauses and waits for the user to hit a key, displaying it if
echo() has been called earlier. You can optionally specify a coordinate
to which the cursor should be moved before pausing.
It’s possible to change this behavior with the method nodelay(). After
nodelay(1), getch() for the window becomes non-blocking and returns
curses.ERR (a value of -1) when no input is ready. There’s also a
halfdelay() function, which can be used to (in effect) set a timer on each
getch(); if no input becomes available within a specified
delay (measured in tenths of a second), curses raises an exception.
The getch() method returns an integer; if it’s between 0 and 255, it
represents the ASCII code of the key pressed. Values greater than 255 are
special keys such as Page Up, Home, or the cursor keys. You can compare the
value returned to constants such as curses.KEY_PPAGE,
curses.KEY_HOME, or curses.KEY_LEFT. Usually the main loop of
your program will look something like this:
while True:
    c = stdscr.getch()
    if c == ord('p'):
        PrintDocument()
    elif c == ord('q'):
        break                      # Exit the while loop
    elif c == curses.KEY_HOME:
        x = y = 0
The curses.ascii module supplies ASCII class membership functions that
take either integer or 1-character-string arguments; these may be useful in
writing more readable tests for your command interpreters. It also supplies
conversion functions that take either integer or 1-character-string arguments
and return the same type. For example, curses.ascii.ctrl() returns the
control character corresponding to its argument.
There’s also a method to retrieve an entire string, getstr(). It isn’t
used very often, because its functionality is quite limited; the only editing
keys available are the backspace key and the Enter key, which terminates the
string. It can optionally be limited to a fixed number of characters.
curses.echo()            # Enable echoing of characters

# Get a 15-character string, with the cursor on the top line
s = stdscr.getstr(0, 0, 15)
The Python curses.textpad module supplies something better. With it, you
can turn a window into a text box that supports an Emacs-like set of
keybindings. Various methods of Textbox class support editing with
input validation and gathering the edit results either with or without trailing
spaces. See the library documentation on curses.textpad for the
details.
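A minimal sketch (the window size and position are illustrative):

import curses
from curses.textpad import Textbox

def gather_string(stdscr):
    win = curses.newwin(5, 60, 2, 1)   # a small editing window
    box = Textbox(win)
    box.edit()                         # let the user edit until Ctrl-G
    return box.gather()                # collect the contents as a string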
This HOWTO didn’t cover some advanced topics, such as screen-scraping or
capturing mouse events from an xterm instance. But the Python library page for
the curses module is now pretty complete. You should browse it next.
If you’re in doubt about the detailed behavior of any of the ncurses entry
points, consult the manual pages for your curses implementation, whether it’s
ncurses or a proprietary Unix vendor’s. The manual pages will document any
quirks, and provide complete lists of all the functions, attributes, and
ACS_* characters available to you.
Because the curses API is so large, some functions aren’t supported in the
Python interface, not because they’re difficult to implement, but because no one
has needed them yet. Feel free to add them and then submit a patch. Also, we
don’t yet have support for the menu library associated with
ncurses; feel free to add that.
If you write an interesting little program, feel free to contribute it as
another demo. We can always use more of them!
Defines descriptors, summarizes the protocol, and shows how descriptors are
called. Examines a custom descriptor and several built-in Python descriptors
including functions, properties, static methods, and class methods. Shows how
each works by giving a pure Python equivalent and a sample application.
Learning about descriptors not only provides access to a larger toolset, it
creates a deeper understanding of how Python works and an appreciation for the
elegance of its design.
In general, a descriptor is an object attribute with “binding behavior”, one
whose attribute access has been overridden by methods in the descriptor
protocol. Those methods are __get__(), __set__(), and
__delete__(). If any of those methods are defined for an object, it is
said to be a descriptor.
The default behavior for attribute access is to get, set, or delete the
attribute from an object’s dictionary. For instance, a.x has a lookup chain
starting with a.__dict__['x'], then type(a).__dict__['x'], and
continuing through the base classes of type(a) excluding metaclasses. If the
looked-up value is an object defining one of the descriptor methods, then Python
may override the default behavior and invoke the descriptor method instead.
Where this occurs in the precedence chain depends on which descriptor methods
were defined. Note that descriptors are only invoked for new style objects or
classes (a class is new style if it inherits from object or
type).
Descriptors are a powerful, general purpose protocol. They are the mechanism
behind properties, methods, static methods, class methods, and super().
They are used throughout Python itself to implement the new style classes
introduced in version 2.2. Descriptors simplify the underlying C-code and offer
a flexible set of new tools for everyday Python programs.
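The descriptor protocol is:

descr.__get__(self, obj, type=None) --> value
descr.__set__(self, obj, value) --> None
descr.__delete__(self, obj) --> None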
That is all there is to it. Define any of these methods and an object is
considered a descriptor and can override default behavior upon being looked up
as an attribute.
If an object defines both __get__() and __set__(), it is considered
a data descriptor. Descriptors that only define __get__() are called
non-data descriptors (they are typically used for methods but other uses are
possible).
Data and non-data descriptors differ in how overrides are calculated with
respect to entries in an instance’s dictionary. If an instance’s dictionary
has an entry with the same name as a data descriptor, the data descriptor
takes precedence. If an instance’s dictionary has an entry with the same
name as a non-data descriptor, the dictionary entry takes precedence.
To make a read-only data descriptor, define both __get__() and
__set__() with the __set__() raising an AttributeError when
called. Defining the __set__() method with an exception raising
placeholder is enough to make it a data descriptor.
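For instance, a minimal sketch (the class name and message are illustrative):

class ReadOnly(object):
    "A sketch of a read-only data descriptor."
    def __init__(self, value):
        self.value = value
    def __get__(self, obj, objtype=None):
        return self.value
    def __set__(self, obj, value):
        raise AttributeError("read-only attribute")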
A descriptor can be called directly by its method name. For example,
d.__get__(obj).
Alternatively, it is more common for a descriptor to be invoked automatically
upon attribute access. For example, obj.d looks up d in the dictionary
of obj. If d defines the method __get__(), then d.__get__(obj)
is invoked according to the precedence rules listed below.
The details of invocation depend on whether obj is an object or a class.
Either way, descriptors only work for new style objects and classes. A class is
new style if it is a subclass of object.
For objects, the machinery is in object.__getattribute__() which
transforms b.x into type(b).__dict__['x'].__get__(b,type(b)). The
implementation works through a precedence chain that gives data descriptors
priority over instance variables, instance variables priority over non-data
descriptors, and assigns lowest priority to __getattr__() if provided. The
full C implementation can be found in PyObject_GenericGetAttr() in
Objects/object.c.
For classes, the machinery is in type.__getattribute__() which transforms
B.x into B.__dict__['x'].__get__(None,B). In pure Python, it looks
like:
def __getattribute__(self, key):
    "Emulate type_getattro() in Objects/typeobject.c"
    v = object.__getattribute__(self, key)
    if hasattr(v, '__get__'):
        return v.__get__(None, self)
    return v
To summarize the precedence rules: data descriptors always override instance
dictionaries, while non-data descriptors may be overridden by instance
dictionaries.
The object returned by super() also has a custom __getattribute__()
method for invoking descriptors. The call super(B,obj).m() searches
obj.__class__.__mro__ for the base class A immediately following B
and then returns A.__dict__['m'].__get__(obj,A). If not a descriptor,
m is returned unchanged. If not in the dictionary, m reverts to a
search using object.__getattribute__().
Note, in Python 2.2, super(B,obj).m() would only invoke __get__() if
m was a data descriptor. In Python 2.3, non-data descriptors also get
invoked unless an old-style class is involved. The implementation details are
in super_getattro() in
Objects/typeobject.c
and a pure Python equivalent can be found in Guido’s Tutorial.
The details above show that the mechanism for descriptors is embedded in the
__getattribute__() methods for object, type, and
super(). Classes inherit this machinery when they derive from
object or if they have a meta-class providing similar functionality.
Likewise, classes can turn off descriptor invocation by overriding
__getattribute__().
The following code creates a class whose objects are data descriptors which
print a message for each get or set. Overriding __getattribute__() is an
alternate approach that could do this for every attribute. However, this
descriptor is useful for monitoring just a few chosen attributes:
class RevealAccess(object):
    """A data descriptor that sets and returns values
       normally and prints a message logging their access.
    """

    def __init__(self, initval=None, name='var'):
        self.val = initval
        self.name = name

    def __get__(self, obj, objtype):
        print('Retrieving', self.name)
        return self.val

    def __set__(self, obj, val):
        print('Updating', self.name)
        self.val = val
>>> class MyClass(object):
...     x = RevealAccess(10, 'var "x"')
...     y = 5
...
>>> m = MyClass()
>>> m.x
Retrieving var "x"
10
>>> m.x = 20
Updating var "x"
>>> m.x
Retrieving var "x"
20
>>> m.y
5
The protocol is simple and offers exciting possibilities. Several use cases are
so common that they have been packaged into individual function calls.
Properties, bound and unbound methods, static methods, and class methods are all
based on the descriptor protocol.
The documentation shows a typical use to define a managed attribute x:
class C(object):
    def getx(self): return self.__x
    def setx(self, value): self.__x = value
    def delx(self): del self.__x
    x = property(getx, setx, delx, "I'm the 'x' property.")
To see how property() is implemented in terms of the descriptor protocol,
here is a pure Python equivalent:
class Property(object):
    "Emulate PyProperty_Type() in Objects/descrobject.c"

    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        self.__doc__ = doc

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(obj)

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)

    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(obj)
The property() builtin helps whenever a user interface has granted
attribute access and then subsequent changes require the intervention of a
method.
For instance, a spreadsheet class may grant access to a cell value through
Cell('b10').value. Subsequent improvements to the program require the cell
to be recalculated on every access; however, the programmer does not want to
affect existing client code accessing the attribute directly. The solution is
to wrap access to the value attribute in a property data descriptor:
class Cell(object):
    ...
    def getvalue(self):
        "Recalculate the cell before returning its value"
        self.recalc()
        return self._value
    value = property(getvalue)
Python’s object oriented features are built upon a function based environment.
Using non-data descriptors, the two are merged seamlessly.
Class dictionaries store methods as functions. In a class definition, methods
are written using def and lambda, the usual tools for
creating functions. The only difference from regular functions is that the
first argument is reserved for the object instance. By Python convention, the
instance reference is called self but may be called this or any other
variable name.
To support method calls, functions include the __get__() method for
binding methods during attribute access. This means that all functions are
non-data descriptors which return bound or unbound methods depending on whether
they are invoked from an object or a class. In pure python, it works like
this:
class Function(object):
    ...
    def __get__(self, obj, objtype=None):
        "Simulate func_descr_get() in Objects/funcobject.c"
        return types.MethodType(self, obj, objtype)
Running the interpreter shows how the function descriptor works in practice:
>>> class D(object):
...     def f(self, x):
...         return x
...
>>> d = D()
>>> D.__dict__['f'] # Stored internally as a function
<function f at 0x00C45070>
>>> D.f # Get from a class becomes an unbound method
<unbound method D.f>
>>> d.f # Get from an instance becomes a bound method
<bound method D.f of <__main__.D object at 0x00B18C90>>
The output suggests that bound and unbound methods are two different types.
While they could have been implemented that way, the actual C implementation of
PyMethod_Type in
Objects/classobject.c
is a single object with two different representations depending on whether the
im_self field is set or is NULL (the C equivalent of None).
Likewise, the effects of calling a method object depend on the im_self
field. If set (meaning bound), the original function (stored in the
im_func field) is called as expected with the first argument set to the
instance. If unbound, all of the arguments are passed unchanged to the original
function. The actual C implementation of instancemethod_call() is only
slightly more complex in that it includes some type checking.
Non-data descriptors provide a simple mechanism for variations on the usual
patterns of binding functions into methods.
To recap, functions have a __get__() method so that they can be converted
to a method when accessed as attributes. The non-data descriptor transforms an
obj.f(*args) call into f(obj, *args). Calling klass.f(*args)
becomes f(*args).
This chart summarizes the binding and its two most useful variants:
Transformation    Called from an Object     Called from a Class
function          f(obj, *args)             f(*args)
staticmethod      f(*args)                  f(*args)
classmethod       f(type(obj), *args)       f(klass, *args)
Static methods return the underlying function without changes. Calling either
c.f or C.f is the equivalent of a direct lookup into
object.__getattribute__(c,"f") or object.__getattribute__(C,"f"). As a
result, the function becomes identically accessible from either an object or a
class.
Good candidates for static methods are methods that do not reference the
self variable.
For instance, a statistics package may include a container class for
experimental data. The class provides normal methods for computing the average,
mean, median, and other descriptive statistics that depend on the data. However,
there may be useful functions which are conceptually related but do not depend
on the data. For instance, erf(x) is a handy conversion routine that comes up
in statistical work but does not directly depend on a particular dataset.
It can be called either from an object or the class: s.erf(1.5) --> .9332 or
Sample.erf(1.5) --> .9332.
Since staticmethods return the underlying function with no changes, the example
calls are unexciting:
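>>> class E(object):
...     def f(x):              # illustrative; note there is no 'self'
...         print(x)
...     f = staticmethod(f)
...
>>> E.f(3)
3
>>> E().f(3)
3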
Using the non-data descriptor protocol, a pure Python version of
staticmethod() would look like this:
class StaticMethod(object):
    "Emulate PyStaticMethod_Type() in Objects/funcobject.c"

    def __init__(self, f):
        self.f = f

    def __get__(self, obj, objtype=None):
        return self.f
Unlike static methods, class methods prepend the class reference to the
argument list before calling the function. This format is the same whether
the caller is an object or a class:
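>>> class E(object):
...     def f(klass, x):       # illustrative example
...         return klass.__name__, x
...     f = classmethod(f)
...
>>> E.f(3)
('E', 3)
>>> E().f(3)
('E', 3)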
This behavior is useful whenever the function only needs to have a class
reference and does not care about any underlying data. One use for classmethods
is to create alternate class constructors. In Python 2.3, the classmethod
dict.fromkeys() creates a new dictionary from a list of keys. The pure
Python equivalent is:
class Dict:
    ...
    def fromkeys(klass, iterable, value=None):
        "Emulate dict_fromkeys() in Objects/dictobject.c"
        d = klass()
        for key in iterable:
            d[key] = value
        return d
    fromkeys = classmethod(fromkeys)
Now a new dictionary of unique keys can be constructed like this:
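Assuming Dict supports item assignment (elided above), something like:

>>> Dict.fromkeys('abracadabra')     # key order may vary
{'a': None, 'r': None, 'b': None, 'c': None, 'd': None}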
Using the non-data descriptor protocol, a pure Python version of
classmethod() would look like this:
class ClassMethod(object):
    "Emulate PyClassMethod_Type() in Objects/funcobject.c"

    def __init__(self, f):
        self.f = f

    def __get__(self, obj, klass=None):
        if klass is None:
            klass = type(obj)
        def newfunc(*args):
            return self.f(klass, *args)
        return newfunc
In this document, we’ll take a tour of Python’s features suitable for
implementing programs in a functional style. After an introduction to the
concepts of functional programming, we’ll look at language features such as
iterators and generators and relevant library modules such as
itertools and functools.
This section explains the basic concept of functional programming; if you’re
just interested in learning about Python language features, skip to the next
section.
Programming languages support decomposing problems in several different ways:
Most programming languages are procedural: programs are lists of
instructions that tell the computer what to do with the program’s input. C,
Pascal, and even Unix shells are procedural languages.
In declarative languages, you write a specification that describes the
problem to be solved, and the language implementation figures out how to
perform the computation efficiently. SQL is the declarative language you’re
most likely to be familiar with; a SQL query describes the data set you want
to retrieve, and the SQL engine decides whether to scan tables or use indexes,
which subclauses should be performed first, etc.
Object-oriented programs manipulate collections of objects. Objects have
internal state and support methods that query or modify this internal state in
some way. Smalltalk and Java are object-oriented languages. C++ and Python
are languages that support object-oriented programming, but don’t force the
use of object-oriented features.
Functional programming decomposes a problem into a set of functions.
Ideally, functions only take inputs and produce outputs, and don’t have any
internal state that affects the output produced for a given input. Well-known
functional languages include the ML family (Standard ML, OCaml, and other
variants) and Haskell.
The designers of some computer languages choose to emphasize one
particular approach to programming. This often makes it difficult to
write programs that use a different approach. Other languages are
multi-paradigm languages that support several different approaches.
Lisp, C++, and Python are multi-paradigm; you can write programs or
libraries that are largely procedural, object-oriented, or functional
in all of these languages. In a large program, different sections
might be written using different approaches; the GUI might be
object-oriented while the processing logic is procedural or
functional, for example.
In a functional program, input flows through a set of functions. Each function
operates on its input and produces some output. Functional style discourages
functions with side effects that modify internal state or make other changes
that aren’t visible in the function’s return value. Functions that have no side
effects at all are called purely functional. Avoiding side effects means
not using data structures that get updated as a program runs; every function’s
output must only depend on its input.
Some languages are very strict about purity and don’t even have assignment
statements such as a = 3 or c = a + b, but it’s difficult to avoid all
side effects. Printing to the screen or writing to a disk file are side
effects, for example. In Python, calls to the print() or
time.sleep() functions return no useful value; they’re only called for
their side effects of sending some text to the screen or pausing execution for a
second.
Python programs written in functional style usually won’t go to the extreme of
avoiding all I/O or all assignments; instead, they’ll provide a
functional-appearing interface but will use non-functional features internally.
For example, the implementation of a function will still use assignments to
local variables, but won’t modify global variables or have other side effects.
Functional programming can be considered the opposite of object-oriented
programming. Objects are little capsules containing some internal state along
with a collection of method calls that let you modify this state, and programs
consist of making the right set of state changes. Functional programming wants
to avoid state changes as much as possible and works with data flowing between
functions. In Python you might combine the two approaches by writing functions
that take and return instances representing objects in your application (e-mail
messages, transactions, etc.).
Functional design may seem like an odd constraint to work under. Why should you
avoid objects and side effects? There are theoretical and practical advantages
to the functional style:
A theoretical benefit is that it’s easier to construct a mathematical proof that
a functional program is correct.
For a long time researchers have been interested in finding ways to
mathematically prove programs correct. This is different from testing a program
on numerous inputs and concluding that its output is usually correct, or reading
a program’s source code and concluding that the code looks right; the goal is
instead a rigorous proof that a program produces the right result for all
possible inputs.
The technique used to prove programs correct is to write down invariants,
properties of the input data and of the program’s variables that are always
true. For each line of code, you then show that if invariants X and Y are true
before the line is executed, the slightly different invariants X’ and Y’ are
true after the line is executed. This continues until you reach the end of
the program, at which point the invariants should match the desired conditions
on the program’s output.
Functional programming’s avoidance of assignments arose because assignments are
difficult to handle with this technique; assignments can break invariants that
were true before the assignment without producing any new invariants that can be
propagated onward.
Unfortunately, proving programs correct is largely impractical and not relevant
to Python software. Even trivial programs require proofs that are several pages
long; the proof of correctness for a moderately complicated program would be
enormous, and few or none of the programs you use daily (the Python interpreter,
your XML parser, your web browser) could be proven correct. Even if you wrote
down or generated a proof, there would then be the question of verifying the
proof; maybe there’s an error in it, and you wrongly believe you’ve proved the
program correct.
A more practical benefit of functional programming is that it forces you to
break apart your problem into small pieces. Programs are more modular as a
result. It’s easier to specify and write a small function that does one thing
than a large function that performs a complicated transformation. Small
functions are also easier to read and to check for errors.
Testing and debugging a functional-style program is easier.
Debugging is simplified because functions are generally small and clearly
specified. When a program doesn’t work, each function is an interface point
where you can check that the data are correct. You can look at the intermediate
inputs and outputs to quickly isolate the function that’s responsible for a bug.
Testing is easier because each function is a potential subject for a unit test.
Functions don’t depend on system state that needs to be replicated before
running a test; instead you only have to synthesize the right input and then
check that the output matches expectations.
As you work on a functional-style program, you’ll write a number of functions
with varying inputs and outputs. Some of these functions will be unavoidably
specialized to a particular application, but others will be useful in a wide
variety of programs. For example, a function that takes a directory path and
returns all the XML files in the directory, or a function that takes a filename
and returns its contents, can be applied to many different situations.
Over time you’ll form a personal library of utilities. Often you’ll assemble
new programs by arranging existing functions in a new configuration and writing
a few functions specialized for the current task.
I’ll start by looking at a Python language feature that’s an important
foundation for writing functional-style programs: iterators.
An iterator is an object representing a stream of data; this object returns the
data one element at a time. A Python iterator must support a method called
__next__() that takes no arguments and always returns the next element of
the stream. If there are no more elements in the stream, __next__() must
raise the StopIteration exception. Iterators don’t have to be finite,
though; it’s perfectly reasonable to write an iterator that produces an infinite
stream of data.
The built-in iter() function takes an arbitrary object and tries to return
an iterator that will return the object’s contents or elements, raising
TypeError if the object doesn’t support iteration. Several of Python’s
built-in data types support iteration, the most common being lists and
dictionaries. An object is called an iterable object if you can get an
iterator for it.
You can experiment with the iteration interface manually:
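>>> L = [1, 2, 3]
>>> it = iter(L)
>>> it.__next__()
1
>>> next(it)
2
>>> next(it)
3
>>> next(it)
Traceback (most recent call last):
  ...
StopIteration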
Python expects iterable objects in several different contexts, the most
important being the for statement. In the statement for X in Y, Y must
be an iterator or some object for which iter() can create an iterator.
These two statements are equivalent:
for i in iter(obj):
    print(i)

for i in obj:
    print(i)
Iterators can be materialized as lists or tuples by using the list() or
tuple() constructor functions:
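>>> L = [1, 2, 3]
>>> iterator = iter(L)
>>> t = tuple(iterator)
>>> t
(1, 2, 3)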
Built-in functions such as max() and min() can take a single
iterator argument and will return the largest or smallest element. The "in"
and "not in" operators also support iterators: X in iterator is true if
X is found in the stream returned by the iterator. You’ll run into obvious
problems if the iterator is infinite; max() and min() will never
return, and if the element X never appears in the stream, the "in" and
"not in" operators won’t return either.
Note that you can only go forward in an iterator; there’s no way to get the
previous element, reset the iterator, or make a copy of it. Iterator objects
can optionally provide these additional capabilities, but the iterator protocol
only specifies the __next__() method. Functions may therefore consume all of
the iterator’s output, and if you need to do something different with the same
stream, you’ll have to create a new iterator.
We’ve already seen how lists and tuples support iterators. In fact, any Python
sequence type, such as strings, will automatically support creation of an
iterator.
Calling iter() on a dictionary returns an iterator that will loop over the
dictionary’s keys:
>>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
... 'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}
>>> for key in m:
... print(key, m[key])
Mar 3
Feb 2
Aug 8
Sep 9
Apr 4
Jun 6
Jul 7
Jan 1
May 5
Nov 11
Dec 12
Oct 10
Note that the order is essentially random, because it’s based on the hash
ordering of the objects in the dictionary.
Applying iter() to a dictionary always loops over the keys, but
dictionaries have methods that return other iterators. If you want to iterate
over values or key/value pairs, you can explicitly call the
values() or items() methods to get an appropriate iterator.
The dict() constructor can accept an iterator that returns a finite stream
of (key,value) tuples:
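>>> L = [('Italy', 'Rome'), ('France', 'Paris'), ('US', 'Washington DC')]
>>> dict(iter(L))                    # key order may vary
{'Italy': 'Rome', 'France': 'Paris', 'US': 'Washington DC'}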
Files also support iteration by calling the readline() method until there
are no more lines in the file. This means you can read each line of a file like
this:
for line in file:
    # do something for each line
    ...
Sets can take their contents from an iterable and let you iterate over the set’s
elements:
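S = set(range(5))        # build a set from an iterable
for i in S:
    print(i)             # elements come out in arbitrary order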
Two common operations on an iterator’s output are 1) performing some operation
for every element, 2) selecting a subset of elements that meet some condition.
For example, given a list of strings, you might want to strip off trailing
whitespace from each line or extract all the strings containing a given
substring.
List comprehensions and generator expressions (short form: “listcomps” and
“genexps”) are a concise notation for such operations, borrowed from the
functional programming language Haskell (http://www.haskell.org/). You can strip
all the whitespace from a stream of strings with the following code:
line_list = [' line 1\n', 'line 2 \n', ...]
# Generator expression -- returns iterator
stripped_iter = (line.strip() for line in line_list)
# List comprehension -- returns list
stripped_list = [line.strip() for line in line_list]
You can select only certain elements by adding an "if" condition:
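stripped_list = [line.strip() for line in line_list
                 if line != ""]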
With a list comprehension, you get back a Python list; stripped_list is a
list containing the resulting lines, not an iterator. Generator expressions
return an iterator that computes the values as necessary, not needing to
materialize all the values at once. This means that list comprehensions aren’t
useful if you’re working with iterators that return an infinite stream or a very
large amount of data. Generator expressions are preferable in these situations.
Generator expressions are surrounded by parentheses (“()”) and list
comprehensions are surrounded by square brackets (“[]”). Generator expressions
have the form:
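( expression for expr in sequence1
             if condition1
             for expr2 in sequence2
             if condition2 ...
             for exprN in sequenceN
             if conditionN )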
Again, for a list comprehension only the outside brackets are different (square
brackets instead of parentheses).
The elements of the generated output will be the successive values of
expression. The if clauses are all optional; if present, expression
is only evaluated and added to the result when condition is true.
Generator expressions always have to be written inside parentheses, but the
parentheses signalling a function call also count. If you want to create an
iterator that will be immediately passed to a function you can write:
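obj_total = sum(obj.count for obj in list_all_objects())   # list_all_objects() is a hypothetical helper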
The for...in clauses contain the sequences to be iterated over. The
sequences do not have to be the same length, because they are iterated over from
left to right, not in parallel. For each element in sequence1,
sequence2 is looped over from the beginning. sequence3 is then looped
over for each resulting pair of elements from sequence1 and sequence2.
To put it another way, a list comprehension or generator expression is
equivalent to the following Python code:
for expr1 in sequence1:
    if not (condition1):
        continue          # Skip this element
    for expr2 in sequence2:
        if not (condition2):
            continue      # Skip this element
        ...
        for exprN in sequenceN:
            if not (conditionN):
                continue  # Skip this element

            # Output the value of
            # the expression.
This means that when there are multiple for...in clauses but no if
clauses, the length of the resulting output will be equal to the product of the
lengths of all the sequences. If you have two lists of length 3, the output
list is 9 elements long:
>>> seq1 = 'abc'
>>> seq2 = (1,2,3)
>>> [(x,y) for x in seq1 for y in seq2]
[('a', 1), ('a', 2), ('a', 3),
('b', 1), ('b', 2), ('b', 3),
('c', 1), ('c', 2), ('c', 3)]
To avoid introducing an ambiguity into Python’s grammar, if expression is
creating a tuple, it must be surrounded with parentheses. The first list
comprehension below is a syntax error, while the second one is correct:
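# Syntax error
[x, y for x in seq1 for y in seq2]
# Correct
[(x, y) for x in seq1 for y in seq2]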
Generators are a special class of functions that simplify the task of writing
iterators. Regular functions compute a value and return it, but generators
return an iterator that returns a stream of values.
You’re doubtless familiar with how regular function calls work in Python or C.
When you call a function, it gets a private namespace where its local variables
are created. When the function reaches a return statement, the local
variables are destroyed and the value is returned to the caller. A later call
to the same function creates a new private namespace and a fresh set of local
variables. But, what if the local variables weren’t thrown away on exiting a
function? What if you could later resume the function where it left off? This
is what generators provide; they can be thought of as resumable functions.
Here’s the simplest example of a generator function:
def generate_ints(N):
    for i in range(N):
        yield i
Any function containing a yield keyword is a generator function; this is
detected by Python’s bytecode compiler which compiles the function
specially as a result.
When you call a generator function, it doesn’t return a single value; instead it
returns a generator object that supports the iterator protocol. On executing
the yield expression, the generator outputs the value of i, similar to a
return statement. The big difference between yield and a return
statement is that on reaching a yield the generator’s state of execution is
suspended and local variables are preserved. On the next call to the
generator’s .__next__() method, the function will resume executing.
Here’s a sample usage of the generate_ints() generator:
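>>> gen = generate_ints(3)
>>> gen
<generator object generate_ints at ...>
>>> next(gen)
0
>>> next(gen)
1
>>> next(gen)
2
>>> next(gen)
Traceback (most recent call last):
  ...
StopIteration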
You could equally write for i in generate_ints(5), or a, b, c = generate_ints(3).
Inside a generator function, the return statement can only be used without a
value, and signals the end of the procession of values; after executing a
return the generator cannot return any further values. return with a
value, such as return 5, is a syntax error inside a generator function. The
end of the generator’s results can also be indicated by raising
StopIteration manually, or by just letting the flow of execution fall off
the bottom of the function.
You could achieve the effect of generators manually by writing your own class
and storing all the local variables of the generator as instance variables. For
example, returning a list of integers could be done by setting self.count to
0, and having the __next__() method increment self.count and return it.
However, for a moderately complicated generator, writing a corresponding class
can be much messier.
The test suite included with Python’s library, test_generators.py, contains
a number of more interesting examples. Here’s one generator that implements an
in-order traversal of a tree using generators recursively.
# A recursive generator that generates Tree leaves in in-order.
def inorder(t):
    if t:
        for x in inorder(t.left):
            yield x
        yield t.label
        for x in inorder(t.right):
            yield x
Two other examples in test_generators.py produce solutions for the N-Queens
problem (placing N queens on an NxN chess board so that no queen threatens
another) and the Knight’s Tour (finding a route that takes a knight to every
square of an NxN chessboard without visiting any square twice).
In Python 2.4 and earlier, generators only produced output. Once a generator’s
code was invoked to create an iterator, there was no way to pass any new
information into the function when its execution is resumed. You could hack
together this ability by making the generator look at a global variable or by
passing in some mutable object that callers then modify, but these approaches
are messy.
In Python 2.5 there’s a simple way to pass values into a generator.
yield became an expression, returning a value that can be assigned to
a variable or otherwise operated on:
val = (yield i)
I recommend that you always put parentheses around a yield expression
when you’re doing something with the returned value, as in the above example.
The parentheses aren’t always necessary, but it’s easier to always add them
instead of having to remember when they’re needed.
(PEP 342 explains the exact rules, which are that a yield-expression must
always be parenthesized except when it occurs at the top-level expression on the
right-hand side of an assignment. This means you can write val = yield i
but have to use parentheses when there’s an operation, as in val = (yield i) + 12.)
Values are sent into a generator by calling its send(value) method. This
method resumes the generator’s code and the yield expression returns the
specified value. If the regular __next__() method is called, the yield
returns None.
Here’s a simple counter that increments by 1 and allows changing the value of
the internal counter.
def counter(maximum):
    i = 0
    while i < maximum:
        val = (yield i)
        # If value provided, change counter
        if val is not None:
            i = val
        else:
            i += 1
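A sketch of driving this generator with next() and send():

>>> it = counter(10)
>>> next(it)
0
>>> next(it)
1
>>> it.send(8)
8
>>> next(it)
9
>>> next(it)
Traceback (most recent call last):
  ...
StopIteration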
Because yield will often be returning None, you should always check for
this case. Don’t just use its value in expressions unless you’re sure that the
send() method will be the only method used to resume your generator function.
In addition to send(), there are two other new methods on generators:
throw(type, value=None, traceback=None) is used to raise an exception
inside the generator; the exception is raised by the yield expression
where the generator’s execution is paused.
close() raises a GeneratorExit exception inside the generator to
terminate the iteration. On receiving this exception, the generator’s code
must either raise GeneratorExit or StopIteration; catching the
exception and doing anything else is illegal and will trigger a
RuntimeError. close() will also be called by Python’s garbage
collector when the generator is garbage-collected.
If you need to run cleanup code when a GeneratorExit occurs, I suggest
using a try:...finally: suite instead of catching GeneratorExit.
The cumulative effect of these changes is to turn generators from one-way
producers of information into both producers and consumers.
Generators also become coroutines, a more generalized form of subroutines.
Subroutines are entered at one point and exited at another point (the top of the
function, and a return statement), but coroutines can be entered, exited,
and resumed at many different points (the yield statements).
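The map(f, iterA, iterB, ...) built-in applies f to every element of its
iterable arguments and returns an iterator over the results. The session
below assumes a small helper:

>>> def upper(s):
...     return s.upper()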
>>> list(map(upper, ['sentence', 'fragment']))
['SENTENCE', 'FRAGMENT']
>>> [upper(s) for s in ['sentence', 'fragment']]
['SENTENCE', 'FRAGMENT']
You can of course achieve the same effect with a list comprehension.
filter(predicate, iter) returns an iterator over all the sequence elements
that meet a certain condition, and is similarly duplicated by list
comprehensions. A predicate is a function that returns the truth value of
some condition; for use with filter(), the predicate must take a single
value.
>>> def is_even(x):
...     return (x % 2) == 0

>>> list(filter(is_even, range(10)))
[0, 2, 4, 6, 8]
This can also be written as a list comprehension:
>>> list(x for x in range(10) if is_even(x))
[0, 2, 4, 6, 8]
enumerate(iter) counts off the elements in the iterable, returning 2-tuples
containing the count and each element.
>>> for item in enumerate(['subject', 'verb', 'object']):
... print(item)
(0, 'subject')
(1, 'verb')
(2, 'object')
enumerate() is often used when looping through a list and recording the
indexes at which certain conditions are met:
f = open('data.txt', 'r')
for i, line in enumerate(f):
    if line.strip() == '':
        print('Blank line at line #%i' % i)
sorted(iterable, [key=None], [reverse=False]) collects all the elements of
the iterable into a list, sorts the list, and returns the sorted result. The
key and reverse arguments are passed through to the constructed list’s
.sort() method.
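For example:

>>> sorted([9, 4, 7, 1])
[1, 4, 7, 9]
>>> sorted([9, 4, 7, 1], reverse=True)
[9, 7, 4, 1]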
The any(iter) and all(iter) built-ins look at the truth values of an
iterable’s contents. any() returns True if any element in the iterable is
a true value, and all() returns True if all of the elements are true
values:
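>>> any([0, 1, 0])
True
>>> any([0, 0, 0])
False
>>> any([1, 1, 1])
True
>>> all([0, 1, 0])
False
>>> all([0, 0, 0])
False
>>> all([1, 1, 1])
True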
zip(iterA, iterB, ...) takes one element from each iterable and returns
them in a tuple. It doesn’t construct an in-memory list and exhaust all the
input iterators before returning; instead tuples are constructed and returned
only if they’re requested. (The technical term for this behaviour is lazy
evaluation.)
This iterator is intended to be used with iterables that are all of the same
length. If the iterables are of different lengths, the resulting stream will be
the same length as the shortest iterable.
zip(['a', 'b'], (1, 2, 3)) =>
  ('a', 1), ('b', 2)
You should avoid doing this, though, because an element may be taken from the
longer iterators and discarded. This means you can’t go on to use the iterators
further because you risk skipping a discarded element.
The itertools module contains a number of commonly-used iterators as well
as functions for combining several iterators. This section will introduce the
module’s contents by showing small examples.
The module’s functions fall into a few broad classes:
Functions that create a new iterator based on an existing iterator.
Functions for treating an iterator’s elements as function arguments.
Functions for selecting portions of an iterator’s output.
itertools.count(n) returns an infinite stream of integers, increasing by 1
each time. You can optionally supply the starting number, which defaults to 0:
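itertools.count() =>
  0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ...
itertools.count(10) =>
  10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ...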
itertools.cycle(iter) saves a copy of the contents of a provided iterable
and returns a new iterator that returns its elements from first to last. The
new iterator will repeat these elements infinitely.
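For example:

itertools.cycle([1, 2, 3, 4, 5]) =>
  1, 2, 3, 4, 5, 1, 2, 3, 4, 5, ...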
itertools.chain(iterA, iterB, ...) takes an arbitrary number of iterables as
input, and returns all the elements of the first iterator, then all the elements
of the second, and so on, until all of the iterables have been exhausted.
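For example:

itertools.chain(['a', 'b', 'c'], (1, 2, 3)) =>
  a, b, c, 1, 2, 3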
itertools.islice(iter, [start], stop, [step]) returns a stream that’s a
slice of the iterator. With a single stop argument, it will return the
first stop elements. If you supply a starting index, you’ll get
stop-start elements, and if you supply a value for step, elements will
be skipped accordingly. Unlike Python’s string and list slicing, you can’t use
negative values for start, stop, or step.
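For example:

itertools.islice(range(10), 8) =>
  0, 1, 2, 3, 4, 5, 6, 7
itertools.islice(range(10), 2, 8) =>
  2, 3, 4, 5, 6, 7
itertools.islice(range(10), 2, 8, 2) =>
  2, 4, 6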
itertools.tee(iter, [n]) replicates an iterator; it returns n
independent iterators that will all return the contents of the source iterator.
If you don’t supply a value for n, the default is 2. Replicating iterators
requires saving some of the contents of the source iterator, so this can consume
significant memory if the iterator is large and one of the new iterators is
consumed more than the others.
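For example:

itertools.tee(itertools.count()) =>
  iterA, iterB

where iterA -> 0, 1, 2, 3, 4, 5, ...
  and iterB -> 0, 1, 2, 3, 4, 5, ...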
The operator module contains a set of functions corresponding to Python’s
operators. Some examples are operator.add(a, b) (adds two values),
operator.ne(a, b) (same as a != b), and operator.attrgetter('id')
(returns a callable that fetches the "id" attribute).
itertools.starmap(func, iter) assumes that the iterable will return a stream
of tuples, and calls func() using these tuples as the arguments:
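import os
itertools.starmap(os.path.join,
                  [('/bin', 'python'), ('/usr', 'bin', 'java'),
                   ('/usr', 'bin', 'perl'), ('/usr', 'bin', 'ruby')]) =>
  /bin/python, /usr/bin/java, /usr/bin/perl, /usr/bin/ruby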
itertools.takewhile(predicate, iter) returns elements for as long as the
predicate returns true. Once the predicate returns false, the iterator will
signal the end of its results.
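For example:

def less_than_10(x):
    return x < 10

itertools.takewhile(less_than_10, itertools.count()) =>
  0, 1, 2, 3, 4, 5, 6, 7, 8, 9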
The last function I’ll discuss, itertools.groupby(iter, key_func=None), is
the most complicated. key_func(elem) is a function that can compute a key
value for each element returned by the iterable. If you don’t supply a key
function, the key is simply each element itself.
groupby() collects all the consecutive elements from the underlying iterable
that have the same key value, and returns a stream of 2-tuples containing a key
value and an iterator for the elements with that key.
groupby() assumes that the underlying iterable’s contents will already be
sorted based on the key. Note that the returned iterators also use the
underlying iterable, so you have to consume the results of iterator-1 before
requesting iterator-2 and its corresponding key.
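For example, a sketch that groups a list of (city, state) pairs by state
(the data is illustrative, and already sorted by the key):

city_list = [('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL'),
             ('Anchorage', 'AK'), ('Nome', 'AK'),
             ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ')]

def get_state(city_state):
    return city_state[1]

itertools.groupby(city_list, get_state) =>
  ('AL', iterator-1),
  ('AK', iterator-2),
  ('AZ', iterator-3)

where iterator-1 => ('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL')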
The functools module in Python 2.5 contains some higher-order functions.
A higher-order function takes one or more functions as input and returns a
new function. The most useful tool in this module is the
functools.partial() function.
For programs written in a functional style, you’ll sometimes want to construct
variants of existing functions that have some of the parameters filled in.
Consider a Python function f(a,b,c); you may wish to create a new function
g(b,c) that’s equivalent to f(1,b,c); you’re filling in a value for
one of f()‘s parameters. This is called “partial function application”.
The constructor for partial takes the arguments (function, arg1, arg2, ...,
kwarg1=value1, kwarg2=value2). The resulting object is callable, so you
can just call it to invoke function with the filled-in arguments.
Here’s a small but realistic example:
import functools
def log(message, subsystem):
    "Write the contents of 'message' to the specified subsystem."
    print('%s: %s' % (subsystem, message))
...
server_log = functools.partial(log, subsystem='server')
server_log('Unable to open socket')
functools.reduce(func, iter, [initial_value]) cumulatively performs an
operation on all the iterable’s elements and, therefore, can’t be applied to
infinite iterables. (Note it is not in builtins, but in the
functools module.) func must be a function that takes two elements
and returns a single value. functools.reduce() takes the first two
elements A and B returned by the iterator and calculates func(A, B). It
then requests the third element, C, calculates func(func(A, B), C), combines
this result with the fourth element returned, and continues until the iterable
is exhausted. If the iterable returns no values at all, a TypeError
exception is raised. If the initial value is supplied, it’s used as a starting
point and func(initial_value, A) is the first calculation.
>>> import operator, functools
>>> functools.reduce(operator.concat, ['A', 'BB', 'C'])
'ABBC'
>>> functools.reduce(operator.concat, [])
Traceback (most recent call last):
...
TypeError: reduce() of empty sequence with no initial value
>>> functools.reduce(operator.mul, [1,2,3], 1)
6
>>> functools.reduce(operator.mul, [], 1)
1
If you use operator.add() with functools.reduce(), you’ll add up all the
elements of the iterable. This case is so common that there’s a special
built-in called sum() to compute it:
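>>> import functools, operator
>>> functools.reduce(operator.add, [1, 2, 3, 4])
10
>>> sum([1, 2, 3, 4])
10
>>> sum([])
0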
The operator module was mentioned earlier. It contains a set of
functions corresponding to Python’s operators. These functions are often useful
in functional-style code because they save you from writing trivial functions
that perform a single operation.
Some of the functions in this module are:
Math operations: add(), sub(), mul(), floordiv(), abs(), ...
Logical operations: not_(), truth().
Bitwise operations: and_(), or_(), invert().
Comparisons: eq(), ne(), lt(), le(), gt(), and ge().
Object identity: is_(), is_not().
Consult the operator module’s documentation for a complete list.
Collin Winter’s functional module
provides a number of more advanced tools for functional programming. It also
reimplements several Python built-ins, trying to make them more intuitive to
those used to functional programming in other languages.
This section contains an introduction to some of the most important functions in
functional; full documentation can be found at the project’s website.
compose(outer,inner,unpack=False)
The compose() function implements function composition. In other words, it
returns a wrapper around the outer and inner callables, such that the
return value from inner is fed directly to outer.
The unpack keyword is provided to work around the fact that Python functions
are not always fully curried. By
default, it is expected that the inner function will return a single object
and that the outer function will take a single argument. Setting the
unpack argument causes compose to expect a tuple from inner which
will be expanded before being passed to outer. Put simply,
compose(f, g)(5, 6)
is equivalent to:
f(g(5, 6))
while
compose(f, g, unpack=True)(5, 6)
is equivalent to:
f(*g(5, 6))
Even though compose() only accepts two functions, it’s trivial to build up a
version that will compose any number of functions. We’ll use
functools.reduce(), compose() and partial() (the last of which is
provided by both functional and functools).
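A sketch of that construction (assuming the third-party functional module is
installed):

from functional import compose, partial
import functools

# reduce(compose, [f, g, h]) builds compose(compose(f, g), h),
# i.e. a function computing f(g(h(x)))
multi_compose = partial(functools.reduce, compose)

print(multi_compose([str.strip, str.lower])('  Foo '))  # 'foo'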
foldl() takes a binary function, a starting value (usually some kind of
‘zero’), and an iterable. The function is applied to the starting value and the
first element of the list, then the result of that and the second element of the
list, then the result of that and the third element of the list, and so on.
This means that a call such as:
foldl(f,0,[1,2,3])
is equivalent to:
f(f(f(0,1),2),3)
foldl() is roughly equivalent to the following recursive function:
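def foldl(func, start, iterable):
    if len(iterable) == 0:
        return start
    else:
        return foldl(func, func(start, iterable[0]), iterable[1:])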
If the function you need doesn’t exist, you need to write it. One way to write
small functions is to use the lambda statement. lambda takes a number
of parameters and an expression combining these parameters, and creates a small
function that returns the value of the expression:
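adder = lambda x, y: x + y

print_assign = lambda name, value: name + '=' + str(value)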
Which alternative is preferable? That’s a style question; my usual course is to
avoid using lambda.
One reason for my preference is that lambda is quite limited in the
functions it can define. The result has to be computable as a single
expression, which means you can’t have multiway if...elif...else
comparisons or try...except statements. If you try to do too much in a
lambda statement, you’ll end up with an overly complicated expression that’s
hard to read. Quick, what’s the following code doing?
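import functools
total = functools.reduce(lambda a, b: (0, a[1] + b[1]), items)[1]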
You can figure it out, but it takes time to disentangle the expression to figure
out what’s going on. Using a short nested def statement makes things a
little bit better:
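import functools

def combine(a, b):
    return 0, a[1] + b[1]

total = functools.reduce(combine, items)[1]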
The author would like to thank the following people for offering suggestions,
corrections and assistance with various drafts of this article: Ian Bicking,
Nick Coghlan, Nick Efford, Raymond Hettinger, Jim Jewett, Mike Krell, Leandro
Lameiro, Jussi Salmela, Collin Winter, Blake Winton.
Version 0.1: posted June 30 2006.
Version 0.11: posted July 1 2006. Typo fixes.
Version 0.2: posted July 10 2006. Merged genexp and listcomp sections into one.
Typo fixes.
Version 0.21: Added more references suggested on the tutor mailing list.
Version 0.30: Adds a section on the functional module written by Collin
Winter; adds short section on the operator module; a few other edits.
Structure and Interpretation of Computer Programs, by Harold Abelson and
Gerald Jay Sussman with Julie Sussman. Full text at
http://mitpress.mit.edu/sicp/. In this classic textbook of computer science,
chapters 2 and 3 discuss the use of sequences and streams to organize the data
flow inside a program. The book uses Scheme for its examples, but many of the
design approaches described in these chapters are applicable to functional-style
Python code.
http://gnosis.cx/TPiP/: The first chapter of David Mertz’s book
Text Processing in Python discusses functional programming
for text processing, in the section titled “Utilizing Higher-Order Functions in
Text Processing”.
Mertz also wrote a 3-part series of articles on functional programming
for IBM’s DeveloperWorks site; see
part 1,
part 2, and
part 3.
Logging is a means of tracking events that happen when some software runs. The
software’s developer adds logging calls to their code to indicate that certain
events have occurred. An event is described by a descriptive message which can
optionally contain variable data (i.e. data that is potentially different for
each occurrence of the event). Events also have an importance which the
developer ascribes to the event; the importance can also be called the level
or severity.
Logging provides a set of convenience functions for simple logging usage. These
are debug(), info(), warning(), error() and
critical(). To determine when to use logging, see the table below, which
states, for each of a set of common tasks, the best tool to use for it.
Task you want to perform                            The best tool for the task
Display console output for ordinary usage of a      print()
command line script or program
The logging functions are named after the level or severity of the events
they are used to track. The standard levels and their applicability are
described below (in increasing order of severity):
Level      When it’s used
DEBUG      Detailed information, typically of interest only when
           diagnosing problems.
INFO       Confirmation that things are working as expected.
WARNING    An indication that something unexpected happened, or indicative
           of some problem in the near future (e.g. ‘disk space low’). The
           software is still working as expected.
ERROR      Due to a more serious problem, the software has not been able
           to perform some function.
CRITICAL   A serious error, indicating that the program itself may be
           unable to continue running.
The default level is WARNING, which means that only events of this level
and above will be tracked, unless the logging package is configured to do
otherwise.
Events that are tracked can be handled in different ways. The simplest way of
handling tracked events is to print them to the console. Another common way
is to write them to a disk file.
import logging
logging.warning('Watch out!') # will print a message to the console
logging.info('I told you so') # will not print anything
If you type these lines into a script and run it, you’ll see:
WARNING:root:Watch out!
printed out on the console. The INFO message doesn’t appear because the
default level is WARNING. The printed message includes the indication of
the level and the description of the event provided in the logging call, i.e.
‘Watch out!’. Don’t worry about the ‘root’ part for now: it will be explained
later. The actual output can be formatted quite flexibly if you need that;
formatting options will also be explained later.
A very common situation is that of recording logging events in a file, so let’s
look at that next:
import logging
logging.basicConfig(filename='example.log',level=logging.DEBUG)
logging.debug('This message should go to the log file')
logging.info('So should this')
logging.warning('And this, too')
And now if we open the file and look at what we have, we should find the log
messages:
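DEBUG:root:This message should go to the log file
INFO:root:So should this
WARNING:root:And this, too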
This example also shows how you can set the logging level which acts as the
threshold for tracking. In this case, because we set the threshold to
DEBUG, all of the messages were printed.
If you want to set the logging level from a command-line option such as:
--log=INFO
and you have the value of the parameter passed for --log in some variable
loglevel, you can use:
getattr(logging, loglevel.upper())
to get the value which you’ll pass to basicConfig() via the level
argument. You may want to error check any user input value, perhaps as in the
following example:
# assuming loglevel is bound to the string value obtained from the
# command line argument. Convert to upper case to allow the user to
# specify --log=DEBUG or --log=debug
numeric_level = getattr(logging, loglevel.upper(), None)
if not isinstance(numeric_level, int):
    raise ValueError('Invalid log level: %s' % loglevel)
logging.basicConfig(level=numeric_level, ...)
The call to basicConfig() should come before any calls to debug(),
info() etc. As it’s intended as a one-off simple configuration facility,
only the first call will actually do anything: subsequent calls are effectively
no-ops.
If you run the above script several times, the messages from successive runs
are appended to the file example.log. If you want each run to start afresh,
not remembering the messages from earlier runs, you can specify the filemode
argument, by changing the call in the above example to:
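logging.basicConfig(filename='example.log', filemode='w', level=logging.DEBUG)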
The log file is then truncated at each run, so messages from earlier runs are
lost. You can generalize this to multiple modules, using the pattern in
mylib.py. Note that for this simple
usage pattern, you won’t know, by looking in the log file, where in your
application your messages came from, apart from looking at the event
description. If you want to track the location of your messages, you’ll need
to refer to the documentation beyond the tutorial level – see
Advanced Logging Tutorial.
To log variable data, use a format string for the event description message and
append the variable data as arguments. For example:
import logging
logging.warning('%s before you %s', 'Look', 'leap!')
will display:
WARNING:root:Look before you leap!
As you can see, merging of variable data into the event description message
uses the old, %-style of string formatting. This is for backwards
compatibility: the logging package pre-dates newer formatting options such as
str.format() and string.Template. These newer formatting
options are supported, but exploring them is outside the scope of this
tutorial.
To change the format which is used to display messages, you need to
specify the format you want to use:
import logging
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.DEBUG)
logging.debug('This message should appear on the console')
logging.info('So should this')
logging.warning('And this, too')
Notice that the ‘root’ which appeared in earlier examples has disappeared. For
a full set of things that can appear in format strings, you can refer to the
documentation for LogRecord attributes, but for simple usage, you just
need the levelname (severity), message (event description, including
variable data) and perhaps to display when the event occurred. This is
described in the next section.
To display the date and time of an event, you would place ‘%(asctime)s’ in
your format string:
import logging
logging.basicConfig(format='%(asctime)s %(message)s')
logging.warning('is when this event was logged.')
which should print something like this:
2010-12-12 11:41:42,612 is when this event was logged.
The default format for date/time display (shown above) is ISO8601. If you need
more control over the formatting of the date/time, provide a datefmt
argument to basicConfig, as in this example:
import logging
logging.basicConfig(format='%(asctime)s %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p')
logging.warning('is when this event was logged.')
which would display something like this:
12/12/2010 11:46:36 AM is when this event was logged.
The format of the datefmt argument is the same as supported by
time.strftime().
That concludes the basic tutorial. It should be enough to get you up and
running with logging. There’s a lot more that the logging package offers, but
to get the best out of it, you’ll need to invest a little more of your time in
reading the following sections. If you’re ready for that, grab some of your
favourite beverage and carry on.
If your logging needs are simple, then use the above examples to incorporate
logging into your own scripts, and if you run into problems or don’t
understand something, please post a question on the comp.lang.python Usenet
group (available at http://groups.google.com/group/comp.lang.python) and you
should receive help before too long.
Still here? You can carry on reading the next few sections, which provide a
slightly more advanced/in-depth tutorial than the basic one above. After that,
you can take a look at the Logging Cookbook.
The logging library takes a modular approach and offers several categories
of components: loggers, handlers, filters, and formatters.
Loggers expose the interface that application code directly uses.
Handlers send the log records (created by loggers) to the appropriate
destination.
Filters provide a finer grained facility for determining which log records
to output.
Formatters specify the layout of log records in the final output.
Logging is performed by calling methods on instances of the Logger
class (hereafter called loggers). Each instance has a name, and they are
conceptually arranged in a namespace hierarchy using dots (periods) as
separators. For example, a logger named ‘scan’ is the parent of loggers
‘scan.text’, ‘scan.html’ and ‘scan.pdf’. Logger names can be anything you want,
and indicate the area of an application in which a logged message originates.
A good convention to use when naming loggers is to use a module-level logger,
in each module which uses logging, named as follows:
logger = logging.getLogger(__name__)
This means that logger names track the package/module hierarchy, and it’s
intuitively obvious where events are logged just from the logger name.
The root of the hierarchy of loggers is called the root logger. That’s the
logger used by the functions debug(), info(), warning(),
error() and critical(), which just call the same-named method of
the root logger. The functions and the methods have the same signatures. The
root logger’s name is printed as ‘root’ in the logged output.
It is, of course, possible to log messages to different destinations. Support
is included in the package for writing log messages to files, HTTP GET/POST
locations, email via SMTP, generic sockets, queues, or OS-specific logging
mechanisms such as syslog or the Windows NT event log. Destinations are served
by handler classes. You can create your own log destination class if
you have special requirements not met by any of the built-in handler classes.
By default, no destination is set for any logging messages. You can specify
a destination (such as console or file) by using basicConfig() as in the
tutorial examples. If you call the functions debug(), info(),
warning(), error() and critical(), they will check to see
if no destination is set; and if one is not set, they will set a destination
of the console (sys.stderr) and a default format for the displayed
message before delegating to the root logger to do the actual message output.
The default format set by basicConfig() for messages is:
severity:logger name:message
You can change this by passing a format string to basicConfig() with the
format keyword argument. For all options regarding how a format string is
constructed, see Formatter Objects.
Logger objects have a threefold job. First, they expose several
methods to application code so that applications can log messages at runtime.
Second, logger objects determine which log messages to act upon based upon
severity (the default filtering facility) or filter objects. Third, logger
objects pass along relevant log messages to all interested log handlers.
The most widely used methods on logger objects fall into two categories:
configuration and message sending.
These are the most common configuration methods:
Logger.setLevel() specifies the lowest-severity log message a logger
will handle, where debug is the lowest built-in severity level and critical
is the highest built-in severity. For example, if the severity level is
INFO, the logger will handle only INFO, WARNING, ERROR, and CRITICAL messages
and will ignore DEBUG messages.
You don’t always need to call these methods on every logger you create. See the
last two paragraphs in this section.
With the logger object configured, the following methods create log messages:
Logger.debug(), Logger.info(), Logger.warning(),
Logger.error(), and Logger.critical() all create log records with
a message and a level that corresponds to their respective method names. The
message is actually a format string, which may contain the standard string
substitution syntax of %s, %d, %f, and so on. The
rest of their arguments is a list of objects that correspond with the
substitution fields in the message. With regard to **kwargs, the
logging methods care only about a keyword of exc_info and use it to
determine whether to log exception information.
Logger.log() takes a log level as an explicit argument. This is a
little more verbose for logging messages than using the log level convenience
methods listed above, but this is how to log at custom log levels.
getLogger() returns a reference to a logger instance with the specified
name if it is provided, or root if not. The names are period-separated
hierarchical structures. Multiple calls to getLogger() with the same name
will return a reference to the same logger object. Loggers that are further
down in the hierarchical list are children of loggers higher up in the list.
For example, given a logger with a name of foo, loggers with names of
foo.bar, foo.bar.baz, and foo.bam are all descendants of foo.
Loggers have a concept of effective level. If a level is not explicitly set
on a logger, the level of its parent is used instead as its effective level.
If the parent has no explicit level set, its parent is examined, and so on -
all ancestors are searched until an explicitly set level is found. The root
logger always has an explicit level set (WARNING by default). When deciding
whether to process an event, the effective level of the logger is used to
determine whether the event is passed to the logger’s handlers.
Child loggers propagate messages up to the handlers associated with their
ancestor loggers. Because of this, it is unnecessary to define and configure
handlers for all the loggers an application uses. It is sufficient to
configure handlers for a top-level logger and create child loggers as needed.
(You can, however, turn off propagation by setting the propagate
attribute of a logger to False.)
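For example, a minimal sketch of this arrangement (the ‘myapp’ names are
invented for illustration):

import logging
import sys

# configure a handler only on the top-level logger
top = logging.getLogger('myapp')
top.setLevel(logging.DEBUG)
top.addHandler(logging.StreamHandler(sys.stderr))

# child loggers need no handlers of their own: their records
# propagate up and are emitted by the 'myapp' handler
logging.getLogger('myapp.db').warning('connection pool exhausted')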
Handler objects are responsible for dispatching the
appropriate log messages (based on the log messages’ severity) to the handler’s
specified destination. Logger objects can add zero or more handler objects to
themselves with an addHandler() method. As an example scenario, an
application may want to send all log messages to a log file, all log messages
of error or higher to stdout, and all messages of critical to an email address.
This scenario requires three individual handlers where each handler is
responsible for sending messages of a specific severity to a specific location.
There are very few methods in a handler for application developers to concern
themselves with. The only handler methods that seem relevant for application
developers who are using the built-in handler objects (that is, not creating
custom handlers) are the following configuration methods:
The Handler.setLevel() method, just as in logger objects, specifies the
lowest severity that will be dispatched to the appropriate destination. Why
are there two setLevel() methods? The level set in the logger
determines which severity of messages it will pass to its handlers. The level
set in each handler determines which messages that handler will send on.
setFormatter() selects a Formatter object for this handler to use.
addFilter() and removeFilter() respectively configure and
deconfigure filter objects on handlers.
Application code should not directly instantiate and use instances of
Handler. Instead, the Handler class is a base class that
defines the interface that all handlers should have and establishes some
default behavior that child classes can use (or override).
Formatter objects configure the final order, structure, and contents of the log
message. Unlike the base logging.Handler class, application code may
instantiate formatter classes, although you could likely subclass the formatter
if your application needs special behavior. The constructor takes three
optional arguments – a message format string, a date format string and a style
indicator.
If there is no message format string, the default is to use the
raw message. If there is no date format string, the default date format is:
%Y-%m-%d %H:%M:%S
with the milliseconds tacked on at the end. The style is one of ‘%’, ‘{’
or ‘$’. If one of these is not specified, then ‘%’ will be used.
If the style is ‘%’, the message format string uses
%(<dictionarykey>)s styled string substitution; the possible keys are
documented in LogRecord attributes. If the style is ‘{‘, the message
format string is assumed to be compatible with str.format() (using
keyword arguments), while if the style is ‘$’ then the message format string
should conform to what is expected by string.Template.substitute().
Changed in version 3.2: Added the style parameter.
The following message format string will log the time in a human-readable
format, the severity of the message, and the contents of the message, in that
order:
'%(asctime)s - %(levelname)s - %(message)s'
Formatters use a user-configurable function to convert the creation time of a
record to a tuple. By default, time.localtime() is used; to change this
for a particular formatter instance, set the converter attribute of the
instance to a function with the same signature as time.localtime() or
time.gmtime(). To change it for all formatters, for example if you want
all logging times to be shown in GMT, set the converter attribute in the
Formatter class (to time.gmtime for GMT display).
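For example, a quick sketch of switching a single formatter instance to GMT:

import logging
import time

handler = logging.StreamHandler()
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
formatter.converter = time.gmtime  # this instance now renders times in GMT
handler.setFormatter(formatter)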
Creating loggers, handlers, and formatters explicitly using Python
code that calls the configuration methods listed above.
Creating a logging config file and reading it using the fileConfig()
function.
Creating a dictionary of configuration information and passing it
to the dictConfig() function.
For the reference documentation on the last two options, see
Configuration functions. The following example configures a very simple
logger, a console handler, and a simple formatter using Python code:
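import logging

# create logger
logger = logging.getLogger('simple_example')
logger.setLevel(logging.DEBUG)

# create console handler and set level to debug
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)

# create formatter
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

# add formatter to ch
ch.setFormatter(formatter)

# add ch to logger
logger.addHandler(ch)

# 'application' code
logger.debug('debug message')
logger.info('info message')
logger.warning('warn message')
logger.error('error message')
logger.critical('critical message')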
The following Python module creates a logger, handler, and formatter nearly
identical to those in the example listed above, with the only difference being
the names of the objects:
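import logging
import logging.config

logging.config.fileConfig('logging.conf')

# create logger
logger = logging.getLogger('simpleExample')

# 'application' code
logger.debug('debug message')
logger.info('info message')
logger.warning('warn message')
logger.error('error message')
logger.critical('critical message')

Here is a logging.conf file which produces that configuration:

[loggers]
keys=root,simpleExample

[handlers]
keys=consoleHandler

[formatters]
keys=simpleFormatter

[logger_root]
level=DEBUG
handlers=consoleHandler

[logger_simpleExample]
level=DEBUG
handlers=consoleHandler
qualname=simpleExample
propagate=0

[handler_consoleHandler]
class=StreamHandler
level=DEBUG
formatter=simpleFormatter
args=(sys.stdout,)

[formatter_simpleFormatter]
format=%(asctime)s - %(name)s - %(levelname)s - %(message)s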
You can see that the config file approach has a few advantages over the Python
code approach, mainly separation of configuration and code and the ability of
noncoders to easily modify the logging properties.
Note that the class names referenced in config files need to be either relative
to the logging module, or absolute values which can be resolved using normal
import mechanisms. Thus, you could use either
WatchedFileHandler (relative to the logging module) or
mypackage.mymodule.MyHandler (for a class defined in package mypackage
and module mymodule, where mypackage is available on the Python import
path).
In Python 3.2, a new means of configuring logging has been introduced, using
dictionaries to hold configuration information. This provides a superset of the
functionality of the config-file-based approach outlined above, and is the
recommended configuration method for new applications and deployments. Because
a Python dictionary is used to hold configuration information, and since you
can populate that dictionary using different means, you have more options for
configuration. For example, you can use a configuration file in JSON format,
or, if you have access to YAML processing functionality, a file in YAML
format, to populate the configuration dictionary. Or, of course, you can
construct the dictionary in Python code, receive it in pickled form over a
socket, or use whatever approach makes sense for your application.
Here’s an example of the same configuration as above, in YAML format for
the new dictionary-based approach:
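version: 1
formatters:
  simple:
    format: '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
handlers:
  console:
    class: logging.StreamHandler
    level: DEBUG
    formatter: simple
    stream: ext://sys.stdout
loggers:
  simpleExample:
    level: DEBUG
    handlers: [console]
    propagate: no
root:
  level: DEBUG
  handlers: [console]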
If no logging configuration is provided, it is possible to have a situation
where a logging event needs to be output, but no handlers can be found to
output the event. The behaviour of the logging package in these
circumstances is dependent on the Python version.
For versions of Python prior to 3.2, the behaviour is as follows:
If logging.raiseExceptions is False (production mode), the event is
silently dropped.
If logging.raiseExceptions is True (development mode), a message
‘No handlers could be found for logger X.Y.Z’ is printed once.
In Python 3.2 and later, the behaviour is as follows:
The event is output using a ‘handler of last resort’, stored in
logging.lastResort. This internal handler is not associated with any
logger, and acts like a StreamHandler which writes the
event description message to the current value of sys.stderr (therefore
respecting any redirections which may be in effect). No formatting is
done on the message - just the bare event description message is printed.
The handler’s level is set to WARNING, so all events at this and
greater severities will be output.
To obtain the pre-3.2 behaviour, logging.lastResort can be set to None.
When developing a library which uses logging, you should take care to
document how the library uses logging - for example, the names of loggers
used. Some consideration also needs to be given to its logging configuration.
If the using application does not use logging, and library code makes logging
calls, then (as described in the previous section) events of severity
WARNING and greater will be printed to sys.stderr. This is regarded as
the best default behaviour.
If for some reason you don’t want these messages printed in the absence of
any logging configuration, you can attach a do-nothing handler to the top-level
logger for your library. This avoids the message being printed, since a handler
will always be found for the library’s events: it just doesn’t produce any
output. If the library user configures logging for application use, presumably
that configuration will add some handlers, and if levels are suitably
configured then logging calls made in library code will send output to those
handlers, as normal.
A do-nothing handler is included in the logging package:
NullHandler (since Python 3.1). An instance of this handler
could be added to the top-level logger of the logging namespace used by the
library (if you want to prevent your library’s logged events being output to
sys.stderr in the absence of logging configuration). If all logging by a
library foo is done using loggers with names matching ‘foo.x’, ‘foo.x.y’,
etc. then the code:
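import logging
logging.getLogger('foo').addHandler(logging.NullHandler())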
should have the desired effect. If an organisation produces a number of
libraries, then the logger name specified can be ‘orgname.foo’ rather than
just ‘foo’.
PLEASE NOTE: It is strongly advised that you do not add any handlers other
than NullHandler to your library’s loggers. This is
because the configuration of handlers is the prerogative of the application
developer who uses your library. The application developer knows their target
audience and what handlers are most appropriate for their application: if you
add handlers ‘under the hood’, you might well interfere with their ability to
carry out unit tests and deliver logs which suit their requirements.
The numeric values of logging levels are given in the following table. These are
primarily of interest if you want to define your own levels, and need them to
have specific values relative to the predefined levels. If you define a level
with the same numeric value, it overwrites the predefined value; the predefined
name is lost.
Level      Numeric value
CRITICAL   50
ERROR      40
WARNING    30
INFO       20
DEBUG      10
NOTSET     0
Levels can also be associated with loggers, being set either by the developer or
through loading a saved logging configuration. When a logging method is called
on a logger, the logger compares its own level with the level associated with
the method call. If the logger’s level is higher than the method call’s, no
logging message is actually generated. This is the basic mechanism controlling
the verbosity of logging output.
Logging messages are encoded as instances of the LogRecord
class. When a logger decides to actually log an event, a
LogRecord instance is created from the logging message.
Logging messages are subjected to a dispatch mechanism through the use of
handlers, which are instances of subclasses of the Handler
class. Handlers are responsible for ensuring that a logged message (in the form
of a LogRecord) ends up in a particular location (or set of locations)
which is useful for the target audience for that message (such as end users,
support desk staff, system administrators, developers). Handlers are passed
LogRecord instances intended for particular destinations. Each logger
can have zero, one or more handlers associated with it (via the
addHandler() method of Logger). In addition to any
handlers directly associated with a logger, all handlers associated with all
ancestors of the logger are called to dispatch the message (unless the
propagate flag for a logger is set to a false value, at which point the
passing to ancestor handlers stops).
Just as for loggers, handlers can have levels associated with them. A handler’s
level acts as a filter in the same way as a logger’s level does. If a handler
decides to actually dispatch an event, the emit() method is used
to send the message to its destination. Most user-defined subclasses of
Handler will need to override this emit().
Defining your own levels is possible, but should not be necessary, as the
existing levels have been chosen on the basis of practical experience.
However, if you are convinced that you need custom levels, great care should
be exercised when doing this, and it is possibly a very bad idea to define
custom levels if you are developing a library. That’s because if multiple
library authors all define their own custom levels, there is a chance that
the logging output from such multiple libraries used together will be
difficult for the using developer to control and/or interpret, because a
given numeric value might mean different things for different libraries.
In addition to the base Handler class, many useful subclasses are
provided:
StreamHandler instances send messages to streams (file-like
objects).
FileHandler instances send messages to disk files.
BaseRotatingHandler is the base class for handlers that
rotate log files at a certain point. It is not meant to be instantiated
directly. Instead, use RotatingFileHandler or
TimedRotatingFileHandler.
RotatingFileHandler instances send messages to disk
files, with support for maximum log file sizes and log file rotation.
TimedRotatingFileHandler instances send messages to
disk files, rotating the log file at certain timed intervals.
SocketHandler instances send messages to TCP/IP
sockets.
SMTPHandler instances send messages to a designated
email address.
SysLogHandler instances send messages to a Unix
syslog daemon, possibly on a remote machine.
NTEventLogHandler instances send messages to a
Windows NT/2000/XP event log.
MemoryHandler instances send messages to a buffer
in memory, which is flushed whenever specific criteria are met.
HTTPHandler instances send messages to an HTTP
server using either GET or POST semantics.
WatchedFileHandler instances watch the file they are
logging to. If the file changes, it is closed and reopened using the file
name. This handler is only useful on Unix-like systems; Windows does not
support the underlying mechanism used.
NullHandler instances do nothing with error messages. They are used
by library developers who want to use logging, but want to avoid the ‘No
handlers could be found for logger XXX’ message which can be displayed if
the library user has not configured logging. See Configuring Logging for a Library for
more information.
Logged messages are formatted for presentation through instances of the
Formatter class. They are initialized with a format string suitable for
use with the % operator and a dictionary.
For formatting multiple messages in a batch, instances of
BufferingFormatter can be used. In addition to the format string (which
is applied to each message in the batch), there is provision for header and
trailer format strings.
When filtering based on logger level and/or handler level is not enough,
instances of Filter can be added to both Logger and
Handler instances (through their addFilter() method). Before
deciding to process a message further, both loggers and handlers consult all
their filters for permission. If any filter returns a false value, the message
is not processed further.
The basic Filter functionality allows filtering by specific logger
name. If this feature is used, messages sent to the named logger and its
children are allowed through the filter, and all others dropped.
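For example, a minimal sketch (the ‘myapp.ui’ name is invented for
illustration):

import logging

handler = logging.StreamHandler()
# only records from the 'myapp.ui' logger and its children pass
handler.addFilter(logging.Filter('myapp.ui'))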
The logging package is designed to swallow exceptions which occur while logging
in production. This is so that errors which occur while handling logging events
- such as logging misconfiguration, network or other similar errors - do not
cause the application using logging to terminate prematurely.
SystemExit and KeyboardInterrupt exceptions are never
swallowed. Other exceptions which occur during the emit() method of a
Handler subclass are passed to its handleError() method.
The default implementation of handleError() in Handler checks
to see if a module-level variable, raiseExceptions, is set. If set, a
traceback is printed to sys.stderr. If not set, the exception is swallowed.
Note: The default value of raiseExceptions is True. This is because
during development, you typically want to be notified of any exceptions that
occur. It’s advised that you set raiseExceptions to False for production
usage.
In the preceding sections and examples, it has been assumed that the message
passed when logging the event is a string. However, this is not the only
possibility. You can pass an arbitrary object as a message, and its
__str__() method will be called when the logging system needs to convert
it to a string representation. In fact, if you want to, you can avoid
computing a string representation altogether - for example, the
SocketHandler emits an event by pickling it and sending it over the
wire.
Formatting of message arguments is deferred until it cannot be avoided.
However, computing the arguments passed to the logging method can also be
expensive, and you may want to avoid doing it if the logger will just throw
away your event. To decide what to do, you can call the isEnabledFor()
method which takes a level argument and returns true if the event would be
created by the Logger for that level of call. You can write code like this:
if logger.isEnabledFor(logging.DEBUG):
    logger.debug('Message with %s, %s', expensive_func1(),
                 expensive_func2())
so that if the logger’s threshold is set above DEBUG, the calls to
expensive_func1() and expensive_func2() are never made.
There are other optimizations which can be made for specific applications which
need more precise control over what logging information is collected. Here’s a
list of things you can do to avoid processing during logging which you don’t
need:
What you don’t want to collect                  How to avoid collecting it
Information about where calls were made from.   Set logging._srcfile to None.
Threading information.                          Set logging.logThreads to 0.
Process information.                            Set logging.logProcesses to 0.
Also note that the core logging module only includes the basic handlers. If
you don’t import logging.handlers and logging.config, they won’t
take up any memory.
Multiple calls to logging.getLogger('someLogger') return a reference to the
same logger object. This is true not only within the same module, but also
across modules as long as it is in the same Python interpreter process. It is
true for references to the same object; additionally, application code can
define and configure a parent logger in one module and create (but not
configure) a child logger in a separate module, and all logger calls to the
child will pass up to the parent. Here is a main module:
import logging
import auxiliary_module
# create logger with 'spam_application'
logger = logging.getLogger('spam_application')
logger.setLevel(logging.DEBUG)
# create file handler which logs even debug messages
fh = logging.FileHandler('spam.log')
fh.setLevel(logging.DEBUG)
# create console handler with a higher log level
ch = logging.StreamHandler()
ch.setLevel(logging.ERROR)
# create formatter and add it to the handlers
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fh.setFormatter(formatter)
ch.setFormatter(formatter)
# add the handlers to the logger
logger.addHandler(fh)
logger.addHandler(ch)
logger.info('creating an instance of auxiliary_module.Auxiliary')
a = auxiliary_module.Auxiliary()
logger.info('created an instance of auxiliary_module.Auxiliary')
logger.info('calling auxiliary_module.Auxiliary.do_something')
a.do_something()
logger.info('finished auxiliary_module.Auxiliary.do_something')
logger.info('calling auxiliary_module.some_function()')
auxiliary_module.some_function()
logger.info('done with auxiliary_module.some_function()')
Here is the auxiliary module:
import logging
# create logger
module_logger = logging.getLogger('spam_application.auxiliary')
class Auxiliary:
    def __init__(self):
        self.logger = logging.getLogger('spam_application.auxiliary.Auxiliary')
        self.logger.info('creating an instance of Auxiliary')

    def do_something(self):
        self.logger.info('doing something')
        a = 1 + 1
        self.logger.info('done doing something')

def some_function():
    module_logger.info('received a call to "some_function"')
The output looks like this:
2005-03-23 23:47:11,663 - spam_application - INFO -
creating an instance of auxiliary_module.Auxiliary
2005-03-23 23:47:11,665 - spam_application.auxiliary.Auxiliary - INFO -
creating an instance of Auxiliary
2005-03-23 23:47:11,665 - spam_application - INFO -
created an instance of auxiliary_module.Auxiliary
2005-03-23 23:47:11,668 - spam_application - INFO -
calling auxiliary_module.Auxiliary.do_something
2005-03-23 23:47:11,668 - spam_application.auxiliary.Auxiliary - INFO -
doing something
2005-03-23 23:47:11,669 - spam_application.auxiliary.Auxiliary - INFO -
done doing something
2005-03-23 23:47:11,670 - spam_application - INFO -
finished auxiliary_module.Auxiliary.do_something
2005-03-23 23:47:11,671 - spam_application - INFO -
calling auxiliary_module.some_function()
2005-03-23 23:47:11,672 - spam_application.auxiliary - INFO -
received a call to 'some_function'
2005-03-23 23:47:11,673 - spam_application - INFO -
done with auxiliary_module.some_function()
Loggers are plain Python objects. The addHandler() method has no minimum
or maximum quota for the number of handlers you may add. Sometimes it will be
beneficial for an application to log all messages of all severities to a text
file while simultaneously logging errors or above to the console. To set this
up, simply configure the appropriate handlers. The logging calls in the
application code will remain unchanged. Here is a slight modification to the
previous simple module-based configuration example:
import logging
logger = logging.getLogger('simple_example')
logger.setLevel(logging.DEBUG)
# create file handler which logs even debug messages
fh = logging.FileHandler('spam.log')
fh.setLevel(logging.DEBUG)
# create console handler with a higher log level
ch = logging.StreamHandler()
ch.setLevel(logging.ERROR)
# create formatter and add it to the handlers
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
ch.setFormatter(formatter)
fh.setFormatter(formatter)
# add the handlers to logger
logger.addHandler(ch)
logger.addHandler(fh)
# 'application' code
logger.debug('debug message')
logger.info('info message')
logger.warning('warn message')
logger.error('error message')
logger.critical('critical message')
Notice that the ‘application’ code does not care about multiple handlers. All
that changed was the addition and configuration of a new handler named fh.
The ability to create new handlers with higher- or lower-severity filters can be
very helpful when writing and testing an application. Instead of using many
print statements for debugging, use logger.debug: Unlike the print
statements, which you will have to delete or comment out later, the logger.debug
statements can remain intact in the source code and remain dormant until you
need them again. At that time, the only change that needs to happen is to
modify the severity level of the logger and/or handler to debug.
Let’s say you want to log to console and file with different message formats and
in differing circumstances. Say you want to log messages with levels of DEBUG
and higher to file, and those messages at level INFO and higher to the console.
Let’s also assume that the file should contain timestamps, but the console
messages should not. Here’s how you can achieve this:
import logging
# set up logging to file - see previous section for more details
logging.basicConfig(level=logging.DEBUG,
format='%(asctime)s %(name)-12s %(levelname)-8s %(message)s',
datefmt='%m-%d %H:%M',
filename='/temp/myapp.log',
filemode='w')
# define a Handler which writes INFO messages or higher to the sys.stderr
console = logging.StreamHandler()
console.setLevel(logging.INFO)
# set a format which is simpler for console use
formatter = logging.Formatter('%(name)-12s: %(levelname)-8s %(message)s')
# tell the handler to use this format
console.setFormatter(formatter)
# add the handler to the root logger
logging.getLogger('').addHandler(console)
# Now, we can log to the root logger, or any other logger. First the root...
logging.info('Jackdaws love my big sphinx of quartz.')
# Now, define a couple of other loggers which might represent areas in your
# application:
logger1 = logging.getLogger('myapp.area1')
logger2 = logging.getLogger('myapp.area2')
logger1.debug('Quick zephyrs blow, vexing daft Jim.')
logger1.info('How quickly daft jumping zebras vex.')
logger2.warning('Jail zesty vixen who grabbed pay from quack.')
logger2.error('The five boxing wizards jump quickly.')
Here is an example of a module using the logging configuration server:
import logging
import logging.config
import time
import os
# read initial config file
logging.config.fileConfig('logging.conf')
# create and start listener on port 9999
t = logging.config.listen(9999)
t.start()
logger = logging.getLogger('simpleExample')
try:
    # loop through logging calls to see the difference
    # new configurations make, until Ctrl+C is pressed
    while True:
        logger.debug('debug message')
        logger.info('info message')
        logger.warning('warn message')
        logger.error('error message')
        logger.critical('critical message')
        time.sleep(5)
except KeyboardInterrupt:
    # cleanup
    logging.config.stopListening()
    t.join()
And here is a script that takes a filename and sends that file to the server,
properly preceded with the binary-encoded length, as the new logging
configuration:
#!/usr/bin/env python
import socket, sys, struct
with open(sys.argv[1], 'rb') as f:
    data_to_send = f.read()
HOST = 'localhost'
PORT = 9999
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print('connecting...')
s.connect((HOST, PORT))
print('sending config...')
s.send(struct.pack('>L', len(data_to_send)))
s.send(data_to_send)
s.close()
print('complete')
Sometimes you have to get your logging handlers to do their work without
blocking the thread you’re logging from. This is common in Web applications,
though of course it also occurs in other scenarios.
A common culprit which demonstrates sluggish behaviour is the
SMTPHandler: sending emails can take a long time, for a
number of reasons outside the developer’s control (for example, a poorly
performing mail or network infrastructure). But almost any network-based
handler can block: Even a SocketHandler operation may do a
DNS query under the hood which is too slow (and this query can be deep in the
socket library code, below the Python layer, and outside your control).
One solution is to use a two-part approach. For the first part, attach only a
QueueHandler to those loggers which are accessed from
performance-critical threads. They simply write to their queue, which can be
sized to a large enough capacity or initialized with no upper bound on its
size. The write to the queue will typically be accepted quickly, though you
will probably need to catch the queue.Full exception as a precaution
in your code. If you are a library developer who has performance-critical
threads in your code, be sure to document this (together with a suggestion to
attach only QueueHandlers to your loggers) for the benefit of other
developers who will use your code.
The second part of the solution is QueueListener, which has been
designed as the counterpart to QueueHandler. A
QueueListener is very simple: it’s passed a queue and some handlers,
and it fires up an internal thread which listens to its queue for LogRecords
sent from QueueHandlers (or any other source of LogRecords, for that
matter). The LogRecords are removed from the queue and passed to the
handlers for processing.
The advantage of having a separate QueueListener class is that you
can use the same instance to service multiple QueueHandlers. This is more
resource-friendly than, say, having threaded versions of the existing handler
classes, which would eat up one thread per handler for no particular benefit.
An example of using these two classes follows (imports omitted):
que = queue.Queue(-1) # no limit on size
queue_handler = QueueHandler(que)
handler = logging.StreamHandler()
listener = QueueListener(que, handler)
root = logging.getLogger()
root.addHandler(queue_handler)
formatter = logging.Formatter('%(threadName)s: %(message)s')
handler.setFormatter(formatter)
listener.start()
# The log output will display the thread which generated
# the event (the main thread) rather than the internal
# thread which monitors the internal queue. This is what
# you want to happen.
root.warning('Look out!')
listener.stop()
which, when run, will produce:
MainThread: Look out!
Sending and receiving logging events across a network
Let’s say you want to send logging events across a network, and handle them at
the receiving end. A simple way of doing this is attaching a
SocketHandler instance to the root logger at the sending end:
import logging, logging.handlers
rootLogger = logging.getLogger('')
rootLogger.setLevel(logging.DEBUG)
socketHandler = logging.handlers.SocketHandler('localhost',
logging.handlers.DEFAULT_TCP_LOGGING_PORT)
# don't bother with a formatter, since a socket handler sends the event as
# an unformatted pickle
rootLogger.addHandler(socketHandler)
# Now, we can log to the root logger, or any other logger. First the root...
logging.info('Jackdaws love my big sphinx of quartz.')
# Now, define a couple of other loggers which might represent areas in your
# application:
logger1 = logging.getLogger('myapp.area1')
logger2 = logging.getLogger('myapp.area2')
logger1.debug('Quick zephyrs blow, vexing daft Jim.')
logger1.info('How quickly daft jumping zebras vex.')
logger2.warning('Jail zesty vixen who grabbed pay from quack.')
logger2.error('The five boxing wizards jump quickly.')
At the receiving end, you can set up a receiver using the socketserver
module. Here is a basic working example:
import pickle
import logging
import logging.handlers
import socketserver
import struct
class LogRecordStreamHandler(socketserver.StreamRequestHandler):
    """Handler for a streaming logging request.

    This basically logs the record using whatever logging policy is
    configured locally.
    """

    def handle(self):
        """
        Handle multiple requests - each expected to be a 4-byte length,
        followed by the LogRecord in pickle format. Logs the record
        according to whatever policy is configured locally.
        """
        while True:
            chunk = self.connection.recv(4)
            if len(chunk) < 4:
                break
            slen = struct.unpack('>L', chunk)[0]
            chunk = self.connection.recv(slen)
            while len(chunk) < slen:
                chunk = chunk + self.connection.recv(slen - len(chunk))
            obj = self.unPickle(chunk)
            record = logging.makeLogRecord(obj)
            self.handleLogRecord(record)

    def unPickle(self, data):
        return pickle.loads(data)

    def handleLogRecord(self, record):
        # if a name is specified, we use the named logger rather than the one
        # implied by the record.
        if self.server.logname is not None:
            name = self.server.logname
        else:
            name = record.name
        logger = logging.getLogger(name)
        # N.B. EVERY record gets logged. This is because Logger.handle
        # is normally called AFTER logger-level filtering. If you want
        # to do filtering, do it at the client end to save wasting
        # cycles and network bandwidth!
        logger.handle(record)

class LogRecordSocketReceiver(socketserver.ThreadingTCPServer):
    """
    Simple TCP socket-based logging receiver suitable for testing.
    """

    allow_reuse_address = 1

    def __init__(self, host='localhost',
                 port=logging.handlers.DEFAULT_TCP_LOGGING_PORT,
                 handler=LogRecordStreamHandler):
        socketserver.ThreadingTCPServer.__init__(self, (host, port), handler)
        self.abort = 0
        self.timeout = 1
        self.logname = None

    def serve_until_stopped(self):
        import select
        abort = 0
        while not abort:
            rd, wr, ex = select.select([self.socket.fileno()],
                                       [], [],
                                       self.timeout)
            if rd:
                self.handle_request()
            abort = self.abort

def main():
    logging.basicConfig(
        format='%(relativeCreated)5d %(name)-15s %(levelname)-8s %(message)s')
    tcpserver = LogRecordSocketReceiver()
    print('About to start TCP server...')
    tcpserver.serve_until_stopped()

if __name__ == '__main__':
    main()
First run the server, and then the client. On the client side, nothing is
printed on the console; on the server side, you should see something like:
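About to start TCP server...
   59 root            INFO     Jackdaws love my big sphinx of quartz.
   59 myapp.area1     DEBUG    Quick zephyrs blow, vexing daft Jim.
   69 myapp.area1     INFO     How quickly daft jumping zebras vex.
   69 myapp.area2     WARNING  Jail zesty vixen who grabbed pay from quack.
   69 myapp.area2     ERROR    The five boxing wizards jump quickly.
(the relativeCreated timings at the start of each line will vary from run
to run).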
Note that there are some security issues with pickle in some scenarios. If
these affect you, you can use an alternative serialization scheme by overriding
the makePickle() method and implementing your alternative there, as
well as adapting the above script to use your alternative serialization.
Adding contextual information to your logging output
Sometimes you want logging output to contain contextual information in
addition to the parameters passed to the logging call. For example, in a
networked application, it may be desirable to log client-specific information
in the log (e.g. remote client’s username, or IP address). Although you could
use the extra parameter to achieve this, it’s not always convenient to pass
the information in this way. While it might be tempting to create
Logger instances on a per-connection basis, this is not a good idea
because these instances are not garbage collected. While this is not a problem
in practice, when the number of Logger instances is dependent on the
level of granularity you want to use in logging an application, it could
be hard to manage if the number of Logger instances becomes
effectively unbounded.
Using LoggerAdapters to impart contextual information
An easy way in which you can pass contextual information to be output along
with logging event information is to use the LoggerAdapter class.
This class is designed to look like a Logger, so that you can call
debug(), info(), warning(), error(),
exception(), critical() and log(). These methods have the
same signatures as their counterparts in Logger, so you can use the
two types of instances interchangeably.
When you create an instance of LoggerAdapter, you pass it a
Logger instance and a dict-like object which contains your contextual
information. When you call one of the logging methods on an instance of
LoggerAdapter, it delegates the call to the underlying instance of
Logger passed to its constructor, and arranges to pass the contextual
information in the delegated call. Here’s a snippet from the code of
LoggerAdapter:
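def debug(self, msg, *args, **kwargs):
    """
    Delegate a debug call to the underlying logger, after adding
    contextual information from this adapter instance.
    """
    msg, kwargs = self.process(msg, kwargs)
    self.logger.debug(msg, *args, **kwargs)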
The process() method of LoggerAdapter is where the contextual
information is added to the logging output. It’s passed the message and
keyword arguments of the logging call, and it passes back (potentially)
modified versions of these to use in the call to the underlying logger. The
default implementation of this method leaves the message alone, but inserts
an ‘extra’ key in the keyword argument whose value is the dict-like object
passed to the constructor. Of course, if you had passed an ‘extra’ keyword
argument in the call to the adapter, it will be silently overwritten.
The advantage of using ‘extra’ is that the values in the dict-like object are
merged into the LogRecord instance’s __dict__, allowing you to use
customized strings with your Formatter instances which know about
the keys of the dict-like object. If you need a different method, e.g. if you
want to prepend or append the contextual information to the message string,
you just need to subclass LoggerAdapter and override process()
to do what you need. Here’s an example script which uses this class, which
also illustrates what dict-like behaviour is needed from an arbitrary
‘dict-like’ object for use in the constructor:
import logging

class ConnInfo:
    """
    An example class which shows how an arbitrary class can be used as
    the 'extra' context information repository passed to a LoggerAdapter.
    """

    def __getitem__(self, name):
        """
        To allow this instance to look like a dict.
        """
        from random import choice
        if name == 'ip':
            result = choice(['127.0.0.1', '192.168.0.1'])
        elif name == 'user':
            result = choice(['jim', 'fred', 'sheila'])
        else:
            result = self.__dict__.get(name, '?')
        return result

    def __iter__(self):
        """
        To allow iteration over keys, which will be merged into
        the LogRecord dict before formatting and output.
        """
        keys = ['ip', 'user']
        keys.extend(self.__dict__.keys())
        return keys.__iter__()

if __name__ == '__main__':
    from random import choice
    levels = (logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL)
    a1 = logging.LoggerAdapter(logging.getLogger('a.b.c'),
                               {'ip': '123.231.231.123', 'user': 'sheila'})
    logging.basicConfig(level=logging.DEBUG,
                        format='%(asctime)-15s %(name)-5s %(levelname)-8s IP: %(ip)-15s User: %(user)-8s %(message)s')
    a1.debug('A debug message')
    a1.info('An info message with %s', 'some parameters')
    a2 = logging.LoggerAdapter(logging.getLogger('d.e.f'), ConnInfo())
    for x in range(10):
        lvl = choice(levels)
        lvlname = logging.getLevelName(lvl)
        a2.log(lvl, 'A message at %s level with %d %s', lvlname, 2, 'parameters')
When this script is run, the output should look something like this:
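2008-01-18 14:49:54,023 a.b.c DEBUG    IP: 123.231.231.123 User: sheila   A debug message
2008-01-18 14:49:54,023 a.b.c INFO     IP: 123.231.231.123 User: sheila   An info message with some parameters
2008-01-18 14:49:54,024 d.e.f CRITICAL IP: 192.168.0.1     User: jim      A message at CRITICAL level with 2 parameters
with the dates, IP addresses, users and levels varying from run to run.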
You can also add contextual information to log output using a user-defined
Filter. Filter instances are allowed to modify the LogRecords
passed to them, including adding additional attributes which can then be output
using a suitable format string, or if needed a custom Formatter.
For example in a web application, the request being processed (or at least,
the interesting parts of it) can be stored in a threadlocal
(threading.local) variable, and then accessed from a Filter to
add, say, information from the request - say, the remote IP address and remote
user’s username - to the LogRecord, using the attribute names ‘ip’ and
‘user’ as in the LoggerAdapter example above. In that case, the same format
string can be used to get similar output to that shown above. Here’s an example
script:
import logging
from random import choice

class ContextFilter(logging.Filter):
    """
    This is a filter which injects contextual information into the log.

    Rather than use actual contextual information, we just use random
    data in this demo.
    """

    USERS = ['jim', 'fred', 'sheila']
    IPS = ['123.231.231.123', '127.0.0.1', '192.168.0.1']

    def filter(self, record):
        record.ip = choice(ContextFilter.IPS)
        record.user = choice(ContextFilter.USERS)
        return True

if __name__ == '__main__':
    levels = (logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL)
    logging.basicConfig(level=logging.DEBUG,
                        format='%(asctime)-15s %(name)-5s %(levelname)-8s IP: %(ip)-15s User: %(user)-8s %(message)s')
    a1 = logging.getLogger('a.b.c')
    a2 = logging.getLogger('d.e.f')
    f = ContextFilter()
    a1.addFilter(f)
    a2.addFilter(f)
    a1.debug('A debug message')
    a1.info('An info message with %s', 'some parameters')
    for x in range(10):
        lvl = choice(levels)
        lvlname = logging.getLevelName(lvl)
        a2.log(lvl, 'A message at %s level with %d %s', lvlname, 2, 'parameters')
Although logging is thread-safe, and logging to a single file from multiple
threads in a single process is supported, logging to a single file from
multiple processes is not supported, because there is no standard way to
serialize access to a single file across multiple processes in Python. If you
need to log to a single file from multiple processes, one way of doing this is
to have all the processes log to a SocketHandler, and have a separate
process which implements a socket server which reads from the socket and logs
to file. (If you prefer, you can dedicate one thread in one of the existing
processes to perform this function.) The following section documents this
approach in more detail and includes a working socket receiver which can be
used as a starting point for you to adapt in your own applications.
If you are using a recent version of Python which includes the
multiprocessing module, you could write your own handler which uses the
Lock class from this module to serialize access to the file from
your processes. The existing FileHandler and subclasses do not make
use of multiprocessing at present, though they may do so in the future.
Note that at present, the multiprocessing module does not provide
working lock functionality on all platforms (see
http://bugs.python.org/issue3770).
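A minimal sketch of such a handler might look like the following (the class
name and constructor arguments here are illustrative, not part of the standard
library; the lock must be created before the worker processes are spawned so
that they all share it):

import logging
import multiprocessing

class LockedFileHandler(logging.FileHandler):
    """
    Hypothetical handler which serializes writes across processes
    using a shared multiprocessing.Lock supplied by the application.
    """
    def __init__(self, filename, lock, mode='a'):
        super().__init__(filename, mode)
        self._lock = lock  # a multiprocessing.Lock shared by all processes

    def emit(self, record):
        with self._lock:  # only one process writes at a time
            super().emit(record)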
Alternatively, you can use a Queue and a QueueHandler to send
all logging events to one of the processes in your multi-process application.
The following example script demonstrates how you can do this; in the example
a separate listener process listens for events sent by other processes and logs
them according to its own logging configuration. Although the example only
demonstrates one way of doing it (for example, you may want to use a listener
thread rather than a separate listener process – the implementation would be
analogous) it does allow for completely different logging configurations for
the listener and the other processes in your application, and can be used as
the basis for code meeting your own specific requirements:
# You'll need these imports in your own code
import logging
import logging.handlers
import multiprocessing

# Next two import lines for this demo only
from random import choice, random
import time

#
# Because you'll want to define the logging configurations for listener and workers, the
# listener and worker process functions take a configurer parameter which is a callable
# for configuring logging for that process. These functions are also passed the queue,
# which they use for communication.
#
# In practice, you can configure the listener however you want, but note that in this
# simple example, the listener does not apply level or filter logic to received records.
# In practice, you would probably want to do this logic in the worker processes, to avoid
# sending events which would be filtered out between processes.
#
# The size of the rotated files is made small so you can see the results easily.
def listener_configurer():
    root = logging.getLogger()
    h = logging.handlers.RotatingFileHandler('mptest.log', 'a', 300, 10)
    f = logging.Formatter('%(asctime)s %(processName)-10s %(name)s %(levelname)-8s %(message)s')
    h.setFormatter(f)
    root.addHandler(h)

# This is the listener process top-level loop: wait for logging events
# (LogRecords) on the queue and handle them, quit when you get a None for a
# LogRecord.
def listener_process(queue, configurer):
    configurer()
    while True:
        try:
            record = queue.get()
            if record is None:  # We send this as a sentinel to tell the listener to quit.
                break
            logger = logging.getLogger(record.name)
            logger.handle(record)  # No level or filter logic applied - just do it!
        except (KeyboardInterrupt, SystemExit):
            raise
        except:
            import sys, traceback
            print('Whoops! Problem:', file=sys.stderr)
            traceback.print_exc(file=sys.stderr)

# Arrays used for random selections in this demo
LEVELS = [logging.DEBUG, logging.INFO, logging.WARNING,
          logging.ERROR, logging.CRITICAL]

LOGGERS = ['a.b.c', 'd.e.f']

MESSAGES = [
    'Random message #1',
    'Random message #2',
    'Random message #3',
]

# The worker configuration is done at the start of the worker process run.
# Note that on Windows you can't rely on fork semantics, so each process
# will run the logging configuration code when it starts.
def worker_configurer(queue):
    h = logging.handlers.QueueHandler(queue)  # Just the one handler needed
    root = logging.getLogger()
    root.addHandler(h)
    root.setLevel(logging.DEBUG)  # send all messages, for demo; no other level or filter logic applied.

# This is the worker process top-level loop, which just logs ten events with
# random intervening delays before terminating.
# The print messages are just so you know it's doing something!
def worker_process(queue, configurer):
    configurer(queue)
    name = multiprocessing.current_process().name
    print('Worker started: %s' % name)
    for i in range(10):
        time.sleep(random())
        logger = logging.getLogger(choice(LOGGERS))
        level = choice(LEVELS)
        message = choice(MESSAGES)
        logger.log(level, message)
    print('Worker finished: %s' % name)

# Here's where the demo gets orchestrated. Create the queue, create and start
# the listener, create ten workers and start them, wait for them to finish,
# then send a None to the queue to tell the listener to finish.
def main():
    queue = multiprocessing.Queue(-1)
    listener = multiprocessing.Process(target=listener_process,
                                       args=(queue, listener_configurer))
    listener.start()
    workers = []
    for i in range(10):
        worker = multiprocessing.Process(target=worker_process,
                                         args=(queue, worker_configurer))
        workers.append(worker)
        worker.start()
    for w in workers:
        w.join()
    queue.put_nowait(None)
    listener.join()

if __name__ == '__main__':
    main()
A variant of the above script keeps the logging in the main process, in a
separate thread:
import logging
import logging.config
import logging.handlers
from multiprocessing import Process, Queue
import random
import threading
import time

def logger_thread(q):
    while True:
        record = q.get()
        if record is None:
            break
        logger = logging.getLogger(record.name)
        logger.handle(record)

def worker_process(q):
    qh = logging.handlers.QueueHandler(q)
    root = logging.getLogger()
    root.setLevel(logging.DEBUG)
    root.addHandler(qh)
    levels = [logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR,
              logging.CRITICAL]
    loggers = ['foo', 'foo.bar', 'foo.bar.baz',
               'spam', 'spam.ham', 'spam.ham.eggs']
    for i in range(100):
        lvl = random.choice(levels)
        logger = logging.getLogger(random.choice(loggers))
        logger.log(lvl, 'Message no. %d', i)

if __name__ == '__main__':
    q = Queue()
    d = {
        'version': 1,
        'formatters': {
            'detailed': {
                'class': 'logging.Formatter',
                'format': '%(asctime)s %(name)-15s %(levelname)-8s %(processName)-10s %(message)s'
            }
        },
        'handlers': {
            'console': {
                'class': 'logging.StreamHandler',
                'level': 'INFO',
            },
            'file': {
                'class': 'logging.FileHandler',
                'filename': 'mplog.log',
                'mode': 'w',
                'formatter': 'detailed',
            },
            'foofile': {
                'class': 'logging.FileHandler',
                'filename': 'mplog-foo.log',
                'mode': 'w',
                'formatter': 'detailed',
            },
            'errors': {
                'class': 'logging.FileHandler',
                'filename': 'mplog-errors.log',
                'mode': 'w',
                'level': 'ERROR',
                'formatter': 'detailed',
            },
        },
        'loggers': {
            'foo': {
                'handlers': ['foofile']
            }
        },
        'root': {
            'level': 'DEBUG',
            'handlers': ['console', 'file', 'errors']
        },
    }
    workers = []
    for i in range(5):
        wp = Process(target=worker_process, name='worker %d' % (i + 1), args=(q,))
        workers.append(wp)
        wp.start()
    logging.config.dictConfig(d)
    lp = threading.Thread(target=logger_thread, args=(q,))
    lp.start()
    # At this point, the main process could do some useful work of its own
    # Once it's done that, it can wait for the workers to terminate...
    for wp in workers:
        wp.join()
    # And now tell the logging thread to finish up, too
    q.put(None)
    lp.join()
This variant shows how you can apply configuration for particular loggers
- e.g. the foo logger has a special handler which stores all events in the
foo subsystem in a file mplog-foo.log. This will be used by the logging
machinery in the main process (even though the logging events are generated in
the worker processes) to direct the messages to the appropriate destinations.
Sometimes you want to let a log file grow to a certain size, then open a new
file and log to that. You may want to keep a certain number of these files, and
when that many files have been created, rotate the files so that the number of
files and the size of the files both remain bounded. For this usage pattern, the
logging package provides a RotatingFileHandler:
import glob
import logging
import logging.handlers

LOG_FILENAME = 'logging_rotatingfile_example.out'

# Set up a specific logger with our desired output level
my_logger = logging.getLogger('MyLogger')
my_logger.setLevel(logging.DEBUG)

# Add the log message handler to the logger
handler = logging.handlers.RotatingFileHandler(
    LOG_FILENAME, maxBytes=20, backupCount=5)
my_logger.addHandler(handler)

# Log some messages
for i in range(20):
    my_logger.debug('i = %d' % i)

# See what files are created
logfiles = glob.glob('%s*' % LOG_FILENAME)
for filename in logfiles:
    print(filename)
The result should be 6 separate files, each with part of the log history for the
application:
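logging_rotatingfile_example.out
logging_rotatingfile_example.out.1
logging_rotatingfile_example.out.2
logging_rotatingfile_example.out.3
logging_rotatingfile_example.out.4
logging_rotatingfile_example.out.5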
The most current file is always logging_rotatingfile_example.out,
and each time it reaches the size limit it is renamed with the suffix
.1. Each of the existing backup files is renamed to increment the suffix
(.1 becomes .2, etc.) and the oldest file, .5, is erased.
Obviously this example sets the log size much too small as an extreme
example. You would want to set maxBytes to an appropriate value.
You can use a QueueHandler subclass to send messages to other kinds
of queues, for example a ZeroMQ ‘publish’ socket. In the example below, the
socket is created separately and passed to the handler (as its ‘queue’):
import zmq   # using pyzmq, the Python binding for ZeroMQ
import json  # for serializing records portably

from logging.handlers import QueueHandler

ctx = zmq.Context()
sock = zmq.Socket(ctx, zmq.PUB)  # or zmq.PUSH, or other suitable value
sock.bind('tcp://*:5556')        # or wherever

class ZeroMQSocketHandler(QueueHandler):
    def enqueue(self, record):
        data = json.dumps(record.__dict__)
        self.queue.send(data)

handler = ZeroMQSocketHandler(sock)
Of course there are other ways of organizing this, for example passing in the
data needed by the handler to create the socket itself. A possible sketch
(the constructor parameters here are illustrative):
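import json
import zmq
from logging.handlers import QueueHandler

class ZeroMQSocketHandler(QueueHandler):
    # A sketch: the handler creates and owns its socket; the uri,
    # socktype and ctx parameters here are illustrative.
    def __init__(self, uri, socktype=zmq.PUB, ctx=None):
        self.ctx = ctx or zmq.Context()
        socket = zmq.Socket(self.ctx, socktype)
        socket.bind(uri)
        super().__init__(socket)

    def enqueue(self, record):
        self.queue.send_json(record.__dict__)

    def close(self):
        self.queue.close()
        super().close()

handler = ZeroMQSocketHandler('tcp://*:5556')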
This document is an introductory tutorial to using regular expressions in Python
with the re module. It provides a gentler introduction than the
corresponding section in the Library Reference.
Regular expressions (called REs, or regexes, or regex patterns) are essentially
a tiny, highly specialized programming language embedded inside Python and made
available through the re module. Using this little language, you specify
the rules for the set of possible strings that you want to match; this set might
contain English sentences, or e-mail addresses, or TeX commands, or anything you
like. You can then ask questions such as “Does this string match the pattern?”,
or “Is there a match for the pattern anywhere in this string?”. You can also
use REs to modify a string or to split it apart in various ways.
Regular expression patterns are compiled into a series of bytecodes which are
then executed by a matching engine written in C. For advanced use, it may be
necessary to pay careful attention to how the engine will execute a given RE,
and write the RE in a certain way in order to produce bytecode that runs faster.
Optimization isn’t covered in this document, because it requires that you have a
good understanding of the matching engine’s internals.
The regular expression language is relatively small and restricted, so not all
possible string processing tasks can be done using regular expressions. There
are also tasks that can be done with regular expressions, but the expressions
turn out to be very complicated. In these cases, you may be better off writing
Python code to do the processing; while Python code will be slower than an
elaborate regular expression, it will also probably be more understandable.
We’ll start by learning about the simplest possible regular expressions. Since
regular expressions are used to operate on strings, we’ll begin with the most
common task: matching characters.
For a detailed explanation of the computer science underlying regular
expressions (deterministic and non-deterministic finite automata), you can refer
to almost any textbook on writing compilers.
Most letters and characters will simply match themselves. For example, the
regular expression test will match the string test exactly. (You can
enable a case-insensitive mode that would let this RE match Test or TEST
as well; more about this later.)
There are exceptions to this rule; some characters are special
metacharacters, and don’t match themselves. Instead, they signal that
some out-of-the-ordinary thing should be matched, or they affect other portions
of the RE by repeating them or changing their meaning. Much of this document is
devoted to discussing various metacharacters and what they do.
Here’s a complete list of the metacharacters; their meanings will be discussed
in the rest of this HOWTO.
. ^ $ * + ? { } [ ] \ | ( )
The first metacharacters we’ll look at are [ and ]. They’re used for
specifying a character class, which is a set of characters that you wish to
match. Characters can be listed individually, or a range of characters can be
indicated by giving two characters and separating them by a '-'. For
example, [abc] will match any of the characters a, b, or c; this
is the same as [a-c], which uses a range to express the same set of
characters. If you wanted to match only lowercase letters, your RE would be
[a-z].
Metacharacters are not active inside classes. For example, [akm$] will
match any of the characters 'a', 'k', 'm', or '$'; '$' is
usually a metacharacter, but inside a character class it’s stripped of its
special nature.
You can match the characters not listed within the class by complementing
the set. This is indicated by including a '^' as the first character of the
class; '^' outside a character class will simply match the '^'
character. For example, [^5] will match any character except '5'.
Perhaps the most important metacharacter is the backslash, \. As in Python
string literals, the backslash can be followed by various characters to signal
various special sequences. It’s also used to escape all the metacharacters so
you can still match them in patterns; for example, if you need to match a [
or \, you can precede them with a backslash to remove their special
meaning: \[ or \\.
Some of the special sequences beginning with '\' represent predefined sets
of characters that are often useful, such as the set of digits, the set of
letters, or the set of anything that isn’t whitespace. The following predefined
special sequences are a subset of those available. The equivalent classes
shown are those for bytes patterns. For a complete list of sequences and
expanded class definitions for Unicode string patterns, see the last part of
Regular Expression Syntax.
\d
Matches any decimal digit; this is equivalent to the class [0-9].
\D
Matches any non-digit character; this is equivalent to the class [^0-9].
\s
Matches any whitespace character; this is equivalent to the class [\t\n\r\f\v].
\S
Matches any non-whitespace character; this is equivalent to the class [^\t\n\r\f\v].
\w
Matches any alphanumeric character; this is equivalent to the class
[a-zA-Z0-9_].
\W
Matches any non-alphanumeric character; this is equivalent to the class
[^a-zA-Z0-9_].
These sequences can be included inside a character class. For example,
[\s,.] is a character class that will match any whitespace character, or
',' or '.'.
The final metacharacter in this section is .. It matches anything except a
newline character, and there’s an alternate mode (re.DOTALL) where it will
match even a newline. '.' is often used where you want to match “any
character”.
Being able to match varying sets of characters is the first thing regular
expressions can do that isn’t already possible with the methods available on
strings. However, if that was the only additional capability of regexes, they
wouldn’t be much of an advance. Another capability is that you can specify that
portions of the RE must be repeated a certain number of times.
The first metacharacter for repeating things that we’ll look at is *. *
doesn’t match the literal character *; instead, it specifies that the
previous character can be matched zero or more times, instead of exactly once.
For example, ca*t will match ct (0 a characters), cat (1 a),
caaat (3 a characters), and so forth. The RE engine has various
internal limitations stemming from the size of C’s int type that will
prevent it from matching over 2 billion a characters; you probably don’t
have enough memory to construct a string that large, so you shouldn’t run into
that limit.
Repetitions such as * are greedy; when repeating a RE, the matching
engine will try to repeat it as many times as possible. If later portions of the
pattern don’t match, the matching engine will then back up and try again with
fewer repetitions.
A step-by-step example will make this more obvious. Let’s consider the
expression a[bcd]*b. This matches the letter 'a', zero or more letters
from the class [bcd], and finally ends with a 'b'. Now imagine matching
this RE against the string abcbd.
Step   Matched   Explanation
1      a         The a in the RE matches.
2      abcbd     The engine matches [bcd]*, going as far as it can, which is
                 to the end of the string.
3      Failure   The engine tries to match b, but the current position is at
                 the end of the string, so it fails.
4      abcb      Back up, so that [bcd]* matches one less character.
5      Failure   Try b again, but the current position is at the last
                 character, which is a 'd'.
6      abc       Back up again, so that [bcd]* is only matching bc.
7      abcb      Try b again. This time the character at the current position
                 is 'b', so it succeeds.
The end of the RE has now been reached, and it has matched abcb. This
demonstrates how the matching engine goes as far as it can at first, and if no
match is found it will then progressively back up and retry the rest of the RE
again and again. It will back up until it has tried zero matches for
[bcd]*, and if that subsequently fails, the engine will conclude that the
string doesn’t match the RE at all.
Another repeating metacharacter is +, which matches one or more times. Pay
careful attention to the difference between * and +; * matches
zero or more times, so whatever’s being repeated may not be present at all,
while + requires at least one occurrence. To use a similar example,
ca+t will match cat (1 a), caaat (3 a characters), but won’t match
ct.
There are two more repeating qualifiers. The question mark character, ?,
matches either once or zero times; you can think of it as marking something as
being optional. For example, home-?brew matches either homebrew or
home-brew.
The most complicated repeated qualifier is {m,n}, where m and n are
decimal integers. This qualifier means there must be at least m repetitions,
and at most n. For example, a/{1,3}b will match a/b, a//b, and
a///b. It won’t match ab, which has no slashes, or a////b, which
has four.
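For example, using re.match(), which is introduced in the next section:

>>> import re
>>> re.match('a/{1,3}b', 'a//b').group()
'a//b'
>>> print(re.match('a/{1,3}b', 'a////b'))
None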
You can omit either m or n; in that case, a reasonable value is assumed for
the missing value. Omitting m is interpreted as a lower limit of 0, while
omitting n results in an upper bound of infinity — actually, the upper bound
is the 2-billion limit mentioned earlier, but that might as well be infinity.
Readers of a reductionist bent may notice that the three other qualifiers can
all be expressed using this notation. {0,} is the same as *, {1,}
is equivalent to +, and {0,1} is the same as ?. It’s better to use
*, +, or ? when you can, simply because they’re shorter and easier
to read.
Now that we’ve looked at some simple regular expressions, how do we actually use
them in Python? The re module provides an interface to the regular
expression engine, allowing you to compile REs into objects and then perform
matches with them.
Regular expressions are compiled into pattern objects, which have
methods for various operations such as searching for pattern matches or
performing string substitutions.
>>> import re
>>> p = re.compile('ab*')
>>> p
<_sre.SRE_Pattern object at 0x...>
re.compile() also accepts an optional flags argument, used to enable
various special features and syntax variations. We’ll go over the available
settings later, but for now a single example will do:
>>> p = re.compile('ab*', re.IGNORECASE)
The RE is passed to re.compile() as a string. REs are handled as strings
because regular expressions aren’t part of the core Python language, and no
special syntax was created for expressing them. (There are applications that
don’t need REs at all, so there’s no need to bloat the language specification by
including them.) Instead, the re module is simply a C extension module
included with Python, just like the socket or zlib modules.
Putting REs in strings keeps the Python language simpler, but has one
disadvantage which is the topic of the next section.
As stated earlier, regular expressions use the backslash character ('\') to
indicate special forms or to allow special characters to be used without
invoking their special meaning. This conflicts with Python’s usage of the same
character for the same purpose in string literals.
Let’s say you want to write a RE that matches the string \section, which
might be found in a LaTeX file. To figure out what to write in the program
code, start with the desired string to be matched. Next, you must escape any
backslashes and other metacharacters by preceding them with a backslash,
resulting in the string \\section. The string passed to re.compile()
must therefore be \\section. However, to express this as a
Python string literal, both backslashes must be escaped again.
In short, to match a literal backslash, one has to write '\\\\' as the RE
string, because the regular expression must be \\, and each backslash must
be expressed as \\ inside a regular Python string literal. In REs that
feature backslashes repeatedly, this leads to lots of repeated backslashes and
makes the resulting strings difficult to understand.
The solution is to use Python’s raw string notation for regular expressions;
backslashes are not handled in any special way in a string literal prefixed with
'r', so r"\n" is a two-character string containing '\' and 'n',
while "\n" is a one-character string containing a newline. Regular
expressions will often be written in Python code using this raw string notation.
Once you have an object representing a compiled regular expression, what do you
do with it? Pattern objects have several methods and attributes.
Only the most significant ones will be covered here; consult the re docs
for a complete listing.
Method/Attribute   Purpose
match()            Determine if the RE matches at the beginning of the string.
search()           Scan through a string, looking for any location where this
                   RE matches.
findall()          Find all substrings where the RE matches, and return them
                   as a list.
finditer()         Find all substrings where the RE matches, and return them
                   as an iterator.
match() and search() return None if no match can be found. If
they’re successful, a MatchObject instance is returned, containing
information about the match: where it starts and ends, the substring it matched,
and more.
You can learn about this by interactively experimenting with the re
module. If you have tkinter available, you may also want to look at
Tools/demo/redemo.py, a demonstration program included with the
Python distribution. It allows you to enter REs and strings, and displays
whether the RE matches or fails. redemo.py can be quite useful when
trying to debug a complicated RE. Phil Schwartz’s Kodos is also an interactive tool for developing and
testing RE patterns.
This HOWTO uses the standard Python interpreter for its examples. First, run the
Python interpreter, import the re module, and compile a RE:
>>> import re
>>> p = re.compile('[a-z]+')
>>> p
<_sre.SRE_Pattern object at 0x...>
Now, you can try matching various strings against the RE [a-z]+. An empty
string shouldn’t match at all, since + means ‘one or more repetitions’.
match() should return None in this case, which will cause the
interpreter to print no output. You can explicitly print the result of
match() to make this clear.
>>>p.match("")>>>print(p.match(""))None
Now, let’s try it on a string that it should match, such as tempo. In this
case, match() will return a MatchObject, so you should store the
result in a variable for later use.
>>> m = p.match('tempo')
>>> m
<_sre.SRE_Match object at 0x...>
Now you can query the MatchObject for information about the matching
string. MatchObject instances also have several methods and
attributes; the most important ones are:
Method/Attribute   Purpose
group()            Return the string matched by the RE
start()            Return the starting position of the match
end()              Return the ending position of the match
span()             Return a tuple containing the (start, end) positions of
                   the match
Trying these methods will soon clarify their meaning:
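>>> m.group()
'tempo'
>>> m.start(), m.end()
(0, 5)
>>> m.span()
(0, 5)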
group() returns the substring that was matched by the RE. start()
and end() return the starting and ending index of the match. span()
returns both start and end indexes in a single tuple. Since the match()
method only checks if the RE matches at the start of a string, start()
will always be zero. However, the search() method of patterns
scans through the string, so the match may not start at zero in that
case.
findall() has to create the entire list before it can be returned as the
result. The finditer() method returns a sequence of MatchObject
instances as an iterator:
>>> iterator = p.finditer('12 drummers drumming, 11 ... 10 ...')
>>> iterator
<callable_iterator object at 0x...>
>>> for match in iterator:
... print(match.span())
...
(0, 2)
(22, 24)
(29, 31)
You don’t have to create a pattern object and call its methods; the
re module also provides top-level functions called match(),
search(), findall(), sub(), and so forth. These functions
take the same arguments as the corresponding pattern method, with
the RE string added as the first argument, and still return either None or a
MatchObject instance.
>>> print(re.match(r'From\s+', 'Fromage amk'))
None
>>> re.match(r'From\s+', 'From amk Thu May 14 19:12:10 1998')
<_sre.SRE_Match object at 0x...>
Under the hood, these functions simply create a pattern object for you
and call the appropriate method on it. They also store the compiled object in a
cache, so future calls using the same RE are faster.
Should you use these module-level functions, or should you get the
pattern and call its methods yourself? That choice depends on how
frequently the RE will be used, and on your personal coding style. If the RE is
being used at only one point in the code, then the module functions are probably
more convenient. If a program contains a lot of regular expressions, or re-uses
the same ones in several locations, then it might be worthwhile to collect all
the definitions in one place, in a section of code that compiles all the REs
ahead of time. To take an example from the standard library, here’s an extract
from the now deprecated xmllib.py:
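ref = re.compile( ... )
entityref = re.compile( ... )
charref = re.compile( ... )
starttagopen = re.compile( ... )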
Compilation flags let you modify some aspects of how regular expressions work.
Flags are available in the re module under two names, a long name such as
IGNORECASE and a short, one-letter form such as I. (If you’re
familiar with Perl’s pattern modifiers, the one-letter forms use the same
letters; the short form of re.VERBOSE is re.X, for example.)
Multiple flags can be specified by bitwise OR-ing them; re.I|re.M sets
both the I and M flags, for example.
Here’s a table of the available flags, followed by a more detailed explanation
of each one.
Flag            Meaning
DOTALL, S       Make . match any character, including newlines
IGNORECASE, I   Do case-insensitive matches
LOCALE, L       Do a locale-aware match
MULTILINE, M    Multi-line matching, affecting ^ and $
VERBOSE, X      Enable verbose REs, which can be organized more cleanly and
                understandably.
ASCII, A        Makes several escapes like \w, \b, \s and \d match only on
                ASCII characters with the respective property.
I, IGNORECASE
Perform case-insensitive matching; character classes and literal strings will
match letters by ignoring case. For example, [A-Z] will match lowercase
letters, too, and Spam will match Spam, spam, or spAM. This
lowercasing doesn’t take the current locale into account; it will if you also
set the LOCALE flag.
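For example:

>>> p = re.compile('spam', re.IGNORECASE)
>>> p.findall('Spam, spam, sPAM')
['Spam', 'spam', 'sPAM']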
L, LOCALE
Make \w, \W, \b, and \B, dependent on the current locale.
Locales are a feature of the C library intended to help in writing programs that
take account of language differences. For example, if you’re processing French
text, you’d want to be able to write \w+ to match words, but \w only
matches the character class [A-Za-z]; it won’t match 'é' or 'ç'. If
your system is configured properly and a French locale is selected, certain C
functions will tell the program that 'é' should also be considered a letter.
Setting the LOCALE flag when compiling a regular expression will cause
the resulting compiled object to use these C functions for \w; this is
slower, but also enables \w+ to match French words as you’d expect.
M, MULTILINE
(^ and $ haven’t been explained yet; they’ll be introduced in section
More Metacharacters.)
Usually ^ matches only at the beginning of the string, and $ matches
only at the end of the string and immediately before the newline (if any) at the
end of the string. When this flag is specified, ^ matches at the beginning
of the string and at the beginning of each line within the string, immediately
following each newline. Similarly, the $ metacharacter matches both at
the end of the string and at the end of each line (immediately preceding each
newline).
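For example:

>>> p = re.compile('^From', re.MULTILINE)
>>> p.findall('From here\nto there\nFrom everywhere')
['From', 'From']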
S, DOTALL
Makes the '.' special character match any character at all, including a
newline; without this flag, '.' will match anything except a newline.
A, ASCII
Make \w, \W, \b, \B, \s and \S perform ASCII-only
matching instead of full Unicode matching. This is only meaningful for
Unicode patterns, and is ignored for byte patterns.
X, VERBOSE
This flag allows you to write regular expressions that are more readable by
granting you more flexibility in how you can format them. When this flag has
been specified, whitespace within the RE string is ignored, except when the
whitespace is in a character class or preceded by an unescaped backslash; this
lets you organize and indent the RE more clearly. This flag also lets you put
comments within a RE that will be ignored by the engine; comments are marked by
a '#' that’s neither in a character class nor preceded by an unescaped
backslash.
For example, here’s a RE that uses re.VERBOSE; see how much easier it
is to read?
charref = re.compile(r"""
&[#] # Start of a numeric entity reference
(
0[0-7]+ # Octal form
| [0-9]+ # Decimal form
| x[0-9a-fA-F]+ # Hexadecimal form
)
; # Trailing semicolon
""", re.VERBOSE)
Without the verbose setting, the RE would look like this:
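charref = re.compile("&#(0[0-7]+"
                     "|[0-9]+"
                     "|x[0-9a-fA-F]+);")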
In the above example, Python’s automatic concatenation of string literals has
been used to break up the RE into smaller pieces, but it’s still more difficult
to understand than the version using re.VERBOSE.
So far we’ve only covered a part of the features of regular expressions. In
this section, we’ll cover some new metacharacters, and how to use groups to
retrieve portions of the text that was matched.
There are some metacharacters that we haven’t covered yet. Most of them will be
covered in this section.
Some of the remaining metacharacters to be discussed are zero-width
assertions. They don’t cause the engine to advance through the string;
instead, they consume no characters at all, and simply succeed or fail. For
example, \b is an assertion that the current position is located at a word
boundary; the position isn’t changed by the \b at all. This means that
zero-width assertions should never be repeated, because if they match once at a
given location, they can obviously be matched an infinite number of times.
|
Alternation, or the “or” operator. If A and B are regular expressions,
A|B will match any string that matches either A or B. | has very
low precedence in order to make it work reasonably when you’re alternating
multi-character strings. Crow|Servo will match either Crow or Servo,
not Cro, a 'w' or an 'S', and ervo.
To match a literal '|', use \|, or enclose it inside a character class,
as in [|].
^
Matches at the beginning of lines. Unless the MULTILINE flag has been
set, this will only match at the beginning of the string. In MULTILINE
mode, this also matches immediately after each newline within the string.
For example, if you wish to match the word From only at the beginning of a
line, the RE to use is ^From.
>>> print(re.search('^From', 'From Here to Eternity'))
<_sre.SRE_Match object at 0x...>
>>> print(re.search('^From', 'Reciting From Memory'))
None
$
Matches at the end of a line, which is defined as either the end of the string,
or any location followed by a newline character.
>>> print(re.search('}$', '{block}'))
<_sre.SRE_Match object at 0x...>
>>> print(re.search('}$', '{block} '))
None
>>> print(re.search('}$', '{block}\n'))
<_sre.SRE_Match object at 0x...>
To match a literal '$', use \$ or enclose it inside a character class,
as in [$].
\A
Matches only at the start of the string. When not in MULTILINE mode,
\A and ^ are effectively the same. In MULTILINE mode, they’re
different: \A still matches only at the beginning of the string, but ^
may match at any location inside the string that follows a newline character.
\Z
Matches only at the end of the string.
\b
Word boundary. This is a zero-width assertion that matches only at the
beginning or end of a word. A word is defined as a sequence of alphanumeric
characters, so the end of a word is indicated by whitespace or a
non-alphanumeric character.
The following example matches class only when it’s a complete word; it won’t
match when it’s contained inside another word.
>>> p = re.compile(r'\bclass\b')
>>> print(p.search('no class at all'))
<_sre.SRE_Match object at 0x...>
>>> print(p.search('the declassified algorithm'))
None
>>> print(p.search('one subclass is'))
None
There are two subtleties you should remember when using this special sequence.
First, this is the worst collision between Python’s string literals and regular
expression sequences. In Python’s string literals, \b is the backspace
character, ASCII value 8. If you’re not using raw strings, then Python will
convert the \b to a backspace, and your RE won’t match as you expect it to.
The following example looks the same as our previous RE, but omits the 'r'
in front of the RE string.
>>> p = re.compile('\bclass\b')
>>> print(p.search('no class at all'))
None
>>> print(p.search('\b' + 'class' + '\b') )
<_sre.SRE_Match object at 0x...>
Second, inside a character class, where there’s no use for this assertion,
\b represents the backspace character, for compatibility with Python’s
string literals.
\B
Another zero-width assertion, this is the opposite of \b, only matching when
the current position is not at a word boundary.
Frequently you need to obtain more information than just whether the RE matched
or not. Regular expressions are often used to dissect strings by writing a RE
divided into several subgroups which match different components of interest.
For example, an RFC-822 header line is divided into a header name and a value,
separated by a ':', like this:
From: author@example.com
User-Agent: Thunderbird 1.5.0.9 (X11/20061227)
MIME-Version: 1.0
To: editor@example.com
This can be handled by writing a regular expression which matches an entire
header line, and has one group which matches the header name, and another group
which matches the header’s value.
Groups are marked by the '(', ')' metacharacters. '(' and ')'
have much the same meaning as they do in mathematical expressions; they group
together the expressions contained inside them, and you can repeat the contents
of a group with a repeating qualifier, such as *, +, ?, or
{m,n}. For example, (ab)* will match zero or more repetitions of
ab.
>>> p = re.compile('(ab)*')
>>> print(p.match('ababababab').span())
(0, 10)
Groups indicated with '(', ')' also capture the starting and ending
index of the text that they match; this can be retrieved by passing an argument
to group(), start(), end(), and span(). Groups are
numbered starting with 0. Group 0 is always present; it’s the whole RE, so
MatchObject methods all have group 0 as their default argument. Later
we’ll see how to express groups that don’t capture the span of text that they
match.
>>> p = re.compile('(a)b')
>>> m = p.match('ab')
>>> m.group()
'ab'
>>> m.group(0)
'ab'
Subgroups are numbered from left to right, from 1 upward. Groups can be nested;
to determine the number, just count the opening parenthesis characters, going
from left to right.
>>> p = re.compile('(a(b)c)d')
>>> m = p.match('abcd')
>>> m.group(0)
'abcd'
>>> m.group(1)
'abc'
>>> m.group(2)
'b'
group() can be passed multiple group numbers at a time, in which case it
will return a tuple containing the corresponding values for those groups.
>>> m.group(2,1,2)
('b', 'abc', 'b')
The groups() method returns a tuple containing the strings for all the
subgroups, from 1 up to however many there are.
>>> m.groups()
('abc', 'b')
Backreferences in a pattern allow you to specify that the contents of an earlier
capturing group must also be found at the current location in the string. For
example, \1 will succeed if the exact contents of group 1 can be found at
the current position, and fails otherwise. Remember that Python’s string
literals also use a backslash followed by numbers to allow including arbitrary
characters in a string, so be sure to use a raw string when incorporating
backreferences in a RE.
For example, the following RE detects doubled words in a string.
>>> p = re.compile(r'(\b\w+)\s+\1')
>>> p.search('Paris in the the spring').group()
'the the'
Backreferences like this aren’t often useful for just searching through a string
— there are few text formats which repeat data in this way — but you’ll soon
find out that they’re very useful when performing string substitutions.
Elaborate REs may use many groups, both to capture substrings of interest, and
to group and structure the RE itself. In complex REs, it becomes difficult to
keep track of the group numbers. There are two features which help with this
problem. Both of them use a common syntax for regular expression extensions, so
we’ll look at that first.
Perl 5 added several additional features to standard regular expressions, and
the Python re module supports most of them. It would have been
difficult to choose new single-keystroke metacharacters or new special sequences
beginning with \ to represent the new features without making Perl’s regular
expressions confusingly different from standard REs. If & had been chosen as a
new metacharacter, for example, old expressions would have assumed that & was
a regular character and wouldn’t have escaped it by writing \& or [&].
The solution chosen by the Perl developers was to use (?...) as the
extension syntax. ? immediately after a parenthesis was a syntax error
because the ? would have nothing to repeat, so this didn’t introduce any
compatibility problems. The characters immediately after the ? indicate
what extension is being used, so (?=foo) is one thing (a positive lookahead
assertion) and (?:foo) is something else (a non-capturing group containing
the subexpression foo).
Python adds an extension syntax to Perl’s extension syntax. If the first
character after the question mark is a P, you know that it’s an extension
that’s specific to Python. Currently there are two such extensions:
(?P<name>...) defines a named group, and (?P=name) is a backreference to
a named group. If future versions of Perl 5 add similar features using a
different syntax, the re module will be changed to support the new
syntax, while preserving the Python-specific syntax for compatibility’s sake.
Now that we’ve looked at the general extension syntax, we can return to the
features that simplify working with groups in complex REs. Since groups are
numbered from left to right and a complex expression may use many groups, it can
become difficult to keep track of the correct numbering. Modifying such a
complex RE is annoying, too: insert a new group near the beginning and you
change the numbers of everything that follows it.
Sometimes you’ll want to use a group to collect a part of a regular expression,
but aren’t interested in retrieving the group’s contents. You can make this fact
explicit by using a non-capturing group: (?:...), where you can replace the
... with any other regular expression.
Except for the fact that you can’t retrieve the contents of what the group
matched, a non-capturing group behaves exactly the same as a capturing group;
you can put anything inside it, repeat it with a repetition metacharacter such
as *, and nest it within other groups (capturing or non-capturing).
(?:...) is particularly useful when modifying an existing pattern, since you
can add new groups without changing how all the other groups are numbered. It
should be mentioned that there’s no performance difference in searching between
capturing and non-capturing groups; neither form is any faster than the other.
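For example:

>>> m = re.match('([ab])(?:[cd])([ef])', 'ace')
>>> m.groups()
('a', 'e')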
A more significant feature is named groups: instead of referring to them by
numbers, groups can be referenced by a name.
The syntax for a named group is one of the Python-specific extensions:
(?P<name>...). name is, obviously, the name of the group. Named groups
also behave exactly like capturing groups, and additionally associate a name
with a group. The MatchObject methods that deal with capturing groups
all accept either integers that refer to the group by number or strings that
contain the desired group’s name. Named groups are still given numbers, so you
can retrieve information about a group in two ways:
>>> p = re.compile(r'(?P<word>\b\w+\b)')
>>> m = p.search( '(((( Lots of punctuation )))' )
>>> m.group('word')
'Lots'
>>> m.group(1)
'Lots'
Named groups are handy because they let you use easily-remembered names, instead
of having to remember numbers. Here’s an example RE from the imaplib
module:
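InternalDate = re.compile(r'INTERNALDATE "'
        r'(?P<day>[ 123][0-9])-(?P<mon>[A-Z][a-z][a-z])-'
        r'(?P<year>[0-9][0-9][0-9][0-9])'
        r' (?P<hour>[0-9][0-9]):(?P<min>[0-9][0-9]):(?P<sec>[0-9][0-9])'
        r' (?P<zonen>[-+])(?P<zoneh>[0-9][0-9])(?P<zonem>[0-9][0-9])'
        r'"')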
It’s obviously much easier to retrieve m.group('zonem'), instead of having
to remember to retrieve group 9.
The syntax for backreferences in an expression such as (...)\1 refers to the
number of the group. There’s naturally a variant that uses the group name
instead of the number. This is another Python extension: (?P=name) indicates
that the contents of the group called name should again be matched at the
current point. The regular expression for finding doubled words,
(\b\w+)\s+\1 can also be written as (?P<word>\b\w+)\s+(?P=word):
>>> p = re.compile(r'(?P<word>\b\w+)\s+(?P=word)')
>>> p.search('Paris in the the spring').group()
'the the'
Another zero-width assertion is the lookahead assertion. Lookahead assertions
are available in both positive and negative form, and look like this:
(?=...)
Positive lookahead assertion. This succeeds if the contained regular
expression, represented here by ..., successfully matches at the current
location, and fails otherwise. But, once the contained expression has been
tried, the matching engine doesn’t advance at all; the rest of the pattern is
tried right where the assertion started.
(?!...)
Negative lookahead assertion. This is the opposite of the positive assertion;
it succeeds if the contained expression doesn’t match at the current position
in the string.
To make this concrete, let’s look at a case where a lookahead is useful.
Consider a simple pattern to match a filename and split it apart into a base
name and an extension, separated by a .. For example, in news.rc,
news is the base name, and rc is the filename’s extension.
The pattern to match this is quite simple:
.*[.].*$
Notice that the . needs to be treated specially because it’s a
metacharacter; I’ve put it inside a character class. Also notice the trailing
$; this is added to ensure that all the rest of the string must be included
in the extension. This regular expression matches foo.bar and
autoexec.bat and sendmail.cf and printers.conf.
Now, consider complicating the problem a bit; what if you want to match
filenames where the extension is not bat? Some incorrect attempts:
.*[.][^b].*$

The first attempt above tries to exclude bat by requiring
that the first character of the extension is not a b. This is wrong,
because the pattern also doesn’t match foo.bar.
.*[.]([^b]..|.[^a].|..[^t])$
The expression gets messier when you try to patch up the first solution by
requiring one of the following cases to match: the first character of the
extension isn’t b; the second character isn’t a; or the third character
isn’t t. This accepts foo.bar and rejects autoexec.bat, but it
requires a three-letter extension and won’t accept a filename with a two-letter
extension such as sendmail.cf. We’ll complicate the pattern again in an
effort to fix it.
.*[.]([^b].?.?|.[^a]?.?|..?[^t]?)$
In the third attempt, the second and third letters are all made optional in
order to allow matching extensions shorter than three characters, such as
sendmail.cf.
The pattern’s getting really complicated now, which makes it hard to read and
understand. Worse, if the problem changes and you want to exclude both bat
and exe as extensions, the pattern would get even more complicated and
confusing.
A negative lookahead cuts through all this confusion:
.*[.](?!bat$).*$

The negative lookahead means: if the expression bat
doesn’t match at this point, try the rest of the pattern; if bat$ does
match, the whole pattern will fail. The trailing $ is required to ensure
that something like sample.batch, where the extension only starts with
bat, will be allowed.
Excluding another filename extension is now easy; simply add it as an
alternative inside the assertion. The following pattern excludes filenames that
end in either bat or exe:
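.*[.](?!bat$|exe$).*$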
Up to this point, we’ve simply performed searches against a static string.
Regular expressions are also commonly used to modify strings in various ways,
using the following pattern methods:
Method/Attribute   Purpose
split()            Split the string into a list, splitting it wherever the RE
                   matches
sub()              Find all substrings where the RE matches, and replace them
                   with a different string
subn()             Does the same thing as sub(), but returns the new string
                   and the number of replacements
The split() method of a pattern splits a string apart
wherever the RE matches, returning a list of the pieces. It’s similar to the
split() method of strings but provides much more generality in the
delimiters that you can split by; split() only supports splitting by
whitespace or by a fixed string. As you’d expect, there’s a module-level
re.split() function, too.
.split(string[, maxsplit=0])
Split string by the matches of the regular expression. If capturing
parentheses are used in the RE, then their contents will also be returned as
part of the resulting list. If maxsplit is nonzero, at most maxsplit splits
are performed.
You can limit the number of splits made, by passing a value for maxsplit.
When maxsplit is nonzero, at most maxsplit splits will be made, and the
remainder of the string is returned as the final element of the list. In the
following example, the delimiter is any sequence of non-alphanumeric characters.
>>> p = re.compile(r'\W+')
>>> p.split('This is a test, short and sweet, of split().')
['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']
>>> p.split('This is a test, short and sweet, of split().', 3)
['This', 'is', 'a', 'test, short and sweet, of split().']
Sometimes you’re not only interested in what the text between delimiters is, but
also need to know what the delimiter was. If capturing parentheses are used in
the RE, then their values are also returned as part of the list. Compare the
following calls:
>>> p = re.compile(r'\W+')
>>> p2 = re.compile(r'(\W+)')
>>> p.split('This... is a test.')
['This', 'is', 'a', 'test', '']
>>> p2.split('This... is a test.')
['This', '... ', 'is', ' ', 'a', ' ', 'test', '.', '']
The module-level function re.split() adds the RE to be used as the first
argument, but is otherwise the same.
Another common task is to find all the matches for a pattern, and replace them
with a different string. The sub() method takes a replacement value,
which can be either a string or a function, and the string to be processed.
.sub(replacement, string[, count=0])
Returns the string obtained by replacing the leftmost non-overlapping
occurrences of the RE in string by the replacement replacement. If the
pattern isn’t found, string is returned unchanged.
The optional argument count is the maximum number of pattern occurrences to be
replaced; count must be a non-negative integer. The default value of 0 means
to replace all occurrences.
Here’s a simple example of using the sub() method. It replaces colour
names with the word colour:
>>> p = re.compile( '(blue|white|red)')
>>> p.sub( 'colour', 'blue socks and red shoes')
'colour socks and colour shoes'
>>> p.sub( 'colour', 'blue socks and red shoes', count=1)
'colour socks and red shoes'
The subn() method does the same work, but returns a 2-tuple containing the
new string value and the number of replacements that were performed:
>>> p = re.compile( '(blue|white|red)')
>>> p.subn( 'colour', 'blue socks and red shoes')
('colour socks and colour shoes', 2)
>>> p.subn( 'colour', 'no colours at all')
('no colours at all', 0)
Empty matches are replaced only when they’re not adjacent to a previous match.
>>> p = re.compile('x*')
>>> p.sub('-', 'abxd')
'-a-b-d-'
If replacement is a string, any backslash escapes in it are processed. That
is, \n is converted to a single newline character, \r is converted to a
carriage return, and so forth. Unknown escapes such as \j are left alone.
Backreferences, such as \6, are replaced with the substring matched by the
corresponding group in the RE. This lets you incorporate portions of the
original text in the resulting replacement string.
This example matches the word section followed by a string enclosed in
{, }, and changes section to subsection:
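>>> p = re.compile('section{ ( [^}]* ) }', re.VERBOSE)
>>> p.sub(r'subsection{\1}','section{First} section{second}')
'subsection{First} subsection{second}'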
There’s also a syntax for referring to named groups as defined by the
(?P<name>...) syntax. \g<name> will use the substring matched by the
group named name, and \g<number> uses the corresponding group number.
\g<2> is therefore equivalent to \2, but isn’t ambiguous in a
replacement string such as \g<2>0. (\20 would be interpreted as a
reference to group 20, not a reference to group 2 followed by the literal
character '0'.) The following substitutions are all equivalent, but use all
three variations of the replacement string.
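>>> p = re.compile('section{ (?P<name> [^}]* ) }', re.VERBOSE)
>>> p.sub(r'subsection{\1}','section{First}')
'subsection{First}'
>>> p.sub(r'subsection{\g<1>}','section{First}')
'subsection{First}'
>>> p.sub(r'subsection{\g<name>}','section{First}')
'subsection{First}'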
replacement can also be a function, which gives you even more control. If
replacement is a function, the function is called for every non-overlapping
occurrence of pattern. On each call, the function is passed a
MatchObject argument for the match and can use this information to
compute the desired replacement string and return it.
In the following example, the replacement function translates decimals into
hexadecimal:
>>> def hexrepl( match ):
... "Return the hex string for a decimal number"
... value = int( match.group() )
... return hex(value)
...
>>> p = re.compile(r'\d+')
>>> p.sub(hexrepl, 'Call 65490 for printing, 49152 for user code.')
'Call 0xffd2 for printing, 0xc000 for user code.'
When using the module-level re.sub() function, the pattern is passed as
the first argument. The pattern may be provided as an object or as a string; if
you need to specify regular expression flags, you must either use a
pattern object as the first parameter, or use embedded modifiers in the
pattern string, e.g. sub("(?i)b+", "x", "bbbbBBBB") returns 'x'.
Regular expressions are a powerful tool for some applications, but in some ways
their behaviour isn’t intuitive and at times they don’t behave the way you may
expect them to. This section will point out some of the most common pitfalls.
Sometimes using the re module is a mistake. If you’re matching a fixed
string, or a single character class, and you’re not using any re features
such as the IGNORECASE flag, then the full power of regular expressions
may not be required. Strings have several methods for performing operations with
fixed strings and they’re usually much faster, because the implementation is a
single small C loop that’s been optimized for the purpose, instead of the large,
more generalized regular expression engine.
One example might be replacing a single fixed string with another one; for
example, you might replace word with deed. re.sub() seems like the
function to use for this, but consider the replace() method. Note that
replace() will also replace word inside words, turning swordfish
into sdeedfish, but the naive RE word would have done that, too. (To
avoid performing the substitution on parts of words, the pattern would have to
be \bword\b, in order to require that word have a word boundary on
either side. This takes the job beyond replace()’s abilities.)
Another common task is deleting every occurrence of a single character from a
string or replacing it with another single character. You might do this with
something like re.sub('\n','',S), but translate() is capable of
doing both tasks and will be faster than any regular expression operation can
be.
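For example, deleting every newline with translate() rather than re.sub():

>>> 'one\ntwo\nthree\n'.translate({ord('\n'): None})
'onetwothree'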
In short, before turning to the re module, consider whether your problem
can be solved with a faster and simpler string method.
The match() function only checks if the RE matches at the beginning of the
string while search() will scan forward through the string for a match.
It’s important to keep this distinction in mind. Remember, match() will
only report a successful match which will start at 0; if the match wouldn’t
start at zero, match() will not report it.
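For example:

>>> print(re.match('super', 'superstition').span())
(0, 5)
>>> print(re.match('super', 'insuperable'))
None
>>> re.search('super', 'insuperable').span()
(2, 7)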
Sometimes you’ll be tempted to keep using re.match(), and just add .*
to the front of your RE. Resist this temptation and use re.search()
instead. The regular expression compiler does some analysis of REs in order to
speed up the process of looking for a match. One such analysis figures out what
the first character of a match must be; for example, a pattern starting with
Crow must match starting with a 'C'. The analysis lets the engine
quickly scan through the string looking for the starting character, only trying
the full match if a 'C' is found.
Adding .* defeats this optimization, requiring scanning to the end of the
string and then backtracking to find a match for the rest of the RE. Use
re.search() instead.
When repeating a regular expression, as in a*, the resulting action is to
consume as much of the pattern as possible. This fact often bites you when
you’re trying to match a pair of balanced delimiters, such as the angle brackets
surrounding an HTML tag. The naive pattern for matching a single HTML tag
doesn’t work because of the greedy nature of .*.
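>>> s = '<html><head><title>Title</title>'
>>> len(s)
32
>>> print(re.match('<.*>', s).span())
(0, 32)
>>> print(re.match('<.*>', s).group())
<html><head><title>Title</title>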
The RE matches the '<' in <html>, and the .* consumes the rest of
the string. There’s still more left in the RE, though, and the > can’t
match at the end of the string, so the regular expression engine has to
backtrack character by character until it finds a match for the >. The
final match extends from the '<' in <html> to the '>' in
</title>, which isn’t what you want.
In this case, the solution is to use the non-greedy qualifiers *?, +?,
??, or {m,n}?, which match as little text as possible. In the above
example, the '>' is tried immediately after the first '<' matches, and
when it fails, the engine advances a character at a time, retrying the '>'
at every step. This produces just the right result:
>>> print(re.match('<.*?>', s).group())
<html>
(Note that parsing HTML or XML with regular expressions is painful.
Quick-and-dirty patterns will handle common cases, but HTML and XML have special
cases that will break the obvious regular expression; by the time you’ve written
a regular expression that handles all of the possible cases, the patterns will
be very complicated. Use an HTML or XML parser module for such tasks.)
By now you’ve probably noticed that regular expressions are a very compact
notation, but they’re not terribly readable. REs of moderate complexity can
become lengthy collections of backslashes, parentheses, and metacharacters,
making them difficult to read and understand.
For such REs, specifying the re.VERBOSE flag when compiling the regular
expression can be helpful, because it allows you to format the regular
expression more clearly.
The re.VERBOSE flag has several effects. Whitespace in the regular
expression that isn’t inside a character class is ignored. This means that an
expression such as dog | cat is equivalent to the less readable dog|cat,
but [a b] will still match the characters 'a', 'b', or a space. In
addition, you can also put comments inside a RE; comments extend from a #
character to the next newline. When used with triple-quoted strings, this
enables REs to be formatted more neatly:
pat = re.compile(r"""
\s* # Skip leading whitespace
(?P<header>[^:]+) # Header name
\s* : # Whitespace, and a colon
(?P<value>.*?) # The header's value -- *? used to
# lose the following trailing whitespace
\s*$ # Trailing whitespace to end-of-line
""", re.VERBOSE)
Regular expressions are a complicated topic. Did this document help you
understand them? Were there parts that were unclear, or problems you
encountered that weren’t covered here? If so, please send suggestions for
improvements to the author.
The most complete book on regular expressions is almost certainly Jeffrey
Friedl’s Mastering Regular Expressions, published by O’Reilly. Unfortunately,
it exclusively concentrates on Perl and Java’s flavours of regular expressions,
and doesn’t contain any Python material at all, so it won’t be useful as a
reference for programming in Python. (The first edition covered Python’s
now-removed regex module, which won’t help you much.) Consider checking
it out from your library.
Sockets are used nearly everywhere, but are one of the most severely
misunderstood technologies around. This is a 10,000 foot overview of sockets.
It’s not really a tutorial - you’ll still have work to do in getting things
operational. It doesn’t cover the fine points (and there are a lot of them), but
I hope it will give you enough background to begin using them decently.
I’m only going to talk about INET sockets, but they account for at least 99% of
the sockets in use. And I’ll only talk about STREAM sockets - unless you really
know what you’re doing (in which case this HOWTO isn’t for you!), you’ll get
better behavior and performance from a STREAM socket than anything else. I will
try to clear up the mystery of what a socket is, as well as some hints on how to
work with blocking and non-blocking sockets. But I’ll start by talking about
blocking sockets. You’ll need to know how they work before dealing with
non-blocking sockets.
Part of the trouble with understanding these things is that “socket” can mean a
number of subtly different things, depending on context. So first, let’s make a
distinction between a “client” socket - an endpoint of a conversation, and a
“server” socket, which is more like a switchboard operator. The client
application (your browser, for example) uses “client” sockets exclusively; the
web server it’s talking to uses both “server” sockets and “client” sockets.
Of the various forms of IPC,
sockets are by far the most popular. On any given platform, there are
likely to be other forms of IPC that are faster, but for
cross-platform communication, sockets are about the only game in town.
They were invented in Berkeley as part of the BSD flavor of Unix. They spread
like wildfire with the Internet. With good reason — the combination of sockets
with INET makes talking to arbitrary machines around the world unbelievably easy
(at least compared to other schemes).
Roughly speaking, when you clicked on the link that brought you to this page,
your browser did something like the following:
import socket

# create an INET, STREAMing socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# now connect to the web server on port 80 - the normal http port
s.connect(("www.mcmillan-inc.com", 80))
When the connect completes, the socket s can be used to send
in a request for the text of the page. The same socket will read the
reply, and then be destroyed. That’s right, destroyed. Client sockets
are normally only used for one exchange (or a small set of sequential
exchanges).
What happens in the web server is a bit more complex. First, the web server
creates a “server socket”:
# create an INET, STREAMing socket
serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# bind the socket to a public host, and a well-known port
serversocket.bind((socket.gethostname(), 80))

# become a server socket
serversocket.listen(5)
A couple things to notice: we used socket.gethostname() so that the socket
would be visible to the outside world. If we had used s.bind(('',80)) or
s.bind(('localhost',80)) or s.bind(('127.0.0.1',80)) we would still
have a “server” socket, but one that was only visible within the same machine.
A second thing to note: low number ports are usually reserved for “well known”
services (HTTP, SNMP etc). If you’re playing around, use a nice high number (4
digits).
Finally, the argument to listen tells the socket library that we want it to
queue up as many as 5 connect requests (the normal max) before refusing outside
connections. If the rest of the code is written properly, that should be plenty.
Now that we have a “server” socket, listening on port 80, we can enter the
mainloop of the web server:
while True:
    # accept connections from outside
    (clientsocket, address) = serversocket.accept()
    # now do something with the clientsocket
    # in this case, we'll pretend this is a threaded server
    ct = client_thread(clientsocket)
    ct.run()
There are actually three general ways in which this loop could work: dispatching
a thread to handle clientsocket, creating a new process to handle
clientsocket, or restructuring this app to use non-blocking sockets and
multiplexing between our “server” socket and any active clientsockets using
select. More about that later. The important thing to understand now is
this: this is all a “server” socket does. It doesn’t send any data. It doesn’t
receive any data. It just produces “client” sockets. Each clientsocket is
created in response to some other “client” socket doing a connect() to the
host and port we’re bound to. As soon as we’ve created that clientsocket, we
go back to listening for more connections. The two “clients” are free to chat it
up - they are using some dynamically allocated port which will be recycled when
the conversation ends.
If you need fast IPC between two processes on one machine, you should look into
whatever form of shared memory the platform offers. A simple protocol based
around shared memory and locks or semaphores is by far the fastest technique.
If you do decide to use sockets, bind the “server” socket to 'localhost'. On
most platforms, this will take a shortcut around a couple of layers of network
code and be quite a bit faster.
The first thing to note, is that the web browser’s “client” socket and the web
server’s “client” socket are identical beasts. That is, this is a “peer to peer”
conversation. Or to put it another way, as the designer, you will have to
decide what the rules of etiquette are for a conversation. Normally, the
connecting socket starts the conversation, by sending in a request, or
perhaps a signon. But that’s a design decision - it’s not a rule of sockets.
Now there are two sets of verbs to use for communication. You can use send
and recv, or you can transform your client socket into a file-like beast and
use read and write. The latter is the way Java presents its sockets.
I’m not going to talk about it here, except to warn you that you need to use
flush on sockets. These are buffered “files”, and a common mistake is to
write something, and then read for a reply. Without a flush in
there, you may wait forever for the reply, because the request may still be in
your output buffer.
Now we come to the major stumbling block of sockets - send and recv operate
on the network buffers. They do not necessarily handle all the bytes you hand
them (or expect from them), because their major focus is handling the network
buffers. In general, they return when the associated network buffers have been
filled (send) or emptied (recv). They then tell you how many bytes they
handled. It is your responsibility to call them again until your message has
been completely dealt with.
When a recv returns 0 bytes, it means the other side has closed (or is in
the process of closing) the connection. You will not receive any more data on
this connection. Ever. You may be able to send data successfully; I’ll talk
about that some on the next page.
A protocol like HTTP uses a socket for only one transfer. The client sends a
request, then reads a reply. That’s it. The socket is discarded. This means that
a client can detect the end of the reply by receiving 0 bytes.
But if you plan to reuse your socket for further transfers, you need to realize
that there is no EOT on a socket. I repeat: if a socket
send or recv returns after handling 0 bytes, the connection has been
broken. If the connection has not been broken, you may wait on a recv
forever, because the socket will not tell you that there’s nothing more to
read (for now). Now if you think about that a bit, you’ll come to realize a
fundamental truth of sockets: messages must either be fixed length (yuck), or
be delimited (shrug), or indicate how long they are (much better), or end by
shutting down the connection. The choice is entirely yours, (but some ways are
righter than others).
Assuming you don’t want to end the connection, the simplest solution is a fixed
length message:
class mysocket:
    """demonstration class only
       - coded for clarity, not efficiency
    """

    def __init__(self, sock=None):
        if sock is None:
            self.sock = socket.socket(
                socket.AF_INET, socket.SOCK_STREAM)
        else:
            self.sock = sock

    def connect(self, host, port):
        self.sock.connect((host, port))

    def mysend(self, msg):
        # msg is a bytes object; MSGLEN is the agreed fixed message length
        totalsent = 0
        while totalsent < MSGLEN:
            sent = self.sock.send(msg[totalsent:])
            if sent == 0:
                raise RuntimeError("socket connection broken")
            totalsent = totalsent + sent

    def myreceive(self):
        chunks = []
        bytes_recd = 0
        while bytes_recd < MSGLEN:
            chunk = self.sock.recv(MSGLEN - bytes_recd)
            if chunk == b'':
                raise RuntimeError("socket connection broken")
            chunks.append(chunk)
            bytes_recd = bytes_recd + len(chunk)
        return b''.join(chunks)
The sending code here is usable for almost any messaging scheme - in Python you
send bytes objects, and you can use len() to determine their length (even if
they have embedded \0 bytes). It’s mostly the receiving code that gets more
complex. (And in C, it’s not much worse, except you can’t use strlen if the
message has embedded \0s.)
The easiest enhancement is to make the first character of the message an
indicator of message type, and have the type determine the length. Now you have
two recvs - the first to get (at least) that first character so you can
look up the length, and the second in a loop to get the rest. If you decide to
go the delimited route, you’ll be receiving in some arbitrary chunk size, (4096
or 8192 is frequently a good match for network buffer sizes), and scanning what
you’ve received for a delimiter.
One complication to be aware of: if your conversational protocol allows multiple
messages to be sent back to back (without some kind of reply), and you pass
recv an arbitrary chunk size, you may end up reading the start of a
following message. You’ll need to put that aside and hold onto it, until it’s
needed.
Prefixing the message with its length (say, as 5 numeric characters) gets more
complex, because (believe it or not), you may not get all 5 characters in one
recv. In playing around, you’ll get away with it; but in high network loads,
your code will very quickly break unless you use two recv loops - the first
to determine the length, the second to get the data part of the message. Nasty.
This is also when you’ll discover that send does not always manage to get
rid of everything in one pass. And despite having read this, you will eventually
get bit by it!
In the interests of space, building your character, (and preserving my
competitive position), these enhancements are left as an exercise for the
reader. Let’s move on to cleaning up.
It is perfectly possible to send binary data over a socket. The major problem is
that not all machines use the same formats for binary data. For example, a
Motorola chip will represent a 16 bit integer with the value 1 as the two hex
bytes 00 01. Intel and DEC, however, are byte-reversed - that same 1 is 01 00.
Socket libraries have calls for converting 16 and 32 bit integers - ntohl,
htonl, ntohs, htons - where “n” means network and “h” means host, “s” means
short and “l” means long. Where network order is host order, these do
nothing, but where the machine is byte-reversed, these swap the bytes around
appropriately.
In these days of 32 bit machines, the ASCII representation of binary data is
frequently smaller than the binary representation. That’s because a surprising
amount of the time, all those longs have the value 0, or maybe 1. The string “0”
would be two bytes, while binary is four. Of course, this doesn’t fit well with
fixed-length messages. Decisions, decisions.
Strictly speaking, you’re supposed to use shutdown on a socket before you
close it. The shutdown is an advisory to the socket at the other end.
Depending on the argument you pass it, it can mean “I’m not going to send
anymore, but I’ll still listen”, or “I’m not listening, good riddance!”. Most
socket libraries, however, are so used to programmers neglecting to use this
piece of etiquette that normally a close is the same as shutdown();
close(). So in most situations, an explicit shutdown is not needed.
One way to use shutdown effectively is in an HTTP-like exchange. The client
sends a request and then does a shutdown(1). This tells the server “This
client is done sending, but can still receive.” The server can detect “EOF” by
a receive of 0 bytes. It can assume it has the complete request. The server
sends a reply. If the send completes successfully then, indeed, the client
was still receiving.
Python takes the automatic shutdown a step further, and says that when a socket
is garbage collected, it will automatically do a close if it’s needed. But
relying on this is a very bad habit. If your socket just disappears without
doing a close, the socket at the other end may hang indefinitely, thinking
you’re just being slow. Please close your sockets when you’re done.
Probably the worst thing about using blocking sockets is what happens when the
other side comes down hard (without doing a close). Your socket is likely to
hang. TCP is a reliable protocol, and it will wait a long, long time
before giving up on a connection. If you’re using threads, the entire thread is
essentially dead. There’s not much you can do about it. As long as you aren’t
doing something dumb, like holding a lock while doing a blocking read, the
thread isn’t really consuming much in the way of resources. Do not try to kill
the thread - part of the reason that threads are more efficient than processes
is that they avoid the overhead associated with the automatic recycling of
resources. In other words, if you do manage to kill the thread, your whole
process is likely to be screwed up.
If you’ve understood the preceding, you already know most of what you need to
know about the mechanics of using sockets. You’ll still use the same calls, in
much the same ways. It’s just that, if you do it right, your app will be almost
inside-out.
In Python, you use socket.setblocking(0) to make it non-blocking. In C, it’s
more complex, (for one thing, you’ll need to choose between the BSD flavor
O_NONBLOCK and the almost indistinguishable Posix flavor O_NDELAY, which
is completely different from TCP_NODELAY), but it’s the exact same idea. You
do this after creating the socket, but before using it. (Actually, if you’re
nuts, you can switch back and forth.)
The major mechanical difference is that send, recv, connect and
accept can return without having done anything. You have (of course) a
number of choices. You can check return code and error codes and generally drive
yourself crazy. If you don’t believe me, try it sometime. Your app will grow
large, buggy and suck CPU. So let’s skip the brain-dead solutions and do it
right.
Use select.
In C, coding select is fairly complex. In Python, it’s a piece of cake, but
it’s close enough to the C version that if you understand select in Python,
you’ll have little trouble with it in C:
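In rough outline it looks like this (a sketch; potential_readers,
potential_writers, and potential_errs are illustrative names for the three
lists described next):

import select

ready_to_read, ready_to_write, in_error = select.select(
    potential_readers,   # sockets you might want to read from
    potential_writers,   # sockets you might want to write to
    potential_errs,      # sockets to check for errors
    timeout)             # optional timeout in seconds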
You pass select three lists: the first contains all sockets that you might
want to try reading; the second all the sockets you might want to try writing
to, and the last (normally left empty) those that you want to check for errors.
You should note that a socket can go into more than one list. The select
call is blocking, but you can give it a timeout. This is generally a sensible
thing to do - give it a nice long timeout (say a minute) unless you have good
reason to do otherwise.
In return, you will get three lists. They contain the sockets that are actually
readable, writable and in error. Each of these lists is a subset (possibly
empty) of the corresponding list you passed in.
If a socket is in the output readable list, you can be
as-close-to-certain-as-we-ever-get-in-this-business that a recv on that
socket will return something. Same idea for the writable list. You’ll be able
to send something. Maybe not all you want to, but something is better than
nothing. (Actually, any reasonably healthy socket will return as writable - it
just means outbound network buffer space is available.)
If you have a “server” socket, put it in the potential_readers list. If it comes
out in the readable list, your accept will (almost certainly) work. If you
have created a new socket to connect to someone else, put it in the
potential_writers list. If it shows up in the writable list, you have a decent
chance that it has connected.
One very nasty problem with select: if somewhere in those input lists of
sockets is one which has died a nasty death, the select will fail. You then
need to loop through every single damn socket in all those lists and do a
select([sock],[],[],0) until you find the bad one. That timeout of 0 means
it won’t take long, but it’s ugly.
Actually, select can be handy even with blocking sockets. It’s one way of
determining whether you will block - the socket returns as readable when there’s
something in the buffers. However, this still doesn’t help with the problem of
determining whether the other end is done, or just busy with something else.
Portability alert: On Unix, select works both with the sockets and
files. Don’t try this on Windows. On Windows, select works with sockets
only. Also note that in C, many of the more advanced socket options are done
differently on Windows. In fact, on Windows I usually use threads (which work
very, very well) with my sockets. Face it, if you want any kind of performance,
your code will look very different on Windows than on Unix.
There’s no question that the fastest sockets code uses non-blocking sockets and
select to multiplex them. You can put together something that will saturate a
LAN connection without putting any strain on the CPU. The trouble is that an app
written this way can’t do much of anything else - it needs to be ready to
shuffle bytes around at all times.
Assuming that your app is actually supposed to do something more than that,
threading is the optimal solution, (and using non-blocking sockets will be
faster than using blocking sockets). Unfortunately, threading support in Unixes
varies both in API and quality. So the normal Unix solution is to fork a
subprocess to deal with each connection. The overhead for this is significant
(and don’t do this on Windows - the overhead of process creation is enormous
there). It also means that unless each subprocess is completely independent,
you’ll need to use another form of IPC, say a pipe, or shared memory and
semaphores, to communicate between the parent and child processes.
Finally, remember that even though blocking sockets are somewhat slower than
non-blocking, in many cases they are the “right” solution. After all, if your
app is driven by the data it receives over a socket, there’s not much sense in
complicating the logic just so your app can wait on select instead of
recv.
Python lists have a built-in list.sort() method that modifies the list
in-place. There is also a sorted() built-in function that builds a new
sorted list from an iterable.
In this document, we explore the various techniques for sorting data using Python.
A simple ascending sort is very easy: just call the sorted() function. It
returns a new sorted list:
>>> sorted([5, 2, 3, 1, 4])
[1, 2, 3, 4, 5]
You can also use the list.sort() method. It modifies the list
in-place (and returns None to avoid confusion). Usually it’s less convenient
than sorted() - but if you don’t need the original list, it’s slightly
more efficient.
>>> a = [5, 2, 3, 1, 4]
>>> a.sort()
>>> a
[1, 2, 3, 4, 5]
Another difference is that the list.sort() method is only defined for
lists. In contrast, the sorted() function accepts any iterable.
Both list.sort() and sorted() have a key parameter to specify a
function to be called on each list element prior to making comparisons.
For example, here’s a case-insensitive string comparison:
>>> sorted("This is a test string from Andrew".split(), key=str.lower)
['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']
The value of the key parameter should be a function that takes a single argument
and returns a key to use for sorting purposes. This technique is fast because
the key function is called exactly once for each input record.
A common pattern is to sort complex objects using some of the object’s indices
as keys. For example:
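A sketch with hypothetical student records stored as (name, grade, age) tuples:

>>> student_tuples = [
...     ('john', 'A', 15),
...     ('jane', 'B', 12),
...     ('dave', 'B', 10),
... ]
>>> sorted(student_tuples, key=lambda student: student[2])   # sort by age
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

The same technique works for objects with named attributes:

>>> class Student:
...     def __init__(self, name, grade, age):
...         self.name = name
...         self.grade = grade
...         self.age = age
...     def __repr__(self):
...         return repr((self.name, self.grade, self.age))
>>> student_objects = [
...     Student('john', 'A', 15),
...     Student('jane', 'B', 12),
...     Student('dave', 'B', 10),
... ]
>>> sorted(student_objects, key=lambda student: student.age)   # sort by age
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]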
The key-function patterns shown above are very common, so Python provides
convenience functions to make accessor functions easier and faster. The
operator module has itemgetter(),
attrgetter(), and a methodcaller() function.
Using those functions, the above examples become simpler and faster:
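With the hypothetical student data from above, a sketch:

>>> from operator import itemgetter, attrgetter
>>> sorted(student_tuples, key=itemgetter(2))
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
>>> sorted(student_objects, key=attrgetter('age'))
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]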
Both list.sort() and sorted() accept a reverse parameter with a
boolean value. This is used to flag descending sorts. For example, to get the
student data in reverse age order:
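>>> sorted(student_objects, key=attrgetter('age'), reverse=True)
[('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]

Sorts are also guaranteed to be stable: when multiple records share the same
key, their original order is preserved. A small sketch:

>>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
>>> sorted(data, key=itemgetter(0))
[('blue', 1), ('blue', 2), ('red', 1), ('red', 2)]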
Notice how the two records for blue retain their original order so that
('blue',1) is guaranteed to precede ('blue',2).
This wonderful property lets you build complex sorts in a series of sorting
steps. For example, to sort the student data by descending grade and then
ascending age, do the age sort first and then sort again using grade:
>>> s = sorted(student_objects, key=attrgetter('age')) # sort on secondary key
>>> sorted(s, key=attrgetter('grade'), reverse=True) # now sort on primary key, descending
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
The Timsort algorithm used in Python
does multiple sorts efficiently because it can take advantage of any ordering
already present in a dataset.
This idiom is called Decorate-Sort-Undecorate after its three steps:
First, the initial list is decorated with new values that control the sort order.
Second, the decorated list is sorted.
Finally, the decorations are removed, creating a list that contains only the
initial values in the new order.
For example, to sort the student data by grade using the DSU approach:
>>> decorated = [(student.grade, i, student) for i, student in enumerate(student_objects)]
>>> decorated.sort()
>>> [student for grade, i, student in decorated] # undecorate
[('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
This idiom works because tuples are compared lexicographically; the first items
are compared; if they are the same then the second items are compared, and so
on.
It is not strictly necessary in all cases to include the index i in the
decorated list, but including it gives two benefits:
The sort is stable – if two items have the same key, their order will be
preserved in the sorted list.
The original items do not have to be comparable because the ordering of the
decorated tuples will be determined by at most the first two items. So for
example the original list could contain complex numbers which cannot be sorted
directly.
Another name for this idiom is
Schwartzian transform,
after Randal L. Schwartz, who popularized it among Perl programmers.
Now that Python sorting provides key-functions, this technique is not often needed.
Many constructs given in this HOWTO assume Python 2.4 or later. Before that,
there was no sorted() builtin and list.sort() took no keyword
arguments. Instead, all of the Py2.x versions supported a cmp parameter to
handle user specified comparison functions.
In Py3.0, the cmp parameter was removed entirely (as part of a larger effort to
simplify and unify the language, eliminating the conflict between rich
comparisons and the __cmp__() magic method).
In Py2.x, sort allowed an optional function which can be called for doing the
comparisons. That function should take two arguments to be compared and then
return a negative value for less-than, return zero if they are equal, or return
a positive value for greater-than. For example, we can do:
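A sketch of what that looked like (this only runs on Python 2; the cmp
parameter no longer exists in Python 3):

>>> def numeric_compare(x, y):
...     return x - y
>>> sorted([5, 2, 4, 1, 3], cmp=numeric_compare)  # Python 2 only
[1, 2, 3, 4, 5]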
When porting code from Python 2.x to 3.x, the situation can arise when you have
the user supplying a comparison function and you need to convert that to a key
function. The following wrapper makes that easy to do:
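One such wrapper, essentially the recipe behind functools.cmp_to_key()
(available since Python 2.7 and 3.2):

def cmp_to_key(mycmp):
    'Convert a cmp= function into a key= function'
    class K:
        def __init__(self, obj, *args):
            self.obj = obj
        def __lt__(self, other):
            return mycmp(self.obj, other.obj) < 0
        def __gt__(self, other):
            return mycmp(self.obj, other.obj) > 0
        def __eq__(self, other):
            return mycmp(self.obj, other.obj) == 0
        def __le__(self, other):
            return mycmp(self.obj, other.obj) <= 0
        def __ge__(self, other):
            return mycmp(self.obj, other.obj) >= 0
        def __ne__(self, other):
            return mycmp(self.obj, other.obj) != 0
    return K

To convert to a key function, just wrap the old comparison function:

>>> sorted([5, 2, 4, 1, 3], key=cmp_to_key(numeric_compare))
[1, 2, 3, 4, 5]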
The reverse parameter still maintains sort stability (so that records with
equal keys retain the original order). Interestingly, that effect can be
simulated without the parameter by using the builtin reversed() function
twice:
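A sketch using the color data from above:

>>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
>>> standard_way = sorted(data, key=itemgetter(0), reverse=True)
>>> double_reversed = list(reversed(sorted(reversed(data), key=itemgetter(0))))
>>> assert standard_way == double_reversed
>>> standard_way
[('red', 1), ('red', 2), ('blue', 1), ('blue', 2)]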
The sort routines are guaranteed to use __lt__() when making comparisons
between two objects. So, it is easy to add a standard sort order to a class by
defining an __lt__() method:
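For example, giving the hypothetical Student class from above a default sort
order by age:

>>> Student.__lt__ = lambda self, other: self.age < other.age
>>> sorted(student_objects)
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]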
Key functions need not depend directly on the objects being sorted. A key
function can also access external resources. For instance, if the student grades
are stored in a dictionary, they can be used to sort a separate list of student
names:
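A sketch, assuming the grades live in a separate dictionary:

>>> students = ['dave', 'john', 'jane']
>>> newgrades = {'john': 'F', 'jane': 'A', 'dave': 'C'}
>>> sorted(students, key=newgrades.__getitem__)
['jane', 'dave', 'john']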
In 1968, the American Standard Code for Information Interchange, better known by
its acronym ASCII, was standardized. ASCII defined numeric codes for various
characters, with the numeric values running from 0 to 127. For example, the
lowercase letter ‘a’ is assigned 97 as its code value.
ASCII was an American-developed standard, so it only defined unaccented
characters. There was an ‘e’, but no ‘é’ or ‘Í’. This meant that languages
which required accented characters couldn’t be faithfully represented in ASCII.
(Actually the missing accents matter for English, too, which contains words such
as ‘naïve’ and ‘café’, and some publications have house styles which require
spellings such as ‘coöperate’.)
For a while people just wrote programs that didn’t display accents. I remember
looking at Apple ][ BASIC programs, published in French-language publications in
the mid-1980s, that had lines like these:
PRINT"FICHIER EST COMPLETE."PRINT"CARACTERE NON ACCEPTE."
Those messages should contain accents, and they just look wrong to someone who
can read French.
In the 1980s, almost all personal computers were 8-bit, meaning that bytes could
hold values ranging from 0 to 255. ASCII codes only went up to 127, so some
machines assigned values between 128 and 255 to accented characters. Different
machines had different codes, however, which led to problems exchanging files.
Eventually various commonly used sets of values for the 128–255 range emerged.
Some were true standards, defined by the International Standards Organization,
and some were de facto conventions that were invented by one company or
another and managed to catch on.
255 characters aren’t very many. For example, you can’t fit both the accented
characters used in Western Europe and the Cyrillic alphabet used for Russian
into the 128–255 range because there are more than 128 such characters.
You could write files using different codes (all your Russian files in a coding
system called KOI8, all your French files in a different coding system called
Latin1), but what if you wanted to write a French document that quotes some
Russian text? In the 1980s people began to want to solve this problem, and the
Unicode standardization effort began.
Unicode started out using 16-bit characters instead of 8-bit characters. 16
bits means you have 2^16 = 65,536 distinct values available, making it possible
to represent many different characters from many different alphabets; an initial
goal was to have Unicode contain the alphabets for every single human language.
It turns out that even 16 bits isn’t enough to meet that goal, and the modern
Unicode specification uses a wider range of codes, 0 through 1,114,111 (0x10ffff
in base 16).
There’s a related ISO standard, ISO 10646. Unicode and ISO 10646 were
originally separate efforts, but the specifications were merged with the 1.1
revision of Unicode.
(This discussion of Unicode’s history is highly simplified. I don’t think the
average Python programmer needs to worry about the historical details; consult
the Unicode consortium site listed in the References for more information.)
A character is the smallest possible component of a text. ‘A’, ‘B’, ‘C’,
etc., are all different characters. So are ‘È’ and ‘Í’. Characters are
abstractions, and vary depending on the language or context you’re talking
about. For example, the symbol for ohms (Ω) is usually drawn much like the
capital letter omega (Ω) in the Greek alphabet (they may even be the same in
some fonts), but these are two different characters that have different
meanings.
The Unicode standard describes how characters are represented by code
points. A code point is an integer value, usually denoted in base 16. In the
standard, a code point is written using the notation U+12ca to mean the
character with value 0x12ca (4,810 decimal). The Unicode standard contains a lot
of tables listing characters and their corresponding code points:
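A small illustrative excerpt from such a table:

0061    'a'; LATIN SMALL LETTER A
0007    '\x07'; BELL
0002    '\x02'; START OF TEXT
12ca    '\u12ca'; ETHIOPIC SYLLABLE WI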
Strictly, these definitions imply that it’s meaningless to say ‘this is
character U+12ca’. U+12ca is a code point, which represents some particular
character; in this case, it represents the character ‘ETHIOPIC SYLLABLE WI’. In
informal contexts, this distinction between code points and characters will
sometimes be forgotten.
A character is represented on a screen or on paper by a set of graphical
elements that’s called a glyph. The glyph for an uppercase A, for example,
is two diagonal strokes and a horizontal stroke, though the exact details will
depend on the font being used. Most Python code doesn’t need to worry about
glyphs; figuring out the correct glyph to display is generally the job of a GUI
toolkit or a terminal’s font renderer.
To summarize the previous section: a Unicode string is a sequence of code
points, which are numbers from 0 through 0x10ffff (1,114,111 decimal). This
sequence needs to be represented as a set of bytes (meaning, values
from 0 through 255) in memory. The rules for translating a Unicode string
into a sequence of bytes are called an encoding.
The first encoding you might think of is an array of 32-bit integers. In this
representation, the string “Python” would look like this:
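Sketched byte by byte, with each code point stored as four little-endian bytes
(the order would differ on a big-endian machine):

   P           y           t           h           o           n
0x50 00 00 00 79 00 00 00 74 00 00 00 68 00 00 00 6f 00 00 00 6e 00 00 00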
This representation is straightforward but using it presents a number of
problems.
It’s not portable; different processors order the bytes differently.
It’s very wasteful of space. In most texts, the majority of the code points
are less than 127, or less than 255, so a lot of space is occupied by zero
bytes. The above string takes 24 bytes compared to the 6 bytes needed for an
ASCII representation. Increased RAM usage doesn’t matter too much (desktop
computers have megabytes of RAM, and strings aren’t usually that large), but
expanding our usage of disk and network bandwidth by a factor of 4 is
intolerable.
It’s not compatible with existing C functions such as strlen(), so a new
family of wide string functions would need to be used.
Many Internet standards are defined in terms of textual data, and can’t
handle content with embedded zero bytes.
Generally people don’t use this encoding, instead choosing other
encodings that are more efficient and convenient. UTF-8 is probably
the most commonly supported encoding; it will be discussed below.
Encodings don’t have to handle every possible Unicode character, and most
encodings don’t. The rules for converting a Unicode string into the ASCII
encoding, for example, are simple; for each code point:
If the code point is < 128, each byte is the same as the value of the code
point.
If the code point is 128 or greater, the Unicode string can’t be represented
in this encoding. (Python raises a UnicodeEncodeError exception in this
case.)
Latin-1, also known as ISO-8859-1, is a similar encoding. Unicode code points
0–255 are identical to the Latin-1 values, so converting to this encoding simply
requires converting code points to byte values; if a code point larger than 255
is encountered, the string can’t be encoded into Latin-1.
Encodings don’t have to be simple one-to-one mappings like Latin-1. Consider
IBM’s EBCDIC, which was used on IBM mainframes. Letter values weren’t in one
block: ‘a’ through ‘i’ had values from 129 to 137, but ‘j’ through ‘r’ were 145
through 153. If you wanted to use EBCDIC as an encoding, you’d probably use
some sort of lookup table to perform the conversion, but this is largely an
internal detail.
UTF-8 is one of the most commonly used encodings. UTF stands for “Unicode
Transformation Format”, and the ‘8’ means that 8-bit numbers are used in the
encoding. (There’s also a UTF-16 encoding, but it’s less frequently used than
UTF-8.) UTF-8 uses the following rules:
If the code point is <128, it’s represented by the corresponding byte value.
If the code point is between 128 and 0x7ff, it’s turned into two byte values
between 128 and 255.
Code points >0x7ff are turned into three- or four-byte sequences, where each
byte of the sequence is between 128 and 255.
UTF-8 has several convenient properties:
It can handle any Unicode code point.
A Unicode string is turned into a string of bytes containing no embedded zero
bytes. This avoids byte-ordering issues, and means UTF-8 strings can be
processed by C functions such as strcpy() and sent through protocols that
can’t handle zero bytes.
A string of ASCII text is also valid UTF-8 text.
UTF-8 is fairly compact; the majority of code points are turned into two
bytes, and values less than 128 occupy only a single byte.
If bytes are corrupted or lost, it’s possible to determine the start of the
next UTF-8-encoded code point and resynchronize. It’s also unlikely that
random 8-bit data will look like valid UTF-8.
The Unicode Consortium site at <http://www.unicode.org> has character charts, a
glossary, and PDF versions of the Unicode specification. Be prepared for some
difficult reading. <http://www.unicode.org/history/> is a chronology of the
origin and development of Unicode.
Another good introductory article was written by Joel Spolsky
<http://www.joelonsoftware.com/articles/Unicode.html>.
If this introduction didn’t make things clear to you, you should try reading this
alternate article before continuing.
Since Python 3.0, the language features a str type that contains Unicode
characters, meaning any string created using "unicode rocks!", 'unicode
rocks!', or the triple-quoted string syntax is stored as Unicode.
To insert a Unicode character that is not part of ASCII, e.g., any letters with
accents, one can use escape sequences in string literals as such:
>>> "\N{GREEK CAPITAL LETTER DELTA}" # Using the character name
'\u0394'
>>> "\u0394" # Using a 16-bit hex value
'\u0394'
>>> "\U00000394" # Using a 32-bit hex value
'\u0394'
In addition, one can create a string using the decode() method of
bytes. This method takes an encoding, such as UTF-8, and, optionally,
an errors argument.
The errors argument specifies the response when the input string can’t be
converted according to the encoding’s rules. Legal values for this argument are
‘strict’ (raise a UnicodeDecodeError exception), ‘replace’ (use U+FFFD,
‘REPLACEMENT CHARACTER’), or ‘ignore’ (just leave the character out of the
Unicode result). The following examples show the differences:
>>> b'\x80abc'.decode("utf-8", "strict")
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0:
unexpected code byte
>>> b'\x80abc'.decode("utf-8", "replace")
'?abc'
>>> b'\x80abc'.decode("utf-8", "ignore")
'abc'
(In this code example, the Unicode replacement character has been replaced by
a question mark because it may not be displayed on some systems.)
Encodings are specified as strings containing the encoding’s name. Python 3.2
comes with roughly 100 different encodings; see the Python Library Reference at
Standard Encodings for a list. Some encodings have multiple names; for
example, ‘latin-1’, ‘iso_8859_1’ and ‘8859’ are all synonyms for the same
encoding.
One-character Unicode strings can also be created with the chr()
built-in function, which takes integers and returns a Unicode string of length 1
that contains the corresponding code point. The reverse operation is the
built-in ord() function that takes a one-character Unicode string and
returns the code point value:
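For example:

>>> chr(57344)
'\ue000'
>>> ord('\ue000')
57344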
Another important str method is .encode([encoding],[errors='strict']),
which returns a bytes representation of the Unicode string, encoded in the
requested encoding. The errors parameter is the same as the parameter of
the decode() method, with one additional possibility; as well as ‘strict’,
‘ignore’, and ‘replace’ (which in this case inserts a question mark instead of
the unencodable character), you can also pass ‘xmlcharrefreplace’ which uses
XML’s character references. The following example shows the different results:
>>> u = chr(40960) + 'abcd' + chr(1972)
>>> u.encode('utf-8')
b'\xea\x80\x80abcd\xde\xb4'
>>> u.encode('ascii')
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in
position 0: ordinal not in range(128)
>>> u.encode('ascii', 'ignore')
b'abcd'
>>> u.encode('ascii', 'replace')
b'?abcd?'
>>> u.encode('ascii', 'xmlcharrefreplace')
b'ꀀabcd޴'
The low-level routines for registering and accessing the available encodings are
found in the codecs module. However, the encoding and decoding functions
returned by this module are usually more low-level than is comfortable, so I’m
not going to describe the codecs module here. If you need to implement a
completely new encoding, you’ll need to learn about the codecs module
interfaces, but implementing encodings is a specialized task that also won’t be
covered here. Consult the Python documentation to learn more about this module.
In Python source code, specific Unicode code points can be written using the
\u escape sequence, which is followed by four hex digits giving the code
point. The \U escape sequence is similar, but expects eight hex digits,
not four:
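For example:

>>> s = "a\xac\u1234\u20ac\U00008000"
>>> # \xac is a two-digit hex escape; \u1234 is a four-digit Unicode
>>> # escape; \U00008000 is an eight-digit Unicode escape
>>> [ord(c) for c in s]
[97, 172, 4660, 8364, 32768]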
Using escape sequences for code points greater than 127 is fine in small doses,
but becomes an annoyance if you’re using many accented characters, as you would
in a program with messages in French or some other accent-using language. You
can also assemble strings using the chr() built-in function, but this is
even more tedious.
Ideally, you’d want to be able to write literals in your language’s natural
encoding. You could then edit Python source code with your favorite editor
which would display the accented characters naturally, and have the right
characters used at runtime.
Python supports writing source code in UTF-8 by default, but you can use almost
any encoding if you declare the encoding being used. This is done by including
a special comment as either the first or second line of the source file:
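For example (a sketch, assuming the source file is actually saved in Latin-1):

#!/usr/bin/env python
# -*- coding: latin-1 -*-

u = 'abcdé'
print(ord(u[-1]))   # prints 233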
The syntax is inspired by Emacs’s notation for specifying variables local to a
file. Emacs supports many different variables, but Python only supports
‘coding’. The -*- symbols indicate to Emacs that the comment is special;
they have no significance to Python but are a convention. Python looks for
coding:name or coding=name in the comment.
If you don’t include such a comment, the default encoding used will be UTF-8 as
already mentioned.
The Unicode specification includes a database of information about code points.
For each code point that’s defined, the information includes the character’s
name, its category, the numeric value if applicable (Unicode has characters
representing the Roman numerals and fractions such as one-third and
four-fifths). There are also properties related to the code point’s use in
bidirectional text and other display-related properties.
The following program displays some information about several characters, and
prints the numeric value of one particular character:
import unicodedata

u = chr(233) + chr(0x0bf2) + chr(3972) + chr(6000) + chr(13231)

for i, c in enumerate(u):
    print(i, '%04x' % ord(c), unicodedata.category(c), end=" ")
    print(unicodedata.name(c))

# Get numeric value of second character
print(unicodedata.numeric(u[1]))
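When run, this prints:

0 00e9 Ll LATIN SMALL LETTER E WITH ACUTE
1 0bf2 No TAMIL NUMBER ONE THOUSAND
2 0f84 Mn TIBETAN MARK HALANTA
3 1770 Lo TAGBANWA LETTER SA
4 33af So SQUARE RAD OVER S SQUARED
1000.0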
The category codes are abbreviations describing the nature of the character.
These are grouped into categories such as “Letter”, “Number”, “Punctuation”, or
“Symbol”, which in turn are broken up into subcategories. To take the codes
from the above output, 'Ll' means ‘Letter, lowercase’, 'No' means
“Number, other”, 'Mn' is “Mark, nonspacing”, and 'So' is “Symbol,
other”. See
<http://www.unicode.org/reports/tr44/#General_Category_Values> for a
list of category codes.
Marc-André Lemburg gave a presentation at EuroPython 2002 titled “Python and
Unicode”. A PDF version of his slides is available at
<http://downloads.egenix.com/python/Unicode-EPC2002-Talk.pdf>, and is an
excellent overview of the design of Python’s Unicode features (based on Python
2, where the Unicode string type is called unicode and literals start with
u).
Once you’ve written some code that works with Unicode data, the next problem is
input/output. How do you get Unicode strings into your program, and how do you
convert Unicode into a form suitable for storage or transmission?
It’s possible that you may not need to do anything depending on your input
sources and output destinations; you should check whether the libraries used in
your application support Unicode natively. XML parsers often return Unicode
data, for example. Many relational databases also support Unicode-valued
columns and can return Unicode values from an SQL query.
Unicode data is usually converted to a particular encoding before it gets
written to disk or sent over a socket. It’s possible to do all the work
yourself: open a file, read an 8-bit byte string from it, and convert the string
with str(bytes,encoding). However, the manual approach is not recommended.
One problem is the multi-byte nature of encodings; one Unicode character can be
represented by several bytes. If you want to read the file in arbitrary-sized
chunks (say, 1K or 4K), you need to write error-handling code to catch the case
where only part of the bytes encoding a single Unicode character are read at the
end of a chunk. One solution would be to read the entire file into memory and
then perform the decoding, but that prevents you from working with files that
are extremely large; if you need to read a 2 GB file, you need 2 GB of RAM.
(More, really, since for at least a moment you’d need to have both the encoded
string and its Unicode version in memory.)
The solution would be to use the low-level decoding interface to catch the case
of partial coding sequences. The work of implementing this has already been
done for you: the built-in open() function can return a file-like object
that assumes the file’s contents are in a specified encoding and accepts Unicode
parameters for methods such as .read() and .write(). This works through
open()'s encoding and errors parameters which are interpreted just
like those in string objects’ encode() and decode() methods.
Reading Unicode from a file is therefore simple:
with open('unicode.rst', encoding='utf-8') as f:
    for line in f:
        print(repr(line))
It’s also possible to open files in update mode, allowing both reading and
writing:
with open('test', encoding='utf-8', mode='w+') as f:
    f.write('\u4500 blah blah blah\n')
    f.seek(0)
    print(repr(f.readline()[:1]))
The Unicode character U+FEFF is used as a byte-order mark (BOM), and is often
written as the first character of a file in order to assist with autodetection
of the file’s byte ordering. Some encodings, such as UTF-16, expect a BOM to be
present at the start of a file; when such an encoding is used, the BOM will be
automatically written as the first character and will be silently dropped when
the file is read. There are variants of these encodings, such as ‘utf-16-le’
and ‘utf-16-be’ for little-endian and big-endian encodings, that specify one
particular byte ordering and don’t skip the BOM.
In some areas, it is also convention to use a “BOM” at the start of UTF-8
encoded files; the name is misleading since UTF-8 is not byte-order dependent.
The mark simply announces that the file is encoded in UTF-8. Use the
‘utf-8-sig’ codec to automatically skip the mark if present for reading such
files.
Most of the operating systems in common use today support filenames that contain
arbitrary Unicode characters. Usually this is implemented by converting the
Unicode string into some encoding that varies depending on the system. For
example, Mac OS X uses UTF-8 while Windows uses a configurable encoding; on
Windows, Python uses the name “mbcs” to refer to whatever the currently
configured encoding is. On Unix systems, there will only be a filesystem
encoding if you’ve set the LANG or LC_CTYPE environment variables; if
you haven’t, the default encoding is ASCII.
The sys.getfilesystemencoding() function returns the encoding to use on
your current system, in case you want to do the encoding manually, but there’s
not much reason to bother. When opening a file for reading or writing, you can
usually just provide the Unicode string as the filename, and it will be
automatically converted to the right encoding for you:
filename = 'filename\u4500abc'
with open(filename, 'w') as f:
    f.write('blah\n')
Functions in the os module such as os.stat() will also accept Unicode
filenames.
Function os.listdir(), which returns filenames, raises an issue: should it return
the Unicode version of filenames, or should it return byte strings containing
the encoded versions? os.listdir() will do both, depending on whether you
provided the directory path as a byte string or a Unicode string. If you pass a
Unicode string as the path, filenames will be decoded using the filesystem’s
encoding and a list of Unicode strings will be returned, while passing a byte
path will return the byte string versions of the filenames. For example,
assuming the default filesystem encoding is UTF-8, running the following
program:
fn = 'filename\u4500abc'
f = open(fn, 'w')
f.close()
import os
print(os.listdir(b'.'))
print(os.listdir('.'))
The first list contains UTF-8-encoded filenames, and the second list contains
the Unicode versions.
Note that in most occasions, the Unicode APIs should be used. The bytes APIs
should only be used on systems where undecodable file names can be present,
i.e. Unix systems.
This section provides some suggestions on writing software that deals with
Unicode.
The most important tip is:
Software should only work with Unicode strings internally, converting to a
particular encoding on output.
If you attempt to write processing functions that accept both Unicode and byte
strings, you will find your program vulnerable to bugs wherever you combine the
two different kinds of strings. There is no automatic encoding or decoding:
if you do e.g. str + bytes, a TypeError is raised for this expression.
When using data coming from a web browser or some other untrusted source, a
common technique is to check for illegal characters in a string before using the
string in a generated command line or storing it in a database. If you’re doing
this, be careful to check the string once it’s in the form that will be used or
stored; it’s possible for encodings to be used to disguise characters. This is
especially true if the input data also specifies the encoding; many encodings
leave the commonly checked-for characters alone, but Python includes some
encodings such as 'base64' that modify every single character.
For example, let’s say you have a content management system that takes a Unicode
filename, and you want to disallow paths with a ‘/’ character. You might write
this code:
def read_file(filename, encoding):
    if '/' in filename:
        raise ValueError("'/' not allowed in filenames")
    unicode_name = filename.decode(encoding)
    with open(unicode_name, 'r') as f:
        # ... return contents of file ...
However, if an attacker could specify the 'base64' encoding, they could pass
'L2V0Yy9wYXNzd2Q=', which is the base-64 encoded form of the string
'/etc/passwd', to read a system file. The above code looks for '/'
characters in the encoded form and misses the dangerous character in the
resulting decoded form.
Thanks to the following people who have noted errors or offered suggestions on
this article: Nicholas Bastin, Marius Gedminas, Kent Johnson, Ken Krugler,
Marc-André Lemburg, Martin von Löwis, Chad Whitacre.
HOWTO Fetch Internet Resources Using The urllib Package
A tutorial on Basic Authentication, with examples in Python.
urllib.request is a Python module for fetching URLs
(Uniform Resource Locators). It offers a very simple interface, in the form of
the urlopen function. This is capable of fetching URLs using a variety of
different protocols. It also offers a slightly more complex interface for
handling common situations - like basic authentication, cookies, proxies and so
on. These are provided by objects called handlers and openers.
urllib.request supports fetching URLs for many “URL schemes” (identified by the string
before the ":" in the URL - for example "ftp" is the URL scheme of
“ftp://python.org/”) using their associated network protocols (e.g. FTP, HTTP).
This tutorial focuses on the most common case, HTTP.
For straightforward situations urlopen is very easy to use. But as soon as you
encounter errors or non-trivial cases when opening HTTP URLs, you will need some
understanding of the HyperText Transfer Protocol. The most comprehensive and
authoritative reference to HTTP is RFC 2616. This is a technical document and
not intended to be easy to read. This HOWTO aims to illustrate using urllib,
with enough detail about HTTP to help you through. It is not intended to replace
the urllib.request docs, but is supplementary to them.
The simplest way to use urllib.request is as follows:
import urllib.request
response = urllib.request.urlopen('http://python.org/')
html = response.read()
Many uses of urllib will be that simple (note that instead of an ‘http:’ URL we
could have used a URL starting with ‘ftp:’, ‘file:’, etc.). However, it’s the
purpose of this tutorial to explain the more complicated cases, concentrating on
HTTP.
HTTP is based on requests and responses - the client makes requests and servers
send responses. urllib.request mirrors this with a Request object which represents
the HTTP request you are making. In its simplest form you create a Request
object that specifies the URL you want to fetch. Calling urlopen with this
Request object returns a response object for the URL requested. This response is
a file-like object, which means you can for example call .read() on the
response:
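For example (a minimal sketch):

import urllib.request

req = urllib.request.Request('http://python.org/')
response = urllib.request.urlopen(req)
the_page = response.read()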
In the case of HTTP, there are two extra things that Request objects allow you
to do: First, you can pass data to be sent to the server. Second, you can pass
extra information (“metadata”) about the data or about the request itself, to
the server - this information is sent as HTTP “headers”. Let’s look at each of
these in turn.
Sometimes you want to send data to a URL (often the URL will refer to a CGI
(Common Gateway Interface) script [1] or other web application). With HTTP,
this is often done using what’s known as a POST request. This is often what
your browser does when you submit an HTML form that you filled in on the web. Not
all POSTs have to come from forms: you can use a POST to transmit arbitrary data
to your own application. In the common case of HTML forms, the data needs to be
encoded in a standard way, and then passed to the Request object as the data
argument. The encoding is done using a function from the urllib.parse
library.
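A sketch of such a POST (the URL and form values are hypothetical):

import urllib.parse
import urllib.request

url = 'http://www.example.com/cgi-bin/register.cgi'
values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}

data = urllib.parse.urlencode(values)  # encode the form values
data = data.encode('ascii')            # POST data should be bytes
req = urllib.request.Request(url, data)
response = urllib.request.urlopen(req)
the_page = response.read()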
Note that other encodings are sometimes required (e.g. for file upload from HTML
forms - see HTML Specification, Form Submission for more
details).
If you do not pass the data argument, urllib uses a GET request. One
way in which GET and POST requests differ is that POST requests often have
“side-effects”: they change the state of the system in some way (for example by
placing an order with the website for a hundredweight of tinned spam to be
delivered to your door). Though the HTTP standard makes it clear that POSTs are
intended to always cause side-effects, and GET requests never to cause
side-effects, nothing prevents a GET request from having side-effects, nor a
POST request from having no side-effects. Data can also be passed in an HTTP
GET request by encoding it in the URL itself.
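A sketch of the GET equivalent, with the encoded data appended to the URL
(again using a hypothetical URL):

>>> import urllib.request
>>> import urllib.parse
>>> data = {'name': 'Somebody Here',
...         'location': 'Northampton',
...         'language': 'Python'}
>>> url_values = urllib.parse.urlencode(data)
>>> print(url_values)  # the ordering of the pairs may vary
name=Somebody+Here&location=Northampton&language=Python
>>> url = 'http://www.example.com/example.cgi'
>>> full_url = url + '?' + url_values
>>> response = urllib.request.urlopen(full_url)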
We’ll discuss here one particular HTTP header, to illustrate how to add headers
to your HTTP request.
Some websites [2] dislike being browsed by programs, or send different versions
to different browsers [3] . By default urllib identifies itself as
Python-urllib/x.y (where x and y are the major and minor version
numbers of the Python release,
e.g. Python-urllib/2.5), which may confuse the site, or just plain
not work. The way a browser identifies itself is through the
User-Agent header [4]. When you create a Request object you can
pass a dictionary of headers in. The following example makes the same
request as above, but identifies itself as a version of Internet
Explorer [5].
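A sketch (the URL and form values are hypothetical):

import urllib.parse
import urllib.request

url = 'http://www.example.com/cgi-bin/register.cgi'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
values = {'name': 'Michael Foord',
          'location': 'Northampton',
          'language': 'Python'}
headers = {'User-Agent': user_agent}

data = urllib.parse.urlencode(values).encode('ascii')
req = urllib.request.Request(url, data, headers)
response = urllib.request.urlopen(req)
the_page = response.read()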
urlopen raises URLError when it cannot handle a response (though as
usual with Python APIs, built-in exceptions such as ValueError,
TypeError etc. may also be raised).
HTTPError is the subclass of URLError raised in the specific case of
HTTP URLs.
The exception classes are exported from the urllib.error module.
Often, URLError is raised because there is no network connection (no route to
the specified server), or the specified server doesn’t exist. In this case, the
exception raised will have a ‘reason’ attribute, which is a tuple containing an
error code and a text error message.
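For example (a sketch; the host is deliberately nonexistent, and the exact
message varies by platform):

>>> req = urllib.request.Request('http://www.pretend_server.org')
>>> try:
...     urllib.request.urlopen(req)
... except urllib.error.URLError as e:
...     print(e.reason)
...
(4, 'getaddrinfo failed')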
Every HTTP response from the server contains a numeric “status code”. Sometimes
the status code indicates that the server is unable to fulfil the request. The
default handlers will handle some of these responses for you (for example, if
the response is a “redirection” that requests the client fetch the document from
a different URL, urllib will handle that for you). For those it can’t handle,
urlopen will raise an HTTPError. Typical errors include ‘404’ (page not
found), ‘403’ (request forbidden), and ‘401’ (authentication required).
See section 10 of RFC 2616 for a reference on all the HTTP error codes.
The HTTPError instance raised will have an integer ‘code’ attribute, which
corresponds to the error sent by the server.
Because the default handlers handle redirects (codes in the 300 range), and
codes in the 100-299 range indicate success, you will usually only see error
codes in the 400-599 range.
http.server.BaseHTTPRequestHandler.responses is a useful dictionary of
response codes that shows all the response codes used by RFC 2616. The
dictionary is reproduced here for convenience:
# Table mapping response codes to messages; entries have the
# form {code: (shortmessage, longmessage)}.
responses = {
    100: ('Continue', 'Request received, please continue'),
    101: ('Switching Protocols',
          'Switching to new protocol; obey Upgrade header'),

    200: ('OK', 'Request fulfilled, document follows'),
    201: ('Created', 'Document created, URL follows'),
    202: ('Accepted',
          'Request accepted, processing continues off-line'),
    203: ('Non-Authoritative Information', 'Request fulfilled from cache'),
    204: ('No Content', 'Request fulfilled, nothing follows'),
    205: ('Reset Content', 'Clear input form for further input.'),
    206: ('Partial Content', 'Partial content follows.'),

    300: ('Multiple Choices',
          'Object has several resources -- see URI list'),
    301: ('Moved Permanently', 'Object moved permanently -- see URI list'),
    302: ('Found', 'Object moved temporarily -- see URI list'),
    303: ('See Other', 'Object moved -- see Method and URL list'),
    304: ('Not Modified',
          'Document has not changed since given time'),
    305: ('Use Proxy',
          'You must use proxy specified in Location to access this '
          'resource.'),
    307: ('Temporary Redirect',
          'Object moved temporarily -- see URI list'),

    400: ('Bad Request',
          'Bad request syntax or unsupported method'),
    401: ('Unauthorized',
          'No permission -- see authorization schemes'),
    402: ('Payment Required',
          'No payment -- see charging schemes'),
    403: ('Forbidden',
          'Request forbidden -- authorization will not help'),
    404: ('Not Found', 'Nothing matches the given URI'),
    405: ('Method Not Allowed',
          'Specified method is invalid for this server.'),
    406: ('Not Acceptable', 'URI not available in preferred format.'),
    407: ('Proxy Authentication Required', 'You must authenticate with '
          'this proxy before proceeding.'),
    408: ('Request Timeout', 'Request timed out; try again later.'),
    409: ('Conflict', 'Request conflict.'),
    410: ('Gone',
          'URI no longer exists and has been permanently removed.'),
    411: ('Length Required', 'Client must specify Content-Length.'),
    412: ('Precondition Failed', 'Precondition in headers is false.'),
    413: ('Request Entity Too Large', 'Entity is too large.'),
    414: ('Request-URI Too Long', 'URI is too long.'),
    415: ('Unsupported Media Type', 'Entity body in unsupported format.'),
    416: ('Requested Range Not Satisfiable',
          'Cannot satisfy request range.'),
    417: ('Expectation Failed',
          'Expect condition could not be satisfied.'),

    500: ('Internal Server Error', 'Server got itself in trouble'),
    501: ('Not Implemented',
          'Server does not support this operation'),
    502: ('Bad Gateway', 'Invalid responses from another server/proxy.'),
    503: ('Service Unavailable',
          'The server cannot process the request due to a high load'),
    504: ('Gateway Timeout',
          'The gateway server did not receive a timely response'),
    505: ('HTTP Version Not Supported', 'Cannot fulfill request.'),
}
When an error is raised the server responds by returning an HTTP error code
and an error page. You can use the HTTPError instance as a response on the
page returned. This means that as well as the code attribute, it also has the
read, geturl, and info methods, as returned by the urllib.response module:
>>> req = urllib.request.Request('http://www.python.org/fish.html')
>>> try:
...     urllib.request.urlopen(req)
... except urllib.error.HTTPError as e:
...     print(e.code)
...     print(e.read())
...
404
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<?xml-stylesheet href="./css/ht2html.css"
type="text/css"?>
<html><head><title>Error 404: File Not Found</title>
...... etc...
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

req = Request(someurl)
try:
    response = urlopen(req)
except HTTPError as e:
    print('The server couldn\'t fulfill the request.')
    print('Error code: ', e.code)
except URLError as e:
    print('We failed to reach a server.')
    print('Reason: ', e.reason)
else:
    # everything is fine
    pass
Note
The except HTTPError must come first, otherwise except URLError
will also catch an HTTPError.
from urllib.request import Request, urlopen
from urllib.error import URLError

req = Request(someurl)
try:
    response = urlopen(req)
except URLError as e:
    if hasattr(e, 'reason'):
        print('We failed to reach a server.')
        print('Reason: ', e.reason)
    elif hasattr(e, 'code'):
        print('The server couldn\'t fulfill the request.')
        print('Error code: ', e.code)
else:
    # everything is fine
    pass
The response returned by urlopen (or the HTTPError instance) has two
useful methods, info() and geturl(), and is defined in the module
urllib.response.
geturl - this returns the real URL of the page fetched. This is useful
because urlopen (or the opener object used) may have followed a
redirect. The URL of the page fetched may not be the same as the URL requested.
info - this returns a dictionary-like object that describes the page
fetched, particularly the headers sent by the server. It is currently an
http.client.HTTPMessage instance.
Typical headers include ‘Content-length’, ‘Content-type’, and so on. See the
Quick Reference to HTTP Headers
for a useful listing of HTTP headers with brief explanations of their meaning
and use.
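A short sketch of both methods (python.org is just a convenient example host):
import urllib.request

response = urllib.request.urlopen('http://www.python.org/')
print(response.geturl())                 # the URL actually fetched, after any redirects
print(response.info())                   # the headers sent by the server
print(response.info()['Content-Type'])   # individual headers can be looked up by name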
When you fetch a URL you use an opener (an instance of the perhaps
confusingly-named urllib.request.OpenerDirector). Normally we have been using
the default opener - via urlopen - but you can create custom
openers. Openers use handlers. All the “heavy lifting” is done by the
handlers. Each handler knows how to open URLs for a particular URL scheme (http,
ftp, etc.), or how to handle an aspect of URL opening, for example HTTP
redirections or HTTP cookies.
You will want to create openers if you want to fetch URLs with specific handlers
installed, for example to get an opener that handles cookies, or to get an
opener that does not handle redirections.
To create an opener, instantiate an OpenerDirector, and then call
.add_handler(some_handler_instance) repeatedly.
Alternatively, you can use build_opener, which is a convenience function for
creating opener objects with a single function call. build_opener adds
several handlers by default, but provides a quick way to add more and/or
override the default handlers.
Other sorts of handlers you might want can handle proxies, authentication,
and other common but slightly specialised situations.
install_opener can be used to make an opener object the (global) default
opener. This means that calls to urlopen will use the opener you have
installed.
Opener objects have an open method, which can be called directly to fetch
URLs in the same way as the urlopen function: there’s no need to call
install_opener, except as a convenience.
To illustrate creating and installing a handler we will use the
HTTPBasicAuthHandler. For a more detailed discussion of this subject –
including an explanation of how Basic Authentication works - see the Basic
Authentication Tutorial.
When authentication is required, the server sends a header (as well as the 401
error code) requesting authentication. This specifies the authentication scheme
and a ‘realm’. The header looks like: WWW-Authenticate: SCHEME realm="REALM".
e.g.
WWW-Authenticate: Basic realm="cPanel Users"
The client should then retry the request with the appropriate name and password
for the realm included as a header in the request. This is ‘basic
authentication’. In order to simplify this process we can create an instance of
HTTPBasicAuthHandler and an opener to use this handler.
The HTTPBasicAuthHandler uses an object called a password manager to handle
the mapping of URLs and realms to passwords and usernames. If you know what the
realm is (from the authentication header sent by the server), then you can use
an HTTPPasswordMgr. Frequently one doesn’t care what the realm is. In that
case, it is convenient to use HTTPPasswordMgrWithDefaultRealm. This allows
you to specify a default username and password for a URL. This will be supplied
unless you provide an alternative combination for a specific realm. We
indicate this by providing None as the realm argument to the
add_password method.
The top-level URL is the first URL that requires authentication. URLs “deeper”
than the URL you pass to .add_password() will also match.
# create a password manager
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()

# Add the username and password.
# If we knew the realm, we could use it instead of None.
top_level_url = "http://example.com/foo/"
password_mgr.add_password(None, top_level_url, username, password)

handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

# create "opener" (OpenerDirector instance)
opener = urllib.request.build_opener(handler)

# use the opener to fetch a URL
opener.open(a_url)

# Install the opener.
# Now all calls to urllib.request.urlopen use our opener.
urllib.request.install_opener(opener)
Note
In the above example we only supplied our HTTPBasicAuthHandler to
build_opener. By default openers have the handlers for normal situations
– ProxyHandler, UnknownHandler, HTTPHandler,
HTTPDefaultErrorHandler, HTTPRedirectHandler, FTPHandler,
FileHandler, HTTPErrorProcessor.
top_level_url is in fact either a full URL (including the ‘http:’ scheme
component and the hostname and optionally the port number)
e.g. “http://example.com/” or an “authority” (i.e. the hostname,
optionally including the port number) e.g. “example.com” or “example.com:8080”
(the latter example includes a port number). The authority, if present, must
NOT contain the “userinfo” component - for example “joe:password@example.com” is
not correct.
urllib will auto-detect your proxy settings and use those. This is through
the ProxyHandler, which is part of the normal handler chain. Normally that’s
a good thing, but there are occasions when it may not be helpful [6]. One way
to disable automatic proxy detection is to set up our own ProxyHandler, with
no proxies defined. This is done using similar steps to setting up a Basic
Authentication handler:
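A sketch of those steps, mirroring the Basic Authentication example above
(an empty dictionary tells ProxyHandler to use no proxies at all):
>>> proxy_support = urllib.request.ProxyHandler({})
>>> opener = urllib.request.build_opener(proxy_support)
>>> urllib.request.install_opener(opener)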
Currently urllib.request does not support fetching of https locations
through a proxy. However, this can be enabled by extending urllib.request as
shown in the recipe [7].
The Python support for fetching resources from the web is layered. urllib uses
the http.client library, which in turn uses the socket library.
As of Python 2.3 you can specify how long a socket should wait for a response
before timing out. This can be useful in applications which have to fetch web
pages. By default the socket module has no timeout and can hang. Currently,
the socket timeout is not exposed at the http.client or urllib.request levels.
However, you can set the default timeout globally for all sockets using
import socket
import urllib.request
# timeout in seconds
timeout = 10
socket.setdefaulttimeout(timeout)
# this call to urllib.request.urlopen now uses the default timeout
# we have set in the socket module
req = urllib.request.Request('http://www.voidspace.org.uk')
response = urllib.request.urlopen(req)
[2] Like Google for example. The proper way to use Google from a program
is to use PyGoogle of course. See
Voidspace Google
for some examples of using the Google API.
[3] Browser sniffing is a very bad practice for website design - building
sites using web standards is much more sensible. Unfortunately a lot of
sites still send different versions to different browsers.
[6] In my case I have to use a proxy to access the internet at work. If you
attempt to fetch localhost URLs through this proxy it blocks them. IE
is set to use the proxy, which urllib picks up on. In order to test
scripts with a localhost server, I have to prevent urllib from using
the proxy.
This document shows how Python fits into the web. It presents some ways
to integrate Python with a web server, and general practices useful for
developing web sites.
Programming for the Web has become a hot topic since the rise of “Web 2.0”,
which focuses on user-generated content on web sites. It has always been
possible to use Python for creating web sites, but it was a rather tedious task.
Therefore, many frameworks and helper tools have been created to assist
developers in creating faster and more robust sites. This HOWTO describes
some of the methods used to combine Python with a web server to create
dynamic content. It is not meant as a complete introduction, as this topic is
far too broad to be covered in one single document. However, a short overview
of the most popular libraries is provided.
See also
While this HOWTO tries to give an overview of Python in the web, it cannot
always be as up to date as desired. Web development in Python is rapidly
moving forward, so the wiki page on Web Programming may be more in sync with
recent development.
When a user enters a web site, their browser makes a connection to the site’s
web server (this is called the request). The server looks up the file in the
file system and sends it back to the user’s browser, which displays it (this is
the response). This is roughly how the underlying protocol, HTTP, works.
Dynamic web sites are not based on files in the file system, but rather on
programs which are run by the web server when a request comes in, and which
generate the content that is returned to the user. They can do all sorts of
useful things, like display the postings of a bulletin board, show your email,
configure software, or just display the current time. These programs can be
written in any programming language the server supports. Since most servers
support Python, it is easy to use Python to create dynamic web sites.
Most HTTP servers are written in C or C++, so they cannot execute Python code
directly – a bridge is needed between the server and the program. These
bridges, or rather interfaces, define how programs interact with the server.
There have been numerous attempts to create the best possible interface, but
there are only a few worth mentioning.
Not every web server supports every interface. Many web servers only support
old, now-obsolete interfaces; however, they can often be extended using
third-party modules to support newer ones.
This interface, most commonly referred to as “CGI”, is the oldest, and is
supported by nearly every web server out of the box. Programs using CGI to
communicate with their web server need to be started by the server for every
request. So, every request starts a new Python interpreter – which takes some
time to start up – thus making the whole interface only usable for low load
situations.
The upside of CGI is that it is simple – writing a Python program which uses
CGI is a matter of about three lines of code. This simplicity comes at a
price: it does very few things to help the developer.
Writing CGI programs, while still possible, is no longer recommended. With
WSGI, a topic covered later in this document, it is possible to write
programs that emulate CGI, so they can be run as CGI if no better option is
available.
See also
The Python standard library includes some modules that are helpful for
creating plain CGI programs: cgi (for handling Common Gateway Interface
requests) and cgitb (for nicer tracebacks in CGI scripts).
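Below is a minimal CGI script of the kind the next paragraphs discuss (a
sketch; the encoding line and the exact output text are conventional choices):
#!/usr/bin/env python
# -*- coding: UTF-8 -*-

# enable nicer tracebacks in the browser while debugging
import cgitb
cgitb.enable()

# a CGI script writes an HTTP header, a blank line, then the body to stdout
print("Content-Type: text/plain;charset=utf-8")
print()
print("Hello World!")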
Depending on your web server configuration, you may need to save this code with
a .py or .cgi extension. Additionally, this file may also need to be
in a cgi-bin folder, for security reasons.
You might wonder what the cgitb line is about. This line makes it possible
to display a nice traceback instead of just crashing and displaying an “Internal
Server Error” in the user’s browser. This is useful for debugging, but it might
risk exposing some confidential data to the user. You should not use cgitb
in production code for this reason. You should always catch exceptions, and
display proper error pages – end-users don’t like to see nondescript “Internal
Server Errors” in their browsers.
If you don’t have your own web server, this does not apply to you. You can
check whether it works as-is, and if not you will need to talk to the
administrator of your web server. If it is a big host, you can try filing a
ticket asking for Python support.
If you are your own administrator or want to set up CGI for testing purposes on
your own computers, you have to configure it by yourself. There is no single
way to configure CGI, as there are many web servers with different
configuration options. Currently the most widely used free web server is
Apache HTTPd, or Apache for short. Apache can be
easily installed on nearly every system using the system’s package management
tool. lighttpd is another alternative and is
said to have better performance. On many systems this server can also be
installed using the package management tool, so manually compiling the web
server may not be needed.
On Apache you can take a look at the Dynamic Content with CGI tutorial, where everything
is described. Most of the time it is enough just to set +ExecCGI. The
tutorial also describes the most common gotchas that might arise.
On lighttpd you need to use the CGI module, which can be configured
in a straightforward way. It boils down to setting cgi.assign properly.
Using CGI sometimes leads to small annoyances while trying to get these
scripts to run. Sometimes a seemingly correct script does not work as
expected, the cause being some small hidden problem that’s difficult to spot.
Some of these potential problems are:
The Python script is not marked as executable. When CGI scripts are not
executable most web servers will let the user download it, instead of
running it and sending the output to the user. For CGI scripts to run
properly on Unix-like operating systems, the +x bit needs to be set.
Using chmod a+x your_script.py may solve this problem.
On a Unix-like system, the line endings in the program file must be Unix
style line endings. This is important because the web server checks the
first line of the script (called shebang) and tries to run the program
specified there. It gets easily confused by Windows line endings (Carriage
Return & Line Feed, also called CRLF), so you have to convert the file to
Unix line endings (only Line Feed, LF). This can be done automatically by
uploading the file via FTP in text mode instead of binary mode, but the
preferred way is just telling your editor to save the files with Unix line
endings. Most editors support this.
Your web server must be able to read the file, and you need to make sure the
permissions are correct. On unix-like systems, the server often runs as user
and group www-data, so it might be worth a try to change the file
ownership, or to make the file world readable using chmod a+r your_script.py.
The web server must know that the file you’re trying to access is a CGI script.
Check the configuration of your web server, as it may be configured
to expect a specific file extension for CGI scripts.
On Unix-like systems, the path to the interpreter in the shebang
(#!/usr/bin/env python) must be correct. This line calls
/usr/bin/env to find Python, but it will fail if there is no
/usr/bin/env, or if Python is not in the web server’s path. If you know
where your Python is installed, you can also use that full path. The
commands whereis python and type -p python could help you find
where it is installed. Once you know the path, you can change the shebang
accordingly: #!/usr/bin/python.
The file must not contain a BOM (Byte Order Mark). The BOM is meant for
determining the byte order of UTF-16 and UTF-32 encodings, but some editors
write this also into UTF-8 files. The BOM interferes with the shebang line,
so be sure to tell your editor not to write the BOM.
If the web server is using mod_python, mod_python may be having
problems. mod_python is able to handle CGI scripts by itself, but it can
also be a source of issues.
People coming from PHP often find it hard to grasp how to use Python in the web.
Their first thought is mostly mod_python,
because they think that this is the equivalent to mod_php. Actually, there
are many differences. What mod_python does is embed the interpreter into
the Apache process, thus speeding up requests by not having to start a Python
interpreter for each request. On the other hand, it is not “Python intermixed
with HTML” in the way that PHP is often intermixed with HTML. The Python
equivalent of that is a template engine. mod_python itself is much more
powerful and provides more access to Apache internals. It can emulate CGI,
work in a “Python Server Pages” mode (similar to JSP) which is “HTML
intermingled with Python”, and it has a “Publisher” which designates one file
to accept all requests and decide what to do with them.
mod_python does have some problems. Unlike the PHP interpreter, the Python
interpreter uses caching when executing files, so changes to a file will
require the web server to be restarted. Another problem is the basic concept
– Apache starts child processes to handle the requests, and unfortunately
every child process needs to load the whole Python interpreter even if it does
not use it. This makes the whole web server slower. Another problem is that,
because mod_python is linked against a specific version of libpython,
it is not possible to switch from an older version to a newer (e.g. 2.4 to 2.5)
without recompiling mod_python. mod_python is also bound to the Apache
web server, so programs written for mod_python cannot easily run on other
web servers.
These are the reasons why mod_python should be avoided when writing new
programs. In some circumstances it still might be a good idea to use
mod_python for deployment, but WSGI makes it possible to run WSGI programs
under mod_python as well.
FastCGI and SCGI try to solve the performance problem of CGI in another way.
Instead of embedding the interpreter into the web server, they create
long-running background processes. There is still a module in the web server
which makes it possible for the web server to “speak” with the background
process. As the background process is independent of the server, it can be
written in any language, including Python. The language just needs to have a
library which handles the communication with the webserver.
The difference between FastCGI and SCGI is very small, as SCGI is essentially
just a “simpler FastCGI”. As the web server support for SCGI is limited,
most people use FastCGI instead, which works the same way. Almost everything
that applies to SCGI also applies to FastCGI, so we’ll only cover
the latter.
These days, FastCGI is never used directly. Just like mod_python, it is only
used for the deployment of WSGI applications.
Apache has both mod_fastcgi and mod_fcgid. mod_fastcgi is the original one, but it
has some licensing issues, which is why it is sometimes considered non-free.
mod_fcgid is a smaller, compatible alternative. One of these modules needs
to be loaded by Apache.
Once you have installed and configured the module, you can test it with the
following WSGI-application:
#!/usr/bin/env python
# -*- coding: UTF-8 -*-

import sys, os
from html import escape
from flup.server.fcgi import WSGIServer

def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])

    yield '<h1>FastCGI Environment</h1>'
    yield '<table>'
    for k, v in sorted(environ.items()):
        yield '<tr><th>{0}</th><td>{1}</td></tr>'.format(
            escape(k), escape(v))
    yield '</table>'

WSGIServer(app).run()
This is a simple WSGI application, but you need to install flup first, as flup handles the low level
FastCGI access.
See also
There is some documentation on setting up Django with FastCGI, most of
which can be reused for other WSGI-compliant frameworks and libraries.
Only the manage.py part has to be changed; the example used here can be
used instead. Django does more or less the exact same thing.
mod_wsgi is an attempt to get rid of the
low level gateways. Given that FastCGI, SCGI, and mod_python are mostly used to
deploy WSGI applications, mod_wsgi was started to directly embed WSGI applications
into the Apache web server. mod_wsgi is specifically designed to host WSGI
applications. It makes the deployment of WSGI applications much easier than
deployment using other low level methods, which need glue code. The downside
is that mod_wsgi is limited to the Apache web server; other servers would need
their own implementations of mod_wsgi.
mod_wsgi supports two modes: embedded mode, in which it integrates with the
Apache process, and daemon mode, which is more FastCGI-like. Unlike FastCGI,
mod_wsgi handles the worker-processes by itself, which makes administration
easier.
WSGI has already been mentioned several times, so it has to be something
important. In fact it really is, and now it is time to explain it.
The Web Server Gateway Interface, or WSGI for short, is defined in
PEP 333 and is currently the best way to do Python web programming. While
it is great for programmers writing frameworks, a normal web developer does not
need to get in direct contact with it. When choosing a framework for web
development it is a good idea to choose one which supports WSGI.
The big benefit of WSGI is the unification of the application programming
interface. When your program is compatible with WSGI – which at the outer
level means that the framework you are using has support for WSGI – your
program can be deployed via any web server interface for which there are WSGI
wrappers. You do not need to care about whether the application user uses
mod_python or FastCGI or mod_wsgi – with WSGI your application will work on
any gateway interface. The Python standard library contains its own WSGI
server, wsgiref, which is a small web server that can be used for
testing.
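As a sketch of how little is needed, here is a minimal application served with
wsgiref (the host, port, and response text are arbitrary choices):
from wsgiref.simple_server import make_server

def app(environ, start_response):
    # a WSGI application receives the environment and a callable for
    # starting the response, and returns an iterable of bytes
    start_response('200 OK', [('Content-Type', 'text/plain; charset=utf-8')])
    return [b'Hello from WSGI!']

httpd = make_server('localhost', 8000, app)
print('Serving on http://localhost:8000/ ...')
httpd.serve_forever()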
A really great WSGI feature is middleware. Middleware is a layer around your
program which can add various functionality to it. There is quite a bit of
middleware already
available. For example, instead of writing your own session management (HTTP
is a stateless protocol, so to associate multiple HTTP requests with a single
user your application must create and manage such state via a session), you can
just download middleware which does that, plug it in, and get on with coding
the unique parts of your application. The same thing with compression – there
is existing middleware which handles compressing your HTML using gzip to save
on your server’s bandwidth. Authentication is another problem easily solved
using existing middleware.
Although WSGI may seem complex, the initial phase of learning can be very
rewarding because WSGI and the associated middleware already have solutions to
many problems that might arise while developing web sites.
The code that is used to connect to various low level gateways like CGI or
mod_python is called a WSGI server. One of these servers is flup, which
supports FastCGI and SCGI, as well as AJP. Some of these servers
are written in Python, as flup is, but there also exist others which are
written in C and can be used as drop-in replacements.
There are many servers already available, so a Python web application
can be deployed nearly anywhere. This is one big advantage that Python has
compared with other web technologies.
See also
A good overview of WSGI-related code can be found in the WSGI wiki, which contains an extensive list of WSGI servers which can be used by any application
supporting WSGI.
You might be interested in some WSGI-supporting modules already contained in
the standard library, namely:
wsgiref – some tiny utilities and servers for WSGI
What does WSGI give the web application developer? Let’s take a look at
an application that’s been around for a while, which was written in
Python without using WSGI.
One of the most widely used wiki software packages is MoinMoin. It was created in 2000, so it predates WSGI by about
three years. Older versions needed separate code to run on CGI, mod_python,
FastCGI and standalone.
It now includes support for WSGI. Using WSGI, it is possible to deploy
MoinMoin on any WSGI compliant server, with no additional glue code.
Unlike the pre-WSGI versions, this could include WSGI servers that the
authors of MoinMoin know nothing about.
The term MVC is often encountered in statements such as “framework foo
supports MVC”. MVC is more about the overall organization of code, rather than
any particular API. Many web frameworks use this model to help the developer
bring structure to their program. Bigger web applications can have lots of
code, so it is a good idea to have an effective structure right from the beginning.
That way, even users of other frameworks (or even other languages, since MVC is
not Python-specific) can easily understand the code, given that they are
already familiar with the MVC structure.
MVC stands for three components:
The model. This is the data that will be displayed and modified. In
Python frameworks, this component is often represented by the classes used by
an object-relational mapper.
The view. This component’s job is to display the data of the model to the
user. Typically this component is implemented via templates.
The controller. This is the layer between the user and the model. The
controller reacts to user actions (like opening some specific URL), tells
the model to modify the data if necessary, and tells the view code what to
display.
While one might think that MVC is a complex design pattern, in fact it is not.
It is used in Python because it has turned out to be useful for creating clean,
maintainable web sites.
Note
While not all Python frameworks explicitly support MVC, it is often trivial
to create a web site which uses the MVC pattern by separating the data logic
(the model) from the user interaction logic (the controller) and the
templates (the view). That’s why it is important not to write unnecessary
Python code in the templates – it works against the MVC model and creates
chaos in the code base, making it harder to understand and modify.
See also
The English Wikipedia has an article about the Model-View-Controller pattern. It includes a long
list of web frameworks for various programming languages.
Websites are complex constructs, so tools have been created to help web
developers make their code easier to write and more maintainable. Tools like
these exist for all web frameworks in all languages. Developers are not forced
to use these tools, and often there is no “best” tool. It is worth learning
about the available tools because they can greatly simplify the process of
developing a web site.
See also
There are far more components than can be presented here. The Python wiki
has a page about these components, called
Web Components.
Mixing of HTML and Python code is made possible by a few libraries. While
convenient at first, it leads to horribly unmaintainable code. That’s why
templates exist. Templates are, in the simplest case, just HTML files with
placeholders. The HTML is sent to the user’s browser after filling in the
placeholders.
Python already includes a way to build simple templates:
# a simple template
template = "<html><body><h1>Hello {who}!</h1></body></html>"
print(template.format(who="Reader"))
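The standard library also offers string.Template with a similar substitution
mechanism; this is an additional sketch, not part of the example above:
from string import Template

# $-based placeholders instead of {}-style format fields
template = Template("<html><body><h1>Hello $who!</h1></body></html>")
print(template.substitute(who="Reader"))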
To generate complex HTML based on non-trivial model data, conditional
and looping constructs like Python’s for and if are generally needed.
Template engines support templates of this complexity.
There are a lot of template engines available for Python which can be used with
or without a framework. Some of these define a plain-text programming
language which is easy to learn, partly because it is limited in scope.
Others use XML, and the template output is guaranteed to always be valid
XML. There are many other variations.
Some frameworks ship their own template engine or recommend one in
particular. In the absence of a reason to use a different template engine,
using the one provided by or recommended by the framework is a good idea.
There are many template engines competing for attention, because it is
pretty easy to create them in Python. The page Templating in the wiki lists a big,
ever-growing number of these. The three listed above are considered “second
generation” template engines and are a good place to start.
Data persistence, while sounding very complicated, is just about storing data.
This data might be the text of blog entries, the postings on a bulletin board or
the text of a wiki page. There are, of course, a number of different ways to store
information on a web server.
Often, relational database engines like MySQL or
PostgreSQL are used because of their good
performance when handling very large databases consisting of millions of
entries. There is also a small database engine called SQLite, which is bundled with Python in the sqlite3
module, and which uses only one file. It has no other dependencies. For
smaller sites SQLite is just enough.
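A minimal sqlite3 sketch (the file name and schema are illustrative):
import sqlite3

conn = sqlite3.connect('site.db')   # a single file, no server process needed
conn.execute('CREATE TABLE IF NOT EXISTS posts '
             '(id INTEGER PRIMARY KEY, body TEXT)')
conn.execute('INSERT INTO posts (body) VALUES (?)', ('Hello, web!',))
conn.commit()
for row in conn.execute('SELECT id, body FROM posts'):
    print(row)
conn.close()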
Relational databases are queried using a language called SQL. Python programmers in general do not
like SQL too much, as they prefer to work with objects. It is possible to save
Python objects into a database using a technology called ORM (Object Relational
Mapping). ORM translates all object-oriented access into SQL code under the
hood, so the developer does not need to think about it. Most frameworks use
ORMs, and it works quite well.
A second possibility is storing data in normal, plain text files (sometimes
called “flat files”). This is very easy for simple sites,
but can be difficult to get right if the web site is performing many
updates to the stored data.
A third possibility is object-oriented databases (also called “object
databases”). These databases store the object data in a form that closely
parallels the way the objects are structured in memory during program
execution. (By contrast, ORMs store the object data as rows of data in tables
and relations between those rows.) Storing the objects directly has the
advantage that nearly all objects can be saved in a straightforward way, unlike
in relational databases where some objects are very hard to represent.
Frameworks often give hints on which data storage method to choose. It is
usually a good idea to stick to the data store recommended by the framework
unless the application has special requirements better satisfied by an
alternate storage mechanism.
See also
Persistence Tools lists
possibilities on how to save data in the file system. Some of these
modules are part of the standard library.
The process of creating code to run web sites involves writing code to provide
various services. The code to provide a particular service often works the
same way regardless of the complexity or purpose of the web site in question.
Abstracting these common solutions into reusable code produces what are called
“frameworks” for web development. Perhaps the most well-known framework for
web development is Ruby on Rails, but Python has its own frameworks. Some of
these were partly inspired by Rails, or borrowed ideas from Rails, but many
existed a long time before Rails.
Originally Python web frameworks tended to incorporate all of the services
needed to develop web sites as a giant, integrated set of tools. No two web
frameworks were interoperable: a program developed for one could not be
deployed on a different one without considerable re-engineering work. This led
to the development of “minimalist” web frameworks that provided just the tools
to communicate between the Python code and the HTTP protocol, with all other
services to be added on top via separate components. Some ad hoc standards
were developed that allowed for limited interoperability between frameworks,
such as a standard that allowed different template engines to be used
interchangeably.
Since the advent of WSGI, the Python web framework world has been evolving
toward interoperability based on the WSGI standard. Now many web frameworks,
whether “full stack” (providing all the tools one needs to deploy the most
complex web sites) or minimalist, or anything in between, are built from
collections of reusable components that can be used with more than one
framework.
The majority of users will probably want to select a “full stack” framework
that has an active community. These frameworks tend to be well documented,
and provide the easiest path to producing a fully functional web site in
minimal time.
Django is a framework consisting of several
tightly coupled elements which were written from scratch and work together very
well. It includes an ORM which is quite powerful while being simple to use,
and has a great online administration interface which makes it possible to edit
the data in the database with a browser. The template engine is text-based and
is designed to be usable for page designers who cannot write Python. It
supports template inheritance and filters (which work like Unix pipes). Django
has many handy features bundled, such as creation of RSS feeds or generic views,
which make it possible to create web sites almost without writing any Python code.
It has a big, international community, the members of which have created many
web sites. There are also a lot of add-on projects which extend Django’s normal
functionality. This is partly due to Django’s well written online
documentation and the Django book.
Note
Although Django is an MVC-style framework, it names the elements
differently, which is described in the Django FAQ.
Another popular web framework for Python is TurboGears. TurboGears takes the approach of using already
existing components and combining them with glue code to create a seamless
experience. TurboGears gives the user flexibility in choosing components. For
example the ORM and template engine can be changed to use packages different
from those used by default.
The documentation can be found in the TurboGears wiki, where links to screencasts can be found.
TurboGears has also an active user community which can respond to most related
questions. There is also a TurboGears book
published, which is a good starting point.
The newest version of TurboGears, version 2.0, moves even further in the
direction of WSGI support and a component-based architecture.
the WSGI stack of another popular component-based web framework, Pylons.
The Zope framework is one of the “old original” frameworks. Its current
incarnation in Zope2 is a tightly integrated full-stack framework. One of its
most interesting features is its tight integration with a powerful object
database called the ZODB (Zope Object Database).
Because of its highly integrated nature, Zope wound up in a somewhat isolated
ecosystem: code written for Zope wasn’t very usable outside of Zope, and
vice-versa. To solve this problem the Zope 3 effort was started. Zope 3
re-engineers Zope as a set of more cleanly isolated components. This effort
was started before the advent of the WSGI standard, but there is WSGI support
for Zope 3 from the Repoze project. Zope components
have many years of production use behind them, and the Zope 3 project gives
access to these components to the wider Python community. There is even a
separate framework based on the Zope components: Grok.
Zope is also the infrastructure used by the Plone content
management system, one of the most powerful and popular content management
systems available.
Of course these are not the only frameworks that are available. There are
many other frameworks worth mentioning.
Another framework that’s already been mentioned is Pylons. Pylons is much
like TurboGears, but with an even stronger emphasis on flexibility, which comes
at the cost of being more difficult to use. Nearly every component can be
exchanged, which makes it necessary to use the documentation of every single
component, of which there are many. Pylons builds upon Paste, an extensive set of tools which are handy for WSGI.
And that’s still not everything. The most up-to-date information can always be
found in the Python wiki.
See also
The Python wiki contains an extensive list of web frameworks.
Most frameworks also have their own mailing lists and IRC channels, look out
for these on the projects’ web sites. There is also a general “Python in the
Web” IRC channel on freenode called #python.web.
Python is an interpreted, interactive, object-oriented programming language. It
incorporates modules, exceptions, dynamic typing, very high level dynamic data
types, and classes. Python combines remarkable power with very clear syntax.
It has interfaces to many system calls and libraries, as well as to various
window systems, and is extensible in C or C++. It is also usable as an
extension language for applications that need a programmable interface.
Finally, Python is portable: it runs on many Unix variants, on the Mac, and on
PCs under MS-DOS, Windows, Windows NT, and OS/2.
The Python Software Foundation is an independent non-profit organization that
holds the copyright on Python versions 2.1 and newer. The PSF’s mission is to
advance open source technology related to the Python programming language and to
publicize the use of Python. The PSF’s home page is at
http://www.python.org/psf/.
Donations to the PSF are tax-exempt in the US. If you use Python and find it
helpful, please contribute via the PSF donation page.
You can do anything you want with the source, as long as you leave the
copyrights in and display those copyrights in any documentation about Python
that you produce. If you honor the copyright rules, it’s OK to use Python for
commercial use, to sell copies of Python in source or binary form (modified or
unmodified), or to sell products that incorporate Python in some form. We would
still like to know about all commercial use of Python, of course.
See the PSF license page to find further
explanations and a link to the full text of the license.
The Python logo is trademarked, and in certain cases permission is required to
use it. Consult the Trademark Usage Policy for more information.
Here’s a very brief summary of what started it all, written by Guido van
Rossum:
I had extensive experience with implementing an interpreted language in the
ABC group at CWI, and from working with this group I had learned a lot about
language design. This is the origin of many Python features, including the
use of indentation for statement grouping and the inclusion of
very-high-level data types (although the details are all different in
Python).
I had a number of gripes about the ABC language, but also liked many of its
features. It was impossible to extend the ABC language (or its
implementation) to remedy my complaints – in fact its lack of extensibility
was one of its biggest problems. I had some experience with using Modula-2+
and talked with the designers of Modula-3 and read the Modula-3 report.
Modula-3 is the origin of the syntax and semantics used for exceptions, and
some other Python features.
I was working in the Amoeba distributed operating system group at CWI. We
needed a better way to do system administration than by writing either C
programs or Bourne shell scripts, since Amoeba had its own system call
interface which wasn’t easily accessible from the Bourne shell. My
experience with error handling in Amoeba made me acutely aware of the
importance of exceptions as a programming language feature.
It occurred to me that a scripting language with a syntax like ABC but with
access to the Amoeba system calls would fill the need. I realized that it
would be foolish to write an Amoeba-specific language, so I decided that I
needed a language that was generally extensible.
During the 1989 Christmas holidays, I had a lot of time on my hand, so I
decided to give it a try. During the next year, while still mostly working
on it in my own time, Python was used in the Amoeba project with increasing
success, and the feedback from colleagues made me add many early
improvements.
In February 1991, after just over a year of development, I decided to post to
USENET. The rest is in the Misc/HISTORY file.
Python is a high-level general-purpose programming language that can be applied
to many different classes of problems.
The language comes with a large standard library that covers areas such as
string processing (regular expressions, Unicode, calculating differences between
files), Internet protocols (HTTP, FTP, SMTP, XML-RPC, POP, IMAP, CGI
programming), software engineering (unit testing, logging, profiling, parsing
Python code), and operating system interfaces (system calls, filesystems, TCP/IP
sockets). Look at the table of contents for the Python Standard Library to get an idea
of what’s available. A wide variety of third-party extensions are also
available. Consult the Python Package Index to
find packages of interest to you.
Python versions are numbered A.B.C or A.B. A is the major version number – it
is only incremented for really major changes in the language. B is the minor
version number, incremented for less earth-shattering changes. C is the
micro-level – it is incremented for each bugfix release. See PEP 6 for more
information about bugfix releases.
Not all releases are bugfix releases. In the run-up to a new major release, a
series of development releases are made, denoted as alpha, beta, or release
candidate. Alphas are early releases in which interfaces aren’t yet finalized;
it’s not unexpected to see an interface change between two alpha releases.
Betas are more stable, preserving existing interfaces but possibly adding new
modules, and release candidates are frozen, making no changes except as needed
to fix critical bugs.
Alpha, beta and release candidate versions have an additional suffix. The
suffix for an alpha version is “aN” for some small number N, the suffix for a
beta version is “bN” for some small number N, and the suffix for a release
candidate version is “cN” for some small number N. In other words, all versions
labeled 2.0aN precede the versions labeled 2.0bN, which precede versions labeled
2.0cN, and those precede 2.0.
You may also find version numbers with a “+” suffix, e.g. “2.2+”. These are
unreleased versions, built directly from the Subversion trunk. In practice,
after a final minor release is made, the Subversion trunk is incremented to the
next minor version, which becomes the “a0” version,
e.g. “2.4a0”.
See also the documentation for sys.version, sys.hexversion, and
sys.version_info.
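For example, these values can be inspected from a running interpreter (the
output shown in the comments is illustrative):
import sys

print(sys.version)        # the human-readable version string
print(sys.version_info)   # a named tuple, e.g. (3, 2, 0, 'final', 0)
if sys.version_info >= (3, 2):    # tuples compare element by element
    print('running on Python 3.2 or newer')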
The source distribution is a gzipped tar file containing the complete C source,
Sphinx-formatted documentation, Python library modules, example programs, and
several useful pieces of freely distributable software. The source will compile
and run out of the box on most UNIX platforms.
Consult the Developer FAQ for more
information on getting the source code and compiling it.
The documentation is written in reStructuredText and processed by the Sphinx
documentation tool. The reStructuredText source
for the documentation is part of the Python source distribution.
There is a newsgroup, comp.lang.python, and a mailing list,
python-list. The
newsgroup and mailing list are gatewayed into each other – if you can read news
it’s unnecessary to subscribe to the mailing list.
comp.lang.python is high-traffic, receiving hundreds of postings
every day, and Usenet readers are often more able to cope with this volume.
Announcements of new software releases and events can be found in
comp.lang.python.announce, a low-traffic moderated list that receives about five
postings per day. It’s available as the python-announce mailing list.
Alpha and beta releases are available from http://www.python.org/download/. All
releases are announced on the comp.lang.python and comp.lang.python.announce
newsgroups and on the Python home page at http://www.python.org/; an RSS feed of
news is available.
You can also access the development version of Python through Subversion. See
http://www.python.org/dev/faq/ for details.
To report a bug or submit a patch, please use the Roundup installation at
http://bugs.python.org/.
You must have a Roundup account to report bugs; this makes it possible for us to
contact you if we have follow-up questions. It will also enable Roundup to send
you updates as we act on your bug. If you had previously used SourceForge to
report bugs to Python, you can obtain your Roundup password through Roundup’s
password reset procedure.
When he began implementing Python, Guido van Rossum was also reading the
published scripts from “Monty Python’s Flying Circus”, a BBC comedy series from the 1970s. Van Rossum
thought he needed a name that was short, unique, and slightly mysterious, so he
decided to call the language Python.
Very stable. New, stable releases have been coming out roughly every 6 to 18
months since 1991, and this seems likely to continue. Currently there are
usually around 18 months between major releases.
The developers issue “bugfix” releases of older versions, so the stability of
existing releases gradually improves. Bugfix releases, indicated by a third
component of the version number (e.g. 2.5.3, 2.6.2), are managed for stability;
only fixes for known problems are included in a bugfix release, and it’s
guaranteed that interfaces will remain the same throughout a series of bugfix
releases.
The latest stable releases can always be found on the Python download page. There are two recommended production-ready
versions at this point in time, because at the moment there are two branches of
stable releases: 2.x and 3.x. Python 3.x may be less useful than 2.x, since
currently there is more third party software available for Python 2 than for
Python 3. Python 2 code will generally not run unchanged in Python 3.
There are probably tens of thousands of users, though it’s difficult to obtain
an exact count.
Python is available for free download, so there are no sales figures, and it’s
available from many different sites and packaged with many Linux distributions,
so download statistics don’t tell the whole story either.
The comp.lang.python newsgroup is very active, but not all Python users post to
the group or even read it.
High-profile Python projects include the Mailman mailing list manager and the Zope application server. Several Linux distributions, most notably Red Hat, have written part or all of their installer and
system administration software in Python. Companies that use Python internally
include Google, Yahoo, and Lucasfilm Ltd.
See http://www.python.org/dev/peps/ for the Python Enhancement Proposals
(PEPs). PEPs are design documents describing a suggested new feature for Python,
providing a concise technical specification and a rationale. Look for a PEP
titled “Python X.Y Release Schedule”, where X.Y is a version that hasn’t been
publicly released yet.
In general, no. There are already millions of lines of Python code around the
world, so any change in the language that invalidates more than a very small
fraction of existing programs has to be frowned upon. Even if you can provide a
conversion program, there’s still the problem of updating all documentation;
many books have been written about Python, and we don’t want to invalidate them
all at a single stroke.
Providing a gradual upgrade path is necessary if a feature has to be changed.
PEP 5 describes the procedure followed for introducing backward-incompatible
changes while minimizing disruption for users.
As of August 2003 no major problems have been reported and Y2K compliance seems
to be a non-issue.
Python does very few date calculations and, for those it does perform, relies
on the C library functions. Python generally represents times either as seconds
since 1970 or as a (year,month,day,...) tuple where the year is expressed
with four digits, which makes Y2K bugs unlikely. So as long as your C library
is okay, Python should be okay. Of course, it’s possible that a particular
application written in Python makes assumptions about 2-digit years.
Because Python is available free of charge, there are no absolute guarantees.
If there are unforeseen problems, liability is the user’s problem rather than
the developers’, and there is nobody you can sue for damages. The Python
copyright notice contains the following disclaimer:
4. PSF is making Python 2.3 available to Licensee on an “AS IS”
basis. PSF MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY
WAY OF EXAMPLE, BUT NOT LIMITATION, PSF MAKES NO AND DISCLAIMS ANY
REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR
PURPOSE OR THAT THE USE OF PYTHON 2.3 WILL NOT INFRINGE ANY THIRD PARTY
RIGHTS.
5. PSF SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON
2.3 FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS
A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON 2.3,
OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.
The good news is that if you encounter a problem, you have full source
available to track it down and fix it. This is one advantage of an open source
programming environment.
It is still common to start students with a procedural and statically typed
language such as Pascal, C, or a subset of C++ or Java. Students may be better
served by learning Python as their first language. Python has a very simple and
consistent syntax and a large standard library and, most importantly, using
Python in a beginning programming course lets students concentrate on important
programming skills such as problem decomposition and data type design. With
Python, students can be quickly introduced to basic concepts such as loops and
procedures. They can probably even work with user-defined objects in their very
first course.
For a student who has never programmed before, using a statically typed language
seems unnatural. It presents additional complexity that the student must master
and slows the pace of the course. The students are trying to learn to think
like a computer, decompose problems, design consistent interfaces, and
encapsulate data. While learning to use a statically typed language is
important in the long term, it is not necessarily the best topic to address in
the students’ first programming course.
Many other aspects of Python make it a good first language. Like Java, Python
has a large standard library so that students can be assigned programming
projects very early in the course that do something. Assignments aren’t
restricted to the standard four-function calculator and check balancing
programs. By using the standard library, students can gain the satisfaction of
working on realistic applications as they learn the fundamentals of programming.
Using the standard library also teaches students about code reuse. Third-party
modules such as PyGame are also helpful in extending the students’ reach.
Python’s interactive interpreter enables students to test language features
while they’re programming. They can keep a window with the interpreter running
while they enter their program’s source in another window. If they can’t
remember the methods for a list, they can do something like this:
>>> L = []
>>> dir(L)
['append', 'count', 'extend', 'index', 'insert', 'pop', 'remove',
'reverse', 'sort']
>>> help(L.append)
Help on built-in function append:
append(...)
L.append(object) -- append object to end
>>> L.append(1)
>>> L
[1]
With the interpreter, documentation is never far from the student as he’s
programming.
There are also good IDEs for Python. IDLE is a cross-platform IDE for Python
that is written in Python using Tkinter. PythonWin is a Windows-specific IDE.
Emacs users will be happy to know that there is a very good Python mode for
Emacs. All of these programming environments provide syntax highlighting,
auto-indenting, and access to the interactive interpreter while coding. Consult
http://www.python.org/editors/ for a full list of Python editing environments.
If you want to discuss Python’s use in education, you may be interested in
joining the edu-sig mailing list.
Starting with Python 2.3, the distribution includes the PyBSDDB package
(http://pybsddb.sf.net/) as a replacement for the old bsddb module. It
includes functions which provide backward compatibility at the API level, but
requires a newer version of the underlying Berkeley DB library. Files created with the older bsddb module
can’t be opened directly using the new module.
Using your old version of Python and a pair of scripts which are part of Python
2.3 (db2pickle.py and pickle2db.py, in the Tools/scripts directory) you can
convert your old database files to the new format. Using your old Python
version, run the db2pickle.py script to convert it to a pickle, e.g.:
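For instance, the round trip might look roughly like this (a sketch only;
script paths, file names, and options depend on your installation, as the
next paragraph notes):
# with the old Python: dump the old-format database to a pickle
python2.2 Tools/scripts/db2pickle.py old.db old.pickle
# with the new Python: load the pickle into a new-format database
python2.3 Tools/scripts/pickle2db.py new.db old.pickle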
The precise commands you use will vary depending on the particulars of your
installation. For full details about operation of these two scripts check the
doc string at the start of each one.
The pdb module is a simple but adequate console-mode debugger for Python. It is
part of the standard Python library, and is documented in the Library
Reference Manual. You can also write your own debugger by using the code
for pdb as an example.
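The two most common entry points are pdb.run() and pdb.set_trace()
(my_function below is a placeholder for your own code):
import pdb

pdb.run('my_function()')   # run a statement under debugger control

# or, placed at the point of interest inside your own code:
pdb.set_trace()            # drop into the debugger at this line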
The IDLE interactive development environment, which is part of the standard
Python distribution (normally available as Tools/scripts/idle), includes a
graphical debugger. There is documentation for the IDLE debugger at
http://www.python.org/idle/doc/idle2.html#Debugger.
PythonWin is a Python IDE that includes a GUI debugger based on pdb. The
Pythonwin debugger colors breakpoints and has quite a few cool features such as
debugging non-Pythonwin programs. Pythonwin is available as part of the Python
for Windows Extensions project and
as a part of the ActivePython distribution (see
http://www.activestate.com/Products/ActivePython/index.html).
Boa Constructor is an IDE and GUI
builder that uses wxWidgets. It offers visual frame creation and manipulation,
an object inspector, many views on the source like object browsers, inheritance
hierarchies, doc string generated html documentation, an advanced debugger,
integrated help, and Zope support.
Eric is an IDE built on PyQt
and the Scintilla editing component.
PyChecker is a static analysis tool that finds bugs in Python source code and
warns about code complexity and style. You can get PyChecker from
http://pychecker.sf.net.
Pylint is another tool that checks
if a module satisfies a coding standard, and also makes it possible to write
plug-ins to add a custom feature. In addition to the bug checking that
PyChecker performs, Pylint offers some additional features such as checking line
length, whether variable names are well-formed according to your coding
standard, whether declared interfaces are fully implemented, and more.
http://www.logilab.org/card/pylint_manual provides a full list of Pylint’s
features.
You don’t need the ability to compile Python to C code if all you want is a
stand-alone program that users can download and run without having to install
the Python distribution first. There are a number of tools that determine the
set of modules required by a program and bind these modules together with a
Python binary to produce a single executable.
One is to use the freeze tool, which is included in the Python source tree as
Tools/freeze. It converts Python byte code to C arrays; with a C compiler you
can embed all your modules into a new program, which is then linked with the
standard Python modules.
It works by scanning your source recursively for import statements (in both
forms) and looking for the modules in the standard Python path as well as in the
source directory (for built-in modules). It then turns the bytecode for modules
written in Python into C code (array initializers that can be turned into code
objects using the marshal module) and creates a custom-made config file that
only contains those built-in modules which are actually used in the program. It
then compiles the generated C code and links it with the rest of the Python
interpreter to form a self-contained binary which acts exactly like your script.
Obviously, freeze requires a C compiler. There are several other utilities
which don’t. One is Thomas Heller’s py2exe (Windows only).
Another is Christian Tismer’s SQFREEZE
which appends the byte code to a specially-prepared Python interpreter that can
find the byte code in the executable.
Other tools include Fredrik Lundh’s Squeeze and Anthony Tuininga’s
cx_Freeze.
That’s a tough one, in general. There are many tricks to speed up Python code;
consider rewriting parts in C as a last resort.
In some cases it’s possible to automatically translate Python to C or x86
assembly language, meaning that you don’t have to modify your code to gain
increased speed.
Cython and Pyrex
can compile a slightly modified version of Python code into a C extension, and
can be used on many different platforms.
Psyco is a just-in-time compiler that
translates Python code into x86 assembly language. If you can use it, Psyco can
provide dramatic speedups for critical functions.
The rest of this answer will discuss various tricks for squeezing a bit more
speed out of Python code. Never apply any optimization tricks unless you know
you need them, after profiling has indicated that a particular function is the
heavily executed hot spot in the code. Optimizations almost always make the
code less clear, and you shouldn’t pay the costs of reduced clarity (increased
development time, greater likelihood of bugs) unless the resulting performance
benefit is worth it.
One thing to notice is that function and (especially) method calls are rather
expensive; if you have designed a purely OO interface with lots of tiny
functions that don’t do much more than get or set an instance variable or call
another method, you might consider using a more direct way such as directly
accessing instance variables. Also see the standard module profile which
makes it possible to find out where your program is spending most of its time
(if you have some patience – the profiling itself can slow your program down by
an order of magnitude).
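For example, a minimal profiling session (assuming your program defines a
main() entry point of its own) might look like this:

import profile
profile.run('main()')   # run main() under the profiler and print a report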
Remember that many standard optimization heuristics you may know from other
programming experience may well apply to Python. For example it may be faster
to send output to output devices using larger writes rather than smaller ones in
order to reduce the overhead of kernel system calls. Thus CGI scripts that
write all output in “one shot” may be faster than those that write lots of small
pieces of output.
Also, be sure to use Python’s core features where appropriate. For example,
slicing allows programs to chop up lists and other sequence objects in a single
tick of the interpreter’s mainloop using highly optimized C implementations.
Thus to get the same effect as:
L2 = []
for i in range(3):
    L2.append(L1[i])
it is much shorter and far faster to use
L2 = list(L1[:3]) # "list" is redundant if L1 is a list.
Note that the functionally-oriented built-in functions such as map(),
zip(), and friends can be a convenient accelerator for loops that
perform a single task. For example to pair the elements of two lists
together:
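>>> list(zip([1, 2, 3], [4, 5, 6]))
[(1, 4), (2, 5), (3, 6)]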
For example if s1..s7 are large (10K+) strings then
"".join([s1,s2,s3,s4,s5,s6,s7]) may be far faster than the more obvious
s1+s2+s3+s4+s5+s6+s7, since the “summation” will compute many
subexpressions, whereas join() does all the copying in one pass. For
manipulating strings, use the replace() and format() methods
on string objects. Use regular expressions only when you’re
not dealing with constant string patterns.
Be sure to use the list.sort() built-in method to do sorting, and see the
sorting mini-HOWTO for examples
of moderately advanced usage. list.sort() beats other techniques for
sorting in all but the most extreme circumstances.
Another common trick is to “push loops into functions or methods.” For example
suppose you have a program that runs slowly and you use the profiler to
determine that a Python function ff() is being called lots of times. If you
notice that ff():
def ff(x):
    ...  # do something with x computing result...
    return result
tends to be called in loops like:
list = map(ff, oldlist)
or:
for x in sequence:
    value = ff(x)
    ...  # do something with value...
then you can often eliminate function call overhead by rewriting ff() to:
def ffseq(seq):
    resultseq = []
    for x in seq:
        ...  # do something with x computing result...
        resultseq.append(result)
    return resultseq
and rewrite the two examples to list = ffseq(oldlist) and to:
for value in ffseq(sequence):
    ...  # do something with value...
Single calls to ff(x) translate to ffseq([x])[0] with little penalty.
Of course this technique is not always appropriate and there are other variants
which you can figure out.
You can gain some performance by explicitly storing the results of a function or
method lookup into a local variable. A loop like:
for key in token:
    dict[key] = dict.get(key, 0) + 1
resolves dict.get every iteration. If the method isn’t going to change, a
slightly faster implementation is:
dict_get = dict.get  # look up the method once
for key in token:
    dict[key] = dict_get(key, 0) + 1
Default arguments can be used to determine values once, at compile time instead
of at run time. This can only be done for functions or objects which will not
be changed during program execution, such as replacing a repeated global lookup
with a value precomputed in a default argument.
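For example (a sketch; degree_sin is a hypothetical helper), one can replace:

import math

def degree_sin(deg):
    return math.sin(deg * math.pi / 180.0)

with:

def degree_sin(deg, factor=math.pi / 180.0, sin=math.sin):
    return sin(deg * factor)

so that the conversion factor is computed and math.sin is looked up only once,
when the function is defined.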
Because this trick uses default arguments for terms which should not be changed,
it should only be used when you are not concerned with presenting a possibly
confusing API to your users.
It can be a surprise to get the UnboundLocalError in previously working
code when it is modified by adding an assignment statement somewhere in
the body of a function.
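Consider, for instance, a function along these lines:

>>> x = 10
>>> def foo():
...     print(x)
...     x += 1
>>> foo()
Traceback (most recent call last):
  ...
UnboundLocalError: local variable 'x' referenced before assignment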
This is because when you make an assignment to a variable in a scope, that
variable becomes local to that scope and shadows any similarly named variable
in the outer scope. Since the last statement in foo assigns a new value to
x, the compiler recognizes it as a local variable. Consequently, when the
earlier print(x) attempts to print the uninitialized local variable, an
error results.
In the example above you can access the outer scope variable by declaring it
global:
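>>> x = 10
>>> def foobar():
...     global x
...     print(x)
...     x += 1
>>> foobar()
10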
This explicit declaration is required in order to remind you that (unlike the
superficially analogous situation with class and instance variables) you are
actually modifying the value of the variable in the outer scope:
>>> print(x)
11
You can do a similar thing in a nested scope using the nonlocal
keyword:
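>>> def foo():
...     x = 10
...     def bar():
...         nonlocal x
...         print(x)
...         x += 1
...     bar()
...     print(x)
>>> foo()
10
11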
In Python, variables that are only referenced inside a function are implicitly
global. If a variable is assigned a value anywhere within the function’s
body, it’s assumed to be a local unless explicitly declared as global.
Though a bit surprising at first, a moment’s consideration explains this. On
one hand, requiring global for assigned variables provides a bar
against unintended side-effects. On the other hand, if global was required
for all global references, you’d be using global all the time. You’d have
to declare as global every reference to a built-in function or to a component of
an imported module. This clutter would defeat the usefulness of the global
declaration for identifying side-effects.
The canonical way to share information across modules within a single program is
to create a special module (often called config or cfg). Just import the config
module in all modules of your application; the module then becomes available as
a global name. Because there is only one instance of each module, any changes
made to the module object get reflected everywhere. For example:
config.py:
x = 0 # Default value of the 'x' configuration setting
mod.py:
import config
config.x = 1
main.py:
import config
import mod
print(config.x)
Note that using a module is also the basis for implementing the Singleton design
pattern, for the same reason.
In general, don’t use from modulename import *. Doing so clutters the
importer’s namespace. Some people avoid this idiom even with the few modules
that were designed to be imported in this manner. Modules designed in this
manner include tkinter and threading.
Import modules at the top of a file. Doing so makes it clear what other modules
your code requires and avoids questions of whether the module name is in scope.
Using one import per line makes it easy to add and delete module imports, but
using multiple imports per line uses less screen space.
It’s good practice to import modules in the following order:

1. standard library modules – e.g. sys, os, getopt, re
2. third-party library modules (anything installed in Python’s site-packages
   directory) – e.g. mx.DateTime, ZODB, PIL.Image, etc.
3. locally-developed modules
Never use relative package imports. If you’re writing code that’s in the
package.sub.m1 module and want to import package.sub.m2, do not just
write from . import m2, even though it’s legal. Write
from package.sub import m2 instead. See PEP 328 for details.
It is sometimes necessary to move imports to a function or class to avoid
problems with circular imports. Gordon McMillan says:
Circular imports are fine where both modules use the “import <module>” form
of import. They fail when the 2nd module wants to grab a name out of the
first (“from module import name”) and the import is at the top level. That’s
because names in the 1st are not yet available, because the first module is
busy importing the 2nd.
In this case, if the second module is only used in one function, then the import
can easily be moved into that function. By the time the import is called, the
first module will have finished initializing, and the second module can do its
import.
It may also be necessary to move imports out of the top level of code if some of
the modules are platform-specific. In that case, it may not even be possible to
import all of the modules at the top of the file. In this case, importing the
correct modules in the corresponding platform-specific code is a good option.
Only move imports into a local scope, such as inside a function definition, if
it’s necessary to solve a problem such as avoiding a circular import or are
trying to reduce the initialization time of a module. This technique is
especially helpful if many of the imports are unnecessary depending on how the
program executes. You may also want to move imports into a function if the
modules are only ever used in that function. Note that loading a module the
first time may be expensive because of the one time initialization of the
module, but loading a module multiple times is virtually free, costing only a
couple of dictionary lookups. Even if the module name has gone out of scope,
the module is probably available in sys.modules.
If only instances of a specific class use a module, then it is reasonable to
import the module in the class’s __init__ method and then assign the module
to an instance variable so that the module is always available (via that
instance variable) during the life of the object. Note that to delay an import
until the class is instantiated, the import must be inside a method. Putting
the import inside the class but outside of any method still causes the import to
occur when the module is initialized.
Collect the arguments using the * and ** specifiers in the function’s
parameter list; this gives you the positional arguments as a tuple and the
keyword arguments as a dictionary. You can then pass these arguments when
calling another function by using * and **.
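A minimal sketch (g() here stands for whatever function you are forwarding to,
and the 'width' keyword is purely illustrative):

def f(x, *args, **kwargs):
    kwargs['width'] = '14.3c'   # adjust one keyword argument
    g(x, *args, **kwargs)       # pass everything else through unchanged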
Remember that arguments are passed by assignment in Python. Since assignment
just creates references to objects, there’s no alias between an argument name in
the caller and callee, and so no call-by-reference per se. You can achieve the
desired effect in a number of ways.
By returning a tuple of the results:
def func2(a, b):
    a = 'new-value'     # a and b are local names
    b = b + 1           # assigned to new objects
    return a, b         # return new values

x, y = 'old-value', 99
x, y = func2(x, y)
print(x, y)             # output: new-value 100
This is almost always the clearest solution.
By using global variables. This isn’t thread-safe, and is not recommended.
By passing a mutable (changeable in-place) object:
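def func1(a):
    a[0] = 'new-value'    # 'a' references a mutable list
    a[1] = a[1] + 1       # changes a shared object

args = ['old-value', 99]
func1(args)
print(args[0], args[1])   # output: new-value 100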
You have two choices: you can use nested scopes or you can use callable objects.
For example, suppose you wanted to define linear(a, b) which returns a
function f(x) that computes the value a*x + b. Using nested scopes:
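def linear(a, b):
    def result(x):
        return a * x + b
    return result

Or using a callable object:

class linear:

    def __init__(self, a, b):
        self.a, self.b = a, b

    def __call__(self, x):
        return self.a * x + self.b

In both cases,

taxes = linear(0.3, 2)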
gives a callable object where taxes(10e6) == 0.3 * 10e6 + 2.
The callable object approach has the disadvantage that it is a bit slower and
results in slightly longer code. However, note that a collection of callables
can share their signature via inheritance:
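class exponential(linear):
    # __init__ inherited
    def __call__(self, x):
        return self.a * (x ** self.b)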
For an instance x of a user-defined class, dir(x) returns an alphabetized
list of the names of the instance’s attributes and methods, together with the
attributes and methods defined by its class.
Generally speaking, it can’t, because objects don’t really have names.
Essentially, assignment always binds a name to a value; the same is true of
def and class statements, but in that case the value is a
callable. Consider the following code:
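>>> class A:
...     pass
...
>>> B = A
>>> a = B()
>>> b = a
>>> print(b)
<__main__.A object at 0x16D07CC>
>>> print(a)
<__main__.A object at 0x16D07CC>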
Arguably the class has a name: even though it is bound to two names and invoked
through the name B, the created instance is still reported as an instance of
class A. However, it is impossible to say whether the instance’s name is a or
b, since both names are bound to the same value.
Generally speaking it should not be necessary for your code to “know the names”
of particular values. Unless you are deliberately writing introspective
programs, this is usually an indication that a change of approach might be
beneficial.
In comp.lang.python, Fredrik Lundh once gave an excellent analogy in answer to
this question:
The same way as you get the name of that cat you found on your porch: the cat
(object) itself cannot tell you its name, and it doesn’t really care – so
the only way to find out what it’s called is to ask all your neighbours
(namespaces) if it’s their cat (object)...
....and don’t be surprised if you’ll find that it’s known by many names, or
no name at all!
For versions previous to 2.5 the answer would be ‘No’.
In many cases you can mimic a ? b : c with a and b or c, but there’s a
flaw: if b is zero (or empty, or None – anything that tests false) then
c will be selected instead. In many cases you can prove by looking at the
code that this can’t happen (e.g. because b is a constant or has a type that
can never be false), but in general this can be a problem.
Tim Peters (who wishes it was Steve Majewski) suggested the following solution:
(a and [b] or [c])[0]. Because [b] is a singleton list it is never
false, so the wrong path is never taken; then applying [0] to the whole
thing gets the b or c that you really wanted. Ugly, but it gets you there
in the rare cases where it is really inconvenient to rewrite your code using
‘if’.
The best course is usually to write a simple if...else statement. Another
solution is to implement the ?: operator as a function:
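def q(a, b, c):
    if a:
        return b
    else:
        return c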
In most cases you’ll pass b and c directly: q(a, b, c). To avoid evaluating
b or c when they shouldn’t be, encapsulate them within a lambda function, e.g.:
q(a, lambda: b, lambda: c).
It has been asked why Python has no if-then-else expression. There are
several answers: many languages do just fine without one; it can easily lead to
less readable code; no sufficiently “Pythonic” syntax has been discovered; a
search of the standard library found remarkably few places where using an
if-then-else expression would make the code more understandable.
In 2002, PEP 308 was written proposing several possible syntaxes and the
community was asked to vote on the issue. The vote was inconclusive. Most
people liked one of the syntaxes, but also hated other syntaxes; many votes
implied that people preferred no ternary operator rather than having a syntax
they hated.
Yes. Usually this is done by nesting lambda within
lambda. See the following three examples, due to Ulf Bartelt:
from functools import reduce

# Primes < 1000
print(list(filter(None,map(lambda y:y*reduce(lambda x,y:x*y!=0,
map(lambda x,y=y:y%x,range(2,int(pow(y,0.5)+1))),1),range(2,1000)))))

# First 10 Fibonacci numbers
print(list(map(lambda x,f=lambda x,f:(f(x-1,f)+f(x-2,f)) if x>1 else 1:
f(x,f), range(10))))

# Mandelbrot set
print((lambda Ru,Ro,Iu,Io,IM,Sx,Sy:reduce(lambda x,y:x+y,map(lambda y,
Iu=Iu,Io=Io,Ru=Ru,Ro=Ro,Sy=Sy,L=lambda yc,Iu=Iu,Io=Io,Ru=Ru,Ro=Ro,i=IM,
Sx=Sx,Sy=Sy:reduce(lambda x,y:x+y,map(lambda x,xc=Ru,yc=yc,Ru=Ru,Ro=Ro,
i=i,Sx=Sx,F=lambda xc,yc,x,y,k,f=lambda xc,yc,x,y,k,f:(k<=0)or (x*x+y*y
>=4.0) or 1+f(xc,yc,x*x-y*y+xc,2.0*x*y+yc,k-1,f):f(xc,yc,x,y,k,f):chr(
64+F(Ru+x*(Ro-Ru)/Sx,yc,0,0,i)),range(Sx))):L(Iu+y*(Io-Iu)/Sy),range(Sy
))))(-2.1, 0.7, -1.2, 1.2, 30, 80, 24))
#    \___ ___/  \___ ___/  |   |   |__ lines on screen
#        V          V      |   |______ columns on screen
#        |          |      |__________ maximum of "iterations"
#        |          |_________________ range on y axis
#        |____________________________ range on x axis
To specify an octal digit, precede the octal value with a zero, and then a lower
or uppercase “o”. For example, to set the variable “a” to the octal value “10”
(8 in decimal), type:
>>> a = 0o10
>>> a
8
Hexadecimal is just as easy. Simply precede the hexadecimal number with a zero,
and then a lower or uppercase “x”. Hexadecimal digits can be specified in lower
or uppercase. For example, in the Python interpreter:
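>>> a = 0xa5
>>> a
165
>>> b = 0XB2
>>> b
178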
It’s primarily driven by the desire that i % j have the same sign as j.
If you want that, and also want:

i == (i // j) * j + (i % j)

then integer division has to return the floor. C also requires this identity to
hold, and then compilers that truncate i // j need to make i % j have
the same sign as i.
There are few real use cases for i % j when j is negative. When j
is positive, there are many, and in virtually all of them it’s more useful for
i % j to be >= 0. If the clock says 10 now, what did it say 200 hours
ago? -190 % 12 == 2 is useful; -190 % 12 == -10 is a bug waiting to
bite.
For integers, use the built-in int() type constructor, e.g.
int('144') == 144. Similarly, float() converts to floating-point,
e.g. float('144') == 144.0.

By default, these interpret the number as decimal, so that
int('0144') == 144 and int('0x144') raises ValueError.
int(string, base) takes the base to convert from as a second optional
argument, so int('0x144', 16) == 324. If the base is specified as 0, the
number is interpreted using Python’s rules: a leading ‘0’ indicates octal, and
‘0x’ indicates a hex number.
Do not use the built-in function eval() if all you need is to convert
strings to numbers. eval() will be significantly slower and it presents a
security risk: someone could pass you a Python expression that might have
unwanted side effects. For example, someone could pass
__import__('os').system("rm -rf $HOME") which would erase your home
directory.
eval() also has the effect of interpreting numbers as Python expressions,
so that e.g. eval('09') gives a syntax error because Python does not allow
leading ‘0’ in a decimal number (except ‘0’).
To convert, e.g., the number 144 to the string ‘144’, use the built-in type
constructor str(). If you want a hexadecimal or octal representation, use
the built-in functions hex() or oct(). For fancy formatting, see
the String Formatting section, e.g. "{:04d}".format(144) yields
'0144' and "{:.3f}".format(1/3) yields '0.333'.
The best is to use a dictionary that maps strings to functions. The primary
advantage of this technique is that the strings do not need to match the names
of the functions. This is also the primary technique used to emulate a case
construct:
def a():
    pass

def b():
    pass

dispatch = {'go': a, 'stop': b}  # Note lack of parens for funcs

dispatch[get_input()]()  # Note trailing parens to call function
Note: Using eval() is slow and dangerous. If you don’t have absolute
control over the contents of the string, someone could pass a string that
resulted in an arbitrary function being executed.
Starting with Python 2.2, you can use S.rstrip("\r\n") to remove all
occurrences of any line terminator from the end of the string S without
removing other trailing whitespace. If the string S represents more than
one line, with several empty lines at the end, the line terminators for all the
blank lines will be removed:
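>>> lines = ("line 1 \r\n"
...          "\r\n"
...          "\r\n")
>>> lines.rstrip("\n\r")
'line 1 '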
Since this is typically only desired when reading text one line at a time, using
S.rstrip() this way works well.
For older versions of Python, there are two partial substitutes:
If you want to remove all trailing whitespace, use the rstrip() method of
string objects. This removes all trailing whitespace, not just a single
newline.
Otherwise, if there is only one line in the string S, use
S.splitlines()[0].
For simple input parsing, the easiest approach is usually to split the line into
whitespace-delimited words using the split() method of string objects
and then convert decimal strings to numeric values using int() or
float(). split() supports an optional “sep” parameter which is useful
if the line uses something other than whitespace as a separator.
For more complicated input parsing, regular expressions are more powerful
than C’s sscanf() and better suited for the task.
The type constructor tuple(seq) converts any sequence (actually, any
iterable) into a tuple with the same items in the same order.
For example, tuple([1,2,3]) yields (1,2,3) and tuple('abc')
yields ('a','b','c'). If the argument is a tuple, it does not make a copy
but returns the same object, so it is cheap to call tuple() when you
aren’t sure that an object is already a tuple.
The type constructor list(seq) converts any sequence or iterable into a list
with the same items in the same order. For example, list((1,2,3)) yields
[1,2,3] and list('abc') yields ['a','b','c']. If the argument
is a list, it makes a copy just like seq[:] would.
Python sequences are indexed with positive numbers and negative numbers. For
positive numbers, 0 is the first index, 1 is the second index, and so forth. For
negative indices, -1 is the last index, -2 is the penultimate (next to last)
index, and so forth. Think of seq[-n] as the same as seq[len(seq)-n].
Using negative indices can be very convenient. For example S[:-1] is all of
the string except for its last character, which is useful for removing the
trailing newline from a string.
Lists are equivalent to C or Pascal arrays in their time complexity; the primary
difference is that a Python list can contain objects of many different types.
The array module also provides methods for creating arrays of fixed types
with compact representations, but they are slower to index than lists. Also
note that the Numeric extensions and others define array-like structures with
various characteristics as well.
To get Lisp-style linked lists, you can emulate cons cells using tuples:
lisp_list = ("like", ("this", ("example", None)))
If mutability is desired, you could use lists instead of tuples. Here the
analogue of lisp car is lisp_list[0] and the analogue of cdr is
lisp_list[1]. Only do this if you’re sure you really need to, because it’s
usually a lot slower than using Python lists.
You probably tried to make a multidimensional array like this:
A = [[None] * 2] * 3
This looks correct if you print it:
>>> A
[[None, None], [None, None], [None, None]]
But when you assign a value, it shows up in multiple places:
>>> A[0][0] = 5
>>> A
[[5, None], [5, None], [5, None]]
The reason is that replicating a list with * doesn’t create copies, it only
creates references to the existing objects. The *3 creates a list
containing 3 references to the same list of length two. Changes to one row will
show in all rows, which is almost certainly not what you want.
The suggested approach is to create a list of the desired length first and then
fill in each element with a newly created list:
A = [None] * 3
for i in range(3):
    A[i] = [None] * 2
This generates a list containing 3 different lists of length two. You can also
use a list comprehension:
w, h = 2, 3
A = [[None] * w for i in range(h)]
Or, you can use an extension that provides a matrix datatype; Numeric Python is the best known.
You can’t. Dictionaries store their keys in an unpredictable order, so the
display order of a dictionary’s elements will be similarly unpredictable.
This can be frustrating if you want to save a printable version to a file, make
some changes and then compare it with some other printed dictionary. In this
case, use the pprint module to pretty-print the dictionary; the items will
be presented in order sorted by the key.
A more complicated solution is to subclass dict to create a
SortedDict class that prints itself in a predictable order. Here’s one
simpleminded implementation of such a class:
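class SortedDict(dict):

    def __repr__(self):
        keys = sorted(self.keys())
        result = ("{!r}: {!r}".format(k, self[k]) for k in keys)
        return "{{{}}}".format(", ".join(result))

    __str__ = __repr__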
This will work for many common situations you might encounter, though it’s far
from a perfect solution. The largest flaw is that if some values in the
dictionary are also dictionaries, their values won’t be presented in any
particular order.
The technique, attributed to Randal Schwartz of the Perl community, sorts the
elements of a list by a metric which maps each element to its “sort value”. In
Python, just use the key argument for the sort() method:
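Isorted = L[:]
Isorted.sort(key=lambda s: int(s[10:15]))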
The key argument is new in Python 2.4; for older versions this kind of
sorting is quite simple to do with list comprehensions. To sort a list of
strings by their uppercase values:
tmp1 = [(x.upper(), x) for x in L] # Schwartzian transform
tmp1.sort()
Usorted = [x[1] for x in tmp1]
To sort by the integer value of a subfield extending from positions 10-15 in
each string:
tmp2 = [(int(s[10:15]), s) for s in L] # Schwartzian transform
tmp2.sort()
Isorted = [x[1] for x in tmp2]
For versions prior to 3.0, Isorted may also be computed by:
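def intfield(s):
    return int(s[10:15])

def Icmp(s1, s2):
    return cmp(intfield(s1), intfield(s2))   # cmp() exists only in Python 2

Isorted = L[:]
Isorted.sort(Icmp)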
Merge them into an iterator of tuples, sort the resulting list, and then pick
out the element you want.
>>> list1 = ["what", "I'm", "sorting", "by"]
>>> list2 = ["something", "else", "to", "sort"]
>>> pairs = zip(list1, list2)
>>> pairs = sorted(pairs)
>>> pairs
[("I'm", 'else'), ('by', 'sort'), ('sorting', 'to'), ('what', 'something')]
>>> result = [x[1] for x in pairs]
>>> result
['else', 'sort', 'to', 'something']
An alternative for the last step is:
>>> result = []
>>> for p in pairs: result.append(p[1])
If you find this more legible, you might prefer to use this instead of the final
list comprehension. However, it is almost twice as slow for long lists. Why?
First, the append() operation has to reallocate memory, and while it uses
some tricks to avoid doing that each time, it still has to do it occasionally,
and that costs quite a bit. Second, the expression “result.append” requires an
extra attribute lookup, and third, there’s a speed reduction from having to make
all those function calls.
A class is the particular object type created by executing a class statement.
Class objects are used as templates to create instance objects, which embody
both the data (attributes) and code (methods) specific to a datatype.
A class can be based on one or more other classes, called its base class(es). It
then inherits the attributes and methods of its base classes. This allows an
object model to be successively refined by inheritance. You might have a
generic Mailbox class that provides basic accessor methods for a mailbox,
and subclasses such as MboxMailbox, MaildirMailbox, OutlookMailbox
that handle various specific mailbox formats.
Self is merely a conventional name for the first argument of a method. A method
defined as meth(self, a, b, c) should be called as x.meth(a, b, c) for
some instance x of the class in which the definition occurs; the called
method will think it is called as meth(x, a, b, c).
Use the built-in function isinstance(obj, cls). You can check if an object
is an instance of any of a number of classes by providing a tuple instead of a
single class, e.g. isinstance(obj, (class1, class2, ...)), and can also
check whether an object is one of Python’s built-in types, e.g.
isinstance(obj, str) or isinstance(obj, (int, float, complex)).
Note that most programs do not use isinstance() on user-defined classes
very often. If you are developing the classes yourself, a more proper
object-oriented style is to define methods on the classes that encapsulate a
particular behaviour, instead of checking the object’s class and doing a
different thing based on what class it is. For example, if you have a function
that does something:
def search(obj):
    if isinstance(obj, Mailbox):
        ...  # code to search a mailbox
    elif isinstance(obj, Document):
        ...  # code to search a document
    elif ...
A better approach is to define a search() method on all the classes and just
call it:
class Mailbox:
    def search(self):
        ...  # code to search a mailbox

class Document:
    def search(self):
        ...  # code to search a document

obj.search()
Delegation is an object oriented technique (also called a design pattern).
Let’s say you have an object x and want to change the behaviour of just one
of its methods. You can create a new class that provides a new implementation
of the method you’re interested in changing and delegates all other methods to
the corresponding method of x.
Python programmers can easily implement delegation. For example, the following
class implements a class that behaves like a file but converts all written data
to uppercase:
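class UpperOut:

    def __init__(self, outfile):
        self.__outfile = outfile

    def write(self, s):
        self.__outfile.write(s.upper())

    def __getattr__(self, name):
        return getattr(self.__outfile, name)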
Here the UpperOut class redefines the write() method to convert the
argument string to uppercase before calling the underlying
self.__outfile.write() method. All other methods are delegated to the
underlying self.__outfile object. The delegation is accomplished via the
__getattr__ method; consult the language reference
for more information about controlling attribute access.
Note that for more general cases delegation can get trickier. When attributes
must be set as well as retrieved, the class must define a __setattr__()
method too, and it must do so carefully. The basic implementation of
__setattr__() is roughly equivalent to the following:
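class X:
    ...
    def __setattr__(self, name, value):
        self.__dict__[name] = value
    ...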
For versions prior to 3.0, you may be using classic classes: For a class
definition such as class Derived(Base): ... you can call method meth()
defined in Base (or one of Base‘s base classes) as
Base.meth(self, arguments...). Here, Base.meth is an unbound method, so you
need to provide the self argument.
You could define an alias for the base class, assign the real base class to it
before your class definition, and use the alias throughout your class. Then all
you have to change is the value assigned to the alias. Incidentally, this trick
is also handy if you want to decide dynamically (e.g. depending on availability
of resources) which base class to use. Example:
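A minimal sketch (the class names are illustrative):

class Base:
    def meth(self):
        print('Base.meth')

BaseAlias = Base          # assign the real base class to the alias

class Derived(BaseAlias):
    def meth(self):
        BaseAlias.meth(self)   # refer to the base class only via the alias
        print('Derived.meth')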
Both static data and static methods (in the sense of C++ or Java) are supported
in Python.
For static data, simply define a class attribute. To assign a new value to the
attribute, you have to explicitly use the class name in the assignment:
class C:
    count = 0   # number of times C.__init__ called

    def __init__(self):
        C.count = C.count + 1

    def getcount(self):
        return C.count  # or return self.count
c.count also refers to C.count for any c such that
isinstance(c, C) holds, unless overridden by c itself or by some class on
the base-class search path from c.__class__ back to C.
Caution: within a method of C, an assignment like self.count = 42 creates
a new and unrelated instance attribute named “count” in self‘s own dict.
Rebinding of a class-static data name must always specify the class whether
inside a method or not:

C.count = 314
Static methods are possible since Python 2.2:
class C:
    def static(arg1, arg2, arg3):
        # No 'self' parameter!
        ...
    static = staticmethod(static)
With Python 2.4’s decorators, this can also be written as
class C:
    @staticmethod
    def static(arg1, arg2, arg3):
        # No 'self' parameter!
        ...
However, a far more straightforward way to get the effect of a static method is
via a simple module-level function:
def getcount():
    return C.count
If your code is structured so as to define one class (or tightly related class
hierarchy) per module, this supplies the desired encapsulation.
Variable names with double leading underscores are “mangled” to provide a simple
but effective way to define class private variables. Any identifier of the form
__spam (at least two leading underscores, at most one trailing underscore)
is textually replaced with _classname__spam, where classname is the
current class name with any leading underscores stripped.
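For example, a brief sketch of the transformation:

class MyClass:
    def __method(self):       # stored under the name _MyClass__method
        pass

    def call_it(self):
        self.__method()       # the compiler rewrites this to
                              # self._MyClass__method()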
This doesn’t guarantee privacy: an outside user can still deliberately access
the “_classname__spam” attribute, and private values are visible in the object’s
__dict__. Many Python programmers never bother to use private variable
names at all.
The del statement does not necessarily call __del__() – it simply
decrements the object’s reference count, and if this reaches zero
__del__() is called.
If your data structures contain circular links (e.g. a tree where each child has
a parent reference and each parent has a list of children) the reference counts
will never go back to zero. Once in a while Python runs an algorithm to detect
such cycles, but the garbage collector might run some time after the last
reference to your data structure vanishes, so your __del__() method may be
called at an inconvenient and random time. This is inconvenient if you’re trying
to reproduce a problem. Worse, the order in which objects’ __del__()
methods are executed is arbitrary. You can run gc.collect() to force a
collection, but there are pathological cases where objects will never be
collected.
Despite the cycle collector, it’s still a good idea to define an explicit
close() method on objects to be called whenever you’re done with them. The
close() method can then remove attributes that refer to subobjects. Don’t
call __del__() directly – __del__() should call close() and
close() should make sure that it can be called more than once for the same
object.
Another way to avoid cyclical references is to use the weakref module,
which allows you to point to objects without incrementing their reference count.
Tree data structures, for instance, should use weak references for their parent
and sibling references (if they need them!).
Finally, if your __del__() method raises an exception, a warning message
is printed to sys.stderr.
Python does not keep track of all instances of a class (or of a built-in type).
You can program the class’s constructor to keep track of all instances by
keeping a list of weak references to each instance.
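A minimal sketch of this approach (the class name is illustrative):

import weakref

class MyClass:
    _instances = []   # weak references to every instance created

    def __init__(self):
        MyClass._instances.append(weakref.ref(self))

    @classmethod
    def instances(cls):
        # dereference the weak refs, dropping instances already collected
        return [r() for r in cls._instances if r() is not None]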
When a module is imported for the first time (or when the source is more recent
than the current compiled file) a .pyc file containing the compiled code
should be created in the same directory as the .py file.
One reason that a .pyc file may not be created is permissions problems with
the directory. This can happen, for example, if you develop as one user but run
as another, such as if you are testing with a web server. Creation of a .pyc
file is automatic if you’re importing a module and Python has the ability
(permissions, free space, etc...) to write the compiled module back to the
directory.
Running Python on a top level script is not considered an import and no .pyc
will be created. For example, if you have a top-level module abc.py that
imports another module xyz.py, when you run abc, xyz.pyc will be created
since xyz is imported, but no abc.pyc file will be created since abc.py
isn’t being imported.
If you need to create abc.pyc – that is, to create a .pyc file for a module
that is not imported – you can, using the py_compile and
compileall modules.
The py_compile module can manually compile any module. One way is to use
the compile() function in that module interactively:
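>>> import py_compile
>>> py_compile.compile('abc.py')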
This will write the .pyc to the same location as abc.py (or you can
override that with the optional parameter cfile).
You can also automatically compile all files in a directory or directories using
the compileall module. You can do it from the shell prompt by running
compileall.py and providing the path of a directory containing Python files
to compile:
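python -m compileall <directory>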
A module can find out its own module name by looking at the predefined global
variable __name__. If this has the value '__main__', the program is
running as a script. Many modules that are usually used by importing them also
provide a command-line interface or a self-test, and only execute this code
after checking __name__:
def main():
    print('Running test...')
    ...

if __name__ == '__main__':
    main()
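Suppose, for example, you have two modules that import names from each other
(a reconstruction of the usual setup; the module and variable names are
illustrative):

# foo.py
from bar import bar_var
foo_var = 1

# bar.py
from foo import foo_var
bar_var = 2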
The problem is that the interpreter will perform the following steps:
1. main imports foo
2. Empty globals for foo are created
3. foo is compiled and starts executing
4. foo imports bar
5. Empty globals for bar are created
6. bar is compiled and starts executing
7. bar imports foo (which is a no-op since there already is a module named foo)
8. bar.foo_var = foo.foo_var
The last step fails, because Python isn’t done with interpreting foo yet and
the global symbol dictionary for foo is still empty.
The same thing happens when you use import foo, and then try to access
foo.foo_var in global code.
There are (at least) three possible workarounds for this problem.
Guido van Rossum recommends avoiding all uses of from <module> import ...,
and placing all code inside functions. Initializations of global variables and
class variables should use constants or built-in functions only. This means
everything from an imported module is referenced as <module>.<name>.
Jim Roskind suggests performing steps in the following order in each module:

1. exports (globals, functions, and classes that don’t need imported base
   classes)
2. import statements
3. active code (including globals that are initialized from imported values).
Van Rossum doesn’t like this approach much because the imports appear in a
strange place, but it does work.
Matthias Urlichs recommends restructuring your code so that the recursive import
is not necessary in the first place.
For reasons of efficiency as well as consistency, Python only reads the module
file on the first time a module is imported. If it didn’t, in a program
consisting of many modules where each one imports the same basic module, the
basic module would be parsed and re-parsed many times. To force rereading of a
changed module, do this:
import imp
import modname
imp.reload(modname)
Warning: this technique is not 100% fool-proof. In particular, modules
containing statements like
from modname import some_objects
will continue to work with the old version of the imported objects. If the
module contains class definitions, existing class instances will not be
updated to use the new class definition. This can result in the following
paradoxical behaviour:
>>> import imp
>>> import cls
>>> c = cls.C() # Create an instance of C
>>> imp.reload(cls)
<module 'cls' from 'cls.py'>
>>> isinstance(c, cls.C) # isinstance is false?!?
False
The nature of the problem is made clear if you print out the “identity” of the
class objects:
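>>> hex(id(c.__class__))
'0x7352a0'
>>> hex(id(cls.C))
'0x4198d0'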
Why does Python use indentation for grouping of statements?
Guido van Rossum believes that using indentation for grouping is extremely
elegant and contributes a lot to the clarity of the average Python program.
Most people learn to love this feature after a while.
Since there are no begin/end brackets there cannot be a disagreement between
grouping perceived by the parser and the human reader. Occasionally C
programmers will encounter a fragment of code like this:
if (x <= y)
        x++;
        y--;
z++;
Only the x++ statement is executed if the condition is true, but the
indentation leads you to believe otherwise. Even experienced C programmers will
sometimes stare at it a long time wondering why y is being decremented even
for x > y.
Because there are no begin/end brackets, Python is much less prone to
coding-style conflicts. In C there are many different ways to place the braces.
If you’re used to reading and writing code that uses one style, you will feel at
least slightly uneasy when reading (or being required to write) another style.
Many coding styles place begin/end brackets on a line by themselves. This makes
programs considerably longer and wastes valuable screen space, making it harder
to get a good overview of a program. Ideally, a function should fit on one
screen (say, 20-30 lines). 20 lines of Python can do a lot more work than 20
lines of C. This is not solely due to the lack of begin/end brackets – the
lack of declarations and the high-level data types are also responsible – but
the indentation-based syntax certainly helps.
Why am I getting strange results with simple arithmetic operations?
See the next question.
Why are floating point calculations so inaccurate?
People are often very surprised by results like this:
>>> 1.2 - 1.0
0.199999999999999996
and think it is a bug in Python. It’s not. This has nothing to do with Python,
but with how the underlying C platform handles floating point numbers, and
ultimately with the inaccuracies introduced when writing down numbers as a
string of a fixed number of digits.
The internal representation of floating point numbers uses a fixed number of
binary digits to represent a decimal number. Some decimal numbers can’t be
represented exactly in binary, resulting in small roundoff errors.
In decimal math, there are many numbers that can’t be represented with a fixed
number of decimal digits, e.g. 1/3 = 0.3333333333.......
In base 2, 1/2 = 0.1, 1/4 = 0.01, 1/8 = 0.001, etc. 0.2 equals 2/10 equals
1/5, resulting in the binary fractional number 0.001100110011001...
Floating point numbers only have 32 or 64 bits of precision, so the digits are
cut off at some point, and the resulting number is 0.199999999999999996 in
decimal, not 0.2.
A floating point number’s repr() function prints as many digits as are
necessary to make eval(repr(f)) == f true for any float f. The str()
function prints fewer digits and this often results in the more sensible number
that was probably intended:
>>> 1.1 - 0.9
0.20000000000000007
>>> print(1.1 - 0.9)
0.2
One of the consequences of this is that it is error-prone to compare the result
of some computation to a float with ==. Tiny inaccuracies may mean that
== fails. Instead, you have to check that the difference between the two
numbers is less than a certain threshold:
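epsilon = 0.0000000000001   # tiny allowed error
expected_result = 0.4

if expected_result - epsilon <= computation() <= expected_result + epsilon:
    ...   # computation() stands for whatever expression is being tested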
One is performance: knowing that a string is immutable means we can allocate
space for it at creation time, and the storage requirements are fixed and
unchanging. This is also one of the reasons for the distinction between tuples
and lists.
Another advantage is that strings in Python are considered as “elemental” as
numbers. No amount of activity will change the value 8 to anything else, and in
Python, no amount of activity will change the string “eight” to anything else.
Why must ‘self’ be used explicitly in method definitions and calls?
The idea was borrowed from Modula-3. It turns out to be very useful, for a
variety of reasons.
First, it’s more obvious that you are using a method or instance attribute
instead of a local variable. Reading self.x or self.meth() makes it
absolutely clear that an instance variable or method is used even if you don’t
know the class definition by heart. In C++, you can sort of tell by the lack of
a local variable declaration (assuming globals are rare or easily recognizable)
– but in Python, there are no local variable declarations, so you’d have to
look up the class definition to be sure. Some C++ and Java coding standards
call for instance attributes to have an m_ prefix, so this explicitness is
still useful in those languages, too.
Second, it means that no special syntax is necessary if you want to explicitly
reference or call the method from a particular class. In C++, if you want to
use a method from a base class which is overridden in a derived class, you have
to use the :: operator – in Python you can write
baseclass.methodname(self, <argument list>). This is particularly useful
for __init__() methods, and in general in cases where a derived class
method wants to extend the base class method of the same name and thus has to
call the base class method somehow.
Finally, for instance variables it solves a syntactic problem with assignment:
since local variables in Python are (by definition!) those variables to which a
value is assigned in a function body (and that aren’t explicitly declared
global), there has to be some way to tell the interpreter that an assignment was
meant to assign to an instance variable instead of to a local variable, and it
should preferably be syntactic (for efficiency reasons). C++ does this through
declarations, but Python doesn’t have declarations and it would be a pity having
to introduce them just for this purpose. Using the explicit self.var solves
this nicely. Similarly, for using instance variables, having to write
self.var means that references to unqualified names inside a method don’t
have to search the instance’s directories. To put it another way, local
variables and instance variables live in two different namespaces, and you need
to tell Python which namespace to use.
Many people used to C or Perl complain that they want to use this C idiom:
while (line = readline(f)) {
    // do something with line
}
where in Python you’re forced to write this:
while True:
    line = f.readline()
    if not line:
        break
    ...  # do something with line
The reason for not allowing assignment in Python expressions is a common,
hard-to-find bug in those other languages, caused by this construct:
if (x = 0) {
    // error handling
}
else {
    // code that only works for nonzero x
}

The error is a simple typo: x = 0, which assigns 0 to the variable x,
was written while the comparison x == 0 is certainly what was intended.
Many alternatives have been proposed. Most are hacks that save some typing but
use arbitrary or cryptic syntax or keywords, and fail the simple criterion for
language change proposals: it should intuitively suggest the proper meaning to a
human reader who has not yet been introduced to the construct.
An interesting phenomenon is that most experienced Python programmers recognize
the while True idiom and don’t seem to be missing the assignment in
expression construct much; it’s only newcomers who express a strong desire to
add this to the language.
There’s an alternative way of spelling this that seems attractive but is
generally less robust than the “while True” solution:
line = f.readline()
while line:
    ...  # do something with line...
    line = f.readline()
The problem with this is that if you change your mind about exactly how you get
the next line (e.g. you want to change it into sys.stdin.readline()) you
have to remember to change two places in your program – the second occurrence
is hidden at the bottom of the loop.
The best approach is to use iterators, making it possible to loop through
objects using the for statement. For example, file objects support the iterator protocol, so you can write simply:
for line in f:
    ...  # do something with line...
Why does Python use methods for some functionality (e.g. list.index()) but functions for other (e.g. len(list))?
The major reason is history. Functions were used for those operations that were
generic for a group of types and which were intended to work even for objects
that didn’t have methods at all (e.g. tuples). It is also convenient to have a
function that can readily be applied to an amorphous collection of objects when
you use the functional features of Python (map(), apply() et al).
In fact, implementing len(), max(), min() as a built-in function is
actually less code than implementing them as methods for each type. One can
quibble about individual cases but it’s a part of Python, and it’s too late to
make such fundamental changes now. The functions have to remain to avoid massive
code breakage.
Note
For string operations, Python has moved from external functions (the
string module) to methods. However, len() is still a function.
Why is join() a string method instead of a list or tuple method?
Strings became much more like other standard types starting in Python 1.6, when
methods were added which give the same functionality that has always been
available using the functions of the string module. Most of these new methods
have been widely accepted, but the one which appears to make some programmers
feel uncomfortable is:
", ".join(['1', '2', '4', '8', '16'])
which gives the result:
"1, 2, 4, 8, 16"
There are two common arguments against this usage.
The first runs along the lines of: “It looks really ugly using a method of a
string literal (string constant)”, to which the answer is that it might, but a
string literal is just a fixed value. If the methods are to be allowed on names
bound to strings there is no logical reason to make them unavailable on
literals.
The second objection is typically cast as: “I am really telling a sequence to
join its members together with a string constant”. Sadly, you aren’t. For some
reason there seems to be much less difficulty with having split() as
a string method, since in that case it is easy to see that
"1, 2, 4, 8, 16".split(", ")
is an instruction to a string literal to return the substrings delimited by the
given separator (or, by default, arbitrary runs of white space).
join() is a string method because in using it you are telling the
separator string to iterate over a sequence of strings and insert itself between
adjacent elements. This method can be used with any argument which obeys the
rules for sequence objects, including any new classes you might define yourself.
Similar methods exist for bytes and bytearray objects.
A try/except block is extremely efficient. Actually catching an exception is
expensive. In versions of Python prior to 2.0 it was common to use this idiom:
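try:
    value = mydict[key]
except KeyError:
    mydict[key] = getvalue(key)
    value = mydict[key]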
For this specific case, you could also use
value = dict.setdefault(key, getvalue(key)), but only if the getvalue()
call is cheap enough because it is evaluated in all cases.
Why isn’t there a switch or case statement in Python?
You can do this easily enough with a sequence of if...elif...elif...else.
There have been some proposals for switch statement syntax, but there is no
consensus (yet) on whether and how to do range tests. See PEP 275 for
complete details and the current status.
For cases where you need to choose from a very large number of possibilities,
you can create a dictionary mapping case values to functions to call. For
example:
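A minimal sketch (the function names and the 'value' selector variable are
illustrative):

def function_1():
    ...   # handle case 'a'

def function_2():
    ...   # handle case 'b'

functions = {'a': function_1, 'b': function_2}

func = functions[value]   # 'value' holds the case selector
func()

For calling methods on objects, the getattr() built-in can look the handler
up by name:

class Visitor:
    def visit_a(self):
        ...   # handle case 'a'

    def dispatch(self, value):
        method_name = 'visit_' + str(value)   # e.g. 'visit_a'
        method = getattr(self, method_name)
        method()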
It’s suggested that you use a prefix for the method names, such as visit_ in
this example. Without such a prefix, if values are coming from an untrusted
source, an attacker would be able to call any method on your object.
Can’t you emulate threads in the interpreter instead of relying on an OS-specific thread implementation?
Answer 1: Unfortunately, the interpreter pushes at least one C stack frame for
each Python stack frame. Also, extensions can call back into Python at almost
random moments. Therefore, a complete threads implementation requires thread
support for C.
Answer 2: Fortunately, there is Stackless Python,
which has a completely redesigned interpreter loop that avoids the C stack.
It’s still experimental but looks very promising. Although it is binary
compatible with standard Python, it’s still unclear whether Stackless will make
it into the core – maybe it’s just too revolutionary.
Python lambda forms cannot contain statements because Python’s syntactic
framework can’t handle statements nested inside expressions. However, in
Python, this is not a serious problem. Unlike lambda forms in other languages,
where they add functionality, Python lambdas are only a shorthand notation if
you’re too lazy to define a function.
Functions are already first class objects in Python, and can be declared in a
local scope. Therefore the only advantage of using a lambda form instead of a
locally-defined function is that you don’t need to invent a name for the
function – but that’s just a local variable to which the function object (which
is exactly the same type of object that a lambda form yields) is assigned!
Can Python be compiled to machine code, C or some other language?
Not easily. Python’s high level data types, dynamic typing of objects and
run-time invocation of the interpreter (using eval() or exec())
together mean that a “compiled” Python program would probably consist mostly of
calls into the Python run-time system, even for seemingly simple operations like
x+1.
Several projects described in the Python newsgroup or at past Python
conferences have shown that this
approach is feasible, although the speedups reached so far are only modest
(e.g. 2x). Jython uses the same strategy for compiling to Java bytecode. (Jim
Hugunin has demonstrated that in combination with whole-program analysis,
speedups of 1000x are feasible for small demo programs. See the proceedings
from the 1997 Python conference for more information.)
Internally, Python source code is always translated into a bytecode
representation, and this bytecode is then executed by the Python virtual
machine. In order to avoid the overhead of repeatedly parsing and translating
modules that rarely change, this byte code is written into a file whose name
ends in ".pyc" whenever a module is parsed. When the corresponding .py file is
changed, it is parsed and translated again and the .pyc file is rewritten.
There is no performance difference once the .pyc file has been loaded, as the
bytecode read from the .pyc file is exactly the same as the bytecode created by
direct translation. The only difference is that loading code from a .pyc file
is faster than parsing and translating a .py file, so the presence of
precompiled .pyc files improves the start-up time of Python scripts. If
desired, the Lib/compileall.py module can be used to create valid .pyc files for
a given set of modules.
Note that the main script executed by Python, even if its filename ends in .py,
is not compiled to a .pyc file. It is compiled to bytecode, but the bytecode is
not saved to a file. Usually main scripts are quite short, so this doesn’t cost
much speed.
There are also several programs which make it easier to intermingle Python and C
code in various ways to increase performance. See, for example, Cython, Pyrex and Weave.
The details of Python memory management depend on the implementation. The
standard C implementation of Python uses reference counting to detect
inaccessible objects, and another mechanism to collect reference cycles,
periodically executing a cycle detection algorithm which looks for inaccessible
cycles and deletes the objects involved. The gc module provides functions
to perform a garbage collection, obtain debugging statistics, and tune the
collector’s parameters.
Jython relies on the Java runtime so the JVM’s garbage collector is used. This
difference can cause some subtle porting problems if your Python code depends on
the behavior of the reference counting implementation.
In the absence of circularities, Python programs do not need to manage memory
explicitly.
Why doesn’t Python use a more traditional garbage collection scheme? For one
thing, this is not a C standard feature and hence it’s not portable. (Yes, we
know about the Boehm GC library. It has bits of assembler code for most
common platforms, not for all of them, and although it is mostly transparent, it
isn’t completely transparent; patches are required to get Python to work with
it.)
Traditional GC also becomes a problem when Python is embedded into other
applications. While in a standalone Python it’s fine to replace the standard
malloc() and free() with versions provided by the GC library, an application
embedding Python may want to have its own substitute for malloc() and free(),
and may not want Python’s. Right now, Python works with anything that
implements malloc() and free() properly.
In Jython, the following code (which is fine in CPython) will probably run out
of file descriptors long before it runs out of memory:
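for file in very_long_list_of_files:
    f = open(file)
    c = f.read(1)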
Using the current reference counting and destructor scheme, each new assignment
to f closes the previous file. Using GC, this is not guaranteed. If you want
to write code that will work with any Python implementation, you should
explicitly close the file or use the with statement; this will work
regardless of GC:
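for file in very_long_list_of_files:
    with open(file) as f:
        c = f.read(1)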
Objects referenced from the global namespaces of Python modules are not always
deallocated when Python exits. This may happen if there are circular
references. There are also certain bits of memory that are allocated by the C
library that are impossible to free (e.g. a tool like Purify will complain about
these). Python is, however, aggressive about cleaning up memory on exit and
does try to destroy every single object.
If you want to force Python to delete certain things on deallocation use the
atexit module to run a function that will force those deletions.
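For example, a minimal sketch (the cache here is a hypothetical module-level object):
import atexit

cache = {'results': [1, 2, 3]}   # hypothetical module-level state

def cleanup():
    # force deletion of objects held in the global namespace
    cache.clear()

atexit.register(cleanup)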
Why are there separate tuple and list data types?¶
Lists and tuples, while similar in many respects, are generally used in
fundamentally different ways. Tuples can be thought of as being similar to
Pascal records or C structs; they’re small collections of related data which may
be of different types which are operated on as a group. For example, a
Cartesian coordinate is appropriately represented as a tuple of two or three
numbers.
Lists, on the other hand, are more like arrays in other languages. They tend to
hold a varying number of objects all of which have the same type and which are
operated on one-by-one. For example, os.listdir('.') returns a list of
strings representing the files in the current directory. Functions which
operate on this output would generally not break if you added another file or
two to the directory.
Tuples are immutable, meaning that once a tuple has been created, you can’t
replace any of its elements with a new value. Lists are mutable, meaning that
you can always change a list’s elements. Only immutable elements can be used as
dictionary keys, and hence only tuples and not lists can be used as keys.
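For example:
>>> d = {}
>>> d[(1, 2)] = 'tuple keys are fine'   # tuples are hashable
>>> d[[1, 2]] = 'lists are not'
Traceback (most recent call last):
  ...
TypeError: unhashable type: 'list'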
How are lists implemented?¶
Python’s lists are really variable-length arrays, not Lisp-style linked lists.
The implementation uses a contiguous array of references to other objects, and
keeps a pointer to this array and the array’s length in a list head structure.
This makes indexing a list a[i] an operation whose cost is independent of
the size of the list or the value of the index.
When items are appended or inserted, the array of references is resized. Some
cleverness is applied to improve the performance of appending items repeatedly;
when the array must be grown, some extra space is allocated so the next few
times don’t require an actual resize.
How are dictionaries implemented?¶
Python’s dictionaries are implemented as resizable hash tables. Compared to
B-trees, this gives better performance for lookup (the most common operation by
far) under most circumstances, and the implementation is simpler.
Dictionaries work by computing a hash code for each key stored in the dictionary
using the hash() built-in function. The hash code varies widely depending
on the key; for example, “Python” hashes to -539294296 while “python”, a string
that differs by a single bit, hashes to 1142331976. The hash code is then used
to calculate a location in an internal array where the value will be stored.
Assuming that you’re storing keys that all have different hash values, this
means that dictionaries take constant time – O(1), in computer science notation
– to retrieve a key. It also means that no sorted order of the keys is
maintained, and traversing the array as the .keys() and .items() methods do
will output the dictionary’s contents in some arbitrary jumbled order.
Why must dictionary keys be immutable?¶
The hash table implementation of dictionaries uses a hash value calculated from
the key value to find the key. If the key were a mutable object, its value
could change, and thus its hash could also change. But since whoever changes
the key object can’t tell that it was being used as a dictionary key, it can’t
move the entry around in the dictionary. Then, when you try to look up the same
object in the dictionary it won’t be found because its hash value is different.
If you tried to look up the old value it wouldn’t be found either, because the
value of the object found in that hash bin would be different.
If you want a dictionary indexed with a list, simply convert the list to a tuple
first; the function tuple(L) creates a tuple with the same entries as the
list L. Tuples are immutable and can therefore be used as dictionary keys.
Some unacceptable solutions that have been proposed:
Hash lists by their address (object ID). This doesn’t work because if you
construct a new list with the same value it won’t be found; e.g.:
mydict = {[1, 2]: '12'}
print(mydict[[1, 2]])
would raise a KeyError exception because the id of the [1,2] used in the
second line differs from that in the first line. In other words, dictionary
keys should be compared using ==, not using is.
Make a copy when using a list as a key. This doesn’t work because the list,
being a mutable object, could contain a reference to itself, and then the
copying code would run into an infinite loop.
Allow lists as keys but tell the user not to modify them. This would allow a
class of hard-to-track bugs in programs when you forgot or modified a list by
accident. It also invalidates an important invariant of dictionaries: every
value in d.keys() is usable as a key of the dictionary.
Mark lists as read-only once they are used as a dictionary key. The problem
is that it’s not just the top-level object that could change its value; you
could use a tuple containing a list as a key. Entering anything as a key into
a dictionary would require marking all objects reachable from there as
read-only – and again, self-referential objects could cause an infinite loop.
There is a trick to get around this if you need to, but use it at your own risk:
You can wrap a mutable structure inside a class instance which has both a
__eq__() and a __hash__() method. You must then make sure that the
hash value for all such wrapper objects that reside in a dictionary (or other
hash based structure), remain fixed while the object is in the dictionary (or
other structure).
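A sketch of such a wrapper (the ListWrapper referred to below):
class ListWrapper:
    def __init__(self, the_list):
        self.the_list = the_list

    def __eq__(self, other):
        return self.the_list == other.the_list

    def __hash__(self):
        l = self.the_list
        result = 98767 - len(l)*555
        for i, el in enumerate(l):
            try:
                result = result + (hash(el) % 9999999) * 1001 + i
            except Exception:
                result = (result % 7777777) + i * 333
        return result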
Note that the hash computation is complicated by the possibility that some
members of the list may be unhashable and also by the possibility of arithmetic
overflow.
Furthermore it must always be the case that if o1 == o2 (i.e. o1.__eq__(o2) is True) then hash(o1) == hash(o2) (i.e., o1.__hash__() == o2.__hash__()),
regardless of whether the object is in a dictionary or not. If you fail to meet
these restrictions dictionaries and other hash based structures will misbehave.
In the case of ListWrapper, whenever the wrapper object is in a dictionary the
wrapped list must not change to avoid anomalies. Don’t do this unless you are
prepared to think hard about the requirements and the consequences of not
meeting them correctly. Consider yourself warned.
Why doesn’t list.sort() return the sorted list?¶
In situations where performance matters, making a copy of the list just to sort
it would be wasteful. Therefore, list.sort() sorts the list in place. In
order to remind you of that fact, it does not return the sorted list. This way,
you won’t be fooled into accidentally overwriting a list when you need a sorted
copy but also need to keep the unsorted version around.
In Python 2.4 a new built-in function – sorted() – has been added.
This function creates a new list from a provided iterable, sorts it and returns
it. For example, here’s how to iterate over the keys of a dictionary in sorted
order:
for key in sorted(mydict):
    ...  # do whatever with mydict[key]
How do you specify and enforce an interface spec in Python?¶
An interface specification for a module as provided by languages such as C++ and
Java describes the prototypes for the methods and functions of the module. Many
feel that compile-time enforcement of interface specifications helps in the
construction of large programs.
Python 2.6 adds an abc module that lets you define Abstract Base Classes
(ABCs). You can then use isinstance() and issubclass() to check
whether an instance or a class implements a particular ABC. The
collections module defines a set of useful ABCs such as
Iterable, Container, and MutableMapping.
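For example, a minimal sketch (in later versions these ABCs live in collections.abc):
import collections

print(isinstance([1, 2, 3], collections.Iterable))     # True
print(issubclass(dict, collections.MutableMapping))    # True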
For Python, many of the advantages of interface specifications can be obtained
by an appropriate test discipline for components. There is also a tool,
PyChecker, which can be used to find problems due to subclassing.
A good test suite for a module can both provide a regression test and serve as a
module interface specification and a set of examples. Many Python modules can
be run as a script to provide a simple “self test.” Even modules which use
complex external interfaces can often be tested in isolation using trivial
“stub” emulations of the external interface. The doctest and
unittest modules or third-party test frameworks can be used to construct
exhaustive test suites that exercise every line of code in a module.
An appropriate testing discipline can help build large complex applications in
Python as well as having interface specifications would. In fact, it can be
better because an interface specification cannot test certain properties of a
program. For example, the append() method is expected to add new elements
to the end of some internal list; an interface specification cannot test that
your append() implementation will actually do this correctly, but it’s
trivial to check this property in a test suite.
Writing test suites is very helpful, and you might want to design your code with
an eye to making it easily tested. One increasingly popular technique,
test-driven development, calls for writing parts of the test suite first,
before you write any of the actual code. Of course Python allows you to be
sloppy and not write test cases at all.
Why are default values shared between objects?¶
This type of bug commonly bites neophyte programmers. Consider this function:
def foo(mydict={}):    # Danger: shared reference to one dict for all calls
    ... compute something ...
    mydict[key] = value
    return mydict
The first time you call this function, mydict contains a single item. The
second time, mydict contains two items because when foo() begins
executing, mydict starts out with an item already in it.
It is often expected that a function call creates new objects for default
values. This is not what happens. Default values are created exactly once, when
the function is defined. If that object is changed, like the dictionary in this
example, subsequent calls to the function will refer to this changed object.
By definition, immutable objects such as numbers, strings, tuples, and None
are safe from change. Changes to mutable objects such as dictionaries, lists,
and class instances can lead to confusion.
Because of this feature, it is good programming practice to not use mutable
objects as default values. Instead, use None as the default value and
inside the function, check if the parameter is None and create a new
list/dictionary/whatever if it is. For example, don’t write:
def foo(mydict={}):
    ...
but:
def foo(mydict=None):
    if mydict is None:
        mydict = {}    # create a new dict for local namespace
This feature can be useful. When you have a function that’s time-consuming to
compute, a common technique is to cache the parameters and the resulting value
of each call to the function, and return the cached value if the same value is
requested again. This is called “memoizing”, and can be implemented like this:
# Callers will never provide a third parameter for this function.
def expensive(arg1, arg2, _cache={}):
    if (arg1, arg2) in _cache:
        return _cache[(arg1, arg2)]

    # Calculate the value
    result = ... expensive computation ...
    _cache[(arg1, arg2)] = result    # Store result in the cache
    return result
You could use a global variable containing a dictionary instead of the default
value; it’s a matter of taste.
Why is there no goto?¶
You can use exceptions to provide a “structured goto” that even works across
function calls. Many feel that exceptions can conveniently emulate all
reasonable uses of the “go” or “goto” constructs of C, Fortran, and other
languages. For example:
class label(Exception): pass   # declare a label (must derive from Exception)

try:
    ...
    if condition: raise label()   # goto label
    ...
except label:                     # where to goto
    pass
...
This doesn’t allow you to jump into the middle of a loop, but that’s usually
considered an abuse of goto anyway. Use sparingly.
Why can’t raw strings (r-strings) end with a backslash?¶
More precisely, they can’t end with an odd number of backslashes: the unpaired
backslash at the end escapes the closing quote character, leaving an
unterminated string.
Raw strings were designed to ease creating input for processors (chiefly regular
expression engines) that want to do their own backslash escape processing. Such
processors consider an unmatched trailing backslash to be an error anyway, so
raw strings disallow that. In return, they allow you to pass on the string
quote character by escaping it with a backslash. These rules work well when
r-strings are used for their intended purpose.
If you’re trying to build Windows pathnames, note that all Windows system calls
accept forward slashes too:
f = open("/mydir/file.txt") # works fine!
If you’re trying to build a pathname for a DOS command, try e.g. one of
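dir = r"\this\is\my\dos\dir" "\\"
dir = r"\this\is\my\dos\dir\ "[:-1]
dir = "\\this\\is\\my\\dos\\dir\\"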
Why doesn’t Python have a “with” statement for attribute assignments?¶
Python has a ‘with’ statement that wraps the execution of a block, calling code
on the entrance and exit from the block. Some languages have a construct that
looks like this:
with obj:
    a = 1               # equivalent to obj.a = 1
    total = total + 1   # obj.total = obj.total + 1
In Python, such a construct would be ambiguous.
Other languages, such as Object Pascal, Delphi, and C++, use static types, so
it’s possible to know, in an unambiguous way, what member is being assigned
to. This is the main point of static typing – the compiler always knows the
scope of every variable at compile time.
Python uses dynamic types. It is impossible to know in advance which attribute
will be referenced at runtime. Member attributes may be added or removed from
objects on the fly. This makes it impossible to know, from a simple reading,
what attribute is being referenced: a local one, a global one, or a member
attribute?
For instance, take the following incomplete snippet:
def foo(a):
    with a:
        print(x)
The snippet assumes that “a” must have a member attribute called “x”. However,
there is nothing in Python that tells the interpreter this. What should happen
if “a” is, let us say, an integer? If there is a global variable named “x”,
will it be used inside the with block? As you see, the dynamic nature of Python
makes such choices much harder.
The primary benefit of “with” and similar language features (reduction of code
volume) can, however, easily be achieved in Python by assignment. Instead of:
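function(args).mydict[index][index].a = 21
function(args).mydict[index][index].b = 42
function(args).mydict[index][index].c = 63

write this:

ref = function(args).mydict[index][index]
ref.a = 21
ref.b = 42
ref.c = 63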
This also has the side-effect of increasing execution speed because name
bindings are resolved at run-time in Python, and the second version only needs
to perform the resolution once.
Why are colons required for the if/while/def/class statements?¶
The colon is required primarily to enhance readability (one of the results of
the experimental ABC language). Consider this:
if a == b
    print(a)
versus
if a == b:
    print(a)
Notice how the second one is slightly easier to read. Notice further how a
colon sets off the example in this FAQ answer; it’s a standard usage in English.
Another minor reason is that the colon makes it easier for editors with syntax
highlighting; they can look for colons to decide when indentation needs to be
increased instead of having to do a more elaborate parsing of the program text.
Why does Python allow commas at the end of lists and tuples?¶
Python lets you add a trailing comma at the end of lists, tuples, and
dictionaries:
[1, 2, 3,]
('a', 'b', 'c',)
d = {
    "A": [1, 5],
    "B": [6, 7],   # last trailing comma is optional but good style
}
There are several reasons to allow this.
When you have a literal value for a list, tuple, or dictionary spread across
multiple lines, it’s easier to add more elements because you don’t have to
remember to add a comma to the previous line. The lines can also be sorted in
your editor without creating a syntax error.
Accidentally omitting the comma can lead to errors that are hard to diagnose.
For example:
x=["fee","fie""foo","fum"]
This list looks like it has four elements, but it actually contains three:
“fee”, “fiefoo” and “fum”. Always adding the comma avoids this source of error.
Allowing the trailing comma may also make programmatic code generation easier.
How do I find a module or application to perform task X?¶
Check the Library Reference to see if there’s a relevant
standard library module. (Eventually you’ll learn what’s in the standard
library and will be able to skip this step.)
For third-party packages, search the Python Package Index or try Google or
another Web search engine. Searching for “Python” plus a keyword or two for
your topic of interest will usually find something helpful.
If you can’t find a source file for a module it may be a built-in or
dynamically loaded module implemented in C, C++ or other compiled language.
In this case you may not have the source file or it may be something like
mathmodule.c, somewhere in a C source directory (not on the Python Path).
There are (at least) three kinds of modules in Python:
modules written in Python (.py);
modules written in C and dynamically loaded (.dll, .pyd, .so, .sl, etc);
modules written in C and linked with the interpreter; to get a list of these,
type:
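import sys
print(sys.builtin_module_names)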
How do I make a Python script executable on Unix?¶
You need to do two things: the script file’s mode must be executable and the
first line must begin with #! followed by the path of the Python
interpreter.
The first is done by executing chmod +x scriptfile or perhaps chmod 755 scriptfile.
The second can be done in a number of ways. The most straightforward way is to
write
#!/usr/local/bin/python
as the very first line of your file, using the pathname for where the Python
interpreter is installed on your platform.
If you would like the script to be independent of where the Python interpreter
lives, you can use the “env” program. Almost all Unix variants support the
following, assuming the Python interpreter is in a directory on the user’s
$PATH:
#!/usr/bin/env python
Don’t do this for CGI scripts. The $PATH variable for CGI scripts is often
very minimal, so you need to use the actual absolute pathname of the
interpreter.
Occasionally, a user’s environment is so full that the /usr/bin/env program
fails; or there’s no env program at all. In that case, you can try the
following hack (due to Alex Rezinsky):
#! /bin/sh
""":"
exec python $0 ${1+"$@"}
"""
The minor disadvantage is that this defines the script’s __doc__ string.
However, you can fix that by adding
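__doc__ = """...Whatever..."""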
For Unix variants: The standard Python source distribution comes with a curses
module in the Modules/ subdirectory, though it’s not compiled by default
(note that this is not available in the Windows distribution – there is no
curses module for Windows).
The curses module supports basic curses features as well as many additional
functions from ncurses and SYSV curses such as colour, alternative character set
support, pads, and mouse support. This means the module isn’t compatible with
operating systems that only have BSD curses, but there don’t seem to be any
currently maintained OSes that fall into this category.
Python comes with two testing frameworks. The doctest module finds
examples in the docstrings for a module and runs them, comparing the output with
the expected output given in the docstring.
The unittest module is a fancier testing framework modelled on Java and
Smalltalk testing frameworks.
For testing, it helps to write the program so that it may be easily tested by
using good modular design. Your program should have almost all functionality
encapsulated in either functions or class methods – and this sometimes has the
surprising and delightful effect of making the program run faster (because local
variable accesses are faster than global accesses). Furthermore the program
should avoid depending on mutating global variables, since this makes testing
much more difficult to do.
The “global main logic” of your program may be as simple as
if__name__=="__main__":main_logic()
at the bottom of the main module of your program.
Once your program is organized as a tractable collection of functions and class
behaviours you should write test functions that exercise the behaviours. A test
suite can be associated with each module which automates a sequence of tests.
This sounds like a lot of work, but since Python is so terse and flexible it’s
surprisingly easy. You can make coding much more pleasant and fun by writing
your test functions in parallel with the “production code”, since this makes it
easy to find bugs and even design flaws earlier.
“Support modules” that are not intended to be the main module of a program may
include a self-test of the module.
if__name__=="__main__":self_test()
Even programs that interact with complex external interfaces may be tested when
the external interfaces are unavailable by using “fake” interfaces implemented
in Python.
The pydoc module can create HTML from the doc strings in your Python
source code. An alternative for creating API documentation purely from
docstrings is epydoc. Sphinx can also include docstring content.
Be sure to use the threading module and not the _thread module.
The threading module builds convenient abstractions on top of the
low-level primitives provided by the _thread module.
None of my threads seem to run: why?¶
As soon as the main thread exits, all threads are killed. Your main thread is
running too quickly, giving the threads no time to do any work.
A simple fix is to add a sleep to the end of the program that’s long enough for
all the threads to finish:
import threading, time

def thread_task(name, n):
    for i in range(n):
        print(name, i)

for i in range(10):
    T = threading.Thread(target=thread_task, args=(str(i), i))
    T.start()

time.sleep(10)  # <---------------------------!
But now (on many platforms) the threads don’t run in parallel, but appear to run
sequentially, one at a time! The reason is that the OS thread scheduler doesn’t
start a new thread until the previous thread is blocked.
A simple fix is to add a tiny sleep to the start of the run function:
def thread_task(name, n):
    time.sleep(0.001)  # <--------------------!
    for i in range(n):
        print(name, i)

for i in range(10):
    T = threading.Thread(target=thread_task, args=(str(i), i))
    T.start()

time.sleep(10)
Instead of trying to guess how long a time.sleep() delay will be enough,
it’s better to use some kind of semaphore mechanism. One idea is to use the
queue module to create a queue object, let each thread append a token to
the queue when it finishes, and let the main thread read as many tokens from the
queue as there are threads.
Or, if you want fine control over the dispatching algorithm, you can write
your own logic manually. Use the queue module to create a queue
containing a list of jobs. The Queue class maintains a
list of objects with .put(obj) to add an item to the queue and .get()
to return an item. The class will take care of the locking necessary to
ensure that each job is handed out exactly once.
Here’s a trivial example:
import threading, queue, time

# The worker thread gets jobs off the queue.  When the queue is empty, it
# assumes there will be no more work and exits.
# (Realistically workers will run until terminated.)
def worker():
    print('Running worker')
    time.sleep(0.1)
    while True:
        try:
            arg = q.get(block=False)
        except queue.Empty:
            print('Worker', threading.currentThread(), end=' ')
            print('queue empty')
            break
        else:
            print('Worker', threading.currentThread(), end=' ')
            print('running with argument', arg)
            time.sleep(0.5)

# Create queue
q = queue.Queue()

# Start a pool of 5 workers
for i in range(5):
    t = threading.Thread(target=worker, name='worker %i' % (i+1))
    t.start()

# Begin adding work to the queue
for i in range(50):
    q.put(i)

# Give threads time to run
print('Main thread sleeping')
time.sleep(5)
A global interpreter lock (GIL) is used internally to ensure that only one
thread runs in the Python VM at a time. In general, Python offers to switch
among threads only between bytecode instructions; how frequently it switches can
be set via sys.setswitchinterval(). Each bytecode instruction, and
therefore all the C implementation code reached from it, is
atomic from the point of view of a Python program.
In theory, this means an exact accounting requires an exact understanding of the
PVM bytecode implementation. In practice, it means that operations on shared
variables of built-in data types (ints, lists, dicts, etc) that “look atomic”
really are.
For example, the following operations are all atomic (L, L1, L2 are lists, D,
D1, D2 are dicts, x, y are objects, i, j are ints):
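L.append(x)
L1.extend(L2)
x = L[i]
x = L.pop()
L1[i:j] = L2
L.sort()
x = y
x.field = y
D[x] = y
D1.update(D2)
D.keys()

These aren’t:

i = i + 1
L.append(L[-1])
L[i] = L[j]
D[x] = D[x] + 1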
Operations that replace other objects may invoke those other objects’
__del__() method when their reference count reaches zero, and that can
affect things. This is especially true for the mass updates to dictionaries and
lists. When in doubt, use a mutex!
Can’t we get rid of the Global Interpreter Lock?¶
The global interpreter lock (GIL) is often seen as a hindrance to Python’s
deployment on high-end multiprocessor server machines, because a multi-threaded
Python program effectively only uses one CPU, due to the insistence that
(almost) all Python code can only run while the GIL is held.
Back in the days of Python 1.5, Greg Stein actually implemented a comprehensive
patch set (the “free threading” patches) that removed the GIL and replaced it
with fine-grained locking. Adam Olsen recently did a similar experiment
in his python-safethread
project. Unfortunately, both experiments exhibited a sharp drop in single-thread
performance (at least 30% slower), due to the amount of fine-grained locking
necessary to compensate for the removal of the GIL.
This doesn’t mean that you can’t make good use of Python on multi-CPU machines!
You just have to be creative with dividing the work up between multiple
processes rather than multiple threads. The
ProcessPoolExecutor class in the new
concurrent.futures module provides an easy way of doing so; the
multiprocessing module provides a lower-level API in case you want
more control over dispatching of tasks.
Judicious use of C extensions will also help; if you use a C extension to
perform a time-consuming task, the extension can release the GIL while the
thread of execution is in the C code and allow other threads to get some work
done. Some standard library modules such as zlib and hashlib
already do this.
It has been suggested that the GIL should be a per-interpreter-state lock rather
than truly global; interpreters then wouldn’t be able to share objects.
Unfortunately, this isn’t likely to happen either. It would be a tremendous
amount of work, because many object implementations currently have global state.
For example, small integers and short strings are cached; these caches would
have to be moved to the interpreter state. Other object types have their own
free list; these free lists would have to be moved to the interpreter state.
And so on.
And I doubt that it can even be done in finite time, because the same problem
exists for 3rd party extensions. It is likely that 3rd party extensions are
being written at a faster rate than you can convert them to store all their
global state in the interpreter state.
And finally, once you have multiple interpreters not sharing any state, what
have you gained over running each interpreter in a separate process?
How do I delete a file? (And other file questions...)¶
Use os.remove(filename) or os.unlink(filename); for documentation, see
the os module. The two functions are identical; unlink() is simply
the name of the Unix system call for this function.
To remove a directory, use os.rmdir(); use os.mkdir() to create one.
os.makedirs(path) will create any intermediate directories in path that
don’t exist. os.removedirs(path) will remove intermediate directories as
long as they’re empty; if you want to delete an entire directory tree and its
contents, use shutil.rmtree().
To rename a file, use os.rename(old_path, new_path).
To truncate a file, open it using f = open(filename, "rb+"), and use
f.truncate(offset); offset defaults to the current seek position. There’s
also os.ftruncate(fd, offset) for files opened with os.open(), where
fd is the file descriptor (a small integer).
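A small sketch tying these functions together (the path names are illustrative only):
import os, shutil

os.makedirs('reports/2011/q1')            # create intermediate directories
open('reports/draft.txt', 'w').close()    # make an empty file to work with
os.rename('reports/draft.txt', 'reports/final.txt')
os.remove('reports/final.txt')            # delete the file again
shutil.rmtree('reports')                  # remove the whole tree and its contents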
To read or write complex binary data formats, it’s best to use the struct
module. It allows you to take a string containing binary data (usually numbers)
and convert it to Python objects; and vice versa.
For example, the following code reads two 2-byte integers and one 4-byte integer
in big-endian format from a file:
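import struct

with open(filename, "rb") as f:
    s = f.read(8)
x, y, z = struct.unpack(">hhl", s)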
The ‘>’ in the format string forces big-endian data; the letter ‘h’ reads one
“short integer” (2 bytes), and ‘l’ reads one “long integer” (4 bytes) from the
string.
For data that is more regular (e.g. a homogeneous list of ints or floats),
you can also use the array module.
Note
To read and write binary data, it is mandatory to open the file in
binary mode (here, passing "rb" to open()). If you use
"r" instead (the default), the file will be open in text mode
and f.read() will return str objects rather than
bytes objects.
os.read() is a low-level function which takes a file descriptor, a small
integer representing the opened file. os.popen() creates a high-level
file object, the same type returned by the built-in open() function.
Thus, to read n bytes from a pipe p created with os.popen(), you need to
use p.read(n).
Python file objects are a high-level layer of
abstraction on low-level C file descriptors.
For most file objects you create in Python via the built-in open()
function, f.close() marks the Python file object as being closed from
Python’s point of view, and also arranges to close the underlying C file
descriptor. This also happens automatically in f's destructor, when
f becomes garbage.
But stdin, stdout and stderr are treated specially by Python, because of the
special status also given to them by C. Running sys.stdout.close() marks
the Python-level file object as being closed, but does not close the
associated C file descriptor.
To close the underlying C file descriptor for one of these three, you should
first be sure that’s what you really want to do (e.g., you may confuse
extension modules trying to do I/O). If it is, use os.close():
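import os, sys

os.close(sys.stdin.fileno())     # file descriptor 0
os.close(sys.stdout.fileno())    # file descriptor 1
os.close(sys.stderr.fileno())    # file descriptor 2

Or you can use the numeric constants 0, 1 and 2, respectively.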
I would like to retrieve web pages that are the result of POSTing a form. Is
there existing code that would let me do this easily?
Yes. Here’s a simple example that uses urllib.request:
#!/usr/local/bin/python

import urllib.request

### build the query string
qs = "First=Josephine&MI=Q&Last=Public"

### connect and send the server a path; data must be bytes in Python 3
req = urllib.request.urlopen('http://www.some-server.out-there'
                             '/cgi-bin/some-cgi-script', data=qs.encode('ascii'))
msg, hdrs = req.read(), req.info()
Note that in general for percent-encoded POST operations, query strings must be
quoted using urllib.parse.urlencode(). For example to send name=”Guy Steele,
Jr.”:
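>>> import urllib.parse
>>> urllib.parse.urlencode({'name': 'Guy Steele, Jr.'})
'name=Guy+Steele%2C+Jr.'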
HTMLgen is a class library of objects corresponding to all the HTML 3.2 markup
tags. It’s used when you are writing in Python and wish to synthesize HTML
pages for a web site or for CGI forms, etc.
DocumentTemplate and Zope Page Templates are two different systems that are
part of Zope.
Quixote’s PTL uses Python syntax to assemble strings of text.
How do I send mail from a Python script?¶
Use the standard library module smtplib. Here’s a very simple interactive mail
sender that uses it. This method will work on any host that supports an SMTP
listener.
import sys, smtplib

fromaddr = input("From: ")
toaddrs = input("To: ").split(',')
print("Enter message, end with ^D:")
msg = ''
while True:
    line = sys.stdin.readline()
    if not line:
        break
    msg += line

# The actual mail send
server = smtplib.SMTP('localhost')
server.sendmail(fromaddr, toaddrs, msg)
server.quit()
A Unix-only alternative uses sendmail. The location of the sendmail program
varies between systems; sometimes it is /usr/lib/sendmail, sometimes
/usr/sbin/sendmail. The sendmail manual page will help you out. Here’s
some sample code:
SENDMAIL = "/usr/sbin/sendmail" # sendmail location
import os
p = os.popen("%s -t -i" % SENDMAIL, "w")
p.write("To: receiver@example.com\n")
p.write("Subject: test\n")
p.write("\n") # blank line separating headers from body
p.write("Some text\n")
p.write("some more text\n")
sts = p.close()
if sts != 0:
print("Sendmail exit status", sts)
The select module is commonly used to help with asynchronous I/O on
sockets.
To prevent the TCP connect from blocking, you can set the socket to non-blocking
mode. Then when you do the connect(), you will either connect immediately
(unlikely) or get an exception that contains the error number as .errno.
errno.EINPROGRESS indicates that the connection is in progress, but hasn’t
finished yet. Different OSes will return different values, so you’re going to
have to check what’s returned on your system.
You can use the connect_ex() method to avoid creating an exception. It will
just return the errno value. To poll, you can call connect_ex() again later
– 0 or errno.EISCONN indicate that you’re connected – or you can pass this
socket to select to check if it’s writable.
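A sketch of the polling approach (the host and port are illustrative, and the exact errno values returned vary by OS):
import errno, select, socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setblocking(False)
err = s.connect_ex(('www.python.org', 80))  # returns an errno instead of raising
if err in (0, errno.EISCONN):
    print('connected immediately')
elif err == errno.EINPROGRESS:
    # the attempt is under way; select() reports the socket as
    # writable once the connection completes (or fails)
    select.select([], [s], [])
s.close()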
Note
The asyncore module presents a framework-like approach to the problem
of writing non-blocking networking code.
The third-party Twisted library is
a popular and feature-rich alternative.
Interfaces to disk-based hashes such as DBM and GDBM are also included with standard Python. There is also the
sqlite3 module, which provides a lightweight disk-based relational
database.
The pickle library module solves this in a very general way (though you
still can’t store things like open files, sockets or windows), and the
shelve library module uses pickle and (g)dbm to create persistent
mappings containing arbitrary Python objects.
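For example, a minimal shelve sketch (the filename is illustrative):
import shelve

db = shelve.open('appdata')     # creates one or more files on disk
db['session'] = {'user': 'guido', 'visits': [1, 2, 3]}   # any picklable object
db.close()

db = shelve.open('appdata')
print(db['session']['visits'])  # [1, 2, 3]
db.close()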
A more awkward way of doing things is to use pickle’s little sister, marshal.
The marshal module provides very fast ways to store noncircular basic
Python types to files and strings, and back again. Although marshal does not do
fancy things like store instances or handle shared references properly, it does
run extremely fast. For example loading a half megabyte of data may take less
than a third of a second. This often beats doing something more complex and
general such as using gdbm with pickle/shelve.
The bsddb module is now available as a standalone package pybsddb.
Databases opened for write access with the bsddb module (and often by the anydbm
module, since it will preferentially use bsddb) must explicitly be closed using
the .close() method of the database. The underlying library caches database
contents which need to be converted to on-disk form and written.
If you have initialized a new bsddb database but not written anything to it
before the program crashes, you will often wind up with a zero-length file and
encounter an exception the next time the file is opened.
Don’t panic! Your data is probably intact. The most frequent cause for the error
is that you tried to open an earlier Berkeley DB file with a later version of
the Berkeley DB library.
Many Linux systems now have all three versions of Berkeley DB available. If you
are migrating from version 1 to a newer version use db_dump185 to dump a plain
text version of the database. If you are migrating from version 2 to version 3
use db2_dump to create a plain text version of the database. In either case,
use db_load to create a new native database for the latest version installed on
your computer. If you have version 3 of Berkeley DB installed, you should be
able to use db2_load to create a native version 2 database.
You should move away from Berkeley DB version 1 files because the hash file code
contains known bugs that can corrupt your data.
Can I create my own functions in C?¶
Yes, you can create built-in modules containing functions, variables, exceptions
and even new types in C. This is explained in the document
Extending and Embedding the Python Interpreter.
Most intermediate or advanced Python books will also cover this topic.
Can I create my own functions in C++?¶
Yes, using the C compatibility features found in C++. Place extern "C" { ... }
around the Python include files and put extern "C" before each
function that is going to be called by the Python interpreter. Global or static
C++ objects with constructors are probably not a good idea.
There are a number of alternatives to writing your own C extensions, depending
on what you’re trying to do.
If you need more speed, Psyco generates x86
assembly code from Python bytecode. You can use Psyco to compile the most
time-critical functions in your code, and gain a significant improvement with
very little effort, as long as you’re running on a machine with an
x86-compatible processor.
Cython and its relative Pyrex are compilers
that accept a slightly modified form of Python and generate the corresponding
C code. Cython and Pyrex make it possible to write an extension without having
to learn Python’s C API.
If you need to interface to some C or C++ library for which no Python extension
currently exists, you can try wrapping the library’s data types and functions
with a tool such as SWIG. SIP, CXX, Boost, or Weave are also alternatives
for wrapping C++ libraries.
How can I execute arbitrary Python statements from C?¶
The highest-level function to do this is PyRun_SimpleString() which takes
a single string argument to be executed in the context of the module
__main__ and returns 0 for success and -1 when an exception occurred
(including SyntaxError). If you want more control, use
PyRun_String(); see the source for PyRun_SimpleString() in
Python/pythonrun.c.
How can I evaluate an arbitrary Python expression from C?¶
Call the function PyRun_String() from the previous question with the
start symbol Py_eval_input; it parses an expression, evaluates it and
returns its value.
How do I extract C values from a Python object?¶
That depends on the object’s type. If it’s a tuple, PyTuple_Size()
returns its length and PyTuple_GetItem() returns the item at a specified
index. Lists have similar functions, PyList_Size() and
PyList_GetItem().
For strings, PyString_Size() returns its length and
PyString_AsString() a pointer to its value. Note that Python strings may
contain null bytes so C’s strlen() should not be used.
To test the type of an object, first make sure it isn’t NULL, and then use
PyString_Check(), PyTuple_Check(), PyList_Check(), etc.
There is also a high-level API to Python objects which is provided by the
so-called ‘abstract’ interface – read Include/abstract.h for further
details. It allows interfacing with any kind of Python sequence using calls
like PySequence_Length(), PySequence_GetItem(), etc., as well as
many other useful protocols.
How do I use Py_BuildValue() to create a tuple of arbitrary length?¶
You can’t. Use t = PyTuple_New(n) instead, and fill it with objects using
PyTuple_SetItem(t, i, o) – note that this “eats” a reference count of
o, so you have to Py_INCREF() it. Lists have similar functions
PyList_New(n) and PyList_SetItem(l, i, o). Note that you must set all
the tuple items to some value before you pass the tuple to Python code –
PyTuple_New(n) initializes them to NULL, which isn’t a valid Python value.
The PyObject_CallMethod() function can be used to call an arbitrary
method of an object. The parameters are the object, the name of the method to
call, a format string like that used with Py_BuildValue(), and the
argument values:
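For example, to call a file object’s “seek” method with arguments 10, 0 (assuming the file object pointer is “f”):
res = PyObject_CallMethod(f, "seek", "(ii)", 10, 0);
if (res == NULL) {
        ... an exception occurred ...
}
else {
        Py_DECREF(res);
}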
Note that since PyObject_CallObject() always wants a tuple for the
argument list, to call a function without arguments, pass “()” for the format,
and to call a function with one argument, surround the argument in parentheses,
e.g. “(i)”.
In Python code, define an object that supports the write() method. Assign
this object to sys.stdout and sys.stderr. Call print_error, or
just allow the standard traceback mechanism to work. Then, the output will go
wherever your write() method sends it.
The easiest way to do this is to use the io.StringIO class in the standard library.
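A minimal sketch of the redirection just described:
import io, sys

sys.stdout = io.StringIO()      # anything with a write() method will do
print("diagnostic output")
captured = sys.stdout.getvalue()
sys.stdout = sys.__stdout__     # restore the real stdout
print("captured:", captured, end='')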
You can get a pointer to the module object as follows:
module = PyImport_ImportModule("<modulename>");
If the module hasn’t been imported yet (i.e. it is not yet present in
sys.modules), this initializes the module; otherwise it simply returns
the value of sys.modules["<modulename>"]. Note that it doesn’t enter the
module into any namespace – it only ensures it has been initialized and is
stored in sys.modules.
You can then access the module’s attributes (i.e. any name defined in the
module) as follows:
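attr = PyObject_GetAttrString(module, "<attrname>");

Calling PyObject_SetAttrString() to assign to variables in the module also works.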
Depending on your requirements, there are many approaches. To do this manually,
begin by reading the “Extending and Embedding” document. Realize that for the Python run-time system, there isn’t a
whole lot of difference between C and C++ – so the strategy of building a new
Python type around a C structure (pointer) type will also work for C++ objects.
Setup must end in a newline, if there is no newline there, the build process
fails. (Fixing this requires some ugly shell script hackery, and this bug is so
minor that it doesn’t seem worth the effort.)
When using GDB with dynamically loaded extensions, you can’t set a breakpoint in
your extension until your extension is loaded.
In your .gdbinit file (or interactively), add the command:
br _PyImport_LoadDynamicModule
Then, when you run GDB:
$ gdb /local/bin/python
(gdb) run myscript.py
(gdb) continue   # repeat until your extension is loaded
(gdb) finish     # so that your extension is loaded
(gdb) br myfunction.c:50
(gdb) continue
Most packaged versions of Python don’t include the
/usr/lib/python2.x/config/ directory, which contains various files
required for compiling Python extensions.
For Red Hat, install the python-devel RPM to get the necessary files.
Sometimes you want to emulate the Python interactive interpreter’s behavior,
where it gives you a continuation prompt when the input is incomplete (e.g. you
typed the start of an “if” statement or you didn’t close your parentheses or
triple string quotes), but it gives you a syntax error message immediately when
the input is invalid.
In Python you can use the codeop module, which approximates the parser’s
behavior sufficiently. IDLE uses this, for example.
The easiest way to do it in C is to call PyRun_InteractiveLoop() (perhaps
in a separate thread) and let the Python interpreter handle the input for
you. You can also set the PyOS_ReadlineFunctionPointer() to point at your
custom input function. See Modules/readline.c and Parser/myreadline.c
for more hints.
However sometimes you have to run the embedded Python interpreter in the same
thread as the rest of your application, and you can’t allow the
PyRun_InteractiveLoop() to stop while waiting for user input. One
solution is to call PyParser_ParseString() and test whether e.error
equals E_EOF, which means the input is incomplete. Here’s a sample code
fragment, untested, inspired by code from Alex Farber:
#include <Python.h>#include <node.h>#include <errcode.h>#include <grammar.h>#include <parsetok.h>#include <compile.h>inttestcomplete(char*code)/* code should end in \n *//* return -1 for error, 0 for incomplete, 1 for complete */{node*n;perrdetaile;n=PyParser_ParseString(code,&_PyParser_Grammar,Py_file_input,&e);if(n==NULL){if(e.error==E_EOF)return0;return-1;}PyNode_Free(n);return1;}
Another solution is trying to compile the received string with
Py_CompileString(). If it compiles without errors, try to execute the
returned code object by calling PyEval_EvalCode(). Otherwise save the
input for later. If the compilation fails, find out if it’s an error or just
more input is required - by extracting the message string from the exception
tuple and comparing it to the string “unexpected EOF while parsing”. Here is a
complete example using the GNU readline library (you may want to ignore
SIGINT while calling readline()):
#include <stdio.h>#include <readline.h>#include <Python.h>#include <object.h>#include <compile.h>#include <eval.h>intmain(intargc,char*argv[]){inti,j,done=0;/* lengths of line, code */charps1[]=">>> ";charps2[]="... ";char*prompt=ps1;char*msg,*line,*code=NULL;PyObject*src,*glb,*loc;PyObject*exc,*val,*trb,*obj,*dum;Py_Initialize();loc=PyDict_New();glb=PyDict_New();PyDict_SetItemString(glb,"__builtins__",PyEval_GetBuiltins());while(!done){line=readline(prompt);if(NULL==line)/* CTRL-D pressed */{done=1;}else{i=strlen(line);if(i>0)add_history(line);/* save non-empty lines */if(NULL==code)/* nothing in code yet */j=0;elsej=strlen(code);code=realloc(code,i+j+2);if(NULL==code)/* out of memory */exit(1);if(0==j)/* code was empty, so */code[0]='\0';/* keep strncat happy */strncat(code,line,i);/* append line to code */code[i+j]='\n';/* append '\n' to code */code[i+j+1]='\0';src=Py_CompileString(code,"<stdin>",Py_single_input);if(NULL!=src)/* compiled just fine - */{if(ps1==prompt||/* ">>> " or */'\n'==code[i+j-1])/* "... " and double '\n' */{/* so execute it */dum=PyEval_EvalCode(src,glb,loc);Py_XDECREF(dum);Py_XDECREF(src);free(code);code=NULL;if(PyErr_Occurred())PyErr_Print();prompt=ps1;}}/* syntax error or E_EOF? */elseif(PyErr_ExceptionMatches(PyExc_SyntaxError)){PyErr_Fetch(&exc,&val,&trb);/* clears exception! */if(PyArg_ParseTuple(val,"sO",&msg,&obj)&&!strcmp(msg,"unexpected EOF while parsing"))/* E_EOF */{Py_XDECREF(exc);Py_XDECREF(val);Py_XDECREF(trb);prompt=ps2;}else/* some other syntax error */{PyErr_Restore(exc,val,trb);PyErr_Print();free(code);code=NULL;prompt=ps1;}}else/* some non-syntax error */{PyErr_Print();free(code);code=NULL;prompt=ps1;}free(line);}}Py_XDECREF(glb);Py_XDECREF(loc);Py_Finalize();exit(0);}
To dynamically load g++ extension modules, you must recompile Python, relink it
using g++ (change LINKCC in the Python Modules Makefile), and link your
extension module using g++ (e.g., g++ -shared -o mymodule.so mymodule.o).
In Python 2.2, you can inherit from built-in classes such as int,
list, dict, etc.
The Boost Python Library (BPL, http://www.boost.org/libs/python/doc/index.html)
provides a way of doing this from C++ (i.e. you can inherit from an extension
class written in C++ using the BPL).
When importing module X, why do I get “undefined symbol: PyUnicodeUCS2*”?¶
You are using a version of Python that uses a 4-byte representation for Unicode
characters, but some C extension module you are importing was compiled using a
Python that uses a 2-byte representation for Unicode characters (the default).
If instead the name of the undefined symbol starts with PyUnicodeUCS4, the
problem is the reverse: Python was built using 2-byte Unicode characters, and
the extension module was compiled using a Python with 4-byte Unicode characters.
This can easily occur when using pre-built extension packages. RedHat Linux
7.x, in particular, provided a “python2” binary that is compiled with 4-byte
Unicode. This only causes the link failure if the extension uses any of the
PyUnicode_*() functions. It is also a problem if an extension uses any of
the Unicode-related format specifiers for Py_BuildValue() (or similar) or
parameter specifications for PyArg_ParseTuple().
You can check the size of the Unicode character a Python interpreter is using by
checking the value of sys.maxunicode:
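For example, on a 2-byte (UCS-2) build:

>>> import sys
>>> sys.maxunicode
65535

A 4-byte (UCS-4) build reports 1114111 instead.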
How do I run a Python program under Windows?¶
This is not necessarily a straightforward question. If you are already familiar
with running programs from the Windows command line then everything will seem
obvious; otherwise, you might need a little more guidance. There are also
differences between Windows 95, 98, NT, ME, 2000 and XP which can add to the
confusion.
Unless you use some sort of integrated development environment, you will end up
typing Windows commands into what is variously referred to as a “DOS window”
or “Command prompt window”. Usually you can create such a window from your
Start menu; under Windows 2000 the menu selection is Start ‣
Programs ‣ Accessories ‣ Command Prompt. You should be able to recognize
when you have started such a window because you will see a Windows “command
prompt”, which usually looks like this:
C:\>
The letter may be different, and there might be other things after it, so you
might just as easily see something like:
D:\Steve\Projects\Python>
depending on how your computer has been set up and what else you have recently
done with it. Once you have started such a window, you are well on the way to
running Python programs.
You need to realize that your Python scripts have to be processed by another
program called the Python interpreter. The interpreter reads your script,
compiles it into bytecodes, and then executes the bytecodes to run your
program. So, how do you arrange for the interpreter to handle your Python?
First, you need to make sure that your command window recognises the word
“python” as an instruction to start the interpreter. If you have opened a
command window, you should try entering the command python and hitting
return. You should then see something like:
Python 2.2 (#28, Dec 21 2001, 12:21:22) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
You have started the interpreter in “interactive mode”. That means you can enter
Python statements or expressions interactively and have them executed or
evaluated while you wait. This is one of Python’s strongest features. Check it
by entering a few expressions of your choice and seeing the results:
>>>print("Hello")Hello>>>"Hello"*3HelloHelloHello
Many people use the interactive mode as a convenient yet highly programmable
calculator. When you want to end your interactive Python session, hold the Ctrl
key down while you enter a Z, then hit the “Enter” key to get back to your
Windows command prompt.
You may also find that you have a Start-menu entry such as Start
‣ Programs ‣ Python 2.2 ‣ Python (command line) that results in you
seeing the >>> prompt in a new window. If so, the window will disappear
after you enter the Ctrl-Z character; Windows is running a single “python”
command in the window, and closes it when you terminate the interpreter.
If the python command, instead of displaying the interpreter prompt >>>,
gives you a message like:
'python' is not recognized as an internal or external command,
operable program or batch file.
or:
Bad command or filename
then you need to make sure that your computer knows where to find the Python
interpreter. To do this you will have to modify a setting called PATH, which is
a list of directories where Windows will look for programs.
You should arrange for Python’s installation directory to be added to the PATH
of every command window as it starts. If you installed Python fairly recently
then the command
dir C:\py*
will probably tell you where it is installed; the usual location is something
like C:\Python23. Otherwise you will be reduced to a search of your whole
disk ... use Tools ‣ Find or hit the Search
button and look for “python.exe”. Supposing you discover that Python is
installed in the C:\Python23 directory (the default at the time of writing),
you should make sure that entering the command
c:\Python23\python
starts up the interpreter as above (and don’t forget you’ll need a “CTRL-Z” and
an “Enter” to get out of it). Once you have verified the directory, you need to
add it to the start-up routines your computer goes through. For older versions
of Windows the easiest way to do this is to edit the C:\AUTOEXEC.BAT
file. You would want to add a line like the following to AUTOEXEC.BAT:
PATH C:\Python23;%PATH%
For Windows NT, 2000 and (I assume) XP, you will need to add a string such as
;C:\Python23
to the current setting for the PATH environment variable, which you will find in
the properties window of “My Computer” under the “Advanced” tab. Note that if
you have sufficient privilege you might get a choice of installing the settings
either for the Current User or for System. The latter is preferred if you want
everybody to be able to run Python on the machine.
If you aren’t confident doing any of these manipulations yourself, ask for help!
At this stage you may want to reboot your system to make absolutely sure the new
setting has taken effect. You probably won’t need to reboot for Windows NT, XP
or 2000. You can also avoid it in earlier versions by editing the file
C:\WINDOWS\COMMAND\CMDINIT.BAT instead of AUTOEXEC.BAT.
You should now be able to start a new command window, enter python at the
C:\> (or whatever) prompt, and see the >>> prompt that indicates the
Python interpreter is reading interactive commands.
Let’s suppose you have a program called pytest.py in directory
C:\Steve\Projects\Python. A session to run that program might look like
this:
C:\> cd \Steve\Projects\Python
C:\Steve\Projects\Python> python pytest.py
Because you added a file name to the command to start the interpreter, when it
starts up it reads the Python script in the named file, compiles it, executes
it, and terminates, so you see another C:\> prompt. You might also have
entered
C:\> python \Steve\Projects\Python\pytest.py
if you hadn’t wanted to change your current directory.
Under NT, 2000 and XP you may well find that the installation process has also
arranged that the command pytest.py (or, if the file isn’t in the current
directory, C:\Steve\Projects\Python\pytest.py) will automatically recognize
the ”.py” extension and run the Python interpreter on the named file. Using this
feature is fine, but some versions of Windows have bugs which mean that this
form isn’t exactly equivalent to using the interpreter explicitly, so be
careful.
The important things to remember are:
Start Python from the Start Menu, or make sure the PATH is set correctly so
Windows can find the Python interpreter.
python
should give you a ‘>>>’ prompt from the Python interpreter. Don’t forget the
CTRL-Z and ENTER to terminate the interpreter (and, if you started the window
from the Start Menu, make the window disappear).
Once this works, you run programs with commands:
python {program-file}
When you know the commands to use you can build Windows shortcuts to run the
Python interpreter on any of your scripts, naming particular working
directories, and adding them to your menus. Take a look at
python --help
if your needs are complex.
Interactive mode (where you see the >>> prompt) is best used for checking
that individual statements and expressions do what you think they will, and
for developing code by experiment.
How do I make Python scripts executable?¶
On Windows 2000, the standard Python installer already associates the .py
extension with a file type (Python.File) and gives that file type an open
command that runs the interpreter (D:\Program Files\Python\python.exe "%1" %*).
This is enough to make scripts executable from the command prompt as
‘foo.py’. If you’d rather be able to execute the script by simply typing ‘foo’
with no extension you need to add .py to the PATHEXT environment variable.
On Windows NT, the steps taken by the installer as described above allow you to
run a script with ‘foo.py’, but a longtime bug in the NT command processor
prevents you from redirecting the input or output of any script executed in this
way. This is often important.
The incantation for making a Python script executable under WinNT is to give the
file an extension of .cmd and add the following as the first line:
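@setlocal enableextensions & python -x %~f0 %* & goto :EOF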
Why does Python sometimes take so long to start?¶
Usually Python starts very quickly on Windows, but occasionally there are bug
reports that Python suddenly begins to take a long time to start up. This is
made even more puzzling because Python will work fine on other Windows systems
which appear to be configured identically.
The problem may be caused by a misconfiguration of virus checking software on
the problem machine. Some virus scanners have been known to introduce startup
overhead of two orders of magnitude when the scanner is configured to monitor
all reads from the filesystem. Try checking the configuration of virus scanning
software on your systems to ensure that they are indeed configured identically.
McAfee, when configured to scan all file system read activity, is a particular
offender.
How do I make an executable from a Python script?¶
“Freeze” is a program that allows you to ship a Python program as a single
stand-alone executable file. It is not a compiler; your programs don’t run
any faster, but they are more easily distributable, at least to platforms with
the same OS and CPU. Read the README file of the freeze program for more
disclaimers.
You can use freeze on Windows, but you must download the source tree (see
http://www.python.org/download/source). The freeze program is in the
Tools\freeze subdirectory of the source tree.
You need the Microsoft VC++ compiler, and you probably need to build Python.
The required project files are in the PCbuild directory.
Yes, .pyd files are dll’s, but there are a few differences. If you have a DLL
named foo.pyd, then it must have a function initfoo(). You can then
write Python “import foo”, and Python will search for foo.pyd (as well as
foo.py, foo.pyc) and if it finds it, will attempt to call initfoo() to
initialize it. You do not link your .exe with foo.lib, as that would cause
Windows to require the DLL to be present.
Note that the search path for foo.pyd is PYTHONPATH, not the same as the path
that Windows uses to search for foo.dll. Also, foo.pyd need not be present to
run your program, whereas if you linked your program with a dll, the dll is
required. Of course, foo.pyd is required if you want to say import foo. In
a DLL, linkage is declared in the source code with __declspec(dllexport).
In a .pyd, linkage is defined in a list of available functions.
Embedding the Python interpreter in a Windows app can be summarized as follows:
Do _not_ build Python into your .exe file directly. On Windows, Python must
be a DLL to handle importing modules that are themselves DLL’s. (This is the
first key undocumented fact.) Instead, link to pythonNN.dll; it is
typically installed in C:\Windows\System. NN is the Python version, a
number such as “23” for Python 2.3.
You can link to Python in two different ways. Load-time linking means
linking against pythonNN.lib, while run-time linking means linking
against pythonNN.dll. (General note: pythonNN.lib is the
so-called “import lib” corresponding to pythonNN.dll. It merely
defines symbols for the linker.)
Run-time linking greatly simplifies link options; everything happens at run
time. Your code must load pythonNN.dll using the Windows
LoadLibraryEx() routine. The code must also use access routines and data
in pythonNN.dll (that is, Python’s C API’s) using pointers obtained
by the Windows GetProcAddress() routine. Macros can make using these
pointers transparent to any C code that calls routines in Python’s C API.
Borland note: convert pythonNN.lib to OMF format using Coff2Omf.exe
first.
If you use SWIG, it is easy to create a Python “extension module” that will
make the app’s data and methods available to Python. SWIG will handle just
about all the grungy details for you. The result is C code that you link
into your .exe file (!) You do _not_ have to create a DLL file, and this
also simplifies linking.
SWIG will create an init function (a C function) whose name depends on the
name of the extension module. For example, if the name of the module is leo,
the init function will be called initleo(). If you use SWIG shadow classes,
as you should, the init function will be called initleoc(). This initializes
a mostly hidden helper class used by the shadow class.
The reason you can link the C code in step 2 into your .exe file is that
calling the initialization function is equivalent to importing the module
into Python! (This is the second key undocumented fact.)
In short, you can use the following code to initialize the Python interpreter
with your extension module.
#include "python.h"...Py_Initialize();// Initialize Python.initmyAppc();// Initialize (import) the helper class.PyRun_SimpleString("import myApp");// Import the shadow class.
There are two problems with Python’s C API which will become apparent if you
use a compiler other than MSVC, the compiler used to build pythonNN.dll.
Problem 1: The so-called “Very High Level” functions that take FILE *
arguments will not work in a multi-compiler environment because each
compiler’s notion of a struct FILE will be different. From an implementation
standpoint these are very _low_ level functions.
Problem 2: SWIG generates the following code when generating wrappers to void
functions:
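/* as reproduced in older versions of this FAQ */
Py_INCREF(Py_None);
_resultobj = Py_None;
return _resultobj;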
Alas, Py_None is a macro that expands to a reference to a complex data
structure called _Py_NoneStruct inside pythonNN.dll. Again, this code will
fail in a multi-compiler environment. Replace such code by:
return Py_BuildValue("");
It may be possible to use SWIG’s %typemap command to make the change
automatically, though I have not been able to get this to work (I’m a
complete SWIG newbie).
Using a Python shell script to put up a Python interpreter window from inside
your Windows app is not a good idea; the resulting window will be independent
of your app’s windowing system. Rather, you (or the wxPythonWindow class)
should create a “native” interpreter window. It is easy to connect that
window to the Python interpreter. You can redirect Python’s i/o to _any_
object that supports read and write, so all you need is a Python object
(defined in your extension module) that contains read() and write() methods.
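A minimal Python-side sketch of such an object (the class name WindowIO is
illustrative, not part of any real extension module):

import sys

class WindowIO:
    # A file-like object; a real app would route write() text to its
    # native interpreter window and supply keystrokes from read().
    def __init__(self):
        self.chunks = []
    def write(self, text):
        self.chunks.append(text)
    def read(self, size=-1):
        return ''

io = WindowIO()
sys.stdout = sys.stderr = io   # Python's output now goes to our object
print('hello')                 # lands in io.chunks, not the console
sys.stdout = sys.__stdout__    # restore the real stdout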
and enter the following line (making any specific changes that your system may
need):
.py :REG_SZ: c:\<path to python>\python.exe -u %s %s
This line will allow you to call your script with a simple reference like:
http://yourserver/scripts/yourscript.py provided “scripts” is an
“executable” directory for your server (which it usually is by default). The
-u flag specifies unbuffered and binary mode for stdin - needed when
working with binary data.
In addition, “.py” may not be a good choice for the file extension in this
context (you might want to reserve *.py for support modules and use *.cgi
or *.cgp for “main program” scripts).
In order to set up Internet Information Services 5 to use Python for CGI
processing, please see the following links:
The FAQ does not recommend using tabs, and the Python style guide, PEP 8,
recommends 4 spaces for distributed Python code; this is also the Emacs
python-mode default.
Under any editor, mixing tabs and spaces is a bad idea. MSVC is no different in
this respect, and is easily configured to use spaces: Take Tools
‣ Options ‣ Tabs, and for file type “Default” set “Tab size” and “Indent
size” to 4, and select the “Insert spaces” radio button.
If you suspect mixed tabs and spaces are causing problems in leading whitespace,
run Python with the -t switch or run Tools/Scripts/tabnanny.py to
check a directory tree in batch mode.
Use the msvcrt module. This is a standard Windows-specific extension module.
It defines a function kbhit() which checks whether a keyboard hit is
present, and getch() which gets one character without echoing it.
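For example, a simple polling loop (Windows-only; on Python 3, getch()
returns a bytes object):

import msvcrt

while True:
    if msvcrt.kbhit():        # a keypress is waiting
        ch = msvcrt.getch()   # read one character without echoing it
        print('you pressed', ch)
        break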
Prior to Python 2.7 and 3.2, to terminate a process, you can use ctypes:
import ctypes

def kill(pid):
    """kill function for Win32"""
    kernel32 = ctypes.windll.kernel32
    handle = kernel32.OpenProcess(1, 0, pid)
    return (0 != kernel32.TerminateProcess(handle, 0))
In 2.7 and 3.2, os.kill() is implemented similarly to the above function,
with the additional feature of being able to send CTRL+C and CTRL+BREAK
to console subprocesses which are designed to handle those signals. See
os.kill() for further details.
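A minimal sketch of the 2.7/3.2 approach (Windows-only; the subprocess must be
started in its own process group to receive CTRL+BREAK):

import os
import signal
import subprocess
import sys

proc = subprocess.Popen(
    [sys.executable, '-c', 'import time; time.sleep(30)'],
    creationflags=subprocess.CREATE_NEW_PROCESS_GROUP)
os.kill(proc.pid, signal.CTRL_BREAK_EVENT)   # interrupt the child
proc.wait()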
Be sure you have the latest python.exe, that you are using python.exe rather
than a GUI version of Python and that you have configured the server to execute
"...\python.exe -u ..."
for the CGI execution. The -u (unbuffered) option on NT and Win95
prevents the interpreter from altering newlines in the standard input and
output. Without it post/multipart requests will seem to have the wrong length
and binary (e.g. GIF) responses may get garbled (resulting in broken images, PDF
files, and other binary downloads failing).
The reason that os.popen() doesn’t work from within PythonWin is due to a bug in
Microsoft’s C Runtime Library (CRT). The CRT assumes you have a Win32 console
attached to the process.
You should use the win32pipe module’s popen() instead which doesn’t depend on
having an attached Win32 console.
Example:
import win32pipe
f = win32pipe.popen('dir /c c:\\')
print(f.readlines())
f.close()
There is a bug in Win9x that prevents os.popen/win32pipe.popen* from
working. The good news is there is a way to work around this problem. The
Microsoft Knowledge Base article that you need to lookup is: Q150956. You will
find links to the knowledge base at: http://support.microsoft.com/.
This is very sensitive to the compiler vendor, version and (perhaps) even
options. If the FILE* structure in your embedding program isn’t the same as is
assumed by the Python interpreter it won’t work.
The Python 1.5.* DLLs (python15.dll) are all compiled with MS VC++ 5.0 and
with multithreading-DLL options (/MD).
If you can’t change compilers or flags, try using PyRun_SimpleString().
A trick to get it to run an arbitrary file is to construct a call to
exec() and open() with the name of your file as argument.
Also note that you can not mix-and-match Debug and Release versions. If you
wish to use the Debug Multithreaded DLL, then your module must have _d
appended to the base name.
It could be that you haven’t installed Tcl/Tk, but if you did install Tcl/Tk,
and the Wish application works correctly, the problem may be that its installer
didn’t manage to edit the autoexec.bat file correctly. It tries to add a
statement that changes the PATH environment variable to include the Tcl/Tk ‘bin’
subdirectory, but sometimes this edit doesn’t quite work. Opening it with
notepad usually reveals what the problem is.
(One additional hint, noted by David Szafranski: you can’t use long filenames
here; e.g. use C:\PROGRA~1\Tcl\bin instead of C:\Program Files\Tcl\bin.)
Sometimes, when you download the documentation package to a Windows machine
using a web browser, the file extension of the saved file ends up being .EXE.
This is a mistake; the extension should be .TGZ.
Simply rename the downloaded file to have the .TGZ extension, and WinZip will be
able to handle it. (If your copy of WinZip doesn’t, get a newer one from
http://www.winzip.com.)
Sometimes, when using Tkinter on Windows, you get an error that cw3215mt.dll or
cw3215.dll is missing.
Cause: you have an old Tcl/Tk DLL built with cygwin in your path (probably
C:\Windows). You must use the Tcl/Tk DLLs from the standard Tcl/Tk
installation (Python 1.5.2 comes with one).
This is a Microsoft DLL, and a notorious source of problems. The message
means what it says: you have the wrong version of this DLL for your operating
system. The Python installation did not cause this – something else you
installed previous to this overwrote the DLL that came with your OS (probably
older shareware of some sort, but there’s no way to tell now). If you search
for “CTL3D32” using any search engine (AltaVista, for example), you’ll find
hundreds and hundreds of web pages complaining about the same problem with
all sorts of installation programs. They’ll point you to ways to get the
correct version reinstalled on your system (since Python doesn’t cause this,
we can’t fix it).
Depending on what platform(s) you are aiming at, there are several. Some
of them haven’t been ported to Python 3 yet. At least Tkinter and Qt
are known to be Python 3-compatible.
Standard builds of Python include an object-oriented interface to the Tcl/Tk
widget set, called tkinter. This is probably the easiest to
install (since it comes included with most
binary distributions of Python) and use.
For more info about Tk, including pointers to the source, see the
Tcl/Tk home page. Tcl/Tk is fully portable to the
MacOS, Windows, and Unix platforms.
wxWidgets (http://www.wxwidgets.org) is a free, portable GUI class
library written in C++ that provides a native look and feel on a
number of platforms, with Windows, MacOS X, GTK, X11, all listed as
current stable targets. Language bindings are available for a number
of languages including Python, Perl, Ruby, etc.
wxPython (http://www.wxpython.org) is the Python binding for
wxWidgets. While it often lags slightly behind the official wxWidgets
releases, it also offers a number of features via pure Python
extensions that are not available in other language bindings. There
is an active wxPython user and developer community.
Both wxWidgets and wxPython are free, open source, software with
permissive licences that allow their use in commercial products as
well as in freeware or shareware.
There are bindings available for the Qt toolkit (using either PyQt or PySide) and for KDE (PyKDE).
PyQt is currently more mature than PySide, but you must buy a PyQt license from
Riverbank Computing
if you want to write proprietary applications. PySide is free for all applications.
Qt 4.5 upwards is licensed under the LGPL license; also, commercial licenses
are available from Nokia.
The Mac port by Jack Jansen has a rich and
ever-growing set of modules that support the native Mac toolbox calls. The port
supports MacOS X’s Carbon libraries.
By installing the PyObjc Objective-C bridge, Python programs can use MacOS X’s
Cocoa libraries. See the documentation that comes with the Mac port.
Pythonwin by Mark Hammond includes an interface to the
Microsoft Foundation Classes and a Python programming environment
that’s written mostly in Python using the MFC classes.
Freeze is a tool to create stand-alone applications. When freezing Tkinter
applications, the applications will not be truly stand-alone, as the application
will still need the Tcl and Tk libraries.
One solution is to ship the application with the Tcl and Tk libraries, and point
to them at run-time using the TCL_LIBRARY and TK_LIBRARY
environment variables.
To get truly stand-alone applications, the Tcl scripts that form the library
have to be integrated into the application as well. One tool supporting that is
SAM (stand-alone modules), which is part of the Tix distribution
(http://tix.sourceforge.net/).
Build Tix with SAM enabled, perform the appropriate call to
Tclsam_init(), etc. inside Python’s
Modules/tkappinit.c, and link with libtclsam and libtksam (you
might include the Tix libraries as well).
Yes, and you don’t even need threads! But you’ll have to restructure your I/O
code a bit. Tk has the equivalent of Xt’s XtAddInput() call, which allows you
to register a callback function which will be called from the Tk mainloop when
I/O is possible on a file descriptor. Here’s what you need:
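The registration call (as it appears in older versions of this FAQ; an
assumption here, since the exact spelling has varied across versions):

tkinter.createfilehandler(file, mask, callback)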
The file may be a Python file or socket object (actually, anything with a
fileno() method), or an integer file descriptor. The mask is one of the
constants tkinter.READABLE or tkinter.WRITABLE. The callback is called as
follows:
callback(file, mask)
You must unregister the callback when you’re done, using
tkinter.deletefilehandler(file)
Note: since you don’t know how many bytes are available for reading, you can’t
use the Python file object’s read or readline methods, since these will insist
on reading a predefined number of bytes. For sockets, the recv() or
recvfrom() methods will work fine; for other files, use
os.read(file.fileno(), maxbytecount).
An often-heard complaint is that event handlers bound to events with the
bind() method don’t get handled even when the appropriate key is pressed.
The most common cause is that the widget to which the binding applies doesn’t
have “keyboard focus”. Check out the Tk documentation for the focus command.
Usually a widget is given the keyboard focus by clicking in it (but not for
labels; see the takefocus option).
Python is a programming language. It’s used for many different applications.
It’s used in some high schools and colleges as an introductory programming
language because Python is easy to learn, but it’s also used by professional
software developers at places such as Google, NASA, and Lucasfilm Ltd.
If you find Python installed on your system but don’t remember installing it,
there are several possible ways it could have gotten there.
Perhaps another user on the computer wanted to learn programming and installed
it; you’ll have to figure out who’s been using the machine and might have
installed it.
A third-party application installed on the machine might have been written in
Python and included a Python installation. There are many such applications,
from GUI programs to network servers and administrative scripts.
Some Windows machines also have Python installed. At this writing we’re aware
of computers from Hewlett-Packard and Compaq that include Python. Apparently
some of HP/Compaq’s administrative tools are written in Python.
Many Unix-compatible operating systems, such as Mac OS X and some Linux
distributions, have Python installed by default; it’s included in the base
installation.
If someone installed it deliberately, you can remove it without hurting
anything. On Windows, use the Add/Remove Programs icon in the Control Panel.
If Python was installed by a third-party application, you can also remove it,
but that application will no longer work. You should use that application’s
uninstaller rather than removing Python directly.
If Python came with your operating system, removing it is not recommended. If
you remove it, whatever tools were written in Python will no longer run, and
some of them might be important to you. Reinstalling the whole system would
then be required to fix things again.
>>>
The default Python prompt of the interactive shell. Often seen for code
examples which can be executed interactively in the interpreter.
...
The default Python prompt of the interactive shell when entering code for
an indented code block or within a pair of matching left and right
delimiters (parentheses, square brackets or curly braces).
2to3
A tool that tries to convert Python 2.x code to Python 3.x code by
handling most of the incompatibilities which can be detected by parsing the
source and traversing the parse tree.
abstract base class
Abstract base classes complement duck-typing by
providing a way to define interfaces when other techniques like
hasattr() would be clumsy or subtly wrong (for example with
magic methods). Python comes with many built-in ABCs for
data structures (in the collections module), numbers (in the
numbers module), streams (in the io module), import finders
and loaders (in the importlib.abc module). You can create your own
ABCs with the abc module.
argument
A value passed to a function or method, assigned to a named local
variable in the function body. A function or method may have both
positional arguments and keyword arguments in its definition.
Positional and keyword arguments may be variable-length: * accepts
or passes (if in the function definition or call) several positional
arguments in a list, while ** does the same for keyword arguments
in a dictionary.
Any expression may be used within the argument list, and the evaluated
value is passed to the local variable.
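A quick interactive sketch of positional, keyword, and variable-length
arguments:

>>> def f(a, b=0, *args, **kwargs):
...     return a, b, args, kwargs
>>> f(1, 2, 3, 4, key='value')
(1, 2, (3, 4), {'key': 'value'})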
attribute
A value associated with an object which is referenced by name using
dotted expressions. For example, if an object o has an attribute
a it would be referenced as o.a.
BDFL
Benevolent Dictator For Life, a.k.a. Guido van Rossum, Python’s creator.
bytecode
Python source code is compiled into bytecode, the internal representation
of a Python program in the CPython interpreter. The bytecode is also
cached in .pyc and .pyo files so that executing the same file is
faster the second time (recompilation from source to bytecode can be
avoided). This “intermediate language” is said to run on a
virtual machine that executes the machine code corresponding to
each bytecode. Do note that bytecodes are not expected to work between
different Python virtual machines, nor to be stable between Python
releases.
A list of bytecode instructions can be found in the documentation for
the dis module.
class
A template for creating user-defined objects. Class definitions
normally contain method definitions which operate on instances of the
class.
coercion
The implicit conversion of an instance of one type to another during an
operation which involves two arguments of the same type. For example,
int(3.15) converts the floating point number to the integer 3, but
in 3+4.5, each argument is of a different type (one int, one float),
and both must be converted to the same type before they can be added or it
will raise a TypeError. Without coercion, all arguments of even
compatible types would have to be normalized to the same value by the
programmer, e.g., float(3)+4.5 rather than just 3+4.5.
complex number
An extension of the familiar real number system in which all numbers are
expressed as a sum of a real part and an imaginary part. Imaginary
numbers are real multiples of the imaginary unit (the square root of
-1), often written i in mathematics or j in
engineering. Python has built-in support for complex numbers, which are
written with this latter notation; the imaginary part is written with a
j suffix, e.g., 3+1j. To get access to complex equivalents of the
math module, use cmath. Use of complex numbers is a fairly
advanced mathematical feature. If you’re not aware of a need for them,
it’s almost certain you can safely ignore them.
CPython
The canonical implementation of the Python programming language, as
distributed on python.org. The term “CPython”
is used when necessary to distinguish this implementation from others
such as Jython or IronPython.
decorator
A function returning another function, usually applied as a function
transformation using the @wrapper syntax. Common examples for
decorators are classmethod() and staticmethod().
The decorator syntax is merely syntactic sugar, the following two
function definitions are semantically equivalent:
def f(...):
...
f = staticmethod(f)
@staticmethod
def f(...):
...
The same concept exists for classes, but is less commonly used there. See
the documentation for function definitions and
class definitions for more about decorators.
descriptor
Any object which defines the methods __get__(), __set__(), or
__delete__(). When a class attribute is a descriptor, its special
binding behavior is triggered upon attribute lookup. Normally, using
a.b to get, set or delete an attribute looks up the object named b in
the class dictionary for a, but if b is a descriptor, the respective
descriptor method gets called. Understanding descriptors is a key to a
deep understanding of Python because they are the basis for many features
including functions, methods, properties, class methods, static methods,
and reference to super classes.
For more information about descriptors’ methods, see Implementing Descriptors.
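A minimal sketch (the class name Ten is illustrative):

>>> class Ten:
...     def __get__(self, obj, objtype=None):
...         return 10
>>> class A:
...     x = Ten()          # class attribute that is a descriptor
>>> A().x                  # attribute lookup triggers Ten.__get__()
10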
dictionary
An associative array, where arbitrary keys are mapped to values. The keys
can be any object with __hash__() and __eq__()
methods. Called a hash in Perl.
docstring
A string literal which appears as the first expression in a class,
function or module. While ignored when the suite is executed, it is
recognized by the compiler and put into the __doc__ attribute
of the enclosing class, function or module. Since it is available via
introspection, it is the canonical place for documentation of the
object.
duck-typing
A programming style which does not look at an object’s type to determine
if it has the right interface; instead, the method or attribute is simply
called or used (“If it looks like a duck and quacks like a duck, it
must be a duck.”) By emphasizing interfaces rather than specific types,
well-designed code improves its flexibility by allowing polymorphic
substitution. Duck-typing avoids tests using type() or
isinstance(). (Note, however, that duck-typing can be complemented
with abstract base classes.) Instead, it typically employs
hasattr() tests or EAFP programming.
EAFP
Easier to ask for forgiveness than permission. This common Python coding
style assumes the existence of valid keys or attributes and catches
exceptions if the assumption proves false. This clean and fast style is
characterized by the presence of many try and except
statements. The technique contrasts with the LBYL style
common to many other languages such as C.
expression
A piece of syntax which can be evaluated to some value. In other words,
an expression is an accumulation of expression elements like literals,
names, attribute access, operators or function calls which all return a
value. In contrast to many other languages, not all language constructs
are expressions. There are also statements which cannot be used
as expressions, such as if. Assignments are also statements,
not expressions.
extension module
A module written in C or C++, using Python’s C API to interact with the
core and with user code.
file object
An object exposing a file-oriented API (with methods such as
read() or write()) to an underlying resource. Depending
on the way it was created, a file object can mediate access to a real
on-disk file or to another type of storage or communication device
(for example standard input/output, in-memory buffers, sockets, pipes,
etc.). File objects are also called file-like objects or
streams.
There are actually three categories of file objects: raw binary files,
buffered binary files and text files. Their interfaces are defined in the
io module. The canonical way to create a file object is by using
the open() function.
floor division
Mathematical division that rounds down to nearest integer. The floor
division operator is //. For example, the expression 11//4
evaluates to 2 in contrast to the 2.75 returned by float true
division. Note that (-11)//4 is -3 because that is -2.75
rounded downward. See PEP 238.
function
A series of statements which returns some value to a caller. It can also
be passed zero or more arguments which may be used in the execution of
the body. See also argument and method.
__future__
A pseudo-module which programmers can use to enable new language features
which are not compatible with the current interpreter.
By importing the __future__ module and evaluating its variables,
you can see when a new feature was first added to the language and when it
becomes the default:
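>>> import __future__
>>> __future__.division
_Feature((2, 2, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0), 8192)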
garbage collection
The process of freeing memory when it is not used anymore. Python
performs garbage collection via reference counting and a cyclic garbage
collector that is able to detect and break reference cycles.
generator
A function which returns an iterator. It looks like a normal function
except that it contains yield statements for producing a series
of values usable in a for-loop or that can be retrieved one at a time with
the next() function. Each yield temporarily suspends
processing, remembering the location execution state (including local
variables and pending try-statements). When the generator resumes, it
picks up where it left off (in contrast to functions which start fresh on
every invocation).
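For example:

>>> def countdown(n):
...     while n > 0:
...         yield n        # suspends here until the next value is requested
...         n -= 1
>>> list(countdown(3))
[3, 2, 1]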
generator expression
An expression that returns an iterator. It looks like a normal expression
followed by a for expression defining a loop variable, range,
and an optional if expression. The combined expression
generates values for an enclosing function:
>>> sum(i*i for i in range(10)) # sum of squares 0, 1, 4, ... 81
285
global interpreter lock
The mechanism used by the CPython interpreter to assure that
only one thread executes Python bytecode at a time.
This simplifies the CPython implementation by making the object model
(including critical built-in types such as dict) implicitly
safe against concurrent access. Locking the entire interpreter
makes it easier for the interpreter to be multi-threaded, at the
expense of much of the parallelism afforded by multi-processor
machines.
However, some extension modules, either standard or third-party,
are designed so as to release the GIL when doing computationally-intensive
tasks such as compression or hashing. Also, the GIL is always released
when doing I/O.
Past efforts to create a “free-threaded” interpreter (one which locks
shared data at a much finer granularity) have not been successful
because performance suffered in the common single-processor case. It
is believed that overcoming this performance issue would make the
implementation much more complicated and therefore costlier to maintain.
hashable
An object is hashable if it has a hash value which never changes during
its lifetime (it needs a __hash__() method), and can be compared to
other objects (it needs an __eq__() method). Hashable objects which
compare equal must have the same hash value.
Hashability makes an object usable as a dictionary key and a set member,
because these data structures use the hash value internally.
All of Python’s immutable built-in objects are hashable, while no mutable
containers (such as lists or dictionaries) are. Objects which are
instances of user-defined classes are hashable by default; they all
compare unequal, and their hash value is their id().
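For example:

>>> d = {(1, 2): 'point'}  # tuples are immutable, hence hashable
>>> d[[1, 2]] = 'nope'     # lists are mutable and unhashable
Traceback (most recent call last):
  ...
TypeError: unhashable type: 'list'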
IDLE
An Integrated Development Environment for Python. IDLE is a basic editor
and interpreter environment which ships with the standard distribution of
Python.
immutable
An object with a fixed value. Immutable objects include numbers, strings and
tuples. Such an object cannot be altered. A new object has to
be created if a different value has to be stored. They play an important
role in places where a constant hash value is needed, for example as a key
in a dictionary.
importer
An object that both finds and loads a module; both a
finder and loader object.
interactive
Python has an interactive interpreter which means you can enter
statements and expressions at the interpreter prompt, immediately
execute them and see their results. Just launch python with no
arguments (possibly by selecting it from your computer’s main
menu). It is a very powerful way to test out new ideas or inspect
modules and packages (remember help(x)).
interpreted
Python is an interpreted language, as opposed to a compiled one,
though the distinction can be blurry because of the presence of the
bytecode compiler. This means that source files can be run directly
without explicitly creating an executable which is then run.
Interpreted languages typically have a shorter development/debug cycle
than compiled ones, though their programs generally also run more
slowly. See also interactive.
iterable
An object capable of returning its members one at a
time. Examples of iterables include all sequence types (such as
list, str, and tuple) and some non-sequence
types like dict and file and objects of any classes you
define with an __iter__() or __getitem__() method. Iterables
can be used in a for loop and in many other places where a
sequence is needed (zip(), map(), ...). When an iterable
object is passed as an argument to the built-in function iter(), it
returns an iterator for the object. This iterator is good for one pass
over the set of values. When using iterables, it is usually not necessary
to call iter() or deal with iterator objects yourself. The for
statement does that automatically for you, creating a temporary unnamed
variable to hold the iterator for the duration of the loop. See also
iterator, sequence, and generator.
iterator
An object representing a stream of data. Repeated calls to the iterator’s
__next__() method (or passing it to the built-in function
next()) return successive items in the stream. When no more data
are available a StopIteration exception is raised instead. At this
point, the iterator object is exhausted and any further calls to its
__next__() method just raise StopIteration again. Iterators
are required to have an __iter__() method that returns the iterator
object itself so every iterator is also iterable and may be used in most
places where other iterables are accepted. One notable exception is code
which attempts multiple iteration passes. A container object (such as a
list) produces a fresh new iterator each time you pass it to the
iter() function or use it in a for loop. Attempting this
with an iterator will just return the same exhausted iterator object used
in the previous iteration pass, making it appear like an empty container.
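For example:

>>> it = iter([1, 2, 3])
>>> list(it)
[1, 2, 3]
>>> list(it)               # the same iterator is now exhausted
[]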
key function
A key function or collation function is a callable that returns a value
used for sorting or ordering. For example, locale.strxfrm() is
used to produce a sort key that is aware of locale specific sort
conventions.
There are several ways to create a key function. For example, the
str.lower() method can serve as a key function for case insensitive
sorts. Alternatively, an ad-hoc key function can be built from a
lambda expression such as lambda r: (r[0], r[2]). Also,
the operator module provides three key function constructors:
attrgetter(), itemgetter(), and
methodcaller(). See the Sorting HOW TO for examples of how to create and use key functions.
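For example:

>>> sorted(['banana', 'Apple', 'cherry'], key=str.lower)
['Apple', 'banana', 'cherry']
>>> from operator import itemgetter
>>> sorted([(1, 'b'), (0, 'a')], key=itemgetter(0))
[(0, 'a'), (1, 'b')]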
keyword argument
Arguments which are preceded with a variable_name= in the call.
The variable name designates the local name in the function to which the
value is assigned. ** is used to accept or pass a dictionary of
keyword arguments. See argument.
lambda
An anonymous inline function consisting of a single expression
which is evaluated when the function is called. The syntax to create
a lambda function is lambda [arguments]: expression
LBYL
Look before you leap. This coding style explicitly tests for
pre-conditions before making calls or lookups. This style contrasts with
the EAFP approach and is characterized by the presence of many
if statements.
In a multi-threaded environment, the LBYL approach can risk introducing a
race condition between “the looking” and “the leaping”. For example, the
code, if key in mapping: return mapping[key] can fail if another
thread removes key from mapping after the test, but before the lookup.
This issue can be solved with locks or by using the EAFP approach.
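A sketch contrasting the two styles:

>>> mapping = {}
>>> # LBYL: test first (racy if another thread mutates mapping in between)
>>> value = mapping['color'] if 'color' in mapping else 'default'
>>> # EAFP: attempt the lookup and catch the failure
>>> try:
...     value = mapping['color']
... except KeyError:
...     value = 'default'
>>> value
'default'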
list
A built-in Python sequence. Despite its name it is more akin
to an array in other languages than to a linked list since access to
elements is O(1).
list comprehension
A compact way to process all or part of the elements in a sequence and
return a list with the results. result = ['{:#04x}'.format(x) for x in range(256) if x % 2 == 0] generates a list of strings containing
even hex numbers (0x..) in the range from 0 to 255. The if
clause is optional. If omitted, all elements in range(256) are
processed.
metaclass
The class of a class. Class definitions create a class name, a class
dictionary, and a list of base classes. The metaclass is responsible for
taking those three arguments and creating the class. Most object oriented
programming languages provide a default implementation. What makes Python
special is that it is possible to create custom metaclasses. Most users
never need this tool, but when the need arises, metaclasses can provide
powerful, elegant solutions. They have been used for logging attribute
access, adding thread-safety, tracking object creation, implementing
singletons, and many other tasks.
method
A function which is defined inside a class body. If called as an attribute
of an instance of that class, the method will get the instance object as
its first argument (which is usually called self).
See function and nested scope.
mutable
Mutable objects can change their value but keep their id(). See
also immutable.
named tuple
Any tuple-like class whose indexable elements are also accessible using
named attributes (for example, time.localtime() returns a
tuple-like object where the year is accessible either with an
index such as t[0] or with a named attribute like t.tm_year).
A named tuple can be a built-in type such as time.struct_time,
or it can be created with a regular class definition. A full featured
named tuple can also be created with the factory function
collections.namedtuple(). The latter approach automatically
provides extra features such as a self-documenting representation like
Employee(name='jones',title='programmer').
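For example:

>>> from collections import namedtuple
>>> Employee = namedtuple('Employee', ['name', 'title'])
>>> e = Employee(name='jones', title='programmer')
>>> e[0], e.title          # index access and named access both work
('jones', 'programmer')
>>> e
Employee(name='jones', title='programmer')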
namespace
The place where a variable is stored. Namespaces are implemented as
dictionaries. There are the local, global and built-in namespaces as well
as nested namespaces in objects (in methods). Namespaces support
modularity by preventing naming conflicts. For instance, the functions
builtins.open() and os.open() are distinguished by their
namespaces. Namespaces also aid readability and maintainability by making
it clear which module implements a function. For instance, writing
random.seed() or itertools.islice() makes it clear that those
functions are implemented by the random and itertools
modules, respectively.
nested scope
The ability to refer to a variable in an enclosing definition. For
instance, a function defined inside another function can refer to
variables in the outer function. Note that nested scopes by default work
only for reference and not for assignment. Local variables both read and
write in the innermost scope. Likewise, global variables read and write
to the global namespace. The nonlocal allows writing to outer
scopes.
new-style class
Old name for the flavor of classes now used for all class objects. In
earlier Python versions, only new-style classes could use Python’s newer,
versatile features like __slots__, descriptors, properties,
__getattribute__(), class methods, and static methods.
object
Any data with state (attributes or value) and defined behavior
(methods). Also the ultimate base class of any new-style
class.
positional argument
The arguments assigned to local names inside a function or method,
determined by the order in which they were given in the call. * is
used to either accept multiple positional arguments (when in the
definition), or pass several arguments as a list to a function. See
argument.
Python 3000
Nickname for the Python 3.x release line (coined long ago when the release
of version 3 was something in the distant future.) This is also
abbreviated “Py3k”.
Pythonic
An idea or piece of code which closely follows the most common idioms
of the Python language, rather than implementing code using concepts
common to other languages. For example, a common idiom in Python is
to loop over all elements of an iterable using a for
statement. Many other languages don’t have this type of construct, so
people unfamiliar with Python sometimes use a numerical counter instead:
for i in range(len(food)): print(food[i])
As opposed to the cleaner, Pythonic method:
for piece in food: print(piece)
reference count
The number of references to an object. When the reference count of an
object drops to zero, it is deallocated. Reference counting is
generally not visible to Python code, but it is a key element of the
CPython implementation. The sys module defines a
getrefcount() function that programmers can call to return the
reference count for a particular object.
__slots__
A declaration inside a class that saves memory by pre-declaring space for
instance attributes and eliminating instance dictionaries. Though
popular, the technique is somewhat tricky to get right and is best
reserved for rare cases where there are large numbers of instances in a
memory-critical application.
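A minimal sketch (the class name Point is illustrative):

>>> class Point:
...     __slots__ = ('x', 'y')   # no per-instance __dict__ is created
>>> p = Point()
>>> p.x = 1                      # fine: 'x' is a declared slot
>>> p.z = 3                      # no such slot
Traceback (most recent call last):
  ...
AttributeError: 'Point' object has no attribute 'z'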
sequence
An iterable which supports efficient element access using integer
indices via the __getitem__() special method and defines a
__len__() method that returns the length of the sequence.
Some built-in sequence types are list, str,
tuple, and bytes. Note that dict also
supports __getitem__() and __len__(), but is considered a
mapping rather than a sequence because the lookups use arbitrary
immutable keys rather than integers.
slice
An object usually containing a portion of a sequence. A slice is
created using the subscript notation, [] with colons between numbers
when several are given, such as in variable_name[1:3:5]. The bracket
(subscript) notation uses slice objects internally.
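For example:

>>> letters = ['a', 'b', 'c', 'd', 'e']
>>> letters[1:4]
['b', 'c', 'd']
>>> letters[1:4:2]         # with a step of 2
['b', 'd']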
special method
A method that is called implicitly by Python to execute a certain
operation on a type, such as addition. Such methods have names starting
and ending with double underscores. Special methods are documented in
Special method names.
statement
A statement is part of a suite (a “block” of code). A statement is either
an expression or one of several constructs with a keyword, such
as if, while or for.
triple-quoted string
A string which is bound by three instances of either a quotation mark
(") or an apostrophe ('). While they don’t provide any functionality
not available with single-quoted strings, they are useful for a number
of reasons. They allow you to include unescaped single and double
quotes within a string and they can span multiple lines without the
use of the continuation character, making them especially useful when
writing docstrings.
type
The type of a Python object determines what kind of object it is; every
object has a type. An object’s type is accessible as its
__class__ attribute or can be retrieved with type(obj).
view
The objects returned from dict.keys(), dict.values(), and
dict.items() are called dictionary views. They are lazy sequences
that will see changes in the underlying dictionary. To force the
dictionary view to become a full list use list(dictview). See
Dictionary view objects.
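For example:

>>> d = {'a': 1}
>>> keys = d.keys()        # a view, not a copy
>>> d['b'] = 2
>>> sorted(keys)           # the view reflects the later insertion
['a', 'b']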
virtual machine
A computer defined entirely in software. Python’s virtual machine
executes the bytecode emitted by the bytecode compiler.
Zen of Python
Listing of Python design principles and philosophies that are helpful in
understanding and using the language. The listing can be found by typing
“import this” at the interactive prompt.
These documents are generated from reStructuredText sources by Sphinx, a
document processor specifically written for the Python documentation.
Development of the documentation and its toolchain takes place on the
docs@python.org mailing list. We’re always looking for volunteers wanting
to help with the docs, so feel free to send a mail there!
Many thanks go to:
Fred L. Drake, Jr., the creator of the original Python documentation toolset
and writer of much of the content;
the Docutils project for creating
reStructuredText and the Docutils suite;
This section lists people who have contributed in some way to the Python
documentation. It is probably not complete – if you feel that you or
anyone else should be on this list, please let us know (send email to
docs@python.org), and we’ll be glad to correct the problem.
Aahz
Michael Abbott
Steve Alexander
Jim Ahlstrom
Fred Allen
A. Amoroso
Pehr Anderson
Oliver Andrich
Heidi Annexstad
Jesús Cea Avión
Manuel Balsera
Daniel Barclay
Chris Barker
Don Bashford
Anthony Baxter
Alexander Belopolsky
Bennett Benson
Jonathan Black
Robin Boerdijk
Michal Bozon
Aaron Brancotti
Georg Brandl
Keith Briggs
Ian Bruntlett
Lee Busby
Lorenzo M. Catucci
Carl Cerecke
Mauro Cicognini
Gilles Civario
Mike Clarkson
Steve Clift
Dave Cole
Matthew Cowles
Jeremy Craven
Andrew Dalke
Ben Darnell
L. Peter Deutsch
Robert Donohue
Fred L. Drake, Jr.
Jacques Ducasse
Josip Dzolonga
Jeff Epler
Michael Ernst
Blame Andy Eskilsson
Carey Evans
Martijn Faassen
Carl Feynman
Dan Finnie
Hernán Martínez Foffani
Stefan Franke
Jim Fulton
Peter Funk
Lele Gaifax
Matthew Gallagher
Gabriel Genellina
Ben Gertzfield
Nadim Ghaznavi
Jonathan Giddy
Matt Giuca
Shelley Gooch
Nathaniel Gray
Grant Griffin
Thomas Guettler
Anders Hammarquist
Mark Hammond
Harald Hanche-Olsen
Manus Hand
Gerhard Häring
Travis B. Hartwell
Tim Hatch
Janko Hauser
Ben Hayden
Thomas Heller
Bernhard Herzog
Magnus L. Hetland
Konrad Hinsen
Stefan Hoffmeister
Albert Hofkamp
Gregor Hoffleit
Steve Holden
Thomas Holenstein
Gerrit Holl
Rob Hooft
Brian Hooper
Randall Hopper
Michael Hudson
Eric Huss
Jeremy Hylton
Roger Irwin
Jack Jansen
Philip H. Jensen
Pedro Diaz Jimenez
Kent Johnson
Lucas de Jonge
Andreas Jung
Robert Kern
Jim Kerr
Jan Kim
Kamil Kisiel
Greg Kochanski
Guido Kollerie
Peter A. Koren
Daniel Kozan
Andrew M. Kuchling
Dave Kuhlman
Erno Kuusela
Ross Lagerwall
Thomas Lamb
Detlef Lannert
Piers Lauder
Glyph Lefkowitz
Robert Lehmann
Marc-André Lemburg
Ross Light
Gediminas Liktaras
Ulf A. Lindgren
Everett Lipman
Mirko Liss
Martin von Löwis
Fredrik Lundh
Jeff MacDonald
John Machin
Andrew MacIntyre
Vladimir Marangozov
Vincent Marchetti
Westley Martínez
Laura Matson
Daniel May
Rebecca McCreary
Doug Mennella
Paolo Milani
Skip Montanaro
Paul Moore
Ross Moore
Sjoerd Mullender
Dale Nagata
Michal Nowikowski
Steffen Daode Nurpmeso
Ng Pheng Siong
Koray Oner
Tomas Oppelstrup
Denis S. Otkidach
Zooko O’Whielacronx
Shriphani Palakodety
William Park
Joonas Paalasmaa
Harri Pasanen
Bo Peng
Tim Peters
Benjamin Peterson
Christopher Petrilli
Justin D. Pettit
Chris Phoenix
François Pinard
Paul Prescod
Eric S. Raymond
Edward K. Ream
Terry J. Reedy
Sean Reifschneider
Bernhard Reiter
Armin Rigo
Wes Rishel
Armin Ronacher
Jim Roskind
Guido van Rossum
Donald Wallace Rouse II
Mark Russell
Nick Russo
Chris Ryland
Constantina S.
Hugh Sasse
Bob Savage
Scott Schram
Neil Schemenauer
Barry Scott
Joakim Sernbrant
Justin Sheehy
Charlie Shepherd
SilentGhost
Michael Simcich
Ionel Simionescu
Michael Sloan
Gregory P. Smith
Roy Smith
Clay Spence
Nicholas Spies
Tage Stabell-Kulo
Frank Stajano
Anthony Starks
Greg Stein
Peter Stoehr
Mark Summerfield
Reuben Sumner
Kalle Svensson
Jim Tittsler
David Turner
Sandro Tosi
Ville Vainio
Martijn Vries
Charles G. Waldman
Greg Ward
Barry Warsaw
Corran Webster
Glyn Webster
Bob Weiner
Eddy Welbourne
Jeff Wheeler
Mats Wichmann
Gerry Wiener
Timothy Wild
Paul Winkler
Collin Winter
Blake Winton
Dan Wolfe
Steven Work
Thomas Wouters
Ka-Ping Yee
Rory Yorke
Moshe Zadka
Milan Zamazal
Cheng Zhang
Trent Nelson
Michael Foord
It is only with the input and contributions of the Python community
that Python has such wonderful documentation – Thank You!
Python is a mature programming language which has established a reputation for
stability. In order to maintain this reputation, the developers would like to
know of any deficiencies you find in Python.
If you find a bug in this documentation or would like to propose an improvement,
please send an e-mail to docs@python.org describing the bug and where you found
it. If you have a suggestion how to fix it, include that as well.
docs@python.org is a mailing list run by volunteers; your request will be
noticed, even if it takes a while to be processed.
Of course, if you want a more persistent record of your issue, you can use the
issue tracker for documentation bugs as well.
Bug reports for Python itself should be submitted via the Python Bug Tracker
(http://bugs.python.org/). The bug tracker offers a Web form which allows
pertinent information to be entered and submitted to the developers.
The first step in filing a report is to determine whether the problem has
already been reported. The advantage in doing so, aside from saving the
developers time, is that you learn what has been done to fix it; it may be that
the problem has already been fixed for the next release, or additional
information is needed (in which case you are welcome to provide it if you can!).
To do this, search the bug database using the search box on the top of the page.
If the problem you’re reporting is not already in the bug tracker, go back to
the Python Bug Tracker and log in. If you don’t already have a tracker account,
select the “Register” link or, if you use OpenID, one of the OpenID provider
logos in the sidebar. It is not possible to submit a bug report anonymously.
Being now logged in, you can submit a bug. Select the “Create New” link in the
sidebar to open the bug reporting form.
The submission form has a number of fields. For the “Title” field, enter a
very short description of the problem; less than ten words is good. In the
“Type” field, select the type of your problem; also select the “Component” and
“Versions” to which the bug relates.
In the “Comment” field, describe the problem in detail, including what you
expected to happen and what did happen. Be sure to include whether any
extension modules were involved, and what hardware and software platform you
were using (including version information as appropriate).
Each bug report will be assigned to a developer who will determine what needs to
be done to correct the problem. You will receive an update each time action is
taken on the bug. See http://www.python.org/dev/workflow/ for a detailed
description of the issue workflow.
Python was created in the early 1990s by Guido van Rossum at Stichting
Mathematisch Centrum (CWI, see http://www.cwi.nl/) in the Netherlands as a
successor of a language called ABC. Guido remains Python’s principal author,
although it includes many contributions from others.
In 1995, Guido continued his work on Python at the Corporation for National
Research Initiatives (CNRI, see http://www.cnri.reston.va.us/) in Reston,
Virginia where he released several versions of the software.
In May 2000, Guido and the Python core development team moved to BeOpen.com to
form the BeOpen PythonLabs team. In October of the same year, the PythonLabs
team moved to Digital Creations (now Zope Corporation; see
http://www.zope.com/). In 2001, the Python Software Foundation (PSF, see
http://www.python.org/psf/) was formed, a non-profit organization created
specifically to own Python-related Intellectual Property. Zope Corporation is a
sponsoring member of the PSF.
All Python releases are Open Source (see http://www.opensource.org/ for the Open
Source Definition). Historically, most, but not all, Python releases have also
been GPL-compatible; the table below summarizes the various releases.
Release          Derived from   Year        Owner        GPL compatible?
0.9.0 thru 1.2   n/a            1991-1995   CWI          yes
1.3 thru 1.5.2   1.2            1995-1999   CNRI         yes
1.6              1.5.2          2000        CNRI         no
2.0              1.6            2000        BeOpen.com   no
1.6.1            1.6            2001        CNRI         no
2.1              2.0+1.6.1      2001        PSF          no
2.0.1            2.0+1.6.1      2001        PSF          yes
2.1.1            2.1+2.0.1      2001        PSF          yes
2.2              2.1.1          2001        PSF          yes
2.1.2            2.1.1          2002        PSF          yes
2.1.3            2.1.2          2002        PSF          yes
2.2.1            2.2            2002        PSF          yes
2.2.2            2.2.1          2002        PSF          yes
2.2.3            2.2.2          2002-2003   PSF          yes
2.3              2.2.2          2002-2003   PSF          yes
2.3.1            2.3            2002-2003   PSF          yes
2.3.2            2.3.1          2003        PSF          yes
2.3.3            2.3.2          2003        PSF          yes
2.3.4            2.3.3          2004        PSF          yes
2.3.5            2.3.4          2005        PSF          yes
2.4              2.3            2004        PSF          yes
2.4.1            2.4            2005        PSF          yes
2.4.2            2.4.1          2005        PSF          yes
2.4.3            2.4.2          2006        PSF          yes
2.4.4            2.4.3          2006        PSF          yes
2.5              2.4            2006        PSF          yes
2.5.1            2.5            2007        PSF          yes
2.6              2.5            2008        PSF          yes
2.6.1            2.6            2008        PSF          yes
2.6.2            2.6.1          2009        PSF          yes
2.6.3            2.6.2          2009        PSF          yes
2.6.4            2.6.3          2009        PSF          yes
3.0              2.6            2008        PSF          yes
3.0.1            3.0            2009        PSF          yes
3.1              3.0.1          2009        PSF          yes
3.1.1            3.1            2009        PSF          yes
3.1.2            3.1.1          2010        PSF          yes
3.1.3            3.1.2          2010        PSF          yes
3.1.4            3.1.3          2011        PSF          yes
3.2              3.1            2011        PSF          yes
3.2.1            3.2            2011        PSF          yes
3.2.2            3.2.1          2011        PSF          yes
Note
GPL-compatible doesn’t mean that we’re distributing Python under the GPL. All
Python licenses, unlike the GPL, let you distribute a modified version without
making your changes open source. The GPL-compatible licenses make it possible to
combine Python with other software that is released under the GPL; the others
don’t.
Thanks to the many outside volunteers who have worked under Guido’s direction to
make these releases possible.
Terms and conditions for accessing or otherwise using Python
PSF LICENSE AGREEMENT FOR PYTHON 3.2.2
This LICENSE AGREEMENT is between the Python Software Foundation (“PSF”), and
the Individual or Organization (“Licensee”) accessing and otherwise using Python
3.2.2 software in source or binary form and its associated documentation.
In the event Licensee prepares a derivative work that is based on or
incorporates Python 3.2.2 or any part thereof, and wants to make the
derivative work available to others as provided herein, then Licensee hereby
agrees to include in any such work a brief summary of the changes made to Python
3.2.2.
PSF is making Python 3.2.2 available to Licensee on an “AS IS” basis.
PSF MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF
EXAMPLE, BUT NOT LIMITATION, PSF MAKES NO AND DISCLAIMS ANY REPRESENTATION OR
WARRANTY OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE
USE OF PYTHON 3.2.2 WILL NOT INFRINGE ANY THIRD PARTY RIGHTS.
PSF SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON 3.2.2
FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS A RESULT OF
MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON 3.2.2, OR ANY DERIVATIVE
THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.
This License Agreement will automatically terminate upon a material breach of
its terms and conditions.
Nothing in this License Agreement shall be deemed to create any relationship
of agency, partnership, or joint venture between PSF and Licensee. This License
Agreement does not grant permission to use PSF trademarks or trade name in a
trademark sense to endorse or promote products or services of Licensee, or any
third party.
By copying, installing or otherwise using Python 3.2.2, Licensee agrees
to be bound by the terms and conditions of this License Agreement.
BEOPEN.COM LICENSE AGREEMENT FOR PYTHON 2.0
BEOPEN PYTHON OPEN SOURCE LICENSE AGREEMENT VERSION 1
This LICENSE AGREEMENT is between BeOpen.com (“BeOpen”), having an office at
160 Saratoga Avenue, Santa Clara, CA 95051, and the Individual or Organization
(“Licensee”) accessing and otherwise using this software in source or binary
form and its associated documentation (“the Software”).
Subject to the terms and conditions of this BeOpen Python License Agreement,
BeOpen hereby grants Licensee a non-exclusive, royalty-free, world-wide license
to reproduce, analyze, test, perform and/or display publicly, prepare derivative
works, distribute, and otherwise use the Software alone or in any derivative
version, provided, however, that the BeOpen Python License is retained in the
Software, alone or in any derivative version prepared by Licensee.
BeOpen is making the Software available to Licensee on an “AS IS” basis.
BEOPEN MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF
EXAMPLE, BUT NOT LIMITATION, BEOPEN MAKES NO AND DISCLAIMS ANY REPRESENTATION OR
WARRANTY OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE
USE OF THE SOFTWARE WILL NOT INFRINGE ANY THIRD PARTY RIGHTS.
BEOPEN SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF THE SOFTWARE FOR
ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS A RESULT OF USING,
MODIFYING OR DISTRIBUTING THE SOFTWARE, OR ANY DERIVATIVE THEREOF, EVEN IF
ADVISED OF THE POSSIBILITY THEREOF.
This License Agreement will automatically terminate upon a material breach of
its terms and conditions.
This License Agreement shall be governed by and interpreted in all respects
by the law of the State of California, excluding conflict of law provisions.
Nothing in this License Agreement shall be deemed to create any relationship of
agency, partnership, or joint venture between BeOpen and Licensee. This License
Agreement does not grant permission to use BeOpen trademarks or trade names in a
trademark sense to endorse or promote products or services of Licensee, or any
third party. As an exception, the “BeOpen Python” logos available at
http://www.pythonlabs.com/logos.html may be used according to the permissions
granted on that web page.
By copying, installing or otherwise using the software, Licensee agrees to be
bound by the terms and conditions of this License Agreement.
CNRI LICENSE AGREEMENT FOR PYTHON 1.6.1
This LICENSE AGREEMENT is between the Corporation for National Research
Initiatives, having an office at 1895 Preston White Drive, Reston, VA 20191
(“CNRI”), and the Individual or Organization (“Licensee”) accessing and
otherwise using Python 1.6.1 software in source or binary form and its
associated documentation.
In the event Licensee prepares a derivative work that is based on or
incorporates Python 1.6.1 or any part thereof, and wants to make the derivative
work available to others as provided herein, then Licensee hereby agrees to
include in any such work a brief summary of the changes made to Python 1.6.1.
CNRI is making Python 1.6.1 available to Licensee on an “AS IS” basis. CNRI
MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF EXAMPLE,
BUT NOT LIMITATION, CNRI MAKES NO AND DISCLAIMS ANY REPRESENTATION OR WARRANTY
OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF
PYTHON 1.6.1 WILL NOT INFRINGE ANY THIRD PARTY RIGHTS.
CNRI SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON 1.6.1 FOR
ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS A RESULT OF
MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON 1.6.1, OR ANY DERIVATIVE
THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.
This License Agreement will automatically terminate upon a material breach of
its terms and conditions.
This License Agreement shall be governed by the federal intellectual property
law of the United States, including without limitation the federal copyright
law, and, to the extent such U.S. federal law does not apply, by the law of the
Commonwealth of Virginia, excluding Virginia’s conflict of law provisions.
Notwithstanding the foregoing, with regard to derivative works based on Python
1.6.1 that incorporate non-separable material that was previously distributed
under the GNU General Public License (GPL), the law of the Commonwealth of
Virginia shall govern this License Agreement only as to issues arising under or
with respect to Paragraphs 4, 5, and 7 of this License Agreement. Nothing in
this License Agreement shall be deemed to create any relationship of agency,
partnership, or joint venture between CNRI and Licensee. This License Agreement
does not grant permission to use CNRI trademarks or trade name in a trademark
sense to endorse or promote products or services of Licensee, or any third
party.
By clicking on the “ACCEPT” button where indicated, or by copying, installing
or otherwise using Python 1.6.1, Licensee agrees to be bound by the terms and
conditions of this License Agreement.
ACCEPT
CWI LICENSE AGREEMENT FOR PYTHON 0.9.0 THROUGH 1.2
Permission to use, copy, modify, and distribute this software and its
documentation for any purpose and without fee is hereby granted, provided that
the above copyright notice appear in all copies and that both that copyright
notice and this permission notice appear in supporting documentation, and that
the name of Stichting Mathematisch Centrum or CWI not be used in advertising or
publicity pertaining to distribution of the software without specific, written
prior permission.
STICHTING MATHEMATISCH CENTRUM DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS
SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO
EVENT SHALL STICHTING MATHEMATISCH CENTRUM BE LIABLE FOR ANY SPECIAL, INDIRECT
OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE,
DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
SOFTWARE.
Licenses and Acknowledgements for Incorporated Software
This section is an incomplete but growing list of licenses and acknowledgements
for third-party software incorporated in the Python distribution.
The _random module includes code based on a download from
http://www.math.keio.ac.jp/matumoto/MT2002/emt19937ar.html. The following are
the verbatim comments from the original code:
A C-program for MT19937, with initialization improved 2002/1/26.
Coded by Takuji Nishimura and Makoto Matsumoto.
Before using, initialize the state by using init_genrand(seed)
or init_by_array(init_key, key_length).
Copyright (C) 1997 - 2002, Makoto Matsumoto and Takuji Nishimura,
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. The names of its contributors may not be used to endorse or promote
products derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Any feedback is very welcome.
http://www.math.keio.ac.jp/matumoto/emt.html
email: matumoto@math.keio.ac.jp
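The generator licensed above is what backs the random module. As a quick
illustration (not part of the original notice), seeding from Python maps onto
the init_by_array() routine mentioned above, so a given seed always reproduces
the same stream:

import random

# random.Random wraps the C _random module's MT19937 generator.
rng = random.Random(19937)
first = [rng.random() for _ in range(3)]
rng.seed(19937)            # reseeding reproduces the identical stream
assert [rng.random() for _ in range(3)] == first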
The socket module uses the functions getaddrinfo() and
getnameinfo(), which are coded in separate source files from the WIDE
Project, http://www.wide.ad.jp/.
Copyright (C) 1995, 1996, 1997, and 1998 WIDE Project.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. Neither the name of the project nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND
GAI_ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE
FOR GAI_ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON GAI_ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN GAI_ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
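For orientation, here is a minimal sketch of the two functions as exposed by
the socket module; the hostname is illustrative and resolving it requires
network access:

import socket

# Resolve a host/port pair into a list of candidate socket addresses ...
for family, socktype, proto, canonname, sockaddr in socket.getaddrinfo(
        'www.python.org', 443, 0, socket.SOCK_STREAM):
    print(family, sockaddr)

# ... and map a numeric address back to host and service names.
print(socket.getnameinfo(('127.0.0.1', 80), 0))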
The source for the fpectl module includes the following notice:
---------------------------------------------------------------------
/ Copyright (c) 1996. \
| The Regents of the University of California. |
| All rights reserved. |
| |
| Permission to use, copy, modify, and distribute this software for |
| any purpose without fee is hereby granted, provided that this en- |
| tire notice is included in all copies of any software which is or |
| includes a copy or modification of this software and in all |
| copies of the supporting documentation for such software. |
| |
| This work was produced at the University of California, Lawrence |
| Livermore National Laboratory under contract no. W-7405-ENG-48 |
| between the U.S. Department of Energy and The Regents of the |
| University of California for the operation of UC LLNL. |
| |
| DISCLAIMER |
| |
| This software was prepared as an account of work sponsored by an |
| agency of the United States Government. Neither the United States |
| Government nor the University of California nor any of their em- |
| ployees, makes any warranty, express or implied, or assumes any |
| liability or responsibility for the accuracy, completeness, or |
| usefulness of any information, apparatus, product, or process |
| disclosed, or represents that its use would not infringe |
| privately-owned rights. Reference herein to any specific commer- |
| cial products, process, or service by trade name, trademark, |
| manufacturer, or otherwise, does not necessarily constitute or |
| imply its endorsement, recommendation, or favoring by the United |
| States Government or the University of California. The views and |
| opinions of authors expressed herein do not necessarily state or |
| reflect those of the United States Government or the University |
| of California, and shall not be used for advertising or product |
\ endorsement purposes. /
---------------------------------------------------------------------
The asynchat and asyncore modules contain the following notice:
Copyright 1996 by Sam Rushing
All Rights Reserved
Permission to use, copy, modify, and distribute this software and
its documentation for any purpose and without fee is hereby
granted, provided that the above copyright notice appear in all
copies and that both that copyright notice and this permission
notice appear in supporting documentation, and that the name of Sam
Rushing not be used in advertising or publicity pertaining to
distribution of the software without specific, written prior
permission.
SAM RUSHING DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN
NO EVENT SHALL SAM RUSHING BE LIABLE FOR ANY SPECIAL, INDIRECT OR
CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS
OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT,
NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
The http.cookies module contains the following notice:
Copyright 2000 by Timothy O'Malley <timo@alum.mit.edu>
All Rights Reserved
Permission to use, copy, modify, and distribute this software
and its documentation for any purpose and without fee is hereby
granted, provided that the above copyright notice appear in all
copies and that both that copyright notice and this permission
notice appear in supporting documentation, and that the name of
Timothy O'Malley not be used in advertising or publicity
pertaining to distribution of the software without specific, written
prior permission.
Timothy O'Malley DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS
SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY
AND FITNESS, IN NO EVENT SHALL Timothy O'Malley BE LIABLE FOR
ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
PERFORMANCE OF THIS SOFTWARE.
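A minimal sketch of the cookie API this notice covers (the cookie names and
values are illustrative):

from http.cookies import SimpleCookie

# Build an outgoing cookie and render it as a Set-Cookie header line.
c = SimpleCookie()
c['session'] = '12345'
c['session']['path'] = '/'
print(c.output())                      # Set-Cookie: session=12345; Path=/

# Parse an incoming Cookie header string back into structured form.
incoming = SimpleCookie('session=12345; theme=dark')
print(incoming['theme'].value)         # dark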
The trace module contains the following notice:
portions copyright 2001, Autonomous Zones Industries, Inc., all rights...
err... reserved and offered to the public under the terms of the
Python 2.2 license.
Author: Zooko O'Whielacronx
http://zooko.com/
mailto:zooko@zooko.com
Copyright 2000, Mojam Media, Inc., all rights reserved.
Author: Skip Montanaro
Copyright 1999, Bioreason, Inc., all rights reserved.
Author: Andrew Dalke
Copyright 1995-1997, Automatrix, Inc., all rights reserved.
Author: Skip Montanaro
Copyright 1991-1995, Stichting Mathematisch Centrum, all rights reserved.
Permission to use, copy, modify, and distribute this Python software and
its associated documentation for any purpose without fee is hereby
granted, provided that the above copyright notice appears in all copies,
and that both that copyright notice and this permission notice appear in
supporting documentation, and that the name of neither Automatrix,
Bioreason or Mojam Media be used in advertising or publicity pertaining to
distribution of the software without specific, written prior permission.
The uu module contains the following notice:
Copyright 1994 by Lance Ellinghouse
Cathedral City, California Republic, United States of America.
All Rights Reserved
Permission to use, copy, modify, and distribute this software and its
documentation for any purpose and without fee is hereby granted,
provided that the above copyright notice appear in all copies and that
both that copyright notice and this permission notice appear in
supporting documentation, and that the name of Lance Ellinghouse
not be used in advertising or publicity pertaining to distribution
of the software without specific, written prior permission.
LANCE ELLINGHOUSE DISCLAIMS ALL WARRANTIES WITH REGARD TO
THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS, IN NO EVENT SHALL LANCE ELLINGHOUSE CENTRUM BE LIABLE
FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT
OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
Modified by Jack Jansen, CWI, July 1995:
- Use binascii module to do the actual line-by-line conversion
between ascii and binary. This results in a 1000-fold speedup. The C
version is still 5 times faster, though.
- Arguments more compliant with Python standard
The xmlrpc.client module contains the following notice:
The XML-RPC client interface is
Copyright (c) 1999-2002 by Secret Labs AB
Copyright (c) 1999-2002 by Fredrik Lundh
By obtaining, using, and/or copying this software and/or its
associated documentation, you agree that you have read, understood,
and will comply with the following terms and conditions:
Permission to use, copy, modify, and distribute this software and
its associated documentation for any purpose and without fee is
hereby granted, provided that the above copyright notice appears in
all copies, and that both that copyright notice and this permission
notice appear in supporting documentation, and that the name of
Secret Labs AB or the author not be used in advertising or publicity
pertaining to distribution of the software without specific, written
prior permission.
SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD
TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANT-
ABILITY AND FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR
BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY
DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS
ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE
OF THIS SOFTWARE.
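A minimal sketch of the client interface covered by this notice; the endpoint
URL and the add() method are purely hypothetical and assume a matching XML-RPC
server is running:

import xmlrpc.client

# ServerProxy marshals ordinary method calls into XML-RPC requests.
proxy = xmlrpc.client.ServerProxy('http://localhost:8000/')
print(proxy.add(2, 3))   # succeeds only if the server exposes an add() method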
The test_epoll module contains the following notice:
Copyright (c) 2001-2006 Twisted Matrix Laboratories.
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
The select module contains the following notice for the kqueue interface:
Copyright (c) 2000 Doug White, 2006 James Knight, 2007 Christian Heimes
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
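A minimal sketch of the kqueue interface this notice covers; select.kqueue()
exists only on BSD-derived systems such as FreeBSD and Mac OS X, so the code
below will not run elsewhere:

import select, socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(('127.0.0.1', 0))
srv.listen(1)

kq = select.kqueue()
ev = select.kevent(srv.fileno(), select.KQ_FILTER_READ, select.KQ_EV_ADD)
kq.control([ev], 0)                # register interest; return no events
print(kq.control(None, 1, 0.1))    # poll: at most one event, 100 ms timeout
kq.close()
srv.close()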
The file Python/dtoa.c, which supplies C functions dtoa and
strtod for conversion of C doubles to and from strings, is derived
from the file of the same name by David M. Gay, currently available
from http://www.netlib.org/fp/. The original file, as retrieved on
March 16, 2009, contains the following copyright and licensing
notice:
/****************************************************************
*
* The author of this software is David M. Gay.
*
* Copyright (c) 1991, 2000, 2001 by Lucent Technologies.
*
* Permission to use, copy, modify, and distribute this software for any
* purpose without fee is hereby granted, provided that this entire notice
* is included in all copies of any software which is or includes a copy
* or modification of this software and in all copies of the supporting
* documentation for such software.
*
* THIS SOFTWARE IS BEING PROVIDED "AS IS", WITHOUT ANY EXPRESS OR IMPLIED
* WARRANTY. IN PARTICULAR, NEITHER THE AUTHOR NOR LUCENT MAKES ANY
* REPRESENTATION OR WARRANTY OF ANY KIND CONCERNING THE MERCHANTABILITY
* OF THIS SOFTWARE OR ITS FITNESS FOR ANY PARTICULAR PURPOSE.
*
***************************************************************/
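The user-visible payoff of Gay's code is correctly rounded conversion in both
directions. Assuming a Python built with the bundled dtoa.c, the shortest
repr() of a float round-trips exactly:

x = 1.1
s = repr(x)                        # shortest string that converts back exactly
assert float(s) == x
print(format(2 ** 0.5, '.17g'))    # full 17 significant digits on request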
The modules hashlib, posix, ssl, and crypt use
the OpenSSL library for added performance if made available by the
operating system. Additionally, the Windows installers for Python
include a copy of the OpenSSL libraries, so we include a copy of the
OpenSSL license here:
LICENSE ISSUES
==============
The OpenSSL toolkit stays under a dual license, i.e. both the conditions of
the OpenSSL License and the original SSLeay license apply to the toolkit.
See below for the actual license texts. Actually both licenses are BSD-style
Open Source licenses. In case of any license issues related to OpenSSL
please contact openssl-core@openssl.org.
OpenSSL License
---------------
/* ====================================================================
* Copyright (c) 1998-2008 The OpenSSL Project. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* 3. All advertising materials mentioning features or use of this
* software must display the following acknowledgment:
* "This product includes software developed by the OpenSSL Project
* for use in the OpenSSL Toolkit. (http://www.openssl.org/)"
*
* 4. The names "OpenSSL Toolkit" and "OpenSSL Project" must not be used to
* endorse or promote products derived from this software without
* prior written permission. For written permission, please contact
* openssl-core@openssl.org.
*
* 5. Products derived from this software may not be called "OpenSSL"
* nor may "OpenSSL" appear in their names without prior written
* permission of the OpenSSL Project.
*
* 6. Redistributions of any form whatsoever must retain the following
* acknowledgment:
* "This product includes software developed by the OpenSSL Project
* for use in the OpenSSL Toolkit (http://www.openssl.org/)"
*
* THIS SOFTWARE IS PROVIDED BY THE OpenSSL PROJECT ``AS IS'' AND ANY
* EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE OpenSSL PROJECT OR
* ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
* NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
* STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
* OF THE POSSIBILITY OF SUCH DAMAGE.
* ====================================================================
*
* This product includes cryptographic software written by Eric Young
* (eay@cryptsoft.com). This product includes software written by Tim
* Hudson (tjh@cryptsoft.com).
*
*/
Original SSLeay License
-----------------------
/* Copyright (C) 1995-1998 Eric Young (eay@cryptsoft.com)
* All rights reserved.
*
* This package is an SSL implementation written
* by Eric Young (eay@cryptsoft.com).
* The implementation was written so as to conform with Netscapes SSL.
*
* This library is free for commercial and non-commercial use as long as
* the following conditions are aheared to. The following conditions
* apply to all code found in this distribution, be it the RC4, RSA,
* lhash, DES, etc., code; not just the SSL code. The SSL documentation
* included with this distribution is covered by the same copyright terms
* except that the holder is Tim Hudson (tjh@cryptsoft.com).
*
* Copyright remains Eric Young's, and as such any Copyright notices in
* the code are not to be removed.
* If this package is used in a product, Eric Young should be given attribution
* as the author of the parts of the library used.
* This can be in the form of a textual message at program startup or
* in documentation (online or textual) provided with the package.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. All advertising materials mentioning features or use of this software
* must display the following acknowledgement:
* "This product includes cryptographic software written by
* Eric Young (eay@cryptsoft.com)"
* The word 'cryptographic' can be left out if the rouines from the library
* being used are not cryptographic related :-).
* 4. If you include any Windows specific code (or a derivative thereof) from
* the apps directory (application code) you must include an acknowledgement:
* "This product includes software written by Tim Hudson (tjh@cryptsoft.com)"
*
* THIS SOFTWARE IS PROVIDED BY ERIC YOUNG ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
* The licence and distribution terms for any publically available version or
* derivative of this code cannot be changed. i.e. this code cannot simply be
* copied and put under another distribution licence
* [including the GNU Public Licence.]
*/
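As a quick check of the OpenSSL linkage described before the license texts
(assuming an OpenSSL-enabled build), hashlib computes digests and ssl reports
the linked library version:

import hashlib, ssl

print(hashlib.sha256(b'abc').hexdigest())   # OpenSSL-accelerated when available
print(ssl.OPENSSL_VERSION)                  # e.g. 'OpenSSL 1.0.0 ...'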
The pyexpat extension is built using an included copy of the expat
sources unless the build is configured --with-system-expat:
Copyright (c) 1998, 1999, 2000 Thai Open Source Software Center Ltd
and Clark Cooper
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
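A minimal sketch of the expat-backed parser as surfaced through
xml.parsers.expat; the document string is illustrative:

import xml.parsers.expat

def start(name, attrs):            # called for every opening tag
    print('start', name, attrs)

p = xml.parsers.expat.ParserCreate()
p.StartElementHandler = start
p.Parse('<root a="1"><leaf/></root>', True)   # True marks the final chunk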
The _ctypes extension is built using an included copy of the libffi
sources unless the build is configured --with-system-libffi:
Copyright (c) 1996-2008 Red Hat, Inc and others.
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
``Software''), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED ``AS IS'', WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.
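A minimal sketch of the libffi-powered foreign-function machinery, exercised
through the public ctypes module; this assumes a POSIX system where
find_library() can locate the C library:

import ctypes, ctypes.util

# Load the C library and call abs() through libffi's call machinery.
libc = ctypes.CDLL(ctypes.util.find_library('c'))
print(libc.abs(-5))    # -> 5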
The zlib extension is built using an included copy of the zlib
sources if the zlib version found on the system is too old to be
used for the build:
Copyright (C) 1995-2011 Jean-loup Gailly and Mark Adler
This software is provided 'as-is', without any express or implied
warranty. In no event will the authors be held liable for any damages
arising from the use of this software.
Permission is granted to anyone to use this software for any purpose,
including commercial applications, and to alter it and redistribute it
freely, subject to the following restrictions:
1. The origin of this software must not be misrepresented; you must not
claim that you wrote the original software. If you use this software
in a product, an acknowledgment in the product documentation would be
appreciated but is not required.
2. Altered source versions must be plainly marked as such, and must not be
misrepresented as being the original software.
3. This notice may not be removed or altered from any source distribution.
Jean-loup Gailly        Mark Adler
jloup@gzip.org          madler@alumni.caltech.edu
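A minimal round-trip through the zlib module, whichever copy of the library
the interpreter was built against:

import zlib

data = b'licenses ' * 1000
packed = zlib.compress(data, 9)          # level 9: best compression
assert zlib.decompress(packed) == data
print(len(data), '->', len(packed))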